Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Transpose arrays in fx artifact for better compression #2646

Merged
merged 5 commits into from
Feb 6, 2020

Conversation

peterhbromley
Copy link
Contributor

No description provided.

@coveralls
Copy link

coveralls commented Feb 5, 2020

Coverage Status

Coverage decreased (-0.003%) to 88.274% when pulling a560fd9 on transpose-fx-data into 8195f3a on master.

Copy link
Contributor

@ssanderson ssanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterhbromley couple comment.



class HDF5FXRateWriter(object):
"""Writer class for HDF5 files consumed by HDF5FXRateReader.
"""
def __init__(self, group):
def __init__(self, group, date_chunk_size):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would probably be reasonable to give this a default value so that zipline users don't need to do the same tuning work that we did on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the default as a global defined at the beginning of the file.

buf[:-1],
np.s_[slice_begin:slice_end],
)
buf[:, :-1] = dataset[:, slice_begin:slice_end]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're not using read_direct anymore, we might as well allocate both an extra row and an extra column and use the extra row/column trick for handling both cases rather than using it for just one of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

# row. When we then apply the row index to permute the raw data into
# the correct order, any rows with values of -1 will pull from the
# extra row, which will always contain NaN>
# column. When we then apply the column index to permute the raw data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment got a bit scrambled? (1) still refers to the possibility of nonexistent rows, but then we talk about columns here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we use the extra row/column trick for both cases, I reworded the comment.

zipline/data/fx/hdf5.py Outdated Show resolved Hide resolved
Comment on lines 263 to 267
mapping its column label's currency to ``quote_currency``. The
arrays that are actually written to the HDF5 file will be
transposed to have shape ``(len(currencies), len(dts))`` so that
similar values are in C-contiguous order, which improves overall
compression.
Copy link
Contributor

@ssanderson ssanderson Feb 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an implementation detail of the file format. I'm not sure it makes sense to include here.

Copy link
Contributor

@ssanderson ssanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Only outstanding comment is the note about transposing the inputs to write feels a little out of place to me. I'd probably either cut it or move it to a Notes section.

@peterhbromley peterhbromley merged commit 74010a8 into master Feb 6, 2020
@peterhbromley peterhbromley deleted the transpose-fx-data branch February 6, 2020 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants