MAINT: Transpose arrays in fx artifact for better compression #2646

peterhbromley · 2020-02-05T18:59:11Z

No description provided.

coveralls · 2020-02-05T20:08:40Z

Coverage decreased (-0.003%) to 88.274% when pulling a560fd9 on transpose-fx-data into 8195f3a on master.

ssanderson · 2020-02-05T22:13:51Z

zipline/data/fx/hdf5.py



 class HDF5FXRateWriter(object):
    """Writer class for HDF5 files consumed by HDF5FXRateReader.
    """
-    def __init__(self, group):
+    def __init__(self, group, date_chunk_size):


I think it would probably be reasonable to give this a default value so that zipline users don't need to do the same tuning work that we did on this.

peterhbromley · 2020-02-06

I added the default as a global defined at the beginning of the file.
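For readers following along, here is a minimal sketch of the pattern described in this exchange: a module-level default that the constructor falls back to. The constant name `DEFAULT_DATE_CHUNK_SIZE`, its value, and the stand-in class `FXRateWriterSketch` are hypothetical and are not taken from the PR.

```python
# Hypothetical module-level default. The value actually tuned in the PR is
# not shown in this thread, so 512 is only a placeholder.
DEFAULT_DATE_CHUNK_SIZE = 512


class FXRateWriterSketch(object):
    """Illustrative stand-in for a writer with an optional chunk size."""

    def __init__(self, group, date_chunk_size=DEFAULT_DATE_CHUNK_SIZE):
        self._group = group
        self._date_chunk_size = date_chunk_size

    @property
    def date_chunk_size(self):
        return self._date_chunk_size


# Callers who have not tuned anything get the module default.
default_writer = FXRateWriterSketch(group=None)

# Callers who have tuned for their own data can still override it.
tuned_writer = FXRateWriterSketch(group=None, date_chunk_size=64)
```

Keeping the default in one named global means it can be changed in a single place and referenced from tests or documentation.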

ssanderson · 2020-02-05T22:31:52Z

zipline/data/fx/hdf5.py

-            buf[:-1],
-            np.s_[slice_begin:slice_end],
-        )
+        buf[:, :-1] = dataset[:, slice_begin:slice_end]


If we're not using read_direct anymore, we might as well allocate both an extra row and an extra column and use the extra row/column trick for handling both cases rather than using it for just one of them.
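The extra row/column trick mentioned above can be sketched with plain NumPy. This is an illustration under assumed toy data, not the code from the PR: the array contents and the names `stored`, `row_ix`, and `col_ix` are made up for the example.

```python
import numpy as np

# Toy stored data: 2 currencies (rows) x 3 dates (columns). Values are made up.
stored = np.array([
    [1.0, 1.1, 1.2],
    [2.0, 2.1, 2.2],
])

# Allocate one extra row and one extra column, pre-filled with NaN.
buf = np.full((stored.shape[0] + 1, stored.shape[1] + 1), np.nan)
buf[:-1, :-1] = stored

# -1 marks a requested currency or date that is absent from the stored data.
row_ix = np.array([1, -1, 0])     # the second requested currency is missing
col_ix = np.array([0, 2, -1, 1])  # the third requested date is missing

# Index rows first, then columns. Every -1 lands on the NaN padding, so
# missing rows and missing columns are handled by the same mechanism.
out = buf[row_ix][:, col_ix]
```

Because index `-1` always selects the padding, no separate masking pass is needed for either axis.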

ssanderson · 2020-02-05T22:32:12Z

zipline/data/fx/hdf5.py

-        # row. When we then apply the row index to permute the raw data into
-        # the correct order, any rows with values of -1 will pull from the
-        # extra row, which will always contain NaN>
+        # column. When we then apply the column index to permute the raw data


I think this comment got a bit scrambled? (1) still refers to the possibility of nonexistent rows, but then we talk about columns here.

peterhbromley · 2020-02-06

Now that we use the extra row/column trick for both cases, I reworded the comment.

ssanderson · 2020-02-06T16:35:33Z

zipline/data/fx/hdf5.py

+            mapping its column label's currency to ``quote_currency``. The
+            arrays that are actually written to the HDF5 file will be
+            transposed to have shape ``(len(currencies), len(dts))`` so that
+            similar values are in C-contiguous order, which improves overall
+            compression.


This is an implementation detail of the file format. I'm not sure it makes sense to include here.
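As background for the layout change described in the docstring above, here is a rough sketch of transposing a `(len(dts), len(currencies))` array so that each currency's history is C-contiguous. The data is synthetic, and `zlib` is only a stand-in for whatever compression filter the HDF5 file actually uses; neither comes from the PR.

```python
import zlib

import numpy as np

# Made-up rates with shape (len(dts), len(currencies)): each currency drifts
# slowly over time around its own level.
n_dates, n_currencies = 500, 8
rng = np.random.RandomState(0)
levels = np.linspace(0.5, 150.0, n_currencies)
steps = rng.normal(scale=0.001, size=(n_dates, n_currencies))
rates = np.round(levels * (1.0 + steps.cumsum(axis=0)), 4)

# Transpose to (len(currencies), len(dts)) and force a C-contiguous copy, so
# that each currency's values sit next to each other in memory.
stored = np.ascontiguousarray(rates.T)

# Measure the compressed size of each layout. No particular outcome is
# assumed here; the result depends on the data and the compressor.
size_date_major = len(zlib.compress(rates.tobytes()))
size_currency_major = len(zlib.compress(stored.tobytes()))
print(size_date_major, size_currency_major)

# A reader undoes the transpose to recover the caller-facing orientation.
recovered = stored.T
```

Since the transpose is applied on write and undone on read, callers never see the on-disk orientation, which is consistent with treating it as a file-format detail.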

ssanderson

LGTM. My only outstanding comment is that the note about transposing the inputs to write feels a little out of place to me. I'd probably either cut it or move it to a Notes section.

peterhbromley added 2 commits February 5, 2020 13:58

MAINT: Transpose arrays in fx artifact for better compression

6ad5f34

MAINT: Choose chunk size based on number of datetimes

be55bdf

MAINT: Update docstring describing fx artifact rates format

108571a

ssanderson reviewed Feb 5, 2020

MAINT: Make chunks param a default

ebc9f27

ssanderson reviewed Feb 6, 2020

ssanderson approved these changes Feb 6, 2020

MAINT: Remove unnecessary docstring

a560fd9

peterhbromley merged commit 74010a8 into master Feb 6, 2020

peterhbromley deleted the transpose-fx-data branch February 6, 2020 17:15
