ENH: Add get_rates_columnar method to FXRatesReader. #2617

ssanderson · 2020-01-15T16:09:42Z

Add a new method, get_rates_columnar to FXRateReader, with a default implementation given in terms of get_rates.

The behavior of get_rates_columnar should be logically equivalent to:

def get_rates_columnar(self, rate, quote, bases, dts):
    out = []
    for base, dt in zip(bases, dts):
        out.append(self.get_rate_scalar(rate, quote, base, dt)
    return np.array(out)

coveralls · 2020-01-15T16:31:02Z

Coverage increased (+0.002%) to 88.033% when pulling c6bb976 on columnar-fx-rates into b0b20b0 on master.

peterhbromley

I had one question about whether the fact that we get a bunch of data that we don't end up using in the columnar case matters at all but otherwise this LGTM

peterhbromley · 2020-01-15T23:04:39Z

zipline/data/fx/base.py

+        rates_2d = self.get_rates(
+            rate,
+            quote,
+            unique_bases,
+            pd.DatetimeIndex(unique_dts, tz='utc')
+        )
+        return rates_2d[dts_ix, bases_ix]


I could be wrong about this, but it seems like in the columnar case we might load a bunch more data than we need since get_rates computes the cartesian product but the column is only a subset of that. I assume this doesn't really matter, but was curious if that is a concern at all.

it seems like in the columnar case we might load a bunch more data than we need since get_rates computes the cartesian product but the column is only a subset of that.

Yup, this is definitely a possibility. The worst case scenario here would be something like a columnar query where every day has a different currency in it. We'd end up loading all the currencies for every day, even though we only needed one day of each currency. I'm not too worried about this because:

In our expected workloads, I expect that if we need a currency on day N, we'll probably need it for nearby days. In particular, most datasets will have static currencies, which means the most common case will be that if we ever need a currency we'll need it every day.

Even in the pathological case, it's not clear that we can do much better than what we're doing here, because we're going to want to use compression and chunking in the HDF5 files. What that means in practice is that the data arrays get broken up into regular sized pieces, and each piece gets compressed and stored separately. If you load any value out of the array, you have to decompress and load the entire chunk holding the value, so what dominates performance isn't so much the number of values read: it's the number of distinct chunks read. So, most likely, even if we tried to more precise reads here, we'd end up doing the same amount of IO, and we'd likely have higher overheads from trying to read smaller chunks of the file.

ENH: Add get_rates_columnar method to FXRatesReader.

c6bb976

ssanderson requested a review from peterhbromley January 15, 2020 16:09

peterhbromley approved these changes Jan 15, 2020

View reviewed changes

ssanderson merged commit c825927 into master Jan 16, 2020

ssanderson deleted the columnar-fx-rates branch January 16, 2020 14:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Add get_rates_columnar method to FXRatesReader. #2617

ENH: Add get_rates_columnar method to FXRatesReader. #2617

ssanderson commented Jan 15, 2020 •

edited

Loading

coveralls commented Jan 15, 2020

peterhbromley left a comment

peterhbromley Jan 15, 2020

ssanderson Jan 15, 2020

ENH: Add get_rates_columnar method to FXRatesReader. #2617

ENH: Add get_rates_columnar method to FXRatesReader. #2617

Conversation

ssanderson commented Jan 15, 2020 • edited Loading

coveralls commented Jan 15, 2020

peterhbromley left a comment

Choose a reason for hiding this comment

peterhbromley Jan 15, 2020

Choose a reason for hiding this comment

ssanderson Jan 15, 2020

Choose a reason for hiding this comment

ssanderson commented Jan 15, 2020 •

edited

Loading