BUG: Fix crash in pipeline with all currency-converted data. #2613
Conversation
@@ -166,7 +166,7 @@ def _inplace_currency_convert(self, columns, arrays, dates, sids):
             by_spec[column.currency_conversion].append(array)

         # Nothing to do for terms with no currency conversion.
-        by_spec.pop(None)
+        by_spec.pop(None, None)
This is the bugfix. The code above this groups the columns by "conversion spec", which describes what currency conversion needs to happen on the column. `None` is used for columns that don't need conversion, and we want to remove those before proceeding, since we don't need to do anything for those. All my previous tests that exercised currency conversion also included at least one non-converted column, so the `pop(None)` always succeeded. When running with all converted columns, however, we crash here because there's no `None` key.
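The difference between the two `pop` forms, in a minimal sketch (the dict contents here are illustrative, not zipline's actual column groupings):

```python
# Grouping sketch: a None key marks columns that need no conversion.
by_spec = {None: ['close_usd'], 'usd->eur': ['close_eur']}
by_spec.pop(None)        # fine: the None key exists
assert by_spec == {'usd->eur': ['close_eur']}

# With *only* converted columns, there is no None key at all:
by_spec = {'usd->eur': ['close_eur']}
try:
    by_spec.pop(None)    # the old code path: raises KeyError
except KeyError:
    pass
by_spec.pop(None, None)  # the fix: the default makes this a no-op
assert by_spec == {'usd->eur': ['close_eur']}
```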
Fixes a bug where we would crash when trying to run a pipeline that contained only currency-converted data.
Force-pushed from 3bf790d to b7d4d78.
Two questions about timezones and additional tests; should be good though.
    rate,
    quote,
    bases=np.array([base], dtype=object),
    dts=pd.DatetimeIndex([dt], tz='UTC'),
I'm not seeing much in the Pandas docs about what the `tz` argument to `DatetimeIndex` actually does -- is it just used to convert its inputs? If so, should we document its type as "convertible to datetime64", or do you want to limit what this interface officially supports?

Also, some failure modes we may or may not want to account for:

- Someone in Japan provides `dt='2020-01-01'` (or a timezone-naive Timestamp or datetime64 object, intended to represent local time) and gets results for a different market day than they expected.
- They try again with `dt=pd.Timestamp('2020-01-01', tz='Japan')` and get `TypeError: Already tz-aware, use tz_convert to convert.`

Not sure what happens (or should happen) if non-UTC timezone-aware datetimes are passed directly to `get_rates`. Looks like the `test_fx` tests all use UTC Timestamps: should that just be the requirement in the interface?
> If so, should we document its type as "convertible to datetime64", or do you want to limit what this interface officially supports?

In general, `DatetimeIndex` will accept anything that pandas can convert to a datetime. For example:

    In [8]: pd.DatetimeIndex(['2014-01-02', 5])
    Out[8]: DatetimeIndex(['2014-01-02 00:00:00', '1970-01-01 00:00:00.000000005'], dtype='datetime64[ns]', freq=None)

So, if the goal here is to be maximally accurate, the signature of this is probably "anything that pandas can convert to a datetime". In practice, however, this class is a relatively low-level interface. I expect its primary clients to be pipeline loaders, not end users, so I'd rather advertise the interface I actually expect, which is Timestamp or datetime64.
> Looks like the test_fx tests all use UTC Timestamps: should that just be the requirement in the interface?

One of the relatively bad mistakes we made in Zipline a long time ago is that we uniformly represent "dates" as pandas `Timestamp`s at midnight of the represented date, localized to UTC. This is an objectively bad representation for a date for a variety of reasons, but it's been our representation for a long time, and it's pretty deeply ingrained into the codebase. That's the expected representation here, and I updated the docs on `get_rates` to reflect that here: 8a361e7.
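A minimal illustration of that convention (the variable name is mine):

```python
import pandas as pd

# Zipline's long-standing convention: a "date" is represented as a
# Timestamp at midnight of that calendar day, localized to UTC.
session = pd.Timestamp('2020-01-02', tz='UTC')

assert session.hour == 0 and session.minute == 0
assert str(session.tz) == 'UTC'
```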
> Someone in Japan provides dt='2020-01-01' (or a timezone-naive Timestamp or datetime64 object, intended to represent local time) and gets results for a different market day than they expected.

Both of these cases would result in loading the rates for the date requested, which is probably the expected behavior.

> They try again with dt=pd.Timestamp('2020-01-01', tz='Japan') and get TypeError: Already tz-aware, use tz_convert to convert.

As you note, this would be an error, which I think is reasonable. We never expect to see non-UTC-localized dates in this function (or, really, anywhere in zipline).
    ('CA', CA_EQUITIES, 'XTSE'),
    ('GB', GB_EQUITIES, 'XLON'),
])
def test_only_currency_converted_data(self, name, domain, calendar_name):
We might want a direct test for `get_rates` without any non-converted columns, in case `InMemoryFXRateReader` ever sprouts its own implementation of `get_rate_scalar`; but maybe coverage measurement is sufficient to catch that if it happens.
> We might want a direct test for get_rates without any non-converted columns

This doesn't really make sense: `get_rates` doesn't know anything about pipeline columns. The interface it provides is for reading a dates-by-currencies block of fx rate data.

That said, I think a reasonable check to add is to ensure that `get_rate_scalar` is always "just" syntactic sugar for calling `get_rates` with a single date and currency and extracting the value of the 1 x 1 array. I've added that check to an existing test here: b842b87
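A sketch of that invariant, using a toy stand-in for the reader (the class, its table layout, and the `'mid'`/`'USD'`/`'EUR'` values are all illustrative, not zipline's actual `InMemoryFXRateReader`):

```python
import numpy as np
import pandas as pd

class ToyFXReader:
    """Toy stand-in for an FX rate reader; not zipline's real class."""

    def __init__(self, table):
        # table maps (rate, quote, base) -> {dt: rate_value}
        self._table = table

    def get_rates(self, rate, quote, bases, dts):
        """Return a len(dts) x len(bases) array of FX rates."""
        out = np.empty((len(dts), len(bases)))
        for j, base in enumerate(bases):
            for i, dt in enumerate(dts):
                out[i, j] = self._table[(rate, quote, base)][dt]
        return out

    def get_rate_scalar(self, rate, quote, base, dt):
        # The invariant under test: scalar lookup is "just" sugar for a
        # 1 x 1 call to get_rates.
        rates = self.get_rates(
            rate,
            quote,
            bases=np.array([base], dtype=object),
            dts=pd.DatetimeIndex([dt], tz='UTC'),
        )
        return rates[0, 0]

dt = pd.Timestamp('2020-01-02', tz='UTC')
reader = ToyFXReader({('mid', 'USD', 'EUR'): {dt: 1.12}})
assert reader.get_rate_scalar('mid', 'USD', 'EUR', dt) == 1.12
```

If `get_rate_scalar` ever grew its own implementation, a check like this would catch the two paths drifting apart.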