Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix crash in pipeline with all currency-converted data. #2613

Merged
merged 3 commits into from
Jan 14, 2020

Conversation

ssanderson
Copy link
Contributor

Fixes a bug where we would crash when trying to run a pipeline that contained
only currency-converted data.

@@ -166,7 +166,7 @@ def _inplace_currency_convert(self, columns, arrays, dates, sids):
by_spec[column.currency_conversion].append(array)

# Nothing to do for terms with no currency conversion.
by_spec.pop(None)
by_spec.pop(None, None)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the bugfix. The code above this groups the columns by "conversion spec", which describes what currency conversion needs to happen on the column. None is used for columns that don't need conversion, and we want to remove those before proceeding, since we don't need to do anything for those. All my previous tests that exercised currency conversion also included at least one non-converted column, so the pop(None) always succeeded. When running with all converted columns, however, we crash here because there's no None key.

Fixes a bug where we would crash when trying to run a pipeline that contained
only currency-converted data.
@ssanderson ssanderson force-pushed the fix-only-currency-converted-prices branch from 3bf790d to b7d4d78 Compare January 13, 2020 22:44
@coveralls
Copy link

coveralls commented Jan 13, 2020

Coverage Status

Coverage increased (+0.004%) to 88.025% when pulling 8a361e7 on fix-only-currency-converted-prices into 16c66a4 on master.

Copy link
Contributor

@quantophred quantophred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions about timezones and additional tests, should be good though.

rate,
quote,
bases=np.array([base], dtype=object),
dts=pd.DatetimeIndex([dt], tz='UTC'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing much in the Pandas docs about what the tz argument to DatetimeIndex actually does -- is it just used to convert its inputs? If so, should we document its type as "convertible to datetime64", or do you want to limit what this interface officially supports?

Also, some failure modes we may or may not want to account for:

  • Someone in Japan provides dt='2020-01-01' (or a timezone-naive Timestamp or datetime64 object, intended to represent local time) and gets results for a different market day than they expected.
  • They try again with dt=pd.Timestamp('2020-01-01', tz='Japan') and getTypeError: Already tz-aware, use tz_convert to convert.

Not sure what happens (or should happen) if non-UTC timezone-aware datetimes are passed directly to get_rates. Looks like the test_fx tests all use UTC Timestamps: should that just be the requirement in the interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, should we document its type as "convertible to datetime64", or do you want to limit what this interface officially supports?

In general, DatetimeIndex will accept anything that pandas can convert to a datetime. For example:

In [8]: pd.DatetimeIndex(['2014-01-02', 5])
Out[8]: DatetimeIndex(['2014-01-02 00:00:00', '1970-01-01 00:00:00.000000005'], dtype='datetime64[ns]', freq=None)

So, if the goal here is to be maximally accurate, the signature of this is probably "anything that pandas can convert to a datetime". In practice, however, this class is a relatively low-level interface. I expect its primary clients to be pipeline loaders, not end users, so I'd rather advertise the interface I actually expect, which is Timestamp or datetime64.

Looks like the test_fx tests all use UTC Timestamps: should that just be the requirement in the interface?

One of the relatively bad mistakes we made in Zipline a long time ago is that we uniformly represent "dates" as pandas Timestamps at midnight, localized to utc, of the represented date. This is an objectively bad representation for a date for a variety of reasons, but it's been our representation for a long time, and it's pretty deeply ingrained into the codebase. That's the expected representation here, and I updated the docs on get_rates to reflect that here: 8a361e7.

Someone in Japan provides dt='2020-01-01' (or a timezone-naive Timestamp or datetime64 object, intended to represent local time) and gets results for a different market day than they expected.

Both of these cases would result in loading the rates for the date requested, which is probably the expected behavior.

They try again with dt=pd.Timestamp('2020-01-01', tz='Japan') and getTypeError: Already tz-aware, use tz_convert to convert.

As you note, this would be an error, which I think is reasonable. We never expect to see non-utc-localized dates in this function (or, really, anywhere in zipline).

('CA', CA_EQUITIES, 'XTSE'),
('GB', GB_EQUITIES, 'XLON'),
])
def test_only_currency_converted_data(self, name, domain, calendar_name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might could use a direct test for get_rates without any non-converted columns, in case InMemoryFXRateReader ever sprouts its own implementation of get_rate_scalar; but maybe coverage measurement is sufficient to catch that if it happens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might could use a direct test for get_rates without any non-converted columns

This doesn't really make sense. get_rates doesn't know anything about pipeline columns. The interface it provides is for reading a dates-by-currencies block of of fx rate data.

That said, I think a reasonable check to add is to ensure that get_rates_scalar is always "just" syntactic sugar for calling get_rates with a single date and currency and extracting the value of the 1 x 1 array. I've added that check to an existing test here: b842b87

@ssanderson ssanderson merged commit b0b20b0 into master Jan 14, 2020
@ssanderson ssanderson deleted the fix-only-currency-converted-prices branch January 14, 2020 22:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants