BUG: Fix crash in pipeline with all currency-converted data. #2613
Conversation
@@ -166,7 +166,7 @@ def _inplace_currency_convert(self, columns, arrays, dates, sids):
             by_spec[column.currency_conversion].append(array)

         # Nothing to do for terms with no currency conversion.
-        by_spec.pop(None)
+        by_spec.pop(None, None)
This is the bugfix. The code above this groups the columns by "conversion spec", which describes what currency conversion needs to happen on the column. `None` is used for columns that don't need conversion, and we want to remove those before proceeding, since we don't need to do anything for those. All my previous tests that exercised currency conversion also included at least one non-converted column, so the `pop(None)` always succeeded. When running with all converted columns, however, we crash here because there's no `None` key.
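The difference between the two `pop` forms, in a minimal sketch (the dict contents here are illustrative, not zipline's actual column groupings):

```python
# Grouping sketch: a None key marks columns that need no conversion.
by_spec = {None: ['close_usd'], 'usd->eur': ['close_eur']}
by_spec.pop(None)        # fine: the None key exists
assert by_spec == {'usd->eur': ['close_eur']}

# With *only* converted columns, there is no None key at all:
by_spec = {'usd->eur': ['close_eur']}
try:
    by_spec.pop(None)    # the old code path: raises KeyError
except KeyError:
    pass
by_spec.pop(None, None)  # the fix: the default makes this a no-op
assert by_spec == {'usd->eur': ['close_eur']}
```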
Fixes a bug where we would crash when trying to run a pipeline that contained only currency-converted data.
Force-pushed from 3bf790d to b7d4d78.
Two questions about timezones and additional tests; should be good though.
    rate,
    quote,
    bases=np.array([base], dtype=object),
    dts=pd.DatetimeIndex([dt], tz='UTC'),
I'm not seeing much in the Pandas docs about what the `tz` argument to `DatetimeIndex` actually does -- is it just used to convert its inputs? If so, should we document its type as "convertible to datetime64", or do you want to limit what this interface officially supports?

Also, some failure modes we may or may not want to account for:

- Someone in Japan provides `dt='2020-01-01'` (or a timezone-naive Timestamp or datetime64 object, intended to represent local time) and gets results for a different market day than they expected.
- They try again with `dt=pd.Timestamp('2020-01-01', tz='Japan')` and get `TypeError: Already tz-aware, use tz_convert to convert.`

Not sure what happens (or should happen) if non-UTC timezone-aware datetimes are passed directly to `get_rates`. Looks like the `test_fx` tests all use UTC Timestamps: should that just be the requirement in the interface?
> If so, should we document its type as "convertible to datetime64", or do you want to limit what this interface officially supports?

In general, `DatetimeIndex` will accept anything that pandas can convert to a datetime. For example:

    In [8]: pd.DatetimeIndex(['2014-01-02', 5])
    Out[8]: DatetimeIndex(['2014-01-02 00:00:00', '1970-01-01 00:00:00.000000005'], dtype='datetime64[ns]', freq=None)

So, if the goal here is to be maximally accurate, the signature of this is probably "anything that pandas can convert to a datetime". In practice, however, this class is a relatively low-level interface. I expect its primary clients to be pipeline loaders, not end users, so I'd rather advertise the interface I actually expect, which is Timestamp or datetime64.
> Looks like the test_fx tests all use UTC Timestamps: should that just be the requirement in the interface?

One of the relatively bad mistakes we made in Zipline a long time ago is that we uniformly represent "dates" as pandas `Timestamp`s at midnight of the represented date, localized to UTC. This is an objectively bad representation for a date for a variety of reasons, but it's been our representation for a long time, and it's pretty deeply ingrained into the codebase. That's the expected representation here, and I updated the docs on `get_rates` to reflect that here: 8a361e7.
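A minimal illustration of that convention (the variable name is mine):

```python
import pandas as pd

# Zipline's long-standing convention: a "date" is represented as a
# Timestamp at midnight of that calendar day, localized to UTC.
session = pd.Timestamp('2020-01-02', tz='UTC')

assert session.hour == 0 and session.minute == 0
assert str(session.tz) == 'UTC'
```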
> Someone in Japan provides dt='2020-01-01' (or a timezone-naive Timestamp or datetime64 object, intended to represent local time) and gets results for a different market day than they expected.

Both of these cases would result in loading the rates for the date requested, which is probably the expected behavior.

> They try again with dt=pd.Timestamp('2020-01-01', tz='Japan') and get TypeError: Already tz-aware, use tz_convert to convert.

As you note, this would be an error, which I think is reasonable. We never expect to see non-UTC-localized dates in this function (or, really, anywhere in zipline).
    ('CA', CA_EQUITIES, 'XTSE'),
    ('GB', GB_EQUITIES, 'XLON'),
])
def test_only_currency_converted_data(self, name, domain, calendar_name):
We might want a direct test for `get_rates` without any non-converted columns, in case `InMemoryFXRateReader` ever sprouts its own implementation of `get_rate_scalar`; but maybe coverage measurement is sufficient to catch that if it happens.
> We might want a direct test for get_rates without any non-converted columns

This doesn't really make sense: `get_rates` doesn't know anything about pipeline columns. The interface it provides is for reading a dates-by-currencies block of fx rate data.

That said, I think a reasonable check to add is to ensure that `get_rate_scalar` is always "just" syntactic sugar for calling `get_rates` with a single date and currency and extracting the value of the 1 x 1 array. I've added that check to an existing test here: b842b87
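A sketch of that invariant, using a toy stand-in for the reader (the class, its table layout, and the `'mid'`/`'USD'`/`'EUR'` values are all illustrative, not zipline's actual `InMemoryFXRateReader`):

```python
import numpy as np
import pandas as pd

class ToyFXReader:
    """Toy stand-in for an FX rate reader; not zipline's real class."""

    def __init__(self, table):
        # table maps (rate, quote, base) -> {dt: rate_value}
        self._table = table

    def get_rates(self, rate, quote, bases, dts):
        """Return a len(dts) x len(bases) array of FX rates."""
        out = np.empty((len(dts), len(bases)))
        for j, base in enumerate(bases):
            for i, dt in enumerate(dts):
                out[i, j] = self._table[(rate, quote, base)][dt]
        return out

    def get_rate_scalar(self, rate, quote, base, dt):
        # The invariant under test: scalar lookup is "just" sugar for a
        # 1 x 1 call to get_rates.
        rates = self.get_rates(
            rate,
            quote,
            bases=np.array([base], dtype=object),
            dts=pd.DatetimeIndex([dt], tz='UTC'),
        )
        return rates[0, 0]

dt = pd.Timestamp('2020-01-02', tz='UTC')
reader = ToyFXReader({('mid', 'USD', 'EUR'): {dt: 1.12}})
assert reader.get_rate_scalar('mid', 'USD', 'EUR', dt) == 1.12
```

If `get_rate_scalar` ever grew its own implementation, a check like this would catch the two paths drifting apart.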