New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pandas 0.22 upgrade #2194

Merged
merged 19 commits into from Jul 11, 2018

Conversation

Projects
None yet
7 participants
@richafrank
Member

richafrank commented May 23, 2018

Makes zipline compatible with pandas 0.19 and 0.22, but lacks support for pipeline categoricals.
Part of #2132.

@coveralls

This comment has been minimized.

coveralls commented May 23, 2018

Coverage Status

Coverage increased (+0.08%) to 86.0% when pulling 204f7fe on pandas_upgrade into 845b33d on master.

PANDAS_VERSION: "0.19.2"

- PYTHON_VERSION: "3.5"
PANDAS_VERSION: "0.22.0"

This comment has been minimized.

@twiecki

twiecki May 25, 2018

Contributor

I think 0.23.0 is the most recent.

@richafrank richafrank force-pushed the pandas_upgrade branch 4 times, most recently from 817b776 to 301166f May 31, 2018

@richafrank richafrank force-pushed the pandas_upgrade branch 2 times, most recently from 2bf1dd6 to 35f1373 Jun 11, 2018

@@ -171,8 +171,7 @@ def _filter_requirements(lines_iter, filter_names=None,

REQ_UPPER_BOUNDS = {
'bcolz': '<1',
'pandas': '<0.19',
'pandas-datareader': '<0.6', # 0.6.0 requires pandas >=0.19.2
'pandas': '<=0.22',

This comment has been minimized.

@aseyboldt

aseyboldt Jun 22, 2018

Shouldn't this be "~=0.22" or "<=0.22.*" (or hopefully "~=0.23" if that works as well)? Otherwise bugfix releases of pandas will not be accepted.

This comment has been minimized.

@ssanderson

ssanderson Jul 3, 2018

Member

@aseyboldt pandas there doesn't appear to be a 0.22 forthcoming, and 0.23 triggers a bunch of warnings and a few errors when I try to update this.

This comment has been minimized.

@richafrank

richafrank Jul 3, 2018

Member

Without further changes, 0.23 didn't work, so I didn't add it to this PR (would love not to add more scope to this instead of merging what we have).

We could make it <0.23, if we think it's safe to support the micro versions. We have seen in the past that micro versions of pandas have changed behavior and broken things...

@ssanderson

@richafrank I posted a couple questions. I'm also seeing warnings coming out of odo at the start of the tests when running locally:

/home/ssanderson/.virtualenvs/zipline-new/lib/python3.5/site-packages/pandas/core/internals.py:3462: FutureWarning: Passing in 'datetime64' dtype with no frequency is deprecated and will raise in a future version. Please pass in 'datetime64[ns]' instead.
  return self.apply('astype', dtype=dtype, **kwargs)
/home/ssanderson/.virtualenvs/zipline-new/src/odo/odo/backends/pandas.py:102: FutureWarning: pandas.tslib is deprecated and will be removed in a future version.
You can access NaTType as type(pandas.NaT)
  @convert.register((pd.Timestamp, pd.Timedelta), (pd.tslib.NaTType, type(None)))
/home/ssanderson/.virtualenvs/zipline-new/lib/python3.5/site-packages/pandas/core/internals.py:3462: FutureWarning: Passing in 'datetime64' dtype with no frequency is deprecated and will raise in a future version. Please pass in 'datetime64[ns]' instead.
  return self.apply('astype', dtype=dtype, **kwargs)
@@ -171,8 +171,7 @@ def _filter_requirements(lines_iter, filter_names=None,

REQ_UPPER_BOUNDS = {
'bcolz': '<1',
'pandas': '<0.19',
'pandas-datareader': '<0.6', # 0.6.0 requires pandas >=0.19.2
'pandas': '<=0.22',

This comment has been minimized.

@ssanderson

ssanderson Jul 3, 2018

Member

@aseyboldt pandas there doesn't appear to be a 0.22 forthcoming, and 0.23 triggers a bunch of warnings and a few errors when I try to update this.

@@ -0,0 +1 @@
from zipline import setup, teardown # noqa For nosetests

This comment has been minimized.

@ssanderson

ssanderson Jul 3, 2018

Member

Is there a reason we're importing these from zipline.__init__ instead of just being defining them here?

This comment has been minimized.

@richafrank

richafrank Jul 3, 2018

Member

I put them in zipline.__init__ so that doctests running on code in zipline would apply the compat.

This comment has been minimized.

@ssanderson

ssanderson Jul 3, 2018

Member

Ah. Can we add a comment explaining that?

This comment has been minimized.

@richafrank
2013-06-06 00:00:00+00:00,0.00905314069572749
2013-06-07 00:00:00+00:00,0.01272045719904158
2013-05-01 00:00:00+00:00,-0.008768026837593923
2013-05-02 00:00:00+00:00,0.009287480470740572

This comment has been minimized.

@ssanderson

ssanderson Jul 3, 2018

Member

Why the changes here?

This comment has been minimized.

@richafrank

richafrank Jul 9, 2018

Member

I needed to rebuild the example data for use in test_examples with the new pandas version. That required downloading benchmarks, but our current benchmarks downloading process doesn't go back as far as the current data. I merged in data here from resources/market_data_SPY_benchmark.csv, which had a different range of dates, but also some floating point differences where they overlap (such as here). Another option might be to change the start/end dates of the test, but I was hoping to avoid that.

@@ -321,7 +324,7 @@ def rolling_window(array, length):

# Sentinel value that isn't NaT.
_notNaT = make_datetime64D(0)
iNaT = NaTns.view(int64_dtype)
iNaT = int(NaTns.view(int64_dtype))

This comment has been minimized.

@ssanderson

ssanderson Jul 3, 2018

Member

I would have expected this to be an int64. What's the reason for converting to a python int here?

This comment has been minimized.

@richafrank

richafrank Jul 3, 2018

Member

The only usage of this now (it was unused before, except in the assert below) is to represent a missing first_trading_day attribute in the bcolz ctable for us equity pricing, which requires json-serializability. We used to use pandas.tslib.iNaT (a long) for this.

skip_pipeline_new_pandas = \
'Pipeline categoricals are not yet compatible with pandas >=0.19'

if pandas_version >= StrictVersion('0.20'):

This comment has been minimized.

@ssanderson

ssanderson Jul 3, 2018

Member

I think this function is supposed to be private. Could we just define our own version of this using the normalize method of timestamps? E.g.:

def normalize_date(dt):
    return dt.normalize()

This comment has been minimized.

@richafrank

richafrank Jul 9, 2018

Member

Sounds good to me.

@@ -148,7 +148,13 @@ def _run(handle_data,
"invalid url %r, must begin with 'sqlite:///'" %
str(bundle_data.asset_finder.engine.url),
)
env = TradingEnvironment(asset_db_path=connstr, environ=environ)
env = TradingEnvironment(

This comment has been minimized.

@ssanderson

ssanderson Jul 3, 2018

Member

Why the changes here? (I'm about to remove this class entirely, so I'm surprised to see nontrivial changes involving it.)

This comment has been minimized.

@richafrank

richafrank Jul 9, 2018

Member

This was also for test_examples. Previously we ignored the cached benchmark data if it doesn't cover the requested date range, but that date range was always the entire trading calendar (trading_calendar.schedule.index). Now it's filtered to the range of the backtest, so we can use the cached benchmarks.

@ssanderson ssanderson force-pushed the pandas_upgrade branch from 35f1373 to 585731c Jul 3, 2018

@richafrank richafrank force-pushed the pandas_upgrade branch from 1f36f35 to caea07f Jul 9, 2018

@ssanderson

This comment has been minimized.

Member

ssanderson commented Jul 9, 2018

@richafrank this looks good to me. We should probably open a tracking issue to document the known failures with categoricals in new pandas.

It also might make sense to do a release with these changes and the trading_calendars refactor before we merge the breaking changes from #2192.

@richafrank richafrank force-pushed the pandas_upgrade branch from caea07f to 2077664 Jul 10, 2018

richafrank and others added some commits Mar 8, 2018

TST: Set numpy array formatting to legacy for doctest
TST: Ignore runtime warnings for invalid division only in newer pandas
BLD: Don't bother testing/packaging on pandas 0.19
It's already old, and we don't need a package per pandas version,
just numpy.

@richafrank richafrank force-pushed the pandas_upgrade branch from 2077664 to 204f7fe Jul 10, 2018

@richafrank richafrank changed the title from Pandas upgrade to Pandas 0.22 upgrade Jul 11, 2018

@richafrank richafrank merged commit c8a5e52 into master Jul 11, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@richafrank richafrank deleted the pandas_upgrade branch Jul 11, 2018

@twiecki

This comment has been minimized.

Contributor

twiecki commented Jul 11, 2018

🎉 👍 🥇

@twiecki

This comment has been minimized.

Contributor

twiecki commented Aug 7, 2018

Should we release a new version with the updated pandas dependency?

@richafrank

This comment has been minimized.

Member

richafrank commented Aug 7, 2018

@twiecki Great idea 😄 . zipline 1.3.0 has this change.

@rolo rolo referenced this pull request Sep 7, 2018

Open

Update pandas version #269

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment