Support for Newer Python Versions #2616

Open
ssanderson opened this issue Jan 14, 2020 · 15 comments

@ssanderson
Contributor

ssanderson commented Jan 14, 2020

Background

Zipline currently officially supports running on Python 2.7 and Python 3.5. These are the Python versions currently available on Quantopian.

These are both relatively old versions of Python, and many Zipline users (including users internally at Quantopian) would like to be able to use Zipline on more modern versions of Python. Supporting newer Python versions is particularly important because Python 2.7 is no longer supported by the Python Software Foundation, and Python 3.5 will reach end of life in September 2020.

We'd like to support a range of modern Python versions without dramatically increasing the cost of maintaining Zipline. In the short term, this probably means adding CI builds for Zipline on 3.6 and 3.7 (possibly also 3.8?). In the medium-to-long term, we need a more sustainable process for adding new Python versions without having to expend a ton of effort. This is especially important now that Python has adopted an annual release cadence.

Challenges

Supporting new Python versions is more challenging for Zipline than the average Python project, for a few reasons:

  1. We maintain backwards compatibility for older Python versions, so adding support for newer Pythons increases the total range of versions that we need to test, package, and triage issues for.
  2. We also maintain backwards compatibility for relatively old versions of NumPy and Pandas. At the time of this writing, Quantopian supports pandas 0.18.1, which was released in May of 2016. That version of pandas has a few notable bugs on Python 3.6, and it doesn't work at all on Python 3.7 and above due to an issue in Cython. Supporting newer versions of Python requires us to also support a larger range of numpy and pandas versions, which further increases our maintenance burden.
  3. We currently build our own conda packages as part of Zipline's Travis and Appveyor builds. These package builds take a while, and (I believe) can't be easily updated by non-Quantopian employees, which effectively means that only Q employees can work on this.

Additional Thoughts

  • One of the biggest costs of adding new Python versions is that it adds new entries to the Travis/Appveyor build matrices. Since the number of workers in these builds is limited, adding new entries causes the builds to run in serial, which slows down development. One promising alternative would be to switch to using GitHub Actions, which provides a pretty generous number of workers for open source projects. @gmanoim-quantopian has prototyped a version of this over in trading-calendars. I think we could replace a lot of the complexity in our Travis/Appveyor config with a much simpler GitHub Actions setup.
  • Another big maintenance headache here is building our own conda packages. We might benefit from switching to Conda Forge for many of these dependencies.
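To make the build-matrix cost from the first bullet concrete, here is a rough back-of-the-envelope sketch. The worker count, job duration, and platform list are hypothetical numbers chosen for illustration, and the model treats all CI workers as a single pool, which is a simplification of running Travis and Appveyor side by side.

```python
import math
from itertools import product


def build_matrix(pythons, platforms):
    """Every (python, platform) pair becomes one CI job."""
    return list(product(pythons, platforms))


def wall_clock_minutes(num_jobs, workers, minutes_per_job):
    """Rough estimate: jobs run in rounds of at most `workers` at a time."""
    return math.ceil(num_jobs / workers) * minutes_per_job


# All numbers below are hypothetical, chosen only to show the effect.
WORKERS = 5
MINUTES_PER_JOB = 30
PLATFORMS = ["linux", "windows"]

current = build_matrix(["2.7", "3.5"], PLATFORMS)
expanded = build_matrix(["2.7", "3.5", "3.6", "3.7"], PLATFORMS)

current_minutes = wall_clock_minutes(len(current), WORKERS, MINUTES_PER_JOB)
expanded_minutes = wall_clock_minutes(len(expanded), WORKERS, MINUTES_PER_JOB)
```

Under these assumed numbers, the current matrix fits in one round of workers, while the expanded matrix needs a second round, so the estimated wall-clock time doubles even though each individual job is no slower.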
@samatix
Contributor

samatix commented Jan 16, 2020

Thank you @ssanderson. This is excellent news.

As I don't have the full picture on all the challenges faced, would it make sense to have new major releases (backward incompatible) to drop support of the old python versions or soon to be (2.7, 3.5) as well as the old dependencies (pandas, numpy) ?

Regarding the process for the releases, I was thinking about first creating a sheet file with all the dependencies used in zipline with their release schedule and supported versions as part of the constraints on the sought upgrade process.

That said, I've made an attempt to upgrade Zipline's dependencies to the latest versions from the latest conda distribution release, run the tests, and review the errors. I've noted the following:

  • Time management changes
  • pd.Panel to be replaced as it has been deprecated
  • Replace assertRaisesRegexp with assertRaisesRegex
  • pd.Categorical support
  • Remove the box=True
  • pd.TimeGrouper to be replaced by pd.Grouper(freq=*)
  • Labels in pd.MultiIndex to be replaced by codes
  • pd.DatetimeIndex to be replaced by pd.date_range
  • Keep networkx==1.1 as for now the upgrade to 2.4 requires additional work
  • Replace in data_portal the dividend_tuple[i] by a better method to not depend on the dividend tuple order retrieved from the DB
  • Improve the examples tests to depend less on the pandas version
  • Other minor changes
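A few of the replacements in the list above, sketched against pandas 0.25 or later and the standard-library `unittest`. The sample data and variable names are made up for illustration; none of this is taken from the Zipline code base.

```python
import unittest

import pandas as pd

# pd.DatetimeIndex(start=..., end=..., freq=...) -> pd.date_range(...)
days = pd.date_range(start="2020-01-01", end="2020-01-06", freq="D")

# pd.TimeGrouper("3D") -> pd.Grouper(freq="3D")
values = pd.Series(range(6), index=days)
grouped = values.groupby(pd.Grouper(freq="3D")).sum()

# MultiIndex.labels -> MultiIndex.codes
mi = pd.MultiIndex.from_product([["A", "B"], [1, 2]])
level_codes = [codes.tolist() for codes in mi.codes]


class ExampleTest(unittest.TestCase):
    def test_bad_value(self):
        # assertRaisesRegexp -> assertRaisesRegex
        with self.assertRaisesRegex(ValueError, "invalid literal"):
            int("not a number")
```

The `pd.Panel` and `box=True` items have no one-line replacement and are left out of this sketch.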

Dependencies:

  • python==3.7.6
  • pandas==0.25.3
  • numpy==1.17.4

Thanks
Ayoub

@AndreasClenow

I would very much welcome support for more recent versions, @ssanderson .

My two cents here may be a little simplistic, but here it goes: Break backwards compatibility.

Let those who, for whatever reason, want to stay on Python 3.5 keep using Zipline 1.3. Throw out backwards compatibility for 2.7 and 3.5 for the next version of Zipline. It's unreasonable to expect software updates for ancient platforms.

I'd be a happy camper if Zipline 1.4 (or perhaps a 2.0 denotation is in order), would require Python 3.7+ and Pandas 0.25+.

ac

@ssanderson
Contributor Author

ssanderson commented Jan 23, 2020

> As I don't have the full picture on all the challenges faced, would it make sense to have new major releases (backward incompatible) to drop support of the old python versions or soon to be (2.7, 3.5) as well as the old dependencies (pandas, numpy) ?

It isn't really viable for the Quantopian team to drop support for 2.7 or 3.5 until we've dropped support for those versions on the Quantopian platform. That's likely to happen relatively soon for 2.7, but 3.5 is going to be around for at least a few more months.

We have similar challenges w/r/t upgrading pandas. There are a lot of breaking changes that are likely to affect Zipline algorithms in newer versions of pandas, so upgrading pandas unconditionally on Quantopian would have a significant cost for our users. This is made trickier by the fact that python 3.7+ is only compatible with relatively recent versions of pandas (I think 0.22 or 0.23).

I think the most realistic path forward for Q is something like:

  • Add support for newer numpy and pandas versions in zipline. An open question here is what version of pandas to target. The recently released 1.0 is the obvious choice, but that release contains a lot of breaking changes (most notably, the removal of the Panel class; xarray is probably the right replacement for it). @samatix's list above is probably a good starting point for this, at least.
  • In parallel, add support for Python 3.7 in zipline (and possibly Python 3.8), while retaining support for older python versions.
    • As mentioned above, a challenge here is that our current minimum supported versions of pandas/numpy do not work on 3.7+, because they distribute Cython-generated code that used members of CPython's internal data structures that were removed in 3.7. I think the only realistic way we'd deal with this would be to require newer numpy/pandas versions for Python 3.7+. Thus our supported build configurations would be, at least:

      • python==2.7, numpy==1.11.3, pandas==0.18.1
      • python==3.5, numpy==1.11.3, pandas==0.18.1
      • python==3.7, numpy==<new>, pandas==<new>

      I could imagine filling out other points in this matrix as well (e.g. Python 3.8, or some intermediate versions of numpy/pandas), but the larger the matrix the more maintenance work we create for ourselves.

    • Another challenge here is that we currently build many of our own conda packages as part of our travis and appveyor builds. We do this, I believe, primarily because many of our dependent packages don't have generally-available builds that are compatible with our supported versions of python, numpy and pandas. Our conda infrastructure was created before the widespread adoption of Conda Forge though, so it's possible that much of it is redundant now (at least for newer package versions).

      To be perfectly honest I don't quite fully understand all the conda stuff in our CI builds, so I don't have a good sense of what's necessary to update/replace/remove it. Conda is our only well-supported mechanism for installing zipline on Windows though, so we need to figure out a plan for it if we want to continue to support that platform. This is also tough for us to allocate a lot of resources to, because no one at Q actually uses Zipline on windows, and very few people use conda.

  • Drop support for Python 2.7 on Quantopian.
  • Drop support for Python 2.7 in Zipline.
  • Add support for Python 3.7 / modern pandas on Quantopian.
  • (Eventually) drop support for Python 3.5 on Quantopian/Zipline. This likely won't happen until near the EOL for 3.5 in September. Doing this would also imply dropping support for pandas 0.18.
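The idea of requiring newer numpy/pandas only on Python 3.7+ could be sketched roughly as follows. The legacy pins come from the build configurations listed above. The 3.7 pins are an assumption: they reuse the versions from @samatix's test run earlier in the thread, because the actual target versions are still undecided. The function names are hypothetical.

```python
import sys

# Pins for the older interpreters, as listed in the configurations above.
LEGACY_PINS = {"numpy": "1.11.3", "pandas": "0.18.1"}

# Assumed placeholder pins for 3.7+, borrowed from @samatix's test run.
MODERN_PINS = {"numpy": "1.17.4", "pandas": "0.25.3"}


def pins_for(version_info=sys.version_info):
    """Return the numpy/pandas pins for a given interpreter version."""
    if tuple(version_info[:2]) >= (3, 7):
        return MODERN_PINS
    return LEGACY_PINS


def requirement_lines(version_info=sys.version_info):
    """Format the pins as pip-style requirement strings."""
    pins = pins_for(version_info)
    return ["{}=={}".format(name, pins[name]) for name in sorted(pins)]
```

The same split can also be expressed declaratively with PEP 508 environment markers, for example `pandas==0.18.1; python_version < "3.7"`, which avoids branching in Python code at install time.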

"Support", in the above, means something like "a passing CI build that gives us strong confidence that the core Zipline functionality works correctly on all supported platforms". The biggest challenge here, besides just getting things working, is that adding many more entries to our build matrix would likely result in build times getting significantly (i.e., at least 2x) slower, because we'd hit enough workers that Travis wouldn't be able to run all our builds in parallel. We've looked into paying Travis for more workers in the past, but in the interim since we last did so, GitHub Actions was released, which seems like it's superior in basically every way.

@willianpaixao
Contributor

willianpaixao commented Jan 23, 2020

> The biggest challenge here, besides just getting things working, is that adding many more entries to our build matrix would likely result in build times getting significantly (i.e., at least 2x) slower, because we'd hit enough workers that Travis wouldn't be able to run all our builds in parallel.

@ssanderson there are many ways to solve this. Just on the top of my mind:

  1. limit the CI builds to more stable branches or make the use of the skip flag in WIP commits or branches.
  2. make use of pre-commit hooks to run preliminary checks and tests and reduce the number of failing builds (and therefore the number of follow-up commits that fix something small; fewer commits means fewer builds for the CI to run).
  3. make use of Docker builds for "development images" and that could be for the pure python or conda base images, also making developers run it locally before pushing a commit.
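As a rough illustration of option 2, a git pre-commit hook can be a small Python script that runs a list of checks and blocks the commit when one fails. This is a hand-rolled sketch, not the configuration of any particular hook framework, and the check list (running flake8 on a `zipline` directory) is an assumption about what the project would want to run.

```python
import subprocess
import sys


def run_checks(commands):
    """Run each command in order; return the first non-zero exit code, else 0."""
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            print("check failed: {}".format(" ".join(command)), file=sys.stderr)
            return result.returncode
    return 0


# Hypothetical check list; the real project may prefer different tools.
CHECKS = [
    [sys.executable, "-m", "flake8", "zipline"],
]
```

To use it as a hook, the script would be saved as an executable `.git/hooks/pre-commit` file that ends with `sys.exit(run_checks(CHECKS))`; git aborts the commit whenever the hook exits with a non-zero status.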

That said, regarding backwards compatibility, the 1.3.X version can be "frozen", supporting the current Python and pandas versions, and the next major version (1.4 or 2.0) would at the same time drop support for 2.7, maybe 3.5, and start supporting 3.7 or 3.8. @AndreasClenow said it well.

@willianpaixao
Contributor

@ssanderson #2631 introduces solution 2 described above, giving a way of reducing the number of builds.

I believe solution 3 is also valid, and I will work on a PR.

@suitablyquantified

FYI Pandas v1.0 was released on 29 Jan 2020. "The pandas 1.0 release removed a lot of functionality that was deprecated in previous releases. It is recommended to first upgrade to pandas 0.25 and to ensure your code is working without warnings, before upgrading to pandas 1.0."
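One way to act on the "working without warnings" advice is to promote deprecation-style warnings to exceptions while running the code under test. This is a minimal sketch using the standard-library `warnings` module; `run_strict` and `legacy_call` are hypothetical names, and `legacy_call` only stands in for code that touches a deprecated pandas API.

```python
import warnings


def run_strict(func, *args, **kwargs):
    """Call func with deprecation-style warnings promoted to exceptions."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", category=FutureWarning)
        warnings.simplefilter("error", category=DeprecationWarning)
        return func(*args, **kwargs)


def legacy_call():
    # Stand-in for code that hits a deprecated API.
    warnings.warn("this API is deprecated", FutureWarning)
    return "ok"
```

Calling `run_strict(legacy_call)` raises `FutureWarning` instead of printing it, so a test suite run this way fails at the first deprecated call. A similar effect is available for a whole process with the interpreter flag `-W error::FutureWarning`.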

@Peque
Contributor

Peque commented Feb 4, 2020

Plans are pretty ambitious, which is great. 😊

A little step forward: #2643

@Peque
Contributor

Peque commented Feb 5, 2020

> [...] no one at Q actually uses Zipline on windows, and very few people use conda.

@ssanderson Maybe that would be a good reason to drop Python 2.7 (and maybe 3.5 too) from AppVeyor pipelines.

That means, do not officially support those versions in Windows from now on. That would reduce the load on AppVeyor pipelines, which seem to be the slowest and which can even block (#2619).

I think users would appreciate more support towards newer Python/package versions instead of supporting old stuff that they probably don't use. I understand you want to still support old stuff in Linux until you update your projects at Quantopian, but Windows may be treated differently. 😊

@ssanderson
Contributor Author

ssanderson commented Feb 5, 2020

> Maybe that would be a good reason to drop Python 2.7 (and maybe 3.5 too) from AppVeyor pipelines.

My hope in the relatively near term is to switch our CI over to GitHub Actions (see #2637), which should improve the speed.

> That means, do not officially support those versions in Windows from now on.

I think this makes sense. Broadly, I think our goal should be to support roughly two use cases:

  1. Long term support for the platforms Quantopian deploys to internally. These will pretty much always be relatively (~1-3 years) old Linux/Python/Pandas versions. I think we only need to support pip installs for these platforms, and we should expect that the primary consumers of these platforms will be users like Quantopian who value stability and backwards compatibility over ease of installation.
  2. Relatively modern versions of packages that non-Q users would like to use. This probably means something like the most recent 1-2 Python versions, and relatively recent versions of pydata packages.

Generally speaking, since the majority of Zipline maintenance has been done by Q employees, we've done a good job at prioritizing (1) and a not-so-great job of prioritizing (2). The challenge is made harder by the fact that (2) generally requires more regular investment to stay up to date, which is harder for us to prioritize at Q since we're not using these newer versions ourselves.

I think the best path forward for making progress toward better support for (2) is for us to make it easier for community members to help us with the maintenance burden of supporting newer versions of packages. Besides switching over to actions, I think the other thing that would help in that effort would be to use conda forge to support installation with conda instead of maintaining our own conda packages for many of zipline's dependencies. I wrote about this in a bit more depth in the GitHub Actions issue I linked above.

@philtrade

> I've made an attempt to upgrade Zipline's dependencies to the latest versions from the latest conda distribution release, run the tests, and review the errors. I've noted the following...

Great start! It would be wonderful if I could start from your env setup and chip away at the task list.

@ssanderson, would it make sense to start a branch specifically for porting to newer Python and packages, and start labelling/tagging issues with "Python3.7", so that the many eager community users/volunteers can help with this much needed upgrade?

@ksyme99

ksyme99 commented May 12, 2020

> I think the best path forward for making progress toward better support for (2) is for us to make it easier for community members to help us with the maintenance burden of supporting newer versions of packages.

One of the blockers to this is how far the current code is from the last release. I use zipline on Windows outside Quantopian and would be willing to contribute as part of my role at work, but until we are using the version I can contribute to, it's difficult (numerous changes to data structures, etc. would be needed in my repo to upgrade from 1.3 to master, even before I could help with fixes for version changes). Obviously backwards compatibility is a big issue with letting loose on newer versions, and supporting your own platform will come before external users, but speaking from experience, external users are looking to alternatives due to the perceived lack of support for this package (2 years since the last release).

@nfx

nfx commented Oct 3, 2020

Python3.8 has MacOSX build problems as well

@leonarduschen
Contributor

So sad to hear the recent news about Q.

One thing comes to mind, though: are there any plans for what's going to happen to Zipline regarding support for newer libraries/Python?

As far as my understanding goes (correct me if I'm wrong), it was hard to bump up dependencies in Zipline and its related packages (e.g. Alphalens) because they were used to support the Quantopian algorithm IDE.

@everling

This branch https://github.com/quantopian/zipline/tree/new-new-new is worth looking into for the bump to pandas 1.1.3.
With the Quantopian guys heading over to Robinhood, I wonder what kind of stewardship awaits the zipline repo. We use zipline at my firm, though with our own security master and datasets. Prior to Quantopian's closure I was in talks with them to open source our SQL-facing loader implementation; it would be great if it could find a place somewhere here.

@stefan-jansen

A new release of Zipline now supports Python 3.7+.
