Pandas Sprint (July 2018)
Pandas Sprint Update
Some notes on what we discussed and the conclusions we reached.
Towards Pandas 1.0
After pandas 1.0, pandas will adopt semantic versioning. API-breaking changes will be restricted to major releases. New deprecations may be introduced in minor and bugfix releases, but will not be enforced until a later major release.
At this time, we have no plans to adopt a formal, time-based release schedule like Django's.
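To make the versioning rule concrete, here is a minimal Python sketch of the policy described above. The helpers `parse_version`, `may_break_api` and `may_enforce_deprecation` are hypothetical (they are not part of pandas), and the sketch assumes plain MAJOR.MINOR.PATCH version strings without pre-release tags.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain 'MAJOR.MINOR.PATCH' string into integers (no pre-release tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def may_break_api(old: str, new: str) -> bool:
    """Under semantic versioning, API-breaking changes need a major version bump."""
    return parse_version(new)[0] > parse_version(old)[0]


def may_enforce_deprecation(deprecated_in: str, release: str) -> bool:
    """A deprecation may be added in any release, but the old behaviour may only
    be removed (the deprecation enforced) in a later major release."""
    return may_break_api(deprecated_in, release)
```

With these rules, something deprecated in 1.1.0 could not be removed in 1.2.0 but could be removed in 2.0.0. The same check returns True for 0.24.0 → 1.0.0 and False for 0.24.0 → 0.25.0, which matches the 0.24 / 0.25 / 1.0 release plan in these notes.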
Prior to 1.0, we have a bit of work to do. The biggest changes are:
- Removal of Panel
- Removal of .ix
- Possibly dropping Python 2.7 (depending on whether 1.0 is released before or after January 1, 2019)
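For code that still relies on the first two items, here is a minimal sketch of the usual replacements, assuming a recent pandas in which .ix and Panel are already gone: .loc for label-based and .iloc for position-based indexing instead of .ix, and a DataFrame with a MultiIndex in place of a Panel (xarray is another common option, not shown). The data, column names and level names are made up for illustration.

```python
import numpy as np
import pandas as pd

# An integer index that does not match row positions: the case where .ix was ambiguous.
df = pd.DataFrame({"a": [10, 20, 30], "b": [1.5, 2.5, 3.5]}, index=[2, 4, 6])

# Old: df.ix[4, "a"] -- whether 4 meant a label or a position depended on the index type.
# New: state explicitly which one you mean.
by_label = df.loc[4, "a"]    # row whose label is 4
by_position = df.iloc[1, 0]  # second row, first column

# Panel replacement: store the third axis as an extra level of a MultiIndex.
index = pd.MultiIndex.from_product([["x", "y"], [0, 1, 2]], names=["item", "row"])
panel_like = pd.DataFrame({"a": np.arange(6), "b": np.arange(6) * 0.5}, index=index)

# A 2-D slice for one item, analogous to panel["x"] on the old Panel.
item_x = panel_like.xs("x", level="item")
print(item_x)
```

Both indexers return 20 here; the point is that the reader of the code no longer has to know the index type to tell which lookup is happening.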
Beyond that, we have some TODOs around fixing a few inconsistencies in the API (groupby relabeling, filter / select, rename / relabel), removing currently deprecated features, and finalizing the concept of .values. We're turning all of pandas' custom dtypes (Interval, Period, Datetime with TZ, Sparse) into proper ExtensionArrays.
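As an illustration of what this work looks like from the user's side, here is a short sketch assuming pandas 0.24 or later, where `Series.array` exists. It checks that Series backed by these dtypes expose an ExtensionArray rather than a plain NumPy array; Categorical, mentioned later in these notes as one of the earlier workarounds, is included for comparison. Printed formatting varies by version, so no output is shown.

```python
import pandas as pd

examples = {
    "period": pd.Series(pd.period_range("2018-01", periods=3, freq="M")),
    "interval": pd.Series(pd.interval_range(start=0, end=3)),
    "sparse": pd.Series(pd.arrays.SparseArray([0, 0, 1])),
    "categorical": pd.Series(["a", "b", "a"], dtype="category"),
}

for name, s in examples.items():
    # .array returns the array backing the Series; for these dtypes it is an
    # ExtensionArray subclass, not a numpy.ndarray.
    backing = s.array
    is_ext = isinstance(backing, pd.api.extensions.ExtensionArray)
    print(f"{name:12} {type(backing).__name__:16} {s.dtype}  ExtensionArray={is_ext}")
```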
Additionally, we're implementing IntegerNA as an extension array, fixing one of the longest-standing complaints about pandas' type system: integer columns cannot hold missing values. We think IntegerNA should be opt-in for pandas 1.0 so that we can gather feedback from users.
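IntegerNA eventually shipped in pandas 0.24 as the nullable integer dtype, spelled "Int64" with a capital I. A minimal sketch, assuming pandas 0.24 or later, contrasting it with the default behaviour:

```python
import pandas as pd

# Default behaviour: a missing value forces integer data to float64 (None becomes NaN).
as_float = pd.Series([1, None, 3])

# Opt-in nullable integer dtype: values stay integers and the gap is a real missing value.
as_int = pd.Series([1, None, 3], dtype="Int64")

print(as_float.dtype, as_int.dtype)
print(as_int.isna().tolist())
print(as_int.sum())  # missing values are skipped by default
```

Because the nullable dtype must be requested explicitly, existing code keeps its float64 behaviour, which is what "opt-in" means here.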
Further built-in extension arrays, e.g. for strings and nested data, will wait for later pandas versions (if they happen at all).
Release 0.24.0 in September 2018.
This will be a relatively normal release with a mix of improvements and bug fixes. There will be an unusually high number of deprecations in 0.24 as we prepare for 1.0.
Release 0.25.0 in December 2018.
This release will not remove any previously deprecated features. The hope is that developers can upgrade to 0.25 with little effort, fix any deprecation warnings, and then upgrade to 1.0 easily.
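One way to do the "fix any warnings" step is to run your code or test suite with deprecation warnings escalated to errors. The sketch below uses only the standard library; `old_api` and `new_api` are hypothetical stand-ins for a deprecated pandas call and its replacement, and it assumes the deprecation is signalled with a FutureWarning (pandas has used both FutureWarning and DeprecationWarning for this).

```python
import warnings


def new_api(x):
    """Hypothetical replacement function."""
    return x * 2


def old_api(x):
    """Hypothetical deprecated function: it still works, but warns on every call."""
    warnings.warn(
        "old_api is deprecated and will be removed in 1.0; use new_api instead",
        FutureWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this function
    )
    return new_api(x)


# Turn FutureWarning into an exception so every deprecated call site fails loudly.
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    try:
        old_api(21)
    except FutureWarning as exc:
        print(f"needs updating: {exc}")
```

The same effect is available without code changes by running the interpreter with `python -W error::FutureWarning`.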
Release 1.0.0 in January 2019.
This release will remove all previously deprecated features. Otherwise, it should be essentially the same as 0.25.0. We want the transition from 0.25.0 to 1.0.0 to be as easy as possible.
Currently, the fact that pandas is built on top of NumPy is exposed directly to the user, most often via .values but in other ways as well. NumPy's memory layout and type system aren't ideal for pandas. Historically, we've hacked in workarounds (e.g. Datetime with TZ, Categorical). More recently, we've standardized these "hacks" with ExtensionDtype and ExtensionArray. This is a confusing state of affairs for users (and developers). It's worth asking whether pandas should take fuller control over its internal data representation.
Fixing all this, however, is a large project. It's not yet clear what a fix would look like, and there's a lot of work to do before we can even get there. A future version of pandas will likely have more ownership over its internal memory, but this isn't a goal for 1.0.
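The NumPy leak through .values is easiest to see with timezone-aware datetimes. Pandas 0.24 added `Series.array` and `Series.to_numpy()` alongside `.values`; the sketch below assumes such a version and shows how each one represents the same data. The timezone and dates are arbitrary choices for illustration.

```python
import pandas as pd

s = pd.Series(pd.date_range("2018-07-01", periods=3, tz="America/Chicago"))

raw = s.values                   # plain NumPy datetime64 array: converted to UTC, timezone dropped
ext = s.array                    # pandas DatetimeArray (an ExtensionArray): timezone kept
objs = s.to_numpy(dtype=object)  # NumPy object array of tz-aware Timestamp objects

print(type(raw).__name__, raw.dtype, raw[0])
print(type(ext).__name__, ext.tz)
print(type(objs).__name__, objs.dtype, objs[0])
```

Only `.array` keeps both the pandas dtype and the timezone without falling back to Python objects, which is the kind of representation the discussion above is about.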
Aside from the pandas 1.0 and beyond discussions, we discussed the maintainer workflow, project governance, and a documentation overhaul.