Pandas Sprint (July, 2018)

Tom Augspurger edited this page Jul 12, 2018 · 5 revisions

Pandas Sprint Update

Some notes on what we discussed and the conclusions we reached.

Towards Pandas 1.0

Version Policy

After pandas 1.0, pandas will adopt semantic versioning. API-breaking changes will be restricted to major releases. New deprecations may be introduced in minor and bugfix releases, but will not be enforced until a later major release.
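As a rough illustration of the policy, a deprecation introduced in a minor release keeps the old behavior working and only emits a warning. The sketch below uses only the standard library; `old_name` and `new_name` are hypothetical functions, not pandas APIs, and the choice of `FutureWarning` is an assumption about how such a deprecation might be surfaced.

```python
import warnings


def new_name(x):
    """Hypothetical replacement that keeps the same behavior."""
    return x * 2


def old_name(x):
    """Hypothetical function deprecated in a minor release.

    It still works, but warns that removal is planned for a later major release.
    """
    warnings.warn(
        "old_name is deprecated and will be removed in a future major "
        "release; use new_name instead.",
        FutureWarning,
        stacklevel=2,
    )
    return new_name(x)


# Record warnings instead of printing them, so the behavior can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_name(21)
```

Under this pattern the call still returns its usual result, and the removal itself would only happen in a later major release.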

Release Cadence

At this time, we don’t have plans to adopt a formal, time-based release schedule like Django’s.

1.0 groundwork

Prior to 1.0, we have a bit of work to do. The biggest changes are:

  1. Removal of Panel.
  2. Removal of .ix.
  3. Possibly dropping Python 2.7 (depends on whether 1.0 happens before or after January 1, 2019).
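For code that still relies on `.ix`, the usual migration is to `.loc` for label-based selection and `.iloc` for position-based selection. A minimal sketch, with a made-up example frame:

```python
import pandas as pd

# Small illustrative frame with a non-integer index.
df = pd.DataFrame(
    {"a": [10, 20, 30], "b": [1.5, 2.5, 3.5]},
    index=["x", "y", "z"],
)

# Label-based selection replaces .ix used with label keys.
by_label = df.loc["y", "b"]

# Position-based selection replaces .ix used with integer positions.
by_position = df.iloc[1, 1]

# Mixed row label / column position: translate the position to a label first.
mixed = df.loc["y", df.columns[1]]
```

All three expressions select the same cell; the difference is that the indexing intent is now explicit.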

Beyond that, we have some TODOs around fixing a few inconsistencies in the API (groupby relabeling, filter / select, rename / relabel), removing currently deprecated things, and finalizing the concept of .values. We're also making all of pandas' custom dtypes (Interval, Period, Datetime with TZ, Sparse) actual ExtensionArrays.

Additionally, we’re implementing IntegerNA as an extension array, fixing one of the longest-standing complaints about pandas’ type system. We think IntegerNA should be optional for pandas 1.0, to receive feedback from users.
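To make the complaint concrete: by default, a single missing value forces an integer column to float. The sketch below contrasts that with an opt-in nullable integer dtype. "IntegerNA" is the working name used in these notes; the example assumes the string alias `"Int64"`, under which a nullable integer dtype is available in pandas 0.24 and later.

```python
import pandas as pd

# Default behavior: the missing value forces the integers to float64.
as_float = pd.Series([1, 2, None])

# Opt-in nullable integer dtype (assumes pandas >= 0.24 and the "Int64" alias).
as_int = pd.Series([1, 2, None], dtype="Int64")

print(as_float.dtype)
print(as_int.dtype)
```

The dtype has to be requested explicitly, which matches the idea of keeping it optional while gathering feedback.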

Further built-in extension arrays for, e.g. Strings and nested data, will wait for later pandas versions (if ever).

Concrete Plan

  • Release 0.24.0 in September 2018.

    This will be a relatively normal release with a mix of improvements and bug fixes. There will be an unusually high number of deprecations in 0.24, as we prepare for 1.0.

  • Release 0.25.0 in December 2018.

    This release will not remove any previously deprecated features. The hope is for developers to upgrade to 0.25 with little effort, fix any warnings, and easily upgrade to 1.0.

  • Release 1.0.0 in January 2019.

    This release will remove all previously deprecated features. Otherwise, it should be essentially the same as 0.25.0. We want transitioning from 0.25 to 1.0.0 to be as easy as possible.
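One way to check that a codebase is ready for the jump from 0.25 to 1.0 is to turn deprecation warnings into errors while running its test suite. The sketch below uses only the standard library; `deprecated_call` and `runs_clean` are hypothetical helpers, and treating both `FutureWarning` and `DeprecationWarning` as errors is an assumption about which categories matter.

```python
import warnings


def deprecated_call():
    """Hypothetical stand-in for any library call that emits a deprecation warning."""
    warnings.warn("this feature is deprecated", FutureWarning, stacklevel=2)
    return "still works"


def runs_clean(func):
    """Return True if func() completes without a FutureWarning or DeprecationWarning."""
    with warnings.catch_warnings():
        # Escalate deprecation-style warnings to exceptions inside this block only.
        warnings.simplefilter("error", FutureWarning)
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func()
        except (FutureWarning, DeprecationWarning):
            return False
    return True


print(runs_clean(deprecated_call))
print(runs_clean(lambda: None))
```

Code that passes this kind of check on the last pre-1.0 release should not depend on anything scheduled for removal.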

Future Pandas

Currently, the fact that pandas is built on top of NumPy is exposed directly to the user, most often via .values but in other ways as well. NumPy's memory layout and type system aren't ideal for pandas. Historically, we've hacked in workarounds (e.g. Datetime with TZ, Categorical). More recently we've standardized these "hacks" with ExtensionDtype and ExtensionArray. This is a confusing state of affairs for users (and developers). It's worth asking whether pandas should take fuller control over its internal data representation.
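A small sketch of the inconsistency, using made-up data: `.values` returns a NumPy array for a plain integer column but a pandas-specific container for a categorical one.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])
cats = pd.Series(["a", "b", "a"], dtype="category")

# For NumPy-backed data, .values hands back the ndarray.
ints_values = ints.values

# For categorical data, .values hands back a pandas Categorical instead.
cats_values = cats.values

print(type(ints_values).__name__)
print(type(cats_values).__name__)
```

Callers therefore cannot rely on `.values` always being an ndarray, which is part of what "finalizing the concept of .values" has to address.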

Fixing all this, however, is a large project. It's not exactly clear what a fix will look like, and there's a lot of work to do before we can even get there. A future version of pandas will likely have more ownership over its internal memory, but this isn't a goal for 1.0.

Other Items

Aside from the pandas 1.0 and beyond discussions, we discussed the maintainer workflow, project governance, and a documentation overhaul.

The documentation will be updated with a new theme and a new structural organization (in addition to the usual content improvements).
