
Pin pandas to latest version 0.22.0 #189

Closed
wants to merge 1 commit into from

Conversation

pyup-bot
Collaborator

pandas is not pinned to a specific version.

I'm pinning it to the latest version 0.22.0 for now.

These links might come in handy: PyPI | Changelog | Homepage

Changelog

0.22.0
======


This is a major release from 0.21.1 and includes a single, API-breaking change.
We recommend that all users upgrade to this version after carefully reading the
release note (singular!).

.. _whatsnew_0220.api_breaking:

Backwards incompatible API changes
----------------------------------

Pandas 0.22.0 changes the handling of empty and all-NA sums and products. The
summary is that

  • The sum of an empty or all-NA Series is now 0
  • The product of an empty or all-NA Series is now 1
  • We've added a min_count parameter to .sum() and .prod() controlling
    the minimum number of valid values for the result to be valid. If fewer than
    min_count non-NA values are present, the result is NA. The default is
    0. To restore the 0.21 behavior of returning NaN, use min_count=1.
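The rules above can be sketched in a few lines (a minimal illustration of the new defaults and the min_count keyword, assuming pandas 0.22+ and numpy are available):

```python
import numpy as np
import pandas as pd

# Empty and all-NA sums now default to 0; products default to 1.
empty_sum = pd.Series([], dtype='float64').sum()
all_na_prod = pd.Series([np.nan]).prod()

# min_count sets how many non-NA values are required for a valid result.
na_sum = pd.Series([np.nan]).sum(min_count=1)     # NaN: 0 valid values < 1
ok_sum = pd.Series([1.0, 2.0]).sum(min_count=2)   # 3.0: 2 valid values >= 2
```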

Some background: In pandas 0.21, we fixed a long-standing inconsistency
in the return value of all-NA series depending on whether or not bottleneck
was installed. See :ref:`whatsnew_0210.api_breaking.bottleneck`. At the same
time, we changed the sum and prod of an empty Series to also be NaN.

Based on feedback, we've partially reverted those changes.

Arithmetic Operations
^^^^^^^^^^^^^^^^^^^^^

The default sum for empty or all-NA Series is now 0.

pandas 0.21.x

.. code-block:: ipython

   In [1]: pd.Series([]).sum()
   Out[1]: nan

   In [2]: pd.Series([np.nan]).sum()
   Out[2]: nan

pandas 0.22.0

.. ipython:: python

   pd.Series([]).sum()
   pd.Series([np.nan]).sum()

The default behavior is the same as pandas 0.20.3 with bottleneck installed. It
also matches the behavior of NumPy's np.nansum on empty and all-NA arrays.

To have the sum of an empty series return NaN (the default behavior of
pandas 0.20.3 without bottleneck, or pandas 0.21.x), use the min_count
keyword.

.. ipython:: python

   pd.Series([]).sum(min_count=1)

Because skipna=True is the default, the .sum of an all-NA
series is conceptually the same as the .sum of an empty one.

.. ipython:: python

   pd.Series([np.nan]).sum(min_count=1)  # skipna=True by default

The min_count parameter refers to the minimum number of non-null values
required for a non-NA sum or product.

:meth:`Series.prod` has been updated to behave the same as :meth:`Series.sum`,
returning 1 instead of NaN.

.. ipython:: python

   pd.Series([]).prod()
   pd.Series([np.nan]).prod()
   pd.Series([]).prod(min_count=1)

These changes affect :meth:`DataFrame.sum` and :meth:`DataFrame.prod` as well.
Finally, a few less obvious places in pandas are affected by this change.

Grouping by a Categorical
^^^^^^^^^^^^^^^^^^^^^^^^^

Grouping by a Categorical and summing now returns 0 instead of
NaN for categories with no observations. The product now returns 1
instead of NaN.

pandas 0.21.x

.. code-block:: ipython

   In [8]: grouper = pd.Categorical(['a', 'a'], categories=['a', 'b'])

   In [9]: pd.Series([1, 2]).groupby(grouper).sum()
   Out[9]:
   a    3.0
   b    NaN
   dtype: float64

pandas 0.22

.. ipython:: python

   grouper = pd.Categorical(['a', 'a'], categories=['a', 'b'])
   pd.Series([1, 2]).groupby(grouper).sum()

To restore the 0.21 behavior of returning NaN for unobserved groups,
use min_count>=1.

.. ipython:: python

   pd.Series([1, 2]).groupby(grouper).sum(min_count=1)

Resample
^^^^^^^^

The sum and product of all-NA bins has changed from NaN to 0 for
sum and 1 for product.

pandas 0.21.x

.. code-block:: ipython

   In [11]: s = pd.Series([1, 1, np.nan, np.nan],
      ...:               index=pd.date_range('2017', periods=4))
      ...: s
   Out[11]:
   2017-01-01    1.0
   2017-01-02    1.0
   2017-01-03    NaN
   2017-01-04    NaN
   Freq: D, dtype: float64

   In [12]: s.resample('2d').sum()
   Out[12]:
   2017-01-01    2.0
   2017-01-03    NaN
   Freq: 2D, dtype: float64

pandas 0.22.0

.. ipython:: python

   s = pd.Series([1, 1, np.nan, np.nan],
                 index=pd.date_range('2017', periods=4))
   s.resample('2d').sum()

To restore the 0.21 behavior of returning NaN, use min_count>=1.

.. ipython:: python

   s.resample('2d').sum(min_count=1)

In particular, upsampling and taking the sum or product is affected, as
upsampling introduces missing values even if the original series was
entirely valid.

pandas 0.21.x

.. code-block:: ipython

   In [14]: idx = pd.DatetimeIndex(['2017-01-01', '2017-01-02'])

   In [15]: pd.Series([1, 2], index=idx).resample('12H').sum()
   Out[15]:
   2017-01-01 00:00:00    1.0
   2017-01-01 12:00:00    NaN
   2017-01-02 00:00:00    2.0
   Freq: 12H, dtype: float64

pandas 0.22.0

.. ipython:: python

   idx = pd.DatetimeIndex(['2017-01-01', '2017-01-02'])
   pd.Series([1, 2], index=idx).resample("12H").sum()

Once again, the min_count keyword is available to restore the 0.21 behavior.

.. ipython:: python

   pd.Series([1, 2], index=idx).resample("12H").sum(min_count=1)

Rolling and Expanding
^^^^^^^^^^^^^^^^^^^^^

Rolling and expanding already have a min_periods keyword that behaves
similarly to min_count. The only case that changes is a rolling
or expanding sum with min_periods=0. Previously this returned NaN
when fewer than min_periods non-NA values were in the window. Now it
returns 0.

pandas 0.21.1

.. code-block:: ipython

   In [17]: s = pd.Series([np.nan, np.nan])

   In [18]: s.rolling(2, min_periods=0).sum()
   Out[18]:
   0   NaN
   1   NaN
   dtype: float64

pandas 0.22.0

.. ipython:: python

   s = pd.Series([np.nan, np.nan])
   s.rolling(2, min_periods=0).sum()

The default behavior of min_periods=None, implying that min_periods
equals the window size, is unchanged.
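As a quick sketch of both cases (assuming pandas 0.22+; only the min_periods=0 result changed):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan])

# With min_periods=0, an all-NA window now sums to 0 (the 0.22 change)
zeros = s.rolling(2, min_periods=0).sum()

# With the default min_periods (the window size), the result is still NaN
default = s.rolling(2).sum()
```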

Compatibility
-------------

If you maintain a library that should work across pandas versions, it
may be easiest to exclude pandas 0.21 from your requirements. Otherwise, all your
sum() calls would need to check if the Series is empty before summing.

With setuptools, in your setup.py use::

    install_requires=['pandas!=0.21.*', ...]

With conda, use:

.. code-block:: yaml

   requirements:
     run:
       - pandas !=0.21.0,!=0.21.1

Note that the inconsistency in the return value for all-NA series is still
there for pandas 0.20.3 and earlier. Avoiding pandas 0.21 will only help with
the empty case.

.. _whatsnew_0211:

0.21.1
======


This is a minor bug-fix release in the 0.21.x series and includes some small regression fixes,
bug fixes and performance improvements.
We recommend that all users upgrade to this version.

Highlights include:

  • Temporarily restore matplotlib datetime plotting functionality. This should
    resolve issues for users who implicitly relied on pandas to plot datetimes
    with matplotlib. See :ref:`here <whatsnew_0211.converters>`.
  • Improvements to the Parquet IO functions introduced in 0.21.0. See
    :ref:`here <whatsnew_0211.enhancements.parquet>`.

.. contents:: What's new in v0.21.1
   :local:
   :backlinks: none

.. _whatsnew_0211.converters:

Restore Matplotlib datetime Converter Registration
--------------------------------------------------

Pandas implements some matplotlib converters for nicely formatting the axis
labels on plots with datetime or Period values. Prior to pandas 0.21.0,
these were implicitly registered with matplotlib, as a side effect of import pandas.

In pandas 0.21.0, we required users to explicitly register the
converter. This caused problems for some users who relied on those converters
being present for regular matplotlib.pyplot plotting methods, so we're
temporarily reverting that change; pandas 0.21.1 again registers the converters on
import, just like before 0.21.0.

We've added a new option to control the converters:
pd.options.plotting.matplotlib.register_converters. By default, they are
registered. Toggling this to False removes pandas' formatters and restores
any converters we overwrote when registering them (:issue:`18301`).

We're working with the matplotlib developers to make this easier. We're trying
to balance user convenience (automatically registering the converters) with
import performance and best practices (importing pandas shouldn't have the side
effect of overwriting any custom converters you've already set). In the future
we hope to have most of the datetime formatting functionality in matplotlib,
with just the pandas-specific converters in pandas. We'll then gracefully
deprecate the automatic registration of converters in favor of users explicitly
registering them when they want them.

.. _whatsnew_0211.enhancements:

New features
------------

.. _whatsnew_0211.enhancements.parquet:

Improvements to the Parquet IO functionality
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  • :meth:`DataFrame.to_parquet` will now write non-default indexes when the
    underlying engine supports it. The indexes will be preserved when reading
    back in with :func:`read_parquet` (:issue:`18581`).
  • :func:`read_parquet` now allows specifying the columns to read from a parquet file (:issue:`18154`)
  • :func:`read_parquet` now allows specifying kwargs which are passed to the respective engine (:issue:`18216`)

.. _whatsnew_0211.enhancements.other:

Other Enhancements
^^^^^^^^^^^^^^^^^^

  • :meth:Timestamp.timestamp is now available in Python 2.7. (:issue:17329)
  • :class:Grouper and :class:TimeGrouper now have a friendly repr output (:issue:18203).

.. _whatsnew_0211.deprecations:

Deprecations
------------

  • pandas.tseries.register has been renamed to
    :func:`pandas.plotting.register_matplotlib_converters` (:issue:`18301`)

.. _whatsnew_0211.performance:

Performance Improvements
------------------------

  • Improved performance of plotting large series/dataframes (:issue:18236).

.. _whatsnew_0211.bug_fixes:

Bug Fixes
---------

Conversion
^^^^^^^^^^

  • Bug in :class:TimedeltaIndex subtraction could incorrectly overflow when NaT is present (:issue:17791)
  • Bug in :class:DatetimeIndex subtracting datetimelike from DatetimeIndex could fail to overflow (:issue:18020)
  • Bug in :meth:`IntervalIndex.copy` when copying an IntervalIndex with non-default closed (:issue:`18339`)
  • Bug in :func:`DataFrame.to_dict` where columns of datetime that are tz-aware were not converted to required arrays when used with orient='records', raising TypeError (:issue:`18372`)
  • Bug in :class:`DatetimeIndex` and :meth:`date_range` where mismatching tz-aware start and end timezones would not raise an error if end.tzinfo is None (:issue:`18431`)
  • Bug in :meth:Series.fillna which raised when passed a long integer on Python 2 (:issue:18159).

Indexing
^^^^^^^^

  • Bug in a boolean comparison of a datetime.datetime and a datetime64[ns] dtype Series (:issue:17965)
  • Bug where a MultiIndex with more than a million records was not raising AttributeError when trying to access a missing attribute (:issue:18165)
  • Bug in :class:IntervalIndex constructor when a list of intervals is passed with non-default closed (:issue:18334)
  • Bug in Index.putmask when an invalid mask passed (:issue:18368)
  • Bug in masked assignment of a timedelta64[ns] dtype Series, incorrectly coerced to float (:issue:18493)

I/O
^^^

  • Bug in :class:`~pandas.io.stata.StataReader` not converting date/time columns with display formatting addressed (:issue:`17990`). Previously columns with display formatting were normally left as ordinal numbers and not converted to datetime objects.
  • Bug in :func:read_csv when reading a compressed UTF-16 encoded file (:issue:18071)
  • Bug in :func:read_csv for handling null values in index columns when specifying na_filter=False (:issue:5239)
  • Bug in :func:read_csv when reading numeric category fields with high cardinality (:issue:18186)
  • Bug in :meth:DataFrame.to_csv when the table had MultiIndex columns, and a list of strings was passed in for header (:issue:5539)
  • Bug in parsing integer datetime-like columns with specified format in read_sql (:issue:17855).
  • Bug in :meth:DataFrame.to_msgpack when serializing data of the numpy.bool_ datatype (:issue:18390)
  • Bug in :func:`read_json` not decoding when reading line delimited JSON from S3 (:issue:`17200`)
  • Bug in :func:pandas.io.json.json_normalize to avoid modification of meta (:issue:18610)
  • Bug in :func:to_latex where repeated multi-index values were not printed even though a higher level index differed from the previous row (:issue:14484)
  • Bug when reading NaN-only categorical columns in :class:HDFStore (:issue:18413)
  • Bug in :meth:DataFrame.to_latex with longtable=True where a latex multicolumn always spanned over three columns (:issue:17959)

Plotting
^^^^^^^^

  • Bug in DataFrame.plot() and Series.plot() with :class:DatetimeIndex where a figure generated by them is not pickleable in Python 3 (:issue:18439)

Groupby/Resample/Rolling
^^^^^^^^^^^^^^^^^^^^^^^^

  • Bug in DataFrame.resample(...).apply(...) when there is a callable that returns different columns (:issue:15169)
  • Bug in DataFrame.resample(...) when there is a time change (DST) and resampling frequency is 12h or higher (:issue:`15549`)
  • Bug in pd.DataFrameGroupBy.count() when counting over a datetimelike column (:issue:13393)
  • Bug in rolling.var where calculation is inaccurate with a zero-valued array (:issue:18430)

Reshaping
^^^^^^^^^

  • Error message in pd.merge_asof() for key datatype mismatch now includes datatype of left and right key (:issue:18068)
  • Bug in pd.concat when empty and non-empty DataFrames or Series are concatenated (:issue:18178 :issue:18187)
  • Bug in DataFrame.filter(...) when :class:unicode is passed as a condition in Python 2 (:issue:13101)
  • Bug when merging empty DataFrames when np.seterr(divide='raise') is set (:issue:17776)

Numeric
^^^^^^^

  • Bug in pd.Series.rolling.skew() and rolling.kurt() with all equal values has floating issue (:issue:18044)

Categorical
^^^^^^^^^^^

  • Bug in :meth:DataFrame.astype where casting to 'category' on an empty DataFrame causes a segmentation fault (:issue:18004)
  • Error messages in the testing module have been improved when items have different CategoricalDtype (:issue:18069)
  • CategoricalIndex can now correctly take a pd.api.types.CategoricalDtype as its dtype (:issue:18116)
  • Bug in Categorical.unique() returning read-only codes array when all categories were NaN (:issue:18051)
  • Bug in DataFrame.groupby(axis=1) with a CategoricalIndex (:issue:18432)

String
^^^^^^

  • :meth:Series.str.split() will now propagate NaN values across all expanded columns instead of None (:issue:18450)

.. _whatsnew_0210:

0.21.0
======


This is a major release from 0.20.3 and includes a number of API changes, deprecations, new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.

Highlights include:

  • Integration with `Apache Parquet <https://parquet.apache.org/>`__, including a new top-level :func:`read_parquet` function and :meth:`DataFrame.to_parquet` method, see :ref:`here <whatsnew_0210.enhancements.parquet>`.
  • New user-facing :class:`pandas.api.types.CategoricalDtype` for specifying
    categoricals independent of the data, see :ref:`here <whatsnew_0210.enhancements.categorical_dtype>`.
  • The behavior of sum and prod on all-NaN Series/DataFrames is now consistent and no longer depends on whether `bottleneck <http://berkeleyanalytics.com/bottleneck>`__ is installed, and sum and prod on empty Series now return NaN instead of 0, see :ref:`here <whatsnew_0210.api_breaking.bottleneck>`.
  • Compatibility fixes for pypy, see :ref:`here <whatsnew_0210.pypy>`.
  • Additions to the drop, reindex and rename API to make them more consistent, see :ref:`here <whatsnew_0210.enhancements.drop_api>`.
  • Addition of the new methods DataFrame.infer_objects (see :ref:`here <whatsnew_0210.enhancements.infer_objects>`) and GroupBy.pipe (see :ref:`here <whatsnew_0210.enhancements.GroupBy_pipe>`).
  • Indexing with a list of labels, where one or more of the labels is missing, is deprecated and will raise a KeyError in a future version, see :ref:`here <whatsnew_0210.api_breaking.loc>`.

Check the :ref:`API Changes <whatsnew_0210.api_breaking>` and :ref:`deprecations <whatsnew_0210.deprecations>` before updating.

.. contents:: What's new in v0.21.0
   :local:
   :backlinks: none
   :depth: 2

.. _whatsnew_0210.enhancements:

New features
------------

.. _whatsnew_0210.enhancements.parquet:

Integration with Apache Parquet file format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Integration with `Apache Parquet <https://parquet.apache.org/>`__, including a new top-level :func:`read_parquet` function and :meth:`DataFrame.to_parquet` method, see :ref:`here <io.parquet>` (:issue:`15838`, :issue:`17438`).

`Apache Parquet <https://parquet.apache.org/>`__ provides a cross-language, binary file format for reading and writing data frames efficiently.
Parquet is designed to faithfully serialize and de-serialize DataFrames, supporting all of the pandas
dtypes, including extension dtypes such as datetime with timezones.

This functionality depends on either the `pyarrow <http://arrow.apache.org/docs/python/>`__ or `fastparquet <https://fastparquet.readthedocs.io/en/latest/>`__ library.
For more details, see :ref:`the IO docs on Parquet <io.parquet>`.

.. _whatsnew_0210.enhancements.infer_objects:

infer_objects type conversion
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :meth:DataFrame.infer_objects and :meth:Series.infer_objects
methods have been added to perform dtype inference on object columns, replacing
some of the functionality of the deprecated convert_objects
method. See the documentation :ref:here <basics.object_conversion>
for more details. (:issue:11221)

This method only performs soft conversions on object columns, converting Python objects
to native types, but not any coercive conversions. For example:

.. ipython:: python

   df = pd.DataFrame({'A': [1, 2, 3],
                      'B': np.array([1, 2, 3], dtype='object'),
                      'C': ['1', '2', '3']})
   df.dtypes
   df.infer_objects().dtypes

Note that column 'C' was not converted - only scalar numeric types
will be converted to a new type. Other types of conversion should be accomplished
using the :func:to_numeric function (or :func:to_datetime, :func:to_timedelta).

.. ipython:: python

   df = df.infer_objects()
   df['C'] = pd.to_numeric(df['C'], errors='coerce')
   df.dtypes

.. _whatsnew_0210.enhancements.attribute_access:

Improved warnings when attempting to create columns
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

New users are often puzzled by the relationship between column operations and
attribute access on DataFrame instances (:issue:7175). One specific
instance of this confusion is attempting to create a new column by setting an
attribute on the DataFrame:

.. code-block:: ipython

   In [1]: df = pd.DataFrame({'one': [1., 2., 3.]})

   In [2]: df.two = [4, 5, 6]

This does not raise any obvious exceptions, but also does not create a new column:

.. code-block:: ipython

   In [3]: df
   Out[3]:
      one
   0  1.0
   1  2.0
   2  3.0

Setting a list-like data structure into a new attribute now raises a UserWarning about the potential for unexpected behavior. See :ref:`Attribute Access <indexing.attribute_access>`.
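A minimal sketch of the distinction (the column names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'one': [1., 2., 3.]})

# Attribute assignment sets a plain Python attribute; no column is created
# (pandas warns about this).
df.two = [4, 5, 6]

# Item assignment is the supported way to add a column.
df['three'] = [7, 8, 9]
```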

.. _whatsnew_0210.enhancements.drop_api:

drop now also accepts index/columns keywords
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :meth:`~DataFrame.drop` method has gained index/columns keywords as an
alternative to specifying the axis. This is similar to the behavior of reindex
(:issue:`12392`).

For example:

.. ipython:: python

   df = pd.DataFrame(np.arange(8).reshape(2, 4),
                     columns=['A', 'B', 'C', 'D'])
   df
   df.drop(['B', 'C'], axis=1)

   # the following is now equivalent
   df.drop(columns=['B', 'C'])

.. _whatsnew_0210.enhancements.rename_reindex_axis:

rename, reindex now also accept axis keyword
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :meth:`DataFrame.rename` and :meth:`DataFrame.reindex` methods have gained
the axis keyword to specify the axis to target with the operation
(:issue:`12392`).

Here's rename:

.. ipython:: python

   df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
   df.rename(str.lower, axis='columns')
   df.rename(id, axis='index')

And reindex:

.. ipython:: python

   df.reindex(['A', 'B', 'C'], axis='columns')
   df.reindex([0, 1, 3], axis='index')

The "index, columns" style continues to work as before.

.. ipython:: python

   df.rename(index=id, columns=str.lower)
   df.reindex(index=[0, 1, 3], columns=['A', 'B', 'C'])

We highly encourage using named arguments to avoid confusion when using either
style.

.. _whatsnew_0210.enhancements.categorical_dtype:

CategoricalDtype for specifying categoricals
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`pandas.api.types.CategoricalDtype` has been added to the public API and
expanded to include the categories and ordered attributes. A
CategoricalDtype can be used to specify the set of categories and
orderedness of an array, independent of the data. This can be useful, for example,
when converting string data to a Categorical (:issue:`14711`,
:issue:`15078`, :issue:`16015`, :issue:`17643`):

.. ipython:: python

   from pandas.api.types import CategoricalDtype

   s = pd.Series(['a', 'b', 'c', 'a'])  # strings
   dtype = CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=True)
   s.astype(dtype)

One place that deserves special mention is in :func:`read_csv`. Previously, with
dtype={'col': 'category'}, the returned values and categories would always
be strings.

.. ipython:: python
   :suppress:

   from pandas.compat import StringIO

.. ipython:: python

   data = 'A,B\na,1\nb,2\nc,3'
   pd.read_csv(StringIO(data), dtype={'B': 'category'}).B.cat.categories

Notice the "object" dtype.

With a CategoricalDtype of all numerics, datetimes, or
timedeltas, we can automatically convert to the correct type

.. ipython:: python

   dtype = {'B': CategoricalDtype([1, 2, 3])}
   pd.read_csv(StringIO(data), dtype=dtype).B.cat.categories

The values have been correctly interpreted as integers.

The .dtype property of a Categorical, CategoricalIndex or a
Series with categorical type will now return an instance of
CategoricalDtype. While the repr has changed, str(CategoricalDtype()) is
still the string 'category'. We'll take this moment to remind users that the
preferred way to detect categorical data is to use
:func:`pandas.api.types.is_categorical_dtype`, and not str(dtype) == 'category'.

See the :ref:`CategoricalDtype docs <categorical.categoricaldtype>` for more.
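A short sketch of the dtype behavior described above (the isinstance check is just one way to observe that .dtype is now a CategoricalDtype instance):

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(['a', 'b', 'a'], dtype='category')

# .dtype now returns a CategoricalDtype instance ...
flag = isinstance(s.dtype, CategoricalDtype)

# ... but its string form is still 'category'
name = str(s.dtype)
```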

.. _whatsnew_0210.enhancements.GroupBy_pipe:

GroupBy objects now have a pipe method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

GroupBy objects now have a pipe method, similar to the one on
DataFrame and Series, that allows functions that take a
GroupBy to be composed in a clean, readable syntax (:issue:`17871`).

For a concrete example on combining .groupby and .pipe, imagine having a
DataFrame with columns for stores, products, revenue and sold quantity. We'd like to
do a groupwise calculation of prices (i.e. revenue/quantity) per store and per product.
We could do this in a multi-step operation, but expressing it in terms of piping can make the
code more readable.

First we set the data:

.. ipython:: python

   import numpy as np

   n = 1000
   df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
                      'Product': np.random.choice(['Product_1',
                                                   'Product_2',
                                                   'Product_3'], n),
                      'Revenue': (np.random.random(n) * 50 + 10).round(2),
                      'Quantity': np.random.randint(1, 10, size=n)})
   df.head(2)

Now, to find prices per store/product, we can simply do:

.. ipython:: python

   (df.groupby(['Store', 'Product'])
      .pipe(lambda grp: grp.Revenue.sum() / grp.Quantity.sum())
      .unstack().round(2))

See the :ref:`documentation <groupby.pipe>` for more.

.. _whatsnew_0210.enhancements.reanme_categories:

Categorical.rename_categories accepts a dict-like
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:meth:`~Series.cat.rename_categories` now accepts a dict-like argument for
new_categories. The previous categories are looked up in the dictionary's
keys and replaced if found. The behavior of missing and extra keys is the same
as in :meth:`DataFrame.rename`.

.. ipython:: python

   c = pd.Categorical(['a', 'a', 'b'])
   c.rename_categories({"a": "eh", "b": "bee"})

.. warning::

   To assist with upgrading pandas, rename_categories treats Series as
   list-like. Typically, Series are considered to be dict-like (e.g. in
   .rename, .map). In a future version of pandas rename_categories
   will change to treat them as dict-like. Follow the warning message's
   recommendations for writing future-proof code.

.. code-block:: ipython

   In [33]: c.rename_categories(pd.Series([0, 1], index=['a', 'c']))
   FutureWarning: Treating Series 'new_categories' as a list-like and using the values.
   In a future version, 'rename_categories' will treat Series like a dictionary.
   For dict-like, use 'new_categories.to_dict()'
   For list-like, use 'new_categories.values'.
   Out[33]:
   [0, 0, 1]
   Categories (2, int64): [0, 1]

.. _whatsnew_0210.enhancements.other:

Other Enhancements
^^^^^^^^^^^^^^^^^^

New functions or methods
""""""""""""""""""""""""

  • :meth:`~pandas.core.resample.Resampler.nearest` is added to support nearest-neighbor upsampling (:issue:`17496`).
  • :class:`~pandas.Index` has added support for a to_frame method (:issue:`15230`).

New keywords
""""""""""""

  • Added a skipna parameter to :func:~pandas.api.types.infer_dtype to
    support type inference in the presence of missing values (:issue:17059).
  • :func:Series.to_dict and :func:DataFrame.to_dict now support an into keyword which allows you to specify the collections.Mapping subclass that you would like returned. The default is dict, which is backwards compatible. (:issue:16122)
  • :func:Series.set_axis and :func:DataFrame.set_axis now support the inplace parameter. (:issue:14636)
  • :func:`Series.to_pickle` and :func:`DataFrame.to_pickle` have gained a protocol parameter (:issue:`16252`). By default, this parameter is set to `HIGHEST_PROTOCOL <https://docs.python.org/3/library/pickle.html#data-stream-format>`__
  • :func:read_feather has gained the nthreads parameter for multi-threaded operations (:issue:16359)
  • :func:DataFrame.clip() and :func:Series.clip() have gained an inplace argument. (:issue:15388)
  • :func:crosstab has gained a margins_name parameter to define the name of the row / column that will contain the totals when margins=True. (:issue:15972)
  • :func:read_json now accepts a chunksize parameter that can be used when lines=True. If chunksize is passed, read_json now returns an iterator which reads in chunksize lines with each iteration. (:issue:17048)
  • :func:read_json and :func:~DataFrame.to_json now accept a compression argument which allows them to transparently handle compressed files. (:issue:17798)
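The into keyword from the list above can be sketched like this (OrderedDict is just one possible Mapping subclass to pass):

```python
from collections import OrderedDict

import pandas as pd

s = pd.Series([1, 2], index=['a', 'b'])

# Default: a plain dict
d = s.to_dict()

# into= selects the Mapping subclass to return
od = s.to_dict(into=OrderedDict)
```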

Various enhancements
""""""""""""""""""""

  • Improved the import time of pandas by about 2.25x. (:issue:16764)
  • Support for `PEP 519 -- Adding a file system path protocol <https://www.python.org/dev/peps/pep-0519/>`_ on most readers (e.g.
    :func:`read_csv`) and writers (e.g. :meth:`DataFrame.to_csv`) (:issue:`13823`).
  • Added a __fspath__ method to pd.HDFStore, pd.ExcelFile,
    and pd.ExcelWriter to work properly with the file system path protocol (:issue:13823).
  • The validate argument for :func:merge now checks whether a merge is one-to-one, one-to-many, many-to-one, or many-to-many. If a merge is found to not be an example of specified merge type, an exception of type MergeError will be raised. For more, see :ref:here <merging.validation> (:issue:16270)
  • Added support for `PEP 518 <https://www.python.org/dev/peps/pep-0518/>`_ (pyproject.toml) to the build system (:issue:`16745`)
  • :func:RangeIndex.append now returns a RangeIndex object when possible (:issue:16212)
  • :func:Series.rename_axis and :func:DataFrame.rename_axis with inplace=True now return None while renaming the axis inplace. (:issue:15704)
  • :func:api.types.infer_dtype now infers decimals. (:issue:15690)
  • :func:DataFrame.select_dtypes now accepts scalar values for include/exclude as well as list-like. (:issue:16855)
  • :func:date_range now accepts 'YS' in addition to 'AS' as an alias for start of year. (:issue:9313)
  • :func:date_range now accepts 'Y' in addition to 'A' as an alias for end of year. (:issue:9313)
  • :func:DataFrame.add_prefix and :func:DataFrame.add_suffix now accept strings containing the '%' character. (:issue:17151)
  • Read/write methods that infer compression (:func:read_csv, :func:read_table, :func:read_pickle, and :meth:~DataFrame.to_pickle) can now infer from path-like objects, such as pathlib.Path. (:issue:17206)
  • :func:read_sas now recognizes much more of the most frequently used date (datetime) formats in SAS7BDAT files. (:issue:15871)
  • :func:`DataFrame.items` and :func:`Series.items` are now present in both Python 2 and 3 and are lazy in all cases. (:issue:`13918`, :issue:`17213`)
  • :meth:pandas.io.formats.style.Styler.where has been implemented as a convenience for :meth:pandas.io.formats.style.Styler.applymap. (:issue:17474)
  • :func:MultiIndex.is_monotonic_decreasing has been implemented. Previously returned False in all cases. (:issue:16554)
  • :func:read_excel raises ImportError with a better message if xlrd is not installed. (:issue:17613)
  • :meth:DataFrame.assign will preserve the original order of **kwargs for Python 3.6+ users instead of sorting the column names. (:issue:14207)
  • :func:Series.reindex, :func:DataFrame.reindex, :func:Index.get_indexer now support list-like argument for tolerance. (:issue:17367)

.. _whatsnew_0210.api_breaking:

Backwards incompatible API changes
----------------------------------

.. _whatsnew_0210.api_breaking.deps:

Dependencies have increased minimum versions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We have updated our minimum supported versions of dependencies (:issue:15206, :issue:15543, :issue:15214).
If installed, we now require:

+--------------+-----------------+----------+
| Package      | Minimum Version | Required |
+==============+=================+==========+
| Numpy        | 1.9.0           | X        |
+--------------+-----------------+----------+
| Matplotlib   | 1.4.3           |          |
+--------------+-----------------+----------+
| Scipy        | 0.14.0          |          |
+--------------+-----------------+----------+
| Bottleneck   | 1.0.0           |          |
+--------------+-----------------+----------+

Additionally, support has been dropped for Python 3.4 (:issue:15251).

.. _whatsnew_0210.api_breaking.bottleneck:

Sum/Prod of all-NaN or empty Series/DataFrames is now consistently NaN
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note::

   The changes described here have been partially reverted. See
   the :ref:`v0.22.0 Whatsnew <whatsnew_0220>` for more.

The behavior of sum and prod on all-NaN Series/DataFrames no longer depends on
whether `bottleneck <http://berkeleyanalytics.com/bottleneck>`__ is installed, and the return value of sum and prod on an empty Series has changed (:issue:`9422`, :issue:`15507`).

Calling sum or prod on an empty or all-NaN Series, or columns of a DataFrame, will result in NaN. See the :ref:`docs <missing_data.numeric_sum>`.

.. ipython:: python

   s = pd.Series([np.nan])

Previously WITHOUT bottleneck installed:

.. code-block:: ipython

   In [2]: s.sum()
   Out[2]: np.nan

Previously WITH bottleneck:

.. code-block:: ipython

   In [2]: s.sum()
   Out[2]: 0.0

New Behavior, without regard to the bottleneck installation:

.. ipython:: python

   s.sum()

Note that this also changes the sum of an empty Series. Previously this always returned 0 regardless of whether bottleneck was installed:

.. code-block:: ipython

   In [1]: pd.Series([]).sum()
   Out[1]: 0

but for consistency with the all-NaN case, this was changed to return NaN as well:

.. ipython:: python

   pd.Series([]).sum()

.. _whatsnew_0210.api_breaking.loc:

Indexing with a list with missing labels is Deprecated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, selecting with a list of labels, where one or more labels were missing, would always succeed, returning NaN for missing labels.
This will now show a FutureWarning. In the future this will raise a KeyError (:issue:`15747`).
This warning will trigger on a DataFrame or a Series for using .loc[] or [[]] when passing a list-of-labels with at least 1 missing label.
See the :ref:`deprecation docs <indexing.deprecate_loc_reindex_listlike>`.

.. ipython:: python

s = pd.Series([1, 2, 3])
s

Previous Behavior

.. code-block:: ipython

In [4]: s.loc[[1, 2, 3]]
Out[4]:
1 2.0
2 3.0
3 NaN
dtype: float64

Current Behavior

.. code-block:: ipython

In [4]: s.loc[[1, 2, 3]]
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike

Out[4]:
1 2.0
2 3.0
3 NaN
dtype: float64

The idiomatic way to select potentially not-found elements is via .reindex():

.. ipython:: python

s.reindex([1, 2, 3])

Selection with all keys found is unchanged.

.. ipython:: python

s.loc[[1, 2]]
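
If you need to keep only the labels that actually exist, intersecting the requested labels with the index avoids both the warning and the NaN fill. A small sketch (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series([1, 2, 3])  # default integer index 0, 1, 2

# keep only the requested labels that are actually present;
# unlike .reindex(), this preserves the original dtype
present = s.index.intersection([1, 2, 3])
result = s.loc[present]  # labels 1 and 2 -> values 2 and 3
```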

.. _whatsnew_0210.api.na_changes:

NA naming Changes
^^^^^^^^^^^^^^^^^

In order to promote more consistency among the pandas API, we have added the top-level
functions :func:isna and :func:notna as aliases for :func:isnull and :func:notnull.
The naming scheme is now more consistent with methods like .dropna() and .fillna(). Furthermore,
wherever the .isnull() and .notnull() methods are defined, they now have counterparts
named .isna() and .notna(); these are included for the classes Categorical,
Index, Series, and DataFrame. (:issue:15001).

The configuration option pd.options.mode.use_inf_as_null is deprecated, and pd.options.mode.use_inf_as_na is added as a replacement.
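
The aliases behave identically to the originals; a quick illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan])

# isna/notna are exact aliases of isnull/notnull
same = s.isna().equals(s.isnull())   # True
flags = pd.isna(s).tolist()          # [False, True]
```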

.. _whatsnew_0210.api_breaking.iteration_scalars:

Iteration of Series/Index will now return Python scalars
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, when using certain iteration methods for a Series with dtype int or float, you would receive a numpy scalar, e.g. a np.int64, rather than a Python int. Issue (:issue:10904) corrected this for Series.tolist() and list(Series). This change makes all iteration methods consistent, in particular, for __iter__() and .map(); note that this only affects int/float dtypes. (:issue:13236, :issue:13258, :issue:14216).

.. ipython:: python

s = pd.Series([1, 2, 3])
s

Previously:

.. code-block:: ipython

In [2]: type(list(s)[0])
Out[2]: numpy.int64

New Behavior:

.. ipython:: python

type(list(s)[0])

Furthermore this will now correctly box the results of iteration for :func:DataFrame.to_dict as well.

.. ipython:: python

d = {'a':[1], 'b':['b']}
df = pd.DataFrame(d)

Previously:

.. code-block:: ipython

In [8]: type(df.to_dict()['a'][0])
Out[8]: numpy.int64

New Behavior:

.. ipython:: python

type(df.to_dict()['a'][0])
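
A quick way to check the boxing on your installation (a sketch; any int-dtype Series works):

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# iteration, .tolist() and .to_dict() now all yield built-in
# Python scalars for int/float dtypes
iter_scalar = list(s)[0]
dict_scalar = pd.DataFrame({'a': [1]}).to_dict()['a'][0]
both_builtin = isinstance(iter_scalar, int) and isinstance(dict_scalar, int)
```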

.. _whatsnew_0210.api_breaking.loc_with_index:

Indexing with a Boolean Index
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously when passing a boolean Index to .loc, if the index of the Series/DataFrame had boolean labels,
you would get a label-based selection, potentially duplicating result labels, rather than a boolean indexing selection
(where True selects elements). This was inconsistent with how a boolean numpy array is indexed. The new behavior is to
act like a boolean numpy array indexer. (:issue:17738)

Previous Behavior:

.. ipython:: python

s = pd.Series([1, 2, 3], index=[False, True, False])
s

.. code-block:: ipython

In [59]: s.loc[pd.Index([True, False, True])]
Out[59]:
True 2
False 1
False 3
True 2
dtype: int64

Current Behavior

.. ipython:: python

s.loc[pd.Index([True, False, True])]

Furthermore, previously if you had an index that was non-numeric (e.g. strings), then a boolean Index would raise a KeyError.
This will now be treated as a boolean indexer.

Previous Behavior:

.. ipython:: python

s = pd.Series([1,2,3], index=['a', 'b', 'c'])
s

.. code-block:: ipython

In [39]: s.loc[pd.Index([True, False, True])]
KeyError: "None of [Index([True, False, True], dtype='object')] are in the [index]"

Current Behavior

.. ipython:: python

s.loc[pd.Index([True, False, True])]

.. _whatsnew_0210.api_breaking.period_index_resampling:

PeriodIndex resampling
^^^^^^^^^^^^^^^^^^^^^^^^^^

In previous versions of pandas, resampling a Series/DataFrame indexed by a PeriodIndex returned a DatetimeIndex in some cases (:issue:12884). Resampling to a multiplied frequency now returns a PeriodIndex (:issue:15944). As a minor enhancement, resampling a PeriodIndex can now handle NaT values (:issue:13224).

Previous Behavior:

.. code-block:: ipython

In [1]: pi = pd.period_range('2017-01', periods=12, freq='M')

In [2]: s = pd.Series(np.arange(12), index=pi)

In [3]: resampled = s.resample('2Q').mean()

In [4]: resampled
Out[4]:
2017-03-31 1.0
2017-09-30 5.5
2018-03-31 10.0
Freq: 2Q-DEC, dtype: float64

In [5]: resampled.index
Out[5]: DatetimeIndex(['2017-03-31', '2017-09-30', '2018-03-31'], dtype='datetime64[ns]', freq='2Q-DEC')

New Behavior:

.. ipython:: python

pi = pd.period_range('2017-01', periods=12, freq='M')

s = pd.Series(np.arange(12), index=pi)

resampled = s.resample('2Q').mean()

resampled

resampled.index

Upsampling and calling .ohlc() previously returned a Series, basically identical to calling .asfreq(). OHLC upsampling now returns a DataFrame with columns open, high, low and close (:issue:13083). This is consistent with downsampling and DatetimeIndex behavior.

Previous Behavior:

.. code-block:: ipython

In [1]: pi = pd.PeriodIndex(start='2000-01-01', freq='D', periods=10)

In [2]: s = pd.Series(np.arange(10), index=pi)

In [3]: s.resample('H').ohlc()
Out[3]:
2000-01-01 00:00 0.0
...
2000-01-10 23:00 NaN
Freq: H, Length: 240, dtype: float64

In [4]: s.resample('M').ohlc()
Out[4]:
open high low close
2000-01 0 9 0 9

New Behavior:

.. ipython:: python

pi = pd.PeriodIndex(start='2000-01-01', freq='D', periods=10)

s = pd.Series(np.arange(10), index=pi)

s.resample('H').ohlc()

s.resample('M').ohlc()

.. _whatsnew_0210.api_breaking.pandas_eval:

Improved error handling during item assignment in pd.eval
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:eval will now raise a ValueError when item assignment malfunctions, or when
inplace operations are specified but there is no item assignment in the expression (:issue:16732)

.. ipython:: python

arr = np.array([1, 2, 3])

Previously, if you attempted the following expression, you would get a not very helpful error message:

.. code-block:: ipython

In [3]: pd.eval("a = 1 + 2", target=arr, inplace=True)
...
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None)
and integer or boolean arrays are valid indices

This is a very long way of saying numpy arrays don't support string-item indexing. With this
change, the error message is now this:

.. code-block:: python

In [3]: pd.eval("a = 1 + 2", target=arr, inplace=True)
...
ValueError: Cannot assign expression output to target

It also used to be possible to evaluate expressions inplace, even if there was no item assignment:

.. code-block:: ipython

In [4]: pd.eval("1 + 2", target=arr, inplace=True)
Out[4]: 3

However, this input does not make much sense because the output is not being assigned to
the target. Now, a ValueError will be raised when such an input is passed in:

.. code-block:: ipython

In [4]: pd.eval("1 + 2", target=arr, inplace=True)
...
ValueError: Cannot operate inplace if there is no assignment
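
Assuming this validation is still present in current pandas releases, the error can be caught like any other ValueError; a small sketch:

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])

# inplace=True without an assignment in the expression now
# raises a ValueError instead of silently returning a value
try:
    pd.eval("1 + 2", target=arr, inplace=True)
    raised = False
except ValueError:
    raised = True
```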

.. _whatsnew_0210.api_breaking.dtype_conversions:

Dtype Conversions
^^^^^^^^^^^^^^^^^

Previously, assignments, .where() and .fillna() with a bool assignment would coerce to the same type (e.g. int / float), or raise for datetimelikes. These will now preserve the bools with object dtype. (:issue:16821).

.. ipython:: python

s = Series([1, 2, 3])

.. code-block:: python

In [5]: s[1] = True

In [6]: s
Out[6]:
0 1
1 1
2 3
dtype: int64

New Behavior

.. ipython:: python

s[1] = True
s

Previously, an assignment to a datetimelike Series with a non-datetimelike value would coerce the
non-datetimelike item being assigned (:issue:14145).

.. ipython:: python

s = pd.Series([pd.Timestamp('2011-01-01'), pd.Timestamp('2012-01-01')])

.. code-block:: python

In [1]: s[1] = 1

In [2]: s
Out[2]:
0 2011-01-01 00:00:00.000000000
1 1970-01-01 00:00:00.000000001
dtype: datetime64[ns]

These now coerce to object dtype.

.. ipython:: python

s[1] = 1
s

  • Inconsistent behavior in .where() with datetimelikes which would raise rather than coerce to object (:issue:16402)
  • Bug in assignment against int64 data with np.ndarray with float64 dtype may keep int64 dtype (:issue:14001)

.. _whatsnew_210.api.multiindex_single:

MultiIndex Constructor with a Single Level
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The MultiIndex constructors no longer squeeze a MultiIndex with all
length-one levels down to a regular Index. This affects all of the
MultiIndex constructors. (:issue:17178)

Previous behavior:

.. code-block:: ipython

In [2]: pd.MultiIndex.from_tuples([('a',), ('b',)])
Out[2]: Index(['a', 'b'], dtype='object')

Length 1 levels are no longer special-cased. They behave exactly as if you had
length 2+ levels, so a :class:MultiIndex is always returned from all of the
MultiIndex constructors:

.. ipython:: python

pd.MultiIndex.from_tuples([('a',), ('b',)])
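
If you relied on the old squeezing behavior, the flat Index can be recovered explicitly; a sketch:

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples([('a',), ('b',)])

# a MultiIndex is now always returned; pull out the single
# level explicitly to recover the old flat Index
flat = mi.get_level_values(0)
```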

.. _whatsnew_0210.api.utc_localization_with_series:

UTC Localization with Series
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, :func:to_datetime did not localize datetime Series data when utc=True was passed. Now, :func:to_datetime will correctly localize Series with a datetime64[ns, UTC] dtype to be consistent with how list-like and Index data are handled. (:issue:6415).

Previous Behavior

.. ipython:: python

s = Series(['20130101 00:00:00'] * 3)

.. code-block:: ipython

In [12]: pd.to_datetime(s, utc=True)
Out[12]:
0 2013-01-01
1 2013-01-01
2 2013-01-01
dtype: datetime64[ns]

New Behavior

.. ipython:: python

pd.to_datetime(s, utc=True)

Additionally, DataFrames with datetime columns that were parsed by :func:read_sql_table and :func:read_sql_query will also be localized to UTC only if the original SQL columns were timezone aware datetime columns.
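
Since the result is now tz-aware, it can be converted to other time zones with .dt.tz_convert; a short sketch (the target zone is illustrative):

```python
import pandas as pd

s = pd.Series(['20130101 00:00:00'] * 3)

# utc=True now localizes Series input, giving a tz-aware dtype
utc = pd.to_datetime(s, utc=True)        # dtype: datetime64[ns, UTC]

# a tz-aware result can then be converted to any other zone
eastern = utc.dt.tz_convert('US/Eastern')
```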

.. _whatsnew_0210.api.consistency_of_range_functions:

Consistency of Range Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In previous versions, there were some inconsistencies between the various range functions: :func:date_range, :func:bdate_range, :func:period_range, :func:timedelta_range, and :func:interval_range. (:issue:17471).

One of the inconsistent behaviors occurred when the start, end and period parameters were all specified, potentially leading to ambiguous ranges. When all three parameters were passed, interval_range ignored the period parameter, period_range ignored the end parameter, and the other range functions raised. To promote consistency among the range functions, and avoid potentially ambiguous ranges, interval_range and period_range will now raise when all three parameters are passed.

Previous Behavior:

.. code-block:: ipython

In [2]: pd.interval_range(start=0, end=4, periods=6)
Out[2]:
IntervalIndex([(0, 1], (1, 2], (2, 3]],
              closed='right',
              dtype='interval[int64]')

In [3]: pd.period_range(start='2017Q1', end='2017Q4', periods=6, freq='Q')
Out[3]: PeriodIndex(['2017Q1', '2017Q2', '2017Q3', '2017Q4', '2018Q1', '2018Q2'], dtype='period[Q-DEC]', freq='Q-DEC')

New Behavior:

.. code-block:: ipython

In [2]: pd.interval_range(start=0, end=4, periods=6)

ValueError: Of the three parameters: start, end, and periods, exactly two must be specified

In [3]: pd.period_range(start='2017Q1', end='2017Q4', periods=6, freq='Q')

ValueError: Of the three parameters: start, end, and periods, exactly two must be specified

Additionally, the endpoint parameter end was not included in the intervals produced by interval_range. However, all other range functions include end in their output. To promote consistency among the range functions, interval_range will now include end as the right endpoint of the final interval, except if freq is specified in a way which skips end.

Previous Behavior:

.. code-block:: ipython

In [4]: pd.interval_range(start=0, end=4)
Out[4]:
IntervalIndex([(0, 1], (1, 2], (2, 3]],
              closed='right',
              dtype='interval[int64]')

New Behavior:

.. ipython:: python

pd.interval_range(start=0, end=4)
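
The exception mentioned above, where freq skips end, can be seen side by side (a sketch):

```python
import pandas as pd

# end=4 is reachable with the default freq, so it is included
# as the right endpoint of the final interval
inc = pd.interval_range(start=0, end=4)           # (0,1], (1,2], (2,3], (3,4]

# with freq=2, end=5 is skipped and the index stops at 4
skip = pd.interval_range(start=0, end=5, freq=2)  # (0,2], (2,4]
```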

.. _whatsnew_0210.api.mpl_converters:

No Automatic Matplotlib Converters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pandas no longer registers our date, time, datetime,
datetime64, and Period converters with matplotlib when pandas is
imported. Matplotlib plot methods (plt.plot, ax.plot, ...) will no longer
nicely format the x-axis for DatetimeIndex or PeriodIndex values. You
must explicitly register these converters:

.. ipython:: python

from pandas.tseries import converter
converter.register()

fig, ax = plt.subplots()
plt.plot(pd.date_range('2017', periods=6), range(6))

Pandas built-in Series.plot and DataFrame.plot will register these
converters on first-use (:issue:17710).

.. _whatsnew_0210.api:

Other API Changes
^^^^^^^^^^^^^^^^^

  • The Categorical constructor no longer accepts a scalar for the categories keyword. (:issue:16022)
  • Accessing a non-existent attribute on a closed :class:~pandas.HDFStore will now
    raise an AttributeError rather than a ClosedFileError (:issue:16301)
  • :func:read_csv now issues a UserWarning if the names parameter contains duplicates (:issue:17095)
  • :func:read_csv now treats 'null' and 'n/a' strings as missing values by default (:issue:16471, :issue:16078)
  • :class:pandas.HDFStore's string representation is now faster and less detailed. For the previous behavior, use pandas.HDFStore.info(). (:issue:16503).
  • Compression defaults in HDF stores now follow pytables standards. Default is no compression and if complib is missing and complevel > 0 zlib is used (:issue:15943)
  • Index.get_indexer_non_unique() now returns a ndarray indexer rather than an Index; this is consistent with Index.get_indexer() (:issue:16819)
  • Removed the slow decorator from pandas.util.testing, which caused issues for some downstream packages' test suites. Use pytest.mark.slow instead, which achieves the same thing (:issue:16850)
  • Moved definition of MergeError to the pandas.errors module.
  • The signature of :func:Series.set_axis and :func:DataFrame.set_axis has been changed from set_axis(axis, labels) to set_axis(labels, axis=0), for consistency with the rest of the API. The old signature is deprecated and will show a FutureWarning (:issue:14636)
  • :func:Series.argmin and :func:Series.argmax will now raise a TypeError when used with object dtypes, instead of a ValueError (:issue:13595)
  • :class:Period is now immutable, and will now raise an AttributeError when a user tries to assign a new value to the ordinal or freq attributes (:issue:17116).
  • :func:to_datetime when passed a tz-aware origin= kwarg will now raise a more informative ValueError rather than a TypeError (:issue:16842)
  • :func:to_datetime now raises a ValueError when format includes %W or %U without also including day of the week and calendar year (:issue:16774)
  • Renamed non-functional index to index_col in :func:read_stata to improve API consistency (:issue:16342)
  • Bug in :func:DataFrame.drop caused boolean labels False and True to be treated as labels 0 and 1 respectively when dropping indices from a numeric index. This will now raise a ValueError (:issue:16877)
  • Restricted DateOffset keyword arguments. Previously, DateOffset subclasses allowed arbitrary keyword arguments which could lead to unexpected behavior. Now, only valid arguments will be accepted. (:issue:17176).

.. _whatsnew_0210.deprecations:

Deprecations
~~~~~~~~~~~~

  • :meth:DataFrame.from_csv and :meth:Series.from_csv have been deprecated in favor of :func:read_csv() (:issue:4191)
  • :func:read_excel() has deprecated sheetname in favor of sheet_name for consistency with .to_excel() (:issue:10559).
  • :func:read_excel() has deprecated parse_cols in favor of usecols for consistency with :func:read_csv (:issue:4988)
  • :func:read_csv() has deprecated the tupleize_cols argument. Column tuples will always be converted to a MultiIndex (:issue:17060)
  • :meth:DataFrame.to_csv has deprecated the tupleize_cols argument. Multi-index columns will be always written as rows in the CSV file (:issue:17060)
  • The convert parameter has been deprecated in the .take() method, as it was not being respected (:issue:16948)
  • pd.options.html.border has been deprecated in favor of pd.options.display.html.border (:issue:15793).
  • :func:SeriesGroupBy.nth has deprecated True in favor of 'all' for its kwarg dropna (:issue:11038).
  • :func:DataFrame.as_blocks is deprecated, as this is exposing the internal implementation (:issue:17302)
  • pd.TimeGrouper is deprecated in favor of :class:pandas.Grouper (:issue:16747)
  • cdate_range has been deprecated in favor of :func:bdate_range, which has gained weekmask and holidays parameters for building custom frequency date ranges. See the :ref:documentation <timeseries.custom-freq-ranges> for more details (:issue:17596)
  • Passing categories or ordered kwargs to :func:Series.astype is deprecated, in favor of passing a :ref:CategoricalDtype <whatsnew_0210.enhancements.categorical_dtype> (:issue:17636)
  • .get_value and .set_value on Series, DataFrame, Panel, SparseSeries, and SparseDataFrame are deprecated in favor of using .iat[] or .at[] accessors (:issue:15269)
  • Passing a non-existent column in .to_excel(..., columns=) is deprecated and will raise a KeyError in the future (:issue:17295)
  • raise_on_error parameter to :func:Series.where, :func:Series.mask, :func:DataFrame.where, :func:DataFrame.mask is deprecated, in favor of errors= (:issue:14968)
  • Using :meth:DataFrame.rename_axis and :meth:Series.rename_axis to alter index or column labels is now deprecated in favor of using .rename. rename_axis may still be used to alter the name of the index or columns (:issue:17833).
  • :meth:~DataFrame.reindex_axis has been deprecated in favor of :meth:~DataFrame.reindex. See :ref:here <whatsnew_0210.enhancements.rename_reindex_axis> for more (:issue:17833).

.. _whatsnew_0210.deprecations.select:

Series.select and DataFrame.select
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :meth:Series.select and :meth:DataFrame.select methods are deprecated in favor of using df.loc[labels.map(crit)] (:issue:12401)

.. ipython:: python

df = DataFrame({'A': [1, 2, 3]}, index=['foo', 'bar', 'baz'])

.. code-block:: ipython

In [3]: df.select(lambda x: x in ['bar', 'baz'])
FutureWarning: select is deprecated and will be removed in a future release. You can use .loc[crit] as a replacement
Out[3]:
A
bar 2
baz 3

.. ipython:: python

df.loc[df.index.map(lambda x: x in ['bar', 'baz'])]

.. _whatsnew_0210.deprecations.argmin_min:

Series.argmax and Series.argmin
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The behavior of :func:Series.argmax and :func:Series.argmin have been deprecated in favor of :func:Series.idxmax and :func:Series.idxmin, respectively (:issue:16830).

For compatibility with NumPy arrays, pd.Series implements argmax and
argmin. Since pandas 0.13.0, argmax has been an alias for
:meth:pandas.Series.idxmax, and argmin has been an alias for
:meth:pandas.Series.idxmin. They return the label of the maximum or minimum,
rather than the position.

We've deprecated the current behavior of Series.argmax and
Series.argmin. Using either of these will emit a FutureWarning. Use
:meth:Series.idxmax if you want the label of the maximum. Use
Series.values.argmax() if you want the position of the maximum. Likewise for
the minimum. In a future release Series.argmax and Series.argmin will
return the position of the maximum or minimum.
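
The distinction between the label-based and position-based calls, written out (a sketch):

```python
import pandas as pd

s = pd.Series([10, 30, 20], index=['a', 'b', 'c'])

label = s.idxmax()            # label of the maximum: 'b'
position = s.values.argmax()  # position of the maximum: 1
```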

.. _whatsnew_0210.prior_deprecations:

Removal of prior version deprecations/changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  • :func:read_excel() has dropped the has_index_names parameter (:issue:10967)
  • The pd.options.display.height configuration has been dropped (:issue:3663)
  • The pd.options.display.line_width configuration has been dropped (:issue:2881)
  • The pd.options.display.mpl_style configuration has been dropped (:issue:12190)
  • Index has dropped the .sym_diff() method in favor of .symmetric_difference() (:issue:12591)
  • Categorical has dropped the .order() and .sort() methods in favor of .sort_values() (:issue:12882)
  • :func:eval and :func:DataFrame.eval have changed the default of inplace from None to False (:issue:11149)
  • The function get_offset_name has been dropped in favor of the .freqstr attribute for an offset (:issue:11834)
  • pandas no longer tests for compatibility with hdf5-files created with pandas < 0.11 (:issue:17404).

.. _whatsnew_0210.performance:

Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~

  • Improved performance of instantiating :class:SparseDataFrame (:issue:16773)
  • :attr:Series.dt no longer performs frequency inference, yielding a large speedup when accessing the attribute (:issue:17210)
  • Improved performance of :meth:~Series.cat.set_categories by not materializing the values (:issue:17508)
  • :attr:Timestamp.microsecond no longer re-computes on attribute access (:issue:17331)
  • Improved performance of the :class:CategoricalIndex for data that is already categorical dtype (:issue:17513)
  • Improved performance of :meth:RangeIndex.min and :meth:RangeIndex.max by using RangeIndex properties to perform the computations (:issue:17607)

.. _whatsnew_0210.docs:

Documentation Changes
~~~~~~~~~~~~~~~~~~~~~

  • Several NaT method docstrings (e.g. :func:NaT.ctime) were incorrect (:issue:17327)
  • The documentation has had references to versions < v0.17 removed and cleaned up (:issue:17442, :issue:17442, :issue:17404 & :issue:17504)

.. _whatsnew_0210.bug_fixes:

Bug Fixes
~~~~~~~~~

Conversion
^^^^^^^^^^

  • Bug in assignment against datetime-like data with int may incorrectly convert to datetime-like (:issue:14145)
  • Bug in assignment against int64 data with np.ndarray with float64 dtype may keep int64 dtype (:issue:14001)
  • Fixed the return type of IntervalIndex.is_non_overlapping_monotonic to be a Python bool for consistency with similar attributes/methods. Previously returned a numpy.bool_. (:issue:17237)
  • Bug in IntervalIndex.is_non_overlapping_monotonic when intervals are closed on both sides and overlap at a point (:issue:16560)
  • Bug in :func:Series.fillna returns frame when inplace=True and value is dict (:issue:16156)
  • Bug in :attr:Timestamp.weekday_name returning a UTC-based weekday name when localized to a timezone (:issue:17354)
  • Bug in Timestamp.replace when replacing tzinfo around DST changes (:issue:15683)
  • Bug in Timedelta construction and arithmetic that would not propagate the Overflow exception (:issue:17367)
  • Bug in :meth:~DataFrame.astype converting to object dtype when passed extension type classes (DatetimeTZDtype, CategoricalDtype) rather than instances. Now a TypeError is raised when a class is passed (:issue:17780).
  • Bug in :meth:to_numeric in which elements were not always being coerced to numeric when errors='coerce' (:issue:17007, :issue:17125)
  • Bug in DataFrame and Series constructors where range objects are converted to int32 dtype on Windows instead of int64 (:issue:16804)

Indexing
^^^^^^^^

  • When called with a null slice (e.g. df.iloc[:]), the .iloc and .loc indexers return a shallow copy of the original object. Previously they returned the original object. (:issue:13873).
  • When called on an unsorted MultiIndex, the loc indexer now will raise UnsortedIndexError only if proper slicing is used on non-sorted levels (:issue:16734).
  • Fixes regression in 0.20.3 when indexing with a string on a TimedeltaIndex (:issue:16896).
  • Fixed :func:TimedeltaIndex.get_loc handling of np.timedelta64 inputs (:issue:16909).
  • Fix :func:MultiIndex.sort_index ordering when ascending argument is a list, but not all levels are specified, or are in a different order (:issue:16934).
  • Fixes bug where indexing with np.inf caused an OverflowError to be raised (:issue:16957)
  • Bug in reindexing on an empty CategoricalIndex (:issue:16770)
  • Fixes DataFrame.loc for setting with alignment and tz-aware DatetimeIndex (:issue:16889)
  • Avoids IndexError when passing an Index or Series to .iloc with older numpy (:issue:17193)
  • Allow unicode empty strings as placeholders in multilevel columns in Python 2 (:issue:17099)
  • Bug in .iloc when used with inplace addition or assignment and an int indexer on a MultiIndex causing the wrong indexes to be read from and written to (:issue:17148)
  • Bug in .isin() in which checking membership in empty Series objects raised an error (:issue:16991)
  • Bug in CategoricalIndex reindexing in which specified indices containing duplicates were not being respected (:issue:17323)
  • Bug in intersection of RangeIndex with negative step (:issue:17296)
  • Bug in IntervalIndex where performing a scalar lookup fails for included right endpoints of non-overlapping monotonic decreasing indexes (:issue:16417, :issue:17271)
  • Bug in :meth:DataFrame.first_valid_index and :meth:DataFrame.last_valid_index when no valid entry (:issue:17400)
  • Bug in :func:Series.rename when called with a callable, incorrectly alters the name of the Series, rather than the name of the Index. (:issue:17407)
  • Bug in :func:String.str_get raises IndexError instead of inserting NaNs when using a negative index. (:issue:17704)

I/O
^^^

  • Bug in :func:read_hdf when reading a timezone aware index from fixed format HDFStore (:issue:17618)
  • Bug in :func:read_csv in which columns were not being thoroughly de-duplicated (:issue:17060)
  • Bug in :func:read_csv in which specified column names were not being thoroughly de-duplicated (:issue:17095)
  • Bug in :func:read_csv in which non integer values for the header argument generated an unhelpful / unrelated error message (:issue:16338)
  • Bug in :func:read_csv in which memory management issues in exception handling, under certain conditions, would cause the interpreter to segfault (:issue:14696, :issue:16798).
  • Bug in :func:read_csv when called with low_memory=False in which a CSV with at least one column > 2GB in size would incorrectly raise a MemoryError (:issue:16798).
  • Bug in :func:read_csv when called with a single-element list header would return a DataFrame of all NaN values (:issue:7757)
  • Bug in :meth:`Data

@coveralls: Coverage remained the same at 92.851% when pulling 0d8c077 on pyup-pin-pandas-0.22.0 into d163527 on develop.

@gasparka gasparka closed this Jun 5, 2018
@gasparka gasparka deleted the pyup-pin-pandas-0.22.0 branch June 26, 2018 12:52