What's new in 2.2.0 (Month XX, 2024)

These are the changes in pandas 2.2.0. See :ref:`release` for a full changelog including other versions of pandas.

Upcoming changes in pandas 3.0

pandas 3.0 will bring two bigger changes to the default behavior of pandas.

Copy-on-Write

The currently optional mode Copy-on-Write will be enabled by default in pandas 3.0. There won't be an option to keep the current behavior enabled. The new behavioral semantics are explained in the :ref:`user guide about Copy-on-Write <copy_on_write>`.

The new behavior can be enabled since pandas 2.0 with the following option:

pd.options.mode.copy_on_write = True

Copy-on-Write changes how pandas handles copies and views in several ways. Some of these changes can be deprecated cleanly, such as the changes to chained assignment. Others are more subtle, so their warnings are hidden behind an option that can be enabled in pandas 2.2:

pd.options.mode.copy_on_write = "warn"

This mode will warn in many different scenarios that aren't actually relevant to most queries. We recommend exploring this mode, but it is not necessary to get rid of all of these warnings. The :ref:`migration guide <copy_on_write.migration_guide>` explains the upgrade process in more detail.
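As a rough illustration of the semantics described above, the following sketch enables Copy-on-Write for a single block and modifies a column selected from a DataFrame. The frame and the variable names are made up for this example; only the ``mode.copy_on_write`` option comes from the text above.

```python
import pandas as pd

# Enable Copy-on-Write only inside this block so global state is left untouched.
with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

    # Under Copy-on-Write, a selected column behaves like a copy of the parent.
    subset = df["foo"]
    subset.iloc[0] = 100  # modifies only `subset`

    parent_values = df["foo"].tolist()
    subset_values = subset.tolist()

print(parent_values)  # the parent DataFrame is unchanged
print(subset_values)  # only the selected Series was updated
```

Without Copy-on-Write, the same assignment could propagate to ``df``, which is the kind of view-dependent behavior the new mode removes.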

Dedicated string data type (backed by Arrow) by default

Historically, pandas represented string columns with NumPy object data type. This representation has numerous problems, including slow performance and a large memory footprint. This will change in pandas 3.0. pandas will start inferring string columns as a new string data type, backed by Arrow, which stores strings contiguously in memory. This brings a huge performance and memory improvement.

Old behavior:

In [1]: pd.Series(["a", "b"])
Out[1]:
0    a
1    b
dtype: object

New behavior:

In [1]: pd.Series(["a", "b"])
Out[1]:
0    a
1    b
dtype: string

The string data type that is used in these scenarios will mostly behave as NumPy object would, including missing value semantics and general operations on these columns.

This change includes a few additional changes across the API:

Currently, specifying dtype="string" creates a dtype that is backed by Python strings which are stored in a NumPy array. This will change in pandas 3.0, where this dtype will create an Arrow-backed string column.
The column names and the Index will also be backed by Arrow strings.
PyArrow will become a required dependency with pandas 3.0 to accommodate this change.

This future dtype inference logic can be enabled with:

pd.options.future.infer_string = True

Enhancements

ADBC Driver support in to_sql and read_sql

:func:`read_sql` and :meth:`~DataFrame.to_sql` now work with Apache Arrow ADBC drivers. Compared to traditional drivers used via SQLAlchemy, ADBC drivers should provide significant performance improvements, better type support and cleaner nullability handling.

import adbc_driver_postgresql.dbapi as pg_dbapi
import pandas as pd

df = pd.DataFrame(
    [
        [1, 2, 3],
        [4, 5, 6],
    ],
    columns=['a', 'b', 'c']
)
uri = "postgresql://postgres:postgres@localhost/postgres"
with pg_dbapi.connect(uri) as conn:
    df.to_sql("pandas_table", conn, index=False)

# for round-tripping
with pg_dbapi.connect(uri) as conn:
    df2 = pd.read_sql("pandas_table", conn)

The Arrow type system offers a wider array of types that can more closely match what databases like PostgreSQL can offer. To illustrate, note this (non-exhaustive) listing of types available in different databases and pandas backends:

numpy/pandas	arrow	postgres	sqlite
int16/Int16	int16	SMALLINT	INTEGER
int32/Int32	int32	INTEGER	INTEGER
int64/Int64	int64	BIGINT	INTEGER
float32	float32	REAL	REAL
float64	float64	DOUBLE PRECISION	REAL
object	string	TEXT	TEXT
bool	`bool_`	BOOLEAN
datetime64[ns]	timestamp(us)	TIMESTAMP
datetime64[ns,tz]	timestamp(us,tz)	TIMESTAMPTZ
	date32	DATE
	month_day_nano_interval	INTERVAL
	binary	BINARY	BLOB
	decimal128	DECIMAL [1]
	list	ARRAY [1]
	struct	COMPOSITE TYPE [1]

Footnotes

[1]	Not implemented as of writing, but theoretically possible

If you want to preserve database types as faithfully as possible throughout the lifecycle of your DataFrame, use the dtype_backend="pyarrow" argument of :func:`~pandas.read_sql`:

# for round-tripping
with pg_dbapi.connect(uri) as conn:
    df2 = pd.read_sql("pandas_table", conn, dtype_backend="pyarrow")

This will prevent your data from being converted to the traditional pandas/NumPy type system, which often converts SQL types in ways that make them impossible to round-trip.

For a full list of ADBC drivers and their development status, see the ADBC Driver Implementation Status documentation.

`to_numpy` for NumPy nullable and Arrow types converts to suitable NumPy dtype

to_numpy now converts NumPy nullable and PyArrow-backed extension dtypes to a suitable NumPy dtype instead of object dtype.

Old behavior:

In [1]: ser = pd.Series([1, 2, 3], dtype="Int64")
In [2]: ser.to_numpy()
Out[2]: array([1, 2, 3], dtype=object)

New behavior:

.. ipython:: python

    ser = pd.Series([1, 2, 3], dtype="Int64")
    ser.to_numpy()

    ser = pd.Series([1, 2, 3], dtype="timestamp[ns][pyarrow]")
    ser.to_numpy()

The default NumPy dtype (without any arguments) is determined as follows:

float dtypes are cast to NumPy floats
integer dtypes without missing values are cast to NumPy integer dtypes
integer dtypes with missing values are cast to NumPy float dtypes and NaN is used as missing value indicator
boolean dtypes without missing values are cast to NumPy bool dtype
boolean dtypes with missing values keep object dtype
datetime and timedelta types are cast to NumPy datetime64 and timedelta64 types respectively and NaT is used as missing value indicator

Series.struct accessor for PyArrow structured data

The Series.struct accessor provides attributes and methods for processing data with struct[pyarrow] dtype Series. For example, :meth:`Series.struct.explode` converts PyArrow structured data to a pandas DataFrame. (:issue:`54938`)

.. ipython:: python

    import pyarrow as pa
    series = pd.Series(
        [
            {"project": "pandas", "version": "2.2.0"},
            {"project": "numpy", "version": "1.25.2"},
            {"project": "pyarrow", "version": "13.0.0"},
        ],
        dtype=pd.ArrowDtype(
            pa.struct([
                ("project", pa.string()),
                ("version", pa.string()),
            ])
        ),
    )
    series.struct.explode()

Use :meth:`Series.struct.field` to index into a (possibly nested) struct field.

.. ipython:: python

    series.struct.field("project")

Series.list accessor for PyArrow list data

The Series.list accessor provides attributes and methods for processing data with list[pyarrow] dtype Series. For example, :meth:`Series.list.__getitem__` allows indexing pyarrow lists in a Series. (:issue:`55323`)

.. ipython:: python

    import pyarrow as pa
    series = pd.Series(
        [
            [1, 2, 3],
            [4, 5],
            [6],
        ],
        dtype=pd.ArrowDtype(
            pa.list_(pa.int64())
        ),
    )
    series.list[0]

Calamine engine for :func:`read_excel`

The calamine engine was added to :func:`read_excel`. It uses python-calamine, which provides Python bindings for the Rust library calamine. This engine supports Excel files (.xlsx, .xlsm, .xls, .xlsb) and OpenDocument spreadsheets (.ods) (:issue:`50395`).

There are two advantages of this engine:

Calamine is often faster than other engines; some benchmarks show it up to 5x faster than 'openpyxl', 20x faster than 'odf', 4x faster than 'pyxlsb', and 1.5x faster than 'xlrd'. However, 'openpyxl' and 'pyxlsb' are faster when reading only a few rows from large files, because they iterate over rows lazily.
Calamine supports the recognition of datetime in .xlsb files, unlike 'pyxlsb' which is the only other engine in pandas that can read .xlsb files.

pd.read_excel("path_to_file.xlsb", engine="calamine")

For more, see :ref:`io.calamine` in the user guide on IO tools.

Other enhancements

:meth:`~DataFrame.to_sql` with the method parameter set to "multi" now works with Oracle on the backend
:attr:`Series.attrs` / :attr:`DataFrame.attrs` now use a deepcopy for propagating attrs (:issue:`54134`).
:func:`get_dummies` now returns extension dtypes boolean or bool[pyarrow] that are compatible with the input dtype (:issue:`56273`)
:func:`read_csv` now supports on_bad_lines parameter with engine="pyarrow" (:issue:`54480`)
:func:`read_sas` returns datetime64 dtypes with resolutions better matching those stored natively in SAS, and avoids returning object-dtype in cases that cannot be stored with datetime64[ns] dtype (:issue:`56127`)
:func:`read_spss` now returns a :class:`DataFrame` that stores the metadata in :attr:`DataFrame.attrs` (:issue:`54264`)
:func:`tseries.api.guess_datetime_format` is now part of the public API (:issue:`54727`)
:meth:`DataFrame.apply` now allows the usage of numba (via engine="numba") to JIT compile the passed function, allowing for potential speedups (:issue:`54666`)
:meth:`ExtensionArray._explode` interface method added to allow extension type implementations of the explode method (:issue:`54833`)
:meth:`ExtensionArray.duplicated` added to allow extension type implementations of the duplicated method (:issue:`55255`)
:meth:`Series.ffill`, :meth:`Series.bfill`, :meth:`DataFrame.ffill`, and :meth:`DataFrame.bfill` have gained the argument limit_area; 3rd party :class:`.ExtensionArray` authors need to add this argument to the method _pad_or_backfill (:issue:`56492`)
Allowed passing read_only, data_only and keep_links arguments to openpyxl using engine_kwargs of :func:`read_excel` (:issue:`55027`)
Implemented masked algorithms for :meth:`Series.value_counts` (:issue:`54984`)
Implemented :meth:`Series.dt` methods and attributes for :class:`ArrowDtype` with pyarrow.duration type (:issue:`52284`)
Implemented :meth:`Series.str.extract` for :class:`ArrowDtype` (:issue:`56268`)
Improved error message that appears in :meth:`DatetimeIndex.to_period` with frequencies which are not supported as period frequencies, such as "BMS" (:issue:`56243`)
Improved error message when constructing :class:`Period` with invalid offsets such as "QS" (:issue:`55785`)
The dtypes string[pyarrow] and string[pyarrow_numpy] now both utilize the large_string type from PyArrow to avoid overflow for long columns (:issue:`56259`)

Notable bug fixes

These are bug fixes that might have notable behavior changes.

:func:`merge` and :meth:`DataFrame.join` now consistently follow documented sort behavior

In previous versions of pandas, :func:`merge` and :meth:`DataFrame.join` did not always return a result that followed the documented sort behavior. pandas now follows the documented sort behavior in merge and join operations (:issue:`54611`, :issue:`56426`, :issue:`56443`).

As documented, sort=True sorts the join keys lexicographically in the resulting :class:`DataFrame`. With sort=False, the order of the join keys depends on the join type (how keyword):

how="left": preserve the order of the left keys
how="right": preserve the order of the right keys
how="inner": preserve the order of the left keys
how="outer": sort keys lexicographically
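The ordering rules listed above can be explored with a small sketch. The two frames and the ``key_order`` dictionary are invented for this illustration, and the printed orders are what the documented behavior implies on pandas 2.2 or later.

```python
import pandas as pd

# Two small frames whose join keys are deliberately unsorted.
left = pd.DataFrame({"key": ["b", "a", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["c", "b", "d"], "rval": [10, 20, 30]})

# Collect the order of the join keys for each join type with sort=False.
key_order = {
    how: pd.merge(left, right, how=how, on="key", sort=False)["key"].tolist()
    for how in ["left", "right", "inner", "outer"]
}

for how, keys in key_order.items():
    print(f"{how:>5}: {keys}")
```

Passing ``sort=True`` instead sorts the join keys lexicographically for every join type.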

One example with changing behavior is inner joins with non-unique left join keys and sort=False:

.. ipython:: python

    left = pd.DataFrame({"a": [1, 2, 1]})
    right = pd.DataFrame({"a": [1, 2]})
    result = pd.merge(left, right, how="inner", on="a", sort=False)

Old Behavior

In [5]: result
Out[5]:
   a
0  1
1  1
2  2

New Behavior

.. ipython:: python

    result

:func:`merge` and :meth:`DataFrame.join` no longer reorder levels when levels differ

In previous versions of pandas, :func:`merge` and :meth:`DataFrame.join` would reorder index levels when joining on two indexes with different levels (:issue:`34133`).

.. ipython:: python

    left = pd.DataFrame({"left": 1}, index=pd.MultiIndex.from_tuples([("x", 1), ("x", 2)], names=["A", "B"]))
    right = pd.DataFrame({"right": 2}, index=pd.MultiIndex.from_tuples([(1, 1), (2, 2)], names=["B", "C"]))
    left
    right
    result = left.join(right)

Old Behavior

In [5]: result
Out[5]:
       left  right
B A C
1 x 1     1      2
2 x 2     1      2

New Behavior

.. ipython:: python

    result

Backwards incompatible API changes

Increased minimum versions for dependencies

For optional dependencies the general recommendation is to use the latest version. Optional dependencies below the lowest tested version may still work but are not considered supported. The following table lists the optional dependencies that have had their minimum tested version increased.

Package	New Minimum Version
beautifulsoup4	4.11.2
blosc	1.21.3
bottleneck	1.3.6
fastparquet	2022.12.0
fsspec	2022.11.0
gcsfs	2022.11.0
lxml	4.9.2
matplotlib	3.6.3
numba	0.56.4
numexpr	2.8.4
qtpy	2.3.0
openpyxl	3.1.0
psycopg2	2.9.6
pyreadstat	1.2.0
pytables	3.8.0
pyxlsb	1.0.10
s3fs	2022.11.0
scipy	1.10.0
sqlalchemy	2.0.0
tabulate	0.9.0
xarray	2022.12.0
xlsxwriter	3.0.5
zstandard	0.19.0
pyqt5	5.15.8
tzdata	2022.7

See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.

Other API changes

The hash values of nullable extension dtypes changed to improve the performance of the hashing operation (:issue:`56507`)
check_exact now only takes effect for floating-point dtypes in :func:`testing.assert_frame_equal` and :func:`testing.assert_series_equal`. In particular, integer dtypes are always checked exactly (:issue:`55882`)

Deprecations

Chained assignment

In preparation for larger upcoming changes to the copy / view behaviour in pandas 3.0 (:ref:`copy_on_write`, PDEP-7), we started deprecating chained assignment.

Chained assignment occurs when you try to update a pandas DataFrame or Series through two subsequent indexing operations. Depending on the type and order of those operations, this currently does or does not work.

A typical example is as follows:

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

# first selecting rows with a mask, then assigning values to a column
# -> this has never worked and raises a SettingWithCopyWarning
df[df["bar"] > 5]["foo"] = 100

# first selecting the column, and then assigning to a subset of that column
# -> this currently works
df["foo"][df["bar"] > 5] = 100

This second example of chained assignment currently works to update the original df. This will no longer work in pandas 3.0, and therefore we started deprecating this:

>>> df["foo"][df["bar"] > 5] = 100
FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

You can fix this warning and ensure your code is ready for pandas 3.0 by removing the usage of chained assignment. Typically, this can be done by doing the assignment in a single step using for example .loc. For the example above, we can do:

df.loc[df["bar"] > 5, "foo"] = 100

The same deprecation applies to inplace methods that are done in a chained manner, such as:

>>> df["foo"].fillna(0, inplace=True)
FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.

When the goal is to update the column in the DataFrame df, the alternative here is to call the method on df itself, such as df.fillna({"foo": 0}, inplace=True).
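A minimal sketch of both alternatives named in the warning message, using an example frame invented here (the fill values are arbitrary):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"foo": [1.0, np.nan, 3.0], "bar": [4.0, 5.0, np.nan]})

# Instead of df["foo"].fillna(0, inplace=True), call the method on df itself
# and restrict it to the column with a dict.
df.fillna({"foo": 0}, inplace=True)

# Alternatively, compute the filled column and assign it back.
df["bar"] = df["bar"].fillna(-1)

print(df)
```

Both forms update ``df`` in a single step, so they keep working once the intermediate ``df["foo"]`` always behaves as a copy.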

See more details in the :ref:`migration guide <copy_on_write.migration_guide>`.

Deprecate aliases `M`, `Q`, `Y`, etc. in favour of `ME`, `QE`, `YE`, etc. for offsets

Deprecated the following frequency aliases (:issue:`9586`):

offsets	deprecated aliases	new aliases
:class:`MonthEnd`	`M`	`ME`
:class:`BusinessMonthEnd`	`BM`	`BME`
:class:`SemiMonthEnd`	`SM`	`SME`
:class:`CustomBusinessMonthEnd`	`CBM`	`CBME`
:class:`QuarterEnd`	`Q`	`QE`
:class:`BQuarterEnd`	`BQ`	`BQE`
:class:`YearEnd`	`Y`	`YE`
:class:`BYearEnd`	`BY`	`BYE`

For example:

Previous behavior:

In [8]: pd.date_range('2020-01-01', periods=3, freq='Q-NOV')
Out[8]:
DatetimeIndex(['2020-02-29', '2020-05-31', '2020-08-31'],
              dtype='datetime64[ns]', freq='Q-NOV')

Future behavior:

.. ipython:: python

    pd.date_range('2020-01-01', periods=3, freq='QE-NOV')

Deprecated automatic downcasting

Deprecated the automatic downcasting of object dtype results in a number of methods. These would silently change the dtype in a hard-to-predict manner, since the behavior was value dependent. Additionally, pandas is moving away from silent dtype changes (:issue:`54710`, :issue:`54261`).

These methods are:

:meth:`Series.replace` and :meth:`DataFrame.replace`
:meth:`DataFrame.fillna`, :meth:`Series.fillna`
:meth:`DataFrame.ffill`, :meth:`Series.ffill`
:meth:`DataFrame.bfill`, :meth:`Series.bfill`
:meth:`DataFrame.mask`, :meth:`Series.mask`
:meth:`DataFrame.where`, :meth:`Series.where`
:meth:`DataFrame.clip`, :meth:`Series.clip`

Explicitly call :meth:`DataFrame.infer_objects` to replicate the current behavior in the future.

result = result.infer_objects(copy=False)

Or explicitly cast all-round floats to ints using astype.
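Both alternatives can be sketched as follows. The Series are made up for illustration; note that on pandas 2.2 the ``fillna`` call itself may still emit the deprecation warning unless the future behavior is opted into.

```python
import pandas as pd

# An object-dtype Series whose values become all-boolean after filling.
ser = pd.Series([True, None], dtype=object)

# Request the dtype inference explicitly instead of relying on
# silent downcasting inside fillna.
filled = ser.fillna(False).infer_objects()
print(filled.dtype)

# For floats that are all round numbers, cast explicitly with astype.
floats = pd.Series([1.0, 2.0, 3.0])
as_ints = floats.astype("int64")
print(as_ints.dtype)
```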

Set the following option to opt into the future behavior:

In [9]: pd.set_option("future.no_silent_downcasting", True)

Other Deprecations

Changed :meth:`Timedelta.resolution_string` to return h, min, s, ms, us, and ns instead of H, T, S, L, U, and N, for compatibility with respective deprecations in frequency aliases (:issue:`52536`)
Deprecated :attr:`offsets.Day.delta`, :attr:`offsets.Hour.delta`, :attr:`offsets.Minute.delta`, :attr:`offsets.Second.delta`, :attr:`offsets.Milli.delta`, :attr:`offsets.Micro.delta`, :attr:`offsets.Nano.delta`, use pd.Timedelta(obj) instead (:issue:`55498`)
Deprecated :func:`pandas.api.types.is_interval` and :func:`pandas.api.types.is_period`, use isinstance(obj, pd.Interval) and isinstance(obj, pd.Period) instead (:issue:`55264`)
Deprecated :func:`pd.core.internals.api.make_block`, use public APIs instead (:issue:`40226`)
Deprecated :func:`read_gbq` and :meth:`DataFrame.to_gbq`. Use pandas_gbq.read_gbq and pandas_gbq.to_gbq instead; see https://pandas-gbq.readthedocs.io/en/latest/api.html (:issue:`55525`)
Deprecated :meth:`.DataFrameGroupBy.fillna` and :meth:`.SeriesGroupBy.fillna`; use :meth:`.DataFrameGroupBy.ffill`, :meth:`.DataFrameGroupBy.bfill` for forward and backward filling or :meth:`.DataFrame.fillna` to fill with a single value (or the Series equivalents) (:issue:`55718`)
Deprecated :meth:`DatetimeArray.__init__` and :meth:`TimedeltaArray.__init__`, use :func:`array` instead (:issue:`55623`)
Deprecated :meth:`Index.format`, use index.astype(str) or index.map(formatter) instead (:issue:`55413`)
Deprecated :meth:`Series.ravel`, the underlying array is already 1D, so ravel is not necessary (:issue:`52511`)
Deprecated :meth:`Series.resample` and :meth:`DataFrame.resample` with a :class:`PeriodIndex` (and the 'convention' keyword), convert to :class:`DatetimeIndex` (with .to_timestamp()) before resampling instead (:issue:`53481`)
Deprecated :meth:`Series.view`, use :meth:`Series.astype` instead to change the dtype (:issue:`20251`)
Deprecated core.internals members Block, ExtensionBlock, and DatetimeTZBlock, use public APIs instead (:issue:`55139`)
Deprecated year, month, quarter, day, hour, minute, and second keywords in the :class:`PeriodIndex` constructor, use :meth:`PeriodIndex.from_fields` instead (:issue:`55960`)
Deprecated accepting a type as an argument in :meth:`Index.view`, call without any arguments instead (:issue:`55709`)
Deprecated allowing non-integer periods argument in :func:`date_range`, :func:`timedelta_range`, :func:`period_range`, and :func:`interval_range` (:issue:`56036`)
Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_clipboard` (:issue:`54229`)
Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_csv` except path_or_buf (:issue:`54229`)
Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_dict` (:issue:`54229`)
Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_excel` except excel_writer (:issue:`54229`)
Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_gbq` except destination_table (:issue:`54229`)
Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_hdf` except path_or_buf (:issue:`54229`)
Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_html` except buf (:issue:`54229`)
Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_json` except path_or_buf (:issue:`54229`)
Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_latex` except buf (:issue:`54229`)
Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_markdown` except buf (:issue:`54229`)
Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_parquet` except path (:issue:`54229`)
Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_pickle` except path (:issue:`54229`)
Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_string` except buf (:issue:`54229`)
Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_xml` except path_or_buffer (:issue:`54229`)
Deprecated allowing passing :class:`BlockManager` objects to :class:`DataFrame` or :class:`SingleBlockManager` objects to :class:`Series` (:issue:`52419`)
Deprecated behavior of :meth:`Index.insert` with an object-dtype index silently performing type inference on the result, explicitly call result.infer_objects(copy=False) for the old behavior instead (:issue:`51363`)
Deprecated casting non-datetimelike values (mainly strings) in :meth:`Series.isin` and :meth:`Index.isin` with datetime64, timedelta64, and :class:`PeriodDtype` dtypes (:issue:`53111`)
Deprecated dtype inference in :class:`Index`, :class:`Series` and :class:`DataFrame` constructors when giving a pandas input, call .infer_objects on the input to keep the current behavior (:issue:`56012`)
Deprecated dtype inference when setting a :class:`Index` into a :class:`DataFrame`, cast explicitly instead (:issue:`56102`)
Deprecated including the groups in computations when using :meth:`.DataFrameGroupBy.apply` and :meth:`.DataFrameGroupBy.resample`; pass include_groups=False to exclude the groups (:issue:`7155`)
Deprecated indexing an :class:`Index` with a boolean indexer of length zero (:issue:`55820`)
Deprecated not passing a tuple to :class:`.DataFrameGroupBy.get_group` or :class:`.SeriesGroupBy.get_group` when grouping by a length-1 list-like (:issue:`25971`)
Deprecated string AS denoting frequency in :class:`YearBegin` and strings AS-DEC, AS-JAN, etc. denoting annual frequencies with various fiscal year starts (:issue:`54275`)
Deprecated string A denoting frequency in :class:`YearEnd` and strings A-DEC, A-JAN, etc. denoting annual frequencies with various fiscal year ends (:issue:`54275`)
Deprecated string BAS denoting frequency in :class:`BYearBegin` and strings BAS-DEC, BAS-JAN, etc. denoting annual frequencies with various fiscal year starts (:issue:`54275`)
Deprecated string BA denoting frequency in :class:`BYearEnd` and strings BA-DEC, BA-JAN, etc. denoting annual frequencies with various fiscal year ends (:issue:`54275`)
Deprecated strings H, BH, and CBH denoting frequencies in :class:`Hour`, :class:`BusinessHour`, :class:`CustomBusinessHour` (:issue:`52536`)
Deprecated strings H, S, U, and N denoting units in :func:`to_timedelta` (:issue:`52536`)
Deprecated strings H, T, S, L, U, and N denoting units in :class:`Timedelta` (:issue:`52536`)
Deprecated strings T, S, L, U, and N denoting frequencies in :class:`Minute`, :class:`Second`, :class:`Milli`, :class:`Micro`, :class:`Nano` (:issue:`52536`)
Deprecated support for combining parsed datetime columns in :func:`read_csv` along with the keep_date_col keyword (:issue:`55569`)
Deprecated the :attr:`.DataFrameGroupBy.grouper` and :attr:`SeriesGroupBy.grouper`; these attributes will be removed in a future version of pandas (:issue:`56521`)
Deprecated the :class:`.Grouping` attributes group_index, result_index, and group_arraylike; these will be removed in a future version of pandas (:issue:`56148`)
Deprecated the delim_whitespace keyword in :func:`read_csv` and :func:`read_table`, use sep="\\s+" instead (:issue:`55569`)
Deprecated the errors="ignore" option in :func:`to_datetime`, :func:`to_timedelta`, and :func:`to_numeric`; explicitly catch exceptions instead (:issue:`54467`)
Deprecated the fastpath keyword in the :class:`Series` constructor (:issue:`20110`)
Deprecated the kind keyword in :meth:`Series.resample` and :meth:`DataFrame.resample`, explicitly cast the object's index instead (:issue:`55895`)
Deprecated the ordinal keyword in :class:`PeriodIndex`, use :meth:`PeriodIndex.from_ordinals` instead (:issue:`55960`)
Deprecated the unit keyword in :class:`TimedeltaIndex` construction, use :func:`to_timedelta` instead (:issue:`55499`)
Deprecated the verbose keyword in :func:`read_csv` and :func:`read_table` (:issue:`55569`)
Deprecated the behavior of :meth:`DataFrame.replace` and :meth:`Series.replace` with :class:`CategoricalDtype`; in a future version replace will change the values while preserving the categories. To change the categories, use ser.cat.rename_categories instead (:issue:`55147`)
Deprecated the behavior of :meth:`Series.value_counts` and :meth:`Index.value_counts` with object dtype; in a future version these will not perform dtype inference on the resulting :class:`Index`, do result.index = result.index.infer_objects() to retain the old behavior (:issue:`56161`)
Deprecated the default of observed=False in :meth:`DataFrame.pivot_table`; will be True in a future version (:issue:`56236`)
Deprecated the extension test classes BaseNoReduceTests, BaseBooleanReduceTests, and BaseNumericReduceTests, use BaseReduceTests instead (:issue:`54663`)
Deprecated the option mode.data_manager and the ArrayManager; only the BlockManager will be available in future versions (:issue:`55043`)
Deprecated the previous implementation of :meth:`DataFrame.stack`; specify future_stack=True to adopt the future version (:issue:`53515`)
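A few of the replacements suggested in the list above can be sketched as follows. The input data and the helper ``to_numeric_or_original`` are hypothetical and exist only for this illustration; the helper is one possible way to catch the exception explicitly, not an API provided by pandas.

```python
import io

import pandas as pd

# delim_whitespace=True  ->  sep=r"\s+"
df = pd.read_csv(io.StringIO("a b\n1 2\n3 4"), sep=r"\s+")

# offsets.Hour(2).delta  ->  pd.Timedelta(obj)
delta = pd.Timedelta(pd.offsets.Hour(2))

# pandas.api.types.is_interval(obj)  ->  isinstance(obj, pd.Interval)
is_interval = isinstance(pd.Interval(0, 1), pd.Interval)


# to_numeric(..., errors="ignore")  ->  catch the exception explicitly
def to_numeric_or_original(values):
    """Hypothetical helper: return the input unchanged if parsing fails."""
    try:
        return pd.to_numeric(values)
    except (ValueError, TypeError):
        return values


unchanged = to_numeric_or_original(["1", "x"])  # parsing fails, input returned
converted = to_numeric_or_original(["1", "2"])  # parsing succeeds

print(df)
print(delta, is_interval, unchanged, converted)
```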

Performance improvements

Performance improvement in :func:`.testing.assert_frame_equal` and :func:`.testing.assert_series_equal` (:issue:`55949`, :issue:`55971`)
Performance improvement in :func:`concat` with axis=1 and objects with unaligned indexes (:issue:`55084`)
Performance improvement in :func:`get_dummies` (:issue:`56089`)
Performance improvement in :func:`merge` and :func:`merge_ordered` when joining on sorted ascending keys (:issue:`56115`)
Performance improvement in :func:`merge_asof` when by is not None (:issue:`55580`, :issue:`55678`)
Performance improvement in :func:`read_stata` for files with many variables (:issue:`55515`)
Performance improvement in :meth:`DataFrame.groupby` when aggregating pyarrow timestamp and duration dtypes (:issue:`55031`)
Performance improvement in :meth:`DataFrame.join` when joining on unordered categorical indexes (:issue:`56345`)
Performance improvement in :meth:`DataFrame.loc` and :meth:`Series.loc` when indexing with a :class:`MultiIndex` (:issue:`56062`)
Performance improvement in :meth:`DataFrame.sort_index` and :meth:`Series.sort_index` when indexed by a :class:`MultiIndex` (:issue:`54835`)
Performance improvement in :meth:`DataFrame.to_dict` on converting DataFrame to dictionary (:issue:`50990`)
Performance improvement in :meth:`Index.difference` (:issue:`55108`)
Performance improvement in :meth:`Index.sort_values` when index is already sorted (:issue:`56128`)
Performance improvement in :meth:`MultiIndex.get_indexer` when method is not None (:issue:`55839`)
Performance improvement in :meth:`Series.duplicated` for pyarrow dtypes (:issue:`55255`)
Performance improvement in :meth:`Series.str.get_dummies` when dtype is "string[pyarrow]" or "string[pyarrow_numpy]" (:issue:`56110`)
Performance improvement in :meth:`Series.str` methods (:issue:`55736`)
Performance improvement in :meth:`Series.value_counts` and :meth:`Series.mode` for masked dtypes (:issue:`54984`, :issue:`55340`)
Performance improvement in :meth:`.DataFrameGroupBy.nunique` and :meth:`.SeriesGroupBy.nunique` (:issue:`55972`)
Performance improvement in :meth:`.SeriesGroupBy.idxmax`, :meth:`.SeriesGroupBy.idxmin`, :meth:`.DataFrameGroupBy.idxmax`, :meth:`.DataFrameGroupBy.idxmin` (:issue:`54234`)
Performance improvement when hashing a nullable extension array (:issue:`56507`)
Performance improvement when indexing into a non-unique index (:issue:`55816`)
Performance improvement when indexing with more than 4 keys (:issue:`54550`)
Performance improvement when localizing time to UTC (:issue:`55241`)

Bug fixes

Categorical

Bug in :meth:`Categorical.isin` raising InvalidIndexError for categorical containing overlapping :class:`Interval` values (:issue:`34974`)
Bug in :meth:`CategoricalDtype.__eq__` returning False for unordered categorical data with mixed types (:issue:`55468`)
Bug when casting pa.dictionary to :class:`CategoricalDtype` using a pa.DictionaryArray as categories (:issue:`56672`)

Datetimelike

Bug in :class:`DatetimeIndex` construction when passing both a tz and either dayfirst or yearfirst ignoring dayfirst/yearfirst (:issue:`55813`)
Bug in :class:`DatetimeIndex` when passing an object-dtype ndarray of float objects and a tz incorrectly localizing the result (:issue:`55780`)
Bug in :meth:`Series.isin` with :class:`DatetimeTZDtype` dtype and comparison values that are all NaT incorrectly returning all-False even if the series contains NaT entries (:issue:`56427`)
Bug in :func:`concat` raising AttributeError when concatenating all-NA DataFrame with :class:`DatetimeTZDtype` dtype DataFrame (:issue:`52093`)
Bug in :func:`testing.assert_extension_array_equal` that could use the wrong unit when comparing resolutions (:issue:`55730`)
Bug in :func:`to_datetime` and :class:`DatetimeIndex` when passing a list of mixed-string-and-numeric types incorrectly raising (:issue:`55780`)
Bug in :func:`to_datetime` and :class:`DatetimeIndex` when passing mixed-type objects with a mix of timezones or mix of timezone-awareness failing to raise ValueError (:issue:`55693`)
Bug in :meth:`.Tick.delta` with very large ticks raising OverflowError instead of OutOfBoundsTimedelta (:issue:`55503`)
Bug in :meth:`DatetimeIndex.shift` with non-nanosecond resolution incorrectly returning with nanosecond resolution (:issue:`56117`)
Bug in :meth:`DatetimeIndex.union` returning object dtype for tz-aware indexes with the same timezone but different units (:issue:`55238`)
Bug in :meth:`Index.is_monotonic_increasing` and :meth:`Index.is_monotonic_decreasing` always caching :meth:`Index.is_unique` as True when first value in index is NaT (:issue:`55755`)
Bug in :meth:`Index.view` to a datetime64 dtype with non-supported resolution incorrectly raising (:issue:`55710`)
Bug in :meth:`Series.dt.round` with non-nanosecond resolution and NaT entries incorrectly raising OverflowError (:issue:`56158`)
Bug in :meth:`Series.fillna` with non-nanosecond resolution dtypes and higher-resolution vector values returning incorrect (internally-corrupted) results (:issue:`56410`)
Bug in :meth:`Timestamp.unit` being inferred incorrectly from an ISO8601 format string with minute or hour resolution and a timezone offset (:issue:`56208`)
Bug in .astype converting from a higher-resolution datetime64 dtype to a lower-resolution datetime64 dtype (e.g. datetime64[us]->datetime64[ms]) silently overflowing with values near the lower implementation bound (:issue:`55979`)
Bug in adding or subtracting a :class:`Week` offset to a datetime64 :class:`Series`, :class:`Index`, or :class:`DataFrame` column with non-nanosecond resolution returning incorrect results (:issue:`55583`)
Bug in addition or subtraction of :class:`BusinessDay` offset with offset attribute to non-nanosecond :class:`Index`, :class:`Series`, or :class:`DataFrame` column giving incorrect results (:issue:`55608`)
Bug in addition or subtraction of :class:`DateOffset` objects with microsecond components to datetime64 :class:`Index`, :class:`Series`, or :class:`DataFrame` columns with non-nanosecond resolution (:issue:`55595`)
Bug in addition or subtraction of very large :class:`.Tick` objects with :class:`Timestamp` or :class:`Timedelta` objects raising OverflowError instead of OutOfBoundsTimedelta (:issue:`55503`)
Bug in creating a :class:`Index`, :class:`Series`, or :class:`DataFrame` with a non-nanosecond :class:`DatetimeTZDtype` and inputs that would be out of bounds with nanosecond resolution incorrectly raising OutOfBoundsDatetime (:issue:`54620`)
Bug in creating a :class:`Index`, :class:`Series`, or :class:`DataFrame` with a non-nanosecond datetime64 (or :class:`DatetimeTZDtype`) from mixed-numeric inputs treating those as nanoseconds instead of as multiples of the dtype's unit (which would happen with non-mixed numeric inputs) (:issue:`56004`)
Bug in creating a :class:`Index`, :class:`Series`, or :class:`DataFrame` with a non-nanosecond datetime64 dtype and inputs that would be out of bounds for a datetime64[ns] incorrectly raising OutOfBoundsDatetime (:issue:`55756`)
Bug in parsing datetime strings with nanosecond resolution with non-ISO8601 formats incorrectly truncating sub-microsecond components (:issue:`56051`)
Bug in parsing datetime strings with sub-second resolution and trailing zeros incorrectly inferring second or millisecond resolution (:issue:`55737`)
Bug in the results of :func:`to_datetime` with a floating-dtype argument with unit not matching the pointwise results of :class:`Timestamp` (:issue:`56037`)
Fixed regression where :func:`concat` would raise an error when concatenating datetime64 columns with differing resolutions (:issue:`53641`)

Timedelta

Bug in :class:`Timedelta` construction raising OverflowError instead of OutOfBoundsTimedelta (:issue:`55503`)
Bug in rendering (__repr__) of :class:`TimedeltaIndex` and :class:`Series` with timedelta64 values with non-nanosecond resolution entries that are all multiples of 24 hours failing to use the compact representation used in the nanosecond cases (:issue:`55405`)
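
The overflow entries above converge on a single exception type. The sketch below shows one overflow path, an integer number of days too large for nanosecond storage, and assumes it raises pandas.errors.OutOfBoundsTimedelta. It is not necessarily the exact code path fixed in the linked issue.

```python
import pandas as pd

# 139993 days does not fit into a 64-bit count of nanoseconds.
try:
    pd.Timedelta(139993, unit="D")
    raised = None
except pd.errors.OutOfBoundsTimedelta as err:
    # OutOfBoundsTimedelta is a ValueError subclass, so callers that
    # already catch ValueError keep working.
    raised = err

print(type(raised).__name__)
```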

Timezones

Bug in :class:`AbstractHolidayCalendar` where timezone data was not propagated when computing holiday observances (:issue:`54580`)
Bug in :class:`Timestamp` construction with an ambiguous value and a pytz timezone failing to raise pytz.AmbiguousTimeError (:issue:`55657`)
Bug in :meth:`Timestamp.tz_localize` with nonexistent="shift_forward" around UTC+0 during DST (:issue:`51501`)

Numeric

Bug in :func:`read_csv` with engine="pyarrow" causing rounding errors for large integers (:issue:`52505`)
Bug in :meth:`Series.__floordiv__` and :meth:`Series.__truediv__` for :class:`ArrowDtype` with integral dtypes raising for large divisors (:issue:`56706`)
Bug in :meth:`Series.__floordiv__` for :class:`ArrowDtype` with integral dtypes raising for large values (:issue:`56645`)
Bug in :meth:`Series.pow` not filling missing values correctly (:issue:`55512`)
Bug in :meth:`Series.replace` and :meth:`DataFrame.replace` matching float 0.0 with False and vice versa (:issue:`55398`)
Bug in :meth:`Series.round` raising for nullable boolean dtype (:issue:`55936`)

Conversion

Bug in :meth:`DataFrame.astype` where calling it with str on an unpickled array could change the array in-place (:issue:`54654`)
Bug in :meth:`DataFrame.astype` where errors="ignore" had no effect for extension types (:issue:`54654`)
Bug in :meth:`Series.convert_dtypes` not converting all-NA columns to null[pyarrow] (:issue:`55346`)

Strings

Bug in :func:`pandas.api.types.is_string_dtype` when checking whether an object array with no elements is of the string dtype (:issue:`54661`)
Bug in :meth:`DataFrame.apply` failing when engine="numba" and columns or index have StringDtype (:issue:`56189`)
Bug in :meth:`DataFrame.reindex` not matching :class:`Index` with string[pyarrow_numpy] dtype (:issue:`56106`)
Bug in :meth:`Index.str.cat` always casting result to object dtype (:issue:`56157`)
Bug in :meth:`Series.__mul__` for :class:`ArrowDtype` with pyarrow.string dtype and string[pyarrow] for the pyarrow backend (:issue:`51970`)
Bug in :meth:`Series.str.find` when start < 0 for :class:`ArrowDtype` with pyarrow.string (:issue:`56411`)
Bug in :meth:`Series.str.replace` when n < 0 for :class:`ArrowDtype` with pyarrow.string (:issue:`56404`)
Bug in :meth:`Series.str.startswith` and :meth:`Series.str.endswith` with arguments of type tuple[str, ...] for :class:`ArrowDtype` with pyarrow.string dtype (:issue:`56579`)
Bug in :meth:`Series.str.startswith` and :meth:`Series.str.endswith` with arguments of type tuple[str, ...] for string[pyarrow] (:issue:`54942`)
Bug in :meth:`str.fullmatch` with dtype=pandas.ArrowDtype(pyarrow.string()) allowing partial matches when the regex ends in literal //$ (:issue:`56652`)
Bug in comparison operations for dtype="string[pyarrow_numpy]" raising if dtypes can't be compared (:issue:`56008`)

Interval

Bug in :class:`Interval` __repr__ not displaying UTC offsets for :class:`Timestamp` bounds. Additionally the hour, minute and second components will now be shown (:issue:`55015`)
Bug in :meth:`IntervalIndex.factorize` and :meth:`Series.factorize` with :class:`IntervalDtype` with datetime64 or timedelta64 intervals not preserving non-nanosecond units (:issue:`56099`)
Bug in :meth:`IntervalIndex.from_arrays` when passed datetime64 or timedelta64 arrays with mismatched resolutions constructing an invalid IntervalArray object (:issue:`55714`)
Bug in :meth:`IntervalIndex.from_tuples` raising if subtype is a nullable extension dtype (:issue:`56765`)
Bug in :meth:`IntervalIndex.get_indexer` with datetime or timedelta intervals incorrectly matching on integer targets (:issue:`47772`)
Bug in :meth:`IntervalIndex.get_indexer` with timezone-aware datetime intervals incorrectly matching on a sequence of timezone-naive targets (:issue:`47772`)
Bug in setting values on a :class:`Series` with an :class:`IntervalIndex` using a slice incorrectly raising (:issue:`54722`)

Indexing

Bug in :meth:`DataFrame.loc` mutating a boolean indexer when :class:`DataFrame` has a :class:`MultiIndex` (:issue:`56635`)
Bug in :meth:`DataFrame.loc` when setting :class:`Series` with extension dtype into NumPy dtype (:issue:`55604`)
Bug in :meth:`Index.difference` not returning a unique set of values when other is empty or other is considered non-comparable (:issue:`55113`)
Bug in setting :class:`Categorical` values into a :class:`DataFrame` with numpy dtypes raising RecursionError (:issue:`52927`)
Fixed bug when creating new column with missing values when setting a single string value (:issue:`56204`)

Missing

Bug in :meth:`DataFrame.update` not updating in-place for tz-aware datetime64 dtypes (:issue:`56227`)

MultiIndex

Bug in :meth:`MultiIndex.get_indexer` not raising ValueError when method provided and index is non-monotonic (:issue:`53452`)

I/O

Bug in :func:`read_csv` where engine="python" did not respect chunksize arg when skiprows was specified (:issue:`56323`)
Bug in :func:`read_csv` where engine="python" raised a TypeError when both a callable skiprows and a chunksize were specified (:issue:`55677`)
Bug in :func:`read_csv` where on_bad_lines="warn" would write to stderr instead of raising a Python warning; this now yields a :class:`.errors.ParserWarning` (:issue:`54296`)
Bug in :func:`read_csv` with engine="pyarrow" where quotechar was ignored (:issue:`52266`)
Bug in :func:`read_csv` with engine="pyarrow" where usecols wasn't working with a CSV with no headers (:issue:`54459`)
Bug in :func:`read_excel`, with engine="xlrd" (xls files) erroring when the file contains NaN or Inf (:issue:`54564`)
Bug in :func:`read_json` not handling dtype conversion properly if infer_string is set (:issue:`56195`)
Bug in :meth:`DataFrame.to_excel`, with OdsWriter (ods files) writing Boolean/string value (:issue:`54994`)
Bug in :meth:`DataFrame.to_hdf` and :func:`read_hdf` with datetime64 dtypes with non-nanosecond resolution failing to round-trip correctly (:issue:`55622`)
Bug in :meth:`DataFrame.to_stata` raising for extension dtypes (:issue:`54671`)
Bug in :meth:`~pandas.read_excel` with engine="odf" (ods files) when a string cell contains an annotation (:issue:`55200`)
Bug in :meth:`~pandas.read_excel` with an ODS file without cached formatted cell for float values (:issue:`55219`)
Bug where :meth:`DataFrame.to_json` would raise an OverflowError instead of a TypeError with unsupported NumPy types (:issue:`55403`)

Period

Bug in :class:`PeriodIndex` construction when more than one of data, ordinal and **fields are passed failing to raise ValueError (:issue:`55961`)
Bug in :class:`Period` addition silently wrapping around instead of raising OverflowError (:issue:`55503`)
Bug in casting from :class:`PeriodDtype` with astype to datetime64 or :class:`DatetimeTZDtype` with non-nanosecond unit incorrectly returning with nanosecond unit (:issue:`55958`)

Plotting

Bug in :meth:`DataFrame.plot.box` with vert=False and a Matplotlib Axes created with sharey=True (:issue:`54941`)
Bug in :meth:`DataFrame.plot.scatter` discarding string columns (:issue:`56142`)
Bug in :meth:`Series.plot` when reusing an ax object failing to raise when a how keyword is passed (:issue:`55953`)

Groupby/resample/rolling

Bug in :class:`.Rolling` where duplicate datetimelike indexes are treated as consecutive rather than equal with closed='left' and closed='neither' (:issue:`20712`)
Bug in :meth:`.DataFrameGroupBy.idxmin`, :meth:`.DataFrameGroupBy.idxmax`, :meth:`.SeriesGroupBy.idxmin`, and :meth:`.SeriesGroupBy.idxmax` where the :class:`.Categorical` dtype was not retained when the index was a :class:`.CategoricalIndex` that contained NA values (:issue:`54234`)
Bug in :meth:`.DataFrameGroupBy.transform` and :meth:`.SeriesGroupBy.transform` where observed=False and f="idxmin" or f="idxmax" would incorrectly raise on unobserved categories (:issue:`54234`)
Bug in :meth:`.DataFrameGroupBy.value_counts` and :meth:`.SeriesGroupBy.value_counts` where sorting could be incorrect if the columns of the DataFrame or name of the Series are integers (:issue:`55951`)
Bug in :meth:`.DataFrameGroupBy.value_counts` and :meth:`.SeriesGroupBy.value_counts` where sort=False in :meth:`DataFrame.groupby` and :meth:`Series.groupby` was not respected (:issue:`55951`)
Bug in :meth:`.DataFrameGroupBy.value_counts` and :meth:`.SeriesGroupBy.value_counts` where results were sorted by proportions rather than frequencies when sort=True and normalize=True (:issue:`55951`)
Bug in :meth:`DataFrame.asfreq` and :meth:`Series.asfreq` with a :class:`DatetimeIndex` with non-nanosecond resolution incorrectly converting to nanosecond resolution (:issue:`55958`)
Bug in :meth:`DataFrame.ewm` when passed times with non-nanosecond datetime64 or :class:`DatetimeTZDtype` dtype (:issue:`56262`)
Bug in :meth:`DataFrame.groupby` and :meth:`Series.groupby` where grouping by a combination of Decimal and NA values would fail when sort=True (:issue:`54847`)
Bug in :meth:`DataFrame.groupby` for DataFrame subclasses when selecting a subset of columns to apply the function to (:issue:`56761`)
Bug in :meth:`DataFrame.resample` not respecting closed and label arguments for :class:`~pandas.tseries.offsets.BusinessDay` (:issue:`55282`)
Bug in :meth:`DataFrame.resample` when resampling on a :class:`ArrowDtype` of pyarrow.timestamp or pyarrow.duration type (:issue:`55989`)
Bug in :meth:`DataFrame.resample` where bin edges were not correct for :class:`~pandas.tseries.offsets.BusinessDay` (:issue:`55281`)
Bug in :meth:`DataFrame.resample` where bin edges were not correct for :class:`~pandas.tseries.offsets.MonthBegin` (:issue:`55271`)
Bug in :meth:`DataFrame.rolling` and :meth:`Series.rolling` where either the index or on column was :class:`ArrowDtype` with pyarrow.timestamp type (:issue:`55849`)

Reshaping

Bug in :func:`concat` ignoring sort parameter when passed :class:`DatetimeIndex` indexes (:issue:`54769`)
Bug in :func:`concat` renaming :class:`Series` when ignore_index=False (:issue:`15047`)
Bug in :func:`merge_asof` raising TypeError when by dtype is not object, int64, or uint64 (:issue:`22794`)
Bug in :func:`merge_asof` raising incorrect error for string dtype (:issue:`56444`)
Bug in :func:`merge_asof` when using a :class:`Timedelta` tolerance on a :class:`ArrowDtype` column (:issue:`56486`)
Bug in :func:`merge` not raising when merging datetime columns with timedelta columns (:issue:`56455`)
Bug in :func:`merge` not raising when merging string columns with numeric columns (:issue:`56441`)
Bug in :func:`merge` returning columns in incorrect order when left and/or right is empty (:issue:`51929`)
Bug in :meth:`DataFrame.melt` where an exception was raised if var_name was not a string (:issue:`55948`)
Bug in :meth:`DataFrame.melt` where it would not preserve the datetime (:issue:`55254`)
Bug in :meth:`DataFrame.pivot_table` where the row margin is incorrect when the columns have numeric names (:issue:`26568`)
Bug in :meth:`DataFrame.pivot` with numeric columns and extension dtype for data (:issue:`56528`)
Bug in :meth:`DataFrame.stack` with future_stack=True would not preserve NA values in the index (:issue:`56573`)
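
The :func:`merge` entries above about mismatched key dtypes can be illustrated with a short sketch. It assumes that merging a string key column with an integer key column raises ValueError. The frames and column names are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
right = pd.DataFrame({"key": [1, 2], "y": [3, 4]})

# String keys cannot be matched against integer keys, so merge should
# refuse instead of silently producing an empty or misleading result.
try:
    pd.merge(left, right, on="key")
    raised = False
except ValueError:
    raised = True

print(raised)
```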

Sparse

Bug in :meth:`SparseArray.take` when using a different fill value than the array's fill value (:issue:`55181`)

Other

:meth:`DataFrame.__dataframe__` did not support pyarrow large strings (:issue:`56702`)
Bug in :meth:`DataFrame.describe` where the percentile 99.999% was rounded to 100% when formatting percentiles in the result (:issue:`55765`)
Bug in :func:`cut` and :func:`qcut` with datetime64 dtype values with non-nanosecond units incorrectly returning nanosecond-unit bins (:issue:`56101`)
Bug in :func:`cut` incorrectly allowing cutting of timezone-aware datetimes with timezone-naive bins (:issue:`54964`)
Bug in :func:`infer_freq` and :meth:`DatetimeIndex.inferred_freq` with weekly frequencies and non-nanosecond resolutions (:issue:`55609`)
Bug in :meth:`DataFrame.apply` where passing raw=True ignored args passed to the applied function (:issue:`55009`)
Bug in :meth:`DataFrame.from_dict` which would always sort the rows of the created :class:`DataFrame` (:issue:`55683`)
Bug in :meth:`DataFrame.sort_index` when passing axis="columns" and ignore_index=True raising a ValueError (:issue:`56478`)
Bug in rendering inf values inside a :class:`DataFrame` with the use_inf_as_na option enabled (:issue:`55483`)
Bug in rendering a :class:`Series` with a :class:`MultiIndex` where an index level named 0 did not have that name displayed (:issue:`55415`)
Bug in the error message when assigning an empty :class:`DataFrame` to a column (:issue:`55956`)
Bug when time-like strings were being cast to :class:`ArrowDtype` with pyarrow.time64 type (:issue:`56463`)
