Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Unify .update to generic #23192

Closed
wants to merge 10 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions doc/source/whatsnew/v0.24.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -281,6 +281,8 @@ Other Enhancements
all instances of ``set`` will not be considered "list-like" anymore (:issue:`23061`)
- :meth:`Index.to_frame` now supports overriding column name(s) (:issue:`22580`).
- New attribute :attr:`__git_version__` will return git commit sha of current build (:issue:`21295`).
- :meth:`Series.update` now supports the same keywords and functionality as :meth:`DataFrame.update`.
In particular, it has gained the keywords ``overwrite``, ``filter_func`` and ``errors`` (:issue:`22358`, :issue:`23585`)
- Compatibility with Matplotlib 3.0 (:issue:`22790`).
- Added :meth:`Interval.overlaps`, :meth:`IntervalArray.overlaps`, and :meth:`IntervalIndex.overlaps` for determining overlaps between interval-like objects (:issue:`21998`)
- :func:`~DataFrame.to_parquet` now supports writing a ``DataFrame`` as a directory of parquet files partitioned by a subset of the columns when ``engine = 'pyarrow'`` (:issue:`23283`)
Expand All @@ -297,6 +299,7 @@ Backwards incompatible API changes
- A newly constructed empty :class:`DataFrame` with integer as the ``dtype`` will now only be cast to ``float64`` if ``index`` is specified (:issue:`22858`)
- :meth:`Series.str.cat` will now raise if `others` is a `set` (:issue:`23009`)
- Passing scalar values to :class:`DatetimeIndex` or :class:`TimedeltaIndex` will now raise ``TypeError`` instead of ``ValueError`` (:issue:`23539`)
- :meth:`DataFrame.update` will now try to preserve the dtype of the caller as much as possible (:issue:`23606`)

.. _whatsnew_0240.api_breaking.deps:

Expand Down
150 changes: 3 additions & 147 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -5203,157 +5203,13 @@ def combiner(x, y):

return self.combine(other, combiner, overwrite=False)

@Appender(NDFrame.update.__doc__)
@deprecate_kwarg(old_arg_name='raise_conflict', new_arg_name='errors',
mapping={False: 'ignore', True: 'raise'})
def update(self, other, join='left', overwrite=True, filter_func=None,
errors='ignore'):
"""
Modify in place using non-NA values from another DataFrame.

Aligns on indices. There is no return value.

Parameters
----------
other : DataFrame, or object coercible into a DataFrame
Should have at least one matching index/column label
with the original DataFrame. If a Series is passed,
its name attribute must be set, and that will be
used as the column name to align with the original DataFrame.
join : {'left'}, default 'left'
Only left join is implemented, keeping the index and columns of the
original object.
overwrite : bool, default True
How to handle non-NA values for overlapping keys:

* True: overwrite original DataFrame's values
with values from `other`.
* False: only update values that are NA in
the original DataFrame.

filter_func : callable(1d-array) -> bool 1d-array, optional
Can choose to replace values other than NA. Return True for values
that should be updated.
errors : {'raise', 'ignore'}, default 'ignore'
If 'raise', will raise a ValueError if the DataFrame and `other`
both contain non-NA data in the same place.

.. versionchanged :: 0.24.0
Changed from `raise_conflict=False|True`
to `errors='ignore'|'raise'`.

Returns
-------
None : method directly changes calling object

Raises
------
ValueError
* When `errors='raise'` and there's overlapping non-NA data.
* When `errors` is not either `'ignore'` or `'raise'`
NotImplementedError
* If `join != 'left'`

See Also
--------
dict.update : Similar method for dictionaries.
DataFrame.merge : For column(s)-on-columns(s) operations.

Examples
--------
>>> df = pd.DataFrame({'A': [1, 2, 3],
... 'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, 5, 6],
... 'C': [7, 8, 9]})
>>> df.update(new_df)
>>> df
A B
0 1 4
1 2 5
2 3 6

The DataFrame's length does not increase as a result of the update,
only values at matching index/column labels are updated.

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
... 'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']})
>>> df.update(new_df)
>>> df
A B
0 a d
1 b e
2 c f

For Series, it's name attribute must be set.

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
... 'B': ['x', 'y', 'z']})
>>> new_column = pd.Series(['d', 'e'], name='B', index=[0, 2])
>>> df.update(new_column)
>>> df
A B
0 a d
1 b y
2 c e
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
... 'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e']}, index=[1, 2])
>>> df.update(new_df)
>>> df
A B
0 a x
1 b d
2 c e

If `other` contains NaNs the corresponding values are not updated
in the original dataframe.

>>> df = pd.DataFrame({'A': [1, 2, 3],
... 'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, np.nan, 6]})
>>> df.update(new_df)
>>> df
A B
0 1 4.0
1 2 500.0
2 3 6.0
"""
import pandas.core.computation.expressions as expressions
# TODO: Support other joins
if join != 'left': # pragma: no cover
raise NotImplementedError("Only left join is supported")
if errors not in ['ignore', 'raise']:
raise ValueError("The parameter errors must be either "
"'ignore' or 'raise'")

if not isinstance(other, DataFrame):
other = DataFrame(other)

other = other.reindex_like(self)

for col in self.columns:
this = self[col].values
that = other[col].values
if filter_func is not None:
with np.errstate(all='ignore'):
mask = ~filter_func(this) | isna(that)
else:
if errors == 'raise':
mask_this = notna(that)
mask_that = notna(this)
if any(mask_this & mask_that):
raise ValueError("Data overlaps.")

if overwrite:
mask = isna(that)
else:
mask = notna(this)

# don't overwrite columns unecessarily
if mask.all():
continue

self[col] = expressions.where(mask, this, that)
super(DataFrame, self).update(other, join=join, overwrite=overwrite,
filter_func=filter_func, errors=errors)

# ----------------------------------------------------------------------
# Data reshaping
Expand Down
175 changes: 175 additions & 0 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -4173,6 +4173,181 @@ def _reindex_with_indexers(self, reindexers, fill_value=None, copy=False,

return self._constructor(new_data).__finalize__(self)

def update(self, other, join='left', overwrite=True, filter_func=None,
errors='ignore'):
"""
Modify in place using non-NA values from another DataFrame.

Series/DataFrame will be aligned on indexes, and whenever possible,
the dtype of the individual Series of the caller will be preserved.

There is no return value.

Parameters
----------
other : DataFrame, or object coercible into a DataFrame
Should have at least one matching index/column label
with the original DataFrame. If a Series is passed,
its name attribute must be set, and that will be
used as the column name to align with the original DataFrame.
join : {'left'}, default 'left'
Only left join is implemented, keeping the index and columns of the
original object.
overwrite : bool, default True
How to handle non-NA values for overlapping keys:

* True: overwrite original DataFrame's values
with values from `other`.
* False: only update values that are NA in
the original DataFrame.

filter_func : callable(1d-array) -> bool 1d-array, optional
Can choose to replace values other than NA. Return True for values
that should be updated.
errors : {'raise', 'ignore'}, default 'ignore'
If 'raise', will raise a ValueError if the DataFrame and `other`
both contain non-NA data in the same place.

.. versionchanged :: 0.24.0
Changed from `raise_conflict=False|True`
to `errors='ignore'|'raise'`.

Returns
-------
None : method directly changes calling object

Raises
------
ValueError
* When `errors='raise'` and there's overlapping non-NA data.
* When `errors` is not either `'ignore'` or `'raise'`
NotImplementedError
* If `join != 'left'`

See Also
--------
Series.update : Similar method for `Series`.
DataFrame.merge : For column(s)-on-columns(s) operations.
dict.update : Similar method for `dict`.

Examples
--------
>>> df = pd.DataFrame({'A': [1, 2, 3],
... 'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, 5, 6],
... 'C': [7, 8, 9]})
>>> df.update(new_df)
>>> df
A B
0 1 4
1 2 5
2 3 6

The DataFrame's length does not increase as a result of the update,
only values at matching index/column labels are updated.

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
... 'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']})
>>> df.update(new_df)
>>> df
A B
0 a d
1 b e
2 c f

For Series, it's name attribute must be set.

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
... 'B': ['x', 'y', 'z']})
>>> new_column = pd.Series(['d', 'e'], name='B', index=[0, 2])
>>> df.update(new_column)
>>> df
A B
0 a d
1 b y
2 c e
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
... 'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e']}, index=[1, 2])
>>> df.update(new_df)
>>> df
A B
0 a x
1 b d
2 c e

If `other` contains NaNs the corresponding values are not updated
in the original dataframe.

>>> df = pd.DataFrame({'A': [1, 2, 3],
... 'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, np.nan, 6]})
>>> df.update(new_df)
>>> df
A B
0 1 4
1 2 500
2 3 6
Copy link
Contributor Author

@h-vetinari h-vetinari Nov 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't obvious from the diff (due to the code moving files), but now this keeps the dtype - yay!

"""
from pandas import Series, DataFrame
# TODO: Support other joins
if join != 'left': # pragma: no cover
raise NotImplementedError("Only left join is supported")
if errors not in ['ignore', 'raise']:
raise ValueError("The parameter errors must be either "
"'ignore' or 'raise'")

if isinstance(self, ABCSeries):
if not isinstance(other, ABCSeries):
other = Series(other)
other = other.reindex_like(self)
this = self.values
that = other.values

# will return None if "this" remains unchanged
updated_array = missing._update_array(this, that,
overwrite=overwrite,
filter_func=filter_func,
errors=errors)
# don't overwrite unnecessarily
if updated_array is not None:
# avoid unnecessary upcasting (introduced by alignment)
try:
updated = Series(updated_array, index=self.index,
dtype=this.dtype)
except ValueError:
updated = Series(updated_array, index=self.index)
self._update_inplace(updated)
else: # DataFrame
if not isinstance(other, ABCDataFrame):
other = DataFrame(other)

other = other.reindex_like(self)

for col in self.columns:
this = self[col].values
that = other[col].values

# will return None if "this" remains unchanged
updated_array = missing._update_array(this, that,
overwrite=overwrite,
filter_func=filter_func,
errors=errors)
# don't overwrite unnecessarily
if updated_array is not None:
# no problem to set DataFrame column with array
updated = updated_array

if updated_array.dtype != this.dtype:
# avoid unnecessary upcasting (introduced by alignment)
try:
updated = Series(updated_array, index=self.index,
dtype=this.dtype)
except ValueError:
pass
self[col] = updated

def filter(self, items=None, like=None, regex=None, axis=None):
"""
Subset rows or columns of dataframe according to labels in
Expand Down
Loading