Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Unify .update to generic #23192

Closed
wants to merge 10 commits into from

Conversation

h-vetinari
Copy link
Contributor

@h-vetinari h-vetinari commented Oct 16, 2018

Since #22286 and #21855 have stalled completely, I've decided to go ahead with this independently.

@pep8speaks
Copy link

pep8speaks commented Oct 16, 2018

Hello @h-vetinari! Thanks for updating the PR.

Comment last updated on November 11, 2018 at 10:14 Hours UTC

@h-vetinari h-vetinari force-pushed the unify_update branch 3 times, most recently from 16ca125 to c6e9dfb Compare October 16, 2018 23:07
@@ -99,6 +99,32 @@ def _single_replace(self, to_replace, method, inplace, limit):
return result


def _update_column(this, that, overwrite=True, filter_func=None,
raise_conflict=False):
import pandas.core.computation.expressions as expressions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a doc-string

@@ -99,6 +99,32 @@ def _single_replace(self, to_replace, method, inplace, limit):
return result


def _update_column(this, that, overwrite=True, filter_func=None,
raise_conflict=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to pandas/core/missing.py, you can de-privatize it and rename to update_array

@@ -99,6 +99,32 @@ def _single_replace(self, to_replace, method, inplace, limit):
return result


def _update_column(this, that, overwrite=True, filter_func=None,
raise_conflict=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have an issue, but pls create one to deprecate this arg and rename to errors= (you can do that in the internal function and put a TODO in the external facing ones).

if updated is None:
# don't overwrite Series unnecessarily
return
self._data._block.values = updated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is SO unsafe, doesn't handle caching and is reaching into internals. Use self._update_inplace(updated)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if updated is not None:
    self._update_inplace(updated)

original object.
overwrite : bool, default True
How to handle non-NA values for overlapping keys:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you indicate with a versionadded tag what has changed compared to current

filter_func=filter_func,
raise_conflict=raise_conflict)
# don't overwrite columns unnecessarily
if updated is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if updated is not None:
    self[col] = updated

@@ -200,6 +200,7 @@ Other Enhancements
- :meth:`Series.resample` and :meth:`DataFrame.resample` have gained the :meth:`Resampler.quantile` (:issue:`15023`).
- :meth:`Index.to_frame` now supports overriding column name(s) (:issue:`22580`).
- New attribute :attr:`__git_version__` will return git commit sha of current build (:issue:`21295`).
- :meth:`Series.update` now supports the same keywords and functionality as :meth:`DataFrame.update` (:issue:`22358`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you enumerate what has changed.

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Clean labels Oct 17, 2018
@jreback
Copy link
Contributor

jreback commented Oct 17, 2018

also have a look at #3025

@h-vetinari
Copy link
Contributor Author

@jreback
Thanks for the review, I'll incorporate it (resp. answer in detail) later. However, this might need a precursor PR to add tests for Series.update - the dtype change for

>>> s = pd.Series([1, 2, 3])
>>> s.update(pd.Series([4, np.nan, 6]))

only got caught by the doctests. Are we fine with changing that?

@jreback
Copy link
Contributor

jreback commented Oct 17, 2018

let's make first pass that doesn't change anything, xfail new tests if needed for #23192 (comment).

I would try the approach for #3025 as an alternative

@h-vetinari
Copy link
Contributor Author

h-vetinari commented Nov 9, 2018

Split off the dtype promotion tests for Series (which are already passing on master) off into a precursor: #23604

Copy link
Contributor Author

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback
Incorporated your feedback for deprecating raise_conflict, and fixed the dtype thing as well. PTAL

warnings.simplefilter("ignore", FutureWarning)
# do not warn about constructing Panel when reindexing
result = super(Panel, self).reindex(**kwargs)
return result
Copy link
Contributor Author

@h-vetinari h-vetinari Nov 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had no intention to add this warning filter here, but otherwise I cannot test the raise_conflict deprecation for Panel, as this warning was throwing me off from trying to catch the deprecation warning. If we don't care about testing that kwarg deprecation for Panel, I can revert this change here.

A B
0 1 4
1 2 500
2 3 6
Copy link
Contributor Author

@h-vetinari h-vetinari Nov 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't obvious from the diff (due to the code moving files), but now this keeps the dtype - yay!

Copy link
Contributor Author

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to mention this before.

"""
updated = _update_array(this, that, overwrite=overwrite,
filter_func=filter_func, errors=errors)
return this if updated is None else updated
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback
You asked me to move the function here and make it public as update_array. This had the problem that it had the fallback of returning None in case there were no changes, which is necessary to avoid pointlessly updating the columns in NDFrame.update.

However, I was not comfortable with a public function having such unexpected behaviour, so I made the function I used originally private again (but still in core.missing) and just made a thin wrapper around it public, that only filters out the None-returning case.

@datapythonista
Copy link
Member

@h-vetinari I'm canceling the the old azure builds for this PR as you add new commits (azure auto-cancel is not working at the moment). Shouldn't make a difference to you, as I guess you just care about the last version, but ping me if this causes you any trouble.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@h-vetinari this PR is again too much going on. pls do this in multiple steps.

@h-vetinari
Copy link
Contributor Author

@jreback
I already split off #23604, but everything else comes directly out of your review. Let's deal with #23604 first though.

@h-vetinari
Copy link
Contributor Author

Though I guess I could split off solving #23606 if I first xfail the tests for #23604 (which represent the current state of Series.update) in this PR

@h-vetinari
Copy link
Contributor Author

h-vetinari commented Nov 12, 2018

Both this PR and #22225 have had persistent a ResourceWarning the last few runs. I first thought it was a flaky thing like those Warnings used to be, but this time, it stayed, and I can reproduce it locally (not with pytest pandas/tests/io/test_parquet.py, but at least with pytest pandas/tests/io).

For example in https://travis-ci.org/pandas-dev/pandas/jobs/453820311 or https://travis-ci.org/pandas-dev/pandas/jobs/453822449:

sys:1: ResourceWarning: unclosed <socket.socket fd=16, family=AddressFamily.AF_INET, type=2050, proto=0, laddr=('0.0.0.0', 0)>
sys:1: ResourceWarning: unclosed <socket.socket fd=15, family=AddressFamily.AF_INET, type=2050, proto=0, laddr=('0.0.0.0', 0)>

and

=============================== warnings summary ===============================
pandas/core/frame.py::pandas.core.frame.DataFrame.to_parquet
  /home/travis/build/pandas-dev/pandas/pandas/io/parquet.py:129: ResourceWarning: unclosed file <_io.BufferedReader name='df.parquet.gzip'>
    **kwargs).to_pandas()

There's also a stderr (or stdout) warning from the parser-tests surfacing somewhere:

..............................................................x...........................................................s....
........................................Skipping line 3: Expected 3 fields in line 3, saw 4
.......................................s.......................................................................................

I've narrowed the ResourceWarning down to the parquet-s3 tests, but couldn't figure out a way to fix it despite many tries, so I'm skipping them for now.

@codecov
Copy link

codecov bot commented Nov 15, 2018

Codecov Report

Merging #23192 into master will decrease coverage by <.01%.
The diff coverage is 94.64%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23192      +/-   ##
==========================================
- Coverage   92.25%   92.25%   -0.01%     
==========================================
  Files         161      161              
  Lines       51383    51410      +27     
==========================================
+ Hits        47404    47428      +24     
- Misses       3979     3982       +3
Flag Coverage Δ
#multiple 90.64% <94.64%> (-0.01%) ⬇️
#single 42.3% <10.71%> (-0.02%) ⬇️
Impacted Files Coverage Δ
pandas/core/series.py 93.66% <100%> (-0.02%) ⬇️
pandas/core/frame.py 96.99% <100%> (-0.04%) ⬇️
pandas/core/missing.py 91.6% <90%> (-0.11%) ⬇️
pandas/core/generic.py 96.82% <96.87%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a23f901...411741a. Read the comment docs.

@h-vetinari
Copy link
Contributor Author

This is still ongoing WIP, but ran into #23823 while trying to use maybe_upcast_putmask here, and then #23833 while trying to fix that method...

@jreback
Copy link
Contributor

jreback commented Apr 20, 2019

conceptually this PR is good, but it does too much in a single go, diluting the intent. several much smaller scope PR's would be better.

@jreback jreback closed this Apr 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: unify signature for df.update and Series.update
4 participants