Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fix mutation of DTI backing Series/DataFrame #24096

Merged
merged 8 commits into from Dec 5, 2018

Conversation

jbrockmendel
Copy link
Member

Fixes (at least some) verify_integrity bugs discussed in #24074.

@pep8speaks
Copy link

Hello @jbrockmendel! Thanks for submitting the PR.

assert ser._values is not dti
assert dti[1] == ts

def test_dt64tz_setitem_does_not_mutate_dti(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is not a series.test_block_internals; not sure if there is somewhere else this might belong

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls create one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@mroeschke
Copy link
Member

Does this fix #6326 as well?

@jbrockmendel
Copy link
Member Author

Does this fix #6326 as well?

It looks like it does not. This also breaks another copy/view-based test, which I'll need to look into.

@codecov
Copy link

codecov bot commented Dec 4, 2018

Codecov Report

Merging #24096 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24096      +/-   ##
==========================================
+ Coverage   42.52%   42.52%   +<.01%     
==========================================
  Files         161      161              
  Lines       51697    51699       +2     
==========================================
+ Hits        21982    21986       +4     
+ Misses      29715    29713       -2
Flag Coverage Δ
#single 42.52% <100%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/internals/blocks.py 52.93% <100%> (+0.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1d3ed91...841c4a5. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 4, 2018

Codecov Report

Merging #24096 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24096      +/-   ##
==========================================
- Coverage    92.2%    92.2%   -0.01%     
==========================================
  Files         162      162              
  Lines       51714    51719       +5     
==========================================
+ Hits        47682    47686       +4     
- Misses       4032     4033       +1
Flag Coverage Δ
#multiple 90.6% <100%> (ø) ⬆️
#single 43.02% <100%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/internals/blocks.py 93.72% <100%> (ø) ⬆️
pandas/core/series.py 93.7% <100%> (+0.01%) ⬆️
pandas/core/internals/construction.py 96.64% <100%> (+0.02%) ⬆️
pandas/io/json/json.py 92.61% <0%> (-0.48%) ⬇️
pandas/util/testing.py 87.51% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9f2c716...4851ffa. Read the comment docs.

@jreback jreback added this to the 0.24.0 milestone Dec 5, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't see any tests for the OP issue here (just the freq change), which AFIACT is orthogonal

@@ -494,6 +495,11 @@ def _init_dict(self, data, index, columns, dtype=None):
arrays.loc[missing] = [v] * missing.sum()

else:
for key in data:
if isinstance(data[key], ABCIndexClass):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really? we need to deep copy everything?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can be more specific and just make this ABCDatetimeIndex, but I don't see any solution that doesn't involve making a copy here.

The first thing I tried was to make a copy in Block.setitem (circa line 923 in internals.blocks) if we had a DatetimeIndex. But that breaks a test because it means when we call ser.__setitem__ (or ser.loc.__setitem__, or...), we fail to change other Series that are views on ser.

AFAICT the only way to do do this is to avoid ever having a DatetimeIndex and a Series/column share data. This may be simpler to resolve after Blocks are backed by DatetimeArray instead of DatetimeIndex (I'm not 100% sure on this), but ATM this is a blocker to #24074, which is itself a blocker to DatetimeArray. We could avoid having this be a blocker for #24074 if you were OK with the pickle workaround currently there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, the pickle workaround is much worse. for now let's just do this for DTI w/tz only. otherwise this may be a lot of copying. Pls add a TODO around this though.

@@ -189,6 +190,9 @@ def __init__(self, data=None, index=None, dtype=None, name=None,
else:
# need to copy to avoid aliasing issues
data = data._values.copy()
if isinstance(data, ABCDatetimeIndex):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really?? this is a specific problem to Datetime w/tz, this is too heavy handed

assert ser._values is not dti
assert dti[1] == ts

def test_dt64tz_setitem_does_not_mutate_dti(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls create one

@jbrockmendel
Copy link
Member Author

i don't see any tests for the OP issue here (just the freq change), which AFIACT is orthogonal

Each of the tests has an assertion near the end assert dti[1] == ts or assert dti[0] == ts that is a check for non-mutation.

@jreback
Copy link
Contributor

jreback commented Dec 5, 2018

lgtm. ping on green.

@jbrockmendel
Copy link
Member Author

Ping

@jreback jreback merged commit cb862e4 into pandas-dev:master Dec 5, 2018
@jreback
Copy link
Contributor

jreback commented Dec 5, 2018

thanks!

@jbrockmendel jbrockmendel deleted the invalidate branch December 5, 2018 20:29
TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request Dec 6, 2018
commit 28c61d770f6dfca6857fd0fa6979d4119a31129e
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Thu Dec 6 12:18:19 2018 -0600

    uncomment

commit bae2e322523efc73a1344464f51611e2dc555ccb
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Thu Dec 6 12:17:09 2018 -0600

    maybe fixes

commit 6cb4db05c9d6ceba3794096f0172cae5ed5f6019
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Thu Dec 6 09:57:37 2018 -0600

    we back

commit d97ab57fb32cb23371169d9ed659ccfac34cfe45
Merge: a117de4 b78aa8d
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Thu Dec 6 09:51:51 2018 -0600

    Merge remote-tracking branch 'upstream/master' into disown-tz-only-rebased2

commit b78aa8d
Author: gfyoung <gfyoung17+GitHub@gmail.com>
Date:   Thu Dec 6 07:18:44 2018 -0500

    REF/TST: Add pytest idiom to reshape/test_tile (pandas-dev#24107)

commit 2993b8e
Author: gfyoung <gfyoung17+GitHub@gmail.com>
Date:   Thu Dec 6 07:17:55 2018 -0500

    REF/TST: Add more pytest idiom to scalar/test_nat (pandas-dev#24120)

commit b841374
Author: evangelineliu <hsiyinliu@gmail.com>
Date:   Wed Dec 5 18:21:46 2018 -0500

    BUG: Fix concat series loss of timezone (pandas-dev#24027)

commit 4ae63aa
Author: jbrockmendel <jbrockmendel@gmail.com>
Date:   Wed Dec 5 14:44:50 2018 -0800

    Implement DatetimeArray._from_sequence (pandas-dev#24074)

commit 2643721
Author: jbrockmendel <jbrockmendel@gmail.com>
Date:   Wed Dec 5 14:43:45 2018 -0800

    CLN: Follow-up to pandas-dev#24100 (pandas-dev#24116)

commit 8ea7744
Author: chris-b1 <cbartak@gmail.com>
Date:   Wed Dec 5 14:21:23 2018 -0600

    PERF: ascii c string functions (pandas-dev#23981)

commit cb862e4
Author: jbrockmendel <jbrockmendel@gmail.com>
Date:   Wed Dec 5 12:19:46 2018 -0800

    BUG: fix mutation of DTI backing Series/DataFrame (pandas-dev#24096)

commit aead29b
Author: topper-123 <contribute@tensortable.com>
Date:   Wed Dec 5 19:06:00 2018 +0000

    API: rename MultiIndex.labels to MultiIndex.codes (pandas-dev#23752)
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pd.Series(dti) shares data with dti --> DatetimeIndex mutated
4 participants