New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: avoid creating reference cycle on indexing (#15746) #17956

Merged
merged 8 commits into from Oct 27, 2017

Conversation

Projects
None yet
4 participants
@pitrou
Contributor

pitrou commented Oct 23, 2017

  • closes #15746
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
@pitrou

This comment has been minimized.

Contributor

pitrou commented Oct 23, 2017

I wonder if I should add a whatsnew entry for this, and if so, in which file?

@jreback

no whatsnew needed here. this is not user visible.

@@ -1839,5 +1839,27 @@ cdef class BlockPlacement:
return self._as_slice

This comment has been minimized.

@jreback

jreback Oct 23, 2017

Contributor

make a new cython module. indexing.pyx (and add to setup.py).

@jreback

This comment has been minimized.

Contributor

jreback commented Oct 23, 2017

though you could add in the perf section if you want (but you will need to wait for us to create the 0.22 notes); releasing 0.21.0 shortly.

@pitrou

This comment has been minimized.

Contributor

pitrou commented Oct 23, 2017

The latest changes should address your comments.

self.obj = obj
self.ndim = obj.ndim
self.name = name
def __call__(self, axis=None):
# we need to return a copy of ourselves

This comment has been minimized.

@jreback

jreback Oct 23, 2017

Contributor

did you reverse these on purpose?

This comment has been minimized.

@pitrou

pitrou Oct 23, 2017

Contributor

Yes, so that we can use functools.partial(indexer, name) to shave a few % more. If that's not important, I can revert to the original order.

This comment has been minimized.

@jreback

jreback Oct 23, 2017

Contributor

ok this is not a big deal to reverse.

@codecov

This comment has been minimized.

codecov bot commented Oct 23, 2017

Codecov Report

Merging #17956 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17956      +/-   ##
==========================================
- Coverage   91.23%   91.22%   -0.02%     
==========================================
  Files         163      163              
  Lines       50113    50103      -10     
==========================================
- Hits        45723    45704      -19     
- Misses       4390     4399       +9
Flag Coverage Δ
#multiple 89.03% <100%> (-0.01%) ⬇️
#single 40.3% <71.42%> (-0.08%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexing.py 92.8% <100%> (-0.02%) ⬇️
pandas/core/generic.py 92.51% <100%> (-0.03%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e1dabf3...28c5056. Read the comment docs.

@codecov

This comment has been minimized.

codecov bot commented Oct 23, 2017

Codecov Report

Merging #17956 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17956      +/-   ##
==========================================
- Coverage   91.24%   91.22%   -0.02%     
==========================================
  Files         163      163              
  Lines       50165    50155      -10     
==========================================
- Hits        45775    45756      -19     
- Misses       4390     4399       +9
Flag Coverage Δ
#multiple 89.04% <100%> (-0.01%) ⬇️
#single 40.32% <71.42%> (-0.08%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexing.py 92.8% <100%> (-0.02%) ⬇️
pandas/core/generic.py 92.42% <100%> (-0.03%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 79498e3...efe021d. Read the comment docs.

setattr(cls, name, property(_indexer, doc=indexer.__doc__))
# add to our internal names set

This comment has been minimized.

@jreback

jreback Oct 23, 2017

Contributor

hmm I think you need to leave this. but only an asv will tell us. can you run the asv's for indexing (and add one which replicates the issue that initiated this PR).

This comment has been minimized.

@pitrou

pitrou Oct 24, 2017

Contributor

Ok, I ran the benchmarks. The only significant difference is that .ix becomes slower, because the deprecation warning is emitter at each instantiation. Since it's deprecated anyway, I'm not sure performance is important.

@pitrou

This comment has been minimized.

Contributor

pitrou commented Oct 24, 2017

One of the Circle-Ci jobs died mysteriously, I don't know why: https://circleci.com/gh/pandas-dev/pandas/5803#tests/containers/1

@jreback

small comments. actually will have you add a release note in perf section, but for 0.22, but it doesn't exist yet, will be created after release (shortly).

goal_time = 0.2
def setup(self):
self.s = Series(range(10))

This comment has been minimized.

@jreback

jreback Oct 25, 2017

Contributor

can you add for .loc (and might as well for .ix)

This comment has been minimized.

@pitrou

pitrou Oct 25, 2017

Contributor

Ok, done.

@jreback jreback added this to the 0.22.0 milestone Oct 25, 2017

@@ -881,6 +882,14 @@ def test_partial_boolean_frame_indexing(self):
columns=list('ABC'))
tm.assert_frame_equal(result, expected)
def test_no_reference_cycle(self):
df = pd.DataFrame({'a': [0, 1], 'b': [2, 3]})
for name in ('loc', 'iloc', 'ix', 'at', 'iat'):

This comment has been minimized.

@bluenote10

bluenote10 Oct 25, 2017

Would it make sense to do a sys.getrefcount(df) before and after and assert that it doesn't change? Or is this assertion too strong?

This comment has been minimized.

@pitrou

pitrou Oct 25, 2017

Contributor

There is no need for that, the weakref-based test below already tests that no leak happens.

This comment has been minimized.

@bluenote10

bluenote10 Oct 25, 2017

I was only wondering if the weakref can become None "by chance" if the garbage collector happens to run exactly between the del and the assertion. This is of course highly unlikely, but my understanding was that the None result does not directly imply that there wasn't a cyclic reference. But I could be wrong about that.

This comment has been minimized.

@pitrou

pitrou Oct 25, 2017

Contributor

Well, In that case the test would fail most of the time.

This comment has been minimized.

@bluenote10

bluenote10 Oct 26, 2017

Why most of the time? It is very unlikely, so it would fail rarely, right?

You can check that this assertion erroneously passes with pandas 0.20.3 where there are definitely cyclic references by forcing an "accidental" GC:

In [8]:     def test_no_reference_cycle(happens_to_run_gc):
   ...:         import weakref, gc, sys, pandas as pd
   ...:         df = pd.DataFrame({'a': [0, 1], 'b': [2, 3]})
   ...:         refcount_before = sys.getrefcount(df)
   ...:         for name in ('loc', 'iloc', 'ix', 'at', 'iat'):
   ...:             getattr(df, name)
   ...:         refcount_after = sys.getrefcount(df)
   ...:         print("ref counts {} -> {}".format(refcount_before, refcount_after))
   ...:         wr = weakref.ref(df)
   ...:         del df
   ...:         if happens_to_run_gc:
   ...:             gc.collect()
   ...:         assert wr() is None
   ...:

This comment has been minimized.

@pitrou

pitrou Oct 26, 2017

Contributor

You said « the weakref can become None "by chance" ». If that's the only reason the test succeeds, then it would fail most of the time, since the chance of a GC happening between two consecutive opcodes is slim.

I'm not sure what your code snippet is supposed to prove.

This comment has been minimized.

@bluenote10

bluenote10 Oct 26, 2017

What I'm trying to say: Checking for wr() is None does not guarantee that the operations are ref-cycle-free, which I though was the purpose of the test. That's why I would have preferred refcount_before == refcount_after but due to the low probability of an accidental success it doesn't really matter.

@jreback

can you add a note in 0.22.0 perf section

@jreback jreback merged commit 37c9cea into pandas-dev:master Oct 27, 2017

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jreback

This comment has been minimized.

Contributor

jreback commented Oct 27, 2017

thanks @pitrou very nice!

@pitrou pitrou deleted the pitrou:no_index_refcycle branch Oct 27, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment