Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG/PERF: Use EA in Index.get_value #24204

Merged
merged 3 commits into from Dec 10, 2018

Conversation

Projects
None yet
3 participants
@TomAugspurger
Copy link
Contributor

commented Dec 10, 2018

Avoid converting the ExtensionArray to an ndarray.

Before

pandas/tests/arrays/categorical/test_indexing.py F                                                                                                   [100%]

========================================================================= FAILURES =========================================================================
______________________________________________________________________ test_series_at ______________________________________________________________________

    def test_series_at():
        arr = NonCoercaibleCategorical(['a', 'b', 'c'])
        ser = Series(arr)
>       result = ser.at[0]

pandas/tests/arrays/categorical/test_indexing.py:158:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/core/indexing.py:2266: in __getitem__
    return self.obj._get_value(*key, takeable=self._takeable)
pandas/core/series.py:1078: in _get_value
    return self.index.get_value(self._values, label)
pandas/core/indexes/base.py:4303: in get_value
    s = com.values_from_object(series)
pandas/_libs/lib.pyx:82: in pandas._libs.lib.values_from_object
    obj = func()
pandas/core/arrays/categorical.py:1509: in get_values
    return np.array(self)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <[ValueError("I cannot be converted.") raised in repr()] NonCoercaibleCategorical object at 0x113d8b6a0>, dtype = None

    def __array__(self, dtype=None):
>       raise ValueError("I cannot be converted.")
E       ValueError: I cannot be converted.

Perf:

In [3]: a = pd.Series(pd.Categorical(np.random.choice(list(string.ascii_letters[:10]), 10_000)))

In [4]: %timeit a.at[0]
143 µs ± 4.86 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [3]: a = pd.Series(pd.Categorical(np.random.choice(list(string.ascii_letters[:10]), 10_000)))

In [4]: %timeit a.at[0]
11.1 µs ± 95.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Split from #24024

BUG/PERF: Use EA in Index.get_value
Avoid converting the ExtensionArray to an ndarray.

Before

```
pandas/tests/arrays/categorical/test_indexing.py F                                                                                                   [100%]

========================================================================= FAILURES =========================================================================
______________________________________________________________________ test_series_at ______________________________________________________________________

    def test_series_at():
        arr = NonCoercaibleCategorical(['a', 'b', 'c'])
        ser = Series(arr)
>       result = ser.at[0]

pandas/tests/arrays/categorical/test_indexing.py:158:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/core/indexing.py:2266: in __getitem__
    return self.obj._get_value(*key, takeable=self._takeable)
pandas/core/series.py:1078: in _get_value
    return self.index.get_value(self._values, label)
pandas/core/indexes/base.py:4303: in get_value
    s = com.values_from_object(series)
pandas/_libs/lib.pyx:82: in pandas._libs.lib.values_from_object
    obj = func()
pandas/core/arrays/categorical.py:1509: in get_values
    return np.array(self)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <[ValueError("I cannot be converted.") raised in repr()] NonCoercaibleCategorical object at 0x113d8b6a0>, dtype = None

    def __array__(self, dtype=None):
>       raise ValueError("I cannot be converted.")
E       ValueError: I cannot be converted.

```

Perf:

```
In [3]: a = pd.Series(pd.Categorical(np.random.choice(list(string.ascii_letters[:10]), 10_000)))

In [4]: %timeit a.at[0]
143 µs ± 4.86 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [3]: a = pd.Series(pd.Categorical(np.random.choice(list(string.ascii_letters[:10]), 10_000)))

In [4]: %timeit a.at[0]
11.1 µs ± 95.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
@pep8speaks

This comment has been minimized.

Copy link

commented Dec 10, 2018

Hello @TomAugspurger! Thanks for submitting the PR.

@@ -145,3 +145,15 @@ def test_mask_with_boolean_raises(index):

with pytest.raises(ValueError, match='NA / NaN'):
s[idx]


class NonCoercaibleCategorical(Categorical):

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Dec 10, 2018

Author Contributor

I imagine this pattern would be generally useful to detect places where we accidentally convert an EA to an ndarray. What would be the best way to share this among tests? Would making a fixture that returned this be too weird? Monkeypatching?

This comment has been minimized.

Copy link
@jreback

jreback Dec 10, 2018

Contributor

could do this with a pytest monkeypatch (maybe just create an issue to follow up)

@TomAugspurger TomAugspurger referenced this pull request Dec 10, 2018

Merged

REF: DatetimeLikeArray #24024

7 of 12 tasks complete
@codecov

This comment has been minimized.

Copy link

commented Dec 10, 2018

Codecov Report

Merging #24204 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #24204   +/-   ##
=======================================
  Coverage   92.21%   92.21%           
=======================================
  Files         162      162           
  Lines       51723    51723           
=======================================
  Hits        47694    47694           
  Misses       4029     4029
Flag Coverage Δ
#multiple 90.61% <100%> (ø) ⬆️
#single 43.03% <100%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.27% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c5a4711...7a6a16d. Read the comment docs.

@codecov

This comment has been minimized.

Copy link

commented Dec 10, 2018

Codecov Report

Merging #24204 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24204      +/-   ##
==========================================
+ Coverage   92.21%   92.21%   +<.01%     
==========================================
  Files         162      162              
  Lines       51723    51761      +38     
==========================================
+ Hits        47694    47731      +37     
- Misses       4029     4030       +1
Flag Coverage Δ
#multiple 90.61% <100%> (ø) ⬆️
#single 43% <100%> (-0.03%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.27% <100%> (ø) ⬆️
pandas/core/arrays/categorical.py 95.3% <0%> (-0.1%) ⬇️
pandas/core/generic.py 96.65% <0%> (ø) ⬆️
pandas/core/arrays/sparse.py 92.08% <0%> (ø) ⬆️
pandas/core/arrays/base.py 97.41% <0%> (ø) ⬆️
pandas/core/indexes/category.py 97.9% <0%> (ø) ⬆️
pandas/core/internals/blocks.py 93.81% <0%> (+0.13%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c5a4711...49c5967. Read the comment docs.

@jreback
Copy link
Contributor

left a comment

if you'd just document what you are doing (and fix spelling), looks good to me.

Show resolved Hide resolved pandas/tests/arrays/categorical/test_indexing.py Outdated
@@ -145,3 +145,15 @@ def test_mask_with_boolean_raises(index):

with pytest.raises(ValueError, match='NA / NaN'):
s[idx]


class NonCoercaibleCategorical(Categorical):

This comment has been minimized.

Copy link
@jreback

jreback Dec 10, 2018

Contributor

could do this with a pytest monkeypatch (maybe just create an issue to follow up)

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Dec 10, 2018

All green.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Dec 10, 2018

Opened #24207 for followup.

@jreback jreback merged commit 91802fb into pandas-dev:master Dec 10, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
pandas-dev.pandas Build #20181210.50 succeeded
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 10, 2018

thanks!

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.