Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG/PERF: Use EA in Index.get_value #24204

Merged
merged 3 commits into from
Dec 10, 2018

Conversation

TomAugspurger
Copy link
Contributor

Avoid converting the ExtensionArray to an ndarray.

Before

pandas/tests/arrays/categorical/test_indexing.py F                                                                                                   [100%]

========================================================================= FAILURES =========================================================================
______________________________________________________________________ test_series_at ______________________________________________________________________

    def test_series_at():
        arr = NonCoercaibleCategorical(['a', 'b', 'c'])
        ser = Series(arr)
>       result = ser.at[0]

pandas/tests/arrays/categorical/test_indexing.py:158:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/core/indexing.py:2266: in __getitem__
    return self.obj._get_value(*key, takeable=self._takeable)
pandas/core/series.py:1078: in _get_value
    return self.index.get_value(self._values, label)
pandas/core/indexes/base.py:4303: in get_value
    s = com.values_from_object(series)
pandas/_libs/lib.pyx:82: in pandas._libs.lib.values_from_object
    obj = func()
pandas/core/arrays/categorical.py:1509: in get_values
    return np.array(self)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <[ValueError("I cannot be converted.") raised in repr()] NonCoercaibleCategorical object at 0x113d8b6a0>, dtype = None

    def __array__(self, dtype=None):
>       raise ValueError("I cannot be converted.")
E       ValueError: I cannot be converted.

Perf:

In [3]: a = pd.Series(pd.Categorical(np.random.choice(list(string.ascii_letters[:10]), 10_000)))

In [4]: %timeit a.at[0]
143 µs ± 4.86 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [3]: a = pd.Series(pd.Categorical(np.random.choice(list(string.ascii_letters[:10]), 10_000)))

In [4]: %timeit a.at[0]
11.1 µs ± 95.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Split from #24024

Avoid converting the ExtensionArray to an ndarray.

Before

```
pandas/tests/arrays/categorical/test_indexing.py F                                                                                                   [100%]

========================================================================= FAILURES =========================================================================
______________________________________________________________________ test_series_at ______________________________________________________________________

    def test_series_at():
        arr = NonCoercaibleCategorical(['a', 'b', 'c'])
        ser = Series(arr)
>       result = ser.at[0]

pandas/tests/arrays/categorical/test_indexing.py:158:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/core/indexing.py:2266: in __getitem__
    return self.obj._get_value(*key, takeable=self._takeable)
pandas/core/series.py:1078: in _get_value
    return self.index.get_value(self._values, label)
pandas/core/indexes/base.py:4303: in get_value
    s = com.values_from_object(series)
pandas/_libs/lib.pyx:82: in pandas._libs.lib.values_from_object
    obj = func()
pandas/core/arrays/categorical.py:1509: in get_values
    return np.array(self)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <[ValueError("I cannot be converted.") raised in repr()] NonCoercaibleCategorical object at 0x113d8b6a0>, dtype = None

    def __array__(self, dtype=None):
>       raise ValueError("I cannot be converted.")
E       ValueError: I cannot be converted.

```

Perf:

```
In [3]: a = pd.Series(pd.Categorical(np.random.choice(list(string.ascii_letters[:10]), 10_000)))

In [4]: %timeit a.at[0]
143 µs ± 4.86 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [3]: a = pd.Series(pd.Categorical(np.random.choice(list(string.ascii_letters[:10]), 10_000)))

In [4]: %timeit a.at[0]
11.1 µs ± 95.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
@TomAugspurger TomAugspurger added Performance Memory or execution speed performance ExtensionArray Extending pandas with custom dtypes or arrays. labels Dec 10, 2018
@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Dec 10, 2018
@pep8speaks
Copy link

Hello @TomAugspurger! Thanks for submitting the PR.

@@ -145,3 +145,15 @@ def test_mask_with_boolean_raises(index):

with pytest.raises(ValueError, match='NA / NaN'):
s[idx]


class NonCoercaibleCategorical(Categorical):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine this pattern would be generally useful to detect places where we accidentally convert an EA to an ndarray. What would be the best way to share this among tests? Would making a fixture that returned this be too weird? Monkeypatching?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could do this with a pytest monkeypatch (maybe just create an issue to follow up)

@TomAugspurger TomAugspurger mentioned this pull request Dec 10, 2018
12 tasks
@codecov
Copy link

codecov bot commented Dec 10, 2018

Codecov Report

Merging #24204 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #24204   +/-   ##
=======================================
  Coverage   92.21%   92.21%           
=======================================
  Files         162      162           
  Lines       51723    51723           
=======================================
  Hits        47694    47694           
  Misses       4029     4029
Flag Coverage Δ
#multiple 90.61% <100%> (ø) ⬆️
#single 43.03% <100%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.27% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c5a4711...7a6a16d. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 10, 2018

Codecov Report

Merging #24204 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24204      +/-   ##
==========================================
+ Coverage   92.21%   92.21%   +<.01%     
==========================================
  Files         162      162              
  Lines       51723    51761      +38     
==========================================
+ Hits        47694    47731      +37     
- Misses       4029     4030       +1
Flag Coverage Δ
#multiple 90.61% <100%> (ø) ⬆️
#single 43% <100%> (-0.03%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.27% <100%> (ø) ⬆️
pandas/core/arrays/categorical.py 95.3% <0%> (-0.1%) ⬇️
pandas/core/generic.py 96.65% <0%> (ø) ⬆️
pandas/core/arrays/sparse.py 92.08% <0%> (ø) ⬆️
pandas/core/arrays/base.py 97.41% <0%> (ø) ⬆️
pandas/core/indexes/category.py 97.9% <0%> (ø) ⬆️
pandas/core/internals/blocks.py 93.81% <0%> (+0.13%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c5a4711...49c5967. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you'd just document what you are doing (and fix spelling), looks good to me.

pandas/tests/arrays/categorical/test_indexing.py Outdated Show resolved Hide resolved
@@ -145,3 +145,15 @@ def test_mask_with_boolean_raises(index):

with pytest.raises(ValueError, match='NA / NaN'):
s[idx]


class NonCoercaibleCategorical(Categorical):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could do this with a pytest monkeypatch (maybe just create an issue to follow up)

@TomAugspurger
Copy link
Contributor Author

All green.

@TomAugspurger
Copy link
Contributor Author

Opened #24207 for followup.

@jreback jreback merged commit 91802fb into pandas-dev:master Dec 10, 2018
@jreback
Copy link
Contributor

jreback commented Dec 10, 2018

thanks!

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants