PERF: improve speed of nans in CategoricalIndex #21493

Merged

topper-123
Contributor

@topper-123 topper-123 commented Jun 15, 2018

This is a minor follow-up to #21369.

>>> import numpy as np
>>> import pandas as pd
>>> n = 100_000
>>> ci = pd.CategoricalIndex(['a']*n + ['b']*n + ['c']*n + [np.nan])
>>> %timeit np.nan in ci
19.5 us  # master
114 ns  # this PR

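Using self.hasnans to check for nans is faster than self.isna().any() because hasnans is cached: after the first access, repeated checks reuse the stored result instead of scanning the data again.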
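
To illustrate why the cached attribute wins, here is a minimal pure-Python sketch. ToyIndex and its scans counter are hypothetical names made up for this example; this is not the pandas implementation, and functools.cached_property stands in for whatever caching pandas uses internally.

```python
import math
from functools import cached_property


class ToyIndex:
    """Hypothetical stand-in for an index; not pandas code."""

    def __init__(self, values):
        self._values = list(values)
        self.scans = 0  # number of times the NaN check was recomputed

    @staticmethod
    def _is_nan(value):
        return isinstance(value, float) and math.isnan(value)

    def isna_any(self):
        # Uncached: recomputed on every call.
        self.scans += 1
        return any(self._is_nan(v) for v in self._values)

    @cached_property
    def hasnans(self):
        # Cached: computed on first access, then stored on the instance.
        self.scans += 1
        return any(self._is_nan(v) for v in self._values)

    def __contains__(self, key):
        # NaN lookups go through the cached flag; other keys are searched.
        if self._is_nan(key):
            return self.hasnans
        return key in self._values


idx = ToyIndex(["a"] * 1000 + ["b"] * 1000 + [float("nan")])
for _ in range(5):
    assert float("nan") in idx
print(idx.scans)  # 1: only the first lookup computed the flag
```

Calling idx.isna_any() instead would recompute the check on every lookup, which is roughly the cost the master timing above reflects.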

@codecov

codecov bot commented Jun 15, 2018

Codecov Report

Merging #21493 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21493      +/-   ##
==========================================
+ Coverage   91.89%   91.89%   +<.01%     
==========================================
  Files         153      153              
  Lines       49604    49604              
==========================================
+ Hits        45584    45586       +2     
+ Misses       4020     4018       -2

Flag        Coverage Δ
#multiple   90.3% <100%> (ø) ⬆️
#single     41.88% <0%> (ø) ⬆️

Impacted Files                     Coverage Δ
pandas/core/indexes/category.py    97.09% <100%> (ø) ⬆️
pandas/util/testing.py             84.81% <0%> (+0.2%) ⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bf1c3dc...4b868bf.

@jreback jreback added this to the 0.23.2 milestone Jun 15, 2018
@jreback jreback added the Performance (Memory or execution speed performance) and Categorical (Categorical Data Type) labels Jun 15, 2018
@jreback jreback merged commit 4918829 into pandas-dev:master Jun 15, 2018
@jreback
Contributor

jreback commented Jun 15, 2018

thanks @topper-123

@topper-123 topper-123 deleted the speed_nans_in_CategoricalIndex branch June 17, 2018 09:21
david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.23.2, 0.24.0 Jul 2, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018