PERF: faster dir() calls #37450

Merged: 6 commits into pandas-dev:master on Oct 29, 2020

topper-123 (Contributor) commented Oct 27, 2020

I experience slow tab completion in IPython, so I've optimized dir() calls in pandas, e.g.

>>> n = 100_000
>>> ser = pd.Series(['a'] * n)
>>> %timeit dir(ser)
3.73 ms ± 34.7 µs per loop  # master
253 µs ± 4.3 µs per loop  # this PR

It does this by caching the dir() additions on the info axis when that axis may hold strings, and by returning an empty set for info axes that cannot hold strings (numeric indexes, etc.).

The above didn't actually improve the subjective speed of tab completion for me, so the problem there is probably in IPython, but the change in this PR can't hurt either.
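The caching scheme described in this comment can be sketched without pandas. This is only an illustration under stated assumptions, not the code in this PR: `ToyIndex`, `ToyFrame`, `can_hold_strings`, `dir_additions_for_owner` and `compute_count` are hypothetical names, and `functools.cached_property` stands in for pandas' `cache_readonly`.

```python
from functools import cached_property


class ToyIndex:
    """Immutable stand-in for an axis of labels (hypothetical, not pandas.Index)."""

    def __init__(self, labels, can_hold_strings=True):
        self._labels = tuple(labels)
        self.can_hold_strings = can_hold_strings
        self.compute_count = 0  # counts how often the label scan actually runs

    @cached_property
    def dir_additions_for_owner(self):
        # Scan at most the first 100 unique labels and keep valid identifiers.
        self.compute_count += 1
        unique = list(dict.fromkeys(self._labels))[:100]
        return {c for c in unique if isinstance(c, str) and c.isidentifier()}


class ToyFrame:
    """Owner object whose info axis (columns) supplies extra dir() entries."""

    def __init__(self, columns):
        self.columns = columns

    def __dir__(self):
        names = set(super().__dir__())
        # Axes that cannot hold strings contribute nothing, so skip the scan.
        if self.columns.can_hold_strings:
            names |= self.columns.dir_additions_for_owner
        return sorted(names)


frame = ToyFrame(ToyIndex(["price", "unit cost", "price", 2020]))
first = dir(frame)
second = dir(frame)  # served from the cache on the index

numeric = ToyFrame(ToyIndex(range(5), can_hold_strings=False))
numeric_dir = dir(numeric)  # never scans the labels
```

In this sketch, repeated `dir()` calls on the same object reuse the set computed on the first call, and the numeric case never iterates over the labels at all.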

            for c in self._info_axis.unique(level=0)[:100]
            if isinstance(c, str) and c.isidentifier()
        }
        additions = self._info_axis._dir_additions_for_owner
Contributor

instead of adding all of this code, why can't you slap a cache_readonly here?

Contributor Author

That's not possible, because the axes (index/columns) can be changed. However, the axis labels are immutable, so it's safe to use cache_readonly there.

I've changed the PR to avoid the addition in child classes.
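The point about where the cache may live can be shown with a small, pandas-free sketch. All class and attribute names here are hypothetical, and `functools.cached_property` is used in place of pandas' `cache_readonly`. Caching on the owner goes stale when the axis is replaced, while caching on the immutable label object does not, because a replaced axis is a new object with its own empty cache.

```python
from functools import cached_property


class Labels:
    """Hypothetical immutable label container, playing the role of an axis."""

    def __init__(self, values):
        self._values = tuple(values)

    @cached_property
    def identifiers(self):
        # Safe to cache: the labels never change after construction.
        return frozenset(
            v for v in self._values if isinstance(v, str) and v.isidentifier()
        )


class CachedOnOwner:
    """Problem case: the cache lives on the owner, whose axis can be swapped."""

    def __init__(self, labels):
        self.labels = labels

    @cached_property
    def completions(self):
        return self.labels.identifiers


class CachedOnLabels:
    """The owner delegates to the cache held by the immutable labels."""

    def __init__(self, labels):
        self.labels = labels

    @property
    def completions(self):
        return self.labels.identifiers


stale = CachedOnOwner(Labels(["a", "b"]))
stale.completions                   # populates the owner-level cache
stale.labels = Labels(["x", "y"])   # axis replaced, cache not invalidated

fresh = CachedOnLabels(Labels(["a", "b"]))
fresh.completions                   # populates the cache on the first Labels
fresh.labels = Labels(["x", "y"])   # new Labels object brings its own cache

print(sorted(stale.completions))    # still reflects the old axis
print(sorted(fresh.completions))    # reflects the new axis
```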

jreback added the Performance label (memory or execution speed performance) on Oct 29, 2020
jreback added this to the 1.2 milestone on Oct 29, 2020
jreback merged commit d2c0674 into pandas-dev:master on Oct 29, 2020
jreback (Contributor) commented Oct 29, 2020

thanks @topper-123

topper-123 deleted the accessor_perf branch on October 29, 2020 14:01
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
ukarroum pushed a commit to ukarroum/pandas that referenced this pull request Nov 2, 2020