DOC: Replace .values with .to_numpy() in enhancingperf #26313

Merged
merged 2 commits into from
May 8, 2019

huizew
Contributor

@huizew huizew commented May 8, 2019

As suggested in #24807 (comment)

Replace .values with .to_numpy() in the benchmark demonstration code.
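For readers unfamiliar with the two accessors, here is a minimal sketch of the kind of substitution this PR makes. The DataFrame and its column names are made up for illustration and are not taken from the guide itself. For a plain numeric column both accessors yield the same NumPy array; the difference shows up with extension types such as categoricals, where `.values` returns a pandas array while `.to_numpy()` returns an `ndarray`.

```python
import numpy as np
import pandas as pd

# Illustrative data only; not the frame used in the performance guide.
df = pd.DataFrame({"a": np.random.randn(1000), "b": np.random.randn(1000)})

# Old spelling and new spelling of the same extraction.
old_style = df["a"].values
new_style = df["a"].to_numpy()

# For a float64 column the two results hold identical data.
same_data = np.array_equal(old_style, new_style)

# With a categorical column, .values is a pandas Categorical,
# whereas .to_numpy() is a NumPy ndarray.
cat = pd.Series(pd.Categorical(["x", "y", "x"]))
values_is_ndarray = isinstance(cat.values, np.ndarray)
to_numpy_is_ndarray = isinstance(cat.to_numpy(), np.ndarray)
```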

@huizew huizew changed the title DOC: Replace .values with .to_numpy() DOC: Replace .values with .to_numpy() in enhancingperf May 8, 2019
@WillAyd
Member

WillAyd commented May 8, 2019

Thanks for the PR! How much effort do you think it would be to swap out all of these instances across the documentation?

@codecov

codecov bot commented May 8, 2019

Codecov Report

Merging #26313 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #26313      +/-   ##
==========================================
- Coverage   92.04%   92.03%   -0.01%     
==========================================
  Files         175      175              
  Lines       52302    52302              
==========================================
- Hits        48142    48137       -5     
- Misses       4160     4165       +5

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | 90.59% <ø> (ø) | ⬆️ |
| #single | 40.73% <ø> (-0.17%) | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/io/gbq.py | 78.94% <0%> (-10.53%) | ⬇️ |
| pandas/core/frame.py | 97.01% <0%> (-0.12%) | ⬇️ |
| pandas/util/testing.py | 90.6% <0%> (-0.11%) | ⬇️ |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 20fa58d...05789ca. Read the comment docs.

@huizew
Contributor Author

huizew commented May 8, 2019

@WillAyd All the .values in this file have been replaced.

I took a look at the other files under pandas/doc/source. Apart from the whatsnew folder, no other file uses .values for pandas objects. And I suppose the whatsnew files shouldn't be changed, right?

Fix: after replacing .values with .to_numpy(), some lines are too long to pass the line-length check.
@WillAyd WillAyd added the Docs label May 8, 2019
Member

@WillAyd WillAyd left a comment

Great, thanks for checking. I think this looks good - @gfyoung any thoughts?

@WillAyd WillAyd added this to the 0.25.0 milestone May 8, 2019
@gfyoung
Member

gfyoung commented May 8, 2019

For our own edification, can those benchmark numbers be double-checked (i.e. the ones that follow the timeit commands)? Are those still approximately correct when we swap in to_numpy?

And how much better is it vs. using .values?

@huizew
Contributor Author

huizew commented May 8, 2019

Thanks for the review.

I just checked that changing to .to_numpy() doesn’t change the benchmark time very much (less than 1%), and all the points that this guide tries to demonstrate still hold.

The performance comparison between .to_numpy() and .values is not discussed in this guide page. The difference seems negligible compared to the other things discussed there, such as Cython and JIT compilation. Personally, I think the main reason behind this PR is to follow the documentation/community’s encouragement to use .to_numpy() rather than .values.
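A check along these lines could be reproduced with a rough sketch like the one below. It is not the benchmark from the guide: the frame size, the squared-sum workload, and the helper name `best_time` are assumptions chosen for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Illustrative frame; the guide's own benchmarks use different data.
df = pd.DataFrame({"a": np.random.randn(100_000)})


def best_time(stmt, number=200, repeat=5):
    """Return the best per-call time in seconds for `stmt` (hypothetical helper)."""
    timings = timeit.repeat(
        stmt, globals={"df": df, "np": np}, number=number, repeat=repeat
    )
    return min(timings) / number


# Same NumPy workload, with the array obtained through each accessor.
t_values = best_time("np.sum(df['a'].values ** 2)")
t_to_numpy = best_time("np.sum(df['a'].to_numpy() ** 2)")

print(f".values:     {t_values * 1e6:.1f} µs per call")
print(f".to_numpy(): {t_to_numpy * 1e6:.1f} µs per call")
```

Taking the minimum over several repeats reduces noise from other processes, which matters when the expected difference is small.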

@gfyoung
Member

gfyoung commented May 8, 2019

> Personally I think the main reason behind this PR is to follow the documentation/community’s encouragement to use .to_numpy() rather than .values

True, but always good to double check to make sure we aren't actually proposing a performance regression in our docs.

@gfyoung gfyoung merged commit 7bfbd81 into pandas-dev:master May 8, 2019
@gfyoung
Member

gfyoung commented May 8, 2019

Thanks @huizew!
