Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove property that re-computed microsecond #17331

Merged
merged 6 commits into from
Sep 7, 2017

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented Aug 24, 2017

Just accessing the already-existing attribute is about 50-130x faster than re-computing it. These results are from a 2011-era MBP.

Before:

In [1]: import pandas as pd
In [2]: ts = pd.Timestamp.now()
In [3]: %timeit ts.microsecond
The slowest run took 4.69 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 29.7 µs per loop

After:

In [1]: import pandas as pd
In [2]: ts = pd.Timestamp.now()
In [3]: %timeit ts.microsecond
The slowest run took 8.08 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 620 ns per loop

This accounts for the discrepancy discussed in #17234 (#17297 (review))

Per requests from @jreback, I'm refraining from moving forward with any other optimizations that this makes possible (e.g. removing the roundabout call to convert_to_tsobject from Timestamp.to_pydatetime).

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

just accessing the already-existing attribute is about 10x faster than re-computing it
@jbrockmendel jbrockmendel changed the title Remove property that re-computed microsecond; Remove property that re-computed microsecond Aug 24, 2017
@gfyoung gfyoung added Enhancement Performance Memory or execution speed performance labels Aug 24, 2017
@gfyoung
Copy link
Member

gfyoung commented Aug 24, 2017

@jbrockmendel : Could you post performance metrics in this PR just for reference?

@jbrockmendel
Copy link
Member Author

See update above.

@jbrockmendel
Copy link
Member Author

The error on Travis here is identical to the one in #17318. I can't reproduce it locally in either branch. Is it plausible that the issue is with Travis?

@jreback
Copy link
Contributor

jreback commented Aug 25, 2017

The error on Travis here is identical to the one in #17318. I can't reproduce it locally in either branch. Is it plausible that the issue is with Travis?

No, the issue is here: #17323

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a whatsnew note and asv (and add asv for all Timestamp properties)

@pep8speaks
Copy link

pep8speaks commented Aug 25, 2017

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 29, 2017 at 14:47 Hours UTC

@codecov
Copy link

codecov bot commented Aug 25, 2017

Codecov Report

Merging #17331 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17331      +/-   ##
==========================================
- Coverage   91.01%   90.99%   -0.02%     
==========================================
  Files         162      162              
  Lines       49567    49567              
==========================================
- Hits        45113    45104       -9     
- Misses       4454     4463       +9
Flag Coverage Δ
#multiple 88.77% <ø> (ø) ⬆️
#single 40.24% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.72% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 96f92eb...26c0980. Read the comment docs.

@codecov
Copy link

codecov bot commented Aug 25, 2017

Codecov Report

Merging #17331 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17331      +/-   ##
==========================================
- Coverage   91.01%   90.99%   -0.02%     
==========================================
  Files         162      163       +1     
  Lines       49567    49561       -6     
==========================================
- Hits        45113    45098      -15     
- Misses       4454     4463       +9
Flag Coverage Δ
#multiple 88.77% <ø> (-0.01%) ⬇️
#single 40.25% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.72% <0%> (-0.1%) ⬇️
pandas/core/base.py 95.96% <0%> (-0.07%) ⬇️
pandas/core/series.py 94.95% <0%> (-0.05%) ⬇️
pandas/core/strings.py 98.45% <0%> (-0.03%) ⬇️
pandas/core/indexing.py 93.94% <0%> (ø) ⬆️
pandas/core/accessor.py 100% <0%> (ø)
pandas/core/generic.py 91.99% <0%> (ø) ⬆️
pandas/core/indexes/base.py 95.88% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 96f92eb...9919df4. Read the comment docs.

@jbrockmendel
Copy link
Member Author

Just pushed with asv and note in whatsnew. Two runs of asv results:

[100.00%] ··· Running timestamp.TimestampProperties.time_weekday_name                                                                                                                                                                                     26.8±0.9μs       before           after         ratio
     [96f92eb1]       [48a59233]
+        34.7±1μs        70.6±10μs     2.03  timestamp.TimestampProperties.time_is_quarter_end
+      34.4±0.8μs         57.4±2μs     1.67  timestamp.TimestampProperties.time_is_quarter_start
+        35.7±2μs         49.9±2μs     1.40  timestamp.TimestampProperties.time_is_year_end
+     1.18±0.02μs      1.58±0.05μs     1.35  timestamp.TimestampProperties.time_freqstr
+      31.5±0.4μs       38.8±0.3μs     1.23  timestamp.TimestampProperties.time_is_month_end
+      30.7±0.5μs         37.7±1μs     1.23  timestamp.TimestampProperties.time_days_in_month
+      33.3±0.6μs         38.8±1μs     1.17  timestamp.TimestampProperties.time_is_month_start
-      30.6±0.8μs         473±10ns     0.02  timestamp.TimestampProperties.time_microsecond
[100.00%] ··· Running timestamp.TimestampProperties.time_weekday_name                                                                                                                                                                                     30.2±0.7μs       before           after         ratio
     [96f92eb1]       [48a59233]
-        27.0±2μs       20.5±0.5μs     0.76  timestamp.TimestampProperties.time_week
-        399±20ns          301±8ns     0.75  timestamp.TimestampProperties.time_tz
-        27.8±6μs       20.6±0.4μs     0.74  timestamp.TimestampProperties.time_quarter
-        36.3±4μs       23.5±0.8μs     0.65  timestamp.TimestampProperties.time_is_year_end
-      30.2±0.7μs       15.4±0.3μs     0.51  timestamp.TimestampProperties.time_weekday_name
-      22.7±0.7μs          239±8ns     0.01  timestamp.TimestampProperties.time_microsecond

microsecond is the only one that I should be changed, so it's reassuring that it is an order of magnitude bigger change than everything else.

@jbrockmendel
Copy link
Member Author

Looks like the Travis error is the same OSX error that is happening elsewhere. The AppVeyor error message looked like an HTTP timeout.

@@ -325,7 +325,7 @@ Performance Improvements

- Improved performance of instantiating :class:`SparseDataFrame` (:issue:`16773`)
- :attr:`Series.dt` no longer performs frequency inference, yielding a large speedup when accessing the attribute (:issue:`17210`)

- `Timestamp.microsecond` no longer re-computes on attribute access.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the PR number as the issue. we always want a reference to an entry in whatsnew (issue preferd, or PR if not avail).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either use double-backticks, or :attr:`Timestamp.microsecond` (in general prefer the api references). I don't think this API reference will actually work ATM as Timestamp is not defined in api.rst (so separate issue to rectify that).

@jbrockmendel
Copy link
Member Author

What's the syntax for referencing a PR? For an Issue I'd use (:issue:`17331`). should I just replace "issue" with "pull" here?

@jreback
Copy link
Contributor

jreback commented Aug 29, 2017

same syntax just use the pr number

@jreback jreback merged commit 93e23a7 into pandas-dev:master Sep 7, 2017
@jreback
Copy link
Contributor

jreback commented Sep 7, 2017

thanks!

jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request Sep 10, 2017
@jbrockmendel jbrockmendel deleted the microsecond branch October 30, 2017 16:23
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants