New benchmarks for parsing dates with dayfirst=True and format='%d-%m-%Y' #26360

Merged
merged 4 commits into from May 14, 2019

Contributor

@anmyachev anmyachev commented May 12, 2019

The performance gap between using dayfirst and using format is now acceptable.

asv run -E existing -b ParseDateComparison -a warmup_time=2 -a sample_time=2:

io.csv.ParseDateComparison.time_read_csv_dayfirst
  cache_dates   time
  False         16.3±0.4ms
  True          7.22±0.2ms

io.csv.ParseDateComparison.time_to_datetime_dayfirst
  cache_dates   time
  False         15.0±0.3ms
  True          5.67±0.2ms

io.csv.ParseDateComparison.time_to_datetime_format_DD_MM_YYYY
  cache_dates   time
  False         25.0±0.4ms
  True          5.47±0.4ms
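
For readers who want to reproduce the comparison outside asv, here is a minimal sketch of what the three benchmark names suggest is being measured: parsing the same DD-MM-YYYY strings with `dayfirst=True` and with an explicit `format`, with the cache on and off. It is an assumption, not the benchmark code from this PR; the helper names (`make_dates`, `parse_dayfirst`, `parse_format`, `parse_read_csv_dayfirst`), the sample date string, and the row count are all hypothetical.

```python
from io import StringIO
import timeit

import pandas as pd


def make_dates(n=10_000):
    # Hypothetical input: one repeated DD-MM-YYYY string (12 February 2010).
    return pd.Series(["12-02-2010"] * n)


def parse_dayfirst(dates, cache):
    # Let pandas infer the layout, telling it the day comes first.
    return pd.to_datetime(dates, dayfirst=True, cache=cache)


def parse_format(dates, cache):
    # Give pandas the exact layout instead of letting it infer one.
    return pd.to_datetime(dates, format="%d-%m-%Y", cache=cache)


def parse_read_csv_dayfirst(text):
    # Same parsing, but done by read_csv while loading the column.
    frame = pd.read_csv(
        StringIO(text),
        header=None,
        names=["Date"],
        parse_dates=["Date"],
        dayfirst=True,
    )
    return frame["Date"]


if __name__ == "__main__":
    dates = make_dates()
    for cache in (False, True):
        t_dayfirst = timeit.timeit(lambda: parse_dayfirst(dates, cache), number=5)
        t_format = timeit.timeit(lambda: parse_format(dates, cache), number=5)
        print(f"cache={cache}: dayfirst={t_dayfirst:.4f}s format={t_format:.4f}s")
```

Both `to_datetime` paths should yield the same timestamps, so any difference in the timings reflects parsing strategy rather than output. Absolute numbers will vary with the machine and the pandas version.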

@jreback
Contributor

jreback commented May 12, 2019

Great, can you show results for these in the top section?

@jreback jreback added the IO CSV, Performance, and Timeseries labels May 12, 2019
@jreback jreback added this to the 0.25.0 milestone May 12, 2019
@codecov

codecov bot commented May 12, 2019

Codecov Report

Merging #26360 into master will decrease coverage by 0.13%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26360      +/-   ##
==========================================
- Coverage   91.81%   91.68%   -0.14%     
==========================================
  Files         175      174       -1     
  Lines       52289    50700    -1589     
==========================================
- Hits        48009    46482    -1527     
+ Misses       4280     4218      -62
Flag Coverage Δ
#multiple 90.18% <ø> (-0.18%) ⬇️
#single 41.19% <ø> (+0.32%) ⬆️
Impacted Files Coverage Δ
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 97.01% <0%> (-0.12%) ⬇️
pandas/io/parsers.py

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4a30fa5...3b1034b. Read the comment docs.

@codecov

codecov bot commented May 12, 2019

Codecov Report

Merging #26360 into master will decrease coverage by 0.12%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26360      +/-   ##
==========================================
- Coverage   91.81%   91.68%   -0.13%     
==========================================
  Files         175      174       -1     
  Lines       52289    50749    -1540     
==========================================
- Hits        48009    46530    -1479     
+ Misses       4280     4219      -61
Flag Coverage Δ
#multiple 90.19% <ø> (-0.17%) ⬇️
#single 41.16% <ø> (+0.29%) ⬆️
Impacted Files Coverage Δ
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/compat/numpy/function.py 90.39% <0%> (-0.41%) ⬇️
pandas/core/frame.py 97.02% <0%> (-0.12%) ⬇️
pandas/core/series.py 93.67% <0%> (ø) ⬆️
pandas/io/parsers.py
pandas/core/indexes/base.py 96.72% <0%> (ø) ⬆️
pandas/core/arrays/integer.py 96.35% <0%> (+0.02%) ⬆️
pandas/core/sparse/frame.py 95.63% <0%> (+0.14%) ⬆️
pandas/core/arrays/sparse.py 92.7% <0%> (+0.39%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4a30fa5...e33284d. Read the comment docs.

@jreback
Contributor

jreback commented May 14, 2019

@anmyachev

@jreback Should I make the conditions for testing the same? (for this, most likely I will have to create new classes)

Yes, making these consistent across the benchmarks would be great (as would testing with the cache both on and off).

@anmyachev
Contributor Author

yes making these consistent for benchmarks (and testing both cache on/off would be great)

@jreback this is done.
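
As an illustration of what "consistent conditions" with the cache on and off can look like in asv, here is a hedged sketch of a parameterized benchmark class. It is not the code merged in this PR: the class name `ParseDateComparisonSketch`, the sample data, and the row count are hypothetical, while the method names and the `cache_dates` parameter name are taken from the benchmark output quoted in this thread. asv runs each `time_*` method once per value in `params`, after calling `setup` with that value.

```python
from io import StringIO

import pandas as pd


class ParseDateComparisonSketch:
    # asv calls setup() and every time_* method once per value listed here.
    params = [False, True]
    param_names = ["cache_dates"]

    def setup(self, cache_dates):
        # Hypothetical input shared by all three benchmarks, so that each
        # one parses exactly the same DD-MM-YYYY strings.
        self.text = "12-02-2010\n" * 10_000
        self.dates = pd.Series(self.text.split())

    def time_read_csv_dayfirst(self, cache_dates):
        pd.read_csv(
            StringIO(self.text),
            header=None,
            names=["Date"],
            parse_dates=["Date"],
            dayfirst=True,
            cache_dates=cache_dates,
        )

    def time_to_datetime_dayfirst(self, cache_dates):
        pd.to_datetime(self.dates, dayfirst=True, cache=cache_dates)

    def time_to_datetime_format_DD_MM_YYYY(self, cache_dates):
        pd.to_datetime(self.dates, format="%d-%m-%Y", cache=cache_dates)
```

Building the input once in `setup` keeps data construction out of the timed region, so the three methods differ only in how the dates are parsed.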

@jreback jreback merged commit e5d15b2 into pandas-dev:master May 14, 2019
@jreback
Contributor

jreback commented May 14, 2019

thanks @anmyachev

Labels
IO CSV read_csv, to_csv Performance Memory or execution speed performance Timeseries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

read_csv : using day first 23x to 35x slower than setting the format explicitly
2 participants