New benchmarks for parsing dates with dayfirst=True and format='%d-%m-%Y' #26360

anmyachev · 2019-05-12T20:47:17Z

- closes #25848 (read_csv: using day first 23x to 35x slower than setting the format explicitly)
- tests added
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

The performance gap between parsing with `dayfirst=True` and parsing with an explicit `format` is now acceptable.
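For a quick check outside asv, the same comparison can be sketched with `timeit`. This is a minimal sketch, not the PR's benchmark: the 5,000 generated dates, the helper names `parse_dayfirst`, `parse_format` and `read_dayfirst`, and the repeat counts are illustrative assumptions, and absolute timings will depend on the pandas version and machine.

```python
import timeit
from io import StringIO

import pandas as pd

# Illustrative input (an assumption, not the PR's data): 5,000 consecutive
# days rendered as unique day-first strings such as "13-01-2000".
dates = pd.date_range("2000-01-01", periods=5_000, freq="D")
strings = pd.Series(dates.strftime("%d-%m-%Y"))
csv_text = "Date\n" + "\n".join(strings)


def parse_dayfirst():
    # Let pandas work out the layout, hinting only that the day comes first.
    return pd.to_datetime(strings, dayfirst=True)


def parse_format():
    # Spell out the exact layout so no inference is needed.
    return pd.to_datetime(strings, format="%d-%m-%Y")


def read_dayfirst():
    # Parse while reading the CSV, the read_csv case described in #25848.
    return pd.read_csv(StringIO(csv_text), parse_dates=["Date"], dayfirst=True)


for name, func in [
    ("to_datetime(dayfirst=True)", parse_dayfirst),
    ("to_datetime(format='%d-%m-%Y')", parse_format),
    ("read_csv(dayfirst=True)", read_dayfirst),
]:
    # Best of 3 repeats, averaged over 3 calls each.
    per_call = min(timeit.repeat(func, number=3, repeat=3)) / 3
    print(f"{name}: {per_call * 1000:.2f} ms")
```

Because every string here is unique, the date cache has little to reuse, so this mostly contrasts day-first parsing against an explicit format string.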

Results of `asv run -E existing -b ParseDateComparison -a warmup_time=2 -a sample_time=2`:

| Benchmark | cache_dates=False | cache_dates=True |
|---|---|---|
| io.csv.ParseDateComparison.time_read_csv_dayfirst | 16.3±0.4ms | 7.22±0.2ms |
| io.csv.ParseDateComparison.time_to_datetime_dayfirst | 15.0±0.3ms | 5.67±0.2ms |
| io.csv.ParseDateComparison.time_to_datetime_format_DD_MM_YYYY | 25.0±0.4ms | 5.47±0.4ms |
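The three benchmark names above map naturally onto an asv benchmark class parameterized over `cache_dates`. The sketch below is one possible shape for such a class, not the code merged in this PR: the fixture (10,000 copies of a single day-first date), the `csv_text`/`strings` attributes, and pre-reading the strings in `setup` are illustrative assumptions.

```python
from io import StringIO

import pandas as pd


class ParseDateComparison:
    # asv runs setup() and each time_* method once per value listed in
    # `params`, passing that value in as the `cache_dates` argument.
    params = ([False, True],)
    param_names = ["cache_dates"]

    def setup(self, cache_dates):
        # Assumed fixture: 10,000 copies of one day-first date, so the
        # date cache (when enabled) has plenty of repeats to reuse.
        self.csv_text = "12-02-2010\n" * 10_000
        # Pre-read the column as strings so the to_datetime benchmarks
        # time only the date conversion, not the CSV read.
        self.strings = pd.read_csv(
            StringIO(self.csv_text), header=None, names=["Date"], dtype=str
        )["Date"]

    def time_read_csv_dayfirst(self, cache_dates):
        pd.read_csv(
            StringIO(self.csv_text),
            header=None,
            names=["Date"],
            parse_dates=["Date"],
            dayfirst=True,
            cache_dates=cache_dates,  # read_csv keyword added in pandas 0.25
        )

    def time_to_datetime_dayfirst(self, cache_dates):
        pd.to_datetime(self.strings, dayfirst=True, cache=cache_dates)

    def time_to_datetime_format_DD_MM_YYYY(self, cache_dates):
        pd.to_datetime(self.strings, format="%d-%m-%Y", cache=cache_dates)
```

For a quick smoke test without asv, instantiate the class, call `setup(True)`, then call any `time_*` method with `True`; under asv, a `-b ParseDateComparison` filter selects the class by regular expression.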

jreback · 2019-05-12T20:50:02Z

Great, can you show the results for these in the top section?

codecov · 2019-05-12T21:39:28Z

Codecov Report

Merging #26360 into master will decrease coverage by 0.13%.
The diff coverage is n/a.

```diff
@@            Coverage Diff             @@
##           master   #26360      +/-   ##
==========================================
- Coverage   91.81%   91.68%   -0.14%
==========================================
  Files         175      174       -1
  Lines       52289    50700    -1589
==========================================
- Hits        48009    46482    -1527
+ Misses       4280     4218      -62
```

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.18% <ø> (-0.18%)` | ⬇️ |
| #single | `41.19% <ø> (+0.32%)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/io/gbq.py | `78.94% <0%> (-10.53%)` | ⬇️ |
| pandas/core/frame.py | `97.01% <0%> (-0.12%)` | ⬇️ |
| pandas/io/parsers.py | | |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 4a30fa5...3b1034b.

codecov · 2019-05-12T21:39:28Z

Codecov Report

Merging #26360 into master will decrease coverage by 0.12%.
The diff coverage is n/a.

```diff
@@            Coverage Diff             @@
##           master   #26360      +/-   ##
==========================================
- Coverage   91.81%   91.68%   -0.13%
==========================================
  Files         175      174       -1
  Lines       52289    50749    -1540
==========================================
- Hits        48009    46530    -1479
+ Misses       4280     4219      -61
```

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.19% <ø> (-0.17%)` | ⬇️ |
| #single | `41.16% <ø> (+0.29%)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/io/gbq.py | `78.94% <0%> (-10.53%)` | ⬇️ |
| pandas/compat/numpy/function.py | `90.39% <0%> (-0.41%)` | ⬇️ |
| pandas/core/frame.py | `97.02% <0%> (-0.12%)` | ⬇️ |
| pandas/core/series.py | `93.67% <0%> (ø)` | ⬆️ |
| pandas/io/parsers.py | | |
| pandas/core/indexes/base.py | `96.72% <0%> (ø)` | ⬆️ |
| pandas/core/arrays/integer.py | `96.35% <0%> (+0.02%)` | ⬆️ |
| pandas/core/sparse/frame.py | `95.63% <0%> (+0.14%)` | ⬆️ |
| pandas/core/arrays/sparse.py | `92.7% <0%> (+0.39%)` | ⬆️ |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 4a30fa5...e33284d.

jreback · 2019-05-14T13:10:45Z

@anmyachev

> @jreback Should I make the conditions for testing the same? (For this, I will most likely have to create new classes.)

Yes, making these consistent for the benchmarks would be great (and testing with the cache both on and off).

anmyachev · 2019-05-14T17:03:14Z

> Yes, making these consistent for the benchmarks would be great (and testing with the cache both on and off).

@jreback this is done.

jreback · 2019-05-14T17:13:05Z

thanks @anmyachev

anmyachev added 2 commits May 12, 2019 23:22

- added 'dayfirst' parameter to 'ReadCSVParseSpecialDate' benchmark (dd369e0)
- added benchmark for 'to_datetime' function with '%d-%m-%Y' format (3b1034b)

jreback added the IO CSV, Performance, and Timeseries labels May 12, 2019

jreback added this to the 0.25.0 milestone May 12, 2019

anmyachev added 2 commits May 14, 2019 18:33

- created 'ParseDateComparison' class for asv testing (31aa606)
- rollback first benchmarks (e33284d)

jreback merged commit e5d15b2 into pandas-dev:master May 14, 2019
