Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: added new cache_dates parameter for read_csv func #25990

Merged
merged 9 commits into from May 7, 2019

Conversation

@anmyachev
Copy link
Contributor

commented Apr 4, 2019

  • closes N/A
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

This change allows the user to use caching when parsing dates, which is provided by to_datetime function

@pep8speaks

This comment has been minimized.

Copy link

commented Apr 4, 2019

Hello @anmyachev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-05-07 08:30:55 UTC
pandas/io/parsers.py Outdated Show resolved Hide resolved
@codecov

This comment has been minimized.

Copy link

commented Apr 4, 2019

Codecov Report

Merging #25990 into master will decrease coverage by 49.95%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #25990       +/-   ##
===========================================
- Coverage   91.86%    41.9%   -49.96%     
===========================================
  Files         175      175               
  Lines       52543    52544        +1     
===========================================
- Hits        48267    22021    -26246     
- Misses       4276    30523    +26247
Flag Coverage Δ
#multiple ?
#single 41.9% <100%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/parsers.py 48.24% <100%> (-47.55%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-99.36%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-95.46%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-91.16%) ⬇️
pandas/io/sas/sas_xport.py 0% <0%> (-90.1%) ⬇️
pandas/core/tools/numeric.py 10.44% <0%> (-89.56%) ⬇️
... and 131 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 70773d9...c9b258f. Read the comment docs.

@codecov

This comment has been minimized.

Copy link

commented Apr 4, 2019

Codecov Report

Merging #25990 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25990      +/-   ##
==========================================
- Coverage   91.99%   91.98%   -0.01%     
==========================================
  Files         175      175              
  Lines       52377    52378       +1     
==========================================
- Hits        48182    48180       -2     
- Misses       4195     4198       +3
Flag Coverage Δ
#multiple 90.53% <100%> (ø) ⬆️
#single 40.77% <100%> (-0.13%) ⬇️
Impacted Files Coverage Δ
pandas/io/parsers.py 95.84% <100%> (ø) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 96.9% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.7% <0%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 64c1127...687d0c4. Read the comment docs.

pandas/io/parsers.py Outdated Show resolved Hide resolved
@jreback

This comment has been minimized.

Copy link
Contributor

commented Apr 5, 2019

while we are doing this may as well flip the flag in pd.to_datetime as well to True. (it could be a separate PR), but need to run asv's.

@anmyachev anmyachev force-pushed the anmyachev:cache_dates_api_change branch from 9b0a443 to 2fd52d0 Apr 5, 2019

@anmyachev

This comment has been minimized.

Copy link
Contributor Author

commented Apr 5, 2019

@jreback I made sure that changing to_datetime signature should be done in a separate request, because, unfortunately, it is not enough to change it alone, I will have to change the signatures of other functions. So last commit will be removed from this PR.

@anmyachev anmyachev force-pushed the anmyachev:cache_dates_api_change branch from 0925e56 to 2fd52d0 Apr 5, 2019

@anmyachev

This comment has been minimized.

Copy link
Contributor Author

commented Apr 5, 2019

Note for reviewers: it seems that this PR have the same errors as master branch

@anmyachev anmyachev force-pushed the anmyachev:cache_dates_api_change branch from 7ca68a1 to 002c031 Apr 5, 2019

@gfyoung

This comment has been minimized.

Copy link
Member

commented Apr 8, 2019

We already have so many datetime parameters at this point, but I'm thinking at some point we should condense them into a dict. This would allow for ease of parameterization without bloat.

@jreback : What do you think?

@vnlitvin

This comment has been minimized.

Copy link
Contributor

commented Apr 8, 2019

I'm thinking at some point we should condense them into a dict.

Maybe declare a collections.namedtuple for those arguments, so they're fix-named?

@gfyoung

This comment has been minimized.

Copy link
Member

commented Apr 8, 2019

Maybe declare a collections.namedtuple for those arguments, so they're fix-named?

@vnlitvin : I'm actually inclined to leave it as dict for the time being. I would like to keep that portion of the API flexible so that the only limits on how people want to parse dates is the signature underlying our datetime parsing.

@anmyachev

This comment has been minimized.

Copy link
Contributor Author

commented Apr 8, 2019

asv continuous -f 1.1 origin/master cache_dates_api_change -a warmup_time=1 -a sample_time=1

master our_patch ratio test_name
1.45±0.01ms 1.87±0.01ms 1.29 io.csv.ReadCSVParseDates.time_baseline
1.74±0.02ms 2.24±0.02ms 1.29 io.csv.ReadCSVParseDates.time_multiple_date
1.41±0.01ms 1.56±0.02ms 1.11 io.csv.ReadCSVDInferDatetimeFormat.time_read_csv(False, 'iso8601')
1.34±0ms 1.47±0.02ms 1.10 io.csv.ReadCSVDInferDatetimeFormat.time_read_csv(False, 'ymd')
47.3±0.3ms 6.45±0.04ms 0.14 io.csv.ReadCSVCachedParseDates.time_read_csv_cached(True)
@anmyachev

This comment has been minimized.

Copy link
Contributor Author

commented Apr 9, 2019

@gfyoung condensation of parameters into dict should be in a separate PR, shouldn't it?

@gfyoung

This comment has been minimized.

Copy link
Member

commented Apr 9, 2019

@anmyachev : Absolutely! Separate PR would be most appropriate.

@anmyachev

This comment has been minimized.

Copy link
Contributor Author

commented Apr 9, 2019

@gfyoung ok. Anything left to fix in this PR?

@gfyoung

gfyoung approved these changes Apr 9, 2019

Copy link
Member

left a comment

Over to you @jreback

@jreback jreback added this to the 0.25.0 milestone Apr 9, 2019

doc/source/whatsnew/v0.25.0.rst Outdated Show resolved Hide resolved
asv_bench/benchmarks/io/csv.py Outdated Show resolved Hide resolved
@jreback

This comment has been minimized.

Copy link
Contributor

commented Apr 9, 2019

small comments, otherwise lgtm. Is that an issue for changing this in pd.to_datetime as well? (the default)

@anmyachev

This comment has been minimized.

Copy link
Contributor Author

commented Apr 10, 2019

@jreback when changing the flag in pd.to_datetime to true, errors occur on some platforms. I do not remember exactly what the problem was there. But in the near future I will create the PR right with this change. That's when I can answer your question.

@anmyachev anmyachev force-pushed the anmyachev:cache_dates_api_change branch from ed521d6 to 94c0eb5 Apr 10, 2019

@anmyachev

This comment has been minimized.

Copy link
Contributor Author

commented Apr 10, 2019

@jreback after getting result of CI tests for #26043 I can show you an example.
I think we can continue our discussion there.

@vnlitvin

This comment has been minimized.

Copy link
Contributor

commented Apr 17, 2019

@jreback is there anything left to fix before this could be merged?

@vnlitvin vnlitvin force-pushed the anmyachev:cache_dates_api_change branch from 94c0eb5 to 1c85780 Apr 18, 2019

@vnlitvin

This comment has been minimized.

Copy link
Contributor

commented Apr 18, 2019

@jreback

@anmyachev this is a troubling comment. #25990 (comment)

can you show me an example?

Does our example (#26097) satisfy you here? I feel that it shouldn't be a blocker for this PR.

P.S. I've rebased on master to get rid of merge conflicts (and to have a clean history).

@vnlitvin

This comment has been minimized.

Copy link
Contributor

commented Apr 25, 2019

@jreback @gfyoung anything needed from us?

doc/source/whatsnew/v0.25.0.rst Outdated Show resolved Hide resolved
@jreback

This comment has been minimized.

Copy link
Contributor

commented Apr 28, 2019

is there an issue for this?

@anmyachev

This comment has been minimized.

Copy link
Contributor Author

commented Apr 28, 2019

is there an issue for this?

@jreback no errors detected for this.

@anmyachev anmyachev force-pushed the anmyachev:cache_dates_api_change branch from ee0b37f to af757f2 Apr 28, 2019

@jreback

This comment has been minimized.

Copy link
Contributor

commented Apr 28, 2019

no errors detected for this.

what does this mean?

@anmyachev

This comment has been minimized.

Copy link
Contributor Author

commented Apr 28, 2019

no errors detected for this.

what does this mean?

I mean: I not found issue for this.

@vnlitvin

This comment has been minimized.

Copy link
Contributor

commented Apr 29, 2019

is there an issue for this?

If you mean Github issue then no, we didn't file one and we didn't fine any.
That's why this PR is used as a reference in whatsnew.

@jreback

This comment has been minimized.

Copy link
Contributor

commented May 7, 2019

can you merge master.

@anmyachev anmyachev force-pushed the anmyachev:cache_dates_api_change branch from af757f2 to 687d0c4 May 7, 2019

@anmyachev

This comment has been minimized.

Copy link
Contributor Author

commented May 7, 2019

@jreback Done & green

@jreback

jreback approved these changes May 7, 2019

@jreback jreback merged commit f65742c into pandas-dev:master May 7, 2019

11 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
pandas-dev.pandas Build #20190507.22 succeeded
Details
pandas-dev.pandas (Checks_and_doc) Checks_and_doc succeeded
Details
pandas-dev.pandas (Linux py35_compat) Linux py35_compat succeeded
Details
pandas-dev.pandas (Linux py36_locale_slow) Linux py36_locale_slow succeeded
Details
pandas-dev.pandas (Linux py36_locale_slow_old_np) Linux py36_locale_slow_old_np succeeded
Details
pandas-dev.pandas (Linux py37_locale) Linux py37_locale succeeded
Details
pandas-dev.pandas (Linux py37_np_dev) Linux py37_np_dev succeeded
Details
pandas-dev.pandas (Windows py36_np15) Windows py36_np15 succeeded
Details
pandas-dev.pandas (Windows py37_np141) Windows py37_np141 succeeded
Details
pandas-dev.pandas (macOS py35_macos) macOS py35_macos succeeded
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented May 7, 2019

thanks @anmyachev

@TomAugspurger

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.