Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Add dateutil kwargs to csv2rec #1210

Merged
merged 2 commits into from Apr 2, 2013

Conversation

Projects
None yet
5 participants
Member

dmcdougall commented Sep 6, 2012

Fixes ambiguous date problems between US/UK date conventions.

My attempt at resolving #208.

I'm not sure this is the best way to resolve this issue. Another option is to keep the dateutil stuff separate from the specific csv2rec kwargs by having only one dateutil_properties kwarg, say. This kwarg would take a dictionary of keyword arguments that are passed directly to the dateutil parser. The functionality would be the same, with the only difference being an aesthetic one.

Another method is suggested by @efiring in the comments in #208.

Edit: @efiring's suggestion is to have one kwarg specifying the exact date in strptime format, rather than having dateutil do it.

@dmcdougall dmcdougall Add dateutil kwargs to csv2rec
Fixes ambiguous date problems between US/UK date conventions.
b096c81

@pelson pelson commented on the diff Sep 7, 2012

lib/matplotlib/mlab.py
@@ -2090,7 +2090,7 @@ def extract(r):
def csv2rec(fname, comments='#', skiprows=0, checkrows=0, delimiter=',',
converterd=None, names=None, missing='', missingd=None,
- use_mrecords=False):
+ use_mrecords=False, dayfirst=False, yearfirst=False):
@pelson

pelson Sep 7, 2012

Member

Could we just use the builtin datetime for this? (showing my naivety here). If a user specifies a string format as a kwarg, use that with strftime. If not, let dateparser do its (frightening) magic.

@dmcdougall

dmcdougall Sep 8, 2012

Member

@pelson That's a nice extension of @efiring's suggestion. I'm happy with that method, too. I'll leave this for a while and let others weigh in with thoughts and opinions.

@dmcdougall

dmcdougall Sep 8, 2012

Member

Actually, thinking about it, that wouldn't fix current scripts...

@efiring

efiring Sep 8, 2012

Owner

It's hopeless; there is nothing reasonably simple you can do to fix existing usage of this function when applied to ambiguous dates. All we can do is make it easier for users to write new code, or fix old code, so it doesn't blow up in their faces. Dateparser is badly designed for our purposes, but for the time being we are stuck with it. Some method of feeding it the kwargs seems like an essential improvement.

Member

dmcdougall commented Jan 24, 2013

This may or may not be helpful.

Member

pelson commented Mar 29, 2013

@dmcdougall - where do you want to go with this PR? For this to get merged a bit of documentation would be needed, but not much else.

A much more extreme option would be to start down the road of deprecating these routines and reducing the amount of non visualisation code that sits in the mpl codebase - if I ever need to read a CSV file matplotlib would not be the first place I looked. I suspect this option wouldn't be very popular though... 😉

Member

dmcdougall commented Mar 30, 2013

@pelson I think the kwargs option, as @efiring pointed out some time ago, is the way to go. As far as removing CSV support is concerned, I think that is a matter needing discussion and consensus on the mailing list.

I'll push a rebase and doc update for this.

P.S. Thanks for peeling the scabs off these old wounds. They have been sitting for too long and have become stale.

Member

dmcdougall commented Mar 31, 2013

@pelson Done.

@pelson pelson added a commit that referenced this pull request Apr 2, 2013

@pelson pelson Merge pull request #1210 from dmcdougall/csv2rec_date
Add dateutil kwargs to csv2rec
ff6d7eb

@pelson pelson merged commit ff6d7eb into matplotlib:master Apr 2, 2013

1 check passed

default The Travis build passed
Details
Contributor

ultra-andy commented Mar 18, 2016

Hello, I've been looking at csv2rec for a bit, trying to understand what looks like a bug in the code.

If a test.csv file contains datetime type entries (ie, file contains "11/01/14 12:00:01 AM" only), and you attempt to run the following script:

import matplotlib.mlab as mlab
import datetime as datetime

dataset = mlab.csv2rec('test.csv', delimiter=',', names=['datetime'], dayfirst = True)
#dataset = mlab.csv2rec('test.csv', delimiter=',',converterd={0: lambda x: datetime.datetime.strptime(x, '%d/%m/%y %H:%M:%S %p'),}, names= ['datetime'], dayfirst = True)

for elem in dataset:
    print(elem, end = "\n")

...then the datetime object echoed misinterprets the datetime by swapping the day and the month.

The problem is that there are inconsistent handlings of formats in the converter candidate dictionary in mlab.csv2rec. If the time entry is a simple date ("11/01/14"), then the dates will be echoed by the print statement correctly. If the column in the .csv file contains a datetime object (ie date and time, for instance "11/01/14 12:00:01 AM"), then the datetime data will be interpreted as though dayfirst and yearfirst have NOT been specified, and so will NOT be echoed correctly.

This happens because the mydate converter (which uses dayfirst and yearfirst) returns an error if d.hour or d.minute or d.second is greater than zero, but the alternative mydateparser converter, which does not return an error, does not interpret dayfirst or yearfirst, so assumes monthfirst. So then datetime strings in the .csv file are not interpreted correctly if they are other than month first format, and are a datetime string with hour or minute or second other than zero.

In other words, mydateparser is NOT consistent with mydate. Mydateparser should reflect the approach used for mydate and also refer to the dayfirst and yearfirst arguments so that datetime elements of the .csv file column are interpreted in a way that matches dayfirst and/or yearfirst arguments in the call, if specified as True.

A fix that works in my own C:\Python33\Lib\site-packages\matplotlib\mlab.py file is to replace this line:

mydateparser = with_default_value(dateparser, datetime.date(1,1,1))

with this block:

def mydateparser(x):
    # try and return a datetime object
    d = dateparser(x, dayfirst=dayfirst, yearfirst=yearfirst)
    return d
mydateparser = with_default_value(mydateparser, datetime.datetime(1,1,1,0,0,0))

This problem is also present in Python 3.4, and probably exists in other releases as well. This appears to be a design bug. I think it is a dangerous bug because it fails in a way that is inconsistent with other aspects of function behaviour, and can silently leave the user with invalid data despite the correct format having been specified.

The proposed fix shouldn't break any code that uses the workaround of explicitly specifying a converterd in the csv2rec arguments (which also works - included as commented out line in script code above).

It would be good to get this bug fixed properly, so let me know if I need to do anything further to achieve this. I'm new to this bug reporting system, so apologies if I've done anything unconventional or improper - please just point me in the right direction!

Owner

jenshnielsen commented Mar 18, 2016

@ultra-andy Can you please create a new issue or pr. Comments on a almost 3 year old pull request that have already been merged are almost sure to be lost.

Contributor

ultra-andy commented Mar 18, 2016

Thanks Jen, have created new issue #6184 - I went with creating a new issue as I'm new on this and would prefer not to do a pull request myself until I understand how the system works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment