type casting assumes 'month first' for ambiguous dates #16

Closed
amirouche opened this issue Dec 8, 2017 · 3 comments

@amirouche

Right now, type detection infers date, datetime, or time types without taking into account that 01/02/2002 can be either February 1st or January 2nd, depending on whether the date format is DD/MM/YYYY or MM/DD/YYYY, respectively.

This might be undecidable in some rare cases (for example, when every day value is 12 or lower), but in general, given enough values, it's possible to decide between the two formats.

One possible way to handle this in meza is to replace the current string representation of a field's type with a higher-level datatype. For instance:

from collections import namedtuple
datetime_type = namedtuple('DateTimeType', ['format'])

Basically, use a representation that takes optional extra information about the type.
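
To make the idea a bit more concrete, here is a minimal sketch of such a representation. The names `FieldType` and `cast` are hypothetical and not part of meza; the only assumption is that the extra information is a `strptime`-style format string.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical descriptor (not meza's API): `name` is the detected type,
# `format` is optional extra information and defaults to None.
FieldType = namedtuple('FieldType', ['name', 'format'], defaults=(None,))


def cast(value, field_type):
    """Cast a string using the extra format information, if there is any."""
    if field_type.name == 'date' and field_type.format:
        return datetime.strptime(value, field_type.format).date()
    # Without a format there is nothing extra to apply; return the value as-is.
    return value


day_first = FieldType('date', '%d/%m/%Y')
month_first = FieldType('date', '%m/%d/%Y')

print(cast('01/02/2002', day_first))    # 2002-02-01
print(cast('01/02/2002', month_first))  # 2002-01-02
```

The same string yields two different dates, which is exactly the information a plain `'date'` string cannot carry.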

@reubano
Owner

reubano commented Dec 8, 2017

Good point. In fact, date format detection could be implemented much like type detection, i.e., by testing dates record by record until a given confidence threshold has been crossed.
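
A rough sketch of what that record-by-record approach might look like follows. The function name `detect_dayfirst`, the `threshold` and `min_evidence` parameters, and the restriction to `/`-separated dates are all assumptions made for illustration; none of this is meza's actual implementation.

```python
def detect_dayfirst(values, threshold=0.95, min_evidence=3):
    """Guess whether `/`-separated dates are day first.

    Returns True (day first), False (month first), or None (undecided).
    """
    day_first = month_first = 0

    for value in values:
        parts = value.split('/')

        if len(parts) != 3:
            continue

        try:
            first, second = int(parts[0]), int(parts[1])
        except ValueError:
            continue

        if 12 < first <= 31 and 1 <= second <= 12:
            day_first += 1    # only valid as DD/MM
        elif 12 < second <= 31 and 1 <= first <= 12:
            month_first += 1  # only valid as MM/DD
        else:
            continue          # ambiguous or invalid: no evidence either way

        total = day_first + month_first

        # Stop as soon as enough evidence points clearly in one direction.
        if total >= min_evidence:
            if day_first / total >= threshold:
                return True
            if month_first / total >= threshold:
                return False

    return None


print(detect_dayfirst(['01/02/2002', '13/02/2002', '25/12/2001', '31/01/2000']))  # True
print(detect_dayfirst(['01/02/2002', '03/04/2002']))  # None
```

Only records where one of the first two numbers exceeds 12 count as evidence, so a column made up entirely of ambiguous dates stays undecided.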

@reubano changed the title from "datetime type detection is not good enough" to "type casting doesn't account for various date formats" Dec 8, 2017
@reubano changed the title from "type casting doesn't account for various date formats" to "type casting assumes 'month first' for ambiguous dates" Dec 8, 2017
@reubano
Owner

reubano commented Dec 8, 2017

To clarify, process.type_cast calls convert.to_date and convert.to_datetime, and those functions eventually call dateutil.parser.parse. parse takes an optional dayfirst parameter, which defaults to False. This should be configurable, and optionally detected by a new function.
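
For reference, a short illustration of the `dayfirst` behaviour described above, calling `dateutil.parser.parse` directly rather than going through meza:

```python
from datetime import datetime

from dateutil.parser import parse

ambiguous = '01/02/2002'

# dayfirst defaults to False, so the string is read as MM/DD/YYYY.
month_first = parse(ambiguous)

# With dayfirst=True, the same string is read as DD/MM/YYYY.
day_first = parse(ambiguous, dayfirst=True)

print(month_first.date())  # 2002-01-02
print(day_first.date())    # 2002-02-01
```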

@reubano added the bug label Dec 8, 2017
@reubano reopened this Aug 10, 2020
@reubano added a commit that referenced this issue Dec 24, 2021
* fixes:
  Bump to version 0.43.0
  [FIX] Misc compatibility fixes
  Be explicit with kwarg
  [ENH] Loosen up dev requirements
  [DOC] Show a csv dialect example (closes #22)
  [NEW] Add `whitelist` option to remove_keys
  [ENH] Add prettify command and use it!
  [ENH] Pass kwargs to date parser (closes #16)
  Fix incorrect int parsing during is_numeric checks
@amirouche
Author

💯
