"Error parsing date..." #2279
Comments
Ok, I did some exploring, and as a temporary workaround I've successfully patched date_to_list.

Changed:

def date_to_list(date_str, datetime_format, preprocessing_parameters):
    try:
        if datetime_format is not None:
            datetime_obj = datetime.strptime(date_str, datetime_format)
        else:
            datetime_obj = parse(date_str)

To:

def date_to_list(date_str, datetime_format, preprocessing_parameters):
    try:
        if isinstance(date_str, datetime):
            datetime_obj = date_str
        elif datetime_format is not None:
            datetime_obj = datetime.strptime(date_str, datetime_format)
        else:
            datetime_obj = parse(date_str)

I'm not going to submit this as a patch because it doesn't fix the root cause, but it does get things working. I'd be happy to help submit a proper patch if someone can point me to where this might be happening upstream.
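The guard in the patch above can be tried in isolation. This is only a minimal sketch: `to_datetime` is a hypothetical helper, not Ludwig's actual function, and `datetime.fromisoformat` is used as a stdlib stand-in for `dateutil`'s `parse` so the snippet has no third-party dependencies.

```python
from datetime import datetime


def to_datetime(date_value, datetime_format=None):
    """Hypothetical helper mirroring the isinstance guard from the patch."""
    if isinstance(date_value, datetime):
        # Already parsed upstream. pandas.Timestamp subclasses datetime,
        # so it takes this branch too.
        return date_value
    if datetime_format is not None:
        return datetime.strptime(date_value, datetime_format)
    # Assumption: stand-in for dateutil's parse(); handles ISO 8601 strings only.
    return datetime.fromisoformat(date_value)


print(to_datetime("2021-05-17", "%Y-%m-%d"))
print(to_datetime(datetime(2021, 5, 17), "%Y-%m-%d"))
```

Both calls return the same `datetime`, whether the value arrives as a string or has already been parsed.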
Hi @noahlh! What you've proposed is actually quite reasonable -- could you create a PR with this change, with an additional test case in

Looking into this a bit, it looks like there's a discrepancy between

There's a more principled solution to cast all datetime features to

I think your patch is a reasonable stopgap -- happy to help you land that.
@justinxzhao Ah ha! You got it -- happy to help. I'll put that together ASAP.
@justinxzhao Just reading through the Pandas docs on read_json and another possible upstream fix might be to set the

From the docs, it looks like it automatically converts for datelike columns, which includes columns with the name "date" (which is exactly my situation). We might be able to avoid a refactor that way, since you'll consistently be getting the same input. Just an idea.
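The automatic conversion described above is easy to reproduce. The parameter name did not survive in the comment, so this sketch assumes it is `convert_dates`, which is a real `pandas.read_json` argument. The sample payload is made up for illustration.

```python
from datetime import datetime
from io import StringIO

import pandas as pd

# Made-up sample: one record with a column literally named "date".
payload = '[{"date": "2021-05-17", "value": 1}]'

# Default behaviour: columns with datelike names are parsed automatically.
converted = pd.read_json(StringIO(payload))

# Assumption: convert_dates=False is the setting the comment refers to.
raw = pd.read_json(StringIO(payload), convert_dates=False)

print(type(converted["date"].iloc[0]))  # a pandas Timestamp
print(type(raw["date"].iloc[0]))        # a plain str
```

With the conversion disabled, the value stays a string, which is what `strptime` expects.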
@noahlh Thanks for looking more into the Setting
Describe the bug
I'm attempting to train a model from data in a large JSON array. An example of the relevant slice looks like:
When I run the following command:
ludwig train --dataset input.json --data_format json -c config.yaml
After a few moments, I get the following output for every element of the array:
I've tried this with several config options in config.yaml, including each of the following:
All 3 result in the same error message.
It seems to me what's happening is that the date string is being parsed too soon in the process -- it's being turned into a Timestamp object BEFORE being fed into strptime(), hence the error.
I'm not familiar enough with the codebase to know if this is a bug, an issue with JSON as the input source, or something else, but some guidance would be appreciated!
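The suspected failure mode can be shown with the standard library alone. This is a sketch under one assumption: a plain `datetime` stands in for the pandas Timestamp (Timestamp is a `datetime` subclass, so `strptime` rejects it in the same way).

```python
from datetime import datetime

# Stands in for a value that was already parsed before reaching strptime().
already_parsed = datetime(2021, 5, 17)

try:
    datetime.strptime(already_parsed, "%Y-%m-%d")
    error = None
except TypeError as exc:
    # strptime() only accepts str as its first argument.
    error = exc

print(type(error).__name__, error)
```

The format string is never consulted here: the call fails on the argument type before any parsing happens, which would explain why every config variant produces the same error.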
Expected behavior
The date should parse according to the provided format.
Environment (please complete the following information):