New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fields.Date accepts input in non ISO8601 format. This leads to unaccepted behaviour. #899

Closed
dushr opened this Issue Aug 1, 2018 · 11 comments

Comments

Projects
None yet
3 participants
@dushr
Contributor

dushr commented Aug 1, 2018

The doc of fields.Date suggests that it accepts only ISO8601 date format.

class Date(Field):
    """ISO8601-formatted date string.
    :param kwargs: The same keyword arguments that :class:`Field` receives.
    """

It works with ISO8601 format fine:

In [2]: fields.Date().deserialize('2018-10-08')
Out[2]: datetime.date(2018, 10, 8)

However unfortunately it also works with non ISO8601 formats:

In [3]: fields.Date().deserialize('08-10-2018')
Out[3]: datetime.date(2018, 8, 10)

And in this case it seems like it parses the format as %m-%d-%Y
I feel, it should raise a ValidationError with invalid_date format incase the input is not in the iso8601 format.

Digging further into the issue I found that DateField uses utils.from_iso_date; which inturn uses dateutil (if available) parser to parse the date.
I think that util should be modified to use dateutil.parser.isoparse

@lafrech lafrech added the datetime label Aug 1, 2018

@sloria

This comment has been minimized.

Member

sloria commented Aug 4, 2018

Thanks @dushr for the report. I think you are right about using isoparse. Would you like to send a PR?

@sloria sloria added this to the 3.0 milestone Aug 4, 2018

@sloria sloria added the help wanted label Aug 4, 2018

@dushr

This comment has been minimized.

Contributor

dushr commented Aug 5, 2018

@sloria Yup, I can do that.

@lafrech

This comment has been minimized.

Member

lafrech commented Sep 2, 2018

@dushr a PR with this change and a non-regression test would be awesome.

@dushr

This comment has been minimized.

Contributor

dushr commented Sep 3, 2018

@lafrech I created a PR for this

dushr added a commit to dushr/marshmallow that referenced this issue Sep 3, 2018

@lafrech

This comment has been minimized.

Member

lafrech commented Sep 3, 2018

Thanks for the PR, @dushr.

Since dateutil is optional and since its doc says

As of version 2.7.0, the strictness of the parser should not be considered a stable part of the contract. Any valid ISO-8601 string that parses correctly with the default settings will continue to parse correctly in future versions, but invalid strings that currently fail (e.g. 2017-01-01T00:00+00:00:00) are not guaranteed to continue failing in future versions if they encode a valid date.

we can't really rely on it for validation.

I think isoparse is the right tool for the job so I'm fine with this change, but to be more strict about iso standard, maybe we should pick date_re and time_re from Django, like we picked datetime_re.

https://github.com/django/django/blob/b9cf764be62e77b4777b3a75ec256f6209a57671/django/utils/dateparse.py#L13-L27

date_re = re.compile(
    r'(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})$'
)

time_re = re.compile(
    r'(?P<hour>\d{1,2}):(?P<minute>\d{1,2})'
    r'(?::(?P<second>\d{1,2})(?:\.(?P<microsecond>\d{1,6})\d{0,6})?)?'
)

datetime_re = re.compile(
    r'(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})'
    r'[T ](?P<hour>\d{1,2}):(?P<minute>\d{1,2})'
    r'(?::(?P<second>\d{1,2})(?:\.(?P<microsecond>\d{1,6})\d{0,6})?)?'
    r'(?P<tzinfo>Z|[+-]\d{2}(?::?\d{2})?)?$'
)

At this point, we may want to borrow the parsing functions from Django as well...

Going further, if we remove unused from_datestring (see #758 (comment)) and if we modify from_rfc to use email (its docstrings points to a SO page suggesting to use email), then I think we have all references to dateutil removed, so we can drop it totally and have a simpler interface.

@sloria, @deckar01, thoughts?

@dushr

This comment has been minimized.

Contributor

dushr commented Sep 3, 2018

I am okay with implementing it without dateutils too.
If we can come up with a consensus on this, then I can go ahead and modify my PR.

@dushr

This comment has been minimized.

Contributor

dushr commented Sep 9, 2018

@lafrech @sloria @deckar01 do you have any opinions on this?

Like I said before, I have done it using dateutils. However if you want to get rid of dateutils and do it using regex, I can do that too.

Just waiting on your thoughts.

@lafrech

This comment has been minimized.

Member

lafrech commented Sep 11, 2018

Totally removing dateutils is out of scope for this PR.

The change you propose (using isoparse) is good but according to the docs, it does not make the validation totally reliable, so it would be nice if you could investigate adding the regexes from Django as shown above. If this works out, that's great already and it can be merged.

If you have the time and will to go further, it would be nice to investigate using Django's parsing functions instead of dateutils and see how it goes. This is not required for this PR. It can be done in another one, by you or anyone else.

And finally, removing all other calls to dateutils would definitely be another PR.

Thanks again for your help.

@dushr

This comment has been minimized.

Contributor

dushr commented Sep 14, 2018

👍 Will try to get this done by this weeekend

@lafrech

This comment has been minimized.

Member

lafrech commented Sep 25, 2018

I got a bit carried away about removing dateutil. The deserialization code in django works because it uses pytz from their utils/timezone.py file.

Timezone management needs a third-pary lib, be it dateutil or pytz. This dependency can be optional (like it is today with dateutil in marshmallow).

Maybe we could make things more explicit about the difference between with and without dateutil, even move timezone related features into separate fields. But we can't have TZ features for free.

Bottom line, if you have time to incorporate the validation regexes from django, that's great. Otherwise, I might try to do it. But don't go any further, at least not in the same PR, as it is a more complicated change than I thought.

lafrech added a commit that referenced this issue Sep 26, 2018

@lafrech lafrech referenced this issue Sep 26, 2018

Merged

Enforce ISO8601 when deserializing date and time #968

2 of 2 tasks complete
@lafrech

This comment has been minimized.

Member

lafrech commented Sep 26, 2018

@dushr, I added the regexes in #968.

lafrech added a commit that referenced this issue Oct 6, 2018

@sloria sloria closed this in #968 Oct 6, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment