Returned datetime skips a day with time+timezone input and PREFER_DATES_FROM = 'future' #403

Closed
Laogeodritt opened this issue Mar 27, 2018 · 3 comments · Fixed by #1002

Laogeodritt commented Mar 27, 2018

Versions:

  • Linux (Debian testing)
  • Python 3.6.4
  • dateparser (0.7.0)
    • python-dateutil (2.7.2)
    • pytz (2018.3)
    • regex (2018.2.21)
    • tzlocal (1.5.1)

What I was trying to do: Use dateparser to:

  • Return a timezone-unaware datetime, with the value converted to UTC
  • Default to UTC, if the input string does not have a timezone
  • Convert the time in the input string to UTC, if the input string does specify a timezone

I think my settings here should achieve this according to my understanding of the documentation—if not, and this isn't a bug but a config error on my part, please do let me know!

Minimum working example:

With the following settings, behaviour is as expected:

>>> DATEPARSER_SETTINGS
{'TIMEZONE': 'UTC', 'TO_TIMEZONE': 'UTC', 'RETURN_AS_TIMEZONE_AWARE': False}
>>> parse('6PM UTC', settings=DATEPARSER_SETTINGS)  # case #1
datetime.datetime(2018, 3, 27, 18, 0)
>>> parse('6PM UTC-4', settings=DATEPARSER_SETTINGS)  # case #2
datetime.datetime(2018, 3, 27, 22, 0)
>>> parse('2PM UTC', settings=DATEPARSER_SETTINGS)  # case #3
datetime.datetime(2018, 3, 27, 14, 0)
>>> parse('2PM UTC-4', settings=DATEPARSER_SETTINGS)  # case #4
datetime.datetime(2018, 3, 27, 18, 0)
>>> datetime.utcnow()
datetime.datetime(2018, 3, 27, 17, 15, 52, 695093)

With the following settings, some unexpected behaviour occurs:

>>> DATEPARSER_SETTINGS
{'TIMEZONE': 'UTC', 'TO_TIMEZONE': 'UTC', 'RETURN_AS_TIMEZONE_AWARE': False, 'PREFER_DATES_FROM': 'future'}
>>> parse('6PM UTC', settings=DATEPARSER_SETTINGS)  # case #1 - OK, later today
datetime.datetime(2018, 3, 27, 18, 0)
>>> parse('6PM UTC-4', settings=DATEPARSER_SETTINGS)  # case #2 - OK, later today
datetime.datetime(2018, 3, 27, 22, 0)
>>> parse('2PM UTC', settings=DATEPARSER_SETTINGS)  # case #3 - this is fine, 2PM UTC today is in the past so skip to tomorrow
datetime.datetime(2018, 3, 28, 14, 0)
>>> parse('2PM UTC-4', settings=DATEPARSER_SETTINGS)  # case #4               !!! 2PM UTC-4 is in ~40 minutes, this skips that time and instead yields tomorrow
datetime.datetime(2018, 3, 28, 18, 0)
>>> datetime.utcnow()
datetime.datetime(2018, 3, 27, 17, 13, 50, 149498)

Expected behaviour: With an undated time input and PREFER_DATES_FROM=future, dateparser should return the nearest future time that matches, i.e., datetime(2018, 3, 27, 18, 0).

What actually happens: Dateparser skips a day to the second nearest date in the future, datetime(2018, 3, 28, 18, 0). This only happens in the last case above, which to me suggests the following:

  • The timezone in the input string and the timezone in the settings are different
  • The time, if in the settings timezone, would be in the past. That is to say, 2PM UTC is in the past here (so dateparser adds +1 day to satisfy the PREFER_DATES_FROM=future condition).
  • The time with the timezone given in the input string is actually a few hours in the future, i.e., 2PM UTC-4 is in the future, so adding +1 day is erroneous
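The expected behaviour can be illustrated with a minimal sketch that does the "is it in the future?" check in the input string's timezone rather than in the settings timezone. This is not dateparser code: `nearest_future_utc` is a hypothetical helper, and the timezone is modelled as a fixed UTC offset using only the standard library.

```python
from datetime import datetime, timedelta, timezone

def nearest_future_utc(hour, minute, utc_offset_hours, now_utc):
    """Hypothetical helper: next occurrence of hour:minute in a fixed-offset
    zone, returned as a naive UTC datetime. now_utc is naive and in UTC."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    # Express "now" in the input's timezone before comparing.
    now_local = now_utc.replace(tzinfo=timezone.utc).astimezone(tz)
    candidate = now_local.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # Only roll over to the next day if the time has already passed locally.
    if candidate <= now_local:
        candidate += timedelta(days=1)
    return candidate.astimezone(timezone.utc).replace(tzinfo=None)

now = datetime(2018, 3, 27, 17, 13, 50, 149498)  # utcnow() from the report
case3 = nearest_future_utc(14, 0, 0, now)    # '2PM UTC'   -> 2018-03-28 14:00
case4 = nearest_future_utc(14, 0, -4, now)   # '2PM UTC-4' -> 2018-03-27 18:00
```

Under these assumptions, case #3 still moves to tomorrow, while case #4 stays on the 27th, which is the result the report expects.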

Baviaan commented Apr 11, 2019

I am experiencing exactly the same issue. It checks whether the given time is in the future for UTC instead of for the given timezone.

Versions:

  • Ubuntu 18.04.2 LTS
  • Python 3.6.7
  • dateparser (0.7.1)
    • python-dateutil (2.8.0)
    • pytz (2018.9)
    • regex (2019.3.12)
    • tzlocal (1.5.1)

Laogeodritt commented Aug 25, 2019

I have found that the following MWE also seems to produce the same bug, even if PREFER_DATES_FROM is not set:

from datetime import datetime
from dateparser import parse
DATEPARSER_SETTINGS = {
    'TIMEZONE': 'UTC',
    'TO_TIMEZONE': 'UTC',
    'RETURN_AS_TIMEZONE_AWARE': False,
    'RELATIVE_BASE': datetime(2019, 8, 18, 3, 55, 1, 0)
}

dt = parse("9:57 PM MDT", settings=DATEPARSER_SETTINGS)  # datetime(2019, 8, 19, 3, 57, 0)
expected = datetime(2019, 8, 18, 3, 57, 0)
assert dt == expected  # fail:  dt is a day later than expected

Note that RELATIVE_BASE is set to 9:55:01 PM MDT on 17 August, which is before the specified time of "9:57 PM MDT" in the parse input. The returned dt value is neither the nearest future nor the nearest past instance of 9:57 PM MDT relative to the current time (RELATIVE_BASE).
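The arithmetic behind that claim can be checked with the standard library alone. The sketch below assumes MDT is a fixed UTC-6 offset and does not call dateparser; `returned` is simply the value reported in the MWE above.

```python
from datetime import datetime, timedelta, timezone

MDT = timezone(timedelta(hours=-6), "MDT")  # assumption: MDT as fixed UTC-6

base_utc = datetime(2019, 8, 18, 3, 55, 1, tzinfo=timezone.utc)
base_mdt = base_utc.astimezone(MDT)  # 2019-08-17 21:55:01 MDT

# 9:57 PM MDT on the base's local date (17 August) is two minutes after the base.
same_day = base_mdt.replace(hour=21, minute=57, second=0, microsecond=0)
nearest_future = same_day.astimezone(timezone.utc).replace(tzinfo=None)
nearest_past = (same_day - timedelta(days=1)).astimezone(timezone.utc).replace(tzinfo=None)

returned = datetime(2019, 8, 19, 3, 57)  # what parse() returned in the MWE
print(nearest_future, nearest_past, returned)
```

The nearest future instance is 2019-08-18 03:57 UTC and the nearest past instance is 2019-08-17 03:57 UTC; the returned value matches neither.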

manycoding added the Type: Bug label and removed the bug label on Oct 18, 2019

Ricimon commented Feb 22, 2020

It looks like lines 309 to 324 of parser.py are what cause this issue:

    def _set_relative_base(self):
        self.now = self.settings.RELATIVE_BASE
        if not self.now:
            self.now = datetime.utcnow()

    def _get_datetime_obj_params(self):
        if not self.now:
            self._set_relative_base()

        params = {
            'day': self.day or self.now.day,
            'month': self.month or self.now.month,
            'year': self.year or self.now.year,
            'hour': 0, 'minute': 0, 'second': 0, 'microsecond': 0,
        }
        return params

Without a RELATIVE_BASE parameter, the date assigned to the parsed time is the current UTC date, chosen before any PREFER_DATES_FROM adjustment (past or future) is applied. This explains why input in timezones behind UTC skips a day once UTC has passed midnight but the local timezone has not.

With a RELATIVE_BASE parameter, the parsed time takes the date of RELATIVE_BASE, ignoring any declared timezone. This explains @Laogeodritt's latest example: the 18th from RELATIVE_BASE is joined with 9:57 PM MDT, which lands on the 19th once converted to UTC.
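A simplified model of that mechanism reproduces the reported value. This is only an illustration of the explanation above, not the library's actual code path: `naive_merge_then_convert` is a hypothetical function, and MDT is assumed to be a fixed UTC-6 offset.

```python
from datetime import datetime, time, timedelta, timezone

MDT = timezone(timedelta(hours=-6), "MDT")  # assumption: MDT as fixed UTC-6

def naive_merge_then_convert(base_naive, parsed_time, input_tz):
    """Hypothetical model: take the date from the naive base as-is, attach the
    parsed time and the input's timezone, then convert the result to UTC."""
    local = datetime.combine(base_naive.date(), parsed_time, tzinfo=input_tz)
    return local.astimezone(timezone.utc).replace(tzinfo=None)

relative_base = datetime(2019, 8, 18, 3, 55, 1)  # naive, meant as UTC
result = naive_merge_then_convert(relative_base, time(21, 57), MDT)
print(result)  # 2019-08-19 03:57:00
```

The date "18 August" is a UTC date, but it gets paired with an MDT wall-clock time, so the conversion back to UTC pushes the result onto the 19th.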

The current workaround is to specify a RELATIVE_BASE in the same timezone as the input string being parsed. When the input string does not specify a timezone, adding this should solve most issues:

'RELATIVE_BASE': datetime.now()

With the pytz library,

'RELATIVE_BASE': datetime.now(pytz.timezone(timezone_of_input_string)).replace(tzinfo=None)
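For a fixed, reproducible base (as in the MWE earlier in this thread), the same idea can be sketched without pytz. `base_in_input_zone` is a hypothetical helper, MDT is assumed to be a fixed UTC-6 offset, and the snippet only builds the settings dict; whether `parse` then returns the expected value is not verified here.

```python
from datetime import datetime, timedelta, timezone

MDT = timezone(timedelta(hours=-6), "MDT")  # assumption: MDT as fixed UTC-6

def base_in_input_zone(base_utc_naive, input_tz):
    """Hypothetical helper: re-express a naive UTC datetime as a naive
    wall-clock datetime in the input string's timezone."""
    aware_utc = base_utc_naive.replace(tzinfo=timezone.utc)
    return aware_utc.astimezone(input_tz).replace(tzinfo=None)

settings = {
    'TIMEZONE': 'UTC',
    'TO_TIMEZONE': 'UTC',
    'RETURN_AS_TIMEZONE_AWARE': False,
    # 2019-08-18 03:55:01 UTC expressed as MDT wall-clock time
    'RELATIVE_BASE': base_in_input_zone(datetime(2019, 8, 18, 3, 55, 1), MDT),
}
print(settings['RELATIVE_BASE'])  # 2019-08-17 21:55:01
```

With the base on the 17th in MDT terms, joining its date with "9:57 PM MDT" would give 2019-08-18 03:57 UTC, the value the MWE expects.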

The library fix is probably to make _set_relative_base timezone aware.
