Timezone fixes #235
Current coverage is 94.19% (diff: 88.23%)
@@             master   #235   diff @@
==========================================
  Files          32     32
  Lines        2736   2826    +90
  Methods         0      0
  Messages        0      0
  Branches        0      0
==========================================
+ Hits         2582   2662    +80
- Misses        154    164    +10
  Partials        0      0
Hm, what's the reason for this change? The server timezone has nothing to do with the timezone of the site you're scraping. Using the server timezone can produce hard-to-notice bugs: you can, e.g., develop code locally, get it working, then deploy it to a server, and if the server happens to have a different timezone, you get a wrong result in production. With the UTC default there was no such problem.
@kmike @waqasshabbir I honestly don't understand why, from a design perspective, you're adding in this needless complexity to begin with. It seems to me that the right way to do it is to return time zone aware datetimes if the string you parsed has a time zone in it, and otherwise return a naive datetime. One option like, If you do something where you are handling all time zone applications as part of the parsing step, people will be confused as to how the time zones are being treated, and you'll end up with a bunch of complicated options for how to deal with different cases using different zones, all of which can easily be handled by the user just manipulating the results using Python's (superior) datetime functions.
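A minimal sketch of the behaviour proposed in this comment, using only the standard library rather than dateparser itself: the parser returns an aware datetime when the string carries an offset and a naive one otherwise, and the caller does any time zone handling afterwards. The helper name `parse_iso` and the `site_tz` offset are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_iso(text):
    """Return an aware datetime if the text carries a UTC offset, else a naive one."""
    return datetime.fromisoformat(text)


# The string has an offset, so the result is time zone aware.
aware = parse_iso("2016-10-01T12:00:00+05:00")

# No offset in the string, so the result stays naive.
naive = parse_iso("2016-10-01T12:00:00")

# Post-processing is left to the caller.
site_tz = timezone(timedelta(hours=-4))    # hypothetical offset of the scraped site
localized = naive.replace(tzinfo=site_tz)  # attach a zone, wall-clock time unchanged
as_utc = aware.astimezone(timezone.utc)    # convert, wall-clock time changes
```

Here `replace` only labels the naive value with a zone, while `astimezone` performs the actual conversion, which is the distinction the rest of this thread turns on.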
@pganssle there is one problem which can't be solved in post-processing: handling of relative dates like '2 hours ago'. On forums it is common to have relative and absolute dates mixed: older dates are absolute, recent posts/comments have relative dates. Because dateparser returns absolute dates, there must be a way to tell dateparser "this website uses this timezone" in order to parse such relative dates correctly. If you parse these dates as relative to server time or to UTC, then you need to convert absolute dates to server time or UTC (otherwise the returned dates will be incompatible), and you need the website's timezone information for that. I agree that all other timezone processing is less critical because it can be handled by an end user as a post-processing step.
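The problem with mixed relative and absolute dates can be sketched with the standard library alone. This is an illustration under stated assumptions, not dateparser's implementation: `parse_relative`, the fixed `now_utc`, and the `+05:00` site offset are all hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_relative(text, now_utc, site_tz):
    """Resolve 'N hours ago' against the current time expressed in the site's zone."""
    match = re.fullmatch(r"(\d+) hours? ago", text.strip())
    if match is None:
        raise ValueError(f"unsupported relative date: {text!r}")
    site_now = now_utc.astimezone(site_tz)
    return site_now - timedelta(hours=int(match.group(1)))


site_tz = timezone(timedelta(hours=5))  # hypothetical zone of the forum
now_utc = datetime(2016, 10, 1, 12, 0, tzinfo=timezone.utc)

# A recent post shown as a relative date, and an older one shown as an absolute date.
relative = parse_relative("2 hours ago", now_utc, site_tz)
absolute = datetime(2016, 10, 1, 14, 30, tzinfo=site_tz)

# With the site's zone known, the older post sorts before the recent one.
correct_order = absolute < relative

# Without it: the relative date is resolved against naive UTC, the absolute date
# keeps the site's wall-clock time, and the older post now appears to be newer.
naive_relative = now_utc.replace(tzinfo=None) - timedelta(hours=2)
naive_absolute = datetime(2016, 10, 1, 14, 30)
wrong_order = naive_absolute > naive_relative
```

The two naive values cannot be reconciled afterwards unless the caller already knows the site's zone, which is why the zone has to be available at parse time for relative dates.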
Probably a better design would be the following:
|
@kmike reason for this change is conversion to UTC in either relative dates or incomplete dates like You can also rephrase below as
dateparser no longer makes an implicit conversion to a default timezone. Having to specify settings will help users be more aware of timezone conversions and their implications. When you specify re better design, not really sure, how useful is the idea of mixing naive and tzaware objects when you can do general post-processing/normalizing using
@pganssle dateparser offers all of its primary functionality through one entry point For timezones with similar abbreviations, some work was done early on to adopt only the most commonly used ones and avoid duplicates. See here
@kmike I see how that is, but wouldn't it make more sense for relative dates to return
@waqasshabbir The problem I have is that you are doing manipulations to the datetime that have no place in a parsing function. I think it's very reasonable to specify a default value for the time zone, which is replaced if a time zone is specified in the string. I also think it's very reasonable to say "even if a time zone is specified in the string, ignore it", but what you are doing is providing some facility for doing timezone math within the parsing function, which does not make sense, conceptually. All of this makes the whole function quite confusing, because I would expect
fixes #212, #230
There was general confusion about the `TIMEZONE` setting. To rectify that, we've added another setting, `TO_TIMEZONE`, to make the settings self-explanatory. It's a major change and is not backward compatible.
Now, `dateparser` considers the timezone to be the system's default unless specified. e.g.
No conversion takes place, and the specified timezone is attached to the resultant date.
Conversion is explicit now and must be done using the `TO_TIMEZONE` setting.
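The split between attaching a timezone and converting to one can be sketched with the standard library. This is a hedged illustration of the semantics described above, not dateparser's code: `apply_timezone_settings` and its `tz` / `to_tz` parameters are hypothetical stand-ins for the `TIMEZONE` and `TO_TIMEZONE` settings, and fixed offsets stand in for named zones.

```python
from datetime import datetime, timedelta, timezone


def apply_timezone_settings(parsed, tz, to_tz=None):
    """Attach tz to a naive parsed date; convert only if to_tz is given."""
    attached = parsed.replace(tzinfo=tz)  # no conversion: wall-clock time is kept
    if to_tz is None:
        return attached
    return attached.astimezone(to_tz)     # explicit conversion step


parsed = datetime(2016, 10, 1, 9, 0)      # naive result of parsing a date string
eastern = timezone(timedelta(hours=-4))   # hypothetical fixed offset for the source zone

# Only the source zone is given: it is attached, the time stays 09:00.
attached_only = apply_timezone_settings(parsed, eastern)

# Source zone plus an explicit target: the time becomes 13:00 UTC.
converted = apply_timezone_settings(parsed, eastern, to_tz=timezone.utc)
```

Both results describe the same instant; only the second one has had its wall-clock time changed, and only because the caller asked for it.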