use dateparser for parsing dates #159
Comments
|
@anarcat I completely agree that it would be great to replace a lot of the edge case date parsing with a dedicated date parser (like dateparser). My first thought is that I'd like to keep dedicated parsers for the date formats that are documented in the RSS/Atom specs, but there's a lot of edge case code that could fall back to a dedicated date parser! Quick question: do you have experience with other libraries besides dateparser? |
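The layering described here (dedicated parsers for the spec-documented formats first, a general-purpose parser only as a fallback) could be sketched roughly as follows. This is an illustration using only the standard library, not feedparser's actual code; `parse_rfc822`, `parse_rfc3339`, and `parse_with_fallback` are hypothetical names, and the `fallback` argument stands in for something like `dateparser.parse`.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Callable, Optional

Parser = Callable[[str], Optional[datetime]]


def parse_rfc822(date_string: str) -> Optional[datetime]:
    """RSS-style dates, e.g. 'Tue, 05 Mar 2019 16:56:18 GMT'."""
    try:
        return parsedate_to_datetime(date_string)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError.
        return None


def parse_rfc3339(date_string: str) -> Optional[datetime]:
    """Atom-style dates, e.g. '2019-03-05T16:56:18Z'."""
    try:
        return datetime.fromisoformat(date_string.replace("Z", "+00:00"))
    except ValueError:
        return None


def parse_with_fallback(
    date_string: str, fallback: Optional[Parser] = None
) -> Optional[datetime]:
    """Try the spec parsers in order, then the optional fallback."""
    for parser in (parse_rfc822, parse_rfc3339):
        parsed = parser(date_string)
        if parsed is not None:
            return parsed
    # Only reached for dates the spec parsers could not handle.
    return fallback(date_string) if fallback is not None else None
```

With this shape, well-formed dates never reach the fallback, so a general-purpose parser would only see the edge cases.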
|
On 2019-03-05 16:56:18, Kurt McKee wrote:
> Quick question: do you have experience with other libraries besides dateparser?
Yes. I wrote a tool called undertime:
https://gitlab.com/anarcat/undertime
... that originally used "parsedatetime":
https://pypi.org/project/parsedatetime/
... but it had trouble parsing certain dates:
https://gitlab.com/anarcat/undertime/issues/8
Things are much better with dateparser.
|
|
Okay great! Thank you for the quick response! |
|
dateutil is less flexible than dateparser, but might be enough to handle the problematic case. Did anyone check the actual bug reports to see if they are fixed by the patch? |
|
Good point, I haven't done that yet. |
|
Hi,
My test script, as well as the detailed result, is here: https://gist.github.com/Ash-Crow/08bededf6a2d87a0a06453ff7803f355 |
|
I found |
I recently worked with the dateparser module for parsing dates, and it seems it might be better than feedparser at its job. For example, it will support parsing internationalized dates and it can parse the date `Tue,19 Feb 2019 00:27:25 GMT` (notice the missing space?) or `Sun, 15 Feb 2015 00:00:00` (no timezone??), both of which feedparser cannot parse.

I don't have time to patch this myself right now, but I wonder if it might not be as simple as doing:
In my tests, the above successfully parses the times specified above.
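
A handler along these lines might look like the sketch below. It assumes feedparser's `registerDateHandler` hook (which expects a function returning a 9-tuple in UTC) and `dateparser.parse` (which returns a `datetime` or `None`). `make_date_handler` and `register_dateparser_handler` are hypothetical names used for illustration, and treating timezone-less dates as UTC is an assumption, not something either library prescribes.

```python
import time
from datetime import datetime, timezone
from typing import Callable, Optional


def make_date_handler(
    parse: Callable[[str], Optional[datetime]],
) -> Callable[[str], Optional[time.struct_time]]:
    """Wrap a datetime-returning parser as a feedparser-style date handler."""

    def handler(date_string: str) -> Optional[time.struct_time]:
        parsed = parse(date_string)
        if parsed is None:
            # Nothing recognized: return None so other handlers can try.
            return None
        if parsed.tzinfo is None:
            # Assumption: interpret dates without a timezone as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        # Convert to a 9-field UTC struct_time.
        return parsed.utctimetuple()

    return handler


def register_dateparser_handler() -> None:
    """Hook dateparser into feedparser (requires both packages installed)."""
    import dateparser
    import feedparser

    feedparser.registerDateHandler(make_date_handler(dateparser.parse))
```

Calling `register_dateparser_handler()` once before `feedparser.parse(...)` would be enough to try this against the feeds from the linked issues.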
As dateparser trickles into various Linux distributions and (hopefully) standardizes, the built-in date parsing code could eventually be replaced by dateparser as well.
Thank you for any advice.
(Note: the issues affected by this are far-ranging, and we'd need to look at them one by one, but I suspect it could fix #135, #114, #91, #75, aaand #72, so it's totally worth it. :)