use dateparser for parsing dates #159

Open
anarcat opened this Issue Feb 19, 2019 · 3 comments

anarcat commented Feb 19, 2019

I recently worked with the dateparser module, and it seems it might be better at parsing dates than feedparser. For example, it supports internationalized dates, and it can parse Tue,19 Feb 2019 00:27:25 GMT (notice the missing space?) and Sun, 15 Feb 2015 00:00:00 (no timezone??), neither of which feedparser can parse.

I don't have time to patch this myself right now, but I wonder if it might not be as simple as doing:

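import feedparser

try:
    import dateparser
except ImportError:
    # dateparser is optional; without it, only the built-in handlers run
    pass
else:
    def dateparser_tuple(string):
        # dateparser.parse() returns None when it cannot parse the string
        parsed = dateparser.parse(string)
        return parsed.utctimetuple() if parsed is not None else None
    feedparser.registerDateHandler(dateparser_tuple)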

In my tests, this handler successfully parses both of the dates mentioned above.
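One detail worth spelling out is what a handler like this returns for dates with and without a timezone. The following is only a rough sketch using the standard library: `make_date_handler` and `fixed_offset_parser` are hypothetical names made up for illustration, and the stand-in parser takes the place of dateparser so the example runs without it installed.

```python
from datetime import datetime, timedelta, timezone


def make_date_handler(parse):
    """Wrap parse (str -> datetime or None) as a handler returning a UTC 9-tuple or None."""
    def handler(date_string):
        try:
            parsed = parse(date_string)
        except (TypeError, ValueError, OverflowError):
            return None  # treat parser errors as "could not parse"
        if parsed is None:
            return None
        # utctimetuple() converts aware datetimes to UTC and treats
        # naive datetimes (no timezone) as if they were already in UTC.
        return parsed.utctimetuple()
    return handler


# Hypothetical stand-in parser: always returns 01:27:25 at UTC+01:00.
def fixed_offset_parser(_date_string):
    return datetime(2019, 2, 19, 1, 27, 25, tzinfo=timezone(timedelta(hours=1)))


aware = make_date_handler(fixed_offset_parser)("ignored")
print(tuple(aware)[:6])  # (2019, 2, 19, 0, 27, 25)
```

So a date like Sun, 15 Feb 2015 00:00:00, which carries no timezone, would end up being interpreted as UTC by this kind of handler. Whether that is the right default for feeds is a judgment call.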

As dateparser trickles into various Linux distributions and (hopefully) standardizes, the built-in date parsing code could eventually be replaced by dateparser as well.

Thank you for any advice.

(Note: the issues affected by this are far-ranging, and we'd need to check them one by one, but I suspect it could fix #135, #114, #91, #75, and #72, so it's totally worth it. :)

Owner

kurtmckee commented Mar 6, 2019

@anarcat I completely agree that it would be great to replace a lot of the edge-case date parsing with a dedicated date parser (like dateparser). My first thought is that I'd like to keep dedicated parsers for the date formats that are documented in the RSS/Atom specs, but there's a lot of edge-case code that could fall back to a dedicated date parser!

Quick question: do you have experience with other libraries besides dateparser?
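The arrangement described in this comment (spec-format parsers first, a general-purpose parser only as a fallback) could look roughly like the sketch below. This is not feedparser's actual code: `parse_rfc3339`, `parse_rfc822`, `load_general_parser`, and `parse_date` are hypothetical names, and the standard-library calls stand in for the dedicated RSS/Atom parsers.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_rfc3339(s):
    """Atom-style timestamps, e.g. 2019-02-19T00:27:25Z."""
    if s.endswith("Z"):
        s = s[:-1] + "+00:00"  # older fromisoformat() versions reject "Z"
    try:
        return datetime.fromisoformat(s)
    except ValueError:
        return None


def parse_rfc822(s):
    """RSS-style timestamps, e.g. Tue, 19 Feb 2019 00:27:25 GMT."""
    try:
        return parsedate_to_datetime(s)
    except (TypeError, ValueError):
        return None


def load_general_parser():
    """Return dateparser.parse if dateparser is installed, else None."""
    try:
        import dateparser
    except ImportError:
        return None
    return dateparser.parse


def parse_date(s, general=None):
    # Spec-documented formats are tried first, in a fixed order.
    for parser in (parse_rfc3339, parse_rfc822):
        parsed = parser(s)
        if parsed is not None:
            return parsed
    # Only strings the dedicated parsers reject reach the fallback.
    return general(s) if general is not None else None


general = load_general_parser()  # may be None; parse_date() copes with that
print(parse_date("2019-02-19T00:27:25Z", general=general))
```

Keeping the fallback as an argument means the optional dependency stays optional, and the dedicated parsers always decide first for well-formed feeds.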

Author

anarcat commented Mar 6, 2019

Owner

kurtmckee commented Mar 6, 2019

Okay great! Thank you for the quick response!
