WARCs with datetime before year 1900 cause error in indexer #603
This seems to be an issue with `strftime`, with potential solutions provided here.
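One common workaround for this kind of `strftime` limitation is to avoid `strftime` entirely when building the index timestamp. Below is a minimal sketch, assuming the indexer needs a 14-digit `YYYYMMDDhhmmss` timestamp and that WARC-Date has whole-second precision with a trailing `Z`. The helper name `warc_date_to_14digit` is hypothetical and not part of any existing codebase.

```python
from datetime import datetime


def warc_date_to_14digit(warc_date):
    """Convert a WARC-Date such as '1899-12-31T23:59:59Z' to 'YYYYMMDDhhmmss'.

    Hypothetical helper: assumes second precision and a literal 'Z' suffix.
    strptime handles pre-1900 years; the output is built with format specs
    instead of strftime, so no strftime year restriction applies.
    """
    dt = datetime.strptime(warc_date, "%Y-%m-%dT%H:%M:%SZ")
    # Explicit zero-padded fields replace strftime("%Y%m%d%H%M%S").
    return "{:04d}{:02d}{:02d}{:02d}{:02d}{:02d}".format(
        dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second
    )


print(warc_date_to_14digit("1899-12-31T23:59:59Z"))  # 18991231235959
```

Building the string from explicit format specs also keeps the year padded to four digits regardless of the platform's `strftime` implementation.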
When would a WARC have a datetime prior to the year 1900?
@shawnmjones A WARC generated through conventional means should not, since 1900 predates the creation of both the WARC spec and the Web. The WARC spec cites the W3C profile of ISO 8601:1988 as the basis for WARC-Date. Dates prior to 1900 are legal there, so they should not cause an exception. However, a date prior to 1900 in this field is most likely the result of misinterpretation, misconfiguration, or a fabricated example, such as the attached one.
I tried this again with 0b9bb3e and the above WARC and was unable to replicate the error. I would like to see if there was some fix in Py3's …
On second look, the above is using Python 2.7. Perhaps this was never an issue with Py3. I can replicate the above with Py2:
...which should be moot as we drop support per #51.
Yes, this should have been fixed in Py3 as per https://bugs.python.org/issue1777412
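A quick way to see the Python 2 vs. Python 3 difference discussed here is the minimal sketch below; the date value is made up for illustration. On Python 2.7 this `strftime` call raised a `ValueError` because the year is before 1900, while on Python 3 releases that include the fix from the linked bug it should format normally.

```python
from datetime import datetime

# Hypothetical pre-1900 WARC-Date value, used only for illustration.
dt = datetime(1899, 12, 31, 23, 59, 59)

# Python 2.7: raises ValueError (year before 1900).
# Python 3 with the bpo-1777412 fix: returns the formatted timestamp.
stamp = dt.strftime("%Y%m%d%H%M%S")
print(stamp)  # 18991231235959
```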
Thanks for finding this reference, @ibnesayeed. With the move to Py3 in #609, we should be able to close this issue, #608, and finally #51.
Closed in #609.
fb_fab_dates 2.warc.txt