proposal to update WARC-Date spec #6
Relating to the "reduced precision" issue: from reading the Wikipedia page, the ISO8601 standard already allows for this. If we were to broaden the spec to include ISO8601-compatible dates, this should technically be covered (and anything which correctly implements it should handle date comparison).
@PsypherPunk yes, those are exactly what I proposed using, in my "alternative", "not preferred" proposal. http://nlevitt.github.io/warc-specifications/specifications/warc-date/allow-more-precise.html#warc-date-mandatory-2
The "Proposed Revised Spec" looks good to me.
cleymour commented 6 hours ago
I would hesitate to make any particular recommendation, because I think the correct behavior is not obvious. Consider the effect of replacing missing digits with zeroes. Suppose you have a WARC record for http://example.com/ with timestamp "2005", and another record with timestamp "2005-01-05T12:34:56.789Z". Someone requests http://example.com/ from "2005-01-01T00:00:00Z". The capture with timestamp "2005-[00-00T00:00:00]" would be chosen in preference over "2005-01-05T12:34:56.789Z", even though it's more likely that the latter was captured closer to the desired timestamp. What is the preferred playback behavior for timestamps with reduced precision? It's not at all clear to me. Maybe it's something like "return the record with less precise timestamp if it's definitely? probably? the closest to the requested timestamp". But that could be exceedingly difficult to implement, especially given how wayback indexes currently work, without any special provision for reduced precision timestamps. In general, I don't think it's a good idea for standards or specs to make any recommendations, except when they are based on proven implementations. You can never think of everything until you actually do it.
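To make the scenario above concrete, here is a minimal Python sketch of a naive closest-match lookup that fills missing fields with their lowest values. The helpers `pad_down` and `closest` are hypothetical and are not taken from any wayback implementation. One assumption to flag: since month 00 and day 00 are not valid dates, the sketch treats "2005" as 2005-01-01T00:00:00Z, the lowest valid instant.

```python
from datetime import datetime, timezone

# Hypothetical helper: accepts only these few ISO 8601 forms.
_FORMATS = [
    "%Y",
    "%Y-%m",
    "%Y-%m-%d",
    "%Y-%m-%dT%H:%M:%SZ",
    "%Y-%m-%dT%H:%M:%S.%fZ",
]


def pad_down(ts):
    """Parse a timestamp, filling omitted fields with their lowest values."""
    for fmt in _FORMATS:
        try:
            # strptime defaults missing month/day to 1 and time fields to 0.
            return datetime.strptime(ts, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError("unsupported timestamp: %r" % ts)


def closest(captures, requested):
    """Return the capture whose padded timestamp is nearest the request."""
    target = pad_down(requested)
    return min(captures, key=lambda c: abs(pad_down(c) - target))


captures = ["2005", "2005-01-05T12:34:56.789Z"]
chosen = closest(captures, "2005-01-01T00:00:00Z")
print(chosen)
```

Under these assumptions the year-only record is selected, because its padded value coincides exactly with the requested instant, even though the real capture could have happened at any time in 2005.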
FWIW, the current wayback machine behavior is to pad up, not down; e.g., a request for /2005/ is actually equivalent to a request for /20051231235959/, not /20050101000000/. The merits of this should probably be discussed elsewhere, but I'm just pointing out the current default behavior.
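As an illustration of the difference between padding up and padding down, here is a small Python sketch. The function `pad_timestamp` is hypothetical; only the /2005/ example comes from this thread, and the handling of other prefix lengths (such as picking the last day of a given month) is an assumption, not a description of the actual wayback code.

```python
import calendar


def pad_timestamp(partial, up=True):
    """Expand a truncated YYYYMMDDhhmmss prefix to the full 14 digits."""
    n = len(partial)
    if not partial.isdigit() or n not in (4, 6, 8, 10, 12, 14):
        raise ValueError("unsupported timestamp prefix: %r" % partial)
    if not up:
        # Lowest values: month 01, day 01, time 00:00:00.
        return partial + "0101000000"[n - 4:]
    if n == 4:
        return partial + "1231235959"
    if n == 6:
        # Assumption: the last day depends on the month (and leap years).
        last_day = calendar.monthrange(int(partial[:4]), int(partial[4:6]))[1]
        return partial + "%02d235959" % last_day
    # Day already present: only the time fields remain to be filled.
    return partial + "235959"[n - 8:]


print(pad_timestamp("2005"))            # padded up
print(pad_timestamp("2005", up=False))  # padded down
```

With padding up, a year-only request resolves to the last second of that year; with padding down, it resolves to the first.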
I think this issue can be split into an easy part and a not-easy part. We clearly need microsecond precision as an option, so I think we should deal with that first. The issues around reducing the precision by omitting day/month or whatever would seem to need more discussion. So, can I propose that we deal with the microsecond-precision option first? Furthermore, can I propose that this is done in a new pull request that directly modifies the proposed version 1.1 of the specification? I'd also like to suggest that each proposed change is also noted in the Document History section (at the end) so this can act as a change-log for the specification document. This could link back to relevant issues or pull requests, in the same way as for the CHANGES.md files we tend to use elsewhere.
I think we'd agreed in the last discussion to adopt the "alternative" proposal as-is...? If that's the case the current pull request should be fine, although the addition to the "Document History" might be a good practice for future amendments.
@nlevitt's original pull-request was based around adding a new specification document. I am proposing instead that the change is submitted as a modification to the WARC 1.1 standard document itself. However, if the community would rather this is a separate specification, that's fine by me.
Revise WARC-Date specification to permit values with varying levels of precision. It is the same as the "Alternative Proposed Revised Spec" from http://nlevitt.github.io/warc-specifications/specifications/warc-date/allow-more-precise.html but with the addition of the sentence "This document recommends no particular algorithm for choosing a record by date when an exact match is not available." I also added an entry to Document History. See also iipc#6
Took @anjackson's suggestion and created a new pull request against the standard document itself: #21. It replaces the WARC-Date section with my "alternative" proposal, with the addition of the sentence "This document recommends no particular algorithm for choosing a record by date when an exact match is not available." That was my understanding of the consensus established in the phone meeting. Also added a document history entry as requested.