Fix for ServerNotModified WARC revisit records incorrectly record WARC-Payload-Digest #118

Merged
nlevitt merged 4 commits into internetarchive:master from kris-sigur:HER-2080 on Mar 23, 2015

@kris-sigur
Collaborator

kris-sigur commented Mar 20, 2015

Fixes issue: https://webarchive.jira.com/browse/HER-2080
Also adds crawl.log annotations for server-not-modified duplicates.

Relates to: iipc/openwayback#224

kris-sigur added some commits Mar 11, 2015

Stop automatically writing WARC-Payload-Digest for revisit records.
Refer that functionality to the relevant revisit profile classes.
Use digest from previous capture when encountering 304s.
The previous capture may also have been a 304 but there must be an
original 200 capture at the start. This assumption may fail when using
an index generated by a version of Heritrix without this fix. But even
then you should be no worse off than before.
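
The digest handling described in these two commit messages could be sketched roughly as follows. This is an illustration only, written in Python rather than taken from the Heritrix (Java) code: the function name, the profile labels, and the `payload_digest` key are hypothetical, and omitting the header when no previous digest exists is an assumption, not something the PR states.

```python
from typing import Optional

# Hypothetical profile labels used only in this sketch.
SERVER_NOT_MODIFIED = "server-not-modified"
IDENTICAL_PAYLOAD_DIGEST = "identical-payload-digest"


def revisit_payload_digest(profile: str,
                           current_digest: Optional[str],
                           previous_capture: Optional[dict]) -> Optional[str]:
    """Pick the value to write as WARC-Payload-Digest on a revisit record.

    Returns None when no usable digest is known; in this sketch that means
    the header is left out (an assumption).
    """
    if profile == SERVER_NOT_MODIFIED:
        # A 304 response carries no payload, so a digest computed from the
        # current fetch would not describe the archived content. Reuse the
        # digest stored with the previous capture instead.
        if previous_capture is None:
            return None
        return previous_capture.get("payload_digest")
    # For digest-based duplicates the payload was fetched and hashed now.
    return current_digest


# A 200 capture followed by two 304s: each 304 stores the digest it
# inherited, so the value from the original 200 is carried down the chain.
original = {"status": 200, "payload_digest": "sha1:ORIGINAL"}  # placeholder value
first_304 = {
    "status": 304,
    "payload_digest": revisit_payload_digest(
        SERVER_NOT_MODIFIED, "sha1:EMPTYBODY", original),
}
second_304_digest = revisit_payload_digest(
    SERVER_NOT_MODIFIED, "sha1:EMPTYBODY", first_304)
```

If the stored history comes from an index written before this fix, the previous entry may hold a digest that was computed for a 304 itself. The sketch would then simply propagate that value, which matches the "no worse off than before" caveat in the commit message.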

nlevitt added a commit that referenced this pull request Mar 23, 2015

Merge pull request #118 from kris-sigur/HER-2080
Fix for ServerNotModified WARC revisit records incorrectly record WARC-Payload-Digest

@nlevitt nlevitt merged commit bcae99d into internetarchive:master Mar 23, 2015

@kris-sigur kris-sigur deleted the kris-sigur:HER-2080 branch Aug 20, 2015
