
OpenWayback replays the metadata record instead of the original response for revisit records with a duplicate hash #146

@aalsum

Description


I used Heritrix 3.2.0 (H3.2.0) to crawl a set of seed URIs whose content is identical. The WARC file was generated successfully, with the recommended WARC-Refers-To-Target-URI and WARC-Refers-To-Date fields.
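To confirm which records the WARC actually contains, you can list every revisit record together with its WARC-Refers-To-* fields. The following is a minimal sketch, not part of Heritrix or OpenWayback. It assumes Python 3, a WARC/1.0 file, and UTF-8 header blocks. The function names (`iter_warc_headers`, `list_revisits`, `list_revisits_in_file`) are hypothetical helpers.

```python
import gzip
from typing import BinaryIO, Dict, Iterator, List, Tuple


def iter_warc_headers(stream: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield each WARC record's header block as a dict with lowercased names."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of input
        if not line.strip():
            continue  # the CRLF pairs that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers: Dict[str, str] = {}
        while True:
            text = stream.readline().decode("utf-8").rstrip("\r\n")
            if not text:
                break  # a blank line (or EOF) ends the header block
            # Split on the first colon only: values such as URIs contain colons.
            name, _, value = text.partition(":")
            headers[name.strip().lower()] = value.strip()
        # Skip the record body without parsing it.
        stream.read(int(headers.get("content-length", "0")))
        yield headers


def list_revisits(stream: BinaryIO) -> List[Tuple[str, str, str]]:
    """Return (target URI, refers-to URI, refers-to date) for each revisit record."""
    return [
        (
            h.get("warc-target-uri", ""),
            h.get("warc-refers-to-target-uri", ""),
            h.get("warc-refers-to-date", ""),
        )
        for h in iter_warc_headers(stream)
        if h.get("warc-type") == "revisit"
    ]


def list_revisits_in_file(path: str) -> List[Tuple[str, str, str]]:
    """Open a .warc.gz file (one or more concatenated gzip members) and list revisits."""
    with gzip.open(path, "rb") as f:
        return list_revisits(f)
```

Calling `list_revisits_in_file` on the WARC linked below should return a tuple for the test3.txt revisit record that points back to test1.txt. Python's `gzip` module reads concatenated per-record gzip members transparently, so the reader does not need to know how the file was split into members.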

When I try to replay the duplicate URI from this WARC file in the Wayback Machine (WM), it reads the metadata record instead of the revisit record.

I'm using WM built from the latest master branch on GitHub.

WARC: http://stanford.edu/~aalsum/data/WEB-20140806165357001-00000-2081~heritrix-dev.Stanford.EDU~8443.warc.gz
CDX: http://stanford.edu/~aalsum/data/index.cdx

Original URI: http://www.cs.odu.edu/~aalsum/test1.txt (works fine)
Duplicate URI: http://www.cs.odu.edu/~aalsum/test3.txt
