From the extracted links, the response seems to be a redirect, not a `204`.
ato changed the title from "Heritrix appears to write empty WARC records for HTTP 204 responses" to "Heritrix sometimes writes empty WARC records for redirects" on Aug 2, 2018
Just noticed an oddity in our crawls. We have a WARC response with no response in it (see below). This seems to be due to the crawler getting an `HTTP 204` response.
However, I only think that because @ikreymer's `pywb` `cdx-indexer` creates this CDX line:
But frankly I don't understand where it's getting the `204` from!
Assuming it is really a `204` (I'll check the crawl log), the question is: what should Heritrix3 be writing to the WARC file?
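For anyone wanting to check what such a record actually contains, independently of the indexer, here is a minimal sketch. It is not part of Heritrix or pywb, and `http_status_from_block` is a hypothetical helper name. It reads the status code from the first line of a response record's content block and returns `None` when the block is empty or does not start with a status line. It assumes the record's block has already been extracted as bytes and that the status line is HTTP/1.x.

```python
import re

# Matches an HTTP/1.x status line such as "HTTP/1.1 302 Found".
# Assumption: only HTTP/1.x status lines are expected in the block.
STATUS_LINE = re.compile(rb"^HTTP/\d\.\d (\d{3})(?: .*)?$")


def http_status_from_block(block: bytes):
    """Return the HTTP status code at the start of a WARC response
    record's content block, or None if the block is empty or does
    not begin with an HTTP status line."""
    lines = block.splitlines()
    first_line = lines[0] if lines else b""
    match = STATUS_LINE.match(first_line)
    if match is None:
        return None
    return int(match.group(1).decode("ascii"))


if __name__ == "__main__":
    # A block that really holds a redirect response.
    print(http_status_from_block(b"HTTP/1.1 302 Found\r\nLocation: /x\r\n\r\n"))
    # An empty block, as in the record described above.
    print(http_status_from_block(b""))
```

If this returns `None` for the record in question, the `204` in the CDX line is not coming from the record itself, which would narrow the question down to what the crawler wrote rather than what the server sent.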