Do not include resource WARC records inside the ZIM #197

Closed
benoit74 opened this issue Feb 29, 2024 · 1 comment · Fixed by #198

@benoit74
Collaborator

webrecorder/browsertrix-crawler#457 and webrecorder/browsertrix-crawler#458 have introduced more systematic use of resource WARC records.

As of 1.0.0-beta5, resource WARC records are used for:

  • thumbnails
  • screenshots
  • text extracts
  • page infos (introduced by the PRs mentioned above)

All these resources may be useful to warc2zim's own logic (especially the page infos; we do not use thumbnails, screenshots and text extracts for now), but they should not be embedded directly inside the ZIM, at least not the way a response record is.
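
To illustrate the intent, here is a minimal sketch of how records could be split by WARC record type before anything is written to the ZIM. It is not the actual warc2zim code: `Record`, `split_records`, `EMBEDDABLE_TYPES` and the sample URIs are hypothetical names made up for this example. The only thing taken from the WARC format itself is the record type string (`response`, `resource`, `request`, ...), which readers such as warcio expose as `rec_type`.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed WARC record; a real reader would
# provide richer objects, only the two fields used below matter here.
Record = namedtuple("Record", ["rec_type", "uri"])

# Assumption for this sketch: only response records become ZIM entries.
EMBEDDABLE_TYPES = {"response"}


def split_records(records):
    """Separate records to embed in the ZIM from resource records.

    Resource records are kept aside so that the converter can still use
    them internally (e.g. page infos) without adding them as ZIM entries.
    Any other record type is ignored.
    """
    to_embed, side_data = [], []
    for record in records:
        if record.rec_type in EMBEDDABLE_TYPES:
            to_embed.append(record)
        elif record.rec_type == "resource":
            side_data.append(record)
    return to_embed, side_data


# Illustrative input; the URIs are placeholders, not real crawler output.
records = [
    Record("response", "https://example.com/"),
    Record("resource", "urn:example:page-info"),
    Record("request", "https://example.com/"),
    Record("resource", "urn:example:thumbnail"),
    Record("response", "https://example.com/style.css"),
]

to_embed, side_data = split_records(records)
```

With this split, `to_embed` holds the two response records and `side_data` holds the two resource records, which remain available to the conversion logic without ending up in the ZIM.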

@benoit74 benoit74 added the bug Something isn't working label Feb 29, 2024
@benoit74 benoit74 self-assigned this Feb 29, 2024
@kelson42
Contributor

kelson42 commented Mar 6, 2024

Fixed by #198

@kelson42 kelson42 closed this as completed Mar 6, 2024
@kelson42 kelson42 added this to the 2.0.0 milestone Mar 6, 2024