Pull Request Overview
This PR adds support for configurable digest algorithms in WARC files by introducing the --warc-digest-algorithm command-line flag. The change allows users to specify different digest algorithms (sha1, sha256, blake3) for WARC record block and payload digests instead of being limited to the default sha1.
- Added WARCDigestAlgorithm configuration field with validation
- Updated WARC writer to use the configured digest algorithm
- Upgraded gowarc dependency to support new digest functionality
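The validation step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `validateDigestAlgorithm` and `supportedDigestAlgorithms` are hypothetical names, and the only facts taken from the PR are the three accepted values (sha1, sha256, blake3).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// supportedDigestAlgorithms holds the values the PR description lists as
// accepted by --warc-digest-algorithm. The variable name is hypothetical.
var supportedDigestAlgorithms = map[string]struct{}{
	"sha1":   {},
	"sha256": {},
	"blake3": {},
}

// validateDigestAlgorithm is a hypothetical helper: it returns nil for a
// supported name and a descriptive error otherwise. Matching is exact and
// case-sensitive, which is an assumption of this sketch.
func validateDigestAlgorithm(name string) error {
	if _, ok := supportedDigestAlgorithms[name]; ok {
		return nil
	}
	// Build a sorted list so the error message is stable across runs.
	names := make([]string, 0, len(supportedDigestAlgorithms))
	for n := range supportedDigestAlgorithms {
		names = append(names, n)
	}
	sort.Strings(names)
	return fmt.Errorf("unsupported WARC digest algorithm %q (supported: %s)",
		name, strings.Join(names, ", "))
}
```

Calling `validateDigestAlgorithm("sha256")` returns nil, while a value such as "md5" yields an error that names the supported choices. Rejecting bad values while the configuration is built keeps the failure close to the flag that caused it.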
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| internal/pkg/config/config.go | Added WARCDigestAlgorithm field and validation logic |
| internal/pkg/archiver/warc.go | Updated WARC writer to use configured digest algorithm |
| go.mod | Upgraded gowarc dependency and added new transitive dependencies |
| cmd/get.go | Added --warc-digest-algorithm command-line flag |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
NGTmeaty left a comment
Looks good besides one comment!
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #449 +/- ##
==========================================
+ Coverage 55.45% 56.43% +0.97%
==========================================
Files 120 128 +8
Lines 7364 7972 +608
==========================================
+ Hits 4084 4499 +415
- Misses 2956 3101 +145
- Partials 324 372 +48
* add: --warc-digest-algorithm
* fix: simplify WARC digest algorithm choice validation
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Jake L <NGTmeaty@users.noreply.github.com>
This pull request introduces support for selecting the digest algorithm used for WARC record digests, allowing users to choose between sha1, sha256, and blake3. It updates command-line flags, configuration, and internal logic to handle this new option, and ensures the selected algorithm is validated and passed through to the WARC writer. Additionally, it updates dependencies to support the new functionality.
Digest Algorithm Selection and Validation:
- Added --warc-digest-algorithm (default: sha1) to the get command, allowing users to specify the digest algorithm for WARC records. Supported values are sha1, sha256, and blake3.
- Updated the Config struct to include a WARCDigestAlgorithm field, and added validation in GenerateCrawlConfig() to ensure the specified algorithm is supported.
- The selected algorithm is resolved with warc.GetDigestFromPrefix.
Dependency Updates:
- Upgraded github.com/internetarchive/gowarc to v0.8.87 to support additional digest algorithms, and added indirect dependencies for github.com/zeebo/blake3 and github.com/klauspost/cpuid/v2.
Minor Fixes:
- Renamed WarcSize to WARCSize.
- Imported the gowarc package in the config file to access digest-related helpers.
PS: copilot wrote that, not that bad?
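For readers unfamiliar with WARC digests, the sketch below shows what selecting an algorithm by name and producing a labelled digest (algorithm, colon, encoded value) might look like. It uses only the Go standard library and is not gowarc's implementation: `newDigestHash` and `labelledDigest` are hypothetical names, the hex encoding is an assumption (real writers may encode differently, for example base32 for sha1), and blake3 is left out because it needs the third-party github.com/zeebo/blake3 module.

```go
package main

import (
	"crypto/sha1"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
)

// newDigestHash maps an algorithm name to a standard-library hash.
// Hypothetical helper; blake3 is intentionally not handled here.
func newDigestHash(name string) (hash.Hash, error) {
	switch name {
	case "sha1":
		return sha1.New(), nil
	case "sha256":
		return sha256.New(), nil
	default:
		return nil, fmt.Errorf("unsupported digest algorithm %q", name)
	}
}

// labelledDigest hashes data with the named algorithm and returns a
// string of the form "<algorithm>:<hex digest>". The hex encoding is an
// assumption of this sketch, not a statement about gowarc's output.
func labelledDigest(name string, data []byte) (string, error) {
	h, err := newDigestHash(name)
	if err != nil {
		return "", err
	}
	// hash.Hash.Write never returns an error, so the result is ignored.
	h.Write(data)
	return name + ":" + hex.EncodeToString(h.Sum(nil)), nil
}
```

With this sketch, `labelledDigest("sha256", []byte("abc"))` returns `sha256:ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad`, the well-known SHA-256 test vector for "abc".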