Dictionary mismatch #3655

Closed
necros2k7 opened this issue May 24, 2023 · 4 comments

@necros2k7

Describe the bug
zstd -d "1.warc.zst"
1.warc.zst : 0 B... 1.warc.zst : Decoding error (36) : Dictionary mismatch
I guess some kind of dictionary to unpack should be used? Where to get it?

To Reproduce
Steps to reproduce the behavior:
Try to decompress any .warc.zst file: zstd -d "1.warc.zst"

Expected behavior
normal unpacking

Desktop (please complete the following information):

  • OS: Win11
  • Version 22H2
  • Other relevant hardware specs: X79 system
@Cyan4973 Cyan4973 self-assigned this May 24, 2023
@Cyan4973
Contributor

> I guess some kind of dictionary to unpack should be used? Where to get it?

We have no idea.

There is no managed repository of public dictionaries at the time of this writing.
Consequently, all dictionaries are private, so it could be anything.

The dictionary ID can be consulted, using the command zstd -lv file.zst, though it will just provide an integer value.
This integer value is supposed to identify a (hopefully) unique file within the compressor's own private collection of dictionaries.
But that's about it, it doesn't tell much about the potential content of such a dictionary.
If one doesn't know where to retrieve the dictionary, chances are, the file can't be decompressed.
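
For readers who want to see where that integer comes from, here is a minimal Python sketch that reads the Dictionary_ID field directly from the first frame header, following the layout described in the Zstandard frame format. The function name `zstd_dictionary_id` is made up for this illustration; it is not part of zstd or any library, and `zstd -lv` remains the supported way to get this value.

```python
import struct
from typing import Optional

ZSTD_FRAME_MAGIC = 0xFD2FB528  # stored little-endian on disk: 28 B5 2F FD


def zstd_dictionary_id(data: bytes) -> Optional[int]:
    """Return the Dictionary_ID of the first zstd frame, or None if the field is absent.

    `data` only needs to contain the first ~10 bytes of the file.
    """
    if len(data) < 5:
        raise ValueError("need at least 5 bytes of frame header")
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic != ZSTD_FRAME_MAGIC:
        raise ValueError("input does not start with a zstd frame")

    descriptor = data[4]                      # Frame_Header_Descriptor
    single_segment = (descriptor >> 5) & 1    # bit 5: Single_Segment_flag
    did_flag = descriptor & 0b11              # bits 1-0: Dictionary_ID_flag
    did_size = (0, 1, 2, 4)[did_flag]         # field width in bytes

    if did_size == 0:
        return None                           # no dictionary ID recorded

    # The Window_Descriptor byte is present only when Single_Segment_flag is 0.
    offset = 5 + (0 if single_segment else 1)
    field = data[offset:offset + did_size]
    if len(field) < did_size:
        raise ValueError("truncated frame header")
    return int.from_bytes(field, "little")
```

Typical use would be `zstd_dictionary_id(open("file.zst", "rb").read(16))`. If the file begins with something other than a regular zstd frame (a skippable frame, for instance), the function raises `ValueError` rather than guessing.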

@JustAnotherArchivist

Ugh, sorry about this spilling over to here.

@necros2k7 .warc.zst uses a custom format that includes the dictionary in the file itself, but regular zstd tooling doesn't know how to extract it. I wrote a little wrapper script to handle that.

For the record, the relevant specification is at http://iipc.github.io/warc-specifications/specifications/warc-zstd/, and the stalled attempt to upstream the embedded dictionary format is at #2349.
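
This is not the wrapper script mentioned above, just a rough Python sketch of the idea so it can be tried on any platform. It assumes, based on my reading of the linked specification, that the dictionary sits in a skippable frame at the very start of the file with magic number 0x184D2A5D, followed by a 4-byte little-endian size and then the dictionary, which may itself be zstd-compressed. Please verify those details against the spec. The function names are invented for this example.

```python
import struct
from typing import BinaryIO, Optional

# Assumption: magic of the skippable frame carrying the dictionary (check the warc-zstd spec).
DICT_FRAME_MAGIC = 0x184D2A5D
ZSTD_FRAME_MAGIC = b"\x28\xb5\x2f\xfd"  # a regular zstd frame (0xFD2FB528)
ZSTD_DICT_MAGIC = b"\x37\xa4\x30\xec"   # a zstd dictionary (0xEC30A437)


def extract_embedded_dictionary(stream: BinaryIO) -> Optional[bytes]:
    """Return the payload of a leading dictionary frame, or None if there is none."""
    header = stream.read(8)
    if len(header) < 8:
        return None
    magic, size = struct.unpack("<II", header)  # frame magic, then payload size
    if magic != DICT_FRAME_MAGIC:
        return None
    payload = stream.read(size)
    if len(payload) != size:
        raise ValueError("truncated dictionary frame")
    return payload


def describe_dictionary(payload: bytes) -> str:
    """Classify the payload by its first four bytes."""
    if payload[:4] == ZSTD_FRAME_MAGIC:
        return "compressed"  # must be decompressed with zstd before use
    if payload[:4] == ZSTD_DICT_MAGIC:
        return "raw"         # usable as a dictionary as-is
    return "unknown"
```

One would write the returned payload to a file, run it through `zstd -d` first if it is reported as "compressed", and then pass the result to the regular tool with `zstd -d -D <dictionary file> 1.warc.zst`. I have not tested that exact sequence against this particular file.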

@necros2k7
Author

necros2k7 commented Jun 13, 2023

@JustAnotherArchivist, can you modify this script for Windows?

@JustAnotherArchivist

I don't see a particular reason why changes would be necessary, but if so, no, I'm afraid I don't have time for that. Maybe Windows Subsystem for Linux is an option. But that's all the advice I can give.

@Cyan4973 Cyan4973 closed this as completed Mar 4, 2024