Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Zstandard to 1.5.0 #12705

Closed
wongsyrone opened this issue Oct 29, 2021 · 10 comments
Closed

Update Zstandard to 1.5.0 #12705

wongsyrone opened this issue Oct 29, 2021 · 10 comments
Labels
Type: Feature Feature request or new feature

Comments

@wongsyrone
Copy link

Describe the feature would like to see added to OpenZFS

Zstandard v1.5.0 is released with large performance improvements, any plan to upgrade it?

How will this feature improve OpenZFS?

Additional context

https://github.com/facebook/zstd/releases/tag/v1.5.0

@wongsyrone wongsyrone added the Type: Feature Feature request or new feature label Oct 29, 2021
@rincebrain
Copy link
Contributor

rincebrain commented Oct 29, 2021

There are a number of issues that have been opened requesting this, and some dev time that has gone into doing it.

As I understand it, one of the tricky parts is that a number of components of ZFS assume that if they decompress something and recompress it, it's byte-for-byte identical to what was originally there. But basically no compression format promises that across version updates, they just promise weaker things like "older/newer compressors within this version range can always de/compress each other's stuff".

So updating zstd would break those assumptions, and finding a good way to work with that that doesn't just bloat ZFS periodically with more zstd versions every time someone updates it is hard.

Some people were working on it. I believe they've mostly stopped, because they got "replace the zstd in ZFS right now with 1.5.0" done, but the "handle older zstd stuff" part was not done, and there was frustration over it not being done.

#11367 is the most recent PR about it. #12081 was a prior bug requesting it.

@wongsyrone
Copy link
Author

How about this: facebook/zstd#731 (comment)

@rincebrain
Copy link
Contributor

rincebrain commented Oct 29, 2021

That says exactly what I already said.

Older and newer compressors can read each other's data.

"Stable format" does not mean "identical compressed output".

If you asked 50 people to write down a description of a specific event that happened, in English, and asked them to all include certain details, it's likely all 50 would be valid English paragraphs, and they could all read each other's paragraphs and agree they said the necessary things, but that no 2 would be identical.

@wongsyrone
Copy link
Author

Got it. If that's the case the entire ZSTD stuff should be removed as it won't get any update due to back & forward compatibility. Some day in the future, ZSTD may have security vulnerabilities (no one can guarantee it has absolutely no vulnerabilities), nobody can fix it as there will always be fragmented ZFS users with different versions in production.

@rincebrain
Copy link
Contributor

By that logic, you should probably remove all the compressors, all the encryption algorithms, etc.

As previously stated, it's not impossible, but currently no one has completed it.

@wongsyrone
Copy link
Author

I'll maintain my own copy as @BrainSlayer has done or stay away from the zstd stuff.

@rincebrain
Copy link
Contributor

That would be unfortunate, if you were to implement all the necessary functionality and decline to share it. I think you would likely tire of maintaining such a fork, and would suggest your effort might be better spent on either improving another project where you think upstream would be receptive to improvements or contributing to this one, which I would not claim is unreceptive to fixes.

I'm not particularly sure why you've decided that this is a lost cause. It went poorly, people fell out, which sucks, but there's no indication that it could never be completed and merged, nobody standing around saying "I will never merge that", and right now it's a "nice improvement", not "known fundamentally broken or insecure in some fashion".

@wongsyrone
Copy link
Author

any compressed data produced by version v1+ respect the specification, and can therefore be decompressed correctly by any decoder from version v1+.

If it's true, there should be no issue from the ZSTD side. As you mentioned before, it's the assumption of "identical binary output" in ZFS makes it no way to jump to one specific version, you have to keep migration steps for all variants. Otherwise, ZFS thinks your data is corrupted, and boom.

facebook/zstd#999 (comment)

@BrainSlayer
Copy link
Contributor

I'll maintain my own copy as @BrainSlayer has done or stay away from the zstd stuff.

what makes you think that i dont maintain it? my tree contains zstd 1.5.0 since a long time and i also keep the source up to date.

@robszy
Copy link

robszy commented Nov 27, 2021

Only way out of it would be rewrite checksumming code do chcecksum of logical data not tying everything to one implementation maybe at penalty of performance and scrubs without loaded keys but we must have way to update compression algorithms but it won't be easy and that's source of maintaining own copy of repo.

Other way in 2100 everyone will still use zstd 1.4.5 in zfs. Bad design decision causes a lot of headaches and a lot of work to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature Feature request or new feature
Projects
None yet
Development

No branches or pull requests

4 participants