Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix corruption error when compressing blob data with zlib #9572

Closed
wants to merge 1 commit into from

Conversation

Jingkai
Copy link

@Jingkai Jingkai commented Feb 15, 2022

The plain data length may not be big enough if the compression actually expands data. So use deflateBound() to get the upper limit on the compressed output before deflate().

@Jingkai
Copy link
Author

Jingkai commented Feb 17, 2022

@siying pls help review

@ajkr
Copy link
Contributor

ajkr commented Feb 18, 2022

Why is it specific to blob data? Is it because we store data blocks uncompressed when their compressed size would be larger than their uncompressed size, but don't do that for blobs? cc @ltamasi

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ltamasi
Copy link
Contributor

ltamasi commented Feb 18, 2022

Why is it specific to blob data? Is it because we store data blocks uncompressed when their compressed size would be larger than their uncompressed size, but don't do that for blobs? cc @ltamasi

Right. That's actually one of the file format improvements we're planning.

Having said that, I would recommend using a more modern compression algorithm like lz4 or zstd.

@riversand963 riversand963 changed the title Fix corruption error when compressing blob data with zlib. Fix corruption error when compressing blob data with zlib Feb 18, 2022
ajkr pushed a commit that referenced this pull request Mar 3, 2022
Summary:
The plain data length may not be big enough if the compression actually expands data. So use deflateBound() to get the upper limit on the compressed output before deflate().

Pull Request resolved: #9572

Reviewed By: riversand963

Differential Revision: D34326475

Pulled By: ajkr

fbshipit-source-id: 4b679cb7a83a62782a127785b4d5eb9aa4646449
ajkr pushed a commit that referenced this pull request Mar 7, 2022
Summary:
The plain data length may not be big enough if the compression actually expands data. So use deflateBound() to get the upper limit on the compressed output before deflate().

Pull Request resolved: #9572

Reviewed By: riversand963

Differential Revision: D34326475

Pulled By: ajkr

fbshipit-source-id: 4b679cb7a83a62782a127785b4d5eb9aa4646449
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants