
Compression ratio is better after tweaking quality 6 -> 1 for big files #222

Closed
tesseract2048 opened this issue Oct 11, 2015 · 7 comments

@tesseract2048

I have a ~30 GB text file filled with ASCII numbers.
If I truncate it to the first 1 GB, brotli outperforms gzip by about 20% in size.
However, if I compress the whole file with quality 1, the brotli-compressed file is only 8% smaller.
Stranger still, if I compress the whole file with quality 6, the brotli-compressed file is actually 11% bigger than the gzip one.

Any theory about what is going on? Thanks.
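(For reference, the comparison above can be sketched with the Python `brotli` bindings and the stdlib `gzip` module; the original report used the command-line tools, so treat this as an approximation of the methodology, not the exact commands used.)

```python
import gzip

def gzip_size(data: bytes, level: int = 9) -> int:
    # Size of the gzip-compressed payload at the given level.
    return len(gzip.compress(data, compresslevel=level))

def brotli_size(data: bytes, quality: int) -> int:
    # Requires the `brotli` package (pip install brotli); quality is 0-11.
    import brotli
    return len(brotli.compress(data, quality=quality))

def ratio_vs_gzip(data: bytes, quality: int) -> float:
    # < 1.0 means the brotli output is smaller than gzip's for this input.
    return brotli_size(data, quality) / gzip_size(data)
```

Running `ratio_vs_gzip` on a 1 GB prefix versus the full file, at quality 1 and quality 6, reproduces the comparison described in this report.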

@tesseract2048 tesseract2048 changed the title Compression ratio is better after tweaking quality 6 -> 1 Compression ratio is better after tweaking quality 6 -> 1 for big files Oct 11, 2015
@tesseract2048
Author

Tested on a 2.79 GB file; the results are the same.

Every line in the file looks like the following one:

30878059857552670 50025829476611184 66874527287706175 375508968765795686 373127765632278220 375342102553363245 364326748989567069 366505919391298506 377457911994795327 363411424571894825 373898759588425536 374334808416099019 360760477436270128 361309365116896886 364352003089439073 375674008634286701 367152582065696876 367855851559397543 367273034879590470 373534666133923980 374767354319435320 364939151523857981 370766209358644373 368849862587483333 369375471723485160 360428992812683960 362756519526083575 368058669706273620 363374354726247701 383336502073316860 390840051326797822 389682673771031688 392042489058586768 382011509102760527 385448191113192450 384274092396816514 379180964994502905 382036106541650951 388203100937202784 393434223891394090 388010572174871871 380003376763688520 386570822226305665 380297123401928640 386596087792217231 383677090571341963 390567909615441534 389005466880157272 393969508707548020 387078554704535241 381737568827717361 392334530352732955 390266000934033127 394425453383066281 383385069065536734 389697536188506696 388212606095605133 388999121467810361 399878203990676585 405963764532775318 398673866377410605 404746856101231847 410863332993714830 399881472443165042 404307238551047893 404064578127766509 402851101716692827 396751564398309658 400444420333068725 408786268458861903 409681960326371980 400376894996149263 401096177915629208 397287185742754389 405113985967409613 413703204463637567 403771931572842749 402507102181568599 407150113799399139 400798926857979021 401045289602881283 407200800125504338 399083388635636878 411847794488541711 411032375813134263 410397688189064473 408149830613424760 404890282039985176 476848146289316439 469545715298139183 481242829052469824 478271635622728360 473675894149529551 470754644261353529 479402329352691534 472168062584460950 476539969431554962 479069280569433230 481430874913536310
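(A small generator of similar test data, for anyone who wants to reproduce this locally. The numbers in the sample line cluster roughly in the 3e17 to 5e17 range; uniform random draws in that range are an assumption, not the reporter's real feature hashes.)

```python
import random

def make_line(rng: random.Random, count: int = 100) -> str:
    # One line of `count` space-separated 18-digit decimal integers,
    # drawn from the range seen in the sample line above (assumed).
    return " ".join(str(rng.randrange(3 * 10**17, 5 * 10**17))
                    for _ in range(count))

def write_test_file(path: str, n_lines: int, seed: int = 42) -> None:
    # Each line is ~1.9 KB, so ~1.5 million lines gives roughly 2.8 GB.
    rng = random.Random(seed)
    with open(path, "w") as f:
        for _ in range(n_lines):
            f.write(make_line(rng) + "\n")
```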

@jyrkialakuijala
Collaborator

Are these sorted 64-bit hashes?

@tesseract2048
Author

Yes, you can say that, @jyrkialakuijala.
They represent sparse features.

@jyrkialakuijala
Collaborator

I have identified a problem in the hashing that degrades the compression performance after 2 GB for quality >= 5. We will fix this within three days.
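(A toy illustration of the kind of failure a 32-bit position counter in a hasher can cause past the 2 GB mark. This is a generic sketch of signed-overflow symptoms, not brotli's actual hashing code.)

```python
def to_int32(x: int) -> int:
    # Reinterpret an unsigned value as a signed 32-bit integer,
    # the way a C `int` position counter would wrap.
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

# A match candidate stored at a byte offset just under 2 GiB...
stored_pos = to_int32(2**31 - 100)   # still positive
# ...while the current position, just past 2 GiB, wraps negative.
current_pos = to_int32(2**31 + 100)  # wraps to a large negative value
# The true backward distance is 200 bytes, but with wrapped
# arithmetic the computed distance is wildly wrong, so the match
# is rejected or mis-addressed and compression degrades.
bad_distance = current_pos - stored_pos
```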

@tesseract2048
Author

Sure, thanks.
I've confirmed that it works as expected with quality <= 4.

@jyrkialakuijala
Collaborator

Levels 5-9 should now work at head. Level 11 (like level 10) still misbehaves for files longer than 2 GB.

@eustas
Collaborator

eustas commented Jun 21, 2016

As mentioned in the previous comment, this should be fixed now.
If not, feel free to reopen the issue.
