testing compression #166

Closed
flibustenet opened this issue Jan 8, 2020 · 6 comments

Comments

@flibustenet

flibustenet commented Jan 8, 2020

I'm trying to test compression, but I don't see any effect.

$ kopia repository create filesystem --path /ocean/kz
$ kopia policy set --compression=gzip-best-compression /ocean/kz
$ kopia policy show /ocean/kz
Policy for wilk@thinkpad:/ocean/kz:
...
Compression:
  Compressor: "gzip-best-compression" (defined for this target)
  Compress files regardless of extensions.
  Compress files of all sizes.

$ kopia snapshot ~/projets/flibuste

The repository is exactly the same size as without compression (580M for 617M of source, versus 259M with tar.gz).
Did I miss some configuration?

I also tried setting compression with --global.

Edit: I built kopia from today's clone.

@jkowalski
Contributor

jkowalski commented Jan 8, 2020

You need to set the policy on the source directory (~/projects/...), not on the repository.

What kinds of files are you compressing? Many files these days are not really compressible at all.
Try a directory full of source code, such as Kopia itself.

To verify compression, run kopia snapshot ls to find the directory entry (it starts with k<hash>), then run kopia ls -l k<hash>. Any entries with Z in them are compressed.
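
As a rough way to judge whether data is compressible at all before blaming the tool, the point above can be sketched in Go with the standard library's gzip. This is only an illustration, not Kopia's code: the helper names (gzipSize, sampleText, sampleRandom) and the sample data are made up, and pseudo-random bytes merely stand in for already-compressed files.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"math/rand"
)

// gzipSize returns the number of bytes produced by gzip-compressing data
// at the highest compression level.
func gzipSize(data []byte) (int, error) {
	var buf bytes.Buffer
	w, err := gzip.NewWriterLevel(&buf, gzip.BestCompression)
	if err != nil {
		return 0, err
	}
	if _, err := w.Write(data); err != nil {
		return 0, err
	}
	// Close flushes the remaining compressed bytes and the gzip footer.
	if err := w.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

// sampleText builds repetitive, SQL-dump-like text (made-up content).
func sampleText(lines int) []byte {
	return bytes.Repeat([]byte("INSERT INTO t VALUES (1, 'abc');\n"), lines)
}

// sampleRandom builds pseudo-random bytes, standing in for files that are
// already compressed (images, archives, video).
func sampleRandom(n int) []byte {
	r := rand.New(rand.NewSource(1))
	b := make([]byte, n)
	r.Read(b)
	return b
}

func main() {
	samples := []struct {
		name string
		data []byte
	}{
		{"repetitive text", sampleText(10000)},
		{"random bytes", sampleRandom(100000)},
	}
	for _, s := range samples {
		size, err := gzipSize(s.data)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-16s %7d -> %7d bytes\n", s.name, len(s.data), size)
	}
}
```

The repetitive text should shrink to a small fraction of its size, while the random bytes should stay about the same size or grow slightly.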

@flibustenet
Author

flibustenet commented Jan 8, 2020

It works when the policy is set on the source. I also tried with the kopia source code and that works too.

But I found an issue with a plain-text SQL dump, which is very compressible (153M, can be gzipped to 10M).

$ ls -lh /tmp/data
total 153M
-rw-r--r-- 1 wilk wilk 153M janv.  8 16:30 dump.sql

Without compression

$  kopia snapshot /tmp/data
uploaded snapshot 034c... (root k8b...) in 4.407741852s
$ du -shc /ocean/kk
153M	/ocean/kk

With compression

uploaded snapshot 70... (root k9b6...) in 1m9.976920549s
$ du -shc /ocean/kk
311M	/ocean/kk

I cleared the cache and created a new repository each time.

Edit: memory usage also gets very high (800M).

@jkowalski
Contributor

Weird, I'll investigate this - I downloaded a 163MB SQL dump from https://launchpad.net/test-db/+download and I can also see this.

jkowalski referenced this issue in jkowalski/kopia Jan 9, 2020
this effectively defeated the purpose of compression, caused high
memory usage and other kinds of bad behavior.

refactored the code to prevent this issue by resetting the buffer
at the caller not callee.

fixed previous e2e test to catch the issue mentioned in #166,
verified it fails against master and passes with this change.
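
The beginning of the commit message is cut off above, so the exact cause is not stated in this thread. As a purely hypothetical illustration of the kind of problem that "resetting the buffer at the caller not callee" guards against, here is a minimal Go sketch. It is not Kopia's code: compressInto and storeChunks are invented names, and the scenario (a reused output buffer that nobody clears) is an assumption.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// compressInto is a hypothetical helper: it gzip-compresses input and appends
// the result to out. It never resets out; that is left to the caller.
func compressInto(out *bytes.Buffer, input []byte) error {
	w := gzip.NewWriter(out)
	if _, err := w.Write(input); err != nil {
		return err
	}
	return w.Close()
}

// storeChunks compresses each chunk through one reused buffer and returns the
// total number of bytes that would be written to storage. With resetPerChunk
// set to false it mimics the assumed bug: output from earlier chunks stays in
// the buffer and is counted (stored) again with every later chunk.
func storeChunks(chunks [][]byte, resetPerChunk bool) (int, error) {
	var buf bytes.Buffer
	total := 0
	for _, c := range chunks {
		if resetPerChunk {
			buf.Reset() // the caller clears the buffer before each use
		}
		if err := compressInto(&buf, c); err != nil {
			return 0, err
		}
		total += buf.Len()
	}
	return total, nil
}

func main() {
	chunk := bytes.Repeat([]byte("INSERT INTO t VALUES (1, 'abc');\n"), 1000)
	chunks := [][]byte{chunk, chunk, chunk, chunk}

	buggy, err := storeChunks(chunks, false)
	if err != nil {
		panic(err)
	}
	fixed, err := storeChunks(chunks, true)
	if err != nil {
		panic(err)
	}
	fmt.Printf("without reset: %d bytes, with reset: %d bytes\n", buggy, fixed)
}
```

In this sketch the stale buffer makes stored output and memory use grow with every chunk, which is the general shape of the symptoms reported earlier in the thread (a repository larger than its source and high memory usage), though the real cause may have differed.
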
@flibustenet
Author

I can confirm that it fixes the issue \o/. You fixed it very quickly.
My repository (617M) now takes 262M, which is less than with borg (305M), but it is slower (kopia 34s, borg 16s). On the second snapshot kopia is a lot faster than borg...

I will now test it on my home directory on a daily basis to compare with real data.

Thanks!

@jkowalski
Contributor

BTW, you can see which compression algorithm is the fastest with:

kopia benchmark compression --data-file ~/employees_db/load_salaries.dump --repeat 1
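
To get a feel for the speed versus size trade-off that such a benchmark measures, here is a small Go sketch that times the standard library's gzip at three levels. It is not how kopia benchmark compression is implemented, it covers only gzip (s2 and the other algorithms Kopia offers are not in the standard library), and the result type, benchLevel, and the sample data are all made up for illustration.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"time"
)

// result holds the outcome of compressing one input at one gzip level.
type result struct {
	level   int
	size    int
	elapsed time.Duration
}

// benchLevel compresses data once at the given gzip level and records the
// output size and the wall-clock time taken.
func benchLevel(data []byte, level int) (result, error) {
	var buf bytes.Buffer
	start := time.Now()
	w, err := gzip.NewWriterLevel(&buf, level)
	if err != nil {
		return result{}, err // invalid level
	}
	if _, err := w.Write(data); err != nil {
		return result{}, err
	}
	if err := w.Close(); err != nil {
		return result{}, err
	}
	return result{level: level, size: buf.Len(), elapsed: time.Since(start)}, nil
}

func main() {
	// Made-up, SQL-dump-like sample data; substitute a real file's contents
	// to get numbers that mean something for your own workload.
	data := bytes.Repeat([]byte("INSERT INTO salaries VALUES (10001, 60117);\n"), 50000)

	levels := []int{gzip.BestSpeed, gzip.DefaultCompression, gzip.BestCompression}
	for _, level := range levels {
		r, err := benchLevel(data, level)
		if err != nil {
			panic(err)
		}
		fmt.Printf("level %2d: %8d -> %7d bytes in %v\n", r.level, len(data), r.size, r.elapsed)
	}
}
```

Timings from a single run like this are noisy, so repeat the measurement several times before drawing conclusions.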

@flibustenet
Author

After changing the algorithm, it's a lot faster with s2. It's still a little slower than borg (lz4) but, for me, I don't think it needs more investigation than that.

jkowalski added a commit that referenced this issue Jan 10, 2020