feat(repository): Metadata compression config support for dir writer #556

Open
wants to merge 8 commits into
base: kasten

PrasadG193
Collaborator

@PrasadG193 PrasadG193 commented Jun 26, 2024

Overview

This PR adds support for configuring the compressor used for directory metadata (metadata content with the k prefix).

This PR:

  • Adds a metadata compression setting to the policy
  • Sets zstd-fastest as the default metadata compressor in the policy
  • Adds support for setting and showing metadata compression in the kopia policy commands
  • Adds the metadata compression config to the dir writer
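
A minimal sketch of how the new setting might resolve to an effective compressor, assuming a policy chain where the most specific target wins and an unset value falls back to the default. The type `MetadataCompressionPolicy`, its `CompressorName` field, and the function `effectiveMetadataCompressor` are hypothetical names for illustration and may not match the code in this PR.

```go
package main

import "fmt"

// MetadataCompressionPolicy is a hypothetical stand-in for the policy section
// this PR adds; the real type and field names may differ.
type MetadataCompressionPolicy struct {
	CompressorName string // empty means "not defined for this target"
}

// defaultMetadataCompressor mirrors the default named in the PR description.
const defaultMetadataCompressor = "zstd-fastest"

// effectiveMetadataCompressor returns the first compressor defined in the
// chain (most specific policy first) and falls back to the default when no
// policy in the chain defines one.
func effectiveMetadataCompressor(chain ...MetadataCompressionPolicy) string {
	for _, p := range chain {
		if p.CompressorName != "" {
			return p.CompressorName
		}
	}
	return defaultMetadataCompressor
}

func main() {
	unset := MetadataCompressionPolicy{}
	global := MetadataCompressionPolicy{CompressorName: "s2-default"}
	tests := MetadataCompressionPolicy{CompressorName: "none"}

	fmt.Println(effectiveMetadataCompressor(unset))         // nothing defined: default applies
	fmt.Println(effectiveMetadataCompressor(unset, global)) // inherits the global setting
	fmt.Println(effectiveMetadataCompressor(tests, global)) // directory override wins
}
```

The three calls correspond to the situations exercised in the test plan: a fresh repository, a changed global policy, and a per-directory override.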

Test plan

  1. Initialize a repo and check the default policy settings. Validate that the default metadata compression is zstd-fastest.
$ kopia policy show --global
.
.
Compression disabled.

Metadata compression:
  Compressor:                   zstd-fastest   (defined for this target)
.
.
  2. Snapshot the repo dir and observe the stats. Validate that content with the k prefix is compressed with zstd-fastest.
$ kopia content stats                                
Count: 265
Total Bytes: 1.3 MB
Total Packed: 1.2 MB (compression 2.1%)
By Method:
  (uncompressed)         count: 233 size: 1.2 MB
  zstd-fastest           count: 32 size: 51.9 KB packed: 18.9 KB compression: 63.6%
Average: 4.7 KB
Histogram:

        0 between 0 B and 10 B (total 0 B)
        0 between 10 B and 100 B (total 0 B)
       75 between 100 B and 1 KB (total 42.5 KB)
      160 between 1 KB and 10 KB (total 580.6 KB)
       30 between 10 KB and 100 KB (total 601.5 KB)
        0 between 100 KB and 1 MB (total 0 B)
        0 between 1 MB and 10 MB (total 0 B)
        0 between 10 MB and 100 MB (total 0 B)


$ kopia content list --compression  | grep ^k         
k00473f5ceebe4cefc192c66c2d76195b length 323 packed 268 zstd-fastest 17.0% 
k0407700cee06e87b51aab7ffbbce88ef length 1200 packed 503 zstd-fastest 58.1% 
k185e9f3f6243e051a6bb5b1bbdeff657 length 654 packed 372 zstd-fastest 43.1% 
k18a025a61b565c6a0c522b61c7e73e5e length 318 packed 256 zstd-fastest 19.5% 
k18b2a5e67cce76db463e166f57393eeb length 969 packed 440 zstd-fastest 54.6% 
k2cfbb3585badd71e4f856a2ac78a5ee6 length 1660 packed 577 zstd-fastest 65.2% 
k2f26fc9180f12448ce9f986df13b79ef length 1293 packed 516 zstd-fastest 60.1% 
k31cb1c03ffc2c16684147a45c62512de length 4482 packed 1309 zstd-fastest 70.8% 
k3d0606c3842e082bc03c53d8c504795d length 496 packed 312 zstd-fastest 37.1% 
k4030902da299fa1bfae1787179b71d5f length 1514 packed 588 zstd-fastest 61.2% 
k43160d3bf0c096b1a615a462b2004e9a length 2627 packed 781 zstd-fastest 70.3% 
k4b274e92160641cfd58297c3e49a7195 length 497 packed 320 zstd-fastest 35.6% 
k63e25126a08e351c89bfce64c804c51d length 3032 packed 939 zstd-fastest 69.0% 
k66652ef4a3059e04d59e7955fc1b8a23 length 5559 packed 1387 zstd-fastest 75.0% 
k7de5ed738ded8002f86936c740e48ac7 length 821 packed 397 zstd-fastest 51.6% 
k962e84fed4d77665a4c965cfb6dbd706 length 662 packed 365 zstd-fastest 44.9% 
k996615190f76580347be3b72fe060e41 length 6934 packed 1809 zstd-fastest 73.9% 
ka5ed759c873cf268a80a5d2367ca5da3 length 1028 packed 476 zstd-fastest 53.7% 
ka90dfe7aafa487a09c902bcbc81bdd93 length 1135 packed 485 zstd-fastest 57.3% 
kab699518ccfede55bcd4513628af595b length 818 packed 408 zstd-fastest 50.1% 
kad0bc8dfa5d2c91905aa5357567bc364 length 3586 packed 1095 zstd-fastest 69.5% 
kadbec8b7be5f0edfb2c38585e2a10ce0 length 1315 packed 528 zstd-fastest 59.8% 
kc41a1ac95a057c25b381258826e99f5e length 658 packed 366 zstd-fastest 44.4% 
kc71e5bf6ce8a1d733dc2e7c6df174005 length 2440 packed 791 zstd-fastest 67.6% 
kc9ec6c3020b021c34d92de1983bbde6e length 648 packed 366 zstd-fastest 43.5% 
kcbcca443215184c66628ef0b0782a608 length 1328 packed 535 zstd-fastest 59.7% 
kd2b6a4439afd0e230dc201885bde35a7 length 495 packed 311 zstd-fastest 37.2% 
kd300d05ae5b8c1ce52ee1e9ea16937b4 length 827 packed 413 zstd-fastest 50.1% 
kd614a27b8ac03818ad6d190ff8a79f8d length 481 packed 316 zstd-fastest 34.3% 
ke0be124657186d044e9385382e2ca952 length 1157 packed 483 zstd-fastest 58.3% 
ke510ebb5a512616bce5de97ee1879309 length 1604 packed 613 zstd-fastest 61.8% 
kf577c49a7a48668c416df32dc609120d length 1314 packed 546 zstd-fastest 58.4% 
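
The percentage column in the listing above is consistent with space saved, that is (1 - packed/length) × 100: for the first row, 1 - 268/323 ≈ 17.0%. The following is a small sketch that parses one such line and recomputes the figure. It assumes the exact line layout shown above; `contentEntry`, `parseContentLine`, and `savingsPercent` are illustrative names, not part of kopia.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// contentEntry holds the fields of one listing line as shown above.
type contentEntry struct {
	ID         string
	Length     int64
	Packed     int64
	Compressor string
}

// parseContentLine parses a line of the form
// "<id> length <n> packed <m> <compressor> <pct>%".
// Other layouts are rejected with an error.
func parseContentLine(line string) (contentEntry, error) {
	f := strings.Fields(line)
	if len(f) < 6 || f[1] != "length" || f[3] != "packed" {
		return contentEntry{}, fmt.Errorf("unexpected line format: %q", line)
	}
	length, err := strconv.ParseInt(f[2], 10, 64)
	if err != nil {
		return contentEntry{}, fmt.Errorf("bad length in %q: %w", line, err)
	}
	packed, err := strconv.ParseInt(f[4], 10, 64)
	if err != nil {
		return contentEntry{}, fmt.Errorf("bad packed size in %q: %w", line, err)
	}
	return contentEntry{ID: f[0], Length: length, Packed: packed, Compressor: f[5]}, nil
}

// savingsPercent returns the space saved by compression as a percentage.
func savingsPercent(length, packed int64) float64 {
	if length == 0 {
		return 0
	}
	return (1 - float64(packed)/float64(length)) * 100
}

func main() {
	line := "k00473f5ceebe4cefc192c66c2d76195b length 323 packed 268 zstd-fastest 17.0%"
	e, err := parseContentLine(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Prints: k00473f5ceebe4cefc192c66c2d76195b zstd-fastest 17.0%
	fmt.Printf("%s %s %.1f%%\n", e.ID, e.Compressor, savingsPercent(e.Length, e.Packed))
}
```

The same formula reproduces the aggregate line in the stats output: 51.9 KB packed into 18.9 KB gives roughly 63.6%.
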
  3. Set the metadata compression of the global policy to s2-default.
$ kopia policy set --global --metadata-compression=s2-default

$ kopia policy show --global 
.
.
Compression disabled.

Metadata compression:
  Compressor:                     s2-default   (defined for this target)
.
.
  4. Snapshot the kopia/internal directory and view the content stats. Validate that new metadata is compressed with s2-default.
$ kopia content stats                                        
Count: 613
Total Bytes: 2.4 MB
Total Packed: 2.3 MB (compression 2.1%)
By Method:
  (uncompressed)         count: 510 size: 2.2 MB
  s2-default             count: 71 size: 74.7 KB packed: 43.2 KB compression: 42.2%
  zstd-fastest           count: 32 size: 51.9 KB packed: 18.9 KB compression: 63.6%
Average: 3.9 KB
Histogram:

        0 between 0 B and 10 B (total 0 B)
        0 between 10 B and 100 B (total 0 B)
      204 between 100 B and 1 KB (total 107.7 KB)
      361 between 1 KB and 10 KB (total 1.2 MB)
       47 between 10 KB and 100 KB (total 884.3 KB)
        1 between 100 KB and 1 MB (total 102.2 KB)
        0 between 1 MB and 10 MB (total 0 B)
        0 between 10 MB and 100 MB (total 0 B)
  5. Disable metadata compression for the kopia/tests dir.
$ kopia policy set ./tests --metadata-compression=none

$ kopia policy show ./tests
.
.
Compression disabled.

Metadata compression disabled.
.
.
  6. Snapshot the tests dir and inspect the content. The new metadata should show up as uncompressed in the stats.
$ kopia content stats                                 
Count: 768
Total Bytes: 3 MB
Total Packed: 3 MB (compression 1.5%)
By Method:
  (uncompressed)         count: 665 size: 2.9 MB
  s2-default             count: 71 size: 74.7 KB packed: 43.2 KB compression: 42.2%
  zstd-fastest           count: 32 size: 51.9 KB packed: 18.9 KB compression: 63.6%
Average: 3.9 KB
Histogram:

        0 between 0 B and 10 B (total 0 B)
        2 between 10 B and 100 B (total 180 B)
      248 between 100 B and 1 KB (total 131.7 KB)
      455 between 1 KB and 10 KB (total 1.6 MB)
       62 between 10 KB and 100 KB (total 1.2 MB)
        1 between 100 KB and 1 MB (total 102.2 KB)
        0 between 1 MB and 10 MB (total 0 B)
        0 between 10 MB and 100 MB (total 0 B)

@PrasadG193 PrasadG193 changed the title Metadata compression config support for dir writer feat(repository): Metadata compression config support for dir writer Jun 26, 2024
// TODO(prasad): Get rid of complete block once metadata compression setting is implemented for
// all the prefixes
// For now, exclude this for `k` prefixed metadata.
if contentID.HasPrefix() && contentID.Prefix() != "k" && comp == NoCompression && mp.IndexVersion >= index.Version2 {

There is a constant for the dir prefix "k" called objectIDPrefixDirectory. Can we share that?

Collaborator Author

This condition will be updated as per the recent discussion.
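
For illustration only, here is a rough sketch of the shape of the condition under review with the literal "k" replaced by a named constant, as the review comment suggests. Everything here is a stand-in: `contentID`, `prefixDirectory`, `minIndexVersion`, and `appliesToContent` are hypothetical, the prefix rule (a single leading letter from g to z) is an assumption, and the body of the original block is not shown in the excerpt, so the sketch covers only the predicate.

```go
package main

import "fmt"

// prefixDirectory plays the role of the shared constant mentioned in the
// review (objectIDPrefixDirectory); the name here is hypothetical.
const prefixDirectory = "k"

// noCompression and minIndexVersion stand in for NoCompression and
// index.Version2 from the snippet under review.
const (
	noCompression   = ""
	minIndexVersion = 2
)

// contentID is a simplified stand-in for the real content ID type.
type contentID string

// Prefix returns the leading letter when it falls in the range g..z
// (assumed prefix rule), or the empty string for unprefixed IDs.
func (c contentID) Prefix() string {
	if len(c) > 0 && c[0] >= 'g' && c[0] <= 'z' {
		return string(c[0])
	}
	return ""
}

// HasPrefix reports whether the ID carries a prefix.
func (c contentID) HasPrefix() bool { return c.Prefix() != "" }

// appliesToContent mirrors the predicate in the snippet: prefixed content
// other than directory metadata, with no compressor chosen, on an index
// version that is new enough.
func appliesToContent(id contentID, comp string, indexVersion int) bool {
	return id.HasPrefix() &&
		id.Prefix() != prefixDirectory &&
		comp == noCompression &&
		indexVersion >= minIndexVersion
}

func main() {
	fmt.Println(appliesToContent("k00473f5ceebe4cef", noCompression, 2)) // directory metadata is excluded
	fmt.Println(appliesToContent("x00473f5ceebe4cef", noCompression, 2)) // other prefixed content matches
	fmt.Println(appliesToContent("00473f5ceebe4cef", noCompression, 2))  // unprefixed content never matches
}
```

Using one constant on both the writer side and the content manager side keeps the two from drifting apart if the prefix ever changes.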

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
@PrasadG193 PrasadG193 force-pushed the md-compression-setting-k-content branch from bb7e7bd to b29e3a5 Compare July 23, 2024 05:29
@e-sumin e-sumin left a comment

LGTM but please take into account that I'm not an expert in this particular area.

@Shrekster

Blocked on this: #557
