feat(repository): Add support to configure metadata compression algorithm #550

Closed
wants to merge 9 commits into from

Conversation

PrasadG193
Collaborator

Overview

This PR:

  • Adds a new mutable repository parameter, MetadataCompression, that sets the compression algorithm used for metadata.
  • MetadataCompression can be set with kopia repository set-parameters --metadata-compression=<name>.
  • Metadata compression can be disabled by setting the algorithm to none.
  • Newly created repositories default to zstd-fastest for metadata compression.
  • For backward compatibility, an existing repository whose metadata compression is unset ("") falls back to the default, zstd-fastest.
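
The fallback rules in the list above can be sketched as a small resolver. This is a minimal illustration, not the code in this PR: the Name type stands in for compression.Name, and effectiveMetadataCompression and both constants are hypothetical names chosen for the example.

```go
package main

import "fmt"

// Name stands in for kopia's compression.Name so the sketch is self-contained.
type Name string

const (
	// defaultMetadataCompression is the default named in the PR description.
	defaultMetadataCompression Name = "zstd-fastest"
	// noCompression is the value that disables metadata compression.
	noCompression Name = "none"
)

// effectiveMetadataCompression (hypothetical helper) maps the configured value
// to the algorithm to use and reports whether compression is enabled.
func effectiveMetadataCompression(configured Name) (Name, bool) {
	switch configured {
	case "":
		// Existing repositories have no value stored: fall back to the default.
		return defaultMetadataCompression, true
	case noCompression:
		// Explicitly disabled.
		return "", false
	default:
		return configured, true
	}
}

func main() {
	for _, c := range []Name{"", "none", "zstd-best-compression"} {
		algo, enabled := effectiveMetadataCompression(c)
		fmt.Printf("configured=%q algorithm=%q enabled=%v\n", c, algo, enabled)
	}
}
```

The empty string and none are handled separately on purpose: the first means "never configured", the second means "configured off".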

Test plan

Metadata compression setting on the existing repository

  • Create a kopia repo using kopia CLI built from master and perform a snapshot.
$ ./kopia-master repository status
Config file:         /Users/work/Library/Application Support/kopia/repository.config

Description:         Repository in S3: s3.amazonaws.com xxxxx
Hostname:            xxxxxx
Username:            xxxx
Read-only:           false
Format blob cache:   15m0s

Storage type:        s3
Storage capacity:    unbounded
Storage config:      {
                       xxxxxxx
                     }

Unique ID:           ced6ca00adab6e31e132ae6101066f3c4f816ce8ad6d806d439253085840bffc
Hash:                BLAKE2B-256-128
Encryption:          AES256-GCM-HMAC-SHA256
Splitter:            DYNAMIC-4M-BUZHASH
Format version:      3
Content compression: true
Password changes:    true
Max pack length:     21 MB
Index Format:        v2

Epoch Manager:       enabled
Current Epoch: 0

Epoch refresh frequency: 20m0s
Epoch advance on:        20 blobs or 10.5 MB, minimum 24h0m0s
Epoch cleanup margin:    4h0m0s
Epoch checkpoint every:  7 epochs


$ ./kopia-master content list --compression | grep k4f0d5ad646417c5e8eaa2da1fd83afa6
k4f0d5ad646417c5e8eaa2da1fd83afa6 length 910 packed 371 zstd-fastest 59.2%
  • Apply the changes from this PR and build the kopia binary. View the repository config, perform a snapshot, and check the compression on a directory object.
$ kopia repository status | grep Metadata
Metadata compression: zstd-fastest

$ kopia ls -l k11ab5c4362bd1ecb6b24f9a069eee23e
drwxr-xr-x           13 2024-05-23 16:21:48 IST k1b467e329c385a3b12822981171a59f4  d1/
-rw-r--r--    107374182 2024-04-25 18:20:09 IST Ixa2d871c317f66b5709f15f96998a92b1 myfile

$ kopia content list --compression | grep k1b467e329c385a3b12822981171a59f4
k1b467e329c385a3b12822981171a59f4 length 302 packed 247 zstd-fastest 18.2% 
  • Set the metadata compression algorithm and perform a snapshot. Check the compression used on a directory object.
$ kopia repository set-parameters --metadata-compression=zstd-best-compression
 - setting metadata compression algorithm to zstd-best-compression.

deleting /Users/work/Library/Caches/kopia/1f4078815d31fd80/kopia.repository
deleting /Users/work/Library/Caches/kopia/1f4078815d31fd80/kopia.blobcfg
NOTE: Repository parameters updated, you must disconnect and re-connect all other Kopia clients.

$ kopia repository status | grep Metadata
Metadata compression: zstd-best-compression

$ kopia ls -l k209cea5e887d888179ee1f69e294f67f
drwxr-xr-x           13 2024-05-23 16:21:48 IST k1b467e329c385a3b12822981171a59f4  d1/
drwxr-xr-x            0 2024-05-23 16:27:11 IST kc96eb7033e7f2e6ac093e416f50543df  d2/
drwxr-xr-x            0 2024-05-23 16:30:07 IST kc81171fd0ac56f50fa48cae14a0ffd63  d3/
-rw-r--r--    107374182 2024-04-25 18:20:09 IST Ixa2d871c317f66b5709f15f96998a92b1 myfile

$ kopia content list --compression | grep kc81171fd0ac56f50fa48cae14a0ffd63
kc81171fd0ac56f50fa48cae14a0ffd63 length 153 packed 171 zstd-best-compression 0% 

  • Disable metadata compression and perform a snapshot. Validate the compression on a directory object.
$ kopia repository set-parameters --metadata-compression=none
 - setting metadata compression algorithm to none.

deleting /Users/work/Library/Caches/kopia/1f4078815d31fd80/kopia.repository
deleting /Users/work/Library/Caches/kopia/1f4078815d31fd80/kopia.blobcfg
NOTE: Repository parameters updated, you must disconnect and re-connect all other Kopia clients.

$ kopia ls -l k59abdd24f3fe80ebf13fd825f12239da
drwxr-xr-x            0 2024-05-23 16:32:19 IST k317ca6f933d4cf67d4d3d725aa124631  d4/
-rw-r--r--    107374182 2024-04-25 18:20:09 IST Ixa2d871c317f66b5709f15f96998a92b1 myfile
drwxr-xr-x           13 2024-05-23 16:21:48 IST k1b467e329c385a3b12822981171a59f4  d1/
drwxr-xr-x            0 2024-05-23 16:27:11 IST kc96eb7033e7f2e6ac093e416f50543df  d2/
drwxr-xr-x            0 2024-05-23 16:30:07 IST kc81171fd0ac56f50fa48cae14a0ffd63  d3/

$ kopia content list --compression | grep k317ca6f933d4cf67d4d3d725aa124631
k317ca6f933d4cf67d4d3d725aa124631 length 154 packed 182 - 

Metadata compression setting on a new repository

  • Create a new kopia repo
$ kopia repository status
Config file:         /Users/work/Library/Application Support/kopia/repository.config

Description:         Repository in S3: s3.amazonaws.com xxxxx
Hostname:            xxxxxx
Username:            xxxxx
Read-only:           false
Format blob cache:   15m0s

Storage type:        s3
Storage capacity:    unbounded
Storage config:      {
                       xxxxxxx
                     }

Unique ID:           6da473a3d3a65bfbb42ab50635e61b9303bbb2711ce5567892daa1f7acb944f8
Hash:                BLAKE2B-256-128
Encryption:          AES256-GCM-HMAC-SHA256
Splitter:            DYNAMIC-4M-BUZHASH
Format version:      3
Content compression: true
Password changes:    true
Max pack length:     21 MB
Index Format:        v2

Metadata compression: zstd-fastest

Epoch Manager:       enabled
Current Epoch: 0

Epoch refresh frequency: 20m0s
Epoch advance on:        20 blobs or 10.5 MB, minimum 24h0m0s
Epoch cleanup margin:    4h0m0s
Epoch checkpoint every:  7 epochs
  • Perform a snapshot and check compression on a directory object.
$ kopia ls -l k8fe379641e0ef8282fa36df90f2f032e
drwxr-xr-x           13 2024-05-23 16:45:15 IST k2babeb0fe4442c4a28b464b2256686dc  d1/

$ kopia content list --compression | grep k2babeb0fe4442c4a28b464b2256686dc

  • Disable metadata compression and perform a snapshot. Check the compression on a directory object.
$ kopia repository set-parameters --metadata-compression=none
 - setting metadata compression algorithm to none.

deleting /Users/work/Library/Caches/kopia/95f4eeb554efde52/kopia.repository
deleting /Users/work/Library/Caches/kopia/95f4eeb554efde52/kopia.blobcfg
NOTE: Repository parameters updated, you must disconnect and re-connect all other Kopia clients.


$ kopia repository status | grep Metadata
Metadata compression: disabled


$ kopia ls -l k2f1683d4f856301a8c46f69a9384b216
drwxr-xr-x           13 2024-05-23 16:45:15 IST k2babeb0fe4442c4a28b464b2256686dc  d1/
drwxr-xr-x            0 2024-05-23 16:47:38 IST ka039535722f527e224754685e4fa536b  d2/

$ kopia content list --compression | grep ka039535722f527e224754685e4fa536b
ka039535722f527e224754685e4fa536b length 154 packed 182 - 

PrasadG193 and others added 5 commits May 21, 2024 20:25
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
Add unit tests

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
@PrasadG193 PrasadG193 marked this pull request as ready for review May 23, 2024 19:55

@redgoat650 redgoat650 left a comment

One minor suggestion, but +1 from me apart from that

cli/command_repository_set_parameters.go (outdated, resolved)
repo/content/content_manager_test.go (resolved)
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
@PrasadG193 PrasadG193 requested a review from plar June 6, 2024 06:49

@plar plar left a comment

LGTM

@julio-lopez julio-lopez left a comment

@PrasadG193

According to the PR description, you performed a manual end-to-end test to verify that the setting was applied and had an effect.

Can you please add an automated test for those steps?

Also, see inline questions.

🥇Thanks for doing this.

@@ -2403,6 +2405,13 @@ func (s *contentManagerSuite) newTestContentManager(t *testing.T, st blob.Storag
	return s.newTestContentManagerWithTweaks(t, st, nil)
}

func (s *contentManagerSuite) newTestContentManagerWithMetadataCompression(t *testing.T, st blob.Storage, comp compression.Name) *WriteManager {

Is comp used?

I wonder why it was not flagged by a linter. 🤔
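
On the linter question: the Go compiler rejects unused local variables and imports but accepts unused function parameters, so a helper of this shape builds cleanly. Whether a linter reports it depends on which checks are enabled, and nothing here is a statement about this repository's lint configuration. A minimal sketch with hypothetical names:

```go
package main

import "fmt"

// newManagerWithCompression is a hypothetical helper shaped like the test
// helper in the diff: it accepts comp but never reads it. This compiles,
// because Go only rejects unused local variables and imports.
func newManagerWithCompression(storage string, comp string) string {
	return "manager for " + storage
}

func main() {
	// Both calls return the same value because comp is ignored.
	fmt.Println(newManagerWithCompression("s3", "zstd-fastest"))
	fmt.Println(newManagerWithCompression("s3", "none"))
}
```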

Collaborator Author

This function is not used anymore; I forgot to clean it up. I've removed it.

	MaxPackSize         int              `json:"maxPackSize,omitempty"`         // maximum size of a pack object
	IndexVersion        int              `json:"indexVersion,omitempty"`        // force particular index format version (1,2,..)
	EpochParameters     epoch.Parameters `json:"epochParameters,omitempty"`     // epoch manager parameters
	MetadataCompression compression.Name `json:"metadataCompression,omitempty"` // metadata compression algorithm name
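
The omitempty tag on the new field ties in with the empty-string case in the PR description: parameters serialized before this change carry no metadataCompression key, so they decode to "". A minimal sketch of that round trip follows. Name and mutableParams are trimmed, hypothetical stand-ins rather than kopia's types, and the pack size value is arbitrary.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Name stands in for compression.Name.
type Name string

// mutableParams is a trimmed, hypothetical stand-in for the struct in the diff.
type mutableParams struct {
	MaxPackSize         int  `json:"maxPackSize,omitempty"`
	MetadataCompression Name `json:"metadataCompression,omitempty"`
}

// encode marshals the parameters to JSON.
func encode(p mutableParams) string {
	b, err := json.Marshal(p)
	if err != nil {
		panic(err)
	}

	return string(b)
}

// decode unmarshals JSON into the parameters struct.
func decode(s string) (mutableParams, error) {
	var p mutableParams
	err := json.Unmarshal([]byte(s), &p)

	return p, err
}

func main() {
	// An unset value is omitted from the encoded form entirely.
	fmt.Println(encode(mutableParams{MaxPackSize: 100}))

	// An explicit "none" is written out, so it stays distinct from "unset".
	fmt.Println(encode(mutableParams{MaxPackSize: 100, MetadataCompression: "none"}))

	// Older JSON without the key decodes to the zero value "".
	p, err := decode(`{"maxPackSize":100}`)
	fmt.Printf("%q %v\n", p.MetadataCompression, err)
}
```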

The approach implemented in this PR is to have a repo-wide setting in MutableParameters.

Does it make sense to make this a policy setting instead?
Are there any tradeoffs in doing so, and if so, which ones?

Collaborator Author

@PrasadG193 PrasadG193 Jun 14, 2024

The main disadvantage was implementation and testing complexity. Since this feature is not expected to be used frequently (it applies only to special cases), we went with the simplest option, which is also the quickest to implement. We don't need to set metadata compression per file; the setting applies to the whole repo rather than at the file level.

Signed-off-by: Prasad Ghangal <prasad.ganghal@veeam.com>
@PrasadG193
Collaborator Author

Closing in respect of #556

@PrasadG193 PrasadG193 closed this Jul 4, 2024
Labels
None yet
4 participants