where shall parameters get stored? #210

Open
ThomasWaldmann opened this issue Mar 1, 2015 · 3 comments

Comments

@ThomasWaldmann

As already seen in PR #207 and issue #209 there is some need for more per-repository and/or per-payload parameter storage:

  • key / encryption type
  • hash / mac algorithm
  • compression algorithm + compression algorithm parameters

Currently, there seem to be 2 places where such information can be stored:

  • first byte of every payload
  • $repopath/config file (.ini style)

While some stuff could be defined once and be global per repository, it might be more flexible to have it in each payload, so you can switch to new stuff without having to start a new repo.

But: one byte is not enough to represent all we need.

@ThomasWaldmann

The payload currently uses byte0 with the values 0x00, 0x01 and 0x02 to give the key/encryption type. These values should still be recognized in the future for backwards compatibility, but new data should always be written in the new flexible format, so that the old format can be dropped at some point.

First idea (superseded by msgpack, see comment below) for a new flexible format for the parameter record:

  • use byte0 values 0x03 and beyond
  • look up the parser function in mapping[byte0]. The parser sets up everything as needed and returns the offset of the data after the parameter record (here: 4, old types: 1)
  • byte0: TYPE, 0x03 for now (old: 0x00 .. 0x02)
  • byte1: key/encryption type (0x00, 0x01, 0x02, ... (as previously put into byte0))
  • byte2: compression method
  • byte3: hash / hmac method
  • byte4..n: everything that was at/after offset 1 with old format
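A minimal sketch of the lookup described above, not the code from PR #207. The names `Params`, `PARSERS`, `parse_legacy`, `parse_v3` and `parse_payload` are hypothetical, and so are the defaults assumed for old payloads, which carry no compression or hash byte:

```python
from typing import Callable, Dict, NamedTuple, Tuple


class Params(NamedTuple):
    key_type: int      # key/encryption type (old byte0 values 0x00..0x02)
    compression: int   # compression method
    mac: int           # hash / hmac method


# Assumption: old payloads implicitly use method 0 for compression and hash.
LEGACY_COMPRESSION = 0
LEGACY_MAC = 0


def parse_legacy(payload: bytes) -> Tuple[Params, int]:
    # Old format: byte0 is the key/encryption type, data starts at offset 1.
    return Params(payload[0], LEGACY_COMPRESSION, LEGACY_MAC), 1


def parse_v3(payload: bytes) -> Tuple[Params, int]:
    # New format: byte0 = TYPE (0x03), byte1..3 = parameters, data at offset 4.
    if len(payload) < 4:
        raise ValueError("truncated parameter record")
    return Params(payload[1], payload[2], payload[3]), 4


# mapping[byte0] -> parser function
PARSERS: Dict[int, Callable[[bytes], Tuple[Params, int]]] = {
    0x00: parse_legacy,
    0x01: parse_legacy,
    0x02: parse_legacy,
    0x03: parse_v3,
}


def parse_payload(payload: bytes) -> Tuple[Params, bytes]:
    if not payload:
        raise ValueError("empty payload")
    try:
        parser = PARSERS[payload[0]]
    except KeyError:
        raise ValueError("unknown payload type 0x%02x" % payload[0])
    params, offset = parser(payload)
    return params, payload[offset:]
```

With this shape, a later format only needs a new parser function and one more entry in the mapping.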

So, we have 3 bytes more to store, but gain flexibility for the future. Also the code gets simpler than in PR #207. As we have 256 values for compression, we could even map some parameters directly, e.g. use 0-9 for gzip+level, 10-19 for lzma+level, etc. (alternatively: use 1 byte more for compression params).
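To illustrate the direct mapping of parameters into the compression byte, here is a rough sketch. It assumes exactly the ranges mentioned above (0-9 for gzip+level, 10-19 for lzma+level) and treats everything else as unassigned; the function and table names are made up for this example:

```python
# Assumed layout of the compression byte: 10 values per method.
METHODS = ("gzip", "lzma")   # 0-9: gzip+level, 10-19: lzma+level
LEVELS_PER_METHOD = 10


def encode_compression(method: str, level: int) -> int:
    if method not in METHODS:
        raise ValueError("unknown compression method: %r" % method)
    if not 0 <= level < LEVELS_PER_METHOD:
        raise ValueError("level must be in 0..9")
    return METHODS.index(method) * LEVELS_PER_METHOD + level


def decode_compression(value: int):
    index, level = divmod(value, LEVELS_PER_METHOD)
    if not 0 <= index < len(METHODS):
        raise ValueError("unassigned compression byte: %d" % value)
    return METHODS[index], level
```

The alternative from the text, a separate byte for the compression parameters, would avoid squeezing method and level into one value, at the cost of one more byte per payload.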

@ThomasWaldmann

Some measurements with PR #207 code - all tests on local SSD filesystems:

--compression=6 --mac=0 (zlib default level 6 + sha256)
Duration: 6 minutes 53.29 seconds
Number of files: 247725 6.03 GB 2.36 GB 2.15 GB

--compression=6 --mac=1 (zlib default level 6 + sha512-256)
Duration: 6 minutes 42.46 seconds
Number of files: 247725 6.03 GB 2.36 GB 2.15 GB

--compression=1 --mac=1 (fastest zlib compression + sha512-256)
Duration: 4 minutes 29.36 seconds
Number of files: 247725 6.03 GB 2.53 GB 2.31 GB

--compression=0 --mac=1 (no compression + sha512-256)
Duration: 4 minutes 15.61 seconds
Number of files: 247725 6.03 GB 6.04 GB 5.49 GB

@ThomasWaldmann

I changed the 0x03 format again to use msgpack, which gives more flexibility and simpler code.
It also uses a Meta namedtuple now, which removes the pain of all these hardcoded offsets and byte ranges.
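
Roughly what that could look like, as a sketch only and not the actual code: the fields of `Meta`, the helper names and the exact record layout are assumptions made for this example. It needs the third-party `msgpack` package:

```python
from collections import namedtuple

import msgpack  # third-party: pip install msgpack

# Assumed fields; the real Meta namedtuple may carry different ones.
Meta = namedtuple("Meta", "key_type compression mac")

TYPE_MSGPACK = 0x03  # byte0 value of the new format


def pack_payload(meta, data):
    # byte0 = TYPE, followed by one msgpack object: [meta fields, data]
    body = msgpack.packb([list(meta), data], use_bin_type=True)
    return bytes([TYPE_MSGPACK]) + body


def unpack_payload(payload):
    if payload[0] != TYPE_MSGPACK:
        raise ValueError("not a msgpack parameter record")
    fields, data = msgpack.unpackb(payload[1:], raw=False)
    # Named access instead of hardcoded offsets.
    return Meta(*fields), data
```

Adding a field later only means extending the namedtuple, no offsets have to be shifted.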
