Optimize pack readHeader() implementation #1574

Merged 1 commit on Jan 24, 2018

@ifedorenko ifedorenko commented Jan 24, 2018

What is the purpose of this change? What does it change?

Load the pack header length and 15 header entries with a single backend
request. This eliminates the separate header Load() request for most pack
files and significantly improves index.New() performance.

Was the change discussed in an issue or in the forum before?

See #1567

Checklist

  • I have read the Contribution Guidelines
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in a subdir of changelog/x.y.z that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

Load pack header length and 15 header entries with single backend
request. This eliminates separate header Load() request for most pack
files and significantly improves index.New() performance.

Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
const maxHeaderSize = 16 * 1024 * 1024

// we require at least one entry in the header, and one blob for a pack file
var minFileSize = entrySize + crypto.Extension

// number of header entries to download as part of header-length request
var eagerEntries = uint(15)

@ifedorenko ifedorenko commented

@fd0 This number is based on stats from a single 435 GB repository (where more than 98% of all packs have 15 or fewer header entries, fwiw). It may be useful to get stats from other repositories, assuming you have access to some or know interested users who can provide the info.
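
For anyone who wants to collect such stats from another repository, the evaluation could look roughly like the sketch below. The function names (eagerBytes, coverage) are hypothetical, the sizes 37 and 32 are assumptions, and the sample counts in main are made up for illustration; they are not measurements from any real repository.

```go
package main

import "fmt"

// Assumed sizes, for illustration only.
const (
	entrySize        = 37 // assumed size of one header entry
	cryptoOverhead   = 32 // assumed encryption overhead of the header
	headerLengthSize = 4  // header length record at the end of the file
)

// eagerBytes is the size of the single tail request that covers n header entries.
func eagerBytes(n int) int {
	return n*entrySize + cryptoOverhead + headerLengthSize
}

// coverage returns the fraction of packs whose header has at most n entries,
// i.e. the packs that would not need a second request.
func coverage(entriesPerPack []int, n int) float64 {
	if len(entriesPerPack) == 0 {
		return 0
	}
	covered := 0
	for _, e := range entriesPerPack {
		if e <= n {
			covered++
		}
	}
	return float64(covered) / float64(len(entriesPerPack))
}

func main() {
	// Made-up entry counts per pack, only to show how the functions are used.
	sample := []int{1, 2, 2, 3, 5, 8, 15, 40}
	for _, n := range []int{5, 15, 50} {
		fmt.Printf("eager=%d bytes=%d coverage=%.2f\n", n, eagerBytes(n), coverage(sample, n))
	}
}
```

A larger eager count covers more packs but makes every first request bigger, so the per-pack entry counts of a real repository are what would justify a different value.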


return binary.LittleEndian.Uint32(buf), nil
}

const maxHeaderSize = 16 * 1024 * 1024

// we require at least one entry in the header, and one blob for a pack file
var minFileSize = entrySize + crypto.Extension

@ifedorenko ifedorenko commented

Not related to my change, but I believe minFileSize should be 4 bytes longer to account for the header length record at the end of the file.

@fd0 fd0 commented

Oh indeed, that's right.
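
The correction agreed on above might look like the following sketch. It is not part of this PR; the constant names are hypothetical and the values 37 and 32 are assumptions standing in for entrySize and crypto.Extension.

```go
package main

import "fmt"

// Assumed sizes, for illustration only.
const (
	entrySize        = 37 // assumed size of one header entry
	cryptoExtension  = 32 // assumed value of crypto.Extension
	headerLengthSize = 4  // header length record at the end of the file
)

const (
	// minFileSizeCurrent mirrors the expression shown in the diff.
	minFileSizeCurrent = entrySize + cryptoExtension
	// minFileSizeProposed also counts the trailing header length record.
	minFileSizeProposed = entrySize + cryptoExtension + headerLengthSize
)

// plausiblePackSize reports whether a file is large enough to be a pack
// under the proposed minimum.
func plausiblePackSize(size int64) bool {
	return size >= minFileSizeProposed
}

func main() {
	fmt.Println(minFileSizeCurrent, minFileSizeProposed)
	fmt.Println(plausiblePackSize(minFileSizeCurrent), plausiblePackSize(minFileSizeProposed))
}
```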

@fd0 fd0 left a comment

Cool, thanks!

@fd0 fd0 merged commit 953f3d5 into restic:master Jan 24, 2018
fd0 added a commit that referenced this pull request Jan 24, 2018
fd0 added a commit that referenced this pull request Jan 24, 2018
@ifedorenko ifedorenko deleted the 1567_optimize-pack-readHeader branch January 25, 2018 14:03