Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a configurable limit to the max pack size that will be indexed #4721

Merged
merged 5 commits into from
Jul 26, 2018

Conversation

nelhage
Copy link
Contributor

@nelhage nelhage commented Jul 11, 2018

This is needed in order for google/oss-fuzz#1604 to not immediately OOM with a ~32GB allocation.

The existing code enforces a hard limit of 2³² objects, which may be acceptable as a general default, but is too large for the fuzzer's memory limits.

I'm very open to other approaches (should there be a configurable limit that the fuzzer configures?) but I'm throwing up a PR with the simplest fix I could find in order to start a discussion.

@ethomson
Copy link
Member

Yeah, I think that a configurable limit (via git_libgit2_opts) would be preferable here.

@nelhage nelhage changed the title While fuzzing, limit number objects a pack can contain Add a configurable limit to the max pack size that will be indexed Jul 12, 2018
@nelhage
Copy link
Contributor Author

nelhage commented Jul 12, 2018

PTAL, I've added a git_libgit2_opts setting.

@@ -372,6 +374,18 @@ typedef enum {
* > fail. (Using the FORCE flag to checkout will still overwrite
* > these changes.)
*
* opts(GIT_OPT_GET_INDEXER_MAX_OBJECTS, size_t *out)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd rather name this GIT_OPT_{SET,GET}_PACK_MAX_OBJECTS. Mentioning the indexer here is simply leaking implementation details which are not of much interest to the general user.

* > Get the maximum number of objects libgit2 will allow in a pack
* > file when downloading a pack file from a remote. This can be
* > used to limit maximum memory usage when fetching from an untrusted
* > remote.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it really? A remote is still able to send a very small amount of huge objects. So I wonder whether a size limitation would actually be more useful than a count limitation.

That being said, should this really be a global value? It could be somebody has two remotes, one of which he is trusting and another one which he is not trusting. Another scenario is packfiles which were generated locally, which have different trust levels than packfiles which are being downloaded. So I think making this an option of the indexer would be a lot saner. We'd have to provide a new way to instantiate an indexer for this, as git_indexer_new cannot be changed but does not have git_indexer_opts parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As best I can tell -- and you certainly know the code better than I -- everything other than the index metadata is streamed directly from the remote to disk, and so does not consume memory. And so the number of objects (which sizes the entries and deltas vectors) is the salient variable for memory (but not disk) usage.

Is there another type that has an alternate constructor that takes an _opts type I can look at for prior art? In addition, the git_indexer is constructed by the pack backend right now; What would be the idiomatic way to plumb configuration all the way through from the remote or repository to that creation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have much prior experience with the indexer, so you're actually right. It will only store struct delta_infos and struct entrys. They obviously still result in quite a lot of memory being allocated when we have 2^32-1 objects.

I currently still think about accepting this PR with only the global option. In case the need arises to have different limits based on what the indexer is being used for we can still add an options struct to make it configurable per indexer, with the default being the global option's value.

src/indexer.c Outdated
@@ -22,6 +22,8 @@

extern git_mutex git__mwindow_mutex;

size_t git_indexer__max_objects = (size_t)-1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use UINT32_MAX here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why UINT32_MAX and not SIZE_MAX?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original behavior capped at 2³², so UINT32_MAX would replicate that behavior. I'll push a change up shortly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks.

This replicates the old behavior of limiting to 2³² by default.
@pks-t pks-t merged commit 42f8384 into libgit2:master Jul 26, 2018
@pks-t
Copy link
Member

pks-t commented Jul 26, 2018

Thanks, @nelhage! I'll rebase #4728 and apply the limit there instead of creating a pull request for oss-fuzz.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants