Add a configurable limit to the max pack size that will be indexed #4721
Conversation
Yeah, I think that a configurable limit (via
PTAL, I've added a
include/git2/common.h (Outdated)
@@ -372,6 +374,18 @@ typedef enum {
 * > fail. (Using the FORCE flag to checkout will still overwrite
 * > these changes.)
 *
 * opts(GIT_OPT_GET_INDEXER_MAX_OBJECTS, size_t *out)
I think I'd rather name this GIT_OPT_{SET,GET}_PACK_MAX_OBJECTS. Mentioning the indexer here is simply leaking implementation details which are not of much interest to the general user.
 * > Get the maximum number of objects libgit2 will allow in a pack
 * > file when downloading a pack file from a remote. This can be
 * > used to limit maximum memory usage when fetching from an untrusted
 * > remote.
Can it really? A remote is still able to send a small number of huge objects, so I wonder whether a size limit would actually be more useful than a count limit.
That being said, should this really be a global value? It could be that somebody has two remotes, one of which they trust and another which they do not. Another scenario is packfiles generated locally, which have different trust levels than packfiles being downloaded. So I think making this an option of the indexer would be a lot saner. We'd have to provide a new way to instantiate an indexer for this, as git_indexer_new cannot be changed and does not have a git_indexer_opts parameter.
As best I can tell -- and you certainly know the code better than I do -- everything other than the index metadata is streamed directly from the remote to disk, and so does not consume memory. The number of objects (which sizes the entries and deltas vectors) is therefore the salient variable for memory (but not disk) usage.
Is there another type with an alternate constructor that takes an _opts type I can look at for prior art? In addition, the git_indexer is constructed by the pack backend right now; what would be the idiomatic way to plumb configuration all the way through from the remote or repository to that creation?
I don't have much prior experience with the indexer, so you're actually right: it will only store struct delta_infos and struct entrys. Those still result in quite a lot of memory being allocated when we have 2^32-1 objects, though.
I currently still lean towards accepting this PR with only the global option. In case the need arises to have different limits based on what the indexer is being used for, we can still add an options struct to make it configurable per indexer, with the default being the global option's value.
src/indexer.c (Outdated)
@@ -22,6 +22,8 @@

extern git_mutex git__mwindow_mutex;

size_t git_indexer__max_objects = (size_t)-1;
You should use UINT32_MAX here.
Why UINT32_MAX and not SIZE_MAX?
The original behavior capped at 2³², so UINT32_MAX would replicate that behavior. I'll push a change up shortly.
Got it, thanks.
This replicates the old behavior of limiting to 2³² objects by default.
This is needed so that google/oss-fuzz#1604 does not immediately OOM with a ~32 GB allocation.
The existing code enforces a hard limit of 2³² objects, which may be acceptable as a general default but is too large for the fuzzer's memory limits.
I'm very open to other approaches (should there be a configurable limit that the fuzzer sets?), but I'm throwing up a PR with the simplest fix I could find in order to start a discussion.