Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add large document benchmarks, tune alias heuristic, add max depth limits #515

Merged
merged 1 commit into from Oct 2, 2019
Merged

Add large document benchmarks, tune alias heuristic, add max depth limits #515

merged 1 commit into from Oct 2, 2019

Conversation

liggitt
Copy link
Contributor

@liggitt liggitt commented Oct 2, 2019

This PR addresses the following items:

Parse time of excessively deep nested or indented documents

Parsing these documents is non-linear; limiting stack depth to 10,000 keeps parse times of pathological documents sub-second (~.25 seconds in benchmarks)

Alias node expansion limits

The current limit allows 10,000% expansion, which is too permissive for large documents.

Limiting to 10% expansion for larger documents allows callers to use input size as an effective way to limit resource usage. Continuing to allow larger expansion rates (up to the current 10,000% limit) for smaller documents does not unduly affect memory use.

This PR bounds decode operations from alias expansion to ~400,000 operations for small documents (worst-case ~100-150MB) or 10% of the input document for large documents, whichever is greater.

Using the same benchmark methodology as is included in this PR, memory growth of documents at a given size was measured with no alias use, and with alias use up to the point where the document was rejected:

Benchmark10KBMaps-8                    	     300	   5170631 ns/op	 2178170 B/op	   30780 allocs/op
Benchmark10KB100Aliases-8              	      10	 157137076 ns/op	63367120 B/op	 1015200 allocs/op
58MB / 2800% memory growth

Benchmark100KBMaps-8                   	      30	  49184677 ns/op	22002812 B/op	  307269 allocs/op
Benchmark100KB100Aliases-8             	       5	 324828977 ns/op	121346915 B/op	 1911494 allocs/op
94MB / 450% memory growth

Benchmark500KBMaps-8         	       5	 242092357 ns/op	110320752 B/op	 1536076 allocs/op
Benchmark500KB100Aliases-8   	       2	 646305778 ns/op	262996448 B/op	 3996330 allocs/op
145MB / 140% memory growth

Benchmark1000KBMaps-8                  	       2	 515058247 ns/op	220560496 B/op	 3072079 allocs/op
Benchmark1000KB100Aliases-8            	       1	1007482569 ns/op	393118304 B/op	 5832425 allocs/op
164MB / 78% memory growth

Benchmark2000KBMaps-8         	       2	 988554128 ns/op	440802576 B/op	 6144084 allocs/op
Benchmark2000KB100Aliases-8   	       1	1500724350 ns/op	607745104 B/op	 8809907 allocs/op
159MB / 38% memory growth

Benchmark3000KBMaps-8         	       1	1558944296 ns/op	662789424 B/op	 9216086 allocs/op
Benchmark3000KB100Aliases-8   	       1	2130929511 ns/op	804852592 B/op	11394373 allocs/op
135MB / 20% memory growth

Benchmark10000KBMaps-8                 	       1	4861632275 ns/op	2199723248 B/op	30720091 allocs/op
Benchmark10000KB100Aliases-8           	       1	6217891286 ns/op	2359273744 B/op	32710176 allocs/op
150MB / 7% memory growth

Benchmarks

  • Benchmarks for decoding large documents with lots of allocations
  • Benchmarks for runtime of decoding very deep documents
$ go test . -bench . -benchmem -run None
goos: darwin
goarch: amd64
pkg: gopkg.in/yaml.v2
Benchmark1000KB100Aliases-8            	       1	1101005272 ns/op	393203632 B/op	 5833714 allocs/op
Benchmark1000KBDeeplyNestedSlices-8    	       5	 212081876 ns/op	 4690417 B/op	    9070 allocs/op
Benchmark1000KBDeeplyNestedMaps-8      	       5	 207762964 ns/op	 4690702 B/op	    9074 allocs/op
Benchmark1000KBDeeplyNestedIndents-8   	     300	   4000775 ns/op	 2970468 B/op	   10085 allocs/op
Benchmark1000KB1000IndentLines-8       	       3	 400336822 ns/op	143718512 B/op	 4093516 allocs/op
Benchmark1KBMaps-8                     	    3000	    451697 ns/op	  218984 B/op	    3126 allocs/op
Benchmark10KBMaps-8                    	     300	   4930581 ns/op	 2178172 B/op	   30780 allocs/op
Benchmark100KBMaps-8                   	      30	  48219421 ns/op	22002820 B/op	  307269 allocs/op
Benchmark1000KBMaps-8                  	       3	 477795963 ns/op	220560496 B/op	 3072079 allocs/op
PASS
ok  	gopkg.in/yaml.v2	15.168s

@liggitt
Copy link
Contributor Author

@liggitt liggitt commented Oct 2, 2019

cc @niemeyer

@niemeyer
Copy link
Contributor

@niemeyer niemeyer commented Oct 2, 2019

This looks simple and reasonable. There are minor non-functional details, but I won't bother you with those. Thanks for doing the proper analysis backing it.

@niemeyer niemeyer merged commit f221b84 into go-yaml:v2 Oct 2, 2019
1 check passed
@sfowl
Copy link

@sfowl sfowl commented Oct 9, 2019

@liggitt @niemeyer This patch partly comprises the k8s fix for CVE-2019-11253, which updates gopkg.in/yaml.v2 to version 2.2.4. Presumably any repository that uses an earlier version of gopkg.in/yaml.v2 and accepts untrusted YAML is similarly vulnerable. Rather than assigning individual CVEs for every repo that uses gopkg.in/yaml.v2 in a vulnerable way, I am wondering if it makes more sense to assign a CVE directly to gopkg.in/yaml.v2.

There is certainly an argument of "caveat emptor": that gopkg.in/yaml.v2 should not bear the responsibility of other projects to accept untrusted YAML. However, assigning a new CVE direct to gopkg.in/yaml.v2 allows for those projects to be alerted of this patch and potential vulnerability, which is essentially the same (i.e. allows for excessive resource usage via crafted YAML).

I can assist with CVE assignment for gopkg.in/yaml, if there are no objections.

@frioux
Copy link

@frioux frioux commented Oct 9, 2019

Shouldn't this also be merged into v3? (And I guess v1?)

@niemeyer
Copy link
Contributor

@niemeyer niemeyer commented Oct 10, 2019

This patch is an improvement on a fix that was backported from v3. Now we need to forward port it again.

About assigning a CVE to go-yaml, the fix for this is to not accept certain valid documents, so this is a vulnerability in the specification more than it is in the code. We are using heuristics to attempt to draw a line between what's something reasonable from something that could be abuse or someone being naive. Even with that in place, if your system accepts a large enough YAML document you may still be in trouble, because that's the nature of variable expansion: it allows the document to grow non-linearly with its input size.

That's a class of problem that I haven't seen CVEs being assigned to before, and it still feels somewhat wrong in this case.

niemeyer added a commit that referenced this issue Oct 10, 2019
This is a forward port of v2 commit f221b84 by Jordan Liggitt.
dolmen added a commit to dolmen-go/openapi-preprocessor that referenced this issue Oct 16, 2019
chenrui333 added a commit to chenrui333/yaml that referenced this issue Oct 17, 2019
* 'v2' of github.com:go-yaml/yaml:
  Trivial style tuning on last change.
  Improve heuristics preventing CPU/memory abuse (go-yaml#515)
  Add logic to catch cases of alias abuse.
thaJeztah added a commit to thaJeztah/yaml that referenced this issue Nov 28, 2019
full diff: go-yaml/yaml@v2.2.2...v2.2.7

includes:

- go-yaml/yaml@caeefd8
  addresses CVE-2019-11253 JSON/YAML parsing vulnerable to resource exhaustion attack
- go-yaml/yaml#171 Tighten restrictions on float decoding
- go-yaml/yaml#515 Add large document benchmarks, tune alias heuristic, add max depth limits
- go-yaml/yaml@f90ceb4
  fixes go-yaml/yaml#529 yaml.Unmarshal crashes on "assignment to entry in nil map"
- go-yaml/yaml#543 Port stale simple_keys fix to v2
- go-yaml/yaml@1f64d61
  fixes go-yaml/yaml#548 Invalid simple_keys now cause panics later in decode

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
thaJeztah added a commit to thaJeztah/yaml that referenced this issue Nov 28, 2019
full diff: go-yaml/yaml@v2.2.2...v2.2.7

includes:

- go-yaml/yaml@caeefd8
  addresses CVE-2019-11253 JSON/YAML parsing vulnerable to resource exhaustion attack
- go-yaml/yaml#171 Tighten restrictions on float decoding
- go-yaml/yaml#515 Add large document benchmarks, tune alias heuristic, add max depth limits
- go-yaml/yaml@f90ceb4
  fixes go-yaml/yaml#529 yaml.Unmarshal crashes on "assignment to entry in nil map"
- go-yaml/yaml#543 Port stale simple_keys fix to v2
- go-yaml/yaml@1f64d61
  fixes go-yaml/yaml#548 Invalid simple_keys now cause panics later in decode

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@nsitbon nsitbon mentioned this pull request Dec 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants