
performance: creation of object with many key/val pairs #4625

Closed
srenatus opened this issue Apr 27, 2022 · 5 comments · Fixed by #4830

@srenatus
Contributor

OPA at some point needed to convert a map with 1,200,000 keys from a Go struct into an ast.Value, and did so through 1,200,000 separate inserts. Each insert involved a hash computation, a binary search for the key, and then insertion of the key into the middle of the list, which means a possible allocation and a data copy. Those inserts alone took 75% of the combined processing time. If OPA had an object.BatchInsert (or something similar) that added all elements first and then sorted the final list once, it would be an order of magnitude faster.

TODO:

  • create a benchmark
  • change the implementation
  • prove that it helps
@philipaconrad
Contributor

I'm collecting some notes on how other programming languages have handled dictionary/hashtable data structures, and we can potentially sample from these to find ideas for different implementations:

Python:

Lua:

Ruby:

@stale

stale bot commented Jun 23, 2022

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

@stale stale bot added the inactive label Jun 23, 2022
@philipaconrad philipaconrad self-assigned this Jun 29, 2022
@stale stale bot removed the inactive label Jun 29, 2022
@philipaconrad
Contributor

I finally have bandwidth to pick this issue back up again! 😃

I'm going to tackle the benchmarking first, since that's necessary for us to quantify performance diffs. The closest existing test data generator in util/test/benchmark.go is the GenerateLargeJSONBenchmarkData function, but the object it creates is an entire order of magnitude smaller than the case mentioned in the original issue (100k keys, versus 1.2M keys).

My current plan is to have 1-2 benchmark cases, since different implementations could perform differently on each:

  • Incrementally building up a large object by inserting keys one-by-one.
  • (Optional) Inserting a single key into an already-constructed large (>1M keys) object.

Under most circumstances, the second case is a specialized version of the first, so it might not be worth adding. 🤷‍♂️

@philipaconrad
Contributor

philipaconrad commented Jun 29, 2022

Interestingly, we do have an existing object insertion test under ast/term_bench_test.go, named BenchmarkObjectConstruction.

It only goes up to a size of 50k keys though, so I might add a 100k keys option, since the default benchtest results look like this on my T480 ThinkPad:

Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkObjectConstruction$ github.com/open-policy-agent/opa/ast

goos: linux
goarch: amd64
pkg: github.com/open-policy-agent/opa/ast
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
BenchmarkObjectConstruction/shuffled_keys/5-8       	  470325	      3123 ns/op	     880 B/op	      32 allocs/op
BenchmarkObjectConstruction/shuffled_keys/50-8      	   38388	     30145 ns/op	    8806 B/op	     265 allocs/op
BenchmarkObjectConstruction/shuffled_keys/500-8     	    2742	    421370 ns/op	  105506 B/op	    3332 allocs/op
BenchmarkObjectConstruction/shuffled_keys/5000-8    	     196	   6521511 ns/op	 1029403 B/op	   34930 allocs/op
BenchmarkObjectConstruction/shuffled_keys/50000-8   	       5	 239263439 ns/op	10475545 B/op	  351682 allocs/op
BenchmarkObjectConstruction/increasing_keys/5-8     	  488294	      2143 ns/op	     880 B/op	      32 allocs/op
BenchmarkObjectConstruction/increasing_keys/50-8    	   43467	     27188 ns/op	    8806 B/op	     265 allocs/op
BenchmarkObjectConstruction/increasing_keys/500-8   	    3094	    376710 ns/op	  105518 B/op	    3332 allocs/op
BenchmarkObjectConstruction/increasing_keys/5000-8  	     249	   4858298 ns/op	 1028220 B/op	   34930 allocs/op
BenchmarkObjectConstruction/increasing_keys/50000-8 	      18	  64826620 ns/op	10458889 B/op	  351702 allocs/op
PASS
ok  	github.com/open-policy-agent/opa/ast	15.900s
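The table above already hints at the asymptotics: each 10x step in key count costs well more than 10x in time once the object gets large. A quick check of the shuffled-key column's step-to-step ratios:

```go
package main

import "fmt"

func main() {
	// ns/op for shuffled keys, copied from the benchmark output above.
	sizes := []int{5, 50, 500, 5000, 50000}
	nsPerOp := []float64{3123, 30145, 421370, 6521511, 239263439}

	// Linear scaling would be ~10x per step; the later steps trend
	// well past that, consistent with per-insert shifting costs.
	for i := 1; i < len(sizes); i++ {
		fmt.Printf("%d -> %d keys: %.1fx slower\n",
			sizes[i-1], sizes[i], nsPerOp[i]/nsPerOp[i-1])
	}
}
```

The last step (5k to 50k shuffled keys) comes out around 37x slower for 10x the keys.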

@philipaconrad
Contributor

So, it looks like the one interface function relying on ast.Object's existing slice-based key structure is Elem(i int) (*Term, *Term). Removing it reveals that topdown/eval used Elem in biunifyObjectsRec to perform recursive unification iteratively, in a predictable order and with low memory consumption.

This may be slightly trickier to replace than anticipated: that call site relies heavily on key ordering, and replacing the iterator with a naive object.Keys() call would cause massive memory usage when unifying large objects, since every caller would materialize its own copy of the key slice.

I'll keep investigating. 🤔 There's definitely a clean solution possible here.

philipaconrad added a commit to philipaconrad/opa that referenced this issue Jul 8, 2022
This commit delays the sorting of keys until just-before-use. This
is a net win on asymptotics as Objects get larger, even with Quicksort
as the sorting algorithm.

This commit also adjusts the evaluator to use the new ObjectKeysIterator
interface, instead of the raw keys array.

Fixes open-policy-agent#4625.

Signed-off-by: Philip Conrad <philipaconrad@gmail.com>
philipaconrad added a commit that referenced this issue Jul 8, 2022