Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

struct: store members as starlark.StringDict for faster access #401

Closed
wants to merge 5 commits into from

Conversation

emcfarlane
Copy link
Contributor

@emcfarlane emcfarlane commented Mar 1, 2022

Wraps starlark.StringDict for storing fields.
Improves the performance on the Attr method for accessing fields.

I added a benchmark to check the happy path for access to a fields value.

OLD:
BenchmarkAttr_4-4 44931932 24.03 ns/op
BenchmarkAttr_8-4 39342510 28.64 ns/op
BenchmarkAttr_16-4 33596271 32.18 ns/op
BenchmarkAttr_32-4 30505212 37.62 ns/op
BenchmarkAttr_64-4 24424558 43.33 ns/op
BenchmarkAttr_128-4 17947136 64.29 ns/op

NEW:
BenchmarkAttr_4-4 48772784 21.45 ns/op
BenchmarkAttr_8-4 34554286 30.95 ns/op
BenchmarkAttr_16-4 56185027 19.59 ns/op
BenchmarkAttr_32-4 58317289 19.60 ns/op
BenchmarkAttr_64-4 58073444 20.30 ns/op
BenchmarkAttr_128-4 52069291 21.30 ns/op

Wraps starlark.StringDict for storing fields.
Improves the performance on the `Attr` method for accessing fields.

Benchmark of changes to field storage.
goos: linux
goarch: amd64
pkg: go.starlark.net/starlarkstruct
cpu: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz

OLD:
BenchmarkAttr_4-4       44931932                24.03 ns/op
BenchmarkAttr_8-4       39342510                28.64 ns/op
BenchmarkAttr_16-4      33596271                32.18 ns/op
BenchmarkAttr_32-4      30505212                37.62 ns/op
BenchmarkAttr_64-4      24424558                43.33 ns/op
BenchmarkAttr_128-4     17947136                64.29 ns/op

NEW:
BenchmarkAttr_4-4       48772784                21.45 ns/op
BenchmarkAttr_8-4       34554286                30.95 ns/op
BenchmarkAttr_16-4      56185027                19.59 ns/op
BenchmarkAttr_32-4      58317289                19.60 ns/op
BenchmarkAttr_64-4      58073444                20.30 ns/op
BenchmarkAttr_128-4     52069291                21.30 ns/op
@@ -268,10 +237,13 @@ func structsEqual(x, y *Struct, depth int) (bool, error) {
return false, nil
}

for i, n := 0, x.len(); i < n; i++ {
if x.entries[i].name != y.entries[i].name {
for k, xv := range x.members {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop is now nondeterministic because EqualDepth may fail before or after an unequal field comparison, depending on iteration order. I don't see any way to make it deterministic other than by iterating over Keys(), which could increase the cost of comparison quite a lot, since it allocates.

Perhaps you could add benchmarks of these operations to testdata/benchmark.star?

Copy link
Contributor Author

@emcfarlane emcfarlane Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! Forgot about the comparison case. Not sure where the benchmarking setup code should live from starlark to give access to starlarkstruct packages so instead I've added a similar benchmark test under struct_test.go:

Current non-deterministic:

BenchmarkEqual_4-4       5617100               203.9 ns/op             0 B/op          0 allocs/op
BenchmarkEqual_8-4       2535235               469.3 ns/op             0 B/op          0 allocs/op
BenchmarkEqual_16-4      1643161               734.6 ns/op             0 B/op          0 allocs/op
BenchmarkEqual_32-4       743796              1568 ns/op               0 B/op          0 allocs/op
BenchmarkEqual_64-4       334785              3319 ns/op               0 B/op          0 allocs/op
BenchmarkEqual_128-4      150016              7613 ns/op               0 B/op          0 allocs/op

Fixed with range over members.Keys():

BenchmarkEqual_4-4       2556396               470.8 ns/op            88 B/op          2 allocs/op
BenchmarkEqual_8-4       1000000              1001 ns/op             152 B/op          2 allocs/op
BenchmarkEqual_16-4       711522              1753 ns/op             280 B/op          2 allocs/op
BenchmarkEqual_32-4       274262              3979 ns/op             536 B/op          2 allocs/op
BenchmarkEqual_64-4       121689              9411 ns/op            1048 B/op          2 allocs/op
BenchmarkEqual_128-4       51733             22085 ns/op            2072 B/op          2 allocs/op

Optimized with setErr func to report the first error by key:

BenchmarkEqual_4-4       5697518               205.8 ns/op             0 B/op          0 allocs/op
BenchmarkEqual_8-4       2513456               477.0 ns/op             0 B/op          0 allocs/op
BenchmarkEqual_16-4      1560715               741.0 ns/op             0 B/op          0 allocs/op
BenchmarkEqual_32-4       725080              1535 ns/op               0 B/op          0 allocs/op
BenchmarkEqual_64-4       354627              3426 ns/op               0 B/op          0 allocs/op
BenchmarkEqual_128-4      148989              7502 ns/op               0 B/op          0 allocs/op

Original:

BenchmarkEqual_4-4      11487849                99.35 ns/op            0 B/op          0 allocs/op
BenchmarkEqual_8-4       6423819               185.7 ns/op             0 B/op          0 allocs/op
BenchmarkEqual_16-4      3405678               350.4 ns/op             0 B/op          0 allocs/op
BenchmarkEqual_32-4      1758501               679.3 ns/op             0 B/op          0 allocs/op
BenchmarkEqual_64-4       880608              1345 ns/op               0 B/op          0 allocs/op
BenchmarkEqual_128-4      441919              2662 ns/op               0 B/op          0 allocs/op

I tried to optimize the comparison to avoid the allocation. So now looking at least 2-3x worse with the optimized case. I've just tested happy path on equal values. Non equal comparisons have to loop through all keys now so will be affected more. I thought Attr was the most important method but thats only from my use case.

starlarkstruct/struct.go Outdated Show resolved Hide resolved
@emcfarlane
Copy link
Contributor Author

Looked at how starlark.Dict internally does it and found a similar optimisation: #402

return false, nil
} else if eq, err := starlark.EqualDepth(x.entries[i].value, y.entries[i].value, depth-1); err != nil {
return false, err
// Loop over every key returning the lowest error by key, if any.
Copy link
Collaborator

@adonovan adonovan Mar 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop must now visit every pair of elements even when the first pair differs. That's a regression. I'm not sure StringDict really makes sense as a representation for struct. The requirements are that:

  1. struct is compact.
  2. struct.f is fast.
  3. str(struct) uses field-name order.
  4. hash(struct) uses field-name order (or is order-independent).
  5. struct==struct compares fields in name order.

The previous representation using a name-ordered k/v array was better on all requirements except 2, and even then the advantage is only material when N>=16, which is an unusually large struct in a typical Starlark program.

I suspect the access patterns of Starlark fields in Attr are highly temporally localized: the most likely field name to be accessed is the previous one. If that is true (and it would be worth measuring), it might be possible to outperform both the existing implementation and the hash table by caching the resulting index of the binary search in a field in the struct (without a mutex!) and using it as a hint next time.

func FromStringDict()  {
   ...
   s := &Struct{
	constructor: constructor,
	hint: maxuint32,
   }
}

func (s *Struct) Attr(name string) (starlark.Value, error) {
        if hint := s.hint; hint < n && s.entries[hint].name == name {
              return ... // yay the hint worked
        }
	// Binary search the entries.
	for i < j { ...binary search...}
	if i < n && s.entries[i].name == name {
                s.hint = i // save the hint
		return s.entries[i].value, nil
	}
        ...
}

What do you think?

Copy link
Contributor Author

@emcfarlane emcfarlane Mar 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I think the original binary search implementation is better. Hint idea sounds good, I'll have a look at how to validate it. Please let me know if you'd like to keep the benchmarking code and I could modify this PR to just include that? Otherwise please close! Thanks for taking the time to review.

I did try a solution based on the starlark.hashtable here: emcfarlane#1 . It does look like it could beat both implementations but requires modifying the original hashtable, and cloning it out of the package. Just an experiment.

@emcfarlane
Copy link
Contributor Author

Closing for #403

@emcfarlane emcfarlane closed this Mar 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants