Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reflect: use binary search in MethodByName #50617

Open
quasilyte opened this issue Jan 14, 2022 · 6 comments
Open

reflect: use binary search in MethodByName #50617

quasilyte opened this issue Jan 14, 2022 · 6 comments
Labels
NeedsInvestigation Performance

Comments

@quasilyte
Copy link
Contributor

@quasilyte quasilyte commented Jan 14, 2022

What version of Go are you using (go version)?

go1.17.2

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

Linux/amd64

But it should be irrelevant.

What did you do?

Executed hugo on digitalgov.gov site, collected the CPU profiles. Found that reflect.Type.MethodByName is top1.

      flat  flat%   sum%        cum   cum%
     8.84s  6.28%  6.28%     57.85s 41.10%  reflect.(*rtype).MethodByName
     7.93s  5.63% 11.92%      8.50s  6.04%  reflect.name.readVarint
     7.56s  5.37% 17.29%    111.79s 79.43%  reflect.Value.call
     7.53s  5.35% 22.64%     23.33s 16.58%  runtime.mallocgc
     7.29s  5.18% 27.82%     16.10s 11.44%  reflect.name.name

If we take a look at the MethodByName sources, we'll notice that it uses a linear search. There is even a comment to fix that:

go/src/reflect/type.go

Lines 888 to 893 in a99c38d

// TODO(mdempsky): Binary search.
for i, p := range ut.exportedMethods() {
if t.nameOff(p.name).name() == name {
return t.Method(i), true
}
}

What did you expect to see?

reflect.Type.MethodByName doesn't use linear search for bigger slices.

What did you see instead?

reflect.Type.MethodByName is quite slow for some cases.

More info, numbers, benchmarks

I'll send a CL that adds a binary search to the reflect.Type.MethodByName, but since there are a few caveats, I think it's worthwhile to share some info I gathered by investigating this.

After patching the reflect package, I compared the results.

Old site build time: 20.4s
New site build time: 16.5s

If we take a look at the "new" CPU profile, MethodByName doesn't appear in top20 at all, even though it was a top1 entry before:

(pprof) top 20
Showing nodes accounting for 50.85s, 44.61% of 114s total
Dropped 1174 nodes (cum <= 0.57s)
Showing top 20 nodes out of 194
      flat  flat%   sum%        cum   cum%
     8.64s  7.58%  7.58%     25.01s 21.94%  runtime.mallocgc
     7.69s  6.75% 14.32%     87.02s 76.33%  reflect.Value.call
     4.17s  3.66% 17.98%      5.39s  4.73%  runtime.heapBitsSetType
     2.88s  2.53% 20.51%         4s  3.51%  runtime.findObject
     2.83s  2.48% 22.99%      2.83s  2.48%  runtime.nextFreeFast (inline)
     2.79s  2.45% 25.44%      9.46s  8.30%  runtime.scanobject
     2.74s  2.40% 27.84%      2.74s  2.40%  runtime.duffcopy
     1.82s  1.60% 29.44%      1.88s  1.65%  runtime.pageIndexOf (partial-inline)
     1.76s  1.54% 30.98%      1.76s  1.54%  runtime.memclrNoHeapPointers
     1.69s  1.48% 32.46%      3.24s  2.84%  runtime.mapaccess2
     1.66s  1.46% 33.92%      1.83s  1.61%  syscall.Syscall
     1.49s  1.31% 35.23%      1.61s  1.41%  reflect.name.readVarint (inline)
     1.43s  1.25% 36.48%      3.07s  2.69%  reflect.name.name
     1.42s  1.25% 37.73%     10.54s  9.25%  reflect.FuncOf
     1.37s  1.20% 38.93%      1.75s  1.54%  github.com/gohugoio/hugo/related.ranks.Less
     1.34s  1.18% 40.11%      4.99s  4.38%  sync.(*Map).Load
     1.33s  1.17% 41.27%      1.33s  1.17%  reflect.(*rtype).Kind
     1.29s  1.13% 42.40%      1.29s  1.13%  cmpbody
     1.28s  1.12% 43.53%      1.28s  1.12%  runtime.resolveNameOff
     1.23s  1.08% 44.61%      1.30s  1.14%  runtime.heapBitsForAddr (partial-inline)

But then we have a question: why we get an impact that big? We need to collect some stats to be sure that it's not a coincidence.

First, let's see a distribution of number of methods.

15920637 times | 120 methods
493669 times | 29 methods
227736 times | 16 methods
180098 times | 8 methods
143846 times | 39 methods
31371 times | 6 methods
14636 times | 1 methods
10933 times | 31 methods
10094 times | 9 methods
9026 times | 4 methods
102 times | 113 methods
85 times | 7 methods
4 times | 114 methods
1 times | 0 methods

It looks like vast majority of calls goes through 120-entry slice with linear search. Not very common I guess, but that's what we get.

But if the searched method would be located at 2nd or 3rd pos, it wouldn't be a problem. Therefore, we need to see how "found at" indexes are distributed.

12998952 times | 98 index pos
1203864 times | 70 index pos
515978 times | 102 index pos
354191 times | 110 index pos
262965 times | 6 index pos
200819 times | 11 index pos
178457 times | 19 index pos
170806 times | 97 index pos
147510 times | 74 index pos
136927 times | 3 index pos
123071 times | 20 index pos
106250 times | 9 index pos
87190 times | 7 index pos
62767 times | 31 index pos
62192 times | 5 index pos
50170 times | 51 index pos
49120 times | 69 index pos
45325 times | 72 index pos
42236 times | 115 index pos
38646 times | 0 index pos
37106 times | 28 index pos
22744 times | 108 index pos
21521 times | 14 index pos
18134 times | 92 index pos
17341 times | 4 index pos
11535 times | 103 index pos
10791 times | 63 index pos
10303 times | 2 index pos
9170 times | 10 index pos
7278 times | 68 index pos
5936 times | 15 index pos
5466 times | 65 index pos
5385 times | 16 index pos
5172 times | 47 index pos
4478 times | 43 index pos
3032 times | 40 index pos
2340 times | 17 index pos
2265 times | 25 index pos
1516 times | 44 index pos
1068 times | 88 index pos
843 times | 1 index pos
709 times | 71 index pos
358 times | 109 index pos
100 times | 67 index pos
50 times | 64 index pos
40 times | 42 index pos
33 times | 34 index pos
17 times | 86 index pos
15 times | 83 index pos
15 times | 41 index pos
15 times | 33 index pos
12 times | 118 index pos
5 times | 8 index pos
5 times | 27 index pos
4 times | 66 index pos

Most cases fall into "found at i=98" category, which is pretty bad for linear search.

In total, building that site makes hugo call MethodByName 17042238 times.

Benchmarking the new method shows mixed, but expected, results:

name                           old time/op  new time/op  delta
MethodByName/4_first-8          426ns ± 2%   423ns ± 1%     ~     (p=0.056 n=10+9)
MethodByName/4_last-8           482ns ± 1%   476ns ± 1%   -1.33%  (p=0.000 n=10+10)
MethodByName/4_nonexisting-8   57.8ns ± 0%  57.3ns ± 1%   -0.78%  (p=0.000 n=9+9)
MethodByName/16_first-8         427ns ± 1%   538ns ± 0%  +25.97%  (p=0.000 n=9+9)
MethodByName/16_last-8          643ns ± 1%   519ns ± 1%  -19.24%  (p=0.000 n=9+9)
MethodByName/16_nonexisting-8   194ns ± 0%   105ns ± 0%  -46.03%  (p=0.000 n=8+9)
MethodByName/32_first-8         429ns ± 1%   552ns ± 1%  +28.78%  (p=0.000 n=10+8)
MethodByName/32_last-8          831ns ± 0%   540ns ± 2%  -34.97%  (p=0.000 n=10+9)
MethodByName/32_nonexisting-8   396ns ± 1%   125ns ± 1%  -68.56%  (p=0.000 n=10+10)
MethodByName/64_first-8         431ns ± 1%   580ns ± 2%  +34.64%  (p=0.000 n=10+10)
MethodByName/64_last-8         1.36µs ± 1%  0.56µs ± 1%  -58.95%  (p=0.000 n=10+8)
MethodByName/64_nonexisting-8   759ns ± 0%   146ns ± 1%  -80.82%  (p=0.000 n=8+10)
[Geo mean]                      430ns        303ns       -29.58%

In general, the new approach is only slower if searched method happens to be located somewhere in the beginning of the slice. If we assume that they're somewhere in the middle on average, then binary search performs better. I would still keep a linear search path for small slices, like len<=8 to avoid a minor regression for small slices (that are not so uncommon).

Caveats

  1. reflect can't import sort package, so we can't use sort.Search
  2. Linear search works better if a search term is somewhere at the beginning of a slice

We can decrease the effect of (2) by using a linear search path for small slices, but in general, this side-effect is hard to avoid.

@gopherbot
Copy link

@gopherbot gopherbot commented Jan 14, 2022

Change https://golang.org/cl/378634 mentions this issue: reflect: use binary search in MethodByName

@robpike
Copy link
Contributor

@robpike robpike commented Jan 14, 2022

Thanks for the thorough analysis. It's nice to see someone really dig into a performance problem like this.

120 methods is just crazy. I wonder how common this problem actually is, and if a different fix, if any, is the right way forward. MethodByName works very hard, even ignoring the a linear scan. Also, using the reflect package to call a method is sometimes necessary but often avoidable with a bit of work elsewhere. I am nervous about opening the door to optimizations of the reflect package.

@seankhliao seankhliao added NeedsInvestigation Performance labels Jan 14, 2022
@YuriyNasretdinov
Copy link

@YuriyNasretdinov YuriyNasretdinov commented Jan 14, 2022

As a regular Go user, I think it's pretty reasonable to assume that MethodByName() won't be an O(N) operation. I think the author did list one such example, the templating engine, where it matters, but one would argue that e.g. if you make a reflection-based HTTP router, or any kind of dynamic dispatch, e.g. in an interpreter like yaegi, you would probably just use MethodByName(). You would not think of caching it unless you know that it's implemented as a linear search, because it's a standard library function, and most of the time the user's assumption would be that it's reasonably (as much as it can be reasonable given that it's reflection) optimised.

@quasilyte
Copy link
Contributor Author

@quasilyte quasilyte commented Jan 14, 2022

@YuriyNasretdinov in fact, yaegi does use MethodByName in a few places. I haven't done any research to find out whether these usages lie on a hot path or anything. I can probably run the benchmarks that involve calling MethodByName and give some old/new timings.

@MBkkt
Copy link

@MBkkt MBkkt commented Jan 15, 2022

Linear search works better if a search term is somewhere at the beginning of a slice

I think you can try to fix it with something like this:

r = linearSearch(0, 4, x); 0-3 if found, 4 if not
if (r != 4) {
  return a[r];
}
l = r;
for (; r < len(a) / 2;) {
  r *= 2;
  if (x < a[r]) {
    return binarySearch(l, r, x);
  }
  l = r;
}
return binarySearch(l, len(a), x);

@quasilyte
Copy link
Contributor Author

@quasilyte quasilyte commented Jan 15, 2022

@MBkkt probably. It may be worthwhile to see which strategy gives the mean average boost on different real codebases as opposed to the benchmarks.

That being said, I would like to wait for a decision on whether any optimization to MethodByName should be done at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Performance
Projects
None yet
Development

No branches or pull requests

6 participants