
reflect: Call is slow #7818

Open
bradfitz opened this issue Apr 18, 2014 · 12 comments

reflect.Value.Call is pretty slow.

In addition to always allocating a slice for its []reflect.Value result parameters, it
also does a lot of prep work on each call.

It would be nice to both avoid that allocation and also do the setup checks just once.

Maybe a new method to return some sort of 'Caller' type that makes repeated Calls
faster, with less paranoia and no allocation.

Or just speed up the checks and make a new Call method that also accepts a slice for the
result values.

dvyukov added a commit that referenced this issue Jan 28, 2015
Call frame allocations can account for a significant portion
of all allocations in a program if the call is executed
in an inner loop (e.g. to process every line in a log).
On the other hand, the allocation is easy to remove
using sync.Pool, since the allocation is strictly scoped.

benchmark           old ns/op     new ns/op     delta
BenchmarkCall       634           338           -46.69%
BenchmarkCall-4     496           167           -66.33%

benchmark           old allocs     new allocs     delta
BenchmarkCall       1              0              -100.00%
BenchmarkCall-4     1              0              -100.00%

Update #7818

Change-Id: Icf60cce0a9be82e6171f0c0bd80dee2393db54a7
Reviewed-on: https://go-review.googlesource.com/1954
Reviewed-by: Keith Randall <khr@golang.org>
@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@bradfitz bradfitz added this to the Go1.9Maybe milestone Dec 12, 2016
@bradfitz bradfitz removed this from the Unplanned milestone Dec 12, 2016

@bradfitz bradfitz commented Dec 12, 2016

Looking at some internal fleet-wide CPU usage, I see reflect.Value.Call and reflect.Value.call show up pretty high in the list.

It might be time to optimize this.

In Go 1.8, there's now precedent in the reflect package for returning a worker func (reflect.Swapper) after doing the validation only once.

Investigate the top 2014 comment's Caller idea, and see how much CPU it can save.

/cc @dsnet

@bradfitz bradfitz self-assigned this Dec 12, 2016

@aclements aclements commented Mar 28, 2017

Maybe a new method to return some sort of 'Caller' type that makes repeated Calls faster, with less paranoia and no allocation.

Value.Interface() sort of does this, but you have to call the result like a real function, not using Value.Call(). If you can fit code into using Value.Interface(), then there's no allocation overhead (unless the frame pool is empty) and no validation. The compiler constructs the arguments frame and interprets the result frame and reflect just has to do some memory copying. (That said, it still does two copies of both the arguments and the results. I feel like it should be possible to get that down to one.)


@bradfitz bradfitz added this to the Go1.10 milestone Jun 7, 2017
@bradfitz bradfitz removed this from the Go1.9Maybe milestone Jun 7, 2017
@bruth bruth commented Oct 17, 2017

For some concrete numbers, as of 1.9 on darwin/amd64, reflect.Value.Call is ~65-80x slower than other invocation types. Gist of benchmark code.

BenchmarkReflectMethodCall-4         	10000000	       144 ns/op
BenchmarkReflectOnceMethodCall-4     	10000000	       138 ns/op
BenchmarkStructMethodCall-4          	1000000000	         2.20 ns/op
BenchmarkInterfaceMethodCall-4       	1000000000	         2.14 ns/op
BenchmarkTypeSwitchMethodCall-4      	2000000000	         1.88 ns/op
BenchmarkTypeAssertionMethodCall-4   	2000000000	         1.83 ns/op

@bradfitz bradfitz removed this from the Go1.10 milestone Nov 14, 2017
@bradfitz bradfitz added this to the Unplanned milestone Nov 14, 2017
@bradfitz bradfitz removed their assignment Nov 14, 2017

@patricksuo patricksuo commented Jun 12, 2018

Caching the function layout in a Caller reduces reflect call overhead by ~50%. https://gist.github.com/sillyousu/606e4874839456cc02335bd1c5045f27

Updated @bruth's benchmark:

BenchmarkReflectCaller-8                20000000                64.6 ns/op
BenchmarkReflectMethodCall-8            10000000               135 ns/op
BenchmarkReflectOnceMethodCall-8        10000000               126 ns/op
BenchmarkStructMethodCall-8             2000000000               1.67 ns/op
BenchmarkInterfaceMethodCall-8          1000000000               2.41 ns/op
BenchmarkTypeSwitchMethodCall-8         2000000000               1.23 ns/op
BenchmarkTypeAssertionMethodCall-8      2000000000               1.42 ns/op


@dongweigogo dongweigogo commented Dec 28, 2018

It seems that Java uses a reusable buffer to avoid this problem.

@gopherbot gopherbot commented Mar 9, 2019

Change https://golang.org/cl/166462 mentions this issue: reflect: make all flag.mustBe* methods inlinable

gopherbot pushed a commit that referenced this issue Mar 9, 2019
mustBe was barely over budget, so manually inlining the first flag.kind
call is enough. Add a TODO to reverse that in the future, once the
compiler gets better.

mustBeExported and mustBeAssignable were over budget by a larger amount,
so add slow path functions instead. This is the same strategy used in
the sync package for common methods like Once.Do, for example.

Lots of exported reflect.Value methods call these assert-like unexported
methods, so avoiding the function call overhead in the common case does
shave off a percent from most exported APIs.

Finally, add the methods to TestIntendedInlining.

While at it, replace a couple of uses of the 0 Kind with its descriptive
name, Invalid.

name     old time/op    new time/op    delta
Call-8     68.0ns ± 1%    66.8ns ± 1%  -1.81%  (p=0.000 n=10+9)
PtrTo-8    8.00ns ± 2%    7.83ns ± 0%  -2.19%  (p=0.000 n=10+9)

Updates #7818.

Change-Id: Ic1603b640519393f6b50dd91ec3767753eb9e761
Reviewed-on: https://go-review.googlesource.com/c/go/+/166462
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
jsign added a commit to jsign/go that referenced this issue Jan 3, 2021
Taking a look at a CPU profile, IsVariadic was calculated multiple times
unnecessarily.

I also added a new BenchmarkCallMethod to measure this use case.

name                       old time/op    new time/op    delta
Call-16                      20.3ns ± 8%    20.3ns ±16%    ~     (p=0.443 n=18+20)
CallMethod-16                99.2ns ± 3%    90.1ns ± 2%  -9.22%  (p=0.000 n=20+17)
CallArgCopy/size=128-16      15.0ns ± 2%    14.5ns ± 3%  -2.76%  (p=0.000 n=20+19)
CallArgCopy/size=256-16      15.9ns ± 7%    15.3ns ± 5%  -4.26%  (p=0.000 n=20+20)
CallArgCopy/size=1024-16     17.6ns ± 7%    17.2ns ± 6%  -1.73%  (p=0.044 n=19+20)
CallArgCopy/size=4096-16     25.3ns ± 4%    24.9ns ± 4%  -1.66%  (p=0.016 n=18+20)
CallArgCopy/size=65536-16     375ns ± 4%     376ns ± 4%    ~     (p=0.644 n=20+20)

name                       old alloc/op   new alloc/op   delta
Call-16                       0.00B          0.00B         ~     (all equal)
CallMethod-16                 0.00B          0.00B         ~     (all equal)

name                       old allocs/op  new allocs/op  delta
Call-16                        0.00           0.00         ~     (all equal)
CallMethod-16                  0.00           0.00         ~     (all equal)

name                       old speed      new speed      delta
CallArgCopy/size=128-16    8.56GB/s ± 2%  8.80GB/s ± 3%  +2.84%  (p=0.000 n=20+19)
CallArgCopy/size=256-16    16.1GB/s ± 6%  16.8GB/s ± 5%  +4.45%  (p=0.000 n=20+20)
CallArgCopy/size=1024-16   58.2GB/s ± 7%  59.4GB/s ± 6%  +2.16%  (p=0.026 n=20+20)
CallArgCopy/size=4096-16    161GB/s ± 4%   165GB/s ± 4%  +1.95%  (p=0.007 n=17+20)
CallArgCopy/size=65536-16   175GB/s ± 4%   174GB/s ± 4%    ~     (p=0.640 n=20+20)

Updates golang#7818

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
jsign added a commit to jsign/go that referenced this issue Jan 3, 2021
These calls are cacheable, so do that in order to avoid doing extra work.

This opportunity was discovered while taking a look at a CPU profile while
investigating golang#7818.

I added a BenchmarkCallMethod which is similar to BenchmarkCall but for a
method receiver.

Benchmark results, including the new BenchmarkCallMethod:

name                       old time/op    new time/op    delta
Call-16                      22.0ns ±19%    20.2ns ±17%  -8.08%  (p=0.000 n=40+40)
CallMethod-16                 100ns ± 3%      91ns ± 2%  -9.13%  (p=0.000 n=40+39)
CallArgCopy/size=128-16      15.7ns ± 1%    14.3ns ± 4%  -8.98%  (p=0.000 n=38+37)
CallArgCopy/size=256-16      15.9ns ± 3%    15.0ns ± 5%  -6.12%  (p=0.000 n=39+39)
CallArgCopy/size=1024-16     18.8ns ± 6%    17.1ns ± 6%  -9.03%  (p=0.000 n=38+38)
CallArgCopy/size=4096-16     26.6ns ± 3%    25.2ns ± 4%  -5.19%  (p=0.000 n=39+40)
CallArgCopy/size=65536-16     379ns ± 3%     371ns ± 5%  -2.11%  (p=0.000 n=39+40)

name                       old alloc/op   new alloc/op   delta
Call-16                       0.00B          0.00B         ~     (all equal)
CallMethod-16                 0.00B          0.00B         ~     (all equal)

name                       old allocs/op  new allocs/op  delta
Call-16                        0.00           0.00         ~     (all equal)
CallMethod-16                  0.00           0.00         ~     (all equal)

name                       old speed      new speed      delta
CallArgCopy/size=128-16    8.13GB/s ± 1%  8.92GB/s ± 4%  +9.77%  (p=0.000 n=38+38)
CallArgCopy/size=256-16    16.1GB/s ± 3%  17.1GB/s ± 5%  +6.56%  (p=0.000 n=39+39)
CallArgCopy/size=1024-16   54.6GB/s ± 6%  60.1GB/s ± 5%  +9.93%  (p=0.000 n=38+38)
CallArgCopy/size=4096-16    154GB/s ± 5%   163GB/s ± 4%  +5.63%  (p=0.000 n=40+40)
CallArgCopy/size=65536-16   173GB/s ± 3%   177GB/s ± 5%  +2.18%  (p=0.000 n=39+40)

Updates golang#7818.

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
@gopherbot gopherbot commented Jan 3, 2021

Change https://golang.org/cl/281252 mentions this issue: reflect: cache IsVariadic calls in Call

gopherbot pushed a commit that referenced this issue Mar 30, 2021
These calls are cacheable, so do that to avoid doing extra work.

This opportunity was discovered while taking a look at a CPU profile
while investigating #7818.

I added a BenchmarkCallMethod, which is similar to BenchmarkCall but
for a method receiver.

Benchmark results, including the new BenchmarkCallMethod:

	name                       old time/op    new time/op    delta
	Call-16                      22.0ns ±19%    20.2ns ±17%  -8.08%  (p=0.000 n=40+40)
	CallMethod-16                 100ns ± 3%      91ns ± 2%  -9.13%  (p=0.000 n=40+39)
	CallArgCopy/size=128-16      15.7ns ± 1%    14.3ns ± 4%  -8.98%  (p=0.000 n=38+37)
	CallArgCopy/size=256-16      15.9ns ± 3%    15.0ns ± 5%  -6.12%  (p=0.000 n=39+39)
	CallArgCopy/size=1024-16     18.8ns ± 6%    17.1ns ± 6%  -9.03%  (p=0.000 n=38+38)
	CallArgCopy/size=4096-16     26.6ns ± 3%    25.2ns ± 4%  -5.19%  (p=0.000 n=39+40)
	CallArgCopy/size=65536-16     379ns ± 3%     371ns ± 5%  -2.11%  (p=0.000 n=39+40)

	name                       old alloc/op   new alloc/op   delta
	Call-16                       0.00B          0.00B         ~     (all equal)
	CallMethod-16                 0.00B          0.00B         ~     (all equal)

	name                       old allocs/op  new allocs/op  delta
	Call-16                        0.00           0.00         ~     (all equal)
	CallMethod-16                  0.00           0.00         ~     (all equal)

	name                       old speed      new speed      delta
	CallArgCopy/size=128-16    8.13GB/s ± 1%  8.92GB/s ± 4%  +9.77%  (p=0.000 n=38+38)
	CallArgCopy/size=256-16    16.1GB/s ± 3%  17.1GB/s ± 5%  +6.56%  (p=0.000 n=39+39)
	CallArgCopy/size=1024-16   54.6GB/s ± 6%  60.1GB/s ± 5%  +9.93%  (p=0.000 n=38+38)
	CallArgCopy/size=4096-16    154GB/s ± 5%   163GB/s ± 4%  +5.63%  (p=0.000 n=40+40)
	CallArgCopy/size=65536-16   173GB/s ± 3%   177GB/s ± 5%  +2.18%  (p=0.000 n=39+40)

Updates #7818.

Change-Id: I94f88811ea9faf3dc2543984a13b360b5db66a4b
GitHub-Last-Rev: 9bbaa18
GitHub-Pull-Request: #43475
Reviewed-on: https://go-review.googlesource.com/c/go/+/281252
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Daniel Martí <mvdan@mvdan.cc>
Trust: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Go Bot <gobot@golang.org>
8 participants