lease: refactor lease's benchmark. #10710

Merged
merged 1 commit into from Jun 8, 2019

@j2gg0s j2gg0s commented May 7, 2019

The lease benchmarks were added when we optimized the lease expiration check in #9418.

However, they have several problems.

  • In benchmarkLessorFindExpired, every lease's TTL is always more than 100 seconds.
    When findExpiredLeases is called, it only compares the TTL of leaseHeap's first element with the current time and returns immediately.

  • The lessor is not primary, so Renew does nothing and Grant ignores the TTL.

  • We always call Grant with an increasing TTL, so leaseHeap degenerates into a sorted array.

  • The size parameter is used incorrectly.
    Take BenchmarkLessorGrant1 and BenchmarkLessorGrant10000, and assume b.N is 500000. For BenchmarkLessorGrant1, we call Grant 500001 times and observe the last 500000 calls. For BenchmarkLessorGrant10000, we call Grant 510000 times and observe the last 500000 calls. I don't think the results of these two tests will differ much.
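
To illustrate the point about increasing TTLs, here is a minimal sketch that counts how many swaps a binary min-heap performs while keys are pushed. It uses Go's container/heap with a hypothetical countingHeap type as a stand-in for leaseHeap; it is not the lessor's actual code. With strictly increasing keys, every new element is already larger than its parent, so no sift-up work happens and the backing slice stays sorted.

```go
package main

import (
	"container/heap"
	"fmt"
	"math/rand"
)

// countingHeap is a min-heap of int64 keys that counts element swaps.
// It is a hypothetical stand-in for leaseHeap, used only for illustration.
type countingHeap struct {
	keys  []int64
	swaps int
}

func (h *countingHeap) Len() int           { return len(h.keys) }
func (h *countingHeap) Less(i, j int) bool { return h.keys[i] < h.keys[j] }
func (h *countingHeap) Swap(i, j int) {
	h.keys[i], h.keys[j] = h.keys[j], h.keys[i]
	h.swaps++
}
func (h *countingHeap) Push(x interface{}) { h.keys = append(h.keys, x.(int64)) }
func (h *countingHeap) Pop() interface{} {
	n := len(h.keys)
	x := h.keys[n-1]
	h.keys = h.keys[:n-1]
	return x
}

// swapsForPushes pushes every key in order and returns the total swap count.
func swapsForPushes(keys []int64) int {
	h := &countingHeap{}
	for _, k := range keys {
		heap.Push(h, k)
	}
	return h.swaps
}

func main() {
	const n = 10000
	increasing := make([]int64, n)
	shuffled := make([]int64, n)
	for i := range increasing {
		increasing[i] = int64(100 + i) // same shape as int64(100+i) in the old benchmark
		shuffled[i] = int64(100 + i)
	}
	rng := rand.New(rand.NewSource(1))
	rng.Shuffle(n, func(i, j int) { shuffled[i], shuffled[j] = shuffled[j], shuffled[i] })

	fmt.Println("swaps with increasing keys:", swapsForPushes(increasing))
	fmt.Println("swaps with shuffled keys:", swapsForPushes(shuffled))
}
```

A benchmark that only ever pushes increasing keys therefore never exercises the heap's reordering path, which is why randomized TTLs give a more representative measurement.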

Author

j2gg0s commented May 7, 2019

The final result

 ~/go/src/github.com/golang/go/bin/go test -bench BenchmarkLessor --benchmem --benchtime 5000000x --timeout 30m0s
goos: darwin
goarch: amd64
pkg: go.etcd.io/etcd/v3/lease
BenchmarkLessorGrant10000-4              5000000              2592 ns/op             843 B/op          9 allocs/op
BenchmarkLessorGrant100000-4             5000000              4054 ns/op             955 B/op         10 allocs/op
BenchmarkLessorRevoke10000-4             5000000               246 ns/op               0 B/op          0 allocs/op
BenchmarkLessorRevoke100000-4            5000000               152 ns/op               0 B/op          0 allocs/op
BenchmarkLessorRenew10000-4              5000000               710 ns/op              56 B/op          1 allocs/op
BenchmarkLessorRenew100000-4             5000000               429 ns/op              56 B/op          1 allocs/op
BenchmarkLessorFindExpired10000-4        5000000              4791 ns/op             128 B/op          1 allocs/op
BenchmarkLessorFindExpired100000-4       5000000              4654 ns/op             134 B/op          1 allocs/op
PASS
ok      go.etcd.io/etcd/v3/lease        1176.713s

Contributor

@jingyih jingyih left a comment

Thanks for finding the bugs in the benchmark and fixing them! I just read through the existing benchmark code and agree with you on the bugs you found. For your changes, I read through benchmarkLessorGrant and added a few comments.

func BenchmarkLessorRevoke10000(b *testing.B) { benchmarkLessorRevoke(10000, b) }
func BenchmarkLessorRevoke100000(b *testing.B) { benchmarkLessorRevoke(100000, b) }
func BenchmarkLessorRevoke1000000(b *testing.B) { benchmarkLessorRevoke(1000000, b) }
// NOTE: run with --benchtime Nx, please
Contributor

why? Is -benchtime 10s not working?

Author

-benchtime Ns works; however, the actual running time may exceed our expectations because of this implementation.

lease git:(refactor-lease-bench-test) ✗ ~/go/src/github.com/golang/go/bin/go test --bench BenchmarkLessorRevoke100000 -benchtime 1s
goos: darwin
goarch: amd64
pkg: go.etcd.io/etcd/v3/lease
BenchmarkLessorRevoke100000-4            8995898               133 ns/op
PASS
ok      go.etcd.io/etcd/v3/lease        83.956s

With -benchtime Ns, Go estimates b.N for the next run from the time measured in the previous run.
We spend a lot of time constructing the fixture and tell Go to ignore it, so the real running time is far longer than the measured time.
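
The effect can be sketched outside the lessor. The example below is an illustration under stated assumptions, not the PR's code: benchmarkDelete and runFixed are hypothetical names, and a plain map stands in for the lessor. The fixture grows with b.N and is built while the timer is stopped, so the measured time excludes it while the wall-clock time does not. Setting the test.benchtime flag to a count (the programmatic equivalent of -benchtime Nx, available with testing.Init since Go 1.13) fixes b.N up front, so the fixture cost is predictable.

```go
package main

import (
	"flag"
	"fmt"
	"testing"
	"time"
)

// benchmarkDelete builds a fixture whose size depends on b.N while the timer
// is stopped, then times only the deletions. The map is a stand-in for the lessor.
func benchmarkDelete(size int, b *testing.B) {
	b.StopTimer()
	m := make(map[int]struct{}, size+b.N)
	for i := 0; i < size+b.N; i++ {
		m[i] = struct{}{}
	}
	b.StartTimer()
	for i := 0; i < b.N; i++ {
		delete(m, i)
	}
}

// runFixed runs benchmarkDelete with a fixed iteration count and returns the
// benchmark result together with the wall-clock time of the whole run.
func runFixed(iterations, size int) (testing.BenchmarkResult, time.Duration) {
	testing.Init() // registers the test.* flags; safe to call more than once
	if err := flag.Set("test.benchtime", fmt.Sprintf("%dx", iterations)); err != nil {
		panic(err)
	}
	start := time.Now()
	res := testing.Benchmark(func(b *testing.B) { benchmarkDelete(size, b) })
	return res, time.Since(start)
}

func main() {
	res, wall := runFixed(2000, 100000)
	// res.T only covers the timed region; wall also includes fixture construction.
	fmt.Printf("iterations=%d measured=%v wall=%v\n", res.N, res.T, wall)
}
```

With a duration-based -benchtime, Go keeps raising b.N until the measured time reaches the target, and each larger b.N also enlarges the untimed fixture, which matches the 83-second run shown above for a 1s benchtime.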

func BenchmarkLessorFindExpired10000(b *testing.B) { benchmarkLessorFindExpired(10000, b) }
func BenchmarkLessorFindExpired100000(b *testing.B) { benchmarkLessorFindExpired(100000, b) }

func benchmarkLessorGrant(size int, b *testing.B) {
Contributor

Can we document what the benchmark test does? I think a high-level description is useful, so that people can understand the benchmark test without reading code details.

Author

I found that the current code is wrong, but I don't think my implementation is good enough yet.
So I pushed my code and hope to get some advice.

Once we have a clear plan, I am happy to fix these.

le.Promote(0)
for i := 0; i < size; i++ {
le.Grant(LeaseID(i), int64(100+i))
func BenchmarkLessorGrant10000(b *testing.B) { benchmarkLessorGrant(10000, b) }
Contributor

Why remove the other BenchmarkLessorGrantXXX benchmarks?

Author

In my opinion, we use benchmarks to find out:

  • RT under massive data.
  • Whether RT increases significantly as the size grows.
  • Whether there is a deadlock.

So I think BenchmarkLessorGrant100 has little meaning.

My implementation also makes BenchmarkLessorGrant100 take too much time when b.N is huge.

~/go/src/github.com/golang/go/bin/go test -bench BenchmarkLessorGrant100$ -benchtime 100000x
goos: darwin
goarch: amd64
pkg: go.etcd.io/etcd/v3/lease
BenchmarkLessorGrant100-4         100000              2296 ns/op
PASS
ok      go.etcd.io/etcd/v3/lease        102.638s

le.mu.Lock() //Stop the findExpiredLeases call in the runloop
defer le.mu.Unlock()

maxTTL := int64((size + refresh) / 100)
Contributor

Sorry, I do not understand what maxTTL is and how it is used in this benchmark.

Author

I control the probability of repeated TTLs through the minimum and maximum values.
A comment is needed here.

}
le, tearDown = setUp()
for j := 0; j < size; j++ {
le.Grant(LeaseID(j+1), ttls[j])
Contributor

Do we need to hold le.mu in order to prevent leases from being auto-revoked by the lessor?

Author

Does auto-revoked mean expired? We can raise the minimum TTL so that leases do not expire before the test ends.

Contributor

jingyih commented May 23, 2019

I found that the current code is wrong, but I don't think my implementation is good enough yet.
So I pushed my code and hope to get some advice.
Once we have a clear plan, I am happy to fix these.

I do not have any brilliant ideas on how to benchmark this. An alternative would be to start from an empty heap in each benchmark run, something like:

for i := 0; i < b.N; i++ {
  // new lessor with empty heap
  for j := 0; j < size; j++ {
    // grant lease
  }
}

I think this is a simpler and easier-to-read way of doing this. But it benchmarks something different from the original.

Contributor

jingyih commented May 23, 2019

I also thought about benchmarking lease grant and revoke together (while timing only the grant or only the revoke), so that we can keep the heap size constant. But lease revoke does not remove the corresponding item from the heap, so the heap size still increases after each benchmark run.

// new lessor with empty heap
for j := 0; j < size; j++ {
  // grant lease
}
for i := 0; i < b.N; i++ {
  // grant lease
  b.StopTimer()
  // revoke lease; unfortunately this does not remove the lease from the heap
  b.StartTimer()
}
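
The growth described here can be reproduced with a small model. The sketch below is an assumption-laden toy, not the etcd lessor: toyLessor, expiryHeap, and grantRevokeCycles are hypothetical names, and revoke is written to delete only from the map, mirroring the behavior described in the comment above. After the grant/revoke loop, the number of live leases is back to size, but the heap has kept one entry per grant.

```go
package main

import (
	"container/heap"
	"fmt"
)

// item is one heap entry: a lease ID and its expiry.
type item struct {
	id     int64
	expiry int64
}

// expiryHeap is a min-heap ordered by expiry; a stand-in for leaseHeap.
type expiryHeap []item

func (h expiryHeap) Len() int            { return len(h) }
func (h expiryHeap) Less(i, j int) bool  { return h[i].expiry < h[j].expiry }
func (h expiryHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *expiryHeap) Push(x interface{}) { *h = append(*h, x.(item)) }
func (h *expiryHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// toyLessor tracks live leases in a map and their expiry order in a heap.
type toyLessor struct {
	leases map[int64]int64 // lease ID -> expiry
	queue  expiryHeap
}

func newToyLessor() *toyLessor { return &toyLessor{leases: make(map[int64]int64)} }

// grant records the lease in the map and pushes it onto the heap.
func (l *toyLessor) grant(id, expiry int64) {
	l.leases[id] = expiry
	heap.Push(&l.queue, item{id: id, expiry: expiry})
}

// revoke deletes the lease from the map only; the heap entry is left behind
// (assumed here to mirror the behavior described in the review comment).
func (l *toyLessor) revoke(id int64) { delete(l.leases, id) }

// grantRevokeCycles preloads size leases, runs n grant+revoke pairs, and
// returns the number of live leases and the number of heap entries.
func grantRevokeCycles(size, n int) (liveLeases, heapLen int) {
	l := newToyLessor()
	for j := 0; j < size; j++ {
		l.grant(int64(j), int64(1000+j))
	}
	for i := 0; i < n; i++ {
		id := int64(size + i)
		l.grant(id, int64(1000+size+i))
		l.revoke(id)
	}
	return len(l.leases), l.queue.Len()
}

func main() {
	live, queued := grantRevokeCycles(1000, 500)
	fmt.Println("live leases:", live, "heap entries:", queued)
}
```

In this model, each timed grant would run against a heap that is one entry larger than the previous one, so the "constant heap size" goal is not met.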

Contributor

jingyih commented May 24, 2019

I feel like we do not need the benchmark to be super accurate, but the current benchmark is not giving us any relevant information. We just need something that is roughly accurate to unblock #10693.

Author

j2gg0s commented May 29, 2019

@jingyih
I simplified these benchmarks and added some comments.
Now they run fast and the benchmark results seem reasonable.

go test -run=None -bench .
goos: darwin
goarch: amd64
pkg: go.etcd.io/etcd/lease
BenchmarkLessorGrant1000-4               1000000              2208 ns/op
BenchmarkLessorGrant100000-4              500000              3843 ns/op
BenchmarkLessorRevoke1000-4              1000000              1519 ns/op
BenchmarkLessorRevoke100000-4            1000000              2411 ns/op
BenchmarkLessorRenew1000-4               3000000               508 ns/op
BenchmarkLessorRenew100000-4             3000000               474 ns/op
BenchmarkLessorFindExpired10000-4          30000             40654 ns/op
BenchmarkLessorFindExpired100000-4         20000             76311 ns/op
PASS
ok      go.etcd.io/etcd/lease   412.994s

I also used benchcmp to compare the results between #10693 and the master branch.

benchmark                              old ns/op     new ns/op     delta
BenchmarkLessorGrant1000-4             2204          2316          +5.08%
BenchmarkLessorGrant1000-4             2254          2353          +4.39%
BenchmarkLessorGrant1000-4             2214          2324          +4.97%
BenchmarkLessorGrant1000-4             2228          2337          +4.89%
BenchmarkLessorGrant1000-4             2224          2327          +4.63%

BenchmarkLessorGrant100000-4           3862          3917          +1.42%
BenchmarkLessorGrant100000-4           3817          3908          +2.38%
BenchmarkLessorGrant100000-4           3879          3981          +2.63%
BenchmarkLessorGrant100000-4           3765          3899          +3.56%
BenchmarkLessorGrant100000-4           3898          4290          +10.06%

BenchmarkLessorRevoke1000-4            1514          1506          -0.53%
BenchmarkLessorRevoke1000-4            1496          1492          -0.27%
BenchmarkLessorRevoke1000-4            1516          1520          +0.26%
BenchmarkLessorRevoke1000-4            1500          1495          -0.33%
BenchmarkLessorRevoke1000-4            1508          1501          -0.46%

BenchmarkLessorRevoke100000-4          2390          2440          +2.09%
BenchmarkLessorRevoke100000-4          2468          2468          +0.00%
BenchmarkLessorRevoke100000-4          2433          2441          +0.33%
BenchmarkLessorRevoke100000-4          2383          2424          +1.72%
BenchmarkLessorRevoke100000-4          2314          2411          +4.19%

BenchmarkLessorRenew1000-4             513           520           +1.36%
BenchmarkLessorRenew1000-4             513           520           +1.36%
BenchmarkLessorRenew1000-4             515           523           +1.55%
BenchmarkLessorRenew1000-4             511           517           +1.17%
BenchmarkLessorRenew1000-4             511           520           +1.76%

BenchmarkLessorRenew100000-4           443           562           +26.86%
BenchmarkLessorRenew100000-4           431           553           +28.31%
BenchmarkLessorRenew100000-4           446           581           +30.27%
BenchmarkLessorRenew100000-4           468           593           +26.71%
BenchmarkLessorRenew100000-4           445           573           +28.76%

BenchmarkLessorFindExpired10000-4      42074         64775         +53.95%
BenchmarkLessorFindExpired10000-4      41164         62892         +52.78%
BenchmarkLessorFindExpired10000-4      41028         62781         +53.02%
BenchmarkLessorFindExpired10000-4      40757         62493         +53.33%
BenchmarkLessorFindExpired10000-4      41462         63517         +53.19%

BenchmarkLessorFindExpired100000-4     76540         131566        +71.89%
BenchmarkLessorFindExpired100000-4     75847         130328        +71.83%
BenchmarkLessorFindExpired100000-4     75848         131377        +73.21%
BenchmarkLessorFindExpired100000-4     76425         131513        +72.08%
BenchmarkLessorFindExpired100000-4     76749         132065        +72.07%

@codecov-io

Codecov Report

❗ No coverage uploaded for pull request base (master@caee28a).
The diff coverage is 0%.

@@            Coverage Diff            @@
##             master   #10710   +/-   ##
=========================================
  Coverage          ?   63.19%           
=========================================
  Files             ?      392           
  Lines             ?    37160           
  Branches          ?        0           
=========================================
  Hits              ?    23482           
  Misses            ?    12091           
  Partials          ?     1587
Impacted Files      Coverage Δ
lease/lessor.go     87.74% <0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update caee28a...2859346.

Contributor

jingyih commented May 30, 2019

@j2gg0s I will take a look tomorrow. Thanks for benchmarking the other PR.

Contributor

@jingyih jingyih left a comment

Looks great. Added a few comments.

le.Grant(LeaseID(i+size), int64(100+i+size))
// MinLeaseTTL is negative, so we can grant expired lease in benchmark.
le = newLessor(lg, be, LessorConfig{MinLeaseTTL: -100})
le.SetRangeDeleter(func() TxnDelete {
Contributor

Thanks for making revoke actually delete leases from leaseMap in the test.

@@ -909,3 +909,9 @@ func (fl *FakeLessor) ExpiredLeasesC() <-chan []*Lease { return nil }
func (fl *FakeLessor) Recover(b backend.Backend, rd RangeDeleter) {}

func (fl *FakeLessor) Stop() {}

type FakeTxnDelete struct {
backend.BatchTx
Contributor

Do we need to include backend.BatchTx? Can we simply make End() do nothing?

Author

We need the backend's lock.

  • The benchmark saves data to a tmp file through the backend, and the backend only supports unsafe operations (no locking).
  • Before saving or deleting data, we need to call RangeDeleter to acquire the lock.
  • After saving or deleting data, we need to call End to release the lock.

Using the backend's lock also makes the benchmark more realistic.
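
The lock/unlock pairing described above can be sketched as follows. This is an illustration only: batchTx is a hypothetical stand-in for the locking part of backend.BatchTx, and countingTx, fakeTxnDelete, and newRangeDeleter are names invented for this example rather than the PR's actual definitions. Embedding the interface lets the fake inherit Lock and Unlock, so End can release the lock that the range-deleter function acquired.

```go
package main

import (
	"fmt"
	"sync"
)

// batchTx is a hypothetical stand-in for the locking part of backend.BatchTx.
type batchTx interface {
	Lock()
	Unlock()
}

// countingTx wraps a mutex and records how often it is locked and unlocked.
type countingTx struct {
	mu      sync.Mutex
	locks   int
	unlocks int
}

func (t *countingTx) Lock()   { t.mu.Lock(); t.locks++ }
func (t *countingTx) Unlock() { t.unlocks++; t.mu.Unlock() }

// fakeTxnDelete embeds batchTx so that it inherits Lock and Unlock.
type fakeTxnDelete struct {
	batchTx
}

// DeleteRange is a no-op in this sketch; only the locking matters here.
func (f *fakeTxnDelete) DeleteRange(key, end []byte) (n, rev int64) { return 0, 0 }

// End releases the lock taken when the deleter was created.
func (f *fakeTxnDelete) End() { f.Unlock() }

// newRangeDeleter returns a function that locks tx and hands back a deleter
// whose End method unlocks it again.
func newRangeDeleter(tx batchTx) func() *fakeTxnDelete {
	return func() *fakeTxnDelete {
		tx.Lock()
		return &fakeTxnDelete{batchTx: tx}
	}
}

func main() {
	tx := &countingTx{}
	rd := newRangeDeleter(tx)
	for i := 0; i < 3; i++ {
		txn := rd() // acquires the lock
		txn.DeleteRange([]byte("a"), []byte("b"))
		txn.End() // releases the lock
	}
	fmt.Println("locks:", tx.locks, "unlocks:", tx.unlocks)
}
```

If End did nothing, the second call to the range deleter in this sketch would block forever on the still-held mutex, which is the reason given for keeping the embedded transaction.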

le.Grant(LeaseID(i), int64(100+i))
func benchmarkLessorGrant(benchSize int, b *testing.B) {
// avoid lease expire when benchmark
ttls := randomTTL(benchSize, 10, 100)
Contributor

How are the min and max chosen? Do they always guarantee that no leases expire during the benchmark?

Author

Ideally, we would choose a min and max that make the probability of repeated TTLs match the production situation.
That is too hard, so I chose two arbitrary values.

A 10-second minimum avoids expiry in 99.99% of cases.
In the remaining 0.01%, the benchmark is running too long anyway.

Contributor

Sorry I do not fully understand. It seems that the safe value of min TTL should depend on benchSize and how fast the test machine runs? Maybe 10s is large enough to cover all cases? Can we use really large values to make it obvious that the leases won't expire during the test?

Author

OK, I changed the min TTL to 1000 seconds.

Author

j2gg0s commented Jun 4, 2019

In the benchmark, I set ExpiredLeasesRetryInterval to 10 microseconds so that the findExpired benchmark rechecks expired leases.

Contributor

@jingyih jingyih left a comment

lgtm after nits.

le.Grant(LeaseID(i+size), int64(100+i+size))
// MinLeaseTTL is negative, so we can grant expired lease in benchmark.
// ExpiredLeasesRetryInterval should small, so benchmark of findExpired will recheck expired lease.
le = newLessor(lg, be, LessorConfig{MinLeaseTTL: -100, ExpiredLeasesRetryInterval: 10 * time.Microsecond})
Contributor

-100 is not enough for the TTLs in benchmarkLessorFindExpired.

Author

Thanks
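
One way to read the remark about -100: if the lessor raises any requested TTL below MinLeaseTTL up to MinLeaseTTL, then a TTL more negative than the configured minimum cannot be granted as requested. The sketch below models only that clamping rule. It is an assumption about the behavior, effectiveTTL is a hypothetical helper, and the sample TTL values are made up for illustration.

```go
package main

import "fmt"

// effectiveTTL returns the TTL a lease would end up with if any requested TTL
// below minLeaseTTL is raised to minLeaseTTL (assumed clamping behavior).
func effectiveTTL(requested, minLeaseTTL int64) int64 {
	if requested < minLeaseTTL {
		return minLeaseTTL
	}
	return requested
}

func main() {
	const minLeaseTTL = -100
	for _, ttl := range []int64{-50, -100, -500} {
		fmt.Printf("requested=%d min=%d effective=%d\n", ttl, minLeaseTTL, effectiveTTL(ttl, minLeaseTTL))
	}
}
```

Under this assumption, a benchmark that wants leases expired by more than 100 seconds needs a MinLeaseTTL at least as negative as its most negative TTL.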

b.StartTimer()

// refresh fixture after pop all expired lease
for ;;i++ {

Contributor

jingyih commented Jun 5, 2019

lgtm

@xiang90 xiang90 merged commit f6a9ebe into etcd-io:master Jun 8, 2019