lease: refactor lease's benchmark. #10710

j2gg0s · 2019-05-07T07:08:47Z

The lease benchmarks were added when we optimized the lease expiration check in #9418.

However, they have several problems.

For benchmarkLessorFindExpired, every lease's TTL is more than 100 seconds.
So when we call findExpiredLeases, it only compares the TTL of leaseHeap's first element with the current time and returns immediately.
The lessor is not primary, so Renew does nothing and Grant ignores the TTL.
We always call Grant with an increasing TTL, so leaseHeap degenerates into a sorted array.
size is used incorrectly.
Take BenchmarkLessorGrant1 and BenchmarkLessorGrant10000. Assume b.N is 500000. For BenchmarkLessorGrant1, we call Grant 500001 times and observe the last 500000 calls. For BenchmarkLessorGrant10000, we call Grant 510000 times and observe the last 500000 calls. I don't think the results of these two tests will differ much.
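
To make the first and third problems concrete, here is a minimal sketch built on the standard container/heap package. The expiryHeap type, fillIncreasing, and findExpired are hypothetical stand-ins, not etcd's leaseHeap or lessor code. The sketch counts the swaps the heap performs and the number of elements an expiry scan inspects.

```go
package main

import (
	"container/heap"
	"fmt"
)

// expiryHeap is a stand-in min-heap of expiry times in seconds.
// It is NOT etcd's leaseHeap; it only models the ordering behaviour.
type expiryHeap struct {
	items []int64
	swaps int // number of element swaps performed by container/heap
}

func (h *expiryHeap) Len() int           { return len(h.items) }
func (h *expiryHeap) Less(i, j int) bool { return h.items[i] < h.items[j] }
func (h *expiryHeap) Swap(i, j int) {
	h.items[i], h.items[j] = h.items[j], h.items[i]
	h.swaps++
}
func (h *expiryHeap) Push(x interface{}) { h.items = append(h.items, x.(int64)) }
func (h *expiryHeap) Pop() interface{} {
	n := len(h.items)
	x := h.items[n-1]
	h.items = h.items[:n-1]
	return x
}

// fillIncreasing pushes expiries in increasing order, mimicking
// the old fixture's Grant(LeaseID(i), int64(100+i)).
func fillIncreasing(size int) *expiryHeap {
	h := &expiryHeap{}
	for i := 0; i < size; i++ {
		heap.Push(h, int64(100+i))
	}
	return h
}

// findExpired pops every expiry <= now and reports how many
// heap elements it had to inspect before stopping.
func findExpired(h *expiryHeap, now int64) (expired []int64, inspected int) {
	for h.Len() > 0 {
		inspected++
		if h.items[0] > now {
			break // the smallest expiry is in the future: nothing else can be expired
		}
		expired = append(expired, heap.Pop(h).(int64))
	}
	return expired, inspected
}

func main() {
	h := fillIncreasing(10000)
	swapsAfterFill := h.swaps
	expired, inspected := findExpired(h, 0)
	fmt.Println(swapsAfterFill, len(expired), inspected)
}
```

With strictly increasing expiries every push stays where it was appended, so the heap never reorders anything, and a scan at a time before the first expiry looks at exactly one element regardless of size.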

j2gg0s · 2019-05-07T07:09:25Z

The final result

 ~/go/src/github.com/golang/go/bin/go test -bench BenchmarkLessor --benchmem --benchtime 5000000x --timeout 30m0s
goos: darwin
goarch: amd64
pkg: go.etcd.io/etcd/v3/lease
BenchmarkLessorGrant10000-4              5000000              2592 ns/op             843 B/op          9 allocs/op
BenchmarkLessorGrant100000-4             5000000              4054 ns/op             955 B/op         10 allocs/op
BenchmarkLessorRevoke10000-4             5000000               246 ns/op               0 B/op          0 allocs/op
BenchmarkLessorRevoke100000-4            5000000               152 ns/op               0 B/op          0 allocs/op
BenchmarkLessorRenew10000-4              5000000               710 ns/op              56 B/op          1 allocs/op
BenchmarkLessorRenew100000-4             5000000               429 ns/op              56 B/op          1 allocs/op
BenchmarkLessorFindExpired10000-4        5000000              4791 ns/op             128 B/op          1 allocs/op
BenchmarkLessorFindExpired100000-4       5000000              4654 ns/op             134 B/op          1 allocs/op
PASS
ok      go.etcd.io/etcd/v3/lease        1176.713s

jingyih

Thanks for finding the bugs in the benchmark and fixing them! I just read through the existing benchmark code and agree with you on the bugs you found. For your changes, I read through benchmarkLessorGrant and added a few comments.

jingyih · 2019-05-21T21:43:56Z

lease/lessor_bench_test.go

-func BenchmarkLessorRevoke10000(b *testing.B)   { benchmarkLessorRevoke(10000, b) }
-func BenchmarkLessorRevoke100000(b *testing.B)  { benchmarkLessorRevoke(100000, b) }
-func BenchmarkLessorRevoke1000000(b *testing.B) { benchmarkLessorRevoke(1000000, b) }
+// NOTE: run with --benchtime Nx, please


Why? Is -benchtime 10s not working?

-benchtime Ns works, but the actual run time may far exceed our expectations because of this implementation.

lease git:(refactor-lease-bench-test) ✗ ~/go/src/github.com/golang/go/bin/go test --bench BenchmarkLessorRevoke100000 -benchtime 1s
goos: darwin
goarch: amd64
pkg: go.etcd.io/etcd/v3/lease
BenchmarkLessorRevoke100000-4            8995898               133 ns/op
PASS
ok      go.etcd.io/etcd/v3/lease        83.956s

With -benchtime Ns, Go estimates the size of the next benchmark run (b.N) from the time measured in the previous run.
We spend a lot of time constructing the fixture and tell Go to exclude it from the measurement, so Go picks a large b.N and the total run time far exceeds N seconds.
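
As a rough check of the numbers in the output above, the sketch below splits the reported wall-clock time into the part covered by the final reported round (iterations times ns/op) and everything else. The helper name timedAndUntimed is made up for this illustration. The "everything else" figure also includes the earlier, smaller calibration rounds, so it is only an upper bound on fixture time.

```go
package main

import (
	"fmt"
	"time"
)

// timedAndUntimed splits a benchmark's wall-clock time into the part
// accounted for by the final reported round (n * perOp) and the rest.
// The rest covers fixture construction outside the timer as well as
// earlier calibration rounds, so treat it as an approximation.
func timedAndUntimed(n int64, perOp, wall time.Duration) (timed, untimed time.Duration) {
	timed = time.Duration(n) * perOp
	return timed, wall - timed
}

func main() {
	// Values copied from the BenchmarkLessorRevoke100000 run with -benchtime 1s.
	timed, untimed := timedAndUntimed(8995898, 133*time.Nanosecond, 83956*time.Millisecond)
	fmt.Println("timed:", timed)
	fmt.Println("untimed:", untimed)
}
```

Roughly 1.2 seconds of the 83.956 seconds were measured; the remainder is work the timer never saw, which is why a fixed iteration count (-benchtime Nx) gives a more predictable run time here.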

jingyih · 2019-05-21T21:46:33Z

lease/lessor_bench_test.go

+func BenchmarkLessorFindExpired10000(b *testing.B)  { benchmarkLessorFindExpired(10000, b) }
+func BenchmarkLessorFindExpired100000(b *testing.B) { benchmarkLessorFindExpired(100000, b) }
+
+func benchmarkLessorGrant(size int, b *testing.B) {


Can we document what the benchmark test does? I think a high-level description is useful, so that people can understand the benchmark test without reading code details.

I found that the current code is wrong, but I don't think my implementation is good enough.
So I pushed my code hoping for some advice.

Once we have a clear and definite plan, I am happy to fix these.

jingyih · 2019-05-21T22:10:10Z

lease/lessor_bench_test.go

-	le.Promote(0)
-	for i := 0; i < size; i++ {
-		le.Grant(LeaseID(i), int64(100+i))
+func BenchmarkLessorGrant10000(b *testing.B)  { benchmarkLessorGrant(10000, b) }


Why remove the other BenchmarkLessorGrantXXX benchmarks?

In my opinion, we use benchmarks to find out:

What is the RT under massive data?

Does the RT increase significantly as the size grows?

Is there a deadlock?

So I think BenchmarkLessorGrant100 has little meaning.

Also, my implementation makes BenchmarkLessorGrant100 take too much time with a huge b.N.

~/go/src/github.com/golang/go/bin/go test -bench BenchmarkLessorGrant100$ -benchtime 100000x
goos: darwin
goarch: amd64
pkg: go.etcd.io/etcd/v3/lease
BenchmarkLessorGrant100-4                 100000              2296 ns/op
PASS
ok      go.etcd.io/etcd/v3/lease        102.638s

jingyih · 2019-05-21T22:11:43Z

lease/lessor_bench_test.go

-	le.mu.Lock() //Stop the findExpiredLeases call in the runloop
-	defer le.mu.Unlock()
+
+	maxTTL := int64((size + refresh) / 100)


Sorry, I do not understand what maxTTL is and how it is used in this benchmark.

I control the probability of repeated TTLs through the minimum and maximum values.
A comment is needed here.

jingyih · 2019-05-21T22:15:35Z

lease/lessor_bench_test.go

+			}
+			le, tearDown = setUp()
+			for j := 0; j < size; j++ {
+				le.Grant(LeaseID(j+1), ttls[j])


Do we need to hold le.mu in order to prevent leases from being auto-revoked by the lessor?

Does auto-revoked mean expired? We can raise the minimum TTL so that leases do not expire before the test ends.

jingyih · 2019-05-23T23:51:52Z

I found that the current code is wrong, but I don't think my implementation is good enough.
So I pushed my code hoping for some advice.
Once we have a clear and definite plan, I am happy to fix these.

I do not have brilliant ideas on how to benchmark this. An alternative way would be to start from empty heap in each benchmark run, something like:

for i := 0; i < b.N; i++ {
  new lessor with empty heap
  for j := 0; j < size; j++ {
    // grant lease
  }
}

I think this is a simpler / more easy-to-read way of doing this. But it benchmarks something different than the original.
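
One possible runnable shape of the "start from an empty heap" idea is sketched below. It uses a plain container/heap of TTLs as a hypothetical stand-in for the lessor, so ttlHeap, grantAll, and benchmarkGrantFromEmpty are illustrative names rather than code from this PR. Each iteration measures the cost of size grants into a fresh heap, which is why the result means something different from a per-Grant number.

```go
package main

import (
	"container/heap"
	"fmt"
	"testing"
)

// ttlHeap is a hypothetical stand-in for the lessor's lease heap.
type ttlHeap []int64

func (h ttlHeap) Len() int            { return len(h) }
func (h ttlHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h ttlHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *ttlHeap) Push(x interface{}) { *h = append(*h, x.(int64)) }
func (h *ttlHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// grantAll starts from an empty heap and pushes size TTLs in a
// scrambled (non-sorted) order so that the heap has to reorder.
func grantAll(size int) *ttlHeap {
	h := &ttlHeap{}
	for j := 0; j < size; j++ {
		heap.Push(h, int64((j*7919)%size))
	}
	return h
}

// benchmarkGrantFromEmpty rebuilds the whole heap on every iteration,
// so one "op" is size grants, not a single grant.
func benchmarkGrantFromEmpty(size int, b *testing.B) {
	for i := 0; i < b.N; i++ {
		grantAll(size)
	}
}

func main() {
	// testing.Benchmark lets the sketch run outside "go test".
	r := testing.Benchmark(func(b *testing.B) { benchmarkGrantFromEmpty(1000, b) })
	fmt.Println(r)
}
```

Dividing the reported ns/op by size gives an average cost per grant over a heap that grows from 0 to size, rather than the cost of a grant into a heap that already holds size leases.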

jingyih · 2019-05-23T23:56:21Z

I also thought about benchmarking lease grant and revoke together (however only time grant or revoke), so that we can keep the heap size constant. But lease revoke does not remove the corresponding item from heap, so heap size still increases after each benchmark run.

new lessor with empty heap
for j := 0; j < size; j++ {
  // grant lease
}
for i := 0; i < b.N; i++ {
  grant lease
  b.StopTimer()
  revoke lease // unfortunately this does not remove lease from heap
  b.StartTimer()
}
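
The obstacle described in that comment can be modelled with a toy lessor in which revoke only deletes from the lease map and leaves the heap entry behind. This is a sketch of the behaviour as stated in the comment, with hypothetical names (toyLessor, grant, revoke); it is not etcd's implementation.

```go
package main

import (
	"container/heap"
	"fmt"
)

// item pairs a lease ID with its expiry time.
type item struct{ id, expiry int64 }

// itemHeap is a min-heap of items ordered by expiry.
type itemHeap []item

func (h itemHeap) Len() int            { return len(h) }
func (h itemHeap) Less(i, j int) bool  { return h[i].expiry < h[j].expiry }
func (h itemHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *itemHeap) Push(x interface{}) { *h = append(*h, x.(item)) }
func (h *itemHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// toyLessor is a hypothetical model: a map of live leases plus an expiry heap.
type toyLessor struct {
	leases map[int64]int64 // lease ID -> expiry
	queue  itemHeap
}

func newToyLessor() *toyLessor {
	return &toyLessor{leases: make(map[int64]int64)}
}

// grant records the lease in the map and pushes it onto the heap.
func (l *toyLessor) grant(id, expiry int64) {
	l.leases[id] = expiry
	heap.Push(&l.queue, item{id: id, expiry: expiry})
}

// revoke removes the lease from the map only; the heap entry stays
// until something pops it (lazy deletion).
func (l *toyLessor) revoke(id int64) {
	delete(l.leases, id)
}

func main() {
	l := newToyLessor()
	for i := int64(0); i < 1000; i++ {
		l.grant(i, 1000+i)
		l.revoke(i)
	}
	fmt.Println(len(l.leases), l.queue.Len()) // live leases vs. heap entries
}
```

After a thousand grant/revoke pairs the map is empty but the heap still holds a thousand entries, so a grant-then-revoke loop does not keep the heap size constant across iterations.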

jingyih · 2019-05-24T00:00:28Z

I feel like we do not need the benchmark to be super accurate, but the current benchmark is not providing us any relevant information. We just need something that is roughly accurate to unblock #10693.

j2gg0s · 2019-05-29T15:22:41Z

@jingyih
I simplified these benchmarks and added some comments.
Now they run fast and the results seem reasonable.

go test -run=None -bench .
goos: darwin
goarch: amd64
pkg: go.etcd.io/etcd/lease
BenchmarkLessorGrant1000-4               1000000              2208 ns/op
BenchmarkLessorGrant100000-4              500000              3843 ns/op
BenchmarkLessorRevoke1000-4              1000000              1519 ns/op
BenchmarkLessorRevoke100000-4            1000000              2411 ns/op
BenchmarkLessorRenew1000-4               3000000               508 ns/op
BenchmarkLessorRenew100000-4             3000000               474 ns/op
BenchmarkLessorFindExpired10000-4          30000             40654 ns/op
BenchmarkLessorFindExpired100000-4         20000             76311 ns/op
PASS
ok      go.etcd.io/etcd/lease   412.994s

I also used benchcmp to compare the results between #10693 and the master branch.

benchmark                              old ns/op     new ns/op     delta
BenchmarkLessorGrant1000-4             2204          2316          +5.08%
BenchmarkLessorGrant1000-4             2254          2353          +4.39%
BenchmarkLessorGrant1000-4             2214          2324          +4.97%
BenchmarkLessorGrant1000-4             2228          2337          +4.89%
BenchmarkLessorGrant1000-4             2224          2327          +4.63%

BenchmarkLessorGrant100000-4           3862          3917          +1.42%
BenchmarkLessorGrant100000-4           3817          3908          +2.38%
BenchmarkLessorGrant100000-4           3879          3981          +2.63%
BenchmarkLessorGrant100000-4           3765          3899          +3.56%
BenchmarkLessorGrant100000-4           3898          4290          +10.06%

BenchmarkLessorRevoke1000-4            1514          1506          -0.53%
BenchmarkLessorRevoke1000-4            1496          1492          -0.27%
BenchmarkLessorRevoke1000-4            1516          1520          +0.26%
BenchmarkLessorRevoke1000-4            1500          1495          -0.33%
BenchmarkLessorRevoke1000-4            1508          1501          -0.46%

BenchmarkLessorRevoke100000-4          2390          2440          +2.09%
BenchmarkLessorRevoke100000-4          2468          2468          +0.00%
BenchmarkLessorRevoke100000-4          2433          2441          +0.33%
BenchmarkLessorRevoke100000-4          2383          2424          +1.72%
BenchmarkLessorRevoke100000-4          2314          2411          +4.19%

BenchmarkLessorRenew1000-4             513           520           +1.36%
BenchmarkLessorRenew1000-4             513           520           +1.36%
BenchmarkLessorRenew1000-4             515           523           +1.55%
BenchmarkLessorRenew1000-4             511           517           +1.17%
BenchmarkLessorRenew1000-4             511           520           +1.76%

BenchmarkLessorRenew100000-4           443           562           +26.86%
BenchmarkLessorRenew100000-4           431           553           +28.31%
BenchmarkLessorRenew100000-4           446           581           +30.27%
BenchmarkLessorRenew100000-4           468           593           +26.71%
BenchmarkLessorRenew100000-4           445           573           +28.76%

BenchmarkLessorFindExpired10000-4      42074         64775         +53.95%
BenchmarkLessorFindExpired10000-4      41164         62892         +52.78%
BenchmarkLessorFindExpired10000-4      41028         62781         +53.02%
BenchmarkLessorFindExpired10000-4      40757         62493         +53.33%
BenchmarkLessorFindExpired10000-4      41462         63517         +53.19%

BenchmarkLessorFindExpired100000-4     76540         131566        +71.89%
BenchmarkLessorFindExpired100000-4     75847         130328        +71.83%
BenchmarkLessorFindExpired100000-4     75848         131377        +73.21%
BenchmarkLessorFindExpired100000-4     76425         131513        +72.08%
BenchmarkLessorFindExpired100000-4     76749         132065        +72.07%

codecov-io · 2019-05-29T17:00:36Z

Codecov Report

❗ No coverage uploaded for pull request base (master@caee28a).
The diff coverage is 0%.

@@            Coverage Diff            @@
##             master   #10710   +/-   ##
=========================================
  Coverage          ?   63.19%           
=========================================
  Files             ?      392           
  Lines             ?    37160           
  Branches          ?        0           
=========================================
  Hits              ?    23482           
  Misses            ?    12091           
  Partials          ?     1587

Impacted Files	Coverage Δ
lease/lessor.go	`87.74% <0%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update caee28a...2859346. Read the comment docs.

jingyih · 2019-05-30T03:58:11Z

@j2gg0s I will take a look tomorrow. Thanks for doing benchmark for the other PR.

jingyih

Looks great. Added a few comments.

jingyih · 2019-05-30T23:16:56Z

lease/lessor_bench_test.go

-		le.Grant(LeaseID(i+size), int64(100+i+size))
+	// MinLeaseTTL is negative, so we can grant expired lease in benchmark.
+	le = newLessor(lg, be, LessorConfig{MinLeaseTTL: -100})
+	le.SetRangeDeleter(func() TxnDelete {


Thanks for making revoke actually delete leases from leaseMap in the test.

jingyih · 2019-05-30T23:22:41Z

lease/lessor.go

@@ -909,3 +909,9 @@ func (fl *FakeLessor) ExpiredLeasesC() <-chan []*Lease { return nil }
 func (fl *FakeLessor) Recover(b backend.Backend, rd RangeDeleter) {}

 func (fl *FakeLessor) Stop() {}
+
+type FakeTxnDelete struct {
+	backend.BatchTx


Do we need to include backend.BatchTx? Can we simply make End() do nothing?

We need the backend's lock.

The benchmark saves data to a tmp file through the backend, and the backend only supports unsafe operations (no locking).

Before saving or deleting data, we need to call RangeDeleter to take the lock.

After saving or deleting data, we need to call End to release it.

Using the backend's lock also makes the benchmark more realistic.
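
The lock/unlock pairing described here can be sketched with stand-in types. Everything below is hypothetical: batchTx models only the Lock/Unlock part of backend.BatchTx, recordingTx exists just to make the call sequence visible, and the DeleteRange signature is an assumption for illustration rather than the PR's actual FakeTxnDelete.

```go
package main

import "fmt"

// batchTx is a hypothetical stand-in for the locking part of backend.BatchTx.
type batchTx interface {
	Lock()
	Unlock()
}

// recordingTx records lock state so the sequence can be inspected.
type recordingTx struct {
	locked  bool
	locks   int
	unlocks int
}

func (r *recordingTx) Lock()   { r.locked = true; r.locks++ }
func (r *recordingTx) Unlock() { r.locked = false; r.unlocks++ }

// fakeTxnDelete embeds the transaction so End can release its lock.
type fakeTxnDelete struct {
	batchTx
}

// DeleteRange deletes nothing; the signature is assumed for illustration.
func (f *fakeTxnDelete) DeleteRange(key, end []byte) (n, rev int64) { return 0, 0 }

// End releases the lock taken when the transaction was created.
func (f *fakeTxnDelete) End() { f.Unlock() }

// rangeDeleter returns a factory that locks tx before handing out a transaction.
func rangeDeleter(tx batchTx) func() *fakeTxnDelete {
	return func() *fakeTxnDelete {
		tx.Lock()
		return &fakeTxnDelete{tx}
	}
}

func main() {
	tx := &recordingTx{}
	newTxn := rangeDeleter(tx)

	txn := newTxn() // lock taken here
	fmt.Println("locked during txn:", tx.locked)
	txn.DeleteRange(nil, nil)
	txn.End() // lock released here
	fmt.Println("locked after End:", tx.locked)
}
```

If End() did nothing, the lock taken by the factory would never be released, and the next operation that needs the same lock would block.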

jingyih · 2019-05-30T23:29:06Z

lease/lessor_bench_test.go

-		le.Grant(LeaseID(i), int64(100+i))
+func benchmarkLessorGrant(benchSize int, b *testing.B) {
+	// avoid lease expire when benchmark
+	ttls := randomTTL(benchSize, 10, 100)


How are the min and max chosen? Do they always guarantee that no leases expire during the benchmark?

Ideally, we would choose min and max so that the probability of repeated TTLs matches the production situation.
That is too hard, so I chose two arbitrary values.

A 10-second minimum avoids expiration in 99.99% of cases.
In the other 0.01% of cases, the benchmark is running too long.

Sorry I do not fully understand. It seems that the safe value of min TTL should depend on benchSize and how fast the test machine runs? Maybe 10s is large enough to cover all cases? Can we use really large values to make it obvious that the leases won't expire during the test?

OK, I changed the min TTL to 1000 seconds.
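
For readers without the diff open, a helper with the shape of randomTTL(benchSize, min, max) might look like the sketch below. The body is an assumption: the real helper may use a different distribution or treat max as exclusive. The maximum of 2000 seconds in main is also just an illustrative value; only the 1000-second minimum comes from this thread.

```go
package main

import (
	"fmt"
	"math/rand"
)

// randomTTL returns n TTLs in seconds drawn uniformly from the
// inclusive range [minTTL, maxTTL]. The inclusive upper bound is an
// assumption of this sketch. It requires maxTTL >= minTTL.
func randomTTL(n int, minTTL, maxTTL int64) []int64 {
	ttls := make([]int64, n)
	for i := range ttls {
		ttls[i] = minTTL + rand.Int63n(maxTTL-minTTL+1)
	}
	return ttls
}

func main() {
	// A large minimum makes it obvious that no lease expires mid-benchmark.
	ttls := randomTTL(100000, 1000, 2000)

	// Count distinct values to see how often TTLs repeat.
	distinct := make(map[int64]struct{})
	for _, t := range ttls {
		distinct[t] = struct{}{}
	}
	fmt.Println(len(ttls), "TTLs,", len(distinct), "distinct values")
}
```

The width of the range relative to n controls how many TTLs repeat, which is the knob the earlier maxTTL discussion was about, while the minimum alone determines whether any lease can expire during the run.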

j2gg0s · 2019-06-04T13:04:21Z

In the benchmark, I set ExpiredLeasesRetryInterval to 10 microseconds, so that the findExpired benchmark rechecks expired leases.

jingyih

lgtm after nits.

jingyih · 2019-06-04T21:39:52Z

lease/lessor_bench_test.go

-		le.Grant(LeaseID(i+size), int64(100+i+size))
+	// MinLeaseTTL is negative, so we can grant expired lease in benchmark.
+	// ExpiredLeasesRetryInterval should small, so benchmark of findExpired will recheck expired lease.
+	le = newLessor(lg, be, LessorConfig{MinLeaseTTL: -100, ExpiredLeasesRetryInterval: 10 * time.Microsecond})


-100 is not enough for the TTLs in benchmarkLessorFindExpired.

jingyih · 2019-06-04T21:42:58Z

lease/lessor_bench_test.go

+		b.StartTimer()
+
+		// refresh fixture after pop all expired lease
+		for ;;i++ {


Test failed due to formatting:
https://travis-ci.com/etcd-io/etcd/jobs/205296470#L588

jingyih · 2019-06-05T02:35:59Z

lgtm

jingyih self-assigned this May 21, 2019

jingyih mentioned this pull request May 21, 2019

lease/lessor: recheck if exprired lease is revoked #10693

Merged

jingyih reviewed May 21, 2019

jingyih reviewed May 30, 2019

jingyih reviewed Jun 4, 2019

lease: refactor benchmark.

6a7ee70

xiang90 merged commit f6a9ebe into etcd-io:master Jun 8, 2019
