Jwinkler2083233/eject fake random #945

Merged
45 commits merged into master on Jul 30, 2021

Conversation

jwinkler2083233
Contributor

Jeff Winkler and others added 6 commits July 3, 2021 21:35
EjectTrueRandom appears to be saturating CPUs, so this changeset
is for testing the alternative.
These changes are to eliminate a majority of the linear searching that
was occurring during the EjectTrueRandom() method.
This has some updates to the eject logic
Removed a couple of magic numbers and used constants instead.
Improved comments
Jeff Winkler added 6 commits July 12, 2021 17:36
This changeset alters the unit tests to handle changes to the
eject logic.
…/flow-go into jwinkler2083233/eject-fake-random
This has a minor fix in whitespace for lint
This fixes a whitespace error on line 31
Fixes up whitespace
The unit tests are updated to reflect the change in logic for
the eject feature. The ejections are now batched in groups of
64 and triggered by an accumulation that is greater than or
equal to 128 over the limit.
@codecov-commenter

codecov-commenter commented Jul 13, 2021

Codecov Report

Merging #945 (21ca55d) into master (8e0d32b) will decrease coverage by 0.08%.
The diff coverage is 38.09%.


@@            Coverage Diff             @@
##           master     #945      +/-   ##
==========================================
- Coverage   53.38%   53.29%   -0.09%     
==========================================
  Files         318      318              
  Lines       21491    21511      +20     
==========================================
- Hits        11472    11465       -7     
- Misses       8450     8475      +25     
- Partials     1569     1571       +2     
Flag Coverage Δ
unittests 53.29% <38.09%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
module/mempool/stdmap/incorporated_result_seals.go 80.82% <0.00%> (ø)
module/mempool/stdmap/eject.go 47.36% <21.42%> (-34.22%) ⬇️
module/mempool/stdmap/backend.go 92.04% <72.72%> (ø)
module/mempool/stdmap/options.go 100.00% <100.00%> (ø)
module/mempool/stdmap/pending_receipts.go 68.04% <0.00%> (-4.13%) ⬇️


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8e0d32b...21ca55d.

Member
@AlexHentschel left a comment

Thanks Jeff. This is an interesting PR. I am not quite clear on what stochastic guarantees the EjectTrueRandom policy is supposed to satisfy.

Context

From a mathematical standpoint, there is one safe answer: a simple random sample (which is actually not simple to implement efficiently):

  • consider a mempool with N elements and the desire to eject k
  • what we desire is that each subset of k elements has the same probability of being chosen

Any sampling method that does not generate a simple random sample introduces a bias, which could be exploited to compromise liveness.

  • If we don't care about unbiased sampling, we can just eject the first k elements
  • Based on the terminology of "TrueRandom" in EjectTrueRandom I would assume that we actually want an unbiased sample (?)

So does this implementation produce an unbiased sample? Unfortunately not:

  • Consider a mempool with N=4096 = 64 * 64 elements from which we want to eject k=64.
    • An unbiased sampling algorithm must be able to generate the sample 0, 1, 2, ..., 63 as one possible outcome.
    • But the current implementation can only pick a single index from the interval [0, 63] and then picks the second index from [64, 127] and so on. It is quite biased in that it enforces a fairly uniform coverage of the total index space.
  • As a second example, consider a mempool with N=4100 = 64 * 64 + 4. Due to the integer division
    maxInterval := mapSize / batchSize
    the last 4 elements cannot be ejected (see the short illustration after this list).
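
A minimal numeric illustration of that second example (standalone Go, not code from this PR; mapSize and batchSize follow the names used above):

package main

import "fmt"

func main() {
    mapSize, batchSize := 4100, 64
    maxInterval := mapSize / batchSize         // integer division yields 64; the remainder of 4 is discarded
    lastReachable := batchSize*maxInterval - 1 // highest index the batched picker can address: 4095
    fmt.Println(maxInterval, lastReachable)    // indices 4096..4099 can never be chosen
}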

So overall, we can see that the ejection policy is not "true random"; it is in fact quite biased.

Suggestions

The fact of the matter is that the previous implementation had an unbiased sampling policy, which is provably safe (but with horrible performance). You have changed the default behaviour to an unsafe (yet more performant) ejection policy. This concerns me, because you might be breaking safety assumptions (I don't think this is actually the case ... but nevertheless, it's not good practice to change the default behaviour from safe to unsafe).

So what can we do? I think there are a few options.

In any case

We should make it unambiguously clear that the fast ejection policy is statistically biased:

  • choose a more accurate name, e.g. FastBiasedRandom
  • and include a detailed documentation in which way this ejection policy is biased (feel free to reuse my examples above)

Default behaviour

I think the default behaviour should remain a safe ejection policy. Nodes can always opt into the unsafe, more performant policy after careful consideration.

There are efficient unbiased sampling algorithms, which we could implement

The old implementation had complexity O(k*N). I think we should use your trick of ejecting in batches of constant size. But maybe we can use a provably unbiased sampling algorithm? There are some that are comparable in cost to your implementation. Suggestion:

  • you want to sample k elements from the index set [0, 1, ..., N-1]
  • The basic approach would be a Fisher-Yates shuffle, but it requires enumerating the full index set. Note, though, that your initial population P would simply be P[i] = i.
  • Instead of materializing P, you can memorize only the slots of P that you have actually touched.

Happy to write up a pseudo-code algorithm if that would help (a rough sketch follows after the list below). I feel we then have a good set of algorithms:

  • FakeRandomSampling if you don't need unbiased sampling
  • a solution that is much more efficient than the current EjectTrueRandom (utilizing your idea of batch ejection) that still produces an unbiased sample
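
A rough sketch of that idea (illustrative Go only; the name sampleIndices, its signature, and the demo in main are assumptions, not code from this PR). The population P[i] = i stays virtual: only slots that have actually been swapped are memorized in a map, so time and memory are O(k) while every k-subset remains equally likely:

package main

import (
    "fmt"
    "math/rand"
)

// sampleIndices draws k distinct indices uniformly at random from [0, n),
// assuming k <= n, using a partial Fisher-Yates shuffle over a virtual
// population P[i] = i. Only the slots that were swapped are stored.
func sampleIndices(n, k uint64, rng *rand.Rand) []uint64 {
    swapped := make(map[uint64]uint64, k) // overrides of the virtual population
    out := make([]uint64, 0, k)
    for i := uint64(0); i < k; i++ {
        // pick a slot uniformly from the not-yet-fixed tail [i, n)
        j := i + uint64(rng.Int63n(int64(n-i)))
        vj, ok := swapped[j] // value currently sitting at slot j (defaults to j)
        if !ok {
            vj = j
        }
        vi, ok := swapped[i] // value currently sitting at slot i (defaults to i)
        if !ok {
            vi = i
        }
        swapped[j] = vi // swap slots i and j ...
        swapped[i] = vj
        out = append(out, vj) // ... and take the element now at slot i
    }
    return out
}

func main() {
    rng := rand.New(rand.NewSource(1))
    fmt.Println(sampleIndices(4100, 64, rng)) // 64 distinct indices from [0, 4100)
}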

Resolved review threads: module/mempool/stdmap/eject.go (2 threads), module/mempool/stdmap/backend.go
jwinkler2083233 and others added 2 commits July 16, 2021 09:39
Co-authored-by: Alexander Hentschel <alex.hentschel@axiomzen.co>
Co-authored-by: Alexander Hentschel <alex.hentschel@axiomzen.co>
@jwinkler2083233
Contributor Author

I agree completely that the new logic has bias. The bias we're seeing is very small (<0.1%), but it should be eliminated nonetheless.

As a fix, how about I modify the part that generates random indices, so that the last index uses the actual remaining count of values to determine a random index? That would theoretically eliminate the bias in cases where the count of elements in the container is not evenly divisible by 64. (64 is an arbitrary value I chose because the compiler generates optimal code via bit shifting when 64 is the modulus.)
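
A hypothetical sketch of that tweak (identifiers such as pickBatchIndices, mapSize, and batchSize are illustrative, not the code actually merged; uses math/rand's *rand.Rand): one index is drawn per interval, and the final interval is stretched to cover the remainder, so the tail past the last full interval is no longer unreachable:

// pickBatchIndices picks one random index from each of batchSize consecutive
// intervals, assuming mapSize >= batchSize; the last interval absorbs the
// remainder of mapSize % batchSize.
func pickBatchIndices(mapSize, batchSize uint64, rng *rand.Rand) []uint64 {
    interval := mapSize / batchSize // size of each regular interval
    picks := make([]uint64, 0, batchSize)
    for i := uint64(0); i < batchSize; i++ {
        start := i * interval
        size := interval
        if i == batchSize-1 {
            size = mapSize - start // stretch the last interval to the end of the map
        }
        picks = append(picks, start+uint64(rng.Int63n(int64(size))))
    }
    return picks
}

Note that this only removes the "unreachable tail" bias; the one-pick-per-interval structure itself remains, as discussed in the follow-up comment below.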

This changeset tweaks the algorithm for ejecting elements
from the cache, such that there is less bias against the
remaining elements at the end of the cache.
Resolved review threads: module/mempool/stdmap/eject.go (3 threads)
@AlexHentschel
Member

AlexHentschel commented Jul 16, 2021

the last index uses the actual remaining count of values to determine a random index

That would eliminate one source of bias, namely that some elements have zero probability of being ejected.

Based on my understanding, the first bias:

  • Consider a mempool with N=4096 = 64 * 64 elements from which we want to eject k=64.
    • An unbiased sampling algorithm must be able to generate the sample 0, 1, 2, ..., 63 as one possible outcome.
    • But the current implementation can only pick a single index from the interval [0, 63] and then picks the second index from [64, 127] and so on. It is quite biased in that it enforces a fairly uniform coverage of the total index space.

still remains (?)

Jeff Winkler added 3 commits July 19, 2021 09:40
…/flow-go into jwinkler2083233/eject-fake-random
This changeset has a larger population used for the random values,
and those values are then sorted before they are used.
jwinkler2083233 and others added 3 commits July 23, 2021 14:52
Co-authored-by: François Garillot <4142+huitseeker@users.noreply.github.com>
I checked this in godbolt, and Go uses 10 fewer instructions
when a tuple is used for a swap.
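
For illustration only (not the PR's code; xs, i, and j are placeholder names), the two swap styles being compared:

package main

import "fmt"

func main() {
    xs := []int{10, 20, 30}
    i, j := 0, 2

    // temp-variable swap
    tmp := xs[i]
    xs[i] = xs[j]
    xs[j] = tmp

    // tuple-assignment swap (the form adopted here); on godbolt this compiled
    // to fewer instructions than the temp-variable version
    xs[i], xs[j] = xs[j], xs[i]

    fmt.Println(xs) // back to the original order after the two swaps
}
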
@jwinkler2083233
Contributor Author

jwinkler2083233 commented Jul 23, 2021

https://github.com/dapperlabs/flow-go/issues/5645 is the related issue

Member
@AlexHentschel left a comment

Some high-level design comments:

(1) Allowing the ejector to go below the specified capacity limit

(removed, as this was a misunderstanding on my part when interpreting the test case)

(2) Return values of the EjectFunc

With the updated convention for the EjectFunc, the return values no longer make sense to me: we specifically allow the ejector to remove multiple elements, yet it can only return a single one.

Suggestion:

EjectFunc func(b *Backend)

This is the API from the viewpoint of the Backend: it doesn't care what the ejector does specifically. If you need additional logic around the ejector (as your comment seems to indicate), I would suggest wrapping the ejector itself.
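
A hypothetical sketch of such a wrapper, assuming the EjectFunc signature suggested above (the name withEjectionCounter is illustrative, not part of this PR): the extra bookkeeping runs around the inner policy while the Backend still sees a plain EjectFunc:

// withEjectionCounter decorates an ejector with auxiliary logic without
// changing the EjectFunc signature the Backend relies on.
func withEjectionCounter(inner EjectFunc, counter *uint64) EjectFunc {
    return func(b *Backend) {
        *counter++ // auxiliary bookkeeping before delegating to the inner policy
        inner(b)
    }
}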

(3) Documentation of the EjectFunc

I think it would be worthwhile to clearly specify the conventions an ejector implementation has to satisfy. Currently, it states:

// EjectFunc is a function used to pick an entity to evict from the memory pool
// backend when it overflows its limit. A custom eject function can be injected
// into the memory pool upon creation, which allows us to hook into the eject
// to clean up auxiliary data and/or to change the strategy of eviction.
type EjectFunc func(b *Backend) (flow.Identifier, flow.Entity, bool)

Here is my interpretation of the convention:

// EjectFunc implements an ejection policy to remove elements when the mempool
// exceeds its specified capacity. A custom ejection policy can be injected
// into the memory pool upon creation to change the strategy of eviction.
// The ejection policy is executed from within the thread that serves the
// mempool. Implementations should adhere to the following convention:
//  * The ejector function has the freedom to eject _multiple_ elements.
//  * In a single `eject` call, it must eject as many elements as needed to
//    statistically keep the mempool size within the desired limit.
//  * The ejector _might_ (for performance reasons) retain more elements in the
//    mempool than the targeted capacity.
//  * The ejector _must_ notify the `Backend.ejectionCallbacks` for _each_
//    element it removes from the mempool.
//  * Implementations do _not_ need to be concurrency safe. The Backend handles
//    concurrency (specifically, it locks the mempool during ejection).
//  * The implementation should be non-blocking (though, it is allowed to
//    take a bit of time; the mempool will just be locked during this time).
type EjectFunc func(b *Backend)

I have roughly implemented my suggestions in https://github.com/onflow/flow-go/pull/1025/files

Resolved review threads: module/mempool/stdmap/backend.go, module/mempool/stdmap/backend_test.go, module/mempool/stdmap/eject.go (3 threads), module/mempool/stdmap/incorporated_result_seals.go
Member
@AlexHentschel left a comment

Looks good. There is a panicking edge case in EjectTrueRandomFast (see #945 (comment)); it is easy to fix. Approving this PR, trusting that this will be fixed before merging.

Thanks

Contributor
@huitseeker left a comment

Thanks! Batch ejection makes sense; I appreciate you taking the time to also make it legible!

@jwinkler2083233 jwinkler2083233 merged commit f2547a2 into master Jul 30, 2021
@jwinkler2083233 jwinkler2083233 deleted the jwinkler2083233/eject-fake-random branch July 30, 2021 16:14