[FEATURE BRANCH] More work on shuffle sharding utils #81182

MikeSpreitzer · 2019-08-08T18:59:26Z

What type of PR is this?


/kind cleanup


What this PR does / why we need it:
Changes following up on PR #80710 .

Made the validation checking function return a slice of error messages
rather than just a bit.

Made the shard-to-slice functions return []int rather than []int32 on
the expectation of increased convenience.

Made the hand uniformity tester avoid reflection, use a number of hash
values corresponding to the validation check, and evaluate a histogram
of hand counts.

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

[KEP] https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190228-priority-and-fairness.md

MikeSpreitzer · 2019-08-08T19:00:01Z

/sig api-machinery

MikeSpreitzer · 2019-08-08T19:00:17Z

/cc @yue9944882
/cc @aaron-prindle
@mars1024

MikeSpreitzer · 2019-08-08T20:18:51Z

staging/src/k8s.io/apiserver/pkg/util/shufflesharding/shufflesharding_test.go

-			return hands[i] < hands[j]
-		})
+	for i := uint64(0); i < hashMax; i++ {
+		ShuffleAndDealIntoHand(i, deckSize, aHand[:])


FYI, using i rather than rand.Uint64() as the hash value made the distribution in the histogram much tighter. Using i does an even sampling of the hash value space (modulo the fact that the falling factorial is not a power of two), whereas using rand.Uint64() relies on the quality of rand (which evidently is not that great).

BTW, if we set hashMax to uint64(fallingFactorial) then every hand gets dealt the same number of times.
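
The point about multiples of the falling factorial can be checked on a small deck. What follows is a minimal sketch, not the package's code: `deal` is a hypothetical stand-in that assumes the mixed-radix shuffle-and-deal scheme outlined in the KEP, and `countOrderedHands` is an illustrative helper.

```go
package main

import "fmt"

// deal is a hypothetical stand-in for shuffle-and-deal. It assumes the
// mixed-radix scheme: successive remainders of hashValue modulo deckSize,
// deckSize-1, ... select a position among the cards not yet dealt.
func deal(hashValue uint64, deckSize, handSize int) []int {
	remainders := make([]int, handSize)
	for i := 0; i < handSize; i++ {
		radix := uint64(deckSize - i)
		remainders[i] = int(hashValue % radix)
		hashValue /= radix
	}
	hand := make([]int, handSize)
	for i := 0; i < handSize; i++ {
		card := remainders[i]
		// Undo the removal of earlier cards, most recent first, to turn a
		// position among the remaining cards into a card number.
		for j := i; j > 0; j-- {
			if card >= remainders[j-1] {
				card++
			}
		}
		hand[i] = card
	}
	return hand
}

// countOrderedHands deals once for every hash value in [0, hashMax) and
// counts how often each ordered hand appears.
func countOrderedHands(deckSize, handSize int, hashMax uint64) map[string]int {
	counts := map[string]int{}
	for h := uint64(0); h < hashMax; h++ {
		counts[fmt.Sprint(deal(h, deckSize, handSize))]++
	}
	return counts
}

func main() {
	// The falling factorial for deckSize=5, handSize=3 is 5*4*3 = 60.
	counts := countOrderedHands(5, 3, 2*60)
	fmt.Println(len(counts), counts["[0 1 2]"]) // prints "60 2"
}
```

With hashMax equal to twice the falling factorial, each of the 60 ordered hands is dealt exactly twice in this sketch.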

fedebongio · 2019-08-08T20:22:08Z

/cc @yliaog

yue9944882 · 2019-08-09T03:58:04Z

staging/src/k8s.io/apiserver/pkg/util/shufflesharding/shufflesharding.go

-	if handSize <= 0 || deckSize <= 0 || handSize > deckSize {
-		return false
+// ValidateParameters finds errors in the parameters for shuffle
+// sharding.  Returns an empty slice iff there are no errors.  The


iff is correct

"iff" is standard jargon for "if and only if"

mars1024 · 2019-08-09T02:35:13Z

staging/src/k8s.io/apiserver/pkg/util/shufflesharding/shufflesharding.go

+		errs = append(errs, "deckSize is not positive")
+	}
+	if len(errs) > 0 {
+		return


Could we combine some of the if statements so that the return happens inside an if?

We could, but I chose to use the existing structure because it made a certain amount of sense. First, we check the range of each individual value, accumulating errors. If there are any then take an early out. Then check logically necessary relationships. Early out if that revealed problems. Finally do the more complicated and costly quantitative check. I thought that would be easiest on other people's minds. But since you objected, I will rewrite to be more direct.

I agree with Mike, the current flow is easier to understand.

@yliaog : I am not sure which you are agreeing with. Is the revised code OK?

Yes, the current code structure looks good.

OK, let's keep this.
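
The three-stage flow discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the merged function: `validateParameters` is a hypothetical name, `maxHashBits = 60` is assumed, and every message other than "deckSize is not positive" is invented for the example.

```go
package main

import (
	"fmt"
	"math"
)

// maxHashBits is an assumed entropy budget, used only for illustration.
const maxHashBits = 60

// validateParameters returns one message per problem found. The result has
// length zero when the parameters are acceptable.
func validateParameters(deckSize, handSize int) (errs []string) {
	// Stage 1: range-check each value on its own, accumulating messages.
	if handSize <= 0 {
		errs = append(errs, "handSize is not positive")
	}
	if deckSize <= 0 {
		errs = append(errs, "deckSize is not positive")
	}
	if len(errs) > 0 {
		return
	}
	// Stage 2: check the logically necessary relationship.
	if handSize > deckSize {
		return []string{"handSize is greater than deckSize"}
	}
	// Stage 3: the costlier, approximate estimate bits(deckSize^handSize).
	bits := int(math.Ceil(math.Log2(float64(deckSize)) * float64(handSize)))
	if bits > maxHashBits {
		errs = append(errs, fmt.Sprintf("more than %d bits of entropy required", maxHashBits))
	}
	return
}

func main() {
	fmt.Println(validateParameters(128, 8))
	fmt.Println(validateParameters(0, 0))
	fmt.Println(validateParameters(1024, 7))
}
```

Stopping after each stage keeps later checks from running on values that are already known to be out of range.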

mars1024 · 2019-08-09T02:43:03Z

staging/src/k8s.io/apiserver/pkg/util/shufflesharding/shufflesharding_test.go

-	allCoordinateCount := 128 * 127 * 126 / 6
+	fallingFactorial := 128 * 127 * 126
+	allCoordinateCount := fallingFactorial / 6
+	hashMax := uint64(16 << uint32(math.Ceil(math.Log2(float64(fallingFactorial)))))


Could we have hashMax := fallingFactorial << (64 - maxHashBits) here? The center will be 2^(64-maxHashBits). That seems easy to understand.

It is easy to prove that whenever hashMax is an integer multiple of fallingFactorial the distribution of hand occurrences is perfect --- every hand gets dealt the same number of times. A more interesting test is when hashMax is a power of 2 rather than a multiple of fallingFactorial.

We could replace 16 << X with 1 << (X + 64 - maxHashBits).
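
As a quick check of that equivalence, here is a small sketch that evaluates both forms for the deck used in the test. It assumes `maxHashBits` is 60, so that 64 - maxHashBits equals the four spare bits implied by the literal 16; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// maxHashBits is assumed to be 60 for this illustration.
const maxHashBits = 60

// fallingFactorial returns n * (n-1) * ... * (n-k+1).
func fallingFactorial(n, k int) int {
	product := 1
	for i := 0; i < k; i++ {
		product *= n - i
	}
	return product
}

// hashMaxShift16 computes 16 << ceil(log2(ff)), as in the diff.
func hashMaxShift16(ff int) uint64 {
	bits := uint(math.Ceil(math.Log2(float64(ff))))
	return uint64(16) << bits
}

// hashMaxSpareBits computes 1 << (ceil(log2(ff)) + 64 - maxHashBits).
func hashMaxSpareBits(ff int) uint64 {
	bits := uint(math.Ceil(math.Log2(float64(ff))))
	return uint64(1) << (bits + 64 - maxHashBits)
}

func main() {
	ff := fallingFactorial(128, 3)
	// prints "2048256 33554432 33554432"
	fmt.Println(ff, hashMaxShift16(ff), hashMaxSpareBits(ff))
}
```

Both forms give a power of two (here 2^25) that is not a multiple of the falling factorial, which is the more interesting case described above.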

mars1024 · 2019-08-09T02:55:20Z

staging/src/k8s.io/apiserver/pkg/util/shufflesharding/shufflesharding_test.go

+		histogram[count] = histogram[count] + 1
+	}
+
+	for count := int(math.Ceil(center * 0.9)); count <= int(math.Floor(center*1.1)); count++ {


There are still three fixed numbers in this test (0.9, 1.1 and 0.99), and they are linked, which I think makes them hard to maintain. How about using the standard deviation? Then we would have only one fixed number.

cc @yue9944882

I am fine with the hard-coded values in the test.

There are really only two numbers, not three; 0.9 and 1.1 are just 0.1 below and above unity.

I am not sure there is a real issue with maintainability here. The ShuffleAndDeal function is intended to implement a certain mathematical function; alternative implementations of the same function will produce exactly the same histogram (because it is a consequence of the mathematical function of shuffle-and-deal).

Could we bikeshed the evaluation of the histogram in follow-up PRs? I would like to get the validation function put to bed.

From the math of the ShuffleAndDeal function, when N*fallingFactorial (for any whole number N) consecutive hash values are studied we know that each hand will be dealt the same number of times and that number is N*factorial(handSize). So to test that this is what happened the evaluation should sum the histogram from floor(hashMax/fallingFactorial)*factorial(handSize) through ceiling(hashMax/fallingFactorial)*factorial(handSize) and insist on finding 100% of the mass there.

I revised again, making the uniformity tester run multiple test cases and evaluate whether all the counts in the histogram fall in the expected range.
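
The expected range described above can be written down directly. The sketch below is illustrative only: `expectedCountRange` is a hypothetical helper, and it counts unordered hands, so each full cycle of fallingFactorial hash values contributes factorial(handSize) deals per hand.

```go
package main

import (
	"fmt"
	"math"
)

// factorial returns n!.
func factorial(n int) int {
	product := 1
	for i := 2; i <= n; i++ {
		product *= i
	}
	return product
}

// fallingFactorial returns n * (n-1) * ... * (n-k+1).
func fallingFactorial(n, k int) int {
	product := 1
	for i := 0; i < k; i++ {
		product *= n - i
	}
	return product
}

// expectedCountRange returns the smallest and largest number of times any
// unordered hand should be dealt when hash values 0..hashMax-1 are used.
func expectedCountRange(deckSize, handSize, hashMax int) (minCount, maxCount int) {
	permutations := factorial(handSize)
	nff := float64(hashMax) / float64(fallingFactorial(deckSize, handSize))
	minCount = permutations * int(math.Floor(nff))
	maxCount = permutations * int(math.Ceil(nff))
	return
}

func main() {
	fmt.Println(expectedCountRange(128, 3, fallingFactorial(128, 3)))
	fmt.Println(expectedCountRange(64, 3, 1<<22))
}
```

When hashMax is an exact multiple of the falling factorial the two bounds coincide; otherwise they differ by factorial(handSize), which covers the partial final cycle.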

yue9944882 · 2019-08-09T03:49:50Z

staging/src/k8s.io/apiserver/pkg/util/shufflesharding/shufflesharding.go

+// sharding.  Returns an empty slice iff there are no errors.  The
+// entropy requirement is evaluated in a fast but approximate way:
+// bits(deckSize^handSize).
+func ValidateParameters(deckSize, handSize int32) (errs []string) {


(errs []string) {

avoid named return value as much as possible, it's conventional in k/k to avoid that usage

I can imagine reasons to avoid named return values in general, but in this case I think it works well. The alternative starts out with

func ValidateParameters(deckSize, handSize int32) []string { var errs []string

and has more verbosity in the remainder too.

yue9944882 · 2019-08-09T03:56:00Z

staging/src/k8s.io/apiserver/pkg/util/shufflesharding/shufflesharding_test.go

+		histogram[count] = histogram[count] + 1
+	}
+
+	for count := int(math.Ceil(center * 0.9)); count <= int(math.Floor(center*1.1)); count++ {


I am fine with the hard-coded values in the test.

yue9944882 · 2019-08-09T03:57:43Z

staging/src/k8s.io/apiserver/pkg/util/shufflesharding/shufflesharding.go

@@ -71,14 +88,14 @@ func ShuffleAndDealWithValidation(hashValue uint64, deckSize, handSize int32, pi

 // ShuffleAndDealToSlice will use specific pick function to return slices of indices
 // after ShuffleAndDeal
-func ShuffleAndDealToSlice(hashValue uint64, deckSize, handSize int32) []int32 {
+func ShuffleAndDealToSlice(hashValue uint64, deckSize, handSize int32) []int {


what's the point of s/int32/int/

#80710 (comment)

sort.Slice(int32Array, func(i, j int){ return int32Array[i] < int32Array[j]})

We can sort int32 in another style. None of the elements in the returned slice should exceed an int32 deckSize, so returning []int32 is more reasonable here.

https://golang.org/ref/spec#Numeric_types promises that an int can hold an int32.

Yes, we could make clients sort in another way. But why should we? Returning int makes for simpler client code.

On further thought: I understand the dislike of the dissonance between int32 arguments and int returns. Using int32 for the arguments is really just inertia from when this was entirely dedicated to the priority and fairness feature, which is configured with int32 parameters. But for general purpose use, I think int is preferred as it is the natural integer data type. There is an error here in putting the boundary between priority-and-fairness config fealty and general sensibilities inside this function. I will recode so that this general purpose function takes int size parameters as well as returning int cards.
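
To make the client-side difference concrete, here is a small sketch of both styles. The helper names are hypothetical; only `sort.Ints` and `sort.Slice` come from the standard library.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedInts returns a sorted copy of a hand of int cards.
func sortedInts(hand []int) []int {
	out := append([]int(nil), hand...)
	sort.Ints(out) // the standard library sorts []int directly
	return out
}

// sortedInt32s returns a sorted copy of a hand of int32 cards. There is no
// sort.Int32s, so the caller supplies a comparison function.
func sortedInt32s(hand []int32) []int32 {
	out := append([]int32(nil), hand...)
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	fmt.Println(sortedInts([]int{12, 14, 73, 18, 119, 51, 117, 26}))
	fmt.Println(sortedInt32s([]int32{12, 14, 73, 18, 119, 51, 117, 26}))
}
```

Both work; the []int version simply needs less code at each call site.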

MikeSpreitzer · 2019-08-09T09:46:50Z

/retest

yliaog · 2019-08-13T18:07:56Z

staging/src/k8s.io/apiserver/pkg/util/shufflesharding/shufflesharding.go

+		errs = append(errs, "deckSize is not positive")
+	}
+	if len(errs) > 0 {
+		return


I agree with Mike, the current flow is easier to understand.

yliaog · 2019-08-13T18:09:38Z

staging/src/k8s.io/apiserver/pkg/util/shufflesharding/shufflesharding.go

-	if handSize <= 0 || deckSize <= 0 || handSize > deckSize {
-		return false
+// ValidateParameters finds errors in the parameters for shuffle
+// sharding.  Returns an empty slice iff there are no errors.  The


Is it an empty slice or a nil slice that is returned when there are no errors?

The nil slice is empty. https://play.golang.org/p/AdlgjTL2DdP

there are subtle differences between the two,
https://www.pixelstech.net/article/1539870875-Empty-slice-vs-nil-slice-in-GoLang,
https://programming.guide/go/nil-slice-vs-empty-slice.html

Oh, by "empty" I meant "len() returns 0". I will reword to make it clearer.
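
The distinction raised in this thread can be shown in a few lines. The two functions below are hypothetical; they only illustrate that a nil slice and an empty non-nil slice both have length zero but compare differently against nil.

```go
package main

import "fmt"

// noErrorsNil returns a nil slice, as a function that only ever appends
// to a named or zero-valued result would.
func noErrorsNil() []string {
	var errs []string
	return errs
}

// noErrorsEmpty returns an empty slice that is not nil.
func noErrorsEmpty() []string {
	return []string{}
}

func main() {
	a, b := noErrorsNil(), noErrorsEmpty()
	fmt.Println(len(a), a == nil) // prints "0 true"
	fmt.Println(len(b), b == nil) // prints "0 false"
}
```

A caller that tests `len(errs) == 0` treats both results the same way, while a caller that tests `errs == nil` does not.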

yliaog · 2019-08-13T18:18:47Z

staging/src/k8s.io/apiserver/pkg/util/shufflesharding/shufflesharding.go

+	handSize := int32(len(hand))
+	var idx int
+	ShuffleAndDeal(hashValue, deckSize, handSize, func(card int32) {
+		hand[idx] = int(card)


isn't idx always 0, since ShuffleAndDeal is called only once inside ShuffleAndDealIntoHand?

ShuffleAndDeal calls its pick parameter (which is the function here) handSize times, and each call increments idx.

I think Mike is right.
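
The closure behaviour in question can be illustrated in isolation. In the sketch below, `dealStub` is a hypothetical stand-in for ShuffleAndDeal that ignores any hash value and just calls pick once per card; `dealIntoHand` mirrors the shape of the reviewed code.

```go
package main

import "fmt"

// dealStub is a hypothetical stand-in for ShuffleAndDeal. It calls pick
// handSize times, with arbitrary distinct cards counted down from the top
// of the deck.
func dealStub(deckSize, handSize int, pick func(int)) {
	for i := 0; i < handSize; i++ {
		pick(deckSize - 1 - i)
	}
}

// dealIntoHand fills the caller's slice. The closure captures idx by
// reference, so each invocation of the closure sees the value left by the
// previous one.
func dealIntoHand(deckSize int, hand []int) {
	idx := 0
	dealStub(deckSize, len(hand), func(card int) {
		hand[idx] = card
		idx++
	})
}

func main() {
	hand := make([]int, 3)
	dealIntoHand(8, hand)
	fmt.Println(hand) // prints "[7 6 5]"
}
```

Although the dealing function is called once, the callback runs once per card, so idx advances through every slot of the hand.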

mars1024 · 2019-08-17T12:23:12Z

staging/src/k8s.io/apiserver/pkg/util/shufflesharding/shufflesharding.go

+		errs = append(errs, "deckSize is not positive")
+	}
+	if len(errs) > 0 {
+		return


OK, let's keep this.

mars1024 · 2019-08-17T12:26:09Z

staging/src/k8s.io/apiserver/pkg/util/shufflesharding/shufflesharding.go

+	handSize := int32(len(hand))
+	var idx int
+	ShuffleAndDeal(hashValue, deckSize, handSize, func(card int32) {
+		hand[idx] = int(card)


I think Mike is right.

mars1024 · 2019-08-17T12:27:07Z

staging/src/k8s.io/apiserver/pkg/util/shufflesharding/shufflesharding.go

 }

 // ShuffleAndDeal can shuffle a hash value to handSize-quantity and non-redundant
 // indices of decks, with the pick function, we can get the optimal deck index
 // Eg. From deckSize=128, handSize=8, we can get an index array [12 14 73 18 119 51 117 26],
 // then pick function will choose the optimal index from these
 // Algorithm: https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190228-priority-and-fairness.md#queue-assignment-proof-of-concept
-func ShuffleAndDeal(hashValue uint64, deckSize, handSize int32, pick func(int32)) {
-	remainders := make([]int32, handSize)
+func ShuffleAndDeal(hashValue uint64, deckSize, handSize int, pick func(int)) {


@MikeSpreitzer @yue9944882 why do we change int32 to int here?

I documented that in the commit comment. See it at aadd486 , for example.

I also wrote about it in another comment: #81182 (comment)

got it, I think both are OK if we accept the type conversion in priority and fairness. cc @yue9944882

convinced, let's make it an int

mars1024 · 2019-08-19T03:17:36Z

staging/src/k8s.io/apiserver/pkg/util/shufflesharding/shufflesharding_test.go

+		minCount := permutations * int(math.Floor(nff))
+		maxCount := permutations * int(math.Ceil(nff))
+		aHand := make([]int, test.handSize)
+		for i := 0; i < test.hashMax; i++ {


I don't think it is good to test from 0 to hashMax. We all know the principle of our algorithm: it will produce the same results in every range [allCoordinateCount*N, allCoordinateCount*(N+1)), so the test case will always succeed, which is not what we want.

I think the point of unit tests is to check whether the implementation is doing what it is intended to do. If we were confident that the implementation does what it is intended to do then these unit tests would prove nothing, and the only things to prove would be math exercises that do not need to be repeated in unit tests.

By exploring from 0 to hashMax-1, we can test a range of hash space that is like what gets used in the priority and fairness filter --- a range from 0 to SomePowerOfTwo-1.

OK, I think these test cases are enough for priority and fairness filter.

mars1024 · 2019-08-19T03:18:53Z

staging/src/k8s.io/apiserver/pkg/util/shufflesharding/shufflesharding_test.go

+	}{
+		{64, 3, 1 << uint(math.Ceil(math.Log2(float64(ff(64, 3))))+spare)},
+		{128, 3, ff(128, 3)},
+		{128, 3, 3 * ff(128, 3)},


If {128, 3, ff(128, 3)} passes, {128, 3, 3 * ff(128, 3)} will pass too, so I think this case is redundant.

My thinking for the {128, 3, 3 * ff(128, 3)} test case is that by exploring a larger hash value space this gives the implementation more opportunities to mess up.

mars1024 · 2019-08-21T06:04:31Z

/hold cancel

mars1024 · 2019-08-21T06:06:02Z

staging/src/k8s.io/apiserver/pkg/util/shufflesharding/shufflesharding_test.go

+		minCount := permutations * int(math.Floor(nff))
+		maxCount := permutations * int(math.Ceil(nff))
+		aHand := make([]int, test.handSize)
+		for i := 0; i < test.hashMax; i++ {


OK, I think these test cases are enough for priority and fairness filter.

mars1024 · 2019-08-21T06:07:04Z

/lgtm

k8s-ci-robot · 2019-08-21T06:07:11Z

@mars1024: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

yue9944882 · 2019-08-21T06:07:24Z

/hold

for the clean-up from FQ package

MikeSpreitzer · 2019-08-21T06:17:05Z

Latest revision changes the fairqueuing package to use the shuffle sharding utility package.

yue9944882 · 2019-08-21T06:18:22Z

staging/src/k8s.io/apiserver/pkg/util/flowcontrol/fairqueuing/fairqueuing.go

@@ -254,10 +231,16 @@ func (qs *queueSetImpl) ChooseQueueIdx(hashValue uint64, handSize int) int {
 	// TODO(aaron-prindle) currently a lock is held for this in a larger anonymous function
 	// verify that makes sense...

+	bestQueueIdx := -1
+	bestQueueLen := 2147483647


math.MaxInt32?

This way it is an integer and no conversion is required when doing the comparison

Oh, I see that constant is actually untyped. So the revised code has an unnecessary typecast. Oh well.
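
For reference, a short sketch of why no conversion is needed: `math.MaxInt32` is an untyped constant, so a variable initialized from it takes the default type int. The function `shortestQueue` is a hypothetical illustration, not the fairqueuing code.

```go
package main

import (
	"fmt"
	"math"
)

// shortestQueue returns the index of the smallest length in queueLens, or
// -1 if queueLens is empty.
func shortestQueue(queueLens []int) int {
	bestIdx := -1
	bestLen := math.MaxInt32 // untyped constant, so bestLen is an int
	for i, n := range queueLens {
		if n < bestLen { // int compared with int, no cast required
			bestIdx, bestLen = i, n
		}
	}
	return bestIdx
}

func main() {
	fmt.Println(shortestQueue([]int{5, 2, 9})) // prints "1"
}
```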

Changes following up on PR #80710 . Made the validation checking function return a slice of error messages rather than just a bit. Replaced all the `int32` with `int` because this is intended for more than just the priority-and-fairness feature and so should not be a slave to its configuration datatypes. Introduced ShuffleAndDealIntoHand, to make memory allocation the caller's problem/privilege. Made the hand uniformity tester avoid reflection, evaluate the histogram against the expected range of counts, and run multiple test cases, including one in which the number of hash values is a power of two with four extra bits (as the validation check requires) and one in which the deck size is not a power of two. Updated the fairqueuing implementation to use the shuffle sharding utility package.

yue9944882 · 2019-08-21T06:30:07Z

/hold cancel
/lgtm

k8s-ci-robot · 2019-08-21T06:31:15Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MikeSpreitzer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~staging/src/k8s.io/apiserver/OWNERS~~ [MikeSpreitzer]
~~vendor/OWNERS~~ [MikeSpreitzer]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

yue9944882 · 2019-08-21T06:56:17Z

/lgtm

yue9944882 · 2019-08-21T07:36:13Z

/test pull-kubernetes-integration

yue9944882 · 2019-08-21T08:48:23Z

/test pull-kubernetes-kubemark-e2e-gce-big

k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 8, 2019

k8s-ci-robot requested review from aaron-prindle and yue9944882 August 8, 2019 19:00

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 8, 2019

k8s-ci-robot requested review from ncdc and wojtek-t August 8, 2019 19:00

k8s-ci-robot added the area/apiserver label Aug 8, 2019

MikeSpreitzer force-pushed the sharding-redux branch from 515aa1e to 92359be Compare August 8, 2019 20:15

MikeSpreitzer changed the title ~~More work on shuffle sharding utils~~ [FEATURE BRANCH] More work on shuffle sharding utils Aug 8, 2019

MikeSpreitzer commented Aug 8, 2019

k8s-ci-robot requested a review from yliaog August 8, 2019 20:22

MikeSpreitzer force-pushed the sharding-redux branch from 92359be to 3581c05 Compare August 8, 2019 21:49

mars1024 reviewed Aug 9, 2019

yue9944882 reviewed Aug 9, 2019

MikeSpreitzer force-pushed the sharding-redux branch 2 times, most recently from f3105c4 to a8342c6 Compare August 9, 2019 07:12

ncdc removed their request for review August 9, 2019 15:26

yliaog reviewed Aug 13, 2019

MikeSpreitzer force-pushed the sharding-redux branch from 3aa0ea3 to 270765e Compare August 14, 2019 01:47

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 16, 2019

k8s-ci-robot assigned yue9944882 Aug 16, 2019

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 16, 2019

mars1024 reviewed Aug 17, 2019

mars1024 reviewed Aug 19, 2019

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 21, 2019

mars1024 approved these changes Aug 21, 2019

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 21, 2019

MikeSpreitzer force-pushed the sharding-redux branch from aadd486 to 4c4f536 Compare August 21, 2019 06:15

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 21, 2019

yue9944882 reviewed Aug 21, 2019

k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Aug 21, 2019

MikeSpreitzer force-pushed the sharding-redux branch from 24e9424 to e5c9f50 Compare August 21, 2019 06:30

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 21, 2019

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 21, 2019

k8s-ci-robot merged commit 57e217a into kubernetes:feature-rate-limiting Aug 21, 2019

MikeSpreitzer deleted the sharding-redux branch August 21, 2019 13:53

mars1024 mentioned this pull request Oct 9, 2019

shuffle sharding package for priority and fairness #83665

Merged
