Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE BRANCH] More work on shuffle sharding utils #81182

Merged

Conversation

MikeSpreitzer
Copy link
Member

@MikeSpreitzer MikeSpreitzer commented Aug 8, 2019

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug

/kind cleanup

/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:
Changes following up on PR #80710 .

Made the validation checking function return a slice of error messages
rather than just a bit.

Made the shard-to-slice functions return []int rather than []int32 on
the expectation of increased convenience.

Made the hand uniformity tester avoid reflection, use a number of hash
values corresponding to the validation check, and evaluate a histogram
of hand counts.

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

[KEP] https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190228-priority-and-fairness.md

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 8, 2019
@MikeSpreitzer
Copy link
Member Author

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 8, 2019
@MikeSpreitzer
Copy link
Member Author

/cc @yue9944882
/cc @aaron-prindle
@mars1024

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 8, 2019
@MikeSpreitzer MikeSpreitzer changed the title More work on shuffle sharding utils [FEATURE BRANCH] More work on shuffle sharding utils Aug 8, 2019
return hands[i] < hands[j]
})
for i := uint64(0); i < hashMax; i++ {
ShuffleAndDealIntoHand(i, deckSize, aHand[:])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, using i rather than rand.Uint64() as the hash value made the distribution in the histogram much tighter. Using i does an even sampling of the hash value space (modulo the fact that the falling factorial is not a power of two), whereas using rand.Uint64() relies on the quality of rand (which evidently is not that great).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, if we set hashMax touint64(fallingFactorial) then every hand gets dealt the same number of times.

@fedebongio
Copy link
Contributor

/cc @yliaog

if handSize <= 0 || deckSize <= 0 || handSize > deckSize {
return false
// ValidateParameters finds errors in the parameters for shuffle
// sharding. Returns an empty slice iff there are no errors. The
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iff is correct

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"iff" is standard jargon for "if and only if"

errs = append(errs, "deckSize is not positive")
}
if len(errs) > 0 {
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we combine some if to make return in if?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but I chose to use the existing structure because it made a certain amount of sense. First, we check the range of each individual value, accumulating errors. If there are any then take an early out. Then check logically necessary relationships. Early out if that revealed problems. Finally do the more complicated and costly quantitative check. I thought that would be easiest on other peoples' minds. But since you objected, I will rewrite to be more direct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree with Mike, the current flow is easier to understand.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yliaog : I am not sure which you are agreeing with. Is the revised code OK?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the current code structure looks good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK,let's keep this.

allCoordinateCount := 128 * 127 * 126 / 6
fallingFactorial := 128 * 127 * 126
allCoordinateCount := fallingFactorial / 6
hashMax := uint64(16 << uint32(math.Ceil(math.Log2(float64(fallingFactorial)))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have hashMax := fallingFactorial << (64 - maxHashBits) here? The center will be 2^(64-maxHashBits). Seems easy to be understood.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is easy to prove that whenever hashMax is an integer multiple of fallingFactorial the distribution of hand occurrences is perfect --- every hand gets dealt the same number of times. A more interesting test is when hashMax is a power of 2 rather than a multiple of fallingFactorial.

We could replace 16 << X with 1 << (X + 64 - maxHashBits).

histogram[count] = histogram[count] + 1
}

for count := int(math.Ceil(center * 0.9)); count <= int(math.Floor(center*1.1)); count++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still 3 fixed number here for test, 0.9, 1.1 and 0.99, and they are linked, hard to maintain I think. How about using standard deviation? Then we will only have one fixed number.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am fine with the hardcodes in the test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are really only two numbers, not three; 0.9 and 1.1 are just 0.1 below and above unity.

I am not sure there is a real issue with maintainability here. The ShuffleAndDeal function is intended to implement a certain mathematical function; alternative implementations of the same function will produce exactly the same histogram (because it is a consequence of the mathematical function of shuffle-and-deal).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we bikeshed the evaluation of the histogram in follow-up PRs? I would like to get the validation function put to bed.

From the math of the ShuffleAndDeal function, when N*fallingFactorial (for any whole number N) consecutive hash values are studied we know that each hand will be dealt the same number of times and that number is N*factorial(handSize). So to test that this is what happened the evaluation should sum the histogram from floor(hashMax/fallingFactorial)*factorial(handSize) through ceiling(hashMax/fallingFactorial)*factorial(handSize) and insist on finding 100% of the mass there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I revised again, making the uniformity tester run multiple test cases and evaluate whether all the counts in the histogram fall in the expected range.

// sharding. Returns an empty slice iff there are no errors. The
// entropy requirement is evaluated in a fast but approximate way:
// bits(deckSize^handSize).
func ValidateParameters(deckSize, handSize int32) (errs []string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(errs []string) {

avoid named return value as much as possible, it's conventional in k/k to avoid that usage

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can imagine reasons to avoid named return values in general, but in this case I think it works well. The alternative starts out with

 func ValidateParameters(deckSize, handSize int32) []string {
    errs := nil

and has more verbosity in the remainder too.

histogram[count] = histogram[count] + 1
}

for count := int(math.Ceil(center * 0.9)); count <= int(math.Floor(center*1.1)); count++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am fine with the hardcodes in the test

@@ -71,14 +88,14 @@ func ShuffleAndDealWithValidation(hashValue uint64, deckSize, handSize int32, pi

// ShuffleAndDealToSlice will use specific pick function to return slices of indices
// after ShuffleAndDeal
func ShuffleAndDealToSlice(hashValue uint64, deckSize, handSize int32) []int32 {
func ShuffleAndDealToSlice(hashValue uint64, deckSize, handSize int32) []int {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the point of s/int32/int/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort.Slice(int32Array, func(i, j int){ return int32Array[i] < int32Array[j]})

we can sort int32 in another style. all the elements in the returned slice are not supposed to exceed an int32 deckSize, returning []int32 is more reasonable here..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://golang.org/ref/spec#Numeric_types promises that an int can hold an int32.

Yes, we could make clients sort in another way. But why should we? Returning int makes for simpler client code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On further thought: I understand the dislike of the dissonance between int32 arguments and int returns. Using int32 for the arguments is really just inertia from when this was entirely dedicated to the priority and fairness feature, which is configured with int32 parameters. But for general purpose use, I think int is preferred as it is the natural integer data type. There is an error here in putting the boundary between priority-and-fairness config fealty and general sensibilities inside this function. I will recode so that this general purpose function takes int size parameters as well as returning int cards.

if handSize <= 0 || deckSize <= 0 || handSize > deckSize {
return false
// ValidateParameters finds errors in the parameters for shuffle
// sharding. Returns an empty slice iff there are no errors. The
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iff is correct

@MikeSpreitzer MikeSpreitzer force-pushed the sharding-redux branch 2 times, most recently from f3105c4 to a8342c6 Compare August 9, 2019 07:12
@MikeSpreitzer
Copy link
Member Author

/retest

@ncdc ncdc removed their request for review August 9, 2019 15:26
errs = append(errs, "deckSize is not positive")
}
if len(errs) > 0 {
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree with Mike, the current flow is easier to understand.

if handSize <= 0 || deckSize <= 0 || handSize > deckSize {
return false
// ValidateParameters finds errors in the parameters for shuffle
// sharding. Returns an empty slice iff there are no errors. The
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it an empty slice or nil slice returned when there are no errors?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nil slice is empty. https://play.golang.org/p/AdlgjTL2DdP

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, by "empty" I meant "len() returns 0". I will reword to make it clearer.

handSize := int32(len(hand))
var idx int
ShuffleAndDeal(hashValue, deckSize, handSize, func(card int32) {
hand[idx] = int(card)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't idx always 0, since ShuffleAndDeal is called only once inside ShuffleAndDealIntoHand?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShuffleAndDeal calls its pick parameter (which is the function here) handSize times, and each increments idx

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Mike is right.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 16, 2019
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 16, 2019
errs = append(errs, "deckSize is not positive")
}
if len(errs) > 0 {
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK,let's keep this.

handSize := int32(len(hand))
var idx int
ShuffleAndDeal(hashValue, deckSize, handSize, func(card int32) {
hand[idx] = int(card)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Mike is right.

}

// ShuffleAndDeal can shuffle a hash value to handSize-quantity and non-redundant
// indices of decks, with the pick function, we can get the optimal deck index
// Eg. From deckSize=128, handSize=8, we can get an index array [12 14 73 18 119 51 117 26],
// then pick function will choose the optimal index from these
// Algorithm: https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190228-priority-and-fairness.md#queue-assignment-proof-of-concept
func ShuffleAndDeal(hashValue uint64, deckSize, handSize int32, pick func(int32)) {
remainders := make([]int32, handSize)
func ShuffleAndDeal(hashValue uint64, deckSize, handSize int, pick func(int)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MikeSpreitzer @yue9944882 why do we change int32 to int here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I documented that in the commit comment. See it at aadd486 , for example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wrote about it in another comment: #81182 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, I think both are OK if we accept the type conversion in priority and fairness. cc @yue9944882

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convinced, let's make it an int

minCount := permutations * int(math.Floor(nff))
maxCount := permutations * int(math.Ceil(nff))
aHand := make([]int, test.handSize)
for i := 0; i < test.hashMax; i++ {
Copy link
Member

@mars1024 mars1024 Aug 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is good to test from 0 to hashMax, because we all know the principle of our algorithm, it will get same results in [allCoordinateCount*N, allCoordinateCount*(N+1) ), so the test case will always succeed, which is not expected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the point of unit tests is to check whether the implementation is doing what it is intended to do. If we were confident that the implementation does what it is intended to do then these unit tests would prove nothing, and the only things to prove would be math exercises that do not need to be repeated in unit tests.

By exploring from 0 to hashMax-1, we can test a range of hash space that is like what gets used in the priority and fairness filter --- a range from 0 to SomePowerOfTwo-1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think these test cases are enough for priority and fairness filter.

}{
{64, 3, 1 << uint(math.Ceil(math.Log2(float64(ff(64, 3))))+spare)},
{128, 3, ff(128, 3)},
{128, 3, 3 * ff(128, 3)},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If {128, 3, ff(128,3)} pass, {128, 3, 3 *ff(128, 3)} will pass either, so I think this case is duplication.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking for the {128, 3, 3 * ff(128, 3)} test case is that by exploring a larger hash value space this gives the implementation more opportunities to mess up.

@mars1024
Copy link
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 21, 2019
minCount := permutations * int(math.Floor(nff))
maxCount := permutations * int(math.Ceil(nff))
aHand := make([]int, test.handSize)
for i := 0; i < test.hashMax; i++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think these test cases are enough for priority and fairness filter.

@mars1024
Copy link
Member

/lgtm

@k8s-ci-robot
Copy link
Contributor

@mars1024: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yue9944882
Copy link
Member

yue9944882 commented Aug 21, 2019

/hold

for the clean-up from FQ package

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 21, 2019
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 21, 2019
@MikeSpreitzer
Copy link
Member Author

Latest revision changes the fairqueuing package to use the shuffle sharding utility package.

@@ -254,10 +231,16 @@ func (qs *queueSetImpl) ChooseQueueIdx(hashValue uint64, handSize int) int {
// TODO(aaron-prindle) currently a lock is held for this in a larger anonymous function
// verify that makes sense...

bestQueueIdx := -1
bestQueueLen := 2147483647
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

math.MaxInt32?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way it is an integer and no conversion is required when doing the comparison

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see that constant is actually untyped. So the revised code has an unnecessary typecast. Oh well.

Changes following up on PR #807810 .

Made the validation checking function return a slice of error messages
rather than just a bit.

Replaced all the `int32` with `int` because this is intended for more
than just the priority-and-faireness feature and so should not be a
slave to its configuration datatypes.

Introduced ShuffleAndDealIntoHand, to make memory allocation the
caller's problem/privilege.

Made the hand uniformity tester avoid reflection, evaluate the
histogram against the expected range of counts, and run multiple test
cases, including one in which the number of hash values is a power of
two with four extra bits (as the validation check requires) and one in
which the deck size is not a power of two.

Updated the fairqueuing implementation to use the shuffle sharding
utility package.
@yue9944882
Copy link
Member

/hold cancel
/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Aug 21, 2019
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 21, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MikeSpreitzer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yue9944882
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 21, 2019
@yue9944882
Copy link
Member

/test pull-kubernetes-integration

@yue9944882
Copy link
Member

/test pull-kubernetes-kubemark-e2e-gce-big

@k8s-ci-robot k8s-ci-robot merged commit 57e217a into kubernetes:feature-rate-limiting Aug 21, 2019
@MikeSpreitzer MikeSpreitzer deleted the sharding-redux branch August 21, 2019 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants