picker: increase range of peer observation #1267
Conversation
d7399c2 to 48b52dc
Current coverage is 54.91% (diff: 16.66%)

@@            master    #1267    diff @@
========================================
  Files           78       78
  Lines        12418    12418
  Methods          0        0
  Messages         0        0
  Branches         0        0
========================================
+ Hits          6795     6819     +24
+ Misses        4677     4660     -17
+ Partials       946      939      -7
@@ -299,15 +303,15 @@ func (p *Picker) WaitForStateChange(ctx context.Context, sourceState grpc.Connec
// TODO(stevvooe): This is questionable, but we'll see how it works.
Maybe remove the TODO now that we are taking a closer look at this.
This is still questionable... and we're probably getting rid of this after the move to grpc.LoadBalancer.
LGTM
@aaronlehmann true, we should modify the test I wrote along those lines: Observe each peer twice, then downweight one and see how often it is selected.
I'm wondering if using integer weights for internal state is a good idea. For example, when the weight has converged to 10 and we observe with -10, we should end up at 0. But if the math is not exact due to floating-point rounding, and we end up with a value like 0.00000001, this turns into a weight of 1. It wouldn't be a big deal if the domain were large, but since we've limited ourselves to 21 discrete steps, a difference of 1 is quite significant. I think using floating-point weights for internal state would prevent roundoff from becoming amplified like this.
if peer == peers[0] {
	// we have an *extremely* low probability of selecting this node
	// (like 0.5%) once. We still allow the delta to keep from being
	// flaky.
Thanks for adding the protection against flakiness. It really sucks when a probabilistic test fails 1 in 1000 times because someone felt that was rare enough not to be a problem.
This has a probability of 0.5%^20 of failing.
@aaronlehmann @stevvooe I'm not sure that I understand the algorithm, but the test passes; the downweighted peer is chosen in only 0.001% of cases.
}

// one bad observation should mark the node as bad
remotes.Observe(peers[0], -DefaultObservationWeight)
Should we observe a few times with a positive weight before this downweighting to make sure that we're still unlikely to select this node even if its weight started as > 0 (as it will in practice)?
You can either use balanced downweighting, so that failures immediately reduce selection probability, or make negative observations smaller in magnitude than positive ones, so that several observations are required before a node is fully downweighted.
In this case, the initial condition is DefaultObservationWeight and we downweight with -DefaultObservationWeight, expecting it to cross zero.
I missed the initial condition. Where is it set up?
The initial loop converges it to DefaultObservationWeight.
ee5fbe1 to 74fdc94
The tests have been adjusted to reduce failure probability. Another test has been changed to deal with a bias-reduction issue, discovered after running tens of thousands of times.
remotes := NewRemotes(peers...)
seen := map[api.Peer]int{}
selections := 1000
tolerance := 0.20 // allow 10% delta to reduce test failure probability
comment says 10%
74fdc94 to bfd10a1
To allow the weighting algorithms to work properly, the values of observation weights need to be selected to be greater than 1. Without this, the observations, positive or negative, will always converge to 1. We also relax the smoothing factor to allow the observations to react faster to changes.

Arguably, we could remove the weight parameter from `Observe` and then make the manager state truly discrete, but that is a much larger change.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
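The "always converge to 1" failure mode can be reproduced with a toy version of the update: if the weight is stored as an integer and rounded up with Ceil after smoothing, observations of magnitude 1 can never pull it below 1. The smoothing constant here is an assumption for illustration:

```go
package main

import (
	"fmt"
	"math"
)

const smoothing = 0.7 // assumed smoothing factor

// observe smooths an integer weight toward an observation, rounding up.
func observe(w, observation int) int {
	return int(math.Ceil(smoothing*float64(w) + (1-smoothing)*float64(observation)))
}

func main() {
	// With |observation| == 1, negative observations are lost:
	// 0.7*1 + 0.3*(-1) = 0.4, and Ceil(0.4) == 1, forever.
	w := 1
	for i := 0; i < 10; i++ {
		w = observe(w, -1)
	}
	fmt.Println(w) // 1

	// With a larger magnitude (> 1), the weight actually goes negative.
	w = 1
	for i := 0; i < 10; i++ {
		w = observe(w, -10)
	}
	fmt.Println(w) // -7
}
```

This is why the commit picks observation weights greater than 1: the step taken per observation must be large enough to survive the round-up.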
bfd10a1 to bbc4243
LGTM
cc @LK4D4 @aaronlehmann