/
priority.go
124 lines (115 loc) · 6.34 KB
/
priority.go
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
// Copyright (C) 2020 Storj Labs, Inc.
// See LICENSE for copying information.
package repair
import "math"
// segmentHealthNHD purports to find the number of days that a segment can be
// expected to survive, with the given failureRate.
//
// The loss of nodes and pieces on the Storj network relative to a single
// segment can be modeled as a bin holding differently colored balls, one for
// each node on the network. Nodes holding pieces of our segment become red
// balls, and all other nodes are blue balls. One by one, we reach in and remove
// a ball from the bin at random, symbolizing that node going offline. If the
// ball is blue, we count that as a success (our segment is unaffected). If the
// ball is red, it is a failure (our segment has lost a piece). We want to know
// how many draws it will take before some number of failures is reached (that
// is, how long will it be before a segment loses too many pieces and is no
// longer reconstructible, if we don't repair it along the way).
//
// With this formulation, the problem is nearly identical to the situation
// described by the negative hypergeometric distribution
// (https://en.wikipedia.org/wiki/Negative_hypergeometric_distribution). It is
// related to the negative binomial distribution
// (https://en.wikipedia.org/wiki/Negative_binomial_distribution), but the NBD
// deals with drawing balls from a bin with replacement, while the NHD deals
// with drawing balls from a bin without replacement. Because we can't expect
// lost nodes to come back, we don't put balls back into the bin once they are
// drawn, so ours is a case of drawing without replacement. The negative
// hypergeometric distribution more closely matches our problem, especially
// around certain edge cases.
//
// Do nodes tend to go offline one after another like a sequence of balls being
// chosen from a bin? No, in reality node failures tend to happen in conjunction
// with other failures. They are not independent occurrences. However, if we
// have done a good enough job in declumping segments so that pieces tend to be
// well distributed across unrelated nodes, then node failure patterns from the
// point of view of a particular segment should be pretty indistinguishable from
// random and independent. We have measured in the past what appeared to be a
// fairly steady rate of node failure: each node has something like a 0.00005435
// chance of going permanently offline on any given day. With a population size
// of about 24k nodes, this gives us a mean time between failures (MTBF) for
// nodes of about 18.5 hours. We can make the simplifying assumption that nodes
// do in fact go offline at this rate.
//
// In our formulation of the model, the NHD parameters are:
//
// N (the total number of balls) = totalNodes
// K (the number of balls considered successes) = totalNodes-numHealthy
// r (the number of failures until we're done) = numHealthy-minPieces+1
//
// Knowing this tells us how to calculate how many draws to expect segment decay
// to take. The expected value of the negative hypergeometric distribution (the
// mean value you would get if you tried the experiment enough times) is
// r*K/(N-K+1).
//
// Now, knowing the number of draws doesn't immediately tell us how many _days_
// of survival to expect. We use the failureRate parameter to get from _draws_
// to _days_.
//
// We want to scale things so that one draw corresponds to one node failure.
// All we need for that is the MTBF, the mean time between failures. One draw
// can correspond to one MTBF interval. Since we know that failureRate is a
// chance of failure per node per day, we can multiply it by totalNodes to get
// the total number of node failures per day, and invert that value to get the
// mean number of days per failure, and that is the MTBF.
//
// For more analysis of this model, see the Jupyter Notebook
// repair_and_durability/repairPriority/hypergeo.ipynb in the storj/datascience
// repository.
func segmentHealthNHD(numHealthy, minPieces, totalNodes int, failureRate float64) float64 {
if numHealthy < minPieces {
// take a shortcut.
return 0
}
N := float64(totalNodes) // the total population
K := float64(totalNodes - numHealthy) // the number of successes/blue balls in the bin
r := float64(numHealthy - minPieces + 1) // how many failures before the segment is irrecoverable
// the mean of the distribution, corresponding to the expected number of
// successes before we reach r failures
expectedNumberOfSuccesses := r * K / (N - K + 1)
// the total number of expected draws, including both successes and failures
expectedNumberOfDraws := expectedNumberOfSuccesses + r
drawsPerDay := N * failureRate
mtbf := 1 / drawsPerDay
days := expectedNumberOfDraws * mtbf
return days
}
const (
// These somewhat magic-looking values correspond to pop-significance values
// suggested by @elek in the context of a different health model. They have
// been adapted to work in this model.
popSignificanceLow = 3154 // from segmentHealthNHD(34, 29, 24000, 0.00005435)
popSignificanceHigh = 5385 // from segmentHealthNHD(40, 29, 24000, 0.00005435)
)
// SegmentHealth returns a value corresponding to the health of a segment in the
// repair queue. Lower health segments should be repaired first.
//
// This implementation uses segmentHealthNHD to calculate the base health value.
//
// An additional wrinkle added here is that we need to assign high priority to
// pieces which need to be repaired as soon as possible, e.g., pieces out of
// placement ("POPs"). We want to tune it so that segments with POPs are
// generally higher priority than other segments, and segments with more POPs
// are generally higher priority than segments with fewer POPs. It is possible,
// however, for a segment with no POPs to be prioritized above a segment that
// does have POPs, if the first segment is in sufficient danger and the second
// segment is not.
func SegmentHealth(numHealthy, minPieces, totalNodes int, failureRate float64, numForcingRepair int) float64 {
base := segmentHealthNHD(numHealthy, minPieces, totalNodes, failureRate)
if numForcingRepair > 0 {
// POP segments are put between segments with lifetimes between popSignificanceLow and popSignificanceHigh days.
popSignificance := math.Min(float64(numForcingRepair)/float64(minPieces), 1)
return math.Min(base, popSignificanceHigh-(popSignificanceHigh-popSignificanceLow)*popSignificance)
}
return base
}