Move adaptive sampling processor #1179

black-adder · 2018-11-13T19:58:59Z

Signed-off-by: Won Jun Jang <wjang@uber.com>

This PR is part 2 of moving adaptive sampling (#365) over to OSS. It moves over the processor and related components.

codecov · 2018-11-13T20:37:29Z

Codecov Report

Merging #1179 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1179      +/-   ##
==========================================
+ Coverage   99.82%   99.83%   +<.01%     
==========================================
  Files         173      179       +6     
  Lines        8203     8550     +347     
==========================================
+ Hits         8189     8536     +347     
  Misses          7        7              
  Partials        7        7

Impacted Files	Coverage Δ
plugin/sampling/strategystore/adaptive/utils.go	`100% <100%> (ø)`
plugin/sampling/strategystore/adaptive/factory.go	`100% <100%> (ø)`
plugin/sampling/strategystore/adaptive/cache.go	`100% <100%> (ø)`
plugin/sampling/strategystore/adaptive/options.go	`100% <100%> (ø)`
...lugin/sampling/strategystore/adaptive/processor.go	`100% <100%> (ø)`
...mpling/strategystore/adaptive/weightvectorcache.go	`100% <100%> (ø)`
... and 2 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 54416c6...507404b. Read the comment docs.

yurishkuro · 2018-11-13T21:14:05Z

plugin/sampling/strategystore/adaptive/cache.go

+// samplingCacheEntry keeps track of the probability and whether a service-operation is using adaptive sampling
+type samplingCacheEntry struct {
+	probability    float64
+	usingAdapative bool


s/usingAdapative/usingAdaptive

yurishkuro · 2018-11-13T21:14:39Z

plugin/sampling/strategystore/adaptive/cache.go

+
+package adaptive
+
+// samplingCacheEntry keeps track of the probability and whether a service-operation is using adaptive sampling


"whether a service-operation is using adaptive sampling" - in what sense, configured to use or observed to use?

yurishkuro · 2018-11-13T21:17:42Z

plugin/sampling/strategystore/adaptive/factory.go

+
+// Factory implements strategystore.Factory for an adaptive strategy store.
+type Factory struct {
+	options        *Options


don't need a pointer (faster access)

yurishkuro · 2018-11-13T21:45:36Z

plugin/sampling/strategystore/adaptive/processor.go

+	// before.
+	if len(p.serviceCache) > 1 {
+		if e := p.serviceCache[1].Get(service, operation); e != nil {
+			return e.usingAdapative && e.probability != p.DefaultSamplingProbability


call floatEquals for the right side?
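The concern behind this comment is that `!=` on float64 values compares exact bit patterns, so two probabilities that differ only by rounding error are treated as different. A minimal sketch of a tolerance-based comparison follows; the body of `floatEquals` and the `epsilon` value are assumptions for illustration, not the helper defined in this PR.

```go
package main

import (
	"fmt"
	"math"
)

// epsilon is a hypothetical absolute tolerance chosen for this sketch.
const epsilon = 1e-10

// floatEquals reports whether a and b differ by less than epsilon.
// This is an illustrative stand-in, not the PR's implementation.
func floatEquals(a, b float64) bool {
	return math.Abs(a-b) < epsilon
}

func main() {
	// Use variables so the addition happens at run time in float64,
	// not as an exact compile-time constant expression.
	x, y := 0.1, 0.2
	sum := x + y
	fmt.Println("exact comparison:    ", sum == 0.3)
	fmt.Println("tolerance comparison:", floatEquals(sum, 0.3))
}
```

With such a helper, the right-hand side of the check would read `!floatEquals(e.probability, p.DefaultSamplingProbability)`.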

yurishkuro · 2018-11-13T21:45:59Z

plugin/sampling/strategystore/adaptive/processor.go

+	return false
+}
+
+// generateStrategyResponses generates a SamplingStrategyResponse from the calculated sampling probabilities.


generates and caches

yurishkuro · 2018-11-13T21:48:36Z

plugin/sampling/strategystore/adaptive/weightscache.go

+	"sync"
+)
+
+// weightsCache stores normalizing weights of different lengths. The head of the weights slice


weight vectors of different lengths. Could've used 'vector' in the type name as well, makes it much clearer.

yurishkuro · 2018-11-13T21:50:15Z

plugin/sampling/strategystore/adaptive/weightscache.go

+	}
+	l := float64(length)
+	// closed form of sum l^4 ie 1^4 + 2^4 + ... + l^4
+	sum := (l / 30) * (l + 1) * ((2 * l) + 1) * ((3 * l * l) + (3 * l) - 1)


I have no idea what's going on here
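For context, the expression in the diff matches the standard closed form for the sum of fourth powers, 1^4 + 2^4 + ... + l^4 = l(l+1)(2l+1)(3l^2+3l-1)/30. The sketch below checks that identity against a plain loop; the function names are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"math"
)

// closedFormSum4 mirrors the expression from the diff:
// l(l+1)(2l+1)(3l^2+3l-1)/30.
func closedFormSum4(length int) float64 {
	l := float64(length)
	return (l / 30) * (l + 1) * ((2 * l) + 1) * ((3 * l * l) + (3 * l) - 1)
}

// loopSum4 adds 1^4 + 2^4 + ... + length^4 term by term.
func loopSum4(length int) float64 {
	sum := 0.0
	for i := 1; i <= length; i++ {
		f := float64(i)
		sum += f * f * f * f
	}
	return sum
}

// approxEqual compares with a small relative tolerance, because l/30
// is not always exactly representable in float64.
func approxEqual(a, b float64) bool {
	return math.Abs(a-b) <= 1e-9*math.Max(1, math.Abs(b))
}

func main() {
	for _, n := range []int{1, 2, 3, 10} {
		c, s := closedFormSum4(n), loopSum4(n)
		fmt.Printf("n=%d closed=%v loop=%v agree=%v\n", n, c, s, approxEqual(c, s))
	}
}
```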

yurishkuro

once merged, please update status in #365. Aside from wiring into main, is anything else missing?

yurishkuro · 2019-04-07T18:56:00Z

plugin/sampling/strategystore/adaptive/factory.go

+
+// CreateStrategyStore implements strategystore.Factory
+func (f *Factory) CreateStrategyStore() (strategystore.StrategyStore, error) {
+	// TODO


so there's more to come?

Ashmita152 · 2021-02-13

@yurishkuro I can see that in processor.go, NewProcessor() (https://github.com/jaegertracing/jaeger/blob/master/plugin/sampling/strategystore/adaptive/processor.go#L119-L152) already returns a &processor{} that implements GetSamplingStrategy (https://github.com/jaegertracing/jaeger/blob/master/plugin/sampling/strategystore/adaptive/processor.go#L155-L162) and satisfies the StrategyStore interface. We no longer use the Factory interface anywhere inside the adaptive sampling directory (https://github.com/jaegertracing/jaeger/tree/master/plugin/sampling/strategystore/adaptive). Do we need a Factory{} implementation inside the adaptive package, or can we proceed without it?

yurishkuro · 2021-02-13

individual factories are used by the meta-factory plugin/sampling/strategystore/factory.go:

jaeger/plugin/sampling/strategystore/factory.go

Line 34 in 5062366

var allSamplingTypes = []string{staticStrategyStoreType} // TODO support adaptive

yurishkuro · 2019-04-07T18:57:04Z

plugin/sampling/strategystore/adaptive/options.go

+	// Increase this to reduce the amount of fluctuation in the probability calculation.
+	QPSEquivalenceThreshold float64
+
+	// CalculationInterval determines how often new probabilities are calculated. It was doubles as the interval


It was doubles -> It doubles ?

yurishkuro · 2019-04-07T19:04:20Z

plugin/sampling/strategystore/adaptive/options.go

+	// the CalculationInterval is 1 minute (each bucket contains 1 minute of thoughput data) and the
+	// AggregationBuckets is 3, the adaptive sampling processor will keep at most 3 buckets in memory for
+	// all operations.
+	// TODO(wjang): Expand on why this is needed when BucketsForCalculation seems to suffice.


yes, please elaborate on the use case when BucketsForCalculation needs to be less than AggregationBuckets

yurishkuro · 2019-04-07T20:30:14Z

plugin/sampling/strategystore/adaptive/processor.go

+	return math.Abs(actual-expected)/expected < p.DeltaTolerance
+}
+
+func combineProbabilities(p1 map[string]struct{}, p2 map[string]struct{}) map[string]struct{} {


If I understand this:

you are keeping a set of probability values expressed as strings, and here you are joining two sets

the values are kept in order to decide if the given sampler is using adaptive sampling strategy, by comparing the sampling rate found in the root span's tag with these values

Could you please add this somewhere as a comment, e.g. to the sampling/model Throughput.Probabilities field?
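The set-join described in this comment can be sketched as below. Only the `map[string]struct{}` shape comes from the diff; the function name `unionSets` and the sample probability strings are assumptions made for illustration.

```go
package main

import "fmt"

// unionSets returns a new set containing every key from p1 and p2.
// Neither input map is modified.
func unionSets(p1, p2 map[string]struct{}) map[string]struct{} {
	out := make(map[string]struct{}, len(p1)+len(p2))
	for k := range p1 {
		out[k] = struct{}{}
	}
	for k := range p2 {
		out[k] = struct{}{}
	}
	return out
}

func main() {
	// Hypothetical probability values, stored as strings.
	a := map[string]struct{}{"0.1": {}, "0.5": {}}
	b := map[string]struct{}{"0.5": {}, "0.01": {}}
	u := unionSets(a, b)

	// A sampler would count as adaptive if the rate found in the root
	// span's tag appears in the combined set.
	_, seen := u["0.01"]
	fmt.Println("set size:", len(u), "contains 0.01:", seen)
}
```

Keeping the values as strings sidesteps float equality when looking up a rate, at the cost of requiring that both sides format the number identically.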

yurishkuro · 2019-04-07T21:13:54Z

plugin/sampling/strategystore/adaptive/processor.go

+	defaultFollowerProbabilityInterval = 20 * time.Second
+
+	// The number of past entries for samplingCache the leader keeps in memory
+	serviceCacheSize = 25


it seems only serviceCache[0] and serviceCache[1] are ever used, is that true? The latter is checked explicitly in isUsingAdaptiveSampling, and the former is used indirectly via prependServiceCache when it eventually moves to position 1.

black-adder · 2019-04-16

Yes, we only use the most recent two entries in the serviceCache. We can probably remove the older entries in the cache. I'll add a TODO for now because there might've been a reason my younger and smarter self did this.
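A bounded history of this kind can be sketched as follows. The types and the `prependCache` function are hypothetical simplifications, not the PR's `prependServiceCache`; the sketch only illustrates how the newest entry lands at index 0 and the previous one shifts to index 1.

```go
package main

import "fmt"

// samplingCache is a hypothetical simplification: operation -> probability.
type samplingCache map[string]float64

// prependCache puts latest at the head of history and drops entries
// beyond maxSize, so index 0 is always the newest snapshot.
func prependCache(history []samplingCache, latest samplingCache, maxSize int) []samplingCache {
	history = append([]samplingCache{latest}, history...)
	if len(history) > maxSize {
		history = history[:maxSize]
	}
	return history
}

func main() {
	var history []samplingCache
	for i := 1; i <= 3; i++ {
		history = prependCache(history, samplingCache{"op": float64(i) / 10}, 2)
	}
	// With maxSize 2, only the two most recent snapshots remain.
	fmt.Println(len(history), history[0]["op"], history[1]["op"])
}
```

If only positions 0 and 1 are ever read, a size of 2 would carry the same information as a size of 25.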

yurishkuro · 2019-04-07T22:20:18Z

plugin/sampling/strategystore/adaptive/weightvectorcache.go

+		weights = append(weights, w)
+		sum += w
+	}
+	// normalize


the previous reference to "closed form sum" was rather confusing. I changed it to an explicit normalization step, please verify.
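An explicit normalization step of this shape can be sketched as below. The fourth-power weighting and the newest-first ordering are assumptions inferred from the earlier "sum l^4" comment, and `normalizedWeights` is a hypothetical name, not the function in weightvectorcache.go.

```go
package main

import (
	"fmt"
	"math"
)

// normalizedWeights builds a weight vector of the given length where
// the head has the largest weight, then scales it so the weights sum to 1.
// Assumption: raw weights are i^4 for i = length, length-1, ..., 1.
func normalizedWeights(length int) []float64 {
	weights := make([]float64, 0, length)
	var sum float64
	for i := length; i > 0; i-- {
		w := math.Pow(float64(i), 4)
		weights = append(weights, w)
		sum += w
	}
	// normalize: divide each raw weight by the running total
	for i := range weights {
		weights[i] /= sum
	}
	return weights
}

func main() {
	fmt.Println(normalizedWeights(3))
}
```

Accumulating `sum` in the same loop that builds the weights removes the need for the closed-form expression entirely.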

yurishkuro · 2019-04-07T22:23:55Z

plugin/sampling/strategystore/adaptive/processor.go

+
+func (p *processor) calculateProbability(service, operation string, qps float64) float64 {
+	oldProbability := p.InitialSamplingProbability
+	// TODO: is this loop overly expensive?


which loop is this referring to?

yurishkuro · 2019-04-07T22:59:51Z

plugin/sampling/strategystore/adaptive/processor.go

+
+	getThroughputErrMsg = "failed to get throughput from storage"
+
+	defaultFollowerProbabilityInterval = 20 * time.Second


this should also be configurable.

yurishkuro · 2019-04-07T23:01:08Z

plugin/sampling/strategystore/adaptive/processor.go

+	} else {
+		newProbability = p.probabilityCalculator.Calculate(p.TargetSamplesPerSecond, qps, oldProbability)
+	}
+	return math.Min(maxSamplingProbability, math.Max(p.MinSamplingProbability, newProbability))


the maxSamplingProbability const is superfluous, this stmt would read better if you simply use 1.0
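The suggested form clamps the new probability into the range [MinSamplingProbability, 1.0]. A minimal sketch, with a hypothetical free function standing in for the method in the diff:

```go
package main

import (
	"fmt"
	"math"
)

// clampProbability bounds p to [minProbability, 1.0].
// Hypothetical helper; the PR does this inline in calculateProbability.
func clampProbability(p, minProbability float64) float64 {
	return math.Min(1.0, math.Max(minProbability, p))
}

func main() {
	const minP = 0.001 // assumed minimum, for illustration only
	for _, p := range []float64{0.0, 0.25, 1.7} {
		fmt.Println(p, "->", clampProbability(p, minP))
	}
}
```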

black-adder · 2019-04-15T23:23:03Z

going to land and address comments and wire things up in next PR

black-adder requested review from jpkrohling, objectiser, pavolloffay, tiffon, vprithvi and yurishkuro as code owners November 13, 2018 19:59

ghost assigned black-adder Nov 13, 2018

ghost added the review label Nov 13, 2018

yurishkuro reviewed Nov 13, 2018

black-adder added 4 commits March 14, 2019 08:12

Move adaptive sampling processor

e11675a

Signed-off-by: Won Jun Jang <wjang@uber.com>

lint

fe80f5f

Signed-off-by: Won Jun Jang <wjang@uber.com>

increase coverage

d0fcf6c

Signed-off-by: Won Jun Jang <wjang@uber.com>

address what I can

b00cd26

Signed-off-by: Won Jun Jang <wjang@uber.com>

black-adder force-pushed the adaptive_sampling_part_deux branch from 9b16980 to b00cd26 Compare March 14, 2019 17:27

yurishkuro and others added 4 commits April 6, 2019 10:35

Merge branch 'master' into adaptive_sampling_part_deux

caa6773

Rename params, minor clean-up

ef2cd27

Signed-off-by: Yuri Shkuro <ys@uber.com>

Fix constant and var name

3b0b6c2

Signed-off-by: Yuri Shkuro <ys@uber.com>

minor refactoring, renaming, and comments, to improve readbility

56ce51a

Signed-off-by: Yuri Shkuro <ys@uber.com>

yurishkuro mentioned this pull request Apr 7, 2019

Adaptive Sampling #365

Closed

3 tasks

yurishkuro approved these changes Apr 7, 2019

Merge branch 'master' into adaptive_sampling_part_deux

507404b

black-adder merged commit 0b9311d into master Apr 15, 2019

pavolloffay deleted the adaptive_sampling_part_deux branch August 27, 2019 08:59
