feat: improved poller algorithm #2622

mathnogueira · 2023-05-31T18:12:04Z

This PR adds another condition to our trace poller algorithm. Once we no longer detect changes in the trace, we try to match the selectors against its spans. If every selector matches at least one span, we consider the polling successful; otherwise, we try again in the next polling cycle. If the trace has not changed and we cannot match all selectors with spans from the trace 3 times in a row, we end the polling and mark it as complete (to prevent issues when doing TDD).

There are more things that could be done, and I'll do them in the future if needed:

- Make this configurable via a polling profile option (either another strategy, or just an option inside the periodic object)

Example

| Run | Action |
|-----|--------|
| 01 | Got initial trace |
| 02 | Got more spans, continue |
| 03 | Got more spans, continue |
| 04 | No more spans, but not all selectors are working |
| 05 | No more spans, but not all selectors are working |
| 06 | No more spans, but not all selectors are working. But we tried 3 times already, so finish polling |
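
The example above can be sketched as a small state machine. This is only an illustration of the described behavior, not the code in this PR: `pollState`, `step`, and `maxTries` are hypothetical names, and resetting the counter whenever the trace changes is an assumption based on the "3 times in a row" wording.

```go
package main

import "fmt"

// maxTries mirrors the "3 times in a row" limit from the description.
const maxTries = 3

// pollState is a hypothetical container for what the poller remembers between cycles.
type pollState struct {
	lastSpanCount int // number of spans seen in the previous cycle
	tries         int // consecutive cycles with a stable trace and unmatched selectors
}

// step evaluates one polling cycle and reports whether polling should stop.
func (s *pollState) step(spanCount int, allSelectorsMatched bool) (bool, string) {
	if spanCount != s.lastSpanCount {
		// The trace is still changing: remember the new size and start counting again.
		s.lastSpanCount = spanCount
		s.tries = 0
		return false, "got more spans, continue"
	}
	if allSelectorsMatched {
		return true, "all selectors matched at least one span"
	}
	s.tries++
	if s.tries >= maxTries {
		return true, fmt.Sprintf("selectors still unmatched after %d tries, finish polling", s.tries)
	}
	return false, "no more spans, but not all selectors are working"
}

func main() {
	state := &pollState{}
	// Span counts per cycle, following runs 01 to 06 of the example.
	spanCounts := []int{3, 5, 8, 8, 8, 8}
	for i, count := range spanCounts {
		done, reason := state.step(count, false)
		fmt.Printf("run %02d: done=%t (%s)\n", i+1, done, reason)
		if done {
			break
		}
	}
}
```

With the span counts used in `main`, polling only stops on the sixth cycle, which matches the last row of the example.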


## Checklist

- [x] tested locally
- [ ] added new dependencies
- [ ] updated the docs
- [x] added a test

mathnogueira · 2023-05-31T18:20:21Z

server/executor/trace_poller.go

-	ctx        context.Context
-	test       model.Test
-	run        model.Run
-	count      int
-	hadRequeue bool


The idea behind changing this object is to make it extensible via headers (just like network protocols attach headers to a packet). Thus, I removed the context, which can be serialized via context propagation, and the hadRequeue flag.

This way, we can have the counter for the selector-based polling executor without changing this object or impacting other strategies.
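
A minimal sketch of the header idea, assuming a plain `map[string]string` as storage. The `SetHeader` and `Header` method names and the `"requeued"` key appear in the diffs under review; the struct fields, the storage, and everything else here are assumptions for illustration only.

```go
package main

import "fmt"

// PollingRequest is a simplified stand-in for the real object.
// The TestID and RunID fields are hypothetical placeholders.
type PollingRequest struct {
	TestID  string
	RunID   int
	headers map[string]string
}

// SetHeader attaches a named value to the request, creating the map on first use.
func (pr *PollingRequest) SetHeader(name, value string) {
	if pr.headers == nil {
		pr.headers = make(map[string]string)
	}
	pr.headers[name] = value
}

// Header returns the value stored under name, or "" when it was never set.
func (pr *PollingRequest) Header(name string) string {
	return pr.headers[name]
}

// IsFirstRequest reports whether the request has not been requeued yet.
func (pr *PollingRequest) IsFirstRequest() bool {
	return pr.Header("requeued") != "true"
}

func main() {
	req := &PollingRequest{TestID: "example-test", RunID: 1}
	fmt.Println(req.IsFirstRequest())

	// Each strategy can keep its own state under its own header
	// without adding fields to the shared struct.
	req.SetHeader("requeued", "true")
	req.SetHeader("SelectorBasedPollerExecutor::retryCount", "1")
	fmt.Println(req.IsFirstRequest(), req.Header("SelectorBasedPollerExecutor::retryCount"))
}
```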

schoren

Looks great and I think it's going to have a great impact on the more flaky tests. Left a few suggestions, nothing to block a merge

schoren · 2023-05-31T20:08:26Z

server/executor/selector_based_poller_executor.go

+)
+
+const (
+	selectorBasedPollerExecutorRetryHeader = "SelectorBasedPollerExecutor::retryCount"


Nit: I feel like it would be more consistent to use . instead of :: to split the "namespace". Did you choose this for any particular reason?

When I see a namespace I think of C++, so I used its ugly convention 😆

Question, do we have/want event logs for the new polling strategy? so we can see the failed selectors from the client side

added some logs to it! Good catch

schoren · 2023-05-31T20:10:12Z

server/executor/selector_based_poller_executor.go

+
+	currentNumberTries := pe.getNumberTries(request)
+	if currentNumberTries >= selectorBasedPollerExecutorMaxTries {
+		return true, "not all selectors matched, but trace haven't changed in a while", run, err


Would the end user, at some point, have a clear understanding of what "in a while" means? Like, is the number of retries related to this error at some point? If not, maybe add the number of retries/time to this message?

Actually, Oscar has a point: this message should be an event. This string is only shown if the polling is not complete (first parameter is false), so it's useless in this context.

schoren · 2023-05-31T20:15:28Z

server/executor/selector_based_poller_executor.go

+		return true, "all selectors have matched one or more spans", run, err
+	}
+
+	request.SetHeader(selectorBasedPollerExecutorRetryHeader, fmt.Sprintf("%d", currentNumberTries+1))


Since this file already imports the strconv package, it might be more consistent to use it here.

Suggested change

-	request.SetHeader(selectorBasedPollerExecutorRetryHeader, fmt.Sprintf("%d", currentNumberTries+1))
+	request.SetHeader(selectorBasedPollerExecutorRetryHeader, strconv.Itoa(currentNumberTries+1))

Another approach could be to add a few request.SetHeaderInt(key string, val int), request.GetHeaderInt(key string) int kind of methods to the request, to reduce the boilerplate on the consumer side, and further abstract them from the internal implementation
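
A rough sketch of what such helpers could look like, assuming the headers live in a `map[string]string`. `SetHeaderInt` and `GetHeaderInt` follow the signatures proposed in this comment; returning 0 for a missing or malformed value is an assumption, not something decided in the thread.

```go
package main

import (
	"fmt"
	"strconv"
)

// PollingRequest is a simplified stand-in that only carries headers.
type PollingRequest struct {
	headers map[string]string
}

// SetHeader stores a raw string value.
func (pr *PollingRequest) SetHeader(key, val string) {
	if pr.headers == nil {
		pr.headers = make(map[string]string)
	}
	pr.headers[key] = val
}

// SetHeaderInt stores an int so callers do not format it themselves.
func (pr *PollingRequest) SetHeaderInt(key string, val int) {
	pr.SetHeader(key, strconv.Itoa(val))
}

// GetHeaderInt parses the header as an int.
// Assumption: missing or malformed values are treated as 0.
func (pr *PollingRequest) GetHeaderInt(key string) int {
	n, err := strconv.Atoi(pr.headers[key])
	if err != nil {
		return 0
	}
	return n
}

func main() {
	const retryHeader = "SelectorBasedPollerExecutor::retryCount"

	req := &PollingRequest{}
	// Increment the retry counter without any string handling at the call site.
	req.SetHeaderInt(retryHeader, req.GetHeaderInt(retryHeader)+1)
	req.SetHeaderInt(retryHeader, req.GetHeaderInt(retryHeader)+1)
	fmt.Println(req.GetHeaderInt(retryHeader))
}
```

The consumer side then shrinks to a single call, and the string encoding stays an internal detail of the request.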

schoren · 2023-05-31T20:19:56Z

server/executor/trace_poller.go

 }

 func (pr PollingRequest) IsFirstRequest() bool {
-	return !pr.hadRequeue
+	return pr.Header("requeued") != "true"


another point where a return !pr.GetBool("requeued") would look cool
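
For illustration, a `GetBool` helper along these lines could be built on `strconv.ParseBool`. This is a sketch under assumptions: the map storage is hypothetical, and `ParseBool` also accepts values such as `"1"` or `"T"`, which is slightly broader than the `!= "true"` comparison in the diff.

```go
package main

import (
	"fmt"
	"strconv"
)

// PollingRequest is a simplified stand-in that only carries headers.
type PollingRequest struct {
	headers map[string]string
}

// SetHeader stores a raw string value.
func (pr *PollingRequest) SetHeader(key, val string) {
	if pr.headers == nil {
		pr.headers = make(map[string]string)
	}
	pr.headers[key] = val
}

// GetBool reports whether the header holds a true value.
// Missing or unparsable headers are treated as false.
func (pr *PollingRequest) GetBool(key string) bool {
	v, err := strconv.ParseBool(pr.headers[key])
	return err == nil && v
}

// IsFirstRequest reports whether the request has not been requeued yet.
func (pr *PollingRequest) IsFirstRequest() bool {
	return !pr.GetBool("requeued")
}

func main() {
	req := &PollingRequest{}
	fmt.Println(req.IsFirstRequest())
	req.SetHeader("requeued", "true")
	fmt.Println(req.IsFirstRequest())
}
```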

schoren · 2023-05-31T20:33:48Z

server/executor/default_poller_executor.go

@@ -189,7 +194,7 @@ func (pe DefaultPollerExecutor) donePollingTraces(job *PollingRequest, traceDB t
 	if !traceDB.ShouldRetry() {
 		return true, "TraceDB is not retryable"
 	}
-	pp := *pe.ppGetter.GetDefault(job.ctx).Periodic
+	pp := *pe.ppGetter.GetDefault(job.Context()).Periodic


Since you're here, can you add a nil check for Periodic? It should never be nil, but it's better to be safe than to panic.
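
One way to guard the dereference, sketched with hypothetical types: `PollingProfile`, `PeriodicConfig`, their fields, and the fallback values below are placeholders and do not reflect the project's real definitions.

```go
package main

import "fmt"

// PeriodicConfig is a hypothetical stand-in for the periodic polling options.
type PeriodicConfig struct {
	TimeoutSeconds    int
	RetryDelaySeconds int
}

// PollingProfile is a hypothetical stand-in for the profile returned by the getter.
type PollingProfile struct {
	Periodic *PeriodicConfig
}

// defaultPeriodic is an assumed fallback used when no periodic config is set.
var defaultPeriodic = PeriodicConfig{TimeoutSeconds: 60, RetryDelaySeconds: 5}

// periodicOrDefault avoids a nil-pointer panic by falling back to defaults
// when either the profile or its Periodic field is nil.
func periodicOrDefault(profile *PollingProfile) PeriodicConfig {
	if profile == nil || profile.Periodic == nil {
		return defaultPeriodic
	}
	return *profile.Periodic
}

func main() {
	fmt.Printf("%+v\n", periodicOrDefault(nil))
	fmt.Printf("%+v\n", periodicOrDefault(&PollingProfile{}))
	fmt.Printf("%+v\n", periodicOrDefault(&PollingProfile{
		Periodic: &PeriodicConfig{TimeoutSeconds: 30, RetryDelaySeconds: 1},
	}))
}
```

Whether to fall back to defaults or return an error is a design choice the comment leaves open; the fallback is used here only to keep the sketch short.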

xoscar · 2023-05-31T21:40:06Z

Question, do we have/want event logs for the new polling strategy? so we can see the failed selectors from the client side

- simplify polling request object and use mapcarrier to propagate ctx
- add another condition to check for selectors after all spans are ready
- fix test
- add option to configure max retries in selector based trace poller
- move selector based poller initialization to app.go
- PR suggestions
- fix tests and move polling success event to trace poller component
- add events to new selector poller executor
- fix pooling profile test
- fix tests

schoren approved these changes May 31, 2023

schoren reviewed May 31, 2023

danielbdias approved these changes May 31, 2023

mathnogueira added 8 commits June 1, 2023 13:42

simplify polling request object and use mapcarrier to propagate ctx

3592ca5

add another condition to check for selectors after all spans are ready

08804ac

fix test

b3d862a

add option to configure max retries in selector based trace poller

dbe317c

move selector based poller initialization to app.go

57c560c

PR suggestions

0016816

fix tests and move polling success event to trace poller component

462da19

add events to new selector poller executor

e621a3a

mathnogueira force-pushed the feat/improved-poller-algorithm branch from d7547fe to e621a3a June 1, 2023 16:48

mathnogueira added 2 commits June 1, 2023 13:56

fix pooling profile test

09b3d2c

fix tests

f5acc23

xoscar approved these changes Jun 1, 2023

mathnogueira merged commit 163fe30 into main Jun 1, 2023
28 checks passed

mathnogueira deleted the feat/improved-poller-algorithm branch June 1, 2023 17:30
