Refactor workload-lifecycle e2e to be more fault tolerant #2513
The workload-lifecycle e2e can flake because it runs the traffic generation job only once, then loops checking for traces. If auto-instrumentation isn't actually running for a service before that job runs, the loop is doomed to fail.
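To illustrate the failure mode, here is a minimal sketch of a wait loop that regenerates traffic on every attempt instead of only once up front. It is not taken from this PR: `wait_for_traces`, `generate_traffic`, and `traces_ready` are hypothetical names, and the real test drives a job and a shell script rather than Python callables.

```python
import time
from typing import Callable


def wait_for_traces(
    generate_traffic: Callable[[], None],  # hypothetical: triggers one round of test traffic
    traces_ready: Callable[[], bool],      # hypothetical: queries the backend for expected traces
    attempts: int = 5,
    delay_s: float = 0.0,
) -> bool:
    """Regenerate traffic on each attempt, then check for traces.

    If instrumentation was not ready during an earlier attempt, a later
    attempt still gets a chance to produce the missing traces.
    """
    for _ in range(attempts):
        generate_traffic()
        if traces_ready():
            return True
        time.sleep(delay_s)
    return False
```

With traffic generated only once before the loop, a service that becomes ready late never emits a trace, so every check fails no matter how long the loop waits.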
This changes how the `workload-lifecycle` e2e generates test traces and counts them:

- `jq` filter to count the unique occurrence of each service name

Unlike the source e2e, which can just check for a minimum number of spans, we have to aggregate the unique services in these checks. That's because in the source e2e, all of the services are tied together in a single trace. In this test, the services are separate, so the loop could generate multiple traces for the same service while waiting for the others to be ready. This means we can't rely on just checking `minimum`, because we might hit that minimum before every service has sent traces.

We might eventually be able to drop the `custom_jq` field and just make that the default, but because other jobs might be using the `traceql_runner.sh` script, I'm not doing that yet so as not to break anything else.
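The difference between a minimum-count check and a unique-service check can be sketched as follows. This is an illustration in Python rather than the actual `jq` filter, and it assumes a search response shaped like `{"traces": [{"rootServiceName": ...}]}`; the field name, the helper names, and the service names `svc-a` and `svc-b` are all assumptions made for the example.

```python
from collections import Counter
from typing import Iterable


def count_services(search_response: dict) -> Counter:
    """Count traces per service name (assumed field: rootServiceName)."""
    return Counter(
        trace["rootServiceName"]
        for trace in search_response.get("traces", [])
        if "rootServiceName" in trace
    )


def all_services_reported(search_response: dict, expected: Iterable[str]) -> bool:
    """True only when every expected service has at least one trace."""
    counts = count_services(search_response)
    return all(counts[name] >= 1 for name in expected)


# Hypothetical response: svc-a was ready early and sent three traces,
# while svc-b has not sent any yet.
response = {
    "traces": [
        {"rootServiceName": "svc-a"},
        {"rootServiceName": "svc-a"},
        {"rootServiceName": "svc-a"},
    ]
}

minimum = 2
passes_minimum = len(response["traces"]) >= minimum
passes_unique = all_services_reported(response, ["svc-a", "svc-b"])
```

Here `passes_minimum` is true while `passes_unique` is false, which is the false positive described above: repeated traces from one service satisfy the minimum even though another service has not reported.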