Cloud output v2 #3117

codebien · 2023-06-06T15:48:30Z

Context

#2954 introduces the new experimental Cloud output with a Protobuf-based protocol.

Memory usage

After the first iteration, memory usage is higher than required. For Trend metrics in particular, it is very easy to saturate the bandwidth, with payloads ranging from many kilobytes up to the remote limit (1 MB).

We also decided to denormalize some fields to reduce the workload and keep the implementation simple on the remote server. However, the load this generates on the client is high, so we should revisit this decision.

Fault tolerance

The current flush process could be more fault tolerant: it doesn't retry on failures.

Validation

__name__ and test_run_id are reserved labels for the remote service. If a test also sets them, the conflict causes unexpected behavior for the user. A more dev-friendly UX should be implemented.
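
One possible shape for such a check, sketched under the assumption that tags are available as a plain `map[string]string` before the test starts. `reservedLabels` and `validateTags` are hypothetical names, not existing k6 functions.

```go
package main

import (
	"fmt"
	"sort"
)

// reservedLabels holds the two label names the issue describes as
// reserved for the remote service.
var reservedLabels = map[string]struct{}{
	"__name__":    {},
	"test_run_id": {},
}

// validateTags returns an error listing every reserved label that is
// used as a tag name, or nil if there is no conflict.
func validateTags(tags map[string]string) error {
	var conflicts []string
	for name := range tags {
		if _, reserved := reservedLabels[name]; reserved {
			conflicts = append(conflicts, name)
		}
	}
	if len(conflicts) == 0 {
		return nil
	}
	sort.Strings(conflicts) // deterministic error message
	return fmt.Errorf("tag names %q are reserved by the cloud output", conflicts)
}

func main() {
	fmt.Println(validateTags(map[string]string{"scenario": "default"}))
	fmt.Println(validateTags(map[string]string{"test_run_id": "123"}))
}
```

Failing early with an explicit message like this is one way to replace the silent conflict with something a user can act on.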

Proposal

We identified some actions that should drive us to the goal:

A more compact Protobuf representation for Histogram.
Split the flush into multiple requests when it gets more time series than the MaxMetricSamplesPerPackage variable allows.
Normalize the fields common across time series as MetricSet fields.
Fault-tolerant flush operation.
Exclude __name__ and test_run_id from the allowed tag names.
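
The request-splitting item above could be sketched roughly as follows. `TimeSeries` and `splitIntoRequests` are hypothetical placeholders, and the limit is passed as a plain integer rather than read from the real configuration.

```go
package main

import "fmt"

// TimeSeries is a hypothetical placeholder for one time series
// waiting to be flushed.
type TimeSeries struct{ Name string }

// splitIntoRequests splits series into consecutive chunks holding at
// most maxPerRequest items each. A non-positive limit means no splitting.
func splitIntoRequests(series []TimeSeries, maxPerRequest int) [][]TimeSeries {
	if len(series) == 0 {
		return nil
	}
	if maxPerRequest <= 0 {
		return [][]TimeSeries{series}
	}
	chunks := make([][]TimeSeries, 0, (len(series)+maxPerRequest-1)/maxPerRequest)
	for start := 0; start < len(series); start += maxPerRequest {
		end := start + maxPerRequest
		if end > len(series) {
			end = len(series)
		}
		// Sub-slices share the backing array, so nothing is copied.
		chunks = append(chunks, series[start:end])
	}
	return chunks
}

func main() {
	series := []TimeSeries{{"a"}, {"b"}, {"c"}, {"d"}, {"e"}}
	for i, c := range splitIntoRequests(series, 2) {
		fmt.Printf("request %d: %d time series\n", i+1, len(c))
	}
}
```

Each chunk would then be sent as its own request, which keeps any single payload bounded regardless of how many time series a flush collects.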

Acceptance criteria

Change the Cloud output default version to 2.

Worklog

Must have

Nice to have (in case we need to reduce the scope)

output/cloudv2: Unlimited size for body payload #3120
output/cloudv2: Use a static remote service url #3125
TestOutputFlush always enabled #3146

cloudv2: s/ReferenceID/TestRunID/ #3137
Snappy-Framing support #3122

Reevaluate the current periodic and abort signal architecture/interaction (output/cloudv2: Error handling for flush #3082 (comment), output/cloudv2: Retry and flushers pool #3104 (comment))
Unexport all the structs/methods/fields that do not need to be exported

codebien · 2023-08-08T13:12:04Z

Most of the critical work expected here has been merged and will be released in v0.46.0. The remaining work will continue in #3258.

codebien added this to the v0.46.0 milestone Jun 6, 2023

codebien self-assigned this Jun 6, 2023

This was referenced Jun 8, 2023

cloud: Binary-based ingestion #2954

Closed

output/cloudv2: Flush chunks #3108

Merged

olegbespalov self-assigned this Jun 14, 2023

mstoykov mentioned this issue Jun 15, 2023

output/cloudv2: Retry and flushers pool #3104

Merged

codebien mentioned this issue Jun 27, 2023

cloudv2: Not replicate common fields #3144

Merged

olegbespalov mentioned this issue Jun 27, 2023

TestOutputFlush always enabled #3146

Merged

5 tasks

This was referenced Jun 29, 2023

output/cloudv2: Add retry support to insights client #3150

Merged

output/cloudv2: Compact histogram #3169

Merged

olegbespalov mentioned this issue Jul 6, 2023

chore: add MaxTimeSeriesInBatch to the printable config #3172

Merged

5 tasks

codebien changed the title ~~Cloud output v2 as default~~ Cloud output v2 Aug 8, 2023

codebien mentioned this issue Aug 8, 2023

cloud: Set v2 as the default output #3258

Closed

codebien closed this as completed Aug 8, 2023
