pkg/manifests: Configure remote write more conservatively #630
Conversation
Ideally, testing and tweaking this before we enable remote write would be good, no? On something that is not just a cluster-bot.
// buffer before waiting for samples to be sent successfully
// and then continuing to read from the WAL.
Capacity: 30000,
// Should we accumulate 10000 samples before the batch send
You mean less than 1000 samples?
The comment refers to the MaxSamplesPerSend, which is set to 10000.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: brancz, s-urbaniak. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest Please review the full test history for this PR and help us cut down flakes.
8 similar comments
The generation step is failing. Putting this on hold.
/hold
MinBackoff: "1s",
// 128s is the 8th backoff in a row, once we end up here, we
// don't increase backoff time anymore. As we would at most
// produce (concurrency/256) number of requests per second.
These comments are very helpful for uninitiated readers 👍
A couple of questions that may be worth clarifying:
- 256s is more than 1m. May multiple backing-off batches overlap, or do they block later batches until success / give-up?
- Does "we don't increase backoff time anymore" mean retries continue indefinitely at 256s intervals, or is the batch given up after 8 failures?
- They have to be consecutive, so only the concurrency factor plays a role here.
- It retries indefinitely at 256s intervals and stops after the WAL is cut, which happens every 2 hours. Then the mechanism starts tailing the "new" WAL.
/hold cancel
/retest Please review the full test history for this PR and help us cut down flakes.
2 similar comments
@brancz you need a bugzilla for this now, just create one with "Alerts for remote write firing" 🙄
It’s ok, this is fine to wait until 4.5 master opens again.
/retest Please review the full test history for this PR and help us cut down flakes.
/retest Please review the full test history for this PR and help us cut down flakes.
2 similar comments
Note this PR does not re-enable telemetry by default; it merely fine-tunes the remote write config when remote write is enabled. With this config we neither require resharding, nor does remote write fall behind trying to replicate the data to the remote, which did happen with the previous configuration. Additionally, this drives the request rate per Prometheus down to ~4 req/s.
@lilic @s-urbaniak @paulfantom @simonpasquier @pgier