
pkg/manifests: Configure remote write more conservatively #630

Merged
merged 1 commit into openshift:master on Feb 19, 2020

Conversation

brancz
Contributor

@brancz brancz commented Jan 24, 2020

Note that this PR does not re-enable telemetry by default; it merely fine-tunes the remote write config for when telemetry is enabled. With this config we no longer require resharding, and remote write no longer falls behind while trying to replicate the data to the remote, as happened with the previous configuration. Additionally, this drives the request rate per Prometheus down to ~4 req/s.

@lilic @s-urbaniak @paulfantom @simonpasquier @pgier

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jan 24, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 24, 2020
Contributor

@lilic lilic left a comment

Ideally we would test and tweak this before we enable remote write, no? And on something that is not just a cluster-bot.

// buffer before waiting for samples to be sent successfully
// and then continuing to read from the WAL.
Capacity: 30000,
// Should we accumulate 10000 samples before the batch send
Contributor

You mean less than 1000 samples?

Contributor Author

The comment refers to the MaxSamplesPerSend, which is set to 10000.

@s-urbaniak
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 27, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brancz, s-urbaniak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

8 similar comments

@paulfantom
Contributor

The generation step is failing. Putting this on hold

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 27, 2020
MinBackoff: "1s",
// 128s is the 8th backoff in a row, once we end up here, we
// don't increase backoff time anymore. As we would at most
// produce (concurrency/256) number of requests per second.

These comments are very helpful for uninitiated readers 👍
A couple of questions that may be worth clarifying:

  • 256s is more than 1m. Can multiple backing-off batches overlap, or do they block later batches until they succeed or give up?
  • "we don't increase backoff time anymore": does that mean retries continue indefinitely at 256s intervals? Or is the batch given up after 8 failures?

Contributor Author

  1. They have to be consecutive, so only the concurrency factor plays a role here.
  2. It retries indefinitely at 256s intervals and stops once the WAL is cut, which happens every 2 hours. Then the mechanism starts tailing the “new” WAL.

@s-urbaniak
Contributor

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 31, 2020
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar comments

@lilic
Contributor

lilic commented Jan 31, 2020

@brancz you need a Bugzilla bug for this now; just create one with "Alerts for remote write firing" 🙄

@brancz
Contributor Author

brancz commented Jan 31, 2020

It’s OK, this can wait until 4.5 master opens again.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar comments

@openshift-merge-robot openshift-merge-robot merged commit 3f72900 into openshift:master Feb 19, 2020
@brancz brancz deleted the improve-rw-config branch February 19, 2020 06:57
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
8 participants