-
Notifications
You must be signed in to change notification settings - Fork 11.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unified Alerting: Set max_attempts
to 1 by default
#79095
Conversation
The retry logic for unified alerting has been broken as far as v9.4.x, rather than fixing it in one go and causing a headache to our users with rules putting extra load on their datasources - I think a better approach is to simply set 1 as a default and then let our users change it. I see two cons with this approach: - Configuration for legacy to unified alerting cannot be ported over automatically, users will have to manually set `max_attempts` to 3 when migrating. - Users expecting to get any sort of retrying (as with legacy alerting) will not have it out of the box and will have to manually edit the configuration. Signed-off-by: gotjosh <josue.abreu@gmail.com>
LGTM as it appears this config option has no effect currently. |
Signed-off-by: gotjosh <josue.abreu@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
We discussed this offline extensively and all agreed that it was a fine change to merge. Please find the conversation in: https://raintank-corp.slack.com/archives/C028MCV4R7C/p1701445758479449 |
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-79095-to-v9.4.x origin/v9.4.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x 0c9356a3c78b83b8bb5abf99263053b10de84d48 When the conflicts are resolved, stage and commit the changes:
If you have the GitHub CLI installed: # Push the branch to GitHub:
git push --set-upstream origin backport-79095-to-v9.4.x
# Create the PR body template
PR_BODY=$(gh pr view 79095 --json body --template 'Backport 0c9356a3c78b83b8bb5abf99263053b10de84d48 from #79095{{ "\n\n---\n\n" }}{{ index . "body" }}')
# Create the PR on GitHub
echo "${PR_BODY}" | gh pr create --title "[v9.4.x] Unified Alerting: Set `max_attempts` to 1 by default" --body-file - --label "type/docs" --label "area/backend" --label "add to changelog" --label "product-approved" --label "backport" --base v9.4.x --milestone 9.4.x --web Or, if you don't have the GitHub CLI installed (we recommend you install it!): # Push the branch to GitHub:
git push --set-upstream origin backport-79095-to-v9.4.x
# Create a pull request where the `base` branch is `v9.4.x` and the `compare`/`head` branch is `backport-79095-to-v9.4.x`.
# Remove the local backport branch
git switch main
git branch -D backport-79095-to-v9.4.x |
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-79095-to-v9.5.x origin/v9.5.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x 0c9356a3c78b83b8bb5abf99263053b10de84d48 When the conflicts are resolved, stage and commit the changes:
If you have the GitHub CLI installed: # Push the branch to GitHub:
git push --set-upstream origin backport-79095-to-v9.5.x
# Create the PR body template
PR_BODY=$(gh pr view 79095 --json body --template 'Backport 0c9356a3c78b83b8bb5abf99263053b10de84d48 from #79095{{ "\n\n---\n\n" }}{{ index . "body" }}')
# Create the PR on GitHub
echo "${PR_BODY}" | gh pr create --title "[v9.5.x] Unified Alerting: Set `max_attempts` to 1 by default" --body-file - --label "type/docs" --label "area/backend" --label "add to changelog" --label "product-approved" --label "backport" --base v9.5.x --milestone 9.5.x --web Or, if you don't have the GitHub CLI installed (we recommend you install it!): # Push the branch to GitHub:
git push --set-upstream origin backport-79095-to-v9.5.x
# Create a pull request where the `base` branch is `v9.5.x` and the `compare`/`head` branch is `backport-79095-to-v9.5.x`.
# Remove the local backport branch
git switch main
git branch -D backport-79095-to-v9.5.x |
Unified Alerting: Set `max_attempts` to 1 by default (#79095) * Unified Alerting: Set `max_attempts` to 1 by default The retry logic for unified alerting has been broken as far as v9.4.x, rather than fixing it in one go and causing a headache to our users with rules putting extra load on their datasources - I think a better approach is to simply set 1 as a default and then let our users change it. I see two cons with this approach: - Configuration for legacy to unified alerting cannot be ported over automatically, users will have to manually set `max_attempts` to 3 when migrating. - Users expecting to get any sort of retrying (as with legacy alerting) will not have it out of the box and will have to manually edit the configuration. Signed-off-by: gotjosh <josue.abreu@gmail.com> --------- Signed-off-by: gotjosh <josue.abreu@gmail.com> (cherry picked from commit 0c9356a) Co-authored-by: gotjosh <josue.abreu@gmail.com>
Unified Alerting: Set `max_attempts` to 1 by default (#79095) * Unified Alerting: Set `max_attempts` to 1 by default The retry logic for unified alerting has been broken as far as v9.4.x, rather than fixing it in one go and causing a headache to our users with rules putting extra load on their datasources - I think a better approach is to simply set 1 as a default and then let our users change it. I see two cons with this approach: - Configuration for legacy to unified alerting cannot be ported over automatically, users will have to manually set `max_attempts` to 3 when migrating. - Users expecting to get any sort of retrying (as with legacy alerting) will not have it out of the box and will have to manually edit the configuration. Signed-off-by: gotjosh <josue.abreu@gmail.com> --------- Signed-off-by: gotjosh <josue.abreu@gmail.com> (cherry picked from commit 0c9356a) Co-authored-by: gotjosh <josue.abreu@gmail.com>
Unified Alerting: Set `max_attempts` to 1 by default (#79095) * Unified Alerting: Set `max_attempts` to 1 by default The retry logic for unified alerting has been broken as far as v9.4.x, rather than fixing it in one go and causing a headache to our users with rules putting extra load on their datasources - I think a better approach is to simply set 1 as a default and then let our users change it. I see two cons with this approach: - Configuration for legacy to unified alerting cannot be ported over automatically, users will have to manually set `max_attempts` to 3 when migrating. - Users expecting to get any sort of retrying (as with legacy alerting) will not have it out of the box and will have to manually edit the configuration. Signed-off-by: gotjosh <josue.abreu@gmail.com> --------- Signed-off-by: gotjosh <josue.abreu@gmail.com> (cherry picked from commit 0c9356a) Co-authored-by: gotjosh <josue.abreu@gmail.com>
Unified Alerting: Set `max_attempts` to 1 by default (#79095) * Unified Alerting: Set `max_attempts` to 1 by default The retry logic for unified alerting has been broken as far as v9.4.x, rather than fixing it in one go and causing a headache to our users with rules putting extra load on their datasources - I think a better approach is to simply set 1 as a default and then let our users change it. I see two cons with this approach: - Configuration for legacy to unified alerting cannot be ported over automatically, users will have to manually set `max_attempts` to 3 when migrating. - Users expecting to get any sort of retrying (as with legacy alerting) will not have it out of the box and will have to manually edit the configuration. Signed-off-by: gotjosh <josue.abreu@gmail.com> --------- Signed-off-by: gotjosh <josue.abreu@gmail.com> (cherry picked from commit 0c9356a)
Unified Alerting: Set `max_attempts` to 1 by default (#79095) * Unified Alerting: Set `max_attempts` to 1 by default The retry logic for unified alerting has been broken as far as v9.4.x, rather than fixing it in one go and causing a headache to our users with rules putting extra load on their datasources - I think a better approach is to simply set 1 as a default and then let our users change it. I see two cons with this approach: - Configuration for legacy to unified alerting cannot be ported over automatically, users will have to manually set `max_attempts` to 3 when migrating. - Users expecting to get any sort of retrying (as with legacy alerting) will not have it out of the box and will have to manually edit the configuration. Signed-off-by: gotjosh <josue.abreu@gmail.com> --------- Signed-off-by: gotjosh <josue.abreu@gmail.com> (cherry picked from commit 0c9356a)
Retrying has been broken for a good while now (at least since version 9.4) - this change attempts to re-introduce them in their simplest and safest form possible. I first introduced #79095 to make sure we don't disrupt or put additional load on our customer's data sources with this change in a patch release. Paired with this change, retries can now work as expected. There's two small differences between how retries work now and how they used to work in legacy alerting. Retries only occur for valid alert definitions - if we suspect that that error comes from a malformed alert definition we skip retrying. We have added a constant backoff of 1s in between retries. Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Alerting: Attempt to retry retryable errors Retrying has been broken for a good while now (at least since version 9.4) - this change attempts to re-introduce them in their simplest and safest form possible. I first introduced #79095 to make sure we don't disrupt or put additional load on our customer's data sources with this change in a patch release. Paired with this change, retries can now work as expected. There's two small differences between how retries work now and how they used to work in legacy alerting. Retries only occur for valid alert definitions - if we suspect that that error comes from a malformed alert definition we skip retrying. We have added a constant backoff of 1s in between retries. --------- Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Alerting: Attempt to retry retryable errors Retrying has been broken for a good while now (at least since version 9.4) - this change attempts to re-introduce them in their simplest and safest form possible. I first introduced #79095 to make sure we don't disrupt or put additional load on our customer's data sources with this change in a patch release. Paired with this change, retries can now work as expected. There's two small differences between how retries work now and how they used to work in legacy alerting. Retries only occur for valid alert definitions - if we suspect that that error comes from a malformed alert definition we skip retrying. We have added a constant backoff of 1s in between retries. --------- Signed-off-by: gotjosh <josue.abreu@gmail.com> (cherry picked from commit c631261)
* Alerting: Attempt to retry retryable errors Retrying has been broken for a good while now (at least since version 9.4) - this change attempts to re-introduce them in their simplest and safest form possible. I first introduced #79095 to make sure we don't disrupt or put additional load on our customer's data sources with this change in a patch release. Paired with this change, retries can now work as expected. There's two small differences between how retries work now and how they used to work in legacy alerting. Retries only occur for valid alert definitions - if we suspect that that error comes from a malformed alert definition we skip retrying. We have added a constant backoff of 1s in between retries. --------- Signed-off-by: gotjosh <josue.abreu@gmail.com> (cherry picked from commit c631261)
* Alerting: Attempt to retry retryable errors Retrying has been broken for a good while now (at least since version 9.4) - this change attempts to re-introduce them in their simplest and safest form possible. I first introduced #79095 to make sure we don't disrupt or put additional load on our customer's data sources with this change in a patch release. Paired with this change, retries can now work as expected. There's two small differences between how retries work now and how they used to work in legacy alerting. Retries only occur for valid alert definitions - if we suspect that that error comes from a malformed alert definition we skip retrying. We have added a constant backoff of 1s in between retries. --------- Signed-off-by: gotjosh <josue.abreu@gmail.com> (cherry picked from commit c631261) Signed-off-by: gotjosh <josue.abreu@gmail.com>
Alerting: Attempt to retry retryable errors (#79161) * Alerting: Attempt to retry retryable errors Retrying has been broken for a good while now (at least since version 9.4) - this change attempts to re-introduce them in their simplest and safest form possible. I first introduced #79095 to make sure we don't disrupt or put additional load on our customer's data sources with this change in a patch release. Paired with this change, retries can now work as expected. There's two small differences between how retries work now and how they used to work in legacy alerting. Retries only occur for valid alert definitions - if we suspect that that error comes from a malformed alert definition we skip retrying. We have added a constant backoff of 1s in between retries. --------- (cherry picked from commit c631261) Signed-off-by: gotjosh <josue.abreu@gmail.com>
Alerting: Attempt to retry retryable errors (#79161) * Alerting: Attempt to retry retryable errors Retrying has been broken for a good while now (at least since version 9.4) - this change attempts to re-introduce them in their simplest and safest form possible. I first introduced #79095 to make sure we don't disrupt or put additional load on our customer's data sources with this change in a patch release. Paired with this change, retries can now work as expected. There's two small differences between how retries work now and how they used to work in legacy alerting. Retries only occur for valid alert definitions - if we suspect that that error comes from a malformed alert definition we skip retrying. We have added a constant backoff of 1s in between retries. --------- Signed-off-by: gotjosh <josue.abreu@gmail.com> (cherry picked from commit c631261) Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Alerting: Attempt to retry retryable errors Retrying has been broken for a good while now (at least since version 9.4) - this change attempts to re-introduce them in their simplest and safest form possible. I first introduced #79095 to make sure we don't disrupt or put additional load on our customer's data sources with this change in a patch release. Paired with this change, retries can now work as expected. There's two small differences between how retries work now and how they used to work in legacy alerting. Retries only occur for valid alert definitions - if we suspect that that error comes from a malformed alert definition we skip retrying. We have added a constant backoff of 1s in between retries. --------- Signed-off-by: gotjosh <josue.abreu@gmail.com> (cherry picked from commit c631261)
* Alerting: Attempt to retry retryable errors Retrying has been broken for a good while now (at least since version 9.4) - this change attempts to re-introduce them in their simplest and safest form possible. I first introduced #79095 to make sure we don't disrupt or put additional load on our customer's data sources with this change in a patch release. Paired with this change, retries can now work as expected. There's two small differences between how retries work now and how they used to work in legacy alerting. Retries only occur for valid alert definitions - if we suspect that that error comes from a malformed alert definition we skip retrying. We have added a constant backoff of 1s in between retries. --------- Signed-off-by: gotjosh <josue.abreu@gmail.com> (cherry picked from commit c631261) Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Alerting: Attempt to retry retryable errors Retrying has been broken for a good while now (at least since version 9.4) - this change attempts to re-introduce them in their simplest and safest form possible. I first introduced #79095 to make sure we don't disrupt or put additional load on our customer's data sources with this change in a patch release. Paired with this change, retries can now work as expected. There's two small differences between how retries work now and how they used to work in legacy alerting. Retries only occur for valid alert definitions - if we suspect that that error comes from a malformed alert definition we skip retrying. We have added a constant backoff of 1s in between retries. --------- Signed-off-by: gotjosh <josue.abreu@gmail.com> (cherry picked from commit c631261)
* Alerting: Attempt to retry retryable errors Retrying has been broken for a good while now (at least since version 9.4) - this change attempts to re-introduce them in their simplest and safest form possible. I first introduced #79095 to make sure we don't disrupt or put additional load on our customer's data sources with this change in a patch release. Paired with this change, retries can now work as expected. There's two small differences between how retries work now and how they used to work in legacy alerting. Retries only occur for valid alert definitions - if we suspect that that error comes from a malformed alert definition we skip retrying. We have added a constant backoff of 1s in between retries. --------- Signed-off-by: gotjosh <josue.abreu@gmail.com> (cherry picked from commit c631261)
Alerting: Attempt to retry retryable errors (#79161) * Alerting: Attempt to retry retryable errors Retrying has been broken for a good while now (at least since version 9.4) - this change attempts to re-introduce them in their simplest and safest form possible. I first introduced #79095 to make sure we don't disrupt or put additional load on our customer's data sources with this change in a patch release. Paired with this change, retries can now work as expected. There's two small differences between how retries work now and how they used to work in legacy alerting. Retries only occur for valid alert definitions - if we suspect that that error comes from a malformed alert definition we skip retrying. We have added a constant backoff of 1s in between retries. --------- Signed-off-by: gotjosh <josue.abreu@gmail.com> (cherry picked from commit c631261)
Alerting: Attempt to retry retryable errors (#79161) * Alerting: Attempt to retry retryable errors Retrying has been broken for a good while now (at least since version 9.4) - this change attempts to re-introduce them in their simplest and safest form possible. I first introduced #79095 to make sure we don't disrupt or put additional load on our customer's data sources with this change in a patch release. Paired with this change, retries can now work as expected. There's two small differences between how retries work now and how they used to work in legacy alerting. Retries only occur for valid alert definitions - if we suspect that that error comes from a malformed alert definition we skip retrying. We have added a constant backoff of 1s in between retries. --------- (cherry picked from commit c631261) Signed-off-by: gotjosh <josue.abreu@gmail.com>
Alerting: Attempt to retry retryable errors (#79161) * Alerting: Attempt to retry retryable errors Retrying has been broken for a good while now (at least since version 9.4) - this change attempts to re-introduce them in their simplest and safest form possible. I first introduced #79095 to make sure we don't disrupt or put additional load on our customer's data sources with this change in a patch release. Paired with this change, retries can now work as expected. There's two small differences between how retries work now and how they used to work in legacy alerting. Retries only occur for valid alert definitions - if we suspect that that error comes from a malformed alert definition we skip retrying. We have added a constant backoff of 1s in between retries. --------- Signed-off-by: gotjosh <josue.abreu@gmail.com> (cherry picked from commit c631261)
The retry logic for unified alerting has been broken as far as v9.4.x, rather than fixing it in one go and causing a headache to our users with rules putting extra load on their datasources - I think a better approach is to simply set 1 as a default and then let our users change it.
I see two cons with this approach:
max_attempts
to 3 when migrating.I think this is the best compromise we can make with the introduction of #79037
Signed-off-by: gotjosh josue.abreu@gmail.com