slack action not reporting rate-limiting as it should #44127
Pinging @elastic/kibana-stack-services
I created a better always-firing alert that provides a counter, to see if we lose any messages during the rate-limiting process. Creating it looks like this:

```
$ kbn-alert create .always-firing 1s {} "{group:default id:'c4bee054-f51e-4ef1-88bd-ac010e720097' params:{message: 'via alert; {{context.date}} {{context.count}}'}}"
```

In slack, I saw the following:
So we lost messages 13-16 during the rate-limiting; there were also 4 server log messages as shown ^^^. So it seems like the right message got printed, probably from the slack action type, but it didn't actually go into rate-limiting mode ...
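For reference, a rate-limited Slack webhook response is an HTTP 429 with a `Retry-After` header giving the wait in seconds. Below is a minimal sketch of how an executor could turn that into a retry signal - the function name and result shape are assumptions for illustration, not Kibana's actual actions API:

```typescript
// Sketch only: hypothetical executor logic, not the real .slack action type.
import axios, { AxiosError } from 'axios';

interface ExecResult {
  status: 'ok' | 'error';
  message?: string;
  retry?: Date; // hypothetical: "retry no earlier than this time"
}

async function postToSlack(webhookUrl: string, text: string): Promise<ExecResult> {
  try {
    await axios.post(webhookUrl, { text });
    return { status: 'ok' };
  } catch (err) {
    const res = (err as AxiosError).response;
    if (res?.status === 429) {
      // Slack sends Retry-After in seconds; the 60s fallback here is an
      // assumption, not documented Slack behavior.
      const seconds = Number(res.headers['retry-after']) || 60;
      const retryAt = new Date(Date.now() + seconds * 1000);
      return {
        status: 'error',
        message: `rate limited by slack, retry at ${retryAt.toISOString()}`,
        retry: retryAt,
      };
    }
    return { status: 'error', message: `slack request failed: ${(err as Error).message}` };
  }
}
```

With the counter in the alert message, a gap like the 13-16 one above shows exactly which executions were dropped instead of retried.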
Pinging @elastic/kibana-reporting-services (Team:Reporting Services)
Thinking about this for 7.7. Without the event log, customers have no way to notice that rate limiting is happening, so I'm not sure it's all that important to fix this. But I want to retry on the latest code. For customers, once you get rate-limited by a webhook, the channel it's posting to seems to add a message indicating that too many messages are being sent. So even if we can't be precise here, at least customers should get that feedback.
huh, I think maybe this is fixed. And I noticed this in the Kibana log:

But I also noticed this, indicating some cleanup is required in the event log:
Actually, now that I look at that event log entry above, it seems like it's coming from the action, so it's probably just SAYING it wants to retry. I still need to figure out whether it's actually doing the retry or not. I think I can use the slack server simulator in FTS to help diagnose this; it will be easier to force a 429 there than with the real server.
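A forced-429 setup doesn't need the real server at all. Here's a minimal sketch of a stand-in webhook that always rate-limits - plain Node with an arbitrary port, not the actual FTS simulator:

```typescript
// Sketch: a stand-in Slack webhook endpoint that always answers 429.
// Point the .slack action's webhookUrl at http://localhost:3999/slack
// to exercise the rate-limiting path without hitting real Slack servers.
import { createServer } from 'http';

createServer((_req, res) => {
  res.writeHead(429, {
    'Content-Type': 'text/plain',
    'Retry-After': '2', // seconds; same header real Slack uses
  });
  res.end('rate_limited');
}).listen(3999, () => {
  console.log('slack 429 simulator listening on http://localhost:3999');
});
```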
Pinging @elastic/kibana-app-services (Team:AppServices)
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)
Closing this issue - we haven't had any reports of this being a problem, and it's not even clear how involved it might be to make it work the way we wanted (if it doesn't already), as task manager has changed so much since this was first opened. We're also looking at changing action execution to happen within the alert task execution, instead of as separate tasks - see issue #90888. Assuming we go down that route, managing retries will become a lot harder - presumably we'd schedule the retries as new one-off tasks? For now, per team discussion, we'll shelve that thought until we see some actual requirements for it.
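For what it's worth, the "retry as a new one-off task" idea can be sketched in a few lines - everything below is hypothetical, and none of these names are real task manager APIs:

```typescript
// Hypothetical sketch of retrying a rate-limited action as a one-off task:
// rather than blocking the alert task, enqueue a task that runs once at
// the retry date the webhook gave us.
interface OneOffTask {
  taskType: string;
  runAt: Date; // when the retry should fire
  params: Record<string, unknown>;
}

async function scheduleActionRetry(
  enqueue: (task: OneOffTask) => Promise<void>, // stand-in for task manager
  actionId: string,
  retryAt: Date
): Promise<void> {
  await enqueue({
    taskType: 'actions:retry-execution', // hypothetical task type
    runAt: retryAt,
    params: { actionId },
  });
}
```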
I've set up a scenario to test what happens when slack returns a rate-limiting response from an action.

setup: a `.slack` action type, with a secret of `webhookUrl` (value being the entire URL).

Assuming your webhook url is available in the env var `SLACK_WEBHOOKURL`, these are commands to create the action and alert, and then run autocannon:

When run, the following message is printed in the Kibana console:
It's partially correct - it got a retry date! The problem is the end of the message: it should indicate that the request was rate-limited, rather than an http protocol error.
Or perhaps something more nefarious is going on, but I doubt it, as the case where a retry date is returned is pretty constrained ...
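The message bug reads like a classification problem: the rate-limit case should be checked before falling back to the generic protocol error. An illustrative sketch - the function and its inputs are assumptions, not the real executor's code:

```typescript
// Sketch: pick the error description from the most specific case first.
// `statusCode` and `retryAt` stand in for whatever the real executor
// extracts from the failed webhook request.
function describeFailure(statusCode?: number, retryAt?: Date): string {
  if (statusCode === 429 || retryAt != null) {
    // If we parsed a retry date, the request was rate-limited by definition,
    // so this branch must win over the generic protocol-error fallback.
    const when = retryAt ? `, retry at ${retryAt.toISOString()}` : '';
    return `slack request rate limited${when}`;
  }
  if (statusCode != null) {
    return `slack request returned http status ${statusCode}`;
  }
  return 'slack request failed: http protocol error';
}
```

With that ordering, the case described above (retry date present) would log a rate-limited message instead of an http protocol error.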