alerts fail to evaluate queries and expressions #173

Closed
Tracked by #192
mhristof opened this issue May 16, 2023 · 21 comments

@mhristof

What happened:
When trying to preview an alert, I'm getting the following error:

Failed to evaluate queries and expressions: [plugin.downstreamError] failed to query data: Failed to query data: rpc error: code = Unknown desc = error while Decoding to MultiSearchResponse: EOF
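For context on the `EOF` suffix: Go's `encoding/json` stream decoder returns `io.EOF` when the body it reads contains no bytes at all, so this message is consistent with an empty response reaching the decoder. The sketch below only illustrates that behaviour. `multiSearchResponse`, `decodeMSearch`, `decodeString`, and `isEmptyBody` are hypothetical names, not the plugin's actual code, and the wrapping message is copied from the error above purely for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// multiSearchResponse is a hypothetical stand-in for the plugin's response type.
type multiSearchResponse struct {
	Responses []json.RawMessage `json:"responses"`
}

// decodeMSearch decodes a response body with a streaming JSON decoder and
// wraps any failure so the underlying cause can still be inspected.
func decodeMSearch(body io.Reader) (*multiSearchResponse, error) {
	var out multiSearchResponse
	if err := json.NewDecoder(body).Decode(&out); err != nil {
		return nil, fmt.Errorf("error while Decoding to MultiSearchResponse: %w", err)
	}
	return &out, nil
}

// decodeString is a small convenience wrapper for trying string inputs.
func decodeString(s string) error {
	_, err := decodeMSearch(strings.NewReader(s))
	return err
}

// isEmptyBody reports whether decoding failed because the body had no bytes.
// A body that is cut off midway yields io.ErrUnexpectedEOF instead.
func isEmptyBody(err error) bool {
	return errors.Is(err, io.EOF)
}

func main() {
	err := decodeString("")
	fmt.Println(err)              // error while Decoding to MultiSearchResponse: EOF
	fmt.Println(isEmptyBody(err)) // true
}
```

If the cause is an empty body, the thing to look at is whatever sits between Grafana and the cluster (client setup, proxy, authentication), not the query itself.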

image
Screenshot 2023-05-16 at 14 49 51

What you expected to happen:
The alert to preview the values

How to reproduce it (as minimally and precisely as possible):
Create an alert as above and try to preview the status

Anything else we need to know?:

Environment:

  • Grafana version: v9.5.2 (cfcea75916)
  • Plugin version: 2.4.1
@mhristof added the datasource/OpenSearch and type/bug labels on May 16, 2023
@sarahzinger
Member

Hi @mhristof thanks for opening this issue!

I'm having some trouble recreating this but I think my test dataset is just different and it's hard to match the query to catch this edge case. Would you mind showing us a screenshot of what the query results look like in table format when on the query editor/explore page with any sensitive data removed so we might try to recreate this on our end?

Thanks so much!

@mhristof
Author

mhristof commented May 16, 2023

I think you mean this:

image

The query returns lots of empty messages and there is a message with the actual value

image

@sarahzinger
Member

@mhristof that's very helpful thanks! Still haven't quite recreated it yet, but I wonder if the issue is related to the missing values.

I see we have this additional option for missing values where they can be set to a specific value such as zero:
Screenshot 2023-05-17 at 10 37 17 AM

Does altering that number help?

@sarahzinger
Member

It's not the same error message as what you're seeing but I am seeing inconsistent data in general between alerts and queries. Made another issue to track that here: #176

@mhristof
Author

mhristof commented May 17, 2023

Setting missing to 0 causes a Trying to create too many buckets error.

data:"{
    \"root_cause\": [],
    \"type\": \"search_phase_execution_exception\",
    \"reason\": \"\",
    \"phase\": \"fetch\",
    \"grouped\": true,
    \"failed_shards\": [],
    \"caused_by\": {
        \"type\": \"too_many_buckets_exception\",
        \"reason\": \"Trying to create too many buckets. Must be less than or equal to: [65535] but was [66286]. This limit can be set by changing the [search.max_buckets] cluster level setting.\",
        \"max_buckets\": 65535
    }
}"
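The payload above reports 66286 buckets against a limit of 65535. As a rough sketch of how such a response could be recognised programmatically, the Go snippet below unmarshals an unescaped, abbreviated copy of that payload and reads the limit from `caused_by.max_buckets`. The `searchError` type and `bucketLimit` function are hypothetical names introduced for illustration, not part of the plugin.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchError mirrors only the fields of the error payload used here.
type searchError struct {
	Type     string `json:"type"`
	Phase    string `json:"phase"`
	CausedBy struct {
		Type       string `json:"type"`
		Reason     string `json:"reason"`
		MaxBuckets int    `json:"max_buckets"`
	} `json:"caused_by"`
}

// bucketLimit returns the reported bucket limit and true when the payload
// describes a too_many_buckets_exception; otherwise it returns 0 and false.
func bucketLimit(payload string) (int, bool) {
	var e searchError
	if err := json.Unmarshal([]byte(payload), &e); err != nil {
		return 0, false
	}
	if e.CausedBy.Type != "too_many_buckets_exception" {
		return 0, false
	}
	return e.CausedBy.MaxBuckets, true
}

// sample is an unescaped, abbreviated copy of the payload quoted above.
const sample = `{
    "type": "search_phase_execution_exception",
    "phase": "fetch",
    "caused_by": {
        "type": "too_many_buckets_exception",
        "reason": "Trying to create too many buckets. Must be less than or equal to: [65535] but was [66286].",
        "max_buckets": 65535
    }
}`

func main() {
	limit, ok := bucketLimit(sample)
	fmt.Println(limit, ok) // 65535 true
}
```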

Do you have any idea why we are getting the empty/null values in the results? If I use the same query in Kibana, only the non-empty values are returned (see https://stackoverflow.com/a/70439328/2599522 for example).

@sarahzinger
Member

I'm so sorry that fix didn't work for you! I think it's likely we have some kind of bug in how our plugin is handling alerting queries, particularly around nil values, but even more generally I'm seeing inconsistent results between the results in our query editor and the results in our alerts. This is likely because unlike many of our other datasource plugins, this one seems to have 2 different query paths, one for alerting and one for the query editor. In our other plugins, we tend to have the same data flow path for both, which ensures any bug fixes in one area are fixed in the other.

We're currently scoping out a refactor of this plugin to unify the experience between alerting queries and query editor queries, and it's one of our top priorities for the quarter. But unfortunately it may take us at least a few weeks before we can ship improvements, though we will do what we can to expedite this process!

I am curious: has alerting with this datasource worked well for you in the past and only recently started showing errors? I just want to rule out that this is related to some recent change we may have made on our end. I suspect it isn't, and that these issues have likely been around for a long time, but I would like to rule that out in case there's a quick fix we can get out to unblock you.

@mhristof
Author

I've only recently started to add alerts as this is a fresh grafana instance/opensearch integration.

Thanks for the updates, you've been most helpful. At the moment we can live without the alerts.

Let me know if you need testing when beta versions are out.

@sarahzinger
Member

Great thanks @mhristof will do! cc @fridgepoet and @idastambuk for your refactor project!

@maxwellvarner

I was previously able to create alerts with this plugin using the following versions.

Grafana: 9.4.2
Opensearch Plugin: 2.1.0

I recently upgraded to Opensearch Plugin 2.4.1 and Grafana 9.5.2. I upgraded the plugin first, but don't recall testing whether I could still create alerts. I can say that as of today, with both upgrades in place, I cannot create alerts and get the same error message as seen in the original post.

@fridgepoet
Member

Hey @maxwellvarner can you try plugin v2.6.1? There might be a backend client creation issue in v2.4.1 that you're running into.

@maxwellvarner

@fridgepoet thank you for that suggestion. I have just upgraded to v2.6.1 and am able to create alerts again!

Screenshot 2023-06-08 at 11 03 57 AM

@fridgepoet
Member

fridgepoet commented Jun 8, 2023

@mhristof Can you also try plugin v2.6.1? There might be a client issue in v2.4.1 that you're running into.

@pavriet-boxtal

pavriet-boxtal commented Jun 14, 2023

Hello, I have a related issue that I hope can help us get to the root of the problem.

Everything is fine while creating the alert, but a few times a day, we receive a notification for a datasourceError, and the Error annotation says [plugin.downstreamError] failed to query data: Failed to query data: rpc error: code = Unknown desc = error while Decoding to MultiSearchResponse: invalid character '}' looking for beginning of object key string.
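The wording of this message matches what Go's `encoding/json` reports when it expects an object key but finds a closing brace, for example after a trailing comma. The sketch below reproduces the message with such an input. Whether the datasource actually returned a body of that shape is not known from this thread, and `syntaxErrorMessage` is a hypothetical helper written for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// syntaxErrorMessage tries to unmarshal body as a JSON object and returns the
// decoder's message and true when the failure is a JSON syntax error.
func syntaxErrorMessage(body string) (string, bool) {
	var v map[string]interface{}
	err := json.Unmarshal([]byte(body), &v)
	var syn *json.SyntaxError
	if errors.As(err, &syn) {
		return syn.Error(), true
	}
	return "", false
}

func main() {
	// A trailing comma leaves the decoder expecting another key.
	msg, ok := syntaxErrorMessage(`{"took":1,}`)
	fmt.Println(ok)
	fmt.Println(msg)

	// A well-formed object produces no syntax error.
	_, ok = syntaxErrorMessage(`{"took":1}`)
	fmt.Println(ok)
}
```

Unlike the `EOF` case, a syntax error means the decoder did receive bytes, so logging the raw response body when this happens would show what was actually returned.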

At first we were running v2.4.1. I upgraded to 2.6.2 after seeing this thread, but nothing changed.

grafana v9.5.1 (bc353e4b2d)
opensearch-datasource 2.6.2
Datasource setup :
01

Hope this helps :)

@idastambuk
Contributor

Hi @pavriet-boxtal, thanks for reporting this. I just have a few questions:

  1. Can you tell us more about the query you are running for the alert?
  2. Does the Version field in the config form have the correct version for your OpenSearch instance? If not, can you run "Get version and Save"?

Thank you!

@pavriet-boxtal

Hi @idastambuk,

  1. Really not much to say imo
    image

In the picture below, I captured the same query (in a panel editor this time) at a moment when an error happened (blue vertical line @16:07), with the query inspect menu on the right. This is just to show that the data does not seem to be the issue.
image

  2. For the full story, we use OpenSearch 1.3 from AWS OpenSearch Service. Before upgrading to v2.6.2 of the plugin, the Version field in the data source config form was set to OpenSearch 1.x (iirc). The 'Get version and save' button was not present. We had the errors in that state of config.
    Now, with v2.6.2, I saw the 'Get version and save' button, clicked it, and it returned ES 7.10.2, as we see in the picture in my first comment. We still have the error, and we cannot change the Version field (weird behavior imo).

@fridgepoet
Member

Hi @pavriet-boxtal, thanks for all the information so far. What version of the plugin were you using when you did not have any issues?

@pavriet-boxtal

Hello @fridgepoet, we had issues with both 2.4.1 and 2.6.2. There was never a version where we had no issues :(

@idastambuk
Contributor

Hi @pavriet-boxtal, can you by any chance recall if the error for 2.4.1 was the same as in 2.6.2 (error while Decoding to MultiSearchResponse: invalid character '}' looking for beginning of object key string)?
I'm just trying to determine if both versions have the same bug, or if 2.4.1 had the error while Decoding to MultiSearchResponse: EOF and 2.6.1 is a different one. In which case we will track it in another ticket. Thanks!

@pavriet-boxtal

I searched our notification channels for EOF and found this (URL anonymized for obvious reasons). I checked, and it happened twice while on 2.4.1. I also attached a picture of the alert query that generated the errors.

[plugin.downstreamError] failed to query data: Failed to query data: rpc error: code = Unknown desc = Post "https://*****.es.amazonaws.com/_msearch?max_concurrent_shard_requests=5": EOF

image

These errors are very rare compared to the invalid character '}' ones. So I'm not sure the version could have changed anything. I'll try to look through all the errors later in the day, as I completely missed the EOF ones. Maybe other kinds of errors are hidden :)

@mhristof
Author

> @mhristof Can you also try plugin v2.6.1? There might be a client issue in v2.4.1 that you're running into.

v2.6.1 made my issue go away. I can now see the data in my queries and they are not failing.

@idastambuk
Contributor

@mhristof thanks for the update! I will close this ticket as it looks like the original issue was fixed with 2.6.1.

@pavriet-boxtal I opened another issue to track the problem from your comment, you can follow the progress there and let us know if you find any other errors, as you mentioned.
