Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADX: Failed to parse KQL response with errors #508

Closed
chriswue opened this issue Nov 3, 2022 · 8 comments · Fixed by #527
Closed

ADX: Failed to parse KQL response with errors #508

chriswue opened this issue Nov 3, 2022 · 8 comments · Fixed by #527
Assignees
Labels

Comments

@chriswue
Copy link

chriswue commented Nov 3, 2022

I have an ADX database which has an external table mapped. Any query that include this external table refuses to execute with the following error:

Failed to execute query: models.TableResponse.Tables: []models.Table: models.Table.Rows: []models.Row: models.Row: decode slice: expect [ or n, but found {, error found in #10 byte of ...|,"Rows":[{"Exception|..., bigger context ...|taType":"String","ColumnType":"string"}],"Rows":[{"Exceptions":["Query execution has resulted in err|...

I have another cluster with external tables and those work fine which means the problem lies with specific tables.

The only difference between the tables that are OK and the one that doesn't work is the non-functioning one is an external SQL table while the others are storage account tables.

Querying the table from Kusto Explorer or via Azure Portal is perfectly fine.

Unfortunately the error message is heavily redacted and not very useful to track down specifics. I don't know where or how to find the full error. Would be nice if Grafana could make this available for download with a "Debug" flag or something.

I guess the problem lies with the query parsing somehow. Is there an option for "non-intelligent query execution" and bypassing the parser?

Environment:

  • Grafana version: 9.2.3
  • Plugin version: 4.1.9
  • OS Grafana is installed on: Azure App Service - Linux based
  • User OS & Browser: Windows 10 + Firefox
  • Others:
@chriswue chriswue added datasource/ADX type/bug Something isn't working labels Nov 3, 2022
@chriswue
Copy link
Author

chriswue commented Nov 3, 2022

Actually looking at it closer - the problem is probably more on the side of the return value parsing. Is there a way to get the raw return payload from grafana?

@chriswue
Copy link
Author

chriswue commented Nov 3, 2022

I've got the raw return payloads from ADX by sending the queries through curl but I cannot see any substantial difference between the two queries (the one that works and the one that doesn't)

@andresmgot
Copy link
Contributor

andresmgot commented Nov 22, 2022

hi @chriswue can you send the data you are getting back with curl? (removing any sensitive data if there is some).

From the original error:

"Rows":[{"Exceptions":["Query execution has resulted in err|...

it seems that some error might be encoded rather than returning the actual list rows so that may be the cause.

@chriswue
Copy link
Author

chriswue commented Nov 22, 2022

@andresmgot Figured it out - the problem was that the external table was mapped with AD auth - which means that ADX would pass the user token to log into SQL - (which is grafana in this case which doesn't have direct access to that database). When testing on the CLI I used my user account which has access and that works.
I now switched this to Managed Id auth - the ADX cluster system identity has access to the DB - and it works.

In the end the problem was that the KQL query resulted in a "permission denied" response but grafana failed to recognize it as such and tried to parse it as a regular response and that failed with a unhelpful error.

If grafana had an option "dump raw data source response" that would be great in helping to debug this.

The response in this case from ADX was something like:

Error Query execution has resulted in error (0x80131500): Partial query failure: 0x80131500 (message: 'Error occured when executing sql_request(). SqlRequestPlugin.Open: Failed to open connection or ExecuteReader: Login failed for user '<token-identified principal>'.  ==> ExecutePluginOperator failure: ', details: 'Source: Kusto.Plugins.Callouts [0]Kusto.Data.Exceptions.KustoServicePartialQueryFailureIllFormattedDataException: Error occured when executing sql_request(). SqlRequestPlugin.Open: Failed to open connection or ExecuteReader: Login failed for user '

Not sure why grafana didn't recognize the error response. Kusto works via HTTP API, maybe the return code was misleading?

@andresmgot
Copy link
Contributor

Since we cannot reproduce the issue, it's difficult for us to know why this was not being handled as an error. From the behavior you describe, it seems that the API is returning a 200 OK but there is an error encoded in the body.

According to the API docs, that's not something that should be happening, are you able to reproduce the issue with a curl and send us the full response so we can see what's happening?

@chriswue
Copy link
Author

@andresmgot Looks like any query that results in a cluster error should do. Here is a query that will run most clusters into a runaway query error:

let randomArray = toscalar(range n from 0 to 1000000 step 1 | summarize make_list(new_guid()));
range num from 1 to 1000000 step 1
| extend numArr = randomArray
| mv-expand numArr
| summarize avg(string_size(numArr))

Results in Grafana:

models.TableResponse.Tables: []models.Table: models.Table.Rows: []models.Row: models.Row: decode slice: expect [ or n, but found {, error found in #10 byte of ...|,"Rows":[{"Exception|..., bigger context ...|DataType":"Double","ColumnType":"real"}],"Rows":[{"Exceptions":["Query execution has exceeded the al|...

Firing off the query with curl indeed yields a 200 result with the following response:

[
  {
    "FrameType": "DataSetHeader",
    "IsProgressive": false,
    "Version": "v2.0"
  },
  {
    "FrameType": "DataTable",
    "TableId": 0,
    "TableKind": "QueryProperties",
    "TableName": "@ExtendedProperties",
    "Columns": [
      {
        "ColumnName": "TableId",
        "ColumnType": "int"
      },
      {
        "ColumnName": "Key",
        "ColumnType": "string"
      },
      {
        "ColumnName": "Value",
        "ColumnType": "dynamic"
      }
    ],
    "Rows": [
      [
        1,
        "Visualization",
        "{\"Visualization\":null,\"Title\":null,\"XColumn\":null,\"Series\":null,\"YColumns\":null,\"AnomalyColumns\":null,\"XTitle\":null,\"YTitle\":null,\"XAxis\":null,\"YAxis\":null,\"Legend\":null,\"YSplit\":null,\"Accumulate\":false,\"IsQuerySorted\":false,\"Kind\":null,\"Ymin\":\"NaN\",\"Ymax\":\"NaN\",\"Xmin\":null,\"Xmax\":null}"
      ]
    ]
  },
  {
    "FrameType": "DataTable",
    "TableId": 1,
    "TableKind": "PrimaryResult",
    "TableName": "PrimaryResult",
    "Columns": [
      {
        "ColumnName": "avg_string_size_numArr",
        "ColumnType": "real"
      }
    ],
    "Rows": [
      {
        "OneApiErrors": [
          {
            "error": {
              "code": "LimitsExceeded",
              "message": "Request is invalid and cannot be executed.",
              "@type": "Kusto.Data.Exceptions.KustoServicePartialQueryFailureLimitsExceededException",
              "@message": "Query execution has exceeded the allowed limits (80DA0001): Partial query failure: Runaway query (E_RUNAWAY_QUERY). (message: 'Aggregation over string column exceeded the memory budget of 7GB during evaluation (see https://aka.ms/kustoquerylimits): ', details: '').",
              "@context": {
                "timestamp": "2022-12-14T20:03:14.3523358Z",
                "serviceAlias": "***",
                "machineName": "KENGINE000000",
                "processName": "Kusto.WinSvc.Svc",
                "processId": 3080,
                "threadId": 4296,
                "clientRequestId": "Curl.Query;7220a387-d1cc-469b-88c4-bcd5ca5ed370",
                "activityId": "***",
                "subActivityId": "***",
                "activityType": "GW.Http.CallContext",
                "parentActivityId": "***",
                "activityStack": "***"
              },
              "@permanent": false
            }
          }
        ]
      }
    ]
  },
  {
    "FrameType": "DataSetCompletion",
    "HasErrors": true,
    "Cancelled": false,
    "OneApiErrors": [
      {
        "error": {
          "code": "LimitsExceeded",
          "message": "Request is invalid and cannot be executed.",
          "@type": "Kusto.Data.Exceptions.KustoServicePartialQueryFailureLimitsExceededException",
          "@message": "Query execution has exceeded the allowed limits (80DA0001): Partial query failure: Runaway query (E_RUNAWAY_QUERY). (message: 'Aggregation over string column exceeded the memory budget of 7GB during evaluation (see https://aka.ms/kustoquerylimits): ', details: '').",
          "@context": {
            "timestamp": "2022-12-14T20:03:14.3523358Z",
            "serviceAlias": "***",
            "machineName": "KENGINE000000",
            "processName": "Kusto.WinSvc.Svc",
            "processId": 3080,
            "threadId": 4296,
            "clientRequestId": "Curl.Query;7220a387-d1cc-469b-88c4-bcd5ca5ed370",
            "activityId": "***",
            "subActivityId": "***",
            "activityType": "GW.Http.CallContext",
            "parentActivityId": "***",
            "activityStack": "***"
          },
          "@permanent": false
        }
      }
    ]
  }
]

It's a bit questionable to return a 200 on a query that had errors. I probably would have expected a 400 here. Don't know what the rational behind this is.
Looks like you're supposed to check the "FrameType": "DataSetCompletion" and see if HasErrors is true or false.

Now that I've spent some time on this again I realize we have some work-around code in our API do deal with this exact problem (that a 200 response doesn't mean all is good).

@chriswue
Copy link
Author

chriswue commented Dec 14, 2022

Actually according to the linked docs:

The 200 status code shows that the request processing has successfully started, and not that it has successfully completed. Failures encountered during request processing after the 200 status code has returned are called "partial query failures", and when they are encountered, special indicators are injected into the response stream to alert the client that they occurred.

And also 400 is only returned for permanent query failures (for example syntax errors). So the above follows the documented behaviour.

@andresmgot
Copy link
Contributor

That's great, thanks for the details! I am able to reproduce this with the query you suggested. Let me re-open the issue so we can fix the error parsing in this case.

@andresmgot andresmgot reopened this Dec 15, 2022
@andresmgot andresmgot changed the title ADX: query fails to execute when including external_table ADX: Failed to parse KQL response with errors Dec 15, 2022
@andresmgot andresmgot self-assigned this Dec 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants