Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elasticsearch: fix support for multiple indices #71183

Closed
wants to merge 3 commits into from

Conversation

matyax
Copy link
Contributor

@matyax matyax commented Jul 7, 2023

Originally reported in an escalation, this PR fixes support for multiple configured indices, as a comma-separated list.

Which issue(s) does this PR fix?:

Fixes #71092

Special notes for your reviewer:

To verify:

  • Provision an Elasticsearch instance with multiple indices
  • Configure the data source with a comma-separated list of indices

Expected results: without the fix, no data is returned by the data source; with the code changes, it should work as usual.

@matyax matyax requested a review from a team as a code owner July 7, 2023 10:05
@matyax matyax added no-backport Skip backport of PR no-changelog Skip including change in changelog/release notes labels Jul 7, 2023
@matyax matyax added this to the 10.1.x milestone Jul 7, 2023
@matyax matyax requested a review from svennergr July 7, 2023 10:43
Copy link
Contributor

@svennergr svennergr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments, maybe we should quickly talk about that on Monday's meeting.

@@ -207,7 +207,7 @@ func (c *baseClientImpl) createMultiSearchRequests(searchRequests []*SearchReque
header: map[string]interface{}{
"search_type": "query_then_fetch",
"ignore_unavailable": true,
"index": c.indices,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something we should ideally also do is to dedupe the array. With a "daily" pattern and the index as [logs-]YYYY.MM.DD,[test],[metrics-]YYYY.MM.DD we would correctly end up in multiple [logs-]YYYY.MM.DD and [metrics-]YYYY.MM.DD indices, but test will also be duplicated, i.e. ending in a total of logs-2023.07.05,test,metrics-2023.07.05,logs-2023.07.06,test,metrics-2023.07.06,logs-2023.07.07,test,metrics-2023.07.07 for 3 days.

@@ -210,7 +211,7 @@ func TestClient_Index(t *testing.T) {
jHeader, err := simplejson.NewJson(headerBytes)
require.NoError(t, err)

assert.Equal(t, test.indexInRequest, jHeader.Get("index").MustStringArray())
assert.Equal(t, strings.Join(test.indexInRequest, ","), jHeader.Get("index").MustString())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be great if we ensure with a test that multiple indices are joined.

@@ -207,7 +207,7 @@ func (c *baseClientImpl) createMultiSearchRequests(searchRequests []*SearchReque
header: map[string]interface{}{
"search_type": "query_then_fetch",
"ignore_unavailable": true,
"index": c.indices,
"index": strings.Join(c.indices, ","),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this seems to work and I might have pointed you in a wrong direction (sorry), I am starting to be a bit unsure if that's the right approach. After reading https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html

index
(Optional, string or array of strings) Data streams, indices, and aliases to search. Supports wildcards (*). Specify multiple targets as an array.

If this parameter is not specified, the <target> request path parameter is used as a fallback.

We should either use an array here or use the correct <target> in the request path parameter. WDYT? We can also discuss with the rest of the squad next week.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely! Let's discuss this next week.

@matyax
Copy link
Contributor Author

matyax commented Jul 7, 2023

Thanks for the review @svennergr , let's discuss this more in depth.

@ivanahuckova
Copy link
Member

ivanahuckova commented Jul 10, 2023

I haven't head a detailed look, but I suspect this #67276 broke it. We decided to change comma separated list to array to be consistent with frontend. So maybe we could just revert it to fix the issue. And then we can decide if we want to refactor/change the logic.

@gabor
Copy link
Contributor

gabor commented Jul 10, 2023

i did some testing on how "frontend-mode" handles this and:

  1. for things where there is no "interpolation", for example logs, it sends it as a single-string, like:
{
    "index": "logs"
}

this also works if you put more indexes comma-separated, like logs1,logs2, it becomes:

{
    "index": "logs1,logs2"
}
  1. if you use the pattern feature, for example daily pattern, and set the index to be [logs-]YYYY.MM.DD, and query for the last 3 days, you get an array-of-strings:
{
    "index": [
        "logs-2023.07.08",
        "logs-2023.07.09",
        "logs-2023.07.10"
    ]
}
  1. if you have a pattern, and also comma-separated, it does not work, like, if i extend [2] to be [logs-]YYYY.MM.DD,logs-*, it will generate a broken index-string

@gabor
Copy link
Contributor

gabor commented Jul 10, 2023

after some thinking, i think [3] from above can be considered unsupported. i see two ways to achieve this:

  1. make it a single-string: this PR, or just revert the PR @ivanahuckova mentioned above
  2. fix the array-of-strings case, that would need a change like this:
 func (ip *staticIndexPattern) GetIndices(timeRange backend.TimeRange) ([]string, error) {
        if ip.indexName != "" {
-               return []string{ip.indexName}, nil
+               return strings.Split(ip.indexName, ","), nil
        } else {
                return []string{}, nil
        }

i think the single-string approach is more robust, WDYT? @ivanahuckova @matyax

@svennergr
Copy link
Contributor

svennergr commented Jul 10, 2023

i think the single-string approach is more robust, WDYT? @ivanahuckova @matyax

I am a bit concerned with the statement in the Elastic docs https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html

Specify multiple targets as an array.

The other fix could be to rely on this statement

If this parameter is not specified, the request path parameter is used as a fallback.

and remove the index from the headers and use it from the request path, where multiple indices can be a single string.

@matyax
Copy link
Contributor Author

matyax commented Jul 11, 2023

Thanks everyone

@matyax matyax closed this Jul 11, 2023
@ricky-undeadcoders ricky-undeadcoders modified the milestones: 10.1.x, 10.1.0 Aug 1, 2023
@zerok zerok removed this from the 10.1.0 milestone Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/backend datasource/Elasticsearch no-backport Skip backport of PR no-changelog Skip including change in changelog/release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Elasticsearch: Add backend support for multiple indexes
7 participants