_terms_enum when using alias indices #107114

Open
cauemarcondes opened this issue Apr 4, 2024 · 2 comments
Labels
>enhancement; :Search/Search (Search-related issues that do not fall into other categories); Team:Search (Meta label for search team)

Comments

@cauemarcondes
Contributor

cauemarcondes commented Apr 4, 2024

Elasticsearch Version

8.12

Installed Plugins

No response

Java Version

bundled

OS Version

darwin_x86_64

Problem Description

I'm ingesting two datasets into data streams matching metrics-apm*. The only difference between them is the service.environment field, which is either 'dev' or 'production'.

I then create an alias where I split the data based on the environment:

POST /_aliases?pretty
{
  "actions": [
    {
      "add": {
        "index": "metrics-apm*", 
        "alias": "dev-metrics-apm", 
        "filter": {
          "term": {
            "service.environment": {
              "value": "dev" 
            }
          }
        }
      }
    }
  ]
}

I confirmed that only dev environment documents are visible through the alias by running a terms aggregation on the service.environment field:

"buckets": [
  {
    "key": "dev",
    "doc_count": 19725
  }
]

But when I call the _terms_enum API with the alias as the target, both environments are returned:

POST dev-metrics-apm/_terms_enum
{
  "case_insensitive": true,
  "field": "service.environment",
  "size": 100,
  "string": "",
  "index_filter": {
    "range": {
      "@timestamp": {
        "gte": 1712241288111,
        "lte": 1712242188112,
        "format": "epoch_millis"
      }
    }
  }
}

Result:

"terms": [
    "dev",
    "production"
  ],

Shouldn't the terms enum API return only the dev environment? The target is a filtered alias, and dev is the only environment that matches its filter, so it seems logical to restrict the API response to that value.
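Since the terms aggregation on the alias already returns only dev, one possible interim workaround is to ask for the values through a regular search instead of _terms_enum. The sketch below only builds a request body; the helper names (escape_lucene_regex, build_terms_agg_body) and the aggregation name "values" are made up for illustration, and the body does not reproduce the case_insensitive behaviour of the original request.

```python
# Characters that act as operators in Lucene regular expression syntax,
# which is what the terms aggregation "include" option expects.
_LUCENE_REGEX_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regex(text):
    """Backslash-escape regex operators so the text is matched literally."""
    return "".join("\\" + ch if ch in _LUCENE_REGEX_RESERVED else ch for ch in text)


def build_terms_agg_body(field, prefix, gte_ms, lte_ms, size=100):
    """Build a _search body that lists values of `field` starting with `prefix`.

    Hypothetical helper: it goes through the normal search path, so a filtered
    alias used as the target applies its filter, unlike _terms_enum.
    """
    terms = {"field": field, "size": size}
    if prefix:
        # Prefix match expressed as a regex: <escaped prefix> followed by anything.
        terms["include"] = escape_lucene_regex(prefix) + ".*"
    return {
        "size": 0,  # no hits needed, only the aggregation
        "query": {
            "range": {
                "@timestamp": {
                    "gte": gte_ms,
                    "lte": lte_ms,
                    "format": "epoch_millis",
                }
            }
        },
        "aggs": {"values": {"terms": terms}},
    }


body = build_terms_agg_body(
    "service.environment", "", 1712241288111, 1712242188112
)
```

Sending this body to POST dev-metrics-apm/_search should put the matching values under aggregations.values.buckets. It will likely be slower than _terms_enum because it runs a real search.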

Related to: elastic/kibana#180065

Steps to Reproduce

The steps are given in the problem description above.

Logs (if relevant)

No response

@cauemarcondes added the >bug and needs:triage (Requires assignment of a team area label) labels on Apr 4, 2024
@pxsalehi added the :Search/Search label and removed the needs:triage label on Apr 4, 2024
@elasticsearchmachine added the Team:Search label on Apr 4, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@cbuescher
Member

I believe the _terms_enum API was never designed to work with filtered aliases. The alias described works through a search filter that gets attached to search calls on the alias, but the _terms_enum API doesn't use Lucene search under the hood; it reads the term dictionaries directly. That is also why, for example, you can get terms for deleted documents that haven't been merged away yet. Those approximations were made so that the API is fast enough to serve as a simple autocomplete-like feature.
While filtering using search on top of that might be possible, it would almost certainly come at a performance cost and require larger changes.
I'm re-labeling this as an enhancement because I don't think this has ever worked, for the reasons stated above.
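To illustrate the difference described above, here is a toy model in Python. It is not how Elasticsearch or Lucene is implemented; ToySegment and its methods are invented for this sketch. It only shows why enumerating a term dictionary ignores both a document filter and deletes, while a per-document search path honors them.

```python
from bisect import bisect_left


class ToySegment:
    """Toy stand-in for an index segment: documents plus a sorted term dictionary."""

    def __init__(self, docs, field):
        self.docs = docs
        self.field = field
        self.deleted = set()
        # The dictionary is built once at write time and holds every term
        # in the segment, regardless of which documents contain it.
        self.term_dict = sorted({doc[field] for doc in docs})

    def delete(self, doc_id):
        # Deleting only marks the document; the dictionary stays as it is
        # (in this toy model, until a merge that we do not simulate).
        self.deleted.add(doc_id)

    def enum_terms(self, prefix):
        """Dictionary path: seek to the prefix and walk forward. No filter applies."""
        start = bisect_left(self.term_dict, prefix)
        matches = []
        for term in self.term_dict[start:]:
            if not term.startswith(prefix):
                break
            matches.append(term)
        return matches

    def search_terms(self, prefix, doc_filter):
        """Search path: visit live documents, apply the filter, collect values."""
        return sorted(
            {
                doc[self.field]
                for doc_id, doc in enumerate(self.docs)
                if doc_id not in self.deleted
                and doc_filter(doc)
                and doc[self.field].startswith(prefix)
            }
        )


segment = ToySegment([{"env": "dev"}, {"env": "production"}], "env")
only_dev = lambda doc: doc["env"] == "dev"  # plays the role of the alias filter

dictionary_result = segment.enum_terms("")
search_result = segment.search_terms("", only_dev)

segment.delete(1)  # remove the only "production" document
after_delete = segment.enum_terms("")
```

In this model the dictionary path returns both terms before and after the delete, while the search path returns only the term that passes the filter. The search path also touches every document, which is the kind of cost the dictionary path avoids.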
