
In an alias of data streams, a filter on only one data stream ends up applying to all others #92050

Closed
mpakoupete opened this issue Dec 1, 2022 · 2 comments · Fixed by #92692
Labels
>bug :Data Management/Data streams Data streams and their lifecycles Team:Data Management Meta label for data/management team

Comments

@mpakoupete

Elasticsearch Version

8.5

Installed Plugins

No response

Java Version

bundled

OS Version

any

Problem Description

Aliases of data streams behave differently from aliases of indices.
When a filter is applied to only one data stream in an alias, it ends up applying to both data streams.

Steps to Reproduce

POST _aliases
{
    "actions": [
        {
            "add": {
                "index": "data_stream_1",
                "alias": "alias_1",
                "filter": {
                    "bool": {
                        "filter": [
                            {
                                "term": {
                                    "reaction": "report"
                                }
                            }
                        ]
                    }
                }
            }
        },
        {
            "add": {
                "index": "data_stream_2",
                "alias": "alias_1"
            }
        }
    ]
}

With the request above, I'm trying to apply a filter to only one data stream (data_stream_1) in the alias, but the filter ends up applying to both.

Result:

GET _alias/alias_1
{
  "data_stream_2": {
    "aliases": {
      "alias_1": {
        "filter": {
          "bool": {
            "filter": [
              {
                "term": {
                  "reaction": "report"
                }
              }
            ]
          }
        }
      }
    }
  },
  "data_stream_1": {
    "aliases": {
      "alias_1": {
        "filter": {
          "bool": {
            "filter": [
              {
                "term": {
                  "reaction": "report"
                }
              }
            ]
          }
        }
      }
    }
  }
}

Logs (if relevant)

No response

@mpakoupete mpakoupete added >bug needs:triage Requires assignment of a team area label labels Dec 1, 2022
@dakrone
Member

dakrone commented Dec 1, 2022

I looked into this a little bit. Here's a quick reproduction, including creating the data streams, that can be run against a cluster started with ./gradlew run:

POST /logs-foo-bar/_doc
{
  "@timestamp": "2022-01-01"
}

POST /logs-foo-baz/_doc
{
  "@timestamp": "2022-01-01"
}

POST _aliases
{
    "actions": [
        {
            "add": {
                "index": "logs-foo-bar",
                "alias": "alias_2",
                "filter": {
                    "bool": {
                        "filter": [
                            {
                                "term": {
                                    "reaction": "report"
                                }
                            }
                        ]
                    }
                }
            }
        },
        {
            "add": {
                "index": "logs-foo-baz",
                "alias": "alias_2"
            }
        }
    ]
}

GET /_alias/alias_2

The output includes the filter duplicated for both data streams, instead of only for the logs-foo-bar data stream.


Looking at the code, this behavior comes from two blocks.

The first is:

boolean filterUpdated;
CompressedXContent filter;
if (filterAsMap != null) {
    filter = compress(filterAsMap);
    if (this.filter == null) {
        filterUpdated = true;
    } else {
        filterUpdated = filterAsMap.equals(decompress(this.filter)) == false;
    }
} else {
    filter = this.filter;
    filterUpdated = false;
}

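The issue is that when the newly passed-in filter is null but the previous filter was non-null, the previous filter is carried over and filterUpdated is set to false, so a data stream added without a filter silently inherits the alias's existing filter.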
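To make the effect of that branch concrete, here is a minimal, hypothetical model of it. The class FilterUpdateSketch, the Resolved record, and the resolve method are illustrative names, not Elasticsearch code, and filters are plain Maps instead of CompressedXContent. It replays the two add actions from the reproduction: the second one carries no filter, so the first data stream's filter survives unchanged.

```java
import java.util.Map;

// Hypothetical, simplified model of the filter branch quoted above.
// Filters are plain Maps here rather than CompressedXContent.
public class FilterUpdateSketch {

    // The effective filter after an add action, plus whether it counts as an update.
    record Resolved(Map<String, Object> filter, boolean filterUpdated) {}

    static Resolved resolve(Map<String, Object> existing, Map<String, Object> incoming) {
        if (incoming != null) {
            // A new filter replaces the old one; it is an update only if it differs.
            boolean updated = existing == null || !incoming.equals(existing);
            return new Resolved(incoming, updated);
        }
        // No filter in the request: the existing alias-wide filter is kept as-is,
        // even though the caller may be adding a data stream that should be unfiltered.
        return new Resolved(existing, false);
    }

    public static void main(String[] args) {
        Map<String, Object> report = Map.of("term", Map.of("reaction", "report"));
        // First add action: data_stream_1 with a filter.
        Resolved first = resolve(null, report);
        // Second add action: data_stream_2 without a filter.
        Resolved second = resolve(first.filter(), null);
        // prints "{term={reaction=report}} updated=false"
        System.out.println(second.filter() + " updated=" + second.filterUpdated());
    }
}
```

Because the model tracks a single filter, there is nowhere to record that data_stream_2 should be unfiltered; the null case can only mean "keep what is there".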

The second part is this function:

public DataStreamMetadata withAlias(String aliasName, String dataStream, Boolean isWriteDataStream, String filter) {
    if (dataStreams.containsKey(dataStream) == false) {
        throw new IllegalArgumentException("alias [" + aliasName + "] refers to a non existing data stream [" + dataStream + "]");
    }
    Map<String, Object> filterAsMap;
    if (filter != null) {
        filterAsMap = XContentHelper.convertToMap(XContentFactory.xContent(filter), filter, true);
    } else {
        filterAsMap = null;
    }
    DataStreamAlias alias = dataStreamAliases.get(aliasName);
    if (alias == null) {
        String writeDataStream = isWriteDataStream != null && isWriteDataStream ? dataStream : null;
        alias = new DataStreamAlias(aliasName, List.of(dataStream), writeDataStream, filterAsMap);
    } else {
        DataStreamAlias copy = alias.update(dataStream, isWriteDataStream, filterAsMap);
        if (copy == alias) {
            return this;
        }
        alias = copy;
    }
    return new DataStreamMetadata(dataStreams, ImmutableOpenMap.builder(dataStreamAliases).fPut(aliasName, alias).build());
}

Here the data stream aliases are stored in an ImmutableOpenMap<String, DataStreamAlias> keyed by alias name. Unfortunately, this means that one alias name (alias_2 in the example above) can refer to only a single DataStreamAlias instance, which carries a single filter, so the filters for different data streams end up clobbering each other. We'll need to figure out a different way to store this so that we can have multiple DataStreamAlias instances (with separate filters) for a single alias name.
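The storage problem can be illustrated with a small, hypothetical model in which aliases are keyed by name and each alias holds one filter shared by all of its data streams. SharedFilterSketch, Alias, withAlias, and filtersByDataStream are illustrative names only, and the per-data-stream view is an approximation of what GET _alias reports, not the real response-building code. Replaying the reproduction yields the same filter for both data streams, matching the output in the issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the pre-fix layout: alias name -> alias, one filter per alias.
public class SharedFilterSketch {

    static final class Alias {
        final List<String> dataStreams = new ArrayList<>();
        Map<String, Object> filter; // shared by every data stream in the alias
    }

    // Keyed by alias name, so only one Alias (and one filter) can exist per name.
    private final Map<String, Alias> aliases = new LinkedHashMap<>();

    void withAlias(String aliasName, String dataStream, Map<String, Object> filter) {
        Alias alias = aliases.computeIfAbsent(aliasName, k -> new Alias());
        if (!alias.dataStreams.contains(dataStream)) {
            alias.dataStreams.add(dataStream);
        }
        if (filter != null) {
            alias.filter = filter; // a non-null filter replaces the shared one
        }
        // a null filter leaves the shared filter untouched
    }

    // Approximate per-data-stream view: every member reports the alias's single filter.
    Map<String, Map<String, Object>> filtersByDataStream(String aliasName) {
        Alias alias = aliases.get(aliasName);
        Map<String, Map<String, Object>> view = new LinkedHashMap<>();
        if (alias != null) {
            for (String dataStream : alias.dataStreams) {
                view.put(dataStream, alias.filter);
            }
        }
        return view;
    }

    public static void main(String[] args) {
        SharedFilterSketch metadata = new SharedFilterSketch();
        Map<String, Object> report = Map.of("term", Map.of("reaction", "report"));
        metadata.withAlias("alias_1", "data_stream_1", report);
        metadata.withAlias("alias_1", "data_stream_2", null);
        // prints "{data_stream_1={term={reaction=report}}, data_stream_2={term={reaction=report}}}"
        System.out.println(metadata.filtersByDataStream("alias_1"));
    }
}
```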

@DaveCTurner DaveCTurner added :Data Management/Indices APIs APIs to create and manage indices and templates and removed needs:triage Requires assignment of a team area label labels Dec 5, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Dec 5, 2022
@dakrone dakrone added :Data Management/Data streams Data streams and their lifecycles and removed :Data Management/Indices APIs APIs to create and manage indices and templates labels Dec 5, 2022
elasticsearchmachine pushed a commit that referenced this issue Jan 9, 2023
Index aliases allow each index in the alias to have a different filter.
Data stream aliases appear to do this, but in reality they do not. That
is, if you ask to add two data streams to a data stream alias, each with
a different filter, the API allows it but only keeps one of the filters.
This PR makes it so that the DataStreamAlias keeps a map of DataStreams
to filters so that the different filters are respected. Closes
#92050
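The fix described in the commit message, where the alias keeps a map of data streams to filters, can be sketched as follows. This is a hypothetical model, not the code from the fix: PerDataStreamFilterSketch, withAlias, and filterFor are illustrative names, and filters are again plain Maps. Each data stream gets its own entry, so adding data_stream_2 without a filter no longer touches data_stream_1's filter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the fixed layout: each alias maps data stream -> its own filter.
public class PerDataStreamFilterSketch {

    // alias name -> (data stream name -> filter, or null when unfiltered)
    private final Map<String, Map<String, Map<String, Object>>> aliases = new LinkedHashMap<>();

    void withAlias(String aliasName, String dataStream, Map<String, Object> filter) {
        // Writing one data stream's entry never changes another data stream's filter.
        aliases.computeIfAbsent(aliasName, k -> new LinkedHashMap<>()).put(dataStream, filter);
    }

    Map<String, Object> filterFor(String aliasName, String dataStream) {
        Map<String, Map<String, Object>> members = aliases.get(aliasName);
        return members == null ? null : members.get(dataStream);
    }

    public static void main(String[] args) {
        PerDataStreamFilterSketch metadata = new PerDataStreamFilterSketch();
        Map<String, Object> report = Map.of("term", Map.of("reaction", "report"));
        metadata.withAlias("alias_1", "data_stream_1", report);
        metadata.withAlias("alias_1", "data_stream_2", null);
        System.out.println(metadata.filterFor("alias_1", "data_stream_1")); // {term={reaction=report}}
        System.out.println(metadata.filterFor("alias_1", "data_stream_2")); // null
    }
}
```

In this sketch, re-adding a data stream without a filter clears that data stream's filter; the actual implementation may treat that case differently.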