Get document API can specify an alias, but will return documents that are not part of that alias (as defined by the filter for that alias) #3861

Closed
ccw-morris opened this issue Oct 9, 2013 · 18 comments · Fixed by #108433
Labels
:Data Management/Indices APIs APIs to create and manage indices and templates >enhancement help wanted adoptme Team:Data Management Meta label for data/management team

Comments

@ccw-morris

Create an index and populate it with two documents. Create an alias with a filter, such that the alias contains one document.

A search using the alias will return one result.
A get document request using the alias can retrieve both documents.

This may confuse users ("why does this document not turn up in my search results?") and makes it hard to implement a security model using aliases.

The UidField.loadDocIdAndVersion() method already uses a filter to check whether the document has been deleted. It would be possible to pass in the alias filter, if defined.
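
The reproduction steps above can be sketched as a sequence of REST calls. This is only an illustration: the index name `test`, the alias name `alias1`, the field `user`, and the helper `build_repro_requests` are hypothetical, and the paths use the typeless `_doc` endpoints of later Elasticsearch versions rather than the typed URLs in use when this issue was opened. The code only builds the requests and does not contact a cluster.

```python
import json


def build_repro_requests(index="test", alias="alias1"):
    """Return (method, path, body) tuples for the reproduction steps."""
    alias_actions = {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    # Only documents with user == "alice" belong to the alias.
                    "filter": {"term": {"user": "alice"}},
                }
            }
        ]
    }
    return [
        # Two documents, only one of which matches the alias filter.
        ("PUT", f"/{index}/_doc/1?refresh=true", {"user": "alice"}),
        ("PUT", f"/{index}/_doc/2?refresh=true", {"user": "bob"}),
        ("POST", "/_aliases", alias_actions),
        # Search through the alias: the filter is applied.
        ("GET", f"/{alias}/_search", {"query": {"match_all": {}}}),
        # Get by id through the alias: the case reported in this issue.
        ("GET", f"/{alias}/_doc/2", None),
    ]


if __name__ == "__main__":
    for method, path, body in build_repro_requests():
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```

With the behavior described in this issue, the search would be expected to return only document 1, while the final get would still return document 2.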

@javanna
Member

javanna commented Oct 18, 2013

The tricky bit here is that the GET API is realtime: we can get a document by id either from Lucene or from the transaction log, before the next refresh happens and the newly indexed documents are made searchable.
Filters are part of the search capabilities of Elasticsearch, exposed through the search API, and are not applicable to the GET API. In fact, we can only get docs by id from the transaction log; we cannot execute queries (or filters) on top of it.
That's why there are cases where the filter associated with an alias might be ignored. We should at least document those cases better.
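
To make the explanation above concrete, here is a toy model of the two read paths. It is not Elasticsearch code: `ToyShard` and its methods are hypothetical names, the "transaction log" is just a dict keyed by id, and the alias filter is a plain Python predicate. The only point is that a lookup by id has nothing to run a filter against, while a search sees only refreshed documents.

```python
class ToyShard:
    """Toy model: a translog keyed by id plus a searchable snapshot."""

    def __init__(self):
        self.translog = {}    # id -> document, visible to get immediately
        self.searchable = {}  # id -> document, visible to search after refresh()

    def index(self, doc_id, doc):
        self.translog[doc_id] = doc

    def refresh(self):
        # Make everything indexed so far visible to searches.
        self.searchable.update(self.translog)
        self.translog.clear()

    def realtime_get(self, doc_id):
        # Lookup by id only: no alias filter is evaluated on this path.
        if doc_id in self.translog:
            return self.translog[doc_id]
        return self.searchable.get(doc_id)

    def search(self, alias_filter):
        # Filters run only over documents that a refresh has made searchable.
        return [d for d in self.searchable.values() if alias_filter(d)]


if __name__ == "__main__":
    shard = ToyShard()
    shard.index("1", {"user": "alice"})
    shard.index("2", {"user": "bob"})

    def is_alice(doc):
        return doc["user"] == "alice"

    print(shard.search(is_alice))   # nothing is searchable before a refresh
    print(shard.realtime_get("2"))  # returned even though it fails the filter
    shard.refresh()
    print(shard.search(is_alice))   # only the matching document
```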

@javanna
Member

javanna commented Mar 22, 2015

I would like to know what people think about the following options:

  1. optionally index the document taken from the transaction log into an in-memory index so that we can execute the filter against it. This seems very slow; the default would be not to do it, but it could be enabled via a setting
  2. just document this, explain clearly why it happens, and make it clear that the get API against filtered aliases is not something you want to build a security model around. Move to the search API instead, giving up on the real-time aspect but relying on the guarantee that only documents that match the filter get returned.

I am personally afraid of the complexity and slowness that would be introduced with option 1. Do comment if you have better ideas around this.

@javanna
Member

javanna commented Mar 28, 2015

We discussed this: we'd prefer to reject get requests performed against a filtered alias, given that we cannot provide users with the correct answer. If the get request has the realtime flag set to false though, meaning that the document will only be retrieved from the Lucene index, we should not ignore the filter within the alias but instead apply it and return the result. Marking as adoptme.
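
The rule proposed above can be sketched as a small decision function. This is an illustration under stated assumptions, not the change that was later attempted: `get_via_alias`, `FilteredAliasGetRejected`, and the dict-based `translog` and `searchable` stores are all hypothetical, and the alias filter is modeled as a Python predicate.

```python
class FilteredAliasGetRejected(Exception):
    """Hypothetical error for a realtime get against a filtered alias."""


def get_via_alias(translog, searchable, doc_id, alias_filter=None, realtime=True):
    """Reject realtime gets on filtered aliases; otherwise apply the filter."""
    if alias_filter is not None and realtime:
        # The filter cannot be evaluated against the transaction log,
        # so no correct answer can be given: reject the request.
        raise FilteredAliasGetRejected(
            "realtime get is not supported against a filtered alias"
        )
    if realtime and doc_id in translog:
        # Unfiltered alias or plain index: the realtime path is safe.
        return translog[doc_id]
    doc = searchable.get(doc_id)
    if doc is not None and alias_filter is not None and not alias_filter(doc):
        return None  # the document exists but is not part of the alias
    return doc
```

In this sketch, a caller that wants filtered results passes `realtime=False` and accepts that documents indexed since the last refresh are not visible.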

@javanna javanna added help wanted adoptme and removed discuss labels Mar 28, 2015
@javanna javanna removed their assignment Mar 28, 2015
javanna added a commit to javanna/elasticsearch that referenced this issue Jun 4, 2015
Documents that don't belong to the filtered alias might get returned, as the filter cannot be taken into account when the document is retrieved from the transaction log. We simply reject the request, given that we cannot answer it properly.
This change also affects APIs that use the get API internally, like multi_get, term_vector, multi_term_vector, explain and update. Percolator is not affected, as it already knows how to execute alias filters against the Lucene index in non-real-time when needed.

Closes elastic#3861
@javanna
Member

javanna commented Jun 5, 2015

Re-marking for discussion... I attempted to solve this as explained above, but it's more complicated than it initially seemed, see comment here. We have to discuss how we want to move forward; we again have a few options:

  1. go all the way with rejecting a get against a filtered alias wherever it can happen (including the index API) and make filtered aliases consistent in every case: where the filter cannot be run, we reject.
  2. leave things as they are and better document the limitations of filtered aliases
  3. replace any internal get against a filtered alias with a search API call (near real-time though)

Other opinions are welcome.

@clintongormley

My preference is number 2. I don't think we can make filtered aliases behave exactly like indices. We would just end up creating complexity and poorer performance, and would still have edge cases. Rather just explain how filtered aliases work and leave it at that. If users need the functionality of indices, then they should use a real index instead.

@javanna javanna removed their assignment Jun 8, 2015
@clintongormley

Closing this issue and leaving things as they are today

@clintongormley

My feeling is that this has not been a big problem for users. Plus there is a workaround: they can retrieve docs with a search-by-id against the alias instead of using GET (i.e. opt in to doing what you're suggesting themselves). As we know from security in x-pack, there's a lot more to security than just making filtered aliases work for GET. I think we should leave things as they are.
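
The search-by-id workaround mentioned above might look roughly like this. The helpers `build_search_by_id` and `first_hit_source` are hypothetical; the request body uses the `ids` query, and the response is assumed to have the usual `hits.hits[]._source` shape. Unlike GET, a search is only near real-time, so a document indexed since the last refresh would not be found.

```python
def build_search_by_id(alias, doc_id):
    """Return (method, path, body) for an ids query against an alias."""
    body = {"query": {"ids": {"values": [doc_id]}}}
    return ("POST", f"/{alias}/_search", body)


def first_hit_source(response):
    """Return the _source of the first hit, or None if nothing matched.

    No hits means the document either does not exist, is not part of the
    alias, or has not yet been made searchable by a refresh.
    """
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None
    return hits[0].get("_source")
```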

@clintongormley

Discussed in FixItFriday - we're good to implement this.

@clintongormley clintongormley added help wanted adoptme and removed discuss labels Dec 23, 2016
@idubinskiy

Was this limitation ever documented (at least in the 2.3 docs)? Led to a lot of confusion for me while trying to handle an edge case.

@clintongormley clintongormley added :Data Management/Indices APIs APIs to create and manage indices and templates and removed :Aliases labels Feb 13, 2018
@hub-cap hub-cap added :Data Management/Indices APIs APIs to create and manage indices and templates and removed :Data Management/Indices APIs APIs to create and manage indices and templates labels Mar 21, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@dakrone
Member

dakrone commented Sep 6, 2019

I believe this can be closed as get-by-id no longer retrieves documents from the translog, so filtered aliases should always work.

@dakrone dakrone closed this as completed Sep 6, 2019
@javanna
Member

javanna commented Sep 10, 2019

@dakrone do we have tests for this functionality?

@dakrone
Member

dakrone commented Sep 10, 2019

@javanna not that I know of, are you concerned about the behavior? We can re-open this to add tests if you'd like?

@javanna
Member

javanna commented Sep 10, 2019

yea I don't feel like "it should work" is a good resolution to this long standing issue. I am super happy if it's fixed, but I think we should add tests that demonstrate that it is fixed.

@dakrone dakrone reopened this Sep 10, 2019
@rjernst rjernst added the Team:Data Management Meta label for data/management team label May 4, 2020
@jiripospisil

Is this supposed to work in ES 6.8.9 or only in newer releases? I'm still seeing the behavior with 6.8.9. That is, I have two aliases for an index, one with a term filter (filtered_foos) and the other without (foos).

If I use the get API to get a specific document, the filter is not respected and I get the same document with both aliases:

http :9200/foos/all/5ebd22f472f2b4246375615d (1 document)
http :9200/filtered_foos/all/5ebd22f472f2b4246375615d (1 document)

If I use the search API, the filter is respected:

{
  "query": {
    "ids": {
      "values" : ["5ebd22f472f2b4246375615d"]
    }
  }
}
http :9200/foos/all/_search < query.json (1 document)
http :9200/filtered_foos/all/_search < query.json (no documents as expected)

@n1v0lg
Contributor

n1v0lg commented Jul 11, 2022

Linking another report of this (#88425), for 8.2.2.

10 participants