Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GET API should support routing=* option #28120

Closed
yehosef opened this issue Jan 7, 2018 · 6 comments
Closed

GET API should support routing=* option #28120

yehosef opened this issue Jan 7, 2018 · 6 comments
Labels
:Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >enhancement feedback_needed

Comments

@yehosef
Copy link

yehosef commented Jan 7, 2018

I'm working with a data set that the ids are already set and I accessing via the direct RESTish API:
GET /index/doc/1
This is great, very fast, deals with refreshes - great!
But now I want to reindex some of the documents with a parent/child so doc 2 will be a child of doc 1. After I do that, GET /index/doc/2 will/may not return the document as explained in https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-get.html#get-routing

Many times I do not know the parent id until I get the document so I have to resort to a regular query. Which is performant, but removes the simplicity of the REST interface I'm using now.

I'd like to suggest that there be an querystring option (eg ?routing=* for GET (ideally also POST/PUT/DELETE) that can effectively add an "ids" query to get the right document, regardless of the routing.

The value of this is that it simplifies the REST use case while still allowing routing when needed.

@javanna javanna added :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >enhancement discuss labels Jan 16, 2018
@jpountz
Copy link
Contributor

jpountz commented Jan 19, 2018

Can you give more details about your use-case? Is there any reason why the _search API would not work?

@yehosef
Copy link
Author

yehosef commented Jan 20, 2018

I went into more detail in the forums at https://discuss.elastic.co/t/is-there-a-way-to-specify-routing-all-or-similar-for-get-requests/115439.

Basically my specific use case is that I have a parent-child relationship (stories and comments) and I don't know the id of the parent when requesting the child (which is common/normal). Until now I had the documents stored not as parent-child and I'm using the Document API. As I started planning the move to parent-child, I realized I wouldn't be able to us it for GET of a child when all I have is it's ID because I don't know the parent's id, and hence the routing param.

While I can use the search API, you could make the same argument of why you should have a Document API at all - you can use the search API. Since there is already a Document API and you already support specifying routing (and multiple routing ids), I'm just asking/suggesting that this be extended to support all the shards with a single command, routing=* (or similar - all_shards=true.. etc).

Here's the one problem I can see that could come from this. It's possible to have multiple documents with the same ID on different shards. Someone using the document API presumably wouldn't be doing this on purpose, but it could happen. Since the document ID only supports one result, you have to decide how to handle this.

The best solution (I think) would be the throw an error that the request expects one results and more than one was found. Then it would be clear there is a problem and the person could use the search API to find the problem documents and correct them.

But what currently happens is that one of the results is returned and the other ignored (I created a test case with two documents with the same ID and different routing params. If I do an ID search, I get both results. If I do a document request with both routing options, I sometimes get one of the results, sometimes neither, depending on the order of the routing params (I can explain the test if you want.. but it's not hard to imagine and create.) This is less than ideal because really something went wrong and I'm not noticing it.

Let me know if you have other questions. Thanks.

@yehosef
Copy link
Author

yehosef commented Jan 20, 2018

I don't think what I'm asking for is any big/major change. You already support multiple routing id and I could, knowing the number of shards I have, experiment and find routing params to hit all the shards. But that's hacky and shouldn't be needed, IMO.

@bleskes
Copy link
Contributor

bleskes commented May 22, 2018

You already support multiple routing id

Can you elaborate on where you see this?

@yehosef
Copy link
Author

yehosef commented May 22, 2018

I thought I saw it, but after checking it, perhaps it was a mistake. I was trying ?routing=1,2 or things like that. Sometimes it would work, which I thought it was using multiple routing params. I realize now it was just using "1,2" as the routing string to hash.

So it might not be the same thing or as easy. The way I solved it for now is to use a "?q=id:123" query. The problem is that it gives a results array instead of single and I had to change my code to accommodate that structure instead of a single result. It would also be helpful if there was a query string param that could get the first result and return it as a simple result instead of like a result set (with timing, total count, etc..)

@bleskes
Copy link
Contributor

bleskes commented May 22, 2018

ok. Thanks for responding quickly. I believe that adding multi-routing values to the GET API will make it more complicated than needed. It has elegance in its simplicity and is geared toward retrieval of docs which have a known place. Otherwise a fall back to _search makes sense (search is optimized to support multiple routing values).

It would also be helpful if there was a query string param that could get the first result and return it as a simple result instead of like a result set

I see how that will be useful for you. I think it's better to cover it using a utility method in the application code rather than making the ES API more complicated.

I'm going to close this as I feel the use cased has been properly hashed out.

@bleskes bleskes closed this as completed May 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >enhancement feedback_needed
Projects
None yet
Development

No branches or pull requests

6 participants