URI Request that returns just the _source, without metadata #2149

Closed
ejain opened this issue Aug 8, 2012 · 18 comments

@ejain
Contributor

ejain commented Aug 8, 2012

'http://localhost:9200/twitter/tweet/_search?q=user:kimchy' returns:

{
    "_shards":{
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
    "hits":{
        "total" : 1,
        "hits" : [
            {
                "_index" : "twitter",
                "_type" : "tweet",
                "_id" : "1", 
                "_source" : {
                    "user" : "kimchy",
                    "postDate" : "2009-11-15T14:12:12",
                    "message" : "trying out Elastic Search"
                }
            }
        ]
    }
}

But sometimes it would be more useful to get a plain "dump" of the _source data instead:

{
    ...
    "hits":{
        "total" : 1,
        "hits" : [
            {
                "user" : "kimchy",
                "postDate" : "2009-11-15T14:12:12",
                "message" : "trying out Elastic Search"
            }
        ]
    }
}
@xstevens

This would be really useful to have. In my case I'm trying to do HTTP response caching, but "took" in the results can obviously change on each query even though the results are the same.

@spinscale
Contributor

Hey,

you can do this with the current elasticsearch release for single documents (but not for searches)

curl -X PUT localhost:9200/foo/bar/1 -d '{ "name":"foo", "f":"a" }'
{"ok":true,"_index":"foo","_type":"bar","_id":"1","_version":2}

curl localhost:9200/foo/bar/1/_source
{ "name":"foo", "f":"a" }

@xstevens If you really need to do this for searches, putting a Varnish proxy (or something similar) in front makes more sense.

@ejain Can you explain what the big difference is, in a search response, between having only the source and having the source together with the metadata? Maybe I didn't fully understand your request.

@ghost ghost assigned spinscale Jun 24, 2013
@xstevens

Well this wasn't really my request but it would work for what I want. I'm looking to remove the "took" variable from search results because that's what blows out an HTTP response cache. What I mean by that is, I end up with an entry per took="response time" even though the rest of the data stays the same.

@spinscale
Contributor

Hey,

I am still not sure whether these are the right approaches, as I am still unsure about the problem itself. Maybe you can elaborate on what you want to do. If you simply want to cache the response, does it really matter whether the took value is included in the response? If an old took value is sent because the search response is cached, what does that mean for you? Is that bad?

I am not sure how your caching works either. Is it configurable? Or do you simply cache the result of a certain request with a certain body? Maybe you can use the X-Unique-Id header for this (it can be specified in the request and is included in the response as well), but I cannot really tell until I understand your caching strategy (and why you are so focused on some fields :-)

@xstevens

I'm just trying to do basic HTTP response caching, with no knowledge that it is even ElasticSearch that I'm talking to. I'm using the caching that comes built into Apache HttpClient. The "took" field is a problem because the caching mechanism checks in the background whether the payload (the search result in this case) has changed, so it invalidates the cache more often than it needs to. I can of course work around this by doing my own caching, but I was trying to avoid that, since HttpClient has some other nice checks around Cache-Control headers, etc. for services that give that kind of feedback.

@xstevens

As for how HttpClient detects a payload change, I believe their implementation uses SHA256(payload).

@ejain
Contributor Author

ejain commented Jun 25, 2013

My use case is that I need to let users download their documents in bulk; this would be a lot more efficient if I didn't have to parse the response and strip out elasticsearch-specific properties.
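
For illustration, the client-side stripping described above could look roughly like the following Python sketch. It is only an assumption about how an application might do it, not something Elasticsearch provides; the helper name `extract_sources` is hypothetical, and the sample response is the one from the original report.

```python
import json


def extract_sources(search_response):
    """Return only the _source objects from a parsed search response.

    Hypothetical helper: it drops _index, _type, _id and the other
    per-hit metadata, and skips hits that carry no _source.
    """
    hits = search_response.get("hits", {}).get("hits", [])
    return [hit["_source"] for hit in hits if "_source" in hit]


# Sample response copied from the example at the top of this issue.
response = json.loads("""
{
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "twitter",
                "_type": "tweet",
                "_id": "1",
                "_source": {
                    "user": "kimchy",
                    "postDate": "2009-11-15T14:12:12",
                    "message": "trying out Elastic Search"
                }
            }
        ]
    }
}
""")

sources = extract_sources(response)
print(json.dumps(sources, indent=2))
```

This still requires parsing the whole response on the client, which is exactly the overhead the request is trying to avoid.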

@dpkirchner

@spinscale What version introduced _source? I get "No handler found for uri /index/type/NNN/_source" on 0.26.

This feature would be really useful for me as well (I'd like to be able to download documents in bulk and then update them in bulk without having to do surgery).

@brusic
Contributor

brusic commented Oct 14, 2013

@therealdpk Judging by the commit/issue, the feature will be available in elasticsearch 1.0. Someone please correct me if I am wrong, but I am curious as well and I do not see it in the 0.90 branch.

#3301

@spinscale
Contributor

@therealdpk it was introduced in 0.90.1

@brusic the issue you referred to is about more fine-grained access control to the source without changing the data structure layout when requesting the data (which can happen in a few cases)

https://github.com/elasticsearch/elasticsearch/blob/0.90/src/main/java/org/elasticsearch/rest/action/get/RestGetSourceAction.java

@brusic
Contributor

brusic commented Oct 15, 2013

Sorry for the misinformation. I assumed the _source param would be part of the normal RestGetAction.

@karmi
Contributor

karmi commented Oct 23, 2013

Just a correction, the correct header is X-Opaque-Id, not X-Unique-Id:

curl -i -H "X-Opaque-Id: foobar" localhost:9200/_search | grep foobar

@abhijitiitr

Is this feature implemented in the latest beta version?
Shouldn't the _source-only option be a part of _search & _msearch, similar to _get & _mget?

@spinscale spinscale removed their assignment Jul 18, 2014
@clintongormley

Given that this isn't a common use case, and can be solved easily on the application side (by extracting the hits only and sha'ing just those), we've decided against making any changes here.
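
As a rough sketch of that application-side approach (an assumption about how it might be done, not an official recipe), the following Python hashes only the hits array, so a changing "took" value does not affect the result. The function name `hits_fingerprint` and the sample responses are hypothetical; SHA-256 is chosen only because it was mentioned earlier in the thread.

```python
import hashlib
import json


def hits_fingerprint(search_response):
    """Return a SHA-256 hex digest computed over the hits array only.

    Hypothetical helper: "took", "_shards" and other top-level fields
    are ignored, so two responses with the same hits hash identically.
    """
    hits = search_response.get("hits", {}).get("hits", [])
    # Sort keys and fix separators so the serialization is deterministic.
    canonical = json.dumps(hits, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two made-up responses that differ only in "took".
first = {
    "took": 3,
    "hits": {
        "total": 1,
        "hits": [{"_id": "1", "_source": {"user": "kimchy"}}],
    },
}
second = dict(first, took=17)

# Same hits, different "took": the fingerprints match.
print(hits_fingerprint(first) == hits_fingerprint(second))
```

A cache keyed or validated on this fingerprint would treat both responses as the same entry. It does not help a generic HTTP cache that hashes the raw payload itself.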

@kuseman

kuseman commented Aug 13, 2014

Could this be opened again and reconsidered?

Solving this on the application side is not an option for us because then it's too late.
We have certain queries that request only a small amount of data from each document, so 80-90% of the whole response is just metadata, which is garbage to us and slows down the response times.

Being able to exclude the metadata would be awesome.

@brusic
Contributor

brusic commented Aug 18, 2014

Take a look at Jörg's plugin: https://github.com/jprante/elasticsearch-arrayformat

@kuseman

kuseman commented Aug 19, 2014

Added #7330

@clintongormley

We're keen to provide a more generic solution to this problem, so I'm going to close this issue in favour of #7401
