URI Request that returns just the _source, without metadata #2149

ejain · 2012-08-08T00:42:37Z

'http://localhost:9200/twitter/tweet/_search?q=user:kimchy' returns:

{
    "_shards":{
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
    "hits":{
        "total" : 1,
        "hits" : [
            {
                "_index" : "twitter",
                "_type" : "tweet",
                "_id" : "1", 
                "_source" : {
                    "user" : "kimchy",
                    "postDate" : "2009-11-15T14:12:12",
                    "message" : "trying out Elastic Search"
                }
            }
        ]
    }
}

But sometimes it would be more useful to get a plain "dump" of the _source data instead:

{
    ...
    "hits":{
        "total" : 1,
        "hits" : [
            {
                "user" : "kimchy",
                "postDate" : "2009-11-15T14:12:12",
                "message" : "trying out Elastic Search"
            }
        ]
    }
}

xstevens · 2013-06-20T20:35:27Z

This would be really useful to have. In my case I'm trying to do HTTP response caching but "took" in the results obviously can change on each query even though the results are the same.

spinscale · 2013-06-24T07:49:08Z

Hey,

you can do this with the current elasticsearch release for single documents (but not for searches):

curl -X PUT localhost:9200/foo/bar/1 -d '{ "name":"foo", "f":"a" }'
{"ok":true,"_index":"foo","_type":"bar","_id":"1","_version":2}

curl localhost:9200/foo/bar/1/_source
{ "name":"foo", "f":"a" }

@xstevens If you really need to do this for searches, putting a Varnish proxy (or something similar) in front makes more sense.

@ejain Can you explain what difference it makes, in a search response, to get only the source rather than the source plus the metadata? Maybe I didn't understand your request completely.

xstevens · 2013-06-24T17:18:05Z

Well this wasn't really my request but it would work for what I want. I'm looking to remove the "took" variable from search results because that's what blows out an HTTP response cache. What I mean by that is, I end up with an entry per took="response time" even though the rest of the data stays the same.

spinscale · 2013-06-24T21:54:48Z

Hey,

I am still not sure whether these are the right approaches, as I am still unsure about the problem itself. Maybe you can elaborate on what you want to do. If you simply want to cache the response, is it really important whether the took value is included in it? I mean, does it matter? If an old took value is sent because the search response is cached, what does this mean for you? Is that bad?

I am not sure how your caching works either. Is it configurable? Or do you simply cache the result of a certain request with a certain body? Maybe you can use the X-Unique-Id header for this (it can be specified in the request and is included in the response as well), but I cannot really tell until I understand your caching strategy (and why you are so focused on some fields :-)

xstevens · 2013-06-24T22:57:06Z

I'm just trying to do basic HTTP response caching with no knowledge that it is even ElasticSearch that I'm talking to. I'm using the caching that comes built into Apache HttpClient. The reason the "took" field is a problem is that the caching mechanism checks whether the payload (the search result in this case) has changed in the background, so it invalidates the cache more often than it needs to. I can work around this of course by doing my own caching, but I was trying to avoid that since HttpClient has some other nice checks around Cache-Control headers, etc. for services that give that kind of feedback.

xstevens · 2013-06-24T22:58:41Z

As for how HttpClient detects a payload change, I believe their implementation uses SHA256(payload).

ejain · 2013-06-25T18:15:16Z

My use case is that I need to let users download their documents in bulk; this would be a lot more efficient if I didn't have to parse the response and strip out elasticsearch-specific properties.
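
For reference, the parse-and-strip step described above can be sketched on the client side in Python. This is only a sketch: it assumes the response has the shape shown at the top of this issue (`hits.hits[]._source`), and the name `extract_sources` is hypothetical, not part of any Elasticsearch client.

```python
import json


def extract_sources(search_response):
    """Return only the _source objects from a parsed search response (hypothetical helper)."""
    hits = search_response.get("hits", {}).get("hits", [])
    # Hits that carry no _source are skipped rather than raising a KeyError.
    return [hit["_source"] for hit in hits if "_source" in hit]


if __name__ == "__main__":
    # Sample response copied from the first comment in this issue.
    response = json.loads("""
    {
        "_shards": {"total": 5, "successful": 5, "failed": 0},
        "hits": {
            "total": 1,
            "hits": [
                {
                    "_index": "twitter",
                    "_type": "tweet",
                    "_id": "1",
                    "_source": {
                        "user": "kimchy",
                        "postDate": "2009-11-15T14:12:12",
                        "message": "trying out Elastic Search"
                    }
                }
            ]
        }
    }
    """)
    print(json.dumps(extract_sources(response), indent=2))
```

This still requires parsing the full response in the application, which is exactly the overhead the request is trying to avoid.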

dpkirchner · 2013-10-14T19:33:59Z

@spinscale What version introduced _source? I get "No handler found for uri /index/type/NNN/_source" on 0.26.

This feature would be really useful for me as well (I'd like to be able to download documents in bulk and then update them in bulk without having to do surgery).

brusic · 2013-10-14T21:08:46Z

@therealdpk Judging by the commit/issue, the feature will be available in elasticsearch 1.0. Someone please correct me if I am wrong, but I am curious as well and I do not see it in the 0.90 branch.

#3301

spinscale · 2013-10-15T07:39:57Z

@therealdpk it was introduced in 0.90.1

@brusic the issue you referred to is for more fine-grained access control to the source without changing the data structure layout when requesting the data (which can happen in a few cases)

https://github.com/elasticsearch/elasticsearch/blob/0.90/src/main/java/org/elasticsearch/rest/action/get/RestGetSourceAction.java

brusic · 2013-10-15T17:59:46Z

Sorry for the misinformation. I assumed the _source param would be part of the normal RestGetAction.

karmi · 2013-10-23T15:31:26Z

Just a correction, the correct header is X-Opaque-Id, not X-Unique-Id:

curl -i -H "X-Opaque-Id: foobar" localhost:9200/_search | grep foobar

abhijitiitr · 2013-12-15T10:18:06Z

Is this feature implemented in the latest beta version?
Shouldn't the _source-only option be part of _search & _msearch, similar to _get & _mget?

clintongormley · 2014-07-25T07:59:04Z

Given that this isn't a common use case, and can be solved easily on the application side (by extracting the hits only and sha'ing just those), we've decided against making any changes here.
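
A minimal Python sketch of that application-side approach, assuming the response has already been parsed into a dict and that only the `hits` section should count toward the fingerprint. The name `hits_fingerprint` is hypothetical, and this is not how Apache HttpClient validates its cache.

```python
import hashlib
import json


def hits_fingerprint(search_response):
    """SHA-256 over the hits section only, so a changing "took" value is ignored."""
    hits = search_response.get("hits", {})
    # Sorted keys and fixed separators make the serialization deterministic.
    canonical = json.dumps(hits, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = {"took": 3, "hits": {"total": 1, "hits": [{"_id": "1", "_source": {"user": "kimchy"}}]}}
    second = dict(first, took=17)  # same results, different timing
    print(hits_fingerprint(first) == hits_fingerprint(second))
```

Two responses that differ only in `took` produce the same fingerprint, so a cache keyed on it would not be invalidated by timing alone.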

kuseman · 2014-08-13T14:11:27Z

Could this be opened again and reconsidered?

Solving this on the application side is not an option for us, because by then it's too late.
We have certain queries that request only a small amount of data from each document, so overall 80-90% of the response is just metadata, which is garbage to us and slows down response times.

Being able to exclude the metadata would be awesome.

brusic · 2014-08-18T18:55:39Z

Take a look at Jörg's plugin: https://github.com/jprante/elasticsearch-arrayformat

kuseman · 2014-08-19T07:09:41Z

Added #7330

clintongormley · 2014-08-22T11:03:56Z

We're keen to provide a more generic solution to this problem, so I'm going to close this issue in favour of #7401

ghost assigned spinscale Jun 24, 2013

clintongormley added the discuss label Jul 8, 2014

spinscale removed their assignment Jul 18, 2014

clintongormley closed this as completed Jul 25, 2014

clintongormley reopened this Aug 13, 2014

This was referenced Aug 19, 2014

Update InternalSearchHit.java #7330

Closed

top_hits, date_histogram : allow not serializing some of the fields. #7350

Closed

Add response_transform_script to allow xpath style selection of response elements #7401

Closed

clintongormley closed this as completed Aug 22, 2014
