Update API: Allow to update a document based on a script #1583

Closed
kimchy opened this issue Jan 2, 2012 · 26 comments

Comments

@kimchy
Member

kimchy commented Jan 2, 2012

The update action allows directly updating a specific document based on a script. The operation gets the document (collocated with the shard) from the index, runs the script (with optional script language and parameters), and indexes back the result (the script can also delete the document or turn the operation into a noop).

Note that this operation still means a full reindex of the document; it just removes some network roundtrips and reduces the chance of version conflicts between the get and the index.

For example, let's index a simple doc:

curl -XPUT localhost:9200/test/type1/1 -d '{
    "counter" : 1,
    "tags" : ["red"]
}'

Now, we can execute a script that increments the counter:

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : "ctx._source.counter += count",
    "params" : {
        "count" : 4
    }
}'

We can also add a tag to the list of tags:

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : "ctx._source.tags += tag",
    "params" : {
        "tag" : "blue"
    }
}'

And, we can delete the doc if the tags contain blue, or ignore (noop):

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : "ctx._source.tags.contains(tag) ? ctx.op = \"delete\" : ctx.op = \"none\"",
    "params" : {
        "tag" : "blue"
    }
}'
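
The get, run script, index cycle behind the three requests above can be sketched locally. This is a minimal in-memory simulation and not Elasticsearch code: the `update` helper, the dict-based `store`, and the assumption that `ctx.op` defaults to indexing the result are all illustrative.

```python
import copy


def update(store, doc_id, script, params=None):
    """Get the document, run the script against ctx, then index, delete, or skip."""
    source = copy.deepcopy(store[doc_id])       # the "get" step
    ctx = {"_source": source, "op": "index"}    # assumed default: index the result back
    script(ctx, params or {})
    if ctx["op"] == "delete":
        del store[doc_id]
    elif ctx["op"] == "none":
        pass                                    # noop: leave the stored document untouched
    else:
        store[doc_id] = ctx["_source"]          # full reindex of the modified source
    return ctx["op"]


def increment(ctx, params):
    # mirrors: ctx._source.counter += count
    ctx["_source"]["counter"] += params["count"]


def add_tag(ctx, params):
    # mirrors: ctx._source.tags += tag
    ctx["_source"]["tags"].append(params["tag"])


def delete_if_tagged(ctx, params):
    # mirrors: delete the doc if the tags contain the tag, otherwise noop
    ctx["op"] = "delete" if params["tag"] in ctx["_source"]["tags"] else "none"


store = {"1": {"counter": 1, "tags": ["red"]}}
update(store, "1", increment, {"count": 4})
update(store, "1", add_tag, {"tag": "blue"})
snapshot = copy.deepcopy(store["1"])
final_op = update(store, "1", delete_if_tagged, {"tag": "blue"})
```

In this sketch the whole source is written back on every update, which matches the note above that the operation is still a full reindex of the document.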

Parameters:

  • routing: Sets the routing that will be used to route the document to the relevant shard.
  • parent: Simply sets the routing.
  • timeout: Timeout waiting for a shard to become available.
  • replication: The replication type for the delete/index operation (sync or async).
  • consistency: The write consistency of the index/delete operation.
  • retry_on_conflict: How many times to retry if there is a version conflict between getting the document and indexing / deleting it. Defaults to 0.
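
What retry_on_conflict guards against can be illustrated with a small optimistic-concurrency simulation. Everything here is hypothetical (the `Store` class, the `before_index` hook that plays the role of a concurrent writer, the version numbers); it only shows the general get, check version, retry pattern, not the actual implementation.

```python
class VersionConflict(Exception):
    """Raised when the document changed between the get and the index."""


class Store:
    def __init__(self):
        self.docs = {}  # doc id -> (version, source)

    def get(self, doc_id):
        version, source = self.docs[doc_id]
        return version, dict(source)

    def index(self, doc_id, source, expected_version):
        current_version, _ = self.docs[doc_id]
        if current_version != expected_version:
            raise VersionConflict(doc_id)
        self.docs[doc_id] = (current_version + 1, source)


def update(store, doc_id, script, retry_on_conflict=0, before_index=None):
    attempts = retry_on_conflict + 1
    for attempt in range(attempts):
        version, source = store.get(doc_id)
        script(source)
        if before_index:
            before_index(attempt)  # hook that lets a concurrent writer interfere
        try:
            store.index(doc_id, source, expected_version=version)
            return attempt         # number of retries that were needed
        except VersionConflict:
            if attempt == attempts - 1:
                raise              # out of retries: surface the conflict


store = Store()
store.docs["1"] = (1, {"counter": 1})


def bump(source):
    source["counter"] += 4


def concurrent_writer(attempt):
    # Another client sneaks in a write during the first attempt only.
    if attempt == 0:
        version, source = store.get("1")
        source["counter"] += 100
        store.index("1", source, expected_version=version)


attempts_used = update(store, "1", bump, retry_on_conflict=1,
                       before_index=concurrent_writer)
```

With `retry_on_conflict=0` (the default listed above) the same scenario would raise the conflict instead of retrying.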
@kimchy kimchy closed this as completed in 83d5084 Jan 2, 2012
@n0rthwood

Thank you so much for this update. I assume this is still in trunk and will be released in 0.19.0?
Also, if I have a nested document, I assume updating it can also be expressed in the script.
Are arrays and nested objects supported? Something like:
ctx._source.classification=[{"cid"=10,"cvalue"=20},{"cid"=20,"cvalue"=30}]

Also, may I ask where the reference for this scripting is? I checked the documentation (http://www.elasticsearch.org/guide/reference/modules/scripting.html), but it doesn't seem to mention this kind of grammar, i.e. referring to a document as ctx and assigning values to it. Is there more to come?

@kimchy
Member Author

kimchy commented Jan 6, 2012

@n0rthwood yea, it will be part of the next release. The grammar in the scripting docs relates to search, not to this one. The document is mainly a set of hashes and lists (JSON), stored under ctx._source. You can change it however you want, either using mvel (the default) or using one of the plugin scripting langs (including javascript).

@medcl
Contributor

medcl commented Jan 7, 2012

Very happy to see this feature, my poor partialupdate plugin can be retired now~

@monken

monken commented Jan 10, 2012

Great feature (I think you can close #426)

One thing I miss:
Would it be possible to call update on a scroll id? That would allow updating a number of documents in ES instead of pulling them to the client and pushing the update for each document.

Proposed API:

curl -XPOST 'http://localhost:9200/_search/update?scroll_id=c2Nhbjs...WUc1' \
    -d '{"script":"...","params": ... }'

@kimchy
Member Author

kimchy commented Jan 10, 2012

@monken that's basically "update by query"; no need to provide a scroll id, just provide the query to use. It's much harder to do, and will come with a lot of caveats (i.e. the query update might fail in the middle of the operation and only be partially completed).

@monken

monken commented Jan 10, 2012

@kimchy is "update by query" already implemented? I couldn't find it.
I was just worried about the syntax because you have to provide both the query and the update script in the request body. How would they live next to each other?

@kimchy
Member Author

kimchy commented Jan 10, 2012

@monken no, it's not implemented. The structure should be simple: the query under query, and the script under script.
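
A rough sketch of what such a request body could look like, with the query under `query` and the script under `script`. The helper name, the example term query, and the top-level `params` placement (borrowed from the update API examples at the top of this issue) are assumptions for illustration; the feature did not exist when this comment was written.

```python
import json


def update_by_query_body(query, script, params=None):
    """Build a hypothetical body: the query under "query", the script under "script"."""
    body = {"query": query, "script": script}
    if params is not None:
        body["params"] = params  # assumed placement, mirroring the update API above
    return body


body = update_by_query_body(
    query={"term": {"tags": "blue"}},
    script="ctx._source.counter += count",
    params={"count": 1},
)
print(json.dumps(body, indent=2))
```

Later comments in this thread use `_update_by_query` with the parameters nested inside a `script` object, so the shape above should be read as the idea only.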

@monken

monken commented Jan 10, 2012

@kimchy should I open a new ticket for this request?

@kimchy
Member Author

kimchy commented Jan 11, 2012

@monken yea, open one, though I am still not sure how to best implement it. It's going to come with a lot of caveats.

@monken

monken commented Jan 12, 2012

@kimchy I bet you'll figure something out :-)

@folke

folke commented Jan 12, 2012

Awesome! :-)

@seyyedi

seyyedi commented Jan 23, 2012

@kimchy can the update action be used in bulk? I would have expected something along the lines of

{ "update" : { "_index" : "main", "_type" : "visits", "_id" : "21" } }
{ "script" : "ctx._source.count += i", "params": { "i" : 3 } }

but I can't get it to work (with the current version in trunk). Is there any inherent problem with update/bulk, am I using the wrong syntax, or is it just too early? :-)

Awesome feature by the way!

@kimchy
Member Author

kimchy commented Jan 23, 2012

@seyyedi No, it can't be used in bulk (but nice imaginative format for it :) ). I suppose there is an option to support it in bulk, but I was thinking that if we have update by query (which we still don't) then it will be less needed. Though, I guess it has its uses.

@rolyv

rolyv commented Jan 27, 2012

Can you update a document's TTL and Timestamp?

@Paikan
Contributor

Paikan commented Jan 28, 2012

@rolyv yes you can update TTL and Timestamp on master branch.

You can use ctx._ttl and ctx._timestamp in your script.

@Alex-Ikanow

Nice new feature! Is there any way this could be (theoretically modified to be) used to update just a nested object, while leaving the "parent" document alone?

Eg in my use case I have documents with a large full text index, and many nested sub-objects that have a smallish number of indexed fields.

Any given sub-object has (numeric) attributes that change every few hours, but I don't want to re-index the entire document (stored in MongoDB), so at the moment I just discard that attribute and try to combine them as best I can in the application layer.

If I could "just" modify (numeric) fields inside specified nested sub-objects (eg all sub-objects matching a query), without touching the "parent" document, that would remove my last MongoDB-elasticsearch synchronization issue.

@ghost

ghost commented Feb 12, 2012

Thanks for the great feature! Is it possible to update a stored field if the _source is disabled? I tried ctx.fields, ctx._fields, ctx['...'] to no avail...

@kimchy
Member Author

kimchy commented Feb 12, 2012

@msayapin no, this only works when _source is enabled, otherwise, we can't reindex the doc.

@gzsombor

gzsombor commented Mar 6, 2012

Is it documented anywhere what variables a script can access? The documentation briefly mentions ctx._ttl and ctx._timestamp (http://www.elasticsearch.org/guide/reference/api/update.html), but it's not clear what the type of these variables is (string, number or timestamp?). And how do you enable / disable _source?

@Paikan
Contributor

Paikan commented Mar 6, 2012

@gzsombor the _ttl can be a number or a string representing a TimeValue like "1d". The _timestamp is a String which can be a timestamp or use the configured date format. You can check http://www.elasticsearch.org/guide/reference/mapping/timestamp-field.html and http://www.elasticsearch.org/guide/reference/mapping/ttl-field.html for more information.

The _source is enabled by default. Have a look here http://www.elasticsearch.org/guide/reference/mapping/source-field.html to disable it.

@alex-in2

alex-in2 commented Dec 9, 2013

Is it possible to use _version in conditions ?

like

if(ctx._version == 1)
{
ctx._source.field1 = 'some value'
}

@bigtoerag

bigtoerag commented Sep 19, 2017

I have an issue with remove based on the above logic, but using the Kibana dev tools interface rather than curl:

This works:
POST metricbeat-2017.09.18/_update_by_query?conflicts=proceed
{
  "script": {
    "inline": "ctx._source.tags.add(params.tag)",
    "lang": "painless",
    "params": { "tag": "green" }
  },
  "query": {
    "term": { "type": "metricsets" }
  }
}

But this doesn't:
POST metricbeat-2017.09.18/_update_by_query?conflicts=proceed
{
  "script": {
    "inline": "ctx._source.tags.remove(params.tag)",
    "lang": "painless",
    "params": { "tag": "green" }
  },
  "query": {
    "term": { "type": "metricsets" }
  }
}

The error thrown back seems like a syntax issue:
{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "ctx._source.tags.remove(params.tag)",
          " ^---- HERE"
        ],
        "script": "ctx._source.tags.remove(params.tag)",
        "lang": "painless"
      }
    ],
    "type": "script_exception",
    "reason": "runtime error",
    "script_stack": [
      "ctx._source.tags.remove(params.tag)",
      " ^---- HERE"
    ],
    "script": "ctx._source.tags.remove(params.tag)",
    "lang": "painless",
    "caused_by": {
      "type": "class_cast_exception",
      "reason": "java.lang.String cannot be cast to java.lang.Number"
    }
  },
  "status": 500
}

What is the correct way to remove if this is the way to add? Does it not support remove or delete this way?

EDIT: More abstract searching came up with a hit for painless not having the remove method due to some stuff I don't understand.
#21375

So I tried this and it worked for the added tag and any other tag match:
POST metricbeat-2017.09.18/_update_by_query
{
  "script": {
    "inline": "ctx._source.tags.removeAll(Collections.singleton(params.tag))",
    "lang": "painless",
    "params": { "tag": "green" }
  },
  "query": {
    "term": { "type": "metricsets" }
  }
}
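
One reading of the class_cast_exception above (an interpretation of the error message, not something confirmed in this thread) is that the one-argument remove call was resolved as index-based removal, as in Java's List.remove(int), so a string argument cannot be used, whereas removeAll takes a collection of values. The sketch below mimics the two behaviours in plain Python; `remove_at` and `remove_all` are illustrative stand-ins, not Painless or Java APIs.

```python
def remove_at(items, index):
    """Mimic index-based removal: the argument must be a number."""
    if isinstance(index, bool) or not isinstance(index, int):
        raise TypeError(f"{type(index).__name__} cannot be cast to a number")
    return items.pop(index)


def remove_all(items, values):
    """Mimic value-based removal: drop every element that appears in values."""
    kept = [item for item in items if item not in values]
    changed = len(kept) != len(items)
    items[:] = kept  # modify the list in place, like the Java method does
    return changed


tags = ["red", "green", "blue", "green"]

try:
    remove_at(tags, "green")   # a string where an index is expected
    error = None
except TypeError as exc:
    error = str(exc)

changed = remove_all(tags, {"green"})  # removes every "green" entry
```

Note that value-based removal drops all matching entries, which fits the observation above that the workaround removed the added tag and any other matching tag.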

@rezende7

How can I change the type of a field?
E.g. CloseDate is a date, but in my index it is mapped as long.

@reswob10

reswob10 commented Jun 7, 2018

Question: I'm using the syntax above to try to remove a tag (via Kibana dev tools tab):

POST logstash-*/_update_by_query
{
"query" : {
"match" : { "tags": "GREEN" }
},
"script":{"inline":"ctx._source.tags.removeALL(Collections.singleton(params.tag))","lang":"painless","params": {"tag":"NEWTAG"}}
}

But when I run this, I get the following error:

"type": "script_exception",
"reason":"runtime error",
"script_stack": [
   "ctx._source.tags.removeALL(Collections.singleton(params.tag))",
"                                                         ^---- HERE"
],
"script":"ctx._source.tags.removeALL(Collections.singleton(params.tag))",
"lang":"painless",
"caused_by": {
"type":"illegal_argument_exception",
"reason": "Unable to find dynamic method [removeALL] with [1] arguments for class [java.util.ArrayList]."
}
},
"status":500
}

I haven't googled this yet, but as I do, I was wondering if anyone knows what this means.

Thanks....

@dadoonet
Member

dadoonet commented Jun 7, 2018

Better to ask on discuss.elastic.co.

But maybe it should be removeAll?

@reswob10

reswob10 commented Jun 7, 2018

Will ask there...

Thanks
