Skip to content

Advance Update in a Elasticsearch plugin that provides you control over the document update functionality of elasticsearch.

Notifications You must be signed in to change notification settings

paralax/advance-update

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Advance Update

Control your elasticsearch document updates with extra speed.

Description

  • So, Elasticsearch provides update API's like _update and _bulk which helps us in updating elasticsearch document. But elasticsearch updates the document by merging the given [NEW] document with the [OLD] document, which leads to a case where

    • The older documents fields which are not present in the new given document will be retained, so if you want delete a field you have to explicitly send a update query with a script. And everyone knows that elasticsearch scripts are slower (eg: to delete a field from 13 million documents it will take 50 min in 5 node cluster)

    eg : Old Document

         {
             "a": 23,
             "b": 40
         }
    _New Document_ : /_update api
    
         {
             "a": 19,
             "c": 45
         }
    
    Now If I send this to elasticsearch the result output document will be :
    
         {
             "a": 19,
             "b": 40,
             "c": 45
         }
    
    So the field "b"is retained, but my usecase is to remove the "b", as I not sending it in the request.
    
    • The Advance Plugin can do this for you.

    eg : Old Document

         {
             "a": 23,
             "b": 40
         }
    _New Document_ : /_advanceupdate api
    
         {
             "a": 19,
             "c": 45
         }
    
    Now If I send this to elasticsearch the result output document will be :
    
         {
             "a": 19,
             "c": 45
         }
    
    • Elasticsearch Updates first update documents in the primary shard and then it updates the replica shards, now that increased the total time required for update of document.
    • You can reduce the total time by setting the replica to -1, but if you have a lot of documents which takes 1 hour to get updated and in meanwhile your nodes goes down you dont have a protection of the replicas to recover i.e, you loose your data.
    • Advance-Update gives you the functionality where your old data is safe and speed for updating documents is much higher than the normal updates.

API Support

  • POST _advanceupdate

    usage :

     /_update
    
     {
         "a": 19,
         "c": 45
     }
    
  • POST/PUT _advancebulk

     /_advancebulk
    
      { "update" : {"_id" : "2", "_type" : "type1", "_index" : "test"} }
      { "doc" : {"b": 12,"f": 14,"m":15}, "doc_as_upsert" : true}
      { "update" : {"_id" : "2", "_type" : "type1", "_index" : "test"} }
      { "doc" : {"s": 15,"l": 14,"k":12}, "doc_as_upsert" : true}
    

Prerequisites

  • Elasticsearch 5.6.0
  • Change the thread_pool.bulk.queue_size to high enough so that your documents don't get skipped or else you can keep it to -1 in your elasticsearch.yml file.

Installing

Download and install elasticsearch 5.6.0 from

Download the leatest Advance Update plugin from

Install the plugin :

    <ES directory>/elasticsearch-plugin install file:///Downloads/advance-update.zip

Test the installation by visiting https://localhost:9200/_cat/plugins

Versioning

Works with only elasticsearch 5.6.0

Authors

About

Advance Update in a Elasticsearch plugin that provides you control over the document update functionality of elasticsearch.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Java 100.0%