## Documentation

To read more about the update API, visit the [docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html).

![update_docs](../images/update_docs.png)

## Connect to Elasticsearch

In [1]:
from pprint import pprint
from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200')
client_info = es.info()
print('Connected to Elasticsearch!')
pprint(client_info.body)

Connected to Elasticsearch!
{'cluster_name': 'es-docker-cluster',
 'cluster_uuid': '68vsKryIR7Ss49bLh7mz5Q',
 'name': 'es01',
 'tagline': 'You Know, for Search',
 'version': {'build_date': '2025-12-16T10:09:08.849001802Z',
             'build_flavor': 'default',
             'build_hash': 'd8972a71dbbd64ff17f2f4dba9ca2c3fe09fb100',
             'build_snapshot': False,
             'build_type': 'docker',
             'lucene_version': '10.3.2',
             'minimum_index_compatibility_version': '8.0.0',
             'minimum_wire_compatibility_version': '8.19.0',
             'number': '9.2.3'}}


## Index documents

In [6]:
es.indices.delete(index='my_index', ignore_unavailable=True)
es.indices.create(index='my_index')

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'my_index'})

We use the `dummy_data.json` file to insert multiple documents into the index, and we store the id of each document in the `document_ids` list.

In [7]:
import json
from tqdm import tqdm


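document_ids = []
# Open the file in a context manager so the handle is closed after loading
with open("../data/dummy_data.json") as f:
    dummy_data = json.load(f)
for document in tqdm(dummy_data, total=len(dummy_data)):
    # `document=` replaces the deprecated `body=` parameter of es.index()
    response = es.index(index='my_index', document=document)
    document_ids.append(response['_id'])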

100%|██████████| 3/3 [00:00<00:00, 31.50it/s]


In [8]:
document_ids

['Dj1mbpsBX61qEH3bx0kl', 'Dz1mbpsBX61qEH3bx0l5', 'ED1mbpsBX61qEH3bx0l_']

## Update API

### 1. If the document exists in the index

#### 1.1 Update an existing field

In [11]:
from pprint import pprint

response = es.update(
    index="my_index",
    id=document_ids[0],
    script={
        "source": "ctx._source.title = params.title",
        "params": {
            "title": "New Title"
        }
    },
)
pprint(response.body)

{'_id': 'Dj1mbpsBX61qEH3bx0kl',
 '_index': 'my_index',
 '_primary_term': 1,
 '_seq_no': 4,
 '_shards': {'failed': 0, 'successful': 1, 'total': 2},
 '_version': 3,
 'result': 'updated'}


In [None]:
POST /my_index/_update/Dj1mbpsBX61qEH3bx0kl
{
  "script": {
    "source": "ctx._source.title = params.title",
    "params": {
      "title": "New Title"
    }
  }
}


In [12]:
response = es.get(index='my_index', id=document_ids[0])
pprint(response.body)

{'_id': 'Dj1mbpsBX61qEH3bx0kl',
 '_index': 'my_index',
 '_primary_term': 1,
 '_seq_no': 4,
 '_source': {'created_on': '2024-09-22',
             'text': 'This is the first sample document text.',
             'title': 'New Title'},
 '_version': 3,
 'found': True}


In [None]:
GET /my_index/_doc/Dj1mbpsBX61qEH3bx0kl

> Note: The `_version` field is automatically incremented each time a document is updated.

#### 1.2 Add a new field

To add a new field, you can either use the `script` argument or the `doc` argument.
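As a rough illustration of the two options, the sketch below builds both request bodies for the same new field. The helper name `add_field_bodies` is hypothetical and not part of the Elasticsearch client; it only constructs dictionaries, so it runs without a cluster. The script variant passes the value through `params`, as in section 1.1, instead of embedding it in the script source.

```python
def add_field_bodies(field, value):
    """Build the `script` and `doc` request bodies that add `field`.

    Hypothetical helper for illustration only; it does not contact
    Elasticsearch.
    """
    # The field name is interpolated into the script source, so restrict it
    # to a plain identifier to avoid building a malformed script.
    if not field.isidentifier():
        raise ValueError(f"unsupported field name: {field!r}")

    script_body = {
        "script": {
            "source": f"ctx._source.{field} = params.value",
            "params": {"value": value},
        }
    }
    doc_body = {"doc": {field: value}}
    return script_body, doc_body


script_body, doc_body = add_field_bodies("new_field", "dummy_value")
print(script_body)
print(doc_body)
```

Either dictionary could then be unpacked into the client call, for example `es.update(index="my_index", id=document_ids[0], **doc_body)`.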

##### 1.2.1 Method 1 (Script)

In [13]:
response = es.update(
    index="my_index",
    id=document_ids[0],
    script={
        "source": "ctx._source.new_field = 'dummy_value'",
    },
)
pprint(response.body)

{'_id': 'Dj1mbpsBX61qEH3bx0kl',
 '_index': 'my_index',
 '_primary_term': 1,
 '_seq_no': 6,
 '_shards': {'failed': 0, 'successful': 1, 'total': 2},
 '_version': 5,
 'result': 'updated'}


In [None]:
POST /my_index/_update/Dj1mbpsBX61qEH3bx0kl
{
  "script": {
    "source": "ctx._source.new_field = 'dummy_value'"
  }
}

In [14]:
response = es.get(index='my_index', id=document_ids[0])
pprint(response.body)

{'_id': 'Dj1mbpsBX61qEH3bx0kl',
 '_index': 'my_index',
 '_primary_term': 1,
 '_seq_no': 7,
 '_source': {'created_on': '2024-09-22',
             'new_field': 'dummy_value',
             'new_field2': 'dummy_value2',
             'text': 'This is the first sample document text.',
             'title': 'New Title'},
 '_version': 6,
 'found': True}


##### 1.2.2 Method 2 (doc)

In [15]:
response = es.update(
    index="my_index",
    id=document_ids[0],
    doc={
        "new_value_3": "dummy_value_3",
    },
)
pprint(response.body)

{'_id': 'Dj1mbpsBX61qEH3bx0kl',
 '_index': 'my_index',
 '_primary_term': 1,
 '_seq_no': 8,
 '_shards': {'failed': 0, 'successful': 1, 'total': 2},
 '_version': 7,
 'result': 'updated'}


In [None]:
POST /my_index/_update/Dj1mbpsBX61qEH3bx0kl
{
  "doc": {
    "new_value_2": "dummy_value_2"
  }
}

> Note: `script` takes precedence over `doc`: when both are specified, `doc` is ignored.
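Because a `doc` passed alongside a `script` is dropped without an error, it can help to reject that combination before sending the request. The following is a minimal sketch of such a guard; `update_kwargs` is a hypothetical helper name, not a client function, and it only validates and returns keyword arguments.

```python
def update_kwargs(doc=None, script=None):
    """Return keyword arguments for es.update(), refusing ambiguous input.

    Hypothetical helper for illustration only.
    """
    if doc is not None and script is not None:
        # Elasticsearch would accept this request but ignore `doc`.
        raise ValueError("pass either 'doc' or 'script', not both")
    if doc is None and script is None:
        raise ValueError("one of 'doc' or 'script' is required")
    return {"doc": doc} if doc is not None else {"script": script}


kwargs = update_kwargs(doc={"new_value_3": "dummy_value_3"})
print(kwargs)
```

The result could be passed on as `es.update(index="my_index", id=document_ids[0], **kwargs)`.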

In [16]:
response = es.get(index='my_index', id=document_ids[0])
pprint(response.body)

{'_id': 'Dj1mbpsBX61qEH3bx0kl',
 '_index': 'my_index',
 '_primary_term': 1,
 '_seq_no': 8,
 '_source': {'created_on': '2024-09-22',
             'new_field': 'dummy_value',
             'new_field2': 'dummy_value2',
             'new_value_3': 'dummy_value_3',
             'text': 'This is the first sample document text.',
             'title': 'New Title'},
 '_version': 7,
 'found': True}


#### 1.3 Remove a field

In [17]:
response = es.update(
    index="my_index",
    id=document_ids[0],
    script={
        "source": "ctx._source.remove('new_field')",
    },
)
pprint(response.body)

{'_id': 'Dj1mbpsBX61qEH3bx0kl',
 '_index': 'my_index',
 '_primary_term': 1,
 '_seq_no': 10,
 '_shards': {'failed': 0, 'successful': 1, 'total': 2},
 '_version': 9,
 'result': 'updated'}


In [None]:
POST /my_index/_update/Dj1mbpsBX61qEH3bx0kl
{
  "script": {
    "source": "ctx._source.remove('new_field2')"
  }
}

In [18]:
response = es.get(index='my_index', id=document_ids[0])
pprint(response.body)

{'_id': 'Dj1mbpsBX61qEH3bx0kl',
 '_index': 'my_index',
 '_primary_term': 1,
 '_seq_no': 11,
 '_source': {'created_on': '2024-09-22',
             'new_value_2': 'dummy_value_2',
             'new_value_3': 'dummy_value_3',
             'text': 'This is the first sample document text.',
             'title': 'New Title'},
 '_version': 10,
 'found': True}


### 2. If the document doesn't exist in the index

We use `doc_as_upsert` to tell Elasticsearch that if the document does not exist, it should be inserted as a new document.
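To make the behavior concrete, here is a small in-memory model of what `doc_as_upsert` changes. It is a simplified sketch, not Elasticsearch's implementation: the `simulate_update` function and the plain `dict` store are assumptions made for illustration, the merge is shallow, and a `KeyError` stands in for the "document missing" error the server would return.

```python
def simulate_update(store, doc_id, doc, doc_as_upsert=False):
    """Mimic a partial-document update against an in-memory dict.

    Illustrative model only; `store` maps document ids to source dicts.
    """
    if doc_id in store:
        # Existing document: merge the partial doc into it (shallow merge).
        store[doc_id] = {**store[doc_id], **doc}
        return "updated"
    if doc_as_upsert:
        # Missing document: insert the partial doc as the full document.
        store[doc_id] = dict(doc)
        return "created"
    # Without doc_as_upsert, updating a missing document is an error.
    raise KeyError(f"document {doc_id!r} is missing")


store = {}
print(simulate_update(store, "1", {"book_id": 1234}, doc_as_upsert=True))  # created
print(simulate_update(store, "1", {"book_name": "A book"}))                # updated
print(store)
```

The first call inserts the document because it does not exist yet; the second call finds it and merges the new field into it.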

In [19]:
response = es.update(
    index="my_index",
    id="1",
    doc={
        "book_id": 1234,
        "book_name": "A book",
    },
    doc_as_upsert=True,
)

In [None]:
POST /my_index/_update/2
{
  "doc": {
    "book_id": 124,
    "book_name": "A book2"
  },
  "doc_as_upsert": true
}

In [20]:
pprint(response.body)

{'_id': '1',
 '_index': 'my_index',
 '_primary_term': 1,
 '_seq_no': 12,
 '_shards': {'failed': 0, 'successful': 1, 'total': 2},
 '_version': 1,
 'result': 'created'}


Sure enough, the index now holds 5 documents instead of 3: the Python call created document `1`, and the console request created document `2`.

In [22]:
response = es.count(index='my_index')
response['count']

5