# Elasticsearch basics: indexing

This notebook is a basic introduction to indexing documents into Elasticsearch with the Python client.
This is an interactive notebook, so you can run the code and experiment with it!

Run this notebook:

- Locally using [jupyter](https://docs.jupyter.org/en/latest/install.html)
- Online using [Google Colab](https://colab.research.google.com/?hl=en)

## 🧰 Requirements

For this example, you will need:

- Python 3.6 or later
- An Elastic deployment
   - We'll be using [Elastic Cloud](https://www.elastic.co/guide/en/cloud/current/ec-getting-started.html) for this example (available with a [free trial](https://cloud.elastic.co/registration?elektra=en-ess-sign-up-page))
- The [Elastic Python client](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/installation.html)

## Create Elastic Cloud deployment

If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?fromURI=%2Fhome) for a free trial.

- Go to the [Create deployment](https://cloud.elastic.co/deployments/create) page
   - Select **Create deployment**

## Install packages and import modules

To get started, we'll need to connect to our Elastic deployment using the Python client.
Because we're using an Elastic Cloud deployment, we'll use the **Cloud ID** to identify our deployment.

First we need to `pip` install the following packages:

- `elasticsearch`


In [1]:
!pip install elasticsearch



Next, import the modules used in this notebook.

In [2]:
from elasticsearch import Elasticsearch, helpers
from urllib.request import urlopen
import getpass
import json
from datetime import datetime

## Initialize the Elasticsearch client

Now we can instantiate the Elasticsearch client.
First we prompt for the Cloud ID and the password of the `elastic` user.

🔐 NOTE: `getpass` prompts for credentials without echoing them to the terminal, so they don't appear on screen or in the notebook's saved output. The values are still held in memory for the rest of the session.

Then we create `client`, an instance of the `Elasticsearch` class.

In [3]:
# Found in the 'Manage Deployment' page
CLOUD_ID = getpass.getpass('Enter Elastic Cloud ID:  ')

# Password for the 'elastic' user generated by Elasticsearch
ELASTIC_PASSWORD = getpass.getpass('Enter Elastic password:  ')

# Create the client instance
client = Elasticsearch(
    cloud_id=CLOUD_ID,
    basic_auth=("elastic", ELASTIC_PASSWORD)
)
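Typing the Cloud ID and password every time you rerun the notebook gets tedious. A minimal sketch, assuming you are willing to keep the values in environment variables, reads each value from the environment and only falls back to `getpass` when it is missing. The helper `get_secret` and the variable names `ELASTIC_CLOUD_ID` and `ELASTIC_PASSWORD` are hypothetical, not part of the client.

```python
import getpass
import os


def get_secret(env_var: str, prompt: str) -> str:
    """Return the value of env_var if it is set; otherwise prompt for it without echoing input."""
    value = os.environ.get(env_var)
    if value:
        return value
    return getpass.getpass(prompt)


# Hypothetical variable names: export them before starting Jupyter, or leave them unset to be prompted
# CLOUD_ID = get_secret("ELASTIC_CLOUD_ID", "Enter Elastic Cloud ID:  ")
# ELASTIC_PASSWORD = get_secret("ELASTIC_PASSWORD", "Enter Elastic password:  ")
```

Environment variables can leak through process listings or shell history, so this trades some safety for convenience; `getpass` alone remains the more cautious option.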

Confirm that the client has connected with this test.

In [4]:
print(client.info())

{'name': 'instance-0000000001', 'cluster_name': '9dd1e5c0b0d64796b8cf0746cf63d734', 'cluster_uuid': 'VeYvw6JhQcC3P-Q1-L9P_w', 'version': {'number': '8.9.0-SNAPSHOT', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': 'ac7d79178c3e57c935358453331efe9e9cc5104d', 'build_date': '2023-06-21T09:08:25.219504984Z', 'build_snapshot': True, 'lucene_version': '9.7.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0', 'transport_version': '8500019'}, 'tagline': 'You Know, for Search'}
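The `version.number` field in this response reports the cluster's version. Elastic's language clients are forward compatible with newer minor versions, so a common rule of thumb is to keep the client's major version equal to the cluster's. A small sketch of that check, using hypothetical helper names `major_version` and `same_major`:

```python
def major_version(version_number: str) -> int:
    """Extract the major version from strings such as '8.9.0' or '8.9.0-SNAPSHOT'."""
    return int(version_number.split(".", 1)[0])


def same_major(client_version: str, server_version: str) -> bool:
    """Return True when the client and the cluster share a major version."""
    return major_version(client_version) == major_version(server_version)


# In the notebook, compare the installed client package with the connected cluster:
# from importlib.metadata import version
# same_major(version("elasticsearch"), client.info()["version"]["number"])
```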


Refer to [Connecting to a self-managed cluster](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#connect-self-managed-new) to learn how to connect to a self-managed deployment.

To authenticate with API keys instead of a password, see the [Connecting](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html) page of the Python client documentation.

## Indexing a single document

Let's start by indexing a single document.
To index a document, you specify three pieces of information:
- the Elasticsearch `index` to index the document into
- the document's `id` (optional): if you don't specify one, Elasticsearch generates a unique ID for you
- the document itself (here we store this as a Python dictionary named `doc`)

In [15]:
doc = {
    'author': 'john_smith',
    'text': "This is a lovely document, but it's a bit short.",
    'timestamp': datetime.now(),
}
resp = client.index(index="test-index", id=1, document=doc)
print("Document: " + resp["result"])

Document: created
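If you leave out `id`, the generated ID comes back in the `_id` field of the response, and you can pass it to `get()` to read the document back. A small sketch with hypothetical helper names, assuming the `client` created above:

```python
def index_with_generated_id(client, index: str, doc: dict) -> str:
    """Index doc without an explicit id and return the id Elasticsearch generated."""
    resp = client.index(index=index, document=doc)
    # The response's _id field holds the generated identifier
    return resp["_id"]


def fetch_source(client, index: str, doc_id: str) -> dict:
    """Retrieve a stored document by id and return its original JSON body (_source)."""
    return client.get(index=index, id=doc_id)["_source"]
```

For example, `new_id = index_with_generated_id(client, "test-index", {"author": "jane_doe", "text": "No id supplied."})` followed by `fetch_source(client, "test-index", new_id)` should return the dictionary you indexed.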


### Updating a document

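If you index a document with an id that already exists, Elasticsearch replaces the existing document with the new one and increments its version. The response's `result` field is then `updated` instead of `created`.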

In [9]:
doc = {
    'author': 'john_smith',
    'text': "This is a lovely document, and now it's a little bit longer which is great.",
    'timestamp': datetime.now(),
}
resp = client.index(index="test-index", id=1, document=doc)
print("Document: " + resp["result"])

Document: updated
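Re-indexing replaces the whole document. To change only some fields, the client's `update()` method accepts a partial document in its `doc` argument and merges it into the stored one. A sketch, with `update_fields` as a hypothetical helper name:

```python
def update_fields(client, index: str, doc_id, fields: dict) -> str:
    """Merge fields into an existing document instead of replacing it wholesale."""
    resp = client.update(index=index, id=doc_id, doc=fields)
    # "updated" if something changed, "noop" if the new values equal the stored ones
    return resp["result"]
```

For example, `update_fields(client, "test-index", 1, {"text": "Edited in place."})` changes `text` and leaves `author` and `timestamp` untouched. Run it before the delete cell below, since `update()` fails on a document that doesn't exist.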


### Deleting a document

You can delete a document by specifying its `index` and `id` in the `delete()` method:

In [7]:
resp = client.delete(index="test-index", id=1)
print("Document: " + resp["result"])

Document: deleted
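Deleting an id that doesn't exist makes the client raise `NotFoundError`. If you'd rather treat a missing document as already deleted, one option is `client.options(ignore_status=404)`, which returns the 404 response body instead of raising. A sketch with a hypothetical helper name:

```python
def delete_document(client, index: str, doc_id) -> str:
    """Delete a document, treating an already-missing id as a normal outcome."""
    # Without ignore_status, a missing id raises elasticsearch.NotFoundError
    resp = client.options(ignore_status=404).delete(index=index, id=doc_id)
    # "deleted", or "not_found" if no document had this id
    return resp["result"]
```

Running `delete_document(client, "test-index", 1)` after the cell above should return `"not_found"`, since that document is already gone.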


## Indexing with the bulk API

You can also index multiple documents at once using the [bulk API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html).
We recommend the bulk API wherever possible: it combines many operations into a single request, which cuts per-request overhead and typically gives much better throughput than indexing documents one at a time.
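At the request level, a bulk body is a list that alternates action entries (such as `{"index": {...}}`) with document bodies, and the response reports success or failure per item rather than for the request as a whole. A sketch of building that list for the client's `bulk(operations=...)` method and picking out failed items; the helper names are hypothetical and the code only handles `index` actions:

```python
def to_bulk_operations(index: str, docs: dict) -> list:
    """Turn {id: document} into the alternating action/source list the bulk API expects."""
    operations = []
    for doc_id, doc in docs.items():
        operations.append({"index": {"_index": index, "_id": doc_id}})
        operations.append(doc)
    return operations


def failed_ids(bulk_response) -> list:
    """Return the ids of items that failed; assumes every action was an "index" action."""
    if not bulk_response["errors"]:
        return []
    return [
        item["index"]["_id"]
        for item in bulk_response["items"]
        if "error" in item["index"]
    ]
```

For example, `resp = client.bulk(operations=to_bulk_operations("test-index", {"2": doc, "3": doc}))` followed by `failed_ids(resp)`. Checking per-item errors matters because a bulk request can succeed overall while individual documents are rejected.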

Here's an example of indexing multiple documents using the bulk API.
We have some test data in a JSON file at this [URL](https://raw.githubusercontent.com/leemthompo/notebook-tests/main/12-movies.json).
Let's load that into our Elastic deployment.
First we'll create an index named `movies` to store that data.

In [10]:
client.indices.create(
    index="movies",
    mappings={
        "properties": {
            "genre": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "keyScene": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "plot": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "released": {
                "type": "integer"
            },
            "runtime": {
                "type": "integer"
            },
            "title": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            }
        }
    },
)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'movies'})
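Four of the six fields share the same definition: a `text` field for full-text search plus a `keyword` sub-field (reachable as, for example, `title.keyword`) for exact matches, sorting, and aggregations, with `ignore_above: 256` skipping the keyword sub-field for values longer than 256 characters. If you prefer not to repeat that JSON, a sketch like the following (hypothetical helper names) builds an equivalent mapping:

```python
def text_with_keyword(ignore_above: int = 256) -> dict:
    """A text field with a keyword sub-field, as used for genre, keyScene, plot and title."""
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": ignore_above}},
    }


def movies_mappings() -> dict:
    """Build the same mapping as the movies index above without repeating the JSON."""
    properties = {name: text_with_keyword() for name in ("genre", "keyScene", "plot", "title")}
    properties["released"] = {"type": "integer"}
    properties["runtime"] = {"type": "integer"}
    return {"properties": properties}
```

Calling `client.indices.create(index="movies", mappings=movies_mappings())` would create the same index as the cell above.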

Let's upload the JSON data.
The dataset provides information on twelve films.
Each film's entry includes its title, runtime, plot summary, a key scene, genre classification, and release year.

In [16]:
import json
from urllib.error import URLError
from urllib.request import urlopen

from elasticsearch import ApiError, helpers
from elasticsearch.helpers import BulkIndexError

url = "https://raw.githubusercontent.com/leemthompo/notebook-tests/main/12-movies.json"


def create_index_body(doc):
    """Wrap a movie record in a bulk action that targets the `movies` index."""
    return {
        "_index": "movies",
        "_source": doc,
    }


try:
    # Send a request to the URL and parse the response body as JSON
    with urlopen(url) as response:
        data_json = json.loads(response.read())
except URLError as url_e:
    print(f"Error fetching data from URL: {url_e}")
except json.JSONDecodeError as json_e:
    print(f"Error decoding JSON data: {json_e}")
else:
    # Prepare one bulk action per movie
    documents = [create_index_body(doc) for doc in data_json]

    try:
        # helpers.bulk batches the actions into bulk requests
        helpers.bulk(client, documents)
        print("Done indexing documents into index!")
    except BulkIndexError as bulk_e:
        # Raised when one or more documents were rejected
        print(f"{len(bulk_e.errors)} document(s) failed to index: {bulk_e.errors}")
    except ApiError as es_e:
        print(f"Elasticsearch error: {es_e}")


Done indexing documents into index!
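To check the upload, count the documents in the index. Newly indexed documents only become visible to search after a refresh (by default roughly once per second), so the sketch below forces one first. The helper name `count_documents` is hypothetical:

```python
def count_documents(client, index: str) -> int:
    """Return how many documents are searchable in index."""
    # Force a refresh so documents from the bulk upload are visible to the count
    client.indices.refresh(index=index)
    return client.count(index=index)["count"]
```

`count_documents(client, "movies")` should return 12 if every film in the dataset was indexed.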


## Deleting an index

When you've finished experimenting, delete the `movies` index to clean up.

In [14]:
index_name = 'movies'

# Delete the index
try:
    client.indices.delete(index=index_name)
    print(f'Successfully deleted index: {index_name}')
except Exception as e:
    print(f'Error deleting index: {index_name}, error: {str(e)}')

Successfully deleted index: movies
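Running the `indices.create` cell twice fails, because Elasticsearch rejects creating an index that already exists (`resource_already_exists_exception`). One way to make the notebook safe to re-run is to drop and recreate the index in one step. A sketch, with `recreate_index` as a hypothetical helper:

```python
def recreate_index(client, index: str, mappings: dict) -> None:
    """Drop index if it exists, then create it again with the given mappings."""
    # indices.exists returns a response that is truthy only when the index exists
    if client.indices.exists(index=index):
        client.indices.delete(index=index)
    # On its own, creating an existing index would raise BadRequestError
    client.indices.create(index=index, mappings=mappings)
```

Pass it the same mappings dictionary used when creating `movies`. Note that this deletes all documents in the index, so reserve it for test data like this.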
