# Elasticsearch database

Interact with an Elasticsearch analytics database via LangChain. This chain has the LLM build search queries in the Elasticsearch query DSL (filters and aggregations) and runs them through the Elasticsearch client.
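For orientation, a request body combining a filter with an aggregation might look like the sketch below. It is written by hand, not produced by the chain, and the field names `region` and `account_id` are assumptions used only for illustration.

```python
import json

# Hypothetical field names; replace them with fields from your own mapping.
query_body = {
    "size": 0,  # return no hits, only the aggregation results
    "query": {
        "bool": {
            # Filter context: documents either match or they do not (no scoring).
            "filter": [{"term": {"region": "eu_west_3"}}]
        }
    },
    "aggs": {
        # Terms aggregation: bucket documents by account_id, keep the 10 largest buckets.
        "tenants": {"terms": {"field": "account_id", "size": 10}}
    },
}

print(json.dumps(query_body, indent=2))
```

A body of this shape can be passed to the client's search call for a given index.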

The Elasticsearch client must have permissions for index listing, mapping description and search queries.
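A quick way to confirm these permissions before building the chain is to try each operation once. The helper below is a minimal sketch: `check_required_access` is a hypothetical name, and it assumes the chain needs access equivalent to the `cat.indices`, `indices.get_mapping` and `search` calls of the Python client.

```python
def check_required_access(client, index):
    """Try the three operations once and report which ones succeed.

    `client` is expected to behave like an elasticsearch.Elasticsearch instance.
    Returns a dict mapping an operation name to True (worked) or False (raised).
    """
    checks = {
        "list_indices": lambda: client.cat.indices(format="json"),
        "get_mapping": lambda: client.indices.get_mapping(index=index),
        "search": lambda: client.search(
            index=index, query={"match_all": {}}, size=1
        ),
    }
    results = {}
    for name, call in checks.items():
        try:
            call()
            results[name] = True
        except Exception:
            # Any failure (authorization, missing index, connection) counts as "no access".
            results[name] = False
    return results
```

Calling `check_required_access(client, "my-index")` with an index name of your own returns something like `{"list_indices": True, "get_mapping": True, "search": False}` when the search permission is missing.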

In [None]:
from elasticsearch import Elasticsearch

from langchain.chains.elasticsearch_database import ElasticsearchDatabaseChain
from langchain.chat_models import ChatOpenAI

In [None]:
# Initialize the Elasticsearch Python client.
# See https://elasticsearch-py.readthedocs.io/en/v8.8.2/api.html#elasticsearch.Elasticsearch
client = Elasticsearch("http://user:pass@localhost:9200")

In [None]:
llm = ChatOpenAI(temperature=0)
chain = ElasticsearchDatabaseChain.from_llm(llm=llm, database=client)

In [None]:
question = "What are the 10 biggest tenants in the database?"
inputs = {
    "question": question,
}
chain(inputs)

## Custom prompt

You can also customize the prompt. The example below tells the LLM that "tenant" refers to the "account_id" index property.

In [None]:
from langchain.chains.elasticsearch_database.prompts import DEFAULT_DSL_TEMPLATE
from langchain.prompts.prompt import PromptTemplate

PROMPT_SUFFIX = """Only use the following Elasticsearch indices:
{indices_info}

If someone asks for the property "tenant", they really mean the "account_id" property.

Question: {input}"""

PROMPT = PromptTemplate(
    input_variables=["input", "indices_info", "top_k"],
    template=DEFAULT_DSL_TEMPLATE + PROMPT_SUFFIX,
)


In [None]:
chain = ElasticsearchDatabaseChain.from_llm(
    llm=ChatOpenAI(temperature=0),
    database=client,
    prompt=PROMPT,
)
chain.run("How many tenants are localized in eu_west_3?")
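To see how the three input variables (`input`, `indices_info`, `top_k`) end up in the final prompt text, the sketch below fills a template with plain `str.format`, which is roughly what the prompt template does with its default formatting. `STAND_IN_TEMPLATE` is a made-up placeholder for `DEFAULT_DSL_TEMPLATE`, and the `indices_info` value is invented for illustration.

```python
# Made-up stand-in for DEFAULT_DSL_TEMPLATE; the real text ships with LangChain.
STAND_IN_TEMPLATE = (
    "Write an Elasticsearch DSL query returning at most {top_k} results.\n\n"
)

PROMPT_SUFFIX = """Only use the following Elasticsearch indices:
{indices_info}

If someone asks for the property "tenant", they really mean the "account_id" property.

Question: {input}"""

template = STAND_IN_TEMPLATE + PROMPT_SUFFIX

# Invented values; the chain supplies the real ones at run time.
rendered = template.format(
    top_k=10,
    indices_info='Mapping for index customers: {"account_id": {"type": "keyword"}}',
    input="How many tenants are localized in eu_west_3?",
)
print(rendered)
```

Because the suffix is plain text appended to the template, any domain hint placed there is sent to the LLM with every question.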

## Adding example rows from each index

Sometimes the format of the data is not obvious from the mapping alone. In that case it helps to include a few sample documents from each index in the prompt, so the LLM can see what the data looks like before it writes a query. Here we use this feature to show the LLM that artists are stored under their full names by including ten documents from the index.

In [None]:
chain = ElasticsearchDatabaseChain.from_llm(
    llm=ChatOpenAI(temperature=0),
    database=client,
    include_indices=["artists"],  # include only one index to save tokens in the prompt
    sample_documents_in_index_info=10,  # include 10 sample documents from each index in the prompt
)
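The sketch below illustrates the idea of enriching the index description with sample documents. It is not the chain's actual implementation: `build_index_info` is a hypothetical helper, and the mapping and documents are invented.

```python
import json


def build_index_info(index, mapping, sample_docs, sample_size):
    """Hypothetical helper: describe an index and append up to sample_size documents."""
    info = f"Mapping for index {index}:\n{json.dumps(mapping)}"
    if sample_size > 0:
        docs = sample_docs[:sample_size]
        info += "\n\nSample documents:\n" + "\n".join(json.dumps(d) for d in docs)
    return info


# Invented data for illustration only.
mapping = {"properties": {"name": {"type": "text"}}}
docs = [{"name": "Ludwig van Beethoven"}, {"name": "Johann Sebastian Bach"}]

print(build_index_info("artists", mapping, docs, sample_size=10))
```

With the samples in the prompt, the LLM can see that a name is stored in full rather than as a surname only. The trade-off is prompt length: each sample document adds tokens.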