# RAG with Elastic and Llama3 using LlamaIndex

This interactive notebook uses `LlamaIndex` to process fictional workplace documents. `Llama3`, running locally through `Ollama`, converts these documents into embeddings, which are stored in `Elasticsearch`. We then ask a question, retrieve the relevant documents from `Elasticsearch`, and use `Llama3` to generate a response.

**_Note_** : _Llama3 is expected to be running using `Ollama` on the same machine where you will be running this notebook._

## Requirements

For this example, you will need:

- An Elastic deployment
  - We'll be using [Elastic Cloud](https://www.elastic.co/guide/en/cloud/current/ec-getting-started.html) for this example (available with a [free trial](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook))
- For the LLM, we will be using [Ollama](https://ollama.com/) and [Llama3](https://ollama.com/library/llama3) configured locally.
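
Since the notebook depends on a local Ollama server, it can help to check that the server is reachable and has the model before running anything else. The following is a minimal sketch, not part of the original notebook. It assumes Ollama listens on its default address `http://localhost:11434` and that its `/api/tags` endpoint lists installed models under a `models` key; the helper names `has_model` and `ollama_ready` are hypothetical.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen

# Assumption: Ollama listens on its default local address.
OLLAMA_URL = "http://localhost:11434"


def has_model(tags_payload: dict, model: str) -> bool:
    """Return True if `model` appears in an Ollama /api/tags response."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    # Ollama reports names with a tag suffix, e.g. "llama3:latest".
    return any(n == model or n.split(":")[0] == model for n in names)


def ollama_ready(model: str = "llama3", base_url: str = OLLAMA_URL) -> bool:
    """Return True if the server answers and lists `model`, False otherwise."""
    try:
        with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            return has_model(json.loads(resp.read()), model)
    except (URLError, OSError, ValueError):
        # Server not running, connection refused, or unexpected payload.
        return False
```

Calling `ollama_ready()` in a cell returns `False` rather than raising when the server is down, so it can be used as a guard before the ingestion steps.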

### Use Elastic Cloud

If you don't have an Elastic Cloud deployment, follow these steps to create one.

1. Go to [Elastic Cloud registration](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook) and sign up for a free trial
2. Select **Create Deployment** and follow the instructions

## Install required dependencies for LlamaIndex and Elasticsearch

First we install the packages we need for this example.

In [None]:
# !pip install llama-index llama-index-cli llama-index-core llama-index-embeddings-elasticsearch llama-index-embeddings-ollama llama-index-legacy llama-index-llms-ollama llama-index-readers-elasticsearch llama-index-readers-file llama-index-vector-stores-elasticsearch llamaindex-py-client

## Import packages
Next we import the required packages. Later sections import any additional packages in the cells where they are needed.

In [1]:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.vector_stores.elasticsearch import ElasticsearchStore
from llama_index.core import VectorStoreIndex, QueryBundle
from llama_index.llms.ollama import Ollama
from llama_index.core import Document, Settings
from getpass import getpass
from urllib.request import urlopen
import json



In [3]:
# Prompt for credentials instead of hardcoding them in the notebook.

# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id
ELASTIC_CLOUD_ID = getpass("Elastic Cloud ID: ")

# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key
ELASTIC_API_KEY = getpass("Elastic Api Key: ")
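
Credentials are best kept out of the notebook file itself. As a minimal sketch (not part of the original notebook), the helper below reads a value from an environment variable and falls back to an interactive prompt. The function name `read_secret` and the environment variable names are assumptions chosen for illustration.

```python
import os
from getpass import getpass


def read_secret(env_name, prompt, env=os.environ, ask=getpass):
    """Return the value of `env_name` if set and non-empty, otherwise prompt for it."""
    value = env.get(env_name, "").strip()
    return value if value else ask(prompt)


# Usage (environment variable names are assumptions):
# ELASTIC_CLOUD_ID = read_secret("ELASTIC_CLOUD_ID", "Elastic Cloud ID: ")
# ELASTIC_API_KEY = read_secret("ELASTIC_API_KEY", "Elastic Api Key: ")
```

Passing `env` and `ask` as parameters keeps the helper easy to test without touching the real environment or blocking on input.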


## Prepare documents for chunking and ingestion
We now convert the data to the [Document](https://docs.llamaindex.ai/en/stable/module_guides/loading/documents_and_nodes/) type so that it can be processed with [LlamaIndex](https://docs.llamaindex.ai/en/stable/).

In [None]:
url = "https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/datasets/workplace-documents.json"

response = urlopen(url)
workplace_docs = json.loads(response.read())

# Building Document required by LlamaIndex.
documents = [
    Document(
        text=doc["content"],
        metadata={
            "name": doc["name"],
            "summary": doc["summary"],
            "rolePermissions": doc["rolePermissions"],
        },
    )
    for doc in workplace_docs
]
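
The list comprehension above raises a `KeyError` if any record lacks one of the fields it reads. A minimal sketch of a pre-check is shown below; it uses only the four keys the cell accesses, and the helper name `find_incomplete` and the sample records are hypothetical, not taken from the real dataset.

```python
REQUIRED_KEYS = ("content", "name", "summary", "rolePermissions")


def find_incomplete(records):
    """Return (index, missing_keys) pairs for records lacking a required key."""
    problems = []
    for i, record in enumerate(records):
        missing = [k for k in REQUIRED_KEYS if k not in record]
        if missing:
            problems.append((i, missing))
    return problems


# Hypothetical sample records, not taken from the real dataset.
sample = [
    {"content": "text", "name": "A", "summary": "s", "rolePermissions": ["demo"]},
    {"content": "text", "name": "B"},
]
print(find_incomplete(sample))  # [(1, ['summary', 'rolePermissions'])]
```

Running `find_incomplete(workplace_docs)` before building the `Document` list would surface malformed records with their positions instead of failing on the first one.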

## Define Elasticsearch and ingest pipeline in LlamaIndex for document processing. Use Llama3 for generating embeddings.
We now define the `ElasticsearchStore` with the required index name, the text field, and the vector field that holds its embeddings. We use `Llama3` to generate the embeddings. We will run semantic search on the index to find documents relevant to the user's query. We use the `SentenceSplitter` provided by `LlamaIndex` to chunk the documents. All of this runs as part of an `IngestionPipeline` provided by the `LlamaIndex` framework.

In [None]:
es_vector_store = ElasticsearchStore(
    index_name="workplace_index",
    vector_field="content_vector",
    text_field="content",
    es_cloud_id=ELASTIC_CLOUD_ID,
    es_api_key=ELASTIC_API_KEY,
)


In [None]:
# Embedding Model to do local embedding using Ollama.
ollama_embedding = OllamaEmbedding("llama3")

In [None]:
# LlamaIndex pipeline configured to take care of chunking, embedding,
# and storing the embeddings in the vector store.
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=100),
        ollama_embedding,
    ],
    vector_store=es_vector_store,
)
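
To make the `chunk_size=512, chunk_overlap=100` setting concrete, the sketch below shows only the size and overlap arithmetic with a plain sliding window over a token list. This is a simplification and an assumption for illustration: `SentenceSplitter` prefers to break at sentence boundaries, so its real chunks will differ. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(tokens, chunk_size=512, chunk_overlap=100):
    """Split `tokens` into windows of `chunk_size` that share `chunk_overlap` tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks


tokens = list(range(1000))  # stand-in for a tokenized document
chunks = sliding_chunks(tokens)
print([(c[0], c[-1]) for c in chunks])  # [(0, 511), (412, 923), (824, 999)]
```

Each window starts 412 tokens after the previous one, so consecutive chunks share 100 tokens of context.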

## Execute pipeline 
This will chunk the data, generate embeddings using `Llama3`, and ingest the chunks into the `Elasticsearch` index, with the embeddings stored in a `dense_vector` field.

In [None]:
pipeline.run(show_progress=True, documents=documents)

The embeddings are stored in a dense vector field of dimension `4096`. The dimension size comes from the size of the embeddings generated from `Llama3`.
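
A dimension of `4096` has a direct storage cost. As a rough worked example, assuming each vector component is stored as a 32-bit float and ignoring index structures, replicas, and any quantization, the raw size can be estimated as follows (the function name `raw_vector_bytes` is hypothetical):

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_chunks, dims=4096):
    """Raw size of `num_chunks` float32 vectors, ignoring index overhead."""
    return num_chunks * dims * BYTES_PER_FLOAT32


print(raw_vector_bytes(1))             # 16384 bytes, i.e. 16 KiB per chunk
print(raw_vector_bytes(1000) / 2**20)  # 15.625 MiB for 1,000 chunks
```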

## Define LLM settings. 
This connects to your local LLM. Refer to https://ollama.com/library/llama3 for the steps to run Llama3 locally.

_If you have sufficient resources (more than 64 GB of RAM and a GPU), you could try the 70B parameter version of Llama3._

In [None]:
Settings.embed_model = ollama_embedding
local_llm = Ollama(model="llama3")

### Set up semantic search and integrate with Llama3
We now configure `Elasticsearch` as the vector store for the `LlamaIndex` query engine. The query engine, using `Llama3`, then answers your questions with contextually relevant data from `Elasticsearch`.

In [None]:
index = VectorStoreIndex.from_vector_store(es_vector_store)
query_engine = index.as_query_engine(local_llm, similarity_top_k=10)

# Customer Query
query = "What are the organizations sales goals?"
bundle = QueryBundle(
    query_str=query, embedding=Settings.embed_model.get_query_embedding(query=query)
)

response = query_engine.query(bundle)

print(response.response)
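
The retrieval step behind `similarity_top_k=10` can be illustrated with a brute-force version in plain Python. This is only a sketch under stated assumptions: it assumes cosine similarity as the distance measure and scans every vector, whereas Elasticsearch uses its own (typically approximate) kNN search. The toy vectors and the function names `cosine` and `top_k` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, docs, k=10):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Toy 3-dimensional vectors; the real embeddings here have 4096 dimensions.
docs = [
    ("sales goals", [0.9, 0.1, 0.0]),
    ("vacation policy", [0.0, 0.2, 0.9]),
    ("revenue targets", [0.8, 0.3, 0.1]),
]
for score, text in top_k([1.0, 0.0, 0.0], docs, k=2):
    print(f"{score:.3f}  {text}")
```

The chunks returned by this step are what the query engine passes to the LLM as context for the answer.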

_You could now try experimenting with other questions._

## LangChain

In [1]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_elasticsearch import ElasticsearchStore
from langchain_community.llms import Ollama
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain_elasticsearch import SparseVectorStrategy
from getpass import getpass
from urllib.request import urlopen
import json



In [2]:
url = "https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/datasets/workplace-documents.json"

response = urlopen(url)
workplace_docs = json.loads(response.read())
metadata = []
content = []
for doc in workplace_docs:
    content.append(doc["content"])
    metadata.append(
        {
            "name": doc["name"],
            "summary": doc["summary"],
            "rolePermissions": doc["rolePermissions"],
        }
    )
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=512, chunk_overlap=256
)
docs = text_splitter.create_documents(content, metadatas=metadata)

In [4]:
es_vector_store = ElasticsearchStore(
    es_cloud_id=ELASTIC_CLOUD_ID,
    es_api_key=ELASTIC_API_KEY,
    index_name="workplace_index_elser",
    strategy=SparseVectorStrategy(
        model_id=".elser_model_2_linux-x86_64"
    )
)

In [6]:
es_vector_store.add_documents(documents=docs)

['3e62dd9a-0718-456e-8ba7-65b06400aa32',
 '42665a9d-a301-4df6-a88c-b16013674742',
 '27c12488-63c6-4346-8446-5c6982c91351',
 '0cc6b9e0-b22e-474c-a141-cd96d52bd585',
 'bbaebfe6-7d84-400b-8d78-afa6f81a8621',
 'bbd5b76d-7569-4783-ae09-af545e5f8734',
 '2273b1d7-7d13-49f3-a102-d86313009499',
 '0279e137-fdee-4005-a767-bf5493adb601',
 'a3d0a268-822e-4d3d-a2f8-e3fc7bd2e2c9',
 '3ad3e6b0-b730-40a1-9623-800c3c1ff735',
 'fe7a98d7-39ba-4106-af9b-ed438e129470',
 '5205ffda-0145-486c-a37c-b16b7b231487',
 'f413b0d1-e792-45b4-8fd9-f0fec42b8931',
 '4f382734-8059-4bd4-9a2e-4d8c665d3370',
 '53e57a7b-9e78-4bed-9613-c493ce8b9800',
 'b93034d5-b880-4a4c-a7f1-0fd9405683be',
 '29823853-9396-4743-a6ac-7db6e09a1ea1',
 '4e5e7ce1-f3e5-400e-80a2-ff7c9072f6df',
 '06a85a83-56fd-41bf-8141-6a66c21ac069',
 'bf33bf25-7c5d-40de-a830-e5749dcd80f2',
 '37bdbcf6-b29a-49d8-ae69-d4123fbfdb93',
 '5884e536-7ca8-49b5-aa99-b3b61b1f15d3',
 'f0ca9c18-b855-4aa2-ae07-85dd744e2b51',
 '812b16b7-d826-424a-a86f-21da3fb14c27',
 '2bced6fe-9422-

In [5]:
llm = Ollama(model="llama3")
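
The next cell builds a chain that retrieves chunks, joins them into a context string, and fills a prompt template before calling the LLM. The prompt-assembly part can be sketched in plain Python without LangChain. This is an illustration only: `TEMPLATE` is a tidied version of the notebook's template, and `build_prompt` and the sample chunks are hypothetical.

```python
TEMPLATE = (
    "Answer the question based only on the following context:\n\n"
    "{context}\n\n"
    "Question: {question}\n"
)


def build_prompt(chunks, question):
    """Join retrieved chunks and substitute them into the prompt template."""
    context = "\n\n".join(chunks)
    return TEMPLATE.format(context=context, question=question)


# Hypothetical chunks standing in for retriever results.
print(build_prompt(["Chunk one.", "Chunk two."], "What are the sales goals?"))
```

Printing the assembled prompt like this is a simple way to inspect exactly what context the model receives for a given question.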

In [6]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

retriever = es_vector_store.as_retriever()
template = """Answer the question based only on the following context:\n

                {context}
                
                Question: {question}
               """
prompt = ChatPromptTemplate.from_template(template)
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

chain.invoke("What are the organizations sales goals?")


"According to the context, the organization's sales goals for fiscal year 2024 are:\n\n1. Increase revenue by 20% compared to fiscal year 2023.\n2. Expand market share in key segments by 15%.\n3. Retain 95% of existing customers and increase customer satisfaction ratings.\n4. Launch at least two new products or services in high-demand market segments."

In [7]:
chain.invoke("What are the expectations from new employees?")


"According to the onboarding guide, the expectations from new employees are:\n\n1. Attend orientation within your first week: Meet your colleagues and learn more about our company's history, mission, and values.\n\n2. Review policies and procedures: Familiarize yourself with our employee handbook, which contains important information about our policies and procedures. Please read it thoroughly and adhere to the guidelines.\n\n3. Complete required training sessions: Attend mandatory training sessions, such as safety training or anti-harassment training, as soon as possible.\n\n4. Updating Tax Elections and Documents: Complete tax forms, submit regional tax forms (if necessary), and update your address with the HR department if you move.\n\n5. Benefits Enrollment: Review benefits options, complete enrollment forms within 30 days of your start date, and designate beneficiaries for life insurance and retirement plans (if applicable).\n\n6. Getting Settled in Your Workspace: Set up your wor