# RAG with Elastic and Llama3 using LlamaIndex

This interactive notebook uses `LlamaIndex` to process fictional workplace documents. It uses `Llama3`, running locally with `Ollama`, to turn these documents into embeddings, and stores the embeddings in `Elasticsearch`. We then ask a question, retrieve the relevant documents from `Elasticsearch`, and use `Llama3` to generate a response.

**_Note_**: _Llama3 is expected to be running in `Ollama` on the same machine where you run this notebook._
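Before running the cells below, it can help to confirm that Ollama is reachable and that a `llama3` model has been pulled. The sketch below assumes Ollama's default local address `http://localhost:11434` and its `/api/tags` endpoint, which lists the models available locally; the helper names `has_model` and `ollama_has_model` are hypothetical, and you should adjust the URL if your setup differs.

```python
import json
from urllib.request import urlopen

# Assumption: Ollama listens on its default local address.
OLLAMA_URL = "http://localhost:11434"


def has_model(tags_payload: dict, model: str) -> bool:
    """Return True if `model` appears in an /api/tags response.

    Ollama reports names with a tag suffix such as "llama3:latest",
    so only the part before the colon is compared (any tag counts).
    """
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    return any(name.split(":")[0] == model for name in names)


def ollama_has_model(model: str = "llama3", base_url: str = OLLAMA_URL) -> bool:
    """Ask the local Ollama server for its models; False if it is unreachable."""
    try:
        with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            return has_model(json.loads(resp.read()), model)
    except (OSError, ValueError):
        # OSError covers connection failures and timeouts; ValueError covers bad JSON.
        return False
```

In the notebook, `print(ollama_has_model("llama3"))` should print `True` once the model has been pulled and Ollama is running.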

## Requirements

For this example, you will need:

- An Elastic deployment
  - We'll be using [Elastic Cloud](https://www.elastic.co/guide/en/cloud/current/ec-getting-started.html) for this example (available with a [free trial](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook))
- A local LLM
  - We'll be using [Ollama](https://ollama.com/) with [Llama3](https://ollama.com/library/llama3) running locally

### Use Elastic Cloud

If you don't have an Elastic Cloud deployment, follow these steps to create one.

1. Go to [Elastic Cloud registration](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook) and sign up for a free trial
2. Select **Create Deployment** and follow the instructions

## Install required dependencies for LlamaIndex and Elasticsearch

First we install the packages we need for this example.

In [1]:
# !pip install llama-index llama-index-cli llama-index-core llama-index-embeddings-elasticsearch llama-index-embeddings-ollama llama-index-legacy llama-index-llms-ollama llama-index-readers-elasticsearch llama-index-readers-file llama-index-vector-stores-elasticsearch llamaindex-py-client


## Import packages
Next we import the required packages. Some imports are repeated in later cells so that each cell shows what it depends on.

In [2]:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.vector_stores.elasticsearch import ElasticsearchStore
from llama_index.core import VectorStoreIndex, QueryBundle
from llama_index.llms.ollama import Ollama
from llama_index.core import Document, Settings
from getpass import getpass
from urllib.request import urlopen
import json

In [2]:
from getpass import getpass

# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id
ELASTIC_CLOUD_ID = getpass("Elastic Cloud ID: ")

# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key
ELASTIC_API_KEY = getpass("Elastic API key: ")
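Rather than typing credentials into a prompt on every run or pasting them into the notebook, you can read them from environment variables and prompt only when they are missing. This is a minimal sketch; the helper `get_secret` and the variable names `ELASTIC_CLOUD_ID` and `ELASTIC_API_KEY` are assumptions, not something the notebook or Elastic requires.

```python
import os
from getpass import getpass


def get_secret(env_var: str, prompt: str) -> str:
    """Return the value of `env_var`, prompting (without echo) only if it is unset."""
    value = os.environ.get(env_var)
    if value:
        return value
    # Fall back to an interactive prompt that does not echo the input.
    return getpass(prompt)
```

In the notebook you would then write `ELASTIC_CLOUD_ID = get_secret("ELASTIC_CLOUD_ID", "Elastic Cloud ID: ")` and `ELASTIC_API_KEY = get_secret("ELASTIC_API_KEY", "Elastic API key: ")`, which keeps keys out of the saved notebook.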


## Prepare documents for chunking and ingestion
We now convert the data into [Document](https://docs.llamaindex.ai/en/stable/module_guides/loading/documents_and_nodes/) objects for processing with [LlamaIndex](https://docs.llamaindex.ai/en/stable/).

In [8]:
url = "https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/datasets/workplace-documents.json"

response = urlopen(url)
workplace_docs = json.loads(response.read())

# Build the Document objects that LlamaIndex expects.
documents = [
    Document(
        text=doc["content"],
        metadata={
            "name": doc["name"],
            "summary": doc["summary"],
            "rolePermissions": doc["rolePermissions"],
        },
    )
    for doc in workplace_docs
]
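The list comprehension above raises a `KeyError` if any record lacks one of the four fields it reads. A plain-Python sketch of a more defensive mapping, which skips incomplete records instead, is shown below; `split_record` is a hypothetical helper and the sample records are made up for illustration. Each returned `(text, metadata)` pair could then be passed to `Document(text=..., metadata=...)`.

```python
from typing import Optional, Tuple

# Metadata fields the notebook reads from each record in workplace-documents.json.
METADATA_KEYS = ("name", "summary", "rolePermissions")


def split_record(doc: dict) -> Optional[Tuple[str, dict]]:
    """Return (text, metadata) for a record, or None if any field is missing."""
    if "content" not in doc or any(key not in doc for key in METADATA_KEYS):
        return None
    return doc["content"], {key: doc[key] for key in METADATA_KEYS}


# Hypothetical sample records, not taken from the real dataset.
records = [
    {
        "content": "Remote work is allowed two days a week.",
        "name": "Work From Home Policy",
        "summary": "Rules for remote work.",
        "rolePermissions": ["demo"],
    },
    {"content": "A draft with no metadata."},
]
pairs = [pair for pair in (split_record(r) for r in records) if pair is not None]
```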

## Define the Elasticsearch vector store and LlamaIndex ingestion pipeline, using Llama3 for embeddings
We now define the `ElasticsearchStore` with the index name, the text field, and the field that holds its embeddings. `Llama3` generates the embeddings, and we will run semantic search on the index to find documents relevant to the user's query. The `SentenceSplitter` provided by `LlamaIndex` chunks the documents, and all of these steps run as part of an `IngestionPipeline` from the `LlamaIndex` framework.

In [9]:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

es_vector_store = ElasticsearchStore(
    index_name="workplace_index",
    vector_field="content_vector",
    text_field="content",
    es_cloud_id=ELASTIC_CLOUD_ID,
    es_api_key=ELASTIC_API_KEY,
)
# Embedding model that runs locally through Ollama.
ollama_embedding = OllamaEmbedding("llama3")
# LlamaIndex pipeline configured to chunk and embed the documents
# and to store the embeddings in the vector store.
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=100),
        ollama_embedding,
    ],
    vector_store=es_vector_store,
)
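`SentenceSplitter(chunk_size=512, chunk_overlap=100)` produces chunks of up to 512 tokens in which consecutive chunks share about 100 tokens, so text near a chunk boundary appears in both neighbours. The sketch below shows the overlap idea as a simple sliding window. It is a simplification, not how `SentenceSplitter` works internally: it counts whitespace-separated words instead of tokens and ignores sentence boundaries, which `SentenceSplitter` tries to respect.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 100) -> List[str]:
    """Split text into windows of `chunk_size` words overlapping by `chunk_overlap` words.

    Simplification: words stand in for tokens and sentence boundaries are ignored.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

With `chunk_size=4` and `chunk_overlap=1`, the words `0 … 9` become `"0 1 2 3"`, `"3 4 5 6"` and `"6 7 8 9"`: each window starts three words after the previous one and repeats its last word.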

## Execute the pipeline
This chunks the data, generates embeddings with `Llama3`, and ingests the chunks into the `Elasticsearch` index, storing the embeddings in a `dense_vector` field.

In [10]:
pipeline.run(show_progress=True, documents=documents)

Parsing nodes:   0%|          | 0/15 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/27 [00:00<?, ?it/s]


The embeddings are stored in a `dense_vector` field with `4096` dimensions, which matches the size of the embeddings that `Llama3` generates.
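For reference, an index mapping with a text field and a `dense_vector` field of this size could look like the dictionary below. This is an illustrative sketch, not necessarily the exact mapping `ElasticsearchStore` creates: the `index: True` and `"similarity": "cosine"` settings are assumptions, and `build_mapping` is a hypothetical helper. The field names match `text_field` and `vector_field` from the ingestion cell.

```python
def build_mapping(dims: int, text_field: str = "content", vector_field: str = "content_vector") -> dict:
    """Illustrative mapping: one full-text field plus one dense_vector field."""
    return {
        "mappings": {
            "properties": {
                text_field: {"type": "text"},
                vector_field: {
                    "type": "dense_vector",
                    "dims": dims,  # must equal the embedding length
                    "index": True,  # assumption: indexed for kNN search
                    "similarity": "cosine",  # assumption: other metrics exist
                },
            }
        }
    }


mapping = build_mapping(4096)
```

Rather than hard-coding `4096`, the dimension can be read from the model itself, for example `len(ollama_embedding.get_text_embedding("test"))`, so the mapping stays correct if you switch embedding models.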

## Define LLM settings
This connects to your local LLM. Refer to https://ollama.com/library/llama3 for the steps to run Llama3 locally.

_If you have sufficient resources (at least 64 GB of RAM and a GPU), you could try the 70B-parameter version of Llama3._
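A rough, weights-only estimate shows why the 70B model needs so much more memory than the 8B one: memory ≈ parameters × bits per parameter / 8 bytes. The sketch below assumes 4-bit quantized weights and ignores the KV cache, activations and runtime overhead, so treat its results as lower bounds; `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes.

    billions of parameters * (bits / 8) bytes per parameter = gigabytes.
    Ignores the KV cache, activations and runtime overhead.
    """
    return params_billion * bits_per_param / 8


small = weight_memory_gb(8, 4)   # 8B model, 4-bit weights
large = weight_memory_gb(70, 4)  # 70B model, 4-bit weights
```

At 4 bits per weight this gives about 4 GB for 8B and 35 GB for 70B; at 16 bits the 70B weights alone would need about 140 GB.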

In [11]:
from llama_index.llms.ollama import Ollama
from llama_index.core import Settings

Settings.embed_model = ollama_embedding
local_llm = Ollama(model="llama3")

### Set up semantic search and integrate with Llama3
We now configure `Elasticsearch` as the vector store for the `LlamaIndex` query engine. The query engine, using `Llama3`, then answers your questions with contextually relevant data retrieved from `Elasticsearch`.
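Conceptually, `similarity_top_k=10` asks the vector store for the ten chunks whose embeddings are most similar to the query embedding. The brute-force sketch below illustrates that ranking with cosine similarity on toy 3-dimensional vectors. It is only an illustration: in this notebook Elasticsearch performs the search server-side (typically with an approximate kNN index), and the helper names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], chunk_vecs: List[Sequence[float]], k: int = 10) -> List[Tuple[int, float]]:
    """Return (chunk index, score) pairs for the k most similar chunks, best first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for 4096-dimensional Llama3 embeddings.
chunks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
best = top_k([1.0, 0.0, 0.0], chunks, k=2)
```

Here `best` ranks chunk 0 (an exact match) first and chunk 2 second; chunk 1, which is orthogonal to the query, is dropped.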

In [12]:
from llama_index.core import VectorStoreIndex, QueryBundle

index = VectorStoreIndex.from_vector_store(es_vector_store)
query_engine = index.as_query_engine(local_llm, similarity_top_k=10)

# Customer Query
query = "What are the organizations sales goals?"
bundle = QueryBundle(
    query_str=query, embedding=Settings.embed_model.get_query_embedding(query=query)
)

response = query_engine.query(bundle)

print(response.response)

According to the "Fy2024 Company Sales Strategy" document, the organization's primary goal is to increase revenue by 20% compared to fiscal year 2023, expand market share in key segments by 15%, retain 95% of existing customers and increase customer satisfaction ratings, and launch at least two new products or services in high-demand market segments.
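Behind `query_engine.query`, LlamaIndex inserts the retrieved chunks into a prompt and sends that prompt to `Llama3`. The sketch below shows the general shape of this step with a hypothetical template and helper `build_prompt`; LlamaIndex's own default templates and response-synthesis modes differ in detail, so this is not the exact prompt the notebook sends.

```python
from typing import List

# Hypothetical template, loosely modelled on a typical RAG question-answering prompt.
PROMPT_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Using only the context above, answer the question.\n"
    "Question: {question}\n"
    "Answer: "
)


def build_prompt(question: str, chunks: List[str], max_chunks: int = 10) -> str:
    """Join up to `max_chunks` retrieved chunks and fill them into the template."""
    context = "\n\n".join(chunks[:max_chunks])
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_prompt("What are the sales goals?", ["Goal A", "Goal B"])
```

To compare with the query engine's answer, you could send such a prompt straight to the model with `local_llm.complete(prompt)`.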


_You could now try experimenting with other questions._