# RAG with Elastic ELSER and Llama3 using LangChain

This interactive notebook uses `LangChain` to process fictional workplace documents and `ELSER v2` running in `Elasticsearch` to transform these documents into embeddings and store them in `Elasticsearch`. We then ask a question, retrieve the relevant documents from `Elasticsearch`, and use `Llama3` running locally via `Ollama` to generate a response.

**_Note_** : _`Llama3` is expected to be running using `Ollama` on the same machine where you will be running this notebook._

## Requirements

For this example, you will need:

- An Elastic deployment
  - We'll be using a local Elasticsearch setup
- An LLM
  - We'll be using [Ollama](https://ollama.com/) and [Llama3](https://ollama.com/library/llama3) configured locally

## Install required dependencies
First we install the packages we need for this example.

In [19]:
!pip3 install langchain langchain-elasticsearch langchain-community tiktoken python-dotenv elasticsearch[async]



## Import packages
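Next we import the required packages. Further imports are placed in the cells where they are first needed. This first cell also loads the connection settings from a `.env` file and connects to `Elasticsearch`.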

In [20]:
from elasticsearch import Elasticsearch, AsyncElasticsearch, helpers
from dotenv import load_dotenv
import os
 
load_dotenv()
 
ES_USER = os.getenv("ES_USER")
ES_PASSWORD = os.getenv("ES_PASSWORD")
ES_ENDPOINT = os.getenv("ES_ENDPOINT")
 
# The URL embeds the credentials, so we avoid printing it.
url = f"https://{ES_USER}:{ES_PASSWORD}@{ES_ENDPOINT}:9200"
 
client = Elasticsearch(url, ca_certs = "./http_ca.crt", verify_certs = True)
# info = await client.info()
print(client.info())

{'name': 'liuxgm.local', 'cluster_name': 'elasticsearch', 'cluster_uuid': 'coIKHIPsTf2_aWWQ8TO4bw', 'version': {'number': '8.14.1', 'build_flavor': 'default', 'build_type': 'tar', 'build_hash': '93a57a1a76f556d8aee6a90d1a95b06187501310', 'build_date': '2024-06-10T23:35:17.114581191Z', 'build_snapshot': False, 'lucene_version': '9.10.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}
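
The connection URL above embeds the username and password directly. A password containing reserved characters such as `@`, `:` or `/` would make such a URL ambiguous. As a minimal sketch, the helper below (a hypothetical `build_es_url`, not part of the notebook) percent-encodes the credentials with the standard library before building the URL. The default port and the example credentials are illustrative assumptions.

```python
from urllib.parse import quote


def build_es_url(user: str, password: str, host: str, port: int = 9200) -> str:
    """Build an HTTPS URL with percent-encoded basic-auth credentials.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    # safe="" forces every reserved character (including "/") to be escaped.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"https://{safe_user}:{safe_password}@{host}:{port}"


# Illustrative credentials only.
example_url = build_es_url("elastic", "p@ss:w/rd", "localhost")
```

Alternatively, the `Elasticsearch` client accepts credentials separately through its `basic_auth=(user, password)` argument, which keeps them out of the URL altogether.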


## Prepare documents for chunking and ingestion
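We now prepare the data to be ingested into `Elasticsearch`. We use `LangChain`'s `RecursiveCharacterTextSplitter` with a `tiktoken` encoder, so chunk sizes are measured in tokens rather than characters: the documents' text is split into chunks of at most 512 tokens with an overlap of 256 tokens.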

In [21]:
# from urllib.request import urlopen
import json
from langchain.text_splitter import RecursiveCharacterTextSplitter


# url = "https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/datasets/workplace-documents.json"

# response = urlopen(url)

# workplace_docs = json.loads(response.read())

# Load data into a JSON object
with open('workplace-documents.json') as f:
   workplace_docs = json.load(f)

metadata = []
content = []
for doc in workplace_docs:
    content.append(doc["content"])
    metadata.append(
        {
            "name": doc["name"],
            "summary": doc["summary"],
            "rolePermissions": doc["rolePermissions"],
        }
    )

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=512, chunk_overlap=256
)
docs = text_splitter.create_documents(content, metadatas=metadata)
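
To see what a chunk size of 512 with an overlap of 256 implies, here is a minimal sketch of fixed-size windowing with overlap. It is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences, so its chunks will not line up exactly with these windows. The function name `window_chunks` and the toy token list are assumptions made for illustration.

```python
from typing import List, Sequence


def window_chunks(
    tokens: Sequence[str], chunk_size: int, chunk_overlap: int
) -> List[List[str]]:
    """Split a token sequence into fixed-size windows that overlap.

    Hypothetical helper; a simplification of an overlap-aware splitter.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


# Toy example: 10 "tokens", windows of 4 with an overlap of 2.
toy_tokens = [f"t{i}" for i in range(10)]
toy_chunks = window_chunks(toy_tokens, chunk_size=4, chunk_overlap=2)
```

With the notebook's settings, each window in this simplified model would advance by 512 - 256 = 256 tokens, so most of the text would appear in two consecutive chunks. That redundancy increases index size but reduces the chance that a relevant passage is cut in half at a chunk boundary.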

## Define Elasticsearch Vector Store
We define `ElasticsearchStore` as the vector store with [SparseVectorStrategy](https://python.langchain.com/v0.2/docs/integrations/vectorstores/elasticsearch/#sparsevectorstrategy-elser). `SparseVectorStrategy` expands each document into weighted tokens, which are stored in a vector field of type `rank_features`.
We will use the [ELSER v2](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html#elser-v2) model to produce these tokens, referenced in the code by the model ID `.elser_model_2`.

Note: Before indexing, ensure that you have [downloaded and deployed the ELSER v2 model](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html#download-deploy-elser) in your deployment and that it is running on an ML node.

In [22]:
from langchain_elasticsearch import ElasticsearchStore
from langchain_elasticsearch import SparseVectorStrategy

index_name = "workplace_index_elser"

# Delete the index if it exists
if client.indices.exists(index=index_name):
    client.indices.delete(index=index_name)

es_vector_store = ElasticsearchStore(
    es_user = ES_USER,
    es_password = ES_PASSWORD,
    es_url = url,
    es_connection = client, 
    index_name=index_name,
    strategy=SparseVectorStrategy(model_id=".elser_model_2"),
)
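
For context on what a sparse strategy does at query time: ELSER-based retrieval is commonly expressed as a `text_expansion` query, in which Elasticsearch runs the model on the query text and matches the resulting tokens against the stored ones. The sketch below builds such a request body as a plain dictionary. The field name `vector.tokens`, the default `size`, and the helper `build_text_expansion_query` are assumptions for illustration; the exact body that `SparseVectorStrategy` sends may differ between library versions.

```python
from typing import Any, Dict


def build_text_expansion_query(
    query_text: str,
    model_id: str = ".elser_model_2",
    tokens_field: str = "vector.tokens",  # assumed field name
    size: int = 4,  # assumed number of hits to return
) -> Dict[str, Any]:
    """Build a search body that asks Elasticsearch to expand the query
    text with the given model and match it against a sparse token field.

    Hypothetical helper for illustration only.
    """
    return {
        "size": size,
        "query": {
            "text_expansion": {
                tokens_field: {
                    "model_id": model_id,
                    "model_text": query_text,
                }
            }
        },
    }


example_body = build_text_expansion_query("What are the organization's sales goals?")
```

A body like this could be sent with `client.search(index=index_name, **example_body)`, which is a convenient way to inspect raw hits and scores when debugging retrieval.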

## Add docs processed above
The documents have already been chunked. We do not pass an embedding function here, since the tokens are inferred within Elasticsearch at index time and at query time.
This requires the `ELSER v2` model to be loaded and running in Elasticsearch.

In [23]:
es_vector_store.add_documents(documents=docs)

['d252429b-19ab-4761-a55a-b5523996f02c',
 '6f326aac-71ad-43d5-a0bf-4f5e81bc6dad',
 '550ef0f5-70f0-4b43-a9ff-72e2e9a34904',
 'e11a8baf-7365-4f4b-aed0-95dfc1d58179',
 'dfdf630a-7064-4bd4-b5d6-c78487255fe3',
 'd445c931-7627-4c2f-a494-7015e5cf9c0c',
 '25cbdf0e-343e-4c0f-a017-a4cd09d20423',
 '610af5b2-d3b9-4bf7-bce1-82b27256cc55',
 'bdb490de-31eb-4c7a-af8e-58a03d8d5cc0',
 '22d91725-806d-4d64-8d60-986a21b7d76d',
 'f4a2276b-b876-4b0a-a4fd-d34b195aa881',
 '76fd4b17-7afd-4863-a30d-969068aea44a',
 'a710cf56-b4c2-4558-af74-a7166ca5d405',
 '63fef806-1813-4a79-bdce-07b901925ce0',
 '1292541b-7aa8-404e-8333-02d1c8d32f66',
 '9a767beb-6b2e-49c2-b8d9-f9600bc3da79',
 '7536a334-2fd3-4608-b1fb-a212821b361c',
 'eb708368-67f1-4b2d-8c40-6db7e7d7218a',
 '35af7992-4434-4a3c-9730-be25c482672c',
 '085b6138-e421-4e45-8c8f-bf14025ea832',
 '3bd9475b-1d7c-4ecb-b2c7-5099049816ec',
 '9164ab8a-e392-49f7-85e6-39946b2db501',
 '85e31f91-5d3d-4519-b731-e149743ef254',
 '83dfb216-540c-4986-b861-3d0de3b6969e',
 'ea56915f-2224-

## LLM Configuration
This connects to your local LLM. Please refer to https://ollama.com/library/llama3 for details on steps to run Llama3 locally. 

_If you have sufficient resources (at least 64 GB of RAM and an available GPU), you could try the 70B parameter version of Llama3._


In [24]:
from langchain_community.llms import Ollama

llm = Ollama(model="llama3")

## Semantic Search using Elasticsearch ELSER v2 and Llama3

We will perform a semantic search on the query with `ELSER v2` as the model. The contextually relevant documents are then inserted into a prompt template along with the user's original query.

We then use `Llama3` to answer the question, based on the contextually relevant data fetched from Elasticsearch by the retriever.

In [25]:
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


retriever = es_vector_store.as_retriever()
template = """Answer the question based only on the following context:\n

                {context}
                
                Question: {question}
               """
prompt = ChatPromptTemplate.from_template(template)
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

a = chain.invoke("What are the organizations sales goals?")

print(a)

According to the context, the organization's sales goals for fiscal year 2024 are:

1. Increase revenue by 20% compared to fiscal year 2023.
2. Expand market share in key segments by 15%.
3. Retain 95% of existing customers and increase customer satisfaction ratings.
4. Launch at least two new products or services in high-demand market segments.


_You could now try experimenting with other questions._
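
When experimenting, it can help to see what the chain does step by step. The sketch below mirrors the same retrieve, format, prompt, and generate flow in plain Python. `FakeDoc`, `fake_retriever`, `fake_llm`, and `answer` are hypothetical stand-ins so that the example runs without Elasticsearch or Ollama; in the notebook, the corresponding roles are played by `retriever`, `llm`, and the `chain` object.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FakeDoc:
    """Stand-in for a retrieved document; only page_content is used here."""
    page_content: str


def format_docs(docs: List[FakeDoc]) -> str:
    # Same joining logic as in the notebook: a blank line between chunks.
    return "\n\n".join(doc.page_content for doc in docs)


PROMPT_TEMPLATE = (
    "Answer the question based only on the following context:\n\n"
    "{context}\n\n"
    "Question: {question}\n"
)


def answer(
    question: str,
    retrieve: Callable[[str], List[FakeDoc]],
    generate: Callable[[str], str],
) -> str:
    docs = retrieve(question)          # 1. fetch relevant chunks
    context = format_docs(docs)        # 2. merge them into one context string
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    return generate(prompt)            # 3. let the model answer from the prompt


def fake_retriever(question: str) -> List[FakeDoc]:
    # Hypothetical retriever returning fixed chunks.
    return [FakeDoc("Chunk A about sales."), FakeDoc("Chunk B about goals.")]


def fake_llm(prompt: str) -> str:
    # Echoes the prompt so the assembled text can be inspected.
    return prompt


assembled = answer("What are the sales goals?", fake_retriever, fake_llm)
```

Printing `assembled` shows exactly what text the model would receive, which is often the quickest way to tell whether a poor answer comes from retrieval or from generation.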
