# Query Data using LLM

Here is the overall RAG pipeline. In this notebook, we will do steps 5 through 9:
- Importing data is already done in the notebook [rag_1_B_load_data.ipynb](rag_1_B_load_data.ipynb)
- 👉 Step 5: Calculate the embedding for the user query
- 👉 Step 6 & 7: Send the query to the vector database to retrieve relevant documents
- 👉 Step 8 & 9: Send the query and the relevant documents (returned by the previous step) to the LLM and get answers to our query

![image missing](../media/rag-overview-2.png)

## Configuration

In [1]:
class MyConfig:
    pass
MY_CONFIG = MyConfig()

MY_CONFIG.EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
MY_CONFIG.EMBEDDING_LENGTH = 384

MY_CONFIG.DB_URI = './rag_4_walmart_dataprepkit.db'  # For embedded instance
#MY_CONFIG.DB_URI = 'http://localhost:19530'  # For Docker instance
MY_CONFIG.COLLECTION_NAME = 'dataprepkit_walmart_docs'

MY_CONFIG.LLM_MODEL = "meta/meta-llama-3-8b-instruct"


## Configure API Token

Create a `.env` file with the following properties. You can use [env.txt](../env.txt) as a starting point.

---

```text
REPLICATE_API_TOKEN=YOUR_TOKEN_GOES_HERE
```

---

## Load Configurations


In [2]:
import os,sys
## Load Settings from .env file
from dotenv import find_dotenv, dotenv_values

# _ = load_dotenv(find_dotenv()) # read local .env file
config = dotenv_values(find_dotenv())

# debug
# print (config)

MY_CONFIG.REPLICATE_API_TOKEN = config.get('REPLICATE_API_TOKEN')

if MY_CONFIG.REPLICATE_API_TOKEN:
    print ("✅ config REPLICATE_API_TOKEN found")
else:
    raise Exception ("❌ 'REPLICATE_API_TOKEN' is not set. Please set it in the .env file to continue...")


✅ config REPLICATE_API_TOKEN found


## Connect to Vector Database

Milvus can run embedded, backed by a local database file, which makes it easy to use.


In [3]:
from pymilvus import MilvusClient

milvus_client = MilvusClient(MY_CONFIG.DB_URI)

print ("✅ Connected to Milvus instance:", MY_CONFIG.DB_URI)

✅ Connected to Milvus instance: ./rag_4_walmart_dataprepkit.db
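
Before searching, it can help to confirm that the collection created by the data-loading notebook is actually present in the database we just opened. The sketch below is a minimal check, assuming the installed `pymilvus` version exposes `MilvusClient.has_collection`. The helper `require_collection` and the `FakeClient` stand-in are hypothetical names used only so the example runs without a database.

```python
def require_collection(client, collection_name: str) -> None:
    """Raise a clear error if the expected collection is missing."""
    if not client.has_collection(collection_name):
        raise RuntimeError(
            f"Collection '{collection_name}' not found. "
            "Run the data-loading notebook first."
        )


class FakeClient:
    """Hypothetical stand-in for MilvusClient so the sketch runs offline."""

    def __init__(self, names):
        self._names = set(names)

    def has_collection(self, collection_name):
        return collection_name in self._names


fake = FakeClient({"dataprepkit_walmart_docs"})
require_collection(fake, "dataprepkit_walmart_docs")  # passes silently
print("✅ Collection found")
```

In this notebook the equivalent call would be `require_collection(milvus_client, MY_CONFIG.COLLECTION_NAME)`.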


## Step-: Setup Embeddings

Use the same embedding model we used to index our documents, so that query vectors and document vectors are comparable.
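
The code in this notebook requests normalized embeddings and later searches with the inner product metric. The sketch below illustrates why those two choices fit together: for unit-length vectors, the inner product equals the cosine similarity. The vectors are toy 4-dimensional values, not real model output, and the helper names are illustrative.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for real 384-dimensional embeddings
query_vec = [1.0, 2.0, 0.0, 2.0]
doc_vec = [2.0, 1.0, 1.0, 0.0]

ip_of_normalized = dot(l2_normalize(query_vec), l2_normalize(doc_vec))
cosine = cosine_similarity(query_vec, doc_vec)
print("inner product of normalized vectors:", ip_of_normalized)
print("cosine similarity of raw vectors:   ", cosine)
```

The two printed values should agree up to floating-point rounding.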

In [4]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(MY_CONFIG.EMBEDDING_MODEL)

def get_embeddings (text):
    # Avoid naming the parameter `str`, which would shadow the built-in type
    embeddings = model.encode(text, normalize_embeddings=True)
    return embeddings



In [5]:
# Test embeddings
embeddings = get_embeddings('Paris 2024 Olympics')
print ('embeddings len =', len(embeddings))
print ('embeddings[:5] = ', embeddings[:5])

embeddings len = 384
embeddings[:5] =  [ 0.02468892  0.10352131  0.0275264  -0.08551715 -0.01412829]


## Vector Search and RAG

In [6]:
# Get relevant documents using vector / semantic search

def fetch_relevant_documents (query : str) :
    search_res = milvus_client.search(
        collection_name=MY_CONFIG.COLLECTION_NAME,
        data = [get_embeddings(query)], # Use the `get_embeddings` function to convert the question to an embedding vector
        limit=3,  # Return top 3 results
        search_params={"metric_type": "IP", "params": {}},  # Inner product; a higher score means more similar
        output_fields=["text"],  # Return the text field
    )
    # print (search_res)

    retrieved_docs_with_distances = [
        {'text': res["entity"]["text"], 'distance' : res["distance"]} for res in search_res[0]
    ]
    return retrieved_docs_with_distances
## --- end ---


In [7]:
# Test the vector search
import pprint

question = "What was Walmart's revenue in 2023?"
relevant_docs = fetch_relevant_documents(question)
pprint.pprint(relevant_docs, indent=4)

[   {   'distance': 0.7401622533798218,
        'text': 'Strong, Efficient Growth\n'
                'Comparable sales in the U.S., including fuel, increased 8.2% '
                'and 7.7% in fiscal 2023 and 2022, respectively, when compared '
                'to the previous fiscal year. Walmart U.S. comparable sales '
                'increased 7.0% and 6.4% in fiscal 2023 and 2022, '
                'respectively. For'},
    {   'distance': 0.7293714284896851,
        'text': 'General\n'
                'Our operations comprise three reportable segments: Walmart '
                "U.S., Walmart International and Sam's Club. Our fiscal year "
                'ends on January 31 for our United States ("U.S.") and '
                'Canadian operations. We consolidate all other operations '
                'generally using a one-month lag and on a calendar year basis. '
                'Our discussion is as of and for the fiscal years ended '
                'January 31, 2023 ("fisca
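
The search always returns the top 3 hits, even when none of them is a good match for the question. One possible refinement, sketched below, is to drop hits whose score falls under a cutoff before passing them to the LLM. The helper `filter_by_score`, the `0.5` cutoff, and the sample hits are illustrative assumptions, not values tuned for this dataset.

```python
def filter_by_score(docs, min_score=0.5):
    """Keep only hits whose inner-product score is at least min_score.

    With the IP metric on normalized embeddings, a higher score means
    the hit is more similar to the query.
    """
    return [doc for doc in docs if doc["distance"] >= min_score]


# Made-up hits in the same shape that fetch_relevant_documents returns
sample_hits = [
    {"text": "chunk about revenue", "distance": 0.74},
    {"text": "chunk about segments", "distance": 0.73},
    {"text": "unrelated chunk", "distance": 0.21},
]

kept = filter_by_score(sample_hits, min_score=0.5)
print([doc["text"] for doc in kept])
```

A suitable cutoff depends on the embedding model and the data, so it is worth inspecting the scores for a few known-good and known-bad questions first.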

## Initialize LLM

### LLM Choices at Replicate

- Llama 3.1 (latest)
    - **meta/meta-llama-3.1-405b-instruct**: Meta's flagship 405 billion parameter language model, fine-tuned for chat completions
- Base versions of Llama 3 from Meta
    - [meta/meta-llama-3-8b](https://replicate.com/meta/meta-llama-3-8b): base version of Llama 3, an 8 billion parameter language model from Meta
    - **meta/meta-llama-3-70b**: 70 billion parameters
- Instruct versions of Llama 3 from Meta, fine-tuned for chat completions
    - **meta/meta-llama-3-8b-instruct**: 8 billion parameters
    - **meta/meta-llama-3-70b-instruct**: 70 billion parameters

References 

- https://docs.llamaindex.ai/en/stable/examples/llm/llama_2/?h=replicate

In [8]:
import os
os.environ["REPLICATE_API_TOKEN"] = MY_CONFIG.REPLICATE_API_TOKEN

In [9]:
import replicate

def ask_LLM (question, relevant_docs):
    context = "\n".join(
        [doc['text'] for doc in relevant_docs]
    )
    print ('============ context (this is the context supplied to LLM) ============')
    print (context)
    print ('============ end  context ============', flush=True)

    system_prompt = """
    Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.
    """
    user_prompt = f"""
    Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
    <context>
    {context}
    </context>
    <question>
    {question}
    </question>
    """

    print ('============ here is the answer from LLM... STREAMING... =====')
    # The meta/meta-llama-3-8b-instruct model can stream output as it's running.
    for event in replicate.stream(
        MY_CONFIG.LLM_MODEL,
        input={
            "top_k": 0,
            "top_p": 0.95,
            "prompt": user_prompt,
            "max_tokens": 512,
            "temperature": 0.1,
            "system_prompt": system_prompt,
            "length_penalty": 1,
            "max_new_tokens": 512,
            "stop_sequences": "<|end_of_text|>,<|eot_id|>",
            "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
            "presence_penalty": 0,
            "log_performance_metrics": False
        },
    ):
        print(str(event), end="")
    ## ---
    print ('\n======  end LLM answer ======\n', flush=True)
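
The prompt assembly inside `ask_LLM` can also be pulled out into a small pure function, which makes it possible to inspect or unit test the prompt without calling the Replicate API. The sketch below is one way to do that. The name `build_user_prompt` is hypothetical, and the whitespace differs slightly from the original template because the indentation is dropped.

```python
def build_user_prompt(question: str, relevant_docs: list) -> str:
    """Join retrieved passages and wrap them, with the question, in tags."""
    context = "\n".join(doc["text"] for doc in relevant_docs)
    return (
        "Use the following pieces of information enclosed in <context> tags "
        "to provide an answer to the question enclosed in <question> tags.\n"
        f"<context>\n{context}\n</context>\n"
        f"<question>\n{question}\n</question>\n"
    )


# Placeholder passages in the shape returned by fetch_relevant_documents
docs = [
    {"text": "first passage", "distance": 0.9},
    {"text": "second passage", "distance": 0.8},
]
prompt = build_user_prompt("What is in the passages?", docs)
print(prompt)
```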


## Query

In [11]:
%%time

question = "What was Walmart's revenue in 2023?"
relevant_docs = fetch_relevant_documents(question)
ask_LLM(question=question, relevant_docs=relevant_docs)

Strong, Efficient Growth
Comparable sales in the U.S., including fuel, increased 8.2% and 7.7% in fiscal 2023 and 2022, respectively, when compared to the previous fiscal year. Walmart U.S. comparable sales increased 7.0% and 6.4% in fiscal 2023 and 2022, respectively. For
General
Our operations comprise three reportable segments: Walmart U.S., Walmart International and Sam's Club. Our fiscal year ends on January 31 for our United States ("U.S.") and Canadian operations. We consolidate all other operations generally using a one-month lag and on a calendar year basis. Our discussion is as of and for the fiscal years ended January 31, 2023 ("fiscal 2023"), January 31, 2022 ("fiscal 2022") and January 31, 2021 ("fiscal 2021"). During fiscal 2023, we generated total revenues of $611.3 billion, which was comprised primarily of net sales of $605.9 billion.
(A(Amounts in millions)
Of Walmart U.S.'s total net sales, approximately $53.4 billion, $47.8 billion and $43.0 billion related to eComme

In [12]:
%%time

question = "How many distribution centers does Walmart have?"
relevant_docs = fetch_relevant_documents(question)
ask_LLM(question=question, relevant_docs=relevant_docs)

Walmart U.S. Segment
Distribution . We continue to invest in supply chain automation and utilize a total of 163 distribution facilities which are located strategically throughout the U.S. For fiscal 2023, the majority of Walmart U.S.'s purchases of store merchandise were shipped through these facilities, while most of the remaining store merchandise we purchased was shipped directly from suppliers. General merchandise and dry grocery merchandise is transported primarily through the segment's private truck fleet; however, we contract with common carriers to transport the majority of our perishable grocery merchandise. We ship merchandise purchased by customers on our eCommerce platforms by a number of methods from multiple locations including from our 34 dedicated eCommerce fulfillment centers, as well as leveraging our ability to ship or deliver directly from more than 3,900 stores.
Walmart International Segment
Distribution. We utilize a total of 188 distribution facilities located in

In [13]:
%%time

question = "When was the moon landing?"
relevant_docs = fetch_relevant_documents(question)
ask_LLM(question=question, relevant_docs=relevant_docs)

Contingencies
We have served as the Company's auditor since 1969.
                                                 -                                                                        
3/29/2024 10:28:40 AM
SIGNATURES
Pursuant to the requirements of the Securities Exchange Act of 1934, this report has been signed below by the following persons on behalf of the registrant and in the capacities and on the dates indicated:
I'm happy to help! However, I don't see any information about the moon landing in the provided context. The context appears to be related to a company's auditor and a report signed by certain individuals. Therefore, I cannot provide an answer to the question about the moon landing. If you could provide more context or clarify the question, I'd be happy to try and assist you further!

CPU times: user 50.6 ms, sys: 17.8 ms, total: 68.4 ms
Wall time: 1.68 s
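
In the `%%time` output above, CPU time is much smaller than wall time, which suggests that the cell spends most of its time waiting rather than computing, as one would expect when the LLM runs on a remote service. The sketch below reproduces that pattern outside a notebook with the standard library. `fake_remote_call` is a hypothetical stand-in that sleeps instead of calling a real API.

```python
import time


def timed_call(func, *args, **kwargs):
    """Run func and return (result, cpu_seconds, wall_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    return result, cpu_elapsed, wall_elapsed


def fake_remote_call():
    """Simulate waiting on a network response; sleeping uses almost no CPU."""
    time.sleep(0.2)
    return "done"


result, cpu_s, wall_s = timed_call(fake_remote_call)
print(f"CPU time: {cpu_s:.3f} s, wall time: {wall_s:.3f} s")
```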
