# Query Data using LLM

Here is the overall RAG pipeline. In this notebook, we will do steps 2, 3 and 4.
- Step 1: Populate embeddings. This is already done in the notebook [rag_1_B_load_data.ipynb](rag_1_B_load_data.ipynb)
- 👉 Step 2: Calculate the embedding for the user query
- 👉 Step 3: Send the query to the vector db to retrieve relevant documents
- 👉 Step 4: Send the query and the relevant documents (returned by the step above) to the LLM and get answers to our query

![image missing](../media/rag-overview-1.png)
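
To make the steps concrete before wiring up the real components, here is a minimal, dependency-free sketch of steps 2 to 4. Everything in it is an assumption made for illustration: `DOCS` is a made-up corpus, `embed` is a toy bag-of-words stand-in for a real embedding model, and `build_prompt` is a hypothetical prompt template. The notebook itself uses a HuggingFace embedding model, Milvus, and an LLM hosted on Replicate.

```python
import math
from collections import Counter

# Hypothetical mini-corpus standing in for the documents indexed in step 1.
DOCS = [
    "The model was trained on a large corpus of web pages.",
    "Milvus stores vectors and supports similarity search.",
    "The Olympics were held in Paris in 2024.",
]

def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (not a real embedding model)."""
    return Counter(text.lower().replace(".", "").replace("?", "").split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], top_k: int = 1) -> list[str]:
    q = embed(query)  # step 2: embed the user query
    # step 3: rank documents by similarity to the query
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    # step 4: combine the query and the retrieved documents into one LLM prompt
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"

query = "What was the model trained on?"
top_docs = retrieve(query, DOCS)
prompt = build_prompt(query, top_docs)
print(prompt)
```

In the real pipeline, the vector database performs the ranking in `retrieve`, and the prompt is sent to the LLM instead of being printed.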

## Configuration

In [1]:
class MyConfig:
    pass
MY_CONFIG = MyConfig()

MY_CONFIG.DB_INSTANCE = "rag_demo.db"  # vector db (embedded)
MY_CONFIG.COLLECTION_NAME = "docs"
MY_CONFIG.EMBEDDING_MODEL = "BAAI/bge-small-en-v1.5"
MY_CONFIG.EMBEDDING_LENGTH = 384

## Figure out Runtime

In [2]:
# are we running in Colab?
import os

if os.getenv("COLAB_RELEASE_TAG"):
   print("Running in Colab")
   MY_CONFIG.RUNNING_IN_COLAB = True
else:
   print("NOT running in Colab")
   MY_CONFIG.RUNNING_IN_COLAB = False

NOT running in Colab


## Install Dependencies (If required)

**A note for Google Colab Users**

After installing the dependencies, if you get errors loading libraries, **restart the runtime** and **run the notebook** again.

In [3]:
if MY_CONFIG.RUNNING_IN_COLAB:
  !pip install pymilvus  'pymilvus[model]'  datasets  sentence-transformers  replicate

## Configuration

Create a `.env` file with the following properties. You can use [env.txt](../env.txt) as a starting point.

---

```text
REPLICATE_API_TOKEN=YOUR_TOKEN_GOES_HERE
```

---

## Load Configurations


In [4]:
import os,sys
## Load Settings from .env file
from dotenv import find_dotenv, dotenv_values

# _ = load_dotenv(find_dotenv()) # read local .env file
config = dotenv_values(find_dotenv())

# debug
# print (config)

MY_CONFIG.REPLICATE_API_TOKEN = config.get('REPLICATE_API_TOKEN')

if  MY_CONFIG.REPLICATE_API_TOKEN:
    print ("✅ config REPLICATE_API_TOKEN found")
else:
    raise Exception ("❌ 'REPLICATE_API_TOKEN' is not set.  Please set it in the .env file to continue...")



✅ config REPLICATE_API_TOKEN found


## Connect to Vector Database

Milvus can run in embedded mode (backed by a local file), which makes it easy to use.


In [5]:
from pymilvus import MilvusClient

client = MilvusClient(MY_CONFIG.DB_INSTANCE)

print ("✅ Connected to Milvus instance:", MY_CONFIG.DB_INSTANCE)


✅ Connected to Milvus instance: rag_demo.db


## Step-: Setup Embeddings

Use the same embeddings we used to index our documents!

In [6]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

Settings.embed_model = HuggingFaceEmbedding(
    model_name = MY_CONFIG.EMBEDDING_MODEL
)



In [7]:
## embedding testing
embeddings = Settings.embed_model.get_text_embedding("Paris 2024 Olympics")
print ('embedding len : ', len(embeddings))
print ('first few embeddings : ', embeddings[:3])

embedding len :  384
first few embeddings :  [-0.024121200665831566, -0.02083505131304264, 0.03565467149019241]
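
The reason for reusing the indexing model is that query and document vectors are only comparable when they come from the same embedding space and have the same length (384 here). The sketch below illustrates this with plain Python lists. The helper names `inner_product` and `normalize` and the two vectors are made up for illustration, and the dimension check is an assumption about how one might guard against mixing models, not part of any library API.

```python
import math

EMBEDDING_LENGTH = 384  # same value as MY_CONFIG.EMBEDDING_LENGTH

def inner_product(a, b, expected_dim=EMBEDDING_LENGTH):
    """Inner product of two vectors, refusing vectors of the wrong length."""
    if len(a) != expected_dim or len(b) != expected_dim:
        raise ValueError(f"expected {expected_dim}-dim vectors, got {len(a)} and {len(b)}")
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Made-up vectors standing in for a query embedding and a document embedding.
query_vec = normalize([1.0] * EMBEDDING_LENGTH)
doc_vec = normalize([1.0] * 192 + [0.0] * 192)

score = inner_product(query_vec, doc_vec)
print(round(score, 4))
```

For unit-length vectors the inner product equals the cosine similarity, so a higher score means the two texts are closer in the embedding space. A vector produced by a different model with, say, 768 dimensions would be rejected by the length check rather than silently producing a meaningless score.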


## Connect llama-index  to Milvus DB

References

- https://docs.llamaindex.ai/en/stable/examples/vector_stores/MilvusIndexDemo/
- https://docs.llamaindex.ai/en/v0.10.23/api_reference/storage/vector_store/milvus/?h=milvusvectorstore#llama_index.vector_stores.milvus.MilvusVectorStore

In [8]:
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.milvus import MilvusVectorStore


vector_store = MilvusVectorStore(
    uri=MY_CONFIG.DB_INSTANCE, 
    dim=MY_CONFIG.EMBEDDING_LENGTH, 
    collection_name = MY_CONFIG.COLLECTION_NAME,
    overwrite=False,
    embedding_field = 'vector'
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_vector_store(vector_store=vector_store, storage_context=storage_context)

print ("✅ Connected to Milvus instance:", vector_store)


✅ Connected to Milvus instance: stores_text=True is_embedding_query=True stores_node=True uri='./milvus_llamaindex.db' token='' collection_name='docs' dim=384 embedding_field='vector' doc_id_field='doc_id' similarity_metric='IP' consistency_level='Strong' overwrite=False text_key=None output_fields=[] index_config={} search_config={} batch_size=100 enable_sparse=False sparse_embedding_field='sparse_embedding' sparse_embedding_function=None hybrid_ranker='RRFRanker' hybrid_ranker_params={} index_management=<IndexManagement.CREATE_IF_NOT_EXISTS: 'create_if_not_exists'>


## Initialize LLM

### LLM Choices at Replicate

- llama 3.1 : Latest
    - **meta/meta-llama-3.1-405b-instruct** : Meta's flagship 405 billion parameter language model, fine-tuned for chat completions
- Base version of llama-3 from meta
    - [meta/meta-llama-3-8b](https://replicate.com/meta/meta-llama-3-8b) : Base version of Llama 3, an 8 billion parameter language model from Meta.
    - **meta/meta-llama-3-70b** : 70 billion
- Instruct versions of llama-3 from meta, fine tuned for chat completions
    - **meta/meta-llama-3-8b-instruct** : 8 billion
    - **meta/meta-llama-3-70b-instruct** : 70 billion

References 

- https://docs.llamaindex.ai/en/stable/examples/llm/llama_2/?h=replicate

In [9]:
import os
os.environ["REPLICATE_API_TOKEN"] = MY_CONFIG.REPLICATE_API_TOKEN

In [10]:
from llama_index.llms.replicate import Replicate
from llama_index.core import Settings

llm = Replicate(
    model="meta/meta-llama-3-8b-instruct",
    temperature=0.1
)

Settings.llm = llm

In [11]:
## Basic testing

resp = llm.complete("The capital of the United States is ")
print (resp)



The capital of the United States is Washington, D.C.!


## Step-: Setup Tokenizers

Set up the tokenizer to match the LLM for best results.

Reference  : https://docs.llamaindex.ai/en/stable/module_guides/supporting_modules/settings/#tokenizer

In [12]:
## TODO: revisit later
## HuggingFace now requires token to use 'AutoTokenizer' .. ugh
## Using the default tokenizer for now


# from transformers import AutoTokenizer
# from llama_index.core import Settings
# import tiktoken

# Settings.tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo").encode

# tokenizer = AutoTokenizer.from_pretrained(
#         "mistralai/Mistral-7B-Instruct-v0.2"
# )

# Settings.tokenizer = tokenizer


In [13]:
## test tokenizer
text = "Tokenizers are essential for natural language processing."
tokens = Settings.tokenizer(text)
print ("Text words count : ", len (text.split()))
print ('tokens count: ', len(tokens))
print("Tokens:", tokens)

Text words count :  7
tokens count:  9
Tokens: [3404, 12509, 527, 7718, 369, 5933, 4221, 8863, 13]
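
The token count is higher than the word count because a tokenizer does not split on whitespace alone. The sketch below is a rough illustration using a regular expression, not the tokenizer the notebook actually uses: it only separates punctuation from words. Real subword tokenizers can additionally break a single word into several pieces, which would presumably account for the remaining difference in the output above.

```python
import re

sample_text = "Tokenizers are essential for natural language processing."

# Whitespace split: the trailing period stays attached to the last word.
words = sample_text.split()

# Toy tokenization: runs of word characters, with each punctuation mark on its own.
pieces = re.findall(r"\w+|[^\w\s]", sample_text)

print("words :", len(words))
print("pieces:", len(pieces))
print(pieces)
```

Token counts matter because LLM context limits and pricing are expressed in tokens, not words.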


## Query

In [14]:
%%time 

from pprint import pprint

response = index.as_query_engine().query("What was the training dataset?")
print (response)
print()
pprint(response, indent=4)

ValueError: Node content not found in metadata dict.
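
A likely cause of this error, stated here as an assumption rather than a confirmed diagnosis, is that the records in the collection were not written through LlamaIndex and therefore lack the `_node_content` field it uses to rebuild nodes. The sketch below shows one way to check for that on records fetched from the collection. The function name `missing_node_content` and the sample records are hypothetical. In practice the records would come from a query against the `docs` collection.

```python
def missing_node_content(records):
    """Return the records that lack the '_node_content' field."""
    return [r for r in records if "_node_content" not in r]

# Hypothetical records, for illustration only.
sample_records = [
    {"id": 1, "text": "chunk inserted directly into the collection", "vector": [0.1, 0.2]},
    {"id": 2, "_node_content": "{...}", "vector": [0.3, 0.4]},
]

bad = missing_node_content(sample_records)
print("records without _node_content:", [r["id"] for r in bad])
```

If records turn out to be missing that field, one option worth trying is passing `text_key` (shown as `text_key=None` in the connection output above) with the name of the collection's text field when constructing `MilvusVectorStore`. Another is re-indexing the documents through LlamaIndex.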