# Hello, Llama: Simple RAG using Qdrant

This notebook is a guide to implementing a simple Retrieval-Augmented Generation (RAG) system using a fantasy animals FAQ dataset ingested into Qdrant.


# Install Qdrant on Docker or Create a Cloud Subscription

The first step is to set up a Qdrant cluster.  
To do this you can:
- Set up a Qdrant cluster using Docker, **or**
- Create a free cluster on [Qdrant Cloud](https://cloud.qdrant.io/)

## Creating Clusters on https://cloud.qdrant.io/

The simplest method is to create a free account on [https://cloud.qdrant.io/](https://cloud.qdrant.io/) and then create a cluster. This example uses that approach.

<img src="./images/create-cluster.png" width="600">
After clicking "create", it's important to copy the generated authentication token. Save the API key in a secure place.

<br>
<br>
Now you can copy the Qdrant cluster endpoint to connect later:
<img src="./images/cluster-endpoint.png" width="600">

<br>
You can open the dashboard by clicking the following button:
<img src="./images/cluster-dashboard.png" width="600">

<br>
And then navigate through the created collections, which will initially be empty:
<img src="./images/collections.png" width="600">

### Installing Qdrant with Docker

To install Qdrant using Docker, first install Docker (follow the instructions at https://docs.docker.com/get-started/get-docker/) and Docker Compose.

Use the following docker-compose.yaml file. Replace the placeholder `<ADD_HERE_AN_API_KEY>` with a random alphanumeric string of at least 25 characters.

```yaml
services:
  qdrant:
    image: qdrant/qdrant
    container_name: qdrant
    ports:
      - "127.0.0.1:6333:6333"
    environment:
      - QDRANT__SERVICE__API_KEY=<ADD_HERE_AN_API_KEY>
    volumes:
      - qdrant_storage:/qdrant/storage
  

volumes:
  qdrant_storage:
```
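
One way to produce a value for the `<ADD_HERE_AN_API_KEY>` placeholder is Python's standard `secrets` module. This is a minimal sketch: the helper name `generate_api_key` and the 32-character default are illustrative choices, not something Qdrant requires.

```python
import secrets
import string


def generate_api_key(length: int = 32) -> str:
    """Return a random alphanumeric string (hypothetical helper)."""
    alphabet = string.ascii_letters + string.digits
    # secrets.choice draws from a cryptographically secure random source
    return "".join(secrets.choice(alphabet) for _ in range(length))


api_key = generate_api_key()
print(api_key)
```

The printed value can then be pasted into the `QDRANT__SERVICE__API_KEY` entry of the compose file.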

Once you've created the docker-compose.yaml file with the above content, run the following command:

> `docker-compose up`

At this point, you can access Qdrant by navigating to http://127.0.0.1:6333/dashboard in your browser.

### Configure Qdrant Connection Settings

Now we'll set up the essential variables needed to connect to Qdrant - specifically the API key and cluster URL:

In [None]:
QDRANT_API_KEY = '<QDRANT_API_KEY>'
QDRANT_CLUSTER_URL = '<QDRANT_CLUSTER_URL>'
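
To keep the API key out of the notebook itself, the two values could instead be read from environment variables. The sketch below assumes variables named `QDRANT_CLUSTER_URL` and `QDRANT_API_KEY` (these names are an assumption, chosen to mirror the notebook variables) and falls back to the local Docker address used earlier when no URL is set. The helper `load_qdrant_settings` is hypothetical.

```python
import os


def load_qdrant_settings(env=os.environ):
    """Read connection settings from a mapping of environment variables.

    The variable names are assumptions made for this sketch.
    """
    url = env.get("QDRANT_CLUSTER_URL", "http://127.0.0.1:6333")
    api_key = env.get("QDRANT_API_KEY")
    if not api_key:
        raise RuntimeError("Set QDRANT_API_KEY before connecting.")
    return url, api_key


# Demonstration with an explicit mapping instead of the real environment
url, api_key = load_qdrant_settings({"QDRANT_API_KEY": "example-key-not-a-real-secret"})
print(url)
```

Calling `load_qdrant_settings()` with no argument reads from the real process environment.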

### Install required libraries

Run the following cell to install the required packages. It lists only the libraries not already installed in the previous examples, so it's recommended to follow the examples in the specified order:

In [None]:
# uncomment to install required libraries

#!pip install langchain==0.3.4
#!pip install langchain_community==0.3.3
#!pip install langchain_huggingface==0.1.1

# install qdrant-client

#!pip install qdrant-client==1.13.0

#### Import required libraries

In [None]:
import transformers
from langchain_huggingface import HuggingFacePipeline
from langchain_core.prompts import PromptTemplate
from transformers import AutoTokenizer, AutoModelForCausalLM
from langchain.docstore.document import Document
from langchain_text_splitters import CharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from qdrant_client import QdrantClient
from qdrant_client.http.models import PointStruct
from qdrant_client.http.exceptions import UnexpectedResponse
from sentence_transformers import SentenceTransformer
from tqdm import tqdm
import torch
import os
import time
import json

#### Load dataset in memory

We'll load our fantasy animals FAQ from a JSON file.

In [None]:
fantasy_animals_file = "fantasy-animals-faqs.json"
qdrant_collection_name = 'fantasy_animals_faqs'

with open(fantasy_animals_file, 'r') as file:
    data = json.load(file)  # Load JSON data into a Python dictionary


print(f"Questions loaded: {len(data['questions'])}\n\n")
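
The notebook reads `data['questions']` and, during ingestion, each entry's `title` and `answer` fields. The sketch below shows the file shape those accesses imply and a small validation pass. The sample entry is illustrative only (its title is invented for this example), and `validate_faqs` is a hypothetical helper.

```python
import json

# Illustrative sample only; the real file's entries are not reproduced here
sample_json = """
{
  "questions": [
    {
      "title": "What is a Drakelion?",
      "answer": "A Drakelion is a large, lion-like creature with dragon-like scales covering its body."
    }
  ]
}
"""


def validate_faqs(data: dict) -> int:
    """Check that every entry has non-empty 'title' and 'answer' strings."""
    for i, elem in enumerate(data["questions"]):
        for key in ("title", "answer"):
            if not isinstance(elem.get(key), str) or not elem[key].strip():
                raise ValueError(f"Entry {i} is missing a usable '{key}' field")
    return len(data["questions"])


sample = json.loads(sample_json)
print(f"Valid entries: {validate_faqs(sample)}")
```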

### Create Qdrant Collection Utility

In [None]:
def create_qdrant_collection(qdrant_client: QdrantClient, collection_name: str, vectors_config):
    try:
        created = qdrant_client.create_collection(
            collection_name=collection_name,
            vectors_config=vectors_config
        )
        
        if created:
            print(f"Collection '{collection_name}' created successfully.")
        else:
            print(f"Failed to create collection '{collection_name}'.")
    except UnexpectedResponse as e:
        if e.status_code == 409:
            print(f"Collection '{collection_name}' already exists.")
        else:
            print(f"Unexpected error creating collection '{collection_name}': {e}")

### Import data on Qdrant Collection Utility

In [None]:
def import_on_qdrant(collection_name: str, texts: list[str], model_embedder: SentenceTransformer, qdrant_client: QdrantClient):
    
    embeddings = model_embedder.encode(texts)
    
    # Prepare points for ingestion
    points = []

    collection_info = qdrant_client.get_collection(collection_name)
    offset = collection_info.points_count

    for i, (embedding, text) in enumerate(zip(embeddings, texts)):
        
        point_struct = PointStruct(id=offset+i, vector=embedding.tolist(), payload={"text": text})
        
        points.append(point_struct)
    
    
    qdrant_client.upsert(collection_name=collection_name, points=points)
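
Because `import_on_qdrant` derives IDs from the current point count, running the ingestion a second time appends a second copy of every text. One possible alternative, sketched here under the assumption that identical texts should map to the same point, is to derive a deterministic UUID from the text; Qdrant accepts UUID strings as point IDs. The helper `stable_point_id` and the choice of namespace are hypothetical.

```python
import uuid


def stable_point_id(text: str) -> str:
    """Derive a deterministic UUID string from the text (hypothetical helper)."""
    # uuid5 hashes the namespace and the text, so equal texts give equal IDs
    return str(uuid.uuid5(uuid.NAMESPACE_URL, text))


first = stable_point_id("Title: example, Answer: example")
second = stable_point_id("Title: example, Answer: example")
other = stable_point_id("Title: another, Answer: example")
print(first == second, first == other)
```

Passing such a value as `id=` when building each `PointStruct` would make `upsert` overwrite an existing point instead of adding a new one.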

### Query Utility Method for Qdrant Collection Data

In [None]:
def execute_query(query_text: str, collection_name: str,  model_embedder: SentenceTransformer, qdrant_client: QdrantClient, max_number_of_res_points: int = 3):
    # compute the embedding of the query text
    query_vector = model_embedder.encode([query_text])[0]
    
    # Search for similar points
    search_result = qdrant_client.search(
        collection_name=collection_name,
        query_vector=query_vector,
        limit=max_number_of_res_points
    )

    return search_result

### Create Qdrant Client

In [None]:
# Connect to Qdrant
client_qd = QdrantClient(url=QDRANT_CLUSTER_URL, api_key=QDRANT_API_KEY)

### Create embedder model

Create the embedder model, which is the model used to compute the embeddings of the documents.

In [None]:
base_folder = "FILL_WITH_BASE_FOLDER" # Example: "C:/Users/username/Documents/HuggingFace"

# we will use the model https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
model_name = "all-MiniLM-L6-v2"

# set the model id
model_id = os.path.join(base_folder, model_name)

# Initialize the embedder
model_st = SentenceTransformer(model_id, device='cpu')

#### Check how the embedder works

This step also sets the EMBEDDING_MODEL_SIZE that will be the size of our embedding vectors on Qdrant.

In [None]:
embedding_example = model_st.encode("A Drakelion is a large, lion-like creature with dragon-like scales covering its body.")

subset_num = 50
print(f"First {subset_num} array elements {embedding_example[0:subset_num]}\n")
print(f"Embedding size: {embedding_example.shape[0]}")

EMBEDDING_MODEL_SIZE = embedding_example.shape[0]

## RAG Creation

To create the RAG we need to execute the following steps:
1. **Collection Setup** - Create a Qdrant collection with the correct vector dimensions and similarity metric
2. **Data Ingestion** - Format each FAQ entry as a text, generate its embedding, and index it in Qdrant
3. **Query Testing** - Validate retrieval performance with sample queries and inspect results

#### Collection setup

In [None]:
vectors_config={"size": EMBEDDING_MODEL_SIZE, "distance": "Cosine"}

# call utility method to create collection
create_qdrant_collection(client_qd, qdrant_collection_name, vectors_config)
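
With the `Cosine` distance configured above, results are ranked by the cosine of the angle between the query vector and each stored vector, so only direction matters and not magnitude. The plain-Python sketch below is meant for intuition only and does not reflect how Qdrant computes scores internally; `cosine_similarity` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```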

#### Data Ingestion

In [None]:
for elem in tqdm(data['questions'], desc="FAQs ingestion"):
    
    text = f"Title: {elem['title']}, Answer: {elem['answer']}"
    
    import_on_qdrant(qdrant_collection_name, [text], model_st, client_qd)

print("Ingestion completed!")
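
The loop above sends one text per call, which means one embedding pass and one upsert request per FAQ. Since `import_on_qdrant` already accepts a list of texts, grouping them into batches is one way to reduce the number of calls. The sketch below only shows the grouping; the helper `batched` and the batch size of 2 are illustrative assumptions.

```python
from typing import Iterator


def batched(items: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder texts in the same "Title: ..., Answer: ..." shape used above
texts = [f"Title: t{i}, Answer: a{i}" for i in range(5)]
batches = list(batched(texts, 2))
print([len(b) for b in batches])
```

Each batch could then be passed as the `texts` argument of `import_on_qdrant`.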

#### Verify Data Ingestion

After completing the ingestion process, you can confirm the data was properly indexed in your Qdrant collection:

<img src="images/collection-created.png" width="600">

### Querying Data

We'll implement semantic search to retrieve the most relevant context from our vector database:

In [None]:
query_text = "Which is the food of the Drakelion?"

points = execute_query(query_text, qdrant_collection_name, model_st, client_qd, 3)

print('Results:\n')

for point in points:
    print(f"ID: {point.id}\nScore: {point.score}\nPayload: {point.payload}\n\n")

#### Build HuggingFace pipeline to be used in langchain

In [None]:
base_folder = "FILL_WITH_BASE_FOLDER" # Example: "C:/Users/username/Documents/HuggingFace"

model_name = "Llama-3.2-3B-Instruct"

# set the model id
model_id = os.path.join(base_folder, model_name)



tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id
)

pipe = transformers.pipeline(
    "text-generation", 
    model=model, 
    tokenizer=tokenizer, 
    max_new_tokens=128, 
    top_k=50, 
    temperature=0.1
)


hf_pipeline = HuggingFacePipeline(
    pipeline=pipe
)

#### Define prompt to be used in langchain

In [None]:
template = """
System: You are an expert in fantasy animals. To answer consider the 'Extra Information' provided. If you don't know the answer respond with "I don't know the answer." without giving any explanation. 

Extra information: {context}\n\n

Query: {query}
Response:
"""

prompt_template = PromptTemplate.from_template(template)

# chaining prompt_template and pipeline
chain = prompt_template | hf_pipeline.bind(skip_prompt=True)

#### Execute the call to the RAG

##### Retrieve the context to be used in the prompt

In [None]:
# Use the top-ranked result of the semantic search above as the context for the prompt

context = points[0].payload['text']

print(f"Most probable point for '{query_text}'\n\n{context}")
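
The cell above keeps only the top-ranked result. A possible variation, sketched here, joins every retrieved text whose score passes a threshold into a single context string. `FakePoint` is a stand-in that exposes the same `score` and `payload` attributes read earlier in the notebook; `build_context`, the 0.3 threshold, and the separator are hypothetical choices that would need tuning on real data.

```python
from dataclasses import dataclass


@dataclass
class FakePoint:
    """Stand-in with the `score` and `payload` attributes used in this notebook."""
    score: float
    payload: dict


def build_context(points, min_score: float = 0.3, separator: str = "\n---\n") -> str:
    """Join the payload texts of all points scoring at least `min_score`."""
    kept = [p.payload["text"] for p in points if p.score >= min_score]
    return separator.join(kept)


demo_points = [
    FakePoint(0.82, {"text": "first passage"}),
    FakePoint(0.41, {"text": "second passage"}),
    FakePoint(0.10, {"text": "unrelated passage"}),
]
print(build_context(demo_points))
```

The same function could be called on the `points` list returned by `execute_query`.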

##### Execute the prompt

In [None]:
input_dict = {"query": query_text, "context": context}

result = chain.invoke(input_dict)

print(f"\nThe query is: '{query_text}'")
print(f"The context is: '{context}'")
print(f"\nThe Response is: \n'{result}'")

## Conclusions
This implementation demonstrates how to build a basic RAG system using Qdrant for vector search and a Llama model for generation.