#### Import Dependencies

In [1]:
import openai 
import instructor 
from qdrant_client import QdrantClient

from pydantic import BaseModel, Field

#### RAG Pipeline

In [2]:
qdrant_client = QdrantClient(url="http://localhost:6333")
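Conceptually, `query_points` takes the query vector and returns the `limit` stored points closest to it, each with a `score`. The sketch below reproduces that ranking by brute force in plain Python. It assumes the collection uses cosine distance, because the notebook does not show how `Amazon-items-collection-00` was configured. Qdrant typically answers the same question with an approximate HNSW index rather than scoring every point. The item IDs and vectors are made-up toy data.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], points: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Exact k-nearest-neighbour search: score every stored point, keep the best k."""
    scored = [(point_id, cosine_similarity(query, vector)) for point_id, vector in points.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors standing in for stored product embeddings
# (real text-embedding-3-small vectors are much longer).
points = {
    "item-a": [1.0, 0.0, 0.0],
    "item-b": [0.7, 0.7, 0.0],
    "item-c": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.1, 0.0], points, k=2))
```

A full scan like this is fine for a few thousand items. An approximate index trades a small chance of missing a true neighbour for much faster queries on large collections.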

In [3]:
class RAGGenerationResponse(BaseModel):  
    answer: str = Field(description="The answer to the question")

In [None]:
def get_embedding(text, model="text-embedding-3-small"):
    response = openai.embeddings.create(
        input = text,
        model = model,
    )

    return response.data[0].embedding

def retrieve_data(query, qdrant_client, k=5): 
    query_embedding = get_embedding(query)  
    results = qdrant_client.query_points(
        collection_name="Amazon-items-collection-00",  
        query=query_embedding, 
        limit=k,  
    ) 
    retrieved_context_ids = []
    retrieved_context = [] 
    retrieved_context_ratings = [] 
    similarity_scores = [] 
    
    for result in results.points: 
        retrieved_context_ids.append(result.payload["parent_asin"]) 
        retrieved_context.append(result.payload["description"]) 
        retrieved_context_ratings.append(result.payload["average_rating"]) 
        similarity_scores.append(result.score) 

    return {
        "retrieved_context_ids": retrieved_context_ids, 
        "retrieved_context": retrieved_context,
        "retrieved_context_ratings": retrieved_context_ratings, 
        "similarity_scores": similarity_scores, 
    }

def process_context(context):
    formatted_context = "" 

    for item_id, chunk, rating in zip(context["retrieved_context_ids"], context["retrieved_context"], context["retrieved_context_ratings"]):
        formatted_context += f"- ID: {item_id}, rating: {rating}, description: {chunk} \n"

    return formatted_context


def build_prompt(preprocessed_context, question): 

    prompt = f""" 
    You are a shopping assistant that can answer questions about the products in stock.

    You will be given a question and a list of context items.

    Instructions:
    - Answer the question based only on the provided context.
    - Never use the word 'context'; refer to it as 'the available products'.
    - As output, you need to provide:

    * The answer to the question, based on the provided context.
    * The list of IDs of the chunks used to answer the question. Only return the ones that are used in the answer.
    * A short description (1-2 sentences) of each item, based on its description in the context.

    - The short description should include the name of the item.
    - The answer should contain detailed information about each product, with its specifications listed as bullet points.

    Context:
    {preprocessed_context} 

    Question: 
    {question} 
    """ 
    return prompt 

def generate_answer(prompt):  

    client = instructor.from_openai(openai.OpenAI())  

    response, raw_response = client.chat.completions.create_with_completion(
        model="gpt-4o-mini", 
        messages=[{"role": "system", "content": prompt}], 
        temperature=0.5, 
        response_model = RAGGenerationResponse
    ) 

    return response

def rag_pipeline(question, qdrant_client, top_k=5):

    retrieved_context = retrieve_data(question, qdrant_client, top_k) 
    preprocessed_context = process_context(retrieved_context)
    prompt = build_prompt(preprocessed_context, question)
    answer = generate_answer(prompt)

    final_result = {
        "answer": answer.answer, 
        "question": question,  
        "retrieved_context_ids": retrieved_context["retrieved_context_ids"],   
        "retrieved_context": retrieved_context["retrieved_context"],   
        "similarity_scores": retrieved_context["similarity_scores"],   
    }
    return final_result
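Two small refinements to `process_context` and `build_prompt` are worth considering. First, because the prompt is an indented triple-quoted f-string inside a function, every line sent to the model carries the function's leading spaces. `textwrap.dedent` can strip that indentation, but only if it runs on the template *before* the multi-line context is substituted in. Second, `str.join` builds the context string in one pass instead of repeated `+=`. The sketch below is a hypothetical variant with a shortened instruction list. `PROMPT_TEMPLATE`, `format_context` and `build_prompt_dedented` are illustrative names, not the notebook's code.

```python
import textwrap

# Dedent the static template once, before any text is substituted into it.
# Dedenting after interpolation would do nothing, because the second and later
# context lines start at column 0 and the common indentation becomes zero.
PROMPT_TEMPLATE = textwrap.dedent("""\
    You are a shopping assistant that can answer questions about the products in stock.

    Context:
    {context}

    Question:
    {question}
    """)


def format_context(context: dict) -> str:
    """Join one line per retrieved item, using the keys returned by retrieve_data."""
    lines = (
        f"- ID: {item_id}, rating: {rating}, description: {description}"
        for item_id, description, rating in zip(
            context["retrieved_context_ids"],
            context["retrieved_context"],
            context["retrieved_context_ratings"],
        )
    )
    return "\n".join(lines)


def build_prompt_dedented(formatted_context: str, question: str) -> str:
    # str.format only parses the template, so braces inside product descriptions are safe.
    return PROMPT_TEMPLATE.format(context=formatted_context, question=question)


# Hypothetical retrieval result with the same keys as retrieve_data's output.
context = {
    "retrieved_context_ids": ["A1", "B2"],
    "retrieved_context": ["Red earbuds", "Blue headphones"],
    "retrieved_context_ratings": [4.2, 4.5],
}
print(build_prompt_dedented(format_context(context), "Any earbuds?"))
```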

In [12]:
result = rag_pipeline("Can I get some earphones?", qdrant_client, top_k=10)

In [13]:
result

{'answer': 'Here are some earphones you can consider:\n\n1. **Wireless Earbuds (ID: B0B9FTVL58)**\n   - **Rating:** 4.2\n   - **Description:** Bluetooth 5.3 headphones with microphone.\n   - **Playback Time:** 37 hours playback with charging case.\n   - **Features:** LED power display, deep bass, IPX7 waterproof, ultra-light design, smart touch controls.\n   - **Ideal For:** Sports and outdoor activities.\n\n2. **TELSOR Wireless Earbuds (ID: B0C6K1GQCF)**\n   - **Rating:** 4.3\n   - **Description:** Bluetooth headphones with touch control and noise-cancelling mic.\n   - **Playback Time:** 30 hours with charging case.\n   - **Features:** IPX7 waterproof, ergonomic design, clear call quality, smart touch control.\n   - **Compatibility:** Works with most Bluetooth-enabled devices.\n\n3. **Open Ear Headphones (ID: B0CBMPG524)**\n   - **Rating:** 4.4\n   - **Description:** Bluetooth 5.3 earbuds with 60 hours playtime.\n   - **Features:** Open-ear design, powerful stereo sound, IPX7 waterpro

In [15]:
print(result["answer"])

Here are some earphones you can consider:

1. **Wireless Earbuds (ID: B0B9FTVL58)**
   - **Rating:** 4.2
   - **Description:** Bluetooth 5.3 headphones with microphone.
   - **Playback Time:** 37 hours playback with charging case.
   - **Features:** LED power display, deep bass, IPX7 waterproof, ultra-light design, smart touch controls.
   - **Ideal For:** Sports and outdoor activities.

2. **TELSOR Wireless Earbuds (ID: B0C6K1GQCF)**
   - **Rating:** 4.3
   - **Description:** Bluetooth headphones with touch control and noise-cancelling mic.
   - **Playback Time:** 30 hours with charging case.
   - **Features:** IPX7 waterproof, ergonomic design, clear call quality, smart touch control.
   - **Compatibility:** Works with most Bluetooth-enabled devices.

3. **Open Ear Headphones (ID: B0CBMPG524)**
   - **Rating:** 4.4
   - **Description:** Bluetooth 5.3 earbuds with 60 hours playtime.
   - **Features:** Open-ear design, powerful stereo sound, IPX7 waterproof, secure fit.
   - **Ideal Fo
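The run above needs a live Qdrant instance and an OpenAI key, because `rag_pipeline` calls `retrieve_data` and `generate_answer` directly. One common alternative, sketched below under the assumption that you want to test the orchestration offline, is to pass the steps in as callables. `make_rag_pipeline` and the `fake_*` functions are hypothetical names, and the fakes only stand in for Qdrant and the LLM.

```python
from typing import Callable


def make_rag_pipeline(
    retrieve: Callable[[str, int], dict],
    format_context: Callable[[dict], str],
    build_prompt: Callable[[str, str], str],
    generate: Callable[[str], str],
) -> Callable[..., dict]:
    """Compose the four RAG steps into one function with the same output keys as rag_pipeline."""

    def pipeline(question: str, top_k: int = 5) -> dict:
        retrieved = retrieve(question, top_k)
        prompt = build_prompt(format_context(retrieved), question)
        return {
            "answer": generate(prompt),
            "question": question,
            "retrieved_context_ids": retrieved["retrieved_context_ids"],
            "retrieved_context": retrieved["retrieved_context"],
            "similarity_scores": retrieved["similarity_scores"],
        }

    return pipeline


# Hypothetical fakes standing in for Qdrant and the LLM.
def fake_retrieve(question: str, k: int) -> dict:
    ids = ["ID-1", "ID-2", "ID-3"][:k]
    return {
        "retrieved_context_ids": ids,
        "retrieved_context": [f"description of {i}" for i in ids],
        "retrieved_context_ratings": [4.0] * len(ids),
        "similarity_scores": [0.9, 0.8, 0.7][:k],
    }


def fake_format(context: dict) -> str:
    return "\n".join(context["retrieved_context_ids"])


def fake_prompt(context: str, question: str) -> str:
    return f"{context}\n{question}"


def fake_generate(prompt: str) -> str:
    # Report how many prompt lines arrived so the wiring can be checked.
    return f"saw {len(prompt.splitlines())} lines"


pipeline = make_rag_pipeline(fake_retrieve, fake_format, fake_prompt, fake_generate)
result = pipeline("Can I get some earphones?", top_k=2)
print(result["answer"])
```

In the notebook, the real functions would be plugged in with small adapters, for example `lambda q, k: retrieve_data(q, qdrant_client, k)` for retrieval and `lambda p: generate_answer(p).answer` for generation.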

#### RAG Pipeline with Grounding Context

In [16]:
class RAGUsedContext(BaseModel):
    id: str = Field(description="The ID of the item used to answer the question")
    description: str = Field(description="Short description of the item used to answer the question")


class RAGGenerationResponseWithReferences(BaseModel):
    answer: str = Field(description="The answer to the question")
    references: list[RAGUsedContext] = Field(description="List of items used to answer the question")
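With `response_model=RAGGenerationResponseWithReferences`, instructor asks the model for JSON matching that schema and validates the reply with Pydantic, raising an error when it does not fit (instructor can also re-ask the model; see its `max_retries` option). The stdlib-only sketch below imitates just the validation half for this nested shape, to make explicit what a well-formed reply looks like. The dataclasses and `parse_response` are hypothetical stand-ins, not instructor's or Pydantic's API, and the sample reply is invented.

```python
import json
from dataclasses import dataclass


@dataclass
class UsedContext:
    id: str
    description: str


@dataclass
class ResponseWithReferences:
    answer: str
    references: list[UsedContext]


def parse_response(raw: str) -> ResponseWithReferences:
    """Parse and type-check a JSON reply shaped like RAGGenerationResponseWithReferences."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    if not isinstance(data.get("answer"), str):
        raise ValueError("'answer' must be a string")
    refs = data.get("references")
    if not isinstance(refs, list):
        raise ValueError("'references' must be a list")
    references = []
    for ref in refs:
        # Each reference needs both string fields, like RAGUsedContext.
        if not isinstance(ref, dict) or not all(isinstance(ref.get(key), str) for key in ("id", "description")):
            raise ValueError(f"malformed reference: {ref!r}")
        references.append(UsedContext(id=ref["id"], description=ref["description"]))
    return ResponseWithReferences(answer=data["answer"], references=references)


raw_reply = json.dumps({
    "answer": "Here are some earphones you can consider: ...",
    "references": [{"id": "ITEM-1", "description": "Hypothetical wireless earbuds."}],
})
parsed = parse_response(raw_reply)
print(parsed.references[0].id)
```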

In [17]:
def get_embedding(text, model="text-embedding-3-small"):
    response = openai.embeddings.create(
        input = text,
        model = model,
    )

    return response.data[0].embedding

def retrieve_data(query, qdrant_client, k=5): 
    query_embedding = get_embedding(query)  
    results = qdrant_client.query_points(
        collection_name="Amazon-items-collection-00",  
        query=query_embedding, 
        limit=k,  
    ) 
    retrieved_context_ids = []
    retrieved_context = [] 
    retrieved_context_ratings = [] 
    similarity_scores = [] 
    
    for result in results.points: 
        retrieved_context_ids.append(result.payload["parent_asin"]) 
        retrieved_context.append(result.payload["description"]) 
        retrieved_context_ratings.append(result.payload["average_rating"]) 
        similarity_scores.append(result.score) 

    return {
        "retrieved_context_ids": retrieved_context_ids, 
        "retrieved_context": retrieved_context,
        "retrieved_context_ratings": retrieved_context_ratings, 
        "similarity_scores": similarity_scores, 
    }

def process_context(context):
    formatted_context = "" 

    for item_id, chunk, rating in zip(context["retrieved_context_ids"], context["retrieved_context"], context["retrieved_context_ratings"]):
        formatted_context += f"- ID: {item_id}, rating: {rating}, description: {chunk} \n"

    return formatted_context


def build_prompt(preprocessed_context, question): 

    prompt = f""" 
    You are a shopping assistant that can answer questions about the products in stock.

    You will be given a question and a list of context items.

    Instructions:
    - Answer the question based only on the provided context.
    - Never use the word 'context'; refer to it as 'the available products'.
    - Also, as output, you need to provide:

    * The answer to the question, based on the provided context.
    * The list of IDs of the chunks used to answer the question. Only return the ones that are used in the answer.
    * A short description (1-2 sentences) of each item, based on its description in the context.

    - The short description should include the name of the item.
    - The answer should contain detailed information about each product, with its specifications listed as bullet points.

    Context:
    {preprocessed_context} 

    Question: 
    {question} 
    """ 
    return prompt 

def generate_answer(prompt):  

    client = instructor.from_openai(openai.OpenAI())  

    response, raw_response = client.chat.completions.create_with_completion(
        model="gpt-4o-mini", 
        messages=[{"role": "system", "content": prompt}], 
        temperature=0.5, 
        response_model = RAGGenerationResponseWithReferences
    ) 

    return response

def rag_pipeline(question, qdrant_client, top_k=5):

    retrieved_context = retrieve_data(question, qdrant_client, top_k) 
    preprocessed_context = process_context(retrieved_context)
    prompt = build_prompt(preprocessed_context, question)
    answer = generate_answer(prompt)

    final_result = {
        "answer": answer.answer, 
        "references": answer.references,
        "question": question,  
        "retrieved_context_ids": retrieved_context["retrieved_context_ids"],   
        "retrieved_context": retrieved_context["retrieved_context"],   
        "similarity_scores": retrieved_context["similarity_scores"],   
    }
    return final_result
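The `references` in the result are only as trustworthy as the model that produced them. Nothing in `rag_pipeline` checks that each referenced ID was actually among the retrieved items. A cheap post-hoc grounding check, sketched below, compares the referenced IDs against `retrieved_context_ids` and separates grounded references from ones the model may have invented. `check_references` is a hypothetical helper that works on plain dicts. With the Pydantic objects in `result["references"]`, you would read `ref.id` instead of `ref["id"]`.

```python
def check_references(references: list[dict], retrieved_ids: list[str]) -> tuple[list[dict], list[dict]]:
    """Split references into (grounded, ungrounded) by whether their ID was retrieved."""
    allowed = set(retrieved_ids)
    grounded = [ref for ref in references if ref["id"] in allowed]
    ungrounded = [ref for ref in references if ref["id"] not in allowed]
    return grounded, ungrounded


# Hypothetical data: the model cites one retrieved ID and one it was never shown.
retrieved_ids = ["ITEM-1", "ITEM-2", "ITEM-3"]
references = [
    {"id": "ITEM-2", "description": "Wireless earbuds."},
    {"id": "ITEM-9", "description": "An item that was not retrieved."},
]
grounded, ungrounded = check_references(references, retrieved_ids)
print([ref["id"] for ref in ungrounded])
```

Ungrounded references can be dropped, logged, or used to trigger a retry, depending on how strict the application needs to be.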

In [18]:
result = rag_pipeline("Can I get some earphones?", qdrant_client, top_k=10)

In [19]:
result

{'answer': 'Here are some earphones you can consider:\n\n1. **Wireless Earbuds (Sport S23)**  \n   - **Rating:** 4.2  \n   - **Bluetooth Version:** 5.3  \n   - **Playback Time:** 37 hours  \n   - **Features:**  \n     - Deep bass sound quality  \n     - IPX7 waterproof rating  \n     - Ultra-lightweight design with a charging case  \n     - Smart touch controls for ease of use  \n\n2. **TELSOR Wireless Earbuds**  \n   - **Rating:** 4.3  \n   - **Bluetooth Version:** 5.1  \n   - **Playback Time:** 30 hours  \n   - **Features:**  \n     - Noise-cancelling microphone for clear calls  \n     - IPX7 waterproof design  \n     - Ergonomic design for comfort  \n     - Smart touch control for easy operation  \n\n3. **Open Ear Headphones**  \n   - **Rating:** 4.4  \n   - **Bluetooth Version:** 5.3  \n   - **Playback Time:** 60 hours  \n   - **Features:**  \n     - Open-ear design to hear surroundings  \n     - Comfortable fit with silicone ear hooks  \n     - IPX7 waterproof rating  \n\n4. **Sin

In [20]:
print(result["answer"])

Here are some earphones you can consider:

1. **Wireless Earbuds (Sport S23)**  
   - **Rating:** 4.2  
   - **Bluetooth Version:** 5.3  
   - **Playback Time:** 37 hours  
   - **Features:**  
     - Deep bass sound quality  
     - IPX7 waterproof rating  
     - Ultra-lightweight design with a charging case  
     - Smart touch controls for ease of use  

2. **TELSOR Wireless Earbuds**  
   - **Rating:** 4.3  
   - **Bluetooth Version:** 5.1  
   - **Playback Time:** 30 hours  
   - **Features:**  
     - Noise-cancelling microphone for clear calls  
     - IPX7 waterproof design  
     - Ergonomic design for comfort  
     - Smart touch control for easy operation  

3. **Open Ear Headphones**  
   - **Rating:** 4.4  
   - **Bluetooth Version:** 5.3  
   - **Playback Time:** 60 hours  
   - **Features:**  
     - Open-ear design to hear surroundings  
     - Comfortable fit with silicone ear hooks  
     - IPX7 waterproof rating  

4. **Siniffo Upgraded Bone Conduction Headphones** 