# Simple RAG Chatbot using Ollama & FAISS

### Table of Contents

1. **Load External Data**
2. **Generate Embeddings**
3. **Create Vector Index**
4. **Create Retriever**
5. **Complete RAG Chat App**
    * Create History Aware Retriever
    * Create RAG Chat App

### Installation

* **pip install ollama**
* **pip install faiss-cpu**
* **pip install requests beautifulsoup4 numpy**

In [2]:
import ollama

LLM = "llama2"

response = ollama.generate(model=LLM, prompt="Do you know about Claude 3?")

type(response)

dict

In [3]:
print(response["response"])

I'm not familiar with a person or product called "Claude 3." Could you please provide more context or information about who or what Claude 3 is? That will help me better understand your question and give you a more accurate response.


## 1. Load External Data

In [10]:
import requests
from bs4 import BeautifulSoup

urls = [
    "https://www.anthropic.com/news/releasing-claude-instant-1-2",
    "https://www.anthropic.com/news/claude-pro",
    "https://www.anthropic.com/news/claude-2",
    "https://www.anthropic.com/news/claude-2-1",
    "https://www.anthropic.com/news/claude-2-1-prompting",
    "https://www.anthropic.com/news/claude-3-family",
    "https://www.anthropic.com/claude",
]

docs = []

for url in urls:
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail early instead of indexing an HTTP error page
    soup = BeautifulSoup(response.text, 'html.parser')
    docs.append({"page-content": soup.text, "metadata": {"source": url}})
    
docs[0]

{'page-content': 'Releasing Claude Instant 1.2 \\ AnthropicClaudeAPIResearchCompanyNewsCareersProductReleasing Claude Instant 1.2Aug 9, 2023●1 min readBusinesses working with Claude can now access our latest version of Claude Instant, version 1.2, available through our API.\xa0Claude Instant is our faster, lower-priced yet still very capable model, which can handle a range of tasks including casual dialogue, text analysis, summarization, and document comprehension.Claude Instant 1.2 incorporates the strengths of our latest\xa0model Claude 2\xa0in real-world use cases and shows significant gains in key areas like math, coding, reasoning, and safety. It generates longer, more structured responses and follows formatting instructions better. Instant 1.2 also shows improvements in quote extraction, multilingual capabilities, and question answering.Claude Instant 1.2 outperforms Claude Instant 1.1 on math and coding, achieving 58.7% on the Codex evaluation compared to 52.8% in our previous m
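The scraped text above still contains non-breaking spaces (`\xa0`) and irregular whitespace. A minimal cleanup sketch using only the standard library follows; the helper name `normalize_whitespace` is hypothetical and not part of the original notebook, and it does not fix words that the page markup ran together.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (including non-breaking spaces) into one space."""
    # In Python 3, \s on str patterns also matches Unicode whitespace such as \xa0.
    return re.sub(r"\s+", " ", text).strip()


sample = "Claude\xa0Instant \n\n 1.2 "
print(normalize_whitespace(sample))
```

Applying a helper like this to `soup.text` before storing it in `docs` would be one way to give the embedding model cleaner input.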

## 2. Generate Embeddings

In [11]:
embedding_model = "llama2"

embeds = ollama.embeddings(model=embedding_model, prompt="Do you know about Claude 3?")

type(embeds)

dict

In [12]:
embeds["embedding"][:5], len(embeds["embedding"])

([1.039122223854065,
  -1.5061262845993042,
  0.9997416138648987,
  -0.21553783118724823,
  -2.3464105129241943],
 4096)

## 3. Create Vector Index


In [13]:
import faiss

dims = len(embeds["embedding"])  # 4096 for llama2; must match the embedding size

vector_index = faiss.IndexFlatL2(dims)

vector_index

<faiss.swigfaiss_avx2.IndexFlatL2; proxy of <Swig Object of type 'faiss::IndexFlatL2 *' at 0x7f64b188edf0> >

In [14]:
import numpy as np

docs_embeds = []

for doc in docs:
    resp = ollama.embeddings(model=embedding_model, prompt=doc["page-content"])
    docs_embeds.append(resp["embedding"])
    
# FAISS expects a 2-D float32 array of shape (n_docs, dims)
vector_index.add(np.array(docs_embeds, dtype="float32"))

vector_index.ntotal

7

## 4. Create Retriever

In [15]:
def retriever(query_embeds: np.ndarray, top_k: int=4)-> tuple[np.ndarray, np.ndarray]:
    # query_embeds must be a float32 array of shape (n_queries, dims)
    distances, indexes = vector_index.search(query_embeds, top_k)
    return distances, indexes

In [16]:
embeds = ollama.embeddings(model=embedding_model, prompt="Do you know about Claude 3?")

D, I = retriever(np.array(embeds["embedding"], dtype="float32").reshape(1,-1))

D, I

(array([[18513.047, 18802.807, 18904.535, 19758.133]], dtype=float32),
 array([[3, 5, 6, 0]]))

In [17]:
for idx in I[0]:
    print(docs[idx]["metadata"])

{'source': 'https://www.anthropic.com/news/claude-2-1'}
{'source': 'https://www.anthropic.com/news/claude-3-family'}
{'source': 'https://www.anthropic.com/claude'}
{'source': 'https://www.anthropic.com/news/releasing-claude-instant-1-2'}
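To make the `D` and `I` arrays above easier to read, here is a minimal pure-Python sketch of what a flat L2 index does conceptually: it compares the query against every stored vector and returns the `top_k` smallest squared Euclidean distances with their positions. The function name `brute_force_l2_search` and the toy vectors are illustrative assumptions, not part of FAISS; `IndexFlatL2` performs the same exhaustive comparison in optimized native code.

```python
def brute_force_l2_search(vectors, query, top_k=4):
    """Return (distances, indexes) of the top_k nearest vectors by squared L2 distance."""
    scored = []
    for idx, vec in enumerate(vectors):
        # Squared Euclidean distance: sum of squared coordinate differences.
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, idx))
    # Smallest distance first, i.e. most similar first.
    scored.sort()
    top = scored[:top_k]
    return [d for d, _ in top], [i for _, i in top]


# Toy 2-dimensional "embeddings" (hypothetical data for illustration).
toy_vectors = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [0.5, 0.0]]
distances, indexes = brute_force_l2_search(toy_vectors, [0.0, 0.0], top_k=3)
print(distances, indexes)
```

Smaller distances mean closer matches, so the first entry in `I` is the best hit. Values as large as those in `D` are plausible because the distances are squared and summed over 4096 dimensions.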


In [18]:
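def retrieve_relevant_docs(query: str, top_k: int=4)-> list[dict]:
    embeds = ollama.embeddings(model=embedding_model, prompt=query)
    # Pass top_k through so the caller's value is actually used
    D, I = retriever(np.array(embeds["embedding"], dtype="float32").reshape(1,-1), top_k)
    
    # Results keep the original order of `docs`, not the ranking order returned by FAISS
    return [doc for idx, doc in enumerate(docs) if idx in I[0]]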

In [19]:
relevant_docs = retrieve_relevant_docs("Do you know about Claude 3?")

for doc in relevant_docs:
    print(doc["metadata"])

{'source': 'https://www.anthropic.com/news/releasing-claude-instant-1-2'}
{'source': 'https://www.anthropic.com/news/claude-2-1'}
{'source': 'https://www.anthropic.com/news/claude-3-family'}
{'source': 'https://www.anthropic.com/claude'}


## 5. Complete RAG Chat App

### 5.1 Create History Aware Retriever

In [20]:
def create_history_aware_query(query: str, chat_history: list[dict]):
    complete_history = chat_history +\
    [{
        "role": "user",
        "content": query,
    },
    {
        "role": "user",
        "content": "Given the above conversation, generate a search query to look up in order to get information relevant to the conversation",
    }]
    
    resp = ollama.chat(model=LLM, messages=complete_history)
    
    return resp["message"]["content"]

In [21]:
chat_history = [{
    "role": "user",
    "content": "Do you know about Claude 3?",
},
{
    "role": "assistant",
    "content": "Yes, I am well aware of Claude 3 AI conversational bot from Anthropic which has 3 models (Opus, Haiku & Sonnet). Please provide more context info on how can I help you.",
}]

modified_query = create_history_aware_query("Tell me about different models in detail.", chat_history)

modified_query

'Sure! Based on our conversation, here\'s a potential search query you could use:\n\n"Claude 3 models: Opus, Haiku, Sonnet - features, differences, and applications"\n\nThis search query should return results that provide detailed information about each of the three models in Claude 3, including their respective features, capabilities, and potential applications.'
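The model's reply above wraps the actual search query in explanatory text, and embedding the whole reply may dilute the retrieval signal. One possible cleanup, sketched below, pulls out the first double-quoted phrase and falls back to the full reply when none is found. The helper name `extract_search_query` is hypothetical, and the approach assumes the model quotes its suggested query, which is not guaranteed.

```python
import re


def extract_search_query(llm_reply: str) -> str:
    """Return the first double-quoted phrase in the reply, or the stripped reply itself."""
    match = re.search(r'"([^"]+)"', llm_reply)
    if match:
        return match.group(1).strip()
    # No quoted phrase found: fall back to the whole reply.
    return llm_reply.strip()


reply = (
    "Sure! Based on our conversation, here's a potential search query you could use:\n\n"
    '"Claude 3 models: Opus, Haiku, Sonnet - features, differences, and applications"\n\n'
    "This search query should return results."
)
print(extract_search_query(reply))
```

An alternative design is to tighten the instruction sent to the model so that it returns only the query, but a post-processing step like this stays useful as a safety net.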

### 5.2 Create RAG Chat App

In [22]:
def create_prompt(query, context):
    return f"""
        Answer the following question based on the provided context only.
        
        <context>
        {context}
        </context>

        Question: {query}
    """

In [25]:
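def rag_chat_app(query: str, chat_history: list[dict])-> tuple[str, list[dict]]:
    
    # Rewrite the query so that retrieval takes the conversation so far into account
    modified_query = create_history_aware_query(query, chat_history)
    
    relevant_docs = retrieve_relevant_docs(modified_query)

    # Build the context from the retrieved documents only, not from every document
    context = "\n".join([doc["page-content"] for doc in relevant_docs])
    
    prompt = create_prompt(query, context)
    
    # Send the context-augmented prompt, not the bare query
    messages = chat_history + [{
        "role": "user",
        "content": prompt
    }]
    
    response = ollama.chat(model=LLM, messages=messages)
    
    return response["message"]["content"], relevant_docs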
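Full web pages can be long, and a local model has a limited context window, so it may help to cap how much of each retrieved document goes into the prompt. The sketch below is one way to do that; the helper name `build_context`, the `[source: ...]` label format, and the default of 1500 characters are assumptions for illustration, not values from the original notebook.

```python
def build_context(docs: list[dict], max_chars_per_doc: int = 1500) -> str:
    """Join document texts, each labelled with its source and cut to max_chars_per_doc."""
    parts = []
    for doc in docs:
        source = doc["metadata"]["source"]
        # Keep only the first max_chars_per_doc characters of each document.
        text = doc["page-content"][:max_chars_per_doc]
        parts.append(f"[source: {source}]\n{text}")
    return "\n\n".join(parts)


# Hypothetical documents in the same shape as the entries of `docs`.
sample_docs = [
    {"page-content": "a" * 10, "metadata": {"source": "u1"}},
    {"page-content": "bbb", "metadata": {"source": "u2"}},
]
print(build_context(sample_docs, max_chars_per_doc=4))
```

The returned string could be passed to `create_prompt` as the `context` argument in place of the plain `"\n".join(...)` result.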

In [26]:
answer, relevant_docs = rag_chat_app("Tell me about different models in detail.", chat_history)

print(answer)

Certainly! Claude 3 is an AI conversational bot developed by Anthropic, which offers three distinct models: Opus, Haiku, and Sonnet. Here's a detailed overview of each model:

1. **Opus**:
Opus is the flagship model of Claude 3, designed to generate coherent and contextually relevant text. It can engage in conversation, answer questions, and even create stories or poems. Opus has been trained on a diverse range of texts, including books, articles, and websites, allowing it to understand different writing styles and language nuances.
2. **Haiku**:
Haiku is a more lighthearted and playful model than Opus. It's designed to generate short, funny, or quirky responses, often with a touch of humor or sarcasm. Haiku can be used for entertainment purposes, such as creating silly chatbot interactions or generating humorous responses to user input.
3. **Sonnet**:
Sonnet is the most creative and expressive model in Claude 3. It's designed to generate poetic or artistic responses, often with a deep

In [27]:
for doc in relevant_docs:
    print(doc["metadata"])

{'source': 'https://www.anthropic.com/news/releasing-claude-instant-1-2'}
{'source': 'https://www.anthropic.com/news/claude-2-1'}
{'source': 'https://www.anthropic.com/news/claude-2-1-prompting'}
{'source': 'https://www.anthropic.com/claude'}


In [29]:
chat_history.append({
    "role": "user",
    "content": "Tell me about different models in detail."
})

chat_history.append({
    "role": "assistant",
    "content": answer
})

chat_history

[{'role': 'user', 'content': 'Do you know about Claude 3?'},
 {'role': 'assistant',
  'content': 'Yes, I am well aware of Claude 3 AI conversational bot from Anthropic which has 3 models (Opus, Haiku & Sonnet). Please provide more context info on how can I help you.'},
 {'role': 'user', 'content': 'Tell me about different models in detail.'},
 {'role': 'assistant',
  'content': "Certainly! Claude 3 is an AI conversational bot developed by Anthropic, which offers three distinct models: Opus, Haiku, and Sonnet. Here's a detailed overview of each model:\n\n1. **Opus**:\nOpus is the flagship model of Claude 3, designed to generate coherent and contextually relevant text. It can engage in conversation, answer questions, and even create stories or poems. Opus has been trained on a diverse range of texts, including books, articles, and websites, allowing it to understand different writing styles and language nuances.\n2. **Haiku**:\nHaiku is a more lighthearted and playful model than Opus. It
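Appending the user and assistant messages by hand after every turn is easy to get wrong, so the two `append` calls can be wrapped in a small helper. This is a minimal sketch; the name `append_turn` is hypothetical and the example messages are placeholders.

```python
def append_turn(chat_history: list[dict], query: str, answer: str) -> list[dict]:
    """Append one user/assistant exchange to the history in place and return it."""
    chat_history.append({"role": "user", "content": query})
    chat_history.append({"role": "assistant", "content": answer})
    return chat_history


history = []
append_turn(history, "Do you know about Claude 3?", "Yes, it has three models.")
print(len(history), history[-1]["role"])
```

Calling `append_turn(chat_history, query, answer)` after each `rag_chat_app` call would keep the history in the role/content format that `ollama.chat` expects.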

In [30]:
answer, relevant_docs = rag_chat_app("Tell me more about Claude 3 Opus.", chat_history)

print(answer)

Certainly! Claude 3 Opus is the flagship model of Claude 3, designed to generate coherent and contextually relevant text. Here are some key features and capabilities of Opus:

1. **Contextual understanding**:
Opus has been trained on a diverse range of texts, including books, articles, and websites. This training allows it to understand different writing styles, language nuances, and cultural references. As a result, Opus can engage in conversation that is both relevant and contextually appropriate.
2. **Natural language processing**:
Opus has been fine-tuned using advanced natural language processing techniques. This enables it to generate text that is not only grammatically correct but also sounds like it was written by a human. Opus can understand the nuances of language, such as tone, style, and syntax, making its responses feel more natural and human-like.
3. **Creative generation**:
Opus is capable of generating text that goes beyond simple answers or responses. It can create sto

In [31]:
for doc in relevant_docs:
    print(doc["metadata"])

{'source': 'https://www.anthropic.com/news/releasing-claude-instant-1-2'}
{'source': 'https://www.anthropic.com/news/claude-2-1'}
{'source': 'https://www.anthropic.com/news/claude-2-1-prompting'}
{'source': 'https://www.anthropic.com/claude'}


## Summary

In this video, I explained how to create a simple **RAG** chatbot application using **Ollama** and **FAISS**. Feel free to share your feedback and questions in the comments section.