# Simple RAG App using Ollama & FAISS

### Table of Contents

1. **Load External Data**
2. **Generate Embeddings**
3. **Create Vector Index**
4. **Create Retriever**
5. **Complete RAG App**

### Installation

* **pip install ollama**
* **pip install faiss-cpu**
* **pip install requests beautifulsoup4 numpy** (used below for scraping and array handling)

In [64]:
import ollama

LLM = "llama2"

response = ollama.generate(model=LLM, prompt="Do you know about Claude 3?")

type(response)

dict

In [65]:
print(response["response"])


I apologize, but I'm a large language model, I don't have access to information about a person or entity called "Claude 3." Could you please provide more context or clarify who or what Claude 3 refers to?


## 1. Load External Data

In [4]:
import requests
from bs4 import BeautifulSoup

urls = [
    "https://www.anthropic.com/news/releasing-claude-instant-1-2",
    "https://www.anthropic.com/news/claude-pro",
    "https://www.anthropic.com/news/claude-2",
    "https://www.anthropic.com/news/claude-2-1",
    "https://www.anthropic.com/news/claude-2-1-prompting",
    "https://www.anthropic.com/news/claude-3-family",
    "https://www.anthropic.com/claude",
]

docs = []

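for url in urls:
    response = requests.get(url)
    response.raise_for_status()  # stop early instead of indexing an error page
    soup = BeautifulSoup(response.text, 'html.parser')
    # Keep the page text together with its source URL for later attribution.
    docs.append({"page-content": soup.text, "metadata": {"source": url}})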
    
docs[0]

{'page-content': 'Releasing Claude Instant 1.2 \\ AnthropicClaudeAPIResearchCompanyNewsCareersProductReleasing Claude Instant 1.2Aug 9, 2023●1 min readBusinesses working with Claude can now access our latest version of Claude Instant, version 1.2, available through our API.\xa0Claude Instant is our faster, lower-priced yet still very capable model, which can handle a range of tasks including casual dialogue, text analysis, summarization, and document comprehension.Claude Instant 1.2 incorporates the strengths of our latest\xa0model Claude 2\xa0in real-world use cases and shows significant gains in key areas like math, coding, reasoning, and safety. It generates longer, more structured responses and follows formatting instructions better. Instant 1.2 also shows improvements in quote extraction, multilingual capabilities, and question answering.Claude Instant 1.2 outperforms Claude Instant 1.1 on math and coding, achieving 58.7% on the Codex evaluation compared to 52.8% in our previous m
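
The scraped text above still contains non-breaking spaces (`\xa0`) and irregular whitespace. A minimal sketch of normalizing it before embedding, using only the standard library; `clean_text` is a hypothetical helper and not part of the original notebook:

```python
import unicodedata


def clean_text(text: str) -> str:
    """Normalize Unicode compatibility characters and collapse whitespace."""
    # NFKC maps characters such as the non-breaking space to plain equivalents.
    normalized = unicodedata.normalize("NFKC", text)
    # split() with no arguments splits on any whitespace run; rejoin with single spaces.
    return " ".join(normalized.split())


sample = "our latest\xa0model Claude 2\xa0in   real-world\nuse cases"
cleaned = clean_text(sample)
print(cleaned)
```

If this is useful, it could be applied when the page is stored, for example as `clean_text(soup.text)`.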

## 2. Generate Embeddings

In [67]:
embedding_model = "llama2"

embeds = ollama.embeddings(model=embedding_model, prompt="Do you know about Claude 3?")

type(embeds)

dict

In [68]:
embeds["embedding"][:5], len(embeds["embedding"])

([1.039122223854065,
  -1.5061262845993042,
  0.9997416138648987,
  -0.21553783118724823,
  -2.3464105129241943],
 4096)

## 3. Create Vector Index

In [69]:
import faiss

dims = 4096  # must match the embedding length reported above

vector_index = faiss.IndexFlatL2(dims)

vector_index

<faiss.swigfaiss_avx2.IndexFlatL2; proxy of <Swig Object of type 'faiss::IndexFlatL2 *' at 0x7f2635a8aa00> >
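
`IndexFlatL2` performs an exhaustive search and reports squared Euclidean (L2) distances. A pure-Python sketch of the same computation on toy vectors, for intuition only; `squared_l2` and `brute_force_search` are illustrative names, not FAISS functions:

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(vectors, query, top_k=2):
    """Compare the query against every stored vector and keep the closest top_k."""
    scored = sorted((squared_l2(vec, query), idx) for idx, vec in enumerate(vectors))
    top = scored[:top_k]
    distances = [dist for dist, _ in top]
    indexes = [idx for _, idx in top]
    return distances, indexes


toy_vectors = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
distances, indexes = brute_force_search(toy_vectors, query=[1.0, 0.0], top_k=2)
print(distances, indexes)
```

Smaller distances mean closer vectors, which is why the retriever later treats the first returned index as the best match.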

In [70]:
import numpy as np

docs_embeds = []

for doc in docs:
    resp = ollama.embeddings(model=embedding_model, prompt=doc["page-content"])
    docs_embeds.append(resp["embedding"])
    
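# FAISS stores float32 vectors, so convert explicitly rather than relying on an implicit cast.
vector_index.add(np.array(docs_embeds, dtype="float32"))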

vector_index.ntotal

7
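
Every vector added to the index must have exactly `dims` entries. A small sketch of a check that could run before `vector_index.add` and that names the offending document; `check_embedding_dims` is a hypothetical helper:

```python
def check_embedding_dims(embeddings: list[list[float]], dims: int) -> None:
    """Raise a descriptive error if any embedding has the wrong length."""
    for position, vector in enumerate(embeddings):
        if len(vector) != dims:
            raise ValueError(
                f"embedding {position} has {len(vector)} dimensions, expected {dims}"
            )


# Two well-formed toy vectors pass silently.
check_embedding_dims([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], dims=3)

# A short vector is reported together with its position in the list.
try:
    check_embedding_dims([[0.1, 0.2, 0.3], [0.4]], dims=3)
except ValueError as err:
    message = str(err)
    print(message)
```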

## 4. Create Retriever

In [71]:
def retriever(query_embeds: np.ndarray, top_k: int = 4) -> tuple[np.ndarray, np.ndarray]:
    # query_embeds must be a 2-D array of shape (n_queries, dims).
    distances, indexes = vector_index.search(query_embeds, top_k)
    return distances, indexes

In [72]:
embeds = ollama.embeddings(model=embedding_model, prompt="Do you know about Claude 3?")

D, I = retriever(np.array(embeds["embedding"], dtype="float32").reshape(1, -1))

D, I

(array([[18513.047, 18802.807, 18904.535, 19758.133]], dtype=float32),
 array([[3, 5, 6, 0]]))

In [73]:
for idx in I[0]:
    print(docs[idx]["metadata"])

{'source': 'https://www.anthropic.com/news/claude-2-1'}
{'source': 'https://www.anthropic.com/news/claude-3-family'}
{'source': 'https://www.anthropic.com/claude'}
{'source': 'https://www.anthropic.com/news/releasing-claude-instant-1-2'}
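
`D` holds the distances and `I` the matching positions in `docs`, both sorted from closest to farthest. A sketch that pairs them with their sources, using the values printed above; `rank_sources` is a hypothetical helper, and plain lists stand in for the NumPy arrays:

```python
def rank_sources(distances, indexes, sources):
    """Pair each retrieved index with its source URL and distance, best match first."""
    return [(sources[idx], dist) for dist, idx in zip(distances[0], indexes[0])]


sources = [
    "https://www.anthropic.com/news/releasing-claude-instant-1-2",
    "https://www.anthropic.com/news/claude-pro",
    "https://www.anthropic.com/news/claude-2",
    "https://www.anthropic.com/news/claude-2-1",
    "https://www.anthropic.com/news/claude-2-1-prompting",
    "https://www.anthropic.com/news/claude-3-family",
    "https://www.anthropic.com/claude",
]

# Values copied from the search output above.
example_distances = [[18513.047, 18802.807, 18904.535, 19758.133]]
example_indexes = [[3, 5, 6, 0]]

ranked = rank_sources(example_distances, example_indexes, sources)
for source, distance in ranked:
    print(f"{distance:10.3f}  {source}")
```

Note that the distances are close to each other, so the ranking separates these whole-page embeddings only weakly.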


In [74]:
def retrieve_relevant_docs(query: str, top_k: int = 4) -> list[dict]:
    embeds = ollama.embeddings(model=embedding_model, prompt=query)
    D, I = retriever(np.array(embeds["embedding"], dtype="float32").reshape(1, -1), top_k)

    # Iterate over the returned indexes so documents come back ranked by distance.
    # FAISS pads with -1 when fewer than top_k vectors exist, so skip those.
    return [docs[idx] for idx in I[0] if idx >= 0]

In [75]:
relevant_docs = retrieve_relevant_docs("Do you know about Claude 3?")

for doc in relevant_docs:
    print(doc["metadata"])

{'source': 'https://www.anthropic.com/news/claude-2-1'}
{'source': 'https://www.anthropic.com/news/claude-3-family'}
{'source': 'https://www.anthropic.com/claude'}
{'source': 'https://www.anthropic.com/news/releasing-claude-instant-1-2'}


## 5. Complete RAG App

In [76]:
def create_prompt(query: str, context: str)-> str:
    return f"""
        Answer the following question based on the provided context only.
        
        <context>
        {context}
        </context>

        Question: {query}
    """

In [77]:
def rag_app(query: str)-> str:
    
    relevant_docs = retrieve_relevant_docs(query)

    context = "\n".join([doc["page-content"] for doc in relevant_docs])
    
    prompt = create_prompt(query, context)
    
    response = ollama.generate(model=LLM, prompt=prompt)
    
    return response["response"]
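
Because the f-string in `create_prompt` is indented inside the function, its leading spaces become part of the prompt sent to the model. One possible alternative, sketched here, dedents a template once and fills it with `str.format`; `PROMPT_TEMPLATE` and `build_prompt` are illustrative names, not part of the original notebook:

```python
from textwrap import dedent

# Dedent the template once; the placeholders are filled in later with str.format.
PROMPT_TEMPLATE = dedent(
    """\
    Answer the following question based on the provided context only.

    <context>
    {context}
    </context>

    Question: {query}
    """
)


def build_prompt(query: str, context: str) -> str:
    """Fill the dedented template with the question and the retrieved context."""
    return PROMPT_TEMPLATE.format(query=query, context=context)


prompt = build_prompt("Do you know about Claude 3?", "Claude 3 is a family of models.")
print(prompt)
```

Formatting after dedenting matters: multi-line context inserted before `dedent` runs would change the common indentation and leave the template lines indented.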

In [78]:
response = rag_app("Do you know about Claude 3?")

print(response)

    Yes, I am familiar with Claude 3. It is a family of foundational AI models that can be used in various applications such as customer interactions, content moderation, and cost-saving tasks. Claude 3 consists of three models: Haiku, Sonnet, and Opus, each with its own unique capabilities and strengths.

Claude 3 offers several advantages over other AI models on the market, including faster execution, lower costs, and improved intelligence. The models are also designed to be secure, accessible, and trustworthy, making them ideal for enterprise use cases.

Some of the key features of Claude 3 include:

* Advanced reasoning capabilities: Claude 3 can perform complex cognitive tasks beyond simple pattern recognition or text generation.
* Vision analysis: Transcribe and analyze almost any static image, from handwritten notes and graphs to photographs.
* Code generation: Start creating websites in HTML and CSS, turning images into structured JSON data, or debugging complex code bases.
* M
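
`rag_app` joins four full pages into one context, which can exceed the model's context window (4,096 tokens for Llama 2). A rough sketch of capping the context by character count; `build_context` and the 6,000-character default are assumptions for illustration, and characters are only a crude proxy for tokens:

```python
def build_context(docs: list[dict], max_chars: int = 6000) -> str:
    """Concatenate page texts in ranked order until the character budget is spent."""
    parts = []
    remaining = max_chars
    for doc in docs:
        if remaining <= 0:
            break
        # Take as much of this page as the remaining budget allows.
        text = doc["page-content"][:remaining]
        parts.append(text)
        remaining -= len(text)
    # The joining newlines are not counted against the budget.
    return "\n".join(parts)


toy_docs = [
    {"page-content": "a" * 10, "metadata": {"source": "first"}},
    {"page-content": "b" * 10, "metadata": {"source": "second"}},
    {"page-content": "c" * 10, "metadata": {"source": "third"}},
]
context = build_context(toy_docs, max_chars=15)
print(context)
```

Since earlier documents consume the budget first, this approach works best when the retrieved documents are passed in ranked order.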

## Summary

In this video, I explained how to create a simple **RAG** application using LLMs available through **Ollama**, with **FAISS** as the vector index. Feel free to share your views and questions in the comments section.