# RAG-based chatbot using DeepSeek-R1

Retrieval-Augmented Generation (RAG) is a framework that combines traditional information retrieval with large language models (LLMs). By retrieving relevant passages from external sources and supplying them to the model at query time, it improves the accuracy and relevance of the model's output. It is used to build AI applications that generate precise, grounded, and contextually relevant answers.

### How does RAG work? 
- Retrieval: Search an external knowledge source (documents, databases, web pages) for passages relevant to the user's query
- Augmentation: Add the retrieved passages to the prompt so the model has the relevant facts as context
- Generation: The LLM uses the augmented prompt to produce an answer grounded in the retrieved passages
- Steps
    1. Data collection: Gather the source documents the chatbot should be able to answer from.
    2. Data chunking: Break the data down into smaller, more manageable pieces. This improves efficiency, since the system can fetch only the most relevant pieces instead of processing entire documents.
    3. Document embeddings: Each chunk needs to be converted into a vector representation. This means transforming text into embeddings, which are numeric representations that capture the semantic meaning of the text. Embeddings let the system match a user query with relevant chunks based on meaning rather than a simple word-to-word comparison.
    4. Handling user queries: The query must also be converted into an embedding. The same model must be used for both the document and query embeddings so that the two live in the same vector space. The system compares the query embedding with the document embeddings and retrieves the chunks whose embeddings are most similar, using a measure such as cosine similarity or Euclidean distance.
    5. Generating responses with an LLM: The retrieved text chunks, along with the original user query, are fed into a language model. The model uses this information to generate a coherent response to the user's question, typically through a chat interface.
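
Step 4 above can be sketched in a few lines of plain Python. This is a minimal illustration, not the retrieval code used later in this notebook: the three-dimensional vectors and chunk labels are made-up toy values (real embeddings have hundreds or thousands of dimensions), and `top_k` is a hypothetical helper.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Straight-line distance between the two vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query_vec, chunk_vecs, k=2):
    # Hypothetical helper: rank chunks by cosine similarity, highest first
    scored = [(label, cosine_similarity(query_vec, vec))
              for label, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings (assumed values, for illustration only)
toy_chunks = {
    "cats purr when content": [0.9, 0.1, 0.0],
    "cats hiss when threatened": [0.8, 0.3, 0.1],
    "the stock market fell": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]

results = top_k(toy_query, toy_chunks, k=2)
for label, score in results:
    print(f"{score:.3f}  {label}")
```

With cosine similarity a higher score means a closer match, while with Euclidean distance a lower value does, so the sort direction has to follow the chosen measure.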


### Why is RAG useful?
- More accurate: Grounding answers in retrieved passages can make responses more precise, reliable, and context-specific
- More relevant: RAG can provide up-to-date information by connecting LLMs to news sites, social media, and other frequently updated sources
- Less need for retraining: New information can be added to the retrieval index instead of retraining or fine-tuning the LLM on new examples

### Why use DeepSeek-R1 with RAG?
- Cost and privacy benefits: DeepSeek-R1 can be run locally, which avoids API costs and keeps sensitive data on your own machine.
- Offline capabilities: Once the model has been downloaded, the pipeline can run without internet access, provided the source documents are also stored locally.

In [1]:
import ollama
import re
from concurrent.futures import ThreadPoolExecutor
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from chromadb.config import Settings
from chromadb import Client
from langchain_community.vectorstores import Chroma

In [2]:
response = ollama.chat(
    model="deepseek-r1:1.5b",
    messages=[
        {"role": "user", "content": "What are the different forms of cat communication?"},
    ],
)
print(response["message"]["content"])

<think>
Okay, so I'm trying to figure out what different forms of cat communication there are. I know a little bit about birds and their communication, but cats have been on my mind for some time. Let me think through this step by step.

First off, I remember that humans use various methods to communicate. It's all about conveying emotions or information in a way that makes sense to the other person. Maybe cats do something similar, but they might have their own unique ways of sharing sounds and sounds with others. 

I recall that some cats are vocalists. They can emit loud noises from their paws or faces. That seems pretty straightforward. People often use this when they're happy, sad, or excited. It's like a way to express themselves physically. So maybe cats have a similar system where they use their paws or facial expressions to signal their mood.

Then there are the sounds that come out of their mouths and snouts. I've seen some cats making clear whistles when they're happy or cal

In [3]:
from langchain_community.document_loaders.text import TextLoader
loader = TextLoader('cat-facts.txt', encoding='utf8')
text_file = loader.load()
text_file[0].metadata

{'source': 'cat-facts.txt'}

In [4]:
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://en.wikipedia.org/wiki/Cat")
wiki = loader.load()
wiki[0].metadata

USER_AGENT environment variable not set, consider setting it to identify your requests.


{'source': 'https://en.wikipedia.org/wiki/Cat',
 'title': 'Cat - Wikipedia',
 'language': 'en'}

Chunking text into manageable segments is crucial for the quality and efficiency of retrieval. Source: https://medium.com/@ayoubkirouane3/simple-chunking-strategies-for-rag-applications-part-1-d56903b167c5

The main factors are:
- Chunk Size: The size of each chunk should strike a balance between maintaining enough context for meaningful analysis and avoiding excessively large chunks that could affect focus. Smaller chunks (e.g., 256 to 512 tokens) are suited for detailed, granular tasks, whereas larger chunks may be better for understanding broader themes.
- Chunk Overlap: An overlap of 100–200 tokens is generally effective. This overlap helps maintain continuity and context between chunks, ensuring that segmentation does not disrupt the flow and coherence of the text.
- Task Specificity: The nature of your task significantly impacts the optimal chunking strategy. For tasks involving precise information retrieval, smaller, more focused chunks can enhance retrieval accuracy. Conversely, tasks requiring complex reasoning or broader context might benefit from larger chunks that capture more comprehensive information.
- Chunking Strategy: The right chunking strategy depends on your application’s requirements and constraints. For simple, structured content, character splitting or recursive chunking may suffice. For more complex documents, document-specific (like reports or manuals) or semantic chunking might be necessary to preserve context and meaning. 
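
To make the interaction between chunk size and overlap concrete, here is a minimal sliding-window sketch. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and sentences). `chunk_with_overlap` is a hypothetical helper, and sizes are counted in characters rather than tokens.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    # Hypothetical helper: fixed-size windows that share `chunk_overlap` characters
    if chunk_size <= 0 or chunk_overlap < 0:
        raise ValueError("chunk_size must be positive and chunk_overlap non-negative")
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for c in chunks:
    print(c)
```

Each chunk starts with the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.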

In [5]:
# chunk_size and chunk_overlap are measured in characters by default, not tokens
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
text_chunks = text_splitter.split_documents(text_file)
wiki_chunks = text_splitter.split_documents(wiki)
documents = text_chunks + wiki_chunks
print(len(documents))

224


In [6]:
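# Initialize Ollama embeddings using DeepSeek-R1.
# Note: DeepSeek-R1 is a generative model; a dedicated embedding model
# is usually a better fit for retrieval and may return more relevant chunks.
embedding_function = OllamaEmbeddings(model="deepseek-r1:1.5b")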

# Parallelize embedding generation
def generate_embedding(chunk):
    return embedding_function.embed_query(chunk.page_content)

with ThreadPoolExecutor() as executor:
    embeddings = list(executor.map(generate_embedding, documents))
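
One detail worth spelling out: the indexing code that follows pairs `embeddings[idx]` with `documents[idx]`, which relies on `executor.map` returning results in input order even when the individual calls finish out of order. The sketch below demonstrates that behaviour with a stand-in function. `fake_embed` and its sleep delays are hypothetical and only simulate variable latency; they do not call Ollama.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_embed(text):
    # Hypothetical stand-in for an embedding call.
    # Shorter texts sleep longer, so they tend to finish after longer ones.
    time.sleep(0.05 / len(text))
    return [float(len(text))]

texts = ["a", "bbbb", "cc", "dddddddd"]

with ThreadPoolExecutor(max_workers=4) as executor:
    # map() yields results in the order of `texts`, not in completion order
    vectors = list(executor.map(fake_embed, texts))

for text, vec in zip(texts, vectors):
    print(text, vec)
```

If the code used `executor.submit` together with `as_completed` instead, the results would arrive in completion order and would need to be matched back to their chunks explicitly.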


In [7]:
# Initialize the Chroma client and get or create the collection (existing contents are not reset)
client = Client(Settings())
collection = client.get_or_create_collection(name="cats")

In [8]:
# Add all chunks and their precomputed embeddings to Chroma in one batch
collection.add(
    documents=[chunk.page_content for chunk in documents],
    metadatas=[{'id': idx} for idx in range(len(documents))],
    embeddings=embeddings,
    ids=[str(idx) for idx in range(len(documents))],
)

In [9]:
# Initialize retriever using Ollama embeddings for queries
retriever = Chroma(collection_name="cats", client=client, embedding_function=embedding_function).as_retriever()


In [10]:
def retrieve_context(question):
    # Retrieve relevant documents
    results = retriever.invoke(question)
    # Combine the retrieved content
    context = "\n\n".join([doc.page_content for doc in results])
    return context

In [11]:
context = retrieve_context('What are the different forms of cat communication?')

In [12]:
print(context)

cats, at first by staring, hissing, and growling, and, if that does not work, by short and violent, noisy attacks. Although cats do not have a social survival strategy or herd behavior, they always hunt alone.[102]

Most cats have five claws on their front paws and four on their rear paws. The dewclaw is proximal to the other claws. More proximally is a protrusion which appears to be a sixth "finger". This special feature of the front paws on the inside of the wrists has no function in normal walking but is thought to be an antiskidding device used while jumping. Some cat breeds are prone to having extra digits ("polydactyly").[59]

Several males, called tomcats, are attracted to a female in heat. They fight over her, and the victor wins the right to mate. At first, the female rejects the male, but eventually, the female allows the male to mate. The female utters a loud yowl as the male pulls out of her because a male cat's penis has a band of about 120–150 backward-pointing penile spi

In [13]:
def query_deepseek(question, context):
    # Format the input prompt
    formatted_prompt = f"Question: {question}\n\nContext: {context}"
    # Query DeepSeek-R1 using Ollama
    response = ollama.chat(
        model="deepseek-r1:1.5b",
        messages=[{'role': 'user', 'content': formatted_prompt}]
    )
    # Clean and return the response
    response_content = response['message']['content']
    final_answer = re.sub(r'<think>.*?</think>', '', response_content, flags=re.DOTALL).strip()
    return final_answer

def ask_question(question):
    # Retrieve context and generate an answer using RAG
    context = retrieve_context(question)
    answer = query_deepseek(question, context)
    return answer
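
The regular expression in `query_deepseek` removes a complete `<think>...</think>` block. As a hedged extension, the sketch below also handles a response that ends before the closing tag, for example when generation is cut off; in that case no final answer exists yet and an empty string is returned. `strip_reasoning` is a hypothetical helper, not part of Ollama or LangChain, and the sample strings are made up.

```python
import re

def strip_reasoning(text):
    # Remove every complete <think>...</think> block
    cleaned = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # If an opening tag remains, the reasoning was never closed:
    # drop everything from that tag onward
    cleaned = re.sub(r"<think>.*", "", cleaned, flags=re.DOTALL)
    return cleaned.strip()

# Made-up sample responses for illustration
complete = "<think>\nLet me reason...\n</think>\n\nCats purr, hiss, and meow."
truncated = "<think>\nOkay, so I'm trying to figure out"

print(repr(strip_reasoning(complete)))
print(repr(strip_reasoning(truncated)))
```

Returning an empty string for a truncated response lets the caller detect the problem and retry, instead of showing the user raw reasoning text.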

In [15]:
answer = ask_question('What are the different forms of cat communication?')
print(answer)

The different forms of cat communication include:

1. **Staring**: A natural response often used to approach or keep others safe.
2. **Hanging down (Hiss)**: A low, resonant sound that can be heard and is used for various reasons.
3. **Growling**: A loud, resonant cry that conveys fear or excitement.
4. **Dewclaw**: A sixth finger on the front paws that allows easier grip and manipulation of objects.
5. **Yowl (from Males)**: A loud sound often associated with anger or dominance.
6. **Clatter**: The crunching sound from the back of the cat's legs when something breaks.
7. **Tail Thrash**: Rasing the tail, often used to relieve tension and calm others.
8. **Bright Light during Social Behavior (Males)**: Cats emit a bright yellow/orange light during active social interactions.
9. **Tail Bob**: A bobbing motion of the tail used to alleviate stress or relieve tension.

These vocalizations can vary depending on the breed, as some may exhibit more complex communication patterns.


In [16]:
# Set up the Gradio interface
import gradio as gr
interface = gr.Interface(
    fn=ask_question,
    inputs="text",
    outputs="text",
    title="RAG Chatbot: Cats",
    description="Ask any question about cats. Powered by DeepSeek-R1."
)
interface.launch()


* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.


