# Week 2 - Research Guide: Key Technologies for Fun Response Mode Chatbot

## Take a moment to research the following terms:


1. LangChain
2. FAISS
3. RAG
4. Embeddings



---

### LangChain, FAISS, RAG & Embeddings

---

## 🦜 **LangChain**

### **What is LangChain?**
LangChain is a framework for developing applications powered by large language models (LLMs). LangChain simplifies every stage of the LLM application lifecycle by providing modular components that make building AI applications much easier.

### **Key Features & Capabilities:**
- **Modular Architecture**: LangChain's components (prompts, models, retrievers, memory, tools) are modular, so you can mix and match them
- **Tool Integration**: Connect LLMs to APIs, databases, search engines, and external services
- **Memory Management**: Memory, tool orchestration, agent control, and context retention are integrated into one unified structure
- **Multi-Agent Systems**: LangChain, together with LangGraph, is a popular choice for building autonomous multi-agent systems

### **Why Use LangChain for Chatbots?**
- **Chain Different Operations**: Link together prompts, models, and tools in sequence
- **Context Retention**: By default, a model keeps no information between requests; LangChain's memory components store the conversation and pass it back in with each new request
- **Real-time Data Access**: Connect your chatbot to live data sources
- **Easy Integration**: Works with OpenAI, Anthropic, Google, and many other AI providers

### **LangChain in 2025:**
Is LangChain still needed in 2025? For many projects, **yes**. Its ecosystem has grown to include:
- **LangGraph**: For building complex, stateful agent workflows
- **LangSmith**: For debugging and monitoring AI applications
- **Better Documentation**: Improved learning resources and examples

---

## 🔍 **FAISS (Facebook AI Similarity Search)**

### **What is FAISS?**
Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM
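At its simplest, similarity search compares a query vector with every stored vector and keeps the closest ones. The plain-Python sketch below shows this exact (brute-force) L2 search, which is what FAISS's flat index computes; FAISS adds heavily optimized implementations and approximate indexes on top. The helper names `l2_distance` and `brute_force_search` are illustrative assumptions, not FAISS functions.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(
    database: List[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k stored vectors closest to the query."""
    scored = [(i, l2_distance(vec, query)) for i, vec in enumerate(database)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Rough FAISS equivalent (requires `pip install faiss-cpu` and float32 NumPy arrays):
#   index = faiss.IndexFlatL2(dim); index.add(xb); distances, ids = index.search(xq, k)
# Note that IndexFlatL2 reports squared L2 distances, which gives the same ranking.
database = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(brute_force_search(database, query=[0.9, 1.2], k=2))
```

Checking every vector is exact but grows linearly with the collection size, which is why FAISS also offers approximate indexes for very large data sets.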

### **Key Features:**
- **Lightning Fast**: The FAISS authors reported nearest-neighbor search implementations for billion-scale data sets that were about 8.5x faster than the previously reported state of the art
- **Memory Efficient**: Some methods, such as those based on binary vectors and compact quantization codes, store only a compressed representation of the vectors and do not need to keep the originals
- **GPU Acceleration**: Faiss supports GPU acceleration, significantly enhancing the speed of vector operations and making it suitable for real-time applications
- **Scalable**: Can handle millions to billions of vectors
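One simple way to keep only a compressed representation is scalar quantization: each float is mapped to an 8-bit code, which shrinks float32 storage by 4x at the cost of small reconstruction errors. The sketch below illustrates the idea in plain Python. It is not FAISS's implementation (FAISS provides quantized indexes such as `IndexScalarQuantizer` and the product-quantization index `IndexPQ`), and `quantize`/`dequantize` are hypothetical helper names.

```python
from typing import List, Tuple


def quantize(vector: List[float]) -> Tuple[List[int], float, float]:
    """Map each float to an integer code in 0..255 (1 byte instead of 4 for float32)."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # avoid division by zero for constant vectors
    codes = [round((x - lo) / scale) for x in vector]
    return codes, lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    """Approximately reconstruct the original floats from the 8-bit codes."""
    return [lo + c * scale for c in codes]


original = [0.2, -0.4, 0.7, 0.05]
codes, lo, scale = quantize(original)
approx = dequantize(codes, lo, scale)
print(codes)
print(approx)  # close to the original values, but not identical
```

Each reconstructed value is off by at most half a quantization step, which is usually small enough that nearest-neighbor rankings barely change.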

### **Why Use FAISS in Chatbots?**
- **Fast Document Retrieval**: Quickly find the most relevant information from your knowledge base
- **Similarity Search**: Find documents similar to user queries in milliseconds
- **Memory Optimization**: Faiss focuses on methods that compress the original vectors, because they're the only ones that scale to data sets of billions of vectors
- **Easy Integration**: LangChain's FAISS integration lives in the `langchain-community` package

### **FAISS vs Other Vector Databases:**
While there are alternatives such as Pinecone (a managed vector database) and ChromaDB, FAISS is a free, open-source library for fast similarity search and clustering of dense vectors, and it integrates with LangChain through `langchain-community`.

---

##  **RAG (Retrieval-Augmented Generation)**

### **What is RAG?**
Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model so that it references an authoritative knowledge base outside its training data before generating a response.

### **How RAG Works:**
1. **User Query**: Person asks a question
2. **Retrieval**: System searches knowledge base for relevant information
3. **Augmentation**: The system adds the relevant retrieved text to the user's prompt as context
4. **Generation**: The LLM generates an answer using both its training and the retrieved information
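The four steps can be sketched in plain Python. This is a minimal illustration under loud assumptions: retrieval uses simple word overlap as a stand-in for embedding similarity, the documents are adapted from the FAQ used later in this notebook, and `tokenize`, `retrieve`, and `augment` are hypothetical helpers. A real system would embed the texts and search them with FAISS, then send the prompt to an LLM.

```python
import re
from typing import List, Set

knowledge_base = [
    "We are open 9am-9pm, Mon-Sat.",
    "30 days return with full refund.",
    "Yes, we ship worldwide, including Australia.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase words and numbers, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: List[str], k: int = 1) -> List[str]:
    # Step 2 (Retrieval): rank documents by how many words they share with the query.
    # Toy stand-in for embedding similarity search.
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda doc: len(query_words & tokenize(doc)), reverse=True)
    return ranked[:k]


def augment(query: str, context: List[str]) -> str:
    # Step 3 (Augmentation): place the retrieved text in the prompt.
    context_block = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}\nAnswer:"


question = "Do you ship to Australia?"          # Step 1: user query
prompt = augment(question, retrieve(question, knowledge_base))
print(prompt)  # Step 4 (Generation) would send this prompt to an LLM
```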

### **Why RAG is Revolutionary:**
- **Up-to-date Information**: RAG extends the already powerful capabilities of LLMs to specific domains or an organization's internal knowledge base, all without the need to retrain the model
- **Reduces Hallucinations**: Grounds responses in factual, retrieved data
- **Cost-Effective**: It is a cost-effective approach to improving LLM output so it remains relevant, accurate, and useful in various contexts
- **Domain-Specific Knowledge**: Add company-specific or specialized information

### **RAG in 2025:**
In 2025, advanced RAG systems address the limitations of basic RAG pipelines with a variety of innovations and architectural changes, pushing RAG from useful to indispensable.

**New RAG Capabilities:**
- **Adaptive RAG**: Systems now dynamically adjust retrieval strategies based on query intent
- **Multi-modal RAG**: Handle text, images, and other data types
- **Self-Correcting RAG**: Systems that validate their own outputs

---

##  **Embeddings**

### **What are Embeddings?**
Technically, embeddings are vectors created by machine learning models for the purpose of capturing meaningful data about each object - they convert words, images, and other data into numbers that computers can understand and compare.

### **How Embeddings Work:**
Embeddings convert real-world objects into complex mathematical representations that capture inherent properties and relationships between real-world data

**Simple Example:**
- "cat" might become `[0.2, -0.4, 0.7]`
- "dog" might become `[0.3, -0.5, 0.6]`
- "car" might become `[0.8, 0.1, -0.2]`

Notice how "cat" and "dog" are closer together (more similar) than either is to "car"!
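One common way to measure how close two embeddings are is cosine similarity. The sketch below computes it for the three toy vectors above; they are illustrative values, not output from a real embedding model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = unrelated, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


cat = [0.2, -0.4, 0.7]
dog = [0.3, -0.5, 0.6]
car = [0.8, 0.1, -0.2]

print(f"cat vs dog: {cosine_similarity(cat, dog):.3f}")  # → cat vs dog: 0.978
print(f"cat vs car: {cosine_similarity(cat, car):.3f}")  # → cat vs car: -0.029
print(f"dog vs car: {cosine_similarity(dog, car):.3f}")  # → dog vs car: 0.101
```

Real embedding models produce hundreds of dimensions instead of three, but the comparison works the same way.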

### **Why Embeddings Matter:**
- **Semantic Understanding**: Since embeddings make it possible for computers to understand the relationships between words and other objects, they are foundational for artificial intelligence (AI)
- **Similarity Search**: Essentially, embeddings enable machine learning models to find similar objects
- **Foundation of Modern AI**: Vector embeddings thus underpin nearly all modern machine learning, powering models used in the fields of NLP and computer vision, and serving as the fundamental building blocks of generative AI

### **Types of Embeddings:**
- **Text Embeddings**: Convert words/sentences to vectors
- **Image Embeddings**: Convert images to numerical representations
- **Multimodal Embeddings**: Handle multiple data types together

### **2025 State of Embeddings:**
At the time of writing, all the prominent commercial embedding vendors except OpenAI had released a new version of their flagship models in late 2024 or early 2025. OpenAI's text-embedding-3 models date from January 2024, which is old given the pace of AI progress.

---

## 🔗 **How They Work Together in Your Chatbot**

### **The Complete Pipeline:**

1. **Knowledge Preparation** (Embeddings + FAISS):
   - Convert your FAQ documents into embeddings (numerical vectors)
   - Store these embeddings in FAISS for fast similarity search

2. **User Query Processing** (RAG):
   - User asks: "What are your store hours?"
   - Convert question to embedding
   - Use FAISS to find most similar FAQ items

3. **Response Generation** (LangChain):
   - LangChain retrieves relevant documents
   - Adds mood/personality using system prompts
   - Generates final response using LLM

4. **Fun Response Mode**:
   - Apply system prompt engineering for different personality modes
   - Generate funny, mysterious, or serious versions of the same answer through prompt design
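The Fun Response Mode step comes down to choosing a different system prompt per mood while keeping the retrieved facts the same. The sketch below assumes a hypothetical `MOOD_PROMPTS` mapping and `build_mood_prompt` helper; the prompt wording is only an example and should be tuned for whichever model you use.

```python
# Hypothetical personality modes; the wording of each system prompt is an example only.
MOOD_PROMPTS = {
    "funny": "You are a witty assistant who loves humor.",
    "mysterious": "You are an enigmatic assistant who answers in a cryptic, intriguing tone.",
    "serious": "You are a precise, professional assistant.",
}


def build_mood_prompt(mood: str, context: str, question: str) -> str:
    """Combine a personality system prompt with retrieved context and the user's question."""
    if mood not in MOOD_PROMPTS:
        raise ValueError(f"Unknown mood {mood!r}; choose from {sorted(MOOD_PROMPTS)}")
    return (
        f"{MOOD_PROMPTS[mood]}\n"
        "Only use facts from the context below; do not change them.\n"
        f"Based on this context: '{context}'\n"
        f"Answer: {question}"
    )


context = "We are open 9am-9pm, Mon-Sat."
for mood in MOOD_PROMPTS:
    print(build_mood_prompt(mood, context, "What are your store hours?"))
    print("---")
```

Keeping the context and question fixed while swapping only the first line makes it easy to compare how each mood changes the tone without changing the facts.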

### **Example Workflow:**
```
User: "What are your store hours?" (Mood: Funny)

1. Embedding: [0.1, 0.8, 0.3, ...]
2. FAISS Search: Finds "We are open 9am–9pm, Mon–Sat"
3. RAG Retrieval: Gets store hours policy document
4. LangChain + System Prompt: "You are a witty assistant who loves humor.
   Based on this context: 'We are open 9am–9pm, Mon–Sat'
   Answer: What are your store hours?"
5. Response: "We're open 9am-9pm, Monday through Saturday!
   Unlike vampires, we do our best work in daylight,
   and we clock out at 9pm sharp because even we need our beauty sleep!"
```

---

## 💡 **Why This Matters for Your Week 2 Project**

### **Learning Objectives Achieved:**
- **Prompt Engineering**: Using system prompts for personality and mood control
- **Data Processing**: Understanding how text becomes searchable embeddings
- **System Integration**: Seeing how multiple AI components work together
- **Real-world Application**: Building something that could actually be deployed

### **Technical Skills Developed:**
- **Vector Databases**: Understanding modern data storage for AI
- **Information Retrieval**: Learning how semantic search finds relevant documents
- **AI Frameworks**: Hands-on experience with industry-standard tools
- **API Integration**: Connecting different AI services together

### **Industry Relevance:**
These four technologies, or the ideas behind them, appear in many AI applications you use daily:
- **ChatGPT**: Uses embeddings and retrieval techniques
- **Google Search**: Employs vector similarity for results
- **Recommendation Systems**: Netflix and Spotify use embeddings
- **Customer Service Bots**: Often built on RAG architectures, sometimes with LangChain

---

## **Getting Started Tips**

### **Installation Order:**
1. **transformers** - For pre-trained AI models
2. **langchain** - Main framework
3. **langchain-community** - Additional integrations
4. **sentence-transformers** - For creating embeddings
5. **torch** - Deep learning backend
6. **faiss-cpu** - Vector similarity search

### **Best Practices:**
- Start simple with basic RAG, then add complexity
- Test each component individually before combining
- Use small datasets while learning
- Experiment with different embedding models
- Design effective system prompts for different personality modes

### **Common Pitfalls to Avoid:**
- Don't try to implement everything at once
- Make sure your embedding model supports the language of your documents
- Test with simple questions first
- Keep system prompts clear and concise for consistent behavior

---

**Ready to build your Fun Response Mode Chatbot? These technologies will give you the foundation to create an AI assistant that's both intelligent and entertaining!** ✨

# Start Activity

## Cell 1: Install Required Libraries

In [None]:
# Install all required libraries for our chatbot project
# Each library serves a specific purpose:

%pip install __________          # For pre-trained AI models (BERT, DistilBERT, etc.)
%pip install langchain             # Framework for building applications with language models
%pip install langchain-community   # Community extensions for LangChain
%pip install sentence-transformers # For creating text embeddings (converting text to numbers)
%pip install __________          # PyTorch - deep learning framework (backend for transformers)
%pip install faiss-cpu             # Facebook AI Similarity Search - for fast similarity searches



## Cell 2: Import Libraries and Setup

In [2]:
# Import all the libraries we need for our chatbot

# Import pipeline from transformers - this gives us easy access to pre-trained models
from transformers import pipeline

# Import FAISS for creating a searchable database of text
from langchain_community.vectorstores import FAISS

# Import embeddings to convert text into numerical vectors for similarity search
# (newer LangChain releases also provide this class in the separate `langchain-huggingface` package)
from langchain_community.embeddings import HuggingFaceEmbeddings

# Import Document class to structure our knowledge data
from langchain_core.documents import Document



## Cell 3: Set Up the Knowledge Base

In [3]:
# --- Step 1: Define knowledge base ---
# Create a simple knowledge base with question-answer pairs
# This is like creating a mini-encyclopedia for our chatbot
faq_data = [
    ("What is Python?", "Python is a high-level programming language known for its simplicity and readability."),
    ("What is machine learning?", "Machine learning is a subset of AI that enables computers to learn without being explicitly programmed."),
    ("What is a chatbot?", "A chatbot is a computer program designed to simulate conversation with human users."),
    ("What is the return policy?", "30 days return with full refund."),
    ("What are your store hours?", "We are open 9am–9pm, Mon–Sat."),
    ("Do you ship internationally?", "Yes, we ship worldwide, including Australia.")
]

## Cell 4: Convert Knowledge base into LangChain Document objects

In [23]:
# --- Step 2: Convert our FAQ data into LangChain Document objects ---
# Each document contains both the question and answer as searchable content
# List comprehension: [expression for item in list] creates a new list
documents = [Document(page_content=qa[0] + " " + qa[1]) for qa in faq_data]

## Cell 5: Create embeddings model

In [24]:
# --- Step 3: Create embeddings model - this converts text into numerical vectors ---
# We use a pre-trained model that's good at understanding sentence meanings
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Here is a larger sentence-embedding model (slower, often more accurate)
# embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

## Cell 6: Create a FAISS database

In [25]:
# --- Step 4:  Create a FAISS database from our documents (FAISS index)
# FAISS allows us to quickly find the most relevant documents for any question
db = FAISS.from_documents(documents, embeddings)

## Cell 7: Load a pre-trained question-answering model

In [26]:
# --- Step 5:  Load a pre-trained question-answering model ---
# DistilBERT is a smaller, faster version of BERT that's good for Q&A tasks
qa_pipeline = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

# Additional models from Hugging Face
# qa_pipeline = pipeline("question-answering", model="consciousAI/question-answering-generative-t5-v1-base-s-q-c")
# qa_pipeline = pipeline("question-answering", model="deepset/roberta-base-squad2")
# qa_pipeline = pipeline("question-answering", model="google-bert/bert-large-cased-whole-word-masking-finetuned-squad")
# qa_pipeline = pipeline("question-answering", model="gasolsun/DynamicRAG-8B")
print("Knowledge base and QA pipeline loaded successfully!")

Device set to use cuda:0


Knowledge base and QA pipeline loaded successfully!


## Cell 8: Create Questions

In [27]:
# --- Step 6: Ask questions ---
questions = [
    "Can I return a product after 2 weeks?",
    "Do you ship to Australia?",
    "What time do you open on Monday?",
    "Do you sell electronics?",
    "what is python"
]

In [29]:
# Print a header to clearly separate this section of output
print("\n--- Week 2 Chatbot Response ---")

# Loop through each question in the questions list
for q in questions:

    # STEP 1: FIND RELEVANT INFORMATION
    # Use FAISS similarity search to find the 2 most relevant FAQ documents
    # 'db' is our FAISS vector database containing embedded FAQ data
    # 'k=2' means "return the top 2 most similar documents"
    docs = db.similarity_search(q, k=2)

    # STEP 2: PREPARE CONTEXT FOR THE AI MODEL
    # Combine the content from retrieved documents into one text string
    # This creates the "context" that will help the QA model answer accurately
    # Each 'd.page_content' contains the text from a retrieved FAQ document
    context = " ".join([d.page_content for d in docs])

    # STEP 3: EXTRACT AN ANSWER USING THE QA MODEL
    # Pass both the user's question AND the retrieved context to the QA pipeline
    # This extractive model does not write new text: it selects the span of the
    # context that best answers the question, together with a confidence score
    # Supplying retrieved context is the "Augmented" part of Retrieval-Augmented Generation (RAG)
    result = qa_pipeline(question=q, context=context)

    # STEP 4: CONFIDENCE-BASED ANSWER SELECTION
    # Check if the model is confident enough in its answer (score > 0.2)
    # If confidence is too low, use a safe fallback response
    # This reduces the chance of returning an unreliable answer span
    answer = result["answer"] if result["score"] > 0.2 else "I don't know."

    # STEP 5: DISPLAY RESULTS
    # Format and print the question-answer pair for easy reading
    # \n creates line breaks for better formatting
    print(f"\nQ: {q}\nA: {answer}")



--- Week 2 Chatbot Response ---

Q: Can I return a product after 2 weeks?
A: I don't know.

Q: Do you ship to Australia?
A: I don't know.

Q: What time do you open on Monday?
A: 9am–9pm

Q: Do you sell electronics?
A: I don't know.

Q: what is python
A: a high-level programming language
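The `0.2` cutoff in Step 4 is a hand-picked value. In the run above, two questions the FAQ does cover (the 30-day return policy and shipping to Australia) fell back to "I don't know.", so the cutoff, the retrieved context, or the model deserves a closer look. A minimal sketch for exploring thresholds is below. `apply_threshold` and `sweep_thresholds` are hypothetical helpers, and the scores in `example_results` are made-up placeholders, not values from this run; the dictionaries only mimic the `answer` and `score` keys returned by the question-answering pipeline.

```python
from typing import Any, Dict, List


def apply_threshold(result: Dict[str, Any], threshold: float, fallback: str = "I don't know.") -> str:
    """Return the model's answer only if its confidence score clears the threshold."""
    return str(result["answer"]) if float(result["score"]) > threshold else fallback


def sweep_thresholds(results: List[Dict[str, Any]], thresholds: List[float]) -> Dict[float, int]:
    """Count how many answers survive at each candidate threshold."""
    return {t: sum(float(r["score"]) > t for r in results) for t in thresholds}


# Hypothetical scores for illustration only; collect real ones by printing
# result["score"] inside the question loop above.
example_results = [
    {"answer": "30 days", "score": 0.15},
    {"answer": "9am-9pm", "score": 0.62},
    {"answer": "Australia", "score": 0.08},
]
print(sweep_thresholds(example_results, [0.05, 0.1, 0.2, 0.5]))
print([apply_threshold(r, 0.2) for r in example_results])
```

A lower threshold answers more questions but lets more wrong spans through, so it is worth checking the surviving answers by hand at each level.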


# Class Activity: RAG-Based Question Answering System with Mistral
---

### Class Activity: Building an Intelligent Q&A System with FAISS and Mistral

**Objective:** Students will design and implement a Retrieval-Augmented Generation (RAG) system using Mistral-7B-Instruct, FAISS vector database, and custom business data.

**Submission:** Submit the link to your completed notebook.

---

## **Instructions:**

### **1. Create an Assistant System Prompt**
* Using `mistralai/Mistral-7B-Instruct-v0.3`, design a system prompt that gives the model a specific role
* Examples: "You are a marketing expert for a tech startup" or "You are a database creator for a healthcare organization"
* Choose a specific organization/business context you'll work with throughout this activity

### **2. Generate Business Database Content**
* Use `mistralai/Mistral-7B-Instruct-v0.3` from Hugging Face to create prompts that generate:
  - A comprehensive Q&A database for your chosen business/organization
  - Minimum 10-15 question-answer pairs covering different aspects of the business
* **Add comments in your notebook clearly showing the new database Q&A pairs**

### **3. Implement FAISS Vector Database**
* Convert your generated Q&A database into embeddings
* Store the embeddings in a FAISS index for efficient similarity search
* **Use comments to demonstrate the database implementation process**

### **4. Create Test Questions**
* Using `mistralai/Mistral-7B-Instruct-v0.3`, generate two types of questions:
  - **Answerable questions**: Can be directly answered from your database
  - **Unanswerable questions**: Require information not present in your database
* Create at least 5 questions of each type

### **5. Implement and Test Questions**
* Run both types of questions through your RAG system
* **Use clear comments to differentiate between:**
  - Questions that can be answered (with expected good retrieval)
  - Questions that cannot be answered (testing system limitations)
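One way to keep the two question types clearly separated is to store them in different collections and score them differently: answerable questions by whether the answer contains the expected fact, unanswerable ones by whether the system falls back to "I don't know.". The sketch below assumes your RAG loop is wrapped in a function that takes a question and returns an answer string; `evaluate` and `toy_answer_fn` are hypothetical names, and substring matching is a deliberately simple correctness check.

```python
import time
from typing import Callable, Dict, List

FALLBACK = "I don't know."


def evaluate(
    answer_fn: Callable[[str], str],
    answerable: Dict[str, str],
    unanswerable: List[str],
) -> Dict[str, float]:
    """Score a RAG answer function on both question types.

    `answerable` maps each question to a substring the correct answer should contain.
    Both collections must be non-empty.
    """
    start = time.perf_counter()
    # Answerable questions: count answers that contain the expected fact.
    correct = sum(expected.lower() in answer_fn(q).lower() for q, expected in answerable.items())
    # Unanswerable questions: count correct fallbacks.
    abstained = sum(answer_fn(q) == FALLBACK for q in unanswerable)
    elapsed = time.perf_counter() - start
    total = len(answerable) + len(unanswerable)
    return {
        "answerable_accuracy": correct / len(answerable),
        "unanswerable_abstention": abstained / len(unanswerable),
        "seconds_per_question": elapsed / total,
    }


def toy_answer_fn(question: str) -> str:
    # Hypothetical stand-in for your RAG pipeline, used only to show the call shape.
    return "30 days return with full refund." if "return" in question.lower() else FALLBACK


report = evaluate(
    toy_answer_fn,
    answerable={"Can I return a product after 2 weeks?": "30 days"},
    unanswerable=["Do you sell electronics?"],
)
print(report)
```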

### **6. Model Experimentation and Ranking**
* Test your RAG system with multiple Q&A models from Hugging Face
* **Experiment with these specific QA models and include two additional models of your choice:**

```python
from transformers import pipeline

# Required models to test
required_models = [
    "consciousAI/question-answering-generative-t5-v1-base-s-q-c",
    "deepset/roberta-base-squad2",
    "google-bert/bert-large-cased-whole-word-masking-finetuned-squad",
    "gasolsun/DynamicRAG-8B",
]

# Additional models from Hugging Face: add two more QA models of your choice
# Example options: "distilbert/distilbert-base-cased-distilled-squad", "deepset/tinyroberta-squad2"
additional_models = []

for model_name in required_models + additional_models:
    # Load one pipeline at a time and run both question sets through it before moving on
    qa_pipeline = pipeline("question-answering", model=model_name)
    print(f"Knowledge base and QA pipeline loaded successfully: {model_name}")
```

* **Test all models with both answerable and unanswerable questions**
* **Rank the models from best to worst performance and explain why**
* **Identify which model(s) provide confidence scores but still show reasonable output**
* **Compare performance across different question types (factual, reasoning, out-of-scope)**

#### **Model Evaluation Criteria:**
- **Accuracy**: How well does it answer questions from your database?
- **Confidence Handling**: Does it appropriately indicate uncertainty for unanswerable questions?
- **Response Quality**: Are the answers coherent and relevant?
- **Speed**: How fast does each model process queries?
- **Robustness**: How does it handle edge cases and out-of-scope questions?
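Once each model has been measured, the ranking can be made explicit rather than eyeballed. The sketch below sorts by accuracy, then by abstention on unanswerable questions, then by speed; that priority order is an assumption you should justify or change. `ModelReport`, `rank_models`, and all the numbers are hypothetical placeholders, not measured results.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelReport:
    name: str
    accuracy: float    # fraction of answerable questions answered correctly
    abstention: float  # fraction of unanswerable questions answered with "I don't know."
    latency_s: float   # average seconds per question


def rank_models(reports: List[ModelReport]) -> List[ModelReport]:
    """Best first: higher accuracy, then higher abstention, then lower latency."""
    return sorted(reports, key=lambda r: (-r.accuracy, -r.abstention, r.latency_s))


# Placeholder numbers for illustration only; replace them with your measured results.
reports = [
    ModelReport("model-a", accuracy=0.8, abstention=0.4, latency_s=0.05),
    ModelReport("model-b", accuracy=0.8, abstention=0.6, latency_s=0.20),
    ModelReport("model-c", accuracy=0.6, abstention=0.9, latency_s=0.02),
]
for rank, report in enumerate(rank_models(reports), start=1):
    print(rank, report.name)
```

A weighted score (for example, combining accuracy and abstention) is a reasonable alternative when no single criterion should dominate.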

---

## **Technical Requirements:**

### **Installation:**
```python
# Required packages
%pip install transformers torch sentence-transformers faiss-cpu langchain langchain-community
```

### **Key Components to Include:**
1. **System Prompt Design** - Clear agentic role definition
2. **Database Generation** - Mistral-generated business Q&A pairs  
3. **FAISS Implementation** - Vector storage and retrieval
4. **Question Testing** - Both answerable and unanswerable queries
5. **Model Comparison** - Performance analysis and ranking
6. **Confidence Analysis** - Model uncertainty vs. output quality

---

## **Deliverables:**

Your notebook should demonstrate:

* **Business Context**: Clear organization/role you've chosen
* **Generated Database**: Commented Q&A pairs created by Mistral
* **FAISS Integration**: Working vector database implementation
* **Question Analysis**: Clear separation of answerable vs. unanswerable questions
* **Model Evaluation**: Ranked comparison of different Q&A models
* **Performance Insights**: Analysis of confidence scores and output quality
* **Reflection**: Strengths, weaknesses, and real-world applications

---

## **Evaluation Criteria:**

* **Creativity** in business context and agentic role design
* **Technical Implementation** of RAG pipeline with FAISS
* **Quality Analysis** of different Q&A models
* **Clear Documentation** with meaningful comments
* **Critical Thinking** in model comparison and limitations analysis

---

**Note:** Focus on building a system that could realistically be deployed for a real business use case. Consider scalability, accuracy, and user experience in your implementation choices.

In [10]:
# Check if GPU is available and set device accordingly
import torch

if torch.cuda.is_available():
    device = 0  # Use the first GPU (pass as pipeline(..., device=device))
    print("GPU detected - Using GPU for faster processing")
else:
    device = -1  # Use CPU
    print("GPU not available - Using CPU")