<a href="https://colab.research.google.com/github/pranalibose/LangVisionWorkshop/blob/main/RAG_Demonstration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Retrieval-Augmented Generation (RAG)

## Introduction to RAG

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by grounding their responses in external knowledge. At query time, relevant information is retrieved from sources such as databases, documents, or the web and passed to the model alongside the user's question, which leads to more accurate, contextually appropriate, and up-to-date responses.

## Comparison of Fine-Tuning vs RAG

| Feature        | Fine-Tuning                                           | RAG                                                    |
|----------------|-------------------------------------------------------|--------------------------------------------------------|
| Data           | Task-specific training data                           | External knowledge sources (databases, documents)      |
| Updates        | Requires retraining to absorb new information         | Updated by changing the knowledge source               |
| Specificity    | Tailored to a specific task or domain                 | Adapts to different domains by swapping sources        |
| Hallucinations | Can still hallucinate and reproduce errors in the training data | Reduces hallucinations by grounding answers in retrieved text |
| Cost           | Computationally expensive to train                    | Less computationally expensive to update               |
| Maintenance    | Higher maintenance due to retraining                  | Easier maintenance through source updates              |

## Why LLMs Alone Are Not Enough for Real-World AI Applications

LLMs, while powerful, have inherent limitations:

*   **Limited Knowledge:** They are trained on vast datasets but have no access to private or domain-specific knowledge bases that were not part of that data.
*   **Hallucinations:** They can generate factually incorrect or nonsensical outputs, especially when dealing with specialized or evolving information.
*   **Lack of Explainability:** It's often difficult to understand *why* an LLM produced a particular response.
*   **Data Staleness:** Their knowledge is frozen at training time, so they cannot provide real-time or up-to-date information.

## RAG Architecture

The typical RAG architecture involves:

1.  **Retrieval:** A query is formulated based on the user's input. This query is used to search a knowledge base for relevant documents or chunks of information.
2.  **Augmentation:** The retrieved information is combined with the original user input to create an augmented prompt.
3.  **Generation:** The LLM receives the augmented prompt and generates a response grounded in the retrieved knowledge.
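
The three stages can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production pipeline: the word-overlap scoring stands in for a real vector search, `fake_llm` stands in for an actual model call, and all function names and sample documents are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, knowledge_base, k=2):
    """Stage 1: rank documents by word overlap with the query (toy scoring)."""
    query_words = tokenize(query)
    ranked = sorted(
        knowledge_base,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

def augment(query, documents):
    """Stage 2: combine the retrieved documents with the user's question."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def generate(prompt, llm):
    """Stage 3: pass the augmented prompt to the language model."""
    return llm(prompt)

def fake_llm(prompt):
    """Placeholder for a real LLM call; it only reports the prompt size."""
    return f"(model output for a {len(prompt)}-character prompt)"

# Hypothetical knowledge base
knowledge_base = [
    "Keep a resume to one or two pages.",
    "List work experience in reverse chronological order.",
    "Use a professional email address.",
]

question = "How many pages should a resume be?"
docs = retrieve(question, knowledge_base, k=1)
prompt = augment(question, docs)
print(prompt)
print(generate(prompt, fake_llm))
```

Keeping the stages as separate functions makes it easy to swap the toy retriever for a vector database, or the placeholder for a real model, without touching the other steps.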

## How RAG Retrieves External Knowledge to Improve Chatbot Responses

RAG improves chatbot responses by:

*   **Providing Context:** Retrieving relevant information provides the LLM with the necessary context to generate accurate and informative responses.
*   **Improving Factual Accuracy:** Grounding responses in retrieved text reduces the likelihood of hallucinations, although the answers are only as reliable as the sources.
*   **Enabling Access to Up-to-Date Information:** RAG can access dynamic knowledge bases, allowing chatbots to provide current and relevant information.

## Difference Between Basic Chatbots and RAG-Powered Chatbots

| Feature        | Basic Chatbots                               | RAG-Powered Chatbots                           |
|----------------|-------------------------------------------|-----------------------------------------------|
| Knowledge      | Limited to training data                   | Access to external knowledge bases            |
| Accuracy       | Prone to hallucinations                     | More accurate and factually grounded            |
| Context        | Limited contextual understanding             | Richer contextual understanding               |
| Up-to-dateness | Static knowledge                           | Can access up-to-date information           |
| Explainability | Difficult to explain responses              | Can provide source information for responses |
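
The explainability row depends on each retrieved chunk carrying a record of where it came from. One possible way to do this, sketched below under assumed names (`Chunk`, `build_cited_context`, and the file names are all hypothetical), is to number the chunks in the prompt and keep a lookup table from number to source, so a chatbot can show its references next to the answer.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    source: str  # where the text came from (file name, URL, ...)
    text: str

def build_cited_context(chunks):
    """Number each retrieved chunk so an answer can refer to it as [n]."""
    lines = [f"[{i}] {chunk.text}" for i, chunk in enumerate(chunks, start=1)]
    sources = {i: chunk.source for i, chunk in enumerate(chunks, start=1)}
    return "\n".join(lines), sources

# Hypothetical retrieval results
retrieved = [
    Chunk("resume_guide.pdf", "Keep a resume to one or two pages."),
    Chunk("career_tips.html", "Tailor the summary to each job posting."),
]
context, sources = build_cited_context(retrieved)
print(context)   # numbered context to place in the prompt
print(sources)   # number -> source mapping to display with the answer
```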

## Why LLMs Alone Are Not Enough for Real-World Resume Analysis

LLMs alone are insufficient for robust resume analysis because:

*   **Specific Requirements:** Resume analysis often requires matching candidates to specific job requirements, which are typically stored in external databases or job descriptions. LLMs lack access to this information.
*   **Contextual Understanding of Skills:** LLMs might not fully understand the nuances of specific skills or their relevance to particular roles without external context.
*   **Data Privacy:** Resumes contain sensitive personal information. In a RAG setup, resumes can stay in a controlled data store and be retrieved only when needed, instead of being baked into a model's training data.
*   **Dynamic Job Market:** The job market and required skills are constantly evolving. A RAG-based system can retrieve the latest job descriptions and skill requirements at query time, which a static LLM cannot do, so the resume analysis stays relevant.

# RAG for Resume Standards and Chatbot Integration

This section explores how Retrieval-Augmented Generation (RAG) can be used to retrieve industry resume standards and integrate them into chatbot responses, enhancing the chatbot's ability to provide helpful resume advice.

## How RAG Retrieves Industry Resume Standards

1.  **Knowledge Base Creation:** A curated knowledge base of industry resume standards is created. This could include:
    *   PDF documents of best practices guides.
    *   Web pages with resume tips and examples.
    *   A structured database of resume rules and recommendations.
2.  **Indexing:** The knowledge base is indexed for efficient searching. This involves:
    *   Chunking the content into smaller, manageable pieces.
    *   Creating embeddings (vector representations) of the chunks using an embedding model.
    *   Storing the embeddings in a vector database.
3.  **Retrieval:** When a user asks a resume-related question, the chatbot:
    *   Formulates a query based on the user's input.
    *   Searches the vector database for relevant chunks of information.
    *   Retrieves the most similar chunks based on their embeddings.
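
The indexing and retrieval steps above can be sketched without any external libraries. This is a toy version under stated assumptions: word-count vectors stand in for neural embeddings, a Python list stands in for a vector database, and the function names, chunk sizes, and sample text are hypothetical.

```python
import math
import re
from collections import Counter

def chunk_words(text, size=8, overlap=2):
    """Split text into chunks of `size` words; neighbours share `overlap` words."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

def embed(text):
    """Toy embedding: a sparse word-count vector (real systems use a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(documents, size=8, overlap=2):
    """Chunk every document and store (chunk, embedding) pairs in a list."""
    index = []
    for doc in documents:
        for chunk in chunk_words(doc, size, overlap):
            index.append((chunk, embed(chunk)))
    return index

def search(index, query, k=2):
    """Return the k chunks whose embeddings are most similar to the query."""
    query_vec = embed(query)
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in scored[:k]]

# Hypothetical excerpt from a resume best-practices guide
guide = (
    "Keep the resume to one or two pages. "
    "List work experience in reverse chronological order. "
    "Quantify achievements with numbers where possible."
)
index = build_index([guide])
print(search(index, "What order should work experience be listed in?", k=1))
```

The overlap between neighbouring chunks is a common precaution so that a sentence cut at a chunk boundary still appears intact in at least one chunk.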

In [2]:
# Packages used below; peft is listed on the assumption that the model repo ships adapter weights
#!pip install -U faiss-cpu sentence-transformers transformers torch peft

In [3]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
import torch

# Load the sentence-embedding model (used for retrieval) and the
# fine-tuned FLAN-T5 model from the Hugging Face Hub (used for generation)
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
model_name = "pranalibose/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def embed_text(text):
    """Encode a string (or a list of strings) into a NumPy embedding array."""
    return embedding_model.encode(text, convert_to_numpy=True)

# Sample resumes
resumes = [
    "Software engineer with 5 years of experience in Python and ML.",
    "Data scientist skilled in deep learning and data visualization.",
    "Project manager with expertise in agile methodologies.",
    "Backend developer specializing in Node.js and cloud computing."
]

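# Create an exact-search FAISS index over the resume embeddings.
# FAISS expects a 2-D float32 array of shape (n_items, dim).
resume_embeddings = np.asarray(embed_text(resumes), dtype="float32")
d = resume_embeddings.shape[1]  # Embedding size
index = faiss.IndexFlatL2(d)
index.add(resume_embeddings)

def search_resumes(query, k=2):
    """Return the k resumes closest to the query (smallest L2 distance first)."""
    k = min(k, index.ntotal)  # FAISS fills missing results with index -1
    query_embedding = np.asarray(embed_text(query), dtype="float32").reshape(1, -1)
    distances, indices = index.search(query_embedding, k)
    return [resumes[i] for i in indices[0]]

def analyze_resume(text):
    """Generate feedback for one resume with the fine-tuned seq2seq model."""
    inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model.generate(**inputs, max_length=150)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)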

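# Example usage: the query is used for retrieval only; each retrieved
# resume is then passed to the model on its own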
query = "Looking for an AI engineer skilled in Python"
retrieved_resumes = search_resumes(query)
analysis = [analyze_resume(resume) for resume in retrieved_resumes]

print("Query:", query)
print("Retrieved Resumes:", retrieved_resumes)
print("Analysis:", analysis)


Query: Looking for an AI engineer skilled in Python
Retrieved Resumes: ['Software engineer with 5 years of experience in Python and ML.', 'Data scientist skilled in deep learning and data visualization.']
Analysis: ['Emphasize your experience with software engineering and machine learning.', 'Data scientist with experience in data visualization.']
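
`faiss.IndexFlatL2` reports squared Euclidean distances, which are hard to read as a relevance score. If the embeddings are unit-normalized (an assumption here that should be checked for the embedding model in use), the squared distance d² and the cosine similarity are related by cos = 1 - d²/2, so a similarity threshold can be used to drop weak matches. The sketch below checks that identity on two small vectors; the helper names and the 0.3 threshold are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def squared_l2(a, b):
    """Squared Euclidean distance (the quantity IndexFlatL2 returns)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def l2_to_cosine(squared_distance):
    """Convert a squared L2 distance between unit vectors to cosine similarity."""
    return 1.0 - squared_distance / 2.0

def filter_matches(items, squared_distances, min_similarity=0.3):
    """Keep only the items whose cosine similarity reaches the threshold."""
    return [
        item
        for item, dist in zip(items, squared_distances)
        if l2_to_cosine(dist) >= min_similarity
    ]

# Check the identity on two unit vectors
a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])
cosine_direct = sum(x * y for x, y in zip(a, b))
cosine_from_l2 = l2_to_cosine(squared_l2(a, b))
print(round(cosine_direct, 4), round(cosine_from_l2, 4))

# Hypothetical search results with their squared distances
print(filter_matches(["resume A", "resume B"], [0.8, 1.9]))
```

In the notebook, the `distances` array returned by `index.search` could be passed to a helper like `filter_matches` instead of being discarded.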
