<a href="https://colab.research.google.com/github/frank-morales2020/MLxDL/blob/main/CAG_RAG_DEMO.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install pymupdf -q # Install PyMuPDF
!pip install pypdf2 -q # Install the PyPDF2 library using pip.
!pip install pikepdf -q
!pip install pdfplumber -q
!pip install sentence-transformers -q

In [11]:
!nvidia-smi

Mon Jan 20 13:49:55 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA L4                      Off | 00000000:00:03.0 Off |                    0 |
| N/A   70C    P0              32W /  72W |   1345MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [10]:
import os
import pickle

import fitz  # PyMuPDF
import requests
import torch
from sentence_transformers import SentenceTransformer, util
from transformers import AutoTokenizer, AutoModelForCausalLM

# Use the GPU when one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Initialize the LLM
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# URL of the source PDF
pdf_url = "https://people.engr.tamu.edu/guni/csce421/files/AI_Russell_Norvig.pdf"

# Download a PDF from a URL (sending a User-Agent header) and extract its text with PyMuPDF
def extract_text_from_pdf(pdf_url):
    try:
        headers = {
            # Replace with your bot name, project, and email
            "User-Agent": "MyBot/1.0 (MyProject; myemail@example.com)"
        }
        # A timeout keeps the cell from hanging if the server never responds
        response = requests.get(pdf_url, headers=headers, timeout=60)
        response.raise_for_status()  # Raise on HTTP errors

        with fitz.open(stream=response.content, filetype="pdf") as doc:
            return "".join(page.get_text() for page in doc)
    except Exception as e:
        print(f"Error extracting text with PyMuPDF (from URL): {e}")
        return ""

# Extract text from the downloaded PDF
pdf_text = extract_text_from_pdf(pdf_url)  # Pass the URL directly

# Initialize the sentence transformer model
sentence_model = SentenceTransformer('all-MiniLM-L6-v2', device=device)

# --- Split the extracted text into overlapping chunks and index them by keyword ---
knowledge_base = {}
keywords = ["artificial intelligence", "turing test", "machine learning", "deep learning"]

# Use a sliding window approach to create overlapping chunks
chunk_size = 500  # Adjust this value for optimal results
overlap = 250     # Adjust this value for overlap between chunks

chunks = []  # Store all chunks for embedding
for i in range(0, len(pdf_text), chunk_size - overlap):
    chunk = pdf_text[i:i + chunk_size]
    chunks.append(chunk)

# Cache file for embeddings
cache_file = "chunk_embeddings.pkl"

# Try to load embeddings from cache
if os.path.exists(cache_file):
    with open(cache_file, "rb") as f:
        chunk_embeddings = pickle.load(f)
else:
    # Generate and cache embeddings if not found
    batch_size = 32
    chunk_embeddings = []
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i : i + batch_size]
        batch_embeddings = sentence_model.encode(batch, device=device)
        chunk_embeddings.extend(batch_embeddings)

    with open(cache_file, "wb") as f:
        pickle.dump(chunk_embeddings, f)

# Create knowledge base using chunks and embeddings
for i, chunk in enumerate(chunks):
    for keyword in keywords:
        if keyword in chunk.lower():
            if keyword not in knowledge_base:
                knowledge_base[keyword] = []
            knowledge_base[keyword].append((chunk, chunk_embeddings[i]))  # Store chunk and embedding
            break

# Cache for CAG
cache = {}

# Function to retrieve from the cache (CAG)
def retrieve_from_cache(query):
    if query in cache:
        print("Answer found in cache!")
        return cache[query]
    return None

# Function to retrieve from the knowledge base (RAG, by sentence similarity)
def retrieve_from_knowledge_base(query):
    if not chunks:
        return None

    query_embedding = sentence_model.encode(query)

    # Reuse the embeddings computed (or loaded from the cache file) above
    # instead of re-encoding every chunk on each query
    embedding_matrix = torch.stack([torch.as_tensor(e) for e in chunk_embeddings])

    # Cosine similarity between the query and every chunk, shape (num_chunks,)
    similarities = util.cos_sim(query_embedding, embedding_matrix)[0]

    # Return the best-scoring chunk, however low its score
    return chunks[int(similarities.argmax())]

# Function to generate text with CAG-RAG
def generate_text_cag_rag(query):
    # 1. Check the cache (CAG)
    cached_answer = retrieve_from_cache(query)
    if cached_answer:
        return cached_answer

    # 2. If not in cache, retrieve from knowledge base (RAG)
    answer = retrieve_from_knowledge_base(query)
    if answer:
        cache[query] = answer  # Add to cache for future use
        return answer

    # 3. If not found in either, use the LLM
    inputs = tokenizer(query, return_tensors="pt")

    outputs = model.generate(
        **inputs,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=150,  # Adjust as needed
        no_repeat_ngram_size=3,
        repetition_penalty=1.2,
    )

    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text

# Example usage
user_queries = [
    "What is Artificial Intelligence?",
    "What is Artificial Intelligence?",  # Repeated to show CAG in action
    "What is a Turing Test?",
    # Retrieval returns its best chunk whenever any chunks exist, so this query
    # is answered from the knowledge base rather than by the LLM
    "What is machine learning?",
]

model.config.pad_token_id = tokenizer.eos_token_id  # GPT-2 has no pad token, so reuse EOS

for query in user_queries:
    response = generate_text_cag_rag(query)
    print(f"Query: {query}")
    print(f"Response: {response}")
    print("-" * 20)

Query: What is Artificial Intelligence?
Response: e, reason, and act.”
(Winston, 1992)
Acting Humanly
Acting Rationally
“The art of creating machines that per-
form functions that require intelligence
when performed by people.” (Kurzweil,
1990)
“Computational Intelligence is the study
of the design of intelligent agents.” (Poole
et al., 1998)
“The study of how to make computers do
things at which, at the moment, people are
better.” (Rich and Knight, 1991)
“AI . . . is concerned with intelligent be-
havior in artifacts.” (Nilsson, 1998)
Figure 1
--------------------
Answer found in cache!
Query: What is Artificial Intelligence?
Response: e, reason, and act.”
(Winston, 1992)
Acting Humanly
Acting Rationally
“The art of creating machines that per-
form functions that require intelligence
when performed by people.” (Kurzweil,
1990)
“Computational Intelligence is the study
of the design of intelligent agents.” (Poole
et al., 1998)
“The study of how to make computers do
things at which, at t
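
Note that `retrieve_from_knowledge_base` returns its best-scoring chunk no matter how low the score, so step 3 of `generate_text_cag_rag` (the LLM fallback) runs only when there are no chunks at all. One possible way to make the fallback reachable is to gate retrieval on a similarity threshold. The sketch below illustrates the idea with toy 2-D vectors in plain Python; the helper name `best_chunk_above_threshold` and the `min_similarity=0.35` cutoff are illustrative assumptions, not values from the notebook, and a real cutoff would need tuning against actual queries.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_chunk_above_threshold(query_vec, chunk_vecs, chunk_texts, min_similarity=0.35):
    """Return (text, score) for the best chunk, or (None, score) if it scores too low.

    min_similarity is a hypothetical cutoff chosen only for this illustration.
    """
    best_score, best_text = -1.0, None
    for vec, text in zip(chunk_vecs, chunk_texts):
        score = cosine(query_vec, vec)
        if score > best_score:
            best_score, best_text = score, text
    if best_score < min_similarity:
        return None, best_score  # caller can fall back to the LLM
    return best_text, best_score

# Toy 2-D "embeddings" standing in for sentence-transformer vectors
texts = ["chunk about AI", "chunk about search"]
vecs = [[1.0, 0.0], [0.0, 1.0]]
hit, hit_score = best_chunk_above_threshold([0.9, 0.1], vecs, texts)
miss, miss_score = best_chunk_above_threshold([-1.0, -1.0], vecs, texts)
print(hit, round(hit_score, 3))
print(miss, round(miss_score, 3))
```

A `None` result here would correspond to `answer` being empty in `generate_text_cag_rag`, which is the condition that lets the function continue to the GPT-2 generation step.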

This proof of concept (POC) demonstrates the basic CAG/RAG flow. Moving toward a production-ready system calls for attention to several further areas:

Scalability:

- Efficient knowledge base storage: If the PDF or other knowledge source is very large, or there are multiple sources, store and retrieve embeddings with a vector database (e.g., Pinecone, Weaviate) or a similarity-search library such as Faiss.
- Optimized retrieval: Use approximate nearest neighbor search so that lookups stay fast on large knowledge bases.
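
To make the idea of approximate nearest neighbor search concrete, here is a toy sketch of random-hyperplane locality-sensitive hashing in pure Python. It is only an illustration of the principle (compare the query against one bucket instead of every vector); the class name `RandomHyperplaneLSH` and its parameters are assumptions made for this example, and a production system would more likely rely on a dedicated library or vector database such as those named above.

```python
import math
import random
from collections import defaultdict

def _cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class RandomHyperplaneLSH:
    """Toy approximate-nearest-neighbor index using random hyperplane hashing."""

    def __init__(self, dim, num_planes=8, seed=0):
        rng = random.Random(seed)
        # Each hyperplane is represented by a random Gaussian normal vector
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]
        self.buckets = defaultdict(list)

    def _signature(self, vec):
        # One bit per hyperplane: which side of the plane the vector falls on
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in self.planes
        )

    def add(self, item_id, vec):
        self.buckets[self._signature(vec)].append((item_id, vec))

    def query(self, vec, top_k=3):
        """Rank only the vectors that share the query's bucket (may miss true neighbors)."""
        candidates = self.buckets.get(self._signature(vec), [])
        ranked = sorted(candidates, key=lambda item: _cosine(vec, item[1]), reverse=True)
        return [item_id for item_id, _ in ranked[:top_k]]

# Index 200 random 16-dimensional vectors, then look one of them up again
rng = random.Random(42)
vectors = {i: [rng.gauss(0.0, 1.0) for _ in range(16)] for i in range(200)}
index = RandomHyperplaneLSH(dim=16)
for item_id, vec in vectors.items():
    index.add(item_id, vec)
print(index.query(vectors[7], top_k=1))
```

The trade-off is the usual one for approximate search: each query touches a fraction of the stored vectors, at the cost of occasionally missing a close neighbor that landed in a different bucket.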

Robustness and Error Handling:

- Input validation: Validate user queries so that unexpected or malformed input is rejected cleanly.
- Error handling: Handle network errors, PDF parsing failures, and LLM generation errors gracefully.
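
As a rough illustration of both points, the sketch below validates a query string and wraps a flaky operation in retries with exponential backoff. The names `validate_query` and `with_retries`, the 512-character limit, and the retry settings are all assumptions made for this example. Retrying on `OSError` is chosen because the exceptions raised by `requests` derive from it.

```python
import time

MAX_QUERY_CHARS = 512  # hypothetical limit, chosen only for this sketch

def validate_query(query):
    """Return a whitespace-normalized query string, or raise ValueError."""
    if not isinstance(query, str):
        raise ValueError("query must be a string")
    cleaned = " ".join(query.split())  # collapse runs of whitespace
    if not cleaned:
        raise ValueError("query is empty")
    if len(cleaned) > MAX_QUERY_CHARS:
        raise ValueError(f"query exceeds {MAX_QUERY_CHARS} characters")
    return cleaned

def with_retries(func, attempts=3, base_delay=0.5, retry_on=(OSError,), sleep=time.sleep):
    """Call func(); on a listed exception, back off exponentially and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))

# Simulated download that fails twice before succeeding
calls = {"n": 0}

def flaky_download():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network failure")
    return b"%PDF-"

data = with_retries(flaky_download, sleep=lambda seconds: None)  # skip real waiting
print(validate_query("  What is   a Turing Test? "), calls["n"], data)
```

In the notebook, the download inside `extract_text_from_pdf` would be the natural candidate for such a wrapper, and `validate_query` could run before the cache lookup.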

Monitoring and Evaluation:

- Logging: Log system performance, errors, and user interactions.
- Metrics: Define and track metrics such as accuracy, latency, and retrieval success rate to monitor effectiveness and identify areas for improvement.
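
A minimal sketch of what such instrumentation might look like, using only the standard library: each query is answered through a dictionary cache while the cache hit rate and latency are recorded and logged. The `Metrics` class, the `answer_with_metrics` helper, and the logger name `cag_rag` are hypothetical names introduced for this illustration.

```python
import logging
import time
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("cag_rag")  # hypothetical logger name

@dataclass
class Metrics:
    requests: int = 0
    cache_hits: int = 0
    latencies: list = field(default_factory=list)  # seconds per request

    @property
    def cache_hit_rate(self):
        return self.cache_hits / self.requests if self.requests else 0.0

    @property
    def mean_latency(self):
        return sum(self.latencies) / len(self.latencies) if self.latencies else 0.0

def answer_with_metrics(query, answer_fn, cache, metrics):
    """Answer a query through a dict cache while recording hit rate and latency."""
    start = time.perf_counter()
    metrics.requests += 1
    if query in cache:
        metrics.cache_hits += 1
        source, answer = "cache", cache[query]
    else:
        source, answer = "retrieval", answer_fn(query)
        cache[query] = answer
    elapsed = time.perf_counter() - start
    metrics.latencies.append(elapsed)
    logger.info("query=%r source=%s latency_ms=%.2f", query, source, elapsed * 1000)
    return answer

# Stub answer function standing in for the retrieval step
metrics = Metrics()
demo_cache = {}
for q in ["What is AI?", "What is AI?", "What is a Turing Test?"]:
    answer_with_metrics(q, lambda text: f"stub answer for {text}", demo_cache, metrics)
print(f"hit rate: {metrics.cache_hit_rate:.2f}, mean latency: {metrics.mean_latency:.6f}s")
```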

Security and Privacy:

- Data security: Store sensitive information in the knowledge base securely and protect it against unauthorized access.
- Privacy: Comply with the relevant privacy regulations and protect user data.
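
One small, concrete step in this direction is to scrub queries before they are logged or cached. The sketch below redacts email addresses with a deliberately simple regular expression and replaces user identifiers with a keyed hash; the helper names, the pattern, and the demo key are assumptions for illustration only, and this is far from a complete privacy solution.

```python
import hashlib
import hmac
import re

# Simple illustrative pattern; it will not catch every valid email address
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def redact_emails(text):
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)

def pseudonymize(user_id, secret_key):
    """Keyed hash, so logs can group requests by user without storing the raw ID."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

# Hypothetical key for the demo; a real key would come from a secret store
demo_key = b"demo-key-load-from-a-secret-store"
safe_query = redact_emails("Contact me at jane.doe@example.com about the Turing test")
print(safe_query)
print(pseudonymize("user-42", demo_key))
```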

User Interface:

- Integration: Expose the RAG/CAG system through a user-friendly interface (e.g., a chatbot or a search engine).
- Feedback mechanisms: Give users a way to report issues or suggest improvements.

Continuous Improvement:

- Regular knowledge base updates: Establish a process for adding new information so the knowledge base stays relevant and accurate.
- Model fine-tuning: Periodically fine-tune the LLM on new data or user feedback to improve its performance and adapt it to changing user needs.
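
For the knowledge base update process, one option is to key stored embeddings by a hash of each chunk's text, so that only new or changed chunks are re-embedded and stale entries are dropped. The sketch below shows the bookkeeping with a stand-in embedding function; `update_embedding_store`, `chunk_key`, and `fake_embed` are hypothetical names, and in the notebook the embedding function would be something like `sentence_model.encode`.

```python
import hashlib

def chunk_key(chunk):
    """Stable identifier for a chunk, derived from its text."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()

def update_embedding_store(store, chunks, embed_fn):
    """Embed only unseen chunks and drop entries whose chunk has disappeared.

    store maps chunk hash -> embedding. Returns the number of chunks embedded.
    """
    wanted = {chunk_key(c): c for c in chunks}
    for stale in set(store) - set(wanted):
        del store[stale]
    new_keys = [k for k in wanted if k not in store]
    if new_keys:
        vectors = embed_fn([wanted[k] for k in new_keys])
        for key, vec in zip(new_keys, vectors):
            store[key] = vec
    return len(new_keys)

def fake_embed(texts):
    """Stand-in for a real embedding model: one number per text."""
    return [[float(len(t))] for t in texts]

store = {}
first = update_embedding_store(store, ["alpha", "beta"], fake_embed)
second = update_embedding_store(store, ["alpha", "beta", "gamma"], fake_embed)
third = update_embedding_store(store, ["beta", "gamma"], fake_embed)
print(first, second, third, len(store))
```

Compared with a single cache file for all embeddings, this keeps the store consistent when the source document or the chunking parameters change.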

This list is not exhaustive, and the specific considerations depend on the requirements of the production environment. Addressing them beyond the POC stage leads toward a more robust, reliable, and user-friendly RAG/CAG system.
