# RAG LLM Integration

This notebook builds on `01_basic_rag.ipynb` and demonstrates how to integrate a RAG model with a Large Language Model (LLM) to generate full-text responses.

## Usage

If `ANTHROPIC_API_KEY` is not already set in your environment, uncomment the `os.environ` line in the next cell and set it to your Anthropic API key.
Set `ANTHROPIC_MODEL_NAME` to the model you want to use. The default is `claude-3-5-sonnet-20241022`.



In [7]:
import os
#os.environ['ANTHROPIC_API_KEY'] = ''
ANTHROPIC_MODEL_NAME = "claude-3-5-sonnet-20241022"
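
A missing key only surfaces later as an authentication error from the client, so it can help to check for it up front. The following is a minimal sketch of such a check; `require_env` is a hypothetical helper, not part of the notebook or of the Anthropic SDK.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it or assign os.environ[{name!r}] first."
        )
    return value

# Example call (commented out so the cell does not fail when the key is absent):
# api_key = require_env("ANTHROPIC_API_KEY")
```

Calling this before constructing the client turns a late, per-query failure into one clear error at setup time.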

This cell is the same as in `01_basic_rag.ipynb` and sets up the text embedding generator class `LocalEmbeddingGenerator`. The example function has been removed.

In [8]:
import numpy as np
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import string
from gensim.models import Word2Vec
import re

class LocalEmbeddingGenerator:
    def __init__(self, embedding_dim=384):  # Using 384 dimensions as it's common for many embedding models
        self.embedding_dim = embedding_dim
        self.word2vec_model = None
        self.lemmatizer = WordNetLemmatizer()
        
        # Download required NLTK data
        import nltk
        nltk.download('punkt')
        nltk.download('stopwords')
        nltk.download('wordnet')
        
        # Initialize stop words
        self.stop_words = set(stopwords.words('english'))
        
    def preprocess_text(self, text):
        """Preprocess the input text."""
        # Convert to lowercase
        text = text.lower()
        
        # Remove special characters and numbers
        text = re.sub(r'[^a-zA-Z\s]', '', text)
        
        # Tokenize
        tokens = word_tokenize(text)
        
        # Remove stop words and lemmatize
        tokens = [self.lemmatizer.lemmatize(token) for token in tokens 
                 if token not in self.stop_words and token not in string.punctuation]
        
        return tokens

    def train_word2vec(self, texts):
        """Train a Word2Vec model on the given texts."""
        # Preprocess all texts
        print(texts)
        processed_texts = [self.preprocess_text(text) for text in texts]
        
        # Train Word2Vec model
        self.word2vec_model = Word2Vec(sentences=processed_texts, 
                                     vector_size=self.embedding_dim,
                                     window=5,
                                     min_count=1,
                                     workers=4)

    def generate_embedding(self, text):
        """Generate embedding for the input text."""
        if self.word2vec_model is None:
            raise ValueError("Word2Vec model not trained. Please train the model first.")
        
        # Preprocess the input text
        tokens = self.preprocess_text(text)
        
        if not tokens:
            return np.zeros(self.embedding_dim)
        
        # Get embeddings for each token
        token_embeddings = []
        for token in tokens:
            try:
                token_embedding = self.word2vec_model.wv[token]
                token_embeddings.append(token_embedding)
            except KeyError:
                continue
        
        if not token_embeddings:
            return np.zeros(self.embedding_dim)
        
        # Average the token embeddings
        final_embedding = np.mean(token_embeddings, axis=0)
        
        return final_embedding

    def format_embedding(self, embedding):
        """Format the embedding vector as a comma-separated list in square brackets."""
        return f"[{','.join(map(str, embedding.tolist()))}]"
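
The core of `generate_embedding` is mean pooling: the vectors of all known tokens are averaged, and a zero vector is returned when no token is known. The sketch below illustrates that step in isolation. It uses a small hand-written dictionary (`toy_vectors`, an assumption for illustration) in place of a trained Word2Vec vocabulary, and `mean_pool` is a hypothetical name.

```python
import numpy as np

def mean_pool(tokens, vectors, dim):
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)

# Hypothetical 3-dimensional vectors standing in for a trained vocabulary
toy_vectors = {
    "neural": np.array([1.0, 0.0, 0.0]),
    "network": np.array([0.0, 1.0, 0.0]),
}

# "unseen" is not in the vocabulary, so it is skipped rather than raising
pooled = mean_pool(["neural", "network", "unseen"], toy_vectors, dim=3)
empty = mean_pool(["unseen"], toy_vectors, dim=3)
print(pooled)
print(empty)
```

Because unknown tokens are dropped, a query made up entirely of out-of-vocabulary words maps to the zero vector and carries no retrieval signal.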


The `RAGSystem` class stores documents alongside their embeddings and retrieves the documents most similar to a query by cosine similarity. It is unchanged from `01_basic_rag.ipynb`.
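
The retrieval step can be sketched with plain NumPy. This is an illustrative stand-in, not the notebook's implementation (which calls scikit-learn's `cosine_similarity`); `top_k_by_cosine` is a hypothetical name, and the zero-norm guard is a choice made for this sketch.

```python
import numpy as np

def top_k_by_cosine(query_vec, doc_vecs, k=3):
    """Return (index, similarity) pairs for the k documents closest to the query."""
    doc_matrix = np.asarray(doc_vecs, dtype=float)
    query = np.asarray(query_vec, dtype=float)

    # Product of norms; a zero entry means one of the vectors is all zeros
    denom = np.linalg.norm(doc_matrix, axis=1) * np.linalg.norm(query)

    # Guarded division so all-zero vectors get similarity 0 instead of NaN
    sims = np.divide(doc_matrix @ query, denom,
                     out=np.zeros_like(denom), where=denom > 0)

    # Sort descending by similarity and keep the first k
    order = np.argsort(sims)[::-1][:k]
    return [(int(i), float(sims[i])) for i in order]

docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = top_k_by_cosine([1.0, 0.0], docs, k=2)
print(results)
```

Here document 0 points in the same direction as the query and ranks first, followed by document 2, which shares one of its two components.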

In preparation for integrating the RAG model with an LLM, we perform some additional preprocessing on the documents. The `EnhancedRAGSystem` class extends `RAGSystem` with that preprocessing: it splits each document into overlapping word chunks and assembles the retrieved chunks into a context string under a token budget.
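
The chunking idea can be sketched as a standalone function. This is a variation under stated assumptions rather than a copy of the notebook's `chunk_document`: `chunk_words` is a hypothetical name, and it adds two behaviors of its own, validating that the overlap is smaller than the chunk size and stopping once a chunk reaches the end of the text.

```python
from typing import List

def chunk_words(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> List[str]:
    """Split text into chunks of chunk_size words, each sharing chunk_overlap words with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk reaches the end, so no trailing chunk holds only overlap words
        if start + chunk_size >= len(words):
            break
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, chunk_size=4, chunk_overlap=2))
# ['w0 w1 w2 w3', 'w2 w3 w4 w5', 'w4 w5 w6 w7', 'w6 w7 w8 w9']
```

Without the validation, an overlap equal to or larger than the chunk size would give `range` a zero or negative step.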

In [9]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from typing import List, Dict, Tuple
import json

class RAGSystem:
    def __init__(self, embedding_dim=384):
        self.embedding_generator = LocalEmbeddingGenerator(embedding_dim)
        self.document_store: List[Dict] = []
        self.document_embeddings: List[np.ndarray] = []

    def add_documents(self, documents: List[str], metadata: List[Dict] = None):
        """
        Add documents to the RAG system and generate their embeddings.
        
        Args:
            documents: List of document texts
            metadata: Optional list of metadata dictionaries for each document
        """
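        # Train the embedding model on the documents passed to this call.
        # Note: every call retrains Word2Vec from scratch on only these documents,
        # so embeddings stored by earlier calls come from a different model.
        self.embedding_generator.train_word2vec(documents)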

        # Generate embeddings and store documents
        for i, doc in enumerate(documents):
            embedding = self.embedding_generator.generate_embedding(doc)
            
            doc_entry = {
                'id': len(self.document_store),
                'text': doc,
                'metadata': metadata[i] if metadata else {}
            }
            
            self.document_store.append(doc_entry)
            self.document_embeddings.append(embedding)

    def find_similar_documents(self, query: str, k: int = 3) -> List[Tuple[Dict, float]]:
        """
        Find the k most similar documents to the query.
        
        Args:
            query: The search query
            k: Number of documents to retrieve
            
        Returns:
            List of tuples containing (document, similarity_score)
        """
        # Generate embedding for the query
        query_embedding = self.embedding_generator.generate_embedding(query)
        
        # Calculate similarities
        similarities = cosine_similarity(
            [query_embedding],
            self.document_embeddings
        )[0]
        
        # Get top k similar documents
        top_k_indices = np.argsort(similarities)[-k:][::-1]
        
        results = []
        for idx in top_k_indices:
            results.append((self.document_store[idx], similarities[idx]))
            
        return results

    def save_to_disk(self, filepath: str):
        """Save the RAG system to disk."""
        data = {
            'documents': self.document_store,
            'embeddings': [emb.tolist() for emb in self.document_embeddings]
        }
        
        with open(filepath, 'w') as f:
            json.dump(data, f)

    def load_from_disk(self, filepath: str):
        """Load the RAG system from disk."""
        with open(filepath, 'r') as f:
            data = json.load(f)
            
        self.document_store = data['documents']
        self.document_embeddings = [np.array(emb) for emb in data['embeddings']]

    
# Enhanced RAG system with chunking and context window management
class EnhancedRAGSystem(RAGSystem):
    def __init__(self, embedding_dim=384, chunk_size=200, chunk_overlap=50):
        super().__init__(embedding_dim)
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

    def chunk_document(self, text: str) -> List[str]:
        """Split document into overlapping chunks."""
        words = text.split()
        chunks = []
        
        for i in range(0, len(words), self.chunk_size - self.chunk_overlap):
            chunk = ' '.join(words[i:i + self.chunk_size])
            chunks.append(chunk)
            
        return chunks

    def add_document_with_chunking(self, document: str, metadata: Dict = None):
        """Add a single document with chunking."""
        chunks = self.chunk_document(document)
        
        for i, chunk in enumerate(chunks):
            chunk_metadata = metadata.copy() if metadata else {}
            chunk_metadata.update({
                'chunk_index': i,
                'total_chunks': len(chunks),
                'original_document': document[:100] + '...'  # Store first 100 chars as reference
            })
            
            self.add_documents([chunk], [chunk_metadata])

    def generate_context(self, query: str, max_tokens: int = 1000) -> str:
        """Generate context for the query by combining relevant chunks."""
        similar_docs = self.find_similar_documents(query, k=3)
        
        context = []
        current_tokens = 0
        
        for doc, similarity in similar_docs:
            doc_text = doc['text']
            estimated_tokens = len(doc_text.split())
            
            if current_tokens + estimated_tokens <= max_tokens:
                context.append(f"[Source: {doc['metadata'].get('source', 'Unknown')}]\n{doc_text}")
                current_tokens += estimated_tokens
            else:
                break
                
        return "\n\n".join(context)

# Example usage of enhanced RAG system
def enhanced_rag_example():
    # Initialize enhanced RAG system
    enhanced_rag = EnhancedRAGSystem()
    
    # Long document example
    long_documents = [
        """
        Artificial Intelligence (AI) is revolutionizing various industries. It encompasses 
        machine learning, which allows systems to learn from data and improve their performance 
        over time. Deep learning, a subset of machine learning, uses neural networks with multiple 
        layers to process complex patterns in data. Natural Language Processing (NLP) is another 
        crucial component of AI that focuses on enabling computers to understand and process 
        human language effectively. NLP applications include machine translation, sentiment 
        analysis, and chatbots. Computer vision, another AI domain, allows machines to interpret 
        and analyze visual information from the world, enabling applications like facial 
        recognition and autonomous vehicles.
        """
        ,
        """
        AI has numerous applications across various sectors, including healthcare, finance,
        education, and entertainment. In healthcare, AI is used for tasks like disease diagnosis,
        personalized treatment planning, and drug discovery. Financial institutions leverage AI
        for fraud detection, algorithmic trading, and customer service chatbots. Educational
        institutions use AI for personalized learning experiences and improving student outcomes.
        The entertainment industry uses AI for content recommendation, personalized marketing, and
        creating immersive gaming experiences.
        """
    ]
    
    # Add the long document with chunking
    enhanced_rag.add_document_with_chunking(
        long_documents[0],
        metadata={'source': 'AI_textbook', 'topic': 'AI_overview'}
    )
    enhanced_rag.add_document_with_chunking(
        long_documents[1],
        metadata={'source': 'Copilot_Autocomplete', 'topic': 'AI_Uses'}
    )
    
    # Example query and context generation
    query = "What is natural language processing and its applications?"
    context = enhanced_rag.generate_context(query)
    
    print(f"Query: {query}\n")
    print("Generated Context:")
    print(context)

enhanced_rag_example()

['Artificial Intelligence (AI) is revolutionizing various industries. It encompasses machine learning, which allows systems to learn from data and improve their performance over time. Deep learning, a subset of machine learning, uses neural networks with multiple layers to process complex patterns in data. Natural Language Processing (NLP) is another crucial component of AI that focuses on enabling computers to understand and process human language effectively. NLP applications include machine translation, sentiment analysis, and chatbots. Computer vision, another AI domain, allows machines to interpret and analyze visual information from the world, enabling applications like facial recognition and autonomous vehicles.']
['AI has numerous applications across various sectors, including healthcare, finance, education, and entertainment. In healthcare, AI is used for tasks like disease diagnosis, personalized treatment planning, and drug discovery. Financial institutions leverage AI for f

[nltk_data] Downloading package punkt to /home/craig/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /home/craig/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /home/craig/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
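
The budgeting logic in `generate_context` can be sketched on its own. The function below mirrors its approach of counting whitespace-separated words as a rough token estimate and stopping at the first chunk that does not fit. `pack_context` and the `(source, text)` pair layout are assumptions made for this sketch.

```python
from typing import List, Tuple

def pack_context(chunks: List[Tuple[str, str]], max_tokens: int = 1000) -> str:
    """Join (source, text) pairs, already ordered by relevance, until the budget is used up."""
    parts = []
    used = 0
    for source, text in chunks:
        cost = len(text.split())  # crude token estimate: one token per word
        if used + cost > max_tokens:
            break  # stop at the first chunk that does not fit
        parts.append(f"[Source: {source}]\n{text}")
        used += cost
    return "\n\n".join(parts)

ranked = [("A", "one two three"), ("B", "four five six seven"), ("C", "eight")]
print(pack_context(ranked, max_tokens=5))
```

With a budget of 5, only chunk A is included: B does not fit, and the loop stops there even though the one-word chunk C would. Replacing `break` with `continue` would let smaller, lower-ranked chunks fill the remaining budget, at the cost of skipping a more relevant one.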


`RAGPromptGenerator` is a class that generates prompts for the LLM based on the RAG model's output.

`rag_prompt_example` shows how prompts are generated for the LLM including the RAG model's output.

1. Initialize the `EnhancedRAGSystem` class.
2. Set up test documents for the RAG model.
3. Load the test documents.
4. Initialize the `RAGPromptGenerator` class.
5. Define test queries.
6. Generate an LLM prompt for each query.

In [10]:
class RAGPromptGenerator:
    def __init__(self, rag_system):
        self.rag = rag_system
    
    def generate_prompt(self, query: str, system_prompt: str = None) -> str:
        """Generate a prompt for the LLM using retrieved context."""
        context = self.rag.generate_context(query)
        
        prompt = f"""
        System: {system_prompt or 'You are a helpful AI assistant. Use the provided context to answer questions accurately. If the context does not contain relevant information, say so.'}
        
        Context:
        {context}
        
        Human: {query}
        
        Assistant: Based on the provided context, I'll help answer your question.
        """
        
        return prompt.strip()

def rag_prompt_example():
    # Initialize the RAG system
    rag = EnhancedRAGSystem()
    
    # Sample knowledge base
    documents = [
        {
            'text': """
            Machine learning algorithms can be categorized into supervised learning, 
            unsupervised learning, and reinforcement learning. Supervised learning 
            uses labeled data to train models, while unsupervised learning finds 
            patterns in unlabeled data.
            """,
            'metadata': {
                'source': 'ML_textbook',
                'topic': 'machine_learning_basics'
            }
        },
        {
            'text': """
            Deep learning architectures include convolutional neural networks (CNNs) 
            for image processing and recurrent neural networks (RNNs) for sequential 
            data. Transformers have become popular for natural language processing tasks.
            """,
            'metadata': {
                'source': 'DL_guide',
                'topic': 'deep_learning_architectures'
            }
        }
    ]
    
    # Add documents with chunking
    for doc in documents:
        rag.add_document_with_chunking(doc['text'], doc['metadata'])
    
    # Initialize prompt generator
    prompt_generator = RAGPromptGenerator(rag)
    
    # Example queries
    queries = [
        "What are the main types of machine learning?",
        "Explain deep learning architectures",
        "What is the difference between supervised and unsupervised learning?"
    ]
    
    # Generate prompts for each query
    for query in queries:
        print("\n" + "="*50)
        print(f"Query: {query}")
        prompt = prompt_generator.generate_prompt(
            query,
            system_prompt="You are an AI expert. Provide detailed technical explanations."
        )
        print("\nGenerated Prompt:")
        print(prompt)

rag_prompt_example()

['Machine learning algorithms can be categorized into supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data to train models, while unsupervised learning finds patterns in unlabeled data.']
['Deep learning architectures include convolutional neural networks (CNNs) for image processing and recurrent neural networks (RNNs) for sequential data. Transformers have become popular for natural language processing tasks.']

Query: What are the main types of machine learning?

Generated Prompt:
System: You are an AI expert. Provide detailed technical explanations.
        
        Context:
        [Source: DL_guide]
Deep learning architectures include convolutional neural networks (CNNs) for image processing and recurrent neural networks (RNNs) for sequential data. Transformers have become popular for natural language processing tasks.

[Source: ML_textbook]
Machine learning algorithms can be categorized into supervised learning, unsupervise
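
The generated prompt above carries the indentation of the triple-quoted string in `generate_prompt`, so some lines are indented and others are not. One way to avoid this, shown here as a sketch rather than as the notebook's method, is to build the prompt from separate sections joined with blank lines. `build_prompt` is a hypothetical helper, and it leaves out the trailing `Assistant:` line for brevity.

```python
def build_prompt(system_prompt: str, context: str, query: str) -> str:
    """Assemble a prompt from sections without inheriting source-code indentation."""
    sections = [
        f"System: {system_prompt}",
        f"Context:\n{context}",
        f"Human: {query}",
    ]
    # A blank line between sections keeps them visually distinct
    return "\n\n".join(sections)

prompt = build_prompt(
    "You are an AI expert. Provide detailed technical explanations.",
    "[Source: ML_textbook]\nSupervised learning uses labeled data to train models.",
    "What are the main types of machine learning?",
)
print(prompt)
```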



Finally, it is time to integrate the RAG model with the LLM. `LLMInterface` is a base class that we implement for the specific LLM in use, in this case Anthropic's Claude. `ClaudeInterface` subclasses `LLMInterface` and implements `generate_response` for Claude.

In [11]:
import anthropic
from typing import Optional
import os

class LLMInterface:
    """Base class for LLM interactions"""
    def generate_response(self, prompt: str) -> str:
        raise NotImplementedError

class ClaudeInterface(LLMInterface):
    def __init__(self, api_key: Optional[str] = None):
        self.api_key = api_key or os.getenv("ANTHROPIC_API_KEY")
        self.client = anthropic.Anthropic(api_key=self.api_key)
    
    def generate_response(self, prompt: str) -> str:
        try:
            message = self.client.messages.create(
                model=ANTHROPIC_MODEL_NAME,
                max_tokens=1024,
                messages=[{
                    "role": "user",
                    "content": prompt
                }]
            )
            return message.content[0].text
        except Exception as e:
            print(f"Error generating response from Claude: {str(e)}")
            return ""
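
`ClaudeInterface` sends the whole prompt, including the `System:` line, as a single user message. The Messages API also accepts a top-level `system` parameter, so an alternative is to pass the system prompt there and keep only the context and question in the user turn. The sketch below builds such a request as a plain dictionary so that it runs without network access. `build_claude_request` and the `Context:` / `Question:` layout are assumptions for illustration.

```python
from typing import Any, Dict

def build_claude_request(model: str, system_prompt: str, context: str,
                         query: str, max_tokens: int = 1024) -> Dict[str, Any]:
    """Build keyword arguments for a Messages API call with a separate system prompt."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system_prompt,  # system prompt travels outside the message list
        "messages": [
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}
        ],
    }

request = build_claude_request(
    model="claude-3-5-sonnet-20241022",
    system_prompt="You are an AI expert.",
    context="[Source: ml_guide]\nMachine learning is a subset of artificial intelligence.",
    query="Explain machine learning concepts",
)
# With a configured client, this could be sent as: client.messages.create(**request)
print(request["messages"][0]["content"])
```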



`RAGWithLLM` is a class that integrates the RAG model with the LLM. It uses the `RAGSystem` and `ClaudeInterface` classes to generate full text responses. The `query` method takes the user query and generates a prompt including the RAG model's output and passes it to the LLM to generate a full text response.

The `example_rag_llm` function demonstrates how to use the `RAGWithLLM` class to generate full text responses.

In [12]:
class RAGWithLLM:
    def __init__(self, rag_system, llm_interface: LLMInterface):
        self.rag = rag_system
        self.llm = llm_interface
        self.prompt_generator = RAGPromptGenerator(rag_system)
    
    def query(self, user_query: str, system_prompt: Optional[str] = None) -> str:
        # Generate RAG-enhanced prompt
        enhanced_prompt = self.prompt_generator.generate_prompt(
            user_query,
            system_prompt
        )
        
        # Get LLM response
        response = self.llm.generate_response(enhanced_prompt)
        
        return response

# Example usage:
def example_rag_llm():
    # Initialize RAG system
    rag_system = EnhancedRAGSystem()
    
    # Sample documents
    documents = [
        {
            'text': """
            Python is a high-level programming language known for its simplicity and readability.
            It supports multiple programming paradigms including procedural, object-oriented,
            and functional programming.
            """,
            'metadata': {'source': 'python_docs', 'topic': 'python_basics'}
        },
        {
            'text': """
            Machine learning is a subset of artificial intelligence that enables systems
            to learn and improve from experience without being explicitly programmed.
            """,
            'metadata': {'source': 'ml_guide', 'topic': 'ml_basics'}
        }
    ]
    
    # Add documents to RAG system
    for doc in documents:
        rag_system.add_document_with_chunking(doc['text'], doc['metadata'])
    
    # Initialize LLM interface (Claude here; any LLMInterface subclass would work)
    llm_interface = ClaudeInterface()
    
    # Initialize RAG with LLM
    rag_with_llm = RAGWithLLM(rag_system, llm_interface)
    
    # Example queries
    queries = [
        "What is Python programming language?",
        "Explain machine learning concepts",
    ]
    
    # Process queries
    for query in queries:
        print(f"\nQuery: {query}")
        response = rag_with_llm.query(
            query,
            system_prompt="You are an expert programmer and AI researcher. Provide detailed explanations."
        )
        print(f"\nResponse: {response}")


example_rag_llm()

['Python is a high-level programming language known for its simplicity and readability. It supports multiple programming paradigms including procedural, object-oriented, and functional programming.']
['Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed.']

Query: What is Python programming language?
Error generating response from Claude: "Could not resolve authentication method. Expected either api_key or auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly omitted"

Response: 

Query: Explain machine learning concepts
Error generating response from Claude: "Could not resolve authentication method. Expected either api_key or auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly omitted"

Response: 
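
The empty responses above come from the missing API key, not from the retrieval pipeline. To exercise the pipeline without credentials, one option is a stub that implements the same `generate_response` method and records what it receives. The sketch below is hypothetical: `StubLLM` is not part of the notebook, and `LLMInterface` is redefined only so the block runs on its own (in the notebook, the existing base class would be reused).

```python
class LLMInterface:
    """Base class for LLM interactions (redefined here for self-containment)."""
    def generate_response(self, prompt: str) -> str:
        raise NotImplementedError

class StubLLM(LLMInterface):
    """Hypothetical offline stand-in that records prompts and returns a canned reply."""
    def __init__(self):
        self.prompts = []

    def generate_response(self, prompt: str) -> str:
        self.prompts.append(prompt)  # keep the prompt for later inspection
        return f"[stub response to a {len(prompt)}-character prompt]"

stub = StubLLM()
reply = stub.generate_response("Context: ...\n\nHuman: What is Python?")
print(reply)
print(len(stub.prompts))
```

Passing `StubLLM()` instead of `ClaudeInterface()` to `RAGWithLLM` would let the retrieved context be inspected through `stub.prompts` without making any API calls.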


