# PDF Question Answering with Local Models and InterSystems IRIS

This notebook demonstrates how to build a **fully local** Retrieval-Augmented Generation (RAG) system using:
- **InterSystems IRIS** as the vector database
- **Sentence Transformers** for text embeddings
- **Local Transformer Models** for question answering (no external API needed)
- **PyTorch** for local model execution

## Key Advantages of Local QA
✅ **Privacy**: All processing happens locally; no data is sent to external APIs  
✅ **Cost**: No per-query API costs after the initial setup  
✅ **Speed**: No network round trips at inference time once the models are loaded  
✅ **Offline**: Works without an internet connection once the models have been downloaded  

## Workshop Overview
We'll process PDF documents, store them as vectors in IRIS, and enable natural language querying using entirely local models.

---

## 1. Setup and Dependencies

First, let's import all required libraries for our local RAG pipeline:

In [1]:
# Import required libraries for local RAG pipeline
import irisnative                                  # InterSystems IRIS native database connection
import os                                          # Operating system interface
import sentence_transformers                       # Text embedding models
import numpy as np                                 # Numerical computations

# Document processing libraries
from langchain_text_splitters import RecursiveCharacterTextSplitter  # Text chunking
from langchain_community.document_loaders import PyPDFDirectoryLoader # PDF loading

# Local AI model libraries - these run entirely on your machine
import torch                                       # PyTorch for deep learning
from transformers import pipeline                  # Hugging Face transformers pipeline

print("📦 All libraries imported successfully")
print(f"🔥 PyTorch using device: {torch.cuda.get_device_name() if torch.cuda.is_available() else 'CPU'}")

📦 All libraries imported successfully
🔥 PyTorch using device: CPU


## 2. Database Connection

Let's establish our connection to the InterSystems IRIS database where we'll store our document vectors.

In [2]:
# Database connection parameters
# These should match your InterSystems IRIS instance configuration
connection_string = "iris:1972/LLMRAG"  # host:port/namespace
username = "superuser"
password = "SYS"

# Establish connection to InterSystems IRIS database
# This creates both a connection and a cursor for executing SQL commands
connectionIRIS = irisnative.createConnection(connection_string, username, password)
cursorIRIS = connectionIRIS.cursor()
print("✅ Connected to InterSystems IRIS database")

✅ Connected to InterSystems IRIS database
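
Hard-coded credentials are convenient for a workshop, but the same settings can be read from the environment. The following is a minimal sketch, assuming hypothetical variable names (`IRIS_HOST`, `IRIS_PORT`, `IRIS_NAMESPACE`, `IRIS_USERNAME`, `IRIS_PASSWORD`) that are not part of the original notebook; the defaults mirror the values used in the cell above.

```python
import os

def iris_connection_settings(env=None):
    """Return (connection_string, username, password) for the IRIS connection.

    The environment variable names are hypothetical; the defaults mirror the
    values hard-coded in the notebook cell.
    """
    env = os.environ if env is None else env
    host = env.get("IRIS_HOST", "iris")
    port = env.get("IRIS_PORT", "1972")
    namespace = env.get("IRIS_NAMESPACE", "LLMRAG")
    username = env.get("IRIS_USERNAME", "superuser")
    password = env.get("IRIS_PASSWORD", "SYS")
    return f"{host}:{port}/{namespace}", username, password

# With an empty mapping, the workshop defaults are returned
connection_string, username, password = iris_connection_settings({})
print(connection_string)
```

The returned values can be passed to `irisnative.createConnection(...)` in the same way as the literals in the cell above.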


## 3. Local Question-Answering Model Setup

We'll load a high-quality local QA model that can answer questions based on provided context. This model runs entirely on your machine without external API calls.

In [3]:
# Load local question-answering model
# timpal0l/mdeberta-v3-base-squad2 is a fine-tuned model excellent for extractive QA
# It can find specific answers within provided text context
print("📥 Loading local question-answering model...")
print("   Model: mdeberta-v3-base-squad2 (optimized for multilingual QA)")

qa_model = pipeline(
    "question-answering", 
    "timpal0l/mdeberta-v3-base-squad2",
    # Use GPU if available for faster inference
    device=0 if torch.cuda.is_available() else -1
)

print("✅ Local QA model loaded successfully")
print("💡 This model runs entirely offline - no internet required for inference!")

📥 Loading local question-answering model...
   Model: mdeberta-v3-base-squad2 (optimized for multilingual QA)


Device set to use cpu


✅ Local QA model loaded successfully
💡 This model runs entirely offline - no internet required for inference!
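
The model files are downloaded on first use and then cached. To make later runs fail fast instead of attempting a download, the Hugging Face libraries honor the `HF_HUB_OFFLINE` environment variable. A small sketch, assuming the model is already in the local cache and that the variable is set before `transformers` is imported:

```python
import os

# Must be set before importing transformers / huggingface_hub,
# otherwise the libraries have already read their configuration.
os.environ["HF_HUB_OFFLINE"] = "1"

offline = os.environ.get("HF_HUB_OFFLINE") == "1"
print(f"Hugging Face offline mode requested: {offline}")
```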


## 4. Embedding Model Setup

We need a sentence transformer model to convert text into numerical vectors (embeddings) for semantic similarity searches.

In [4]:
# Check if the embedding model is already downloaded and saved locally
# This saves time and bandwidth by avoiding re-downloads
if not os.path.isdir('/app/data/model/'):
    print("📥 Downloading and saving embedding model...")
    # paraphrase-multilingual-MiniLM-L12-v2 is excellent for multilingual semantic similarity
    # It's lightweight but effective for most RAG applications
    modelEmbedding = sentence_transformers.SentenceTransformer('sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2')
    modelEmbedding.save('/app/data/model/')
    print("✅ Embedding model saved to local directory")
else:
    print("✅ Embedding model already available locally")

✅ Embedding model already available locally


## 5. Document Processing and Vector Storage

This is the core of our local RAG system: we'll load PDF documents, split them into chunks, create embeddings, and store everything in InterSystems IRIS.
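
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping character windows. It is an illustration only: the hypothetical `split_with_overlap` helper cuts at fixed positions, whereas `RecursiveCharacterTextSplitter` additionally prefers to break on separators such as paragraphs and sentences.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that share an overlap.

    Simplified illustration, not the algorithm used by LangChain.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks

chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the same idea as the 50-character overlap used in this notebook.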

In [6]:
# Configure text splitting strategy
# Smaller chunks (700 chars) keep each retrieved passage focused on one topic
# Overlap (50 chars) reduces the chance that a sentence is cut in half at a chunk boundary
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=700,      # Maximum characters per chunk
    chunk_overlap=50,    # Characters to overlap between adjacent chunks
)

# Load all PDF documents from the data directory
path = "/app/data"
loader = PyPDFDirectoryLoader(path)
print("📖 Loading PDF documents...")
docs_before_split = loader.load()
print(f"✅ Loaded {len(docs_before_split)} document pages")

# Split documents into smaller, manageable chunks
print("✂️ Splitting documents into chunks...")
docs_after_split = text_splitter.split_documents(docs_before_split)
print(f"✅ Created {len(docs_after_split)} text chunks")

# Load the embedding model from local storage
print("🔄 Loading embedding model...")
modelEmbedding = sentence_transformers.SentenceTransformer("/app/data/model/")
print("✅ Embedding model loaded")

# Process each document chunk: create embeddings and store in IRIS database
print("💾 Processing chunks and storing in database...")
for i, doc in enumerate(docs_after_split):
    # Generate embeddings for the text content
    # normalize_embeddings=True returns unit-length vectors, so the dot product used at query time equals cosine similarity
    embeddings = modelEmbedding.encode(doc.page_content, normalize_embeddings=True)
    
    # Convert to numpy array and format for database storage
    array = np.array(embeddings)
    formatted_array = np.vectorize('{:.12f}'.format)(array)  # 12 decimal precision
    
    # Prepare parameters for database insertion
    parameters = [
        doc.metadata['source'],                    # Source PDF file path
        str(doc.page_content),                     # Actual text content
        str(','.join(formatted_array))             # Comma-separated vector values
    ]
    
    # Insert into IRIS database with vector storage
    # TO_VECTOR() converts comma-separated string to IRIS vector format
    cursorIRIS.execute(
        "INSERT INTO LLMRAG.DOCUMENTCHUNK (Document, Phrase, VectorizedPhrase) VALUES (?, ?, TO_VECTOR(?,DECIMAL))", 
        parameters
    )
    
    # Show progress every 10 chunks
    if (i + 1) % 10 == 0:
        print(f"  Processed {i + 1}/{len(docs_after_split)} chunks")

# Commit all changes to the database
connectionIRIS.commit()
print(f"✅ Successfully stored {len(docs_after_split)} chunks in InterSystems IRIS database")

📖 Loading PDF documents...
✅ Loaded 51 document pages
✂️ Splitting documents into chunks...
✅ Created 239 text chunks
🔄 Loading embedding model...
✅ Embedding model loaded
💾 Processing chunks and storing in database...
  Processed 10/239 chunks
  Processed 20/239 chunks
  Processed 30/239 chunks
  Processed 40/239 chunks
  Processed 50/239 chunks
  Processed 60/239 chunks
  Processed 70/239 chunks
  Processed 80/239 chunks
  Processed 90/239 chunks
  Processed 100/239 chunks
  Processed 110/239 chunks
  Processed 120/239 chunks
  Processed 130/239 chunks
  Processed 140/239 chunks
  Processed 150/239 chunks
  Processed 160/239 chunks
  Processed 170/239 chunks
  Processed 180/239 chunks
  Processed 190/239 chunks
  Processed 200/239 chunks
  Processed 210/239 chunks
  Processed 220/239 chunks
  Processed 230/239 chunks
✅ Successfully stored 239 chunks in InterSystems IRIS database
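
The insert loop serializes each embedding as a comma-separated string before handing it to `TO_VECTOR(?, DECIMAL)`. The same formatting can be written without `np.vectorize`. This is a sketch with a hypothetical helper name, assuming the 12-decimal precision used above:

```python
def to_vector_literal(values, decimals=12):
    """Format numbers as the comma-separated string passed to TO_VECTOR(?, DECIMAL)."""
    return ",".join(f"{float(v):.{decimals}f}" for v in values)

literal = to_vector_literal([0.5, -0.25, 0.0])
print(literal)  # 0.500000000000,-0.250000000000,0.000000000000
```

Because each value goes through `float(...)`, the helper accepts plain Python floats as well as NumPy scalars.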


## 6. Query Processing and Similarity Search

Now we'll demonstrate how to query our vector database. We'll convert a question into an embedding and find the most relevant document chunks.
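
Because both the stored chunks and the question are encoded with `normalize_embeddings=True`, the dot product computed by the database is the cosine similarity of the original vectors. A small pure-Python check of that identity, using made-up two-dimensional vectors rather than real embeddings:

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]

def cosine(a, b):
    """Cosine similarity computed directly from the unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [3.0, 4.0], [4.0, 3.0]
print(dot(normalize(a), normalize(b)))  # dot product of unit vectors
print(cosine(a, b))                     # cosine similarity of the originals
```

Both printed values agree up to floating-point rounding, which is why a fixed threshold such as 0.6 can be read as a cosine-similarity cutoff.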

In [7]:
# Example question in Spanish (the model supports multiple languages)
literalQuestion = "¿Qué medicamento puede tomar mi hijo de 2 años para bajar la fiebre?"

print(f"🔍 Processing question: '{literalQuestion}'")

# Convert the question into an embedding using the same model
# This ensures semantic similarity with our stored document embeddings
question_embedding = modelEmbedding.encode(literalQuestion, normalize_embeddings=True)

# Format the question embedding for database query
array = np.array(question_embedding)
formatted_array = np.vectorize('{:.12f}'.format)(array)
parameter_query = [str(','.join(formatted_array))]

# Perform similarity search in InterSystems IRIS
# VECTOR_DOT_PRODUCT calculates similarity between question and document vectors
# Similarity > 0.6 threshold filters for highly relevant documents
print("🔍 Searching for relevant documents...")
cursorIRIS.execute("""
    SELECT Document, MAX(similarity) AS max_similarity
    FROM (
        SELECT VECTOR_DOT_PRODUCT(VectorizedPhrase, TO_VECTOR(?, DECIMAL)) AS similarity, 
               Document 
        FROM LLMRAG.DOCUMENTCHUNK
    ) 
    WHERE similarity > 0.6 
    GROUP BY Document
    ORDER BY max_similarity DESC
""", parameter_query)

similarity_rows = cursorIRIS.fetchall()
print(f"✅ Found {len(similarity_rows)} relevant documents")

# Display the relevant documents and their similarity scores
for doc_path, similarity in similarity_rows:
    print(f"  📄 {doc_path} (similarity: {similarity})")

🔍 Processing question: '¿Qué medicamento puede tomar mi hijo de 2 años para bajar la fiebre?'
🔍 Searching for relevant documents...
✅ Found 1 relevant documents
  📄 /APP/DATA/PROSPECTO_69726.HTML.PDF (similarity: .6020914157820870586)


### Debug: View Retrieved Documents

Let's inspect the similarity search results to understand what documents were found:

In [9]:
# Display the similarity search results for debugging
# This helps us understand which documents were retrieved and their relevance scores
print("🔍 Similarity search results:")
for i, row in enumerate(similarity_rows, 1):
    print(f"  {i}. Document: {row[0]}")
    if len(row) > 1:  # If similarity score is available
        print(f"     Similarity: {row[1]}")
    print()

similarity_rows

🔍 Similarity search results:
  1. Document: /APP/DATA/PROSPECTO_69726.HTML.PDF
     Similarity: .6020914157820870586



[Row(Document='/APP/DATA/PROSPECTO_69726.HTML.PDF', max_similarity='.6020914157820870586')]
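
Note that the driver returns `max_similarity` as a string (`'.6020914157820870586'`), so numeric comparisons in Python need an explicit conversion. A minimal sketch with a hypothetical helper, using plain tuples in place of the driver's `Row` objects:

```python
def parse_similarity_rows(rows):
    """Convert (document, similarity) rows to (document, float), best match first."""
    parsed = [(row[0], float(row[1])) for row in rows]
    return sorted(parsed, key=lambda pair: pair[1], reverse=True)

# Values copied from the output above
rows = [("/APP/DATA/PROSPECTO_69726.HTML.PDF", ".6020914157820870586")]
for document, similarity in parse_similarity_rows(rows):
    print(f"{document}: {similarity:.4f}")
```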

## 7. Local Question Answering

Now we'll use our local QA model to answer questions based on the retrieved context. The model is extractive: it returns a span copied from the context, together with a confidence score, rather than generating new text.

In [10]:
# Build context from relevant documents
# We'll concatenate the full text of documents that matched our similarity search
context = ''
print("📚 Building context from relevant documents...")

for similarity_row in similarity_rows:
    document_path = similarity_row[0]
    print(f"  Adding content from: {document_path}")
    
    # Find the original document that matches this path
    for doc in docs_before_split:
        if similarity_row[0] == doc.metadata['source'].upper():
            context += doc.page_content + "\n\n"  # Add spacing between documents

print(f"✅ Context built with {len(context)} characters")

# Example question about the medicine name
example_question = "¿Cómo se llama el medicamento descrito en el prospecto?"

print("=" * 60)
print("🤖 LOCAL QUESTION ANSWERING DEMONSTRATION")
print("=" * 60)
print(f"📝 Question: {example_question}")
print(f"📄 Context length: {len(context)} characters")
print(f"🧠 Using local model: mdeberta-v3-base-squad2")
print()

# Use the local QA model to answer the question
# This runs entirely on your machine - no external API calls!
result = qa_model(question=example_question, context=context)

print("🎯 ANSWER DETAILS:")
print(f"   Answer: {result['answer']}")
print(f"   Confidence Score: {result['score']:.4f}")
print(f"   Answer Position: characters {result['start']}-{result['end']}")
print("=" * 60)

# Display the result for inspection
result

📚 Building context from relevant documents...
  Adding content from: /APP/DATA/PROSPECTO_69726.HTML.PDF
✅ Context built with 33004 characters
🤖 LOCAL QUESTION ANSWERING DEMONSTRATION
📝 Question: ¿Cómo se llama el medicamento descrito en el prospecto?
📄 Context length: 33004 characters
🧠 Using local model: mdeberta-v3-base-squad2

🎯 ANSWER DETAILS:
   Answer: 
Dalsy 40 mg/ml suspensión oral 
ibuprofeno
   Confidence Score: 0.5234
   Answer Position: characters 54-97


{'score': 0.523392615839839,
 'start': 54,
 'end': 97,
 'answer': '\nDalsy 40 mg/ml suspensión oral \nibuprofeno'}
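
Since the answer is a raw span of the PDF text, it can contain line breaks and stray spaces, as in the result above. A short sketch of a hypothetical post-processing helper that collapses whitespace before the answer is displayed:

```python
def clean_answer(answer):
    """Collapse newlines and repeated spaces in an extracted answer span."""
    return " ".join(answer.split())

# Answer string copied from the result above
raw = "\nDalsy 40 mg/ml suspensión oral \nibuprofeno"
print(clean_answer(raw))  # Dalsy 40 mg/ml suspensión oral ibuprofeno
```

The `start` and `end` offsets in the result still refer to the unmodified context, so keep the raw answer if you need to highlight the span in the source text.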

## 8. Interactive Question Testing

Let's create a simple function to test different questions with our local RAG system:

In [11]:
def ask_local_question(question, similarity_threshold=0.6):
    """
    Ask a question to our local RAG system
    
    Args:
        question (str): The question to ask
        similarity_threshold (float): Minimum similarity score for document retrieval
    
    Returns:
        dict: Answer with confidence score and metadata
    """
    print(f"🔍 Question: {question}")
    
    # Convert question to embedding
    question_embedding = modelEmbedding.encode(question, normalize_embeddings=True)
    array = np.array(question_embedding)
    formatted_array = np.vectorize('{:.12f}'.format)(array)
    parameter_query = [str(','.join(formatted_array))]
    
    # Search for relevant documents
    # The threshold is passed as a bound parameter instead of being interpolated into the SQL text
    cursorIRIS.execute("""
        SELECT Document, MAX(similarity) AS max_similarity
        FROM (
            SELECT VECTOR_DOT_PRODUCT(VectorizedPhrase, TO_VECTOR(?, DECIMAL)) AS similarity, 
                   Document 
            FROM LLMRAG.DOCUMENTCHUNK
        ) 
        WHERE similarity > ?
        GROUP BY Document
        ORDER BY max_similarity DESC
    """, parameter_query + [float(similarity_threshold)])
    
    results = cursorIRIS.fetchall()
    print(f"   Found {len(results)} relevant documents")
    
    if not results:
        return {"error": "No relevant documents found. Try lowering the similarity threshold."}
    
    # Build context
    context = ''
    for result in results:
        for doc in docs_before_split:
            if result[0] == doc.metadata['source'].upper():
                context += doc.page_content + "\n\n"
    
    # Get answer from local QA model
    answer = qa_model(question=question, context=context)
    
    print(f"   Answer: {answer['answer']}")
    print(f"   Confidence: {answer['score']}")
    print()
    
    return answer

# Test with different questions
test_questions = [
    "¿Qué medicamento puede tomar mi hijo de 2 años para bajar la fiebre?",
    "¿Cuál es la dosis recomendada?", 
    "¿Cuáles son los efectos secundarios?",
    "¿Cómo se debe almacenar este medicamento?"
]

print("🧪 TESTING LOCAL QA SYSTEM WITH MULTIPLE QUESTIONS")
print("=" * 60)

for i, question in enumerate(test_questions, 1):
    print(f"\n{i}. ", end="")
    result = ask_local_question(question)
    if "error" not in result:
        print(f"   ✅ Successfully answered with {result['score']} confidence")
    else:
        print(f"   ❌ {result['error']}")

print("\n💡 Try modifying the questions or similarity threshold to experiment!")

🧪 TESTING LOCAL QA SYSTEM WITH MULTIPLE QUESTIONS

1. 🔍 Question: ¿Qué medicamento puede tomar mi hijo de 2 años para bajar la fiebre?
   Found 1 relevant documents
   Answer: 
ibuprofeno
   Confidence: 0.07409860172720073

   ✅ Successfully answered with 0.07409860172720073 confidence

2. 🔍 Question: ¿Cuál es la dosis recomendada?
   Found 4 relevant documents
   Answer:  1 comprimido (25 mg) al día,
   Confidence: 0.7944133747369051

   ✅ Successfully answered with 0.7944133747369051 confidence

3. 🔍 Question: ¿Cuáles son los efectos secundarios?
   Found 3 relevant documents
   Answer:  Posibles efectos adversos
   Confidence: 0.7938088556693401

   ✅ Successfully answered with 0.7938088556693401 confidence

4. 🔍 Question: ¿Cómo se debe almacenar este medicamento?
   Found 3 relevant documents
   Answer:  en su envase original.
   Confidence: 0.5308870549779385

   ✅ Successfully answered with 0.5308870549779385 confidence

💡 Try modifying the questions or similarity threshold to experiment!

## 9. Cleanup and Summary

In [12]:
# Close the database connection to free up resources
connectionIRIS.close()
print("✅ Database connection closed successfully")

print("\n🎉 Local RAG Workshop completed successfully!")
print("\n🏆 What we accomplished with LOCAL MODELS ONLY:")
print("✅ Connected to InterSystems IRIS vector database")
print("✅ Loaded PDF documents and created text chunks") 
print("✅ Generated embeddings using local sentence transformers")
print("✅ Stored document vectors in IRIS for fast similarity search")
print("✅ Performed semantic search for relevant content")
print("✅ Generated answers using local transformer QA model")
print("✅ Created an interactive QA function for testing")

print("\n🔒 PRIVACY & PERFORMANCE BENEFITS:")
print("• No data sent to external APIs - complete privacy")
print("• No API costs - runs entirely on your infrastructure")
print("• Fast local inference after model loading")
print("• Works offline without internet connectivity")
print("• Full control over model selection and parameters")

print("\n💡 Next steps: Try different questions, adjust similarity thresholds, or experiment with other local models!")

✅ Database connection closed successfully

🎉 Local RAG Workshop completed successfully!

🏆 What we accomplished with LOCAL MODELS ONLY:
✅ Connected to InterSystems IRIS vector database
✅ Loaded PDF documents and created text chunks
✅ Generated embeddings using local sentence transformers
✅ Stored document vectors in IRIS for fast similarity search
✅ Performed semantic search for relevant content
✅ Generated answers using local transformer QA model
✅ Created an interactive QA function for testing

🔒 PRIVACY & PERFORMANCE BENEFITS:
• No data sent to external APIs - complete privacy
• No API costs - runs entirely on your infrastructure
• Fast local inference after model loading
• Works offline without internet connectivity
• Full control over model selection and parameters

💡 Next steps: Try different questions, adjust similarity thresholds, or experiment with other local models!


---

## 🚀 Advanced Experiments and Extensions

### 1. **Model Comparisons**
Try different local QA models by changing the pipeline initialization:
```python
# Alternative local extractive QA models to experiment with:
models_to_try = [
    "deepset/roberta-base-squad2",              # English only, base-sized
    "deepset/xlm-roberta-large-squad2",         # Multilingual, larger and slower
    "distilbert-base-cased-distilled-squad"     # Lightweight and fast
]
# Note: conversational models such as microsoft/DialoGPT-medium have no fine-tuned
# extractive QA head, so they are not suitable for the "question-answering" pipeline.
```

### 2. **Performance Optimization**
```python
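# Use the GPU with half precision when available; fall back to full precision on CPU
use_gpu = torch.cuda.is_available()
qa_model = pipeline(
    "question-answering", 
    "timpal0l/mdeberta-v3-base-squad2",
    device=0 if use_gpu else -1,
    torch_dtype=torch.float16 if use_gpu else torch.float32,  # Half precision only on GPU
    model_kwargs={"low_cpu_mem_usage": True}  # Requires the accelerate package
)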
```

### 3. **Hybrid Search Enhancement**
Combine semantic and keyword search:
```python
# Add keyword matching to improve retrieval
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever
```
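
The imports above are only a starting point. One common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is a plain-Python illustration with a hypothetical helper and made-up chunk IDs; it does not use the LangChain retrievers, and the constant `k=60` is a conventional default rather than a value taken from this notebook.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists; items ranked high in many lists score best."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic_ranking = ["chunk_a", "chunk_b", "chunk_c"]  # e.g. from the vector search
keyword_ranking = ["chunk_b", "chunk_c", "chunk_a"]   # e.g. from a BM25 search
fused = reciprocal_rank_fusion([semantic_ranking, keyword_ranking])
print(fused)  # ['chunk_b', 'chunk_a', 'chunk_c']
```

RRF only needs the rank positions, so the two searches do not have to produce comparable scores.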

### 4. **Confidence Thresholds**
Implement answer quality filtering:
```python
def get_confident_answer(question, context, min_confidence=0.3):
    # Pass the retrieved context explicitly instead of relying on a notebook global
    result = qa_model(question=question, context=context)
    if result['score'] < min_confidence:
        return "I'm not confident enough to answer this question."
    return result['answer']
```

### 5. **Multi-language Support**
Test the system with questions in different languages:
```python
multilingual_questions = [
    "What is the recommended dosage?",  # English
    "¿Cuál es la dosis recomendada?",   # Spanish  
    "Quelle est la posologie recommandée?",  # French
    "Qual è il dosaggio raccomandato?"  # Italian
]
```

### 📚 **Educational Notes**

**Why Local Models?**
- **Data Security**: Medical documents often contain sensitive information
- **Compliance**: Keeping data on your own infrastructure can simplify meeting GDPR, HIPAA, and similar privacy requirements
- **Cost Control**: No per-query API costs for high-volume usage
- **Customization**: Fine-tune models on your specific domain data

**Model Selection Guide:**
- **Speed Priority**: DistilBERT-based models
- **Accuracy Priority**: DeBERTa or RoBERTa-large models  
- **Multilingual**: XLM-RoBERTa models
- **Domain-specific**: Fine-tune on your documents