# RAG System Demo

This notebook walks through the complete RAG (Retrieval-Augmented Generation) workflow:
- Creating collections
- Uploading PDF and text documents
- Querying with semantic search
- Managing documents and collections

## Prerequisites

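You will need the `requests` and `reportlab` Python packages (`pip install requests reportlab`).

Make sure both servers are running (the main server handles authentication; the tools server hosts the RAG endpoints):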
```bash
bash start_servers.sh
```

Or start them separately:
```bash
python tools_server.py  # Port 10006
python server.py        # Port 10007
```

## Setup and Configuration

In [None]:
import requests
from pathlib import Path
from IPython.display import display, Markdown

# Server configuration
MAIN_SERVER = "http://localhost:10007"  # Change to your server IP if needed: "http://10.198.112.203:10007"
TOOLS_SERVER = "http://localhost:10006"  # Change to your server IP if needed: "http://10.198.112.203:10006"
BASE_URL = f"{TOOLS_SERVER}/api/tools"

# Test credentials (create user first or use existing credentials)
USERNAME = "admin"
PASSWORD = "administrator"

print("✓ Configuration loaded")
print(f"  Main Server: {MAIN_SERVER}")
print(f"  Tools Server: {TOOLS_SERVER}")

## Step 1: Authentication

First, we need to authenticate and get a JWT token.

In [None]:
# Login to get JWT token
response = requests.post(
    f"{MAIN_SERVER}/api/auth/login",
    json={
        "username": USERNAME,
        "password": PASSWORD
    }
)

if response.status_code == 200:
    token = response.json()["access_token"]
    headers = {"Authorization": f"Bearer {token}"}
    print(f"✓ Login successful!")
    print(f"  Token: {token[:30]}...")
else:
    print(f"✗ Login failed: {response.status_code}")
    print(f"  Error: {response.text}")
    raise Exception("Authentication failed")
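JWTs usually expire, so a long notebook session may eventually start getting `401` responses. As a minimal sketch, assuming the server issues a standard JWT with an `exp` claim (not confirmed by this notebook), the payload can be decoded client-side, without verifying the signature, to see how much time is left. `jwt_payload` and `seconds_until_expiry` are hypothetical helpers:

```python
import base64
import json
import time
from typing import Optional


def jwt_payload(token: str) -> dict:
    """Decode a JWT's payload segment WITHOUT verifying its signature.

    Only for client-side inspection (e.g. checking expiry); never use this
    to trust the token's contents.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are unpadded base64url; restore the padding before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> Optional[float]:
    """Seconds until the `exp` claim (negative if expired), or None if there is no `exp`."""
    exp = jwt_payload(token).get("exp")
    if exp is None:
        return None
    current = time.time() if now is None else now
    return exp - current
```

For example, `seconds_until_expiry(token)` goes negative once the token has expired; re-run the login cell when it gets close to zero.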

## Step 2: Generate Sample PDF Documents

We'll create multiple sample PDFs for testing the multi-file upload feature.

In [None]:
from reportlab.lib.pagesizes import letter
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
from reportlab.lib.enums import TA_JUSTIFY, TA_CENTER

# Define multiple PDF contents
pdf_contents = [
    {
        "filename": "machine_learning_guide.pdf",
        "title": "Introduction to Machine Learning",
        "sections": [
            {
                "heading": "What is Machine Learning?",
                "content": """Machine Learning (ML) is a subset of Artificial Intelligence (AI) that 
                enables computer systems to learn and improve from experience without being explicitly 
                programmed. ML focuses on the development of computer programs that can access data and 
                use it to learn for themselves. The process of learning begins with observations or data, 
                such as examples, direct experience, or instruction, in order to look for patterns in data 
                and make better decisions in the future."""
            },
            {
                "heading": "Types of Machine Learning",
                "content": """There are three main types of machine learning: Supervised Learning, 
                Unsupervised Learning, and Reinforcement Learning. Supervised learning uses labeled datasets 
                to train algorithms that classify data or predict outcomes accurately. Unsupervised learning 
                uses machine learning algorithms to analyze and cluster unlabeled datasets without human 
                intervention. Reinforcement learning is a behavioral learning model where the algorithm 
                receives feedback from the data analysis and guides the user to the best outcome."""
            },
            {
                "heading": "Common ML Algorithms",
                "content": """Popular machine learning algorithms include Linear Regression for predicting 
                continuous values, Logistic Regression for binary classification, Decision Trees for both 
                classification and regression tasks, Random Forests which are ensemble learning methods, 
                Support Vector Machines (SVM) for classification problems, K-Means Clustering for grouping 
                similar data points, and Neural Networks which form the basis of deep learning."""
            }
        ]
    },
    {
        "filename": "deep_learning_basics.pdf",
        "title": "Deep Learning Fundamentals",
        "sections": [
            {
                "heading": "Introduction to Deep Learning",
                "content": """Deep Learning is a subset of machine learning that uses neural networks with 
                multiple layers (deep neural networks). These networks are inspired by the human brain's 
                structure and function. Deep learning algorithms can automatically learn representations from 
                data such as images, video, or text, without introducing hand-coded rules or human domain 
                knowledge."""
            },
            {
                "heading": "Neural Network Architectures",
                "content": """Popular deep learning architectures include Convolutional Neural Networks (CNNs) 
                for image processing, Recurrent Neural Networks (RNNs) for sequential data, and Transformers 
                for natural language processing. Each architecture is designed for specific types of data and 
                tasks, with unique strengths and characteristics."""
            },
            {
                "heading": "Training Deep Neural Networks",
                "content": """Training deep neural networks involves forward propagation, loss calculation, 
                backpropagation, and parameter updates. Key techniques include gradient descent optimization, 
                learning rate scheduling, batch normalization, dropout for regularization, and data augmentation. 
                Modern frameworks like TensorFlow and PyTorch simplify the training process."""
            }
        ]
    },
    {
        "filename": "ai_applications.pdf",
        "title": "AI Applications in Industry",
        "sections": [
            {
                "heading": "Healthcare Applications",
                "content": """Machine Learning has numerous applications in healthcare including disease 
                diagnosis, drug discovery, and personalized treatment plans. AI systems can analyze medical 
                images, predict patient outcomes, and assist doctors in making informed decisions. Deep learning 
                models have achieved human-level performance in tasks like radiology image analysis."""
            },
            {
                "heading": "Finance and Business",
                "content": """In finance, ML powers fraud detection, algorithmic trading, and credit scoring. 
                E-commerce companies use ML for recommendation systems and customer behavior analysis. Chatbots 
                and virtual assistants improve customer service, while predictive analytics helps businesses 
                make data-driven decisions."""
            },
            {
                "heading": "Autonomous Systems",
                "content": """Autonomous vehicles rely heavily on ML for object detection, path planning, and 
                decision making. Computer vision systems process camera feeds, LIDAR sensors create 3D maps, 
                and reinforcement learning algorithms optimize driving strategies. Natural Language Processing 
                (NLP) applications include chatbots, machine translation, and sentiment analysis."""
            }
        ]
    }
]

# Function to create PDF
def create_pdf(content):
    """Create a PDF from the given content dictionary"""
    doc = SimpleDocTemplate(
        content["filename"], 
        pagesize=letter,
        rightMargin=72, 
        leftMargin=72,
        topMargin=72, 
        bottomMargin=18
    )
    
    # Container for the 'Flowable' objects
    elements = []
    
    # Define styles
    styles = getSampleStyleSheet()
    styles.add(ParagraphStyle(name='Justify', alignment=TA_JUSTIFY))
    # leading (line height) must exceed fontSize, or a wrapped title overlaps itself
    styles.add(ParagraphStyle(name='Center', alignment=TA_CENTER, fontSize=18, leading=22, spaceAfter=30))
    
    # Add title
    title = Paragraph(content["title"], styles['Center'])
    elements.append(title)
    elements.append(Spacer(1, 12))
    
    # Add sections
    for section in content["sections"]:
        # Add heading
        heading = Paragraph(section["heading"], styles['Heading2'])
        elements.append(heading)
        elements.append(Spacer(1, 12))
        
        # Add content
        para = Paragraph(section["content"], styles['Justify'])
        elements.append(para)
        elements.append(Spacer(1, 20))
    
    # Build PDF
    doc.build(elements)
    
    return content["filename"]

# Create all PDFs
pdf_filenames = []
print("Creating multiple PDF documents...\n")

for content in pdf_contents:
    filename = create_pdf(content)
    pdf_filenames.append(filename)
    size_kb = Path(filename).stat().st_size / 1024
    print(f"✓ Created: {filename}")
    print(f"  Title: {content['title']}")
    print(f"  Size: {size_kb:.2f} KB")
    print(f"  Sections: {len(content['sections'])}\n")

print(f"{'='*60}")
print(f"✓ All {len(pdf_filenames)} PDF documents created successfully!")
print(f"  Files: {', '.join(pdf_filenames)}")
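Before uploading, it can help to confirm that each generated file really is a PDF. A minimal sketch using a hypothetical helper (not part of the RAG API): PDF files normally begin with the `%PDF-` header, so checking the first five bytes catches empty or mis-written files.

```python
from pathlib import Path


def looks_like_pdf(path) -> bool:
    """Return True if the file exists and starts with the PDF magic header."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(5) == b"%PDF-"
```

For example, `[f for f in pdf_filenames if not looks_like_pdf(f)]` should be an empty list before you move on to the upload step.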

## Step 3: Create RAG Collection

Create a new collection to store our documents.

In [None]:
collection_name = "ml_knowledge_base"

response = requests.post(
    f"{BASE_URL}/rag/collections",
    headers=headers,
    json={"collection_name": collection_name}
)

result = response.json()
print(f"Status: {response.status_code}")
print(f"Success: {result.get('success')}")
print(f"Message: {result.get('answer')}")

if result.get('data'):
    print(f"\nCollection Details:")
    print(f"  Name: {result['data'].get('collection_name')}")
    print(f"  Path: {result['data'].get('path')}")
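The cells in this notebook print results and carry on even when a call fails. If you prefer failures to stop execution, a small helper can check both the HTTP status and the `success` flag. This sketch assumes the response envelope seen throughout this notebook (`success`, with `error` on failure, and `answer` used as the message by the create-collection call); `unwrap` and `RagApiError` are hypothetical names. It takes the status code and decoded JSON separately so it can be tested without a server:

```python
from typing import Any, Dict


class RagApiError(RuntimeError):
    """Raised when a RAG API call fails (hypothetical helper class)."""


def unwrap(status_code: int, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return the payload if the call succeeded, otherwise raise RagApiError."""
    if status_code >= 400 or not payload.get("success"):
        # Prefer the 'error' field; fall back to 'answer', then a generic message
        message = payload.get("error") or payload.get("answer") or "unknown error"
        raise RagApiError(f"HTTP {status_code}: {message}")
    return payload
```

Usage: `result = unwrap(response.status_code, response.json())`.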

## Step 4: Upload Multiple PDFs to Collection

Upload all our sample PDF documents to the RAG collection.

In [None]:
# Upload all PDFs
print(f"Uploading {len(pdf_filenames)} PDF documents to collection '{collection_name}'...\n")

upload_results = []
total_chunks_created = 0

for pdf_file in pdf_filenames:
    print(f"📤 Uploading: {pdf_file}")
    
    try:
        with open(pdf_file, 'rb') as f:
            files = {'file': (pdf_file, f, 'application/pdf')}
            data = {'collection_name': collection_name}
            
            response = requests.post(
                f"{BASE_URL}/rag/upload",
                headers=headers,
                files=files,
                data=data
            )
        
        result = response.json()
        
        if result.get('success'):
            chunks_created = result.get('chunks_created', 0)
            total_chunks_created += chunks_created
            
            print(f"  ✓ Success!")
            print(f"    Document: {result.get('document_name')}")
            print(f"    Chunks created: {chunks_created}")
            print(f"    Total chunks in collection: {result.get('total_chunks')}")
            
            upload_results.append({
                'filename': pdf_file,
                'success': True,
                'chunks': chunks_created
            })
        else:
            print(f"  ✗ Failed: {result.get('error')}")
            upload_results.append({
                'filename': pdf_file,
                'success': False,
                'error': result.get('error')
            })
    
    except Exception as e:
        print(f"  ✗ Error: {str(e)}")
        upload_results.append({
            'filename': pdf_file,
            'success': False,
            'error': str(e)
        })
    
    print()  # Empty line between uploads

# Summary
print(f"{'='*60}")
successful_uploads = sum(1 for r in upload_results if r['success'])
print(f"Upload Summary:")
print(f"  Total files: {len(pdf_filenames)}")
print(f"  Successful: {successful_uploads}")
print(f"  Failed: {len(pdf_filenames) - successful_uploads}")
print(f"  Total chunks created: {total_chunks_created}")
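Each upload reports `chunks_created`. The server's actual chunking strategy is configured in `config.py` and is not shown in this notebook. Purely to illustrate the idea, here is a fixed-size character window with overlap; this is an assumed strategy, and `chunk_text`, `chunk_size`, and `overlap` are hypothetical names:

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of at most chunk_size characters.

    Illustrative only: NOT the server's chunking algorithm. Consecutive
    chunks share `overlap` characters, so text at a boundary is not cut off
    from its surrounding context entirely.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks
```

Smaller chunks make retrieval more precise but give the answering model less context per hit; larger chunks do the opposite.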

### Alternative: Upload Your Own PDF Files

To upload your own PDF files instead of the generated samples, uncomment the code in the cell below and replace the example paths with your own:

In [None]:
# # Example: Upload your own PDF files
# # Replace with your actual PDF file paths
# custom_pdf_files = [
#     "path/to/your/document1.pdf",
#     "path/to/your/document2.pdf",
#     "path/to/your/document3.pdf"
# ]

# print(f"Uploading {len(custom_pdf_files)} custom PDF documents...\n")

# for pdf_file in custom_pdf_files:
#     if not Path(pdf_file).exists():
#         print(f"⚠️  File not found: {pdf_file}")
#         continue
    
#     print(f"📤 Uploading: {pdf_file}")
    
#     try:
#         with open(pdf_file, 'rb') as f:
#             files = {'file': (Path(pdf_file).name, f, 'application/pdf')}
#             data = {'collection_name': collection_name}
            
#             response = requests.post(
#                 f"{BASE_URL}/rag/upload",
#                 headers=headers,
#                 files=files,
#                 data=data
#             )
        
#         result = response.json()
        
#         if result.get('success'):
#             print(f"  ✓ Success! Chunks created: {result.get('chunks_created')}")
#         else:
#             print(f"  ✗ Failed: {result.get('error')}")
    
#     except Exception as e:
#         print(f"  ✗ Error: {str(e)}")
    
#     print()

print("ℹ️  Uncomment the code above to upload your own PDF files")
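Instead of listing paths by hand, you can collect every PDF in a folder and feed the result into the loop above. A minimal sketch; `find_pdfs` is a hypothetical helper and the folder path is a placeholder for your own directory:

```python
from pathlib import Path
from typing import List


def find_pdfs(folder, recursive: bool = False) -> List[Path]:
    """Return PDF files in `folder`, sorted by path for a stable upload order.

    The extension match is case-insensitive, so 'REPORT.PDF' is included.
    """
    root = Path(folder)
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() == ".pdf")
```

For example: `custom_pdf_files = [str(p) for p in find_pdfs("path/to/your/folder")]`.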

## Step 5: List All Documents in Collection

View all documents that were uploaded to the collection.

In [None]:
response = requests.get(
    f"{BASE_URL}/rag/collections/{collection_name}/documents",
    headers=headers
)

result = response.json()

if result.get('success'):
    print(f"Collection: {result['collection_name']}")
    print(f"Total Documents: {result['total_documents']}")
    print(f"Total Chunks: {result['total_chunks']}")
    print(f"\nDocuments:")
    
    for doc in result['documents']:
        print(f"\n  📄 {doc['name']}")
        print(f"     ID: {doc['id']}")
        print(f"     Chunks: {doc['chunks']}")
        print(f"     Uploaded: {doc['uploaded_at']}")
else:
    print(f"Error: {result.get('error')}")
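Inside Jupyter, the same listing can be rendered as a table. This sketch assumes each document entry has the `name`, `id`, `chunks`, and `uploaded_at` fields printed in the cell above; `documents_to_markdown` is a hypothetical helper:

```python
from typing import Dict, List


def documents_to_markdown(documents: List[Dict]) -> str:
    """Render document entries as a Markdown table string."""
    lines = [
        "| Name | ID | Chunks | Uploaded |",
        "|------|----|--------|----------|",
    ]
    for doc in documents:
        # Escape pipes so a filename containing '|' does not break the table
        name = str(doc["name"]).replace("|", "\\|")
        lines.append(f"| {name} | {doc['id']} | {doc['chunks']} | {doc['uploaded_at']} |")
    return "\n".join(lines)
```

Usage: `display(Markdown(documents_to_markdown(result['documents'])))`.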

## Step 6: Query the RAG System

Now let's ask some questions about machine learning!

In [None]:
def query_rag(question, max_results=5):
    """
    Query the RAG collection and display the answer, query details,
    and retrieved chunks.
    """
    response = requests.post(
        f"{BASE_URL}/rag/query",
        headers=headers,
        json={
            "query": question,
            "collection_name": collection_name,
            "max_results": max_results
        }
    )
    
    result = response.json()
    
    if result.get('success'):
        # Display question
        display(Markdown(f"### 🤔 Question: {question}"))
        
        # Display answer
        display(Markdown(f"**💡 Answer:**\n\n{result['answer']}"))
        
        # Display metadata (fall back to empty dicts if the server omits or nulls them)
        data = result.get('data') or {}
        metadata = result.get('metadata') or {}
        
        print(f"\n📊 Query Details:")
        print(f"  Optimized Query: {data.get('optimized_query')}")
        print(f"  Results Found: {data.get('num_results')}")
        print(f"  Execution Time: {metadata.get('execution_time') or 0:.2f}s")
        
        # Display retrieved documents
        print(f"\n📚 Retrieved Chunks:")
        for i, doc in enumerate(data.get('documents', []), 1):
            print(f"\n  [{i}] {doc['document']} (Chunk {doc['chunk_index']})")
            print(f"      Score: {doc['score']:.3f}")
            print(f"      Preview: {doc['chunk'][:150]}...")
        
        print("\n" + "="*80)
    else:
        print(f"❌ Query failed: {result.get('error')}")
    
    return result

print("✓ Query function defined")
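Semantic search works by comparing an embedding of the query against embeddings of every chunk and returning the closest matches. This notebook's feature list says the server uses FAISS; its index type and score convention are not shown here (a FAISS L2 index, for example, reports distances where lower is better). As a dependency-free illustration only, here is brute-force top-k retrieval by cosine similarity over toy vectors. `cosine` and `top_k` are hypothetical helpers, and `k` presumably plays the role of `max_results`:

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], chunks: List[Sequence[float]], k: int = 5) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) for the k chunks most similar to the query.

    Brute force for illustration: every vector is scored. Vector indexes
    such as FAISS exist to make this fast for large collections.
    """
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(chunks)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```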

### Query 1: What is Machine Learning?

In [None]:
query_rag("What is Machine Learning?")

### Query 2: Types of Machine Learning

In [None]:
query_rag("What are the different types of machine learning?")

### Query 3: Common ML Algorithms

In [None]:
query_rag("List some common machine learning algorithms")

### Query 4: Deep Learning

In [None]:
query_rag("Explain deep learning and neural networks")

### Query 5: Applications

In [None]:
query_rag("What are some real-world applications of machine learning?")

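### Query 6: Challenges

None of the documents uploaded so far discusses challenges in machine learning, so this query shows how the system responds when the collection has little relevant content.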

In [None]:
query_rag("What are the main challenges in machine learning?")

## Step 7: Upload Additional Documents

Let's create a plain-text (`.txt`) document about AI and upload it to the same collection.

In [None]:
# Create a simple text document
ai_content = """Artificial Intelligence (AI) Overview

Artificial Intelligence is the simulation of human intelligence processes by machines, 
especially computer systems. These processes include learning, reasoning, and self-correction.

Key AI Technologies:
1. Natural Language Processing (NLP): Enables machines to understand and respond to human language
2. Computer Vision: Allows machines to interpret and understand visual information
3. Robotics: Combines AI with mechanical systems for autonomous operation
4. Expert Systems: AI programs that simulate human expert decision-making

AI vs Machine Learning:
While often used interchangeably, AI is the broader concept of machines being able to carry out 
tasks in a way that we would consider "smart". Machine Learning is a specific subset of AI that 
trains machines to learn from data.

The Turing Test:
Proposed by Alan Turing in 1950, the Turing Test is a measure of a machine's ability to exhibit 
intelligent behavior equivalent to, or indistinguishable from, that of a human.

AI Ethics:
As AI becomes more prevalent, ethical considerations include bias in algorithms, job displacement, 
privacy concerns, and the need for transparent and explainable AI systems.
"""

# Save to file
ai_filename = "artificial_intelligence_basics.txt"
with open(ai_filename, 'w', encoding='utf-8') as f:
    f.write(ai_content)

print(f"✓ Text document created: {ai_filename}")

# Upload to RAG
with open(ai_filename, 'rb') as f:
    files = {'file': (ai_filename, f, 'text/plain')}
    data = {'collection_name': collection_name}
    
    response = requests.post(
        f"{BASE_URL}/rag/upload",
        headers=headers,
        files=files,
        data=data
    )

result = response.json()

if result.get('success'):
    print(f"\n✓ Document uploaded successfully!")
    print(f"  Document: {result.get('document_name')}")
    print(f"  Chunks created: {result.get('chunks_created')}")
    print(f"  Total chunks in collection: {result.get('total_chunks')}")
else:
    print(f"\n✗ Upload failed: {result.get('error')}")
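This cell hard-codes `'text/plain'` and the PDF upload loop hard-codes `'application/pdf'`. If you upload mixed file types in one loop, Python's standard `mimetypes` module can choose the content type from the file extension. A minimal sketch; `upload_part` is a hypothetical helper, and whether the server accepts a given type is up to the server:

```python
import mimetypes
from pathlib import Path
from typing import BinaryIO, Tuple


def upload_part(path, fileobj: BinaryIO) -> Tuple[str, BinaryIO, str]:
    """Build the (filename, file object, content type) tuple that
    requests accepts for a multipart 'file' field."""
    content_type, _encoding = mimetypes.guess_type(str(path))
    # Fall back to a generic binary type for unknown extensions
    return Path(path).name, fileobj, content_type or "application/octet-stream"
```

For example, inside `with open(ai_filename, 'rb') as f:`, use `files = {'file': upload_part(ai_filename, f)}`.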

### Query with Multiple Documents

In [None]:
query_rag("What is the difference between AI and Machine Learning?")

## Step 8: List All Collections

View all collections for the current user.

In [None]:
response = requests.get(
    f"{BASE_URL}/rag/collections",
    headers=headers
)

result = response.json()

if result.get('success'):
    print(f"Found {len(result['collections'])} collection(s):\n")
    
    for coll in result['collections']:
        print(f"📁 {coll['name']}")
        print(f"   Documents: {coll['documents']}")
        print(f"   Chunks: {coll['chunks']}")
        print(f"   Created: {coll['created_at']}")
        print()
else:
    print(f"Error: {result.get('error')}")
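Step 3 creates `ml_knowledge_base` unconditionally, so re-running the notebook may try to create a collection that already exists (how the server responds to that is not shown here). One option is to check this listing first. A minimal sketch, assuming each entry has the `name` field printed above; `collection_exists` is a hypothetical helper:

```python
from typing import Dict, List


def collection_exists(collections: List[Dict], name: str) -> bool:
    """Return True if a collection with exactly this name is in the listing."""
    return any(coll.get("name") == name for coll in collections)
```

Usage: only send the create request when `collection_exists(result['collections'], collection_name)` is `False`.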

## Step 9: Delete a Specific Document

Remove a single document (here, whichever one is listed first) from the collection.

In [None]:
# First, get the list of documents to find the document ID
response = requests.get(
    f"{BASE_URL}/rag/collections/{collection_name}/documents",
    headers=headers
)

docs_result = response.json()

if docs_result.get('success') and docs_result['documents']:
    # Get the first document's ID
    doc_to_delete = docs_result['documents'][0]
    doc_id = doc_to_delete['id']
    doc_name = doc_to_delete['name']
    
    print(f"Deleting document: {doc_name} (ID: {doc_id})\n")
    
    # Delete the document
    response = requests.delete(
        f"{BASE_URL}/rag/collections/{collection_name}/documents/{doc_id}",
        headers=headers
    )
    
    result = response.json()
    
    if result.get('success'):
        print(f"✓ Document deleted successfully!")
        print(f"  Deleted: {result['deleted_document']}")
        print(f"  Chunks removed: {result['deleted_chunks']}")
        print(f"  Remaining documents: {result['remaining_documents']}")
        print(f"  Remaining chunks: {result['remaining_chunks']}")
    else:
        print(f"✗ Delete failed: {result.get('error')}")
else:
    print("No documents to delete")
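The cell above deletes whichever document happens to be listed first. To delete a specific file instead, look up its ID by name. A minimal sketch, assuming the `name`/`id` fields from the listing and that the listed name matches the uploaded filename; `find_document_id` is a hypothetical helper:

```python
from typing import Any, Dict, List


def find_document_id(documents: List[Dict], name: str) -> Any:
    """Return the ID of the first document whose name matches, or None."""
    for doc in documents:
        if doc.get("name") == name:
            return doc.get("id")
    return None
```

For example: `doc_id = find_document_id(docs_result['documents'], "ai_applications.pdf")`, then skip the delete request if it is `None`.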

## Step 10: Cleanup (Optional)

Delete the collection and all its documents.

In [None]:
# Uncomment to delete the collection
# response = requests.delete(
#     f"{BASE_URL}/rag/collections/{collection_name}",
#     headers=headers
# )

# result = response.json()

# if result.get('success'):
#     print(f"✓ Collection '{collection_name}' deleted successfully!")
# else:
#     print(f"✗ Delete failed: {result.get('error')}")

print("ℹ️  Uncomment the code above to delete the collection")

## Summary

This notebook demonstrated:

1. ✅ **Authentication** - Login and JWT token management
2. ✅ **Multiple PDF Generation** - Created several sample PDF documents
3. ✅ **Collection Creation** - Created a RAG collection
4. ✅ **Batch Document Upload** - Uploaded multiple PDF files one after another, plus a text file
5. ✅ **Document Listing** - Listed all documents in the collection
6. ✅ **Semantic Search** - Queried the RAG system in natural language
7. ✅ **Document Management** - Deleted an individual document
8. ✅ **Collection Management** - Listed collections (and, optionally, deleted one)

### Key Features Demonstrated:

- **Per-User Collections**: Each user has isolated RAG storage
- **Multi-Format Support**: PDF, TXT, and other formats
- **Batch Upload**: Upload multiple documents in a loop, with per-file status and a final summary
- **Automatic Chunking**: Documents split into semantic chunks
- **Vector Search**: FAISS-powered similarity search
- **LLM Enhancement**: Query optimization and answer synthesis
- **Document Management**: List, upload, and delete documents
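The "LLM Enhancement" step combines retrieved chunks into the prompt sent to the language model. The server's actual prompt is not shown in this notebook; the sketch below is a generic, hypothetical template that illustrates the idea, reusing the `document` and `chunk` fields returned by `/rag/query`. `build_rag_prompt` and `max_chars` are assumptions:

```python
from typing import Dict, List


def build_rag_prompt(question: str, retrieved: List[Dict], max_chars: int = 4000) -> str:
    """Assemble a grounded prompt from retrieved chunks (illustrative template)."""
    context_parts = []
    used = 0
    for i, item in enumerate(retrieved, 1):
        block = f"[{i}] ({item['document']})\n{item['chunk']}"
        # Stop once the next chunk would exceed the (approximate) character budget;
        # chunks are assumed to arrive best-first, so the most relevant ones are kept
        if used + len(block) > max_chars:
            break
        context_parts.append(block)
        used += len(block)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their [number]. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Telling the model to admit when the context is insufficient is what lets a query like "What are the main challenges in machine learning?" fail gracefully instead of inventing an answer.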

### Next Steps:

- Try uploading your own PDF documents (multiple files supported!)
- Experiment with different queries across multiple documents
- Adjust chunk size in `config.py`
- Create multiple collections for different topics
- Integrate RAG into your applications via the API

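### API Endpoints Used:

The login endpoint is served by the main server (`MAIN_SERVER`); the `/rag/...` paths below are relative to `BASE_URL` (`{TOOLS_SERVER}/api/tools`).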

| Endpoint | Purpose |
|----------|--------|
| `POST /api/auth/login` | Authentication |
| `POST /rag/collections` | Create collection |
| `GET /rag/collections` | List collections |
| `POST /rag/upload` | Upload documents |
| `GET /rag/collections/{name}/documents` | List documents |
| `DELETE /rag/collections/{name}/documents/{id}` | Delete document |
| `POST /rag/query` | Query with semantic search |
| `DELETE /rag/collections/{name}` | Delete collection |

For complete API documentation, see [RAG_API_DOCUMENTATION.md](RAG_API_DOCUMENTATION.md)