# RAG System Demo

This notebook demonstrates the complete RAG (Retrieval-Augmented Generation) system with:
- Creating collections
- Uploading PDF documents
- Querying with semantic search
- Document management

## Prerequisites

Make sure both servers are running:
```bash
bash start_servers.sh
```

Or start them separately:
```bash
python tools_server.py  # Port 10006
python server.py        # Port 10007
```
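
Before running the cells below, it can help to confirm that both ports accept connections. The following is a minimal sketch using only the standard library; the helper names `is_port_open` and `check_servers` are illustrative and not part of the project, and the sketch only tests TCP reachability, not whether the API itself is healthy.

```python
import socket

# Ports taken from the start commands above.
SERVER_PORTS = {"tools_server.py": 10006, "server.py": 10007}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_servers(host="localhost", ports=SERVER_PORTS):
    """Map each server script name to True/False depending on reachability."""
    return {name: is_port_open(host, port) for name, port in ports.items()}
```

Calling `check_servers()` returns a dictionary keyed by script name, so an entry that is `False` indicates which server still needs to be started.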

## Setup and Configuration

In [None]:
import requests
import json
from pathlib import Path
from IPython.display import display, Markdown, HTML

# Server configuration
MAIN_SERVER = "http://localhost:10007"  # Change to your server IP if needed: "http://10.198.112.203:10007"
TOOLS_SERVER = "http://localhost:10006"  # Change to your server IP if needed: "http://10.198.112.203:10006"
BASE_URL = f"{TOOLS_SERVER}/api/tools"

# Test credentials (create user first or use existing credentials)
USERNAME = "admin"
PASSWORD = "administrator"

print("✓ Configuration loaded")
print(f"  Main Server: {MAIN_SERVER}")
print(f"  Tools Server: {TOOLS_SERVER}")

## Step 1: Authentication

First, we need to authenticate and get a JWT token.

In [None]:
# Login to get JWT token
response = requests.post(
    f"{MAIN_SERVER}/api/auth/login",
    json={
        "username": USERNAME,
        "password": PASSWORD
    }
)

if response.status_code == 200:
    token = response.json()["access_token"]
    headers = {"Authorization": f"Bearer {token}"}
    print("✓ Login successful!")
    print(f"  Token: {token[:30]}...")
else:
    print(f"✗ Login failed: {response.status_code}")
    print(f"  Error: {response.text}")
    raise Exception("Authentication failed")
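
If later cells start failing with authorization errors, the token may have expired. The sketch below inspects a token's payload locally, assuming the server issues a standard three-part JWT and includes an `exp` claim (neither detail is confirmed by this notebook). The helper names are illustrative, and the signature is deliberately not verified, so this is for debugging only.

```python
import base64
import json
import time


def decode_jwt_payload(token):
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part JWT (header.payload.signature)")
    payload_b64 = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def seconds_until_expiry(token, now=None):
    """Return seconds until the `exp` claim, or None if the claim is absent."""
    exp = decode_jwt_payload(token).get("exp")
    if exp is None:
        return None
    return exp - (time.time() if now is None else now)
```

With the `token` variable from the login cell, `seconds_until_expiry(token)` gives a rough idea of when to log in again; a negative value means the token has already expired.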

## Step 2: Generate Sample PDF Documents

We'll create multiple sample PDFs for testing the multi-file upload feature.

## Step 3: Create RAG Collection

Create a new collection to store our documents.

In [None]:
collection_name = "temp2"

response = requests.post(
    f"{BASE_URL}/rag/collections",
    headers=headers,
    json={"collection_name": collection_name}
)

result = response.json()
print(f"Status: {response.status_code}")
print(f"Success: {result.get('success')}")
print(f"Message: {result.get('answer')}")

if result.get('data'):
    print(f"\nCollection Details:")
    print(f"  Name: {result['data'].get('collection_name')}")
    print(f"  Path: {result['data'].get('path')}")
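
The cell above calls `response.json()` directly, which raises if the server returns a non-JSON body (for example an HTML error page from a proxy). A defensive alternative might look like the sketch below. `parse_api_result` is a hypothetical helper, not part of the project; it assumes error responses may carry a `detail` field, as the upload cell later in this notebook suggests.

```python
import json


def parse_api_result(status_code, body):
    """Normalise an HTTP response into a dict that always has a 'success' key."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None

    if not isinstance(payload, dict):
        # Body was not JSON, or was JSON but not an object.
        return {
            "success": False,
            "error": f"HTTP {status_code}: unexpected response body",
            "raw": body,
        }

    if status_code >= 400 and "success" not in payload:
        # Error responses may only contain 'detail'; map it to 'error'.
        payload = {**payload, "success": False}
        payload.setdefault("error", payload.get("detail", f"HTTP {status_code}"))
    return payload
```

It would be used as `result = parse_api_result(response.status_code, response.text)`, after which the existing `result.get('success')` checks work unchanged.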

## Step 4: Upload Multiple PDFs to Collection

Upload all our sample PDF documents to the RAG collection.

### Alternative: Upload Your Own PDF Files

To upload your own PDF files instead of the generated samples, edit the file paths in this code:

In [None]:
# Example: Upload your own PDF files
# Replace with your actual PDF file paths
custom_pdf_files = [
    "./USB 3.2 Revision 1.0.pdf",
    "./usb_20.pdf"
]

print(f"Uploading {len(custom_pdf_files)} custom PDF documents...\n")

for pdf_file in custom_pdf_files:
    if not Path(pdf_file).exists():
        print(f"⚠️  File not found: {pdf_file}")
        continue
    
    print(f"📤 Uploading: {pdf_file}")
    
    try:
        with open(pdf_file, 'rb') as f:
            files = {'file': (Path(pdf_file).name, f, 'application/pdf')}
            data = {'collection_name': collection_name}
            
            response = requests.post(
                f"{BASE_URL}/rag/upload",
                headers=headers,
                files=files,
                data=data
            )
        
        if response.status_code != 200:
            print(f"  ✗ HTTP {response.status_code}: {response.json().get('detail', response.text)}")
            continue

        result = response.json()
        
        if result.get('success'):
            print(f"  ✓ Success! Chunks created: {result.get('chunks_created')}")
        else:
            print(f"  ✗ Failed: {result.get('error')}")
    
    except Exception as e:
        print(f"  ✗ Error: {e}")
    
    print()
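
Rather than listing every path by hand, the file list can be built from a folder. This is a small sketch using `pathlib`; `find_pdfs` is an illustrative helper and `./docs` in the usage note is a placeholder directory, not something this project provides.

```python
from pathlib import Path


def find_pdfs(directory, recursive=False):
    """Return a sorted list of PDF paths under `directory`.

    The suffix match is case-insensitive, so `.PDF` files are included.
    Raises FileNotFoundError if `directory` does not exist.
    """
    root = Path(directory)
    candidates = root.rglob("*") if recursive else root.iterdir()
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() == ".pdf")
```

For example, `custom_pdf_files = [str(p) for p in find_pdfs("./docs")]` would feed the upload loop above without further changes.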

## Step 5: List All Documents in Collection

View all documents that were uploaded to the collection.

In [None]:
response = requests.get(
    f"{BASE_URL}/rag/collections/{collection_name}/documents",
    headers=headers
)

result = response.json()

if result.get('success'):
    print(f"Collection: {result['collection_name']}")
    print(f"Total Documents: {result['total_documents']}")
    print(f"Total Chunks: {result['total_chunks']}")
    print(f"\nDocuments:")
    
    for doc in result['documents']:
        print(f"\n  📄 {doc['name']}")
        print(f"     ID: {doc['id']}")
        print(f"     Chunks: {doc['chunks']}")
        print(f"     Uploaded: {doc['uploaded_at']}")
else:
    print(f"Error: {result.get('error')}")

## Step 6: Query the RAG System

Now let's ask some questions about the uploaded documents!

In [None]:
def query_rag(question, max_results=5):
    """
    Query the RAG system and display results nicely
    """
    response = requests.post(
        f"{BASE_URL}/rag/query",
        headers=headers,
        json={
            "query": question,
            "collection_name": collection_name,
            "max_results": max_results
        }
    )
    
    result = response.json()
    
    if result.get('success'):
        # Display question
        display(Markdown(f"### 🤔 Question: {question}"))
        
        # Display answer
        display(Markdown(f"**💡 Answer:**\n\n{result['answer']}"))
        
        # Display metadata
        data = result.get('data', {})
        metadata = result.get('metadata', {})
        
        print("\n📊 Query Details:")
        print(f"  Optimized Query: {data.get('optimized_query')}")
        print(f"  Results Found: {data.get('num_results')}")
        print(f"  Execution Time: {metadata.get('execution_time', 0):.2f}s")
        
        # Display retrieved documents
        print("\n📚 Retrieved Chunks:")
        for i, doc in enumerate(data.get('documents', []), 1):
            print(f"\n  [{i}] {doc['document']} (Chunk {doc['chunk_index']})")
            print(f"      Score: {doc['score']:.3f}")
            print(f"      Preview: {doc['chunk'][:150]}...")
        
        print("\n" + "="*80)
    else:
        print(f"❌ Query failed: {result.get('error')}")
    
    return result

print("✓ Query function defined")
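
Since `query_rag` returns the raw result, the retrieved chunks can be post-processed locally. The sketch below keeps only chunks above a score threshold. It reuses the field names seen in the function above (`document`, `chunk_index`, `score`, `chunk`) and assumes that a higher `score` means a more relevant chunk; if the server reports distances instead, the comparison and sort order would need to be flipped. `filter_chunks` is an illustrative name.

```python
def filter_chunks(documents, min_score=0.5, preview_chars=150):
    """Keep chunks scoring at least `min_score`, best first, with short previews.

    ASSUMPTION: higher score = more relevant. A missing score counts as 0.0.
    """
    kept = [d for d in documents if d.get("score", 0.0) >= min_score]
    kept.sort(key=lambda d: d.get("score", 0.0), reverse=True)
    return [
        {
            "document": d["document"],
            "chunk_index": d["chunk_index"],
            "score": d.get("score", 0.0),
            "preview": d["chunk"][:preview_chars],
        }
        for d in kept
    ]
```

A typical call would be `filter_chunks(result.get("data", {}).get("documents", []), min_score=0.7)`, where `0.7` is an arbitrary example threshold.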

### Query 1

In [None]:
query_rag("What is the insertion loss spec when C-PHY operates at 3.9 Gsps?")

## Step 8: List All Collections

View all collections for the current user.

In [None]:
response = requests.get(
    f"{BASE_URL}/rag/collections",
    headers=headers
)

result = response.json()

if result.get('success'):
    print(f"Found {len(result['collections'])} collection(s):\n")
    
    for coll in result['collections']:
        print(f"📁 {coll['name']}")
        print(f"   Documents: {coll['documents']}")
        print(f"   Chunks: {coll['chunks']}")
        print(f"   Created: {coll['created_at']}")
        print()
else:
    print(f"Error: {result.get('error')}")

## Step 9: Delete a Specific Document

Remove one document from the collection.

In [None]:
# First, get the list of documents to find the document ID
response = requests.get(
    f"{BASE_URL}/rag/collections/{collection_name}/documents",
    headers=headers
)

docs_result = response.json()

if docs_result.get('success') and docs_result['documents']:
    # Get the first document's ID
    doc_to_delete = docs_result['documents'][0]
    doc_id = doc_to_delete['id']
    doc_name = doc_to_delete['name']
    
    print(f"Deleting document: {doc_name} (ID: {doc_id})\n")
    
    # Delete the document
    response = requests.delete(
        f"{BASE_URL}/rag/collections/{collection_name}/documents/{doc_id}",
        headers=headers
    )
    
    result = response.json()
    
    if result.get('success'):
        print("✓ Document deleted successfully!")
        print(f"  Deleted: {result['deleted_document']}")
        print(f"  Chunks removed: {result['deleted_chunks']}")
        print(f"  Remaining documents: {result['remaining_documents']}")
        print(f"  Remaining chunks: {result['remaining_chunks']}")
    else:
        print(f"✗ Delete failed: {result.get('error')}")
else:
    print("No documents to delete")
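
The cell above always deletes the first document in the list, which is convenient for a demo but easy to misuse. A safer pattern is to look the document up by name first. The helper below is a hypothetical sketch that relies only on the `id` and `name` fields already used in this notebook.

```python
def find_document_id(documents, name):
    """Return the `id` of the single document called `name`, or None if absent.

    Raises ValueError when several documents share the name, because the
    caller could otherwise delete the wrong one.
    """
    matches = [doc["id"] for doc in documents if doc.get("name") == name]
    if len(matches) > 1:
        raise ValueError(f"{len(matches)} documents are named {name!r}; delete by ID instead")
    return matches[0] if matches else None
```

For instance, `doc_id = find_document_id(docs_result['documents'], "usb_20.pdf")` returns `None` when the file was never uploaded, so the delete request can be skipped.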

## Step 10: Cleanup (Optional)

Delete the collection and all its documents.

In [None]:
# Uncomment to delete the collection
# response = requests.delete(
#     f"{BASE_URL}/rag/collections/{collection_name}",
#     headers=headers
# )

# result = response.json()

# if result.get('success'):
#     print(f"✓ Collection '{collection_name}' deleted successfully!")
# else:
#     print(f"✗ Delete failed: {result.get('error')}")

print("ℹ️  Uncomment the code above to delete the collection")

## Summary

This notebook demonstrated:

1. ✅ **Authentication** - Login and JWT token management
2. ✅ **Multiple PDF Generation** - Created multiple sample PDF documents
3. ✅ **Collection Creation** - Created a RAG collection
4. ✅ **Batch Document Upload** - Uploaded multiple PDF files in one loop, one request per file
5. ✅ **Document Listing** - Listed all documents in collection
6. ✅ **Semantic Search** - Queried the RAG system with natural language
7. ✅ **Document Management** - Deleted individual documents
8. ✅ **Collection Management** - Listed and managed collections

### Key Features Demonstrated:

- **Per-User Collections**: Each user has isolated RAG storage
- **Multi-Format Support**: PDF, TXT, and other formats
- **Batch Upload**: Upload multiple documents at once with progress tracking
- **Automatic Chunking**: Documents split into semantic chunks
- **Vector Search**: FAISS-powered similarity search
- **LLM Enhancement**: Query optimization and answer synthesis
- **Document Management**: List, upload, and delete documents
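
To make the chunking and similarity-search ideas in the list above concrete, here is a toy, standard-library-only illustration. It is not the server's implementation: the real system is described as using semantic chunks and FAISS, whereas this sketch uses fixed-size word windows with overlap and bag-of-words cosine similarity as a stand-in for embeddings. All function names and default sizes are illustrative.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into chunks of `chunk_size` words sharing `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=3):
    """Return the k (score, chunk) pairs most similar to the query."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(c.lower().split())), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, which is the usual reason chunkers share some text between neighbours.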

### Next Steps:

- Try uploading your own PDF documents (multiple files supported!)
- Experiment with different queries across multiple documents
- Adjust chunk size in `config.py`
- Create multiple collections for different topics
- Integrate RAG into your applications via the API

### API Endpoints Used:

| Endpoint | Server | Purpose |
|----------|--------|---------|
| `POST /api/auth/login` | Main (10007) | Authentication |
| `POST /api/tools/rag/collections` | Tools (10006) | Create collection |
| `GET /api/tools/rag/collections` | Tools (10006) | List collections |
| `POST /api/tools/rag/upload` | Tools (10006) | Upload documents |
| `GET /api/tools/rag/collections/{name}/documents` | Tools (10006) | List documents |
| `DELETE /api/tools/rag/collections/{name}/documents/{id}` | Tools (10006) | Delete document |
| `POST /api/tools/rag/query` | Tools (10006) | Query with semantic search |
| `DELETE /api/tools/rag/collections/{name}` | Tools (10006) | Delete collection |
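
Several of these endpoints embed a collection name or document ID in the URL path. If such a value contains spaces or slashes, plain f-string interpolation produces a malformed URL. The sketch below percent-encodes each path segment; `rag_url` is an illustrative helper, and whether the server accepts such names at all is not confirmed by this notebook.

```python
from urllib.parse import quote


def rag_url(base_url, *segments):
    """Join path segments onto base_url, percent-encoding each segment."""
    # safe="" also encodes "/", so one segment can never become two.
    encoded = "/".join(quote(str(s), safe="") for s in segments)
    return f"{base_url.rstrip('/')}/{encoded}"
```

For example, `rag_url(BASE_URL, "rag", "collections", collection_name, "documents")` builds the document-listing URL used earlier in this notebook.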

For complete API documentation, see [RAG_API_DOCUMENTATION.md](RAG_API_DOCUMENTATION.md)