# RAG System Demo - Energy Sector Data Analysis

This notebook demonstrates how to use the retrieval-augmented generation (RAG) system to index, search, and report on energy sector data stored in Excel files.

## Contents
1. Setup and Installation
2. Create Sample Data
3. Initialize RAG System
4. Index Documents
5. Perform Searches
6. Generate Reports
7. Advanced Features
8. Save and Load Index

## 1. Setup and Installation

In [None]:
# Import required libraries
import sys
import os
from pathlib import Path

# Add the project root (the current working directory) to the path so the `src` package can be imported
sys.path.insert(0, str(Path.cwd()))

# Import RAG system components
from src import RAGSystem, setup_logging
from src.utils import load_config

# Setup logging
setup_logging(log_level="INFO")

print("✓ Imports successful")

## 2. Create Sample Data

First, let's create some sample Excel files for testing.

In [None]:
# Run the sample data creation script
!python create_sample_excel.py

## 3. Initialize RAG System

In [None]:
# Initialize RAG system
rag = RAGSystem(
    input_dir="./data/input",
    prompts_dir="./prompts",
    embeddings_dir="./embeddings",
    config_path="./config/config.yaml"
)

print("✓ RAG System initialized")
print(rag)

In [None]:
# Initialize components
# Note: This will use OpenAI by default. Make sure you have OPENAI_API_KEY in .env
# Or set use_chromadb=True and llm_provider to None to skip LLM initialization

try:
    rag.initialize_components(
        embedding_model="sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
    )
    print("✓ Components initialized (including LLM)")
except Exception as e:
    print(f"⚠ Component initialization failed (the LLM may be unavailable): {e}")
    print("System will work for search, but answer generation will be limited")
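Before calling `initialize_components`, it can help to confirm that the key is actually visible to the Python process. The following is a minimal sketch, assuming the key is exposed as the `OPENAI_API_KEY` environment variable; the helper name `has_api_key` is hypothetical and not part of the RAG system.

```python
import os

def has_api_key(name="OPENAI_API_KEY", env=None):
    """Return True if `name` is set to a non-empty value in `env`."""
    # Default to the real process environment; a dict can be passed for testing
    env = os.environ if env is None else env
    return bool(env.get(name, "").strip())

if not has_api_key():
    print("OPENAI_API_KEY is not set; answer generation may be unavailable")
```

Values stored in a `.env` file only show up here once something has loaded that file into the environment, so a missing key may simply mean the file has not been read yet.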

## 4. Index Documents

In [None]:
# Index all Excel documents
num_docs = rag.index_documents(force_reindex=True)

print(f"\n✓ Indexed {num_docs} documents")

In [None]:
# Get system statistics
stats = rag.get_statistics()

import pprint
pprint.pprint(stats)

## 5. Perform Searches

### 5.1 Simple Search

In [None]:
# Search for wind energy suppliers
results = rag.search("furnizori energie eoliană", top_k=5)

print(f"Found {len(results)} results:\n")

for i, result in enumerate(results, 1):
    metadata = result["metadata"]
    score = result.get("score", 0)
    
    print(f"{i}. {metadata.get('client_name', 'N/A')} (Score: {score:.3f})")
    print(f"   Source Type: {metadata.get('source_type')}")
    print(f"   Power: {metadata.get('power_installed')} MW")
    print(f"   Location: {metadata.get('address')}")
    print()

### 5.2 Search with Filters

In [None]:
# Search for renewable energy suppliers ("furnizori energie regenerabilă") with a minimum-power filter
results = rag.search(
    "furnizori energie regenerabilă",
    top_k=10,
    filters={
        "power_installed": {"min": 50}  # At least 50 MW
    }
)

print(f"Found {len(results)} suppliers with power >= 50 MW:\n")

for i, result in enumerate(results, 1):
    metadata = result["metadata"]
    print(f"{i}. {metadata.get('client_name')}: {metadata.get('power_installed')} MW")
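To make the filter semantics concrete, here is a minimal sketch of how a range filter such as `{"power_installed": {"min": 50}}` could be checked against a result's metadata. This is an illustration, not the system's actual implementation: the function name `matches_filters`, the `"max"` bound, and the sample records are all assumptions.

```python
def matches_filters(metadata, filters):
    """Check numeric range filters of the form {"field": {"min": x, "max": y}}."""
    for field, bounds in filters.items():
        value = metadata.get(field)
        if value is None:
            return False  # a record without the field cannot satisfy the filter
        if "min" in bounds and value < bounds["min"]:
            return False
        if "max" in bounds and value > bounds["max"]:  # "max" is assumed, not shown in the notebook
            return False
    return True

# Illustrative records only; real metadata comes from the indexed Excel files
records = [
    {"client_name": "Example Wind A", "power_installed": 120.0},
    {"client_name": "Example Solar B", "power_installed": 35.5},
    {"client_name": "Example Hydro C"},  # no power value recorded
]
kept = [r for r in records if matches_filters(r, {"power_installed": {"min": 50}})]
print([r["client_name"] for r in kept])
```

In this sketch a record with no `power_installed` value is excluded rather than treated as zero, which is one reasonable choice; the real retriever may behave differently.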

### 5.3 Full Query with LLM

In [None]:
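# Full RAG query with answer generation
# The question is in Romanian: "Which are the wind energy suppliers, and what is their total power?"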
question = "Care sunt furnizorii de energie eoliană și cât reprezintă puterea lor totală?"

answer = rag.query(question, top_k=5)

print("Question:", question)
print("\nAnswer:")
print(answer)

### 5.4 Comparison Query

In [None]:
# Compare different energy sources
# The question is in Romanian: "Compare wind energy suppliers with solar energy suppliers"
question = "Compară furnizorii de energie eoliană cu cei de energie solară"

answer = rag.query(question, top_k=10)

print("Question:", question)
print("\nAnswer:")
print(answer)

## 6. Generate Reports

In [None]:
# Generate a comprehensive report
# The query is in Romanian: "complete analysis of renewable energy suppliers"
report = rag.generate_report(
    query="analiza completă furnizori energie regenerabilă",
    output_path="./outputs/raport_energie_regenerabila.md",
    include_summary=True
)

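print("Report generated!\n")
# Show a preview, and mark it as truncated only if the report is actually longer
preview_chars = 1000
if len(report) > preview_chars:
    print(report[:preview_chars] + "...\n\n[Report truncated for display]")
else:
    print(report)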

## 7. Advanced Features

### 7.1 Aggregate Statistics

In [None]:
# Retrieve up to 20 results for "furnizori energie" (energy suppliers) and aggregate them by source type
results = rag.search("furnizori energie", top_k=20)

if rag.hybrid_retriever:
    stats = rag.hybrid_retriever.aggregate_statistics(results, "source_type")
    
    print("Statistics by Source Type:\n")
    for source_type, data in stats.items():
        print(f"{source_type}:")
        print(f"  Count: {data['count']}")
        print(f"  Total Power: {data['total_power']:.2f} MW")
        print(f"  Average Score: {data['avg_score']:.3f}")
        print()
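The cell above reads `count`, `total_power`, and `avg_score` for each group. A minimal sketch of how such an aggregation could be computed from search results follows. It assumes each result is a dict with a `metadata` dict and a `score`, as in the search cells; the function name `aggregate_by` and the sample values are hypothetical, and this is not the `hybrid_retriever` implementation.

```python
from collections import defaultdict

def aggregate_by(results, key):
    """Group results by a metadata field; report count, total power, mean score."""
    groups = defaultdict(lambda: {"count": 0, "total_power": 0.0, "score_sum": 0.0})
    for result in results:
        meta = result.get("metadata", {})
        group = groups[meta.get(key, "unknown")]
        group["count"] += 1
        # Missing or empty power values are counted as 0.0 in this sketch
        group["total_power"] += float(meta.get("power_installed") or 0.0)
        group["score_sum"] += float(result.get("score", 0.0))
    return {
        name: {
            "count": g["count"],
            "total_power": g["total_power"],
            "avg_score": g["score_sum"] / g["count"],
        }
        for name, g in groups.items()
    }

# Illustrative values only
sample = [
    {"metadata": {"source_type": "wind", "power_installed": 100.0}, "score": 0.9},
    {"metadata": {"source_type": "wind", "power_installed": 50.0}, "score": 0.7},
    {"metadata": {"source_type": "solar", "power_installed": 20.0}, "score": 0.5},
]
stats_sketch = aggregate_by(sample, "source_type")
```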

### 7.2 Data Export

In [None]:
# Export loaded data to DataFrame
import pandas as pd

df = rag.data_loader.to_dataframe()

print(f"Loaded {len(df)} records\n")
print("Sample data:")
df.head()

In [None]:
# Plot total installed power by source type
import matplotlib.pyplot as plt

if 'source_type' in df.columns and 'power_installed' in df.columns:
    power_by_source = df.groupby('source_type')['power_installed'].sum()
    
    plt.figure(figsize=(10, 6))
    power_by_source.plot(kind='bar')
    plt.title('Total Installed Power by Source Type')
    plt.xlabel('Source Type')
    plt.ylabel('Power (MW)')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()

### 7.3 Custom Prompts

In [None]:
# Load custom prompts
system_prompts = rag.prompt_loader.load_system_prompts()
user_prompts = rag.prompt_loader.load_user_prompts()

print("Available system prompts:")
for name in system_prompts.keys():
    print(f"  - {name}")

print("\nAvailable user prompts:")
for name in user_prompts.keys():
    print(f"  - {name}")

In [None]:
# Use custom prompts for a query
# The question is in Romanian: "Which is the largest energy supplier?"
if system_prompts and user_prompts:
    answer = rag.query(
        "Care este cel mai mare furnizor de energie?",
        top_k=5,
        system_prompt=system_prompts.get('system_analysis', ''),
        user_prompt_template=user_prompts.get('user_report_template', '')
    )
    
    print(answer)
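As an illustration of what a user prompt template might do, the sketch below fills a template with the question and the retrieved context using `str.format`. The placeholder names `{context}` and `{question}`, the function `build_user_prompt`, and the sample documents are assumptions; the templates in `./prompts` may use different placeholders or a different templating mechanism.

```python
def build_user_prompt(template, question, documents):
    """Fill a prompt template with the question and the retrieved context."""
    # Render each retrieved document as one bullet line of context
    context = "\n".join(f"- {doc}" for doc in documents)
    return template.format(question=question, context=context)

# Hypothetical template; the real files in ./prompts may use other placeholders
template = "Context:\n{context}\n\nQuestion: {question}"
prompt = build_user_prompt(
    template,
    "Care este cel mai mare furnizor de energie?",
    ["Example Wind A: 120 MW", "Example Solar B: 35.5 MW"],
)
print(prompt)
```

Note that `str.format` treats every `{...}` in the template as a placeholder, so a template containing literal braces would need them doubled (`{{` and `}}`).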

## 8. Save and Load Index

In [None]:
# Save the current index
rag.save_index()
print("✓ Index saved")

In [None]:
# Create a new RAG instance and load existing index
rag_loaded = RAGSystem(
    prompts_dir="./prompts",
    embeddings_dir="./embeddings",
    config_path="./config/config.yaml"
)

rag_loaded.initialize_components()
rag_loaded.load_index()

print("✓ Index loaded")
print(f"Loaded {len(rag_loaded.metadata)} documents")
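The save and load steps amount to a persistence round trip: whatever is written must come back unchanged. The sketch below shows that idea for document metadata only, using JSON in a temporary directory. The on-disk format used by `save_index` and `load_index` is not described in this notebook, so the file name, the helper functions, and the sample record here are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

def save_metadata(records, path):
    """Write metadata records as UTF-8 JSON, keeping diacritics readable."""
    Path(path).write_text(json.dumps(records, ensure_ascii=False), encoding="utf-8")

def load_metadata(path):
    """Read metadata records back from a UTF-8 JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

with tempfile.TemporaryDirectory() as tmp:
    index_file = Path(tmp) / "metadata.json"  # hypothetical file name
    original = [{"client_name": "Example Wind A", "source_type": "eoliană"}]
    save_metadata(original, index_file)
    restored = load_metadata(index_file)

print(restored == original)
```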

## Conclusion

This notebook demonstrated:
- Setting up the RAG system
- Creating and indexing sample data
- Performing various types of searches
- Generating reports
- Using advanced features like filtering and aggregation
- Working with custom prompts
- Saving and loading indexes

For more information, see the [README.md](README.md) file.