# 🔍 InsightSpike-AI: Dynamic RAG Comparison Experiment
## Evaluating Dynamic RAG Construction vs Existing Methods

This notebook compares InsightSpike-AI's dynamic RAG construction capabilities against established baselines using standard question-answering benchmarks.

### Experimental Design
- **Datasets**: Simulated NaturalQuestions & HotpotQA samples
- **Baselines**: BM25, Static Embeddings, DPR (Dense Passage Retrieval)
- **Metrics**: Recall@k, Exact Match (EM), F1 Score, Inference Latency
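
The answer-quality and retrieval metrics listed above can be computed per question roughly as follows. This is a minimal sketch, not the notebook's `evaluate_retrieval_system`: the function names and the SQuAD-style normalization (lowercasing, dropping punctuation and articles) are assumptions made for illustration.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, answer):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(answer))

def f1_score(prediction, answer):
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(answer).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top-k retrieved list."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)
```

Exact Match is strict, so F1 is usually reported next to it to give partial credit when the prediction contains the answer plus extra tokens.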

### InsightSpike-AI Dynamic RAG Features
- **Adaptive Weighting**: Dynamically adjusts retrieval strategy based on query characteristics
- **Intrinsic Motivation**: Uses ΔGED × ΔIG for document selection enhancement
- **Multi-Strategy Fusion**: Combines lexical, semantic, and learned retrieval methods
- **Context-Aware Memory**: Maintains retrieval history for improved performance
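
One way to picture adaptive weighting combined with multi-strategy fusion is a weighted sum of normalized per-strategy scores, where the weights depend on simple query features. The sketch below is illustrative only: `query_weights`, `fuse_scores`, and the token-count heuristic are hypothetical and are not taken from the InsightSpike-AI implementation.

```python
def min_max(scores):
    """Scale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def query_weights(query):
    """Hypothetical heuristic: short queries lean lexical, longer ones lean semantic."""
    n_tokens = len(query.split())
    lexical = 0.6 if n_tokens <= 6 else 0.3
    return {"lexical": lexical, "semantic": 1.0 - lexical}

def fuse_scores(query, lexical_scores, semantic_scores, k=3):
    """Return the top-k (doc_index, fused_score) pairs."""
    weights = query_weights(query)
    lex = min_max(lexical_scores)
    sem = min_max(semantic_scores)
    fused = [weights["lexical"] * l + weights["semantic"] * s
             for l, s in zip(lex, sem)]
    ranked = sorted(enumerate(fused), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Normalizing each strategy's scores before mixing matters because lexical scores such as BM25 are unbounded while cosine similarities are not, so a raw sum would let one strategy dominate.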

### Expected Outcomes
We expect InsightSpike-AI's dynamic approach to show:
1. Higher recall and precision across different k values
2. Better handling of both factual and multi-hop questions
3. Competitive or superior latency performance
4. More robust performance across question types

In [None]:
# Colab Environment Setup and Package Installation
import sys
import os
from pathlib import Path

# Check if running in Colab
try:
    import google.colab
    IN_COLAB = True
    print("🔧 Running in Google Colab")
except ImportError:
    IN_COLAB = False
    print("🔧 Running in local environment")

# Install required packages for Colab
if IN_COLAB:
    print("📦 Installing required packages...")
    # Install compatible versions to avoid meta tensor issues
    !pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cpu
    !pip install sentence-transformers==2.7.0  # Compatible with PyTorch 2.2.2
    !pip install transformers==4.40.2  # sentence-transformers 2.7.0 requires a newer transformers than 4.30.0
    !pip install numpy==1.26.4  # Avoid NumPy 2.x compatibility issues
    !pip install scikit-learn pandas matplotlib seaborn
    !pip install plotly kaleido
    !pip install faiss-cpu networkx
    print("✅ Package installation complete")

In [None]:
# Clone Repository and Setup Environment
if IN_COLAB:
    # Get GitHub token from Colab secrets for private repository
    from google.colab import userdata
    
    try:
        github_token = userdata.get('GITHUB_TOKEN')
        print("✅ GitHub token found in secrets")
    except Exception:
        print("❌ GitHub token not found in secrets.")
        print("📝 Please add GITHUB_TOKEN to Colab secrets:")
        print("   1. Click the key icon (🔑) in the left sidebar")
        print("   2. Add new secret: Name='GITHUB_TOKEN', Value='your_github_token'")
        print("   3. Get token from: https://github.com/settings/tokens")
        raise
    
    # Clone the private repository
    print("📥 Cloning InsightSpike-AI repository...")
    clone_url = f"https://{github_token}@github.com/miyauchikazuyoshi/InsightSpike-AI.git"
    !git clone $clone_url
    
    # Change to project directory
    os.chdir('/content/InsightSpike-AI')
    sys.path.append('/content/InsightSpike-AI')
    print("📁 Changed to project directory")
else:
    # Assume we're already in the project directory
    print("📁 Using local project directory")

# Add experiment module to path
sys.path.append('experiments/colab_experiments/dynamic_rag_comparison')

# Test import of experiment modules
try:
    from dynamic_rag_experiment import (
        run_dynamic_rag_experiment,
        BM25Retriever,
        StaticEmbeddingRetriever,
        DPRRetriever,
        InsightSpikeRAG,
        create_expanded_dataset,
        evaluate_retrieval_system,
        create_rag_visualization
    )
    print("✅ Successfully imported experiment modules")
except ImportError as e:
    print(f"❌ Import error: {e}")
    print("📋 Available files:")
    !ls -la experiments/colab_experiments/dynamic_rag_comparison/

In [None]:
# Import Required Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import json
import time
import warnings
from datetime import datetime
from IPython.display import display, HTML, Markdown
from collections import defaultdict
import re

# Suppress warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
np.random.seed(42)

# Set up plotting style
plt.style.use('default')
sns.set_palette("husl")

print("🎯 Environment setup complete!")

# Check available packages
try:
    from sentence_transformers import SentenceTransformer
    print("✅ Sentence Transformers available")
    SENTENCE_TRANSFORMERS_AVAILABLE = True
except ImportError:
    print("⚠️ Sentence Transformers not available - using fallback methods")
    SENTENCE_TRANSFORMERS_AVAILABLE = False

try:
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    print("✅ Scikit-learn available")
    SKLEARN_AVAILABLE = True
except ImportError:
    print("⚠️ Scikit-learn not available - using simplified methods")
    SKLEARN_AVAILABLE = False

## 📊 Dataset Preparation and Preview

Let's examine the evaluation dataset we'll be using for this comparison.

In [None]:
# Create and Examine the Evaluation Dataset
print("📊 Creating evaluation dataset...")

# Load the expanded dataset
questions, documents = create_expanded_dataset()

print(f"✅ Dataset created:")
print(f"   📝 Questions: {len(questions)}")
print(f"   📄 Documents: {len(documents)}")

# Display dataset statistics
question_types = {}
for q in questions:
    qtype = q.get("type", "unknown")
    question_types[qtype] = question_types.get(qtype, 0) + 1

print(f"\n📈 Question Type Distribution:")
for qtype, count in question_types.items():
    print(f"   {qtype}: {count} questions")

# Show sample questions
print(f"\n🔍 Sample Questions:")
print("-" * 50)

for i, q in enumerate(questions[:3]):
    print(f"Q{i+1} [{q.get('type', 'unknown')}]: {q['question']}")
    print(f"   Answer: {q['answer']}")
    print(f"   Context: {q['context'][:100]}...")
    print()

In [None]:
# Document Analysis
print("📄 Document Corpus Analysis:")
print("-" * 40)

# Calculate document statistics
doc_lengths = [len(doc.split()) for doc in documents]
total_tokens = sum(doc_lengths)
avg_length = np.mean(doc_lengths)
std_length = np.std(doc_lengths)

print(f"Total documents: {len(documents)}")
print(f"Total tokens: {total_tokens:,}")
print(f"Average doc length: {avg_length:.1f} ± {std_length:.1f} tokens")
print(f"Min doc length: {min(doc_lengths)} tokens")
print(f"Max doc length: {max(doc_lengths)} tokens")

# Visualize document length distribution
plt.figure(figsize=(10, 6))
plt.subplot(1, 2, 1)
plt.hist(doc_lengths, bins=15, alpha=0.7, color='skyblue')
plt.xlabel('Document Length (tokens)')
plt.ylabel('Frequency')
plt.title('Document Length Distribution')
plt.grid(True, alpha=0.3)

# Plot the lengths of the first five documents
plt.subplot(1, 2, 2)
sample_docs = documents[:5]
doc_indices = range(1, len(sample_docs) + 1)
sample_lengths = [len(doc.split()) for doc in sample_docs]

plt.bar(doc_indices, sample_lengths, alpha=0.7, color='lightcoral')
plt.xlabel('Document Index')
plt.ylabel('Length (tokens)')
plt.title('Sample Document Lengths')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Display sample documents
print(f"\n📄 Sample Documents:")
print("-" * 50)
for i, doc in enumerate(documents[:3]):
    print(f"Doc {i+1}: {doc[:150]}...")
    print()

## 🔧 Retrieval System Initialization

Now let's initialize and test all the retrieval systems we'll be comparing.
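
The cells in this section construct each system from the document list and query it with `retrieve(query, k=...)`, which yields `(doc_index, score)` pairs. A minimal retriever with that interface might look like the sketch below. `TinyBM25` is a hypothetical stand-in written for illustration. It is not the `BM25Retriever` from `dynamic_rag_experiment`, and its whitespace tokenization and default `k1`/`b` values are assumptions.

```python
import math
from collections import Counter

class TinyBM25:
    """Minimal Okapi BM25 over lowercase whitespace tokens (illustrative only)."""

    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [doc.lower().split() for doc in documents]
        self.doc_freqs = [Counter(tokens) for tokens in self.doc_tokens]
        n_docs = len(self.doc_tokens)
        self.avg_len = sum(len(t) for t in self.doc_tokens) / max(n_docs, 1)
        # Document frequency: in how many documents each term occurs
        df = Counter(term for tokens in self.doc_tokens for term in set(tokens))
        # Smoothed IDF that stays positive even for very common terms
        self.idf = {t: math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
                    for t, n in df.items()}

    def score(self, query, idx):
        freqs = self.doc_freqs[idx]
        length = len(self.doc_tokens[idx])
        total = 0.0
        for term in query.lower().split():
            tf = freqs.get(term, 0)
            if tf == 0:
                continue
            norm = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
            total += self.idf[term] * tf * (self.k1 + 1) / norm
        return total

    def retrieve(self, query, k=3):
        """Return the k highest-scoring (doc_index, score) pairs."""
        scored = [(i, self.score(query, i)) for i in range(len(self.doc_tokens))]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

Because every system exposes the same `retrieve` signature, the evaluation code can treat lexical, dense, and dynamic retrievers interchangeably.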

In [None]:
# Initialize All Retrieval Systems
print("🔧 Initializing retrieval systems...")

# Track initialization time for each system
init_times = {}

# 1. BM25 Retriever
print("\n📊 Initializing BM25 Retriever...")
start_time = time.time()
bm25_retriever = BM25Retriever(documents)
init_times["BM25"] = time.time() - start_time
print(f"   ✅ BM25 initialized in {init_times['BM25']:.3f}s")

# 2. Static Embedding Retriever
print("\n🔢 Initializing Static Embedding Retriever...")
start_time = time.time()
static_retriever = StaticEmbeddingRetriever(documents)
init_times["Static Embeddings"] = time.time() - start_time
print(f"   ✅ Static Embeddings initialized in {init_times['Static Embeddings']:.3f}s")

# 3. DPR Retriever (if available)
if SENTENCE_TRANSFORMERS_AVAILABLE:
    print("\n🧠 Initializing DPR-style Dense Retriever...")
    start_time = time.time()
    dpr_retriever = DPRRetriever(documents)
    init_times["DPR (Dense)"] = time.time() - start_time
    print(f"   ✅ DPR initialized in {init_times['DPR (Dense)']:.3f}s")
else:
    print("\n⚠️ DPR not available - skipping dense retrieval")

# 4. InsightSpike Dynamic RAG
print("\n🚀 Initializing InsightSpike Dynamic RAG...")
start_time = time.time()
insightspike_rag = InsightSpikeRAG(documents)
init_times["InsightSpike Dynamic RAG"] = time.time() - start_time
print(f"   ✅ InsightSpike RAG initialized in {init_times['InsightSpike Dynamic RAG']:.3f}s")

# Display initialization summary
print(f"\n⏱️ Initialization Times Summary:")
print("-" * 40)
for system, init_time in init_times.items():
    print(f"{system:<25}: {init_time:.3f}s")

In [None]:
# Test Retrieval Systems with Sample Query
print("🧪 Testing retrieval systems with sample query...")

sample_query = "When was the Declaration of Independence signed?"
print(f"Test Query: '{sample_query}'")
print("-" * 60)

# Test each retriever
retrievers = {
    "BM25": bm25_retriever,
    "Static Embeddings": static_retriever,
    "InsightSpike Dynamic RAG": insightspike_rag
}

if SENTENCE_TRANSFORMERS_AVAILABLE:
    retrievers["DPR (Dense)"] = dpr_retriever

for name, retriever in retrievers.items():
    print(f"\n🔍 {name} Results:")
    start_time = time.time()
    results = retriever.retrieve(sample_query, k=3)
    query_time = time.time() - start_time
    
    print(f"   Query time: {query_time*1000:.1f}ms")
    
    for i, (doc_idx, score) in enumerate(results):
        doc_preview = documents[doc_idx][:100] + "..." if len(documents[doc_idx]) > 100 else documents[doc_idx]
        print(f"   {i+1}. Score: {score:.3f} | Doc: {doc_preview}")

## 🚀 Running the Complete Evaluation

Now let's run the comprehensive evaluation across all systems and metrics.
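
The evaluation cell expects each system's results as per-question lists, with the retrieval metrics keyed by `k`. The sketch below shows one way such a structure could be collected. `evaluate_sketch` is a hypothetical stand-in for `evaluate_retrieval_system`, and two choices in it are assumptions: a document counts as relevant when it contains the answer string, and the value stored under `recall_at_k` is a hit rate (1.0 if any relevant document is in the top k).

```python
import time

def evaluate_sketch(retriever, questions, documents, k_values):
    """Collect per-question hit rates, precision, and latencies."""
    results = {
        "recall_at_k": {k: [] for k in k_values},
        "precision_at_k": {k: [] for k in k_values},
        "latencies": [],
    }
    max_k = max(k_values)
    for q in questions:
        # Time only the retrieval call, using a monotonic high-resolution clock
        start = time.perf_counter()
        ranked = retriever.retrieve(q["question"], k=max_k)
        results["latencies"].append(time.perf_counter() - start)

        # Assumption: a document is relevant if it contains the answer string
        answer = q["answer"].lower()
        hits = [answer in documents[idx].lower() for idx, _ in ranked]
        for k in k_values:
            top_hits = hits[:k]
            results["recall_at_k"][k].append(float(any(top_hits)))
            results["precision_at_k"][k].append(sum(top_hits) / k)
    return results
```

Retrieving once at the largest `k` and slicing for the smaller values keeps the latency measurement to a single call per question.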

In [None]:
# Run Complete Evaluation
print("🚀 Starting comprehensive RAG evaluation...")
print("⏰ This will take a few minutes to complete...")

# Configure evaluation parameters
k_values = [1, 3, 5]
print(f"📊 Evaluating with k values: {k_values}")

# Initialize results storage
all_results = {}

# Evaluate each system
for name, retriever in retrievers.items():
    print(f"\n🔍 Evaluating {name}...")
    
    # Run evaluation
    results = evaluate_retrieval_system(retriever, questions, documents, k_values)
    all_results[name] = results
    
    # Display quick summary
    avg_recall_5 = np.mean(results["recall_at_k"][5])
    avg_precision_5 = np.mean(results["precision_at_k"][5])
    avg_em = np.mean(results["exact_matches"])
    avg_f1 = np.mean(results["f1_scores"])
    avg_latency = np.mean(results["latencies"])
    
    print(f"   📈 Quick Summary:")
    print(f"      Recall@5: {avg_recall_5:.3f}")
    print(f"      Precision@5: {avg_precision_5:.3f}")
    print(f"      Exact Match: {avg_em:.3f}")
    print(f"      F1 Score: {avg_f1:.3f}")
    print(f"      Avg Latency: {avg_latency*1000:.1f}ms")

print("\n✅ Evaluation completed for all systems!")

## 📈 Results Visualization and Analysis

Let's create comprehensive visualizations to understand the performance differences between systems.
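
When judging whether a difference between two systems is real, it helps to remember that every system is scored on the same questions, so the per-question scores are paired. The significance cell in this section uses an independent-samples t-test. As a complementary check, a paired randomization (sign-flip) test makes fewer distributional assumptions, which suits binary per-question scores such as Exact Match. The sketch below is one possible implementation. The function name and defaults are illustrative, not part of the experiment module.

```python
import random

def paired_sign_flip_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided paired randomization test on the mean per-question difference."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Both systems must be scored on the same questions")
    if not scores_a:
        raise ValueError("At least one question is required")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each difference is arbitrary
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / len(diffs)) >= observed - 1e-12:
            extreme += 1
    # Add-one smoothing keeps the estimate away from an exact zero
    return (extreme + 1) / (n_resamples + 1)
```

With the notebook's variables, a call such as `paired_sign_flip_test(all_results["InsightSpike Dynamic RAG"]["recall_at_k"][5], all_results["BM25"]["recall_at_k"][5])` would return an estimated p-value for the Recall@5 difference.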

In [None]:
# Create Main Visualization
print("📈 Creating comprehensive results visualization...")

# Generate the main comparison visualization
fig = create_rag_visualization(all_results, questions)
plt.show()

print("✅ Main visualization complete!")

In [None]:
# Detailed Performance Analysis
print("📊 Detailed Performance Analysis")
print("=" * 50)

systems = list(all_results.keys())

# Create detailed comparison table
comparison_data = []
for system in systems:
    results = all_results[system]
    
    row = {
        "System": system,
        "Recall@1": f"{np.mean(results['recall_at_k'][1]):.3f} ± {np.std(results['recall_at_k'][1]):.3f}",
        "Recall@3": f"{np.mean(results['recall_at_k'][3]):.3f} ± {np.std(results['recall_at_k'][3]):.3f}",
        "Recall@5": f"{np.mean(results['recall_at_k'][5]):.3f} ± {np.std(results['recall_at_k'][5]):.3f}",
        "Precision@5": f"{np.mean(results['precision_at_k'][5]):.3f} ± {np.std(results['precision_at_k'][5]):.3f}",
        "Exact Match": f"{np.mean(results['exact_matches']):.3f} ± {np.std(results['exact_matches']):.3f}",
        "F1 Score": f"{np.mean(results['f1_scores']):.3f} ± {np.std(results['f1_scores']):.3f}",
        "Latency (ms)": f"{np.mean(results['latencies'])*1000:.1f} ± {np.std(results['latencies'])*1000:.1f}"
    }
    comparison_data.append(row)

comparison_df = pd.DataFrame(comparison_data)
display(HTML(comparison_df.to_html(index=False, table_id="comparison_table")))

# Statistical Significance Testing
print(f"\n🔬 Statistical Significance Analysis:")
print("-" * 40)

from scipy import stats

# Compare InsightSpike against each baseline
insightspike_name = "InsightSpike Dynamic RAG"
if insightspike_name in all_results:
    insightspike_recall5 = all_results[insightspike_name]["recall_at_k"][5]
    insightspike_em = all_results[insightspike_name]["exact_matches"]
    
    for system in systems:
        if system != insightspike_name:
            system_recall5 = all_results[system]["recall_at_k"][5]
            system_em = all_results[system]["exact_matches"]
            
            # T-test for Recall@5
            _, p_recall = stats.ttest_ind(insightspike_recall5, system_recall5)
            
            # T-test for Exact Match
            _, p_em = stats.ttest_ind(insightspike_em, system_em)
            
            # Calculate effect sizes (Cohen's d)
            def cohens_d(group1, group2):
                n1, n2 = len(group1), len(group2)
                pooled_std = np.sqrt(((n1 - 1) * np.var(group1, ddof=1) + 
                                     (n2 - 1) * np.var(group2, ddof=1)) / (n1 + n2 - 2))
                if pooled_std == 0:
                    # Both groups are constant, so the effect size is undefined
                    return float("nan")
                return (np.mean(group1) - np.mean(group2)) / pooled_std
            
            recall_effect = cohens_d(insightspike_recall5, system_recall5)
            em_effect = cohens_d(insightspike_em, system_em)
            
            print(f"\nInsightSpike vs {system}:")
            print(f"  Recall@5: p={p_recall:.4f}, Cohen's d={recall_effect:.3f}")
            print(f"  Exact Match: p={p_em:.4f}, Cohen's d={em_effect:.3f}")
            
            # Interpretation
            if p_recall < 0.05:
                print(f"  Recall@5: Statistically significant difference ✅")
            else:
                print(f"  Recall@5: No significant difference ❌")

In [None]:
# Performance by Question Type Analysis
print("🎯 Performance by Question Type")
print("=" * 40)

# Separate results by question type
factual_questions = [(i, q) for i, q in enumerate(questions) if q.get("type") == "factual"]
multihop_questions = [(i, q) for i, q in enumerate(questions) if q.get("type") == "multi-hop"]

print(f"Factual questions: {len(factual_questions)}")
print(f"Multi-hop questions: {len(multihop_questions)}")

# Calculate performance by question type
type_performance = {}

for system in systems:
    results = all_results[system]
    
    # Factual performance
    factual_recall5 = [results["recall_at_k"][5][i] for i, _ in factual_questions]
    factual_em = [results["exact_matches"][i] for i, _ in factual_questions]
    
    # Multi-hop performance
    multihop_recall5 = [results["recall_at_k"][5][i] for i, _ in multihop_questions]
    multihop_em = [results["exact_matches"][i] for i, _ in multihop_questions]
    
    type_performance[system] = {
        "factual": {
            "recall5": np.mean(factual_recall5) if factual_recall5 else 0,
            "em": np.mean(factual_em) if factual_em else 0
        },
        "multihop": {
            "recall5": np.mean(multihop_recall5) if multihop_recall5 else 0,
            "em": np.mean(multihop_em) if multihop_em else 0
        }
    }

# Visualize question type performance
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Recall@5 by question type
ax1 = axes[0]
x = np.arange(len(systems))
width = 0.35

factual_recall = [type_performance[sys]["factual"]["recall5"] for sys in systems]
multihop_recall = [type_performance[sys]["multihop"]["recall5"] for sys in systems]

ax1.bar(x - width/2, factual_recall, width, label='Factual', alpha=0.8)
ax1.bar(x + width/2, multihop_recall, width, label='Multi-hop', alpha=0.8)

ax1.set_xlabel('System')
ax1.set_ylabel('Recall@5')
ax1.set_title('Recall@5 by Question Type')
ax1.set_xticks(x)
ax1.set_xticklabels([s.replace(' ', '\n') for s in systems], fontsize=9)
ax1.legend()
ax1.grid(True, alpha=0.3)

# Exact Match by question type
ax2 = axes[1]
factual_em = [type_performance[sys]["factual"]["em"] for sys in systems]
multihop_em = [type_performance[sys]["multihop"]["em"] for sys in systems]

ax2.bar(x - width/2, factual_em, width, label='Factual', alpha=0.8)
ax2.bar(x + width/2, multihop_em, width, label='Multi-hop', alpha=0.8)

ax2.set_xlabel('System')
ax2.set_ylabel('Exact Match')
ax2.set_title('Exact Match by Question Type')
ax2.set_xticks(x)
ax2.set_xticklabels([s.replace(' ', '\n') for s in systems], fontsize=9)
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print detailed breakdown
print(f"\n📋 Detailed Question Type Performance:")
print("-" * 60)
print(f"{'System':<25} {'Factual R@5':<12} {'Factual EM':<11} {'Multi-hop R@5':<14} {'Multi-hop EM':<12}")
print("-" * 60)

for system in systems:
    perf = type_performance[system]
    print(f"{system:<25} {perf['factual']['recall5']:<12.3f} {perf['factual']['em']:<11.3f} "
          f"{perf['multihop']['recall5']:<14.3f} {perf['multihop']['em']:<12.3f}")

## 💾 Save Results and Create Download Package

Let's save all our experimental results and create a downloadable package.

In [None]:
# Save Experimental Results
print("💾 Saving experimental results...")

# Create results directory
results_dir = Path("rag_comparison_results")
results_dir.mkdir(exist_ok=True)

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

# Prepare comprehensive results data
results_data = {
    "timestamp": timestamp,
    "experiment_type": "dynamic_rag_comparison",
    "environment": "Google Colab" if IN_COLAB else "Local",
    "dataset_info": {
        "num_questions": len(questions),
        "num_documents": len(documents),
        "question_types": {
            "factual": len([q for q in questions if q.get("type") == "factual"]),
            "multi_hop": len([q for q in questions if q.get("type") == "multi-hop"])
        }
    },
    "systems_evaluated": list(all_results.keys()),
    "evaluation_metrics": {
        "recall_at_k": k_values,
        "precision_at_k": k_values,
        "exact_match": True,
        "f1_score": True,
        "latency": True
    },
    "initialization_times": init_times,
    "detailed_results": all_results,
    "question_type_performance": type_performance
}

# Convert numpy arrays to lists for JSON serialization
def convert_numpy(obj):
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    elif isinstance(obj, np.integer):
        return int(obj)
    elif isinstance(obj, np.floating):
        return float(obj)
    elif isinstance(obj, dict):
        return {k: convert_numpy(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [convert_numpy(item) for item in obj]
    return obj

# Save JSON data
json_path = results_dir / f"rag_comparison_results_{timestamp}.json"
with open(json_path, 'w') as f:
    json.dump(convert_numpy(results_data), f, indent=2)

print(f"📊 Results saved to: {json_path}")

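# Save main figure. The question-type cell rebinds `fig`, so the main
# comparison figure is regenerated here before saving.
main_fig = create_rag_visualization(all_results, questions)
main_fig.savefig(results_dir / f"rag_comparison_visualization_{timestamp}.png", 
                 dpi=300, bbox_inches='tight')
plt.close(main_fig)

# Save question type analysis figure (`fig` refers to it at this point).
# plt.savefig() after plt.show() typically writes a blank image in notebooks.
fig.savefig(results_dir / f"question_type_analysis_{timestamp}.png", 
            dpi=300, bbox_inches='tight')

print(f"📈 Visualizations saved to: {results_dir}/")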

# Create summary CSV
summary_data = []
for system in systems:
    summary_data.append({
        "System": system,
        "Recall@1": np.mean(all_results[system]["recall_at_k"][1]),
        "Recall@3": np.mean(all_results[system]["recall_at_k"][3]),
        "Recall@5": np.mean(all_results[system]["recall_at_k"][5]),
        "Precision@5": np.mean(all_results[system]["precision_at_k"][5]),
        "Exact_Match": np.mean(all_results[system]["exact_matches"]),
        "F1_Score": np.mean(all_results[system]["f1_scores"]),
        "Latency_ms": np.mean(all_results[system]["latencies"]) * 1000,
        "Factual_Recall@5": type_performance[system]["factual"]["recall5"],
        "Factual_EM": type_performance[system]["factual"]["em"],
        "MultiHop_Recall@5": type_performance[system]["multihop"]["recall5"],
        "MultiHop_EM": type_performance[system]["multihop"]["em"]
    })

summary_df = pd.DataFrame(summary_data)
csv_path = results_dir / f"rag_summary_results_{timestamp}.csv"
summary_df.to_csv(csv_path, index=False)

print(f"📄 Summary CSV saved to: {csv_path}")
print("\n✅ All results saved successfully!")

In [None]:
# Download Results (for Colab users)
if IN_COLAB:
    print("📥 Preparing files for download...")
    
    # Create a zip file with all results
    import zipfile
    
    zip_path = f"dynamic_rag_comparison_results_{timestamp}.zip"
    
    with zipfile.ZipFile(zip_path, 'w') as zipf:
        # Add all files from results directory
        for file_path in results_dir.glob("*"):
            zipf.write(file_path, file_path.name)
        
        # Add the experiment script
        zipf.write("experiments/colab_experiments/dynamic_rag_comparison/dynamic_rag_experiment.py", 
                   "dynamic_rag_experiment.py")
        
        # Add this notebook
        try:
            zipf.write("experiments/colab_experiments/dynamic_rag_comparison/dynamic_rag_colab.ipynb", 
                       "dynamic_rag_colab.ipynb")
        except FileNotFoundError:
            pass  # File might not exist in Colab
    
    print(f"📦 Created zip file: {zip_path}")
    
    # Download files
    from google.colab import files
    
    try:
        files.download(zip_path)
        print("✅ Download initiated! Check your browser's download folder.")
    except Exception:
        print("⚠️ Automatic download failed. You can manually download the files from the file browser.")
        print("📁 Available files:")
        !ls -la rag_comparison_results/
        !ls -la *.zip
else:
    print("📁 Results saved locally in the rag_comparison_results/ directory")
    print("📋 Available files:")
    !ls -la rag_comparison_results/

## 📦 Experiment Results Download

Download your experimental results for further analysis or sharing.

In [None]:
# Download Experiment Results
print("📦 Preparing experiment results for download...")

def create_downloadable_results():
    """Create a downloadable package of all experimental results"""
    import zipfile
    import json
    from datetime import datetime
    from pathlib import Path
    
    # Create download directory
    download_dir = Path("downloads")
    download_dir.mkdir(exist_ok=True)
    
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    zip_filename = f"rag_experiment_results_{timestamp}.zip"
    zip_path = download_dir / zip_filename
    
    print(f"📝 Creating results package: {zip_filename}")
    
    # Create comprehensive results package
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
        
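        # Add experiment results. The save cell above writes to
        # rag_comparison_results/; data/rag_experiments/results is included if present.
        for results_dir in (Path("rag_comparison_results"), Path("data/rag_experiments/results")):
            if not results_dir.exists():
                continue
            for file_path in results_dir.rglob("*"):
                if file_path.is_file():
                    arcname = f"results/{file_path.relative_to(results_dir)}"
                    zipf.write(file_path, arcname)
                    print(f"   📄 Added: {arcname}")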
        
        # Add visualizations
        viz_dir = Path("data/rag_experiments/visualizations")
        if viz_dir.exists():
            for file_path in viz_dir.rglob("*.png"):
                if file_path.is_file():
                    arcname = f"visualizations/{file_path.name}"
                    zipf.write(file_path, arcname)
                    print(f"   🖼️  Added: {arcname}")
        
        # Add baseline comparisons
        baselines_dir = Path("data/rag_experiments/baselines")
        if baselines_dir.exists():
            for baseline_dir in baselines_dir.iterdir():
                if baseline_dir.is_dir():
                    results_files = baseline_dir.rglob("*.json")
                    for file_path in results_files:
                        arcname = f"baselines/{baseline_dir.name}/{file_path.name}"
                        zipf.write(file_path, arcname)
                        print(f"   📊 Added: {arcname}")
        
        # Add experiment summary
        summary = {
            "experiment_type": "Dynamic RAG Comparison",
            "timestamp": timestamp,
            "notebook_version": "v1.0.0",
            "description": "Comparison of InsightSpike-AI dynamic RAG against baseline methods",
            "datasets": ["NaturalQuestions_sample", "HotpotQA_sample"],
            "methods_compared": ["BM25", "Static Embeddings", "DPR", "InsightSpike RAG"],
            "metrics": ["Recall@k", "Precision@k", "Exact Match", "F1 Score", "Latency"]
        }
        
        summary_path = download_dir / "experiment_summary.json"
        with open(summary_path, 'w') as f:
            json.dump(summary, f, indent=2)
        zipf.write(summary_path, "experiment_summary.json")
        
        print(f"   📋 Added: experiment_summary.json")
    
    file_size = zip_path.stat().st_size / (1024 * 1024)  # MB
    print(f"\n✅ Results package created successfully!")
    print(f"📦 File: {zip_path}")
    print(f"📏 Size: {file_size:.2f} MB")
    
    return zip_path

# Create and prepare results for download
if IN_COLAB:
    try:
        # Create downloadable package
        zip_path = create_downloadable_results()
        
        # Download in Colab
        from google.colab import files
        files.download(str(zip_path))
        print("⬇️  Download started in Colab!")
        
    except Exception as e:
        print(f"❌ Error creating download package: {e}")
        print("💡 You can manually download files from the file browser")
        
        # Show available files for manual download
        results_dir = Path("data/rag_experiments/results")
        if results_dir.exists():
            print(f"\n📋 Available result files:")
            for file_path in results_dir.rglob("*"):
                if file_path.is_file():
                    print(f"   📄 {file_path}")
else:
    # Local environment - just create the package
    zip_path = create_downloadable_results()
    print(f"💾 Results saved locally: {zip_path}")
    print("📁 Open the 'downloads' folder to access your results")

print(f"\n🎉 Experiment complete! Your results are ready for analysis.")