# InsightSpike-AI Large-Scale Experiments on Google Colab

This notebook provides a comprehensive environment for running InsightSpike experiments at scale.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Sunwood-ai-labs/InsightSpike-AI/blob/main/experiments/colab_experiments/InsightSpike_Experiments.ipynb)

## 1. Environment Setup

First, let's set up the Google Colab environment with all necessary dependencies.

In [None]:
# Check GPU availability
import torch
if torch.cuda.is_available():
    print(f"‚úÖ GPU Available: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("‚ö†Ô∏è No GPU detected. Go to Runtime ‚Üí Change runtime type ‚Üí GPU")

In [None]:
# Clone InsightSpike repository
!git clone https://github.com/Sunwood-ai-labs/InsightSpike-AI.git
%cd InsightSpike-AI

In [None]:
# Run setup script
!python experiments/colab_setup.py

In [None]:
# Mount Google Drive for persistent storage
from google.colab import drive
drive.mount('/content/drive')

# Create results directory in Drive
import os
results_dir = '/content/drive/MyDrive/InsightSpike_Results'
os.makedirs(results_dir, exist_ok=True)
print(f"üìÅ Results will be saved to: {results_dir}")

## 2. Weights & Biases Setup (Optional)

Set up experiment tracking with W&B for better visualization and monitoring.

In [None]:
# Login to Weights & Biases (optional but recommended)
use_wandb = False  # Set to True to enable W&B tracking

if use_wandb:
    import wandb
    wandb.login()
    print("‚úÖ W&B initialized for experiment tracking")
else:
    print("‚ÑπÔ∏è Running without W&B tracking")

## 3. Import Experiment Modules

In [None]:
# Add InsightSpike to path
import sys
sys.path.append('/content/InsightSpike-AI/src')
sys.path.append('/content/InsightSpike-AI')

# Import experiment modules
from experiments.colab_experiments.insight_benchmarks import (
    InsightBenchmarkSuite, run_rat_benchmark, run_all_benchmarks
)
from experiments.colab_experiments.scalability_testing import (
    ScalabilityTestSuite, test_memory_scaling, test_all_scalability
)
from experiments.colab_experiments.comparative_analysis import (
    ComparativeAnalysisSuite, compare_quality, compare_all_systems
)

print("‚úÖ All experiment modules imported successfully")

## 4. Quick Tests

Let's run some quick tests to ensure everything is working properly.

In [None]:
# Quick RAT test (5 problems)
print("üß™ Running quick RAT test...")
quick_results = run_rat_benchmark(n_problems=5, use_wandb=use_wandb)

print(f"\nüìä Quick Results:")
print(f"Accuracy: {quick_results['metrics']['accuracy']:.1f}%")
print(f"Spike Rate: {quick_results['metrics']['spike_rate']:.1f}%")
print(f"Avg Processing Time: {quick_results['metrics']['avg_processing_time']:.2f}s")

In [None]:
# Quick memory scaling test
print("üíæ Running quick memory scaling test...")
memory_results = test_memory_scaling(episode_counts=[100, 500, 1000])

print(f"\nüìä Memory Scaling Results:")
for measurement in memory_results['measurements']:
    print(f"{measurement['n_episodes']} episodes: {measurement['avg_retrieval_time']*1000:.1f}ms avg retrieval")

## 5. Insight Task Benchmarks

Run comprehensive benchmarks on various insight discovery tasks.

In [None]:
# Initialize benchmark suite
benchmark_suite = InsightBenchmarkSuite(use_wandb=use_wandb)

In [None]:
# Run RAT benchmark (100 problems)
print("üß† Running full RAT benchmark (100 problems)...")
rat_results = benchmark_suite.run_rat_benchmark(n_problems=100)

# Display results
print(f"\nüìä RAT Benchmark Results:")
print(f"Accuracy: {rat_results['metrics']['accuracy']:.1f}%")
print(f"Spike Rate: {rat_results['metrics']['spike_rate']:.1f}%")
print(f"Spike-Accuracy Correlation: {rat_results['metrics']['spike_accuracy_correlation']:.2f}")

In [None]:
# Visualize RAT results
import matplotlib.pyplot as plt
import pandas as pd

# Create results DataFrame
df = pd.DataFrame(rat_results['problems'])

# Plot accuracy by spike detection
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Accuracy distribution
ax = axes[0]
spike_accuracy = df[df['spike_detected']]['correct'].mean() * 100
no_spike_accuracy = df[~df['spike_detected']]['correct'].mean() * 100

ax.bar(['With Spike', 'Without Spike'], [spike_accuracy, no_spike_accuracy], 
       color=['green', 'orange'], alpha=0.7)
ax.set_ylabel('Accuracy (%)')
ax.set_title('Accuracy by Spike Detection')
ax.grid(True, alpha=0.3)

# Processing time distribution
ax = axes[1]
ax.hist(df['processing_time'], bins=30, color='blue', alpha=0.7)
ax.set_xlabel('Processing Time (s)')
ax.set_ylabel('Frequency')
ax.set_title('Processing Time Distribution')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Run other benchmarks
print("üî¨ Running scientific discovery benchmark...")
sci_results = benchmark_suite.run_scientific_discovery_benchmark()

print(f"\nüìä Scientific Discovery Results:")
print(f"Discovery Rate: {sci_results['metrics']['discovery_rate']:.1f}%")
print(f"Novel Insight Rate: {sci_results['metrics']['novel_insight_rate']:.1f}%")

In [None]:
# Run comprehensive benchmark suite
print("üèÉ Running comprehensive benchmark suite (this may take a while)...")
comprehensive_results = benchmark_suite.run_comprehensive_benchmark()

print("\n‚úÖ Comprehensive benchmarks complete!")
for benchmark_name, results in comprehensive_results['benchmarks'].items():
    if 'metrics' in results:
        print(f"\n{benchmark_name}:")
        for metric, value in results['metrics'].items():
            if isinstance(value, (int, float)):
                print(f"  {metric}: {value:.2f}")

## 6. Scalability Testing

Test InsightSpike's performance at different scales.

In [None]:
# Initialize scalability test suite
scalability_suite = ScalabilityTestSuite()

In [None]:
# Test memory scaling
print("üìà Testing memory system scalability...")
memory_scaling_results = scalability_suite.run_memory_scaling_test(
    episode_counts=[100, 500, 1000, 5000, 10000],
    query_count=50
)

print(f"\nüìä Scaling Analysis:")
print(f"Complexity: {memory_scaling_results['analysis']['scaling_complexity']}")

In [None]:
# Test graph scaling
print("üï∏Ô∏è Testing graph processing scalability...")
graph_scaling_results = scalability_suite.run_graph_scaling_test(
    node_counts=[100, 500, 1000, 2500],
    edge_density=0.1
)

print(f"\nüìä Graph Complexity:")
print(f"Estimated: {graph_scaling_results['complexity_analysis']['estimated_complexity']}")

In [None]:
# Test concurrent users
print("üë• Testing concurrent user handling...")
concurrent_results = scalability_suite.run_concurrent_user_test(
    user_counts=[1, 5, 10, 20],
    queries_per_user=5
)

print(f"\nüìä Scalability Metrics:")
metrics = concurrent_results['scalability_metrics']
print(f"Average Scalability Score: {metrics['avg_scalability_score']:.2f}")
print(f"Scalability Degradation: {metrics['scalability_degradation']*100:.1f}%")

In [None]:
# Compare model sizes
print("ü§ñ Comparing different model sizes...")
model_comparison = scalability_suite.run_model_size_comparison(
    models=["TinyLlama/TinyLlama-1.1B-Chat-v1.0", "microsoft/phi-2"],
    test_queries=20
)

print(f"\nüìä Model Comparison:")
if 'tradeoff_analysis' in model_comparison:
    print(f"Best Efficiency: {model_comparison['tradeoff_analysis']['best_efficiency_model']}")

## 7. Comparative Analysis

Compare InsightSpike with baseline RAG systems.

In [None]:
# Initialize comparative analysis suite
comparison_suite = ComparativeAnalysisSuite()

# Prepare knowledge base
knowledge_base = [
    "Machine learning is a subset of artificial intelligence that enables systems to learn from data.",
    "Deep learning uses neural networks with multiple layers to process complex patterns.",
    "Natural language processing helps computers understand and generate human language.",
    "Computer vision enables machines to interpret and understand visual information.",
    "Reinforcement learning allows agents to learn through interaction with an environment.",
    "Transfer learning leverages pre-trained models for new tasks with limited data.",
    "Generative AI creates new content like text, images, and music.",
    "Transformers revolutionized NLP with attention mechanisms.",
    "Graph neural networks process data with complex relationships.",
    "Federated learning enables training on distributed data while preserving privacy."
]

comparison_suite.prepare_knowledge_base(knowledge_base)
print("‚úÖ Knowledge base prepared for both systems")

In [None]:
# Run quality comparison
test_queries = [
    "What is the relationship between deep learning and machine learning?",
    "How do transformers work in NLP?",
    "What are the applications of computer vision?",
    "Explain the concept of transfer learning",
    "What is the difference between supervised and reinforcement learning?"
]

print("üìä Running quality comparison...")
quality_results = comparison_suite.run_quality_comparison(
    test_queries=test_queries * 4,  # Run 20 queries
    evaluate_quality=True
)

metrics = quality_results['comparative_metrics']
print(f"\nüìà Quality Improvements:")
print(f"Overall Quality: {metrics['quality_improvement']:.1f}%")
print(f"Relevance: {metrics['relevance_improvement']:.1f}%")
print(f"Spike Rate: {metrics['spike_rate']*100:.1f}%")

In [None]:
# Run performance comparison
print("‚ö° Running performance comparison...")
performance_results = comparison_suite.run_performance_comparison(
    workloads=[10, 25, 50]
)

print(f"\nüìä Performance Results:")
for measurement in performance_results['measurements']:
    print(f"\n{measurement['n_queries']} queries:")
    print(f"  InsightSpike: {measurement['insightspike']['avg_response_time']:.3f}s")
    print(f"  Baseline RAG: {measurement['baseline']['avg_response_time']:.3f}s")
    print(f"  Speedup: {measurement['speedup']:.2f}x")

In [None]:
# Run insight discovery comparison
creative_queries = [
    "What unexpected connections exist between quantum computing and biology?",
    "How might AI transform education in ways we haven't considered?",
    "What are the hidden relationships between climate change and technology?",
    "Propose novel applications of blockchain beyond cryptocurrency",
    "What insights can we draw from comparing human and artificial intelligence?"
]

print("üí° Running insight discovery comparison...")
insight_results = comparison_suite.run_insight_discovery_comparison(
    creative_queries=creative_queries * 2
)

print(f"\nüìä Insight Discovery Results:")
print(f"InsightSpike Discovery Rate: {insight_results['metrics']['insightspike_discovery_rate']*100:.1f}%")
print(f"Baseline Discovery Rate: {insight_results['metrics']['baseline_discovery_rate']*100:.1f}%")
print(f"Average Insight Advantage: {insight_results['metrics']['avg_insight_advantage']:.2f}")

## 8. Comprehensive Analysis

Run a complete comparison across multiple domains.

In [None]:
# Run comprehensive comparison (this will take significant time)
print("üèÉ Running comprehensive comparison across domains...")
print("This will test: general, scientific, creative, and analytical domains")
print("Expected time: 15-30 minutes\n")

comprehensive_comparison = comparison_suite.run_comprehensive_comparison(
    n_queries=40,  # 10 queries per domain
    domains=['general', 'scientific', 'creative', 'analytical']
)

print("\n‚úÖ Comprehensive comparison complete!")

## 9. Results Summary & Export

Summarize all results and save to Google Drive.

In [None]:
# Create summary report
from datetime import datetime
import json

summary = {
    'experiment_date': datetime.now().isoformat(),
    'environment': 'Google Colab',
    'gpu': torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None',
    'results_summary': {
        'rat_benchmark': {
            'accuracy': rat_results['metrics']['accuracy'],
            'spike_rate': rat_results['metrics']['spike_rate']
        } if 'rat_results' in locals() else {},
        'scalability': {
            'memory_complexity': memory_scaling_results['analysis']['scaling_complexity']
        } if 'memory_scaling_results' in locals() else {},
        'comparison': {
            'quality_improvement': quality_results['comparative_metrics']['quality_improvement']
        } if 'quality_results' in locals() else {}
    }
}

# Save to Drive
summary_path = f"{results_dir}/experiment_summary_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
with open(summary_path, 'w') as f:
    json.dump(summary, f, indent=2)

print(f"üìÑ Summary saved to: {summary_path}")

In [None]:
# Copy all results to Drive
import shutil
from pathlib import Path

local_results = Path('/content/InsightSpike-AI/experiments/colab_experiments/results')
if local_results.exists():
    drive_results = f"{results_dir}/detailed_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
    shutil.copytree(local_results, drive_results)
    print(f"üìÅ All results copied to: {drive_results}")

## 10. Resource Monitoring & Cleanup

In [None]:
# Monitor resource usage
import psutil
import GPUtil

print("üíª System Resources:")
print(f"CPU Usage: {psutil.cpu_percent()}%")
print(f"RAM Usage: {psutil.virtual_memory().percent}%")
print(f"Available RAM: {psutil.virtual_memory().available / 1e9:.1f} GB")

if torch.cuda.is_available():
    gpus = GPUtil.getGPUs()
    for gpu in gpus:
        print(f"\nGPU {gpu.id}: {gpu.name}")
        print(f"  Memory Used: {gpu.memoryUsed:.0f} MB / {gpu.memoryTotal:.0f} MB")
        print(f"  GPU Utilization: {gpu.load*100:.1f}%")

In [None]:
# Cleanup GPU memory
import gc

if torch.cuda.is_available():
    torch.cuda.empty_cache()
    gc.collect()
    print("‚úÖ GPU memory cleared")

## üìä Final Summary

Congratulations! You've successfully run comprehensive InsightSpike experiments. Here's what we accomplished:

1. **Insight Benchmarks**: Tested creative problem-solving capabilities
2. **Scalability Tests**: Verified performance with increasing data sizes
3. **Comparative Analysis**: Demonstrated advantages over baseline RAG

All results have been saved to your Google Drive for further analysis.

### Next Steps:
- Analyze the detailed results in the generated reports
- Fine-tune parameters based on your specific use case
- Extend experiments with custom datasets
- Share results with the InsightSpike community

---

*Following CLAUDE.md guidelines: All experiments used genuine processing without mocks or cheats.*