# MI-Identifiability Regularization Experiments

This notebook runs all regularization experiments for testing the identifiability of MI criteria.

## 🔒 AUTO-SAVE PROTECTION
**NEW: Results are saved to Google Drive automatically after each experiment batch (each full `main.py` run)!**
- Results survive runtime disconnections
- Each session creates a timestamped folder
- Results are saved incrementally as batches complete

**Steps:**
1. Enable a GPU runtime
2. Install dependencies
3. Mount Google Drive (for auto-save)
4. Upload your code files
5. Run baseline and regularization experiments
6. Analyze results

## 1. Enable GPU in Colab

**IMPORTANT: Before running any code, enable GPU:**

1. Click **Runtime** in the top menu
2. Select **Change runtime type**
3. Under **Hardware accelerator**, select **T4 GPU** (or any available GPU)
4. Click **Save**

Then run the cells below to verify GPU access.

## 2. Setup and Installation

In [None]:
# Check GPU availability
import torch

if torch.cuda.is_available():
    print(f"✓ GPU is available!")
    print(f"  GPU Name: {torch.cuda.get_device_name(0)}")
    print(f"  GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    DEVICE = 'cuda:0'
else:
    print("✗ No GPU available. Please enable GPU in Runtime > Change runtime type")
    print("  Falling back to CPU (will be slower)")
    DEVICE = 'cpu'

print(f"\nUsing device: {DEVICE}")

In [None]:
# Install dependencies
!pip install tqdm matplotlib numpy scipy pandas torch networkx torchvision seaborn -q

## 3. Mount Google Drive (Optional)

In [None]:
# Check if we're in Colab and setup auto-save to Drive
import sys
import os
from datetime import datetime

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print("Running in Google Colab")
    # Mount Google Drive for automatic saving
    from google.colab import drive
    drive.mount('/content/drive')
    
    # Create timestamped folder for this session
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    DRIVE_SAVE_DIR = f'/content/drive/MyDrive/MI_Experiments_{timestamp}'
    os.makedirs(DRIVE_SAVE_DIR, exist_ok=True)
    os.makedirs(f'{DRIVE_SAVE_DIR}/logs', exist_ok=True)
    os.makedirs(f'{DRIVE_SAVE_DIR}/analysis', exist_ok=True)
    
    print(f"\n{'='*70}")
    print(f"✓ AUTO-SAVE ENABLED!")
    print(f"Results will be saved to: {DRIVE_SAVE_DIR}")
    print(f"This protects you from losing results if the runtime disconnects!")
    print(f"{'='*70}\n")
else:
    print("Not running in Colab")
    DRIVE_SAVE_DIR = None

In [None]:
# Helper function to save results immediately after each experiment
import shutil
import glob

def save_latest_results_to_drive():
    """Copy the local logs directory to Google Drive.

    Returns True if the copy succeeded, False otherwise (including when
    not running in Colab or when there is nothing to save yet).
    """
    if not IN_COLAB or DRIVE_SAVE_DIR is None:
        return False
    
    # Resolve logs relative to the working directory so this also works
    # after `%cd` into a cloned repo (Option B below)
    logs_dir = os.path.abspath('logs')
    
    try:
        # Find all run directories
        run_dirs = glob.glob(os.path.join(logs_dir, 'run_*'))
        if not run_dirs:
            print("⚠ No results to save yet")
            return False
        
        print(f"\n💾 Saving results to Google Drive...")
        
        # Copy over the existing backup instead of deleting it first, so a
        # failed copy never leaves Drive without the previous backup
        drive_logs = f'{DRIVE_SAVE_DIR}/logs'
        shutil.copytree(logs_dir, drive_logs, dirs_exist_ok=True)
        
        print(f"✓ Saved {len(run_dirs)} experiment runs to Drive")
        print(f"  Location: {drive_logs}")
        
        # Also save a progress log
        with open(f'{DRIVE_SAVE_DIR}/progress.txt', 'a') as f:
            f.write(f"{datetime.now().strftime('%Y-%m-%d %H:%M:%S')} - Saved {len(run_dirs)} runs\n")
        
        return True
        
    except Exception as e:
        print(f"⚠ Error saving to Drive: {e}")
        print(f"  Your results are still in {logs_dir}")
        return False

print("✓ Auto-save helper function loaded")
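
The helper above copies the whole logs directory after every batch, which gets slower as runs accumulate. Below is a minimal sketch of an incremental alternative. It assumes each finished run lives in its own `run_*` directory and is not modified after it completes, which this notebook does not confirm. The name `sync_new_runs` is hypothetical and not part of the project code.

```python
import glob
import os
import shutil


def sync_new_runs(src_logs, dst_logs):
    """Copy run_* directories from src_logs that are missing in dst_logs.

    Hypothetical helper: assumes a run directory is complete once it
    exists and is never modified afterwards.
    Returns the sorted list of run directory names that were copied.
    """
    os.makedirs(dst_logs, exist_ok=True)
    copied = []
    for run_dir in sorted(glob.glob(os.path.join(src_logs, 'run_*'))):
        name = os.path.basename(run_dir)
        target = os.path.join(dst_logs, name)
        # Skip anything that is not a directory or is already backed up
        if os.path.isdir(run_dir) and not os.path.exists(target):
            shutil.copytree(run_dir, target)
            copied.append(name)
    return copied
```

In this notebook it could be called as `sync_new_runs('logs', f'{DRIVE_SAVE_DIR}/logs')`. One trade-off: a copy that fails halfway leaves a partial directory on Drive that later calls will skip, so the full-copy helper remains the safer default.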

## 4. Upload Your Code Files

You have two options:

**Option A: Upload files directly**

In [None]:
if IN_COLAB:
    from google.colab import files
    
    print("Please upload the following files:")
    print("- main.py")
    print("- analyze_regularization.py")
    print("- setup.py")
    print("- The entire 'mi_identifiability' folder (zipped)")
    
    uploaded = files.upload()
    
    # If mi_identifiability is uploaded as a zip, extract it
    import zipfile
    for filename in uploaded.keys():
        if filename.endswith('.zip'):
            with zipfile.ZipFile(filename, 'r') as zip_ref:
                zip_ref.extractall('.')
            print(f"Extracted {filename}")

**Option B: Clone from GitHub (if your code is in a repo)**

In [None]:
# Uncomment and modify if cloning from GitHub
# !git clone https://github.com/YOUR_USERNAME/MI-identifiability.git
# %cd MI-identifiability

## 5. Verify Setup

In [None]:
# Check that required files exist
import os

required_files = ['main.py', 'analyze_regularization.py']
required_dirs = ['mi_identifiability']

print("Checking for required files...")
for f in required_files:
    if os.path.exists(f):
        print(f"✓ {f} found")
    else:
        print(f"✗ {f} NOT FOUND")

for d in required_dirs:
    if os.path.isdir(d):
        print(f"✓ {d}/ directory found")
    else:
        print(f"✗ {d}/ directory NOT FOUND")

# List all files in current directory
print("\nCurrent directory contents:")
!ls -la

## 6. Run Baseline Experiment

### 6.1. Quick Test (1 experiment)

First, let's run just 1 experiment to verify everything works:

In [None]:
# Quick test with just 1 experiment to verify setup
!python main.py --verbose --val-frequency 1 --noise-std 0.0 \
    --target-logic-gates XOR \
    --n-experiments 1 --size 3 --depth 2 \
    --device {DEVICE}
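
The `!python ...` syntax only works inside IPython. As a rough sketch, the same command can be built as an argument list and run with `subprocess`, which also makes the exit code easy to check. The flag names are copied from the cell above; `build_command`, `run_experiment`, and their parameters are hypothetical, and the sketch assumes `main.py` accepts exactly these flags.

```python
import subprocess
import sys


def build_command(n_experiments, device, extra_flags=None):
    """Return the main.py invocation used in this notebook as an argv list.

    extra_flags: optional dict such as {'--l1-lambda': 0.001}.
    """
    cmd = [
        sys.executable, 'main.py',
        '--verbose',
        '--val-frequency', '1',
        '--noise-std', '0.0',
        '--target-logic-gates', 'XOR',
        '--n-experiments', str(n_experiments),
        '--size', '3',
        '--depth', '2',
    ]
    # Append any regularization flags before the device flag
    for flag, value in (extra_flags or {}).items():
        cmd += [flag, str(value)]
    cmd += ['--device', device]
    return cmd


def run_experiment(n_experiments, device, extra_flags=None):
    """Run main.py and return True if it exited with status 0."""
    result = subprocess.run(build_command(n_experiments, device, extra_flags))
    return result.returncode == 0
```

For example, `run_experiment(1, DEVICE)` would correspond to the quick test above, and a `False` return value signals that the script failed before any results were written.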

### 6.2. Full Baseline (100 experiments)

If the test above worked, run the full baseline:

In [None]:
# Run baseline (no regularization)
!python main.py --verbose --val-frequency 1 --noise-std 0.0 \
    --target-logic-gates XOR \
    --n-experiments 100 --size 3 --depth 2 \
    --device {DEVICE}

# Automatically save results to Drive
save_latest_results_to_drive()

### 6.3. Check Baseline Results

In [None]:
# Check if results were saved
import os
import glob

print("Checking for saved results...\n")

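# Find all run directories (resolved from the working directory, matching
# the relative `logs` path passed to analyze_regularization.py later on)
run_dirs = glob.glob(os.path.join(os.path.abspath('logs'), 'run_*'))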
print(f"Found {len(run_dirs)} run directories:")
for d in sorted(run_dirs):
    print(f"  {d}")

if run_dirs:
    latest_run = sorted(run_dirs)[-1]
    print(f"\nLatest run: {latest_run}")
    
    # Check what files exist
    print(f"\nFiles in latest run:")
    !ls -lh {latest_run}
    
    # Try to read the results
    import pandas as pd
    
    df_out_path = f"{latest_run}/df_out.csv"
    if os.path.exists(df_out_path):
        df = pd.read_csv(df_out_path)
        print(f"\n✓ Results file found with {len(df)} rows")
        print("\nFirst few rows:")
        print(df.head())
        
        if len(df) == 0:
            print("\n⚠ WARNING: Results file is empty!")
            print("This means no experiments converged successfully.")
            print("\nPossible reasons:")
            print("1. Model isn't converging (loss/accuracy not meeting thresholds)")
            print("2. Training is failing silently")
            print("3. GPU issues")
            
            # Check the log file
            log_path = f"{latest_run}/output.log"
            if os.path.exists(log_path):
                print("\nChecking log file for 'No convergence' messages...")
                !grep -c "No convergence" {log_path} || echo "No convergence messages found"
                print("\nLast 20 lines of log:")
                !tail -20 {log_path}
    else:
        print(f"\n✗ No df_out.csv found")
        
        # Check for data_tmp.csv
        tmp_path = f"{latest_run}/data_tmp.csv"
        if os.path.exists(tmp_path):
            df_tmp = pd.read_csv(tmp_path)
            print(f"\n✓ Temporary data file found with {len(df_tmp)} rows")
        else:
            print("\n✗ No temporary data file found either")
else:
    print("\n✗ No run directories found at all!")
    print("The experiments may have failed to start.")
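
The cell above picks the latest run with `sorted(run_dirs)[-1]`, which is only correct if run directory names sort chronologically. Here is a small sketch that selects by modification time instead, in case the naming scheme does not guarantee that. `latest_run_dir` is a hypothetical helper name.

```python
import glob
import os


def latest_run_dir(logs_dir='logs'):
    """Return the most recently modified run_* directory, or None."""
    run_dirs = [d for d in glob.glob(os.path.join(logs_dir, 'run_*'))
                if os.path.isdir(d)]
    if not run_dirs:
        return None
    # A directory's mtime changes when files are added to or removed from it
    return max(run_dirs, key=os.path.getmtime)
```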

## 7. Run L1 Regularization Experiments
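
The `--l1-lambda` flag is handled inside `main.py`, which is not shown in this notebook. Assuming it implements the standard L1 penalty, the training objective would take the form below, where $\mathcal{L}_{\text{task}}$ is the unregularized loss, $\theta_i$ are the network parameters, and $\lambda$ is the value swept in the following cell. Whether biases are included in the sum is not specified here.

$$
\mathcal{L}_{\text{L1}}(\theta) = \mathcal{L}_{\text{task}}(\theta) + \lambda \sum_i |\theta_i|
$$

Under this form, larger values of $\lambda$ push more parameters toward exactly zero.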

In [None]:
# L1 regularization experiments
l1_lambdas = [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05]

for lambda_val in l1_lambdas:
    print(f"\n{'='*60}")
    print(f"Running L1 experiment with lambda={lambda_val}")
    print(f"{'='*60}\n")
    
    !python main.py --verbose --val-frequency 1 --noise-std 0.0 \
        --target-logic-gates XOR \
        --n-experiments 100 --size 3 --depth 2 \
        --l1-lambda {lambda_val} \
        --device {DEVICE}
    
    # Save after each lambda value completes
    print(f"\nCompleted L1 lambda={lambda_val}")
    save_latest_results_to_drive()

## 8. Run L2 Regularization Experiments
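
The `--l2-lambda` flag is handled inside `main.py`, which is not shown in this notebook. Assuming a standard squared-norm penalty, the objective would look like the expression below, where $\mathcal{L}_{\text{task}}$ is the unregularized loss, $\theta_i$ are the network parameters, and $\lambda$ is the value swept in the following cell. Some implementations include a factor of $\tfrac{1}{2}$ or apply the penalty as optimizer weight decay instead; the notebook does not say which variant is used.

$$
\mathcal{L}_{\text{L2}}(\theta) = \mathcal{L}_{\text{task}}(\theta) + \lambda \sum_i \theta_i^2
$$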

In [None]:
# L2 regularization experiments
l2_lambdas = [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]

for lambda_val in l2_lambdas:
    print(f"\n{'='*60}")
    print(f"Running L2 experiment with lambda={lambda_val}")
    print(f"{'='*60}\n")
    
    !python main.py --verbose --val-frequency 1 --noise-std 0.0 \
        --target-logic-gates XOR \
        --n-experiments 100 --size 3 --depth 2 \
        --l2-lambda {lambda_val} \
        --device {DEVICE}
    
    # Save after each lambda value completes
    print(f"\nCompleted L2 lambda={lambda_val}")
    save_latest_results_to_drive()

## 9. Run Dropout Experiments
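
The `--dropout-rate` flag is interpreted by `main.py`, which this notebook does not show. As a rough illustration of what a dropout rate means, here is a pure-Python sketch of inverted dropout, the variant used by `torch.nn.Dropout`: during training each activation is zeroed with probability `rate`, and the survivors are scaled by `1 / (1 - rate)`. The function name and signature are hypothetical.

```python
import random


def inverted_dropout(activations, rate, training=True, rng=None):
    """Apply inverted dropout to a list of floats.

    Each value is zeroed with probability `rate`; survivors are scaled by
    1 / (1 - rate) so the expected value is unchanged. At evaluation time
    (training=False) the input is returned unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - rate)
    # rng.random() is uniform on [0, 1), so a value is kept with
    # probability 1 - rate
    return [x * scale if rng.random() >= rate else 0.0 for x in activations]
```

With `rate=0.5`, roughly half of the activations are zeroed on each training step and the rest are doubled.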

In [None]:
# Dropout experiments
dropout_rates = [0.1, 0.2, 0.3, 0.4, 0.5]

for rate in dropout_rates:
    print(f"\n{'='*60}")
    print(f"Running Dropout experiment with rate={rate}")
    print(f"{'='*60}\n")
    
    !python main.py --verbose --val-frequency 1 --noise-std 0.0 \
        --target-logic-gates XOR \
        --n-experiments 100 --size 3 --depth 2 \
        --dropout-rate {rate} \
        --device {DEVICE}
    
    # Save after each dropout rate completes
    print(f"\nCompleted Dropout rate={rate}")
    save_latest_results_to_drive()
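
Sections 6 to 9 together launch 19 configurations (1 baseline, 6 L1, 7 L2, 5 dropout) of 100 experiments each. The sketch below enumerates the same sweep as data, which can help when estimating total runtime or deciding where to resume after a disconnect. The value lists are copied from the cells above; the `sweep_configs` name and the tuple layout are illustrative.

```python
def sweep_configs():
    """Return (label, flag, value) tuples for every run in this notebook."""
    l1_lambdas = [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05]
    l2_lambdas = [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]
    dropout_rates = [0.1, 0.2, 0.3, 0.4, 0.5]

    # The baseline run passes no regularization flag at all
    configs = [('baseline', None, None)]
    configs += [('l1', '--l1-lambda', v) for v in l1_lambdas]
    configs += [('l2', '--l2-lambda', v) for v in l2_lambdas]
    configs += [('dropout', '--dropout-rate', v) for v in dropout_rates]
    return configs


configs = sweep_configs()
n_experiments = 100
print(f"{len(configs)} configurations, {len(configs) * n_experiments} experiments in total")
```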

## 10. Analyze Results

In [None]:
# Run analysis on all results
!python analyze_regularization.py logs --output-dir analysis_output

# Save analysis results to Drive
if IN_COLAB and DRIVE_SAVE_DIR:
    import shutil
    if os.path.exists('analysis_output'):
        drive_analysis = f'{DRIVE_SAVE_DIR}/analysis'
        if os.path.exists(drive_analysis):
            shutil.rmtree(drive_analysis)
        shutil.copytree('analysis_output', drive_analysis)
        print(f"\n✓ Analysis saved to: {drive_analysis}")

## 11. View Analysis Results

In [None]:
# Display summary
with open('analysis_output/analysis_summary.txt', 'r') as f:
    print(f.read())

In [None]:
# Display statistical test results
import pandas as pd

print("\nL1 Statistical Tests:")
if os.path.exists('analysis_output/l1_statistical_tests.csv'):
    df_l1 = pd.read_csv('analysis_output/l1_statistical_tests.csv')
    display(df_l1)

print("\nL2 Statistical Tests:")
if os.path.exists('analysis_output/l2_statistical_tests.csv'):
    df_l2 = pd.read_csv('analysis_output/l2_statistical_tests.csv')
    display(df_l2)

print("\nDropout Statistical Tests:")
if os.path.exists('analysis_output/dropout_statistical_tests.csv'):
    df_dropout = pd.read_csv('analysis_output/dropout_statistical_tests.csv')
    display(df_dropout)

In [None]:
# Display plots
from IPython.display import Image, display
import glob

plot_files = glob.glob('analysis_output/*.png')
for plot_file in sorted(plot_files):
    print(f"\n{plot_file}:")
    display(Image(filename=plot_file))

## 12. Download Results (Optional)

In [None]:
if IN_COLAB:
    # Create a zip file of all results
    !zip -r results.zip logs analysis_output
    
    # Download
    from google.colab import files
    files.download('results.zip')

## 13. Results Already Saved!

✓ Your results are saved to Google Drive automatically after each experiment batch!

Location: the `MI_Experiments_[TIMESTAMP]` folder in your Drive

The folder contains:
- `logs/` - All experiment results
- `analysis/` - Analysis outputs
- `progress.txt` - Log of what's been saved

You can also manually verify or copy additional files:

In [None]:
if IN_COLAB and DRIVE_SAVE_DIR:
    print(f"Your results are saved at: {DRIVE_SAVE_DIR}")
    print(f"\nFolder contents:")
    !ls -lh {DRIVE_SAVE_DIR}
    print(f"\nNumber of experiment runs saved:")
    !ls -d {DRIVE_SAVE_DIR}/logs/run_* 2>/dev/null | wc -l
    
    # Show progress log
    progress_file = f"{DRIVE_SAVE_DIR}/progress.txt"
    if os.path.exists(progress_file):
        print(f"\nSave history:")
        with open(progress_file, 'r') as f:
            print(f.read())

## Alternative: Run a Smaller Test First

If you want to test with fewer experiments first:

In [None]:
# Quick test with just 10 experiments
!python main.py --verbose --val-frequency 1 --noise-std 0.0 \
    --target-logic-gates XOR \
    --n-experiments 10 --size 3 --depth 2 \
    --device {DEVICE}

# Test with one L1 value
!python main.py --verbose --val-frequency 1 --noise-std 0.0 \
    --target-logic-gates XOR \
    --n-experiments 10 --size 3 --depth 2 \
    --l1-lambda 0.001 \
    --device {DEVICE}