# MI-Identifiability Regularization Experiments

This notebook runs all regularization experiments for testing the identifiability of MI criteria.

## 🔒 AUTO-SAVE PROTECTION
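**NEW: Results are automatically saved to Google Drive after each `main.py` run completes.**
- Completed runs survive a runtime disconnection (a run still in progress when the runtime drops may be lost)
- Each session creates its own timestamped folder
- Results are saved incrementally: after the baseline and after every regularization setting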

**Steps:**
1. Install dependencies
2. Mount Google Drive (for auto-save)
3. Upload your code files
4. Run baseline and regularization experiments
5. Analyze results

## 1. Enable GPU in Colab

**IMPORTANT: Before running any code, enable a GPU:**

1. Click **Runtime** in the top menu
2. Select **Change runtime type**
3. Under **Hardware accelerator**, select **T4 GPU** (or any available GPU)
4. Click **Save**

Then run the cells below to verify GPU access.

## 2. Setup and Installation

In [None]:
# Check GPU availability
import torch

if torch.cuda.is_available():
    print("✓ GPU is available!")
    print(f"  GPU Name: {torch.cuda.get_device_name(0)}")
    print(f"  GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    DEVICE = 'cuda:0'
else:
    print("✗ No GPU available. Please enable GPU in Runtime > Change runtime type")
    print("  Falling back to CPU (will be slower)")
    DEVICE = 'cpu'

print(f"\nUsing device: {DEVICE}")

✓ GPU is available!
  GPU Name: NVIDIA L4
  GPU Memory: 23.80 GB

Using device: cuda:0


In [None]:
# Install dependencies
!pip install tqdm matplotlib numpy scipy pandas torch networkx torchvision seaborn -q

## 3. Mount Google Drive (Optional)

In [None]:
# Check if we're in Colab and setup auto-save to Drive
import sys
import os
from datetime import datetime

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print("Running in Google Colab")
    # Mount Google Drive for automatic saving
    from google.colab import drive
    drive.mount('/content/drive')

    # Create timestamped folder for this session
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    DRIVE_SAVE_DIR = f'/content/drive/MyDrive/MI_Experiments_{timestamp}'
    os.makedirs(DRIVE_SAVE_DIR, exist_ok=True)
    os.makedirs(f'{DRIVE_SAVE_DIR}/logs', exist_ok=True)
    os.makedirs(f'{DRIVE_SAVE_DIR}/analysis', exist_ok=True)

    print(f"\n{'='*70}")
    print("✓ AUTO-SAVE ENABLED!")
    print(f"Results will be saved to: {DRIVE_SAVE_DIR}")
    print("Completed runs will survive a runtime disconnect.")
    print(f"{'='*70}\n")
else:
    print("Not running in Colab")
    DRIVE_SAVE_DIR = None

Running in Google Colab
Mounted at /content/drive

✓ AUTO-SAVE ENABLED!
Results will be saved to: /content/drive/MyDrive/MI_Experiments_20251121_033515
This protects you from losing results if runtime disconnects!



In [None]:
# Helper function to save results immediately after each experiment
import shutil
import glob

def save_latest_results_to_drive():
    """Copy the local logs/ directory to Google Drive; return True on success."""
    if not IN_COLAB or DRIVE_SAVE_DIR is None:
        return False

    try:
        # Find all run directories
        run_dirs = glob.glob('logs/run_*')
        if not run_dirs:
            print("⚠ No results to save yet")
            return False

        print("\n💾 Saving results to Google Drive...")

        # Merge into the existing backup instead of deleting it first, so an
        # interrupted copy never leaves Drive without any backup
        # (dirs_exist_ok requires Python 3.8+)
        drive_logs = f'{DRIVE_SAVE_DIR}/logs'
        shutil.copytree('logs', drive_logs, dirs_exist_ok=True)

        print(f"✓ Saved {len(run_dirs)} experiment runs to Drive")
        print(f"  Location: {drive_logs}")

        # Also append a line to the progress log
        with open(f'{DRIVE_SAVE_DIR}/progress.txt', 'a') as f:
            f.write(f"{datetime.now().strftime('%Y-%m-%d %H:%M:%S')} - Saved {len(run_dirs)} runs\n")

        return True

    except Exception as e:
        print(f"⚠ Error saving to Drive: {e}")
        print(f"  Your results are still in {os.path.abspath('logs')}")
        return False

print("✓ Auto-save helper function loaded")

save_latest_results_to_drive()

✓ Auto-save helper function loaded
⚠ No results to save yet
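The helper above recopies the entire `logs/` tree on every call, which gets slower as runs accumulate. If that becomes a problem, a hypothetical incremental variant could copy only `run_*` directories that are not yet on Drive. This sketch assumes each run directory is complete and no longer modified once `main.py` returns; `sync_new_runs` is an illustrative name, not part of the repository.

```python
import os
import shutil


def sync_new_runs(src_logs: str, dst_logs: str) -> list:
    """Copy run_* directories from src_logs that are missing in dst_logs.

    Hypothetical helper. Assumes a run_* directory is never modified after
    the main.py call that produced it has finished.
    """
    os.makedirs(dst_logs, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_logs)):
        src = os.path.join(src_logs, name)
        dst = os.path.join(dst_logs, name)
        # Skip anything that is not a run directory
        if not (name.startswith("run_") and os.path.isdir(src)):
            continue
        # Skip runs that are already backed up
        if os.path.exists(dst):
            continue
        # Copy under a temporary name, then rename, so an interrupted copy
        # is redone on the next call instead of looking like a finished backup
        tmp = dst + ".partial"
        if os.path.exists(tmp):
            shutil.rmtree(tmp)
        shutil.copytree(src, tmp)
        os.rename(tmp, dst)
        copied.append(name)
    return copied
```

In the sweep cells it could replace the full copy, e.g. `sync_new_runs('logs', f'{DRIVE_SAVE_DIR}/logs')`; the returned list tells you which runs were new.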


## 4. Upload Your Code Files

You have two options:

**Option A: Upload files directly**

In [None]:
if IN_COLAB:
    from google.colab import files

    print("Please upload the following files:")
    print("- main.py")
    print("- analyze_regularization.py")
    print("- setup.py")
    print("- The entire 'mi_identifiability' folder (zipped)")

    uploaded = files.upload()

    # If mi_identifiability is uploaded as a zip, extract it
    import zipfile
    for filename in uploaded.keys():
        if filename.endswith('.zip'):
            with zipfile.ZipFile(filename, 'r') as zip_ref:
                zip_ref.extractall('.')
            print(f"Extracted {filename}")

Please upload the following files:
- main.py
- analyze_regularization.py
- setup.py
- The entire 'mi_identifiability' folder (zipped)
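A common problem with Option A is a zip whose top-level entry is not `mi_identifiability/` (for example, when the folder was zipped from a parent directory). `extractall('.')` then places the package one level too deep, and `main.py` presumably cannot import it. A small check, sketched here with the hypothetical helper `check_package_zip`, can run before extraction:

```python
import zipfile


def check_package_zip(zip_path: str, package: str = "mi_identifiability") -> bool:
    """Return True if the archive contains `package/` at its top level.

    Hypothetical helper: after extractall('.'), only a top-level folder
    ends up next to main.py in the working directory.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return any(name.startswith(package + "/") for name in zf.namelist())
```

For example, inside the upload loop: `if not check_package_zip(filename): print(f"⚠ {filename} has no top-level mi_identifiability/ folder")`.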


**Option B: Clone from GitHub (if your code is in a repo)**

In [None]:
# Clone the repository (change the URL if you use your own fork)
!git clone https://github.com/gwherb/MI-identifiability.git
%cd MI-identifiability

Cloning into 'MI-identifiability'...
remote: Enumerating objects: 98, done.
remote: Counting objects: 100% (79/79), done.
remote: Compressing objects: 100% (63/63), done.
remote: Total 98 (delta 29), reused 57 (delta 13), pack-reused 19 (from 1)
Receiving objects: 100% (98/98), 5.02 MiB | 3.78 MiB/s, done.
Resolving deltas: 100% (29/29), done.
/content/MI-identifiability


## 5. Verify Setup

In [None]:
# Check that required files exist
import os

required_files = ['main.py', 'analyze_regularization.py']
required_dirs = ['mi_identifiability']

print("Checking for required files...")
for f in required_files:
    if os.path.exists(f):
        print(f"✓ {f} found")
    else:
        print(f"✗ {f} NOT FOUND")

for d in required_dirs:
    if os.path.isdir(d):
        print(f"✓ {d}/ directory found")
    else:
        print(f"✗ {d}/ directory NOT FOUND")

# List all files in current directory
print("\nCurrent directory contents:")
!ls -la

Checking for required files...
✓ main.py found
✓ analyze_regularization.py found
✓ mi_identifiability/ directory found

Current directory contents:
total 3456
drwxr-xr-x 11 root root    4096 Nov 21 03:35 .
drwxr-xr-x  1 root root    4096 Nov 21 03:35 ..
drwxr-xr-x  2 root root    4096 Nov 21 03:35 analysis_output_run1
drwxr-xr-x  2 root root    4096 Nov 21 03:35 analysis_output_run2
drwxr-xr-x  2 root root    4096 Nov 21 03:35 analysis_output_run3
-rw-r--r--  1 root root    9027 Nov 21 03:35 analyze_regularization.py
-rw-r--r--  1 root root   18772 Nov 21 03:35 demo_mnist.ipynb
-rw-r--r--  1 root root 3317198 Nov 21 03:35 demo_xor.ipynb
drwxr-xr-x  8 root root    4096 Nov 21 03:35 .git
-rw-r--r--  1 root root    3109 Nov 21 03:35 .gitignore
-rw-r--r--  1 root root    1071 Nov 21 03:35 LICENSE
-rw-r--r--  1 root root   11389 Nov 21 03:35 main.py
drwxr-xr-x  2 root root    4096 Nov 21 03:35 MI_Experiments_20251030_024609
drwxr-xr-x  2 root root    4096 Nov 21 03:35 MI_Experiments_2
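The check above only prints ✗ and lets the notebook continue, so a missing file surfaces later as a less obvious error from `!python main.py`. A fail-fast variant could raise immediately instead; `missing_paths` and `require_setup` are illustrative names, and the default paths are the ones this notebook already checks.

```python
import os


def missing_paths(required_files, required_dirs, root="."):
    """Return the required files and directories that are absent under `root`."""
    missing = [f for f in required_files if not os.path.isfile(os.path.join(root, f))]
    missing += [d + "/" for d in required_dirs if not os.path.isdir(os.path.join(root, d))]
    return missing


def require_setup(required_files=("main.py", "analyze_regularization.py"),
                  required_dirs=("mi_identifiability",), root="."):
    """Raise FileNotFoundError listing every missing path; return None if all exist."""
    missing = missing_paths(required_files, required_dirs, root)
    if missing:
        raise FileNotFoundError(
            f"Missing from {os.path.abspath(root)}: {', '.join(missing)}"
        )
```

Calling `require_setup()` at the end of the verification cell stops *Run all* at the first missing path, before any experiment starts.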

## 6. Run Baseline Experiment

### 6.1. Quick Test Run (1 experiment)

In [None]:
%%time
# Quick check: a single experiment, validating every epoch
!python main.py --verbose --val-frequency 1 --noise-std 0.0 \
    --target-logic-gates XOR \
    --n-samples-val 20 \
    --n-repeats 1 \
    --min-sparsity 0 \
    --use-gpu-batching \
    --gpu-batch-size 4096 \
    --n-experiments 1 --size 3 --depth 2 \
    --device {DEVICE}


2025-11-18 20:11:09,267 - INFO - Configuration in use:
2025-11-18 20:11:09,267 - INFO - Namespace(seed=0, size=[3], depth=[2], n_repeats=1, n_experiments=1, noise_std=0.0, n_samples_train=1000, n_samples_val=20, n_gates=[1], batch_size=100, learning_rate=[0.001], epochs=1000, max_circuits=None, min_sparsity=0.0, loss_target=[0.01], skewed_distribution=False, device='cuda:0', target_logic_gates=['XOR'], accuracy_threshold=0.99, val_frequency=1, verbose=True, resume_from=None, l1_lambda=0.0, l2_lambda=0.0, dropout_rate=0.0, use_gpu_batching=True, gpu_batch_size=4096)
2025-11-18 20:11:09,268 - INFO - Setting the seeds: 0
Iteration # 0
2025-11-18 20:11:11,897 - INFO - Epoch [1/1000], Train Loss: 0.4492, Train Accuracy: 0.5022
2025-11-18 20:11:11,897 - INFO - Val Loss: 0.4036, Val Accuracy: 0.5125, Bad Epochs: 0
2025-11-18 20:11:11,985 - INFO - Epoch [2/1000], Train Loss: 0.3710, Train Accuracy: 0.5022
2025-11-18 20:11:11,985 - INFO - Val Loss: 0.3349, Val Accuracy: 0.5125, Bad Epochs: 0
20

### 6.2. Full Baseline (100 experiments)

If the test above worked, run the full baseline:

In [None]:
%%time
# Run baseline (no regularization)
!python main.py --verbose --val-frequency 1 --noise-std 0.0 \
    --target-logic-gates XOR \
    --n-samples-val 20 \
    --n-repeats 1 \
    --min-sparsity 0 \
    --early-stopping-steps 200 \
    --use-gpu-batching \
    --gpu-batch-size 4096 \
    --n-experiments 100 --size 3 --depth 2 \
    --device {DEVICE}

# Automatically save results to Drive
save_latest_results_to_drive()

Streaming output truncated to the last 5000 lines.
2025-11-21 03:45:36,896 - INFO - Val Loss: 0.2516, Val Accuracy: 0.7375, Bad Epochs: 81
2025-11-21 03:45:36,982 - INFO - Epoch [338/1000], Train Loss: 0.2495, Train Accuracy: 0.7690
2025-11-21 03:45:36,983 - INFO - Val Loss: 0.2517, Val Accuracy: 0.7375, Bad Epochs: 82
2025-11-21 03:45:37,077 - INFO - Epoch [339/1000], Train Loss: 0.2495, Train Accuracy: 0.7690
2025-11-21 03:45:37,077 - INFO - Val Loss: 0.2516, Val Accuracy: 0.7375, Bad Epochs: 83
2025-11-21 03:45:37,173 - INFO - Epoch [340/1000], Train Loss: 0.2495, Train Accuracy: 0.5182
2025-11-21 03:45:37,173 - INFO - Val Loss: 0.2515, Val Accuracy: 0.4500, Bad Epochs: 84
2025-11-21 03:45:37,261 - INFO - Epoch [341/1000], Train Loss: 0.2495, Train Accuracy: 0.7690
2025-11-21 03:45:37,262 - INFO - Val Loss: 0.2516, Val Accuracy: 0.7375, Bad Epochs: 85
2025-11-21 03:45:37,352 - INFO - Epoch [342/1000], Train Loss: 0.2495, Train Accuracy: 0.7690
2025-11-21 03:45:37,352 -

True
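The full runs add `--early-stopping-steps 200`, and the logs print a running `Bad Epochs` counter. This notebook does not show how `main.py` implements it. The sketch below is the usual patience rule, assuming `Bad Epochs` counts validation checks since the last new best validation loss and a run stops once the counter reaches the limit. That reading fits the baseline log above, where the counter keeps rising while the validation loss hovers around 0.2516. `EarlyStopper` and `min_delta` are illustrative, not options of `main.py`.

```python
import math
from dataclasses import dataclass


@dataclass
class EarlyStopper:
    """Patience-based early stopping on validation loss (illustrative sketch)."""
    patience: int = 200       # plays the role of --early-stopping-steps
    min_delta: float = 0.0    # minimum improvement that counts; an assumption
    best: float = math.inf
    bad_epochs: int = 0

    def step(self, val_loss: float) -> bool:
        """Record one validation check; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            # New best: reset the counter
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

With `--val-frequency 1` there is one check per epoch, so a patience of 200 allows up to 200 epochs without improvement before stopping.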

## 7. Run L1 Regularization Experiments

In [None]:
%%time
# L1 regularization experiments
l1_lambdas = [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]

for lambda_val in l1_lambdas:
    print(f"\n{'='*60}")
    print(f"Running L1 experiment with lambda={lambda_val}")
    print(f"{'='*60}\n")

    !python main.py --verbose --val-frequency 1 --noise-std 0.0 \
        --target-logic-gates XOR \
        --n-samples-val 20 \
        --n-repeats 1 \
        --min-sparsity 0 \
        --use-gpu-batching \
        --early-stopping-steps 200 \
        --gpu-batch-size 4096 \
        --n-experiments 100 --size 3 --depth 2 \
        --l1-lambda {lambda_val} \
        --device {DEVICE}

    # Save after each lambda value completes
    print(f"\nCompleted L1 lambda={lambda_val}")
    save_latest_results_to_drive()

Streaming output truncated to the last 5000 lines.
2025-11-21 11:23:28,972 - INFO - Val Loss: 0.2562, Val Accuracy: 0.4625, Bad Epochs: 113, L1: 0.1
2025-11-21 11:23:29,089 - INFO - Epoch [206/1000], Train Loss: 0.2978, Train Accuracy: 0.5000
2025-11-21 11:23:29,089 - INFO - Val Loss: 0.2561, Val Accuracy: 0.4625, Bad Epochs: 114, L1: 0.1
2025-11-21 11:23:29,211 - INFO - Epoch [207/1000], Train Loss: 0.2978, Train Accuracy: 0.5000
2025-11-21 11:23:29,211 - INFO - Val Loss: 0.2563, Val Accuracy: 0.4625, Bad Epochs: 115, L1: 0.1
2025-11-21 11:23:29,329 - INFO - Epoch [208/1000], Train Loss: 0.2978, Train Accuracy: 0.5000
2025-11-21 11:23:29,329 - INFO - Val Loss: 0.2564, Val Accuracy: 0.4625, Bad Epochs: 116, L1: 0.1
2025-11-21 11:23:29,442 - INFO - Epoch [209/1000], Train Loss: 0.2978, Train Accuracy: 0.5000
2025-11-21 11:23:29,442 - INFO - Val Loss: 0.2568, Val Accuracy: 0.4625, Bad Epochs: 117, L1: 0.1
2025-11-21 11:23:29,552 - INFO - Epoch [210/1000], Train Loss: 0.2978
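Because the streamed output keeps only the last 5000 lines, it can help to save a run's log to a file and scan it afterwards. The hypothetical parser below extracts validation loss, validation accuracy, and the `Bad Epochs` counter from lines in the format shown above. The regular expression is written against these sample lines only, so adjust it if `main.py` formats them differently; lines cut off mid-record are skipped.

```python
import re

# Matches validation lines such as:
# "... - INFO - Val Loss: 0.2562, Val Accuracy: 0.4625, Bad Epochs: 113, L1: 0.1"
VAL_RE = re.compile(
    r"Val Loss: (?P<loss>[\d.]+), Val Accuracy: (?P<acc>[\d.]+), Bad Epochs: (?P<bad>\d+)"
)


def parse_val_metrics(lines):
    """Return (val_loss, val_accuracy, bad_epochs) for each complete validation line."""
    metrics = []
    for line in lines:
        m = VAL_RE.search(line)
        if m:
            metrics.append((float(m["loss"]), float(m["acc"]), int(m["bad"])))
    return metrics
```

One way to capture the full log (an assumption about your setup, not something this notebook does) is to append `2>&1 | tee l1_run.log` to the `!python main.py ...` command and then call `parse_val_metrics(open('l1_run.log'))`. A final validation accuracy near 0.5, as in the λ = 0.1 tail above, is roughly chance level for a balanced binary target such as XOR.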

## 8. Run L2 Regularization Experiments

In [None]:
%%time
# L2 regularization experiments
l2_lambdas = [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]

for lambda_val in l2_lambdas:
    print(f"\n{'='*60}")
    print(f"Running L2 experiment with lambda={lambda_val}")
    print(f"{'='*60}\n")

    !python main.py --verbose --val-frequency 1 --noise-std 0.0 \
        --target-logic-gates XOR \
        --n-samples-val 20 \
        --n-repeats 1 \
        --min-sparsity 0 \
        --use-gpu-batching \
        --early-stopping-steps 200 \
        --gpu-batch-size 4096 \
        --n-experiments 100 --size 3 --depth 2 \
        --l2-lambda {lambda_val} \
        --device {DEVICE}

    # Save after each lambda value completes
    print(f"\nCompleted L2 lambda={lambda_val}")
    save_latest_results_to_drive()

Streaming output truncated to the last 5000 lines.
2025-11-21 19:09:34,529 - INFO - Val Loss: 0.2487, Val Accuracy: 0.5375, Bad Epochs: 191, L2: 0.1
2025-11-21 19:09:34,645 - INFO - Epoch [372/1000], Train Loss: 0.2723, Train Accuracy: 0.5048
2025-11-21 19:09:34,645 - INFO - Val Loss: 0.2488, Val Accuracy: 0.5375, Bad Epochs: 192, L2: 0.1
2025-11-21 19:09:34,768 - INFO - Epoch [373/1000], Train Loss: 0.2723, Train Accuracy: 0.5048
2025-11-21 19:09:34,768 - INFO - Val Loss: 0.2488, Val Accuracy: 0.5375, Bad Epochs: 193, L2: 0.1
2025-11-21 19:09:34,883 - INFO - Epoch [374/1000], Train Loss: 0.2723, Train Accuracy: 0.5048
2025-11-21 19:09:34,883 - INFO - Val Loss: 0.2488, Val Accuracy: 0.5375, Bad Epochs: 194, L2: 0.1
2025-11-21 19:09:35,000 - INFO - Epoch [375/1000], Train Loss: 0.2723, Train Accuracy: 0.5048
2025-11-21 19:09:35,000 - INFO - Val Loss: 0.2488, Val Accuracy: 0.5375, Bad Epochs: 195, L2: 0.1
2025-11-21 19:09:35,120 - INFO - Epoch [376/1000], Train Loss: 0.2723
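This notebook does not show how `--l1-lambda` and `--l2-lambda` enter training. If `main.py` follows the common convention, each adds a penalty to the training loss: λ₁·Σ|w| for L1 and λ₂·Σw² for L2. A minimal PyTorch sketch under that assumption; whether biases are penalized, and whether L2 is a loss term or optimizer weight decay, are unverified choices made here for illustration.

```python
import torch
from torch import nn


def regularization_penalty(model: nn.Module, l1_lambda: float = 0.0,
                           l2_lambda: float = 0.0) -> torch.Tensor:
    """Return l1_lambda * sum|w| + l2_lambda * sum w^2 over all parameters.

    Assumption: every parameter (weights and biases) is penalized.
    """
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for p in model.parameters():
        if l1_lambda:
            penalty = penalty + l1_lambda * p.abs().sum()
        if l2_lambda:
            penalty = penalty + l2_lambda * p.pow(2).sum()
    return penalty


# Inside a training step (criterion, x, y, optimizer are placeholders):
# loss = criterion(model(x), y) + regularization_penalty(model, l1_lambda=1e-3)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```

If L2 were instead implemented through `torch.optim.Adam(..., weight_decay=λ)`, that corresponds to a loss term of (λ/2)·Σw², so the same nominal λ differs by a factor of two between the two styles; AdamW's decoupled decay is equivalent to neither.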

## 9. Run Dropout Experiments

In [None]:
%%time
# Dropout experiments
dropout_rates = [0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]

for rate in dropout_rates:
    print(f"\n{'='*60}")
    print(f"Running Dropout experiment with rate={rate}")
    print(f"{'='*60}\n")

    !python main.py --verbose --val-frequency 1 --noise-std 0.0 \
        --target-logic-gates XOR \
        --n-samples-val 20 \
        --n-repeats 1 \
        --min-sparsity 0 \
        --use-gpu-batching \
        --early-stopping-steps 200 \
        --gpu-batch-size 4096 \
        --n-experiments 100 --size 3 --depth 2 \
        --dropout-rate {rate} \
        --device {DEVICE}

    # Save after each dropout rate completes
    print(f"\nCompleted Dropout rate={rate}")
    save_latest_results_to_drive()

Streaming output truncated to the last 5000 lines.
2025-11-21 23:18:35,572 - INFO - Epoch [194/1000], Train Loss: 0.0939, Train Accuracy: 0.7582
2025-11-21 23:18:35,572 - INFO - Val Loss: 0.1082, Val Accuracy: 0.8375, Bad Epochs: 49, Dropout: 0.2
2025-11-21 23:18:35,659 - INFO - Epoch [195/1000], Train Loss: 0.0959, Train Accuracy: 0.7582
2025-11-21 23:18:35,659 - INFO - Val Loss: 0.1173, Val Accuracy: 0.8375, Bad Epochs: 50, Dropout: 0.2
2025-11-21 23:18:35,745 - INFO - Epoch [196/1000], Train Loss: 0.0977, Train Accuracy: 0.7582
2025-11-21 23:18:35,745 - INFO - Val Loss: 0.1204, Val Accuracy: 0.8375, Bad Epochs: 51, Dropout: 0.2
2025-11-21 23:18:35,832 - INFO - Epoch [197/1000], Train Loss: 0.0939, Train Accuracy: 0.7582
2025-11-21 23:18:35,832 - INFO - Val Loss: 0.1181, Val Accuracy: 0.8375, Bad Epochs: 52, Dropout: 0.2
2025-11-21 23:18:35,921 - INFO - Epoch [198/1000], Train Loss: 0.0978, Train Accuracy: 0.7582
2025-11-21 23:18:35,921 - INFO - Val Loss: 0.1204, Val Ac
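The baseline and the three sweep cells repeat the same flags and differ only in one regularization option. One way to keep them in sync is to build every command from a single list. `COMMON_FLAGS`, `build_command`, and `SWEEP` are hypothetical names; the flag values are copied from the cells above.

```python
import shlex
from typing import Optional

# Flags shared by the full baseline and every sweep cell in this notebook
COMMON_FLAGS = [
    "--verbose", "--val-frequency", "1", "--noise-std", "0.0",
    "--target-logic-gates", "XOR",
    "--n-samples-val", "20",
    "--n-repeats", "1",
    "--min-sparsity", "0",
    "--use-gpu-batching",
    "--early-stopping-steps", "200",
    "--gpu-batch-size", "4096",
    "--n-experiments", "100", "--size", "3", "--depth", "2",
]


def build_command(device: str, extra: Optional[dict] = None) -> list:
    """Return the argv for one main.py run; `extra` maps flag -> value."""
    cmd = ["python", "main.py", *COMMON_FLAGS]
    for flag, value in (extra or {}).items():
        cmd += [flag, str(value)]
    return cmd + ["--device", device]


# One (label, extra flags) entry per run: baseline plus the three sweeps
LAMBDAS = [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]
DROPOUT_RATES = [0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]
SWEEP = (
    [("baseline", {})]
    + [(f"l1={v}", {"--l1-lambda": v}) for v in LAMBDAS]
    + [(f"l2={v}", {"--l2-lambda": v}) for v in LAMBDAS]
    + [(f"dropout={r}", {"--dropout-rate": r}) for r in DROPOUT_RATES]
)

# Usage in a notebook cell (IPython expands {cmd} in ! commands):
# for label, extra in SWEEP:
#     cmd = shlex.join(build_command(DEVICE, extra))
#     !{cmd}
#     save_latest_results_to_drive()
```

Using `!{cmd}` rather than `subprocess.run` keeps the training logs streaming into the cell output, as in the existing cells.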

## 10. Analyze Results

In [None]:
# Run analysis on all results
!python analyze_regularization.py logs --output-dir analysis_output

# Save analysis results to Drive
if IN_COLAB and DRIVE_SAVE_DIR:
    import shutil
    if os.path.exists('analysis_output'):
        drive_analysis = f'{DRIVE_SAVE_DIR}/analysis'
        if os.path.exists(drive_analysis):
            shutil.rmtree(drive_analysis)
        shutil.copytree('analysis_output', drive_analysis)
        print(f"\n✓ Analysis saved to: {drive_analysis}")

## 11. View Analysis Results

In [None]:
# Display summary
with open('analysis_output/analysis_summary.txt', 'r') as f:
    print(f.read())

In [None]:
# Display statistical test results
import pandas as pd

print("\nL1 Statistical Tests:")
if os.path.exists('analysis_output/l1_statistical_tests.csv'):
    df_l1 = pd.read_csv('analysis_output/l1_statistical_tests.csv')
    display(df_l1)

print("\nL2 Statistical Tests:")
if os.path.exists('analysis_output/l2_statistical_tests.csv'):
    df_l2 = pd.read_csv('analysis_output/l2_statistical_tests.csv')
    display(df_l2)

print("\nDropout Statistical Tests:")
if os.path.exists('analysis_output/dropout_statistical_tests.csv'):
    df_dropout = pd.read_csv('analysis_output/dropout_statistical_tests.csv')
    display(df_dropout)

In [None]:
# Display plots
from IPython.display import Image, display
import glob

plot_files = glob.glob('analysis_output/*.png')
for plot_file in sorted(plot_files):
    print(f"\n{plot_file}:")
    display(Image(filename=plot_file))

## 12. Download Results (Optional)

In [None]:
if IN_COLAB:
    # Create a zip file of all results
    !zip -r results.zip logs analysis_output

    # Download
    from google.colab import files
    files.download('results.zip')

## 13. Results Already Saved!

✓ Your results are saved to Google Drive automatically after the baseline and after each regularization setting finishes.

Location: the `MI_Experiments_[TIMESTAMP]` folder in your Drive (`/content/drive/MyDrive/`)

The folder contains:
- `logs/` - All experiment results
- `analysis/` - Analysis outputs
- `progress.txt` - Log of what's been saved

To check what has been saved so far:

In [None]:
if IN_COLAB and DRIVE_SAVE_DIR:
    print(f"Your results are saved at: {DRIVE_SAVE_DIR}")
    print(f"\nFolder contents:")
    !ls -lh {DRIVE_SAVE_DIR}
    print(f"\nNumber of experiment runs saved:")
    !ls -d {DRIVE_SAVE_DIR}/logs/run_* 2>/dev/null | wc -l

    # Show progress log
    progress_file = f"{DRIVE_SAVE_DIR}/progress.txt"
    if os.path.exists(progress_file):
        print(f"\nSave history:")
        with open(progress_file, 'r') as f:
            print(f.read())
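Each call to `save_latest_results_to_drive()` appends a line of the form `YYYY-MM-DD HH:MM:SS - Saved N runs` to `progress.txt`. After a disconnect, parsing that file shows when the last successful save happened and how many `run_*` directories it covered. A minimal sketch; `read_progress` is an illustrative name:

```python
import re
from datetime import datetime

# Matches the lines written by save_latest_results_to_drive()
PROGRESS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - Saved (\d+) runs$")


def read_progress(text: str):
    """Parse progress.txt into (timestamp, n_runs) tuples, skipping other lines."""
    entries = []
    for line in text.splitlines():
        m = PROGRESS_RE.match(line.strip())
        if m:
            entries.append((datetime.strptime(m[1], "%Y-%m-%d %H:%M:%S"), int(m[2])))
    return entries
```

For example, `read_progress(open(progress_file).read())[-1]` returns the most recent save.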

## Alternative: Run Smaller Test First

If you want to test with fewer experiments first:

In [None]:
# Quick test with just 10 experiments
!python main.py --verbose --val-frequency 1 --noise-std 0.0 \
    --target-logic-gates XOR \
    --n-experiments 10 --size 3 --depth 2 \
    --device {DEVICE}

# Test with one L1 value
!python main.py --verbose --val-frequency 1 --noise-std 0.0 \
    --target-logic-gates XOR \
    --n-experiments 10 --size 3 --depth 2 \
    --l1-lambda 0.001 \
    --device {DEVICE}