# Gary Micro-Twin Demo

This notebook demonstrates the **Gary Micro-Twin**: a focused 3-zone synthetic data generator for controlled ML testing.

## What is the Gary Micro-Twin?

The Gary Micro-Twin produces zone-aware synthetic IQ data and metadata for **controlled ML testing**. It does **NOT** replace SpectrumX competition data; it is a controlled extension module.

### 3 Anchor Zones:
1. **Gary City Hall** - Civic center, moderate traffic
2. **West Side Leadership Academy** - High school, variable occupancy (equity focus)
3. **Gary Public Library & Cultural Center** - Community hub, steady baseline

### Use Cases:
- **Controlled ML testing**: Test detector robustness across zones
- **Ablation studies**: Isolate zone-specific effects
- **Fairness evaluation**: Compare performance across zones
- **Reproducibility**: Deterministic synthetic data with known ground truth

In [None]:
import sys
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

# Add src to path
repo_root = Path().resolve().parent
sys.path.insert(0, str(repo_root / "src"))

from edge_ran_gary.digital_twin.gary_micro_twin import GaryMicroTwin, generate_micro_twin_dataset

## Initialize Micro-Twin

In [None]:
# Initialize micro-twin
micro_twin = GaryMicroTwin(config_path="../configs/gary_micro_twin.yaml")

# Show zone info
print("Gary Micro-Twin Zones:")
for zone_id in micro_twin.zone_model.zone_ids:
    zone = micro_twin.zone_model.get_zone(zone_id)
    zone_info = micro_twin.zone_metadata[zone_id]
    print(f"\n{zone_info['name']} ({zone_id}):")
    print(f"  Location: ({zone_info['lat']}, {zone_info['lon']})")
    print(f"  Radius: {zone_info['radius_m']}m")
    print(f"  Weight: {zone.weight}")
    print(f"  Occupancy prior: {zone.occupancy_prior}")
    print(f"  SNR range: {zone.snr_range} dB")
    print(f"  CFO range: {zone.cfo_range} Hz")

## Generate Sample per Zone

In [None]:
# Generate one example per zone
from edge_ran_gary.digital_twin.generator import generate_iq_window

examples = {}
for zone_id in micro_twin.zone_model.zone_ids:
    # Generate label=0 (noise)
    iq_noise, meta_noise = generate_iq_window(
        seed=42,
        label=0,
        config_path="../configs/gary_micro_twin.yaml",
        zone_id=zone_id
    )
    
    # Generate label=1 (signal)
    iq_signal, meta_signal = generate_iq_window(
        seed=43,
        label=1,
        config_path="../configs/gary_micro_twin.yaml",
        zone_id=zone_id
    )
    
    examples[zone_id] = {
        'noise': (iq_noise, meta_noise),
        'signal': (iq_signal, meta_signal)
    }
    
    print(f"✅ {micro_twin.zone_metadata[zone_id]['name']}: Generated noise + signal samples")

## Visualize IQ Samples

In [None]:
# Plot IQ samples for each zone
sample_rate = 1e6
duration = 1.0
n_samples = int(sample_rate * duration)
t = np.arange(n_samples) / sample_rate

fig, axes = plt.subplots(3, 2, figsize=(14, 10))
fig.suptitle("Gary Micro-Twin: IQ Samples per Zone", fontsize=16)

for idx, zone_id in enumerate(micro_twin.zone_model.zone_ids):
    zone_name = micro_twin.zone_metadata[zone_id]['name']
    
    # Noise sample
    iq_noise, meta_noise = examples[zone_id]['noise']
    axes[idx, 0].plot(t[:1000], np.real(iq_noise[:1000]), label='I', alpha=0.7)
    axes[idx, 0].plot(t[:1000], np.imag(iq_noise[:1000]), label='Q', alpha=0.7)
    axes[idx, 0].set_title(f"{zone_name} - Noise (label=0)")
    axes[idx, 0].set_xlabel("Time (s)")
    axes[idx, 0].set_ylabel("Amplitude")
    axes[idx, 0].legend()
    axes[idx, 0].grid(True, alpha=0.3)
    
    # Signal sample
    iq_signal, meta_signal = examples[zone_id]['signal']
    axes[idx, 1].plot(t[:1000], np.real(iq_signal[:1000]), label='I', alpha=0.7)
    axes[idx, 1].plot(t[:1000], np.imag(iq_signal[:1000]), label='Q', alpha=0.7)
    axes[idx, 1].set_title(f"{zone_name} - Signal (label=1, SNR={meta_signal['snr_db']:.1f}dB)")
    axes[idx, 1].set_xlabel("Time (s)")
    axes[idx, 1].set_ylabel("Amplitude")
    axes[idx, 1].legend()
    axes[idx, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## PSD Comparison

In [None]:
# Compute and plot PSD for each zone
fig, axes = plt.subplots(3, 1, figsize=(12, 10))
fig.suptitle("Power Spectral Density (PSD) per Zone", fontsize=16)

for idx, zone_id in enumerate(micro_twin.zone_model.zone_ids):
    zone_name = micro_twin.zone_metadata[zone_id]['name']
    
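    # Noise PSD
    # With return_onesided=False the output is in FFT order (positive frequencies first),
    # so shift both arrays to ascending frequency before plotting.
    iq_noise, _ = examples[zone_id]['noise']
    freqs_noise, psd_noise = signal.welch(iq_noise, fs=sample_rate, nperseg=1024, return_onesided=False)
    freqs_noise, psd_noise = np.fft.fftshift(freqs_noise), np.fft.fftshift(psd_noise)
    
    # Signal PSD
    iq_signal, meta_signal = examples[zone_id]['signal']
    freqs_signal, psd_signal = signal.welch(iq_signal, fs=sample_rate, nperseg=1024, return_onesided=False)
    freqs_signal, psd_signal = np.fft.fftshift(freqs_signal), np.fft.fftshift(psd_signal)
    
    axes[idx].semilogy(freqs_noise, psd_noise, label='Noise (label=0)', alpha=0.7)
    axes[idx].semilogy(freqs_signal, psd_signal, label=f"Signal (label=1, SNR={meta_signal['snr_db']:.1f}dB)", alpha=0.7)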
    axes[idx].set_title(f"{zone_name}")
    axes[idx].set_xlabel("Frequency (Hz)")
    axes[idx].set_ylabel("PSD")
    axes[idx].legend()
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Spectrogram Comparison

In [None]:
# Spectrograms for signal samples
fig, axes = plt.subplots(3, 1, figsize=(12, 10))
fig.suptitle("Spectrogram: Signal Samples per Zone", fontsize=16)

for idx, zone_id in enumerate(micro_twin.zone_model.zone_ids):
    zone_name = micro_twin.zone_metadata[zone_id]['name']
    iq_signal, meta_signal = examples[zone_id]['signal']
    
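    freqs, times, Sxx = signal.spectrogram(iq_signal, fs=sample_rate, nperseg=256, return_onesided=False)
    # Two-sided output is in FFT order; reorder so the frequency axis increases monotonically
    freqs = np.fft.fftshift(freqs)
    Sxx = np.fft.fftshift(Sxx, axes=0)
    
    im = axes[idx].pcolormesh(times, freqs, 10 * np.log10(np.abs(Sxx) + 1e-10), shading='gouraud')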
    axes[idx].set_title(f"{zone_name} - SNR={meta_signal['snr_db']:.1f}dB, CFO={meta_signal['cfo_hz']:.1f}Hz")
    axes[idx].set_xlabel("Time (s)")
    axes[idx].set_ylabel("Frequency (Hz)")
    plt.colorbar(im, ax=axes[idx], label="Power (dB)")

plt.tight_layout()
plt.show()

## Generate Full Dataset for Ananya's ML Testing

In [None]:
# Generate a small demo dataset here to keep the notebook fast
# For the full dataset, raise n_per_zone: generate_micro_twin_dataset(n_per_zone=1000, ...)

output_dir = "../data/gary_micro_twin_demo"
micro_twin, samples, metadata_df = generate_micro_twin_dataset(
    output_dir=output_dir,
    n_per_zone=50,  # 50 samples per zone = 150 total
    label_balance=0.5,
    seed=42
)

print(f"\n✅ Generated {len(samples)} samples")
print("\nLabel distribution:")
print(metadata_df['label'].value_counts())
print("\nZone distribution:")
print(metadata_df['zone_id'].value_counts())

## How Ananya Can Use This for Controlled ML Testing

### 1. **Zone-Aware Training**:
```python
# Train detector on City Hall only
city_hall_data = metadata_df[metadata_df['zone_id'] == 'gary_city_hall']
# ... train model ...

# Test on Library (different zone)
library_data = metadata_df[metadata_df['zone_id'] == 'gary_public_library_cultural_center']
# ... evaluate ...
```
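
The train-on-one-zone, test-on-another pattern above can be sketched end to end with a deliberately simple energy detector. This is a minimal illustration on stand-in data, not the project's detector: `make_toy_zone`, `fit_threshold`, the zone names `zone_a`/`zone_b`, and the SNR values are all hypothetical and chosen only to make the example self-contained.

```python
import numpy as np
import pandas as pd

def make_toy_zone(zone_id, n, snr_db, rng, n_samples=256):
    """Hypothetical stand-in data: unit-power complex noise, plus a tone when label=1."""
    iq, rows = [], []
    tone = np.exp(2j * np.pi * 0.1 * np.arange(n_samples))
    for i in range(n):
        label = i % 2  # alternate noise / signal for a balanced set
        x = (rng.standard_normal(n_samples) + 1j * rng.standard_normal(n_samples)) / np.sqrt(2)
        if label == 1:
            x = x + 10 ** (snr_db / 20) * tone  # tone amplitude set by the assumed SNR
        iq.append(x)
        rows.append({"zone_id": zone_id, "label": label})
    return iq, pd.DataFrame(rows)

def mean_power(iq_list):
    """Average power of each IQ window."""
    return np.array([np.mean(np.abs(x) ** 2) for x in iq_list])

def fit_threshold(power, labels):
    """'Training' step: midpoint between the two class means."""
    return 0.5 * (power[labels == 0].mean() + power[labels == 1].mean())

rng = np.random.default_rng(0)
train_iq, train_meta = make_toy_zone("zone_a", 200, snr_db=10.0, rng=rng)
test_iq, test_meta = make_toy_zone("zone_b", 200, snr_db=3.0, rng=rng)

# Fit on one zone, evaluate on the other
threshold = fit_threshold(mean_power(train_iq), train_meta["label"].to_numpy())
pred = (mean_power(test_iq) > threshold).astype(int)
cross_zone_accuracy = float((pred == test_meta["label"].to_numpy()).mean())
print(f"threshold={threshold:.2f}, cross-zone accuracy={cross_zone_accuracy:.2f}")
```

With the real dataset, the toy generator would be replaced by the rows of `metadata_df` selected per zone together with their matching IQ windows; how `samples` is indexed against `metadata_df` depends on the generator and is not assumed here.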

### 2. **SNR Ablation**:
```python
# Test detector at different SNR levels
low_snr = metadata_df[metadata_df['snr_db'] < 5]
high_snr = metadata_df[metadata_df['snr_db'] > 15]
# ... compare performance ...
```
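
Beyond a two-way low/high split, one way to run the comparison is to bin SNR and report the detection rate per bin. The sketch below uses a toy DataFrame in place of the signal rows of `metadata_df`; the `pred` column and the logistic "detector" that fills it are invented for illustration, and the bin edges are arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 600

# Toy stand-in for the label=1 rows of metadata_df (hypothetical values)
toy = pd.DataFrame({"snr_db": rng.uniform(0.0, 20.0, n)})

# Made-up detector: probability of detection rises with SNR along a logistic curve
p_detect = 1.0 / (1.0 + np.exp(-(toy["snr_db"] - 8.0) / 2.0))
toy["pred"] = (rng.uniform(size=n) < p_detect).astype(int)

# Bin SNR into 5 dB steps and summarize detections per bin
bins = [0, 5, 10, 15, 20]
toy["snr_bin"] = pd.cut(toy["snr_db"], bins=bins, include_lowest=True)
pd_by_bin = (
    toy.groupby("snr_bin", observed=True)["pred"]
    .agg(["mean", "count"])
    .rename(columns={"mean": "detection_rate"})
)
print(pd_by_bin)
```

Reporting the per-bin `count` alongside the rate makes it easier to spot bins with too few samples to support a conclusion.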

### 3. **Fairness Evaluation**:
```python
# Compare performance across zones (equity focus)
for zone_id in ['gary_city_hall', 'west_side_leadership_academy', 'gary_public_library_cultural_center']:
    zone_data = metadata_df[metadata_df['zone_id'] == zone_id]
    # ... evaluate detector ...
    # ... compare metrics ...
```
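
The per-zone loop above could be filled in roughly as follows. This sketch assumes a `pred` column holding detector decisions, which the notebook does not create; here it is simulated by flipping a random 10% of the labels. `zone_metrics` is a hypothetical helper, and the metrics shown are one reasonable choice rather than a prescribed set.

```python
import numpy as np
import pandas as pd

zone_ids = [
    "gary_city_hall",
    "west_side_leadership_academy",
    "gary_public_library_cultural_center",
]

rng = np.random.default_rng(2)
toy = pd.DataFrame({
    "zone_id": np.repeat(zone_ids, 100),
    "label": rng.integers(0, 2, size=300),
})
# Simulated detector output (assumption): correct except for a random 10% of rows
flip = rng.uniform(size=len(toy)) < 0.10
toy["pred"] = np.where(flip, 1 - toy["label"], toy["label"])

def zone_metrics(group):
    """Accuracy, detection rate, and false-alarm rate for one zone."""
    pos = group["label"] == 1
    return pd.Series({
        "accuracy": (group["pred"] == group["label"]).mean(),
        "detection_rate": group.loc[pos, "pred"].mean(),
        "false_alarm_rate": group.loc[~pos, "pred"].mean(),
        "n": len(group),
    })

# One row per zone, one column per metric
per_zone = pd.DataFrame({z: zone_metrics(g) for z, g in toy.groupby("zone_id")}).T
accuracy_gap = per_zone["accuracy"].max() - per_zone["accuracy"].min()
print(per_zone)
print(f"Largest accuracy gap between zones: {accuracy_gap:.3f}")
```

A single gap number is a coarse summary; looking at detection rate and false-alarm rate separately shows whether a zone is under-served by missed signals or by spurious alarms.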

### 4. **Reproducibility**:
```python
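# Samples are generated from deterministic seeds
# Re-run with the same arguments as before to regenerate the same dataset:
micro_twin, samples, metadata_df = generate_micro_twin_dataset(
    output_dir=output_dir, n_per_zone=50, label_balance=0.5, seed=42
)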
```
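
A reproducibility claim is easy to verify by generating twice and comparing arrays. The sketch below shows the check on a hypothetical seeded generator, `toy_dataset`, which derives one child RNG per sample from a root seed using NumPy's `SeedSequence`; whether the micro-twin seeds its samples this way is not stated in this notebook, so treat the generator as a stand-in and reuse only the comparison.

```python
import numpy as np

def toy_dataset(seed, n=5, n_samples=128):
    """Hypothetical seeded generator: one independent child RNG per sample."""
    children = np.random.SeedSequence(seed).spawn(n)
    windows = []
    for child in children:
        rng = np.random.default_rng(child)
        windows.append(rng.standard_normal(n_samples) + 1j * rng.standard_normal(n_samples))
    return np.stack(windows)

run_a = toy_dataset(seed=42)
run_b = toy_dataset(seed=42)
run_c = toy_dataset(seed=43)

# Same seed should match bit for bit; a different seed should not
same_seed_matches = np.array_equal(run_a, run_b)
different_seed_differs = not np.array_equal(run_a, run_c)
print(f"same seed matches: {same_seed_matches}, different seed differs: {different_seed_differs}")
```

The same `np.array_equal` comparison can be applied to two outputs of `generate_micro_twin_dataset` once the layout of `samples` is known.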