# Thunder Acoustics in the Wild - Quick Start

This notebook demonstrates the complete pipeline for thunder acoustic analysis:

1. **Setup**: Import libraries and load configuration
2. **Audio Input**: Generate synthetic thunder audio for this demo (real recordings are extracted from video with `scripts/extract_audio.py`)
3. **Preprocessing**: Bandpass filter, noise reduction, normalization
4. **Event Detection**: Detect thunder events using energy + spectral analysis
5. **Feature Extraction**: Time, frequency, and time-frequency features
6. **Visualization**: Waveform, spectrogram, feature distributions
7. **Export**: Save results to CSV and JSON

**Requirements**: Install the `thunder` package in editable mode from the repository root:
```bash
pip install -e .
```

## 1. Setup: Import Libraries and Load Configuration

In [None]:
import sys
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

# Add the parent of the current working directory to the import path
sys.path.insert(0, str(Path.cwd().parent))

# Import thunder package
from thunder import io, preprocess, detection, features, utils, metadata

# Setup logging
utils.setup_logging(log_level='INFO')

# Load configuration
config_path = Path('../configs/default.yaml')
config = utils.load_config(config_path)

print("✓ Libraries imported")
print("✓ Configuration loaded")
print(f"  - Sample rate: {config['preprocessing']['sample_rate']} Hz")
print(f"  - Energy threshold: {config['detection']['energy_threshold']}")
print(f"  - Spectral threshold: {config['detection']['spectral_threshold']}")
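
The rest of this notebook reads a fixed set of keys from `config`. A minimal sketch of a pre-flight check is shown below; the key paths are the ones used in later cells, while the helper name `find_missing_keys` and the example dictionary are hypothetical and not part of the `thunder` package.

```python
# Dotted key paths that later cells of this notebook read from the config.
REQUIRED_KEYS = [
    "preprocessing.sample_rate",
    "preprocessing.bandpass.low_cutoff",
    "preprocessing.bandpass.high_cutoff",
    "preprocessing.normalize.method",
    "preprocessing.normalize.target_db",
    "detection.energy_threshold",
    "detection.spectral_threshold",
    "features",
    "visualization.spectrogram.n_fft",
    "visualization.spectrogram.hop_length",
    "visualization.spectrogram.colormap",
    "visualization.spectrogram.vmin",
    "visualization.spectrogram.vmax",
    "visualization.spectrogram.freq_min",
    "visualization.spectrogram.freq_max",
]


def find_missing_keys(config, required=REQUIRED_KEYS):
    """Return the dotted key paths that are absent from a nested dict."""
    missing = []
    for path in required:
        node = config
        for part in path.split("."):
            if not isinstance(node, dict) or part not in node:
                missing.append(path)
                break
            node = node[part]
    return missing


# Hypothetical, deliberately incomplete config used only for illustration.
example = {"preprocessing": {"sample_rate": 48000}, "detection": {}}
print(find_missing_keys(example))
```

Running the same check on the loaded `config` before the later cells would surface a missing key early instead of as a `KeyError` halfway through the pipeline.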

## 2. Create Output Directory

Create a timestamped output directory for this analysis.

In [None]:
# Create output directory
output_base = Path('../outputs')
output_dir = utils.ensure_output_dir(output_base / 'quickstart_demo', date_prefix=True)

print(f"✓ Output directory: {output_dir}")

## 3. Generate Synthetic Thunder Audio (Demo)

For this quickstart, we'll generate synthetic thunder-like audio. In production, you would use `scripts/extract_audio.py` to extract audio from videos.

In [None]:
def generate_synthetic_thunder(duration=10.0, sr=48000):
    """Generate synthetic thunder-like audio."""
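    # Sample times with exact 1/sr spacing (np.linspace would include the endpoint
    # and stretch the spacing to duration / (n - 1))
    t = np.arange(int(sr * duration)) / sr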
    audio = np.zeros_like(t)
    
    # Add 3 thunder events
    events = [
        (1.0, 2.5, 0.3),   # (start, end, amplitude)
        (4.0, 5.0, 0.5),
        (7.5, 9.0, 0.2),
    ]
    
    for start, end, amp in events:
        mask = (t >= start) & (t < end)
        # Low-frequency rumble (50-200 Hz) + decay envelope
        freq = np.random.uniform(50, 200)
        envelope = np.exp(-3 * (t[mask] - start) / (end - start))
        rumble = amp * envelope * np.sin(2 * np.pi * freq * (t[mask] - start))
        
        # Add high-frequency crack at start
        crack_duration = 0.1
        crack_mask = (t >= start) & (t < start + crack_duration)
        if crack_mask.sum() > 0:
            crack_freq = np.random.uniform(1000, 3000)
            crack_env = np.exp(-20 * (t[crack_mask] - start))
            crack = amp * 1.5 * crack_env * np.sin(2 * np.pi * crack_freq * (t[crack_mask] - start))
            audio[crack_mask] += crack
        
        audio[mask] += rumble
    
    # Add noise
    noise = 0.01 * np.random.randn(len(audio))
    audio += noise
    
    # Normalize
    audio = audio / np.max(np.abs(audio)) * 0.8
    
    return audio.astype(np.float32), sr

# Generate audio
audio, sr = generate_synthetic_thunder(duration=10.0, sr=48000)

print("✓ Synthetic audio generated")
print(f"  - Duration: {len(audio) / sr:.2f} seconds")
print(f"  - Sample rate: {sr} Hz")
print(f"  - Shape: {audio.shape}")
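
The generator above draws its rumble and crack frequencies from NumPy's global random state, so each run produces slightly different audio. The sketch below shows one way to make a single rumble-like burst reproducible by passing in a seeded `numpy.random.Generator`; the function name `decaying_tone` and its parameters are hypothetical and only mirror the envelope used in the cell above.

```python
import numpy as np


def decaying_tone(duration, sr, rng, f_lo=50.0, f_hi=200.0, decay=3.0):
    """One rumble-like burst: a sine at a random frequency with exponential decay."""
    n = int(sr * duration)
    t = np.arange(n) / sr                      # exact 1/sr sample spacing
    freq = rng.uniform(f_lo, f_hi)             # drawn from the seeded generator
    envelope = np.exp(-decay * t / duration)   # approaches exp(-decay) at the end of the burst
    return envelope * np.sin(2 * np.pi * freq * t)


# Two generators built from the same seed yield identical bursts.
a = decaying_tone(1.5, 48000, np.random.default_rng(42))
b = decaying_tone(1.5, 48000, np.random.default_rng(42))
print(np.array_equal(a, b))
```

With `decay=3.0`, the envelope falls to about 5% of its initial value (`exp(-3)`) by the end of the burst, which matches the rumble envelope in `generate_synthetic_thunder`.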

## 4. Preprocessing: Bandpass Filter, Noise Reduction, Normalization

In [None]:
# Apply preprocessing pipeline
preprocess_config = config['preprocessing']
audio_processed = preprocess.preprocess_pipeline(audio, sr, preprocess_config)

print("✓ Preprocessing complete")
print(f"  - Bandpass filter: {preprocess_config['bandpass']['low_cutoff']}-{preprocess_config['bandpass']['high_cutoff']} Hz")
print(f"  - Normalization: {preprocess_config['normalize']['method']} @ {preprocess_config['normalize']['target_db']} dB")
print(f"  - Processed shape: {audio_processed.shape}")
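
The internals of `preprocess.preprocess_pipeline` are not shown in this notebook. As a rough illustration of two of the steps it names, the sketch below applies a zero-phase Butterworth bandpass with SciPy and then peak-normalizes to a target level. The function names, the filter order, and the -3 dB target are assumptions; the 20-6000 Hz band is the one quoted in the summary at the end of this notebook.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(audio, sr, low_hz, high_hz, order=4):
    """Zero-phase Butterworth bandpass using second-order sections."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, audio)


def normalize_peak(audio, target_db=-3.0):
    """Scale so the absolute peak sits at target_db relative to full scale (1.0)."""
    peak = np.max(np.abs(audio))
    if peak == 0:
        return audio
    return audio * (10 ** (target_db / 20) / peak)


demo_sr = 48000
demo_t = np.arange(demo_sr) / demo_sr
# 100 Hz tone (inside the band) plus a 15 kHz tone (outside the band)
demo_x = np.sin(2 * np.pi * 100 * demo_t) + 0.5 * np.sin(2 * np.pi * 15000 * demo_t)
demo_y = normalize_peak(bandpass(demo_x, demo_sr, 20.0, 6000.0), target_db=-3.0)

print(f"Peak level: {20 * np.log10(np.max(np.abs(demo_y))):.2f} dB")
```

Second-order sections are used here because they tend to be numerically better behaved than transfer-function coefficients when a cutoff is very low relative to the sample rate, as with a 20 Hz edge at 48 kHz.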

## 5. Event Detection: Combined Energy + Spectral Analysis

In [None]:
# Detect thunder events
detection_config = config['detection']
events = detection.detect_thunder_events(audio_processed, sr, detection_config)

print(f"✓ Detected {len(events)} events")
for i, event in enumerate(events):
    print(f"  Event {i+1}: {event['start']:.2f}s - {event['end']:.2f}s (duration: {event['duration']:.2f}s, peak: {event['peak_amplitude']:.4f})")
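
To make the energy half of the detection step concrete, here is a minimal sketch of frame-wise RMS thresholding. It is not the implementation of `detection.detect_thunder_events`, and it omits the spectral criterion entirely; the function name, frame length, threshold, and minimum duration are assumptions. The returned dictionaries use the same keys the cells in this notebook read (`start`, `end`, `duration`, `peak_time`, `peak_amplitude`).

```python
import numpy as np


def detect_by_energy(audio, sr, frame_s=0.05, threshold=0.05, min_duration=0.2):
    """Flag frames whose RMS exceeds threshold and merge adjacent frames into events."""
    frame = int(round(frame_s * sr))
    n_frames = len(audio) // frame
    frames = audio[: n_frames * frame].reshape(n_frames, frame)
    active = np.sqrt(np.mean(frames ** 2, axis=1)) > threshold

    # Pad with False so every run of active frames has a rising and a falling edge.
    padded = np.concatenate(([False], active, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))

    events = []
    for first, last in zip(edges[::2], edges[1::2]):  # frames first..last-1 are active
        start, end = first * frame, last * frame
        if (end - start) / sr < min_duration:
            continue  # drop runs that are too short to be an event
        peak_idx = start + int(np.argmax(np.abs(audio[start:end])))
        events.append({
            "start": start / sr,
            "end": end / sr,
            "duration": (end - start) / sr,
            "peak_time": peak_idx / sr,
            "peak_amplitude": float(np.abs(audio[peak_idx])),
        })
    return events


# Quiet noise with one 100 Hz burst between 1.0 s and 2.0 s.
demo_sr = 8000
demo_t = np.arange(3 * demo_sr) / demo_sr
demo_x = 0.001 * np.random.default_rng(0).standard_normal(len(demo_t))
demo_x[demo_sr:2 * demo_sr] += 0.5 * np.sin(2 * np.pi * 100 * demo_t[demo_sr:2 * demo_sr])

for ev in detect_by_energy(demo_x, demo_sr):
    print(f"{ev['start']:.2f}s - {ev['end']:.2f}s, peak {ev['peak_amplitude']:.3f}")
```

An energy threshold alone also fires on wind gusts or handling noise, which is presumably why the package pairs it with a spectral measure.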

## 6. Feature Extraction: Time, Frequency, and Time-Frequency Features

In [None]:
# Extract features for all events
features_config = config['features']
feature_list = features.extract_all_features(audio_processed, sr, events, features_config)

# Convert to DataFrame
features_df = pd.DataFrame(feature_list)

print(f"✓ Extracted features for {len(feature_list)} events")
print(f"  - Feature count: {len(features_df.columns)} columns")
print("\nFeature columns:")
print(features_df.columns.tolist())
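
For orientation, the sketch below computes a handful of the features by hand for a single segment. The definitions are common textbook ones and may differ from what `features.extract_all_features` actually computes; the function name `basic_features` is hypothetical, and the dictionary keys are borrowed from the feature names plotted later in this notebook.

```python
import numpy as np


def basic_features(segment, sr):
    """A few time- and frequency-domain descriptors of one event segment."""
    peak = float(np.max(np.abs(segment)))
    rms = float(np.sqrt(np.mean(segment ** 2)))

    # Fraction of adjacent sample pairs whose sign differs.
    signs = np.signbit(segment)
    zcr = float(np.mean(signs[1:] != signs[:-1]))

    # One-sided magnitude spectrum and its bin frequencies.
    mag = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sr)
    centroid = float(np.sum(freqs * mag) / np.sum(mag)) if mag.sum() > 0 else 0.0

    return {
        "peak_amplitude": peak,
        "rms": rms,
        "crest_factor": peak / rms if rms > 0 else 0.0,
        "zero_crossing_rate": zcr,
        "spectral_centroid": centroid,          # magnitude-weighted mean frequency
        "dominant_frequency": float(freqs[np.argmax(mag)]),
    }


# Sanity check on a pure 100 Hz sine: crest factor near sqrt(2), centroid near 100 Hz.
demo_sr = 8000
demo_t = np.arange(demo_sr) / demo_sr
demo_feats = basic_features(0.5 * np.sin(2 * np.pi * 100 * demo_t), demo_sr)
for name, value in demo_feats.items():
    print(f"{name}: {value:.4f}")
```

Checking a feature function against a signal with a known answer, as done here with a pure tone, is a cheap way to catch unit or scaling mistakes before running it on real recordings.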

### Feature Summary Statistics

In [None]:
# Display numeric feature summary
numeric_cols = features_df.select_dtypes(include=['number']).columns
exclude_cols = ['event_id', 'start', 'end', 'duration']
feature_cols = [c for c in numeric_cols if c not in exclude_cols]

summary = features_df[feature_cols].describe()
print(summary)

## 7. Visualization: Waveform with Detected Events

In [None]:
fig, ax = plt.subplots(figsize=(14, 4))

# Time axis
time = np.arange(len(audio_processed)) / sr

# Plot waveform
ax.plot(time, audio_processed, color='steelblue', linewidth=0.7, label='Waveform')

# Highlight detected events
for i, event in enumerate(events):
    ax.axvspan(event['start'], event['end'], alpha=0.3, color='red', 
               label='Thunder Event' if i == 0 else '')
    # Mark peak
    ax.axvline(event['peak_time'], color='darkred', linestyle='--', linewidth=1, alpha=0.7)

ax.set_xlabel('Time (s)', fontsize=12)
ax.set_ylabel('Amplitude', fontsize=12)
ax.set_title('Audio Waveform with Detected Thunder Events', fontsize=14, fontweight='bold')
ax.legend(loc='upper right')
ax.grid(True, alpha=0.3)

plt.tight_layout()
waveform_path = output_dir / 'waveform.png'
plt.savefig(waveform_path, dpi=300)
plt.show()

print(f"✓ Waveform saved: {waveform_path}")

## 8. Visualization: Spectrogram

In [None]:
from scipy.signal import stft

# Compute STFT
spec_config = config['visualization']['spectrogram']
f, t, Zxx = stft(audio_processed, fs=sr, 
                 nperseg=spec_config['n_fft'], 
                 noverlap=spec_config['n_fft'] - spec_config['hop_length'])

# Convert to dB
magnitude = np.abs(Zxx)
magnitude_db = 20 * np.log10(magnitude + 1e-10)

# Plot spectrogram
fig, ax = plt.subplots(figsize=(14, 6))

im = ax.pcolormesh(t, f, magnitude_db, shading='auto', 
                    cmap=spec_config['colormap'],
                    vmin=spec_config['vmin'], vmax=spec_config['vmax'])

# Mark events
for event in events:
    ax.axvline(event['start'], color='red', linestyle='--', linewidth=1.5, alpha=0.8)
    ax.axvline(event['end'], color='red', linestyle='--', linewidth=1.5, alpha=0.8)

ax.set_xlabel('Time (s)', fontsize=12)
ax.set_ylabel('Frequency (Hz)', fontsize=12)
ax.set_title('Spectrogram with Thunder Events', fontsize=14, fontweight='bold')
ax.set_yscale('log')
ax.set_ylim(spec_config['freq_min'], spec_config['freq_max'])
ax.grid(True, alpha=0.3, which='both')

cbar = plt.colorbar(im, ax=ax, label='Magnitude (dB)')
plt.tight_layout()

spec_path = output_dir / 'spectrogram.png'
plt.savefig(spec_path, dpi=300)
plt.show()

print(f"✓ Spectrogram saved: {spec_path}")
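
The `n_fft` and `hop_length` settings fix the spectrogram's resolution, and `scipy.signal.stft` expects the overlap rather than the hop, which is why the cell above passes `n_fft - hop_length` as `noverlap`. The small helper below spells out that arithmetic; the function name and the example values (2048 and 512) are placeholders, not the values in `configs/default.yaml`.

```python
def stft_resolution(sr, n_fft, hop_length):
    """Frequency bin width (Hz), frame step (s), and the matching noverlap for an STFT."""
    return {
        "bin_width_hz": sr / n_fft,         # spacing between frequency bins
        "frame_step_s": hop_length / sr,    # spacing between time frames
        "noverlap": n_fft - hop_length,     # samples shared by consecutive frames
    }


res = stft_resolution(sr=48000, n_fft=2048, hop_length=512)
print(res)
```

At 48 kHz with a 2048-sample window, each bin is about 23.4 Hz wide, so the 50-200 Hz rumble band spans only a handful of bins; a longer window sharpens the low-frequency picture at the cost of time resolution.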

## 9. Visualization: Feature Distributions

In [None]:
# Select key features to plot
plot_features = [
    'peak_amplitude', 'rms', 'crest_factor', 'zero_crossing_rate',
    'spectral_centroid', 'spectral_bandwidth', 'dominant_frequency',
    'kurtosis', 'skewness'
]

# Filter to existing columns
plot_features = [f for f in plot_features if f in features_df.columns]

n_features = len(plot_features)
n_cols = 3
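# Keep at least one row so plt.subplots does not fail when no features are available
n_rows = max(1, (n_features + n_cols - 1) // n_cols)

# squeeze=False always returns a 2-D array of axes, so flatten() works for any grid shape
fig, axes = plt.subplots(n_rows, n_cols, figsize=(14, n_rows * 3), squeeze=False)
axes = axes.flatten()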

for i, feature in enumerate(plot_features):
    ax = axes[i]
    data = features_df[feature].dropna()
    
    if len(data) > 0:
        ax.hist(data, bins=10, color='steelblue', edgecolor='black', alpha=0.7)
        ax.set_title(feature, fontsize=11, fontweight='bold')
        ax.set_xlabel('Value', fontsize=10)
        ax.set_ylabel('Count', fontsize=10)
        ax.grid(True, alpha=0.3)

# Hide unused subplots
for i in range(n_features, len(axes)):
    axes[i].axis('off')

plt.tight_layout()
hist_path = output_dir / 'feature_distributions.png'
plt.savefig(hist_path, dpi=300)
plt.show()

print(f"✓ Feature distributions saved: {hist_path}")

## 10. Export Results: CSV, JSON, and Audio

In [None]:
# Save events CSV
events_df = pd.DataFrame(events)
events_path = output_dir / 'events.csv'
events_df.to_csv(events_path, index=False)
print(f"✓ Events saved: {events_path}")

# Save features CSV
features_path = output_dir / 'features.csv'
features_df.to_csv(features_path, index=False)
print(f"✓ Features saved: {features_path}")

# Save processed audio
audio_path = output_dir / 'audio_processed.wav'
io.save_audio(audio_path, audio_processed, sr, bit_depth=24, overwrite=True)
print(f"✓ Audio saved: {audio_path}")

# Create and save metadata
meta = metadata.create_metadata(
    source_url='synthetic_demo',
    config=config,
    additional_info={
        'num_events': len(events),
        'audio_duration': len(audio_processed) / sr,
        'sample_rate': sr,
        'notebook': '00_quickstart.ipynb'
    }
)

meta_path = output_dir / 'meta.json'
metadata.save_metadata(meta, meta_path)
print(f"✓ Metadata saved: {meta_path}")

print(f"\n{'='*60}")
print("All results saved to:")
print(f"  {output_dir}")
print(f"{'='*60}")

## Summary

This notebook ran the complete thunder acoustics analysis pipeline on synthetic audio:

✅ **Preprocessing**: Bandpass filter (20-6000 Hz), noise reduction, normalization  
✅ **Event Detection**: Combined energy + spectral flux detection  
✅ **Feature Extraction**: 15+ acoustic features (time/frequency/statistical)  
✅ **Visualization**: Waveform, spectrogram, feature distributions  
✅ **Export**: CSV, JSON, WAV files  

### Next Steps

1. **Real Data**: Use `scripts/extract_audio.py` to process actual video files
2. **Parameter Tuning**: Try `notebooks/10_event_detection_eval.ipynb` for threshold sensitivity analysis
3. **Feature Analysis**: Explore `notebooks/20_features_scouting.ipynb` for correlation and PCA
4. **Batch Processing**: Use `Makefile` targets for automated pipelines

### CLI Usage

```bash
# Extract audio from video
python scripts/extract_audio.py --video video.mp4 --output audio.wav

# Detect events
python scripts/detect_events.py --audio audio.wav --output events.csv

# Compute features
python scripts/compute_features.py --audio audio.wav --events events.csv --output features.csv

# Visualize
python scripts/visualize.py --audio audio.wav --events events.csv --features features.csv --output-dir outputs/viz/

# Generate report
python scripts/build_report.py --events events.csv --features features.csv --meta meta.json --viz-dir outputs/viz/ --output report.md
```