# 01 EDA – Multi-Station Lightning Waveforms  ⚡

This notebook lets you **see** what the simulator produced and builds
intuition before you train any model.

1.  Load London & Paris waveforms (memory-mapped, so nothing is pulled into RAM up front) and print the event catalogue
2.  First-second overview + raw spectrogram
3.  Baseline PSD of the first second
4.  Zoom and inspect one strike at **both** stations (delay & attenuation)
5.  Cross-correlation lag between LON and PAR

---

> **Adjust paths** if you changed the folder names.


In [None]:
# ╔══════════════════════════════════════════════════════════╗
# ║ 0. Imports and file paths                                ║
# ╚══════════════════════════════════════════════════════════╝
import json
import pathlib

import matplotlib.pyplot as plt
import numpy as np
import scipy.signal as sig

root   = pathlib.Path("../data/synthetic")          # relative to notebook
npy_L  = root / "storm1_LON.npy"
npy_P  = root / "storm1_PAR.npy"
meta_f = root / "storm1_meta.json"


In [None]:
# ╔══════════════════════════════════════════════════════════╗
# ║ 1. Load waveforms (memory-mapped) & event catalogue       ║
# ╚══════════════════════════════════════════════════════════╝
lon = np.load(npy_L, mmap_mode='r')     # shape (N,)
par = np.load(npy_P, mmap_mode='r')

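with open(meta_f) as fh:               # context manager closes the file handle
    meta = json.load(fh)
fs     = int(meta["fs"])               # sample rate in Hz (100 kHz); int so it can be used as a slice index
events = meta["events"]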

print("Samples per station:", lon.shape[0])
print("Lightning events embedded:", len(events))
for ev in events[:5]:
    print(ev)


In [None]:
# ╔══════════════════════════════════════════════════════════╗
# ║ 2. Overview – first second                               ║
# ╚══════════════════════════════════════════════════════════╝
lon_1s, par_1s = lon[:fs], par[:fs]

plt.figure(figsize=(12,3))
plt.plot(np.arange(fs)/fs, lon_1s, label="London")
plt.plot(np.arange(fs)/fs, par_1s, label="Paris", alpha=0.7)
plt.title("Raw E-field – first second"); plt.xlabel("Time (s)")
plt.ylabel("Amplitude (arb.)"); plt.legend(); plt.show()

plt.figure(figsize=(10,4))
plt.specgram(lon_1s, Fs=fs, NFFT=2048, noverlap=1024, cmap="viridis")
plt.title("Spectrogram – London, 0-1 s")
plt.xlabel("Time (s)"); plt.ylabel("Freq (Hz)")
plt.colorbar(label="dB"); plt.show()


In [None]:
# ╔══════════════════════════════════════════════════════════╗
# ║ 3. Power-spectral density (baseline noise)               ║
# ╚══════════════════════════════════════════════════════════╝
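# Assumes the first second is strike-free; if the catalogue lists an event
# before t = 1 s, pick a quiet window instead so this really is baseline noise.
f, Pxx = sig.welch(lon_1s, fs=fs, nperseg=4096)
plt.semilogy(f, Pxx); plt.xlim(0, 5000)
plt.title("PSD – noise segment"); plt.xlabel("Frequency (Hz)"); plt.ylabel("PSD")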
plt.show()


In [None]:
# ╔══════════════════════════════════════════════════════════╗
# ║ 4. Zoom ±40 ms around *first* strike – both stations      ║
# ╚══════════════════════════════════════════════════════════╝
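centre = events[0]["t"]               # strike time in seconds, as stored in the catalogue
win = 0.08                            # 80 ms window, i.e. ±40 ms around the strike
s = max(0, int((centre - win/2) * fs))             # clamp: an early strike must not give a negative index
e = min(lon.shape[0], int((centre + win/2) * fs))  # clamp to the end of the recording
seg_L, seg_P = lon[s:e], par[s:e]
t_axis = np.arange(seg_L.size)/fs     # seconds from the start of the window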

plt.figure(figsize=(12,3))
plt.plot(t_axis, seg_L, label="London")
plt.plot(t_axis, seg_P, label="Paris", alpha=0.7)
plt.title("Strike waveform at two stations (±40 ms)")
plt.xlabel("Time (s)"); plt.ylabel("Amplitude"); plt.legend(); plt.show()

plt.figure(figsize=(10,4))
plt.specgram(seg_L, Fs=fs, NFFT=512, noverlap=256, cmap="magma",
             vmin=-110, vmax=-70)
plt.title("High-res spectrogram – London burst")
plt.ylim(0, 20_000); plt.colorbar(label="dB"); plt.show()


In [None]:
# ╔══════════════════════════════════════════════════════════╗
# ║ 5. Cross-correlation lag between LON and PAR             ║
# ╚══════════════════════════════════════════════════════════╝
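# Remove each segment's mean first so a DC offset cannot dominate the correlation
seg_P0 = np.asarray(seg_P, dtype=float) - np.mean(seg_P)
seg_L0 = np.asarray(seg_L, dtype=float) - np.mean(seg_L)
corr = sig.correlate(seg_P0, seg_L0, mode='full')
lags = sig.correlation_lags(len(seg_P0), len(seg_L0), mode='full') / fs
peak = lags[np.argmax(corr)]          # positive lag: the strike reaches PAR after LON
print(f"Peak cross-corr lag (PAR relative to LON): {peak*1e3:.2f} ms")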

plt.figure(figsize=(8,3))
plt.plot(lags*1e3, corr)
plt.title("Cross-correlation PAR ↔ LON")
plt.xlabel("Lag (ms)"); plt.ylabel("Correlation")
plt.axvline(peak*1e3, color='r')           # mark the peak lag (Matplotlib's keyword is `color`, not `colour`)
plt.show()


## Interpretation & next steps

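* **Lag ≈ x ms** should match the difference between the two strike-to-station haversine distances divided by the propagation speed (≈ 300 km ms⁻¹, i.e. 3 × 10⁵ km s⁻¹), which makes it a quick sanity check.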
* Spectrogram shows a clean ~5–10 kHz damped sinusoid, so models can learn frequency content.
* Use this notebook to pick interesting windows for auto-encoder reconstruction, STA/LTA thresholds, or NCD peaks.
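
The lag sanity check in the first bullet can be sketched as follows. This is a minimal illustration and not the simulator's own code: the station coordinates are assumed city-centre values, the strike position is hypothetical, and the names `haversine_km`, `expected_lag_ms`, `LON_STATION`, `PAR_STATION` and `strike_latlon` are made up for this example. Replace the coordinates with whatever the simulator actually used.

```python
import math

C_KM_PER_MS = 299.792458      # speed of light in km per millisecond
EARTH_RADIUS_KM = 6371.0      # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def expected_lag_ms(strike, station_a, station_b):
    """Arrival-time difference (b minus a) in ms for a strike at `strike`."""
    d_a = haversine_km(*strike, *station_a)
    d_b = haversine_km(*strike, *station_b)
    return (d_b - d_a) / C_KM_PER_MS

# Assumed station coordinates (city centres), not taken from the simulator
LON_STATION = (51.5074, -0.1278)
PAR_STATION = (48.8566, 2.3522)
strike_latlon = (50.0, 1.0)   # hypothetical strike position

baseline_km = haversine_km(*LON_STATION, *PAR_STATION)
lag_ms = expected_lag_ms(strike_latlon, LON_STATION, PAR_STATION)
print(f"LON-PAR baseline: {baseline_km:.1f} km")
print(f"Expected lag (PAR relative to LON): {lag_ms:.3f} ms")
```

Whatever the strike position, the magnitude of the lag cannot exceed the station baseline divided by the propagation speed, so a cross-correlation peak outside that range points to a wrong window or a spurious peak.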

Increase `minutes` in the simulator or raise `amp` to create a busier storm and re-run the same notebook for richer plots.
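
For the STA/LTA thresholds mentioned above, a rough starting point might look like the sketch below. It is a generic trailing-window energy ratio and not taken from the simulator or any particular library. The window lengths, the threshold of 4 and the synthetic 7 kHz burst are all assumptions chosen for illustration, and `sta_lta`, `x_demo` and `fs_demo` are hypothetical names.

```python
import numpy as np

def sta_lta(x, fs, sta_s=0.001, lta_s=0.02):
    """Trailing-window STA/LTA ratio of signal energy; same length as x.

    Samples before the first full LTA window are left at 0.
    """
    n_sta = max(1, int(sta_s * fs))
    n_lta = max(n_sta + 1, int(lta_s * fs))
    energy = np.asarray(x, dtype=float) ** 2
    csum = np.concatenate(([0.0], np.cumsum(energy)))
    ratio = np.zeros(energy.size)
    ends = np.arange(n_lta, energy.size + 1)          # exclusive end of each window
    sta = (csum[ends] - csum[ends - n_sta]) / n_sta   # short-term mean energy
    lta = (csum[ends] - csum[ends - n_lta]) / n_lta   # long-term mean energy
    ratio[ends - 1] = sta / np.maximum(lta, 1e-20)    # guard against division by zero
    return ratio

# Synthetic demo: white noise plus one damped 7 kHz burst starting at 50 ms
fs_demo = 100_000
rng = np.random.default_rng(0)
x_demo = 0.01 * rng.standard_normal(fs_demo // 10)    # 100 ms of noise
onset = 5_000                                         # sample index of the burst
t_burst = np.arange(x_demo.size - onset) / fs_demo
x_demo[onset:] += np.exp(-t_burst / 0.002) * np.sin(2 * np.pi * 7_000 * t_burst)

ratio = sta_lta(x_demo, fs_demo)
threshold = 4.0                                       # assumed trigger level
first_hit = int(np.argmax(ratio > threshold))         # first sample above threshold
print(f"Trigger at {first_hit / fs_demo * 1e3:.2f} ms (burst starts at 50.00 ms)")
```

On real data, pass one of the memory-mapped slices (for example the zoomed London segment) and `fs` instead of the synthetic signal, then tune the windows and threshold against the event catalogue.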
