
# DataChange

In simulation, sensors are perfect: they provide clean, noise-free measurements at exactly the rate we specify. In hardware, reality is messier:

- **IMUs** accumulate drift and have high-frequency noise
- **Encoders** suffer from contact bounce when switching states
- **Wireless sensors** drop packets randomly
- **Analog sensors** are quantized into discrete levels by the ADC that reads them
- **Force sensors** have uncalibrated bias offsets
- **Temperature sensors** drift as components warm up

The `pykal.data_change` module helps you:
1. **Corrupt clean signals** to simulate realistic hardware issues (`corrupt` class)
2. **Prepare corrupted signals** for use in estimators and controllers (`prepare` class)

This notebook demonstrates all available corruption and preparation methods with visual examples.

## Setup

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from pykal.data_change import corrupt, prepare

# Set random seed for reproducibility
np.random.seed(42)

## Generate Clean Test Signals

We'll create several standard signals to demonstrate corruption and preparation methods.

In [None]:
# Time vector
t = np.linspace(0, 10, 1000)

# Signal 1: Sine wave (smooth, continuous)
sine_wave = np.sin(2 * np.pi * 0.5 * t)

# Signal 2: Step function (discontinuous, for testing bounce/debounce)
step_signal = np.zeros_like(t)
step_signal[200:400] = 1.0
step_signal[600:800] = 1.0

# Signal 3: Ramp (linearly increasing)
ramp_signal = t / 10.0

# Signal 4: Constant (for testing drift/bias)
constant_signal = np.ones_like(t) * 5.0

# Visualize clean signals
fig, axs = plt.subplots(2, 2, figsize=(14, 8))

axs[0, 0].plot(t, sine_wave, "b-", linewidth=2)
axs[0, 0].set_title("Sine Wave", fontsize=14, fontweight="bold")
axs[0, 0].set_ylabel("Amplitude", fontsize=11)
axs[0, 0].grid(True, alpha=0.3)

axs[0, 1].plot(t, step_signal, "g-", linewidth=2)
axs[0, 1].set_title("Step Function", fontsize=14, fontweight="bold")
axs[0, 1].set_ylabel("Amplitude", fontsize=11)
axs[0, 1].grid(True, alpha=0.3)

axs[1, 0].plot(t, ramp_signal, "r-", linewidth=2)
axs[1, 0].set_title("Ramp Signal", fontsize=14, fontweight="bold")
axs[1, 0].set_xlabel("Time (s)", fontsize=11)
axs[1, 0].set_ylabel("Amplitude", fontsize=11)
axs[1, 0].grid(True, alpha=0.3)

axs[1, 1].plot(t, constant_signal, "m-", linewidth=2)
axs[1, 1].set_title("Constant Signal", fontsize=14, fontweight="bold")
axs[1, 1].set_xlabel("Time (s)", fontsize=11)
axs[1, 1].set_ylabel("Amplitude", fontsize=11)
axs[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Part 1: Signal Corruption (Clean → Corrupted)

The `corrupt` class provides methods to simulate common hardware issues. Each method is static and takes a NumPy array as input.

### 1.1 Gaussian Noise

**Hardware source**: Thermal noise in analog sensors, quantization errors, ADC noise

**Common in**: IMUs, force sensors, position sensors

In [None]:
# Scalar noise (same std for all elements)
noisy_sine = corrupt.with_gaussian_noise(sine_wave, std=0.2, seed=42)

# Visualize
plt.figure(figsize=(12, 5))
plt.plot(t, sine_wave, "b-", label="Clean Signal", linewidth=2, alpha=0.7)
plt.plot(
    t, noisy_sine, "r-", label="With Gaussian Noise (σ=0.2)", linewidth=1, alpha=0.6
)
plt.xlabel("Time (s)", fontsize=12)
plt.ylabel("Amplitude", fontsize=12)
plt.title("Gaussian Noise Corruption", fontsize=14, fontweight="bold")
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.show()

print(f"Original std: {np.std(sine_wave):.4f}")
print(f"Corrupted std: {np.std(noisy_sine):.4f}")
print(f"Noise added: {np.std(noisy_sine - sine_wave):.4f}")

**Multivariate Gaussian Noise** (with covariance matrix):

Use when different components have different noise levels or are correlated.
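
To make the role of the covariance matrix concrete, here is a minimal NumPy-only sketch (not pykal's implementation) that draws correlated noise and checks it against the requested covariance. The matrix values are an assumed example; the diagonal holds per-axis variances, so the per-axis standard deviation is its square root.

```python
import numpy as np

# Assumed example covariance: variances on the diagonal,
# covariances between axes off the diagonal.
Q = np.array([[0.01, 0.002, 0.0],
              [0.002, 0.05, 0.01],
              [0.0, 0.01, 0.1]])

rng = np.random.default_rng(0)
# Draw 20000 zero-mean noise vectors with covariance Q.
noise = rng.multivariate_normal(mean=np.zeros(3), cov=Q, size=20000)

# Per-axis standard deviation is the square root of the diagonal.
sigma = np.sqrt(np.diag(Q))
# The sample covariance approaches Q as the sample count grows.
Q_hat = np.cov(noise, rowvar=False)
print("sigma per axis:", np.round(sigma, 3))
print("max |Q_hat - Q|:", np.abs(Q_hat - Q).max())
```

A covariance matrix must be symmetric and positive semi-definite; a matrix that is not will not describe a valid noise distribution.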

In [None]:
# 3D IMU example: accelerometer with different noise per axis
imu_data = np.column_stack([sine_wave[:100], ramp_signal[:100], constant_signal[:100]])

# Covariance: x has low noise, y medium, z high (with small correlations)
Q = np.array([[0.01, 0.002, 0.0], [0.002, 0.05, 0.01], [0.0, 0.01, 0.1]])

noisy_imu = np.array(
    [corrupt.with_gaussian_noise(row, Q=Q, seed=i) for i, row in enumerate(imu_data)]
)

# Visualize
fig, axs = plt.subplots(1, 3, figsize=(15, 4))
axes_names = ["X Axis (σ=0.1)", "Y Axis (σ=0.22)", "Z Axis (σ=0.32)"]

for i in range(3):
    axs[i].plot(imu_data[:, i], "b-", label="Clean", linewidth=2, alpha=0.7)
    axs[i].plot(noisy_imu[:, i], "r-", label="Noisy", linewidth=1, alpha=0.6)
    axs[i].set_title(axes_names[i], fontsize=12, fontweight="bold")
    axs[i].set_xlabel("Sample", fontsize=10)
    axs[i].legend()
    axs[i].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### 1.2 Contact Bounce

**Hardware source**: Mechanical switches, encoders, limit switches

**Common in**: Digital inputs, rotary encoders, tactile sensors

In [None]:
bounced_step = corrupt.with_bounce(step_signal, duration=5, amplitude=0.3, seed=42)

# Zoom into transition region
fig, axs = plt.subplots(1, 2, figsize=(14, 5))

# Full signal
axs[0].plot(t, step_signal, "b-", label="Clean Step", linewidth=2)
axs[0].plot(t, bounced_step, "r-", label="With Bounce", linewidth=1.5, alpha=0.8)
axs[0].set_xlabel("Time (s)", fontsize=12)
axs[0].set_ylabel("Amplitude", fontsize=12)
axs[0].set_title("Contact Bounce (Full Signal)", fontsize=14, fontweight="bold")
axs[0].legend(fontsize=11)
axs[0].grid(True, alpha=0.3)

# Zoomed transition (first rising edge)
zoom_start, zoom_end = 195, 215
axs[1].plot(
    t[zoom_start:zoom_end],
    step_signal[zoom_start:zoom_end],
    "b-",
    label="Clean Step",
    linewidth=3,
)
axs[1].plot(
    t[zoom_start:zoom_end],
    bounced_step[zoom_start:zoom_end],
    "r-",
    label="With Bounce",
    linewidth=2,
    alpha=0.8,
)
axs[1].set_xlabel("Time (s)", fontsize=12)
axs[1].set_ylabel("Amplitude", fontsize=12)
axs[1].set_title("Contact Bounce (Zoomed Transition)", fontsize=14, fontweight="bold")
axs[1].legend(fontsize=11)
axs[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### 1.3 Dropouts (Packet Loss)

**Hardware source**: Wireless communication, intermittent connections, sensor failures

**Common in**: WiFi sensors, Bluetooth devices, CAN bus communication
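
Conceptually, a dropout model is an independent coin flip per sample. The sketch below is a NumPy-only illustration under that assumption; `drop_samples` is a hypothetical helper, not the pykal function.

```python
import numpy as np

def drop_samples(signal, dropout_rate, seed=None):
    """Hypothetical helper: replace a random fraction of samples with NaN."""
    rng = np.random.default_rng(seed)
    out = np.asarray(signal, dtype=float).copy()
    lost = rng.random(out.shape) < dropout_rate  # True where a packet is lost
    out[lost] = np.nan
    return out

x = np.sin(np.linspace(0, 2 * np.pi, 1000))
y = drop_samples(x, dropout_rate=0.2, seed=0)
print(f"observed dropout rate: {np.isnan(y).mean():.2%}")
```

Because each sample is dropped independently, the observed rate only approximates the requested one on a finite signal.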

In [None]:
dropped_sine = corrupt.with_dropouts(sine_wave, dropout_rate=0.2, seed=42)

# Visualize
plt.figure(figsize=(12, 5))
plt.plot(t, sine_wave, "b-", label="Clean Signal", linewidth=2, alpha=0.5)
plt.plot(t, dropped_sine, "r.", label="With 20% Dropouts (NaN)", markersize=3)
plt.xlabel("Time (s)", fontsize=12)
plt.ylabel("Amplitude", fontsize=12)
plt.title("Packet Loss / Dropouts", fontsize=14, fontweight="bold")
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.show()

print(f"Total samples: {len(sine_wave)}")
print(f"Dropped samples: {np.isnan(dropped_sine).sum()}")
print(f"Dropout rate: {np.isnan(dropped_sine).sum() / len(sine_wave):.2%}")

### 1.4 Constant Bias

**Hardware source**: Uncalibrated sensors, zero-offset errors

**Common in**: IMU accelerometers/gyroscopes, force/torque sensors, pressure sensors

In [None]:
biased_sine = corrupt.with_bias(sine_wave, bias=1.5)

# Visualize
plt.figure(figsize=(12, 5))
plt.plot(t, sine_wave, "b-", label="Clean Signal (zero-mean)", linewidth=2, alpha=0.7)
plt.plot(t, biased_sine, "r-", label="With Bias (+1.5)", linewidth=2, alpha=0.7)
plt.axhline(y=0, color="k", linestyle=":", alpha=0.5, label="Zero Line")
plt.axhline(y=1.5, color="gray", linestyle=":", alpha=0.5, label="Bias Level")
plt.xlabel("Time (s)", fontsize=12)
plt.ylabel("Amplitude", fontsize=12)
plt.title("Constant Bias Offset", fontsize=14, fontweight="bold")
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.show()

print(f"Original mean: {np.mean(sine_wave):.4f}")
print(f"Biased mean: {np.mean(biased_sine):.4f}")
print(f"Bias: {np.mean(biased_sine) - np.mean(sine_wave):.4f}")

### 1.5 Drift (Time-Dependent Bias)

**Hardware source**: Component warm-up, sensor degradation, temperature changes

**Common in**: Gyroscopes, pressure sensors, temperature sensors

In [None]:
# Linear drift
drifted_constant_linear = corrupt.with_drift(
    constant_signal, drift_rate=0.1, drift_type="linear"
)

# Exponential drift
drifted_constant_exp = corrupt.with_drift(
    constant_signal, drift_rate=0.05, drift_type="exponential"
)

# Visualize
fig, axs = plt.subplots(1, 2, figsize=(14, 5))

# Linear drift
axs[0].plot(t, constant_signal, "b-", label="Clean Constant", linewidth=2, alpha=0.7)
axs[0].plot(t, drifted_constant_linear, "r-", label="With Linear Drift", linewidth=2)
axs[0].set_xlabel("Time (s)", fontsize=12)
axs[0].set_ylabel("Amplitude", fontsize=12)
axs[0].set_title("Linear Drift (sensor warm-up)", fontsize=14, fontweight="bold")
axs[0].legend(fontsize=11)
axs[0].grid(True, alpha=0.3)

# Exponential drift
axs[1].plot(t, constant_signal, "b-", label="Clean Constant", linewidth=2, alpha=0.7)
axs[1].plot(t, drifted_constant_exp, "g-", label="With Exponential Drift", linewidth=2)
axs[1].set_xlabel("Time (s)", fontsize=12)
axs[1].set_ylabel("Amplitude", fontsize=12)
axs[1].set_title("Exponential Drift (thermal effects)", fontsize=14, fontweight="bold")
axs[1].legend(fontsize=11)
axs[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### 1.6 Quantization (ADC Resolution Limits)

**Hardware source**: Analog-to-digital converters with limited bit depth

**Common in**: All analog sensors (8-bit, 10-bit, 12-bit ADCs)
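
A uniform quantizer can be sketched in a few lines. This is an illustration under the assumption of evenly spaced levels over a known input range; `quantize` and its `lo`/`hi` arguments are hypothetical and may differ from how pykal chooses the range.

```python
import numpy as np

def quantize(signal, levels, lo, hi):
    """Hypothetical uniform quantizer: snap to `levels` evenly spaced values in [lo, hi]."""
    step = (hi - lo) / (levels - 1)              # spacing between adjacent levels
    codes = np.round((np.clip(signal, lo, hi) - lo) / step)
    return lo + codes * step

x = np.sin(np.linspace(0, 2 * np.pi, 1000))
for bits in (4, 8):
    q = quantize(x, levels=2**bits, lo=-1.0, hi=1.0)
    step = 2.0 / (2**bits - 1)
    print(f"{bits}-bit: step={step:.4f}, max error={np.abs(q - x).max():.4f}")
```

With rounding to the nearest level, the quantization error never exceeds half a step, so each extra bit roughly halves the worst-case error.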

In [None]:
# 8-bit quantization (256 levels)
quantized_sine_8bit = corrupt.with_quantization(sine_wave, levels=256)

# 4-bit quantization (16 levels) - severe
quantized_sine_4bit = corrupt.with_quantization(sine_wave, levels=16)

# Visualize
fig, axs = plt.subplots(1, 2, figsize=(14, 5))

# 8-bit
axs[0].plot(
    t[:200],
    sine_wave[:200],
    "b-",
    label="Clean (infinite resolution)",
    linewidth=2,
    alpha=0.7,
)
axs[0].plot(
    t[:200],
    quantized_sine_8bit[:200],
    "r-",
    label="8-bit Quantized (256 levels)",
    linewidth=1.5,
)
axs[0].set_xlabel("Time (s)", fontsize=12)
axs[0].set_ylabel("Amplitude", fontsize=12)
axs[0].set_title("8-bit ADC Quantization", fontsize=14, fontweight="bold")
axs[0].legend(fontsize=11)
axs[0].grid(True, alpha=0.3)

# 4-bit
axs[1].plot(
    t[:200],
    sine_wave[:200],
    "b-",
    label="Clean (infinite resolution)",
    linewidth=2,
    alpha=0.7,
)
axs[1].plot(
    t[:200],
    quantized_sine_4bit[:200],
    "g-",
    label="4-bit Quantized (16 levels)",
    linewidth=1.5,
)
axs[1].set_xlabel("Time (s)", fontsize=12)
axs[1].set_ylabel("Amplitude", fontsize=12)
axs[1].set_title("4-bit ADC Quantization (Severe)", fontsize=14, fontweight="bold")
axs[1].legend(fontsize=11)
axs[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Unique values in clean signal: {len(np.unique(sine_wave))}")
print(f"Unique values in 8-bit signal: {len(np.unique(quantized_sine_8bit))}")
print(f"Unique values in 4-bit signal: {len(np.unique(quantized_sine_4bit))}")

### 1.7 Spikes / Outliers

**Hardware source**: Electromagnetic interference (EMI), electrical noise, sensor glitches

**Common in**: Unshielded sensors, high-EMI environments, motors turning on/off

In [None]:
spiked_sine = corrupt.with_spikes(
    sine_wave, spike_rate=0.02, spike_magnitude=3.0, seed=42
)

# Visualize
plt.figure(figsize=(12, 5))
plt.plot(t, sine_wave, "b-", label="Clean Signal", linewidth=2, alpha=0.7)
plt.plot(
    t, spiked_sine, "r-", label="With Spikes (2% rate, 3× magnitude)", linewidth=1.5
)
plt.xlabel("Time (s)", fontsize=12)
plt.ylabel("Amplitude", fontsize=12)
plt.title("EMI Spikes / Outliers", fontsize=14, fontweight="bold")
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.show()

spike_mask = np.abs(spiked_sine - sine_wave) > 0.5
print(f"Number of spikes: {spike_mask.sum()}")
print(f"Spike rate: {spike_mask.sum() / len(sine_wave):.2%}")

### 1.8 Saturation / Clipping

**Hardware source**: Sensor range limits, amplifier saturation

**Common in**: Force sensors at max load, encoders at mechanical stops, ADCs at voltage limits

In [None]:
# Amplify sine wave to show clipping
large_sine = sine_wave * 2.0
clipped_sine = corrupt.with_clipping(large_sine, lower=-1.0, upper=1.0)

# Visualize
plt.figure(figsize=(12, 5))
plt.plot(
    t, large_sine, "b-", label="Clean Signal (exceeds limits)", linewidth=2, alpha=0.7
)
plt.plot(t, clipped_sine, "r-", label="Clipped to [-1, 1]", linewidth=2)
plt.axhline(y=1.0, color="k", linestyle="--", alpha=0.5, label="Saturation Limits")
plt.axhline(y=-1.0, color="k", linestyle="--", alpha=0.5)
plt.xlabel("Time (s)", fontsize=12)
plt.ylabel("Amplitude", fontsize=12)
plt.title("Sensor Saturation / Clipping", fontsize=14, fontweight="bold")
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.show()

saturated_samples = np.sum(np.abs(clipped_sine) >= 0.99)
print(
    f"Saturated samples: {saturated_samples} ({saturated_samples / len(clipped_sine):.1%})"
)

### 1.9 Time Delay / Latency

**Hardware source**: Communication lag, slow sensors, processing delays

**Common in**: Network-based sensors, cameras with processing, filtered sensors

In [None]:
delayed_step = corrupt.with_delay(step_signal, delay=50, fill_value=0.0)

# Visualize
plt.figure(figsize=(12, 5))
plt.plot(t, step_signal, "b-", label="Clean Signal", linewidth=2, alpha=0.7)
plt.plot(t, delayed_step, "r-", label="Delayed by 50 samples", linewidth=2)
plt.xlabel("Time (s)", fontsize=12)
plt.ylabel("Amplitude", fontsize=12)
plt.title("Time Delay / Latency", fontsize=14, fontweight="bold")
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.show()

## Part 2: Signal Preparation (Corrupted → Clean)

The `prepare` class provides methods to clean corrupted signals before feeding them to estimators and controllers.

### 2.1 Moving Average Filter

**Best for**: Reducing Gaussian noise, smoothing signals

**Tradeoff**: Introduces lag, reduces bandwidth
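
The noise/lag tradeoff can be quantified. The sketch below is plain NumPy, not the pykal implementation; it treats the moving average as a convolution with a box kernel and checks the expected noise reduction on white noise.

```python
import numpy as np

rng = np.random.default_rng(0)
window = 10
noise = rng.normal(0.0, 0.3, size=100_000)

# A moving average is a convolution with a box kernel of height 1/window.
kernel = np.ones(window) / window
smoothed = np.convolve(noise, kernel, mode="valid")

# For independent noise the standard deviation drops by about sqrt(window).
print(f"input std:    {noise.std():.4f}")
print(f"smoothed std: {smoothed.std():.4f}")
print(f"predicted:    {0.3 / np.sqrt(window):.4f}")
# A causal window of N samples (past samples only) delays the signal
# by (N - 1) / 2 samples.
```

Doubling the window therefore buys only a factor of about 1.4 in noise reduction while doubling the lag.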

In [None]:
# Create noisy signal
noisy_sine_2 = corrupt.with_gaussian_noise(sine_wave, std=0.3, seed=43)

# Apply moving average
smoothed_ma = prepare.with_moving_average(noisy_sine_2, window=10)

# Visualize
plt.figure(figsize=(12, 5))
plt.plot(t, sine_wave, "b-", label="Original Clean Signal", linewidth=2, alpha=0.5)
plt.plot(
    t,
    noisy_sine_2,
    "gray",
    label="Corrupted (Gaussian noise)",
    linewidth=0.5,
    alpha=0.5,
)
plt.plot(t, smoothed_ma, "r-", label="Cleaned (Moving Avg, window=10)", linewidth=2)
plt.xlabel("Time (s)", fontsize=12)
plt.ylabel("Amplitude", fontsize=12)
plt.title("Moving Average Denoising", fontsize=14, fontweight="bold")
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.show()

print(f"Corrupted signal std: {np.std(noisy_sine_2):.4f}")
print(f"Cleaned signal std: {np.std(smoothed_ma):.4f}")
print(f"Recovery error (RMS): {np.sqrt(np.mean((smoothed_ma - sine_wave)**2)):.4f}")

### 2.2 Median Filter

**Best for**: Removing spikes and outliers while preserving edges

**Advantage**: Highly effective against impulse noise, doesn't blur edges as much as moving average
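
To see why a median preserves edges, consider a step with a single-sample spike. The sketch below is a NumPy-only illustration; `median_filter` is a hypothetical helper with edge padding chosen for simplicity, not the pykal function.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(signal, window):
    """Hypothetical centered median filter; `window` must be odd."""
    half = window // 2
    padded = np.pad(signal, half, mode="edge")   # repeat edge samples
    return np.median(sliding_window_view(padded, window), axis=-1)

step = np.r_[np.zeros(10), np.ones(10)]
step[5] = 4.0                                    # single-sample spike
print(median_filter(step, window=3))
```

The spike is discarded because it is never the middle value of its window, and the step at index 10 stays sharp. A moving average over the same window would instead smear both the spike and the edge.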

In [None]:
# Create spiked signal
spiked_sine_2 = corrupt.with_spikes(
    sine_wave, spike_rate=0.03, spike_magnitude=4.0, seed=44
)

# Apply median filter
cleaned_median = prepare.with_median_filter(spiked_sine_2, window=5)

# Visualize
plt.figure(figsize=(12, 5))
plt.plot(t, sine_wave, "b-", label="Original Clean Signal", linewidth=2, alpha=0.5)
plt.plot(
    t, spiked_sine_2, "gray", label="Corrupted (with spikes)", linewidth=0.5, alpha=0.5
)
plt.plot(
    t, cleaned_median, "r-", label="Cleaned (Median filter, window=5)", linewidth=2
)
plt.xlabel("Time (s)", fontsize=12)
plt.ylabel("Amplitude", fontsize=12)
plt.title("Median Filter: Spike Removal", fontsize=14, fontweight="bold")
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.show()

print(f"Recovery error (RMS): {np.sqrt(np.mean((cleaned_median - sine_wave)**2)):.4f}")

### 2.3 Exponential Smoothing

**Best for**: Real-time filtering, giving more weight to recent data

**Parameter**: α ∈ [0,1] where α=1 is no filtering, α=0 is infinite smoothing
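
The role of α is easiest to see in the recurrence itself. The following is a minimal sketch that assumes the standard form y[k] = α·x[k] + (1 − α)·y[k−1], initialised with the first sample; `exp_smooth` is a hypothetical helper and pykal's initialisation may differ.

```python
import numpy as np

def exp_smooth(signal, alpha):
    """Hypothetical recurrence: y[k] = alpha * x[k] + (1 - alpha) * y[k-1]."""
    y = np.empty(len(signal), dtype=float)
    y[0] = signal[0]                       # initialise with the first sample
    for k in range(1, len(signal)):
        y[k] = alpha * signal[k] + (1 - alpha) * y[k - 1]
    return y

x = np.r_[np.zeros(5), np.ones(15)]        # unit step at k = 5
print(np.round(exp_smooth(x, alpha=0.5)[5:9], 4))
```

With α = 0.5 the output closes half of the remaining gap to the input at every sample. With α = 1 the output equals the input, and with α = 0 it never leaves its initial value.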

In [None]:
# Test different alpha values
noisy_sine_3 = corrupt.with_gaussian_noise(sine_wave, std=0.25, seed=45)

smoothed_alpha_03 = prepare.with_exponential_smoothing(noisy_sine_3, alpha=0.3)
smoothed_alpha_05 = prepare.with_exponential_smoothing(noisy_sine_3, alpha=0.5)
smoothed_alpha_08 = prepare.with_exponential_smoothing(noisy_sine_3, alpha=0.8)

# Visualize
plt.figure(figsize=(12, 5))
plt.plot(t, sine_wave, "b-", label="Original Clean", linewidth=2, alpha=0.5)
plt.plot(t, noisy_sine_3, "gray", label="Corrupted", linewidth=0.5, alpha=0.3)
plt.plot(
    t,
    smoothed_alpha_03,
    "r-",
    label="α=0.3 (heavy smoothing)",
    linewidth=1.5,
    alpha=0.8,
)
plt.plot(t, smoothed_alpha_05, "g-", label="α=0.5 (moderate)", linewidth=1.5, alpha=0.8)
plt.plot(t, smoothed_alpha_08, "m-", label="α=0.8 (light)", linewidth=1.5, alpha=0.8)
plt.xlabel("Time (s)", fontsize=12)
plt.ylabel("Amplitude", fontsize=12)
plt.title("Exponential Smoothing with Different α", fontsize=14, fontweight="bold")
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.show()

### 2.4 Debounce

**Best for**: Removing contact bounce from digital signals

**Requires**: Signal to remain stable for minimum duration before accepting transition
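
The stability requirement can be sketched as a small state machine. This is an illustration only: `debounce`, its `level` argument, and the counting rule are assumptions and need not match how pykal interprets `threshold` and `min_duration`.

```python
import numpy as np

def debounce(signal, level=0.5, min_duration=3):
    """Hypothetical debouncer: accept a new logic state only after it has
    persisted for `min_duration` consecutive samples."""
    raw = np.asarray(signal) > level       # threshold to a binary state
    out = np.empty(len(raw), dtype=float)
    state, count = raw[0], 0
    for k, s in enumerate(raw):
        # Length of the current run of samples that disagree with `state`.
        count = count + 1 if s != state else 0
        if count >= min_duration:
            state, count = s, 0            # transition accepted
        out[k] = state
    return out

bouncy = np.array([0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1], dtype=float)
print(debounce(bouncy, min_duration=3))
```

The output switches only once, `min_duration - 1` samples after the input finally settles, which is the latency paid for rejecting the bounce.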

In [None]:
# Create bounced signal
bounced_step_2 = corrupt.with_bounce(step_signal, duration=8, amplitude=0.4, seed=46)

# Apply debounce
debounced = prepare.with_debounce(bounced_step_2, threshold=0.2, min_duration=3)

# Visualize - zoom into transition
zoom_start, zoom_end = 195, 230
plt.figure(figsize=(12, 5))
plt.plot(
    t[zoom_start:zoom_end],
    step_signal[zoom_start:zoom_end],
    "b-",
    label="Original Clean",
    linewidth=3,
    alpha=0.5,
)
plt.plot(
    t[zoom_start:zoom_end],
    bounced_step_2[zoom_start:zoom_end],
    "gray",
    label="Corrupted (bounce)",
    linewidth=1.5,
    alpha=0.7,
)
plt.plot(
    t[zoom_start:zoom_end],
    debounced[zoom_start:zoom_end],
    "r-",
    label="Cleaned (debounced)",
    linewidth=2,
)
plt.xlabel("Time (s)", fontsize=12)
plt.ylabel("Amplitude", fontsize=12)
plt.title("Debounce Filter (Zoomed Transition)", fontsize=14, fontweight="bold")
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.show()

### 2.5 Outlier Removal (Z-Score Method)

**Best for**: Detecting and handling statistical outliers

**Methods**: Replace with median or interpolate from neighbors
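
As a rough illustration of the z-score idea, the sketch below flags samples that lie more than `threshold` standard deviations from the mean and replaces them with the median. `remove_outliers` is a hypothetical helper using global statistics; pykal may compute its statistics differently.

```python
import numpy as np

def remove_outliers(signal, threshold=2.5):
    """Hypothetical z-score filter: replace flagged samples with the median."""
    x = np.asarray(signal, dtype=float)
    z = (x - x.mean()) / x.std()           # distance from the mean in std units
    out = x.copy()
    out[np.abs(z) > threshold] = np.median(x)
    return out

x = np.sin(np.linspace(0, 2 * np.pi, 200))
x[[20, 90, 150]] += 8.0                    # three injected outliers
cleaned = remove_outliers(x)
print(f"samples replaced: {(cleaned != x).sum()}")
```

Large outliers inflate the mean and standard deviation they are measured against, so a threshold that works for a few spikes may miss outliers when many are present.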

In [None]:
# Create signal with outliers
spiked_sine_3 = corrupt.with_spikes(
    sine_wave, spike_rate=0.05, spike_magnitude=5.0, seed=47
)

# Remove outliers (replace method)
cleaned_replace = prepare.with_outlier_removal(
    spiked_sine_3, threshold=2.5, method="replace"
)

# Remove outliers (interpolate method)
cleaned_interp = prepare.with_outlier_removal(
    spiked_sine_3, threshold=2.5, method="interpolate"
)

# Visualize
fig, axs = plt.subplots(1, 2, figsize=(14, 5))

axs[0].plot(t, sine_wave, "b-", label="Original Clean", linewidth=2, alpha=0.5)
axs[0].plot(
    t, spiked_sine_3, "gray", label="Corrupted (outliers)", linewidth=0.5, alpha=0.5
)
axs[0].plot(
    t, cleaned_replace, "r-", label="Cleaned (replace with median)", linewidth=1.5
)
axs[0].set_xlabel("Time (s)", fontsize=12)
axs[0].set_ylabel("Amplitude", fontsize=12)
axs[0].set_title("Outlier Removal: Replace Method", fontsize=14, fontweight="bold")
axs[0].legend(fontsize=11)
axs[0].grid(True, alpha=0.3)

axs[1].plot(t, sine_wave, "b-", label="Original Clean", linewidth=2, alpha=0.5)
axs[1].plot(
    t, spiked_sine_3, "gray", label="Corrupted (outliers)", linewidth=0.5, alpha=0.5
)
axs[1].plot(t, cleaned_interp, "g-", label="Cleaned (interpolate)", linewidth=1.5)
axs[1].set_xlabel("Time (s)", fontsize=12)
axs[1].set_ylabel("Amplitude", fontsize=12)
axs[1].set_title("Outlier Removal: Interpolate Method", fontsize=14, fontweight="bold")
axs[1].legend(fontsize=11)
axs[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### 2.6 Interpolation (Fill Missing Data)

**Best for**: Handling dropouts and missing sensor readings (NaN values)

**Methods**: Linear or nearest-neighbor interpolation
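
Linear gap filling can be sketched directly with `np.interp`. `fill_linear` is a hypothetical helper shown for illustration, not the pykal function, and its handling of gaps at the ends is an assumption.

```python
import numpy as np

def fill_linear(signal):
    """Hypothetical NaN filler using np.interp over sample indices."""
    x = np.asarray(signal, dtype=float)
    idx = np.arange(len(x))
    valid = ~np.isnan(x)
    # np.interp holds the first/last valid value beyond the ends.
    return np.interp(idx, idx[valid], x[valid])

y = np.array([0.0, np.nan, 2.0, np.nan, np.nan, 5.0, np.nan])
print(fill_linear(y))
```

Interpolation needs the sample after the gap, so it suits offline processing. A real-time loop only has past samples available.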

In [None]:
# Create signal with dropouts
dropped_sine_2 = corrupt.with_dropouts(sine_wave, dropout_rate=0.15, seed=48)

# Interpolate linearly
filled_linear = prepare.with_interpolation(dropped_sine_2, method="linear")

# Interpolate nearest
filled_nearest = prepare.with_interpolation(dropped_sine_2, method="nearest")

# Visualize
fig, axs = plt.subplots(1, 2, figsize=(14, 5))

axs[0].plot(t, sine_wave, "b-", label="Original Clean", linewidth=2, alpha=0.5)
axs[0].plot(t, dropped_sine_2, "r.", label="Corrupted (dropouts)", markersize=3)
axs[0].plot(t, filled_linear, "g-", label="Filled (linear)", linewidth=1.5, alpha=0.8)
axs[0].set_xlabel("Time (s)", fontsize=12)
axs[0].set_ylabel("Amplitude", fontsize=12)
axs[0].set_title("Linear Interpolation", fontsize=14, fontweight="bold")
axs[0].legend(fontsize=11)
axs[0].grid(True, alpha=0.3)

axs[1].plot(t, sine_wave, "b-", label="Original Clean", linewidth=2, alpha=0.5)
axs[1].plot(t, dropped_sine_2, "r.", label="Corrupted (dropouts)", markersize=3)
axs[1].plot(t, filled_nearest, "m-", label="Filled (nearest)", linewidth=1.5, alpha=0.8)
axs[1].set_xlabel("Time (s)", fontsize=12)
axs[1].set_ylabel("Amplitude", fontsize=12)
axs[1].set_title("Nearest-Neighbor Interpolation", fontsize=14, fontweight="bold")
axs[1].legend(fontsize=11)
axs[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Dropped samples: {np.isnan(dropped_sine_2).sum()}")
print(f"Remaining NaN after linear interpolation: {np.isnan(filled_linear).sum()}")

### 2.7 Staleness Policy (Asynchronous Sensors)

**Best for**: Handling sensors that update at different rates or have intermittent data

**Policies**:
- `'zero'`: Replace missing with zeros
- `'hold'`: Forward-fill last valid value
- `'drop'`: Remove missing samples entirely
- `'none'`: Keep NaN as-is

**Critical for**: Multi-sensor fusion, asynchronous control systems
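
The `'hold'` idea corresponds to a forward fill. A vectorised NumPy sketch is shown below; `hold_last_valid` is a hypothetical helper, and leaving leading NaN values untouched is an assumption rather than documented pykal behavior.

```python
import numpy as np

def hold_last_valid(signal):
    """Hypothetical forward fill: repeat the most recent non-NaN sample."""
    x = np.asarray(signal, dtype=float)
    valid = ~np.isnan(x)
    # Index of the latest valid sample at or before each position (0 if none yet).
    last = np.maximum.accumulate(np.where(valid, np.arange(len(x)), 0))
    return x[last]

y = np.array([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])
print(hold_last_valid(y))
```

Samples before the first valid reading stay NaN here because there is nothing to hold yet, so downstream code should still be prepared for missing values at start-up.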

In [None]:
# Create signal with dropouts (simulating slow/intermittent sensor)
intermittent_signal = corrupt.with_dropouts(sine_wave[:200], dropout_rate=0.3, seed=49)

# Apply different policies
handled_zero = prepare.with_staleness_policy(intermittent_signal, policy="zero")
handled_hold = prepare.with_staleness_policy(intermittent_signal, policy="hold")
handled_drop = prepare.with_staleness_policy(intermittent_signal, policy="drop")

# Visualize
fig, axs = plt.subplots(2, 2, figsize=(14, 10))

# Original with dropouts
axs[0, 0].plot(t[:200], sine_wave[:200], "b-", label="Clean", linewidth=2, alpha=0.5)
axs[0, 0].plot(
    t[:200], intermittent_signal, "r.", label="Intermittent (30% dropout)", markersize=4
)
axs[0, 0].set_title("Original: Intermittent Sensor", fontsize=12, fontweight="bold")
axs[0, 0].legend()
axs[0, 0].grid(True, alpha=0.3)

# Zero policy
axs[0, 1].plot(t[:200], sine_wave[:200], "b-", label="Clean", linewidth=2, alpha=0.5)
axs[0, 1].plot(t[:200], handled_zero, "g-", label="Policy: 'zero'", linewidth=1.5)
axs[0, 1].set_title("Zero Policy (replace with 0)", fontsize=12, fontweight="bold")
axs[0, 1].legend()
axs[0, 1].grid(True, alpha=0.3)

# Hold policy
axs[1, 0].plot(t[:200], sine_wave[:200], "b-", label="Clean", linewidth=2, alpha=0.5)
axs[1, 0].plot(t[:200], handled_hold, "m-", label="Policy: 'hold'", linewidth=1.5)
axs[1, 0].set_xlabel("Time (s)", fontsize=11)
axs[1, 0].set_title("Hold Policy (forward-fill)", fontsize=12, fontweight="bold")
axs[1, 0].legend()
axs[1, 0].grid(True, alpha=0.3)

# Drop policy (shorter array): keep the timestamps of the surviving samples
valid = ~np.isnan(intermittent_signal)
t_drop = t[:200][valid]
sine_drop = sine_wave[:200][valid]
axs[1, 1].plot(t_drop, sine_drop, "b-", label="Clean", linewidth=2, alpha=0.5)
axs[1, 1].plot(
    t_drop,
    handled_drop,
    "c-",
    label="Policy: 'drop'",
    linewidth=1.5,
    marker="o",
    markersize=2,
)
axs[1, 1].set_xlabel("Time (s)", fontsize=11)
axs[1, 1].set_title("Drop Policy (remove NaN)", fontsize=12, fontweight="bold")
axs[1, 1].legend()
axs[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Original length: {len(intermittent_signal)}")
print(f"After 'drop' policy: {len(handled_drop)} samples")

### 2.8 Calibration (Remove Bias and Scale)

**Best for**: Correcting systematic sensor errors after measuring bias and scale

**Requires**: Offline calibration to determine offset and scale factor
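
One way to obtain the offset and scale offline is a least-squares line fit against known reference values. The sketch below assumes the correction has the form `(raw - offset) * scale`, which is consistent with the values used in this section but is not confirmed pykal documentation; the reference data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
reference = np.linspace(-1.0, 1.0, 50)                   # known applied values
raw = 2.0 * reference + 3.0 + rng.normal(0, 0.01, 50)    # simulated sensor reading

# Fit raw ≈ gain * reference + bias by least squares.
gain, bias = np.polyfit(reference, raw, deg=1)

# Assumed correction form: calibrated = (raw - offset) * scale.
offset, scale = bias, 1.0 / gain
calibrated = (raw - offset) * scale
print(f"offset={offset:.3f}, scale={scale:.3f}")
```

The recovered values should land close to an offset of 3 and a scale of 0.5, limited by the measurement noise in the reference run.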

In [None]:
# Simulate miscalibrated sensor: scaled by 2× and offset by +3
miscalibrated = sine_wave * 2.0 + 3.0

# Apply calibration (reverse the corruption)
calibrated = prepare.with_calibration(miscalibrated, offset=3.0, scale=0.5)

# Visualize
plt.figure(figsize=(12, 5))
plt.plot(t, sine_wave, "b-", label="True Signal (calibrated)", linewidth=2, alpha=0.5)
plt.plot(t, miscalibrated, "r-", label="Miscalibrated (2× + 3)", linewidth=2, alpha=0.7)
plt.plot(t, calibrated, "g--", label="After Calibration", linewidth=2)
plt.xlabel("Time (s)", fontsize=12)
plt.ylabel("Amplitude", fontsize=12)
plt.title(
    "Sensor Calibration (Bias and Scale Correction)", fontsize=14, fontweight="bold"
)
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.show()

print(f"Calibration error (RMS): {np.sqrt(np.mean((calibrated - sine_wave)**2)):.6f}")

### 2.9 Low-Pass Filter (First-Order RC Filter)

**Best for**: Attenuating high-frequency noise

**Parameter**: α ∈ [0,1] where α=0 is maximum filtering, α=1 is no filtering
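
It can help to relate α to a cutoff frequency. The sketch below assumes the common discretisation of an RC low-pass, y[k] = y[k−1] + α·(x[k] − y[k−1]) with α = dt / (RC + dt); whether pykal uses exactly this form is an assumption, and both helper functions are hypothetical.

```python
import numpy as np

def alpha_from_cutoff(f_cutoff_hz, dt):
    """Smoothing factor of a discretised RC low-pass: alpha = dt / (RC + dt)."""
    rc = 1.0 / (2.0 * np.pi * f_cutoff_hz)   # RC time constant in seconds
    return dt / (rc + dt)

def cutoff_from_alpha(alpha, dt):
    """Inverse relation: f_c = alpha / (2 * pi * dt * (1 - alpha))."""
    return alpha / (2.0 * np.pi * dt * (1.0 - alpha))

dt = 10.0 / 999                 # sample period of np.linspace(0, 10, 1000)
for fc in (0.5, 2.0, 10.0):
    print(f"cutoff {fc:5.1f} Hz -> alpha = {alpha_from_cutoff(fc, dt):.3f}")
print(f"alpha = 0.1 -> cutoff = {cutoff_from_alpha(0.1, dt):.2f} Hz")
```

Under this model the same α gives a different cutoff at a different sample rate, so α should be recomputed whenever the sampling period changes.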

In [None]:
# Create noisy signal
noisy_sine_4 = corrupt.with_gaussian_noise(sine_wave, std=0.3, seed=50)

# Apply low-pass filter
filtered_lp = prepare.with_low_pass_filter(noisy_sine_4, alpha=0.1)

# Visualize
plt.figure(figsize=(12, 5))
plt.plot(t, sine_wave, "b-", label="Original Clean", linewidth=2, alpha=0.5)
plt.plot(t, noisy_sine_4, "gray", label="Corrupted (noise)", linewidth=0.5, alpha=0.5)
plt.plot(t, filtered_lp, "r-", label="Cleaned (low-pass α=0.1)", linewidth=2)
plt.xlabel("Time (s)", fontsize=12)
plt.ylabel("Amplitude", fontsize=12)
plt.title("Low-Pass Filter (First-Order RC)", fontsize=14, fontweight="bold")
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.show()

### 2.10 Clipping Recovery

**Best for**: Detecting saturated sensor values and marking them as invalid

**Use case**: Identifying when force sensor hits max load, encoder at mechanical stop
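
Detection amounts to comparing each sample against the known sensor limits. The sketch below is a NumPy-only illustration; `mark_saturated` is a hypothetical helper, not the pykal function.

```python
import numpy as np

def mark_saturated(signal, lower, upper):
    """Hypothetical helper: set samples at or beyond the limits to NaN."""
    x = np.asarray(signal, dtype=float).copy()
    x[(x <= lower) | (x >= upper)] = np.nan
    return x

true = 2.5 * np.sin(np.linspace(0, 2 * np.pi, 200))
clipped = np.clip(true, -1.0, 1.0)
marked = mark_saturated(clipped, lower=-1.0, upper=1.0)
print(f"flagged {np.isnan(marked).sum()} of {len(marked)} samples")
```

The true value behind a saturated sample cannot be recovered from the sample alone, so marking it invalid lets a later stage interpolate or ignore it. A genuine reading that sits exactly on a limit is flagged as well.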

In [None]:
# Create clipped signal
large_sine_2 = sine_wave * 2.5
clipped_sine_2 = corrupt.with_clipping(large_sine_2, lower=-1.0, upper=1.0)

# Detect and mark clipped values as NaN
recovered = prepare.with_clipping_recovery(
    clipped_sine_2, upper=1.0, lower=-1.0, mark_invalid=True
)

# Visualize
plt.figure(figsize=(12, 5))
plt.plot(
    t, large_sine_2, "b-", label="True Signal (exceeds limits)", linewidth=2, alpha=0.5
)
plt.plot(t, clipped_sine_2, "r-", label="Clipped Signal", linewidth=2, alpha=0.7)
plt.plot(t, recovered, "g.", label="Valid Data (clipped marked as NaN)", markersize=2)
plt.axhline(y=1.0, color="k", linestyle="--", alpha=0.5, label="Saturation Limits")
plt.axhline(y=-1.0, color="k", linestyle="--", alpha=0.5)
plt.xlabel("Time (s)", fontsize=12)
plt.ylabel("Amplitude", fontsize=12)
plt.title("Clipping Detection and Recovery", fontsize=14, fontweight="bold")
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.show()

print(f"Total samples: {len(clipped_sine_2)}")
print(f"Clipped samples detected: {np.isnan(recovered).sum()}")
print(f"Valid samples remaining: {(~np.isnan(recovered)).sum()}")

## Part 3: Realistic Hardware Scenarios

Let's combine multiple corruption and preparation methods to simulate real-world sensor issues.

### Scenario 1: Noisy IMU Accelerometer

**Issues**: Gaussian noise + bias + occasional spikes

**Solution**: Calibration → Median filter (spikes) → Low-pass filter (noise)

In [None]:
# Simulate realistic IMU data
imu_true = sine_wave
imu_raw = imu_true.copy()
imu_raw = corrupt.with_bias(imu_raw, bias=0.5)  # Uncalibrated bias
imu_raw = corrupt.with_gaussian_noise(imu_raw, std=0.15, seed=51)  # Sensor noise
imu_raw = corrupt.with_spikes(
    imu_raw, spike_rate=0.01, spike_magnitude=2.0, seed=52
)  # EMI spikes

# Cleaning pipeline
imu_step1 = prepare.with_calibration(imu_raw, offset=0.5, scale=1.0)  # Remove bias
imu_step2 = prepare.with_median_filter(imu_step1, window=3)  # Remove spikes
imu_clean = prepare.with_low_pass_filter(imu_step2, alpha=0.15)  # Smooth noise

# Visualize
fig, axs = plt.subplots(2, 2, figsize=(14, 10))

axs[0, 0].plot(t, imu_true, "b-", label="True Signal", linewidth=2)
axs[0, 0].set_title("1. True IMU Reading", fontsize=12, fontweight="bold")
axs[0, 0].legend()
axs[0, 0].grid(True, alpha=0.3)

axs[0, 1].plot(
    t, imu_raw, "r-", label="Raw (bias+noise+spikes)", linewidth=1, alpha=0.7
)
axs[0, 1].set_title("2. Raw Corrupted Sensor", fontsize=12, fontweight="bold")
axs[0, 1].legend()
axs[0, 1].grid(True, alpha=0.3)

axs[1, 0].plot(
    t,
    imu_step2,
    "g-",
    label="After calibration + spike removal",
    linewidth=1.5,
    alpha=0.8,
)
axs[1, 0].set_title("3. Intermediate (spikes removed)", fontsize=12, fontweight="bold")
axs[1, 0].set_xlabel("Time (s)", fontsize=11)
axs[1, 0].legend()
axs[1, 0].grid(True, alpha=0.3)

axs[1, 1].plot(t, imu_true, "b-", label="True Signal", linewidth=2, alpha=0.5)
axs[1, 1].plot(t, imu_clean, "m-", label="Final Cleaned", linewidth=2)
axs[1, 1].set_title("4. Final Clean Signal", fontsize=12, fontweight="bold")
axs[1, 1].set_xlabel("Time (s)", fontsize=11)
axs[1, 1].legend()
axs[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

rms_raw = np.sqrt(np.mean((imu_raw - imu_true) ** 2))
rms_clean = np.sqrt(np.mean((imu_clean - imu_true) ** 2))
print(f"Raw signal RMS error: {rms_raw:.4f}")
print(f"Cleaned signal RMS error: {rms_clean:.4f}")
print(f"Improvement: {(rms_raw - rms_clean) / rms_raw * 100:.1f}% reduction in error")

### Scenario 2: Bouncing Rotary Encoder

**Issues**: Contact bounce on state transitions

**Solution**: Debounce filter

In [None]:
# Simulate encoder pulses
encoder_true = step_signal
encoder_raw = corrupt.with_bounce(encoder_true, duration=10, amplitude=0.4, seed=53)

# Clean with debounce
encoder_clean = prepare.with_debounce(encoder_raw, threshold=0.2, min_duration=4)

# Visualize
fig, axs = plt.subplots(3, 1, figsize=(12, 10))

axs[0].plot(t, encoder_true, "b-", label="True Encoder State", linewidth=2)
axs[0].set_title("1. True Encoder Signal", fontsize=12, fontweight="bold")
axs[0].set_ylabel("State", fontsize=11)
axs[0].legend()
axs[0].grid(True, alpha=0.3)

axs[1].plot(t, encoder_raw, "r-", label="Raw with Bounce", linewidth=1.5)
axs[1].set_title("2. Corrupted (Contact Bounce)", fontsize=12, fontweight="bold")
axs[1].set_ylabel("State", fontsize=11)
axs[1].legend()
axs[1].grid(True, alpha=0.3)

axs[2].plot(t, encoder_true, "b-", label="True", linewidth=2, alpha=0.5)
axs[2].plot(t, encoder_clean, "g-", label="Debounced", linewidth=2)
axs[2].set_title("3. Cleaned (Debounced)", fontsize=12, fontweight="bold")
axs[2].set_xlabel("Time (s)", fontsize=11)
axs[2].set_ylabel("State", fontsize=11)
axs[2].legend()
axs[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### Scenario 3: Wireless Sensor with Packet Loss

**Issues**: Random dropouts (NaN values)

**Solution**: Interpolation + staleness policy for real-time use

In [None]:
# Simulate wireless sensor
wireless_true = sine_wave
wireless_raw = corrupt.with_dropouts(wireless_true, dropout_rate=0.25, seed=54)

# Two recovery strategies
wireless_interpolated = prepare.with_interpolation(wireless_raw, method="linear")
wireless_hold = prepare.with_staleness_policy(wireless_raw, policy="hold")

# Visualize
fig, axs = plt.subplots(3, 1, figsize=(12, 10))

axs[0].plot(t, wireless_true, "b-", label="True Signal", linewidth=2)
axs[0].set_title("1. True Sensor Reading", fontsize=12, fontweight="bold")
axs[0].legend()
axs[0].grid(True, alpha=0.3)

axs[1].plot(t, wireless_raw, "r.", label="Received (25% packet loss)", markersize=3)
axs[1].set_title("2. Corrupted (Wireless Dropouts)", fontsize=12, fontweight="bold")
axs[1].legend()
axs[1].grid(True, alpha=0.3)

axs[2].plot(t, wireless_true, "b-", label="True", linewidth=2, alpha=0.3)
axs[2].plot(
    t, wireless_interpolated, "g-", label="Interpolated", linewidth=1.5, alpha=0.8
)
axs[2].plot(t, wireless_hold, "m--", label="Hold Policy", linewidth=1.5, alpha=0.8)
axs[2].set_title("3. Cleaned (Two Strategies)", fontsize=12, fontweight="bold")
axs[2].set_xlabel("Time (s)", fontsize=11)
axs[2].legend()
axs[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

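# nanmean skips any samples a strategy could not fill (e.g. a dropout at the very first sample)
print(
    f"Interpolated RMS error: {np.sqrt(np.nanmean((wireless_interpolated - wireless_true)**2)):.4f}"
)
print(
    f"Hold policy RMS error: {np.sqrt(np.nanmean((wireless_hold - wireless_true)**2)):.4f}"
)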

## Summary

The `pykal.data_change` module provides nine corruption methods and ten preparation methods for bridging simulation and hardware:

### Corruption Methods (`corrupt` class):
1. **Gaussian noise** - thermal noise, ADC noise
2. **Bounce** - contact bounce in switches/encoders
3. **Dropouts** - packet loss, intermittent connections
4. **Bias** - uncalibrated sensor offsets
5. **Drift** - time-dependent bias (warm-up, degradation)
6. **Quantization** - ADC resolution limits
7. **Spikes** - EMI, electrical interference
8. **Clipping** - sensor saturation
9. **Delay** - communication/processing latency

### Preparation Methods (`prepare` class):
1. **Moving average** - denoise smooth signals
2. **Median filter** - remove spikes while preserving edges
3. **Exponential smoothing** - real-time filtering
4. **Debounce** - clean digital signal transitions
5. **Outlier removal** - detect and handle statistical outliers
6. **Interpolation** - fill missing data (NaN)
7. **Staleness policy** - handle asynchronous sensors
8. **Calibration** - remove bias and scale errors
9. **Low-pass filter** - attenuate high frequencies
10. **Clipping recovery** - detect saturation

### Use in Your Robots:
- **In simulation**: Use `corrupt` to test robustness before hardware deployment
- **In hardware**: Use `prepare` to clean real sensor data before feeding to estimators
- **In TurtleBot/Crazyflie notebooks**: See practical examples of noise handling and sensor fusion

:::{note}
All methods are static and operate on NumPy arrays, making them easy to integrate into existing pykal workflows and ROS2 nodes.
:::
