# The Curse of Hardware


In the pristine realm of theory, our sensors measure exactly what we want, our actuators execute precisely what we command, and our communication channels transmit data with perfect fidelity. Reality, however, has other plans.

Hardware is cursed. Sensors drift like unfaithful companions. Switches bounce like caffeinated rabbits. Wireless packets vanish into the electromagnetic void. And just when you think you've calibrated that IMU, thermal expansion whispers *"think again."*

This notebook catalogues the menagerie of hardware maladies you **will** encounter in robotics and, more importantly, the ``pykal.data_change`` utilities (`corrupt` and `prepare`) that can save you from debugging hell at 3 AM. We'll examine each corruption type, watch it destroy a perfectly good signal, then bring in the preparation methods to restore order.

Each section follows the same structure:
1. **The Corruption**: What breaks and why
2. **The Demonstration**: Watching a clean signal get destroyed
3. **The Salvation**: Applying the right preparation method
4. **The Recovery**: Comparing corrupted vs cleaned signals

Let us begin our descent into hardware chaos.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from pykal.data_change import corrupt, prepare

# Set random seed for reproducibility
np.random.seed(42)

# Configure matplotlib for nice plots
plt.rcParams['figure.figsize'] = (14, 5)
plt.rcParams['font.size'] = 10

print("✓ Imports successful. Let the corruption begin.")

## Helper Function

We'll use this to generate clean test signals throughout.

In [None]:
def generate_clean_signal(n=200, signal_type='composite'):
    """Generate various clean test signals."""
    t = np.linspace(0, 4*np.pi, n)
    
    if signal_type == 'composite':
        # Smooth signal with multiple frequency components
        return t, np.sin(t) + 0.3*np.sin(3*t) + 0.5*np.cos(0.5*t)
    elif signal_type == 'step':
        # Step signal for testing bounce
        signal = np.zeros(n)
        signal[n//4:n//2] = 1.0
        signal[3*n//4:] = 1.0
        return t, signal
    elif signal_type == 'ramp':
        # Linear ramp for testing drift
        return t, 0.5 * t
    else:
        return t, np.sin(t)

print("✓ Helper function defined.")

## 1. Gaussian Noise: The Universal Tormentor

### The Corruption

Thermal (Johnson) noise. Shot noise. Electromagnetic interference. Quantization error, which is not strictly Gaussian but often gets modeled that way. The universe *really* wants your measurements to be wrong, and Gaussian noise is its favorite weapon.

Every analog sensor suffers from it. Your accelerometer. Your force sensor. That temperature reading that keeps flickering by ±0.2°C. All victims of the same thermal demons.

Mathematically, we model this as additive white Gaussian noise (AWGN):

$$
y_k = x_k + v_k, \quad v_k \sim \mathcal{N}(0, \sigma^2)
$$

where $x_k$ is your true signal and $v_k$ is the noise, drawn independently at each step with zero mean and variance $\sigma^2$.

In [None]:
# Generate clean signal
t, clean = generate_clean_signal(n=200)

# Add Gaussian noise
noisy = corrupt.with_gaussian_noise(clean, std=0.3, seed=42)

# Plot: Clean vs Corrupted
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(t, clean, 'g-', linewidth=2, label='Clean Signal')
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')
ax1.set_title('Clean Signal (The Dream)', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(t, noisy, 'r-', alpha=0.7, linewidth=1, label='Noisy Signal')
ax2.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Clean (reference)')
ax2.set_xlabel('Time')
ax2.set_ylabel('Amplitude')
ax2.set_title('With Gaussian Noise (The Reality)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Signal-to-Noise Ratio: {10*np.log10(np.var(clean)/np.var(noisy - clean)):.2f} dB")

### The Salvation: Low-Pass Filtering

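White Gaussian noise spreads its power evenly across all frequencies. Our signal? Usually smooth, with its power concentrated at low frequencies. The solution: **low-pass filters**, which throw away most of the noise while keeping most of the signal.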

We have three weapons:
1. **Moving Average**: Simple, intuitive, effective. Replaces each point with the average of its neighbors.
2. **Exponential Smoothing**: Gives more weight to recent data. Good for real-time systems.
3. **Low-Pass Filter**: Classic first-order filter. Think RC circuit in discrete time.
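
For intuition, the first two filters can be sketched in plain NumPy. This is an illustrative sketch, not the `pykal` implementation: the names `moving_average_sketch` and `exponential_smoothing_sketch` are hypothetical, and details such as edge handling may differ from what `prepare` does.

```python
import numpy as np

def moving_average_sketch(x, window=5):
    """Centered moving average; np.convolve zero-pads, so edge values are biased."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")

def exponential_smoothing_sketch(x, alpha=0.2):
    """First-order recursion: y[k] = alpha * x[k] + (1 - alpha) * y[k-1]."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]
    for k in range(1, len(x)):
        y[k] = alpha * x[k] + (1 - alpha) * y[k - 1]
    return y

# Hypothetical demo data: a sine wave plus white Gaussian noise.
rng = np.random.default_rng(0)
t_demo = np.linspace(0, 4 * np.pi, 200)
truth = np.sin(t_demo)
noisy_demo = truth + rng.normal(0.0, 0.3, size=t_demo.size)

ma = moving_average_sketch(noisy_demo, window=5)
es = exponential_smoothing_sketch(noisy_demo, alpha=0.2)

# Compare interior points only, where the zero-padded edges do not matter.
inner = slice(5, -5)
print("noisy MSE:         ", np.mean((noisy_demo[inner] - truth[inner]) ** 2))
print("moving average MSE:", np.mean((ma[inner] - truth[inner]) ** 2))
print("exponential MSE:   ", np.mean((es[inner] - truth[inner]) ** 2))
```

The trade-off to watch: a wider `window` or a smaller `alpha` removes more noise but also adds more lag and rounds off genuine fast changes in the signal.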

Let's deploy all three.

In [None]:
# Apply different denoising methods
cleaned_ma = prepare.with_moving_average(noisy, window=5)
cleaned_exp = prepare.with_exponential_smoothing(noisy, alpha=0.2)
cleaned_lp = prepare.with_low_pass_filter(noisy, alpha=0.2)

# Plot: Corrupted vs Multiple Cleaning Methods
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(t, noisy, 'r-', alpha=0.5, linewidth=1, label='Noisy')
ax1.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Original Clean')
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')
ax1.set_title('Corrupted Signal', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(t, clean, 'g-', alpha=0.3, linewidth=2, label='Original Clean')
ax2.plot(t, cleaned_ma, 'b-', linewidth=1.5, alpha=0.8, label='Moving Average')
ax2.plot(t, cleaned_exp, 'm-', linewidth=1.5, alpha=0.6, label='Exponential Smoothing')
ax2.plot(t, cleaned_lp, 'c-', linewidth=1.5, alpha=0.6, label='Low-Pass Filter')
ax2.set_xlabel('Time')
ax2.set_ylabel('Amplitude')
ax2.set_title('After Denoising (Victory!)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Compute recovery metrics
mse_noisy = np.mean((noisy - clean)**2)
mse_ma = np.mean((cleaned_ma - clean)**2)
print(f"MSE - Noisy: {mse_noisy:.4f}")
print(f"MSE - Moving Average: {mse_ma:.4f}")
print(f"Improvement: {((mse_noisy - mse_ma)/mse_noisy * 100):.1f}%")

## 2. Contact Bounce: The Fidgety Switch

### The Corruption

Press a button. It should go from 0 to 1. Clean. Simple. 

Instead, you get: 0→0.2→-0.1→0.8→1.2→0.9→1. 

Why? Because mechanical contacts don't close cleanly. They *bounce*. The metal surfaces make and break contact multiple times before settling. What should be a single transition becomes a flurry of oscillations.

This plagues:
- Limit switches on your robot arm
- Encoder transitions on your wheels  
- Any mechanical button or switch
- Hall effect sensors detecting magnets (no mechanical contacts, but the output can chatter the same way near the switching threshold)

The hardware folks solve this with capacitors. We solve it with math.

In [None]:
# Generate step signal
t, clean = generate_clean_signal(n=200, signal_type='step')

# Add bounce
bounced = corrupt.with_bounce(clean, duration=5, amplitude=0.3, seed=42)

# Plot: Clean vs Bounced
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(t, clean, 'g-', linewidth=2, label='Clean Signal')
ax1.set_xlabel('Time')
ax1.set_ylabel('Signal Level')
ax1.set_title('Clean Digital Signal (Ideal)', fontweight='bold')
ax1.set_ylim([-0.5, 1.5])
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(t, bounced, 'r-', linewidth=1.5, label='Bounced Signal')
ax2.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Clean (reference)')
ax2.set_xlabel('Time')
ax2.set_ylabel('Signal Level')
ax2.set_title('With Contact Bounce (Mechanical Reality)', fontweight='bold')
ax2.set_ylim([-0.5, 1.5])
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### The Salvation: Debouncing

The fix: **require stability**. Don't accept a transition until the signal has stayed at the new level for several consecutive samples. This is the digital equivalent of "are you *sure* you meant that?"

The ``prepare.with_debounce`` method implements this logic, filtering out the rapid oscillations while preserving genuine state changes.
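
The "require stability" rule can be sketched as a small state machine. This is an illustration under stated assumptions rather than the `pykal` implementation: `debounce_sketch`, its `level` threshold, and its `min_samples` parameter are hypothetical names, and the real `with_debounce` parameters (`threshold`, `min_duration`) may have different semantics.

```python
import numpy as np

def debounce_sketch(x, level=0.5, min_samples=3):
    """Accept a new binary state only after it persists for min_samples samples."""
    raw = (np.asarray(x, dtype=float) > level).astype(int)  # binarize the input
    out = np.empty_like(raw)
    state = raw[0]   # currently accepted state
    count = 0        # consecutive samples that disagree with the accepted state
    for k, r in enumerate(raw):
        if r != state:
            count += 1
            if count >= min_samples:   # stable long enough: accept the change
                state = r
                count = 0
        else:
            count = 0                  # bounced back: start over
        out[k] = state
    return out

bouncy = np.array([0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1], dtype=float)
print(debounce_sketch(bouncy))
```

The price of stability is latency: the output switches only once the new level has been seen `min_samples` times in a row, so every genuine transition is reported a couple of samples late.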

In [None]:
# Apply debouncing
debounced = prepare.with_debounce(bounced, threshold=0.1, min_duration=3)

# Plot: Bounced vs Debounced
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(t, bounced, 'r-', linewidth=1.5, label='Bounced Signal')
ax1.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Original Clean')
ax1.set_xlabel('Time')
ax1.set_ylabel('Signal Level')
ax1.set_title('With Bounce (The Problem)', fontweight='bold')
ax1.set_ylim([-0.5, 1.5])
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(t, debounced, 'b-', linewidth=2, label='Debounced Signal')
ax2.plot(t, clean, 'g--', alpha=0.5, linewidth=2, label='Original Clean')
ax2.set_xlabel('Time')
ax2.set_ylabel('Signal Level')
ax2.set_title('After Debouncing (The Solution)', fontweight='bold')
ax2.set_ylim([-0.5, 1.5])
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Recovery accuracy: {np.mean(np.abs(debounced - clean) < 0.1) * 100:.1f}%")

## 3. Dropouts: The Vanishing Act

### The Corruption

Wireless communication is a beautiful lie. Your sensor sends data at 100 Hz. Your computer receives... 94 Hz? 87 Hz? Who knows! 

Packets disappear. Serial connections hiccup. I²C slaves get confused. Your sensor reading becomes:

```
[1.2, 1.3, NaN, NaN, 1.6, 1.7, NaN, 1.9]
```

Those NaNs? They're data points that never arrived. Your Kalman filter **hates** this.

Common causes:
- WiFi interference (looking at you, microwave)
- Bluetooth packet collisions
- Serial buffer overflow
- Intermittent connections (loose wires, poor solder joints)
- Sensor power glitches

In [None]:
# Generate clean signal
t, clean = generate_clean_signal(n=200)

# Add dropouts
dropout_data = corrupt.with_dropouts(clean, dropout_rate=0.15, seed=42)

# Plot: Clean vs Dropout
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(t, clean, 'g-', linewidth=2, label='Clean Signal')
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')
ax1.set_title('Clean Signal (Perfect Communication)', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot dropout data with gaps
valid_mask = ~np.isnan(dropout_data)
ax2.plot(t[valid_mask], dropout_data[valid_mask], 'r.', markersize=4, label='Received Data')
ax2.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Clean (reference)')
ax2.set_xlabel('Time')
ax2.set_ylabel('Amplitude')
ax2.set_title(f'With Dropouts ({np.isnan(dropout_data).sum()} missing points)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Packet loss rate: {np.isnan(dropout_data).sum() / len(dropout_data) * 100:.1f}%")

### The Salvation: Interpolation

Missing data? **Infer it** from neighboring points. Linear interpolation draws straight lines between known values. It's not perfect (the universe doesn't owe us linearity), but it's far better than NaN.

For slowly-varying signals, this works remarkably well. Your position sensor dropped a reading? Interpolate from the last and next valid measurements. Much better than telling your controller "the robot's position is undefined."
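
As a minimal sketch of the idea, here is linear gap filling with `np.interp` applied to the example readings from earlier in this section. The helper name `interpolate_nans_sketch` is hypothetical, and this is not necessarily how `prepare.with_interpolation` is implemented.

```python
import numpy as np

def interpolate_nans_sketch(y):
    """Linearly interpolate NaN gaps from the nearest valid neighbors."""
    y = np.asarray(y, dtype=float)
    idx = np.arange(len(y))
    valid = ~np.isnan(y)
    filled = y.copy()
    # np.interp holds the first/last valid value for NaNs at the array edges.
    filled[~valid] = np.interp(idx[~valid], idx[valid], y[valid])
    return filled

readings = np.array([1.2, 1.3, np.nan, np.nan, 1.6, 1.7, np.nan, 1.9])
print(interpolate_nans_sketch(readings))
```

The two-sample gap is filled with values close to 1.4 and 1.5, and the single missing sample with a value close to 1.8. Note that this is an offline technique: it needs the *next* valid sample, which a real-time loop does not have yet.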

In [None]:
# Apply interpolation
interpolated = prepare.with_interpolation(dropout_data, method='linear')

# Plot: Dropout vs Interpolated
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

valid_mask = ~np.isnan(dropout_data)
ax1.plot(t[valid_mask], dropout_data[valid_mask], 'r.', markersize=4, label='Received Data')
ax1.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Original Clean')
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')
ax1.set_title('With Dropouts (The Problem)', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(t, interpolated, 'b-', linewidth=1.5, label='Interpolated Signal')
ax2.plot(t, clean, 'g--', alpha=0.5, linewidth=2, label='Original Clean')
ax2.plot(t[~valid_mask], interpolated[~valid_mask], 'rx', markersize=6, label='Interpolated Points')
ax2.set_xlabel('Time')
ax2.set_ylabel('Amplitude')
ax2.set_title('After Interpolation (The Solution)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

mse = np.mean((interpolated - clean)**2)
print(f"Reconstruction MSE: {mse:.4f}")

## 3.5. Staleness Policies: The Update Rate Mismatch

### The Problem

Different sensors update at different rates. Your IMU: 200 Hz. Your GPS: 10 Hz. Your camera: 30 Hz.

What happens when your control loop runs at 100 Hz but GPS hasn't sent new data in the last 5 iterations? Is that data still valid? Should you use it? Replace it with zeros? Hold the old value?

This is the **staleness problem**. Data isn't wrong—it's just *old*. And different applications require different policies for handling old data.

In `ROSNode` (see `src/pykal/ros_node.py`), we configure staleness policies per sensor:

```python
staleness_config = {
    "gps_position": {"after": 0.5, "policy": "hold"},  # Hold for 0.5s
    "imu_accel": {"after": 0.1, "policy": "zero"},     # Zero after 0.1s
    "camera_detection": {"after": 1.0, "policy": "drop"}  # Drop after 1s
}
```

The `pykal.data_change` module provides the same policies for offline testing and simulation.

In [None]:
# Generate signal with dropouts (simulating stale data)
t, clean = generate_clean_signal(n=200)
stale_data = corrupt.with_dropouts(clean, dropout_rate=0.20, seed=42)

print(f"Data with {np.isnan(stale_data).sum()} stale/missing points")

### The Four Policies

Let's see how each policy handles the same stale data:

1. **'hold'**: Forward fill - hold last valid value (good for slowly-varying signals like GPS)
2. **'zero'**: Replace with zeros (good for velocities, accelerations that should default to zero)
3. **'drop'**: Remove stale points entirely (good for statistical processing)
4. **'none'**: Keep NaN as-is (for debugging, or explicit handling downstream)
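
To make the four behaviors concrete, here is a plain NumPy sketch that treats NaN as "stale". The function `staleness_sketch` is a hypothetical stand-in written for illustration; it is not the `pykal` implementation, and it ignores the time thresholds that the `ROSNode` configuration uses.

```python
import numpy as np

def staleness_sketch(y, policy="hold"):
    """Apply a staleness policy to an array where NaN marks a stale sample."""
    y = np.asarray(y, dtype=float)
    stale = np.isnan(y)
    if policy == "none":
        return y.copy()
    if policy == "zero":
        return np.where(stale, 0.0, y)
    if policy == "drop":
        return y[~stale]               # shorter array: time alignment is lost
    if policy == "hold":
        # Index of the most recent valid sample at or before each position.
        last_valid = np.where(stale, 0, np.arange(len(y)))
        last_valid = np.maximum.accumulate(last_valid)
        # Leading NaNs have nothing to hold, so they stay NaN.
        return y[last_valid]
    raise ValueError(f"unknown policy: {policy!r}")

sample = np.array([1.0, np.nan, np.nan, 2.0, np.nan])
for name in ("hold", "zero", "drop", "none"):
    print(f"{name:>4}: {staleness_sketch(sample, name)}")
```

Only `drop` changes the array length, which is why it needs its own time axis when plotted or fused with other signals.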

In [None]:
# Apply different staleness policies
policy_hold = prepare.with_staleness_policy(stale_data, policy='hold')
policy_zero = prepare.with_staleness_policy(stale_data, policy='zero')
policy_drop = prepare.with_staleness_policy(stale_data, policy='drop')
policy_none = prepare.with_staleness_policy(stale_data, policy='none')

# Plot: All four policies side by side
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Policy 1: Hold
axes[0, 0].plot(t, policy_hold, 'b-', linewidth=2, label='Hold Policy')
axes[0, 0].plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Original Clean')
axes[0, 0].set_xlabel('Time')
axes[0, 0].set_ylabel('Amplitude')
axes[0, 0].set_title('HOLD: Forward Fill (Last Valid Value)', fontweight='bold')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Policy 2: Zero
axes[0, 1].plot(t, policy_zero, 'r-', linewidth=2, label='Zero Policy')
axes[0, 1].plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Original Clean')
axes[0, 1].set_xlabel('Time')
axes[0, 1].set_ylabel('Amplitude')
axes[0, 1].set_title('ZERO: Replace with Zeros', fontweight='bold')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Policy 3: Drop
t_drop = t[~np.isnan(stale_data)]  # Time points for valid data only
axes[1, 0].plot(t_drop, policy_drop, 'mo', markersize=4, label='Drop Policy')
axes[1, 0].plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Original Clean')
axes[1, 0].set_xlabel('Time')
axes[1, 0].set_ylabel('Amplitude')
axes[1, 0].set_title(f'DROP: Remove Stale Points ({len(policy_drop)} valid points)', fontweight='bold')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Policy 4: None
valid_mask = ~np.isnan(policy_none)
axes[1, 1].plot(t[valid_mask], policy_none[valid_mask], 'c.', markersize=4, label='Valid Data')
axes[1, 1].plot(t[~valid_mask], np.zeros(np.sum(~valid_mask)), 'rx', markersize=8, label='NaN (stale)')
axes[1, 1].plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Original Clean')
axes[1, 1].set_xlabel('Time')
axes[1, 1].set_ylabel('Amplitude')
axes[1, 1].set_title('NONE: Keep NaN As-Is (No Processing)', fontweight='bold')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nPolicy Comparison:")
print(f"  Hold:  All {len(policy_hold)} points filled (forward fill)")
print(f"  Zero:  All {len(policy_zero)} points filled (with zeros)")
print(f"  Drop:  Only {len(policy_drop)} valid points kept")
print(f"  None:  All {len(policy_none)} points kept ({np.isnan(policy_none).sum()} as NaN)")

### When to Use Each Policy

**HOLD (Forward Fill)**
- Slowly-varying measurements (GPS position, temperature)
- When continuity matters more than accuracy
- State estimates that shouldn't suddenly jump to zero

**ZERO**
- Velocities, accelerations (physical quantities that default to zero at rest)
- Control inputs (if no command, assume zero)
- When a missing measurement means "nothing is happening"

**DROP**
- Statistical processing (mean, variance) where stale data skews results
- When you have enough valid data and stale points would mislead
- Sensor fusion where staleness is handled by the fusion algorithm

**NONE**
- Debugging (see exactly where data is missing)
- When downstream code has explicit NaN handling
- Passing to algorithms that require awareness of missing data

### Real-World Example: Sensor Fusion

In a typical robot, you might use:

```python
# Fast IMU: zero out stale samples (assume stopped if no updates)
imu_data = prepare.with_staleness_policy(raw_imu, policy='zero')

# Slow GPS: hold the last valid value (position doesn't change instantly)
gps_data = prepare.with_staleness_policy(raw_gps, policy='hold')

# Intermittent camera detections: drop if stale (don't use old detections)
# This returns a shorter array, so handle separately
valid_detections = prepare.with_staleness_policy(raw_detections, policy='drop')
```

This mirrors the policies that `ROSNode` applies to stale data in real time.

## 4. Bias: The Systematic Liar

### The Corruption

Your accelerometer reads 0.3 m/s² when the robot is sitting still. Your force sensor outputs 12 N with no load. Your gyroscope thinks the world is rotating at 0.02 rad/s while the robot sleeps peacefully on your desk.

This is **bias**: a constant offset in your measurements. Unlike noise (which averages to zero), bias is systematic. Every reading is wrong by the same amount.

Causes:
- Factory calibration drift
- Temperature changes
- Component aging
- Poor zero-offset calibration
- Magnetic interference (for magnetometers)

The insidious part? Your sensor looks precise (low noise), but it's *inaccurate*. All those beautiful low-noise readings are consistently wrong.

In [None]:
# Generate clean signal
t, clean = generate_clean_signal(n=200)

# Add bias
biased = corrupt.with_bias(clean, bias=0.8)

# Plot: Clean vs Biased
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(t, clean, 'g-', linewidth=2, label='Clean Signal')
ax1.axhline(y=0, color='k', linestyle=':', alpha=0.5)
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')
ax1.set_title('Clean Signal (True Zero)', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(t, biased, 'r-', linewidth=2, label='Biased Signal')
ax2.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Clean (reference)')
ax2.axhline(y=0.8, color='r', linestyle=':', alpha=0.5, label='Bias offset')
ax2.axhline(y=0, color='k', linestyle=':', alpha=0.5)
ax2.set_xlabel('Time')
ax2.set_ylabel('Amplitude')
ax2.set_title('With Constant Bias (Uncalibrated)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Mean error: {np.mean(biased - clean):.4f} (should equal bias)")

### The Salvation: Calibration

The fix is almost offensively simple: **subtract the bias**. 

How do you find the bias? 
1. Put your sensor in a known state (robot stationary, no load on force sensor, etc.)
2. Record many measurements
3. Average them—that's your bias
4. Subtract it from all future readings

This is why good roboticists start every session with a calibration routine. Don't trust factory calibration. Don't trust yesterday's calibration. Calibrate. Every. Time.
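
The four calibration steps can be sketched on simulated data. Everything here is an assumption made for illustration: the bias value reuses the 0.3 m/s² accelerometer example, and the noise level and sample counts are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
true_bias = 0.3      # m/s^2, unknown in practice; we only get to estimate it
noise_std = 0.05     # assumed sensor noise level

# Steps 1-2: hold the robot stationary (true acceleration = 0) and record.
stationary = true_bias + rng.normal(0.0, noise_std, size=1000)

# Step 3: the average of the stationary readings is the bias estimate.
bias_estimate = stationary.mean()

# Step 4: subtract the estimate from all later readings.
later_raw = 1.0 + true_bias + rng.normal(0.0, noise_std, size=5)
later_corrected = later_raw - bias_estimate

print(f"estimated bias: {bias_estimate:.3f} m/s^2")
print("corrected readings:", np.round(later_corrected, 3))
```

Averaging $N$ samples shrinks the uncertainty of the estimate by a factor of $\sqrt{N}$, which is why "record many measurements" matters: a handful of samples leaves part of the noise baked into your offset.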

In [None]:
# Apply calibration (remove the bias)
calibrated = prepare.with_calibration(biased, offset=0.8, scale=1.0)

# Plot: Biased vs Calibrated
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(t, biased, 'r-', linewidth=2, label='Biased Signal')
ax1.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Original Clean')
ax1.axhline(y=0.8, color='r', linestyle=':', alpha=0.5, label='Bias offset')
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')
ax1.set_title('Biased Signal (The Problem)', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(t, calibrated, 'b-', linewidth=2, label='Calibrated Signal')
ax2.plot(t, clean, 'g--', alpha=0.5, linewidth=2, label='Original Clean')
ax2.axhline(y=0, color='k', linestyle=':', alpha=0.5)
ax2.set_xlabel('Time')
ax2.set_ylabel('Amplitude')
ax2.set_title('After Calibration (Perfect Recovery)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Residual error: {np.mean(np.abs(calibrated - clean)):.6f} (should be ~0)")

## 5. Drift: The Wandering Bias

### The Corruption

Calibrated your sensor? Great. Come back in 30 minutes—it's wrong again.

**Drift** is bias that changes over time. Your gyroscope's zero-rate output slowly climbs. Your pressure sensor wanders as it warms up. Your strain gauge drifts as the sun heats your robot's frame.

Mathematically:
$$
y_k = x_k + b_k, \quad b_k = b_0 + \alpha \cdot k
$$

where $b_k$ is the time-varying bias and $\alpha$ is the drift rate.

This is particularly evil for:
- Gyroscopes (integrate drift → your robot thinks it rotated 47° while sitting still)
- Thermal sensors during warm-up
- Analog circuits experiencing temperature changes
- Any MEMS device over long time scales
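
To see why integration makes drift so damaging, here is a small worked example of the model above for a stationary gyroscope. The sample period, initial bias, and drift rate are made-up illustrative values, not measurements from any real device.

```python
import numpy as np

dt = 0.01        # s, assumed 100 Hz sample period
n = 6000         # 60 s of data
k = np.arange(n)
b0 = 0.002       # rad/s, assumed initial bias
alpha = 1e-6     # rad/s per sample, assumed drift rate

bias = b0 + alpha * k              # b_k = b_0 + alpha * k
true_rate = np.zeros(n)            # the robot is sitting still
measured_rate = true_rate + bias   # y_k = x_k + b_k

# Integrating the measured rate accumulates the bias into a heading error.
heading_error = np.cumsum(measured_rate) * dt
print(f"heading error after {n * dt:.0f} s: {np.degrees(heading_error[-1]):.2f} deg")
```

A constant bias produces a heading error that grows linearly in time; a linearly drifting bias makes it grow quadratically, so the damage accelerates the longer you wait between corrections.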

In [None]:
# Generate clean signal
t, clean = generate_clean_signal(n=200)

# Add drift
drifted = corrupt.with_drift(clean, drift_rate=0.008, drift_type='linear')

# Plot: Clean vs Drifted
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(t, clean, 'g-', linewidth=2, label='Clean Signal')
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')
ax1.set_title('Clean Signal (Stable)', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(t, drifted, 'r-', linewidth=2, label='Drifted Signal')
ax2.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Clean (reference)')
# Show the drift component
drift_component = drifted - clean
ax2.plot(t, drift_component, 'orange', linestyle=':', linewidth=2, label='Drift component')
ax2.set_xlabel('Time')
ax2.set_ylabel('Amplitude')
ax2.set_title('With Linear Drift (Sensor Warming Up)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Total drift: {drift_component[-1]:.4f}")

### The Salvation: Periodic Re-Calibration

Drift is trickier than static bias because it *changes*. Solutions:

1. **Periodic re-calibration**: Return to known state, measure new bias, update offset
2. **Sensor fusion**: Use sensors with an absolute reference (an accelerometer sensing gravity, for example) to correct drifting ones (gyros)
3. **Thermal modeling**: If drift correlates with temperature, model and compensate

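For demonstration, we'll take the simplest software route: fit a straight line to the drifted signal and subtract it (linear detrending). This stands in for what periodic re-calibration would achieve on real hardware.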

In [None]:
# Remove the drift trend in software. In practice you would also
# recalibrate at known intervals; here we only post-process the recording.
from scipy import signal as scipy_signal

# Option A (computed for reference, not plotted below): zero-phase
# high-pass Butterworth filter. With this cutoff it also attenuates the
# slow components of the signal itself, not just the drift.
b, a = scipy_signal.butter(3, 0.05, btype='high', analog=False)
detrended = scipy_signal.filtfilt(b, a, drifted)

# Option B (plotted below): subtract a least-squares linear fit.
coeffs = np.polyfit(t, drifted, deg=1)
trend = np.polyval(coeffs, t)
# Subtracting the fit also removes the mean, so add it back. We use the
# clean signal's mean, which is only known because this is a simulation.
detrended_simple = drifted - trend + np.mean(clean)

# Plot: Drifted vs Detrended
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(t, drifted, 'r-', linewidth=2, label='Drifted Signal')
ax1.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Original Clean')
ax1.plot(t, trend, 'orange', linestyle=':', linewidth=2, label='Estimated trend')
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')
ax1.set_title('Drifted Signal (The Problem)', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(t, detrended_simple, 'b-', linewidth=2, label='Detrended Signal')
ax2.plot(t, clean, 'g--', alpha=0.5, linewidth=2, label='Original Clean')
ax2.set_xlabel('Time')
ax2.set_ylabel('Amplitude')
ax2.set_title('After Detrending (The Solution)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

mse = np.mean((detrended_simple - clean)**2)
print(f"Reconstruction MSE: {mse:.4f}")

## 6. Spikes: The Random Saboteurs

### The Corruption

Your sensor is humming along nicely: 1.2, 1.3, 1.35, **947.2**, 1.4, 1.38...

Wait, what?

**Spikes** (also called impulse noise or salt-and-pepper noise) are random, massive outliers. Causes:
- Electromagnetic interference from nearby motors
- Electrostatic discharge (you shuffled across the carpet, now your ADC thinks the world exploded)
- Transmission errors (bit flip: 0b00000010 → 0b10000010)
- Power supply glitches
- Cosmic rays (yes, really—ask the CERN folks)

These aren't Gaussian. They're rare but *huge*. And they'll wreck your control system if you let them through.

In [None]:
# Generate clean signal
t, clean = generate_clean_signal(n=200)

# Add spikes
spiked = corrupt.with_spikes(clean, spike_rate=0.05, spike_magnitude=8.0, seed=42)

# Plot: Clean vs Spiked
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(t, clean, 'g-', linewidth=2, label='Clean Signal')
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')
ax1.set_title('Clean Signal (EMI-Free Paradise)', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(t, spiked, 'r-', linewidth=1, alpha=0.7, label='Spiked Signal')
ax2.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Clean (reference)')
# Highlight spikes
spike_mask = np.abs(spiked - clean) > 2
ax2.plot(t[spike_mask], spiked[spike_mask], 'r*', markersize=10, label='Spikes')
ax2.set_xlabel('Time')
ax2.set_ylabel('Amplitude')
ax2.set_title(f'With Spikes ({spike_mask.sum()} detected)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### The Salvation: Median Filtering & Outlier Removal

**Median filters** are spike-killing machines. Unlike mean filters (which get dragged toward outliers), median filters replace each point with the *median* of its neighbors.

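Take the window [1.3, 1.35, 947.2, 1.4, 1.38] around our spike. Sorted, the middle value is 1.38, so that is what the filter outputs. The spike never gets a vote.

For even better results, use **outlier removal** based on a statistical test (z-score). If a point lies more than a few standard deviations from the mean (3 is a common cutoff; the cell below uses a stricter 2.5), it's probably not real data, so replace it with the median or interpolate.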
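
Here is a minimal sliding-window median in NumPy, run on the readings from the start of this section. The helper `median_filter_sketch` is a hypothetical illustration (edge padding and odd window sizes are assumptions), not the `pykal` implementation.

```python
import numpy as np

def median_filter_sketch(x, window=5):
    """Sliding-window median; edge-padded so the output length matches the input."""
    x = np.asarray(x, dtype=float)
    half = window // 2                      # assumes an odd window size
    padded = np.pad(x, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=1)

readings = np.array([1.2, 1.3, 1.35, 947.2, 1.4, 1.38])
filtered = median_filter_sketch(readings, window=5)

print("median filter:", filtered)
print("5-point mean at the spike:", readings[1:6].mean())
```

The median output never leaves the range of the ordinary readings, while a 5-point mean centered on the spike is dragged up to roughly 190. That is the whole argument for medians over means when outliers are rare but huge.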

In [None]:
# Apply spike removal methods
cleaned_median = prepare.with_median_filter(spiked, window=5)
cleaned_outlier = prepare.with_outlier_removal(spiked, threshold=2.5, method='replace')

# Plot: Spiked vs Cleaned
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(t, spiked, 'r-', linewidth=1, alpha=0.7, label='Spiked Signal')
ax1.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Original Clean')
spike_mask = np.abs(spiked - clean) > 2
ax1.plot(t[spike_mask], spiked[spike_mask], 'r*', markersize=10, label='Spikes')
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')
ax1.set_title('With Spikes (The Problem)', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(t, clean, 'g-', alpha=0.3, linewidth=2, label='Original Clean')
ax2.plot(t, cleaned_median, 'b-', linewidth=1.5, label='Median Filter')
ax2.plot(t, cleaned_outlier, 'm-', linewidth=1.5, alpha=0.7, label='Outlier Removal')
ax2.set_xlabel('Time')
ax2.set_ylabel('Amplitude')
ax2.set_title('After Spike Removal (Clean Victory)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Median filter MSE: {np.mean((cleaned_median - clean)**2):.4f}")
print(f"Outlier removal MSE: {np.mean((cleaned_outlier - clean)**2):.4f}")

## 7. Quantization: The Staircase Effect

### The Corruption

Analog signals live in continuous land. Digital systems? Not so much.

Your 8-bit ADC can represent exactly **256 discrete levels**. If your signal is 1.234567? Too bad—nearest level is 1.235. This is **quantization**: rounding continuous values to the nearest representable level.

Low-resolution ADCs make this visible as a staircase effect. Your smooth velocity curve becomes a series of flat steps.

This affects:
- Low-bit ADCs (8-bit, 10-bit)
- Integer encoders on motors
- Digital potentiometers
- Any continuous→discrete conversion
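
A uniform quantizer is only a few lines of NumPy. The sketch below assumes a 0 to 5 V input range and places the levels evenly across it, endpoints included; real ADCs differ in how they map codes to voltages, and `quantize_sketch` is a hypothetical helper rather than the `pykal` function.

```python
import numpy as np

def quantize_sketch(x, levels, lo, hi):
    """Round x to the nearest of `levels` evenly spaced values on [lo, hi]."""
    step = (hi - lo) / (levels - 1)
    codes = np.round((np.clip(x, lo, hi) - lo) / step)   # integer code 0..levels-1
    return lo + codes * step

bits = 8
levels = 2 ** bits            # 256 levels
lo, hi = 0.0, 5.0             # assumed 0-5 V input range
step = (hi - lo) / (levels - 1)

x = np.linspace(lo, hi, 10_001)
xq = quantize_sketch(x, levels, lo, hi)

print(f"step size: {step * 1000:.2f} mV")
print(f"max error: {np.max(np.abs(xq - x)) * 1000:.2f} mV (bounded by step / 2)")
print(f"unique levels used: {np.unique(xq).size}")
```

Each extra bit halves the step size, so the worst-case rounding error halves too. That is the arithmetic behind reaching for a higher-resolution ADC instead of filtering.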

In [None]:
# Generate clean signal
t, clean = generate_clean_signal(n=200)

# Add quantization (simulate 6-bit ADC: 64 levels)
quantized = corrupt.with_quantization(clean, levels=64)

# Plot: Clean vs Quantized
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(t, clean, 'g-', linewidth=2, label='Clean Signal (Analog)')
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')
ax1.set_title('Clean Analog Signal', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(t, quantized, 'r-', linewidth=1.5, label='Quantized Signal (6-bit ADC)')
ax2.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Clean (reference)')
ax2.set_xlabel('Time')
ax2.set_ylabel('Amplitude')
ax2.set_title('After 6-bit Quantization (64 levels)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Number of unique levels: {len(np.unique(quantized))}")
print(f"Quantization MSE: {np.mean((quantized - clean)**2):.4f}")

### The Salvation: Smoothing (with caution)

Quantization error looks like high-frequency noise. Low-pass filtering can smooth out the staircases—but be careful! You're also removing real high-frequency content.

Better solution? **Use a higher-resolution ADC**. Seriously. 12-bit and 16-bit ADCs are cheap. Don't fight quantization noise when you can just eliminate it.

But if you're stuck with low-res hardware, gentle smoothing can help.

In [None]:
# Apply smoothing to reduce quantization artifacts
smoothed = prepare.with_moving_average(quantized, window=5)

# Plot: Quantized vs Smoothed
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(t, quantized, 'r-', linewidth=1.5, label='Quantized Signal')
ax1.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Original Clean')
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')
ax1.set_title('Quantized Signal (The Problem)', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(t, smoothed, 'b-', linewidth=2, label='Smoothed Signal')
ax2.plot(t, clean, 'g--', alpha=0.5, linewidth=2, label='Original Clean')
ax2.set_xlabel('Time')
ax2.set_ylabel('Amplitude')
ax2.set_title('After Smoothing (Partial Recovery)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Quantized MSE: {np.mean((quantized - clean)**2):.4f}")
print(f"Smoothed MSE: {np.mean((smoothed - clean)**2):.4f}")
print(f"Improvement: {((np.mean((quantized - clean)**2) - np.mean((smoothed - clean)**2)) / np.mean((quantized - clean)**2) * 100):.1f}%")

## 8. Clipping/Saturation: The Ceiling and Floor

### The Corruption

Every sensor has a **range**. Your force sensor: 0-100 N. Your ADC: 0-5 V. Your motor encoder: -π to +π rad.

What happens when the real value exceeds this range? **Clipping**. The measurement saturates at the limit:

```
True force:     [50, 75, 110, 95, 80]  N
Measured force: [50, 75, 100, 95, 80]  N  (clipped at 100)
```

You lose information. That peak? Could've been 110 N. Could've been 500 N. You'll never know—your sensor hit its ceiling and stayed there.

This is particularly dangerous because **clipped data looks valid**. No NaN, no error flag. Just a suspiciously flat plateau where there should be a peak.

In [None]:
# Generate clean signal with peaks
t, clean = generate_clean_signal(n=200)
clean = 1.5 * clean  # Amplify to ensure some values exceed limits

# Add clipping
clipped = corrupt.with_clipping(clean, lower=-1.5, upper=1.5)

# Plot: Clean vs Clipped
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(t, clean, 'g-', linewidth=2, label='Clean Signal')
ax1.axhline(y=1.5, color='r', linestyle='--', alpha=0.5, label='Sensor limits')
ax1.axhline(y=-1.5, color='r', linestyle='--', alpha=0.5)
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')
ax1.set_title('Clean Signal (True Values)', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(t, clipped, 'r-', linewidth=2, label='Clipped Signal')
ax2.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Clean (reference)')
ax2.axhline(y=1.5, color='r', linestyle='--', alpha=0.5, label='Sensor limits')
ax2.axhline(y=-1.5, color='r', linestyle='--', alpha=0.5)
ax2.set_xlabel('Time')
ax2.set_ylabel('Amplitude')
ax2.set_title('With Clipping (Information Lost)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

saturated_points = np.sum((clipped == 1.5) | (clipped == -1.5))
print(f"Saturated points: {saturated_points} / {len(clipped)}")

### The Salvation: Detection and Flagging

You **cannot** recover clipped data. The information is gone. But you *can* detect it and handle it gracefully:

1. **Mark clipped values as invalid** (replace with NaN)
2. **Flag them** so your controller knows "this measurement is unreliable"
3. **Use sensor fusion** if you have redundant sensors with different ranges
4. **Prevent it** by choosing sensors with appropriate ranges

The ``prepare.with_clipping_recovery`` method detects saturation and, with ``mark_invalid=True``, replaces those points with NaN.
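For intuition, detection can be sketched in a few lines of plain NumPy. This is not how ``pykal`` implements it; ``mark_saturated`` and its ``tol`` parameter are hypothetical names used only for illustration, and the sketch assumes the sensor limits are known in advance.

```python
import numpy as np

def mark_saturated(signal, lower, upper, tol=1e-9):
    """Return (marked, mask): a copy with saturated samples set to NaN, plus the mask."""
    signal = np.asarray(signal, dtype=float)
    # A sample counts as saturated if it sits at (or beyond) either limit
    saturated = (signal <= lower + tol) | (signal >= upper - tol)
    marked = signal.copy()
    marked[saturated] = np.nan
    return marked, saturated

readings = np.array([0.2, 1.1, 1.5, 1.5, 0.9, -1.5, -0.4])
marked, saturated = mark_saturated(readings, lower=-1.5, upper=1.5)

print("Marked:", marked)
print("Saturated samples:", int(saturated.sum()))
```

Returning the mask alongside the NaN-marked copy lets downstream code choose between dropping, flagging, or fusing around the invalid samples.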

In [None]:
# Detect and mark clipped values
recovered = prepare.with_clipping_recovery(clipped, lower=-1.5, upper=1.5, mark_invalid=True)

# Plot: Clipped vs Recovered
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(t, clipped, 'r-', linewidth=2, label='Clipped Signal')
ax1.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Original Clean')
ax1.axhline(y=1.5, color='r', linestyle='--', alpha=0.5, label='Sensor limits')
ax1.axhline(y=-1.5, color='r', linestyle='--', alpha=0.5)
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')
ax1.set_title('Clipped Signal (The Problem)', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot valid points and marked invalid points separately
valid_mask = ~np.isnan(recovered)
ax2.plot(t[valid_mask], recovered[valid_mask], 'b-', linewidth=2, label='Valid Data')
ax2.plot(t[~valid_mask], clipped[~valid_mask], 'rx', markersize=8, label='Detected Saturation (marked NaN)')
ax2.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Original Clean')
ax2.set_xlabel('Time')
ax2.set_ylabel('Amplitude')
ax2.set_title('After Detection (Invalid Points Marked)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Detected {np.sum(~valid_mask)} saturated points")
print("Note: True values at these points are unknown (information permanently lost)")

## 9. Delay/Latency: The Time Traveler

### The Corruption

Your sensor measures position at time $t_k$. Your computer receives that measurement at time $t_{k+3}$.

**Latency** means your data is *correct* but **outdated**. You're making decisions based on old information—like driving while looking at a photo from 5 seconds ago.

Sources:
- Communication delays (WiFi, Bluetooth, serial)
- Sensor processing time (camera exposure + processing)
- Buffering in the OS or drivers
- Network jitter

This is particularly nasty for control systems. Your controller sends a command based on old state → system has already moved → command is wrong → instability.

Mathematically:
$$
y_k = x_{k-d}
$$
where $d$ is the delay in samples.
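The relation $y_k = x_{k-d}$ can be sketched directly in NumPy. ``delay_signal`` is a hypothetical helper written for this illustration (it is not the ``pykal`` implementation); it assumes the first $d$ outputs, for which no measurement has arrived yet, are filled with a constant.

```python
import numpy as np

def delay_signal(x, d, fill_value=0.0):
    """Return y with y[k] = x[k - d]; the first d samples are set to fill_value."""
    x = np.asarray(x, dtype=float)
    y = np.full_like(x, fill_value)
    if d < len(x):
        # Everything after the first d samples is just the input, shifted right
        y[d:] = x[:len(x) - d]
    return y

x = np.arange(1.0, 7.0)   # [1, 2, 3, 4, 5, 6]
y = delay_signal(x, d=2)

print("x:", x)
print("y:", y)
```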

In [None]:
# Generate clean signal
t, clean = generate_clean_signal(n=200)

# Add delay
delayed = corrupt.with_delay(clean, delay=10, fill_value=0.0)

# Plot: Clean vs Delayed
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(t, clean, 'g-', linewidth=2, label='Clean Signal (Current)')
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')
ax1.set_title('Clean Signal (Real-Time)', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(t, delayed, 'r-', linewidth=2, label='Delayed Signal (10 samples)')
ax2.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Clean (reference)')
ax2.axvspan(t[0], t[10], alpha=0.2, color='gray', label='Delay period')
ax2.set_xlabel('Time')
ax2.set_ylabel('Amplitude')
ax2.set_title('With Delay (Time Shifted)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 10 samples of delay, converted to time units; for the sin(t) component this equals its phase shift
print(f"Phase shift: {10 * (t[1] - t[0]):.2f} radians")

### The Salvation: Timestamping and Prediction

You **cannot** remove latency after the fact. But you can mitigate it:

1. **Timestamp everything**: Know *when* each measurement was taken
2. **Predict forward**: Use a model to estimate current state from delayed measurement
3. **Smith Predictor**: Classic control technique for dealing with known delays
4. **Reduce latency**: Use faster communication, local processing, better hardware

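For demonstration, the delay here is known (10 samples) and the whole record is available, so we can simply shift the delayed signal back into alignment. That trick only works offline: a live system never has the next 10 samples, so it must predict forward from the delayed measurement, for example with a constant-velocity model or a Kalman filter.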
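As a rough illustration of the "predict forward" idea, here is a minimal constant-velocity sketch in plain NumPy. ``predict_forward`` is a hypothetical helper, not a ``pykal`` function. It assumes the delay ``d`` is known and that the signal changes at a roughly constant rate over ``d`` samples.

```python
import numpy as np

def predict_forward(delayed, d):
    """Estimate the current value from a delayed stream, assuming constant velocity."""
    delayed = np.asarray(delayed, dtype=float)
    # Backward difference: per-sample rate of change seen in the delayed stream
    velocity = np.diff(delayed, prepend=delayed[0])
    # Extrapolate d samples ahead along that slope (uses only past data)
    return delayed + d * velocity

# Toy check: a slow sine delayed by d samples
d = 5
t = np.linspace(0, 2 * np.pi, 400)
clean = np.sin(t)
delayed = np.concatenate([np.full(d, clean[0]), clean[:-d]])
predicted = predict_forward(delayed, d)

# Skip the first d + 1 samples, where the delayed stream is still padding
err_delayed = np.mean((delayed[d + 1:] - clean[d + 1:]) ** 2)
err_predicted = np.mean((predicted[d + 1:] - clean[d + 1:]) ** 2)
print(f"Delayed MSE:   {err_delayed:.6f}")
print(f"Predicted MSE: {err_predicted:.6f}")
```

Differencing amplifies measurement noise, so on a noisy signal this naive extrapolation can do more harm than good; that is one reason a model-based estimator is usually preferred in practice.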

In [None]:
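# Simple offline compensation: undo the known 10-sample delay by shifting the record
# A live system has no future samples to shift in; there you'd use a state estimator (Kalman filter) to predict forward
compensated = np.roll(delayed, -10)
compensated[-10:] = compensated[-11]  # np.roll wraps the fill values into the tail; hold the last valid value instead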

# Plot: Delayed vs Compensated
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(t, delayed, 'r-', linewidth=2, label='Delayed Signal')
ax1.plot(t, clean, 'g--', alpha=0.3, linewidth=2, label='Original Clean')
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')
ax1.set_title('Delayed Signal (The Problem)', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(t, compensated, 'b-', linewidth=2, label='Compensated Signal')
ax2.plot(t, clean, 'g--', alpha=0.5, linewidth=2, label='Original Clean')
ax2.set_xlabel('Time')
ax2.set_ylabel('Amplitude')
ax2.set_title('After Delay Compensation (Time-Aligned)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Compare alignment (ignoring the tail)
mse_delayed = np.mean((delayed[:-10] - clean[:-10])**2)
mse_compensated = np.mean((compensated[:-10] - clean[:-10])**2)
print(f"Delayed MSE: {mse_delayed:.4f}")
print(f"Compensated MSE: {mse_compensated:.4f}")
print(f"Improvement: {((mse_delayed - mse_compensated) / mse_delayed * 100):.1f}%")

## Conclusion: Survival Guide

Hardware will betray you. It's not personal—it's physics, economics, and the second law of thermodynamics.

### What We've Learned

| Corruption | Cause | Solution |
|------------|-------|----------|
| **Gaussian Noise** | Thermal, quantization, EMI | Low-pass filtering, averaging |
| **Bounce** | Mechanical contacts | Debouncing (require stability) |
| **Dropouts** | Packet loss, intermittent connections | Interpolation, extrapolation |
| **Staleness** | Different sensor update rates | Staleness policies (hold/zero/drop/none) |
| **Bias** | Calibration drift, offsets | Calibration (subtract bias) |
| **Drift** | Temperature, aging | Periodic re-calibration, detrending |
| **Spikes** | EMI, glitches, cosmic rays | Median filtering, outlier removal |
| **Quantization** | Low-resolution ADC | Smoothing (or buy better hardware) |
| **Clipping** | Sensor saturation | Detection, flagging (information lost) |
| **Delay** | Communication latency | Timestamping, prediction |

### The PyKal Philosophy

1. **Simulate before you build**: Use ``corrupt.*`` to test your estimators with realistic noise *before* buying hardware
2. **Prepare defensively**: Apply ``prepare.*`` methods in your data pipeline *before* feeding measurements to controllers
3. **Know your enemy**: Different corruptions need different solutions—median filters for spikes, calibration for bias
4. **Trust, but verify**: Just because data looks clean doesn't mean it is. Add sanity checks.

### Next Steps

- Integrate these methods into your ``ROSNode`` callbacks (see ``ros_node.py``)
- Use the same covariance matrix ``Q`` for both corruption (testing) and Kalman filtering (estimation)
- Build a pre-processing pipeline: interpolation → outlier removal → calibration → low-pass filtering
- Test your control system with increasingly severe corruption until it breaks, then fix it
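The pre-processing order suggested above (interpolation → outlier removal → calibration → low-pass filtering) might look like the following in plain NumPy. Every function name here is a hypothetical stand-in rather than a ``pykal`` API, and the window sizes are arbitrary choices for the sketch; in a real pipeline you would swap in the corresponding ``prepare.*`` methods.

```python
import numpy as np

def interpolate_nans(x):
    """Fill NaN dropouts by linear interpolation between valid neighbours."""
    x = np.asarray(x, dtype=float).copy()
    idx = np.arange(len(x))
    valid = ~np.isnan(x)
    x[~valid] = np.interp(idx[~valid], idx[valid], x[valid])
    return x

def median_filter(x, window=3):
    """Replace each sample by the median of its neighbourhood (edges padded)."""
    half = window // 2
    padded = np.pad(x, half, mode='edge')
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=-1)

def moving_average(x, window=5):
    """Simple low-pass filter: mean over a centred window (edges padded)."""
    half = window // 2
    padded = np.pad(x, half, mode='edge')
    return np.convolve(padded, np.ones(window) / window, mode='valid')

def preprocess(raw, bias=0.0):
    """Interpolation -> outlier removal -> calibration -> low-pass filtering."""
    x = interpolate_nans(raw)        # 1. fill dropouts
    x = median_filter(x, window=3)   # 2. suppress isolated spikes
    x = x - bias                     # 3. subtract the calibrated bias
    return moving_average(x, window=5)  # 4. smooth what remains

# Constant true value of 2.0 with a 0.5 bias, one dropout and one spike
raw = np.full(20, 2.5)
raw[5] = np.nan
raw[12] = 50.0
cleaned = preprocess(raw, bias=0.5)
print(cleaned)
```

The order matters: filling dropouts first keeps NaNs out of the filters, and removing spikes before the low-pass stage stops a single outlier from being smeared across its neighbours.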

Remember: **Hardware is cursed, but we have the tools to break the curse.**

Now go forth and build robots that work in reality, not just in simulation.

In [None]:
print("\n" + "="*60)
print("THE CURSE OF HARDWARE: SURVIVED")
print("="*60)
print("\nYou now have the knowledge to:")
print("  ✓ Recognize hardware corruption patterns")
print("  ✓ Apply appropriate preparation methods")
print("  ✓ Build robust data pipelines")
print("  ✓ Test estimators with realistic noise")
print("\nGo build something that works in the real world.")
print("="*60)

---

## Navigation

**[Previous: PyKal Workflow](pykal_workflow.ipynb)** | **Next: [Simulating the Curse →](simulating_the_curse.ipynb)**

**← [Back to What is PyKal?](../what_is_pykal/index.rst)**