# Signal Decomposer - Bandpass Filter Comparison

This notebook demonstrates the corrected `SignalDecomposer` class with true bandpass filtering.

**Key concept**: Period pairs `(low, high)`, given in days, define the frequency bands.
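
To make the period-to-frequency relationship concrete, here is a minimal sketch of how a period pair in days maps to band edges in cycles per day. The helper name `period_pair_to_band` is hypothetical and is not part of `SignalDecomposer`; it only illustrates the arithmetic, assuming 96 samples per day.

```python
def period_pair_to_band(low_days, high_days, samples_per_day):
    """Convert a (low, high) period pair in days to frequency band edges.

    Returns the edges in cycles per day, plus the same edges normalized
    by the Nyquist frequency (samples_per_day / 2).
    Hypothetical helper, for illustration only.
    """
    if not 0 < low_days < high_days:
        raise ValueError("expected 0 < low_days < high_days")
    f_low = 1.0 / high_days   # longest period -> lowest frequency
    f_high = 1.0 / low_days   # shortest period -> highest frequency
    nyquist = samples_per_day / 2.0
    return (f_low, f_high), (f_low / nyquist, f_high / nyquist)


band, normalized = period_pair_to_band(0.75, 1.25, samples_per_day=96)
print(band)        # approximately (0.8, 1.333) cycles per day
print(normalized)  # the same edges as fractions of the Nyquist frequency
```

Note that the order flips: the longer period sets the lower frequency edge.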

## Setup and Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.fft import fft, fftfreq

# Import the corrected decomposer
import sys
sys.path.insert(0, '.')  # Add current directory to path
from signal_decomposer_v3 import SignalDecomposer, preprocess_for_forecast

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (15, 10)
plt.rcParams['font.size'] = 10

print("✓ Imports successful")

## 1. Generate Synthetic Signal

Create a signal with known components:
- **Sub-daily cycle**: 12-hour period (0.5 days)
- **Daily cycle**: 24-hour period (1 day)
- **Weekly cycle**: 7-day period
- **Monthly cycle**: 28-day period
- **Noise**: Gaussian white noise (std 0.5)
- **Trend**: Linear increase

In [None]:
# Generate 6 months of 15-minute data
np.random.seed(42)
dates = pd.date_range('2023-01-01', '2023-07-01', freq='15min')
n = len(dates)
t = np.arange(n)

# Data frequency: 15-min intervals = 96 observations per day
freq = 96

# Create individual components
subdaily_cycle = 6 * np.sin(2 * np.pi * t / (freq * 0.5))  # 12-hour cycle (0.5 days)
daily_cycle = 10 * np.sin(2 * np.pi * t / freq)  # 24-hour cycle (1 day)
weekly_cycle = 5 * np.sin(2 * np.pi * t / (freq * 7))  # 7-day cycle
monthly_cycle = 3 * np.sin(2 * np.pi * t / (freq * 28))  # 28-day cycle
noise = 0.5 * np.random.randn(n)  # High-frequency noise
trend = 20 + 0.01 * t  # Long-term trend

# Combine all components
y = subdaily_cycle + daily_cycle + weekly_cycle + monthly_cycle + noise + trend

# Create DataFrame
df = pd.DataFrame({
    'ds': dates,
    'y': y,
    'subdaily_true': subdaily_cycle,
    'daily_true': daily_cycle,
    'weekly_true': weekly_cycle,
    'monthly_true': monthly_cycle,
    'noise_true': noise,
    'trend_true': trend
})

print(f"Generated {len(df):,} observations over {len(df) / freq:.1f} days")
print(f"\nSignal components:")
print(f"  Sub-daily cycle: amplitude = 6,  period = 0.5 days (12 hours)")
print(f"  Daily cycle:     amplitude = 10, period = 1 day (24 hours)")
print(f"  Weekly cycle:    amplitude = 5,  period = 7 days")
print(f"  Monthly cycle:   amplitude = 3,  period = 28 days")
print(f"  Noise:           std = 0.5")
print(f"  Trend:           linear increase")

## 2. Visualize Original Signal

In [None]:
fig, axes = plt.subplots(6, 1, figsize=(15, 14))

# Plot first 2 weeks
plot_days = 14
plot_n = plot_days * freq
df_plot = df.iloc[:plot_n]

axes[0].plot(df_plot['ds'], df_plot['y'], label='Combined Signal', alpha=0.8, linewidth=1)
axes[0].set_title('Combined Signal (First 14 Days)', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Value')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].plot(df_plot['ds'], df_plot['subdaily_true'], label='Sub-daily Cycle (12h)', color='purple', linewidth=1.5)
axes[1].set_title('Sub-daily Cycle Component (12-hour period)', fontsize=11)
axes[1].set_ylabel('Value')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

axes[2].plot(df_plot['ds'], df_plot['daily_true'], label='Daily Cycle (24h)', color='orange', linewidth=1.5)
axes[2].set_title('Daily Cycle Component (24-hour period)', fontsize=11)
axes[2].set_ylabel('Value')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

axes[3].plot(df_plot['ds'], df_plot['weekly_true'], label='Weekly Cycle', color='green', linewidth=1.5)
axes[3].set_title('Weekly Cycle Component (7-day period)', fontsize=11)
axes[3].set_ylabel('Value')
axes[3].legend()
axes[3].grid(True, alpha=0.3)

axes[4].plot(df_plot['ds'], df_plot['monthly_true'], label='Monthly Cycle', color='brown', linewidth=1.5)
axes[4].set_title('Monthly Cycle Component (28-day period)', fontsize=11)
axes[4].set_ylabel('Value')
axes[4].legend()
axes[4].grid(True, alpha=0.3)

axes[5].plot(df_plot['ds'], df_plot['trend_true'], label='Trend', color='red', linewidth=1.5)
axes[5].set_title('Trend Component', fontsize=11)
axes[5].set_ylabel('Value')
axes[5].set_xlabel('Date')
axes[5].legend()
axes[5].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 3. Define Period Pairs for Bandpass Filtering

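Each pair `(low, high)` gives the shortest and longest period (in days) of one frequency band:
- `(0.25, 0.75)`: Sub-daily band (6-18 hours) - should capture the 12-hour cycle
- `(0.75, 1.25)`: Daily band (18-30 hours) - should capture the daily cycle
- `(1.25, 7)`: Multi-day to weekly band - should capture the weekly cycle
- `(7, 30)`: Weekly to monthly band - should capture the monthly cycle
- `(30, 180)`: Long-term band - should capture the trend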
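
As a quick sanity check on the band layout, the sketch below reports which bands contain each known period, using the pairs defined in the next cell. The helper `bands_containing` is hypothetical, and treating both edges as inclusive is an assumption; how `SignalDecomposer` handles a period that falls exactly on an edge is not shown in this notebook.

```python
period_pairs = [(0.25, 0.75), (0.75, 1.25), (1.25, 7.0), (7.0, 30.0), (30.0, 180.0)]
target_periods = {"sub-daily": 0.5, "daily": 1.0, "weekly": 7.0, "monthly": 28.0}


def bands_containing(period, pairs):
    """Return indices of all bands whose [low, high] range includes the period."""
    return [i for i, (low, high) in enumerate(pairs) if low <= period <= high]


for name, period in target_periods.items():
    print(f"{name:<10} period = {period:5.1f} d -> bands {bands_containing(period, period_pairs)}")
```

The 7-day period sits exactly on the edge shared by bands 2 and 3, so it is reported for both. With real filters, a component at a band edge is typically split between the neighbouring bands, which is worth keeping in mind when reading the weekly results.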

In [None]:
# Define period pairs (in days)
# Structure: sub-daily, daily, weekly, monthly, trend
period_pairs = [
    (0.25, 0.75),   # Sub-daily: 6-18 hours (should extract 12h cycle)
    (0.75, 1.25),   # Daily: 18-30 hours (should extract 24h cycle)
    (1.25, 7.0),    # Multi-day to weekly (should extract 7-day cycle)
    (7.0, 30.0),    # Weekly to monthly (should extract 28-day cycle)
    (30.0, 180.0),  # Long-term trend (lowest-frequency band)
]

print("Period pairs for bandpass filtering:")
print(f"{'Band':<15} {'Period Range (days)':<25} {'Target Component':<25}")
print("-" * 70)
targets = [
    'Sub-daily (12h cycle)',
    'Daily (24h cycle)',
    'Weekly (7d cycle)',
    'Monthly (28d cycle)',
    'Trend (long-term)'
]
for i, ((low, high), target) in enumerate(zip(period_pairs, targets)):
    print(f"Band {i:<10} [{low:6.2f}, {high:7.2f}]           {target}")

## 4. Apply Savitzky-Golay Bandpass Decomposition

In [None]:
# Create decomposer with Savitzky-Golay plus Butterworth cleanup (enabled by default).
# The cleanup removes residual frequencies outside the target band.
decomposer_sg = SignalDecomposer(
    freq=freq,
    period_pairs=period_pairs,
    filter_type='savgol',
    savgol_butter_cleanup=True,  # Apply Butterworth bandpass after SavGol
    savgol_butter_margin=0.2,    # 20% frequency margin
    savgol_polyorder=3,
    savgol_mode='nearest',
    mode='keep'  # Keep all rows (no NaN dropping)
)

# Decompose
df_sg = decomposer_sg.decompose(df[['ds', 'y']].copy())

# Show component info
print("\nSavitzky-Golay Component Information:")
print(decomposer_sg.get_component_info().to_string(index=False))

## 5. Apply Butterworth Bandpass Decomposition

In [None]:
# Create decomposer with Butterworth
decomposer_bw = SignalDecomposer(
    freq=freq,
    period_pairs=period_pairs,
    filter_type='butterworth',
    butter_order=4,
    mode='keep'  # Keep all rows
)

# Decompose
df_bw = decomposer_bw.decompose(df[['ds', 'y']].copy())

# Show component info
print("\nButterworth Component Information:")
print(decomposer_bw.get_component_info().to_string(index=False))

## 6. Apply DWT Bandpass Decomposition

Now let's test the Discrete Wavelet Transform (DWT) method.
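
Because the DWT splits the spectrum into dyadic bands, it helps to see where those bands fall for 15-minute data. The sketch below uses the nominal rule that detail level `j` spans periods of `2**j` to `2**(j + 1)` samples. Real wavelet filters such as `db4` overlap between levels, and how `SignalDecomposer` maps levels to period pairs is not shown here, so treat this as an approximation. The helper name is hypothetical.

```python
samples_per_day = 96  # 15-minute data, as in this notebook


def dwt_detail_period_range(level, samples_per_day):
    """Nominal period range (in days) of the detail band at a DWT level.

    Level j details nominally span periods of 2**j to 2**(j + 1) samples.
    """
    low = 2 ** level / samples_per_day
    high = 2 ** (level + 1) / samples_per_day
    return low, high


for level in range(1, 13):
    low, high = dwt_detail_period_range(level, samples_per_day)
    print(f"level {level:2d}: {low:8.3f} to {high:8.3f} days")
```

Under this rule the 12-hour cycle falls in level 5 (about 0.33 to 0.67 days) and the 24-hour cycle in level 6 (about 0.67 to 1.33 days). Neither range coincides with the period pairs defined above, so only approximate matching should be expected.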

In [None]:
# Create decomposer with DWT
decomposer_dwt = SignalDecomposer(
    freq=freq,
    period_pairs=period_pairs,
    filter_type='dwt',
    wavelet='db4',
    dwt_max_level=12,
    mode='keep'
)

# Decompose
df_dwt = decomposer_dwt.decompose(df[['ds', 'y']].copy())

print("\nDWT Component Information:")
print(decomposer_dwt.get_component_info().to_string(index=False))

## 7. Compare Sub-daily Cycle Extraction

The sub-daily (12-hour) cycle should be captured in `y_band_0` (0.25-0.75 day band).
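
The comparison cells in Sections 7 to 9 compute the same two metrics each time. A small helper like the one below (hypothetical name `extraction_metrics`, not part of the decomposer module) captures that calculation. The toy check uses a made-up extraction at 90% of the true amplitude.

```python
import numpy as np


def extraction_metrics(estimate, truth):
    """Return (rmse, pearson_r) between an extracted band and its true component."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    rmse = float(np.sqrt(np.mean((estimate - truth) ** 2)))
    corr = float(np.corrcoef(estimate, truth)[0, 1])
    return rmse, corr


# Toy check: correct shape and phase, but only 90% of the true amplitude
t = np.arange(960)                        # 20 full periods of 48 samples
truth = 6 * np.sin(2 * np.pi * t / 48)
rmse, corr = extraction_metrics(0.9 * truth, truth)
print(f"RMSE = {rmse:.4f}, correlation = {corr:.4f}")
```

Correlation is blind to amplitude scaling, so it stays at 1 here while the RMSE is non-zero. Reporting both metrics separates shape errors from amplitude errors.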

In [None]:
fig, axes = plt.subplots(3, 1, figsize=(15, 10))

plot_days = 3  # Show 3 days to see the 12-hour pattern clearly
plot_n = plot_days * freq

# True sub-daily component
axes[0].plot(df.iloc[:plot_n]['ds'], df.iloc[:plot_n]['subdaily_true'], 
             label='True Sub-daily Cycle (12h)', color='purple', linewidth=2)
axes[0].set_title('True Sub-daily Cycle (First 3 Days)', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Amplitude')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Savitzky-Golay extraction
axes[1].plot(df.iloc[:plot_n]['ds'], df.iloc[:plot_n]['subdaily_true'], 
             label='True', color='purple', alpha=0.4, linewidth=2)
axes[1].plot(df_sg.iloc[:plot_n]['ds'], df_sg.iloc[:plot_n]['y_band_0'], 
             label='Savitzky-Golay Extract (band_0: 0.25-0.75d)', color='red', linewidth=2)
axes[1].set_title('Savitzky-Golay: Sub-daily Cycle Extraction', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Amplitude')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Butterworth extraction
axes[2].plot(df.iloc[:plot_n]['ds'], df.iloc[:plot_n]['subdaily_true'], 
             label='True', color='purple', alpha=0.4, linewidth=2)
axes[2].plot(df_bw.iloc[:plot_n]['ds'], df_bw.iloc[:plot_n]['y_band_0'], 
             label='Butterworth Extract (band_0: 0.25-0.75d)', color='green', linewidth=2)
axes[2].set_title('Butterworth: Sub-daily Cycle Extraction', fontsize=12, fontweight='bold')
axes[2].set_ylabel('Amplitude')
axes[2].set_xlabel('Date')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate errors
error_sg = np.sqrt(np.mean((df_sg['y_band_0'].values - df['subdaily_true'].values) ** 2))
error_bw = np.sqrt(np.mean((df_bw['y_band_0'].values - df['subdaily_true'].values) ** 2))
print(f"\nRMSE (Sub-daily Cycle Extraction):")
print(f"  Savitzky-Golay: {error_sg:.4f}")
print(f"  Butterworth:    {error_bw:.4f}")

# Correlation
corr_sg = np.corrcoef(df_sg['y_band_0'].values, df['subdaily_true'].values)[0, 1]
corr_bw = np.corrcoef(df_bw['y_band_0'].values, df['subdaily_true'].values)[0, 1]
print(f"\nCorrelation with True Sub-daily Cycle:")
print(f"  Savitzky-Golay: {corr_sg:.4f}")
print(f"  Butterworth:    {corr_bw:.4f}")

## 8. Compare Daily Cycle Extraction

The daily cycle should be captured in `y_band_1` (0.75-1.25 day band).

In [None]:
fig, axes = plt.subplots(3, 1, figsize=(15, 10))

plot_days = 7
plot_n = plot_days * freq

# True daily component
axes[0].plot(df.iloc[:plot_n]['ds'], df.iloc[:plot_n]['daily_true'], 
             label='True Daily Cycle', color='blue', linewidth=2)
axes[0].set_title('True Daily Cycle (First 7 Days)', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Amplitude')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Savitzky-Golay extraction
axes[1].plot(df.iloc[:plot_n]['ds'], df.iloc[:plot_n]['daily_true'], 
             label='True', color='blue', alpha=0.4, linewidth=2)
axes[1].plot(df_sg.iloc[:plot_n]['ds'], df_sg.iloc[:plot_n]['y_band_1'], 
             label='Savitzky-Golay Extract (band_1: 0.75-1.25d)', color='red', linewidth=2)
axes[1].set_title('Savitzky-Golay: Daily Cycle Extraction', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Amplitude')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Butterworth extraction
axes[2].plot(df.iloc[:plot_n]['ds'], df.iloc[:plot_n]['daily_true'], 
             label='True', color='blue', alpha=0.4, linewidth=2)
axes[2].plot(df_bw.iloc[:plot_n]['ds'], df_bw.iloc[:plot_n]['y_band_1'], 
             label='Butterworth Extract (band_1: 0.75-1.25d)', color='green', linewidth=2)
axes[2].set_title('Butterworth: Daily Cycle Extraction', fontsize=12, fontweight='bold')
axes[2].set_ylabel('Amplitude')
axes[2].set_xlabel('Date')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate errors
error_sg = np.sqrt(np.mean((df_sg['y_band_1'].values - df['daily_true'].values) ** 2))
error_bw = np.sqrt(np.mean((df_bw['y_band_1'].values - df['daily_true'].values) ** 2))
print(f"\nRMSE (Daily Cycle Extraction):")
print(f"  Savitzky-Golay: {error_sg:.4f}")
print(f"  Butterworth:    {error_bw:.4f}")

# Correlation
corr_sg = np.corrcoef(df_sg['y_band_1'].values, df['daily_true'].values)[0, 1]
corr_bw = np.corrcoef(df_bw['y_band_1'].values, df['daily_true'].values)[0, 1]
print(f"\nCorrelation with True Daily Cycle:")
print(f"  Savitzky-Golay: {corr_sg:.4f}")
print(f"  Butterworth:    {corr_bw:.4f}")

## 9. Compare Weekly Cycle Extraction

The weekly cycle should be captured in `y_band_2` (1.25-7 day band).

In [None]:
fig, axes = plt.subplots(3, 1, figsize=(15, 10))

plot_days = 28
plot_n = plot_days * freq

# True weekly component
axes[0].plot(df.iloc[:plot_n]['ds'], df.iloc[:plot_n]['weekly_true'], 
             label='True Weekly Cycle', color='blue', linewidth=2)
axes[0].set_title('True Weekly Cycle (First 28 Days)', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Amplitude')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Savitzky-Golay extraction
axes[1].plot(df.iloc[:plot_n]['ds'], df.iloc[:plot_n]['weekly_true'], 
             label='True', color='blue', alpha=0.4, linewidth=2)
axes[1].plot(df_sg.iloc[:plot_n]['ds'], df_sg.iloc[:plot_n]['y_band_2'], 
             label='Savitzky-Golay Extract (band_2: 1.25-7d)', color='red', linewidth=2)
axes[1].set_title('Savitzky-Golay: Weekly Cycle Extraction', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Amplitude')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Butterworth extraction
axes[2].plot(df.iloc[:plot_n]['ds'], df.iloc[:plot_n]['weekly_true'], 
             label='True', color='blue', alpha=0.4, linewidth=2)
axes[2].plot(df_bw.iloc[:plot_n]['ds'], df_bw.iloc[:plot_n]['y_band_2'], 
             label='Butterworth Extract (band_2: 1.25-7d)', color='green', linewidth=2)
axes[2].set_title('Butterworth: Weekly Cycle Extraction', fontsize=12, fontweight='bold')
axes[2].set_ylabel('Amplitude')
axes[2].set_xlabel('Date')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate errors
error_sg = np.sqrt(np.mean((df_sg['y_band_2'].values - df['weekly_true'].values) ** 2))
error_bw = np.sqrt(np.mean((df_bw['y_band_2'].values - df['weekly_true'].values) ** 2))
print(f"\nRMSE (Weekly Cycle Extraction):")
print(f"  Savitzky-Golay: {error_sg:.4f}")
print(f"  Butterworth:    {error_bw:.4f}")

# Correlation
corr_sg = np.corrcoef(df_sg['y_band_2'].values, df['weekly_true'].values)[0, 1]
corr_bw = np.corrcoef(df_bw['y_band_2'].values, df['weekly_true'].values)[0, 1]
print(f"\nCorrelation with True Weekly Cycle:")
print(f"  Savitzky-Golay: {corr_sg:.4f}")
print(f"  Butterworth:    {corr_bw:.4f}")

## 10. Compare All Bands Side-by-Side

In [None]:
# Define band-to-component mapping
band_to_true = {
    'y_band_0': ('subdaily_true', 'Sub-daily (12h)', 'purple'),
    'y_band_1': ('daily_true', 'Daily (24h)', 'orange'),
    'y_band_2': ('weekly_true', 'Weekly (7d)', 'green'),
    'y_band_3': ('monthly_true', 'Monthly (28d)', 'brown'),
    'y_band_4': ('trend_true', 'Trend (long-term)', 'red'),
}

band_cols = ['y_band_0', 'y_band_1', 'y_band_2', 'y_band_3', 'y_band_4']
band_names = [
    'Band 0: Sub-daily (0.25-0.75d)',
    'Band 1: Daily (0.75-1.25d)',
    'Band 2: Weekly (1.25-7d)',
    'Band 3: Monthly (7-30d)',
    'Band 4: Trend (30-180d)',
]

fig, axes = plt.subplots(len(band_cols), 3, figsize=(20, 15))
fig.suptitle('All Frequency Bands: Savitzky-Golay vs Butterworth vs DWT (with True Components)', 
             fontsize=14, fontweight='bold')

plot_days = 14
plot_n = plot_days * freq

for i, (band_col, band_name) in enumerate(zip(band_cols, band_names)):
    true_col, true_label, true_color = band_to_true[band_col]
    
    # Savitzky-Golay
    axes[i, 0].plot(df.iloc[:plot_n]['ds'], df.iloc[:plot_n][true_col], 
                   label=f'True {true_label}', color=true_color, linewidth=2, alpha=0.5, linestyle='--')
    axes[i, 0].plot(df_sg.iloc[:plot_n]['ds'], df_sg.iloc[:plot_n][band_col], 
                   label='Savitzky-Golay', color='red', linewidth=1.5, alpha=0.8)
    axes[i, 0].set_title(f'{band_name} - Savitzky-Golay', fontsize=9)
    axes[i, 0].set_ylabel('Amplitude', fontsize=8)
    axes[i, 0].legend(loc='upper right', fontsize=7)
    axes[i, 0].grid(True, alpha=0.3)
    
    # Butterworth
    axes[i, 1].plot(df.iloc[:plot_n]['ds'], df.iloc[:plot_n][true_col], 
                   label=f'True {true_label}', color=true_color, linewidth=2, alpha=0.5, linestyle='--')
    axes[i, 1].plot(df_bw.iloc[:plot_n]['ds'], df_bw.iloc[:plot_n][band_col], 
                   label='Butterworth', color='green', linewidth=1.5, alpha=0.8)
    axes[i, 1].set_title(f'{band_name} - Butterworth', fontsize=9)
    axes[i, 1].set_ylabel('Amplitude', fontsize=8)
    axes[i, 1].legend(loc='upper right', fontsize=7)
    axes[i, 1].grid(True, alpha=0.3)
    
    # DWT
    axes[i, 2].plot(df.iloc[:plot_n]['ds'], df.iloc[:plot_n][true_col], 
                   label=f'True {true_label}', color=true_color, linewidth=2, alpha=0.5, linestyle='--')
    axes[i, 2].plot(df_dwt.iloc[:plot_n]['ds'], df_dwt.iloc[:plot_n][band_col], 
                   label='DWT', color='blue', linewidth=1.5, alpha=0.8)
    axes[i, 2].set_title(f'{band_name} - DWT', fontsize=9)
    axes[i, 2].set_ylabel('Amplitude', fontsize=8)
    axes[i, 2].legend(loc='upper right', fontsize=7)
    axes[i, 2].grid(True, alpha=0.3)

axes[-1, 0].set_xlabel('Date')
axes[-1, 1].set_xlabel('Date')
axes[-1, 2].set_xlabel('Date')

plt.tight_layout()
plt.show()

## 11. Power Spectrum Analysis

Use the FFT to see which filter better isolates the target frequencies.
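
Before reading the spectra, it helps to confirm what the `2/n * |FFT|` scaling used in the next cell returns. The sketch below applies it to a clean 24-hour sinusoid of amplitude 10. It uses `rfft`/`rfftfreq` instead of `fft`/`fftfreq` for brevity, and the 28-day record length is an assumption chosen so that 1 cycle/day falls exactly on an FFT bin.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq

samples_per_day = 96
n = samples_per_day * 28                      # 28 whole days of 15-minute data
t = np.arange(n)
signal = 10 * np.sin(2 * np.pi * t / samples_per_day)

freqs = rfftfreq(n, d=1 / samples_per_day)    # frequency axis in cycles per day
amplitude = 2.0 / n * np.abs(rfft(signal))    # single-sided amplitude spectrum

peak = np.argmax(amplitude)
print(f"peak at {freqs[peak]:.3f} cycles/day, height {amplitude[peak]:.3f}")
```

The peak height equals the sinusoid's amplitude, so this scaling gives an amplitude spectrum rather than a power spectrum. The notebook's own signal does not span a whole number of periods and includes a trend, so its peaks will be broadened by spectral leakage.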

In [None]:
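def compute_spectrum(signal):
    """Compute the single-sided amplitude spectrum of a signal.

    Uses the notebook-level `freq` (samples per day), so the frequency
    axis is in cycles per day. The 2/n scaling returns amplitude rather
    than power, so a sinusoid of amplitude A peaks near A.
    """
    n = len(signal)
    yf = fft(signal)
    xf = fftfreq(n, 1/freq)[:n//2]  # Frequency in cycles per day
    amplitude = 2.0/n * np.abs(yf[0:n//2])
    return xf, amplitude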

# Compute spectra for original signal
f_orig, p_orig = compute_spectrum(df['y'].values)

# Compute spectra for daily band (band_1)
f_sg_daily, p_sg_daily = compute_spectrum(df_sg['y_band_1'].values)
f_bw_daily, p_bw_daily = compute_spectrum(df_bw['y_band_1'].values)
f_true_daily, p_true_daily = compute_spectrum(df['daily_true'].values)

# Compute spectra for weekly band (band_2)
f_sg_weekly, p_sg_weekly = compute_spectrum(df_sg['y_band_2'].values)
f_bw_weekly, p_bw_weekly = compute_spectrum(df_bw['y_band_2'].values)
f_true_weekly, p_true_weekly = compute_spectrum(df['weekly_true'].values)

# Plot
fig, axes = plt.subplots(3, 1, figsize=(15, 12))

# Original signal spectrum
axes[0].semilogy(f_orig, p_orig, label='Original Signal', alpha=0.7, linewidth=1.5)
axes[0].axvline(x=1, color='red', linestyle='--', label='Daily (1 cycle/day)', alpha=0.7)
axes[0].axvline(x=1/7, color='green', linestyle='--', label='Weekly (1/7 cycle/day)', alpha=0.7)
axes[0].axvline(x=1/28, color='purple', linestyle='--', label='Monthly (1/28 cycle/day)', alpha=0.7)
axes[0].set_title('Power Spectrum - Original Signal', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Frequency (cycles per day)')
axes[0].set_ylabel('Power')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
axes[0].set_xlim([0, 3])

# Daily band spectrum
axes[1].semilogy(f_true_daily, p_true_daily, label='True Daily Cycle', 
                color='blue', linewidth=2, alpha=0.7)
axes[1].semilogy(f_sg_daily, p_sg_daily, label='Savitzky-Golay (band_1)', 
                color='red', linewidth=2, alpha=0.7)
axes[1].semilogy(f_bw_daily, p_bw_daily, label='Butterworth (band_1)', 
                color='green', linewidth=2, alpha=0.7)
axes[1].axvline(x=1, color='black', linestyle='--', label='Target (1 cycle/day)', alpha=0.5)
axes[1].set_title('Power Spectrum - Daily Band (0.75-1.25 days)', fontsize=12, fontweight='bold')
axes[1].set_xlabel('Frequency (cycles per day)')
axes[1].set_ylabel('Power')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
axes[1].set_xlim([0, 3])

# Weekly band spectrum
axes[2].semilogy(f_true_weekly, p_true_weekly, label='True Weekly Cycle', 
                color='blue', linewidth=2, alpha=0.7)
axes[2].semilogy(f_sg_weekly, p_sg_weekly, label='Savitzky-Golay (band_2)', 
                color='red', linewidth=2, alpha=0.7)
axes[2].semilogy(f_bw_weekly, p_bw_weekly, label='Butterworth (band_2)', 
                color='green', linewidth=2, alpha=0.7)
axes[2].axvline(x=1/7, color='black', linestyle='--', label='Target (1/7 cycle/day)', alpha=0.5)
axes[2].set_title('Power Spectrum - Weekly Band (1.25-7 days)', fontsize=12, fontweight='bold')
axes[2].set_xlabel('Frequency (cycles per day)')
axes[2].set_ylabel('Power')
axes[2].legend()
axes[2].grid(True, alpha=0.3)
axes[2].set_xlim([0, 0.5])

plt.tight_layout()
plt.show()

## 12. Summary Statistics

In [None]:
print("=" * 80)
print("SUMMARY STATISTICS")
print("=" * 80)

# Calculate variance explained by each band
total_var = df['y'].var()

print(f"\nTotal signal variance: {total_var:.4f}")

print(f"\nVariance by Band (Savitzky-Golay):")
print(f"{'Band':<32} {'Variance':<12} {'% of Total':<12} {'Std Dev':<12}")
print("-" * 70)
for band_col, band_name in zip(band_cols, band_names):
    var = df_sg[band_col].var()
    pct = (var / total_var) * 100
    std = df_sg[band_col].std()
    print(f"{band_name:<32} {var:<12.4f} {pct:<12.2f} {std:<12.4f}")

print(f"\nVariance by Band (Butterworth):")
print(f"{'Band':<32} {'Variance':<12} {'% of Total':<12} {'Std Dev':<12}")
print("-" * 70)
for band_col, band_name in zip(band_cols, band_names):
    var = df_bw[band_col].var()
    pct = (var / total_var) * 100
    std = df_bw[band_col].std()
    print(f"{band_name:<32} {var:<12.4f} {pct:<12.2f} {std:<12.4f}")

# Component accuracy
print(f"\n\nComponent Extraction Accuracy (RMSE):")
print(f"{'Component':<20} {'Savitzky-Golay':<20} {'Butterworth':<20} {'DWT':<20}")
print("-" * 80)

# Sub-daily cycle (band_0)
rmse_sg_subdaily = np.sqrt(np.mean((df_sg['y_band_0'] - df['subdaily_true']) ** 2))
rmse_bw_subdaily = np.sqrt(np.mean((df_bw['y_band_0'] - df['subdaily_true']) ** 2))
rmse_dwt_subdaily = np.sqrt(np.mean((df_dwt['y_band_0'] - df['subdaily_true']) ** 2))
print(f"{'Sub-daily cycle':<20} {rmse_sg_subdaily:<20.4f} {rmse_bw_subdaily:<20.4f} {rmse_dwt_subdaily:<20.4f}")

# Daily cycle (band_1)
rmse_sg_daily = np.sqrt(np.mean((df_sg['y_band_1'] - df['daily_true']) ** 2))
rmse_bw_daily = np.sqrt(np.mean((df_bw['y_band_1'] - df['daily_true']) ** 2))
rmse_dwt_daily = np.sqrt(np.mean((df_dwt['y_band_1'] - df['daily_true']) ** 2))
print(f"{'Daily cycle':<20} {rmse_sg_daily:<20.4f} {rmse_bw_daily:<20.4f} {rmse_dwt_daily:<20.4f}")

# Weekly cycle (band_2)
rmse_sg_weekly = np.sqrt(np.mean((df_sg['y_band_2'] - df['weekly_true']) ** 2))
rmse_bw_weekly = np.sqrt(np.mean((df_bw['y_band_2'] - df['weekly_true']) ** 2))
rmse_dwt_weekly = np.sqrt(np.mean((df_dwt['y_band_2'] - df['weekly_true']) ** 2))
print(f"{'Weekly cycle':<20} {rmse_sg_weekly:<20.4f} {rmse_bw_weekly:<20.4f} {rmse_dwt_weekly:<20.4f}")

# Monthly cycle (band_3)
rmse_sg_monthly = np.sqrt(np.mean((df_sg['y_band_3'] - df['monthly_true']) ** 2))
rmse_bw_monthly = np.sqrt(np.mean((df_bw['y_band_3'] - df['monthly_true']) ** 2))
rmse_dwt_monthly = np.sqrt(np.mean((df_dwt['y_band_3'] - df['monthly_true']) ** 2))
print(f"{'Monthly cycle':<20} {rmse_sg_monthly:<20.4f} {rmse_bw_monthly:<20.4f} {rmse_dwt_monthly:<20.4f}")

# Trend (band_4)
rmse_sg_trend = np.sqrt(np.mean((df_sg['y_band_4'] - df['trend_true']) ** 2))
rmse_bw_trend = np.sqrt(np.mean((df_bw['y_band_4'] - df['trend_true']) ** 2))
rmse_dwt_trend = np.sqrt(np.mean((df_dwt['y_band_4'] - df['trend_true']) ** 2))
print(f"{'Trend':<20} {rmse_sg_trend:<20.4f} {rmse_bw_trend:<20.4f} {rmse_dwt_trend:<20.4f}")

print("\n" + "=" * 80)

## 13. Conclusions

### Key Findings:

1. **All three filters target the same components** (DWT only approximately; see item 4)
   - Sub-daily (12h) cycle extracted in band_0 (0.25-0.75 days)
   - Daily (24h) cycle extracted in band_1 (0.75-1.25 days)
   - Weekly cycle extracted in band_2 (1.25-7 days)
   - Monthly cycle extracted in band_3 (7-30 days)
   - Trend extracted in band_4 (30-180 days)

2. **Savitzky-Golay characteristics (with Butterworth cleanup):**
   - **Hybrid approach**: SavGol difference + Butterworth bandpass cleanup
   - SavGol provides smooth initial separation
   - Butterworth cleanup removes residual frequencies outside target band
   - Result: Smooth output with true bandpass isolation
   - Best of both worlds: smoothness + precision
   - Cleanup can be disabled with `savgol_butter_cleanup=False`
   - Excellent for temperature trends and forecasting

3. **Butterworth characteristics:**
   - Sharpest frequency cutoff
   - Pure bandpass filtering
   - May show slight ringing near edges
   - Most precise frequency isolation
   - Good for oscillation extraction

4. **DWT characteristics:**
   - Automatic multi-scale decomposition
   - Very fast (O(n) complexity)
   - Good time-frequency localization
   - Dyadic frequency bands (powers of 2)
   - Approximate period matching
   - May not isolate target frequencies well
   - Best for natural hierarchical structure

### Why Hybrid Savitzky-Golay Works Better:

**Problem with pure Savitzky-Golay:**
- Difference of two lowpass filters is not a true bandpass
- Can leak frequencies from outside the target band
- Example: a band meant to cover 1-3 day periods may still contain 0.5-day or 5-day components

**Solution - Butterworth cleanup:**
```python
# Step 1: Savitzky-Golay bandpass (smooth but imprecise)
sg_bandpass = lowpass(low_window) - lowpass(high_window)

# Step 2: Butterworth cleanup (remove residuals)
# Apply bandpass at [f_low*(1-0.2), f_high*(1+0.2)]
clean_bandpass = butterworth_bandpass(sg_bandpass, freqs_with_margin)
```

This hybrid approach:
- ✓ Preserves Savitzky-Golay smoothness
- ✓ Adds Butterworth precision
- ✓ Removes residual frequencies
- ✓ Better frequency isolation than pure SavGol
- ✓ Smoother than pure Butterworth
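
The two steps above can be sketched with SciPy directly, without `SignalDecomposer`. The window rule (`odd_window`, one window per period bound), the synthetic two-component signal, and the filter order are assumptions made for illustration. The class may choose its windows differently, and the amplitude this naive version recovers depends strongly on that choice.

```python
import numpy as np
from scipy.signal import butter, savgol_filter, sosfiltfilt

samples_per_day = 96
t = np.arange(samples_per_day * 60)                         # 60 days of 15-minute data
daily = 10 * np.sin(2 * np.pi * t / samples_per_day)
weekly = 5 * np.sin(2 * np.pi * t / (samples_per_day * 7))
y = daily + weekly


def odd_window(period_days):
    """Odd Savitzky-Golay window spanning roughly one period (assumed rule)."""
    w = int(round(period_days * samples_per_day))
    return w if w % 2 == 1 else w + 1


low_days, high_days, margin = 0.75, 1.25, 0.2

# Step 1: difference of two Savitzky-Golay lowpass outputs
short = savgol_filter(y, odd_window(low_days), polyorder=3, mode="nearest")
long_ = savgol_filter(y, odd_window(high_days), polyorder=3, mode="nearest")
sg_band = short - long_

# Step 2: zero-phase Butterworth bandpass with edges widened by the margin
nyquist = samples_per_day / 2
f_lo = (1 / high_days) * (1 - margin)                       # cycles per day
f_hi = (1 / low_days) * (1 + margin)
sos = butter(4, [f_lo / nyquist, f_hi / nyquist], btype="bandpass", output="sos")
clean = sosfiltfilt(sos, sg_band)

# Report where the cleaned band concentrates its energy
freqs = np.fft.rfftfreq(len(clean), d=1 / samples_per_day)
dominant = freqs[np.argmax(np.abs(np.fft.rfft(clean)))]
print(f"dominant frequency: {dominant:.3f} cycles/day")
```

Second-order sections (`output="sos"`) with `sosfiltfilt` are used because the normalized band edges are very small at 96 samples per day, where transfer-function coefficients can become numerically fragile, and because forward-backward filtering adds no phase shift.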

### Recommendations:

**Use Savitzky-Golay (with Butterworth cleanup - default) when:**
- You want smooth, trend-like components
- Preserving amplitude is important
- Data has smooth variations (e.g., temperature)
- You need both smoothness AND precision
- **Recommended for most forecasting tasks**

**Use pure Butterworth when:**
- You need sharpest possible frequency cutoff
- Isolating oscillations is critical
- Smoothness is not a priority
- You're doing pure signal processing

**Use DWT when:**
- You want automatic multi-scale decomposition
- Processing speed is critical
- Period pairs align with powers of 2
- You're exploring data structure
- Note: May not isolate specific frequencies well

**For forecasting:**
- **Temperature/weather:** Savitzky-Golay + Butterworth cleanup (default)
- **Energy/load:** Savitzky-Golay + cleanup or pure Butterworth
- **Financial:** Try all three and compare
- **General rule:** Start with SavGol + cleanup, it works well for most cases

**Special note on trend extraction:**
When the upper period bound exceeds half the signal length (`signal_length/2`), the filters automatically fall back to mean removal: `trend = lowpass(lower_period) - mean(signal)`. This keeps trend extraction stable without requiring impossibly long windows.
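
One consequence of mean removal is worth checking before reading the trend RMSE. Assuming the rule works as described, the extracted trend band is centered on zero, while `trend_true` in this notebook is not. The sketch below uses an idealized, perfectly extracted trend (an assumption, not decomposer output) to show how much RMSE the offset alone contributes.

```python
import numpy as np

n = 96 * 181 + 1                      # same length as the notebook's date range
t = np.arange(n)
trend_true = 20 + 0.01 * t

# Idealized output of a mean-removed trend band: the right shape, zero mean
ideal_extracted = trend_true - trend_true.mean()

# Comparing against the uncentered reference adds the reference mean to the error
rmse_raw = np.sqrt(np.mean((ideal_extracted - trend_true) ** 2))

# Comparing against a centered reference isolates the shape error
rmse_centered = np.sqrt(np.mean((ideal_extracted - (trend_true - trend_true.mean())) ** 2))

print(f"RMSE vs trend_true:          {rmse_raw:.2f}")
print(f"RMSE vs centered trend_true: {rmse_centered:.2f}")
```

Even a perfect mean-removed trend scores an RMSE equal to the mean of `trend_true` against the raw reference. If the trend band is mean-removed, centering the reference first gives a fairer comparison.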

**Customization:**
```python
# Disable Butterworth cleanup (pure SavGol)
decomposer = SignalDecomposer(
    freq=freq,
    period_pairs=period_pairs,
    filter_type='savgol',
    savgol_butter_cleanup=False  # Disable cleanup
)

# Adjust cleanup margin (default 0.2 = 20%)
decomposer = SignalDecomposer(
    freq=freq,
    period_pairs=period_pairs,
    filter_type='savgol',
    savgol_butter_cleanup=True,
    savgol_butter_margin=0.1  # Stricter (10% margin)
)
```

## 14. Example: Using Decomposition for Forecasting

In [None]:
# Example: Prepare data for forecasting with bandpass features
df_forecast = preprocess_for_forecast(
    df[['ds', 'y']].copy(),
    decompose=True,
    freq=freq,
    period_pairs=[(0.5, 1), (1, 3), (3, 7), (7, 14)],
    filter_type='savgol',
    train_end_date='2023-06-01',
    mode='keep'
)

print(f"Forecast-ready data shape: {df_forecast.shape}")
print(f"\nColumns available for modeling:")
print([col for col in df_forecast.columns if col.startswith('y_band')])
print(f"\nFirst few rows:")
print(df_forecast[['ds', 'y', 'y_band_0', 'y_band_1', 'y_band_2', 'y_band_3']].head())