### 📁 Feature Engineering

This notebook derives features from the raw daily cryptocurrency prices. The engineered features are the inputs to PCA, clustering, and the later trading-regime analysis.

In [2]:
import pandas as pd
import numpy as np

# Load the daily price data
df = pd.read_csv("crypto_prices.csv", index_col=0, parse_dates=True)

# Ensure columns are sorted alphabetically for consistency
df = df.sort_index(axis=1)

# === CORE FEATURES ===

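# Log returns: ln(P_t / P_{t-1})
log_returns = np.log(df / df.shift(1)).add_suffix("_log_return")

# 30-day volatility (rolling sample standard deviation of log returns).
# The suffix is appended to the log-return names, so columns read "<coin>_log_return_30d_vol".
volatility = log_returns.rolling(window=30).std().add_suffix("_30d_vol")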

# 90-day momentum (% price change)
momentum = df.pct_change(periods=90, fill_method=None).add_suffix("_90d_momentum")

# === TECHNICAL INDICATORS ===

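# RSI (14-day), simple-moving-average variant (not Wilder's exponential smoothing)
def compute_rsi(series, window=14):
    delta = series.diff()
    # Average gain and average loss over the window; losses are stored as positive numbers
    gain = (delta.where(delta > 0, 0)).rolling(window).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window).mean()
    # A window with no losses gives rs = inf and RSI = 100; a flat window (0 / 0) gives NaN
    rs = gain / loss
    return 100 - (100 / (1 + rs))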

rsi = pd.concat([compute_rsi(df[col]).rename(f"{col}_rsi14") for col in df.columns], axis=1)

# %B Bollinger Band (20-day, ±2 std): 0 = price on the lower band, 1 = price on the upper band
def compute_percent_b(series, window=20):
    ma = series.rolling(window).mean()
    std = series.rolling(window).std()
    upper = ma + 2 * std
    lower = ma - 2 * std
    return ((series - lower) / (upper - lower)).rename(f"{series.name}_pctB")

pct_b = pd.concat([compute_percent_b(df[col]) for col in df.columns], axis=1)

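# MACD Histogram: MACD line (fast EMA minus slow EMA) minus its signal line
# Note: pandas' ewm defaults to adjust=True; pass adjust=False for the purely recursive EMA
def compute_macd_hist(series, fast=12, slow=26, signal=9):
    ema_fast = series.ewm(span=fast).mean()
    ema_slow = series.ewm(span=slow).mean()
    macd = ema_fast - ema_slow
    signal_line = macd.ewm(span=signal).mean()
    return (macd - signal_line).rename(f"{series.name}_macd_hist")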

macd_hist = pd.concat([compute_macd_hist(df[col]) for col in df.columns], axis=1)

# Rolling Sharpe Ratio (30-day): mean over std of daily log returns,
# not annualized and with the risk-free rate taken as zero
rolling_sharpe = (
    log_returns.rolling(30).mean() / log_returns.rolling(30).std()
).add_suffix("_sharpe30")

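# === COMBINE ALL FEATURES ===
features = pd.concat([log_returns, volatility, momentum, rsi, pct_b, macd_hist, rolling_sharpe], axis=1)
# dropna() removes the warm-up rows (at least the first 90, set by the momentum window)
# and any date on which at least one asset has a missing value
features = features.dropna()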

# Save to CSV
features.to_csv("crypto_features.csv")
print("✅ Feature engineering complete. Saved to 'crypto_features.csv'.")

✅ Feature engineering complete. Saved to 'crypto_features.csv'.
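
As a quick sanity check on the indicator definitions in the cell above, the sketch below re-runs the same RSI and %B formulas on a synthetic price series. The series, its name `DEMO`, and the random seed are assumptions made for illustration; no real market data is involved.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic prices (a geometric random walk), not real market data
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", periods=200, freq="D")
prices = pd.Series(
    100 * np.exp(np.cumsum(rng.normal(0, 0.02, size=200))),
    index=dates,
    name="DEMO",
)

def compute_rsi(series, window=14):
    # Same simple-moving-average RSI as in the feature cell
    delta = series.diff()
    gain = delta.where(delta > 0, 0).rolling(window).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window).mean()
    return 100 - (100 / (1 + gain / loss))

def compute_percent_b(series, window=20):
    # Same %B as in the feature cell: position between the lower and upper band
    ma = series.rolling(window).mean()
    std = series.rolling(window).std()
    lower, upper = ma - 2 * std, ma + 2 * std
    return (series - lower) / (upper - lower)

rsi = compute_rsi(prices).dropna()
pct_b = compute_percent_b(prices).dropna()

# RSI is bounded between 0 and 100 by construction
rsi_bounded = bool(rsi.between(0, 100).all())

# %B can be rewritten as 0.5 + (price - moving average) / (4 * std)
ma = prices.rolling(20).mean()
std = prices.rolling(20).std()
identity_holds = bool(np.allclose(pct_b, (0.5 + (prices - ma) / (4 * std)).dropna()))

print(rsi_bounded, identity_holds)
```

The second check makes the reading of %B explicit: 0.5 means the price sits on its 20-day moving average, and each 0.25 step corresponds to one standard deviation.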


### 📊 Feature List

We calculate several rolling-window features that help capture **momentum**, **volatility**, and **trend dynamics**. Each one is computed per asset in the feature cell above:

- **Log Return**: Natural log of the ratio between consecutive daily prices.
- **Rolling Volatility (30-day)**: Standard deviation of log returns, capturing recent risk.
- **Momentum (90-day)**: Percentage price change over the past 90 days, capturing the medium-term trend.
- **RSI (Relative Strength Index, 14-day)**: Indicates overbought or oversold conditions.
- **Bollinger Band Position (%B, 20-day)**: Relative position of the price within the ±2 standard deviation Bollinger Bands.
- **MACD Histogram (12/26/9)**: Difference between the MACD line and its signal line, capturing crossovers between trend-following EMAs.
- **Rolling Sharpe Ratio (30-day)**: Mean log return divided by its standard deviation, a risk-adjusted return measure.
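
For reference, the quantities computed in the feature cell can be written out as follows. The notation is an assumption introduced here for readability: $P_t$ is the daily price, $\Delta P_t = P_t - P_{t-1}$, $m_t$ and $s_t$ are the 20-day rolling mean and standard deviation of the price, and $\mathrm{EMA}_n$ is an exponential moving average with span $n$.

$$
\begin{aligned}
r_t &= \ln\frac{P_t}{P_{t-1}} \\
\sigma_t &= \operatorname{std}(r_{t-29}, \dots, r_t) \\
M_t &= \frac{P_t}{P_{t-90}} - 1 \\
\mathrm{RSI}_t &= 100 - \frac{100}{1 + G_t / L_t}, \quad
G_t = \frac{1}{14}\sum_{i=0}^{13} \max(\Delta P_{t-i}, 0), \quad
L_t = \frac{1}{14}\sum_{i=0}^{13} \max(-\Delta P_{t-i}, 0) \\
\%B_t &= \frac{P_t - (m_t - 2 s_t)}{(m_t + 2 s_t) - (m_t - 2 s_t)} = \frac{1}{2} + \frac{P_t - m_t}{4 s_t} \\
H_t &= \mathrm{MACD}_t - \mathrm{EMA}_9(\mathrm{MACD})_t, \quad
\mathrm{MACD}_t = \mathrm{EMA}_{12}(P)_t - \mathrm{EMA}_{26}(P)_t \\
S_t &= \frac{\operatorname{mean}(r_{t-29}, \dots, r_t)}{\operatorname{std}(r_{t-29}, \dots, r_t)}
\end{aligned}
$$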

In [5]:
from sklearn.preprocessing import StandardScaler

# Load the engineered features
features = pd.read_csv("crypto_features.csv", index_col=0, parse_dates=True)

# Normalize (Z-score standardization)
scaler = StandardScaler()
features_normalized = pd.DataFrame(
    scaler.fit_transform(features),
    index=features.index,
    columns=features.columns
)

# Save the normalized version
features_normalized.to_csv("crypto_features_normalized.csv")
print("✅ Normalized features saved to 'crypto_features_normalized.csv'.")

✅ Normalized features saved to 'crypto_features_normalized.csv'.
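
With its default settings, `StandardScaler` standardizes each column as (value minus column mean) divided by the column's population standard deviation (`ddof=0`). The sketch below reproduces that transform by hand on a placeholder table, so the effect on each feature column is easy to inspect. The table `demo` and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical feature table; column names and values are placeholders
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.normal(5.0, 3.0, size=(100, 2)), columns=["feat_a", "feat_b"])

# Z-score by hand: subtract the column mean, divide by the population std (ddof=0)
z = (demo - demo.mean()) / demo.std(ddof=0)

# After scaling, every column has mean 0 and population std 1 (up to rounding)
col_means = z.mean()
col_stds = z.std(ddof=0)
print(col_means.round(6).tolist(), col_stds.round(6).tolist())
```

Note that pandas' own `.std()` defaults to `ddof=1`, so checking the scaled columns with the default call gives values slightly above 1.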


### ⚙️ Rolling Feature Parameters

The window sizes for feature calculations are chosen to balance:
- **Short-term sensitivity** (14-day RSI, 20-day Bollinger Bands, 12/26/9-day MACD).
- **Medium-term regime detection** (30-day volatility and Sharpe ratio, 90-day momentum).

### 📌 Output Summary

The final dataset includes:
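- A **Date** index, with the **coin name** carried as the prefix of each column name
- All computed features aligned by date, one column per asset-feature pair
- A raw version (`crypto_features.csv`) and a z-scored version (`crypto_features_normalized.csv`), ready for dimensionality reduction (PCA) in the next step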
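
The saved table is wide: one row per date and one column per coin-feature pair. If a long layout with explicit date and coin keys is more convenient downstream, the reshape can be sketched as below. This is a minimal sketch on a synthetic two-asset table; the tickers `BTC` and `ETH`, the random prices, and the assumption that coin names contain no underscore are all illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical two-asset price table standing in for crypto_prices.csv
rng = np.random.default_rng(42)
dates = pd.date_range("2023-01-01", periods=150, freq="D")
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.02, size=(150, 2)), axis=0)),
    index=dates,
    columns=["BTC", "ETH"],
)

# Two of the notebook's features, built the same way as in the feature cell
log_returns = np.log(prices / prices.shift(1)).add_suffix("_log_return")
momentum = prices.pct_change(periods=90, fill_method=None).add_suffix("_90d_momentum")
wide = pd.concat([log_returns, momentum], axis=1).dropna()

# Split "<coin>_<feature>" at the first underscore (assumes coin names contain none)
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(col.split("_", 1)) for col in wide.columns],
    names=["coin", "feature"],
)

# One row per (date, coin), one column per feature
long = wide.stack(level="coin")
long.index.names = ["date", "coin"]
print(long.shape)
```

The 90-day momentum window consumes the first 90 rows, so 150 synthetic days leave 60 dates, which become 120 date-coin rows in the long table.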