## Feature Engineering Summary Table

| Metric | Category | Dependencies | Key Formula | Default Params |
|--------|----------|--------------|-------------|----------------|
| **Ret_1d** | Return | `Adj Close` | $\frac{\text{Adj Close}_t}{\text{Adj Close}_{t-1}} - 1$ | — |
| **TRP** | Volatility | `Adj High`, `Adj Low`, `Adj Close` | $\frac{\max(\text{HL}, \|\text{H-PC}\|, \|\text{L-PC}\|)}{\text{Adj Close}_t}$ | — |
| **ATR** | Volatility | `TR` | $\alpha \cdot \text{TR}_t + (1-\alpha) \cdot \text{ATR}_{t-1}$, $\alpha=1/14$ | `atr_p=14` |
| **ATRP** | Volatility | `ATR`, `Adj Close` | $\frac{\text{ATR}_t}{\text{Adj Close}_t}$ | — |
| **RSI** | Momentum | `Adj Close` | $100 - \frac{100}{1 + \frac{\text{EMA}(\text{Up})}{\text{EMA}(\text{Down})}}$ | `rsi_p=14` |
| **Mom_21** | Momentum | `Adj Close` | $\frac{\text{Adj Close}_t}{\text{Adj Close}_{t-21}} - 1$ | `win_21=21` |
| **Consistency** | Momentum/Quality | `Ret_1d` | $\frac{1}{5}\sum_{i=0}^{4}\mathbb{1}_{[\text{Ret\_1d}_{t-i} \gt 0]}$ | `win_5=5` |
| **Beta_63** | Risk/Systematic | `Ret_1d`, `Mkt_Ret` | $\frac{\text{Cov}(R_i, R_m)}{\text{Var}(R_m)}$ | `win_63=63` |
| **IR_63** | Risk-Adjusted Performance | `Ret_1d`, `Mkt_Ret` | $\frac{\text{Mean}(R_i - R_m)}{\text{Std}(R_i - R_m)}$ | `win_63=63` |
| **DD_21** | Risk/Drawdown | `Adj Close` | $\frac{\text{Adj Close}_t}{\text{RollingMax}_{21}} - 1$ | `win_21=21` |
| **RollingStalePct** | Data Quality | `Volume`, `Adj High`, `Adj Low` | $\text{Mean}(\mathbb{1}_{[V=0 \lor H=L]})$ over 252d | `q_win=252`, `q_min=126` |
| **RollMedDollarVol** | Liquidity/Quality | `Adj Close`, `Volume` | $\text{Median}(\text{Adj Close} \times \text{Volume})$ over 252d | `q_win=252`, `q_min=126` |
| **RollingSameVolCount** | Data Quality | `Volume` | $\sum\mathbb{1}_{[V_t = V_{t-1}]}$ over 252d | `q_win=252`, `q_min=126` |

## Dependency Graph
Ret_1d ← Adj Close  
↓  
Consistency, Beta_63, IR_63 (Beta_63 and IR_63 also need Mkt_Ret)  
TR ← Adj High, Adj Low, Adj Close (prev_close)  
↓  
TRP ← TR, Adj Close  
ATR ← TR  
↓  
ATRP ← ATR, Adj Close  
RSI ← Adj Close (separate path)  
Mom_21 ← Adj Close  
DD_21 ← Adj Close  
Quality Metrics ← Volume, Adj Close, Adj High, Adj Low  


## Global Settings Reference

| Setting | Variable | Used By |
|---------|----------|---------|
| ATR Period | `atr_p` | ATR |
| RSI Period | `rsi_p` | RSI |
| 5-Day Window | `win_5` | Consistency |
| 21-Day Window | `win_21` | Mom_21, DD_21 |
| 63-Day Window | `win_63` | Beta_63, IR_63 |
| Quality Window | `q_win` | All quality metrics |
| Quality Min Periods | `q_min` | All quality metrics |
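
The defaults from the summary table can be gathered into a single settings object. The sketch below is only an illustration: the `FeatureSettings` name and the validation check are assumptions, while the field names and default values are taken from the tables above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSettings:  # hypothetical container, not part of the source code
    atr_p: int = 14     # ATR period
    rsi_p: int = 14     # RSI period
    win_5: int = 5      # Consistency window
    win_21: int = 21    # Mom_21 and DD_21 window
    win_63: int = 63    # Beta_63 and IR_63 window
    q_win: int = 252    # quality-metric rolling window
    q_min: int = 126    # quality-metric min_periods

    def __post_init__(self):
        # rolling(window, min_periods=...) requires min_periods <= window
        if self.q_min > self.q_win:
            raise ValueError("q_min must not exceed q_win")


settings = FeatureSettings()
wilder_alpha = 1 / settings.atr_p  # smoothing factor used by ATR
```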

---
## Ret_1d

| Attribute | Value |
|-----------|-------|
| **Category** | Return |
| **Dependencies** | `Adj Close` |
| **Used By** | Consistency, Beta_63, IR_63 |

### Formula
$$\text{Ret\_1d}_t = \frac{\text{Adj Close}_t}{\text{Adj Close}_{t-1}} - 1$$

### Pseudocode
```python
# Per ticker (groupby level=0); adj_close is indexed by (ticker, date)
previous_close = adj_close.groupby(level=0).shift(1)
ret_1d = (adj_close - previous_close) / previous_close

# Equivalent shortcut, from audit_feature_engineering_integrity()
shadow_data["shadow_Ret_1d"] = adj_close.groupby(level=0).pct_change()
```

### Implementation Notes
- Uses `groupby(level=0)` to prevent cross-ticker leakage.
- The first observation per ticker is NaN (no prior day).
- Uses adjusted close, so splits and dividends are already accounted for.
- Foundation metric for the return-based risk and momentum calculations (Consistency, Beta_63, IR_63).
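
To see why the per-ticker grouping matters, here is a small sketch on made-up prices for two hypothetical tickers (`AAA`, `BBB`). It compares the grouped calculation with a naive shift across the whole Series.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["AAA", "BBB"], pd.date_range("2024-01-01", periods=3)],
    names=["Ticker", "Date"],
)
adj_close = pd.Series([10.0, 11.0, 12.1, 100.0, 90.0, 99.0], index=idx)

# Per-ticker return: each ticker's first row has no prior close, so it is NaN
prev_close = adj_close.groupby(level=0).shift(1)
ret_1d = adj_close / prev_close - 1

# Naive version without groupby: BBB's first row is divided by AAA's last close
leaky = adj_close / adj_close.shift(1) - 1
```

In the naive version, BBB's first row gets a spurious return of about +726% (100.0 / 12.1 - 1), while the grouped version correctly leaves it as NaN.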

---  
## TRP (True Range Percentage)

| Attribute | Value |
|-----------|-------|
| **Category** | Volatility |
| **Dependencies** | `Adj High`, `Adj Low`, `Adj Close` |
| **Used By** | None directly (ATR and ATRP are built from the raw True Range) |

### Formula
$$\text{True Range}_t = \max\left(\text{Adj High}_t - \text{Adj Low}_t,\ |\text{Adj High}_t - \text{Adj Close}_{t-1}|,\ |\text{Adj Low}_t - \text{Adj Close}_{t-1}|\right)$$

$$\text{TRP}_t = \frac{\text{True Range}_t}{\text{Adj Close}_t}$$

### Pseudocode
```python
import pandas as pd

# Per ticker (groupby level=0); from audit_feature_engineering_integrity()
prev_close = adj_close.groupby(level=0).shift(1)
tr = pd.concat(
    [
        adj_high - adj_low,             # tr1: current bar range
        (adj_high - prev_close).abs(),  # tr2: high vs. previous close (gap up)
        (adj_low - prev_close).abs(),   # tr3: low vs. previous close (gap down)
    ],
    axis=1,
).max(axis=1, skipna=False)  # True Range; NaN when prev_close is missing
shadow_data["shadow_TRP"] = tr / adj_close
```

### Implementation Notes
- Captures overnight gaps, unlike the simple High-Low range.
- The first observation per ticker is NaN (requires the previous close); `skipna=False` keeps it NaN instead of falling back to High-Low.
- Normalized by the current close for cross-asset comparability.
- The raw TR feeds ATR; TRP is the normalized version.
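
A short worked example, using three made-up bars for a single ticker (so no `groupby` is needed), shows how a gap makes the True Range larger than the bar's own High-Low range.

```python
import pandas as pd

bars = pd.DataFrame(
    {
        "Adj High": [10.5, 12.4, 12.0],
        "Adj Low": [9.8, 12.0, 11.0],
        "Adj Close": [10.0, 12.2, 11.5],
    }
)
prev_close = bars["Adj Close"].shift(1)

tr = pd.concat(
    [
        bars["Adj High"] - bars["Adj Low"],
        (bars["Adj High"] - prev_close).abs(),
        (bars["Adj Low"] - prev_close).abs(),
    ],
    axis=1,
).max(axis=1, skipna=False)  # skipna=False keeps the first bar NaN
trp = tr / bars["Adj Close"]
```

On the second bar the High-Low range is only 0.4, but the gap from the previous close of 10.0 up to the high of 12.4 gives a True Range of 2.4.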

---
## ATR (Average True Range)

| Attribute | Value |
|-----------|-------|
| **Category** | Volatility |
| **Dependencies** | `TR` (True Range — max of [High-Low, \|High-PrevClose\|, \|Low-PrevClose\|]) |
| **Used By** | ATRP |

### Formula
$$\text{ATR}_t = \alpha \cdot \text{TR}_t + (1-\alpha) \cdot \text{ATR}_{t-1}$$

Where $\alpha = \frac{1}{\text{atr\_p}} = \frac{1}{14}$ default (Wilder's smoothing)

### Pseudocode
```python
# Per ticker (groupby level=0)
# Seed:      ATR_0 = first valid TR (this is what the ewm call below does)
# Recursion: ATR_t = alpha * TR_t + (1 - alpha) * ATR_{t-1}, alpha = 1 / atr_p

# From audit_feature_engineering_integrity()
shadow_data["shadow_ATR"] = tr.groupby(level=0).transform(
    lambda x: x.ewm(alpha=1 / atr_p, adjust=False).mean()
)
```

### Implementation Notes
- Wilder's EMA uses α = 1/period, whereas the standard EMA uses α = 2/(period + 1).
- `adjust=False` applies the recursive formula above directly (memory efficient, matches a cell-by-cell Excel calculation).
- With `adjust=False`, pandas seeds the recursion with the first valid TR; there is no expanding-mean warm-up. Wilder's original ATR seeds with the simple mean of the first `atr_p` TR values, so early values can differ until the seed's weight decays.
- Lower alpha gives a slower, smoother response; higher alpha gives a faster, noisier one.
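
The sketch below checks the seeding behaviour on made-up TR values. It compares `ewm(..., adjust=False)` with a hand-written loop, assuming no missing values after the first valid TR.

```python
import numpy as np
import pandas as pd

atr_p = 14
alpha = 1 / atr_p
tr = pd.Series([np.nan, 2.4, 1.2, 0.9, 1.5, 3.0])  # first TR is NaN (no previous close)

atr_ewm = tr.ewm(alpha=alpha, adjust=False).mean()

# Manual Wilder recursion, seeded with the first valid TR
atr_manual = []
prev = np.nan
for value in tr:
    if np.isnan(prev):
        prev = value  # stays NaN until the first valid TR arrives
    else:
        prev = alpha * value + (1 - alpha) * prev
    atr_manual.append(prev)
atr_manual = pd.Series(atr_manual)
```

Both series start at 2.4 on the first valid bar and agree from there on, which confirms that the seed is the first valid TR rather than an average.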


---
## ATRP (Average True Range Percentage)

| Attribute | Value |
|-----------|-------|
| **Category** | Volatility |
| **Dependencies** | `ATR` (Average True Range), `Adj Close` |
| **Used By** | — |

### Formula
$$\text{ATRP}_t = \frac{\text{ATR}_t}{\text{Adj Close}_t}$$


### Pseudocode
```python
# Per ticker; ATR and adj_close share the same (ticker, date) index
# From audit_feature_engineering_integrity()
shadow_data["shadow_ATRP"] = shadow_data["shadow_ATR"] / adj_close
```

### Implementation Notes
- Normalizes ATR by price level for cross-asset comparability.
- Allows volatility to be compared between a $10 stock and a $1000 stock.
- Same units as TRP (a fraction of price, e.g. 0.02 = 2%).
- Useful for position sizing: risk amount / (ATRP × price) = share quantity. Since ATRP × price equals ATR, this is the risk amount divided by ATR in dollars.
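
The position-sizing rule can be written out as a small calculation. The `shares_for_risk` helper and the dollar figures below are purely illustrative assumptions.

```python
def shares_for_risk(risk_amount: float, atrp: float, price: float) -> float:
    """Share quantity such that a one-ATR move costs roughly risk_amount (hypothetical helper)."""
    atr_in_dollars = atrp * price  # ATRP x price recovers ATR
    return risk_amount / atr_in_dollars


# Same 2% ATRP and same $500 risk budget at two very different price levels
cheap = shares_for_risk(500.0, atrp=0.02, price=10.0)
expensive = shares_for_risk(500.0, atrp=0.02, price=1000.0)
```

With identical ATRP, the two positions have the same dollar exposure: about 2500 shares at $10 and about 25 shares at $1000, both worth $25,000.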

---
## RSI (Relative Strength Index)

| Attribute | Value |
|-----------|-------|
| **Category** | Momentum/Oscillator |
| **Dependencies** | `Adj Close` |
| **Used By** | — |

### Formula
$$\text{RSI}_t = 100 - \frac{100}{1 + \text{RS}_t}$$

Where:
- $\text{RS}_t = \frac{\text{Avg Up}_t}{\text{Avg Down}_t}$ (Relative Strength)
- $\text{Avg Up}_t = \alpha \cdot \text{Up}_t + (1-\alpha) \cdot \text{Avg Up}_{t-1}$ (Wilder's EMA of upward changes)
- $\text{Avg Down}_t = \alpha \cdot \text{Down}_t + (1-\alpha) \cdot \text{Avg Down}_{t-1}$ (Wilder's EMA of downward changes)
- $\alpha = \frac{1}{\text{rsi\_p}} = \frac{1}{14}$ default


### Pseudocode
```python
import numpy as np

# Per ticker (groupby level=0); from audit_feature_engineering_integrity()
delta = adj_close.groupby(level=0).diff()               # Adj Close_t - Adj Close_{t-1}
up, down = delta.clip(lower=0), (-delta).clip(lower=0)  # max(delta, 0), max(-delta, 0)

# Wilder's EMA (alpha = 1 / rsi_p) of the upward and downward moves
roll_up = up.groupby(level=0).transform(
    lambda x: x.ewm(alpha=1 / rsi_p, adjust=False).mean()
)
roll_down = down.groupby(level=0).transform(
    lambda x: x.ewm(alpha=1 / rsi_p, adjust=False).mean()
)

rs = roll_up / roll_down  # division by zero is allowed: x / 0 -> inf, 0 / 0 -> NaN
raw_rsi = 100 - (100 / (1 + rs))
shadow_data["shadow_RSI"] = raw_rsi.replace({np.inf: 100, -np.inf: 0}).fillna(50)
```

### Implementation Notes
- Wilder's original formula: uses an EMA with α = 1/period, not 2/(period + 1).
- Division by zero: when Avg Down = 0 (no down days), RS = inf and RSI = 100, which is the correct limit. The `replace` call is a defensive guard against infinite values.
- NaN handling:
  - First observation per ticker (no price change available): filled with 50 (neutral). With `adjust=False` there is no further warm-up, so the next few values rest on very few observations and can sit at 0 or 100.
  - Flat prices (Avg Up = 0 and Avg Down = 0): RS = 0/0 = NaN, filled with 50.
- Range: 0 to 100; above 70 is traditionally read as overbought, below 30 as oversold.
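
The edge cases described above can be reproduced on tiny single-ticker series. The `wilder_rsi` helper is a hypothetical wrapper around the same steps, written only for this illustration.

```python
import numpy as np
import pandas as pd


def wilder_rsi(close: pd.Series, rsi_p: int = 14) -> pd.Series:
    """RSI for one ticker, following the steps listed above (hypothetical helper)."""
    delta = close.diff()
    up, down = delta.clip(lower=0), (-delta).clip(lower=0)
    roll_up = up.ewm(alpha=1 / rsi_p, adjust=False).mean()
    roll_down = down.ewm(alpha=1 / rsi_p, adjust=False).mean()
    rs = roll_up / roll_down  # x / 0 -> inf, 0 / 0 -> NaN
    raw_rsi = 100 - (100 / (1 + rs))
    return raw_rsi.replace({np.inf: 100, -np.inf: 0}).fillna(50)


rising = wilder_rsi(pd.Series([10.0, 10.5, 11.0, 11.5]))       # no down days
flat = wilder_rsi(pd.Series([10.0, 10.0, 10.0, 10.0]))         # no movement at all
mixed = wilder_rsi(pd.Series([10.0, 10.5, 11.0, 10.8, 11.2]))  # one down day
```

The rising series is 50 on its first bar and 100 afterwards, the flat series stays at 50 throughout, and the mixed series drops below 100 only once a down day enters the average.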

---  
## Mom_21 (21-Day Momentum)

| Attribute | Value |
|-----------|-------|
| **Category** | Momentum |
| **Dependencies** | `Adj Close` |
| **Used By** | — |

### Formula
$$\text{Mom\_21}_t = \frac{\text{Adj Close}_t - \text{Adj Close}_{t-21}}{\text{Adj Close}_{t-21}} = \frac{\text{Adj Close}_t}{\text{Adj Close}_{t-21}} - 1$$


### Pseudocode
```python
# Per ticker (groupby level=0)
lagged_close = adj_close.groupby(level=0).shift(win_21)
mom_21 = (adj_close - lagged_close) / lagged_close

# Equivalent shortcut, from audit_feature_engineering_integrity()
shadow_data[f"shadow_Mom_{win_21}"] = adj_close.groupby(level=0).pct_change(win_21)
```

### Implementation Notes
- Simple percentage return over 21 trading days (about one month).
- The first 21 observations per ticker are NaN (insufficient history).
- Positive means the price is above its level 21 trading days ago; negative means it is below.
- Common tactical asset allocation signal (12-month momentum is popular in academia; 21-day is a short-term variant).
- No smoothing: only the two endpoint prices enter the calculation.

---  
## Consistency (5-Day Positive Return Consistency)

| Attribute | Value |
|-----------|-------|
| **Category** | Momentum/Quality |
| **Dependencies** | `Ret_1d` (1-Day Return) |
| **Used By** | — |


### Formula
$$\text{Consistency}_t = \frac{1}{5} \sum_{i=0}^{4} \mathbb{1}_{[\text{Ret\_1d}_{t-i} \gt 0]}$$

Where $\mathbb{1}_{[\cdot]}$ is the indicator function (1 if condition true, 0 otherwise)


### Pseudocode
```python
# Per ticker (groupby level=0); from audit_feature_engineering_integrity()
# Flag is 1.0 when Ret_1d > 0, else 0.0 (NaN and zero returns count as 0.0)
pos_ret = (shadow_data["shadow_Ret_1d"] > 0).astype(float)
# Share of positive days over the trailing win_5 observations
shadow_data["shadow_Consistency"] = pos_ret.groupby(level=0).transform(
    lambda x: x.rolling(win_5).mean()
)
```

### Implementation Notes
- Measures directional consistency, not magnitude.
- Range: 0 to 1 (0% to 100% positive days).
- 1.0 = all 5 days up; 0.0 = no up days (flat days count as non-positive); 0.6 = 3 up days out of 5.
- Different from volatility: a stock can have high Consistency (always up) with low magnitude, or high magnitude with low Consistency (choppy).
- Useful for trend quality assessment: smooth trends combine high Consistency with positive Mom_21.
- The first `win_5 - 1` observations per ticker are NaN (incomplete window).
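
A minimal sketch on made-up returns for a single ticker shows how flat and missing returns are treated.

```python
import numpy as np
import pandas as pd

win_5 = 5
ret_1d = pd.Series([np.nan, 0.01, -0.02, 0.03, 0.0, 0.01, 0.02])  # single ticker

pos_ret = (ret_1d > 0).astype(float)  # NaN and 0.0 returns both map to 0.0
consistency = pos_ret.rolling(win_5).mean()
```

The first full window (rows 0 to 4) contains two positive days out of five, giving 0.4. The NaN first return and the flat day both count as non-positive.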

---  
## Beta_63 (63-Day Beta vs Market)

| Attribute | Value |
|-----------|-------|
| **Category** | Risk/Systematic |
| **Dependencies** | `Ret_1d` (1-Day Return), `Mkt_Ret` (Market return from engine.macro_df) |
| **Used By** | — |

### Formula
$$\beta_{i,t} = \frac{\text{Cov}(R_{i,t}, R_{m,t})}{\text{Var}(R_{m,t})}$$

Where:
- $R_{i,t}$ = daily stock returns (`Ret_1d`) within the trailing 63-day window
- $R_{m,t}$ = daily market returns (`Mkt_Ret`) within the same window
- Cov = rolling covariance, Var = rolling variance


### Pseudocode
```python
# Single-ticker sketch: stock_ret and mkt_ret are daily returns on the same dates
rolling_cov = stock_ret.rolling(win_63).cov(mkt_ret)
rolling_var = mkt_ret.rolling(win_63).var()
beta_63 = rolling_cov / rolling_var

# From audit_feature_engineering_integrity(): per ticker (groupby level=0),
# with the market series reindexed to each ticker's dates (index level 1)
shadow_data[f"shadow_Beta_{win_63}"] = (
    s_ret.groupby(level=0)
    .transform(
        lambda x: x.rolling(win_63).cov(
            mkt_ret.reindex(x.index.get_level_values(1))
        )
        / mkt_ret.reindex(x.index.get_level_values(1)).rolling(win_63).var()
    )
    .fillna(1.0)
)
```

### Implementation Notes
- Measures systematic risk: sensitivity to market movements.
- β = 1: moves one-for-one with the market; β > 1: amplifies market moves; 0 < β < 1: dampens them; β < 0: tends to move against the market.
- Uses sample covariance and variance (not population); the ratio is the same either way as long as both use the same convention.
- Requires aligned dates between stock and market returns.
- NaN with insufficient history or zero market variance.
- Default fill: 1.0 (neutral, market-like) wherever the calculation yields NaN.
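
One way to sanity-check the rolling beta is to feed it a synthetic stock with a known beta. The sketch below uses random market returns with an arbitrary seed and a single ticker, so the grouping and reindexing steps are left out.

```python
import numpy as np
import pandas as pd

win_63 = 63
rng = np.random.default_rng(0)
dates = pd.bdate_range("2024-01-01", periods=100)
mkt_ret = pd.Series(rng.normal(0.0, 0.01, len(dates)), index=dates)

# Synthetic stock: exactly twice the market move plus a constant drift
stock_ret = 2.0 * mkt_ret + 0.001

beta = stock_ret.rolling(win_63).cov(mkt_ret) / mkt_ret.rolling(win_63).var()
beta_filled = beta.fillna(1.0)  # same neutral default as the audit code
```

The constant drift drops out of the covariance, so every complete window recovers a beta of 2. The first 62 rows are NaN before the fill and 1.0 after it.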

---
## IR_63 (63-Day Information Ratio)

| Attribute | Value |
|-----------|-------|
| **Category** | Risk-Adjusted Performance |
| **Dependencies** | `Ret_1d` (1-Day Return), `Mkt_Ret` (Market return from engine.macro_df) |
| **Used By** | — |

### Formula
$$\text{IR}_{i,t} = \frac{\text{Mean}(\text{Active Return}_{i,t})}{\text{Std}(\text{Active Return}_{i,t})}$$

Where:
- $\text{Active Return}_{i,t} = R_{i,t} - R_{m,t}$ (stock return minus market return)
- Mean and Std computed over 63-day rolling window


### Pseudocode
```python
# Single-ticker sketch: active return is the stock return minus the market return
active_ret = ret_1d - mkt_ret
mean_active = active_ret.rolling(win_63).mean()
std_active = active_ret.rolling(win_63).std()  # tracking error
ir_63 = mean_active / std_active

# From audit_feature_engineering_integrity(): per ticker (groupby level=0)
active_ret = s_ret - mkt_series
shadow_data["shadow_IR_63"] = (
    active_ret.groupby(level=0)
    .transform(lambda x: x.rolling(win_63).mean() / x.rolling(win_63).std())
    .fillna(0.0)
)
```

### Implementation Notes
- Measures risk-adjusted active return: return per unit of tracking error.
- IR > 0: outperforming the market on a risk-adjusted basis; IR < 0: underperforming.
- Scale: the value is computed from daily returns and is not annualized. The common rule of thumb (IR ≈ 0.5 is good, IR ≈ 1.0 is exceptional) applies to annualized IR, which is roughly the daily value multiplied by √252 ≈ 15.9.
- Uses the sample standard deviation (denominator n-1).
- NaN with insufficient history, or when the mean active return and the tracking error are both zero (0/0). A nonzero mean over zero tracking error yields ±inf, which `fillna` does not replace.
- Default fill: 0.0 (neutral) wherever the calculation yields NaN.
- Unlike the Sharpe ratio, uses the market (not the risk-free rate) as the benchmark.
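
The following sketch computes the rolling IR on synthetic single-ticker data, cross-checks the last window against a direct calculation, and applies the √252 annualization. The random inputs, the seed, and the 252-day convention are assumptions for illustration.

```python
import numpy as np
import pandas as pd

win_63 = 63
rng = np.random.default_rng(1)
dates = pd.bdate_range("2024-01-01", periods=100)
mkt_ret = pd.Series(rng.normal(0.0, 0.01, len(dates)), index=dates)
stock_ret = mkt_ret + pd.Series(rng.normal(0.0005, 0.005, len(dates)), index=dates)

active_ret = stock_ret - mkt_ret
ir_daily = active_ret.rolling(win_63).mean() / active_ret.rolling(win_63).std()
ir_annual = ir_daily * np.sqrt(252)  # assumes 252 trading days, i.i.d. daily active returns

# Cross-check the last window against a direct calculation
last = active_ret.iloc[-win_63:]
direct = last.mean() / last.std()  # pandas std uses ddof=1 by default
```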

---
## DD_21 (21-Day Drawdown)

| Attribute | Value |
|-----------|-------|
| **Category** | Risk/Drawdown |
| **Dependencies** | `Adj Close` |
| **Used By** | — |

### Formula
$$\text{DD\_21}_t = \frac{\text{Adj Close}_t}{\text{RollingMax}_{21}(\text{Adj Close})_t} - 1$$

Where $\text{RollingMax}_{21}(\text{Adj Close})_t = \max(\text{Adj Close}_{t-20}, ..., \text{Adj Close}_t)$


### Pseudocode
```python
# Per ticker (groupby level=0); from audit_feature_engineering_integrity()
roll_max_21 = adj_close.groupby(level=0).transform(
    lambda x: x.rolling(win_21).max()  # highest close over the trailing win_21 bars
)
# Distance below the rolling high; warm-up NaNs are filled with 0.0
shadow_data[f"shadow_DD_{win_21}"] = (adj_close / roll_max_21 - 1).fillna(0.0)
```

### Implementation Notes
- Measures the decline from the trailing 21-day high to the current price.
- Range: 0 (at a new high) to -1 (complete wipeout within the window).
- 0 = price at the 21-day high; -0.10 = 10% below the 21-day high; -0.50 = 50% below.
- Short-term pain indicator that complements longer-term metrics such as max drawdown.
- Always ≤ 0 by construction, because the current price is part of the window.
- The rolling max is NaN for the first 20 observations per ticker (insufficient history); the audit code fills these with 0.0, so they read the same as a new high.
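
A small worked example on a made-up single-ticker price path shows both the warm-up fill and a 10% drawdown.

```python
import pandas as pd

win_21 = 21
# 21 rising days up to a peak of 120, followed by a drop to 108
prices = pd.Series([100.0 + i for i in range(21)] + [108.0])

roll_max = prices.rolling(win_21).max()
dd = (prices / roll_max - 1).fillna(0.0)
```

The first 20 rows have no complete window and are filled with 0.0. Row 20 sits at the 21-day high, so it is also 0.0. The final row is 108 / 120 - 1 = -0.10.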

---
## RollingStalePct (Rolling Stale Bar Percentage)

| Attribute | Value |
|-----------|-------|
| **Category** | Data Quality |
| **Dependencies** | `Volume`, `Adj High`, `Adj Low` |
| **Used By** | — |

### Formula
$$\text{StaleBar}_t = \mathbb{1}_{[\text{Volume}_t = 0 \lor \text{Adj High}_t = \text{Adj Low}_t]}$$

$$\text{RollingStalePct}_t = \frac{1}{q\_win} \sum_{i=0}^{q\_win-1} \text{StaleBar}_{t-i}$$


### Pseudocode
```python
# Per ticker (groupby level=0); from audit_feature_engineering_integrity()
# A bar is stale when it has zero volume or zero range (high == low)
stale_mask = ((volume == 0) | (adj_high == adj_low)).astype(int)

# Share of stale bars over the trailing q_win bars (at least q_min required)
shadow_data["shadow_RollingStalePct"] = stale_mask.groupby(level=0).transform(
    lambda x: x.rolling(q_win, min_periods=q_min).mean()
)
```

### Implementation Notes
- Detects suspicious bars: zero volume (no trade) or zero range (synthetic or placeholder data).
- Common in thinly traded stocks, OTC names, delisted names, or bad data feeds.
- Higher values mean lower data quality and potentially unreliable signals.
- Typical thresholds: above 5% stale calls for caution; above 20%, avoid for quantitative strategies.
- Uses `min_periods` to allow calculation with partial history (e.g., IPOs). With a partial window, the mean is taken over the bars available rather than over `q_win`.
- NaN if fewer than `q_min` observations are available.
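
To make the `min_periods` behaviour visible in a few rows, the sketch below shrinks the window to `q_win=4`, `q_min=2` (illustrative values, not the defaults) and uses made-up bars for a single ticker.

```python
import pandas as pd

q_win, q_min = 4, 2  # shrunk from 252/126 so the example fits in five rows
bars = pd.DataFrame(
    {
        "Volume": [1000, 0, 1200, 900, 800],
        "Adj High": [10.2, 10.0, 10.4, 10.1, 10.3],
        "Adj Low": [9.9, 10.0, 10.0, 10.1, 10.0],
    }
)

stale_mask = ((bars["Volume"] == 0) | (bars["Adj High"] == bars["Adj Low"])).astype(int)
stale_pct = stale_mask.rolling(q_win, min_periods=q_min).mean()
```

Rows 1 and 3 are stale (zero volume and zero range, respectively). The first row is NaN because only one observation is available, and the second row already reports 0.5 from a two-bar partial window.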

---
## RollMedDollarVol (Rolling Median Dollar Volume)

| Attribute | Value |
|-----------|-------|
| **Category** | Liquidity/Quality |
| **Dependencies** | `Adj Close`, `Volume` |
| **Used By** | — |

### Formula
$$\text{DollarVol}_t = \text{Adj Close}_t \times \text{Volume}_t$$

$$\text{RollMedDollarVol}_t = \text{Median}(\text{DollarVol}_{t-q\_win+1}, ..., \text{DollarVol}_t)$$


### Pseudocode
```python
# Per ticker (groupby level=0); from audit_feature_engineering_integrity()
dollar_vol = adj_close * volume  # daily traded value in dollars

# Median over the trailing q_win bars (at least q_min required)
shadow_data["shadow_RollMedDollarVol"] = dollar_vol.groupby(level=0).transform(
    lambda x: x.rolling(q_win, min_periods=q_min).median()
)
```

### Implementation Notes
- Measures liquidity: the typical daily trading value in dollars.
- Uses the median (not the mean) so that spike days (earnings, news events) have little influence.
- Higher means more liquid and easier to trade without market impact.
- Common thresholds: above $10M = institutional quality; below $1M = retail/illiquid.
- Critical for position sizing: the maximum position is limited by liquidity.
- NaN if fewer than `q_min` observations are available (e.g., recent IPOs).
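
The effect of choosing the median over the mean can be seen with a single spike day. The prices, volumes, and the shortened window (`q_win=5`, `q_min=3`) are made up for illustration.

```python
import pandas as pd

q_win, q_min = 5, 3  # shrunk from 252/126 for illustration
adj_close = pd.Series([10.0, 10.0, 10.0, 10.0, 10.0])
volume = pd.Series([1_000, 1_100, 900, 50_000, 1_000])  # one spike day

dollar_vol = adj_close * volume
roll_median = dollar_vol.rolling(q_win, min_periods=q_min).median()
roll_mean = dollar_vol.rolling(q_win, min_periods=q_min).mean()
```

Over the full five-day window the median stays at $10,000, while the mean is pulled up to $108,000 by the one spike.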

---  
## RollingSameVolCount (Rolling Same Volume Count)

| Attribute | Value |
|-----------|-------|
| **Category** | Data Quality |
| **Dependencies** | `Volume` |
| **Used By** | — |

### Formula
$$\text{SameVol}_t = \mathbb{1}_{[\text{Volume}_t = \text{Volume}_{t-1}]}$$

$$\text{RollingSameVolCount}_t = \sum_{i=0}^{q\_win-1} \text{SameVol}_{t-i}$$


### Pseudocode
```python
# Per ticker (groupby level=0); from audit_feature_engineering_integrity()
# Flag is 1 when today's volume equals yesterday's (difference of exactly 0)
same_vol = (volume.groupby(level=0).diff() == 0).astype(int)

# Count of repeated-volume days over the trailing q_win bars (at least q_min required)
shadow_data["shadow_RollingSameVolCount"] = same_vol.groupby(level=0).transform(
    lambda x: x.rolling(q_win, min_periods=q_min).sum()
)
```

### Implementation Notes
- Detects stale or synthetic volume data: repeated identical values suggest data filling or errors.
- Genuine volume changes from day to day; exact repeats are suspicious.
- Higher counts mean lower data quality and a potentially corrupted feed.
- Typical thresholds: more than 10 same-volume days per year calls for caution; more than 50 indicates a serious data quality issue.
- Often coincides with delisted, halted, or OTC securities.
- Uses `min_periods` to allow calculation with partial history.
- NaN if fewer than `q_min` observations are available.
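
A compact sketch on made-up volumes for a single ticker, again with a shortened window (`q_win=4`, `q_min=2`, illustrative only), shows how the count accumulates.

```python
import pandas as pd

q_win, q_min = 4, 2  # shrunk from 252/126 for illustration
volume = pd.Series([500, 500, 500, 620, 620, 700])

same_vol = (volume.diff() == 0).astype(int)  # first row: NaN == 0 is False, so 0
same_count = same_vol.rolling(q_win, min_periods=q_min).sum()
```

The flags are `[0, 1, 1, 0, 1, 0]`, so the rolling count reaches 3 on the fifth row, where three of the last four bars repeat the previous day's volume.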