### Price (Total Return)
**The "Scoreboard" (Absolute Growth)**

*   **Formula:**
    Calculated as the percentage change over the specific lookback period:
    $$\text{Gain} = \left( \frac{\text{Price}_{\text{Decision}}}{\text{Price}_{\text{Start}}} \right) - 1$$

*   **Interpretation:**
    *   **Positive (> 0):** The stock is objectively worth more now than at the start of the window.
    *   **Negative (< 0):** The stock has lost value.
    *   **Magnitude:** A +0.20 (20%) gain identifies a "Winner," but doesn't tell you how "bumpy" the ride was to get there.

*   **How the Alpha Engine Uses It:**
    *   **"Pure Momentum":** When the engine ranks by "Price," it is looking for the "Fastest Horses" in the race, ignoring volatility entirely. It assumes that "strength leads to more strength."
    *   **"Baseline Benchmarking":** It serves as the raw material for the Sharpe ratio; without Price gain, there is no "Reward" to justify the "Risk."

---

### Sharpe Ratio (Annualized)
**The "Risk Manager’s Ruler" (Efficiency & Consistency)**

*   **Formula:**
    Calculated using the mean and standard deviation of daily returns ($r$), with the risk-free rate taken as zero, scaled to a trading year:
    $$\text{Sharpe} = \frac{\text{Mean}(r)}{\text{StdDev}(r)} \times \sqrt{252}$$

*   **Interpretation (Risk-Adjusted Units):**
    *   **> 2.0 (Elite):** The stock is moving up in a "straight line." Very high return for very little volatility.
    *   **1.0 (Good):** A balanced investment where the reward justifies the fluctuations.
    *   **< 0.5 (Inefficient):** The "ride" is too wild for the amount of profit generated.

*   **How the Alpha Engine Uses It:**
    *   **"Finding the Steady Climber":** The engine uses Sharpe to filter out "Lucky Gambles." It prefers a stock that goes up 1% every week over a stock that jumps 20% in one day and crashes 10% the next.
    *   **"Quality Control":** In your **Sync'd Funnel**, Sharpe is the ultimate "Coarse Filter" to find high-quality candidates.
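
As a rough illustration of the formula above, here is a minimal sketch in plain Python. The function name `annualized_sharpe`, the use of the sample standard deviation, and the two made-up return series are assumptions for illustration, not the engine's actual code.

```python
import math
import statistics

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean / StdDev of daily returns, scaled by sqrt(periods_per_year)."""
    if len(daily_returns) < 2:
        return float("nan")
    sd = statistics.stdev(daily_returns)  # sample std (assumption: ddof=1)
    if sd == 0:
        return float("nan")  # a perfectly flat series has no defined ratio
    return statistics.mean(daily_returns) / sd * math.sqrt(periods_per_year)

# Hypothetical data: a steady climber vs. a "jump then crash" stock.
steady = [0.002, 0.001, 0.003, 0.002, 0.001, 0.003]
spiky = [0.20, -0.10, 0.0, 0.0, 0.0, 0.0]

steady_sharpe = annualized_sharpe(steady)
spiky_sharpe = annualized_sharpe(spiky)
```

The spiky series has the larger average return, yet the steady series scores far higher because its standard deviation is tiny.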

---

### 6. Sharpe (ATRP)
**The "Regulated Efficiency" Metric (Smoothed Risk)**

*   **Formula:**
    Uses **Average True Range (ATR)**, which applies **Wilder’s Smoothing** (an Exponential Moving Average) to the raw True Range.
    $$
    ATR_t = \alpha \times TR_t + (1 - \alpha) \times ATR_{t-1}
    $$ 
    *(Where $\alpha = 1/14$ in your global settings)*    
    $$
    ATRP = \frac{ATR}{\text{Adj Close}}
    $$
    $$
    \text{Sharpe(ATRP)} = \frac{\text{Mean}(\text{Daily Returns}, N)}{\text{Mean}(ATRP, N)}
    $$


*   **Interpretation (The "Warm Start" Requirement):**
    *   **Memory-Dependent:** ATRP has a "Long Memory." Today's value is influenced by data from weeks or months ago.
    *   **Low Sensitivity:** It filters out "noise." A single-day spike won't move the ATRP much; it requires a sustained increase in volatility to change the score.
    *   **Warm Start:** **This metric requires convergence.** Because it is an EWM, the math is "wrong" for the first ~50–100 days of a stock's life while the average "warms up."
    *   **High Value (> 0.5):** The stock is "drifting" upward with very little chaotic noise. This is a high-quality, stable trend that is easy to sit through.
    *   **Low Value (< 0.1):** Even if the stock is going up, it is doing so with massive "jagged" moves. The risk/stress of holding it outweighs the reward.
    
*   **How the Alpha Engine Uses It:**
    *   **"The Strategic Anchor":** This is your "Filter 1." It looks for stocks that have a high return relative to their *normalized, long-term* volatility regime. It prevents the engine from buying "flash-in-the-pan" volatility spikes.

*   **How the RL Agent Uses It:**
    *   **"Quality Control":** The Agent uses this as a filter to avoid "Spiky" stocks. It identifies stocks where the gains are consistent and the "Typical Daily Range" is small compared to the profit.
    *   **"Position Sizing":** Because ATRP is smoothed, the Agent can rely on this metric to predict the "Safety" of a trend over the next several days.
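
The three formulas above could be combined roughly as follows. This is a sketch under stated assumptions: the column names `High`, `Low`, and `Adj Close` are hypothetical (and assumed to be on the same adjustment basis), and the window `n=21` is an illustrative choice for $N$.

```python
import pandas as pd

def sharpe_atrp(df: pd.DataFrame, atr_period: int = 14, n: int = 21) -> pd.Series:
    """Mean(daily returns, n) / Mean(ATRP, n), with Wilder-smoothed ATR."""
    prev_close = df["Adj Close"].shift(1)
    true_range = pd.concat(
        [
            df["High"] - df["Low"],
            (df["High"] - prev_close).abs(),
            (df["Low"] - prev_close).abs(),
        ],
        axis=1,
    ).max(axis=1)  # row-wise max; the first bar falls back to High - Low
    # Wilder's smoothing: alpha = 1 / period, seeded by the first observation.
    atr = true_range.ewm(alpha=1 / atr_period, adjust=False).mean()
    atrp = atr / df["Adj Close"]
    daily_ret = df["Adj Close"].pct_change()
    return daily_ret.rolling(n).mean() / atrp.rolling(n).mean()

# Made-up data: a stock drifting up 1% per day inside a tight daily range.
close = pd.Series([100 * 1.01 ** i for i in range(40)])
demo = pd.DataFrame({"High": close * 1.005, "Low": close * 0.995, "Adj Close": close})
score = sharpe_atrp(demo)
```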

---

### 1. The Ratio Structure (The "Reward-to-Risk")
*   **Formula:** $\frac{\text{Average Returns}}{\text{Average ATRP}}$

### 5. Sharpe (TRP)
**The "Tactical Sniper" (Instant/Raw Risk-Adjusted Return, Memoryless Risk)**

*   **Formula:**
    Uses the **Raw True Range (TR)** without any smoothing. TR is the maximum of (High-Low, |High-PrevClose|, |Low-PrevClose|).
    Dividing TR by price gives the **TRP** (True Range Percent), which measures the "Instant Heat" or raw price movement of the current bar:
    $$TR = \max(\text{High} - \text{Low}, |\text{High} - \text{PrevClose}|, |\text{Low} - \text{PrevClose}|)$$
    $$TRP = \frac{TR}{\text{Price}}$$
    Then, we calculate the efficiency ratio using the raw daily range:
    $$\text{Sharpe(TRP)} = \frac{\text{Mean}(\text{Daily Returns}, N)}{\text{Mean}(TRP, N)}$$

*   **Interpretation:**
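    *   **Memoryless:** TRP is "Instant." Each bar's TRP depends only on that bar's range (plus the previous close); nothing is carried forward, so the only history inside Sharpe(TRP) is the $N$-bar averaging window.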
    *   **High Value:** The stock is moving up efficiently relative to its *maximum daily expansion*. It identifies "Tight" breakouts where the price moves up without requiring a massive, scary intra-day swing.
    *   **Low Value:** The stock is "Thrashing." It might be up for the day, but it had a massive High-to-Low range (TR) to get there, suggesting high intra-day uncertainty.

*   **How the RL Agent Uses It:**
    *   **"Conviction Filter":** Unlike ATRP (which is slow/smoothed), TRP is "Fast." The Agent uses Sharpe(TRP) to detect when a stock has suddenly become "Efficiently Explosive."
    *   **"The Funnel Sniper":** As we discussed in the 2-Filter method, the Agent can use Sharpe(ATRP) to find the "Healthy" stocks and then use Sharpe(TRP) to pick the one that is moving with the most "surgical" precision *right now*.  
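
To make the "Tight" versus "Thrashing" distinction concrete, here is a small sketch. The function name `sharpe_trp`, the window `n=5`, and the price data are assumptions for illustration; both hypothetical stocks share the same closes and differ only in their intraday range.

```python
import pandas as pd

def sharpe_trp(high: pd.Series, low: pd.Series, close: pd.Series, n: int = 5) -> pd.Series:
    """Mean(daily returns, n) / Mean(TRP, n), using the raw (unsmoothed) True Range."""
    prev_close = close.shift(1)
    true_range = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()],
        axis=1,
    ).max(axis=1)
    trp = true_range / close
    return close.pct_change().rolling(n).mean() / trp.rolling(n).mean()

# Same closing path, different intraday ranges (made-up numbers).
close = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0])
tight = sharpe_trp(close + 0.5, close - 0.5, close)
thrashing = sharpe_trp(close + 4.0, close - 4.0, close)
```

Both series gain the same amount per day, but the wide-range stock scores much lower because every bar needed a large swing to get there.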

---

### Momentum nD (Rate of Change)
**The "Trend Surfer" (Velocity & Conviction)**

*   **Formula:**
    Measures the velocity of price change over $n$ days (where $n$ is 1, 3, 5, 10, or 21):
    $$\text{ROC}_n = \left( \frac{\text{Price}_t}{\text{Price}_{t-n}} \right) - 1$$

*   **Interpretation:**
    *   **High Positive:** The stock is in a "Vertical Breakout." Buying here is a play on continued trend following.
    *   **Near Zero:** The stock is "Consolidating" or "Dead Money." 
    *   **High Negative:** The stock is in a "Freefall."

*   **How the Alpha Engine Uses It:**
    *   **"Velocity Ranking":** The engine uses ROC to find stocks that are currently "In Play." 
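    *   **"Convergence":** If `ROC_5` (Weekly) is higher than `ROC_21` (Monthly), the last five days gained more than the whole month did, which means the price fell on net over the first 16 days of the window and has since turned up. The engine reads this as the stock **accelerating**, which it treats as a high-conviction buy signal.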
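
A minimal sketch of the ROC calculation and the `ROC_5` versus `ROC_21` comparison follows. The helper name `roc` and the price path are made up for illustration.

```python
def roc(prices, n):
    """Rate of change over n bars: P_t / P_(t-n) - 1, with P_t the last price."""
    if len(prices) <= n:
        raise ValueError("need at least n + 1 prices")
    return prices[-1] / prices[-1 - n] - 1

# Made-up path: drifts from 102 down to 100 over 16 bars, then rallies to 105.
prices = [102 - 0.125 * i for i in range(17)] + [101.0, 102.0, 103.0, 104.0, 105.0]

roc_5 = roc(prices, 5)    # 105 / 100 - 1
roc_21 = roc(prices, 21)  # 105 / 102 - 1
accelerating = roc_5 > roc_21
```

Here the weekly gain (about 5%) exceeds the monthly gain (about 2.9%) because the month started with a drift lower.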

---

### Pullback nD (Mean Reversion)
**The "Coiled Spring" (Buying the Dip)**

*   **Formula:**
    The negation of Momentum (same quantity, opposite sign). It identifies the magnitude of a recent drop:
    $$\text{Pullback}_n = -\left[ \left( \frac{\text{Price}_t}{\text{Price}_{t-n}} \right) - 1 \right]$$

*   **Interpretation:**
    *   **Positive Value:** Represents a "Discount." A Pullback of +0.10 means the stock has dropped 10% recently.
    *   **Negative Value:** The stock is actually moving up (not pulling back).

*   **How the Alpha Engine Uses It:**
    *   **"Buying the Blood":** The engine uses this to find "High Quality" stocks (from Filter 1) that are currently experiencing a "Temporary Sale" (Filter 2).
    *   **"Mean Reversion":** The engine assumes that if a strong stock pulls back significantly, it is "Oversold" and likely to "Snap Back" to its previous trend. It looks for the highest Pullback score to find the biggest "Spring Load."

---

### RSI (Relative Strength Index)
**The "Speedometer" (Internal Momentum)**

*   **Formula:**
    $$RSI = 100 - \frac{100}{1 + RS}$$
    Where $RS = \frac{\text{Average Gain over } N \text{ days}}{\text{Average Loss over } N \text{ days}}$.
    *(Both averages use Wilder’s Smoothing with $\alpha = 1/N$, so older days fade gradually instead of dropping out of a fixed window.)*

*   **Interpretation (0 to 100):**
    *   **> 70 (Overbought):** The price has moved up too fast, too soon. The "engine is redlining." It might need to cool down (pull back).
    *   **< 30 (Oversold):** The price has dropped too aggressively. It might be due for a bounce (mean reversion).
    *   **50 (Neutral):** Buyers and Sellers are balanced.

*   **How the RL Agent Uses It:**
    *   **"Don't Chase":** If RSI is 85, the Agent can learn that buying now tends to lose money, because a stock this extended often pulls back.
    *   **"Buy the Dip":** If RSI drops to 35 while the trend is up, the agent sees a discount opportunity.
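
One way the RSI formula could be written with pandas is sketched below. The function name `wilder_rsi` and the test series are illustrative assumptions; the smoothing follows the `ewm(alpha=1/N, adjust=False)` convention, which seeds the average with the first valid observation.

```python
import pandas as pd

def wilder_rsi(close: pd.Series, n: int = 14) -> pd.Series:
    """RSI = 100 - 100 / (1 + RS), with Wilder-smoothed average gain and loss."""
    delta = close.diff()
    gain = delta.clip(lower=0)       # up moves, zero otherwise
    loss = (-delta).clip(lower=0)    # down moves as positive numbers
    avg_gain = gain.ewm(alpha=1 / n, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / n, adjust=False).mean()
    rs = avg_gain / avg_loss         # becomes inf when there are no losses
    return 100 - 100 / (1 + rs)

# Two extreme, made-up cases: every day up, and every day down.
rsi_up = wilder_rsi(pd.Series([100.0 + i for i in range(30)]))
rsi_down = wilder_rsi(pd.Series([130.0 - i for i in range(30)]))
```

A series that only rises pins RSI at 100, and one that only falls pins it at 0; real prices land in between.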

---

### Alpha RelStrength (Relative Strength vs. Benchmark)
**The "Race Position" (Comparative Performance)**

*   **Formula:**
    First, we create a Ratio Line:
    $$\text{Ratio} = \frac{\text{Stock Price}}{\text{S\&P 500 Price (SPY)}}$$
    Then, we measure the momentum of that ratio:
    $$\text{RelStrength} = \text{Percentage Change of Ratio (21 Days)}$$

*   **Interpretation:**
    *   **Positive (+):** The stock is **outperforming** the market.
        *   *Example:* Stock is up 5%, SPY is up 1%. This is a Leader.
        *   *Example:* Stock is flat (0%), SPY is down 2%. This is "Relative Strength" (showing resilience).
    *   **Negative (-):** The stock is **underperforming** (Lagging).
        *   *Example:* Stock is up 1%, SPY is up 3%. This is actually "weakness" disguised as a gain.

*   **How the RL Agent Uses It:**
    *   **"Pick Winners":** In a bull market, buying everything works. In a mixed market, the Agent learns to only buy stocks with positive `RelStrength`.
    *   **"Avoid Traps":** If a stock looks cheap but `RelStrength` is falling, the Agent learns it's likely a "value trap" (a dying company).
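
The ratio-line calculation can be sketched in a few lines of pandas. The names `rel_strength` and `ramp` and the price paths are hypothetical; the two demo cases mirror the "Leader" and "disguised weakness" examples above.

```python
import pandas as pd

def rel_strength(stock: pd.Series, spy: pd.Series, n: int = 21) -> pd.Series:
    """n-day percentage change of the stock / SPY ratio line."""
    ratio = stock / spy
    return ratio.pct_change(n)

def ramp(start: float, end: float, n: int = 22) -> pd.Series:
    """Straight-line price path from start to end (illustration only)."""
    return pd.Series([start + (end - start) * i / (n - 1) for i in range(n)])

leader = rel_strength(ramp(100, 105), ramp(400, 404))   # stock +5%, SPY +1%
laggard = rel_strength(ramp(100, 101), ramp(400, 412))  # stock +1%, SPY +3%
```

The leader's score is 1.05 / 1.01 - 1, roughly +4%, while the laggard is negative even though its own price rose.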

---

### OBV_Score (On-Balance Volume Z-Score)
**The "X-Ray Machine" (Hidden Accumulation/Distribution)**

*   **Formula:**
    First, we calculate Cumulative OBV:
    $$OBV_{cum} = \sum (\text{Volume} \times \text{Direction})$$
    Then, we Normalize it (Z-Score) to make it readable for the AI:
    $$OBV\_Score = \frac{OBV_{cum} - \text{Mean}(OBV_{cum}, 21)}{\text{StdDev}(OBV_{cum}, 21)}$$

*   **Interpretation (Standard Deviations):**
    *   **> +2.0 (Accumulation):** Buyers are aggressively accumulating shares, even if the price hasn't moved up yet. This is a "Coiled Spring."
    *   **< -2.0 (Distribution):** Sellers are dumping shares, even if the price is holding up. This is a "Hidden Weakness."
    *   **0.0 (Neutral):** Cumulative OBV sits at its 21-day average, so volume flow shows no unusual buying or selling pressure.

*   **How the RL Agent Uses It:**
    *   **"Spotting Divergence":** If Price is Flat, but OBV_Score shoots up to +3.0, the Agent sees "Smart Money" buying and enters the trade *before* the price breakout.
    *   **"Early Exit":** If Price is hitting new highs, but OBV_Score is dropping to -1.0, the Agent detects "Exhaustion" and sells before the crash.
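
A possible pandas sketch of the two-step OBV calculation is shown here. The function name `obv_score`, the treatment of unchanged closes as direction 0, and the demo data are assumptions; note that a perfectly flat OBV window has zero standard deviation, so the score is undefined there.

```python
import numpy as np
import pandas as pd

def obv_score(close: pd.Series, volume: pd.Series, n: int = 21) -> pd.Series:
    """Rolling z-score of cumulative On-Balance Volume."""
    direction = np.sign(close.diff()).fillna(0)  # +1 up, -1 down, 0 unchanged
    obv = (volume * direction).cumsum()
    return (obv - obv.rolling(n).mean()) / obv.rolling(n).std()

# Made-up data: 25 choppy days on normal volume, then 4 up days on 5x volume.
steps = [0] + [1 if i % 2 else -1 for i in range(1, 26)] + [1] * 4
close = pd.Series(steps, dtype=float).cumsum() + 100
volume = pd.Series([1000.0] * 26 + [5000.0] * 4)
score = obv_score(close, volume)
```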

---

### \$V Inc (Cur\$Vol / Avg\$Vol(21))
**The "Conviction Meter" (Ticker Relative Volume)**

*   **Formula:**
    $$RVol = \frac{\text{Current Dollar Volume}}{\text{Average Dollar Volume (21 Days)}}$$
    *(We use Dollar Volume, calculated as Price $\times$ Shares, to measure true liquidity impact).*

*   **Interpretation (Ratio):**
    *   **1.0 (Normal):** Trading activity is exactly average.
    *   **> 2.0 (High Conviction):** Institutional money is active. The stock is trading 2x its normal size. A price move accompanied by this volume is likely "Real."
    *   **< 0.5 (Low Interest):** The market is ignoring this stock. Price moves here are often "Noise" or retail drifting.

*   **How the RL Agent Uses It:**
    *   **"Filter Fakes":** If the Price jumps +5% but RVol is only 0.4, the Agent learns this is likely a "bull trap" and refuses to buy.
    *   **"Ride the Wave":** If Price breaks out and RVol spikes to 3.5, the Agent learns that big buyers are committed, increasing the probability of a sustained trend.
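
Relative dollar volume could be computed along these lines. The function name is hypothetical, and one detail is an assumption: the 21-day average here excludes the current day (via `shift(1)`), which the formula above does not specify.

```python
import pandas as pd

def relative_dollar_volume(close: pd.Series, volume: pd.Series, n: int = 21) -> pd.Series:
    """Current dollar volume divided by the average of the previous n days."""
    dollar_vol = close * volume
    baseline = dollar_vol.rolling(n).mean().shift(1)  # assumption: exclude today
    return dollar_vol / baseline

# Made-up data: 21 ordinary days, then one day on double the share volume.
close = pd.Series([50.0] * 22)
volume = pd.Series([1_000_000.0] * 21 + [2_000_000.0])
rvol = relative_dollar_volume(close, volume)
```

With the current day excluded from the baseline, a day that trades twice the usual size scores exactly 2.0; including it in the average would pull the reading slightly below 2.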

---

### Vol Inc (5dStDev / 21dStDev) 
**The "Weather Forecast" (VolRegime: Volatility Regime)**

*   **Formula:**
    $$\text{VolRegime} = \frac{\text{Short Term Volatility (5-Day StdDev)}}{\text{Long Term Volatility (21-Day StdDev)}}$$

*   **Interpretation:**
    *   **< 1.0 (Compression / Squeeze):** The price range is getting tighter and tighter. It is "coiling up" like a spring. This is the **Calm before the Storm**.
    *   **> 1.0 (Expansion / Breakout):** The spring has sprung. Price is moving violently. The "Storm" has started.
    *   **> 2.0 (Climax):** Volatility is extreme. The move might be exhausted.

*   **How the RL Agent Uses It:**
    *   **"Timing the Entry":** The most explosive trades happen right as `VolRegime` transitions from < 1.0 to > 1.0 (The Breakout).
    *   **"Risk Management":** If `VolRegime` is very high (e.g., 3.0), the Agent learns that the environment is dangerous and unpredictable, prompting it to reduce position size or sell.
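
A short sketch of the ratio is given below. It assumes the two standard deviations are taken over daily percentage returns, which the formula above does not state explicitly; the function name and the data are made up.

```python
import pandas as pd

def vol_regime(close: pd.Series, short: int = 5, long: int = 21) -> pd.Series:
    """Short-window return volatility divided by long-window return volatility."""
    ret = close.pct_change()
    return ret.rolling(short).std() / ret.rolling(long).std()

# Made-up path: 25 quiet days (+/-0.1%), then 5 violent days (+/-3%).
rets = [0.001 if i % 2 else -0.001 for i in range(25)]
rets += [0.03 if i % 2 else -0.03 for i in range(5)]
close = 100 * (1 + pd.Series(rets)).cumprod()
regime = vol_regime(close)
```

On the last bar the 5-day window contains only violent days while the 21-day window is still mostly quiet, so the ratio reads well above 1.0 (Expansion).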

---


### Consistency_5d (5-Day Win-Rate Score)  
**The "Steady Eddie" Meter**  

*   **Formula:**  
    $$
    \text{Consistency\_5d} = \frac{1}{5}\sum_{i=t-4}^{t}\mathbb{1}_{r_i>0}
    $$  

*   **Interpretation (Hit Ratio):**  
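    *   **≥ 0.80:** Stock has closed higher on at least 4 of 5 days → serially positive, low noise.  
    *   **≤ 0.20:** At least 4 of 5 closes were down → persistent offer, avoid longs.  
    *   **0.40–0.60:** Coin-toss action, no edge (with five days the score moves in steps of 0.20, so it can never equal 0.50 exactly).  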

*   **How the RL Agent Uses It:**  
    *   **"Filter First":** Only allows entry if Consistency_5d > 0.70 (in practice, at least 4 up closes in 5) → eliminates lottery-ticket pumps.  
    *   **"Exit on Decay":** Holds while win-rate stays > 0.60 (4 or 5 up closes); drops position immediately if win-rate falls below 0.40 (1 or 0 up closes), a sign of trend fatigue.  
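
The hit ratio is simple enough to sketch in plain Python. The function name `consistency` and the sample returns are illustrative.

```python
def consistency(returns, window=5):
    """Fraction of the last `window` daily returns that were strictly positive."""
    recent = returns[-window:]
    return sum(1 for r in recent if r > 0) / window

# Made-up week: four up closes and one down close.
week = [0.01, -0.02, 0.01, 0.03, 0.002]
score = consistency(week)

# With a 5-day window the score can only take these six values.
possible_scores = [k / 5 for k in range(6)]
```

Because the score moves in steps of 0.20, a threshold such as 0.70 behaves the same as "at least 4 of 5".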

---

### IR_63d (63-Day Information Ratio vs SPY)  
**The "Persistent Alpha" Gauge**  

*   **Formula:**  
    $$
    \text{IR\_63d} = \frac{\bar{r}_{\text{active}}} {\sigma_{\text{active}}}, \quad \text{active}_i = r_i - r_{\text{SPY},i}
    $$  

*   **Interpretation (Active Return per Unit of Tracking Error):**  
    *   **> +0.50:** Stock earns half a unit of average active return for every unit of tracking error (the ratio is dimensionless, not basis points) → quality out-performer.  
    *   **< -0.50:** Chronic under-performer, even on risk-adjusted basis.  
    *   **≈ 0.00:** Index clone, no timing value.  

*   **How the RL Agent Uses It:**  
    *   **"Core Long List":** Only considers names with IR_63d > 0.30 for long inventory.  
    *   **"Short Candidate":** Automatically flags IR_63d < -0.30 for potential short if regime flips bearish.  
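
A minimal sketch of the ratio, assuming the caller passes the most recent 63 daily returns for the stock and for SPY. The function name, the sample standard deviation, and the short demo series are assumptions for illustration.

```python
import statistics

def information_ratio(stock_returns, bench_returns):
    """Mean active return divided by the std of active returns (tracking error)."""
    active = [s - b for s, b in zip(stock_returns, bench_returns)]
    if len(active) < 2:
        return float("nan")
    tracking_error = statistics.stdev(active)  # sample std (assumption)
    if tracking_error == 0:
        return float("nan")
    return statistics.mean(active) / tracking_error

# Made-up returns: the stock beats the benchmark a little every day.
bench = [0.001, -0.002, 0.003, 0.000]
stock = [0.003, -0.001, 0.004, 0.002]
ir = information_ratio(stock, bench)

# The mirror-image case: the benchmark's IR against the stock is negative.
ir_reversed = information_ratio(bench, stock)
```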

---

### max_DD_21d (21-Day Maximum Drawdown)  
**The "Tail-Risk Thermometer"**  

*   **Formula:**  
    $$
    \text{max\_DD\_21d} = \min_{t-20\le j \le t}\left( \frac{\text{Close}_j - \max_{t-20\le k \le j}\text{Close}_k} {\max_{t-20\le k \le j}\text{Close}_k} \right)
    $$  

*   **Interpretation (Peak-to-Trough Pain):**  
    *   **> -5 %:** Tight risk control, shallow pull-backs.  
    *   **< -15 %:** Deep retracement, high tail risk.  
    *   **≈ -10 %:** Normal equity noise.  

*   **How the RL Agent Uses It:**  
    *   **"Position Size Limiter":** Scales gross exposure ∝ (1 + max_DD_21d / 0.15); deeper DD → smaller line.  
    *   **"Stop-Gate":** Halts new entries if max_DD_21d < -12 % until DD recovers above -8 %.  
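
The drawdown formula and the "Position Size Limiter" scaling can be sketched as follows. The function name, the clamp at zero, and the closing prices are assumptions for illustration.

```python
def max_drawdown(closes):
    """Worst peak-to-trough decline in the window, as a negative fraction."""
    peak = closes[0]
    worst = 0.0
    for close in closes:
        peak = max(peak, close)                 # running maximum so far
        worst = min(worst, close / peak - 1)    # drawdown from that peak
    return worst

# Made-up closes: a peak at 110 followed by a trough at 99.
closes = [100.0, 110.0, 99.0, 105.0, 120.0]
dd = max_drawdown(closes)  # 99 / 110 - 1

# Exposure scalar from the text, floored at zero (the floor is an assumption).
size_scalar = max(0.0, 1 + dd / 0.15)
```

A 10% drawdown cuts the scalar to about one third, and anything at or beyond 15% takes it to zero.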

---

### beta_SPY (63-Day Beta to S&P 500)  
**The "Macro Gear" Sensor**  

*   **Formula:**  
    $$
    \beta_{\text{SPY}} = \frac{\text{Cov}(r_{\text{stock}}, r_{\text{SPY}})} {\text{Var}(r_{\text{SPY}})} \quad \text{(63-day rolling)}
    $$  

*   **Interpretation (Market Sensitivity):**  
    *   **> 1.3:** High-beta rocket; amplifies index moves.  
    *   **< 0.70:** Low-beta/defensive; decorrelates during sell-offs.  
    *   **≈ 1.00:** Index-like behavior.  

*   **How the RL Agent Uses It:**  
    *   **"Regime Tilt":** Raises gross exposure when beta_SPY < 0.8 in risk-off macro (defensives win).  
    *   **"Hedge Sizing":** Uses β to size a beta-neutral hedge in SPY or S&P 500 futures (hedge notional ≈ β × position notional).  
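
Here is a small sketch of the beta formula, with a beta-weighted hedge as one plausible reading of "Hedge Sizing". The function name, the return series, and the 10,000 position size are made up for illustration.

```python
def beta(stock_returns, bench_returns):
    """Cov(stock, bench) / Var(bench) over the supplied window."""
    n = len(bench_returns)
    mean_s = sum(stock_returns) / n
    mean_b = sum(bench_returns) / n
    cov = sum((s - mean_s) * (b - mean_b) for s, b in zip(stock_returns, bench_returns))
    var = sum((b - mean_b) ** 2 for b in bench_returns)
    return cov / var  # the 1/(n-1) factors cancel

# Made-up returns: the stock moves exactly 1.5x the benchmark each day.
bench = [0.01, -0.02, 0.005, 0.015]
stock = [1.5 * b for b in bench]
b = beta(stock, bench)

position_notional = 10_000.0
hedge_notional = -b * position_notional  # short benchmark exposure to offset beta
```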

---

### OBV_Sharpe_21d (OBV Risk-Adjusted Momentum)  
**The "Volume-Efficiency" Score**  

*   **Formula:**  
    $$
    \text{OBV\_Sharpe\_21d} = \frac{\text{Mean}(\text{OBV\_ret}, 21)} {\text{StdDev}(\text{OBV\_ret}, 21)}, \quad \text{OBV\_ret}_t = \frac{\text{OBV}_t - \text{OBV}_{t-1}} {\text{OBV}_{t-1}}
    $$  

*   **Interpretation (Volume Trend per Unit Risk):**  
    *   **> +1.0:** Rising volume stream with low volatility → institutional accumulation.  
    *   **< -1.0:** Falling volume stream with high volatility → distribution or churn.  
    *   **≈ 0.0:** Random volume noise.  

*   **How the RL Agent Uses It:**  
    *   **"Pre-Breakout Entry":** Buys when OBV_Sharpe_21d crosses above +1.0 while price is still range-bound (volume leading price).  
    *   **"Failure Confirmation":** Sells when OBV_Sharpe_21d drops below -0.5 even if price looks stable (volume disagreeing with price).


Quants treat **Sharpe(ATRP)**—return-per-unit-of-smoothed-vol—as a *slow-moving regime thermometer*, not a touch-off trigger.  Typical workflow:

1. **Universe pre-filter (weekly rebalance)**  
   - Rank all liquid names by Sharpe(ATRP) using 60- to 120-day mean return ÷ 14-day Wilder ATRP.  
   - Keep only the top *k* deciles (or z-score > 0.5).  
   *Goal*: discard stocks that are “cheap for a reason” (high realised return but even higher *sustained* volatility) and keep the ones whose vol has stayed *quiet* while price has drifted up.

2. **Regime bucketing**  
   - Break the history of Sharpe(ATRP) into quintiles.  
   - When a ticker *crosses* from Q3 → Q4/Q5 for **four consecutive weeks**, tag it “vol-adjusted momentum”.  
   - Reverse crossing (Q4 → Q2) flags *vol-expansion-without-return* → avoid or go flat.

3. **Position sizing**  
   - Target vol of portfolio (e.g. 8 % ann.).  
   - Use **inverse of current ATRP** as the *volatility scalar*; because ATRP is already smoothed, position changes are *gradual*—no whipsaw on one-day gap.  
   - Sharpe(ATRP) enters as a *multiplier*:  
     `size_i ∝ (Sharpe(ATRP)_i – floor) / Σ(…)`  
     so only positive-regime names get full risk budget.

4. **Stress / overlay**  
   - When **median Sharpe(ATRP)** of the whole universe drops below zero for **>20 trading days**, book reduces gross exposure (e.g. from 100 % to 60 %).  
   - This caught Feb-2018, Sep-2020, Jan-2022 vol explosions in many prop models because the *smoothed* ratio turned *before* 30-day realised vol exploded.

5. **Option world**  
   - Vol-arb desks go **long gamma** on names whose *recent* Sharpe(ATRP) has **fallen** two deciles (return flat, vol quiet = cheap options).  
   - **Short gamma** on names whose Sharpe(ATRP) has **risen** two deciles (return strong but vol now too sleepy → expect vol sellers to be over-positioned).

6. **Futures / asset allocation**  
   - Compute Sharpe(ATRP) for equity index, bond futures, gold, etc.  
   - Feed into **risk-parity or momentum overlay**: increase weight to asset whose Sharpe(ATRP) > 1-yr median and *trending up* for 40 days; decrease if below median and falling.

Key takeaway: because the metric moves **slowly**, quants use it to **tilt, scale, or gate**, not to time exact entry/exit.  It answers “*should I be there at all?*” and “*how much risk once I am?*” rather than “*do I buy tomorrow’s open?*”
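
The sizing multiplier from step 3, `size_i ∝ (Sharpe(ATRP)_i – floor) / Σ(…)`, might look like this in plain Python. The function name, the tickers, and the decision to clip scores below the floor to zero are assumptions; the clip is one reading of "only positive-regime names get full risk budget".

```python
def regime_weights(sharpe_atrp, floor=0.0):
    """Weights proportional to (score - floor), with names below the floor set to 0."""
    excess = {ticker: max(score - floor, 0.0) for ticker, score in sharpe_atrp.items()}
    total = sum(excess.values())
    if total == 0:
        return {ticker: 0.0 for ticker in excess}  # nothing qualifies
    return {ticker: value / total for ticker, value in excess.items()}

# Hypothetical scores for three tickers.
scores = {"AAA": 0.6, "BBB": 0.2, "CCC": -0.1}
weights = regime_weights(scores)
```

In a fuller version these weights would then be multiplied by the inverse-ATRP volatility scalar described in the same step.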


#### 1. The Pandas Method (The "Honest" Shock)
Pandas `ewm(adjust=False)` treats the first valid observation as the **Seed**.
*   $AverageGain = 0.197$ (100% of the movement)
*   $AverageLoss = 0.0$
*   $RS = \frac{0.197}{0} = \infty$
*   **RSI = 100**
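
The seeding behaviour can be reproduced directly. This sketch assumes the first observed move is a gain of 0.197 with no loss, matching the numbers above, followed by two flat days; the period of 14 is also an assumption.

```python
import pandas as pd

# First bar: a gain of 0.197 and no loss; then two flat days.
gain = pd.Series([0.197, 0.0, 0.0])
loss = pd.Series([0.0, 0.0, 0.0])

# adjust=False seeds the recursion with the first observation itself.
avg_gain = gain.ewm(alpha=1 / 14, adjust=False).mean()
avg_loss = loss.ewm(alpha=1 / 14, adjust=False).mean()

rs = avg_gain / avg_loss       # 0.197 / 0 -> inf
rsi = 100 - 100 / (1 + rs)     # 100 / inf -> 0, so RSI = 100
```

The average gain then decays by a factor of 13/14 per day, but RSI stays at 100 until the first loss arrives.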

This is the **Google Deepmind Research Team**.

Excellent question. To train a successful RL Agent, you must understand what "senses" you are giving it. If the agent doesn't understand the input, it cannot learn the output.

Here is the breakdown of the three new "Sensors" we just installed in your engine.

---






### Summary of the "Agent's Brain"

With these three features, your Agent can now "think" like a Senior Trader:

1.  **Is this stock beating the market?** (Checking `RelStrength`)
2.  **Is it coiled and ready to move?** (Checking `VolRegime`)
3.  **Is the price currently too expensive/overextended?** (Checking `RSI`)

This is a massive upgrade from just knowing "did the price go up yesterday."


Here is the breakdown of these two critical concepts with concrete mathematical examples.

---

### 2. Wilder's Smoothing ("The Memory")
**The Concept:**
A standard Simple Moving Average (SMA) has **Amnesia**. If you use a 14-day SMA, on Day 15, the data from Day 1 is *completely deleted*.
**Wilder's Smoothing** (used in RSI and ATR) has **Long-Term Memory**. It works like a radioactive half-life. Old data decays, but it never completely disappears.

**The Formula:**
$$ \text{Current Value} = \frac{\text{Current Data} + (\text{Previous Value} \times (N-1))}{N} $$

*   *Note:* This means today counts for $1$ part, and history counts for $N-1$ parts.

**Example: The "Crash" Memory**
Let's use an **ATR (Volatility)** example with **N=14**.
Assume the stock usually moves \$1 a day. So the ATR is **\$1.00**.

*   **Day 1 (The Crash):** The stock crashes \$15.
    *   True Range (Input): 15.0
    *   New ATR: $\frac{15 + (1.0 \times 13)}{14} = \frac{28}{14} = \mathbf{2.0}$
    *   *Result:* The ATR doubles. The system is now on "High Alert."

*   **Day 2 (Quiet Day):** The market goes dead silent. Moves only \$1.
    *   **If we used SMA:** It would just average the numbers.
    *   **Using Wilder's:**
        $$ \text{New ATR} = \frac{1 (\text{Today}) + (2.0 \times 13)}{14} = \frac{27}{14} \approx \mathbf{1.93} $$
    *   *Result:* Even though today was quiet (\$1), the ATR stays very high (\$1.93). **It remembers the crash from yesterday.**

*   **Day 15 (Two Weeks Later):**
    *   **If we used SMA:** The \$15 crash falls out of the window. The SMA drops instantly back to 1.0. The system "forgets" the risk.
    *   **Using Wilder's:** That \$15 crash is still mathematically present, just diluted. If every day since has moved \$1, the ATR is still about \$1.35 (the excess of 1.0 above baseline shrinks by 13/14 each day, and $(13/14)^{14} \approx 0.35$), telling the RL Agent: *"Be careful, this stock has a history of violence."*

**Why this matters for your Code:**
By using `ewm(alpha=1/period, adjust=False)` in your code, you enabled this "infinite memory." This prevents your RL agent from getting blindsided by a stock that looks calm today but exploded three weeks ago.
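
The crash example can be replayed step by step with the formula above. This sketch assumes every day after the crash has a true range of exactly 1; the helper name `wilder_update` is made up. The same recursion is what `ewm(alpha=1/N, adjust=False)` computes, since $(x + 13y)/14 = \frac{1}{14}x + \frac{13}{14}y$.

```python
def wilder_update(previous, value, n=14):
    """One step of Wilder's smoothing: today counts 1 part, history n - 1 parts."""
    return (value + previous * (n - 1)) / n

atr = 1.0                          # baseline before the crash
atr = wilder_update(atr, 15.0)     # Day 1: the crash
path = [atr]
for _ in range(14):                # Days 2 through 15: quiet, true range = 1
    atr = wilder_update(atr, 1.0)
    path.append(atr)

day_1, day_2, day_15 = path[0], path[1], path[-1]
```

Under these assumptions the ATR goes 2.0 on Day 1, about 1.93 on Day 2, and is still about 1.35 on Day 15, whereas a 14-day SMA would already be back at 1.0.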

---


### 2. Spy_RVol (Benchmark Relative Volume)
**The "Market Heartbeat" (Global Environment State)**

*   **Formula:**
    $$Spy\_RVol = \frac{\text{SPY Current Dollar Volume}}{\text{SPY Average Dollar Volume (21 Days)}}$$
    *(This value is broadcast to every stock. All stocks see the same Spy_RVol for a given day).*

*   **Interpretation (Ratio):**
    *   **1.0 (Status Quo):** The broader market is behaving normally.
    *   **> 2.5 (Panic / Euphoria):** Extreme participation. The entire herd is moving. This usually happens during market crashes or massive rallies.
    *   **< 0.6 (Holiday Mode):** Low participation. Smart money is on the sidelines (e.g., day before a holiday).

*   **How the RL Agent Uses It:**
    *   **"Risk On / Risk Off":** The Agent learns that patterns working in a quiet market (Spy_RVol $\approx$ 1.0) might fail spectacularly during a panic (Spy_RVol > 3.0).
    *   **"Ignore Noise":** If Spy_RVol is very low, the Agent learns that signals are weak and it might be better to sit in cash.

---




### 4. SPY_OBV_Score (Benchmark On-Balance Volume)
**The "Tide" (Market Directional Pressure)**

*   **Formula:**
    Calculated exactly like the Ticker OBV, but using the Benchmark (SPY) data, then broadcast to all stocks:
    $$SPY\_OBV\_Score = \frac{OBV_{SPY} - \text{Mean}(OBV_{SPY}, 21)}{\text{StdDev}(OBV_{SPY}, 21)}$$
    *(This tells us if the broad market is under accumulation or distribution, regardless of what the individual stock is doing.)*

*   **Interpretation (Standard Deviations):**
    *   **> +2.0 (Market Accumulation):** The tide is coming in. Broad market buying pressure is high. *Most* stocks will float up easily.
    *   **< -2.0 (Market Distribution):** The tide is going out. Institutional investors are selling the index. *Most* stocks will sink.
    *   **0.0 (Neutral):** Buying and selling pressure in the broad market are balanced.

*   **How the RL Agent Uses It:**
    *   **"Go with the Flow":** If the Stock's `OBV_Score` is positive AND `SPY_OBV_Score` is positive, the Agent recognizes a **High Probability Trade**. It is easier to swim with the current.
    *   **"Swim Upstream":** If the Stock's `OBV_Score` is positive (Buy signal) but `SPY_OBV_Score` is -2.5 (Market Crash), the Agent learns to **Hesitate or Reduce Size**. It knows the stock is fighting a massive headwind.


---

$$\text{Batting Average} = \frac{\sum (\text{Returns}_{\text{Ticker}} > \text{Returns}_{\text{Benchmark}})}{\text{Total Periods}}$$
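
Read as a count of periods in which the ticker beat the benchmark, the formula could be sketched like this. The function name and the sample returns are illustrative, and ties are counted as losses because the comparison is strict.

```python
def batting_average(ticker_returns, benchmark_returns):
    """Share of periods in which the ticker's return exceeded the benchmark's."""
    pairs = list(zip(ticker_returns, benchmark_returns))
    if not pairs:
        return float("nan")
    wins = sum(1 for t, b in pairs if t > b)
    return wins / len(pairs)

# Made-up returns: the ticker wins periods 1 and 4, loses 2, and ties 3.
ticker = [0.02, -0.01, 0.00, 0.03]
benchmark = [0.01, 0.00, 0.00, 0.01]
avg = batting_average(ticker, benchmark)
```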