# Data Wrangling and Feature Engineering for RL Environment
This notebook preprocesses and engineers features from raw trading data to prepare it for Soft Actor-Critic training.


### Intraday ETF Data Processing and Feature Engineering

This cell handles the complete data preparation pipeline for intraday ETF data before modeling.

Steps performed:

1. Data download  
   - Fetches 2-minute interval OHLCV data from Yahoo Finance for the following ETFs:  
     `SPY`, `QQQ`, `IWM`, `TLT`, `GLD`, `XLE`, `XLF`, `EEM`, `HYG`, `DBC`.  
   - Period: from `2025-08-19` to `2025-10-09` (the `end` date passed to `yf.download` is exclusive).  
   - Data is auto-adjusted for splits and dividends.

2. Data cleaning  
   - Restricts to standard fields: `Open`, `High`, `Low`, `Close`, `Volume`.  
   - Ensures timestamps are localized to `America/New_York`.  
   - Trims to regular U.S. trading hours (`09:30`–`16:00`).  
   - Applies forward and backward filling (up to 5 intervals) to patch short gaps.  
   - Removes empty rows and sorts data chronologically.  
   - Saves the cleaned price panel to `data/processed/etfs_2m_clean.pkl`.

3. Feature construction  
   - Uses `build_all_features` to generate engineered features for each ETF individually.  
   - Ensures a consistent multi-index format `(Ticker, Feature)`.  
   - Removes raw OHLCV fields to retain only derived features.

4. Normalization  
   - Applies z-score normalization per feature, computed across tickers at each timestamp:  
     `(x - mean) / (std + 1e-9)`  
   - Replaces infinite values with `NaN` and drops rows that are empty across all columns.  
   - Saves the normalized feature matrix to `data/processed/etfs_2m_features_clean.pkl`.

5. Verification  
   - Ensures correct multi-index structure.  
   - Computes and displays the average cross-ticker standard deviation of each feature as a quick sanity check (values close to 1 indicate the normalization worked).
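
As a rough illustration of the normalization in step 4, the sketch below z-scores one feature at a time across tickers, separately for every timestamp. The helper name `zscore_across_tickers` and the toy tickers are assumptions made for this example; the pipeline cell itself performs this step with a transposed `groupby`.

```python
import numpy as np
import pandas as pd

def zscore_across_tickers(feat: pd.DataFrame, eps: float = 1e-9) -> pd.DataFrame:
    """Z-score each feature across tickers, separately at every timestamp."""
    out = feat.copy()
    for name in feat.columns.get_level_values("Feature").unique():
        # Rows are timestamps, columns are tickers for this one feature
        block = feat.xs(name, axis=1, level="Feature")
        mu = block.mean(axis=1)
        sd = block.std(axis=1)
        z = block.sub(mu, axis=0).div(sd + eps, axis=0)
        for ticker in z.columns:
            out[(ticker, name)] = z[ticker]
    return out

# Toy panel: 3 hypothetical tickers x 2 features over 5 timestamps
rng = np.random.default_rng(0)
cols = pd.MultiIndex.from_product(
    [["AAA", "BBB", "CCC"], ["returns", "rsi"]], names=["Ticker", "Feature"]
)
toy = pd.DataFrame(rng.normal(size=(5, 6)), columns=cols)
normed = zscore_across_tickers(toy)

# Cross-ticker standard deviation per timestamp should be close to 1
print(normed.xs("rsi", axis=1, level="Feature").std(axis=1))
```

Because the statistics are taken across tickers at a single timestamp, this form of scaling does not use information from future bars.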


In [32]:
import os
import pandas as pd
import numpy as np
import yfinance as yf
from src.utils.features import add_features, build_all_features  


# Downloading and cleaning intraday price data for the selected ETFs

tickers = ["SPY", "QQQ", "IWM", "TLT", "GLD", "XLE", "XLF", "EEM", "HYG", "DBC"]

raw = yf.download(
    tickers=tickers,
    start="2025-08-19",
    end="2025-10-09",
    interval="2m",
    group_by="ticker",
    auto_adjust=True,
    threads=True
)

def clean_panel(raw):
    """
    Cleans raw Yahoo Finance panel data:
    - Restricts to standard OHLCV fields
    - Localizes to NY time zone if missing
    - Trims to regular trading hours
    - Forward/backward fills short gaps
    - Drops entirely empty rows
    """
    fields = ["Open", "High", "Low", "Close", "Volume"]
    df = raw.dropna(how="all")
    df = df.loc[:, pd.IndexSlice[:, fields]]
    df = df.sort_index()
    if df.index.tz is None:
        df = df.tz_localize("America/New_York", nonexistent="shift_forward", ambiguous="NaT")
    df = df.between_time("09:30", "16:00")
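    # Fill along the time axis; each column is filled independently,
    # so no per-ticker grouping (and no deprecated axis=1 groupby) is needed
    df = df.ffill(limit=5).bfill(limit=5)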
    df = df.dropna(how="all")
    return df

panel = clean_panel(raw)

# Saving cleaned price panel to disk for reproducibility

os.makedirs("data/raw", exist_ok=True)
os.makedirs("data/processed", exist_ok=True)
panel.to_pickle("data/processed/etfs_2m_clean.pkl")
print(" Clean panel saved:", panel.shape)


# Feature construction: each ticker is processed individually by the custom feature builder
feat = build_all_features(panel)
print(" Raw feature matrix:", feat.shape)

# Standardising column structure to a consistent multiIndex (Ticker, Feature)
if feat.columns.nlevels == 3:
    feat.columns = feat.columns.droplevel(2)
feat.columns.set_names(["Ticker", "Feature"], inplace=True)

# need to remove raw OHLCV fields from the feature set to keep only engineered features

drop_fields = ["Open", "High", "Low", "Close", "Volume"]
feat = feat.drop(columns=drop_fields, level="Feature", errors="ignore")

# Z-score each feature across tickers at every timestamp (cross-sectional standardisation)

feat = (
    feat.T
    .groupby(level="Feature")
    .apply(lambda x: (x - x.mean()) / (x.std() + 1e-9))
    .T
)


# Final cleanup and persistence of the normalised feature matrix

feat = feat.replace([np.inf, -np.inf], np.nan).dropna(how="all")
feat.to_pickle("data/processed/etfs_2m_features_clean.pkl")
print(" Normalized features saved:", feat.shape)


# sanity check to verify successful normalisation

if feat.columns.nlevels > 2:
    feat.columns = feat.columns.droplevel(list(range(feat.columns.nlevels - 2)))

# Rebuild MultiIndex with clear level names
feat.columns = pd.MultiIndex.from_tuples(feat.columns, names=["Ticker", "Feature"])

# Calculate average standard deviation per feature across tickers
feat_var = (
    feat.T
    .groupby(level="Feature")
    .std()
    .mean(axis=1)
    .sort_values(ascending=False)
)

print(" Feature variance summary:")
print(feat_var)


[*********************100%***********************]  10 of 10 completed


 Clean panel saved: (2128, 50)
 Raw feature matrix: (2108, 130)
 Normalized features saved: (2108, 80)
 Feature variance summary:
Feature
rsi              1.000000
zscore           1.000000
vol_change       1.000000
macd             1.000000
momentum_20      0.999999
momentum_5       0.999999
returns          0.999997
volatility_20    0.999995
dtype: float64


### Turbulence Index Calculation

This cell defines the `compute_turbulence` function, which quantifies market turbulence using the Mahalanobis distance over a rolling window of past returns.  
The idea is to measure how statistically unusual the current return vector is compared to its recent historical distribution.

Mechanics:
- For each time step after an initial `window` period (default = 390 observations), the function:
  1. Extracts the rolling historical return matrix.
  2. Computes its mean vector (`μ`) and covariance matrix (`Σ`).
  3. Calculates the Mahalanobis distance between the current return vector and the historical mean:
     \[
     D_M = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}
     \]
  4. Records this as the turbulence measure.
- If the covariance matrix is singular (non-invertible), it assigns `NaN` for that point.

Output:
A `pandas.Series` of turbulence scores aligned with the original index, where higher values indicate greater deviation from recent market behavior — i.e., more "turbulent" conditions.

In the current prototype this is not used as a feature, but I kept the function since it would be useful in a future extension.
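
To make the distance calculation concrete, here is a minimal single-step sketch on synthetic returns. It is not the notebook's function: the name `mahalanobis` is chosen for this example, and it uses `np.linalg.solve` in place of an explicit matrix inverse, which is a common substitution rather than what the cell below does.

```python
import numpy as np

def mahalanobis(x, hist):
    """Mahalanobis distance of vector x from the rows of hist (observations x assets)."""
    mu = hist.mean(axis=0)
    cov = np.cov(hist, rowvar=False)
    diff = x - mu
    try:
        # Solve cov @ z = diff instead of forming the inverse explicitly
        z = np.linalg.solve(cov, diff)
    except np.linalg.LinAlgError:
        return float("nan")  # singular covariance matrix
    return float(np.sqrt(max(float(diff @ z), 0.0)))

# Synthetic history: 390 bars of returns for 3 hypothetical assets
rng = np.random.default_rng(1)
hist = rng.normal(scale=0.001, size=(390, 3))
center = hist.mean(axis=0)

calm = mahalanobis(center + 0.001, hist)      # roughly a one-sigma move per asset
stressed = mahalanobis(center + 0.005, hist)  # a much larger joint move
print(calm, stressed)
```

The larger joint move produces the larger score, which is the behaviour the turbulence index relies on.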

In [33]:
def compute_turbulence(r1, window=390):
    """
    Computes a turbulence index using Mahalanobis distance over a rolling window.
    This measures how unusual the current return vector is relative to its recent history.
    """

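    turb = []
    idx = r1.index

    for i in range(window, len(r1)):
        # Rolling history: the `window` observations preceding step i
        hist = r1.iloc[i - window:i]

        mu = hist.mean().values
        cov = hist.cov().values

        # Record NaN where the covariance matrix is singular
        if np.linalg.det(cov) <= 0:
            turb.append(np.nan)
            continue

        # Mahalanobis distance of the current return vector from the historical mean
        diff = r1.iloc[i].values - mu
        m_dist = np.sqrt(diff.T @ np.linalg.inv(cov) @ diff)
        turb.append(m_dist)

    # Re-align to the full index; the first `window` entries are NaN
    turb = pd.Series(turb, index=idx[window:])
    return turb.reindex(idx)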


### Transaction Cost and Execution Modeling Utilities

This module defines helper functions for estimating trading costs, liquidity effects, and execution prices based on market microstructure proxies.

Functions included:

1. `spread_proxy(high, low)`  
   Estimates the bid–ask half-spread from intrabar high–low ranges.  
   Useful when direct spread data is unavailable.  
   Formula: `(high - low) / (0.5 * (high + low)) * 0.25`

2. `realized_vol(close, win=60)`  
   Computes rolling realized volatility as the standard deviation of percentage returns over a specified window.  
   Acts as a simple measure of short-term price variability.

3. `participation(dollar_trade, price, volume)`  
   Calculates the participation rate of a trade relative to the total dollar volume in the bar.  
   The environment clips this rate at `part_cap` to keep trade sizes within realistic liquidity limits.

4. `exec_price(side, mid, half_spread, sigma, dt_min, part, k=0.6, lam=2e-4)`  
   Models the executed trade price including spread, drift, and market impact components.  
   - `side` determines trade direction (+1 buy, -1 sell, 0 no trade).  
   - `drift` term scales with volatility and the square root of the time step (floored at one minute).  
   - `impact` term grows quadratically with participation rate.

5. `turnover_l1(w_new, w_old)`  
   Computes L1 turnover as the sum of absolute changes in portfolio weights between two time steps.  
   Serves as a penalty for frequent rebalancing.

Saved as a script in `costs.py`.
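
The following worked example restates the execution-price formula on a single hypothetical bar so the relative size of each cost component is visible. The function name `exec_price_sketch` and all input values are assumptions made for illustration; only the formula follows the description above.

```python
import math

def exec_price_sketch(side, mid, half_spread, sigma, dt_min, part, k=0.6, lam=2e-4):
    """Return (price, drift, impact) for one trade, following the formula described above."""
    drift = k * sigma * math.sqrt(max(dt_min, 1.0))
    impact = lam * part ** 2
    cost = half_spread + drift + impact  # total proportional cost
    if side > 0:
        price = mid * (1.0 + cost)   # buys pay above mid
    elif side < 0:
        price = mid * (1.0 - cost)   # sells receive below mid
    else:
        price = mid
    return price, drift, impact

# Hypothetical bar: high 100.10, low 99.90, mid 100.00
hs = 0.25 * (100.10 - 99.90) / (0.5 * (100.10 + 99.90))  # spread proxy
buy, drift, impact = exec_price_sketch(+1, 100.0, hs, sigma=0.001, dt_min=2, part=0.05)
sell, _, _ = exec_price_sketch(-1, 100.0, hs, sigma=0.001, dt_min=2, part=0.05)

print(f"half-spread={hs:.6f} drift={drift:.6f} impact={impact:.2e}")
print(f"buy={buy:.4f} sell={sell:.4f}")
```

With these inputs the impact term is several orders of magnitude smaller than the spread and drift terms, so under the default `lam` the quadratic impact only matters at participation rates well above the 5% used here.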

In [28]:
%%writefile src/utils/costs.py
import numpy as np
import pandas as pd

def spread_proxy(high, low):
    """
    Estimates the bid-ask half-spread using high–low ranges.
    Acts as a proxy when direct spread data is unavailable.
    """
    rng = (high - low).clip(lower=0)
    mid = 0.5 * (high + low)
    sp = (rng / mid).fillna(0.0)
    return 0.25 * sp  # Scales the range to approximate half-spread

def realized_vol(close, win=60):
    """
    Computes rolling realized volatility based on percentage returns.
    Uses a simple standard deviation over a specified window.
    """
    r = close.pct_change()
    return r.rolling(win).std().fillna(0.0)

def participation(dollar_trade, price, volume):
    """
    Calculates participation rate relative to the dollar volume of the bar.
    Used to cap trading volume relative to available liquidity.
    """
    dollar_bar = (price * volume).replace(0, np.nan)
    return (dollar_trade / dollar_bar).fillna(0.0).clip(lower=0.0)

def exec_price(side, mid, half_spread, sigma, dt_min, part, k=0.6, lam=2e-4):
    """
    Models execution price with drift and impact components.
    - side: trade direction (+1 buy, -1 sell, 0 no trade)
    - mid: current mid price
    - half_spread: estimated half-spread
    - sigma: local volatility estimate
    - dt_min: time increment in minutes
    - part: participation rate
    - k: drift scaling coefficient
    - lam: impact parameter
    """
    drift = k * sigma * np.sqrt(max(dt_min, 1.0))
    impact = lam * (part ** 2)
    if side > 0:
        px = mid * (1.0 + half_spread + drift + impact)
    elif side < 0:
        px = mid * (1.0 - half_spread - drift - impact)
    else:
        px = mid
    return px

def turnover_l1(w_new, w_old):
    """
    L1 turnover penalty: sum of absolute changes in portfolio weights.
    Used to penalize excessive rebalancing.
    """
    return np.abs(w_new - w_old).sum()



Overwriting src/utils/costs.py


### Portfolio Environment Class

This module defines `PortfolioEnv`, a lightweight simulation environment for multi-asset portfolio management with execution costs, risk penalties, and liquidity constraints.  
It is designed for reinforcement learning or rule-based allocation strategies that interact step-by-step with market data.

Key components:

1. Initialization  
   - Inputs a price panel with fields `Close`, `High`, `Low`, and `Volume`.  
   - Computes half-spread and realized volatility proxies using functions from `src.utils.costs`.  
   - Defines simulation parameters:  
     - `freq_min`: time step size in minutes  
     - `start_equity`: initial portfolio value  
     - `part_cap`: maximum participation rate per trade  
     - `k`, `lam`: drift and impact coefficients for execution model  
     - `gamma_bar`: annualized risk aversion, rescaled to per-minute  
     - `eta_turnover`: turnover penalty weight  

2. Reset  
   - Initializes state variables including equity, cash, share positions, and weights.  
   - Returns the initial observation containing current weights and equity.

3. Observation  
   - `_obs()` returns the observable state as a dictionary with `weights` and `equity`.

4. Step function  
   - Executes one portfolio update step given a target weight vector `w_target`.  
   - Performs:  
     - Weight normalization and scaling of the desired trade by the participation rate (clipped at `part_cap`)  
     - Execution price estimation using `exec_price`  
     - Trade cash flow and position update  
     - Equity and weight recalculation at next time step  
   - Applies penalties for:  
     - Instantaneous variance (risk)  
     - Turnover (rebalancing cost)  
     - Drift (allocation shift regularization)  
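   - Computes the realized return as the updated weights times the price change from the previous bar (`t-1`) to the next bar (`t+1`), then aggregates it with the penalties into a reward:  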
     `reward = tanh((pnl - risk_pen - tvr_pen - drift_pen) * 20)`

5. Output  
   - Returns updated observation, computed reward, termination flag, and an info dictionary containing:  
     - `pnl_ret`: realized return  
     - `risk_pen`: risk penalty  
     - `tvr_pen`: turnover penalty  
     - `equity`: current portfolio value  

The environment progresses through each time step of intraday market data, enabling sequential decision-making under realistic market frictions.

Saved as a script in `portfolio_env.py`.
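
As a small numeric sketch of the reward shaping, the snippet below evaluates the formula for one hypothetical step. The helper name `shaped_reward` and the input values are assumptions for illustration; the constants `12`, `252 * 390`, `0.001` and `20` are the defaults used by the class.

```python
import math

def shaped_reward(pnl_ret, risk_pen, tvr_pen, drift_pen, scale=20.0):
    """tanh-squashed reward: net return after penalties, scaled before squashing."""
    raw = pnl_ret - risk_pen - tvr_pen - drift_pen
    return math.tanh(raw * scale)

# Per-minute risk aversion implied by the default gamma_bar = 12
gamma = 12 / (252 * 390)

# Turnover penalty for a 10% L1 change in weights with eta_turnover = 0.001
tvr_pen = 0.001 * 0.10

# Hypothetical step: 5 bp return, negligible risk and drift penalties
r = shaped_reward(pnl_ret=0.0005, risk_pen=0.0, tvr_pen=tvr_pen, drift_pen=0.0)
print(f"gamma={gamma:.3e} tvr_pen={tvr_pen:.1e} reward={r:.6f}")
```

Per-bar returns of a few basis points fall in the near-linear region of `tanh`, so the squashing mainly bounds the reward when an outlier bar occurs.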

In [2]:
%%writefile src/env/portfolio_env.py
import numpy as np
import pandas as pd
from src.utils.costs import spread_proxy, realized_vol, participation, exec_price, turnover_l1


class PortfolioEnv:
    """
    A simulation environment for multi-asset portfolio trading with market frictions,
    transaction costs, and risk penalties.

    Attributes
    ----------
    prices : DataFrame
        Multi-indexed price panel containing OHLCV data for multiple tickers.
    freq_min : int
        Time step size in minutes.
    start_equity : float
        Initial portfolio equity.
    part_cap : float
        Maximum participation rate per trade.
    k : float
        Drift scaling coefficient in execution model.
    lam : float
        Market impact parameter.
    gamma_bar : float
        Annualized risk aversion, converted to per-minute scale internally.
    eta_turnover : float
        Turnover penalty weight.
    """

    def __init__(
        self,
        prices,
        freq_min=1,
        start_equity=1_000_000,
        part_cap=0.05,
        k=0.6,
        lam=2e-4,
        gamma_bar=12,
        eta_turnover=0.001
    ):
        """
        Initializes the environment with price data and trading parameters.
        Precomputes spread and volatility proxies used in transaction and risk modeling.
        """
        self.prices = prices
        self.close = prices.loc[:, pd.IndexSlice[:, "Close"]]
        self.high = prices.loc[:, pd.IndexSlice[:, "High"]]
        self.low = prices.loc[:, pd.IndexSlice[:, "Low"]]
        self.vol = prices.loc[:, pd.IndexSlice[:, "Volume"]]
        self.mid = self.close

        self.half_spread = spread_proxy(self.high, self.low)
        self.sigma = realized_vol(self.close, win=60)

        self.part_cap = part_cap
        self.k = k
        self.lam = lam
        self.freq_min = freq_min
        self.gamma = gamma_bar / (252 * 390)
        self.eta_turnover = eta_turnover
        self.start_equity = start_equity
        self.tickers = self.close.columns.get_level_values(0).unique()

        self.reset()

    def reset(self):
        """
        Resets the environment to its initial state.

        Returns
        -------
        dict
            Initial observation containing portfolio weights and equity.
        """
        self.t = 1
        self.equity = float(self.start_equity)
        self.w = np.zeros(len(self.tickers))
        self.q = np.zeros(len(self.tickers))
        self.cash = float(self.start_equity)
        return self._obs()

    def _obs(self):
        """
        Returns the current observable state of the environment.

        Returns
        -------
        dict
            Dictionary with current weights and equity.
        """
        return {"weights": self.w.copy(), "equity": float(self.equity)}

    def step(self, w_target):
        """
        Advances the environment by one step given a target weight allocation.

        Parameters
        ----------
        w_target : array-like
            Target portfolio weights for all tickers.

        Returns
        -------
        tuple
            observation : dict
                Updated weights and equity.
            reward : float
                Scalar reward value after applying penalties.
            done : bool
                Whether the simulation has reached the final time step.
            info : dict
                Diagnostic metrics including pnl, penalties, and equity.
        """
        w_target = np.array(w_target).flatten()
        w_target = np.clip(w_target, 0, 1)
        if w_target.sum() > 0:
            w_target /= w_target.sum()

        idx = self.close.index[self.t]
        idx_next = self.close.index[min(self.t + 1, len(self.close) - 1)]

        mid = np.nan_to_num(self.mid.loc[idx].values, nan=0.0, posinf=0.0, neginf=0.0)
        hs = np.nan_to_num(self.half_spread.loc[idx].values, nan=0.0, posinf=0.0, neginf=0.0)
        sg = np.nan_to_num(self.sigma.loc[idx].values, nan=1e-6, posinf=1e-6, neginf=1e-6)
        vol = np.nan_to_num(self.vol.loc[idx].values, nan=1.0, posinf=1.0, neginf=1.0)

        dollar_pos_now = self.q * mid
        port_now = float(self.cash + np.sum(dollar_pos_now))
        self.equity = max(port_now, 1.0)

        delta_w = w_target - self.w

        # Desired dollar trade per asset and its share of the bar's dollar volume
        desired_notional = delta_w * self.equity
        part = participation(pd.Series(np.abs(desired_notional)), pd.Series(mid), pd.Series(vol)).values
        part = np.nan_to_num(np.clip(part, 0, self.part_cap), nan=0.0)
        # The executed trade is the desired trade scaled by the clipped participation rate
        feasible_notional = desired_notional * part
        signed_shares = np.nan_to_num(feasible_notional / np.maximum(mid, 1e-12))

        exec_px = np.array([
            exec_price(np.sign(dw), m, h, sgm, self.freq_min, p, self.k, self.lam)
            for dw, m, h, sgm, p in zip(delta_w, mid, hs, sg, part)
        ])
        exec_px = np.nan_to_num(exec_px, nan=mid, posinf=mid, neginf=mid)

        trade_cash_flow = float(np.sum(signed_shares * exec_px))
        self.cash -= trade_cash_flow
        self.q += signed_shares

        next_px = np.nan_to_num(self.mid.loc[idx_next].values, nan=mid, posinf=mid, neginf=mid)
        portfolio_value = float(self.cash + np.sum(self.q * next_px))
        self.equity = max(portfolio_value, 1.0)

        dollar_pos_next = self.q * next_px
        self.w = np.clip(np.nan_to_num(dollar_pos_next / np.maximum(self.equity, 1e-12)), 0, 1)

        var_diag = np.nan_to_num((self.sigma.loc[idx] ** 2).values, nan=0.0)
        risk_pen = 0.5 * self.gamma * float(np.dot(self.w, var_diag * self.w))

        prev_w = np.nan_to_num(dollar_pos_now / np.maximum(port_now, 1e-12))
        tvr = turnover_l1(self.w, prev_w)
        tvr_pen = self.eta_turnover * tvr

        prev_idx = self.close.index[self.t - 1]
        r_bar = np.nan_to_num(
            self.mid.loc[idx_next].values / np.maximum(self.mid.loc[prev_idx].values, 1e-12) - 1.0,
            nan=0.0,
        )
        pnl_ret = float(np.dot(self.w, r_bar))

        drift_pen = np.clip(1e-5 * np.square(delta_w).sum(), 0, 1e-3)

        raw = pnl_ret - risk_pen - tvr_pen - drift_pen
        reward = float(np.clip(np.tanh(raw * 20.0), -1.0, 1.0))

        info = {
            "pnl_ret": pnl_ret,
            "risk_pen": risk_pen,
            "tvr_pen": tvr_pen,
            "equity": float(self.equity)
        }

        self.t += 1
        done = self.t >= len(self.close) - 1

        return self._obs(), reward, done, info


Overwriting src/env/portfolio_env.py


### PortfolioGym Environment Wrapper

This cell defines the `PortfolioGym` class, a `gymnasium`-compatible wrapper around `PortfolioEnv`, allowing direct integration of the trading environment with reinforcement learning frameworks.

Key points:

1. Purpose  
   - Provides a standardized RL interface (`reset`, `step`, `render`) for portfolio management tasks.  
   - Enables agents to interact with financial data through continuous observation and action spaces.

2. Components  
   - `core`: an instance of `PortfolioEnv` that handles trading logic, execution costs, and rewards.  
   - `feature_df`: optional external features (e.g., macro indicators, technical signals) aligned with the same timestamps as the price data.  
   - `observation_space`: continuous vector combining equity, current portfolio weights, and optional features.  
   - `action_space`: continuous vector of target weights for each asset.

3. Workflow  
   - `reset()`: initializes the portfolio and returns the first observation.  
   - `step(action)`: executes trades, updates equity, computes reward, and returns the next state.  
   - `render()`: prints a short summary of the current timestep and portfolio equity.

4. Design  
   - Replaces NaN and non-finite values with zeros for numerical stability.  
   - Standardizes the feature vector at each step and clips extreme values to keep RL training stable.  
   - Follows the Gymnasium API (`reset` returns `(obs, info)`; `step` returns `(obs, reward, terminated, truncated, info)`), so it can be used with libraries such as `Stable-Baselines3`, `Ray RLlib`, or `CleanRL`.
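
The wrapper looks up features by row position (`iloc[t]`), so a feature matrix with fewer rows than the price panel (for example 2108 versus 2128 rows in the pipeline output above) would be shifted relative to prices unless it is aligned first. One possible way to do that, sketched under the assumption that both frames share the same kind of timestamp index, is to reindex the features onto the price index before passing them in. The helper name `align_features` is hypothetical.

```python
import pandas as pd

def align_features(features: pd.DataFrame, price_index: pd.Index) -> pd.DataFrame:
    """Reindex features onto the price timestamps; rows without features become zeros."""
    return features.reindex(price_index).fillna(0.0)

# Six 2-minute bars of prices
price_index = pd.date_range(
    "2025-08-19 09:30", periods=6, freq="2min", tz="America/New_York"
)

# Features start two bars late, as happens when rolling windows drop warm-up rows
features = pd.DataFrame({"rsi": [0.1, 0.2, 0.3, 0.4]}, index=price_index[2:])

aligned = align_features(features, price_index)
print(aligned)
```

After alignment, row `t` of the feature frame and row `t` of the price panel refer to the same timestamp.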


In [3]:
%%writefile src/env/portfolio_gym.py
import numpy as np
import pandas as pd
import gymnasium as gym
from gymnasium import spaces
from src.env.portfolio_env import PortfolioEnv


class PortfolioGym(gym.Env):
    """
    Gymnasium-compatible wrapper around the PortfolioEnv environment.

    This class provides a standardized reinforcement learning interface for
    portfolio management tasks, enabling interaction through observations,
    actions, and rewards in compliance with Gymnasium API conventions.

    Attributes
    ----------
    core : PortfolioEnv
        Core environment handling portfolio mechanics, execution, and reward logic.
    feature_df : DataFrame or None
        Optional feature matrix aligned with the time index of the price data.
    tickers : list
        List of asset tickers included in the environment.
    n_assets : int
        Number of tradable assets.
    observation_space : gym.spaces.Box
        Continuous space containing equity, portfolio weights, and optional features.
    action_space : gym.spaces.Box
        Continuous space representing target portfolio weights.
    """

    metadata = {"render_modes": ["human"]}

    def __init__(self, prices, feature_df=None, start_equity=1_000_000):
        """
        Initialize the Gym-compatible portfolio environment.

        Parameters
        ----------
        prices : DataFrame
            Multi-indexed price panel containing OHLCV data.
        feature_df : DataFrame, optional
            External feature matrix to augment observations.
        start_equity : float, default=1_000_000
            Initial portfolio value.
        """
        super().__init__()
        self.core = PortfolioEnv(prices, start_equity=start_equity)
        self.feature_df = feature_df
        self.tickers = self.core.tickers
        self.n_assets = len(self.tickers)

        feat_dim = 0 if feature_df is None else feature_df.shape[1]
        obs_dim = 1 + self.n_assets + feat_dim
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(obs_dim,), dtype=np.float32)
        self.action_space = spaces.Box(low=0.0, high=1.0, shape=(self.n_assets,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        """
        Reset the environment and return the initial observation.

        Parameters
        ----------
        seed : int, optional
            Random seed for reproducibility.
        options : dict, optional
            Additional reset options.

        Returns
        -------
        tuple
            observation : ndarray
                Normalized observation vector after reset.
            info : dict
                Auxiliary information (empty by default).
        """
        super().reset(seed=seed)
        obs_core = self.core.reset()
        obs = self._build_obs(obs_core)
        return obs.astype(np.float32), {}

    def _build_obs(self, core_obs):
        """
        Construct the observation vector from core environment data.

        Parameters
        ----------
        core_obs : dict
            Core observation dictionary containing weights and equity.

        Returns
        -------
        ndarray
            Flattened and normalized observation vector.
        """
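        # Scale equity by its starting value so gains are not cut off by the +/-1e6 clip in step()
        equity = float(core_obs["equity"]) / float(self.core.start_equity)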
        w = np.array(core_obs["weights"], dtype=np.float32)

        if self.feature_df is not None:
            t = min(self.core.t, len(self.feature_df) - 1)
            f = self.feature_df.iloc[t].to_numpy(dtype=np.float32)
            f = np.nan_to_num(f, nan=0.0, posinf=0.0, neginf=0.0)
            f = (f - np.mean(f)) / (np.std(f) + 1e-8)
            obs = np.concatenate(([equity], w, f))
        else:
            obs = np.concatenate(([equity], w))

        return np.nan_to_num(obs, nan=0.0, posinf=0.0, neginf=0.0)

    def step(self, action):
        """
        Advance the environment one step given an action.

        Parameters
        ----------
        action : ndarray
            Target portfolio weights to apply.

        Returns
        -------
        tuple
            observation : ndarray
                Updated observation vector.
            reward : float
                Scalar reward signal from the core environment.
            terminated : bool
                Whether the episode has reached its end.
            truncated : bool
                Whether the episode was truncated externally.
            info : dict
                Diagnostic and performance metrics.
        """
        obs_core, reward, done, info = self.core.step(action)

        # Guard against non-finite rewards (NaN or inf) coming out of the core environment
        if not np.isfinite(reward):
            print(f"[Warning] Non-finite reward at t={self.core.t}: {reward}")
            reward = 0.0

        # _build_obs already replaces NaN and infinite values with zeros
        obs = self._build_obs(obs_core)

        # Clip to keep magnitudes bounded for the RL algorithm
        obs = np.clip(obs, -1e6, 1e6)
        reward = float(np.clip(reward, -1e3, 1e3))

        terminated = bool(done)
        truncated = False

        return obs.astype(np.float32), reward, terminated, truncated, info

    def render(self):
        """
        Print a simple textual representation of the current simulation state.

        Displays the current step index and portfolio equity for quick debugging.
        """
        print(f"Step {self.core.t}, Equity {self.core.equity:.2f}")


Overwriting src/env/portfolio_gym.py


### Feature Engineering Utilities

This cell defines helper functions for constructing engineered features from raw OHLCV data, both at the single-ticker and multi-ticker level.

Functions:

1. `add_features(df)`  
   Enriches a single-ticker dataframe with common statistical and technical indicators.  
   Expects columns `['Open', 'High', 'Low', 'Close', 'Volume']`.  
   Generated features include:
   - `returns`: percentage change in closing price  
   - `volatility_20`: 20-period rolling standard deviation of returns  
   - `momentum_5`, `momentum_20`: short and medium-term price momentum  
   - `rsi`: 14-period Relative Strength Index computed from average gains and losses  
   - `macd`: Moving Average Convergence Divergence (`EMA12 - EMA26`)  
   - `zscore`: rolling z-score of closing price relative to a 20-period mean and standard deviation  
   - `vol_change`: percentage change in trading volume  
   The function returns the dataframe (original OHLCV columns plus the new features) with rows containing NaN values dropped.

2. `build_all_features(panel)`  
   Applies `add_features` to each ticker in a multi-indexed price panel and concatenates results.  
   The returned dataframe uses a two-level column MultiIndex in the format `(Ticker, Feature)`, suitable for multi-asset feature modeling or machine learning pipelines.
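
To show how the RSI described above behaves at its extremes, here is a small self-contained sketch using simple rolling means of gains and losses. The function name `rsi_sma` and the toy price series are assumptions for this example; note that this rolling-mean variant differs from Wilder's exponentially smoothed RSI.

```python
import pandas as pd

def rsi_sma(close: pd.Series, period: int = 14, eps: float = 1e-9) -> pd.Series:
    """RSI from simple rolling averages of gains and losses."""
    delta = close.diff()
    gain = delta.where(delta > 0, 0.0).rolling(period).mean()
    loss = (-delta.where(delta < 0, 0.0)).rolling(period).mean()
    rs = gain / (loss + eps)  # eps avoids division by zero when there are no losses
    return 100 - 100 / (1 + rs)

# Toy series: one steadily rising, one steadily falling
rising = pd.Series(range(100, 130), dtype=float)
falling = pd.Series(range(130, 100, -1), dtype=float)

print(rsi_sma(rising).iloc[-1])   # approaches 100 when every bar is a gain
print(rsi_sma(falling).iloc[-1])  # approaches 0 when every bar is a loss
```

The first `period - 1` values are NaN because the rolling window is not yet full, which is one reason the feature builder drops leading rows.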


In [31]:
%%writefile src/utils/features.py

import pandas as pd
import numpy as np

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """
    Enriches a single-ticker price dataframe with a set of standard technical indicators.
    Assumes columns: ['Open', 'High', 'Low', 'Close', 'Volume'].
    """
    df = df.copy()

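    # Simple returns and their 20-bar rolling volatility
    # fill_method=None avoids the deprecated implicit forward fill in pct_change
    df["returns"] = df["Close"].pct_change(fill_method=None)
    df["volatility_20"] = df["returns"].rolling(20).std()

    # Price momentum over 5 and 20 bars
    df["momentum_5"] = df["Close"].pct_change(5, fill_method=None)
    df["momentum_20"] = df["Close"].pct_change(20, fill_method=None)

    # 14-period RSI from simple rolling averages of gains and losses
    delta = df["Close"].diff()
    gain = (delta.where(delta > 0, 0)).rolling(14).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(14).mean()
    rs = gain / (loss + 1e-9)
    df["rsi"] = 100 - (100 / (1 + rs))

    # MACD line: difference between the 12- and 26-period EMAs of the close
    ema12 = df["Close"].ewm(span=12, adjust=False).mean()
    ema26 = df["Close"].ewm(span=26, adjust=False).mean()
    df["macd"] = ema12 - ema26

    # Rolling z-score of the close against its 20-bar mean and standard deviation
    df["zscore"] = (df["Close"] - df["Close"].rolling(20).mean()) / (df["Close"].rolling(20).std() + 1e-9)

    # Bar-to-bar percentage change in volume
    df["vol_change"] = df["Volume"].pct_change(fill_method=None)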

    return df.dropna()

def build_all_features(panel):
    """
    Applies the add_features function to each ticker in a multi-index price panel.
    Returns a feature panel with a two-level column index: (Ticker, Feature).
    """
    tickers = panel.columns.get_level_values(0).unique()
    feat_list = []

    
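    # Build features ticker by ticker
    for t in tickers:
        df_t = panel[t].copy()
        df_feat = add_features(df_t)
        feat_list.append(df_feat)

    # Concatenate side by side; `keys` adds the ticker as the top column level
    feat = pd.concat(feat_list, axis=1, keys=tickers)
    return feat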


Overwriting src/utils/features.py
