Fantastic — your answers make it clear: we're building a **free-thinking, risk-aware, market-grounded learning trader**, with the potential to become not just profitable, but deeply **adaptive** and **introspective**. This will be like training a real human trader, starting simple and layering on cognition and tools.

---

## ✅ Summary of Your Vision

| Category              | Your Decision                                                          |
| --------------------- | ---------------------------------------------------------------------- |
| **Self-Awareness**    | Track PnL, trade history, confidence per trade                         |
| **Market Context**    | Allow multi-timeframe (1D, 1W, maybe 1H later), and any usable feature |
| **External Signals**  | Include events like earnings/fed/news if available                     |
| **Risk Management**   | Simulate liquidation/capital erosion; learned position sizing (v2+)    |
| **Strategy Modeling** | Enable strategy playbooks and adaptive behavior                        |
| **Meta-Learning**     | Agent should retain memory of past conditions, learn from meta-signals |
| **Limitations**       | No peeking into future — only prediction from available past           |

---

## 🎯 Now Here's the Plan: "The Trader Intelligence Stack"

We'll organize this into **five layers** (the fifth is optional) that build on each other. Each layer adds trader-like qualities and improves survivability and strategy creation.

---

### **🔹 Layer 1: Survival & Orientation (v1)**

> Minimal working agent that can buy, hold, or sell one stock on one timeframe, rewarded by a position-based score.

**Inputs:**

* OHLCV (daily)
* Agent’s current position
* Time since position opened
* Estimated profit/loss if selling now

**Internal features:**

* Current PnL (unrealized)
* Position duration
* Action history (last N actions — optional at this stage)

**Reward:**

* Oracle-relative reward between 0–100 per episode (✅ already implemented)

**Goal:** Learn to enter/exit positions intelligently on one stock.

---

### **🔹 Layer 2: Market Perception & Meta-Features**

> Now the agent *reads the environment*, and we open it to *multi-feature* inputs.

**Additions:**

* Volatility, momentum, kurtosis, entropy, regime label, VIX, etc.
* Optional: add price features from 3-day, 1-week trailing windows

**Goal:** Learn to recognize **conditions** that precede profitable trends.
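
As a rough illustration of the Layer 2 additions, here is a minimal sketch of trailing-window features computed from closing prices only. The function name `window_features`, the default window length, and the exact feature definitions are assumptions for illustration, not the project's actual feature pipeline.

```python
import numpy as np

def window_features(closes, window=20):
    """Summarize the trailing `window` returns using only past data (no look-ahead)."""
    closes = np.asarray(closes, dtype=float)
    recent = closes[-(window + 1):]            # `window` returns need window + 1 prices
    returns = np.diff(recent) / recent[:-1]    # simple one-step returns
    mean, std = returns.mean(), returns.std()
    # Excess kurtosis (0 for a normal distribution); guard against zero variance.
    kurt = float(np.mean(((returns - mean) / std) ** 4) - 3.0) if std > 0 else 0.0
    return {
        "volatility": float(std),
        "momentum": float(recent[-1] / recent[0] - 1.0),  # total return over the window
        "kurtosis": kurt,
    }

# Hypothetical prices, for illustration only.
prices = [100.0, 101.0, 99.0, 102.0, 103.0, 101.5]
features = window_features(prices, window=5)
```

Because the window always ends at the latest known close, the same function can be called at every step without leaking future prices.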

---

### **🔹 Layer 3: Portfolio & Risk Awareness**

> The agent now becomes a risk-aware trader.

**Additions:**

* Realized volatility, trailing drawdown
* Simulated liquidation: episode ends if capital drops below X%
* Optional: reward penalty for big drawdowns

**Later upgrade:**

* Learn dynamic position sizing (0%, 25%, 50%, 100%) or continuous size

**Goal:** Survive, control risk, avoid death by bad trades.
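
The liquidation rule and drawdown penalty could be sketched as below. The names `trailing_drawdown` and `risk_step`, the 50% liquidation level, and the penalty weight are all hypothetical placeholders; the "X%" threshold above is left for you to choose.

```python
def trailing_drawdown(equity_curve):
    """Fractional drop of the latest equity value from the running peak (0.0 = at peak)."""
    peak = max(equity_curve)
    return (peak - equity_curve[-1]) / peak

def risk_step(equity_curve, initial_capital, liquidation_level=0.5, drawdown_penalty=1.0):
    """Return (terminated, penalty) for the latest step of an episode."""
    # Simulated liquidation: end the episode once capital falls below the chosen level.
    terminated = equity_curve[-1] < liquidation_level * initial_capital
    # Optional reward penalty that grows with the current drawdown.
    penalty = drawdown_penalty * trailing_drawdown(equity_curve)
    return terminated, penalty

# Hypothetical equity curve, for illustration only.
equity = [1000.0, 1100.0, 900.0, 450.0]
done, penalty = risk_step(equity, initial_capital=1000.0)
```

In an environment, `terminated` would feed the episode-end flag and `penalty` would be subtracted from the step reward.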

---

### **🔹 Layer 4: Strategic Thinking & Memory**

> Agent becomes *introspective* and *adaptive* — career-trader-level.

**Additions:**

* Confidence score (learned or predicted)
* Episodic memory (compare current conditions to prior wins/losses)
* Strategy archetype detection (trend following, mean reversion, etc.)
* Meta-reward: evaluate *how well the agent acted*, not just profit

**Goal:** Develop strategic behavior that generalizes to new situations.
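
One simple way to picture the episodic-memory idea is a nearest-neighbour lookup over past market contexts. This is only a sketch under assumptions: the class `EpisodicMemory`, its methods, and the Euclidean distance are illustrative choices, not a committed design.

```python
import numpy as np

class EpisodicMemory:
    """Stores (context, outcome) pairs and recalls outcomes of similar past contexts."""

    def __init__(self):
        self.contexts = []
        self.outcomes = []

    def store(self, context, outcome):
        self.contexts.append(np.asarray(context, dtype=float))
        self.outcomes.append(float(outcome))

    def recall(self, context, k=3):
        """Mean outcome of the k stored contexts closest to `context` (0.0 if empty)."""
        if not self.contexts:
            return 0.0
        query = np.asarray(context, dtype=float)
        distances = np.linalg.norm(np.stack(self.contexts) - query, axis=1)
        nearest = np.argsort(distances)[:k]
        return float(np.mean(np.asarray(self.outcomes)[nearest]))

# Hypothetical contexts (e.g. [volatility, momentum]) and trade outcomes.
memory = EpisodicMemory()
memory.store([0.0, 0.0], 1.0)
memory.store([0.0, 1.0], 3.0)
memory.store([10.0, 10.0], -5.0)
expected_outcome = memory.recall([0.0, 0.4], k=2)
```

The recalled value could be appended to the observation so the policy sees how similar conditions played out before.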

---

### **🔹 Layer 5: Real World Awareness (Optional Later)**

> External signals from scraping/news pipelines and macro indicators.

* Earnings day, Fed announcement, news sentiment
* Sector rotation features (sector-relative strength)
* Macro ETFs (SPY, QQQ, TLT, etc.)

---

## ✅ Next Step: We Start at Layer 1

Let’s:

1. **Solidify internal features**: PnL, duration, trade history, position status.
2. **Wrap it into the `PositionTradingEnv`** — this becomes part of the observation.
3. **Optionally**: Add rolling average reward, confidence, or simple position score.

---

### 🔧 Can I proceed to implement a Layer 1 `env.get_observation()` that includes:

* Agent's current position (0 = flat, 1 = long)
* Time since entry
* Unrealized PnL (oracle-relative)
* Normalized current price vs. entry price
* One-hot day-of-week (already done)
* Rolling average return over past N days
* Optional: last 3 actions (as one-hot vectors)

?

Once this is done, we’ll run the first agent and start logging learnability under **"Survival Mode"**.

Ready to code this?
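
A minimal sketch of what such a Layer 1 observation might look like is shown below. The function `build_layer1_observation` and its arguments are hypothetical; it uses a plain price-vs-entry ratio and leaves out the oracle-relative PnL and the day-of-week one-hot, since their implementations are not shown here.

```python
import numpy as np

def build_layer1_observation(prices, step_idx, position, entry_idx, last_actions, n_return_days=5):
    """Assemble a flat observation from information available at `step_idx` only."""
    price = prices[step_idx]
    in_position = position == 1 and entry_idx is not None
    entry_price = prices[entry_idx] if in_position else price
    time_since_entry = (step_idx - entry_idx) if in_position else 0
    price_vs_entry = price / entry_price - 1.0          # 0.0 when flat

    # Rolling mean of daily returns over the past N days (fewer at the episode start).
    start = max(0, step_idx - n_return_days)
    window = np.asarray(prices[start:step_idx + 1], dtype=float)
    rolling_return = float(np.mean(np.diff(window) / window[:-1])) if len(window) > 1 else 0.0

    # One-hot encode the last three actions (2 possible actions each -> 6 values).
    actions_one_hot = np.zeros((3, 2))
    for row, action in enumerate(last_actions[-3:]):
        actions_one_hot[row, action] = 1.0

    return np.concatenate([
        [position, time_since_entry, price_vs_entry, rolling_return],
        actions_one_hot.ravel(),
    ]).astype(np.float32)

obs = build_layer1_observation(
    prices=[172.19, 175.08, 175.53, 172.19], step_idx=2,
    position=1, entry_idx=0, last_actions=[0, 1, 1],
)
```

Every value is derived from prices up to `step_idx`, which keeps the "no peeking into the future" constraint intact.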


In [1]:
import jupyter

In [2]:
import random
import numpy as np
import pandas as pd
import seaborn as sns
import gymnasium as gym
import matplotlib.pyplot as plt


from src.utils.system import boot
from src.data.feature_pipeline import load_base_dataframe
from experiments import check_if_experiment_exists, register_experiment, experiment_hash
from environments import PositionTradingEnv, PositionTradingEnvV1

# ========== SYSTEM BOOT ==========
DEVICE = boot()
EXPERIMENT_NAME = "trading_environment_development"
DEFAULT_PATH = "data/experiments/" + EXPERIMENT_NAME

# ========== CONFIG ==========
TICKER = "AAPL"
TIMESTEPS = 10_000
EVAL_EPISODES = 5
N_TIMESTEPS = 60
LOOKBACK = 0
SEEDS = [42, 52, 62]
MARKET_FEATURES = ['close']
BENCHMARK_PATH = DEFAULT_PATH+"/benchmark_episodes.json"
CHECKPOINT_DIR = DEFAULT_PATH+"/checkpoints"
SCORES_DIR = DEFAULT_PATH+"/scores"
META_PATH = DEFAULT_PATH+"/meta_df.csv"

MARKET_FEATURES.sort()
SEEDS.sort()

OHLCV_DF = load_base_dataframe()  # DEVICE is already set in the system boot section above



In [12]:
e.reset()
print('price', e.prices[e.step_idx])
for _ in range(3):
    obs, reward, terminated, truncated, _info = e.step(1)  # action 1 on every step
    print('price', e.prices[e.step_idx], 'reward', reward)


price 172.19
price 175.08 reward 0.03755481697733883
price 175.53 reward 0.000910531535531262
price 172.19 reward -0.05016062023591644


In [13]:
e.reset()
print(e.prices[e.step_idx])
for _ in range(3):
    obs, reward, terminated, truncated, _info = e.step(0)  # action 0 on every step
    print(reward, e.prices[e.step_idx])

172.19
-0.03755481697733883 175.08
-0.000910531535531262 175.53
0.05016062023591644 172.19


In [9]:
e = PositionTradingEnvV1(OHLCV_DF[OHLCV_DF['symbol']==TICKER], ticker=TICKER, seed=42, start_idx=4)
print("Sum of raw rel returns:", np.sum([
    abs((e.prices[i + 1] - e.prices[i]) / e.prices[i])
    for i in range(len(e.prices) - 1)
]))

print("Sum of normalized weights:", np.sum(e.step_weights))  # This should be 1.0

Sum of raw rel returns: 0.8977696292553151
Sum of normalized weights: 0.9999999999999999
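
The check above suggests the step weights are the absolute relative returns rescaled to sum to 1. A minimal sketch of that normalization follows, assuming this is how `step_weights` is built; the environment's actual computation is not shown, and `normalized_step_weights` is a hypothetical name.

```python
import numpy as np

def normalized_step_weights(prices):
    """Absolute one-step relative returns, rescaled so they sum to 1."""
    prices = np.asarray(prices, dtype=float)
    abs_rel_returns = np.abs(np.diff(prices) / prices[:-1])
    total = abs_rel_returns.sum()
    if total == 0:
        # Flat price series: fall back to equal weights.
        return np.full(len(abs_rel_returns), 1.0 / len(abs_rel_returns))
    return abs_rel_returns / total

# The four prices printed in the step-through cells of this notebook.
weights = normalized_step_weights([172.19, 175.08, 175.53, 172.19])
```

A printed sum of `0.9999999999999999` rather than `1.0` is ordinary floating-point rounding, so comparisons should use a tolerance such as `np.isclose`.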


In [None]:
e.

In [10]:
.2* np.sign(10.02)

0.2

In [21]:
result_df = pd.read_csv(DEFAULT_PATH+"/meta_df_transfer.csv")
result_df.groupby(['env_version','agent_name']).mean(numeric_only=True).T

env_version,v0,v0,v1,v1
agent_name,A2C,PPO,A2C,PPO
mean_return,0.0005378421,0.0005378421,0.0005378421,0.0005378421
median_return,0.001570828,0.001570828,0.001570828,0.001570828
std_return,0.01524706,0.01524706,0.01524706,0.01524706
skew_return,-0.3313888,-0.3313888,-0.3313888,-0.3313888
kurtosis_return,1.625025,1.625025,1.625025,1.625025
return_trend,2.421621e-05,2.421621e-05,2.421621e-05,2.421621e-05
ewm_mean_return,0.001588245,0.001588245,0.001588245,0.001588245
hurst,0.4965619,0.4965619,0.4965619,0.4965619
adf_stat,-1.457884,-1.457884,-1.457884,-1.457884
adf_pval,0.5049387,0.5049387,0.5049387,0.5049387


In [36]:
import json
best_transferable = result_df.sort_values(by="transfer_delta",ascending=False).iloc[0]
best_transferable_config = json.loads(best_transferable['config'])

{'ticker': 'AAPL',
 'train_idx': 355,
 'test_idx': 475,
 'timesteps': 100000,
 'episode_steps': 120,
 'env_version': 'v1',
 'env_config': {'market_features': ['close']},
 'agent_config': None}

In [55]:
best_transferable2 = result_df.sort_values(by="transfer_delta",ascending=False).iloc[10]
best_transferable_config2 = json.loads(best_transferable2['config'])
best_transferable_config2

{'ticker': 'AAPL',
 'train_idx': 360,
 'test_idx': 480,
 'timesteps': 100000,
 'episode_steps': 120,
 'env_version': 'v1',
 'env_config': {'market_features': ['close']},
 'agent_config': None}

Here’s a full breakdown of each column in your benchmark dataframe and what it represents:

---

## 📊 **DataFrame Column Meanings**

| Column                | Type  | Description                                               |
| --------------------- | ----- | --------------------------------------------------------- |
| **mean\_return**      | float | Mean of 1-day returns over the episode (market baseline)  |
| **median\_return**    | float | Median daily return over the episode                      |
| **std\_return**       | float | Standard deviation of returns (volatility estimate)       |
| **skew\_return**      | float | Skewness of returns (positive = long tail on right)       |
| **kurtosis\_return**  | float | Kurtosis of returns (tail risk/heaviness)                 |
| **return\_trend**     | float | Slope of a linear trend fitted to the return series       |
| **ewm\_mean\_return** | float | Exponentially weighted moving average of returns          |
| **hurst**             | float | Hurst exponent (<0.5 mean-reverting, ~0.5 random walk, >0.5 trending) |
| **adf\_stat**         | float | Augmented Dickey-Fuller test statistic (stationarity)     |
| **adf\_pval**         | float | ADF test p-value (p < 0.05 = likely stationary)           |
| **entropy**           | float | Information entropy of the return series (chaotic = high) |

---

| Column                | Type   | Description                                          |
| --------------------- | ------ | ---------------------------------------------------- |
| **config\_hash**      | object | Unique ID for the agent + env config used            |
| **env\_version**      | object | Environment version (`v0`, `v1`, etc.)               |
| **agent\_name**       | object | Agent type used (e.g. `PPO`, `A2C`)                  |
| **score\_train**      | float  | Total normalized score (0–100) on training episode   |
| **score\_test**       | float  | Same as above, on test episode                       |
| **advantage\_train**  | float  | Agent score − random score on train                  |
| **advantage\_test**   | float  | Agent score − random score on test                   |
| **transfer\_delta**   | float  | `score_test − score_train` (generalization quality)  |
| **transfer\_success** | int    | 1 if advantage\_test > 0 (agent beat random), else 0 |
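
The derived columns in this table follow directly from four scores. The sketch below restates those definitions; the helper name `transfer_metrics` and the example scores are hypothetical.

```python
def transfer_metrics(score_train, score_test, random_train, random_test):
    """Derive the advantage and transfer columns from agent and random-baseline scores."""
    advantage_train = score_train - random_train
    advantage_test = score_test - random_test
    return {
        "advantage_train": advantage_train,
        "advantage_test": advantage_test,
        "transfer_delta": score_test - score_train,
        "transfer_success": int(advantage_test > 0),  # 1 if the agent beat random on test
    }

# Hypothetical scores on the 0-100 scale, for illustration only.
row = transfer_metrics(score_train=55.0, score_test=51.0, random_train=50.0, random_test=49.5)
```

Here the agent loses 4 points from train to test (`transfer_delta = -4.0`) yet still beats the random baseline on the test episode, so `transfer_success` is 1.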

---

| Column             | Type   | Description                                |
| ------------------ | ------ | ------------------------------------------ |
| **ticker**         | object | Stock symbol (e.g., AAPL, MSFT)            |
| **seed**           | int    | Random seed for episode sampling           |
| **train\_idx**     | int    | Index of training episode start            |
| **test\_idx**      | int    | Index of test episode start                |
| **timesteps**      | int    | Number of agent training steps             |
| **episode\_steps** | int    | Number of environment steps in the episode |

---

| Column            | Type  | Description                        |
| ----------------- | ----- | ---------------------------------- |
| **volatility**    | float | Std. dev. of episode price returns |
| **max\_drawdown** | float | Max % drop from a peak to trough   |
| **sharpe**        | float | Return / Volatility                |
| **sortino**       | float | Return / Downside Volatility       |
| **calmar**        | float | Return / Max Drawdown              |
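
These ratios can be computed from a series of per-step returns as sketched below. This is an assumption-laden illustration: `risk_metrics` is a hypothetical helper, nothing is annualized, downside volatility is taken as the root mean square of negative returns, and the benchmark's own implementation may differ.

```python
import numpy as np

def risk_metrics(returns):
    """Unannualized risk metrics from a sequence of simple per-step returns."""
    returns = np.asarray(returns, dtype=float)
    # Equity curve starting from 1.0 so a loss on the first step counts as drawdown.
    equity = np.concatenate([[1.0], np.cumprod(1.0 + returns)])
    running_peak = np.maximum.accumulate(equity)
    max_drawdown = float(np.min(equity / running_peak - 1.0))   # negative or zero
    mean_return = float(returns.mean())
    volatility = float(returns.std())
    downside = returns[returns < 0]
    downside_vol = float(np.sqrt(np.mean(downside ** 2))) if downside.size else float("nan")
    return {
        "volatility": volatility,
        "max_drawdown": max_drawdown,
        "sharpe": mean_return / volatility if volatility > 0 else float("nan"),
        "sortino": mean_return / downside_vol if downside.size else float("nan"),
        "calmar": mean_return / abs(max_drawdown) if max_drawdown < 0 else float("nan"),
    }

# Hypothetical returns, for illustration only.
metrics = risk_metrics([0.10, -0.50, 0.20])
```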

---

| Column                           | Type   | Description                                                                            |
| -------------------------------- | ------ | -------------------------------------------------------------------------------------- |
| **success\_trades**              | int    | Number of profitable trades in the episode                                             |
| **action\_hold\_ratio**          | float  | % of actions that were "wait" (0)                                                      |
| **action\_long\_ratio**          | float  | % of actions that were "buy/sell" (1)                                                  |
| **agent\_config**                | object | (currently empty) placeholder for agent config JSON                                    |
| **env\_config.market\_features** | object | List of features used by the environment (e.g., `['close', 'volatility', 'momentum']`) |

---

## 🧠 Bonus Insights

* **score\_\* metrics** are aligned with your `0–100` normalized reward function.
* **advantage\_\* metrics** let you evaluate whether your agent truly outperforms a random strategy.
* **transfer\_delta** is key for assessing **overfitting vs. generalization**.
* **entropy, hurst, adf** are chaos/statistical meta-features, candidate predictors of when RL will perform well.
* **success\_trades** and **action ratios** tell you about the agent’s behavioral strategy.

---

Would you like a Markdown/CSV glossary of these definitions saved to disk or added to your project docs?


In [58]:
result_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 35 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   mean_return        200 non-null    float64
 1   median_return      200 non-null    float64
 2   std_return         200 non-null    float64
 3   skew_return        200 non-null    float64
 4   kurtosis_return    200 non-null    float64
 5   return_trend       200 non-null    float64
 6   ewm_mean_return    200 non-null    float64
 7   hurst              200 non-null    float64
 8   adf_stat           200 non-null    float64
 9   adf_pval           200 non-null    float64
 10  entropy            200 non-null    float64
 11  config_hash        200 non-null    object 
 12  env_version        200 non-null    object 
 13  agent_name         200 non-null    object 
 14  score_train        200 non-null    float64
 15  score_test         200 non-null    float64
 16  advantage_train    200 non

In [68]:


class TradingEnvironmentBenchmark:
    benchmark_path=DEFAULT_PATH+"/benchmark_episodes.json"
    result_path=DEFAULT_PATH+"/meta_df_transfer.csv"
    checkpoint_dir=DEFAULT_PATH+"/checkpoints"
    
    def __init__(self):
        self._boot()
        
    def _boot(self):
        self._load_results()
        self._load_ohlcv()
        
    def _load_results(self):
        # Get results
        result_df = pd.read_csv(self.result_path)
        
        # Parse configuration
        result_df['config_dict'] = result_df['config'].apply(json.loads)
        
        # Normalize the JSON dictionaries into a DataFrame
        config_expanded_df = pd.json_normalize(result_df['config_dict'])
        
        # Remove overlapping columns
        overlapping_cols = set(result_df.columns).intersection(config_expanded_df.columns)
        config_expanded_df = config_expanded_df.drop(columns=overlapping_cols)
        
        # Join the expanded config to the original DataFrame
        result_df_expanded = pd.concat([
            result_df.drop(columns=['config', 'config_dict']),
            config_expanded_df
        ], axis=1)
        self.result_df = result_df_expanded
        
        
    def _load_ohlcv(self):
        self.ohlcv_df = OHLCV_DF.copy()
        
        

In [69]:
bm = TradingEnvironmentBenchmark()


In [74]:
bm.result_df.groupby(['train_idx']).mean(numeric_only=True).sort_values(by="advantage_test",ascending=False)

train_idx,mean_return,median_return,std_return,skew_return,kurtosis_return,return_trend,ewm_mean_return,hurst,adf_stat,adf_pval,...,timesteps,episode_steps,volatility,max_drawdown,sharpe,sortino,calmar,success_trades,action_hold_ratio,action_long_ratio
615,0.001054,0.002034,0.013287,-0.611195,1.119947,1.6e-05,0.005045,0.339827,-1.703688,0.429213,...,100000.0,120.0,0.015863,-0.140452,-0.387645,-0.533045,-0.002758,28.48,0.508475,0.491525
355,0.000607,0.00216,0.01218,-0.82875,1.622443,1.6e-05,0.004138,0.356515,-1.731875,0.414769,...,100000.0,120.0,0.011093,-0.085458,-1.030301,-1.678216,-0.008425,28.5,0.496949,0.503051
360,0.00035,0.001797,0.012033,-0.827952,1.739689,2e-05,0.000996,0.541022,-1.731129,0.41515,...,100000.0,120.0,0.011304,-0.085559,-1.026879,-1.731934,-0.008546,29.06,0.501356,0.498644
71,-0.0008,0.0,0.022876,-0.365616,-0.173244,1.4e-05,-0.001585,0.505479,-1.90652,0.329,...,100000.0,120.0,0.025535,-0.190702,-0.610527,-1.267244,-0.00515,28.88,0.512542,0.487458
528,0.001479,0.001864,0.01586,0.976569,3.816292,5.5e-05,-0.000652,0.739966,-0.216208,0.936562,...,100000.0,120.0,0.01254,-0.058619,2.294734,3.132475,0.030924,28.62,0.507458,0.492542


In [75]:
bm.result_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 250 entries, 0 to 249
Data columns (total 36 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   mean_return                 250 non-null    float64
 1   median_return               250 non-null    float64
 2   std_return                  250 non-null    float64
 3   skew_return                 250 non-null    float64
 4   kurtosis_return             250 non-null    float64
 5   return_trend                250 non-null    float64
 6   ewm_mean_return             250 non-null    float64
 7   hurst                       250 non-null    float64
 8   adf_stat                    250 non-null    float64
 9   adf_pval                    250 non-null    float64
 10  entropy                     250 non-null    float64
 11  config_hash                 250 non-null    object 
 12  env_version                 250 non-null    object 
 13  agent_name                  250 non

In [None]:
import numpy as np
import pandas as pd
from scipy.stats import pearsonr, spearmanr
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings("ignore")

def analyze_trade_level_alignment(env):
    """
    Analyzes the alignment between rewards and wallet growth on a per-trade basis.
    env must have:
        - env.rewards (cumulative reward per step)
        - env.wallet_progress (wallet value per step)
        - env.actions (0=hold, 1=buy/sell depending on current position)
    """

    rewards = np.array(env.rewards)
    wallet_progress = np.array(env.wallet_progress)
    actions = np.array(env.actions)

    trade_rewards = []
    trade_returns = []

    curr_trade_reward = 0
    prev_wallet = wallet_progress[0]
    in_trade = False

    for t in range(1, len(actions)):
        curr_action = actions[t]
        prev_action = actions[t - 1]
        curr_reward = rewards[t] - rewards[t - 1]  # delta reward

        if prev_action == 0 and curr_action == 1:
            # Entering a trade
            in_trade = True
            curr_trade_reward = 0

        if in_trade:
            curr_trade_reward += curr_reward

        if prev_action == 1 and curr_action == 0:
            # Exiting a trade
            in_trade = False
            wallet_return = wallet_progress[t] - prev_wallet
            trade_rewards.append(curr_trade_reward)
            trade_returns.append(wallet_return)
            prev_wallet = wallet_progress[t]

    trade_rewards = np.array(trade_rewards)
    trade_returns = np.array(trade_returns)

    # Basic stats (correlations need at least two completed trades)
    if len(trade_rewards) >= 2:
        pearson_corr, _ = pearsonr(trade_rewards, trade_returns)
        spearman_corr, _ = spearmanr(trade_rewards, trade_returns)
    else:
        pearson_corr, spearman_corr = np.nan, np.nan

    # Plot
    plt.figure(figsize=(8, 5))
    sns.scatterplot(x=trade_rewards, y=trade_returns)
    plt.xlabel("Total Reward per Trade")
    plt.ylabel("Wallet Return per Trade")
    plt.title("Reward vs Wallet Return (Per Trade)")
    plt.grid(True)
    plt.axhline(0, linestyle="--", color="gray")
    plt.axvline(0, linestyle="--", color="gray")
    plt.show()

    return {
        "num_trades": len(trade_rewards),
        "pearson_corr": pearson_corr,
        "spearman_corr": spearman_corr,
        "avg_reward": np.mean(trade_rewards) if len(trade_rewards) else 0,
        "avg_return": np.mean(trade_returns) if len(trade_returns) else 0,
        "trade_rewards": trade_rewards,
        "trade_returns": trade_returns
    }


In [77]:
import os
import json
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestRegressor

class TradingEnvironmentBenchmark:
    def __init__(self, 
                 benchmark_path="data/experiments/trading_environment_development/benchmark_episodes.json",
                 result_path="data/experiments/trading_environment_development/meta_df_transfer.csv",
                 checkpoint_dir="data/experiments/trading_environment_development/checkpoints",
                 ohlcv_df=None):
        self.benchmark_path = benchmark_path
        self.result_path = result_path
        self.checkpoint_dir = checkpoint_dir
        self.ohlcv_df = ohlcv_df
        self._boot()

    def _boot(self):
        self._load_results()
        self._load_ohlcv()

    def _load_results(self):
        result_df = pd.read_csv(self.result_path)
        result_df['config_dict'] = result_df['config'].apply(json.loads)
        config_expanded_df = pd.json_normalize(result_df['config_dict'])
        overlapping_cols = set(result_df.columns).intersection(config_expanded_df.columns)
        config_expanded_df = config_expanded_df.drop(columns=overlapping_cols)
        result_df_expanded = pd.concat([
            result_df.drop(columns=['config', 'config_dict']),
            config_expanded_df
        ], axis=1)
        self.result_df = result_df_expanded
        self._compute_scores()

    def _load_ohlcv(self):
        if self.ohlcv_df is None:
            raise ValueError("ohlcv_df must be provided externally")

    def _compute_scores(self):
        df = self.result_df.copy()
        df['learnability_score'] = 100 * self._sigmoid(df['advantage_train'])
        df['transferability_score'] = 100 * self._sigmoid(df['advantage_test']) * (1 - abs(df['transfer_delta'])/100)
        df['difficulty_score'] = 100 - df['learnability_score']
        self.result_df = df

    def _sigmoid(self, x, scale=1):
        return 1 / (1 + np.exp(-x / scale))

    def rank_episode_learnability(self):
        return self.result_df.sort_values(by="learnability_score", ascending=False)

    def rank_episode_transferability(self):
        return self.result_df.sort_values(by="transferability_score", ascending=False)

    def get_episodes_report(self):
        return self.result_df[['ticker', 'train_idx', 'test_idx', 'env_version', 'agent_name',
                               'learnability_score', 'transferability_score', 'difficulty_score']]

    def rank_environment_performance(self):
        return self.result_df.groupby('env_version')[['learnability_score', 'transferability_score']].mean().sort_values(by="learnability_score", ascending=False)

    def load_environments_by_performance(self):
        return self.result_df.groupby('env_version')

    def get_environments_report(self):
        return self.rank_environment_performance()

    def rank_agent_performance(self):
        return self.result_df.groupby('agent_name')[['learnability_score', 'transferability_score']].mean().sort_values(by="learnability_score", ascending=False)

    def load_agents_by_performance(self):
        return self.result_df.groupby('agent_name')

    def get_agents_report(self):
        return self.rank_agent_performance()

    def train_meta_models(self):
        meta_cols = ['entropy', 'volatility', 'hurst', 'adf_pval', 'mean_return', 'return_trend', 'kurtosis_return', 'skew_return', 'max_drawdown']
        df = self.result_df.dropna(subset=meta_cols)
        X = df[meta_cols]

        self.learn_model = RandomForestRegressor(random_state=42).fit(X, df['learnability_score'])
        self.transfer_model = RandomForestRegressor(random_state=42).fit(X, df['transferability_score'])

    def predict_episode_success(self, episode_features: pd.DataFrame):
        if not hasattr(self, 'learn_model') or not hasattr(self, 'transfer_model'):
            raise ValueError("Meta models not trained. Call train_meta_models() first.")

        learn_pred = self.learn_model.predict(episode_features)
        transfer_pred = self.transfer_model.predict(episode_features)

        return pd.DataFrame({
            'predicted_learnability': learn_pred,
            'predicted_transferability': transfer_pred
        }, index=episode_features.index)

    def filter_promising_episodes(self, learn_thresh=60, transfer_thresh=60):
        predictions = self.predict_episode_success(self.result_df[[
            'entropy', 'volatility', 'hurst', 'adf_pval', 'mean_return', 'return_trend', 'kurtosis_return', 'skew_return', 'max_drawdown'
        ]])
        mask = (predictions['predicted_learnability'] >= learn_thresh) & (predictions['predicted_transferability'] >= transfer_thresh)
        return self.result_df[mask].copy()
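
To make the scoring in `_compute_scores` easier to reason about, here is a standalone restatement of the same three formulas. The function names `sigmoid` and `episode_scores` and the input values are illustrative only.

```python
import numpy as np

def sigmoid(x, scale=1.0):
    return 1.0 / (1.0 + np.exp(-x / scale))

def episode_scores(advantage_train, advantage_test, transfer_delta):
    """Same formulas as _compute_scores, applied to a single episode."""
    learnability = 100.0 * sigmoid(advantage_train)
    # The test advantage is discounted by the size of the train/test score gap.
    transferability = 100.0 * sigmoid(advantage_test) * (1.0 - abs(transfer_delta) / 100.0)
    return {
        "learnability_score": float(learnability),
        "transferability_score": float(transferability),
        "difficulty_score": float(100.0 - learnability),
    }

# Hypothetical inputs: no advantage over random and a 10-point train/test gap.
scores = episode_scores(advantage_train=0.0, advantage_test=0.0, transfer_delta=-10.0)
```

An advantage of 0 maps to a score of 50, so values above 50 mean the agent beat the random baseline. Because the sigmoid saturates, advantages beyond a few points all land close to 100.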


In [79]:
bm = TradingEnvironmentBenchmark(ohlcv_df=OHLCV_DF.copy())
bm.train_meta_models()

In [81]:
bm.result_df[['entropy', 'volatility', 'hurst', 'adf_pval', 'mean_return', 'return_trend', 'kurtosis_return', 'skew_return', 'max_drawdown']]

Unnamed: 0,entropy,volatility,hurst,adf_pval,mean_return,return_trend,kurtosis_return,skew_return,max_drawdown
0,0.999793,0.015863,0.339827,0.429213,0.001054,0.000016,1.119947,-0.611195,-0.140452
1,0.998134,0.011304,0.541022,0.415150,0.000350,0.000020,1.739689,-0.827952,-0.085559
2,0.994813,0.012540,0.739966,0.936562,0.001479,0.000055,3.816292,0.976569,-0.058619
3,0.994813,0.025535,0.505479,0.329000,-0.000800,0.000014,-0.173244,-0.365616,-0.190702
4,0.983149,0.011093,0.356515,0.414769,0.000607,0.000016,1.622443,-0.828750,-0.085458
...,...,...,...,...,...,...,...,...,...
245,0.994813,0.015863,0.339827,0.429213,0.001054,0.000016,1.119947,-0.611195,-0.140452
246,0.994813,0.011304,0.541022,0.415150,0.000350,0.000020,1.739689,-0.827952,-0.085559
247,0.964690,0.012540,0.739966,0.936562,0.001479,0.000055,3.816292,0.976569,-0.058619
248,0.998134,0.025535,0.505479,0.329000,-0.000800,0.000014,-0.173244,-0.365616,-0.190702


In [82]:
bm.result_df['train_idx'].unique()

array([615, 360, 528,  71, 355], dtype=int64)

In [None]:
predict_episode_success

In [85]:
bm.result_df[bm.result_df['train_idx']==615].groupby('train_idx').mean(numeric_only=True)

train_idx,mean_return,median_return,std_return,skew_return,kurtosis_return,return_trend,ewm_mean_return,hurst,adf_stat,adf_pval,...,max_drawdown,sharpe,sortino,calmar,success_trades,action_hold_ratio,action_long_ratio,learnability_score,transferability_score,difficulty_score
615,0.001054,0.002034,0.013287,-0.611195,1.119947,1.6e-05,0.005045,0.339827,-1.703688,0.429213,...,-0.140452,-0.387645,-0.533045,-0.002758,28.48,0.508475,0.491525,73.342748,56.261322,26.657252


In [87]:
bm.predict_episode_success(bm.result_df[bm.result_df['train_idx']==615].groupby('train_idx').mean(numeric_only=True)[['entropy', 'volatility', 'hurst', 'adf_pval', 'mean_return', 'return_trend', 'kurtosis_return', 'skew_return', 'max_drawdown']])

train_idx,predicted_learnability,predicted_transferability
615,85.816943,46.56071


In [88]:
bm.get_episodes_report()

Unnamed: 0,ticker,train_idx,test_idx,env_version,agent_name,learnability_score,transferability_score,difficulty_score
0,AAPL,615,735,v0,PPO,62.397848,55.632252,37.602152
1,AAPL,360,480,v0,PPO,66.549706,68.023764,33.450294
2,AAPL,528,648,v0,PPO,27.523316,57.981017,72.476684
3,AAPL,71,191,v0,PPO,53.420696,45.924451,46.579304
4,AAPL,355,475,v0,PPO,24.665842,73.202938,75.334158
...,...,...,...,...,...,...,...,...
245,AAPL,615,735,v2,PPO,96.590628,56.328982,3.409372
246,AAPL,360,480,v2,PPO,84.752934,12.775701,15.247066
247,AAPL,528,648,v2,PPO,99.340476,41.610513,0.659524
248,AAPL,71,191,v2,PPO,86.645398,58.018011,13.354602


In [95]:
bm.result_df.corr(numeric_only=True)['transferability_score'].sort_values(ascending=False)

transferability_score    1.000000
advantage_test           0.995006
score_test               0.536169
transfer_delta           0.386415
ewm_mean_return          0.233998
train_idx                0.142617
test_idx                 0.142617
median_return            0.127622
difficulty_score         0.116357
mean_return              0.090071
action_hold_ratio        0.048027
transfer_success         0.021600
success_trades          -0.001488
seed                    -0.022267
max_drawdown            -0.030415
entropy                 -0.038068
volatility              -0.045747
kurtosis_return         -0.047032
action_long_ratio       -0.048027
sortino                 -0.059369
adf_pval                -0.073045
sharpe                  -0.075672
calmar                  -0.080430
adf_stat                -0.080943
return_trend            -0.102098
skew_return             -0.106448
advantage_train         -0.112502
learnability_score      -0.116357
std_return              -0.118849
hurst         

In [99]:
# Re-run after kernel reset
import os
import json
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

class TransferabilityPredictor:
    def __init__(self, meta_path, model_path="transfer_model.pkl", threshold=50):
        self.meta_path = meta_path
        self.model_path = model_path
        self.threshold = threshold
        self.df = None
        self.model = None
        self.feature_cols = []
        self._load_meta_df()
    
    def _load_meta_df(self):
        self.df = pd.read_csv(self.meta_path)
        # Fall back to advantage_test when transferability_score is not in the CSV
        if 'transferability_score' not in self.df.columns:
            self.df['transferability_score'] = self.df['advantage_test']
        self.df['transferable'] = (self.df['transferability_score'] > self.threshold).astype(int)
        self.feature_cols = [
            'mean_return', 'median_return', 'std_return', 'skew_return',
            'kurtosis_return', 'return_trend', 'ewm_mean_return', 'hurst',
            'adf_stat', 'adf_pval', 'entropy', 'volatility', 'max_drawdown',
            'sharpe', 'sortino', 'calmar', 'success_trades',
            'action_hold_ratio', 'action_long_ratio'
        ]
        self.df = self.df.dropna(subset=self.feature_cols + ['transferable'])

    def train_model(self):
        X = self.df[self.feature_cols]
        y = self.df['transferable']
        X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

        self.model = RandomForestClassifier(n_estimators=100, random_state=42)
        self.model.fit(X_train, y_train)

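        y_pred = self.model.predict(X_val)
        # predict_proba has a single column when training saw only one class,
        # so ROC AUC is undefined in that case.
        if len(self.model.classes_) == 2:
            auc = roc_auc_score(y_val, self.model.predict_proba(X_val)[:, 1])
        else:
            auc = float("nan")

        print("Classification Report:\n", classification_report(y_val, y_pred))
        print("ROC AUC Score:", auc)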

        joblib.dump(self.model, self.model_path)

    def predict_transferability(self, new_meta_df):
        if self.model is None:
            self.model = joblib.load(self.model_path)
        X_new = new_meta_df[self.feature_cols]
        new_meta_df["transferable_pred"] = self.model.predict(X_new)
        new_meta_df["transfer_proba"] = self.model.predict_proba(X_new)[:, 1]
        return new_meta_df

    def rank_episodes_by_transferability(self):
        return self.df.sort_values(by="transferability_score", ascending=False)[
            ['ticker', 'train_idx', 'test_idx', 'transferability_score']
        ]

# Execute
predictor = TransferabilityPredictor(meta_path="data/experiments/trading_environment_development/meta_df_transfer.csv")
predictor.train_model()
top_episodes = predictor.rank_episodes_by_transferability()
import ace_tools_open as tools
tools.display_dataframe_to_user(name="Ranked Transferable Episodes", dataframe=top_episodes)


Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        50

    accuracy                           1.00        50
   macro avg       1.00      1.00      1.00        50
weighted avg       1.00      1.00      1.00        50

ROC AUC Score: nan




Ranked Transferable Episodes




In [109]:
predictor.df['transferability_score'].describe()

count    250.000000
mean      -0.050613
std        0.874191
min       -2.044696
25%       -0.645093
50%       -0.056160
75%        0.542423
max        2.735840
Name: transferability_score, dtype: float64

In [105]:
predictor.df['transferable']

0      0
1      0
2      0
3      0
4      0
      ..
245    0
246    0
247    0
248    0
249    0
Name: transferable, Length: 250, dtype: int32
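
The `describe()` output above shows scores between roughly -2.04 and 2.74, so a fixed `threshold=50` labels every row 0. That is why the classification report contains a single class and the ROC AUC is `nan`. A minimal sketch of a quantile-based label on synthetic scores (the data and the helper name `label_by_quantile` are assumptions for illustration, not part of the notebook):

```python
import numpy as np
import pandas as pd

def label_by_quantile(scores: pd.Series, q: float = 0.5) -> pd.Series:
    """Label rows 1 when their score is strictly above the q-th quantile."""
    cutoff = scores.quantile(q)
    return (scores > cutoff).astype(int)

# Synthetic stand-in for advantage_test, on roughly the same scale as above.
rng = np.random.default_rng(0)
scores = pd.Series(rng.normal(loc=0.0, scale=0.9, size=250), name="transferability_score")

fixed = (scores > 50).astype(int)           # fixed threshold on the wrong scale
by_median = label_by_quantile(scores, 0.5)  # median split

print("fixed threshold, distinct classes:", fixed.nunique())
print("median split, class counts:", by_median.value_counts().to_dict())
```

Because the cutoff is derived from the data, both classes are populated whatever the scale of the score.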

In [None]:
class TradingEnvironmentBattleground:
    """
    * Trains different environments with different agents over different episodes.
    * Logs their performance after a predefined number of steps, so that episode and environment learnability and transferability metrics can be analyzed later.
    * Creates benchmark episodes so that every agent and environment is trained and tested under the same conditions.
    """
    def __init__(self):
        pass
    
    def compare_environments(self):
        pass
    
    def evaluate(self): 
        #formalized_transferability_evaluation
        pass

    def _compute_additional_metrics(self):
        pass
    
    def load_benchmark_episodes(self):
        # if os.path.exists(BENCHMARK_PATH):
        #     with open(BENCHMARK_PATH) as f:
        #         benchmark_episodes = json.load(f)
        # else:
        #     print("[INFO] Sampling benchmark episodes...")
        #     np.random.seed(0)
        #     benchmark_episodes = sample_valid_episodes(OHLCV_DF[OHLCV_DF['symbol']==TICKER], TICKER, N_TIMESTEPS, LOOKBACK, EVAL_EPISODES)
        #     with open(BENCHMARK_PATH, "w") as f:
        #         json.dump(benchmark_episodes.tolist(), f)  # convert the array to a list for JSON
        pass
        
    def _compute_aditional_benchmark_episodes(self):
        # Creates more benchmark episodes
        pass 
    
    def load_experiment(self, config_hash):
        # Loads the trained model
        # Loads the environment it used
        # Returns the model, the environment, and the dataframe row with all the data
        pass
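
The commented-out logic in `load_benchmark_episodes` amounts to a load-or-sample cache: reuse the saved episodes if the file exists, otherwise sample them once with a fixed seed and save them. A minimal sketch, assuming a hypothetical `sample_fn` callable stands in for `sample_valid_episodes` and that episodes are plain integer start indices:

```python
import json
import os
import random
import tempfile

def load_or_create_benchmark(path, sample_fn, n_episodes, seed=0):
    """Return cached benchmark episodes, sampling and saving them on first use."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    random.seed(seed)  # fixed seed so every agent is evaluated on the same episodes
    episodes = list(sample_fn(n_episodes))
    with open(path, "w") as f:
        json.dump(episodes, f)
    return episodes

def toy_sampler(n):
    # Hypothetical sampler: n distinct start indices from a 1000-bar series.
    return sorted(random.sample(range(1000), n))

with tempfile.TemporaryDirectory() as tmp:
    bench_path = os.path.join(tmp, "benchmark_episodes.json")
    first = load_or_create_benchmark(bench_path, toy_sampler, n_episodes=5)
    second = load_or_create_benchmark(bench_path, toy_sampler, n_episodes=5)
    print(first == second)  # the second call reads the cached file
```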

In [None]:
import os
import json
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestRegressor

class TradingEnvironmentBenchmark:
    """
    * Loads the results produced by TradingEnvironmentBattleground evaluation runs.
    """
    def __init__(self, 
                 benchmark_path="data/experiments/trading_environment_development/benchmark_episodes.json",
                 result_path="data/experiments/trading_environment_development/meta_df_transfer.csv",
                 checkpoint_dir="data/experiments/trading_environment_development/checkpoints",
                 ohlcv_df=None):
        self.benchmark_path = benchmark_path
        self.result_path = result_path
        self.checkpoint_dir = checkpoint_dir
        self.ohlcv_df = ohlcv_df
        self._boot()

    def _boot(self):
        self._load_results()
        self._load_ohlcv()

    def _load_results(self):
        result_df = pd.read_csv(self.result_path)
        result_df['config_dict'] = result_df['config'].apply(json.loads)
        config_expanded_df = pd.json_normalize(result_df['config_dict'])
        overlapping_cols = set(result_df.columns).intersection(config_expanded_df.columns)
        config_expanded_df = config_expanded_df.drop(columns=overlapping_cols)
        result_df_expanded = pd.concat([
            result_df.drop(columns=['config', 'config_dict']),
            config_expanded_df
        ], axis=1)
        self.result_df = result_df_expanded
        self._compute_scores()

    def _load_ohlcv(self):
        if self.ohlcv_df is None:
            raise ValueError("ohlcv_df must be provided externally")

    def _compute_scores(self):
        df = self.result_df.copy()
        df['learnability_score'] = 100 * self._sigmoid(df['advantage_train'])
        df['transferability_score'] = 100 * self._sigmoid(df['advantage_test']) * (1 - abs(df['transfer_delta'])/100)
        df['difficulty_score'] = 100 - df['learnability_score']
        self.result_df = df

    def _sigmoid(self, x, scale=1):
        return 1 / (1 + np.exp(-x / scale))

    def rank_episode_learnability(self):
        return self.result_df.sort_values(by="learnability_score", ascending=False)

    def rank_episode_transferability(self):
        return self.result_df.sort_values(by="transferability_score", ascending=False)

    def get_episodes_report(self):
        return self.result_df[['ticker', 'train_idx', 'test_idx', 'env_version', 'agent_name',
                               'learnability_score', 'transferability_score', 'difficulty_score']]

    def rank_environment_performance(self):
        return self.result_df.groupby('env_version')[['learnability_score', 'transferability_score']].mean().sort_values(by="learnability_score", ascending=False)

    def load_environments_by_performance(self):
        return self.result_df.groupby('env_version')

    def get_environments_report(self):
        return self.rank_environment_performance()

    def rank_agent_performance(self):
        return self.result_df.groupby('agent_name')[['learnability_score', 'transferability_score']].mean().sort_values(by="learnability_score", ascending=False)

    def load_agents_by_performance(self):
        return self.result_df.groupby('agent_name')

    def get_agents_report(self):
        return self.rank_agent_performance()

    # Episode-level features shared by the meta-models and the episode filter
    META_COLS = ['entropy', 'volatility', 'hurst', 'adf_pval', 'mean_return',
                 'return_trend', 'kurtosis_return', 'skew_return', 'max_drawdown']

    def train_meta_models(self):
        df = self.result_df.dropna(subset=self.META_COLS)
        X = df[self.META_COLS]

        self.learn_model = RandomForestRegressor(random_state=42).fit(X, df['learnability_score'])
        self.transfer_model = RandomForestRegressor(random_state=42).fit(X, df['transferability_score'])

    def predict_episode_success(self, episode_features: pd.DataFrame):
        if not hasattr(self, 'learn_model') or not hasattr(self, 'transfer_model'):
            raise ValueError("Meta models not trained. Call train_meta_models() first.")

        learn_pred = self.learn_model.predict(episode_features)
        transfer_pred = self.transfer_model.predict(episode_features)

        return pd.DataFrame({
            'predicted_learnability': learn_pred,
            'predicted_transferability': transfer_pred
        }, index=episode_features.index)

    def filter_promising_episodes(self, learn_thresh=60, transfer_thresh=60):
        # Predict only on rows with complete features, mirroring train_meta_models
        df = self.result_df.dropna(subset=self.META_COLS)
        predictions = self.predict_episode_success(df[self.META_COLS])
        mask = (predictions['predicted_learnability'] >= learn_thresh) & (predictions['predicted_transferability'] >= transfer_thresh)
        # Rows with missing features get no prediction and are excluded
        mask = mask.reindex(self.result_df.index, fill_value=False)
        return self.result_df[mask].copy()
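
To make the scoring in `_compute_scores` concrete, here is the same arithmetic applied to three made-up rows. The values are illustrative and not taken from `meta_df_transfer.csv`, and the sketch assumes `transfer_delta` is expressed in points on a 0-100 scale:

```python
import numpy as np
import pandas as pd

def sigmoid(x, scale=1.0):
    return 1.0 / (1.0 + np.exp(-x / scale))

toy = pd.DataFrame({
    "advantage_train": [0.0, 2.0, -2.0],
    "advantage_test": [0.0, 2.0, 2.0],
    "transfer_delta": [0.0, 0.0, 50.0],
})
# Same formulas as _compute_scores, applied column by column.
toy["learnability_score"] = 100 * sigmoid(toy["advantage_train"])
toy["transferability_score"] = (
    100 * sigmoid(toy["advantage_test"]) * (1 - toy["transfer_delta"].abs() / 100)
)
toy["difficulty_score"] = 100 - toy["learnability_score"]
print(toy.round(2))
```

A zero advantage maps to a score of 50, and a `transfer_delta` of 50 halves the transferability score. Under this formula an absolute `transfer_delta` above 100 would make the score negative, so clipping may be worth considering if such values can occur.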


In [115]:
# Re-run after kernel reset
import os
import json
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

class EpisodePrevisibilityPredictor:
    def __init__(self, meta_path, model_path=CHECKPOINT_DIR+"/transfer_model.pkl"):
        ...

    def _load_meta_df(self):
        ...

    def train_model(self):
        ...

    def predict_transferability(self):
        ...

    def predict_dificulty(self):
        ...

    def rank_episodes_by_transferability(self):
        ...

    def rank_episodes_by_dificulty(self):
        ...

    def rank_episode_overall_quality(self):
        ...

# Execute
predictor = TransferabilityPredictor(meta_path="data/experiments/trading_environment_development/meta_df_transfer.csv")
predictor.train_model()
top_episodes = predictor.rank_episodes_by_transferability()
import ace_tools_open as tools
tools.display_dataframe_to_user(name="Ranked Transferable Episodes", dataframe=top_episodes)



Classification Report:
               precision    recall  f1-score   support

           0       0.52      0.58      0.55        24
           1       0.57      0.50      0.53        26

    accuracy                           0.54        50
   macro avg       0.54      0.54      0.54        50
weighted avg       0.54      0.54      0.54        50

ROC AUC Score: 0.5104166666666667
Ranked Transferable Episodes




In [None]:
# Re-run after kernel reset
import os
import json
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

class DificultyPredictor:
    def __init__(self, meta_path, model_path=CHECKPOINT_DIR+"/dificulty_model.pkl", quantile_threshold=0.75):
        self.meta_path = meta_path
        self.model_path = model_path
        self.quantile_threshold = quantile_threshold
        self.df = None
        self.model = None
        self.feature_cols = []
        self._load_meta_df()
        

    def _load_meta_df(self):
        self.df = pd.read_csv(self.meta_path)
        if 'transferability_score' not in self.df.columns:
            self.df['transferability_score'] = self.df['advantage_test']
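        # Label by a median split on advantage_test so both classes are populated.
        # Disabled alternative: threshold transferability_score at self.quantile_threshold
        # (quantile_threshold is currently unused).
        # threshold_value = self.df['transferability_score'].quantile(self.quantile_threshold)
        # self.df['transferable'] = (self.df['transferability_score'] > threshold_value).astype(int)
        self.df['transferable'] = (self.df['advantage_test'] > self.df['advantage_test'].quantile(0.5)).astype(int)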
        self.feature_cols = [
            'mean_return', 'median_return', 'std_return', 'skew_return',
            'kurtosis_return', 'return_trend', 'ewm_mean_return', 'hurst',
            'adf_stat', 'adf_pval', 'entropy', 'volatility', 'max_drawdown',
            'sharpe', 'sortino', 'calmar', 'success_trades',
            'action_hold_ratio', 'action_long_ratio'
        ]
        self.df = self.df.dropna(subset=self.feature_cols + ['transferable'])

    def train_model(self):
        X = self.df[self.feature_cols]
        y = self.df['transferable']
        X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

        self.model = RandomForestClassifier(n_estimators=100, random_state=42)
        self.model.fit(X_train, y_train)

        y_pred = self.model.predict(X_val)
        y_prob = self.model.predict_proba(X_val)[:, 1]

        print("Classification Report:\n", classification_report(y_val, y_pred))
        print("ROC AUC Score:", roc_auc_score(y_val, y_prob))

        joblib.dump(self.model, self.model_path)

    def predict_transferability(self, new_meta_df):
        if self.model is None:
            self.model = joblib.load(self.model_path)
        X_new = new_meta_df[self.feature_cols]
        new_meta_df["transferable_pred"] = self.model.predict(X_new)
        new_meta_df["transfer_proba"] = self.model.predict_proba(X_new)[:, 1]
        return new_meta_df

    def rank_episodes_by_transferability(self):
        return self.df.sort_values(by="transferability_score", ascending=False)[
            ['ticker', 'train_idx', 'test_idx', 'transferability_score']
        ]
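
`train_model` scores a single 80/20 split, which leaves only 50 validation rows, so an AUC near 0.5 (like the 0.51 reported earlier) is hard to tell apart from noise. One option is stratified k-fold cross-validation, which reports a spread across folds. A minimal sketch on synthetic features (the data, the fold count, and the name `cv_auc` are assumptions; in the notebook the inputs would presumably be `self.df[self.feature_cols]` and `self.df['transferable']`):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def cv_auc(X, y, n_splits=5, seed=42):
    """Return the per-fold ROC AUC of a random forest under stratified k-fold CV."""
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

# Synthetic data: only the first feature carries signal about the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(250, 5))
y = (X[:, 0] + rng.normal(scale=1.0, size=250) > 0).astype(int)

aucs = cv_auc(X, y)
print("AUC per fold:", np.round(aucs, 3))
print(f"mean={aucs.mean():.3f} std={aucs.std():.3f}")
```

If the mean AUC on the real meta-features stays within about one standard deviation of 0.5, the features likely carry little information about the label as currently defined.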





In [119]:
result_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 36 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   mean_return        200 non-null    float64
 1   median_return      200 non-null    float64
 2   std_return         200 non-null    float64
 3   skew_return        200 non-null    float64
 4   kurtosis_return    200 non-null    float64
 5   return_trend       200 non-null    float64
 6   ewm_mean_return    200 non-null    float64
 7   hurst              200 non-null    float64
 8   adf_stat           200 non-null    float64
 9   adf_pval           200 non-null    float64
 10  entropy            200 non-null    float64
 11  config_hash        200 non-null    object 
 12  env_version        200 non-null    object 
 13  agent_name         200 non-null    object 
 14  score_train        200 non-null    float64
 15  score_test         200 non-null    float64
 16  advantage_train    200 non