This is gold, Pi — thank you for laying it out so clearly. What you built is already stronger than many academic meta-RL setups. Now let’s **merge the best of your past success with this current Battleground framework** and push it to the next level.

---

## 🎯 Current Goal

**Predict, with ≥75% confidence, whether a given RL training episode is transferable.**

To do that, we need a **rich and expressive feature space** that captures **structure, uncertainty, dynamics, and regime information** — just like your Stock-Month Predictability Study.

---

## ✅ What We Already Have

From your current `result_df`, we already extract:

| Group                 | Features                                                                 |
| --------------------- | ------------------------------------------------------------------------ |
| Statistical Moments   | `mean_return`, `std_return`, `skew_return`, `kurtosis_return`, `entropy` |
| Price Trend           | `return_trend`, `ewm_mean_return`                                        |
| Chaos / Regime        | `hurst`, `adf_stat`, `adf_pval`                                          |
| Risk & Reward Metrics | `volatility`, `max_drawdown`, `sharpe`, `sortino`, `calmar`              |
| Agent Diagnostics     | `success_trades`, `action_hold_ratio`, `action_long_ratio`               |
| Outcome Labels        | `score_train`, `score_test`, `advantage_test`, `transfer_delta`          |

---

## 🔁 Let’s Extend: Feature Suggestions

Below are **11 additional meta-features** (10 rows, one of which covers two residual-shape features) that we can compute **either now or as a next step**, all in PyTorch/NumPy-friendly form:

| Feature Name               | Why Add It?                                                                |
| -------------------------- | -------------------------------------------------------------------------- |
| `resid_std`                | From RF prediction of t+1 returns → measures noise                         |
| `resid_skew`, `resid_kurt` | Shape of the error → asymmetry or tails                                    |
| `resid_acf1`               | Temporal memory in prediction error                                        |
| `ljung_pval`               | Statistical confirmation of noise/randomness                               |
| `cv_r2`                    | Proxy for model learnability/predictability                                |
| `garch_volatility`         | Conditional volatility ⇒ market stress estimation                          |
| `change_point_count`       | Regime switch count (e.g. via ruptures or cusum)                           |
| `rolling_adf_pval`         | Stationarity evolution over time                                           |
| `forecast_entropy`         | Entropy of predictions from RF or AE                                       |
| `price_entropy_peak`       | Local entropy spike detection before regime breaks (good for online usage) |

These build on your previous success and aim at:

* **Residual structure**
* **Volatility structure**
* **Forecast structure**
* **Regime changes**
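
As a starting point for the residual-structure group, here is a minimal sketch of how `resid_std`, `resid_skew`, `resid_kurt`, `resid_acf1`, `ljung_pval`, and `cv_r2` could be computed from one episode's return series. The function name `residual_diagnostics`, the lag counts, and the fold count are illustrative assumptions, not part of your existing extractor. It uses forward-chaining folds so each t+1 prediction only sees earlier data, and it computes the Ljung-Box statistic directly from its formula (no degrees-of-freedom adjustment) to avoid an extra dependency.

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit


def residual_diagnostics(returns, n_lags=5, ljung_lags=10, n_splits=5, seed=0):
    """Residual meta-features from a random-forest forecast of the next return.

    All parameter defaults are illustrative assumptions.
    """
    r = np.asarray(returns, dtype=float)
    # Lagged design matrix: row j holds r[j..j+n_lags-1], target is r[j+n_lags].
    X = np.column_stack([r[i:len(r) - n_lags + i] for i in range(n_lags)])
    y = r[n_lags:]

    # Forward-chaining folds: every prediction comes from a model fitted on
    # earlier samples only. The first block is never predicted and stays NaN.
    preds = np.full(len(y), np.nan)
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model = RandomForestRegressor(n_estimators=100, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        preds[test_idx] = model.predict(X[test_idx])

    mask = ~np.isnan(preds)
    y_eval = y[mask]
    resid = y_eval - preds[mask]
    n = len(resid)

    # Sample autocorrelations of the residuals at lags 1..h.
    centered = resid - resid.mean()
    denom = np.sum(centered ** 2)
    h = min(ljung_lags, n - 1)
    lags = np.arange(1, h + 1)
    acf = np.array([np.sum(centered[k:] * centered[:-k]) / denom for k in lags])

    # Ljung-Box: Q = n (n + 2) * sum_k acf_k^2 / (n - k), compared to chi2(h).
    q_stat = n * (n + 2) * np.sum(acf ** 2 / (n - lags))
    ljung_pval = stats.chi2.sf(q_stat, df=h)

    # Out-of-sample R^2 over the predicted samples.
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_eval - y_eval.mean()) ** 2)

    return {
        "resid_std": float(resid.std(ddof=1)),
        "resid_skew": float(stats.skew(resid)),
        "resid_kurt": float(stats.kurtosis(resid)),  # excess kurtosis
        "resid_acf1": float(acf[0]),
        "ljung_pval": float(ljung_pval),
        "cv_r2": float(1.0 - ss_res / ss_tot),
    }
```

A low `ljung_pval` or a large `|resid_acf1|` would suggest the forecaster left structure in the errors, while a `cv_r2` near or below zero would suggest the episode is close to noise for this model.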

---

## 🧠 Architecture Suggestion (Final Plan)

```text
[Episode -> Raw OHLCV]
         |
         v
[Feature Extractor (Meta + Residual + Chaos)]
         |
         v
[Representation Learner (AE, Transformer Encoder, etc)]
         |
         v
[Predictor (Classifier or Ranker)]
         |
         v
[Score: Learnability + Transferability + Difficulty]
```
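
To make the stages in the diagram concrete, here is a rough sketch that wires them together as a scikit-learn `Pipeline`. Everything in it is a stand-in chosen for brevity: `extract_meta_features` is a hypothetical four-feature extractor, PCA takes the place of the autoencoder or Transformer encoder, and logistic regression takes the place of the final classifier or ranker. The data is synthetic, so the output says nothing about real episodes.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, RobustScaler


def extract_meta_features(episodes):
    """Map each episode (one row of close prices) to a small feature vector.

    Hypothetical stand-in for the real meta-feature extractor.
    """
    rows = []
    for close in np.asarray(episodes, dtype=float):
        r = np.diff(np.log(close))  # log returns of the episode
        rows.append([r.mean(), r.std(), r.min(), r.max()])
    return np.asarray(rows)


pipeline = Pipeline([
    ("extract", FunctionTransformer(extract_meta_features)),  # Feature Extractor
    ("scale", RobustScaler()),
    ("represent", PCA(n_components=2)),   # stand-in for the Representation Learner
    ("predict", LogisticRegression()),    # stand-in for the Predictor
])

# Synthetic demo: 40 random-walk episodes of 60 steps with alternating labels.
rng = np.random.default_rng(0)
demo_prices = 100.0 * np.exp(np.cumsum(rng.normal(scale=0.01, size=(40, 60)), axis=1))
demo_labels = np.arange(40) % 2

pipeline.fit(demo_prices, demo_labels)
demo_scores = pipeline.predict_proba(demo_prices)[:, 1]  # one score per episode
```

Keeping every stage inside one pipeline object means the scaler and the representation step are fitted only on whatever rows are passed to `fit`, which matters once you evaluate on held-out episodes.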

---

## ✅ Action Plan (Ordered)

**Phase 1 – Today**

1. ✅ Implement `EpisodeMetaFeatureExtractor` (done or in progress)
2. ✅ Implement `TradingEnvironmentBattleground` (done!)
3. 🔁 Add new diagnostics: `resid_std`, `resid_acf1`, `cv_r2`, `ljung_pval`
4. 🧠 Train `EpisodeTransferabilityPredictor` with new features

**Phase 2 – Next**

5. 🧬 Add `AutoencoderRepresentation` wrapper
6. 🔄 Train contrastive ranking model: "Is A > B?"
7. ⏱ Evaluate with out-of-time validation (e.g., train on 2023 Q1, test on Q2)
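
For the out-of-time validation step, a small helper along these lines could do the split. This is a sketch under assumptions: the function name `out_of_time_split` and the `episode_start` column are hypothetical, so swap in whatever timestamp column your meta dataframe actually carries. Both boundaries are half-open, so a row can never land in both sets.

```python
import pandas as pd


def out_of_time_split(meta_df, time_col, split_date, end_date):
    """Train on rows before `split_date`, test on rows in [split_date, end_date)."""
    ts = pd.to_datetime(meta_df[time_col])
    split_date = pd.Timestamp(split_date)
    end_date = pd.Timestamp(end_date)
    train = meta_df[ts < split_date]
    test = meta_df[(ts >= split_date) & (ts < end_date)]
    return train, test


# Toy example: train on 2023 Q1, test on 2023 Q2 (values are made up).
demo_df = pd.DataFrame({
    "episode_start": ["2023-01-15", "2023-02-20", "2023-03-31",
                      "2023-04-01", "2023-05-10", "2023-07-02"],
    "advantage_test": [0.4, -0.1, 0.2, 0.3, -0.5, 0.1],
})
train_df, test_df = out_of_time_split(demo_df, "episode_start", "2023-04-01", "2023-07-01")
```

Unlike a random split, this keeps every test episode strictly later than every training episode, which is closer to how the predictor would be used live.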

---


In [3]:
import jupyter

In [8]:
import random
import numpy as np
import pandas as pd
import seaborn as sns
import gymnasium as gym
import matplotlib.pyplot as plt


from src.utils.system import boot
from src.data.feature_pipeline import load_base_dataframe
from experiments import check_if_experiment_exists, register_experiment, experiment_hash
from environments import PositionTradingEnv, PositionTradingEnvV1

# ========== SYSTEM BOOT ==========
DEVICE = boot()
EXPERIMENT_NAME = "trading_environment_development"
DEFAULT_PATH = "data/experiments/" + EXPERIMENT_NAME

# ========== CONFIG ==========
TICKER = "AAPL"
TIMESTEPS = 10_000
EVAL_EPISODES = 5
N_TIMESTEPS = 60
LOOKBACK = 0
SEEDS = [42, 52, 62]
MARKET_FEATURES = ['close']
BENCHMARK_PATH = DEFAULT_PATH+"/benchmark_episodes.json"
CHECKPOINT_DIR = DEFAULT_PATH+"/checkpoints"
SCORES_DIR = DEFAULT_PATH+"/scores"
META_PATH = DEFAULT_PATH+"/meta_df_transfer.csv"

MODEL_PATH = CHECKPOINT_DIR+"/episode_quality_model.pkl"
MARKET_FEATURES.sort()
SEEDS.sort()

# DEVICE is already set by boot() in the SYSTEM BOOT section above.
OHLCV_DF = load_base_dataframe()

In [13]:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import RobustScaler
from sklearn.metrics import classification_report, confusion_matrix
import joblib
import os

class EpisodePrevisibilityPredictor:
    def __init__(self, meta_path=META_PATH, model_path=MODEL_PATH):
        self.meta_path = meta_path
        self.model_path = model_path
        self.model = None
        self.df = None
        self.feature_cols = []
        self._load_meta_df()

    def _load_meta_df(self):
        self.df = pd.read_csv(self.meta_path)
        self.df['target'] = (self.df['advantage_test'] > 0).astype(int)

        exclude_cols = [
            'symbol', 'month', 'month_str',
            'agent_reward', 'random_reward',
            'advantage', 'advantage_test', 'advantage_train',
            'target', 'transfer_success', 'config', 'config_dict',
            'ticker', 'train_idx', 'test_idx', 'config_hash', 'env_version','agent_name','score_test', 'transfer_delta', 'seed'
        ]
        self.feature_cols = [col for col in self.df.columns if col not in exclude_cols]
        self.df = self.df.dropna(subset=self.feature_cols + ['target'])
        print('feature_columns',self.feature_cols)

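    def train_model(self):
        X = self.df[self.feature_cols]
        y = self.df['target']

        # Keep the scaler on the instance so predict_transferability() works
        # in the same session without reloading from disk.
        self.scaler = RobustScaler()
        X_scaled = self.scaler.fit_transform(X)

        self.model = RandomForestClassifier(n_estimators=200, class_weight='balanced', random_state=42)
        self.model.fit(X_scaled, y)

        # Save model and scaler (create the checkpoint directory if it is missing)
        model_dir = os.path.dirname(self.model_path)
        if model_dir:
            os.makedirs(model_dir, exist_ok=True)
        joblib.dump({'model': self.model, 'scaler': self.scaler}, self.model_path)

        # Evaluation
        # NOTE: the model above was fitted on all rows, so this "test" split is
        # in-sample. The report measures fit, not generalization.
        from sklearn.model_selection import train_test_split
        X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, stratify=y, random_state=42)
        y_pred = self.model.predict(X_test)
        print("Classification Report:\n", classification_report(y_test, y_pred))
        print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))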

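    def _load_model(self):
        # Reload from disk if either the model or its scaler is missing.
        if self.model is None or getattr(self, 'scaler', None) is None:
            data = joblib.load(self.model_path)
            self.model = data['model']
            self.scaler = data['scaler']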

    def predict_transferability(self, new_df):
        self._load_model()
        X_new = new_df[self.feature_cols]
        X_scaled = self.scaler.transform(X_new)
        new_df['transfer_proba'] = self.model.predict_proba(X_scaled)[:, 1]
        new_df['transferable'] = self.model.predict(X_scaled)
        return new_df

    def rank_episodes_by_transferability(self):
        return self.df.sort_values(by='advantage_test', ascending=False)[
            ['ticker', 'train_idx', 'test_idx', 'advantage_test']
        ]

    def rank_episodes_by_difficulty(self):
        return self.df.sort_values(by='advantage_train')[
            ['ticker', 'train_idx', 'test_idx', 'advantage_train']
        ]

    def rank_episode_overall_quality(self):
        score = self.df['advantage_test'] * (1 - abs(self.df['transfer_delta'])/100)
        ranked_df = self.df.copy()
        ranked_df['overall_score'] = score
        return ranked_df.sort_values(by='overall_score', ascending=False)[
            ['ticker', 'train_idx', 'test_idx', 'overall_score']
        ]
    
# Example usage:
predictor = EpisodePrevisibilityPredictor()
predictor.train_model()
ranked_df = predictor.rank_episode_overall_quality()

import ace_tools_open as tools; tools.display_dataframe_to_user(name="Ranked Episode Overall Quality", dataframe=ranked_df)


feature_columns ['mean_return', 'median_return', 'std_return', 'skew_return', 'kurtosis_return', 'return_trend', 'ewm_mean_return', 'hurst', 'adf_stat', 'adf_pval', 'entropy', 'score_train', 'timesteps', 'episode_steps', 'volatility', 'max_drawdown', 'sharpe', 'sortino', 'calmar', 'success_trades', 'action_hold_ratio', 'action_long_ratio']
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        26
           1       1.00      1.00      1.00        24

    accuracy                           1.00        50
   macro avg       1.00      1.00      1.00        50
weighted avg       1.00      1.00      1.00        50

Confusion Matrix:
 [[26  0]
 [ 0 24]]
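
The perfect scores above should be read with care: in `train_model` the forest is fitted on every row before `train_test_split` is called, so the 50 "test" rows were already seen during training. A held-out check could look like the sketch below. The helper names (`make_clf`, `in_sample_accuracy`, `holdout_accuracy`) are hypothetical, and the demo uses pure-noise synthetic data only to show how far apart the two numbers can be.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler


def make_clf(seed=42):
    """Scaler + forest in one pipeline, so scaling is fitted on training rows only."""
    return make_pipeline(
        RobustScaler(),
        RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=seed),
    )


def in_sample_accuracy(X, y, seed=42):
    """Fit on all rows, then score a split of those same rows."""
    clf = make_clf(seed).fit(X, y)
    _, X_test, _, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=seed)
    return accuracy_score(y_test, clf.predict(X_test))


def holdout_accuracy(X, y, seed=42):
    """Split first, fit on the training rows only, score the untouched rows."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    clf = make_clf(seed).fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))


# Synthetic demo: features and labels are independent noise, so there is
# nothing to learn and any high score can only come from memorization.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(250, 8))
y_demo = rng.integers(0, 2, size=250)

acc_in_sample = in_sample_accuracy(X_demo, y_demo)
acc_holdout = holdout_accuracy(X_demo, y_demo)
print(f"in-sample: {acc_in_sample:.2f}  held-out: {acc_holdout:.2f}")
```

If the held-out number on the real meta dataframe stays well above the class balance, that is evidence the features carry signal. The in-sample report alone cannot show that.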
