This is gold, Pi — thank you for laying it out so clearly. What you built is already stronger than many academic meta-RL setups. Now let’s **merge the best of your past success with this current Battleground framework** and push it to the next level.

---

## 🎯 Current Goal

**Predict, with ≥75% confidence, whether a given RL training episode is transferable.**

To do that, we need a **rich and expressive feature space** that captures **structure, uncertainty, dynamics, and regime information** — just like your Stock-Month Predictability Study.
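
One possible reading of the 75% target is a per-prediction confidence gate: only act on episodes where the classifier's predicted probability for either class is at least 0.75, and track how many episodes pass the gate. The sketch below assumes that reading; `confident_predictions` and its arguments are hypothetical names, not part of the existing code.

```python
import numpy as np

def confident_predictions(proba_pos, y_true, threshold=0.75):
    """Keep only predictions whose winning-class probability is >= threshold.

    proba_pos: predicted probability of the positive ("transferable") class.
    Returns (keep_mask, coverage, accuracy_on_kept).
    """
    proba_pos = np.asarray(proba_pos, dtype=float)
    y_true = np.asarray(y_true, dtype=int)
    confidence = np.maximum(proba_pos, 1.0 - proba_pos)  # probability of the predicted class
    keep = confidence >= threshold
    pred = (proba_pos >= 0.5).astype(int)
    coverage = keep.mean()                               # share of episodes we act on
    accuracy = (pred[keep] == y_true[keep]).mean() if keep.any() else np.nan
    return keep, coverage, accuracy

# Toy probabilities, for illustration only (not real results).
keep, coverage, accuracy = confident_predictions(
    proba_pos=[0.9, 0.6, 0.2, 0.8, 0.45],
    y_true=[1, 1, 0, 0, 1],
)
```

Reporting coverage next to accuracy matters here: a gate that passes very few episodes can look accurate while being of little practical use.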

---

## ✅ What We Already Have

From your current `result_df`, we already extract:

| Group                 | Features                                                                 |
| --------------------- | ------------------------------------------------------------------------ |
| Statistical Moments   | `mean_return`, `std_return`, `skew_return`, `kurtosis_return`, `entropy` |
| Price Trend           | `return_trend`, `ewm_mean_return`                                        |
| Chaos / Regime        | `hurst`, `adf_stat`, `adf_pval`                                          |
| Risk & Reward Metrics | `volatility`, `max_drawdown`, `sharpe`, `sortino`, `calmar`              |
| Agent Diagnostics     | `success_trades`, `action_hold_ratio`, `action_long_ratio`               |
| Outcome Labels        | `score_train`, `score_test`, `advantage_test`, `transfer_delta`          |

---

## 🔁 Let’s Extend: Feature Suggestions

Below are **11 additional meta-features** we can compute **either now or as a next step**, all in PyTorch/NumPy-friendly form:

| Feature Name               | Why Add It?                                                                |
| -------------------------- | -------------------------------------------------------------------------- |
| `resid_std`                | From RF prediction of t+1 returns → measures noise                         |
| `resid_skew`, `resid_kurt` | Shape of the error → asymmetry or tails                                    |
| `resid_acf1`               | Temporal memory in prediction error                                        |
| `ljung_pval`               | Statistical confirmation of noise/randomness                               |
| `cv_r2`                    | Proxy for model learnability/predictability                                |
| `garch_volatility`         | Conditional volatility ⇒ market stress estimation                          |
| `change_point_count`       | Regime switch count (e.g. via ruptures or cusum)                           |
| `rolling_adf_pval`         | Stationarity evolution over time                                           |
| `forecast_entropy`         | Entropy of predictions from RF or AE                                       |
| `price_entropy_peak`       | Local entropy spike detection before regime breaks (good for online usage) |

These build on your previous success and aim at:

* **Residual structure**
* **Volatility structure**
* **Forecast structure**
* **Regime changes**
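
As a rough illustration of `cv_r2`, here is a minimal walk-forward sketch. It is an assumption-laden stand-in: it fits ordinary least squares on lagged returns with NumPy rather than the random forest mentioned above, and the names `lag_matrix` and `walk_forward_r2` are hypothetical.

```python
import numpy as np

def lag_matrix(returns, lags):
    """Rows are [r[t-1], ..., r[t-lags]]; the target is r[t]."""
    r = np.asarray(returns, dtype=float)
    X = np.column_stack([r[lags - k: len(r) - k] for k in range(1, lags + 1)])
    return X, r[lags:]

def walk_forward_r2(returns, lags=5, n_folds=3):
    """Out-of-sample R^2 with expanding, time-ordered folds (no shuffling)."""
    X, y = lag_matrix(returns, lags)
    X = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    fold = len(y) // (n_folds + 1)
    if fold <= lags + 1:                       # too few rows per fold to fit the model
        return np.nan
    preds, truth = [], []
    for i in range(1, n_folds + 1):
        train = slice(0, i * fold)             # everything before the test fold
        test = slice(i * fold, (i + 1) * fold)
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        preds.append(X[test] @ beta)
        truth.append(y[test])
    preds, truth = np.concatenate(preds), np.concatenate(truth)
    ss_res = np.sum((truth - preds) ** 2)
    ss_tot = np.sum((truth - truth.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic check: an AR(1) series is predictable from its lags, white noise is not.
rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
ar1 = np.zeros(2000)
for t in range(1, 2000):
    ar1[t] = 0.9 * ar1[t - 1] + noise[t]
```

With episodes of only 60 to 120 bars the folds become very small, so the estimate will be noisy; fewer lags or fewer folds may be needed in that setting.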

---

## 🧠 Architecture Suggestion (Final Plan)

```text
[Episode -> Raw OHLCV]
         |
         v
[Feature Extractor (Meta + Residual + Chaos)]
         |
         v
[Representation Learner (AE, Transformer Encoder, etc)]
         |
         v
[Predictor (Classifier or Ranker)]
         |
         v
[Score: Learnability + Transferability + Difficulty]
```
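
The three stages in the diagram can be wired together as below. This is a sketch under stated assumptions, not the planned implementation: PCA via SVD stands in for the autoencoder, a nearest-centroid scorer stands in for the classifier or ranker, and all class and function names are hypothetical.

```python
import numpy as np

def extract_features(close):
    """Stage 1: a few meta-features (mean return, return std, max drawdown)."""
    close = np.asarray(close, dtype=float)
    returns = np.diff(close) / close[:-1]
    drawdown = (close / np.maximum.accumulate(close)).min() - 1
    return np.array([returns.mean(), returns.std(), drawdown])

class LinearEncoder:
    """Stage 2: PCA on standardized features, a linear stand-in for an autoencoder."""
    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0) + 1e-12
        _, _, vt = np.linalg.svd((X - self.mean_) / self.scale_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, X):
        return ((X - self.mean_) / self.scale_) @ self.components_.T

class CentroidScorer:
    """Stage 3: score in [0, 1], higher when closer to the positive-class centroid."""
    def fit(self, Z, y):
        self.c0_ = Z[y == 0].mean(axis=0)
        self.c1_ = Z[y == 1].mean(axis=0)
        return self

    def score(self, Z):
        d0 = np.linalg.norm(Z - self.c0_, axis=1)
        d1 = np.linalg.norm(Z - self.c1_, axis=1)
        return d0 / (d0 + d1 + 1e-12)

# Synthetic episodes (illustration only): label 1 drifts up, label 0 drifts down.
rng = np.random.default_rng(0)

def synthetic_episode(drift, n=60):
    steps = drift + 0.005 * rng.standard_normal(n)
    return 100.0 * np.cumprod(1.0 + steps)

labels = np.array([0, 1] * 20)
episodes = [synthetic_episode(0.01 if y else -0.01) for y in labels]
X = np.vstack([extract_features(ep) for ep in episodes])
Z = LinearEncoder(n_components=2).fit(X).transform(X)
scores = CentroidScorer().fit(Z, labels).score(Z)
```

Keeping each stage behind a `fit`/`transform` or `fit`/`score` interface makes it straightforward to swap the stand-ins for the real autoencoder and predictor later.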

---

## ✅ Action Plan (Ordered)

**Phase 1 – Today**

1. ✅ Implement `EpisodeMetaFeatureExtractor` (done or in progress)
2. ✅ Implement `TradingEnvironmentBattleground` (done!)
3. 🔁 Add new diagnostics: `resid_std`, `resid_acf1`, `cv_r2`, `ljung_pval`
4. 🧠 Train `EpisodeTransferabilityPredictor` with new features

**Phase 2 – Next**

5. 🧬 Add `AutoencoderRepresentation` wrapper
6. 🔄 Train contrastive ranking model: "Is A > B?"
7. ⏱ Evaluate with out-of-time validation (e.g., train on 2023 Q1, test on Q2)
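
For step 7, an out-of-time split can be as simple as masking on episode start dates. The sketch below assumes each episode has a start date; `out_of_time_split` and the optional `embargo_days` gap (to keep overlapping windows out of the test set) are hypothetical additions.

```python
import numpy as np

def out_of_time_split(dates, train_end, test_end, embargo_days=0):
    """Return (train_idx, test_idx) with every training date before every test date.

    Train: dates < train_end.
    Test:  train_end + embargo_days <= dates < test_end.
    """
    dates = np.asarray(dates, dtype="datetime64[D]")
    train_end = np.datetime64(train_end, "D")
    test_end = np.datetime64(test_end, "D")
    test_start = train_end + np.timedelta64(embargo_days, "D")
    train_idx = np.flatnonzero(dates < train_end)
    test_idx = np.flatnonzero((dates >= test_start) & (dates < test_end))
    return train_idx, test_idx

# Example: train on 2023 Q1, test on 2023 Q2 (calendar days, for illustration).
dates = np.arange("2023-01-01", "2023-07-01", dtype="datetime64[D]")
train_idx, test_idx = out_of_time_split(dates, "2023-04-01", "2023-07-01")
```

Unlike a shuffled `train_test_split`, this keeps the evaluation strictly forward in time, which is the property the plan asks for.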

---


In [3]:
import jupyter

In [4]:
import random
import numpy as np
import pandas as pd
import seaborn as sns
import gymnasium as gym
import matplotlib.pyplot as plt


from src.utils.system import boot
from src.data.feature_pipeline import load_base_dataframe
from experiments import check_if_experiment_exists, register_experiment, experiment_hash
from environments import PositionTradingEnv, PositionTradingEnvV1

# ========== SYSTEM BOOT ==========
DEVICE = boot()
EXPERIMENT_NAME = "trading_environment_development"
DEFAULT_PATH = "data/experiments/" + EXPERIMENT_NAME

# ========== CONFIG ==========
TICKER = "AAPL"
TIMESTEPS = 10_000
EVAL_EPISODES = 5
N_TIMESTEPS = 60
LOOKBACK = 0
SEEDS = [42, 52, 62]
MARKET_FEATURES = ['close']
BENCHMARK_PATH = DEFAULT_PATH+"/benchmark_episodes.json"
CHECKPOINT_DIR = DEFAULT_PATH+"/checkpoints"
SCORES_DIR = DEFAULT_PATH+"/scores"
META_PATH = DEFAULT_PATH+"/meta_df_transfer.csv"

MODEL_PATH = CHECKPOINT_DIR+"/episode_quality_model.pkl"
MARKET_FEATURES.sort()
SEEDS.sort()

DEVICE = boot()
OHLCV_DF = load_base_dataframe()

In [11]:
result_df = pd.read_csv(DEFAULT_PATH+"/meta_df_transfer.csv")
train_ep_ids = result_df['train_idx'].unique()
#[615, 360, 528,  71, 355],

array([615, 360, 528,  71, 355], dtype=int64)

In [18]:
"""
Get organized:

feature_dict = {
    'mean_return': returns.mean(),
    'median_return': np.median(returns),
    'std_return': returns.std(),
    'skew': skew(returns),
    'kurtosis': kurtosis(returns),
    'entropy': entropy(np.histogram(returns, bins=10, density=True)[0] + 1e-8),
    'volatility': returns.std(),
    'drawdown': (values / np.maximum.accumulate(values)).min() - 1,
    'adf_pvalue': adf_result[1],
}

Lag features:
    df_ep[f'return_lag_{lag}'] = df_ep['close'].pct_change().shift(lag)

Residual features:
    'resid_std': residuals.std(),
    'resid_skew': skew(residuals),
    'resid_kurtosis': kurtosis(residuals),
    'ljung_pval': acorr_ljungbox(residuals, lags=[1], return_df=True).iloc[0]['lb_pvalue'],
    'resid_acf1': pd.Series(residuals).autocorr(lag=1)

"""

print('see how the classifier behaves')

'\nGet organized:\n'

In [19]:
train_ep = OHLCV_DF[OHLCV_DF['symbol']=="AAPL"].iloc[71:71+120].copy()
train_ep.reset_index()
MARKET_FEATURES = ['volume','close']
train_ep[MARKET_FEATURES]

Unnamed: 0,volume,close
33494,74493604.0,165.07
33495,75240221.0,167.40
33496,73954306.0,167.23
33497,93153701.0,166.42
33498,93535037.0,161.79
...,...,...
33609,141390692.0,138.20
33610,124534891.0,142.45
33611,96785630.0,146.10
33612,83933253.0,146.40


In [17]:
train_ep = OHLCV_DF[OHLCV_DF['symbol']=="AAPL"].iloc[71:71+120].copy()
train_ep.reset_index()
MARKET_FEATURES = ['volume','close']




"""
  'mean_return': returns.mean(),
            'std_return': returns.std(),
            'skew': skew(returns),
            'kurtosis': kurtosis(returns),
            'entropy': entropy(np.histogram(returns, bins=10, density=True)[0] + 1e-8),
            'volatility': returns.std(),
            'drawdown': (values / np.maximum.accumulate(values)).min() - 1,
            'median_return':np.median(returns),
"""
def extract_episode_features(df_ep: pd.DataFrame, config=None) -> dict:
    # Defaults mirror run_settings. 'lags' must be an int: the previous default
    # {"lags": [5]} broke range(), and 'min_samples' was missing entirely.
    config = {"lags": 5, "min_samples": 10, **(config or {})}
    returns = df_ep['close'].pct_change().dropna()
    values = df_ep['close'].values

    feature_dict = {
            'mean_return': returns.mean(),
            'median_return': np.median(returns),
            'std_return': returns.std(),
            'skew': skew(returns),
            'kurtosis': kurtosis(returns),
            'entropy': entropy(np.histogram(returns, bins=10, density=True)[0] + 1e-8),
            'volatility': returns.std(),  # same value as std_return (not annualized)
            'drawdown': (values / np.maximum.accumulate(values)).min() - 1,
        }

    try:
        adf_result = adfuller(returns)
        feature_dict['adf_pvalue'] = adf_result[1]
    except Exception:
        feature_dict['adf_pvalue'] = 1.0

    try:
        # Build lags on a separate frame so the caller's df_ep is not mutated and
        # NaNs in unrelated columns cannot drop rows.
        lagged = pd.DataFrame({'ret': df_ep['close'].pct_change()})
        lag_cols = [f'return_lag_{lag}' for lag in range(1, config['lags'] + 1)]
        for lag, col in enumerate(lag_cols, start=1):
            lagged[col] = lagged['ret'].shift(lag)
        lagged = lagged.dropna()
        if len(lagged) < config['min_samples']:
            raise ValueError("Insufficient data")
        X = lagged[lag_cols].values
        # Same-row return direction, so X and y always have equal length.
        y = (lagged['ret'].values > 0).astype(int)

        model = RandomForestClassifier(n_estimators=50, random_state=42)
        model.fit(X, y)
        # In-sample residual: observed direction (0/1) minus predicted P(up).
        residuals = y - model.predict_proba(X)[:, 1]

        feature_dict.update({
                'resid_std': residuals.std(),
                'resid_skew': skew(residuals),
                'resid_kurtosis': kurtosis(residuals),
                'ljung_pval': acorr_ljungbox(residuals, lags=[1], return_df=True).iloc[0]['lb_pvalue'],
                'resid_acf1': pd.Series(residuals).autocorr(lag=1)
        })
    except Exception:
        feature_dict.update({
            'resid_std': np.nan,
            'resid_skew': np.nan,
            'resid_kurtosis': np.nan,
            'ljung_pval': np.nan,
            'resid_acf1': np.nan
        })

    return feature_dict


Unnamed: 0,index,id,symbol,timestamp,date,open,high,low,close,volume,...,vwap_change,trade_count_change,sector_id,industry_id,return_1d,vix,vix_norm,sp500,sp500_norm,market_return_1d
0,33494,33495,AAPL,2022-04-18 04:00:00,2022-04-18,163.920,166.5984,163.570,165.07,74493604.0,...,-0.013575,-0.033606,10.0,unknown,-0.001331,0.2217,-0.023348,43.9169,-0.000205,-0.000205
1,33495,33496,AAPL,2022-04-19 04:00:00,2022-04-19,165.020,167.8200,163.910,167.40,75240221.0,...,0.010325,-0.049385,10.0,unknown,0.014115,0.2137,-0.036085,44.6221,0.016058,0.016058
2,33496,33497,AAPL,2022-04-20 04:00:00,2022-04-20,168.760,168.8800,166.100,167.23,73954306.0,...,0.004275,0.085275,10.0,unknown,-0.001016,0.2032,-0.049134,44.5945,-0.000619,-0.000619
3,33497,33498,AAPL,2022-04-21 04:00:00,2022-04-21,168.910,171.5300,165.910,166.42,93153701.0,...,0.008196,0.263759,10.0,unknown,-0.004844,0.2268,0.116142,43.9366,-0.014753,-0.014753
4,33498,33499,AAPL,2022-04-22 04:00:00,2022-04-22,166.460,167.8699,161.500,161.79,93535037.0,...,-0.027863,-0.033611,10.0,unknown,-0.027821,0.2821,0.243827,42.7178,-0.027740,-0.027740
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
115,33609,33610,AAPL,2022-09-30 04:00:00,2022-09-30,141.280,143.1000,138.000,138.20,141390692.0,...,-0.018747,-0.117879,10.0,unknown,-0.030039,0.3162,-0.006910,35.8562,-0.015067,-0.015067
116,33610,33611,AAPL,2022-10-03 04:00:00,2022-10-03,138.210,143.0700,137.685,142.45,124534891.0,...,0.008126,-0.109068,10.0,unknown,0.030753,0.3010,-0.048071,36.7843,0.025884,0.025884
117,33611,33612,AAPL,2022-10-04 04:00:00,2022-10-04,145.030,146.2200,144.260,146.10,96785630.0,...,0.030161,-0.180357,10.0,unknown,0.025623,0.2907,-0.034219,37.9093,0.030584,0.030584
118,33612,33613,AAPL,2022-10-05 04:00:00,2022-10-05,144.075,147.3800,143.010,146.40,83933253.0,...,-0.001604,-0.118751,10.0,unknown,0.002053,0.2855,-0.017888,37.8328,-0.002018,-0.002018


In [5]:
EXPERIENCE_NAME = "stock_universe_predictability_selection__MetaFeatures__MetaRlLabeling"
FEATURES_PATH = f"../data/cache/features_{EXPERIENCE_NAME}.pkl"
TARGETS_PATH = f"../data/cache/targets_{EXPERIENCE_NAME}.pkl"
META_PATH = f"../data/cache/meta_{EXPERIENCE_NAME}.pkl"
RL_LABELS_PATH = "../data/cache/meta_rl_labels_stock_universe_predictability_selection__MetaFeatures__MetaRlLabeling__6293649262173480064.pkl"

excluded_tickers=['CEG', 'GEHC', 'GEV', 'KVUE', 'SOLV']
excluded_tickers.sort()
#tickers = TOP2_STOCK_BY_SECTOR

config={
    "regressor":"RandomForestRegressor",
    "n_estimators": 200,
    "random_state":314,
    "transaction_cost":0
}
run_settings={
    "excluded_tickers": excluded_tickers,
    "min_samples": 10,
    "cv_folds": 3,
    "lags": 5,
    "start_date":"2022-01-01",
    "end_date":"2025-01-01",
    "seed":314,
    "episode_length":18,
    "noise_feature_cols": ["return_1d", "volume"]  ,

    "train_steps": 300,
    "min_ep_len" : 18
}

# Config section

In [6]:
import os
import json
import joblib
import hashlib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from tqdm import tqdm
from typing import Optional
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
from sklearn.preprocessing import RobustScaler
from scipy.stats import skew, kurtosis, entropy
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_ljungbox

class MetaTransferabilityPredictor:
    def __init__(self, raw_df: pd.DataFrame, benchmark_path: str, result_path: str, cache_path: str, config: dict):
        self.df = raw_df.copy()
        self.benchmark_path = benchmark_path
        self.result_path = result_path
        self.cache_path = cache_path
        self.config = config

        self.scaler = RobustScaler()
        self.model = None
        self.meta_df = pd.DataFrame()

    def _extract_episode_features(self, df_ep: pd.DataFrame) -> dict:
        returns = df_ep['close'].pct_change().dropna()
        values = df_ep['close'].values

        feature_dict = {
            'mean_return': returns.mean(),
            'std_return': returns.std(),
            'skew': skew(returns),
            'kurtosis': kurtosis(returns),
            'entropy': entropy(np.histogram(returns, bins=10, density=True)[0] + 1e-8),
            'volatility': returns.std(),
            'drawdown': (values / np.maximum.accumulate(values)).min() - 1,
        }

        try:
            adf_result = adfuller(returns)
            feature_dict['adf_pvalue'] = adf_result[1]
        except Exception:
            feature_dict['adf_pvalue'] = 1.0

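        try:
            # Build lags on a separate frame so df_ep is not mutated and NaNs in
            # unrelated columns cannot drop rows.
            lagged = pd.DataFrame({'ret': df_ep['close'].pct_change()})
            lag_cols = [f'return_lag_{lag}' for lag in range(1, self.config['lags'] + 1)]
            for lag, col in enumerate(lag_cols, start=1):
                lagged[col] = lagged['ret'].shift(lag)
            lagged = lagged.dropna()
            if len(lagged) < self.config['min_samples']:
                raise ValueError("Insufficient data")
            X = lagged[lag_cols].values
            # Same-row return direction, so X and y always have equal length.
            y = (lagged['ret'].values > 0).astype(int)

            model = RandomForestClassifier(n_estimators=50, random_state=42)
            model.fit(X, y)
            # In-sample residual: observed direction (0/1) minus predicted P(up).
            residuals = y - model.predict_proba(X)[:, 1]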

            feature_dict.update({
                'resid_std': residuals.std(),
                'resid_skew': skew(residuals),
                'resid_kurtosis': kurtosis(residuals),
                'ljung_pval': acorr_ljungbox(residuals, lags=[1], return_df=True).iloc[0]['lb_pvalue'],
                'resid_acf1': pd.Series(residuals).autocorr(lag=1)
            })
        except Exception:
            feature_dict.update({
                'resid_std': np.nan,
                'resid_skew': np.nan,
                'resid_kurtosis': np.nan,
                'ljung_pval': np.nan,
                'resid_acf1': np.nan
            })

        return feature_dict

    def extract_features_and_labels(self):
        with open(self.benchmark_path) as f:
            benchmarks = json.load(f)

        ticker = self.config['ticker']
        df_ticker = self.df[self.df['symbol'] == ticker].reset_index(drop=True)

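        rng = np.random.default_rng(self.config['seed'])  # seeded so the random baseline is reproducible
        meta_records = []
        for start_idx in tqdm(benchmarks, desc="Processing benchmark episodes"):
            test_idx = start_idx + self.config['episode_steps']
            # Skip episodes whose test window would run past the end of the data.
            if test_idx + self.config['episode_steps'] > len(df_ticker):
                continue

            # Copy the slices so feature extraction cannot write into df_ticker.
            train_df = df_ticker.iloc[start_idx:start_idx + self.config['episode_steps']].copy()
            test_df = df_ticker.iloc[test_idx:test_idx + self.config['episode_steps']].copy()

            train_feats = self._extract_episode_features(train_df)
            test_feats = self._extract_episode_features(test_df)

            hash_id = hashlib.sha256(f"{ticker}_{start_idx}".encode()).hexdigest()

            train_ret = train_df['close'].pct_change().dropna().values
            test_ret = test_df['close'].pct_change().dropna().values
            train_reward = train_ret.sum()
            test_reward = test_ret.sum()
            # Random baseline: a random long/short position applied to each return,
            # so it is on the same scale as the reward (a bare sum of +/-1 is not).
            rand_train = (rng.choice([-1, 1], size=len(train_ret)) * train_ret).sum()
            rand_test = (rng.choice([-1, 1], size=len(test_ret)) * test_ret).sum()

            advantage_train = train_reward - rand_train
            advantage_test = test_reward - rand_test
            transfer_delta = test_reward - train_reward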

            record = {
                'config_hash': hash_id,
                'train_idx': start_idx,
                'test_idx': test_idx,
                'ticker': ticker,
                'advantage_train': advantage_train,
                'advantage_test': advantage_test,
                'transfer_delta': transfer_delta,
                'label': int(transfer_delta > 0),
                **{f"train_{k}": v for k, v in train_feats.items()},
                **{f"test_{k}": v for k, v in test_feats.items()}
            }
            meta_records.append(record)

        self.meta_df = pd.DataFrame(meta_records)
        self.meta_df.to_pickle(self.cache_path)
        print("[INFO] Feature and label dataset created and cached.")

    def _feature_columns(self) -> list:
        # Use train-window features only. test_* features, advantage_* and
        # transfer_delta are computed from the outcome (label = transfer_delta > 0),
        # so feeding them to the classifier would leak the label.
        return [c for c in self.meta_df.columns if c.startswith('train_') and c != 'train_idx']

    def train_classifier(self):
        self.feature_cols = self._feature_columns()
        X = self.meta_df[self.feature_cols].fillna(0)
        y = self.meta_df['label']

        X_train, X_test, self.y_train, self.y_test = train_test_split(
            X, y, test_size=0.2, random_state=self.config['seed'], stratify=y
        )
        # Fit the scaler on the training split only, then apply it to the held-out split.
        self.X_train = self.scaler.fit_transform(X_train)
        self.X_test = self.scaler.transform(X_test)

        self.model = RandomForestClassifier(n_estimators=100, random_state=self.config['seed'], class_weight='balanced')
        self.model.fit(self.X_train, self.y_train)

    def evaluate_model(self):
        y_pred = self.model.predict(self.X_test)
        print("\n[Classification Report]")
        print(classification_report(self.y_test, y_pred))
        cm = confusion_matrix(self.y_test, y_pred)
        ConfusionMatrixDisplay(cm).plot(cmap=plt.cm.Blues)
        plt.title("Confusion Matrix")
        plt.show()

    def predict(self, new_df: pd.DataFrame):
        # new_df must contain the same train_* feature columns used for fitting.
        X_scaled = self.scaler.transform(new_df[self.feature_cols].fillna(0))
        return self.model.predict(X_scaled)

    def feature_importance(self):
        importances = self.model.feature_importances_
        sorted_idx = np.argsort(importances)[::-1]
        plt.figure(figsize=(12, 6))
        plt.bar(range(len(importances)), importances[sorted_idx])
        plt.xticks(range(len(importances)), [self.feature_cols[i] for i in sorted_idx], rotation=90)
        plt.title("Meta-Feature Importances")
        plt.tight_layout()
        plt.show()


In [8]:
# Assumed wiring: the class reads 'ticker' and 'episode_steps' from config, but run_settings defines neither.
predictor = MetaTransferabilityPredictor(OHLCV_DF, BENCHMARK_PATH, META_PATH, FEATURES_PATH,
                                         {**run_settings, "ticker": TICKER, "episode_steps": run_settings["episode_length"]})

In [None]:
# Extract meta-features and labels for the benchmark episodes
predictor.extract_features_and_labels()

# Train the classifier on the extracted meta-features
predictor.train_classifier()

# Evaluate the trained classifier
predictor.evaluate_model()

# Plot feature importance
predictor.feature_importance()

In [None]:
bm = TradingEnvironmentBenchmark()
bm.result_df

In [None]:
df