Data splits (walk-forward, same stocks/timeframes)

Features (identical input sets)

Reward (cumulative)

Evaluation (out-of-sample EWM, Sharpe, etc.)



Let’s set up a true **apples-to-apples benchmark** of agent architectures on our trading pipeline, using the exact same:

* Data splits (walk-forward, same stocks/timeframes)
* Features (identical input sets)
* Reward (cumulative)
* Evaluation (out-of-sample EWM, Sharpe, etc.)



---

## **Agent Architecture Benchmark Plan**

### **1. Baseline PPO-MLP**

* **Policy:** Standard multilayer perceptron (MLP)
* **Library:** Stable Baselines3 (`PPO`)
* **Policy kwargs:** e.g., `[128, 128]` or `[256, 128]`

---

### **2. LSTM PPO (RecurrentPPO)**

* **Policy:** LSTM (single-layer or 2-layer, 128 units)
* **Library:** Stable Baselines3-Contrib (`RecurrentPPO`)
* **Policy:** `"MlpLstmPolicy"`
* **Handles sequences natively**
* **Extra: Tune sequence/episode length for best results**

---

### **3. Single-Head Attention Transformer Policy**

* **Policy:** Transformer encoder with 1 attention head (minimalist setup)
* **Implementation:**

  * *Option 1*: Use `stable-baselines3` with a custom policy class (PyTorch).
  * *Option 2*: Use SB3 fork/extensions that support transformer policies out-of-the-box (less common; will probably need custom code).
* **Goal:** Test transformer’s “pattern memory” edge vs LSTM.

---

### **4. Multi-Head Attention Transformer Policy**

* **Policy:** Transformer encoder, e.g., 4–8 heads, 1–2 layers
* **Implementation:** Same as above but with multiple heads
* **Why:** See if more heads/layers boost performance (at higher compute cost)

---

## **Benchmarking Protocol**

1. **Data**: Use our best meta-selected stocks/timeframes, identical for all runs.
2. **Feature set**: Fix features for all models (no advantage to one or another).
3. **Hyperparameters**: Tune as fairly as possible (similar total params, same optimizer, batch size, episode length).
4. **Evaluation**:

   * Out-of-sample EWM cumulative reward
   * Sharpe ratio, drawdown, % > market
   * Policy entropy, if curious
   * 5+ random seeds per setting
5. **Logging**: Use Weights & Biases, MLflow, or simple CSVs to compare runs.

---

## **Implementation Plan**

**A. Write/Adapt Custom Policies**

* For LSTM: use `RecurrentPPO` (easy).
* For Transformers: extend SB3’s `ActorCriticPolicy` using PyTorch, plug in transformer blocks.

**B. Standardized Training Loop**

* For each agent: loop over all stocks/timeframes, train, evaluate, record metrics.

**C. Result Table**

| Model           | Architecture     | Params | Mean EWM Reward | Sharpe | % > Market | Notes       |
| --------------- | ---------------- | ------ | --------------- | ------ | ---------- | ----------- |
| PPO-MLP         | \[256,128] MLP   | X      | ...             | ...    | ...        | Baseline    |
| PPO-LSTM        | 1x128 LSTM       | Y      | ...             | ...    | ...        | Recurrent   |
| PPO-Transformer | 1-head, 1 layer  | Z      | ...             | ...    | ...        | Single head |
| PPO-Transformer | 4-head, 2 layers | W      | ...             | ...    | ...        | Multi-head  |

---

## **Deliverables**

* **Scripts for each model type** (ready to run)
* **Unified training and eval pipeline** (for apples-to-apples comparison)
* **Benchmarking notebooks** for quick result viz
* **Markdown summary template** for documentation

---



In [1]:
# SETUP: Imports & Paths ===========================
import jupyter
from src.utils.system import boot, Notify

boot()
import os
import joblib
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


from tqdm import tqdm

from src.data.feature_pipeline import basic_chart_features,load_base_dataframe
from src.predictability.easiness import rolling_sharpe, rolling_r2, rolling_info_ratio, rolling_autocorr
from src.predictability.pipeline import generate_universe_easiness_report
from IPython import display

from src.experiments.experiment_tracker import ExperimentTracker
from src.config import TOP2_STOCK_BY_SECTOR


from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler, RobustScaler
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score
from scipy.stats import skew, kurtosis, entropy
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.stattools import acf, acovf

from src.env.base_trading_env import (
    CumulativeTradingEnv,
)
import warnings
warnings.filterwarnings("ignore")


  from pandas.core import (


In [11]:
# ========== IMPORTS & SETUP ==========
import os
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from stable_baselines3 import PPO
from sb3_contrib import RecurrentPPO
from stable_baselines3.common.vec_env import DummyVecEnv
from tqdm import tqdm

from src.env.base_trading_env import CumulativeTradingEnv
from src.data.feature_pipeline import load_base_dataframe
from src.experiments.experiment_tracker import ExperimentTracker
from src.defaults import FEATURE_COLS, EPISODE_LENGTH, EXCLUDED_TICKERS

# ========== CONFIG ==========
EXPERIENCE_NAME = "agent_design_and_benchmark"
RESULTS_PATH = f"data/experiments/{EXPERIENCE_NAME}_barebones_results.csv"
N_EPISODES = 20
N_SEEDS = 3
N_EVAL_EPISODES = 3
AGENT_TYPES = ['mlp', 'lstm', 'transformer_single', 'transformer_multi']

TRANSACTION_COST = 0

CONFIG = {
    "batch_size": 32,
    "n_steps": 128,
    "total_timesteps": 10000,   # Adjust for speed/depth
}

walk_forward_splits = [
    ("2023-01-01", "2023-07-01", "2023-12-01"),
    ("2024-01-01", "2024-07-01", "2024-12-01"),
]

# --- Load data ---
ohlcv_df = load_base_dataframe()
ohlcv_df['date'] = pd.to_datetime(ohlcv_df['date'])

# --- Experiment tracker ---
experiment_tracker = ExperimentTracker(EXPERIENCE_NAME)

# --- Transformer Policy ---
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor
from stable_baselines3.common.policies import ActorCriticPolicy

class TransformerExtractor(BaseFeaturesExtractor):
    def __init__(self, observation_space, d_model=32, nhead=1, num_layers=1):
        super().__init__(observation_space, features_dim=d_model)
        self.embedding = nn.Linear(observation_space.shape[0], d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
    def forward(self, obs):
        x = self.embedding(obs)
        x = x.unsqueeze(0)  # (seq=1, batch, d_model)
        x = self.transformer(x)
        x = x.squeeze(0)
        return x

class TransformerPolicy(ActorCriticPolicy):
    def __init__(self, *args, nhead=1, num_layers=1, **kwargs):
        super().__init__(
            *args,
            features_extractor_class=TransformerExtractor,
            features_extractor_kwargs={'d_model': 32, 'nhead': nhead, 'num_layers': num_layers},
            **kwargs
        )

# --- Env factory ---
def make_env(df, ticker, feature_cols, episode_length):
    df_ticker = df[df['symbol'] == ticker].copy()
    return CumulativeTradingEnv(
        df=df_ticker,
        feature_cols=feature_cols,
        episode_length=episode_length,
        transaction_cost=TRANSACTION_COST,
    )

# --- Episode generator ---
def generate_episode_sequences(df, episode_length, n_episodes, excluded_tickers, seed=314):
    rng = np.random.default_rng(seed)
    eligible_tickers = [t for t in df['symbol'].unique() if t not in excluded_tickers]
    sequences = []
    for _ in range(n_episodes):
        ticker = rng.choice(eligible_tickers)
        stock_df = df[df['symbol'] == ticker]
        max_start = len(stock_df) - episode_length - 1
        if max_start < 1:
            continue
        start_idx = rng.integers(0, max_start)
        sequences.append((ticker, int(start_idx)))
    return sequences

# --- Evaluation: Only use scalar metrics from env info ---
def is_scalar_series(series):
    return series.apply(lambda x: np.isscalar(x) or isinstance(x, (np.floating, np.integer, float, int, np.float64, np.int64))).all()

def evaluate_agent(model, df, sequences, feature_cols, episode_length):
    all_infos = []
    all_actions = []
    for ticker, start_idx in sequences:
        env = make_env(df, ticker, feature_cols, episode_length)
        obs, _ = env.reset(start_index=start_idx)
        done = False
        info = {}
        episode_actions = []
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
            episode_actions.append(int(action))
        all_infos.append(info)
        all_actions.extend(episode_actions)
    infos_df = pd.DataFrame(all_infos)
    scalar_cols = [col for col in infos_df.columns if is_scalar_series(infos_df[col])]
    metrics = {f"mean_{col}": infos_df[col].mean() for col in scalar_cols}
    metrics.update({f"std_{col}": infos_df[col].std() for col in scalar_cols})
    action_counts = pd.Series(all_actions).value_counts(normalize=True).to_dict()
    metrics["action_counts"] = action_counts
    metrics["action_entropy"] = -sum(p * np.log(p + 1e-8) for p in action_counts.values())
    return metrics

# --- Resumability: Load existing results ---
results = []
done_keys = set()

if os.path.exists(RESULTS_PATH):
    results_df = pd.read_csv(RESULTS_PATH)
    required_cols = {'split', 'split_start', 'agent', 'seed'}
    if required_cols.issubset(results_df.columns):
        done_keys = set(zip(
            results_df['split'], results_df['split_start'], results_df['agent'], results_df['seed']
        ))
        results = results_df.to_dict('records')
        print(f"Loaded {len(done_keys)} previously completed results.")
    else:
        print(f"WARNING: Existing {RESULTS_PATH} is missing required columns or is from an old experiment.")
        backup_path = RESULTS_PATH.replace(".csv", "_backup.csv")
        os.rename(RESULTS_PATH, backup_path)
        print(f"Backed up old file to {backup_path}. Starting new results file.")

# --- Precompute episode sequences for all splits, types, and seeds ---
episode_sequences = {}  # (split_type, split_start, seed) -> list of (ticker, start_idx)

for split in walk_forward_splits:
    train_start, train_end, test_end = split
    test_start = train_end
    df_train = ohlcv_df[(ohlcv_df['date'] >= train_start) & (ohlcv_df['date'] < train_end)]
    df_test  = ohlcv_df[(ohlcv_df['date'] >= test_start) & (ohlcv_df['date'] < test_end)]
    for split_type, df, split_start in [
        ("train", df_train, train_start),
        ("test",  df_test,  test_start),
    ]:
        for seed in range(N_SEEDS):
            seqs = generate_episode_sequences(df, EPISODE_LENGTH, N_EPISODES, EXCLUDED_TICKERS, seed=seed)
            if len(seqs) == 0:
                print(f"WARNING: No episodes for {split_type} {split_start} seed={seed}")
            episode_sequences[(split_type, split_start, seed)] = seqs

# --- Main walk-forward benchmark ---
for split in tqdm(walk_forward_splits, desc="Splits"):
    train_start, train_end, test_end = split
    test_start = train_end
    df_train = ohlcv_df[(ohlcv_df['date'] >= train_start) & (ohlcv_df['date'] < train_end)]
    df_test  = ohlcv_df[(ohlcv_df['date'] >= test_start) & (ohlcv_df['date'] < test_end)]

    if len(df_train) < EPISODE_LENGTH or len(df_test) < EPISODE_LENGTH:
        print(f"Skipping split {split}: Not enough data")
        continue

    for agent_type in AGENT_TYPES:
        for seed in range(N_SEEDS):
            for split_type, df, split_start, split_end in [
                ("train", df_train, train_start, train_end),
                ("test",  df_test,  test_start,  test_end)
            ]:
                key = (split_type, split_start, agent_type, seed)
                if key in done_keys:
                    print(f"Skipping {key} (already done)")
                    continue

                sequences = episode_sequences.get((split_type, split_start, seed), [])
                if len(sequences) == 0:
                    print(f"No episodes to sample in {split_type} {split} seed={seed}")
                    continue

                # Train only on train split, eval on both
                if split_type == "train":
                    ticker = sequences[0][0]  # For env construction
                    env = make_env(df, ticker, FEATURE_COLS, EPISODE_LENGTH)
                    vec_env = DummyVecEnv([lambda: env])
                    np.random.seed(seed)
                    torch.manual_seed(seed)
                    if agent_type == 'mlp':
                        model = PPO(
                            "MlpPolicy", vec_env,
                            verbose=0, seed=seed,
                            batch_size=CONFIG["batch_size"], n_steps=CONFIG['n_steps']
                        )
                    elif agent_type == 'lstm':
                        model = RecurrentPPO(
                            "MlpLstmPolicy", vec_env,
                            verbose=0, seed=seed,
                            batch_size=CONFIG["batch_size"], n_steps=CONFIG['n_steps']
                        )
                    elif agent_type == 'transformer_single':
                        model = PPO(
                            TransformerPolicy, vec_env,
                            policy_kwargs={'nhead': 1, 'num_layers': 1},
                            verbose=0, seed=seed,
                            batch_size=CONFIG["batch_size"], n_steps=CONFIG['n_steps']
                        )
                    elif agent_type == 'transformer_multi':
                        model = PPO(
                            TransformerPolicy, vec_env,
                            policy_kwargs={'nhead': 4, 'num_layers': 2},
                            verbose=0, seed=seed,
                            batch_size=CONFIG["batch_size"], n_steps=CONFIG['n_steps']
                        )
                    model.learn(total_timesteps=CONFIG["total_timesteps"])

                # Evaluate on current split (train or test)
                metrics = evaluate_agent(model, df, sequences, FEATURE_COLS, EPISODE_LENGTH)
                result = {
                    "split": split_type,
                    "split_start": split_start,
                    "split_end": split_end,
                    "agent": agent_type,
                    "seed": seed,
                }
                result.update(metrics)
                results.append(result)
                pd.DataFrame(results).to_csv(RESULTS_PATH, index=False)
                #for k, v in metrics.items():
                #    print(f"{split_type}_{agent_type}_{k}", v)
                #print(f"Done: {result}")
                print(f"Complete {key} ")
                
print("\nFinished all splits. Final summary:")
results_df = pd.DataFrame(results)
results_df.groupby(['split', 'agent']).mean(numeric_only=True)


Loaded 19 previously completed results.


Splits:   0%|          | 0/2 [00:00<?, ?it/s]

Skipping ('train', '2023-01-01', 'mlp', 0) (already done)
Skipping ('test', '2023-07-01', 'mlp', 0) (already done)
Skipping ('train', '2023-01-01', 'mlp', 1) (already done)
Skipping ('test', '2023-07-01', 'mlp', 1) (already done)
Skipping ('train', '2023-01-01', 'mlp', 2) (already done)
Skipping ('test', '2023-07-01', 'mlp', 2) (already done)
Skipping ('train', '2023-01-01', 'lstm', 0) (already done)
Skipping ('test', '2023-07-01', 'lstm', 0) (already done)
Skipping ('train', '2023-01-01', 'lstm', 1) (already done)
Skipping ('test', '2023-07-01', 'lstm', 1) (already done)
Skipping ('train', '2023-01-01', 'lstm', 2) (already done)
Skipping ('test', '2023-07-01', 'lstm', 2) (already done)
Skipping ('train', '2023-01-01', 'transformer_single', 0) (already done)
Skipping ('test', '2023-07-01', 'transformer_single', 0) (already done)
Skipping ('train', '2023-01-01', 'transformer_single', 1) (already done)
Skipping ('test', '2023-07-01', 'transformer_single', 1) (already done)
Skipping ('tra

Splits:  50%|█████     | 1/2 [00:28<00:28, 28.93s/it]

Complete ('test', '2023-07-01', 'transformer_multi', 2) 
Complete ('train', '2024-01-01', 'mlp', 0) 
Complete ('test', '2024-07-01', 'mlp', 0) 
Complete ('train', '2024-01-01', 'mlp', 1) 
Complete ('test', '2024-07-01', 'mlp', 1) 
Complete ('train', '2024-01-01', 'mlp', 2) 
Complete ('test', '2024-07-01', 'mlp', 2) 
Complete ('train', '2024-01-01', 'lstm', 0) 
Complete ('test', '2024-07-01', 'lstm', 0) 
Complete ('train', '2024-01-01', 'lstm', 1) 
Complete ('test', '2024-07-01', 'lstm', 1) 
Complete ('train', '2024-01-01', 'lstm', 2) 
Complete ('test', '2024-07-01', 'lstm', 2) 
Complete ('train', '2024-01-01', 'transformer_single', 0) 
Complete ('test', '2024-07-01', 'transformer_single', 0) 
Complete ('train', '2024-01-01', 'transformer_single', 1) 
Complete ('test', '2024-07-01', 'transformer_single', 1) 
Complete ('train', '2024-01-01', 'transformer_single', 2) 
Complete ('test', '2024-07-01', 'transformer_single', 2) 
Complete ('train', '2024-01-01', 'transformer_multi', 0) 
Comple

Splits: 100%|██████████| 2/2 [03:04<00:00, 92.40s/it] 

Complete ('test', '2024-07-01', 'transformer_multi', 2) 

Finished all splits. Final summary:





Unnamed: 0_level_0,Unnamed: 1_level_0,seed,mean_episode_sharpe,mean_episode_sortino,mean_episode_total_reward,mean_cumulative_return,mean_calmar,mean_max_drawdown,mean_win_rate,mean_alpha,std_episode_sharpe,std_episode_sortino,std_episode_total_reward,std_cumulative_return,std_calmar,std_max_drawdown,std_win_rate,std_alpha,action_entropy
split,agent,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1
test,lstm,1.0,-0.003046,0.017218,-0.019373,-0.008014,0.436019,0.208449,0.516667,-0.057638,0.094523,0.1485,0.256862,0.229958,1.334451,0.12296,0.485103,0.229958,0.473894
test,mlp,1.0,0.026756,0.064663,0.037001,0.050893,1.005697,0.171707,0.541667,0.001269,0.092662,0.164701,0.23635,0.274541,2.351576,0.093621,0.4532,0.274541,0.523273
test,transformer_multi,1.0,-0.020906,-0.010912,-0.006878,0.006541,0.364972,0.203345,0.491667,-0.043083,0.092396,0.157954,0.253562,0.293601,2.031629,0.075117,0.48591,0.293601,0.232417
test,transformer_single,1.0,-0.007059,0.008128,-0.026506,-0.015795,0.391341,0.211759,0.525,-0.065419,0.093594,0.151086,0.256978,0.230177,1.377273,0.11941,0.469392,0.230177,0.291752
train,lstm,1.0,0.012013,0.047523,0.029267,0.027484,0.66055,0.172421,0.433333,-0.065674,0.094472,0.171525,0.185874,0.19342,1.793349,0.075913,0.468352,0.19342,0.521109
train,mlp,1.0,0.007386,0.034946,0.017979,0.014827,0.515431,0.161548,0.463889,-0.078331,0.088598,0.135192,0.161559,0.16089,1.433661,0.080222,0.472129,0.16089,0.542128
train,transformer_multi,1.0,-0.017742,-0.009338,-0.027687,-0.032414,0.038703,0.178787,0.5,-0.125572,0.077843,0.115734,0.147831,0.137016,0.836562,0.074926,0.470878,0.137016,0.319299
train,transformer_single,1.0,0.019387,0.056244,0.047793,0.042041,0.538371,0.169278,0.325,-0.051117,0.084363,0.149505,0.160071,0.159366,1.170913,0.063959,0.419478,0.159366,0.281175


In [None]:
results_df.groupby(['split', 'agent']).mean(numeric_only=True)

In [13]:
df

Unnamed: 0,id,symbol,timestamp,date,open,high,low,close,volume,trade_count,...,vwap_change,trade_count_change,sector_id,industry_id,return_1d,vix,vix_norm,sp500,sp500_norm,market_return_1d
624,625,MMM,2024-07-01 04:00:00,2024-07-01,102.86,103.4494,100.2050,100.61,2705605.0,47196.0,...,-0.011600,-0.015499,unknown,unknown,-0.015461,0.1222,-0.017685,54.7509,0.002676,0.002676
625,626,MMM,2024-07-02 04:00:00,2024-07-02,100.56,101.9300,100.4600,101.62,2291274.0,43717.0,...,0.001638,-0.073714,unknown,unknown,0.010039,0.1203,-0.015548,55.0901,0.006195,0.006195
626,627,MMM,2024-07-03 04:00:00,2024-07-03,101.29,102.1500,100.6800,101.62,1230776.0,24937.0,...,0.002640,-0.429581,unknown,unknown,0.000000,0.1209,0.004988,55.3702,0.005084,0.005084
627,628,MMM,2024-07-05 04:00:00,2024-07-05,101.40,101.6600,100.6400,101.32,3059577.0,40548.0,...,-0.003695,0.626018,unknown,unknown,-0.002952,0.1248,0.017945,55.6719,0.005449,0.005449
628,629,MMM,2024-07-08 04:00:00,2024-07-08,101.51,102.7400,100.6200,101.10,2338695.0,37410.0,...,0.001051,-0.077390,unknown,unknown,-0.002171,0.1237,-0.008814,55.7285,0.001017,0.001017
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
429575,429576,SPY,2024-11-22 05:00:00,2024-11-22,593.66,596.1500,593.1525,595.51,38226390.0,346477.0,...,0.004474,-0.230292,unknown,unknown,0.003099,0.1524,-0.096621,59.6934,0.003468,0.003468
429576,429577,SPY,2024-11-25 05:00:00,2024-11-25,599.52,600.8600,595.2000,597.53,42441393.0,427181.0,...,0.004801,0.232927,unknown,unknown,0.003392,0.1460,-0.041995,59.8737,0.003020,0.003020
429577,429578,SPY,2024-11-26 05:00:00,2024-11-26,598.80,601.3300,598.0700,600.65,45621288.0,383149.0,...,0.003447,-0.103076,unknown,unknown,0.005221,0.1410,-0.034247,60.2163,0.005722,0.005722
429578,429579,SPY,2024-11-27 05:00:00,2024-11-27,600.46,600.8500,597.2800,598.83,34000163.0,332766.0,...,-0.001755,-0.131497,unknown,unknown,-0.003030,0.1410,0.000000,59.9874,-0.003801,-0.003801


In [12]:
results_df

Unnamed: 0,split,split_start,split_end,agent,seed,mean_episode_sharpe,mean_episode_sortino,mean_episode_total_reward,mean_cumulative_return,mean_calmar,...,std_episode_sharpe,std_episode_sortino,std_episode_total_reward,std_cumulative_return,std_calmar,std_max_drawdown,std_win_rate,std_alpha,action_counts,action_entropy
0,train,2023-01-01,2023-07-01,mlp,0,0.030434,0.07599,0.067848,0.070676,1.3215,...,0.111854,0.195593,0.217212,0.227883,2.724519,0.088135,0.483046,0.227883,"{0: 0.7343434343434343, 1: 0.18383838383838383...",0.742928
1,train,2023-01-01,2023-07-01,mlp,1,0.04905,0.084663,0.106694,0.098973,0.726095,...,0.067295,0.116762,0.135225,0.14522,1.078623,0.050302,0.527046,0.14522,"{1: 0.9565656565656566, 0: 0.03333333333333333...",0.2022657
2,train,2023-01-01,2023-07-01,mlp,2,0.027308,0.069499,0.068865,0.06525,0.994427,...,0.099196,0.157862,0.208642,0.20889,2.282008,0.073453,0.459468,0.20889,"{2: 0.5909090909090909, 1: 0.40505050505050505...",0.6992033
3,train,2023-01-01,2023-07-01,lstm,0,0.000861,0.032622,0.010508,0.019712,0.930239,...,0.114692,0.199133,0.252707,0.276069,2.728358,0.096097,0.527046,0.276069,"{2: 0.39191919191919194, 1: 0.3181818181818182...",1.09043
4,test,2023-07-01,2023-12-01,mlp,0,-0.004599,0.003448,-0.019133,-0.020448,0.171054,...,0.100938,0.121515,0.175353,0.162992,1.045763,0.100536,0.497214,0.162992,"{1: 0.6363636363636364, 0: 0.35454545454545455...",0.6979933
5,test,2023-07-01,2023-12-01,mlp,1,0.096087,0.156835,0.054453,0.055374,1.859839,...,0.060597,0.144408,0.078708,0.083064,2.214508,0.054527,0.273861,0.083064,"{0: 0.6797979797979798, 1: 0.31616161616161614...",0.6487035
6,test,2023-07-01,2023-12-01,mlp,2,0.036479,0.058427,0.05901,0.05323,0.663966,...,0.075797,0.103403,0.120942,0.123998,1.137068,0.067046,0.483046,0.123998,"{1: 0.7080808080808081, 0: 0.28585858585858587...",0.6333412
7,test,2023-07-01,2023-12-01,lstm,0,-0.004599,0.003448,-0.019133,-0.020448,0.171054,...,0.100938,0.121515,0.175353,0.162992,1.045763,0.100536,0.497214,0.162992,"{1: 0.6363636363636364, 0: 0.35454545454545455...",0.6979933
8,train,2023-01-01,2023-07-01,lstm,1,0.030312,0.066135,0.073419,0.070784,0.749906,...,0.084623,0.16731,0.188742,0.198477,1.530434,0.063955,0.527046,0.198477,"{1: 0.5373737373737374, 2: 0.3101010101010101,...",0.9836384
9,test,2023-07-01,2023-12-01,lstm,1,-0.023487,-0.018862,-0.046584,-0.047905,-0.05216,...,0.079739,0.106496,0.16765,0.15194,0.711243,0.091152,0.437798,0.15194,"{2: 0.42323232323232324, 1: 0.4090909090909091...",1.028986


In [14]:
episode_sequences

{('train', '2023-01-01', 0): [('SYF', 14),
  ('INVH', 6),
  ('DHI', 0),
  ('APA', 0),
  ('CARR', 18),
  ('MS', 20),
  ('IPG', 13),
  ('WST', 16),
  ('MAA', 12),
  ('LRCX', 21)],
 ('train', '2023-01-01', 1): [('HUBB', 11),
  ('PGR', 21),
  ('LNT', 3),
  ('SPG', 21),
  ('CPAY', 7),
  ('TEL', 9),
  ('DVA', 19),
  ('COST', 9),
  ('MPWR', 12),
  ('ACGL', 0)],
 ('train', '2023-01-01', 2): [('SBUX', 6),
  ('AXON', 6),
  ('GE', 18),
  ('HSY', 2),
  ('ENPH', 13),
  ('SLB', 16),
  ('ZBRA', 4),
  ('TXT', 1),
  ('LH', 6),
  ('CMG', 15)],
 ('test', '2023-07-01', 0): [('SNPS', 3),
  ('IQV', 1),
  ('DHI', 0),
  ('APA', 0),
  ('CARR', 4),
  ('MOS', 4),
  ('INTU', 3),
  ('WST', 3),
  ('MRNA', 2),
  ('LW', 4)],
 ('test', '2023-07-01', 1): [('HUM', 2),
  ('PGR', 4),
  ('LNT', 0),
  ('SWKS', 4),
  ('CPAY', 1),
  ('TDY', 2),
  ('DVA', 4),
  ('CTRA', 2),
  ('MNST', 2),
  ('ACGL', 0)],
 ('test', '2023-07-01', 2): [('SBUX', 1),
  ('AXON', 1),
  ('GE', 4),
  ('HES', 0),
  ('ETR', 3),
  ('STX', 3),
  ('ZBRA', 0