
# 00 — End-to-End Smoke Test (Demo using `data/ProbLearn100003.csv`)

This notebook is pinned to a single example subject CSV: **`data/ProbLearn100003.csv`**.
Keep using this same file name when you swap in real data so downstream cells don't break.

**Covers:**
1. `get_predicted_df` (RNG-aware, overflow-safe logistic)
2. `scripts/run_parallel.py` (`run_hyperthreaded` for parameter sweeps)
3. `select_optimal_parameters` (multi-start MLE with new model config helpers)
4. Quick QC & reproducibility checks


## 0) Environment & path setup

From the repo root, in PowerShell on Windows:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip setuptools wheel
pip install -r requirements.txt
pip install -e .

In [23]:
# Reimport everything needed for this section
import os
import sys
import pathlib
import numpy as np
import pandas as pd

# If you’ve restarted the kernel, re-detect the repo root too
cwd = pathlib.Path().resolve()
candidates = [cwd, cwd.parent, cwd.parent.parent, cwd.parent.parent.parent]
repo_root = None
for c in candidates:
    if (c / 'src').exists() and (c / 'scripts').exists():
        repo_root = c
        break
if repo_root is None:
    repo_root = cwd  # fallback

print("Detected repo root:", repo_root)


# Ensure imports work
sys.path.insert(0, str(repo_root))
sys.path.insert(0, str(repo_root / 'src'))


Detected repo root: C:\Users\Owner\Desktop\Neuro-Code\Neuro-Code


## 1) Imports

In [24]:

try:
    from neuro_code.helpers.get_predicted_df import get_predicted_df
    from scripts.run_parallel import run_hyperthreaded
    from neuro_code.helpers.select_optimal_parameters import select_optimal_parameters
    ok = True
except ImportError:
    # Print a hint, then re-raise with the original traceback intact.
    print("Import error. Ensure `pip install -e .` was run at the repo root and that this notebook lives inside the repo.")
    raise

print("Imports OK:", ok)


Imports OK: True


## 2) Load example subject CSV (pinned)

In [25]:
import pandas as pd
import numpy as np


DATA_CSV = repo_root / 'data' / 'ProbLearn100003.csv'  # <-- keep this file name consistent

if not DATA_CSV.exists():
    raise FileNotFoundError(f"Expected {DATA_CSV}. Place your CSV there (repo_root/data/ProbLearn100003.csv).")

df = pd.read_csv(DATA_CSV)
print("Loaded:", DATA_CSV)
print("Shape:", df.shape)
print("Columns:", list(df.columns))
df.head()


Loaded: C:\Users\Owner\Desktop\Neuro-Code\Neuro-Code\data\ProbLearn100003.csv
Shape: (180, 10)
Columns: ['Trial_type', 'Response', 'Points_earned', 'Trial_start_time', 'Stim_end_time', 'Reaction_time', 'ITI_start_time', 'ITI_length', 'Counterbalance', 'Unnamed: 9']


Unnamed: 0,Trial_type,Response,Points_earned,Trial_start_time,Stim_end_time,Reaction_time,ITI_start_time,ITI_length,Counterbalance,Unnamed: 9
0,2.0,1,-5,547263,549780,2250,551782,519.0,1,
1,3.0,1,100,552309,555951,3288,557953,1454.0,1,
2,1.0,1,5,559414,562091,1677,564093,1381.0,1,
3,4.0,0,0,565481,568094,2612,570096,2868.0,1,
4,4.0,1,-100,572971,576884,899,578886,3060.0,1,


### 2.1) (Optional) Column sanity check / adapter

In [26]:

# Expect: Trial_type, Response, Points_earned
required = ['Trial_type','Response','Points_earned']
missing = [c for c in required if c not in df.columns]
if missing:
    raise ValueError(f"Missing required columns: {missing}. Please adapt this cell for your schema.")

# If your columns need renaming, do it here (example left as a template):
# df = df.rename(columns={'old_trial': 'Trial_type', 'old_resp': 'Response', 'old_pts': 'Points_earned'})


## 3) `get_predicted_df` demo + sanity checks

In [27]:

pars_for_pred = {
    'beta': 0.05,
    'lossave': 0.8,
    'alpha': 0.1,   # symmetric alpha
    'exp': 1.0,     # symmetric exp
}
rng = np.random.default_rng(1337)
pred = get_predicted_df(df, pars_for_pred, rng=rng)

print("Predicted DF shape:", pred.shape)
print("Columns:", list(pred.columns))

# Basic validations
required_cols = ['EV','PE','choiceprob','pred_choice','pred_reward']
missing = [c for c in required_cols if c not in pred.columns]
assert not missing, f"Missing expected columns: {missing}"

print("\nNaN counts:")
print(pred.isna().sum())

print("\nChoice probability summary:")
print(pred['choiceprob'].describe())

print("\nEV/PE summary:")
print(pred[['EV','PE']].describe())


Predicted DF shape: (180, 8)
Columns: ['Trial_type', 'Response', 'Points_earned', 'EV', 'PE', 'choiceprob', 'pred_choice', 'pred_reward']

NaN counts:
Trial_type        0
Response          0
Points_earned     0
EV                0
PE               87
choiceprob        0
pred_choice       0
pred_reward       0
dtype: int64

Choice probability summary:
count    180.000000
mean       0.464631
std        0.239704
min        0.124575
25%        0.269054
50%        0.455588
75%        0.512497
max        0.952189
Name: choiceprob, dtype: float64

EV/PE summary:
               EV         PE
count  180.000000  93.000000
mean    -4.721825  -0.193976
std     26.760093   7.063588
min    -48.745000 -49.695000
25%    -24.985718  -3.325131
50%     -4.452905  -0.067543
75%      1.000000   4.353345
max     59.830041  10.000000
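The overview describes `get_predicted_df` as using an overflow-safe logistic. Its implementation is not shown in this notebook, so the following is only a sketch of one common way to evaluate a logistic without overflowing `exp`. The function name `stable_logistic` is an assumption for illustration.

```python
import numpy as np


def stable_logistic(x):
    """Evaluate 1 / (1 + exp(-x)) without overflowing exp()."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) <= 1, so the direct form is safe.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, use the equivalent form exp(x) / (1 + exp(x)); exp(x) < 1 here.
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out


# Extreme inputs that would overflow a naive exp(-x) for large negative x.
probs = stable_logistic([-1000.0, 0.0, 1000.0])
print(probs)
```

Splitting on the sign of `x` keeps every `exp` argument non-positive, so results stay in [0, 1] even for extreme value differences scaled by `beta`.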


## 4) Parallel sweep demo (`scripts/run_parallel.py`)

In [28]:

# Ensure output dirs
os.makedirs(repo_root / 'runs', exist_ok=True)

# Use the pinned CSV as input
param_ranges_demo = {
    'beta':    [0.01, 0.05],
    'lossave': [0.5, 1.0],
    'alpha':   [0.05],  # symmetric alpha
    'exp':     [1.0],   # symmetric exp
}

out = run_hyperthreaded(
    df_csv_path=str(DATA_CSV),
    param_ranges=param_ranges_demo,
    n_reps=2,
    max_workers=2,
    base_seed=42,
    out_csv_path=str(repo_root / 'runs' / 'all_sims_demo.csv'),
    write_incremental=False,
    out_dir=str(repo_root / 'runs' / 'sim_outputs_demo')
)

print("Rows:", len(out))
print("Columns:", list(out.columns)[:15], "...")
out.head()


Rows: 1440
Columns: ['Trial_type', 'Response', 'Points_earned', 'EV', 'PE', 'choiceprob', 'pred_choice', 'pred_reward', 'run_id', 'seed', 'param_beta', 'param_lossave', 'param_alpha', 'param_exp'] ...
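The row count can be sanity-checked against the sweep configuration. The sketch below assumes that `run_hyperthreaded` simulates the full Cartesian product of `param_ranges` once per repetition and emits one row per trial. That behavior is not confirmed by this notebook, but it is consistent with the 1440 rows reported.

```python
from itertools import product

# Same grid as the demo cell above.
param_ranges_demo = {
    "beta": [0.01, 0.05],
    "lossave": [0.5, 1.0],
    "alpha": [0.05],
    "exp": [1.0],
}
n_reps = 2      # repetitions per parameter combination
n_trials = 180  # rows in the pinned subject CSV

# Assumption: every combination in the Cartesian product is run n_reps times.
combos = list(product(*param_ranges_demo.values()))
expected_rows = len(combos) * n_reps * n_trials

print(len(combos), expected_rows)  # 4 1440
```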


Unnamed: 0,Trial_type,Response,Points_earned,EV,PE,choiceprob,pred_choice,pred_reward,run_id,seed,param_beta,param_lossave,param_alpha,param_exp
0,2.0,1,-5,0.0,,0.5,2,0.0,0,42,0.01,0.5,0.05,1.0
1,3.0,1,100,0.0,-0.25,0.5,1,-5.0,0,42,0.01,0.5,0.05,1.0
2,1.0,1,5,0.0,,0.5,2,0.0,0,42,0.01,0.5,0.05,1.0
3,4.0,0,0,0.0,0.25,0.5,1,5.0,0,42,0.01,0.5,0.05,1.0
4,4.0,1,-100,0.25,,0.500625,2,0.0,0,42,0.01,0.5,0.05,1.0
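In the five rows printed above, `PE` is missing exactly where `pred_choice` is 2 (and `pred_reward` is 0.0). Whether that pattern explains all missing `PE` values is an assumption worth checking rather than a documented rule. Here is a minimal sketch of the check, using only those five rows:

```python
import numpy as np
import pandas as pd

# The five rows printed above, restricted to the relevant columns.
head = pd.DataFrame({
    "PE": [np.nan, -0.25, np.nan, 0.25, np.nan],
    "pred_choice": [2, 1, 2, 1, 2],
    "pred_reward": [0.0, -5.0, 0.0, 5.0, 0.0],
})

# Fraction of missing PE within each predicted choice.
missing_by_choice = head["PE"].isna().groupby(head["pred_choice"]).mean()
print(missing_by_choice)
```

Running the same expression on the full `pred` or `out` frame would show whether missing `PE` is fully accounted for by `pred_choice`. A fraction strictly between 0 and 1 for any choice would point to another source of NaNs.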


### 4.1) Quick QC & reproducibility checks

In [29]:

print(out[['choiceprob','EV','PE']].describe())

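# Repeat a small sweep with the same base_seed to check determinism.
# NOTE: this repeat uses n_reps=1 while `out` used n_reps=2, so the grouped
# sums below cover different sets of rows. `cmp1.equals(cmp2)` therefore
# returns False whether or not the simulation is deterministic; match n_reps
# (or compare only shared groups) before reading False as nondeterminism.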
out2 = run_hyperthreaded(
    df_csv_path=str(DATA_CSV),
    param_ranges=param_ranges_demo,
    n_reps=1,
    max_workers=2,
    base_seed=42,
    out_csv_path=str(repo_root / 'runs' / 'all_sims_demo_repeat.csv'),
    write_incremental=False,
    out_dir=str(repo_root / 'runs' / 'sim_outputs_demo_repeat')
)

gcols = [c for c in out.columns if c.startswith('param_')] + ['run_id', 'seed']
merge_cols = gcols + ['Trial_type']
cmp1 = out.groupby(merge_cols)['choiceprob'].sum()
cmp2 = out2.groupby(merge_cols)['choiceprob'].sum()
print("Deterministic reproducibility:", cmp1.equals(cmp2))


        choiceprob           EV          PE
count  1440.000000  1440.000000  746.000000
mean      0.510034     0.859399   -0.227398
std       0.153936    26.655436    6.246035
min       0.105504   -83.630836  -24.921664
25%       0.440275   -16.904224   -2.145214
50%       0.500000     0.000000   -0.157562
75%       0.565524    20.643594    2.253973
max       0.934946    76.177687   24.905662
Deterministic reproducibility: False
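The `False` above does not by itself show nondeterminism: `out` was produced with `n_reps=2` and `out2` with `n_reps=1`, so the two grouped Series cannot match under `.equals`. Below is a sketch of a comparison restricted to groups present in both sweeps. The helper `compare_shared_groups` and the toy frames (including the per-run seed values) are hypothetical.

```python
import numpy as np
import pandas as pd


def compare_shared_groups(a, b, keys, value="choiceprob"):
    """Compare per-group sums of `value` on groups present in both frames."""
    s1 = a.groupby(keys)[value].sum()
    s2 = b.groupby(keys)[value].sum()
    shared = s1.index.intersection(s2.index)
    if len(shared) == 0:
        return False, 0
    # Reindex both to the same group order, then compare with a float tolerance.
    same = np.allclose(s1.reindex(shared).to_numpy(), s2.reindex(shared).to_numpy())
    return bool(same), len(shared)


# Toy stand-ins: `second` mimics a repeat sweep with fewer repetitions.
first = pd.DataFrame({
    "run_id": [0, 0, 1, 1],
    "seed": [42, 42, 43, 43],
    "Trial_type": [1.0, 2.0, 1.0, 2.0],
    "choiceprob": [0.5, 0.6, 0.4, 0.7],
})
second = first[first["run_id"] == 0].copy()
keys = ["run_id", "seed", "Trial_type"]

naive = first.groupby(keys)["choiceprob"].sum().equals(
    second.groupby(keys)["choiceprob"].sum()
)
print(naive)                                       # False: indexes differ
print(compare_shared_groups(first, second, keys))  # (True, 2)
```

This only helps if both sweeps assign the same `run_id` and `seed` to the same parameter combination. How `run_hyperthreaded` derives those values is not shown here, so it is worth inspecting the number of shared groups as well as the boolean.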


## 5) MLE demo (`select_optimal_parameters`)

In [30]:

# Example MLE spec; update for your actual model presets
PARS_SPEC = {
    "alpha": None,     # fit
    "beta": None,      # fit
    "lossave": 0.8,    # fixed
    "exp": 1.0,        # fixed
    "model": "sym_alpha_exp"
}

best, table = select_optimal_parameters(
    data=df,
    subject="SUBJ_100003",
    n_fits=8,
    pars=PARS_SPEC,
    save=False,
    output_path=None,
    return_full_table=True,
    base_seed=2025
)

print("Best-fit (fitted keys only):", best)
table.sort_values('neglogprob').head()


TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
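This `TypeError` is what NumPy raises when `np.isnan` receives an object-dtype array, for example one containing `None` or strings. `PARS_SPEC` holds `None` placeholders and a `"model"` string, so that is one plausible trigger. The internals of `select_optimal_parameters` are not visible in this notebook, so treat the following as a reproduction of the error class rather than a diagnosis.

```python
import numpy as np
import pandas as pd

# Mirrors the numeric slots of PARS_SPEC: None marks "fit", floats are fixed.
spec_values = np.array([None, None, 0.8, 1.0], dtype=object)

raised = False
try:
    np.isnan(spec_values)  # object dtype is not supported by the isnan ufunc
except TypeError as err:
    raised = True
    print("np.isnan failed:", type(err).__name__)

# pd.isna accepts object arrays and treats None as missing.
to_fit = pd.isna(spec_values)
print(to_fit)
```

If the helper expects NaN rather than `None` for parameters to fit, passing `float("nan")` in the spec might avoid the error, but that depends on the helper's contract and should be confirmed against its source.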


## 6) Notes
- This notebook is pinned to **`data/ProbLearn100003.csv`** — keep that filename consistent.
- To test a different subject, either overwrite that file or duplicate this notebook and adjust `DATA_CSV`.
- Scale up `n_reps` (parallel) and `n_fits` (MLE) when you move beyond the demo configuration.
