# **Hyperparameter search - A structured, constrained strategy**

## **Notebook objective**
This notebook designs a **rigorous, efficient and reproducible hyperparameter search** for our agents in the *Permuted MNIST* challenge.  
Unlike an exhaustive or uncontrolled random search, the approach presented here relies on a **progressive, modular methodology** adapted to the **strict platform constraints** and to the **nature of the problem** (multi-task classification on permuted image data).

---

## **Context and constraints**
The challenge imposes several strict hardware and time constraints:
- **CPU only**, no GPU allowed  
- **2 threads maximum** for computation  
- **Memory ≤ 4 GB**
- **Run time per task ≤ 60 seconds**, including:
  - agent initialization  
  - training  
  - prediction on the test set

These limits rule out models that are too heavy (convolutional ones in particular), which naturally points toward **MLP architectures** optimized for speed and stability on CPU.
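
As an illustration of how thread and GPU limits of this kind can be set from inside a process, here is a minimal sketch. It assumes the limits are controlled through the usual `OMP_NUM_THREADS`, `MKL_NUM_THREADS`, `OPENBLAS_NUM_THREADS` and `CUDA_VISIBLE_DEVICES` environment variables; the helper name `limit_cpu_threads` is hypothetical and is not the project's own `apply_challenge_limits`.

```python
import os

def limit_cpu_threads(threads: int = 2, forbid_gpu: bool = True) -> dict:
    """Hypothetical helper: set thread/GPU environment variables.

    Should run before numpy/torch are imported, because BLAS/OpenMP
    libraries typically read these variables once, at load time.
    """
    settings = {
        "OMP_NUM_THREADS": str(threads),
        "MKL_NUM_THREADS": str(threads),
        "OPENBLAS_NUM_THREADS": str(threads),
    }
    if forbid_gpu:
        # Hide every CUDA device from this process.
        settings["CUDA_VISIBLE_DEVICES"] = "-1"
    os.environ.update(settings)
    return settings

applied = limit_cpu_threads(threads=2)
print(applied)
```

The memory cap and the 60-second limit are not covered by this sketch; they need separate mechanisms (for example a resource limit and a wall-clock timer).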

---

## **Search philosophy**
The goal is not simply to "launch a giant grid search" but to **build a methodical exploration protocol**:
1. **Define a reasonable search space**, centered on the hyperparameters that actually matter (layer sizes, dropout, batch size, learning rate, label smoothing, etc.).  
2. **Evaluate progressively**: configurations go through a multi-phase strategy with **adaptive pruning**, which eliminates ineffective models very early.  
3. **Measure systematically** the accuracy and run time of every configuration, in order to balance **accuracy and time feasibility**.  
4. **Keep the complete results** (CSV + JSON) at every step for analysis and traceability.  

This approach aims to **follow the scientific reasoning** of a rigorous tuning process: formulate hypotheses, test them under constraints, and retain the models with the best trade-off between **performance, robustness and time efficiency**.

---

## **Structure of the approach**
The search runs in **three successive phases**, each with a specific role:

| Phase | Objective | # Tasks | Pruning | Description |
|:------|:----------|:--------|:---------|:-------------|
| **A: Broad exploration** | Sweep a large number of combinations to identify global trends. | 3 | Accuracy + time | Quickly spots promising hyperparameters while eliminating models that are too slow or under-performing. |
| **B: Refinement** | Confirm the best configurations from phase A. | 6 | Accuracy + time | More stable evaluation, stricter pruning thresholds. |
| **C: Final validation** | Measure the statistical robustness of the top models. | 10 | Time only | Full evaluation on 10 tasks to obtain representative means and standard deviations. |

Each phase produces a CSV report (a JSON summary of the final top configurations is written at the end) recording:
- the exact configuration tested;
- the mean and standard deviation of the accuracies;
- the mean time per task and the total time;
- the pruning flags.
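
To get a feel for the cost of this funnel, the worst-case number of task runs can be computed from the phase sizes. The sketch below uses the sizes that appear later in this notebook (30 configurations in A, 8 kept for B, 3 kept for C) and assumes, pessimistically, that every task uses the full 60 seconds; the function name `worst_case_budget` is only illustrative.

```python
def worst_case_budget(n_configs: dict, n_tasks: dict, task_limit_s: float = 60.0):
    """Upper bound on task runs and wall-clock hours, ignoring pruning."""
    runs = {phase: n_configs[phase] * n_tasks[phase] for phase in n_configs}
    total_runs = sum(runs.values())
    hours = total_runs * task_limit_s / 3600.0
    return runs, total_runs, hours

runs, total_runs, hours = worst_case_budget(
    n_configs={"A": 30, "B": 8, "C": 3},
    n_tasks={"A": 3, "B": 6, "C": 10},
)
print(runs, total_runs, round(hours, 2))
```

Pruning and fast models make real runs much cheaper than this bound, which is exactly what the early-stopping rules are for.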

---

## **Modularity principle**
The code is fully **modular and reusable**.  
It relies on a set of generic functions that make it possible to:
- instantiate an agent dynamically from its module (`agent_spec`);
- define an arbitrary **search space** (`search_space`);
- control the **pruning** rules and the sample sizes of each phase;
- automate the whole `Phase A → Phase B → Phase C` sequence with a single call:  
  ```python
  results = run_hparam_search(EXPERIMENT)
  ```


## **0) Setup & constraints (run this first)**


In [None]:
# --- Standard-library imports (no sys.path hack) ---
import os, sys, time, json, csv, random, itertools, importlib, inspect
from pathlib import Path
from typing import Dict, List, Tuple, Callable, Iterable

# --- Platform limits, applied BEFORE importing numpy/torch so that the
#     thread settings are in place when the BLAS/OpenMP libraries load ---
from permuted_mnist.limits_perf3 import apply_challenge_limits, print_limits
apply_challenge_limits(threads=2, forbid_gpu=True, ram_gb=4, show=True)
print_limits()

# --- Numerical / plotting imports (after the limits are applied) ---
import numpy as np
import matplotlib.pyplot as plt

# --- PermutedMNIST environment ---
from permuted_mnist.env.permuted_mnist import PermutedMNISTEnv

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

[limits] OMP/BLAS threads=2 | CUDA_VISIBLE_DEVICES=-1
[limits] RLIMIT_AS: soft=8589934591GB hard=8589934591GB
[limits] OMP/BLAS threads=2 | CUDA_VISIBLE_DEVICES=-1
[limits] RLIMIT_AS: soft=8589934591GB hard=8589934591GB



## **1) Generic agent factory**


In [21]:
def make_agent_factory(agent_spec: Tuple[str, str],
                       fixed_kwargs: Dict | None = None,
                       seed: int = 42) -> Callable[[Dict], object]:
    """
    Return a function make_agent(cfg) that instantiates the agent defined by agent_spec,
    automatically dropping the kwargs that the class's __init__ does not accept.
    """
    if fixed_kwargs is None:
        fixed_kwargs = {}

    mod = importlib.import_module(agent_spec[0])
    AgentCls = getattr(mod, agent_spec[1])

    sig = inspect.signature(AgentCls.__init__)
    allowed = set(sig.parameters.keys()) - {"self"}

    def make_agent(cfg: Dict) -> object:
        kw = dict(fixed_kwargs); kw.update(cfg)
        # Useful defaults when absent
        kw.setdefault("seed", seed)
        kw.setdefault("output_dim", 10)
        # Drop any key that AgentCls.__init__ does not accept
        kw = {k: v for k, v in kw.items() if k in allowed}
        return AgentCls(**kw)

    return make_agent
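
The filtering step can be seen in isolation on a toy class. This is a minimal sketch: `ToyAgent` and `filter_kwargs` are hypothetical names used only for illustration, not part of the project.

```python
import inspect

class ToyAgent:
    """Hypothetical agent used only for this illustration."""
    def __init__(self, hidden=(256,), dropout=0.0, seed=0):
        self.hidden, self.dropout, self.seed = hidden, dropout, seed

def filter_kwargs(cls, kwargs: dict) -> dict:
    """Keep only the keyword arguments named in cls.__init__."""
    allowed = set(inspect.signature(cls.__init__).parameters) - {"self"}
    return {k: v for k, v in kwargs.items() if k in allowed}

cfg = {"hidden": (512,), "dropout": 0.05, "label_smoothing": 0.05, "output_dim": 10}
kept = filter_kwargs(ToyAgent, cfg)
agent = ToyAgent(**kept)
print(kept)  # label_smoothing and output_dim are dropped silently
```

One consequence worth keeping in mind: a hyperparameter that the agent does not accept is dropped without warning, so two configurations that differ only in that parameter would train identical models.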


## **2) Search-space generation (grid) and utilities**


In [22]:

def product_space(space: Dict[str, Iterable]) -> List[Dict]:
    """
    space: {"param": [v1, v2, ...], "param2": [...]}
    -> list of dicts (every combination, shuffled with the global `random` state)
    """
    keys = list(space.keys())
    vals = [list(space[k]) for k in keys]
    out = []
    for combo in itertools.product(*vals):
        out.append({k: v for k, v in zip(keys, combo)})
    random.shuffle(out)
    return out

def pick_top(results: List[Dict], k: int) -> List[Dict]:
    """
    Penalized ranking: prune penalty first, then higher accuracy, then lower mean time.
    results: outputs of eval_cfg(...)
    """
    def key(r):
        pen = (1.0 if r["pruned_time"] else 0.0) + (0.5 if r["pruned_acc"] else 0.0)
        return (pen, -r["mean_acc"], r["mean_time"])
    return sorted(results, key=key)[:k]

def save_phase_csv(path: Path, results: List[Dict], param_keys: List[str]):
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        header = param_keys + ["mean_acc","std_acc","mean_time","total_time","n_tasks","pruned_time","pruned_acc"]
        w.writerow(header)
        for r in results:
            row = [r["config"].get(k, None) for k in param_keys]
            row += [r["mean_acc"], r["std_acc"], r["mean_time"], r["total_time"], r["n_tasks"],
                    r["pruned_time"], r["pruned_acc"]]
            w.writerow(row)
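
The effect of the penalized sort key is easiest to see on a few made-up results. The sketch below copies the key used by `pick_top`; the four result dictionaries and their names are invented for illustration.

```python
def penalized_key(r: dict):
    # Same ordering as pick_top: penalty first, then accuracy (desc), then time (asc).
    penalty = (1.0 if r["pruned_time"] else 0.0) + (0.5 if r["pruned_acc"] else 0.0)
    return (penalty, -r["mean_acc"], r["mean_time"])

toy_results = [
    {"name": "fast_ok",   "mean_acc": 0.975, "mean_time": 5.0,  "pruned_time": False, "pruned_acc": False},
    {"name": "slow_best", "mean_acc": 0.990, "mean_time": 75.0, "pruned_time": True,  "pruned_acc": False},
    {"name": "weak",      "mean_acc": 0.950, "mean_time": 4.0,  "pruned_time": False, "pruned_acc": True},
    {"name": "good",      "mean_acc": 0.982, "mean_time": 12.0, "pruned_time": False, "pruned_acc": False},
]

ranking = [r["name"] for r in sorted(toy_results, key=penalized_key)]
print(ranking)
```

Pruned configurations are pushed to the end rather than removed, so they can still be selected when fewer than `k` unpruned configurations remain.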


## **3) Evaluating one config with pruning (multi-task)**


In [23]:
def eval_cfg(cfg: Dict,
             make_agent: Callable[[Dict], object],
             env_tasks: int,
             prune_rules: Dict,
             seed: int = 42,
             include_init_time: bool = False) -> Dict:
    """
    prune_rules expects:
      {
        "time_budget_s": 58.0,
        "time_factor_stop": 1.20,     # stop if t_task > factor * budget
        "phase": "A" / "B" / None,
        "A_first": 0.970, "A_mean": 0.970,
        "B_first": 0.980, "B_mean": 0.980
      }
    include_init_time:
      - False: time train+predict only
      - True : time init+reset+train+predict (close to the platform's measurement)
    """
    env = PermutedMNISTEnv(number_episodes=env_tasks)
    env.set_seed(seed)

    accs, times = [], []
    pruned_time = False
    pruned_acc = False
    phase = prune_rules.get("phase", None)

    t_id = 0
    while True:
        task = env.get_next_task()
        if task is None:
            break
        t_id += 1

        # Start the clock before instantiation when init time must be counted;
        # otherwise time train+predict only. perf_counter is monotonic.
        if include_init_time:
            t_start = time.perf_counter()
            agent = make_agent(cfg)
            agent.reset()
        else:
            agent = make_agent(cfg)
            agent.reset()
            t_start = time.perf_counter()

        agent.train(task["X_train"], task["y_train"])
        preds = agent.predict(task["X_test"])
        elapsed = time.perf_counter() - t_start

        acc = env.evaluate(preds, task["y_test"])
        accs.append(acc)
        times.append(elapsed)

        # Time-based pruning: stop as soon as one task exceeds factor * budget
        if elapsed > prune_rules["time_factor_stop"] * prune_rules["time_budget_s"]:
            pruned_time = True
            break

        # Accuracy-based pruning (phases A and B only): threshold on the first
        # task, then on the running mean
        if phase in ("A", "B"):
            first_thr = prune_rules[f"{phase}_first"]
            mean_thr = prune_rules[f"{phase}_mean"]
            if len(accs) == 1 and accs[0] < first_thr:
                pruned_acc = True
                break
            if len(accs) >= 2 and np.mean(accs) < mean_thr:
                pruned_acc = True
                break

        if t_id >= env_tasks:
            break

    return {
        "config": cfg,
        "mean_acc": float(np.mean(accs) if accs else 0.0),
        "std_acc": float(np.std(accs) if accs else 0.0),
        "mean_time": float(np.mean(times) if times else 0.0),
        "total_time": float(np.sum(times)),
        "n_tasks": len(accs),
        "pruned_time": pruned_time,
        "pruned_acc": pruned_acc
    }
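
The accuracy-pruning rule can be replayed on a plain list of per-task accuracies. This is a minimal sketch with a hypothetical helper `tasks_before_prune`; the thresholds 0.960 and 0.965 are the Phase A values used for the 1-layer experiment further down, and the accuracy sequences are invented.

```python
from statistics import mean

def tasks_before_prune(accs, first_thr, mean_thr):
    """Return (n_tasks_evaluated, pruned) for a sequence of per-task accuracies."""
    seen = []
    for acc in accs:
        seen.append(acc)
        # Rule 1: the very first task must reach first_thr.
        if len(seen) == 1 and seen[0] < first_thr:
            return len(seen), True
        # Rule 2: from the second task on, the running mean must reach mean_thr.
        if len(seen) >= 2 and mean(seen) < mean_thr:
            return len(seen), True
    return len(seen), False

weak_start = tasks_before_prune([0.955, 0.980, 0.980], 0.960, 0.965)
weak_mean = tasks_before_prune([0.963, 0.962, 0.980], 0.960, 0.965)
survivor = tasks_before_prune([0.981, 0.982, 0.980], 0.960, 0.965)
print(weak_start, weak_mean, survivor)
```

A configuration that fails the first-task threshold costs a single task; one that passes it but has a low running mean costs two.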


## **4) Orchestrating one phase (A / B / C)**


In [24]:
def run_phase(label: str,
              cfg_list: List[Dict],
              make_agent: Callable[[Dict], object],
              env_tasks: int,
              prune_rules: Dict,
              out_csv: Path | None = None,
              param_keys: List[str] | None = None,
              include_init_time: bool = False) -> List[Dict]:
    print(f"\n=== Phase {label}: N={len(cfg_list)}, tasks={env_tasks}, phase={prune_rules.get('phase', None)} ===")
    results = []
    for i, cfg in enumerate(cfg_list, 1):
        r = eval_cfg(cfg,
                     make_agent=make_agent,
                     env_tasks=env_tasks,
                     prune_rules=prune_rules,
                     seed=SEED,
                     include_init_time=include_init_time)
        tag = " PRUNE[T]" if r["pruned_time"] else (" PRUNE[A]" if r["pruned_acc"] else "")
        print(f"[{label} {i}/{len(cfg_list)}] acc={r['mean_acc']:.4f}±{r['std_acc']:.4f} | "
              f"t={r['mean_time']:.1f}s | tasks={r['n_tasks']}{tag}")
        results.append(r)
    if out_csv and param_keys:
        save_phase_csv(out_csv, results, param_keys)
    return results

## **5) Ready-to-use full pipeline (A -> B -> C)**

In [25]:

def run_hparam_search(experiment: Dict) -> Dict:
    """
    experiment = {
        "agent_spec": ("module.path", "AgentClass"),
        "fixed_kwargs": {"time_budget_s": 55.0, ...},   # fixed kwargs passed to the Agent
        "search_space": {...},                          # dict: param -> list of values
        "n_A": 30, "n_B_keep": 8, "n_C_keep": 3,
        "tasks": {"A": 3, "B": 6, "C": 10},
        "prune": {
            "A": {"time_budget_s": 58.0, "time_factor_stop": 1.20, "phase": "A",
                  "A_first": 0.970, "A_mean": 0.970},
            "B": {"time_budget_s": 58.0, "time_factor_stop": 1.20, "phase": "B",
                  "B_first": 0.980, "B_mean": 0.980},
            "C": {"time_budget_s": 58.0, "time_factor_stop": 1.20, "phase": None}
        },
        "outdir": Path(".../experiments/run_X"),
        "seed": 42
    }
    """
    outdir = experiment["outdir"]; outdir.mkdir(parents=True, exist_ok=True)
    param_keys = list(experiment["search_space"].keys())

    # Agent factory
    make_agent = make_agent_factory(
        agent_spec=experiment["agent_spec"],
        fixed_kwargs=experiment.get("fixed_kwargs", {}),
        seed=experiment.get("seed", 42)
    )

    # Phase A: the first n_A configurations of the shuffled grid
    all_cfgs = product_space(experiment["search_space"])
    A_cfgs = all_cfgs[:experiment["n_A"]]
    A = run_phase("A", A_cfgs, make_agent,
                  env_tasks=experiment["tasks"]["A"],
                  prune_rules=experiment["prune"]["A"],
                  out_csv=outdir/"phase_A.csv", param_keys=param_keys)

    # Selection for Phase B
    B_cfgs = [r["config"] for r in pick_top(A, experiment["n_B_keep"])]
    B = run_phase("B", B_cfgs, make_agent,
                  env_tasks=experiment["tasks"]["B"],
                  prune_rules=experiment["prune"]["B"],
                  out_csv=outdir/"phase_B.csv", param_keys=param_keys)

    # Selection for Phase C
    C_cfgs = [r["config"] for r in pick_top(B, experiment["n_C_keep"])]
    C = run_phase("C", C_cfgs, make_agent,
                  env_tasks=experiment["tasks"]["C"],
                  prune_rules=experiment["prune"]["C"],
                  out_csv=outdir/"phase_C.csv", param_keys=param_keys)

    final = sorted(C, key=lambda r: (-r["mean_acc"], r["mean_time"]))
    with open(outdir/"final_top.json", "w") as f:
        json.dump(final[:3], f, indent=2)

    # Compact display of the TOP-3 (fewer entries if Phase C had fewer configs)
    def _cfg_slice(r):
        return {k: r["config"].get(k) for k in param_keys}
    print("\nTOP-3:", [(_cfg_slice(r), r["mean_acc"], r["mean_time"]) for r in final[:3]])

    return {"A": A, "B": B, "C": C, "final": final}

In [None]:
'''
# ============================================================
# EXPERIMENT: MLP with 1 hidden layer (hidden = (H,))
# Target agent: the usual one (simple MLP)
# ============================================================

# 1) Search space (single layer)
WIDTHS_1L = [512, 768, 1024, 1536, 2048]   # can be widened or narrowed
DROPS     = [0.00, 0.05, 0.10]
BATCHES   = [1024, 2048, 3072]
LRS       = [1e-3, 1.2e-3, 1.5e-3]
LSMOOTH   = [0.0, 0.05]

SEARCH_SPACE_1L = {
    "hidden": [(w,) for w in WIDTHS_1L],   # <-- SINGLE layer: 1-element tuple
    "dropout": DROPS,
    "batch_size": BATCHES,
    "learning_rate": LRS,
    "label_smoothing": LSMOOTH,
    "max_epochs": [10],
    "val_fraction": [0.10],
    "weight_decay": [1e-4],
}

# 2) Slightly looser pruning rules for Phase A (simple models)
PRUNE_RULES_1L = {
    "A": {"time_budget_s": 58.0, "time_factor_stop": 1.20, "phase": "A",
          "A_first": 0.960, "A_mean": 0.965},      # <-- slightly lower than for 2 layers
    "B": {"time_budget_s": 58.0, "time_factor_stop": 1.20, "phase": "B",
          "B_first": 0.975, "B_mean": 0.978},      # <-- tightened in Phase B
    "C": {"time_budget_s": 58.0, "time_factor_stop": 1.20, "phase": None}
}

# 3) Agent choice
# - to test the generic simple MLP:
AGENT_SPEC_1L = ("permuted_mnist.agent_r1.agent_mlp_v3", "Agent")
# - or another compatible single-layer implementation, if available:
# AGENT_SPEC_1L = ("permuted_mnist.agent.mlp4.agent_Bruce_Wayne", "Agent")

# 4) Definition of the 1-layer experiment
EXPERIMENT_1L = {
    "agent_spec": AGENT_SPEC_1L,
    "fixed_kwargs": {
        "time_budget_s": 55.0,   # margin vs. the platform's 60 s
        "output_dim": 10,
        "seed": 42
    },
    "search_space": SEARCH_SPACE_1L,
    "n_A": 30,          # max number of configs in Phase A
    "n_B_keep": 8,      # keep the 8 best for B
    "n_C_keep": 3,      # keep the 3 best for C
    "tasks": {"A": 3, "B": 6, "C": 10},
    "prune": PRUNE_RULES_1L,
    "outdir": Path("../experiments/hparam_1layer"),
    "seed": 42
}

# 5) Run the search
results_1L = run_hparam_search(EXPERIMENT_1L)

# 6) Convenience display: quick recap of the TOP-3
final_1L = results_1L["final"]
print("\n[1-LAYER] TOP-3 (sorted by acc desc, then time asc)")
for i, r in enumerate(sorted(final_1L, key=lambda x: (-x["mean_acc"], x["mean_time"]))[:3], 1):
    print(f"#{i}  cfg={r['config']}  |  acc={r['mean_acc']:.4f}  |  t={r['mean_time']:.1f}s")
''' 


=== Phase A: N=30, tasks=3, phase=A ===
[A 1/30] acc=0.9625±0.0006 | t=8.6s | tasks=2 PRUNE[A]
[A 2/30] acc=0.9544±0.0000 | t=4.5s | tasks=1 PRUNE[A]
[A 3/30] acc=0.9585±0.0000 | t=4.0s | tasks=1 PRUNE[A]
[A 4/30] acc=0.9814±0.0007 | t=10.3s | tasks=3
[A 5/30] acc=0.9826±0.0004 | t=12.5s | tasks=3
[A 6/30] acc=0.9729±0.0003 | t=4.4s | tasks=3
[A 7/30] acc=0.9672±0.0007 | t=5.0s | tasks=3
[A 8/30] acc=0.9785±0.0004 | t=12.3s | tasks=3
[A 9/30] acc=0.9721±0.0005 | t=3.2s | tasks=3
[A 10/30] acc=0.9625±0.0005 | t=5.9s | tasks=2 PRUNE[A]
[A 11/30] acc=0.9729±0.0004 | t=10.8s | tasks=3
[A 12/30] acc=0.9621±0.0003 | t=7.7s | tasks=2 PRUNE[A]
[A 13/30] acc=0.9801±0.0000 | t=3.2s | tasks=3
[A 14/30] acc=0.9767±0.0006 | t=15.1s | tasks=3
[A 15/30] acc=0.9741±0.0006 | t=9.8s | tasks=3
[A 16/30] acc=0.9758±0.0003 | t=8.5s | tasks=3
[A 17/30] acc=0.9776±0.0002 | t=10.0s | tasks=3
[A 18/30] acc=0.9815±0.0002 | t=6.6s | tasks=3
[A 19/30] acc=0.9688±0.0007 | t=11.0s | tasks=3
[A 20/30] acc=0.9709±0.

In [26]:
# ============================================================
# EXPERIMENT: MLP with 1 hidden layer (hidden = (H,))
# ============================================================

# 0) Choose the agent to test (must accept hidden=(H,), dropout, batch_size, learning_rate, etc.)
#    - agent_mlp_v3 : generic MLP (ReLU), 1-layer compatible
#    - agent_Bruce_Wayne : fast agent (ReLU), 1-layer compatible
AGENT_SPEC_1L = ("permuted_mnist.agent_r1.agent_mlp_v3", "Agent")
# AGENT_SPEC_1L = ("permuted_mnist.agent.mlp4.agent_Bruce_Wayne", "Agent")

# 1) Search-space presets (more can be added)
def preset_search_space_1L(mode: str = "quick"):
    """
    mode in {"sanity", "quick", "full"}:
      - sanity: 16-combination grid, to validate the pipeline
      - quick : 270-combination grid, for a real screening
      - full  : 1280-combination grid (longer)
    Phase A only evaluates the first n_A shuffled combinations of the grid.
    All configurations are single-layer: hidden=(H,)
    """
    if mode == "sanity":
        WIDTHS = [512, 1024]                   # small range
        DROPS  = [0.00, 0.05]
        BATCH  = [1024, 2048]
        LRS    = [1e-3]
        LSM    = [0.0, 0.05]
    elif mode == "quick":
        WIDTHS = [512, 768, 1024, 1536, 2048]  # wide but reasonable
        DROPS  = [0.00, 0.05, 0.10]
        BATCH  = [1024, 2048, 3072]
        LRS    = [1e-3, 1.2e-3, 1.5e-3]
        LSM    = [0.0, 0.05]
    elif mode == "full":
        WIDTHS = [384, 512, 768, 1024, 1280, 1536, 1792, 2048]
        DROPS  = [0.00, 0.05, 0.10, 0.15]
        BATCH  = [1024, 1536, 2048, 2560, 3072]
        LRS    = [8e-4, 1e-3, 1.2e-3, 1.5e-3]
        LSM    = [0.0, 0.05]
    else:
        raise ValueError("mode must be 'sanity', 'quick' or 'full'.")

    return {
        "hidden": [(w,) for w in WIDTHS],   # SINGLE LAYER: 1-element tuple
        "dropout": DROPS,
        "batch_size": BATCH,
        "learning_rate": LRS,
        "label_smoothing": LSM,
        "max_epochs": [10],                 # keep 10 epochs, consistent with the agents
        "val_fraction": [0.10],
        "weight_decay": [1e-4],
    }

# 2) Pruning rules (slightly looser in Phase A for 1 layer)
PRUNE_RULES_1L = {
    "A": {"time_budget_s": 58.0, "time_factor_stop": 1.20, "phase": "A",
          "A_first": 0.960, "A_mean": 0.965},
    "B": {"time_budget_s": 58.0, "time_factor_stop": 1.20, "phase": "B",
          "B_first": 0.975, "B_mean": 0.978},
    "C": {"time_budget_s": 58.0, "time_factor_stop": 1.20, "phase": None}
}

# 3) Build the experiment from a preset
def make_experiment_1L(agent_spec, search_mode: str, outdir: Path) -> dict:
    space = preset_search_space_1L(search_mode)
    return {
        "agent_spec": agent_spec,
        "fixed_kwargs": {
            "time_budget_s": 55.0,   # margin vs. the platform's 60 s
            "output_dim": 10,
            "seed": 42
        },
        "search_space": space,
        # Phase sizes (can be increased for "full")
        "n_A": 30 if search_mode != "sanity" else 6,
        "n_B_keep": 8 if search_mode != "sanity" else 3,
        "n_C_keep": 3 if search_mode != "sanity" else 2,
        "tasks": {"A": 3, "B": 6, "C": 10},
        "prune": PRUNE_RULES_1L,
        "outdir": outdir,
        "seed": 42
    }

# 4) Preset choice and launch
SEARCH_MODE = "quick"   # "sanity" | "quick" | "full"
OUTDIR_1L = Path("../experiments/hparam_1layer") / SEARCH_MODE

EXPERIMENT_1L = make_experiment_1L(AGENT_SPEC_1L, SEARCH_MODE, OUTDIR_1L)

# IMPORTANT: timing close to the platform's (instantiation included)
# -> same pipeline as run_hparam_search, but passing include_init_time=True to run_phase
def run_hparam_search_with_init(experiment: Dict) -> Dict:
    outdir = experiment["outdir"]; outdir.mkdir(parents=True, exist_ok=True)
    param_keys = list(experiment["search_space"].keys())

    make_agent = make_agent_factory(
        agent_spec=experiment["agent_spec"],
        fixed_kwargs=experiment.get("fixed_kwargs", {}),
        seed=experiment.get("seed", 42)
    )

    all_cfgs = product_space(experiment["search_space"])
    A_cfgs = all_cfgs[:experiment["n_A"]]
    A = run_phase("A", A_cfgs, make_agent,
                  env_tasks=experiment["tasks"]["A"],
                  prune_rules=experiment["prune"]["A"],
                  out_csv=outdir/"phase_A.csv", param_keys=param_keys,
                  include_init_time=True)

    B_cfgs = [r["config"] for r in pick_top(A, experiment["n_B_keep"])]
    B = run_phase("B", B_cfgs, make_agent,
                  env_tasks=experiment["tasks"]["B"],
                  prune_rules=experiment["prune"]["B"],
                  out_csv=outdir/"phase_B.csv", param_keys=param_keys,
                  include_init_time=True)

    C_cfgs = [r["config"] for r in pick_top(B, experiment["n_C_keep"])]
    C = run_phase("C", C_cfgs, make_agent,
                  env_tasks=experiment["tasks"]["C"],
                  prune_rules=experiment["prune"]["C"],
                  out_csv=outdir/"phase_C.csv", param_keys=param_keys,
                  include_init_time=True)

    final = sorted(C, key=lambda r: (-r["mean_acc"], r["mean_time"]))
    with open(outdir/"final_top.json", "w") as f:
        json.dump(final[:3], f, indent=2)

    return {"A": A, "B": B, "C": C, "final": final}

# 5) Run the 1-layer search
results_1L = run_hparam_search_with_init(EXPERIMENT_1L)

# 6) Clean TOP-3 display
final_1L = results_1L["final"]
print("\n[1-LAYER] TOP-3 (sorted by acc desc, then time asc)")
for i, r in enumerate(sorted(final_1L, key=lambda x: (-x["mean_acc"], x["mean_time"]))[:3], 1):
    print(f"#{i}  cfg={r['config']}  |  acc={r['mean_acc']:.4f}  |  t={r['mean_time']:.1f}s")


=== Phase A: N=30, tasks=3, phase=A ===


KeyboardInterrupt: 
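
To put the Phase A sample size in perspective, the size of the full grid can be computed directly from a search space. The sketch below rebuilds the "quick" preset values by hand; `grid_size` is a hypothetical helper, not a function of this notebook.

```python
from math import prod

def grid_size(space: dict) -> int:
    """Number of combinations in the full Cartesian grid."""
    return prod(len(values) for values in space.values())

quick_space = {
    "hidden": [(w,) for w in (512, 768, 1024, 1536, 2048)],
    "dropout": [0.00, 0.05, 0.10],
    "batch_size": [1024, 2048, 3072],
    "learning_rate": [1e-3, 1.2e-3, 1.5e-3],
    "label_smoothing": [0.0, 0.05],
    "max_epochs": [10],
    "val_fraction": [0.10],
    "weight_decay": [1e-4],
}

n_total = grid_size(quick_space)
n_A = 30  # Phase A sample size used in this notebook
print(n_total, f"{n_A / n_total:.1%}")
```

With these values Phase A covers roughly one ninth of the grid, so the search behaves like a random sample of the grid rather than an exhaustive sweep.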

In [18]:
# ================================================
# 7) Post-hoc analysis: What drives performance?
# ================================================
import json, ast
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

# ---- Paths to phase CSVs: must match the "outdir" used by the search above ----
PHASE_DIR = Path("../experiments/phased_modular")  # e.g. OUTDIR_1L for the 1-layer run
phase_files = [PHASE_DIR/"phase_A.csv", PHASE_DIR/"phase_B.csv", PHASE_DIR/"phase_C.csv"]

# --------- 7.1 Load & unify results ----------
dfs = []
for p in phase_files:
    if p.exists():
        df = pd.read_csv(p)
        df["phase_file"] = p.name
        dfs.append(df)
if not dfs:
    raise RuntimeError("No phase CSV found. Run the search phases first.")
res = pd.concat(dfs, ignore_index=True)

# Normalize types: ensure hyperparams are strings if tuples
def _canonicalize_hidden(x):
    # x might be like "(2048, 1024)" string; parse then re-string cleanly
    if isinstance(x, str):
        try:
            t = ast.literal_eval(x)
            if isinstance(t, (list, tuple)):
                return str(tuple(int(v) for v in t))
        except Exception:
            pass
        return x
    if isinstance(x, (list, tuple, np.ndarray)):
        return str(tuple(int(v) for v in x))
    return str(x)

if "hidden" in res.columns:
    res["hidden"] = res["hidden"].apply(_canonicalize_hidden)

# Ensure numeric dtypes
for c in ["dropout","batch_size","learning_rate","label_smoothing",
          "mean_acc","std_acc","mean_time","total_time"]:
    if c in res.columns:
        res[c] = pd.to_numeric(res[c], errors="coerce")

# A small helper: time-budget compliance at 60s total per task (use your effective budget)
BUDGET_S = 60.0
res["ok_time"] = (res["mean_time"] <= BUDGET_S).astype(int)

print("Merged results shape:", res.shape)
display(res.head(10))

# --------- 7.2 Marginal effects per hyperparameter ----------
def summarize_marginal_effect(df, param, metric="mean_acc"):
    g = df.groupby(param)[metric]
    out = (g.mean().rename("mean")
             .to_frame()
             .join(g.std().rename("std"))
             .join(g.count().rename("n")))
    # 95% CI on the mean (normal approx); safe for large n
    out["se"] = out["std"] / np.sqrt(out["n"].clip(lower=1))
    out["ci95_low"] = out["mean"] - 1.96 * out["se"]
    out["ci95_high"] = out["mean"] + 1.96 * out["se"]
    return out.sort_values("mean", ascending=False)

candidate_params = [c for c in ["hidden","dropout","batch_size","learning_rate","label_smoothing"] if c in res.columns]

marginals = {}
for p in candidate_params:
    marginals[p] = summarize_marginal_effect(res, p, metric="mean_acc")

print("\n--- Marginal effects on accuracy (mean±CI) ---")
for p in candidate_params:
    print(f"\n[{p}]")
    display(marginals[p])

# Plot marginal bars for each param
for p in candidate_params:
    m = marginals[p]
    plt.figure()
    m["mean"].plot(kind="barh")
    plt.gca().invert_yaxis()
    plt.xlabel("Mean Accuracy")
    plt.title(f"Marginal accuracy by {p}")
    plt.show()

# --------- 7.3 Pairwise heatmaps (accuracy) ----------
def heatmap_param2(df, p1, p2, metric="mean_acc"):
    pivot = pd.pivot_table(df, index=p1, columns=p2, values=metric, aggfunc="mean")
    plt.figure()
    plt.imshow(pivot.values, aspect="auto")
    plt.xticks(range(pivot.shape[1]), pivot.columns, rotation=45, ha="right")
    plt.yticks(range(pivot.shape[0]), pivot.index)
    plt.title(f"{metric} heatmap: {p1} × {p2}")
    plt.colorbar()
    plt.tight_layout()
    plt.show()
    return pivot

pairs = []
# choose a few useful pairs if present
if "hidden" in res.columns and "learning_rate" in res.columns:
    pairs.append(("hidden","learning_rate"))
if "hidden" in res.columns and "dropout" in res.columns:
    pairs.append(("hidden","dropout"))
if "batch_size" in res.columns and "learning_rate" in res.columns:
    pairs.append(("batch_size","learning_rate"))

for p1, p2 in pairs:
    pivot = heatmap_param2(res, p1, p2, metric="mean_acc")
    display(pivot)

# --------- 7.4 Simple linear model on one-hot features (effect sizes) ----------
# Build X with one-hot encoding for categorical-ish features, numeric for continuous
use_cols = []
X_parts = []
for p in candidate_params:
    if res[p].dtype.kind in "ifu":  # numeric
        X_parts.append(((res[p] - res[p].mean()) / (res[p].std() + 1e-9)).to_frame(p))
        use_cols.append(p)
    else:
        # one-hot encode, then standardize each dummy column separately
        d = pd.get_dummies(res[p], prefix=p).astype(float)
        d = (d - d.mean()) / (d.std() + 1e-9)
        X_parts.append(d)
        use_cols.extend(d.columns.tolist())

X = pd.concat(X_parts, axis=1)
# Center the target: the linear model below has no intercept term
y = res["mean_acc"].fillna(res["mean_acc"].mean())
y = y - y.mean()

# Ridge-like closed form with small L2 to stabilize: beta = (X^T X + λI)^(-1) X^T y
lam = 1e-3
XtX = X.values.T @ X.values
beta = np.linalg.solve(XtX + lam*np.eye(XtX.shape[0]), X.values.T @ y.values)
coef = pd.Series(beta, index=X.columns).sort_values(ascending=False)

print("\n--- Standardized linear-effect sizes on accuracy (higher => stronger positive association) ---")
display(coef.head(20))
print("\n--- Most negative effects ---")
display(coef.tail(20))

# --------- 7.5 Permutation importance on the linear model ----------
def permutation_importance_linear(X_df, y_vec, beta_vec, n_repeats=10, seed=42):
    rng = np.random.default_rng(seed)
    base_pred = X_df.values @ beta_vec
    base_mse = np.mean((y_vec.values - base_pred)**2)
    imps = {}
    for col in X_df.columns:
        losses = []
        for _ in range(n_repeats):
            x_perm = X_df.copy()
            x_perm[col] = rng.permutation(x_perm[col].values)
            pred = x_perm.values @ beta_vec
            losses.append(np.mean((y_vec.values - pred)**2))
        imps[col] = np.mean(losses) - base_mse
    s = pd.Series(imps).sort_values(ascending=False)
    return s

perm_imp = permutation_importance_linear(X, y, beta, n_repeats=20)
print("\n--- Permutation importance (ΔMSE; higher = more important) ---")
display(perm_imp.head(20))

plt.figure()
perm_imp.head(20).plot(kind="barh")
plt.gca().invert_yaxis()
plt.xlabel("Δ MSE on linear surrogate")
plt.title("Permutation importance (top 20 features)")
plt.show()

# --------- 7.6 Time/accuracy trade-off and budget compliance ----------
if "mean_time" in res.columns:
    # Compliance rate per config key
    print("\n--- Time budget compliance by parameter level (ok_time=mean_time ≤ 60s) ---")
    for p in candidate_params:
        grp = res.groupby(p)["ok_time"].mean().sort_values(ascending=False)
        print(f"\n[{p}]")
        display(grp)

    # Pareto front (accuracy vs time)
    df_at = res[["mean_acc","mean_time"]].dropna().copy()
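    # Time ascending; on ties the most accurate first, so dominated points are skipped
    df_at = df_at.sort_values(["mean_time","mean_acc"], ascending=[True, False])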
    pareto_idx = []
    best_acc = -np.inf
    for i, row in df_at.iterrows():
        if row["mean_acc"] > best_acc:
            pareto_idx.append(i)
            best_acc = row["mean_acc"]
    pareto = df_at.loc[pareto_idx]

    plt.figure()
    plt.scatter(res["mean_time"], res["mean_acc"], s=18)
    plt.plot(pareto["mean_time"], pareto["mean_acc"])
    plt.xlabel("Mean time per task (s)")
    plt.ylabel("Mean accuracy")
    plt.title("Accuracy vs Time (Pareto front)")
    plt.show()

# --------- 7.7 Compact textual takeaways ----------
def top_levels(summary_dict, k=3):
    for p, dfp in summary_dict.items():
        best = dfp.head(k)[["mean","ci95_low","ci95_high","n"]]
        print(f"[{p}] top-{k} levels by accuracy mean:")
        print(best.to_string(float_format=lambda x: f"{x:.4f}"))
        print()

print("\n=== Quick takeaways (top-3 levels per parameter) ===")
top_levels(marginals, k=3)

RuntimeError: No phase CSV found. Run the search phases first.
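
Since the analysis cell needs the phase CSVs to run, the Pareto-front step can be illustrated on its own. The sketch below is a plain-Python version of the same sweep (sort by time, keep each point that improves on the best accuracy seen so far); `pareto_front` is a hypothetical helper, and the six (time, accuracy) pairs are taken from the Phase A log shown earlier.

```python
def pareto_front(points):
    """points: list of (time_s, accuracy).

    Returns the points for which no other point is at least as fast
    and strictly more accurate.
    """
    front, best_acc = [], float("-inf")
    # Time ascending; for equal times, the most accurate point comes first.
    for t, acc in sorted(points, key=lambda p: (p[0], -p[1])):
        if acc > best_acc:
            front.append((t, acc))
            best_acc = acc
    return front

phase_a_points = [
    (3.2, 0.9721), (3.2, 0.9801), (6.6, 0.9815),
    (10.3, 0.9814), (12.5, 0.9826), (15.1, 0.9767),
]
front = pareto_front(phase_a_points)
print(front)
```

Breaking ties in time by accuracy matters: with an ascending sort on both columns, the weaker of two equally fast configurations would be visited first and wrongly kept on the front.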