<a href="https://colab.research.google.com/github/mervegulnazerdem/intentzero2few/blob/main/notebooks/Dissertation_merve_erdem_20092025.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import os, getpass
os.environ["GITHUB_USER"] = input("GitHub kullanıcı adı: ").strip()
os.environ["GITHUB_PAT"]  = getpass.getpass("GitHub PAT (görünmeyecek): ")
print("OK ✓")

GitHub kullanıcı adı: mervegulnazerdem
GitHub PAT (görünmeyecek): ··········
OK ✓


# 1.) Identity & Fresh Clone (safe)
Git kimliğini ayarlar, repoyu temiz bir klasöre klonlar. İstersen yerel çıktıları korumak için NUKE_LOCAL=0 kullanabilirsin.

In [2]:
%%bash
set -euo pipefail

# --- Kimlik (kişisel) ---
git config --global user.name  "mervegulnazerdem"
git config --global user.email "mervegulnazerdem@gmail.com"

# --- Clone ayarları ---
REPO_URL="https://github.com/mervegulnazerdem/intentzero2few.git"
REPO_DIR="/content/intentzero2few-repo"

: "${NUKE_LOCAL:=1}"
if [ "${NUKE_LOCAL}" = "1" ]; then
  rm -rf "$REPO_DIR"
fi

git clone "$REPO_URL" "$REPO_DIR" || {
  echo "Repo boş olabilir, yine de klasörü oluşturuyorum."
  mkdir -p "$REPO_DIR" && (cd "$REPO_DIR" && git init)
}
git -C "$REPO_DIR" remote -v || true

# Boş repo ise (commit yok) 'main' dalını oluştur
if ! git -C "$REPO_DIR" rev-parse --verify HEAD >/dev/null 2>&1; then
  git -C "$REPO_DIR" checkout --orphan main
fi

echo "✅ Cloned/initialized → $REPO_DIR"

origin	https://github.com/mervegulnazerdem/intentzero2few.git (fetch)
origin	https://github.com/mervegulnazerdem/intentzero2few.git (push)
✅ Cloned/initialized → /content/intentzero2few-repo


Cloning into '/content/intentzero2few-repo'...
Switched to a new branch 'main'


In [None]:
%%bash
set -euo pipefail

# --- Kimlik (kişisel) ---
git config --global user.name  "mervegulnazerdem"
git config --global user.email "mervegulnazerdem@gmail.com"

# --- Clone ayarları ---
REPO_URL="https://github.com/mervegulnazerdem/intentzero2few.git"
REPO_DIR="/content/intentzero2few-repo"

# Yereli tamamen silmek istemiyorsan: NUKE_LOCAL=0 yaz ve tekrar çalıştır
: "${NUKE_LOCAL:=1}"
if [ "${NUKE_LOCAL}" = "1" ]; then
  rm -rf "$REPO_DIR"
fi

git clone "$REPO_URL" "$REPO_DIR"
git -C "$REPO_DIR" remote -v
echo "✅ Cloned to $REPO_DIR"


origin	https://github.com/mervegulnazerdem/intentzero2few.git (fetch)
origin	https://github.com/mervegulnazerdem/intentzero2few.git (push)
✅ Cloned to /content/intentzero2few-repo


Cloning into '/content/intentzero2few-repo'...


# 2.) Project skeleton + RUN_ID layout + .env (idempotent)
Klasör iskeletini kurar.

Tek bir RUN_ID üretip iki yerde klasörler açar:

runs/RUN_ID/ (ham, gitignore’da → repoya girmez)

reports/RUN_ID/ (kürasyonlu, repoya girer)

.env içine yolları yazar.

latest symlink’leri oluşturur (commit edilmez).

pyproject.toml, requirements.txt, .gitignore, README.md yoksa ekler, varsa ellemeyip bırakır.

Not: Var olan .env ve RUN_ID korunur. Yeni run başlatmak istersen hücrenin en üstüne NEW_RUN=1 yaz.

In [3]:
%%bash
set -euo pipefail

# (Opsiyonel) Yeni bir RUN açmak için:
NEW_RUN=1

REPO_DIR="/content/intentzero2few-repo"
cd "$REPO_DIR"

# ---- İskelet ----
mkdir -p src/intentzero2few notebooks scripts data runs reports
touch src/intentzero2few/__init__.py

# ---- RUN_ID yönetimi ----
if [ -f ".env" ] && [ "${NEW_RUN:-0}" != "1" ]; then
  # mevcut .env'yi yükle
  set +u
  source .env
  set -u
  echo "ℹ️ Reusing existing RUN_ID: $RUN_ID"
else
  RUN_ID="$(date +%Y%m%d-%H%M%S)"
  RUN_DIR="$REPO_DIR/runs/$RUN_ID"         # ham çıktılar
  REPORT_DIR="$REPO_DIR/reports/$RUN_ID"   # kürasyonlu çıktılar
  mkdir -p "$RUN_DIR"/{analytics,logs,figures,artifacts}
  mkdir -p "$REPORT_DIR"

  # 'latest' symlink'leri (commit ETME)
  rm -f "$REPO_DIR/runs/latest"    || true
  rm -f "$REPO_DIR/reports/latest" || true
  ln -s "$RUN_DIR"    "$REPO_DIR/runs/latest"
  ln -s "$REPORT_DIR" "$REPO_DIR/reports/latest"

  # .env yaz
  cat > "$REPO_DIR/.env" << EOF
export REPO_DIR="$REPO_DIR"
export RUN_ID="$RUN_ID"
export RUN_DIR="$RUN_DIR"
export REPORT_DIR="$REPORT_DIR"
EOF
  echo "✅ Created new RUN_ID: $RUN_ID"
fi

# ---- pyproject (yoksa) ----
if [ ! -f pyproject.toml ]; then
cat > pyproject.toml << 'PY'
[project]
name = "intentzero2few"
version = "0.1.0"
description = "Hierarchical Zero->Few intent pipeline (super-intent discovery, zero-shot, few-shot, threshold calibration)"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
  "pandas>=2.0",
  "numpy>=1.24",
  "scikit-learn>=1.3",
  "sentence-transformers>=2.3",
  "datasets>=2.19",
  "transformers>=4.41",
  "umap-learn>=0.5.5",
  "matplotlib>=3.8",
  "seaborn>=0.13",
  "nltk>=3.8"
]

[build-system]
requires = ["setuptools>=68", "wheel"]
build-backend = "setuptools.build_meta"

[tool.setuptools]
package-dir = {"" = "src"}

[tool.setuptools.packages.find]
where = ["src"]
PY
  echo "✅ Wrote pyproject.toml"
else
  echo "ℹ️ pyproject.toml exists (kept as-is)"
fi

# ---- requirements (yoksa) ----
if [ ! -f requirements.txt ]; then
cat > requirements.txt << 'REQ'
pandas>=2.0
numpy>=1.24
scikit-learn>=1.3
sentence-transformers>=2.3
datasets>=2.19
transformers>=4.41
umap-learn>=0.5.5
matplotlib>=3.8
seaborn>=0.13
nltk>=3.8
REQ
  echo "✅ Wrote requirements.txt"
else
  echo "ℹ️ requirements.txt exists (kept as-is)"
fi

# ---- .gitignore (yoksa yaz; varsa kritik satırları ekle) ----
if [ ! -f .gitignore ]; then
cat > .gitignore << 'IGN'
__pycache__/
*.pyc
*.pyo
*.pyd
*.egg-info/
.venv/
env/
venv/

# local secrets / creds
.netrc
.git-credentials
.env

# big local artifacts (keep out of git)
data/
outputs/
checkpoints/
cache/
*.ckpt

# notebook/editor cruft
.ipynb_checkpoints/
.DS_Store
*.swp
/.cache/

# do not commit the 'latest' symlink
runs/latest
reports/latest
IGN
  echo "✅ Wrote .gitignore"
else
  grep -qxF '.env'            .gitignore || echo '.env' >> .gitignore
  grep -qxF '.git-credentials' .gitignore || echo '.git-credentials' >> .gitignore
  grep -qxF '.netrc'          .gitignore || echo '.netrc' >> .gitignore
  grep -qxF 'runs/latest'     .gitignore || echo 'runs/latest' >> .gitignore
  grep -qxF 'reports/latest'  .gitignore || echo 'reports/latest' >> .gitignore
  echo "ℹ️ Updated .gitignore with missing lines (if any)"
fi

# ---- README (yoksa) ----
if [ ! -f README.md ]; then
cat > README.md << 'MD'
# intentzero2few

Hierarchical Zero→Few intent classification:
- Super-intent discovery (8–12 clusters)
- Zero-shot super-intent with OOS threshold
- Few-shot sub-intents
- Threshold calibration (macro-F1 sweep)
- Robust eval on clean vs polluted sets

## Outputs
One run → two roots with the same RUN_ID:
- `runs/<RUN_ID>/`     (raw): analytics/, logs/, figures/, artifacts/
- `reports/<RUN_ID>/`  (curated): tables & figures for the thesis

Convenience symlinks:
- `runs/latest` and `reports/latest` point to your most recent run.

## Open in Colab
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mervegulnazerdem/intentzero2few/blob/main/notebooks/01_experiments.ipynb)
MD
  echo "✅ Wrote README.md"
else
  echo "ℹ️ README.md exists (kept as-is)"
fi

echo "📂 RUN_DIR:   ${RUN_DIR:-"(loaded from .env)"}"
echo "📂 REPORT_DIR:${REPORT_DIR:-"(loaded from .env)"}"
echo "🔗 latest runs    → $(readlink -f runs/latest    || echo 'N/A')"
echo "🔗 latest reports → $(readlink -f reports/latest || echo 'N/A')"


✅ Created new RUN_ID: 20250921-123702
✅ Wrote pyproject.toml
✅ Wrote requirements.txt
✅ Wrote .gitignore
✅ Wrote README.md
📂 RUN_DIR:   /content/intentzero2few-repo/runs/20250921-123702
📂 REPORT_DIR:/content/intentzero2few-repo/reports/20250921-123702
🔗 latest runs    → /content/intentzero2few-repo/runs/20250921-123702
🔗 latest reports → /content/intentzero2few-repo/reports/20250921-123702


# 2.1) Python bootstrap (path safety + logging) (ekleyelim; ilk iki parçanın doğal tamamlayıcısı)
src/’i sys.pathe ekler (shadowing yok).

.env’yi Python tarafında okur (Bash’te source etmesen bile).

Python logging’i başlatır: runs/<RUN_ID>/logs/ altına .log yazar.

Eğer utils_logging.setup_logger modülün kuruluysa onu kullanır; yoksa fallback ile standart logging kurar.

In [4]:
# %%python
import os, sys, importlib, logging
from datetime import datetime

# --- .env yükle (Bash'te source etmesen de çalışır) ---
REPO_DIR = "/content/intentzero2few-repo"
env_path = os.path.join(REPO_DIR, ".env")
if os.path.exists(env_path):
    with open(env_path, "r") as f:
        for line in f:
            line=line.strip()
            if line.startswith("export "):
                k,v = line.replace("export ","",1).split("=",1)
                os.environ[k]=v.strip('"')
REPO_DIR = os.environ.get("REPO_DIR", REPO_DIR)
RUN_ID    = os.environ.get("RUN_ID",  datetime.now().strftime("%Y%m%d-%H%M%S"))
RUN_DIR   = os.environ.get("RUN_DIR", os.path.join(REPO_DIR,"runs",RUN_ID))
REPORT_DIR= os.environ.get("REPORT_DIR", os.path.join(REPO_DIR,"reports",RUN_ID))

# çalışma dizinini repo'ya al
os.chdir(REPO_DIR)

# --- path safety (src önce, repo kökü değil) ---
SRC_DIR = os.path.join(REPO_DIR, "src")
for p in list(sys.path):
    if p.rstrip("/") == REPO_DIR.rstrip("/"):
        sys.path.remove(p)
if SRC_DIR not in sys.path:
    sys.path.insert(0, SRC_DIR)
importlib.invalidate_caches()

# --- logging bootstrap ---
LOG_DIR = os.path.join(RUN_DIR, "logs")
os.makedirs(LOG_DIR, exist_ok=True)
logger = None
try:
    # varsa kendi helper'ını kullan
    from intentzero2few.utils_logging import setup_logger
    logger, log_path = setup_logger(log_dir=LOG_DIR)
except Exception:
    # fallback: basic logging
    log_path = os.path.join(LOG_DIR, f"intentzero2few-{datetime.now().strftime('%Y%m%d-%H%M%S')}.log")
    logger = logging.getLogger("intentzero2few")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    fh = logging.FileHandler(log_path, encoding="utf-8")
    ch = logging.StreamHandler()
    fmt = logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")
    fh.setFormatter(fmt); ch.setFormatter(fmt)
    logger.addHandler(fh); logger.addHandler(ch)
    logger.info("Fallback logger initialized.")

print("✅ cwd:", os.getcwd())
print("✅ SRC_DIR on sys.path:", SRC_DIR in sys.path)
print("🗂️ RUN_DIR:", RUN_DIR)
print("🗂️ REPORT_DIR:", REPORT_DIR)
print("📝 Log file:", log_path)
logger.info("Bootstrap complete. RUN_ID=%s", RUN_ID)


2025-09-21 12:37:08,443 | INFO | intentzero2few | Fallback logger initialized.
INFO:intentzero2few:Fallback logger initialized.
2025-09-21 12:37:08,448 | INFO | intentzero2few | Bootstrap complete. RUN_ID=20250921-123702
INFO:intentzero2few:Bootstrap complete. RUN_ID=20250921-123702


✅ cwd: /content/intentzero2few-repo
✅ SRC_DIR on sys.path: True
🗂️ RUN_DIR: /content/intentzero2few-repo/runs/20250921-123702
🗂️ REPORT_DIR: /content/intentzero2few-repo/reports/20250921-123702
📝 Log file: /content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-123708.log


# 3.) Modules & editable install

In [5]:
%%bash
# === Clean, write ALL modules (robust init; augmentation guaranteed) & editable install ===
set -euo pipefail
REPO_DIR="${REPO_DIR:-/content/intentzero2few-repo}"
PKG_DIR="$REPO_DIR/src/intentzero2few"
mkdir -p "$PKG_DIR"
cd "$REPO_DIR"

#####################################
# core.py
#####################################
cat > "$PKG_DIR/core.py" << 'PY'
from __future__ import annotations
import random, numpy as np
SEED = 42
TEXT_COL = "text"
LABEL_COL = "intent"

def set_all_seeds(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except Exception:
        pass
PY

#####################################
# utils_io.py
#####################################
cat > "$PKG_DIR/utils_io.py" << 'PY'
from __future__ import annotations
import os, json
from typing import Any, Optional
import pandas as pd

def _env(name: str, default: Optional[str] = None) -> str:
    return os.environ.get(name, default or "")

def get_env_paths() -> dict:
    repo = _env("REPO_DIR", "/content/intentzero2few-repo")
    run_id = _env("RUN_ID")
    run_dir = _env("RUN_DIR") or (os.path.join(repo, "runs", run_id) if run_id else os.path.join(repo, "runs", "adhoc"))
    report_dir = _env("REPORT_DIR") or (os.path.join(repo, "reports", run_id) if run_id else os.path.join(repo, "reports", "adhoc"))
    for sub in ("analytics", "logs", "figures", "artifacts"):
        os.makedirs(os.path.join(run_dir, sub), exist_ok=True)
    os.makedirs(report_dir, exist_ok=True)
    return {"REPO_DIR": repo, "RUN_ID": run_id, "RUN_DIR": run_dir, "REPORT_DIR": report_dir}

def run_path(subdir: str, filename: str) -> str:
    p = get_env_paths()
    base = os.path.join(p["RUN_DIR"], subdir)
    os.makedirs(base, exist_ok=True)
    return os.path.join(base, filename)

def report_path(filename: str) -> str:
    p = get_env_paths()
    base = p["REPORT_DIR"]
    os.makedirs(base, exist_ok=True)
    return os.path.join(base, filename)

def save_json(obj: Any, path: str, ensure_ascii: bool = False, indent: int = 2):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=ensure_ascii, indent=indent)

def save_csv(df: pd.DataFrame, path: str, index: bool = False):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    df.to_csv(path, index=index)

def save_figure(fig, path: str, dpi: int = 300, bbox_inches: str = "tight"):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    fig.savefig(path, dpi=dpi, bbox_inches=bbox_inches)

def copy_to_report(src_path: str, dst_name: Optional[str] = None) -> str:
    import shutil
    p = get_env_paths()
    dst = os.path.join(p["REPORT_DIR"], dst_name or os.path.basename(src_path))
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src_path, dst)
    return dst
PY

#####################################
# utils_logging.py
#####################################
cat > "$PKG_DIR/utils_logging.py" << 'PY'
from __future__ import annotations
import logging, os, time

def setup_logger(name: str = "intentzero2few", log_dir: str | None = None):
    if log_dir is None:
        run_dir = os.environ.get("RUN_DIR", "")
        log_dir = (os.path.join(run_dir, "logs") if run_dir else os.path.join("runs","logs"))
    os.makedirs(log_dir, exist_ok=True)
    ts = time.strftime("%Y%m%d-%H%M%S")
    path = os.path.join(log_dir, f"{name}-{ts}.log")

    lg = logging.getLogger(name)
    lg.setLevel(logging.INFO)
    lg.handlers.clear()

    fh = logging.FileHandler(path, encoding="utf-8")
    ch = logging.StreamHandler()
    fmt = logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")
    fh.setFormatter(fmt); ch.setFormatter(fmt)
    lg.addHandler(fh); lg.addHandler(ch)
    lg.info("Logger started. path=%s", path)
    return lg, path
PY

#####################################
# utils_errors.py
#####################################
cat > "$PKG_DIR/utils_errors.py" << 'PY'
from __future__ import annotations
import os, pandas as pd
from .utils_io import run_path

def error_csv_path(component: str = "generic") -> str:
    return run_path("analytics", f"errors_{component}.csv")

def log_error_row(component: str = "generic", **fields):
    path = error_csv_path(component)
    row = pd.DataFrame([fields])
    if os.path.exists(path):
        old = pd.read_csv(path)
        row = pd.concat([old, row], ignore_index=True)
    row.to_csv(path, index=False)
    return path
PY

#####################################
# dataio.py
#####################################
cat > "$PKG_DIR/dataio.py" << 'PY'
from __future__ import annotations
from typing import Dict, Optional
import json, pandas as pd

def load_intents(path: str) -> Dict[str, pd.DataFrame]:
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    def to_df(key: str) -> Optional[pd.DataFrame]:
        return pd.DataFrame(data[key], columns=["text","intent"]) if key in data else None
    splits = {"train":to_df("train"), "val":to_df("val"), "test":to_df("test")}
    for req in ["train","val","test"]:
        if splits[req] is None:
            raise ValueError(f"Missing split: {req}")
    for opt in ["oos_val","oos_test"]:
        df = to_df(opt)
        if df is not None:
            splits[opt] = df
    return splits
PY

#####################################
# eda.py
#####################################
cat > "$PKG_DIR/eda.py" << 'PY'
from __future__ import annotations
import numpy as np, pandas as pd
import matplotlib.pyplot as plt, seaborn as sns

def quick_eda(df: pd.DataFrame, title: str = "Dataset"):
    print(f"== {title} ==")
    print("Rows:", len(df))
    print("Unique intents:", df['intent'].nunique())
    lengths = df['text'].str.split().apply(len)
    print("Avg words:", round(lengths.mean(), 4))
    print("Min/Max:", lengths.min(), "/", lengths.max())
    print("\nTop 10 intents:\n", df['intent'].value_counts().head(10))
    return lengths

def plot_top_intents(df: pd.DataFrame, top_n: int = 20):
    c = df['intent'].value_counts()
    plt.figure(figsize=(12,6))
    sns.barplot(x=c[:top_n].values, y=c[:top_n].index)
    plt.title(f'Top {top_n} Intents')
    plt.tight_layout(); plt.show()

def sample_by_intent(df, n_intents=6, n_per_intent=1, label_col="intent", text_col="text", random_state=42):
    rng = np.random.default_rng(random_state)
    df = df.copy()
    if label_col not in df.columns:
        df[label_col] = "oos"
    shown = [c for c in [label_col, text_col] if c in df.columns] or df.columns[:2].tolist()
    uniq = df[label_col].dropna().astype(str).unique().tolist()
    lower = [u.lower() for u in uniq]
    single = (len(uniq)==1 and lower[0] in {"oos","out_of_scope","out-of-scope","unknown"})
    if single or len(uniq)==1:
        k = min(len(df), n_intents*n_per_intent)
        if k <= 0: return df.iloc[:0,:][shown]
        idx = rng.choice(len(df), size=k, replace=False)
        return df.iloc[idx][shown].reset_index(drop=True)
    import numpy as _np
    idx = _np.arange(len(uniq)); rng.shuffle(idx)
    picked = [uniq[i] for i in idx[:min(n_intents,len(uniq))]]
    parts = []
    for lab in picked:
        block = df[df[label_col].astype(str) == str(lab)]
        take = min(len(block), n_per_intent)
        if take == 0: continue
        parts.append(block.sample(take, random_state=random_state)[shown])
    if not parts:
        return df.sample(min(len(df), n_intents*n_per_intent), random_state=random_state)[shown].reset_index(drop=True)
    return pd.concat(parts, axis=0).reset_index(drop=True)

def show_split(name, df, **kw):
    try:
        from IPython.display import display
        print(f"\n=== {name} ===")
        display(sample_by_intent(df, **kw))
    except Exception as e:
        print(f"{name} sample error:", e)
PY

#####################################
# fewshot.py
#####################################
cat > "$PKG_DIR/fewshot.py" << 'PY'
from __future__ import annotations
import numpy as np, pandas as pd

def make_k_shot(train_df: pd.DataFrame, k: int = 5, seed: int = 42, drop_short: bool = False) -> pd.DataFrame:
    rng = np.random.RandomState(seed)
    out = []
    for intent, g in train_df.groupby("intent"):
        if len(g) < k:
            if drop_short: continue
            out.append(g)
        else:
            out.append(g.sample(n=k, random_state=rng))
    if not out: raise ValueError("No class has >=k samples.")
    return pd.concat(out).reset_index(drop=True)
PY

#####################################
# pollution.py
#####################################
cat > "$PKG_DIR/pollution.py" << 'PY'
from __future__ import annotations
import re, random, numpy as pdnp, pandas as pd
np = pdnp  # alias to avoid accidental shadowing

_WORD_RE = re.compile(r"[A-Za-z0-9]+")

def _tokenize_en(s: str):
    return [w.lower() for w in _WORD_RE.findall(str(s))]

def _jaccard(a: set, b: set):
    if not a and not b: return 0.0
    return len(a & b) / max(1, len(a | b))

def _collect_vocab(df, text_col: str = "text", sample_n: int = 10000):
    vocab = set()
    for t in df[text_col].head(sample_n):
        vocab.update(_tokenize_en(t))
    return vocab

def generate_fallback_negatives_en(n: int = 100, seed: int = 42,
                                   avoid_like_df: pd.DataFrame | None = None,
                                   max_trials_per_item: int = 10,
                                   gibberish_ratio: float = 0.3) -> pd.DataFrame:
    rng = random.Random(seed)
    topics_en = [
        "weather forecast","stock market trends","movie showtimes","soccer results","traffic updates",
        "air quality index","music recommendations","travel itineraries","baking recipes","breaking news headlines",
        "scientific discoveries","astrology readings","wildlife conservation","space exploration","art exhibitions",
        "quantum computing","medieval history","mountaineering safety tips","aquarium maintenance","vintage car auctions"
    ]
    question_templates = [
        "What is the latest on {topic}?","Can you summarize {topic} for me?","Where can I find resources about {topic}?",
        "What are some key facts about {topic}?","Could you give me a brief overview of {topic}?"
    ]
    command_templates = [
        "List three insights about {topic}.","Provide a short guide on {topic}.","Generate bullet points covering {topic}.",
        "Outline the main considerations for {topic}.","Give me a quick tip related to {topic}."
    ]
    gibberish_templates = [
        "lorem ipsum placeholder text {i}","random token sequence {i}","unrelated noise string {i}",
        "/// dummy //// content //// {i}","??? gibberish line {i}"
    ]
    inscope_vocab = set()
    if (avoid_like_df is not None) and (len(avoid_like_df) > 0) and ("text" in avoid_like_df.columns):
        inscope_vocab = _collect_vocab(avoid_like_df, text_col="text")

    def make_sentence(i: int) -> str:
        if rng.random() < (1.0 - gibberish_ratio):
            topic = rng.choice(topics_en); tpl = rng.choice(question_templates + command_templates)
            return tpl.format(topic=topic)
        else:
            tpl = rng.choice(gibberish_templates); return tpl.format(i=i)

    texts = []
    for i in range(n):
        trials = 0
        while True:
            s = make_sentence(i)
            if not inscope_vocab:
                texts.append(s); break
            jac = _jaccard(set(_tokenize_en(s)), inscope_vocab)
            if jac < 0.3 or trials >= max_trials_per_item:
                texts.append(s); break
            trials += 1
    return pd.DataFrame({"text": texts, "intent": ["__NEG__"] * n})

def make_polluted_test(test_df: pd.DataFrame, oos_df: pd.DataFrame | None = None,
                       ratio: float = 0.3, seed: int = 42,
                       fallback_random_negatives: bool = False):
    rng = np.random.RandomState(seed)
    n_oos = int(len(test_df) * ratio)
    if (oos_df is not None) and (len(oos_df) > 0):
        oos_sample = oos_df.sample(n=min(n_oos, len(oos_df)), random_state=rng)
        pol = (pd.concat([test_df.assign(is_oos=0), oos_sample.assign(is_oos=1)], ignore_index=True)
                 .sample(frac=1, random_state=rng).reset_index(drop=True))
        return pol, oos_sample
    if fallback_random_negatives:
        neg = generate_fallback_negatives_en(n=n_oos, seed=seed, avoid_like_df=test_df)
        pol = (pd.concat([test_df.assign(is_oos=0), neg.assign(is_oos=1)], ignore_index=True)
                 .sample(frac=1, random_state=rng).reset_index(drop=True))
        return pol, neg
    return test_df.copy(), None

def make_polluted_test_debug(test_df: pd.DataFrame, oos_df: pd.DataFrame | None = None,
                             ratio: float = 0.3, seed: int = 42,
                             fallback_random_negatives: bool = False):
    rng = np.random.RandomState(seed)
    n_oos = int(len(test_df) * ratio)
    print("\n--- POLLUTION DEBUG ---")
    print("Original test_df size:", len(test_df))
    print("OOS df provided?", "YES" if (oos_df is not None and len(oos_df) > 0) else "NO")
    print("Fallback random negatives?", "YES" if fallback_random_negatives else "NO")
    print("Planned OOS additions:", n_oos)
    if (oos_df is not None) and (len(oos_df) > 0):
        oos_sample = oos_df.sample(n=min(n_oos, len(oos_df)), random_state=rng)
        print("OOS sample size:", len(oos_sample))
        pol = (pd.concat([test_df.assign(is_oos=0), oos_sample.assign(is_oos=1)], ignore_index=True)
                 .sample(frac=1, random_state=rng).reset_index(drop=True))
        print("Polluted size after merge:", len(pol))
        return pol, oos_sample
    if fallback_random_negatives:
        neg = pd.DataFrame({"text": [f"random unrelated text {i}" for i in range(n_oos)],
                            "intent": ["__NEG__"] * n_oos})
        print("Random negatives generated:", len(neg))
        pol = (pd.concat([test_df.assign(is_oos=0), neg.assign(is_oos=1)], ignore_index=True)
                 .sample(frac=1, random_state=rng).reset_index(drop=True))
        print("Polluted size after merge:", len(pol))
        return pol, neg
    print("No pollution applied. Returning original test_df.")
    return test_df.copy(), None
PY

#####################################
# augmentation.py  (Classic + Noisy)  — DEFINES augment_text_noisy
#####################################
cat > "$PKG_DIR/augmentation.py" << 'PY'
from __future__ import annotations
from typing import Optional
import re, random, numpy as np, pandas as pd

# ---------- Classic ----------
def _safe_import_wordnet():
    try:
        from nltk.corpus import wordnet
        return wordnet
    except Exception as e:
        raise RuntimeError("Run once: import nltk; nltk.download('wordnet'); nltk.download('omw-1.4')") from e

def get_synonyms(word:str):
    wn = _safe_import_wordnet()
    syns = set()
    for syn in wn.synsets(word):
        for l in syn.lemmas():
            w = l.name().replace("_"," ")
            if w.lower() != word.lower():
                syns.add(w)
    return list(syns)

def synonym_replacement(tokens, n=1, seed=42):
    rng = np.random.RandomState(seed)
    tokens = tokens.copy()
    cand = [w for w in tokens if re.match(r"^[A-Za-z]+$", w)]
    rng.shuffle(cand)
    cnt = 0
    for w in cand:
        try:
            syns = get_synonyms(w)
        except RuntimeError:
            syns = []
        if syns:
            rep = rng.choice(syns)
            idx = tokens.index(w)
            tokens[idx] = rep
            cnt += 1
            if cnt >= n:
                break
    return tokens

def random_deletion(tokens, p=0.1, seed=42):
    rng = np.random.RandomState(seed)
    if len(tokens) == 1:
        return tokens
    keep = [t for t in tokens if rng.rand() > p]
    return keep if keep else tokens

def random_swap(tokens, n=1, seed=42):
    rng = np.random.RandomState(seed)
    tokens = tokens.copy()
    for _ in range(n):
        if len(tokens) < 2:
            break
        i, j = rng.choice(range(len(tokens)), 2, replace=False)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def augment_text(text, alpha_sr=0.1, alpha_rd=0.1, alpha_rs=0.1, seed=42):
    toks = text.split()
    L = len(toks)
    if L == 0:
        return text
    n_sr = max(1, int(alpha_sr*L))
    n_rs = max(1, int(alpha_rs*L))
    try:
        t1 = synonym_replacement(toks, n=n_sr, seed=seed)
    except RuntimeError:
        t1 = toks
    t2 = random_deletion(t1, p=alpha_rd, seed=seed)
    t3 = random_swap(t2, n=n_rs, seed=seed)
    return " ".join(t3)

def make_augmented_df(df, per_example=1, seed=42):
    rng = np.random.RandomState(seed)
    rows = []
    for _,row in df.iterrows():
        rows.append(row)
        for _ in range(per_example):
            rows.append(pd.Series({"text":augment_text(row["text"], seed=int(rng.randint(0,1e9))), "intent":row["intent"]}))
    return pd.DataFrame(rows).reset_index(drop=True)

# ---------- Noisy ----------
_EMOJIS = ["😊","😂","🔥","🚀","👍","💯","😅","😬","🤔","😎","😭","😡","✨","🎯","🤷","🙃"]
_SLANG_MAP = {
    "hello":"hey", "hi":"yo", "thanks":"thx", "thank you":"ty", "please":"plz",
    "you":"u", "are":"r", "for":"4", "to":"2", "great":"gr8", "before":"b4",
    "okay":"ok", "ok":"k", "really":"rlly", "people":"ppl", "message":"msg",
    "because":"cuz", "see you":"cya", "bye":"bb", "awesome":"awsome",
}
_KEY_NEAR = {
  "q":"w","w":"qe","e":"wr","r":"et","t":"ry","y":"tu","u":"yi","i":"uo","o":"ip","p":"o",
  "a":"s","s":"ad","d":"sf","f":"dg","g":"fh","h":"gj","j":"hk","k":"jl","l":"k",
  "z":"x","x":"zc","c":"xv","v":"cb","b":"vn","n":"bm","m":"n"
}

def inject_slang_emoji(text: str, slang_prob: float = 0.2, emoji_prob: float = 0.2, seed: Optional[int] = None) -> str:
    rng = random.Random(seed)
    toks = text.split()
    for i, tok in enumerate(toks):
        low = tok.lower()
        if rng.random() < slang_prob:
            if i+1 < len(toks):
                bigram = f"{low} {toks[i+1].lower()}"
                if bigram in _SLANG_MAP:
                    toks[i]   = _SLANG_MAP[bigram]
                    toks[i+1] = ""
                    continue
            if low in _SLANG_MAP:
                toks[i] = _SLANG_MAP[low]
    sent = " ".join([t for t in toks if t != ""]).strip()
    if rng.random() < emoji_prob:
        sent = (sent + " " + rng.choice(_EMOJIS)).strip()
    return sent

def inject_typos(text: str, typo_prob: float = 0.08, seed: Optional[int] = None) -> str:
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars):
        if chars[i].isspace():
            i += 1; continue
        if rng.random() < typo_prob:
            op = rng.choice(["drop","swap","sub"])
            c = chars[i]
            if op == "drop":
                del chars[i]
                continue
            elif op == "swap" and i+1 < len(chars) and not chars[i+1].isspace():
                chars[i], chars[i+1] = chars[i+1], chars[i]
                i += 2; continue
            elif op == "sub":
                low = c.lower()
                repl = rng.choice(list(_KEY_NEAR.get(low, low)))
                chars[i] = repl.upper() if c.isupper() else repl
        i += 1
    return "".join(chars)

def augment_text_noisy(text: str,
                       slang_prob: float = 0.15,
                       emoji_prob: float = 0.15,
                       typo_prob: float = 0.08,
                       del_p: float = 0.05,
                       swap_n: int = 1,
                       seed: int = 42) -> str:
    t = inject_slang_emoji(text, slang_prob=slang_prob, emoji_prob=emoji_prob, seed=seed)
    t = inject_typos(t, typo_prob=typo_prob, seed=seed+1)
    toks = t.split()
    toks = random_deletion(toks, p=del_p, seed=seed+2)
    toks = random_swap(toks, n=swap_n, seed=seed+3)
    return " ".join(toks)

def make_noisy_df(df: pd.DataFrame, per_example: int = 1, seed: int = 42,
                  slang_prob: float = 0.15, emoji_prob: float = 0.15, typo_prob: float = 0.08,
                  del_p: float = 0.05, swap_n: int = 1) -> pd.DataFrame:
    rng = np.random.RandomState(seed)
    rows = []
    for _, row in df.iterrows():
        rows.append(row)
        for _ in range(per_example):
            noisy = augment_text_noisy(
                row["text"],
                slang_prob=slang_prob, emoji_prob=emoji_prob, typo_prob=typo_prob,
                del_p=del_p, swap_n=swap_n, seed=int(rng.randint(0,1e9))
            )
            rows.append(pd.Series({"text": noisy, "intent": row["intent"]}))
    return pd.DataFrame(rows).reset_index(drop=True)
PY

#####################################
# labeling.py
#####################################
cat > "$PKG_DIR/labeling.py" << 'PY'
from __future__ import annotations
from typing import Iterable, Optional
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def _normalize_label(x)->str: return "" if x is None else str(x).strip()
def _normcase(x)->str: return _normalize_label(x).casefold()
def _oos_norm_set(oos_labels: Optional[Iterable[str]]): return { _normcase(lbl) for lbl in (oos_labels or []) }

def fit_label_encoder(train_df: pd.DataFrame, label_col: str = "intent",
                      oos_labels: Optional[Iterable[str]]=("OOS","__NEG__")):
    oos = _oos_norm_set(oos_labels)
    labs = train_df[label_col].apply(_normalize_label)
    mask = labs.apply(lambda v: _normcase(v) in oos)
    in_scope = labs[~mask]
    le = LabelEncoder().fit(in_scope.values)
    l2i = {lbl:int(i) for i,lbl in enumerate(le.classes_)}
    i2l = {int(i):lbl for lbl,i in l2i.items()}
    return le,l2i,i2l

def encode_in_scope_labels(df: pd.DataFrame, le: LabelEncoder, label_col="intent",
                           oos_labels=("OOS","__NEG__"), oos_sentinel=-1, out_col="label_id"):
    df = df.copy()
    oos = _oos_norm_set(oos_labels)
    labs = df[label_col].apply(_normalize_label)
    mask_in = ~labs.apply(lambda v: _normcase(v) in oos)
    df.loc[mask_in, out_col] = le.transform(labs[mask_in].values)
    df.loc[~mask_in, out_col] = oos_sentinel
    df[out_col] = df[out_col].astype(int)
    return df

def sanity_check_labels(df_list, out_col="label_id"):
    for i,df in enumerate(df_list, start=1):
        assert out_col in df.columns, f"missing {out_col} in df#{i}"
        assert pd.api.types.is_integer_dtype(df[out_col]), f"{out_col} must be int"
PY

#####################################
# baselines.py
#####################################
cat > "$PKG_DIR/baselines.py" << 'PY'
from __future__ import annotations
from typing import Dict, Optional, List
import numpy as np, pandas as pd
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, roc_curve
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

def split_in_scope(df_enc: pd.DataFrame, text_col="text", y_col="label_id"):
    mask = df_enc[y_col].astype(int) >= 0
    X = df_enc.loc[mask, text_col].astype(str).tolist()
    y = df_enc.loc[mask, y_col].astype(int).values
    return X, y

def eval_in_scope(df_enc: pd.DataFrame, y_pred: np.ndarray, y_col="label_id"):
    y_true = df_enc.loc[df_enc[y_col] >= 0, y_col].astype(int).values
    return {"accuracy":float(accuracy_score(y_true,y_pred)),
            "macro_f1":float(f1_score(y_true,y_pred,average="macro"))}

def select_threshold_msp(polluted_val_df_enc: pd.DataFrame, msp_val: np.ndarray,
                         oos_label_id: int = -1, target_tpr: float = 0.95):
    is_oos = (polluted_val_df_enc["label_id"].astype(int)==oos_label_id).astype(int).values
    det = -msp_val
    auroc = roc_auc_score(is_oos, det)
    fpr, tpr, thr = roc_curve(is_oos, det)
    idx = int(np.argmin(np.abs(tpr - target_tpr)))
    return {"tau":float(-thr[idx]), "auroc":float(auroc), "fpr_at_tpr":float(fpr[idx]), "target_tpr":float(tpr[idx])}

def oos_metrics(polluted_test_df_enc: pd.DataFrame, msp_test: np.ndarray, tau: float,
                oos_label_id: int = -1, y_pred_in_scope: Optional[np.ndarray] = None):
    is_oos = (polluted_test_df_enc["label_id"].astype(int)==oos_label_id).astype(int).values
    det = -msp_test
    fpr, tpr, thr = roc_curve(is_oos, det)
    auroc = roc_auc_score(is_oos, det)
    idx = int(np.argmin(np.abs(tpr - 0.95)))
    fpr95 = float(fpr[idx])
    out = {"auroc_oos":float(auroc), "fpr@tpr95":fpr95, "tau_used":float(tau)}
    if y_pred_in_scope is not None:
        accept = (msp_test>=tau) & (is_oos==0)
        y_true = polluted_test_df_enc.loc[accept,"label_id"].astype(int).values
        y_pred = y_pred_in_scope[accept]
        from sklearn.metrics import accuracy_score
        out["in_scope_acc_on_accepted"] = float(accuracy_score(y_true,y_pred)) if len(y_true)>0 else float("nan")
    return out

class MajorityClassifier:
    def fit(self, y):
        vals,counts = np.unique(y, return_counts=True)
        self.major = int(vals[np.argmax(counts)]); return self
    def predict(self, X): return np.full((len(X),), self.major, dtype=int)
    def predict_proba(self, X, n_classes:int):
        p = np.zeros((len(X), n_classes), float); p[:,self.major] = 1.0; return p

class TfidfLR:
    def __init__(self,max_features=30000,ngram_range=(1,2),min_df=2,C=1.0):
        self.vec = TfidfVectorizer(max_features=max_features, ngram_range=ngram_range, min_df=min_df)
        self.clf = LogisticRegression(max_iter=2000, class_weight="balanced", C=C)
        self.n_classes_ = None
    def fit(self,X_train,y_train):
        Xtr = self.vec.fit_transform(X_train); self.clf.fit(Xtr,y_train)
        self.n_classes_ = int(self.clf.classes_.shape[0]); return self
    def predict(self,X):
        Xte = self.vec.transform(X); return self.clf.predict(Xte)
    def predict_proba(self,X):
        Xte = self.vec.transform(X); return self.clf.predict_proba(Xte)

class TfidfLinearSVM:
    def __init__(self,max_features=30000,ngram_range=(1,2),min_df=2,C=1.0):
        self.vec = TfidfVectorizer(max_features=max_features, ngram_range=ngram_range, min_df=min_df)
        self.clf = LinearSVC(C=C); self.n_classes_ = None
    def fit(self,X,y):
        Xtr = self.vec.fit_transform(X); self.clf.fit(Xtr,y); self.n_classes_ = len(np.unique(y)); return self
    def predict(self,X):
        Xte = self.vec.transform(X); return self.clf.predict(Xte)
    def msp_like(self,X):
        Xte = self.vec.transform(X)
        margins = self.clf.decision_function(Xte)
        if margins.ndim==1:
            import numpy as _np
            margins = _np.vstack([-margins,margins]).T
        m = margins - margins.max(axis=1,keepdims=True)
        e = np.exp(m); p = e / e.sum(axis=1,keepdims=True)
        return p.max(axis=1)

def _safe_st():
    try:
        from sentence_transformers import SentenceTransformer
        return SentenceTransformer
    except Exception:
        return None

class BertLinear:
    def __init__(self, model_name="sentence-transformers/all-MiniLM-L6-v2"):
        self.model_name = model_name
        self.SentenceTransformer = _safe_st()
        self.model = None
        self.clf = LogisticRegression(max_iter=2000, class_weight="balanced")
        self.dim_ = None; self.n_classes_ = None
    def _ensure(self):
        if self.SentenceTransformer is None:
            raise ImportError("Install sentence-transformers")
        if self.model is None:
            self.model = self.SentenceTransformer(self.model_name)
    def _embed(self,texts:list[str]):
        self._ensure()
        return np.asarray(self.model.encode(texts, normalize_embeddings=True, show_progress_bar=False))
    def fit(self,X,y):
        Xemb = self._embed(X); self.dim_ = Xemb.shape[1]; self.clf.fit(Xemb,y)
        self.n_classes_ = int(self.clf.classes_.shape[0]); return self
    def predict(self,X): return self.clf.predict(self._embed(X))
    def predict_proba(self,X): return self.clf.predict_proba(self._embed(X))
PY

#####################################
# hf_trainers.py, discovery.py, zeroshot.py, threshold.py, evaluate.py, viz.py
#####################################
cat > "$PKG_DIR/hf_trainers.py" << 'PY'
from __future__ import annotations
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
def hf_train_eval(model_name, train_df, val_df, le, epochs=3, batch_size=16):
    from datasets import Dataset
    from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
    tok = AutoTokenizer.from_pretrained(model_name)
    def enc(b): return tok(b['text'], padding='max_length', truncation=True)
    tr = Dataset.from_pandas(train_df).map(enc, batched=True)
    va = Dataset.from_pandas(val_df).map(enc, batched=True)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=len(le.classes_))
    args = TrainingArguments(output_dir="./results", evaluation_strategy="epoch",
                             per_device_train_batch_size=batch_size, per_device_eval_batch_size=batch_size,
                             num_train_epochs=epochs, logging_dir="./logs", logging_steps=10, report_to=[])
    def metrics(p):
        preds = np.argmax(p.predictions, axis=1)
        return {"accuracy":accuracy_score(p.label_ids,preds), "macro_f1":f1_score(p.label_ids,preds,average="macro")}
    trn = Trainer(model=model, args=args, train_dataset=tr, eval_dataset=va, tokenizer=tok, compute_metrics=metrics)
    trn.train(); return model
PY

cat > "$PKG_DIR/discovery.py" << 'PY'
from __future__ import annotations
import logging, numpy as np, pandas as pd
from typing import Dict, Tuple, List
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def _get_encoder(model_name: str):
    from sentence_transformers import SentenceTransformer
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return SentenceTransformer(model_name, device=device)

def _embed(encoder, texts: List[str], batch_size: int = 256) -> np.ndarray:
    return np.asarray(encoder.encode(texts, batch_size=batch_size,
                                     convert_to_numpy=True, normalize_embeddings=True,
                                     show_progress_bar=False))

def build_intent_descriptions(train_df: pd.DataFrame, text_col: str = "text",
                              label_col: str = "intent", top_k_terms: int = 8,
                              max_features: int = 20000, ngram_range=(1,2)) -> Dict[str,str]:
    vec = TfidfVectorizer(max_features=max_features, ngram_range=ngram_range, min_df=2)
    _ = vec.fit_transform(train_df[text_col].astype(str).values)
    vocab = np.array(vec.get_feature_names_out())
    label_desc: Dict[str, str] = {}
    for lab, block in train_df.groupby(label_col):
        Xi = vec.transform(block[text_col].astype(str).values)
        mean = np.asarray(Xi.mean(axis=0)).ravel()
        if mean.sum() == 0:
            toks = " ".join(block[text_col].astype(str).tolist()).split()
            terms = [t for t,_ in pd.Series(toks).value_counts().head(top_k_terms).items()]
        else:
            idx = np.argsort(mean)[::-1][:top_k_terms]
            terms = [vocab[i] for i in idx if mean[i] > 0]
        label_desc[str(lab)] = "This intent is about: " + ", ".join(terms)
    return label_desc

def discover_superintents(train_df: pd.DataFrame, k_range: Tuple[int,int]=(8,12),
                          model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
                          text_col: str = "text", label_col: str = "intent",
                          top_k_terms: int = 8, random_state: int = 42):
    logger = logging.getLogger("intentzero2few")
    df = train_df[[text_col, label_col]].dropna().copy()
    df[label_col] = df[label_col].astype(str)
    label_desc = build_intent_descriptions(df, text_col=text_col, label_col=label_col, top_k_terms=top_k_terms)
    intents = sorted(label_desc.keys())
    enc = _get_encoder(model_name)
    embs = _embed(enc, [label_desc[i] for i in intents], batch_size=256)

    k_min, k_max = int(k_range[0]), int(k_range[1])
    k_scores = {}; best = (-1.0, None, None)
    for k in range(k_min, k_max+1):
        if k <= 1 or k >= len(intents): continue
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = km.fit_predict(embs)
        sil = silhouette_score(embs, labels) if 1 < k < len(intents) else -1.0
        k_scores[k] = float(sil); logger.info("discovery: K=%d silhouette=%.4f", k, sil)
        if sil > best[0]: best = (sil, k, labels)
    if best[1] is None:
        labels = np.zeros(len(intents), dtype=int); k_best = 1
    else:
        k_best = int(best[1]); labels = best[2]
    intent_to_super, super_to_intents = {}, {}
    for idx, intent in enumerate(intents):
        sid = f"S{int(labels[idx])}"
        intent_to_super[intent] = sid
        super_to_intents.setdefault(sid, []).append(intent)
    artifacts = {"model_name": model_name, "label_desc": label_desc, "intents_sorted": intents,
                 "embeddings": embs.astype(np.float32), "k_best": k_best, "k_scores": k_scores}
    return intent_to_super, super_to_intents, artifacts
PY

cat > "$PKG_DIR/zeroshot.py" << 'PY'
from __future__ import annotations
import logging, numpy as np, pandas as pd
from typing import Dict, List

def _get_encoder(model_name: str):
    from sentence_transformers import SentenceTransformer
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return SentenceTransformer(model_name, device=device)

def _embed(encoder, texts: List[str], batch_size: int = 256) -> np.ndarray:
    return np.asarray(encoder.encode(texts, batch_size=batch_size,
                                     convert_to_numpy=True, normalize_embeddings=True,
                                     show_progress_bar=False))

class ZeroShotSuperIntent:
    def __init__(self, model_name: str, super_labels: List[str],
                 centroids: np.ndarray, intent_to_super: Dict[str,str]):
        self.model_name = model_name
        self.super_labels = list(super_labels)
        self.centroids = np.asarray(centroids, np.float32)
        self.intent_to_super = dict(intent_to_super)
        self._encoder = None
    def _ensure(self):
        if self._encoder is None:
            self._encoder = _get_encoder(self.model_name)
    def predict_proba(self, texts: List[str], batch_size: int = 256) -> np.ndarray:
        self._ensure(); import numpy as np
        X = _embed(self._encoder, list(map(str, texts)), batch_size=batch_size)
        return np.clip(X @ self.centroids.T, -1.0, 1.0)
    def predict(self, texts: List[str], tau: float, batch_size: int = 256) -> List[str]:
        S = self.predict_proba(texts, batch_size=batch_size)
        idx = S.argmax(axis=1); smax = S.max(axis=1)
        return [(self.super_labels[int(i)] if s >= float(tau) else "OOS") for s, i in zip(smax, idx)]

def fit_superintent_zeroshot(train_df: pd.DataFrame, intent_to_super: Dict[str,str], artifacts: Dict,
                             exemplars_per_intent: int = 5, description_weight: float = 1.0,
                             model_name: str | None = None, text_col: str = "text", label_col: str = "intent",
                             random_state: int = 42) -> ZeroShotSuperIntent:
    logger = logging.getLogger("intentzero2few")
    if model_name is None:
        model_name = artifacts.get("model_name","sentence-transformers/all-MiniLM-L6-v2")
    df = train_df[[text_col, label_col]].dropna().copy(); df[label_col] = df[label_col].astype(str)
    label_desc = artifacts["label_desc"]
    super_to_intents: Dict[str, List[str]] = {}
    for intent, s in intent_to_super.items():
        super_to_intents.setdefault(s, []).append(intent)
    super_labels = sorted(super_to_intents.keys())
    enc = _get_encoder(model_name); import numpy as np; rng = np.random.RandomState(random_state)
    centroids = []
    for s in super_labels:
        members = super_to_intents[s]
        desc_texts = [label_desc[i] for i in members]
        desc_emb = np.asarray(enc.encode(desc_texts, convert_to_numpy=True, normalize_embeddings=True, show_progress_bar=False))
        if description_weight != 1.0: desc_emb = desc_emb * float(description_weight)
        ex_texts: List[str] = []
        for intent in members:
            block = df[df[label_col] == intent]
            k = min(len(block), int(exemplars_per_intent))
            if k > 0: ex_texts.extend(block.sample(n=k, random_state=rng)[text_col].astype(str).tolist())
        ex_emb = (np.asarray(enc.encode(ex_texts, convert_to_numpy=True, normalize_embeddings=True, show_progress_bar=False))
                  if ex_texts else np.zeros((0, desc_emb.shape[1]), np.float32))
        all_emb = desc_emb if len(ex_emb) == 0 else (ex_emb if len(desc_emb) == 0 else np.vstack([desc_emb, ex_emb]))
        c = all_emb.mean(axis=0); norm = np.linalg.norm(c) + 1e-12; centroids.append(c / norm)
        logger.info("zeroshot: centroid %s built with %d desc + %d exemplars", s, len(desc_emb), len(ex_emb))
    centroids = np.vstack(centroids).astype(np.float32)
    return ZeroShotSuperIntent(model_name=model_name, super_labels=super_labels, centroids=centroids, intent_to_super=intent_to_super)
PY

cat > "$PKG_DIR/threshold.py" << 'PY'
from __future__ import annotations
import numpy as np, pandas as pd
from typing import Iterable, Dict, List
from sklearn.metrics import f1_score
_OOS = {"oos","__neg__","out_of_scope","out-of-scope","unknown"}

def _is_oos(x:str)->bool:
    if x is None: return True
    return str(x).strip().casefold() in _OOS

def _true_supers(df: pd.DataFrame, mapping: Dict[str,str], intent_col="intent", is_oos_col="is_oos")->List[str]:
    y = []
    for _, r in df.iterrows():
        if is_oos_col in df.columns and int(r.get(is_oos_col, 0)) == 1:
            y.append("OOS"); continue
        it = str(r[intent_col])
        y.append("OOS" if _is_oos(it) else mapping.get(it, "OOS"))
    return y

def calibrate_threshold(zs_model, val_df: pd.DataFrame, tau_grid: Iterable[float] | None = None,
                        intent_col="intent", is_oos_col="is_oos")->float:
    if tau_grid is None: tau_grid = np.linspace(0.2,0.9,36)
    texts = val_df["text"].astype(str).tolist()
    y_true = _true_supers(val_df, zs_model.intent_to_super, intent_col, is_oos_col)
    S = zs_model.predict_proba(texts); idx = S.argmax(axis=1); smax = S.max(axis=1)
    labels = sorted(set(zs_model.super_labels) | {"OOS"})
    best_tau,best_f1 = None,-1.0
    for tau in tau_grid:
        y_pred = ["OOS" if s<float(tau) else zs_model.super_labels[int(i)] for s,i in zip(smax, idx)]
        f1 = f1_score(y_true, y_pred, average="macro", labels=labels, zero_division=0)
        if (f1>best_f1) or (np.isclose(f1,best_f1) and (best_tau is None or tau>best_tau)):
            best_tau,best_f1 = float(tau),float(f1)
    return float(best_tau)
PY

cat > "$PKG_DIR/evaluate.py" << 'PY'
from __future__ import annotations
import numpy as np, pandas as pd
from typing import Dict, List
from sklearn.metrics import classification_report, confusion_matrix, f1_score, accuracy_score
_OOS = {"oos","__neg__","out_of_scope","out-of-scope","unknown"}

def _is_oos(x: str) -> bool:
    if x is None: return True
    return str(x).strip().casefold() in _OOS

def _true_supers(df: pd.DataFrame, mapping: Dict[str,str], intent_col: str = "intent", is_oos_col: str = "is_oos") -> List[str]:
    y = []
    for _, r in df.iterrows():
        if is_oos_col in df.columns and int(r.get(is_oos_col, 0)) == 1:
            y.append("OOS"); continue
        it = str(r[intent_col]); y.append("OOS" if _is_oos(it) else mapping.get(it, "OOS"))
    return y

def evaluate_superintent(zs_model, df: pd.DataFrame, tau: float, intent_col: str = "intent", is_oos_col: str = "is_oos") -> Dict:
    texts = df["text"].astype(str).tolist()
    y_true = _true_supers(df, zs_model.intent_to_super, intent_col, is_oos_col)
    S = zs_model.predict_proba(texts); idx = S.argmax(axis=1); smax = S.max(axis=1)
    y_pred = ["OOS" if s < tau else zs_model.super_labels[int(i)] for s, i in zip(smax, idx)]
    labels = sorted(set(zs_model.super_labels) | {"OOS"})
    report = classification_report(y_true, y_pred, labels=labels, output_dict=True, zero_division=0)
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    with np.errstate(divide="ignore", invalid="ignore"):
        cmn = cm.astype(float) / cm.sum(axis=1, keepdims=True); cmn = np.nan_to_num(cmn)
    acc = accuracy_score(y_true, y_pred); macro_f1 = f1_score(y_true, y_pred, average="macro", labels=labels, zero_division=0)
    return {"labels": labels, "classification_report": report,
            "confusion_matrix": cm.tolist(), "confusion_matrix_normalized": cmn.tolist(),
            "accuracy": float(acc), "macro_f1": float(macro_f1)}
PY

cat > "$PKG_DIR/viz.py" << 'PY'
from __future__ import annotations
import os, numpy as np, pandas as pd
import matplotlib.pyplot as plt, seaborn as sns

def save_confusion_heatmap(cm, labels, path_png, title="Confusion (row-normalized)"):
    os.makedirs(os.path.dirname(path_png), exist_ok=True)
    cm = np.asarray(cm, float)
    cmn = cm / np.maximum(cm.sum(axis=1, keepdims=True), 1e-9)
    plt.figure(figsize=(max(6, 0.4*len(labels)), max(5, 0.4*len(labels))))
    sns.heatmap(cmn, xticklabels=labels, yticklabels=labels)
    plt.title(title); plt.xlabel("Pred"); plt.ylabel("True")
    plt.tight_layout(); plt.savefig(path_png, dpi=300, bbox_inches="tight"); plt.close()

def save_wordcloud_from_df(df: pd.DataFrame, text_col: str, path_png: str):
    try:
        from wordcloud import WordCloud
    except Exception as e:
        print("wordcloud not installed; skipping:", e); return
    text = " ".join(df[text_col].astype(str).tolist())
    wc = WordCloud(width=1200, height=600, background_color="white").generate(text)
    os.makedirs(os.path.dirname(path_png), exist_ok=True)
    wc.to_file(path_png)
PY

#####################################
# __init__.py  — robust: DO NOT hard-import names from augmentation
#####################################
cat > "$PKG_DIR/__init__.py" << 'PY'
from .core import set_all_seeds, SEED, TEXT_COL, LABEL_COL
from .dataio import load_intents
from .eda import quick_eda, plot_top_intents, sample_by_intent, show_split
from .fewshot import make_k_shot
from .pollution import generate_fallback_negatives_en, make_polluted_test, make_polluted_test_debug
# NOTE: do NOT import names from augmentation; keep it optional & lazy
try:
    from . import augmentation as augmentation  # users can: from intentzero2few.augmentation import ...
except Exception:
    augmentation = None

from .labeling import fit_label_encoder, encode_in_scope_labels, sanity_check_labels
from .baselines import (
    split_in_scope, eval_in_scope, select_threshold_msp, oos_metrics,
    TfidfLR, TfidfLinearSVM, BertLinear, MajorityClassifier
)
from .discovery import discover_superintents, build_intent_descriptions
from .zeroshot import fit_superintent_zeroshot
from .threshold import calibrate_threshold
from .evaluate import evaluate_superintent
from .utils_logging import setup_logger
from .utils_io import get_env_paths, run_path, report_path, save_json, save_csv, save_figure, copy_to_report
from .utils_errors import log_error_row, error_csv_path

# Optional viz module
try:
    from . import viz as viz
except Exception:
    viz = None
PY

# (optional) .gitignore to keep runs/ out of Git
if [ ! -f ".gitignore" ]; then
  cat > .gitignore << 'GI'
runs/
**/__pycache__/
*.egg-info/
.DS_Store
GI
fi

# Editable reinstall
pip -q install -e .

# Smoke test (won't fail the cell)
python - <<'PY'
import importlib, intentzero2few as m
importlib.reload(m)
print("OK: package imported. augmentation module present:", m.augmentation is not None)
if m.augmentation:
    print("Has augment_text_noisy:", hasattr(m.augmentation, "augment_text_noisy"))
PY

echo "✅ intentzero2few modules written & reinstalled"


OK: package imported. augmentation module present: True
Has augment_text_noisy: True
✅ intentzero2few modules written & reinstalled


In [6]:
%%bash
# === Robust augmentation + auto-reload in __init__ (single-shot fix) ===
set -euo pipefail
REPO_DIR="${REPO_DIR:-/content/intentzero2few-repo}"
PKG_DIR="$REPO_DIR/src/intentzero2few"
mkdir -p "$PKG_DIR"

# --- augmentation.py (classic + noisy) ---
cat > "$PKG_DIR/augmentation.py" << 'PY'
from __future__ import annotations
from typing import Optional
import re, random, numpy as np, pandas as pd

__all__ = [
    "augment_text","make_augmented_df",
    "inject_slang_emoji","inject_typos",
    "augment_text_noisy","make_noisy_df",
    "random_deletion","random_swap","synonym_replacement","get_synonyms"
]

# ---------- Classic ----------
def _safe_import_wordnet():
    try:
        from nltk.corpus import wordnet
        return wordnet
    except Exception as e:
        raise RuntimeError("Run once: import nltk; nltk.download('wordnet'); nltk.download('omw-1.4')") from e

def get_synonyms(word:str):
    wn = _safe_import_wordnet()
    syns = set()
    for syn in wn.synsets(word):
        for l in syn.lemmas():
            w = l.name().replace("_"," ")
            if w.lower() != word.lower():
                syns.add(w)
    return list(syns)

def synonym_replacement(tokens, n=1, seed=42):
    rng = np.random.RandomState(seed)
    tokens = tokens.copy()
    cand = [w for w in tokens if re.match(r"^[A-Za-z]+$", w)]
    rng.shuffle(cand)
    cnt = 0
    for w in cand:
        try:
            syns = get_synonyms(w)
        except RuntimeError:
            syns = []
        if syns:
            rep = rng.choice(syns)
            idx = tokens.index(w)
            tokens[idx] = rep
            cnt += 1
            if cnt >= n: break
    return tokens

def random_deletion(tokens, p=0.1, seed=42):
    rng = np.random.RandomState(seed)
    if len(tokens) == 1: return tokens
    keep = [t for t in tokens if rng.rand() > p]
    return keep if keep else tokens

def random_swap(tokens, n=1, seed=42):
    rng = np.random.RandomState(seed)
    tokens = tokens.copy()
    for _ in range(n):
        if len(tokens) < 2: break
        i, j = rng.choice(range(len(tokens)), 2, replace=False)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def augment_text(text, alpha_sr=0.1, alpha_rd=0.1, alpha_rs=0.1, seed=42):
    toks = text.split()
    L = len(toks)
    if L == 0: return text
    n_sr = max(1, int(alpha_sr*L))
    n_rs = max(1, int(alpha_rs*L))
    try:
        t1 = synonym_replacement(toks, n=n_sr, seed=seed)
    except RuntimeError:
        t1 = toks
    t2 = random_deletion(t1, p=alpha_rd, seed=seed)
    t3 = random_swap(t2, n=n_rs, seed=seed)
    return " ".join(t3)

def make_augmented_df(df, per_example=1, seed=42):
    rng = np.random.RandomState(seed)
    rows = []
    for _,row in df.iterrows():
        rows.append(row)
        for _ in range(per_example):
            rows.append(pd.Series({"text":augment_text(row["text"], seed=int(rng.randint(0,1e9))), "intent":row["intent"]}))
    return pd.DataFrame(rows).reset_index(drop=True)

# ---------- Noisy ----------
_EMOJIS = ["😊","😂","🔥","🚀","👍","💯","😅","😬","🤔","😎","😭","😡","✨","🎯","🤷","🙃"]
_SLANG_MAP = {
    "hello":"hey", "hi":"yo", "thanks":"thx", "thank you":"ty", "please":"plz",
    "you":"u", "are":"r", "for":"4", "to":"2", "great":"gr8", "before":"b4",
    "okay":"ok", "ok":"k", "really":"rlly", "people":"ppl", "message":"msg",
    "because":"cuz", "see you":"cya", "bye":"bb", "awesome":"awsome",
}

_KEY_NEAR = {
  "q":"w","w":"qe","e":"wr","r":"et","t":"ry","y":"tu","u":"yi","i":"uo","o":"ip","p":"o",
  "a":"s","s":"ad","d":"sf","f":"dg","g":"fh","h":"gj","j":"hk","k":"jl","l":"k",
  "z":"x","x":"zc","c":"xv","v":"cb","b":"vn","n":"bm","m":"n"
}

def inject_slang_emoji(text: str, slang_prob: float = 0.2, emoji_prob: float = 0.2, seed: Optional[int] = None) -> str:
    rng = random.Random(seed)
    toks = text.split()
    for i, tok in enumerate(toks):
        low = tok.lower()
        if rng.random() < slang_prob:
            if i+1 < len(toks):
                bigram = f"{low} {toks[i+1].lower()}"
                if bigram in _SLANG_MAP:
                    toks[i]   = _SLANG_MAP[bigram]
                    toks[i+1] = ""
                    continue
            if low in _SLANG_MAP:
                toks[i] = _SLANG_MAP[low]
    sent = " ".join([t for t in toks if t != ""]).strip()
    if rng.random() < emoji_prob:
        sent = (sent + " " + rng.choice(_EMOJIS)).strip()
    return sent

def inject_typos(text: str, typo_prob: float = 0.08, seed: Optional[int] = None) -> str:
    rng = random.Random(seed)
    chars = list(text); i = 0
    while i < len(chars):
        if chars[i].isspace():
            i += 1; continue
        if rng.random() < typo_prob:
            op = rng.choice(["drop","swap","sub"]); c = chars[i]
            if op == "drop":
                del chars[i]; continue
            elif op == "swap" and i+1 < len(chars) and not chars[i+1].isspace():
                chars[i], chars[i+1] = chars[i+1], chars[i]; i += 2; continue
            elif op == "sub":
                low = c.lower()
                repl = rng.choice(list(_KEY_NEAR.get(low, low)))
                chars[i] = repl.upper() if c.isupper() else repl
        i += 1
    return "".join(chars)

def augment_text_noisy(text: str,
                       slang_prob: float = 0.15,
                       emoji_prob: float = 0.15,
                       typo_prob: float = 0.08,
                       del_p: float = 0.05,
                       swap_n: int = 1,
                       seed: int = 42) -> str:
    t = inject_slang_emoji(text, slang_prob=slang_prob, emoji_prob=emoji_prob, seed=seed)
    t = inject_typos(t, typo_prob=typo_prob, seed=seed+1)
    toks = t.split()
    toks = random_deletion(toks, p=del_p, seed=seed+2)
    toks = random_swap(toks, n=swap_n, seed=seed+3)
    return " ".join(toks)

def make_noisy_df(df: pd.DataFrame, per_example: int = 1, seed: int = 42,
                  slang_prob: float = 0.15, emoji_prob: float = 0.15, typo_prob: float = 0.08,
                  del_p: float = 0.05, swap_n: int = 1) -> pd.DataFrame:
    rng = np.random.RandomState(seed)
    rows = []
    for _, row in df.iterrows():
        rows.append(row)
        for _ in range(per_example):
            noisy = augment_text_noisy(
                row["text"],
                slang_prob=slang_prob, emoji_prob=emoji_prob, typo_prob=typo_prob,
                del_p=del_p, swap_n=swap_n, seed=int(rng.randint(0,1e9))
            )
            rows.append(pd.Series({"text": noisy, "intent": row["intent"]}))
    return pd.DataFrame(rows).reset_index(drop=True)
PY

# --- __init__.py (auto-reload augmentation & viz if already imported) ---
cat > "$PKG_DIR/__init__.py" << 'PY'
from .core import set_all_seeds, SEED, TEXT_COL, LABEL_COL
from .dataio import load_intents
from .eda import quick_eda, plot_top_intents, sample_by_intent, show_split
from .fewshot import make_k_shot
from .pollution import generate_fallback_negatives_en, make_polluted_test, make_polluted_test_debug
from .labeling import fit_label_encoder, encode_in_scope_labels, sanity_check_labels
from .baselines import (
    split_in_scope, eval_in_scope, select_threshold_msp, oos_metrics,
    TfidfLR, TfidfLinearSVM, BertLinear, MajorityClassifier
)
from .discovery import discover_superintents, build_intent_descriptions
from .zeroshot import fit_superintent_zeroshot
from .threshold import calibrate_threshold
from .evaluate import evaluate_superintent
from .utils_logging import setup_logger
from .utils_io import get_env_paths, run_path, report_path, save_json, save_csv, save_figure, copy_to_report
from .utils_errors import log_error_row, error_csv_path

# --- Safe, fresh submodules (avoid stale cache in Colab) ---
import sys as _sys, importlib as _importlib
augmentation = None
try:
    if "intentzero2few.augmentation" in _sys.modules:
        _importlib.reload(_sys.modules["intentzero2few.augmentation"])
    from . import augmentation as augmentation
except Exception:
    augmentation = None

viz = None
try:
    if "intentzero2few.viz" in _sys.modules:
        _importlib.reload(_sys.modules["intentzero2few.viz"])
    from . import viz as viz
except Exception:
    viz = None
PY

# ensure gitignore keeps runtime outputs out of repo
mkdir -p "$REPO_DIR"
if [ ! -f "$REPO_DIR/.gitignore" ]; then
  cat > "$REPO_DIR/.gitignore" << 'GI'
runs/
**/__pycache__/
*.egg-info/
.DS_Store
GI
fi

# reinstall editable
pip -q install -e "$REPO_DIR"

# lightweight smoke
python - <<'PY'
import importlib, intentzero2few as m
importlib.reload(m)
print("augmentation module present:", m.augmentation is not None)
if m.augmentation:
    print("has make_noisy_df:", hasattr(m.augmentation, "make_noisy_df"))
    print("has augment_text_noisy:", hasattr(m.augmentation, "augment_text_noisy"))
PY

echo "✅ Patched init + augmentation written and installed."


augmentation module present: False
✅ Patched init + augmentation written and installed.


In [7]:
import importlib, intentzero2few as m
importlib.reload(m)  # __init__ içeride augmentation'ı taze reload ediyor
from intentzero2few.augmentation import make_augmented_df, make_noisy_df, augment_text_noisy
print("OK ✔")


OK ✔


In [8]:
%%bash
set -euo pipefail
REPO_DIR="/content/intentzero2few-repo"
AUG="$REPO_DIR/src/intentzero2few/augmentation.py"

# Yaz: klasik EDA tarzı augment + noisy augment (p_emoji, p_slang, p_char destekli)
cat > "$AUG" << 'PY'
"""
Text augmentation utilities.

Includes:
1) Classic EDA-style augmentation (synonyms / deletion / swap).
   - Requires NLTK WordNet (optionally).
2) Noisy augmentation (emoji / slang / character noise).
   - Explicit probabilities: p_emoji, p_slang, p_char.

EN: Use classic for semantic-preserving augmentation to grow data to a target size.
TR: Veri setini büyütmek için "anlamı koruyan" klasik augment; yanı sıra,
    robustness için "noisy" (emoji/argo/typo) augment.
"""
from __future__ import annotations
import re, random
import numpy as np
import pandas as pd

# -------------------------------
# Classic EDA-style augmentation
# -------------------------------
def _safe_import_wordnet():
    try:
        from nltk.corpus import wordnet
        return wordnet
    except Exception as e:
        raise RuntimeError("Run once: import nltk; nltk.download('wordnet'); nltk.download('omw-1.4')") from e

def get_synonyms(word:str):
    wn = _safe_import_wordnet()
    syns = set()
    for syn in wn.synsets(word):
        for l in syn.lemmas():
            w = l.name().replace("_"," ")
            if w.lower() != word.lower():
                syns.add(w)
    return list(syns)

def synonym_replacement(tokens, n=1, seed=42):
    rng = np.random.RandomState(seed)
    tokens = tokens.copy()
    cand = [w for w in tokens if re.match(r"^[A-Za-z]+$", w)]
    rng.shuffle(cand)
    cnt = 0
    for w in cand:
        syns = get_synonyms(w)
        if syns:
            rep = rng.choice(syns)
            idx = tokens.index(w)
            tokens[idx] = rep
            cnt += 1
            if cnt >= n:
                break
    return tokens

def random_deletion(tokens, p=0.1, seed=42):
    rng = np.random.RandomState(seed)
    if len(tokens) == 1:
        return tokens
    keep = [t for t in tokens if rng.rand() > p]
    return keep if keep else tokens

def random_swap(tokens, n=1, seed=42):
    rng = np.random.RandomState(seed)
    tokens = tokens.copy()
    for _ in range(n):
        if len(tokens) < 2:
            break
        i, j = rng.choice(range(len(tokens)), 2, replace=False)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def augment_text(text, alpha_sr=0.1, alpha_rd=0.1, alpha_rs=0.1, seed=42):
    toks = str(text).split()
    L = len(toks)
    if L == 0:
        return str(text)
    n_sr = max(1, int(alpha_sr*L))
    n_rs = max(1, int(alpha_rs*L))
    try:
        t1 = synonym_replacement(toks, n=n_sr, seed=seed)
    except RuntimeError:
        t1 = toks
    t2 = random_deletion(t1, p=alpha_rd, seed=seed)
    t3 = random_swap(t2, n=n_rs, seed=seed)
    return " ".join(t3)

def make_augmented_df(df: pd.DataFrame, per_example=1, seed=42):
    rng = np.random.RandomState(seed)
    rows = []
    for _,row in df.iterrows():
        rows.append(row)
        for _ in range(per_example):
            rows.append(pd.Series({
                "text": augment_text(row["text"], seed=int(rng.randint(0,1e9))),
                "intent": row["intent"]
            }))
    return pd.DataFrame(rows).reset_index(drop=True)

# -------------------------------
# Noisy augmentation (emoji/slang/char)
# -------------------------------
_EMOJIS = ["😂","🤣","😊","😍","🔥","✨","👌","😅","😎","🤔","🙃","💯","🤷","😩","🥲","🤗","🙌","😭","😴","😜"]
_SLANG  = {
  "you":"u", "are":"r", "your":"ur", "please":"pls", "people":"ppl",
  "thanks":"thx", "thank you":"ty", "because":"cuz", "okay":"ok",
  "really":"rly", "message":"msg", "before":"b4", "tomorrow":"tmrw",
  "between":"btwn", "favorite":"fav", "see you":"cu", "by the way":"btw",
  "for your information":"fyi", "as soon as possible":"asap", "I don't know":"idk"
}

def _inject_emojis(text:str, rng:np.random.RandomState, min_n=1, max_n=3)->str:
    n = int(rng.randint(min_n, max_n+1))
    em = "".join(rng.choice(_EMOJIS, size=n))
    # 50% append, 50% inline
    if rng.rand() < 0.5:
        return text.strip() + " " + em
    toks = text.split()
    if not toks:
        return em
    pos = int(rng.randint(0, len(toks)))
    toks.insert(pos, em)
    return " ".join(toks)

def _slangify(text:str, rng:np.random.RandomState)->str:
    s = " " + text.lower() + " "
    # longest keys first to avoid partial overlaps
    for k in sorted(_SLANG.keys(), key=len, reverse=True):
        if rng.rand() < 0.5 and f" {k} " in s:
            s = s.replace(f" {k} ", f" {_SLANG[k]} ")
    # random add-ons
    tails = [" lol", " lmao", " smh", " fr", " tbh", " ngl", " btw", " idk"]
    if rng.rand() < 0.3:
        s = s.strip() + rng.choice(tails)
    return s.strip()

def _char_noise(text:str, rng:np.random.RandomState, p_char:float=0.05)->str:
    # light character-level noise: swap/duplicate/delete/case flip
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if rng.rand() < p_char and ch.isalpha():
            op = rng.choice(["dup","del","swap","case"])
            if op == "dup":
                out.append(ch); out.append(ch)
            elif op == "del":
                # skip this char (delete)
                i += 1
                continue
            elif op == "swap" and i+1 < len(text):
                out.append(text[i+1]); out.append(ch)
                i += 2
                continue
            elif op == "case":
                out.append(ch.upper() if ch.islower() else ch.lower())
            else:
                out.append(ch)
        else:
            out.append(ch)
        i += 1
    return "".join(out)

def make_noisy_text(text:str,
                    rng:np.random.RandomState,
                    p_emoji:float=0.3, p_slang:float=0.3, p_char:float=0.05)->str:
    s = str(text)
    if rng.rand() < p_slang:
        s = _slangify(s, rng)
    if rng.rand() < p_emoji:
        s = _inject_emojis(s, rng)
    if p_char > 0:
        s = _char_noise(s, rng, p_char=p_char)
    return s

def make_noisy_df(df: pd.DataFrame,
                  per_example:int=1, seed:int=42,
                  p_emoji:float=0.3, p_slang:float=0.3, p_char:float=0.05) -> pd.DataFrame:
    """
    EN: Return df with original rows + `per_example` noisy variants per row.
    TR: Her satıra `per_example` adet gürültülü kopya ekler (orijinali korur).
    """
    rng = np.random.RandomState(seed)
    rows = []
    for _,row in df.iterrows():
        rows.append(row)
        for _ in range(per_example):
            rows.append(pd.Series({
                "text": make_noisy_text(row["text"], rng, p_emoji=p_emoji, p_slang=p_slang, p_char=p_char),
                "intent": row["intent"]
            }))
    return pd.DataFrame(rows).reset_index(drop=True)
PY

# paketi yeniden kur
cd "$REPO_DIR"
pip -q install -e .
echo "✅ augmentation.py updated & package reinstalled"


✅ augmentation.py updated & package reinstalled


In [9]:
%%bash
set -euo pipefail
REPO_DIR="/content/intentzero2few-repo"
mkdir -p "$REPO_DIR/src/intentzero2few"

cat > "$REPO_DIR/src/intentzero2few/viz.py" << 'PY'
from __future__ import annotations
import os
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

def save_confusion_heatmap(cm_norm, labels, out_path, title: str | None = None, annot: bool = True):
    """
    Save a confusion matrix heatmap.
    - cm_norm: 2D list or np.ndarray (row-normalized values [0..1])
    - labels: list of class names (axis tick labels)
    - out_path: file path to save (directories will be created)
    """
    cm = np.asarray(cm_norm, dtype=float)
    os.makedirs(os.path.dirname(out_path), exist_ok=True)

    fig, ax = plt.subplots(figsize=(max(6, 0.5*len(labels)+2), max(5, 0.5*len(labels)+2)))
    sns.heatmap(cm, ax=ax, vmin=0.0, vmax=1.0, cmap="Blues",
                xticklabels=labels, yticklabels=labels,
                annot=annot, fmt=".2f", cbar=True, square=True)
    ax.set_xlabel("Predicted")
    ax.set_ylabel("True")
    if title:
        ax.set_title(title)
    fig.tight_layout()
    fig.savefig(out_path, dpi=300, bbox_inches="tight")
    plt.close(fig)
PY

echo "✅ viz.py written"


✅ viz.py written


# 4.) Installation & Smoke Test

In [None]:
%%bash
# 4) Install & Extended Smoke Test — RUN_ID-aware, idempotent
set -euo pipefail
REPO_DIR="${REPO_DIR:-/content/intentzero2few-repo}"
cd "$REPO_DIR"

# kurulum
pip -q install -U pip setuptools wheel
pip -q install -e .

python - << 'PY'
import os, pkgutil
import pandas as pd
import matplotlib.pyplot as plt

import intentzero2few as m
print("✅ package path:", m.__file__)
print("✅ submodules:", sorted([mod.name for mod in pkgutil.iter_modules(m.__path__)]))

# Public API (tam liste)
from intentzero2few import (
    # core & io
    set_all_seeds, SEED, TEXT_COL, LABEL_COL, load_intents,
    # eda
    quick_eda, plot_top_intents, sample_by_intent, show_split,
    # few-shot & augmentation & pollution
    make_k_shot, make_augmented_df, augment_text,
    generate_fallback_negatives_en, make_polluted_test, make_polluted_test_debug,
    # labeling
    fit_label_encoder, encode_in_scope_labels, sanity_check_labels,
    # baselines
    run_majority, run_tfidf_lr, run_bert_linear,
    split_in_scope, eval_in_scope, select_threshold_msp, oos_metrics,
    TfidfLR, TfidfLinearSVM, BertLinear, MajorityClassifier,
    # discovery/zero→few
    build_intent_descriptions, discover_superintents,
    fit_superintent_zeroshot, calibrate_threshold, evaluate_superintent,
    # logging & path utils
    setup_logger, get_env_paths, run_path, report_path,
    save_json, save_csv, save_figure, copy_to_report
)
print("✅ full API imports OK")

# RUN_ID yolları & logger
p = get_env_paths()
logger, log_path = setup_logger()
print("RUN_DIR:", p["RUN_DIR"])
print("REPORT_DIR:", p["REPORT_DIR"])
print("LOG:", log_path)

# I/O smoke: runs/<RUN_ID> ve reports/<RUN_ID> altına ufak artefaktlar
save_json({"smoke":"ok","run_id":p["RUN_ID"]}, run_path("analytics","smoke.json"))
smoke_df = pd.DataFrame([{"k":"v","n":1}])
save_csv(smoke_df, report_path("smoke.csv"))

# basit bir şekil (runs/figures → reports’a kopyala)
fig = plt.figure()
ax = plt.gca()
ax.plot([0,1,2],[0,1,0])
ax.set_title("smoke")
fig_path = run_path("figures","smoke.png")
save_figure(fig, fig_path)
copy_to_report(fig_path)  # reports/<RUN_ID>/smoke.png
plt.close(fig)
print("✅ I/O smoke OK")

# Opsiyonel mini E2E (küçük sentetik veri) — model indirme gerekebilir
DO_E2E = os.environ.get("SMOKE_E2E","0") == "1"
if DO_E2E:
    set_all_seeds(SEED)
    train = pd.DataFrame({
        "text":[
            "book a table for two at an italian restaurant",
            "reserve a hotel room in paris",
            "what is my account balance",
            "transfer 100 dollars to savings",
            "turn on the living room lights",
            "set an alarm for 7 am"
        ],
        "intent":[
            "restaurant_reservation","book_hotel",
            "balance","transfer",
            "smart_home","alarm"
        ]
    })
    val = train.copy()

    intent_to_super, super_to_intents, artifacts = discover_superintents(train, k_range=(2,3))
    zs = fit_superintent_zeroshot(train, intent_to_super, artifacts, exemplars_per_intent=2)
    tau = calibrate_threshold(zs, val)
    rep = evaluate_superintent(zs, val, tau)
    save_json({"tau":tau, "macro_f1":rep["macro_f1"]}, report_path("smoke_e2e.json"))
    print("✅ E2E smoke OK | macro_f1 =", round(rep["macro_f1"],3))
else:
    print("↩️  E2E smoke skipped (set SMOKE_E2E=1 to run it).")
PY


   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 17.4 MB/s eta 0:00:00
✅ package path: /content/intentzero2few-repo/src/intentzero2few/__init__.py
✅ submodules: ['augmentation', 'baselines', 'core', 'dataio', 'discovery', 'eda', 'evaluate', 'fewshot', 'hf_trainers', 'labeling', 'pollution', 'threshold', 'utils_errors', 'utils_io', 'utils_logging', 'viz', 'zeroshot']


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ipython 7.34.0 requires jedi>=0.16, which is not installed.
Traceback (most recent call last):
  File "<stdin>", line 10, in <module>
ImportError: cannot import name 'make_augmented_df' from 'intentzero2few' (/content/intentzero2few-repo/src/intentzero2few/__init__.py)


CalledProcessError: Command 'b'# 4) Install & Extended Smoke Test \xe2\x80\x94 RUN_ID-aware, idempotent\nset -euo pipefail\nREPO_DIR="${REPO_DIR:-/content/intentzero2few-repo}"\ncd "$REPO_DIR"\n\n# kurulum\npip -q install -U pip setuptools wheel\npip -q install -e .\n\npython - << \'PY\'\nimport os, pkgutil\nimport pandas as pd\nimport matplotlib.pyplot as plt\n\nimport intentzero2few as m\nprint("\xe2\x9c\x85 package path:", m.__file__)\nprint("\xe2\x9c\x85 submodules:", sorted([mod.name for mod in pkgutil.iter_modules(m.__path__)]))\n\n# Public API (tam liste)\nfrom intentzero2few import (\n    # core & io\n    set_all_seeds, SEED, TEXT_COL, LABEL_COL, load_intents,\n    # eda\n    quick_eda, plot_top_intents, sample_by_intent, show_split,\n    # few-shot & augmentation & pollution\n    make_k_shot, make_augmented_df, augment_text,\n    generate_fallback_negatives_en, make_polluted_test, make_polluted_test_debug,\n    # labeling\n    fit_label_encoder, encode_in_scope_labels, sanity_check_labels,\n    # baselines\n    run_majority, run_tfidf_lr, run_bert_linear,\n    split_in_scope, eval_in_scope, select_threshold_msp, oos_metrics,\n    TfidfLR, TfidfLinearSVM, BertLinear, MajorityClassifier,\n    # discovery/zero\xe2\x86\x92few\n    build_intent_descriptions, discover_superintents,\n    fit_superintent_zeroshot, calibrate_threshold, evaluate_superintent,\n    # logging & path utils\n    setup_logger, get_env_paths, run_path, report_path,\n    save_json, save_csv, save_figure, copy_to_report\n)\nprint("\xe2\x9c\x85 full API imports OK")\n\n# RUN_ID yollar\xc4\xb1 & logger\np = get_env_paths()\nlogger, log_path = setup_logger()\nprint("RUN_DIR:", p["RUN_DIR"])\nprint("REPORT_DIR:", p["REPORT_DIR"])\nprint("LOG:", log_path)\n\n# I/O smoke: runs/<RUN_ID> ve reports/<RUN_ID> alt\xc4\xb1na ufak artefaktlar\nsave_json({"smoke":"ok","run_id":p["RUN_ID"]}, run_path("analytics","smoke.json"))\nsmoke_df = pd.DataFrame([{"k":"v","n":1}])\nsave_csv(smoke_df, report_path("smoke.csv"))\n\n# basit bir \xc5\x9fekil (runs/figures \xe2\x86\x92 reports\xe2\x80\x99a kopyala)\nfig = plt.figure()\nax = plt.gca()\nax.plot([0,1,2],[0,1,0])\nax.set_title("smoke")\nfig_path = run_path("figures","smoke.png")\nsave_figure(fig, fig_path)\ncopy_to_report(fig_path)  # reports/<RUN_ID>/smoke.png\nplt.close(fig)\nprint("\xe2\x9c\x85 I/O smoke OK")\n\n# Opsiyonel mini E2E (k\xc3\xbc\xc3\xa7\xc3\xbck sentetik veri) \xe2\x80\x94 model indirme gerekebilir\nDO_E2E = os.environ.get("SMOKE_E2E","0") == "1"\nif DO_E2E:\n    set_all_seeds(SEED)\n    train = pd.DataFrame({\n        "text":[\n            "book a table for two at an italian restaurant",\n            "reserve a hotel room in paris",\n            "what is my account balance",\n            "transfer 100 dollars to savings",\n            "turn on the living room lights",\n            "set an alarm for 7 am"\n        ],\n        "intent":[\n            "restaurant_reservation","book_hotel",\n            "balance","transfer",\n            "smart_home","alarm"\n        ]\n    })\n    val = train.copy()\n\n    intent_to_super, super_to_intents, artifacts = discover_superintents(train, k_range=(2,3))\n    zs = fit_superintent_zeroshot(train, intent_to_super, artifacts, exemplars_per_intent=2)\n    tau = calibrate_threshold(zs, val)\n    rep = evaluate_superintent(zs, val, tau)\n    save_json({"tau":tau, "macro_f1":rep["macro_f1"]}, report_path("smoke_e2e.json"))\n    print("\xe2\x9c\x85 E2E smoke OK | macro_f1 =", round(rep["macro_f1"],3))\nelse:\n    print("\xe2\x86\xa9\xef\xb8\x8f  E2E smoke skipped (set SMOKE_E2E=1 to run it).")\nPY\n'' returned non-zero exit status 1.

In [None]:
import os
os.environ["SMOKE_E2E"] = "1"


# 5.) Data acquisition + CLINC export + Quick EDA + WordCloud
EN (what/why): Downloads DeepPavlov/clinc_oos:plus, separates OOS, writes data/clinc150.json. Saves basic EDA counts and a wordcloud. This answers dataset description & EDA needs and prepares the canonical JSON for later steps.
TR (ne/niye): CLINC_OOS indirir, OOS’u ayırır, data/clinc150.json yazar. Temel EDA metrikleri ve wordcloud üretir. Veri kümesi tanımı & EDA gereksinimini karşılar; sonraki adımlar için referans JSON’u üretir.
RQ bağlantısı: RQ-Data: veri boyutu/dağılım; rapor: runs/.../analytics/split_stats.csv, reports/.../wordcloud_train.png.

In [10]:
# 5) Download CLINC_OOS, export CLINC-style JSON, quick EDA, wordcloud + error CSV
import os, json, logging, traceback, importlib.util, subprocess, sys
import pandas as pd
from datasets import load_dataset

# ---- safer imports (module-level) + robust fallback ----
try:
    from intentzero2few.utils_io import (
        get_env_paths, run_path, report_path, save_json, save_csv, copy_to_report
    )
    from intentzero2few.utils_logging import setup_logger
except ImportError as e:
    # Fallback: env'den yolları kur, I/O yardımcılarını inline tanımla (geçici)
    REPO_DIR = os.environ.get("REPO_DIR", "/content/intentzero2few-repo")
    RUN_ID = os.environ.get("RUN_ID")
    RUN_DIR = os.environ.get("RUN_DIR") or (os.path.join(REPO_DIR, "runs", RUN_ID) if RUN_ID else os.path.join(REPO_DIR, "runs", "adhoc"))
    REPORT_DIR = os.environ.get("REPORT_DIR") or (os.path.join(REPO_DIR, "reports", RUN_ID) if RUN_ID else os.path.join(REPO_DIR, "reports", "adhoc"))
    for _sub in ("analytics","logs","figures","artifacts"):
        os.makedirs(os.path.join(RUN_DIR, _sub), exist_ok=True)
    os.makedirs(REPORT_DIR, exist_ok=True)

    def get_env_paths():
        return {"REPO_DIR": REPO_DIR, "RUN_ID": RUN_ID, "RUN_DIR": RUN_DIR, "REPORT_DIR": REPORT_DIR}

    def run_path(subdir: str, filename: str) -> str:
        base = os.path.join(RUN_DIR, subdir); os.makedirs(base, exist_ok=True)
        return os.path.join(base, filename)

    def report_path(filename: str) -> str:
        os.makedirs(REPORT_DIR, exist_ok=True)
        return os.path.join(REPORT_DIR, filename)

    def save_json(obj, path: str, ensure_ascii: bool = False, indent: int = 2):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(obj, f, ensure_ascii=ensure_ascii, indent=indent)

    def save_csv(df: pd.DataFrame, path: str, index: bool = False):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        df.to_csv(path, index=index)

    def copy_to_report(src_path: str, dst_name: str | None = None) -> str:
        import shutil
        dst = os.path.join(REPORT_DIR, dst_name or os.path.basename(src_path))
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src_path, dst)
        return dst

    def setup_logger():
        log_dir = os.path.join(RUN_DIR, "logs"); os.makedirs(log_dir, exist_ok=True)
        path = os.path.join(log_dir, "intentzero2few-fallback.log")
        logger = logging.getLogger("intentzero2few"); logger.setLevel(logging.INFO)
        if not logger.handlers:
            fh = logging.FileHandler(path, encoding="utf-8")
            ch = logging.StreamHandler()
            fmt = logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")
            fh.setFormatter(fmt); ch.setFormatter(fmt)
            logger.addHandler(fh); logger.addHandler(ch)
        logger.warning("Fallback I/O+logger initialized due to ImportError: %s", e)
        return logger, path

# EDA ve (opsiyonel) viz modülleri
from intentzero2few import eda as _eda
try:
    from intentzero2few import viz as _viz
except Exception:
    _viz = None

p = get_env_paths()
logger, log_path = setup_logger()
logger.info("Step 5 start: data acquisition & EDA")

def log_error_csv(name:str, err:Exception, context:dict=None):
    """Append an error record to runs/<RUN_ID>/analytics/errors_<name>.csv"""
    rec = {
        "phase": name,
        "error": f"{type(err).__name__}: {err}",
        "trace": traceback.format_exc().strip()
    }
    if context:
        rec.update({f"ctx_{k}": v for k,v in context.items()})
    path = run_path("analytics", f"errors_{name}.csv")
    df = pd.DataFrame([rec])
    header = not os.path.exists(path)
    df.to_csv(path, mode="a", index=False, header=header)
    logger.error("Error in %s: %s", name, err)

try:
    # 5.1 Load dataset
    ds = load_dataset("DeepPavlov/clinc_oos", "plus")

    def split_to_frames(dset):
        df = dset.to_pandas()
        df = df.rename(columns={"label_text":"intent"})[["text","intent"]]
        in_scope = df[df["intent"].str.lower() != "oos"].reset_index(drop=True)
        oos      = df[df["intent"].str.lower() == "oos"][["text"]].copy()
        oos["intent"] = "OOS"
        return in_scope, oos

    train_df, _         = split_to_frames(ds["train"])
    val_df,   oos_val   = split_to_frames(ds["validation"])
    test_df,  oos_test  = split_to_frames(ds["test"])

    # 5.2 Save CLINC-style JSON
    data_json = os.path.join(p["REPO_DIR"], "data", "clinc150.json")
    os.makedirs(os.path.dirname(data_json), exist_ok=True)
    save_json({
        "train":   train_df[["text","intent"]].values.tolist(),
        "val":     val_df[["text","intent"]].values.tolist(),
        "test":    test_df[["text","intent"]].values.tolist(),
        "oos_val":  oos_val[["text","intent"]].values.tolist(),
        "oos_test": oos_test[["text","intent"]].values.tolist(),
    }, data_json)
    logger.info("Saved CLINC JSON → %s", data_json)

    # 5.3 Quick EDA → analytics
    def split_stats(df, name):
        return {
            "split": name,
            "rows": int(len(df)),
            "unique_intents": int(df["intent"].nunique()),
            "avg_words": float(df["text"].astype(str).str.split().map(len).mean())
        }

    stats = [split_stats(train_df,"train"),
             split_stats(val_df,"val"),
             split_stats(test_df,"test"),
             {"split":"oos_val","rows":int(len(oos_val)),"unique_intents":1,
              "avg_words":float(oos_val["text"].astype(str).str.split().map(len).mean())},
             {"split":"oos_test","rows":int(len(oos_test)),"unique_intents":1,
              "avg_words":float(oos_test["text"].astype(str).str.split().map(len).mean())}]
    save_csv(pd.DataFrame(stats), run_path("analytics","split_stats.csv"))
    cnt_df = train_df["intent"].value_counts().rename_axis("intent").reset_index(name="count")
    save_csv(cnt_df, run_path("analytics","train_intent_counts.csv"))

    # 5.4 Wordcloud (train)
    try:
        if _viz is None:
            # lazy install
            if importlib.util.find_spec("wordcloud") is None:
                subprocess.check_call([sys.executable,"-m","pip","install","-q","wordcloud"])
            from intentzero2few import viz as _viz
        wc_path = run_path("figures","wordcloud_train.png")
        _viz.save_wordcloud_from_df(train_df, "text", wc_path)
        if os.path.exists(wc_path):
            copy_to_report(wc_path, "wordcloud_train.png")
            logger.info("Wordcloud saved & copied to report")
    except Exception as e:
        log_error_csv("wordcloud", e)

    # 5.5 Console preview (few rows)
    print("\nSAMPLE train rows:", train_df.head(3).to_dict("records"))
    print("SAMPLE val rows:", val_df.head(3).to_dict("records"))
    print("SAMPLE test rows:", test_df.head(3).to_dict("records"))
    print("SAMPLE oos_val rows:", oos_val.head(3).to_dict("records"))
    print("SAMPLE oos_test rows:", oos_test.head(3).to_dict("records"))

    _ = _eda.quick_eda(train_df, "CLINC TRAIN")
    print("✅ Step 5 done. Log:", log_path)

except Exception as e:
    log_error_csv("step5_data_eda", e)
    raise


2025-09-21 12:40:53,607 | INFO | intentzero2few | Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-124053.log
INFO:intentzero2few:Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-124053.log
2025-09-21 12:40:53,614 | INFO | intentzero2few | Step 5 start: data acquisition & EDA
INFO:intentzero2few:Step 5 start: data acquisition & EDA
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md: 0.00B [00:00, ?B/s]

plus/train-00000-of-00001.parquet:   0%|          | 0.00/308k [00:00<?, ?B/s]

plus/validation-00000-of-00001.parquet:   0%|          | 0.00/74.5k [00:00<?, ?B/s]

plus/test-00000-of-00001.parquet:   0%|          | 0.00/133k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/15250 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/3100 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/5500 [00:00<?, ? examples/s]

2025-09-21 12:40:59,671 | INFO | intentzero2few | Saved CLINC JSON → /content/intentzero2few-repo/data/clinc150.json
INFO:intentzero2few:Saved CLINC JSON → /content/intentzero2few-repo/data/clinc150.json
2025-09-21 12:40:59,785 | ERROR | intentzero2few | Error in wordcloud: 'NoneType' object has no attribute 'save_wordcloud_from_df'
ERROR:intentzero2few:Error in wordcloud: 'NoneType' object has no attribute 'save_wordcloud_from_df'



SAMPLE train rows: [{'text': 'what expression would i use to say i love you if i were an italian', 'intent': 'translate'}, {'text': "can you tell me how to say 'i do not speak much spanish', in spanish", 'intent': 'translate'}, {'text': "what is the equivalent of, 'life is good' in french", 'intent': 'translate'}]
SAMPLE val rows: [{'text': 'in spanish, meet me tomorrow is said how', 'intent': 'translate'}, {'text': 'in french, how do i say, see you later', 'intent': 'translate'}, {'text': 'how do you say hello in japanese', 'intent': 'translate'}]
SAMPLE test rows: [{'text': 'how would you say fly in italian', 'intent': 'translate'}, {'text': "what's the spanish word for pasta", 'intent': 'translate'}, {'text': 'how would they say butter in zambia', 'intent': 'translate'}]
SAMPLE oos_test rows: [{'text': 'how much has the dow changed today', 'intent': 'OOS'}, {'text': 'how many prime numbers are there between 0 and 100', 'intent': 'OOS'}, {'text': 'can you tell me how to solve simple

# 6.A.) Polluted validation/test (OOS karışımlı)
EN: Mixes OOS into val and test (default 30%) to stress the OOS detector; saves CSV + manifest.
TR: val ve teste OOS karıştırarak (%30) OOS tespitini zorlar; CSV + manifest kaydeder.
RQ: RQ-Robustness (OOS tespiti); rapor: analytics/{val,test}_polluted.csv.

In [11]:
# ✅ 6A) POLLUTED validation/test + error CSV + sample prints (robust import + logs)
import os, sys, importlib, subprocess, traceback
import pandas as pd

# --- Safe editable install & import refresh (prelude) ---
REPO_DIR = "/content/intentzero2few-repo"
SRC_DIR  = os.path.join(REPO_DIR, "src")
if SRC_DIR not in sys.path:
    sys.path.insert(0, SRC_DIR)

# reinstall editable (quiet), then import/reload
try:
    subprocess.run([sys.executable, "-m", "pip", "install", "-e", REPO_DIR, "-q"], check=False)
except Exception:
    pass

import intentzero2few as m
importlib.reload(m)

# primary import path (from __init__)
try:
    from intentzero2few import (
        load_intents, make_polluted_test,
        get_env_paths, save_csv, save_json, run_path, setup_logger
    )
except Exception:
    # fallback to submodules if __init__ exports are stale
    from intentzero2few.dataio import load_intents
    from intentzero2few.pollution import make_polluted_test
    from intentzero2few.utils_io import get_env_paths, save_csv, save_json, run_path
    from intentzero2few.utils_logging import setup_logger

# --- logger ---
p = get_env_paths()
logger, _ = setup_logger()

def log_error_csv(name, err, ctx=None):
    """Append errors to runs/<RUN_ID>/analytics/errors_*.csv and log as ERROR."""
    rec = {
        "phase": name,
        "error": f"{type(err).__name__}: {err}",
        "trace": traceback.format_exc().strip(),
    }
    if ctx:
        rec.update({f"ctx_{k}": v for k, v in ctx.items()})
    path = run_path("analytics", f"errors_{name}.csv")
    pd.DataFrame([rec]).to_csv(path, mode="a", index=False, header=not os.path.exists(path))
    logger.error("Error in %s: %s", name, err)

try:
    # --- load splits ---
    data_json = os.path.join(p["REPO_DIR"], "data", "clinc150.json")
    splits = load_intents(data_json)
    train_df, val_df, test_df = splits["train"], splits["val"], splits["test"]
    oos_val  = splits.get("oos_val",  pd.DataFrame(columns=["text","intent"]))
    oos_test = splits.get("oos_test", pd.DataFrame(columns=["text","intent"]))
    logger.info("Loaded splits: train=%d, val=%d, test=%d, oos_val=%d, oos_test=%d",
                len(train_df), len(val_df), len(test_df), len(oos_val), len(oos_test))

    # --- ratios (env overrides ok) ---
    VAL_OOS_RATIO  = float(os.environ.get("VAL_OOS_RATIO",  0.30))
    TEST_OOS_RATIO = float(os.environ.get("TEST_OOS_RATIO", 0.30))
    logger.info("Pollution ratios: VAL=%.2f TEST=%.2f", VAL_OOS_RATIO, TEST_OOS_RATIO)

    # --- build polluted splits ---
    val_polluted,  val_oos_used  = make_polluted_test(val_df,  oos_val,  ratio=VAL_OOS_RATIO,  seed=42)
    test_polluted, test_oos_used = make_polluted_test(test_df, oos_test, ratio=TEST_OOS_RATIO, seed=42)

    # --- save analytics + manifest ---
    save_csv(val_polluted,  run_path("analytics", "val_polluted.csv"))
    save_csv(test_polluted, run_path("analytics", "test_polluted.csv"))
    save_json({
        "val_rows": int(len(val_df)), "test_rows": int(len(test_df)),
        "val_polluted_rows": int(len(val_polluted)),  "test_polluted_rows": int(len(test_polluted)),
        "val_oos_injected": 0 if val_oos_used is None else int(len(val_oos_used)),
        "test_oos_injected": 0 if test_oos_used is None else int(len(test_oos_used)),
        "val_oos_ratio": VAL_OOS_RATIO, "test_oos_ratio": TEST_OOS_RATIO
    }, run_path("analytics", "pollution_manifest.json"))

    # --- tiny samples to console (for inspection) ---
    print("\nSAMPLE val_polluted:",  val_polluted.head(5).to_dict("records"))
    print("SAMPLE test_polluted:", test_polluted.head(5).to_dict("records"))
    print("✅ 6A done → runs/<RUN_ID>/analytics/{val,test}_polluted.csv")

except Exception as e:
    log_error_csv("step6A_pollution", e, ctx={"repo_dir": REPO_DIR})
    raise


2025-09-21 12:41:47,463 | INFO | intentzero2few | Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-124147.log
INFO:intentzero2few:Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-124147.log
2025-09-21 12:41:47,515 | INFO | intentzero2few | Loaded splits: train=15000, val=3000, test=4500, oos_val=100, oos_test=1000
INFO:intentzero2few:Loaded splits: train=15000, val=3000, test=4500, oos_val=100, oos_test=1000
2025-09-21 12:41:47,520 | INFO | intentzero2few | Pollution ratios: VAL=0.30 TEST=0.30
INFO:intentzero2few:Pollution ratios: VAL=0.30 TEST=0.30



SAMPLE val_polluted: [{'text': 'are there any vegan restaurants in my town', 'intent': 'restaurant_suggestion', 'is_oos': 0}, {'text': 'what is the amount of time i can request off in the coming year', 'intent': 'pto_balance', 'is_oos': 0}, {'text': 'can you multiply 45 by 23', 'intent': 'calculator', 'is_oos': 0}, {'text': "i want david to know where i'm at", 'intent': 'share_location', 'is_oos': 0}, {'text': 'my card is messed up from carding my door', 'intent': 'damaged_card', 'is_oos': 0}]
SAMPLE test_polluted: [{'text': 'turn down the volume', 'intent': 'change_volume', 'is_oos': 0}, {'text': 'is it safe for me to go to turkey', 'intent': 'travel_alert', 'is_oos': 0}, {'text': 'what have i spent things on', 'intent': 'transactions', 'is_oos': 0}, {'text': "check if bj's takes reservations", 'intent': 'accept_reservations', 'is_oos': 0}, {'text': 'when can i get a business card printed locally', 'intent': 'OOS', 'is_oos': 1}]
✅ 6A done → runs/<RUN_ID>/analytics/{val,test}_polluted

# 6.B.) Augmented train (≥30K, klasik) + Noisy augmented train
EN: Grows train to target rows via classic augmentation; additionally builds a noisy augmented variant (emoji/slang/typo). Prints examples of both.
TR: Eğitimi klasik augment ile hedef satıra büyütür; ek olarak noisy (emoji/slang/typo) varyantını üretir. İkisinden de örnek satırları yazdırır.
RQ: RQ-Data Efficiency (synthetic data helps?) & RQ-Robustness (noisy training helps?); rapor: analytics/train_augmented*.csv.

In [12]:
# === 6B) Augment train (classic >=30K) + Noisy augment w/ signature-safe & fallback ===
import os, math, pandas as pd, traceback, random, re, inspect, importlib
from intentzero2few import (
    load_intents, get_env_paths, save_csv, save_json, run_path, report_path, setup_logger
)
aug_mod = importlib.import_module("intentzero2few.augmentation")

# Optional WordNet (classic synonyms)
try:
    import nltk
    nltk.download("wordnet", quiet=True); nltk.download("omw-1.4", quiet=True)
except Exception:
    pass

p = get_env_paths()
logger,_ = setup_logger()

def log_error_csv(name, err, ctx=None):
    rec = {"phase":name,"error":f"{type(err).__name__}: {err}","trace":traceback.format_exc().strip()}
    if ctx: rec.update({f"ctx_{k}":v for k,v in ctx.items()})
    path = run_path("analytics", f"errors_{name}.csv")
    pd.DataFrame([rec]).to_csv(path, mode="a", index=False, header=not os.path.exists(path))
    logger.error("Error in %s: %s", name, err)

# ---- Local fallback: emoji / slang / character noise ----
_EMOJIS = ["😂","🔥","✨","👍","💯","🙈","😅","🤔","👉","💥","🤷","🫠","😬","🥲","👏"]
_SLANG = {
    "you": "u", "are":"r", "your":"ur", "for":"4", "to":"2",
    "okay":"ok", "thanks":"thx", "please":"pls", "really":"rlly", "people":"ppl",
}
_WORD_RE = re.compile(r"[A-Za-z]+")

def _inject_emojis(text, rng, p=0.3):
    if rng.random() < p:
        k = 1 if rng.random() < 0.8 else 2
        return text + " " + "".join(rng.choice(_EMOJIS) for _ in range(k))
    return text

def _slangify(text, rng, p=0.3):
    toks = text.split()
    out = []
    for t in toks:
        base = t.lower()
        if _WORD_RE.fullmatch(base) and (base in _SLANG) and (rng.random() < p):
            rep = _SLANG[base]
            # preserve capitalization roughly
            rep = rep.upper() if t.isupper() else (rep.capitalize() if t.istitle() else rep)
            out.append(rep)
        else:
            out.append(t)
    return " ".join(out)

def _char_noise(text, rng, p=0.05):
    if not text: return text
    chars = list(text)
    i = 0
    while i < len(chars):
        if rng.random() < p:
            op = rng.choice(["drop","dup","swap"])
            if op == "drop":
                del chars[i]
                continue
            elif op == "dup":
                chars.insert(i, chars[i])
                i += 2
                continue
            elif op == "swap" and i+1 < len(chars):
                chars[i], chars[i+1] = chars[i+1], chars[i]
                i += 2
                continue
        i += 1
    return "".join(chars)

def make_noisy_df_fallback(df, per_example=1, seed=42, p_emoji=0.3, p_slang=0.3, p_char=0.05):
    rng = random.Random(seed)
    rows = []
    for _, row in df.iterrows():
        rows.append(row)
        for _ in range(per_example):
            t = str(row["text"])
            t = _slangify(t, rng, p=p_slang)
            t = _char_noise(t, rng, p=p_char)
            t = _inject_emojis(t, rng, p=p_emoji)
            rows.append(pd.Series({"text": t, "intent": row["intent"]}))
    return pd.DataFrame(rows).reset_index(drop=True)

def sig_safe_noisy(df, per_example=1, seed=42, p_emoji=0.3, p_slang=0.3, p_char=0.05):
    """
    Try module's make_noisy_df with supported kwargs; if not supported, use local fallback.
    """
    importlib.reload(aug_mod)  # best effort hot reload
    fn = getattr(aug_mod, "make_noisy_df", None)
    if fn is None:
        logger.warning("augmentation.make_noisy_df not found; using local fallback.")
        return make_noisy_df_fallback(df, per_example=per_example, seed=seed,
                                      p_emoji=p_emoji, p_slang=p_slang, p_char=p_char)
    sig = inspect.signature(fn)
    logger.info("make_noisy_df signature (module) = %s", sig)
    # collect only supported kwargs
    base = {}
    if "per_example" in sig.parameters: base["per_example"] = per_example
    if "seed"         in sig.parameters: base["seed"] = seed
    forwarded = {}
    if "p_emoji" in sig.parameters: forwarded["p_emoji"] = p_emoji
    if "p_slang" in sig.parameters: forwarded["p_slang"] = p_slang
    if "p_char"  in sig.parameters: forwarded["p_char"]  = p_char
    logger.info("forwarding kwargs to module: %s", forwarded)
    # if none of the noise kwargs are supported, fall back to local
    if not forwarded and not any(k in sig.parameters for k in ("p_emoji","p_slang","p_char")):
        logger.warning("module make_noisy_df has no noise kwargs; using local fallback.")
        return make_noisy_df_fallback(df, per_example=per_example, seed=seed,
                                      p_emoji=p_emoji, p_slang=p_slang, p_char=p_char)
    try:
        return fn(df, **base, **forwarded)
    except TypeError as e:
        logger.warning("module make_noisy_df call failed (%s); using local fallback.", e)
        return make_noisy_df_fallback(df, per_example=per_example, seed=seed,
                                      p_emoji=p_emoji, p_slang=p_slang, p_char=p_char)

try:
    # --- Load base train
    splits   = load_intents(os.path.join(p["REPO_DIR"], "data", "clinc150.json"))
    train_df = splits["train"]

    # --- Classic augmentation to ~TARGET_ROWS ---
    TARGET_ROWS = int(os.environ.get("AUG_TARGET_ROWS","30000"))
    cur = len(train_df)
    per_ex = 0 if cur >= TARGET_ROWS else max(1, math.ceil(TARGET_ROWS/cur) - 1)
    logger.info("Classic augment: current=%d target=%d → per_example=%d", cur, TARGET_ROWS, per_ex)

    classic_aug = aug_mod.make_augmented_df(train_df, per_example=per_ex, seed=42)
    save_csv(classic_aug, run_path("analytics","train_augmented.csv"))
    save_json(
        {"target_rows":TARGET_ROWS,"original_rows":cur,"per_example":per_ex,"augmented_rows":int(len(classic_aug))},
        run_path("analytics","augmentation_manifest.json")
    )

    # --- Noisy augmentation (emoji/slang/char) with robust fallback ---
    NOISY_PER_EX = int(os.environ.get("NOISY_PER_EX","1"))
    noisy_aug = sig_safe_noisy(train_df, per_example=NOISY_PER_EX, seed=42,
                               p_emoji=0.3, p_slang=0.3, p_char=0.05)

    save_csv(noisy_aug, run_path("analytics","train_augmented_noisy.csv"))
    save_json(
        {"per_example":NOISY_PER_EX, "rows":int(len(noisy_aug)),
         "note":"module-signature-safe; local fallback used if needed"},
        run_path("analytics","noisy_augmentation_manifest.json")
    )

    # --- Tiny samples (for understanding & thesis)
    classic_samp = classic_aug.head(20)
    noisy_samp   = noisy_aug.head(20)
    classic_samp.to_csv(report_path("aug_samples_classic_head20.csv"), index=False)
    noisy_samp.to_csv(report_path("aug_samples_noisy_head20.csv"), index=False)

    print("\nSAMPLE classic-aug rows:", classic_samp.head(5).to_dict("records"))
    print("SAMPLE noisy-aug rows:",   noisy_samp.head(5).to_dict("records"))
    print("✅ 6B done → runs/<RUN_ID>/analytics/train_augmented*.csv and reports/<RUN_ID>/aug_samples_*.csv")

except Exception as e:
    log_error_csv("step6B_augment", e)
    raise


2025-09-21 12:42:40,473 | INFO | intentzero2few | Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-124240.log
INFO:intentzero2few:Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-124240.log
2025-09-21 12:42:40,782 | INFO | intentzero2few | Classic augment: current=15000 target=30000 → per_example=1
INFO:intentzero2few:Classic augment: current=15000 target=30000 → per_example=1
2025-09-21 12:43:49,411 | INFO | intentzero2few | make_noisy_df signature (module) = (df: 'pd.DataFrame', per_example: 'int' = 1, seed: 'int' = 42, p_emoji: 'float' = 0.3, p_slang: 'float' = 0.3, p_char: 'float' = 0.05) -> 'pd.DataFrame'
INFO:intentzero2few:make_noisy_df signature (module) = (df: 'pd.DataFrame', per_example: 'int' = 1, seed: 'int' = 42, p_emoji: 'float' = 0.3, p_slang: 'float' = 0.3, p_char: 'float' = 0.05) -> 'pd.DataFrame'
2025-09-21 12:43:49,416 | INFO | intentzero2few | forwarding kwargs to m


SAMPLE classic-aug rows: [{'text': 'what expression would i use to say i love you if i were an italian', 'intent': 'translate'}, {'text': 'what were use to say i you if i verbal expression an italian', 'intent': 'translate'}, {'text': "can you tell me how to say 'i do not speak much spanish', in spanish", 'intent': 'translate'}, {'text': "can you tell me how lots say 'i do not to spanish', spanish", 'intent': 'translate'}, {'text': "what is the equivalent of, 'life is good' in french", 'intent': 'translate'}]
SAMPLE noisy-aug rows: [{'text': 'what expression would i use to say i love you if i were an italian', 'intent': 'translate'}, {'text': 'what expesSion would i us eto say i love you i i were an italian', 'intent': 'translate'}, {'text': "can you tell me how to say 'i do not speak much spanish', in spanish", 'intent': 'translate'}, {'text': "cna you tell me how to sa y'ii do not speak much spanish', in spanish", 'intent': 'translate'}, {'text': "what is the equivalent of, 'life is

In [13]:
# === PATH DOCTOR v2 (safe import + signature print) ===
import os, sys, importlib, inspect, types
from intentzero2few import get_env_paths, setup_logger

p, (logger, _) = get_env_paths(), setup_logger()

REPO_DIR = p["REPO_DIR"]
SRC_DIR  = os.path.join(REPO_DIR, "src")
PKG_DIR  = os.path.join(SRC_DIR, "intentzero2few")

# 1) ensure clean sys.path: remove potential shadows, put src/ first
for bad in [REPO_DIR, os.path.dirname(REPO_DIR), "/content/intentzero2few"]:
    while bad in sys.path:
        sys.path.remove(bad)
if SRC_DIR not in sys.path:
    sys.path.insert(0, SRC_DIR)

importlib.invalidate_caches()

# 2) import module via import_module (guaranteed to be a module)
aug_mod = importlib.import_module("intentzero2few.augmentation")

# 3) now we can safely reload the **module object**
aug_mod = importlib.reload(aug_mod)

print("📦 augmentation module file:", getattr(aug_mod, "__file__", "<?>"))
print("🧪 make_noisy_df signature:", inspect.signature(getattr(aug_mod, "make_noisy_df")))


2025-09-21 12:44:03,883 | INFO | intentzero2few | Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-124403.log
INFO:intentzero2few:Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-124403.log


📦 augmentation module file: /content/intentzero2few-repo/src/intentzero2few/augmentation.py
🧪 make_noisy_df signature: (df: 'pd.DataFrame', per_example: 'int' = 1, seed: 'int' = 42, p_emoji: 'float' = 0.3, p_slang: 'float' = 0.3, p_char: 'float' = 0.05) -> 'pd.DataFrame'


# 6.C.)Noisy test (in-scope)
EN: Creates a noisy version of test (in-scope only) for robustness evaluation (clean vs noisy).
TR: Robustluk karşılaştırması için test setinin gürültülü bir sürümünü üretir.
RQ: RQ-Robustness (noise sensitivity); rapor: analytics/test_noisy_in_scope.csv.

In [14]:
# 6C) Build NOISY test (in-scope) + error CSV + samples
import os, pandas as pd, traceback
from intentzero2few import load_intents, get_env_paths, run_path, save_csv, save_json
from intentzero2few.augmentation import make_noisy_df

p = get_env_paths()

def log_error_csv(name, err, ctx=None):
    rec = {"phase":name,"error":f"{type(err).__name__}: {err}","trace":traceback.format_exc().strip()}
    if ctx: rec.update({f"ctx_{k}":v for k,v in ctx.items()})
    path = run_path("analytics", f"errors_{name}.csv")
    pd.DataFrame([rec]).to_csv(path, mode="a", index=False, header=not os.path.exists(path))

try:
    splits   = load_intents(os.path.join(p["REPO_DIR"],"data","clinc150.json"))
    test_df  = splits["test"]

    NOISY_TEST_PER_EX = int(os.environ.get("NOISY_TEST_PER_EX","1"))
    test_noisy = make_noisy_df(test_df, per_example=NOISY_TEST_PER_EX, seed=7,
                               p_emoji=0.35, p_slang=0.35, p_char=0.07)

    save_csv(test_noisy, run_path("analytics","test_noisy_in_scope.csv"))
    save_json({"per_example":NOISY_TEST_PER_EX,"rows":len(test_noisy)},
              run_path("analytics","noisy_test_manifest.json"))

    print("\nSAMPLE test_noisy rows:", test_noisy.head(5).to_dict("records"))
    print("✅ 6C done → analytics/test_noisy_in_scope.csv")
except Exception as e:
    log_error_csv("step6C_noisytest", e)
    raise



SAMPLE test_noisy rows: [{'text': 'how would you say fly in italian', 'intent': 'translate'}, {'text': 'how would u say fly in italia 👌🔥🥲', 'intent': 'translate'}, {'text': "what's the spanish word for pasta", 'intent': 'translate'}, {'text': "what's the spanish word for pasta", 'intent': 'translate'}, {'text': 'how would they say butter in zambia', 'intent': 'translate'}]
✅ 6C done → analytics/test_noisy_in_scope.csv


# 7.) Super-intent discovery → Zero-shot → τ calibration → Evaluation (+ heatmaps)
EN: Discovers super-intents (K via silhouette), builds centroid-based zero-shot, calibrates τ on val_polluted (macro-F1 sweep), evaluates on clean/noisy/polluted tests, saves confusion heatmaps.
TR: Super-intentleri (K silhouette) keşfeder; centroid tabanlı zero-shot kurar; τ’yu val_polluted üzerinde kalibre eder; clean/noisy/polluted testlerde değerlendirir; heatmap’leri kaydeder.
RQ: RQ-Structure/Interpretability (super-intents), RQ-OOS/Thresholding, RQ-Robustness; rapor: reports/zs_summary.csv, reports/zs_confmat_*.png, reports/intent_descriptions.csv.

In [15]:
# === 7) Discovery → Zero-shot → τ → Eval (+ heatmaps) + error CSV + samples (robust) — UPDATED ===
import os, json, traceback, importlib, time
import numpy as np, pandas as pd
from intentzero2few import (
    get_env_paths, run_path, report_path, copy_to_report,
    load_intents, save_csv, save_json, setup_logger
)
from intentzero2few import discover_superintents, fit_superintent_zeroshot
from intentzero2few import calibrate_threshold, evaluate_superintent

# --- robust viz import (module or local fallback) ---
try:
    import intentzero2few.viz as viz
except Exception:
    viz = None

def save_confusion_heatmap_local(cm_norm, labels, out_path, title=None):
    import os, numpy as np, matplotlib.pyplot as plt, seaborn as sns
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    cm = np.asarray(cm_norm, dtype=float)
    fig, ax = plt.subplots(figsize=(max(6, 0.5*len(labels)+2), max(5, 0.5*len(labels)+2)))
    sns.heatmap(cm, ax=ax, vmin=0.0, vmax=1.0, cmap="Blues",
                xticklabels=labels, yticklabels=labels,
                annot=True, fmt=".2f", cbar=True, square=True)
    ax.set_xlabel("Predicted"); ax.set_ylabel("True")
    if title: ax.set_title(title)
    fig.tight_layout(); fig.savefig(out_path, dpi=300, bbox_inches="tight"); plt.close(fig)

p = get_env_paths(); logger,_ = setup_logger()

def log_error_csv(name, err, ctx=None):
    rec = {"phase":name,"error":f"{type(err).__name__}: {err}","trace":traceback.format_exc().strip()}
    if ctx: rec.update({f"ctx_{k}":v for k,v in (ctx or {}).items()})
    path = run_path("analytics", f"errors_{name}.csv")
    pd.DataFrame([rec]).to_csv(path, mode="a", index=False, header=not os.path.exists(path))
    logger.error("Error in %s: %s", name, err)

# --- helper: CSV exports from eval report ---
def _export_eval_csvs(rep: dict, split_name: str):
    """
    rep: evaluate_superintent çıktısı beklenir.
    Yazar:
      - analytics/confusion_{split_name}_raw.csv           (varsa rep['confusion_matrix'])
      - analytics/confusion_{split_name}_norm_all.csv      (rep['confusion_matrix_normalized'])
      - analytics/cls_report_{split_name}.csv              (varsa rep['classification_report'])
    """
    try:
        labels = rep.get("labels", None)
        # raw confusion
        if "confusion_matrix" in rep and labels is not None:
            df_raw = pd.DataFrame(np.asarray(rep["confusion_matrix"], dtype=float),
                                  index=[f"true::{l}" for l in labels],
                                  columns=[f"pred::{l}" for l in labels])
            save_csv(df_raw, run_path("analytics", f"confusion_{split_name}_raw.csv"))
        # normalized confusion
        if "confusion_matrix_normalized" in rep and labels is not None:
            df_norm = pd.DataFrame(np.asarray(rep["confusion_matrix_normalized"], dtype=float),
                                   index=[f"true::{l}" for l in labels],
                                   columns=[f"pred::{l}" for l in labels])
            save_csv(df_norm, run_path("analytics", f"confusion_{split_name}_norm_all.csv"))
        # classification report
        if "classification_report" in rep and isinstance(rep["classification_report"], dict):
            df_rep = pd.DataFrame(rep["classification_report"]).T
            save_csv(df_rep, run_path("analytics", f"cls_report_{split_name}.csv"))
    except Exception as e:
        log_error_csv("export_eval_csvs", e, {"split": split_name})

try:
    # 7.0 Load splits + derived splits
    splits = load_intents(os.path.join(p["REPO_DIR"], "data", "clinc150.json"))
    train_df, val_df, test_df = splits["train"], splits["val"], splits["test"]
    val_polluted  = pd.read_csv(run_path("analytics","val_polluted.csv"))
    test_polluted = pd.read_csv(run_path("analytics","test_polluted.csv"))
    test_noisy    = pd.read_csv(run_path("analytics","test_noisy_in_scope.csv"))

    # 7.1 Discover super-intents
    intent_to_super, super_to_intents, artifacts = discover_superintents(train_df, k_range=(8,12))
    save_json({"k_best":artifacts["k_best"], "k_scores":artifacts["k_scores"]},
              run_path("analytics","discovery_k.json"))

    # mapping exports
    mapping_df = pd.DataFrame({"intent": list(intent_to_super.keys()),
                               "super":  [intent_to_super[i] for i in intent_to_super]})
    save_csv(mapping_df, run_path("analytics","super_mapping.csv"))

    # descriptions
    desc_df = pd.DataFrame({"intent": list(artifacts["label_desc"].keys()),
                            "description": [artifacts["label_desc"][k] for k in artifacts["label_desc"]]})
    # analytics'e yaz ve rapora da kopya bırak
    save_csv(desc_df, run_path("analytics","intent_descriptions.csv"))
    copy_to_report(run_path("analytics","intent_descriptions.csv"), "intent_descriptions.csv")

    # super_to_intents.json (rapora)
    save_json(super_to_intents, report_path("super_to_intents.json"))

    print("\nSAMPLE super mapping:", mapping_df.head(10).to_dict("records"))

    # 7.2 Fit zero-shot model (centroids)
    zs = fit_superintent_zeroshot(train_df, intent_to_super, artifacts,
                                  exemplars_per_intent=5, description_weight=1.0)

    # 7.3 τ calibration
    # Global τ (polluted val) — ana yol
    tau = calibrate_threshold(zs, val_polluted)
    save_json({"tau": float(tau), "source": "val_polluted"}, run_path("analytics","threshold_zero_shot.json"))
    print(f"Calibrated global τ (polluted): {tau:.3f}")

    # Opsiyonel: clean/polluted ayrı τ dosyaları (varsa hesaplanır)
    try:
        tau_clean = calibrate_threshold(zs, val_df)
        save_json({"tau": float(tau_clean), "source": "val_clean"}, run_path("analytics","threshold_zero_shot_clean.json"))
        print(f"Calibrated clean τ: {tau_clean:.3f}")
    except Exception as _e:
        pass
    try:
        tau_poll = calibrate_threshold(zs, val_polluted)
        save_json({"tau": float(tau_poll), "source": "val_polluted"}, run_path("analytics","threshold_zero_shot_polluted.json"))
    except Exception as _e:
        pass

    # 7.4 Evaluate on clean/noisy/polluted (test)
    rep_clean = evaluate_superintent(zs, test_df,       tau)
    rep_noisy = evaluate_superintent(zs, test_noisy,    tau)
    rep_poll  = evaluate_superintent(zs, test_polluted, tau)

    # 7.5 Dump evals + heatmaps + CSV exports
    def dump_eval(name, rep):
        # metrics json
        save_json(rep, run_path("analytics", f"zs_eval_{name}.json"))
        # confusion CSV'ler & class report CSV
        _export_eval_csvs(rep, name)
        # heatmap
        out_png = run_path("figures", f"zs_confmat_{name}.png")
        saver = (viz.save_confusion_heatmap if (viz and hasattr(viz, "save_confusion_heatmap"))
                 else save_confusion_heatmap_local)
        cmn = rep.get("confusion_matrix_normalized", None)
        labels = rep.get("labels", None)
        if cmn is not None and labels is not None:
            saver(cmn, labels, out_png, title=f"ZS Confusion ({name})")
            if os.path.exists(out_png):
                copy_to_report(out_png, f"zs_confmat_{name}.png")

    for nm,rep in [("clean",rep_clean), ("noisy",rep_noisy), ("polluted",rep_poll)]:
        dump_eval(nm, rep)

    # 7.6 Compact thesis table (ZS summary) — analytics + reports
    zs_summary = pd.DataFrame([
        {"model":"ZeroShot-LICL","variant":"ZS","split":"test_clean",    "accuracy":rep_clean.get("accuracy"), "macro_f1":rep_clean.get("macro_f1")},
        {"model":"ZeroShot-LICL","variant":"ZS","split":"test_noisy",    "accuracy":rep_noisy.get("accuracy"), "macro_f1":rep_noisy.get("macro_f1")},
        {"model":"ZeroShot-LICL","variant":"ZS","split":"test_polluted", "accuracy":rep_poll.get("accuracy"),  "macro_f1":rep_poll.get("macro_f1")},
    ])
    save_csv(zs_summary, run_path("analytics","zs_summary.csv"))
    copy_to_report(run_path("analytics","zs_summary.csv"), "zs_summary.csv")

    # 7.7 Toplu JSON (tek dosya)
    save_json({"zero_shot_results": [
        {"split":"test_clean",    **{k: rep_clean.get(k) for k in ["accuracy","macro_f1"]}},
        {"split":"test_noisy",    **{k: rep_noisy.get(k) for k in ["accuracy","macro_f1"]}},
        {"split":"test_polluted", **{k: rep_poll.get(k)  for k in ["accuracy","macro_f1"]}},
    ]}, run_path("analytics","zs_eval_all.json"))

    print("ZS summary:", zs_summary.to_dict("records"))
    print("✅ 7 done → ZS evals saved; confusion CSV + class reports (analytics/); heatmaps copied to reports/; super_to_intents.json exported.")

except Exception as e:
    log_error_csv("step7_zs", e)
    raise


2025-09-21 12:44:25,690 | INFO | intentzero2few | Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-124425.log
INFO:intentzero2few:Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-124425.log


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

2025-09-21 12:44:58,834 | INFO | intentzero2few | discovery: K=8 silhouette=0.0467
INFO:intentzero2few:discovery: K=8 silhouette=0.0467
2025-09-21 12:44:58,912 | INFO | intentzero2few | discovery: K=9 silhouette=0.0343
INFO:intentzero2few:discovery: K=9 silhouette=0.0343
2025-09-21 12:44:58,996 | INFO | intentzero2few | discovery: K=10 silhouette=0.0432
INFO:intentzero2few:discovery: K=10 silhouette=0.0432
2025-09-21 12:44:59,094 | INFO | intentzero2few | discovery: K=11 silhouette=0.0368
INFO:intentzero2few:discovery: K=11 silhouette=0.0368
2025-09-21 12:44:59,199 | INFO | intentzero2few | discovery: K=12 silhouette=0.0320
INFO:intentzero2few:discovery: K=12 silhouette=0.0320



SAMPLE super mapping: [{'intent': 'accept_reservations', 'super': 'S2'}, {'intent': 'account_blocked', 'super': 'S6'}, {'intent': 'alarm', 'super': 'S4'}, {'intent': 'application_status', 'super': 'S1'}, {'intent': 'apr', 'super': 'S1'}, {'intent': 'are_you_a_bot', 'super': 'S3'}, {'intent': 'balance', 'super': 'S6'}, {'intent': 'bill_balance', 'super': 'S6'}, {'intent': 'bill_due', 'super': 'S6'}, {'intent': 'book_flight', 'super': 'S4'}]


2025-09-21 12:45:01,064 | INFO | intentzero2few | zeroshot: centroid S0 built with 6 desc + 30 exemplars
INFO:intentzero2few:zeroshot: centroid S0 built with 6 desc + 30 exemplars
2025-09-21 12:45:02,402 | INFO | intentzero2few | zeroshot: centroid S1 built with 18 desc + 90 exemplars
INFO:intentzero2few:zeroshot: centroid S1 built with 18 desc + 90 exemplars
2025-09-21 12:45:03,498 | INFO | intentzero2few | zeroshot: centroid S2 built with 14 desc + 70 exemplars
INFO:intentzero2few:zeroshot: centroid S2 built with 14 desc + 70 exemplars
2025-09-21 12:45:04,395 | INFO | intentzero2few | zeroshot: centroid S3 built with 22 desc + 110 exemplars
INFO:intentzero2few:zeroshot: centroid S3 built with 22 desc + 110 exemplars
2025-09-21 12:45:05,824 | INFO | intentzero2few | zeroshot: centroid S4 built with 32 desc + 160 exemplars
INFO:intentzero2few:zeroshot: centroid S4 built with 32 desc + 160 exemplars
2025-09-21 12:45:06,517 | INFO | intentzero2few | zeroshot: centroid S5 built with 18 de

Calibrated global τ (polluted): 0.200
Calibrated clean τ: 0.200
ZS summary: [{'model': 'ZeroShot-LICL', 'variant': 'ZS', 'split': 'test_clean', 'accuracy': 0.7891111111111111, 'macro_f1': 0.7231438350876834}, {'model': 'ZeroShot-LICL', 'variant': 'ZS', 'split': 'test_noisy', 'accuracy': 0.7431111111111111, 'macro_f1': 0.6808314882072088}, {'model': 'ZeroShot-LICL', 'variant': 'ZS', 'split': 'test_polluted', 'accuracy': 0.7314545454545455, 'macro_f1': 0.7425027892413889}]
✅ 7 done → ZS evals saved; confusion CSV + class reports (analytics/); heatmaps copied to reports/; super_to_intents.json exported.


# 8.) Baselines (TF-IDF+LR, BERT-Linear) + τ + Robustness summary

EN: Encodes labels (OOS→-1), trains baselines, calibrates τ on val_polluted, reports clean/polluted (and retains OOS AUROC/FPR@TPR95). Uses classic-augmented train if available. Errors per-model are logged but pipeline continues.
TR: Etiketleri kodlar (OOS→-1), baseline’ları eğitir, τ’yı val_polluted’da kalibre eder, clean/polluted sonuçları raporlar (OOS AUROC/FPR@TPR95 dahil). Classic-aug varsa onu kullanır. Model bazında hata olsa bile diğerleri çalışır.
RQ: RQ-Benchmark (hangi yöntem daha iyi?), RQ-OOS. Raporlar: analytics/baseline_summary.csv, reports/baseline_robustness.csv.

In [16]:
# 8) Baselines + τ + Robustness summary — SELF-CONTAINED, TIDY EXPORT (syntax fixed)
import os, importlib, traceback
import numpy as np, pandas as pd
from sklearn.metrics import accuracy_score, f1_score

from intentzero2few import (
    get_env_paths, run_path, report_path, save_csv, save_json,
    load_intents, setup_logger
)
from intentzero2few import fit_label_encoder, encode_in_scope_labels, sanity_check_labels

p = get_env_paths(); logger,_ = setup_logger()

def log_error_csv(name, err, ctx=None):
    rec = {"phase":name,"error":f"{type(err).__name__}: {err}","trace":traceback.format_exc().strip()}
    if ctx: rec.update({f"ctx_{k}":v for k,v in (ctx or {}).items()})
    path = run_path("analytics", f"errors_{name}.csv")
    pd.DataFrame([rec]).to_csv(path, mode="a", index=False, header=not os.path.exists(path))
    logger.error("Error in %s: %s", name, err)

# --- 8.0 Load splits (+ derived)
splits = load_intents(os.path.join(p["REPO_DIR"],"data","clinc150.json"))
train_df, val_df, test_df = splits["train"], splits["val"], splits["test"]
val_polluted  = pd.read_csv(run_path("analytics","val_polluted.csv"))
test_polluted = pd.read_csv(run_path("analytics","test_polluted.csv"))
# optional noisy split
test_noisy_path = run_path("analytics","test_noisy_in_scope.csv")
test_noisy = pd.read_csv(test_noisy_path) if os.path.exists(test_noisy_path) else None

# --- 8.1 Use augmented train if exists
aug_train_path = run_path("analytics","train_augmented.csv")
train_use = pd.read_csv(aug_train_path) if os.path.exists(aug_train_path) else train_df

# --- 8.2 Label encode in-scope
le,l2i,i2l = fit_label_encoder(train_use)
def enc(df): return encode_in_scope_labels(df, le)
ftr = enc(train_use); val = enc(val_df); tst = enc(test_df)
valp = enc(val_polluted); tstp = enc(test_polluted)
tstn = enc(test_noisy) if test_noisy is not None else None
sanity_check_labels([ftr,val,tst,valp,tstp] + ([tstn] if tstn is not None else []))

# --- 8.3 Helpers: detect X/y columns
_TEXT_CAND = ["text","utterance","sentence","query"]
_Y_CAND    = ["y","label","intent_id","intent","target"]

def _xy(df):
    text_col = next((c for c in _TEXT_CAND if c in df.columns), None)
    y_col    = next((c for c in _Y_CAND if c in df.columns), None)
    if text_col is None or y_col is None:
        raise ValueError(f"Could not detect text/y columns. cols={list(df.columns)}")
    X = df[text_col].astype(str).tolist()
    y = df[y_col].tolist()
    return X, np.array(y)

# --- 8.4 Try to use package baselines; else local fallbacks
def _try_pkg_runners():
    try:
        mod = importlib.import_module("intentzero2few.baselines")
        rmaj = getattr(mod, "run_majority", None)
        rtf  = getattr(mod, "run_tfidf_lr", None)
        rbert= getattr(mod, "run_bert_linear", None)
        if all([rmaj, rtf, rbert]):
            print("✓ Using package baselines (intentzero2few.baselines)")
            return rmaj, rtf, rbert
    except Exception:
        pass
    return None, None, None

# --- local Majority baseline
def run_majority_local(ftr, val, valp, tst, tstp, tstn=None):
    _, ytr = _xy(ftr)
    maj = int(pd.Series(ytr).value_counts().idxmax())
    def _eval(df):
        _, y = _xy(df)
        yhat = np.full_like(y, maj)
        return accuracy_score(y, yhat), f1_score(y, yhat, average="macro")
    out = {
        "model": "Majority",
        "val_acc": None, "val_macro_f1": None,
        "pval_acc": None, "pval_macro_f1": None,
        "test_acc": None,"test_macro_f1": None,
        "ptest_acc": None,"ptest_macro_f1": None,
        "test_noisy_acc": None,"test_noisy_macro_f1": None,
    }
    out["val_acc"],  out["val_macro_f1"]  = _eval(val)
    out["pval_acc"], out["pval_macro_f1"] = _eval(valp)
    out["test_acc"], out["test_macro_f1"] = _eval(tst)
    out["ptest_acc"],out["ptest_macro_f1"]= _eval(tstp)
    if tstn is not None:
        out["test_noisy_acc"], out["test_noisy_macro_f1"] = _eval(tstn)
    return out

# --- local TF-IDF + Logistic Regression
def run_tfidf_lr_local(ftr, val, valp, tst, tstp, tstn=None):
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    Xtr, ytr = _xy(ftr)
    vec = TfidfVectorizer(min_df=2, ngram_range=(1,2))
    Xtrv = vec.fit_transform(Xtr)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(Xtrv, ytr)

    def _eval(df):
        X, y = _xy(df)
        Xv = vec.transform(X)
        yhat = clf.predict(Xv)
        return accuracy_score(y, yhat), f1_score(y, yhat, average="macro")

    out = {
        "model": "TFIDF+LR",
        "val_acc": None, "val_macro_f1": None,
        "pval_acc": None, "pval_macro_f1": None,
        "test_acc": None,"test_macro_f1": None,
        "ptest_acc": None,"ptest_macro_f1": None,
        "test_noisy_acc": None,"test_noisy_macro_f1": None,
    }
    out["val_acc"],  out["val_macro_f1"]  = _eval(val)
    out["pval_acc"], out["pval_macro_f1"] = _eval(valp)
    out["test_acc"], out["test_macro_f1"] = _eval(tst)
    out["ptest_acc"],out["ptest_macro_f1"]= _eval(tstp)
    if tstn is not None:
        out["test_noisy_acc"], out["test_noisy_macro_f1"] = _eval(tstn)
    return out

# --- local BERT embeddings + Logistic Regression
def run_bert_linear_local(ftr, val, valp, tst, tstp, tstn=None, model_name="sentence-transformers/all-MiniLM-L6-v2"):
    try:
        from sentence_transformers import SentenceTransformer
        from sklearn.linear_model import LogisticRegression
    except ImportError as e:
        raise ImportError("sentence-transformers not available") from e

    def _emb(model, texts, batch_size=256):
        out = []
        for i in range(0, len(texts), batch_size):
            out.append(model.encode(texts[i:i+batch_size], show_progress_bar=False, convert_to_numpy=True, normalize_embeddings=False))
        return np.vstack(out) if out else np.zeros((0,384))

    Xtr, ytr = _xy(ftr)
    m = SentenceTransformer(model_name)
    Etr = _emb(m, Xtr)
    clf = LogisticRegression(max_iter=2000)
    clf.fit(Etr, ytr)

    def _eval(df):
        X, y = _xy(df)
        E = _emb(m, X)
        yhat = clf.predict(E)
        return accuracy_score(y, yhat), f1_score(y, yhat, average="macro")

    out = {
        "model": "BERT-Linear",
        "val_acc": None, "val_macro_f1": None,
        "pval_acc": None, "pval_macro_f1": None,
        "test_acc": None,"test_macro_f1": None,
        "ptest_acc": None,"ptest_macro_f1": None,
        "test_noisy_acc": None,"test_noisy_macro_f1": None,
    }
    out["val_acc"],  out["val_macro_f1"]  = _eval(val)
    out["pval_acc"], out["pval_macro_f1"] = _eval(valp)
    out["test_acc"], out["test_macro_f1"] = _eval(tst)
    out["ptest_acc"],out["ptest_macro_f1"]= _eval(tstp)
    if tstn is not None:
        out["test_noisy_acc"], out["test_noisy_macro_f1"] = _eval(tstn)
    return out

# --- 8.5 Choose runners (package if present, else local)
pkg_runners = _try_pkg_runners()
if all(pkg_runners):
    run_majority, run_tfidf_lr, run_bert_linear = pkg_runners
else:
    print("⚠️ Package runners not found — using local fallback implementations.")
    run_majority   = lambda ftr,val,valp,tst,tstp: run_majority_local(ftr,val,valp,tst,tstp,tstn)
    run_tfidf_lr   = lambda ftr,val,valp,tst,tstp: run_tfidf_lr_local(ftr,val,valp,tst,tstp,tstn)
    def _bert_runner(ftr,val,valp,tst,tstp):
        try:
            return run_bert_linear_local(ftr,val,valp,tst,tstp,tstn)
        except ImportError as e:
            log_error_csv("step8_bert_linear_import", e)
            return {
                "model":"BERT-Linear",
                "val_acc": None, "val_macro_f1": None,
                "pval_acc": None, "pval_macro_f1": None,
                "test_acc": None, "test_macro_f1": None,
                "ptest_acc": None, "ptest_macro_f1": None,
                "test_noisy_acc": None, "test_noisy_macro_f1": None,
            }
    run_bert_linear = _bert_runner

# --- 8.6 Run baselines
rows=[]
for runner, name in [
    (run_majority,   "majority"),
    (run_tfidf_lr,   "tfidf+lr"),
    (run_bert_linear,"bert-linear"),
]:
    try:
        r = runner(ftr, val, valp, tst, tstp)  # dict döner
        r["runner"] = name
        r.setdefault("model", {"majority":"Majority","tfidf+lr":"TFIDF+LR","bert-linear":"BERT-Linear"}[name])
        rows.append(r)
    except Exception as e:
        log_error_csv("step8_baseline_runner", e, {"runner":name})

baseline_wide = pd.DataFrame(rows)
save_csv(baseline_wide, run_path("analytics","baseline_summary_wide.csv"))

# --- 8.7 Tidy summary (model, variant, split, accuracy, macro_f1)
def _pick(d: dict, keys: list[str]):
    for k in keys:
        if k in d and d[k] is not None and not (isinstance(d[k], float) and np.isnan(d[k])):
            try:
                return float(d[k])
            except Exception:
                return d[k]
    return None

tidy_rows=[]
for r in rows:
    model   = r.get("model", r.get("runner", "baseline"))
    variant = r.get("runner", model)

    def add(split, acc_keys, f1_keys):
        acc = _pick(r, acc_keys)
        mf1 = _pick(r, f1_keys)
        if (acc is not None) or (mf1 is not None):
            tidy_rows.append({
                "model": model,
                "variant": variant,
                "split": split,
                "accuracy": acc,
                "macro_f1": mf1
            })

    add("val_clean",
        ["val_acc","acc_val","val_accuracy"],
        ["val_macro_f1","macro_f1_val","val_f1_macro","val_f1_macro_avg"])

    add("val_polluted",
        ["pval_acc","valp_acc","val_polluted_acc","acc_val_polluted"],
        ["pval_macro_f1","valp_macro_f1","val_polluted_macro_f1","macro_f1_val_polluted"])

    add("test_clean",
        ["test_acc","acc_test","test_accuracy"],
        ["test_macro_f1","macro_f1_test","test_f1_macro","test_f1_macro_avg"])

    add("test_noisy",
        ["test_noisy_acc","acc_test_noisy","noisy_acc","test_in_scope_noisy_acc"],
        ["test_noisy_macro_f1","macro_f1_test_noisy","noisy_macro_f1"])

    add("test_polluted",
        ["ptest_acc","test_polluted_acc","acc_test_polluted"],
        ["ptest_macro_f1","test_polluted_macro_f1","macro_f1_test_polluted"])

baseline_tidy = pd.DataFrame(tidy_rows)
if baseline_tidy.empty and not baseline_wide.empty:
    # safety fallback
    for _, r in baseline_wide.iterrows():
        model   = r.get("model", r.get("runner","baseline"))
        variant = r.get("runner", model)
        acc = _pick(r, ["test_acc","acc_test","test_accuracy"])
        mf1 = _pick(r, ["test_macro_f1","macro_f1_test"])
        baseline_tidy = pd.DataFrame([{
            "model": model, "variant": variant, "split":"test_clean",
            "accuracy": acc, "macro_f1": mf1
        }])

save_csv(baseline_tidy, run_path("analytics","baseline_summary.csv"))

# --- 8.8 Robustness table for thesis (rapor)
rob_rows = []
for r in rows:
    model = r.get("model", r.get("runner","baseline"))
    rob_rows.append({
        "model": model,
        "acc_clean": _pick(r, ["test_acc","acc_test","test_accuracy"]),
        "macro_f1_clean": _pick(r, ["test_macro_f1","macro_f1_test"]),
        # OOS metrics may be absent; they’ll be NaN if not provided:
        "auroc_oos_polluted": _pick(r, ["ptest_auroc_oos","test_polluted_auroc_oos","auroc_oos_test_polluted"]),
        "fpr@tpr95_polluted": _pick(r, ["ptest_fpr@tpr95","test_polluted_fpr@tpr95","fpr_at_tpr95_test_polluted"]),
    })
rob = pd.DataFrame(rob_rows)
save_csv(rob, report_path("baseline_robustness.csv"))

print("\nBASELINE summary (tidy, head):", baseline_tidy.head(6).to_dict("records"))
print("ROBUSTNESS table:", rob.head(6).to_dict("records"))
print("✅ 8 done → analytics/baseline_summary.csv (tidy) + analytics/baseline_summary_wide.csv + reports/baseline_robustness.csv")


2025-09-21 12:49:47,265 | INFO | intentzero2few | Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-124947.log
INFO:intentzero2few:Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-124947.log
2025-09-21 12:49:47,463 | ERROR | intentzero2few | Error in step8_baseline_runner: invalid literal for int() with base 10: 'translate'
ERROR:intentzero2few:Error in step8_baseline_runner: invalid literal for int() with base 10: 'translate'


⚠️ Package runners not found — using local fallback implementations.

BASELINE summary (tidy, head): [{'model': 'TFIDF+LR', 'variant': 'tfidf+lr', 'split': 'val_clean', 'accuracy': 0.8953333333333333, 'macro_f1': 0.8946821767863957}, {'model': 'TFIDF+LR', 'variant': 'tfidf+lr', 'split': 'val_polluted', 'accuracy': 0.8664516129032258, 'macro_f1': 0.8753752580387765}, {'model': 'TFIDF+LR', 'variant': 'tfidf+lr', 'split': 'test_clean', 'accuracy': 0.8991111111111111, 'macro_f1': 0.8981875360402208}, {'model': 'TFIDF+LR', 'variant': 'tfidf+lr', 'split': 'test_noisy', 'accuracy': 0.8367777777777777, 'macro_f1': 0.8366395802403952}, {'model': 'TFIDF+LR', 'variant': 'tfidf+lr', 'split': 'test_polluted', 'accuracy': 0.7356363636363636, 'macro_f1': 0.8139933532019595}, {'model': 'BERT-Linear', 'variant': 'bert-linear', 'split': 'val_clean', 'accuracy': 0.9516666666666667, 'macro_f1': 0.951377591255377}]
ROBUSTNESS table: [{'model': 'TFIDF+LR', 'acc_clean': 0.8991111111111111, 'macro_f1_clean': 

# 9.) Few-shot sweep (K=1/5/10)

EN: Builds K-shot training subsets and evaluates TF-IDF+LR and BERT-Linear; shows label efficiency. Per-K errors logged and continued.
TR: K-shot alt-setler (1/5/10) ile TF-IDF+LR ve BERT-Linear’ı değerlendirir; etiket verimliliğini gösterir. Hata olursa K bazında kaydedip devam eder.
RQ: RQ-Label Efficiency; rapor: reports/fewshot_summary.csv.

In [17]:
# 9) Few-shot K={1,5,10} with TF-IDF+LR, BERT-Linear + error CSV + samples — RESILIENT
import os, importlib, traceback
import numpy as np, pandas as pd
from sklearn.metrics import accuracy_score, f1_score

from intentzero2few import (
    get_env_paths, run_path, report_path, save_csv,
    load_intents, setup_logger
)
from intentzero2few import fit_label_encoder, encode_in_scope_labels, sanity_check_labels
from intentzero2few.fewshot import make_k_shot

p = get_env_paths(); logger,_ = setup_logger()

def log_error_csv(name, err, ctx=None):
    rec = {"phase":name,"error":f"{type(err).__name__}: {err}","trace":traceback.format_exc().strip()}
    if ctx: rec.update({f"ctx_{k}":v for k,v in (ctx or {}).items()})
    path = run_path("analytics", f"errors_{name}.csv")
    pd.DataFrame([rec]).to_csv(path, mode="a", index=False, header=not os.path.exists(path))
    logger.error("Error in %s: %s", name, err)

# --- Load splits
splits = load_intents(os.path.join(p["REPO_DIR"],"data","clinc150.json"))
train_df, val_df, test_df = splits["train"], splits["val"], splits["test"]
val_polluted  = pd.read_csv(run_path("analytics","val_polluted.csv"))
test_polluted = pd.read_csv(run_path("analytics","test_polluted.csv"))

# --- Helpers: detect X/y columns
_TEXT_CAND = ["text","utterance","sentence","query"]
_Y_CAND    = ["y","label","intent_id","intent","target"]

def _xy(df):
    text_col = next((c for c in _TEXT_CAND if c in df.columns), None)
    y_col    = next((c for c in _Y_CAND if c in df.columns), None)
    if text_col is None or y_col is None:
        raise ValueError(f"Could not detect text/y columns. cols={list(df.columns)}")
    X = df[text_col].astype(str).tolist()
    y = df[y_col].tolist()
    return X, np.array(y)

# --- Try to use package baselines; else local fallbacks
def _try_pkg_runners():
    try:
        mod = importlib.import_module("intentzero2few.baselines")
        rtf  = getattr(mod, "run_tfidf_lr", None)
        rbert= getattr(mod, "run_bert_linear", None)
        if rtf and rbert:
            print("✓ Using package baselines (intentzero2few.baselines)")
            return rtf, rbert
    except Exception:
        pass
    return None, None

# local TF-IDF + Logistic Regression
def run_tfidf_lr_local(ftr, val, valp, tst, tstp):
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    Xtr, ytr = _xy(ftr)
    vec = TfidfVectorizer(min_df=2, ngram_range=(1,2))
    Xtrv = vec.fit_transform(Xtr)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(Xtrv, ytr)
    def _eval(df):
        X, y = _xy(df); yhat = clf.predict(vec.transform(X))
        return accuracy_score(y, yhat), f1_score(y, yhat, average="macro")
    return {
        "model":"TFIDF+LR",
        "val_acc":_eval(val)[0],  "val_macro_f1":_eval(val)[1],
        "pval_acc":_eval(valp)[0], "pval_macro_f1":_eval(valp)[1],
        "test_acc":_eval(tst)[0], "test_macro_f1":_eval(tst)[1],
        "ptest_acc":_eval(tstp)[0], "ptest_macro_f1":_eval(tstp)[1],
    }

# local BERT sentence embeddings + Logistic Regression
def run_bert_linear_local(ftr, val, valp, tst, tstp, model_name="sentence-transformers/all-MiniLM-L6-v2"):
    try:
        from sentence_transformers import SentenceTransformer
        from sklearn.linear_model import LogisticRegression
    except ImportError as e:
        raise ImportError("sentence-transformers not available") from e
    def _emb(model, texts, batch_size=256):
        out=[];
        for i in range(0,len(texts),batch_size):
            out.append(model.encode(texts[i:i+batch_size], show_progress_bar=False, convert_to_numpy=True, normalize_embeddings=False))
        return np.vstack(out) if out else np.zeros((0,384))
    Xtr, ytr = _xy(ftr); m = SentenceTransformer(model_name); Etr = _emb(m, Xtr)
    clf = LogisticRegression(max_iter=2000); clf.fit(Etr, ytr)
    def _eval(df):
        X, y = _xy(df); yhat = clf.predict(_emb(m, X))
        return accuracy_score(y, yhat), f1_score(y, yhat, average="macro")
    return {
        "model":"BERT-Linear",
        "val_acc":_eval(val)[0],  "val_macro_f1":_eval(val)[1],
        "pval_acc":_eval(valp)[0], "pval_macro_f1":_eval(valp)[1],
        "test_acc":_eval(tst)[0], "test_macro_f1":_eval(tst)[1],
        "ptest_acc":_eval(tstp)[0], "ptest_macro_f1":_eval(tstp)[1],
    }

pkg_tfidf, pkg_bert = _try_pkg_runners()
if pkg_tfidf and pkg_bert:
    run_tfidf_lr, run_bert_linear = pkg_tfidf, pkg_bert
else:
    print("⚠️ Package runners not found — using local fallback implementations.")
    run_tfidf_lr, run_bert_linear = run_tfidf_lr_local, run_bert_linear_local

# --- Few-shot loop
rows=[]
for K in [1,5,10]:
    try:
        ktr = make_k_shot(train_df, k=K, seed=42, drop_short=False)
        le,_,_ = fit_label_encoder(ktr)
        def enc(df): return encode_in_scope_labels(df, le)
        ftr = enc(ktr); val = enc(val_df); tst = enc(test_df)
        valp = enc(val_polluted); tstp = enc(test_polluted)
        sanity_check_labels([ftr,val,tst,valp,tstp])

        # TF-IDF + LR
        try:
            r = run_tfidf_lr(ftr, val, valp, tst, tstp)
            r["K"] = K; r["runner"] = "tfidf+lr"
            rows.append(r)
        except Exception as e:
            log_error_csv("step9_runner", e, {"K":K,"runner":"tfidf+lr"})

        # BERT-Linear (opsiyonel: sentence-transformers yoksa log düşer)
        try:
            r = run_bert_linear(ftr, val, valp, tst, tstp)
            r["K"] = K; r["runner"] = "bert-linear"
            rows.append(r)
        except Exception as e:
            log_error_csv("step9_runner", e, {"K":K,"runner":"bert-linear"})

        print(f"\nSAMPLE K={K} few-shot train rows:", ktr.head(5).to_dict("records"))

    except Exception as e:
        log_error_csv("step9_kshot", e, {"K":K})

fewshot_df = pd.DataFrame(rows)
save_csv(fewshot_df, report_path("fewshot_summary.csv"))
print("✅ 9 done → reports/fewshot_summary.csv (head):", fewshot_df.head(3).to_dict("records"))


2025-09-21 12:57:56,764 | INFO | intentzero2few | Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-125756.log
INFO:intentzero2few:Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-125756.log


⚠️ Package runners not found — using local fallback implementations.

SAMPLE K=1 few-shot train rows: [{'text': 'does ruby tuesday accept reservations', 'intent': 'accept_reservations'}, {'text': 'why is my account locked', 'intent': 'account_blocked'}, {'text': 'set an alarm', 'intent': 'alarm'}, {'text': 'has my credit card application been approved', 'intent': 'application_status'}, {'text': "what is my credit card's apr", 'intent': 'apr'}]

SAMPLE K=5 few-shot train rows: [{'text': 'does ruby tuesday accept reservations', 'intent': 'accept_reservations'}, {'text': 'are reservations allowed at burger king', 'intent': 'accept_reservations'}, {'text': 'do they take reservations at bar tartine', 'intent': 'accept_reservations'}, {'text': 'is there evening reservations available in the eve', 'intent': 'accept_reservations'}, {'text': "how many culver's take reservations", 'intent': 'accept_reservations'}]

SAMPLE K=10 few-shot train rows: [{'text': 'does ruby tuesday accept reservations

# 11.) (Opsiyonel) HF fine-tune (AdamW + weight decay, mini-batch/epochs)  
**EN:** Optional fine-tuning with Transformers `Trainer` (AdamW). Keep disabled unless you explicitly set `RUN_HF=1`; it’s slow.  
**TR:** İsteğe bağlı AdamW ile ince ayar. `RUN_HF=1` yapmadıkça kapalı; uzun sürer.  
**RQ:** Strong baseline check; rapor: eğitim logları `runs/.../logs/hf/` altına.

In [18]:
# 11) Optional: HF fine-tune with AdamW (set RUN_HF=1 to enable) + append metrics to baseline_summary.csv
import os, numpy as np, pandas as pd, traceback

from intentzero2few import (
    load_intents, get_env_paths, run_path,
    setup_logger, save_csv
)
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, f1_score

# Toggles:
#   RUN_HF=1           → fine-tune çalışır
#   HF_PER_CLASS=8     → her intent'ten kaç örnek (demo için). HF_PER_CLASS=0 → tüm veriyi kullan.
RUN_HF = int(os.environ.get("RUN_HF", "0"))
HF_PER_CLASS = int(os.environ.get("HF_PER_CLASS", "8"))  # 0 => full data

logger, _ = setup_logger()

if RUN_HF:
    try:
        from datasets import Dataset
        from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                                  Trainer, TrainingArguments)

        p = get_env_paths()
        splits = load_intents(os.path.join(p["REPO_DIR"], "data", "clinc150.json"))
        train_df, val_df, test_df = splits["train"], splits["val"], splits["test"]

        # --- subset vs full ---
        if HF_PER_CLASS and HF_PER_CLASS > 0:
            train_use = train_df.groupby("intent", group_keys=False).apply(lambda g: g.head(HF_PER_CLASS)).reset_index(drop=True)
            val_use   = val_df.groupby("intent",   group_keys=False).apply(lambda g: g.head(HF_PER_CLASS)).reset_index(drop=True)
            test_use  = test_df.groupby("intent",  group_keys=False).apply(lambda g: g.head(HF_PER_CLASS)).reset_index(drop=True)
        else:
            train_use, val_use, test_use = train_df, val_df, test_df

        # --- labels ---
        le = LabelEncoder().fit(train_use["intent"])
        train_use = train_use.assign(label=le.transform(train_use["intent"]))
        val_use   = val_use.assign(label=le.transform(val_use["intent"]))
        test_use  = test_use.assign(label=le.transform(test_use["intent"]))

        # --- tokenizer/model ---
        model_name = os.environ.get("HF_MODEL", "distilbert-base-uncased")
        tok = AutoTokenizer.from_pretrained(model_name)

        def enc(b):
            return tok(b["text"], padding="max_length", truncation=True, max_length=64)

        ds_tr = Dataset.from_pandas(train_use[["text","label"]]).map(enc, batched=True)
        ds_va = Dataset.from_pandas(val_use[["text","label"]]).map(enc, batched=True)
        ds_te = Dataset.from_pandas(test_use[["text","label"]]).map(enc, batched=True)

        model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=len(le.classes_))

        def compute_metrics(eval_pred):
            preds = np.argmax(eval_pred.predictions, axis=1)
            return {
                "accuracy":  accuracy_score(eval_pred.label_ids, preds),
                "macro_f1":  f1_score(eval_pred.label_ids, preds, average="macro"),
            }

        args = TrainingArguments(
            output_dir=run_path("artifacts","hf_results"),
            evaluation_strategy="epoch",
            per_device_train_batch_size=int(os.environ.get("HF_BS_TRAIN","16")),
            per_device_eval_batch_size=int(os.environ.get("HF_BS_EVAL","32")),
            num_train_epochs=float(os.environ.get("HF_EPOCHS","2")),
            learning_rate=float(os.environ.get("HF_LR","5e-5")),
            weight_decay=float(os.environ.get("HF_WD","0.01")),   # AdamW weight decay
            logging_dir=run_path("logs","hf"),
            report_to=[],
            save_total_limit=1
        )

        trainer = Trainer(model=model, args=args,
                          train_dataset=ds_tr, eval_dataset=ds_va,
                          tokenizer=tok, compute_metrics=compute_metrics)

        trainer.train()
        print("✅ HF fine-tune finished.")

        # --- Evaluate on val/test (clean) and collect tidy rows ---
        rows = []

        def eval_and_row(dataset, split_name: str):
            m = trainer.evaluate(eval_dataset=dataset)
            # keys like 'eval_loss', 'eval_accuracy', 'eval_macro_f1' may exist;
            # we also re-compute via compute_metrics to be safe:
            rows.append({
                "model":   f"{model_name}-ft",
                "variant": "HF",
                "split":   split_name,
                "accuracy": float(m.get("eval_accuracy") if "eval_accuracy" in m else m.get("accuracy", np.nan)),
                "macro_f1": float(m.get("eval_macro_f1") if "eval_macro_f1" in m else m.get("macro_f1", np.nan)),
            })

        eval_and_row(ds_va, "val_clean")
        eval_and_row(ds_te, "test_clean")

        # --- Append to analytics/baseline_summary.csv (merge-safe) ---
        path_bs = run_path("analytics", "baseline_summary.csv")
        try:
            if os.path.exists(path_bs):
                df_old = pd.read_csv(path_bs)
                df_out = pd.concat([df_old, pd.DataFrame(rows)], ignore_index=True)
            else:
                df_out = pd.DataFrame(rows)
            save_csv(df_out, path_bs)
            print("✅ Appended HF results →", path_bs)
        except Exception as e:
            logger.warning("Failed to append baseline_summary.csv: %r", e)

        print("ℹ️ Logs:", run_path("logs","hf"))
        print("ℹ️ Artifacts:", run_path("artifacts","hf_results"))

    except Exception as e:
        # write to errors
        path = run_path("analytics", "errors_hf_finetune.csv")
        pd.DataFrame([{
            "phase":"hf_finetune",
            "error":f"{type(e).__name__}: {e}",
            "trace":traceback.format_exc().strip()
        }]).to_csv(path, mode="a", index=False, header=not os.path.exists(path))
        raise
else:
    print("ℹ️ 11 skipped (set RUN_HF=1 to enable).")


2025-09-21 13:08:34,257 | INFO | intentzero2few | Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-130834.log
INFO:intentzero2few:Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-130834.log


ℹ️ 11 skipped (set RUN_HF=1 to enable).


# 10.) Report packaging + Flow diagram (ASCII + Mermaid)

EN: Writes a README with pipeline diagram and lists key report artifacts.
TR: Akış diyagramı ve önemli çıktıların listesiyle README yaratır.
RQ: Reproducibility & Methodology presentation; rapor: reports/README_run.md.

In [36]:
# === 10) Pack report + flow diagram + error CSV (RUN_ID-aware) — FIXED FENCES ===
# TR: README + akış diyagramları + present/missing + hata CSV
#     + benchmark_summary.csv / summary_metrics.csv / run_meta.json
#     + analytics→reports senkron (benchmark_summary, summary_metrics, zs_summary)
#     + (YENİ) ASCII ve Mermaid blokları doğru code fence ile yazılır.

import os, json, time, traceback, shutil
import pandas as pd

from intentzero2few import (
    get_env_paths, report_path, run_path,
    setup_logger, save_csv, save_json
)

p, (logger, _) = get_env_paths(), setup_logger()
readme = report_path("README_run.md")

def log_error_csv(exc: BaseException, where: str = "pack_report"):
    try:
        err_csv = run_path("analytics", "errors_report_pack.csv")
        os.makedirs(os.path.dirname(err_csv), exist_ok=True)
        row = {
            "where": where,
            "error": repr(exc),
            "traceback": "".join(traceback.format_exception(type(exc), exc, exc.__traceback__)),
        }
        header_needed = not os.path.exists(err_csv)
        pd.DataFrame([row]).to_csv(err_csv, mode="a", index=False, header=header_needed)
        logger.warning("Error captured and appended to %s", err_csv)
    except Exception as e2:
        print("WARN: failed to write error CSV:", repr(e2))

def _safe_read_csv(path: str) -> pd.DataFrame:
    return pd.read_csv(path) if os.path.exists(path) else pd.DataFrame()

def build_benchmark_summary():
    try:
        p_zs = run_path("analytics", "zs_summary.csv")
        p_bl = run_path("analytics", "baseline_summary.csv")
        df_zs = _safe_read_csv(p_zs)
        df_bl = _safe_read_csv(p_bl)

        frames = []
        if not df_zs.empty:
            if "model" not in df_zs.columns:   df_zs["model"] = "ZeroShot-LICL"
            if "variant" not in df_zs.columns: df_zs["variant"] = "ZS"
            frames.append(df_zs[["model","variant","split","accuracy","macro_f1"]])

        if not df_bl.empty:
            if "variant" not in df_bl.columns: df_bl["variant"] = "baseline"
            for c in ["accuracy","macro_f1"]:
                if c not in df_bl.columns: df_bl[c] = None
            frames.append(df_bl[["model","variant","split","accuracy","macro_f1"]])

        if not frames:
            logger.warning("No zs/baseline summaries found; skipping benchmark_summary.")
            return None

        out = pd.concat(frames, ignore_index=True)
        save_csv(out, run_path("analytics", "benchmark_summary.csv"))
        save_csv(out, report_path("benchmark_summary.csv"))
        return out
    except Exception as e:
        log_error_csv(e, where="build_benchmark_summary")
        return None

def build_summary_metrics():
    try:
        p_bench = run_path("analytics", "benchmark_summary.csv")
        df = _safe_read_csv(p_bench)
        if df.empty:
            return None
        want = ["model","variant","split","accuracy","macro_f1"]
        df = df.reindex(columns=want)
        save_csv(df, run_path("analytics", "summary_metrics.csv"))
        return df
    except Exception as e:
        log_error_csv(e, where="build_summary_metrics")
        return None

def write_run_meta():
    try:
        meta = {"thresholds": {}, "k_best": None, "mapping_stats": {}, "run": {}}

        for name in ["threshold_zero_shot.json",
                     "threshold_zero_shot_clean.json",
                     "threshold_zero_shot_polluted.json"]:
            pth = run_path("analytics", name)
            if os.path.exists(pth):
                with open(pth, "r", encoding="utf-8") as f:
                    key = "global" if ("clean" not in name and "polluted" not in name) else ("clean" if "clean" in name else "polluted")
                    meta["thresholds"][key] = json.load(f)

        p_k = run_path("analytics", "fewshot_sweep_best.json")
        if os.path.exists(p_k):
            with open(p_k, "r", encoding="utf-8") as f:
                meta["k_best"] = json.load(f)

        n_super = 0; n_intents = 0
        p_map = run_path("analytics", "super_mapping.csv")
        p_desc= run_path("analytics", "intent_descriptions.csv")
        if os.path.exists(p_map):
            df_map = pd.read_csv(p_map)
            cols = {c.lower(): c for c in df_map.columns}
            s_col = cols.get("super",  list(df_map.columns)[0])
            i_col = cols.get("intent", list(df_map.columns)[1])
            n_super   = int(df_map[s_col].nunique())
            n_intents = int(df_map[i_col].nunique())
        elif os.path.exists(p_desc):
            df_d = pd.read_csv(p_desc)
            i_col = "intent" if "intent" in df_d.columns else df_d.columns[0]
            n_intents = int(df_d[i_col].nunique())

        meta["mapping_stats"] = {
            "n_super": n_super,
            "n_intents": n_intents,
            "avg_per_super": (float(n_intents/n_super) if n_super else None)
        }

        meta["run"] = {"run_id": os.environ.get("RUN_ID", "unknown"),
                       "timestamp_utc": int(time.time())}

        save_json(meta, report_path("run_meta.json"))
        return meta
    except Exception as e:
        log_error_csv(e, where="write_run_meta")
        return None

def _copy_if_exists(src_path: str, dst_path: str):
    try:
        if os.path.exists(src_path):
            os.makedirs(os.path.dirname(dst_path), exist_ok=True)
            shutil.copyfile(src_path, dst_path)
            return True
        return False
    except Exception as e:
        log_error_csv(e, where="copy_if_exists")
        return False

# --------- Flow blocks (fixed fences) ---------
flow_ascii = r"""
FLOW (Zero→Few Robust Pipeline)
┌─────────┐
│  Input  │  CLINC_OOS (train/val/test + OOS)
└───┬─────┘
    │
    ├─▶ Split & Export (data/clinc150.json)
    │
    ├─▶ EDA (counts, wordcloud)  → runs/analytics, runs/figures
    │
    ├─▶ Prep:
    │     ├─ Polluted {val,test} (mix OOS)
    │     ├─ Augmented train (classic) → ≥30K
    │     └─ Noisy {train,test} (emoji/slang/typo)
    │
    ├─▶ Discovery (TF-IDF desc → ST embeddings → K-means) → super-intents
    │
    ├─▶ Zero-shot (super centroids) + τ calibration (val_polluted)
    │
    ├─▶ Evaluate: clean / noisy / polluted
    │
    ├─▶ Baselines (TFIDF+LR, BERT-Linear) + τ
    │
    ├─▶ Few-shot K={1,5,10}
    │
    └─▶ Reports (tables + figs + artifacts) → reports/<RUN_ID>/
""".strip()

flow_mermaid_body = """
flowchart TD
A[CLINC_OOS raw] --> B[Export CLINC JSON]
B --> C[EDA: stats + wordcloud]
B --> D[Prep: polluted val/test]
B --> E[Prep: augmented train >=30K]
B --> F[Prep: noisy train/test]
E --> G[Discovery: super-intents (K-means)]
G --> H[Zero-shot centroids]
D --> I[tau calibration on val_polluted]
H --> J[Evaluate: clean/noisy/polluted]
F --> J
B --> K[Baselines TFIDF+LR, BERT-Linear + tau]
B --> L[Few-shot K=1/5/10]
J --> M[Reports: zs_summary, heatmaps]
K --> M
L --> M
""".strip()

# ----------------- RUN PACKAGING -----------------
try:
    # 1) Build combined summaries + meta
    build_benchmark_summary()
    build_summary_metrics()
    write_run_meta()

    # 1.a) auto-sync curated CSVs analytics → reports (idempotent)
    for name in ["benchmark_summary.csv", "summary_metrics.csv", "zs_summary.csv"]:
        _copy_if_exists(run_path("analytics", name), report_path(name))

    # 2) Compose expected list (reports/ scope)
    expected = [
        "intent_descriptions.csv",
        "zs_summary.csv",
        "baseline_robustness.csv",
        "fewshot_summary.csv",
        "benchmark_summary.csv",
        "summary_metrics.csv",
        "run_meta.json",
        "zs_confmat_clean.png",
        "zs_confmat_noisy.png",
        "zs_confmat_polluted.png",
        "wordcloud_train.png",
    ]

    present, missing = [], []
    for name in expected:
        path = report_path(name)
        (present if os.path.exists(path) else missing).append(name)

    # 3) Write README (with proper code fences)
    os.makedirs(os.path.dirname(readme), exist_ok=True)
    with open(readme, "w", encoding="utf-8") as f:
        f.write("# Run Report\n\n")
        f.write(f"- RUN_ID: {p['RUN_ID']}\n\n")

        f.write("## Flow (ASCII)\n\n")
        f.write("```text\n" + flow_ascii + "\n```\n\n")

        f.write("## Flow (Mermaid)\n\n")
        f.write("```mermaid\n" + flow_mermaid_body + "\n```\n\n")

        f.write("## Key Artifacts (present)\n")
        if present:
            for n in present:
                f.write(f"- {n}\n")
        else:
            f.write("- (none yet)\n")

        if missing:
            f.write("\n## Key Artifacts (missing / to be generated)\n")
            for n in missing:
                f.write(f"- {n}\n")

    logger.info("Run README saved: %s", readme)
    print("✅ 10 done →", readme)

except Exception as e:
    logger.exception("Failed to pack report")
    log_error_csv(e, where="pack_report_top")
    print("⚠️ Failed; error recorded to errors_report_pack.csv")


2025-09-21 13:55:53,682 | INFO | intentzero2few | Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-135553.log
INFO:intentzero2few:Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-135553.log
2025-09-21 13:55:53,737 | INFO | intentzero2few | Run README saved: /content/intentzero2few-repo/reports/20250921-123702/README_run.md
INFO:intentzero2few:Run README saved: /content/intentzero2few-repo/reports/20250921-123702/README_run.md


✅ 10 done → /content/intentzero2few-repo/reports/20250921-123702/README_run.md


In [37]:
%%bash
cd /content/intentzero2few-repo
[ -f .env ] && { set +u; source .env; set -u; }
echo "RUN_ID=$RUN_ID"
ls -1 "reports/$RUN_ID" | sed 's/^/  • /'


RUN_ID=20250921-123702
  • aug_samples_classic_head20.csv
  • aug_samples_noisy_head20.csv
  • baseline_robustness.csv
  • benchmark_summary.csv
  • fewshot_summary.csv
  • intent_descriptions.csv
  • README_run.md
  • run_meta.json
  • summary_metrics.csv
  • super_to_intents.json
  • zs_confmat_clean.png
  • zs_confmat_noisy.png
  • zs_confmat_polluted.png
  • zs_summary.csv


In [19]:
# === 10) Pack report + flow diagram + error CSV (RUN_ID-aware) — MERGED & UPDATED ===
# TR: README + akış diyagramları + present/missing listesi + hata CSV (mevcut davranış korunur)
#     + benchmark_summary.csv (ZS+Baseline tek tablo), summary_metrics.csv ve run_meta.json üretir.
# EN: Keeps your README/flow/error CSV behavior and ADDS:
#     analytics/benchmark_summary.csv, reports/benchmark_summary.csv,
#     analytics/summary_metrics.csv, reports/run_meta.json. Also copies zs_summary.csv to reports/ (if exists).

import os, json, time, traceback, shutil
import pandas as pd

from intentzero2few import (
    get_env_paths, report_path, run_path,
    setup_logger, save_csv, save_json
)

p, (logger, _) = get_env_paths(), setup_logger()
readme = report_path("README_run.md")

def log_error_csv(exc: BaseException, where: str = "pack_report"):
    """TR: Hataları CSV'ye ekle.  EN: Append errors to a CSV for reproducibility."""
    try:
        err_csv = run_path("analytics", "errors_report_pack.csv")
        os.makedirs(os.path.dirname(err_csv), exist_ok=True)
        row = {
            "where": where,
            "error": repr(exc),
            "traceback": "".join(traceback.format_exception(type(exc), exc, exc.__traceback__)),
        }
        header_needed = not os.path.exists(err_csv)
        pd.DataFrame([row]).to_csv(err_csv, mode="a", index=False, header=header_needed)
        logger.warning("Error captured and appended to %s", err_csv)
    except Exception as e2:
        print("WARN: failed to write error CSV:", repr(e2))

# ----------------- NEW: packaging helpers -----------------
def _safe_read_csv(path: str) -> pd.DataFrame:
    return pd.read_csv(path) if os.path.exists(path) else pd.DataFrame()

def build_benchmark_summary():
    """
    Combines ZS + Baseline summaries into one tidy table.
    Writes:
      - analytics/benchmark_summary.csv
      - reports/benchmark_summary.csv
    """
    try:
        p_zs = run_path("analytics", "zs_summary.csv")
        p_bl = run_path("analytics", "baseline_summary.csv")
        df_zs = _safe_read_csv(p_zs)
        df_bl = _safe_read_csv(p_bl)

        frames = []
        if not df_zs.empty:
            if "model" not in df_zs.columns:   df_zs["model"] = "ZeroShot-LICL"
            if "variant" not in df_zs.columns: df_zs["variant"] = "ZS"
            frames.append(df_zs[["model","variant","split","accuracy","macro_f1"]])

        if not df_bl.empty:
            if "variant" not in df_bl.columns: df_bl["variant"] = "baseline"
            for c in ["accuracy","macro_f1"]:
                if c not in df_bl.columns: df_bl[c] = None
            frames.append(df_bl[["model","variant","split","accuracy","macro_f1"]])

        if not frames:
            logger.warning("No zs/baseline summaries found; skipping benchmark_summary.")
            return None

        out = pd.concat(frames, ignore_index=True)
        p_analytics = run_path("analytics", "benchmark_summary.csv")
        p_reports   = run_path("reports",   "benchmark_summary.csv")
        save_csv(out, p_analytics); save_csv(out, p_reports)
        return out
    except Exception as e:
        log_error_csv(e, where="build_benchmark_summary")
        return None

def build_summary_metrics():
    """
    Single consolidated summary (tidy): analytics/summary_metrics.csv
    Based on analytics/benchmark_summary.csv
    """
    try:
        p_bench = run_path("analytics", "benchmark_summary.csv")
        df = _safe_read_csv(p_bench)
        if df.empty:
            return None
        want_cols = ["model","variant","split","accuracy","macro_f1"]
        df = df.reindex(columns=want_cols)
        save_csv(df, run_path("analytics", "summary_metrics.csv"))
        return df
    except Exception as e:
        log_error_csv(e, where="build_summary_metrics")
        return None

def write_run_meta():
    """
    Writes reports/run_meta.json:
      - thresholds: global and/or clean/polluted if available
      - k_best: from analytics/fewshot_sweep_best.json (if exists)
      - mapping_stats: n_intents, n_super, avg_per_super
      - run: run_id, timestamp_utc
    """
    try:
        meta = {"thresholds": {}, "k_best": None, "mapping_stats": {}, "run": {}}

        # thresholds
        for name in ["threshold_zero_shot.json",
                     "threshold_zero_shot_clean.json",
                     "threshold_zero_shot_polluted.json"]:
            pth = run_path("analytics", name)
            if os.path.exists(pth):
                with open(pth, "r", encoding="utf-8") as f:
                    key = "global" if ("clean" not in name and "polluted" not in name) else ("clean" if "clean" in name else "polluted")
                    meta["thresholds"][key] = json.load(f)

        # k_best
        p_k = run_path("analytics", "fewshot_sweep_best.json")
        if os.path.exists(p_k):
            with open(p_k, "r", encoding="utf-8") as f:
                meta["k_best"] = json.load(f)

        # mapping stats
        n_super = 0; n_intents = 0
        p_map = run_path("analytics", "super_mapping.csv")
        p_desc= run_path("analytics", "intent_descriptions.csv")
        if os.path.exists(p_map):
            df_map = pd.read_csv(p_map)
            cols = {c.lower(): c for c in df_map.columns}
            s_col = cols.get("super",  list(df_map.columns)[0])
            i_col = cols.get("intent", list(df_map.columns)[1])
            n_super   = int(df_map[s_col].nunique())
            n_intents = int(df_map[i_col].nunique())
        elif os.path.exists(p_desc):
            df_d = pd.read_csv(p_desc)
            i_col = "intent" if "intent" in df_d.columns else df_d.columns[0]
            n_intents = int(df_d[i_col].nunique())

        meta["mapping_stats"] = {
            "n_super": n_super,
            "n_intents": n_intents,
            "avg_per_super": (float(n_intents/n_super) if n_super else None)
        }

        # run info
        run_id = os.environ.get("RUN_ID", "unknown")
        meta["run"] = {"run_id": run_id, "timestamp_utc": int(time.time())}

        save_json(meta, report_path("run_meta.json"))
        return meta
    except Exception as e:
        log_error_csv(e, where="write_run_meta")
        return None

def _copy_if_exists(src_path: str, dst_path: str):
    try:
        if os.path.exists(src_path):
            os.makedirs(os.path.dirname(dst_path), exist_ok=True)
            shutil.copyfile(src_path, dst_path)
            return True
        return False
    except Exception as e:
        log_error_csv(e, where="copy_if_exists")
        return False

# ----------------- README / flow (your original) -----------------
flow_ascii = r"""
FLOW (Zero→Few Robust Pipeline)
┌─────────┐
│  Input  │  CLINC_OOS (train/val/test + OOS)
└───┬─────┘
    │
    ├─▶ Split & Export (data/clinc150.json)
    │
    ├─▶ EDA (counts, wordcloud)  → runs/analytics, runs/figures
    │
    ├─▶ Prep:
    │     ├─ Polluted {val,test} (mix OOS)
    │     ├─ Augmented train (classic) → ≥30K
    │     └─ Noisy {train,test} (emoji/slang/typo)
    │
    ├─▶ Discovery (TF-IDF desc → ST embeddings → K-means) → super-intents
    │
    ├─▶ Zero-shot (super centroids) + τ calibration (val_polluted)
    │
    ├─▶ Evaluate: clean / noisy / polluted
    │
    ├─▶ Baselines (TFIDF+LR, BERT-Linear) + τ
    │
    ├─▶ Few-shot K={1,5,10}
    │
    └─▶ Reports (tables + figs + artifacts) → reports/<RUN_ID>/
"""

flow_mermaid = """
```mermaid
flowchart TD
A[CLINC_OOS raw] --> B[Export CLINC JSON]
B --> C[EDA: stats + wordcloud]
B --> D[Prep: polluted val/test]
B --> E[Prep: augmented train >=30K]
B --> F[Prep: noisy train/test]
E --> G[Discovery: super-intents (K-means)]
G --> H[Zero-shot centroids]
D --> I[τ calibration on val_polluted]
H --> J[Evaluate: clean/noisy/polluted]
F --> J
B --> K[Baselines TFIDF+LR, BERT-Linear + τ]
B --> L[Few-shot K=1/5/10]
J --> M[Reports: zs_summary, heatmaps]
K --> M
L --> M
"""

# ----------------- RUN PACKAGING -----------------
try:
    # 1) Build combined summaries + meta
    build_benchmark_summary()
    build_summary_metrics()
    write_run_meta()

    # 1.a) (quality-of-life) copy zs_summary.csv -> reports/ if exists
    _copy_if_exists(run_path("analytics", "zs_summary.csv"), report_path("zs_summary.csv"))

    # 2) Compose expected list (reports/ scope)
    expected = [
        "intent_descriptions.csv",
        "zs_summary.csv",              # we copy it above if analytics had it
        "baseline_robustness.csv",
        "fewshot_summary.csv",
        "benchmark_summary.csv",       # NEW (reports/)
        "run_meta.json",               # NEW (reports/)
        "zs_confmat_clean.png",
        "zs_confmat_noisy.png",
        "zs_confmat_polluted.png",
        "wordcloud_train.png",
    ]

    present, missing = [], []
    for name in expected:
        path = report_path(name)
        if os.path.exists(path):
            present.append(name)
        else:
            missing.append(name)

    # 3) Write README
    os.makedirs(os.path.dirname(readme), exist_ok=True)
    with open(readme, "w", encoding="utf-8") as f:
        f.write("# Run Report\n\n")
        f.write(f"- RUN_ID: {p['RUN_ID']}\n\n")

        f.write("## Flow (ASCII)\n\n")
        f.write(flow_ascii.strip() + "\n\n")

        f.write("## Flow (Mermaid)\n\n")
        f.write(flow_mermaid.strip() + "\n\n")

        f.write("## Key Artifacts (present)\n")
        if present:
            for n in present:
                f.write(f"- {n}\n")
        else:
            f.write("- (none yet)\n")

        if missing:
            f.write("\n## Key Artifacts (missing / to be generated)\n")
            for n in missing:
                f.write(f"- {n}\n")

    logger.info("Run README saved: %s", readme)
    print("✅ 10 done →", readme)

except Exception as e:
    logger.exception("Failed to pack report")
    log_error_csv(e, where="pack_report_top")
    print("⚠️ Failed; error recorded to errors_report_pack.csv")


2025-09-21 13:08:40,417 | INFO | intentzero2few | Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-130840.log
INFO:intentzero2few:Logger started. path=/content/intentzero2few-repo/runs/20250921-123702/logs/intentzero2few-20250921-130840.log
2025-09-21 13:08:40,441 | INFO | intentzero2few | Run README saved: /content/intentzero2few-repo/reports/20250921-123702/README_run.md
INFO:intentzero2few:Run README saved: /content/intentzero2few-repo/reports/20250921-123702/README_run.md


✅ 10 done → /content/intentzero2few-repo/reports/20250921-123702/README_run.md


# 12.).gitignore’u sertleştirelim (tek seferlik)

TR: Aşağıdaki hücre, eksikse güvenli ignore satırlarını ekler.
EN: This cell appends safe ignore lines if they’re missing.

Bu hücre:

Kök .gitignore’u idempotent şekilde güçlendirir.

reports/.gitignore ekleyerek sadece kürasyon dosyalarını (10. adımda ürettiğin) repo’ya alır; diğer her şeyi yoksayar.

Sonunda hangi dosyaların ignore edildiğini gösterir.

.gitignore whitelist (recursive)

In [20]:
%%bash
set -euo pipefail
REPO_DIR="/content/intentzero2few-repo"
cd "$REPO_DIR"

ensure_line () { grep -qxF "$1" .gitignore || echo "$1" >> .gitignore; }

# Root .gitignore (idempotent)
touch .gitignore
ensure_line ".env"
ensure_line "data/"
ensure_line "runs/"
ensure_line "runs/latest"
ensure_line "reports/latest"
ensure_line "__pycache__/"
ensure_line "*.egg-info/"
ensure_line ".ipynb_checkpoints/"
ensure_line "venv/"
ensure_line ".venv/"
ensure_line "cache/"
ensure_line "*.ckpt"
ensure_line ".DS_Store"

# reports/ altında recursive whitelist
mkdir -p reports
cat > reports/.gitignore <<'EOF'
# Ignore everything under reports/ by default
*

# Allow subdirectories so negations below can match
!*/

# Curated artefacts at any depth (RUN_ID folders)
!**/README_run.md
!**/benchmark_summary.csv
!**/summary_metrics.csv
!**/run_meta.json
!**/zs_summary.csv
!**/super_to_intents.json

# Figures (PNG) — LFS ile track edilecek
!**/*.png
EOF

echo "✅ .gitignore hardened (root + reports whitelist)"
git status --ignored -s


✅ .gitignore hardened (root + reports whitelist)
?? .gitignore
?? README.md
?? pyproject.toml
?? reports/
?? requirements.txt
?? src/
!! .env
!! data/
!! reports/.gitignore
!! reports/20250921-123702/aug_samples_classic_head20.csv
!! reports/20250921-123702/aug_samples_noisy_head20.csv
!! reports/20250921-123702/baseline_robustness.csv
!! reports/20250921-123702/fewshot_summary.csv
!! reports/20250921-123702/intent_descriptions.csv
!! reports/latest
!! runs/
!! src/intentzero2few.egg-info/
!! src/intentzero2few/__pycache__/


# 13.) Raporları topla → commit → push (Colab’ta)

TR: Bu hücre, tezde kullanacağın dosyaları reports/<RUN_ID>/ altına zaten koyduğumuz varsayımıyla sadece kodu ve raporları commit’ler.
EN: This commits code + curated reports only. It won’t add runs/ or data/.

Bu hücre:

Git LFS’i kurar ve reports/*.png’yi track eder (idempotent).

RUN_ID’li mesajla commit atar (sadece .gitignore’ın izin verdiği dosyalar — yani kod + kürasyon raporları).

PAT’i ekrana yazmadan geçici olarak origin URL’ini ayarlar, push eder, sonra eski URL’e geri alır.

Çalışma dalını otomatik bulur.

Önemli güvenlik notu: Notebook’ta açık metin PAT hücresini derhal sil ve GitHub’da token’ı rotate et. Aşağıdaki yöntemde PAT ekrana çıkmaz.

In [38]:
%%bash
set -euo pipefail
cd /content/intentzero2few-repo
# LFS (idempotent)
if ! command -v git-lfs >/dev/null; then
  apt-get -qq update && apt-get -qq install -y git-lfs
  git lfs install
fi
git lfs track "reports/**/*.png" >/dev/null 2>&1 || true
git add .gitattributes || true
git commit -m "chore: ensure LFS for report PNGs" || true

# Curated raporlar stage'e
[ -f .env ] && { set +u; source .env; set -u; }
git add "reports/${RUN_ID}/" || true

# Commit & push
MSG="Exp: pipeline run ${RUN_ID} — packaging update"
git commit -m "$MSG" || { echo "Nothing to commit."; exit 0; }
BRANCH="$(git rev-parse --abbrev-ref HEAD)"
CLEAN_URL="$(git remote get-url origin)"
USER="${GITHUB_USER:-}"; PAT="${GITHUB_PAT:-}"
if [ -n "$USER" ] && [ -n "$PAT" ]; then
  git remote set-url origin "https://${USER}:${PAT}@github.com/${USER}/intentzero2few.git"
  TOKENIZED=1
else
  TOKENIZED=0
fi
git push origin "$BRANCH"
[ "$TOKENIZED" = "1" ] && git remote set-url origin "$CLEAN_URL"
echo "✅ Pushed"


On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   reports/20250921-123702/README_run.md
	modified:   reports/20250921-123702/run_meta.json

no changes added to commit (use "git add" and/or "git commit -a")
[main 38391fe] Exp: pipeline run 20250921-123702 — packaging update
 2 files changed, 8 insertions(+), 4 deletions(-)
✅ Pushed


To https://github.com/mervegulnazerdem/intentzero2few.git
   bedaa58..38391fe  main -> main


In [21]:
%%bash
set -euo pipefail
REPO_DIR="/content/intentzero2few-repo"
cd "$REPO_DIR"

# Git & LFS (idempotent)
if ! command -v git >/dev/null; then
  apt-get -qq update && apt-get -qq install -y git
fi
if ! command -v git-lfs >/dev/null; then
  apt-get -qq update && apt-get -qq install -y git-lfs
  git lfs install
fi

# Recursive PNG tracking
git lfs track "reports/**/*.png" >/dev/null 2>&1 || true
git add .gitattributes || true
git commit -m "chore: track reports/**/*.png with git-lfs" || true

# Commit message with RUN_ID (if .env exists)
RUN_ID=""
if [ -f .env ]; then set +u; source .env; set -u; fi
MSG="Exp: pipeline run ${RUN_ID:-unknown} — code + curated reports"

echo "— Changed files (respecting .gitignore) —"
git status --porcelain

git add -A
git commit -m "$MSG" || { echo "Nothing to commit."; exit 0; }

# Push (token'lı → sonra temizle)
BRANCH="$(git rev-parse --abbrev-ref HEAD)"
CLEAN_URL="$(git remote get-url origin)"
USER="${GITHUB_USER:-}"
PAT="${GITHUB_PAT:-}"

if [ -n "${USER}" ] && [ -n "${PAT}" ]; then
  git remote set-url origin "https://${USER}:${PAT}@github.com/${USER}/intentzero2few.git"
  TOKENIZED=1
else
  TOKENIZED=0
fi

git push origin "$BRANCH"

if [ "$TOKENIZED" = "1" ]; then
  git remote set-url origin "$CLEAN_URL"
fi

echo "✅ Pushed to remote (branch: $BRANCH)"


[main (root-commit) 068c6fa] chore: track reports/**/*.png with git-lfs
 1 file changed, 1 insertion(+)
 create mode 100644 .gitattributes
— Changed files (respecting .gitignore) —
?? .gitignore
?? README.md
?? pyproject.toml
?? reports/
?? requirements.txt
?? src/
[main 67d8c66] Exp: pipeline run 20250921-123702 — code + curated reports
 29 files changed, 1281 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 README.md
 create mode 100644 pyproject.toml
 create mode 100644 reports/20250921-123702/README_run.md
 create mode 100644 reports/20250921-123702/run_meta.json
 create mode 100644 reports/20250921-123702/super_to_intents.json
 create mode 100644 reports/20250921-123702/zs_confmat_clean.png
 create mode 100644 reports/20250921-123702/zs_confmat_noisy.png
 create mode 100644 reports/20250921-123702/zs_confmat_polluted.png
 create mode 100644 reports/20250921-123702/zs_summary.csv
 create mode 100644 requirements.txt
 create mode 100644 src/intentzero2few/__init__.py

To https://github.com/mervegulnazerdem/intentzero2few.git
 * [new branch]      main -> main


In [34]:
!ls -R

.:
data	   pyproject.toml  reports	     runs     src
notebooks  README.md	   requirements.txt  scripts

./data:
clinc150.json

./notebooks:

./reports:
20250921-123702  latest

./reports/20250921-123702:
aug_samples_classic_head20.csv	run_meta.json
aug_samples_noisy_head20.csv	summary_metrics.csv
baseline_robustness.csv		super_to_intents.json
benchmark_summary.csv		zs_confmat_clean.png
fewshot_summary.csv		zs_confmat_noisy.png
intent_descriptions.csv		zs_confmat_polluted.png
README_run.md			zs_summary.csv

./runs:
20250921-123702  latest

./runs/20250921-123702:
analytics  artifacts  figures  logs  reports

./runs/20250921-123702/analytics:
augmentation_manifest.json	  pollution_manifest.json
baseline_summary.csv		  split_stats.csv
baseline_summary_wide.csv	  summary_metrics.csv
benchmark_summary.csv		  super_mapping.csv
cls_report_clean.csv		  test_noisy_in_scope.csv
cls_report_noisy.csv		  test_polluted.csv
cls_report_polluted.csv		  threshold_zero_shot_clean.json
confusion_clean_nor

In [32]:
%%bash
set -euo pipefail
cd /content/intentzero2few-repo

# RUN_ID'yi göster (packaging çalıştıysa .env'de vardır)
[ -f .env ] && { set +u; source .env; set -u; echo "RUN_ID = $RUN_ID"; } || echo "RUN_ID bulunamadı (.env yok)"

echo; echo "— reports/<RUN_ID> içeriği —"
[ -n "${RUN_ID:-}" ] && ls -lah "reports/$RUN_ID" || echo "RUN_ID yok; klasör listelenmedi."

echo; echo "— git status (yalnızca reports/ satırlarını göster) —"
git status --porcelain | awk '/^..[ ]+reports\// {print}'


RUN_ID = 20250921-123702

— reports/<RUN_ID> içeriği —
total 696K
drwxr-xr-x 2 root root 4.0K Sep 21 13:33 .
drwxr-xr-x 3 root root 4.0K Sep 21 13:14 ..
-rw-r--r-- 1 root root 1.4K Sep 21 12:43 aug_samples_classic_head20.csv
-rw-r--r-- 1 root root 1.4K Sep 21 12:43 aug_samples_noisy_head20.csv
-rw-r--r-- 1 root root  170 Sep 21 12:56 baseline_robustness.csv
-rw-r--r-- 1 root root  945 Sep 21 13:33 benchmark_summary.csv
-rw-r--r-- 1 root root 1.2K Sep 21 13:08 fewshot_summary.csv
-rw-r--r-- 1 root root  15K Sep 21 12:44 intent_descriptions.csv
-rw-r--r-- 1 root root 1.8K Sep 21 13:08 README_run.md
-rw-r--r-- 1 root root  430 Sep 21 13:08 run_meta.json
-rw-r--r-- 1 root root  945 Sep 21 13:33 summary_metrics.csv
-rw-r--r-- 1 root root 3.1K Sep 21 12:44 super_to_intents.json
-rw-r--r-- 1 root root 201K Sep 21 12:48 zs_confmat_clean.png
-rw-r--r-- 1 root root 209K Sep 21 12:48 zs_confmat_noisy.png
-rw-r--r-- 1 root root 215K Sep 21 12:48 zs_confmat_polluted.png
-rw-r--r-- 1 root root  239 

In [29]:
%%bash
set -euo pipefail
cd /content/intentzero2few-repo
[ -f .env ] && { set +u; source .env; set -u; } || true

echo "RUN_ID = ${RUN_ID:-N/A}"
echo; echo "— runs/<RUN_ID>/analytics —"
ls -lah "runs/${RUN_ID}/analytics" | awk '{print $9}' | sed 's/^/  • /' || true

echo; echo "— reports/<RUN_ID> —"
ls -lah "reports/${RUN_ID}" | awk '{print $9}' | sed 's/^/  • /' || true


RUN_ID = 20250921-123702

— runs/<RUN_ID>/analytics —
  • 
  • .
  • ..
  • augmentation_manifest.json
  • baseline_summary.csv
  • baseline_summary_wide.csv
  • benchmark_summary.csv
  • cls_report_clean.csv
  • cls_report_noisy.csv
  • cls_report_polluted.csv
  • confusion_clean_norm_all.csv
  • confusion_clean_raw.csv
  • confusion_noisy_norm_all.csv
  • confusion_noisy_raw.csv
  • confusion_polluted_norm_all.csv
  • confusion_polluted_raw.csv
  • discovery_k.json
  • errors_step8_baseline_runner.csv
  • errors_wordcloud.csv
  • intent_descriptions.csv
  • noisy_augmentation_manifest.json
  • noisy_test_manifest.json
  • pollution_manifest.json
  • split_stats.csv
  • summary_metrics.csv
  • super_mapping.csv
  • test_noisy_in_scope.csv
  • test_polluted.csv
  • threshold_zero_shot_clean.json
  • threshold_zero_shot.json
  • threshold_zero_shot_polluted.json
  • train_augmented.csv
  • train_augmented_noisy.csv
  • train_intent_counts.csv
  • val_polluted.csv
  • zs_eval_all.json
  

In [30]:
%%bash
set -euo pipefail
cd /content/intentzero2few-repo
[ -f .env ] && { set +u; source .env; set -u; } || true
: "${RUN_ID:?RUN_ID yok; önce packaging çalıştır.}"

mkdir -p "reports/${RUN_ID}"

# Bu üç CSV'yi raporlara kopyala/yenile
[ -f "runs/${RUN_ID}/analytics/benchmark_summary.csv" ] && \
  cp -f "runs/${RUN_ID}/analytics/benchmark_summary.csv" "reports/${RUN_ID}/benchmark_summary.csv" || true

[ -f "runs/${RUN_ID}/analytics/summary_metrics.csv" ] && \
  cp -f "runs/${RUN_ID}/analytics/summary_metrics.csv" "reports/${RUN_ID}/summary_metrics.csv" || true

[ -f "runs/${RUN_ID}/analytics/zs_summary.csv" ] && \
  cp -f "runs/${RUN_ID}/analytics/zs_summary.csv" "reports/${RUN_ID}/zs_summary.csv" || true

echo "✅ Synced to reports/${RUN_ID}: benchmark_summary.csv + summary_metrics.csv + zs_summary.csv"
ls -l "reports/${RUN_ID}" | awk '{print $9}' | sed 's/^/  • /'


✅ Synced to reports/20250921-123702: benchmark_summary.csv + summary_metrics.csv + zs_summary.csv
  • 
  • aug_samples_classic_head20.csv
  • aug_samples_noisy_head20.csv
  • baseline_robustness.csv
  • benchmark_summary.csv
  • fewshot_summary.csv
  • intent_descriptions.csv
  • README_run.md
  • run_meta.json
  • summary_metrics.csv
  • super_to_intents.json
  • zs_confmat_clean.png
  • zs_confmat_noisy.png
  • zs_confmat_polluted.png
  • zs_summary.csv


In [31]:
%%bash
set -euo pipefail
cd /content/intentzero2few-repo
[ -f .env ] && { set +u; source .env; set -u; } || true
: "${RUN_ID:?}"

# LFS (idempotent)
if ! command -v git-lfs >/dev/null; then
  apt-get -qq update && apt-get -qq install -y git-lfs
  git lfs install
fi
git lfs track "reports/**/*.png" >/dev/null 2>&1 || true
git add .gitattributes || true
git commit -m "chore: ensure LFS for report PNGs" || true

# Curated raporları stage’e al
git add "reports/${RUN_ID}/benchmark_summary.csv" 2>/dev/null || true
git add "reports/${RUN_ID}/summary_metrics.csv"   2>/dev/null || true
git add "reports/${RUN_ID}/zs_summary.csv"        2>/dev/null || true

# (Genel ekleme — diğer raporlar da gelsin)
git add -A

echo "— staged (reports/*) —"
git status --porcelain | awk '/^..[ ]+reports\// {print}'

# Commit
MSG="Exp: pipeline run ${RUN_ID} — curated tables sync"
git commit -m "$MSG" || { echo "Nothing to commit."; exit 0; }

# Push (token'lı → sonra temizle)
BRANCH="$(git rev-parse --abbrev-ref HEAD)"
CLEAN_URL="$(git remote get-url origin)"
USER="${GITHUB_USER:-}"; PAT="${GITHUB_PAT:-}"
if [ -n "$USER" ] && [ -n "$PAT" ]; then
  git remote set-url origin "https://${USER}:${PAT}@github.com/${USER}/intentzero2few.git"
  TOKENIZED=1
else
  TOKENIZED=0
fi
git push origin "$BRANCH"
[ "$TOKENIZED" = "1" ] && git remote set-url origin "$CLEAN_URL"

echo "✅ Pushed curated CSVs to GitHub"


On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	reports/20250921-123702/benchmark_summary.csv
	reports/20250921-123702/summary_metrics.csv

nothing added to commit but untracked files present (use "git add" to track)
— staged (reports/*) —
A  reports/20250921-123702/benchmark_summary.csv
A  reports/20250921-123702/summary_metrics.csv
[main bedaa58] Exp: pipeline run 20250921-123702 — curated tables sync
 2 files changed, 28 insertions(+)
 create mode 100644 reports/20250921-123702/benchmark_summary.csv
 create mode 100644 reports/20250921-123702/summary_metrics.csv
✅ Pushed curated CSVs to GitHub


To https://github.com/mervegulnazerdem/intentzero2few.git
   0b9ca61..bedaa58  main -> main
