## Visualizing Results of the Deletion Capacity Experiment

These are the results of the deletion capacity experiment. 

At a high level, we're seeing very conservative regret bounds for the Memory Pair. This means that we're requiring large sample complexity in return for a very low deletion capacity.

It's also worth noting that our sample complexity (the bar for a good learner) increases as the data wiggles more. When the Lipschitz constant and the upper bound on the Hessian are high, the sample complexity jumps and the amount of noise injected into the model becomes high enough to destabilize it.

Goals:
- Analyze the simulation results from the experiment runs and visualize the cumulative regret
- Focus on $\widehat{G}$ such that we can see its impact on the downstream stability of the learner
- Investigate alternative methods of privacy accounting. Can we get tighter regret bounds so that we don't inject so much noise into the parameter estimates?


### Page-Wide Questions
- The formulas for sample complexity and deletion capacity look very similar (i.e., both use the $GD$ term). Why is this the case, and what does it suggest about the relationship between the two? If I were to divide sample complexity by deletion capacity, the result would look something like a harmonic mean.
- I wonder how $\widehat{D}$ is being estimated. It looks like a lot of seeds are capping it at 10, which is a worst-case scenario. Is there something that can reduce this?
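
One way to quantify how often the cap binds is to count the seeds whose final `D_hat` sits at the cap. The following is a minimal sketch, assuming the frame carries `seed` and `D_hat` columns and that the cap is 10 as observed above; `fraction_at_cap` is a hypothetical helper, not part of the experiment code, and the toy values are illustrative only.

```python
import pandas as pd

def fraction_at_cap(df: pd.DataFrame, cap: float = 10.0, tol: float = 1e-9) -> float:
    """Share of seeds whose last logged D_hat is at (or above) the assumed cap."""
    # groupby(...).last() returns the last non-null D_hat for each seed
    final_d = df.groupby("seed")["D_hat"].last().dropna()
    if final_d.empty:
        return float("nan")
    return float((final_d >= cap - tol).mean())

# Toy frame standing in for the real event log (illustrative values)
toy = pd.DataFrame({
    "seed": [0, 0, 1, 1, 2, 2],
    "D_hat": [None, 4.7, None, 10.0, None, 10.0],
})
capped_share = fraction_at_cap(toy)  # two of the three toy seeds are capped
```

A high share on the real sweep would suggest the cap, rather than the data, is driving $\widehat{D}$.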

In [2]:
import pandas as pd
import numpy as np
import random
import re
import os
import seaborn as sns
import matplotlib.pyplot as plt

In [3]:
path = "/workspaces/unlearning-research-meta/experiments/deletion_capacity/results/grid_2025_08_11/sweep/gamma_1.0-split_0.5_q0.90_k5_relaxed_eps1.0/seed_000_synthetic_memorypair.csv"

data = pd.read_csv(path)

In [5]:
data.columns

Index(['C_hat', 'D_hat', 'G_hat', 'N_star_theory', 'P_T', 'P_T_est',
       'S_scalar', 'acc', 'accountant_type', 'base_eta_t', 'c_hat',
       'capacity_remaining', 'comparator_type', 'deletion_capacity',
       'deletions_count', 'delta_step_theory', 'delta_total',
       'drift_boost_remaining', 'drift_flag', 'eps_converted',
       'eps_step_theory', 'eta_t', 'event', 'event_id', 'event_type',
       'lambda_est', 'm_theory', 'op', 'regret', 'relaxation_factor',
       'rho_remaining', 'rho_spent', 'rho_step', 'rho_total', 'sample_id',
       'segment_id', 'sens_delete', 'sigma_step_base', 'sigma_step_theory',
       'x_norm'],
      dtype='object')

In [6]:
"""
Analyze deletion-capacity runs (hashed grid IDs).
- Reads sweep/manifest.(json|csv), per-grid params.json, seed_*.csv, seed_*_events.csv
- Computes calibration/sample complexity, odometer/noise checks, regret vs bounds,
  capacity alignment, drift responsiveness, and comparator/path-length checks.
- Produces plots and CSVs under results/assessment/, and an optional Markdown summary.

USAGE
-----
# typical (auto-detect manifest in <base-out>/sweep):
python analyze_capacity.py --sweep-dir results/grid_2025_08_11/sweep --out-dir results/assessment

# limit to a few grids/seeds for fast sanity:
python analyze_capacity.py --sweep-dir results/grid_2025_08_11/sweep --max-grids 3 --max-seeds 2

# also write a high-level Markdown summary:
python analyze_capacity.py --sweep-dir results/grid_2025_08_11/sweep --write-report

OUTPUTS
-------
results/assessment/
  ├── summary_tables.csv                 # per (grid,seed) summary
  ├── missing_fields.csv                 # schema presence matrix
  ├── <grid_id>/
  │    ├── seed_<seed>/
  │    │    ├── regret_vs_bounds.png
  │    │    ├── m_live_vs_emp.png
  │    │    ├── noise_check_sample.csv
  │    │    └── metrics.json
  │    └── grid_summary.json
  └── REPORT.md (if --write-report)
"""

import argparse
import json
import math
import os
from pathlib import Path
from typing import Dict, Any, List, Tuple, Optional

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

try:
    from scipy.stats import spearmanr, kendalltau
    HAVE_SCIPY = True
except Exception:
    HAVE_SCIPY = False


# ----------------------------- IO helpers -----------------------------

def _ensure_dir(p: Path):
    p.mkdir(parents=True, exist_ok=True)

def load_manifest(sweep_dir: Path) -> Dict[str, Dict[str, Any]]:
    """Return map: grid_id -> params dict."""
    jpath = sweep_dir / "manifest.json"
    cpath = sweep_dir / "manifest.csv"
    if jpath.exists():
        return json.loads(jpath.read_text())
    elif cpath.exists():
        df = pd.read_csv(cpath)
        out = {}
        for _, row in df.iterrows():
            gid = str(row["grid_id"])
            params = row.drop(labels=["grid_id"]).to_dict()
            out[gid] = params
        return out
    else:
        raise FileNotFoundError(f"manifest.json/csv not found under {sweep_dir}")

def load_grid_params(grid_dir: Path) -> Dict[str, Any]:
    p = grid_dir / "params.json"
    if p.exists():
        try:
            return json.loads(p.read_text())
        except Exception:
            pass
    return {}

def find_seed_csvs(grid_dir: Path) -> List[Path]:
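    # "seed_*.csv" also matches seed_<seed>_events.csv; keep only the per-seed logs
    return sorted(p for p in grid_dir.glob("seed_*.csv") if not p.stem.endswith("_events"))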

def find_event_csv(seed_csv: Path) -> Optional[Path]:
    # expect sibling seed_<seed>_events.csv
    events = seed_csv.with_name(seed_csv.stem + "_events.csv")
    return events if events.exists() else None

def coerce_numeric(v):
    if isinstance(v, str):
        try:
            if any(c in v.lower() for c in (".", "e")):
                return float(v)
            return int(v)
        except Exception:
            return v
    return v

def first_present(d: Dict[str, Any], keys: List[str], default=None):
    for k in keys:
        if k in d and pd.notna(d[k]):
            return d[k]
    return default


# ----------------------------- math helpers -----------------------------

def red_ceil(x: float) -> int:
    try:
        return int(math.ceil(float(x)))
    except Exception:
        return int(x)

def safe_spearman(x: np.ndarray, y: np.ndarray) -> float:
    if len(x) < 3 or len(y) < 3:
        return np.nan
    if HAVE_SCIPY:
        r, _ = spearmanr(x, y, nan_policy="omit")
        return float(r)
    # fallback: manual rank corr
    xr = pd.Series(x).rank().to_numpy()
    yr = pd.Series(y).rank().to_numpy()
    xc = (xr - xr.mean()) / (xr.std() + 1e-12)
    yc = (yr - yr.mean()) / (yr.std() + 1e-12)
    return float(np.nanmean(xc * yc))

def safe_kendall(x: np.ndarray, y: np.ndarray) -> float:
    if len(x) < 3 or len(y) < 3:
        return np.nan
    if HAVE_SCIPY:
        r, _ = kendalltau(x, y, nan_policy="omit")
        return float(r)
    return np.nan


# ----------------------------- analysis primitives -----------------------------

def derive_gamma_split(params: Dict[str, Any], row: Dict[str, Any]) -> Tuple[Optional[float], Optional[float], Tuple[Optional[float], Optional[float]]]:
    """Return (gamma_bar, alpha, (gamma_ins, gamma_del)) using CSV columns first, else params."""
    gamma_bar = first_present(row, ["gamma_bar"], params.get("gamma_bar"))
    alpha = first_present(row, ["gamma_split", "alpha"], params.get("gamma_split", params.get("alpha")))
    if gamma_bar is None or alpha is None:
        return None, None, (None, None)
    try:
        gamma_bar = float(gamma_bar)
        alpha = float(alpha)
        gamma_ins = (1.0 - alpha) * gamma_bar
        gamma_del = alpha * gamma_bar
        return gamma_bar, alpha, (gamma_ins, gamma_del)
    except Exception:
        return None, None, (None, None)

def rederive_N_star(G, D, c, C, gamma_ins) -> Optional[int]:
    try:
        val = (G * D * math.sqrt(c * C) / max(1e-12, gamma_ins)) ** 2
        return red_ceil(val)
    except Exception:
        return None

def adaptive_regret_bound(G, D, c, C, S_T) -> Optional[float]:
    try:
        return G * D * math.sqrt(c * C * S_T)
    except Exception:
        return None

def static_regret_bound(G, lam, c, T) -> Optional[float]:
    try:
        return (G ** 2 / max(1e-12, lam * c)) * (1.0 + math.log(max(1, int(T))))
    except Exception:
        return None

def dynamic_regret_bound(G, lam, c, T, P_T) -> Optional[float]:
    stat = static_regret_bound(G, lam, c, T)
    if stat is None or P_T is None:
        return None
    try:
        return stat + G * P_T
    except Exception:
        return None

def recompute_sigma_epsdelta(sigma_base, eps_step, delta_step) -> Optional[float]:
    """sigma_base should be L / lambda_est; formula: base * sqrt(2 ln(1.25/delta_step))/eps_step"""
    try:
        return sigma_base * (math.sqrt(2.0 * math.log(1.25 / max(1e-18, float(delta_step)))) / max(1e-18, float(eps_step)))
    except Exception:
        return None

def recompute_sigma_zcdp(sens_delete, rho_step) -> Optional[float]:
    try:
        return sens_delete / math.sqrt(2.0 * max(1e-18, float(rho_step)))
    except Exception:
        return None


# ----------------------------- per-seed analysis -----------------------------

def analyze_seed(
    grid_id: str,
    seed_csv: Path,
    params: Dict[str, Any],
    out_dir: Path
) -> Dict[str, Any]:
    """
    Returns a metrics dict and writes per-seed artifacts to out_dir.
    """
    _ensure_dir(out_dir)
    df = pd.read_csv(seed_csv)

    # Merge in params for missing columns
    for k, v in params.items():
        if k not in df.columns:
            df[k] = v

    # Best-effort dataset/accountant
    dataset = first_present(df.iloc[-1].to_dict(), ["dataset"], params.get("dataset", "unknown"))
    acct = first_present(df.iloc[-1].to_dict(), ["accountant", "accountant_type"], params.get("accountant", "unknown"))

    # Basic signals
    G = float(first_present(df.iloc[-1].to_dict(), ["G_hat"], np.nan) or np.nan)
    D = float(first_present(df.iloc[-1].to_dict(), ["D_hat"], np.nan) or np.nan)
    c = float(first_present(df.iloc[-1].to_dict(), ["c_hat"], np.nan) or np.nan)
    C = float(first_present(df.iloc[-1].to_dict(), ["C_hat"], np.nan) or np.nan)
    lam = float(first_present(df.iloc[-1].to_dict(), ["lambda_est", "lambda_"], params.get("lambda_", np.nan)) or np.nan)
    S_T = float(first_present(df.iloc[-1].to_dict(), ["S_scalar"], np.nan) or np.nan)

    gamma_bar, alpha, (gamma_ins, gamma_del) = derive_gamma_split(params, df.iloc[-1].to_dict())

    # Re-derive N*
    N_star_logged = first_present(df.iloc[-1].to_dict(), ["N_star_theory", "N_star_live"], np.nan)
    N_star_derived = rederive_N_star(G, D, c, C, gamma_ins) if all(np.isfinite([G, D, c, C, gamma_ins or np.nan])) else np.nan

    # Capacity alignment (end-of-run)
    m_theory = first_present(df.iloc[-1].to_dict(), ["m_theory", "m_theory_live"], np.nan)
    m_emp = int((df["op"] == "delete").sum()) if "op" in df.columns else np.nan

    # Regret series (if per-event available, prefer that; otherwise cumsum if 'regret' is per-event already)
    # Seed CSVs in your setup are per-event when using output_granularity=event; otherwise seed-level summaries can't plot timeseries.
    # We handle both: if 'event_id' present and many rows, treat as event log.
    is_event_like = "event_id" in df.columns and df["event_id"].nunique() > 10
    if is_event_like:
        # cumulative regret
        if "regret" in df.columns:
            R_t = df["regret"].cumsum().to_numpy()
        else:
            R_t = np.array([])
        T = len(R_t)
        # adaptive/static/dynamic bounds
        R_adapt = adaptive_regret_bound(G, D, c, C, S_T) if np.isfinite(S_T) else np.nan
        R_static = static_regret_bound(G, lam, c, T) if np.isfinite(lam) and np.isfinite(c) else np.nan
        P_T = first_present(df.iloc[-1].to_dict(), ["P_T", "P_T_est"], np.nan)
        R_dyn = dynamic_regret_bound(G, lam, c, T, P_T) if np.isfinite(P_T) else np.nan

        # Plot regret vs bounds
        if R_t.size > 0:
            plt.figure()
            plt.plot(np.arange(1, T + 1), R_t, label="R_T (empirical)")
            if np.isfinite(R_adapt): plt.axhline(R_adapt, linestyle="--", label="R_adapt (bound)")
            if np.isfinite(R_static): plt.axhline(R_static, linestyle="--", label="R_static (bound)")
            if np.isfinite(R_dyn): plt.axhline(R_dyn, linestyle="--", label="R_dyn (bound)")
            plt.xlabel("Event")
            plt.ylabel("Cumulative regret")
            plt.title(f"{grid_id} / seed {seed_csv.stem.split('_')[-1]} — regret vs bounds")
            plt.legend()
            plt.tight_layout()
            plt.savefig(out_dir / "regret_vs_bounds.png", dpi=150)
            plt.close()
    else:
        R_t, R_adapt, R_static, R_dyn, T = np.array([]), np.nan, np.nan, np.nan, np.nan

    # Per-delete noise calibration sample
    noise_sample = []
    if "op" in df.columns:
        deletes = df[df["op"] == "delete"].copy()
        if not deletes.empty:
            # choose ~5 evenly spaced deletes
            idxs = np.linspace(0, len(deletes) - 1, num=min(5, len(deletes)), dtype=int)
            for _, row in deletes.iloc[idxs].iterrows():
                acct_row = str(first_present(row.to_dict(), ["accountant_type", "accountant"], acct))
                sigma_code = row.get("sigma_step_theory", np.nan)
                if acct_row in ("eps_delta", "default", "legacy"):
                    sigma_base = row.get("sigma_step_base", np.nan)  # expected L/lambda
                    eps_step = row.get("eps_step_theory", np.nan)
                    delta_step = row.get("delta_step_theory", np.nan)
                    sigma_theory = recompute_sigma_epsdelta(sigma_base, eps_step, delta_step)
                elif acct_row in ("zcdp", "rdp"):
                    sens = row.get("sens_delete", np.nan)
                    rho_step = row.get("rho_step", np.nan)
                    sigma_theory = recompute_sigma_zcdp(sens, rho_step)
                else:
                    sigma_theory = np.nan
                abs_err = np.nan
                rel_err = np.nan
                if np.isfinite(sigma_code) and sigma_theory is not None and np.isfinite(sigma_theory):
                    abs_err = float(abs(sigma_code - sigma_theory))
                    denom = max(1e-12, abs(sigma_theory))
                    rel_err = float(abs_err / denom)
                noise_sample.append({
                    "event_id": int(row.get("event_id", -1)) if "event_id" in row else -1,
                    "sigma_code": sigma_code,
                    "sigma_recomputed": sigma_theory,
                    "abs_err": abs_err,
                    "rel_err": rel_err,
                    "accountant_type": acct_row,
                    "sens_delete": row.get("sens_delete", np.nan),
                    "eps_step_theory": row.get("eps_step_theory", np.nan),
                    "delta_step_theory": row.get("delta_step_theory", np.nan),
                    "rho_step": row.get("rho_step", np.nan),
                })
            # write sample
            pd.DataFrame(noise_sample).to_csv(out_dir / "noise_check_sample.csv", index=False)

    # Capacity vs deletes over time (if event log)
    corr_m = np.nan
    if is_event_like and "m_theory" in df.columns and "op" in df.columns:
        # build a time series of realized cumulative deletes
        cum_del = (df["op"] == "delete").astype(int).cumsum().to_numpy()
        m_series = df["m_theory"].to_numpy()
        # spearman correlation across time
        corr_m = safe_spearman(cum_del, m_series)
        # plot
        plt.figure()
        plt.plot(cum_del, label="cumulative deletes (emp)")
        plt.plot(m_series, label="m_theory (live/logged)")
        if "capacity_remaining" in df.columns:
            plt.plot(df["capacity_remaining"].to_numpy(), label="capacity_remaining")
        plt.xlabel("Event")
        plt.legend()
        plt.title(f"{grid_id} / seed {seed_csv.stem.split('_')[-1]} — capacity vs deletes")
        plt.tight_layout()
        plt.savefig(out_dir / "m_live_vs_emp.png", dpi=150)
        plt.close()

    # Schema presence (for quick matrix later)
    required = ["G_hat", "D_hat", "sigma_step_theory"]
    missing = [col for col in required if col not in df.columns]

    # Summaries
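    metrics = {
        "grid_id": grid_id,
        # seed is the token right after "seed_"; stems may carry suffixes (e.g. seed_000_synthetic_memorypair)
        "seed": int(seed_csv.stem.split("_")[1]),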
        "dataset": dataset,
        "accountant_type": acct,
        "G_hat": G, "D_hat": D, "c_hat": c, "C_hat": C, "lambda_est": lam,
        "S_scalar": S_T,
        "gamma_bar": gamma_bar, "gamma_split": alpha,
        "gamma_ins": gamma_ins, "gamma_del": gamma_del,
        "N_star_logged": float(N_star_logged) if pd.notna(N_star_logged) else np.nan,
        "N_star_derived": float(N_star_derived) if N_star_derived is not None else np.nan,
        "N_star_rel_err": (
            abs(float(N_star_logged) - float(N_star_derived)) / max(1.0, float(N_star_derived))
            if pd.notna(N_star_logged) and N_star_derived not in (None, 0, np.nan) else np.nan
        ),
        "m_theory_final": float(m_theory) if pd.notna(m_theory) else np.nan,
        "m_emp_final": float(m_emp) if pd.notna(m_emp) else np.nan,
        "m_rel_err": (
            abs(float(m_theory) - float(m_emp)) / max(1.0, float(m_theory))
            if pd.notna(m_theory) and pd.notna(m_emp) and float(m_theory) > 0 else np.nan
        ),
        "regret_T": float(df["regret"].sum()) if "regret" in df.columns else np.nan,
        "R_adapt": float(adaptive_regret_bound(G, D, c, C, S_T)) if all(np.isfinite([G, D, c, C, S_T])) else np.nan,
        "R_static": float(static_regret_bound(G, lam, c, len(df))) if all(np.isfinite([G, lam, c])) else np.nan,
        "R_dynamic": float(dynamic_regret_bound(G, lam, c, len(df), first_present(df.iloc[-1].to_dict(), ["P_T", "P_T_est"], np.nan))) if all(np.isfinite([G, lam, c])) else np.nan,
        "spearman_m_live_vs_emp": corr_m,
        "missing_required_fields": ",".join(missing) if missing else "",
    }

    # dump metrics for the seed
    with open(out_dir / "metrics.json", "w") as f:
        json.dump(metrics, f, indent=2, sort_keys=True)

    return metrics


# ----------------------------- per-grid analysis -----------------------------

def analyze_grid(grid_id: str, grid_dir: Path, params: Dict[str, Any], out_root: Path, max_seeds: Optional[int]) -> List[Dict[str, Any]]:
    grid_out = out_root / grid_id
    _ensure_dir(grid_out)
    # persist params for convenience
    with open(grid_out / "grid_summary.json", "w") as f:
        json.dump(params, f, indent=2, sort_keys=True)

    seed_csvs = find_seed_csvs(grid_dir)
    if max_seeds:
        seed_csvs = seed_csvs[:max_seeds]

    all_metrics = []
    for seed_csv in seed_csvs:
        seed_out = grid_out / seed_csv.stem
        m = analyze_seed(grid_id, seed_csv, params, seed_out)
        all_metrics.append(m)
    return all_metrics


# ----------------------------- report writer -----------------------------

def write_summary_tables(all_metrics: List[Dict[str, Any]], out_dir: Path):
    if not all_metrics:
        return
    df = pd.DataFrame(all_metrics)
    df.to_csv(out_dir / "summary_tables.csv", index=False)

    # presence matrix
    presence_rows = []
    for m in all_metrics:
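        presence_rows.append({
            "grid_id": m["grid_id"],
            "seed": m["seed"],
            "has_G_hat": not pd.isna(m.get("G_hat", np.nan)),
            "has_D_hat": not pd.isna(m.get("D_hat", np.nan)),
            # true whenever sigma_step_theory itself is present, even if other required fields are missing
            "has_sigma_step_theory": "sigma_step_theory" not in m.get("missing_required_fields", "").split(","),
        })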
    pd.DataFrame(presence_rows).to_csv(out_dir / "missing_fields.csv", index=False)

def write_report(markdown_path: Path, summary_csv: Path):
    lines = []
    lines.append("# Deletion Capacity — Early Analysis Report\n")
    lines.append("This report summarizes calibration, odometer, and regret checks across grids/seeds.\n")
    lines.append(f"- Summary table: `{summary_csv}`\n")
    lines.append("\n## Key Checks\n")
    lines.append("- Re-derived N* within 5% of logged value.\n")
    lines.append("- Noise recomputation rel-error ≤ 5% for sampled deletes.\n")
    lines.append("- Spearman(m_theory vs cumulative deletes) ≥ 0.6 where event logs exist.\n")
    lines.append("\n## Artifacts\n")
    lines.append("- Per-grid, per-seed plots under `results/assessment/<grid_id>/seed_<seed>/`.\n")
    markdown_path.write_text("\n".join(lines))


# ----------------------------- main -----------------------------

def main(argv: Optional[List[str]] = None):
    # argv=None falls back to sys.argv; pass an explicit list when calling from a notebook
    ap = argparse.ArgumentParser()
    ap.add_argument("--sweep-dir", required=True, help="Path to results/.../sweep directory")
    ap.add_argument("--out-dir", default="results/assessment", help="Where to write plots/tables")
    ap.add_argument("--max-grids", type=int, default=None, help="Limit number of grids")
    ap.add_argument("--max-seeds", type=int, default=None, help="Limit number of seeds per grid")
    ap.add_argument("--write-report", action="store_true", help="Write a brief Markdown summary")
    args = ap.parse_args(argv)

    sweep_dir = Path(args.sweep_dir)
    out_dir = Path(args.out_dir)
    _ensure_dir(out_dir)

    manifest = load_manifest(sweep_dir)
    grid_ids = sorted(manifest.keys())
    if args.max_grids:
        grid_ids = grid_ids[:args.max_grids]

    all_metrics: List[Dict[str, Any]] = []

    for i, gid in enumerate(grid_ids, 1):
        print(f"[{i}/{len(grid_ids)}] Analyzing grid {gid}")
        grid_dir = sweep_dir / gid
        if not grid_dir.exists():
            print(f"  ! grid dir missing: {grid_dir}")
            continue
        # load per-grid params (prefer params.json, fallback to manifest entry)
        params = load_grid_params(grid_dir)
        if not params:
            params = manifest[gid]
        # coerce possible numeric strings
        params = {k: coerce_numeric(v) for k, v in params.items()}

        metrics = analyze_grid(gid, grid_dir, params, out_dir, args.max_seeds)
        all_metrics.extend(metrics)

    write_summary_tables(all_metrics, out_dir)
    if args.write_report:
        write_report(out_dir / "REPORT.md", out_dir / "summary_tables.csv")

    print(f"\nDone. Artifacts at: {out_dir.resolve()}")


if __name__ == "__main__":
    main()


usage: ipykernel_launcher.py [-h] --sweep-dir SWEEP_DIR [--out-dir OUT_DIR]
                             [--max-grids MAX_GRIDS] [--max-seeds MAX_SEEDS]
                             [--write-report]
ipykernel_launcher.py: error: the following arguments are required: --sweep-dir


SystemExit: 2

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
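
The `SystemExit: 2` above is what `argparse` raises when a required flag is missing: inside a notebook, `parse_args()` with no arguments reads the kernel's own `sys.argv`, which has no `--sweep-dir`. A minimal sketch of one workaround is to pass an explicit argument list. Here `build_parser` is a hypothetical helper that mirrors two of the script's flags, and the path is the example from the usage notes.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the parser built inside main()."""
    ap = argparse.ArgumentParser()
    ap.add_argument("--sweep-dir", required=True)
    ap.add_argument("--max-seeds", type=int, default=None)
    return ap

# With an explicit list, argparse never looks at the kernel's sys.argv
notebook_args = build_parser().parse_args(
    ["--sweep-dir", "results/grid_2025_08_11/sweep", "--max-seeds", "2"]
)
```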


The code performs a grid search over the experiment parameters. For each seed,
1. **(Calibration.)** a `Calibrator` object draws a small sample of the data stream to estimate stream attributes such as the Lipschitz constant $G$, the upper and lower bounds on the Hessian eigenvalues $C$ and $c$, and the resulting sample complexity required to meet predefined accuracy goals.
2. **(Warmup.)** the model is trained on a stream of samples until it reaches sample complexity. This sets the model up for success when we test deletions.
3. **(Workload.)** a stream of interleaved insertions and deletions is passed to the model. It's expected to service the requests in the order they're given.
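
The three phases can be separated in the event log through the `event_type` column. The following is a minimal sketch, assuming `regret` holds per-event regret (as the analysis script also assumes) and treating `insert` and `delete` rows together as the workload phase; `regret_by_phase` and `PHASE_OF` are hypothetical names, and the toy values are illustrative only.

```python
import pandas as pd

# Map logged event types onto the three phases described above
PHASE_OF = {
    "calibrate": "calibration",
    "warmup": "warmup",
    "insert": "workload",
    "delete": "workload",
}

def regret_by_phase(df: pd.DataFrame) -> pd.Series:
    """Total regret accumulated in each phase of a single run."""
    phase = df["event_type"].map(PHASE_OF)
    return df.groupby(phase)["regret"].sum()

toy = pd.DataFrame({
    "event_type": ["calibrate", "warmup", "warmup", "insert", "delete"],
    "regret": [1.0, 0.5, 0.25, 0.1, 0.2],
})
phase_regret = regret_by_phase(toy)
```

Splitting the totals this way shows whether regret is dominated by the warmup or by the interleaved workload.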

In [None]:
summary_statistics = data.describe()
summary_statistics.columns

ValueError: Cannot describe a DataFrame without columns

In [None]:
# get the individual event types
print(data["event_type"].unique())
print(data["event_type"].value_counts())

['calibrate' 'warmup' 'insert' 'delete']
event_type
warmup       169493
calibrate      2500
insert         1932
delete         1927
Name: count, dtype: int64


In [None]:
data

Unnamed: 0,C_hat,D_hat,G_hat,N_star_theory,acc,c_hat,capacity_remaining,delta_step_theory,delta_total,eps_spent,...,sigma_step_theory,gamma_learning,gamma_privacy,quantile,deletion_ratio,accountant_type,privacy_budget,seed,data_stream_type,algorithm
0,,,,,1.995444e+00,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
1,,,,,2.302350e+00,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
2,,,,,5.028844e+00,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
3,,,,,1.397093e+01,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
4,,,,,8.504764e+00,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
175847,1.0,4.710798,4.605077,1883.0,1.305034e+27,1.0,,1.426534e-08,0.00001,0.997147,...,195235.982215,0.5,0.5,0.90,1,legacy,1.0,0,synthetic,memorypair
175848,1.0,4.710798,4.605077,1883.0,,1.0,,1.426534e-08,0.00001,0.998573,...,195235.982215,0.5,0.5,0.90,1,legacy,1.0,0,synthetic,memorypair
175849,1.0,4.710798,4.605077,1883.0,9.006921e+25,1.0,,1.426534e-08,0.00001,0.998573,...,195235.982215,0.5,0.5,0.90,1,legacy,1.0,0,synthetic,memorypair
175850,1.0,4.710798,4.605077,1883.0,,1.0,,1.426534e-08,0.00001,1.000000,...,195235.982215,0.5,0.5,0.90,1,legacy,1.0,0,synthetic,memorypair


In [None]:
seed_level_data = data.loc[data["event_type"].isnull()]
event_level_data = data.loc[~data["event_type"].isnull()]

In [None]:
event_level_data

Unnamed: 0,C_hat,D_hat,G_hat,N_star_theory,acc,c_hat,capacity_remaining,delta_step_theory,delta_total,eps_spent,...,sigma_step_theory,gamma_learning,gamma_privacy,quantile,deletion_ratio,accountant_type,privacy_budget,seed,data_stream_type,algorithm
0,,,,,1.995444e+00,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
1,,,,,2.302350e+00,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
2,,,,,5.028844e+00,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
3,,,,,1.397093e+01,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
4,,,,,8.504764e+00,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
175847,1.0,4.710798,4.605077,1883.0,1.305034e+27,1.0,,1.426534e-08,0.00001,0.997147,...,195235.982215,0.5,0.5,0.90,1,legacy,1.0,0,synthetic,memorypair
175848,1.0,4.710798,4.605077,1883.0,,1.0,,1.426534e-08,0.00001,0.998573,...,195235.982215,0.5,0.5,0.90,1,legacy,1.0,0,synthetic,memorypair
175849,1.0,4.710798,4.605077,1883.0,9.006921e+25,1.0,,1.426534e-08,0.00001,0.998573,...,195235.982215,0.5,0.5,0.90,1,legacy,1.0,0,synthetic,memorypair
175850,1.0,4.710798,4.605077,1883.0,,1.0,,1.426534e-08,0.00001,1.000000,...,195235.982215,0.5,0.5,0.90,1,legacy,1.0,0,synthetic,memorypair


### What does this data mean?

The data we get from the experiment is incredibly granular. This is good because we can isolate the impact of different operations on the regret. A list of the parameters is included below:

`Data Stream Attributes`
- $q$ is the quantile used for selecting the parameter estimates, so that a single extreme value doesn't inflate an estimate
- $\widehat{C}$ is the upper bound on the Hessian eigenvalues
- $\widehat{c}$ is the lower bound on the Hessian eigenvalues
- $\widehat{D}$ is the upper bound of the diameter of the ellipsoid
- $\widehat{G}$ is the Lipschitz constant of the function, representing how much the output of the function changes as the inputs change
- $N^{\star}_{theory}$ is the theoretical sample complexity to reach the specified amount of average-regret

`Workload Parameters`
- $k$ is the number of insertions per delete operation
- $m_{emp}$ is the empirical deletion capacity of the seed

`Privacy Parameters`
- $\delta_{total}$ and $\varepsilon_{total}$ are the total $(\varepsilon,\delta)$ budget given to the accountant
- $\delta_{step}$ and $\varepsilon_{step}$ are the amount of privacy "spent" per deletion


`Event-Level Attributes`
- $event$ is the zero-based index of the operation within the seed run
- $avg\_regret\_empirical$ is the mean per-operation regret for the stream of events up to this point

## Theoretical Sample Complexities

We can calculate theoretical sample complexities using the data we get from calibration. 

The formula for the sample complexity depends entirely on the attributes of our data stream and its spread ($G$, $D$, and $\sqrt{cC}$), so the estimates from the calibration period matter a great deal. A large estimate for the Lipschitz constant, or for the bounds on the Hessian eigenvalues, means we'll have an artificially inflated sample complexity.

$$
N^{\star} = \left( \frac{GD\sqrt{cC}}{\gamma_{learn}} \right)^{2}
$$

It's also worth noting that the sample complexity is already quite conservative because of the method used for accounting.
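
As a sanity check on the formula, the calibration values printed in the last rows of the truncated table earlier can be plugged in directly. This is a minimal sketch that mirrors `rederive_N_star` from the analysis script (including its rounding up); it assumes the logged `gamma_learning` column plays the role of $\gamma_{learn}$, and `sample_complexity` is a hypothetical helper name.

```python
import math

def sample_complexity(G: float, D: float, c: float, C: float, gamma_learn: float) -> int:
    """Return ceil((G * D * sqrt(c * C) / gamma_learn) ** 2)."""
    return math.ceil((G * D * math.sqrt(c * C) / gamma_learn) ** 2)

# G_hat, D_hat, c_hat, C_hat and gamma_learning copied from the truncated output
n_star = sample_complexity(G=4.605077, D=4.710798, c=1.0, C=1.0, gamma_learn=0.5)
print(n_star)  # 1883, the N_star_theory logged for that row
```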

In [None]:
sample_complexity_calculations = seed_level_data[["seed", "N_star_theory", "C_hat", "c_hat", "D_hat", "G_hat", "gamma_learning"]]
sample_complexity_calculations

Unnamed: 0,seed,N_star_theory,C_hat,c_hat,D_hat,G_hat,gamma_learning
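
The empty table is consistent with `seed_level_data` keeping only rows whose `event_type` is null: the event counts above suggest essentially every row carries an event type. A possible alternative, sketched here under the assumption that calibration statistics are constant once logged and that a `seed` column exists, is to take the last non-null value per seed from the event log. `per_seed_summary` is a hypothetical helper, and the toy values are illustrative only.

```python
import pandas as pd
from typing import List

def per_seed_summary(df: pd.DataFrame, cols: List[str]) -> pd.DataFrame:
    """Last non-null value of each requested column, one row per seed."""
    # Silently skip columns the log does not contain (and the grouping key itself)
    present = [c for c in cols if c in df.columns and c != "seed"]
    return df.groupby("seed")[present].last().reset_index()

toy = pd.DataFrame({
    "seed": [0, 0, 0, 5, 5],
    "event_type": ["calibrate", "warmup", "delete", "calibrate", "warmup"],
    "N_star_theory": [None, 1883.0, 1883.0, None, 2100.0],
    "G_hat": [None, 4.6, 4.6, None, 5.1],
})
summary = per_seed_summary(toy, ["N_star_theory", "G_hat", "gamma_learning"])
```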


### Interpreting $\gamma$ Parameters

**Question:** What is the interpretation of $\gamma_{learn}$ and how is it used to calculate sample complexity and deletion capacity?

**Answer:** If $\gamma_{learn}$ is the amount of slack given to the learner, then a $\gamma_{learn}$ of `0.5` inflates my sample complexity by a factor of 4 relative to $\gamma_{learn} = 1$, because the sample complexity scales as $1/\gamma_{learn}^{2}$. Consider a larger $\gamma_{learn}$ for the first round of experiments so that you don't blow up your sample complexity too early.

The large sample complexities can also be an issue because our `max_events` parameter is set to 100000. So if the sample complexity is any larger than that, then the learner wouldn't even be able to unlearn a single point.
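
Both effects are simple enough to check numerically. The following is a minimal sketch with two hypothetical helpers; it assumes warmup consumes exactly $N^{\star}$ events and ignores the calibration events, so the real event budget is slightly tighter than this check suggests.

```python
def slack_inflation(gamma_learn: float) -> float:
    """Factor by which N* grows relative to gamma_learn = 1 (N* scales as 1 / gamma^2)."""
    return 1.0 / gamma_learn ** 2

def reaches_workload(n_star: int, max_events: int = 100_000) -> bool:
    """True if warmup can finish with at least one event left for the workload."""
    return n_star < max_events

inflation = slack_inflation(0.5)        # 4.0
within_budget = reaches_workload(1883)  # True for the N* logged above
```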

**Question:** Okay, so we have two parameters, $\gamma_{learn}$ and $\gamma_{priv}$. Why do we need them both? What's the difference between the learning parameter and the privacy parameter?

**Answer:** They were separated because we need two separate slack parameters. One is used to bound the average regret during the learning period, and the second is used to bound the average regret when processing the workload.

### Effects of Limited Convexity

If the loss function is only weakly convex, then the experiment would end before the sample complexity is reached, and so even doing a single insertion would be a waste of time. I'm increasing the maximum number of events to allow for more of the experiments to reach this stage.

**Note:** a suggestion would be to replace the two gamma parameters with a single $\alpha$ that's used to split the amount of slack given to deletions versus insertions. 


## Theoretical Deletion Capacities 

$\gamma_{priv}$ is also used to calculate deletion capacity. It quantifies the amount of cumulative regret you're willing to pay for all future deletions, and it sets the upper bound on deletion capacity.

$$
m \leq \gamma_{priv} \times \frac{N^{\star}}{GD + \sigma\sqrt{2N^{\star}\ln{\frac{1}{\delta_{step}}}}}
$$

The deletion capacity is only determined once the warmup has completed. We use the calibration statistics and the results from the warmup to calculate the theoretical deletion capacity for the experiment. This is the maximum number of deletions served (although many seeds never reach that point) and is used to calibrate the noise in the standard odometer.

For some reason, we're not getting the $m_{theory}$ that we need to actually run the experiment.
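
One way to probe this is to evaluate the bound by hand on a logged row. The sketch below uses the values from the last rows of the truncated table earlier and makes two assumptions that are not confirmed by the experiment code: that $\sigma$ in the bound is the logged `sigma_step_theory`, and that `gamma_privacy = 0.5` plays the role of $\gamma_{priv}$. `deletion_capacity_bound` is a hypothetical helper.

```python
import math

def deletion_capacity_bound(gamma_priv: float, n_star: float, G: float, D: float,
                            sigma: float, delta_step: float) -> float:
    """Right-hand side of the deletion-capacity bound above."""
    denom = G * D + sigma * math.sqrt(2.0 * n_star * math.log(1.0 / delta_step))
    return gamma_priv * n_star / denom

# Values copied from the truncated output (assumed mapping of columns to symbols)
m_bound = deletion_capacity_bound(
    gamma_priv=0.5, n_star=1883, G=4.605077, D=4.710798,
    sigma=195235.982215, delta_step=1.426534e-08,
)
m_whole = math.floor(m_bound)  # whole deletions the bound allows
```

Under these assumptions the bound is far below one, so rounding down leaves no whole deletions. That would be consistent with the noise scale dominating the denominator, though whether the experiment computes $m_{theory}$ this way still needs to be checked.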

In [None]:
deletion_capacity_data = seed_level_data[["seed", "m_theory","m_emp", "gamma_priv", "G_hat", "D_hat", "N_star_theory", "sigma_step_theory"]]
deletion_capacity_data

KeyError: "['m_emp', 'gamma_priv'] not in index"
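
The `KeyError` says that `m_emp` and `gamma_priv` are not columns of the frame. The truncated output earlier lists a `gamma_privacy` column, which may be the intended name, and the analysis script computes the empirical capacity by counting `delete` operations rather than reading an `m_emp` column. A minimal defensive sketch, with `select_available` and the alias map as hypothetical names, selects whatever is present and reports the rest:

```python
import pandas as pd
from typing import Dict, List, Tuple

WANTED = ["seed", "m_theory", "m_emp", "gamma_priv", "G_hat", "D_hat",
          "N_star_theory", "sigma_step_theory"]
# Assumed rename: the logged column appears to be gamma_privacy, not gamma_priv
ALIASES = {"gamma_priv": "gamma_privacy"}

def select_available(df: pd.DataFrame, wanted: List[str],
                     aliases: Dict[str, str]) -> Tuple[pd.DataFrame, List[str]]:
    """Return the columns that exist (after aliasing) plus the names still missing."""
    chosen, missing = [], []
    for col in wanted:
        name = col if col in df.columns else aliases.get(col)
        if name is not None and name in df.columns:
            chosen.append(name)
        else:
            missing.append(col)
    return df[chosen], missing

# Toy frame with only some of the requested columns (illustrative values)
toy = pd.DataFrame({"seed": [0], "m_theory": [3.0], "gamma_privacy": [0.5], "G_hat": [4.6]})
subset, missing = select_available(toy, WANTED, ALIASES)
```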