## Visualizing Results of the Deletion Capacity Experiment

These are the results of the deletion capacity experiment. 

At a high level, we're seeing very conservative regret bounds for the Memory Pair. This means that we're requiring large sample complexity in return for a very low deletion capacity.

It's also worth noting that our sample complexity (the bar for a good learner) increases as the data wiggles more. When the Lipschitz constant and the upper bound on the Hessian are high, the sample complexity jumps and the amount of noise injected into the model becomes destabilizingly high.

Goals:
- Analyze the simulation results from the experiment runs and visualize the cumulative regret
- Focus on $\widehat{G}$ such that we can see its impact on the downstream stability of the learner
- Investigate alternative methods of privacy accounting. Can we get tighter regret bounds so that we don't inject so much noise into the parameter estimates?


### Page-Wide Questions
- The formulas for sample complexity and deletion capacity look very similar (i.e., both use the $GD$ term). Why is this the case, and what does it suggest about the relationship between the two formulas? If I were to divide sample complexity by deletion capacity, the result would look something like a harmonic mean.
- I wonder how $\widehat{D}$ is being estimated. It looks like a lot of seeds are capping it at 10, which is a worst-case scenario. Is there something that can reduce this?
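
One way to make the first question concrete, as a sketch that takes the sample-complexity and deletion-capacity formulas from the theoretical sections of this notebook at face value. The names $m_{\max}$, $m_{1}$, and $m_{2}$ are introduced here for illustration only: $m_{\max}$ is the right-hand side of the deletion-capacity bound, $m_{1}$ is what the bound would be with only the $GD$ term in the denominator, and $m_{2}$ is what it would be with only the noise term.

$$
m_{\max}
= \frac{\gamma_{priv}\, N^{\star}}{GD + \sigma\sqrt{2N^{\star}\ln(1/\delta_{step})}}
= \left(\frac{1}{m_{1}} + \frac{1}{m_{2}}\right)^{-1},
\qquad
m_{1} = \frac{\gamma_{priv}\, N^{\star}}{GD},
\quad
m_{2} = \frac{\gamma_{priv}\, N^{\star}}{\sigma\sqrt{2N^{\star}\ln(1/\delta_{step})}}
$$

Read this way, $m_{\max}$ is half the harmonic mean of $m_{1}$ and $m_{2}$, and the ratio in the question is $N^{\star} / m_{\max} = \left(GD + \sigma\sqrt{2N^{\star}\ln(1/\delta_{step})}\right) / \gamma_{priv}$. Since $N^{\star}$ itself scales with $(GD)^{2}$, the same $GD$ product drives both quantities, which may be why the two formulas look so alike.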

In [1]:
import pandas as pd
import numpy as np
import random
import re
import os
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
sample_record = """
C_hat,D_hat,G_hat,N_star_theory,P_T_est,S_scalar,acc,c_hat,capacity_remaining,delta_step_theory,delta_total,eps_spent,eps_step_theory,eta_t,event,event_id,event_type,lambda_est,m_theory,op,regret,rho_step,sample_id,segment_id,sens_delete,sigma_step,sigma_step_theory,x_norm
,,,,,,4.511492515380755,,inf,,,0.0,,,0,0,calibrate,,,calibrate,0.0,,linear_053060,0,,,,5.282049655914307
,,,,,,0.6477066254256968,,inf,,,0.0,,,1,1,calibrate,,,calibrate,0.0,,linear_902593,0,,,,3.2045962810516357
,,,,,,3.6940734626420304,,inf,,,0.0,,,2,2,calibrate,,,calibrate,0.0,,linear_086976,0,,,,3.4850804805755615
,,,,,,1.3768957923705547,,inf,,,0.0,,,3,3,calibrate,,,calibrate,0.0,,linear_828659,0,,,,4.006361961364746
,,,,,,0.9326645666037168,,inf,,,0.0,,,4,4,calibrate,,,calibrate,0.0,,linear_958359,0,,,,3.5934643745422363
"""

columns = [
    "C_hat", "D_hat", "G_hat", "N_star_theory", "P_T_est", "S_scalar", "acc", "c_hat", "capacity_remaining", "delta_step_theory", "delta_total", "eps_spent", "eps_step_theory", "eta_t", "event", "event_id", "event_type", "lambda_est", "m_theory", "op", "regret", "rho_step", "sample_id", "segment_id", "sens_delete", "sigma_step", "sigma_step_theory", "x_norm"
]

file_path = "/workspaces/unlearning-research-meta/experiments/deletion_capacity/results/grid_2025_01_01/sweep/split_0.3-0.7_q0.90_k1_legacy_eps1.0/seed_001_synthetic_memorypair.csv"

data = pd.read_csv(file_path, names=columns, header=None, skiprows=1)


In [3]:
data.columns

# create a dictionary of dataframes by event type
event_dfs = {event: data[data["event_type"] == event] for event in data["event_type"].unique()}
event_dfs.keys()

dict_keys(['calibrate', 'warmup', 'insert', 'delete'])

In [12]:
# print summary statistics for each event type
for event_type, df in event_dfs.items():
    print(f"Summary statistics for {event_type}:")
    print(df.describe())

Summary statistics for calibrate:
       C_hat  D_hat  G_hat  N_star_theory  P_T_est  S_scalar         acc  \
count    0.0    0.0    0.0            0.0      0.0       0.0  500.000000   
mean     NaN    NaN    NaN            NaN      NaN       NaN    1.709943   
std      NaN    NaN    NaN            NaN      NaN       NaN    1.281029   
min      NaN    NaN    NaN            NaN      NaN       NaN    0.005286   
25%      NaN    NaN    NaN            NaN      NaN       NaN    0.708226   
50%      NaN    NaN    NaN            NaN      NaN       NaN    1.502960   
75%      NaN    NaN    NaN            NaN      NaN       NaN    2.417920   
max      NaN    NaN    NaN            NaN      NaN       NaN    7.185795   

       c_hat  capacity_remaining  delta_step_theory  ...    event_id  \
count    0.0               500.0                0.0  ...  500.000000   
mean     NaN                 inf                NaN  ...  249.500000   
std      NaN                 NaN                NaN  ...  144.481833   



       C_hat         D_hat       G_hat  N_star_theory  P_T_est  S_scalar  \
count  315.0  3.150000e+02  315.000000          315.0      0.0       0.0   
mean     1.0  5.213894e-01   16.420728          815.0      NaN       NaN   
std      0.0  1.111989e-16    0.000000            0.0      NaN       NaN   
min      1.0  5.213894e-01   16.420728          815.0      NaN       NaN   
25%      1.0  5.213894e-01   16.420728          815.0      NaN       NaN   
50%      1.0  5.213894e-01   16.420728          815.0      NaN       NaN   
75%      1.0  5.213894e-01   16.420728          815.0      NaN       NaN   
max      1.0  5.213894e-01   16.420728          815.0      NaN       NaN   

              acc  c_hat  capacity_remaining  delta_step_theory  ...  \
count  315.000000  315.0                 0.0                0.0  ...   
mean     1.455746    1.0                 NaN                NaN  ...   
std      1.102797    0.0                 NaN                NaN  ...   
min      0.008724    1.0   

The code performs a grid search over the experiment parameters. For each seed,
1. **(Calibration.)** a `Calibrator` object draws a small sample of the data stream to estimate stream attributes like the Lipschitz constant $G$, the upper and lower bounds of the Hessian eigenvalues $C, c$, and the resulting sample complexity required to meet predefined accuracy goals.
2. **(Warmup.)** the model is trained on a stream of samples until it reaches the sample complexity. This sets the model up for success when we test deletions.
3. **(Workload.)** a stream of interleaved insertions and deletions is passed to the model. It's expected to service the requests in the order they're given.
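
The three stages above show up in the log as the `event_type` values `calibrate`, `warmup`, and `insert`/`delete`. A minimal sketch of how one might check that a run went through them in order is given below. The helper name `summarize_stages` and the toy log are hypothetical, and the sketch assumes the `event` column is the per-run operation index.

```python
import pandas as pd

def summarize_stages(log: pd.DataFrame) -> pd.DataFrame:
    """Count events and report the first/last event index for each event_type."""
    summary = (
        log.groupby("event_type")["event"]
        .agg(n_events="count", first_event="min", last_event="max")
        .sort_values("first_event")  # order stages by when they first appear
    )
    return summary

# Tiny hand-made log (illustrative values only, not experiment output).
toy_log = pd.DataFrame({
    "event": [0, 1, 2, 3, 4, 5],
    "event_type": ["calibrate", "calibrate", "warmup", "warmup", "insert", "delete"],
})
stage_summary = summarize_stages(toy_log)
print(stage_summary)
```

On a real run, a warmup stage whose `n_events` is far below the theoretical sample complexity would be a hint that the run was cut off before the workload began.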

In [None]:
summary_statistics = data.describe()
summary_statistics.columns

ValueError: Cannot describe a DataFrame without columns

In [None]:
# get the individual event types
print(data["event_type"].unique())
print(data["event_type"].value_counts())

['calibrate' 'warmup' 'insert' 'delete']
event_type
warmup       169493
calibrate      2500
insert         1932
delete         1927
Name: count, dtype: int64


In [None]:
data

Unnamed: 0,C_hat,D_hat,G_hat,N_star_theory,acc,c_hat,capacity_remaining,delta_step_theory,delta_total,eps_spent,...,sigma_step_theory,gamma_learning,gamma_privacy,quantile,deletion_ratio,accountant_type,privacy_budget,seed,data_stream_type,algorithm
0,,,,,1.995444e+00,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
1,,,,,2.302350e+00,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
2,,,,,5.028844e+00,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
3,,,,,1.397093e+01,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
4,,,,,8.504764e+00,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
175847,1.0,4.710798,4.605077,1883.0,1.305034e+27,1.0,,1.426534e-08,0.00001,0.997147,...,195235.982215,0.5,0.5,0.90,1,legacy,1.0,0,synthetic,memorypair
175848,1.0,4.710798,4.605077,1883.0,,1.0,,1.426534e-08,0.00001,0.998573,...,195235.982215,0.5,0.5,0.90,1,legacy,1.0,0,synthetic,memorypair
175849,1.0,4.710798,4.605077,1883.0,9.006921e+25,1.0,,1.426534e-08,0.00001,0.998573,...,195235.982215,0.5,0.5,0.90,1,legacy,1.0,0,synthetic,memorypair
175850,1.0,4.710798,4.605077,1883.0,,1.0,,1.426534e-08,0.00001,1.000000,...,195235.982215,0.5,0.5,0.90,1,legacy,1.0,0,synthetic,memorypair


In [None]:
seed_level_data = data.loc[data["event_type"].isnull()]
event_level_data = data.loc[~data["event_type"].isnull()]

In [None]:
event_level_data

Unnamed: 0,C_hat,D_hat,G_hat,N_star_theory,acc,c_hat,capacity_remaining,delta_step_theory,delta_total,eps_spent,...,sigma_step_theory,gamma_learning,gamma_privacy,quantile,deletion_ratio,accountant_type,privacy_budget,seed,data_stream_type,algorithm
0,,,,,1.995444e+00,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
1,,,,,2.302350e+00,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
2,,,,,5.028844e+00,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
3,,,,,1.397093e+01,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
4,,,,,8.504764e+00,,inf,,,0.000000,...,,0.5,0.5,0.90,1,legacy,1.0,5,synthetic,memorypair
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
175847,1.0,4.710798,4.605077,1883.0,1.305034e+27,1.0,,1.426534e-08,0.00001,0.997147,...,195235.982215,0.5,0.5,0.90,1,legacy,1.0,0,synthetic,memorypair
175848,1.0,4.710798,4.605077,1883.0,,1.0,,1.426534e-08,0.00001,0.998573,...,195235.982215,0.5,0.5,0.90,1,legacy,1.0,0,synthetic,memorypair
175849,1.0,4.710798,4.605077,1883.0,9.006921e+25,1.0,,1.426534e-08,0.00001,0.998573,...,195235.982215,0.5,0.5,0.90,1,legacy,1.0,0,synthetic,memorypair
175850,1.0,4.710798,4.605077,1883.0,,1.0,,1.426534e-08,0.00001,1.000000,...,195235.982215,0.5,0.5,0.90,1,legacy,1.0,0,synthetic,memorypair


### What does this data mean?

The data we get from the experiment is incredibly granular. This is good because we can isolate the impact of different operations on the regret. A list of the parameters is included below:

`Data Stream Attributes`
- $q$ is the quantile used for selecting the parameter estimates, so that a single outlier doesn't inflate an estimate
- $\widehat{C}$ is the upper bound on the Hessian eigenvalues
- $\widehat{c}$ is the lower bound on the Hessian eigenvalues
- $\widehat{D}$ is the upper bound of the diameter of the ellipsoid
- $\widehat{G}$ is the Lipschitz constant of the function, representing how much the output of the function changes as the inputs change
- $N^{\star}_{theory}$ is the theoretical sample complexity to reach the specified amount of average-regret

`Workload Parameters`
- $k$ is the number of insertions per delete operation
- $m_{emp}$ is the empirical deletion capacity of the seed

`Privacy Parameters`
- $\delta_{total}$ and $\varepsilon_{total}$ are the total $(\varepsilon,\delta)$ budget given to the accountant
- $\delta_{step}$ and $\varepsilon_{step}$ are the amount of privacy "spent" per deletion
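
As a rough illustration of how the per-step and total budgets could relate, the sketch below splits the total budget evenly across $m$ deletions (basic composition). This is an assumption about the `legacy` accountant, not something the notebook confirms. The function name `per_step_budget` is hypothetical, and $m = 701$ is a guess chosen only because it reproduces the logged `delta_step_theory` of `1.426534e-08` from `delta_total = 0.00001`.

```python
def per_step_budget(eps_total: float, delta_total: float, m: int) -> tuple[float, float]:
    """Evenly split an (eps, delta) budget across m deletions (basic composition)."""
    if m <= 0:
        raise ValueError("m must be a positive number of deletions")
    return eps_total / m, delta_total / m

# m=701 is an assumed value, not read from the experiment log.
eps_step, delta_step = per_step_budget(eps_total=1.0, delta_total=1e-5, m=701)
print(eps_step, delta_step)
```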


`Event-Level Attributes`
- $event$ is the zero-based index of the operation within the seed run
- $avg\_regret\_empirical$ is the mean per-operation regret for the stream of events up to this point
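
A running average like the one described for $avg\_regret\_empirical$ can be sketched from the event log as follows. This assumes the `regret` column holds the regret incurred by each individual event (if it is already cumulative, the `cumsum` step would double count). The function and the output column names `cum_regret` and `avg_regret_running` are hypothetical.

```python
import pandas as pd

def add_running_regret(log: pd.DataFrame) -> pd.DataFrame:
    """Add cumulative and running-average regret columns, ordered by event index."""
    out = log.sort_values("event").copy()
    # Cumulative regret up to and including each event.
    out["cum_regret"] = out["regret"].cumsum()
    # Mean per-operation regret over all events seen so far.
    out["avg_regret_running"] = out["regret"].expanding().mean()
    return out

# Illustrative values only, deliberately out of order to exercise the sort.
toy_log = pd.DataFrame({"event": [2, 0, 1], "regret": [4.0, 0.0, 2.0]})
with_regret = add_running_regret(toy_log)
print(with_regret)
```

For a multi-seed frame, the same function would need to be applied per seed (for example through a `groupby` on `seed`) so that the running totals do not bleed across runs.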

## Theoretical Sample Complexities

We can calculate theoretical sample complexities using the data we get from calibration. 

The formula for the sample complexity is based entirely on the attributes of our data stream and its spread: $G$, $D$, and $\sqrt{cC}$. The estimates from our calibration period therefore matter a great deal. A large estimate for the Lipschitz constant or for the bounds on the Hessian eigenvalues means we'll have an artificially inflated sample complexity.

$$
N^{\star} = \left[\frac{GD\sqrt{cC}}{\gamma_{learn}}\right]^{2}
$$

It's also worth noting that the sample complexity is already quite conservative because of the method used for accounting.
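
To sanity-check the formula against the log, here is a small sketch that plugs in the values from the last displayed rows of the combined table (`G_hat = 4.605077`, `D_hat = 4.710798`, `C_hat = c_hat = 1.0`, `gamma_learning = 0.5`). The function name `sample_complexity` is hypothetical, and rounding up to an integer is an assumption about how the experiment code reports `N_star_theory`.

```python
import math

def sample_complexity(G: float, D: float, c: float, C: float, gamma_learn: float) -> float:
    """Evaluate (G * D * sqrt(c * C) / gamma_learn) ** 2."""
    if gamma_learn <= 0:
        raise ValueError("gamma_learn must be positive")
    return (G * D * math.sqrt(c * C) / gamma_learn) ** 2

n_star = sample_complexity(G=4.605077, D=4.710798, c=1.0, C=1.0, gamma_learn=0.5)
# Rounding up is an assumption; the log stores N_star_theory as a whole number.
n_star_rounded = math.ceil(n_star)
print(n_star, n_star_rounded)
```

Under these assumptions the rounded value matches the logged `N_star_theory` of `1883.0` for that seed, which suggests the formula above is the one the calibration step applies.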

In [None]:
sample_complexity_calculations = seed_level_data[["seed", "N_star_theory", "C_hat", "c_hat", "D_hat", "G_hat", "gamma_learning"]]
sample_complexity_calculations

Unnamed: 0,seed,N_star_theory,C_hat,c_hat,D_hat,G_hat,gamma_learning


### Interpreting $\gamma$ Parameters

**Question:** What is the interpretation of $\gamma_{learn}$ and how is it used to calculate sample complexity and deletion capacity?

**Answer:** If $\gamma_{learn}$ is the amount of slack given to the learner, then a $\gamma_{learn}$ of `0.5` inflates the sample complexity by a factor of 4, because $\gamma_{learn}$ enters the formula as $1/\gamma_{learn}^{2}$. Consider a larger $\gamma_{learn}$ for the first round of experiments so that you don't blow up your sample complexity too early.

The large sample complexities can also be an issue because our `max_events` parameter is set to 100000. So if the sample complexity is any larger than that, then the learner wouldn't even be able to unlearn a single point.
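
The two points above (the $1/\gamma_{learn}^{2}$ inflation and the `max_events` ceiling) can be put into numbers with a short sketch. Both helper names are hypothetical, and the calibration values are again the ones from the last displayed rows of the combined table; other seeds will give different thresholds.

```python
import math

def inflation_factor(gamma_learn: float) -> float:
    """Multiplier on the sample complexity relative to gamma_learn = 1."""
    if gamma_learn <= 0:
        raise ValueError("gamma_learn must be positive")
    return 1.0 / gamma_learn ** 2

def min_gamma_for_budget(G: float, D: float, c: float, C: float, max_events: int) -> float:
    """Smallest gamma_learn whose sample complexity still fits inside max_events."""
    return G * D * math.sqrt(c * C) / math.sqrt(max_events)

factor = inflation_factor(0.5)
gamma_min = min_gamma_for_budget(G=4.605077, D=4.710798, c=1.0, C=1.0, max_events=100_000)
# Plugging gamma_min back into the formula should land on max_events.
implied_n_star = (4.605077 * 4.710798 * math.sqrt(1.0 * 1.0) / gamma_min) ** 2
print(factor, gamma_min, implied_n_star)
```

Any $\gamma_{learn}$ below `gamma_min` would push the sample complexity past `max_events` for that seed, so the warmup could not finish.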

**Question:** Okay, so we have two parameters, $\gamma_{learn}$ and $\gamma_{priv}$. Why do we need them both? What's the difference between the learning parameter and the privacy parameter?

**Answer:** They were separated because we need two separate slack parameters. One is used to bound the average regret during the learning period, and the second is used to bound the average regret when processing the workload.

### Effects of Limited Convexity

If the loss function is only weakly convex, then the experiment would end before the sample complexity is reached, and so even doing a single insertion would be a waste of time. I'm increasing the maximum number of events to allow for more of the experiments to reach this stage.

**Note:** a suggestion would be to replace the two gamma parameters with a single $\alpha$ that's used to split the amount of slack given to deletions versus insertions. 


## Theoretical Deletion Capacities 

The $\gamma_{priv}$ parameter is also used to calculate deletion capacity. It quantifies the amount of cumulative regret you're willing to pay for all future deletions, and it sets the upper bound on deletion capacity.

$$
m \leq \gamma_{priv} \times \frac{N^{\star}}{GD + \sigma\sqrt{2N^{\star}\ln\left(\frac{1}{\delta_{step}}\right)}}
$$

The deletion capacity is only determined once the warmup has completed. We use the calibration statistics and the results from the warmup to calculate the theoretical deletion capacity for the experiment. This is the maximum number of deletions served (although many seeds never reach that point) and is used to calibrate the noise in the standard odometer.
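
A minimal sketch of the bound as a function, to make the role of each term easier to see. The function name `deletion_capacity_bound` is hypothetical, the inputs below are illustrative rather than taken from a run, and which logged column corresponds to $\sigma$ is not established in this notebook.

```python
import math

def deletion_capacity_bound(gamma_priv: float, n_star: float, G: float, D: float,
                            sigma: float, delta_step: float) -> float:
    """Right-hand side of m <= gamma_priv * N / (G*D + sigma*sqrt(2*N*ln(1/delta_step)))."""
    if not 0.0 < delta_step < 1.0:
        raise ValueError("delta_step must lie strictly between 0 and 1")
    noise_term = sigma * math.sqrt(2.0 * n_star * math.log(1.0 / delta_step))
    return gamma_priv * n_star / (G * D + noise_term)

# Illustrative inputs only (not experiment output).
no_noise = deletion_capacity_bound(0.5, 1000, G=1.0, D=1.0, sigma=0.0, delta_step=1e-5)
with_noise = deletion_capacity_bound(0.5, 1000, G=1.0, D=1.0, sigma=0.1, delta_step=1e-5)
print(no_noise, with_noise)
```

Because the noise term grows with both $\sigma$ and $\sqrt{N^{\star}}$, even a modest $\sigma$ pulls the bound well below the noise-free value. If an integer number of deletions is needed, taking `math.floor` of the result is one option.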

For some reason, we're not getting the $m_{theory}$ that we need to actually run the experiment.

In [None]:
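# Fix for KeyError "['m_emp', 'gamma_priv'] not in index": the column is gamma_privacy, and m_emp may be absent
wanted = ["seed", "m_theory", "m_emp", "gamma_privacy", "G_hat", "D_hat", "N_star_theory", "sigma_step_theory"]
deletion_capacity_data = seed_level_data[[c for c in wanted if c in seed_level_data.columns]]
deletion_capacity_data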