### Simple test task
Use sklearn make_classification to generate a dummy dataset and train a model.

In [1]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression


X1, y1 = make_classification(n_samples=500, n_features=15, n_classes=2, random_state=42, class_sep=1.0)
X2, y2 = make_classification(n_samples=500, n_features=15, n_redundant=5, n_classes=2, random_state=42, class_sep=0.4)

A1 = np.ones(y1.shape)
A2 = 2 * np.ones(y2.shape)

X = np.vstack([X1, X2])
y = np.hstack([y1, y2])
A = np.hstack([A1, A2])

X_train, X_test, y_train, y_test, A_train, A_test = train_test_split(X, y, A, test_size=0.3, random_state=42)

clf = LogisticRegression()
clf.fit(X_train, y_train, A_train)
y_prob_train = clf.predict_proba(X_train)[:, 1]
y_prob_test = clf.predict_proba(X_test)[:, 1]

### Default method for tests
We used the ApproxThreshNet model for our main results.

In [5]:
from approx_thresh import ApproxThresholdNet
from metrics import tpr, fpr, precision

m_funcs = {
    'tpr': tpr,
    'fpr': fpr,
    'precision': precision
}

model = ApproxThresholdNet(metric_functions=m_funcs, 
                        lambda_=0.9, 
                        max_error=0.001, 
                        max_total_combinations=50000)

model.fit(y_prob_train, y_train, A_train)

print("Determined thresholds for each group:")
print(model.thresholds_)

Adjusted max_error to 0.0017674423812903172 to limit total combinations to approximately 50000
Number of points in the epsilon net: 225
Adjusted max_error: 0.0017674423812903172
Number of points in data: 700
<multiprocessing.context.SpawnContext object at 0x7fec50944520>
Enters pool
Submits futures


Threshold Combinations: 100%|██████████| 50625/50625 [00:10<00:00, 5010.89it/s]


Best objective value: 0.2164160941884889
Best thresholds: {1.0: 0.4151785714285714, 2.0: 0.48660714285714285}
Epsilon differences: {1.0: {2.0: array([0.18315135, 0.05394233, 0.08214286])}, 2.0: {1.0: array([0.18315135, 0.05394233, 0.08214286])}}
Determined thresholds for each group:
{1.0: 0.4151785714285714, 2.0: 0.48660714285714285}


### SGD for Optimizing Thresholds
We also experimented with using SGD to optimize the thresholds. 

In [3]:
from approx_thresh_pytorch import ApproxThresholdPytorch, tpr_sigmoid, fpr_sigmoid, precision_sigmoid, accuracy_sigmoid

m_funcs_soft = {
    'tpr': tpr_sigmoid,
    'fpr': fpr_sigmoid,
    'precision': precision_sigmoid
}

model_sgd = ApproxThresholdPytorch(metric_functions=m_funcs_soft, 
                                lambda_=2.0, 
                                global_metric=accuracy_sigmoid)

model_sgd.fit(y_prob_train, y_train, A_train)

print("Determined thresholds for each group:")
print(model_sgd.thresholds_)

Initialization 0, Epoch 0, Loss: 3.2249155044555664
Initialization 0, Epoch 10, Loss: 2.11283540725708
Initialization 0, Epoch 20, Loss: 2.0099377632141113
Initialization 0, Epoch 30, Loss: 1.999869465827942
Initialization 0, Epoch 40, Loss: 1.9988653659820557
Initialization 0, Epoch 50, Loss: 1.9987642765045166
Early stopping triggered at initialization 0, epoch 60
Initialization 1, Epoch 0, Loss: 0.714952290058136
Initialization 1, Epoch 10, Loss: 0.6668986678123474
Initialization 1, Epoch 20, Loss: 0.6629946827888489
Initialization 1, Epoch 30, Loss: 0.6625248789787292
Early stopping triggered at initialization 1, epoch 40
Initialization 2, Epoch 0, Loss: 3.4280388355255127
Initialization 2, Epoch 10, Loss: 2.399082660675049
Initialization 2, Epoch 20, Loss: 2.299625873565674
Initialization 2, Epoch 30, Loss: 2.2898051738739014
Initialization 2, Epoch 40, Loss: 2.288825035095215
Initialization 2, Epoch 50, Loss: 2.2887260913848877
Initialization 2, Epoch 60, Loss: 2.288715124130249
