# Calibration and Fusion

The Python code comes from https://gitlab.eurecom.fr/nautsch/pybosaris/tree/master, a Python implementation of the BOSARIS toolkit (https://sites.google.com/site/bosaristoolkit/). It is based on `linear_calibrate_scores.m`, `train_linear_calibration.m` and `train_binary_classifier.m` from the BOSARIS Matlab toolkit.

**Resources**

* [The BOSARIS Toolkit: Theory, Algorithms and Code for Surviving the New DCF](https://arxiv.org/pdf/1304.2865.pdf)
* [The BOSARIS Toolkit User Guide: Theory, Algorithms and Code for Binary Classifier Score Processing](https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxib3NhcmlzdG9vbGtpdHxneDozOTEwZjAzZmM3ZThmNjE0)

**Notes**

* calibration (a monotonically increasing transform of the scores, such as the affine map trained below) does not change the EER, but it does change the threshold needed to reach the EER operating point;
* calibration does change the "detection cost", because that cost is evaluated at a fixed threshold;
* on the other hand, the "minimum detection cost" is obtained by choosing the threshold that minimizes the detection cost, so calibration does not change the "minimum detection cost" either;
* when submitting to NIST, we provide a list of trials with their scores; NIST plots the DET curve, determines the EER threshold, and computes a "detection cost" at a predetermined threshold.
* In the 2019 evaluation plan, there are actually two thresholds, $\log(\beta_1)$ and $\log(\beta_2)$, with $\beta = \frac{C_{fa}(1 - P_{tar})}{C_{miss} P_{tar}}$ (see `compute_actual_cost` below), and the primary metric, $C_{primary}$, is the average of the two corresponding detection costs.
* In conclusion, $C_{primary}$ is affected by calibration. In principle, an ideal calibration would yield the minimum value of $C_{primary}$ at the fixed thresholds $\log(\beta_1)$ and $\log(\beta_2)$.
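The notes above can be checked on synthetic data. The sketch below is a minimal illustration, not the NIST scorer: it assumes C_miss = C_fa = 1, Gaussian scores, the two operating points P_tar = 0.01 and 0.005, and uses hypothetical helpers (`log_beta`, `actual_cost`, `min_cost`, `c_primary`) written only for this example. A monotonically increasing affine transform `a*s + b` should leave the minimum cost unchanged, while the cost at the fixed threshold log(β) generally changes.

```python
import numpy as np


def log_beta(p_target, c_miss=1.0, c_fa=1.0):
    """Fixed Bayes threshold log(beta) applied to calibrated (LLR) scores."""
    return np.log(c_fa * (1 - p_target) / (c_miss * p_target))


def actual_cost(tar, non, p_target):
    """Normalized cost P_miss + beta * P_fa with the threshold fixed at log(beta)."""
    beta = (1 - p_target) / p_target
    thr = log_beta(p_target)
    return np.mean(tar < thr) + beta * np.mean(non >= thr)


def min_cost(tar, non, p_target):
    """Same cost at the best threshold, trying every score (and +inf) as a threshold."""
    beta = (1 - p_target) / p_target
    candidates = np.concatenate([tar, non, [np.inf]])
    return min(np.mean(tar < t) + beta * np.mean(non >= t) for t in candidates)


def c_primary(tar, non, p_targets=(0.01, 0.005)):
    """Average of the actual costs at the two operating points."""
    return np.mean([actual_cost(tar, non, p) for p in p_targets])


# Hypothetical, well-separated scores
rng = np.random.default_rng(0)
tar = rng.normal(8.0, 2.0, 200)
non = rng.normal(-8.0, 2.0, 2000)

# A monotonically increasing affine map a*s + b (a > 0), i.e. a change of calibration
a, b = 0.5, -3.0
tar2, non2 = a * tar + b, a * non + b

for p in (0.01, 0.005):
    print(p, "min:", min_cost(tar, non, p), min_cost(tar2, non2, p),
          "actual:", actual_cost(tar, non, p), actual_cost(tar2, non2, p))
print("C_primary:", c_primary(tar, non), c_primary(tar2, non2))
```

The minimum costs match because the transform maps every candidate threshold onto a candidate threshold with identical decisions, whereas log(β) stays put while the scores move.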

**DET Curve**

<img src="imgs/det-curve.jpg" alt="DET curve" align="middle" width="500" border="1" />


https://link.springer.com/referenceworkentry/10.1007%2F978-0-387-73003-5_643

Detection Error Tradeoff (DET) curves are ROC (receiver operating characteristic)-type curves that show the range of operating points of a detection system as the decision threshold is varied to trade the miss rate against the false-alarm rate; both axes use a normal deviate scale. If the underlying score distributions of the two trial types (target and non-target) are normal, the DET curve is a straight line. DET curves are widely used to present the performance of speaker recognition systems.
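A minimal sketch of how DET points can be computed, assuming a trial is accepted when its score is at or above the threshold (as in the cost functions later in this notebook). `det_points` and `probit` are hypothetical helpers, not pybosaris functions. The Gaussian part checks the straight-line property: with target scores N(μ_t, σ_t) and non-target scores N(μ_n, σ_n), probit(P_miss) = (t - μ_t)/σ_t and probit(P_fa) = (μ_n - t)/σ_n, so eliminating t gives a line of slope -σ_n/σ_t.

```python
import numpy as np
from scipy.stats import norm


def det_points(tar, non):
    """Empirical (P_fa, P_miss) at every distinct score used as a threshold.

    A trial is accepted when its score is >= threshold.
    """
    thresholds = np.unique(np.concatenate([tar, non]))
    # searchsorted(..., side="left") counts the scores strictly below each threshold
    p_miss = np.searchsorted(np.sort(tar), thresholds, side="left") / len(tar)
    p_fa = 1.0 - np.searchsorted(np.sort(non), thresholds, side="left") / len(non)
    return p_fa, p_miss


def probit(p, eps=1e-6):
    """Normal deviate scale used on both DET axes (inverse standard-normal CDF)."""
    return norm.ppf(np.clip(p, eps, 1 - eps))


# Gaussian score model with hypothetical parameters
mu_t, s_t, mu_n, s_n = 2.0, 1.0, -2.0, 2.0
t = np.linspace(-2.0, 2.0, 50)
p_miss = norm.cdf(t, loc=mu_t, scale=s_t)  # fraction of targets below t
p_fa = norm.sf(t, loc=mu_n, scale=s_n)     # fraction of non-targets above t
slope, intercept = np.polyfit(probit(p_fa), probit(p_miss), 1)
print(slope, intercept)  # close to -s_n / s_t = -2 and (mu_n - mu_t) / s_t = -4

# Empirical DET points from synthetic scores
rng = np.random.default_rng(1)
emp_fa, emp_miss = det_points(rng.normal(1.0, 1.0, 300), rng.normal(-1.0, 1.0, 300))
```

With the notebook's arrays, `plt.plot(*map(probit, det_points(tar, non)))` would draw an unlabelled DET curve.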

**DCF Plot**

*Good Calibration*

<img src="imgs/dcf-curve.jpg" alt="Bayes error-rate plot, good calibration" align="middle" width="500" border="1" />

https://arxiv.org/pdf/1304.2865.pdf

**The following description refers to the above plot.**

The plot shows curves for the dev database used to train the calibration. The actual Bayes error-rate (red) and its contributions are shown by plain lines; the minimum Bayes error-rate (thick red) and its contributions are shown by dashed lines.

BOSARIS can plot the contributions of the misses and false alarms to both the minimum Bayes error-rate (dashed cyan and dashed pink, respectively) and the actual Bayes error-rate (plain blue and plain green, respectively).

The operating points `logit(P_tar)` are shown by the vertical dashed magenta lines at `logit(0.005)` and `logit(0.01)`. `dev FA DR30` marks the point to the left of which there are fewer than 30 false alarms. DR30 refers to Doddington's Rule of 30, which states that at least 30 false alarms and at least 30 misses are needed for a meaningful evaluation. The toolkit can plot both the DR30 point for the misses (to the right of which the absolute number of misses drops below 30) and the one for the false alarms (to the left of which the absolute number of false alarms drops below 30). These points lie on the E_min curve, because they use the false-alarm and miss counts that result from the evaluator's optimized threshold.
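The curves in this plot can be sketched in a few lines. This is a minimal illustration under stated assumptions, not BOSARIS' plotting code: scores are treated as calibrated LLRs, the Bayes error-rate at prior log-odds η = logit(P_tar) is taken as E = P_tar * P_miss + (1 - P_tar) * P_fa, the actual curve uses the Bayes threshold -η, and the minimum curve uses the best threshold for each η. `bayes_error_rates` is a hypothetical helper; it also returns the miss and false-alarm counts at the optimized threshold, which is what the DR30 points are based on.

```python
import numpy as np


def bayes_error_rates(tar_llr, non_llr, prior_log_odds):
    """Rows of (eta, actual E, minimum E, misses, false alarms) for each eta = logit(P_tar).

    Trials are accepted when llr >= threshold. The actual curve uses the Bayes
    threshold -eta; the minimum curve uses the best threshold for each eta.
    """
    # Candidate thresholds: every distinct score, plus +inf (reject everything)
    thresholds = np.append(np.unique(np.concatenate([tar_llr, non_llr])), np.inf)
    n_miss_all = np.searchsorted(np.sort(tar_llr), thresholds, side="left")
    n_fa_all = len(non_llr) - np.searchsorted(np.sort(non_llr), thresholds, side="left")
    p_miss_all = n_miss_all / len(tar_llr)
    p_fa_all = n_fa_all / len(non_llr)

    rows = []
    for eta in prior_log_odds:
        p_tar = 1.0 / (1.0 + np.exp(-eta))  # inverse of logit
        act = p_tar * np.mean(tar_llr < -eta) + (1 - p_tar) * np.mean(non_llr >= -eta)
        costs = p_tar * p_miss_all + (1 - p_tar) * p_fa_all
        i = np.argmin(costs)
        rows.append((eta, act, costs[i], n_miss_all[i], n_fa_all[i]))
    return np.array(rows)


# Hypothetical scores, treated as calibrated LLRs for illustration
rng = np.random.default_rng(0)
tar = rng.normal(4.0, 2.0, 1000)
non = rng.normal(-4.0, 2.0, 20000)
etas = np.linspace(-10.0, 5.0, 61)
res = bayes_error_rates(tar, non, etas)
act, emin, n_fa = res[:, 1], res[:, 2], res[:, 4]

# Approximate FA DR30 point: to its left, fewer than 30 false alarms remain
enough_fa = etas[n_fa >= 30]
print("FA DR30 near logit(P_tar) =", enough_fa.min() if enough_fa.size else None)
print("logit(0.005) =", np.log(0.005 / 0.995), "logit(0.01) =", np.log(0.01 / 0.99))
```

Plotting `act` and `emin` against `etas` gives curves analogous to the plain and dashed red lines described above.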

*Bad Calibration*

<img src="imgs/bad-dcf-curve.jpg" alt="Bayes error-rate plot, bad calibration" align="middle" width="500" border="1" />

In [None]:
from unittest import TestCase
import os
import sys
lib_path = os.path.join(os.path.dirname('.'), "pybosaris-master")
sys.path.append(lib_path)
from pybosaris.calibration.linear_fuser import LinearFuser
from pybosaris.calibration.objectives import evaluate_objective
from pybosaris.calibration.training import train_binary_classifier
from pybosaris.libperformance import cllr, min_cllr
from operator import itemgetter
from sklearn.metrics import roc_curve
from scipy.optimize import brentq
from scipy.interpolate import interp1d
import numpy as np
import pandas as pd
import logging
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.dpi']= 300
%matplotlib inline
# Paths used in Section 5 (system fusion)
DATA_PATH_v1 = '/Volumes/dfs/gen/Misc/data19/patx/alamja/sre19works/fusion_calibration/all_scores_with_keys'
# # DATA_PATH = '/Volumes/dfs/gen/Misc/data19/patx/monteijo/scores_sre2019/cosine/aspp_pre'
# DATA_PATH = '/Volumes/dfs/gen/Misc/scratch05/patx/alamja/sre19works/fusion_stuffs/final_to_olda_6oct2019/v2mfcc_coral/lda200_sre1216_adapted_sre19trn_sre19_unlab'
DATA_PATH_2 = '/Volumes/dfs/gen/Misc/scratch05/patx/alamja/sre19works/fusion_stuffs/final_to_olda_6oct2019/v2sre04to18_train_sre19trn/lda200_train_combined_200k_sre19trn'
KEY_DIR = '/Volumes/dfs/gen/Misc/scratch05/patx/alamja/sre19works/fusion_stuffs/scripts_for_calibration_fusion/keys'
DATA_PATH = '/Volumes/dfs/gen/Misc/scratch05/patx/alamja/sre19works/fusion_stuffs/scripts_for_calibration_fusion/systems'
SUBSYS = 'CRIM_30SEP_V2MFCC_LDA250_SRE1618'

## 1. Preprocess training data

In [55]:
# Load initial scores with labels
dev_scores_labels = pd.read_csv(os.path.join(KEY_DIR, 'sre18_cmn2_evl_tst'), sep=' ', header=None)
dev_scores_labels.columns = ['speaker', 'test', 'label']
dev_scores_labels = dev_scores_labels.replace({'imp': 0, 'tgt': 1})

# Load actual scores without labels
dev_scores = pd.read_csv(os.path.join(DATA_PATH, SUBSYS, 'sre18_cmn2_evl_tst.txt'), sep=' ', header=None)
dev_scores.columns = ['speaker', 'test', 'score']

# Labels are missing so we will join the labels from the initial scores
scores = dev_scores.merge(dev_scores_labels[['speaker', 'test', 'label']], on=['speaker', 'test'])
scores.shape
scores.head()

(349685, 4)

|   | speaker | test | score | label |
|---|---|---|---|---|
| 0 | 1131_sre18 | aaeeiknb_sre18 | -25.22875 | 0 |
| 1 | 1131_sre18 | aaqclfew_sre18 | -14.63305 | 0 |
| 2 | 1131_sre18 | abmnegny_sre18 | -28.95438 | 0 |
| 3 | 1131_sre18 | aboerwai_sre18 | -14.89049 | 0 |
| 4 | 1131_sre18 | abwyevvm_sre18 | -37.73 | 0 |
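The `merge` in the cell above is an inner join (pandas' default), so any score line without a matching key is dropped silently, and duplicated keys would multiply rows. A minimal sketch of a stricter join, using a hypothetical helper `attach_labels` on toy trials: `validate="one_to_one"` raises `pandas.errors.MergeError` on duplicate trials, and `indicator=True` lets us count unlabelled score lines.

```python
import pandas as pd


def attach_labels(scores, keys):
    """Left-join 0/1 labels onto scores by (speaker, test); report unmatched score lines."""
    merged = scores.merge(
        keys[["speaker", "test", "label"]],
        on=["speaker", "test"],
        how="left",
        validate="one_to_one",  # fail loudly on duplicated trials
        indicator=True,         # adds a '_merge' column: 'both' or 'left_only'
    )
    n_unlabelled = int((merged["_merge"] == "left_only").sum())
    labelled = (merged[merged["_merge"] == "both"]
                .drop(columns="_merge")
                .astype({"label": int})
                .reset_index(drop=True))
    return labelled, n_unlabelled


# Toy example (hypothetical trials)
scores = pd.DataFrame({"speaker": ["a", "a", "b"], "test": ["x", "y", "x"],
                       "score": [1.2, -3.4, 0.5]})
keys = pd.DataFrame({"speaker": ["a", "b"], "test": ["x", "x"], "label": [1, 0]})
labelled, n_missing = attach_labels(scores, keys)
print(labelled)
print("score lines without a key:", n_missing)
```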


In [56]:
# Split data in target and non-target trials
tar = np.array(scores[scores.label == 1]['score'])
non = np.array(scores[scores.label == 0]['score'])

# Instantiate a LinearFuser
# Function handle for function that must be trained
# Get a starting point for the calibration weights: 'w0',
# which are zeros by default
train_scores = np.hstack((tar, non))
fuser = LinearFuser(scores=train_scores)

# Create label vector
# Let the trainer know which scores are target scores and which are non-target scores. 
ntar = tar.shape[0]
nnon = non.shape[0]
classf = np.hstack((np.ones(ntar), -np.ones(nnon)))

## 2. Train linear classifier

Train the binary classifier to obtain the calibration weights `w`.

In [57]:
# Taken from `/misc/scratch05/patx/alamja/sre19works/fusion_stuffs/scripts_for_calibration_fusion/fusion_sre19_cmn2_CRIM_v0.m`
prior = 0.005
maxiters = 25
quiet = False
objfun = None
w0 = fuser.w
w, train_cxe, w_pen, optimizerState, converged = train_binary_classifier(
    classifier=fuser, classf=classf, w0=w0,
    objective_function=objfun, prior=prior,
    penalizer=None, penalizer_weight=0,
    maxiters=maxiters, maxCG=100, optimizerState=None, quiet=quiet, cstepHessian=True)
print(f'Scaling: {w[0]}, Offset: {w[1]}')
train_cxe, w_pen, optimizerState, converged

2019-10-11 13:01:45.379 INFO     TR 0 (initial state): obj = 1, Delta = 10.1719
2019-10-11 13:01:45.505 INFO     CG 0: curv=27345.8, converged inside trust region; radius = 0.00378367, residual=0.212558, model=0.195743
2019-10-11 13:01:45.529 INFO     TR 1: obj=0.720254; rho=1.42914
2019-10-11 13:01:45.635 INFO     CG 0: curv=1526.5, converged inside trust region; radius = 0.0115509, residual=0.41897, model=0.331304
2019-10-11 13:01:45.657 INFO     TR 2: obj=0.299232; rho=1.2708
2019-10-11 13:01:45.772 INFO     CG 0: curv=73.9269, converged inside trust region; radius = 0.00797785, residual=0.224558, model=0.0737261
2019-10-11 13:01:45.794 INFO     TR 3: obj=0.207434; rho=1.24513
2019-10-11 13:01:45.899 INFO     CG 0: curv=3.52994, converged inside trust region; radius = 0.00681839, residual=0.387285, model=0.0216939
2019-10-11 13:01:45.917 INFO     TR 4: obj=0.181552; rho=1.19305
2019-10-11 13:01:46.022 INFO     CG 0: curv=0.136314, radius = 0.00349469, residual=1.12016, model=0.00300

Scaling: 0.3061653635786684, Offset: 0.0114315348042734


(0.181552218067861,
 0,
 {'Delta': 0.15893596319922618,
  'y': 0.181552218067861,
  'g': array([-1.92592994e-34, -2.28704181e-34]),
  'hess': functools.partial(<bound method ReplaceHessian.hessian of <pybosaris.calibration.objectives.ReplaceHessian object at 0x11e23c978>>, dy=1, w=array([0.30616536, 0.01143153]))},
 True)
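To make the training objective concrete, here is a minimal sketch. It assumes, as we understand BOSARIS' `train_binary_classifier`, that the weights minimize a prior-weighted logistic-regression cross-entropy normalized by the entropy of the prior, so that it equals 1 at `w = 0` (consistent with `obj = 1` at `TR 0` in the log above). It uses SciPy's BFGS instead of pybosaris' trust-region Newton/CG optimizer, and the function names are hypothetical. Weights are ordered as system weight(s) followed by the offset, matching `w[0]` (scaling) and `w[1]` (offset) printed above.

```python
import numpy as np
from scipy.optimize import minimize


def logit(p):
    return np.log(p) - np.log1p(-p)


def calibration_objective(w, scores, is_target, prior):
    """Normalized prior-weighted cross-entropy of an affine calibration/fusion.

    scores: (n_systems, n_trials); w: n_systems weights followed by one offset.
    """
    llr = w[:-1] @ scores + w[-1]
    lo = llr + logit(prior)  # posterior log-odds at the training prior
    tar_cost = np.mean(np.logaddexp(0.0, -lo[is_target]))   # log(1 + exp(-lo))
    non_cost = np.mean(np.logaddexp(0.0, lo[~is_target]))   # log(1 + exp(lo))
    cxe = prior * tar_cost + (1 - prior) * non_cost
    prior_entropy = -prior * np.log(prior) - (1 - prior) * np.log(1 - prior)
    return cxe / prior_entropy


def train_affine_calibration(scores, is_target, prior=0.005):
    scores = np.atleast_2d(scores)
    w0 = np.zeros(scores.shape[0] + 1)  # start from zeros, like fuser.w
    res = minimize(calibration_objective, w0, args=(scores, is_target, prior), method="BFGS")
    return res.x, res.fun


# Hypothetical raw (uncalibrated) scores
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(10.0, 6.0, 500), rng.normal(-15.0, 6.0, 20000)])
is_target = np.concatenate([np.ones(500, dtype=bool), np.zeros(20000, dtype=bool)])
w, obj = train_affine_calibration(scores, is_target)
print(f"Scaling: {w[0]:.4f}, Offset: {w[1]:.4f}, obj = {obj:.4f}")
```

Passing a `(2, n_trials)` array instead of a 1-D one gives the two-system fusion case used in Section 5.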

## 3. Calibrate input scores

Create a `LinearFuser` that calibrates the input scores with the trained weights `w`.

In [58]:
train_fuser = LinearFuser(scores=train_scores, w=w)
train_fused_scores = train_fuser.fusion()
train_c = cllr(train_fused_scores[:ntar], train_fused_scores[ntar:])
train_c_min = min_cllr(train_fused_scores[:ntar], train_fused_scores[ntar:])
logging.info('train Cxe = {}, cllr: {}, min: {}'.format(train_cxe, train_c, train_c_min))
train_fuser
train_fused_scores
train_c
train_c_min

2019-10-11 13:01:52.545 INFO     train Cxe = 0.181552218067861, cllr: 0.17854079981480248, min: 0.12304418931467008


<pybosaris.calibration.linear_fuser.LinearFuser at 0x121cdb5f8>

array([ 12.74475409,   1.75607016,   4.80392676, ...,  -6.83571801,
       -10.0726605 ,  -6.42719851])

0.17854079981480248

0.12304418931467008
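Cllr, which pybosaris' `cllr` reports, is the same kind of cross-entropy evaluated at P_tar = 0.5 and expressed in bits, which is why it differs from the train Cxe (evaluated at `prior = 0.005`). A minimal sketch of the standard formula follows; `cllr_sketch` is a hypothetical name. `min_cllr` additionally applies the optimal monotonic (PAV) recalibration before computing Cllr, which is not shown here.

```python
import numpy as np


def cllr_sketch(tar_llr, non_llr):
    """Cllr in bits: 0.5 * (mean log2(1 + e^-tar) + mean log2(1 + e^non))."""
    c_tar = np.mean(np.logaddexp(0.0, -tar_llr)) / np.log(2)
    c_non = np.mean(np.logaddexp(0.0, non_llr)) / np.log(2)
    return 0.5 * (c_tar + c_non)


# An uninformative system (all LLRs = 0) and a well-calibrated, confident one
print(cllr_sketch(np.zeros(10), np.zeros(10)))
print(cllr_sketch(np.array([5.0, 8.0]), np.array([-6.0, -4.0])))
```

With the notebook's arrays, `cllr_sketch(train_fused_scores[:ntar], train_fused_scores[ntar:])` should be comparable to `train_c`.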

## 4. Analysis

In [59]:
def calculate_eer(y, y_score, pos):
    # y: ground-truth labels; y_score: detection scores;
    # pos: label of the target (positive) class.
    fpr, tpr, thresholds = roc_curve(y, y_score, pos_label=pos)
    # EER: the point where the miss rate (1 - tpr) equals the false-positive rate
    eer = brentq(lambda x : 1. - x - interp1d(fpr, tpr)(x), 0., 1.)
    thresh = interp1d(fpr, thresholds)(eer)

    return eer, thresh, fpr, tpr

# Creates a list of false-negative rates, a list of false-positive rates
# and a list of decision thresholds that give those error-rates.
def ComputeErrorRates(scores, labels):

    # Sort the scores from smallest to largest, and also get the corresponding
    # indexes of the sorted scores.  We will treat the sorted scores as the
    # thresholds at which the error-rates are evaluated.
    sorted_indexes, thresholds = zip(*sorted(
        [(index, threshold) for index, threshold in enumerate(scores)],
        key=itemgetter(1)))
    labels = [labels[i] for i in sorted_indexes]
    fnrs = []
    fprs = []

    # At the end of this loop, fnrs[i] is the number of target trials with
    # scores <= thresholds[i] (misses), and fprs[i] is the number of
    # non-target trials with scores <= thresholds[i] (correct rejections).
    for i in range(0, len(labels)):
        if i == 0:
            fnrs.append(labels[i])
            fprs.append(1 - labels[i])
        else:
            fnrs.append(fnrs[i-1] + labels[i])
            fprs.append(fprs[i-1] + 1 - labels[i])
    fnrs_norm = sum(labels)
    fprs_norm = len(labels) - fnrs_norm

    # Divide by the total number of target trials to obtain the
    # false-negative rates across all thresholds.
    fnrs = [x / float(fnrs_norm) for x in fnrs]

    # Divide by the total number of non-target trials to get the
    # true-negative rates, then subtract from 1 to get the
    # false-positive rates.
    fprs = [1 - x / float(fprs_norm) for x in fprs]
    return fnrs, fprs, thresholds

# Computes the minimum of the detection cost function.  The comments refer to
# equations in Section 3 of the NIST 2016 Speaker Recognition Evaluation Plan.
def ComputeMinDcf(fnrs, fprs, thresholds, p_target, c_miss, c_fa):
    min_c_det = float("inf")
    min_c_det_threshold = thresholds[0]
    for i in range(0, len(fnrs)):
        # See Equation (2).  it is a weighted sum of false negative
        # and false positive errors.
        c_det = c_miss * fnrs[i] * p_target + c_fa * fprs[i] * (1 - p_target)
        if c_det < min_c_det:
            min_c_det = c_det
            min_c_det_threshold = thresholds[i]
    # See Equations (3) and (4).  Now we normalize the cost.
    c_def = min(c_miss * p_target, c_fa * (1 - p_target))
    min_dcf = min_c_det / c_def
    return min_dcf, min_c_det_threshold

def compute_actual_cost(scores, labels, p_target, c_miss=1, c_fa=1):
    beta = c_fa * (1 - p_target) / (c_miss * p_target)
    decisions = (scores >= np.log(beta)).astype('i')
    num_targets = np.sum(labels)
    fp = np.sum(decisions * (1 - labels))
    num_nontargets = np.sum(1 - labels)
    fn = np.sum((1 - decisions) * labels)
    fpr = fp / num_nontargets if num_nontargets > 0 else np.nan
    fnr = fn / num_targets if num_targets > 0 else np.nan
    print("act_C : {0:.4f}, at threshold {1:.4f} (p-target={2}, c-miss={3}, "
    "c-fa={4})".format(fnr + beta * fpr, np.log(beta), p_target, c_miss, c_fa))
    return fnr + beta * fpr, np.log(beta), fpr, fnr
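`ComputeErrorRates` loops over all ~350k trials in pure Python. A vectorized NumPy sketch of the same computation follows; `min_dcf_vectorized` is a hypothetical name, and it is meant to reproduce `ComputeErrorRates` followed by `ComputeMinDcf` when labels are in {0, 1} (stable sort for ties, first minimum on equal costs).

```python
import numpy as np


def min_dcf_vectorized(scores, labels, p_target, c_miss=1, c_fa=1):
    """Normalized minDCF and its threshold; labels are 1 (target) / 0 (non-target)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Stable sort, so ties keep their original order as with sorted() above
    order = np.argsort(scores, kind="mergesort")
    sorted_scores, sorted_labels = scores[order], labels[order]
    # fnrs[i]: fraction of targets with score <= sorted_scores[i]
    fnrs = np.cumsum(sorted_labels) / sorted_labels.sum()
    # fprs[i]: fraction of non-targets with score > sorted_scores[i]
    fprs = 1 - np.cumsum(1 - sorted_labels) / (1 - sorted_labels).sum()
    c_det = c_miss * fnrs * p_target + c_fa * fprs * (1 - p_target)
    i = np.argmin(c_det)  # first minimum, like the strict "<" in ComputeMinDcf
    c_def = min(c_miss * p_target, c_fa * (1 - p_target))
    return c_det[i] / c_def, sorted_scores[i]


# Hypothetical demo data; with the notebook's arrays (after classf is mapped to {0, 1}):
# min_dcf_vectorized(train_fused_scores, classf, p_target=0.05)
rng = np.random.default_rng(0)
demo_scores = np.concatenate([rng.normal(2, 1, 50), rng.normal(-2, 1, 500)])
demo_labels = np.concatenate([np.ones(50), np.zeros(500)])
print(min_dcf_vectorized(demo_scores, demo_labels, p_target=0.05))
```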

In [60]:
import os
import sys
lib_path = '/Volumes/dfs/gen/Misc/data19/projets/multi/sre19_multimedia_asv/multimedia_scoring_software'
sys.path.append(lib_path)
import sre_scorer as sc
import scoring_utils as st

def compute_equalized_act_cost(scores, tar_nontar_labs,
                               p_target, c_miss=1, c_fa=1):
    act_c = 0.
    for p_t in p_target:
        beta = c_fa * (1 - p_t) / (c_miss * p_t)
        _, fpr, fnr = sc.compute_actual_cost(scores,
                                             tar_nontar_labs, p_t,
                                             c_miss, c_fa)
        # Average the rates, skipping NaN entries
        fnr_avg = 0. if np.all(np.isnan(fnr)) else np.nanmean(fnr)
        fpr_avg = 0. if np.all(np.isnan(fpr)) else np.nanmean(fpr)
        act_c += fnr_avg + beta * fpr_avg
    return act_c / len(p_target)

def compute_act_cost(scores, tar_nontar_labs, p_target,
                     c_miss=1, c_fa=1):
    act_c = 0.
    for p_t in p_target:
        # Use the caller's c_miss / c_fa
        act_c += sc.compute_actual_cost(scores, tar_nontar_labs, p_t,
                                        c_miss=c_miss, c_fa=c_fa)[0]
    return act_c / len(p_target)

def compute_min_cost(scores, labels, ptar, weights=None):
    fnr, fpr = sc.compute_pmiss_pfa_rbst(scores, labels, weights)
    eer = sc.compute_eer(fnr, fpr)
    min_c = 0.
    for pt in ptar:
        min_c += sc.compute_c_norm(fnr, fpr, pt)
    return eer, min_c / len(ptar)

### 4.1 Calibrated

#### vastMinDCF

In [61]:
# c_miss = 1
# c_fa = 1
# p_target = 0.05
classf[classf == -1] = 0

# fnrs, fprs, thresholds = ComputeErrorRates(train_fused_scores, 
#                                            classf)
# mindcf, threshold = ComputeMinDcf(fnrs, fprs, thresholds, p_target, c_miss, c_fa)

# print("minDCF : {0:.4f}, at threshold {1:.4f} (p-target={2}, c-miss={3}, "
#     "c-fa={4})\n".format(mindcf, threshold, p_target,c_miss, c_fa))

vast_eer, vast_min_c = compute_min_cost(train_fused_scores, classf, [0.05])
print("vastMinDCF : {0:.6f}".format(vast_min_c))
print('EER : %.6f' % vast_eer)  # EER as a fraction (multiply by 100 for %)

vastMinDCF : 0.141142
EER : 0.030019


#### vastActDCF

In [62]:
vast_act_c = compute_act_cost(train_fused_scores, classf, [0.05])
print("vast16ActDCF : {0:.6f}".format(vast_act_c))

vast16ActDCF : 0.156545


#### sre16MinDCF

In [63]:
eer, min_c = compute_min_cost(train_fused_scores, classf, [0.01, 0.005])
print("sre16MinDCF : {0:.6f}".format(min_c))
print('EER : %.6f' % eer)  # EER as a fraction (multiply by 100 for %)

sre16MinDCF : 0.250086
EER : 0.030019


#### sre16ActDCF

In [64]:
# C_norm, thresh, fpr, fnr = compute_actual_cost(train_fused_scores, classf, 0.05)
# C_norm1, thresh1, fpr, fnr = compute_actual_cost(train_fused_scores, classf, 0.01)
# C_norm2, thresh2, fpr, fnr = compute_actual_cost(train_fused_scores, classf, 0.005)
# C_primary = (C_norm1 + C_norm2) / 2
# print("C_primary : {0:.4f}\n".format(C_primary))

act_c = compute_act_cost(train_fused_scores, classf, [0.01, 0.005]) # Same as compute_equalized_act_cost
act_c = compute_equalized_act_cost(train_fused_scores, classf, [0.01, 0.005])
print("sre16ActDCF : {0:.6f}".format(act_c))

sre16ActDCF : 0.273762


### 4.2 Uncalibrated

#### vastMinDCF

In [65]:
# c_miss = 1
# c_fa = 1
# p_target = 0.05
classf[classf == -1] = 0

# fnrs, fprs, thresholds = ComputeErrorRates(train_scores, 
#                                            classf)
# mindcf, threshold_un = ComputeMinDcf(fnrs, fprs, thresholds, p_target, c_miss, c_fa)

# print("minDCF : {0:.4f}, at threshold {1:.4f} (p-target={2}, c-miss={3}, "
#     "c-fa={4})\n".format(mindcf, threshold, p_target,c_miss, c_fa))

vast_eer, vast_min_c = compute_min_cost(train_scores, classf, [0.05])
print("vastMinDCF : {0:.6f}".format(vast_min_c))
print('EER : %.6f' % vast_eer)  # EER as a fraction (multiply by 100 for %)

vastMinDCF : 0.141142
EER : 0.030019


#### vastActDCF

In [66]:
vast_act_c = compute_act_cost(train_scores, classf, [0.05])
print("vast16ActDCF : {0:.6f}".format(vast_act_c))

vast16ActDCF : 0.166403


#### sre16MinDCF

In [67]:
eer, min_c = compute_min_cost(train_scores, classf, [0.01, 0.005])
print("sre16MinDCF : {0:.6f}".format(min_c))
print('EER : %.6f' % eer)  # EER as a fraction (multiply by 100 for %)

sre16MinDCF : 0.250086
EER : 0.030019


#### sre16ActDCF

In [68]:
# C_norm, thresh, fpr, fnr = compute_actual_cost(train_scores, classf, 0.05)
# C_norm1, thresh1_un, fpr, fnr = compute_actual_cost(train_scores, classf, 0.01)
# C_norm2, thresh2_un, fpr, fnr = compute_actual_cost(train_scores, classf, 0.005)
# C_primary = (C_norm1 + C_norm2) / 2
# print("C_primary : {0:.4f}\n".format(C_primary))

act_c = compute_act_cost(train_scores, classf, [0.01, 0.005]) # Same as compute_equalized_act_cost
act_c = compute_equalized_act_cost(train_scores, classf, [0.01, 0.005])
print("sre16ActDCF : {0:.6f}".format(act_c))

sre16ActDCF : 0.540947


### 4.3 Plot

In [None]:
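# Thresholds for the plot (the cells that defined them above are commented out)
fnrs, fprs, thr = ComputeErrorRates(train_scores, classf)
_, threshold_un = ComputeMinDcf(fnrs, fprs, thr, 0.05, 1, 1)
fnrs, fprs, thr = ComputeErrorRates(train_fused_scores, classf)
_, threshold = ComputeMinDcf(fnrs, fprs, thr, 0.05, 1, 1)
# The actual-cost threshold is log(beta), which does not depend on the scores
thresh1 = thresh1_un = np.log((1 - 0.01) / 0.01)
thresh2 = thresh2_un = np.log((1 - 0.005) / 0.005)

plt.figure(figsize=(14,10),)
fig = plt.subplot(211)
plt.hist(train_scores[:ntar], bins=200, alpha=0.5,)
plt.hist(train_scores[ntar:], bins=200, alpha=0.5,)
plt.text(threshold_un+0.25, 35000, 'minDCF', fontsize=14, verticalalignment='top')
plt.text(thresh1_un+0.25, 30000, 'act_C (p-target=0.01)', c='green', fontsize=14, verticalalignment='top')
plt.text(thresh2_un+0.25, 25000, 'act_C (p-target=0.005)', c='red', fontsize=14, verticalalignment='top')
plt.axvline(x=threshold_un, c='black', linestyle='--', lw=0.75, label='minDCF: '+str(np.round(threshold_un,4)))
plt.axvline(x=thresh1_un, c='green', linestyle='--', lw=0.75, label='act_C (p-target=0.01): '+str(np.round(thresh1_un,4)))
plt.axvline(x=thresh2_un, c='red', linestyle='--', lw=0.75, label='act_C (p-target=0.005): '+str(np.round(thresh2_un,4)))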
plt.title('Uncalibrated Scores')
plt.grid()
# plt.xlim(-15,20);

fig = plt.subplot(212)
plt.hist(train_fused_scores[:ntar], bins=200, alpha=0.5,)
plt.hist(train_fused_scores[ntar:], bins=200, alpha=0.5,)
plt.text(threshold+0.25, 35000, 'minDCF', fontsize=14, verticalalignment='top')
plt.text(thresh1+0.25, 30000, 'act_C (p-target=0.01)', c='green', fontsize=14, verticalalignment='top')
plt.text(thresh2+0.25, 25000, 'act_C (p-target=0.005)', c='red', fontsize=14, verticalalignment='top')
plt.axvline(x=threshold, c='black', linestyle='--', lw=0.75, label='minDCF: '+str(np.round(threshold,4)))
plt.axvline(x=thresh1, c='green', linestyle='--', lw=0.75, label='act_C (p-target=0.01): '+str(np.round(thresh1,4)))
plt.axvline(x=thresh2, c='red', linestyle='--', lw=0.75, label='act_C (p-target=0.005): '+str(np.round(thresh2,4)))
plt.title('Calibrated Scores')
plt.grid()
plt.tight_layout();
# plt.legend()
# plt.xlim(-15,20);

## 5. System Fusing

### 5.1 Preprocess training data

In [None]:
# Load initial scores with labels
dev_scores_labels = pd.read_csv(os.path.join(DATA_PATH_v1, 'js4_scores_sre19_dev_test_cmn2_adapt.txt'), sep=' ', header=None)
dev_scores_labels.columns = ['speaker', 'test', 'score', 'label']
dev_scores_labels = dev_scores_labels.replace({'nontarget': 0, 'target': 1})

# Load actual scores without labels
dev_scores = pd.read_csv(os.path.join(DATA_PATH_2, 'scores_sre19_dev_test_cmn2_adapt'), sep=" ", header=None)
dev_scores.columns = ['speaker', 'test', 'score']

# Labels are missing so we will join the labels from the initial scores
scores = dev_scores.merge(dev_scores_labels[['speaker', 'test', 'label']], on=['speaker', 'test'])
scores.shape
scores.head()

In [None]:
# Split data in target and non-target trials
tar = np.array(scores[scores.label == 1]['score'])
non = np.array(scores[scores.label == 0]['score'])

# Instantiate a LinearFuser
# Function handle for function that must be trained
# Get a starting point for the calibration weights: 'w0',
# which are zeros by default
train_scores = np.hstack((tar, non))
fuser = LinearFuser(scores=train_scores)

# Create label vector
# Let the trainer know which scores are target scores and which are non-target scores. 
ntar = tar.shape[0]
nnon = non.shape[0]
classf = np.hstack((np.ones(ntar), -np.ones(nnon)))

### 5.2 Train linear classifier

Train the binary classifier to obtain the calibration weights `w`.

In [None]:
# Taken from `/misc/scratch05/patx/alamja/sre19works/fusion_stuffs/scripts_for_calibration_fusion/fusion_sre19_cmn2_CRIM_v0.m`
prior = 0.005
maxiters = 50
quiet = False
objfun = None
w0 = fuser.w
w, train_cxe, w_pen, optimizerState, converged = train_binary_classifier(
    classifier=fuser, classf=classf, w0=w0,
    objective_function=objfun, prior=prior,
    penalizer=None, penalizer_weight=0,
    maxiters=maxiters, maxCG=100, optimizerState=None, quiet=quiet, cstepHessian=True)
print(f'Scaling: {w[0]}, Offset: {w[1]}')
train_cxe, w_pen, optimizerState, converged

### 5.3 Calibrate input scores

Create a `LinearFuser` that calibrates the input scores with the trained weights `w`.

In [None]:
train_fuser = LinearFuser(scores=train_scores, w=w)
train_fused_scores_2 = train_fuser.fusion()
train_c = cllr(train_fused_scores_2[:ntar], train_fused_scores_2[ntar:])
train_c_min = min_cllr(train_fused_scores_2[:ntar], train_fused_scores_2[ntar:])
logging.info('train Cxe = {}, cllr: {}, min: {}'.format(train_cxe, train_c, train_c_min))
train_fuser
train_fused_scores_2
train_c
train_c_min

### 5.4 Analysis

In [None]:
eer, thresh, fpr, tpr = calculate_eer(classf, train_fused_scores_2, pos=1)
print('EER : %.2f%%'%(eer*100))

In [None]:
c_miss = 1
c_fa = 1
p_target = 0.05
classf[classf == -1] = 0

fnrs, fprs, thresholds = ComputeErrorRates(train_fused_scores_2, 
                                           classf)
mindcf, threshold = ComputeMinDcf(fnrs, fprs, thresholds, p_target, c_miss, c_fa)

print("minDCF : {0:.4f}, at threshold {1:.4f} (p-target={2}, c-miss={3}, "
    "c-fa={4})".format(mindcf, threshold, p_target,c_miss, c_fa))

In [None]:
C_norm1, thresh1, fpr, fnr = compute_actual_cost(train_fused_scores_2, classf, 0.01)
C_norm2, thresh2, fpr, fnr = compute_actual_cost(train_fused_scores_2, classf, 0.005)
C_primary = (C_norm1 + C_norm2) / 2
print("C_primary : {0:.4f}".format(C_primary))

### 5.5 Merge scores

In [None]:
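# Divide the scores of each system into target and non-target trials.
# Both score arrays must cover the same trials in the same order, and
# classf must label those trials; otherwise the fusion below mixes trials.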
tar = train_fused_scores[classf==1]
non = train_fused_scores[classf==0]
tar2 = train_fused_scores_2[classf==1]
non2 = train_fused_scores_2[classf==0]

# Merge target and non-target instances of each system
# Merge to single scores array
tar = np.vstack((tar, tar2))
non = np.vstack((non, non2))
merged_train_scores = np.vstack((tar.T, non.T)).T

# Instantiate Fuser
merged_fuser = LinearFuser(scores=merged_train_scores)

# Labels
ntar = tar.shape[1]
nnon = non.shape[1]
merged_classf = np.hstack((np.ones(ntar), -np.ones(nnon)))
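Fusion only makes sense if every system's score refers to the same trial. A minimal sketch of how two (or more) systems could be aligned on `(speaker, test)` with pandas before stacking; `align_systems` is a hypothetical helper, and the inputs are assumed to look like `dev_scores` (columns `speaker`, `test`, `score`) and `dev_scores_labels` with labels already mapped to 0/1. An inner merge keeps the order of the left (key) frame.

```python
import numpy as np
import pandas as pd


def align_systems(system_scores, keys):
    """Stack several systems' scores on a common, labelled trial list.

    Returns an (n_systems, n_trials) score array, the 0/1 labels and the trial ids.
    """
    merged = keys[["speaker", "test", "label"]]
    for k, df in enumerate(system_scores):
        renamed = df[["speaker", "test", "score"]].rename(columns={"score": f"score_{k}"})
        # Inner join: keep only trials scored by every system (and present in the key)
        merged = merged.merge(renamed, on=["speaker", "test"], how="inner", validate="one_to_one")
    score_cols = [f"score_{k}" for k in range(len(system_scores))]
    scores = merged[score_cols].to_numpy().T
    labels = merged["label"].to_numpy()
    return scores, labels, merged[["speaker", "test"]]


# Toy example (hypothetical trials, deliberately in different orders)
keys = pd.DataFrame({"speaker": ["a", "a", "b"], "test": ["x", "y", "x"], "label": [1, 0, 0]})
sys1 = pd.DataFrame({"speaker": ["b", "a", "a"], "test": ["x", "y", "x"], "score": [-1.0, -2.0, 3.0]})
sys2 = pd.DataFrame({"speaker": ["a", "b"], "test": ["x", "x"], "score": [0.7, -0.2]})
scores, labels, trials = align_systems([sys1, sys2], keys)
print(scores)
print(labels)
```

If you reuse the `[:ntar]` / `[ntar:]` slicing from the cells above, reorder so that target trials come first, e.g. with `order = np.argsort(-labels, kind="mergesort")`.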

### 5.6 Train Linear Classifier

In [None]:
prior = 0.005
maxiters = 50
quiet = False
objfun = None
w0 = merged_fuser.w
w, train_cxe, w_pen, optimizerState, converged = train_binary_classifier(
    classifier=merged_fuser, classf=merged_classf, w0=w0,
    objective_function=objfun, prior=prior,
    penalizer=None, penalizer_weight=0,
    maxiters=maxiters, maxCG=100, optimizerState=None, quiet=quiet, cstepHessian=True)
print(f'Weights: {w[:-1]}, Offset: {w[-1]}')
train_cxe, w_pen, optimizerState, converged

### 5.7 Calibrate input scores

In [None]:
merged_train_fuser = LinearFuser(scores=merged_train_scores, w=w)
merged_train_fused_scores = merged_train_fuser.fusion()
train_c = cllr(merged_train_fused_scores[:ntar], merged_train_fused_scores[ntar:])
train_c_min = min_cllr(merged_train_fused_scores[:ntar], merged_train_fused_scores[ntar:])
logging.info('train Cxe = {}, cllr: {}, min: {}'.format(train_cxe, train_c, train_c_min))
merged_train_fuser
merged_train_fused_scores
train_c
train_c_min

### 5.8 Analysis

In [None]:
eer, thresh, fpr, tpr = calculate_eer(merged_classf, merged_train_fused_scores, pos=1)
print('EER : %.2f%%'%(eer*100))

In [None]:
c_miss = 1
c_fa = 1
p_target = 0.05
merged_classf[merged_classf == -1] = 0

fnrs, fprs, thresholds = ComputeErrorRates(merged_train_fused_scores, 
                                           merged_classf)
mindcf, threshold = ComputeMinDcf(fnrs, fprs, thresholds, p_target, c_miss, c_fa)

print("minDCF : {0:.4f}, at threshold {1:.4f} (p-target={2}, c-miss={3}, "
    "c-fa={4})".format(mindcf, threshold, p_target,c_miss, c_fa))

In [None]:
C_norm1, thresh1, fpr, fnr = compute_actual_cost(merged_train_fused_scores, merged_classf, 0.01)
C_norm2, thresh2, fpr, fnr = compute_actual_cost(merged_train_fused_scores, merged_classf, 0.005)
C_primary = (C_norm1 + C_norm2) / 2
print("C_primary : {0:.4f}".format(C_primary))

In [None]:
plt.figure(figsize=(14,10),)
fig = plt.subplot(111)
plt.hist(merged_train_fused_scores[:ntar], bins=200, alpha=0.5,)
plt.hist(merged_train_fused_scores[ntar:], bins=200, alpha=0.5,)
plt.text(threshold+0.25, 35000, 'minDCF', fontsize=14, verticalalignment='top')
plt.text(thresh1+0.25, 30000, 'act_C (p-target=0.01)', c='green', fontsize=14, verticalalignment='top')
plt.text(thresh2+0.25, 25000, 'act_C (p-target=0.005)', c='red', fontsize=14, verticalalignment='top')
plt.axvline(x=threshold, c='black', linestyle='--', lw=0.75, label='minDCF: '+str(np.round(threshold,4)))
plt.axvline(x=thresh1, c='green', linestyle='--', lw=0.75, label='act_C (p-target=0.01): '+str(np.round(thresh1,4)))
plt.axvline(x=thresh2, c='red', linestyle='--', lw=0.75, label='act_C (p-target=0.005): '+str(np.round(thresh2,4)))
plt.title('Calibrated Scores')
plt.grid()
plt.tight_layout();
# plt.legend()
plt.xlim(-15,20);

### 5.9 Transform eval set

In [None]:
# eval_scores = pd.read_csv(os.path.join(DATA_PATH, 'js4_scores_sre19_eval_test_cmn2_adapt.txt'), sep=' ', header=None)
eval_scores = pd.read_csv(os.path.join(DATA_PATH, 'scores_sre19_eval_test_cmn2'), sep=' ', header=None)
eval_scores.columns = ['speaker', 'test', 'score']
eval_scores.shape
eval_scores.head()

eval_scores2 = pd.read_csv(os.path.join(DATA_PATH_2, 'scores_sre19_eval_test_cmn2_adapt'), sep=' ', header=None)
eval_scores2.columns = ['speaker', 'test', 'score']
eval_scores2.shape
eval_scores2.head()

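# Fusion assumes both files list the same trials in the same order
assert eval_scores[['speaker', 'test']].equals(eval_scores2[['speaker', 'test']])
merged_eval_scores = np.vstack((eval_scores.score, eval_scores2.score))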
eval_fuser = LinearFuser(scores=merged_eval_scores, w=w)
merged_eval_fused_scores = eval_fuser.fusion()

In [None]:
plt.figure(figsize=(14,10),)
fig = plt.subplot(311)
plt.hist(eval_scores['score'], bins=200, alpha=0.5,)
# thresh1/thresh2 (Section 5.8) equal log(beta) and do not depend on the scores, so they apply to every panel
plt.text(thresh1+0.25, 50000, 'act_C (p-target=0.01)', c='green', fontsize=14, verticalalignment='top')
plt.text(thresh2+0.25, 40000, 'act_C (p-target=0.005)', c='red', fontsize=14, verticalalignment='top')
plt.axvline(x=thresh1, c='green', linestyle='--', lw=0.75, label='act_C (p-target=0.01): '+str(np.round(thresh1,4)))
plt.axvline(x=thresh2, c='red', linestyle='--', lw=0.75, label='act_C (p-target=0.005): '+str(np.round(thresh2,4)))
plt.title('Uncalibrated Scores: '+DATA_PATH.split('/')[-1])
plt.grid()
# plt.xlim(-15,20);

fig = plt.subplot(312)
plt.hist(eval_scores2['score'], bins=200, alpha=0.5,)
plt.text(thresh1+0.25, 50000, 'act_C (p-target=0.01)', c='green', fontsize=14, verticalalignment='top')
plt.text(thresh2+0.25, 40000, 'act_C (p-target=0.005)', c='red', fontsize=14, verticalalignment='top')
plt.axvline(x=thresh1, c='green', linestyle='--', lw=0.75, label='act_C (p-target=0.01): '+str(np.round(thresh1,4)))
plt.axvline(x=thresh2, c='red', linestyle='--', lw=0.75, label='act_C (p-target=0.005): '+str(np.round(thresh2,4)))
plt.title('Uncalibrated Scores: '+DATA_PATH_2.split('/')[-1])
plt.grid()
# plt.xlim(-15,20);

fig = plt.subplot(313)
plt.hist(merged_eval_fused_scores, bins=200, alpha=0.5,)
plt.text(thresh1+0.25, 50000, 'act_C (p-target=0.01)', c='green', fontsize=14, verticalalignment='top')
plt.text(thresh2+0.25, 40000, 'act_C (p-target=0.005)', c='red', fontsize=14, verticalalignment='top')
plt.axvline(x=thresh1, c='green', linestyle='--', lw=0.75, label='act_C (p-target=0.01): '+str(np.round(thresh1,4)))
plt.axvline(x=thresh2, c='red', linestyle='--', lw=0.75, label='act_C (p-target=0.005): '+str(np.round(thresh2,4)))
plt.title('Calibrated (Fused) Scores')
plt.grid()
plt.tight_layout();
# plt.legend()
# plt.xlim(-15,20);

### 5.10 Save Outputs

In [None]:
save_name = DATA_PATH.split('/')[-1]+'+'+DATA_PATH_2.split('/')[-1]
eval_df = pd.DataFrame(np.vstack((eval_scores.speaker, eval_scores.test, merged_eval_fused_scores))).T
eval_df.to_csv('/Volumes/dfs/gen/Misc/scratch05/tesd/noiseuce/NIST-SRE/'+save_name+'.csv')

In [None]:
!jupyter nbconvert /Users/noiseuce/Documents/NIST-SRE/Calibration-No-Normalization.ipynb --to html --output /Volumes/dfs/gen/Misc/scratch05/tesd/noiseuce/NIST-SRE/$save_name-Calibration_No-Normalization.html

In [None]:
with open('/Volumes/dfs/gen/Misc/scratch05/tesd/noiseuce/NIST-SRE/'+save_name+'.txt', mode='w') as f:
    f.write('train_cxe: {}\nw_pen: {}\noptimizerState: {}\n'.format(train_cxe, w_pen, optimizerState))
    f.write("minDCF : {0:.4f}, at threshold {1:.4f} (p-target={2}, c-miss={3}, "
    "c-fa={4})\n".format(mindcf, threshold, p_target,c_miss, c_fa))
    f.write('EER : %.2f%%\n'%(eer*100))
    f.write("C_primary : {0:.4f}".format(C_primary))