# *Introduction* : 

I recently came across **tomcwalker**'s [Kernel](https://www.kaggle.com/tomcwalker/keras-nn-with-custom-loss-function-for-gini-auc), which introduces a differentiable estimate of the AUC, called the **Rank Statistic**, that can be used as a loss for our models to optimize. I really liked the approach, so I went on to explore the maths involved and how to implement it using the **NumPy** library. The presentation lacks style compared to other notebooks, but this is my first notebook on Kaggle, so feel free to comment and share your thoughts about it.

# AUC Mathematical Definition :

The AUC and the Gini coefficient are quite similar in terms of interpretation (the normalized Gini is $2 \times AUC - 1$): both quantify the power of a classifier to rank observations. Mathematically speaking, the AUC of a classifier is equal to the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example, which can be written:

$$AUC= P( Classifier (x+) > Classifier (x−) )$$

The Classifier here means our predictive function, the function that maps each observation to a score (for example a predicted probability of the positive class).
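
The probabilistic definition can be checked by simulation. The sketch below is only an illustration: the score distributions (two Gaussians) and the helper name `sampled_pair_probability` are assumptions made for the example, not part of the original kernel. It draws random (positive, negative) pairs and counts how often the positive one gets the higher score.

```python
import numpy as np

def sampled_pair_probability(pos_scores, neg_scores, n_pairs=100_000, seed=0):
    """Estimate P(score(x+) > score(x-)) by sampling random (positive, negative) pairs."""
    rng = np.random.default_rng(seed)
    pos = rng.choice(pos_scores, size=n_pairs)  # random positives, with replacement
    neg = rng.choice(neg_scores, size=n_pairs)  # random negatives, with replacement
    # A tie counts for one half (it never happens with continuous scores)
    return np.mean((pos > neg) + 0.5 * (pos == neg))

# Hypothetical classifier scores: positives tend to score higher than negatives
rng = np.random.default_rng(42)
pos_scores = rng.normal(loc=1.0, scale=1.0, size=500)
neg_scores = rng.normal(loc=0.0, scale=1.0, size=5000)

estimate = sampled_pair_probability(pos_scores, neg_scores)
print('Sampled estimate of the AUC :', estimate)
```

The estimate lands well above 0.5 because the made-up positives score higher on average; a classifier that scores at random would give a value close to 0.5.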

# The Empirical AUC Metric

To calculate the $AUC= P( Classifier (x+) > Classifier (x−) )$ from our observations, we need an unbiased estimator that approximates this probability using the data at hand. A good estimator that we can compute from the data is:

$$\hat{AUC} = \frac{1}{PN}\sum_{i=1}^{P}\sum_{j=1}^{N}H(Cl(x_i+) - Cl(x_j-))$$

* **P** and **N** are respectively the number of positive-labeled points and the number of negative-labeled points
* **Cl** is short for the Classifier defined above
* **H** is the **Heaviside** function.
* $x_i+$ and $x_j-$ are respectively positive-labeled observations and negative-labeled observations
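
As a small worked example (the scores are made up for illustration), take $P = 2$ positives scored 0.9 and 0.4, and $N = 3$ negatives scored 0.1, 0.4 and 0.7, with the convention $H(0) = \frac{1}{2}$ that the code in this notebook uses:

$$\hat{AUC} = \frac{1}{2 \times 3}\Big[\underbrace{H(0.8) + H(0.5) + H(0.2)}_{\text{positive scored } 0.9} + \underbrace{H(0.3) + H(0) + H(-0.3)}_{\text{positive scored } 0.4}\Big] = \frac{3 + 1 + 0.5 + 0}{6} = 0.75$$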

You can already see why the AUC isn't used as a loss function to optimize: the **Heaviside** function isn't differentiable at 0 and its derivative is zero everywhere else, so it cannot be plugged into our gradient-based optimization algorithms. We will discuss this later.

The next blocks will explain how to implement the AUC metric with a loop approach and a broadcasting approach.

In [2]:
import numpy as np

#the true classes of our dataset
y_true = np.random.binomial(n= 1, p= 0.05, size = 5000)
#the predicted probabilities
y_pred = np.random.random(size = 5000)
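#Define the heaviside function with H(0) = 0.5
#(np.vectorize infers its output type from the first value it sees, so an integer first result would truncate the .5)
def heaviside(x):
    return np.heaviside(x, 0.5)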

#the loop implementation that matches with the sum definition
def Loop_AUC(y_true, y_pred):
    #the predictions of our classifier for all the positive labeled data
    pos_pred = y_pred[ y_true == 1]
    P = len(pos_pred) #the number of the positive population
    
    #the predictions of our classifier for all the negative labeled data
    neg_pred = y_pred[ y_true == 0]
    N = len(neg_pred) #the number of the negative population
    
    AUC = 0
    for pos in pos_pred :
        for neg in neg_pred :
            AUC +=  heaviside(pos - neg)
    
    return AUC/(P*N)

print('The AUC of a random Classifier is :', Loop_AUC(y_true, y_pred))

In [3]:
#a more optimized approach that relies on numpy broadcasting, which is much faster than
#the loop version

def Broadcasted_AUC(y_true, y_pred):
    #the predictions of our classifier for all the positive labeled data
    pos_pred = y_pred[ y_true == 1]
    
    #the predictions of our classifier for all the negative labeled data
    neg_pred = y_pred[ y_true == 0]
    #create the (P, N) matrix of pairwise differences between positive and negative predictions
    pairwise_matrix = pos_pred[:, np.newaxis] - neg_pred
    transform = heaviside(pairwise_matrix)
    
    return transform.mean()

print('The AUC of a Random Classifier is :', Broadcasted_AUC(y_true, y_pred))
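
A quick sanity check on cases where the answer is known in advance. The helper `pairwise_auc` below is a stand-alone copy of the broadcasting idea (written with `np.heaviside` so that the snippet runs on its own), and the toy labels and scores are made up for the check.

```python
import numpy as np

def pairwise_auc(y_true, y_pred):
    """Empirical AUC: mean of H(pos - neg) over all (positive, negative) pairs."""
    pos_pred = y_pred[y_true == 1]
    neg_pred = y_pred[y_true == 0]
    # Shape (P, N): one row per positive, one column per negative
    pairwise_matrix = pos_pred[:, np.newaxis] - neg_pred
    return np.heaviside(pairwise_matrix, 0.5).mean()

labels = np.array([0, 0, 0, 1, 1])

perfect = pairwise_auc(labels, np.array([0.1, 0.2, 0.3, 0.8, 0.9]))   # every positive above every negative
inverted = pairwise_auc(labels, np.array([0.8, 0.9, 0.7, 0.1, 0.2]))  # every positive below every negative
constant = pairwise_auc(labels, np.full(5, 0.5))                      # nothing but ties

print(perfect, inverted, constant)  # 1.0 0.0 0.5
```

One thing to keep in mind with the broadcasting approach: the pairwise matrix holds P x N values, so memory grows quickly with the size of the dataset.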

That's cool, right? But how can we define a differentiable estimate? The AUC formula is not differentiable because of the **Heaviside** function (which is not even continuous), as can be seen in the plot below.

In [4]:
import matplotlib.pyplot as plt 
import seaborn as sns
sns.set()

X = np.linspace(-10,10,num= 100)
Y = heaviside(X)

plt.title('the Heaviside Function')
plt.plot(X,Y)
plt.show()
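
The plot shows a function that is flat everywhere except for the jump at 0. The consequence for the AUC can be checked numerically. In the sketch below (the toy data and the helper name `empirical_auc` are assumptions for illustration), nudging a single prediction by a tiny amount leaves the AUC unchanged, so a finite-difference gradient comes out as zero and gives an optimizer nothing to follow.

```python
import numpy as np

def empirical_auc(y_true, y_pred):
    """Empirical AUC computed from the pairwise Heaviside matrix."""
    pos_pred = y_pred[y_true == 1]
    neg_pred = y_pred[y_true == 0]
    return np.heaviside(pos_pred[:, np.newaxis] - neg_pred, 0.5).mean()

rng = np.random.default_rng(0)
y_true = rng.binomial(n=1, p=0.3, size=200)
y_pred = rng.random(size=200)

eps = 1e-9
nudged = y_pred.copy()
nudged[0] += eps  # move one prediction by a tiny amount

# The ranking of the predictions does not change, so neither does the AUC
finite_diff = (empirical_auc(y_true, nudged) - empirical_auc(y_true, y_pred)) / eps
print('Finite-difference gradient of the AUC :', finite_diff)
```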

Thus, we need to replace the Heaviside function with a differentiable approximation. The Heaviside function can be seen as an **Extreme** Sigmoid function, $$ Sigmoid : x,\lambda \to \frac{1}{1 + e^{- \lambda x}} $$ with a very large value of $\lambda$ (called `alpha` in the code).
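
To make the "extreme sigmoid" idea concrete, write $\sigma_\lambda(x) = \frac{1}{1 + e^{-\lambda x}}$ (the notation $\sigma_\lambda$ is introduced here only for convenience). For a fixed $x$, letting $\lambda$ grow recovers the Heaviside function, while for any finite $\lambda$ the derivative exists everywhere:

$$\lim_{\lambda \to +\infty} \sigma_\lambda(x) = \begin{cases} 0 & \text{if } x < 0 \\ \frac{1}{2} & \text{if } x = 0 \\ 1 & \text{if } x > 0 \end{cases} \qquad\qquad \frac{d\sigma_\lambda}{dx}(x) = \lambda\,\sigma_\lambda(x)\,\big(1 - \sigma_\lambda(x)\big)$$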



In [5]:
def param_sigmoid(x,alpha):
    return 1/(1+ np.exp(-alpha*x))

fig ,ax = plt.subplots(ncols=3, sharey = True, figsize = (12,7))

weak, normal_sigmoid, extreme_sigmoid = 0.1, 1, 10

ax[0].plot(X,param_sigmoid(X,weak))
ax[0].set_title('weak sigmoid')

ax[1].plot(X,param_sigmoid(X,normal_sigmoid))
ax[1].set_title('normal sigmoid')

ax[2].plot(X,param_sigmoid(X,extreme_sigmoid))
ax[2].set_title('extreme sigmoid')

plt.show()

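You can see that $\lambda = 10$ is already enough to approximate the **Heaviside** function closely on this range. Keep in mind that the plots span $[-10, 10]$, whereas differences between predicted probabilities lie in $[-1, 1]$, where the approximation is looser. So we're going to define our new loss function, which is called the **Rank Statistic** in this [paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.2.3727&rep=rep1&type=pdf).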
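
Written in the same notation as the empirical AUC, and following the code of the next cell rather than the exact notation of the paper, the quantity computed is the empirical AUC with the Heaviside function swapped for the sigmoid (the symbol $\hat{R}_\lambda$ is a name chosen here, with $\lambda = 10$ in the code):

$$\hat{R}_\lambda = \frac{1}{PN}\sum_{i=1}^{P}\sum_{j=1}^{N}\frac{1}{1 + e^{-\lambda\,\left(Cl(x_i+) - Cl(x_j-)\right)}}$$

Like the AUC, a larger value is better, so a quantity to minimize would be for instance $1 - \hat{R}_\lambda$.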

In [6]:
def Rank_Statistic(y_true, y_pred):
    #the predictions of our classifier for all the positive labeled data
    pos_pred = y_pred[ y_true == 1]
    
    #the predictions of our classifier for all the negative labeled data
    neg_pred = y_pred[ y_true == 0]
    #create the (P, N) matrix of pairwise differences between positive and negative predictions
    pairwise_matrix = pos_pred[:, np.newaxis] - neg_pred
    transform = param_sigmoid(pairwise_matrix, 10)
    
    return transform.mean()

AUC = Broadcasted_AUC(y_true, y_pred)
Rank_stat = Rank_Statistic(y_true, y_pred)

print('The Rank Statistic of Random Classifier is :', Rank_stat)
print("The difference between the AUC and its differentiable estimation is :", np.abs(AUC-Rank_stat))

We can see that the difference isn't huge, so we can try to optimize our models using the Rank Statistic loss. I'll try to extend the notebook with a **TensorFlow** implementation of the **Rank Statistic** so you can tune your neural networks with this loss function.
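
In the meantime, here is a NumPy sketch of what "differentiable" buys us. It is not the TensorFlow implementation mentioned above: the function name `rank_statistic_and_grad` and the toy data are assumptions for this illustration. It computes the gradient of the Rank Statistic with respect to every prediction, using the sigmoid derivative $\lambda\,\sigma(1 - \sigma)$, and compares one coordinate with a finite difference.

```python
import numpy as np

def param_sigmoid(x, alpha):
    return 1 / (1 + np.exp(-alpha * x))

def rank_statistic_and_grad(y_true, y_pred, alpha=10):
    """Return the rank statistic and its gradient with respect to y_pred."""
    pos_mask = y_true == 1
    neg_mask = y_true == 0
    # Pairwise differences, shape (P, N)
    diff = y_pred[pos_mask][:, np.newaxis] - y_pred[neg_mask]
    s = param_sigmoid(diff, alpha)
    # Derivative of the mean of the sigmoids with respect to each pairwise difference
    ds = alpha * s * (1 - s) / diff.size
    grad = np.zeros_like(y_pred)
    grad[pos_mask] = ds.sum(axis=1)   # a positive enters its whole row with a + sign
    grad[neg_mask] = -ds.sum(axis=0)  # a negative enters its whole column with a - sign
    return s.mean(), grad

rng = np.random.default_rng(0)
y_true = rng.binomial(n=1, p=0.3, size=50)
y_pred = rng.random(size=50)

value, grad = rank_statistic_and_grad(y_true, y_pred)

# Compare one coordinate of the analytic gradient with a central finite difference
k, eps = 0, 1e-6
up, down = y_pred.copy(), y_pred.copy()
up[k] += eps
down[k] -= eps
numeric = (rank_statistic_and_grad(y_true, up)[0]
           - rank_statistic_and_grad(y_true, down)[0]) / (2 * eps)

print('analytic :', grad[k], ' numeric :', numeric)
```

Unlike the AUC itself, the gradient is non-zero: it pushes the positive predictions up and the negative predictions down, which is the signal a gradient-based optimizer needs.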

Please share your thoughts about it !

# To Be continued ..