# Evaluation Metrics: GINI & KS 

<img src="img/bannerlogo.jpg" width=400 height=400 align="center">

**Explore Data Science Academy**

**Author:** [Marang Mutloatse](https://www.linkedin.com/in/marangmutloatse/)

**Email:** <marangmutloatse@gmail.com> 

**Last Revision**: 23 August 2020

**version**: 1.2

- **For the best viewing experience, one should [view this notebook in nbviewer].**
- **With no local installation required, you can also [execute the code in this notebook on Binder]**.

## Learning Objectives

[[ go back to the top ]](#Table-of-contents)
- buwbb
- kbfwi
 - iwbqfib
 - oqwifb

## Outline
[[ go back to the top ]](#Table-of-contents)
- fallen heros
- waving not 

## Introduction

[[ go back to the top ]](#Table-of-contents)

Where:
- **TP** = True Positive: the actual class is 1 and the model predicts 1.
- **TN** = True Negative: the actual class is 0 and the model predicts 0.
- **FN** = False Negative: the actual class is 1 but the model predicts 0.
- **FP** = False Positive: the actual class is 0 but the model predicts 1.
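
As a minimal sketch of these four counts, the snippet below tallies them with NumPy for a small pair of hand-made label arrays. The arrays `y_true` and `y_pred` and the helper name `confusion_counts` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 0/1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # actual 1, predicted 1
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # actual 0, predicted 0
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # actual 0, predicted 1
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # actual 1, predicted 0
    return tp, tn, fp, fn


# Hypothetical labels, made up for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # 3 3 1 1
```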


* Accuracy (accuracy) answers the question, *What is the proportion of correct predictions?*

\begin{equation*}
accuracy =
\frac{TP + TN}{TP + TN + FP + FN}
\end{equation*}

* Sensitivity (recall, also called the true positive rate) answers the question, *What proportion of real positives have been correctly predicted?*
\begin{equation*}
recall =
\frac{TP}{TP + FN}
\end{equation*}



* Precision (confidence) answers the question, *What proportion of my positive predictions is correct?*
\begin{equation*}
precision =
\frac{TP}{TP + FP}
\end{equation*}

Note that recall (sensitivity) is a proportion of the real positives, whereas precision is a proportion of the positive predictions.
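
The three formulas above can be sketched in a few lines of Python and checked against scikit-learn's own implementations. The labels below are made up for illustration; only `accuracy_score`, `recall_score` and `precision_score` are real scikit-learn functions.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels: 4 real positives, 6 real negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# Count the four cells of the confusion matrix
pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
recall = tp / (tp + fn)                     # share of real positives that were found
precision = tp / (tp + fp)                  # share of positive predictions that are right

print(f"accuracy={accuracy:.3f} recall={recall:.3f} precision={precision:.3f}")

# The hand-computed values should agree with scikit-learn
print(accuracy_score(y_true, y_pred),
      recall_score(y_true, y_pred),
      precision_score(y_true, y_pred))
```

In this toy example the model finds only half of the real positives (recall), while two out of three of its positive predictions are correct (precision), which is why the two numbers differ.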

### F1-score

The F1-score is a classification metric that calculates a mean of precision and recall in a way that emphasizes the lower of the two values.

It is calculated as the harmonic mean of precision and recall, where an F1-score reaches its best value at 1 (perfect precision and recall) and its worst at 0.
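
A minimal sketch of this behaviour, assuming made-up precision and recall values: both pairs below have the same arithmetic mean, yet the lopsided pair gets a much lower F1-score. The helper `f1_from_precision_recall` is a hypothetical name used only here; `f1_score` is scikit-learn's implementation.

```python
from sklearn.metrics import f1_score


def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Same arithmetic mean (0.5), very different F1-scores
f1_balanced = f1_from_precision_recall(0.5, 0.5)
f1_lopsided = f1_from_precision_recall(0.9, 0.1)
print(f"balanced: {f1_balanced:.2f}, lopsided: {f1_lopsided:.2f}")

# Cross-check on hypothetical labels: precision = 2/3, recall = 1/2
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
f1_manual = f1_from_precision_recall(2 / 3, 1 / 2)
f1_sklearn = f1_score(y_true, y_pred)
print(f"manual: {f1_manual:.4f}, scikit-learn: {f1_sklearn:.4f}")
```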


### Harmonic Average

The harmonic mean is defined as the inverse of the arithmetic mean of the inverses. Because of that, the result is not sensitive to extremely large values and is instead pulled towards the smallest value.
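
The definition translates almost word for word into NumPy. The sketch below uses made-up numbers and a hypothetical helper named `harmonic_mean`; it assumes all inputs are strictly positive, since the inverse of zero is undefined.

```python
import numpy as np


def harmonic_mean(values):
    """Inverse of the arithmetic mean of the inverses (positive inputs only)."""
    values = np.asarray(values, dtype=float)
    return 1.0 / np.mean(1.0 / values)


# One very large value dominates the arithmetic mean but barely moves the harmonic mean
data = [1.0, 2.0, 1000.0]
arithmetic = float(np.mean(data))
harmonic = float(harmonic_mean(data))
print(f"arithmetic: {arithmetic:.2f}, harmonic: {harmonic:.2f}")

# For two values such as precision and recall, the result sits close to the smaller one
pair = float(harmonic_mean([0.2, 0.8]))
print(f"harmonic mean of 0.2 and 0.8: {pair:.2f}")
```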



## Installation and Required Libraries

[[ go back to the top ]](#Table-of-contents)

This notebook is based on *Python 3.x* and uses several Python packages that come standard with the Anaconda Python distribution. The primary libraries that we'll be using are:

- [NumPy](https://numpy.org/install/)
- [pandas](https://pandas.pydata.org/pandas-docs/version/0.17.0/install.html)
- [scikit-learn](https://scikit-learn.org/stable/install.html)
- [matplotlib](https://matplotlib.org/3.3.1/users/installing.html)
- [warnings](https://docs.python.org/3/library/warnings.html) (part of the Python standard library, so it needs no installation)

To ensure you have all of the third-party packages needed to run the notebook, install them with `conda`:

    conda install numpy pandas scikit-learn matplotlib

`Conda` may prompt an update if the most recent version is not installed. Allow it to do so.

**Note:** Ideally in an end-to-end Machine learning project, one would create an [environment](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) with the packages and the correct versions for the specific model. To see more about MLOps and the machine learning pipeline, read [here](https://christophergs.com/machine%20learning/2019/03/17/how-to-deploy-machine-learning-models/).

## Import Libraries

[[ go back to the top ]](#Table-of-contents)

In [9]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

#Suppressing warnings
import warnings
warnings.filterwarnings('ignore')

## Data Check

[[ go back to the top ]](#Table-of-contents)

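In [10]:
# The gist page URL returns HTML, which pandas cannot parse as CSV
# (ParserError: Error tokenizing data). Appending "/raw" requests the file
# itself; this assumes the gist holds a single CSV file.
df = pd.read_csv("https://gist.github.com/maz2198/bddb267dcc4ee16f029e74aaea196900/raw")
# Use the first column as the index, then drop it from the columns
df.index = df.iloc[:,0]
df = df.drop(df.columns[0], axis=1)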


## Train-test-split

[[ go back to the top ]](#Table-of-contents)

In [7]:
import pandas as pd
from sklearn.metrics import roc_auc_score


def calculate_gini_score(a, b):
    """Return the GINI coefficient, 2 * AUC - 1.

    a: binary variable representing 0=good and 1=bad.
    b: prediction of `a`; it can be continuous, integer or binary
       (continuous is better).
    """
    return 2 * roc_auc_score(a, b) - 1


def calculate_KS(b, a):
    """Return the KS statistic: the largest gap between the cumulative
    share of bads and the cumulative share of goods, ordered by score.

    b: binary variable representing 0=good and 1=bad.
    a: prediction of `b`; it can be continuous, integer or binary
       (continuous is better).
    """
    tot_bads = float(sum(b))
    tot_goods = float(len(b) - tot_bads)
    if tot_bads == 0 or tot_goods == 0:
        # KS is undefined when only one class is present
        return 0
    elements_df = pd.DataFrame({'gbi': list(b), 'probability': list(a)})
    # For each distinct score, in ascending order: number of bads and number of rows
    grouped = elements_df.groupby('probability')['gbi'].agg(['sum', 'count'])
    cum_perc_bads = grouped['sum'].cumsum() / tot_bads
    cum_perc_goods = (grouped['count'] - grouped['sum']).cumsum() / tot_goods
    # The KS statistic is the largest absolute gap between the two cumulative curves
    return (cum_perc_bads - cum_perc_goods).abs().max()
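
As a cross-check on the two metrics, the sketch below computes them on a small made-up sample using only scikit-learn. It relies on two standard identities: GINI equals `2 * AUC - 1`, and the KS statistic equals the largest gap between the true positive rate and the false positive rate along the ROC curve. This is an alternative route via `roc_curve`, not the notebook's own implementation, and the arrays `y_true` and `y_score` are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical sample: 0 = good, 1 = bad, with a predicted probability of "bad"
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.80, 0.90])

# GINI: rescales the AUC so that a random model scores 0 and a perfect one scores 1
auc = roc_auc_score(y_true, y_score)
gini = 2 * auc - 1

# KS: largest vertical distance between the TPR and FPR curves over all thresholds
fpr, tpr, thresholds = roc_curve(y_true, y_score)
ks = float(np.max(np.abs(tpr - fpr)))

print(f"AUC={auc:.4f} GINI={gini:.4f} KS={ks:.4f}")
```

Here 14 of the 15 (bad, good) pairs are ranked correctly, so the AUC is 14/15, and the widest gap occurs at the 0.40 threshold, where every bad but only one of the five goods is flagged.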

## Conclusion

[[ go back to the top ]](#Table-of-contents)

## Further Reading

[[ go back to the top ]](#Table-of-contents)

## Optional Exercises

[[ go back to the top ]](#Table-of-contents)

1. Knowing what we know about `GINI` and `KS`, which metric (**f1**, **accuracy**, **mcc**, etc.) would be appropriate to use when performing cross-validation? If possible, use several different classification models (Random Forest, Logit and Decision Tree) and [plot_comparisons](https://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html). **Here is some code and a few examples on plotting the [roc-auc curve](https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html).**

2. Is the use of `GINI` and `KS` effective in multiclass classification problems, and if so, how? **Note: Focus on a specific industry and use case; this will make the question easier to answer.**