# Performance on Ethical Supervised Learning with TransRisk Scores

<hr/>

## Overview:
- Measuring performance of supervised learning predictors that are:
    - classification problems
    - binary predictors
- Introduction to non-discriminatory supervised learning predictors
- Charting performance of a TransRisk score case study to determine if it passes as non-discriminatory

<hr/>

## Part 1: Measuring Performance on Binary Classifiers
While there are many ways to measure the performance of a binary predictor, two metrics are particularly useful when combined:
<ul>
<li><i>Precision</i>:
<br/> - Among the 1's we predict, what fraction were actually 1?
</li>
<li><i>Recall</i>:
<br/> - Among all of the actual 1's, what fraction did we predict as 1?
</li>
</ul>
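
To make the two definitions concrete, here is a minimal sketch that computes both metrics from toy labels. The function name `precision_recall` and the example data are made up for illustration and are not part of the TransRisk dataset. Returning 0.0 when a metric is undefined is a convention chosen for this sketch.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted 1, actually 1
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted 1, actually 0
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted 0, actually 1
    # Assumption for this sketch: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

Precision penalizes false positives, while recall penalizes false negatives, which is why the two are usually read together.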

<hr/>

## Part 2: A Brief Introduction to Non-Discriminatory Machine Learning Predictors
For companies that use classification-based predictors, the predicted outcome for an individual within a group will sometimes fully determine the decision made for that individual. This needs to be treated particularly carefully when the decision concerns a <i>Social Benefit</i>, e.g. health care, loan approval, or college admission. What if the data used to train the model is inherently discriminatory? What if the factors that created the data were inherently discriminatory and we didn't even know? Then the predicted outcome would also be discriminatory.<br/><br/>
This is the problem that non-discriminatory predictors seek to solve. While there are many models to choose from, we will focus on <b>The Equal Opportunity Model</b>, which requires that the true positive rate be the same for each group. What does this mean in terms of performance for binary classifiers? (Write your answer in terms of 1's and 0's below.)

**Write Answer Here:**

<hr/>

## Part 3: Introducing the TransRisk Dataset
For this tutorial, we will be working with a dataset that represents the distribution of TransRisk scores for non-defaulters (people who have previously paid off their loans on time) across four main demographic groups: Asian, Hispanic, Black, and White. Go ahead and import this data to take a look. Which of the information collected to create TransRisk scores could be inherently discriminatory?

In [7]:
import pickle
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
%matplotlib inline
white_non_default = pd.read_csv("white-non-default.csv")
asian_non_default = pd.read_csv("asian-non-default.csv")
black_non_default = pd.read_csv("black-non-default.csv")
hispanic_non_default = pd.read_csv("hispanic-non-default.csv")

For loan approval, a bank will usually set a <b>threshold TransRisk score</b> that determines who is approved and who is denied. For example, if the threshold were 650, everyone with a TransRisk score below 650 would be denied the loan, and everyone with a TransRisk score above 650 would be approved.
Theoretically, under equal opportunity, the probability of a non-defaulter being approved for a loan ($\hat{Y} = 1$) at any threshold TransRisk score should be the same across all four groups. Finish the function below to plot, for one group, the probability that a non-defaulter gets $\hat{Y} = 1$ as a function of the threshold TransRisk score. Then call it for all four demographic groups and plot the curves on top of each other.
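
As a rough illustration of the threshold rule and the equal-opportunity check, the sketch below uses made-up scores for two hypothetical groups of non-defaulters. The names `approval_rate`, `group_a`, and `group_b` are assumptions for illustration only; the real CSV files may be laid out quite differently.

```python
def approval_rate(scores, threshold):
    """Fraction of scores strictly above the threshold (i.e. predicted 1)."""
    approved = [s for s in scores if s > threshold]
    return len(approved) / len(scores)


# Hypothetical TransRisk scores of non-defaulters in two made-up groups.
group_a = [620, 640, 660, 700, 720]
group_b = [600, 630, 645, 655, 710]

threshold = 650
rate_a = approval_rate(group_a, threshold)
rate_b = approval_rate(group_b, threshold)
print(rate_a, rate_b)
```

On this toy data the two rates differ, so a single threshold of 650 would not satisfy equal opportunity for these two groups.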

In [8]:
def getGraphData(dataset, metricName, graphType):
    """Plot approval probability against threshold TransRisk score for one group."""
    i = 0
    x = []  # threshold values
    y = []  # probability of approval at each threshold
    while i < 100.5:
        # our dataset doesn't include these scores, so skip over them
        if i in (72.5, 77.5, 92.5):
            i = i + 0.5
        # create and append the x and y values to the x and y arrays to be plotted here:



        i = i + 0.5
    plt.plot(x, y, graphType, label=metricName)

**Plot Graph Below**

**Calculating Precision and Recall**<br/>
Now that we've seen how likely non-defaulting individuals from each of the four demographic groups are to be approved for a loan at a given threshold value, let's check the performance of this model. First, we need to import the distribution of TransRisk scores for <i>defaulters</i> (people who have historically not paid their loans on time) in these four groups.

In [9]:
white_default = pd.read_csv("white-default.csv")
asian_default = pd.read_csv("asian-default.csv")
black_default = pd.read_csv("black-default.csv")
hispanic_default = pd.read_csv("hispanic-default.csv")

Create a pandas DataFrame that includes all four groups (defaulting and non-defaulting) and their TransRisk scores. With each score acting as the <i>threshold</i> value, determine the precision and the recall for each group at that threshold. Do you notice any significant discrepancies between groups?
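
One possible way to organize this, sketched on made-up data: stack the per-group tables into a single DataFrame with a group label and a default flag, then compute precision and recall per group at a given threshold. The column names `score`, `group`, and `defaulted`, and the helper `precision_recall_at`, are assumptions for illustration. The actual CSV files may use a different layout (for example, a distribution per score rather than one row per person), so treat this as a starting point rather than a solution.

```python
import pandas as pd

# Made-up stand-ins for the loaded CSV files (assumed: one row per person).
a_non_default = pd.DataFrame({"score": [700, 660]})
a_default = pd.DataFrame({"score": [640, 600]})
b_non_default = pd.DataFrame({"score": [720, 630]})
b_default = pd.DataFrame({"score": [680, 610]})

# Stack everything into one table, tagging each row with group and outcome.
combined = pd.concat(
    [
        a_non_default.assign(group="A", defaulted=False),
        a_default.assign(group="A", defaulted=True),
        b_non_default.assign(group="B", defaulted=False),
        b_default.assign(group="B", defaulted=True),
    ],
    ignore_index=True,
)


def precision_recall_at(frame, threshold):
    """Precision and recall when everyone scoring above `threshold` is approved."""
    approved = frame["score"] > threshold  # predicted 1
    good = ~frame["defaulted"]             # actual 1 (non-defaulter)
    tp = (approved & good).sum()
    precision = tp / approved.sum() if approved.sum() else float("nan")
    recall = tp / good.sum() if good.sum() else float("nan")
    return pd.Series({"precision": precision, "recall": recall})


# One row per group, one column per metric, at a threshold of 650.
results = pd.DataFrame(
    {name: precision_recall_at(part, 650) for name, part in combined.groupby("group")}
).T
print(results)
```

Repeating the last step over a range of thresholds gives one precision curve and one recall curve per group, which can then be compared across groups.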

<hr/>
## Conclusion

As our analysis shows, the data involved in creating supervised learning predictors for loan approval from TransRisk scores is inherently discriminatory. What are some other possible solutions for optimizing the performance of these models while ensuring non-discriminatory decision making?
<br/><br/>
TransRisk data and non-discriminatory analysis courtesy of https://arxiv.org/pdf/1610.02413.pdf