# An Introduction to Ethical Supervised Learning

<hr/>

## Overview:
- Measuring the performance of supervised learning predictors for:
    - classification problems
    - binary predictors
- Introduction to non-discriminatory supervised learning predictors
- Charting performance of a TransRisk score case study to determine if it passes as non-discriminatory

<hr/>

## Part 1: Measuring Performance on Binary Classifiers
While there are many ways to measure the performance of a binary predictor, two metrics are particularly useful for fairness models:
<ul>
<li><i>Sensitivity</i>:
<br/> - True Positive Rate
<br/> - Among all of the actual 1's, what percentage did we predict were 1?
</li>
<li><i>Specificity</i>:
<br/> - True Negative Rate
<br/> - Among all of the actual 0's, what percentage did we predict were 0?
</li>
</ul>
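
As a minimal sketch of the two definitions above, the snippet below computes sensitivity and specificity from a list of actual labels and a list of predicted labels. The function name `sensitivity_specificity` and the toy labels are assumptions made up for illustration; they do not come from the tutorial's dataset.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for two equal-length lists of 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # actual 1, predicted 1
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # actual 1, predicted 0
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # actual 0, predicted 0
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # actual 0, predicted 1
    # Assumes there is at least one actual 1 and at least one actual 0.
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Toy labels, made up for illustration: four actual 1's and five actual 0's.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print("sensitivity:", sens)
print("specificity:", spec)
```

Here three of the four actual 1's were predicted as 1, and four of the five actual 0's were predicted as 0.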

<hr/>

## Part 2: A Brief Introduction to Non-Discriminatory Machine Learning Predictors
For companies that use classification-based predictors, the predicted outcome for an individual within a group will sometimes fully determine the decision made for that individual. This needs to be treated particularly carefully when the decision concerns an <i>Important Benefit</i>, e.g. health care, loan approval, or college admission. What if the data used to train the model is inherently discriminatory? What if the factors that produced the data were discriminatory and we didn't even know? Then the predicted outcome would also be discriminatory.<br/><br/>
This is the problem that non-discriminatory predictors seek to solve. For example, <b>The Equal Opportunity Model</b> requires the true positive rate to be the same for all groups in a dataset in order to achieve fairness. What does this mean in terms of performance for binary classifiers? (Write in terms of 1's and 0's below)

**Write Answer Here:**

<hr/>

## Part 3: Introducing the TransRisk Dataset
For this part of the tutorial, we will work with a dataset that represents the distribution of TransRisk scores for non-defaulters (people who have previously paid off their loans on time) across four main demographic groups: Asian, Hispanic, Black, and White. Go ahead and import this data to take a look. Which of the information collected to create TransRisk scores could be inherently discriminatory?

In [1]:
import pickle
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
%matplotlib inline
totalData = pd.read_csv("TransRiskScores.csv")

For loan approval, a bank will usually set a <b>threshold TransRisk score</b> that determines who is approved and who is denied. For example, if the threshold were 60, everyone with a TransRisk score below 60 would be denied the loan, and everyone with a TransRisk score above 60 would be approved for a loan.
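
The threshold rule can be sketched in a few lines. The function name `decide` and the scores below are hypothetical, and the treatment of a score exactly equal to the threshold (approved here) is an assumption, since the rule above only covers scores below and above it.

```python
def decide(score, threshold=60):
    """Apply a single-threshold loan decision to one TransRisk score."""
    # Assumption: a score exactly equal to the threshold is approved.
    return "approved" if score >= threshold else "denied"


# Hypothetical scores, made up for illustration.
scores = [35.0, 59.5, 60.0, 82.5]
decisions = [decide(s) for s in scores]

for s, d in zip(scores, decisions):
    print(s, d)
```
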
<br/><br/>
How should a predictor go about deciding who should get a loan and who should not? It makes sense to say all of the people who <i>deserve</i> a loan should receive one. In the case of the TransRisk score, the group of people who <i>deserve</i> a loan would be the non-defaulters. 
<br/><br/>
Following this logic, in theory the probability of a non-defaulter getting a loan ($\hat Y$ = 1) at any threshold TransRisk score should be the same among all four groups. Finish the function below to plot, for one group, the probability of a non-defaulter getting a loan ($\hat Y$ = 1) as a function of the threshold TransRisk score. Then, get the probabilities for all four demographic groups and plot them on top of each other.
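
Stated as a formula, the condition described above might be written as follows. The symbols are assumptions introduced here for illustration: $t$ is the threshold TransRisk score, $\hat Y_t$ is the decision made at that threshold, $Y = 1$ marks a non-defaulter, and $A$ is the demographic group.

$$
\Pr(\hat Y_t = 1 \mid Y = 1, A = a) \;=\; \Pr(\hat Y_t = 1 \mid Y = 1, A = b) \quad \text{for every pair of groups } a, b
$$

Each side is the sensitivity of the threshold rule within one group, so the plot compares these quantities across groups as $t$ varies.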

In [2]:
def getGraphData(dataset, metricName, graphType):
    i = 0
    x = []
    y = []
    while i < 100.5:
        # our dataset doesn't include these scores, so skip over them
        if i == 72.5 or i == 77.5 or i == 92.5:
            i = i + 0.5
        # create and append the x and y values to the x and y lists used for the plot here:
        
        
        
        i = i + 0.5
    plt.plot(x, y, graphType, label=metricName)

**Plot Graph Below**

<hr/>

## Calculating Performance
Now that we've seen how likely non-defaulting individuals from each of the four demographic groups are to be approved for a loan at each threshold value, let's check the performance of this model. Take a look at the original data again.

In [3]:
totalData.head()

| Unnamed: 0 | TransRisk Score | Demographic | Good | Bad |
|---|---|---|---|---|
| 0 | 0.0 | white | 0.0 | 0.17 |
| 1 | 0.5 | white | 0.03 | 1.85 |
| 2 | 1.0 | white | 0.22 | 7.26 |
| 3 | 1.5 | white | 0.26 | 8.85 |
| 4 | 2.0 | white | 0.35 | 10.58 |


**Complete: Calculate the following for both the White and Black demographic groups with a threshold TransRisk score of 60**

Of all the individuals who <i>deserve</i> a loan, what percentage will receive one?

<hr/>

### Analysis: 

What you just calculated is the <b>sensitivity</b> of the White and Black demographic groups for a single threshold value (60). If you recall from the beginning of this tutorial, the main requirement for achieving fairness under the Equal Opportunity model is that recall/sensitivity be the same for all groups. As we saw in our plot from Part 3, Equal Opportunity was definitely not being satisfied. So, how might we find an easy solution to this problem? The answer lies in this performance metric. To satisfy the requirement, we simply need to find the points where all groups have the same sensitivity: a horizontal line drawn at a chosen sensitivity intersects each group's curve at that group's threshold.

<b>Create a plot of sensitivity against TransRisk score to see the points at which all demographic groups have the same sensitivity</b>

## Next Steps:

Ideally, we would use four different threshold values, one per group, during the final decision process. As mentioned before, the Equal Opportunity model requires the same sensitivity for all groups. Using these thresholds would satisfy that requirement and allow us to label this predictor as fair under Equal Opportunity.
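
One way to read per-group thresholds off sensitivity curves is sketched below. Everything in it is hypothetical: the group names, the candidate thresholds, the sensitivity values, and the helper `threshold_for_target` are made up for illustration and are not taken from `TransRiskScores.csv`. The sketch also assumes sensitivity never increases as the threshold rises.

```python
# Hypothetical sensitivity curves: for each group, the fraction of
# non-defaulters who would be approved at each candidate threshold.
# These numbers are made up and do NOT come from the TransRisk data.
thresholds = [30, 40, 50, 60, 70]  # ascending candidate thresholds
sensitivity = {
    "group_a": [0.95, 0.90, 0.80, 0.70, 0.55],
    "group_b": [0.85, 0.70, 0.55, 0.40, 0.25],
}


def threshold_for_target(thresholds, sens_values, target):
    """Return the highest threshold whose sensitivity is still >= target, or None."""
    best = None
    for t, s in zip(thresholds, sens_values):
        if s >= target:
            best = t  # thresholds ascend, so a later match is a higher threshold
    return best


target = 0.70  # the shared sensitivity every group should reach
group_thresholds = {
    group: threshold_for_target(thresholds, values, target)
    for group, values in sensitivity.items()
}
print(group_thresholds)
```

With these toy numbers the two groups reach the same sensitivity at different thresholds. On a discrete grid of scores the match will generally be approximate rather than exact, so the sketch picks the highest threshold that still meets the target.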

For our case study, the predictor that achieves fairness under Equal Opportunity would have these thresholds for each demographic group:

In [4]:
### Put four threshold values here

<hr/>

## Conclusion

Our analysis shows that the data used to create supervised learning predictors for loan approval from TransRisk scores is inherently discriminatory. What are some other possible solutions for optimizing the performance of these models to ensure non-discriminatory decision making?
<br/><br/>
TransRisk data and non-discriminatory analysis courtesy of https://arxiv.org/pdf/1610.02413.pdf