# Mitigate Machine Learning Bias: Shap and AIF360

This notebook represents my personal code, notes, and reflections for the Manning liveProject titled "Mitigate Machine Learning Bias: Shap and AIF360" by Michael McKenna. 

#### Copyright 2020 Jeff Nirschl

#@title Licensed under the MIT License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://opensource.org/licenses/MIT
#
# Copyright 2020 Jeffrey J. Nirschl
# 
# Permission is hereby granted, free of charge, to any person obtaining a copy of
# this software and associated documentation files (the "Software"), to deal in
# the Software without restriction, including without limitation the rights to
# use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
# of the Software, and to permit persons to whom the Software is furnished to do
# so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

# Milestone 2


## Performance metrics
* **Precision (P)**: $\frac{TP}{TP + FP}$
  * Also known as the *positive predictive value*
* **Recall (R)**: $\frac{TP}{TP + FN}$
  * Also known as the *sensitivity* or *true positive rate*
* **$F_{1}$ score**: $\frac{2 \cdot P R}{P + R}$
  * *Harmonic mean* of precision and recall
* **$F_{\beta}$ score**: $F_{\beta}=\frac{(1+\beta^{2}) P R}{\beta^{2} P + R}$
  * Parameterized F1 score to weight precision or recall
    * If $\beta=1:$ Equal weight to precision and recall (i.e., $F_{\beta}=F_{1}$)
    * If $\beta<1:$ More weight to **precision**
    * If $\beta>1:$ More weight to **recall**
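
The metrics above can be computed directly from confusion-matrix counts. The following is a minimal sketch, assuming the standard $F_{\beta}$ denominator $\beta^{2}P + R$; the function name and the example counts are hypothetical and chosen only for illustration.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta) from confusion-matrix counts."""
    # Guard against empty denominators (no predicted or no actual positives)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives
p, r, f1 = precision_recall_fbeta(tp=8, fp=2, fn=8)
_, _, f2 = precision_recall_fbeta(tp=8, fp=2, fn=8, beta=2.0)
_, _, f_half = precision_recall_fbeta(tp=8, fp=2, fn=8, beta=0.5)

print(f"P={p:.3f} R={r:.3f} F1={f1:.3f} F2={f2:.3f} F0.5={f_half:.3f}")
```

Because precision (0.8) is higher than recall (0.5) in this example, weighting recall more heavily ($\beta=2$) lowers the score and weighting precision more heavily ($\beta=0.5$) raises it.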

## Bias metrics
### Definitions
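* **Average odds difference**: the mean of the difference in false positive rate and the difference in true positive rate between the *unprivileged* and *privileged* groups. A value of zero indicates equal odds.
* **Equal opportunity difference**: the difference in true positive rate between the *unprivileged* and *privileged* groups. A value of zero indicates equal opportunity, and negative values favor the *privileged* group.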
* **Disparate impact**: ratio of the rate of favorable outcomes for the *unprivileged* group to that of the *privileged* group. A value of one indicates no disparate impact, values less than one favor the *privileged* group, and values greater than one favor the *unprivileged* group.
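* **Statistical Parity Difference**: the rate of favorable outcomes for the *unprivileged* group minus the rate for the *privileged* group. A value of zero indicates parity, and negative values favor the *privileged* group.
* **Theil Index**: a generalized entropy index (with $\alpha=1$) that measures inequality in how benefit is allocated across individuals. A value of zero indicates a perfectly equal allocation.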
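* **Allocation harm**: harm that occurs when a system extends or withholds opportunities or resources (e.g., loans or jobs) unequally across groups.
* **Algorithmic bias**: systematic and repeatable errors in a computer system that create unfair outcomes, such as favoring one group over another.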
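
As a rough illustration of the group metrics listed above, the sketch below computes four of them with plain NumPy. It assumes the commonly used definitions (unprivileged minus privileged for differences, unprivileged over privileged for the ratio) and that label 1 is the favorable outcome. The function names and the toy data are hypothetical, and this is not the AIF360 API.

```python
import numpy as np


def rates(y_true, y_pred):
    """Return (selection rate, TPR, FPR) for binary arrays where 1 is favorable."""
    selection_rate = y_pred.mean()
    tpr = y_pred[y_true == 1].mean()
    fpr = y_pred[y_true == 0].mean()
    return selection_rate, tpr, fpr


def group_fairness(y_true, y_pred, privileged):
    """Compare the unprivileged group against the privileged group."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    priv = np.asarray(privileged, dtype=bool)
    sr_p, tpr_p, fpr_p = rates(y_true[priv], y_pred[priv])
    sr_u, tpr_u, fpr_u = rates(y_true[~priv], y_pred[~priv])
    return {
        "disparate_impact": sr_u / sr_p,
        "statistical_parity_difference": sr_u - sr_p,
        "equal_opportunity_difference": tpr_u - tpr_p,
        "average_odds_difference": 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p)),
    }


# Hypothetical toy data: first four rows privileged, last four unprivileged
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
privileged = [1, 1, 1, 1, 0, 0, 0, 0]

metrics = group_fairness(y_true, y_pred, privileged)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy example the privileged group receives the favorable prediction three times as often, so the disparate impact is below one and all three difference metrics are negative.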


## [Google Machine Learning Crash Course: Fairness](https://developers.google.com/machine-learning/crash-course/fairness/types-of-bias)

### Human biases

#### Human biases *in data*
* **Reporting bias**: data is not an accurate reflection of real-world outcomes, events, or frequencies
* **Selection bias**: when examples are not a random sample of the intended population. The act of observing an event itself can be a form of selection when a non-random process is correlated with the ability to be observed or measured. Thus, it has been noted that "observation is selection" [(Whitehead, 1925)](https://archive.org/stream/sciencemodernwor00whit).
  * *Coverage bias*: data is not selected in a representative manner
  * *Non-response bias (aka participation bias)*: data is not representative due to systematic differences between participants and non-participants
  * *Sampling bias*: proper randomization not used during data collection
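  * *Survivorship bias*: only the examples that "survived" some selection process are observed, while those that did not are overlooked
    * Example: Abraham Wald and [survivorship bias in World War II aircraft](https://clearthinking.co/survivorship-bias/)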
* **Overgeneralization**
* **Out-group homogeneity bias**: tendency to assume that members of "out-groups" are more alike than members of our own group
* **Systematic or structural biases**
  * Biases that exist in the world and are embedded in the data we collect (e.g., racial injustice)
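
To make the idea of selection bias from the list above concrete, here is a small simulation sketch. It assumes a synthetic population and a made-up observation process in which larger values are more likely to be recorded; all names and parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic population of 100,000 values centered at 50
population = rng.normal(loc=50.0, scale=10.0, size=100_000)

# Hypothetical non-random selection: larger values are more likely to be observed
p_observe = 1.0 / (1.0 + np.exp(-(population - 50.0) / 5.0))
observed = population[rng.random(population.size) < p_observe]

# A properly randomized sample of the same size, for comparison
random_sample = rng.choice(population, size=observed.size, replace=False)

print(f"population mean:    {population.mean():.2f}")
print(f"biased sample mean: {observed.mean():.2f}")
print(f"random sample mean: {random_sample.mean():.2f}")
```

The biased sample overestimates the population mean because the chance of being observed is correlated with the value being measured, while the randomized sample stays close to it.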

#### Human biases in *collection and annotation*
* **Confirmation bias**: preferring information or evidence that confirms pre-existing beliefs
* **Automation bias**: the tendency to prefer suggestions from automated systems
* **Unconscious biases**: other unconscious biases that can extend into our AI



### Designing for fairness
1. Consider the problem
  * Should the problem or task exist in the first place?
  * What is represented in the dataset? What is overlooked?
2. Ask experts
3. Train the models to account for bias
  * What are outliers in the data? How does the algorithm handle outliers?
  * What implicit assumptions does the model make about the data or task?
4. Interpret outcomes
  * Is the ML system over-generalizing?
  * How would a human perform the task? 
5. Publish with *context*
  * How should the technology be used?

### Identifying bias

#### Missing features
Missing data occur for a variety of reasons; however, they can introduce bias if the data are missing due to a non-random process. Reporting the number of missing values, as well as the percentage of missing values per class, is a first step in checking whether data are missing due to systematic factors or confounders.

Check for missing values in a Pandas DataFrame:
```
DataFrame.describe()  # summary statistics; `count` excludes missing values
DataFrame.info()      # non-null count and dtype for each column
```
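
To go one step further and report missing values per class, something like the following sketch may help. The DataFrame, its column names (`label`, `income`, `age`), and its values are hypothetical toy data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: `income` is missing more often for class "b"
df = pd.DataFrame({
    "label": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "income": [52.0, 48.0, np.nan, 61.0, np.nan, np.nan, 45.0, np.nan],
    "age": [34, 29, 41, 50, 38, 27, 45, 33],
})

# Number and percentage of missing values per column
missing_count = df.isna().sum()
missing_pct = df.isna().mean() * 100

# Percentage of missing values per column within each class
features = df.drop(columns="label")
missing_pct_by_class = features.isna().groupby(df["label"]).mean() * 100

print(missing_count)
print(missing_pct_by_class)
```

A large gap between classes, as with `income` here, suggests the values may not be missing at random and is worth investigating before modeling.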

Even if values are missing at random, we must consider how our learning algorithm handles them. Does the algorithm or workflow impute missing values, or does it drop incomplete examples? Additional data exploration may be necessary to rule out other biases that can be introduced by missing features.
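
If the workflow does impute, it can help to keep a record of which values were filled in. The sketch below uses median imputation purely as an assumed example strategy, not a recommendation; the column names and data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with two missing entries
df = pd.DataFrame({"income": [52.0, np.nan, 61.0, np.nan, 45.0]})

# Record which rows were missing before filling them in
df["income_was_missing"] = df["income"].isna()

# Median imputation is one simple choice, used here as an assumption
median_income = df["income"].median()
df["income_imputed"] = df["income"].fillna(median_income)

print(df)
```

Keeping the indicator column lets a later analysis check whether the imputed rows behave differently from the rest.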

#### Unexpected feature values
Extreme or unexpected feature values can indicate problems in the data or other inaccuracies that could introduce bias. Understanding the problem domain and characteristics of the data will help reveal errors in the data or unrealistic feature values.
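
One simple way to surface such values is to compare each feature against a plausible range taken from domain knowledge. The sketch below assumes hypothetical columns and hand-picked bounds, so both would need to be adapted to the actual dataset.

```python
import pandas as pd

# Hypothetical toy data with a few implausible entries
df = pd.DataFrame({
    "age": [34, 29, -1, 50, 212],
    "hours_per_week": [40, 38, 45, 170, 20],
})

# Assumed plausible ranges (inclusive) based on domain knowledge
valid_ranges = {"age": (0, 120), "hours_per_week": (0, 168)}

# Flag values outside the range; missing values would also be flagged,
# because `between` returns False for NaN
out_of_range = pd.DataFrame({
    col: ~df[col].between(low, high) for col, (low, high) in valid_ranges.items()
})

# Rows with at least one suspicious value
suspect_rows = df[out_of_range.any(axis=1)]
print(suspect_rows)
```

Flagged rows are candidates for review rather than automatic removal, since an extreme value can be a genuine observation.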

### [Google Responsible AI Practices](https://ai.google/responsibilities/responsible-ai-practices/)