# Machine Learning Fairness

_**Note**:  the goal of this part of the assignment is to understand the kinds of biases that commonly sneak into machine learned systems and a handful of techniques to improve standard modeling.  While we hope you find this instructive, we empathize that these research results may negatively affect some students.  Please reach out to the teaching staff if you have serious concerns for alternate arrangements._

From simple count-based models to the most complex neural architectures, machine learning models are ultimately nothing more than the product of the signals and labels in the training set.  That these tools can so effectively mimic and generalize from the training set distribution is the key to why they are so useful in so many applications.

This powerful ability to fit a data is a double edged sword.  Unfortunately, the real world is filled with inequality, unfairness and stereotypes.  When the signals and labels systemically capture these aspects of the world, the powerful ability to generalize has other names: bias.  This bias can take many forms:  a minority group of entries in the training set would be underrepresented (the loss function is incented to produce a model that works better on the majority at the expense of the minority) or predictions may be systemically biased against a protected group (i.e. the model learns to predict the protected label and from that the actual prediction rather than learning the prediction directly).

In this part of the assignment, we will take a look at a few nice analyses that discuss this bias. Below are a few questions about these papers.

- [How to make a racist AI without really trying](http://blog.conceptnet.io/posts/2017/how-to-make-a-racist-ai-without-really-trying/)
- [Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings](https://arxiv.org/pdf/1607.06520.pdf)
- [Data Decisions and Theoretical Implications when Adversarially Learning Fair Representations](https://arxiv.org/pdf/1707.00075.pdf)

**Again, please be sure to put your answers in the "answers" file!**

## Questions about the Racist AI

1.  In [Step 5](http://blog.conceptnet.io/posts/2017/how-to-make-a-racist-ai-without-really-trying/#Step-5:-Behold-the-monstrosity-that-we-have-created), the author shows that substituting a type of cuisine into a fixed sentence significantly changes the overall sentiment score of their model.  What is the difference in sentiment score between the word ```Italian``` and ```Mexican``` (not the difference in the whole sentence!), assuming that embeddings for all words in the sentence are found in GloVe.?

2. Rank GloVe, Word2Vec, and ConceptNet Numberbatch by ethnic bias as defined by the author?

3. What technique does the author apply to achieve that lower bias?

4. How significant is the penalty to model performance on accuracy when using debiased vectors instead of biased ones?

## Questions about Debiasing Word Embeddings

Word embeddings are commonly used in deep neural networks to solve analogy tasks (see the Embddings worksheet, the corresponding sections in both [Word2Vec](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) and [GloVe](https://nlp.stanford.edu/pubs/glove.pdf)).  This paper quickly reintroduces that task, then continues to explore the analogy task with additional tuples that illustrate the bias that these vectors have picked up.

1.  Why are the results of Table 1 important?

### Questions about Adversarial Learning

1.  What is the equality gap measure?
2.  What is the parity gap measure?
3.  What is the intuition behind $J_{\lambda}$?

In [1]:
import numpy as np
import pandas as pd
import matplotlib
import seaborn
import re
import statsmodels.formula.api

from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [2]:
# Install a few python packages using pip
from w266_common import utils
utils.require_package("wget")      # for fetching dataset

# Standard python helper libraries.
import os, sys, re, json, time
import itertools, collections
from importlib import reload
from IPython.display import display

# NumPy and SciPy for matrix ops
import numpy as np
import scipy.sparse

# NLTK for NLP utils
import nltk

# Helper libraries
from w266_common import utils, vocabulary, tf_embed_viz

In [3]:
# Configure how graphs will show up in this notebook
%matplotlib inline
seaborn.set_context('notebook', rc={'figure.figsize': (10, 6)}, font_scale=1.5)

In [4]:
import glove_helper; reload(glove_helper)

hands = glove_helper.Hands(ndim=100)  # 50, 100, 200, 300 dim are available

#embeddings = load_embeddings('data/glove.42B.300d.txt')
#embeddings = load_embeddings('data/glove.6B.50d.txt')
embeddings = hands
embeddings.shape

Loading vectors from data/glove/glove.6B.zip
Parsing file: data/glove/glove.6B.zip:glove.6B.100d.txt
Found 400,000 words.
Parsing vectors... Done! (W.shape = (400003, 100))


(400003, 100)

In [5]:
def load_lexicon(filename):
    """
    Load a file from Bing Liu's sentiment lexicon
    (https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html), containing
    English words in Latin-1 encoding.
    
    One file contains a list of positive words, and the other contains
    a list of negative words. The files contain comment lines starting
    with ';' and blank lines, which should be skipped.
    """
    lexicon = []
    with open(filename, encoding='latin-1') as infile:
        for line in infile:
            line = line.rstrip()
            if line and not line.startswith(';'):
                lexicon.append(line)
    return lexicon

pos_words = load_lexicon('data/positive-words.txt')
neg_words = load_lexicon('data/negative-words.txt')

In [12]:
#pos_vectors = embeddings.loc[pos_words].dropna()
#neg_vectors = embeddings.loc[neg_words].dropna()

pos_vectors = []
neg_vectors = []

for i in pos_words:
    if i in embeddings.vocab:
        temp_vec = embeddings.get_vector(i)
        pos_vectors.append(temp_vec)
    
for j in neg_words:
    if j in embeddings.vocab:
        temp_vec = embeddings.get_vector(j)
        neg_vectors.append(temp_vec)

In [13]:
pos_vectors = pd.DataFrame(pos_vectors)
neg_vectors = pd.DataFrame(neg_vectors)

vectors = pd.concat([pos_vectors, neg_vectors])
targets = np.array([1 for entry in pos_vectors.index] + [-1 for entry in neg_vectors.index])
labels = list(pos_vectors.index) + list(neg_vectors.index)

In [14]:
train_vectors, test_vectors, train_targets, test_targets, train_labels, test_labels = \
    train_test_split(vectors, targets, labels, test_size=0.1, random_state=0)

In [16]:
model = SGDClassifier(loss='log', random_state=0, max_iter=100)
model.fit(train_vectors, train_targets)

SGDClassifier(loss='log', max_iter=100, random_state=0)

In [17]:
accuracy_score(model.predict(test_vectors), test_targets)

0.8942307692307693

In [23]:
def vecs_to_sentiment(vecs):
    # predict_log_proba gives the log probability for each class
    predictions = model.predict_log_proba(vecs)

    # To see an overall positive vs. negative classification in one number,
    # we take the log probability of positive sentiment minus the log
    # probability of negative sentiment.
    return predictions[:, 1] - predictions[:, 0]


def words_to_sentiment(words):
    #vecs = embeddings.loc[words].dropna()
    
    vecs = []
    for i in words:
        if i in embeddings.vocab:
            temp_word = embeddings.get_vector(i)
            vecs.append(temp_word)
    
    vecs = pd.DataFrame(vecs)
    log_odds = vecs_to_sentiment(vecs)
    return pd.DataFrame({'sentiment': log_odds}, index=vecs.index)


# Show 20 examples from the test set
#words_to_sentiment(test_labels).ix[:20]
words_to_sentiment(test_labels).iloc[:20]

Unnamed: 0,sentiment
0,-1.736745
1,-1.736745
2,-1.736745
3,-1.736745
4,-1.736745
5,-1.736745
6,-1.736745
7,-1.736745
8,-1.736745
9,-1.736745


In [None]:
import re
TOKEN_RE = re.compile(r"\w.*?\b")
# The regex above finds tokens that start with a word-like character (\w), and continues
# matching characters (.+?) until the next word break (\b). It's a relatively simple
# expression that manages to extract something very much like words from text.


def text_to_sentiment(text):
    tokens = [token.casefold() for token in TOKEN_RE.findall(text)]
    sentiments = words_to_sentiment(tokens)
    return sentiments['sentiment'].mean()

In [None]:
text_to_sentiment("Let's go get Italian food")

In [None]:
text_to_sentiment("Let's go get Mexican food")

In [None]:
text_to_sentiment("Italian")

In [None]:
text_to_sentiment("Mexican")

In [None]:
text_to_sentiment("Italian") - text_to_sentiment("Mexican")

In [26]:
run answers_test.py

.
----------------------------------------------------------------------
Ran 1 test in 0.017s

OK
