# What-If tool

The What-If Tool (WIT) lets you probe the behaviour of machine learning models through a visual interface. WIT has three main tabs:

- **Datapoint editor**
    - **Viewing and editing datapoint details**: lets you dive into a selected datapoint. Once you select a datapoint, its feature values are shown in the left-side panel. You can change any feature value on the fly and click the “Predict” button to run the model on the edited point, which shows how the change affects the prediction probabilities.
    - **Finding nearest counterfactuals**: another way to understand the model's behaviour is to look at the smallest set of changes that makes the model flip its decision; the resulting datapoint is called a counterfactual. WIT shows the most similar counterfactual to the selected datapoint, measuring similarity with either L1 or L2 distance.
        - L1 distance (Manhattan): the sum of absolute differences across features, like travelling along city blocks instead of in a straight line. It is often preferred when dimensionality is high, because Euclidean distances tend to become less discriminative in high-dimensional spaces (the curse of dimensionality).
        - L2 distance (Euclidean): the straight-line distance between two points.
    - **Analysing partial dependence plots**: a partial dependence plot shows the functional relationship between a model input and the model's predictions, i.e. how the model's average prediction changes as one feature is varied while the other features keep their observed values. For example: how the model responds if the gender (the Sex feature) is changed from male to female.
- **Performance and fairness**: this tab shows overall metrics and also lets you slice the data by any feature, to see how the model performs on each value of the sliced feature. The classification threshold can be set manually or with one of these optimization strategies:
    - Custom threshold: set the threshold with a slider and watch how the confusion matrix changes.
    - Single threshold: find one optimal threshold, applied to all datapoints, based on the cost ratio.
    - Demographic parity: similar percentages of datapoints from each slice are predicted as positive.
    - Equal opportunity: among datapoints with a positive ground-truth label, each slice has a similar percentage of positive predictions.
    - Equal accuracy: each slice has a similar percentage of correct predictions.
    - Group thresholds: optimize a separate threshold for each slice based on the specified cost ratio.
A detailed explanation with examples can be found here: [https://pair-code.github.io/what-if-tool/ai-fairness.html]
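The three fairness criteria above can also be checked outside WIT by computing, for each slice, the positive-prediction rate (demographic parity), the true-positive rate (equal opportunity) and the accuracy (equal accuracy). The sketch below is a minimal NumPy version under that reading of the criteria; `slice_metrics` is a hypothetical helper name, and WIT's own computation may differ in details.

```python
import numpy as np


def slice_metrics(y_true, y_pred, groups):
    """Per-slice rates behind the three fairness criteria.

    y_true, y_pred: 0/1 labels and 0/1 predictions; groups: slice value per row.
    Returns {slice: {"positive_rate", "true_positive_rate", "accuracy"}}.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    result = {}
    for g in np.unique(groups):
        mask = groups == g
        yt, yp = y_true[mask], y_pred[mask]
        positives = yt == 1
        result[g.item()] = {
            # Demographic parity compares this rate across slices
            "positive_rate": float(yp.mean()),
            # Equal opportunity: share of ground-truth positives predicted positive
            "true_positive_rate": float(yp[positives].mean()) if positives.any() else float("nan"),
            # Equal accuracy compares this rate across slices
            "accuracy": float((yt == yp).mean()),
        }
    return result


metrics = slice_metrics(
    y_true=[1, 1, 0, 0, 1, 0, 0, 0],
    y_pred=[1, 0, 0, 1, 1, 1, 0, 0],
    groups=["a", "a", "a", "a", "b", "b", "b", "b"],
)
print(metrics)
```

In this toy data both slices have the same positive rate (0.5), so demographic parity holds, while the true-positive rates (0.5 vs 1.0) and accuracies (0.5 vs 0.75) differ, so equal opportunity and equal accuracy do not.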
- **Features**: gives a quick overview of the distribution of values across the features of your dataset(s). It helps uncover common and uncommon issues such as unexpected feature values, features missing for a large number of observations, training/serving skew, and train/test/validation set skew. It is the same view as Google's Facets Overview.
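Returning to the Datapoint editor, the nearest-counterfactual search can be sketched as: among all points the model classifies differently, pick the one closest to the selected point. The function below is a hypothetical helper, not WIT's API, and it measures distance on raw feature values; WIT may scale features before comparing them.

```python
import numpy as np


def nearest_counterfactual(x, X, preds, pred_x, p=1):
    """Index of the row in X closest to x whose predicted class differs from pred_x.

    p=1 uses L1 (Manhattan) distance, p=2 uses L2 (Euclidean) distance.
    Returns None if every row has the same prediction as x.
    """
    x = np.asarray(x, dtype=float)
    X = np.asarray(X, dtype=float)
    preds = np.asarray(preds)
    candidates = np.flatnonzero(preds != pred_x)  # rows with a flipped decision
    if candidates.size == 0:
        return None
    # Vector norm of each candidate row's difference from x
    distances = np.linalg.norm(X[candidates] - x, ord=p, axis=1)
    return int(candidates[np.argmin(distances)])


X = [[0, 0], [3, 0], [2, 2]]
preds = [0, 1, 1]
print(nearest_counterfactual([0, 0], X, preds, pred_x=0, p=1))  # 1 (L1 distances 3 vs 4)
print(nearest_counterfactual([0, 0], X, preds, pred_x=0, p=2))  # 2 (L2 distances 3 vs ~2.83)
```

The two metrics can disagree, as in this toy data: L1 is indifferent between changing one feature a lot or several features a little, while L2 favours spreading the change across features.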

## Use case

- The code is taken from a Kaggle kernel, chosen on the basis of its accuracy: [https://www.kaggle.com/sinakhorami/titanic-best-working-classifier]
- The idea of this notebook is to explore WIT's features on one of the most widely used and understood datasets, the **"Titanic Dataset"**, and see what sort of analysis WIT makes possible.

In [1]:
import numpy as np
import pandas as pd
import re

train = pd.read_csv('train.csv', header = 0, dtype={'Age': np.float64})
test  = pd.read_csv('test.csv' , header = 0, dtype={'Age': np.float64})
full_data = [train, test]

train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB


# Code

## Feature Engineering #

## 1. Pclass ##
There are no missing values in this feature and it is already numerical, so let's check its impact on our training set.

In [2]:
print (train[['Pclass', 'Survived']].groupby(['Pclass'], as_index=False).mean())

   Pclass  Survived
0       1  0.629630
1       2  0.472826
2       3  0.242363


## 2. Sex ##

In [3]:
print (train[["Sex", "Survived"]].groupby(['Sex'], as_index=False).mean())

      Sex  Survived
0  female  0.742038
1    male  0.188908


## 3. SibSp and Parch ##
From the number of siblings/spouses (SibSp) and the number of parents/children (Parch) we can create a new feature called FamilySize (the +1 counts the passenger themselves).

In [4]:
for dataset in full_data:
    dataset['FamilySize'] = dataset['SibSp'] + dataset['Parch'] + 1
print (train[['FamilySize', 'Survived']].groupby(['FamilySize'], as_index=False).mean())

   FamilySize  Survived
0           1  0.303538
1           2  0.552795
2           3  0.578431
3           4  0.724138
4           5  0.200000
5           6  0.136364
6           7  0.333333
7           8  0.000000
8          11  0.000000


FamilySize seems to have a clear effect on survival, but let's go further and flag whether each passenger is travelling alone.

In [5]:
for dataset in full_data:
    dataset['IsAlone'] = 0
    dataset.loc[dataset['FamilySize'] == 1, 'IsAlone'] = 1
print (train[['IsAlone', 'Survived']].groupby(['IsAlone'], as_index=False).mean())

   IsAlone  Survived
0        0  0.505650
1        1  0.303538


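Good! The impact is considerable: about 51% of passengers travelling with family survived, versus about 30% of those travelling alone.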

## 4. Embarked ##
The Embarked feature has some missing values (2 in the training set), which we fill with the most frequent value ('S').

In [6]:
for dataset in full_data:
    dataset['Embarked'] = dataset['Embarked'].fillna('S')
print (train[['Embarked', 'Survived']].groupby(['Embarked'], as_index=False).mean())

  Embarked  Survived
0        C  0.553571
1        Q  0.389610
2        S  0.339009


## 5. Fare ##
Fare has no missing values in the training set, but to be safe we fill any missing value (in either set) with the training-set median. Then we split it into 4 quantile-based ranges.

In [7]:
for dataset in full_data:
    dataset['Fare'] = dataset['Fare'].fillna(train['Fare'].median())
train['CategoricalFare'] = pd.qcut(train['Fare'], 4)
print (train[['CategoricalFare', 'Survived']].groupby(['CategoricalFare'], as_index=False).mean())

   CategoricalFare  Survived
0   (-0.001, 7.91]  0.197309
1   (7.91, 14.454]  0.303571
2   (14.454, 31.0]  0.454955
3  (31.0, 512.329]  0.581081
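The quartile edges printed above (7.91, 14.454, 31.0) are hard-coded later in the Data Cleaning step. As an alternative, they could be learned from the training fares with `pd.qcut(..., retbins=True)` and reused on the test set with `pd.cut`. A minimal sketch; `fit_fare_bins` and `apply_fare_bins` are hypothetical helper names and the toy fares are made up.

```python
import numpy as np
import pandas as pd


def fit_fare_bins(train_fare, q=4):
    """Learn q quantile edges from the training fares."""
    _, edges = pd.qcut(train_fare, q, retbins=True)
    edges = edges.copy()
    # Open the outer edges so test fares outside the training range still get a bin
    edges[0], edges[-1] = -np.inf, np.inf
    return edges


def apply_fare_bins(fare, edges):
    """Map fares to ordinal codes 0..q-1; bins are right-closed, like the `<=` rules."""
    return pd.cut(fare, bins=edges, labels=False).astype(int)


train_fare = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
edges = fit_fare_bins(train_fare)
print(edges)
print(apply_fare_bins(pd.Series([0.5, 3.0, 3.5, 100.0]), edges).tolist())  # [0, 0, 1, 3]
```

Learning the edges once on the training data and reusing them keeps the train and test encodings consistent, which is what the hard-coded thresholds achieve manually.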


## 6. Age ##
This feature has plenty of missing values (only 714 of the 891 training rows have an age). We fill them with random integers drawn between (mean - std) and (mean + std), then split age into 5 equal-width ranges.

In [8]:
for dataset in full_data:
    age_avg        = dataset['Age'].mean()
    age_std        = dataset['Age'].std()
    age_null_count = dataset['Age'].isnull().sum()

    # Random integers in [mean - std, mean + std) for each missing age
    age_null_random_list = np.random.randint(int(age_avg - age_std), int(age_avg + age_std), size=age_null_count)
    # Assign through .loc to avoid pandas' chained-assignment (SettingWithCopy) warning
    dataset.loc[dataset['Age'].isnull(), 'Age'] = age_null_random_list
    dataset['Age'] = dataset['Age'].astype(int)
    
train['CategoricalAge'] = pd.cut(train['Age'], 5)

print (train[['CategoricalAge', 'Survived']].groupby(['CategoricalAge'], as_index=False).mean())

  CategoricalAge  Survived
0  (-0.08, 16.0]  0.535714
1   (16.0, 32.0]  0.352018
2   (32.0, 48.0]  0.371542
3   (48.0, 64.0]  0.434783
4   (64.0, 80.0]  0.090909


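The random imputation in the Age cell changes on every run because `np.random.randint` is unseeded, so the age bins and the final accuracy can shift slightly between runs. A minimal reproducible variant, assuming a seeded NumPy `Generator` is acceptable; the helper name `impute_age_random` and the seed values are illustrative.

```python
import numpy as np
import pandas as pd


def impute_age_random(ages, seed=0):
    """Fill missing ages with random integers in [mean - std, mean + std), reproducibly."""
    rng = np.random.default_rng(seed)
    mean, std = ages.mean(), ages.std()  # pandas skips NaN in both
    missing = ages.isnull()
    filled = ages.copy()
    filled.loc[missing] = rng.integers(int(mean - std), int(mean + std), size=int(missing.sum()))
    return filled.astype(int)


ages = pd.Series([20.0, np.nan, 30.0, np.nan, 40.0])
print(impute_age_random(ages, seed=42).tolist())
```

With the same seed the filled values are identical across runs, while observed ages are left untouched.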


## 7. Name ##
This feature contains each passenger's title (Mr, Mrs, Miss, ...), which we can extract.

In [9]:
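def get_title(name):
    # Match a word followed by a period, e.g. " Mr." in "Smith, Mr. John" (raw string avoids escape warnings)
    title_search = re.search(r' ([A-Za-z]+)\.', name)
    # If the title exists, extract and return it.
    if title_search:
        return title_search.group(1)
    return ""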

for dataset in full_data:
    dataset['Title'] = dataset['Name'].apply(get_title)

print(pd.crosstab(train['Title'], train['Sex']))

Sex       female  male
Title                 
Capt           0     1
Col            0     2
Countess       1     0
Don            0     1
Dr             1     6
Jonkheer       0     1
Lady           1     0
Major          0     2
Master         0    40
Miss         182     0
Mlle           2     0
Mme            1     0
Mr             0   517
Mrs          125     0
Ms             1     0
Rev            0     6
Sir            0     1
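The same regular expression can be applied to the whole column at once with pandas' `Series.str.extract`, which avoids a Python-level `apply` while returning the same titles as `get_title`. A short sketch; the names below are made-up examples, not rows from the dataset.

```python
import pandas as pd

names = pd.Series(["Smith, Mr. John", "Doe, Mrs. Jane (Jane Roe)", "Nobody Without Title"])
# expand=False returns a Series; rows with no match become NaN, so fill them with ""
titles = names.str.extract(r' ([A-Za-z]+)\.', expand=False).fillna("")
print(titles.tolist())  # ['Mr', 'Mrs', '']
```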


So we have titles. Let's group the rare ones and check each title's impact on the survival rate.

In [10]:
for dataset in full_data:
    dataset['Title'] = dataset['Title'].replace(['Lady', 'Countess', 'Capt', 'Col',
                                                 'Don', 'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer', 'Dona'], 'Rare')

    dataset['Title'] = dataset['Title'].replace('Mlle', 'Miss')
    dataset['Title'] = dataset['Title'].replace('Ms', 'Miss')
    dataset['Title'] = dataset['Title'].replace('Mme', 'Mrs')

print (train[['Title', 'Survived']].groupby(['Title'], as_index=False).mean())

    Title  Survived
0  Master  0.575000
1    Miss  0.702703
2      Mr  0.156673
3     Mrs  0.793651
4    Rare  0.347826


## Data Cleaning #
Great! Now let's clean the data and map every feature to numerical values.

In [11]:
for dataset in full_data:
    # Mapping Sex
    dataset['Sex'] = dataset['Sex'].map( {'female': 0, 'male': 1} ).astype(int)
    
    # Mapping titles
    title_mapping = {"Mr": 1, "Miss": 2, "Mrs": 3, "Master": 4, "Rare": 5}
    dataset['Title'] = dataset['Title'].map(title_mapping)
    dataset['Title'] = dataset['Title'].fillna(0)
    
    # Mapping Embarked
    dataset['Embarked'] = dataset['Embarked'].map( {'S': 0, 'C': 1, 'Q': 2} ).astype(int)
    
    # Mapping Fare (edges are the training-set quartiles printed in the Fare section)
    dataset.loc[dataset['Fare'] <= 7.91, 'Fare'] = 0
    dataset.loc[(dataset['Fare'] > 7.91) & (dataset['Fare'] <= 14.454), 'Fare'] = 1
    dataset.loc[(dataset['Fare'] > 14.454) & (dataset['Fare'] <= 31), 'Fare'] = 2
    dataset.loc[dataset['Fare'] > 31, 'Fare'] = 3
    dataset['Fare'] = dataset['Fare'].astype(int)

    # Mapping Age (edges match the 5 age ranges printed in the Age section)
    dataset.loc[dataset['Age'] <= 16, 'Age'] = 0
    dataset.loc[(dataset['Age'] > 16) & (dataset['Age'] <= 32), 'Age'] = 1
    dataset.loc[(dataset['Age'] > 32) & (dataset['Age'] <= 48), 'Age'] = 2
    dataset.loc[(dataset['Age'] > 48) & (dataset['Age'] <= 64), 'Age'] = 3
    dataset.loc[dataset['Age'] > 64, 'Age'] = 4

# Feature Selection
drop_elements = ['PassengerId', 'Name', 'Ticket', 'Cabin', 'SibSp',\
                 'Parch', 'FamilySize']
train = train.drop(drop_elements, axis = 1)
train = train.drop(['CategoricalAge', 'CategoricalFare'], axis = 1)

test  = test.drop(drop_elements, axis = 1)

print (train.head(10))

   Survived  Pclass  Sex  Age  Fare  Embarked  IsAlone  Title
0         0       3    1    1     0         0        0      1
1         1       1    0    2     3         1        0      3
2         1       3    0    1     1         0        1      2
3         1       1    0    2     3         0        0      3
4         0       3    1    2     1         0        1      1
5         0       3    1    2     1         2        1      1
6         0       1    1    3     3         0        1      1
7         0       3    1    0     2         0        0      4
8         1       3    0    1     1         0        0      3
9         1       2    0    0     2         1        0      3


Good! Now we have a clean dataset that is ready for modelling. Let's train a classifier on it.

## Random Forest classifier

In [12]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, log_loss
from sklearn.ensemble import RandomForestClassifier

Train_set, Test_set = train_test_split(train, test_size=0.2, random_state=42, shuffle=True)

In [13]:
X_train = Train_set.drop(['Survived'],axis=1)
y_train = Train_set['Survived']

X_test = Test_set.drop(['Survived'],axis=1)
y_test = Test_set['Survived']

In [14]:
rf = RandomForestClassifier(random_state=42)
rf.fit(X_train,y_train)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=42, verbose=0,
                       warm_start=False)

In [15]:
accuracy_score(y_test, rf.predict(X_test))

0.8100558659217877

In [16]:
Train_set.columns.tolist()

['Survived', 'Pclass', 'Sex', 'Age', 'Fare', 'Embarked', 'IsAlone', 'Title']


In [18]:
Train_set.Sex.value_counts()

1    467
0    245
Name: Sex, dtype: int64

In [19]:
Test_set.isnull().sum()

Survived    0
Pclass      0
Sex         0
Age         0
Fare        0
Embarked    0
IsAlone     0
Title       0
dtype: int64

In [20]:
import pickle
with open('random_forest.pkl', 'wb') as f:
    pickle.dump(rf, f)

In [21]:
Test_set.to_csv("test_set.csv",index=False)

## What-If tool code

In [22]:
from witwidget.notebook.visualization import WitConfigBuilder
from witwidget.notebook.visualization import WitWidget
import pandas as pd
import pickle

In [23]:
Test_set = pd.read_csv("test_set.csv")
with open('random_forest.pkl', 'rb') as f:
    rf = pickle.load(f)

In [24]:
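import numpy as np

def custom_predict_1(examples_to_infer):
    # WIT passes every column, including the target 'Survived' (moved last in the
    # next cell); drop it so the model gets only its 7 training features.
    return rf.predict_proba(np.array(examples_to_infer)[:, :-1])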
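Before launching the widget, a predict wrapper can be checked offline against the shape WIT expects back: one row of class probabilities per example. The sketch assumes WIT hands each example over as a full row with `Survived` in the last position (the column order produced by the reordering cell below); `StubModel` and `make_predict_fn` are hypothetical names, and the stub stands in for the pickled random forest so the check runs without it.

```python
import numpy as np


class StubModel:
    """Hypothetical stand-in for the random forest: accepts exactly 7 features."""

    n_features = 7

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        if X.shape[1] != self.n_features:
            raise ValueError(f"expected {self.n_features} features, got {X.shape[1]}")
        p_survived = np.full(len(X), 0.3)  # constant toy probability
        return np.column_stack([1.0 - p_survived, p_survived])


def make_predict_fn(model):
    """Wrap a model so it drops the trailing target column before predicting."""
    def predict_fn(examples_to_infer):
        features = np.asarray(examples_to_infer, dtype=float)[:, :-1]
        return model.predict_proba(features)
    return predict_fn


# Two rows in the reordered column order: 7 features, then Survived
examples = [[3, 1, 2, 2, 1, 0, 4, 1], [2, 0, 0, 3, 0, 0, 2, 1]]
probs = make_predict_fn(StubModel())(examples)
print(probs.shape)  # (2, 2)
```

Passing the 8-column rows straight to `StubModel().predict_proba` raises a `ValueError`, which mirrors the feature-count mismatch a real classifier would report if the target column were not removed.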

In [25]:
cols = Test_set.columns.tolist()
reordered = cols[1:8] + [cols[0]]
Reordered_test_set = Test_set[reordered]

In [26]:
Reordered_test_set.head()

   Pclass  Sex  Age  Fare  Embarked  IsAlone  Title  Survived
0       3    1    2     2         1        0      4         1
1       2    1    1     1         0        1      1         0
2       3    1    1     1         0        1      1         0
3       2    0    0     3         0        0      2         1
4       3    0    0     1         1        0      2         1


In [27]:
config_builder = WitConfigBuilder(
    Reordered_test_set.values.tolist(),
    Reordered_test_set.columns.tolist()).set_custom_predict_fn(
        custom_predict_1).set_target_feature('Survived').set_label_vocab(
            ['No', 'Yes']).set_model_type('classification')

In [28]:
tool_height_in_px = 500 
WitWidget(config_builder, height=tool_height_in_px)

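The partial dependence plots in the Datapoint editor can be cross-checked outside the widget: for each candidate value of one feature, overwrite that feature in every row, predict, and average the positive-class probability. A minimal sketch, assuming a `predict_proba`-style function that returns one column per class (as `rf.predict_proba` does); `manual_partial_dependence` is a hypothetical helper, not scikit-learn's `partial_dependence`, and the toy model exists only to make the result checkable.

```python
import numpy as np


def manual_partial_dependence(predict_proba, X, feature, grid):
    """Average P(class 1) when column `feature` is forced to each value in `grid`."""
    X = np.asarray(X, dtype=float)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # same value for every row; other columns unchanged
        averages.append(predict_proba(X_mod)[:, 1].mean())
    return np.array(averages)


def toy_predict_proba(X):
    """Toy model: P(class 1) rises by 0.1 per unit of column 0."""
    p = 0.2 + 0.1 * X[:, 0]
    return np.column_stack([1 - p, p])


X = np.random.default_rng(0).integers(0, 3, size=(20, 2))
print(manual_partial_dependence(toy_predict_proba, X, feature=0, grid=[0, 1, 2]))
```

With the notebook's objects this would be, for example, `manual_partial_dependence(rf.predict_proba, X_test, X_test.columns.get_loc('Sex'), [0, 1])` to compare female (0) and male (1); recent scikit-learn versions may warn that the array has no feature names.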

## Analyzing model using what-if tool

### DataPoint Editor Tab

**Purpose: To find out which records our model predicted wrongly**
- Colour by : Survived 
- Y-axis : Inference score

Blue points represent passengers who did not survive (Survived = 0) and red points represent survivors (Survived = 1), while the Y-axis shows the inference score (the model's predicted probability of survival). Blue points near 1 on the Y-axis and red points near 0 are therefore predictions our model got wrong; more precisely, any blue point above the classification threshold (0.5 by default) or red point below it is misclassified.
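The same check can be done numerically: blue points above the threshold are false positives and red points below it are false negatives. A sketch assuming `scores` holds the predicted survival probability (e.g. `rf.predict_proba(X_test)[:, 1]`) and that a score equal to the threshold counts as positive; `misclassified` is a hypothetical helper name.

```python
import numpy as np


def misclassified(y_true, scores, threshold=0.5):
    """Return (false_positive_idx, false_negative_idx), most confident mistakes first."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    y_pred = (scores >= threshold).astype(int)
    fp = np.flatnonzero((y_true == 0) & (y_pred == 1))  # "blue" points predicted as survived
    fn = np.flatnonzero((y_true == 1) & (y_pred == 0))  # "red" points predicted as not survived
    fp = fp[np.argsort(-scores[fp])]  # highest score first
    fn = fn[np.argsort(scores[fn])]   # lowest score first
    return fp, fn


fp, fn = misclassified([0, 0, 1, 1, 0], [0.9, 0.2, 0.1, 0.7, 0.6])
print(fp.tolist(), fn.tolist())  # [0, 4] [2]
```

Sorting by score puts the most confidently wrong predictions (the points nearest 1 or 0 in the plot) first.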

### Performance & Fairness Tab

**Purpose: To find out the optimum threshold**

- Ground Truth feature = Survived
- Apply an optimization strategy: Single threshold

In our run, this moved the threshold from 0.5 to 0.42 and increased the F1 score on the whole test set from 0.76 to 0.81: a sizeable improvement obtained only by changing the model's decision threshold.
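The single-threshold strategy picks the threshold that minimizes the total misclassification cost. A minimal sketch of that idea, assuming the cost ratio is the cost of a false positive relative to a false negative and that a grid over [0, 1] is precise enough; WIT's own search may differ, so the threshold it reports (0.42 above) need not match exactly. The helper names are hypothetical.

```python
import numpy as np


def best_single_threshold(y_true, scores, cost_ratio=1.0, steps=101):
    """Grid-search the threshold minimizing cost_ratio * FP + FN.

    A score >= threshold is treated as a positive prediction.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.linspace(0.0, 1.0, steps)
    costs = []
    for t in thresholds:
        y_pred = scores >= t
        fp = np.sum(y_pred & (y_true == 0))
        fn = np.sum(~y_pred & (y_true == 1))
        costs.append(cost_ratio * fp + fn)
    best = int(np.argmin(costs))  # first threshold with the lowest cost
    return float(thresholds[best]), float(costs[best])


def f1(y_true, y_pred):
    """F1 score for 0/1 labels, 0.0 when there are no true positives."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    precision = tp / max(np.sum(y_pred == 1), 1)
    recall = tp / max(np.sum(y_true == 1), 1)
    return 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)


y_true = [0, 0, 1, 1]
scores = [0.2, 0.4, 0.45, 0.8]
t, cost = best_single_threshold(y_true, scores)
print(t, cost)
print(f1(y_true, [int(s >= t) for s in scores]))  # 1.0 on this toy data
```

With the notebook's objects, `best_single_threshold(y_test, rf.predict_proba(X_test)[:, 1])` would run the same search on the held-out 20% split; raising `cost_ratio` penalizes false positives more and tends to push the threshold up.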