# Kishu Tutorial

This notebook predicts passenger survival on the Titanic.

**Goal**: This notebook helps you get familiar with Kishu. You only need to **read the markdown instructions; there is no need to read the code**.

At the end, you should be able to:
- **Browse** different commits and **read** detailed commit information
- **Checkout** (i.e., restore) a notebook from a commit and **branch out** to explore different coding routes
- **Search** for variable changes and **inspect** variable types, sizes, and values
- **Recover** from a Jupyter kernel restart

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')
import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

data = pd.read_csv('./titanic.csv')

In [None]:
X = 1
Y = [X]

# Try Kishu: Browse Commits

Open Kishuboard. A new **commit** should appear in the left panel. Click on it.

In the right panel, you should see the **list of executed cells** (top) and the **variables** that exist at that commit (bottom).

In the variable panel, you should see each variable's **name**, **type**, **size**, and **value summary**. Clicking "view detail" displays its **detailed value**.

**Quiz**: What shape does `data` have? What's its 11th row?

### Data Preprocessing

In [None]:
# impute age feature
# extract the salutation (e.g., "Mr", "Mrs") that precedes a period in Name; a single loop-free call is enough
data['Initial']=data.Name.str.extract(r'([A-Za-z]+)\.',expand=False)
# map rare salutations onto the main groups (assign back instead of chained inplace=True, which newer pandas ignores)
data['Initial']=data['Initial'].replace(['Mlle','Mme','Ms','Dr','Major','Lady','Countess','Jonkheer','Col','Rev','Capt','Sir','Don'],['Miss','Miss','Miss','Mr','Mr','Mrs','Mrs','Other','Other','Other','Mr','Mr','Mr'])
## Fill missing ages with the ceiling of the mean age for each salutation
data.loc[(data.Age.isnull())&(data.Initial=='Mr'),'Age']=33
data.loc[(data.Age.isnull())&(data.Initial=='Mrs'),'Age']=36
data.loc[(data.Age.isnull())&(data.Initial=='Master'),'Age']=5
data.loc[(data.Age.isnull())&(data.Initial=='Miss'),'Age']=22
data.loc[(data.Age.isnull())&(data.Initial=='Other'),'Age']=46
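
The cell above hard-codes the fill values, which its comment describes as the ceiling of the mean age for each salutation. A minimal sketch of how such values could be computed with `groupby` instead, using a few made-up rows in place of `titanic.csv` (the names and ages below are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for titanic.csv
df = pd.DataFrame({
    "Name": ["Smith, Mr. John", "Doe, Mr. Paul", "Kim, Mr. Sam",
             "Brown, Mrs. Anna", "Poe, Mrs. Jane", "Roe, Master. Tom"],
    "Age": [30.0, 33.0, np.nan, 40.0, np.nan, 4.0],
})

# Extract the salutation that precedes a period, e.g. "Mr" from "Mr."
df["Initial"] = df["Name"].str.extract(r"([A-Za-z]+)\.", expand=False)

# Ceiling of the mean known age per salutation (mean() skips NaN ages)
fill_values = np.ceil(df.groupby("Initial")["Age"].mean())

# Fill each missing age with the value for that passenger's salutation
df["Age"] = df["Age"].fillna(df["Initial"].map(fill_values))
print(df[["Initial", "Age"]])
```

Here the two known `Mr` ages average to 31.5, so the missing `Mr` age becomes 32. On the real data, the salutation replacements would need to run before grouping so that the groups match the ones used in the cell.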

In [None]:
# impute embark feature
data['Embarked']=data['Embarked'].fillna('S')

# Age and fare feature bands (convert continuous values into categorical values)
data['Age_band']=0
data.loc[data['Age']<=16,'Age_band']=0
data.loc[(data['Age']>16)&(data['Age']<=32),'Age_band']=1
data.loc[(data['Age']>32)&(data['Age']<=48),'Age_band']=2
data.loc[(data['Age']>48)&(data['Age']<=64),'Age_band']=3
data.loc[data['Age']>64,'Age_band']=4

data['Fare_cat']=0
data.loc[data['Fare']<=7.91,'Fare_cat']=0
data.loc[(data['Fare']>7.91)&(data['Fare']<=14.454),'Fare_cat']=1
data.loc[(data['Fare']>14.454)&(data['Fare']<=31),'Fare_cat']=2
data.loc[(data['Fare']>31)&(data['Fare']<=513),'Fare_cat']=3

# Convert string values into numeric codes (assign back instead of chained inplace=True)
data['Sex']=data['Sex'].replace(['male','female'],[0,1])
data['Embarked']=data['Embarked'].replace(['S','C','Q'],[0,1,2])
data['Initial']=data['Initial'].replace(['Mr','Mrs','Miss','Master','Other'],[0,1,2,3,4])
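
The `Age_band` and `Fare_cat` assignments implement binning with explicit comparisons. `pandas.cut` can express the same right-closed intervals in one call; the sketch below uses made-up values to show the mapping:

```python
import numpy as np
import pandas as pd

# Made-up sample values
ages = pd.Series([2.0, 16.0, 16.5, 32.0, 40.0, 64.0, 70.0])
fares = pd.Series([7.25, 7.91, 10.5, 31.0, 80.0])

# right=True (the default) makes each bin (low, high], matching the `<=` checks above;
# labels=False returns the integer bin index instead of interval labels
age_band = pd.cut(ages, bins=[-np.inf, 16, 32, 48, 64, np.inf], labels=False)
fare_cat = pd.cut(fares, bins=[-np.inf, 7.91, 14.454, 31, 513], labels=False)

print(age_band.tolist())
print(fare_cat.tolist())
```

One difference: a fare above 513 falls outside the last bin and becomes NaN with `pd.cut`, while the original cell leaves it at the default value 0.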

In [None]:
#drop unneeded features
data.drop(['Name','Ticket','Fare','Cabin','PassengerId','Initial','Age'],axis=1,inplace=True)

### Predictive Modeling

In [None]:
#importing all the required ML packages
from sklearn.linear_model import LogisticRegression #logistic regression
from sklearn import svm #support vector Machine
from sklearn.ensemble import RandomForestClassifier #Random Forest
from sklearn.neighbors import KNeighborsClassifier #KNN
from sklearn.naive_bayes import GaussianNB #Naive bayes
from sklearn.tree import DecisionTreeClassifier #Decision Tree
from sklearn.model_selection import train_test_split #training and testing data split
from sklearn import metrics #accuracy measure
from sklearn.metrics import confusion_matrix #for confusion matrix

Divide the data into training and test sets.

In [None]:
train,test=train_test_split(data,test_size=0.3,random_state=0,stratify=data['Survived'])
train_X=train[train.columns[1:]]
train_Y=train['Survived'] # a 1-D Series, as scikit-learn expects for labels
test_X=test[test.columns[1:]]
test_Y=test['Survived']
X=data[data.columns[1:]]
Y=data['Survived']
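
Passing `stratify=data['Survived']` makes `train_test_split` keep the proportion of survivors (approximately) the same in the training and test sets. A small check on hypothetical labels (60 zeros and 40 ones) rather than the Titanic data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 40% positive class
y = np.array([0] * 60 + [1] * 40)
X = np.arange(100).reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Size and fraction of positive labels in each split
print(len(y_train), y_train.mean())
print(len(y_test), y_test.mean())
```

Both splits keep the 40% positive rate; without `stratify`, the rate in a small test set can drift by chance.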

Train a radial basis function Support Vector Machine (rbf-SVM).

In [None]:
model=svm.SVC(kernel='rbf',C=1,gamma=0.1)
model.fit(train_X,train_Y)
prediction=model.predict(test_X)
print('Accuracy for rbf SVM is',metrics.accuracy_score(test_Y,prediction)) # (y_true, y_pred) order
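
With `kernel='rbf'`, the SVM measures the similarity of two samples x and x' as K(x, x') = exp(-gamma * ||x - x'||^2), so `gamma=0.1` controls how quickly similarity falls off with distance (`C=1` separately sets the regularization strength). A short check that this formula matches scikit-learn's `rbf_kernel`, on two made-up feature vectors:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

gamma = 0.1  # same value passed to svm.SVC in the cell above
a = np.array([[1.0, 0.0, 2.0]])
b = np.array([[0.0, 1.0, 2.0]])

# Squared Euclidean distance is 1 + 1 + 0 = 2, so K = exp(-0.1 * 2)
manual = np.exp(-gamma * np.sum((a - b) ** 2))
library = rbf_kernel(a, b, gamma=gamma)[0, 0]
print(manual, library)
```

A larger `gamma` makes the kernel more local, so each training point influences a smaller neighborhood.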

# Try Kishu: Checkout and Branch

Suppose you want to explore a different approach. Instead of the 5th cell (counting in execution order):

```python
data.drop(['Name','Ticket','Fare','Cabin','PassengerId','Initial','Age'],axis=1,inplace=True)
```

You want to **change the code** so that it does not drop the `Age` column:

```python
data.drop(['Name','Ticket','Fare','Cabin','PassengerId','Initial'],axis=1,inplace=True)
```

Thanks to Kishu's **checkout and branch**, you can do this without re-executing all the preceding cells! To do so:

1. **Find the commit to restore**, i.e., the one just before the cell you want to change.
   - *Hint: Find the commit starting with `[5] data.drop([...` and select the one before it.*
2. **Checkout** the variables by right-clicking the commit > "Checkout to Selected Commit" > "Checkout Only Variables".
3. **Modify** the cell so that it does not drop the `Age` column.
4. **Branch out** by executing the modified cell and the following cells as usual.
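
Conceptually, checking out a commit restores the variables as they were at that point, and re-running a modified cell from there records new commits alongside the old ones. The toy sketch below illustrates only that idea with deep-copied dictionaries; the `commit` and `checkout` helpers and the commit ids are hypothetical and do not reflect how Kishu stores or restores state:

```python
import copy

# Hypothetical toy store: commit id -> deep copy of the variables at that point.
# Illustration only; this is not how Kishu stores or restores state.
commits = {}

def commit(commit_id, namespace):
    """Record a snapshot of the namespace under commit_id."""
    commits[commit_id] = copy.deepcopy(namespace)

def checkout(commit_id):
    """Return a fresh copy of the variables saved at commit_id."""
    return copy.deepcopy(commits[commit_id])

ns = {"columns": ["Survived", "Pclass", "Name", "Age"]}
commit("before_drop", ns)

# Original route: drop Name and Age
ns["columns"] = [c for c in ns["columns"] if c not in ("Name", "Age")]
commit("drop_age", ns)

# Branch: restore the earlier state, then run a modified step that keeps Age
ns = checkout("before_drop")
ns["columns"] = [c for c in ns["columns"] if c != "Name"]
commit("keep_age", ns)

for cid, snapshot in commits.items():
    print(cid, snapshot["columns"])
```

Both `drop_age` and `keep_age` stay recorded, so switching back to the original route is just another checkout.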

___________________________

Unfortunately, keeping `Age` does not lead to a significant improvement.

**Quiz**: Revert to the original approach by checking out the original commit.

# Try Kishu: Search and Inspect

Suppose you want to find when `data` changed, for example, when its number of columns went from 15 to 8.

Kishu's Search and Inspect makes this easy! To do so:

1. **Locate** the search bar at the top.
2. **Select** "Select search target" > "variable changes".
3. **Search** for the variable `data` by typing its name in the search bar and clicking the search icon.
4. **Inspect** `data` in the highlighted commits to find the ones that changed its shape.
   - *Hint: Variable size is shown in the bottom-right panel.*
   - *Tip: You can pin `data` to the top of the panel by typing `data` into "Add a watch to the variable you want to inspect".*
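
Conceptually, a variable-change search compares a variable's state across consecutive commits. The sketch below imitates that for `data`'s shape only, with hypothetical step labels. It assumes `titanic.csv` has the 12 standard columns of the Kaggle Titanic training file and replaces the real rows with an empty DataFrame; it illustrates the idea and is not Kishu's search implementation:

```python
import pandas as pd

# Assumption: titanic.csv has the 12 standard Kaggle Titanic columns.
# An empty DataFrame with those columns is enough to track column counts.
cols = ["PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
        "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"]
data = pd.DataFrame(columns=cols)
history = [("read_csv", data.shape)]

data["Initial"] = 0                         # age-imputation cell
history.append(("impute age", data.shape))
data["Age_band"] = 0                        # banding cell
data["Fare_cat"] = 0
history.append(("bands", data.shape))
data = data.drop(columns=["Name", "Ticket", "Fare", "Cabin",
                          "PassengerId", "Initial", "Age"])
history.append(("drop", data.shape))

# Report every step whose shape differs from the previous step
changes = [(step, prev, cur)
           for (_, prev), (step, cur) in zip(history, history[1:])
           if prev != cur]
for step, prev, cur in changes:
    print(f"{step}: {prev} -> {cur}")
```

Under that assumption, the last step reports the drop from 15 to 8 columns mentioned above.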

**Quiz**: Find the commit(s) in which `X` changed.

In [None]:
data.shape

# Try Kishu: Recover

Suppose the kernel has been restarted (please restart it manually now: "Kernel" > "Restart Kernel...").

Kishu helps you **recover** your previous work!

To do so:
1. **Attach** Kishu ("Kishu" > "Initialize/Re-attach").
2. **Checkout** the latest commit (right-click the commit > "Checkout to Selected Commit" > "Checkout Code + Variables").
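
Recovering after a restart implies that commit state is kept outside the kernel's memory. As a rough illustration only (the file name and the use of `pickle` are assumptions, not a description of Kishu's storage), saving variables to disk and reading them back after memory is cleared looks like:

```python
import os
import pickle
import tempfile

# Hypothetical snapshot file; Kishu's real checkpoint format and location are not modeled here.
path = os.path.join(tempfile.mkdtemp(), "snapshot.pkl")

state = {"X": 1, "Y": [1]}  # variables from the second cell
with open(path, "wb") as f:
    pickle.dump(state, f)

del state  # stands in for the kernel restart clearing memory

with open(path, "rb") as f:
    restored = pickle.load(f)
print(restored)
```

Because the snapshot lives in a file, it is still available after the in-memory copy is gone.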

**Quiz**: Restart the kernel and recover again for practice.

In [None]:
# Run data.shape to check whether the recovery succeeded
data.shape