### Exercises


#### Exercise 1 

In exercise one of section four, we're asked to predict whether or not some people will be diagnosed with diabetes from a set of medical exam variables.

The data covers a population of Pima Indians. It's a very famous dataset that we got from UCI, and it contains information about the patients:
* pregnancies, glucose, blood pressure,
* a few other medical examinations,
* and, in the last column, the outcome, which is a binary variable.

So it's a classification problem. You're guided through a series of steps:
* loading the data,
* creating a histogram to inspect the features,
* and exploring the correlations between the features and the outcome column.

We suggest using the seaborn pairplot, but you can also draw a heat map as we saw in the lecture. 

Then there are a few open questions:
* Do features need standardization? And if so, what kind? Are we going to use MinMax or standard scaling?
* Do you need dummy columns?

Finally, prepare X and y for use with a machine learning model, and make sure you define your target variable.
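
To make the MinMax-versus-standard question concrete, here is a minimal sketch on a tiny made-up array (the values are illustrative and not taken from the diabetes data). It only shows what each scikit-learn scaler does to a column; which one suits the exercise is still for you to decide.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy feature matrix: two columns on very different scales (made-up values).
toy = np.array([[1.0, 100.0],
                [2.0, 150.0],
                [3.0, 400.0]])

# StandardScaler: subtract the column mean, divide by the column standard deviation.
standardized = StandardScaler().fit_transform(toy)

# MinMaxScaler: map each column linearly onto the [0, 1] interval.
rescaled = MinMaxScaler().fit_transform(toy)

print(standardized.mean(axis=0), standardized.std(axis=0))
print(rescaled.min(axis=0), rescaled.max(axis=0))
```

After standardization every column has mean 0 and standard deviation 1; after MinMax scaling every column spans exactly 0 to 1, so a single outlier can compress the remaining values.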


In [None]:
%matplotlib inline 
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns  # used below for pairplot and heatmap

In [None]:
df = pd.read_csv('../dataset/diabets.csv')
df.head()

In [None]:
_ = df.hist(figsize=(12,10))

In [None]:
sns.pairplot(df, hue = 'Outcome')

In [None]:
sns.heatmap(df.corr(), annot = True)

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
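from keras.utils import to_categorical  # needed for the one-hot encoding below

sc = StandardScaler()
X = sc.fit_transform(df.drop('Outcome', axis = 1))  # scale every feature column
y = df['Outcome'].values                            # target as a NumPy array
y_cat = to_categorical(y)                           # one-hot encode the binary target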

#### Exercise 2 

In exercise two, we build a fully connected neural network model to predict diabetes for the patients we loaded and prepared in exercise one. We're guided through a series of steps:
1. We start by splitting the data with a train/test split.
2. Then we define a sequential model with at least one inner layer.
3. To build this model, we'll have to make a few choices:
   * what the size of the input is,
   * how many nodes we will use in each layer,
   * the size of the outputs,
   * what activation functions we will use in the inner layers,
   * what activation function we're going to use at the output,
   * what loss function we will use,
   * and what optimizer we will use.
4. Next, we fit the model on the training set, using a validation split of 0.1, and test the trained model on the test data from the train/test split.
5. Finally, we check the accuracy score, the confusion matrix, and the classification report.
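
The choices about output size, output activation, and loss are linked. As a rough illustration, the sketch below reimplements in plain NumPy what a two-node softmax output with a categorical cross-entropy loss computes on one-hot targets. The helper names (`one_hot`, `softmax`, `categorical_crossentropy`) and all numbers are made up for this example; they are not Keras functions.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows (roughly what to_categorical does)."""
    return np.eye(num_classes)[labels]

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob):
    """Mean negative log-probability assigned to the true class."""
    return -np.mean(np.sum(y_true * np.log(y_prob), axis=1))

labels = np.array([0, 1, 1])       # made-up binary outcomes
logits = np.array([[2.0, 0.0],     # made-up raw outputs of a 2-node layer
                   [0.0, 2.0],
                   [1.0, 1.0]])

y_true = one_hot(labels, num_classes=2)
y_prob = softmax(logits)
loss = categorical_crossentropy(y_true, y_prob)
predicted = np.argmax(y_prob, axis=1)  # back from probabilities to class labels

print(y_prob)
print(loss)
```

A single output node with a sigmoid activation and a binary cross-entropy loss is a common alternative for a two-class problem; in that setup the target stays a plain 0/1 vector instead of being one-hot encoded.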


In [None]:
X.shape

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y_cat, 
                                                   random_state = 22, 
                                                   test_size = 0.2)

In [None]:
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

In [None]:
model = Sequential()
model.add(Dense(32, input_shape = (8, ), activation = 'relu'))
model.add(Dense(32, activation = 'relu'))
model.add(Dense(2, activation = 'softmax'))
model.compile(Adam(learning_rate = 0.05),  # older Keras releases spell this argument `lr`
             loss = 'categorical_crossentropy', 
             metrics = ['accuracy'])

In [None]:
model.fit(X_train, y_train, epochs = 20, verbose = 2, validation_split = 0.1)

In [None]:
model.summary()

In [None]:
y_pred = model.predict(X_test)

In [None]:
y_test_class = np.argmax(y_test, axis = 1)
y_pred_class = np.argmax(y_pred, axis = 1)

In [None]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

In [None]:
pd.Series(y_test_class).value_counts()

In [None]:
accuracy_score(y_test_class, y_pred_class)

In [None]:
print(classification_report(y_test_class, y_pred_class))

In [None]:
confusion_matrix(y_test_class, y_pred_class)
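
As a reading aid for the last two cells, here is a small sketch on hand-made labels (not the diabetes predictions) showing how the accuracy score relates to the confusion matrix. In scikit-learn's convention, rows are true classes and columns are predicted classes.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-made labels for illustration only.
y_true_toy = [0, 0, 0, 1, 1]
y_pred_toy = [0, 1, 0, 1, 0]

cm = confusion_matrix(y_true_toy, y_pred_toy)

# Correct predictions sit on the diagonal, so accuracy = diagonal sum / total count.
acc_from_cm = cm.trace() / cm.sum()
acc = accuracy_score(y_true_toy, y_pred_toy)

print(cm)
print(acc_from_cm, acc)
```

The off-diagonal entries are the mistakes: the top-right cell counts class-0 samples predicted as 1, and the bottom-left cell counts class-1 samples predicted as 0.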

#### Exercise 3 

Let's review exercise three in section four.
This exercise asks you to compare your results on the Pima Indians dataset classification with the results presented in a notebook on the Kaggle website. In that notebook, they use different machine learning techniques to build a model that predicts the same outcome you are trying to predict. 

So the question you're asked is: are neural networks better or worse in this particular case? 

I will also be comparing my results with a few models from scikit-learn, for example: 
* a support vector machine and/or a random forest, 
* trained and evaluated on the exact same train/test split. 

I'm not going to tell you whether the performance is worse or better; that's for you to find out. We also ask you to try restricting your features to only four features, like in the suggested notebook. 

How does the model performance change? 
1. You can test this for your model, the neural network, 
2. but also for models like the random forest and the support vector machine. 
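
One way to set up the four-feature comparison is sketched below. It runs on synthetic data from `make_classification` rather than on the diabetes table, and the column names (`f0` to `f7`) and the chosen subset are hypothetical placeholders; substitute the four features used in the Kaggle notebook. The point is only the mechanics: keep the same train/test rows and change nothing but the columns.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the diabetes table: 8 numeric features, binary target.
X_syn, y_syn = make_classification(n_samples=400, n_features=8,
                                   n_informative=4, random_state=0)
cols = ['f{}'.format(i) for i in range(8)]   # hypothetical column names
data = pd.DataFrame(X_syn, columns=cols)

# Hypothetical choice of four features; replace with the ones you want to test.
subset = ['f0', 'f1', 'f2', 'f3']

# Split row positions once so both feature sets share the same train/test rows.
idx_train, idx_test = train_test_split(np.arange(len(data)),
                                       test_size=0.2, random_state=22)

scores = {}
for name, columns in [('all features', cols), ('four features', subset)]:
    clf = RandomForestClassifier(random_state=0)
    clf.fit(data.iloc[idx_train][columns], y_syn[idx_train])
    y_hat = clf.predict(data.iloc[idx_test][columns])
    scores[name] = accuracy_score(y_syn[idx_test], y_hat)

print(scores)
```

The same loop works for any scikit-learn classifier; for the neural network you would also need to change `input_shape` to match the reduced number of columns.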

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

for mod in [RandomForestClassifier(), SVC(), GaussianNB()]:
    mod.fit(X_train, y_train[:, 1])
    y_pred = mod.predict(X_test)
    print("="*80)
    print(mod)
    print('Accuracy score: {:0.3}'.format(accuracy_score(y_test_class, 
                                                         y_pred)))
    print("Confusion Matrix:")
    print(confusion_matrix(y_test_class, y_pred))
    print()

#### Exercise 4 - open TensorFlow Playground in a web browser

This exercise is about TensorFlow Playground. TensorFlow Playground is a very nice web application from the team at TensorFlow that provides an interactive environment where you can play with simple fully connected neural nets on very simple datasets. These are two-feature datasets where you have to separate groups of points that are blue and orange. So play with it for a few minutes. There's no real goal. You don't need to understand the meaning of every knob and button; the point is just to develop a feeling, an intuition about how things work. There's no real challenge here, so feel free to explore, play with it, and see what you find.