# Keras Intro: Shallow Models

Keras Documentation: https://keras.io

In this notebook we explore how to use Keras to implement two traditional machine learning models:
- **Linear Regression** to predict continuous data
- **Logistic Regression** to predict categorical data
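
As a rough sketch of what these two models compute, independent of Keras: a linear regression outputs a weighted sum of the inputs plus a bias, while a logistic regression passes that same sum through a sigmoid to obtain a probability. The function names and the values of `x`, `w` and `b` below are illustrative assumptions, not taken from the datasets used later.

```python
import numpy as np

def linear_model(x, w, b):
    """Linear regression prediction: weighted sum of the inputs plus a bias."""
    return x @ w + b

def logistic_model(x, w, b):
    """Logistic regression prediction: sigmoid of the same linear combination."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# Illustrative values only (assumed, not from the notebook's data)
x = np.array([[0.0], [1.0], [2.0]])
w = np.array([2.0])
b = 1.0

y_lin = linear_model(x, w, b)    # unbounded real values
y_log = logistic_model(x, w, b)  # values strictly between 0 and 1
```

In Keras terms, these roughly correspond to a single `Dense(1)` layer without and with a sigmoid activation.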

## Linear Regression

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

### 0. Load data

In [None]:
df = pd.read_csv('../data/weight-height.csv')

In [None]:
df.head()

In [None]:
df.plot(kind='scatter',
        x='Height',
        y='Weight',
        title='Weight and Height in adults')

### 1. Create Train/Test split

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X = df[['Height']].values
y = df['Weight'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, 
    test_size = 0.3, random_state=0)

### 2. Train Linear Regression Model

In [None]:
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam, SGD

In [None]:
model = Sequential()

In [None]:
model.add(Dense(1, input_shape=(1,)))

In [None]:
model.summary()

In [None]:
# Recent Keras versions use `learning_rate`; the older `lr` alias is deprecated
model.compile(Adam(learning_rate=0.9), 'mean_squared_error')

In [None]:
model.fit(X_train, y_train, epochs=40)

### 3. Evaluate Model Performance

In [None]:
from sklearn.metrics import r2_score

In [None]:
y_train_pred = model.predict(X_train).ravel()
y_test_pred = model.predict(X_test).ravel()

In [None]:
print("The R2 score on the Train set is:\t{:0.3f}".format(r2_score(y_train, y_train_pred)))
print("The R2 score on the Test set is:\t{:0.3f}".format(r2_score(y_test, y_test_pred)))
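
To make the metric less opaque, here is a minimal sketch of how the R2 score can be computed by hand: it compares the model's squared error with that of a constant predictor that always outputs the mean of the targets. The helper name `r2_manual` is an assumption introduced for illustration; for one-dimensional targets it should agree with `r2_score`.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of predicting the mean
    return 1.0 - ss_res / ss_tot

# Toy values (assumed): perfect predictions vs. always predicting the mean
r2_perfect = r2_manual([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
r2_mean = r2_manual([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

A score of 1 means a perfect fit, 0 means no better than predicting the mean, and negative values mean worse than that baseline.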

In [None]:
df.plot(kind='scatter',
        x='Height',
        y='Weight',
        title='Weight and Height in adults')
plt.plot(X_test, y_test_pred, color='red')

In [None]:
W, B = model.get_weights()

In [None]:
W

In [None]:
B
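
One way to sanity-check the learned weight and bias is to compare them with the ordinary least squares solution, which minimizes the same mean squared error in closed form. The sketch below uses synthetic stand-in data and a hypothetical helper `least_squares_fit`; in this notebook one would presumably pass `X_train` and `y_train` instead.

```python
import numpy as np

def least_squares_fit(X, y):
    """Return (weights, bias) that minimize the mean squared error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Append a column of ones so the bias is fitted as an extra coefficient
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

# Synthetic data (assumed): y = 3x + 2 with no noise
X_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = 3.0 * X_demo.ravel() + 2.0

w_ls, b_ls = least_squares_fit(X_demo, y_demo)
```

If training has converged, `W` and `B` from `model.get_weights()` should land close to these values.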

## Classification

### 0. Load data

In [None]:
df = pd.read_csv('../data/user_visit_duration.csv')

In [None]:
df.head()

In [None]:
df.plot(kind='scatter', x='Time (min)', y='Buy')

### 1. Create Train/Test split

In [None]:
X = df[['Time (min)']].values
y = df['Buy'].values

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, 
    test_size = 0.3, random_state=0)

### 2. Train Logistic Regression Model

In [None]:
model = Sequential()

In [None]:
model.add(Dense(1, input_shape=(1,), activation='sigmoid'))

In [None]:
model.summary()

In [None]:
# Recent Keras versions use `learning_rate`; the older `lr` alias is deprecated
model.compile(SGD(learning_rate=0.5), 'binary_crossentropy', metrics=['accuracy'])

In [None]:
model.fit(X_train, y_train, epochs=40)

In [None]:
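ax = df.plot(kind='scatter', x='Time (min)', y='Buy', label='data',
             title='Purchase behavior VS time spent on site')

# predict expects a 2D array of shape (n_samples, 1)
t = np.linspace(0, 4)
ax.plot(t, model.predict(t.reshape(-1, 1)), color='orange', label='model')

# Labels are attached to each artist, so the legend does not depend on plotting order
ax.legend()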

### 3. Evaluate Model Performance

#### Accuracy

In [None]:
from sklearn.metrics import accuracy_score

In [None]:
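# predict_classes was removed from recent Keras versions;
# threshold the predicted probabilities at 0.5 instead
y_train_pred = (model.predict(X_train) > 0.5).astype(int).ravel()
y_test_pred = (model.predict(X_test) > 0.5).astype(int).ravel()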

In [None]:
print("The train accuracy score is {:0.3f}".format(accuracy_score(y_train, y_train_pred)))
print("The test accuracy score is {:0.3f}".format(accuracy_score(y_test, y_test_pred)))

#### Confusion Matrix & Classification Report

In [None]:
from sklearn.metrics import confusion_matrix

In [None]:
confusion_matrix(y_test, y_test_pred)

In [None]:
def pretty_confusion_matrix(y_true, y_pred, labels=["False", "True"]):
    cm = confusion_matrix(y_true, y_pred)
    pred_labels = ['Predicted '+ l for l in labels]
    df = pd.DataFrame(cm, index=labels, columns=pred_labels)
    return df

In [None]:
pretty_confusion_matrix(y_test, y_test_pred, ['Not Buy', 'Buy'])

In [None]:
from sklearn.metrics import classification_report

In [None]:
print(classification_report(y_test, y_test_pred))
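
The numbers in the confusion matrix and the classification report can be reproduced by hand for the binary case. The sketch below is a minimal illustration on made-up labels; the helper names are assumptions. It uses the same layout as `confusion_matrix`, with rows for true classes and columns for predicted classes.

```python
import numpy as np

def binary_confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for labels encoded as 0 and 1."""
    y_true = np.asarray(y_true).ravel().astype(int)
    y_pred = np.asarray(y_pred).ravel().astype(int)
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    return np.array([[tn, fp], [fn, tp]])

def precision_recall(cm):
    """Precision and recall of the positive class from a 2x2 confusion matrix."""
    tn, fp, fn, tp = cm.ravel()
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up labels for illustration (not from user_visit_duration.csv)
y_true_demo = [0, 0, 1, 1, 1, 0]
y_pred_demo = [0, 1, 1, 1, 0, 0]

cm_demo = binary_confusion_counts(y_true_demo, y_pred_demo)
precision_demo, recall_demo = precision_recall(cm_demo)
```

Precision answers "of the predicted buyers, how many actually bought?", while recall answers "of the actual buyers, how many did the model find?".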

## Exercise

You've just been hired at a real estate investment firm and they would like you to build a model for pricing houses. You are given a dataset that contains data for house prices and a few features like number of bedrooms, size in square feet and age of the house. Let's see if you can build a model that is able to predict the price. In this exercise we extend what we have learned about linear regression to a dataset with more than one feature. Here are the steps to complete it:

1. Load the dataset `../data/housing-data.csv`
2. Create 2 variables called X and y: X shall be a matrix with 3 columns (sqft, bdrms, age) and y shall be a vector with 1 column (price)
3. Create a linear regression model in Keras with the appropriate number of inputs and outputs
4. Split the data into train and test with a 20% test size, use `random_state=0` for consistency with classmates
5. Train the model on the training set and check its performance on the training and test sets
6. How's your model doing? Is the loss decreasing?
7. Try to improve your model with these experiments:
    - normalize the input features and the target:
        - divide sqft by 1000
        - divide age by 10
        - divide price by 100000
    - use a different value for the learning rate of your model
    - use a different optimizer
8. Once you're satisfied with training, check the R2 score on the test set
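
The rescaling suggested in the exercise can be sketched as follows. This is only an illustration of the scaling step, not a solution: the rows are made-up values rather than data from `housing-data.csv`, the column order (sqft, bdrms, age) is assumed from the exercise text, and `to_price` is a hypothetical helper.

```python
import numpy as np

# Made-up rows in the assumed column order (sqft, bdrms, age); not real data
X_raw = np.array([[2100.0, 3.0, 20.0],
                  [1600.0, 3.0, 15.0],
                  [2400.0, 4.0, 30.0]])
y_raw = np.array([400000.0, 330000.0, 370000.0])

# Divisors from the exercise: sqft / 1000, bdrms unchanged, age / 10
feature_scale = np.array([1000.0, 1.0, 10.0])
price_scale = 100000.0

X_scaled = X_raw / feature_scale  # broadcasting divides column by column
y_scaled = y_raw / price_scale

def to_price(y_pred_scaled):
    """Convert predictions in scaled units back to the original price units."""
    return np.asarray(y_pred_scaled, dtype=float) * price_scale
```

Bringing all columns to a similar order of magnitude usually makes gradient-based training better behaved; predictions from a model trained on `y_scaled` need to be multiplied by `price_scale` before being read as prices.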

*Copyright &copy; 2017 Francesco Mosconi & CATALIT LLC. All rights reserved.*