# Separability and Hidden Layers

The goal of this module is to demonstrate how a neural network can transform data that is not linearly separable into a representation that is.

You will:
* become familiar with the basic functioning of a simple deep network
* learn to view a network as a composition of functions
* explore the concept of "representation learning"

Much credit goes to [http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/) for inspiring these exercises. It is highly recommended reading.

In [None]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

# We'll use our handy decision boundary plot function from last time
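def plot_decision_boundary(model, X, y):
    # Build a 100x100 grid covering the range of the data
    X_max = X.max(axis=0)
    X_min = X.min(axis=0)
    xticks = np.linspace(X_min[0], X_max[0], 100)
    yticks = np.linspace(X_min[1], X_max[1], 100)
    xx, yy = np.meshgrid(xticks, yticks)
    # Predict class probabilities at every grid point
    ZZ = model.predict(np.c_[xx.ravel(), yy.ravel()])
    # Shade the region where the first class has probability >= 0.5
    Z = ZZ[:,0] >= 0.5
    Z = Z.reshape(xx.shape)
    fig, ax = plt.subplots()
    ax.contourf(xx, yy, Z, cmap=plt.cm.bwr, alpha=0.2)
    # Overlay the data, colored by the first column of the one-hot labels
    ax.scatter(X[:,0], X[:,1], c=y[:,0], alpha=0.4)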


## Learning non-separable data

Let's take a look at a more challenging dataset, one that is not linearly separable.

In [None]:
from sklearn.datasets import make_circles
from keras.utils import np_utils
from sklearn.model_selection import train_test_split

X, y = make_circles(n_samples=1000,
                    noise=0.02,
                    factor=0.3)
y = np_utils.to_categorical(y)

X_train, X_test, y_train, y_test = train_test_split(X, y)

plt.scatter(X[:,0], X[:,1], c=y[:,0], alpha=0.4)

Let's use the model we built last time on this data.

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD

model0 = Sequential()
model0.add(Dense(2, input_dim=2, activation='softmax'))

model0.compile(loss='categorical_crossentropy',
               optimizer=SGD(lr=0.04),
               metrics=['accuracy'])

# `epochs` is the Keras 2+ name for the older `nb_epoch` argument
model0.fit(X_train, y_train, epochs=20, batch_size=16)
result = model0.evaluate(X_test, y_test)
print('Test set loss: {}'.format(result[0]))
print('Test set accuracy: {}'.format(result[1]))

plot_decision_boundary(model0, X, y)

The model doesn't do nearly as well as it did before, which is no surprise. It isn't **expressive** enough to represent a more complicated decision boundary.
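To give a feel for what a better representation could buy us, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not what the network actually learns. The helper `two_circles` is a hypothetical stand-in for `make_circles`, and the squared-radius feature is picked by hand. In that one-dimensional representation a single threshold is enough to split the classes.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, only for repeatability

def two_circles(n_per_class=500, factor=0.3, noise=0.02):
    """Hypothetical stand-in for make_circles: outer ring (label 0), inner ring (label 1)."""
    angles = rng.uniform(0.0, 2.0 * np.pi, size=2 * n_per_class)
    radii = np.r_[np.ones(n_per_class), factor * np.ones(n_per_class)]
    points = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
    points += rng.normal(scale=noise, size=points.shape)
    labels = np.r_[np.zeros(n_per_class), np.ones(n_per_class)].astype(int)
    return points, labels

X_demo, y_demo = two_circles()

# Hand-crafted representation (an assumption for illustration):
# squared distance of each point from the origin.
r2 = (X_demo ** 2).sum(axis=1)

inner_max = r2[y_demo == 1].max()
outer_min = r2[y_demo == 0].min()

# Any threshold between the two rings separates the classes with one cut.
threshold = (inner_max + outer_min) / 2.0
pred = (r2 < threshold).astype(int)  # 1 = inner ring
accuracy = (pred == y_demo).mean()

print('largest inner r^2:  {:.3f}'.format(inner_max))
print('smallest outer r^2: {:.3f}'.format(outer_min))
print('accuracy with one threshold: {}'.format(accuracy))
```

No straight line in the original two coordinates can do this, which is the gap a more expressive model has to close on its own.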

- - - 
### Exercise 1 - the model's strategy

Though our model doesn't have agency or a mind of its own, it is a nice shorthand to talk about what the model is "trying" to do in a given situation. Based on the decision boundary plot, can you explain what the model's strategy is and why it gets the accuracy that it does? Try training the model a few times to see what other solutions it finds. Is there a solution that gives the best accuracy?
- - -


- - -
### Exercise 2 - expressiveness and structure

Recalling the structure of our model that you already sketched out, can you explain why this model can only make a linear decision boundary?
- - -


## Adding a hidden layer

We can make our model more expressive by adding a hidden layer. As we will see, this expands the expressiveness and representational capacity of the model, giving it enough power to separate the data.
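One way to picture the hidden layer is as one more function in a chain. The sketch below is a rough NumPy illustration, not Keras internals. The helpers `dense`, `relu`, `softmax` and `compose` are defined here for the example, and the weights are random placeholders rather than trained values. It builds a 2 → 3 → 2 network as a composition of four functions.

```python
import numpy as np

def dense(W, b):
    """Return the affine map x -> x W + b for a batch of row vectors."""
    return lambda x: x.dot(W) + b

def relu(x):
    """Elementwise max(x, 0)."""
    return np.maximum(x, 0.0)

def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def compose(*fs):
    """compose(f, g, h)(x) == h(g(f(x))): apply the functions left to right."""
    def composed(x):
        for f in fs:
            x = f(x)
        return x
    return composed

rng = np.random.RandomState(0)

# Random placeholder parameters; training would adjust these.
hidden = dense(rng.randn(2, 3), rng.randn(3))  # 2 inputs -> 3 hidden units
output = dense(rng.randn(3, 2), rng.randn(2))  # 3 hidden units -> 2 classes

network = compose(hidden, relu, output, softmax)

batch = rng.randn(5, 2)  # five 2-dimensional points
probs = network(batch)
print(probs.shape)
print(probs.sum(axis=1))  # each row is a probability distribution
```

Dropping `relu` from the chain would leave two affine maps back to back, and those compose into a single affine map, so the nonlinearity is what lets the extra layer add anything.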

- - -
### Exercise 3 - add a hidden layer

Below is the model specification we used already. Modify it in place to:

1. make the first layer 3-dimensional rather than 2-dimensional
2. add a "relu" `Activation` layer and a `Dense` layer that feeds into the last softmax layer.

Hint: the `Dense` layer will not need to specify an `input_dim` since Keras will know the input dimension from the previous layer already.
- - -

<a id="compile"></a>

In [None]:
# The model from before, edited to add a 3-dimensional hidden layer
model1 = Sequential()
model1.add(Dense(3, input_dim=2))  # hidden layer: 2 inputs -> 3 units
model1.add(Activation('relu'))     # nonlinearity applied to the hidden layer
model1.add(Dense(2))               # output layer: 2 units, input size inferred by Keras
model1.add(Activation('softmax'))

model1.compile(loss='categorical_crossentropy',
               optimizer=SGD(lr=0.04),
               metrics=['accuracy'])
model1.summary()

- - -
### Exercise 4 - a more expressive structure

From the summary, check that your new model 1) has the same input and output dimension as before and 2) has 17 parameters.

Draw this new model. Is it clear where all of the parameters are?
- - -
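If the total of 17 is not obvious from your drawing, the small sketch below may help as a cross-check. It assumes each `Dense` layer holds one weight per input-unit pair plus one bias per unit. `dense_param_count` is a helper written for this example, not a Keras function.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Layer widths of the new model: 2 inputs -> 3 hidden units -> 2 outputs
layer_sizes = [2, 3, 2]

per_layer = [dense_param_count(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [9, 8]
print(total)      # 17
```

The activation layers contribute no parameters, so the two `Dense` layers account for the whole count.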


In [None]:
# `epochs` is the Keras 2+ name for the older `nb_epoch` argument
model1.fit(X_train, y_train, epochs=20, batch_size=16)
plot_decision_boundary(model1, X, y)

result = model1.evaluate(X_test, y_test)
print('Test set loss: {}'.format(result[0]))
print('Test set accuracy: {}'.format(result[1]))