# Keras versus Poisonous Mushrooms

This example demonstrates building a simple dense neural network using Keras.  The example uses [Agaricus Lepiota](https://archive.ics.uci.edu/ml/datasets/Mushroom) training data to detect poisonous mushrooms.

In [1]:
from pandas import read_csv
srooms_df = read_csv('../data/agaricus-lepiota.data.csv')
srooms_df.head()

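| Unnamed: 0 | edibility | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | ... | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | p | x | s | n | t | p | f | c | n | k | ... | s | w | w | p | w | o | p | k | s | u |
| 1 | e | x | s | y | t | a | f | c | b | k | ... | s | w | w | p | w | o | p | n | n | g |
| 2 | e | b | s | w | t | l | f | c | b | n | ... | s | w | w | p | w | o | p | n | n | m |
| 3 | p | x | y | w | t | p | f | c | n | n | ... | s | w | w | p | w | o | p | k | s | u |
| 4 | e | x | s | g | f | n | f | w | b | k | ... | s | w | w | p | w | o | e | n | a | g |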


## Feature extraction

Each textual column must be mapped to numbers before training. The ```LabelEncoder``` converts two-valued (true/false) data to 0 and 1. The ```LabelBinarizer``` converts categorical data to a **one-hot encoding**.

> Note: Bruises has missing data. 
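
To make the difference between the two encodings concrete, here is a minimal pure-Python sketch. The helpers `label_encode` and `one_hot_encode` are hypothetical stand-ins written for this illustration, not scikit-learn functions; they only mimic the sorted class ordering that the scikit-learn encoders use. The input values are the `edibility` and `odor` entries of the five rows shown above.

```python
def label_encode(values):
    """Map each distinct label to an integer, in sorted label order."""
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    return [index[v] for v in values], classes


def one_hot_encode(values):
    """Map each label to a 0/1 vector with a single 1 at its class index."""
    codes, classes = label_encode(values)
    vectors = [[1 if i == code else 0 for i in range(len(classes))]
               for code in codes]
    return vectors, classes


# Values taken from the first five rows of the data set.
edibility = ['p', 'e', 'e', 'p', 'e']
odor = ['p', 'a', 'l', 'p', 'n']

edibility_codes, edibility_classes = label_encode(edibility)
odor_vectors, odor_classes = one_hot_encode(odor)

print(edibility_classes, edibility_codes)
print(odor_classes)
for vector in odor_vectors:
    print(vector)
```

Only four odor values occur in these five rows, so the sketch produces four columns; fitted on the full data set, the odor column expands to nine.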

If we wanted to use all the features in the training set then we would need to map each out:

```
from sklearn.preprocessing import LabelBinarizer, LabelEncoder

column_names = srooms_df.columns

def get_mapping(name):
    # Two-valued columns get integer labels; all other columns are one-hot encoded.
    if name in ('edibility', 'gill-attachment'):
        return (name, LabelEncoder())
    return (name, LabelBinarizer())

# One (column name, encoder) pair per column.
mappings = [get_mapping(name) for name in column_names]
```

We will use a subset of features to make it interesting. Are there simple rules or a handful of features that can be used to test edibility? Let's try a few.

In [2]:
from sklearn_pandas import DataFrameMapper
import sklearn.preprocessing
import numpy as np

mappings = ([
    ('edibility', sklearn.preprocessing.LabelEncoder()),
    ('odor', sklearn.preprocessing.LabelBinarizer()),
    ('habitat', sklearn.preprocessing.LabelBinarizer()),
    ('spore-print-color', sklearn.preprocessing.LabelBinarizer())
])

In [3]:
mapper = DataFrameMapper(mappings)
srooms_np = mapper.fit_transform(srooms_df.copy())

The mapper transforms the textual data into a numeric matrix.

The transformed data should have 26 columns: one label column followed by 25 one-hot features. The breakdown is as follows:
* Edibility (0 = edible, 1 = poisonous)
* odor (9 features): 
    ```[almond=a, creosote=c, foul=f, anise=l, musty=m, none=n, pungent=p, spicy=s, fishy=y]```
* habitat (7 features):
    ```[woods=d, grasses=g, leaves=l, meadows=m, paths=p, urban=u, waste=w]```
* spore-print-color (9 features):
    ```[buff=b, chocolate=h, black=k, brown=n, orange=o, green=r, purple=u, white=w, yellow=y]```
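
The column layout above can be checked by decoding an encoded row back into readable names. The sketch below is illustrative only: `describe_row` is a hypothetical helper, and the category lists are copied from the breakdown above, assuming the encoder orders classes alphabetically by their single-letter code. The sample row is the encoded form of the first mushroom in the data set.

```python
# Category order follows the breakdown above (alphabetical by letter code).
ODOR = ['almond', 'creosote', 'foul', 'anise', 'musty',
        'none', 'pungent', 'spicy', 'fishy']
HABITAT = ['woods', 'grasses', 'leaves', 'meadows', 'paths', 'urban', 'waste']
SPORE_PRINT_COLOR = ['buff', 'chocolate', 'black', 'brown', 'orange',
                     'green', 'purple', 'white', 'yellow']


def describe_row(row):
    """Translate one 26-element encoded row back into readable values."""
    def pick(names, bits):
        # A one-hot group has exactly one position set to 1.
        return names[list(bits).index(1)]

    return {
        'edibility': 'poisonous' if row[0] == 1 else 'edible',
        'odor': pick(ODOR, row[1:10]),
        'habitat': pick(HABITAT, row[10:17]),
        'spore-print-color': pick(SPORE_PRINT_COLOR, row[17:26]),
    }


# Encoded first row: label, 9 odor bits, 7 habitat bits, 9 spore-print bits.
first_sample = [1,
                0, 0, 0, 0, 0, 0, 1, 0, 0,
                0, 0, 0, 0, 0, 1, 0,
                0, 0, 1, 0, 0, 0, 0, 0, 0]
decoded = describe_row(first_sample)
print(decoded)
```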

In [4]:
print(srooms_np.shape)
print("First sample: {}".format(srooms_np[0]))
print("  edibility (poisonous): {}".format(srooms_np[0][0]))
print("  odor (pungent): {}".format(srooms_np[0][1:10]))
print("  habitat (urban): {}".format(srooms_np[0][10:17]))
print("  spore-print-color (black): {}".format(srooms_np[0][17:]))

(8124, 26)
First sample: [1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0]
  edibility (poisonous): 1
  odor (pungent): [0 0 0 0 0 0 1 0 0]
  habitat (urban): [0 0 0 0 0 1 0]
  spore-print-color (black): [0 0 1 0 0 0 0 0 0]


Before we train the neural network, let's split the data into training and test datasets.

In [5]:
from sklearn.model_selection import train_test_split
train, test = train_test_split(srooms_np, test_size = 0.2, random_state=7)
train_labels = train[:,0:1]
train_data = train[:,1:]
test_labels = test[:,0:1]
test_data = test[:,1:]
print('training data dims: {}, label dims: {}'.format(train_data.shape,train_labels.shape))
print('test data dims: {}, label dims: {}'.format(test_data.shape,test_labels.shape))

training data dims: (6499, 25), label dims: (6499, 1)
test data dims: (1625, 25), label dims: (1625, 1)
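
The split sizes follow directly from `test_size=0.2`. The arithmetic below is a sketch that assumes scikit-learn rounds the test count up and gives the remainder to the training set, which is consistent with the shapes printed above.

```python
import math

n_samples = 8124
test_size = 0.2

# 0.2 * 8124 = 1624.8, which rounds up to a whole number of test samples.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```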


# Model Definition

We will create a simple three-layer neural network: two dense layers with a dropout layer between them to reduce overfitting.

## Layer 1: Dense Layer

A dense layer applies an activation function to the output of $W \cdot x + b$. If the dense layer had only three inputs and three outputs, a diagram of the layer would look roughly like this.

![Dense Layer](images/DenseLayer.png)

Under the covers, Keras represents the layer's weights as a matrix. The inputs, outputs, and biases are vectors...

$$ 
\begin{bmatrix} 
y_1 \\
y_2 \\
y_3
\end{bmatrix}
=
relu
\begin{pmatrix}
\begin{bmatrix} 
W_{1,1} & W_{1,2} &  W_{1,3} \\
W_{2,1} & W_{2,2} &  W_{2,3} \\
W_{3,1} & W_{3,2} &  W_{3,3}
\end{bmatrix}
\cdot
\begin{bmatrix} 
x_1 \\
x_2 \\
x_3
\end{bmatrix}
+
\begin{bmatrix} 
b_1 \\
b_2 \\
b_3
\end{bmatrix}
\end{pmatrix}
$$

If this operation were decomposed further, it would look like this...

$$ 
\begin{bmatrix} 
y_1 \\
y_2 \\
y_3
\end{bmatrix}
=
\begin{bmatrix}
relu(W_{1,1} x_1 + W_{1,2} x_2 +  W_{1,3} x_3 + b_1) \\
relu(W_{2,1} x_1 + W_{2,2} x_2 +  W_{2,3} x_3 + b_2) \\
relu(W_{3,1} x_1 + W_{3,2} x_2 +  W_{3,3} x_3 + b_3)
\end{bmatrix}
$$
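
The same computation can be written out in a few lines of plain Python. This is a minimal sketch of the formula above, not how Keras computes it internally; `dense_forward` is a hypothetical helper and the weights, inputs, and biases are made-up numbers.

```python
def relu(z):
    """Rectified linear unit: negative values become zero."""
    return max(0.0, z)


def dense_forward(W, x, b, activation):
    """Compute activation(W . x + b), one output unit per row of W."""
    outputs = []
    for row, bias in zip(W, b):
        z = sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + bias
        outputs.append(activation(z))
    return outputs


# Made-up 3x3 weights, inputs, and biases, for illustration only.
W = [[1.0, 0.0, -1.0],
     [0.5, 0.5, 0.5],
     [-1.0, -1.0, -1.0]]
x = [1.0, 2.0, 3.0]
b = [0.0, 1.0, 2.0]

y = dense_forward(W, x, b, relu)
print(y)  # [0.0, 4.0, 0.0]
```

The first and third units have negative pre-activations (-2 and -4), so relu clamps them to zero; only the second unit passes its value through.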

## Layer 2: Dropout

The dropout layer reduces overfitting by randomly zeroing a fraction of the inputs to the next layer during training. At prediction time the layer passes its inputs through unchanged.
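
A minimal pure-Python sketch of the idea follows. It assumes the "inverted dropout" scheme, in which the surviving inputs are scaled by 1 / (1 - rate) during training so that the expected sum stays the same. The `dropout` helper and its `rng` parameter are hypothetical names for illustration, not the Keras implementation.

```python
import random


def dropout(inputs, rate, training, rng):
    """Zero each input with probability `rate` while training.

    Survivors are scaled by 1 / (1 - rate); outside training the
    inputs are returned unchanged.
    """
    if not training:
        return list(inputs)
    keep = 1.0 - rate
    return [x / keep if rng.random() >= rate else 0.0 for x in inputs]


rng = random.Random(7)  # fixed seed so repeated runs drop the same units
activations = [1.0, 2.0, 3.0, 4.0]

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
infer_out = dropout(activations, rate=0.5, training=False, rng=rng)

print(train_out)  # each entry is either 0.0 or twice the original value
print(infer_out)  # identical to the input
```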

## Layer 3: Dense Layer

This layer acts like the first one, except that it applies a sigmoid activation function. The output is the probability that a mushroom is poisonous.

$$y = \mathrm{sigmoid}(W \cdot x + b)$$
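
As a small worked example, the sigmoid squashes any real number into the interval (0, 1), which is why the output can be read as a probability. The weights and hidden activations below are made up, `output_layer` is a hypothetical helper, and the 0.5 decision threshold is an assumption for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def output_layer(weights, x, bias):
    """Single-unit dense layer: sigmoid(w . x + b)."""
    z = sum(w * x_i for w, x_i in zip(weights, x)) + bias
    return sigmoid(z)


# Made-up weights and hidden activations, for illustration only.
weights = [2.0, -1.0, 0.5]
hidden = [1.0, 0.0, 2.0]

p_poisonous = output_layer(weights, hidden, bias=-1.0)  # z = 2.0
print(round(p_poisonous, 3))  # 0.881
print('poisonous' if p_poisonous >= 0.5 else 'edible')
```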

## Putting It Together

Fortunately, we don't need to manage the weights and biases ourselves; Keras does that for us. We just define the layers in a sequence...

In [6]:
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(20, activation='relu', input_dim=25))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

Using TensorFlow backend.
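
Although Keras manages the weights, it is easy to count how many this model has. Each dense layer holds one weight per input-output pair plus one bias per unit, and the dropout layer has no trainable parameters. The helper `dense_param_count` below is a hypothetical name used only for this arithmetic; the totals should match what `model.summary()` reports.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


hidden_params = dense_param_count(25, 20)  # Dense(20) on 25 input features
output_params = dense_param_count(20, 1)   # Dense(1) on 20 hidden units
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 520 21 541
```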


## Keras Callbacks

Keras provides callbacks as a means to instrument internal state. In this example, we will write a TensorFlow event log. The event log enables a TensorBoard visualization of the translated model and also captures key metrics during training.

> Note: This step is completely optional and depends on the backend engine.  

In [7]:
from keras.callbacks import TensorBoard
tensor_board = TensorBoard(log_dir='./logs/keras_srooms', histogram_freq=1)

In [8]:
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(train_data, train_labels, epochs=10, batch_size=32, callbacks=[tensor_board])

INFO:tensorflow:Summary name dense_1/kernel:0 is illegal; using dense_1/kernel_0 instead.
INFO:tensorflow:Summary name dense_1/bias:0 is illegal; using dense_1/bias_0 instead.
INFO:tensorflow:Summary name dense_2/kernel:0 is illegal; using dense_2/kernel_0 instead.
INFO:tensorflow:Summary name dense_2/bias:0 is illegal; using dense_2/bias_0 instead.
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x119e17278>

In [9]:
score = model.evaluate(test_data, test_labels, batch_size=1625)
print(score)

[0.010436742566525936, 0.99507689476013184]
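
With the compile settings used above, the two numbers returned by `evaluate` are the loss (binary cross-entropy) followed by the accuracy. The sketch below shows how both quantities are defined, using toy labels and probabilities rather than the model's real predictions. The helper names and the 0.5 decision threshold are assumptions for illustration. The last lines convert the reported accuracy into a count over the 1625 test samples.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)); eps guards against log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions that fall on the correct side of the threshold."""
    hits = sum(1 for y, p in zip(y_true, y_prob)
               if (p >= threshold) == (y == 1))
    return hits / len(y_true)


# Toy labels and predicted probabilities, not taken from the mushroom model.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.1, 0.8, 0.4]
loss = binary_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(loss, acc)

# Reading the reported test accuracy as a count of correct predictions.
n_test = 1625
n_correct = round(0.99507689476013184 * n_test)
print(n_correct, n_test - n_correct)  # 1617 8
```

In other words, the reported accuracy corresponds to 1617 of the 1625 test mushrooms being classified correctly and 8 being misclassified.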
