# Keras Overview
High-level neural network Python package using TensorFlow as a backend
* CPU or GPU versions
* CNTK and Theano can be used as the backend instead

Quickly build Neural Networks in a declarative manner
* i.e. tell Keras **what** kind of NN model/layers you want
* Supports Convolutional (CNN) and Recurrent (RNN) neural networks (or a combo)
* Using TensorFlow itself would give more control of **how** to build a fine-tuned NN

## Models
Two types of implementations for Models:
* The **Sequential** model is the standard way of building a NN by linearly stacking layers
* The **Model Class** can be used to create a more complex NN using the Keras functional API
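
To make the difference concrete, here is a minimal sketch that builds the same small network both ways. The layer sizes (100 inputs, 64 hidden nodes, 10 outputs) and the variable names are arbitrary choices for illustration, and the sketch assumes a Keras installation that accepts the `input_shape` argument on the first layer.

```python
from keras.models import Sequential, Model
from keras.layers import Input, Dense

# Sequential: layers are stacked one after another with add()
sequential_model = Sequential()
sequential_model.add(Dense(64, activation='relu', input_shape=(100,)))
sequential_model.add(Dense(10, activation='softmax'))

# Functional API: each layer is called on the output of the previous one,
# then Model ties the input tensor to the output tensor
inputs = Input(shape=(100,))
hidden = Dense(64, activation='relu')(inputs)
outputs = Dense(10, activation='softmax')(hidden)
functional_model = Model(inputs=inputs, outputs=outputs)

# Both models have the same layers, so the trainable parameter counts match
print(sequential_model.count_params(), functional_model.count_params())
```

For a plain stack of layers the two are interchangeable; the functional style becomes useful once a model needs branches, multiple inputs, or multiple outputs.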

## Layers
NNs have an input layer, one or more hidden layers, and an output layer. Each node computes a weighted sum of its inputs and then passes that sum through the layer's activation function to produce its output. Learning takes place by adjusting the weights, which control how much each input contributes to the output.
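
As a rough illustration of what a single node does, the sketch below computes a weighted sum of its inputs plus a bias and passes the result through an activation function. The helper name `node_output`, the bias term, and the sample numbers are made up for this example; they are not part of Keras.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def node_output(inputs, weights, bias, activation=sigmoid):
    """Hypothetical helper: one node = activation(weighted sum + bias)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Illustrative values only
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.1

out = node_output(inputs, weights, bias)
print(out)
```

Training would change `weights` and `bias`; the activation function itself stays fixed.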

## Activation Functions
### Overview
The Activation Function (AF) is a mathematical function applied to the weighted sum of a node's inputs, which results in a final output "score". Left unbounded, a node could output an enormous value and completely outweigh every other node with smaller values, which is bad. A more practical approach is to apply the same AF to each node in a layer and limit the output to a range such as -1 to 1. Now the nodes are all on the same playing field in terms of contribution to the final output layer. The actual function can be just about anything, from simple to complex:
* **Step Function**: simple and blunt, yes or no; the output is 0 or 1. This is easy to understand, but not the best for a NN. This AF says either "yes, my input contributes info to the output" or "no, it does not." There is no maybe, a little bit, or sort of. This is not the best way to gather info from multiple sources, because each one likely contributes something to the final result and we want to combine that info as we move through a deep neural network with many layers.
* **Linear Function**: a straight line; the output is proportional to the input. This is better, but still bad for three reasons. (1) The output is not bounded and could "blow up" to a huge positive or negative number, which isn't useful. (2) If every layer has a linear AF, then the whole system is linear and there's no point in having multiple layers; once combined, it's really just one linear function. (3) When we get to back propagation for gradient descent and minimizing the error, the gradient (derivative of the line) is constant and doesn't depend on the input.
* **Sigmoid Function**: 'S' shaped. This is better: it's not all or nothing like Step, and it is non-linear, so multiple layers are more meaningful. Near the middle of the curve, small changes in the input (x) have a large effect on the output (y), and that's good! It causes the node to eventually figure out how to classify the output by drifting towards y=0 or y=1. However, towards the ends of the function (very negative or very positive input values) the output changes very little and is nearly flatlined. This means the node is set in its ways and nothing you say can change its mind! That may be good or bad, but this AF is widely used.
* **Tanh Function**: this is a scaled and shifted version of the Sigmoid function, so it has similar characteristics. The differences are that the output ranges from -1 to 1 instead of 0 to 1, and the gradients are even steeper.
* **ReLU Function**: this name is used all the time, so what is it? The Rectified Linear Unit is linear if x is greater than 0; otherwise it outputs zero. Overall, it *IS* non-linear, and it can speed up deep NN processing by making it less computationally expensive. Meaning, if it outputs a zero, then there is less number crunching at the next layer because its input is zero. However, with a flat-line zero gradient for x < 0, the node will stop responding to variations in input. This causes a dying ReLU neuron, which makes this portion of the NN go passive. To remedy this, the flat line for x < 0 can be made into a small slope so the node will gradually recover during training (Leaky ReLU) instead of causing that whole portion of the NN to become unresponsive to change.
![title](AF.png)
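
The functions described above can be sketched in a few lines of plain Python. This is only an illustration of their shapes, not how Keras implements them; the step threshold at 0 and the Leaky ReLU slope `alpha=0.01` are assumptions chosen for the example.

```python
import math

def step(x):
    """All or nothing: 1 for positive input, otherwise 0 (threshold at 0 is assumed)."""
    return 1.0 if x > 0 else 0.0

def linear(x, slope=1.0):
    """Output is proportional to input."""
    return slope * x

def sigmoid(x):
    """'S' shaped curve with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """'S' shaped curve with outputs between -1 and 1."""
    return math.tanh(x)

def relu(x):
    """Linear for x > 0, zero otherwise."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but keeps a small slope (alpha, assumed here) for x <= 0."""
    return x if x > 0 else alpha * x

for x in (-5.0, -0.5, 0.0, 0.5, 5.0):
    print(f"x={x:+.1f}  step={step(x):.0f}  sigmoid={sigmoid(x):.3f}  "
          f"tanh={tanh(x):+.3f}  relu={relu(x):.1f}  leaky={leaky_relu(x):+.3f}")

# tanh is a shifted and rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
assert abs(tanh(0.7) - (2 * sigmoid(2 * 0.7) - 1)) < 1e-12

# Two stacked linear "layers" collapse into a single linear function
two_layers = linear(linear(3.0, slope=2.0), slope=-0.5)
one_layer = linear(3.0, slope=2.0 * -0.5)
print(two_layers, one_layer)
```

The last two lines show why purely linear layers add nothing: applying slope 2.0 and then slope -0.5 is the same as applying slope -1.0 once.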
### Keras Activation Function usage
Add an AF to each layer in one of two ways:
 1. `model.add(Dense(64, activation='tanh'))`
 2. `model.add(Dense(64)); model.add(Activation('tanh'))`

  [Available activation functions](https://keras.io/activations/#available-activations):  
 1. **elu**
 2. **selu**
 3. **softplus**
 4. **softsign**
 5. **relu**
 6. **tanh**
 7. **sigmoid**
 8. **hard_sigmoid**
 9. **linear**
 10. **softmax**

In [93]:
from keras.models import Sequential
model = Sequential()

Let's create a Sequential model made up of 2 layers
* The 1st layer must have a defined input shape: `input_shape=(X,)` or `input_dim=X`
* The 2nd layer (output layer) does not need one
 * following layers can infer what shape they should be

In [94]:
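from keras.layers import Dense, Activation
# Hidden layer: 64 nodes fed by 100 input features, followed by a ReLU activation
model.add(Dense(units=64, input_dim=100))
model.add(Activation('relu'))
# Output layer: 10 nodes (one per class); softmax turns the scores into probabilities
model.add(Dense(units=10))
model.add(Activation('softmax'))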

In [95]:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

In [96]:
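# Generate dummy data: 1000 samples with 100 features each
import numpy as np
data = np.random.random((1000, 100))
# Integer class labels in the range 0-9, matching the 10 output nodes
labels = np.random.randint(10, size=(1000, 1))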

In [97]:
from keras.utils import to_categorical
one_hot_labels = to_categorical(labels, num_classes=10) # Convert labels to categorical one-hot encoding
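
To show what one-hot encoding produces, here is a dependency-free sketch. The function `one_hot` is a hypothetical stand-in written for this example, not the Keras implementation of `to_categorical`, and it only handles a flat list of integer labels.

```python
def one_hot(labels, num_classes):
    """Hypothetical stand-in: turn integer labels into one-hot rows."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes   # start with all zeros
        row[label] = 1.0            # mark the position of the true class
        rows.append(row)
    return rows

encoded = one_hot([0, 3, 9], num_classes=10)
for row in encoded:
    print(row)
```

Each row has exactly one 1.0, at the index of its class, which is the target format that `categorical_crossentropy` expects for a 10-node softmax output.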

In [98]:
model.fit(data, one_hot_labels, epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x237e54a3518>