## Anatomy of a Neural Network

Training a neural network revolves around the following objects:
* layers
* input data and targets
* loss function
* optimizer

Some layers are stateless, but more frequently layers have a state: the layer's weights. Different layers are appropriate for different tensor formats and different types of data processing.

* 2D tensor (samples, features), vector data: often processed by densely connected layers (also called fully connected or dense layers), the `Dense` class in Keras
* 3D tensor (samples, timesteps, features), sequence data: typically processed by recurrent layers such as an `LSTM` layer
* 4D tensor, image data: usually processed by 2D convolution layers (`Conv2D`)
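
To make these tensor formats concrete, the sketch below builds one dummy NumPy array of each rank. The sizes (128 samples, 20 timesteps, 28x28 images) are made up for illustration, and the channels-last image layout is an assumption rather than something the notes above specify.

```python
import numpy as np

# 2D tensor: (samples, features), e.g. 128 flattened 28x28 images
vector_data = np.zeros((128, 784))

# 3D tensor: (samples, timesteps, features), e.g. 128 sequences of 20 steps
sequence_data = np.zeros((128, 20, 16))

# 4D tensor: (samples, height, width, channels), assuming channels-last layout
image_data = np.zeros((128, 28, 28, 1))

for name, tensor in [("vector", vector_data),
                     ("sequence", sequence_data),
                     ("image", image_data)]:
    print(name, "rank:", tensor.ndim, "shape:", tensor.shape)
```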

Building deep-learning models in Keras is done by clipping together compatible layers to form useful data-transformation pipelines.

* layer compatibility: every layer will only accept input tensors of a certain shape and will return output tensors of a certain shape.

In [1]:
from keras import layers

Using TensorFlow backend.


In [2]:
layer = layers.Dense(32, input_shape=(784, ))

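This creates a layer that accepts only 2D tensors whose feature axis (the first dimension after the batch axis) has size 784, and returns a tensor in which that dimension has been transformed to 32. The batch axis is left unspecified, so any batch size is accepted.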
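
As a rough illustration of this shape contract, a dense layer without an activation computes `dot(input, W) + b`. The NumPy sketch below mimics that computation with randomly initialized weights. The names `W`, `b`, and `dense_forward` are hypothetical, and this is not how Keras implements the layer internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights of a hypothetical dense layer mapping 784 features to 32 units
W = rng.normal(scale=0.01, size=(784, 32))
b = np.zeros(32)

def dense_forward(x):
    """Affine transform with no activation, analogous to Dense(32)."""
    return x @ W + b

# Any batch size works; only the feature axis is fixed at 784
batch = rng.normal(size=(64, 784))
out = dense_forward(batch)
print(out.shape)  # (64, 32)
```

Passing an array whose second axis is not 784 makes the matrix product fail, which is the NumPy analogue of a layer rejecting an incompatible input.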

Layers are dynamically built to match the shape of the incoming layer.

In [3]:
from keras import models
from keras import layers

In [4]:
model = models.Sequential()
model.add(layers.Dense(32, input_shape = (784, )))
model.add(layers.Dense(32))

The second layer did not receive an input shape argument; instead, it automatically inferred its input shape as being the output shape of the layer that came before.
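
One way to picture this shape inference is a layer that postpones creating its weights until it sees its first input. The `TinyDense` class below is a hypothetical toy written in NumPy to show the idea; it is a sketch under that assumption, not the Keras implementation.

```python
import numpy as np

class TinyDense:
    """Hypothetical toy layer that creates its weights on first call."""

    def __init__(self, units):
        self.units = units
        self.W = None  # unknown until the input's feature size is seen
        self.b = None

    def __call__(self, x):
        if self.W is None:
            in_features = x.shape[-1]  # inferred from the incoming tensor
            self.W = np.zeros((in_features, self.units))
            self.b = np.zeros(self.units)
        return x @ self.W + self.b

first = TinyDense(32)
second = TinyDense(32)  # no input shape given

x = np.ones((4, 784))
y = second(first(x))
print(first.W.shape, second.W.shape)  # (784, 32) (32, 32)
```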

A deep-learning model is a directed, acyclic graph of layers. A variety of network topologies exist, for example:
* two-branch networks
* multihead networks
* inception blocks

The topology of a network defines a hypothesis space. By choosing a network topology, you constrain your space of possibilities (the hypothesis space) to a specific series of tensor operations. Training then searches for a good set of values for the weight tensors involved in these tensor operations.

Once the network architecture is defined, you still have to choose a loss function and an optimizer.

A neural network that has multiple outputs may have multiple loss functions, one per output. However, the gradient-descent process must be based on a single scalar loss value, so for multiloss networks all losses are combined into a single scalar quantity.
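
A common way to combine several losses is a weighted sum. The sketch below uses made-up loss values and weights for a hypothetical two-output network; the names `age_mse` and `gender_crossentropy` are illustrative only.

```python
# Hypothetical per-output loss values from a two-output network
loss_values = {"age_mse": 4.0, "gender_crossentropy": 0.5}

# Hypothetical weights used to balance the contribution of each output
loss_weights = {"age_mse": 0.25, "gender_crossentropy": 1.0}

# Single scalar that gradient descent would minimize
total_loss = sum(loss_weights[name] * value
                 for name, value in loss_values.items())
print(total_loss)  # 1.5
```

Keras exposes a similar idea through the `loss_weights` argument of `compile()`.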

Guidelines for choosing a loss:
* binary classification: binary crossentropy
* multiclass classification: categorical crossentropy
* regression: mean squared error
* sequence learning: connectionist temporal classification (CTC)
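
For reference, the first three losses in the list can be written in a few lines of NumPy. This is a minimal sketch: the `eps` clipping constant is an assumption added to avoid `log(0)`, the function names are illustrative, and CTC is left out because it is considerably more involved.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary crossentropy; y_pred holds probabilities in (0, 1)."""
    p = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def categorical_crossentropy(y_true_onehot, y_pred, eps=1e-7):
    """Mean categorical crossentropy; rows of y_pred sum to 1."""
    p = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true_onehot * np.log(p), axis=1)))

def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences."""
    return float(np.mean((y_true - y_pred) ** 2))

# A maximally uncertain binary prediction costs ln(2), about 0.693
print(binary_crossentropy(np.array([1.0, 0.0]), np.array([0.5, 0.5])))
print(mean_squared_error(np.array([1.0, 2.0]), np.array([1.0, 4.0])))  # 2.0
```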

Only when working on truly new research problems will you have to develop your own objective function.

### Intro to Keras

<img src = 'keras.png'>

Typical Keras workflow:

1. Define input tensors and target tensors
2. Define a network of layers
3. Configure the learning process: loss function, optimizer, and metrics to monitor
4. Iterate on the training data by calling `fit()`
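
The same four steps can be traced in plain NumPy on a toy problem. The sketch below is not Keras code: it fits a single linear unit to synthetic data (`y = 3x + 1` plus noise, an invented example) with full-batch gradient descent, only to show where each step of the workflow sits.

```python
import numpy as np

rng = np.random.default_rng(42)

# 1. Input and target tensors (synthetic data: y = 3x + 1 plus small noise)
inputs = rng.uniform(-1, 1, size=(256, 1))
targets = 3.0 * inputs + 1.0 + rng.normal(scale=0.01, size=(256, 1))

# 2. The "network": a single linear unit with weight w and bias b
w = np.zeros((1, 1))
b = np.zeros(1)

# 3. Learning configuration: MSE loss and plain gradient descent
learning_rate = 0.1

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

# 4. Iterate on the training data
initial_loss = mse(inputs @ w + b, targets)
for epoch in range(200):
    error = (inputs @ w + b) - targets
    grad_w = 2 * inputs.T @ error / len(inputs)  # d(MSE)/dw
    grad_b = 2 * error.mean(axis=0)              # d(MSE)/db
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
final_loss = mse(inputs @ w + b, targets)

print("loss before:", initial_loss, "loss after:", final_loss)
```

In Keras, step 3 corresponds to `compile()` and step 4 to `fit()`.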

Two ways to define a model:
* the `Sequential` class
    - only for linear stacks of layers
    - by far the most common network architecture
* the functional API
    - for directed acyclic graphs of layers
    - lets you build completely arbitrary architectures

A two-layer model defined using the `Sequential` class:

In [6]:
from keras import models
from keras import layers

In [7]:
model = models.Sequential()
model.add(layers.Dense(32, activation='relu', input_shape = (784, )))
model.add(layers.Dense(10, activation='softmax'))

Same model defined using the functional API:

In [9]:
input_tensor = layers.Input(shape = (784, ))
x = layers.Dense(32, activation='relu')(input_tensor)
output_tensor = layers.Dense(10, activation='softmax')(x)
model = models.Model(inputs = input_tensor, outputs = output_tensor)

With the functional API, you manipulate the data tensors that the model processes and apply layers to these tensors as if the layers were functions.
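
The "layers as functions" idea can be imitated in NumPy with closures. In the sketch below, `make_dense` is a hypothetical helper that returns a callable holding its own weights. Unlike Keras, which wires up symbolic tensors, this toy version operates directly on concrete arrays.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def make_dense(in_features, units, activation=None):
    """Return a hypothetical 'layer': a function closed over its own weights."""
    W = rng.normal(scale=0.1, size=(in_features, units))
    b = np.zeros(units)

    def layer(x):
        z = x @ W + b
        return activation(z) if activation is not None else z

    return layer

# Apply the layers to a tensor, one after the other, like function calls
input_tensor = rng.normal(size=(8, 784))
x = make_dense(784, 32, relu)(input_tensor)
output_tensor = make_dense(32, 10, softmax)(x)
print(output_tensor.shape)  # (8, 10)
```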

Once the model architecture is defined, the remaining steps are the same regardless of which approach was used. First comes the compilation step, which specifies the loss function, the optimizer, and the metrics to monitor during training.

In [10]:
from keras import optimizers

In [11]:
# `lr` is the argument name in the Keras version used here; newer releases call it `learning_rate`
model.compile(optimizer=optimizers.RMSprop(lr=0.001),
              loss='mse',
              metrics=['accuracy'])
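
To give a feel for what the `RMSprop` optimizer configured above does with a gradient, here is a minimal NumPy sketch of one update step. The defaults `rho=0.9` and `eps=1e-7` are assumptions chosen to resemble common settings, and `rmsprop_step` is a hypothetical helper, not a Keras function.

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update: scale the step by a running average of grad**2."""
    cache = rho * cache + (1 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Toy objective: f(w) = sum(w**2), whose gradient is 2 * w
w = np.array([1.0, -2.0])
cache = np.zeros_like(w)
grad = 2 * w

w_new, cache = rmsprop_step(w, grad, cache)
print(w_new)  # each weight moves slightly toward 0
```

Because the step is divided by the root of the running squared gradient, both weights move by about the same amount on this first step even though their gradients differ in size.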

Finally, pass the NumPy arrays of input data and targets to the model via `fit()`. The call is commented out here because no training data has been defined.

In [13]:
#model.fit(input_tensor, target_tensor, batch_size=128, epochs=10)
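
To see what `batch_size=128` and `epochs=10` imply, the sketch below counts the gradient updates for a hypothetical training set of 60,000 samples. The sample count is an assumption for illustration, and `count_updates` is not part of Keras.

```python
import math

def count_updates(num_samples, batch_size, epochs):
    """Return (batches per epoch, total gradient updates)."""
    # The last batch of an epoch may be smaller than batch_size
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return batches_per_epoch, batches_per_epoch * epochs

print(count_updates(60000, 128, 10))  # (469, 4690)
```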

### Deep-learning workstation

* It is recommended to run deep-learning code on a modern NVIDIA GPU.
    - image processing with CNNs is slow on a CPU
    - sequence processing with RNNs is slow on a CPU
    - for these workloads, a GPU is roughly 5-10 times faster than a CPU