At its core, training a neural network revolves around:
- *Layers*, which are combined into a *network* (or *model*)
- The *input data* and corresponding *targets*
- The *loss function*, which defines the feedback signal used for learning
- The *optimizer*, which determines how learning proceeds

![Neural Network Training Process Diagram](NN-Training-Process.svg "Neural Network Training Process Diagram")

### Layers

Different layers are appropriate for different tensor formats and different types of data processing.
- Simple vector data, stored in 2D tensors, is often processed by *densely connected*/*fully connected*/*dense* layers
- Sequence data, stored in 3D tensors, is typically processed by *recurrent* layers such as an `LSTM` layer
- Image data, stored in 4D tensors, is usually processed by 2D convolution layers like `Conv2D`
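To make the tensor ranks above concrete, here is a small framework-free sketch that builds nested lists and reports their shapes. The helpers `zeros` and `shape_of` are hypothetical, the sizes are arbitrary, and the axis meanings in the comments assume the usual Keras conventions (batch axis first, channels last for images).

```python
def zeros(*shape):
    """Build a nested list of zeros with the given shape."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [zeros(*shape[1:]) for _ in range(shape[0])]


def shape_of(tensor):
    """Return the shape of a rectangular, non-empty nested list as a tuple."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        tensor = tensor[0]
    return tuple(shape)


# Illustrative batches of 8 samples each (sizes are assumptions)
vectors = zeros(8, 784)        # 2D: (samples, features)
sequences = zeros(8, 20, 16)   # 3D: (samples, timesteps, features)
images = zeros(8, 28, 28, 3)   # 4D: (samples, height, width, channels)

for name, batch in [("vectors", vectors), ("sequences", sequences), ("images", images)]:
    shape = shape_of(batch)
    print(name, "rank:", len(shape), "shape:", shape)
```

The rank (number of axes) is what decides which family of layers applies; the sizes along each axis only need to match what the first layer expects.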

```python
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(32, input_shape=(784,)))
model.add(layers.Dense(32))  # this layer infers its input shape from the output of the previous layer
```
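The idea of a layer inferring its input shape can be sketched without Keras. The `TinyDense` class below is a hypothetical toy, not how Keras is implemented: it creates its weight matrix lazily, the first time it sees an input, unless an `input_dim` is given up front. It has no activation and uses small random weights purely for illustration.

```python
import random


class TinyDense:
    """Toy dense layer: output[j] = sum_i x[i] * W[i][j] + b[j] (no activation)."""

    def __init__(self, units, input_dim=None):
        self.units = units
        self.weights = None
        self.bias = [0.0] * units
        if input_dim is not None:
            self.build(input_dim)

    def build(self, input_dim):
        # Weight matrix of shape (input_dim, units) with small random values
        rng = random.Random(0)
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(self.units)]
            for _ in range(input_dim)
        ]

    def __call__(self, x):
        if self.weights is None:
            self.build(len(x))  # infer the input size from the first input seen
        return [
            sum(xi * row[j] for xi, row in zip(x, self.weights)) + self.bias[j]
            for j in range(self.units)
        ]


first = TinyDense(32, input_dim=784)  # input size stated explicitly
second = TinyDense(32)                # input size left to be inferred

x = [0.5] * 784
y = second(first(x))
print(len(y), len(second.weights))
```

After the call, `second.weights` has 32 rows because the first layer produced 32 values, which mirrors how only the first layer in the Keras snippet needs an explicit `input_shape`.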

### Models: Networks of Layers

A deep-learning model is a **directed, acyclic graph** of layers. Most commonly this is a linear stack of layers, but there are other network topologies:
- Two-branch networks
- Multihead networks
- Inception blocks

Network topology defines a *hypothesis space*: choosing a topology constrains your *space of possibilities* to a specific series of tensor operations that map input data to output data.
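As a rough illustration of a model as a directed acyclic graph, the sketch below wires up a two-branch topology in plain Python and evaluates it in dependency order with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The node names, the `scale` and `concat` helpers, and the `run` function are all hypothetical stand-ins for real layers.

```python
from graphlib import TopologicalSorter


def scale(factor):
    """Stand-in for a layer: multiply every element by a constant."""
    return lambda xs: [factor * x for x in xs]


def concat(a, b):
    """Stand-in for a merge layer: concatenate two branch outputs."""
    return a + b


# node name -> (function, names of the nodes it reads from)
graph = {
    "branch_a": (scale(2.0), ["input"]),
    "branch_b": (scale(-1.0), ["input"]),
    "merge": (concat, ["branch_a", "branch_b"]),
}


def run(graph, inputs):
    """Evaluate every node once, after all of its dependencies."""
    values = dict(inputs)
    dependencies = {name: deps for name, (_, deps) in graph.items()}
    for name in TopologicalSorter(dependencies).static_order():
        if name in values:  # input nodes already have a value
            continue
        fn, deps = graph[name]
        values[name] = fn(*(values[d] for d in deps))
    return values


result = run(graph, {"input": [1.0, 2.0]})
print(result["merge"])
```

A linear stack is the special case where every node has exactly one dependency. If the graph contained a cycle, `TopologicalSorter` would raise a `CycleError`, which is why the acyclic requirement matters for evaluation.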

### Loss Functions and Optimizers

- *Loss Function*/*Objective Function* : The quantity that will be minimized during training. It represents a measure of success for the task at hand.
- *Optimizer* : Determines how the network will be updated based on the loss function. It implements a specific variant of stochastic gradient descent (SGD).

For multiloss networks, all losses are averaged into a single scalar quantity, because gradient descent needs a single scalar loss value to compute gradients from.
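A minimal sketch of that idea, assuming a single trainable parameter `w` and two made-up quadratic losses: the losses are averaged into one scalar, and plain (full-batch, not stochastic) gradient descent follows the gradient of that average. The loss functions, starting point, and learning rate are illustrative choices.

```python
def loss_a(w):
    return (w - 1.0) ** 2  # first objective prefers w = 1


def loss_b(w):
    return (w - 3.0) ** 2  # second objective prefers w = 3


def total_loss(w):
    # Average the individual losses into a single scalar
    return (loss_a(w) + loss_b(w)) / 2.0


def total_grad(w):
    # Derivative of the averaged loss: (2(w - 1) + 2(w - 3)) / 2
    return (2.0 * (w - 1.0) + 2.0 * (w - 3.0)) / 2.0


w = 0.0
learning_rate = 0.1
for step in range(100):
    w -= learning_rate * total_grad(w)  # gradient descent update

print(round(w, 4), round(total_loss(w), 4))
```

The parameter settles near `w = 2`, the compromise between the two objectives. Neither individual loss reaches zero, but their average is minimized.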

Choosing the right objective function is *extremely* important. Choose a loss function that embodies the *right* constraints for your problem, so that minimization correlates with model success.
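One way to see why the choice matters is to compare a metric with a loss on the same predictions. The sketch below is an illustrative assumption, not taken from the notes above: it uses hand-written `accuracy` and `binary_crossentropy` functions on a made-up two-class problem. Both sets of predictions are fully correct by accuracy, yet the loss still distinguishes hesitant predictions from confident ones, giving training a graded signal to follow.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions on the correct side of the 0.5 threshold."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if (p >= 0.5) == (t == 1))
    return hits / len(y_true)


def binary_crossentropy(y_true, y_pred):
    """Mean binary crossentropy; predictions must lie strictly between 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


y_true = [1, 0, 1, 0]
hesitant = [0.6, 0.4, 0.6, 0.4]    # correct, but barely
confident = [0.9, 0.1, 0.9, 0.1]   # correct and confident

for name, y_pred in [("hesitant", hesitant), ("confident", confident)]:
    print(name, accuracy(y_true, y_pred), round(binary_crossentropy(y_true, y_pred), 4))
```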