### Neural networks (NN)

In [2]:
from IPython.display import Image

#### neurons

- inputs
- weights
- bias
- activation function
- output

In [4]:
Image(url='../imgs/neuron.gif', width=400, height=250)
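
The components listed above can be put together in a few lines. This is a minimal sketch, assuming a sigmoid activation function; the function name `neuron` and the example numbers are illustrative and do not come from the figure.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative values: 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0, and sigmoid(0.0) = 0.5
output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(output)  # prints 0.5
```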

#### activation function

In [5]:
Image(url='../imgs/activation.gif', width=400, height=250)
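
An activation function maps a neuron's weighted sum to its output. The sketch below shows three commonly used choices (sigmoid, tanh, and ReLU); which functions the figure above illustrates is not assumed here.

```python
import math

def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Squashes any real number into the range (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Passes positive values through unchanged and clips negatives to zero."""
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z))
```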

#### layers

In [6]:
Image(url='../imgs/layers.gif', width=400, height=250)
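
A layer is a group of neurons that all receive the same inputs, and a network chains layers so that each layer's outputs become the next layer's inputs. The following is a minimal sketch of a fully connected layer; the helper names (`dense`, `identity`) and all weights are hypothetical values chosen for illustration.

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of `weights` belongs to one neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

x = [1.0, 2.0]

# Hidden layer with three neurons and ReLU activation
hidden = dense(
    x,
    weights=[[1.0, -1.0], [0.5, 0.5], [-1.0, 0.0]],
    biases=[0.0, 0.0, 0.0],
    activation=relu,
)

# Output layer with a single neuron and no activation
output = dense(hidden, weights=[[1.0, 2.0, 3.0]], biases=[0.5], activation=identity)

print(hidden)  # prints [0.0, 1.5, 0.0]
print(output)  # prints [3.5]
```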

#### types of NN

There are several types of neural networks, each with its own architecture and intended use. Here are some of the main types:

1. Feedforward Neural Networks: These are the simplest type of neural network, in which information flows in one direction, from input to output, without any loops or cycles. Feedforward networks are typically used for supervised learning tasks such as classification and regression.

2. Convolutional Neural Networks (CNNs): Designed mainly for image and video recognition tasks. They use convolutional layers, which slide a set of learned filters over the input image to extract local features. The resulting feature maps are usually downsampled by pooling layers and then passed through one or more fully connected layers to produce the final output.

3. Recurrent Neural Networks (RNNs): Designed for sequence processing tasks such as speech recognition and natural language processing. RNNs use feedback connections to allow information to persist over time. This allows them to process sequences of variable length and to model dependencies between elements in a sequence.

4. Long Short-Term Memory (LSTM) Networks: A type of recurrent neural network designed to address the problem of vanishing gradients in traditional RNNs. LSTMs use gated memory cells that can selectively forget or remember information over time, allowing them to model long-term dependencies in sequential data.

5. Generative Adversarial Networks (GANs): Used for unsupervised learning tasks such as image generation and data synthesis. GANs consist of two networks: a generator network that creates fake data, and a discriminator network that tries to distinguish between real and fake data. The two networks are trained together in a game-like setting, with the generator trying to fool the discriminator and the discriminator trying to correctly classify the data.

6. Autoencoder Networks: Designed to learn a compressed representation of the input data, which can then be used for tasks such as image denoising, data compression, and anomaly detection. Autoencoders consist of an encoder network that compresses the input data, and a decoder network that tries to reconstruct the original data from the compressed representation.
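
To make the filtering step of a convolutional layer concrete, here is a minimal sketch that slides one filter over a tiny image. The function name `conv2d`, the image, and the filter are illustrative assumptions. Real CNNs learn their filter values during training and apply many filters per layer.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products.

    Strictly speaking this is cross-correlation, which is what deep learning
    libraries usually implement under the name "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A 3x4 image that is dark (0) on the left and bright (1) on the right
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds to a dark-to-bright vertical edge
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, vertical_edge)
print(feature_map)  # prints [[0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the column where the edge sits, which is the sense in which a filter "extracts a feature".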


#### the training process
- back-propagation
- loss function
- optimizer
- learning rate
- epochs
- batch size

##### back-propagation

Backpropagation (short for "backward propagation of errors") is a common algorithm used to train artificial neural networks. The algorithm works by iteratively adjusting the weights of the network based on the errors between the predicted output and the true output for a given input.

During the forward pass, the input is fed through the network and the output is computed. The **loss function** then measures the difference between the predicted output and the true output (i.e., the ground truth).

During the backward pass of the algorithm, the gradients of the loss function with respect to the parameters of the network are computed using the chain rule of calculus. These gradients are then used to update the parameters of the network (i.e., the weights and biases) in the opposite direction of the gradient. The magnitude of the update is determined by a **learning rate** hyperparameter.
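
The forward and backward passes can be sketched for the smallest possible network: a single sigmoid neuron with one input and a squared-error loss. This setup and all names (`forward`, `loss`, `gradients`) are assumptions made for illustration; real networks apply the same chain rule layer by layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    """Forward pass: prediction of a single sigmoid neuron."""
    return sigmoid(w * x + b)

def loss(w, b, x, y):
    """Squared error between the prediction and the true output y."""
    return (forward(w, b, x) - y) ** 2

def gradients(w, b, x, y):
    """Backward pass via the chain rule: dL/dw = dL/da * da/dz * dz/dw."""
    a = forward(w, b, x)
    dL_da = 2.0 * (a - y)      # derivative of the squared error
    da_dz = a * (1.0 - a)      # derivative of the sigmoid
    dz_dw, dz_db = x, 1.0      # derivatives of z = w * x + b
    return dL_da * da_dz * dz_dw, dL_da * da_dz * dz_db

w, b, x, y = 0.5, 0.0, 2.0, 1.0
learning_rate = 0.1

dw, db = gradients(w, b, x, y)

# Step in the opposite direction of the gradient
new_w = w - learning_rate * dw
new_b = b - learning_rate * db

print(loss(w, b, x, y), "->", loss(new_w, new_b, x, y))
```

A single update lowers the loss only slightly; training repeats this step many times.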

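The backpropagation algorithm is typically applied iteratively over a large dataset of inputs and corresponding outputs (i.e., a training set). In practice, the dataset is usually split into mini-batches, and the weights are updated after each mini-batch; the number of examples per mini-batch is the **batch size**. One full pass through the dataset is called an **epoch**, and training runs for multiple epochs until the network converges to a set of weights that approximately minimizes the loss function.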
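
The iterative procedure can be sketched as a loop over epochs and mini-batches. The example below fits a one-parameter linear model to a hypothetical, noise-free dataset generated from y = 2x + 1. The model, the data, and the hyperparameter values are all assumptions chosen to keep the sketch small.

```python
import random

random.seed(0)

# Hypothetical training set generated from y = 2x + 1
data = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]

w, b = 0.0, 0.0          # parameters to learn
learning_rate = 0.1
epochs = 200             # number of full passes through the dataset
batch_size = 4           # number of examples per parameter update

for epoch in range(epochs):
    random.shuffle(data)
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # Gradients of the mean squared error, averaged over the mini-batch
        dw = sum(2.0 * (w * x + b - y) * x for x, y in batch) / len(batch)
        db = sum(2.0 * (w * x + b - y) for x, y in batch) / len(batch)
        # One parameter update per mini-batch
        w -= learning_rate * dw
        b -= learning_rate * db

print(w, b)  # close to the true values 2 and 1
```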

Backpropagation is a widely used algorithm for training neural networks because it allows the network to learn from its mistakes and make small adjustments to its parameters to improve its predictions over time.

##### loss function

A loss function, also known as a cost function or objective function, is a function that measures the difference between the predicted output and the true output (i.e., the ground truth) of a neural network for a given input. The loss function is a key component of the training process, as it tells the optimizer how well the model is performing and guides it in the direction of minimizing the error.

There are many different types of loss functions, each of which is designed for a specific type of problem. Here are some common types of loss functions:

1. Mean Squared Error (MSE): This is a common loss function for regression problems. It measures the average squared difference between the predicted output and the true output.

2. Binary Cross-Entropy: This is a loss function for binary classification problems. It compares the predicted probability of the positive class with the true label (0 or 1), and penalizes confident but wrong predictions especially heavily.

3. Categorical Cross-Entropy: This is a loss function for multi-class classification problems. It measures the difference between the predicted probability distribution over classes and the true probability distribution.

4. Hinge Loss: This is a loss function used for training support vector machines (SVMs) and other margin-based classifiers. With labels of -1 or +1, it is zero when the prediction has the correct sign and a margin of at least 1, and it grows linearly as the prediction moves toward or past the wrong side of the decision boundary.

5. Kullback-Leibler (KL) Divergence: This is a measure of the difference between two probability distributions. It is often used in generative models and other unsupervised learning tasks.

6. Huber Loss: This is a robust loss function that is less sensitive to outliers than mean squared error. It is often used in regression problems where the data may contain noisy or corrupted values.

These are just a few examples of the many different types of loss functions that exist. The choice of loss function depends on the specific problem being solved and the characteristics of the data.
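
The loss functions listed above can be written in a few lines each. This is a minimal sketch using plain Python lists; the function names are illustrative, and the cross-entropy and KL functions assume that predicted probabilities lie strictly between 0 and 1.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error over a list of targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred):
    """y_true holds labels 0 or 1; p_pred holds predicted P(label = 1)."""
    total = sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(y_true, p_pred)
    )
    return -total / len(y_true)

def categorical_cross_entropy(true_dist, pred_dist):
    """Cross-entropy between a true and a predicted class distribution."""
    return -sum(t * math.log(p) for t, p in zip(true_dist, pred_dist) if t > 0)

def hinge(y_true, score):
    """y_true is -1 or +1; score is the raw (unsquashed) classifier output."""
    return max(0.0, 1.0 - y_true * score)

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def huber(y_true, y_pred, delta=1.0):
    """Quadratic for small errors, linear for errors larger than delta."""
    error = abs(y_true - y_pred)
    if error <= delta:
        return 0.5 * error ** 2
    return delta * (error - 0.5 * delta)

print(mse([1.0, 2.0], [1.0, 4.0]))   # prints 2.0
print(hinge(1, 2.0), hinge(1, -1.0)) # prints 0.0 2.0
print(huber(0.0, 0.5), huber(0.0, 3.0))  # prints 0.125 2.5
```

For an error of 3.0, the squared error is 9.0 while the Huber loss is 2.5, which shows why Huber loss is less sensitive to outliers.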

##### optimizer

The optimizer is the component of the training process that is responsible for updating the model's parameters. Specifically, the optimizer's role is to minimize the loss function of the neural network by adjusting the weights and biases of the model.

During training, the neural network receives input data and produces an output based on its current parameters. The output is compared to the desired output (i.e., the ground truth) using a loss function, which measures how well the model is performing. Backpropagation then computes the gradients of the loss with respect to the model's parameters, and the optimizer uses these gradients to update the parameters in the direction that reduces the loss.

The choice of optimizer can have a significant impact on the performance of the neural network, as different optimizers have different strengths and weaknesses. Some popular optimizers include:

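1. Stochastic Gradient Descent (SGD): This is a simple optimizer that updates the parameters in the direction of the negative gradient of the loss function. The gradient is estimated from a single example or a mini-batch rather than the full dataset, which is what makes it "stochastic".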

2. Adam: This is a more sophisticated optimizer that combines per-parameter adaptive learning rates with momentum, and often converges more quickly and robustly than plain SGD.

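3. Adagrad: This optimizer adapts the learning rate for each parameter by dividing it by the square root of the accumulated sum of that parameter's past squared gradients, so frequently updated parameters take smaller steps.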

4. RMSprop: This optimizer also adapts the learning rate for each parameter, but uses a moving average of the squared gradient to do so.

5. Adadelta: This optimizer is similar to RMSprop, but uses a more sophisticated way of adapting the learning rate that does not require an explicit learning rate parameter.

Overall, the optimizer is a crucial component of the neural network training process that helps the model find a good set of parameters for the task at hand.
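
To compare update rules side by side, the sketch below minimizes a toy one-parameter loss, (w - 3)^2, with three of the optimizers listed above. The toy loss, the function names, and the hyperparameter values are assumptions for illustration. The Adam update follows the commonly published form with bias correction.

```python
import math

def grad(w):
    """Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

def run_sgd(w, lr=0.1, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)                      # step against the gradient
    return w

def run_adagrad(w, lr=1.0, eps=1e-8, steps=100):
    accumulated = 0.0                          # running sum of squared gradients
    for _ in range(steps):
        g = grad(w)
        accumulated += g * g
        w -= lr * g / (math.sqrt(accumulated) + eps)
    return w

def run_adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m, v = 0.0, 0.0                            # moving averages of g and g^2
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g      # momentum term
        v = beta2 * v + (1.0 - beta2) * g * g  # adaptive scaling term
        m_hat = m / (1.0 - beta1 ** t)         # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

print(run_sgd(0.0), run_adagrad(0.0), run_adam(0.0))  # all close to 3
```

On a problem this simple all three reach the minimum. Differences between optimizers matter more on real networks, where gradient scales vary widely across parameters.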

#### regularization

In machine learning, regularization is a set of techniques used to prevent overfitting in models, including neural networks. Overfitting occurs when a model becomes too complex and begins to fit the noise in the training data instead of the underlying patterns. This can lead to poor performance on new data that the model has not seen before. Regularization techniques help prevent overfitting by constraining the model's capacity or by adding additional constraints to the optimization process.

In neural networks, regularization techniques can be applied in several ways. Here are some common techniques:

1. L1 and L2 regularization: These techniques add a penalty term to the loss function that encourages the weights of the model to be small. L1 regularization encourages the weights to be sparse, while L2 regularization encourages the weights to be small but not necessarily sparse.

2. Dropout: This technique randomly drops out some of the neurons in the network during training, forcing the remaining neurons to learn more robust and diverse representations of the data.

3. Early stopping: This technique monitors the model's performance on a held-out validation set during training and stops the training process once that performance stops improving, before the model starts to overfit the training data.

4. Data augmentation: This technique increases the effective size of the training set by applying random transformations to the input data, such as rotating, cropping, or flipping images. This helps the model learn to be more robust to variations in the input data.

Regularization is an important part of the training process for neural networks, as it helps prevent overfitting and improve the generalization performance of the model. The choice of regularization technique depends on the specific problem being solved and the characteristics of the data.
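
Each of the four techniques above can be sketched in a few lines. All function names and parameters here are hypothetical. The dropout shown is the "inverted" variant, which rescales the surviving activations during training, and the early stopping rule uses a simple patience counter.

```python
import random

def l1_penalty(weights, strength):
    """Added to the loss; pushes weights toward exactly zero (sparsity)."""
    return strength * sum(abs(w) for w in weights)

def l2_penalty(weights, strength):
    """Added to the loss; pushes weights toward small values."""
    return strength * sum(w * w for w in weights)

def dropout(activations, rate, rng):
    """Inverted dropout: zero each activation with probability `rate`
    and rescale the survivors so the expected value is unchanged."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def early_stopping_epoch(val_losses, patience):
    """Return the epoch at which training stops: the first epoch where the
    validation loss has failed to improve `patience` times in a row."""
    best = float("inf")
    waited = 0
    for epoch, val_loss in enumerate(val_losses):
        if val_loss < best:
            best, waited = val_loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses) - 1

def horizontal_flip(image):
    """Data augmentation: mirror an image given as a list of rows."""
    return [row[::-1] for row in image]

weights = [1.0, -2.0]
print(l1_penalty(weights, 0.5))  # prints 1.5
print(l2_penalty(weights, 0.5))  # prints 2.5
print(dropout([1.0] * 8, rate=0.5, rng=random.Random(0)))
print(early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.9], patience=2))  # prints 4
print(horizontal_flip([[1, 2], [3, 4]]))  # prints [[2, 1], [4, 3]]
```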