# Intro - Artificial Intelligence Primer

Welcome! This project offers a quick overview of AI with practical examples, without a complex installation setup.

We will start with the concepts of Artificial Intelligence, Machine Learning, Deep Learning, and AI frameworks. Then we will explore each architecture in increasing order of complexity and accuracy.

## Concepts

You have probably heard of these AI-related concepts before, but it's good to review their hierarchy.

![](../media/intro/A-comparative-view-of-AI-machine-learning-deep-learning-and-generative-AI-source.png "Miltiadis D. Lytras")

To work in AI, we use several tools:

- **Python**: Python is the primary programming language we'll be using for AI.
- **PyTorch & torchvision**: PyTorch is an open-source machine learning library, and torchvision offers datasets and models for computer vision.
- **Jupyter Notebook**: The interactive environment where this tutorial is presented.
- **NumPy**: A library for numerical operations in Python.
- **scikit-learn**: Machine learning library in Python. We'll use it for performance metrics.
- **Seaborn & Matplotlib**: Visualization libraries in Python.
- **CUDA** (Optional): If you have a compatible NVIDIA GPU, you can install CUDA for GPU acceleration with PyTorch.

# Deep Learning Summary

Below is a summary of the main concepts of Deep Learning. It's very condensed, but it'll give you the reasoning behind each concept. For more in-depth information, you can search online: there's plenty of material, although it's scattered. Subsequent notebooks will have you implement these concepts.

## Neural Networks and Deep Learning


The performance of traditional learning algorithms tends to plateau as the amount of data grows. Deep Learning, in contrast, takes advantage of large amounts of labeled data to keep improving the performance of analyses, predictions, and classification.

### Neural Network basics

You probably recall from middle school that a linear function has the form `y = mx + b`. The basic neuron in a Neural Network also consists of a weight `w` and a bias `b`. This allows a neuron to change parameters to best fit a set of data.

![](../media/intro/weights_and_biases.png "@theDrewDag")

### Logistic regression

We are looking for an equation to describe a set of data, so we use linear regression to find the parameters that best fit our inputs. In the graphs above, we would use (x, y) pairs as our data.

For something like a cat image classifier, we would use the value of each RGB color pixel as our `x` input, and our `y` output would be whether the picture is of a cat or not (its label). Since we now want to classify the input rather than predict a continuous value, we squash the linear output into a probability between 0 and 1, and the model becomes a Logistic regression. Because the answer is either True or False, this is a Binary Classifier.

Binary Classifiers also apply to any choice between exactly two classes, as shown below.

![](../media/intro/Logistic_regression.jpg "AcademicianHelp")
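To make the classifier idea concrete, here is a minimal sketch of a logistic regression prediction in plain Python. The pixel values, weights, and bias are made up for illustration (a real model would learn them), and the helper names `predict_proba` and `predict_is_cat` are hypothetical.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(pixels, weights, bias):
    """Linear combination of the inputs, then sigmoid to get a probability."""
    z = sum(w * x for w, x in zip(weights, pixels)) + bias
    return sigmoid(z)

def predict_is_cat(pixels, weights, bias, threshold=0.5):
    """Binary decision: True when the probability reaches the threshold."""
    return predict_proba(pixels, weights, bias) >= threshold

# Hypothetical "image" of two pixels: six RGB values scaled to 0..1.
pixels = [0.9, 0.8, 0.7, 0.1, 0.2, 0.1]
# Made-up parameters; training would find these values.
weights = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
bias = -0.5

probability = predict_proba(pixels, weights, bias)
is_cat = predict_is_cat(pixels, weights, bias)
print(round(probability, 3), is_cat)
```

The only difference from a linear model is the final `sigmoid` call, which turns the raw score into something we can read as a probability.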

#### Loss function

How do we know if our system is doing a good job? We use the loss function. It measures how far away our prediction is from the expected result (the label). This information is used to update our parameters. For classification, we usually use cross entropy loss.

<img src="../media/intro/Loss_function.jpg" alt="Loss_function" style="width: 300px;"/>
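As a rough illustration, the binary form of cross entropy loss can be written in a few lines of plain Python. The function name and the `eps` clamp (used to avoid `log(0)`) are choices made for this sketch, not something prescribed by the tutorial.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss for one example: y_true is 0 or 1, y_pred is a probability."""
    # Clamp the prediction so that log() never receives exactly 0.
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1.0 - y_pred))

# The label is 1 (e.g., "cat") in both cases.
good = binary_cross_entropy(1, 0.9)  # confident and right: small loss
bad = binary_cross_entropy(1, 0.1)   # confident and wrong: large loss
print(round(good, 3), round(bad, 3))
```

Notice how the loss grows quickly when the model is confident about the wrong answer.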

#### Gradient Descent

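How do we know how much to change our weight and bias parameters? We take the derivatives of the cost function (the loss averaged over the training data) with respect to each parameter. These derivatives, called gradients, tell us which direction increases the cost. We want the cost function to be minimized, so we use gradient descent: we repeatedly step in the opposite direction until we reach the lowest cost point.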

<img src="../media/intro/Gradient_Descent.jpg" alt="Gradient_Descent" style="width: 300px;"/>
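Here is a small sketch of gradient descent on a one-parameter cost function. The cost `(w - 3)**2` is a made-up example whose minimum is known to be at `w = 3`, and the step size (named `learning_rate` here) is an arbitrary choice.

```python
def cost(w):
    """Hypothetical cost function with its minimum at w = 3."""
    return (w - 3.0) ** 2

def cost_gradient(w):
    """Derivative of the cost with respect to w."""
    return 2.0 * (w - 3.0)

def gradient_descent(w_start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the cost."""
    w = w_start
    for _ in range(steps):
        w -= learning_rate * cost_gradient(w)
    return w

w_final = gradient_descent(w_start=0.0)
print(w_final, cost(w_final))
```

Each step moves `w` a little closer to the bottom of the curve, which is the behavior sketched in the image above.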

#### Activation function

As it turns out, no matter how many neurons we use, if they are all linear functions, the end result is a linear function as well! This severely limits what the network can learn. An activation function is non-linear, and essentially allows us to say "behave like this linear function, but only after a threshold." It's like adding a digital switch to an analog circuit, and it makes Neural Networks possible. A common activation function is the Rectified Linear Unit, ReLU: `y = max(0, x)`, that is, `y = x` if `x > 0` and `y = 0` otherwise.

<img src="../media/intro/ReLU.png" alt="ReLU" style="width: 300px;"/>
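The sketch below checks both claims with made-up weights: two stacked linear neurons collapse into a single linear neuron, while placing a ReLU between them does not.

```python
def linear(x, w, b):
    return w * x + b

def relu(x):
    return max(0.0, x)

def two_linear(x):
    """Two stacked linear neurons: 3 * (2x + 1) - 4."""
    return linear(linear(x, 2.0, 1.0), 3.0, -4.0)

def collapsed(x):
    """The same function as one neuron: w = 3 * 2 = 6, b = 3 * 1 - 4 = -1."""
    return linear(x, 6.0, -1.0)

def with_relu(x):
    """Same two neurons, with a ReLU switch in between."""
    return linear(relu(linear(x, 2.0, 1.0)), 3.0, -4.0)

for x in (-2.0, -1.0, 1.0):
    print(x, two_linear(x), collapsed(x), with_relu(x))
```

With the ReLU in place, the output stays flat for inputs below the threshold and only then starts to rise, which a single straight line cannot do.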



### Neural Networks

With each neuron having parameters (weight, bias) and an activation function (e.g., ReLU), we can make a Neural Network, NN. A shallow NN has one internal (hidden) layer, and it's fully connected.

Also displayed is a deep NN, which has multiple hidden layers.

![](../media/intro/Neural_networks.png)

#### Layers


- The input layer is generally a flattened version of your data. For example, a linear array of the (R)ed, (G)reen, and (B)lue values for each pixel in an image.
- The hidden layers sit between the input and the output. This is where the network learns intermediate features.
- The output layer corresponds to how many outputs you need. One for binary classification, multiple for categorization, etc.

Do you see the ReLU activation function above? It's unbounded and generally doesn't work well for outputs, since we want output ranges. For example, probability outputs should be between 0 and 1. We generally use the sigmoid function for the output layer instead.

<img src="../media/intro/Sigmoid.png" alt="Sigmoid" style="width: 300px;"/>
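Putting the layers together, here is a minimal sketch of a shallow, fully connected network with a ReLU hidden layer and a sigmoid output. All weights and biases are made-up constants (a trained network would learn them), and `dense` and `forward` are hypothetical helper names.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """Fully connected layer: every output neuron sees every input."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Made-up parameters: 3 inputs -> 2 hidden neurons -> 1 output.
HIDDEN_W = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
HIDDEN_B = [0.0, 0.1]
OUTPUT_W = [[1.0, -1.5]]
OUTPUT_B = [0.2]

def forward(inputs):
    hidden = dense(inputs, HIDDEN_W, HIDDEN_B, relu)        # unbounded values
    return dense(hidden, OUTPUT_W, OUTPUT_B, sigmoid)[0]    # value in (0, 1)

probability = forward([1.0, 0.5, 0.2])  # flattened input
print(round(probability, 3))
```

The hidden layer can output any non-negative number, while the sigmoid keeps the final answer in the 0 to 1 range we expect from a probability.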


#### Multi-class Classification

When an input must be assigned to exactly one of several categories, the outputs depend on each other, so it's useful to use Softmax regression. It's an output layer showing the probabilities for an input to be in each category, with the probabilities adding up to 1.


<img src="../media/intro/Softmax_regression.png" alt="Image by @tpreethi" style="width: 40%;"/>
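A softmax can be sketched in plain Python as follows. The three raw scores are made up, and subtracting the maximum score before exponentiating is a common numerical-stability trick that does not change the result.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that add up to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs for three categories.
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs], sum(probs))
```

The largest score gets the largest probability, but every category still receives a share.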

### Forward and Backward Propagation

We train NNs from scratch. We initialize our model with random weights and zero biases, and pass our training data through the NN to get predicted outputs. These predictions will probably suck. But from the training data we know what they *should* be, so we calculate the loss function on that. Then we calculate the gradients that would reduce the loss function. And then we use these gradient values to backpropagate changes into the NN, updating parameter values for weights and biases. And guess what? Our predictions will be better next time.

<img src="../media/intro/backpropagation.png" alt="backpropagation" style="width: 50%;"/>

### The AI model learning process

- Do a forward pass
- Calculate loss
- Calculate gradients
- Do a backward pass
- Update parameters
- Repeat multiple times (epochs) until we get close to expected results
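The steps above can be sketched with a single sigmoid neuron trained on a tiny made-up dataset. This is only an illustration in plain Python: the data, the learning rate of 0.5, and the 200 epochs are arbitrary choices, and with one neuron the gradient calculation and the backward pass collapse into the same two lines.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy labeled data (made up): small inputs are class 0, large ones class 1.
xs = [0.0, 1.0, 3.0, 4.0]
ys = [0, 0, 1, 1]

w, b = 0.0, 0.0          # initial parameters
learning_rate = 0.5
losses = []

for epoch in range(200):
    # Forward pass: predictions for every training example.
    preds = [sigmoid(w * x + b) for x in xs]

    # Calculate loss: mean binary cross entropy.
    loss = -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(ys, preds)
    ) / len(xs)
    losses.append(loss)

    # Calculate gradients / backward pass: for sigmoid plus cross entropy,
    # the derivative of the loss with respect to (w * x + b) is (p - y).
    grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    grad_b = sum(p - y for p, y in zip(preds, ys)) / len(xs)

    # Update parameters.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(losses[0], 3), round(losses[-1], 3))
```

The first loss is what an untrained model produces, and the last one shows how much the repeated updates improved it.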

## Hyperparameters

To tune our NN, we need to choose hyperparameters. These are manually set variables that control the training process of a machine learning model.

### Learning Rate

After we get the gradients from our loss function, we update the parameters by a fraction of the gradient to prevent overshoot. This fraction is the learning rate.

<img src="../media/intro/LearningRate.png" alt="LearningRate by jeremyjordan.me" style="width: 60%;"/>
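To see why the learning rate matters, this sketch minimizes the made-up loss `w**2` (gradient `2 * w`, minimum at `w = 0`) with three arbitrary learning rates.

```python
def run(learning_rate, steps=20, w=1.0):
    """Minimize loss(w) = w**2 by gradient descent, starting from w = 1."""
    for _ in range(steps):
        w -= learning_rate * 2.0 * w
    return w

too_small = run(0.001)  # barely moves away from the starting point
good = run(0.1)         # gets close to the minimum at 0
too_large = run(1.1)    # overshoots further on every step and diverges

print(too_small, good, too_large)
```

A rate that is too small wastes training time, and one that is too large never settles.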

### Train / Dev / Test

You have a labeled dataset, but your model should work on new data as well. So, we split the dataset into a training set, a development set, and a test set.

- **Train**: data used to train the model.
- **Dev**: data used for hyperparameter tuning. Matches production data (e.g., traffic pictures from a specific webcam).
- **Test**: This set is never seen during training, and is only used to determine accuracy. Matches production data.

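**Guideline:**
- Decision making: use the Train and Dev sets to choose the model and its hyperparameters.
- Reflecting real-world data: take the Dev and Test sets from the same distribution as production data.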

The split between sets is a hyperparameter. Recommended split (large datasets are those with more than 1 million examples):

![](../media/intro/Datasets.png "towardsdatascience")
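A split can be sketched in plain Python as shown below. The 60/20/20 fractions and the helper name `split_dataset` are assumptions made for this example only, so use the ratios that suit your dataset size. Shuffling one pool before slicing keeps the Dev and Test sets on the same distribution.

```python
import random

def split_dataset(samples, train_frac=0.6, dev_frac=0.2, seed=0):
    """Shuffle once, then slice into train, dev, and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatable splits
    n_train = round(len(shuffled) * train_frac)
    n_dev = round(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]      # whatever remains
    return train, dev, test

# Stand-in dataset: 100 sample ids.
train, dev, test = split_dataset(range(100))
print(len(train), len(dev), len(test))
```

Every sample lands in exactly one set, so the test set is never seen during training.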

### Regularization

Often a model overfits the training set, and then it doesn't perform well on the test set or on real-world data. To avoid that, we use regularization and related techniques. Examples:

- **Normalizing inputs**: NNs perform better with average input values close to zero.
- **Dropout**: Randomly ignoring neurons during training keeps the network from fixating on any specific feature.
- **Data augmentation**: Modify your existing data to expand your training set. For example, rotating/scaling/cropping your images.

![](../media/intro/Regularization.png "Akash Shastri")
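Two of the techniques above can be sketched in a few lines of plain Python. The `dropout` function uses the "inverted dropout" variant, which scales the surviving values so their expected sum stays the same. That variant, the helper names, and the sample values are all choices made for this illustration. Dropout is only applied while training, not when making predictions.

```python
import random

def normalize(values):
    """Shift and scale inputs to zero mean and unit standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5 or 1.0  # fall back to 1.0 for constant inputs
    return [(v - mean) / std for v in values]

def dropout(activations, drop_prob, rng):
    """Inverted dropout: zero some activations, scale up the survivors."""
    keep_prob = 1.0 - drop_prob
    return [
        a / keep_prob if rng.random() < keep_prob else 0.0
        for a in activations
    ]

pixels = normalize([0.0, 64.0, 128.0, 255.0])
dropped = dropout([1.0] * 8, drop_prob=0.5, rng=random.Random(0))
print(pixels)
print(dropped)
```

Each call to `dropout` with a different random state silences a different set of neurons.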

### Optimization Algorithms

We want to optimize our models to converge to a minimum loss quickly. For that we:

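- Use mini-batches of training data, instead of waiting for the whole set to process before each update.
- Include momentum when doing gradient descent, so updates keep moving in a consistent direction.
- Scale each parameter's step by a running average of its recent squared gradients (RMSprop).
- Combine momentum with RMSprop-style scaling (Adam).
- Learning Rate decay: Start learning fast, then converge slowly.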
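As one example from the list above, momentum can be sketched by keeping a running "velocity" of past gradients. The cost function `(w - 3)**2`, the learning rate, and `beta = 0.9` are made-up values for this illustration, and the function name is hypothetical.

```python
def gradient(w):
    """Gradient of the made-up cost (w - 3)**2."""
    return 2.0 * (w - 3.0)

def sgd_momentum(w, learning_rate=0.05, beta=0.9, steps=200):
    """Gradient descent where each step also carries part of the previous step."""
    velocity = 0.0
    for _ in range(steps):
        # Blend the previous direction with the new gradient.
        velocity = beta * velocity + gradient(w)
        w -= learning_rate * velocity
    return w

w_final = sgd_momentum(0.0)
print(w_final)
```

Here `beta` controls how much of the previous step is carried over: 0 gives plain gradient descent, and values close to 1 give a heavier "ball" that is harder to stop.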

## Hyperparameter tuning

When selecting hyperparameters, remember that the scale for the values is usually logarithmic, not linear.

When tuning, work on a single parameter at a time.
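Sampling on a logarithmic scale can be sketched like this: pick the exponent uniformly, then raise 10 to that power. The range from 0.0001 to 0.1 is just an example for a learning rate, and `sample_log_uniform` is a hypothetical helper name.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Pick a value so that each power of ten is equally likely."""
    exponent = rng.uniform(math.log10(low), math.log10(high))
    return 10.0 ** exponent

rng = random.Random(0)  # fixed seed for repeatable candidates
candidates = [sample_log_uniform(1e-4, 1e-1, rng) for _ in range(5)]
print(candidates)
```

A linear sample over the same range would spend about 90% of its tries between 0.01 and 0.1, and would rarely explore the smaller values.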

## Convolutional Neural Networks (CNN)

CNNs have revolutionized the field of computer vision. They are specifically designed to recognize visual patterns directly from pixel images with minimal preprocessing. CNNs are hierarchical models where neurons in one layer connect to neurons in the next layer in a limited fashion, somewhat like the receptive field in human vision.

A typical CNN architecture consists of:

- **Convolutional Layers**: Apply a convolution operation to the input to detect features.
- **Activation Layers**: Introduce non-linearity to the model (typically ReLU).
- **Pooling Layers**: Perform down-sampling operations to reduce dimensionality.
- **Fully Connected Layers**: After several convolutional and pooling layers, the high-level reasoning in the neural network happens via fully connected layers.

![](../media/intro/CNN.png "python.plainenglish.io")
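The first three layer types can be sketched in plain Python on a tiny made-up image. The 2x2 kernel below is hand-written to respond to dark-to-bright vertical edges, whereas a real CNN learns its kernel values during training. The function names are hypothetical.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu_map(feature_map):
    """Activation layer: apply ReLU to every value."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool2x2(feature_map):
    """Pooling layer: keep the largest value in each 2x2 block."""
    return [
        [
            max(
                feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1],
            )
            for j in range(0, len(feature_map[0]) - 1, 2)
        ]
        for i in range(0, len(feature_map) - 1, 2)
    ]

# 5x5 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_kernel = [[-1, 1], [-1, 1]]

features = relu_map(conv2d(image, edge_kernel))  # 4x4 feature map
pooled = max_pool2x2(features)                   # 2x2 after down-sampling
print(features)
print(pooled)
```

The feature map lights up only where the edge is, and pooling keeps that information while shrinking the map from 4x4 to 2x2.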

### Residual Networks (ResNet)

For most networks, a large number of inputs (e.g., pixels) gets progressively reduced to a small number of outputs (e.g., a class). Spatial input information is lost in deeper layers. With residual networks, or ResNets, the output of an earlier layer skips over some layers and is added to the output of a deeper layer, providing greater context.


<img src="../media/intro/SkipConnections.jpg" alt="SkipConnections by analyticsvidhya" style="width: 60%;"/>
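A skip connection can be sketched as "output = transformed input + original input". The block below is a simplified illustration using one small fully connected layer with made-up weights. Real ResNet blocks use convolutional layers, so treat the structure here as an assumption for teaching purposes.

```python
def relu(x):
    return max(0.0, x)

def layer(inputs, weights, biases):
    """Plain fully connected layer without activation."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def residual_block(inputs, weights, biases):
    """Transform the input, then add the untouched input back in."""
    transformed = [relu(v) for v in layer(inputs, weights, biases)]
    return [t + x for t, x in zip(transformed, inputs)]  # skip connection

x = [1.0, -2.0]

# With all-zero weights the block simply passes its input through.
passthrough = residual_block(x, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])

# With other (made-up) weights, the layer adds detail on top of the input.
refined = residual_block(x, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(passthrough, refined)
```

Because the input is always added back, the information it carries reaches the deeper layers even when the layer in between contributes nothing.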

#### Fine-tuning

A useful trait of ResNets is that they lend themselves to transfer learning. Deep ResNets (e.g., ResNet18) are trained on millions of images to classify them into 1,000 different classes. That means most of the layers can already detect useful image features.

Given that ResNets can already detect useful features, we use fine-tuning to achieve transfer learning. We replace the existing output layer (or, optionally, additional end layers) to fit our model outputs (for example, classifying between different types of apples). Training is fast since it only happens in the last layers.

<img src="../media/intro/Fine-Tuning.png" alt="Fine-Tuning by geeksforgeeks" style="width: 60%;"/>

## Explore

You've learned about a lot of Deep Learning concepts. It's time to see them in action.

The notebooks following this one will have you explore PyTorch, Datasets, Neural Networks, Regularization, CNNs, and ResNets. Click on them in the navigation panel and run the cells as you progress through the material.

### References

- [pytorch-fashionMNIST-tutorial](https://github.com/junaidaliop/pytorch-fashionMNIST-tutorial/blob/main/pytorch_fashion_mnist_tutorial.ipynb)

**Next Notebook: [01-Birds](01-Birds.ipynb)**