# Stochastic Gradient Descent

Reference: [Kaggle](https://www.kaggle.com/code/ryanholbrook/stochastic-gradient-descent/tutorial)

- how to train a neural network?

### What do we need?

- training set - used for adjusting the weights
- testing set
- **loss function**
- **optimizer**

## Loss Function

- this is how we tell a network *what* problem to solve
- **loss function** measures the disparity between the target's true value and the value the model predicts
    - MSE
    - MAE
    - Huber loss
- during the training, the model will use the loss function as a guide for finding the correct values of its weights (lower loss is better)
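
As a rough illustration of the three losses listed above, here is a minimal pure-Python sketch. The function names, the sample values, and the Huber threshold `delta=1.0` are assumptions chosen for the example, not something taken from the tutorial.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)


# Made-up targets and predictions, for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 4.0]

print(mse(y_true, y_pred))    # 3.3125
print(mae(y_true, y_pred))    # 1.375
print(huber(y_true, y_pred))  # 1.03125
```

MSE punishes the two large errors much more heavily than MAE does, while Huber loss sits in between.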

## Optimizer - Stochastic Gradient Descent (SGD)

- this is how we tell a network *how* to solve the problem
- the **optimizer** is an algorithm that adjusts the weights to minimize the loss
- virtually all the optimization algorithms used in deep learning belong to a family called **stochastic gradient descent**
    - iterative algorithms that train a network in steps
    
One **step** of training goes like this:
1. Sample some training data and run it through the network to make predictions
2. Measure the loss between the predictions and the true values
3. Finally, adjust the weights in a direction that makes the loss smaller

Then just do this over and over until the loss is as small as you like.
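
The three steps can be sketched in plain Python for the simplest possible "network", a single linear unit `w * x + b` trained with MSE. The synthetic data (`y = 2x + 1`), the learning rate, the batch size, and the number of steps are all assumptions picked for illustration.

```python
import random

random.seed(0)

# Synthetic, noise-free data from y = 2x + 1 (an assumption for this sketch).
data = [(i / 100, 2.0 * (i / 100) + 1.0) for i in range(100)]

w, b = 0.0, 0.0        # weights to learn
learning_rate = 0.1    # hypothetical value
batch_size = 10        # hypothetical value


def mse_loss(w, b, points):
    """Mean squared error of the linear unit on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in points) / len(points)


initial_loss = mse_loss(w, b, data)

for step in range(2000):
    # 1. Sample a minibatch and run it through the model.
    batch = random.sample(data, batch_size)
    # 2. Measure the error between predictions and true values.
    errors = [(w * x + b) - y for x, y in batch]
    # 3. Move the weights in the direction that lowers the MSE
    #    (the gradient of the mean squared error w.r.t. w and b).
    grad_w = 2 * sum(e * x for e, (x, _) in zip(errors, batch)) / batch_size
    grad_b = 2 * sum(errors) / batch_size
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse_loss(w, b, data)
print(initial_loss, final_loss, w, b)
```

After training, `w` and `b` should sit very close to 2 and 1, the values used to generate the data.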

Each iteration's sample of training data is called a minibatch (or often just "batch"), while a complete round of the training data is called an epoch. The number of epochs you train for is how many times the network will see each training example.
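
To make the relationship between minibatches and epochs concrete, here is a small sketch. The dataset size, batch size, and epoch count are hypothetical numbers, and it assumes the last, smaller batch of each epoch is kept rather than dropped.

```python
import math
import random

random.seed(0)

n_examples = 1000   # hypothetical training-set size
batch_size = 256    # hypothetical
epochs = 10         # hypothetical

# One epoch = enough minibatches to cover every training example once.
steps_per_epoch = math.ceil(n_examples / batch_size)
total_steps = steps_per_epoch * epochs

# Splitting one shuffled epoch into minibatches of row indices.
indices = list(range(n_examples))
random.shuffle(indices)
batches = [indices[i:i + batch_size] for i in range(0, n_examples, batch_size)]

print(steps_per_epoch)   # 4
print(total_steps)       # 40
print(len(batches[-1]))  # 232 (the final batch is smaller)
```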

### Learning rate and Batch Size

- the **learning rate** is a number that determines the size of the shift each batch makes to the weights
- a small learning rate means the network needs to see more minibatches before its weights converge to their best values
- the learning rate and the size of the minibatches are the two parameters that have the largest effect on how the SGD training proceeds
- usually we use **Adam**, an SGD algorithm with an adaptive learning rate that makes it suitable for most problems without any parameter tuning (it is "self-tuning")
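
The effect of the learning rate can be seen even without a network. The sketch below runs plain (non-stochastic) gradient descent on the one-variable function `(w - 3)^2`, which is a stand-in chosen for illustration, and counts the steps needed to get close to the minimum. The helper name and all numeric values are assumptions.

```python
def steps_to_converge(learning_rate, start=0.0, target=3.0,
                      tol=1e-3, max_steps=1000):
    """Count gradient-descent steps on f(w) = (w - target)^2 until
    w is within tol of the minimum; return None if it never gets there."""
    w = start
    for step in range(max_steps):
        if abs(w - target) < tol:
            return step
        grad = 2 * (w - target)      # derivative of (w - target)^2
        w -= learning_rate * grad
    return None


slow = steps_to_converge(0.01)       # small learning rate: many steps
fast = steps_to_converge(0.1)        # larger learning rate: far fewer steps
diverged = steps_to_converge(1.1)    # too large: overshoots and never converges

print(slow, fast, diverged)
```

A smaller learning rate needs many more updates to reach the same point, while a learning rate that is too large overshoots the minimum on every step.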

To add this to our model:

```
model.compile(
    optimizer="adam",
    loss="mae",
)
```
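
The string `"adam"` picks the optimizer with its default settings. If you want to set the learning rate and batch size yourself, something like the following sketch should work. The tiny model, the synthetic data, and the values for `learning_rate`, `batch_size`, and `epochs` are assumptions made up for this example.

```python
import numpy as np
from tensorflow import keras

# Synthetic regression data (an assumption, not the tutorial's dataset).
rng = np.random.default_rng(0)
X = rng.random((256, 3)).astype("float32")
y = X @ np.array([1.0, -2.0, 0.5], dtype="float32")

model = keras.Sequential([
    keras.Input(shape=(3,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1),
])

model.compile(
    # Same optimizer as "adam", but with the learning rate spelled out.
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss="mae",
)

# batch_size sets the minibatch size; epochs sets how many passes
# over the training data are made.
history = model.fit(X, y, batch_size=32, epochs=5, verbose=0)
losses = history.history["loss"]   # one loss value per epoch
print(losses)
```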

# Example

    import pandas as pd
    from IPython.display import display

    red_wine = pd.read_csv('red-wine.csv')

    # Create training and validation splits
    df_train = red_wine.sample(frac=0.7, random_state=0)
    df_valid = red_wine.drop(df_train.index)
    display(df_train.head(4))

    # Scale to [0, 1] using the training set's min and max
    max_ = df_train.max(axis=0)
    min_ = df_train.min(axis=0)
    df_train = (df_train - min_) / (max_ - min_)
    df_valid = (df_valid - min_) / (max_ - min_)

    # Split features and target
    X_train = df_train.drop('quality', axis=1)
    X_valid = df_valid.drop('quality', axis=1)
    y_train = df_train['quality']
    y_valid = df_valid['quality']

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1109 | 10.8 | 0.47 | 0.43 | 2.1 | 0.171 | 27.0 | 66.0 | 0.9982 | 3.17 | 0.76 | 10.8 | 6 |
| 1032 | 8.1 | 0.82 | 0.0 | 4.1 | 0.095 | 5.0 | 14.0 | 0.99854 | 3.36 | 0.53 | 9.6 | 5 |
| 1002 | 9.1 | 0.29 | 0.33 | 2.05 | 0.063 | 13.0 | 27.0 | 0.99516 | 3.26 | 0.84 | 11.7 | 7 |
| 487 | 10.2 | 0.645 | 0.36 | 1.8 | 0.053 | 5.0 | 14.0 | 0.9982 | 3.17 | 0.42 | 10.0 | 6 |


# Exercise

    import pandas as pd
    import numpy as np

    import matplotlib.pyplot as plt
    import seaborn as sns

    import tensorflow as tf

    df = pd.read_csv('abalone.csv')
    print(df.shape)  # (4177, 9)
    df.head()


| | Type | LongestShell | Diameter | Height | WholeWeight | ShuckedWeight | VisceraWeight | ShellWeight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.514 | 0.2245 | 0.101 | 0.15 | 15 |
| 1 | M | 0.35 | 0.265 | 0.09 | 0.2255 | 0.0995 | 0.0485 | 0.07 | 7 |
| 2 | F | 0.53 | 0.42 | 0.135 | 0.677 | 0.2565 | 0.1415 | 0.21 | 9 |
| 3 | M | 0.44 | 0.365 | 0.125 | 0.516 | 0.2155 | 0.114 | 0.155 | 10 |
| 4 | I | 0.33 | 0.255 | 0.08 | 0.205 | 0.0895 | 0.0395 | 0.055 | 7 |
