# Loss Functions
---

**1. Mean Absolute Error**

$\text{Mean Absolute Error (MAE)} = {\frac{1}{n}}{\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|}$

In [None]:
model.compile(optimizer="adam", 
              loss="mean_absolute_error",
              metrics=["accuracy"]
)
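
To make the formula concrete, here is a minimal NumPy sketch of MAE. The function name `mean_absolute_error` and the sample values are illustrative assumptions; this is not Keras's internal implementation.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    # Average of the absolute differences between targets and predictions
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

# Absolute errors are 0.5, 0.0 and 1.0, so MAE = 1.5 / 3 = 0.5
print(mean_absolute_error([3.0, 5.0, 2.0], [2.5, 5.0, 3.0]))
```

Every error contributes in proportion to its size, so a single large error does not dominate the result.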

**2. Mean Squared Error**

$\text{Mean Squared Error (MSE)} = {\frac{1}{n}}{\sum_{i=1}^{n} (y_i - {\hat{y}_i})^2}$

In [None]:
model.compile(optimizer="adam", 
              loss="mean_squared_error",
              metrics=["accuracy"]
)
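
The same sample values run through a minimal NumPy sketch of MSE show how squaring changes the weighting. The function name `mean_squared_error` is an illustrative assumption, not Keras's implementation.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average of the squared differences; large errors dominate the result
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Squared errors are 0.25, 0.0 and 1.0, so MSE = 1.25 / 3 ≈ 0.4167
print(mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 3.0]))
```

Here the error of 1.0 contributes four times as much as the error of 0.5, whereas under MAE it contributes only twice as much.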

**3. Binary Cross Entropy**

_Also known as log loss_

$\text{Binary Cross Entropy} = -{\frac{1}{n}}{\sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1-y_i) \log(1 - \hat{y}_i) \right]}$

In [None]:
model.compile(optimizer="adam", 
              loss="binary_crossentropy",
              metrics=["accuracy"]
)
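
A minimal NumPy sketch of binary cross entropy follows. The function name `binary_cross_entropy` and the `epsilon` clipping value are assumptions made for this example; clipping keeps $\log(0)$ from ever being evaluated when a prediction is exactly 0 or 1.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, epsilon=1e-15):
    # Clip predictions away from 0 and 1 so the logarithms stay finite
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), epsilon, 1 - epsilon)
    # Negative mean of the per-sample log likelihoods
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Confident, correct predictions give a small loss (about 0.1446 here)
print(binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.8]))
```

Without the leading minus sign the sum of log likelihoods would be negative, so the negation is what makes the loss a positive quantity to minimize.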

# Gradient Descent
---

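In each epoch, assuming _log loss_ and a sigmoid output $\hat{y}_i = \sigma(w x_i + b)$, every parameter is moved a small step against its gradient: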

**1. Update weight**

$w = w - \alpha \frac{\partial L}{\partial w}$,  
<br>where $w$ = weight, $\alpha$ = learning rate, $L$ = log loss

$\frac{\partial L}{\partial w} = {\frac{1}{n}}{\sum_{i=1}^{n} x_i (\hat{y}_i - y_i)}$, where $\hat{y}_i$ is the predicted value and $y_i$ the true value
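
The weight gradient follows from the chain rule. A short derivation for a single sample, assuming $\hat{y} = \sigma(z)$ with $z = wx + b$ and per-sample loss $L = -\left[ y \log(\hat{y}) + (1-y) \log(1-\hat{y}) \right]$; averaging over the $n$ samples gives the $\frac{1}{n}\sum$ form:

$$
\frac{\partial L}{\partial \hat{y}} = -\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}} = \frac{\hat{y} - y}{\hat{y}(1-\hat{y})}
$$

$$
\frac{\partial \hat{y}}{\partial z} = \sigma(z)\left(1-\sigma(z)\right) = \hat{y}(1-\hat{y}), \qquad \frac{\partial z}{\partial w} = x, \qquad \frac{\partial z}{\partial b} = 1
$$

$$
\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w} = (\hat{y} - y)\,x, \qquad \frac{\partial L}{\partial b} = \hat{y} - y
$$

The $\hat{y}(1-\hat{y})$ factors cancel, which is why no logarithm appears in the gradient.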



**2. Update bias**

$b = b - \alpha \frac{\partial L}{\partial b}$,  
<br>where $b$ = bias, $\alpha$ = learning rate

$\frac{\partial L}{\partial b} = {\frac{1}{n}}{\sum_{i=1}^{n} (\hat{y}_i - y_i)}$
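
The weight and bias gradients can be checked numerically by comparing them with a central finite-difference approximation of the loss. This sketch assumes a single-feature model $\hat{y} = \sigma(wx + b)$; the helper names (`loss_at`, `analytic_grads`, `numeric_grads`) and the small input arrays are hypothetical, chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss_at(w, b, x, y):
    # Mean log loss of the single-feature model sigmoid(w*x + b)
    y_hat = sigmoid(w * x + b)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def analytic_grads(w, b, x, y):
    # dL/dw = mean(x * (y_hat - y)),  dL/db = mean(y_hat - y)
    y_hat = sigmoid(w * x + b)
    return np.mean(x * (y_hat - y)), np.mean(y_hat - y)

def numeric_grads(w, b, x, y, h=1e-6):
    # Central differences approximate each partial derivative
    dw = (loss_at(w + h, b, x, y) - loss_at(w - h, b, x, y)) / (2 * h)
    db = (loss_at(w, b + h, x, y) - loss_at(w, b - h, x, y)) / (2 * h)
    return dw, db

x = np.array([0.5, 1.5, -1.0, 2.0])
y = np.array([0.0, 1.0, 0.0, 1.0])
print(analytic_grads(0.3, -0.1, x, y))
print(numeric_grads(0.3, -0.1, x, y))
```

The two printed pairs should agree to many decimal places; a formula containing an extra $\log$ term would not pass this check.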


In [1]:
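# Code Sample
import numpy as np

def sigmoid(z):
    # Logistic activation: maps the weighted sum into (0, 1)
    return 1 / (1 + np.exp(-z))

def log_loss(y_true, y_predicted):
    # Binary cross entropy; clip predictions so log(0) is never evaluated
    epsilon = 1e-15
    y_predicted = np.clip(y_predicted, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_predicted) + (1 - y_true) * np.log(1 - y_predicted))


def gradient_descent(x1, x2, y_true, epochs):
    # x1, x2 and y_true are 1-D NumPy arrays with one entry per sample
    w1 = w2 = 1
    bias = 0
    learning_rate = 0.5
    num_inputs = len(x1)

    for i in range(epochs):
        # Forward pass: weighted sum followed by the sigmoid
        weighted_sum = w1*x1 + w2*x2 + bias
        y_predicted = sigmoid(weighted_sum)
        loss = log_loss(y_true, y_predicted)

        # dL/dw_j = (1/n) * sum(x_j * (y_predicted - y_true))
        w1d = (1/num_inputs)*np.dot(x1, (y_predicted - y_true))
        w2d = (1/num_inputs)*np.dot(x2, (y_predicted - y_true))

        # dL/db = mean(y_predicted - y_true)
        bias_d = np.mean(y_predicted - y_true)

        # Move each parameter against its gradient
        w1 = w1 - learning_rate * w1d
        w2 = w2 - learning_rate * w2d
        bias = bias - learning_rate * bias_d

        print(f'Epoch: {i}, w1:{w1}, w2:{w2}, bias:{bias}, loss:{loss}')

    return w1, w2, bias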
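
The two-weight loop generalizes to any number of features by storing the weights in a vector. This is a minimal sketch, assuming the inputs are stacked into an `(n, d)` matrix `X`. The function name `gradient_descent_vectorized`, the zero initialization, and the toy data are illustrative choices that do not come from the original.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def log_loss(y_true, y_predicted, epsilon=1e-15):
    y_predicted = np.clip(y_predicted, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_predicted) + (1 - y_true) * np.log(1 - y_predicted))

def gradient_descent_vectorized(X, y_true, epochs, learning_rate=0.5):
    # X has shape (n, d): one row per sample, one column per feature
    n, d = X.shape
    w = np.zeros(d)
    bias = 0.0
    losses = []
    for _ in range(epochs):
        y_predicted = sigmoid(X @ w + bias)
        losses.append(log_loss(y_true, y_predicted))
        error = y_predicted - y_true
        # X.T @ error computes sum_i x_ij * error_i for every feature j at once
        w -= learning_rate * (X.T @ error) / n
        bias -= learning_rate * np.mean(error)
    return w, bias, losses

# Made-up toy data: label is 1 when the two features sum to more than 1
X = np.array([[0.1, 0.2], [0.4, 0.3], [0.9, 0.8], [0.7, 0.6]])
y = np.array([0, 0, 1, 1])
w, b, losses = gradient_descent_vectorized(X, y, epochs=200)
print(w, b, losses[0], losses[-1])
```

With all-zero initial parameters every first prediction is 0.5, so the first recorded loss is $\ln 2$; the matrix product `X.T @ error` replaces the separate `w1d` and `w2d` lines.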

# Dropout Regularization
---

Why does dropout help with overfitting?
- A neuron cannot rely on any single input, because that input may be dropped at random during training.
- Neurons are less likely to co-adapt and learn redundant details of the training inputs.
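
The random dropping described above can be sketched in NumPy. This assumes the "inverted dropout" convention, which the Keras `Dropout` layer documents: during training each unit is zeroed with probability `rate` and the surviving units are scaled by `1 / (1 - rate)`, while at inference the layer passes inputs through unchanged. The function name `dropout` and its `rng` parameter are hypothetical.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    # At inference time dropout is a no-op
    if not training or rate == 0.0:
        return activations
    # Keep each unit with probability (1 - rate)
    keep_mask = rng.random(activations.shape) >= rate
    # Scale survivors so the expected activation matches inference time
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
a = np.ones(10)
print(dropout(a, rate=0.5, training=True, rng=rng))   # each entry is 0.0 or 2.0
print(dropout(a, rate=0.5, training=False, rng=rng))  # unchanged
```

Because a fresh mask is drawn on every call, no neuron can count on a particular input being present, which is the first point in the list above.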

### Example (adding Dropout layers in `keras`)

In [None]:
from tensorflow import keras

model = keras.Sequential([
  # input layer: expects 60 input features
  keras.layers.Dense(60, input_dim=60, activation="relu"),
  # randomly zeroes 50% of the previous layer's outputs during training
  keras.layers.Dropout(0.5),
  # hidden layer(s)
  keras.layers.Dense(60, activation="relu"),
  keras.layers.Dropout(0.5),
  # output layer
  keras.layers.Dense(1, activation="sigmoid")
])