# 8 - GRADIENT DESCENT AND LOGISTIC REGRESSION

[Course Notebook on Github](https://github.com/fastai/fastai/blob/master/courses/ml1/lesson4-mnist_sgd.ipynb)

### Notes
My goal is not to copy what is taught in the course and in the notebook. These are just assorted notes and things I tried on my own.
Extrapolation is covered in the previous notebook.

## Lecture Personal Notes
1. RF and DT behave like a form of KNN, so they are limited to the range of the training data (they cannot extrapolate).
2. NN are good for unstructured and spatial data.
3. Nomenclature clashes between ML fields (e.g. CV rows and columns).
4. Is it important to normalize the variables in RF? No - we care about the order of values within each independent variable, not their scale. This also makes RF largely immune to outliers.
5. For DL - normalize data!
6. Always check the descriptives of train/test/validation sets.
7. Normalize by channel.
8. Different normalization coeff. by features.
9. Universal Approximation Theorem.
10. What a function needs in order to have a derivative.
11. Cost - lower if model is better.
11. Negative Log Likelihood = cross entropy
12. One-hot encode the dependent variable (the target labels).
13. Predict the probability of the outcome.
14. argmax - critical - finds the largest element and returns its index.
15. NN = linear layer followed by activation function and so on.
16. Universal Approximation Theorem - neural nets can approximate any continuous function arbitrarily closely, given enough units
17. Softmax - activations lie between 0 and 1 and sum to 1
18. L - NL - L - NL - L - NL (linear and non-linear layers alternate). In general, the last NL layer is:
    * If multi-category, but only one is picked - Softmax
    * Binary classification or multi-label prediction - Sigmoid
    * Regression - Nothing
    * Hidden Layers - Use ReLU
    * Leaky ReLU - small non-zero slope below 0.
    * ELU
    * Arbitrary activation functions - (e.g. sine wave)
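
The notes above on one-hot encoding, softmax, argmax and negative log likelihood can be tied together in a few lines of NumPy. This is only a sketch: the logits and labels are made-up values, not data from the course. Softmax turns scores into probabilities, argmax picks the predicted class, and the negative log likelihood of the true class comes out equal to the cross entropy against the one-hot target.

```python
import numpy as np

# Hypothetical raw scores (logits) for 3 samples and 4 classes.
logits = np.array([[2.0, 1.0, 0.1, -1.0],
                   [0.5, 2.5, 0.3,  0.0],
                   [1.0, 1.0, 3.0,  0.2]])
labels = np.array([0, 1, 2])  # hypothetical true class per sample

def softmax(x):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

probs = softmax(logits)        # each row sums to 1
preds = probs.argmax(axis=1)   # index of the largest probability per row

one_hot = np.eye(logits.shape[1])[labels]  # one-hot encode the targets

# Cross entropy against the one-hot target ...
cross_entropy = -(one_hot * np.log(probs)).sum(axis=1).mean()
# ... equals the negative log likelihood of the true class.
nll = -np.log(probs[np.arange(len(labels)), labels]).mean()
```

Because the one-hot row is zero everywhere except the true class, the sum in `cross_entropy` keeps only the `log` term that `nll` picks out directly.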

In [1]:
# from fastai.imports import *
from fastai.torch_imports import *
import torch.nn as nn

In [2]:
net = nn.Sequential(
    nn.Linear(28*28, 10),   # 784 pixel inputs -> 10 class scores
    nn.LogSoftmax(dim=1)    # explicit dim: normalise over the class axis
).cuda()

#### Define the network components ourselves

In [9]:
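import numpy as np  # explicit import; np is used below
def binary_loss(y, p):
    # Binary cross entropy: y holds 0/1 labels, p the predicted P(y == 1)
    return np.mean(-(y * np.log(p) + (1-y)*np.log(1-p)))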
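
To get a feel for `binary_loss`, here is a small usage sketch. The labels and predictions are made-up numbers, and the clipped variant is my own addition (an assumption, not part of the notebook) to avoid `log(0)` when a prediction is exactly 0 or 1.

```python
import numpy as np

def binary_loss(y, p):
    return np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p)))

y = np.array([1, 0, 0, 1])           # hypothetical true labels
p = np.array([0.9, 0.1, 0.2, 0.8])   # hypothetical predicted P(y == 1)

good = binary_loss(y, p)       # confident and correct: small loss
bad = binary_loss(y, 1 - p)    # confident and wrong: large loss

# Assumption: clip predictions so log(0) never occurs for p == 0 or p == 1.
def binary_loss_clipped(y, p, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)
    return binary_loss(y, p)
```

The loss only looks at the probability assigned to the correct answer, so flipping the predictions turns a small loss into a large one.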

In [4]:
def get_weights(*dims): 
    return nn.Parameter(torch.randn(dims)/dims[0])

def softmax(x):
    return torch.exp(x)/(torch.exp(x).sum(dim=1)[:,None])

class LogReg(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1_w = get_weights(28*28, 10)  # Layer 1 weights
        self.l1_b = get_weights(10)         # Layer 1 bias

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = (x @ self.l1_w) + self.l1_b  # Linear Layer
        x = torch.log(softmax(x)) # Non-linear (LogSoftmax) Layer
        return x
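
The hand-written softmax above exponentiates the raw activations, which can overflow for large values. As a sketch (in NumPy rather than torch, and with made-up inputs), the usual workaround is the log-sum-exp trick: shift each row by its maximum before exponentiating, which leaves the result unchanged mathematically.

```python
import numpy as np

def log_softmax_naive(x):
    # Same formula as the class above: log(exp(x) / sum(exp(x))).
    return np.log(np.exp(x) / np.exp(x).sum(axis=1, keepdims=True))

def log_softmax_stable(x):
    # Log-sum-exp trick: shift each row by its max before exponentiating.
    shifted = x - x.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

small = np.array([[1.0, 2.0, 3.0]])
large = np.array([[1000.0, 1001.0, 1002.0]])  # hypothetical large activations

# Silence the overflow warnings to show the failure mode quietly.
with np.errstate(over="ignore", invalid="ignore"):
    naive_large = log_softmax_naive(large)  # exp overflows to inf, giving nan
stable_large = log_softmax_stable(large)    # stays finite
```

Both versions agree on `small`; only the stable one survives `large`.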

In [6]:
net2 = LogReg().cuda()
optimizer = optim.Adam(net2.parameters())  # new name, so the optim module is not overwritten
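
Adam is a refinement of plain gradient descent. To see the basic idea without torch, here is a minimal sketch of gradient descent for binary logistic regression in NumPy. Everything in it is an assumption for illustration: the synthetic data, the learning rate and the number of steps are not from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-feature data with a linear decision boundary (illustration only).
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.5          # hypothetical learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(p, y, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)  # guard against log(0)
    return np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p)))

losses = []
for step in range(100):
    p = sigmoid(X @ w + b)            # forward pass: predicted P(y == 1)
    losses.append(loss_fn(p, y))
    grad_w = X.T @ (p - y) / len(y)   # gradient of the mean loss w.r.t. w
    grad_b = np.mean(p - y)           # gradient of the mean loss w.r.t. b
    w -= lr * grad_w                  # step against the gradient
    b -= lr * grad_b

accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
```

The loss starts at `log(2)` (all predictions are 0.5 with zero weights) and falls as the weights line up with the boundary.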

## About Broadcasting and Matrix Multiplication

In [27]:
import numpy as np
a = np.array([10, 6, -4])
b = np.array([2, 8, 7])

In [9]:
a + b

array([12, 14,  3])

In [20]:
from torch import tensor as T
a = T([10, 6, -4]).cuda()
b = T([2, 8, 7]).cuda()

In [21]:
a + b

tensor([12, 14,  3], device='cuda:0')

In [25]:
(a < b)

tensor([False,  True,  True], device='cuda:0')

#### SIMD - Single Instruction Multiple Data
A single instruction operates on multiple data elements at once, e.g. adding two same-sized arrays element-wise.

In [28]:
a

array([10,  6, -4])

In [30]:
a > 0 # rank 1 tensor (vector) compared with a rank 0 tensor (scalar)

array([ True,  True, False])
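
The element-wise operations above behave like a loop over the elements, just without the Python-level loop. A small sketch comparing the two forms, reusing the same `a` and `b` values:

```python
import numpy as np

a = np.array([10, 6, -4])
b = np.array([2, 8, 7])

# Explicit Python loop: one comparison per iteration.
loop_result = np.array([x < y for x, y in zip(a, b)])

# Vectorized form: NumPy applies the comparison across the whole array in
# compiled code, which can use SIMD instructions where available.
vector_result = a < b
```

Both give the same answer; the vectorized form is the one that scales to large arrays.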

Broadcasting - conceptually copying the smaller operand so that it has the same shape as the other one

In [31]:
a + 1

array([11,  7, -3])
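
What happens to the scalar in `a + 1` can be made explicit with `np.broadcast_to`. This is a sketch of the idea, not a claim about how NumPy implements the addition internally: the scalar is stretched to the shape of `a`, and the stretched view has stride 0, so no data is actually copied.

```python
import numpy as np

a = np.array([10, 6, -4])

# Conceptually, the scalar 1 is stretched to the shape of `a` ...
ones = np.broadcast_to(1, a.shape)

# ... and the stretched view has stride 0, i.e. one value reused 3 times.
explicit = a + ones
implicit = a + 1
```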

In [32]:
m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]); m

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [33]:
c = np.array([10, 20, 30])

In [34]:
m + c

array([[11, 22, 33],
       [14, 25, 36],
       [17, 28, 39]])

In [35]:
c.shape

(3,)

In [36]:
np.broadcast_to(c[:,None], m.shape)

array([[10, 10, 10],
       [20, 20, 20],
       [30, 30, 30]])
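
The difference between `c` and `c[:, None]` comes from how shapes are matched: NumPy compares them from the trailing dimension backwards, and a dimension of size 1 (or a missing one) is stretched. A short sketch with the same `m` and `c`; the mismatched array at the end is a made-up example:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
c = np.array([10, 20, 30])

# (3, 3) + (3,)   -> c is treated as shape (1, 3): added to every row.
by_row = m + c
# (3, 3) + (3, 1) -> c[:, None] is stretched across the columns.
by_col = m + c[:, None]

# Trailing dimensions 3 and 2 are incompatible, so this raises an error.
try:
    m + np.array([10, 20])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```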

In [39]:
c[None]

array([[10, 20, 30]])

In [40]:
c[:, None]

array([[10],
       [20],
       [30]])

In [41]:
c[None] * c[:, None]

array([[100, 200, 300],
       [200, 400, 600],
       [300, 600, 900]])
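
The product above is the outer product of `c` with itself: a `(1, 3)` array times a `(3, 1)` array broadcasts to `(3, 3)`. A quick check against `np.outer`:

```python
import numpy as np

c = np.array([10, 20, 30])

# (1, 3) * (3, 1) broadcasts to (3, 3): every pairwise product c[i] * c[j].
via_broadcast = c[None] * c[:, None]
via_outer = np.outer(c, c)
```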

In [43]:
xg, yg = np.ogrid[0:5, 0:5]

In [44]:
xg + yg

array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8]])
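
Broadcasting also gives a way to write matrix multiplication by hand. The sketch below uses a made-up `(3, 2)` matrix `n` (not from the notebook): add a trailing axis to `m` and a leading axis to `n`, multiply element-wise, then sum over the shared axis.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
n = np.array([[2, 0], [1, 3], [4, 1]])   # hypothetical (3, 2) matrix

# m[:, :, None] has shape (3, 3, 1); n[None, :, :] has shape (1, 3, 2).
# Broadcasting gives a (3, 3, 2) array of element-wise products, and
# summing over the shared middle axis yields the (3, 2) matrix product.
manual = (m[:, :, None] * n[None, :, :]).sum(axis=1)
builtin = m @ n
```

This materialises the full `(3, 3, 2)` intermediate, so it is for understanding only; `m @ n` is the form to use in practice.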