<a href="https://colab.research.google.com/github/kangwonlee/pytorch-ibm-coursera/blob/main/week03.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Hello PyTorch 👋🏻



references
* https://www.coursera.org/learn/deep-neural-networks-with-pytorch/
* https://github.com/damounayman/Deep-Neural-Networks-with-PyTorch/blob/main/Week1/1D_tensors.ipynb



## week 3



### 4.1 Multiple Linear Regression Prediction



$$
\mathbb{y}_{n \times 1} = \mathbb{X}_{n \times m}\mathbb{w}_{m \times 1} + b_{-1\times 1}
$$



Here, the $-1$ in $b_{-1 \times 1}$ indicates that the scalar bias $b$ is broadcast over all $n$ rows, as in `numpy` broadcasting.
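The broadcasting of the bias can be checked with a small sketch. The values of `X`, `w`, and `b` below are illustrative assumptions, not taken from the course.

```python
import torch

# n = 3 samples, m = 2 features (illustrative values)
X = torch.tensor([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0]])
w = torch.tensor([[2.0], [3.0]])   # shape (m, 1)
b = torch.tensor([1.0])            # shape (1,), broadcast over the n rows

y = X @ w + b                      # shape (n, 1)
print(y.shape)
print(y)
```

The single bias value is added to every row of `X @ w`, so `y` keeps the shape `(n, 1)`.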



#### Linear Class



In [None]:
import torch.nn



In [None]:
torch.manual_seed(1)
model = torch.nn.Linear(in_features=2, out_features=1)

list(model.parameters())



In [None]:
model.state_dict()



Forward


In [None]:
X = torch.tensor([1.0, 3.0])
yhat = model(X)
yhat
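As a sanity check, the forward pass of `torch.nn.Linear` can be reproduced by hand. This sketch rebuilds the same model and relies on the fact that `nn.Linear` stores its weight with shape `(out_features, in_features)`, so the manual product uses the transpose.

```python
import torch

torch.manual_seed(1)
model = torch.nn.Linear(in_features=2, out_features=1)
X = torch.tensor([1.0, 3.0])

# weight has shape (out_features, in_features) = (1, 2)
manual = X @ model.weight.T + model.bias
print(model(X), manual)
```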



#### Custom Modules



In [None]:
import torch.nn


torch.manual_seed(1)
class LR(torch.nn.Module):

  def __init__(self, input_size, output_size):
    super(LR, self).__init__()
    self.linear = torch.nn.Linear(input_size, output_size)

  def forward(self, x):
    return self.linear(x)

model = LR(input_size=2, output_size=1)



In [None]:
X = torch.tensor(
    [
        [1.0, 1.0],
        [1.0, 2.0],
        [1.0, 3.0],
    ]
)
X


In [None]:
model(X)



### 4.2 Multiple Linear Regression Training



$$
l\left(\mathbb{w}, b\right)=
\frac{1}{N}
\sum_{i=1}^N{
  \left(
    y_i - (\mathbb{x}_i \cdot \mathbb{w}+b)
  \right)^2
}
$$



Gradient vector



$$
\nabla l(\mathbb{w}, b) = \begin{bmatrix}
  \frac{\partial}{\partial w_1}l(\mathbb{w}, b) \\
  \frac{\partial}{\partial w_2}l(\mathbb{w}, b) \\
  \vdots \\
  \frac{\partial}{\partial w_d}l(\mathbb{w}, b) \\
  \frac{\partial}{\partial b}l(\mathbb{w}, b) \\
\end{bmatrix}
$$
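The gradient vector can be obtained with autograd and compared against the closed form of the MSE gradient. This is a minimal sketch on random data; the sizes `N` and `d` and the variable names are assumptions made for illustration.

```python
import torch

torch.manual_seed(0)
N, d = 8, 3
X = torch.randn(N, d)
y = torch.randn(N, 1)
w = torch.randn(d, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# l(w, b) = (1/N) * sum_i (y_i - (x_i . w + b))^2
loss = torch.mean((y - (X @ w + b)) ** 2)
loss.backward()   # fills w.grad and b.grad

# closed-form gradient of the same loss
with torch.no_grad():
    residual = y - (X @ w + b)
    grad_w = (-2.0 / N) * (X.T @ residual)
    grad_b = (-2.0 / N) * residual.sum()

print(torch.allclose(w.grad, grad_w, atol=1e-5))
print(torch.allclose(b.grad, grad_b.reshape(1), atol=1e-5))
```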



In [None]:
import matplotlib.pyplot as plt

import torch
import torch.nn
import torch.optim
import torch.utils.data



In [None]:
class Data2D(torch.utils.data.Dataset):
  def __init__(self, n_samples=21, ndim=2, xlim=1, noise=0.1):

    self.x = torch.zeros(n_samples, ndim)
    for idim in range(ndim):
      self.x[:, idim] = torch.linspace(-xlim, xlim, n_samples)

    self.w = torch.tensor([[1.0]]*ndim)
    self.b = 1.0

    # noise-free target f = x w + b, then add Gaussian noise to get y
    self.f = self.forward(self.w, self.x, self.b)
    self.y = self.f + noise * torch.randn((n_samples, 1))

    self.len = n_samples

  @staticmethod
  def forward(w, x, b):
    return torch.mm(x, w) + b

  def __getitem__(self, index):
    return self.x[index], self.y[index]

  def __len__(self):
    return self.len



In [None]:
data_set = Data2D()
criterion = torch.nn.MSELoss()

trainloader = torch.utils.data.DataLoader(
    dataset=data_set,
    batch_size=2
)

model = LR(input_size=2, output_size=1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

LOSS, epoch_list = [], []

for epoch in range(100):
  for k, (x, y) in enumerate(trainloader):
    yhat = model(x)
    loss = criterion(yhat, y)

    epoch_list.append(epoch + k/len(trainloader))
    LOSS.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()



In [None]:
plt.loglog(epoch_list, LOSS, '.-')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.grid(True)



Multiple-dimension example:
* 1-dimensional input
* 10-dimensional output



In [None]:
torch.manual_seed(1)

model = LR(1, 10)
model(torch.tensor([1.0]))



In [None]:
list(
    model.parameters()
)



In [None]:
x = torch.tensor([[1.0]])



Predict one sample



In [None]:
yhat = model(x)
yhat


Predict three samples



In [None]:
X = torch.tensor([
    [1.0],
    [1.0],
    [3.0],
])



In [None]:
Yhat = model(X)
Yhat



$$
\mathbb{y}_{n \times m} = \mathbb{X}_{n \times d}\mathbb{W}_{d \times m} + b
$$



Training with multiple output



$$
\begin{align}
\mathbb{y}_{n \times m} &= \mathbb{X}_{n \times d}\mathbb{W}_{d \times m} + \mathbb{b}_{-1\times m} \\
l(\mathbb{W}, \mathbb{b}) &= \frac{1}{N}\sum_{i=1}^{N}{
  \left\Vert
    \mathbb{y}_i-(\mathbb{x}_i\mathbb{W}_{d \times m}+\mathbb{b})
  \right\Vert^2
}
\end{align}
$$



Updating the weight $\mathbb{W}$ and bias $\mathbb{b}$ from the $k$-th step to the $(k+1)$-st, with learning rate $\eta$
$$
\begin{align}
  \mathbb{W}^{k+1} &= \mathbb{W}^k-\eta \nabla_{\mathbb{W}} l(\mathbb{W}^k, \mathbb{b}^k) \\
  \mathbb{b}^{k+1} &= \mathbb{b}^k-\eta \frac{\partial}{\partial \mathbb{b}} l(\mathbb{W}^k, \mathbb{b}^k)
\end{align}
$$
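One update step can be written out by hand as a rough sketch. The data, the learning rate `eta`, and the use of `torch.nn.functional.mse_loss` are assumptions for illustration. Note that `mse_loss` averages over all $N \times m$ elements, so it differs from the norm-based loss above by a constant factor $1/m$.

```python
import torch

torch.manual_seed(1)
model = torch.nn.Linear(2, 2)                 # d = 2 inputs, m = 2 outputs
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])    # illustrative batch
y = torch.tensor([[0.0, 1.0], [1.0, 0.0]])
eta = 0.1                                     # learning rate

loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()                               # gradients at step k

W_old = model.weight.detach().clone()
b_old = model.bias.detach().clone()

# step k -> k+1: parameter minus eta times its gradient
with torch.no_grad():
    model.weight -= eta * model.weight.grad
    model.bias -= eta * model.bias.grad
```

This is the same arithmetic that `torch.optim.SGD` performs in `optimizer.step()` when no momentum or weight decay is configured.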


`Data2D` class for multiple input dimensions and multiple output dimensions



In [None]:
class Data2D(torch.utils.data.Dataset):
  def __init__(self, n_samples=21, in_dim=2, out_dim=2, xlim=1, noise=0.1):
    self.x = torch.zeros(n_samples, in_dim)
    for idim in range(in_dim):
      self.x[:, idim] = torch.linspace(-xlim, xlim, n_samples)

    self.w = torch.tensor(
      [
        [float((-1)**odim) for odim in range(out_dim)]
          for idim in range(in_dim)
      ]
    )

    self.b = torch.tensor(
      [
        [float((-1)**odim) for odim in range(out_dim)]
      ]
    )

    # noise-free target f = x W + b, then add independent noise per output
    self.f = self.forward(self.w, self.x, self.b)
    self.y = self.f + noise * torch.randn((n_samples, out_dim))

    self.len = n_samples

  @staticmethod
  def forward(w, x, b):
    return torch.mm(x, w) + b

  def __getitem__(self, index):
    return self.x[index], self.y[index]

  def __len__(self):
    return self.len



Training steps for MIMO



In [None]:
n_input_dimension = 2
n_output_dimension = 2
n_epoch = 100
n_batch = 5



Training using Mini-Batch Gradient Descent



In [None]:
data_set = Data2D(in_dim=n_input_dimension, out_dim=n_output_dimension)
criterion = torch.nn.MSELoss()

trainloader = torch.utils.data.DataLoader(
    dataset=data_set,
    batch_size=n_batch
)

model = LR(input_size=n_input_dimension, output_size=n_output_dimension)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

LOSS, epoch_list = [], []

for epoch in range(n_epoch):
  for k, (x, y) in enumerate(trainloader):
    yhat = model(x)
    loss = criterion(yhat, y)

    epoch_list.append(epoch + k/len(trainloader))
    LOSS.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()



In [None]:
plt.loglog(epoch_list, LOSS, '.-')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.grid(True)



### 5.0 Linear Classifiers



#### Linear Classifiers
* A linear function always gives continuous numbers: 3, -2, 0, ...
* To map these numbers to the classes that we want to fit, we need a **threshold function**



#### Logistic Regression for classification
* Uses the sigmoid function
$$
\sigma(z)=\frac{1}{1+e^{-z}}
$$
* The sigmoid smoothly connects 0 and 1
* Pass the result of the linear function through $\sigma(z)$ to get the classification
* The output can be interpreted as a probability
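A small sketch of turning the sigmoid output into a class label. The input values and the 0.5 cut-off are assumptions chosen for illustration.

```python
import torch

z = torch.tensor([3.0, -2.0, 0.5])   # raw outputs of a linear function
prob = torch.sigmoid(z)              # squashed into the interval (0, 1)
label = (prob > 0.5).long()          # class 1 if the probability exceeds 0.5

print(prob)
print(label)
```

Since the sigmoid equals 0.5 at `z = 0`, thresholding the probability at 0.5 is the same as thresholding `z` at 0.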



### 5.1 Logistic Regression : Prediction



#### Logistic Function



* scalar $x$
$$
\begin{align}
z &= wx + b \\
\hat y &= \sigma(z)
\end{align}
$$
* vector $\mathbb x$
$$
\begin{align}
z &= \mathbb x\mathbb w + b \\
\hat y &= \sigma(z)
\end{align}
$$



Get `sig()` function from `nn.Sigmoid()` and plot $z$ vs $\sigma(z)$



In [None]:
import torch
import torch.nn
import matplotlib.pyplot as plt

z = torch.linspace(-100, 100, 2000+1).view(-1, 1)

sig = torch.nn.Sigmoid()

yhat = sig(z)

plt.plot(z.numpy(), yhat.numpy())
plt.xlabel('z')
plt.ylabel(r'$\sigma(z)$')  # raw string keeps the backslash for LaTeX
plt.grid(True)



Plot $z$ vs $\sigma(z)$ using `torch.sigmoid()`



In [None]:
import torch
import matplotlib.pyplot as plt

z = torch.linspace(-100, 100, 2000+1).view(-1, 1)
yhat = torch.sigmoid(z)

plt.plot(z.numpy(), yhat.numpy())
plt.xlabel('z')
plt.ylabel(r'$\sigma(z)$')  # raw string keeps the backslash for LaTeX
plt.grid(True)



#### `nn.Sequential()`
* Simplifies implementing the logistic regression model
* Pass the `nn.Linear()` and `nn.Sigmoid()` module instances to `nn.Sequential()`, which applies them in that order



In [None]:
import torch.nn

torch.manual_seed(1)


sequential_model = torch.nn.Sequential(
    torch.nn.Linear(1, 1),
    torch.nn.Sigmoid()
)
yhat = sequential_model(z)



#### Build a custom model using `nn.Module`



In [None]:
import torch.nn


class logistic_regression(torch.nn.Module):
  def __init__(self, in_size, out_size=1):
    super(logistic_regression, self).__init__()
    self.linear = torch.nn.Linear(in_size, out_size)

  def forward(self, x):
    return torch.sigmoid(self.linear(x))



In [None]:
custom_model = logistic_regression(1)
yhat = custom_model(z)



#### Making a Prediction
Parameters: weight and bias


In [None]:
print(list(sequential_model.parameters()))



single sample



In [None]:
x = torch.tensor([[1.0]])

yhat = sequential_model(x)
yhat



multiple samples



In [None]:
x = torch.tensor([
    [1.0],
    [100.0],
])

yhat = sequential_model(x)
yhat



#### Multidimensional Logistic Regression



In [None]:
torch.manual_seed(1)



Input dimension = 2



In [None]:
custom_2d_model = logistic_regression(2)



or


In [None]:
sequential_2d_model = torch.nn.Sequential(
    torch.nn.Linear(2, 1),
    torch.nn.Sigmoid(),
)



In [None]:
print(list(sequential_2d_model.parameters()))



single sample



In [None]:
x = torch.tensor([[1.0, 1.0]])

yhat = sequential_2d_model(x)
yhat



multiple samples



In [None]:
x = torch.tensor([
    [1.0, 1.0],
    [1.0, 2.0],
    [1.0, 3.0],
])

yhat = sequential_2d_model(x)
yhat



### 5.2 Bernoulli Distribution Maximum Likelihood Estimation (MLE)



#### Biased coin flip
$\theta = 0.2$

face | outcome | probability
:-----:|:-----:|:-----:
heads | `y = 1` | $P(1)=0.2$
tails | `y = 0` | $P(0)=0.8$



Likelihood of observing head, head, tail

head | head | tail | likelihood
:----:|:----:|:----:|:----:
$\theta$ | $\theta$ | $1-\theta$ | $\theta^2 (1-\theta)$
0.2 | 0.2 | 0.8 | 0.032



Bernoulli distribution
$$
p(y|\theta) = \theta ^ y (1 - \theta) ^ {1-y}
$$

face | outcome | probability
:-----:|:-----:|:-----:
heads | `y = 1` | $p(y=1|\theta)=\theta ^ 1(1-\theta)^{1-1} = \theta$
tails | `y = 0` | $p(y=0|\theta)=\theta ^ 0(1-\theta)^{1-0} = (1-\theta)$



Likelihood function
$$
p(Y|\theta)=\prod_{n=1}^{N}{p\left(y_n|\theta\right)}
=\prod_{n=1}^{N}{\theta ^ {y_n} (1 - \theta) ^ {1-y_n}}
$$
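The likelihood of the head, head, tail sequence from the table can be reproduced numerically. This is a minimal sketch; the tensor `Y` encodes the three flips as 1, 1, 0.

```python
import torch

theta = 0.2
Y = torch.tensor([1.0, 1.0, 0.0])   # head, head, tail

# product over n of theta^y_n * (1 - theta)^(1 - y_n)
likelihood = torch.prod(theta ** Y * (1 - theta) ** (1 - Y))
print(likelihood.item())            # approximately 0.032
```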



Maximizing the likelihood
$$
\hat \theta = \underset{\theta}{\operatorname{argmax}}\left(p(Y|\theta)\right)
$$



Maximizing the log-likelihood function gives the same $\hat\theta$ because $\ln$ is monotonically increasing
$$
\hat \theta = \underset{\theta}{\operatorname{argmax}}\left(\ln p(Y|\theta)\right)
$$



Log-likelihood function
$$
l(\theta)
=\ln p(Y|\theta)
=\sum_{n=1}^{N}{\left[y_n \ln\theta +(1-y_n) \ln(1 - \theta)\right]}
$$
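The maximizer of the log-likelihood can be located with a simple grid search. This is a rough sketch; the grid of candidate values `thetas` and the three-flip sample `Y` are assumptions for illustration.

```python
import torch

Y = torch.tensor([1.0, 1.0, 0.0])          # head, head, tail
thetas = torch.linspace(0.01, 0.99, 99)    # candidate values of theta

# l(theta) = sum_n [ y_n ln(theta) + (1 - y_n) ln(1 - theta) ]
log_lik = (Y.sum() * torch.log(thetas)
           + (1 - Y).sum() * torch.log(1 - thetas))

theta_hat = thetas[torch.argmax(log_lik)]
print(theta_hat.item(), Y.mean().item())
```

The grid maximizer lands next to the sample mean of `Y`, which is the known closed-form maximum likelihood estimate for a Bernoulli parameter.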



### 5.3 Logistic Regression Cross Entropy Loss



#### Problems with Mean Square Error (MSE)
* The MSE cost function can be flat in some regions, where the gradient is close to zero and training stalls



Cost function
$$
l(w, b) = \frac{1}{N}\sum_{i=1}^{N}{
  \left(
    y_i-\sigma(w x_i+b)
  \right)^2
}
$$
Assume $w = 1$ to simplify
$$
l(b) = \frac{1}{N}\sum_{i=1}^{N}{
  \left(
    y_i-\sigma(x_i+b)
  \right)^2
}
$$
Also consider the threshold function
$$
l(b) = \frac{1}{N}\sum_{i=1}^{N}{
  \left(
    y_i-\operatorname{THR}(x_i+b)
  \right)^2
}
$$
$b$ at the next training step (the superscript is the step index, not a power)
$$
b^2=b^1 - \eta \frac{d}{db}l(b^1)
$$
* With the threshold function, the cost function is piecewise constant in $b$, so its slope is zero almost everywhere and $b$ is not updated
* With the sigmoid function $\sigma()$, the cost function has a nonzero slope
* Thus it is easier to train
* Even so, in a higher dimensional space, some regions of the MSE cost function may still be flat
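The flat region can be seen even in one dimension when the bias starts far from a good value. This is a minimal sketch with a single hypothetical sample, $w = 1$, and a deliberately bad initial bias of $-10$.

```python
import torch

x = torch.tensor([[1.0]])
y = torch.tensor([[1.0]])                      # true class is 1
b = torch.tensor([-10.0], requires_grad=True)  # badly initialized bias, w = 1

yhat = torch.sigmoid(x + b)                    # sigma(-9), very close to 0
loss = torch.mean((y - yhat) ** 2)             # MSE is close to its maximum of 1
loss.backward()

print(loss.item(), b.grad.item())
```

The loss is almost as large as it can be, yet the gradient with respect to `b` is tiny, so a gradient descent step barely moves `b`.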



#### Maximum Likelihood



#### Logistic Regression Cross Entropy
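
A sketch of how the cross entropy loss follows from the Bernoulli log-likelihood, under the usual assumption that each label $y_n$ is Bernoulli distributed with parameter $\theta = \sigma(w x_n + b)$. Substituting this $\theta$ into the log-likelihood, changing the sign so that maximizing becomes minimizing, and averaging over the $N$ samples gives

$$
l(w, b) = -\frac{1}{N}\sum_{n=1}^{N}{
  \left[
    y_n \ln\sigma(w x_n + b) + (1-y_n) \ln\left(1 - \sigma(w x_n + b)\right)
  \right]
}
$$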



#### PyTorch
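
A minimal sketch of training logistic regression with cross entropy in PyTorch, assuming `torch.nn.BCELoss` as the criterion. The toy dataset, the learning rate, and the number of epochs are invented for illustration and are not from the course.

```python
import torch

torch.manual_seed(0)

# Toy 1-D data (an assumption): class 1 when x > 0, class 0 otherwise
x = torch.linspace(-1, 1, 20).view(-1, 1)
y = (x > 0).float()

model = torch.nn.Sequential(
    torch.nn.Linear(1, 1),
    torch.nn.Sigmoid(),
)
criterion = torch.nn.BCELoss()   # binary cross entropy on probabilities
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

losses = []
for epoch in range(200):
    loss = criterion(model(x), y)
    losses.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# fraction of samples classified correctly with a 0.5 threshold
accuracy = ((model(x) > 0.5).float() == y).float().mean().item()
print(losses[0], losses[-1], accuracy)
```

`BCELoss` expects probabilities in $[0, 1]$, which is why the model ends with `Sigmoid()`.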

