<a href="https://colab.research.google.com/github/hikmatfarhat-ndu/pytorch/blob/main/dl_intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Supervised Learning

Machine learning tasks can be loosely grouped into three categories:
1. Supervised Learning
1. Unsupervised Learning
1. Reinforcement Learning

In this workshop we will mostly practice **supervised learning**. In supervised learning we are given N data points

$$Data=\{(x_1,y_1),\ldots,(x_N,y_N)\}$$ 

presumably generated (or sampled) by some __unknown__ function __y=f(x)__. Our goal is to __learn__ (an approximation of) __f(x)__. If we succeed, then for any input __x__ we can compute (an approximation of) __y=f(x)__.
We will see two types of supervised learning: __classification__ and **regression**. In the first case __y__ belongs to a discrete set of classes _C_; in the second, __y__ is a continuous value.
In this notebook we give a first example of **supervised learning**. We are given a set of (image, label) pairs (CIFAR10), where each image belongs to one of __ten__ classes (ship, horse, car, etc.), so each label is an integer between 0 and 9 denoting the class of the image. For example, an image with an associated label of 8 is that of a ship.

To simplify matters we will group all "machines" (ship, car,...) into one class and all living things (horse, dog,...) into another, which turns the task into a binary classification problem.


In [27]:
import torch 
import torchvision as vision

In [28]:
cifar10_train=vision.datasets.CIFAR10(".",download=True)
cifar10_test=vision.datasets.CIFAR10(".",download=True,train=False)

Files already downloaded and verified
Files already downloaded and verified


Next we create torch **tensors** from the datasets. For now, we think of a torch **tensor** as a multidimensional array.

**Note**: the pixel values are divided by the maximal value (255) so that they lie between 0 and 1. This is often done to aid convergence.

In [29]:
img_train=torch.tensor(cifar10_train.data,dtype=torch.float32)/255.
img_test=torch.tensor(cifar10_test.data,dtype=torch.float32)/255.
label_train=torch.tensor(cifar10_train.targets,dtype=torch.float32)
label_test=torch.tensor(cifar10_test.targets,dtype=torch.float32)

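As a small illustration of the scaling step, the sketch below applies the same division to a made-up batch of 8-bit images. The tensor `raw` and its shape are assumptions chosen for the example, not the CIFAR10 data.

```python
import torch

# Hypothetical stand-in for a batch of 8-bit RGB images (values 0..255).
raw = torch.randint(0, 256, (4, 32, 32, 3), dtype=torch.uint8)

# Convert to float first, then divide by the maximal 8-bit value.
scaled = raw.to(torch.float32) / 255.0

print(scaled.dtype)                               # torch.float32
print(scaled.min().item(), scaled.max().item())   # both lie in [0, 1]
```

After the division every pixel lies in the interval [0, 1], whatever the original image content.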


## Logistic Regression

In this module we introduce Logistic Regression, which can be regarded as the **simplest neural network**: a single "neuron". This type of network is sometimes called a Perceptron, although the learning method used here differs from the way a Perceptron learns.

As can be seen from the figure below, the input is a vector of size _n_ that feeds a single unit (a neuron or perceptron). To obtain the output we compute the **dot** product between the weight vector **W** and the input **x**, add the bias _b_, and feed the result into some (usually nonlinear) function _f_:

$$
\begin{align*}
z&=\sum_iw_i\cdot x_i+b\\
\hat{y}(x)&=f(z)
\end{align*}
$$

Since $z$ depends on $w$ and $b$, so does $\hat{y}$. The input and _f_ are known whereas _W_ and _b_ are parameters to be determined. Our goal is to find the _optimal_ _W_ and _b_ such that the output is as *close as possible* to the label associated with the input.

![title](https://github.com/hikmatfarhat-ndu/CSC645/blob/master/figures/perceptron.png?raw=1)

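The computation performed by the single neuron can be sketched in a few lines. The input size, the random values and the choice of the sigmoid for _f_ are assumptions made for this illustration only.

```python
import torch

torch.manual_seed(0)

n = 5                      # input size (illustrative)
x = torch.rand(n)          # input vector
w = torch.rand(n)          # weight vector, one weight per input
b = torch.tensor(0.1)      # bias

z = torch.dot(w, x) + b    # z = sum_i w_i * x_i + b
y_hat = torch.sigmoid(z)   # f is taken to be the sigmoid here

print(z.item(), y_hat.item())
```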
How is **as close as possible** defined? The dataset is usually a set of pairs $(x,y)$. We define the loss as the **deviation** between the label $y$ and the result $\hat{y}=f(z)$:

$$loss=E_{w,b}(y,\hat{y})$$

The function $E$ depends on the problem (for example binary cross entropy, mean squared error,...)

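To make the two examples of $E$ concrete, the sketch below evaluates binary cross entropy and mean squared error on a handful of made-up (label, output) pairs. The numbers are hypothetical; the hand-written cross entropy is compared with `torch.nn.BCELoss`, which averages over the samples by default.

```python
import torch

y = torch.tensor([1.0, 0.0, 1.0, 0.0])       # labels
y_hat = torch.tensor([0.9, 0.2, 0.6, 0.4])   # hypothetical outputs in (0, 1)

# Binary cross entropy averaged over the samples.
bce_manual = -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat)).mean()
bce_torch = torch.nn.BCELoss()(y_hat, y)

# Mean squared error on the same pairs, for comparison.
mse = ((y - y_hat) ** 2).mean()

print(bce_manual.item(), bce_torch.item(), mse.item())
```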
Note that $E$ depends on the parameters $w,b$. Our goal is to find the **optimal** $w,b$ such that the loss is minimal. From calculus we know that at a minimum (or maximum) of a differentiable function the derivative is zero, so we look for the parameter values where the derivative vanishes.

## Gradient Descent


<center>
<img src="https://github.com/hikmatfarhat-ndu/CSC645/blob/master/figures/gradient-descent.png?raw=1" width="350">
</center>

Now that we have an expression to optimize we need a method to find the optimal parameters. Typically, one computes the gradient and the optimal value corresponds to the value  of the parameters when the gradient vanishes. Unfortunately, for logistic regression there is __no closed form solution__ so we seek a numerical method to find the optimal parameters.

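To see where the difficulty comes from, assume that _f_ is the sigmoid $\sigma$ and that $E$ is the binary cross entropy (the choices made in the code later in this notebook). For a single pair $(x,y)$ the chain rule gives

$$
\begin{align*}
E&=-\left[y\log\hat{y}+(1-y)\log(1-\hat{y})\right],\qquad \hat{y}=\sigma(z)=\frac{1}{1+e^{-z}}\\
\frac{\partial E}{\partial z}&=\hat{y}-y\\
\frac{\partial E}{\partial w_i}&=(\hat{y}-y)\cdot x_i\\
\frac{\partial E}{\partial b}&=\hat{y}-y
\end{align*}
$$

Averaging over the $N$ data points and setting the derivatives to zero gives the conditions below (here $x_{n,i}$ denotes component $i$ of the $n$-th input). The unknowns appear inside the sigmoid, so the equations are nonlinear in $w$ and $b$ and in general cannot be solved algebraically.

$$
\begin{align*}
\frac{1}{N}\sum_{n=1}^{N}\left(\sigma\Big(\sum_i w_i\cdot x_{n,i}+b\Big)-y_n\right)\cdot x_{n,i}&=0\quad\text{for every } i\\
\frac{1}{N}\sum_{n=1}^{N}\left(\sigma\Big(\sum_i w_i\cdot x_{n,i}+b\Big)-y_n\right)&=0
\end{align*}
$$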
Our goal is to find the **optimal** values for _W_ and _b_. To do so we start from some _arbitrary_ values and then use the derivatives of $E$ to improve them step by step.
In the figure above we show an arbitrary function _E(w)_. For a given value of _w_ we compute the derivative (slope) of _E_ with respect to _w_ (two different values are shown). At the point on the left the slope is negative, so we need to **increase** the value of _w_ to move toward the minimum, whereas at the point on the right the slope is positive, so we have to **decrease** the value of _w_.

In general we "update" the values of _w_ and _b_ as follows

$$
\begin{align*}
  w=w-\alpha\cdot \frac{\partial E}{\partial w}\\
  b=b-\alpha\cdot \frac{\partial E}{\partial b}
\end{align*}
$$

where $\alpha$ is a parameter chosen by us, called the __learning rate__.

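The update rule can be tried on a one-dimensional example. The quadratic loss `E`, the starting value and the learning rate below are assumptions picked so that the minimum (at _w_=3) is known in advance.

```python
import torch

# Illustrative loss with a known minimum at w = 3.
def E(w):
    return (w - 3.0) ** 2

w = torch.tensor(0.0, requires_grad=True)   # arbitrary starting value
alpha = 0.1                                 # learning rate

for step in range(100):
    loss = E(w)
    (dw,) = torch.autograd.grad(loss, [w])  # dE/dw at the current w
    with torch.no_grad():
        w -= alpha * dw                     # w = w - alpha * dE/dw

print(w.item())  # close to 3.0
```

Starting to the left of the minimum the slope is negative, so each update increases _w_; a starting value to the right of 3 would be decreased instead.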
### Flattening the images
Each image is stored as an array of dimensions (32,32,3) (32 height, 32 width, 3 channels). To feed the images to our "neuron" we reshape each one into a vector of dimension 32x32x3=3072.

In [30]:
dim=3*32*32
train_samples=50000
test_samples=10000
img_train=img_train.reshape(train_samples,dim)
img_test=img_test.reshape(test_samples,dim)

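The effect of the reshape can be checked on a tiny made-up batch. The tensor `batch` is hypothetical; the flattened length is the product of the non-batch dimensions regardless of their order, and `torch.flatten` with `start_dim=1` is shown as an equivalent way to write the same operation.

```python
import torch

# Hypothetical mini-batch: 2 images with 32*32*3 values each.
batch = torch.rand(2, 32, 32, 3)

flat_a = batch.reshape(2, 32 * 32 * 3)       # explicit target shape
flat_b = torch.flatten(batch, start_dim=1)   # keep dim 0, merge the rest

print(flat_a.shape)                  # torch.Size([2, 3072])
print(torch.equal(flat_a, flat_b))   # True
```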
All "machines" are given label 1 and all living things label 0.

In [31]:
#airplane=0,car=1,bird=2,cat=3,deer=4,dog=5,frog=6,horse=7,ship=8,truck=9
# class ids of the "machines"; floats to match the dtype of the label tensors
features=torch.tensor([0.,1.,8.,9.])
# 1 where the original label is a machine, 0 otherwise
label_train=torch.isin(label_train,features).float()
label_test=torch.isin(label_test,features).float()

In [32]:
# the two classes are not balanced: count the training images labelled 1 (machines)
torch.count_nonzero(label_train)

tensor(20000)

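The count above means that 20000 of the 50000 training labels (40%) are machines, so a classifier that always answers "living thing" would already be right about 60% of the time. The sketch below computes both numbers on a small stand-in label tensor with the same proportions; the tensor `labels` is an assumption made for the example.

```python
import torch

# Stand-in labels with the same proportions: 2 of every 5 are machines.
labels = torch.tensor([1., 1., 0., 0., 0.] * 4)

positive_fraction = labels.mean().item()   # fraction labelled 1
majority_baseline = max(positive_fraction, 1 - positive_fraction)

print(positive_fraction)   # approximately 0.4
print(majority_baseline)   # approximately 0.6
```

Any accuracy reported for the trained model is best judged against this majority-class baseline rather than against 50%.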
### Initialize the parameters

In [38]:
dim=3*32*32   # number of input features per image
# small random initial weights: dividing by train_samples keeps the initial z close to 0
weights=torch.rand(dim,dtype=torch.float32)/train_samples
weights.requires_grad_(True)
bias=torch.tensor(0.,requires_grad=True,dtype=torch.float32)

In [39]:
rate=0.015   # learning rate
loss_fn=torch.nn.BCELoss()

for i in range(1000):
  # forward pass: z=Wx+b followed by the sigmoid
  y_hat=torch.matmul(img_train,weights)+bias
  y_hat=torch.sigmoid(y_hat)
  loss=loss_fn(y_hat,label_train)
  # gradients of the loss with respect to the parameters
  dw,db=torch.autograd.grad(loss,[weights,bias])

  if(i%100==0):
    print(loss)
  # gradient descent update, performed outside of autograd tracking
  # (loss.backward() with torch.optim.SGD applies the same update rule)
  with torch.no_grad():
    weights-=rate*dw
    bias-=rate*db

tensor(0.6942, grad_fn=<BinaryCrossEntropyBackward0>)
tensor(0.4717, grad_fn=<BinaryCrossEntropyBackward0>)
tensor(0.4585, grad_fn=<BinaryCrossEntropyBackward0>)
tensor(0.4523, grad_fn=<BinaryCrossEntropyBackward0>)
tensor(0.4480, grad_fn=<BinaryCrossEntropyBackward0>)
tensor(0.4448, grad_fn=<BinaryCrossEntropyBackward0>)
tensor(0.4423, grad_fn=<BinaryCrossEntropyBackward0>)
tensor(0.4401, grad_fn=<BinaryCrossEntropyBackward0>)
tensor(0.4384, grad_fn=<BinaryCrossEntropyBackward0>)
tensor(0.4368, grad_fn=<BinaryCrossEntropyBackward0>)

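The gradients returned by `torch.autograd.grad` can be checked against the analytic expressions for a sigmoid output with mean binary cross entropy, namely the average of $(\hat{y}-y)\cdot x$ for the weights and the average of $(\hat{y}-y)$ for the bias. The sketch below does this on a small synthetic problem; the sizes and the random data are assumptions made for the example.

```python
import torch

torch.manual_seed(0)

# Small synthetic problem (hypothetical sizes): 8 samples, 4 features.
X = torch.rand(8, 4)
y = torch.randint(0, 2, (8,)).float()
w = torch.rand(4, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

y_hat = torch.sigmoid(torch.matmul(X, w) + b)
loss = torch.nn.BCELoss()(y_hat, y)

# Gradients from automatic differentiation.
dw, db = torch.autograd.grad(loss, [w, b])

# Analytic gradients for sigmoid + mean binary cross entropy.
err = (y_hat - y).detach()
dw_manual = torch.matmul(X.T, err) / X.shape[0]
db_manual = err.mean()

print(torch.allclose(dw, dw_manual, atol=1e-5))
print(torch.allclose(db, db_manual, atol=1e-5))
```

Agreement between the two sets of values indicates that the update in the training loop follows the gradient of the loss.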

In [40]:
def predict(X):
    """Return an (m,1) tensor of 0/1 predictions (1 = machine) for the m rows of X."""
    m = X.shape[0]
    print(m)

    # probability that each picture shows a machine
    y_hat=torch.matmul(X,weights)+bias
    y_hat=torch.sigmoid(y_hat)
    print(y_hat.size())

    # threshold the probabilities at 0.5 to obtain the predicted class
    Y_prediction = (y_hat>=0.5).float().reshape(m,1)

    return Y_prediction

In [41]:
img_test.size()

torch.Size([10000, 3072])

In [42]:
result=predict(img_test)

10000
torch.Size([10000])


In [43]:
Y_prediction_test = predict(img_test).squeeze()
Y_prediction_train = predict(img_train).squeeze()
#print("train accuracy:"+str((100 - torch.mean(torch.abs(Y_prediction_train - label_train)) * 100)))
print("test accuracy:"+str((100 - torch.mean(torch.abs(Y_prediction_test - label_test)) * 100)))

10000
torch.Size([10000])
50000
torch.Size([50000])
test accuracy:tensor(81.1900)

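The accuracy expression used above relies on the fact that, for 0/1 values, the absolute difference is 1 exactly where prediction and label disagree. The sketch below, on made-up tensors, compares it with the more direct "fraction of matching entries" form.

```python
import torch

# Hypothetical predictions and labels, both holding 0/1 values.
pred = torch.tensor([1., 0., 1., 1., 0.])
label = torch.tensor([1., 0., 0., 1., 1.])

# For 0/1 values |pred - label| is 1 exactly where they disagree.
acc_abs = 100 - torch.mean(torch.abs(pred - label)) * 100
# Same quantity written as the percentage of matching entries.
acc_eq = (pred == label).float().mean() * 100

print(acc_abs.item(), acc_eq.item())  # both about 60.0
```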

In [8]:

import torch.nn as nn

class Net(nn.Module):
  def __init__(self):
    super().__init__()
    self.input_size=img_train.size()[1]
    self.output_size=1
    # bias is True by default. It is included here for illustration
    self.layer=nn.Linear(self.input_size,self.output_size,bias=True)
  def forward(self,x):
    # same computation as torch.sigmoid(torch.matmul(x,w)+b),
    # with w and b stored inside self.layer
    y_hat=self.layer(x)
    y_hat=torch.sigmoid(y_hat)
    return y_hat

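A module of this kind is usually trained with an optimizer object instead of manual updates. The sketch below shows one possible loop on synthetic data; the class name `TinyNet`, the data, the learning rate and the number of steps are all assumptions made for the illustration and are not taken from the notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic stand-in data (not CIFAR10): 200 samples, 10 features.
X = torch.rand(200, 10)
y = (X.sum(dim=1) > 5.0).float()

class TinyNet(nn.Module):          # hypothetical name, same structure as Net
    def __init__(self, input_size):
        super().__init__()
        self.layer = nn.Linear(input_size, 1)

    def forward(self, x):
        return torch.sigmoid(self.layer(x))

model = TinyNet(X.shape[1])
optimizer = optim.SGD(model.parameters(), lr=0.5)
loss_fn = nn.BCELoss()

first_loss = None
for i in range(500):
    optimizer.zero_grad()          # clear gradients from the previous step
    y_hat = model(X).squeeze(1)    # shape (200, 1) -> (200,)
    loss = loss_fn(y_hat, y)
    loss.backward()                # populate .grad on the parameters
    optimizer.step()               # p = p - lr * p.grad for each parameter
    if first_loss is None:
        first_loss = loss.item()

print(first_loss, loss.item())
```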
In [9]:
#img_train=img_train.cuda()
#img_test=img_test.cuda()
#label_train=label_train.cuda()
#label_test=label_test.cuda()
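
The commented-out lines above move the tensors to a GPU unconditionally. A device-agnostic variant, sketched below with made-up tensors, selects the GPU only when one is available; the names `images` and `labels` are stand-ins for the notebook's tensors.

```python
import torch

# Pick the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-ins for the image and label tensors.
images = torch.rand(4, 3072)
labels = torch.zeros(4)

images = images.to(device)
labels = labels.to(device)

print(images.device.type, labels.device.type)
```

The model parameters have to live on the same device as the data, for example by calling `.to(device)` on the module as well.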