## Lecture 2.1: Regression and Classification

### Linear Regression

Regression Model: </br>
&ensp;&ensp;&ensp;&ensp;$f_{\theta}\ :\ \mathbb{R}^{n}\ \rightarrow\ \mathbb{R}^{d}$

Linear Regression: </br>
&ensp;&ensp;&ensp;&ensp;$f_{\theta}(x)\ =\ Wx\ +\ b$

Parameters: </br>
&ensp;&ensp;&ensp;&ensp;$\theta\ =\ (W,\ b)$

### Linear Regression Example

Temperature Forecast:

$f(x)$: average temperature on day $x$

The day is the input and the temperature is the output. The weight determines the slope of the line, and the bias determines how far the line is shifted up or down.
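
To make the roles of the weight and the bias concrete, here is a minimal sketch of a one-dimensional forecast model with hand-picked parameters. The slope of 0.1 degrees per day, the offset of 15 degrees, and the name `forecast` are illustrative assumptions, not fitted values from the lecture.

```python
import torch
import torch.nn as nn

# One input (day index), one output (temperature).
forecast = nn.Linear(1, 1)

# Hypothetical parameters, set by hand for illustration only.
with torch.no_grad():
    forecast.weight.fill_(0.1)   # slope: +0.1 degrees per day
    forecast.bias.fill_(15.0)    # offset: 15 degrees on day 0

days = torch.tensor([[0.0], [10.0], [20.0]])  # shape (3, 1): a batch of 3 days
temps = forecast(days)
print(temps.squeeze(1))  # values close to 15, 16 and 17
```

Changing the weight tilts the line, while changing the bias moves every prediction by the same amount.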

```python
import torch
import torch.nn as nn
```

```python
model = nn.Linear(10, 1)  # linear layer: 10 input features, 1 output
```

```python
print(model)
print(model.weight)
print(model.bias)
```

```
Linear(in_features=10, out_features=1, bias=True)
Parameter containing:
tensor([[-0.2771, -0.1729,  0.0368,  0.2826, -0.2734, -0.1930, -0.0183, -0.0631,
          0.1511,  0.3114]], requires_grad=True)
Parameter containing:
tensor([-0.2628], requires_grad=True)
```

```python
# x = torch.ones(10)
y = torch.zeros(10)
# model(x)
model(y)  # an all-zero input returns the bias
```

```
tensor([-0.2628], grad_fn=<ViewBackward0>)
```

### Linear Regression in PyTorch

Define a linear regression model:

```python
linear = torch.nn.Linear(4, 2)
print(f"{linear.weight=}")
print(f"{linear.bias=}")

x = torch.as_tensor([1, 2, 3, 4], dtype=torch.float32)
print(f"{linear(x)=}")
```

### Linear Regression: Limitation

Cannot deal with non-linear patterns:
* cyclic functions
* quadratic functions
* ...
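
As a rough illustration of this limitation, the sketch below fits the best possible line to quadratic data. It uses the closed-form least-squares solver `torch.linalg.lstsq` rather than training; the data range and variable names are assumptions made for the example.

```python
import torch

# Quadratic data: y = x^2 on a symmetric interval.
x = torch.linspace(-1.0, 1.0, 101).unsqueeze(1)   # shape (101, 1)
y = x ** 2

# Closed-form least-squares fit of y ≈ w * x + b.
# The column of ones lets the solver estimate the bias b.
A = torch.cat([x, torch.ones_like(x)], dim=1)     # shape (101, 2)
solution = torch.linalg.lstsq(A, y).solution      # shape (2, 1)
w, b = solution[0, 0].item(), solution[1, 0].item()

# Mean squared error of the best line.
mse = ((A @ solution - y) ** 2).mean().item()
print(f"{w=:.4f} {b=:.4f} {mse=:.4f}")
```

The best line is essentially flat (slope near zero), so it predicts roughly the mean of $y$ everywhere and leaves a clearly non-zero error.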

### Linear Binary Classification

Binary classification model: </br>
&ensp;&ensp;&ensp;&ensp;$f_{\theta}\ :\ \mathbb{R}^{n}\ \rightarrow\ [0,\ 1]$

Linear binary classification: </br>
&ensp;&ensp;&ensp;&ensp;$f_{\theta}(x)\ =\ \sigma(Wx\ +\ b)$ </br>
&ensp;&ensp;&ensp;&ensp;$\sigma(x)\ =\ \frac{1}{1\ +\ e^{-x}}$

Parameters: </br>
&ensp;&ensp;&ensp;&ensp;$\theta\ =\ (W,\ b)$

### Linear Binary Classification: Decision Boundary

The same function as the linear regression model, wrapped in the sigmoid function: </br>
&ensp;&ensp;&ensp;&ensp;$\sigma(Wx\ +\ b)$

* The weight $(W)$ determines the orientation of the decision boundary that separates the two classes
* The bias $(b)$ determines how far the boundary is shifted
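
The decision boundary is the set of inputs where $Wx\ +\ b\ =\ 0$, which is where the sigmoid outputs exactly 0.5. The sketch below checks this with hand-picked two-dimensional parameters; the values of `W` and `b` and the helper `predict` are hypothetical.

```python
import torch

# Hypothetical 2-D parameters, chosen by hand for illustration.
W = torch.tensor([[1.0, -1.0]])   # shape (1, 2)
b = torch.tensor([0.5])

def predict(x: torch.Tensor) -> torch.Tensor:
    """Return sigma(Wx + b) for a single 2-D input x."""
    return torch.sigmoid(W @ x + b)

on_boundary = torch.tensor([1.0, 1.5])    # W x + b = 1 - 1.5 + 0.5 = 0
positive_side = torch.tensor([2.0, 0.0])  # W x + b = 2.5 > 0
negative_side = torch.tensor([0.0, 2.0])  # W x + b = -1.5 < 0

print(predict(on_boundary))    # 0.5: exactly on the decision boundary
print(predict(positive_side))  # above 0.5
print(predict(negative_side))  # below 0.5
```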

### Linear Binary Classification: An Example

Input $x$: average daily temperature </br>
Output $f(x)$: whether it will rain on Wednesday

Prediction: </br>
&ensp;&ensp;&ensp;&ensp;$P(\text{rain})\ =\ f_{\theta}(x)\ =\ \sigma(Wx\ +\ b)$

```python
import torch
import torch.nn as nn


class LinearClassifier(torch.nn.Module):
    def __init__(self, input_dim, output_dim) -> None:
        super().__init__()
        self.fc = torch.nn.Linear(input_dim, output_dim)

    def forward(self, x):
        """Return the predicted probability for each input.

        Note: never put a sigmoid directly into your model.
        It is applied here only to display probabilities.
        """
        return torch.sigmoid(self.fc(x))


model = LinearClassifier(10, 1)
print(model)
print(model.fc.weight)
print(model.fc.bias)

# x = torch.zeros(10)
# model(x)
# x = torch.ones(10)
# model(x)
x = torch.rand(100, 10)  # batch of 100 random inputs
model(x)
```

```
LinearClassifier(
  (fc): Linear(in_features=10, out_features=1, bias=True)
)
Parameter containing:
tensor([[-0.2464, -0.3095, -0.0098, -0.3001,  0.0657,  0.1902,  0.3049,  0.1928,
         -0.0161,  0.2664]], requires_grad=True)
Parameter containing:
tensor([0.0614], requires_grad=True)

tensor([[0.4595],
        [0.4951],
        [0.5111],
        [0.5644],
        [0.5526],
        [0.5000],
        [0.4993],
        [0.4741],
        [0.5962],
        [0.5643],
        [0.5184],
        [0.5854],
        [0.5774],
        [0.4447],
        [0.4841],
        [0.4898],
        [0.5456],
        [0.4944],
        [0.5013],
        [0.5198],
        [0.4783],
        [0.5244],
        [0.5278],
        [0.4887],
        [0.5849],
        [0.5138],
        [0.4937],
        [0.5652],
        [0.5292],
        [0.5430],
        [0.5397],
        [0.5353],
        [0.4708],
        [0.6335],
        [0.4633],
        [0.4829],
        [0.5538],
        [0.5512],
        [0.5572],
        [0.5787],
        [0.5576],
        [0.5796],
        [0.5577],
        [0.6657],
        [0.5417],
        [0.4775],
        [0.5007],
        [0.5625],
        [0.4658],
        [0.5454],
        [0.5207],
        [0.5249],
        [0.5864],
        [0.5058],
        [0.5123],
        ... (output truncated)
```
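
One common way to follow the advice about keeping the sigmoid out of the model is to return raw logits and let the loss apply the sigmoid. The sketch below assumes `torch.nn.BCEWithLogitsLoss` for this purpose, which is a choice made here and not something the notes specify. The class name `LinearLogitClassifier` and the random labels are hypothetical.

```python
import torch
import torch.nn as nn


class LinearLogitClassifier(nn.Module):
    """Hypothetical variant of the classifier that returns raw logits."""

    def __init__(self, input_dim: int, output_dim: int) -> None:
        super().__init__()
        self.fc = nn.Linear(input_dim, output_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)  # no sigmoid here


torch.manual_seed(0)
logit_model = LinearLogitClassifier(10, 1)
x = torch.rand(100, 10)
labels = torch.randint(0, 2, (100, 1)).float()  # made-up binary labels

logits = logit_model(x)
loss = nn.BCEWithLogitsLoss()(logits, labels)   # the sigmoid is applied inside the loss

# Apply the sigmoid explicitly only when probabilities are needed.
probs = torch.sigmoid(logits)
print(loss.item(), probs.shape)
```

Combining the sigmoid with the loss is generally more numerically stable than applying them as two separate steps.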

### Linear Binary Classification: Limitation

* Linear classifiers are not very powerful models
* Cannot deal with non-linear decision boundaries
* For deep learning, we will need to use non-linear models
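
A classic case of a non-linear decision boundary is XOR. The sketch below trains a linear classifier on the four XOR points; the optimizer, learning rate, and number of steps are arbitrary assumptions. Because no single line separates the two classes, at most three of the four points can be classified correctly, whatever the training setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# XOR: the label is 1 exactly when the two inputs differ.
x = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

linear = nn.Linear(2, 1)                      # outputs logits
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(linear.parameters(), lr=0.5)

for _ in range(500):
    optimizer.zero_grad()
    loss = loss_fn(linear(x), y)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    predictions = (linear(x) > 0).float()     # logit > 0 means probability > 0.5
accuracy = (predictions == y).float().mean().item()
print(f"{accuracy=:.2f}")  # never above 0.75 for a linear model on XOR
```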

### Linear Multi-Class Classification

Multi-class classification model: </br>
&ensp;&ensp;&ensp;&ensp;$f_{\theta}\ :\ \mathbb{R}^{n}\ \rightarrow\ \mathbb{P}^{c}\ \text{where}\ \mathbb{P}^{c}\ \subset\ \mathbb{R}_{+}^{c}\ \text{and}\ 1^{\top}y\ =\ 1\ \text{for all}\ y\ \in\ \mathbb{P}^{c}$ </br>
* Input: a real-valued vector $x\ \in\ \mathbb{R}^{n}$
* Output: a probability distribution over the $c$ possible classes

Linear multi-class classification: </br>
&ensp;&ensp;&ensp;&ensp;$f_{\theta}(x)\ =\ \text{softmax}(Wx\ +\ b)$ </br>
&ensp;&ensp;&ensp;&ensp;$\text{softmax}(v)_{i}\ =\ \frac{e^{v_{i}}}{\sum_{j=1}^{c}\ e^{v_{j}}}$ </br>
* The output consists of $c$ real-valued numbers that are all positive
* One additional constraint: the $c$ numbers sum to one
* Softmax: a function that transforms its input into class probabilities, generalizing the sigmoid from binary classification to multiple classes

Parameters: </br>
&ensp;&ensp;&ensp;&ensp;$\theta\ =\ (W,\ b)$

For multi-class classification, the linear layer always has output size $c$, where $c$ is the number of classes to distinguish.
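
A minimal sketch of such a model, assuming 10 input features and 3 classes (both sizes are arbitrary): the linear layer produces one value per class, and the softmax over the class dimension turns each row into a probability distribution.

```python
import torch
import torch.nn as nn

num_features, num_classes = 10, 3   # illustrative sizes

linear = nn.Linear(num_features, num_classes)
x = torch.rand(5, num_features)     # batch of 5 inputs

# Softmax over the class dimension turns each row into a distribution.
probs = torch.softmax(linear(x), dim=-1)

print(probs.shape)        # torch.Size([5, 3])
print(probs.sum(dim=-1))  # each entry is 1 up to rounding
```

The predicted class for each input is the index of the largest probability, for example `probs.argmax(dim=-1)`.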

### Softmax Function

For an input

$$
v\ =\ 
\begin{bmatrix}
v_{1} \\
\vdots \\
v_{c}
\end{bmatrix} \in \mathbb{R}^{c},
$$

the function $\text{softmax}\ :\ \mathbb{R}^{c}\ \rightarrow\ \mathbb{P}^{c}$, with $\mathbb{P}^{c}\ \subset\ \mathbb{R}_{+}^{c}$ and $1^{\top}y\ =\ 1$ for all $y\ \in\ \mathbb{P}^{c}$, is defined as

$$
\text{softmax}(v)\ =\ \frac{1}{\sum_{i}\ e^{v_{i}}}
\begin{bmatrix}
e^{v_{1}} \\
\vdots \\
e^{v_{c}}
\end{bmatrix}
$$
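
The formula above can be written out directly, as in the sketch below. Subtracting the maximum before exponentiating is an addition that is not part of the notes: it leaves the result unchanged but avoids overflow for large entries. The result is compared against the built-in `torch.softmax`.

```python
import torch

def softmax(v: torch.Tensor) -> torch.Tensor:
    """Softmax of a 1-D tensor, following the formula above."""
    # Subtracting the maximum leaves the result unchanged
    # but keeps exp() from overflowing for large entries.
    exp_v = torch.exp(v - v.max())
    return exp_v / exp_v.sum()

v = torch.tensor([1.0, 2.0, 3.0])
p = softmax(v)
print(p)                         # roughly [0.0900, 0.2447, 0.6652]
print(p.sum())                   # 1 up to rounding
print(torch.softmax(v, dim=0))   # built-in version for comparison
```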