## Lecture 2.1: Regression and Classification

### Linear Regression

Regression Model: </br>
&ensp;&ensp;&ensp;&ensp;$f_{\theta}\ :\ \mathbb{R}^{n}\ \rightarrow\ \mathbb{R}^{d}$

Linear Regression: </br>
&ensp;&ensp;&ensp;&ensp;$f_{\theta}(x)\ =\ Wx\ +\ b$

Parameters: </br>
&ensp;&ensp;&ensp;&ensp;$\theta\ =\ (W,\ b)$

### Linear Regression Example

Temperature Forecast:

$f(x)$: average temperature on day $x$

The day is the input and the temperature is the output. The weight determines the slope of the line, and the bias determines how far the line shifts up or down.
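
To make the roles of the weight and the bias concrete, here is a minimal sketch of the one-dimensional case in plain Python. The function name `forecast` and the parameter values are made up for illustration and do not come from the lecture.

```python
def forecast(day: float, weight: float, bias: float) -> float:
    """One-dimensional linear regression: f(x) = weight * x + bias."""
    return weight * day + bias


# Hypothetical parameters: 10 degrees on day 0, rising 0.5 degrees per day.
weight, bias = 0.5, 10.0
temps = [forecast(day, weight, bias) for day in range(4)]
print(temps)  # [10.0, 10.5, 11.0, 11.5]

# A larger bias shifts every prediction by the same amount; the slope is unchanged.
shifted = [forecast(day, weight, bias + 2.0) for day in range(4)]
print(shifted)  # [12.0, 12.5, 13.0, 13.5]
```

In the PyTorch cells that follow, `nn.Linear` stores the same two quantities as `weight` and `bias`, generalized to vectors and matrices.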

In [None]:
import torch
import torch.nn as nn

In [None]:
model = nn.Linear(10, 1) # linear layer: 10 input features -> 1 output

In [None]:
print(model)
print(model.weight)
print(model.bias)

In [None]:
# x = torch.ones(10)
y = torch.zeros(10)
# model(x)
model(y)

### Linear Regression in PyTorch

Define a linear regression model:

```python
linear = torch.nn.Linear(4, 2)
print(f"{linear.weight=}")
print(f"{linear.bias=}")

x = torch.as_tensor([1, 2, 3, 4], dtype=torch.float32)
print(f"{linear(x)=}")
```

### Linear Regression: Limitation

Cannot deal with non-linear patterns:
* cyclic functions
* quadratic functions
* ...
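
As a rough illustration of this limitation, the sketch below fits a line to samples of a quadratic using the closed-form least-squares solution. The helper `fit_line` and the sample points are assumptions chosen for the example, not part of the lecture.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y ~ w * x + b for scalar inputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


# Samples of the quadratic y = x^2 on a symmetric interval.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x ** 2 for x in xs]

w, b = fit_line(xs, ys)
worst_error = max(abs(w * x + b - y) for x, y in zip(xs, ys))
print(f"best line: slope={w}, intercept={b}, worst error={worst_error}")
```

On this data the best line is flat (slope 0, intercept 4), so it misses the endpoints by 5. No choice of weight and bias can follow the curve.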

### Linear Binary Classification

Binary classification model: </br>
&ensp;&ensp;&ensp;&ensp;$f_{\theta}\ :\ \mathbb{R}^{n}\ \rightarrow\ [0,\ 1]$

Linear binary classification: </br>
&ensp;&ensp;&ensp;&ensp;$f_{\theta}(x)\ =\ \sigma(Wx\ +\ b)$ </br>
&ensp;&ensp;&ensp;&ensp;$\sigma(x)\ =\ \frac{1}{1\ +\ e^{-x}}$

Parameters: </br>
&ensp;&ensp;&ensp;&ensp;$\theta\ =\ (W,\ b)$

### Linear Binary Classification: Decision Boundary

Same exact function as the linear regression model, just wrapped in the sigmoid function: </br>
&ensp;&ensp;&ensp;&ensp;$\sigma(Wx\ +\ b)$

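* The weight $(W)$ determines the orientation of the decision boundary that separates the two classes
* The bias $(b)$ determines how far the boundary is shifted up or down
* The boundary itself is the set of points where $Wx\ +\ b\ =\ 0$, which is exactly where $\sigma(Wx\ +\ b)\ =\ 0.5$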
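
The following sketch, in plain Python, checks on which side of the boundary a few points fall. The weights, bias, and test points are hypothetical values picked so that the boundary is the line $x_{1} + x_{2} - 1 = 0$.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """Linear binary classifier: sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


# Hypothetical parameters: the boundary is the line x1 + x2 - 1 = 0.
w, b = [1.0, 1.0], -1.0

on_boundary = predict_proba([0.5, 0.5], w, b)     # w . x + b = 0
positive_side = predict_proba([2.0, 2.0], w, b)   # w . x + b = 3
negative_side = predict_proba([-1.0, 0.0], w, b)  # w . x + b = -2

print(on_boundary, positive_side, negative_side)
```

Points on the boundary get probability 0.5, points on one side get more, and points on the other side get less.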

### Linear Binary Classification: An Example

Input $x$: average daily temperature </br>
Output $f(x)$: whether it will rain on Wednesday

Prediction: </br>
&ensp;&ensp;&ensp;&ensp;$P(\text{rain})\ =\ f_{\theta}(x)\ =\ \sigma(Wx\ +\ b)$

In [None]:
class LinearClassifier(torch.nn.Module):
    def __init__(self, input_dim, output_dim) -> None:
        super().__init__()
        self.fc = torch.nn.Linear(input_dim, output_dim)

    def forward(self, x):
        """Sigmoid is applied here for illustration only.

        When training, never put a sigmoid directly into your model; return the raw scores instead.
        """
        return nn.functional.sigmoid(self.fc(x))
    
model = LinearClassifier(10, 1)
print(model)
print(model.fc.weight)
print(model.fc.bias)

# x = torch.zeros(10)
# model(x)
# x = torch.ones(10)
# model(x)
x = torch.rand(100, 10)
model(x)

### Linear Binary Classification: Limitation

* Linear classifiers are not very powerful models
* Cannot deal with non-linear decision boundaries
* For deep learning, we will need to use non-linear models

### Linear Multi-Class Classification

Multi-class classification model: </br>
&ensp;&ensp;&ensp;&ensp;$f_{\theta}\ :\ \mathbb{R}^{n}\ \rightarrow\ \mathbb{P}^{c}\ \text{where}\ \mathbb{P}^{c}\ \subset\ \mathbb{R}_{+}^{c}\ \ \forall_{y\ \in\ \mathbb{P}^{c}}1^{\top}y\ =\ 1$ </br>
* Input: a real-valued vector in $\mathbb{R}^{n}$
* Output: a probability distribution over $C$ possible classes

Linear multi-class classification: </br>
&ensp;&ensp;&ensp;&ensp;$f_{\theta}(x)\ =\ \text{softmax}(Wx\ +\ b)$ </br>
&ensp;&ensp;&ensp;&ensp;$\text{softmax}(v)_{i}\ =\ \frac{e^{v_{i}}}{\sum_{j=1}^{c}\ e^{v_{j}}}$ </br>
* The output consists of $C$ real-valued numbers that are all positive
* One additional constraint: the $C$ numbers sum to one
* Softmax: a function that transforms the $C$ scores into class probabilities; it is the multi-class counterpart of the sigmoid

Parameters: </br>
&ensp;&ensp;&ensp;&ensp;$\theta\ =\ (W,\ b)$

For multi-class classification, the linear layer always has output size $C$, where $C$ is the number of classes we want to distinguish.

### Softmax Function
$$
\text{For input}\ \text{v}\ =
\begin{bmatrix}
v_{1} \\
\vdots \\
v_{d}
\end{bmatrix} \in \mathbb{R}^{d},\ \text{the function softmax}\ :\ \mathbb{R}^{d}\ \rightarrow\ \mathbb{P}^{d}.
$$

$$
\mathbb{P}^{d}\ \subset\ \mathbb{R}_{+}^{d}\ \ \ \ \forall_{y\ \in\ \mathbb{P}^{d}}\ 1^{\top}y\ =\ 1
$$

$$
\text{softmax}(\text{v})\ =\ \frac{1}{\sum_{i}\ e^{v_{i}}}
\begin{bmatrix}
e^{v_{1}} \\
\vdots \\
e^{v_{d}}
\end{bmatrix}
$$

* Input: real values, each ranging from $-\infty\ \text{to}\ \infty$
* Output: a probability distribution $\mathbb{P}$ over all possible classes
* Softmax first exponentiates every input, mapping any value between $-\infty\ \text{and}\ \infty$ to a value between $0\ \text{and}\ \infty$
* Next, the values are normalized so that they sum to $1$
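
The two steps above can be sketched in a few lines of plain Python. This is an illustrative from-scratch version with made-up scores, not the implementation PyTorch uses (library implementations typically add safeguards against overflow).

```python
import math


def softmax(v):
    """Softmax in two steps: exponentiate, then normalize."""
    exps = [math.exp(x) for x in v]   # step 1: every entry becomes strictly positive
    total = sum(exps)
    return [e / total for e in exps]  # step 2: entries now sum to 1


scores = [2.0, -1.0, 0.0]  # hypothetical raw scores, may be negative
probs = softmax(scores)

print(probs)
print(sum(probs))
```

Negative scores are allowed on the way in, yet every output is positive and the outputs sum to one (up to floating-point rounding).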

### Linear Multi-Class Classification

$$
\text{Let W}\ =\ 
\begin{bmatrix}
\text{w}_{1}^{\top} \\
\text{w}_{2}^{\top} \\
... \\
\text{w}_{d}^{\top}
\end{bmatrix} \text{where}\ \text{w}_{j}\ \in\ \mathbb{R}^{n}
$$

$$
\begin{aligned}
\text{Classify}(\text{x})\ &=\ \underset{j\ \in\ \{1,\ ...,\ d\}}{\text{arg max}}\ \text{softmax}(\text{Wx}\ +\ \text{b})_{j} \\
&=\ \underset{j\ \in\ \{1,\ ...,\ d\}}{\text{arg max}}\ (\text{Wx}\ +\ \text{b})_{j} \\
&=\ \underset{j\ \in\ \{1,\ ...,\ d\}}{\text{arg max}}\ \text{w}_{j}^{\top}\text{x}\ +\ \text{b}_{j}
\end{aligned}
$$

* Softmax preserves the order of its inputs, even though it is a non-linear function
* To classify a specific element, run the input through softmax and pick the class label with the highest probability
* Running the scores through softmax does not change which element has the highest value, so the arg max can be taken directly over $\text{Wx}\ +\ \text{b}$
* Softmax makes the outputs interpretable as probabilities over multiple classes without changing their order
* The classifier creates multiple planes of separation, one scoring plane $\text{w}_{j}^{\top}\text{x}\ +\ \text{b}_{j}$ for each class
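
The order-preserving property can be checked directly in PyTorch. The scores below are made-up values for two inputs and four classes.

```python
import torch

# Hypothetical raw scores (Wx + b) for two inputs and four classes.
logits = torch.tensor([[2.0, -1.0, 0.5, 0.0],
                       [-3.0, 0.1, 0.0, -0.2]])
probs = torch.softmax(logits, dim=-1)

# The predicted class is the same with or without softmax.
pred_from_logits = logits.argmax(dim=-1)
pred_from_probs = probs.argmax(dim=-1)
same = torch.equal(pred_from_logits, pred_from_probs)

print(pred_from_logits, pred_from_probs, same)
```

This is why softmax can be skipped entirely when only the predicted label is needed.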

### Linear Multi-Class Classification: Example

Input $x$: day of the week

Output $f(x)$: precipitation (rain, snow, hail, sun)

Prediction:
* $P\text{(rain)}\ =\ f_{\theta}(\text{x})_{1}\ =\ \text{softmax}(\text{Wx}\ +\ \text{b})_{1}$
* $P\text{(snow)}\ =\ f_{\theta}(\text{x})_{2}\ =\ \text{softmax}(\text{Wx}\ +\ \text{b})_{2}$
* $P\text{(hail)}\ =\ f_{\theta}(\text{x})_{3}\ =\ \text{softmax}(\text{Wx}\ +\ \text{b})_{3}$
* $P\text{(sun)}\ =\ f_{\theta}(\text{x})_{4}\ =\ \text{softmax}(\text{Wx}\ +\ \text{b})_{4}$

In [24]:
class LinearClassifier(torch.nn.Module):
    def __init__(self, input_dim, n_classes) -> None:
        super().__init__()
        self.fc = torch.nn.Linear(input_dim, n_classes)

    def forward(self, x):
        return nn.functional.softmax(self.fc(x), dim=-1)
    
model = LinearClassifier(10, 4)
x = torch.ones(10)
model(x)

tensor([0.1110, 0.2620, 0.2279, 0.3990], grad_fn=<SoftmaxBackward0>)

In [26]:
x = torch.rand(20, 10)
model(x)
model(x).sum(dim=-1)

tensor([1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000,
        1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000,
        1.0000, 1.0000], grad_fn=<SumBackward1>)

* Just like with binary classification, never define your model with softmax inside of the model
* Doing so is numerically unstable; when training neural networks, the model should return the raw scores and softmax should be applied outside of the model
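
One way to follow this advice is sketched below, under some assumptions: the class name `LinearClassifierLogits`, the random inputs, and the random targets are all made up for illustration. The model returns raw scores (logits), the loss `torch.nn.functional.cross_entropy` applies log-softmax internally, and `torch.softmax` is called outside the model only when probabilities are needed.

```python
import torch
import torch.nn as nn


class LinearClassifierLogits(nn.Module):  # hypothetical name, for illustration
    def __init__(self, input_dim: int, n_classes: int) -> None:
        super().__init__()
        self.fc = nn.Linear(input_dim, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Return raw scores (logits); no softmax inside the model.
        return self.fc(x)


torch.manual_seed(0)
model = LinearClassifierLogits(10, 4)
x = torch.rand(20, 10)                  # made-up batch of 20 inputs
target = torch.randint(0, 4, (20,))     # made-up class labels in {0, 1, 2, 3}

logits = model(x)

# Training: cross_entropy takes logits and applies log-softmax internally.
loss = nn.functional.cross_entropy(logits, target)

# Inference: apply softmax outside the model when probabilities are needed.
probs = torch.softmax(logits, dim=-1)

print(logits.shape, loss.item(), probs.sum(dim=-1))
```

For the binary case, the analogous function is `torch.nn.functional.binary_cross_entropy_with_logits`, which takes the raw score in place of the sigmoid output.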

### Multi-Class vs Multiple Binary Classification

<div style="display: flex; justify-content: space-between;">
  <div style="width: 45%;">
    <h5>Multi-Class Classification (Softmax):</h5>
    <ul>
      <li>Describes exactly <strong>one category</strong></li>
      <li><strong>no negative</strong> examples</li>
      <li>calibrated probabilities</li>
      <li>used for <strong>mutually exclusive</strong> categories</li>
    </ul>
  </div>
  <div style="width: 45%;">
    <h5>Multiple Binary Classifier (Sigmoid):</h5>
    <ul>
      <li>Allows for <strong>multiple categories</strong></li>
      <li><strong>requires negative</strong> examples</li>
      <li><strong>uncalibrated probabilities</strong></li>
      <li>used for <strong>multi-label tagging</strong></li>
    </ul>
  </div>
</div>

<div style="display: flex; justify-content: space-between; margin-top: 20px;">
  <div style="width: 45%;">
    <strong>Examples:</strong>
    <ul>
      <li>Predicting the weather (rain, cloudy, sunny)</li>
      <li>Predicting the scientific name of an animal</li>
      <li>Predicting the next word in a sentence</li>
    </ul>
  </div>
  <div style="width: 45%;">
    <strong>Examples:</strong>
    <ul>
      <li>Predicting where in Texas it will rain</li>
      <li>Predicting attributes of an animal</li>
      <li>Predicting which books a sentence can be found in</li>
    </ul>
  </div>
</div>
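
The difference between the two setups shows up when the same scores are passed through softmax and through element-wise sigmoids. The scores below are made-up values for three categories.

```python
import torch

# Hypothetical raw scores for three categories.
scores = torch.tensor([2.0, 1.0, -1.0])

# Multi-class: one distribution over mutually exclusive categories.
multi_class = torch.softmax(scores, dim=-1)

# Multiple binary classifiers: an independent yes/no probability per label.
multi_label = torch.sigmoid(scores)
active_labels = (multi_label > 0.5).tolist()

print(multi_class, multi_class.sum())
print(multi_label, multi_label.sum(), active_labels)
```

The softmax outputs sum to one, so raising one category lowers the others. The sigmoid outputs are independent, so several labels can be active at once and their sum is not constrained.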
