## Linear Regression

<img src="./images/linear-regression.jpeg" width="400" style="display: block; margin: auto;">

```python
import torch
import torch.nn as nn
```

```python
# Create a linear regression model
# Arguments: number of inputs, number of outputs

# Note: Each time you re-create the model you will get different results,
#       because nn.Linear initializes its weights and bias randomly.

model = nn.Linear(10, 1)

# Inspect the model
print(model.weight)
print(model.bias)  # Only one bias, since there is only one output
```

```
Parameter containing:
tensor([[-0.2479,  0.2705, -0.0212, -0.1447,  0.1133,  0.2561, -0.2148, -0.0716,
          0.1681, -0.0261]], requires_grad=True)
Parameter containing:
tensor([-0.0878], requires_grad=True)
```


```python
x = torch.ones(10)
model(x)
```

```
tensor([-0.0061], grad_fn=<ViewBackward0>)
```
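Since `nn.Linear` computes $\mathbf{Wx+b}$, an all-ones input simply sums the weights and adds the bias. For the weights printed above, the ten weights sum to 0.0817, and adding the bias -0.0878 gives the -0.0061 shown. The sketch below repeats this check by hand on a fresh model (seeded with `torch.manual_seed(0)`, an assumption for reproducibility, so its numbers differ from the output above):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the check is reproducible
model = nn.Linear(10, 1)
x = torch.ones(10)

# nn.Linear computes W @ x + b; weight has shape (out_features, in_features) = (1, 10)
manual = model.weight @ x + model.bias
print(model(x), manual)

# With x all ones, W @ x is just the sum of the weights
print(torch.allclose(manual, model.weight.sum() + model.bias, atol=1e-6))
```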

**Limitations:** Linear models are ***not*** good for cyclic or quadratic functions. Their output is an affine function of the input ($\mathbf{Wx+b}$), so they can only represent a straight line (a hyperplane in higher dimensions).
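A quick way to see this limitation is to fit $y = x^2$ with the best possible straight line. The sketch below (an illustration, not part of the original notebook) uses least squares via `torch.linalg.lstsq`, so no amount of training could do better; for contrast, it repeats the fit with $x^2$ supplied as an extra input feature, which a model that is linear in its inputs can then fit exactly.

```python
import torch

# Sample y = x^2 on [-1, 1] (a quadratic, i.e. not a straight line)
x = torch.linspace(-1, 1, 101, dtype=torch.float64)
y = x**2

# Best possible linear fit y ≈ w*x + b, found with least squares
A_lin = torch.stack([x, torch.ones_like(x)], dim=1)          # columns: x, 1
sol_lin = torch.linalg.lstsq(A_lin, y.unsqueeze(1)).solution
mse_lin = ((A_lin @ sol_lin).squeeze(1) - y).pow(2).mean()

# Same least-squares fit, but with x^2 given as an extra input feature
A_quad = torch.stack([x, x**2, torch.ones_like(x)], dim=1)   # columns: x, x^2, 1
sol_quad = torch.linalg.lstsq(A_quad, y.unsqueeze(1)).solution
mse_quad = ((A_quad @ sol_quad).squeeze(1) - y).pow(2).mean()

print(f"straight-line fit MSE:    {mse_lin.item():.4f}")
print(f"with x^2 feature fit MSE: {mse_quad.item():.2e}")
```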

## Linear Binary Classification

What if, instead, we want to classify our inputs as belonging to class 1 or class 2? In this case, we regress to the probability of belonging to class 1 vs. class 2. The sigmoid function takes values ranging from $-\infty$ to $\infty$ and squashes them so that they lie between 0 and 1.

<img src="./images/binary-classification.png" width="500" style="display: block; margin: auto;">

<br>

Binary classification model: 

$$ f_{\theta} : \mathbb{R}^n \rightarrow [0,1] $$

Linear binary classification:

$$f_{\theta}(\mathbf{x}) = \sigma(\mathbf{Wx + b}) $$

$$ \sigma(x) = \frac{1}{1+e^{-x}} $$

Parameters: 

$$ \theta = (\mathbf{W,b}) $$
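To connect these formulas to code, here is a small sketch (arbitrary inputs chosen for illustration) that evaluates $\sigma$ by hand, checks it against `torch.sigmoid`, and then evaluates $f_\theta(\mathbf{x}) = \sigma(\mathbf{Wx+b})$ for a randomly initialized `nn.Linear` layer with 3 inputs:

```python
import torch

# sigma(z) = 1 / (1 + e^{-z}), written out by hand and compared to torch.sigmoid
z = torch.tensor([-10.0, -1.0, 0.0, 1.0, 10.0])
manual = 1 / (1 + torch.exp(-z))
print(manual)                                  # every value lies strictly between 0 and 1
print(torch.allclose(manual, torch.sigmoid(z)))

# f_theta(x) = sigma(Wx + b) for a linear layer with n = 3 inputs and 1 output
torch.manual_seed(0)
fc = torch.nn.Linear(3, 1)
x = torch.randn(3)
f_x = torch.sigmoid(fc.weight @ x + fc.bias)   # same as torch.sigmoid(fc(x))
print(f_x)
```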

```python
# A couple of special rules for using torch.nn.Module:
#  1. Always call super().__init__() first, before defining any layers
#  2. Never put a sigmoid directly into the model; return raw scores (logits)
#     and let the loss (e.g. nn.BCEWithLogitsLoss) apply it in a numerically
#     stable way. (We do it here just to show what it would look like.)

class LinearClassifier(torch.nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.fc = torch.nn.Linear(input_dim, output_dim)  # Linear layer: Wx + b

    def forward(self, x):
        return torch.sigmoid(self.fc(x))  # Squash the linear output into (0, 1)

model = LinearClassifier(10, 1)
print(model)
print(model.fc.weight)
print(model.fc.bias)

x = torch.zeros(10)
print(f"\nTesting with tensor of 0's: {model(x)}")

x = torch.ones(10)
print(f"Testing with tensor of 1's: {model(x)}")
```

```
LinearClassifier(
  (fc): Linear(in_features=10, out_features=1, bias=True)
)
Parameter containing:
tensor([[ 0.0271,  0.2877, -0.1170,  0.0942,  0.1914, -0.1088,  0.0101,  0.2968,
          0.2176,  0.0385]], requires_grad=True)
Parameter containing:
tensor([-0.1320], requires_grad=True)

Testing with tensor of 0's: tensor([0.4670], grad_fn=<SigmoidBackward0>)
Testing with tensor of 1's: tensor([0.6912], grad_fn=<SigmoidBackward0>)
```
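Rule 2 exists because PyTorch's `nn.BCEWithLogitsLoss` combines the sigmoid and the binary cross-entropy into one numerically stable step. Below is a minimal sketch, assuming a hypothetical `LinearLogitClassifier` that returns raw scores (logits); the sigmoid is applied only when probabilities are actually needed. The random batch and labels are made up for illustration. For moderate logits both losses agree, but for an extreme logit the sigmoid rounds to exactly 1.0 and the two-step version can no longer report the correct loss.

```python
import torch
import torch.nn as nn

class LinearLogitClassifier(nn.Module):  # hypothetical name, for illustration
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.fc = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.fc(x)  # raw scores (logits), no sigmoid

torch.manual_seed(0)
model = LinearLogitClassifier(10, 1)
x = torch.randn(8, 10)                   # batch of 8 made-up inputs
y = torch.randint(0, 2, (8, 1)).float()  # binary labels as floats (0.0 or 1.0)

logits = model(x)
loss = nn.BCEWithLogitsLoss()(logits, y)              # sigmoid + BCE in one stable step
reference = nn.BCELoss()(torch.sigmoid(logits), y)    # sigmoid first, then plain BCE
print(loss.item(), reference.item())                  # (nearly) identical here

probs = torch.sigmoid(logits)  # probabilities, only when you need them

# An extreme logit: the correct loss for target 0 is about 200
big_logit, target = torch.tensor([200.0]), torch.tensor([0.0])
stable = nn.BCEWithLogitsLoss()(big_logit, target)
naive = nn.BCELoss()(torch.sigmoid(big_logit), target)  # sigmoid(200) rounds to 1.0
print(stable.item(), naive.item())
```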


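**Limitations:** Linear classifiers cannot handle non-linear decision boundaries. Since $\sigma(z) > 0.5$ exactly when $z > 0$, the decision boundary is the single hyperplane $\mathbf{Wx+b} = 0$, so a linear classifier also cannot express a function that needs two separating planes.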
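XOR is the classic example of a function that needs two separating planes. The sketch below trains a linear classifier on the four XOR points; plain SGD, the learning rate, and `nn.BCEWithLogitsLoss` are assumed training choices for illustration. However long it trains, no single line separates the classes, so at most 3 of the 4 points can be classified correctly.

```python
import torch
import torch.nn as nn

# XOR: the label is 1 when exactly one input is 1
x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

torch.manual_seed(0)
model = nn.Linear(2, 1)  # linear classifier that outputs logits
opt = torch.optim.SGD(model.parameters(), lr=0.5)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(1000):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

pred = (torch.sigmoid(model(x)) > 0.5).float()
accuracy = (pred == y).float().mean().item()
print(f"XOR accuracy: {accuracy:.2f}")  # never 1.00: no single line separates XOR
```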

## Linear Multi-Class Classification
What if we have $c$ different classes that we want to separate, or classify? In this case, we again regress to probabilities. The softmax function does this for multiple classes, just as the sigmoid does for binary problems.

Multi-class classification model:

$$ f_\theta : \mathbb{R}^n \rightarrow \mathbb{P}^c \quad \text{where} \quad \mathbb{P}^c = \{\mathbf{y} \in \mathbb{R}^c_+ : \mathbf{1}^\top \mathbf{y} = 1\} $$

Linear multi-class classification:

$$ f_\theta (\mathbf{x}) = \text{softmax}(\mathbf{Wx+b}) $$
$$ \text{softmax}(\mathbf{v})_i = \frac{e^{v_i}}{\sum^c_{j=1}e^{v_j}} $$

Parameters:

$$ \theta = (\mathbf{W,b}) $$
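In PyTorch, `nn.Linear(n, c)` stores $\mathbf{W}$ as a $c \times n$ matrix (one row of weights per class) and $\mathbf{b}$ as a vector of length $c$. A small sketch, with $n = 10$ inputs and $c = 4$ classes chosen to match the notebook's example:

```python
import torch
import torch.nn as nn

n, c = 10, 4                      # input dimension, number of classes
fc = nn.Linear(n, c)

print(fc.weight.shape)            # torch.Size([4, 10]): one row of n weights per class
print(fc.bias.shape)              # torch.Size([4]): one bias per class

x = torch.randn(n)
scores = fc.weight @ x + fc.bias  # Wx + b: one score per class
probs = torch.softmax(scores, dim=-1)
print(probs.sum())                # the class probabilities sum to one
```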

## Softmax function

The softmax function takes a vector of $c$ real-valued numbers ranging from $-\infty$ to $+\infty$ and first exponentiates them, which maps each one into $(0, +\infty)$. It then normalizes them by dividing each entry by their sum, so that all entries sum to one.
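The two steps translate directly into code. The sketch below is a hand-written softmax (the helper name `softmax` is ours, for illustration) checked against `torch.softmax`. It also subtracts the maximum before exponentiating, a standard trick that does not change the result (the factor cancels in the ratio) but keeps `exp()` from overflowing for large inputs.

```python
import torch

def softmax(v: torch.Tensor) -> torch.Tensor:
    """Softmax over the last dimension, written out by hand."""
    # Subtracting the max cancels in the ratio but keeps exp() from overflowing
    e = torch.exp(v - v.max(dim=-1, keepdim=True).values)  # exponentiate: values in (0, 1]
    return e / e.sum(dim=-1, keepdim=True)                 # normalize: entries sum to one

v = torch.tensor([2.0, 1.0, -1.0, 0.5])
p = softmax(v)
print(p, p.sum())
print(torch.allclose(p, torch.softmax(v, dim=-1)))

# Large inputs: the naive formula overflows (exp(1000) = inf), the shifted one does not
big = torch.tensor([1000.0, 999.0])
naive = torch.exp(big) / torch.exp(big).sum()
print(naive)         # tensor([nan, nan])
print(softmax(big))
```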

```python
class LinearClassifier(torch.nn.Module):
    def __init__(self, input_dim, n_classes):
        super().__init__()
        self.fc = torch.nn.Linear(input_dim, n_classes)  # Linear layer: one score per class

    def forward(self, x):
        return nn.functional.softmax(self.fc(x), dim=-1)  # Softmax over the last dim turns scores into class probabilities

model = LinearClassifier(10, 4)
print(model)
print(model.fc.weight)
print(model.fc.bias)

x = torch.zeros(10)
print(f"\nTesting with tensor of 0's: {model(x)}")

x = torch.ones(10)
print(f"Testing with tensor of 1's: {model(x)}")
```

```
LinearClassifier(
  (fc): Linear(in_features=10, out_features=4, bias=True)
)
Parameter containing:
tensor([[ 0.1521,  0.1082, -0.1903,  0.0762,  0.0773, -0.1393,  0.2910, -0.1023,
         -0.1709,  0.0615],
        [ 0.2166,  0.0900, -0.2946, -0.2571, -0.2505,  0.1324,  0.2025,  0.0415,
         -0.0054, -0.1827],
        [ 0.2858,  0.1660, -0.1890,  0.2525, -0.1384, -0.3098, -0.2123,  0.0535,
          0.2263,  0.2635],
        [-0.2173,  0.3146,  0.0960, -0.1536,  0.0641, -0.0596,  0.3121, -0.0352,
         -0.2341, -0.0897]], requires_grad=True)
Parameter containing:
tensor([-0.0121, -0.0879, -0.0439,  0.0899], requires_grad=True)

Testing with tensor of 0's: tensor([0.2498, 0.2316, 0.2420, 0.2766], grad_fn=<SoftmaxBackward0>)
Testing with tensor of 1's: tensor([0.2672, 0.1547, 0.3274, 0.2507], grad_fn=<SoftmaxBackward0>)
```
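With a batch of inputs of shape `(batch, n)`, `dim=-1` makes the softmax normalize each row separately, so every input gets its own distribution over the 4 classes, and `argmax` picks the predicted class. A short sketch, redefining the notebook's `LinearClassifier` so the block runs on its own and feeding it random inputs (an assumption for illustration):

```python
import torch
import torch.nn as nn

class LinearClassifier(nn.Module):
    def __init__(self, input_dim, n_classes):
        super().__init__()
        self.fc = nn.Linear(input_dim, n_classes)

    def forward(self, x):
        return nn.functional.softmax(self.fc(x), dim=-1)

torch.manual_seed(0)
model = LinearClassifier(10, 4)
x = torch.randn(5, 10)        # batch of 5 random inputs

probs = model(x)              # shape (5, 4): one distribution per row
pred = probs.argmax(dim=-1)   # predicted class index for each input
print(probs.sum(dim=-1))      # each row sums to 1
print(pred)
```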


Note that you can technically still use the sigmoid function for multi-class classification by applying it to each output independently, which yields a multiple binary classifier. When would you want to use each?

Multiclass classifier (softmax):
- Describes exactly one category
- No explicit negative examples needed (a positive for one class counts against all the others)
- Calibrated probabilities
- Used for mutually exclusive categories

Examples:
- Predicting the weather
- Predicting the name of an animal
- Predicting the next word in a sentence

Multiple binary classifier (sigmoid):
- Allows for multiple categories
- Requires negative examples
- Uncalibrated probabilities
- Used for multi-label tagging

Examples:
- Predicting where in Texas it will rain
- Predicting attributes of an animal
- Predicting which books a sentence can be found in
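To make the contrast concrete, here is a sketch that puts both kinds of head on the same linear layer. It assumes PyTorch's standard losses: `nn.CrossEntropyLoss` for the softmax case (it takes logits and one class index per input) and `nn.BCEWithLogitsLoss` for the sigmoid case (it takes logits and a multi-hot target). The inputs and targets are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(10, 4)        # 4 categories, outputs raw scores (logits)
x = torch.randn(3, 10)       # batch of 3 made-up inputs
logits = fc(x)

# Multiclass (softmax): exactly one category per input, given as a class index
class_targets = torch.tensor([0, 2, 3])
multiclass_loss = nn.CrossEntropyLoss()(logits, class_targets)  # applies log-softmax internally
softmax_probs = torch.softmax(logits, dim=-1)

# Multi-label (sigmoid): any subset of categories per input, as a multi-hot vector;
# every 0 entry acts as a negative example for that category
label_targets = torch.tensor([[1., 0., 1., 0.],
                              [0., 0., 0., 0.],
                              [1., 1., 1., 1.]])
multilabel_loss = nn.BCEWithLogitsLoss()(logits, label_targets)  # sigmoid per output
sigmoid_probs = torch.sigmoid(logits)

print(softmax_probs.sum(dim=-1))  # always 1: the categories compete
print(sigmoid_probs.sum(dim=-1))  # generally not 1: each category is scored independently
print(multiclass_loss.item(), multilabel_loss.item())
```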