# Motivation

## Problem with linear model
+ It only captures linear relationships between features and labels.
+ Any combination of linear functions is still a linear function, so stacking linear models adds no expressive power.
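
To see why the second point holds, consider two stacked linear layers. The symbols $W_1, b_1, W_2, b_2$ are introduced here only for illustration:

$$
W_2\left(W_1 x + b_1\right) + b_2 = \left(W_2 W_1\right) x + \left(W_2 b_1 + b_2\right)
$$

The right-hand side is again of the form $Wx + b$, so the stacked model is no more expressive than a single linear layer.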

## Simple approaches to introduce nonlinearity
+ Use a high-order function, e.g. $y = w_1x_1^5+w_2x_1^4+\dots$
    + Problem: there are too many variables (one weight per polynomial term).
+ Use multiple linear classification models (combine binary classifiers into a multi-class classifier). This differs from combining linear functions because the classifiers' outputs are not linearly combined, e.g. a binary classification tree. It needs as many binary classifiers as there are classes.
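
As a rough illustration of the "too many variables" problem, the sketch below counts the weights a full polynomial model would need. It assumes one weight per monomial of total degree at most `degree`, including cross terms and the constant; the helper name `num_polynomial_terms` is hypothetical and not part of the original notes.

```python
from math import comb


def num_polynomial_terms(num_features: int, degree: int) -> int:
    """Count monomials of total degree <= `degree` in `num_features` variables."""
    # Standard stars-and-bars result: C(n + d, d) monomials, constant included.
    return comb(num_features + degree, degree)


# The count grows quickly with the number of input features.
for n in (1, 6, 100):
    print(n, num_polynomial_terms(n, 5))
```

With a single feature a degree-5 polynomial has 6 terms, but with only 6 features it already has 462.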

### Linear Model: use a single weight matrix to linearly fit
$$
W\left\{\begin{array}{l}
w_{1} x_{1,1}+w_{2} x_{1,2}+\cdots+w_{6} x_{1,6}=0 \\
\vdots \\
w_{1} x_{10,1}+w_{2} x_{10,2}+\cdots+w_{6} x_{10,6}=9
\end{array}\right.
$$

### Combine multiple binary classifiers [Perceptron]
$$
\left\{\begin{array}{l}
w_{1} x_{1,1}+w_{2} x_{1,2}+\cdots=1 \\
w_{1} x_{2,1}+w_{2} x_{2,2}+\cdots=-1 \\
\vdots \\
w_{1} x_{10,1}+w_{2} x_{10,2}+\cdots=-1
\end{array}\right.
$$
$$
XW = [1,-1,-1,...]
$$
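
The target vector above can be built for every class at once. The sketch below is one possible way to do it, assuming integer class labels and +1/-1 targets; the helper name `one_vs_rest_targets` and the sample labels are hypothetical.

```python
import torch


def one_vs_rest_targets(labels: torch.Tensor, num_classes: int = 10) -> torch.Tensor:
    """Return an (N, num_classes) matrix: +1 where column == label, -1 elsewhere."""
    one_hot = torch.nn.functional.one_hot(labels, num_classes)  # entries are 0 or 1
    return (2 * one_hot - 1).float()                            # map 0 -> -1, 1 -> +1


# Three samples with labels 0, 2, 1 and three classes, to keep the output small.
labels = torch.tensor([0, 2, 1])
targets = one_vs_rest_targets(labels, num_classes=3)
print(targets)
```

Column `k` of `targets` is the target vector for the binary classifier of class `k`; here column 0 is `[1, -1, -1]`.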

Implementation of combining multiple binary classifiers
+ Keep the model the same.
+ Challenge: reading the result off the model output when the output is not strictly ~0 or ~1. See `get_result()`, which determines which output is closest to the label; this logic is complicated to implement. It will be simplified in __logistic regression__.

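```
import torch

# Run the model once per class (one binary classifier per class).
for i in range(len(classes)):
    y = model(feature, weights)

def get_result(y):
    # For each candidate label i, find how close the nearest output is to i,
    # then return the label with the smallest distance.
    distances = torch.stack([torch.min(torch.abs(y - i)) for i in range(10)])
    return torch.argmin(distances)
```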
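
For reference, the same "closest label" rule can be written without a Python loop. This is only a sketch under the assumption that `y` is a 1-D tensor of raw outputs and that the candidate labels are the integers `0..num_classes-1`; the name `closest_label` is hypothetical.

```python
import torch


def closest_label(y: torch.Tensor, num_classes: int = 10) -> int:
    """Return the integer label that lies closest to any entry of `y`."""
    candidates = torch.arange(num_classes, dtype=y.dtype)
    # distances[j, i] = |y[j] - i| for every output j and candidate label i
    distances = torch.abs(y.reshape(-1, 1) - candidates.reshape(1, -1))
    # Keep the closest output per candidate, then pick the best candidate.
    return int(torch.argmin(distances.min(dim=0).values))


print(closest_label(torch.tensor([2.9])))
```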

### Logistic regression, perceptron with sigmoid:

$$
\begin{array}{l}
\text { sigmoid }\left(X W_{1}\right)=[1,0,0,0, \cdots] \\
\text { sigmoid }\left(X W_{2}\right)=[0,1,0,0,0, \cdots] \\
\text { sigmoid }\left(X W_{3}\right)=[0,0,1,0, \ldots] \\
\vdots \\
\text { sigmoid }\left(X W_{10}\right)=[0,0, \ldots,0,1]
\end{array}
$$
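
With sigmoid outputs, reading off the prediction becomes a simple `argmax` over the classifiers. The sketch below assumes the ten weight vectors $W_1, \dots, W_{10}$ are stacked as the columns of one matrix `W`; the tensor sizes and random values are placeholders, not data from the notes.

```python
import torch

torch.manual_seed(0)
X = torch.randn(4, 6)    # 4 samples, 6 features (hypothetical sizes)
W = torch.randn(6, 10)   # column k holds the weights of classifier k

scores = torch.sigmoid(X @ W)            # shape (4, 10), each entry in [0, 1]
predicted = torch.argmax(scores, dim=1)  # most confident classifier per sample
print(predicted)
```

Because every score lies in the same range, no distance-to-label search is needed.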

+ Apply `sigmoid` to the output inside the model.
+ Change the loss to incorporate `loss_weights`, which give a high weight to the class currently being trained so that every class gets trained.

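```
import torch

def label2ground_truth(image_label):
    # One-hot targets: row `label` has a 1 in column `label`, 0 elsewhere.
    gt = torch.zeros(10, 10)
    # Low weight for the other classes, high weight for the class being trained.
    loss_weights = torch.full((10, 10), 0.1)
    for label in image_label:
        gt[label, label] = 1.0
        loss_weights[label, label] = 0.9
    return gt, loss_weights

def model(..):
    ..
    y = torch.sigmoid(h)
    return y

def train_model(..):
    gt, loss_weights = label2ground_truth(image_label)
    # Weighted squared error against row i of the ground truth.
    diff = y - gt[i:i+1, :]
    loss = torch.sum(diff.mul(diff).mul(loss_weights[i:i+1, :]))
```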
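
To make the effect of the weights concrete, here is a small self-contained sketch of the weighted squared error for a single sample. The function name `weighted_squared_error`, the chosen label, and the constant prediction of 0.5 are assumptions made for illustration.

```python
import torch


def weighted_squared_error(y, target, weights):
    """Sum of weights * (y - target)^2 over all entries."""
    diff = y - target
    return torch.sum(diff * diff * weights)


num_classes = 10
label = 3  # hypothetical true class of the sample

# One-hot target row and the 0.9 / 0.1 weighting used in the notes.
target = torch.zeros(1, num_classes)
target[0, label] = 1.0
weights = torch.full((1, num_classes), 0.1)
weights[0, label] = 0.9

y = torch.full((1, num_classes), 0.5)  # an undecided prediction for every class
loss = weighted_squared_error(y, target, weights)
print(loss.item())
```

With ten classes, the single true-class entry (weight 0.9) contributes as much to this loss as the nine other entries combined (9 × 0.1), which keeps the positive class from being drowned out by the negatives.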