# Logistic Regression (PyTorch)


Linear regression takes $x$ and predicts $\hat{y}$, a raw, unbounded ("unfiltered") value. In real life we often want a binary prediction: pass or fail? spam or not? hire or no hire? In those cases $\hat{y}$ needs to be transformed into something we can read as a probability. The sigmoid, $\sigma(z) = \frac{1}{1 + e^{-z}}$, is such a function: it maps any real number into the interval $(0, 1)$. Its basic form has been discussed [here](../DL/activation_functions.ipynb).
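To make the transformation concrete, here is a minimal sketch that applies `torch.sigmoid` to a few made-up raw outputs. Thresholding at 0.5 to get a yes/no answer is a common convention assumed here; it matches the `> 0.5` check used after training below.

```python
import torch

# Raw, unbounded outputs of a linear model (made-up values for illustration)
z = torch.tensor([-4.0, -0.5, 0.0, 0.5, 4.0])

# The sigmoid 1 / (1 + exp(-z)) squashes every value into (0, 1)
p = torch.sigmoid(z)

# Assumed convention: predict class 1 ("pass") when p > 0.5
labels = p > 0.5
print(p)
print(labels)
```

Note that $\sigma(0) = 0.5$ exactly, so a raw output of 0 sits right on the decision boundary and is not counted as "pass" under a strict `>`.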

## Compare linear vs. logistic

![logistic vs linear](../figs/logistic_vs_linear.png)


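The loss function used here, Binary Cross Entropy (BCE), works better than MSE in this setting; among other things, it penalizes confident wrong predictions far more heavily, since $-\log \hat{y} \to \infty$ as $\hat{y} \to 0$. For a label $y \in \{0, 1\}$ and a predicted probability $\hat{y}$:

$$\text{BCE}(\hat{y}, y) = -\left[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right]$$

Every loss function needs to share a common trait:
* when the error $|\hat{y} - y|$ goes up, the loss goes up
* when the error $|\hat{y} - y|$ goes down, the loss goes down

We can verify that BCE has this characteristic as well: if $y = 1$, the loss is $-\log \hat{y}$, which grows as $\hat{y}$ moves from 1 toward 0; if $y = 0$, the loss is $-\log(1 - \hat{y})$, which grows as $\hat{y}$ moves from 0 toward 1.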
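The same check can be run numerically. A small sketch, assuming a true label $y = 1$ and a handful of made-up predicted probabilities, computes BCE with `torch.nn.functional.binary_cross_entropy` and, for comparison, MSE with `torch.nn.functional.mse_loss`.

```python
import torch
import torch.nn.functional as F

y = torch.tensor([1.0])  # true label

# Made-up predicted probabilities drifting further away from y = 1
preds = [0.9, 0.7, 0.5, 0.3, 0.1]

bce = [F.binary_cross_entropy(torch.tensor([p]), y).item() for p in preds]
mse = [F.mse_loss(torch.tensor([p]), y).item() for p in preds]

for p, b, m in zip(preds, bce, mse):
    print(f"y_pred={p:.1f}  BCE={b:.3f}  MSE={m:.3f}")
```

Both losses increase as the prediction moves away from the label, but BCE grows much faster for confidently wrong predictions: about 2.303 at $\hat{y} = 0.1$ versus 0.81 for MSE.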



## Implement logistic regression

Note: this implementation needs improvement; it performed poorly on the following dataset:

```python
# hire or no hire
x_data = torch.tensor([[65.0], [80.0], [90.0], [30.0]])
y_data = torch.tensor([[0.0], [1.0], [1.0], [0.0]])
```


```python
import torch

# Toy data: hours studied (x) and pass/fail (y)
x_data = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
y_data = torch.tensor([[0.0], [0.0], [1.0], [1.0]])

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(1, 1)  # a linear model: one input, one output

    def forward(self, x):
        """
        forward accepts an input and returns an output.
        Any model could go here; we use the predefined torch.nn.Linear
        (assigned to self.linear) and pass its output through a sigmoid.
        """
        y_pred = torch.sigmoid(self.linear(x))
        return y_pred

model = Model()

criterion = torch.nn.BCELoss(reduction="mean")  # mean BCE over the batch
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# training loop
for epoch in range(5000):
    # forward pass
    y_pred = model(x_data)

    # compute loss
    loss = criterion(y_pred, y_data)

    # zero the gradients, backpropagate, then update the weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```


```python
# after training: predict pass/fail for 1 hour and 7 hours
with torch.no_grad():
    hour1 = torch.tensor([[1.0]])
    print(f"Pass: {model(hour1).item() > 0.5}")
    hour2 = torch.tensor([[7.0]])
    print(f"Pass: {model(hour2).item() > 0.5}")
```

Output:

```
Pass: False
Pass: True
```
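The note at the start of this section reports poor results on the hire/no-hire scores. One plausible culprit (an assumption; the original does not diagnose it) is that the raw scores (30 to 90) are large and unscaled, so $wx + b$ quickly lands in the sigmoid's flat tails, and with `lr=0.01` the bias creeps only slowly toward the boundary the data needs. A minimal sketch, assuming standardization is the remedy, rescales $x$ to zero mean and unit variance before training. It also swaps in `torch.nn.BCEWithLogitsLoss` (sigmoid and BCE fused into one numerically stable op) and a larger learning rate of `0.1`; both are choices of this sketch, not of the original code.

```python
import torch

torch.manual_seed(0)  # make the sketch reproducible

# hire or no hire: raw scores from the note above
x_raw = torch.tensor([[65.0], [80.0], [90.0], [30.0]])
y = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

# Standardize the feature to zero mean and unit variance.
# Keep mean/std so new inputs can be scaled the same way.
mean, std = x_raw.mean(), x_raw.std()
x = (x_raw - mean) / std

model = torch.nn.Linear(1, 1)  # outputs raw logits; no sigmoid here
criterion = torch.nn.BCEWithLogitsLoss()  # applies the sigmoid internally
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(5000):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

# Convert logits to probabilities for inspection
with torch.no_grad():
    probs = torch.sigmoid(model(x))
print(probs.squeeze())
```

New inputs must be scaled with the same `mean` and `std` before they reach the model, e.g. `torch.sigmoid(model((torch.tensor([[75.0]]) - mean) / std))`.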


## Go wide

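The previous example takes a single $x$ value as input, but it doesn't have to. You can think of the input as a matrix $X$, with columns as features and rows as observations, and of $X \cdot W$ as a matrix product: with $n$ observations and $m$ features, $X$ is $n \times m$, $W$ is $m \times 1$, and $X \cdot W$ yields one output per observation.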

The basic flow is the same:

![multiple inputs](../figs/deep_wide1.png)
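A minimal sketch of the wide case, assuming made-up data with 3 features: only the input width of `torch.nn.Linear` changes, and the training flow is the same as before. `torch.nn.Sequential` with a `Sigmoid` layer is used here instead of a custom `Model` class; the two are equivalent for this purpose.

```python
import torch

torch.manual_seed(0)

# Hypothetical data: 100 observations (rows) x 3 features (columns)
X = torch.randn(100, 3)
true_w = torch.tensor([[2.0], [-1.0], [0.5]])  # made-up "true" weights
y = (X @ true_w > 0).float()                    # labels, shape (100, 1)

# Same flow as before; only the input width changes from 1 to 3.
# nn.Linear(3, 1) computes X @ W.T + b, with W of shape (1, 3).
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Sigmoid(),
)
criterion = torch.nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

for epoch in range(1000):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    accuracy = ((model(X) > 0.5).float() == y).float().mean().item()
print(f"training accuracy: {accuracy:.2f}")
```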


## Go deep

We are also not restricted to a single layer.
Apart from the input width (fixed by the number of features) and the output width (fixed by what we predict), you can add as many layers as you like in between, as seen below.

![deep](../figs/deep_wide2.png)
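A sketch of the deep case under the same made-up 3-feature setup. The input width (3) and output width (1) are pinned by the data; the two hidden layers (8 and 4 units), the `ReLU` activations between them, and the `Adam` optimizer are arbitrary choices for illustration. The hypothetical target labels points outside a sphere of radius $\sqrt{3}$, a boundary that a single linear layer followed by a sigmoid cannot represent.

```python
import torch

torch.manual_seed(0)

# Hypothetical data: 100 rows x 3 features
X = torch.randn(100, 3)
# Made-up non-linear target: 1 when the point lies outside a sphere of radius sqrt(3)
y = (X.pow(2).sum(dim=1, keepdim=True) > 3.0).float()

# Input width 3 and output width 1 are fixed; the hidden sizes are free choices.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 4),
    torch.nn.ReLU(),
    torch.nn.Linear(4, 1),
    torch.nn.Sigmoid(),
)
criterion = torch.nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
for epoch in range(500):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```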

How to construct the network in between is the million-dollar question.