# Logistic Regression

The task is to use neural network models to predict traffic demand from historical data. Here I use a single **Logistic Regression Model** with **1 hidden layer**.

First, we import the libraries needed for the task. We use **PyTorch** to build the model, **numpy** for data preprocessing, and **pyplot** for visualization.

In [1]:
import numpy as np
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torch.utils.data import TensorDataset
import matplotlib.pyplot as plt

## Data Preprocessing

### Importing Training Data

In [5]:
train = np.load('train.npz')
input_train = train['x'] #feature matrix
label_train = train['y'] #label matrix
location_train = train['locations'] #location matrix
times_train = train['times'] #time matrix

### Importing Validation Data

In [6]:
val = np.load('val.npz')
input_val = val['x'] #feature matrix
label_val = val['y'] #label matrix
location_val = val['locations'] #location matrix
times_val = val['times'] #time matrix

### Importing Testing Data

In [7]:
test = np.load('test.npz')
input_test = test['x'] #feature matrix
location_test = test['locations'] #location matrix
times_test = test['times'] #time matrix

## Changing from NumPy Arrays into Tensors

Here we create tensors from the NumPy arrays.

In [8]:
inputs_train = torch.from_numpy(input_train)
labels_train = torch.from_numpy(label_train)

inputs_val = torch.from_numpy(input_val)
labels_val = torch.from_numpy(label_val)

In [9]:
train_ds = TensorDataset(inputs_train.float(), labels_train.float())
val_ds = TensorDataset(inputs_val.float(), labels_val.float())

### Creating DataLoader

Now, we are creating dataloaders to load the data in batches (in our case we will be using batches of size 100).

Training data is often sorted by target label, or at least not randomly ordered, so it is important to draw random items for each batch. Passing `shuffle=True` to the training loader does this.

In [10]:
batch_size = 100
input_size = 8*49

train_loader = DataLoader(train_ds, batch_size, shuffle=True)
val_loader = DataLoader(val_ds, batch_size)
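
The effect of `shuffle=True` can be seen on a tiny example. This is a minimal sketch on hypothetical toy data (`toy_inputs`, `toy_labels` are not part of the traffic dataset); the seed is only there to make the shuffled order repeatable.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 10 samples whose labels are sorted (0..9).
toy_inputs = torch.arange(10, dtype=torch.float32).reshape(10, 1)
toy_labels = torch.arange(10, dtype=torch.float32)
toy_ds = TensorDataset(toy_inputs, toy_labels)

torch.manual_seed(0)  # only to make the shuffled order repeatable

ordered_loader = DataLoader(toy_ds, batch_size=5)                 # keeps dataset order
shuffled_loader = DataLoader(toy_ds, batch_size=5, shuffle=True)  # reshuffles every epoch

# First batch of the ordered loader still contains the sorted labels.
ordered_first = next(iter(ordered_loader))[1]

# One full pass over the shuffled loader: same labels, different order.
shuffled_all = torch.cat([labels for _, labels in shuffled_loader])

print('first ordered batch:', ordered_first.tolist())
print('shuffled epoch:', shuffled_all.tolist())
```

Without shuffling, every batch would contain only neighbouring labels; with shuffling, each batch is a random sample of the whole dataset, and every item is still visited exactly once per epoch.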

### Shape

Here, we can see that the items in train_loader have the shape (100, 8, 49), where 100 is the number of items in the batch. This shape does not work for our further operations, such as the matrix multiplication inside a linear layer, so we need to reshape each item into a single vector of length 8*49 = 392.

In [11]:
for items, labels in train_loader:
    print('items.shape:', items.shape)
    inputs = items.reshape(-1, 8*49)
    print('inputs.shape:', inputs.shape)
    break

items.shape: torch.Size([100, 8, 49])
inputs.shape: torch.Size([100, 392])
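
For reference, the same flattening can be written in a few equivalent ways. The sketch below uses a stand-in tensor with the same shape as one batch (not the real data) and checks that `reshape`, `flatten`, and `view` give identical results.

```python
import torch

# Stand-in batch with the same shape as one batch from train_loader.
items = torch.arange(100 * 8 * 49, dtype=torch.float32).reshape(100, 8, 49)

flat_a = items.reshape(-1, 8 * 49)      # same call as in the cell above
flat_b = items.flatten(start_dim=1)     # keeps dim 0, merges all remaining dims
flat_c = items.view(items.size(0), -1)  # works here because items is contiguous

print(flat_a.shape)
print(torch.equal(flat_a, flat_b), torch.equal(flat_a, flat_c))

# Each row is the 8 original rows of length 49 laid end to end.
print(torch.equal(flat_a[0, 49:98], items[0, 1]))
```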


The size of the hidden layer is going to be 64.

In [12]:
input_size = inputs.shape[-1]
hidden_size = 64

### Layer Creation

Next, we create an nn.Linear object that will be our hidden layer. Its size was already defined above as 64.

In [19]:
L1 = nn.Linear(input_size, hidden_size)

In [20]:
L1_outputs = L1(inputs)
print('layer1_outputs.shape:', L1_outputs.shape)

layer1_outputs.shape: torch.Size([100, 64])


In [21]:
L1_outputs_direct = inputs @ L1.weight.t() + L1.bias
L1_outputs_direct.shape

torch.Size([100, 64])

Each input vector of size 392 is now transformed into an intermediate output vector of length 64 by a matrix multiplication followed by the addition of a bias.
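
To confirm that calling the layer and writing out the matrix multiplication give the same numbers, not just the same shape, the two results can be compared directly. This is a minimal sketch on random stand-in inputs rather than the real batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
inputs = torch.randn(100, 392)  # stand-in for one flattened batch
L1 = nn.Linear(392, 64)

via_layer = L1(inputs)                           # what nn.Linear computes
via_matmul = inputs @ L1.weight.t() + L1.bias    # the same thing written out

# nn.Linear stores the weight as (out_features, in_features), hence the .t()
print(L1.weight.shape, L1.bias.shape)
print(torch.allclose(via_layer, via_matmul, atol=1e-5))
```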

### Activation Function

L1_outputs and inputs have a linear relationship: each element of L1_outputs is a weighted sum of the elements of inputs. Therefore, layer 1 on its own can only capture linear relationships. That is why we need a function between L1 and L2 that makes the overall relationship non-linear.

This kind of function is called an activation function. Common choices include Step, Tanh, ReLU, and leaky ReLU.

In our case, we will be using ReLU (Rectified Linear Unit) as the activation function. It leaves non-negative numbers unchanged and transforms negative numbers to 0. Because of the kink at 0 the function is not linear (and not differentiable at exactly 0), which is what lets the network capture non-linear relationships.

In [22]:
relu_outputs = F.relu(L1_outputs)
print('min(L1_outputs):', torch.min(L1_outputs).item())
print('min(relu_outputs):', torch.min(relu_outputs).item())

min(L1_outputs): -56.520835876464844
min(relu_outputs): 0.0
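
A small sketch of what ReLU does to individual values, and of the gradient PyTorch propagates through it. The input values are made up for illustration.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0], requires_grad=True)
y = F.relu(x)          # max(0, x) applied element-wise
y.sum().backward()     # gradient of sum(relu(x)) with respect to x

print(y.detach().tolist())  # [0.0, 0.0, 0.0, 0.5, 2.0]
print(x.grad.tolist())      # [0.0, 0.0, 0.0, 1.0, 1.0]
```

The gradient is 1 where the input is positive and 0 elsewhere, so units with negative pre-activations do not contribute to the weight update for that sample.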


### Creation of Layer 2

In [23]:
L2 = nn.Linear(hidden_size, 1)

In [24]:
L2_outputs = L2(relu_outputs)
print(L2_outputs.shape)

torch.Size([100, 1])


Layer 2 now outputs a batch of vectors of size 1, one prediction per input. We can compute the loss using the F.mse_loss (mean squared error) function and adjust the weights of L1 and L2 using gradient descent.

In [25]:
F.mse_loss(L2_outputs, labels)

tensor(470.2571, grad_fn=<MseLossBackward0>)
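
One gradient descent step on this loss could look roughly like the sketch below. It runs on random stand-in data, the learning rate `lr` is a hypothetical value, and the targets are assumed to have shape (batch, 1) so that they match the output of L2; targets of shape (batch,) would need to be reshaped first, since F.mse_loss would otherwise broadcast the two tensors against each other.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
inputs = torch.randn(100, 392)  # stand-in for one flattened batch
labels = torch.randn(100, 1)    # stand-in targets, assumed shape (batch, 1)

L1 = nn.Linear(392, 64)
L2 = nn.Linear(64, 1)
params = list(L1.parameters()) + list(L2.parameters())
lr = 1e-3  # hypothetical learning rate, chosen only for this sketch

# Forward pass and loss, then backpropagate to fill p.grad for every parameter.
loss_before = F.mse_loss(L2(F.relu(L1(inputs))), labels)
loss_before.backward()

# Plain gradient descent: move each parameter against its gradient.
with torch.no_grad():
    for p in params:
        p -= lr * p.grad
        p.grad.zero_()  # reset so the next backward() starts from zero

loss_after = F.mse_loss(L2(F.relu(L1(inputs))), labels)
print(loss_before.item(), loss_after.item())
```

In practice the same update is usually delegated to an optimizer such as `torch.optim.SGD`, which performs the parameter update and gradient reset shown in the loop.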

In [26]:
# Expanded version of L2(F.relu(L1(inputs)))
outputs = (F.relu(inputs @ L1.weight.t() + L1.bias)) @ L2.weight.t() + L2.bias

In [27]:
# Same as L2(L1(inputs)), i.e. the two layers without the activation in between
outputs2 = (inputs @ L1.weight.t() + L1.bias) @ L2.weight.t() + L2.bias

In [30]:
# Create a single layer to replace the two linear layers
combined_layer = nn.Linear(input_size, 1)

combined_layer.weight.data = L2.weight @ L1.weight
combined_layer.bias.data = L1.bias @ L2.weight.t() + L2.bias

In [31]:
# Same as combined_layer(inputs)
outputs3 = inputs @ combined_layer.weight.t() + combined_layer.bias
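
The point of the last cells is that two linear layers without an activation in between collapse into one linear layer, while the version with ReLU does not. The following sketch checks this numerically on random stand-in inputs; the names `W` and `b` are introduced here only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
inputs = torch.randn(100, 392)  # stand-in for one flattened batch
L1 = nn.Linear(392, 64)
L2 = nn.Linear(64, 1)

with torch.no_grad():
    # Weight and bias of the single equivalent layer.
    W = L2.weight @ L1.weight              # shape (1, 392)
    b = L1.bias @ L2.weight.t() + L2.bias  # shape (1,)

    stacked = L2(L1(inputs))               # two linear layers, no activation
    collapsed = inputs @ W.t() + b         # one linear layer
    with_relu = L2(F.relu(L1(inputs)))     # two layers with ReLU in between

print(torch.allclose(stacked, collapsed, atol=1e-5))    # the layers collapse
print(torch.allclose(with_relu, collapsed, atol=1e-5))  # ReLU prevents this
```

This follows from expanding (x W1ᵀ + b1) W2ᵀ + b2 = x (W2 W1)ᵀ + (b1 W2ᵀ + b2), which is again a single affine map of x.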