Let's continue with the PyTorch regression example to illustrate an important concept: how well PyTorch methods scale, particularly when applied to large datasets with many rows and/or many feature columns.

In [1]:
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset

## Creating a GPU environment

We can create a GPU environment on Google Colab without much trouble. The following logic selects the GPU when one is available and falls back to the CPU otherwise, so it should also work on a laptop equipped with a CUDA-capable GPU. In Google Colab, go to **Runtime -> Change runtime type** and select a GPU.

In [2]:
if torch.cuda.is_available():
  device = torch.device("cuda")
else:
  device = torch.device("cpu")
print(device)

cuda
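Selecting a device does not move anything by itself; tensors and models have to be placed on it explicitly. The following is a minimal sketch of that step, using made-up toy shapes and hypothetical names (`x_toy`, `toy_model`) rather than the dataset used in this notebook.

```python
import torch
import torch.nn as nn

# Pick the GPU when one is visible, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy shapes chosen only for illustration
x_toy = torch.randn(4, 3).to(device)      # move the input tensor
toy_model = nn.Linear(3, 1).to(device)    # move the model parameters

# Inputs and parameters now live on the same device, so this call works
out = toy_model(x_toy)
print(out.shape, out.device)
```

A model and its inputs must be on the same device, otherwise PyTorch raises a runtime error.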


### Creating a Regression Data Set

You can create a regression data set quite easily using scikit-learn, although generating it could take a while depending on where you run this code. The idea here is to create a larger dataset with many features that mimics the at-scale data sets you might encounter in real-life situations.

In [3]:
# n_features is the number of columns
n_features = 5000

# check the scikit-learn doc on make_regression
%time X, y = make_regression(n_samples=60000,n_features=n_features)
print("The shape of X is:",X.shape)

CPU times: user 17.6 s, sys: 1.18 s, total: 18.8 s
Wall time: 18.4 s
The shape of X is: (60000, 5000)
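To get a feel for what `make_regression` produces, it can help to run it on a much smaller problem. The sketch below uses illustrative sizes and hypothetical variable names (`X_small`, `y_small`, `true_coef`); passing `coef=True` also returns the coefficients used to build the target. With the default `noise=0.0` and `bias=0.0`, the target is an exact linear combination of the features, which is why a linear model can fit this kind of data almost perfectly.

```python
from sklearn.datasets import make_regression
import numpy as np

# Small illustrative sizes; coef=True also returns the true coefficients
X_small, y_small, true_coef = make_regression(
    n_samples=200, n_features=10, coef=True, random_state=0
)

print(X_small.shape, y_small.shape, true_coef.shape)

# With the default noise=0.0 and bias=0.0 the target is an exact linear
# combination of the feature columns
print(np.allclose(X_small @ true_coef, y_small))
```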


Now we could use the default `LinearRegression` from scikit-learn, which uses the OLS method. Note that this might take several minutes on a typical Google Colab environment. In fact, it might not complete at all due to memory constraints.

In [8]:
# This could take a while
from sklearn.linear_model import LinearRegression
%time reg = LinearRegression().fit(X, y)

CPU times: user 5min 13s, sys: 6.31 s, total: 5min 20s
Wall time: 2min 44s


Now we can implement a way to measure error in the regression model. We'll use the standard MSE (Mean Squared Error), which can easily be computed using some simple numpy functions.

In [9]:
preds = reg.predict(X)

def my_mse(actual, pred):
  # Mean of the squared differences between actual and predicted values
  mse = np.mean((actual - pred)**2)
  return mse

# The regression is pretty good
print(my_mse(y,preds))

9.666648443871006e-25
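As a quick sanity check of the MSE computation, here is a small worked example with made-up values (the function name `mse_example` and the numbers are purely illustrative). The three errors are 0, 0 and 2, so the mean of the squared errors is (0 + 0 + 4) / 3.

```python
import numpy as np

def mse_example(actual, pred):
    """Mean of the squared differences between two equal-length arrays."""
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return np.mean((actual - pred) ** 2)

# Hypothetical toy values: errors are 0, 0 and 2, so MSE = (0 + 0 + 4) / 3
toy_mse = mse_example([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(toy_mse)
```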


Next up, let's turn the data into numpy arrays and then into tensors. You've seen this before so it shouldn't be a big surprise. 

In [4]:
import numpy as np
X_numpy = np.array(X, dtype='float')
y_numpy = np.array(y, dtype='float')

# Make sure you have torch imported else this will fail
X = torch.from_numpy(X_numpy.astype(np.float32))
y = torch.from_numpy(y_numpy.astype(np.float32))

# Reshape y into a column vector and capture the dimensions of the original data
y = y.view(y.shape[0],1)
n_samples, n_features = X_numpy.shape
input_size = n_features

#y_numpy = y_numpy.view(y_numpy.shape[0],1)
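Two details in this conversion are easy to overlook: the cast to `float32` (the default dtype of `nn.Linear` parameters) and the reshape of the target into a column. The sketch below uses tiny made-up arrays and hypothetical names to show both, including what happens when a column of predictions is subtracted from a target that was left one-dimensional.

```python
import numpy as np
import torch

arr64 = np.arange(6, dtype=np.float64).reshape(3, 2)   # toy features
target64 = np.array([1.0, 2.0, 3.0])                   # toy target

# Cast to float32 before handing the arrays to torch
features_t = torch.from_numpy(arr64.astype(np.float32))
target_t = torch.from_numpy(target64.astype(np.float32))
print(features_t.dtype, target_t.shape)

# Reshape the 1-D target into a column so it lines up with model output
target_col = target_t.view(target_t.shape[0], 1)
print(target_col.shape)

# Without the reshape, (3, 1) minus (3,) broadcasts to a (3, 3) matrix,
# which would silently distort a loss computed from it
broadcast_shape = (target_col - target_t).shape
print(broadcast_shape)
```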


Now we can use the `TensorDataset` class to bring the tensors together into a torch dataset. `TensorDataset` provides a way to create a dataset out of data that is already loaded into memory.


In [5]:
# Define a tensor dataset
train_ds = TensorDataset(X, y)

In [16]:
train_ds

<torch.utils.data.dataset.TensorDataset at 0x7fb9c370f510>
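Printing the dataset object only shows its memory address, so it is more informative to index into it. A minimal sketch with made-up toy tensors (names such as `toy_ds` are hypothetical): each item is a tuple holding one row of features and the matching target.

```python
import torch
from torch.utils.data import TensorDataset

# Toy tensors: 4 rows, 3 feature columns, one target column
features = torch.arange(12, dtype=torch.float32).view(4, 3)
targets = torch.arange(4, dtype=torch.float32).view(4, 1)
toy_ds = TensorDataset(features, targets)

# The dataset length is the number of rows
print(len(toy_ds))

# Indexing returns a (features, target) tuple for that row
xb, yb = toy_ds[0]
print(xb, yb)
```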

Now we can initialize some weights and bias values that will seed our model. Note that the weights and biases will be recomputed as we move forward and backward through the model. Torch has tools to help us do this without much trouble.

In [6]:
# Weights and biases
w = torch.randn(1, n_features, requires_grad=True)
b = torch.randn(1, requires_grad=True)
print(w)
print(b)

tensor([[ 0.1269, -0.0605, -0.5112,  ..., -0.9342, -0.2371,  0.0181]],
       requires_grad=True)
tensor([-0.7067], requires_grad=True)


Here is where we can implement the matrix (tensor) operations necessary to solve the problem. Note that the first time we do this, we are likely to see a high error, which is to say a high MSE value. That's okay because we will use the partial derivatives of the loss function with respect to the weights and biases to recompute them, with the goal of minimizing the MSE. This is gradient descent: a numerical method that repeatedly steps against the gradient toward the point where the MSE is at a minimum.
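For a linear model with MSE loss, those partial derivatives have a simple closed form: the gradient with respect to the weights is `(2/n) * residual.T @ X` and the gradient with respect to the bias is `(2/n) * sum(residual)`, where `residual = preds - y`. The sketch below, on a small made-up problem with hypothetical variable names, checks that autograd produces the same values.

```python
import torch

torch.manual_seed(0)

# Toy problem: 8 rows, 3 features
X_toy = torch.randn(8, 3)
y_toy = torch.randn(8, 1)
w_toy = torch.randn(1, 3, requires_grad=True)
b_toy = torch.randn(1, requires_grad=True)

# Forward pass and MSE loss, then let autograd compute the gradients
preds_toy = X_toy @ w_toy.t() + b_toy
loss_toy = torch.mean((preds_toy - y_toy) ** 2)
loss_toy.backward()

# Closed-form gradients of the MSE for a linear model
n = X_toy.shape[0]
residual = (preds_toy - y_toy).detach()
manual_w_grad = (2.0 / n) * residual.t() @ X_toy      # shape (1, 3)
manual_b_grad = ((2.0 / n) * residual.sum()).reshape(1)

print(torch.allclose(w_toy.grad, manual_w_grad, atol=1e-5))
print(torch.allclose(b_toy.grad, manual_b_grad, atol=1e-5))
```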

In [7]:
def model(x):
    return x @ w.t() + b

import torch.nn as nn

In [8]:
# Generate a first set of predictions
preds = model(X)
print(preds)

tensor([[ 47.3999],
        [ 12.1222],
        [  7.7041],
        ...,
        [ 14.2788],
        [-24.1895],
        [-37.9776]], grad_fn=<AddBackward0>)


In [9]:
# MSE loss
def mse(t1, t2):
    diff = t1 - t2
    return (torch.sum(diff * diff) / diff.numel())

# Compute loss
loss = mse(preds, y)
print(loss)

tensor(53256.7812, grad_fn=<DivBackward0>)


In [10]:
# Train for 100 epochs
for i in range(100):
    preds = model(X)
    loss = mse(preds, y)
    print(loss,"\n")
    loss.backward()
    with torch.no_grad():
        # Step size (learning rate) of 0.25; 1e-5 is a far more conservative alternative
        w -= w.grad * 0.25
        b -= b.grad * 0.25
        w.grad.zero_()
        b.grad.zero_()

tensor(53256.7812, grad_fn=<DivBackward0>) 

tensor(12124.2549, grad_fn=<DivBackward0>) 

tensor(3706.6060, grad_fn=<DivBackward0>) 

tensor(1336.5488, grad_fn=<DivBackward0>) 

tensor(530.9936, grad_fn=<DivBackward0>) 

tensor(224.4765, grad_fn=<DivBackward0>) 

tensor(99.0591, grad_fn=<DivBackward0>) 

tensor(45.1084, grad_fn=<DivBackward0>) 

tensor(21.0396, grad_fn=<DivBackward0>) 

tensor(10.0010, grad_fn=<DivBackward0>) 

tensor(4.8273, grad_fn=<DivBackward0>) 

tensor(2.3598, grad_fn=<DivBackward0>) 

tensor(1.1660, grad_fn=<DivBackward0>) 

tensor(0.5813, grad_fn=<DivBackward0>) 

tensor(0.2921, grad_fn=<DivBackward0>) 

tensor(0.1478, grad_fn=<DivBackward0>) 

tensor(0.0752, grad_fn=<DivBackward0>) 

tensor(0.0385, grad_fn=<DivBackward0>) 

tensor(0.0198, grad_fn=<DivBackward0>) 

tensor(0.0102, grad_fn=<DivBackward0>) 

tensor(0.0053, grad_fn=<DivBackward0>) 

tensor(0.0028, grad_fn=<DivBackward0>) 

tensor(0.0014, grad_fn=<DivBackward0>) 

tensor(0.0008, grad_fn=<DivBackward
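The `w.grad.zero_()` and `b.grad.zero_()` calls in the loop matter because PyTorch adds each new gradient to whatever is already stored in `.grad`. A minimal sketch with a made-up two-element tensor (`w_demo` is a hypothetical name) shows the accumulation:

```python
import torch

w_demo = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d/dw sum(w^2) = 2w
(w_demo ** 2).sum().backward()
first = w_demo.grad.clone()

# Second backward pass without zeroing: the new gradient is added on top
(w_demo ** 2).sum().backward()
second = w_demo.grad.clone()

# Resetting the stored gradient in place
w_demo.grad.zero_()
print(first, second, w_demo.grad)
```

Skipping the reset would make every update step use the sum of all gradients seen so far rather than the current one.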

In [12]:
from torch.utils.data import TensorDataset
# Define dataset
train_ds = TensorDataset(X, y)
train_ds[0:3]

(tensor([[-0.0288,  0.3790, -0.7543,  ...,  0.5759,  0.9869, -0.7461],
         [-2.0790, -0.5753,  1.4763,  ..., -1.3802, -0.2418,  1.5581],
         [-1.4013,  1.7600,  0.5335,  ...,  0.6989, -0.5572, -0.0935]]),
 tensor([[ 235.2787],
         [-352.9172],
         [ -39.0413]]))

In [13]:
from torch.utils.data import DataLoader
# Define data loader
batch_size = 1000
train_dl = DataLoader(train_ds, batch_size, shuffle=True)
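With 60000 rows and a batch size of 1000, the loader yields 60 batches per epoch, so the parameters are updated 60 times per pass over the data. The sketch below illustrates the batching behaviour on a tiny made-up dataset (10 rows, batch size 4, hypothetical names), where the last batch is smaller than the others.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy data: 10 rows, 3 features
toy_X = torch.randn(10, 3)
toy_y = torch.randn(10, 1)
toy_dl = DataLoader(TensorDataset(toy_X, toy_y), batch_size=4, shuffle=True)

# len() of a DataLoader is the number of batches per epoch
print(len(toy_dl))

# Collect the shapes of each (features, target) batch
batch_shapes = [(xb.shape, yb.shape) for xb, yb in toy_dl]
for x_shape, y_shape in batch_shapes:
    print(x_shape, y_shape)
```

Because `shuffle=True`, the rows that land in each batch change from epoch to epoch, while the batch sizes stay the same.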

In [14]:
# Define model

model = nn.Linear(n_features, 1)
print(model.weight)
print(model.bias)

# Parameters
list(model.parameters())

# Import nn.functional
import torch.nn.functional as F

# Define loss function
loss_fn = F.mse_loss

# reduction="none" returns one squared error per row instead of their mean
loss = loss_fn(model(X), y, reduction="none")

Parameter containing:
tensor([[-0.0133,  0.0123, -0.0079,  ...,  0.0078,  0.0055,  0.0043]],
       requires_grad=True)
Parameter containing:
tensor([-0.0065], requires_grad=True)
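`nn.Linear` performs the same computation as the hand-written `x @ w.t() + b` model, with the weights and bias stored as parameters of the layer. The sketch below, on made-up toy tensors with hypothetical names, checks that equivalence and also shows the difference between `reduction="none"` and the default `reduction="mean"` in `F.mse_loss`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

layer = nn.Linear(3, 1)          # weight has shape (1, 3), bias has shape (1,)
x_toy = torch.randn(5, 3)
target_toy = torch.randn(5, 1)

# The layer output matches the explicit matrix formula
manual = x_toy @ layer.weight.t() + layer.bias
print(torch.allclose(layer(x_toy), manual, atol=1e-6))

# reduction="none" keeps one squared error per element
per_element = F.mse_loss(layer(x_toy), target_toy, reduction="none")

# The default reduction="mean" averages them into a single scalar
mean_loss = F.mse_loss(layer(x_toy), target_toy)
print(per_element.shape, mean_loss.shape)
```

Only a scalar loss can be passed to `backward()` without extra arguments, which is why the training function relies on the default reduction.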


In [15]:
opt = torch.optim.SGD(model.parameters(), lr=0.02)

def fit(num_epochs, model, loss_fn, opt, train_dl):
    
    # Repeat for given number of epochs
    for epoch in range(num_epochs):
        
        # Train with batches of data
        for xb,yb in train_dl:
            
            # 1. Generate predictions
            pred = model(xb)
            
            # 2. Calculate loss
            loss = loss_fn(pred, yb)
            
            # 3. Compute gradients
            loss.backward()
            
            # 4. Update parameters using gradients
            opt.step()
            
            # 5. Reset the gradients to zero
            opt.zero_grad()
        
        # Print the progress, then stop at the first 10-epoch checkpoint
        # where the last batch loss is still at or above 1e-12
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))
        if (epoch+1) % 10 == 0 and loss >= 1e-12:
            break

%time fit(100, model, loss_fn, opt, train_dl)

Epoch [1/100], Loss: 662.2699
Epoch [2/100], Loss: 16.5256
Epoch [3/100], Loss: 0.6199
Epoch [4/100], Loss: 0.0288
Epoch [5/100], Loss: 0.0016
Epoch [6/100], Loss: 0.0001
Epoch [7/100], Loss: 0.0000
Epoch [8/100], Loss: 0.0000
Epoch [9/100], Loss: 0.0000
Epoch [10/100], Loss: 0.0000
CPU times: user 7.87 s, sys: 97.6 ms, total: 7.97 s
Wall time: 7.98 s


In [16]:
# Note that you could use the OOP approach to define the model if desired.
# This class name shadows scikit-learn's LinearRegression imported earlier.
import torch.nn as nn
class LinearRegression(nn.Module):
  def __init__(self, in_features, out_features):
    super().__init__()
    self.linear = nn.Linear(in_features, out_features)

  def forward(self, x):
    prediction = self.linear(x)
    return prediction

In [17]:
# Define model
model = LinearRegression(n_features, 1)

# Parameters
list(model.parameters())

# Import nn.functional
import torch.nn.functional as F

# Define loss function
loss_fn = F.mse_loss

# reduction="none" returns one squared error per row instead of their mean
loss = loss_fn(model(X), y, reduction="none")

In [18]:
opt = torch.optim.SGD(model.parameters(), lr=0.02)

def fit(num_epochs, model, loss_fn, opt, train_dl):
    
    # Repeat for given number of epochs
    for epoch in range(num_epochs):
        
        # Train with batches of data
        for xb,yb in train_dl:
            
            # 1. Generate predictions
            pred = model(xb)
            
            # 2. Calculate loss
            loss = loss_fn(pred, yb)
            
            # 3. Compute gradients
            loss.backward()
            
            # 4. Update parameters using gradients
            opt.step()
            
            # 5. Reset the gradients to zero
            opt.zero_grad()
        
        # Print the progress, then stop at the first 10-epoch checkpoint
        # where the last batch loss is still at or above 1e-12
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))
        if (epoch+1) % 10 == 0 and loss >= 1e-12:
            break

%time fit(100, model, loss_fn, opt, train_dl)

Epoch [1/100], Loss: 613.2085
Epoch [2/100], Loss: 17.2481
Epoch [3/100], Loss: 0.6461
Epoch [4/100], Loss: 0.0315
Epoch [5/100], Loss: 0.0016
Epoch [6/100], Loss: 0.0001
Epoch [7/100], Loss: 0.0000
Epoch [8/100], Loss: 0.0000
Epoch [9/100], Loss: 0.0000
Epoch [10/100], Loss: 0.0000
CPU times: user 8.36 s, sys: 101 ms, total: 8.46 s
Wall time: 8.45 s
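To connect the gradient-based fit back to the OLS approach used at the start, the following sketch compares the two on a small, noise-free toy problem. All names and sizes here are made up for illustration, and it uses full-batch updates rather than the mini-batches above; it is a sanity check under those assumptions, not a benchmark.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy problem: 200 rows, 5 features, noise-free linear target
X_toy = torch.randn(200, 5)
true_w = torch.tensor([[1.0], [-2.0], [0.5], [3.0], [0.0]])
y_toy = X_toy @ true_w

# Closed-form least-squares solution
w_closed = torch.linalg.lstsq(X_toy, y_toy).solution

# Gradient-descent solution using the same building blocks as fit()
toy_model = nn.Linear(5, 1)
toy_opt = torch.optim.SGD(toy_model.parameters(), lr=0.1)
for _ in range(500):
    loss = F.mse_loss(toy_model(X_toy), y_toy)
    loss.backward()
    toy_opt.step()
    toy_opt.zero_grad()

# Compare the learned weights (as a column) with the closed-form solution
w_sgd = toy_model.weight.detach().t()
print(torch.allclose(w_closed, w_sgd, atol=1e-3))
```

Both routes should arrive at essentially the same coefficients on data like this; the practical difference is how their cost grows with the number of rows and feature columns.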
