<a href="http://cocl.us/pytorch_link_top">
    <img src="https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0110EN/notebook_images%20/Pytochtop.png" width="750" alt="IBM Product " />
</a> 

<img src="https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0110EN/notebook_images%20/cc-logo-square.png" width="200" alt="cognitiveclass.ai logo" />

<h1>Data Preprocessing and Gradient Descent </h1> 

<h2>Table of Contents</h2>
<p>This lab shows how data normalization, standardization, decorrelation (Principal Component Analysis), whitening, and Zero-Phase Component Analysis affect convergence in parameter space. The simulations are based on the paper Efficient BackProp by Yann A. LeCun, Léon Bottou, Genevieve B. Orr, and Klaus-Robert Müller.</p>

<ul>
    <li><a href="#Auxiliary">Auxiliary Functions and Classes </a></li>
    <li><a href="#PyTorch_Classes"> Define the PyTorch Classes </a></li>
    <li><a href="#No_Transform">Data with No Pre-processing </a></li>
    <li><a href="#Standardize_Data">Standardize Data </a></li>
    <li><a href="#PCA">PCA </a></li>
    <li><a href="#Whitening">Whitening</a></li>
    <li><a href="#ZCA">Zero-Phase Component Analysis</a></li>
</ul>

<p>Estimated Time Needed: <strong>30 min</strong></p>

<hr>

<h2 id="Auxiliary">Auxiliary Functions and Classes </h2>

We'll need the following libraries for plotting:

In [None]:
# These are the libraries we are going to use in the lab.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
import seaborn as sns

The class <code>plot_error_surfaces</code> is just to help you visualize the data space and the parameter space during training and has nothing to do with PyTorch. 

In [None]:
# Class for plotting

class plot_error_surfaces(object):
    
    # Constructor
    def __init__(self, w_range, b_range, X, Y, n_samples = 40, go = True):
        W = np.linspace(-w_range, w_range, n_samples)
        B = np.linspace(-b_range, b_range, n_samples)
        w, b = np.meshgrid(W, B)    
        Z = np.zeros((n_samples, n_samples))
        count1 = 0
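        # Flatten Y from (N, 1) to (N,) so the residual below stays (N,) instead of broadcasting to (N, N)
        self.y = Y.numpy().ravel()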
        self.x = X.numpy()
        for w1, b1 in zip(w, b):
            count2 = 0
            for w2, b2 in zip(w1, b1):
                Z[count1, count2] = np.mean((self.y - w2 * self.x[:,0] - b2*self.x[:,1]) ** 2)
                count2 += 1
            count1 += 1
        self.Z = Z
        self.w = w
        self.b = b
        self.W = []
        self.B = []
        self.LOSS = []
        self.n = 0
        if go:
            # 3D view of the loss surface over the two weights
            plt.figure(figsize = (7.5, 5))
            plt.axes(projection = '3d').plot_surface(self.w, self.b, self.Z, rstride = 1, cstride = 1, cmap = 'viridis', edgecolor = 'none')
            plt.title('Loss Surface')
            plt.xlabel('$w_{1}$')
            plt.ylabel('$w_{2}$')
            plt.show()
            # Contour view of the same surface
            plt.figure()
            plt.title('Loss Surface Contour')
            plt.xlabel('$w_{1}$')
            plt.ylabel('$w_{2}$')
            plt.contour(self.w, self.b, self.Z)
            plt.show()
            
    # Setter
    def set_para_loss(self, model, loss):
        self.n = self.n + 1
        self.LOSS.append(loss)
        self.W.append(model.state_dict()['linear.weight'][0][0].item())
        self.B.append(model.state_dict()['linear.weight'][0][1].item())
    
    # Plot diagram
    def final_plot(self): 
        ax = plt.axes(projection = '3d')
        ax.plot_wireframe(self.w, self.b, self.Z)
        ax.scatter( self.W,self.B,self.LOSS, c = 'r', marker = 'x', s = 200, alpha = 1)
        plt.figure()
        plt.contour( self.w,self.b, self.Z)
        plt.scatter(self.W,self.B,   c = 'r', marker = 'x')
        for i in range(len(self.W)):
            plt.annotate(str(i), ( self.W[i],self.B[i]))
        plt.xlabel('$w_{1}$')
        plt.ylabel('$w_{2}$')
        plt.show()
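
The nested loops in the constructor evaluate the mean squared error at every point of a grid of weight pairs. As a minimal sketch, the same grid can be computed with NumPy broadcasting. The function name `loss_surface` and the toy data are assumptions made for illustration; they are not part of the lab.

```python
import numpy as np

def loss_surface(X, y, w_range, n_samples=40):
    """Mean squared error of y ~ w1*x1 + w2*x2 on a grid of (w1, w2) values."""
    grid = np.linspace(-w_range, w_range, n_samples)
    w1, w2 = np.meshgrid(grid, grid)                  # each of shape (n, n)
    # Broadcasting gives predictions of shape (n, n, N): one row of N per grid point
    pred = w1[..., None] * X[:, 0] + w2[..., None] * X[:, 1]
    # y is flattened to (N,) so the residual keeps shape (n, n, N)
    return w1, w2, np.mean((y.ravel() - pred) ** 2, axis=-1)

# Toy data (hypothetical): noiseless targets generated with weights (-1, 1)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([-1.0, 1.0])

w1, w2, Z = loss_surface(X, y, w_range=2.0, n_samples=5)
i, j = np.unravel_index(Z.argmin(), Z.shape)
print("grid minimum at w1 =", w1[i, j], "w2 =", w2[i, j])
```

Because the targets are noiseless, the smallest value on the grid sits exactly at the generating weights.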

This function draws a scatter plot of several datasets and a heatmap of each dataset's correlation matrix. The input is a dictionary: each key is the name of a dataset, and each value is a PyTorch dataset object.

In [None]:
def plotData(datasets):
    # Scatter plot of the two features of every dataset on shared axes
    for name, dataset in datasets.items():
        plt.scatter(dataset.X.numpy()[:, 0], dataset.X.numpy()[:, 1], label=name)
    plt.legend()
    plt.show()

    # One correlation heatmap per dataset
    for name, dataset in datasets.items():
        df = pd.DataFrame(dataset.X.numpy())
        corr = df.corr()
        sns.heatmap(corr, annot=True)
        plt.title('correlation of {} data'.format(name))
        plt.show()
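
The heatmaps show the Pearson correlation between the two features. As a rough sketch of what those numbers mean, the snippet below builds two features that grow together, much like the data used in this lab, and computes the same kind of matrix with `np.corrcoef`. The variable names and the random seed are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two features that both increase along the sample index, plus unit-variance noise
x1 = np.linspace(-2, 2, 100) + rng.normal(size=100)
x2 = np.linspace(-10, 10, 100) + rng.normal(size=100)
X = np.column_stack([x1, x2])

# rowvar=False treats each column as a variable and each row as an observation
corr = np.corrcoef(X, rowvar=False)
print(corr)
```

The diagonal is always 1, and a large off-diagonal entry indicates that the features are strongly correlated.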

<!--Empty Space for separating topics-->

<h2 id="PyTorch_Classes">Define the PyTorch Classes</h2>

Import the PyTorch libraries and set the random seed.

In [None]:
# Import libraries and set random seed

import torch
from torch.utils.data import Dataset, DataLoader
from torch import nn, optim
torch.manual_seed(1)

Create a dataset object; this object has the option to apply the appropriate transform. 

In [None]:
# Create Data Class

class Data(Dataset):
    
    # Constructor
    def __init__(self,range_x1=2,range_x2=10,samples=100,standardiz=False,PCA=False,whitening=False,zero_phase=False):

        self.W=torch.tensor([[-1.0],[1.0]])
        self.X=torch.zeros(samples,2)
        self.X[:,0] = torch.linspace(start = -1*range_x1, end= range_x1, steps =samples)
        self.X[:,1] = torch.linspace(start = -1*range_x2, end= range_x2, steps =samples)
        self.X=self.X+torch.randn(samples,2)*1
        self.Y = torch.mm(self.X,self.W)+torch.randn(samples,1)
        if standardiz==True:
            self.X=torch.mm(self.X-self.X.mean(dim=0),torch.eye(2)/self.X.std(dim=0)) 
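        if PCA==True:
            # Zero-center the data, then project it onto the eigenvectors of the covariance matrix
            self.X=self.X-self.X.mean(dim=0)
            self.Cov=torch.mm(torch.t(self.X),self.X)/self.X.shape[0]
            # torch.eig has been removed from recent PyTorch releases; the covariance is symmetric, so eigh applies
            self.eigenvalues,self.eigenvectors=torch.linalg.eigh(self.Cov)
            self.X=torch.mm(self.X,self.eigenvectors)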
            
        if whitening==True:
            self.X=self.X-self.X.mean(dim=0)
            self.Cov=torch.mm(torch.t(self.X),self.X)/self.X.shape[0]
            # torch.eig has been removed from recent PyTorch releases; eigh returns a 1-D tensor of real eigenvalues
            self.eigenvalues,self.eigenvectors=torch.linalg.eigh(self.Cov)
            # Scale each principal direction by 1/sqrt(eigenvalue) so both have unit variance
            self.diag=torch.eye(2)
            self.diag[0,0]=self.eigenvalues[0]**(-1/2)
            self.diag[1,1]=self.eigenvalues[1]**(-1/2)
            self.whitening_transform=torch.mm(self.eigenvectors,self.diag)
            self.X=torch.mm(self.X,self.whitening_transform)
            
            
        if zero_phase==True:
            self.X=self.X-self.X.mean(dim=0)
            self.Cov=torch.mm(torch.t(self.X),self.X)/self.X.shape[0]
            # torch.eig has been removed from recent PyTorch releases; eigh returns a 1-D tensor of real eigenvalues
            self.eigenvalues,self.eigenvectors=torch.linalg.eigh(self.Cov)
            self.diag=torch.eye(2)
            self.diag[0,0]=self.eigenvalues[0]**(-1/2)
            self.diag[1,1]=self.eigenvalues[1]**(-1/2)
            self.whitening_transform=torch.mm(self.eigenvectors,self.diag)
            self.X=torch.mm(self.X,self.whitening_transform)
            # Rotate the whitened data back into the original coordinate system
            self.X=torch.mm(self.X,torch.t(self.eigenvectors))
        self.len=samples
        
        
    # Getter
    def __getitem__(self,index):    
        return self.X[index],self.Y[index]
    
    # Get Length
    def __len__(self):
        return self.len

Define the loss function (mean squared error):

In [None]:
criterion = nn.MSELoss()

Define the linear regression class:

In [None]:
from torch import nn, optim

class linear_regression(nn.Module):
    
    # Constructor
    def __init__(self, input_size, output_size):
        super(linear_regression, self).__init__()
        self.linear = nn.Linear(input_size, output_size)
        
    # Prediction
    def forward(self, x):
        yhat = self.linear(x)
        return yhat

Set the learning rate and batch size, and create a dictionary to store the loss at each iteration.

In [None]:
training_loss={'regular':[],'standardize':[],'pca':[],'whitening':[],'zca':[]}
#learning rate
lr=0.01 
batch_size=1

<h2 id="#No_Transform">  Data with No Pre-processing  </h2>
In this section, we create data with no preprocessing. If $\mathbf{\hat{x}}$ is the transformed feature vector and $\mathbf{x}$ is the original feature vector, then in this case our data is given by:

 $\mathbf{\hat{x}}=\mathbf{x}$ 

In [None]:
# Create dataset object
dataset = Data(range_x1=2,range_x2=10)
dataset_standardize = Data(range_x1=2,range_x2=10,standardiz=True)
dataset_pca = Data(range_x1=2,range_x2=10,PCA=True)
dataset_whitening= Data(range_x1=2,range_x2=10,whitening=True)
dataset_zca=Data(range_x1=2,range_x2=10,zero_phase=True )

Plot the data with no preprocessing:

In [None]:
datasets={'regular dataset':dataset}        
plotData(datasets)

Create a linear regression object and initialize its weights so that they are relatively far from the minimum. We also create an optimizer object and a data loader object.

In [None]:
# Create the model, set the starting weights, and build the data loader and optimizer
model=linear_regression(2,1)
start_w1=-15.0
start_w2=20.0
model.state_dict()['linear.weight'][0][0] = start_w1
model.state_dict()['linear.weight'][0][1] = start_w2
model.state_dict()['linear.bias'][0]=0.0
trainloader = DataLoader(dataset = dataset, batch_size = 1)
optimizer = optim.SGD(model.parameters(), lr=lr)

Create a plotting object. It is not part of PyTorch and is only used to help visualize the training.

In [None]:
get_surface = plot_error_surfaces(20,20, dataset.X, dataset.Y, 30, go = True)

<!--Empty Space for separating topics-->

Run 10 epochs of stochastic gradient descent:

In [None]:
for epoch in range(10):
    for x,y in trainloader:
        yhat = model(x)
        loss = criterion(yhat, y)
        training_loss['regular'].append(loss.item())
        get_surface.set_para_loss(model, loss.tolist())          
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
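
With a batch size of 1, each pass through the inner loop applies one stochastic gradient descent update. A minimal sketch of that update, written by hand in NumPy for a model without a bias term (a simplifying assumption; the lab's model also has a bias), looks like this. The function name `sgd_step` and the sample values are hypothetical.

```python
import numpy as np

def sgd_step(w, x, y, lr):
    """One SGD update for the squared error (w . x - y)^2 on a single sample."""
    error = w @ x - y
    grad = 2.0 * error * x        # derivative of the squared error with respect to w
    return w - lr * grad

w = np.array([-15.0, 20.0])       # same starting weights as in the lab
x = np.array([1.0, 2.0])          # one made-up sample
y = 1.0

w_new = sgd_step(w, x, y, lr=0.01)
print(w_new)
```

The size of the step along each weight is proportional to the corresponding feature value, which is why features with very different scales produce uneven progress in parameter space.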

Plot the loss surface and overlay the values of the parameters obtained via gradient descent in red.

In [None]:
get_surface.final_plot()
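
How elongated the contours are can be summarized with one number. For a linear model without a bias term, the Hessian of the mean squared error is $\frac{2}{N}\mathbf{X}^T\mathbf{X}$, and the ratio of its largest to smallest eigenvalue (the condition number) measures how stretched the loss surface is. The sketch below compares raw and standardized features. The function name and the synthetic data are assumptions that only imitate the lab's dataset.

```python
import numpy as np

def hessian_condition_number(X):
    """Ratio of largest to smallest eigenvalue of the MSE Hessian (2/N) X^T X."""
    H = 2.0 * X.T @ X / X.shape[0]
    eigenvalues = np.linalg.eigvalsh(H)   # H is symmetric; values come back in ascending order
    return eigenvalues[-1] / eigenvalues[0]

rng = np.random.default_rng(0)
X_raw = np.column_stack([
    np.linspace(-2, 2, 100),
    np.linspace(-10, 10, 100),
]) + rng.normal(size=(100, 2))
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

cond_raw = hessian_condition_number(X_raw)
cond_std = hessian_condition_number(X_std)
print("raw:", cond_raw, "standardized:", cond_std)
```

A condition number close to 1 corresponds to nearly circular contours. Standardizing lowers the ratio here, but it stays above 1 because the two features remain correlated.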

<h2 id="#Standardize_Data "> Standardize Data </h2>

In this section, we standardize the data $\mathbf{x}$. This is equivalent to the following matrix operation:

$\quad
    \mathbf{D}= \begin{pmatrix} \sigma_1 & 0 \\
                             0  & \sigma_2 \end{pmatrix}  $

$\mathbf{\hat{x}}=\mathbf{D}^{-1}(\mathbf{x}-\boldsymbol\mu)$

where $\boldsymbol\mu$ is the mean and $\sigma_i$ is the standard deviation of the i-th component.

We create a dataset object where the data has been standardized.

In [None]:
dataset_standardize = Data(range_x1=2,range_x2=10,standardiz=True)
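
As a quick check of the standardization formula, the sketch below applies it to made-up data in NumPy, both element-wise and as the matrix product with the inverse of the diagonal matrix of standard deviations. The data and variable names are assumptions for illustration; `ddof=1` is used to match the default behaviour of `torch.std`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: two features with different means and spreads
X = rng.normal(size=(100, 2)) * np.array([1.0, 5.0]) + np.array([3.0, -2.0])

mu = X.mean(axis=0)
sigma = X.std(axis=0, ddof=1)

# Element-wise form
X_hat = (X - mu) / sigma

# Matrix form: multiply the centered rows by D^{-1}, with D = diag(sigma_1, sigma_2)
D_inv = np.diag(1.0 / sigma)
X_hat_matrix = (X - mu) @ D_inv

print(X_hat.mean(axis=0), X_hat.std(axis=0, ddof=1))
```

Both forms give the same result: every feature ends up with zero mean and unit standard deviation.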

We plot the data:

In [None]:
datasets={'regular dataset':dataset, 'standardized dataset': dataset_standardize}
        
plotData(datasets)

Create a linear regression object and initialize its weights so that they are relatively far from the minimum. We also create an optimizer object and a data loader object.

In [None]:
model_standardize=linear_regression(2,1)
trainloader = DataLoader(dataset = dataset_standardize, batch_size = 1)
model_standardize.state_dict()['linear.weight'][0][0] = start_w1
model_standardize.state_dict()['linear.weight'][0][1] = start_w2
model_standardize.state_dict()['linear.bias'][0]=0.0
optimizer = optim.SGD(model_standardize.parameters(), lr=lr)

Plot the loss surface:

In [None]:
get_surface = plot_error_surfaces(20,20, dataset_standardize.X, dataset_standardize.Y, 30, go = True)

Run 10 epochs of stochastic gradient descent:

In [None]:
for epoch in range(10):
    for x,y in trainloader:
        yhat = model_standardize(x)
        loss = criterion(yhat, y)
        training_loss['standardize'].append(loss.item())
        get_surface.set_para_loss(model_standardize, loss.tolist())          
        optimizer.zero_grad()
        loss.backward()

        optimizer.step()

Plot the loss surface and overlay the values of the parameters obtained via gradient descent in red.

In [None]:
get_surface.final_plot()

<h2 id="#PCA "> PCA</h2>
In this section, we create a dataset object that uses Principal Component Analysis (PCA). We first zero-center the data, then project it onto the eigenvectors $\mathbf{Q}$ of the covariance matrix, as shown below.

$\frac{1}{N}   \mathbf{X}^T \mathbf{X} = \mathbf{Q} \mathbf{\Lambda} \mathbf{Q}^T$

$\mathbf{\hat{x}}=\mathbf{x} \mathbf{Q} $

We create the dataset object:

In [None]:
dataset_pca = Data(range_x1=2,range_x2=10,PCA=True)
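
To see why the projection decorrelates the features, the sketch below repeats the two PCA equations in NumPy on synthetic data that imitates the lab's dataset (the data and variable names are assumptions for illustration). After projecting onto the eigenvectors, the covariance matrix becomes diagonal, with the eigenvalues on the diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.linspace(-2, 2, 100), np.linspace(-10, 10, 100)])
X = X + rng.normal(size=X.shape)

# Zero-center, then decompose the covariance matrix: (1/N) X^T X = Q Lambda Q^T
X_centered = X - X.mean(axis=0)
cov = X_centered.T @ X_centered / X_centered.shape[0]
eigenvalues, Q = np.linalg.eigh(cov)      # eigh is meant for symmetric matrices

# Project every sample (row) onto the eigenvectors
X_pca = X_centered @ Q
cov_pca = X_pca.T @ X_pca / X_pca.shape[0]

print(np.round(cov_pca, 6))
```

The off-diagonal entries are zero up to floating-point error, but the two diagonal entries still differ, so the features are decorrelated without having equal variance.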

We plot the data:

In [None]:

datasets={'regular dataset':dataset, 'PCA': dataset_pca }        
        
plotData(datasets)

Create a linear regression object and initialize its weights so that they are relatively far from the minimum. We also create an optimizer object and a data loader object.

In [None]:
model_pca=linear_regression(2,1)
trainloader = DataLoader(dataset = dataset_pca, batch_size = 1)
model_pca.state_dict()['linear.weight'][0][0] = start_w1
model_pca.state_dict()['linear.weight'][0][1] = start_w2
model_pca.state_dict()['linear.bias'][0]=0.0
optimizer = optim.SGD(model_pca.parameters(), lr=lr)

We plot the loss surface:

In [None]:
get_surface = plot_error_surfaces(20,20, dataset_pca.X, dataset_pca.Y, 30, go = True)
print("standard deviation", dataset_pca.X.std(dim=0))

Run 10 epochs of stochastic gradient descent:

In [None]:
for epoch in range(10):
    for x,y in trainloader:
        yhat = model_pca(x)
        loss = criterion(yhat, y)
        training_loss['pca'].append(loss.item())
        get_surface.set_para_loss(model_pca, loss.tolist())          
        optimizer.zero_grad()
        loss.backward()

        optimizer.step()

Plot the loss surface and overlay the values of the parameters obtained via gradient descent in red.

In [None]:
get_surface.final_plot()

<h2 id="#Whitening<"> Whitening</h2>

In this section, we apply a whitening matrix, which decorrelates the features and gives them all the same variance. The operation can be expressed as:

$\mathbf{\hat{x}}=\mathbf{x} \mathbf{Q} \mathbf{\Lambda}^{-1/2} $

In [None]:
dataset_whitening= Data(range_x1=2,range_x2=10,whitening=True)

datasets={'whitening dataset':dataset_whitening, 'PCA': dataset_pca}
        
plotData(datasets)
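
As a sanity check on the whitening formula, the sketch below applies it in NumPy to synthetic data resembling the lab's dataset (the function name `whiten` and the data are assumptions for illustration). After whitening, the covariance matrix should be the identity.

```python
import numpy as np

def whiten(X):
    """Return X Q Lambda^{-1/2} for the zero-centered data X."""
    X_centered = X - X.mean(axis=0)
    cov = X_centered.T @ X_centered / X_centered.shape[0]
    eigenvalues, Q = np.linalg.eigh(cov)
    # Dividing each principal direction by sqrt(eigenvalue) gives it unit variance
    return X_centered @ Q @ np.diag(eigenvalues ** -0.5)

rng = np.random.default_rng(0)
X = np.column_stack([np.linspace(-2, 2, 100), np.linspace(-10, 10, 100)])
X = X + rng.normal(size=X.shape)

X_white = whiten(X)
cov_white = X_white.T @ X_white / X_white.shape[0]
print(np.round(cov_white, 6))
```

An identity covariance means the loss surface of a linear model on this data has circular contours, so every direction in parameter space is equally easy to descend.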

Create a linear regression object and initialize its weights so that they are relatively far from the minimum. We also create an optimizer object and a data loader object.

In [None]:
model_whitening=linear_regression(2,1)
criterion = nn.MSELoss()
trainloader = DataLoader(dataset = dataset_whitening, batch_size = 1)
model_whitening.state_dict()['linear.weight'][0][0] = start_w1
model_whitening.state_dict()['linear.weight'][0][1] = start_w2
model_whitening.state_dict()['linear.bias'][0]=0.0
optimizer = optim.SGD(model_whitening.parameters(), lr=lr)

We plot the loss surface:

In [None]:
get_surface = plot_error_surfaces(20,20, dataset_whitening.X, dataset_whitening.Y, 30, go = True)

Run 10 epochs of stochastic gradient descent:

In [None]:
for epoch in range(10):
    for x,y in trainloader:
        yhat = model_whitening(x)
        loss = criterion(yhat, y)
        training_loss['whitening'].append(loss.item())
        get_surface.set_para_loss(model_whitening, loss.tolist())          
        optimizer.zero_grad()
        loss.backward()

        optimizer.step()

Plot the loss surface and overlay the values of the parameters obtained via gradient descent in red.

In [None]:
get_surface.final_plot()

<h2 id="#ZCA"> Zero-Phase Component Analysis (ZCA) </h2>

We apply ZCA. Data transformed with ZCA is decorrelated and whitened, but it has more in common with the original data. We can transform the data as follows:

$\mathbf{\hat{x}}=\mathbf{x} \mathbf{Q} \mathbf{\Lambda}^{-1/2}\mathbf{Q}^{T} $

We create the data and plot it:

In [None]:
dataset_zca=Data(range_x1=2,range_x2=10,zero_phase=True )

In [None]:
datasets={'ZCA':dataset_zca, 'whitening': dataset_whitening }  
plotData(datasets)
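
The sketch below compares PCA whitening and ZCA in NumPy on synthetic data that imitates the lab's dataset (the data and variable names are assumptions for illustration). Both transforms produce an identity covariance, and the mean squared distance to the centered data gives one way to quantify how close each result stays to the original.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.linspace(-2, 2, 100), np.linspace(-10, 10, 100)])
X = X + rng.normal(size=X.shape)
X_centered = X - X.mean(axis=0)

cov = X_centered.T @ X_centered / X_centered.shape[0]
eigenvalues, Q = np.linalg.eigh(cov)
scale = np.diag(eigenvalues ** -0.5)

zca_matrix = Q @ scale @ Q.T              # symmetric by construction
X_white = X_centered @ Q @ scale          # PCA whitening: x Q Lambda^{-1/2}
X_zca = X_centered @ zca_matrix           # ZCA: x Q Lambda^{-1/2} Q^T

cov_zca = X_zca.T @ X_zca / X_zca.shape[0]

# Mean squared distance between each transformed sample and the centered original
dist_white = np.mean(np.sum((X_centered - X_white) ** 2, axis=1))
dist_zca = np.mean(np.sum((X_centered - X_zca) ** 2, axis=1))
print("PCA whitening:", dist_white, "ZCA:", dist_zca)
```

The extra multiplication by $\mathbf{Q}^{T}$ rotates the whitened data back into the original coordinate system, which is why the ZCA distance is never larger than the PCA whitening distance.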

Create a linear regression object and initialize its weights so that they are relatively far from the minimum. We also create an optimizer object and a data loader object.

In [None]:
model_zca=linear_regression(2,1)
criterion = nn.MSELoss()
trainloader = DataLoader(dataset = dataset_zca, batch_size = 1)
model_zca.state_dict()['linear.weight'][0][0] = start_w1
model_zca.state_dict()['linear.weight'][0][1] = start_w2
model_zca.state_dict()['linear.bias'][0]=0.0
optimizer = optim.SGD(model_zca.parameters(), lr=lr)

We plot the loss surface:

In [None]:
get_surface = plot_error_surfaces(20,20, dataset_zca.X, dataset_zca.Y, 30, go = True)

Run 10 epochs of stochastic gradient descent:

In [None]:
for epoch in range(10):
    for x,y in trainloader:
        yhat = model_zca(x)
        loss = criterion(yhat, y)
        training_loss['zca'].append(loss.item())
        get_surface.set_para_loss(model_zca, loss.tolist())          
        optimizer.zero_grad()
        loss.backward()

        optimizer.step()

Plot the loss surface and overlay the values of the parameters obtained via gradient descent in red.

In [None]:
get_surface.final_plot()

<h2 id="#loss"> Plot Loss   </h2>

We plot the loss recorded at each iteration for each method:

In [None]:
for name, loss in training_loss.items():
    plt.plot(loss, label=name)
    plt.legend()
    plt.show()
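
Because the batch size is 1, the loss recorded at each iteration is noisy. One optional aid when comparing curves is to smooth them with a moving average before plotting. The helper `moving_average` below is a hypothetical name, and the short list stands in for one entry of `training_loss`.

```python
import numpy as np

def moving_average(values, window=50):
    """Average each run of `window` consecutive values (output is shorter by window - 1)."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Stand-in for one list of per-iteration losses
noisy_loss = [4.0, 0.0, 4.0, 0.0, 4.0, 0.0]
smoothed = moving_average(noisy_loss, window=2)
print(smoothed)
```

A larger window gives a smoother curve at the cost of hiding short-lived changes in the loss.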

<!--Empty Space for separating topics-->

<h2>About the Authors:</h2> 

<a href="https://www.linkedin.com/in/joseph-s-50398b136/">Joseph Santarcangelo</a> has a PhD in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.

<hr>

Copyright &copy; 2020 <a href="cognitiveclass.ai?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu">cognitiveclass.ai</a>. This notebook and its source code are released under the terms of the <a href="https://bigdatauniversity.com/mit-license/">MIT License</a>.