#Overview
PyTorch is not only a tool for deep learning. Like NumPy, it is also a good tool for general-purpose matrix and tensor computations, and it can run them on the parallel cores of a GPU, which makes linear algebra computations fast.

In this tutorial, we'll see how PyTorch can be used for GPU-enabled matrix computations. This IPython notebook is designed to be compatible with Google Colaboratory (https://colab.research.google.com/), which gives you free access to GPUs.

#Preparation
##Uploading or Using Notebook
You need to sign up and apply for access before you can start using Google Colab.
Once you have access, you can either upload this notebook using File → Upload Notebook or simply enter the code in the cells.
##Activating GPU
To enable the GPU backend for your notebook, go to Edit → Notebook Settings and set Hardware accelerator to GPU.


##Installing PyTorch
We are going to use PyTorch for tensor operations on the GPU. Install it using the following command. Running it once per session is sufficient.

In [1]:
# http://pytorch.org/
!pip install torch


Collecting torch
  Downloading https://files.pythonhosted.org/packages/06/a7/6a173738dd6be014ebf9ba6f0b441d91b113b1506a98e10da4ff60994b54/torch-0.4.1-cp27-cp27mu-manylinux1_x86_64.whl (519.5MB)
    100% |████████████████████████████████| 519.5MB 27kB/s
tcmalloc: large alloc 1073750016 bytes == 0x559488f5c000 @  0x7f82a2e492a4 0x55943059ab68 0x55943068692d 0x5594305ae01a 0x5594305b2d72 0x5594305ab8ca 0x5594305b324e 0x5594305ab8ca 0x5594305b324e 0x5594305ab8ca 0x5594305b324e 0x5594305ab8ca 0x5594305b37d3 0x5594305ab8ca 0x5594305b324e 0x5594305ab8ca 0x5594305b324e 0x5594305b2d72 0x5594305b2d72 0x5594305ab8ca 0x5594305b37d3 0x5594305b2d72 0x5594305ab8ca 0x5594305b37d3 0x5594305ab8ca 0x5594305b37d3 0x5594305ab8ca 0x5594305b324e 0x5594305ab8ca 0x5594305ab1e9 0x5594305dbbdf
Installing collected packages: torch
Successfully installed torch-0.4.1
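
Before going further, it can help to confirm that the install worked and that the notebook actually sees a GPU. The following is a minimal sketch, written in Python 3 syntax (an assumption; the cells in this notebook were run under Python 2.7). The variable name `has_gpu` is only illustrative.

```python
import torch

# Report the installed PyTorch version
print("PyTorch version:", torch.__version__)

# True only if a CUDA-capable GPU and a CUDA build of PyTorch are both present
has_gpu = torch.cuda.is_available()
print("CUDA available:", has_gpu)

if has_gpu:
    # Name of the first visible GPU (device index 0)
    print("Device name:", torch.cuda.get_device_name(0))
```

If `CUDA available` prints `False`, the hardware accelerator setting described above is the first thing to check.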


# Variable Initialization
We are going to initialize a large 10000 × 10000 matrix on the CPU and place a copy of it, an equally sized matrix, on the GPU.

In [2]:
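import torch
import time
import numpy as np

# Create a 10000 x 10000 matrix of random float64 values on the CPU
x_cpu = np.random.rand(10000,10000)
# Copy it to GPU 0; since PyTorch 0.4, tensors no longer need the Variable wrapper
x_gpu = torch.from_numpy(x_cpu).cuda(0)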
print 'GPU matrix size:',x_gpu.shape
print 'CPU matrix size:',x_cpu.shape


GPU matrix size: torch.Size([10000, 10000])
CPU matrix size: (10000, 10000)
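
The cell above assumes a GPU is present. A device-agnostic variant is sketched below, assuming PyTorch 0.4 or later and Python 3 syntax. It falls back to the CPU when no GPU is visible and uses a smaller matrix so it runs quickly. The names `device`, `x_small_cpu`, and `x_small_dev` are hypothetical and not part of the original notebook.

```python
import numpy as np
import torch

# Pick the first GPU if one is visible, otherwise stay on the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Smaller illustrative matrix (the tutorial uses 10000 x 10000)
x_small_cpu = np.random.rand(1000, 1000)

# from_numpy keeps NumPy's float64 dtype; .to(device) moves the data
x_small_dev = torch.from_numpy(x_small_cpu).to(device)

print(x_small_dev.shape, x_small_dev.dtype, x_small_dev.device)
```

Note that the tensor inherits `float64` from NumPy, so all later products are computed in double precision.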


# CPU vs. GPU Comparison for Matrix Multiplication

In [5]:
# Compute in CPU
oldtime = time.time()
z_cpu = x_cpu.dot(x_cpu.T)
cputime = time.time()-oldtime
print 'Matrix-Matrix product time in CPU:',cputime,'seconds'

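# Compute in GPU
# Note: CUDA kernels are launched asynchronously, so without
# torch.cuda.synchronize() this may measure mostly the kernel launch
# rather than the full computation.
oldtime = time.time()
z_gpu = torch.matmul(x_gpu,torch.t(x_gpu))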
gputime = time.time()-oldtime
print 'Matrix-Matrix product time in GPU:',gputime,'seconds'
print 'Speed Gain in GPU:',cputime/gputime*100,'%'

Matrix-Matrix product time in CPU: 38.0653579235 seconds
Matrix-Matrix product time in GPU: 0.00435185432434 seconds
Speed Gain in GPU: 874692.834055 %
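
GPU timings taken with `time.time()` alone can be misleading, because PyTorch queues CUDA kernels asynchronously and returns control to Python before the work has finished. A minimal sketch of a synchronized measurement follows, assuming Python 3 syntax and a much smaller matrix so that it also runs on a CPU-only machine. The helper `timed_matmul` and the names `n`, `x_np`, and `x_dev` are hypothetical, not part of the original notebook.

```python
import time

import numpy as np
import torch


def timed_matmul(x):
    """Return (x times x transposed, elapsed seconds), waiting for the GPU if used."""
    if x.is_cuda:
        torch.cuda.synchronize()  # finish any pending GPU work first
    start = time.time()
    z = torch.matmul(x, x.t())
    if x.is_cuda:
        torch.cuda.synchronize()  # wait until the product is actually computed
    return z, time.time() - start


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
n = 500  # illustrative size, much smaller than the 10000 used above
x_np = np.random.rand(n, n)
x_dev = torch.from_numpy(x_np).to(device)

z_dev, elapsed = timed_matmul(x_dev)
print("Synchronized matmul time on", device, ":", elapsed, "seconds")
```

With synchronization in place, the measured GPU time covers the whole product, so the CPU and GPU numbers become directly comparable.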


#CPU vs. GPU Comparison for Random Row-Column Multiplication

In [4]:
from itertools import izip

m,n = x_cpu.shape
idx_a = np.random.choice(np.arange(m),50000)  # random row indices
idx_b = np.random.choice(np.arange(n),50000)  # random column indices

# Compute in CPU
oldtime = time.time()
for i,j in izip(idx_a,idx_b):
  row = x_cpu[i,:][None,:]
  col = x_cpu[:,j][:,None]
  z_cpu = row.dot(col)
print "Random row-column multiplication in CPU:",time.time()-oldtime,'seconds'

# Compute in GPU (unsqueeze + matmul)
oldtime = time.time()
for i,j in izip(idx_a,idx_b):
  row = x_gpu[i,:].unsqueeze(0)
  col = x_gpu[:,j].unsqueeze(1)
  z_gpu = torch.matmul(row,col)
print "Random row-column multiplication (unsqueeze) in GPU:",time.time()-oldtime,'seconds'

# Compute in GPU (view + matmul)
oldtime = time.time()
for i,j in izip(idx_a,idx_b):
  row = x_gpu[i,:].view(1,-1)
  col = x_gpu[:,j].view(-1,1)
  z_gpu = torch.matmul(row,col)
print "Random row-column multiplication (view) in GPU:",time.time()-oldtime,'seconds'
print "View is a bit slower"

# Compute in GPU (unsqueeze + mm)
oldtime = time.time()
for i,j in izip(idx_a,idx_b):
  row = x_gpu[i,:].unsqueeze(0)
  col = x_gpu[:,j].unsqueeze(1)
  z_gpu = torch.mm(row,col)
print "Random row-column multiplication (mm) in GPU:",time.time()-oldtime,'seconds'
  


Random row-column multiplication in CPU: 10.1162171364 seconds
Random row-column multiplication (unsqueeze) in GPU: 7.38484311104 seconds
Random row-column multiplication (view) in GPU: 7.8996989727 seconds
View is a bit slower
Random row-column multiplication (mm) in GPU: 7.1484181881 seconds
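
The GPU gains only a little over the CPU here, which is consistent with each tiny product being dominated by per-call overhead rather than arithmetic. One possible alternative, not part of the original benchmark, is to gather many rows and columns at once and compute their dot products in a few large operations. The sketch below assumes Python 3 syntax, PyTorch 0.4 or later, and a square matrix as in this tutorial. The function `batched_row_col_dot`, its `chunk` parameter, and the sizes are hypothetical; chunking is there because gathering all 50000 rows of a 10000-column float64 matrix at once would need about 4 GB.

```python
import numpy as np
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

m = n = 200        # illustrative square matrix size (the tutorial uses 10000)
num_pairs = 1000   # illustrative number of (row, column) pairs
x_np = np.random.rand(m, n)
ia_np = np.random.choice(m, num_pairs).astype(np.int64)  # row indices
ib_np = np.random.choice(n, num_pairs).astype(np.int64)  # column indices

x = torch.from_numpy(x_np).to(device)
ia = torch.from_numpy(ia_np).to(device)
ib = torch.from_numpy(ib_np).to(device)


def batched_row_col_dot(x, ia, ib, chunk=256):
    """Dot product of row ia[k] with column ib[k] of x, for every k."""
    xt = x.t()  # rows of the transpose are the columns of x
    results = []
    for start in range(0, ia.numel(), chunk):
        rows = x[ia[start:start + chunk]]    # gathered rows of x
        cols = xt[ib[start:start + chunk]]   # gathered columns of x
        # Elementwise product summed along each row gives one dot product per pair
        results.append((rows * cols).sum(dim=1))
    return torch.cat(results)


z = batched_row_col_dot(x, ia, ib)
print(z.shape)
```

Whether this is faster in practice depends on the matrix size, the chunk size, and the GPU, so it should be timed in the same way as the loops above before drawing conclusions.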
