<a href="https://colab.research.google.com/github/smart-stats/ds4bio_book/blob/main/book/gpu.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/smart-stats/ds4bio_book/HEAD)

# Parallelism and GPU computing

AI workloads involve a very large number of arithmetic calculations. Graphics processing units (GPUs) are computer chips designed to run large numbers of small arithmetic calculations in parallel. They were designed this way because computer graphics involves rotating, shifting, and scaling images, all of which are matrix manipulations.

Currently, I don't have a GPU in my personal computer. Options for trying out GPUs include Google Colab and Paperspace, to name two. I'm running this example on Colab.


In [2]:
import torch
torch.cuda.is_available()

True

By default, calculations run on the CPU; you have to explicitly move them to the GPU. Here I write the code so that it works on either the GPU or the CPU, depending on whether a GPU is available. It's typical to create a variable called "device" that refers to whichever of the two is in use.

In [3]:
if torch.cuda.is_available():
    device = torch.device("cuda:0")
else:
    device = torch.device("cpu")
print(device)

cuda:0
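
As a minimal sketch of what migrating a calculation means in practice, the snippet below creates a tensor on the CPU, copies it to the selected device, and does arithmetic there. The names `x_cpu`, `x_dev`, and `total` are illustrative only. As a general rule, an operation that mixes a CPU tensor with a GPU tensor raises a RuntimeError, so everything involved in a calculation should live on the same device.

```python
import torch

# Pick the GPU when one is present, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x_cpu = torch.ones(2, 3)   # tensors are created on the CPU by default
x_dev = x_cpu.to(device)   # a tensor on the target device
                           # (the same tensor if it is already there)

total = (x_dev * 2).sum()  # computed on whichever device x_dev lives on
print(x_cpu.device, x_dev.device, total.item())
```

The result `total` stays on the device where it was computed; `.item()` copies the single value back to a Python number.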


Let's look at the reduction in runtime when the same matrix sum runs on the GPU instead of the CPU.

In [8]:
import time

for i in range(10):
  test_matrix = torch.randn([100000, 10])
  test_matrix_cuda = test_matrix.to(device)

  start = time.time()
  test_matrix.sum()
  end = time.time()

  a = end - start

  start = time.time()
  test_matrix_cuda.sum()
  end = time.time()

  b = end - start

  print("The % reduction in runtime is: ", end = "")
  print([(1 - b / a) * 100])


The % reduction in runtime is: [69.79166666666667]
The % reduction in runtime is: [73.96593673965937]
The % reduction in runtime is: [76.03406326034063]
The % reduction in runtime is: [75.97633136094674]
The % reduction in runtime is: [78.29638273045506]
The % reduction in runtime is: [75.93023255813954]
The % reduction in runtime is: [78.57974388824213]
The % reduction in runtime is: [77.97468354430379]
The % reduction in runtime is: [78.54588796185935]
The % reduction in runtime is: [78.66354044548652]
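
One caveat when timing GPU code: CUDA kernels are launched asynchronously, so a timer stopped right after the call may not include all of the GPU work. The sketch below is one way to guard against that. `time_call` is a hypothetical helper, not part of PyTorch; it accepts an optional `sync` callable that blocks until queued device work has finished.

```python
import time

def time_call(fn, sync=None, repeats=5):
    """Return the best wall-clock time (seconds) of fn() over several runs.

    sync is an optional zero-argument callable that blocks until queued
    device work has finished. time_call itself is a hypothetical helper.
    """
    best = float("inf")
    for _ in range(repeats):
        if sync is not None:
            sync()                    # drain work queued before timing
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()                    # wait for fn's work to really finish
        best = min(best, time.perf_counter() - start)
    return best

# CPU-only usage; there is nothing to synchronize here.
elapsed = time_call(lambda: sum(range(100_000)))
print(f"{elapsed:.6f} s")
```

On a machine with a GPU, the call would look like `time_call(lambda: test_matrix_cuda.sum(), sync=torch.cuda.synchronize)`. Taking the best of several runs also reduces the influence of one-time start-up costs on the first GPU call.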


In [13]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
import urllib.request
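import PIL.Image  # import the submodule explicitly; a bare "import PIL" does not guarantee PIL.Image is loaded

## Read in and organize the data
imgURL = "https://raw.githubusercontent.com/larvalabs/cryptopunks/master/punks.png"
## The file is a PNG, so save it with a matching extension
urllib.request.urlretrieve(imgURL, "cryptoPunksAll.png")
img = PIL.Image.open("cryptoPunksAll.png").convert("RGB")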
imgArray = np.asarray(img)
finalArray = np.empty((10000, 3, 24, 24))
for i in range(100):
  for j in range(100):
    a, b = 24 * i, 24 * (i + 1)  
    c, d = 24 * j, 24 * (j + 1) 
    idx = j + i * (100)
    finalArray[idx,0,:,:] = imgArray[a:b,c:d,0]
    finalArray[idx,1,:,:] = imgArray[a:b,c:d,1]
    finalArray[idx,2,:,:] = imgArray[a:b,c:d,2]

n = finalArray.shape[0]
x_real = finalArray / 255
x_real = torch.tensor(x_real.astype(np.float32))
kernel_size = 5
generator_input_dim = [16, 3, 3]

class create_generator(nn.Module):
    def __init__(self):
        super().__init__()        
        self.net = nn.Sequential(
            nn.ConvTranspose2d(16, 128, 10, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            nn.ConvTranspose2d(128, 3, 4, 2, 1, bias=False), 
            nn.Sigmoid(),
        )
    def forward(self, x):
        return self.net(x)
 
## Use the discriminator from the convnet chapter
class create_discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 12, 5)
        self.fc1 = nn.Linear(12 * 3 * 3, 32)
        self.fc2 = nn.Linear(32, 1)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x
    
        
generator = create_generator()
discriminator = create_discriminator()

lr = 1e-4

## y is n real images then n fake images
y = torch.concat( (torch.ones(n), torch.zeros(n) ) ) 

## Set up optimizers
optimizerD = optim.Adam(discriminator.parameters(), lr=lr)
optimizerG = optim.Adam(generator.parameters(), lr=lr)

## Set up the loss function
loss_function = nn.BCELoss()
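
The layer sizes in the cell above can be checked by hand. The sketch below applies the standard output-size formulas for transposed convolutions, convolutions, and pooling (assuming dilation 1 and no output padding, which matches the layers as written). The helper names `conv_transpose_out` and `conv_out` are illustrative and not part of PyTorch.

```python
def conv_transpose_out(size, kernel, stride=1, padding=0):
    # Output size of ConvTranspose2d with dilation 1 and no output_padding
    return (size - 1) * stride - 2 * padding + kernel

def conv_out(size, kernel, stride=1, padding=0):
    # Output size of Conv2d or MaxPool2d with dilation 1
    return (size + 2 * padding - kernel) // stride + 1

# Generator: 3x3 noise map -> 12x12 -> 24x24 image
g = conv_transpose_out(3, kernel=10, stride=1)
g = conv_transpose_out(g, kernel=4, stride=2, padding=1)

# Discriminator: 24 -> conv 20 -> pool 10 -> conv 6 -> pool 3
d = conv_out(24, kernel=5)
d = conv_out(d, kernel=2, stride=2)
d = conv_out(d, kernel=5)
d = conv_out(d, kernel=2, stride=2)

print(g, d, 12 * d * d)  # 24 3 108
```

This is why the generator input has spatial size 3 x 3, the generated images are 24 x 24 like the real ones, and the first fully connected layer of the discriminator takes 12 * 3 * 3 inputs.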

In [15]:
n_epochs = 20
trainFraction = .1

## Move the models and the real images to the selected device so that
## every tensor used in the loop lives in the same place
generator.to(device)
discriminator.to(device)
x_real = x_real.to(device)

for epoch in range(n_epochs):          
    ## Generate the batch sample
    sample = np.random.uniform(size = n) < trainFraction
    n_batch = int(np.sum(sample))
    ## Boolean mask on the same device as the images
    sample_idx = torch.from_numpy(sample).to(device)

    ## Generate the simulated embedding    
    embedding = torch.randn([n_batch]+generator_input_dim, device = device)
    
    ## Generate new fake images
    x_fake = generator(embedding)
    
    ## Train the discriminator
    ## zero out the gradient
    discriminator.zero_grad()

    ## run the real and fake images through the discriminator
    ## (detach the fake images so this step does not touch the generator)
    yhat_fake = discriminator(x_fake.detach())
    yhat_real = discriminator(x_real[sample_idx, :, :, :])
    ## Note you have to concatenate them in the same order as 
    ## the labels: real then fake
    yhat = torch.concat( (yhat_real, yhat_fake) ).reshape(-1)

    ## Labels: 1 for the real images, 0 for the fake images
    y = torch.concat( (torch.ones(n_batch, device = device), 
                       torch.zeros(n_batch, device = device) ) ) 

    discriminator_error = loss_function(yhat, y)

    # Calculate gradients for D in backward pass
    discriminator_error.backward()

    # Update the discriminator
    optimizerD.step()

    ## Train the generator
    ## zero out the gradient
    generator.zero_grad()
    ## The discriminator has been updated, so push the data through the 
    ## new discriminator
    yhat_fake = discriminator(x_fake)
    ## Note the outcome for the generator is all ones even
    ## though we're classifying real as 1 and fake as 0
    ## In other words, we want the loss for the generator to be
    ## based on how real-like the generated data is
    generator_error = loss_function( yhat_fake,  torch.ones( (n_batch, 1), device = device ) )
    ## Calculate the backwards error
    generator_error.backward()
    # Update the generator
    optimizerG.step()
    
    if (epoch + 1) % 10 == 0:  
        print(epoch, end = ",")
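
The generator update in the loop above uses a target of all ones. A small worked example, written in plain Python rather than PyTorch, shows why: with a target of 1, binary cross-entropy reduces to -log(p), which is small when the discriminator scores a fake image as likely real and large when it does not. The function `bce` and the probabilities 0.9 and 0.1 are made up for illustration.

```python
import math

def bce(p, target):
    # Binary cross-entropy for a single prediction p strictly between 0 and 1
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

# The discriminator is fooled: it scores a fake image as 90% likely real
fooled = bce(0.9, 1.0)
# The discriminator is not fooled: it scores a fake image as 10% likely real
caught = bce(0.1, 1.0)

print(round(fooled, 4), round(caught, 4))
```

With its default settings, `nn.BCELoss` averages this quantity over the batch, so minimizing it pushes the generator toward images the discriminator scores as real.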