# CUDA

CUDA is a parallel computing platform and programming model for general-purpose computing on GPUs. The developer still programs in a familiar language such as C, C++, or Fortran (the list of supported languages keeps growing) and uses extensions to that language in the form of a few basic keywords.

These keywords let the developer express massive amounts of parallelism and direct the compiler to the portion of the application that maps to the GPU.

* PyTorch CUDA installation guide: https://pytorch.org/get-started/locally/
* NVIDIA CUDA Toolkit: https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html
* CUDA-capable GPU list: https://developer.nvidia.com/zh-cn/cuda-gpus
* PyTorch CUDA semantics: https://pytorch.org/docs/stable/notes/cuda.html

### Use pinned memory buffers

Host-to-GPU copies are much faster when they originate from pinned (page-locked) memory. CPU tensors and storages expose a `pin_memory()` method that returns a copy of the object with its data placed in a pinned region.

Also, once you pin a tensor or storage, you can use asynchronous GPU copies. Just pass an additional `non_blocking=True` argument to a `to()` or a `cuda()` call. This can be used to overlap data transfers with computation.

You can make the DataLoader return batches placed in pinned memory by passing `pin_memory=True` to its constructor.
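
The three points above can be sketched together as follows. This is a minimal illustration, not a benchmark: the tensor shapes, the synthetic dataset, and the names `use_cuda`, `x_dev`, and `loader` are assumptions made for the example, and pinning is skipped when no CUDA device is present so the snippet also runs on a CPU-only machine.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# A CPU tensor; pinning is only attempted when a CUDA device is present.
x = torch.randn(1024, 4)
if use_cuda:
    x = x.pin_memory()  # returns a copy that lives in page-locked memory

# With a pinned source, non_blocking=True allows the copy to run asynchronously.
x_dev = x.to(device, non_blocking=True)

# The DataLoader can pin every batch it produces (synthetic data, for illustration).
dataset = TensorDataset(torch.randn(256, 4), torch.randint(0, 3, (256,)))
loader = DataLoader(dataset, batch_size=64, pin_memory=use_cuda)

for features, labels in loader:
    features = features.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    break  # one batch is enough for the demonstration

print(x_dev.device, features.shape, labels.shape)
```

The asynchronous copy only pays off when there is other work to overlap with it, for example the forward pass of the previous batch.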

### Is CUDA available?

In [1]:
import torch
torch.cuda.is_available()

True

In [2]:
# Get the ID of the current default device
torch.cuda.current_device()

0

In [3]:
torch.cuda.get_device_name(0)  # Get the name of the device with ID 0

'NVIDIA GeForce GTX 1070'

In [4]:
# Returns the current GPU memory usage by 
# tensors in bytes for a given device
torch.cuda.memory_allocated()

0

In [5]:
# Returns the current GPU memory managed by the
# caching allocator in bytes for a given device
torch.cuda.memory_reserved()

0
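
Both counters read zero here because nothing has been allocated on the GPU yet. A small sketch of how the numbers change after an allocation is shown below; the helper name `gpu_memory_mib` and the tensor size are hypothetical choices for this example, and the helper returns zeros when CUDA is unavailable.

```python
import torch

def gpu_memory_mib(device_index=0):
    """Return (allocated, reserved) in MiB, or (0.0, 0.0) without CUDA."""
    if not torch.cuda.is_available():
        return 0.0, 0.0
    allocated = torch.cuda.memory_allocated(device_index) / 2**20
    reserved = torch.cuda.memory_reserved(device_index) / 2**20
    return allocated, reserved

before = gpu_memory_mib()
if torch.cuda.is_available():
    # 1024 * 1024 float32 values occupy 4 MiB on the device.
    t = torch.zeros(1024, 1024, device="cuda")
after = gpu_memory_mib()

print(f"allocated: {before[0]:.1f} -> {after[0]:.1f} MiB")
print(f"reserved:  {before[1]:.1f} -> {after[1]:.1f} MiB")
```

Reserved memory is typically at least as large as allocated memory, because the caching allocator keeps freed blocks around for reuse.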

### Creating a tensor on the GPU

In [6]:
# Create a float tensor on the CPU, then copy it to the default GPU
a = torch.FloatTensor([1., 2.]).cuda()
a.device

device(type='cuda', index=0)
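
A common alternative to calling `.cuda()` is to choose a device once and pass it to the tensor factory. The sketch below assumes a variable named `device` (not part of the original notebook) and falls back to the CPU when no GPU is present.

```python
import torch

# Pick the GPU when present, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Allocate directly on the target device instead of creating on the CPU and copying.
a = torch.tensor([1., 2.], device=device)
print(a.device, a.dtype)
```

Written this way, the same cell runs unchanged on machines with and without a GPU.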

### Sending Models to GPU

In [7]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [8]:
class Model(nn.Module):
    def __init__(self, in_features=4, h1=8, h2=9, out_features=3):
        super().__init__()
        self.fc1 = nn.Linear(in_features,h1)    # input layer
        self.fc2 = nn.Linear(h1, h2)            # hidden layer
        self.out = nn.Linear(h2, out_features)  # output layer
        
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.out(x)
        return x

In [9]:
torch.manual_seed(32)
model = Model()

In [10]:
# From the discussions here: discuss.pytorch.org/t/how-to-check-if-model-is-on-cuda
next(model.parameters()).is_cuda

False

In [11]:
gpumodel = model.cuda()

In [12]:
next(gpumodel.parameters()).is_cuda

True
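
Note that `nn.Module.cuda()` moves the parameters in place and returns the module itself, so `gpumodel` and `model` refer to the same object. The device-agnostic equivalent is `.to(device)`, sketched here with a stand-in `nn.Linear` layer (an assumption for brevity, not the `Model` class defined above):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

layer = nn.Linear(4, 3)    # stand-in module for illustration
moved = layer.to(device)   # moves parameters in place and returns the same module

print(moved is layer)
print(next(layer.parameters()).device)
```

This differs from tensors, where `.to()` and `.cuda()` return a new tensor and leave the original where it was.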

### Convert Tensors to .cuda() tensors

In [13]:
df = pd.read_csv('./iris.csv')
X = df.drop('target',axis=1).values
y = df['target'].values
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=33)

In [14]:
X_train = torch.FloatTensor(X_train).cuda()
X_test = torch.FloatTensor(X_test).cuda()
y_train = torch.LongTensor(y_train).cuda()
y_test = torch.LongTensor(y_test).cuda()

In [15]:
trainloader = DataLoader(X_train, batch_size=60, shuffle=True)
testloader = DataLoader(X_test, batch_size=60, shuffle=False)

In [16]:
criterion = nn.CrossEntropyLoss()
# Optimize the parameters of the model that now lives on the GPU
optimizer = torch.optim.Adam(gpumodel.parameters(), lr=0.01)

In [17]:
import time
epochs = 100
losses = []
start = time.time()
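for i in range(1, epochs + 1):
    y_pred = gpumodel(X_train)  # call the module, not .forward(), so hooks run
    loss = criterion(y_pred, y_train)
    losses.append(loss.item())  # store a Python float, not a graph-attached CUDA tensor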
    
    # print the loss only every 10th epoch to save screen space:
    if i%10 == 1:
        print(f'epoch: {i:2}  loss: {loss.item():10.8f}')

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
torch.cuda.synchronize()  # wait for queued GPU work so the measured time is complete
print(f'TOTAL TRAINING TIME: {time.time()-start}')

epoch:  1  loss: 1.15071130
epoch: 11  loss: 0.93773133
epoch: 21  loss: 0.77982587
epoch: 31  loss: 0.60993999
epoch: 41  loss: 0.40079933
epoch: 51  loss: 0.25436321
epoch: 61  loss: 0.15053053
epoch: 71  loss: 0.10086945
epoch: 81  loss: 0.08128317
epoch: 91  loss: 0.07231430
TOTAL TRAINING TIME: 1.1782209873199463
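
The loop above measures wall-clock time with `time.time()`. Because CUDA kernels are launched asynchronously, another option for timing GPU work is CUDA events. The following is a rough sketch under that assumption; the helper name `time_gpu_work` and the matrix size are made up for the example, and the helper falls back to `time.perf_counter()` on machines without CUDA.

```python
import time
import torch

def time_gpu_work(fn):
    """Run fn() once and return the elapsed time in seconds."""
    if torch.cuda.is_available():
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        fn()
        end.record()
        torch.cuda.synchronize()  # make sure both events have completed
        return start.elapsed_time(end) / 1000.0  # elapsed_time() reports milliseconds
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
a = torch.randn(512, 512, device=device)

elapsed = time_gpu_work(lambda: a @ a)
print(f"matmul took {elapsed:.6f} s")
```

The first GPU call in a process usually includes one-off initialization cost, so a warm-up run before timing tends to give more representative numbers.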
