### Hugging Face Accelerate Demo

**Note**: Before running this demo, please make sure that you have `wandb.ai` free account. 

Let us install `accelerate`.

In [1]:
!pip install accelerate

Collecting accelerate
  Downloading accelerate-0.5.1-py3-none-any.whl (58 kB)
[K     |████████████████████████████████| 58 kB 669 kB/s eta 0:00:01
Installing collected packages: accelerate
Successfully installed accelerate-0.5.1


**Import** the required modules.

In [3]:
import torch
import torchvision
import wandb
import datetime
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from ui import progress_bar

# This is a demo of the PyTorch Accelerate API.
from accelerate import Accelerator

**`wandb`** initialization. See [`wandb_demo`](https://github.com/roatienza/Deep-Learning-Experiments/blob/master/versions/2022/tools/python/wandb_demo.ipynb) notebook for more details.


In [4]:
wandb.login()
config = {
  "learning_rate": 0.1,
  "epochs": 100,
  "batch_size": 128,
  "dataset": "cifar10"
}
run = wandb.init(project="accelerate-project", entity="upeee", config=config)

[34m[1mwandb[0m: Currently logged in as: [33mrowel[0m (use `wandb login --relogin` to force relogin)
2022-03-11 10:00:23.760100: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-03-11 10:00:23.760140: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.


### Build the model

Use a ResNet18 from `torchvision`. See [`wandb_demo`](https://github.com/roatienza/Deep-Learning-Experiments/blob/master/versions/2022/tools/python/wandb_demo.ipynb) notebook for more details.

In [8]:
# Shows the code to be replaced with the Accelerate API.
#device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
accelerator = Accelerator()

model = torchvision.models.resnet18(pretrained=False, progress=True)

model.fc = torch.nn.Linear(model.fc.in_features, 10) 

# Replace the model with the Accelerate API.
#model.to(device)

# watch model gradients during training
wandb.watch(model)

[]

### Loss function, Optimizer, Scheduler and DataLoader

See [`wandb_demo`](https://github.com/roatienza/Deep-Learning-Experiments/blob/master/versions/2022/tools/python/wandb_demo.ipynb) notebook for more details.


In [9]:
loss = torch.nn.CrossEntropyLoss()
optimizer = SGD(model.parameters(), lr=wandb.config.learning_rate)
scheduler = CosineAnnealingLR(optimizer, T_max=wandb.config.epochs)

x_train = datasets.CIFAR10(root='./data', train=True, 
                           download=True, 
                           transform=transforms.ToTensor())
x_test = datasets.CIFAR10(root='./data',
                          train=False, 
                          download=True, 
                          transform=transforms.ToTensor())
train_loader = DataLoader(x_train, 
                          batch_size=wandb.config.batch_size, 
                          shuffle=True, 
                          num_workers=2)
test_loader = DataLoader(x_test, 
                         batch_size=wandb.config.batch_size, 
                         shuffle=False, 
                         num_workers=2)

Files already downloaded and verified
Files already downloaded and verified


### Visulaizing sample data from test split


See [`wandb_demo`](https://github.com/roatienza/Deep-Learning-Experiments/blob/master/versions/2022/tools/python/wandb_demo.ipynb) notebook for more details.


Note the last line that uses Accelerate API to wrap the model, optimizer, data loaders and scheduler.

In [10]:

label_human = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]

table_test = wandb.Table(columns=['Image', "Ground Truth", "Initial Pred Label",])

image, label = iter(test_loader).next()
model.eval()
with torch.no_grad():
  pred = torch.argmax(model(image), dim=1).cpu().numpy()
  # Replace the model with the Accelerate API.
  #pred = torch.argmax(model(image.to(device)), dim=1).cpu().numpy()

for i in range(8):
  table_test.add_data(wandb.Image(image[i]),
                      label_human[label[i]], 
                      label_human[pred[i]])
  print(label_human[label[i]], "vs. ",  label_human[pred[i]])

# Accelerate API
model, optimizer, scheduler, train_loader, test_loader = accelerator.prepare(model,
                                                                             optimizer,
                                                                             scheduler, 
                                                                             train_loader, 
                                                                             test_loader)

cat vs.  deer
ship vs.  deer
ship vs.  deer
airplane vs.  deer
frog vs.  deer
frog vs.  deer
automobile vs.  deer
frog vs.  deer


### The train loop

Using Accelerate, we do not need to transfer the model to the `device`.

See [`wandb_demo`](https://github.com/roatienza/Deep-Learning-Experiments/blob/master/versions/2022/tools/python/wandb_demo.ipynb) notebook for more details.

In [11]:
def train(epoch):
  model.train()
  train_loss = 0
  correct = 0
  train_samples = 0

  # sample a batch. compute loss and backpropagate
  for batch_idx, (data, target) in enumerate(train_loader):
    optimizer.zero_grad()
    # Replaced by the Accelerate API.
    #target = target.to(device)
    #output = model(data.to(device))
    
    output = model(data)
    loss_value = loss(output, target)

    # Replaced by the Accelerate API.
    #loss_value.backward()
    accelerator.backward(loss_value)

    optimizer.step()
    scheduler.step(epoch)
    train_loss += loss_value.item()
    train_samples += len(data)
    pred = output.argmax(dim=1, keepdim=True)
    correct += pred.eq(target.view_as(pred)).sum().item()
    if batch_idx % 10 == 0:
      accuracy = 100. * correct / len(train_loader.dataset)
      progress_bar(batch_idx,
                   len(train_loader),
                   'Train Epoch: {}, Loss: {:.6f}, Acc: {:.2f}%'.format(epoch+1, 
                   train_loss/train_samples, accuracy))
  
  train_loss /= len(train_loader.dataset)
  accuracy = 100. * correct / len(train_loader.dataset)

  return accuracy, train_loss

### The validation loop

After every epoch, we will run the validation loop for the model. Again, no need to transfer the data to the `device`.

See [`wandb_demo`](https://github.com/roatienza/Deep-Learning-Experiments/blob/master/versions/2022/tools/python/wandb_demo.ipynb) notebook for more details.

In [12]:
def test():
  model.eval()
  test_loss = 0
  correct = 0
  with torch.no_grad():
    for data, target in test_loader:

      # Replaced by the Accelerate API.
      #output = model(data.to(device))   
      #target = target.to(device)

      output = model(data)
      test_loss += loss(output, target).item()
      pred = output.argmax(dim=1, keepdim=True)
      correct += pred.eq(target.view_as(pred)).sum().item()

  test_loss /= len(test_loader.dataset)
  accuracy = 100. * correct / len(test_loader.dataset)

  print('\nTest Loss: {:.4f}, Acc: {:.2f}%\n'.format(test_loss, accuracy))

  return accuracy, test_loss

### `wandb` plots

Finally, we will use `wandb` to visualize the training progress. 
See [`wandb_demo`](https://github.com/roatienza/Deep-Learning-Experiments/blob/master/versions/2022/tools/python/wandb_demo.ipynb) notebook for more details.

In [13]:
run.display(height=1000)

start_time = datetime.datetime.now()
best_acc = 0
for epoch in range(wandb.config["epochs"]):
    train_acc, train_loss = train(epoch)
    test_acc, test_loss = test()
    if test_acc > best_acc:
        wandb.run.summary["Best accuracy"] = test_acc
        best_acc = test_acc
        accelerator.save(model, "resnet18_best_acc.pth")
    wandb.log({
        "Train accuracy": train_acc,
        "Test accuracy": test_acc,
        "Train loss": train_loss,
        "Test loss": test_loss,
        "Learning rate": optimizer.param_groups[0]['lr']
    })

elapsed_time = datetime.datetime.now() - start_time
print("Elapsed time: %s" % elapsed_time)
wandb.run.summary["Elapsed train time"] = str(elapsed_time)

model.eval()
with torch.no_grad():
  pred = torch.argmax(model(image), dim=1).cpu().numpy()
  # Replace the model with the Accelerate API.
  #pred = torch.argmax(model(image.to(device)), dim=1).cpu().numpy()

final_pred = []
for i in range(8):
    final_pred.append(label_human[pred[i]])
    print(label_human[label[i]], "vs. ",  final_pred[i])

table_test.add_column(name="Final Pred Label", data=final_pred)

wandb.log({"Test data": table_test})

wandb.finish()



 [>.............................]  Step: 2m32s | Tot: 0ms | Train Epoch: 1, Loss: 0.019245, Acc: 0.02% 1/391 




Test Loss: 0.0180, Acc: 38.62%


Test Loss: 0.0100, Acc: 55.81%


Test Loss: 0.0106, Acc: 54.48%


Test Loss: 0.0141, Acc: 48.35%


Test Loss: 0.0085, Acc: 64.41%


Test Loss: 0.0078, Acc: 68.59%


Test Loss: 0.0087, Acc: 64.55%


Test Loss: 0.0095, Acc: 66.05%


Test Loss: 0.0096, Acc: 66.48%


Test Loss: 0.0089, Acc: 69.59%


Test Loss: 0.0098, Acc: 68.00%


Test Loss: 0.0116, Acc: 66.12%


Test Loss: 0.0110, Acc: 69.73%


Test Loss: 0.0115, Acc: 68.32%


Test Loss: 0.0120, Acc: 70.13%


Test Loss: 0.0117, Acc: 69.34%


Test Loss: 0.0152, Acc: 65.05%


Test Loss: 0.0134, Acc: 69.91%


Test Loss: 0.0120, Acc: 72.63%


Test Loss: 0.0143, Acc: 70.62%


Test Loss: 0.0128, Acc: 72.18%


Test Loss: 0.0130, Acc: 72.18%


Test Loss: 0.0138, Acc: 71.30%


Test Loss: 0.0133, Acc: 72.74%


Test Loss: 0.0148, Acc: 71.59%


Test Loss: 0.0134, Acc: 73.89%


Test Loss: 0.0158, Acc: 70.42%


Test Loss: 0.0143, Acc: 73.17%


Test Loss: 0.0142, Acc: 73.91%


Test Loss: 0.0147, Acc: 73.79%


Test Loss


Test Loss: 0.0172, Acc: 75.48%


Test Loss: 0.0168, Acc: 75.72%


Test Loss: 0.0170, Acc: 75.68%


Test Loss: 0.0166, Acc: 75.66%


Test Loss: 0.0167, Acc: 75.66%


Test Loss: 0.0169, Acc: 75.89%


Test Loss: 0.0168, Acc: 75.90%


Test Loss: 0.0172, Acc: 75.75%


Test Loss: 0.0171, Acc: 75.71%


Test Loss: 0.0167, Acc: 75.58%


Test Loss: 0.0171, Acc: 75.86%


Test Loss: 0.0169, Acc: 75.86%


Test Loss: 0.0172, Acc: 75.79%


Test Loss: 0.0170, Acc: 75.77%


Test Loss: 0.0172, Acc: 75.86%


Test Loss: 0.0173, Acc: 75.82%


Test Loss: 0.0169, Acc: 75.76%


Test Loss: 0.0172, Acc: 75.69%


Test Loss: 0.0173, Acc: 75.69%


Test Loss: 0.0172, Acc: 75.70%


Test Loss: 0.0170, Acc: 75.81%


Test Loss: 0.0173, Acc: 75.72%


Test Loss: 0.0174, Acc: 75.55%


Test Loss: 0.0172, Acc: 75.73%


Test Loss: 0.0173, Acc: 75.84%


Test Loss: 0.0170, Acc: 75.65%


Test Loss: 0.0172, Acc: 75.71%


Test Loss: 0.0171, Acc: 75.63%


Test Loss: 0.0173, Acc: 75.82%


Test Loss: 0.0173, Acc: 75.64%


Test Loss

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

### Load the best performing model

In the following code, we load the best performing model. The model is saved in `./resnet18_best_acc.pth`. The average accuracy of the model is the same as the one in the previous section.

In [14]:
model = torch.load("resnet18_best_acc.pth")
# Using Accelerator API
model = accelerator.prepare(model)
accuracy, _ = test()
print("Best accuracy: %.2f" % accuracy)


Test Loss: 0.0168, Acc: 75.90%

Best accuracy: 75.90
