
# **BDE - Lab1 - Fall 2021**
> Kaidong Wang - Dec 5, 2021

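* This lab trains a LeNet-5 network on the MNIST handwritten-digit recognition task and compares training efficiency on a CPU, a single GPU, and multiple GPUs in parallel.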

## 1. Check the CPU model and count

```
! cat /proc/cpuinfo
```

```text
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 79
model name	: Intel(R) Xeon(R) CPU @ 2.20GHz
stepping	: 0
microcode	: 0x1
cpu MHz		: 2199.998
cache size	: 56320 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa
bogomips	: 4399.99
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 b
```
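
For a programmatic check, the hypothetical helper `summarize_cpuinfo` below parses text in the `/proc/cpuinfo` format shown above and counts logical processors per `model name`. It is a minimal sketch that assumes the Linux `key : value` layout and only reads the real file when it exists.

```python
import os
from collections import Counter


def summarize_cpuinfo(text):
    """Count logical processors per 'model name' in /proc/cpuinfo-style text (hypothetical helper)."""
    models = Counter()
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip() == "model name":
            models[value.strip()] += 1
    return dict(models)


try:
    with open("/proc/cpuinfo") as f:
        print(summarize_cpuinfo(f.read()))
except FileNotFoundError:
    print("/proc/cpuinfo is not available on this system")
print("Logical CPUs (os.cpu_count):", os.cpu_count())
```

In the output above, `siblings: 2` together with `cpu cores: 1` means the VM exposes two logical processors (hyper-threads) on a single physical core.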

## 2. Check the GPU model and count

```
! nvidia-smi
```

```text
Sun Dec  5 16:20:39 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:3D:00.0 Off |                    0 |
| N/A   34C    P0    40W / 300W |      8MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:3E:00.0 Off |                    0 |
| N/A   31C    P0    39W / 300W |      8MiB / 16160MiB |      0%      Default |
|
```
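
The GPUs that PyTorch itself can see (by default, `nn.DataParallel` in section 6 uses all of them) can also be listed without `nvidia-smi`. A minimal sketch; `list_gpus` is a hypothetical helper name, and on a machine without CUDA it simply returns an empty list.

```python
import torch


def list_gpus():
    """Return (index, name, total memory in MiB) for every CUDA device visible to PyTorch."""
    gpus = []
    for i in range(torch.cuda.device_count()):  # 0 when CUDA is unavailable
        props = torch.cuda.get_device_properties(i)
        gpus.append((i, props.name, props.total_memory // (1024 ** 2)))
    return gpus


for index, name, mem_mib in list_gpus():
    print(f"GPU {index}: {name}, {mem_mib} MiB")
print("Visible GPUs:", torch.cuda.device_count())
```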

## 3. Build the LeNet-5 network


```python
import os
import sys
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torchvision import datasets, transforms
import torch.multiprocessing as mp
import torch.distributed as dist
from tqdm import tqdm
from datetime import datetime
```

```python
# Data preprocessing
def get_transform(size, padding, mean, std, preprocess):
    transform = []
    if preprocess:
        # Optional augmentation; disabled in this lab (preprocess=False)
        transform.append(transforms.RandomCrop(size=size, padding=padding))
        transform.append(transforms.RandomHorizontalFlip())
    # Convert the PIL image to a float tensor in [0, 1], then standardize each channel
    transform.append(transforms.ToTensor())
    transform.append(transforms.Normalize(mean, std))
    return transforms.Compose(transform)
```

```python
# Model evaluation: prints the average loss and top-1 accuracy on `dataloader`
# (note: this name shadows Python's built-in eval())
def eval(model, loss, dataloader, device, verbose):
    model.eval()
    total = 0  # running sum of per-sample losses
    correct1 = 0
    with torch.no_grad():
        for images, labels in dataloader:
            images, labels = images.to(device), labels.to(device)
            output = model(images)
            # loss() returns the batch mean, so multiply by the batch size to accumulate the sum
            total += loss(output, labels).item() * images.size(0)
            # Top-5 predicted classes per sample; only the first column (top-1) is used below
            _, pred = output.topk(5, dim=1)
            correct = pred.eq(labels.view(-1, 1).expand_as(pred))
            correct1 += correct[:, :1].sum().item()
    average_loss = total / len(dataloader.dataset)
    accuracy1 = 100. * correct1 / len(dataloader.dataset)
    if verbose:
        print('Evaluation: Average loss: {:.4f}, Top 1 Accuracy: {}/{} ({:.2f}%)'.format(
            average_loss, correct1, len(dataloader.dataset), accuracy1))
```
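
The `topk`/`eq`/`expand_as` pattern in `eval` compares each label against all of its top-k predictions at once. A small worked example on made-up logits, which also shows how top-k accuracy could be read from the same `correct` matrix (the function above only reports top-1):

```python
import torch

# Toy logits for 3 samples over 4 classes (made-up numbers)
output = torch.tensor([[0.1, 2.0, 0.3, 0.5],   # ranking: 1, 3, 2, 0
                       [1.5, 0.2, 1.0, 0.1],   # ranking: 0, 2, 1, 3
                       [0.0, 0.1, 0.2, 3.0]])  # ranking: 3, 2, 1, 0
labels = torch.tensor([1, 2, 0])

k = 2
_, pred = output.topk(k, dim=1)                        # shape (3, k), best class first
correct = pred.eq(labels.view(-1, 1).expand_as(pred))  # True where a prediction matches the label
top1 = correct[:, :1].sum().item()                     # only sample 0 is right at rank 1
topk = correct[:, :k].sum().item()                     # sample 1 is also right within the top 2
print(top1, topk)  # 1 2
```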

```python
# Build LeNet-5 (a variant with 3x3 convolutions; the original LeNet-5 uses 5x5 kernels)
class LeNet5(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 3)   # 1x28x28 -> 6x26x26
        self.conv2 = nn.Conv2d(6, 16, 3)  # 6x13x13 -> 16x11x11
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # -> 6x13x13
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # -> 16x5x5
        x = torch.flatten(x, 1)                      # -> 400 features per sample
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)                           # raw logits for the 10 classes
```
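
The `16 * 5 * 5` input size of `fc1` follows from the layer arithmetic: with no padding, a k×k convolution with stride 1 shrinks each spatial side by k − 1, and 2×2 max pooling with stride 2 halves it, rounding down. A pure-Python worked check (the helper names are hypothetical), which also counts the trainable parameters:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                           # MNIST input is 1x28x28
size = conv_out(size, 3)            # conv1, 3x3      -> 26
size = conv_out(size, 2, stride=2)  # max_pool2d 2x2  -> 13
size = conv_out(size, 3)            # conv2, 3x3      -> 11
size = conv_out(size, 2, stride=2)  # max_pool2d 2x2  -> 5
flat = 16 * size * size             # 16 channels     -> 400 = in_features of fc1
print(size, flat)  # 5 400


def conv_params(c_in, c_out, k):
    return c_out * c_in * k * k + c_out  # weights + biases


def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases


total = (conv_params(1, 6, 3) + conv_params(6, 16, 3)
         + linear_params(400, 120) + linear_params(120, 84) + linear_params(84, 10))
print(total)  # 60074
```

With the model defined above, `sum(p.numel() for p in LeNet5().parameters())` gives the same count.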

```python
# Prepare the MNIST dataset
input_shape, num_classes = (1, 28, 28), 10
mean, std = (0.1307,), (0.3081,)  # MNIST training-set mean and standard deviation
transform = get_transform(size=28, padding=0, mean=mean, std=std, preprocess=False)

train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, download=True, transform=transform)

train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=256, shuffle=True)
test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size=512, shuffle=True)
```
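
`len(train_dataloader)` is the number of batches per epoch, `ceil(60000 / batch_size)`, because the DataLoader keeps the final partial batch by default (`drop_last=False`). This worked check reproduces the `Step: [.../235]` counter in the logs of sections 4 and 5 and the `[.../118]` counter in section 6, where the batch size is 512:

```python
import math

n_train = 60000  # MNIST training-set size

# With drop_last=False (the DataLoader default) the last, smaller batch is kept
for batch_size in (256, 512):
    steps = math.ceil(n_train / batch_size)
    last = n_train - (steps - 1) * batch_size  # size of the final partial batch
    print(batch_size, steps, last)
# 256 235 96
# 512 118 96
```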

## 4. CPU training

```python
device = torch.device("cpu")

torch.manual_seed(0)
model = LeNet5().to(device)

epochs = 30
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0)

start = datetime.now()
total_step = len(train_dataloader)
model.train()
for epoch in tqdm(range(epochs)):
    for i, (images, labels) in enumerate(train_dataloader):
        images, labels = images.to(device), labels.to(device)
        # Forward pass, loss, reset gradients, backpropagation, parameter update
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (i + 1) % 100 == 0:
            print(f'Epoch: [{epoch + 1}/{epochs}], Step: [{i + 1}/{total_step}], Loss: {loss.item():.4f}')

print("\nTraining time: " + str(datetime.now() - start))
eval(model, criterion, test_dataloader, device, verbose=True)
```
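
The three training cells in this notebook repeat the same loop and differ only in the device (and, in section 6, the batch size and the `DataParallel` wrapper). One way to factor the loop out is a small helper; `train_epochs` is a hypothetical name, and this sketch is exercised on random tensors and a toy linear model rather than MNIST and LeNet-5, so it runs anywhere.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_epochs(model, dataloader, criterion, optimizer, device, epochs, log_every=100):
    """Same forward/backward/step loop as the notebook cells; returns the last batch loss."""
    model.train()
    loss = None
    for epoch in range(epochs):
        for i, (images, labels) in enumerate(dataloader):
            images, labels = images.to(device), labels.to(device)
            loss = criterion(model(images), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if (i + 1) % log_every == 0:
                print(f'Epoch: [{epoch + 1}/{epochs}], Step: [{i + 1}/{len(dataloader)}], '
                      f'Loss: {loss.item():.4f}')
    return None if loss is None else loss.item()


# Usage on synthetic data: random 1x28x28 "images" with labels in 0..9
torch.manual_seed(0)
data = TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,)))
loader = DataLoader(data, batch_size=16, shuffle=True)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt = torch.optim.SGD(toy_model.parameters(), lr=0.01, momentum=0.9)
final_loss = train_epochs(toy_model, loader, nn.CrossEntropyLoss(), opt,
                          torch.device("cpu"), epochs=2)
print(f"final loss: {final_loss:.4f}")
```

In the notebook the call would be `train_epochs(model, train_dataloader, criterion, optimizer, device, epochs)`.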

  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: [1/30], Step: [100/235], Loss: 2.2846
Epoch: [1/30], Step: [200/235], Loss: 2.2301


  3%|▎         | 1/30 [00:18<08:57, 18.52s/it]

Epoch: [2/30], Step: [100/235], Loss: 1.7188
Epoch: [2/30], Step: [200/235], Loss: 0.6826


  7%|▋         | 2/30 [00:36<08:34, 18.36s/it]

Epoch: [3/30], Step: [100/235], Loss: 0.4484
Epoch: [3/30], Step: [200/235], Loss: 0.4559


 10%|█         | 3/30 [00:54<08:12, 18.26s/it]

Epoch: [4/30], Step: [100/235], Loss: 0.3519
Epoch: [4/30], Step: [200/235], Loss: 0.2225


 13%|█▎        | 4/30 [01:13<07:54, 18.25s/it]

Epoch: [5/30], Step: [100/235], Loss: 0.1940
Epoch: [5/30], Step: [200/235], Loss: 0.2298


 17%|█▋        | 5/30 [01:31<07:35, 18.21s/it]

Epoch: [6/30], Step: [100/235], Loss: 0.2051
Epoch: [6/30], Step: [200/235], Loss: 0.1937


 20%|██        | 6/30 [01:49<07:18, 18.25s/it]

Epoch: [7/30], Step: [100/235], Loss: 0.1864
Epoch: [7/30], Step: [200/235], Loss: 0.1807


 23%|██▎       | 7/30 [02:08<07:01, 18.31s/it]

Epoch: [8/30], Step: [100/235], Loss: 0.1628
Epoch: [8/30], Step: [200/235], Loss: 0.1806


 27%|██▋       | 8/30 [02:26<06:42, 18.30s/it]

Epoch: [9/30], Step: [100/235], Loss: 0.1972
Epoch: [9/30], Step: [200/235], Loss: 0.1001


 30%|███       | 9/30 [02:44<06:25, 18.35s/it]

Epoch: [10/30], Step: [100/235], Loss: 0.1287
Epoch: [10/30], Step: [200/235], Loss: 0.1257


 33%|███▎      | 10/30 [03:06<06:25, 19.29s/it]

Epoch: [11/30], Step: [100/235], Loss: 0.1180
Epoch: [11/30], Step: [200/235], Loss: 0.1140


 37%|███▋      | 11/30 [03:26<06:12, 19.60s/it]

Epoch: [12/30], Step: [100/235], Loss: 0.2208
Epoch: [12/30], Step: [200/235], Loss: 0.0595


 40%|████      | 12/30 [03:45<05:52, 19.56s/it]

Epoch: [13/30], Step: [100/235], Loss: 0.1544
Epoch: [13/30], Step: [200/235], Loss: 0.1222


 43%|████▎     | 13/30 [04:05<05:33, 19.59s/it]

Epoch: [14/30], Step: [100/235], Loss: 0.0810
Epoch: [14/30], Step: [200/235], Loss: 0.0850


 47%|████▋     | 14/30 [04:25<05:15, 19.70s/it]

Epoch: [15/30], Step: [100/235], Loss: 0.0840
Epoch: [15/30], Step: [200/235], Loss: 0.1101


 50%|█████     | 15/30 [04:44<04:54, 19.61s/it]

Epoch: [16/30], Step: [100/235], Loss: 0.0773
Epoch: [16/30], Step: [200/235], Loss: 0.1030


 53%|█████▎    | 16/30 [05:04<04:35, 19.68s/it]

Epoch: [17/30], Step: [100/235], Loss: 0.0893
Epoch: [17/30], Step: [200/235], Loss: 0.1190


 57%|█████▋    | 17/30 [05:25<04:19, 20.00s/it]

Epoch: [18/30], Step: [100/235], Loss: 0.0667
Epoch: [18/30], Step: [200/235], Loss: 0.0894


 60%|██████    | 18/30 [05:47<04:07, 20.64s/it]

Epoch: [19/30], Step: [100/235], Loss: 0.1312
Epoch: [19/30], Step: [200/235], Loss: 0.0397


 63%|██████▎   | 19/30 [06:09<03:51, 21.07s/it]

Epoch: [20/30], Step: [100/235], Loss: 0.0651
Epoch: [20/30], Step: [200/235], Loss: 0.0433


 67%|██████▋   | 20/30 [06:30<03:29, 20.96s/it]

Epoch: [21/30], Step: [100/235], Loss: 0.0908
Epoch: [21/30], Step: [200/235], Loss: 0.1197


 70%|███████   | 21/30 [06:51<03:09, 21.03s/it]

Epoch: [22/30], Step: [100/235], Loss: 0.0403
Epoch: [22/30], Step: [200/235], Loss: 0.0846


 73%|███████▎  | 22/30 [07:11<02:45, 20.72s/it]

Epoch: [23/30], Step: [100/235], Loss: 0.0552
Epoch: [23/30], Step: [200/235], Loss: 0.0402


 77%|███████▋  | 23/30 [07:32<02:25, 20.85s/it]

Epoch: [24/30], Step: [100/235], Loss: 0.0829
Epoch: [24/30], Step: [200/235], Loss: 0.0949


 80%|████████  | 24/30 [07:54<02:05, 20.97s/it]

Epoch: [25/30], Step: [100/235], Loss: 0.0520
Epoch: [25/30], Step: [200/235], Loss: 0.0795


 83%|████████▎ | 25/30 [08:16<01:46, 21.37s/it]

Epoch: [26/30], Step: [100/235], Loss: 0.1127
Epoch: [26/30], Step: [200/235], Loss: 0.0604


 87%|████████▋ | 26/30 [08:36<01:23, 20.92s/it]

Epoch: [27/30], Step: [100/235], Loss: 0.0969
Epoch: [27/30], Step: [200/235], Loss: 0.0795


 90%|█████████ | 27/30 [08:55<01:01, 20.46s/it]

Epoch: [28/30], Step: [100/235], Loss: 0.0435
Epoch: [28/30], Step: [200/235], Loss: 0.0515


 93%|█████████▎| 28/30 [09:15<00:40, 20.23s/it]

Epoch: [29/30], Step: [100/235], Loss: 0.0515
Epoch: [29/30], Step: [200/235], Loss: 0.0570


 97%|█████████▋| 29/30 [09:35<00:20, 20.22s/it]

Epoch: [30/30], Step: [100/235], Loss: 0.0438
Epoch: [30/30], Step: [200/235], Loss: 0.0477


100%|██████████| 30/30 [09:55<00:00, 19.86s/it]



Training time: 0:09:55.736387
Evaluation: Average loss: 0.0609, Top 1 Accuracy: 9810/10000 (98.10%)


## 5. Single-GPU training (Tesla V100)

```python
torch.cuda.is_available()
```

True

```python
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

torch.manual_seed(0)
model = LeNet5().to(device)

epochs = 30
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0)

start = datetime.now()
total_step = len(train_dataloader)
model.train()
for epoch in tqdm(range(epochs)):
    for i, (images, labels) in enumerate(train_dataloader):
        # Batches are loaded on the CPU and copied to the GPU every step
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (i + 1) % 100 == 0:
            print(f'Epoch: [{epoch + 1}/{epochs}], Step: [{i + 1}/{total_step}], Loss: {loss.item():.4f}')

print("\nTraining time: " + str(datetime.now() - start))
eval(model, criterion, test_dataloader, device, verbose=True)
```
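
CUDA kernels are launched asynchronously, so a wall-clock reading taken right after the loop can miss GPU work that is still queued. In this notebook the `loss.item()` calls every 100 steps force a synchronization, so the effect on the timings is probably small, but a safer pattern is to synchronize before reading the clock. A minimal sketch; `timed` is a hypothetical helper, and the matrix multiply merely stands in for a training run.

```python
import time

import torch


def timed(fn, device):
    """Call fn() and return (result, elapsed seconds), waiting for queued CUDA work."""
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # do not count GPU work launched earlier
    start = time.perf_counter()
    result = fn()
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # wait until the work launched by fn() has finished
    return result, time.perf_counter() - start


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
a = torch.randn(1024, 1024, device=device)
result, seconds = timed(lambda: a @ a, device)
print(f"matmul on {device}: {seconds * 1000:.2f} ms")
```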



Epoch: [1/30], Step: [100/235], Loss: 2.2846
Epoch: [1/30], Step: [200/235], Loss: 2.2301


  3%|▎         | 1/30 [00:11<05:24, 11.18s/it]

Epoch: [2/30], Step: [100/235], Loss: 1.7189
Epoch: [2/30], Step: [200/235], Loss: 0.6827


  7%|▋         | 2/30 [00:20<04:43, 10.11s/it]

Epoch: [3/30], Step: [100/235], Loss: 0.4485
Epoch: [3/30], Step: [200/235], Loss: 0.4559


 10%|█         | 3/30 [00:29<04:20,  9.64s/it]

Epoch: [4/30], Step: [100/235], Loss: 0.3524
Epoch: [4/30], Step: [200/235], Loss: 0.2226


 13%|█▎        | 4/30 [00:38<04:05,  9.45s/it]

Epoch: [5/30], Step: [100/235], Loss: 0.1938
Epoch: [5/30], Step: [200/235], Loss: 0.2299


 17%|█▋        | 5/30 [00:47<03:53,  9.35s/it]

Epoch: [6/30], Step: [100/235], Loss: 0.2049
Epoch: [6/30], Step: [200/235], Loss: 0.1940


 20%|██        | 6/30 [00:57<03:42,  9.27s/it]

Epoch: [7/30], Step: [100/235], Loss: 0.1862
Epoch: [7/30], Step: [200/235], Loss: 0.1799


 23%|██▎       | 7/30 [01:06<03:32,  9.23s/it]

Epoch: [8/30], Step: [100/235], Loss: 0.1639
Epoch: [8/30], Step: [200/235], Loss: 0.1819


 27%|██▋       | 8/30 [01:15<03:22,  9.18s/it]

Epoch: [9/30], Step: [100/235], Loss: 0.1969
Epoch: [9/30], Step: [200/235], Loss: 0.1007


 30%|███       | 9/30 [01:24<03:11,  9.13s/it]

Epoch: [10/30], Step: [100/235], Loss: 0.1297
Epoch: [10/30], Step: [200/235], Loss: 0.1250


 33%|███▎      | 10/30 [01:33<03:03,  9.15s/it]

Epoch: [11/30], Step: [100/235], Loss: 0.1180
Epoch: [11/30], Step: [200/235], Loss: 0.1151


 37%|███▋      | 11/30 [01:42<02:52,  9.09s/it]

Epoch: [12/30], Step: [100/235], Loss: 0.2205
Epoch: [12/30], Step: [200/235], Loss: 0.0598


 40%|████      | 12/30 [01:51<02:43,  9.07s/it]

Epoch: [13/30], Step: [100/235], Loss: 0.1544
Epoch: [13/30], Step: [200/235], Loss: 0.1211


 43%|████▎     | 13/30 [02:00<02:34,  9.08s/it]

Epoch: [14/30], Step: [100/235], Loss: 0.0815
Epoch: [14/30], Step: [200/235], Loss: 0.0857


 47%|████▋     | 14/30 [02:09<02:25,  9.07s/it]

Epoch: [25/30], Step: [100/235], Loss: 0.0524
Epoch: [25/30], Step: [200/235], Loss: 0.0799


 83%|████████▎ | 25/30 [03:49<00:45,  9.11s/it]

Epoch: [26/30], Step: [100/235], Loss: 0.1131
Epoch: [26/30], Step: [200/235], Loss: 0.0630


 87%|████████▋ | 26/30 [03:58<00:36,  9.12s/it]

Epoch: [27/30], Step: [100/235], Loss: 0.0973
Epoch: [27/30], Step: [200/235], Loss: 0.0779


 90%|█████████ | 27/30 [04:08<00:27,  9.12s/it]

Epoch: [28/30], Step: [100/235], Loss: 0.0434
Epoch: [28/30], Step: [200/235], Loss: 0.0509


 93%|█████████▎| 28/30 [04:17<00:18,  9.13s/it]

Epoch: [29/30], Step: [100/235], Loss: 0.0524
Epoch: [29/30], Step: [200/235], Loss: 0.0578


 97%|█████████▋| 29/30 [04:26<00:09,  9.10s/it]

Epoch: [30/30], Step: [100/235], Loss: 0.0438
Epoch: [30/30], Step: [200/235], Loss: 0.0475


100%|██████████| 30/30 [04:35<00:00,  9.18s/it]



Training time: 0:04:35.382380
Evaluation: Average loss: 0.0610, Top 1 Accuracy: 9813/10000 (98.13%)


## 6. Multi-GPU parallel training (2 × Tesla V100)
> Method: Data Parallel (DP), a single process with multiple threads; mostly used for multiple GPUs on a single machine.
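
`nn.DataParallel` keeps the master copy of the model on the first GPU. On every forward pass it replicates the module to each GPU, splits the input batch along dimension 0, runs the replicas in parallel threads, and gathers the outputs back on the first GPU. The split can be approximated on the CPU with `torch.chunk` (an assumption for illustration; DataParallel's internal scatter is not called here). The sketch shows how the batch size of 512 used below is divided between two GPUs, including the last batch of each epoch, which holds 60000 − 117 × 512 = 96 images:

```python
import torch

num_gpus = 2  # two Tesla V100s in this lab (assumed fixed here)

for batch_size in (512, 96):  # a full batch and the final partial batch of an epoch
    batch = torch.zeros(batch_size, 1, 28, 28)  # placeholder images
    shards = torch.chunk(batch, num_gpus, dim=0)  # one shard per GPU
    print(batch_size, [s.shape[0] for s in shards])
# 512 [256, 256]
# 96 [48, 48]
```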

```python
torch.cuda.is_available()
```

True

```python
# Prepare the dataset again, doubling the training batch size to 512
# (nn.DataParallel splits each batch across the 2 GPUs, i.e. 256 samples per GPU)
input_shape, num_classes = (1, 28, 28), 10
mean, std = (0.1307,), (0.3081,)
transform = get_transform(size=28, padding=0, mean=mean, std=std, preprocess=False)

train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, download=True, transform=transform)

train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=512, shuffle=True)
test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size=512, shuffle=True)
```
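
The training cell below copies batches with `non_blocking=True`. That copy can only overlap with GPU work when the source tensors sit in pinned (page-locked) host memory, which the DataLoader does not use by default. A possible variant of the loader follows; `make_loader` is a hypothetical helper, and `num_workers=2` and `pin_memory` are assumed settings to tune, not values from the original run.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size, shuffle, num_workers=2):
    """DataLoader with pinned memory when CUDA is available, for non_blocking host-to-GPU copies."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle,
        num_workers=num_workers,               # background worker processes for loading/transforms
        pin_memory=torch.cuda.is_available(),  # page-locked batches allow asynchronous copies
    )


# Synthetic stand-in with 60000 samples; only the sample count matters for len()
fake = TensorDataset(torch.zeros(60000, 1), torch.zeros(60000, dtype=torch.long))
loader = make_loader(fake, batch_size=512, shuffle=True)
print(len(loader))  # 118
```

In the notebook this would read `train_dataloader = make_loader(train_dataset, 512, shuffle=True)`.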

```python
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

torch.manual_seed(0)
# DataParallel requires the wrapped module to live on device_ids[0] (cuda:0 by default)
model = LeNet5().to(device)
if torch.cuda.device_count() > 1:
    print("GPU number: ", torch.cuda.device_count())
    # Replicates the model to every visible GPU on each forward pass and splits the batch along dim 0
    model = nn.DataParallel(model)

epochs = 30
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0)

start = datetime.now()
total_step = len(train_dataloader)
model.train()
for epoch in tqdm(range(epochs)):
    for i, (images, labels) in enumerate(train_dataloader):
        # non_blocking=True only makes the copy asynchronous if the DataLoader uses pin_memory=True
        images, labels = images.to(device, non_blocking=True), labels.to(device, non_blocking=True)
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (i + 1) % 100 == 0:
            print(f'Epoch: [{epoch + 1}/{epochs}], Step: [{i + 1}/{total_step}], Loss: {loss.item():.4f}')

print("\nTraining time: " + str(datetime.now() - start))
eval(model, criterion, test_dataloader, device, verbose=True)
```

GPU number:  2


  0%|          | 0/30 [00:00<?, ?it/s]

Epoch: [1/30], Step: [100/118], Loss: 2.2792


  3%|▎         | 1/30 [00:10<05:13, 10.81s/it]

Epoch: [2/30], Step: [100/118], Loss: 2.2004


  7%|▋         | 2/30 [00:21<04:54, 10.53s/it]

Epoch: [3/30], Step: [100/118], Loss: 1.7086


 10%|█         | 3/30 [00:31<04:38, 10.32s/it]

Epoch: [4/30], Step: [100/118], Loss: 0.5458


 13%|█▎        | 4/30 [00:41<04:29, 10.37s/it]

Epoch: [5/30], Step: [100/118], Loss: 0.4290


 17%|█▋        | 5/30 [00:52<04:26, 10.64s/it]

Epoch: [6/30], Step: [100/118], Loss: 0.3688


 20%|██        | 6/30 [01:03<04:13, 10.57s/it]

Epoch: [7/30], Step: [100/118], Loss: 0.2903


 23%|██▎       | 7/30 [01:13<04:01, 10.51s/it]

Epoch: [8/30], Step: [100/118], Loss: 0.2611


 27%|██▋       | 8/30 [01:23<03:49, 10.42s/it]

Epoch: [9/30], Step: [100/118], Loss: 0.2378


 30%|███       | 9/30 [01:34<03:40, 10.50s/it]

Epoch: [10/30], Step: [100/118], Loss: 0.2194


 33%|███▎      | 10/30 [01:44<03:23, 10.19s/it]

Epoch: [11/30], Step: [100/118], Loss: 0.2108


 37%|███▋      | 11/30 [01:53<03:11, 10.07s/it]

Epoch: [12/30], Step: [100/118], Loss: 0.1464


 40%|████      | 12/30 [02:03<02:58,  9.90s/it]

Epoch: [13/30], Step: [100/118], Loss: 0.2083


 43%|████▎     | 13/30 [02:12<02:46,  9.77s/it]

Epoch: [14/30], Step: [100/118], Loss: 0.1963


 47%|████▋     | 14/30 [02:22<02:34,  9.69s/it]

Epoch: [15/30], Step: [100/118], Loss: 0.2202


 50%|█████     | 15/30 [02:31<02:24,  9.64s/it]

Epoch: [16/30], Step: [100/118], Loss: 0.1535


 53%|█████▎    | 16/30 [02:41<02:15,  9.70s/it]

Epoch: [17/30], Step: [100/118], Loss: 0.1533


 57%|█████▋    | 17/30 [02:51<02:06,  9.70s/it]

Epoch: [18/30], Step: [100/118], Loss: 0.1373


 60%|██████    | 18/30 [03:01<01:56,  9.69s/it]

Epoch: [19/30], Step: [100/118], Loss: 0.0764


 63%|██████▎   | 19/30 [03:10<01:46,  9.65s/it]

Epoch: [20/30], Step: [100/118], Loss: 0.1248


 67%|██████▋   | 20/30 [03:20<01:36,  9.61s/it]

Epoch: [21/30], Step: [100/118], Loss: 0.1680


 70%|███████   | 21/30 [03:29<01:26,  9.63s/it]

Epoch: [22/30], Step: [100/118], Loss: 0.1392


 73%|███████▎  | 22/30 [03:39<01:17,  9.64s/it]

Epoch: [23/30], Step: [100/118], Loss: 0.0930


 77%|███████▋  | 23/30 [03:49<01:07,  9.65s/it]

Epoch: [24/30], Step: [100/118], Loss: 0.1265


 80%|████████  | 24/30 [03:58<00:57,  9.66s/it]

Epoch: [25/30], Step: [100/118], Loss: 0.0978


 83%|████████▎ | 25/30 [04:08<00:48,  9.62s/it]

Epoch: [26/30], Step: [100/118], Loss: 0.1164


 87%|████████▋ | 26/30 [04:17<00:38,  9.60s/it]

Epoch: [27/30], Step: [100/118], Loss: 0.1303


 90%|█████████ | 27/30 [04:27<00:28,  9.59s/it]

Epoch: [28/30], Step: [100/118], Loss: 0.1110


 93%|█████████▎| 28/30 [04:37<00:19,  9.62s/it]

Epoch: [29/30], Step: [100/118], Loss: 0.0797


 97%|█████████▋| 29/30 [04:46<00:09,  9.62s/it]

Epoch: [30/30], Step: [100/118], Loss: 0.0865


100%|██████████| 30/30 [04:56<00:00,  9.88s/it]



Training time: 0:04:56.333004
Evaluation: Average loss: 0.0823, Top 1 Accuracy: 9760/10000 (97.60%)


## 7. Efficiency comparison

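* CPU training: 9 min 55 s (batch size 256, test accuracy 98.10%)
* Single-GPU training: 4 min 35 s (batch size 256, test accuracy 98.13%)
* Multi-GPU training (DP, 2 GPUs): 4 min 56 s (batch size 512, test accuracy 97.60%)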

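**Conclusion:** Training on a GPU is far more efficient than on the CPU (about 2.2× faster here), but Data Parallel (DP) brings no noticeable improvement; the two-GPU run was even slightly slower than the single-GPU run. This is likely because the amount of data in this experiment is small, and because DP runs in a single process that keeps the model weights on the first GPU and, in every batch, copies them to each GPU, scatters the inputs, and gathers the outputs back. This inefficient inter-GPU communication lowers GPU utilization even further. Multi-process training with Distributed Data Parallel (DDP) can improve efficiency further.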
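
Unlike DP, DDP starts one process per GPU. Each process holds its own model replica and a disjoint shard of the data, and gradients are all-reduced across processes during `backward()`, so the model is not re-copied on every batch. Below is a minimal sketch under several assumptions: processes are launched with `torch.multiprocessing.spawn` (imported as `mp` at the top of this notebook); the `nccl` backend is used on GPUs and `gloo` on CPU; rendezvous is on the hypothetical `localhost:29500`; and a synthetic dataset and a linear model stand in for MNIST and `LeNet5`. The names `ddp_worker` and `launch` are hypothetical. Because `mp.spawn` starts fresh interpreter processes, the worker typically has to live in an importable `.py` file rather than a notebook cell.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def ddp_worker(rank, world_size, epochs=1, batch_size=256):
    """One training process per GPU (or a single CPU process when world_size == 1)."""
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")  # hypothetical free port
    use_cuda = torch.cuda.is_available()
    dist.init_process_group("nccl" if use_cuda else "gloo", rank=rank, world_size=world_size)
    device = torch.device(f"cuda:{rank}" if use_cuda else "cpu")
    if use_cuda:
        torch.cuda.set_device(device)

    torch.manual_seed(0)  # identical initial weights and synthetic data in every process
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)  # stand-in for LeNet5
    ddp_model = DDP(model, device_ids=[rank] if use_cuda else None)

    # Synthetic stand-in for MNIST; the sampler gives each rank a disjoint shard
    data = TensorDataset(torch.randn(1024, 1, 28, 28), torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(data, num_replicas=world_size, rank=rank, shuffle=True)
    loader = DataLoader(data, batch_size=batch_size, sampler=sampler)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.001, momentum=0.9)
    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # a different shuffle every epoch
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            loss = criterion(ddp_model(images), labels)
            optimizer.zero_grad()
            loss.backward()  # DDP all-reduces the gradients across processes here
            optimizer.step()

    dist.destroy_process_group()
    return loss.item()


def launch(epochs=1):
    """Start one process per visible GPU (one CPU process if there is no GPU)."""
    world_size = max(torch.cuda.device_count(), 1)
    mp.spawn(ddp_worker, args=(world_size, epochs), nprocs=world_size, join=True)
```

From a script, `launch()` would be called inside an `if __name__ == "__main__":` block.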