<!--- Licensed to the Apache Software Foundation (ASF) under one -->
<!--- or more contributor license agreements.  See the NOTICE file -->
<!--- distributed with this work for additional information -->
<!--- regarding copyright ownership.  The ASF licenses this file -->
<!--- to you under the Apache License, Version 2.0 (the -->
<!--- "License"); you may not use this file except in compliance -->
<!--- with the License.  You may obtain a copy of the License at -->

<!---   http://www.apache.org/licenses/LICENSE-2.0 -->

<!--- Unless required by applicable law or agreed to in writing, -->
<!--- software distributed under the License is distributed on an -->
<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
<!--- KIND, either express or implied.  See the License for the -->
<!--- specific language governing permissions and limitations -->
<!--- under the License. -->

# Step 7: Load and Run a NN using GPU

In this step, you will learn how to use graphics processing units (GPUs) with MXNet. If you use GPUs to train and deploy neural networks, you may be able to train or perform inference faster than with central processing units (CPUs).

## Prerequisites

Before you start, make sure you have at least one NVIDIA GPU on your machine and that CUDA is properly installed. GPUs from AMD and Intel are not supported. Additionally, you will need to install the GPU-enabled version of MXNet. You can find information about how to install the GPU version of MXNet for your system [here](https://mxnet.apache.org/versions/1.4.1/install/ubuntu_setup.html).

You can use the following command to view the number of GPUs that are available to MXNet.

In [1]:
from mxnet import np, npx, gluon, autograd
from mxnet.gluon import nn
import time
npx.set_np()

npx.num_gpus()  # This command provides the number of GPUs MXNet can access

1

## Allocate data to a GPU

MXNet's ndarray is very similar to NumPy's. One major difference is that MXNet's ndarray has a `context` attribute specifying which device an array is on. By default, arrays are stored on `npx.cpu()`. To place an array on the first GPU instead, pass `npx.gpu()` or, equivalently, `npx.gpu(0)` as the `ctx` argument, as in the following code.

In [2]:
gpu = npx.gpu() if npx.num_gpus() > 0 else npx.cpu()
x = np.ones((3,4), ctx=gpu)
x

[15:36:27] /work/mxnet/src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for GPU


array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]], ctx=gpu(0))

If you're using a CPU, MXNet allocates data on the main memory and tries to use as many CPU cores as possible. If there are multiple GPUs, MXNet will tell you which GPU the ndarray is allocated on.

Assuming there are at least two GPUs, you can create another ndarray and assign it to a different GPU. The example code here copies `x` to the second GPU, `npx.gpu(1)`. If you only have one GPU, the code falls back to `npx.cpu()` and copies `x` to main memory instead:

In [3]:
gpu_1 = npx.gpu(1) if npx.num_gpus() > 1 else npx.cpu()
x.copyto(gpu_1)

[15:36:27] /work/mxnet/src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for CPU

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])


MXNet requires that users explicitly move data between devices. However, several operations, such as `print` and `asnumpy`, implicitly move data to main memory.

## Choosing GPU IDs

If you have multiple GPUs on your machine, MXNet can access each of them through zero-based indexing with `npx`. As you saw before, the first GPU was accessed using `npx.gpu(0)`, and the second using `npx.gpu(1)`. This extends to however many GPUs your machine has. So if your machine has eight GPUs, the last GPU is accessed using `npx.gpu(7)`. This allows you to select which GPUs to use for operations and training. You might find it particularly useful when you want to leverage multiple GPUs while training neural networks.
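
The indexing rule can be sketched in plain Python without MXNet. The helper `select_gpu_ids` below is hypothetical and not part of MXNet; it only illustrates how zero-based IDs relate to the number of GPUs, assuming you never want to request more GPUs than the machine has. Each returned ID is the value you would pass to `npx.gpu(i)`.

```python
def select_gpu_ids(num_available, num_requested):
    """Return the zero-based IDs of the GPUs to use.

    Hypothetical helper for illustration only; it is not part of MXNet.
    """
    if num_available < 0 or num_requested < 0:
        raise ValueError("GPU counts must be non-negative")
    # Never ask for more GPUs than the machine has.
    count = min(num_available, num_requested)
    return list(range(count))


# On a machine with eight GPUs, the IDs run from 0 to 7.
all_ids = select_gpu_ids(num_available=8, num_requested=8)
print(all_ids[0], all_ids[-1])

# Requesting two GPUs when only one is present yields a single ID.
print(select_gpu_ids(num_available=1, num_requested=2))
```

Clamping the request this way means the same script can run unchanged on machines with different GPU counts.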

## Run an operation on a GPU

To perform an operation on a particular GPU, you only need to guarantee that the inputs of the operation are already on that GPU. The output is allocated on the same GPU as well. Almost all operators in the `np` and `npx` modules support running on a GPU.

In [4]:
y = np.random.uniform(size=(3,4), ctx=gpu)
x + y

array([[1.7402194, 1.9209938, 1.0390205, 1.9689629],
       [1.9251406, 1.4463501, 1.6673192, 1.1099306],
       [1.4702187, 1.5131936, 1.7761751, 1.2947657]], ctx=gpu(0))

Remember that if the inputs are not on the same GPU, you will get an error.

## Run a neural network on a GPU

To run a neural network on a GPU, you only need to copy or move the input data and parameters to the GPU. To demonstrate this, you can reuse the previously defined `LeafNetwork` from [Training Neural Networks](6-train-nn.md). The following code example shows this.

In [5]:
# The convolutional block has a convolution layer, a max pool layer and a batch normalization layer
def conv_block(filters, kernel_size=2, stride=2, batch_norm=True):
    conv_block = nn.HybridSequential()
    conv_block.add(nn.Conv2D(channels=filters, kernel_size=kernel_size, activation='relu'),
              nn.MaxPool2D(pool_size=4, strides=stride))
    if batch_norm:
        conv_block.add(nn.BatchNorm())
    return conv_block

# The dense block consists of a dense layer and a dropout layer
def dense_block(neurons, activation='relu', dropout=0.2):
    dense_block = nn.HybridSequential()
    dense_block.add(nn.Dense(neurons, activation=activation))
    if dropout:
        dense_block.add(nn.Dropout(dropout))
    return dense_block

# Create neural network blueprint using the blocks
class LeafNetwork(nn.HybridBlock):
    def __init__(self):
        super(LeafNetwork, self).__init__()
        self.conv1 = conv_block(32)
        self.conv2 = conv_block(64)
        self.conv3 = conv_block(128)
        self.flatten = nn.Flatten()
        self.dense1 = dense_block(100)
        self.dense2 = dense_block(10)
        self.dense3 = nn.Dense(2)

    def forward(self, batch):
        batch = self.conv1(batch)
        batch = self.conv2(batch)
        batch = self.conv3(batch)
        batch = self.flatten(batch)
        batch = self.dense1(batch)
        batch = self.dense2(batch)
        batch = self.dense3(batch)

        return batch

Load the saved parameters onto GPU 0 directly as shown below; alternatively, you could use `net.collect_params().reset_ctx(gpu)` to change the device.

In [6]:
net = LeafNetwork()
net.load_parameters('leaf_models.params', ctx=gpu)

Use the following command to create input data on GPU 0. The forward function will then run on GPU 0.

In [7]:
x = np.random.uniform(size=(1, 3, 128, 128), ctx=gpu)
net(x)

[15:36:28] /work/mxnet/src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:106: Running performance tests to find the best convolution algorithm, this can take a while... (set the environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)


array([[3.1243267, 0.5664827]], ctx=gpu(0))

## Training with multiple GPUs

Finally, you will see how you can use multiple GPUs to jointly train a neural network through data parallelism. With data parallelism, assuming there are *n* GPUs, you split each data batch into *n* parts and have each GPU run the forward and backward passes on its own separate chunk of the data.
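
The idea can be sketched in plain Python, independent of MXNet. The names `split_batch` and `summed_gradient` are hypothetical, and the one-parameter squared-error model is an assumption made only to keep the arithmetic small. The sketch splits a batch into equal shards, computes a gradient per shard as each device would, and checks that summing the per-shard gradients and normalizing by the full batch size gives the same result as processing the whole batch on one device.

```python
def split_batch(batch, num_devices):
    """Split a batch into num_devices equal, contiguous shards.

    Hypothetical helper; this sketch assumes the batch size is
    divisible by num_devices.
    """
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    if len(batch) % num_devices != 0:
        raise ValueError("batch size must be divisible by num_devices")
    shard_size = len(batch) // num_devices
    return [batch[i * shard_size:(i + 1) * shard_size]
            for i in range(num_devices)]


def summed_gradient(shard, w):
    """Gradient of sum((w * x - y) ** 2) over one shard with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in shard)


# Toy data: (input, target) pairs for a one-parameter model y = w * x.
batch = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
w = 1.0

# Each shard plays the role of the data held by one device.
shards = split_batch(batch, num_devices=2)
per_device = [summed_gradient(shard, w) for shard in shards]

# Sum the per-device gradients, then normalize by the full batch size.
parallel_grad = sum(per_device) / len(batch)
single_grad = summed_gradient(batch, w) / len(batch)
print(shards)
print(parallel_grad, single_grad)
```

Because the loss is a sum over samples, the per-shard gradients add up to the full-batch gradient, which is why splitting the batch across devices does not change the update.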

First, copy the data definitions and the transform functions from the tutorial [Training Neural Networks](6-train-nn.md) with the following commands.

In [8]:
# Import transforms to compose a series of transformations for the images
from mxnet.gluon.data.vision import transforms

jitter_param = 0.05

# mean and std for normalizing image value in range (0,1)
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

training_transformer = transforms.Compose([
    transforms.Resize(size=224, keep_ratio=True),
    transforms.CenterCrop(128),
    transforms.RandomFlipLeftRight(),
    transforms.RandomColorJitter(contrast=jitter_param),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])

validation_transformer = transforms.Compose([
    transforms.Resize(size=224, keep_ratio=True),
    transforms.CenterCrop(128),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])

# Use ImageFolderDataset to create a Dataset object from directory structure
train_dataset = gluon.data.vision.ImageFolderDataset('./datasets/train')
val_dataset = gluon.data.vision.ImageFolderDataset('./datasets/validation')
test_dataset = gluon.data.vision.ImageFolderDataset('./datasets/test')

# Create data loaders
batch_size = 4
train_loader = gluon.data.DataLoader(train_dataset.transform_first(training_transformer),batch_size=batch_size, shuffle=True, try_nopython=True)
validation_loader = gluon.data.DataLoader(val_dataset.transform_first(validation_transformer), batch_size=batch_size, try_nopython=True)
test_loader = gluon.data.DataLoader(test_dataset.transform_first(validation_transformer), batch_size=batch_size, try_nopython=True)

### Define a helper function
This is the same test function defined previously in **Step 6**.

In [9]:
# Function to return the accuracy for the validation and test set
def test(val_data, devices):
    acc = gluon.metric.Accuracy()
    for batch in val_data:
        data, label = batch[0], batch[1]
        data_list = gluon.utils.split_and_load(data, devices)
        label_list = gluon.utils.split_and_load(label, devices)
        outputs = [net(X) for X in data_list]
        acc.update(label_list, outputs)

    _, accuracy = acc.get()
    return accuracy

The training loop is quite similar to that shown earlier. The major differences are highlighted in the following code.

In [10]:
# Diff 1: Use up to two GPUs for training (fewer if fewer are available).
available_gpus = [npx.gpu(i) for i in range(npx.num_gpus())]
num_gpus = 2
devices = available_gpus[:num_gpus]
print('Using {} GPUs'.format(len(devices)))

# Diff 2: reinitialize the parameters and place them on multiple GPUs
net.initialize(force_reinit=True, ctx=devices)

# Loss and trainer are the same as before
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
optimizer = 'sgd'
optimizer_params = {'learning_rate': 0.001}
trainer = gluon.Trainer(net.collect_params(), optimizer, optimizer_params)

epochs = 2
accuracy = gluon.metric.Accuracy()
log_interval = 5

for epoch in range(epochs):
    train_loss = 0.
    tic = time.time()
    btic = time.time()
    accuracy.reset()
    for idx, batch in enumerate(train_loader):
        data, label = batch[0], batch[1]

        # Diff 3: split batch and load into corresponding devices
        data_list = gluon.utils.split_and_load(data, devices)
        label_list = gluon.utils.split_and_load(label, devices)

        # Diff 4: run forward and backward on each device.
        # MXNet will automatically run them in parallel
        with autograd.record():
            outputs = [net(X)
                      for X in data_list]
            losses = [loss_fn(output, label)
                      for output, label in zip(outputs, label_list)]
        for l in losses:
            l.backward()
        trainer.step(batch_size)

        # Diff 5: sum losses over all devices. Here, the float
        # function copies the data to the CPU.
        train_loss += sum([float(l.sum()) for l in losses])
        accuracy.update(label_list, outputs)
        if log_interval and (idx + 1) % log_interval == 0:
            _, acc = accuracy.get()

            print(f"""Epoch[{epoch + 1}] Batch[{idx + 1}] Speed: {batch_size / (time.time() - btic)} samples/sec \
                  batch loss = {train_loss} | accuracy = {acc}""")
            btic = time.time()

    _, acc = accuracy.get()

    acc_val = test(validation_loader, devices)
    print(f"[Epoch {epoch + 1}] training: accuracy={acc}")
    print(f"[Epoch {epoch + 1}] time cost: {time.time() - tic}")
    print(f"[Epoch {epoch + 1}] validation: validation accuracy={acc_val}")

Using 1 GPUs


Epoch[1] Batch[5] Speed: 0.7800640930384977 samples/sec                   batch loss = 14.94167971611023 | accuracy = 0.5


Epoch[1] Batch[10] Speed: 1.2551385430076734 samples/sec                   batch loss = 28.922159433364868 | accuracy = 0.5


Epoch[1] Batch[15] Speed: 1.2527030171109697 samples/sec                   batch loss = 44.19984483718872 | accuracy = 0.45


Epoch[1] Batch[20] Speed: 1.2579837487718002 samples/sec                   batch loss = 58.20937991142273 | accuracy = 0.475


Epoch[1] Batch[25] Speed: 1.2565204799266754 samples/sec                   batch loss = 72.98255777359009 | accuracy = 0.47


Epoch[1] Batch[30] Speed: 1.2548358839100653 samples/sec                   batch loss = 87.12670731544495 | accuracy = 0.48333333333333334


Epoch[1] Batch[35] Speed: 1.2561111562827136 samples/sec                   batch loss = 100.93852281570435 | accuracy = 0.4928571428571429


Epoch[1] Batch[40] Speed: 1.2549436378460457 samples/sec                   batch loss = 114.38246393203735 | accuracy = 0.5


Epoch[1] Batch[45] Speed: 1.2512475210485732 samples/sec                   batch loss = 128.20508122444153 | accuracy = 0.5111111111111111


Epoch[1] Batch[50] Speed: 1.2544769940614808 samples/sec                   batch loss = 141.79715156555176 | accuracy = 0.525


Epoch[1] Batch[55] Speed: 1.2556274804722392 samples/sec                   batch loss = 156.28957843780518 | accuracy = 0.5181818181818182


Epoch[1] Batch[60] Speed: 1.2504592921002355 samples/sec                   batch loss = 170.38401103019714 | accuracy = 0.5166666666666667


Epoch[1] Batch[65] Speed: 1.2572850890396232 samples/sec                   batch loss = 183.64068150520325 | accuracy = 0.5384615384615384


Epoch[1] Batch[70] Speed: 1.2499323710137733 samples/sec                   batch loss = 197.4080159664154 | accuracy = 0.5392857142857143


Epoch[1] Batch[75] Speed: 1.2534132602796382 samples/sec                   batch loss = 211.5394163131714 | accuracy = 0.5366666666666666


Epoch[1] Batch[80] Speed: 1.2554309203483973 samples/sec                   batch loss = 225.44428730010986 | accuracy = 0.53125


Epoch[1] Batch[85] Speed: 1.257551602463801 samples/sec                   batch loss = 238.90795874595642 | accuracy = 0.5352941176470588


Epoch[1] Batch[90] Speed: 1.2534751604124215 samples/sec                   batch loss = 251.57125306129456 | accuracy = 0.5444444444444444


Epoch[1] Batch[95] Speed: 1.256814819631379 samples/sec                   batch loss = 265.8668415546417 | accuracy = 0.5394736842105263


Epoch[1] Batch[100] Speed: 1.2585525069642585 samples/sec                   batch loss = 279.8087785243988 | accuracy = 0.535


Epoch[1] Batch[105] Speed: 1.2631939081964922 samples/sec                   batch loss = 293.8328905105591 | accuracy = 0.5357142857142857


Epoch[1] Batch[110] Speed: 1.2540493154878964 samples/sec                   batch loss = 307.43044662475586 | accuracy = 0.5386363636363637


Epoch[1] Batch[115] Speed: 1.2489270881480155 samples/sec                   batch loss = 321.70241475105286 | accuracy = 0.5413043478260869


Epoch[1] Batch[120] Speed: 1.2514185034355199 samples/sec                   batch loss = 334.96837520599365 | accuracy = 0.5458333333333333


Epoch[1] Batch[125] Speed: 1.254774506230114 samples/sec                   batch loss = 348.29314732551575 | accuracy = 0.552


Epoch[1] Batch[130] Speed: 1.2571520630868354 samples/sec                   batch loss = 361.83963346481323 | accuracy = 0.551923076923077


Epoch[1] Batch[135] Speed: 1.254490126279626 samples/sec                   batch loss = 376.2895600795746 | accuracy = 0.5462962962962963


Epoch[1] Batch[140] Speed: 1.2550945056714176 samples/sec                   batch loss = 390.86057329177856 | accuracy = 0.5410714285714285


Epoch[1] Batch[145] Speed: 1.2567305605716181 samples/sec                   batch loss = 404.45981335639954 | accuracy = 0.5448275862068965


Epoch[1] Batch[150] Speed: 1.2536054422378673 samples/sec                   batch loss = 418.7510898113251 | accuracy = 0.5416666666666666


Epoch[1] Batch[155] Speed: 1.255065399533484 samples/sec                   batch loss = 433.08609795570374 | accuracy = 0.535483870967742


Epoch[1] Batch[160] Speed: 1.2602468195083425 samples/sec                   batch loss = 446.306081533432 | accuracy = 0.5390625


Epoch[1] Batch[165] Speed: 1.25350466111392 samples/sec                   batch loss = 460.3353192806244 | accuracy = 0.5333333333333333


Epoch[1] Batch[170] Speed: 1.2543842321513299 samples/sec                   batch loss = 473.8333351612091 | accuracy = 0.5323529411764706


Epoch[1] Batch[175] Speed: 1.2575440616343814 samples/sec                   batch loss = 487.91784405708313 | accuracy = 0.5285714285714286


Epoch[1] Batch[180] Speed: 1.2594134633371799 samples/sec                   batch loss = 501.13830828666687 | accuracy = 0.5333333333333333


Epoch[1] Batch[185] Speed: 1.259103731115282 samples/sec                   batch loss = 514.882922410965 | accuracy = 0.5337837837837838


Epoch[1] Batch[190] Speed: 1.2557892283309349 samples/sec                   batch loss = 528.4052519798279 | accuracy = 0.5302631578947369


Epoch[1] Batch[195] Speed: 1.2529947286323346 samples/sec                   batch loss = 541.9168555736542 | accuracy = 0.5307692307692308


Epoch[1] Batch[200] Speed: 1.2526153805032483 samples/sec                   batch loss = 555.666296005249 | accuracy = 0.52875


Epoch[1] Batch[205] Speed: 1.2590427855908584 samples/sec                   batch loss = 569.2219471931458 | accuracy = 0.5317073170731708


Epoch[1] Batch[210] Speed: 1.2494277594742893 samples/sec                   batch loss = 582.8030693531036 | accuracy = 0.5357142857142857


Epoch[1] Batch[215] Speed: 1.2525431853815452 samples/sec                   batch loss = 596.4676113128662 | accuracy = 0.536046511627907


Epoch[1] Batch[220] Speed: 1.2535262958442368 samples/sec                   batch loss = 609.9083158969879 | accuracy = 0.5363636363636364


Epoch[1] Batch[225] Speed: 1.2524896990263523 samples/sec                   batch loss = 623.5774533748627 | accuracy = 0.5366666666666666


Epoch[1] Batch[230] Speed: 1.2465736569395 samples/sec                   batch loss = 636.9059588909149 | accuracy = 0.5391304347826087


Epoch[1] Batch[235] Speed: 1.2561869612212582 samples/sec                   batch loss = 651.800167798996 | accuracy = 0.5382978723404256


Epoch[1] Batch[240] Speed: 1.254558605871845 samples/sec                   batch loss = 665.195570230484 | accuracy = 0.5395833333333333


Epoch[1] Batch[245] Speed: 1.2530991715475683 samples/sec                   batch loss = 678.7829773426056 | accuracy = 0.5408163265306123


Epoch[1] Batch[250] Speed: 1.253546151751774 samples/sec                   batch loss = 692.7498037815094 | accuracy = 0.54


Epoch[1] Batch[255] Speed: 1.2501780756149412 samples/sec                   batch loss = 706.4440677165985 | accuracy = 0.5372549019607843


Epoch[1] Batch[260] Speed: 1.254791210875505 samples/sec                   batch loss = 720.2649462223053 | accuracy = 0.5355769230769231


Epoch[1] Batch[265] Speed: 1.2548909787761826 samples/sec                   batch loss = 734.2809684276581 | accuracy = 0.5339622641509434


Epoch[1] Batch[270] Speed: 1.2555421590541969 samples/sec                   batch loss = 747.8806462287903 | accuracy = 0.5324074074074074


Epoch[1] Batch[275] Speed: 1.2513202201853693 samples/sec                   batch loss = 761.2548611164093 | accuracy = 0.5327272727272727


Epoch[1] Batch[280] Speed: 1.2514491209348846 samples/sec                   batch loss = 775.3696784973145 | accuracy = 0.53125


Epoch[1] Batch[285] Speed: 1.2501172458819387 samples/sec                   batch loss = 789.8228852748871 | accuracy = 0.5289473684210526


Epoch[1] Batch[290] Speed: 1.258656462105303 samples/sec                   batch loss = 803.4634928703308 | accuracy = 0.5310344827586206


Epoch[1] Batch[295] Speed: 1.2584011842635046 samples/sec                   batch loss = 816.8407771587372 | accuracy = 0.5322033898305085


Epoch[1] Batch[300] Speed: 1.2559897557712687 samples/sec                   batch loss = 830.0690937042236 | accuracy = 0.5341666666666667


Epoch[1] Batch[305] Speed: 1.2604557807122172 samples/sec                   batch loss = 843.1603579521179 | accuracy = 0.5377049180327869


Epoch[1] Batch[310] Speed: 1.253395187717737 samples/sec                   batch loss = 856.8269345760345 | accuracy = 0.5370967741935484


Epoch[1] Batch[315] Speed: 1.2565537003708123 samples/sec                   batch loss = 870.2423598766327 | accuracy = 0.5380952380952381


Epoch[1] Batch[320] Speed: 1.253793091903487 samples/sec                   batch loss = 884.1290156841278 | accuracy = 0.5375


Epoch[1] Batch[325] Speed: 1.2661317843813709 samples/sec                   batch loss = 898.005254983902 | accuracy = 0.536923076923077


Epoch[1] Batch[330] Speed: 1.2592706292136366 samples/sec                   batch loss = 911.6059746742249 | accuracy = 0.5386363636363637


Epoch[1] Batch[335] Speed: 1.2598756511157345 samples/sec                   batch loss = 925.2668769359589 | accuracy = 0.5380597014925373


Epoch[1] Batch[340] Speed: 1.2591574058584252 samples/sec                   batch loss = 939.590637922287 | accuracy = 0.5360294117647059


Epoch[1] Batch[345] Speed: 1.261006019860157 samples/sec                   batch loss = 953.2761199474335 | accuracy = 0.5384057971014493


Epoch[1] Batch[350] Speed: 1.2613393526796857 samples/sec                   batch loss = 966.6394975185394 | accuracy = 0.5392857142857143


Epoch[1] Batch[355] Speed: 1.2599540874045565 samples/sec                   batch loss = 979.6217617988586 | accuracy = 0.5415492957746478


Epoch[1] Batch[360] Speed: 1.2619507309246083 samples/sec                   batch loss = 993.2251489162445 | accuracy = 0.5423611111111111


Epoch[1] Batch[365] Speed: 1.2548345699518821 samples/sec                   batch loss = 1007.1839747428894 | accuracy = 0.5431506849315069


Epoch[1] Batch[370] Speed: 1.2534187851564613 samples/sec                   batch loss = 1021.6251583099365 | accuracy = 0.5425675675675675


Epoch[1] Batch[375] Speed: 1.2537269443726642 samples/sec                   batch loss = 1036.0940234661102 | accuracy = 0.542


Epoch[1] Batch[380] Speed: 1.2555928055016963 samples/sec                   batch loss = 1049.7660427093506 | accuracy = 0.5434210526315789


Epoch[1] Batch[385] Speed: 1.2568896737470465 samples/sec                   batch loss = 1062.607658147812 | accuracy = 0.5461038961038961


Epoch[1] Batch[390] Speed: 1.257403159018943 samples/sec                   batch loss = 1075.8811349868774 | accuracy = 0.5474358974358975


Epoch[1] Batch[395] Speed: 1.2548609434963818 samples/sec                   batch loss = 1089.2429082393646 | accuracy = 0.5487341772151899


Epoch[1] Batch[400] Speed: 1.2527539024756673 samples/sec                   batch loss = 1102.8884692192078 | accuracy = 0.549375


Epoch[1] Batch[405] Speed: 1.2510162278808161 samples/sec                   batch loss = 1116.2029745578766 | accuracy = 0.5506172839506173


Epoch[1] Batch[410] Speed: 1.2541326528895997 samples/sec                   batch loss = 1130.248714208603 | accuracy = 0.5481707317073171


Epoch[1] Batch[415] Speed: 1.2584413005861275 samples/sec                   batch loss = 1143.9695103168488 | accuracy = 0.5475903614457831


Epoch[1] Batch[420] Speed: 1.25280451132255 samples/sec                   batch loss = 1157.417017698288 | accuracy = 0.5476190476190477


Epoch[1] Batch[425] Speed: 1.2477320491806998 samples/sec                   batch loss = 1171.0811598300934 | accuracy = 0.5488235294117647


Epoch[1] Batch[430] Speed: 1.2544102116065656 samples/sec                   batch loss = 1185.1412522792816 | accuracy = 0.5494186046511628


Epoch[1] Batch[435] Speed: 1.2518523288429046 samples/sec                   batch loss = 1198.6414625644684 | accuracy = 0.55


Epoch[1] Batch[440] Speed: 1.256036018729272 samples/sec                   batch loss = 1212.5303852558136 | accuracy = 0.55


Epoch[1] Batch[445] Speed: 1.2503264020120628 samples/sec                   batch loss = 1225.3544535636902 | accuracy = 0.5516853932584269


Epoch[1] Batch[450] Speed: 1.25427047919513 samples/sec                   batch loss = 1238.1488189697266 | accuracy = 0.5516666666666666


Epoch[1] Batch[455] Speed: 1.2543139898431557 samples/sec                   batch loss = 1251.2948153018951 | accuracy = 0.5521978021978022


Epoch[1] Batch[460] Speed: 1.2640768481319788 samples/sec                   batch loss = 1264.772379398346 | accuracy = 0.5532608695652174


Epoch[1] Batch[465] Speed: 1.253277026608095 samples/sec                   batch loss = 1278.7574956417084 | accuracy = 0.5516129032258065


Epoch[1] Batch[470] Speed: 1.2567921297674167 samples/sec                   batch loss = 1292.1293137073517 | accuracy = 0.5526595744680851


Epoch[1] Batch[475] Speed: 1.2582756602955953 samples/sec                   batch loss = 1305.112337589264 | accuracy = 0.5542105263157895


Epoch[1] Batch[480] Speed: 1.2559607021191828 samples/sec                   batch loss = 1317.6019382476807 | accuracy = 0.55625


Epoch[1] Batch[485] Speed: 1.2583337002207249 samples/sec                   batch loss = 1331.1665830612183 | accuracy = 0.5572164948453608


Epoch[1] Batch[490] Speed: 1.2542346601943997 samples/sec                   batch loss = 1346.0700175762177 | accuracy = 0.5551020408163265


Epoch[1] Batch[495] Speed: 1.2543093010506172 samples/sec                   batch loss = 1359.9537889957428 | accuracy = 0.555050505050505


Epoch[1] Batch[500] Speed: 1.2570470376944418 samples/sec                   batch loss = 1373.150966644287 | accuracy = 0.556


Epoch[1] Batch[505] Speed: 1.2569090713624032 samples/sec                   batch loss = 1385.9123120307922 | accuracy = 0.557920792079208


Epoch[1] Batch[510] Speed: 1.2608365768814083 samples/sec                   batch loss = 1398.6922705173492 | accuracy = 0.5593137254901961


Epoch[1] Batch[515] Speed: 1.2597801972040854 samples/sec                   batch loss = 1411.5897991657257 | accuracy = 0.5606796116504854


Epoch[1] Batch[520] Speed: 1.2559315557929762 samples/sec                   batch loss = 1425.2962763309479 | accuracy = 0.5605769230769231


Epoch[1] Batch[525] Speed: 1.2586464529613484 samples/sec                   batch loss = 1438.1838495731354 | accuracy = 0.5619047619047619


Epoch[1] Batch[530] Speed: 1.2578068185941593 samples/sec                   batch loss = 1451.52525472641 | accuracy = 0.5627358490566038


Epoch[1] Batch[535] Speed: 1.2550411767447314 samples/sec                   batch loss = 1466.05832529068 | accuracy = 0.5616822429906542


Epoch[1] Batch[540] Speed: 1.2609319066454923 samples/sec                   batch loss = 1478.9635696411133 | accuracy = 0.5625


Epoch[1] Batch[545] Speed: 1.2562317336957791 samples/sec                   batch loss = 1492.5664472579956 | accuracy = 0.5619266055045872


Epoch[1] Batch[550] Speed: 1.2615486763759873 samples/sec                   batch loss = 1505.9105036258698 | accuracy = 0.5622727272727273


Epoch[1] Batch[555] Speed: 1.262505981531394 samples/sec                   batch loss = 1518.3944864273071 | accuracy = 0.5635135135135135


Epoch[1] Batch[560] Speed: 1.2582973656570917 samples/sec                   batch loss = 1531.116688966751 | accuracy = 0.5642857142857143


Epoch[1] Batch[565] Speed: 1.254797780239923 samples/sec                   batch loss = 1544.2199578285217 | accuracy = 0.5641592920353983


Epoch[1] Batch[570] Speed: 1.2518853963184433 samples/sec                   batch loss = 1557.6236400604248 | accuracy = 0.5635964912280702


Epoch[1] Batch[575] Speed: 1.2535419370018297 samples/sec                   batch loss = 1570.8408467769623 | accuracy = 0.5621739130434783


Epoch[1] Batch[580] Speed: 1.2596275388559723 samples/sec                   batch loss = 1583.7122225761414 | accuracy = 0.5625


Epoch[1] Batch[585] Speed: 1.2533465910300625 samples/sec                   batch loss = 1597.715912103653 | accuracy = 0.5611111111111111


Epoch[1] Batch[590] Speed: 1.254809886820253 samples/sec                   batch loss = 1610.5069754123688 | accuracy = 0.5610169491525424


Epoch[1] Batch[595] Speed: 1.2515863580111575 samples/sec                   batch loss = 1623.1359028816223 | accuracy = 0.5617647058823529


Epoch[1] Batch[600] Speed: 1.2610600464910782 samples/sec                   batch loss = 1635.9754691123962 | accuracy = 0.5629166666666666


Epoch[1] Batch[605] Speed: 1.2484012714971047 samples/sec                   batch loss = 1651.0625584125519 | accuracy = 0.5611570247933885


Epoch[1] Batch[610] Speed: 1.2516167036046872 samples/sec                   batch loss = 1663.0480184555054 | accuracy = 0.5627049180327869


Epoch[1] Batch[615] Speed: 1.2572177246458112 samples/sec                   batch loss = 1674.8246083259583 | accuracy = 0.5642276422764227


Epoch[1] Batch[620] Speed: 1.2578030466376 samples/sec                   batch loss = 1688.5771400928497 | accuracy = 0.5637096774193548


Epoch[1] Batch[625] Speed: 1.2551784515784692 samples/sec                   batch loss = 1702.1149837970734 | accuracy = 0.564


Epoch[1] Batch[630] Speed: 1.2537350953202289 samples/sec                   batch loss = 1716.3718521595001 | accuracy = 0.5630952380952381


Epoch[1] Batch[635] Speed: 1.2557097119741236 samples/sec                   batch loss = 1729.5673418045044 | accuracy = 0.562992125984252


Epoch[1] Batch[640] Speed: 1.2551433319021728 samples/sec                   batch loss = 1742.3828332424164 | accuracy = 0.56328125


Epoch[1] Batch[645] Speed: 1.2615897525670619 samples/sec                   batch loss = 1756.052062034607 | accuracy = 0.5631782945736434


Epoch[1] Batch[650] Speed: 1.2556449596386854 samples/sec                   batch loss = 1770.0560307502747 | accuracy = 0.5634615384615385


Epoch[1] Batch[655] Speed: 1.2526342723103894 samples/sec                   batch loss = 1782.8761537075043 | accuracy = 0.5645038167938932


Epoch[1] Batch[660] Speed: 1.255382635356708 samples/sec                   batch loss = 1795.9268791675568 | accuracy = 0.5640151515151515


Epoch[1] Batch[665] Speed: 1.254760148098233 samples/sec                   batch loss = 1807.706080198288 | accuracy = 0.5650375939849624


Epoch[1] Batch[670] Speed: 1.2532720647034739 samples/sec                   batch loss = 1821.6946403980255 | accuracy = 0.5656716417910448


Epoch[1] Batch[675] Speed: 1.2525044727727308 samples/sec                   batch loss = 1835.2724566459656 | accuracy = 0.5655555555555556


Epoch[1] Batch[680] Speed: 1.256112378870421 samples/sec                   batch loss = 1847.7724686861038 | accuracy = 0.5661764705882353


Epoch[1] Batch[685] Speed: 1.2534979179740113 samples/sec                   batch loss = 1861.4400178194046 | accuracy = 0.5664233576642336


Epoch[1] Batch[690] Speed: 1.2528973204234122 samples/sec                   batch loss = 1874.0000085830688 | accuracy = 0.5666666666666667


Epoch[1] Batch[695] Speed: 1.251986290937809 samples/sec                   batch loss = 1885.6824293136597 | accuracy = 0.5683453237410072


Epoch[1] Batch[700] Speed: 1.2527409001234804 samples/sec                   batch loss = 1897.922821521759 | accuracy = 0.5685714285714286


Epoch[1] Batch[705] Speed: 1.2507732709085453 samples/sec                   batch loss = 1911.086387872696 | accuracy = 0.5691489361702128


Epoch[1] Batch[710] Speed: 1.2514831006454115 samples/sec                   batch loss = 1922.228783249855 | accuracy = 0.5707746478873239


Epoch[1] Batch[715] Speed: 1.255385641317811 samples/sec                   batch loss = 1936.4910842180252 | accuracy = 0.5713286713286714


Epoch[1] Batch[720] Speed: 1.2534730064473423 samples/sec                   batch loss = 1947.8574267625809 | accuracy = 0.5725694444444445


Epoch[1] Batch[725] Speed: 1.2476358285877378 samples/sec                   batch loss = 1960.8454657793045 | accuracy = 0.573103448275862


Epoch[1] Batch[730] Speed: 1.2530656655614756 samples/sec                   batch loss = 1973.1719969511032 | accuracy = 0.5736301369863014


Epoch[1] Batch[735] Speed: 1.2572934747631017 samples/sec                   batch loss = 1986.1857787370682 | accuracy = 0.5744897959183674


Epoch[1] Batch[740] Speed: 1.2555589781077396 samples/sec                   batch loss = 1999.8425747156143 | accuracy = 0.5746621621621621


Epoch[1] Batch[745] Speed: 1.259904318494184 samples/sec                   batch loss = 2013.2253767251968 | accuracy = 0.5738255033557047


Epoch[1] Batch[750] Speed: 1.2550617378891504 samples/sec                   batch loss = 2025.9330681562424 | accuracy = 0.5746666666666667


Epoch[1] Batch[755] Speed: 1.253770604707666 samples/sec                   batch loss = 2037.8159048557281 | accuracy = 0.5751655629139073


Epoch[1] Batch[760] Speed: 1.251973584794429 samples/sec                   batch loss = 2050.2488236427307 | accuracy = 0.5753289473684211


Epoch[1] Batch[765] Speed: 1.2593811314001704 samples/sec                   batch loss = 2064.136218547821 | accuracy = 0.5751633986928104


Epoch[1] Batch[770] Speed: 1.25778286705428 samples/sec                   batch loss = 2076.1021897792816 | accuracy = 0.5753246753246753


Epoch[1] Batch[775] Speed: 1.2544042090375604 samples/sec                   batch loss = 2087.141832590103 | accuracy = 0.5758064516129032


Epoch[1] Batch[780] Speed: 1.2562008817327965 samples/sec                   batch loss = 2098.982282400131 | accuracy = 0.576602564102564


Epoch[1] Batch[785] Speed: 1.2598255099929827 samples/sec                   batch loss = 2112.438569545746 | accuracy = 0.5767515923566879


[Epoch 1] training: accuracy=0.5774111675126904
[Epoch 1] time cost: 645.7815256118774
[Epoch 1] validation: validation accuracy=0.6577777777777778


Epoch[2] Batch[5] Speed: 1.2486001896282006 samples/sec                   batch loss = 11.037821292877197 | accuracy = 0.8


Epoch[2] Batch[10] Speed: 1.256748447025143 samples/sec                   batch loss = 25.44400954246521 | accuracy = 0.7


Epoch[2] Batch[15] Speed: 1.2562235502530497 samples/sec                   batch loss = 38.04939150810242 | accuracy = 0.6666666666666666


Epoch[2] Batch[20] Speed: 1.2522575725638727 samples/sec                   batch loss = 48.981640577316284 | accuracy = 0.6875


Epoch[2] Batch[25] Speed: 1.2558286142745536 samples/sec                   batch loss = 62.96444916725159 | accuracy = 0.65


Epoch[2] Batch[30] Speed: 1.2603347698125125 samples/sec                   batch loss = 74.62451660633087 | accuracy = 0.6666666666666666


Epoch[2] Batch[35] Speed: 1.2531011370381437 samples/sec                   batch loss = 86.70231711864471 | accuracy = 0.6571428571428571


Epoch[2] Batch[40] Speed: 1.2550851164471637 samples/sec                   batch loss = 99.16558229923248 | accuracy = 0.66875


Epoch[2] Batch[45] Speed: 1.2586416372978646 samples/sec                   batch loss = 114.06643855571747 | accuracy = 0.6388888888888888


Epoch[2] Batch[50] Speed: 1.2553019494677324 samples/sec                   batch loss = 129.27371215820312 | accuracy = 0.64


Epoch[2] Batch[55] Speed: 1.255912000306619 samples/sec                   batch loss = 141.3021068572998 | accuracy = 0.6454545454545455


Epoch[2] Batch[60] Speed: 1.2594578988245761 samples/sec                   batch loss = 154.80473852157593 | accuracy = 0.6291666666666667


Epoch[2] Batch[65] Speed: 1.2555582264085765 samples/sec                   batch loss = 167.91644823551178 | accuracy = 0.6230769230769231


Epoch[2] Batch[70] Speed: 1.261858853536604 samples/sec                   batch loss = 179.86169838905334 | accuracy = 0.6392857142857142


Epoch[2] Batch[75] Speed: 1.259475673897517 samples/sec                   batch loss = 191.12312424182892 | accuracy = 0.6533333333333333


Epoch[2] Batch[80] Speed: 1.254832505166011 samples/sec                   batch loss = 203.03994512557983 | accuracy = 0.65


Epoch[2] Batch[85] Speed: 1.2548978307757992 samples/sec                   batch loss = 214.8965871334076 | accuracy = 0.65


Epoch[2] Batch[90] Speed: 1.25741946252212 samples/sec                   batch loss = 226.64938640594482 | accuracy = 0.6555555555555556


Epoch[2] Batch[95] Speed: 1.2512055292474569 samples/sec                   batch loss = 240.82597732543945 | accuracy = 0.6447368421052632


Epoch[2] Batch[100] Speed: 1.2588740586891731 samples/sec                   batch loss = 251.68155360221863 | accuracy = 0.6525


Epoch[2] Batch[105] Speed: 1.2480026041314105 samples/sec                   batch loss = 264.70415914058685 | accuracy = 0.6476190476190476


Epoch[2] Batch[110] Speed: 1.2570144504227057 samples/sec                   batch loss = 278.5752670764923 | accuracy = 0.65


Epoch[2] Batch[115] Speed: 1.2548455509727228 samples/sec                   batch loss = 291.74111223220825 | accuracy = 0.6521739130434783


Epoch[2] Batch[120] Speed: 1.25153594089383 samples/sec                   batch loss = 305.2262134552002 | accuracy = 0.6458333333333334


Epoch[2] Batch[125] Speed: 1.25288515715751 samples/sec                   batch loss = 316.63157045841217 | accuracy = 0.648


Epoch[2] Batch[130] Speed: 1.2591442702430897 samples/sec                   batch loss = 328.3074723482132 | accuracy = 0.65


Epoch[2] Batch[135] Speed: 1.2553067396092474 samples/sec                   batch loss = 342.0327333211899 | accuracy = 0.6444444444444445


Epoch[2] Batch[140] Speed: 1.2564808623734922 samples/sec                   batch loss = 355.0291100740433 | accuracy = 0.6464285714285715


Epoch[2] Batch[145] Speed: 1.250726555646141 samples/sec                   batch loss = 371.4832464456558 | accuracy = 0.6396551724137931


Epoch[2] Batch[150] Speed: 1.2486960014034159 samples/sec                   batch loss = 383.3257625102997 | accuracy = 0.6433333333333333


Epoch[2] Batch[155] Speed: 1.2540398481597574 samples/sec                   batch loss = 395.34854888916016 | accuracy = 0.6435483870967742


Epoch[2] Batch[160] Speed: 1.2526875839479155 samples/sec                   batch loss = 405.6257321834564 | accuracy = 0.6484375


Epoch[2] Batch[165] Speed: 1.2532504387477905 samples/sec                   batch loss = 421.86066722869873 | accuracy = 0.6439393939393939


Epoch[2] Batch[170] Speed: 1.2562742516889134 samples/sec                   batch loss = 432.784317612648 | accuracy = 0.6441176470588236


Epoch[2] Batch[175] Speed: 1.2470830102681398 samples/sec                   batch loss = 447.55539095401764 | accuracy = 0.64


Epoch[2] Batch[180] Speed: 1.2553779385713029 samples/sec                   batch loss = 459.1331458091736 | accuracy = 0.6430555555555556


Epoch[2] Batch[185] Speed: 1.2586844130106245 samples/sec                   batch loss = 470.81811809539795 | accuracy = 0.6459459459459459


Epoch[2] Batch[190] Speed: 1.2556859342158373 samples/sec                   batch loss = 483.19288992881775 | accuracy = 0.6486842105263158


Epoch[2] Batch[195] Speed: 1.2535277943813508 samples/sec                   batch loss = 496.3331676721573 | accuracy = 0.6461538461538462


Epoch[2] Batch[200] Speed: 1.2499912270029199 samples/sec                   batch loss = 509.5856257677078 | accuracy = 0.645


Epoch[2] Batch[205] Speed: 1.2538458462827353 samples/sec                   batch loss = 523.2100483179092 | accuracy = 0.6439024390243903


Epoch[2] Batch[210] Speed: 1.259621580823126 samples/sec                   batch loss = 535.9657715559006 | accuracy = 0.6428571428571429


Epoch[2] Batch[215] Speed: 1.2530929007379015 samples/sec                   batch loss = 547.2447406053543 | accuracy = 0.6453488372093024


Epoch[2] Batch[220] Speed: 1.2514648969995812 samples/sec                   batch loss = 559.7437914609909 | accuracy = 0.6465909090909091


Epoch[2] Batch[225] Speed: 1.2469710410182562 samples/sec                   batch loss = 572.0000430345535 | accuracy = 0.6477777777777778


Epoch[2] Batch[230] Speed: 1.2560338559540793 samples/sec                   batch loss = 583.0966633558273 | accuracy = 0.65


Epoch[2] Batch[235] Speed: 1.2504295617806507 samples/sec                   batch loss = 595.8893347978592 | accuracy = 0.6478723404255319


Epoch[2] Batch[240] Speed: 1.2537171071632978 samples/sec                   batch loss = 607.4655154943466 | accuracy = 0.65


Epoch[2] Batch[245] Speed: 1.2489335032799995 samples/sec                   batch loss = 620.4475516080856 | accuracy = 0.6510204081632653


Epoch[2] Batch[250] Speed: 1.250684319101469 samples/sec                   batch loss = 633.2819687128067 | accuracy = 0.651


Epoch[2] Batch[255] Speed: 1.2494097086158191 samples/sec                   batch loss = 644.5298404693604 | accuracy = 0.6519607843137255


Epoch[2] Batch[260] Speed: 1.255332287648092 samples/sec                   batch loss = 655.6377938985825 | accuracy = 0.6528846153846154


Epoch[2] Batch[265] Speed: 1.2616485730176572 samples/sec                   batch loss = 670.6082857847214 | accuracy = 0.6509433962264151


Epoch[2] Batch[270] Speed: 1.2562246789974323 samples/sec                   batch loss = 681.1659400463104 | accuracy = 0.6537037037037037


Epoch[2] Batch[275] Speed: 1.2609073621176057 samples/sec                   batch loss = 695.9771139621735 | accuracy = 0.6481818181818182


Epoch[2] Batch[280] Speed: 1.2576149490015482 samples/sec                   batch loss = 709.3056607246399 | accuracy = 0.6455357142857143


Epoch[2] Batch[285] Speed: 1.251790495483144 samples/sec                   batch loss = 725.1620156764984 | accuracy = 0.6403508771929824


Epoch[2] Batch[290] Speed: 1.2520003987962085 samples/sec                   batch loss = 736.6155145168304 | accuracy = 0.6422413793103449


Epoch[2] Batch[295] Speed: 1.2537068953664032 samples/sec                   batch loss = 748.772057056427 | accuracy = 0.6423728813559322


Epoch[2] Batch[300] Speed: 1.2547586466132237 samples/sec                   batch loss = 762.3970668315887 | accuracy = 0.6416666666666667


Epoch[2] Batch[305] Speed: 1.2517710687399775 samples/sec                   batch loss = 772.2740033864975 | accuracy = 0.6459016393442623


Epoch[2] Batch[310] Speed: 1.253167218547622 samples/sec                   batch loss = 783.3464138507843 | accuracy = 0.6451612903225806


Epoch[2] Batch[315] Speed: 1.2510787311151212 samples/sec                   batch loss = 796.033398270607 | accuracy = 0.6452380952380953


Epoch[2] Batch[320] Speed: 1.2531095606390186 samples/sec                   batch loss = 807.7241579294205 | accuracy = 0.6453125


Epoch[2] Batch[325] Speed: 1.2464038106168138 samples/sec                   batch loss = 822.7106195688248 | accuracy = 0.6423076923076924


Epoch[2] Batch[330] Speed: 1.2497974515395918 samples/sec                   batch loss = 833.3728982210159 | accuracy = 0.6439393939393939


Epoch[2] Batch[335] Speed: 1.251427277809624 samples/sec                   batch loss = 844.7757166624069 | accuracy = 0.6447761194029851


Epoch[2] Batch[340] Speed: 1.2512813964887304 samples/sec                   batch loss = 859.6683188676834 | accuracy = 0.6426470588235295


Epoch[2] Batch[345] Speed: 1.2508502048439347 samples/sec                   batch loss = 872.6712045669556 | accuracy = 0.6413043478260869


Epoch[2] Batch[350] Speed: 1.2458364484412876 samples/sec                   batch loss = 885.8657801151276 | accuracy = 0.64


Epoch[2] Batch[355] Speed: 1.248533009345533 samples/sec                   batch loss = 899.6960498094559 | accuracy = 0.6408450704225352


Epoch[2] Batch[360] Speed: 1.2564364485147026 samples/sec                   batch loss = 912.557119011879 | accuracy = 0.6423611111111112


Epoch[2] Batch[365] Speed: 1.2511861206562822 samples/sec                   batch loss = 922.6762572526932 | accuracy = 0.6452054794520548


Epoch[2] Batch[370] Speed: 1.2511092386930138 samples/sec                   batch loss = 935.5437570810318 | accuracy = 0.6445945945945946


Epoch[2] Batch[375] Speed: 1.2503055298385595 samples/sec                   batch loss = 947.1449280977249 | accuracy = 0.644


Epoch[2] Batch[380] Speed: 1.2475029813427947 samples/sec                   batch loss = 956.5779094696045 | accuracy = 0.6467105263157895


Epoch[2] Batch[385] Speed: 1.253656775656013 samples/sec                   batch loss = 971.650910615921 | accuracy = 0.6454545454545455


Epoch[2] Batch[390] Speed: 1.2517332443594762 samples/sec                   batch loss = 984.1215230226517 | accuracy = 0.6448717948717949


Epoch[2] Batch[395] Speed: 1.2477265743225534 samples/sec                   batch loss = 995.096915602684 | accuracy = 0.6462025316455696


Epoch[2] Batch[400] Speed: 1.2489756216527315 samples/sec                   batch loss = 1005.4873203039169 | accuracy = 0.6475


Epoch[2] Batch[405] Speed: 1.25300408660636 samples/sec                   batch loss = 1017.609582066536 | accuracy = 0.6481481481481481


Epoch[2] Batch[410] Speed: 1.2540759372558987 samples/sec                   batch loss = 1033.297933101654 | accuracy = 0.6451219512195122


Epoch[2] Batch[415] Speed: 1.2574231379367253 samples/sec                   batch loss = 1044.4016822576523 | accuracy = 0.6463855421686747


Epoch[2] Batch[420] Speed: 1.2525332732296315 samples/sec                   batch loss = 1054.8897905349731 | accuracy = 0.6482142857142857


Epoch[2] Batch[425] Speed: 1.2509615659925366 samples/sec                   batch loss = 1067.0472235679626 | accuracy = 0.65


Epoch[2] Batch[430] Speed: 1.245243907906132 samples/sec                   batch loss = 1078.6328929662704 | accuracy = 0.6505813953488372


Epoch[2] Batch[435] Speed: 1.248864241934575 samples/sec                   batch loss = 1091.1355637311935 | accuracy = 0.6505747126436782


Epoch[2] Batch[440] Speed: 1.2544981933499515 samples/sec                   batch loss = 1103.7263365983963 | accuracy = 0.65


Epoch[2] Batch[445] Speed: 1.2505901595914974 samples/sec                   batch loss = 1115.2391158342361 | accuracy = 0.650561797752809


Epoch[2] Batch[450] Speed: 1.2439608948869914 samples/sec                   batch loss = 1127.397672533989 | accuracy = 0.65


Epoch[2] Batch[455] Speed: 1.2465224388610423 samples/sec                   batch loss = 1139.8483496904373 | accuracy = 0.65


Epoch[2] Batch[460] Speed: 1.252947472998721 samples/sec                   batch loss = 1151.247007727623 | accuracy = 0.65


Epoch[2] Batch[465] Speed: 1.2433981499789966 samples/sec                   batch loss = 1164.6577755212784 | accuracy = 0.6489247311827957


Epoch[2] Batch[470] Speed: 1.2467886705920026 samples/sec                   batch loss = 1173.2926542758942 | accuracy = 0.6505319148936171


Epoch[2] Batch[475] Speed: 1.2492549950754952 samples/sec                   batch loss = 1186.5996152162552 | accuracy = 0.6494736842105263


Epoch[2] Batch[480] Speed: 1.248693120330579 samples/sec                   batch loss = 1200.7888947725296 | accuracy = 0.6494791666666667


Epoch[2] Batch[485] Speed: 1.254147559171874 samples/sec                   batch loss = 1216.4986573457718 | accuracy = 0.6463917525773196


Epoch[2] Batch[490] Speed: 1.2492309960472925 samples/sec                   batch loss = 1227.412891626358 | accuracy = 0.6469387755102041


Epoch[2] Batch[495] Speed: 1.2550664323088256 samples/sec                   batch loss = 1239.8941069841385 | accuracy = 0.6474747474747474


Epoch[2] Batch[500] Speed: 1.2568655688069414 samples/sec                   batch loss = 1252.474119901657 | accuracy = 0.6465


Epoch[2] Batch[505] Speed: 1.247914881355148 samples/sec                   batch loss = 1266.5922043323517 | accuracy = 0.6460396039603961


Epoch[2] Batch[510] Speed: 1.254697088821643 samples/sec                   batch loss = 1279.8425246477127 | accuracy = 0.6460784313725491


Epoch[2] Batch[515] Speed: 1.251969567464523 samples/sec                   batch loss = 1291.9822480678558 | accuracy = 0.6461165048543689


Epoch[2] Batch[520] Speed: 1.2576544495521917 samples/sec                   batch loss = 1304.7747008800507 | accuracy = 0.6451923076923077


Epoch[2] Batch[525] Speed: 1.247204734752548 samples/sec                   batch loss = 1315.195271372795 | accuracy = 0.6461904761904762


Epoch[2] Batch[530] Speed: 1.246260394128768 samples/sec                   batch loss = 1327.0779634714127 | accuracy = 0.6457547169811321


Epoch[2] Batch[535] Speed: 1.246533182268266 samples/sec                   batch loss = 1338.8312191963196 | accuracy = 0.6462616822429906


Epoch[2] Batch[540] Speed: 1.2491644919308305 samples/sec                   batch loss = 1351.0055611133575 | accuracy = 0.6453703703703704


Epoch[2] Batch[545] Speed: 1.2427989959066015 samples/sec                   batch loss = 1362.6500918865204 | accuracy = 0.6454128440366973


Epoch[2] Batch[550] Speed: 1.251332633077404 samples/sec                   batch loss = 1373.2693076133728 | accuracy = 0.6472727272727272


Epoch[2] Batch[555] Speed: 1.2519115526212088 samples/sec                   batch loss = 1384.3038170337677 | accuracy = 0.6481981981981982


Epoch[2] Batch[560] Speed: 1.2459915193162219 samples/sec                   batch loss = 1396.7655112743378 | accuracy = 0.6473214285714286


Epoch[2] Batch[565] Speed: 1.2474915719001922 samples/sec                   batch loss = 1411.1336994171143 | accuracy = 0.6469026548672566


Epoch[2] Batch[570] Speed: 1.2523747001322676 samples/sec                   batch loss = 1422.8683687448502 | accuracy = 0.6473684210526316


Epoch[2] Batch[575] Speed: 1.2682830509461163 samples/sec                   batch loss = 1435.254263997078 | accuracy = 0.6473913043478261


Epoch[2] Batch[580] Speed: 1.2744348485571277 samples/sec                   batch loss = 1446.8506481647491 | accuracy = 0.6478448275862069


Epoch[2] Batch[585] Speed: 1.2798053030567706 samples/sec                   batch loss = 1458.217889904976 | accuracy = 0.6482905982905983


Epoch[2] Batch[590] Speed: 1.2833990413147125 samples/sec                   batch loss = 1468.3594794273376 | accuracy = 0.6491525423728813


Epoch[2] Batch[595] Speed: 1.285128375218644 samples/sec                   batch loss = 1480.2668061256409 | accuracy = 0.65


Epoch[2] Batch[600] Speed: 1.285678202804868 samples/sec                   batch loss = 1491.1974263191223 | accuracy = 0.6508333333333334


Epoch[2] Batch[605] Speed: 1.2808266076869526 samples/sec                   batch loss = 1502.1135120391846 | accuracy = 0.6512396694214876


Epoch[2] Batch[610] Speed: 1.2812187374281505 samples/sec                   batch loss = 1514.7814506292343 | accuracy = 0.6516393442622951


Epoch[2] Batch[615] Speed: 1.2795629422536874 samples/sec                   batch loss = 1527.3643976449966 | accuracy = 0.6524390243902439


Epoch[2] Batch[620] Speed: 1.2759944102474943 samples/sec                   batch loss = 1539.204451918602 | accuracy = 0.652016129032258


Epoch[2] Batch[625] Speed: 1.2818210441332096 samples/sec                   batch loss = 1551.5326414108276 | accuracy = 0.6524


Epoch[2] Batch[630] Speed: 1.271665946593295 samples/sec                   batch loss = 1563.4201766252518 | accuracy = 0.6523809523809524


Epoch[2] Batch[635] Speed: 1.267945751030717 samples/sec                   batch loss = 1572.1673910617828 | accuracy = 0.6539370078740158


Epoch[2] Batch[640] Speed: 1.2752043629867293 samples/sec                   batch loss = 1585.343554019928 | accuracy = 0.653515625


Epoch[2] Batch[645] Speed: 1.280244382587069 samples/sec                   batch loss = 1598.767030954361 | accuracy = 0.6546511627906977


Epoch[2] Batch[650] Speed: 1.275697809579708 samples/sec                   batch loss = 1610.2460684776306 | accuracy = 0.655


Epoch[2] Batch[655] Speed: 1.2804676512639313 samples/sec                   batch loss = 1621.5379066467285 | accuracy = 0.6553435114503817


Epoch[2] Batch[660] Speed: 1.2777711729182106 samples/sec                   batch loss = 1637.1866385936737 | accuracy = 0.6545454545454545


Epoch[2] Batch[665] Speed: 1.2871777206115338 samples/sec                   batch loss = 1650.0036914348602 | accuracy = 0.6548872180451127


Epoch[2] Batch[670] Speed: 1.2841189738477694 samples/sec                   batch loss = 1661.029364824295 | accuracy = 0.6559701492537313


Epoch[2] Batch[675] Speed: 1.2896636217457687 samples/sec                   batch loss = 1671.0057504177094 | accuracy = 0.6570370370370371


Epoch[2] Batch[680] Speed: 1.2836446254528984 samples/sec                   batch loss = 1680.9242178201675 | accuracy = 0.6573529411764706


Epoch[2] Batch[685] Speed: 1.2821071741291115 samples/sec                   batch loss = 1689.4321784973145 | accuracy = 0.6583941605839416


Epoch[2] Batch[690] Speed: 1.2816123800771693 samples/sec                   batch loss = 1700.9503180980682 | accuracy = 0.658695652173913


Epoch[2] Batch[695] Speed: 1.2855262958270777 samples/sec                   batch loss = 1713.1339865922928 | accuracy = 0.6589928057553956


Epoch[2] Batch[700] Speed: 1.2835433756641412 samples/sec                   batch loss = 1725.5836124420166 | accuracy = 0.6585714285714286


Epoch[2] Batch[705] Speed: 1.2838856858269554 samples/sec                   batch loss = 1740.9365907907486 | accuracy = 0.6585106382978724


Epoch[2] Batch[710] Speed: 1.2818700131623595 samples/sec                   batch loss = 1751.490419626236 | accuracy = 0.6591549295774648


Epoch[2] Batch[715] Speed: 1.2823157053180603 samples/sec                   batch loss = 1765.9779965877533 | accuracy = 0.6583916083916084


Epoch[2] Batch[720] Speed: 1.2846979413625679 samples/sec                   batch loss = 1778.3820583820343 | accuracy = 0.6586805555555556


Epoch[2] Batch[725] Speed: 1.2798584141006129 samples/sec                   batch loss = 1794.2249982357025 | accuracy = 0.6572413793103449


Epoch[2] Batch[730] Speed: 1.2863749520404617 samples/sec                   batch loss = 1804.7282885313034 | accuracy = 0.6578767123287671


Epoch[2] Batch[735] Speed: 1.2878618605737264 samples/sec                   batch loss = 1815.1084361076355 | accuracy = 0.658843537414966


Epoch[2] Batch[740] Speed: 1.281120706941665 samples/sec                   batch loss = 1825.5900285243988 | accuracy = 0.6594594594594595


Epoch[2] Batch[745] Speed: 1.2837996245449044 samples/sec                   batch loss = 1836.7751129865646 | accuracy = 0.6597315436241611


Epoch[2] Batch[750] Speed: 1.277147678832819 samples/sec                   batch loss = 1845.915100812912 | accuracy = 0.6603333333333333


Epoch[2] Batch[755] Speed: 1.2776638418857853 samples/sec                   batch loss = 1856.8742135763168 | accuracy = 0.6602649006622516


Epoch[2] Batch[760] Speed: 1.2793358916163209 samples/sec                   batch loss = 1869.0615787506104 | accuracy = 0.6605263157894737


Epoch[2] Batch[765] Speed: 1.2745765925810675 samples/sec                   batch loss = 1882.1507419347763 | accuracy = 0.6607843137254902


Epoch[2] Batch[770] Speed: 1.2731841442084642 samples/sec                   batch loss = 1892.1226536035538 | accuracy = 0.6613636363636364


Epoch[2] Batch[775] Speed: 1.2796134956490117 samples/sec                   batch loss = 1905.411994934082 | accuracy = 0.6619354838709678


Epoch[2] Batch[780] Speed: 1.2756198256307858 samples/sec                   batch loss = 1918.649936914444 | accuracy = 0.6621794871794872


Epoch[2] Batch[785] Speed: 1.276760077138846 samples/sec                   batch loss = 1931.1192984580994 | accuracy = 0.6627388535031847


[Epoch 2] training: accuracy=0.6624365482233503
[Epoch 2] time cost: 639.2938239574432
[Epoch 2] validation: validation accuracy=0.7166666666666667
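
The per-epoch summary lines in the output above are easier to compare once they are gathered in one place. The following is a minimal sketch, not part of the original notebook, that parses those summary lines with Python's standard `re` module. The `summarize` helper and the `SUMMARY` pattern are hypothetical names introduced only for illustration, and the pattern assumes the exact line format shown in this log.

```python
import re

# Summary lines copied from the training output above.
log_text = """\
[Epoch 1] training: accuracy=0.5774111675126904
[Epoch 1] time cost: 645.7815256118774
[Epoch 1] validation: validation accuracy=0.6577777777777778
[Epoch 2] training: accuracy=0.6624365482233503
[Epoch 2] time cost: 639.2938239574432
[Epoch 2] validation: validation accuracy=0.7166666666666667
"""

# Matches "[Epoch N] <label>: [<name>=]<value>" (assumed format, see lead-in).
SUMMARY = re.compile(
    r"^\[Epoch (\d+)\] (training|time cost|validation): (?:[\w ]+=)?([\d.]+)$"
)


def summarize(text):
    """Return {epoch: {label: value}} for every summary line in text."""
    results = {}
    for line in text.splitlines():
        match = SUMMARY.match(line.strip())
        if match:
            epoch, label, value = match.groups()
            results.setdefault(int(epoch), {})[label] = float(value)
    return results


summary = summarize(log_text)
for epoch, stats in sorted(summary.items()):
    print(
        f"epoch {epoch}: train acc {stats['training']:.3f}, "
        f"val acc {stats['validation']:.3f}, {stats['time cost']:.0f} s"
    )
```

Read this way, both training and validation accuracy increase from the first epoch to the second, while the time per epoch stays roughly constant.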


## Next steps

Now that you have completed training and predicting with a neural network on GPUs, you have reached the conclusion of the crash course. Congratulations!
If you would like to learn more, check out [D2L.ai](https://d2l.ai),
[GluonCV](https://cv.gluon.ai/tutorials/index.html), [GluonNLP](https://nlp.gluon.ai),
[GluonTS](https://ts.gluon.ai/), and [AutoGluon](https://auto.gluon.ai).