<!--- Licensed to the Apache Software Foundation (ASF) under one -->
<!--- or more contributor license agreements.  See the NOTICE file -->
<!--- distributed with this work for additional information -->
<!--- regarding copyright ownership.  The ASF licenses this file -->
<!--- to you under the Apache License, Version 2.0 (the -->
<!--- "License"); you may not use this file except in compliance -->
<!--- with the License.  You may obtain a copy of the License at -->

<!---   http://www.apache.org/licenses/LICENSE-2.0 -->

<!--- Unless required by applicable law or agreed to in writing, -->
<!--- software distributed under the License is distributed on an -->
<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
<!--- KIND, either express or implied.  See the License for the -->
<!--- specific language governing permissions and limitations -->
<!--- under the License. -->

# Step 7: Load and Run a NN using GPU

In this step, you will learn how to use graphics processing units (GPUs) with MXNet. If you use GPUs to train and deploy neural networks, training and inference may be faster than with central processing units (CPUs).

## Prerequisites

Before you start, make sure that your machine has at least one NVIDIA GPU and that CUDA is properly installed. GPUs from AMD and Intel are not supported. You also need to install the GPU-enabled version of MXNet. You can find information about how to install the GPU version of MXNet for your system [here](https://mxnet.apache.org/versions/1.4.1/install/ubuntu_setup.html).

You can use the following command to view the number of GPUs that are available to MXNet.

In [1]:
from mxnet import np, npx, gluon, autograd
from mxnet.gluon import nn
import time
npx.set_np()

npx.num_gpus()  # Returns the number of GPUs MXNet can access

1

## Allocate data to a GPU

MXNet's ndarray is very similar to NumPy's. One major difference is that MXNet's ndarray has a `context` attribute specifying which device an array is on. By default, arrays are stored on `npx.cpu()`. To place an array on the first GPU instead, pass `npx.gpu()` or, equivalently, `npx.gpu(0)` as the `ctx` argument, as in the following code.

In [2]:
gpu = npx.gpu() if npx.num_gpus() > 0 else npx.cpu()
x = np.ones((3,4), ctx=gpu)
x

[15:37:02] /work/mxnet/src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for GPU


array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]], ctx=gpu(0))

If you're using a CPU, MXNet allocates data in main memory and tries to use as many CPU cores as possible. If there are multiple GPUs, the printed array shows which GPU it is allocated on, for example `ctx=gpu(0)`.

If you have at least two GPUs, you can create another ndarray and assign it to a different GPU. The example code here copies `x` to the second GPU, `npx.gpu(1)`. Copying to a GPU that does not exist raises an error, so the code falls back to `npx.cpu()` when fewer than two GPUs are available:

In [3]:
gpu_1 = npx.gpu(1) if npx.num_gpus() > 1 else npx.cpu()
x.copyto(gpu_1)

[15:37:02] /work/mxnet/src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for CPU


array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

MXNet requires that users explicitly move data between devices. However, several operations, such as `print` and `asnumpy`, implicitly copy data to main memory.

## Choosing GPU Ids
If you have multiple GPUs on your machine, MXNet identifies each of them by a zero-based index. As you saw before, the first GPU is accessed with `npx.gpu(0)` and the second with `npx.gpu(1)`. This extends to however many GPUs your machine has, so on a machine with eight GPUs the last GPU is `npx.gpu(7)`. This lets you select which GPUs to use for operations and training, which is particularly useful when you want to use multiple GPUs to train a neural network.
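
As a rough illustration of the zero-based numbering, the following plain-Python sketch computes which GPU IDs to use for a requested number of GPUs. The helper name `select_device_ids` and its parameters are hypothetical and not part of MXNet; in practice the available count would come from `npx.num_gpus()` and each returned ID would be passed to `npx.gpu()`.

```python
def select_device_ids(requested, available):
    """Return the zero-based GPU IDs to use, at most `requested` of them.

    Hypothetical helper for illustration only. An empty list means that
    no GPU is available and the caller should fall back to the CPU.
    """
    if requested < 0 or available < 0:
        raise ValueError("counts must be non-negative")
    return list(range(min(requested, available)))


# On an eight-GPU machine, asking for every GPU yields IDs 0 through 7,
# so the last GPU has ID 7.
ids = select_device_ids(requested=8, available=8)
print(ids[-1])  # prints 7

# Asking for two GPUs on a one-GPU machine yields only ID 0.
print(select_device_ids(requested=2, available=1))  # prints [0]
```

Capping the request at the number of available GPUs avoids referring to a GPU ID that does not exist on the machine.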

## Run an operation on a GPU

To perform an operation on a particular GPU, you only need to ensure that the inputs of the operation are already on that GPU. The output is allocated on the same GPU as well. Almost all operators in the `np` and `npx` modules support running on a GPU.

In [4]:
y = np.random.uniform(size=(3,4), ctx=gpu)
x + y

array([[1.7402194, 1.9209938, 1.0390205, 1.9689629],
       [1.9251406, 1.4463501, 1.6673192, 1.1099306],
       [1.4702187, 1.5131936, 1.7761751, 1.2947657]], ctx=gpu(0))

Remember that if the inputs are not all on the same device, you will get an error.

## Run a neural network on a GPU

To run a neural network on a GPU, you only need to copy or move the input data and parameters to the GPU. To demonstrate this, you can reuse the `LeafNetwork` defined previously in [Training Neural Networks](6-train-nn.md). The following code example shows this.

In [5]:
# The convolutional block has a convolution layer, a max pool layer and a batch normalization layer
def conv_block(filters, kernel_size=2, stride=2, batch_norm=True):
    conv_block = nn.HybridSequential()
    conv_block.add(nn.Conv2D(channels=filters, kernel_size=kernel_size, activation='relu'),
              nn.MaxPool2D(pool_size=4, strides=stride))
    if batch_norm:
        conv_block.add(nn.BatchNorm())
    return conv_block

# The dense block consists of a dense layer and a dropout layer
def dense_block(neurons, activation='relu', dropout=0.2):
    dense_block = nn.HybridSequential()
    dense_block.add(nn.Dense(neurons, activation=activation))
    if dropout:
        dense_block.add(nn.Dropout(dropout))
    return dense_block

# Create neural network blueprint using the blocks
class LeafNetwork(nn.HybridBlock):
    def __init__(self):
        super(LeafNetwork, self).__init__()
        self.conv1 = conv_block(32)
        self.conv2 = conv_block(64)
        self.conv3 = conv_block(128)
        self.flatten = nn.Flatten()
        self.dense1 = dense_block(100)
        self.dense2 = dense_block(10)
        self.dense3 = nn.Dense(2)

    def forward(self, batch):
        batch = self.conv1(batch)
        batch = self.conv2(batch)
        batch = self.conv3(batch)
        batch = self.flatten(batch)
        batch = self.dense1(batch)
        batch = self.dense2(batch)
        batch = self.dense3(batch)

        return batch

Load the saved parameters onto GPU 0 directly as shown below. Alternatively, you could use `net.collect_params().reset_ctx(gpu)` to change the device of parameters that are already loaded.

In [6]:
net = LeafNetwork()
net.load_parameters('leaf_models.params', ctx=gpu)

Use the following command to create input data on GPU 0. The forward function will then run on GPU 0.

In [7]:
x = np.random.uniform(size=(1, 3, 128, 128), ctx=gpu)
net(x)

[15:37:03] /work/mxnet/src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:106: Running performance tests to find the best convolution algorithm, this can take a while... (set the environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)


array([[ 5.1561475, -4.7610435]], ctx=gpu(0))

## Training with multiple GPUs

Finally, you will see how to use multiple GPUs to jointly train a neural network through data parallelism. With data parallelism, if there are *n* GPUs, each data batch is split into *n* parts, and each GPU runs the forward and backward passes on its own part of the batch.
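
To make the idea concrete, here is a minimal plain-Python sketch of data parallelism on a toy one-parameter model. It is not MXNet's implementation: the helpers `split_batch` and `grad_sum`, the squared-error loss, and the toy data are all assumptions made for illustration. The sketch shows that summing the per-part gradients and dividing by the full batch size gives the same result as processing the whole batch on one device.

```python
def split_batch(batch, num_parts):
    """Split a list of samples into `num_parts` contiguous, near-equal parts."""
    if num_parts <= 0:
        raise ValueError("num_parts must be positive")
    base, extra = divmod(len(batch), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        # The first `extra` parts receive one additional sample.
        size = base + (1 if i < extra else 0)
        parts.append(batch[start:start + size])
        start += size
    return parts


def grad_sum(shard, w):
    """Sum over a shard of d/dw (w*x - y)**2 = 2*x*(w*x - y)."""
    return sum(2 * x * (w * x - y) for x, y in shard)


# Toy batch of (x, y) pairs and the current value of the single weight.
batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 1.0

# Each "device" computes the gradient sum for its own part of the batch.
shards = split_batch(batch, num_parts=2)
per_device = [grad_sum(shard, w) for shard in shards]

# Combine: add the per-device sums, then normalize by the full batch size.
data_parallel_grad = sum(per_device) / len(batch)
single_device_grad = grad_sum(batch, w) / len(batch)
print(data_parallel_grad, single_device_grad)  # prints -15.0 -15.0
```

Normalizing by the full batch size rather than by the size of each part is what keeps the combined update equal to the single-device update.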

First, copy the data definitions and the transform functions from the tutorial [Training Neural Networks](6-train-nn.md) with the following commands.

In [8]:
# Import transforms to compose a series of transformations applied to the images
from mxnet.gluon.data.vision import transforms

jitter_param = 0.05

# Per-channel mean and std for normalizing image values in the range (0, 1)
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

training_transformer = transforms.Compose([
    transforms.Resize(size=224, keep_ratio=True),
    transforms.CenterCrop(128),
    transforms.RandomFlipLeftRight(),
    transforms.RandomColorJitter(contrast=jitter_param),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])

validation_transformer = transforms.Compose([
    transforms.Resize(size=224, keep_ratio=True),
    transforms.CenterCrop(128),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])

# Use ImageFolderDataset to create a Dataset object from directory structure
train_dataset = gluon.data.vision.ImageFolderDataset('./datasets/train')
val_dataset = gluon.data.vision.ImageFolderDataset('./datasets/validation')
test_dataset = gluon.data.vision.ImageFolderDataset('./datasets/test')

# Create data loaders
batch_size = 4
train_loader = gluon.data.DataLoader(train_dataset.transform_first(training_transformer), batch_size=batch_size, shuffle=True, try_nopython=True)
validation_loader = gluon.data.DataLoader(val_dataset.transform_first(validation_transformer), batch_size=batch_size, try_nopython=True)
test_loader = gluon.data.DataLoader(test_dataset.transform_first(validation_transformer), batch_size=batch_size, try_nopython=True)

### Define a helper function
This is the same test function defined previously in **Step 6**.

In [9]:
# Function to return the accuracy for the validation and test set
def test(val_data, devices):
    acc = gluon.metric.Accuracy()
    for batch in val_data:
        data, label = batch[0], batch[1]
        data_list = gluon.utils.split_and_load(data, devices)
        label_list = gluon.utils.split_and_load(label, devices)
        outputs = [net(X) for X in data_list]
        acc.update(label_list, outputs)

    _, accuracy = acc.get()
    return accuracy
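
The helper above passes one list of labels and one list of outputs per device to a single metric object. The following plain-Python sketch illustrates the bookkeeping this implies, under the assumption that the metric counts correct predictions and total samples across all the parts it receives. The function names and the toy scores are hypothetical and not part of MXNet.

```python
def count_correct(labels, scores):
    """Count samples whose highest-scoring class index equals the label."""
    correct = 0
    for label, row in zip(labels, scores):
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == label)
    return correct


def accuracy_over_shards(label_shards, score_shards):
    """Accumulate correct and total counts over every per-device shard."""
    correct = total = 0
    for labels, scores in zip(label_shards, score_shards):
        correct += count_correct(labels, scores)
        total += len(labels)
    return correct / total


# Two "devices", each holding two samples with two class scores per sample.
label_shards = [[0, 1], [1, 0]]
score_shards = [
    [[2.0, -1.0], [0.5, 0.1]],   # predicted classes: 0, 0
    [[-0.3, 0.9], [1.2, 0.4]],   # predicted classes: 1, 0
]
shard_accuracy = accuracy_over_shards(label_shards, score_shards)
print(shard_accuracy)  # prints 0.75
```

Because the counts are accumulated over all parts before dividing, the result does not depend on how the batch was split across devices.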

The training loop is quite similar to that shown earlier. The major differences are highlighted in the following code.

In [10]:
# Diff 1: Use up to two GPUs for training (fewer if fewer are available).
available_gpus = [npx.gpu(i) for i in range(npx.num_gpus())]
num_gpus = 2
devices = available_gpus[:num_gpus]
print('Using {} GPUs'.format(len(devices)))

# Diff 2: reinitialize the parameters and place them on multiple GPUs
net.initialize(force_reinit=True, ctx=devices)

# Loss and trainer are the same as before
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
optimizer = 'sgd'
optimizer_params = {'learning_rate': 0.001}
trainer = gluon.Trainer(net.collect_params(), optimizer, optimizer_params)

epochs = 2
accuracy = gluon.metric.Accuracy()
log_interval = 5

for epoch in range(epochs):
    train_loss = 0.
    tic = time.time()
    btic = time.time()
    accuracy.reset()
    for idx, batch in enumerate(train_loader):
        data, label = batch[0], batch[1]

        # Diff 3: split batch and load into corresponding devices
        data_list = gluon.utils.split_and_load(data, devices)
        label_list = gluon.utils.split_and_load(label, devices)

        # Diff 4: run forward and backward on each device.
        # MXNet will automatically run them in parallel
        with autograd.record():
            outputs = [net(X)
                      for X in data_list]
            losses = [loss_fn(output, label)
                      for output, label in zip(outputs, label_list)]
        for l in losses:
            l.backward()
        trainer.step(batch_size)

        # Diff 5: sum losses over all devices. Here, the float
        # function copies the data to the CPU.
        train_loss += sum([float(l.sum()) for l in losses])
        accuracy.update(label_list, outputs)
        if log_interval and (idx + 1) % log_interval == 0:
            _, acc = accuracy.get()

            print(f"""Epoch[{epoch + 1}] Batch[{idx + 1}] Speed: {batch_size / (time.time() - btic)} samples/sec \
                  batch loss = {train_loss} | accuracy = {acc}""")
            btic = time.time()

    _, acc = accuracy.get()

    acc_val = test(validation_loader, devices)
    print(f"[Epoch {epoch + 1}] training: accuracy={acc}")
    print(f"[Epoch {epoch + 1}] time cost: {time.time() - tic}")
    print(f"[Epoch {epoch + 1}] validation: validation accuracy={acc_val}")

Using 1 GPUs


Epoch[1] Batch[5] Speed: 0.7959911069528408 samples/sec                   batch loss = 13.021339654922485 | accuracy = 0.7


Epoch[1] Batch[10] Speed: 1.278790986792882 samples/sec                   batch loss = 27.583625555038452 | accuracy = 0.55


Epoch[1] Batch[15] Speed: 1.274732702395497 samples/sec                   batch loss = 41.029130697250366 | accuracy = 0.5333333333333333


Epoch[1] Batch[20] Speed: 1.2781860703205847 samples/sec                   batch loss = 54.15745782852173 | accuracy = 0.525


Epoch[1] Batch[25] Speed: 1.278095708382492 samples/sec                   batch loss = 68.37223982810974 | accuracy = 0.52


Epoch[1] Batch[30] Speed: 1.2862777088190578 samples/sec                   batch loss = 83.33424305915833 | accuracy = 0.5166666666666667


Epoch[1] Batch[35] Speed: 1.2909019323422823 samples/sec                   batch loss = 97.87541007995605 | accuracy = 0.5


Epoch[1] Batch[40] Speed: 1.2873314998220227 samples/sec                   batch loss = 111.61775183677673 | accuracy = 0.5125


Epoch[1] Batch[45] Speed: 1.2859185492186518 samples/sec                   batch loss = 125.34869694709778 | accuracy = 0.5166666666666667


Epoch[1] Batch[50] Speed: 1.2873920536773495 samples/sec                   batch loss = 138.97705745697021 | accuracy = 0.53


Epoch[1] Batch[55] Speed: 1.2855447158371291 samples/sec                   batch loss = 153.42677855491638 | accuracy = 0.5318181818181819


Epoch[1] Batch[60] Speed: 1.2870698896725958 samples/sec                   batch loss = 166.73014307022095 | accuracy = 0.5291666666666667


Epoch[1] Batch[65] Speed: 1.2851051437170566 samples/sec                   batch loss = 181.66497492790222 | accuracy = 0.5115384615384615


Epoch[1] Batch[70] Speed: 1.2771028612916835 samples/sec                   batch loss = 195.18508744239807 | accuracy = 0.5178571428571429


Epoch[1] Batch[75] Speed: 1.2603957457965438 samples/sec                   batch loss = 209.78490352630615 | accuracy = 0.5033333333333333


Epoch[1] Batch[80] Speed: 1.2547633387658095 samples/sec                   batch loss = 224.02922582626343 | accuracy = 0.503125


Epoch[1] Batch[85] Speed: 1.2566482894425899 samples/sec                   batch loss = 237.81672072410583 | accuracy = 0.5058823529411764


Epoch[1] Batch[90] Speed: 1.2589953561026683 samples/sec                   batch loss = 251.60504722595215 | accuracy = 0.5083333333333333


Epoch[1] Batch[95] Speed: 1.2555707235240723 samples/sec                   batch loss = 265.41394567489624 | accuracy = 0.5131578947368421


Epoch[1] Batch[100] Speed: 1.254178872893669 samples/sec                   batch loss = 279.47403860092163 | accuracy = 0.515


Epoch[1] Batch[105] Speed: 1.2539572728734947 samples/sec                   batch loss = 293.1193301677704 | accuracy = 0.5166666666666667


Epoch[1] Batch[110] Speed: 1.2541752164203712 samples/sec                   batch loss = 307.22746872901917 | accuracy = 0.5159090909090909


Epoch[1] Batch[115] Speed: 1.2562806484502202 samples/sec                   batch loss = 320.8226239681244 | accuracy = 0.5152173913043478


Epoch[1] Batch[120] Speed: 1.252344037326941 samples/sec                   batch loss = 335.05130410194397 | accuracy = 0.5166666666666667


Epoch[1] Batch[125] Speed: 1.2535600137959313 samples/sec                   batch loss = 349.3795657157898 | accuracy = 0.516


Epoch[1] Batch[130] Speed: 1.2580389317432121 samples/sec                   batch loss = 363.38984394073486 | accuracy = 0.5134615384615384


Epoch[1] Batch[135] Speed: 1.2575481148189536 samples/sec                   batch loss = 377.44809889793396 | accuracy = 0.5111111111111111


Epoch[1] Batch[140] Speed: 1.2567774429893555 samples/sec                   batch loss = 391.3754172325134 | accuracy = 0.5125


Epoch[1] Batch[145] Speed: 1.2554019864832577 samples/sec                   batch loss = 405.2796230316162 | accuracy = 0.5103448275862069


Epoch[1] Batch[150] Speed: 1.2504698238691545 samples/sec                   batch loss = 418.41769313812256 | accuracy = 0.5166666666666667


Epoch[1] Batch[155] Speed: 1.2556015445413993 samples/sec                   batch loss = 431.85930037498474 | accuracy = 0.5161290322580645


Epoch[1] Batch[160] Speed: 1.256827153469602 samples/sec                   batch loss = 445.563045501709 | accuracy = 0.51875


Epoch[1] Batch[165] Speed: 1.2513348730240004 samples/sec                   batch loss = 459.8511679172516 | accuracy = 0.5196969696969697


Epoch[1] Batch[170] Speed: 1.2514040353214069 samples/sec                   batch loss = 473.7527222633362 | accuracy = 0.5205882352941177


Epoch[1] Batch[175] Speed: 1.2540714377227784 samples/sec                   batch loss = 487.30538988113403 | accuracy = 0.5214285714285715


Epoch[1] Batch[180] Speed: 1.2525701173034693 samples/sec                   batch loss = 500.8864300251007 | accuracy = 0.5236111111111111


Epoch[1] Batch[185] Speed: 1.255738941993871 samples/sec                   batch loss = 515.301349401474 | accuracy = 0.5175675675675676


Epoch[1] Batch[190] Speed: 1.2539501499757988 samples/sec                   batch loss = 529.0397543907166 | accuracy = 0.5197368421052632


Epoch[1] Batch[195] Speed: 1.258050817939962 samples/sec                   batch loss = 542.5152764320374 | accuracy = 0.5205128205128206


Epoch[1] Batch[200] Speed: 1.2523351566119116 samples/sec                   batch loss = 556.0281825065613 | accuracy = 0.52125


Epoch[1] Batch[205] Speed: 1.2564898960849726 samples/sec                   batch loss = 569.623256444931 | accuracy = 0.5231707317073171


Epoch[1] Batch[210] Speed: 1.2559780965390563 samples/sec                   batch loss = 583.5742948055267 | accuracy = 0.5214285714285715


Epoch[1] Batch[215] Speed: 1.2594578042776572 samples/sec                   batch loss = 597.1954283714294 | accuracy = 0.5244186046511627


Epoch[1] Batch[220] Speed: 1.251871197640361 samples/sec                   batch loss = 610.6698834896088 | accuracy = 0.5272727272727272


Epoch[1] Batch[225] Speed: 1.2545957568333534 samples/sec                   batch loss = 623.7910859584808 | accuracy = 0.5311111111111111


Epoch[1] Batch[230] Speed: 1.257271898240299 samples/sec                   batch loss = 636.8112618923187 | accuracy = 0.5358695652173913


Epoch[1] Batch[235] Speed: 1.2567351733451226 samples/sec                   batch loss = 650.3380837440491 | accuracy = 0.5361702127659574


Epoch[1] Batch[240] Speed: 1.2576941411488431 samples/sec                   batch loss = 663.6409647464752 | accuracy = 0.5385416666666667


Epoch[1] Batch[245] Speed: 1.256857000476756 samples/sec                   batch loss = 677.0029807090759 | accuracy = 0.539795918367347


Epoch[1] Batch[250] Speed: 1.25353828424147 samples/sec                   batch loss = 691.1726469993591 | accuracy = 0.538


Epoch[1] Batch[255] Speed: 1.2561458598895412 samples/sec                   batch loss = 704.8298141956329 | accuracy = 0.5392156862745098


Epoch[1] Batch[260] Speed: 1.2548804662647042 samples/sec                   batch loss = 717.4342806339264 | accuracy = 0.5413461538461538


Epoch[1] Batch[265] Speed: 1.2615364394135695 samples/sec                   batch loss = 731.5742974281311 | accuracy = 0.5415094339622641


Epoch[1] Batch[270] Speed: 1.2502328553864281 samples/sec                   batch loss = 745.5894312858582 | accuracy = 0.5425925925925926


Epoch[1] Batch[275] Speed: 1.2431153104458428 samples/sec                   batch loss = 759.5370500087738 | accuracy = 0.5409090909090909


Epoch[1] Batch[280] Speed: 1.2526772018521792 samples/sec                   batch loss = 773.8707146644592 | accuracy = 0.5383928571428571


Epoch[1] Batch[285] Speed: 1.2513061276515702 samples/sec                   batch loss = 787.6806769371033 | accuracy = 0.537719298245614


Epoch[1] Batch[290] Speed: 1.2558821981645358 samples/sec                   batch loss = 801.0292499065399 | accuracy = 0.5405172413793103


Epoch[1] Batch[295] Speed: 1.2407613208121369 samples/sec                   batch loss = 814.0392398834229 | accuracy = 0.5423728813559322


Epoch[1] Batch[300] Speed: 1.2528669127014591 samples/sec                   batch loss = 827.1071441173553 | accuracy = 0.5441666666666667


Epoch[1] Batch[305] Speed: 1.2610540748990482 samples/sec                   batch loss = 840.4290544986725 | accuracy = 0.5467213114754098


Epoch[1] Batch[310] Speed: 1.25516314516777 samples/sec                   batch loss = 854.8211305141449 | accuracy = 0.5451612903225806


Epoch[1] Batch[315] Speed: 1.25878310096628 samples/sec                   batch loss = 867.9435710906982 | accuracy = 0.5476190476190477


Epoch[1] Batch[320] Speed: 1.2563656939374714 samples/sec                   batch loss = 882.0267467498779 | accuracy = 0.546875


Epoch[1] Batch[325] Speed: 1.2574561233884578 samples/sec                   batch loss = 895.525673866272 | accuracy = 0.5476923076923077


Epoch[1] Batch[330] Speed: 1.2539233461123196 samples/sec                   batch loss = 908.6096019744873 | accuracy = 0.55


Epoch[1] Batch[335] Speed: 1.2528706551106636 samples/sec                   batch loss = 921.3549115657806 | accuracy = 0.5522388059701493


Epoch[1] Batch[340] Speed: 1.2602258987998376 samples/sec                   batch loss = 935.3446600437164 | accuracy = 0.55


Epoch[1] Batch[345] Speed: 1.2565667819909043 samples/sec                   batch loss = 948.7366170883179 | accuracy = 0.5507246376811594


Epoch[1] Batch[350] Speed: 1.2596619641493874 samples/sec                   batch loss = 961.9751484394073 | accuracy = 0.5535714285714286


Epoch[1] Batch[355] Speed: 1.2585082297442194 samples/sec                   batch loss = 975.177743434906 | accuracy = 0.5563380281690141


Epoch[1] Batch[360] Speed: 1.2531411033505053 samples/sec                   batch loss = 988.9694304466248 | accuracy = 0.5576388888888889


Epoch[1] Batch[365] Speed: 1.2546505491353104 samples/sec                   batch loss = 1002.4232959747314 | accuracy = 0.5595890410958904


Epoch[1] Batch[370] Speed: 1.254749637778629 samples/sec                   batch loss = 1015.8805522918701 | accuracy = 0.5601351351351351


Epoch[1] Batch[375] Speed: 1.2560387457173052 samples/sec                   batch loss = 1030.0652413368225 | accuracy = 0.5573333333333333


Epoch[1] Batch[380] Speed: 1.2563353057875672 samples/sec                   batch loss = 1043.3064107894897 | accuracy = 0.5585526315789474


Epoch[1] Batch[385] Speed: 1.2550757273633881 samples/sec                   batch loss = 1056.078706741333 | accuracy = 0.5590909090909091


Epoch[1] Batch[390] Speed: 1.2450912407025676 samples/sec                   batch loss = 1069.3247332572937 | accuracy = 0.5602564102564103


Epoch[1] Batch[395] Speed: 1.2511539298494772 samples/sec                   batch loss = 1083.2210130691528 | accuracy = 0.5582278481012658


Epoch[1] Batch[400] Speed: 1.2533525834814934 samples/sec                   batch loss = 1096.3415327072144 | accuracy = 0.559375


Epoch[1] Batch[405] Speed: 1.2540989978699204 samples/sec                   batch loss = 1109.5824337005615 | accuracy = 0.5598765432098766


Epoch[1] Batch[410] Speed: 1.249800710120812 samples/sec                   batch loss = 1123.3172721862793 | accuracy = 0.5591463414634147


Epoch[1] Batch[415] Speed: 1.2567642628397562 samples/sec                   batch loss = 1136.4932148456573 | accuracy = 0.5590361445783133


Epoch[1] Batch[420] Speed: 1.262861590811767 samples/sec                   batch loss = 1149.8470406532288 | accuracy = 0.5595238095238095


Epoch[1] Batch[425] Speed: 1.2583963704766625 samples/sec                   batch loss = 1164.0111465454102 | accuracy = 0.558235294117647


Epoch[1] Batch[430] Speed: 1.2563450900468902 samples/sec                   batch loss = 1177.618931055069 | accuracy = 0.5575581395348838


Epoch[1] Batch[435] Speed: 1.2619774044489602 samples/sec                   batch loss = 1191.197227716446 | accuracy = 0.5568965517241379


Epoch[1] Batch[440] Speed: 1.2550858675799343 samples/sec                   batch loss = 1204.5210950374603 | accuracy = 0.5579545454545455


Epoch[1] Batch[445] Speed: 1.2586991443926976 samples/sec                   batch loss = 1218.170494556427 | accuracy = 0.5589887640449438


Epoch[1] Batch[450] Speed: 1.2585220129371726 samples/sec                   batch loss = 1232.854101896286 | accuracy = 0.5561111111111111


Epoch[1] Batch[455] Speed: 1.256385075382883 samples/sec                   batch loss = 1245.379578590393 | accuracy = 0.5593406593406594


Epoch[1] Batch[460] Speed: 1.265787128083152 samples/sec                   batch loss = 1259.765213727951 | accuracy = 0.558695652173913


Epoch[1] Batch[465] Speed: 1.2551334724533283 samples/sec                   batch loss = 1272.9708499908447 | accuracy = 0.5602150537634408


Epoch[1] Batch[470] Speed: 1.2509686549780858 samples/sec                   batch loss = 1286.8967599868774 | accuracy = 0.5590425531914893


Epoch[1] Batch[475] Speed: 1.2552256878251933 samples/sec                   batch loss = 1299.614207983017 | accuracy = 0.5594736842105263


Epoch[1] Batch[480] Speed: 1.2612560029547457 samples/sec                   batch loss = 1312.6166591644287 | accuracy = 0.5614583333333333


Epoch[1] Batch[485] Speed: 1.2572597441059354 samples/sec                   batch loss = 1326.0108938217163 | accuracy = 0.5618556701030928


Epoch[1] Batch[490] Speed: 1.253522549517127 samples/sec                   batch loss = 1340.3635714054108 | accuracy = 0.561734693877551


Epoch[1] Batch[495] Speed: 1.252397604731773 samples/sec                   batch loss = 1353.686688899994 | accuracy = 0.5631313131313131


Epoch[1] Batch[500] Speed: 1.2539189413912364 samples/sec                   batch loss = 1365.9597599506378 | accuracy = 0.565


Epoch[1] Batch[505] Speed: 1.2491633758382437 samples/sec                   batch loss = 1379.5864305496216 | accuracy = 0.5643564356435643


Epoch[1] Batch[510] Speed: 1.2497913068326656 samples/sec                   batch loss = 1393.4677393436432 | accuracy = 0.5627450980392157


Epoch[1] Batch[515] Speed: 1.2537296613434097 samples/sec                   batch loss = 1407.009388923645 | accuracy = 0.5626213592233009


Epoch[1] Batch[520] Speed: 1.2442264025344307 samples/sec                   batch loss = 1421.2689349651337 | accuracy = 0.5620192307692308


Epoch[1] Batch[525] Speed: 1.243845889055891 samples/sec                   batch loss = 1435.120732307434 | accuracy = 0.5604761904761905


Epoch[1] Batch[530] Speed: 1.2422292074039964 samples/sec                   batch loss = 1448.6219482421875 | accuracy = 0.5613207547169812


Epoch[1] Batch[535] Speed: 1.242706480167543 samples/sec                   batch loss = 1462.0137820243835 | accuracy = 0.561214953271028


Epoch[1] Batch[540] Speed: 1.2418408141491495 samples/sec                   batch loss = 1475.2066309452057 | accuracy = 0.5611111111111111


Epoch[1] Batch[545] Speed: 1.2372840957094073 samples/sec                   batch loss = 1490.7597081661224 | accuracy = 0.5610091743119267


Epoch[1] Batch[550] Speed: 1.2380991036239521 samples/sec                   batch loss = 1504.50554895401 | accuracy = 0.5609090909090909


Epoch[1] Batch[555] Speed: 1.2426829161704616 samples/sec                   batch loss = 1517.6118495464325 | accuracy = 0.5608108108108109


Epoch[1] Batch[560] Speed: 1.2415089782357998 samples/sec                   batch loss = 1531.078340768814 | accuracy = 0.5620535714285714


Epoch[1] Batch[565] Speed: 1.2439587734963156 samples/sec                   batch loss = 1543.5547568798065 | accuracy = 0.5637168141592921


Epoch[1] Batch[570] Speed: 1.243100204723231 samples/sec                   batch loss = 1556.8779101371765 | accuracy = 0.5649122807017544


Epoch[1] Batch[575] Speed: 1.2410721002786957 samples/sec                   batch loss = 1570.675675868988 | accuracy = 0.5639130434782609


Epoch[1] Batch[580] Speed: 1.2466674904855237 samples/sec                   batch loss = 1583.620052576065 | accuracy = 0.5642241379310344


Epoch[1] Batch[585] Speed: 1.2424995893442499 samples/sec                   batch loss = 1597.314251422882 | accuracy = 0.5645299145299145


Epoch[1] Batch[590] Speed: 1.245232539748102 samples/sec                   batch loss = 1611.7332739830017 | accuracy = 0.563135593220339


Epoch[1] Batch[595] Speed: 1.2465041940161734 samples/sec                   batch loss = 1625.6663608551025 | accuracy = 0.5638655462184874


Epoch[1] Batch[600] Speed: 1.2427908944756927 samples/sec                   batch loss = 1639.812644481659 | accuracy = 0.56375


Epoch[1] Batch[605] Speed: 1.244484546211084 samples/sec                   batch loss = 1652.0747299194336 | accuracy = 0.565702479338843


Epoch[1] Batch[610] Speed: 1.2458986201945113 samples/sec                   batch loss = 1664.6996273994446 | accuracy = 0.5663934426229508


Epoch[1] Batch[615] Speed: 1.2413484998314874 samples/sec                   batch loss = 1676.9317367076874 | accuracy = 0.5682926829268292


Epoch[1] Batch[620] Speed: 1.2460919288641221 samples/sec                   batch loss = 1688.995990037918 | accuracy = 0.5705645161290323


Epoch[1] Batch[625] Speed: 1.2416417464524265 samples/sec                   batch loss = 1701.6539325714111 | accuracy = 0.5712


Epoch[1] Batch[630] Speed: 1.242491123749397 samples/sec                   batch loss = 1714.680382013321 | accuracy = 0.5722222222222222


Epoch[1] Batch[635] Speed: 1.2426922127969504 samples/sec                   batch loss = 1727.455576658249 | accuracy = 0.5728346456692913


Epoch[1] Batch[640] Speed: 1.2466662862143087 samples/sec                   batch loss = 1739.1944484710693 | accuracy = 0.57421875


Epoch[1] Batch[645] Speed: 1.2444790075034955 samples/sec                   batch loss = 1752.3853163719177 | accuracy = 0.5744186046511628


Epoch[1] Batch[650] Speed: 1.2532079379620384 samples/sec                   batch loss = 1765.311458349228 | accuracy = 0.5746153846153846


Epoch[1] Batch[655] Speed: 1.242562716873543 samples/sec                   batch loss = 1778.4273809194565 | accuracy = 0.5763358778625954


Epoch[1] Batch[660] Speed: 1.2464677985352985 samples/sec                   batch loss = 1791.2619704008102 | accuracy = 0.5772727272727273


Epoch[1] Batch[665] Speed: 1.2493676539996899 samples/sec                   batch loss = 1804.6401470899582 | accuracy = 0.5778195488721805


Epoch[1] Batch[670] Speed: 1.2559246925000396 samples/sec                   batch loss = 1817.616945385933 | accuracy = 0.5779850746268657


Epoch[1] Batch[675] Speed: 1.2435277279320587 samples/sec                   batch loss = 1829.690945982933 | accuracy = 0.5792592592592593


Epoch[1] Batch[680] Speed: 1.2445558153137666 samples/sec                   batch loss = 1841.3817640542984 | accuracy = 0.580514705882353


Epoch[1] Batch[685] Speed: 1.2457713227831615 samples/sec                   batch loss = 1853.8815211057663 | accuracy = 0.5802919708029197


Epoch[1] Batch[690] Speed: 1.2503715963828304 samples/sec                   batch loss = 1868.1650332212448 | accuracy = 0.5800724637681159


Epoch[1] Batch[695] Speed: 1.2527807499129517 samples/sec                   batch loss = 1880.8901880979538 | accuracy = 0.5805755395683453


Epoch[1] Batch[700] Speed: 1.2477114491167183 samples/sec                   batch loss = 1893.665188908577 | accuracy = 0.5810714285714286


Epoch[1] Batch[705] Speed: 1.2485013265481382 samples/sec                   batch loss = 1906.4983110427856 | accuracy = 0.5812056737588652


Epoch[1] Batch[710] Speed: 1.2587111375032496 samples/sec                   batch loss = 1919.331735253334 | accuracy = 0.5820422535211267


Epoch[1] Batch[715] Speed: 1.262642043519611 samples/sec                   batch loss = 1932.3620918989182 | accuracy = 0.5818181818181818


Epoch[1] Batch[720] Speed: 1.2533714975321741 samples/sec                   batch loss = 1945.9387415647507 | accuracy = 0.5819444444444445


Epoch[1] Batch[725] Speed: 1.2475811833835957 samples/sec                   batch loss = 1959.8908542394638 | accuracy = 0.5817241379310345


Epoch[1] Batch[730] Speed: 1.2491736067616077 samples/sec                   batch loss = 1971.619180560112 | accuracy = 0.5832191780821918


Epoch[1] Batch[735] Speed: 1.2497419651647013 samples/sec                   batch loss = 1984.6294292211533 | accuracy = 0.5840136054421768


Epoch[1] Batch[740] Speed: 1.2459727348107021 samples/sec                   batch loss = 1999.5200599431992 | accuracy = 0.5844594594594594


Epoch[1] Batch[745] Speed: 1.2455661844679295 samples/sec                   batch loss = 2012.5249601602554 | accuracy = 0.5845637583892618


Epoch[1] Batch[750] Speed: 1.2500640596494248 samples/sec                   batch loss = 2025.1707171201706 | accuracy = 0.585


Epoch[1] Batch[755] Speed: 1.252183362603063 samples/sec                   batch loss = 2037.9128559827805 | accuracy = 0.5854304635761589


Epoch[1] Batch[760] Speed: 1.251258812658158 samples/sec                   batch loss = 2051.9218126535416 | accuracy = 0.5851973684210526


Epoch[1] Batch[765] Speed: 1.2546786977265383 samples/sec                   batch loss = 2063.8030141592026 | accuracy = 0.5856209150326798


Epoch[1] Batch[770] Speed: 1.2575080554207854 samples/sec                   batch loss = 2076.5743030309677 | accuracy = 0.5857142857142857


Epoch[1] Batch[775] Speed: 1.2572271458051478 samples/sec                   batch loss = 2089.651391386986 | accuracy = 0.5858064516129032


Epoch[1] Batch[780] Speed: 1.2512178465503279 samples/sec                   batch loss = 2104.2069326639175 | accuracy = 0.5855769230769231


Epoch[1] Batch[785] Speed: 1.2527096581648223 samples/sec                   batch loss = 2117.4634860754013 | accuracy = 0.5856687898089172


[Epoch 1] training: accuracy=0.5853426395939086
[Epoch 1] time cost: 646.744110584259
[Epoch 1] validation: validation accuracy=0.6755555555555556


Epoch[2] Batch[5] Speed: 1.2546208068296199 samples/sec                   batch loss = 14.231284856796265 | accuracy = 0.5


Epoch[2] Batch[10] Speed: 1.256288738565164 samples/sec                   batch loss = 25.512755632400513 | accuracy = 0.6


Epoch[2] Batch[15] Speed: 1.2550253103860478 samples/sec                   batch loss = 39.32652688026428 | accuracy = 0.6


Epoch[2] Batch[20] Speed: 1.2565656526316649 samples/sec                   batch loss = 52.27813267707825 | accuracy = 0.6


Epoch[2] Batch[25] Speed: 1.2513720199742568 samples/sec                   batch loss = 66.05025744438171 | accuracy = 0.59


Epoch[2] Batch[30] Speed: 1.2530899993397406 samples/sec                   batch loss = 79.38292360305786 | accuracy = 0.6


Epoch[2] Batch[35] Speed: 1.2493352775957907 samples/sec                   batch loss = 91.36757564544678 | accuracy = 0.6142857142857143


Epoch[2] Batch[40] Speed: 1.2512478010033719 samples/sec                   batch loss = 104.15080690383911 | accuracy = 0.63125


Epoch[2] Batch[45] Speed: 1.2536777598695121 samples/sec                   batch loss = 116.54228258132935 | accuracy = 0.6277777777777778


Epoch[2] Batch[50] Speed: 1.25562973582121 samples/sec                   batch loss = 130.36914920806885 | accuracy = 0.61


Epoch[2] Batch[55] Speed: 1.2557474010959149 samples/sec                   batch loss = 142.68245387077332 | accuracy = 0.6181818181818182


Epoch[2] Batch[60] Speed: 1.2574697892968865 samples/sec                   batch loss = 156.68294596672058 | accuracy = 0.6166666666666667


Epoch[2] Batch[65] Speed: 1.2560564244096903 samples/sec                   batch loss = 170.4556906223297 | accuracy = 0.6115384615384616


Epoch[2] Batch[70] Speed: 1.25434962581207 samples/sec                   batch loss = 182.43594622612 | accuracy = 0.625


Epoch[2] Batch[75] Speed: 1.2531981088955901 samples/sec                   batch loss = 194.495525598526 | accuracy = 0.6366666666666667


Epoch[2] Batch[80] Speed: 1.2521851383054527 samples/sec                   batch loss = 206.69270133972168 | accuracy = 0.646875


Epoch[2] Batch[85] Speed: 1.2529406422812024 samples/sec                   batch loss = 217.8775978088379 | accuracy = 0.65


Epoch[2] Batch[90] Speed: 1.259066785139646 samples/sec                   batch loss = 230.97218930721283 | accuracy = 0.6444444444444445


Epoch[2] Batch[95] Speed: 1.2509040175966832 samples/sec                   batch loss = 242.69096493721008 | accuracy = 0.6447368421052632


Epoch[2] Batch[100] Speed: 1.252237570524691 samples/sec                   batch loss = 254.58694899082184 | accuracy = 0.645


Epoch[2] Batch[105] Speed: 1.2529165014379566 samples/sec                   batch loss = 266.0304020643234 | accuracy = 0.6547619047619048


Epoch[2] Batch[110] Speed: 1.2544041152478755 samples/sec                   batch loss = 277.0948318243027 | accuracy = 0.6613636363636364


Epoch[2] Batch[115] Speed: 1.2580762891179114 samples/sec                   batch loss = 289.60208797454834 | accuracy = 0.6608695652173913


Epoch[2] Batch[120] Speed: 1.2523528246847306 samples/sec                   batch loss = 301.50218892097473 | accuracy = 0.6604166666666667


Epoch[2] Batch[125] Speed: 1.2593643988565957 samples/sec                   batch loss = 314.53022480010986 | accuracy = 0.662


Epoch[2] Batch[130] Speed: 1.253393876774971 samples/sec                   batch loss = 327.4932118654251 | accuracy = 0.6615384615384615


Epoch[2] Batch[135] Speed: 1.250881633913554 samples/sec                   batch loss = 340.3043409585953 | accuracy = 0.6611111111111111


Epoch[2] Batch[140] Speed: 1.2554036773867348 samples/sec                   batch loss = 353.5726727247238 | accuracy = 0.6625


Epoch[2] Batch[145] Speed: 1.2574237033870332 samples/sec                   batch loss = 367.2144434452057 | accuracy = 0.6620689655172414


Epoch[2] Batch[150] Speed: 1.2603765244691474 samples/sec                   batch loss = 382.30321168899536 | accuracy = 0.6566666666666666


Epoch[2] Batch[155] Speed: 1.2527430515731925 samples/sec                   batch loss = 394.54210126399994 | accuracy = 0.6564516129032258


Epoch[2] Batch[160] Speed: 1.2533876966531536 samples/sec                   batch loss = 406.3049564361572 | accuracy = 0.6609375


Epoch[2] Batch[165] Speed: 1.2525144779761264 samples/sec                   batch loss = 418.35813331604004 | accuracy = 0.6621212121212121


Epoch[2] Batch[170] Speed: 1.25221308291683 samples/sec                   batch loss = 430.61431789398193 | accuracy = 0.663235294117647


Epoch[2] Batch[175] Speed: 1.2522410287754968 samples/sec                   batch loss = 443.3186991214752 | accuracy = 0.6642857142857143


Epoch[2] Batch[180] Speed: 1.2526874904147531 samples/sec                   batch loss = 455.3955750465393 | accuracy = 0.6652777777777777


Epoch[2] Batch[185] Speed: 1.247803412319455 samples/sec                   batch loss = 467.8201644420624 | accuracy = 0.6648648648648648


Epoch[2] Batch[190] Speed: 1.2494027303309478 samples/sec                   batch loss = 479.37680315971375 | accuracy = 0.6644736842105263


Epoch[2] Batch[195] Speed: 1.2540128529991017 samples/sec                   batch loss = 495.99836134910583 | accuracy = 0.6564102564102564


Epoch[2] Batch[200] Speed: 1.2529806919333788 samples/sec                   batch loss = 507.7471343278885 | accuracy = 0.6575


Epoch[2] Batch[205] Speed: 1.2540194142159586 samples/sec                   batch loss = 518.4090617895126 | accuracy = 0.6621951219512195


Epoch[2] Batch[210] Speed: 1.2546337544108292 samples/sec                   batch loss = 530.4119831323624 | accuracy = 0.6642857142857143


Epoch[2] Batch[215] Speed: 1.2625426545714376 samples/sec                   batch loss = 543.4713450670242 | accuracy = 0.663953488372093


Epoch[2] Batch[220] Speed: 1.2581638423650565 samples/sec                   batch loss = 557.2902921438217 | accuracy = 0.6625


Epoch[2] Batch[225] Speed: 1.2626355818122308 samples/sec                   batch loss = 568.1103593111038 | accuracy = 0.6644444444444444


Epoch[2] Batch[230] Speed: 1.254198093167994 samples/sec                   batch loss = 580.4745978116989 | accuracy = 0.6641304347826087


Epoch[2] Batch[235] Speed: 1.2565762875116184 samples/sec                   batch loss = 593.8806582689285 | accuracy = 0.6638297872340425


Epoch[2] Batch[240] Speed: 1.257167889061926 samples/sec                   batch loss = 604.4163188934326 | accuracy = 0.665625


Epoch[2] Batch[245] Speed: 1.2580364790639016 samples/sec                   batch loss = 616.6055907011032 | accuracy = 0.6663265306122449


Epoch[2] Batch[250] Speed: 1.257702532329743 samples/sec                   batch loss = 627.2289770841599 | accuracy = 0.671


Epoch[2] Batch[255] Speed: 1.2517755517810358 samples/sec                   batch loss = 638.1709456443787 | accuracy = 0.6715686274509803


Epoch[2] Batch[260] Speed: 1.256769534866422 samples/sec                   batch loss = 650.394704580307 | accuracy = 0.6721153846153847


Epoch[2] Batch[265] Speed: 1.2546485787812245 samples/sec                   batch loss = 663.5776968002319 | accuracy = 0.6726415094339623


Epoch[2] Batch[270] Speed: 1.2633519986036064 samples/sec                   batch loss = 675.1398748159409 | accuracy = 0.6722222222222223


Epoch[2] Batch[275] Speed: 1.2604038889987397 samples/sec                   batch loss = 687.0497518777847 | accuracy = 0.6727272727272727


Epoch[2] Batch[280] Speed: 1.2563389748669542 samples/sec                   batch loss = 701.0598328113556 | accuracy = 0.66875


Epoch[2] Batch[285] Speed: 1.257244857966957 samples/sec                   batch loss = 711.527982711792 | accuracy = 0.6701754385964912


Epoch[2] Batch[290] Speed: 1.261962311462159 samples/sec                   batch loss = 723.2639706134796 | accuracy = 0.6698275862068965


Epoch[2] Batch[295] Speed: 1.2554599494948966 samples/sec                   batch loss = 737.7841379642487 | accuracy = 0.6652542372881356


Epoch[2] Batch[300] Speed: 1.2583077467468697 samples/sec                   batch loss = 752.5312298536301 | accuracy = 0.6616666666666666


Epoch[2] Batch[305] Speed: 1.255346471015615 samples/sec                   batch loss = 763.9386774301529 | accuracy = 0.6614754098360656


Epoch[2] Batch[310] Speed: 1.2553617818978966 samples/sec                   batch loss = 776.6440087556839 | accuracy = 0.6596774193548387


Epoch[2] Batch[315] Speed: 1.2574274730687491 samples/sec                   batch loss = 788.3297387361526 | accuracy = 0.6611111111111111


Epoch[2] Batch[320] Speed: 1.2601917266477507 samples/sec                   batch loss = 798.5955352783203 | accuracy = 0.66328125


Epoch[2] Batch[325] Speed: 1.2575909105014873 samples/sec                   batch loss = 810.3380620479584 | accuracy = 0.6638461538461539


Epoch[2] Batch[330] Speed: 1.2506101090365611 samples/sec                   batch loss = 822.6045095920563 | accuracy = 0.6621212121212121


Epoch[2] Batch[335] Speed: 1.251124166490364 samples/sec                   batch loss = 834.6329321861267 | accuracy = 0.6611940298507463


Epoch[2] Batch[340] Speed: 1.2544369424941866 samples/sec                   batch loss = 844.9470448493958 | accuracy = 0.663235294117647


Epoch[2] Batch[345] Speed: 1.2547150112748644 samples/sec                   batch loss = 857.1982063055038 | accuracy = 0.663768115942029


Epoch[2] Batch[350] Speed: 1.2556593380274943 samples/sec                   batch loss = 867.226858496666 | accuracy = 0.665


Epoch[2] Batch[355] Speed: 1.253908351443781 samples/sec                   batch loss = 877.6195948123932 | accuracy = 0.6676056338028169


Epoch[2] Batch[360] Speed: 1.2558361345446754 samples/sec                   batch loss = 889.459402680397 | accuracy = 0.6680555555555555


Epoch[2] Batch[365] Speed: 1.2618486984894337 samples/sec                   batch loss = 901.5853090286255 | accuracy = 0.6671232876712329


Epoch[2] Batch[370] Speed: 1.2566872586346232 samples/sec                   batch loss = 911.8265775442123 | accuracy = 0.6682432432432432


Epoch[2] Batch[375] Speed: 1.251319566882084 samples/sec                   batch loss = 925.2090512514114 | accuracy = 0.6673333333333333


Epoch[2] Batch[380] Speed: 1.252245608650557 samples/sec                   batch loss = 941.5858215093613 | accuracy = 0.6644736842105263


Epoch[2] Batch[385] Speed: 1.252771956548929 samples/sec                   batch loss = 953.5523867607117 | accuracy = 0.6636363636363637


Epoch[2] Batch[390] Speed: 1.2531149892418136 samples/sec                   batch loss = 966.6985274553299 | accuracy = 0.6615384615384615


Epoch[2] Batch[395] Speed: 1.2558013540503583 samples/sec                   batch loss = 979.7358821630478 | accuracy = 0.660759493670886


Epoch[2] Batch[400] Speed: 1.249945408254987 samples/sec                   batch loss = 991.0781946182251 | accuracy = 0.661875


Epoch[2] Batch[405] Speed: 1.258599997509389 samples/sec                   batch loss = 1002.9868507385254 | accuracy = 0.662962962962963


Epoch[2] Batch[410] Speed: 1.2547289930205898 samples/sec                   batch loss = 1013.4260242581367 | accuracy = 0.6634146341463415


Epoch[2] Batch[415] Speed: 1.2530630450571083 samples/sec                   batch loss = 1026.0901312232018 | accuracy = 0.6632530120481928


Epoch[2] Batch[420] Speed: 1.2558421508256212 samples/sec                   batch loss = 1040.1256253123283 | accuracy = 0.6625


Epoch[2] Batch[425] Speed: 1.2506279149199324 samples/sec                   batch loss = 1051.1144135594368 | accuracy = 0.6635294117647059


Epoch[2] Batch[430] Speed: 1.2556692057397647 samples/sec                   batch loss = 1063.9295080304146 | accuracy = 0.6633720930232558


Epoch[2] Batch[435] Speed: 1.2503559410556842 samples/sec                   batch loss = 1074.9601905941963 | accuracy = 0.6637931034482759


Epoch[2] Batch[440] Speed: 1.2541687473275345 samples/sec                   batch loss = 1088.460208952427 | accuracy = 0.6630681818181818


Epoch[2] Batch[445] Speed: 1.2572221525731133 samples/sec                   batch loss = 1096.3198773264885 | accuracy = 0.6651685393258427


Epoch[2] Batch[450] Speed: 1.2655838421339323 samples/sec                   batch loss = 1108.4934752583504 | accuracy = 0.6644444444444444


Epoch[2] Batch[455] Speed: 1.2543567532488409 samples/sec                   batch loss = 1118.6300548315048 | accuracy = 0.6642857142857143


Epoch[2] Batch[460] Speed: 1.2580494029047617 samples/sec                   batch loss = 1132.897154211998 | accuracy = 0.6641304347826087


Epoch[2] Batch[465] Speed: 1.2601664536997506 samples/sec                   batch loss = 1143.2207338809967 | accuracy = 0.6645161290322581


Epoch[2] Batch[470] Speed: 1.2594988389742796 samples/sec                   batch loss = 1155.9577776193619 | accuracy = 0.6648936170212766


Epoch[2] Batch[475] Speed: 1.2543482190906994 samples/sec                   batch loss = 1168.2549277544022 | accuracy = 0.6647368421052632


Epoch[2] Batch[480] Speed: 1.2543581599893534 samples/sec                   batch loss = 1178.340523481369 | accuracy = 0.6661458333333333


Epoch[2] Batch[485] Speed: 1.2528006757584211 samples/sec                   batch loss = 1190.3774206638336 | accuracy = 0.6664948453608247


Epoch[2] Batch[490] Speed: 1.2546657492177042 samples/sec                   batch loss = 1200.4299983978271 | accuracy = 0.6663265306122449


Epoch[2] Batch[495] Speed: 1.2609982479888702 samples/sec                   batch loss = 1211.1430851221085 | accuracy = 0.6661616161616162


Epoch[2] Batch[500] Speed: 1.2617442153942695 samples/sec                   batch loss = 1223.2734462022781 | accuracy = 0.6665


Epoch[2] Batch[505] Speed: 1.2582183806112222 samples/sec                   batch loss = 1235.2201310396194 | accuracy = 0.6663366336633664


Epoch[2] Batch[510] Speed: 1.2521629891149724 samples/sec                   batch loss = 1249.6079946756363 | accuracy = 0.6647058823529411


Epoch[2] Batch[515] Speed: 1.2560565184466186 samples/sec                   batch loss = 1261.3566517829895 | accuracy = 0.6655339805825242


Epoch[2] Batch[520] Speed: 1.2605846762453439 samples/sec                   batch loss = 1272.718846321106 | accuracy = 0.6663461538461538


Epoch[2] Batch[525] Speed: 1.2571868242342514 samples/sec                   batch loss = 1283.8922423124313 | accuracy = 0.6666666666666666


Epoch[2] Batch[530] Speed: 1.250293603195142 samples/sec                   batch loss = 1295.0281069278717 | accuracy = 0.6669811320754717


Epoch[2] Batch[535] Speed: 1.252796091822342 samples/sec                   batch loss = 1306.5798106193542 | accuracy = 0.6658878504672897


Epoch[2] Batch[540] Speed: 1.2581205359492984 samples/sec                   batch loss = 1316.848918914795 | accuracy = 0.6662037037037037


Epoch[2] Batch[545] Speed: 1.2546294385207335 samples/sec                   batch loss = 1329.489073753357 | accuracy = 0.6660550458715596


Epoch[2] Batch[550] Speed: 1.2579505469918746 samples/sec                   batch loss = 1338.9736952781677 | accuracy = 0.6677272727272727


Epoch[2] Batch[555] Speed: 1.2501218102353384 samples/sec                   batch loss = 1350.9500192403793 | accuracy = 0.6675675675675675


Epoch[2] Batch[560] Speed: 1.2503893022847465 samples/sec                   batch loss = 1363.2796038389206 | accuracy = 0.6678571428571428


Epoch[2] Batch[565] Speed: 1.2506516879629548 samples/sec                   batch loss = 1374.7619733810425 | accuracy = 0.6672566371681415


Epoch[2] Batch[570] Speed: 1.2526043449587785 samples/sec                   batch loss = 1384.8330490589142 | accuracy = 0.6684210526315789


Epoch[2] Batch[575] Speed: 1.2547758200625128 samples/sec                   batch loss = 1397.2536916732788 | accuracy = 0.6678260869565218


Epoch[2] Batch[580] Speed: 1.2551760100400817 samples/sec                   batch loss = 1408.216940164566 | accuracy = 0.6681034482758621


Epoch[2] Batch[585] Speed: 1.247779654633437 samples/sec                   batch loss = 1422.1586791276932 | accuracy = 0.6675213675213675


Epoch[2] Batch[590] Speed: 1.2511947984615654 samples/sec                   batch loss = 1435.0172461271286 | accuracy = 0.6682203389830509


Epoch[2] Batch[595] Speed: 1.2562432095077 samples/sec                   batch loss = 1443.9525426626205 | accuracy = 0.6701680672268907


Epoch[2] Batch[600] Speed: 1.2509175414601093 samples/sec                   batch loss = 1453.3738851547241 | accuracy = 0.6716666666666666


Epoch[2] Batch[605] Speed: 1.2549483313822805 samples/sec                   batch loss = 1466.1762903928757 | accuracy = 0.6727272727272727


Epoch[2] Batch[610] Speed: 1.2507098658096616 samples/sec                   batch loss = 1479.1573884487152 | accuracy = 0.6725409836065573


Epoch[2] Batch[615] Speed: 1.2566614671597771 samples/sec                   batch loss = 1488.2779703140259 | accuracy = 0.6739837398373983


Epoch[2] Batch[620] Speed: 1.2599064000048061 samples/sec                   batch loss = 1501.9299228191376 | accuracy = 0.6733870967741935


Epoch[2] Batch[625] Speed: 1.2533699057336323 samples/sec                   batch loss = 1513.2581222057343 | accuracy = 0.6732


Epoch[2] Batch[630] Speed: 1.2528920808337611 samples/sec                   batch loss = 1524.2022253274918 | accuracy = 0.6726190476190477


Epoch[2] Batch[635] Speed: 1.2487032506132352 samples/sec                   batch loss = 1536.10762155056 | accuracy = 0.6724409448818898


Epoch[2] Batch[640] Speed: 1.253603287825014 samples/sec                   batch loss = 1545.9259123802185 | accuracy = 0.6734375


Epoch[2] Batch[645] Speed: 1.2524395830908723 samples/sec                   batch loss = 1555.8528559207916 | accuracy = 0.6744186046511628


Epoch[2] Batch[650] Speed: 1.2524054579301842 samples/sec                   batch loss = 1567.2529579401016 | accuracy = 0.6746153846153846


Epoch[2] Batch[655] Speed: 1.2550421155950946 samples/sec                   batch loss = 1577.1690709590912 | accuracy = 0.6751908396946565


Epoch[2] Batch[660] Speed: 1.2596099486357242 samples/sec                   batch loss = 1591.3417797088623 | accuracy = 0.6746212121212121


Epoch[2] Batch[665] Speed: 1.2565132337742095 samples/sec                   batch loss = 1603.7367098331451 | accuracy = 0.674812030075188


Epoch[2] Batch[670] Speed: 1.257569229505341 samples/sec                   batch loss = 1613.0832780599594 | accuracy = 0.6757462686567164


Epoch[2] Batch[675] Speed: 1.250832952168576 samples/sec                   batch loss = 1625.0596284866333 | accuracy = 0.6759259259259259


Epoch[2] Batch[680] Speed: 1.2549591266488564 samples/sec                   batch loss = 1636.8232803344727 | accuracy = 0.6768382352941177


Epoch[2] Batch[685] Speed: 1.2523480570604142 samples/sec                   batch loss = 1649.6121599674225 | accuracy = 0.6766423357664234


Epoch[2] Batch[690] Speed: 1.2514893553536528 samples/sec                   batch loss = 1659.5730794668198 | accuracy = 0.6764492753623188


Epoch[2] Batch[695] Speed: 1.2541183094430746 samples/sec                   batch loss = 1669.0861678123474 | accuracy = 0.6773381294964029


Epoch[2] Batch[700] Speed: 1.2512889557215583 samples/sec                   batch loss = 1680.2495337724686 | accuracy = 0.6775


Epoch[2] Batch[705] Speed: 1.2518288837533997 samples/sec                   batch loss = 1696.1562153100967 | accuracy = 0.676595744680851


Epoch[2] Batch[710] Speed: 1.2497633771081338 samples/sec                   batch loss = 1707.134643793106 | accuracy = 0.6771126760563381


Epoch[2] Batch[715] Speed: 1.2498336692973897 samples/sec                   batch loss = 1720.0090264678001 | accuracy = 0.6762237762237763


Epoch[2] Batch[720] Speed: 1.2515337935882789 samples/sec                   batch loss = 1730.9221655726433 | accuracy = 0.6763888888888889


Epoch[2] Batch[725] Speed: 1.2469171027451071 samples/sec                   batch loss = 1740.7865651249886 | accuracy = 0.676896551724138


Epoch[2] Batch[730] Speed: 1.2445683713381606 samples/sec                   batch loss = 1752.1416179537773 | accuracy = 0.677054794520548


Epoch[2] Batch[735] Speed: 1.245051324218263 samples/sec                   batch loss = 1764.2664881944656 | accuracy = 0.6772108843537415


Epoch[2] Batch[740] Speed: 1.2370438887965172 samples/sec                   batch loss = 1773.0886495113373 | accuracy = 0.6790540540540541


Epoch[2] Batch[745] Speed: 1.2482580451044132 samples/sec                   batch loss = 1783.1089676618576 | accuracy = 0.6795302013422819


Epoch[2] Batch[750] Speed: 1.2468951395275312 samples/sec                   batch loss = 1795.8230789899826 | accuracy = 0.68


Epoch[2] Batch[755] Speed: 1.2523379610346297 samples/sec                   batch loss = 1806.440100312233 | accuracy = 0.6811258278145695


Epoch[2] Batch[760] Speed: 1.2420545658928226 samples/sec                   batch loss = 1819.3778063058853 | accuracy = 0.68125


Epoch[2] Batch[765] Speed: 1.2421188437078943 samples/sec                   batch loss = 1833.0280147790909 | accuracy = 0.6813725490196079


Epoch[2] Batch[770] Speed: 1.2507563933352697 samples/sec                   batch loss = 1847.094240784645 | accuracy = 0.6805194805194805


Epoch[2] Batch[775] Speed: 1.2464808562114393 samples/sec                   batch loss = 1859.8704280853271 | accuracy = 0.6806451612903226


Epoch[2] Batch[780] Speed: 1.2497111518729225 samples/sec                   batch loss = 1873.29843044281 | accuracy = 0.6801282051282052


Epoch[2] Batch[785] Speed: 1.2518595213372257 samples/sec                   batch loss = 1883.61661362648 | accuracy = 0.6802547770700637


[Epoch 2] training: accuracy=0.679251269035533
[Epoch 2] time cost: 645.0299310684204
[Epoch 2] validation: validation accuracy=0.73


## Next steps

Now that you have completed training and predicting with a neural network on GPUs, you have reached the end of the crash course. Congratulations!
If you are keen to learn more, check out [D2L.ai](https://d2l.ai),
[GluonCV](https://cv.gluon.ai/tutorials/index.html), [GluonNLP](https://nlp.gluon.ai),
[GluonTS](https://ts.gluon.ai/), and [AutoGluon](https://auto.gluon.ai).