<!--- Licensed to the Apache Software Foundation (ASF) under one -->
<!--- or more contributor license agreements.  See the NOTICE file -->
<!--- distributed with this work for additional information -->
<!--- regarding copyright ownership.  The ASF licenses this file -->
<!--- to you under the Apache License, Version 2.0 (the -->
<!--- "License"); you may not use this file except in compliance -->
<!--- with the License.  You may obtain a copy of the License at -->

<!---   http://www.apache.org/licenses/LICENSE-2.0 -->

<!--- Unless required by applicable law or agreed to in writing, -->
<!--- software distributed under the License is distributed on an -->
<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
<!--- KIND, either express or implied.  See the License for the -->
<!--- specific language governing permissions and limitations -->
<!--- under the License. -->

# Step 7: Load and Run a NN using GPU

In this step, you will learn how to use graphics processing units (GPUs) with MXNet. Training and inference on GPUs are often faster than on central processing units (CPUs), so using them can shorten the time it takes to train and deploy neural networks.

## Prerequisites

Before you start, make sure that your machine has at least one NVIDIA GPU and that CUDA is properly installed. GPUs from AMD and Intel are not supported. You also need to install the GPU-enabled version of MXNet. You can find information about how to install the GPU version of MXNet for your system in the [installation guide](https://mxnet.apache.org/versions/1.4.1/install/ubuntu_setup.html).

You can use the following command to view the number of GPUs that are available to MXNet.

In [1]:
from mxnet import np, npx, gluon, autograd
from mxnet.gluon import nn
import time
npx.set_np()

npx.num_gpus()  # Returns the number of GPUs that MXNet can access

1

## Allocate data to a GPU

MXNet's ndarray is very similar to NumPy's. One major difference is that MXNet's ndarray has a `context` attribute specifying which device an array is on. By default, arrays are stored on `npx.cpu()`. To place an array on the first GPU instead, pass `npx.gpu()` or, equivalently, `npx.gpu(0)` as the `ctx` argument, as in the following code.

In [2]:
gpu = npx.gpu() if npx.num_gpus() > 0 else npx.cpu()
x = np.ones((3,4), ctx=gpu)
x

[10:30:06] /work/mxnet/src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for GPU


array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]], ctx=gpu(0))

If you're using a CPU, MXNet allocates data in main memory and tries to use as many CPU cores as possible. If there are multiple GPUs, the printed output shows which GPU the ndarray is allocated on.

If you have at least two GPUs, you can create another ndarray and assign it to a different GPU. The example code here copies `x` to the second GPU, `npx.gpu(1)`. If fewer than two GPUs are available, the code falls back to `npx.cpu()` instead of raising an error, which is what happened in the output shown here:

In [3]:
gpu_1 = npx.gpu(1) if npx.num_gpus() > 1 else npx.cpu()
x.copyto(gpu_1)

[10:30:06] /work/mxnet/src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for CPU


array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

MXNet requires that users explicitly move data between devices. However, several operations, such as `print` and `asnumpy`, implicitly copy data to main memory.

## Choosing GPU Ids
If you have multiple GPUs on your machine, MXNet can access each of them through 0-indexing with `npx`. As you saw before, the first GPU was accessed using `npx.gpu(0)`, and the second using `npx.gpu(1)`. This extends to however many GPUs your machine has. So if your machine has eight GPUs, the last GPU is accessed using `npx.gpu(7)`. This allows you to select which GPUs to use for operations and training. You might find it particularly useful when you want to leverage multiple GPUs while training neural networks.
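
The indexing scheme can be sketched in plain Python without MXNet. The helper below, `pick_gpu_ids`, is a hypothetical name introduced only for illustration. It takes a GPU count (in practice, the value returned by `npx.num_gpus()`) and returns the zero-based ids that you would pass to `npx.gpu(i)`.

```python
def pick_gpu_ids(num_available, num_requested):
    """Return the zero-based ids of the GPUs to use.

    Hypothetical helper, not part of MXNet. `num_available` stands in for
    the value returned by `npx.num_gpus()`.
    """
    if num_requested < 0:
        raise ValueError("num_requested must be non-negative")
    # Never ask for more devices than the machine has.
    return list(range(min(num_available, num_requested)))


# On an eight-GPU machine the ids run from 0 to 7.
print(pick_gpu_ids(8, 8))
# Asking for two GPUs when only one is present yields a single id.
print(pick_gpu_ids(1, 2))
```

Capping the request at the available count means the same code runs on machines with fewer GPUs than requested, at the cost of silently using fewer devices.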

## Run an operation on a GPU

To perform an operation on a particular GPU, you only need to ensure that the inputs of the operation are already on that GPU. The output is allocated on the same GPU as well. Almost all operators in the `np` and `npx` modules support running on a GPU.

In [4]:
y = np.random.uniform(size=(3,4), ctx=gpu)
x + y

array([[1.7402194, 1.9209938, 1.0390205, 1.9689629],
       [1.9251406, 1.4463501, 1.6673192, 1.1099306],
       [1.4702187, 1.5131936, 1.7761751, 1.2947657]], ctx=gpu(0))

Remember that if the inputs are not on the same GPU, you will get an error.

## Run a neural network on a GPU

To run a neural network on a GPU, you only need to copy or move the input data and the parameters to the GPU. To demonstrate this, you can reuse the `LeafNetwork` defined previously in [Training Neural Networks](6-train-nn.md). The following code example shows this.

In [5]:
# The convolutional block has a convolution layer, a max pool layer and a batch normalization layer
def conv_block(filters, kernel_size=2, stride=2, batch_norm=True):
    conv_block = nn.HybridSequential()
    conv_block.add(nn.Conv2D(channels=filters, kernel_size=kernel_size, activation='relu'),
              nn.MaxPool2D(pool_size=4, strides=stride))
    if batch_norm:
        conv_block.add(nn.BatchNorm())
    return conv_block

# The dense block consists of a dense layer and a dropout layer
def dense_block(neurons, activation='relu', dropout=0.2):
    dense_block = nn.HybridSequential()
    dense_block.add(nn.Dense(neurons, activation=activation))
    if dropout:
        dense_block.add(nn.Dropout(dropout))
    return dense_block

# Create neural network blueprint using the blocks
class LeafNetwork(nn.HybridBlock):
    def __init__(self):
        super(LeafNetwork, self).__init__()
        self.conv1 = conv_block(32)
        self.conv2 = conv_block(64)
        self.conv3 = conv_block(128)
        self.flatten = nn.Flatten()
        self.dense1 = dense_block(100)
        self.dense2 = dense_block(10)
        self.dense3 = nn.Dense(2)

    def forward(self, batch):
        batch = self.conv1(batch)
        batch = self.conv2(batch)
        batch = self.conv3(batch)
        batch = self.flatten(batch)
        batch = self.dense1(batch)
        batch = self.dense2(batch)
        batch = self.dense3(batch)

        return batch

Load the saved parameters onto GPU 0 directly as shown below. Alternatively, you could use `net.collect_params().reset_ctx(gpu)` to change the device.

In [6]:
net = LeafNetwork()
net.load_parameters('leaf_models.params', ctx=gpu)

Use the following command to create input data on GPU 0. The forward function will then run on GPU 0.

In [7]:
x = np.random.uniform(size=(1, 3, 128, 128), ctx=gpu)
net(x)

[10:30:06] /work/mxnet/src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:106: Running performance tests to find the best convolution algorithm, this can take a while... (set the environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)


array([[ 2.76339 , -4.679167]], ctx=gpu(0))

## Training with multiple GPUs

Finally, you will see how to use multiple GPUs to jointly train a neural network through data parallelism. With data parallelism, if there are *n* GPUs, you split each data batch into *n* parts and have each GPU run the forward and backward passes on its own part of the data.
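
The arithmetic behind data parallelism can be checked in plain Python, independent of MXNet. The sketch below assumes a one-parameter linear model and a squared-error loss, both chosen only for illustration, and the names `grad_sum` and `split` are hypothetical. It splits a batch into *n* shards, computes a gradient on each shard as a separate device would, and compares the summed shard gradients with the gradient of the whole batch.

```python
def grad_sum(w, xs, ys):
    """Gradient of sum((w * x - y) ** 2) with respect to w over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))


def split(seq, n):
    """Split seq into n contiguous, nearly equal parts."""
    size, extra = divmod(len(seq), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        parts.append(seq[start:end])
        start = end
    return parts


w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
n = 2  # number of simulated devices

# Gradient computed on the whole batch at once.
full = grad_sum(w, xs, ys)

# Gradient computed shard by shard, then summed across shards.
sharded = sum(
    grad_sum(w, shard_x, shard_y)
    for shard_x, shard_y in zip(split(xs, n), split(ys, n))
)

# Dividing by the batch size gives the mean gradient over the batch.
mean_grad = sharded / len(xs)
print(full, sharded, mean_grad)
```

Because the loss is a sum over samples, the shard gradients add up to the full-batch gradient. The final division plays the role of the `batch_size` argument in `trainer.step(batch_size)`, which normalizes the accumulated gradient by the batch size.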

First, copy the data definitions and the transform functions from the tutorial [Training Neural Networks](6-train-nn.md) with the following commands.

In [8]:
# Import transforms to compose a series of transformations to apply to the images
from mxnet.gluon.data.vision import transforms

jitter_param = 0.05

# Mean and std for normalizing image values that are in the range (0, 1)
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

training_transformer = transforms.Compose([
    transforms.Resize(size=224, keep_ratio=True),
    transforms.CenterCrop(128),
    transforms.RandomFlipLeftRight(),
    transforms.RandomColorJitter(contrast=jitter_param),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])

validation_transformer = transforms.Compose([
    transforms.Resize(size=224, keep_ratio=True),
    transforms.CenterCrop(128),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])

# Use ImageFolderDataset to create a Dataset object from directory structure
train_dataset = gluon.data.vision.ImageFolderDataset('./datasets/train')
val_dataset = gluon.data.vision.ImageFolderDataset('./datasets/validation')
test_dataset = gluon.data.vision.ImageFolderDataset('./datasets/test')

# Create data loaders
batch_size = 4
train_loader = gluon.data.DataLoader(train_dataset.transform_first(training_transformer), batch_size=batch_size, shuffle=True, try_nopython=True)
validation_loader = gluon.data.DataLoader(val_dataset.transform_first(validation_transformer), batch_size=batch_size, try_nopython=True)
test_loader = gluon.data.DataLoader(test_dataset.transform_first(validation_transformer), batch_size=batch_size, try_nopython=True)

### Define a helper function
This is the same test function defined previously in **Step 6**.

In [9]:
# Function to return the accuracy for the validation and test set
def test(val_data, devices):
    acc = gluon.metric.Accuracy()
    for batch in val_data:
        data, label = batch[0], batch[1]
        data_list = gluon.utils.split_and_load(data, devices)
        label_list = gluon.utils.split_and_load(label, devices)
        outputs = [net(X) for X in data_list]
        acc.update(label_list, outputs)

    _, accuracy = acc.get()
    return accuracy
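
How a metric pools results from several devices can be sketched in plain Python. The `shard_accuracy` function below is a hypothetical stand-in, not the implementation of `gluon.metric.Accuracy`. It assumes that each output row holds one score per class and that the prediction is the index of the largest score.

```python
def shard_accuracy(label_shards, output_shards):
    """Fraction of correct predictions, pooled over all shards.

    Hypothetical stand-in for the bookkeeping a metric performs; each
    shard plays the role of the data held by one device.
    """
    correct = 0
    total = 0
    for labels, outputs in zip(label_shards, output_shards):
        for label, scores in zip(labels, outputs):
            # The predicted class is the index of the largest score.
            predicted = max(range(len(scores)), key=lambda i: scores[i])
            correct += int(predicted == label)
            total += 1
    return correct / total


# Two shards of two samples each, with two class scores per sample.
label_shards = [[0, 1], [1, 0]]
output_shards = [
    [[2.0, -1.0], [0.3, 0.9]],
    [[1.5, 0.2], [0.8, 0.1]],
]
result = shard_accuracy(label_shards, output_shards)
print(result)  # 3 of the 4 predictions match their labels
```

Counting correct predictions and samples across all shards, rather than averaging per-shard accuracies, keeps the result correct when the shards differ in size.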

The training loop is quite similar to that shown earlier. The major differences are highlighted in the following code.

In [10]:
# Diff 1: Use up to two GPUs for training.
available_gpus = [npx.gpu(i) for i in range(npx.num_gpus())]
num_gpus = 2
devices = available_gpus[:num_gpus]
print('Using {} GPUs'.format(len(devices)))

# Diff 2: reinitialize the parameters and place them on multiple GPUs
net.initialize(force_reinit=True, ctx=devices)

# Loss and trainer are the same as before
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
optimizer = 'sgd'
optimizer_params = {'learning_rate': 0.001}
trainer = gluon.Trainer(net.collect_params(), optimizer, optimizer_params)

epochs = 2
accuracy = gluon.metric.Accuracy()
log_interval = 5

for epoch in range(epochs):
    train_loss = 0.
    tic = time.time()
    btic = time.time()
    accuracy.reset()
    for idx, batch in enumerate(train_loader):
        data, label = batch[0], batch[1]

        # Diff 3: split batch and load into corresponding devices
        data_list = gluon.utils.split_and_load(data, devices)
        label_list = gluon.utils.split_and_load(label, devices)

        # Diff 4: run forward and backward on each device.
        # MXNet will automatically run them in parallel
        with autograd.record():
            outputs = [net(X)
                      for X in data_list]
            losses = [loss_fn(output, label)
                      for output, label in zip(outputs, label_list)]
        for l in losses:
            l.backward()
        trainer.step(batch_size)

        # Diff 5: sum losses over all devices. Here, the float
        # function copies the data to the CPU.
        train_loss += sum([float(l.sum()) for l in losses])
        accuracy.update(label_list, outputs)
        if log_interval and (idx + 1) % log_interval == 0:
            _, acc = accuracy.get()

            print(f"""Epoch[{epoch + 1}] Batch[{idx + 1}] Speed: {batch_size / (time.time() - btic)} samples/sec \
                  batch loss = {train_loss} | accuracy = {acc}""")
            btic = time.time()

    _, acc = accuracy.get()

    acc_val = test(validation_loader, devices)
    print(f"[Epoch {epoch + 1}] training: accuracy={acc}")
    print(f"[Epoch {epoch + 1}] time cost: {time.time() - tic}")
    print(f"[Epoch {epoch + 1}] validation: validation accuracy={acc_val}")

Using 1 GPUs


Epoch[1] Batch[5] Speed: 0.755749922925942 samples/sec                   batch loss = 13.638998031616211 | accuracy = 0.65


Epoch[1] Batch[10] Speed: 1.2456605138051442 samples/sec                   batch loss = 28.305928468704224 | accuracy = 0.55


Epoch[1] Batch[15] Speed: 1.2558046440146189 samples/sec                   batch loss = 41.200326681137085 | accuracy = 0.6166666666666667


Epoch[1] Batch[20] Speed: 1.2522487865427347 samples/sec                   batch loss = 55.85567569732666 | accuracy = 0.575


Epoch[1] Batch[25] Speed: 1.251849246370639 samples/sec                   batch loss = 69.64195537567139 | accuracy = 0.56


Epoch[1] Batch[30] Speed: 1.2576473788655742 samples/sec                   batch loss = 83.87509369850159 | accuracy = 0.5416666666666666


Epoch[1] Batch[35] Speed: 1.24909845988448 samples/sec                   batch loss = 98.59664678573608 | accuracy = 0.5214285714285715


Epoch[1] Batch[40] Speed: 1.250110911732442 samples/sec                   batch loss = 113.03827619552612 | accuracy = 0.5125


Epoch[1] Batch[45] Speed: 1.2547886769961836 samples/sec                   batch loss = 126.34587621688843 | accuracy = 0.5111111111111111


Epoch[1] Batch[50] Speed: 1.2528462362936004 samples/sec                   batch loss = 139.7075650691986 | accuracy = 0.505


Epoch[1] Batch[55] Speed: 1.2481312865146421 samples/sec                   batch loss = 153.45012855529785 | accuracy = 0.509090909090909


Epoch[1] Batch[60] Speed: 1.242175034772605 samples/sec                   batch loss = 167.59816002845764 | accuracy = 0.5041666666666667


Epoch[1] Batch[65] Speed: 1.2422989305792864 samples/sec                   batch loss = 182.19793319702148 | accuracy = 0.5115384615384615


Epoch[1] Batch[70] Speed: 1.2435920661374618 samples/sec                   batch loss = 195.56180667877197 | accuracy = 0.5178571428571429


Epoch[1] Batch[75] Speed: 1.2404729840336908 samples/sec                   batch loss = 208.98277020454407 | accuracy = 0.52


Epoch[1] Batch[80] Speed: 1.241394057885644 samples/sec                   batch loss = 223.90000534057617 | accuracy = 0.509375


Epoch[1] Batch[85] Speed: 1.2454207418342182 samples/sec                   batch loss = 237.26024317741394 | accuracy = 0.5117647058823529


Epoch[1] Batch[90] Speed: 1.2417722453080347 samples/sec                   batch loss = 251.3291301727295 | accuracy = 0.5138888888888888


Epoch[1] Batch[95] Speed: 1.2456070587599417 samples/sec                   batch loss = 265.554438829422 | accuracy = 0.5105263157894737


Epoch[1] Batch[100] Speed: 1.2513365529892104 samples/sec                   batch loss = 279.48835158348083 | accuracy = 0.51


Epoch[1] Batch[105] Speed: 1.2463209416737182 samples/sec                   batch loss = 293.0321798324585 | accuracy = 0.5166666666666667


Epoch[1] Batch[110] Speed: 1.2268536278699647 samples/sec                   batch loss = 306.8075454235077 | accuracy = 0.5181818181818182


Epoch[1] Batch[115] Speed: 1.2411756668190899 samples/sec                   batch loss = 320.3475151062012 | accuracy = 0.5239130434782608


Epoch[1] Batch[120] Speed: 1.2439358075995588 samples/sec                   batch loss = 334.1985981464386 | accuracy = 0.5270833333333333


Epoch[1] Batch[125] Speed: 1.244908403539243 samples/sec                   batch loss = 347.7584230899811 | accuracy = 0.53


Epoch[1] Batch[130] Speed: 1.2459149042981543 samples/sec                   batch loss = 361.58333230018616 | accuracy = 0.5326923076923077


Epoch[1] Batch[135] Speed: 1.2515576944480997 samples/sec                   batch loss = 375.365109205246 | accuracy = 0.5351851851851852


Epoch[1] Batch[140] Speed: 1.2526588699231644 samples/sec                   batch loss = 390.04282689094543 | accuracy = 0.5285714285714286


Epoch[1] Batch[145] Speed: 1.2504469897275652 samples/sec                   batch loss = 403.81980204582214 | accuracy = 0.5327586206896552


Epoch[1] Batch[150] Speed: 1.2513722999847692 samples/sec                   batch loss = 417.5286362171173 | accuracy = 0.53


Epoch[1] Batch[155] Speed: 1.2528913323245312 samples/sec                   batch loss = 431.3340816497803 | accuracy = 0.5274193548387097


Epoch[1] Batch[160] Speed: 1.2556509740880595 samples/sec                   batch loss = 444.9142014980316 | accuracy = 0.5328125


Epoch[1] Batch[165] Speed: 1.2508824732872146 samples/sec                   batch loss = 458.8919200897217 | accuracy = 0.5303030303030303


Epoch[1] Batch[170] Speed: 1.250130752738617 samples/sec                   batch loss = 472.9213526248932 | accuracy = 0.5352941176470588


Epoch[1] Batch[175] Speed: 1.2474628173202447 samples/sec                   batch loss = 487.00474405288696 | accuracy = 0.5342857142857143


Epoch[1] Batch[180] Speed: 1.244053228509029 samples/sec                   batch loss = 500.3417568206787 | accuracy = 0.5375


Epoch[1] Batch[185] Speed: 1.2480025112965574 samples/sec                   batch loss = 514.7016408443451 | accuracy = 0.5324324324324324


Epoch[1] Batch[190] Speed: 1.2465944973261758 samples/sec                   batch loss = 527.727808713913 | accuracy = 0.5381578947368421


Epoch[1] Batch[195] Speed: 1.2453358774665249 samples/sec                   batch loss = 540.7654023170471 | accuracy = 0.5435897435897435


Epoch[1] Batch[200] Speed: 1.2524239694304093 samples/sec                   batch loss = 554.1429719924927 | accuracy = 0.54375


Epoch[1] Batch[205] Speed: 1.250254190984097 samples/sec                   batch loss = 567.3661985397339 | accuracy = 0.55


Epoch[1] Batch[210] Speed: 1.2276611464701557 samples/sec                   batch loss = 581.371132850647 | accuracy = 0.5488095238095239


Epoch[1] Batch[215] Speed: 1.2314097107263013 samples/sec                   batch loss = 594.4544930458069 | accuracy = 0.5523255813953488


Epoch[1] Batch[220] Speed: 1.2490035161784334 samples/sec                   batch loss = 607.6355128288269 | accuracy = 0.5568181818181818


Epoch[1] Batch[225] Speed: 1.2549568737083225 samples/sec                   batch loss = 621.7872412204742 | accuracy = 0.5533333333333333


Epoch[1] Batch[230] Speed: 1.2449268788325605 samples/sec                   batch loss = 635.6880567073822 | accuracy = 0.5532608695652174


Epoch[1] Batch[235] Speed: 1.2532427621811058 samples/sec                   batch loss = 649.1839487552643 | accuracy = 0.5563829787234043


Epoch[1] Batch[240] Speed: 1.2492262521624955 samples/sec                   batch loss = 662.8717873096466 | accuracy = 0.5541666666666667


Epoch[1] Batch[245] Speed: 1.2377219628452114 samples/sec                   batch loss = 676.6592261791229 | accuracy = 0.5551020408163265


Epoch[1] Batch[250] Speed: 1.2438568630329343 samples/sec                   batch loss = 689.9573786258698 | accuracy = 0.559


Epoch[1] Batch[255] Speed: 1.2531508379203924 samples/sec                   batch loss = 703.6983916759491 | accuracy = 0.5607843137254902


Epoch[1] Batch[260] Speed: 1.2509962654771232 samples/sec                   batch loss = 717.2360138893127 | accuracy = 0.5634615384615385


Epoch[1] Batch[265] Speed: 1.2450493838988401 samples/sec                   batch loss = 730.8170008659363 | accuracy = 0.5632075471698114


Epoch[1] Batch[270] Speed: 1.2468464895855806 samples/sec                   batch loss = 744.8479187488556 | accuracy = 0.5592592592592592


Epoch[1] Batch[275] Speed: 1.248482466291519 samples/sec                   batch loss = 758.1529226303101 | accuracy = 0.5618181818181818


Epoch[1] Batch[280] Speed: 1.2490133725420858 samples/sec                   batch loss = 771.6246266365051 | accuracy = 0.5633928571428571


Epoch[1] Batch[285] Speed: 1.2436208269152937 samples/sec                   batch loss = 784.6887493133545 | accuracy = 0.5649122807017544


Epoch[1] Batch[290] Speed: 1.2515640432674375 samples/sec                   batch loss = 797.6712799072266 | accuracy = 0.5698275862068966


Epoch[1] Batch[295] Speed: 1.2452032422813535 samples/sec                   batch loss = 811.0652117729187 | accuracy = 0.5694915254237288


Epoch[1] Batch[300] Speed: 1.2501785414090882 samples/sec                   batch loss = 824.9757421016693 | accuracy = 0.5683333333333334


Epoch[1] Batch[305] Speed: 1.2480563578305657 samples/sec                   batch loss = 838.5929770469666 | accuracy = 0.5680327868852459


Epoch[1] Batch[310] Speed: 1.24857705203419 samples/sec                   batch loss = 852.0879738330841 | accuracy = 0.567741935483871


Epoch[1] Batch[315] Speed: 1.245533912318125 samples/sec                   batch loss = 865.3755941390991 | accuracy = 0.5674603174603174


Epoch[1] Batch[320] Speed: 1.231461773355524 samples/sec                   batch loss = 878.7998859882355 | accuracy = 0.56796875


Epoch[1] Batch[325] Speed: 1.2344255569375044 samples/sec                   batch loss = 892.5214138031006 | accuracy = 0.5676923076923077


Epoch[1] Batch[330] Speed: 1.2376134940845462 samples/sec                   batch loss = 905.9720368385315 | accuracy = 0.5674242424242424


Epoch[1] Batch[335] Speed: 1.251470124659052 samples/sec                   batch loss = 920.2192580699921 | accuracy = 0.564179104477612


Epoch[1] Batch[340] Speed: 1.2437445503013578 samples/sec                   batch loss = 934.5265734195709 | accuracy = 0.5632352941176471


Epoch[1] Batch[345] Speed: 1.2430523109569016 samples/sec                   batch loss = 947.5619313716888 | accuracy = 0.5644927536231884


Epoch[1] Batch[350] Speed: 1.249651949933228 samples/sec                   batch loss = 960.826878786087 | accuracy = 0.565


Epoch[1] Batch[355] Speed: 1.2422024424064402 samples/sec                   batch loss = 974.6854753494263 | accuracy = 0.5647887323943662


Epoch[1] Batch[360] Speed: 1.2472975505716999 samples/sec                   batch loss = 988.5173561573029 | accuracy = 0.5645833333333333


Epoch[1] Batch[365] Speed: 1.2479085695040124 samples/sec                   batch loss = 1002.1012246608734 | accuracy = 0.5657534246575342


Epoch[1] Batch[370] Speed: 1.245058715966706 samples/sec                   batch loss = 1015.4178507328033 | accuracy = 0.5662162162162162


Epoch[1] Batch[375] Speed: 1.2418166395513914 samples/sec                   batch loss = 1029.258048772812 | accuracy = 0.566


Epoch[1] Batch[380] Speed: 1.2450147363559214 samples/sec                   batch loss = 1042.8928744792938 | accuracy = 0.5671052631578948


Epoch[1] Batch[385] Speed: 1.2418434798468208 samples/sec                   batch loss = 1055.2592959403992 | accuracy = 0.5701298701298702


Epoch[1] Batch[390] Speed: 1.2444141159855868 samples/sec                   batch loss = 1068.8212523460388 | accuracy = 0.5705128205128205


Epoch[1] Batch[395] Speed: 1.2440338566829459 samples/sec                   batch loss = 1082.8132207393646 | accuracy = 0.5683544303797469


Epoch[1] Batch[400] Speed: 1.2422793373924155 samples/sec                   batch loss = 1096.8064920902252 | accuracy = 0.565625


Epoch[1] Batch[405] Speed: 1.2420485890377908 samples/sec                   batch loss = 1110.6968533992767 | accuracy = 0.5660493827160494


Epoch[1] Batch[410] Speed: 1.2447131532146714 samples/sec                   batch loss = 1122.344343662262 | accuracy = 0.5695121951219512


Epoch[1] Batch[415] Speed: 1.2442162524869913 samples/sec                   batch loss = 1135.6362655162811 | accuracy = 0.5710843373493976


Epoch[1] Batch[420] Speed: 1.2448926075983104 samples/sec                   batch loss = 1149.3384010791779 | accuracy = 0.5708333333333333


Epoch[1] Batch[425] Speed: 1.2408042662941565 samples/sec                   batch loss = 1163.1433062553406 | accuracy = 0.5694117647058824


Epoch[1] Batch[430] Speed: 1.2300597000546434 samples/sec                   batch loss = 1177.0542123317719 | accuracy = 0.5691860465116279


Epoch[1] Batch[435] Speed: 1.2385259361428766 samples/sec                   batch loss = 1189.5388071537018 | accuracy = 0.5701149425287356


Epoch[1] Batch[440] Speed: 1.2543516890091213 samples/sec                   batch loss = 1203.1949067115784 | accuracy = 0.5704545454545454


Epoch[1] Batch[445] Speed: 1.2490141164249011 samples/sec                   batch loss = 1215.8655285835266 | accuracy = 0.5713483146067416


Epoch[1] Batch[450] Speed: 1.2502109615345063 samples/sec                   batch loss = 1229.2715682983398 | accuracy = 0.5711111111111111


Epoch[1] Batch[455] Speed: 1.237648643915431 samples/sec                   batch loss = 1243.480652809143 | accuracy = 0.5703296703296703


Epoch[1] Batch[460] Speed: 1.2530120409941843 samples/sec                   batch loss = 1256.8784809112549 | accuracy = 0.571195652173913


Epoch[1] Batch[465] Speed: 1.2455814426568101 samples/sec                   batch loss = 1269.9697880744934 | accuracy = 0.5725806451612904


Epoch[1] Batch[470] Speed: 1.2437524797630577 samples/sec                   batch loss = 1284.2878251075745 | accuracy = 0.5723404255319149


Epoch[1] Batch[475] Speed: 1.2506588666673326 samples/sec                   batch loss = 1297.7682609558105 | accuracy = 0.5726315789473684


Epoch[1] Batch[480] Speed: 1.2503589229925354 samples/sec                   batch loss = 1310.7916049957275 | accuracy = 0.5729166666666666


Epoch[1] Batch[485] Speed: 1.2486793657144764 samples/sec                   batch loss = 1323.7732303142548 | accuracy = 0.5742268041237113


Epoch[1] Batch[490] Speed: 1.2458971398425602 samples/sec                   batch loss = 1335.9629290103912 | accuracy = 0.5760204081632653


Epoch[1] Batch[495] Speed: 1.2465018787187192 samples/sec                   batch loss = 1349.6669144630432 | accuracy = 0.5767676767676768


Epoch[1] Batch[500] Speed: 1.2428423587699005 samples/sec                   batch loss = 1363.0152988433838 | accuracy = 0.5775


Epoch[1] Batch[505] Speed: 1.2449729770803848 samples/sec                   batch loss = 1377.1131873130798 | accuracy = 0.5767326732673267


Epoch[1] Batch[510] Speed: 1.245512367659055 samples/sec                   batch loss = 1390.6436731815338 | accuracy = 0.5769607843137254


Epoch[1] Batch[515] Speed: 1.2470089488854068 samples/sec                   batch loss = 1401.9110164642334 | accuracy = 0.5805825242718446


Epoch[1] Batch[520] Speed: 1.2418742740806477 samples/sec                   batch loss = 1415.5971755981445 | accuracy = 0.5807692307692308


Epoch[1] Batch[525] Speed: 1.2505630330962236 samples/sec                   batch loss = 1427.9152319431305 | accuracy = 0.5814285714285714


Epoch[1] Batch[530] Speed: 1.24864061278387 samples/sec                   batch loss = 1440.7696540355682 | accuracy = 0.5830188679245283


Epoch[1] Batch[535] Speed: 1.247959715899979 samples/sec                   batch loss = 1453.3459694385529 | accuracy = 0.5850467289719626


Epoch[1] Batch[540] Speed: 1.2504920996108446 samples/sec                   batch loss = 1466.6530652046204 | accuracy = 0.5847222222222223


Epoch[1] Batch[545] Speed: 1.2473001470170044 samples/sec                   batch loss = 1479.1804394721985 | accuracy = 0.5844036697247706


Epoch[1] Batch[550] Speed: 1.250643949971308 samples/sec                   batch loss = 1491.0547177791595 | accuracy = 0.5859090909090909


Epoch[1] Batch[555] Speed: 1.2490018424717546 samples/sec                   batch loss = 1504.3684644699097 | accuracy = 0.5864864864864865


Epoch[1] Batch[560] Speed: 1.2498079721629474 samples/sec                   batch loss = 1516.628769159317 | accuracy = 0.5875


Epoch[1] Batch[565] Speed: 1.2483127496202728 samples/sec                   batch loss = 1529.9613089561462 | accuracy = 0.5867256637168141


Epoch[1] Batch[570] Speed: 1.2547108824967954 samples/sec                   batch loss = 1544.1055443286896 | accuracy = 0.5859649122807018


Epoch[1] Batch[575] Speed: 1.245225977735109 samples/sec                   batch loss = 1557.6388041973114 | accuracy = 0.5852173913043478


Epoch[1] Batch[580] Speed: 1.2540788432215322 samples/sec                   batch loss = 1571.957511663437 | accuracy = 0.584051724137931


Epoch[1] Batch[585] Speed: 1.249709755536172 samples/sec                   batch loss = 1583.787076473236 | accuracy = 0.5850427350427351


Epoch[1] Batch[590] Speed: 1.2497133860182135 samples/sec                   batch loss = 1597.5121784210205 | accuracy = 0.5855932203389831


Epoch[1] Batch[595] Speed: 1.254669314722458 samples/sec                   batch loss = 1611.359917640686 | accuracy = 0.5865546218487395


Epoch[1] Batch[600] Speed: 1.2538823927717155 samples/sec                   batch loss = 1623.6768305301666 | accuracy = 0.5870833333333333


Epoch[1] Batch[605] Speed: 1.251565350385297 samples/sec                   batch loss = 1637.975921869278 | accuracy = 0.5859504132231405


Epoch[1] Batch[610] Speed: 1.2484586827408706 samples/sec                   batch loss = 1649.5772497653961 | accuracy = 0.5881147540983607


Epoch[1] Batch[615] Speed: 1.2452507473857641 samples/sec                   batch loss = 1661.6137399673462 | accuracy = 0.5886178861788618


Epoch[1] Batch[620] Speed: 1.2479688131572915 samples/sec                   batch loss = 1675.344733953476 | accuracy = 0.5883064516129032


Epoch[1] Batch[625] Speed: 1.2515376213989664 samples/sec                   batch loss = 1687.594422340393 | accuracy = 0.5884


Epoch[1] Batch[630] Speed: 1.2471245403996025 samples/sec                   batch loss = 1700.1807177066803 | accuracy = 0.5888888888888889


Epoch[1] Batch[635] Speed: 1.2469307258891058 samples/sec                   batch loss = 1713.263869524002 | accuracy = 0.5889763779527559


Epoch[1] Batch[640] Speed: 1.2557869724089707 samples/sec                   batch loss = 1725.4222385883331 | accuracy = 0.5890625


Epoch[1] Batch[645] Speed: 1.2422079608682506 samples/sec                   batch loss = 1739.7258191108704 | accuracy = 0.5883720930232558


Epoch[1] Batch[650] Speed: 1.2386677608324814 samples/sec                   batch loss = 1753.3223888874054 | accuracy = 0.5873076923076923


Epoch[1] Batch[655] Speed: 1.2441848806600524 samples/sec                   batch loss = 1766.4889006614685 | accuracy = 0.5877862595419847


Epoch[1] Batch[660] Speed: 1.2489160245245032 samples/sec                   batch loss = 1780.9579184055328 | accuracy = 0.5871212121212122


Epoch[1] Batch[665] Speed: 1.2533018367205613 samples/sec                   batch loss = 1791.7148023843765 | accuracy = 0.5883458646616542


Epoch[1] Batch[670] Speed: 1.2479080125790365 samples/sec                   batch loss = 1804.4503477811813 | accuracy = 0.5891791044776119


Epoch[1] Batch[675] Speed: 1.2507064160107444 samples/sec                   batch loss = 1816.4938591718674 | accuracy = 0.5903703703703703


Epoch[1] Batch[680] Speed: 1.2502365820758907 samples/sec                   batch loss = 1829.0536526441574 | accuracy = 0.5908088235294118


Epoch[1] Batch[685] Speed: 1.2493148106199552 samples/sec                   batch loss = 1841.724304318428 | accuracy = 0.5905109489051095


Epoch[1] Batch[690] Speed: 1.2498981961080442 samples/sec                   batch loss = 1853.8903558254242 | accuracy = 0.5909420289855073


Epoch[1] Batch[695] Speed: 1.2519879726525827 samples/sec                   batch loss = 1868.0424370765686 | accuracy = 0.5902877697841726


Epoch[1] Batch[700] Speed: 1.249087300228268 samples/sec                   batch loss = 1879.6766257286072 | accuracy = 0.5910714285714286


Epoch[1] Batch[705] Speed: 1.2547911170279402 samples/sec                   batch loss = 1891.4522601366043 | accuracy = 0.5918439716312057


Epoch[1] Batch[710] Speed: 1.2513121939351557 samples/sec                   batch loss = 1905.159831404686 | accuracy = 0.5915492957746479


Epoch[1] Batch[715] Speed: 1.2513432728951588 samples/sec                   batch loss = 1917.9708768129349 | accuracy = 0.5916083916083916


Epoch[1] Batch[720] Speed: 1.2480375110151263 samples/sec                   batch loss = 1931.2632046937943 | accuracy = 0.5913194444444444


Epoch[1] Batch[725] Speed: 1.2493600249258152 samples/sec                   batch loss = 1943.876825928688 | accuracy = 0.5913793103448276


Epoch[1] Batch[730] Speed: 1.2482745766936438 samples/sec                   batch loss = 1957.5350695848465 | accuracy = 0.5917808219178082


Epoch[1] Batch[735] Speed: 1.2518781101144585 samples/sec                   batch loss = 1970.104565024376 | accuracy = 0.5925170068027211


Epoch[1] Batch[740] Speed: 1.249465537701171 samples/sec                   batch loss = 1982.5571111440659 | accuracy = 0.5925675675675676


Epoch[1] Batch[745] Speed: 1.2476552199721618 samples/sec                   batch loss = 1995.8134311437607 | accuracy = 0.5922818791946308


Epoch[1] Batch[750] Speed: 1.2476436221817575 samples/sec                   batch loss = 2007.9110159873962 | accuracy = 0.593


Epoch[1] Batch[755] Speed: 1.2491360321928522 samples/sec                   batch loss = 2023.0204825401306 | accuracy = 0.5920529801324503


Epoch[1] Batch[760] Speed: 1.2358162138001556 samples/sec                   batch loss = 2034.4248254299164 | accuracy = 0.593421052631579


Epoch[1] Batch[765] Speed: 1.236665202766403 samples/sec                   batch loss = 2048.209818840027 | accuracy = 0.592156862745098


Epoch[1] Batch[770] Speed: 1.254745414931944 samples/sec                   batch loss = 2060.6149044036865 | accuracy = 0.5928571428571429


Epoch[1] Batch[775] Speed: 1.2500994545368518 samples/sec                   batch loss = 2075.0287103652954 | accuracy = 0.5925806451612903


Epoch[1] Batch[780] Speed: 1.2508864836435856 samples/sec                   batch loss = 2088.6537477970123 | accuracy = 0.5919871794871795


Epoch[1] Batch[785] Speed: 1.2549123797902244 samples/sec                   batch loss = 2100.8335239887238 | accuracy = 0.5917197452229299


[Epoch 1] training: accuracy=0.5926395939086294
[Epoch 1] time cost: 649.8786573410034
[Epoch 1] validation: validation accuracy=0.65


Epoch[2] Batch[5] Speed: 1.2492508091320942 samples/sec                   batch loss = 12.920516729354858 | accuracy = 0.65


Epoch[2] Batch[10] Speed: 1.2472525781450143 samples/sec                   batch loss = 26.20220708847046 | accuracy = 0.65


Epoch[2] Batch[15] Speed: 1.2488004725089907 samples/sec                   batch loss = 37.856436252593994 | accuracy = 0.6833333333333333


Epoch[2] Batch[20] Speed: 1.2462477113668926 samples/sec                   batch loss = 52.16963815689087 | accuracy = 0.625


Epoch[2] Batch[25] Speed: 1.2532603622538407 samples/sec                   batch loss = 65.6547474861145 | accuracy = 0.63


Epoch[2] Batch[30] Speed: 1.2439333173725577 samples/sec                   batch loss = 78.36877393722534 | accuracy = 0.6166666666666667


Epoch[2] Batch[35] Speed: 1.2468640957928894 samples/sec                   batch loss = 91.96244513988495 | accuracy = 0.5928571428571429


Epoch[2] Batch[40] Speed: 1.2488943627032256 samples/sec                   batch loss = 104.55287516117096 | accuracy = 0.60625


Epoch[2] Batch[45] Speed: 1.2448860491675058 samples/sec                   batch loss = 118.84776437282562 | accuracy = 0.6


Epoch[2] Batch[50] Speed: 1.2515402355270377 samples/sec                   batch loss = 131.52032577991486 | accuracy = 0.61


Epoch[2] Batch[55] Speed: 1.251347192873626 samples/sec                   batch loss = 143.74605119228363 | accuracy = 0.6272727272727273


Epoch[2] Batch[60] Speed: 1.251887544830682 samples/sec                   batch loss = 154.47658145427704 | accuracy = 0.6416666666666667


Epoch[2] Batch[65] Speed: 1.234539372336387 samples/sec                   batch loss = 165.52654612064362 | accuracy = 0.65


Epoch[2] Batch[70] Speed: 1.2297490924880794 samples/sec                   batch loss = 177.64190256595612 | accuracy = 0.6571428571428571


Epoch[2] Batch[75] Speed: 1.2407234248110726 samples/sec                   batch loss = 191.54444706439972 | accuracy = 0.65


Epoch[2] Batch[80] Speed: 1.2484756841710472 samples/sec                   batch loss = 202.48256540298462 | accuracy = 0.659375


Epoch[2] Batch[85] Speed: 1.2454715920399229 samples/sec                   batch loss = 213.78765296936035 | accuracy = 0.6647058823529411


Epoch[2] Batch[90] Speed: 1.2512830763103548 samples/sec                   batch loss = 225.9112914800644 | accuracy = 0.6638888888888889


Epoch[2] Batch[95] Speed: 1.2505361874141792 samples/sec                   batch loss = 239.28787183761597 | accuracy = 0.6605263157894737


Epoch[2] Batch[100] Speed: 1.2463198306566068 samples/sec                   batch loss = 250.82736027240753 | accuracy = 0.6575


Epoch[2] Batch[105] Speed: 1.2501656856180412 samples/sec                   batch loss = 263.2045135498047 | accuracy = 0.6595238095238095


Epoch[2] Batch[110] Speed: 1.2500398432496893 samples/sec                   batch loss = 277.3733985424042 | accuracy = 0.6590909090909091


Epoch[2] Batch[115] Speed: 1.2421328220247296 samples/sec                   batch loss = 289.6812187433243 | accuracy = 0.6565217391304348


Epoch[2] Batch[120] Speed: 1.2424914918163397 samples/sec                   batch loss = 302.72797334194183 | accuracy = 0.6479166666666667


Epoch[2] Batch[125] Speed: 1.251668714705646 samples/sec                   batch loss = 312.111124753952 | accuracy = 0.658


Epoch[2] Batch[130] Speed: 1.2395137137863224 samples/sec                   batch loss = 324.45250165462494 | accuracy = 0.6615384615384615


Epoch[2] Batch[135] Speed: 1.2422648958616014 samples/sec                   batch loss = 338.8818951845169 | accuracy = 0.65


Epoch[2] Batch[140] Speed: 1.2394164674566315 samples/sec                   batch loss = 349.6986060142517 | accuracy = 0.6517857142857143


Epoch[2] Batch[145] Speed: 1.2373495233238156 samples/sec                   batch loss = 363.0417722463608 | accuracy = 0.6482758620689655


Epoch[2] Batch[150] Speed: 1.237927996269957 samples/sec                   batch loss = 374.3623950481415 | accuracy = 0.6516666666666666


Epoch[2] Batch[155] Speed: 1.2360333601676157 samples/sec                   batch loss = 386.06027269363403 | accuracy = 0.6532258064516129


Epoch[2] Batch[160] Speed: 1.2389513246904968 samples/sec                   batch loss = 397.29732835292816 | accuracy = 0.6546875


Epoch[2] Batch[165] Speed: 1.2415141230495819 samples/sec                   batch loss = 410.6792086362839 | accuracy = 0.6530303030303031


Epoch[2] Batch[170] Speed: 1.2385726588070447 samples/sec                   batch loss = 422.10343539714813 | accuracy = 0.6544117647058824


Epoch[2] Batch[175] Speed: 1.2170558566821832 samples/sec                   batch loss = 433.70080852508545 | accuracy = 0.66


Epoch[2] Batch[180] Speed: 1.232860342464309 samples/sec                   batch loss = 444.7014328241348 | accuracy = 0.6625


Epoch[2] Batch[185] Speed: 1.240301678411074 samples/sec                   batch loss = 458.03361117839813 | accuracy = 0.6621621621621622


Epoch[2] Batch[190] Speed: 1.2368783618005095 samples/sec                   batch loss = 473.12402975559235 | accuracy = 0.6618421052631579


Epoch[2] Batch[195] Speed: 1.2452428912331412 samples/sec                   batch loss = 485.04331147670746 | accuracy = 0.6615384615384615


Epoch[2] Batch[200] Speed: 1.2465147518815987 samples/sec                   batch loss = 502.4821392297745 | accuracy = 0.65625


Epoch[2] Batch[205] Speed: 1.2430942178006104 samples/sec                   batch loss = 514.3763728141785 | accuracy = 0.6585365853658537


Epoch[2] Batch[210] Speed: 1.2403627487569178 samples/sec                   batch loss = 529.3703405857086 | accuracy = 0.6535714285714286


Epoch[2] Batch[215] Speed: 1.23514058939087 samples/sec                   batch loss = 542.8113610744476 | accuracy = 0.6523255813953488


Epoch[2] Batch[220] Speed: 1.2426868741231512 samples/sec                   batch loss = 554.8763074874878 | accuracy = 0.6522727272727272


Epoch[2] Batch[225] Speed: 1.2443882720307367 samples/sec                   batch loss = 566.4160178899765 | accuracy = 0.6533333333333333


Epoch[2] Batch[230] Speed: 1.24656624719216 samples/sec                   batch loss = 576.6955009698868 | accuracy = 0.6554347826086957


Epoch[2] Batch[235] Speed: 1.23638979123801 samples/sec                   batch loss = 588.7784980535507 | accuracy = 0.6553191489361702


Epoch[2] Batch[240] Speed: 1.2391180476471941 samples/sec                   batch loss = 601.3045423030853 | accuracy = 0.6552083333333333


Epoch[2] Batch[245] Speed: 1.2376475483061407 samples/sec                   batch loss = 613.3625612258911 | accuracy = 0.6571428571428571


Epoch[2] Batch[250] Speed: 1.2332250982811224 samples/sec                   batch loss = 627.6067974567413 | accuracy = 0.654


Epoch[2] Batch[255] Speed: 1.2425698030170425 samples/sec                   batch loss = 638.3298094272614 | accuracy = 0.6558823529411765


Epoch[2] Batch[260] Speed: 1.245226809535745 samples/sec                   batch loss = 651.8012444972992 | accuracy = 0.6538461538461539


Epoch[2] Batch[265] Speed: 1.2424180667788336 samples/sec                   batch loss = 664.6770312786102 | accuracy = 0.6537735849056604


Epoch[2] Batch[270] Speed: 1.2387491578172098 samples/sec                   batch loss = 676.6600023508072 | accuracy = 0.6518518518518519


Epoch[2] Batch[275] Speed: 1.24919518525854 samples/sec                   batch loss = 687.0038633346558 | accuracy = 0.6545454545454545


Epoch[2] Batch[280] Speed: 1.233465184557159 samples/sec                   batch loss = 699.7616212368011 | accuracy = 0.6553571428571429


Epoch[2] Batch[285] Speed: 1.226839722185111 samples/sec                   batch loss = 712.9542814493179 | accuracy = 0.656140350877193


Epoch[2] Batch[290] Speed: 1.2451496416520758 samples/sec                   batch loss = 725.4136161804199 | accuracy = 0.6543103448275862


Epoch[2] Batch[295] Speed: 1.2450864358050693 samples/sec                   batch loss = 740.4722018241882 | accuracy = 0.65


Epoch[2] Batch[300] Speed: 1.2504507176937207 samples/sec                   batch loss = 750.9719064235687 | accuracy = 0.6508333333333334


Epoch[2] Batch[305] Speed: 1.2442085939244512 samples/sec                   batch loss = 762.4637432098389 | accuracy = 0.6516393442622951


Epoch[2] Batch[310] Speed: 1.253423467293592 samples/sec                   batch loss = 775.1508784294128 | accuracy = 0.6524193548387097


Epoch[2] Batch[315] Speed: 1.244369443539568 samples/sec                   batch loss = 788.8121973276138 | accuracy = 0.6515873015873016


Epoch[2] Batch[320] Speed: 1.2415977322502358 samples/sec                   batch loss = 801.0359824895859 | accuracy = 0.6515625


Epoch[2] Batch[325] Speed: 1.24426036026054 samples/sec                   batch loss = 813.4161019325256 | accuracy = 0.6523076923076923


Epoch[2] Batch[330] Speed: 1.2378067972034204 samples/sec                   batch loss = 825.6515321731567 | accuracy = 0.6530303030303031


Epoch[2] Batch[335] Speed: 1.2322760051179524 samples/sec                   batch loss = 839.1605496406555 | accuracy = 0.6514925373134328


Epoch[2] Batch[340] Speed: 1.2378404967471044 samples/sec                   batch loss = 850.2022860050201 | accuracy = 0.6514705882352941


Epoch[2] Batch[345] Speed: 1.2390881220564005 samples/sec                   batch loss = 861.5069969892502 | accuracy = 0.6536231884057971


Epoch[2] Batch[350] Speed: 1.2352820945378806 samples/sec                   batch loss = 872.2352764606476 | accuracy = 0.655


Epoch[2] Batch[355] Speed: 1.2390447462431808 samples/sec                   batch loss = 885.2885553836823 | accuracy = 0.6556338028169014


Epoch[2] Batch[360] Speed: 1.2383410001919821 samples/sec                   batch loss = 896.8149123191833 | accuracy = 0.65625


Epoch[2] Batch[365] Speed: 1.2368924047954564 samples/sec                   batch loss = 905.7095817327499 | accuracy = 0.6602739726027397


Epoch[2] Batch[370] Speed: 1.2390278176885283 samples/sec                   batch loss = 915.5152057409286 | accuracy = 0.6621621621621622


Epoch[2] Batch[375] Speed: 1.236989437837307 samples/sec                   batch loss = 927.1369355916977 | accuracy = 0.6626666666666666


Epoch[2] Batch[380] Speed: 1.2467506834672524 samples/sec                   batch loss = 942.2498584985733 | accuracy = 0.6618421052631579


Epoch[2] Batch[385] Speed: 1.2519561142683848 samples/sec                   batch loss = 953.4165315628052 | accuracy = 0.662987012987013


Epoch[2] Batch[390] Speed: 1.2464774297025722 samples/sec                   batch loss = 968.3655042648315 | accuracy = 0.6628205128205128


Epoch[2] Batch[395] Speed: 1.2309711494071722 samples/sec                   batch loss = 978.6090930700302 | accuracy = 0.6645569620253164


Epoch[2] Batch[400] Speed: 1.2292154304004648 samples/sec                   batch loss = 994.0644694566727 | accuracy = 0.663125


Epoch[2] Batch[405] Speed: 1.2398432010584903 samples/sec                   batch loss = 1005.8431377410889 | accuracy = 0.662962962962963


Epoch[2] Batch[410] Speed: 1.2513489662052864 samples/sec                   batch loss = 1015.9383723735809 | accuracy = 0.6646341463414634


Epoch[2] Batch[415] Speed: 1.2550467159821788 samples/sec                   batch loss = 1028.5216200351715 | accuracy = 0.6644578313253012


Epoch[2] Batch[420] Speed: 1.2510400156890642 samples/sec                   batch loss = 1039.1579654216766 | accuracy = 0.6654761904761904


Epoch[2] Batch[425] Speed: 1.249831248507757 samples/sec                   batch loss = 1052.2200591564178 | accuracy = 0.6652941176470588


Epoch[2] Batch[430] Speed: 1.249184953981712 samples/sec                   batch loss = 1065.0495102405548 | accuracy = 0.6651162790697674


Epoch[2] Batch[435] Speed: 1.241931454290634 samples/sec                   batch loss = 1076.732103586197 | accuracy = 0.6660919540229885


Epoch[2] Batch[440] Speed: 1.2418659089287358 samples/sec                   batch loss = 1087.7220342159271 | accuracy = 0.6670454545454545


Epoch[2] Batch[445] Speed: 1.2486661690027843 samples/sec                   batch loss = 1098.6625006198883 | accuracy = 0.6679775280898876


Epoch[2] Batch[450] Speed: 1.2467064916033983 samples/sec                   batch loss = 1115.494778394699 | accuracy = 0.6672222222222223


Epoch[2] Batch[455] Speed: 1.2398892901091483 samples/sec                   batch loss = 1129.0201358795166 | accuracy = 0.6664835164835164


Epoch[2] Batch[460] Speed: 1.2356604799002824 samples/sec                   batch loss = 1141.385490655899 | accuracy = 0.6668478260869565


Epoch[2] Batch[465] Speed: 1.2396412926092115 samples/sec                   batch loss = 1151.3882348537445 | accuracy = 0.6682795698924732


Epoch[2] Batch[470] Speed: 1.2307048582983775 samples/sec                   batch loss = 1164.2341758012772 | accuracy = 0.6680851063829787


Epoch[2] Batch[475] Speed: 1.2434486508344496 samples/sec                   batch loss = 1177.509429216385 | accuracy = 0.6684210526315789


Epoch[2] Batch[480] Speed: 1.242033325330587 samples/sec                   batch loss = 1188.4426919221878 | accuracy = 0.6697916666666667


Epoch[2] Batch[485] Speed: 1.2389195773800434 samples/sec                   batch loss = 1200.1405301094055 | accuracy = 0.6690721649484536


Epoch[2] Batch[490] Speed: 1.2403748535035388 samples/sec                   batch loss = 1211.1653178930283 | accuracy = 0.6704081632653062


Epoch[2] Batch[495] Speed: 1.2398822345055145 samples/sec                   batch loss = 1224.026428937912 | accuracy = 0.6691919191919192


Epoch[2] Batch[500] Speed: 1.2409848900921092 samples/sec                   batch loss = 1235.8623851537704 | accuracy = 0.6685


Epoch[2] Batch[505] Speed: 1.2488628474897339 samples/sec                   batch loss = 1245.6631243228912 | accuracy = 0.6698019801980198


Epoch[2] Batch[510] Speed: 1.2527158316019764 samples/sec                   batch loss = 1257.7976195812225 | accuracy = 0.6700980392156862


Epoch[2] Batch[515] Speed: 1.2494517660632207 samples/sec                   batch loss = 1269.7334549427032 | accuracy = 0.670388349514563


Epoch[2] Batch[520] Speed: 1.2461742119759833 samples/sec                   batch loss = 1281.695021867752 | accuracy = 0.6697115384615384


Epoch[2] Batch[525] Speed: 1.2311085389049725 samples/sec                   batch loss = 1294.2495505809784 | accuracy = 0.6704761904761904


Epoch[2] Batch[530] Speed: 1.2271858431522054 samples/sec                   batch loss = 1306.922308921814 | accuracy = 0.6702830188679245


Epoch[2] Batch[535] Speed: 1.2362021223763187 samples/sec                   batch loss = 1319.8733664751053 | accuracy = 0.6696261682242991


Epoch[2] Batch[540] Speed: 1.2499611464299045 samples/sec                   batch loss = 1330.6644034385681 | accuracy = 0.6712962962962963


Epoch[2] Batch[545] Speed: 1.2463021472342506 samples/sec                   batch loss = 1343.811524629593 | accuracy = 0.6706422018348623


Epoch[2] Batch[550] Speed: 1.2478941824349161 samples/sec                   batch loss = 1355.3547189235687 | accuracy = 0.6709090909090909


Epoch[2] Batch[555] Speed: 1.2503098160316168 samples/sec                   batch loss = 1363.3489824533463 | accuracy = 0.672972972972973


Epoch[2] Batch[560] Speed: 1.2439893037723897 samples/sec                   batch loss = 1374.3286137580872 | accuracy = 0.6736607142857143


Epoch[2] Batch[565] Speed: 1.2486988824895477 samples/sec                   batch loss = 1385.4789861440659 | accuracy = 0.6738938053097345


Epoch[2] Batch[570] Speed: 1.247573297798612 samples/sec                   batch loss = 1397.6611160039902 | accuracy = 0.6741228070175439


Epoch[2] Batch[575] Speed: 1.2416452383181427 samples/sec                   batch loss = 1408.4267046451569 | accuracy = 0.6739130434782609


Epoch[2] Batch[580] Speed: 1.2503179226075318 samples/sec                   batch loss = 1416.6668211221695 | accuracy = 0.6754310344827587


Epoch[2] Batch[585] Speed: 1.2439077700639734 samples/sec                   batch loss = 1431.1629017591476 | accuracy = 0.673931623931624


Epoch[2] Batch[590] Speed: 1.2504272318780383 samples/sec                   batch loss = 1440.7227652072906 | accuracy = 0.6745762711864407


Epoch[2] Batch[595] Speed: 1.2442380291558959 samples/sec                   batch loss = 1451.29375821352 | accuracy = 0.6752100840336135


Epoch[2] Batch[600] Speed: 1.2404448272371766 samples/sec                   batch loss = 1462.7897391915321 | accuracy = 0.6754166666666667


Epoch[2] Batch[605] Speed: 1.2416484545276254 samples/sec                   batch loss = 1474.675086915493 | accuracy = 0.675206611570248


Epoch[2] Batch[610] Speed: 1.244004338869923 samples/sec                   batch loss = 1487.593804180622 | accuracy = 0.6762295081967213


Epoch[2] Batch[615] Speed: 1.2459843015574417 samples/sec                   batch loss = 1500.1458171010017 | accuracy = 0.6764227642276422


Epoch[2] Batch[620] Speed: 1.2421494676676588 samples/sec                   batch loss = 1509.8019538521767 | accuracy = 0.6778225806451613


Epoch[2] Batch[625] Speed: 1.241201010200474 samples/sec                   batch loss = 1519.5160059332848 | accuracy = 0.6796


Epoch[2] Batch[630] Speed: 1.2418229818452806 samples/sec                   batch loss = 1529.1191928386688 | accuracy = 0.6805555555555556


Epoch[2] Batch[635] Speed: 1.2393358067036293 samples/sec                   batch loss = 1542.1990730762482 | accuracy = 0.6803149606299213


Epoch[2] Batch[640] Speed: 1.2481032452349583 samples/sec                   batch loss = 1552.6753035187721 | accuracy = 0.68046875


Epoch[2] Batch[645] Speed: 1.253355298829913 samples/sec                   batch loss = 1564.0324347615242 | accuracy = 0.6810077519379845


Epoch[2] Batch[650] Speed: 1.239676374519758 samples/sec                   batch loss = 1572.8927208781242 | accuracy = 0.6826923076923077


Epoch[2] Batch[655] Speed: 1.2529544909367532 samples/sec                   batch loss = 1583.8003305792809 | accuracy = 0.683587786259542


Epoch[2] Batch[660] Speed: 1.2417403532664004 samples/sec                   batch loss = 1591.7187747955322 | accuracy = 0.684469696969697


Epoch[2] Batch[665] Speed: 1.2425842516379577 samples/sec                   batch loss = 1599.9953846931458 | accuracy = 0.6849624060150376


Epoch[2] Batch[670] Speed: 1.242320916158598 samples/sec                   batch loss = 1610.0295590162277 | accuracy = 0.6850746268656717


Epoch[2] Batch[675] Speed: 1.24574801241262 samples/sec                   batch loss = 1619.491066634655 | accuracy = 0.6862962962962963


Epoch[2] Batch[680] Speed: 1.2418908208669022 samples/sec                   batch loss = 1634.0502368807793 | accuracy = 0.6856617647058824


Epoch[2] Batch[685] Speed: 1.2406741542955193 samples/sec                   batch loss = 1648.1395594477654 | accuracy = 0.6843065693430657


Epoch[2] Batch[690] Speed: 1.2403394569546313 samples/sec                   batch loss = 1660.8991152644157 | accuracy = 0.6840579710144927


Epoch[2] Batch[695] Speed: 1.2343687025161902 samples/sec                   batch loss = 1670.6747112870216 | accuracy = 0.6845323741007194


Epoch[2] Batch[700] Speed: 1.236797939881056 samples/sec                   batch loss = 1682.777226626873 | accuracy = 0.6853571428571429


Epoch[2] Batch[705] Speed: 1.233538824908561 samples/sec                   batch loss = 1697.626748263836 | accuracy = 0.6840425531914893


Epoch[2] Batch[710] Speed: 1.2354107140989938 samples/sec                   batch loss = 1711.9065681099892 | accuracy = 0.6834507042253521


Epoch[2] Batch[715] Speed: 1.2333862032638698 samples/sec                   batch loss = 1724.5271540284157 | accuracy = 0.6832167832167833


Epoch[2] Batch[720] Speed: 1.2372406636250175 samples/sec                   batch loss = 1735.8546317219734 | accuracy = 0.6836805555555555


Epoch[2] Batch[725] Speed: 1.2396773821235458 samples/sec                   batch loss = 1747.7180669903755 | accuracy = 0.6837931034482758


Epoch[2] Batch[730] Speed: 1.2358135739154985 samples/sec                   batch loss = 1760.2968118786812 | accuracy = 0.6835616438356165


Epoch[2] Batch[735] Speed: 1.2203726653135594 samples/sec                   batch loss = 1770.1362683176994 | accuracy = 0.6843537414965987


Epoch[2] Batch[740] Speed: 1.2260206506953049 samples/sec                   batch loss = 1780.1097655892372 | accuracy = 0.6851351351351351


Epoch[2] Batch[745] Speed: 1.2447617291704305 samples/sec                   batch loss = 1792.0222200751305 | accuracy = 0.6852348993288591


Epoch[2] Batch[750] Speed: 1.2467142735911463 samples/sec                   batch loss = 1801.9212544560432 | accuracy = 0.6853333333333333


Epoch[2] Batch[755] Speed: 1.2441226953029467 samples/sec                   batch loss = 1813.1695302128792 | accuracy = 0.6854304635761589


Epoch[2] Batch[760] Speed: 1.2478997515836006 samples/sec                   batch loss = 1825.031455218792 | accuracy = 0.6851973684210526


Epoch[2] Batch[765] Speed: 1.2451963109057145 samples/sec                   batch loss = 1838.6327971816063 | accuracy = 0.6843137254901961


Epoch[2] Batch[770] Speed: 1.243632073478681 samples/sec                   batch loss = 1847.858699142933 | accuracy = 0.6847402597402598


Epoch[2] Batch[775] Speed: 1.2456396121512632 samples/sec                   batch loss = 1861.3337209820747 | accuracy = 0.6832258064516129


Epoch[2] Batch[780] Speed: 1.2476901074255011 samples/sec                   batch loss = 1873.1700794100761 | accuracy = 0.6836538461538462


Epoch[2] Batch[785] Speed: 1.247178682045638 samples/sec                   batch loss = 1884.655536711216 | accuracy = 0.6837579617834395


[Epoch 2] training: accuracy=0.684010152284264
[Epoch 2] time cost: 650.5420842170715
[Epoch 2] validation: validation accuracy=0.7177777777777777
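
If you want to compare runs or plot these numbers, you can pull them out of the log text. The following is a minimal sketch, not part of MXNet: the regular expression and the `parse_log_line` helper are hypothetical, and the pattern assumes the exact line format printed above.

```python
import re

# Pattern for the per-batch log lines printed above.
# The format is assumed from this tutorial's output, not from an MXNet API.
LOG_LINE = re.compile(
    r"Epoch\[(?P<epoch>\d+)\] Batch\[(?P<batch>\d+)\] "
    r"Speed: (?P<speed>[\d.]+) samples/sec\s+"
    r"batch loss = (?P<loss>[\d.]+) \| accuracy = (?P<accuracy>[\d.]+)"
)


def parse_log_line(line):
    """Return the numeric fields of one per-batch log line, or None if it does not match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    return {
        "epoch": int(match["epoch"]),
        "batch": int(match["batch"]),
        "speed": float(match["speed"]),
        "loss": float(match["loss"]),
        "accuracy": float(match["accuracy"]),
    }


# One line copied from the output above.
sample = (
    "Epoch[2] Batch[785] Speed: 1.247178682045638 samples/sec"
    "                   batch loss = 1884.655536711216 | accuracy = 0.6837579617834395"
)
record = parse_log_line(sample)
print(record["epoch"], record["batch"], record["accuracy"])
```

Lines that do not follow the per-batch format, such as the `[Epoch 2] time cost` summary, return `None`, so you can filter them out when reading a whole log.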


## Next steps

Now that you have completed training and predicting with a neural network on GPUs, you have reached the end of the crash course. Congratulations!
If you would like to learn more, check out [D2L.ai](https://d2l.ai),
[GluonCV](https://cv.gluon.ai/tutorials/index.html), [GluonNLP](https://nlp.gluon.ai),
[GluonTS](https://ts.gluon.ai/), and [AutoGluon](https://auto.gluon.ai).