# ODIN Kaggle - Train on Fashion MNIST and Test on MNIST

This notebook is set up to run on Kaggle, which offers free GPU hours. At a high level, this notebook
- Clones a fork of the ODIN repo, because the original code has bugs under Python 3 (I'm assuming its syntax was valid in Python 2)
- Moves our DenseNet trained on Fashion MNIST, which ships with the cloned repo, into the ODIN `models` folder
- Downloads the MNIST and Fashion MNIST datasets
- Evaluates the model with ODIN and with the baseline (maximum softmax probability)
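Both detectors score an image by its largest softmax probability. The baseline uses the raw logits, while ODIN divides the logits by a temperature T (1000 in the `main.py` run at the end) and also perturbs the input before scoring. The NumPy sketch below shows only the scoring step, using a hypothetical helper `max_softmax`. It uses the same subtract-the-max trick as `calData.py`, which leaves the softmax unchanged but keeps `np.exp` from overflowing.

```python
import numpy as np


def max_softmax(logits, temperature=1.0):
    """Largest softmax probability of a 1-D logit vector.

    temperature=1.0 gives the baseline score; ODIN uses a large
    temperature such as 1000 (and also perturbs the input, not shown here).
    """
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - np.max(z)  # shift so the largest logit is 0; softmax is unchanged
    p = np.exp(z) / np.sum(np.exp(z))
    return float(np.max(p))


logits = [8.0, 2.0, 1.0]
print(max_softmax(logits))          # baseline score, close to 1
print(max_softmax(logits, 1000.0))  # temperature-scaled score, close to 1/3
```

With T = 1000 every score lands just above 1/C for C classes, but the detection metrics depend only on how in- and out-of-distribution scores are ordered, so the small absolute values are not a problem.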

In [1]:
!git clone https://github.com/jiajinghu19/BDL-OOD

Cloning into 'BDL-OOD'...
remote: Enumerating objects: 273, done.
remote: Counting objects: 100% (273/273), done.
remote: Compressing objects: 100% (199/199), done.
remote: Total 273 (delta 114), reused 218 (delta 63), pack-reused 0
Receiving objects: 100% (273/273), 35.91 MiB | 29.53 MiB/s, done.
Resolving deltas: 100% (114/114), done.


In [2]:
# move the pre-trained Fashion MNIST model here
%cd /kaggle/working/BDL-OOD/src/ODIN/odin_fork/models
!mv ../../Densenet_Train_FashionMNIST_Kaggle/Densenet_Train_FashionMNIST_6.15_Percent_Error.pth ./
!ls

/kaggle/working/BDL-OOD/src/ODIN/odin_fork/models
Densenet_Train_FashionMNIST_6.15_Percent_Error.pth


In [3]:
%cd /kaggle/working/BDL-OOD/src/ODIN/odin_fork/code
!rm densenet.py # delete the densenet.py file
!mv ../../Densenet_Train_FashionMNIST_Kaggle/densenet.py ./ # move the correct densenet.py file here

/kaggle/working/BDL-OOD/src/ODIN/odin_fork/code


## Editing `cal.py`

We need to edit `cal.py` in order to
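- Load the saved model with `torch.load`, which unpickles the whole model object (this is why the matching `densenet.py` was copied into `code/` above)
- Normalize both MNIST and Fashion MNIST with the Fashion MNIST mean and standard deviation (0.2860402, 0.3530239), since that is the normalization the model was trained with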
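In `cal.py`, `get_transform` chains `ToTensor`, `CenterCrop(32)` and `Normalize`. For a 28x28 MNIST or Fashion MNIST image, `ToTensor` scales pixels to [0, 1], `CenterCrop(32)` pads two pixels of zeros on each side (recent torchvision releases pad when the crop is larger than the image; treat this as an assumption about the installed version), and `Normalize` applies (x - 0.2860402) / 0.3530239 to every pixel, padding included. A NumPy sketch of the same arithmetic, with `fashion_mnist_transform` as a hypothetical name:

```python
import numpy as np

MEAN, STD = 0.2860402, 0.3530239  # Fashion MNIST statistics used in get_transform


def fashion_mnist_transform(img_uint8, size=32):
    """Mimic ToTensor -> CenterCrop(size) -> Normalize for one grayscale image.

    Assumes the image is no larger than size x size, so the crop only pads.
    """
    x = np.asarray(img_uint8, dtype=np.float32) / 255.0  # ToTensor: uint8 -> [0, 1]
    h, w = x.shape
    if h > size or w > size:
        raise ValueError("this sketch only handles images that need padding")
    top, left = (size - h) // 2, (size - w) // 2
    padded = np.zeros((size, size), dtype=np.float32)  # zero padding around the image
    padded[top:top + h, left:left + w] = x
    return (padded - MEAN) / STD  # Normalize: (x - mean) / std


img = np.full((28, 28), 255, dtype=np.uint8)  # an all-white 28x28 image
out = fashion_mnist_transform(img)
print(out.shape)    # (32, 32)
print(out[0, 0])    # padding pixel: (0 - mean) / std, about -0.81
print(out[16, 16])  # image pixel: (1 - mean) / std, about 2.02
```

Note that the padding does not stay at 0 after normalization, so the border the network sees is a constant of roughly -0.81.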

In [4]:
%%writefile cal.py
# %load cal.py
# Copyright (c) 2017-present, Facebook, Inc.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
#

"""
Created on Sat Sep 19 20:55:56 2015

@author: liangshiyu
"""

from __future__ import print_function
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import time
from scipy import misc
import calMetric as m
import calData as d
#CUDA_DEVICE = 0

start = time.time()
#loading data sets

def get_transform(dataset_name=""):
    # calData.py uses hardcoded image transforms so I'm (Harry) leaving this for now
    normalize_transform = transforms.Normalize( # default is CIFAR-10
        (125.3/255, 123.0/255, 113.9/255),
        (63.0/255, 62.1/255.0, 66.7/255.0)
    )
    # if dataset_name == "SVHN":
    #     normalize_transform = transforms.Normalize(
    #         (0.43768218, 0.44376934, 0.47280428), 
    #         (0.1980301, 0.2010157, 0.19703591)
    #     )
    if dataset_name == "MNIST" or dataset_name == "FashionMNIST":
        normalize_transform = transforms.Normalize((0.2860402,), (0.3530239,)) # this line is important because the model was trained on this data normalization
    return transforms.Compose([
        transforms.ToTensor(),
        transforms.CenterCrop(32),  # zero-pads 28x28 MNIST/Fashion MNIST images to 32x32
        normalize_transform
    ])

criterion = nn.CrossEntropyLoss()

def test(nnName, in_dataset_name, out_data_name, CUDA_DEVICE, epsilon, temperature):
    net1 = torch.load("../models/{}.pth".format(nnName))
    optimizer1 = optim.SGD(net1.parameters(), lr = 0, momentum = 0)
    net1.cuda(CUDA_DEVICE)
    net1.eval() # https://ai-pool.com/d/pytorch---error--expected-more-than-1-value-per-channel-when-training
    
    testset_out = None
    testloader_out = None
    if out_data_name != "Uniform" and out_data_name != "Gaussian": # if the test data is not uniform or Gaussian noise
        if out_data_name == "CIFAR-10": 
            testset_out = torchvision.datasets.CIFAR10(root='../data', train=False, download=True, transform=get_transform("CIFAR-10"))
        elif out_data_name == "SVHN": 
            testset_out = torchvision.datasets.SVHN(root='svhn', split='test', download=True, transform=get_transform("SVHN"))
        elif out_data_name == "MNIST": 
            testset_out = torchvision.datasets.MNIST(root='mnist', train=False, download=True, transform=get_transform("MNIST"))  
        elif out_data_name == "FashionMNIST": 
            testset_out = torchvision.datasets.FashionMNIST(root='fashionmnist', train=False, download=True, transform=get_transform("FashionMNIST"))                               
        else:
            testset_out = torchvision.datasets.ImageFolder("../data/{}".format(out_data_name), transform=get_transform("CIFAR-10")) # load the data from the folder
        testloader_out = torch.utils.data.DataLoader(testset_out, batch_size=1,
                                            shuffle=False, num_workers=2)
        
    testset_in = None
    if in_dataset_name == "CIFAR-10": 
        testset_in = torchvision.datasets.CIFAR10(root='../data', train=False, download=True, transform=get_transform("CIFAR-10"))
    elif in_dataset_name == "CIFAR-100": 
        testset_in = torchvision.datasets.CIFAR100(root='../data', train=False, download=True, transform=get_transform("CIFAR-10"))
    elif in_dataset_name == "SVHN":
        testset_in = torchvision.datasets.SVHN(root='svhn', split='test', download=True, transform=get_transform("SVHN"))
    elif in_dataset_name == "MNIST":
        testset_in = torchvision.datasets.MNIST(root='mnist', train=False, download=True, transform=get_transform("MNIST"))
    elif in_dataset_name == "FashionMNIST":
        testset_in = torchvision.datasets.FashionMNIST(root='fashionmnist', train=False, download=True, transform=get_transform("FashionMNIST"))
    else:
        raise ValueError("Invalid in-distribution dataset name: {}".format(in_dataset_name))
    testloader_in = torch.utils.data.DataLoader(testset_in, batch_size=1,
                                        shuffle=False, num_workers=2)
    
    if out_data_name == "Gaussian":
        d.testGaussian(net1, criterion, CUDA_DEVICE, testloader_in, testloader_in, nnName, out_data_name, epsilon, temperature)
        m.metric(nnName, in_dataset_name, out_data_name)

    elif out_data_name == "Uniform":
        d.testUni(net1, criterion, CUDA_DEVICE, testloader_in, testloader_in, nnName, out_data_name, epsilon, temperature)
        m.metric(nnName, in_dataset_name, out_data_name)
    else:
        d.testData(net1, criterion, CUDA_DEVICE, testloader_in, testloader_out, nnName, in_dataset_name, out_data_name, epsilon, temperature) 
        m.metric(nnName, in_dataset_name, out_data_name)










Overwriting cal.py


## Editing `calData.py`

We need to edit `calData.py` in order to
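- Reshape the model output to `(1, 10)` so it fits the ODIN evaluation code, which expects a batch dimension
- Skip the rescaling of the input gradient by the CIFAR-10 channel standard deviations in `testData` (the line is left commented out), since those constants assume three-channel CIFAR-10 inputs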
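Each loop in `testData` performs ODIN's input preprocessing: take the predicted class y_hat, compute the gradient of the cross-entropy loss of the temperature-scaled logits with respect to the input, step against its sign (`x_tilde = x - eps * sign(grad)`), and score `x_tilde` at temperature T. The sketch below reproduces those steps for a hypothetical linear classifier (logits = W @ x) in NumPy, because its gradient has the closed form `W.T @ (softmax(W @ x / T) - onehot(y_hat)) / T`; the notebook itself gets the gradient from `loss.backward()` on the DenseNet. As in `calData.py`, a gradient entry of exactly 0 is treated as positive.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax of a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def odin_perturb_and_score(W, x, eps, T):
    """ODIN preprocessing and scoring for a hypothetical linear model (logits = W @ x).

    Returns (baseline_score, odin_score, x_tilde).
    """
    logits = W @ x
    baseline = float(np.max(softmax(logits)))         # T = 1, no perturbation
    y_hat = int(np.argmax(logits))                    # predicted class used as the label
    p_T = softmax(logits / T)                         # temperature-scaled probabilities
    onehot = np.zeros_like(p_T)
    onehot[y_hat] = 1.0
    grad = W.T @ (p_T - onehot) / T                   # d CE(W @ x / T, y_hat) / d x
    sign = np.where(grad >= 0, 1.0, -1.0)             # like (torch.ge(grad, 0).float() - 0.5) * 2
    x_tilde = x - eps * sign                          # step that lowers the loss for y_hat
    odin = float(np.max(softmax((W @ x_tilde) / T)))  # confidence on the perturbed input
    return baseline, odin, x_tilde


W = np.eye(2)             # toy 2-class, 2-feature model
x = np.array([1.0, 0.0])
base, odin, x_tilde = odin_perturb_and_score(W, x, eps=0.0014, T=1.0)
# x_tilde is about [1.0014, -0.0014]: the logit gap grows, so odin > base
```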

In [5]:
%%writefile calData.py
# %load calData.py
# Copyright (c) 2017-present, Facebook, Inc.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
#

"""
Created on Sat Sep 19 20:55:56 2015

@author: liangshiyu
"""

from __future__ import print_function
import torch
from torch.autograd import Variable
import numpy as np
import torchvision.transforms as transforms
import time

# this reshape is important because some densenet models output shape (10,) instead of (1, 10); the loss and the nnOutputs[0] indexing below expect a batch dimension
def reshape_output(output):
    return torch.reshape(output, (1, 10))

def testData(net1, criterion, CUDA_DEVICE, testloader_in, testloader_out, nnName, in_data_name, out_data_name, noiseMagnitude1, temper):
    t0 = time.time()
    f1 = open("./softmax_scores/confidence_Base_In.txt", 'w')
    f2 = open("./softmax_scores/confidence_Base_Out.txt", 'w')
    g1 = open("./softmax_scores/confidence_Our_In.txt", 'w')
    g2 = open("./softmax_scores/confidence_Our_Out.txt", 'w')
    N = 10000
    if out_data_name == "iSUN":
        N = 8925
    print("Processing in-distribution images")
########################################In-distribution###########################################
    for j, data in enumerate(testloader_in):
        if j<1000: continue
        images, _ = data
        
        inputs = Variable(images.cuda(CUDA_DEVICE), requires_grad = True)
        outputs = reshape_output(net1(inputs))
        

        # Calculating the confidence of the output, no perturbation added here, no temperature scaling used
        nnOutputs = outputs.data.cpu()
        nnOutputs = nnOutputs.numpy()
        nnOutputs = nnOutputs[0]
        nnOutputs = nnOutputs - np.max(nnOutputs)
        nnOutputs = np.exp(nnOutputs)/np.sum(np.exp(nnOutputs))
        f1.write("{}, {}, {}\n".format(temper, noiseMagnitude1, np.max(nnOutputs)))
        
        # Using temperature scaling
        outputs = outputs / temper
	
        # Calculating the perturbation we need to add, that is,
        # the sign of gradient of cross entropy loss w.r.t. input
        maxIndexTemp = np.argmax(nnOutputs)
        labels = Variable(torch.LongTensor([maxIndexTemp]).cuda(CUDA_DEVICE))
        loss = criterion(outputs, labels)
        loss.backward()
        
        # Take the sign of the gradient as a tensor of -1s and +1s (zero counts as +1)
        gradient =  torch.ge(inputs.grad.data, 0)
        gradient = (gradient.float() - 0.5) * 2
        # Normalizing the gradient to the same space of image
#         gradient[0][0] = (gradient[0][0] )/(0.5)
        # Adding small perturbations to images
        tempInputs = torch.add(inputs.data, gradient, alpha=-noiseMagnitude1)
        outputs = reshape_output(net1(Variable(tempInputs)))
        outputs = outputs / temper
        # Calculating the confidence after adding perturbations
        nnOutputs = outputs.data.cpu()
        nnOutputs = nnOutputs.numpy()
        nnOutputs = nnOutputs[0]
        nnOutputs = nnOutputs - np.max(nnOutputs)
        nnOutputs = np.exp(nnOutputs)/np.sum(np.exp(nnOutputs))
        g1.write("{}, {}, {}\n".format(temper, noiseMagnitude1, np.max(nnOutputs)))
        if j % 100 == 99:
            print("{:4}/{:4} images processed, {:.1f} seconds used.".format(j+1-1000, N-1000, time.time()-t0))
            t0 = time.time()
        
        if j == N - 1: break


    t0 = time.time()
    print("Processing out-of-distribution images")
###################################Out-of-Distributions#####################################
    for j, data in enumerate(testloader_out):
        if j<1000: continue
        images, _ = data
    
    
        inputs = Variable(images.cuda(CUDA_DEVICE), requires_grad = True)
        outputs = reshape_output(net1(inputs))
        


        # Calculating the confidence of the output, no perturbation added here
        nnOutputs = outputs.data.cpu()
        nnOutputs = nnOutputs.numpy()
        nnOutputs = nnOutputs[0]
        nnOutputs = nnOutputs - np.max(nnOutputs)
        nnOutputs = np.exp(nnOutputs)/np.sum(np.exp(nnOutputs))
        f2.write("{}, {}, {}\n".format(temper, noiseMagnitude1, np.max(nnOutputs)))
        
        # Using temperature scaling
        outputs = outputs / temper
  
  
        # Calculating the perturbation we need to add, that is,
        # the sign of gradient of cross entropy loss w.r.t. input
        maxIndexTemp = np.argmax(nnOutputs)
        labels = Variable(torch.LongTensor([maxIndexTemp]).cuda(CUDA_DEVICE))
        loss = criterion(outputs, labels)
        loss.backward()
        
        # Take the sign of the gradient as a tensor of -1s and +1s (zero counts as +1)
        gradient =  (torch.ge(inputs.grad.data, 0))
        gradient = (gradient.float() - 0.5) * 2
        # Normalizing the gradient to the same space of image
#         gradient[0][0] = (gradient[0][0] )/(0.5)
        # Adding small perturbations to images
        tempInputs = torch.add(inputs.data, gradient, alpha=-noiseMagnitude1)
        outputs = reshape_output(net1(Variable(tempInputs)))
        outputs = outputs / temper
        # Calculating the confidence after adding perturbations
        nnOutputs = outputs.data.cpu()
        nnOutputs = nnOutputs.numpy()
        nnOutputs = nnOutputs[0]
        nnOutputs = nnOutputs - np.max(nnOutputs)
        nnOutputs = np.exp(nnOutputs)/np.sum(np.exp(nnOutputs))
        g2.write("{}, {}, {}\n".format(temper, noiseMagnitude1, np.max(nnOutputs)))
        if j % 100 == 99:
            print("{:4}/{:4} images processed, {:.1f} seconds used.".format(j+1-1000, N-1000, time.time()-t0))
            t0 = time.time()

        if j== N-1: break




def testGaussian(net1, criterion, CUDA_DEVICE, testloader_in, testloader_out, nnName, out_data_name, noiseMagnitude1, temper):
    t0 = time.time()
    f1 = open("./softmax_scores/confidence_Base_In.txt", 'w')
    f2 = open("./softmax_scores/confidence_Base_Out.txt", 'w')
    g1 = open("./softmax_scores/confidence_Our_In.txt", 'w')
    g2 = open("./softmax_scores/confidence_Our_Out.txt", 'w')
########################################In-Distribution###############################################
    N = 10000
    print("Processing in-distribution images")
    for j, data in enumerate(testloader_in):
        
        if j<1000: continue
        images, _ = data
        
        inputs = Variable(images.cuda(CUDA_DEVICE), requires_grad = True)
        outputs = net1(inputs)
        
        
        # Calculating the confidence of the output, no perturbation added here
        nnOutputs = outputs.data.cpu()
        nnOutputs = nnOutputs.numpy()
        nnOutputs = nnOutputs[0]
        nnOutputs = nnOutputs - np.max(nnOutputs)
        nnOutputs = np.exp(nnOutputs)/np.sum(np.exp(nnOutputs))
        f1.write("{}, {}, {}\n".format(temper, noiseMagnitude1, np.max(nnOutputs)))
        
        # Using temperature scaling
        outputs = outputs / temper
        
        # Calculating the perturbation we need to add, that is,
        # the sign of gradient of cross entropy loss w.r.t. input
        maxIndexTemp = np.argmax(nnOutputs)
        labels = Variable(torch.LongTensor([maxIndexTemp]).cuda(CUDA_DEVICE))
        loss = criterion(outputs, labels)
        loss.backward()
        
        
        # Take the sign of the gradient as a tensor of -1s and +1s (zero counts as +1)
        gradient =  (torch.ge(inputs.grad.data, 0))
        gradient = (gradient.float() - 0.5) * 2
        # Normalizing the gradient to the same space of image
        gradient[0][0] = (gradient[0][0] )/(63.0/255.0)
        gradient[0][1] = (gradient[0][1] )/(62.1/255.0)
        gradient[0][2] = (gradient[0][2])/(66.7/255.0)
        # Adding small perturbations to images
        tempInputs = torch.add(inputs.data, gradient, alpha=-noiseMagnitude1)
        outputs = net1(Variable(tempInputs))
        outputs = outputs / temper
        # Calculating the confidence after adding perturbations
        nnOutputs = outputs.data.cpu()
        nnOutputs = nnOutputs.numpy()
        nnOutputs = nnOutputs[0]
        nnOutputs = nnOutputs - np.max(nnOutputs)
        nnOutputs = np.exp(nnOutputs)/np.sum(np.exp(nnOutputs))

        g1.write("{}, {}, {}\n".format(temper, noiseMagnitude1, np.max(nnOutputs)))
        if j % 100 == 99:
            print("{:4}/{:4} images processed, {:.1f} seconds used.".format(j+1-1000, N-1000, time.time()-t0))
            t0 = time.time()

    
    
########################################Out-of-Distribution######################################
    print("Processing out-of-distribution images")
    for j, data in enumerate(testloader_out):
        if j<1000: continue
        
        images = torch.randn(1,3,32,32) + 0.5
        images = torch.clamp(images, 0, 1)
        images[0][0] = (images[0][0] - 125.3/255) / (63.0/255)
        images[0][1] = (images[0][1] - 123.0/255) / (62.1/255)
        images[0][2] = (images[0][2] - 113.9/255) / (66.7/255)
        
        
        inputs = Variable(images.cuda(CUDA_DEVICE), requires_grad = True)
        outputs = net1(inputs)
        
        
        
        # Calculating the confidence of the output, no perturbation added here
        nnOutputs = outputs.data.cpu()
        nnOutputs = nnOutputs.numpy()
        nnOutputs = nnOutputs[0]
        nnOutputs = nnOutputs - np.max(nnOutputs)
        nnOutputs = np.exp(nnOutputs)/np.sum(np.exp(nnOutputs))
        f2.write("{}, {}, {}\n".format(temper, noiseMagnitude1, np.max(nnOutputs)))
        
        # Using temperature scaling
        outputs = outputs / temper
        
        # Calculating the perturbation we need to add, that is,
        # the sign of gradient of cross entropy loss w.r.t. input
        maxIndexTemp = np.argmax(nnOutputs)
        labels = Variable(torch.LongTensor([maxIndexTemp]).cuda(CUDA_DEVICE))
        loss = criterion(outputs, labels)
        loss.backward()
        
        # Take the sign of the gradient as a tensor of -1s and +1s (zero counts as +1)
        gradient =  (torch.ge(inputs.grad.data, 0))
        gradient = (gradient.float() - 0.5) * 2
        # Normalizing the gradient to the same space of image
        gradient[0][0] = (gradient[0][0] )/(63.0/255.0)
        gradient[0][1] = (gradient[0][1] )/(62.1/255.0)
        gradient[0][2] = (gradient[0][2])/(66.7/255.0)
        # Adding small perturbations to images
        tempInputs = torch.add(inputs.data, gradient, alpha=-noiseMagnitude1)
        outputs = net1(Variable(tempInputs))
        outputs = outputs / temper
        # Calculating the confidence after adding perturbations
        nnOutputs = outputs.data.cpu()
        nnOutputs = nnOutputs.numpy()
        nnOutputs = nnOutputs[0]
        nnOutputs = nnOutputs - np.max(nnOutputs)
        nnOutputs = np.exp(nnOutputs)/np.sum(np.exp(nnOutputs))
        g2.write("{}, {}, {}\n".format(temper, noiseMagnitude1, np.max(nnOutputs)))
        
        if j % 100 == 99:
            print("{:4}/{:4} images processed, {:.1f} seconds used.".format(j+1-1000, N-1000, time.time()-t0))
            t0 = time.time()

        if j== N-1: break




def testUni(net1, criterion, CUDA_DEVICE, testloader_in, testloader_out, nnName, out_data_name, noiseMagnitude1, temper):
    t0 = time.time()
    f1 = open("./softmax_scores/confidence_Base_In.txt", 'w')
    f2 = open("./softmax_scores/confidence_Base_Out.txt", 'w')
    g1 = open("./softmax_scores/confidence_Our_In.txt", 'w')
    g2 = open("./softmax_scores/confidence_Our_Out.txt", 'w')
########################################In-Distribution###############################################
    N = 10000
    print("Processing in-distribution images")
    for j, data in enumerate(testloader_in):
        if j<1000: continue
        
        images, _ = data
        
        inputs = Variable(images.cuda(CUDA_DEVICE), requires_grad = True)
        outputs = net1(inputs)
        
        
        # Calculating the confidence of the output, no perturbation added here
        nnOutputs = outputs.data.cpu()
        nnOutputs = nnOutputs.numpy()
        nnOutputs = nnOutputs[0]
        nnOutputs = nnOutputs - np.max(nnOutputs)
        nnOutputs = np.exp(nnOutputs)/np.sum(np.exp(nnOutputs))
        f1.write("{}, {}, {}\n".format(temper, noiseMagnitude1, np.max(nnOutputs)))
        
        # Using temperature scaling
        outputs = outputs / temper
        
        # Calculating the perturbation we need to add, that is,
        # the sign of gradient of cross entropy loss w.r.t. input
        maxIndexTemp = np.argmax(nnOutputs)
        labels = Variable(torch.LongTensor([maxIndexTemp]).cuda(CUDA_DEVICE))
        loss = criterion(outputs, labels)
        loss.backward()
        
        
        # Take the sign of the gradient as a tensor of -1s and +1s (zero counts as +1)
        gradient =  (torch.ge(inputs.grad.data, 0))
        gradient = (gradient.float() - 0.5) * 2
        # Normalizing the gradient to the same space of image
        gradient[0][0] = (gradient[0][0] )/(63.0/255.0)
        gradient[0][1] = (gradient[0][1] )/(62.1/255.0)
        gradient[0][2] = (gradient[0][2])/(66.7/255.0)
        # Adding small perturbations to images
        tempInputs = torch.add(inputs.data, gradient, alpha=-noiseMagnitude1)
        outputs = net1(Variable(tempInputs))
        outputs = outputs / temper
        # Calculating the confidence after adding perturbations
        nnOutputs = outputs.data.cpu()
        nnOutputs = nnOutputs.numpy()
        nnOutputs = nnOutputs[0]
        nnOutputs = nnOutputs - np.max(nnOutputs)
        nnOutputs = np.exp(nnOutputs)/np.sum(np.exp(nnOutputs))

        g1.write("{}, {}, {}\n".format(temper, noiseMagnitude1, np.max(nnOutputs)))
        if j % 100 == 99:
            print("{:4}/{:4} images processed, {:.1f} seconds used.".format(j+1-1000, N-1000, time.time()-t0))
            t0 = time.time()



########################################Out-of-Distribution######################################
    print("Processing out-of-distribution images")
    for j, data in enumerate(testloader_out):
        if j<1000: continue
        
        images = torch.rand(1,3,32,32)
        images[0][0] = (images[0][0] - 125.3/255) / (63.0/255)
        images[0][1] = (images[0][1] - 123.0/255) / (62.1/255)
        images[0][2] = (images[0][2] - 113.9/255) / (66.7/255)
        
        
        inputs = Variable(images.cuda(CUDA_DEVICE), requires_grad = True)
        outputs = net1(inputs)
        
        
        
        # Calculating the confidence of the output, no perturbation added here
        nnOutputs = outputs.data.cpu()
        nnOutputs = nnOutputs.numpy()
        nnOutputs = nnOutputs[0]
        nnOutputs = nnOutputs - np.max(nnOutputs)
        nnOutputs = np.exp(nnOutputs)/np.sum(np.exp(nnOutputs))
        f2.write("{}, {}, {}\n".format(temper, noiseMagnitude1, np.max(nnOutputs)))
        
        # Using temperature scaling
        outputs = outputs / temper
        
        # Calculating the perturbation we need to add, that is,
        # the sign of gradient of cross entropy loss w.r.t. input
        maxIndexTemp = np.argmax(nnOutputs)
        labels = Variable(torch.LongTensor([maxIndexTemp]).cuda(CUDA_DEVICE))
        loss = criterion(outputs, labels)
        loss.backward()
        
        # Take the sign of the gradient as a tensor of -1s and +1s (zero counts as +1)
        gradient =  (torch.ge(inputs.grad.data, 0))
        gradient = (gradient.float() - 0.5) * 2
        # Normalizing the gradient to the same space of image
        gradient[0][0] = (gradient[0][0] )/(63.0/255.0)
        gradient[0][1] = (gradient[0][1] )/(62.1/255.0)
        gradient[0][2] = (gradient[0][2])/(66.7/255.0)
        # Adding small perturbations to images
        tempInputs = torch.add(inputs.data, gradient, alpha=-noiseMagnitude1)
        outputs = net1(Variable(tempInputs))
        outputs = outputs / temper
        # Calculating the confidence after adding perturbations
        nnOutputs = outputs.data.cpu()
        nnOutputs = nnOutputs.numpy()
        nnOutputs = nnOutputs[0]
        nnOutputs = nnOutputs - np.max(nnOutputs)
        nnOutputs = np.exp(nnOutputs)/np.sum(np.exp(nnOutputs))
        g2.write("{}, {}, {}\n".format(temper, noiseMagnitude1, np.max(nnOutputs)))
        if j % 100 == 99:
            print("{:4}/{:4} images processed, {:.1f} seconds used.".format(j+1-1000, N-1000, time.time()-t0))
            t0 = time.time()

        if j== N-1: break


Overwriting calData.py
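`testData` writes one line per image, `temperature, magnitude, max softmax`, to `confidence_Base_In.txt` / `confidence_Base_Out.txt` (baseline) and `confidence_Our_In.txt` / `confidence_Our_Out.txt` (ODIN) under `./softmax_scores/`, and `calMetric.py` turns those files into detection metrics. `calMetric.py` is not reproduced here. As an illustration of one common out-of-distribution metric, the false positive rate at 95% true positive rate, here is a minimal NumPy sketch that treats in-distribution images as positives. The helper names `read_scores` and `fpr_at_95_tpr` are hypothetical, and the percentile-based threshold is an assumption, not necessarily what `calMetric.py` does.

```python
import io

import numpy as np


def read_scores(f):
    """Parse lines of the form 'temperature, magnitude, score' and return the scores."""
    return np.array([float(line.split(",")[2]) for line in f if line.strip()])


def fpr_at_95_tpr(in_scores, out_scores):
    """Fraction of OOD images accepted when the threshold keeps ~95% of in-distribution images.

    Images with score >= threshold are classified as in-distribution.
    """
    threshold = np.percentile(in_scores, 5)  # roughly 95% of in-dist scores are >= this
    return float(np.mean(out_scores >= threshold))


# Tiny in-memory stand-ins for confidence_Our_In.txt / confidence_Our_Out.txt
in_file = io.StringIO("1000, 0.0014, 0.9\n" * 19 + "1000, 0.0014, 0.1\n")
out_file = io.StringIO("1000, 0.0014, 0.2\n" * 3 + "1000, 0.0014, 0.95\n")
print(fpr_at_95_tpr(read_scores(in_file), read_scores(out_file)))  # 0.25
```

On the real outputs, pass open file handles such as `open('./softmax_scores/confidence_Our_In.txt')` from the `code` directory where `testData` wrote them, and compare the `Base` and `Our` pairs; a lower value means fewer MNIST images are mistaken for Fashion MNIST.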


In [6]:
!python main.py --nn Densenet_Train_FashionMNIST_6.15_Percent_Error --in_dataset FashionMNIST --out_dataset MNIST --magnitude 0.0014 --temperature 1000 --gpu 0

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to mnist/MNIST/raw/train-images-idx3-ubyte.gz
9913344it [00:00, 31182563.38it/s]                                              
Extracting mnist/MNIST/raw/train-images-idx3-ubyte.gz to mnist/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to mnist/MNIST/raw/train-labels-idx1-ubyte.gz
29696it [00:00, 36915842.20it/s]                                                
Extracting mnist/MNIST/raw/train-labels-idx1-ubyte.gz to mnist/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to mnist/MNIST/raw/t10k-images-idx3-ubyte.gz
1649664it [00:00, 38648444.18it/s]                                              
Extracting mnist/MNIST/raw/t1