# What's this PyTorch business?

You've written a lot of code in this assignment to provide a whole host of neural network functionality. Dropout, Batch Norm, and 2D convolutions are some of the workhorses of deep learning in computer vision. You've also worked hard to make your code efficient and vectorized.

For the last part of this assignment, though, we're going to leave behind your beautiful codebase and instead migrate to one of two popular deep learning frameworks: in this instance, PyTorch (or TensorFlow, if you choose to use that notebook).

### What is PyTorch?

PyTorch is a system for executing dynamic computational graphs over Tensor objects that behave similarly to numpy ndarrays. It comes with a powerful automatic differentiation engine that removes the need for manual back-propagation.

### Why?

* Our code will now run on GPUs! Much faster training. When using a framework like PyTorch or TensorFlow you can harness the power of the GPU for your own custom neural network architectures without having to write CUDA code directly (which is beyond the scope of this class).
* We want you to be ready to use one of these frameworks for your project so you can experiment more efficiently than if you were writing every feature you want to use by hand. 
* We want you to stand on the shoulders of giants! TensorFlow and PyTorch are both excellent frameworks that will make your lives a lot easier, and now that you understand their guts, you are free to use them :) 
* We want you to be exposed to the sort of deep learning code you might run into in academia or industry.

### PyTorch versions
This notebook assumes that you are using **PyTorch version 1.0**. In some earlier versions (e.g. before 0.4), Tensors had to be wrapped in Variable objects to be used in autograd; Variables have since been deprecated. In addition, 1.0 separates a Tensor's datatype from its device, and uses numpy-style factories for constructing Tensors rather than directly invoking Tensor constructors.
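
As a rough illustration of the API style described above, the sketch below builds Tensors with numpy-style factory functions and passes the datatype and the device as separate arguments. The shapes and values are arbitrary, and the CPU device is chosen only so that the snippet runs on any machine.

```python
import torch

# dtype and device are independent keyword arguments to the factory functions.
cpu = torch.device('cpu')
a = torch.zeros((2, 3), dtype=torch.float32, device=cpu)
b = torch.ones((2, 3), dtype=torch.int64, device=cpu)

# .to() can change the dtype, the device, or both.
c = b.to(dtype=torch.float32)

print(a.dtype, a.device)
print(c.dtype, c.device)
```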

## How will I learn PyTorch?

Justin Johnson has made an excellent [tutorial](https://github.com/jcjohnson/pytorch-examples) for PyTorch. 

You can also find the detailed [API doc](http://pytorch.org/docs/stable/index.html) here. If you have other questions that are not addressed by the API docs, the [PyTorch forum](https://discuss.pytorch.org/) is a much better place to ask than StackOverflow.


# Table of Contents

This assignment has 5 parts. You will learn PyTorch on **three different levels of abstraction**, which will help you understand it better and prepare you for the final project. 

1. Part I, Preparation: we will use the CIFAR-10 dataset.
2. Part II, Barebones PyTorch: **Abstraction level 1**, we will work directly with the lowest-level PyTorch Tensors. 
3. Part III, PyTorch Module API: **Abstraction level 2**, we will use `nn.Module` to define arbitrary neural network architectures. 
4. Part IV, PyTorch Sequential API: **Abstraction level 3**, we will use `nn.Sequential` to define a linear feed-forward network very conveniently. 
5. Part V, CIFAR-10 open-ended challenge: please implement your own network to get as high accuracy as possible on CIFAR-10. You can experiment with any layer, optimizer, hyperparameters or other advanced features. 

Here is a table of comparison:

| API           | Flexibility | Convenience |
|---------------|-------------|-------------|
| Barebone      | High        | Low         |
| `nn.Module`     | High        | Medium      |
| `nn.Sequential` | Low         | High        |

In [1]:
import torch
print(torch.cuda.current_device())  # 0
print(torch.cuda.device(0))         # Out[3]: <torch.cuda.device at 0x7efce0b03be0>
print(torch.cuda.device_count())    # Out[4]: 1
print(torch.cuda.get_device_name(0))# Out[5]: Tesla K80
print(torch.cuda.is_available())    # Out[6]: True

0
<torch.cuda.device object at 0x7f4694c91160>
1
Tesla K80
True



# Real work (official cs231n stuff):

# Part I. Preparation

First, we load the CIFAR-10 dataset. This might take a couple minutes the first time you do it, but the files should stay cached after that.

In previous parts of the assignment we had to write our own code to download the CIFAR-10 dataset, preprocess it, and iterate through it in minibatches; PyTorch provides convenient tools to automate this process for us.
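
As a small sketch of those tools, the snippet below splits one dataset into two disjoint loaders by handing each `DataLoader` a `SubsetRandomSampler` over a different index range. The ten-example `TensorDataset` is a synthetic stand-in for CIFAR-10, and the names `train_loader` and `val_loader` are made up for this illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data import sampler

# Synthetic stand-in for a dataset: 10 examples with 4 features each.
data = torch.arange(40, dtype=torch.float32).view(10, 4)
labels = torch.arange(10)
dataset = TensorDataset(data, labels)

# Two loaders over the same dataset, restricted to disjoint index ranges.
train_loader = DataLoader(dataset, batch_size=4,
                          sampler=sampler.SubsetRandomSampler(range(8)))
val_loader = DataLoader(dataset, batch_size=4,
                        sampler=sampler.SubsetRandomSampler(range(8, 10)))

# Collect the labels each loader yields; batch order is random, so sort them.
train_labels = sorted(int(y) for _, yb in train_loader for y in yb)
val_labels = sorted(int(y) for _, yb in val_loader for y in yb)
print(train_labels)
print(val_labels)
```

Each loader only ever sees the indices given to its sampler, which is how a single training set can serve as both a train split and a validation split.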

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data import sampler

import torchvision.datasets as dset
import torchvision.transforms as T

import numpy as np

In [3]:
NUM_TRAIN = 49000

# The torchvision.transforms package provides tools for preprocessing data
# and for performing data augmentation; here we set up a transform to
# preprocess the data by subtracting the mean RGB value and dividing by the
# standard deviation of each RGB value; we've hardcoded the mean and std.                Why would they hardcode these???? -nxb
transform = T.Compose([
                T.ToTensor(),
                T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
            ])

# We set up a Dataset object for each split (train / val / test); Datasets load
# training examples one at a time, so we wrap each Dataset in a DataLoader which
# iterates through the Dataset and forms minibatches. We divide the CIFAR-10
# training set into train and val sets by passing a Sampler object to the
# DataLoader telling how it should sample from the underlying Dataset.
cifar10_train = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                             transform=transform)
loader_train = DataLoader(cifar10_train, batch_size=64, 
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

cifar10_val = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                           transform=transform)
loader_val = DataLoader(cifar10_val, batch_size=64, 
                        sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, 50000)))

cifar10_test = dset.CIFAR10('./cs231n/datasets', train=False, download=True, 
                            transform=transform)
loader_test = DataLoader(cifar10_test, batch_size=64)

Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
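
To make the `T.Normalize` step concrete, here is a minimal sketch of the same per-channel arithmetic written with plain Tensor broadcasting. The tiny 3 x 2 x 2 "image" is made up for illustration; the mean and std values are the ones hardcoded in the cell above, which are presumably statistics precomputed once over the CIFAR-10 training images so that they do not have to be recomputed on every run.

```python
import torch

# Per-channel statistics copied from the transform above.
mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(3, 1, 1)
std = torch.tensor([0.2023, 0.1994, 0.2010]).view(3, 1, 1)

# A made-up image of shape (C, H, W) with values in [0, 1], as ToTensor produces.
img = torch.full((3, 2, 2), 0.5)

# Broadcasting subtracts each channel's mean and divides by its std.
normalized = (img - mean) / std
print(normalized[:, 0, 0])
```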


In [None]:
import sys
#for varname in dir():
    #print("{0}   has size {1}".format(varname,   sys.getsizeof(eval(varname))  )     )

ipython_vars = ['In', 'Out', 'exit', 'quit', 'get_ipython', 'ipython_vars']

# Get a sorted list of the objects and their sizes
sorted([(x, sys.getsizeof(globals().get(x))) for x in dir() if not x.startswith('_') and x not in sys.modules and x not in ipython_vars], key=lambda x: x[1], reverse=True)

    

You have the option to **use a GPU by setting the flag to True below**. It is not necessary to use a GPU for this assignment. Note that if your computer does not have CUDA enabled, `torch.cuda.is_available()` will return False and this notebook will fall back to CPU mode.

The global variables `dtype` and `device` will control the data types throughout this assignment. 

In [5]:
#=====================================================================================================================
#=====================================================================================================================
#===================================================== TODO ==========================================================
#=====================================================================================================================
#=====================================================================================================================

USE_GPU = True

#=====================================================================================================================
#=====================================================================================================================
#=====================================================================================================================
#=====================================================================================================================
#=====================================================================================================================

dtype = torch.float32 # we will be using float throughout this tutorial

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

# Constant to control how frequently we print train loss
print_every = 100

print('using device:', device)

using device: cuda


# Part II. Barebones PyTorch

PyTorch ships with high-level APIs to help us define model architectures conveniently, which we will cover in Part III of this tutorial. In this section, we will start with the barebones PyTorch elements to understand the autograd engine better. After this exercise, you will come to appreciate the high-level model API more.

We will start with a simple two-layer fully-connected ReLU network (one hidden layer, no biases) for CIFAR classification. 
This implementation computes the forward pass using operations on PyTorch Tensors, and uses PyTorch autograd to compute gradients. It is important that you understand every line, because you will write a harder version after the example.

When we create a PyTorch Tensor with `requires_grad=True`, operations involving that Tensor will not just compute values; they will also build up a computational graph in the background, allowing us to easily backpropagate through the graph to compute gradients of a downstream loss with respect to some Tensors. Concretely, if x is a Tensor with `x.requires_grad == True`, then after backpropagation `x.grad` will be another Tensor holding the gradient of the scalar loss at the end with respect to x.
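
A minimal sketch of this behaviour, using a made-up three-element Tensor and a toy "loss" (the sum of squares) that is not part of the assignment:

```python
import torch

# A leaf Tensor that autograd should track.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Scalar "loss" built from x; the graph is recorded as these ops run.
loss = (x ** 2).sum()

# Backpropagate: fills x.grad with d(loss)/dx = 2 * x.
loss.backward()
print(x.grad)  # tensor([2., 4., 6.])
```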

### PyTorch Tensors: Flatten Function
A PyTorch Tensor is conceptually similar to a numpy array: it is an n-dimensional grid of numbers, and, like numpy, PyTorch provides many functions to efficiently operate on Tensors. As a simple example, we provide a `flatten` function below which reshapes image data for use in a fully-connected neural network.

Recall that image data is typically stored in a Tensor of shape N x C x H x W, where:

* N is the number of datapoints
* C is the number of channels
* H is the height of the intermediate feature map in pixels
* W is the width of the intermediate feature map in pixels

This is the right way to represent the data when we are doing something like a 2D convolution, which needs a spatial understanding of where the intermediate features are relative to each other. When we use fully connected affine layers to process the image, however, we want each datapoint to be represented by a single vector; it's no longer useful to segregate the different channels, rows, and columns of the data. So, we use a "flatten" operation to collapse the `C x H x W` values per datapoint into a single long vector. The flatten function below first reads the batch size N from a given batch of data, and then returns a "view" of that data. "View" is analogous to numpy's "reshape" method: it reshapes x's dimensions to be N x ??, where ?? is allowed to be anything (in this case it will be C x H x W, but we don't need to specify that explicitly). 
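
The short sketch below, with an arbitrary 2 x 3 x 4 x 4 Tensor of zeros, shows the two properties of `view` that matter here: the `-1` dimension is inferred from the remaining elements, and the result shares storage with the original Tensor rather than copying it.

```python
import torch

x = torch.zeros((2, 3, 4, 4))   # N=2, C=3, H=4, W=4
flat = x.view(2, -1)            # -1 is inferred as 3 * 4 * 4 = 48
print(flat.shape)               # torch.Size([2, 48])

# A view shares storage with the original Tensor, so no data is copied.
flat[0, 0] = 7.0
print(x[0, 0, 0, 0].item())     # 7.0
```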

In [6]:
def flatten(x):
    N = x.shape[0] # read in N; the remaining dims (C, H, W) are inferred by -1 below
    return x.view(N, -1)  # "flatten" the C * H * W values into a single vector per image

def test_flatten():
    x = torch.arange(12).view(2, 1, 3, 2)
    print('Before flattening: ', x)
    print('After flattening: ', flatten(x))

test_flatten()

Before flattening:  tensor([[[[ 0,  1],
          [ 2,  3],
          [ 4,  5]]],


        [[[ 6,  7],
          [ 8,  9],
          [10, 11]]]])
After flattening:  tensor([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11]])


### Barebones PyTorch: Two-Layer Network

Here we define a function `two_layer_fc` which performs the forward pass of a two-layer fully-connected ReLU network on a batch of image data. After defining the forward pass we check that it doesn't crash and that it produces outputs of the right shape by running zeros through the network.

You don't have to write any code here, but it's important that you read and understand the implementation.

In [7]:
import torch.nn.functional as F  # useful stateless functions

def two_layer_fc(x, params):
    """
    A fully-connected neural network; the architecture is:
    fully connected layer -> ReLU -> fully connected layer.
    Note that this function only defines the forward pass; 
    PyTorch will take care of the backward pass for us.
    
    The input to the network will be a minibatch of data, of shape
    (N, d1, ..., dM) where d1 * ... * dM = D. The hidden layer will have H units,
    and the output layer will produce scores for C classes.
    
    Inputs:
    - x: A PyTorch Tensor of shape (N, d1, ..., dM) giving a minibatch of
      input data.
    - params: A list [w1, w2] of PyTorch Tensors giving weights for the network;
      w1 has shape (D, H) and w2 has shape (H, C).
    
    Returns:
    - scores: A PyTorch Tensor of shape (N, C) giving classification scores for
      the input data x.
    """
    # first we flatten the image:
    x = flatten(x)  # shape: [batch_size, C x H x W]               AKA [N,D]   -nxb, July 30, 2019
    # NOTE:    x is a MINI-batch.  -nxb,   July 30, 2019


    w1, w2 = params

    # Forward pass: compute predicted y using operations on Tensors. Since w1 and
    # w2 have requires_grad=True, operations involving these Tensors will cause
    # PyTorch to build a computational graph, allowing automatic computation of
    # gradients. Since we are no longer implementing the backward pass by hand we
    # don't need to keep references to intermediate values.
    # you can also use `.clamp(min=0)`, equivalent to F.relu()
    x = F.relu(x.mm(w1))      # mm == matmul()
    x = x.mm(w2)
    return x
    

def two_layer_fc_test():
    hidden_layer_size = 42
    x = torch.zeros((64, 50), dtype=dtype)  # minibatch size 64, feature dimension 50
    # requires_grad defaults to False; that is fine here because this test only
    # checks output shapes and never calls backward().
    # See https://pytorch.org/docs/stable/notes/autograd.html
    w1 = torch.zeros((50, hidden_layer_size), dtype=dtype)
    w2 = torch.zeros((hidden_layer_size, 10), dtype=dtype)

    scores = two_layer_fc(x, [w1, w2])
    print(scores.size())  # you should see [64, 10]

two_layer_fc_test()

torch.Size([64, 10])


### Barebones PyTorch: Three-Layer ConvNet

Here you will complete the implementation of the function `three_layer_convnet`, which will perform the forward pass of a three-layer convolutional network. Like above, we can immediately test our implementation by passing zeros through the network. The network should have the following architecture:

1. A convolutional layer (with bias) with `channel_1` filters, each with shape `KW1 x KH1`, and zero-padding of two
2. ReLU nonlinearity
3. A convolutional layer (with bias) with `channel_2` filters, each with shape `KW2 x KH2`, and zero-padding of one
4. ReLU nonlinearity
5. Fully-connected layer with bias, producing scores for C classes.

Note that we have **no softmax activation** here after our fully-connected layer: this is because PyTorch's cross entropy loss performs a softmax activation for you, and bundling that step in makes computation more efficient.
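
As a quick check of that claim, the sketch below compares `F.cross_entropy` on raw scores with an explicit log-softmax followed by the negative log-likelihood loss. The scores and labels are random, made-up values rather than outputs of the network in this assignment.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(4, 10)          # made-up raw scores: 4 examples, 10 classes
y = torch.tensor([1, 0, 4, 9])       # made-up labels

# cross_entropy takes raw scores directly...
loss_a = F.cross_entropy(scores, y)
# ...and matches an explicit log-softmax followed by negative log-likelihood.
loss_b = F.nll_loss(F.log_softmax(scores, dim=1), y)

print(torch.allclose(loss_a, loss_b))  # True
```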

**HINT**: For convolutions: http://pytorch.org/docs/stable/nn.html#torch.nn.functional.conv2d; pay attention to the shapes of convolutional filters!

In [8]:
def three_layer_convnet(x, params):
    """
    Performs the forward pass of a three-layer convolutional network with the
    architecture defined above.

    Inputs:
    - x: A PyTorch Tensor of shape (N, 3, H, W) giving a minibatch of images
    - params: A list of PyTorch Tensors giving the weights and biases for the
      network; should contain the following:
      - conv_w1: PyTorch Tensor of shape (channel_1, 3, KH1, KW1) giving weights
        for the first convolutional layer
      - conv_b1: PyTorch Tensor of shape (channel_1,) giving biases for the first
        convolutional layer
      - conv_w2: PyTorch Tensor of shape (channel_2, channel_1, KH2, KW2) giving
        weights for the second convolutional layer
      - conv_b2: PyTorch Tensor of shape (channel_2,) giving biases for the second
        convolutional layer
      - fc_w: PyTorch Tensor giving weights for the fully-connected layer. Can you
        figure out what the shape should be?
      - fc_b: PyTorch Tensor giving biases for the fully-connected layer. Can you
        figure out what the shape should be?
    
    Returns:
    - scores: PyTorch Tensor of shape (N, C) giving classification scores for x
    """
    conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
    scores = None
    ################################################################################
    # TODO: Implement the forward pass for the three-layer ConvNet.                #       DONE.
    ################################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    #print("Before 1st conv,   x.shape = ", x.shape) # torch.Size([64, 3, 32, 32])

    
    # "x" was the shortest var name we could use.  Also, it helped me solve a dropout bug back earlier in assn 2 because adding dropout was optional so I didn't want to have to keep track of a thousand different variables names like "relu_out1" and "affine_out_2"    -nxb
    x = F.conv2d(x, conv_w1, bias=conv_b1, stride=1, padding=2, dilation=1, groups=1)
    # ReLU1:
    x = x.clamp(min=0)
    #print(x.shape)
    # conv2:               -nxb
    x = F.conv2d(x, conv_w2, bias=conv_b2, stride=1, padding=1, dilation=1, groups=1)
    # RelU2:
    x = x.clamp(min=0)

    
    # Debug info: in the test below (5x5 kernels with padding 2, then 3x3 kernels
    # with padding 1), both convolutions preserve the 32 x 32 spatial size, so
    # only the channel dimension changes.
    #print(x.shape)            # torch.Size([  64,    9,   32,   32])
    #print(fc_w.shape)         # torch.Size([9216,   10])
    
    
    
    x=flatten(x)
    scores= x.mm(fc_w) + fc_b    # + is overloaded in PyTorch,   just like in numpy

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ################################################################################
    #                                 END OF YOUR CODE                             #
    ################################################################################
    return scores

After defining the forward pass of the ConvNet above, run the following cell to test your implementation.

When you run this function, scores should have shape (64, 10).
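
To see where the fully-connected input size in the test comes from, the spatial size after each convolution can be worked out with the usual formula `1 + (in + 2 * pad - kernel) // stride`. The helper name `conv_output_size` is hypothetical and used only for this sketch; the kernel sizes, paddings, and channel count are the ones used in the test cell.

```python
def conv_output_size(in_size, kernel, pad, stride=1):
    """Spatial size after a convolution along one dimension."""
    return 1 + (in_size + 2 * pad - kernel) // stride

h1 = conv_output_size(32, kernel=5, pad=2)   # after the first conv
h2 = conv_output_size(h1, kernel=3, pad=1)   # after the second conv
fc_in = 9 * h2 * h2                          # 9 output channels in the test
print(h1, h2, fc_in)                         # 32 32 9216
```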

In [9]:
def three_layer_convnet_test():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch size 64, image size [3, 32, 32]

    conv_w1 = torch.zeros((6, 3, 5, 5), dtype=dtype)  # [out_channel, in_channel, kernel_H, kernel_W]
    conv_b1 = torch.zeros((6,))  # out_channel
    conv_w2 = torch.zeros((9, 6, 3, 3), dtype=dtype)  # [out_channel, in_channel, kernel_H, kernel_W]
    conv_b2 = torch.zeros((9,))  # out_channel

    # you must calculate the shape of the tensor after two conv layers, before the fully-connected layer
    fc_w = torch.zeros((9 * 32 * 32, 10))
    fc_b = torch.zeros(10)

    scores = three_layer_convnet(x, [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b])
    print(scores.size())  # you should see [64, 10]
three_layer_convnet_test()

torch.Size([64, 10])


### Barebones PyTorch: Initialization
Let's write a couple utility methods to initialize the weight matrices for our models.

- `random_weight(shape)` initializes a weight tensor with the Kaiming normal initialization method.
- `zero_weight(shape)` initializes a weight tensor with all zeros. Useful for instantiating bias parameters.

The `random_weight` function uses the Kaiming normal initialization method, described in:

He et al, *Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification*, ICCV 2015, https://arxiv.org/abs/1502.01852
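
As a rough numerical check of what this initialization does, the sketch below scales standard-normal samples by `sqrt(2 / fan_in)` and compares the empirical standard deviation with the target. The layer sizes are arbitrary and chosen only so that the sample is large enough for a stable estimate.

```python
import math
import torch

torch.manual_seed(0)
fan_in, fan_out = 1000, 500            # arbitrary layer sizes for illustration
target_std = math.sqrt(2.0 / fan_in)

# Scale standard-normal samples so their std is sqrt(2 / fan_in).
w = torch.randn(fan_in, fan_out) * target_std

print(target_std, w.std().item())      # the two values should be close
```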

In [10]:
def random_weight(shape):
    """
    Create random Tensors for weights; setting requires_grad=True means that we
    want to compute gradients for these Tensors during the backward pass.
    We use Kaiming normal initialization: weights are drawn with std sqrt(2 / fan_in)
    """
    if len(shape) == 2:  # FC weight
        fan_in = shape[0]
    else:
        fan_in = np.prod(shape[1:]) # conv weight [out_channel, in_channel, kH, kW]
    # randn is standard normal distribution generator. 
    w = torch.randn(shape, device=device, dtype=dtype) * np.sqrt(2. / fan_in)
    w.requires_grad = True
    return w

def zero_weight(shape):
    return torch.zeros(shape, device=device, dtype=dtype, requires_grad=True)

# create a weight of shape [3 x 5]
# you should see the type `torch.cuda.FloatTensor` if you use GPU. 
# Otherwise it should be `torch.FloatTensor`
random_weight((3, 5))

tensor([[ 0.4300,  0.4265, -1.1066, -1.0988,  0.2558],
        [-0.7984,  1.1036,  0.5489,  0.0814, -0.4607],
        [-0.7061, -0.2281, -2.1129,  0.2475,  0.0029]], device='cuda:0',
       requires_grad=True)

### Barebones PyTorch: Check Accuracy
When training the model we will use the following function to check the accuracy of our model on the training or validation sets.

When checking accuracy, we don't need to compute any gradients; as a result, we don't need PyTorch to build a computational graph for us when we compute scores. To prevent a graph from being built, we scope our computation under a `torch.no_grad()` context manager.
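
A minimal sketch of the effect, using a made-up three-element weight Tensor: the same computation produces a result that tracks gradients outside the context manager and one that does not inside it.

```python
import torch

w = torch.ones(3, requires_grad=True)

# Outside no_grad, the result is attached to the computational graph.
y_tracked = (w * 2).sum()
print(y_tracked.requires_grad)      # True

# Inside no_grad, no graph is built for the same computation.
with torch.no_grad():
    y_untracked = (w * 2).sum()
print(y_untracked.requires_grad)    # False
```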

In [None]:
def check_accuracy_part2(loader, model_fn, params):
    """
    Check the accuracy of a classification model.
    
    Inputs:
    - loader: A DataLoader for the data split we want to check
    - model_fn: A function that performs the forward pass of the model,
      with the signature scores = model_fn(x, params)
    - params: List of PyTorch Tensors giving parameters of the model
    
    Returns: Nothing, but prints the accuracy of the model
    """
    split = 'val' if loader.dataset.train else 'test'
    print('Checking accuracy on the %s set' % split)
    num_correct, num_samples = 0, 0
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.int64)
            scores = model_fn(x, params)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))

### BareBones PyTorch: Training Loop
We can now set up a basic training loop to train our network. We will train the model using stochastic gradient descent without momentum. We will use `torch.nn.functional.cross_entropy` to compute the loss; you can [read about it here](http://pytorch.org/docs/stable/nn.html#cross-entropy).

The training loop takes as input the neural network function, a list of initialized parameters (`[w1, w2]` in our example), and learning rate.
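
Before reading the full loop, it may help to see a single SGD step in isolation. The sketch below uses a made-up two-element parameter and a toy loss (the sum of squares) instead of a network, but the update and the gradient reset follow the same pattern as the training loop.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
learning_rate = 0.1

loss = (w ** 2).sum()
loss.backward()                      # w.grad = 2 * w = [2, -4]

with torch.no_grad():
    w -= learning_rate * w.grad      # in-place update, not recorded by autograd
    w.grad.zero_()                   # otherwise the next backward() would add to it

print(w)       # approximately [0.8, -1.6]
print(w.grad)  # zeros
```

Gradients accumulate across calls to `backward()`, which is why the reset has to happen on every iteration.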

In [None]:
def train_part2(model_fn, params, learning_rate):
    """
    Train a model on CIFAR-10.
    
    Inputs:
    - model_fn: A Python function that performs the forward pass of the model.
      It should have the signature scores = model_fn(x, params) where x is a
      PyTorch Tensor of image data, params is a list of PyTorch Tensors giving
      model weights, and scores is a PyTorch Tensor of shape (N, C) giving
      scores for the elements in x.
    - params: List of PyTorch Tensors giving weights for the model
    - learning_rate: Python scalar giving the learning rate to use for SGD
    
    Returns: Nothing
    """
    for t, (x, y) in enumerate(loader_train):
        # Move the data to the proper device (GPU or CPU)
        x = x.to(device=device, dtype=dtype)
        y = y.to(device=device, dtype=torch.long)

        # Forward pass: compute scores and loss
        scores = model_fn(x, params)
        loss = F.cross_entropy(scores, y)

        # Backward pass: PyTorch figures out which Tensors in the computational
        # graph have requires_grad=True and uses backpropagation to compute the
        # gradient of the loss with respect to these Tensors, and stores the
        # gradients in the .grad attribute of each Tensor.
        loss.backward()

        # Update parameters. We don't want to backpropagate through the
        # parameter updates, so we scope the updates under a torch.no_grad()
        # context manager to prevent a computational graph from being built.
        with torch.no_grad():
            for w in params:
                w -= learning_rate * w.grad

                # Manually zero the gradients after running the backward pass
                w.grad.zero_()

        if t % print_every == 0:
            print('Iteration %d, loss = %.4f' % (t, loss.item()))
            check_accuracy_part2(loader_val, model_fn, params)
            print()

### BareBones PyTorch: Train a Two-Layer Network
Now we are ready to run the training loop. We need to explicitly allocate tensors for the fully connected weights, `w1` and `w2`. 

Each minibatch of CIFAR has 64 examples, so the tensor shape is `[64, 3, 32, 32]`. 

After flattening, `x` shape should be `[64, 3 * 32 * 32]`. This will be the size of the first dimension of `w1`. 
The second dimension of `w1` is the hidden layer size, which will also be the first dimension of `w2`. 

Finally, the output of the network is a 10-dimensional vector of unnormalized scores for the 10 classes; the cross-entropy loss applies the softmax that turns them into a probability distribution. 

You don't need to tune any hyperparameters but you should see accuracies above 40% after training for one epoch.

In [None]:
hidden_layer_size = 4000
learning_rate = 1e-2

w1 = random_weight((3 * 32 * 32, hidden_layer_size))
w2 = random_weight((hidden_layer_size, 10))

train_part2(two_layer_fc, [w1, w2], learning_rate)

### BareBones PyTorch: Training a ConvNet

Below, you should use the functions defined above to train a three-layer convolutional network on CIFAR. The network should have the following architecture:

1. Convolutional layer (with bias) with 32 5x5 filters, with zero-padding of 2
2. ReLU
3. Convolutional layer (with bias) with 16 3x3 filters, with zero-padding of 1
4. ReLU
5. Fully-connected layer (with bias) to compute scores for 10 classes

You should initialize your weight matrices using the `random_weight` function defined above, and you should initialize your bias vectors using the `zero_weight` function above.

You don't need to tune any hyperparameters, but if everything works correctly you should achieve an accuracy above 42% after one epoch.

In [None]:
learning_rate = 3e-3

channel_1 = 32
channel_2 = 16

conv_w1 = None
conv_b1 = None
conv_w2 = None
conv_b2 = None
fc_w = None
fc_b = None

################################################################################
# TODO: Initialize the parameters of a three-layer ConvNet.                    #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

RGB=3
HH1=filter_h_1=5
HH2=filter_h_2=3
F1=channel_1
F2=channel_2

# Convolutional parameters:
conv_w1  = random_weight( (F1, RGB, HH1, HH1)) # He initialization
conv_b1  = zero_weight(   (F1,              ))
conv_w2  = random_weight( (F2,  F1, HH2, HH2))
conv_b2  = zero_weight(   (F2,              ))

# Variable renames for convenience,   all just to compute the # of parameters in  fc_w :
#   I like short variable names.  -nxb, July 30, 2019
H=input_h=32
P1=pad1=2
P2=pad2=1
S1=stride1=1
S2=stride2=1

#=========================================================
# Calculating intermediate (hidden) outputs' dimensions:
#=========================================================
HP1=outputs_height_after_1st_conv             =1+  (H+   2*P1 -HH1)//S1   # integer division keeps the sizes as ints
HP2=WP2=outputs_height_after_2nd_conv =  width=1+  (HP1+ 2*P2 -HH2)//S2    # "P" stands for "prime" because an apostrophe or backtick is not allowed in a Python variable name
fc_in_dim = int(F2*HP2*WP2)
C=num_classes=10

fc_w     = random_weight( (fc_in_dim, C)                  )    # fc_in_dim == 16 * 32 * 32 == 16384 == 2**14
fc_b     = zero_weight(   (C,)                            )

'''
    Before 1st conv,   
x.shape =   torch.Size([64,  3, 32, 32])
            torch.Size([64, 32, 32, 32])
            torch.Size([64, 16, 32, 32])
            torch.Size([16, 10])
'''



# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                                 END OF YOUR CODE                             #
################################################################################

params = [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b]
train_part2(three_layer_convnet, params, learning_rate)

# Part III. PyTorch Module API

Barebone PyTorch requires that we track all the parameter tensors by hand. This is fine for small networks with a few tensors, but it would be extremely inconvenient and error-prone to track tens or hundreds of tensors in larger networks.

PyTorch provides the `nn.Module` API for you to define arbitrary network architectures, while tracking all learnable parameters for you. In Part II, we implemented SGD ourselves. PyTorch also provides the `torch.optim` package that implements all the common optimizers, such as RMSProp, Adagrad, and Adam. It even supports approximate second-order methods like L-BFGS! You can refer to the [doc](http://pytorch.org/docs/master/optim.html) for the exact specifications of each optimizer.
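
As a hedged sketch of how a `torch.optim` optimizer replaces the hand-written update from Part II, the snippet below fits a toy `nn.Linear` model to made-up regression data. The model, data, learning rate, and number of steps are all assumptions chosen for illustration; they are not the settings used later in this notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(3, 1)                       # toy model, not the assignment's
optimizer = optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(16, 3)                        # made-up inputs
y = x.sum(dim=1, keepdim=True)                # made-up regression targets

losses = []
for _ in range(20):
    loss = ((model(x) - y) ** 2).mean()
    optimizer.zero_grad()                     # clear gradients from the last step
    loss.backward()                           # compute new gradients
    optimizer.step()                          # apply the update rule
    losses.append(loss.item())

print(losses[0], losses[-1])                  # the loss should have decreased
```

The three calls `zero_grad()`, `backward()`, and `step()` take the place of the manual `w -= learning_rate * w.grad` and `w.grad.zero_()` lines.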

To use the Module API, follow the steps below:

1. Subclass `nn.Module`. Give your network class an intuitive name like `TwoLayerFC`. 

2. In the constructor `__init__()`, define all the layers you need as class attributes. Layer objects like `nn.Linear` and `nn.Conv2d` are themselves `nn.Module` subclasses and contain learnable parameters, so that you don't have to instantiate the raw tensors yourself. `nn.Module` will track these internal parameters for you. Refer to the [doc](http://pytorch.org/docs/master/nn.html) to learn more about the dozens of builtin layers. **Warning**: don't forget to call the `super().__init__()` first!

3. In the `forward()` method, define the *connectivity* of your network. You should use the attributes defined in `__init__` as function calls that take a tensor as input and output the "transformed" tensor. Do *not* create any new layers with learnable parameters in `forward()`! All of them must be declared upfront in `__init__`. 

After you define your Module subclass, you can instantiate it as an object and call it just like the NN forward function in part II.

### Module API: Two-Layer Network
Here is a concrete example of a 2-layer fully connected network:

In [11]:
import torch.nn.functional as F  # useful stateless functions

class TwoLayerFC(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        # assign layer objects to class attributes
        self.fc1 = nn.Linear(input_size, hidden_size)
        # nn.init package contains convenient initialization methods
        # http://pytorch.org/docs/master/nn.html#torch-nn-init 
        nn.init.kaiming_normal_(self.fc1.weight)
        self.fc2 = nn.Linear(hidden_size, num_classes)
        nn.init.kaiming_normal_(self.fc2.weight)
    
    def forward(self, x):
        # forward always defines connectivity
        x = flatten(x)
        scores = self.fc2(F.relu(self.fc1(x)))
        return scores

def test_TwoLayerFC():
    input_size = 50
    x = torch.zeros((64, input_size), dtype=dtype)  # minibatch size 64, feature dimension 50
    model = TwoLayerFC(input_size, 42, 10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]
test_TwoLayerFC()

torch.Size([64, 10])


### Module API: Three-Layer ConvNet
It's your turn to implement a 3-layer ConvNet followed by a fully connected layer. The network architecture should be the same as in Part II:

1. Convolutional layer with `channel_1` 5x5 filters with zero-padding of 2
2. ReLU
3. Convolutional layer with `channel_2` 3x3 filters with zero-padding of 1
4. ReLU
5. Fully-connected layer to `num_classes` classes

You should initialize the weight matrices of the model using the Kaiming normal initialization method.

**HINT**: http://pytorch.org/docs/stable/nn.html#conv2d

After you implement the three-layer ConvNet, the `test_ThreeLayerConvNet` function will run your implementation; it should print `(64, 10)` for the shape of the output scores.
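
To size the fully connected layer, it helps to work out the spatial size after each convolution. The usual output-size formula is H' = 1 + (H + 2P - K) / S for input height H, padding P, kernel size K and stride S. The helper below applies it to the architecture above, assuming 32x32 CIFAR-10 inputs and stride 1. The name `conv_out_size` and the example channel count are made up for this sketch.

```python
def conv_out_size(size, kernel, padding, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return 1 + (size + 2 * padding - kernel) // stride

in_size = 32                                                  # CIFAR-10 images are 32x32
size_after_conv1 = conv_out_size(in_size, kernel=5, padding=2)
size_after_conv2 = conv_out_size(size_after_conv1, kernel=3, padding=1)
example_channel_2 = 8                                         # example value only
fc_in_features = example_channel_2 * size_after_conv2 * size_after_conv2
print(size_after_conv1, size_after_conv2, fc_in_features)  # 32 32 8192
```

Both convolutions use "same" padding, so the spatial size stays at 32x32 and only the channel count changes.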

In [None]:
class ThreeLayerConvNet(nn.Module):
    def __init__(self, in_channel, channel_1, channel_2, num_classes):
        super().__init__()
        ########################################################################
        # TODO: Set up the layers you need for a three-layer ConvNet with the  #
        # architecture defined above.                                          #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        #print("in_channel  : ", in_channel)
        #print("channel_1   : ", channel_1)
        #print("channel_2   : ", channel_2)
        #print("num_classes : ", num_classes)
        self.conv1 = nn.Conv2d( in_channels=in_channel, out_channels=channel_1, kernel_size=5 , stride=1, padding=2, dilation=1, groups=1, bias=True )
        nn.init.kaiming_normal_(self.conv1.weight)
        #self.relu1 = nn.ReLU( ... )  # NOPE, we don't need to do this "self.relu1" function call   b/c   we don't need to learn any parameters for the  relu.
        self.conv2 = nn.Conv2d( in_channels= channel_1, out_channels=channel_2, kernel_size=3 , stride=1, padding=1, dilation=1, groups=1, bias=True )
        nn.init.kaiming_normal_(self.conv2.weight)

        #=========================================================
        # Fully Connected Layer:
        #=========================================================
        # Calculating intermediate (hidden) outputs' dimensions:
        #=========================================================
        # "P" stands for "prime" because I couldn't use a real apostrophe / backtick in a variable name in python
        H   = 32   # input height/width (CIFAR-10 images are 32x32)
        HH1, P1, S1 = 5, 2, 1   # 1st conv: filter size, padding, stride
        HH2, P2, S2 = 3, 1, 1   # 2nd conv: filter size, padding, stride
        F2  = channel_2
        HP1 = 1 + (H   + 2*P1 - HH1)//S1          # output height after 1st conv
        HP2 = WP2 = 1 + (HP1 + 2*P2 - HH2)//S2    # output height (= width) after 2nd conv
        fc_in_dim = F2*HP2*WP2
        C   = num_classes
        self.linear3 = nn.Linear( in_features = fc_in_dim,  out_features = C, bias=True)
        nn.init.kaiming_normal_(self.linear3.weight)

        '''
          # "Sequential()"   API
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(     nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2,     ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            nn.ConvTranspose2d(    ngf,      nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 64 x 64

        )
        '''



        '''
          # My original code from earlier   (uses Barebones PyTorch)
          #   This function SHOULD use the nn.Module API

        #=========================================================
        # Convenience names:
        #=========================================================
        # Variable renames for convenience,   all just to compute the # of parameters in  fc_w :
        #   I like short variable names.  -nxb, July 30, 2019
        RGB = 3
        HH1 = filter_h_1       = 5
        HH2 = filter_h_2       = 3
        F1  = channel_1
        F2  = channel_2

        #=========================================================
        # Convolutional Layers:
        #=========================================================

        # Are these variable names too long?    Maybe just 'w1' ?
        self.params['conv_w1'] = random_weight( (F1, RGB, HH1, HH1)) # He initialization
        self.params['conv_b1'] = zero_weight(   (F1,              ))
        self.params['conv_w2'] = random_weight( (F2,  F1, HH2, HH2))
        self.params['conv_b2'] = zero_weight(   (F2,              ))

        #=========================================================
        # Fully Connected Layer:
        #=========================================================
        # Calculating intermediate (hidden) outputs' dimensions:
        #=========================================================
        # "P" stands for "prime" because I couldn't use a real apostrophe / backtick in a variable name in python
        HP1                 = outputs_height_after_1st_conv             =1+  (H+   2*P1 -HH1)/S1
        HP2           = WP2 = outputs_height_after_2nd_conv =  width =  1+  (HP1+ 2*P2 -HH2)/S2
        fc_in_dim           = int(F2*HP2*WP2)
        C                   = num_classes = 10
        self.params['fc_w'] = random_weight( (fc_in_dim, C)                  )    # instead of "channel_2," one could hardcode "2**14" or "16384"
        self.params['fc_b'] = zero_weight(   (C,)                            )

        '''

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                          END OF YOUR CODE                            #       
        ########################################################################

    def forward(self, x):
        scores = None
        ########################################################################
        # TODO: Implement the forward function for a 3-layer ConvNet. you      #
        # should use the layers you defined in __init__ and specify the        #
        # connectivity of those layers in forward()                            #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        #print('Within function "forward()", before conv1(),    \n x.shape = {0}'.format(x.shape))
        # All layers:
        x=self.conv1(x)
        x=F.relu(x)
        x=self.conv2(x)
        x=F.relu(x)
        x=flatten(x)
        scores=self.linear3(x)  # no ReLU on the output: these are raw class scores
        
        
        '''
        # "x" was the shortest var name we could use.  Also, it helped me solve a dropout bug back earlier in assn 2 because adding dropout was optional so I didn't want to have to keep track of a thousand different variables names like "relu_out1" and "affine_out_2"    -nxb
        x = nn.Conv2d(x, conv_w1, bias=conv_b1, stride=1, padding=2, dilation=1, groups=1)
        # ReLU1:
        x = x.clamp(min = 0)
        # conv2:
        x = nn.Conv2d(x, conv_w2, bias=conv_b2, stride=1, padding=1, dilation=1, groups=1)  # not F :  functional.conv2d()    nn.Conv2d()
        # RelU2:
        x = x.clamp(min = 0)

        x = flatten(x)
        scores = x.mm(fc_w) + fc_b    # + is overloaded in PyTorch,   just like in numpy
        '''
        
        
        
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                             END OF YOUR CODE                         #
        ########################################################################
        return scores


def test_ThreeLayerConvNet():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch size 64, image size [3, 32, 32]
    model = ThreeLayerConvNet(in_channel=3, channel_1=12, channel_2=8, num_classes=10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]
test_ThreeLayerConvNet()

### Module API: Check Accuracy
Given the validation or test set, we can check the classification accuracy of a neural network. 

This version is slightly different from the one in part II. You don't manually pass in the parameters anymore.
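
The function below calls `model.eval()` and wraps inference in `torch.no_grad()`. Here is a minimal sketch of why both matter, using a standalone dropout layer chosen only for illustration (the networks in this part contain no dropout). In training mode dropout randomly zeroes activations, in evaluation mode it is the identity, and under `no_grad` no autograd graph is recorded.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
ones = torch.ones(1000)

drop.train()
train_out = drop(ones)   # entries are either zeroed or scaled by 1 / (1 - p) = 2

drop.eval()
eval_out = drop(ones)    # identity in evaluation mode

w = torch.ones(3, requires_grad=True)
with torch.no_grad():
    y = (w * 2).sum()    # computed without building an autograd graph
print(torch.equal(eval_out, ones), y.requires_grad)  # True False
```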

In [24]:
def check_accuracy_part34(loader, model):
    if loader.dataset.train:
        print('Checking accuracy on validation set')
    else:
        print('Checking accuracy on test set')   
    num_correct = 0
    num_samples = 0
    model.eval()  # set model to evaluation mode
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)
            scores = model(x)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))

### Module API: Training Loop
We also use a slightly different training loop. Rather than updating the values of the weights ourselves, we use an Optimizer object from the `torch.optim` package, which abstracts the notion of an optimization algorithm and provides implementations of most of the algorithms commonly used to optimize neural networks.
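
The three optimizer calls in the loop below (`zero_grad`, `backward`, `step`) can be seen in isolation on a toy least-squares problem. The data and the single linear layer here are invented for the sketch and are unrelated to CIFAR-10.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)
inputs = torch.randn(64, 3)
targets = inputs @ torch.tensor([[1.0], [-2.0], [0.5]])  # synthetic linear targets

toy_model = nn.Linear(3, 1)
toy_optimizer = optim.SGD(toy_model.parameters(), lr=0.1)

losses = []
for step in range(50):
    toy_loss = F.mse_loss(toy_model(inputs), targets)
    toy_optimizer.zero_grad()  # clear gradients left over from the previous step
    toy_loss.backward()        # populate .grad on every parameter
    toy_optimizer.step()       # apply the update rule (plain SGD here)
    losses.append(toy_loss.item())

print(losses[0], losses[-1])   # the loss should shrink over the 50 steps
```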

In [None]:
def train_part34(model, optimizer, epochs=1):
    """
    Train a model on CIFAR-10 using the PyTorch Module API.
    
    Inputs:
    - model: A PyTorch Module giving the model to train.
    - optimizer: An Optimizer object we will use to train the model
    - epochs: (Optional) A Python integer giving the number of epochs to train for
    
    Returns: Nothing, but prints model accuracies during training.
    """
    model = model.to(device=device)  # move the model parameters to CPU/GPU
    for e in range(epochs):
        for t, (x, y) in enumerate(loader_train):
            model.train()  # put model to training mode
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)

            scores = model(x)
            loss = F.cross_entropy(scores, y)

            # Zero out all of the gradients for the variables which the optimizer
            # will update.
            optimizer.zero_grad()

            # This is the backwards pass: compute the gradient of the loss with
            # respect to each  parameter of the model.
            loss.backward()

            # Actually update the parameters of the model using the gradients
            # computed by the backwards pass.
            optimizer.step()

            if t % print_every == 0:
                print('Iteration %d, loss = %.4f' % (t, loss.item()))
                check_accuracy_part34(loader_val, model)
                print()

### Module API: Train a Two-Layer Network
Now we are ready to run the training loop. In contrast to part II, we don't explicitly allocate parameter tensors anymore.

Simply pass the input size, hidden layer size, and number of classes (i.e. output size) to the constructor of `TwoLayerFC`. 

You also need to define an optimizer that tracks all the learnable parameters inside `TwoLayerFC`.

You don't need to tune any hyperparameters, but you should see model accuracies above 40% after training for one epoch.

In [None]:
hidden_layer_size = 4000
learning_rate = 1e-2
model = TwoLayerFC(3 * 32 * 32, hidden_layer_size, 10)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

train_part34(model, optimizer)

### Module API: Train a Three-Layer ConvNet
You should now use the Module API to train a three-layer ConvNet on CIFAR. This should look very similar to training the two-layer network! You don't need to tune any hyperparameters, but you should achieve above 45% after training for one epoch.

You should train the model using stochastic gradient descent without momentum.

In [None]:
learning_rate = 3e-3
channel_1 = 32
channel_2 = 16

model= None
optimizer = None
################################################################################
# TODO: Instantiate your ThreeLayerConvNet model and a corresponding optimizer #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

RGB  = 3
in_D = 3 *32 * 32 # CIFAR-10
lr= 1e-2

# num filters
F1= channel_1
F2= channel_2
C = 10
model = ThreeLayerConvNet(in_channel = RGB, channel_1 = F1, channel_2 = F2, num_classes = C)
optimizer = optim.SGD(model.parameters(), lr=lr)

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                                 END OF YOUR CODE                             
################################################################################

train_part34(model, optimizer)

# Part IV. PyTorch Sequential API

Part III introduced the PyTorch Module API, which allows you to define arbitrary learnable layers and their connectivity. 

For simple models like a stack of feed forward layers, you still need to go through 3 steps: subclass `nn.Module`, assign layers to class attributes in `__init__`, and call each layer one by one in `forward()`. Is there a more convenient way? 

Fortunately, PyTorch provides a container Module called `nn.Sequential`, which merges the above steps into one. It is not as flexible as `nn.Module`, because you cannot specify more complex topology than a feed-forward stack, but it's good enough for many use cases.

### Sequential API: Two-Layer Network
Let's see how to rewrite our two-layer fully connected network example with `nn.Sequential`, and train it using the training loop defined above.

Again, you don't need to tune any hyperparameters here, but you should achieve above 40% accuracy after one epoch of training.

In [None]:
# We need to wrap `flatten` function in a module in order to stack it
# in nn.Sequential
class Flatten(nn.Module):
    def forward(self, x):
        return flatten(x)

hidden_layer_size = 4000
learning_rate = 1e-2

model = nn.Sequential(
    Flatten(),
    nn.Linear(3 * 32 * 32, hidden_layer_size),
    nn.ReLU(),
    nn.Linear(hidden_layer_size, 10),
)

# you can use Nesterov momentum in optim.SGD
optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                     momentum=0.9, nesterov=True)

train_part34(model, optimizer)

### Sequential API: Three-Layer ConvNet
Here you should use `nn.Sequential` to define and train a three-layer ConvNet with the same architecture we used in Part III:

1. Convolutional layer (with bias) with 32 5x5 filters, with zero-padding of 2
2. ReLU
3. Convolutional layer (with bias) with 16 3x3 filters, with zero-padding of 1
4. ReLU
5. Fully-connected layer (with bias) to compute scores for 10 classes

You should initialize your weight matrices using the `random_weight` function defined above, and you should initialize your bias vectors using the `zero_weight` function above.

You should optimize your model using stochastic gradient descent with Nesterov momentum 0.9.

Again, you don't need to tune any hyperparameters but you should see accuracy above 55% after one epoch of training.
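
Layers created inside `nn.Sequential` come with PyTorch's default initialization. One way to re-initialize them, sketched here as an assumption rather than the assignment's reference solution, is to walk the container with `Module.apply`. The sketch uses `nn.init.kaiming_normal_` and `nn.init.zeros_` in place of the `random_weight` and `zero_weight` helpers, and a local `FlattenLayer` as a stand-in for the `Flatten` module defined above.

```python
import torch
import torch.nn as nn

class FlattenLayer(nn.Module):  # stand-in for the Flatten module defined above
    def forward(self, x):
        return x.view(x.shape[0], -1)

def init_weights(module):
    # Re-initialize only layers that own a weight tensor and a bias vector
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight)
        nn.init.zeros_(module.bias)

seq_net = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.Conv2d(32, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    FlattenLayer(),
    nn.Linear(16 * 32 * 32, 10),
)
seq_net.apply(init_weights)  # apply() visits every submodule recursively

seq_scores = seq_net(torch.zeros(4, 3, 32, 32))
print(seq_scores.shape)  # torch.Size([4, 10])
```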

In [None]:
channel_1 = 32
channel_2 = 16
learning_rate = 1e-2

model = None
optimizer = None

################################################################################
# TODO: Rewrite the 2-layer ConvNet with bias from Part III with the           #
# Sequential API.                                                              #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

# 'F1' means "num filters    in the 1st conv2d layer"
# 'F2' means "num filters    in the 2nd conv2d layer"
F1= channel_1
F2= channel_2
lr=learning_rate

RGB = 3
H   = 32   # input height/width (CIFAR-10 images are 32x32)
HH1, P1, S1 = 5, 2, 1   # 1st conv: filter size, padding, stride
HH2, P2, S2 = 3, 1, 1   # 2nd conv: filter size, padding, stride
HP1 = 1 + (H   + 2*P1 - HH1)//S1          # output height after 1st conv
HP2 = WP2 = 1 + (HP1 + 2*P2 - HH2)//S2    # output height (= width) after 2nd conv
fc_in_dim = F2*HP2*WP2
C   = num_classes = 10


# I'm like 90% sure the best thing to do is  leaky ReLU and Adam,
#   But for now I'm in "blindly-execute-and-follow-the-assignment-directions" mode,   so I'll do Nesterov Momentum and vanilla  ReLU
model = nn.Sequential(
    nn.Conv2d( in_channels=RGB, out_channels= F1, kernel_size=5, stride=1, padding=2, dilation=1, groups=1, bias=True ),
    nn.ReLU(),
    nn.Conv2d( in_channels=F1 , out_channels= F2, kernel_size=3, stride=1, padding=1, dilation=1, groups=1, bias=True ),
    nn.ReLU(),
    Flatten(),
    nn.Linear( in_features = fc_in_dim,    out_features = C, bias=True),
)

optimizer = optim.SGD(model.parameters(), lr=lr,   momentum=0.9, nesterov=True)

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                                 END OF YOUR CODE                             
################################################################################

train_part34(model, optimizer)

# Part V. CIFAR-10 open-ended challenge

In this section, you can experiment with whatever ConvNet architecture you'd like on CIFAR-10. 

Now it's your job to experiment with architectures, hyperparameters, loss functions, and optimizers to train a model that achieves **at least 70%** accuracy on the CIFAR-10 **validation** set within 10 epochs. You can use the `check_accuracy_part34` and `train_part34` functions from above, with either the `nn.Module` or the `nn.Sequential` API. 

Describe what you did at the end of this notebook.

Here is the official API documentation for each component. One note: what we call "spatial batch norm" in class is called `BatchNorm2d` in PyTorch.

* Layers in torch.nn package: http://pytorch.org/docs/stable/nn.html
* Activations: http://pytorch.org/docs/stable/nn.html#non-linear-activations
* Loss functions: http://pytorch.org/docs/stable/nn.html#loss-functions
* Optimizers: http://pytorch.org/docs/stable/optim.html


### Things you might try:
- **Filter size**: Above we used 5x5; would smaller filters be more efficient?
- **Number of filters**: Above we used 32 filters. Do more or fewer do better?
- **Pooling vs Strided Convolution**: Do you use max pooling or just stride convolutions?
- **Batch normalization**: Try adding spatial batch normalization after convolution layers and vanilla batch normalization after affine layers. Do your networks train faster?
- **Network architecture**: The network above has two layers of trainable parameters. Can you do better with a deep network? Good architectures to try include:
    - [conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    - [conv-relu-conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    - [batchnorm-relu-conv]xN -> [affine]xM -> [softmax or SVM]
- **Global Average Pooling**: Instead of flattening and then having multiple affine layers, perform convolutions until your image gets small (7x7 or so) and then perform an average pooling operation that reduces each feature map to a single value, giving a (1, 1, Filter#) output, which is then reshaped into a (Filter#) vector. This is used in [Google's Inception Network](https://arxiv.org/abs/1512.00567) (See Table 1 for their architecture).
- **Regularization**: Add l2 weight regularization, or perhaps use Dropout.
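
As one possible starting point for the ideas in the list above, here is a hedged sketch of a `[conv-batchnorm-relu-pool]` stack that ends in global average pooling instead of a large affine layer. The name `GapNet`, the channel counts, and the depth are arbitrary choices made for illustration, not tuned values, and no accuracy is claimed for it.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),   # "spatial batch norm"
        nn.ReLU(),
        nn.MaxPool2d(2),          # halves height and width
    )

class GapNet(nn.Module):  # hypothetical name
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        self.pool = nn.AdaptiveAvgPool2d(1)   # global average pooling -> (N, 64, 1, 1)
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.pool(self.features(x))
        return self.classifier(x.view(x.shape[0], -1))

gap_scores = GapNet()(torch.zeros(4, 3, 32, 32))
print(gap_scores.shape)  # torch.Size([4, 10])
```

Because the classifier only sees one value per channel, its parameter count no longer depends on the spatial size of the feature maps.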

### Tips for training
For each network architecture that you try, you should tune the learning rate and other hyperparameters. When doing this there are a couple important things to keep in mind:

- If the parameters are working well, you should see improvement within a few hundred iterations
- Remember the coarse-to-fine approach for hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all.
- Once you have found some sets of parameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
- You should use the validation set for hyperparameter search, and save your test set for evaluating your architecture on the best parameters as selected by the validation set.
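
The coarse stage of the search described above is often done by sampling hyperparameters on a log scale. A minimal sketch follows; the ranges, the number of trials, and the function name `sample_hyperparams` are placeholder assumptions to adapt.

```python
import random

def sample_hyperparams(rng):
    """Draw one (learning_rate, weight_decay) pair log-uniformly."""
    learning_rate = 10 ** rng.uniform(-4, -1)   # between 1e-4 and 1e-1
    weight_decay = 10 ** rng.uniform(-5, -2)    # between 1e-5 and 1e-2
    return learning_rate, weight_decay

rng = random.Random(0)  # fixed seed so the trials are reproducible
trials = [sample_hyperparams(rng) for _ in range(5)]
for lr_trial, wd_trial in trials:
    # In the notebook one would build a model and optimizer with these values
    # and train for a few hundred iterations before comparing validation accuracy.
    print('lr = %.2e, weight_decay = %.2e' % (lr_trial, wd_trial))
```

Once a promising region shows up, the same function can be reused with narrower exponent ranges for the fine stage.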

### Going above and beyond
If you are feeling adventurous there are many other features you can implement to try and improve your performance. You are **not required** to implement any of these, but don't miss the fun if you have time!

- Alternative optimizers: you can try Adam, Adagrad, RMSprop, etc.
- Alternative activation functions such as leaky ReLU, parametric ReLU, ELU, or MaxOut.
- Model ensembles
- Data augmentation
- New Architectures
  - [ResNets](https://arxiv.org/abs/1512.03385) where the input from the previous layer is added to the output.
  - [DenseNets](https://arxiv.org/abs/1608.06993) where inputs into previous layers are concatenated together.
  - [This blog has an in-depth overview](https://chatbotslife.com/resnets-highwaynets-and-densenets-oh-my-9bb15918ee32)
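
To make the ResNet idea from the list above concrete, here is a simplified sketch of a residual block. It assumes the input and output have the same number of channels and the same spatial size, so the skip connection is a plain addition; the blocks in the paper also handle downsampling, which is left out here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)   # skip connection: add the block's input back in

block = ResidualBlock(16)
block_out = block(torch.randn(2, 16, 32, 32))
print(block_out.shape)  # torch.Size([2, 16, 32, 32])
```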

### Have fun and happy training! 

In [None]:
class NXB_net(nn.Module):
    def __init__(self, in_channel, channel_1, num_classes):
        super().__init__()
        # assign layer objects to class attributes
        self.conv1 = nn.Conv2d( in_channels=in_channel, out_channels=channel_1, kernel_size=5 , stride=1, padding=2, dilation=1, groups=1, bias=True )
        nn.init.kaiming_normal_(self.conv1.weight)
        # nn.init package contains convenient initialization methods
        # http://pytorch.org/docs/master/nn.html#torch-nn-init 
        # ASSUMPTION: the original left the classifier undefined; a single linear layer
        # on the flattened 32x32 feature maps is used here as a placeholder.
        self.fc = nn.Linear(channel_1 * 32 * 32, num_classes)
        nn.init.kaiming_normal_(self.fc.weight)
    
    def forward(self, x):
        # forward always defines connectivity
        x = F.relu(self.conv1(x))
        scores = self.fc(flatten(x))
        return scores

def test_nxb():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch size 64, image size [3, 32, 32]
    model = NXB_net(in_channel=3, channel_1=12, num_classes=10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]
test_nxb()

In [32]:
def train_nxb(model, optimizer, epochs=1):
    """
    Train a model on CIFAR-10 using the PyTorch Module API.

    Inputs:
    - model: A PyTorch Module giving the model to train.
    - optimizer: An Optimizer object we will use to train the model
    - epochs: (Optional) A Python integer giving the number of epochs to train for
    
    Returns: Nothing, but prints model accuracies during training.
    """
    model = model.to(device=device)  # move the model parameters to CPU/GPU
    for e in range(epochs):
        for t, (x, y) in enumerate(loader_train):
            model.train()  # put model to training mode
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)

            scores = model(x)
            loss = F.cross_entropy(scores, y)

            # Zero out all of the gradients for the variables which the optimizer
            # will update.
            optimizer.zero_grad()

            # This is the backwards pass: compute the gradient of the loss with
            # respect to each  parameter of the model.
            loss.backward()

            # Actually update the parameters of the model using the gradients
            # computed by the backwards pass.
            optimizer.step()

            if t % print_every == 0:
                print('Iteration %d, loss = %.4f' % (t, loss.item()))
                check_accuracy_part34(loader_val, model)
                print()

In [30]:
"""
import torchvision
net=rnext=rNeXt=resnext101 =torchvision.models.resnext101_32x8d(pretrained=False, progress=True)
d=dir; from pprint import pprint as p
#p(d(resnext101))
'parameters' in d(resnext101)
"""


True

In [38]:
# TODO:      change ResNeXt dimensions (input *AND* output     and whatever else is needed to make the network dimensions actually *WORK*)
# NOTE: `rNeXt` is created in the ResNeXt cell above, which is currently commented out and must be run first
lr=1e-3
optimizer=optim.Adam(rNeXt.parameters(), lr=lr, betas=(0.9, 0.999), eps=1e-8, weight_decay=0, amsgrad=False )
train_nxb(rNeXt, optimizer, 
          epochs=3)

Iteration 0, loss = 1.4550
Checking accuracy on validation set
Got 349 / 1000 correct (34.90)

Iteration 100, loss = 1.6709
Checking accuracy on validation set
Got 380 / 1000 correct (38.00)

Iteration 200, loss = 1.8279
Checking accuracy on validation set
Got 455 / 1000 correct (45.50)

Iteration 300, loss = 1.5736
Checking accuracy on validation set
Got 444 / 1000 correct (44.40)

Iteration 400, loss = 1.4400
Checking accuracy on validation set
Got 465 / 1000 correct (46.50)

Iteration 500, loss = 1.9875
Checking accuracy on validation set
Got 458 / 1000 correct (45.80)

Iteration 600, loss = 1.9080
Checking accuracy on validation set
Got 418 / 1000 correct (41.80)

Iteration 700, loss = 1.6117
Checking accuracy on validation set
Got 347 / 1000 correct (34.70)

Iteration 0, loss = 1.6633
Checking accuracy on validation set
Got 370 / 1000 correct (37.00)

Iteration 100, loss = 2.0094
Checking accuracy on validation set
Got 427 / 1000 correct (42.70)

Iteration 200, loss = 1.4959
Check

In [12]:
#from torch.utils.tensorboard import SummaryWriter

In [26]:
torch.cuda.get_device_name(0)  # Test whether we're still using the GPU. 

'Tesla K80'

## 1. Import ALL required libraries :

#### (e.g. tqdm for progress bars)

In [14]:
import copy

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

import torchvision
import torchvision.transforms as transforms

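# progress bars:
from tqdm import tqdm as pbar
# TensorBoard logging (SummaryWriter is used by train_model below)
from torch.utils.tensorboard import SummaryWriter
#from models import *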

## 2. Prepare datasets :

In [15]:
def make_dataloaders(params):
    """
    Make a PyTorch dataloader object that can be used for training and validation
    Input:
        - params dict with key 'path' (string): path of the dataset folder
        - params dict with key 'batch_size' (int): mini-batch size
        - params dict with key 'num_workers' (int): number of workers for dataloader
    Output:
        - trainloader and testloader (pytorch dataloader object)
    """
    transform_train = transforms.Compose([transforms.RandomCrop(32, padding=4),
                                          transforms.RandomHorizontalFlip(),
                                          transforms.ToTensor(),
                                          transforms.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010])])

    transform_validation = transforms.Compose([transforms.ToTensor(),
                                               transforms.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010])])

    trainset = torchvision.datasets.CIFAR10(root=params['path'], train=True, download=True, transform=transform_train)
    testset = torchvision.datasets.CIFAR10(root=params['path'], train=False, download=True, transform=transform_validation)
    # nxb,   August 2, 2019   (I added the "download=True"    parameter in b/c I hadn't downloaded the dataset on this machine)
    #   Perhaps after we've already downloaded the data, though,    we should set download=False)

    trainloader = torch.utils.data.DataLoader(trainset, batch_size=params['batch_size'], shuffle=True , num_workers=params['num_workers'])
    testloader  = torch.utils.data.DataLoader(testset , batch_size=params['batch_size'], shuffle=False, num_workers=params['num_workers'])
    return trainloader, testloader

## 3. Train model :

In [16]:
def train_model(model, params):

    writer = SummaryWriter('runs/' + params['description'])
    model = model.to(params['device'])
    optimizer = optim.SGD(model.parameters(), lr=params['learning_rate'], 
                          weight_decay=params['weight_decay'], momentum=0.9, nesterov=True)
    scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=params['reduce_learning_rate'], gamma=0.1)
    criterion = nn.CrossEntropyLoss()
    best_accuracy = test_model(model, params)
    best_model = copy.deepcopy(model.state_dict())

    for epoch in pbar(range(params['num_epochs'])):
        scheduler.step()

        # Each epoch has a training and validation phase
        for phase in ['train', 'validation']:

            # Loss accumulator for each epoch
            logs = {'Loss': 0.0,
                    'Accuracy': 0.0}

            # Set the model to the correct phase
            model.train() if phase == 'train' else model.eval()

            # Iterate over data
            for image, label in params[phase+'_loader']:
                image = image.to(params['device'])
                label = label.to(params['device'])

                # Zero gradient
                optimizer.zero_grad()

                with torch.set_grad_enabled(phase == 'train'):

                    # Forward pass
                    prediction = model(image)
                    loss = criterion(prediction, label)
                    accuracy = torch.sum(torch.max(prediction, 1)[1] == label.data).item()

                    # Update log
                    logs['Loss'] += image.shape[0]*loss.detach().item()
                    logs['Accuracy'] += accuracy

                    # Backward pass
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

            # Normalize and write the data to TensorBoard
            logs['Loss'] /= len(params[phase+'_loader'].dataset)
            logs['Accuracy'] /= len(params[phase+'_loader'].dataset)
            writer.add_scalars('Loss', {phase: logs['Loss']}, epoch)
            writer.add_scalars('Accuracy', {phase: logs['Accuracy']}, epoch)

            # Save the best weights
            if phase == 'validation' and logs['Accuracy'] > best_accuracy:
                best_accuracy = logs['Accuracy']
                best_model = copy.deepcopy(model.state_dict())

        # Write best weights to disk
        if epoch % params['check_point'] == 0 or epoch == params['num_epochs']-1:
            torch.save(best_model, params['state_dict_path'] + params['description'] + '.pt')

    final_accuracy = test_model(model, params)
    writer.add_text('Final_Accuracy', str(final_accuracy), 0)
    writer.close()
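
The loop above multiplies each batch loss by `image.shape[0]` before accumulating it and divides by the dataset size at the end of the phase. Assuming `criterion` returns the mean loss over a batch (it is defined outside this section, so this is an assumption), that gives a per-sample average that stays correct when the last batch is smaller than the others. A minimal plain-Python sketch with made-up numbers; `epoch_average` is a hypothetical helper, not part of the notebook:

```python
def epoch_average(batch_means, batch_sizes):
    """Sample-weighted mean of per-batch mean losses."""
    total = sum(mean * size for mean, size in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)


# Hypothetical per-batch mean losses; the last batch is smaller
batch_means = [1.0, 1.0, 4.0]
batch_sizes = [256, 256, 128]

weighted = epoch_average(batch_means, batch_sizes)
naive = sum(batch_means) / len(batch_means)  # ignores the batch sizes

print("weighted:", weighted)
print("naive:", naive)
```

The naive mean gives the small last batch the same weight as a full one, which is why the two values differ.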

## 4. Test Model :

In [17]:
def test_model(model, params):
    # Evaluate the model on the validation loader and return its accuracy

    model = model.to(params['device']).eval()
    phase = 'validation'
    logs = {'Accuracy': 0.0}

    # Iterate over data
    for image, label in pbar(params[phase+'_loader']):
        print(label)  # Debug output: prints the labels of every batch
        image = image.to(params['device'])
        label = label.to(params['device'])

        with torch.no_grad():
            prediction = model(image)
            # Number of correct predictions in this batch
            accuracy = torch.sum(prediction.argmax(dim=1) == label).item()
            logs['Accuracy'] += accuracy

    logs['Accuracy'] /= len(params[phase+'_loader'].dataset)

    return logs['Accuracy']
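
`test_model` counts a prediction as correct when the index of the highest score in a row equals the label, then divides the running total by the dataset size. A plain-Python sketch of the same rule on made-up scores; it needs no PyTorch, and `count_correct` is a hypothetical helper used only for illustration:

```python
def count_correct(scores, labels):
    """Count rows whose highest-scoring class index equals the label."""
    predicted = [max(range(len(row)), key=row.__getitem__) for row in scores]
    return sum(int(p == y) for p, y in zip(predicted, labels))


# Made-up scores for 3 samples and 3 classes
scores = [[0.1, 2.0, -1.0],
          [1.5, 0.2, 0.3],
          [0.0, 0.1, 0.9]]
labels = [1, 2, 2]

correct = count_correct(scores, labels)
accuracy = correct / len(labels)
print(correct, accuracy)
```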

## 5. Create PyTorch model : 

In [18]:
from models.inception import inception_v3

In [19]:
model = inception_v3(pretrained=True, progress=True, device='cuda')  # I trained this specifically for CIFAR-10

## 6. Put everything together :

In [20]:
# Train on cuda if available
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print("Using", device)

Using cuda:0


In [21]:
data_params = {'path': '/home/cat_macys_vr/assignment1/cs231n/datasets/CIFAR-10__huyvnphan/',  # '/raid/data/pytorch_dataset/cifar10',
               'batch_size': 256,
               'num_workers': 0   #1 #4
              }


train_loader, validation_loader = make_dataloaders(data_params)

train_params = {'description': 'inception_v3',  # used in the checkpoint file name; matches the model created above
                'num_epochs': 600,
                'reduce_learning_rate': [200, 400],
                'learning_rate': 5e-2, 'weight_decay': 1e-3,
                'check_point': 100, 'device': device,
                'state_dict_path': 'trained_models/',
                'train_loader': train_loader,
                'validation_loader': validation_loader}

Files already downloaded and verified
Files already downloaded and verified
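
The settings above fix how many batches one validation pass runs and at which epochs the training function writes the best weights to disk. The sketch below works both out under two assumptions that are not shown in this section: the validation loader is backed by the standard 10,000-image CIFAR-10 test split without dropping the last batch, and the epoch loop is `for epoch in range(params['num_epochs'])`.

```python
import math

batch_size = 256          # data_params['batch_size']
num_validation = 10_000   # assumed: standard CIFAR-10 test split
num_batches = math.ceil(num_validation / batch_size)

num_epochs = 600          # train_params['num_epochs']
check_point = 100         # train_params['check_point']
# Same condition as the checkpoint test in the training loop
save_epochs = [epoch for epoch in range(num_epochs)
               if epoch % check_point == 0 or epoch == num_epochs - 1]

print(num_batches)
print(save_epochs)
```

Under these assumptions the batch count agrees with the 40 steps shown by the progress bar during evaluation.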


In [22]:
test_model(model, train_params)

  0%|          | 0/40 [00:00<?, ?it/s]

tensor([3, 8, 8, 0, 6, 6, 1, 6, 3, 1, 0, 9, 5, 7, 9, 8, 5, 7, 8, 6, 7, 0, 4, 9,
        5, 2, 4, 0, 9, 6, 6, 5, 4, 5, 9, 2, 4, 1, 9, 5, 4, 6, 5, 6, 0, 9, 3, 9,
        7, 6, 9, 8, 0, 3, 8, 8, 7, 7, 4, 6, 7, 3, 6, 3, 6, 2, 1, 2, 3, 7, 2, 6,
        8, 8, 0, 2, 9, 3, 3, 8, 8, 1, 1, 7, 2, 5, 2, 7, 8, 9, 0, 3, 8, 6, 4, 6,
        6, 0, 0, 7, 4, 5, 6, 3, 1, 1, 3, 6, 8, 7, 4, 0, 6, 2, 1, 3, 0, 4, 2, 7,
        8, 3, 1, 2, 8, 0, 8, 3, 5, 2, 4, 1, 8, 9, 1, 2, 9, 7, 2, 9, 6, 5, 6, 3,
        8, 7, 6, 2, 5, 2, 8, 9, 6, 0, 0, 5, 2, 9, 5, 4, 2, 1, 6, 6, 8, 4, 8, 4,
        5, 0, 9, 9, 9, 8, 9, 9, 3, 7, 5, 0, 0, 5, 2, 2, 3, 8, 6, 3, 4, 0, 5, 8,
        0, 1, 7, 2, 8, 8, 7, 8, 5, 1, 8, 7, 1, 3, 0, 5, 7, 9, 7, 4, 5, 9, 8, 0,
        7, 9, 8, 2, 7, 6, 9, 4, 3, 9, 6, 4, 7, 6, 5, 1, 5, 8, 8, 0, 4, 0, 5, 5,
        1, 1, 8, 9, 0, 3, 1, 9, 2, 2, 5, 3, 9, 9, 4, 0])


 88%|████████▊ | 35/40 [01:05<00:09,  1.87s/it]

tensor([0, 0, 5, 6, 0, 4, 3, 2, 8, 8, 0, 6, 9, 5, 2, 8, 7, 0, 6, 5, 9, 7, 2, 3,
        6, 9, 6, 2, 2, 4, 1, 0, 5, 0, 8, 9, 3, 5, 9, 3, 8, 1, 6, 3, 7, 5, 6, 2,
        0, 2, 8, 2, 8, 7, 7, 8, 1, 0, 8, 9, 7, 0, 3, 8, 0, 5, 9, 5, 8, 4, 2, 0,
        9, 2, 2, 4, 4, 9, 2, 2, 2, 5, 1, 3, 2, 0, 0, 4, 0, 6, 5, 8, 0, 5, 8, 6,
        4, 8, 5, 2, 9, 7, 9, 7, 1, 0, 1, 9, 6, 9, 2, 7, 9, 4, 4, 0, 6, 2, 4, 1,
        3, 7, 2, 8, 5, 9, 0, 3, 2, 3, 2, 7, 6, 3, 2, 5, 9, 0, 5, 9, 9, 8, 7, 7,
        4, 8, 6, 5, 2, 3, 1, 0, 4, 1, 8, 8, 4, 9, 4, 4, 3, 3, 9, 2, 0, 1, 1, 8,
        4, 4, 8, 3, 2, 9, 5, 7, 6, 2, 5, 4, 7, 3, 3, 9, 0, 1, 5, 9, 3, 7, 6, 0,
        4, 2, 2, 5, 6, 3, 8, 9, 5, 6, 1, 4, 5, 4, 6, 7, 2, 1, 0, 2, 0, 4, 9, 8,
        8, 9, 1, 1, 5, 0, 0, 8, 7, 1, 7, 4, 5, 4, 3, 3, 2, 0, 6, 6, 0, 1, 3, 9,
        8, 3, 7, 8, 9, 4, 8, 9, 0, 9, 7, 1, 6, 2, 9, 5])


 90%|█████████ | 36/40 [01:07<00:07,  1.87s/it]

tensor([6, 9, 3, 9, 8, 7, 7, 1, 6, 5, 3, 1, 3, 1, 2, 7, 1, 8, 2, 0, 9, 7, 9, 8,
        8, 6, 7, 3, 7, 1, 3, 9, 0, 9, 3, 6, 7, 2, 7, 3, 0, 5, 9, 7, 5, 5, 0, 6,
        5, 1, 8, 2, 7, 5, 9, 0, 0, 0, 8, 8, 7, 3, 7, 8, 9, 3, 7, 9, 7, 8, 7, 9,
        8, 5, 4, 8, 3, 7, 6, 3, 8, 2, 1, 9, 5, 7, 3, 9, 5, 5, 8, 7, 3, 5, 3, 5,
        9, 7, 6, 7, 3, 6, 4, 3, 9, 4, 2, 1, 9, 6, 0, 2, 6, 7, 4, 7, 9, 0, 7, 4,
        3, 5, 3, 1, 1, 2, 6, 8, 2, 1, 7, 8, 5, 9, 6, 1, 1, 5, 0, 6, 0, 9, 2, 6,
        5, 8, 9, 5, 5, 6, 2, 9, 1, 5, 8, 8, 7, 1, 7, 3, 5, 4, 9, 7, 5, 2, 9, 9,
        4, 7, 4, 1, 3, 8, 7, 9, 0, 4, 5, 7, 5, 2, 8, 7, 6, 9, 6, 9, 3, 8, 5, 6,
        6, 9, 5, 7, 8, 0, 5, 0, 7, 4, 8, 2, 5, 1, 3, 2, 2, 6, 2, 1, 7, 4, 6, 3,
        1, 3, 7, 2, 1, 3, 7, 0, 8, 4, 4, 5, 7, 9, 5, 4, 3, 9, 6, 8, 2, 3, 3, 1,
        6, 1, 7, 0, 3, 4, 2, 9, 4, 5, 8, 2, 7, 0, 9, 6])


 92%|█████████▎| 37/40 [01:09<00:05,  1.88s/it]

tensor([8, 0, 8, 2, 8, 5, 7, 7, 2, 2, 0, 0, 0, 7, 4, 1, 6, 6, 8, 8, 9, 0, 9, 0,
        1, 3, 3, 0, 9, 6, 6, 2, 6, 3, 4, 0, 8, 4, 1, 4, 0, 6, 5, 0, 9, 9, 9, 9,
        1, 2, 3, 5, 4, 2, 9, 6, 0, 9, 6, 6, 8, 0, 6, 1, 4, 6, 8, 0, 5, 4, 1, 2,
        0, 9, 6, 4, 2, 4, 6, 5, 9, 7, 7, 4, 6, 5, 0, 1, 9, 0, 3, 1, 9, 0, 9, 7,
        8, 6, 7, 6, 8, 2, 4, 5, 3, 0, 3, 2, 1, 7, 5, 9, 3, 4, 5, 7, 1, 5, 0, 1,
        1, 1, 9, 7, 5, 4, 9, 7, 8, 1, 0, 2, 8, 5, 6, 7, 0, 1, 4, 8, 4, 4, 6, 6,
        5, 8, 1, 8, 4, 6, 5, 9, 2, 2, 1, 4, 9, 1, 6, 7, 2, 0, 1, 7, 6, 5, 2, 2,
        5, 6, 0, 9, 0, 1, 5, 3, 3, 5, 8, 7, 5, 6, 5, 8, 0, 5, 9, 4, 6, 5, 1, 1,
        0, 3, 3, 9, 4, 8, 1, 7, 7, 9, 9, 4, 3, 6, 3, 2, 8, 2, 7, 6, 7, 0, 2, 1,
        2, 9, 4, 6, 9, 6, 1, 0, 1, 8, 7, 0, 0, 4, 7, 4, 2, 6, 9, 5, 9, 0, 7, 4,
        5, 8, 1, 4, 7, 9, 9, 8, 8, 6, 3, 7, 0, 8, 9, 6])


 95%|█████████▌| 38/40 [01:11<00:03,  1.87s/it]

tensor([2, 4, 6, 2, 9, 7, 4, 6, 8, 5, 6, 1, 3, 5, 9, 9, 1, 3, 2, 0, 3, 0, 2, 0,
        7, 3, 9, 3, 5, 7, 6, 5, 9, 5, 6, 1, 2, 4, 0, 2, 7, 8, 4, 4, 0, 9, 5, 9,
        2, 9, 3, 2, 4, 3, 2, 2, 8, 8, 6, 8, 1, 6, 8, 9, 8, 2, 1, 4, 9, 1, 5, 7,
        1, 6, 0, 1, 5, 2, 8, 1, 1, 3, 8, 1, 3, 1, 8, 5, 1, 9, 0, 3, 0, 4, 0, 5,
        1, 2, 0, 4, 9, 5, 1, 2, 2, 7, 8, 7, 2, 4, 5, 3, 4, 0, 6, 1, 5, 9, 8, 0,
        2, 0, 7, 6, 0, 5, 9, 7, 5, 0, 4, 6, 0, 6, 3, 6, 8, 1, 1, 8, 9, 7, 2, 9,
        0, 2, 2, 9, 0, 6, 5, 7, 7, 9, 1, 7, 9, 8, 4, 5, 0, 8, 0, 2, 5, 2, 1, 4,
        4, 8, 9, 7, 8, 3, 6, 6, 0, 1, 1, 1, 8, 1, 4, 4, 0, 7, 8, 2, 1, 2, 5, 4,
        6, 0, 5, 7, 4, 4, 3, 9, 5, 8, 8, 0, 8, 7, 4, 1, 8, 4, 9, 5, 4, 1, 7, 7,
        7, 7, 0, 3, 8, 3, 3, 0, 5, 7, 0, 8, 0, 0, 9, 2, 2, 3, 4, 8, 2, 2, 6, 3,
        3, 6, 2, 9, 4, 0, 1, 7, 5, 5, 7, 3, 0, 4, 2, 0])


100%|██████████| 40/40 [01:13<00:00,  1.35s/it]

tensor([7, 5, 8, 0, 8, 2, 7, 0, 3, 5, 3, 8, 3, 5, 1, 7])





0.9541

## Describe what you did 

In the cell below you should write an explanation of what you did, any additional features that you implemented, and/or any graphs that you made in the process of training and evaluating your network.

# Chronological Steps (what I did, in the order I did it):

### 1.
1.  First, I tried a layer of 32 5x5 convolutional filters into
    LeakyReLU with negative slope 0.2 (i.e. max(0.2x, x)) into
    flatten() into
    Linear(),
    with Adam as the optimizer.

    This 1-conv network is extremely simple (all it can learn is one FC layer on top of edge/corner features), but it still gets 55-60% accuracy.


##### Code:
 
''' 
 
model = nn.Sequential(
    nn.Conv2d( in_channels=RGB, out_channels= F1, kernel_size=HH1, stride=1, padding=2, dilation=1, groups=1, bias=True ),
    # no normalization layer yet; GroupNorm is only added in step 2
    nn.LeakyReLU(negative_slope=0.2),
    Flatten(),
    nn.Linear( in_features = fc_in_dim,    out_features = C, bias=True),
)
 
optimizer = optim.Adam(model.parameters(), lr=lr,   betas=(0.9, 0.999), eps=1e-8, weight_decay=0, amsgrad=False)

'''
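The value of fc_in_dim used by the Linear layer follows from the convolution output-size formula that appears later in this notebook, 1 + (H + 2*pad - filter) / stride. Here is a minimal sketch of that arithmetic for the step-1 network, assuming 32x32 CIFAR-10 inputs and 32 filters as set elsewhere in the notebook; the helper name conv_output_size is made up for illustration.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Output height (or width) of a convolution along one spatial dimension."""
    return 1 + (size + 2 * padding - kernel) // stride


H = W = 32   # CIFAR-10 image height and width
F1 = 32      # number of filters in the conv layer

# Step-1 network: 5x5 filters, padding 2, stride 1 keeps the spatial size unchanged.
out_h = conv_output_size(H, kernel=5, padding=2, stride=1)
out_w = conv_output_size(W, kernel=5, padding=2, stride=1)

# Flatten() turns the (F1, out_h, out_w) activation volume into one vector per image.
fc_in_dim = F1 * out_h * out_w
print(out_h, out_w, fc_in_dim)
```

The same helper shows that 3x3 filters with padding 1 also preserve the 32x32 size, which is why fc_in_dim does not change when the later steps switch filter sizes.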



### 2.
2. Second, I added GroupNorm after the first Conv layer. It's unclear whether this had a significant positive effect, but I did see validation accuracies of 60.6%, 60.1%, 60.1%, 60.4%, 61.2%, 61.3%, 61.6%, 61.6%, 60.7%, 61.3%, 60.7%, 60.2%, 60.8%, 61.1%, and 61.8%.

 
##### Code:
 
'''

model = nn.Sequential(
    nn.Conv2d( in_channels=RGB, out_channels= F1, kernel_size=HH1, stride=1, padding=2, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=1e-05, affine=True),
    nn.LeakyReLU(negative_slope=0.2),
    Flatten(),
    nn.Linear( in_features = fc_in_dim,    out_features = C, bias=True),
)
 
optimizer = optim.Adam(model.parameters(), lr=lr,   betas=(0.9, 0.999), eps=1e-8, weight_decay=0, amsgrad=False)

'''
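To make concrete what the GroupNorm layer computes, here is a small sketch that compares nn.GroupNorm against a manual per-sample, per-group normalization. The tensor shape and the choice of 8 groups are arbitrary assumptions for illustration, and affine=False is used so that only the normalization itself is compared.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

N, C, H, W = 4, 32, 8, 8   # illustrative batch; not the CIFAR-10 shapes
G = 8                      # illustrative number of groups (C must be divisible by G)
eps = 1e-5

x = torch.randn(N, C, H, W)

# Library version, without the learnable scale and shift.
gn = nn.GroupNorm(num_groups=G, num_channels=C, eps=eps, affine=False)
y = gn(x)

# Manual version: split channels into G groups and normalize each group
# of each sample by its own mean and (biased) variance.
xg = x.view(N, G, -1)
mean = xg.mean(dim=2, keepdim=True)
var = xg.var(dim=2, unbiased=False, keepdim=True)
y_manual = ((xg - mean) / torch.sqrt(var + eps)).view(N, C, H, W)

matches = torch.allclose(y, y_manual, atol=1e-5)
print(matches)
```

Note that the settings used later in this notebook have num_groups equal to num_channels (both 32), in which case every group holds a single channel and the statistics are computed per channel of each sample.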

### 3.
3. Third, I added a second Conv2d layer. Lecture 9 (CNN Architectures: https://www.youtube.com/watch?v=DAOcjicFr1Y&t=3955s) recommended, based on the success of VGG networks, that I use deeper networks built entirely from 3x3 filters.

##### Code:
''' 
model = nn.Sequential(
    nn.Conv2d( in_channels=RGB, out_channels= F1, kernel_size=HH1, stride=1, padding=2, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=1e-05, affine=True),
    nn.LeakyReLU(negative_slope=lrelu_m),
    nn.Conv2d( in_channels=F1, out_channels= F1, kernel_size=HH1, stride=1, padding=2, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=1e-05, affine=True),
    nn.LeakyReLU(negative_slope=lrelu_m),
    Flatten(),
    nn.Linear( in_features = fc_in_dim,    out_features = C, bias=True),
)

optimizer = optim.Adam(model.parameters(), lr=lr,   betas=(0.9, 0.999), eps=1e-8, weight_decay=0, amsgrad=False)


'''
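The usual argument for stacking 3x3 filters can be checked with a little arithmetic: two stride-1 3x3 layers see the same 5x5 input region as a single 5x5 layer, but with fewer weights. The sketch below assumes every layer keeps the same channel count (32 here, matching F1) and ignores biases; the function names are made up for illustration.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    # Each additional stride-1 layer widens the field by (kernel - 1) pixels.
    return 1 + num_layers * (kernel - 1)


def conv_weight_count(kernel, channels, num_layers=1):
    """Weights (no biases) in num_layers convs with `channels` inputs and outputs."""
    return num_layers * kernel * kernel * channels * channels


channels = 32  # same channel count in and out of every layer (assumption)

rf_two_3x3 = stacked_receptive_field(num_layers=2, kernel=3)
rf_one_5x5 = stacked_receptive_field(num_layers=1, kernel=5)
weights_two_3x3 = conv_weight_count(3, channels, num_layers=2)
weights_one_5x5 = conv_weight_count(5, channels, num_layers=1)

print(rf_two_3x3, rf_one_5x5)
print(weights_two_3x3, weights_one_5x5)
```

The stacked version also applies a nonlinearity between the two layers, which a single 5x5 layer cannot do.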

### 4.
4. Fourth, I simply added more convolutional layers (5 Conv2d layers in total). I thought about implementing the latest and greatest (e.g. ResNeXt), but I'll do that on a real problem, not this toy problem where we only need 70%.


##### Code:
''' 

model = nn.Sequential(
    nn.Conv2d( in_channels=RGB, out_channels= F1, kernel_size=HH1, stride=S1, padding=P1, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=eps, affine=True),
    nn.LeakyReLU(negative_slope=lrelu_m),
    nn.Conv2d( in_channels=F1 , out_channels= F1, kernel_size=HH1, stride=S1, padding=P1, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=eps, affine=True),
    nn.LeakyReLU(negative_slope=lrelu_m),
    nn.Conv2d( in_channels=F1 , out_channels= F1, kernel_size=HH1, stride=S1, padding=P1, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=eps, affine=True),
    nn.LeakyReLU(negative_slope=lrelu_m),
    nn.Conv2d( in_channels=F1 , out_channels= F1, kernel_size=HH1, stride=S1, padding=P1, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=eps, affine=True),
    nn.LeakyReLU(negative_slope=lrelu_m),
    nn.Conv2d( in_channels=F1 , out_channels= F1, kernel_size=HH1, stride=S1, padding=P1, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=eps, affine=True),
    nn.LeakyReLU(negative_slope=lrelu_m),
    Flatten(),
    nn.Linear( in_features = fc_in_dim,    out_features = C, bias=True),
)

''' 


### 5.
5. Fifth, I added a BatchNorm layer at the start of the network (right after the input and before the first Conv2d). I'll need a deeper network with more regularization to reach the 70% goal.

##### Code:
''' 

model = nn.Sequential(
    nn.BatchNorm2d(num_features=RGB,   eps=eps,   momentum=0.1, affine=True, track_running_stats=True),
    nn.Conv2d( in_channels=RGB, out_channels= F1, kernel_size=HH1, stride=S1, padding=P1, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=eps, affine=True),
    nn.LeakyReLU(negative_slope=lrelu_m),
    nn.Conv2d( in_channels=F1 , out_channels= F1, kernel_size=HH1, stride=S1, padding=P1, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=eps, affine=True),
    nn.LeakyReLU(negative_slope=lrelu_m),
    nn.Conv2d( in_channels=F1 , out_channels= F1, kernel_size=HH1, stride=S1, padding=P1, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=eps, affine=True),
    nn.LeakyReLU(negative_slope=lrelu_m),
    nn.Conv2d( in_channels=F1 , out_channels= F1, kernel_size=HH1, stride=S1, padding=P1, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=eps, affine=True),
    nn.LeakyReLU(negative_slope=lrelu_m),
    nn.Conv2d( in_channels=F1 , out_channels= F1, kernel_size=HH1, stride=S1, padding=P1, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=eps, affine=True),
    nn.LeakyReLU(negative_slope=lrelu_m),
    Flatten(),
    nn.Linear( in_features = fc_in_dim,    out_features = C, bias=True),
)

'''

### 6.
6. Sixth, I tried the ResNeXt model from torchvision.models.resnext101_32x8d() and trained it on CIFAR-10, but this did very poorly.

##### Code:

'''

#===============================================================================
# Notes from the src code:
#  (https://pytorch.org/docs/stable/_modules/torchvision/models/resnet.html#resnext50_32x4d):
#===============================================================================
def resnext101_32x8d(**kwargs):
    kwargs['groups'] = 32
    kwargs['width_per_group'] = 8
    return _resnet('resnext101_32x8d', Bottleneck, [3, 4, 23, 3],
                   pretrained=False, progress=True, **kwargs)

#===============================================================================
# called by resnext101...()
def _resnet(arch, block, layers, pretrained, progress, **kwargs):
    # block is the residual block class (Bottleneck here);
    # layers is the number of blocks in each of the four stages
    model = ResNet(block, layers, **kwargs)
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls[arch],
                                              progress=progress)
        model.load_state_dict(state_dict)
    return model

# Unfinished stub for a wrapper module around the network (never completed):
def __init__(self):
    pass

def forward(self, x):
    scores = net(x)
    return scores

'''


### 7.
7. Seventh, I grabbed an Inception v3 architecture from torchvision.models and used transfer learning to retrain it for CIFAR-10. This, finally, was successful.



## Test set -- run this only once

Now that we've gotten a result we're happy with, we test our final model (which you should store in best_model) on the test set. Think about how this compares to your validation set accuracy.

In [25]:
best_model = model
check_accuracy_part34(loader_test, best_model)

Checking accuracy on test set
Got 9541 / 10000 correct (95.41)


# Old code / other shit below:

# The real deal:

In [3]:
################################################################################
# TODO:                                                                        #         
# Experiment with any architectures, optimizers, and hyperparameters.          #
# Achieve AT LEAST 70% accuracy on the *validation set* within 10 epochs.      #
#                                                                              #
# Note that you can use the check_accuracy function to evaluate on either      #
# the test set or the validation set, by passing either loader_test or         #
# loader_val as the second argument to check_accuracy. You should not touch    #
# the test set until you have finished your architecture and  hyperparameter   #
# tuning, and only run the test set once at the end to report a final value.   #
################################################################################
model = None
optimizer = None

# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

eps=1e-05
H=input_imgs_height=32
P1  = pad1 = 1
P2  = pad2 = 1
S1  = stride1 = 1
S2  = stride2 = 1
RGB =  3
HH1 = filter_h_1       = 3
HH2 = filter_h_2       = 3
F1  = num_filters_1    = 32
# Conv output size along one dimension: 1 + (H + 2*pad - filter) // stride
HP1 = height_prime_1 = WP1 = outputs_height_after_1st_conv = 1 + (H   + 2*P1 - HH1)//S1
HP2 = WP2 = outputs_height_after_2nd_conv = width          = 1 + (HP1 + 2*P2 - HH2)//S2
fc_in_dim           = int(F1*HP1*WP1)
C = num_classes     = 10   # CIFAR-10 has 10 classes

G=num_groups=what_Yuxin_Wu_found_best_n_groups=32
lr=1e-3   # lr= 1e-3 or 5e-4   were suggested by JcJohns.
lrelu_m=Leaky_ReLU_slope=0.2 # slope
model = nn.Sequential(
    nn.BatchNorm2d(num_features=RGB,   eps=eps,   momentum=0.1, affine=True, track_running_stats=True),
    nn.Conv2d( in_channels=RGB, out_channels= F1, kernel_size=HH1, stride=S1, padding=P1, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=eps, affine=True),
    nn.LeakyReLU(negative_slope=lrelu_m),
    nn.Conv2d( in_channels=F1 , out_channels= F1, kernel_size=HH1, stride=S1, padding=P1, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=eps, affine=True),
    nn.LeakyReLU(negative_slope=lrelu_m),
    nn.Conv2d( in_channels=F1 , out_channels= F1, kernel_size=HH1, stride=S1, padding=P1, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=eps, affine=True),
    nn.LeakyReLU(negative_slope=lrelu_m),
    nn.Conv2d( in_channels=F1 , out_channels= F1, kernel_size=HH1, stride=S1, padding=P1, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=eps, affine=True),
    nn.LeakyReLU(negative_slope=lrelu_m),
    nn.Conv2d( in_channels=F1 , out_channels= F1, kernel_size=HH1, stride=S1, padding=P1, dilation=1, groups=1, bias=True ),
    nn.GroupNorm(num_groups=G, num_channels=F1, eps=eps, affine=True),
    nn.LeakyReLU(negative_slope=lrelu_m),
    Flatten(),
    nn.Linear( in_features = fc_in_dim,    out_features = C, bias=True),
)

optimizer = optim.Adam(model.parameters(), lr=lr,   betas=(0.9, 0.999), eps=1e-8, weight_decay=0, amsgrad=False)


# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                                 END OF YOUR CODE                             
################################################################################

# You should get at least 70% accuracy
train_part34(model, optimizer, epochs=10)

In [None]:
print(model.parameters()  )
for i,p in enumerate(model.parameters() ):
    print("at idx {0}   , parameters.shape = {1}".format(i,p.shape))