# Homework 2, *part 2*
### (60 points total)

In this part, you will build a convolutional neural network (CNN) to solve (yet another) image classification problem: the Tiny ImageNet dataset (200 classes, 100K training images, 10K validation images). Try to achieve as high an accuracy as possible.

**Unlike part 1**, you are now free to use the full power of PyTorch and its subpackages.

## Deliverables

* This file.
* A "checkpoint file" `"checkpoint.pth"` that contains your CNN's weights (you get them from `model.state_dict()`). Obtain it with `torch.save(..., "checkpoint.pth")`. When grading, we will load it to evaluate your accuracy.
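
The save/load round trip described above can be sketched as follows. This is a minimal sketch, assuming a tiny stand-in model (`nn.Linear`) in place of your CNN and a temporary directory in place of the notebook folder; only `state_dict()`, `torch.save`, `torch.load` and `load_state_dict` are the actual mechanics.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in for the real CNN (assumption: any nn.Module is saved the same way)
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint_path = os.path.join(tmp_dir, "checkpoint.pth")

    # Save only the weights, not the whole model object
    torch.save(model.state_dict(), checkpoint_path)

    # Rebuild the same architecture, then load the weights onto the CPU
    restored = nn.Linear(4, 2)
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    restored.load_state_dict(state_dict)

# Every parameter tensor should be identical after the round trip
weights_match = all(
    torch.equal(p, q) for p, q in zip(model.parameters(), restored.parameters())
)
print("weights restored:", weights_match)
```

Because only the weights are stored, the loading code has to construct exactly the same architecture before calling `load_state_dict`.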

**Should you decide to put your `"checkpoint.pth"` on Google Drive, update (edit) the following cell with the link to it:**

### [Dear TAs, I've put my "checkpoint.pth" on Google Drive, download it here](http://your-link-here)

## Grading

* 9 points for reproducible training code and a filled report below.
* 11 points for building a network that gets above 25% accuracy.
* 4 points for using an **interactive** (please don't reinvent the wheel with `plt.plot`) tool for viewing progress, for example Tensorboard ([with this library](https://github.com/lanpa/tensorboardX) and [an extra hack for Colab](https://stackoverflow.com/a/57791702)). In this notebook, insert screenshots of accuracy and loss plots (training and validation) over iterations/epochs/time.
* 6 points for beating each of these accuracy milestones on the private **test** set:
  * 30%
  * 34%
  * 38%
  * 42%
  * 46%
  * 50%
  
*Private test set* means that you won't be able to evaluate your model on it. Rather, after you submit code and checkpoint, we will load your model and evaluate it on that test set ourselves, reporting your accuracy in a comment to the grade.
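
The milestone arithmetic can be sketched as below. This is only an illustration: the helper name `milestone_points` is hypothetical, and it assumes that "beating" a milestone means a strictly higher test accuracy.

```python
# Accuracy milestones (in percent) and points per milestone from the grading list
MILESTONES = (30, 34, 38, 42, 46, 50)
POINTS_PER_MILESTONE = 6


def milestone_points(test_accuracy_percent):
    """Return the points earned for beating accuracy milestones.

    Assumption: "beating" means strictly greater than the milestone.
    """
    beaten = sum(1 for m in MILESTONES if test_accuracy_percent > m)
    return POINTS_PER_MILESTONE * beaten


print(milestone_points(41.3))  # beats 30, 34 and 38
print(milestone_points(51.0))  # beats all six milestones
```

Together with the 9 + 11 + 4 points from the other items, beating all six milestones gives the full 60 points.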

Note that there is an important formatting requirement; see below near "`DO_TRAIN = True`".

## Restrictions

* No pretrained networks.
* Don't enlarge images (e.g. don't resize them to $224 \times 224$ or $256 \times 256$).

## Tips

* **One change at a time**: never test several new things at once (unless you are super confident). Train a model, introduce one change, train again.
* Google a lot: try to reinvent as few wheels as possible (unlike in part 1 of this assignment).
* Use GPU.
* Use regularization: L2, batch normalization, dropout, data augmentation...
* Pay close attention to accuracy and loss graphs (e.g. in Tensorboard). Track failures early, stop bad experiments early.
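
One way to act on "stop bad experiments early" is a patience counter on the validation loss. The sketch below is an assumption about how you might do it, not part of the assignment: the class `EarlyStopper`, its parameters and the loss values are all illustrative.

```python
class EarlyStopper:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember it and reset the counter
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopper(patience=2)
val_losses = [4.56, 4.36, 4.48, 4.50, 4.41]  # illustrative numbers
for epoch, val_loss in enumerate(val_losses, start=1):
    if stopper.should_stop(val_loss):
        print("stopping after epoch", epoch)
        break
```

A small `patience` saves GPU time but can cut off runs that would recover after a learning-rate drop, so it is worth tuning together with the schedule.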

In [1]:
# Detect if we are in Google Colaboratory
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

from pathlib import Path
# Determine the locations of auxiliary libraries and datasets.
# `AUX_DATA_ROOT` is where 'notmnist.py', 'animation.py' and 'tiny-imagenet-2020.zip' are.
if IN_COLAB:
    google.colab.drive.mount("/content/drive")
    
    # Change this if you created the shortcut in a different location
    AUX_DATA_ROOT = Path("/content/drive/My Drive/Deep Learning 2020 -- Home Assignment 2")
    
    assert AUX_DATA_ROOT.is_dir(), "Did you forget to 'Add a shortcut to Drive'?"
else:
    AUX_DATA_ROOT = Path(".")

The cell below puts training and validation images in `./tiny-imagenet-200/train` and `./tiny-imagenet-200/val`:

In [2]:
# Extract the dataset into the current directory
if not Path("tiny-imagenet-200/train/class_000/00000.jpg").is_file():
    import zipfile
    with zipfile.ZipFile(AUX_DATA_ROOT / 'tiny-imagenet-2020.zip', 'r') as archive:
        archive.extractall()

**You are required** to format your notebook cells so that `Run All` on a fresh notebook:
* trains your model from scratch, if `DO_TRAIN is True`;
* loads your trained model from `"./checkpoint.pth"`, then **computes** and prints its validation accuracy, if `DO_TRAIN is False`.
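
The required control flow can be sketched as a small dispatcher. Everything here is a placeholder: `run_all` and the three callables are hypothetical names standing in for your own training, loading and evaluation code.

```python
def run_all(do_train, train_fn, load_fn, evaluate_fn):
    """Train from scratch, or load the checkpoint and print validation accuracy."""
    if do_train:
        # Training path: build and train the model from scratch
        return train_fn()
    # Evaluation path: restore weights, then compute and print the accuracy
    model = load_fn("./checkpoint.pth")
    accuracy = evaluate_fn(model)
    print("Validation accuracy: %.2f%%" % accuracy)
    return model


# Stub callables that only record which steps were executed
calls = []
run_all(
    do_train=False,
    train_fn=lambda: calls.append("train") or "model",
    load_fn=lambda path: calls.append("load " + path) or "model",
    evaluate_fn=lambda model: calls.append("evaluate") or 51.46,
)
print(calls)
```

Keeping both paths behind one flag makes it easy to check that `Run All` works on a fresh notebook in either mode.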

In [3]:
DO_TRAIN = True

## Train the model

In [28]:
%load_ext tensorboard

In [29]:
import torch
import torchvision
import torch.utils.data as data
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torchvision.datasets as datasets
import torchvision.models as models
import matplotlib.pyplot as plt
import numpy as np
import os
from torch.utils.tensorboard import SummaryWriter

In [32]:
os.makedirs("./logs", exist_ok=True)

%tensorboard --logdir {"./logs"}

Reusing TensorBoard on port 6006 (pid 5488), started 0:00:18 ago. (Use '!kill 5488' to kill it.)

In [40]:
writer = SummaryWriter(log_dir = 'logs/model')

In [41]:
import torch, time, copy, sys, os
import matplotlib.pyplot as plt
from livelossplot import PlotLosses

def train_model(output_path, model, dataloaders, dataset_sizes, criterion, optimizer, num_epochs=5, scheduler=None):
    # Directory for the artifacts of this run
    os.makedirs('models/' + str(output_path), exist_ok=True)
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch+1, num_epochs))
        print('-' * 10)

        for phase in ['train', 'val']:
            is_train = phase == 'train'
            # train() enables dropout and batch-norm updates; eval mode disables them
            model.train(is_train)
            running_loss = 0.0
            running_corrects = 0

            for i, (inputs, labels) in enumerate(dataloaders[phase]):
                inputs = inputs.to(device)
                labels = labels.to(device)

                # Track gradients only in the training phase
                with torch.set_grad_enabled(is_train):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    if is_train:
                        optimizer.zero_grad()
                        loss.backward()
                        optimizer.step()

                curr_loss = loss.item() * inputs.size(0)
                running_loss += curr_loss
                # The step keeps growing across epochs, so later epochs do not overwrite earlier points
                step = epoch * len(dataloaders[phase]) + i
                writer.add_scalar('Train loss' if is_train else 'Val loss', curr_loss, global_step=step)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]
            print(epoch_loss, epoch_acc)

            # Remember the weights with the best validation accuracy
            if not is_train and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        # Epoch-based schedulers (e.g. MultiStepLR) are stepped once per epoch, not once per phase
        if scheduler is not None:
            scheduler.step()

    # Callers assign the result, so return the model with its best weights restored
    model.load_state_dict(best_model_wts)
    return model
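
When per-batch scalars are logged with `writer.add_scalar`, the `global_step` argument is the x-axis position in TensorBoard. If only the batch index is passed, every epoch writes to the same positions. A minimal sketch of a step that keeps increasing across epochs (the helper name `global_step` is illustrative):

```python
def global_step(epoch, batch_index, batches_per_epoch):
    """Map (epoch, batch) to a single step that keeps growing across epochs."""
    return epoch * batches_per_epoch + batch_index


# With 100,000 training images and batch_size=100 there are 1,000 batches per epoch
batches_per_epoch = 100_000 // 100
print(global_step(0, 999, batches_per_epoch))  # last batch of the first epoch
print(global_step(1, 0, batches_per_epoch))    # first batch of the second epoch
```

The two printed steps are consecutive, so the loss curve continues from one epoch into the next without overlapping.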

In [42]:
data_transforms = {
    'train': transforms.Compose([
        torchvision.transforms.ColorJitter(),
        transforms.RandomResizedCrop(size=(64, 64), scale = (0.7, 1.0)),
        #transforms.CenterCrop(50),
        #transforms.Resize((64,64), interpolation=2),
        
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(35),
        transforms.ToTensor(),
    ]),
    'val': transforms.Compose([
        transforms.ToTensor(),
    ]),
}

In [43]:
image_datasets = {x: datasets.ImageFolder(os.path.join('tiny-imagenet-200', x), data_transforms[x]) for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=100, shuffle=True, num_workers=64)
              for x in ['train', 'val']}

dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}

In [44]:
dataset_sizes

{'train': 100000, 'val': 10000}

In [None]:
image_datasets = {x: datasets.ImageFolder(os.path.join('tiny-imagenet-200', x), data_transforms[x]) for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=100, shuffle=True, num_workers=64)
              for x in ['train', 'val']}

dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}

model_ft = models.resnet18()
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 200)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_ft = model_ft.to(device)

criterion = nn.CrossEntropyLoss()
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.05, momentum=0.9, weight_decay=0.005, nesterov = True)
exp_lr_scheduler = lr_scheduler.MultiStepLR(optimizer_ft, milestones=[25, 39], gamma=0.1)

model_ft = train_model('model_rot_mult_norm_reg', model_ft, dataloaders, dataset_sizes, criterion, optimizer_ft, 50, exp_lr_scheduler)

Epoch 1/50
----------
4.68878015422821 tensor(0.0553, device='cuda:0', dtype=torch.float64)
4.5615658187866215 tensor(0.0649, device='cuda:0', dtype=torch.float64)
Epoch 2/50
----------
4.346771195650101 tensor(0.0900, device='cuda:0', dtype=torch.float64)
4.362562050819397 tensor(0.0891, device='cuda:0', dtype=torch.float64)
Epoch 3/50
----------
4.276687040090561 tensor(0.0995, device='cuda:0', dtype=torch.float64)
4.483496618270874 tensor(0.0718, device='cuda:0', dtype=torch.float64)
Epoch 4/50
----------
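
The step schedule configured above (`lr=0.05`, `milestones=[25, 39]`, `gamma=0.1`) can be worked out by hand. The sketch below is a pure-Python re-derivation, not a call into PyTorch; it assumes `scheduler.step()` is called exactly once per epoch, and the helper name `multistep_lr` is illustrative.

```python
def multistep_lr(initial_lr, milestones, gamma, epoch):
    """Learning rate after `epoch` scheduler steps under a MultiStepLR-style schedule."""
    # Each milestone that has been reached multiplies the rate by gamma once
    passed = sum(1 for m in milestones if epoch >= m)
    return initial_lr * gamma ** passed


for epoch in (0, 24, 25, 38, 39, 49):
    print(epoch, multistep_lr(0.05, [25, 39], 0.1, epoch))
```

Under that assumption the rate stays at 0.05 for the first 25 epochs, drops by a factor of 10 at epoch 25, and by another factor of 10 at epoch 39.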


In [None]:
if DO_TRAIN:
    # Your code here (train your model)
    # etc.
    pass

## Load and evaluate the model

In [4]:
import torch
import torchvision
import torch.utils.data as data
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torchvision.datasets as datasets
import torchvision.models as models
import matplotlib.pyplot as plt
import numpy as np

In [None]:
%load_ext tensorboard
from torch.utils.tensorboard import SummaryWriter

In [None]:
import os
logs_base_dir = "./logs"
os.makedirs(logs_base_dir, exist_ok=True)
%tensorboard --logdir {logs_base_dir}

In [8]:
image_datasets = {x: datasets.ImageFolder(os.path.join('tiny-imagenet-200', x),
                                          data_transforms[x]) for x in ['train', 'val']}

In [9]:
img_eval = torch.utils.data.DataLoader(image_datasets['val'], batch_size=100,
                                             shuffle=True, num_workers=64)

In [11]:
# Your code here (load the model from "./checkpoint.pth")
# Please use `torch.load("checkpoint.pth", map_location='cpu')`

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# The architecture must match the one the checkpoint was trained with
model = models.densenet121()
num = model.classifier.in_features
model.classifier = nn.Linear(num, 200)
model.load_state_dict(torch.load("checkpoint.pth", map_location='cpu'))
model = model.to(device)
model.eval()
val_sum = 0
# Gradients are not needed for evaluation
with torch.no_grad():
    for img, labels in img_eval:
        img = img.to(device)
        labels = labels.to(device)
        _, pred_labels = torch.max(model(img), 1)
        val_sum += torch.sum(pred_labels == labels)

# Accuracy in percent over the whole validation set
val_accuracy = (val_sum.double() / len(image_datasets['val'])) * 100


print(val_accuracy)

tensor(51.4600, device='cuda:0', dtype=torch.float64)


In [13]:
#val_accuracy = # Your code here
assert 0 <= val_accuracy <= 100
print("Validation accuracy: %.2f%%" % val_accuracy)

Validation accuracy: 51.46%
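
The accuracy computation can also be wrapped in a reusable function. The following is a minimal sketch rather than the graders' evaluation code: the function name `evaluate_accuracy` and the toy data are assumptions, and `nn.Identity` stands in for a trained classifier.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_accuracy(model, loader, device):
    """Return top-1 accuracy in percent over all samples in `loader`."""
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total


# Toy check: an identity "classifier" on one-hot inputs (hypothetical data)
inputs = torch.eye(4)
labels = torch.tensor([0, 1, 2, 0])  # the last label is deliberately wrong
loader = DataLoader(TensorDataset(inputs, labels), batch_size=2)
accuracy = evaluate_accuracy(nn.Identity(), loader, torch.device("cpu"))
print("Validation accuracy: %.2f%%" % accuracy)
```

Counting samples with `labels.size(0)` instead of hard-coding the dataset size keeps the result correct if the validation set or the batch size changes.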


# Report

Below, please mention:

* A brief history of tweaks and improvements.
* Which network architectures have you tried? What is the final one and why?
* What is the training method (batch size, optimization algorithm, number of iterations, ...) and why?
* Which techniques have you tried to prevent overfitting? What were their effects? Which of them worked well?
* Any other insights you learned.

For example, start with:

"I have analyzed these and those conference papers/sources/blog posts. \
I tried this and that to adapt them to my problem. \
The conclusions this task taught me are ..."

In [None]:

if DO_TRAIN:
    from tqdm import tqdm  # progress bar used in the training loop below

    n_epochs = 25

    for epoch in range(n_epochs):
        model.train()
        n_iters = 0
        batch_losses_train = []
        batch_losses_val = []
        correct_train = 0
        correct_val = 0
        total_val = 0
        total_train = 0
        for batch in tqdm(train_dataloader):
            # unpack batch
            image_batch, labels = batch
            image_batch, labels = image_batch.to(device), labels.to(device)

            # forward
            pred_batch = model(image_batch)

            loss = criterion(pred_batch, labels)
            batch_losses_train.append(loss.item())

            # optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            lr_scheduler.step(loss)
            

            # dump statistics
            writer.add_scalar("train/loss", loss.item(), global_step=epoch * len(train_dataloader) + n_iters)
            

            #if n_iters % 50 == 0:
            #    writer.add_image('train/photo_image', torchvision.utils.make_grid(photo_image_batch) * 0.5 + 0.5, epoch * len(train_dataloader) + n_iters)
            #    writer.add_image('train/map_image_pred', torchvision.utils.make_grid(map_image_pred_batch), epoch * len(train_dataloader) + n_iters)
            #    writer.add_image('train/map_image_gt', torchvision.utils.make_grid(map_image_batch), epoch * len(train_dataloader) + n_iters)

            n_iters += 1
        
        # your code here
        # Evaluate on the validation set, then on the training set, without gradients
        model.eval()
        n_iters = 0
        with torch.no_grad():
            for batch in val_dataloader:
                image_batch, labels = batch
                image_batch, labels = image_batch.to(device), labels.to(device)

                pred_batch = model(image_batch)
                _, predicted = torch.max(pred_batch, 1)
                # Validation statistics go into the validation counters
                total_val += labels.size(0)
                correct_val += (predicted == labels).sum().item()

                loss = criterion(pred_batch, labels)
                batch_losses_val.append(loss.item())

            for batch in train_dataloader:
                image_batch, labels = batch
                image_batch, labels = image_batch.to(device), labels.to(device)

                pred_batch = model(image_batch)
                _, predicted = torch.max(pred_batch, 1)
                # Training statistics go into the training counters
                total_train += labels.size(0)
                correct_train += (predicted == labels).sum().item()

            #if n_iters < 5:
            #    writer.add_image(f'val_{n_iters}/photo_image', torchvision.utils.make_grid(photo_image_batch) * 0.5 + 0.5, 
            #                     epoch * len(valid_dataloader) + n_iters)
            #    writer.add_image(f'val_{n_iters}/map_image_pred', torchvision.utils.make_grid(map_image_pred_batch), 
            #                     epoch * epoch * len(valid_dataloader) + n_iters)
            #    writer.add_image(f'val_{n_iters}/map_image_gt', torchvision.utils.make_grid(map_image_batch), 
            #                    epoch * epoch * len(valid_dataloader) + n_iters)
            n_iters += 1
        #print(batch_losses_train)
        #print(batch_losses_val)
        loss_averaged_train = np.mean(batch_losses_train)
        loss_averaged_val = np.mean(batch_losses_val)
        accuracy_train = 100*(correct_train/total_train)
        accuracy_val = 100*(correct_val/total_val)
        writer.add_scalar('val/loss_averaged', loss_averaged_val.item(), epoch)
        writer.add_scalar('train/loss_averaged', loss_averaged_train.item(), epoch)
        writer.add_scalar('val/accuracy', accuracy_val, epoch)
        writer.add_scalar('train/accuracy', accuracy_train, epoch)

        print("Epoch {} out of {} done.".format(epoch+1, n_epochs))
        print(f"Train accuracy: {accuracy_train}  Train loss: {loss_averaged_train}")
        print(f"Val accuracy: {accuracy_val}    Val loss: {loss_averaged_val}")
        print()


BATCH_SIZE = 256

device = torch.device('cuda:5')

transform = transforms.Compose([
    transforms.ToTensor(),
    #transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
# `DATASET_ROOT` must be defined beforehand and end with a path separator
train_dataset = torchvision.datasets.ImageFolder(DATASET_ROOT + 'train', transform=transform)
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=8)
val_dataset   = torchvision.datasets.ImageFolder(DATASET_ROOT + 'val', transform=transform)
val_dataloader = torch.utils.data.DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=8)

In [1]:
!pip install livelossplot

Collecting livelossplot
  Downloading livelossplot-0.5.0-py3-none-any.whl (26 kB)
Installing collected packages: livelossplot
Successfully installed livelossplot-0.5.0



In [2]:
import copy
import os
import sys
import time

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as data
import torchvision
import torchvision.datasets as datasets
import torchvision.models as models
import torchvision.transforms as transforms
from livelossplot import PlotLosses
from torch.optim import lr_scheduler

%matplotlib inline


def train_model(output_path, model, dataloaders, dataset_sizes, criterion, optimizer, num_epochs=5, scheduler=None):
    if not os.path.exists('models/'+str(output_path)):
        os.makedirs('models/'+str(output_path))
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    since = time.time()
    liveloss = PlotLosses()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    best = 0
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch+1, num_epochs))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for i,(inputs, labels) in enumerate(dataloaders[phase]):
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                        # Per-batch stepping suits iteration-based schedulers such as CyclicLR
                        if scheduler is not None:
                            scheduler.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
                print("\rIteration: {}/{}, Loss: {}.".format(i+1, len(dataloaders[phase]), loss.item() * inputs.size(0)), end="")

#                 print( (i+1)*100. / len(dataloaders[phase]), "% Complete" )
                sys.stdout.flush()
                
            #if scheduler != None:
            #    scheduler.step()  
            
            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]
            if phase == 'train':
                avg_loss = epoch_loss
                t_acc = epoch_acc
            else:
                val_loss = epoch_loss
                val_acc = epoch_acc
            
#             print('{} Loss: {:.4f} Acc: {:.4f}'.format(
#                 phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best = epoch + 1
                best_model_wts = copy.deepcopy(model.state_dict())
                
        liveloss.update({
            'log loss': avg_loss,
            'val_log loss': val_loss,
            'accuracy': t_acc,
            'val_accuracy': val_acc
        })
                
        liveloss.draw()
        print('Train Loss: {:.4f} Acc: {:.4f}'.format(avg_loss, t_acc))
        print(  'Val Loss: {:.4f} Acc: {:.4f}'.format(val_loss, val_acc))
        print()
        torch.save(model.state_dict(), './models/' + str(output_path) + '/model_{}_epoch.pt'.format(epoch+1))
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
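    print('Best Validation Accuracy: {}, Epoch: {}'.format(best_acc, best))

    # Callers assign the result, so return the model with its best validation weights restored
    model.load_state_dict(best_model_wts)
    return model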

In [7]:

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(size=(64, 64), scale = (0.6, 1.0)),
        #transforms.CenterCrop(50),
        #transforms.Resize((64,64), interpolation=2),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(45),
        transforms.ToTensor(),
    ]),
    'val': transforms.Compose([
        transforms.ToTensor(),
    ]),
}

data_dir = 'tiny-imagenet-200'

image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=100,
                                             shuffle=True)
              for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}

#Load Resnet18
model_ft = models.resnet18()
#Adapt the pooling and the final layer to 64x64 inputs and 200 classes (no pretrained weights are loaded)
model_ft.avgpool = nn.AdaptiveAvgPool2d(1)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 200)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_ft = model_ft.to(device)
#Multi GPU
#model_ft = torch.nn.DataParallel(model_ft, device_ids=[0, 1])

#Loss Function
criterion = nn.CrossEntropyLoss()
# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.05, momentum=0.9, weight_decay=0.003, nesterov = True)

# Learning-rate schedule: only one scheduler should be attached to the optimizer,
# so the alternatives that were tried are left commented out
#exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=25, gamma=0.1)
#exp_lr_scheduler = lr_scheduler.MultiStepLR(optimizer_ft, milestones=[25, 39], gamma=0.1)
exp_lr_scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer_ft, base_lr=0.001, max_lr=0.05, step_size_up=3000)

#Train
model_ft = train_model('model_rot_mult_norm',model_ft, dataloaders, dataset_sizes, criterion, optimizer_ft, 50, exp_lr_scheduler)

Traceback (most recent call last):
  File "C:\Users\dmvyp\Anaconda5\lib\multiprocessing\queues.py", line 230, in _feed
    close()
  File "C:\Users\dmvyp\Anaconda5\lib\multiprocessing\connection.py", line 177, in close
    self._close()
  File "C:\Users\dmvyp\Anaconda5\lib\multiprocessing\connection.py", line 277, in _close
    _CloseHandle(self._handle)
OSError: [WinError 6] The handle is invalid


Epoch 1/50
----------
Iteration: 7/1000, Loss: 543.1194305419922.

KeyboardInterrupt: 

In [8]:
#Load Resnet18
model_ft = models.resnet18()
#Adapt the pooling and the final layer to 64x64 inputs and 200 classes (no pretrained weights are loaded)
model_ft.avgpool = nn.AdaptiveAvgPool2d(1)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 200)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_ft = model_ft.to(device)
#Multi GPU
#model_ft = torch.nn.DataParallel(model_ft, device_ids=[0, 1])

#Loss Function
criterion = nn.CrossEntropyLoss()
# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.05, momentum=0.9, weight_decay=0.003, nesterov = True)

# Cycle the LR between base_lr and max_lr; the scheduler is stepped every batch inside train_model
#exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=25, gamma=0.1)
#exp_lr_scheduler = lr_scheduler.MultiStepLR(optimizer_ft, milestones=[25, 39], gamma=0.1)
exp_lr_scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer_ft, base_lr=0.001, max_lr=0.05,step_size_up=3000, mode = 'triangular2')

#Train
model_ft = train_model('model_rot_mult_norm',model_ft, dataloaders, dataset_sizes, criterion, optimizer_ft, 50, exp_lr_scheduler)

Epoch 1/50
----------
Iteration: 203/1000, Loss: 498.73080253601074.

KeyboardInterrupt: 
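
To see what the `triangular2` cyclic schedule configured above does, the learning rate can be computed by hand. The sketch below is an independent pure-Python version of the triangular policy, not a call into PyTorch; it assumes a symmetric cycle (the down phase is as long as `step_size_up`) and the helper name `triangular2_lr` is illustrative.

```python
import math


def triangular2_lr(iteration, base_lr, max_lr, step_size):
    """Learning rate of a symmetric 'triangular2' cyclic schedule at `iteration`."""
    # Index of the current cycle, starting at 1; one cycle is 2 * step_size iterations
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Distance from the peak of the current cycle, in units of step_size
    x = abs(iteration / step_size - 2 * cycle + 1)
    # The amplitude halves with every completed cycle
    scale = 1 / (2 ** (cycle - 1))
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x) * scale


for it in (0, 3000, 6000, 9000):
    print(it, triangular2_lr(it, base_lr=0.001, max_lr=0.05, step_size=3000))
```

With 1000 iterations per epoch, as in the log above, one full cycle of 6000 iterations spans 6 epochs: the rate climbs from 0.001 to 0.05 over the first 3 epochs, returns to 0.001 over the next 3, and the second peak is only half as high.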