# ***Important***

**Before starting, make sure to read the [Assignment Instructions](https://courseworks2.columbia.edu/courses/172081/pages/assignment-instructions) page on Courseworks2 to learn the workflow for completing this project.**

<ins>Important change to the assignment instructions since Project 1:</ins> When you are ready to submit, use "File -> Save and pin revision" and name the revision "Grade me". We will only grade the revision with exactly this name, and late days will be applied according to it.

# Introduction

This project aims to demonstrate how neural networks can be used in a robotics setting. We will continue using the 2D maze environment introduced in Project 1 and learn to navigate an agent to a goal. However, since neural networks can be more powerful models than the ones we had access to previously, we can afford to make some changes to the 2D maze environment and make the problem more difficult. This project consists of three parts. In Part I, you will train a simple DNN that takes the agent position and goal position as input. In Parts II and III, you will train CNNs that take as input an image of the environment, with the agent and the goal depicted on it.

<div>
<img src="https://github.com/roamlab/robot-learning-S2023/blob/main/project2/imgs/P1_side.png?raw=true" width="300"/>
</div>

The image above shows the simulation world. The "robot" (also called "agent") is shown by the green dot. The goal location is shown by the red square. The agent is required to navigate to the goal. **Unlike the previous project, the robot and the goal are spawned at random positions in the maze.** Also, the action space now contains all four directions: 'up', 'down', 'left' and 'right'. Another change is that, in addition to the obstacle map shown above, we introduce two new obstacle maps as shown below. However, these new maps will not be used until Part III.

<div>
<img src="https://github.com/roamlab/robot-learning-S2023/blob/main/project2/imgs/map1.png?raw=true" width="300"/>
<img src="https://github.com/roamlab/robot-learning-S2023/blob/main/project2/imgs/map2.png?raw=true" width="300"/>
<img src="https://github.com/roamlab/robot-learning-S2023/blob/main/project2/imgs/map3.png?raw=true" width="300"/>
</div>

We want to learn to navigate the agent by imitating demonstrations from an expert user. In all three parts, you will be using data collected by a human controlling the agent via a keyboard for training.

# Part 0. Project Setup

**Important: You need to follow these steps so that Colab can access the required data files and python source files.**

1. Download the zip file from the Google Drive link: https://drive.google.com/file/d/1uMY8x2kjW86d2VD21SVsTWtVa-zm_zSb/view?usp=share_link. Make sure you are logged into your LionMail account.
2. Click on the "Files" icon in the left sidebar. Then choose "Upload to session storage". Upload the zip file downloaded in the first step to Colab.
<div>
<img src="https://github.com/roamlab/robot-learning-S2023/blob/main/project2/imgs/guide_1.png?raw=true" width="300"/>
<img src="https://github.com/roamlab/robot-learning-S2023/blob/main/project2/imgs/guide_2.png?raw=true" width="300"/>
</div>
3. Run the cell below to unzip the zip file.

*You only need to download the zip file to your local machine (step 1) ONCE. But you need to do step 2 and step 3 EVERY TIME you connect to Colab.*

In [None]:
# DO NOT CHANGE

# Make sure you have successfully uploaded the zip file to Colab before running the line below.
# Unzip the uploaded zip file
!unzip project2.zip -d /content/

Archive:  project2.zip
   creating: /content/mjcf/
  inflating: /content/mjcf/point_mass.xml  
   creating: /content/mjcf/common/
  inflating: /content/mjcf/common/skybox.xml  
  inflating: /content/mjcf/common/visual.xml  
  inflating: /content/mjcf/common/materials.xml  
  inflating: /content/mjcf/test_mjcf.xml  
  inflating: /content/dnn.py         
   creating: /content/imgs/
  inflating: /content/imgs/P1_side.png  
  inflating: /content/imgs/map1.png  
  inflating: /content/imgs/map3.png  
  inflating: /content/imgs/map2.png  
  inflating: /content/score_policy.py  
  inflating: /content/simple_maze.py  
  inflating: /content/data_utils.py  
   creating: /content/data/
  inflating: /content/data/map1.pkl  
  inflating: /content/data/all_maps.pkl  


In [None]:
# DO NOT CHANGE

# Install required packages
!pip install pybullet==2.6.6 numpngw

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pybullet==2.6.6
  Downloading pybullet-2.6.6.tar.gz (89.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89.2/89.2 MB 5.9 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting numpngw
  Downloading numpngw-0.1.2-py3-none-any.whl (21 kB)
Building wheels for collected packages: pybullet
  Building wheel for pybullet (setup.py) ... done
  Created wheel for pybullet: filename=pybullet-2.6.6-cp39-cp39-linux_x86_64.whl size=103261505 sha256=347b8a6a922b45cb214262315bd4ef73d329bc450cad40fa477629c7b74d733f
  Stored in directory: /root/.cache/pip/wheels/7e/ab/14/9235a3e8e4f3c31dcec4ea48039fc54139a3d05f3281c2dcfa
Successfully built pybullet
Installing collected packages: pybullet, numpngw
Successfully installed numpngw-0.1.2 pybullet-2.6.6


# Part I. Behavioral cloning with low dimensional data

This part is a natural extension of Part II in Project 1.

Learning the agent's policy here is the familiar classification problem, given that you will be provided with labeled examples from an expert. Each labeled example $i$ will contain a tuple of the form $(o, a)^i$, where $o$ represents an observation and $a$ represents the action taken by the expert given that observation. You must simply learn to imitate the expert, a process also known as behavioral cloning. While the action space is the same in all parts of the project, the observation space will be different.

We will be training a DNN policy to predict the action to be taken ('up', 'down', 'left', or 'right') based on the observation. **In Part I, the observation will contain the agent position and the current goal position.** (Because the goal is sampled randomly, the policy has to know the current goal to be reached.) The environment thus returns an observation array of shape (4, ), where the agent position is contained in the first two entries and the current goal position in the last two. **In Part I, the map that the robot is navigating is always the same.**
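
As a small illustration of this layout, the sketch below unpacks a made-up observation. The numeric values and variable names are hypothetical; only the (4, ) shape and the ordering of agent and goal positions come from the description above.

```python
import numpy as np

# Hypothetical observation: the values are made up for illustration only.
obs = np.array([0.1, -0.3, 0.4, 0.2])

agent_pos = obs[:2]   # first two entries: agent position
goal_pos = obs[2:]    # last two entries: goal position

# Offset from the agent to the goal. The policy has to infer a good action
# from this kind of information together with the fixed obstacle layout.
offset = goal_pos - agent_pos
print(agent_pos, goal_pos, offset)
```

A DNN policy receives the full 4-element array as its input features, so no manual splitting is required for training; the split is shown only to make the ordering explicit.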

PyTorch and TensorFlow are two popular frameworks for building and training neural networks. For this class, we will be using PyTorch exclusively, and you are allowed to use any of its features. A good starting point can be found [here](https://github.com/roamlab/robot-learning-S2023/blob/main/project2/dnn.py).

You will implement a class that inherits from `RobotPolicy` by providing implementations for the abstract methods from the class.



In [None]:
# DO NOT CHANGE
# base class

import abc


class RobotPolicy(abc.ABC):

    @abc.abstractmethod
    def train(self, data):
        """
            Abstract method for training a policy.

            Args:
                data: a dict that contains X (key = 'obs') and y (key = 'actions').

                X is either rgb image (N, 64, 64, 3) OR  agent & goal pos (N, 4)

            Returns:
                This method does not return anything. It will just need to update the
                property of a RobotPolicy instance.
        """

    @abc.abstractmethod
    def get_action(self, obs):
        """
            Abstract method for getting action. You can do data preprocessing and feed
            forward of your trained model here.
            Args:
                obs: an observation (64 x 64 x 3) rgb image OR (4, ) positions

            Returns:
                action: an integer between 0 to 3
        """

In [None]:
# Implement your solution for Part 1 below
import torch
torch.manual_seed(0)
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from torch.utils.data.dataset import Dataset
from torch.utils.data import DataLoader

class Project2DNN(nn.Module):
    def __init__(self, input_dim):
        super(Project2DNN, self).__init__()
        self.fc1 = nn.Linear(input_dim, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 4)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def predict(self, features):
        """
        Receives a numpy array, converts it to a torch tensor, and returns the
        network's output scores as a torch tensor.
        """
        self.eval()  # Sets network in eval mode (vs training mode)
        features = torch.from_numpy(features).float()
        with torch.no_grad():  # No gradients are needed for inference
            return self.forward(features)

class MyDataset(Dataset):
    def __init__(self, labels, features):
        super(MyDataset, self).__init__()
        self.labels = labels
        self.features = features

    def __len__(self):
        return self.features.shape[0]

    def __getitem__(self, idx):
        # Tells torch how to extract a single datapoint: the DataLoader shuffles
        # indices and needs a way to fetch the idx-th datapoint.
        feature = self.features[idx]
        label = self.labels[idx]
        return {'feature': feature, 'label': label}

class POSBCRobot(RobotPolicy):

    def train(self, data):

        #for key, val in data.items():
        #    print(key, val.shape)
        #print("Using dummy solution for POSBCRobot")
        #pass
      self.network = Project2DNN(4)
      self.learning_rate = .01
      self.optimizer = torch.optim.Adam(self.network.parameters(), lr=self.learning_rate)
      self.criterion = nn.CrossEntropyLoss()
      self.num_epochs = 500
      self.batchsize = 25
      self.shuffle = True

      total_loss = 0.0
      for key, val in data.items():
        if(key == 'actions'):
          labels = val
        elif(key =='obs'):
          features = val

      #labels = np.asarray(np.reshape(label, (label.shape[0], 1)))
      #features = np.asarray(feature)

      #print(labels.shape, features.shape)

      dataset = MyDataset(labels, features)
      loader = DataLoader(dataset, shuffle=self.shuffle, batch_size = self.batchsize)

      for epoch in range(self.num_epochs):
        total_loss = 0.0
        for i, data in enumerate(loader):
          features = data['feature'].float()
          labels = data['label']
          self.optimizer.zero_grad()
          predictions = self.network(features)
          loss = self.criterion(predictions, labels)
          loss.backward()
          total_loss +=loss.item()
          self.optimizer.step()
        print('loss', total_loss/i)

    def get_action(self, obs):
        return torch.argmax(self.network.predict(obs)).item()  # plain Python int in 0..3

## Evaluation and Grading

We will evaluate your model by simply having the agent follow the actions that it outputs. We will evaluate for 100 different randomly sampled starting positions and goals. For each goal, we roll out the trained policy for 50 steps and record the closest distance to the goal that the agent reached. If the agent comes within a distance of 0.1 of the goal, the episode ends before 50 steps and the minimum distance is recorded as 0. The score is the fraction of the initial distance to the goal covered by the agent, averaged over the 100 trials. Your final grade will be computed based on this score.

We will calculate the score using the formula :

```score = avg[(init_dist -  min_dist) / init_dist]```

We will auto-generate your grades using the code below. Each part is graded separately, so you can see your grade as soon as that part is finished.

This assignment is worth 15 points in total. Reflecting the difficulty of each part, Parts 1, 2, and 3 are worth 4, 5, and 6 points, respectively.

- Part 1: if your score >= 0.99, you will receive 4 / 4. Otherwise, your final grade will be score / 0.99 * 4.
- Part 2: if your score >= 0.95, you will receive 5 / 5. Otherwise, your final grade will be score / 0.95 * 5.
- Part 3: if your score >= 0.95, you will receive 6 / 6. Otherwise, your final grade will be score / 0.95 * 6.
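
The score formula and the grading rule above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `trial_score` and `grade` and the distances are made up, and the actual computation lives in the provided `score_policy` module.

```python
def trial_score(init_dist, min_dist):
    """Fraction of the initial distance to the goal covered in one trial."""
    return (init_dist - min_dist) / init_dist

def grade(score, bound, max_points):
    """Full points at or above the bound, otherwise scaled linearly."""
    return max_points if score >= bound else score / bound * max_points

# Made-up (init_dist, min_dist) pairs for three trials; the real evaluation uses 100.
# A min_dist of 0.0 means the agent came within 0.1 of the goal.
trials = [(1.0, 0.0), (0.8, 0.2), (0.5, 0.5)]

score = sum(trial_score(i, m) for i, m in trials) / len(trials)
print(f"score = {score:.3f}")
print(f"Part 1 grade = {grade(score, 0.99, 4):.2f} / 4")
```

Note that a trial in which the agent never gets closer than its starting distance contributes 0 to the average, so a few stuck episodes can lower the score noticeably.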

The score function for each part provides two extra arguments to assist your debugging.

- gui: If this is set to True, you will save the behavior of the agents during evaluation as an animation file. This animation file can be visualized using the provided code below to help you understand the behavior of the agent. **Please set it to False before your submission as it will slow down evaluation.**
- model: If you provide a path to a saved model, the score function will not train from scratch but will instead load the saved model. **Please set it to None before submission.** Any models you generate during runtime will be automatically deleted when the runtime disconnects. The grader will train the model from scratch.

In [None]:
# DO NOT CHANGE
# Set up grading

import score_policy
import importlib
importlib.reload(score_policy)
from IPython.display import Image


part1_bound = 0.99
part2_bound = 0.95
part3_bound = 0.95

In [None]:
# DO NOT CHANGE
# Getting the score and grade for Part 1

score1 = score_policy.score_pos_bc(policy=POSBCRobot(), gui=False, model=None)
grade1 = score1 / part1_bound * 4 if score1 < part1_bound else 4

print('\n---')
print(f'Part 1 Score: {score1}')
print(f'Part 1 Grade: {score1:.2f} / {part1_bound:.2f} * 4 = {grade1:.2f}')

loss 0.736521785645365
loss 0.49338911539353664
loss 0.4606442698892557
loss 0.421215554453292
loss 0.4023906680398017
loss 0.3795953232256122
loss 0.37085489118061726
loss 0.36060934162364816
loss 0.344667043232318
loss 0.34610863250194107
loss 0.3422666661581903
loss 0.3183236955050028
loss 0.3146117433155858
loss 0.29607906668159945
loss 0.2986305573217149
loss 0.2761607276006315
loss 0.2775510223406666
loss 0.2819247030237186
loss 0.26799461988251916
loss 0.2573349194940906
loss 0.2586349053952679
loss 0.265463031804974
loss 0.2629578020892241
loss 0.2543383500532909
loss 0.251150323933213
loss 0.2507862256336137
loss 0.23652164202253773
loss 0.24097880600430305
loss 0.24595440333744265
loss 0.24304281688242588
loss 0.23400198650669377
loss 0.24410756860139235
loss 0.2290019559316665
loss 0.23413468480297606
loss 0.22162815942516867
loss 0.22599674639461925
loss 0.2393972431807398
loss 0.2412192085230688
loss 0.2237443505265053
loss 0.21470621359423273
loss 0.21166275504906223
loss

In [None]:
# Optionally, uncomment and run the code below if you have saved an animation (gui = True) that you want to visualize.

# Image(filename='part_1_anim.png', width=200, height=200)

# Part II. Behavioral cloning with visual observations

In this part, you are asked to do a similar task as in Part I, **but the observations will be RGB image observations of the world**, similar to the ones you used to do localization in Part III of Project 1. To process the RGB image, you will implement a CNN using PyTorch. [The official PyTorch tutorial](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) is a good starting point. As in Part I, the map that the robot is navigating is always the same. **This means that your model really only has to learn how to figure out where the robot and the goal are located, and how to navigate around a fixed set of obstacles.**

All requirements for your code, as well as the evaluation method, are unchanged compared to Part I. The only difference is the nature of the observation that is provided to you.
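
Because PyTorch's `nn.Conv2d` expects channels-first input of shape (N, C, H, W) while the image observations here are channels-last, (N, 64, 64, 3), the batch has to be rearranged before it reaches the first convolution. The sketch below shows one way to do this on a tiny stand-in array (the array and variable names are hypothetical). It also shows why the axes should be permuted with `np.transpose` rather than `np.reshape`: a reshape produces the right shape but keeps the memory order, which mixes values from different channels.

```python
import numpy as np

# Tiny stand-in "image batch" in channels-last layout (N, H, W, C) = (1, 2, 2, 3).
# Each pixel holds three consecutive numbers, e.g. pixel (0, 0) is [0, 1, 2].
batch_nhwc = np.arange(12).reshape(1, 2, 2, 3)

# Permute the axes to channels-first (N, C, H, W).
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))

# Channel 0 now holds the first value of every pixel, in image order.
print(batch_nchw[0, 0])   # [[0 3]
                          #  [6 9]]

# np.reshape gives the same shape but does not move the channel axis:
# "channel 0" ends up holding values that belong to different channels.
reshaped = np.reshape(batch_nhwc, (1, 3, 2, 2))
print(reshaped[0, 0])     # [[0 1]
                          #  [2 3]]
```

The same `np.transpose` call applies to the real observations of shape (N, 64, 64, 3); on a torch tensor, `tensor.permute(0, 3, 1, 2)` performs the equivalent axis swap.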

In [None]:
# Implement your solution for Part 2 below

import torch
torch.manual_seed(0)
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
from torch.utils.data.dataset import Dataset
from torch.utils.data import DataLoader


class Net(nn.Module):
    def __init__(self):
        super().__init__()


        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        #self.drop = nn.Dropout2d(p=0.2)
        #self.conv3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)
        #self.conv4 = nn.Conv2d(in_channels=64, out_channels=32, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(32*16*16, 64)
        self.fc2 = nn.Linear(64, 4)

    def forward(self, x):

        #print("layer1")
        #print(x.shape)
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        #print("layer2")
        #print(x.shape)
        x = F.relu(self.conv2(x))
        #print("layer3")
        #print(x.shape)
        x = self.pool(x)
        #x = F.relu(self.conv3(x))
        #print("layer4")
        #print(x.shape)
        #x = self.pool(x)
        #x = F.relu(self.conv4(x))
        #print("layer5")
        #print(x.shape)
        #x = torch.flatten(x, 1)
        x = x.view(-1, 32*16*16)
        x = F.relu(self.fc1(x))
        #print("layer6")
        #print(x.shape)
        x = self.fc2(x)
        #print("layer7")
        #print(x.shape)
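        return F.log_softmax(x, dim=1)  # redundant but harmless: nn.CrossEntropyLoss applies log-softmax again, which leaves log-probabilities unchanged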

    def predict(self, features):
      self.eval()
      label = torch.max(self.forward(features), 1)
      action = label[1].item()
      return action


class MyDataset(Dataset):
    def __init__(self, labels, features):
        super(MyDataset, self).__init__()
        self.labels = labels
        self.features = features

    def __len__(self):
        return self.features.shape[0]

    def __getitem__(self, idx):
        feature = self.features[idx]
        label = self.labels[idx]
        return {'feature': feature, 'label': label}

class RGBBCRobot1(RobotPolicy):

    def train(self, data):
        #for key, val in data.items():
        #    print(key, val.shape)
        #print("Using dummy solution for RGBBCRobot1")
        #pass

      self.net = Net()
      self.criterion = nn.CrossEntropyLoss()
      self.learning_rate = 0.001
      self.optimizer = optim.Adam(self.net.parameters(), lr=self.learning_rate)
      self.batch_size = 50

      for key, val in data.items():
        if(key == 'actions'):
          labels = val
          print(labels.shape)
        elif(key =='obs'):
          features = val
          print(features.shape)

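      # NOTE: np.reshape only reinterprets the existing memory order; it does not
      # move the channel axis. np.transpose(features, (0, 3, 1, 2)) is the usual
      # (N, H, W, C) -> (N, C, H, W) conversion. get_action applies the same
      # reshape, so the two must be changed together.
      features = np.asarray(np.reshape(features, (features.shape[0], features.shape[3], features.shape[2], features.shape[1])))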

      labels = np.asarray(np.reshape(labels, (labels.shape[0], 1)))

      dataset = MyDataset(labels, features)
      loader = DataLoader(dataset, batch_size = self.batch_size, shuffle=True)



      for epoch in range(55):  # loop over the dataset multiple times
        total_loss = 0.0
        for i, data in enumerate(loader):

          #print(data['feature'].shape)

          feature = data['feature'].float()
          label = data['label']

          self.optimizer.zero_grad()
          outputs = self.net(feature)
          #if(epoch==1):
          #  print(outputs)
          #  print(labels)
          loss = self.criterion(outputs, label.flatten())
          loss.backward()
          self.optimizer.step()
          total_loss += loss.item()  # accumulate a Python float, not a graph-attached tensor

        print("{0:.5f}".format(total_loss/i))

    def get_action(self, obs):
      obs = np.reshape(obs, (1, obs.shape[2], obs.shape[1], obs.shape[0]))
      obs = torch.from_numpy(obs).float()
      action = self.net.predict(obs)
      #print("action = ")
      #print(action)
      #print(action[1].item())
      return action

## Evaluation and Grading

In [None]:
# DO NOT CHANGE
# Getting the score and grade for Part 2

score2 = score_policy.score_rgb_bc1(policy=RGBBCRobot1(), gui=False, model=None)
grade2 = score2 / part2_bound * 5 if score2 < part2_bound else 5

print('\n---')
print(f'Part 2 Score: {score2}')
print(f'Part 2 Grade: {score2:.2f} / {part2_bound:.2f} * 5 = {grade2:.2f}')

(4000,)
(4000, 64, 64, 3)
1.40760
1.39175
1.35409
1.22189
1.13391
1.08630
1.03261
0.99078
0.98030
0.93111
0.90217
0.87241
0.85871
0.81322
0.79415
0.75974
0.72321
0.68939
0.67970
0.63839
0.60732
0.58937
0.56055
0.54550
0.51858
0.49185
0.46812
0.48157
0.44937
0.43902
0.41687
0.40015
0.38063
0.36205
0.34665
0.33706
0.33196
0.31781
0.31556
0.29693
0.29501
0.27565
0.26731
0.26922
0.24933
0.23713
0.23764
0.22693
0.21614
0.22084
0.20388
0.19418
0.18993
0.18052
0.19360

---
Part 2 Score: 0.8721420604230399
Part 2 Grade: 0.87 / 0.95 * 5 = 4.59


In [None]:
# Optionally, uncomment and run the code below if you have saved an animation (gui = True) that you want to visualize.

# Image(filename='part_2_anim.png', width=200, height=200)

# Part III. Behavioral cloning with visual observations - multiple maps

This part is the same as Part II, except that it is trained and tested differently. **The training set involves expert demonstrations for the two new obstacle maps. During testing, a different obstacle map is randomly selected for each trial.** This means that your model has to learn how to reason about what an obstacle is, and how to go around it, based on nothing more than an image. The main objective of this part is to show that a model can achieve this when using a CNN. The evaluation method for this part is the same as in Parts I and II.

In [None]:
# Implement your solution for Part 3 below

import torch
torch.manual_seed(0)
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
from torch.utils.data.dataset import Dataset
from torch.utils.data import DataLoader


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(in_channels=8, out_channels=8, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        #self.drop = nn.Dropout2d(p=0.2)
        #self.conv3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)
        #self.conv4 = nn.Conv2d(in_channels=64, out_channels=32, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(8*32*32, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 4)

    def forward(self, x):

        #print("layer1")
        #print(x.shape)
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        #print("layer2")
        #print(x.shape)
        x = F.relu(self.conv2(x))
        #print("layer3")
        #print(x.shape)
        #x = self.pool(x)
        #x = F.relu(self.conv3(x))
        #print("layer4")
        #print(x.shape)
        #x = self.pool(x)
        #x = F.relu(self.conv4(x))
        #print("layer5")
        #print(x.shape)
        #x = torch.flatten(x, 1)
        x = x.view(-1, 8*32*32)
        x = F.relu(self.fc1(x))
        #print("layer6")
        #print(x.shape)
        x = self.fc2(x)
        #print("layer7")
        #print(x.shape)
        x = self.fc3(x)
        return x

    def predict(self, features):
      self.eval()
      label = torch.max(self.forward(features), 1)
      action = label[1].item()
      return action


class MyDataset(Dataset):
    def __init__(self, labels, features):
        super(MyDataset, self).__init__()
        self.labels = labels
        self.features = features

    def __len__(self):
        return self.features.shape[0]

    def __getitem__(self, idx):
        feature = self.features[idx]
        label = self.labels[idx]
        return {'feature': feature, 'label': label}


class RGBBCRobot2(RobotPolicy):

    def train(self, data):
        #for key, val in data.items():
        #    print(key, val.shape)
        #print("Using dummy solution for RGBBCRobot2")
        #pass

      self.net = Net()
      self.criterion = nn.CrossEntropyLoss()
      self.learning_rate = 0.002
      self.optimizer = optim.Adam(self.net.parameters(), lr=self.learning_rate)
      self.batch_size = 25

      for key, val in data.items():
        if(key == 'actions'):
          labels = val
          print(labels.shape)
        elif(key =='obs'):
          features = val
          print(features.shape)

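      # NOTE: np.reshape only reinterprets the existing memory order; it does not
      # move the channel axis. np.transpose(features, (0, 3, 1, 2)) is the usual
      # (N, H, W, C) -> (N, C, H, W) conversion. get_action applies the same
      # reshape, so the two must be changed together.
      features = torch.from_numpy(np.asarray(np.reshape(features, (features.shape[0], features.shape[3], features.shape[2], features.shape[1]))))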

      #labels = np.asarray(np.reshape(labels, (labels.shape[0], 1)))

      dataset = MyDataset(labels, features)
      loader = DataLoader(dataset, batch_size = self.batch_size, shuffle=True)



      for epoch in range(35):  # loop over the dataset multiple times
        total_loss = 0.0
        for i, data in enumerate(loader):

          #print(data['feature'].shape)

          feature = data['feature'].float()
          label = data['label']

          self.optimizer.zero_grad()
          outputs = self.net(feature)
          #if(epoch==1):
          #  print(outputs)
          #  print(labels)
          loss = self.criterion(outputs, label)
          loss.backward()
          #print(loss)
          total_loss +=loss.item()
          self.optimizer.step()

        print("{0:.5f}".format(total_loss/i))

    def get_action(self, obs):
      obs = np.reshape(obs, (1, obs.shape[2], obs.shape[1], obs.shape[0]))
      obs = torch.from_numpy(obs).float()
      action = self.net.predict(obs)
      #print("action = ")
      #print(action)
      #print(action[1].item())
      return action

## Evaluation and Grading


In [None]:
# DO NOT CHANGE
# Getting the score and grade for Part 3

score3 = score_policy.score_rgb_bc2(policy=RGBBCRobot2(), gui=False, model=None)
grade3 = score3 / part3_bound * 6 if score3 < part3_bound else 6

print('\n---')
print(f'Part 3 Score: {score3}')
print(f'Part 3 Grade: {score3:.2f} / {part3_bound:.2f} * 6 = {grade3:.2f}')

(12000,)
(12000, 64, 64, 3)
1.35566
1.25762
1.06596
0.94266
0.82665
0.71186
0.62096
0.53689
0.47008
0.40924
0.37090
0.31634
0.28470
0.24748
0.21504
0.19643
0.18340
0.15771
0.14767
0.13969
0.13054
0.11936
0.12632
0.09674
0.09050
0.10679
0.09925
0.07709
0.08760
0.08400
0.06693
0.08678
0.08284
0.06797
0.07021

---
Part 3 Score: 0.5088990903090036
Part 3 Grade: 0.51 / 0.95 * 6 = 3.21


In [None]:
# Optionally, uncomment and run the code below if you have saved an animation (gui = True) that you want to visualize.

# Image(filename='part_3_anim.png', width=200, height=200)

# Other Requirements and Hints

- **Training time**: To keep auto-grading feasible, your total training time must be strictly under 3 minutes, 15 minutes, and 10 minutes for Parts 1, 2, and 3, respectively. These time budgets are more than enough to achieve full credit on this project. Note that a longer training time does not necessarily mean higher performance, because of overfitting. The faster your network trains, the better!
- **Memory usage**: Make sure your code does not require too much memory. The required amount of RAM for this assignment should not be more than 8GB.
- **NO GPU**: No GPU is required or allowed for this assignment.
- **Reproducibility**: We have ensured that the randomness of the environment is deterministic. To get reproducible scores, you must ensure that your model training and prediction are also reproducible. The randomly initialized weights of the neural network should be made repeatable using seeding. You can add the PyTorch seeding call below; see [PyTorch Reproducibility](https://pytorch.org/docs/stable/notes/randomness.html) to learn more.
  ```
  import torch
  torch.manual_seed(0)
  ```
- **Classifier**: In all parts we are training a neural network to solve a classification problem, so it is important to use a reasonable loss function. For example, the MSE (mean squared error) loss has drawbacks related to sensitivity. Cross entropy loss usually performs well on classification tasks; its documentation is [here](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss), and it is explained further below. However, note that you are free to use any loss function you like.
  - Cross entropy is a concept from information theory that is defined for two probability distributions. Cross entropy is at its minimum when the two distributions involved are the same, and this is the property that makes it useful as a loss function in the context of machine learning. The idea is to minimize the cross entropy between the prediction distribution and the label distribution. For our case, where we are training a neural network for classification, we can have the network output a score for each action. Cross entropy can be computed from these scores by converting them to probability values (using softmax) and comparing them with the label distribution. The label distribution is obtained simply by assigning a probability of 1 to the ground truth action and 0 to all other actions. Once trained, the best action can be found by just choosing the action with the highest probability (i.e., the highest score) as predicted by the network.
- **Optimizer**: While it is possible to use a simple optimizer to achieve the desired accuracy, the training time can be quite high. There exist a number of optimizers implemented in PyTorch that have much faster convergence.
- **Parameter tuning**: Keep your architectures simple and slowly add complexity (more layers/kernels) to improve accuracy. Remember that "to err is human": the expert data (collected by a human) that you are training on is not perfect. Having 100% training accuracy (a very small training loss) might not be the best for achieving the highest score, so make sure your model does not overfit during training.
- **PyTorch input shape**: Notice that the expected input shape for `Conv2d` in PyTorch is (N, C, H, W), where N is the batch size, C is the number of channels, H is the image height, and W is the image width. You will need to switch axes for the incoming images in order for them to be correctly passed to the first convolution layer.
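
To make the cross entropy description in the Classifier hint concrete, here is a small hand computation in plain Python. The scores are made up and the helper names `softmax` and `cross_entropy` are hypothetical. PyTorch's `nn.CrossEntropyLoss` performs the same softmax followed by negative log step internally, starting from raw scores and an integer class label.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, label):
    """Cross entropy against a one-hot label: -log(probability of the true class)."""
    return -math.log(softmax(scores)[label])

# Made-up network scores for the four actions.
scores = [2.0, 0.5, -1.0, 0.1]

loss_right = cross_entropy(scores, label=0)  # label agrees with the top score
loss_wrong = cross_entropy(scores, label=2)  # label is the lowest-scored action
best_action = max(range(len(scores)), key=lambda a: scores[a])

print(f"{loss_right:.3f} {loss_wrong:.3f} {best_action}")
```

The loss is small when the labeled action already has the highest score and large when it has the lowest, which is the signal that pushes the network toward the expert's choices. Since softmax preserves the ordering of the scores, picking the highest score at prediction time is the same as picking the highest probability.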