# ***Important***

**Before starting, make sure to read the [Assignment Instructions](https://courseworks2.columbia.edu/courses/172081/pages/assignment-instructions) page on Courseworks2 to learn the workflow for completing this project.**

**Unlike Projects 1 and 2**, in addition to the link to your notebook, you must also submit the collected data file `data.pkl` and your chosen model checkpoint `dynamics.pth` to Courseworks. Put the link to your notebook in the "Comment" section of your submission.

# Project Setup


In [None]:
# DO NOT CHANGE

# After running this cell, the folder 'mecs6616_sp23_project3' will show up in the file explorer on the left (click on the folder icon if it's not open)
# It may take a few seconds to appear
!git clone https://github.com/roamlab/mecs6616_sp23_project3.git

Cloning into 'mecs6616_sp23_project3'...
remote: Enumerating objects: 27, done.
remote: Counting objects: 100% (27/27), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 27 (delta 11), reused 22 (delta 6), pack-reused 0
Unpacking objects: 100% (27/27), 23.05 KiB | 1.65 MiB/s, done.


In [None]:
# DO NOT CHANGE

# Copy all needed files into the working directory. This is simply to make accessing files easier
!cp -av /content/mecs6616_sp23_project3/* /content/

'/content/mecs6616_sp23_project3/arm_dynamics_base.py' -> '/content/arm_dynamics_base.py'
'/content/mecs6616_sp23_project3/arm_dynamics_teacher.py' -> '/content/arm_dynamics_teacher.py'
'/content/mecs6616_sp23_project3/geometry.py' -> '/content/geometry.py'
'/content/mecs6616_sp23_project3/imgs' -> '/content/imgs'
'/content/mecs6616_sp23_project3/imgs/example.png' -> '/content/imgs/example.png'
'/content/mecs6616_sp23_project3/render.py' -> '/content/render.py'
'/content/mecs6616_sp23_project3/robot.py' -> '/content/robot.py'
'/content/mecs6616_sp23_project3/score.py' -> '/content/score.py'


In [None]:
# DO NOT CHANGE

# Install required packages
!pip install ray

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ray
  Downloading ray-2.3.1-cp39-cp39-manylinux2014_x86_64.whl (58.6 MB)
Collecting virtualenv>=20.0.24
  Downloading virtualenv-20.21.0-py3-none-any.whl (8.7 MB)
Collecting aiosignal
  Downloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Collecting frozenlist
  Downloading frozenlist-1.3.3-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (158 kB)
Collecting distlib<1,>=0.3.6
  Downloading distlib-0.3.6-py2.py3-none-any.whl (468 kB)

# Starter Code Explanation

This project uses two 3-link arms, one called arm_teacher (blue) and the other called arm_student (red), as shown in the image below. For each test, a torque is applied to the first joint of both arms for 5 seconds. arm_teacher moves according to the provided ground truth forward dynamics. Your job is to use deep learning to train arm_student to learn the forward dynamics of arm_teacher so that it can imitate the teacher's behavior. The forward dynamics is a function that takes the current state of the arm and an action applied to it, and computes the new state of the arm. This project uses a time step of 0.01 seconds, meaning that each time we advance the simulation, we compute the forward dynamics for 0.01 seconds. In the example image, the student arm is not updating its state and remains static, but it will move once training is done.



<div>
<img src="https://github.com/roamlab/mecs6616_sp23_project3/blob/master/imgs/example.png?raw=true" width="600"/>
</div>
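
To make the idea of a forward dynamics function concrete, here is a minimal sketch using a toy single-joint arm. The function `toy_forward_dynamics`, its constants, and its integration scheme are hypothetical and are not taken from `arm_dynamics_teacher.py`; only the call pattern (state and action in, next state out, repeated 500 times at 0.01 s) mirrors this project.

```python
import math

DT = 0.01          # time step used in this project (seconds)
NUM_STEPS = 500    # 5 seconds / 0.01 seconds


def toy_forward_dynamics(state, torque, dt=DT):
    """Hypothetical 1-joint example: state = (angle, angular velocity).

    Assumes unit inertia, a unit gravity term, and viscous friction 0.1.
    This is NOT the teacher's dynamics, only an illustration.
    """
    angle, velocity = state
    acceleration = torque - 0.1 * velocity - math.cos(angle)
    new_velocity = velocity + acceleration * dt   # semi-implicit Euler step
    new_angle = angle + new_velocity * dt
    return (new_angle, new_velocity)


def rollout(initial_state, torque, num_steps=NUM_STEPS):
    """Apply the same torque for num_steps steps and record every state."""
    states = [initial_state]
    for _ in range(num_steps):
        states.append(toy_forward_dynamics(states[-1], torque))
    return states


trajectory = rollout((-math.pi / 2, 0.0), torque=0.5)
print(len(trajectory))  # 501: the initial state plus 500 simulated steps
```

The student model in this project plays the role of `toy_forward_dynamics`: it is called once per time step, and each output becomes the input of the next call, which is why small per-step errors can accumulate over a 5-second rollout.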

The interface for controlling the robot is defined in the `Robot` class in the `robot.py` file. Each robot is initialized with a corresponding forward dynamics (the base class for the forward dynamics definition is in `arm_dynamics_base.py`). The arm_teacher is initialized with the provided ground truth forward dynamics, as defined in `arm_dynamics_teacher.py`. You are welcome to look in depth into this file to understand how the ground truth forward dynamics is computed for an arm, given its number of links, link mass, and the viscous friction of the environment. This is recommended but not necessary to successfully complete this assignment.

The state of each arm is a (6,1) numpy array (three joint positions in radians followed by three joint velocities in radians per second). An action is the three torques (in Nm) applied to the three joints respectively, stored as a (3,1) numpy array. **Throughout this project, we make the problem simpler by only applying a torque to the first joint, so the actions always look like `[torque,0,0]`.** Also, when scoring your model, the robot always starts in a hanging position, meaning an initial state of `[-pi/2,0,0,0,0,0]`, so the model will perform better if the data collected in Part 1 starts from similar states.

The `robot.py` file provides functions to set/get the state and to set the action for the arm. Make sure you understand `robot.py` well before getting started.
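
The array layout described above can be sketched with NumPy alone. The torque value below is a hypothetical number chosen for illustration, and the snippet does not call anything from `robot.py`.

```python
import math
import numpy as np

# State: three joint angles (rad) followed by three joint velocities (rad/s)
hanging_state = np.zeros((6, 1))
hanging_state[0] = -math.pi / 2.0   # first joint points straight down

# Action: one torque (Nm) per joint; only the first joint is actuated here
torque = 0.8  # hypothetical value for illustration
action = np.zeros((3, 1))
action[0] = torque

# A training input stacks state and action into a single 9-element column
model_input = np.concatenate((hanging_state, action), axis=0)
print(model_input.shape)  # (9, 1)
```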

`geometry.py` provides some geometry functions and `render.py` defines how the visualization is rendered. These two files are not of particular interest for completing this project.

# Part 1. Data collection.

You will first need to complete the cell below to collect a dataset for training the forward dynamics. After running the cell, it should generate a pickle file `data.pkl` that contains a data dictionary `data = {'X': X, 'Y': Y}`. The shape of `data['X']` should be (`num_samples`, 9): in each row, the first 6 elements are the state and the last 3 elements are the action. The shape of `data['Y']` should be (`num_samples`, 6): each row stores the next state after applying the action using the ground truth forward dynamics of arm_teacher.
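
As a rough sketch of how per-step samples can be assembled into arrays of the required shapes, the snippet below uses random stand-in values in place of simulator output (the tiny `num_samples` and the random data are assumptions for illustration only).

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples = 4  # tiny hypothetical dataset

# Stand-ins for what the simulator would return at each step
inputs = [rng.normal(size=(9, 1)) for _ in range(num_samples)]       # state + action
next_states = [rng.normal(size=(6, 1)) for _ in range(num_samples)]  # next state

# hstack yields (9, N) and (6, N); transpose to get one sample per row
X = np.hstack(inputs).T
Y = np.hstack(next_states).T

data = {'X': X, 'Y': Y}
print(data['X'].shape, data['Y'].shape)  # (4, 9) (4, 6)
```

The collection cell in this notebook stores the untransposed (9, N) and (6, N) arrays and transposes them when loading in `DynamicDataset`. Whichever layout is saved, the dataset class has to agree with it.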

**After the data file is generated, `data.pkl` should appear under the 'Files' icon in the left sidebar. You can download this file by right clicking the file name. You are required to submit this file. Please do not change its name.**

In the cell below, we have provided a minimal example of simulating the arm_teacher for 5 seconds. The GUI visualization is turned on and you should see the behavior of arm_teacher. The visualization can drastically slow down the simulator and you should turn it off when collecting a large amount of data.





In [None]:
import numpy as np
import os
from arm_dynamics_teacher import ArmDynamicsTeacher
from robot import Robot
import pickle
import math
from render import Renderer
import time

# DO NOT CHANGE
# Teacher arm
dynamics_teacher = ArmDynamicsTeacher(
    num_links=3,
    link_mass=0.1,
    link_length=1,
    joint_viscous_friction=0.1,
    dt=0.01
)
arm_teacher = Robot(dynamics_teacher)

# ---
# Your code starts here. X and Y should eventually be populated with your collected data
# Control the arm to collect a dataset for training the forward dynamics.
X = np.zeros((0, arm_teacher.dynamics.get_state_dim() + arm_teacher.dynamics.get_action_dim()))
Y = np.zeros((0, arm_teacher.dynamics.get_state_dim()))

# We run the simulator for 5 seconds with a time step of 0.01 second,
# so there are 500 steps in total
num_steps = 500

# GUI visualization; this will drastically reduce the speed of the simulator!
gui = False

X = []
Y = []

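# Run 1800 rollouts, each with a random torque from [-1.8, 1.8) Nm in steps of 0.001 Nm
for _ in range(1800):
  torque = np.random.randint(-1800, 1800) / 1000
  # Skip rollouts with exactly zero torque
  if torque == 0:
    continue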


  # Define the initial state of the robot, such that it is vertical
  initial_state = np.zeros((arm_teacher.dynamics.get_state_dim(), 1))  # position + velocity
  initial_state[0] = -math.pi / 2.0

  # Set the initial state of the arm. Input to set_state() should be of shape (6, 1)
  arm_teacher.set_state(initial_state)

  # Define the action: apply the sampled torque to the first joint
  action = np.zeros((arm_teacher.dynamics.get_action_dim(), 1))
  action[0] = torque

  # Set the action. Input to set_action() should be of shape (3, 1)
  arm_teacher.set_action(action)
  arm_teacher.set_t(0)

  # Initialize the GUI
  if gui:
    renderer = Renderer()
    time.sleep(1)

  for s in range(num_steps):
    # Get the current state
    state = arm_teacher.get_state()
    action = np.zeros((arm_teacher.dynamics.get_action_dim(), 1))
    action[0] = torque
    X.append(np.concatenate((state, action), axis=0))

    # The advance function will simulate the action for 1 time step
    arm_teacher.advance()
    y = arm_teacher.get_state()
    Y.append(y)

    if gui:
        renderer.plot([(arm_teacher, 'tab:blue')])

# Stack the per-step (9, 1) and (6, 1) columns into arrays of shape (9, N) and (6, N)
X = np.hstack(X)
Y = np.hstack(Y)
# ---


# DO NOT CHANGE
# Save the collected data in the data.pkl file
data = {'X': X, 'Y': Y}
pickle.dump(data, open( "data.pkl", "wb" ) )

In [None]:
print('X shape:',X.shape,'Y shape:',Y.shape)

X shape: (9, 899500) Y shape: (6, 899500)


# Part 2. Learning the forward dynamics.

## Training

After the data is collected, you will then need to complete the cell below to use the collected dataset to learn the forward dynamics.

The code already creates the dataset class and loads the dataset with a random 0.8/0.2 train/test split for you. This cell should save the model that it trains. You should use a specific procedure for saving, outlined below.

In machine learning, it is good practice to save not only the final model but also intermediate checkpoints, so that you have a wider range of models to choose from. We provide the code snippet below; use it at the end of every training epoch to save the model at that epoch.

```
model_folder_name = f'epoch_{epoch:04d}_loss_{test_loss:.8f}'
if not os.path.exists(os.path.join(model_dir, model_folder_name)):
    os.makedirs(os.path.join(model_dir, model_folder_name))
torch.save(model.state_dict(), os.path.join(model_dir, model_folder_name, 'dynamics.pth'))
```

The output from running this code should be a folder as below:

```
models/
    2023-03-08_23-57-50/
        epoch_0001_loss_0.00032930/
            dynamics.pth
        epoch_0002_loss_0.00009413/
            dynamics.pth   
        ...  
```

Every time you run this cell, a folder named after the time you started is created under `models`. A checkpoint is saved for every epoch, and the name of each checkpoint folder records the epoch number and the loss on the held-out test set. Recording checkpoints this way allows you to easily pick the model with the smallest loss.
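
One possible way to pick the checkpoint with the smallest loss from such a folder is to parse the folder names. The helper `find_best_checkpoint` below is a hypothetical illustration, not part of the starter code, and it is demonstrated on a temporary directory that only mimics the layout shown above.

```python
import os
import re
import tempfile

FOLDER_PATTERN = re.compile(r'^epoch_(\d+)_loss_([0-9.]+)$')


def find_best_checkpoint(model_dir):
    """Return the dynamics.pth path under the folder with the smallest loss."""
    best_loss, best_path = None, None
    for name in sorted(os.listdir(model_dir)):
        match = FOLDER_PATTERN.match(name)
        if match is None:
            continue  # ignore anything that is not a checkpoint folder
        loss = float(match.group(2))
        if best_loss is None or loss < best_loss:
            best_loss = loss
            best_path = os.path.join(model_dir, name, 'dynamics.pth')
    return best_path


# Demonstration on a throwaway directory that mimics the layout above
with tempfile.TemporaryDirectory() as model_dir:
    for name in ('epoch_0001_loss_0.00032930', 'epoch_0002_loss_0.00009413'):
        os.makedirs(os.path.join(model_dir, name))
    best = find_best_checkpoint(model_dir)
    print(os.path.basename(os.path.dirname(best)))  # epoch_0002_loss_0.00009413
```

Because the folder name rounds the loss to eight decimals, very small losses can tie (for example `loss_0.00000000`). Tracking the unrounded loss inside the training loop avoids that ambiguity.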


### Important: choosing the best model

Your code should keep track of the checkpoint with the smallest loss on the test set. You should save the path of that checkpoint to the variable `model_path`. An example value of `model_path` could be `models/2023-03-07_20-14-32/epoch_0046_loss_0.00000005/dynamics.pth`. In the evaluation code, the checkpoint from `model_path` will be loaded and evaluated.

You should also download the chosen `dynamics.pth` file to include in your submission.

In [None]:
import os
import time
import pickle

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, random_split

torch.manual_seed(0)  # fixed seed so that training is reproducible
np.set_printoptions(suppress=True)


class DynamicDataset(Dataset):
    def __init__(self, data_file):
        data = pickle.load(open(data_file, "rb" ))
        # X: (N, 9), Y: (N, 6)
        self.X = data['X'].T.astype(np.float32)
        self.Y = data['Y'].T.astype(np.float32)

    def __len__(self):
        return self.X.shape[0]

    def __getitem__(self, idx):
        return self.X[idx], self.Y[idx]


class Net(nn.Module):
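    # ---
    # Your code goes here
    # The network maps a 9-D input (state + action) to 3 joint accelerations.
    # The next state is obtained by integrating them in train(), test(), and dynamics_step().
    def __init__(self, input_dim=9, output_dim=3):
      super(Net, self).__init__()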
      self.fc1 = nn.Linear(input_dim, 256)
      self.fc2 = nn.Linear(256, 128)
      self.fc3 = nn.Linear(128, 64)
      self.fc4 = nn.Linear(64, 32)
      self.fc5 = nn.Linear(32, output_dim)

    def forward(self, x):
      x = nn.functional.relu(self.fc1(x))
      x = nn.functional.relu(self.fc2(x))
      x = nn.functional.relu(self.fc3(x))
      x = nn.functional.relu(self.fc4(x))
      x = self.fc5(x)
      return x

    def predict(self, features):
      self.eval()
      return self.forward(features).detach().numpy()
    # ---


def train(model):
    model.train()

    # ---
    # Your code goes here
    learning_rate = 0.001
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    criterion = nn.MSELoss()
    total_items = 0
    training_loss = 0.0
    dt = 0.01  # simulation time step in seconds

    for i, data in enumerate(train_loader, 0):
          features, labels = data
          optimizer.zero_grad()

          # The network predicts the joint accelerations
          ddtheta = model(features)

          # Current joint positions and velocities taken from the input
          theta = features[:, :3]
          dtheta = features[:, 3:6]

          # Integrate over one time step, assuming constant acceleration
          dtheta_next = ddtheta*dt+dtheta
          theta_next = theta + dtheta*dt + 0.5*dt**2*ddtheta

          predictions = torch.cat((theta_next, dtheta_next), 1)

          loss = criterion(predictions, labels)

          loss.backward()

          optimizer.step()

          training_loss +=loss.item()
          total_items += labels.size(0)


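    # Note: criterion already averages within each batch, so dividing the summed batch
    # means by the number of samples reports a value about batch_size times smaller than the MSE
    training_loss = training_loss/total_items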
    print("training loss is as ", training_loss)
    # ---


def test(model):
    model.eval()

    # --
    # Your code goes here
    criterion = nn.MSELoss()

    test_loss = 0.0
    total_item = 0

    dt = 0.01

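    # Gradients are not needed for evaluation
    with torch.no_grad():
      for i, data in enumerate(test_loader, 0):
        features, labels = data
        ddtheta = model(features.float())

        theta = features[:, :3]
        dtheta = features[:, 3:6]

        # Same constant-acceleration integration as in train()
        dtheta_next = ddtheta*dt+dtheta
        theta_next = theta + dtheta*dt + 0.5*dt**2*ddtheta

        predictions = torch.cat((theta_next, dtheta_next), 1)

        loss = criterion(predictions.float(), labels.float())

        test_loss += loss.item()
        total_item += labels.size(0)

    # Note: criterion already averages within each batch, so this reports a value
    # about batch_size times smaller than the MSE
    test_loss = test_loss/total_item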
    print("test loss is as ", test_loss)
    # ---

    return test_loss


# The ratio of the dataset used for testing
split = 0.2

# We are only using CPU, and GPU is not allowed.
device = torch.device("cpu")

dataset = DynamicDataset('data.pkl')
dataset_size = len(dataset)
test_size = int(np.floor(split * dataset_size))
train_size = dataset_size - test_size
train_set, test_set = random_split(dataset, [train_size, test_size])

train_loader = torch.utils.data.DataLoader(train_set, shuffle=True, batch_size=1000)
test_loader = torch.utils.data.DataLoader(test_set, shuffle=True, batch_size=1000)

# The name of the directory to save all the checkpoints
timestr = time.strftime("%Y-%m-%d_%H-%M-%S")
model_dir = os.path.join('models', timestr)

# Keep track of the checkpoint with the smallest test loss and save in model_path
model_path = None

epochs = 25
model = Net()
best_loss = 10000
best_epoch = -1

for epoch in range(1, 1 + epochs):
    # ---
    # Your code goes here
    train(model)
    test_loss = test(model)

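    # Save a checkpoint for every epoch
    model_folder_name = f'epoch_{epoch:04d}_loss_{test_loss:.8f}'
    if not os.path.exists(os.path.join(model_dir, model_folder_name)):
      os.makedirs(os.path.join(model_dir, model_folder_name))
    checkpoint_path = os.path.join(model_dir, model_folder_name, 'dynamics.pth')
    torch.save(model.state_dict(), checkpoint_path)

    # Remember the checkpoint with the smallest test loss so far
    if test_loss < best_loss:
      best_loss = test_loss
      best_epoch = epoch
      model_path = checkpoint_path

    # Note: epoch already counts from 1, so best_epoch+1 is one more than the best epoch
    print("Best epoch:", best_epoch+1, "best loss", best_loss)
    # ---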


training loss is as  7.588549223724656e-08
test loss is as  1.532732523948022e-08
Best epoch: 2 best loss 1.532732523948022e-08
training loss is as  1.2174988404976229e-08
test loss is as  1.901572570231024e-09
Best epoch: 3 best loss 1.901572570231024e-09
training loss is as  5.843923461847571e-09
test loss is as  1.0227750739767628e-09
Best epoch: 4 best loss 1.0227750739767628e-09
training loss is as  1.0273536936189191e-08
test loss is as  7.987890937767529e-10
Best epoch: 5 best loss 7.987890937767529e-10
training loss is as  9.27425757258633e-09
test loss is as  6.101824645850867e-10
Best epoch: 6 best loss 6.101824645850867e-10
training loss is as  4.9882825497970434e-09
test loss is as  4.972561199241275e-10
Best epoch: 7 best loss 4.972561199241275e-10
training loss is as  8.54557664846035e-09
test loss is as  4.2368159226178376e-10
Best epoch: 8 best loss 4.2368159226178376e-10
training loss is as  3.9661946609743456e-09
test loss is as  3.8232060986657857e-10
Best epoch: 9 b

In [None]:
# Remember to update this path
model_path = "/content/models/2023-03-31_21-44-43/epoch_0025_loss_0.00000000/dynamics.pth"

## Prediction

After you are done with training, you need to complete the cell below to load the saved checkpoint (in function `init_model`) and then use it to predict the new state given the current state and action (in function `dynamics_step`). Please do not modify the arguments to those functions, even though you might not use all of them.

In [None]:
from arm_dynamics_base import ArmDynamicsBase

class ArmDynamicsStudent(ArmDynamicsBase):
    def init_model(self, model_path, num_links, time_step, device):
        # ---
        # Your code goes here
        # Initialize the model by loading the saved checkpoint from the provided model_path
        self.model = Net()
        checkpoint = torch.load(model_path, map_location=device)
        self.model.load_state_dict(checkpoint)
        self.model.eval()
        # ---
        self.model_loaded = True

    def dynamics_step(self, state, action, dt):
        if self.model_loaded:
            # ---
            # Your code goes here
            # Use the loaded model to predict new state given the current state and action
            # Output should be an array of shape (6,1)

            self.model.eval()

            # Stack state and action into a single (1, 9) input row
            X = np.concatenate((state, action)).reshape(1, -1)

            with torch.no_grad():
                # The network predicts the joint accelerations
                ddtheta = self.model(torch.FloatTensor(X))

                theta = torch.FloatTensor(X[:, :3])
                dtheta = torch.FloatTensor(X[:, 3:6])

                # Integrate over one time step, assuming constant acceleration
                dtheta_next = ddtheta*dt+dtheta
                theta_next = theta + dtheta*dt + 0.5*dt**2*ddtheta

                predictions = torch.cat((theta_next, dtheta_next), 1)

            # Transpose the (1, 6) prediction to the required (6, 1) shape
            return predictions.detach().numpy().T
            # ---
        else:
          return state
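
The cells above have the network output three values per sample, treat them as joint accelerations, and integrate them over one time step to obtain the next state. That integration can be sketched in NumPy as follows. The function name `integrate_step` and the acceleration values are hypothetical, and the constant-acceleration assumption is a modelling choice of this notebook, not necessarily how the teacher dynamics integrates.

```python
import numpy as np


def integrate_step(state, acceleration, dt=0.01):
    """Advance a (6, 1) state by dt assuming constant acceleration over the step."""
    theta, dtheta = state[:3], state[3:]
    dtheta_next = dtheta + acceleration * dt
    theta_next = theta + dtheta * dt + 0.5 * acceleration * dt ** 2
    return np.vstack((theta_next, dtheta_next))


state = np.array([[-np.pi / 2], [0.0], [0.0], [0.0], [0.0], [0.0]])
acceleration = np.array([[2.0], [0.0], [0.0]])  # hypothetical network output
next_state = integrate_step(state, acceleration)
print(next_state.shape)  # (6, 1)
```

Predicting accelerations instead of the full next state means the network only has to learn the part of the update that is not already given by the current positions and velocities.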

# Evaluation and Grading

The total number of points for this project is 15. There are 3 types of tests, each is worth 5 points.

**For each type, there are 50 tests.** For each test, you get a score of 1, 0.5, or 0. Your final grade for each type is the average score across the 50 tests, multiplied by 5.

- *Type 1*: for each test, a constant torque randomly sampled from [-1.5Nm, 1.5Nm] is applied to the first joint of the arm for 5 seconds. If the MSE (Mean Squared Error) between the predicted arm state (arm_student) and the ground truth arm state (arm_teacher) is < 0.0005, you get score 1 for this test. If 0.0005 <= MSE < 0.008, you get score 0.5 for this test. Otherwise you get 0.
- *Type 2*: for each test, a torque that linearly increases from 0 to a random torque in [0.5Nm, 1.5Nm] is applied to the first joint of the arm for 5 seconds. If MSE < 0.0005, you get score 1 for this test. If 0.0005 <= MSE < 0.008, you get score 0.5 for this test. Otherwise you get 0.
- *Type 3*: for each test, one torque is applied for the first 2.5 seconds and another torque is applied for the remaining 2.5 seconds. Both torques are sampled from [-1Nm, 1Nm]. If MSE < 0.015, you get score 1 for this test. If 0.015 <= MSE < 0.05, you get score 0.5 for this test. Otherwise you get 0.
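
The scoring rule described above can be sketched as follows. This is an illustration of the thresholds as stated in the list, not the code in `score.py`. The function names are hypothetical and the MSE values are made up for the example.

```python
def score_test(mse, full_threshold, half_threshold):
    """Return 1, 0.5, or 0 for a single test given its MSE."""
    if mse < full_threshold:
        return 1.0
    if mse < half_threshold:
        return 0.5
    return 0.0


def grade_for_type(mses, full_threshold, half_threshold, points=5):
    """Average the per-test scores and scale to the points for that type."""
    scores = [score_test(m, full_threshold, half_threshold) for m in mses]
    return points * sum(scores) / len(scores)


# Thresholds quoted in the list above
TYPE_1_AND_2 = (0.0005, 0.008)
TYPE_3 = (0.015, 0.05)

# Hypothetical MSE values, not results from any real run
example_mses = [0.0001, 0.001, 0.02, 0.0004]
print(grade_for_type(example_mses, *TYPE_1_AND_2))  # 3.125
```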


In [None]:
# DO NOT CHANGE
# Set up grading

# Make sure model_path is correctly set
print(model_path)

import importlib
import score
importlib.reload(score)

# Create the teacher arm
dynamics_teacher = ArmDynamicsTeacher(
    num_links=3,
    link_mass=0.1,
    link_length=1,
    joint_viscous_friction=0.1,
    dt=0.01
)
arm_teacher = Robot(dynamics_teacher)

# Create the student arm
dynamics_student = ArmDynamicsStudent(
    num_links=3,
    link_mass=0.1,
    link_length=1,
    joint_viscous_friction=0.1,
    dt=0.01
)
if model_path is not None:
  dynamics_student.init_model(model_path, num_links=3, time_step=0.01, device=torch.device('cpu'))
arm_student = Robot(dynamics_student)

/content/models/2023-03-31_21-44-43/epoch_0025_loss_0.00000000/dynamics.pth


In [None]:
# DO NOT CHANGE
# Test on randomly sampled torques from [-1.5, 1.5]
score.score_random_torque(arm_teacher, arm_student, gui=False)


----------------------------------------
TEST 1 (Torque = 0.8139619298002381 Nm)

average mse: 4.5896148518226786e-05
Score: 1/1
----------------------------------------


----------------------------------------
TEST 2 (Torque = -1.4377441519217955 Nm)

average mse: 0.0002925372905483516
Score: 1/1
----------------------------------------


----------------------------------------
TEST 3 (Torque = 0.4009447047788264 Nm)

average mse: 5.943697396239898e-05
Score: 1/1
----------------------------------------


----------------------------------------
TEST 4 (Torque = 0.7464116476158358 Nm)

average mse: 5.5048101267270806e-05
Score: 1/1
----------------------------------------


----------------------------------------
TEST 5 (Torque = -0.004478963092228838 Nm)

average mse: 1.8821718380718227e-07
Score: 1/1
----------------------------------------


----------------------------------------
TEST 6 (Torque = -0.825610063407457 Nm)

average mse: 2.644860237018004e-05
Score: 1/1
---------

In [None]:
# DO NOT CHANGE

# Test on torques that linearly increase from 0 to a random number from [0.5, 1.5]
score.score_linear_torques(arm_teacher, arm_student, gui=False)


----------------------------------------
TEST 1 (Torque 0 -> 1.2713206432667459 Nm)

average mse: 5.064690500795647e-05
Score: 1/1
----------------------------------------


----------------------------------------
TEST 2 (Torque 0 -> 0.5207519493594015 Nm)

average mse: 1.0098180775272336e-05
Score: 1/1
----------------------------------------


----------------------------------------
TEST 3 (Torque 0 -> 1.1336482349262753 Nm)

average mse: 3.6558209159354785e-05
Score: 1/1
----------------------------------------


----------------------------------------
TEST 4 (Torque 0 -> 1.2488038825386119 Nm)

average mse: 4.809185552193393e-05
Score: 1/1
----------------------------------------


----------------------------------------
TEST 5 (Torque 0 -> 0.9985070123025904 Nm)

average mse: 2.767795992362766e-05
Score: 1/1
----------------------------------------


----------------------------------------
TEST 6 (Torque 0 -> 0.7247966455308477 Nm)

average mse: 1.4703868538508974e-05
Score:

In [None]:
# DO NOT CHANGE

# Test on one torque applied to the first 2.5s and another torque applied to the second 2.5s
# Both torques are sampled from [-1, 1]
score.score_two_torques(arm_teacher, arm_student, gui=False)


----------------------------------------
TEST 1 (Torque 1 = 0.542641286533492 Nm,  Torque 2 = -0.21494151210682544 Nm)

average mse: 0.0010422156469932396
Score: 1/1
----------------------------------------


----------------------------------------
TEST 2 (Torque 1 = -0.958496101281197 Nm,  Torque 2 = -0.8130792508826994 Nm)

average mse: 0.0009284839837126158
Score: 1/1
----------------------------------------


----------------------------------------
TEST 3 (Torque 1 = 0.26729646985255084 Nm,  Torque 2 = 0.6422113156738569 Nm)

average mse: 0.0001997761385173234
Score: 1/1
----------------------------------------


----------------------------------------
TEST 4 (Torque 1 = 0.4976077650772237 Nm,  Torque 2 = -0.6976959607148723 Nm)

average mse: 0.0017148532326030665
Score: 1/1
----------------------------------------


----------------------------------------
TEST 5 (Torque 1 = -0.0029859753948191514 Nm,  Torque 2 = -0.23177110261560085 Nm)

average mse: 1.4497552678099307e-05
Sc

# Other Requirements and Hints

- Training time: This project requires more training than the previous projects, but fewer than a hundred epochs of training (<= 25 mins) should suffice for achieving full points. Again, the shorter your model training time, the better.
- Dataset: Choosing the right policy to collect datasets for this project is important. You need to think about how to do it properly so that your trained model will pass the tests successfully. It is in general very hard to learn the ground truth forward dynamics completely (so that it works for any distribution of actions), and during testing small errors can accumulate, leading to drastic failure in the end. You might want to start by overfitting on the test cases. Make sure that your dataset is smaller than 100 MB, which is sufficient for achieving full marks. Collecting datasets can be time-consuming, and you could parallelize this process for some speed-up using [ray](https://www.ray.io/). Make sure your data collection takes <= 25 mins.
- NO GPU: No GPU is required or allowed for this assignment and we will test your code without GPUs.
- Loss Function: This is essentially a regression problem, so think about what losses are suitable for regression.
- Optimizer: While it is possible to use a simple optimizer to achieve the desired accuracy, the training time can be quite high. A number of optimizers implemented in PyTorch converge much faster.
- Seeding: Please use seeding in your code to make sure your results are reproducible.
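
As one way to think about the dataset hint, the sketch below generates seeded torque schedules shaped like the three test types described in the grading section. The function names are hypothetical. How `score.py` schedules the ramp and the switch point is not shown in this notebook, so the exact endpoints here are assumptions.

```python
import numpy as np

NUM_STEPS = 500  # 5 seconds at dt = 0.01


def constant_profile(rng):
    """Type 1: one torque in [-1.5, 1.5] held for the whole run."""
    return np.full(NUM_STEPS, rng.uniform(-1.5, 1.5))


def linear_profile(rng):
    """Type 2: torque rising linearly from 0 to a value in [0.5, 1.5]."""
    return np.linspace(0.0, rng.uniform(0.5, 1.5), NUM_STEPS)


def two_torque_profile(rng):
    """Type 3: one torque for the first half, another for the second half."""
    first, second = rng.uniform(-1.0, 1.0, size=2)
    profile = np.full(NUM_STEPS, first)
    profile[NUM_STEPS // 2:] = second
    return profile


rng = np.random.default_rng(0)  # fixed seed for reproducibility
profiles = [f(rng) for f in (constant_profile, linear_profile, two_torque_profile)]
print([p.shape for p in profiles])  # [(500,), (500,), (500,)]
```

During collection, each entry of a profile would be written to `action[0]` and passed to `set_action()` before the corresponding `advance()` call, so that the recorded inputs cover torque patterns similar to those used in testing.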