## General Colab Tips
- Modify files by opening/editing them in the UI (double-click to open).
- `Right click > Refresh` in the Colab file explorer to update the directory.
- All files are lost when the Colab session disconnects, so make sure to back up your work.
- Do **not** use `drive.mount` for your datasets! Reading from GDrive is super slow.
- Instead, place datasets in the `/content/` folder and update your data paths accordingly.
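Since everything in the session disappears on disconnect, one low-effort backup (in addition to pushing to git) is to zip only your source files and download the archive. The sketch below assumes your work lives in `.py` files and that datasets and logs should be left out. `backup_sources` is a hypothetical helper and is not part of the starter code.

```python
import zipfile
from pathlib import Path


def backup_sources(src_dir: str, archive_path: str, pattern: str = "*.py") -> int:
    """Zip every file under src_dir that matches pattern; return how many were written."""
    src = Path(src_dir)
    count = 0
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # sorted() materializes the file list before the archive grows
        for path in sorted(src.rglob(pattern)):
            if path.is_file():
                # Store paths relative to src_dir so the archive unzips cleanly elsewhere
                zf.write(path, arcname=path.relative_to(src).as_posix())
                count += 1
    return count
```

For example, you could call `backup_sources("/content/online_deep_learning", "/content/backup.zip")` and then right-click `backup.zip` in the file explorer to download it.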

**Make a copy of this notebook and modify this to whatever workflow you prefer!**

If you have additional Colab tips, please share them on the discussion forum.

## Setup

First, enable a GPU runtime via `Runtime > Change runtime type > T4 GPU`.
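To confirm that the runtime change took effect, you can check from Python whether an NVIDIA GPU is visible. The sketch below shells out to `nvidia-smi -L`, which lists the GPUs on machines with NVIDIA drivers installed. `gpu_summary` is a hypothetical helper name.

```python
import shutil
import subprocess
from typing import Optional


def gpu_summary() -> Optional[str]:
    """Return the GPU list from `nvidia-smi -L`, or None if no NVIDIA GPU is visible."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tools on PATH (e.g. a CPU runtime)
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return None  # the tool exists but could not find a usable GPU
    return result.stdout.strip()


print(gpu_summary() or "No GPU detected; check Runtime > Change runtime type.")
```

On a T4 runtime the output should name the T4. If you see the fallback message, you are most likely still on a CPU runtime.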

Next, upload your project files to Colab. You can do this either by
- using GitHub (**recommended**)
- uploading files manually using the UI

## GitHub Setup

You can use git from within Google Colab!

For this section, we assume you know how to use git and have already pushed the starter code to a private repo.

It's a good idea to structure your repo like this:
```
online_deep_learning/
    homework1/
    homework2/
    ...
```

We highly recommend using this workflow as you'll be able to easily pull/commit your changes after modifying your model on Colab.

To do this, you'll need a personal access token from [https://github.com/settings/tokens](https://github.com/settings/tokens)

The easiest option is a "classic" token with the `repo` scope selected, which allows access to your private repos.
Fine-grained tokens are also available and let you restrict access to specific repos.
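The clone command below embeds the token directly in the HTTPS URL. That URL is also saved as the repo's remote, so anything that prints it (error messages, `git remote -v`) will print your token too. The sketch below assumes the `https://TOKEN@github.com/USER/REPO.git` form used in the clone cell. `clone_url` and `redact` are hypothetical helpers.

```python
def clone_url(user: str, repo: str, token: str) -> str:
    """Build the authenticated HTTPS URL in the same form the clone cell uses."""
    return f"https://{token}@github.com/{user}/{repo}.git"


def redact(text: str, token: str) -> str:
    """Mask the token before echoing anything that might contain it."""
    return text.replace(token, "***") if token else text
```

If you'd rather not type the token into the cell at all, you could use `from getpass import getpass` and then run `os.environ['TOKEN'] = getpass('GitHub token: ')`. `getpass` is in Python's standard library. The `${TOKEN}` in the `!git clone` line will still pick up the value.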

Once you have your token, fill in your GitHub username, repo name, and token, then run the following cell to clone your git repo to the Colab instance.

In [None]:
import os


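# replace these with your GitHub username, repo name, and personal access token
# (don't share or commit this notebook with your real token filled in)
os.environ['USER'] = 'bradyz'
os.environ['REPO'] = 'online_deep_learning'
os.environ['TOKEN'] = 'mYseCretToKEn'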

# do everything in colab's "root" directory
%cd /content
!git clone https://${TOKEN}@github.com/${USER}/${REPO}.git

# make sure your repo shows up
%ls

## Code Setup

Next, let's move into `homework2/` so we can finish setting up the data and code for training.

This will be the main working directory; training and grading must be run from this directory.


In [None]:
# navigate to your repo
%cd /content/{os.environ['REPO']}
%ls

# go to a specific homework
%mkdir -p homework2
%cd homework2
%ls

# if you don't have a copy of homework2 yet in your git repo
# you can uncomment the lines below to get a copy
#!curl -O https://www.cs.utexas.edu/~bzhou/dl_class/homework2.zip
#!unzip -o homework2.zip
#!rm homework2.zip

## Dataset Setup

Now that your code is ready, the next step is to download the datasets.

Note: it's good practice to add data directories like `*/classification_data` to your `.gitignore` so you don't accidentally commit them to your repo.
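If you'd rather update `.gitignore` from a cell, you can use a small helper that only appends patterns that aren't already listed. This makes it safe to re-run. `ensure_gitignore` is a hypothetical helper, and the patterns are only examples that you should adapt to your repo.

```python
from pathlib import Path


def ensure_gitignore(repo_root: str, patterns: list[str]) -> list[str]:
    """Append any patterns missing from repo_root/.gitignore and return the ones added."""
    gitignore = Path(repo_root) / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    existing = {line.strip() for line in text.splitlines()}
    missing = [p for p in patterns if p not in existing]
    if missing:
        # Start on a fresh line if the file doesn't already end with a newline
        if text and not text.endswith("\n"):
            text += "\n"
        gitignore.write_text(text + "\n".join(missing) + "\n")
    return missing
```

For example: `ensure_gitignore("/content/online_deep_learning", ["*/classification_data", "*/logs"])`.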

Since the datasets used in this class are relatively small, we can simply re-download them if the compute instance crashes/restarts.
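The download cell below uses `curl` and `unzip`. If you want a version that skips the download when the data is already present (for example, when re-running all cells), here is a pure-Python sketch using `urllib.request` and `zipfile`. `ensure_dataset` is a hypothetical helper. It assumes the archive unpacks into a folder with the same name as the zip (e.g. `classification_data.zip` to `classification_data/`), so confirm that for your dataset.

```python
import os
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def ensure_dataset(url: str, dest: str = ".") -> Path:
    """Download and extract a zip archive unless its folder already exists in dest."""
    dest_path = Path(dest)
    # "classification_data.zip" -> "classification_data"
    name = Path(url.rsplit("/", 1)[-1]).stem
    target = dest_path / name
    if target.is_dir():
        return target  # already extracted, skip the download
    # Download to a temporary file, extract it, then clean up
    fd, tmp = tempfile.mkstemp(suffix=".zip")
    os.close(fd)
    try:
        urllib.request.urlretrieve(url, tmp)
        with zipfile.ZipFile(tmp) as zf:
            zf.extractall(dest_path)
    finally:
        os.remove(tmp)
    return target
```

Run from `homework2/`, the equivalent of the cell below would be `ensure_dataset("https://www.cs.utexas.edu/~bzhou/dl_class/classification_data.zip")`.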

In [None]:
!curl -O https://www.cs.utexas.edu/~bzhou/dl_class/classification_data.zip
!unzip -qo classification_data.zip > /dev/null
!rm classification_data.zip

# refreshes python imports automatically when you edit the source file
%load_ext autoreload
%autoreload 2

## Setup Verification

You should now be all set up; check out `README.md` for additional instructions.

Run the cell below to verify that your working directory is set up correctly.

Your workspace should be organized as follows:

```
online_deep_learning/
├── homework1/
└── homework2/              <- you should be here
    ├── bundle.py
    ├── classification_data/
    ├── grader/
    ├── homework/
    ├── README.md
    └── requirements.txt
```
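Besides eyeballing the `!ls` output, you can have the notebook fail fast when something is missing. The sketch below checks the current directory against the entries in the tree above. The names are copied from that tree, and `missing_entries` is a hypothetical helper.

```python
from pathlib import Path

# Names copied from the expected homework2/ tree
EXPECTED = ("bundle.py", "classification_data", "grader", "homework", "README.md", "requirements.txt")


def missing_entries(root: str = ".", expected: tuple[str, ...] = EXPECTED) -> list[str]:
    """Return the expected files/folders that do not exist under root."""
    base = Path(root)
    return [name for name in expected if not (base / name).exists()]


missing = missing_entries()
print("Workspace looks good!" if not missing else f"Missing from {Path.cwd()}: {missing}")
```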

In [None]:
!ls

## Additional Helper Cells

Now you're on your own! The rest of the provided cells are small helper routines.

If you have any additional helpful Colab tips and tricks, please share them on the discussion forum.

## Tensorboard (Optional)

You can monitor training using the following command.

Make sure that your training code writes its TensorBoard logs to the same directory (`logs` here).
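TensorBoard treats each subdirectory of `--logdir` that contains event files as a separate run. Giving every training run its own folder under `logs/` therefore makes runs easy to compare. The sketch below assumes your training code accepts a log directory and writes to it, for example via `torch.utils.tensorboard.SummaryWriter(log_dir)`. `make_run_dir` is a hypothetical helper, and the naming scheme is just one choice.

```python
from datetime import datetime
from pathlib import Path


def make_run_dir(root: str = "logs", name: str = "run") -> Path:
    """Create and return a fresh subdirectory such as logs/linear_0312_142501."""
    stamp = datetime.now().strftime("%m%d_%H%M%S")
    run_dir = Path(root) / f"{name}_{stamp}"
    suffix = 1
    # Avoid collisions when two runs start within the same second
    while run_dir.exists():
        run_dir = Path(root) / f"{name}_{stamp}_{suffix}"
        suffix += 1
    run_dir.mkdir(parents=True)
    return run_dir
```

You would then hand `str(make_run_dir("logs", "mlp"))` to your logger. The `%tensorboard --logdir logs` cell will pick up each run folder.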

In [None]:
%load_ext tensorboard
!mkdir -p logs
%tensorboard --logdir logs

## Training

After you implement your model, modify and run this cell to start training.

Be sure to pass in the appropriate parameters.

In [None]:
from homework.train import train


train(
    model_name="linear",
    num_epoch=10,
    lr=1e-3,
)

## Grader

Run the following cell to grade your homework locally.

The Canvas grader uses a different data split for testing,  
so there may be a *small* difference between your local grade and your final grade.

In [None]:
!python3 -m grader homework -vv --disable_color

## Updating Your Changes

After you've made progress, modify and run this cell to commit and push your changes to git.

In [None]:
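# tell git who you are (replace with your own email and username)
!git config --global user.email "GITHUB_EMAIL"
!git config --global user.name "GITHUB_USER"

%ls
!git status

# Be careful not to "git add *" since there are datasets and logs
!git add homework/*.py
!git commit -m "update"
# use your repo's default branch name if it isn't "main"
!git push origin main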

## Tuning

Rather than changing one parameter and re-running the cell above over and over again,  
it is good practice to set up the model/training code so you can "tune" your model in a semi-automatic way.

This cell tunes over `num_epoch` and `hidden_dim`,  
but you could easily modify it to tune over the number of layers, learning rate, etc.

After you find a good set of hyperparameters, be sure to hard-code them as the default arguments of your model's constructor,  
since the grader will use the default constructor to load your model!
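Writing each job dictionary by hand gets tedious as the number of settings grows. The sketch below expands a small grid into the same list-of-dicts format using `itertools.product`. The parameter names mirror the `jobs` list in the cell below. Whether `train` accepts `hidden_dim` depends on your own implementation. `make_grid` is a hypothetical helper.

```python
import itertools


def make_grid(base: dict, **options: list) -> list[dict]:
    """Expand every combination of the option lists into a list of job dicts."""
    keys = list(options)
    jobs = []
    for values in itertools.product(*(options[k] for k in keys)):
        job = dict(base)               # shared settings, e.g. model_name
        job.update(zip(keys, values))  # this combination's settings
        jobs.append(job)
    return jobs
```

For example, `make_grid({"model_name": "mlp"}, num_epoch=[10, 20], lr=[1e-3], hidden_dim=[64, 128])` produces four jobs that you can loop over with `train(**params)`.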

In [None]:
from homework.train import train


jobs = [
    # Run on short schedule (10 epochs)
    {
        "model_name": "mlp",
        "num_epoch": 10,
        "lr": 1e-3,
        "hidden_dim": 64,
    },
    # Train for longer (20 epochs) with a wider hidden layer
    {
        "model_name": "mlp",
        "num_epoch": 20,
        "lr": 1e-3,
        "hidden_dim": 128,
    },
]

for params in jobs:
    train(**params)

## Submission

Run the following cell to bundle your submission (modify UTID accordingly).

After the bundler and grader run, right-click your bundled `.zip` file in the Colab file explorer and download it.
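Once the bundle exists, you can also check what went into it before downloading. For example, confirm that your `.py` sources are there and that no large data files slipped in. The sketch below uses `zipfile`. `summarize_bundle` is a hypothetical helper, and the size threshold is an arbitrary example rather than an official limit.

```python
import zipfile


def summarize_bundle(path: str, max_mib: float = 20.0) -> list[str]:
    """Print the archive's contents and return a list of warnings."""
    warnings = []
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
    for info in infos:
        print(f"{info.file_size:>10,d}  {info.filename}")
    total = sum(info.file_size for info in infos)
    if not any(info.filename.endswith(".py") for info in infos):
        warnings.append("no .py files found")
    if total > max_mib * 1024 * 1024:
        warnings.append(f"uncompressed size {total / 2**20:.1f} MiB exceeds {max_mib} MiB")
    return warnings
```

For example, `summarize_bundle("UTID.zip")` (with your own ID in place of `UTID`) should return an empty list if nothing looks off.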


In [None]:
!python3 bundle.py homework UTID

# optional: run the grader with your bundled homework
!python3 -m grader UTID.zip -vv --disable_color