## Setup

You will need to make a copy of this notebook in your Google Drive before you can edit the homework files. You can do so with **File → Save a copy in Drive**.

In [1]:
#@title mount your Google Drive
#@markdown Your work is stored in your Google Drive, reachable through a `cds_rl_2022` symlink by default, so that Colab instance timeouts do not delete your edits.

import os
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [2]:
#@title set up mount symlink

DRIVE_PATH = '/content/gdrive/My\ Drive/Colab\ Notebooks/RL'
DRIVE_PYTHON_PATH = DRIVE_PATH.replace('\\', '')
if not os.path.exists(DRIVE_PYTHON_PATH):
    !mkdir -p $DRIVE_PATH

## the space in `My Drive` causes some issues,
## make a symlink to avoid this
SYM_PATH = '/content/cds_rl_2022'
if not os.path.exists(SYM_PATH):
    !ln -s $DRIVE_PATH $SYM_PATH
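
The backslash escaping above is needed only because `My Drive` contains a space and the paths are passed through the shell. As a minimal sketch, the same folder-plus-symlink setup can be done in pure Python, where no escaping is required. The helper name `link_workdir` is hypothetical, and the demo uses a temporary directory in place of the real Drive mount so that it runs anywhere.

```python
import os
import tempfile


def link_workdir(target_dir, link_path):
    """Create target_dir if needed and point a symlink at it (safe to re-run)."""
    os.makedirs(target_dir, exist_ok=True)
    # lexists is True even for a dangling link, so we never try to recreate it
    if not os.path.lexists(link_path):
        os.symlink(target_dir, link_path)
    return link_path


# Demo: a temporary directory stands in for /content (assumption for illustration).
root = tempfile.mkdtemp()
target = os.path.join(root, "My Drive", "Colab Notebooks", "RL")
link = link_workdir(target, os.path.join(root, "cds_rl_2022"))
```

In the notebook itself, `target_dir` would correspond to `DRIVE_PYTHON_PATH` and `link_path` to `SYM_PATH`.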

In [3]:
#@title apt install requirements

#@markdown Run each section with Shift+Enter

#@markdown Double-click on section headers to show code.

!apt update 
!apt install -y --no-install-recommends \
        build-essential \
        curl \
        git \
        git-lfs \
        gnupg2 \
        make \
        cmake \
        ffmpeg \
        swig \
        libz-dev \
        unzip \
        zlib1g-dev \
        libglfw3 \
        libglfw3-dev \
        libxrandr2 \
        libxinerama-dev \
        libxi6 \
        libxcursor-dev \
        libgl1-mesa-dev \
        libgl1-mesa-glx \
        libglew-dev \
        libosmesa6-dev \
        lsb-release \
        ack-grep \
        patchelf \
        wget \
        xpra \
        xserver-xorg-dev \
        xvfb \
        python-opengl

# set up git lfs
!git lfs install

[33m0% [Working][0m            Hit:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
[33m0% [Connecting to archive.ubuntu.com] [Connecting to security.ubuntu.com] [Conn[0m[33m0% [1 InRelease gpgv 1,581 B] [Connecting to archive.ubuntu.com] [Connecting to[0m                                                                               Hit:2 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease
[33m0% [1 InRelease gpgv 1,581 B] [Connecting to archive.ubuntu.com] [Connecting to[0m                                                                               Ign:3 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Hit:4 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release
Hit:5 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease
Hit:6 http://security.ubuntu.com/ubuntu bionic-security InRelease
Hit:7 http://archiv

In [4]:
#@title install mujoco-py

%pip install free-mujoco-py

# Cythonizes pkg on the first run
import mujoco_py

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


### Clone/update repo

Now we need to clone the HW2 codebase. There are two options:

1. Git clone the [repository](https://github.com/pkuderov/mipt-rl-hw-2022.git), install the requirements, and start coding HW2. This is the only option if you haven't yet cloned the repo for HW1.
    If you have already cloned the repo, the second option is the better choice. If you still go with the first one, you will need to move the old `rl_hw` folder out of the way first. Don't delete it: keep your HW1 solution, as you will need it for this assignment!

2. Use the already-cloned local repository in `rl_hw`. Save your HW1 solution to a separate branch, then pull the changes from the remote to get the HW2 codebase.

In [None]:
#@title clone homework repo (option #1)
%cd $SYM_PATH
!git clone https://github.com/pkuderov/mipt-rl-hw-2022.git rl_hw
%cd rl_hw

In [None]:
#@title pull updated repo (option #2)
# Feel free to adapt this script to your own setup

%cd $SYM_PATH/rl_hw
# git commit before pulling
!git checkout -b "hw1"
!git add .
!git commit -m "HW1 solution"
!git checkout main

# update
!git pull

In [5]:
%cd $SYM_PATH
%cd rl_hw/

/content/gdrive/My Drive/Colab Notebooks/RL
/content/gdrive/My Drive/Colab Notebooks/RL/rl_hw


In [6]:
#@title install requirements (from HW1; HW2 uses the same ones)
%cd hw1
%pip install -r requirements.colab.txt
%pip install -e .

# also install hw2 package
%cd ../hw2
%pip install -e .

/content/gdrive/My Drive/Colab Notebooks/RL/rl_hw/hw1
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Obtaining file:///content/gdrive/My%20Drive/Colab%20Notebooks/RL/rl_hw/hw1
Installing collected packages: cds-rl
  Attempting uninstall: cds-rl
    Found existing installation: cds-rl 1.0.0
    Can't uninstall 'cds-rl'. No files were found to uninstall.
  Running setup.py develop for cds-rl
Successfully installed cds-rl-1.0.0
/content/gdrive/My Drive/Colab Notebooks/RL/rl_hw/hw2
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Obtaining file:///content/gdrive/My%20Drive/Colab%20Notebooks/RL/rl_hw/hw2
Installing collected packages: hw2
  Attempting uninstall: hw2
    Found existing installation: hw2 1.0.0
    Can't uninstall 'hw2'. No files were found to uninstall.
  Running setup.py develo

In [7]:
#@title set up virtual display

from pyvirtualdisplay import Display

display = Display(visible=0, size=(1400, 900))
display.start()

<pyvirtualdisplay.display.Display at 0x7f80a2d83910>

In [8]:
#@title test virtual display

#@markdown If you see a video of a four-legged ant fumbling about, setup is complete!

import gym
import matplotlib
matplotlib.use('Agg')
from hw2.infrastructure.colab_utils import (
    wrap_env,
    show_video
)

env = wrap_env(gym.make("Ant-v2"))

obs = env.reset()
for i in range(100):
    env.render(mode='rgb_array')
    obs, rew, term, _ = env.step(env.action_space.sample())
    if term:
        break

env.close()
print('Loading video...')
show_video()

Loading video...


## Editing Code

To edit code, click the folder icon on the left menu. Navigate to the corresponding file (`cds_rl_2022/...`). Double-click a file to open an editor. An active Colab instance times out after about 12 hours (sooner if you close your browser window). Your edits are synced to Google Drive, so you won't lose work when an instance times out, but you will need to re-mount your Google Drive and re-install packages on every new instance.

## Run Policy Gradients

In [9]:
#@title imports

import os
import time

from hw2.infrastructure.rl_trainer import RL_Trainer
from hw2.agents.pg_agent import PGAgent

%load_ext autoreload
%autoreload 2

In [10]:
#@title runtime arguments

class Args:

    def __getitem__(self, key):
        return getattr(self, key)

    def __setitem__(self, key, val):
        setattr(self, key, val)

    def __contains__(self, key):
        return hasattr(self, key)

    env_name = 'HalfCheetah-v2' #@param
    exp_name = 'q4_b50000_lr0.02_rtg' #@param

    #@markdown main parameters of interest
    n_iter = 100 #@param {type: "integer"}

    ## PDF will tell you how to set ep_len
    ## and discount for each environment
    ep_len = 150 #@param {type: "integer"}
    discount = 0.95 #@param {type: "number"}

    reward_to_go = True #@param {type: "boolean"}
    nn_baseline = False #@param {type: "boolean"}
    gae_lambda = None #@param {type: "number"}
    dont_standardize_advantages = False #@param {type: "boolean"}

    #@markdown batches and steps
    batch_size = 50000 #@param {type: "integer"}
    eval_batch_size = 400 #@param {type: "integer"}

    num_agent_train_steps_per_iter = 1 #@param {type: "integer"}
    learning_rate = 0.02 #@param {type: "number"}

    #@markdown MLP parameters
    n_layers = 2 #@param {type: "integer"}
    size = 32 #@param {type: "integer"}

    #@markdown system
    save_params = False #@param {type: "boolean"}
    no_gpu = False #@param {type: "boolean"}
    which_gpu = 0 #@param {type: "integer"}
    seed = 1 #@param {type: "integer"}

    action_noise_std = 0 #@param {type: "number"}

    #@markdown logging
    ## default is to not log video so
    ## that logs are small enough to be
    ## uploaded to Gradescope
    video_log_freq = -1 #@param {type: "integer"}
    scalar_log_freq = 1 #@param {type: "integer"}


args = Args()

## ensure compatibility with hw1 code
args['train_batch_size'] = args['batch_size']

if args['video_log_freq'] > 0:
    import warnings
    warnings.warn(
      '''\nLogging videos will make eventfiles too'''
      '''\nlarge for the autograder. Set video_log_freq = -1'''
      '''\nfor the runs you intend to submit.'''
    )
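
The `Args` class above stores its defaults as class attributes and adds `__getitem__`, `__setitem__`, and `__contains__`, so the same object can be read like a namespace (`args.exp_name`) or like a dict (`args['batch_size']`). The trimmed-down copy below is only an illustration of that behavior, with two of the fields kept:

```python
class Args:
    """Trimmed-down copy of the notebook's Args, for illustration only."""

    def __getitem__(self, key):
        return getattr(self, key)

    def __setitem__(self, key, val):
        setattr(self, key, val)

    def __contains__(self, key):
        return hasattr(self, key)

    batch_size = 50000
    learning_rate = 0.02


args = Args()

# A dict-style write is stored as an instance attribute
args['train_batch_size'] = args['batch_size']

same_value = args.batch_size == args['batch_size']  # both access styles agree
has_new_key = 'train_batch_size' in args             # membership test uses hasattr
missing_key = 'logdir' in args                       # not set yet in this sketch

# Overriding a default on the instance leaves the class-level default untouched
args['batch_size'] = 10
```

Because writes land on the instance, a fresh `Args()` always starts again from the class-level defaults.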

In [11]:
#@title create directory for logging

data_path = '/content/cds_rl_2022/hw2/data'

os.makedirs(data_path, exist_ok=True)

logdir = args.exp_name + '_' + args.env_name + '_' + time.strftime("%d-%m-%Y_%H-%M-%S")
logdir = os.path.join(data_path, logdir)
args['logdir'] = logdir
os.makedirs(logdir, exist_ok=True)

In [12]:
## define policy gradient trainer

class PG_Trainer(object):
    def __init__(self, params):

        #####################
        ## SET AGENT PARAMS
        #####################

        computation_graph_args = {
            'n_layers': params['n_layers'],
            'size': params['size'],
            'learning_rate': params['learning_rate'],
            }

        estimate_advantage_args = {
            'gamma': params['discount'],
            'standardize_advantages': not(params['dont_standardize_advantages']),
            'reward_to_go': params['reward_to_go'],
            'nn_baseline': params['nn_baseline'],
            'gae_lambda': params['gae_lambda'],
        }

        train_args = {
            'num_agent_train_steps_per_iter': params['num_agent_train_steps_per_iter'],
        }

        agent_params = {**computation_graph_args, **estimate_advantage_args, **train_args}

        self.params = params
        self.params['agent_class'] = PGAgent
        self.params['agent_params'] = agent_params
        self.params['batch_size_initial'] = self.params['batch_size']

        ################
        ## RL TRAINER
        ################

        self.rl_trainer = RL_Trainer(self.params)

    def run_training_loop(self):
        self.rl_trainer.run_training_loop(
            self.params['n_iter'],
            collect_policy = self.rl_trainer.agent.actor,
            eval_policy = self.rl_trainer.agent.actor,
        )
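
The `gamma`, `reward_to_go`, and `standardize_advantages` entries passed to the agent control how per-step returns are estimated. The sketch below illustrates what these options conventionally mean in policy gradient methods. It is not the `PGAgent` implementation, and all three function names are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Full-trajectory return sum_t gamma^t * r_t, repeated for every step."""
    total = sum((gamma ** t) * r for t, r in enumerate(rewards))
    return [total] * len(rewards)


def discounted_reward_to_go(rewards, gamma):
    """Reward-to-go: at step t, sum over t' >= t of gamma^(t' - t) * r_t'."""
    out = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of its future
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def standardize(values, eps=1e-8):
    """Shift to zero mean and scale to (roughly) unit standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var ** 0.5 + eps) for v in values]


rewards = [1.0, 1.0, 1.0]
full = discounted_return(rewards, gamma=0.5)       # [1.75, 1.75, 1.75]
rtg = discounted_reward_to_go(rewards, gamma=0.5)  # [1.75, 1.5, 1.0]
adv = standardize(rtg)
```

With `reward_to_go = False` every step is weighted by the same trajectory return, while `reward_to_go = True` credits each action only with the rewards that follow it.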

In [13]:
## run training

print(args.logdir)
trainer = PG_Trainer(args)
trainer.run_training_loop()

/content/cds_rl_2022/hw2/data/q4_b50000_lr0.02_rtg_HalfCheetah-v2_24-05-2022_12-46-38
########################
logging outputs to  /content/cds_rl_2022/hw2/data/q4_b50000_lr0.02_rtg_HalfCheetah-v2_24-05-2022_12-46-38
########################
Using GPU id 0


********** Iteration 0 ************

Collecting data to be used for training...

Training agent using sampled data from replay buffer...

Beginning logging procedure...

Collecting data for eval...
Eval_AverageReturn : -84.34406280517578
Eval_StdReturn : 11.401876449584961
Eval_MaxReturn : -70.7845687866211
Eval_MinReturn : -98.68087768554688
Eval_AverageEpLen : 150.0
Train_AverageReturn : -89.52372741699219
Train_StdReturn : 37.225181579589844
Train_MaxReturn : -5.174495697021484
Train_MinReturn : -219.4058380126953
Train_AverageEpLen : 150.0
Train_EnvstepsSoFar : 50100
TimeSinceStart : 78.86546754837036
Training Loss : -3647.230712890625
Initial_DataCollection_AverageReturn : -89.52372741699219
Done logging...




********** Iter
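
The scalar lines printed on each iteration follow a `name : value` pattern. As a rough sketch, they could be parsed into a dictionary for a quick comparison across runs. `parse_scalars` is a hypothetical helper and not part of the homework codebase; the sample text is copied from the output above.

```python
def parse_scalars(log_text):
    """Collect 'name : value' lines into a dict of floats, skipping other lines."""
    scalars = {}
    for line in log_text.splitlines():
        name, sep, value = line.partition(" : ")
        if not sep:
            continue  # banner or status line without a scalar
        try:
            scalars[name.strip()] = float(value)
        except ValueError:
            pass  # value was not numeric, so ignore the line
    return scalars


sample = """Collecting data for eval...
Eval_AverageReturn : -84.34406280517578
Eval_StdReturn : 11.401876449584961
Train_EnvstepsSoFar : 50100
Done logging..."""

metrics = parse_scalars(sample)
```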

In [14]:
#@markdown You can visualize your runs with tensorboard from within the notebook

%load_ext tensorboard
%tensorboard --logdir /content/cds_rl_2022/hw2/data

ERROR: Timed out waiting for TensorBoard to start. It may still be running as pid 3966.