# Exercise 10: Running RLlib experiments on EC2

This tutorial walks through how to run RLlib experiments on an AWS EC2 instance. This assumes that the machine you are using has already been configured for AWS (i.e. `~/.aws/credentials` is properly set up). For more detailed documentation, please view: https://github.com/ray-project/ray/blob/master/doc/source/autoscaling.rst
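
Before going further, it can help to confirm that the credentials file really is in place. The following is a minimal sketch, not part of the tutorial's scripts: the function name `has_aws_profile` is made up for illustration, and it only checks that a `[default]` (or other named) profile section exists in the file, not that the keys are valid.

```shell
# Hypothetical helper (not part of Ray or rayutils).
# Succeeds if the credentials file contains a section for the given profile.
#   $1: path to the credentials file (default: ~/.aws/credentials)
#   $2: profile name (default: "default")
has_aws_profile() {
    creds_file="${1:-$HOME/.aws/credentials}"
    profile="${2:-default}"
    # -F treats "[profile]" as a literal string rather than a bracket expression
    [ -f "$creds_file" ] && grep -qF "[${profile}]" "$creds_file"
}
```

For example, `has_aws_profile && echo "credentials found"` prints the message only when the default profile is present.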

## Getting Dependencies 


* First, make sure your copy of Ray is tracking http://github.com/eugenevinitsky/ray. To do this, go to your ray directory, run `git remote -v`, and confirm that the remote you are tracking matches "eugenevinitsky/ray" [TODO: is this still true?]


* Install the `rayutils` package from https://github.com/richardliaw/rayutils:  

`pip install -e git+https://github.com/richardliaw/rayutils.git#egg=rayutils`
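
The remote check from the first step can also be scripted. This is a rough sketch under the assumption that the fork is hosted on GitHub; the function name `remote_points_at` is invented here, and it simply searches the output of `git remote -v` for the expected "owner/repo" string.

```shell
# Hypothetical helper: reads `git remote -v` output on stdin and succeeds
# if any remote URL (https or ssh form) points at the given "owner/repo".
remote_points_at() {
    grep -q "github\.com[:/]$1"
}
```

Usage, with the path to your own ray checkout substituted in: `git -C <PATH TO RAY> remote -v | remote_points_at eugenevinitsky/ray && echo "tracking the fork"`.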
    

## Modify Configuration

This section explains the parts of `/learning-traffic/scripts/ray_autoscale.yaml` you'll want to customize. The same descriptions appear in `ray_autoscale.yaml` itself. Below are the variables you should change, along with a few that might come in handy:

* Modify `cluster_name`: A unique identifier for the head node and workers of this cluster. If you want to set up multiple clusters, `cluster_name` must be changed each time the script is run.

    
* Modify `file_mounts`: _change me!_ You'll want to change these file mounts. [TODO: I think this is gone]
    * "tmp/path" indicates the path to the version of Flow you intend to use. It is specified in the format `#"/tmp/path": "<PATH TO LEARNING TRAFFIC>/.git/refs/heads/<BRANCH NAME>"`
    * "tmp/ray_autoscaler_key" is the path to the Ray autoscaler key. For most users, this will be found in `~/.ssh`
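
Since `cluster_name` has to differ between clusters, one option is to generate a per-cluster copy of the config instead of editing the file by hand. The sketch below assumes the YAML has a top-level line beginning with `cluster_name:`; the function name `with_cluster_name` and the naming scheme in the usage note are illustrative only.

```shell
# Hypothetical helper: print the YAML in $1 with its top-level
# cluster_name replaced by $2. The original file is left untouched.
# Keep the new name to letters, digits and dashes, since it is
# substituted directly into a sed expression.
with_cluster_name() {
    config_file="$1"
    new_name="$2"
    sed "s/^cluster_name:.*/cluster_name: ${new_name}/" "$config_file"
}
```

For instance, `with_cluster_name ray_autoscale.yaml "flow-$(whoami)-2" > my_cluster.yaml` writes a copy that can then be passed to the cluster commands in place of `ray_autoscale.yaml`.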

## Setup Clusters

* To create or update the cluster, run: `ray up ray_autoscale.yaml -y`

#* To set up ray in the cluster, run:  `ray2 setup ray_autoscale.yaml`

Once the cluster is up, you can log in to it via:  `$(ray2 login_cmd ray_autoscale.yaml)`. Note that you can also run commands from outside the cluster via: `ray2 submit ray_autoscale.yaml [--background] [--shutdown] test.py`, where `test.py` is an example script.

## Run Experiments

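The cluster is all set up and you are ready to run an experiment! From `/learning-traffic/scripts`, run the following, replacing the path after `-f` with the path to the experiment script on your own machine:  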

`./run_rllib.sh -f /Users/kathyjang/research/rllab-multiagent/learning-traffic/examples/rllib/figure_eight.py -s`

--- 
### Results and Caveats
The experiment is now running! By default, results are logged to `~/ray_results`.

The `run_rllib.sh` script can be run with a few different flags:  
* `-f` is required and indicates the script to be run on the cluster
* `-s` instructs the cluster to shut down after the script is done running
* `-b` runs the script in the background (recommended for long experiments)
* `-n` TODO: This is listed as an option in `run_rllib.sh`, but there's no if clause supporting it, nor do I see an analog in https://github.com/richardliaw/rayutils/blob/master/rayutils/rayutils.py. Whoever wrote this, can you chime in?
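
To make the flags concrete, here is one way a wrapper could translate them into the `ray2 submit` form shown earlier. This is a hypothetical sketch and not the actual contents of `run_rllib.sh`: the function name `build_submit_cmd` is invented, the mapping of `-s` and `-b` onto `--shutdown` and `--background` is an assumption, and `-n` is left out because its behavior is unknown.

```shell
# Hypothetical sketch, NOT the real run_rllib.sh.
# Prints the ray2 submit command that the given flags would correspond to.
build_submit_cmd() {
    script=""
    extra=""
    OPTIND=1                      # reset so the function can be called repeatedly
    while getopts "f:sb" opt; do
        case "$opt" in
            f) script="$OPTARG" ;;
            s) extra="$extra --shutdown" ;;
            b) extra="$extra --background" ;;
            *) return 2 ;;        # unknown flag
        esac
    done
    if [ -z "$script" ]; then
        echo "error: -f <script> is required" >&2
        return 2
    fi
    echo "ray2 submit ray_autoscale.yaml${extra} ${script}"
}
```

Printing the command rather than executing it keeps the sketch safe to try locally before pointing it at a real cluster.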

For background users: RLlib uses `screen`, a Linux utility for managing terminal sessions, in order to run scripts in the background. This means your experiment is running in a "screen" separate from the main screen you interact with. If you ran an experiment with the `-b` flag, here's how to check on its progress. Log in to the cluster and enter `screen -r` to reattach the other screen. Once reattached, you should immediately be able to see the stdout of your running experiment. To detach from this screen, hit `Ctrl-a` to signal that the next key should be sent to screen rather than the shell, then hit `d`.
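
If more than one screen session exists on the head node, `screen -r` needs to be told which one to reattach. The sketch below pulls the session ids out of `screen -ls`; the function name `list_screen_sessions` is made up, and it assumes the usual listing layout in which each session line is indented and starts with `<pid>.<name>`.

```shell
# Hypothetical helper: reads `screen -ls` output on stdin and prints
# one session id (e.g. 12345.pts-0.host) per line.
list_screen_sessions() {
    awk '/^[ \t]+[0-9]+\./ { print $1 }'
}
```

Run `screen -ls | list_screen_sessions` on the cluster, then pass one of the printed ids to `screen -r <id>`.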

## Close Clusters

If you didn't run `./run_rllib.sh` with the `-s` option, then you will need to shut down the cluster manually. To do this, log in to the cluster and run:  

`ray2 shutdown`

## Troubleshooting


- NOTE: If pyarrow or Ray is causing problems, this is what I did. Basically, you have to completely get rid of Ray and reinstall it:
  - `source activate [your_env]`
  - `rm -rf ray`
  - repeat the following until `which ray` returns blank:
    - `which ray`
    - `rm [the output of which ray]`
    - this gets rid of the installed binary. I'm not sure this step is necessary, but I did it. After this step, it’s as if Ray never existed on your system
  - Go to the directory where you want to install Ray and run: `git clone https://github.com/eugenevinitsky/ray.git`
  - `cd ray/python`
  - `python setup.py develop`
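
The "repeat until `which ray` returns blank" step above can be written as a loop. This is a sketch only: the function name `remove_from_path` is invented, it uses `command -v` in place of `which`, and it permanently deletes files, so double-check the name you pass in.

```shell
# Hypothetical helper: delete every executable file named $1 that is
# reachable through PATH. Stops with an error if a file cannot be removed.
remove_from_path() {
    name="$1"
    while target="$(command -v "$name")" && [ -f "$target" ]; do
        echo "removing $target"
        rm -f "$target" || return 1
        hash -r   # forget any cached location of the deleted binary
    done
}
```

With the right environment activated, `remove_from_path ray` would replace the manual `which ray` / `rm` cycle before the fresh clone and `python setup.py develop`.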
  
  
  
* `pip install` for rayutils doesn't work
    * The install command in this tutorial runs pip in "editable" mode (`-e`)
    * NOTE: Richard Liaw's Git README suggests that you run the following command: `pip install git+https://github.com/richardliaw/rayutils.git`. I suspect there's something wrong with the repo structure, because rayutils is nowhere to be found after pip reports a successful installation. My edit does the pip installation in editable mode, and the `#egg=rayutils` at the end of the command denotes the name of the package
    TODO: Could someone confirm that ^ doesn't work?