# Ray Autoscaling

There are several ways to bring up and manage a Ray cluster, and the [Ray Docs](https://docs.ray.io/en/master/tune/tutorials/tune-distributed.html) explain them well.

We are going to focus on launching a cloud cluster on AWS using the [Ray Autoscaling](https://docs.ray.io/en/master/autoscaling.html) functionality.


## There are a few pre-requisites

These should already have been covered when you set up your credentials access:

 - Install boto3
 - Ensure credentials are set up
 - Check credentials for boto3 access - https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
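
Before running anything against AWS, it can help to confirm where credentials are configured. The sketch below uses only the standard library and checks two of the places boto3 looks: the `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables and the shared credentials file (`~/.aws/credentials`). The function name `find_credential_sources` is made up for this notebook, and boto3 supports other sources (for example instance roles) that this check does not cover.

```python
import configparser
import os
from pathlib import Path


def find_credential_sources(env=None, home=None):
    """Return the places where static AWS credentials appear to be configured.

    Hypothetical helper: it only inspects environment variables and the
    shared credentials file, not every source boto3 can use.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    sources = []

    # Environment variables are checked before the shared files.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment variables")

    # Shared credentials file, honouring an explicit override if one is set.
    default_path = home / ".aws" / "credentials"
    cred_path = Path(env.get("AWS_SHARED_CREDENTIALS_FILE", default_path))
    if cred_path.is_file():
        parser = configparser.ConfigParser()
        parser.read(cred_path)
        profile = env.get("AWS_PROFILE", "default")
        if parser.has_option(profile, "aws_access_key_id"):
            sources.append(f"shared credentials file ({profile} profile)")

    return sources


if __name__ == "__main__":
    found = find_credential_sources()
    print(found or "no static credentials found")
```

An empty result does not necessarily mean boto3 will fail, only that neither of these two sources is populated.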



## Do a smoke test on boto3

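List the buckets on S3. A response with `'HTTPStatusCode': 200` means the credentials work, even if the `Buckets` list is empty.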

In [1]:
import boto3

session = boto3.Session()
dev_s3_client = session.client('s3')
dev_s3_client.list_buckets()

{'ResponseMetadata': {'RequestId': '5FFFE6D9506C1EEC',
  'HostId': 'eabicxoTrZ2TV89+gIxXwocziR2K/QA67p1NZNwT4FPMr9PeQtxk7kufwwlqYzvll2Q46gTkkfc=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'eabicxoTrZ2TV89+gIxXwocziR2K/QA67p1NZNwT4FPMr9PeQtxk7kufwwlqYzvll2Q46gTkkfc=',
   'x-amz-request-id': '5FFFE6D9506C1EEC',
   'date': 'Fri, 12 Jun 2020 09:17:23 GMT',
   'content-type': 'application/xml',
   'transfer-encoding': 'chunked',
   'server': 'AmazonS3'},
  'RetryAttempts': 0},
 'Buckets': [],
 'Owner': {'ID': '5c6b8c15734061985b64a686156fe288fb3c7203d90d7d022a23233325058cf6'}}
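
The response is a plain dictionary, so the smoke test can be reduced to a pass/fail check. The sketch below assumes a response shaped like the one printed above; `summarize_list_buckets` is a hypothetical helper name, and the sample dictionary is a trimmed copy of that output, so no AWS call is made here.

```python
def summarize_list_buckets(response):
    """Return (ok, bucket_names) for a list_buckets-style response dict."""
    status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
    names = [bucket["Name"] for bucket in response.get("Buckets", [])]
    return status == 200, names


# Trimmed copy of the response shown above (no network access needed).
sample = {
    "ResponseMetadata": {"HTTPStatusCode": 200, "RetryAttempts": 0},
    "Buckets": [],
}

ok, names = summarize_list_buckets(sample)
print(ok, names)
```

In the notebook you could pass the real result of `dev_s3_client.list_buckets()` instead of `sample`.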

## Minimal Example

This is `cluster_minimal.yaml`, in the same folder as this notebook:

In [None]:
# A unique identifier for the head node and workers of this cluster.
cluster_name: minimal

# The maximum number of worker nodes to launch in addition to the head
# node. This takes precedence over min_workers. min_workers defaults to 0.
max_workers: 4

# Cloud-provider specific configuration.
provider:
    type: aws
    region: us-east-1

# How Ray will authenticate with newly launched nodes.
auth:
    ssh_user: ubuntu
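
The comments in the config say that `max_workers` takes precedence over `min_workers`, and that `min_workers` defaults to 0. One way to read that rule is sketched below, with the config held as a plain dictionary mirroring the YAML above. The helper `worker_bounds` is hypothetical, and clamping `min_workers` to `max_workers` is an assumption about what "takes precedence" means, not a copy of Ray's own logic.

```python
def worker_bounds(config):
    """Return (min_workers, max_workers) after applying the precedence rule.

    Assumption: "max_workers takes precedence" is read as clamping
    min_workers so that it never exceeds max_workers.
    """
    max_workers = int(config["max_workers"])
    min_workers = int(config.get("min_workers", 0))  # documented default
    if min_workers < 0 or max_workers < 0:
        raise ValueError("worker counts must be >= 0")
    return min(min_workers, max_workers), max_workers


# Dictionary form of the minimal config shown above.
minimal = {
    "cluster_name": "minimal",
    "max_workers": 4,
    "provider": {"type": "aws", "region": "us-east-1"},
    "auth": {"ssh_user": "ubuntu"},
}

print(worker_bounds(minimal))
```

With the minimal config this gives zero required workers and up to four, in addition to the head node.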

## Commands

To bring up a cluster

 - `ray up cluster_minimal.yaml`
 - go watch AWS :)



To access the head node on the cluster

 - `ray attach cluster_minimal.yaml`
 - `conda activate t20-fri-ray`

To bring the cluster back down

 - `ray down cluster_minimal.yaml`
 - watch AWS to make sure it comes down :D

To run a job on the cluster

 - `ray submit tune-default.yaml tune_script.py --start -- --ray-address=localhost:6379`
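
If you would rather drive the `up` / `attach` / `down` commands from a notebook cell than from a terminal, a thin wrapper is enough. This is a sketch only: `ray_command` and `run_ray` are hypothetical helper names, and the wrapper builds exactly the commands listed above without adding any flags. It defaults to a dry run so nothing is launched by accident.

```python
import shlex
import subprocess

ALLOWED_ACTIONS = {"up", "attach", "down"}


def ray_command(action, config_path):
    """Build the argv list for one of the cluster commands listed above."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["ray", action, config_path]


def run_ray(action, config_path, dry_run=True):
    """Print the command, and execute it only when dry_run is False."""
    argv = ray_command(action, config_path)
    print(shlex.join(argv))
    if not dry_run:
        # Inherits the current terminal, so any prompts remain visible.
        subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    run_ray("up", "cluster_minimal.yaml")    # dry run: prints the command only
    run_ray("down", "cluster_minimal.yaml")  # dry run: prints the command only
```

Pass `dry_run=False` once you are happy with the printed command.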

## Custom Example

This is `cluster_tune.yml`, in the same folder as this notebook:

In [None]:
# A unique identifier for the head node and workers of this cluster.
cluster_name: raytune

# The minimum number of worker nodes to launch in addition to the head
# node. This number should be >= 0.
min_workers: 2

# The maximum number of worker nodes to launch in addition to the head
# node. This takes precedence over min_workers. min_workers defaults to 0.
max_workers: 8

# Cloud-provider specific configuration.
provider:
    type: aws
    region: us-east-2
    # availability_zone: us-east-2b

# How Ray will authenticate with newly launched nodes.
auth:
    ssh_user: ubuntu
        
# Provider-specific config for the head node, e.g. instance type. By default
# Ray will auto-configure unspecified fields such as SubnetId and KeyName.
# For more documentation on available fields, see:
# http://boto3.readthedocs.io/en/latest/reference/services/ec2.html#EC2.ServiceResource.create_instances
head_node:
    InstanceType: c5.2xlarge
    ImageId: ami-07c1207a9d40bc3bd  # Default Ubuntu 16.04 AMI.

    # Set primary volume to 50 GiB
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 50
        
# Provider-specific config for worker nodes, e.g. instance type. By default
# Ray will auto-configure unspecified fields such as SubnetId and KeyName.
# For more documentation on available fields, see:
# http://boto3.readthedocs.io/en/latest/reference/services/ec2.html#EC2.ServiceResource.create_instances
worker_nodes:
    InstanceType: c4.2xlarge
    ImageId: ami-07c1207a9d40bc3bd  # Default Ubuntu 16.04 AMI.

    # Set primary volume to 50 GiB
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 50

    # Uncomment to run workers on spot instances; as written, workers are on-demand.
    # InstanceMarketOptions:
    #     MarketType: spot
    #     # Additional options can be found in the boto docs, e.g.
    #     #   SpotOptions:
    #     #       MaxPrice: MAX_HOURLY_PRICE
    # Additional options in the boto docs.
        
# Files or directories to copy to the head and worker nodes. The format is a
# dictionary from REMOTE_PATH: LOCAL_PATH, e.g.
file_mounts: {
    "/home/ubuntu/transform-2020-ray": "~/dev/transform-2020-ray"
 }

# List of shell commands to run to set up nodes.
setup_commands:
    # Consider uncommenting these if you run into dpkg locking issues
    # - sudo pkill -9 apt-get || true
    # - sudo pkill -9 dpkg || true
    # - sudo dpkg --configure -a
    # Install basics.
    - sudo apt-get update
    - sudo apt-get install -y build-essential
    - sudo apt-get install -y curl
    - sudo apt-get install -y unzip
    # Install Node.js in order to build the dashboard.
    - curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash
    - sudo apt-get install -y nodejs
    # Install Anaconda.
    - wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh || true
    - bash Anaconda3-5.0.1-Linux-x86_64.sh -b -p $HOME/anaconda3 || true
    - echo 'export PATH="$HOME/anaconda3/bin:$PATH"' >> ~/.bashrc
    # Build Ray.
    - git clone https://github.com/ray-project/ray || true
    - ray/ci/travis/install-bazel.sh
    - cd ray/python/ray/dashboard/client; npm ci; npm run build
    - pip install boto3==1.4.8 cython==0.29.0 aiohttp grpcio psutil setproctitle
    - cd ray/python; pip install -e . --verbose

# Custom commands that will be run on the head node after common setup.
head_setup_commands: []

# Custom commands that will be run on worker nodes after common setup.
worker_setup_commands: []

# Command to start ray on the head node. You don't need to change this.
head_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --head --num-redis-shards=10 --port=6379 --autoscaling-config=~/ray_bootstrap_config.yaml

# Command to start ray on worker nodes. You don't need to change this.
worker_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --address=$RAY_HEAD_IP:6379

# If a node is idle for this many minutes, it will be removed.
idle_timeout_minutes: 5
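
To make the scale-down settings concrete, here is a small illustration of how `idle_timeout_minutes` and `min_workers` could interact. It is not Ray's autoscaler code: `nodes_to_remove` is a hypothetical helper, timestamps are plain seconds, and the rule that the cluster never shrinks below `min_workers` is an assumption based on the config comments above.

```python
def nodes_to_remove(last_used, now, idle_timeout_minutes, min_workers=0):
    """Pick idle workers to remove, longest-idle first.

    last_used maps a worker name to the time (in seconds) it was last busy.
    Assumption: at least min_workers workers are always kept.
    """
    timeout_s = idle_timeout_minutes * 60
    idle = sorted(
        (name for name, t in last_used.items() if now - t > timeout_s),
        key=lambda name: last_used[name],
    )
    removable = max(len(last_used) - min_workers, 0)
    return idle[:removable]


# Three workers observed at t = 600 s; the timeout is 5 minutes (300 s).
last_used = {"w1": 0, "w2": 200, "w3": 590}

print(nodes_to_remove(last_used, now=600, idle_timeout_minutes=5, min_workers=2))
print(nodes_to_remove(last_used, now=600, idle_timeout_minutes=5, min_workers=0))
```

With `min_workers: 2`, as in `cluster_tune.yml`, only the longest-idle worker is a candidate for removal. With no minimum, both idle workers are.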