# Trainables

So far we have been using the functional interface to Ray Tune, which is lightweight and easy to get started with.

However, it is limited in a few ways: (1) it doesn't allow us to maintain state, (2) Ray Tune cannot 'see' or manage training iterations, and (3) it doesn't let us use some other useful parts of Ray Tune, such as checkpointing or schedulers.

We'll take a look at a simple trainable below.


In [1]:
%load_ext autoreload
%autoreload 2

from dependencies import *

Loading dependencies we have already seen...
Importing ray...
Done...


## Trainable Interface

 1. Subclass `tune.Trainable`
 2. Set up state in `_setup`, which receives the `config` dict of hyperparameters
 3. Implement `_train()` so that it completes one unit/iteration of training and returns a dict of results
 4. Implement `_save` to save state, checkpoint models, etc.
 5. Implement `_restore` to restore that state from a checkpoint
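
To make the order of these calls concrete, here is a minimal plain-Python sketch of the lifecycle. It does not use Ray and is not how Ray Tune is implemented: `SketchTrainable` and `drive` are hypothetical names, the JSON checkpoint format is chosen only to keep the sketch dependency-free, and the `checkpoint_<iteration>` directory naming is an assumption made for illustration.

```python
import json
import os
import tempfile


class SketchTrainable:
    """Hypothetical stand-in exposing the four hooks listed above (no Ray needed)."""

    def _setup(self, config):
        # Called once, with the hyperparameters for this trial.
        self.x = 0
        self.a = config["a"]

    def _train(self):
        # One unit of training; returns a dict of results.
        self.x += self.a
        return {"score": self.x}

    def _save(self, checkpoint_dir):
        # Persist whatever is needed to resume later.
        checkpoint_path = os.path.join(checkpoint_dir, "state.json")
        with open(checkpoint_path, "w") as f:
            json.dump({"x": self.x}, f)
        return checkpoint_path

    def _restore(self, checkpoint_path):
        # Reload the state written by _save.
        with open(checkpoint_path) as f:
            self.x = json.load(f)["x"]


def drive(trainable, config, iterations, checkpoint_freq, checkpoint_root):
    """Hypothetical driver: set up once, train repeatedly, save periodically."""
    trainable._setup(config)
    checkpoints = []
    result = None
    for iteration in range(1, iterations + 1):
        result = trainable._train()
        if iteration % checkpoint_freq == 0:
            checkpoint_dir = os.path.join(checkpoint_root, f"checkpoint_{iteration}")
            os.makedirs(checkpoint_dir, exist_ok=True)
            checkpoints.append(trainable._save(checkpoint_dir))
    return result, checkpoints


with tempfile.TemporaryDirectory() as root:
    final_result, saved = drive(SketchTrainable(), {"a": 2}, 20, 5, root)

    # Resume a fresh instance from the second checkpoint (written at iteration 10).
    resumed = SketchTrainable()
    resumed._setup({"a": 2})
    resumed._restore(saved[1])
    resumed_x = resumed.x
```

The point of the split is that the driver, not the training code, owns the loop: because each `_train` call is one iteration, an outer system can count iterations, checkpoint between them, and stop or resume a trial at any boundary.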


In [4]:
from os import path

class MyTrainable(tune.Trainable):
    
    
    def _setup(self, config):
        # config (dict): A dict of hyperparameters
        self.x = 0
        self.a = config["a"]

        
    def _train(self):  # This is called iteratively.
        self.x += self.a
        print("Trainable", f"({self.a})", self.x)
        return {"score": self.x }
    
    
    def _save(self, checkpoint_dir):
        checkpoint_path = path.join(checkpoint_dir, "model.npy")
        np.save(checkpoint_path, np.array(self.x))
        return checkpoint_path

    #
    # Restore is used internally by Raytune and schedulers. 
    # It's only useful manually on single training runs.
    #
    def _restore(self, checkpoint_path):
        print("CHECKPOINT PATH", checkpoint_path)
        # np.save stored a 0-d array, so use .item() rather than indexing with [0]
        self.x = np.load(checkpoint_path).item()
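
A note on the NumPy round trip used by `_save` and `_restore`: wrapping a scalar with `np.array(...)` produces a zero-dimensional array, so what comes back from `np.load` has no first axis to index. The sketch below (the value `7` and the temporary directory are arbitrary choices for illustration) shows the shape of the loaded value and how to turn it back into a plain Python number.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as checkpoint_dir:
    checkpoint_path = os.path.join(checkpoint_dir, "model.npy")
    np.save(checkpoint_path, np.array(7))  # same pattern as _save
    loaded = np.load(checkpoint_path)

loaded_ndim = loaded.ndim    # number of axes of the loaded array
restored_x = loaded.item()   # converts the 0-d array to a Python scalar

# Indexing a 0-d array with [0] is an error, since there is no axis 0.
try:
    loaded[0]
    index_failed = False
except IndexError:
    index_failed = True
```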


## Start Ray

In [5]:
ray.shutdown()
ray.init(num_cpus=2, num_gpus=0, include_webui=True)

2020-06-12 11:16:54,184	INFO resource_spec.py:204 -- Starting Ray with 32.91 GiB memory available for workers and up to 16.47 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-06-12 11:16:54,446	INFO services.py:1168 -- View the Ray dashboard at localhost:8267


{'node_ip_address': '192.168.1.39',
 'raylet_ip_address': '192.168.1.39',
 'redis_address': '192.168.1.39:56277',
 'object_store_address': '/tmp/ray/session_2020-06-12_11-16-54_183366_135563/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2020-06-12_11-16-54_183366_135563/sockets/raylet',
 'webui_url': 'localhost:8267',
 'session_dir': '/tmp/ray/session_2020-06-12_11-16-54_183366_135563'}

## Run

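Do some simple tuning: a grid search over `a`, stopping each trial after 20 training iterations and checkpointing every 5.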

In [6]:
analysis = tune.run(
    MyTrainable,
    name="simple_trainable",
    stop={"training_iteration": 20},
    config={ "a": tune.grid_search([1,2]) },
    checkpoint_freq=5,
    resources_per_trial=dict(cpu=1, gpu=0),
    local_dir="~/ray_results/my_trainable")

print('best config: ', analysis.get_best_config(metric="score", mode="max"))

Trial name,status,loc,a
MyTrainable_00000,RUNNING,,1
MyTrainable_00001,PENDING,,2


Result for MyTrainable_00000:
  date: 2020-06-12_11-16-55
  done: false
  experiment_id: b8aa5c204c254de78302230079b31d71
  experiment_tag: 0_a=1
  hostname: cosmos-ml
  iterations_since_restore: 1
  node_ip: 192.168.1.39
  pid: 135798
  score: 1
  time_since_restore: 3.790855407714844e-05
  time_this_iter_s: 3.790855407714844e-05
  time_total_s: 3.790855407714844e-05
  timestamp: 1591957015
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: '00000'
  
(pid=135798) 2020-06-12 11:16:55,909	INFO trainable.py:217 -- Getting current IP.
Result for MyTrainable_00001:
  date: 2020-06-12_11-16-55
  done: false
  experiment_id: 1b8c21e83c984c8592d3ffa19882701a
  experiment_tag: 1_a=2
  hostname: cosmos-ml
  iterations_since_restore: 1
  node_ip: 192.168.1.39
  pid: 135797
  score: 2
  time_since_restore: 2.7418136596679688e-05
  time_this_iter_s: 2.7418136596679688e-05
  time_total_s: 2.7418136596679688e-05
  timestamp: 1591957015
  timesteps_since_restore: 0
  traini

Trial name,status,loc,a,iter,total time (s)
MyTrainable_00000,TERMINATED,,1,20,0.000533342
MyTrainable_00001,TERMINATED,,2,20,0.000697136


[2m[36m(pid=135797)[0m Trainable (2) 16
[2m[36m(pid=135797)[0m Trainable (2) 18
[2m[36m(pid=135797)[0m Trainable (2) 20
[2m[36m(pid=135797)[0m Trainable (2) 22
[2m[36m(pid=135797)[0m Trainable (2) 24
[2m[36m(pid=135797)[0m Trainable (2) 26
[2m[36m(pid=135797)[0m Trainable (2) 28
[2m[36m(pid=135797)[0m Trainable (2) 30
[2m[36m(pid=135797)[0m Trainable (2) 32
[2m[36m(pid=135797)[0m Trainable (2) 34
[2m[36m(pid=135797)[0m Trainable (2) 36
[2m[36m(pid=135797)[0m Trainable (2) 38
[2m[36m(pid=135797)[0m Trainable (2) 40
best config:  {'a': 2}
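
The reported best config can be checked by hand. Each call to `_train` adds `a` to `x`, so after `n` iterations the score is `n * a`. The sketch below recomputes the final scores for the two grid values in plain Python. It only mirrors the arithmetic of this toy trainable and is not a substitute for `analysis.get_best_config`.

```python
# Values taken from the tune.run call above.
grid = [1, 2]      # config={"a": tune.grid_search([1, 2])}
iterations = 20    # stop={"training_iteration": 20}

# Final score per trial: x starts at 0 and grows by `a` each iteration.
final_scores = {a: a * iterations for a in grid}

# mode="max" picks the trial with the highest score.
best_a = max(final_scores, key=final_scores.get)
best_config = {"a": best_a}
```

This matches the log above, where the `a=2` trial prints 40 on its last iteration.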


Go check the results directory (`~/ray_results/my_trainable`)!
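
One way to look for the checkpoints from Python is to walk the results directory and collect every file named `model.npy`, the filename written by `_save`. This is a small sketch: `find_checkpoints` is a hypothetical helper, and it makes no assumption about how Ray Tune nests trial and checkpoint folders beyond searching recursively under `local_dir`.

```python
import os


def find_checkpoints(results_dir, filename="model.npy"):
    """Return sorted paths of every `filename` found under `results_dir`."""
    results_dir = os.path.expanduser(results_dir)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(results_dir):
        if filename in filenames:
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


# local_dir from the tune.run call above; prints nothing if it does not exist.
for checkpoint_path in find_checkpoints("~/ray_results/my_trainable"):
    print(checkpoint_path)
```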

In [7]:
ray.shutdown()