# FossilNET Classification

In this notebook, we repeat the FossilNET training, this time tuning the hyperparameters with BOHB (Bayesian Optimization and HyperBand).

[BOHB Docs](https://docs.ray.io/en/latest/tune-searchalg.html#bohb)

### Load Dependencies

We load the usual dependencies along with [PyTorch](https://pytorch.org/docs/stable/index.html) and the [TorchVision](https://pytorch.org/docs/stable/torchvision/index.html) helper library, which gives access to pretrained models, dataloaders and transforms for image problems.

In [1]:
%load_ext autoreload
%autoreload 2

from fossilnet_deps import *

Loading dependencies we have already seen...
Importing ray...
Done...


### Check for CUDA

In [2]:
print('CUDA Available' if torch.cuda.is_available() else 'CPU Only')

CPU Only


### Start Ray

In [3]:
ray.shutdown()
ray.init(num_cpus=3, num_gpus=1, include_webui=True)

2020-06-11 23:42:01,064	INFO resource_spec.py:204 -- Starting Ray with 4.83 GiB memory available for workers and up to 2.43 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-06-11 23:42:01,411	INFO services.py:1168 -- View the Ray dashboard at localhost:8266


{'node_ip_address': '192.168.1.39',
 'raylet_ip_address': '192.168.1.39',
 'redis_address': '192.168.1.39:52710',
 'object_store_address': '/tmp/ray/session_2020-06-11_23-42-01_051223_20118/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2020-06-11_23-42-01_051223_20118/sockets/raylet',
 'webui_url': 'localhost:8266',
 'session_dir': '/tmp/ray/session_2020-06-11_23-42-01_051223_20118'}
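
The CUDA check above printed `CPU Only`, yet `ray.init` is told that one GPU is available. A minimal sketch, assuming we want the Ray resource counts to follow what the machine actually has; `pick_resources` is a hypothetical helper, not part of Ray or `fossilnet_deps`:

```python
def pick_resources(cuda_available, num_cpus=3):
    """Return CPU/GPU counts that match the detected hardware.

    Hypothetical helper: the values can feed ray.init(num_cpus=..., num_gpus=...)
    and the dict itself can be used as resources_per_trial in tune.run.
    """
    return {"cpu": num_cpus, "gpu": 1 if cuda_available else 0}


# In the notebook this flag would come from torch.cuda.is_available()
resources = pick_resources(cuda_available=False)
print(resources)
```

With this in place, the start-up call could read `ray.init(num_cpus=resources["cpu"], num_gpus=resources["gpu"])`, so a CPU-only machine never advertises a GPU it does not have.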

In [11]:
# BOHB uses ConfigSpace to define its hyperparameter search space
import ConfigSpace as CS
from ray.tune.schedulers.hb_bohb import HyperBandForBOHB
from ray.tune.suggest.bohb import TuneBOHB

config_space = CS.ConfigurationSpace()

config_space.add_hyperparameter(
    CS.UniformFloatHyperparameter("lr", lower=1e-3, upper=1e-1)
)

config_space.add_hyperparameter(
    CS.UniformFloatHyperparameter("weight_decay", lower=1e-7, upper=1e-3)
)

config_space.add_hyperparameter(
    CS.CategoricalHyperparameter("batch_size", choices=[8, 16, 32, 64])
)

config_space.add_hyperparameter(
    CS.CategoricalHyperparameter("augment_flip", choices=[True, False])
)

config_space.add_hyperparameter(
    CS.CategoricalHyperparameter("use_grayscale", choices=[True, False])
)

# Metric to optimise, shared by the scheduler and the search algorithm
experiment_metrics = dict(metric="val_f1_score", mode="max")

# HyperBand scheduler variant designed to work with the BOHB search algorithm
bohb_hyperband = HyperBandForBOHB(
    time_attr="training_iteration",
    max_t=200,
    **experiment_metrics
)

bohb_search = TuneBOHB(
    config_space,
    max_concurrent=10,
    **experiment_metrics
)

#
# Before committing to a huge run, dry-run training for 1 iteration with a few
# samples (5 below) to exercise different hyperparameter options.
#
# Then set this to False and tune for real.
#
smoke_test = True

analysis = tune.run(
    FossilTrainable,
    scheduler=bohb_hyperband,
    search_alg=bohb_search,
    
    local_dir="~/ray_results/torch_fossilnet_bhob",
    resources_per_trial={
        "cpu": 3,
        "gpu": 1
    },
    num_samples=5 if smoke_test else 100,
    checkpoint_at_end=True,
    keep_checkpoints_num=5,
    checkpoint_freq=3
)

Trial name,status,loc,augment_flip,batch_size,lr,use_grayscale,weight_decay
FossilTrainable_8f11bab8,RUNNING,,False,16,0.00440261,False,0.000919423
FossilTrainable_8f11fe38,PENDING,,True,32,0.00914594,True,0.000647219
FossilTrainable_8f122d68,PENDING,,False,8,0.0497776,True,0.00040671
FossilTrainable_8f125dc4,PENDING,,True,16,0.0167502,True,0.000366055
FossilTrainable_8f128d9e,PENDING,,True,8,0.0934811,True,0.000433276


(pid=21039) Loading dependencies we have already seen...


2020-06-11 23:47:44,424	ERROR trial_runner.py:519 -- Trial FossilTrainable_8f11bab8: Error processing event.
Traceback (most recent call last):
  File "/Users/stevejpurves/anaconda/anaconda3/envs/t20-fri-ray/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 467, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/Users/stevejpurves/anaconda/anaconda3/envs/t20-fri-ray/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 431, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/Users/stevejpurves/anaconda/anaconda3/envs/t20-fri-ray/lib/python3.8/site-packages/ray/worker.py", line 1515, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::FossilTrainable.train() (pid=21039, ip=192.168.1.39)
  File "python/ray/_raylet.pyx", line 424, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 459, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx",

(pid=21039) Importing ray...
(pid=21039) Done...
(pid=21042) Loading dependencies we have already seen...


2020-06-11 23:47:46,883	ERROR trial_runner.py:519 -- Trial FossilTrainable_8f11fe38: Error processing event.
Traceback (most recent call last):
  File "/Users/stevejpurves/anaconda/anaconda3/envs/t20-fri-ray/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 467, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/Users/stevejpurves/anaconda/anaconda3/envs/t20-fri-ray/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 431, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/Users/stevejpurves/anaconda/anaconda3/envs/t20-fri-ray/lib/python3.8/site-packages/ray/worker.py", line 1515, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::FossilTrainable.train() (pid=21042, ip=192.168.1.39)
  File "python/ray/_raylet.pyx", line 424, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 459, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx",

(pid=21042) Importing ray...
(pid=21042) Done...
(pid=21046) Loading dependencies we have already seen...


2020-06-11 23:47:49,411	ERROR trial_runner.py:519 -- Trial FossilTrainable_8f122d68: Error processing event.
Traceback (most recent call last):
  File "/Users/stevejpurves/anaconda/anaconda3/envs/t20-fri-ray/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 467, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/Users/stevejpurves/anaconda/anaconda3/envs/t20-fri-ray/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 431, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/Users/stevejpurves/anaconda/anaconda3/envs/t20-fri-ray/lib/python3.8/site-packages/ray/worker.py", line 1515, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::FossilTrainable.train() (pid=21046, ip=192.168.1.39)
  File "python/ray/_raylet.pyx", line 424, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 459, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx",

Trial name,status,loc,augment_flip,batch_size,lr,use_grayscale,weight_decay
FossilTrainable_8f11bab8,ERROR,,False,16,0.00440261,False,0.000919423
FossilTrainable_8f11fe38,ERROR,,True,32,0.00914594,True,0.000647219
FossilTrainable_8f122d68,ERROR,,False,8,0.0497776,True,0.00040671
FossilTrainable_8f125dc4,PENDING,,True,16,0.0167502,True,0.000366055
FossilTrainable_8f128d9e,PENDING,,True,8,0.0934811,True,0.000433276

Trial name,# failures,error file
FossilTrainable_8f11bab8,1,"/Users/stevejpurves/ray_results/torch_fossilnet_bhob/FossilTrainable/FossilTrainable_1_augment_flip=False,batch_size=16,lr=0.0044026,use_grayscale=False,weight_decay=0.00091942_2020-06-11_23-47-42hjkbpgsc/error.txt"
FossilTrainable_8f11fe38,1,"/Users/stevejpurves/ray_results/torch_fossilnet_bhob/FossilTrainable/FossilTrainable_2_augment_flip=True,batch_size=32,lr=0.0091459,use_grayscale=True,weight_decay=0.00064722_2020-06-11_23-47-440zs0se1x/error.txt"
FossilTrainable_8f122d68,1,"/Users/stevejpurves/ray_results/torch_fossilnet_bhob/FossilTrainable/FossilTrainable_3_augment_flip=False,batch_size=8,lr=0.049778,use_grayscale=True,weight_decay=0.00040671_2020-06-11_23-47-46rbyq5l62/error.txt"


(pid=21046) Importing ray...
(pid=21046) Done...
(pid=21063) Loading dependencies we have already seen...


2020-06-11 23:47:52,044	ERROR trial_runner.py:519 -- Trial FossilTrainable_8f125dc4: Error processing event.
Traceback (most recent call last):
  File "/Users/stevejpurves/anaconda/anaconda3/envs/t20-fri-ray/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 467, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/Users/stevejpurves/anaconda/anaconda3/envs/t20-fri-ray/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 431, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/Users/stevejpurves/anaconda/anaconda3/envs/t20-fri-ray/lib/python3.8/site-packages/ray/worker.py", line 1515, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::FossilTrainable.train() (pid=21063, ip=192.168.1.39)
  File "python/ray/_raylet.pyx", line 424, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 459, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx",

(pid=21063) Importing ray...
(pid=21063) Done...
(pid=21065) Loading dependencies we have already seen...


2020-06-11 23:47:54,310	ERROR trial_runner.py:519 -- Trial FossilTrainable_8f128d9e: Error processing event.
Traceback (most recent call last):
  File "/Users/stevejpurves/anaconda/anaconda3/envs/t20-fri-ray/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 467, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/Users/stevejpurves/anaconda/anaconda3/envs/t20-fri-ray/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 431, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/Users/stevejpurves/anaconda/anaconda3/envs/t20-fri-ray/lib/python3.8/site-packages/ray/worker.py", line 1515, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::FossilTrainable.train() (pid=21065, ip=192.168.1.39)
  File "python/ray/_raylet.pyx", line 424, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 459, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx",

(pid=21065) Importing ray...
(pid=21065) Done...


Trial name,status,loc,augment_flip,batch_size,lr,use_grayscale,weight_decay
FossilTrainable_8f11bab8,ERROR,,False,16,0.00440261,False,0.000919423
FossilTrainable_8f11fe38,ERROR,,True,32,0.00914594,True,0.000647219
FossilTrainable_8f122d68,ERROR,,False,8,0.0497776,True,0.00040671
FossilTrainable_8f125dc4,ERROR,,True,16,0.0167502,True,0.000366055
FossilTrainable_8f128d9e,ERROR,,True,8,0.0934811,True,0.000433276
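
Every `weight_decay` value sampled above lies between roughly 3.7e-4 and 9.2e-4, which is what a linear-scale range of 1e-7 to 1e-3 tends to produce. A rough sketch, using only the standard library, that compares linear and log-uniform sampling over that range; the sample count and the 1e-4 threshold are arbitrary choices for illustration:

```python
import math
import random

LOWER, UPPER = 1e-7, 1e-3  # weight_decay bounds used in config_space


def sample_linear(rng):
    # Uniform on the raw value: most draws land near the upper bound
    return rng.uniform(LOWER, UPPER)


def sample_log(rng):
    # Uniform on log10(value): each decade is equally likely
    return 10 ** rng.uniform(math.log10(LOWER), math.log10(UPPER))


def fraction_below(sampler, threshold, n=10_000, seed=0):
    # Share of n seeded draws that fall under the threshold
    rng = random.Random(seed)
    return sum(sampler(rng) < threshold for _ in range(n)) / n


linear_frac = fraction_below(sample_linear, 1e-4)
log_frac = fraction_below(sample_log, 1e-4)
print(f"linear: {linear_frac:.2f} of samples below 1e-4")
print(f"log:    {log_frac:.2f} of samples below 1e-4")
```

ConfigSpace's `UniformFloatHyperparameter` accepts a `log=True` argument that should give the second behaviour. Whether it actually helps for FossilNET is untested here.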

Trial name,# failures,error file
FossilTrainable_8f11bab8,1,"/Users/stevejpurves/ray_results/torch_fossilnet_bhob/FossilTrainable/FossilTrainable_1_augment_flip=False,batch_size=16,lr=0.0044026,use_grayscale=False,weight_decay=0.00091942_2020-06-11_23-47-42hjkbpgsc/error.txt"
FossilTrainable_8f11fe38,1,"/Users/stevejpurves/ray_results/torch_fossilnet_bhob/FossilTrainable/FossilTrainable_2_augment_flip=True,batch_size=32,lr=0.0091459,use_grayscale=True,weight_decay=0.00064722_2020-06-11_23-47-440zs0se1x/error.txt"
FossilTrainable_8f122d68,1,"/Users/stevejpurves/ray_results/torch_fossilnet_bhob/FossilTrainable/FossilTrainable_3_augment_flip=False,batch_size=8,lr=0.049778,use_grayscale=True,weight_decay=0.00040671_2020-06-11_23-47-46rbyq5l62/error.txt"
FossilTrainable_8f125dc4,1,"/Users/stevejpurves/ray_results/torch_fossilnet_bhob/FossilTrainable/FossilTrainable_4_augment_flip=True,batch_size=16,lr=0.01675,use_grayscale=True,weight_decay=0.00036605_2020-06-11_23-47-49eueudvj8/error.txt"
FossilTrainable_8f128d9e,1,"/Users/stevejpurves/ray_results/torch_fossilnet_bhob/FossilTrainable/FossilTrainable_5_augment_flip=True,batch_size=8,lr=0.093481,use_grayscale=True,weight_decay=0.00043328_2020-06-11_23-47-52hxsepji_/error.txt"
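
The tracebacks printed above are cut off before the actual `TypeError` message, but each failed trial writes its error to the `error.txt` file listed in the table. A small sketch for collecting those files, assuming the directory layout shown above (one sub-directory per trial under the experiment directory); `collect_trial_errors` is a hypothetical helper:

```python
from pathlib import Path


def collect_trial_errors(experiment_dir):
    """Map each trial directory name to the last non-empty line of its error.txt."""
    errors = {}
    for error_file in sorted(Path(experiment_dir).expanduser().glob("*/error.txt")):
        lines = [ln for ln in error_file.read_text().splitlines() if ln.strip()]
        errors[error_file.parent.name] = lines[-1] if lines else ""
    return errors


if __name__ == "__main__":
    # Path taken from the error table above
    experiment_dir = "~/ray_results/torch_fossilnet_bhob/FossilTrainable"
    for trial, message in collect_trial_errors(experiment_dir).items():
        print(trial, "->", message)
```

The last line of a Python traceback is usually the exception message, which is why only that line is kept. Open the file itself when more context is needed.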


TuneError: ('Trials did not complete', [FossilTrainable_8f11bab8, FossilTrainable_8f11fe38, FossilTrainable_8f122d68, FossilTrainable_8f125dc4, FossilTrainable_8f128d9e])
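
The comment in the cell above describes a one-iteration dry run, but `smoke_test` only changes `num_samples`. One possible way to also cap the iterations is Tune's `stop` argument. A sketch, where `run_settings` is a hypothetical helper and the sample counts mirror the cell above:

```python
def run_settings(smoke_test, smoke_samples=5, full_samples=100):
    """Build the tune.run keyword arguments that differ between dry and real runs."""
    settings = {"num_samples": smoke_samples if smoke_test else full_samples}
    if smoke_test:
        # Stop every trial after its first reported training iteration
        settings["stop"] = {"training_iteration": 1}
    return settings


print(run_settings(smoke_test=True))
print(run_settings(smoke_test=False))
```

These could be passed as `tune.run(FossilTrainable, ..., **run_settings(smoke_test))` in place of the `num_samples` line. How the stop condition interacts with the HyperBand scheduler has not been checked here.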

In [None]:
print("Best config is:", analysis.get_best_config(metric="best_val_f1_score"))
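
`analysis` is only assigned if `tune.run` finishes, so this cell cannot run after the failure above. To make the selection step concrete, here is a plain-Python illustration of picking the best configuration by a metric. It is not Ray's implementation, and the result rows are made-up placeholder values:

```python
def best_config(results, metric, mode="max"):
    """Return the config of the row with the best value of `metric`.

    `results` is a list of dicts, each with a "config" dict and metric values.
    Rows that never reported the metric are skipped.
    """
    scored = [row for row in results if metric in row]
    if not scored:
        raise ValueError(f"no trial reported metric {metric!r}")
    pick = max if mode == "max" else min
    return pick(scored, key=lambda row: row[metric])["config"]


# Placeholder rows for illustration only, not real FossilNET results
example_results = [
    {"config": {"lr": 0.01, "batch_size": 16}, "val_f1_score": 0.61},
    {"config": {"lr": 0.05, "batch_size": 8}, "val_f1_score": 0.74},
    {"config": {"lr": 0.09, "batch_size": 32}},  # errored before reporting
]
print(best_config(example_results, metric="val_f1_score"))
```

The metric name passed in has to match a key the trainable actually reports, otherwise no trial qualifies.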

In [None]:
import ray
ray.shutdown()


## Next

Head over to EC2 and check the results of a longer run on [TensorBoard](http://ec2-3-136-85-207.us-east-2.compute.amazonaws.com:6006/).