# Hyperparameter optimization
---
This notebook uses Cirrus to optimize the hyperparameters of a logistic regression model.

## Setup
---

In [1]:
# To ease development, reload all modules each time a cell is run.
%load_ext autoreload
%autoreload 2

In [2]:
import logging
import sys
import atexit

In [3]:
# Cirrus produces logs, but they will not show unless we add a handler that prints.
from cirrus import utilities
utilities.set_logging_handler()

[     _initialize | ResourceManager] Initializing no-retries Lambda client.
[     _initialize | ResourceManager] Initializing IAM resource.
[     _initialize | ResourceManager] Initializing EC2 resource.
[     _initialize | ResourceManager] Initializing Cloudwatch Logs client.
[     _initialize | ResourceManager] Initializing S3 resource.
[     _initialize | ResourceManager] Initializing STS client.


In [4]:
from cirrus import instance, automate, lr, GridSearch, cv_utils

## Instances, base task configuration, hyperparameters
---

First, we start some EC2 instances.

In [5]:
# Number of EC2 instances to start.
NUM_INSTANCES = 2

instances = []
for i in range(NUM_INSTANCES):
    inst = instance.Instance(
        name="hyperparameter_example_instance_%d" % i,
        disk_size=32,
        typ="m4.2xlarge",
        username="ubuntu",
        ami_owner_name=("self", "cirrus_server_image")
    )
    inst.start()
    instances.append(inst)

[        __init__ |      MainThread] Resolving AMI owner/name to AMI ID.
[        __init__ |      MainThread] Done.
[         _exists |      MainThread] Listing instances.
[         _exists |      MainThread] No existing instance with the same name was found.
[ _start_and_wait |      MainThread] Starting a new instance.
[ _start_and_wait |      MainThread] Waiting for instance to enter running state.
[ _start_and_wait |      MainThread] Fetching instance metadata.
[ _start_and_wait |      MainThread] Done.
[           start |      MainThread] Done.
[        __init__ |      MainThread] Resolving AMI owner/name to AMI ID.
[        __init__ |      MainThread] Done.
[         _exists |      MainThread] Listing instances.
[         _exists |      MainThread] No existing instance with the same name was found.
[ _start_and_wait |      MainThread] Starting a new instance.
[ _start_and_wait |      MainThread] Waiting for instance to enter running state.
[ _start_and_wait |      MainThread] Fetc

Second, we define the base configuration for our machine learning task.

In [6]:
base_task_config = {
    "n_workers": 16,
    "n_ps": 1,
    "dataset": "criteo-kaggle-19b",
    "learning_rate": 0.0001,
    "epsilon": 0.0001,
    "progress_callback": None,
    "train_set": [(0, 799)],
    "test_set": (800, 850),
    "minibatch_size": 200,
    "model_bits": 19,
    "opt_method": "adagrad",
    "timeout": 60,
    "lambda_size": 192
}

Third, we identify our hyperparameters and their possible values.

In [10]:
hyperparameter_names = [
    "train_set",
    "test_set"
]
hyperparameter_values = cv_utils.create_test_train_sets(k=10, num_sets=841)
print(hyperparameter_values)

[[0, 83], [84, 167], [168, 251], [252, 335], [336, 419], [420, 503], [504, 587], [588, 671], [672, 755], [756, 840]]
[[[[84, 167], [168, 251], [252, 335], [336, 419], [420, 503], [504, 587], [588, 671], [672, 755], [756, 840]], [[0, 83], [168, 251], [252, 335], [336, 419], [420, 503], [504, 587], [588, 671], [672, 755], [756, 840]], [[0, 83], [84, 167], [252, 335], [336, 419], [420, 503], [504, 587], [588, 671], [672, 755], [756, 840]], [[0, 83], [84, 167], [168, 251], [336, 419], [420, 503], [504, 587], [588, 671], [672, 755], [756, 840]], [[0, 83], [84, 167], [168, 251], [252, 335], [420, 503], [504, 587], [588, 671], [672, 755], [756, 840]], [[0, 83], [84, 167], [168, 251], [252, 335], [336, 419], [504, 587], [588, 671], [672, 755], [756, 840]], [[0, 83], [84, 167], [168, 251], [252, 335], [336, 419], [420, 503], [588, 671], [672, 755], [756, 840]], [[0, 83], [84, 167], [168, 251], [252, 335], [336, 419], [420, 503], [504, 587], [672, 755], [756, 840]], [[0, 83], [84, 167], [168, 25
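
The ranges printed above are consistent with a contiguous 10-fold split over object indices 0 to 840, where each fold serves once as the test set and the remaining nine form the train set. The following is a minimal pure-Python sketch that reproduces the same ranges. `make_folds` and `make_train_test_pairs` are hypothetical helpers for illustration, not the implementation of `cv_utils.create_test_train_sets`, and the rule that the last fold absorbs the remainder is an assumption inferred from the output.

```python
def make_folds(k, num_sets):
    """Split indices 0..num_sets-1 into k contiguous, inclusive [start, end] ranges."""
    size = num_sets // k
    folds = []
    for i in range(k):
        start = i * size
        # Assumption: the last fold absorbs any remainder (841 = 10 * 84 + 1).
        end = num_sets - 1 if i == k - 1 else start + size - 1
        folds.append([start, end])
    return folds


def make_train_test_pairs(folds):
    """Pair each fold (as the test set) with all remaining folds (as the train set)."""
    pairs = []
    for i, test_fold in enumerate(folds):
        train = folds[:i] + folds[i + 1:]
        pairs.append((train, tuple(test_fold)))
    return pairs


folds = make_folds(k=10, num_sets=841)
pairs = make_train_test_pairs(folds)
print(folds[0], folds[-1])            # [0, 83] [756, 840]
print(pairs[0][1], len(pairs[0][0]))  # (0, 83) 9
```

Each element of `pairs` has the same `(train ranges, test range)` shape that `GridSearch` prints when it is constructed with `cross_validation=True`.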

Together, the instances, base configuration, and hyperparameters define a hyperparameter optimization task, which consists of one machine learning task per assignment of values to the hyperparameters.

In [11]:
search = GridSearch(
    task=lr.LogisticRegression,
    param_base=base_task_config,
    hyper_vars=hyperparameter_names,
    hyper_params=hyperparameter_values,
    instances=instances,
    cross_validation=True
)

[([[84, 167], [168, 251], [252, 335], [336, 419], [420, 503], [504, 587], [588, 671], [672, 755], [756, 840]], (0, 83)), ([[0, 83], [168, 251], [252, 335], [336, 419], [420, 503], [504, 587], [588, 671], [672, 755], [756, 840]], (84, 167)), ([[0, 83], [84, 167], [252, 335], [336, 419], [420, 503], [504, 587], [588, 671], [672, 755], [756, 840]], (168, 251)), ([[0, 83], [84, 167], [168, 251], [336, 419], [420, 503], [504, 587], [588, 671], [672, 755], [756, 840]], (252, 335)), ([[0, 83], [84, 167], [168, 251], [252, 335], [420, 503], [504, 587], [588, 671], [672, 755], [756, 840]], (336, 419)), ([[0, 83], [84, 167], [168, 251], [252, 335], [336, 419], [504, 587], [588, 671], [672, 755], [756, 840]], (420, 503)), ([[0, 83], [84, 167], [168, 251], [252, 335], [336, 419], [420, 503], [588, 671], [672, 755], [756, 840]], (504, 587)), ([[0, 83], [84, 167], [168, 251], [252, 335], [336, 419], [420, 503], [504, 587], [672, 755], [756, 840]], (588, 671)), ([[0, 83], [84, 167], [168, 251], [252,
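
One way to picture "one machine learning task per assignment" is that each assignment overrides the named hyperparameters in a copy of the base configuration. The sketch below assumes each assignment is a tuple of values in the same order as the hyperparameter names. `expand_configs` is a hypothetical helper, not part of Cirrus, and the base configuration is abbreviated.

```python
import copy


def expand_configs(base_config, names, assignments):
    """Return one config per assignment, overriding `names` in a copy of the base."""
    configs = []
    for values in assignments:
        config = copy.deepcopy(base_config)  # leave the base config untouched
        config.update(zip(names, values))
        configs.append(config)
    return configs


# Abbreviated base configuration (illustrative subset of the keys used in this notebook).
base = {
    "n_workers": 16,
    "learning_rate": 0.0001,
    "train_set": [(0, 799)],
    "test_set": (800, 850),
}
names = ["train_set", "test_set"]
# Two illustrative assignments in the (train_set, test_set) shape printed above.
assignments = [
    ([[84, 167], [168, 251]], (0, 83)),
    ([[0, 83], [168, 251]], (84, 167)),
]

configs = expand_configs(base, names, assignments)
print(len(configs))            # 2
print(configs[1]["test_set"])  # (84, 167)
```

With the ten cross-validation assignments in this notebook, the same expansion would yield ten task configurations that differ only in `train_set` and `test_set`.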

## Run
---

Next, we run our hyperparameter optimization task.

In [12]:
search.run()

[get_available_concurrency |      MainThread] Getting account settings.
[get_available_concurrency |      MainThread] Done.
(0, 83)
[           start |      MainThread] Uploading configuration.
[           start |      MainThread] Starting parameter server.
[           start |      MainThread] Starting error task.
(84, 167)
[           start |      MainThread] Uploading configuration.
[           start |      MainThread] Starting parameter server.
[           start |      MainThread] Starting error task.
[wait_until_started |       Thread-26] Making connection attempt #1 to ParameterServer@52.38.42.122:1337.
(168, 251)
[wait_until_started |       Thread-27] Making connection attempt #1 to ParameterServer@52.24.74.165:1339.
[           start |      MainThread] Uploading configuration.
[           start |      MainThread] Starting parameter server.
[           start |      MainThread] Starting error task.
(252, 335)
[           start |      MainThread] Uploading configuration.
[         

[     run_command |       Thread-37] Waiting for completion.
[     run_command |       Thread-37] Fetching stdout and stderr.
[     run_command |       Thread-37] stdout had length 0.
[     run_command |       Thread-37] stderr had length 0.
[     run_command |       Thread-37] Exit code was 0.
[     run_command |       Thread-37] Done.
[     run_command |       Thread-36] Waiting for completion.
[     run_command |       Thread-36] Fetching stdout and stderr.
[     run_command |       Thread-36] stdout had length 0.
[     run_command |       Thread-36] stderr had length 0.
[     run_command |       Thread-36] Exit code was 0.
[     run_command |       Thread-36] Done.
[wait_until_started |       Thread-26] Making connection attempt #2 to ParameterServer@52.38.42.122:1337.
[wait_until_started |       Thread-27] Making connection attempt #2 to ParameterServer@52.24.74.165:1339.
[wait_until_started |       Thread-28] Making connection attempt #2 to ParameterServer@52.38.42.122:1341.
[wai

[   launch_worker | Exp #02 Wkr #2002] Launching Task 20020000.
[           new_f | Exp #06 Wkr #6010] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #06 Wkr #6010] Launching Task 60100000.
[           new_f | Exp #08 Wkr #8007] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #08 Wkr #8007] Launching Task 80070000.
[           new_f | Exp #06 Wkr #6011] jittery_exponential_backoff: Making attempt #1.
[           new_f | Exp #01 Wkr #1007] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #01 Wkr #1007] Launching Task 10070000.
[     make_lambda |       Thread-33] Allocating reserved concurrent executions to the Lambda.
[           new_f | Exp #06 Wkr #6012] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #06 Wkr #6012] Launching Task 60120000.
[           new_f | Exp #08 Wkr #8009] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #08 Wkr #8009] Launching Task 80090000.


[     make_lambda |       Thread-31] Done.
[           new_f | Exp #01 Wkr #1010] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #01 Wkr #1010] Launching Task 10100000.
[           new_f | Exp #07 Wkr #7001] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #07 Wkr #7001] Launching Task 70010000.
[           new_f | Exp #07 Wkr #7002] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #07 Wkr #7002] Launching Task 70020000.
[           new_f | Exp #07 Wkr #7004] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #07 Wkr #7004] Launching Task 70040000.
[           new_f | Exp #01 Wkr #1012] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #01 Wkr #1012] Launching Task 10120000.
[           new_f | Exp #07 Wkr #7007] jittery_exponential_backoff: Making attempt #1.
[           new_f | Exp #07 Wkr #7008] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #07 

[   launch_worker | Exp #09 Wkr #9000] Launching Task 90000000.
[           new_f | Exp #09 Wkr #9001] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #09 Wkr #9001] Launching Task 90010000.
[           new_f | Exp #03 Wkr #3002] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #03 Wkr #3002] Launching Task 30020000.
[           new_f | Exp #09 Wkr #9004] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #09 Wkr #9004] Launching Task 90040000.
[           new_f | Exp #09 Wkr #9006] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #09 Wkr #9006] Launching Task 90060000.
[           new_f | Exp #09 Wkr #9007] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #09 Wkr #9007] Launching Task 90070000.
[           new_f | Exp #03 Wkr #3006] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #03 Wkr #3006] Launching Task 30060000.
[           new_f | Exp #03 Wk

Run this cell to see the current accuracy of experiment `I`.

In [25]:
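# Index of the experiment to inspect.
I = 4

# Print the last 20 lines of the error task output for experiment I.
for line in search.cirrus_objs[I].ps.error_output().split("\n")[-20:]:
    print(line)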

[     run_command |      MainThread] Running `cat error_out_1345`.
[     run_command |      MainThread] Waiting for completion.
[     run_command |      MainThread] Fetching stdout and stderr.
[     run_command |      MainThread] stdout had length 19779.
[     run_command |      MainThread] stderr had length 0.
[     run_command |      MainThread] Exit code was 0.
[     run_command |      MainThread] Done.
received s3 obj elapsed: 3762879 size: 13512160 BW (MB/s): 2.86102
Getting object count: 84 s3 e2e bw (MB/s): 5.56968
Waiting for pref_sem
S3SparseIterator: getting object 337
[ERROR_TASK] Loss (Total/Avg): 2.72482e+06/0.656582 Accuracy: 0.746799 time(us): 1549011974012246 time from start (sec): 6.43485
[ERROR_TASK] getting the full model
[ERROR_TASK] received the model
[ERROR_TASK] computing loss.
received s3 obj elapsed: 3369420 size: 13509168 BW (MB/s): 3.8147
Getting object count: 85 s3 e2e bw (MB/s): 5.53969
Waiting for pref_sem
Received: 4 bytes
Received: 11
[ERROR_TASK] Loss (
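
The loss and accuracy are buried among S3 transfer messages, so it can help to extract them programmatically. The sketch below parses lines in the `[ERROR_TASK] Loss (Total/Avg): ... Accuracy: ...` format shown above. The regular expression is inferred from this one sample of output and `parse_error_lines` is a hypothetical helper, so treat the pattern as an assumption about the log format rather than a documented interface.

```python
import re

# Pattern inferred from the sample output above (an assumption, not a documented format).
ERROR_LINE = re.compile(
    r"\[ERROR_TASK\] Loss \(Total/Avg\): (?P<total>[^/\s]+)/(?P<avg>\S+) "
    r"Accuracy: (?P<accuracy>\S+) .*time from start \(sec\): (?P<elapsed>\S+)"
)


def parse_error_lines(text):
    """Return one dict per complete loss/accuracy line; other lines are skipped."""
    results = []
    for line in text.splitlines():
        match = ERROR_LINE.search(line)
        if match:
            results.append({
                "total_loss": float(match.group("total")),
                "avg_loss": float(match.group("avg")),
                "accuracy": float(match.group("accuracy")),
                "elapsed_sec": float(match.group("elapsed")),
            })
    return results


sample = """Waiting for pref_sem
S3SparseIterator: getting object 337
[ERROR_TASK] Loss (Total/Avg): 2.72482e+06/0.656582 Accuracy: 0.746799 time(us): 1549011974012246 time from start (sec): 6.43485
[ERROR_TASK] getting the full model
[ERROR_TASK] Loss ("""

metrics = parse_error_lines(sample)
print(metrics[-1]["accuracy"])  # 0.746799
```

In the notebook, the same function could be applied to the string returned by `search.cirrus_objs[I].ps.error_output()`. Truncated lines simply fail to match and are ignored.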

## Cleanup
---

When we're satisfied with the results, we kill all of the running tasks.

In [26]:
search.kill_all()

[   delete_lambda | Exp #00 Cleanup] Deleting Lambda function cirrus_worker_0_2019-2-1_1-2-59-254914.
[     run_command |      MainThread] Running `kill -n 9 $(cat error_1337.pid)`.
[     run_command |      MainThread] Waiting for completion.
[     run_command |      MainThread] Fetching stdout and stderr.
[     run_command |      MainThread] stdout had length 0.
[     run_command |      MainThread] stderr had length 0.
[     run_command |      MainThread] Exit code was 0.
[     run_command |      MainThread] Done.
[     run_command |      MainThread] Running `kill -n 9 $(cat ps_1337.pid)`.
[     run_command |      MainThread] Waiting for completion.
[     run_command |      MainThread] Fetching stdout and stderr.
[     run_command |      MainThread] stdout had length 0.
[     run_command |      MainThread] stderr had length 0.
[     run_command |      MainThread] Exit code was 0.
[     run_command |      MainThread] Done.
[   launch_worker | Exp #00 Wkr #08] Task 80000 completed with 

RuntimeError: `kill -n 9 $(cat error_1341.pid)` returned nonzero exit code 1. The stderr follows.
bash: line 0: kill: (2690) - No such process


[   launch_worker | Exp #06 Wkr #6000] Task 60000000 completed with status code 200.
[           new_f | Exp #06 Wkr #6000] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #06 Wkr #6000] Launching Task 60000001.
[   launch_worker | Exp #06 Wkr #6002] Task 60020000 completed with status code 200.
[           new_f | Exp #06 Wkr #6002] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #06 Wkr #6002] Launching Task 60020001.
[   launch_worker | Exp #06 Wkr #6002] Task 60020001 completed with status code 200.
[   launch_worker | Exp #02 Wkr #2002] Task 20020000 completed with status code 200.
[   launch_worker | Exp #02 Wkr #2001] Task 20010000 completed with status code 200.
[   launch_worker | Exp #06 Wkr #6010] Task 60100000 completed with status code 200.
[           new_f | Exp #06 Wkr #6010] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #06 Wkr #6005] Task 60050000 completed with status code 200.
[           new_

[   launch_worker | Exp #02 Wkr #2013] Task 20130000 completed with status code 200.
[   launch_worker | Exp #02 Wkr #2006] Task 20060000 completed with status code 200.
[   launch_worker | Exp #02 Wkr #2009] Task 20090000 completed with status code 200.
[   launch_worker | Exp #07 Wkr #7001] Task 70010000 completed with status code 200.
[           new_f | Exp #07 Wkr #7001] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #07 Wkr #7001] Launching Task 70010001.
[   launch_worker | Exp #08 Wkr #8006] Task 80060000 completed with status code 200.
[           new_f | Exp #08 Wkr #8006] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #08 Wkr #8006] Launching Task 80060001.
[   launch_worker | Exp #05 Wkr #5007] Task 50070000 completed with status code 200.
[           new_f | Exp #05 Wkr #5007] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #05 Wkr #5007] Launching Task 50070001.
[   launch_worker | Exp #08 Wkr #8014

[           new_f | Exp #09 Wkr #9004] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #09 Wkr #9004] Launching Task 90040001.
[   launch_worker | Exp #09 Wkr #9011] Task 90110000 completed with status code 200.
[           new_f | Exp #09 Wkr #9011] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #09 Wkr #9011] Launching Task 90110001.
[   launch_worker | Exp #09 Wkr #9014] Task 90140000 completed with status code 200.
[           new_f | Exp #09 Wkr #9014] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #09 Wkr #9014] Launching Task 90140001.
[   launch_worker | Exp #07 Wkr #7003] Task 70030000 completed with status code 200.
[           new_f | Exp #07 Wkr #7003] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #07 Wkr #7003] Launching Task 70030001.
[   launch_worker | Exp #07 Wkr #7011] Task 70110000 completed with status code 200.
[           new_f | Exp #07 Wkr #7011] jittery_exponenti

[   launch_worker | Exp #05 Wkr #5008] Launching Task 50080001.
[   launch_worker | Exp #05 Wkr #5004] Task 50040000 completed with status code 200.
[   launch_worker | Exp #05 Wkr #5006] Launching Task 50060001.
[           new_f | Exp #03 Wkr #3004] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #03 Wkr #3004] Launching Task 30040001.
[   launch_worker | Exp #03 Wkr #3010] Task 30100000 completed with status code 200.
[           new_f | Exp #04 Wkr #4007] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #04 Wkr #4007] Launching Task 40070001.
[           new_f | Exp #05 Wkr #5004] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #05 Wkr #5004] Launching Task 50040001.
[           new_f | Exp #03 Wkr #3010] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #03 Wkr #3010] Launching Task 30100001.
[   launch_worker | Exp #09 Wkr #9008] Launching Task 90080001.
[   launch_worker | Exp #09 Wkr #9

We also need to terminate our instances to avoid ongoing charges.

In [None]:
for inst in instances:
    inst.cleanup()
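
The `RuntimeError` that appears in the output of `search.kill_all()` above suggests that the kill step can fail, in that case because a process had already exited. One hedged way to make sure the instances are still terminated when that happens is to catch and collect errors at each step. This is a sketch, not Cirrus functionality: `shutdown` is a hypothetical helper that assumes only the `kill_all()` and `cleanup()` methods used in this notebook, and the stub classes exist solely so the example runs without AWS.

```python
def shutdown(search, instances):
    """Kill all tasks, then clean up every instance even if an earlier step fails."""
    errors = []
    try:
        search.kill_all()
    except Exception as exc:  # e.g. the process was already gone
        errors.append(exc)
    for inst in instances:
        try:
            inst.cleanup()
        except Exception as exc:
            errors.append(exc)
    return errors


# Stubs standing in for the real objects (hypothetical, for demonstration only).
class StubSearch:
    def kill_all(self):
        raise RuntimeError("kill: No such process")


class StubInstance:
    def __init__(self):
        self.cleaned = False

    def cleanup(self):
        self.cleaned = True


stub_instances = [StubInstance(), StubInstance()]
errors = shutdown(StubSearch(), stub_instances)
print(len(errors), all(inst.cleaned for inst in stub_instances))  # 1 True
```

Returning the collected errors instead of re-raising keeps the cleanup going, while still letting the caller inspect what went wrong.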

If a cell raises an error, running the following cell should clean up any resources that were created. Afterward, the kernel will be unusable and must be restarted.

In [None]:
atexit._run_exitfuncs()

[         cleanup |      MainThread] Closing SSH client.
[         cleanup |      MainThread] Terminating instance.
[         cleanup |      MainThread] Waiting for instance to terminate.
[   launch_worker | Exp #01 Wkr #1008] Task 10080000 completed with status code 200.
[           new_f | Exp #01 Wkr #1008] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #01 Wkr #1008] Launching Task 10080001.
[   launch_worker | Exp #03 Wkr #3015] Task 30150000 completed with status code 200.
[           new_f | Exp #03 Wkr #3015] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #03 Wkr #3015] Launching Task 30150001.
[   launch_worker | Exp #03 Wkr #3008] Task 30080000 completed with status code 200.
[           new_f | Exp #03 Wkr #3008] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #03 Wkr #3008] Launching Task 30080001.
[   launch_worker | Exp #03 Wkr #3007] Task 30070000 completed with status code 200.
[           new_f |