# Cross Validation
---
This notebook uses Cirrus to perform cross-validation on a logistic regression model.

## Setup
---

In [1]:
# To ease development, each time a cell is run, all modules will be reloaded.
%load_ext autoreload
%autoreload 2

In [2]:
import logging
import sys
import atexit

In [3]:
# Cirrus produces logs, but they will not show unless we add a handler that prints.
from cirrus import utilities
utilities.set_logging_handler()
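
The log lines in this notebook follow a `[function | thread] message` layout. Cirrus's handler is not shown here, so the following is only a sketch of a comparable handler built with the standard `logging` module. The format string and the helper names `make_handler` and `demo` are assumptions inferred from the printed output, not Cirrus's actual code.

```python
import io
import logging
import sys


def make_handler(stream=sys.stdout):
    """Build a handler that prints records as `[function | thread] message`.

    The column widths (16 and 15) are guesses based on the log output
    shown in this notebook.
    """
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("[%(funcName)16s | %(threadName)15s] %(message)s")
    )
    return handler


def demo():
    """Log one message into an in-memory buffer and return the text."""
    buffer = io.StringIO()
    logger = logging.getLogger("handler_sketch")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.addHandler(make_handler(buffer))
    logger.info("Starting a new instance.")
    return buffer.getvalue()


print(demo(), end="")
```

Without a handler like this, Python's logging only emits warnings and above through its last-resort handler, which is why informational messages stay hidden until one is attached.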

In [4]:
from cirrus import instance, automate, lr, GridSearch, cv_utils

[     _initialize | ResourceManager] Initializing no-retries Lambda client.
[     _initialize | ResourceManager] Initializing IAM resource.
[     _initialize | ResourceManager] Initializing EC2 resource.
[     _initialize | ResourceManager] Initializing Cloudwatch Logs client.
[     _initialize | ResourceManager] Initializing S3 resource.
[     _initialize | ResourceManager] Initializing STS client.


## Instances, base task configuration, hyperparameters
---

First, we start some EC2 instances.

In [5]:
NUM_INSTANCES = 2

instances = []
for i in range(NUM_INSTANCES):
    inst = instance.Instance(
        name="hyperparameter_example_instance_%d" % i,
        disk_size=32,
        typ="m4.2xlarge",
        username="ubuntu",
        ami_owner_name=("self", "cirrus_server_image")
    )
    inst.start()
    instances.append(inst)

[        __init__ |      MainThread] Resolving AMI owner/name to AMI ID.
[        __init__ |      MainThread] Done.
[         _exists |      MainThread] Listing instances.
[         _exists |      MainThread] No existing instance with the same name was found.
[ _start_and_wait |      MainThread] Starting a new instance.
[ _start_and_wait |      MainThread] Waiting for instance to enter running state.
[ _start_and_wait |      MainThread] Fetching instance metadata.
[ _start_and_wait |      MainThread] Done.
[           start |      MainThread] Done.
[        __init__ |      MainThread] Resolving AMI owner/name to AMI ID.
[        __init__ |      MainThread] Done.
[         _exists |      MainThread] Listing instances.
[         _exists |      MainThread] No existing instance with the same name was found.
[ _start_and_wait |      MainThread] Starting a new instance.
[ _start_and_wait |      MainThread] Waiting for instance to enter running state.
[ _start_and_wait |      MainThread] Fetc

Second, we define the base configuration for our machine learning task.

In [6]:
base_task_config = {
    "n_workers": 16,
    "n_ps": 1,
    "dataset": "criteo-kaggle-19b",
    "learning_rate": 0.0001,
    "epsilon": 0.0001,
    "progress_callback": None,
    "train_set": [(0, 799)],
    "test_set": (800, 850),
    "minibatch_size": 200,
    "model_bits": 19,
    "opt_method": "adagrad",
    "timeout": 60,
    "lambda_size": 192
}

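Third, we identify our hyperparameters and their possible values. For cross-validation, the "hyperparameters" are the training and test sets themselves: `create_test_train_sets` splits the 200 sets into 4 folds, and each fold serves once as the test set while the other three form the training set.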

In [7]:
hyperparameter_names = [
    "train_set",
    "test_set"
]
hyperparameter_values = cv_utils.create_test_train_sets(k=4, num_sets=200)
print(hyperparameter_values)

[[[[50, 99], [100, 149], [150, 199]], [[0, 49], [100, 149], [150, 199]], [[0, 49], [50, 99], [150, 199]], [[0, 49], [50, 99], [100, 149]]], [(0, 49), (50, 99), (100, 149), (150, 199)]]
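
The printed value has two parts: a list of training splits and a list of held-out test ranges, one per fold. As an illustration only, the sketch below reproduces that structure for contiguous, equally sized folds. The function `create_folds` is a hypothetical stand-in, not the implementation of `cv_utils.create_test_train_sets`, and it assumes `num_sets` is divisible by `k`.

```python
def create_folds(k, num_sets):
    """Split set indices 0..num_sets-1 into k contiguous folds.

    Returns [train_sets, test_sets], where test_sets[i] is the inclusive
    (first, last) range held out for fold i and train_sets[i] lists the
    remaining ranges.
    """
    if num_sets % k != 0:
        raise ValueError("this sketch assumes num_sets is divisible by k")
    fold_size = num_sets // k
    # Inclusive (first, last) index range of each fold.
    ranges = [(i * fold_size, (i + 1) * fold_size - 1) for i in range(k)]
    # For fold i, train on every range except the i-th.
    train_sets = [
        [list(r) for j, r in enumerate(ranges) if j != i]
        for i in range(k)
    ]
    return [train_sets, ranges]


folds = create_folds(k=4, num_sets=200)
print(folds[1])  # [(0, 49), (50, 99), (100, 149), (150, 199)]
```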


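All of the above defines a hyperparameter optimization task, which consists of one machine learning task per assignment of values to the hyperparameters. With `cross_validation=True`, that means one task per fold: the list printed below the next cell shows each training split paired with its test set.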

In [8]:
search = GridSearch(
    task=lr.LogisticRegression,
    param_base=base_task_config,
    hyper_vars=hyperparameter_names,
    hyper_params=hyperparameter_values,
    instances=instances,
    cross_validation=True
)

[([[50, 99], [100, 149], [150, 199]], (0, 49)), ([[0, 49], [100, 149], [150, 199]], (50, 99)), ([[0, 49], [50, 99], [150, 199]], (100, 149)), ([[0, 49], [50, 99], [100, 149]], (150, 199))]
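
The list printed above pairs each training split with its matching test range, which suggests that with `cross_validation=True` the values are combined position by position instead of as a full Cartesian product. The sketch below illustrates that difference under this assumption. The helper `build_assignments` is hypothetical and not part of Cirrus.

```python
import itertools

train_sets = [
    [[50, 99], [100, 149], [150, 199]],
    [[0, 49], [100, 149], [150, 199]],
    [[0, 49], [50, 99], [150, 199]],
    [[0, 49], [50, 99], [100, 149]],
]
test_sets = [(0, 49), (50, 99), (100, 149), (150, 199)]


def build_assignments(values, cross_validation):
    """Return one tuple of hyperparameter values per task.

    With cross_validation=True the i-th value of every hyperparameter is
    paired together (zip); otherwise every combination is produced
    (Cartesian product), as in an ordinary grid search.
    """
    if cross_validation:
        return list(zip(*values))
    return list(itertools.product(*values))


cv_assignments = build_assignments([train_sets, test_sets], True)
grid_assignments = build_assignments([train_sets, test_sets], False)
print(len(cv_assignments), len(grid_assignments))  # 4 16
```

A plain product would also train on splits that include their own test range, so pairing by position is what keeps each fold's test data held out.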


## Run
---

Next, we run our hyperparameter optimization task.

In [9]:
search.run()

[get_available_concurrency |      MainThread] Getting account settings.
[get_available_concurrency |      MainThread] Done.
(0, 49)
[           start |      MainThread] Uploading configuration.
[           start |      MainThread] Starting parameter server.
[           start |      MainThread] Starting error task.
(50, 99)
[           start |      MainThread] Uploading configuration.
[           start |      MainThread] Starting parameter server.
[           start |      MainThread] Starting error task.
[wait_until_started |        Thread-8] Making connection attempt #1 to ParameterServer@34.212.25.110:1337.
(100, 149)
[           start |      MainThread] Uploading configuration.
[           start |      MainThread] Starting parameter server.
[           start |      MainThread] Starting error task.
[wait_until_started |        Thread-9] Making connection attempt #1 to ParameterServer@35.165.68.50:1339.
(150, 199)
[           start |      MainThread] Uploading configuration.
[         

[   launch_worker | Exp #00 Wkr #04] Launching Task 40000.
[           new_f | Exp #00 Wkr #05] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #00 Wkr #05] Launching Task 50000.
[           new_f | Exp #00 Wkr #06] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #00 Wkr #06] Launching Task 60000.
[           new_f | Exp #00 Wkr #08] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #00 Wkr #08] Launching Task 80000.
[           new_f | Exp #00 Wkr #10] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #00 Wkr #10] Launching Task 100000.
[           new_f | Exp #00 Wkr #12] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #00 Wkr #12] Launching Task 120000.
[     make_lambda |       Thread-10] Allocating reserved concurrent executions to the Lambda.
[           new_f | Exp #00 Wkr #14] jittery_exponential_backoff: Making attempt #1.
[   launch_worker | Exp #00 Wkr #14] Lau

Run this cell to see the current cross-validation accuracy across all experiments.

In [11]:
print(cv_utils.get_cv_acc(search.cirrus_objs))

[     run_command |      MainThread] Running `cat error_out_1337`.
[     run_command |      MainThread] Waiting for completion.
[     run_command |      MainThread] Fetching stdout and stderr.
[     run_command |      MainThread] stdout had length 13983.
[     run_command |      MainThread] stderr had length 0.
[     run_command |      MainThread] Exit code was 0.
[     run_command |      MainThread] Done.
[     run_command |      MainThread] Running `cat error_out_1339`.
[     run_command |      MainThread] Waiting for completion.
[     run_command |      MainThread] Fetching stdout and stderr.
[     run_command |      MainThread] stdout had length 13127.
[     run_command |      MainThread] stderr had length 0.
[     run_command |      MainThread] Exit code was 0.
[     run_command |      MainThread] Done.
[     run_command |      MainThread] Running `cat error_out_1341`.
[     run_command |      MainThread] Waiting for completion.
[     run_command |      MainThread] Fetching stdout
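
A cross-validation accuracy is conventionally the mean of the per-fold test accuracies. The sketch below shows only that averaging step. Whether `cv_utils.get_cv_acc` computes exactly this is an assumption, the function name `cross_validation_accuracy` is hypothetical, and the per-fold numbers are made up for illustration, not results from this run.

```python
from statistics import mean


def cross_validation_accuracy(fold_accuracies):
    """Average the test accuracy reported by each fold."""
    if not fold_accuracies:
        raise ValueError("need at least one fold accuracy")
    return mean(fold_accuracies)


# Made-up accuracies for four folds, for illustration only.
example_accuracies = [0.5, 0.75, 0.75, 1.0]
print(cross_validation_accuracy(example_accuracies))  # 0.75
```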

## Cleanup
---

When we're satisfied with the results, we kill our task.

In [12]:
search.kill_all()

[   delete_lambda | Exp #00 Cleanup] Deleting Lambda function cirrus_worker_0_2019-3-8_15-9-52-891777.
[     run_command |      MainThread] Running `kill -n 9 $(cat error_1337.pid)`.
[     run_command |      MainThread] Waiting for completion.
[     run_command |      MainThread] Fetching stdout and stderr.
[     run_command |      MainThread] stdout had length 0.
[     run_command |      MainThread] stderr had length 0.
[     run_command |      MainThread] Exit code was 0.
[     run_command |      MainThread] Done.
[     run_command |      MainThread] Running `kill -n 9 $(cat ps_1337.pid)`.
[     run_command |      MainThread] Waiting for completion.
[     run_command |      MainThread] Fetching stdout and stderr.
[     run_command |      MainThread] stdout had length 0.
[     run_command |      MainThread] stderr had length 0.
[     run_command |      MainThread] Exit code was 0.
[     run_command |      MainThread] Done.
[   launch_worker | Exp #00 Wkr #05] Task 50000 completed with

[   launch_worker | Exp #03 Wkr #3015] Task 30150000 completed with status code 200.
[   launch_worker | Exp #03 Wkr #3011] Task 30110000 completed with status code 200.
[   launch_worker | Exp #03 Wkr #3014] Task 30140000 completed with status code 200.
[   launch_worker | Exp #03 Wkr #3013] Task 30130000 completed with status code 200.
[   launch_worker | Exp #03 Wkr #3005] Task 30050000 completed with status code 200.
[   launch_worker | Exp #03 Wkr #3006] Task 30060000 completed with status code 200.
[   launch_worker | Exp #03 Wkr #3007] Task 30070000 completed with status code 200.
[   launch_worker | Exp #03 Wkr #3001] Task 30010000 completed with status code 200.
[   launch_worker | Exp #03 Wkr #3010] Task 30100000 completed with status code 200.
[   launch_worker | Exp #03 Wkr #3008] Task 30080000 completed with status code 200.


We also need to terminate our instances to avoid further charges.

In [None]:
for inst in instances:
    inst.cleanup()

[         cleanup |      MainThread] Closing SSH client.
[         cleanup |      MainThread] Terminating instance.
[         cleanup |      MainThread] Waiting for instance to terminate.
[         cleanup |      MainThread] Done.
[         cleanup |      MainThread] Closing SSH client.
[         cleanup |      MainThread] Terminating instance.
[         cleanup |      MainThread] Waiting for instance to terminate.
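
If one `cleanup()` call raises, a plain loop stops there and the remaining instances keep running. The sketch below shows one way to attempt every cleanup and report the failures afterward. `FakeInstance` is a hypothetical stand-in that only mimics `cleanup()` so the example runs without AWS, and `cleanup_all` is likewise an illustrative helper, not a Cirrus function.

```python
class FakeInstance:
    """Stand-in for an instance object; only mimics cleanup()."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.terminated = False

    def cleanup(self):
        if self.fail:
            raise RuntimeError("could not terminate %s" % self.name)
        self.terminated = True


def cleanup_all(instances):
    """Call cleanup() on every instance; return (instance, error) pairs
    for the ones that failed so they can be retried or handled manually."""
    failed = []
    for inst in instances:
        try:
            inst.cleanup()
        except Exception as exc:
            failed.append((inst, exc))
    return failed


demo_instances = [
    FakeInstance("a"),
    FakeInstance("b", fail=True),
    FakeInstance("c"),
]
for inst, exc in cleanup_all(demo_instances):
    print("cleanup failed for %s: %s" % (inst.name, exc))
```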


If a cell raises an error, running the cell below should clean up any resources that were created. Afterward, the kernel will be unusable and must be restarted.

In [None]:
atexit._run_exitfuncs()