# Softmax Regression
---
This notebook uses Cirrus to run softmax regression on the MNIST dataset.

## Setup
---

In [1]:
# To ease development, reload all modules each time a cell is run.
%load_ext autoreload
%autoreload 2

In [2]:
import logging
import sys
import atexit

from cirrus import instance, automate, sm

In [3]:
# Cirrus produces logs, but they will not show unless we add a handler that prints.
handler = logging.StreamHandler(sys.stdout)
formatter = logging.Formatter("[%(funcName)16s | %(threadName)15s] %(message)s")
handler.setFormatter(formatter)
logger = logging.getLogger("cirrus")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
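
To see what the format string above produces without starting any Cirrus component, the sketch below attaches the same formatter to a throwaway logger that writes to an in-memory buffer. The logger name `cirrus_format_demo` and the `start` function are illustrative only and are not part of Cirrus.

```python
import io
import logging

# Same format string as the handler above: function name right-aligned in
# 16 characters, thread name right-aligned in 15, then the message.
formatter = logging.Formatter("[%(funcName)16s | %(threadName)15s] %(message)s")

# Write to an in-memory buffer so the formatted line can be inspected.
buffer = io.StringIO()
demo_handler = logging.StreamHandler(buffer)
demo_handler.setFormatter(formatter)

# "cirrus_format_demo" is a throwaway logger name, not part of Cirrus.
demo_logger = logging.getLogger("cirrus_format_demo")
demo_logger.setLevel(logging.DEBUG)
demo_logger.addHandler(demo_handler)
demo_logger.propagate = False  # keep the demo line out of other handlers


def start():
    # The name of the calling function fills the %(funcName)s field.
    demo_logger.debug("start: Done.")


start()
formatted_line = buffer.getvalue()
print(formatted_line, end="")
```

The padded function-name and thread-name columns are what keep the log output in the rest of this notebook aligned.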

## Instance, server, and task
---

First, we start an EC2 instance.

In [4]:
inst = instance.Instance(
    name="sm_example_instance",
    disk_size=32,
    typ="m5a.2xlarge",
    username="ubuntu",
    ami_owner_name=("self", "cirrus_server_image")
)
inst.start()

[        __init__ |      MainThread] __init__: Resolving AMI owner/name to AMI ID.
[    ec2_resource |      MainThread] ClientManager: Initializing EC2 resource.
[          config |      MainThread] config: Configuration loaded.
[        __init__ |      MainThread] __init__: Done.
[         _exists |      MainThread] _exists: Listing instances.
[         _exists |      MainThread] _exists: No existing instance with the same name was found.
[ _start_and_wait |      MainThread] _start_and_wait: Starting a new instance.
[ _start_and_wait |      MainThread] _start_and_wait: Waiting for instance to enter running state.
[ _start_and_wait |      MainThread] _start_and_wait: Fetching instance metadata.
[ _start_and_wait |      MainThread] _start_and_wait: Done.
[           start |      MainThread] start: Done.


Second, we create a parameter server to run on our instance.

In [5]:
server = automate.ParameterServer(
    instance=inst,
    ps_port=1337,
    error_port=1338,
    num_workers=64
)

Third, we define our machine learning task.

In [6]:
task = sm.Softmax(
    n_workers=16,
    n_ps=1,
    dataset="cirrus-mnist",
    learning_rate=0.00001,
    epsilon=0.0001,
    progress_callback=None,
    train_set=(0, 20),
    test_set=(21, 42),
    minibatch_size=20,
    model_bits=19,
    ps=server,
    opt_method="sgd",
    timeout=60,
    lambda_size=192
)

## Run
---

Next, we run our machine learning task.

In [7]:
task.run()

[           start |      MainThread] start: Uploading configuration.
[     run_command |      MainThread] run_command: Calling _connect_ssh.
[    _connect_ssh |      MainThread] _connect_ssh: Configuring.
[    _connect_ssh |      MainThread] _connect_ssh: Making connection attempt #1 out of 20.
[     run_command |      MainThread] run_command: Running `echo 'load_input_path: /mnt/efs/mnist/train_mnist1.csv 
load_input_type: csv
num_classes: 10 
num_features: 784 
limit_cols: 1000 
normalize: 1 
limit_samples: 50000000 
s3_size: 1000 
use_bias: 1 
model_type: Softmax 
minibatch_size: 20 
learning_rate: 0.000010 
epsilon: 0.000100 
model_bits: 19 
dataset_format: binary 
s3_bucket: cirrus-mnist 
use_grad_threshold: 0 
grad_threshold: 0.001000 
train_set: 0-20 
test_set: 21-42' > config_1337.txt`.
[     run_command |      MainThread] run_command: Waiting for completion.
[     run_command |      MainThread] run_command: Fetching stdout and stderr.
[     run_command |      MainThread] run_c

[   launch_worker | Exp #00 Wkr #04] launch_worker: Task 40000 completed with status code 200.
[   launch_worker | Exp #00 Wkr #08] launch_worker: Task 80000 completed with status code 200.
[   launch_worker | Exp #00 Wkr #05] launch_worker: Task 50000 completed with status code 200.
[   launch_worker | Exp #00 Wkr #14] launch_worker: Task 140000 completed with status code 200.
[   launch_worker | Exp #00 Wkr #07] launch_worker: Task 70000 completed with status code 200.
[   launch_worker | Exp #00 Wkr #15] launch_worker: Task 150000 completed with status code 200.
[   launch_worker | Exp #00 Wkr #01] launch_worker: Task 10000 completed with status code 200.
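
The configuration echoed in the log above is a plain list of `key: value` lines. As a minimal sketch, assuming that format holds for every line, the hypothetical helper `parse_config` below turns such text into a dictionary so that individual settings can be checked. It is not part of Cirrus, and the sample contains only a few of the lines from the log.

```python
def parse_config(text):
    """Parse `key: value` lines into a dict of stripped strings."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config


# A few lines copied from the configuration shown in the log.
sample = "\n".join([
    "model_type: Softmax ",
    "minibatch_size: 20 ",
    "learning_rate: 0.000010 ",
    "s3_bucket: cirrus-mnist ",
    "train_set: 0-20 ",
    "test_set: 21-42",
])

config = parse_config(sample)
print(config["model_type"], config["train_set"])
```

All values are kept as strings; converting them (for example `float(config["learning_rate"])`) is left to the caller.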


Run this cell to print the last ten lines of the error task's output, which report the model's current loss and accuracy.

In [13]:
for line in server.error_output().split("\n")[-10:]:
    print(line)

[     run_command |      MainThread] run_command: Running `cat error_out_1337`.
[     run_command |      MainThread] run_command: Waiting for completion.
[     run_command |      MainThread] run_command: Fetching stdout and stderr.
[     run_command |      MainThread] run_command: stdout had length 48002.
[     run_command |      MainThread] run_command: stderr had length 0.
[     run_command |      MainThread] run_command: Exit code was 0.
[     run_command |      MainThread] run_command: Done.
inside if
[ERROR_TASK] received the model
[ERROR_TASK] computing loss.
[ERROR_TASK] Loss (Total/Avg): 5558.59/0.264695 Accuracy: 0.125476 time(us): 1545181194495157 time from start (sec): 45.5703
[ERROR_TASK] getting the full model
inside if
[ERROR_TASK] received the model
[ERROR_TASK] computing loss.
[ERROR_TASK] Loss (Total/Avg): 5558.59/0.264695 Accuracy: 0.125476 time(us): 1545181194722030 time from start (sec): 45.7972

[           new_f | Exp #00 Wkr #00] jittery_exponential_backoff: Maki
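
Reading the accuracy out of the raw output by eye gets tedious over a long run. The sketch below extracts the most recent loss and accuracy report from text shaped like the output above. The regular expression is inferred from those sample lines only, so the exact format is an assumption, and `latest_metrics` is a hypothetical helper rather than a Cirrus function.

```python
import re

# Pattern inferred from the sample output above; the exact format is an
# assumption and may differ between Cirrus versions.
LOSS_LINE = re.compile(
    r"Loss \(Total/Avg\): (?P<total>[\d.]+)/(?P<avg>[\d.]+) "
    r"Accuracy: (?P<accuracy>[\d.]+)"
    r".*time from start \(sec\): (?P<elapsed>[\d.]+)"
)


def latest_metrics(output):
    """Return the most recent loss/accuracy report in `output`, or None."""
    for line in reversed(output.splitlines()):
        match = LOSS_LINE.search(line)
        if match:
            return {name: float(value)
                    for name, value in match.groupdict().items()}
    return None


# Lines copied from the error task output shown above.
sample_output = "\n".join([
    "[ERROR_TASK] computing loss.",
    "[ERROR_TASK] Loss (Total/Avg): 5558.59/0.264695 Accuracy: 0.125476 "
    "time(us): 1545181194495157 time from start (sec): 45.5703",
    "[ERROR_TASK] getting the full model",
    "inside if",
    "[ERROR_TASK] received the model",
    "[ERROR_TASK] computing loss.",
    "[ERROR_TASK] Loss (Total/Avg): 5558.59/0.264695 Accuracy: 0.125476 "
    "time(us): 1545181194722030 time from start (sec): 45.7972",
])

metrics = latest_metrics(sample_output)
print(metrics["accuracy"], metrics["elapsed"])
```

In this notebook the helper would be called as `latest_metrics(server.error_output())`. Values that are not plain decimals, such as `nan`, would not match the pattern and would need separate handling.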

## Cleanup
---

When we're satisfied with the results, we kill our task.

In [14]:
task.kill()

[   launch_worker | Exp #00 Wkr #12] launch_worker: Task 120006 completed with status code 200.
[   launch_worker | Exp #00 Wkr #00] launch_worker: Task 6 completed with status code 200.
[   launch_worker | Exp #00 Wkr #09] launch_worker: Task 90006 completed with status code 200.
[   launch_worker | Exp #00 Wkr #14] launch_worker: Task 140006 completed with status code 200.
[   launch_worker | Exp #00 Wkr #10] launch_worker: Task 100006 completed with status code 200.
[   launch_worker | Exp #00 Wkr #01] launch_worker: Task 10006 completed with status code 200.
[   launch_worker | Exp #00 Wkr #02] launch_worker: Task 20006 completed with status code 200.
[   launch_worker | Exp #00 Wkr #03] launch_worker: Task 30006 completed with status code 200.
[   launch_worker | Exp #00 Wkr #11] launch_worker: Task 110006 completed with status code 200.
[   launch_worker | Exp #00 Wkr #13] launch_worker: Task 130006 completed with status code 200.
[   launch_worker | Exp #00 Wkr #15] launch_worke

RuntimeError: `kill -n 9 $(cat error_1337.pid)` returned nonzero exit code 2.

We also need to terminate our instance to avoid ongoing charges.

In [15]:
inst.cleanup()

[         cleanup |      MainThread] cleanup: Closing SSH client.
[         cleanup |      MainThread] cleanup: Terminating instance.
[         cleanup |      MainThread] cleanup: Waiting for instance to terminate.
[         cleanup |      MainThread] cleanup: Done.
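
In the run above, `task.kill()` raised a `RuntimeError`. If both calls shared a cell, that exception would stop execution before `inst.cleanup()` and leave the instance running. The sketch below shows one way to guard against that: a hypothetical helper `shut_down` that always attempts the cleanup. `FakeTask` and `FakeInstance` are stand-ins so the example runs without AWS access. They mimic only the `kill()` and `cleanup()` calls used in this notebook.

```python
def shut_down(task, inst):
    """Kill `task`, then clean up `inst` even if the kill fails.

    Returns the RuntimeError raised by `task.kill()`, or None.
    """
    kill_error = None
    try:
        task.kill()
    except RuntimeError as error:
        # Recorded rather than re-raised so the caller can inspect it.
        kill_error = error
    finally:
        # Runs whether or not the kill succeeded.
        inst.cleanup()
    return kill_error


# Hypothetical stand-ins so the sketch runs without AWS access.
class FakeTask:
    def kill(self):
        raise RuntimeError("kill returned nonzero exit code 2.")


class FakeInstance:
    def __init__(self):
        self.cleaned_up = False

    def cleanup(self):
        self.cleaned_up = True


fake_inst = FakeInstance()
error = shut_down(FakeTask(), fake_inst)
print(fake_inst.cleaned_up, error)
```

With the real objects, the call would be `shut_down(task, inst)`. Returning the error instead of swallowing it silently keeps the failed kill visible.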
