## Overview of the AWS package

This package is a set of tools for managing AWS in data science projects. In particular, it separates the data and environment from the AWS server. This lets you set up data and programs on a free-tier instance before moving them to a more expensive GPU instance for processing. It also lets you use spot instances with persistent data, which can save as much as 80% of the cost of GPUs.

## Notes
    
#### How are data and programs retained?
* boot volume runs nvidia-docker
* pdrive volume is created from the most recent "cats" snapshot (or as an empty volume if no snapshot exists)
* pdrive is mounted as /v1
* pdrive holds the docker database and program data
* on termination the pdrive "cats" volume is saved to a "cats" snapshot
* if AWS initiates the termination (for example, when a spot instance is reclaimed) the "cats" volume must be saved manually
* all snapshots are retained until manually deleted
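
The "most recent snapshot, or an empty volume" rule above can be sketched with boto3. This is a minimal illustration, not the package's own code: it assumes snapshots are identified by a `Name` tag, and `latest_snapshot` and `create_pdrive_volume` are hypothetical helper names. `client` is expected to be a boto3 EC2 client such as `boto3.client("ec2")`.

```python
def latest_snapshot(snapshots):
    """Return the most recently started snapshot dict, or None if there are none."""
    return max(snapshots, key=lambda s: s["StartTime"], default=None)


def create_pdrive_volume(client, name, zone, size_gb=10):
    """Create a volume from the newest snapshot tagged Name=<name>, else an empty one.

    Assumption: snapshots carry a Name tag; the real package may track them differently.
    """
    response = client.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "tag:Name", "Values": [name]}],
    )
    params = {"AvailabilityZone": zone}
    snapshot = latest_snapshot(response["Snapshots"])
    if snapshot is None:
        params["Size"] = size_gb  # no snapshot yet: start with an empty volume
    else:
        params["SnapshotId"] = snapshot["SnapshotId"]  # restore the saved data
    return client.create_volume(**params)
```

With many snapshots, `describe_snapshots` may return results in pages, so a fuller version would probably use a paginator.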

#### Why snapshots?
* cheaper storage than a volume
* can be specified when the instance is created (an existing volume cannot)
* can be restored in any availability zone (a volume lives in one zone, and the instance would need to be in the same zone)
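
To illustrate the second point, a snapshot can be named in the block device mapping passed at launch, so the restored volume exists as soon as the instance starts. The sketch below only builds that mapping; `snapshot_mapping` is a hypothetical helper and the device name `/dev/xvdf` is an assumption, not something taken from this package.

```python
def snapshot_mapping(snapshot_id, device="/dev/xvdf", keep_on_terminate=True):
    """Build one BlockDeviceMappings entry that restores a snapshot at launch.

    The device name is an assumed default; pick one valid for your AMI.
    """
    return {
        "DeviceName": device,
        "Ebs": {
            "SnapshotId": snapshot_id,
            # keep the volume after termination so it can be saved afterwards
            "DeleteOnTermination": not keep_on_terminate,
        },
    }

# Possible use with a boto3 EC2 client (image_id, itype and snap_id are placeholders):
# client.run_instances(ImageId=image_id, InstanceType=itype, MinCount=1, MaxCount=1,
#                      BlockDeviceMappings=[snapshot_mapping(snap_id)])
```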

## Setup

In [1]:
from analysis.ipstartup import *
import aws
import server
import apps
from pdrive import Pdrive
import fabric.api as fab
from config import user

fab.env.host_string = aws.get("sm1").public_ip_address
fab.env.user = user
fab.output['everything'] = True

In [2]:
# one-off setup
1/0  # deliberate error so this cell is not run again by accident
aws.create_key()
aws.create_securityGroup()
aws.client.allocate_address()

[root:INFO]:starting (cellevents.py\31, time=09:06)


ZeroDivisionError: division by zero

time: 162 ms


## Creating and working with a new instance with a pdrive (persistent drive)

In [6]:
# create a spot instance named "gpu" with a 10GB pdrive called "cats" attached. itype is "free" or "gpu".
server.create("gpu", bootsize=None, itype="gpu", spot=True, pdrive="cats", pdrivesize=10, docker="/v1")
apps.restart_notebook()

[root:INFO]:starting (cellevents.py\31, time=09:11)


[34.248.131.79] run: docker rm -f notebook || true
[34.248.131.79] out: notebook
[34.248.131.79] out: 

[34.248.131.79] sudo: nvidia-smi
[34.248.131.79] out: Sun Feb 19 09:12:03 2017       
[34.248.131.79] out: +------------------------------------------------------+                       
[34.248.131.79] out: | NVIDIA-SMI 352.99     Driver Version: 352.99         |                       
[34.248.131.79] out: |-------------------------------+----------------------+----------------------+
[34.248.131.79] out: | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
[34.248.131.79] out: | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
[34.248.131.79] out: |   0  Tesla K80           On   | 0000:00:1E.0     Off |                    0 |
[34.248.131.79] out: | N/A   29C    P8    25W / 149W |     55MiB / 11519MiB |      0%      Default |
[34.248.131.79] out: +-------------------------------+----------------------+----------------------+
[3

In [7]:
server.terminate("gpu", False)

[root:INFO]:starting (cellevents.py\31, time=09:15)
[root:INFO]:docker stopped (apps.py\73, time=09:15)
[root:INFO]:volume dismounted (pdrive.py\108, time=09:15)
[root:INFO]:instance terminated (itools.py\179, time=09:15)
[root:INFO]:waiting until volume available (pdrive.py\135, time=09:15)
[root:INFO]:volume available (pdrive.py\137, time=09:16)
[root:INFO]:volume deleted (pdrive.py\142, time=09:16)


time: 1min 38s
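
The save-on-termination step described in the Notes could look roughly like the following with boto3. It is a sketch under assumptions rather than the code behind `server.terminate`: `save_and_delete_volume` is a hypothetical name, and tagging the snapshot with `Name` is assumed. `client` is expected to be a boto3 EC2 client.

```python
def save_and_delete_volume(client, volume_id, name):
    """Snapshot a detached volume under Name=<name>, then delete the volume."""
    # wait until the terminated instance has released the volume
    client.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    snapshot = client.create_snapshot(
        VolumeId=volume_id,
        Description=f"{name} pdrive",
        TagSpecifications=[
            {"ResourceType": "snapshot", "Tags": [{"Key": "Name", "Value": name}]}
        ],
    )
    # only delete the volume once the snapshot has completed
    client.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])
    client.delete_volume(VolumeId=volume_id)
    return snapshot["SnapshotId"]
```

Waiting for the snapshot to complete before deleting is a cautious choice that keeps the volume around if the snapshot fails.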


## Working with an existing pdrive

In [None]:
pdrive = Pdrive("cats")
pdrive.connect("sm1")

In [None]:
pdrive.disconnect()

## Utilities

In [None]:
# get a resource by name
aws.get("sm1")

In [None]:
# get all resources (instances, volumes, snapshots)
aws.get(unique=False)

In [None]:
# show instances used
aws.get_instances()

In [None]:
# show python tasks running in containers
server.get_tasks("python")

In [None]:
# show all tasks running in containers
server.get_tasks()

## Change docker location

In [None]:
# set to pdrive
apps.set_docker_folder("/v1")

In [None]:
# set to boot drive
apps.set_docker_folder()