## Overview of xdrive package

This package puts programs and data on a portable "xdrive" rather than on
the AWS instance boot drive. The xdrive can then be moved between different types of
server, including spot instances. This saves 100% of the cost of setting up data and programs by using free tier
servers, and 80% of the cost of GPUs by providing persistent storage for spot
instances. The examples here show how to set up and work with various types of server.

Note that xdrive holds minimal state so you can continue to use AWS menus in
parallel.

## Imports

In [14]:
get_ipython().run_line_magic('load_ext', 'cellevents')
from logcon import log
from xdrive import aws, server, apps
from xdrive.drive import Drive
import fabric.api as fab
from fabric.state import connections

# True means verbose output
fab.output['everything'] = True

[root:INFO]:starting (cellevents.py\ :32, time=13:32)


The cellevents extension is already loaded. To reload it, use:
  %reload_ext cellevents
time: 21.5 ms


## Configuration

In [5]:
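# create a key pair and save the private key locally
# note: keyfile (the path for the private key) is not defined in this notebook; set it before running this cell
try:
    key = aws.ec2.create_key_pair(KeyName="key")
    with open(keyfile, "w") as f:
        f.write(key.key_material)
except Exception as e:
    log.warning(e)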

[root:INFO]:starting (cellevents.py\ :32, time=20:53)


time: 1.86 s
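
ssh refuses to use a private key file that other users can read, so it can help to create the file with owner-only permissions from the start. The following is a minimal sketch and not part of xdrive: `save_private_key` is a hypothetical helper, and the demo writes placeholder text to a temporary directory where the cell above would pass `key.key_material` and `keyfile`.

```python
import os
import stat
import tempfile


def save_private_key(path, key_material):
    """Write key_material to a new file that only the owner can read or write."""
    # O_EXCL makes the call fail rather than overwrite an existing key file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(key_material)
    return path


# Demo with placeholder text in a throwaway directory (not a real key).
demo_dir = tempfile.mkdtemp()
demo_path = save_private_key(os.path.join(demo_dir, "key.pem"),
                             "placeholder key material")
print(oct(stat.S_IMODE(os.stat(demo_path).st_mode)))
```

Refusing to overwrite is a deliberate choice here: a key pair's private material is only returned once by AWS, so silently replacing an existing file could lose the only copy of an older key.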


In [6]:
# create a security group
try:
    sec = aws.ec2.create_security_group(GroupName="simon",
                                        Description="wordpress, jupyter, ssh")
    sec.authorize_ingress(
        IpPermissions=[dict(IpProtocol='tcp', FromPort=80, ToPort=80),
                       dict(IpProtocol='tcp', FromPort=443, ToPort=443),
                       dict(IpProtocol='tcp', FromPort=8888, ToPort=8888),
                       dict(IpProtocol='tcp', FromPort=22, ToPort=22)])
except Exception as e:
    log.warning(e)

[root:INFO]:starting (cellevents.py\ :32, time=20:53)


time: 229 ms
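
The cell above repeats the same dictionary shape for every port. As a rough sketch, a small helper can build the list from the port numbers and state the allowed source range explicitly. `tcp_permissions` is a hypothetical name, and the default range `0.0.0.0/0` (any address) is an assumption that you may want to narrow.

```python
def tcp_permissions(ports, cidr="0.0.0.0/0"):
    """Build one single-port TCP ingress rule per port, in boto3 IpPermissions form."""
    return [
        dict(IpProtocol="tcp", FromPort=port, ToPort=port,
             IpRanges=[dict(CidrIp=cidr)])
        for port in ports
    ]


# http, https, jupyter and ssh, as in the cell above
permissions = tcp_permissions([80, 443, 8888, 22])
print(len(permissions), "rules")

# With a security group object this would be passed as:
#   sec.authorize_ingress(IpPermissions=permissions)
```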


## Setup programs and data using a free instance

In [7]:
# create a server called "kate" with a free instance, and an xdrive called "fastai"
# mounted at /v1 to hold programs and data
server.create("kate", itype="free", drive="fastai", drivesize=15)

[root:INFO]:starting (cellevents.py\30, time=19:15)
[root:INFO]:waiting for instance running (server.py\64, time=19:15)
[root:INFO]:instance kate running at 34.251.171.56 (server.py\75, time=19:15)
[root:INFO]:waiting for ssh (server.py\138, time=19:15)
[root:INFO]:ssh connected (server.py\147, time=19:16)


[34.251.171.56] sudo: yum install docker -y -q
[34.251.171.56] sudo: usermod -aG docker ec2-user
[34.251.171.56] sudo: pip install -q docker-compose
[34.251.171.56] out: You are using pip version 6.1.1, however version 9.0.1 is available.
[34.251.171.56] out: You should consider upgrading via the 'pip install --upgrade pip' command.
[34.251.171.56] out:     DEPRECATION: Uninstalling a distutils installed project (colorama) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
[34.251.171.56] out: 


[root:INFO]:docker installed. if need to pull images then use ssh as this shows progress whereas fabric does not (apps.py\29, time=19:16)



[34.251.171.56] sudo: mkfs -t ext4 /dev/xvdf
[34.251.171.56] out: mke2fs 1.42.12 (29-Aug-2014)
[34.251.171.56] out: Creating filesystem with 2621440 4k blocks and 655360 inodes
[34.251.171.56] out: Filesystem UUID: 566f7072-4918-470f-ac7a-1b9e9aec2d8d
[34.251.171.56] out: Superblock backups stored on blocks: 
[34.251.171.56] out: 	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
[34.251.171.56] out: 
[34.251.171.56] out: Allocating group tables:  0/80     done                            
[34.251.171.56] out: Writing inode tables:  0/80     done                            
[34.251.171.56] out: Creating journal (32768 blocks): done
[34.251.171.56] out: Writing superblocks and filesystem accounting information:  0/80     done
[34.251.171.56] out: 
[34.251.171.56] out: 


[root:INFO]:volume formatted successfully (pdrive.py\100, time=19:16)



[34.251.171.56] sudo: mkdir -p /v1
[34.251.171.56] sudo: mount /dev/xvdf /v1
[34.251.171.56] sudo: chown -R ec2-user:ec2-user /v1


[root:INFO]:volume mounted (pdrive.py\107, time=19:16)


[34.251.171.56] sudo: mkdir -p /etc/docker
[34.251.171.56] put: <file obj> -> /etc/docker/daemon.json
[34.251.171.56] sudo: mkdir -p /v1/docker


ec2.Instance(id='i-0acf43b90b18cf6f5')

time: 1min 1s
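
The log above shows the volume being formatted, mounted and handed to `ec2-user`. The sketch below reproduces that command sequence as data. It is an illustration of the logged steps, not xdrive's own code: `mount_commands` and its parameters are hypothetical, and the choice to format only when asked reflects the assumption that a volume restored from a snapshot must not be reformatted.

```python
def mount_commands(device="/dev/xvdf", mount_point="/v1", user="ec2-user",
                   format_first=False):
    """Return the shell commands (to be run with sudo) that prepare the drive."""
    commands = []
    if format_first:
        # Only for a brand new, empty volume: mkfs destroys existing data.
        commands.append(f"mkfs -t ext4 {device}")
    commands += [
        f"mkdir -p {mount_point}",
        f"mount {device} {mount_point}",
        f"chown -R {user}:{user} {mount_point}",
    ]
    return commands


for command in mount_commands(format_first=True):
    print("sudo:", command)
```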


It can take several minutes to pull a large docker image or data file, and doing this from a notebook produces either excessive output or a silent wait. Carry out these steps via SSH instead, where progress is easier to monitor:
* download data to /v1 (as required)
* docker pull simonm3/fastai (or other docker image)
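
If you prefer to start the SSH session from Python in a terminal, the command can be assembled as in this sketch. `ssh_argv` is a hypothetical helper; the host is a documentation address and `key.pem` a placeholder path, so substitute the instance IP and your own key file. The `ec2-user` login is taken from the logs above.

```python
import shlex


def ssh_argv(host, keyfile, remote_command=None, user="ec2-user"):
    """Build the argument list for an ssh call to the instance."""
    argv = ["ssh", "-i", keyfile, f"{user}@{host}"]
    if remote_command:
        argv.append(remote_command)
    return argv


# 203.0.113.10 is a documentation address and key.pem a placeholder path.
argv = ssh_argv("203.0.113.10", "key.pem", "docker pull simonm3/fastai")
print(" ".join(shlex.quote(arg) for arg in argv))

# From a terminal session, subprocess.run(argv, check=True) would run the pull
# with docker's progress output visible.
```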

Now run the notebook. The password is dl_course (the default for fastai).

In [29]:
apps.run_fastai()

[root:INFO]:starting (cellevents.py\32, time=20:56)


[34.250.24.85] run: mkdir -p /v1/.jupyter
[34.250.24.85] put: <file obj> -> /v1/.jupyter/jupyter_notebook_config.py
[34.250.24.85] run: mkdir -p /v1/nbs
[34.250.24.85] run: nvidia-docker run -v /v1:/host -v /v1/.jupyter:/home/docker/.jupyter -w /host/nbs -p 8888:8888 -d --restart=always --name fastai simonm3/fastai
[34.250.24.85] out: 192d591b72ca31116a45df638f1d3f39b43a84ad3fa3a25bf5d35ddb1493e67f
[34.250.24.85] out: 

[34.250.24.85] sudo: cp -r /var/lib/nvidia-docker /v1
[34.250.24.85] sudo: cp -r /run/docker/plugins /v1


[root:INFO]:fastai running on 34.250.24.85:8888 (apps.py\114, time=20:56)


time: 3.27 s
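
The `nvidia-docker run` line in the log carries most of the configuration: the xdrive is mounted into the container, the notebook port is published, and `--restart=always` asks docker to start the container again whenever the docker daemon starts. The sketch below rebuilds that command from its parts to make each flag visible. `docker_run_command` is a hypothetical helper, not the code inside `apps.run_fastai`.

```python
def docker_run_command(image, name, volumes, workdir, port,
                       binary="nvidia-docker"):
    """Assemble a detached, auto-restarting docker run command line."""
    parts = [binary, "run"]
    for host_path, container_path in volumes:
        parts += ["-v", f"{host_path}:{container_path}"]  # bind mount
    parts += [
        "-w", workdir,             # working directory inside the container
        "-p", f"{port}:{port}",    # publish the notebook port
        "-d",                      # run in the background
        "--restart=always",        # start again whenever the docker daemon starts
        "--name", name,
        image,
    ]
    return " ".join(parts)


command = docker_run_command(
    image="simonm3/fastai",
    name="fastai",
    volumes=[("/v1", "/host"), ("/v1/.jupyter", "/home/docker/.jupyter")],
    workdir="/host/nbs",
    port=8888,
)
print(command)
```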


You can test the fastai notebook at port 8888 of the "kate" IP address. All of the setup so far has used free instances and free storage. The next step is to terminate this instance. All data and programs will be preserved in a snapshot that we can later attach to a high performance instance such as a GPU.

In [75]:
server.terminate("kate")

[root:INFO]:starting (cellevents.py\30, time=21:38)
[root:INFO]:docker stopped (apps.py\65, time=21:38)
[root:INFO]:instance terminated (server.py\168, time=21:38)
[root:INFO]:waiting for snapshot. this can take 15 minutes.you can break and then delete volume manually (pdrive.py\132, time=21:38)
[root:INFO]:snapshot completed (pdrive.py\135, time=21:45)
[root:INFO]:waiting until volume available (pdrive.py\141, time=21:45)
[root:INFO]:volume available (pdrive.py\143, time=21:45)
[root:INFO]:volume deleted (pdrive.py\148, time=21:45)


time: 6min 47s
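
The log above shows the order of operations after termination: take a snapshot, wait for the volume to become available, then delete the volume. A minimal sketch of the same sequence with a boto3 EC2 client might look like this. It is not xdrive's implementation: `snapshot_and_delete` is a hypothetical name, and the client is passed in (for example `boto3.client("ec2")`) so that the function itself has no AWS dependency.

```python
def snapshot_and_delete(client, volume_id, description="xdrive snapshot"):
    """Snapshot a volume, wait for completion, then delete the volume."""
    snapshot = client.create_snapshot(VolumeId=volume_id,
                                      Description=description)
    snapshot_id = snapshot["SnapshotId"]

    # Block until the snapshot is complete; this is the slow step in the log.
    client.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # A volume can only be deleted once it is detached ("available").
    client.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    client.delete_volume(VolumeId=volume_id)
    return snapshot_id
```

Deleting only after the snapshot waiter returns means an interrupted run leaves the volume in place rather than losing data.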


## Work with the programs and data using a GPU

Create a spot GPU server called "sarah" with the same data and programs as before. Note that you don't have to run fastai again, as the docker container starts automatically with docker.

In [20]:
server.create("sarah", itype="gpu", spot=True, drive="fastai")

[root:INFO]:starting (cellevents.py\ :32, time=13:16)
[root:INFO]:spot request submitted (server.py\ :133, time=13:16)
[root:INFO]:spot request fulfilled i-0405e4833bc1dfc35 (server.py\ :149, time=13:16)
[root:INFO]:waiting for instance running (server.py\ :81, time=13:16)
[root:INFO]:instance sarah running at 34.248.136.179 (server.py\ :91, time=13:16)
[root:INFO]:waiting for ssh server (server.py\ :154, time=13:16)
[root:INFO]:ssh connected 34.248.136.179 (server.py\ :163, time=13:18)


[34.248.136.179] sudo: mkdir -p /v1
[34.248.136.179] sudo: mount /dev/xvdf /v1
[34.248.136.179] sudo: chown -R ec2-user:ec2-user /v1


[root:INFO]:volume mounted (drive.py\ :106, time=13:18)


[34.248.136.179] sudo: yum install docker -y -q
[34.248.136.179] sudo: usermod -aG docker ec2-user
[34.248.136.179] sudo: pip install -q docker-compose
[34.248.136.179] out: You are using pip version 6.1.1, however version 9.0.1 is available.
[34.248.136.179] out: You should consider upgrading via the 'pip install --upgrade pip' command.
[34.248.136.179] out:     DEPRECATION: Uninstalling a distutils installed project (colorama) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
[34.248.136.179] out: 


[root:INFO]:docker installed. if need to pull images then use ssh as this shows progress whereas fabric does not (apps.py\ :26, time=13:18)



[34.248.136.179] sudo: mkdir -p /etc/docker
[34.248.136.179] put: <file obj> -> /etc/docker/daemon.json
[34.248.136.179] sudo: mkdir -p /v1/docker
[34.248.136.179] sudo: wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0/nvidia-docker_1.0.0_amd64.tar.xz
[34.248.136.179] out: --2017-03-16 13:18:57--  https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0/nvidia-docker_1.0.0_amd64.tar.xz
[34.248.136.179] out: Resolving github.com (github.com)... 192.30.253.112, 192.30.253.113
[34.248.136.179] out: Connecting to github.com (github.com)|192.30.253.112|:443... connected.
[34.248.136.179] out: HTTP request sent, awaiting response... 302 Found
[34.248.136.179] out: Location: https://github-cloud.s3.amazonaws.com/releases/45557469/329876a2-dd88-11e6-8eaa-692a2a93c70b.xz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20170316%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170316T131857Z&X-Amz-Expires=300&X-Amz-Signature=d256ca626bc1e

[root:INFO]:nvidia-docker-plugin is running (apps.py\ :41, time=13:18)
[root:INFO]:instance sarah ready at 34.248.136.179 (server.py\ :119, time=13:18)


[34.248.136.179] put: C:\Users\s\Documents\py\apps\config\_creds.py -> /home/ec2-user/_creds.py


ec2.Instance(id='i-0405e4833bc1dfc35')

time: 2min 33s


You may have to wait a minute or so for the container and notebook to start at port 8888 of the "sarah" IP address.

When you have finished working, terminate the server. Calling server.terminate("sarah") saves the xdrive as a snapshot, including all data and programs. If AWS terminates the instance (e.g. when outbid on a spot instance) or you terminate it via the AWS menu, the volume remains but is not automatically saved as a snapshot. In this case use the AWS menus to save a snapshot and delete the volume. It would be possible to automate this by capturing AWS termination notices.

In [22]:
server.terminate("sarah")

[root:INFO]:starting (cellevents.py\ :32, time=15:43)
[root:INFO]:docker stopped (apps.py\ :64, time=15:43)
[root:INFO]:volume dismounted (drive.py\ :113, time=15:43)
[root:INFO]:instance terminated (server.py\ :184, time=15:43)
[root:INFO]:waiting for snapshot. this can take 15 minutes.Have a cup of tea. (drive.py\ :139, time=15:43)
[root:INFO]:snapshot completed (drive.py\ :142, time=15:52)
[root:INFO]:waiting until volume available (drive.py\ :131, time=15:52)
[root:INFO]:volume available (drive.py\ :133, time=15:52)
[root:INFO]:volume deleted (drive.py\ :150, time=15:52)


time: 8min 52s
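
One way to capture a spot termination notice is to poll the instance metadata service from the instance itself. The sketch below assumes the `spot/termination-time` metadata path, which returns 404 until a termination is scheduled, and assumes the instance accepts plain metadata requests without a session token. `spot_termination_time` and its `opener` parameter are hypothetical; the parameter exists only so the function can be exercised away from EC2.

```python
import urllib.error
import urllib.request

TERMINATION_URL = (
    "http://169.254.169.254/latest/meta-data/spot/termination-time"
)


def spot_termination_time(opener=urllib.request.urlopen, timeout=2):
    """Return the scheduled termination time as text, or None if none is set."""
    try:
        with opener(TERMINATION_URL, timeout=timeout) as response:
            return response.read().decode().strip()
    except urllib.error.HTTPError as error:
        if error.code == 404:
            # No termination scheduled yet.
            return None
        raise


# On the instance, a loop could call spot_termination_time() every few seconds
# and trigger a clean shutdown of docker as soon as it returns a value.
# Away from EC2 the default opener raises URLError, as the address is unreachable.
```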


## Create more servers

It is possible to create servers without an xdrive. For example, you may want to create a server with a static IP address running wordpress:
* request an elastic IP address from AWS (this is free as long as it is attached to a running instance)
* run the script below

In [None]:
instance = server.create("sm")
# attach to the first elastic ip address on your account
fab.env.host_string = aws.get_ips()[0]
aws.client.associate_address(InstanceId=instance.instance_id,
                             PublicIp=fab.env.host_string)
# watch the AWS console until the address is confirmed before continuing
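
Instead of watching the console, the association can be polled. This is a rough sketch under the assumption that `client` is a boto3 EC2 client, whose `describe_addresses` response lists an `InstanceId` once the address is associated. `wait_for_association` is a hypothetical helper, and the injectable `sleep` is only there to keep it testable.

```python
import time


def wait_for_association(client, public_ip, instance_id,
                         attempts=30, delay=2.0, sleep=time.sleep):
    """Poll until public_ip is associated with instance_id; False on timeout."""
    for _ in range(attempts):
        response = client.describe_addresses(PublicIps=[public_ip])
        if any(address.get("InstanceId") == instance_id
               for address in response["Addresses"]):
            return True
        sleep(delay)
    return False


# Usage after associate_address, with names from the cell above:
#   wait_for_association(aws.client, fab.env.host_string, instance.instance_id)
```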

In [7]:
connections.connect(fab.env.host_string)
server.wait_ssh()
apps.install_docker()
# via ssh: service docker start; docker pull python

[root:INFO]:starting (cellevents.py\ :32, time=14:34)
[root:INFO]:waiting for ssh server (server.py\ :155, time=14:34)
[root:INFO]:ssh connected 34.248.84.101 (server.py\ :164, time=14:34)


[34.248.84.101] sudo: yum install docker -y -q
[34.248.84.101] sudo: usermod -aG docker ec2-user
[34.248.84.101] sudo: pip install -q docker-compose
[34.248.84.101] out: You are using pip version 6.1.1, however version 9.0.1 is available.
[34.248.84.101] out: You should consider upgrading via the 'pip install --upgrade pip' command.
[34.248.84.101] out:     DEPRECATION: Uninstalling a distutils installed project (colorama) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
[34.248.84.101] out: 


[root:INFO]:docker installed. if need to pull images then use ssh as this shows progress whereas fabric does not (apps.py\ :29, time=14:34)



[34.248.84.101] sudo: service docker start
[34.248.84.101] out: Starting cgconfig service: [  OK  ]
[34.248.84.101] out: 
[34.248.84.101] out: Starting docker:	.[  OK  ]
[34.248.84.101] out: 
[34.248.84.101] out: 



'Starting cgconfig service: \x1b[60G[\x1b[0;32m  OK  \x1b[0;39m]\r\r\nStarting docker:\t.\x1b[60G[\x1b[0;32m  OK  \x1b[0;39m]'

time: 9.33 s


In [None]:
apps.install_wordpress()

In [None]:
apps.install_python("meetups", "$HOME:/root")

[root:INFO]:starting (cellevents.py\ :32, time=17:32)


[34.248.84.101] put: C:\Users\s\.meetups\creds.yaml -> /home/ec2-user/.meetups/creds.yaml
[34.248.84.101] put: C:\Users\s\.xtools\creds.yaml -> /home/ec2-user/.xtools/creds.yaml
[34.248.84.101] put: C:\Users\s\.logconfig.yaml -> /home/ec2-user/.logconfig.yaml
[34.248.84.101] run: docker rm -f meetups
[34.248.84.101] out: meetups
[34.248.84.101] out: 

[34.248.84.101] run: docker run -v $HOME:/root --name meetups -di python
[34.248.84.101] out: 8e2cb90b5ea037520b0259f146fe4b52a65c2be6af6d626b28c5a7b9079bba7f
[34.248.84.101] out: 

[34.248.84.101] run: docker exec meetups pip install meetups
[34.248.84.101] out: The directory '/root/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
[34.248.84.101] out: The directory '/root/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been 

## Work with an existing xdrive

Typically you will create a server and xdrive at the same time. However, sometimes you may want to attach the xdrive to an existing instance. This is also possible with the commands below.

WARNING: disconnecting from a running instance does not work reliably, and I am not sure how to do it cleanly. Sometimes the volume does not dismount even when not in use. Sometimes it dismounts but then won't detach, and AWS does not report the error. Sometimes the volume is deleted from AWS and no longer appears on the AWS console, yet the data is still visible on the instance! Alternatives are:
* Shut down the instance first
* OR wait for the snapshot to complete, then check the AWS console; if the volume has not been deleted, use the menus to force detach/delete it manually, and possibly reboot the instance.

In [12]:
xdrive = Drive("fastai")
xdrive.connect("sm")

[root:INFO]:starting (cellevents.py\ :32, time=11:04)
[root:INFO]:waiting until volume available (drive.py\ :73, time=11:04)
[root:INFO]:volume available (drive.py\ :75, time=11:05)
[root:INFO]:waiting for device to be visible (drive.py\ :79, time=11:05)
[root:INFO]:volume attached (drive.py\ :85, time=11:05)


[34.248.84.101] sudo: mkdir -p /v1
[34.248.84.101] sudo: mount /dev/xvdf /v1
[34.248.84.101] sudo: chown -R ec2-user:ec2-user /v1


[root:INFO]:volume mounted (drive.py\ :104, time=11:05)


time: 25.1 s


The xdrive is now attached to the server as /v1. In this case docker is not installed or started automatically. You can work with /v1 as required. When finished, disconnect.

In [14]:
xdrive.disconnect()

[root:INFO]:starting (cellevents.py\ :32, time=11:13)
[root:INFO]:volume dismounted (drive.py\ :113, time=11:13)
[root:INFO]:detach request sent (drive.py\ :130, time=11:13)
[root:INFO]:waiting until volume available (drive.py\ :131, time=11:13)
[root:INFO]:volume available (drive.py\ :133, time=11:13)
[root:INFO]:waiting for snapshot. this can take 15 minutes.Have a cup of tea. (drive.py\ :139, time=11:13)
[root:INFO]:snapshot completed (drive.py\ :142, time=11:14)
[root:INFO]:volume deleted (drive.py\ :150, time=11:14)


time: 1min 36s
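
Given the detach problems described in the warning above, it can be worth confirming on the instance that /v1 is really unmounted before trusting the result. The sketch below parses the text of `/proc/mounts`, where the second field of each line is the mount point. `is_mounted` is a hypothetical helper and the sample text is made up for illustration.

```python
def is_mounted(mount_point, mounts_text):
    """Return True if mount_point appears as a mount point in /proc/mounts text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, filesystem type, options, ...
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


# Made-up sample in /proc/mounts format
sample = (
    "/dev/xvda1 / ext4 rw,noatime 0 0\n"
    "/dev/xvdf /v1 ext4 rw,relatime 0 0\n"
)
print(is_mounted("/v1", sample))

# On the instance the text could come from: cat /proc/mounts
```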


## Utilities

As a bonus, a number of utilities are available, as shown in the cells that follow. Also, for convenience, all resources (instances, volumes, snapshots) can be referred to by name rather than by the 20-character Amazon ID.

In [15]:
# get a resource by name
aws.get("sm")

[root:INFO]:starting (cellevents.py\ :32, time=11:15)


ec2.Instance(id='i-027978907ff7cef35')

time: 1.19 s
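
How `aws.get` resolves a name is not shown here. One common approach, sketched below as an assumption, is to store the name in the `Name` tag (the tag the EC2 console displays) and filter each collection on it. `name_filter` and `find_by_name` are hypothetical, and `ec2` stands for a boto3 EC2 service resource such as `boto3.resource("ec2")`.

```python
def name_filter(name):
    """Build an EC2 filter that matches resources whose Name tag equals name."""
    return [{"Name": "tag:Name", "Values": [name]}]


def find_by_name(ec2, name):
    """Return all instances, volumes and own snapshots tagged with name."""
    filters = name_filter(name)
    found = []
    found += list(ec2.instances.filter(Filters=filters))
    found += list(ec2.volumes.filter(Filters=filters))
    # Restrict snapshots to this account; public snapshots are numerous.
    found += list(ec2.snapshots.filter(Filters=filters, OwnerIds=["self"]))
    return found
```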


In [16]:
# get all resources (instances, volumes, snapshots)
aws.get(unique=False)

[root:INFO]:starting (cellevents.py\ :32, time=11:15)


[ec2.Instance(id='i-027978907ff7cef35'),
 ec2.Volume(id='vol-0918ce5cbb48963c2'),
 ec2.Snapshot(id='snap-0bc4360a60ab599bc'),
 ec2.Snapshot(id='snap-0b85efcea9bdb7ca8')]

time: 611 ms


In [4]:
# show instances used
aws.get_instances()

[root:INFO]:starting (cellevents.py\ :32, time=20:42)


|   | name | instance_id | image | type | state | ip |
|---|------|-------------|-------|------|-------|----|
| 0 | sm | i-027978907ff7cef35 | ami-c51e3eb6 | t2.micro | running | 34.248.84.101 |


time: 1.85 s


In [None]:
# show python tasks running in containers
fab.env.host_string = aws.get("sm").public_ip_address
server.get_tasks("python")