# Amazon Basics 4: Volumes and Snapshots

## Usage Notes

The purpose of this notebook is to give you a quick introduction to how disk space works in Amazon's Elastic Compute Cloud (EC2).

## Notebook Imports

In [None]:
from aws_base import *
from aws_util import *

## Understanding Volumes

Disk space for Amazon EC2 works a lot like disk space for regular VMs in the sense that there is something roughly equivalent to a `vmdk` or `vdi` that's created.

https://en.wikipedia.org/wiki/VMDK

What makes Amazon EC2 easy to understand is that, much like keeping VMs on an external hard drive, the hard disk for an Amazon EC2 instance effectively lives on an external system referred to as Elastic Block Store (EBS).

https://aws.amazon.com/ebs/

Like Amazon S3, you are billed per gigabyte per month for storing data (in this case, having volumes) in this storage system.

https://aws.amazon.com/ebs/pricing/
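
To make the per-gigabyte-per-month billing concrete, here is a minimal sketch of the arithmetic. The function name and the rate below are placeholders chosen for illustration, not quoted AWS prices; check the pricing page above for real numbers.

```python
def monthly_volume_cost(size_gb, price_per_gb_month):
    """Estimate the monthly charge for one provisioned volume.

    Volumes are billed on their provisioned size, so an empty 100 GB
    volume costs the same as a full one.
    """
    if size_gb < 0 or price_per_gb_month < 0:
        raise ValueError('size and price must be non-negative')
    return size_gb * price_per_gb_month


# Hypothetical rate of 0.10 USD per GB-month, for illustration only.
print(monthly_volume_cost(100, 0.10))
```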

As you run a virtual machine on your local machine, bits of it get loaded into main memory. Similarly, an EC2 instance volume lives in cloud storage (conceptually similar to an Amazon S3 bucket). As the virtual machine runs and asks for files, fragments of that volume transfer to the machine itself.

## Understanding Snapshots

At any time, you can take a snapshot of the volumes for a running (or, ideally, stopped) virtual machine. These snapshots allow you to create something called an Amazon Machine Image (AMI).

We briefly touched on this in the introduction to this notebook series, but basically, an Amazon Machine Image is a backup of a volume that can be used as a starting point for a virtual machine. It might be as simple as an operating system, where you can do as much as you want and start over, or it can be as complex as a complete database installation, where you can create a brand new snapshot so that you don't have to redo any work.
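
As a rough sketch of what taking a snapshot or creating an AMI looks like from the command line, the functions below only build argument lists for the AWS CLI subcommands `ec2 create-snapshot` and `ec2 create-image`. The function names and the IDs are hypothetical, and the sketch assumes the lists would be handed to a wrapper such as this series' `aws` helper.

```python
def create_snapshot_args(volume_id, description):
    """Arguments for `aws ec2 create-snapshot` (snapshots one volume)."""
    return [
        'ec2', 'create-snapshot',
        '--volume-id', volume_id,
        '--description', description,
    ]


def create_image_args(instance_id, name, no_reboot=False):
    """Arguments for `aws ec2 create-image` (turns an instance into an AMI)."""
    args = ['ec2', 'create-image', '--instance-id', instance_id, '--name', name]
    if no_reboot:
        # Skips the shutdown before imaging; the file system on the
        # resulting image is then not guaranteed to be consistent.
        args.append('--no-reboot')
    return args


# Placeholder IDs, not real resources.
print(create_snapshot_args('vol-0123456789abcdef0', 'nightly backup'))
print(create_image_args('i-0123456789abcdef0', 'db-baseline'))
```

Leaving `no_reboot` at its default matches the advice above that a stopped machine is the ideal source for a snapshot.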

AMIs are stored in Amazon S3. This means that if you take a snapshot, you incur the costs of S3 storage in order to maintain that AMI. These costs are separate from the costs of the volumes in Elastic Block Store.

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AmazonS3.html

Knowing that AMIs originate in S3 explains why creating a snapshot of a large database on Amazon EC2 and restoring it is extremely slow. It's because you're effectively streaming the database volume from Amazon S3 to Amazon EBS, and then from Amazon EBS back to your requested EC2 instance.

## Ephemeral Block Devices

With that being said, you aren't restricted to just the disk stored in Amazon EBS. When you initialize a VM, you also have the option to request access to the local disks (or at least, effectively local disks) attached to the virtual machine. These are referred to as ephemeral block devices.

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html

The amount of storage on these systems can be quite impressive for the higher-tier machines, making them ideal for anything involving large amounts of data. However, for longer-running workloads, these higher-tier machines cost much more than a cheaper machine with an extremely large EBS volume attached.

http://www.ec2instances.info/

These block devices are referred to as ephemeral storage because all data on them is lost if the server is ever shut down (the exception being a reboot that isn't a formal shutdown). Therefore, while it is recommended that you at least ask for these volumes if you spin up a new virtual machine, be careful about storing data here if you plan on shutting down the server.
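
The rule of thumb above can be written down as a small lookup. This is only an illustrative sketch: the table and function names are made up, and the entries reflect the documented behavior that instance store data survives a reboot but not a stop, hibernate, or terminate.

```python
# Whether instance store (ephemeral) data survives each lifecycle event.
INSTANCE_STORE_SURVIVES = {
    'reboot': True,
    'stop': False,
    'hibernate': False,
    'terminate': False,
}


def safe_for_ephemeral_data(planned_events):
    """Return True if none of the planned events would wipe ephemeral disks."""
    return all(INSTANCE_STORE_SURVIVES[event] for event in planned_events)


print(safe_for_ephemeral_data(['reboot']))
print(safe_for_ephemeral_data(['reboot', 'stop']))
```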

In [None]:
"""
List of block device counts based on http://ec2instances.info
"""
ephemeral_counts = {
    'c1.medium': 1, 'c1.xlarge': 4,
    'c3.large': 2, 'c3.xlarge': 2, 'c3.2xlarge': 2, 'c3.4xlarge': 2, 'c3.8xlarge': 2,
    'c4.large': 0, 'c4.xlarge': 0, 'c4.2xlarge': 0, 'c4.4xlarge': 0, 'c4.8xlarge': 0,
    'c5.large': 0, 'c5.xlarge': 0, 'c5.2xlarge': 0, 'c5.4xlarge': 0, 'c5.9xlarge': 0, 'c5.18xlarge': 0,
    'c5d.large': 1, 'c5d.xlarge': 1, 'c5d.2xlarge': 1, 'c5d.4xlarge': 1, 'c5d.9xlarge': 1, 'c5d.18xlarge': 2,
    'cc2.8xlarge': 4,
    'cg1.4xlarge': 2,
    'cr1.8xlarge': 2,
    'd2.xlarge': 3, 'd2.2xlarge': 6, 'd2.4xlarge': 12, 'd2.8xlarge': 24,
    'g2.2xlarge': 1, 'g2.8xlarge': 2,
    'hi1.4xlarge': 2,
    'hs1.8xlarge': 24,
    'i2.xlarge': 1, 'i2.2xlarge': 2, 'i2.4xlarge': 4, 'i2.8xlarge': 8,
    'i3.large': 1, 'i3.xlarge': 1, 'i3.2xlarge': 1, 'i3.4xlarge': 2, 'i3.8xlarge': 4, 'i3.16xlarge': 8,
    'm1.small': 2, 'm1.medium': 1, 'm1.large': 2, 'm1.xlarge': 4,
    'm2.xlarge': 1, 'm2.2xlarge': 1, 'm2.4xlarge': 2,
    'm3.medium': 1, 'm3.large': 1, 'm3.xlarge': 1, 'm3.2xlarge': 2,
    'm4.large': 0, 'm4.xlarge': 0, 'm4.2xlarge': 0, 'm4.4xlarge': 0, 'm4.10xlarge': 0, 'm4.16xlarge': 0,
    'm5.large': 0, 'm5.xlarge': 0, 'm5.2xlarge': 0, 'm5.4xlarge': 0, 'm5.12xlarge': 0, 'm5.24xlarge': 0,
    'm5d.large': 1, 'm5d.xlarge': 1, 'm5d.2xlarge': 1, 'm5d.4xlarge': 2, 'm5d.12xlarge': 2, 'm5d.24xlarge': 4,
    'p2.xlarge': 0, 'p2.8xlarge': 0, 'p2.16xlarge': 0,
    'r3.large': 1, 'r3.xlarge': 1, 'r3.2xlarge': 1, 'r3.4xlarge': 1, 'r3.8xlarge': 2,
    'r4.large': 0, 'r4.xlarge': 0, 'r4.2xlarge': 0, 'r4.4xlarge': 0, 'r4.8xlarge': 0, 'r4.16xlarge': 0,
    't1.micro': 0,
    't2.nano': 0, 't2.micro': 0, 't2.small': 0, 't2.medium': 0, 't2.large': 0,
    'x1.16xlarge': 1, 'x1.32xlarge': 2,
    'x1e.xlarge': 1, 'x1e.2xlarge': 1, 'x1e.4xlarge': 1, 'x1e.16xlarge': 1, 'x1e.32xlarge': 2
}

More information on how devices are named is listed here.

* http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html
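
For illustration, the sketch below pairs device names with the `ephemeralN` virtual names used for instance store volumes. Starting at `/dev/sdb` is an assumption made for this example; the AMI's own block device mappings decide the real names, which is why the utility method in the next cell reads them from the image instead. The function name is hypothetical.

```python
import string


def ephemeral_mappings(count):
    """Build `count` instance store mappings: /dev/sdb -> ephemeral0, ..."""
    if not 0 <= count <= 24:
        raise ValueError('count must be between 0 and 24')
    letters = string.ascii_lowercase[1:]  # 'b' through 'z'
    return [
        {'DeviceName': '/dev/sd' + letters[i], 'VirtualName': 'ephemeral%d' % i}
        for i in range(count)
    ]


print(ephemeral_mappings(2))
```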

In [None]:
"""
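Utility method which returns the block device mappings for a new instance:
the root EBS volume at the requested size, followed by as many ephemeral
devices as the instance type provides.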
"""
def get_block_devices(image_id, instance_type, volume_size):
    image_info = aws(
        'ec2', 'describe-images', '--image-ids', image_id, '--query',
        'Images[0]')

    root_devices = [
        {
            'DeviceName': image_info['RootDeviceName'],
            'Ebs': {
                'VolumeSize': volume_size,
                'DeleteOnTermination': True,
                'VolumeType': 'gp2'
            }
        }
    ]

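    # Only instance store mappings carry a 'VirtualName' (such as
    # 'ephemeral0'); EBS mappings, including the root volume, do not.
    ephemeral_mappings = [
        mapping for mapping in image_info['BlockDeviceMappings']
        if 'VirtualName' in mapping
    ]

    # Instance types missing from ephemeral_counts are treated as having none.
    ephemeral_count = ephemeral_counts.get(instance_type, 0)

    ephemeral_devices = [
        {
          "DeviceName" : mapping['DeviceName'],
          "VirtualName" : mapping['VirtualName']
        }
        for mapping in ephemeral_mappings[0:ephemeral_count]
    ]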

    return root_devices + ephemeral_devices
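
A list in the shape returned by `get_block_devices` would typically be serialized to JSON and passed to the `--block-device-mappings` option of `aws ec2 run-instances`. The sketch below uses a hand-written sample instead of calling AWS; the device names, sizes, and the helper name `block_device_argument` are assumptions for illustration.

```python
import json

# Hand-written sample in the shape returned by get_block_devices.
sample_mappings = [
    {
        'DeviceName': '/dev/xvda',
        'Ebs': {'VolumeSize': 100, 'DeleteOnTermination': True, 'VolumeType': 'gp2'},
    },
    {'DeviceName': '/dev/sdb', 'VirtualName': 'ephemeral0'},
]


def block_device_argument(mappings):
    """Serialize mappings to the JSON string the CLI option accepts."""
    return json.dumps(mappings)


cli_args = [
    'ec2', 'run-instances',
    '--block-device-mappings', block_device_argument(sample_mappings),
]
print(cli_args[-1])
```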

## Add Disk Mounts

Of course, simply knowing how many volumes are available doesn't automatically translate to using them. In addition to requesting the volumes, you have to make sure that the volumes are mounted.

The following script assumes a Linux environment and mounts those volumes. If you are using Windows, you will have to remember to mount these volumes manually, if you plan to use them.

In [None]:
%%writefile scripts/extra_storage.sh
#!/bin/bash

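# List block devices, skipping the one mounted at / and the lsblk header line.
VOLUMES=$(lsblk | grep -v '/$' | sed -n '2,$p' | cut -d' ' -f 1)
INSTANCE_TYPE=$(curl -s http://169.254.169.254/latest/meta-data/instance-type)

# Free up /mnt so that it can be bound to the first usable volume below.
sudo umount /mnt

echo "$INSTANCE_TYPE"

NEW_MNT=

if [ "" != "$VOLUMES" ]; then
    for volume in $VOLUMES; do
        echo "$volume"

        # Format the device unless `file -s` already reports a Linux file
        # system, a boot sector, or a swap area on it.
        if [ "" == "$(sudo file -s "/dev/$volume" | grep -i '\(linux\|boot\|swap\)')" ]; then
            sudo mkfs -t ext4 "/dev/$volume"
        fi

        sudo mkdir -p "/$volume"
        sudo mount "/dev/$volume" "/$volume"

        # Expose the first non-boot volume at /mnt as well.
        if [ "" == "$NEW_MNT" ] && [ "" == "$(sudo file -s "/dev/$volume" | grep -iF boot)" ]; then
            sudo mount --bind "/$volume" /mnt
            NEW_MNT=$volume
        fi
    done
fi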

In [None]:
"""
Utility method which mounts all available volumes.
"""
def extra_storage(user_name, host_names):
    run_script(user_name, host_names, 'extra_storage.sh')
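
To see what the `lsblk` pipeline at the top of `extra_storage.sh` selects, here is a Python sketch that mirrors its three filters. The sample output is invented for illustration (real `lsblk` output varies by instance and may also list partitions), and the function name is hypothetical.

```python
SAMPLE_LSBLK = """\
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   8G  0 disk /
xvdb    202:16   0  30G  0 disk /mnt
xvdc    202:32   0  30G  0 disk"""


def candidate_volumes(lsblk_output):
    """Mirror `grep -v '/$' | sed -n '2,$p' | cut -d' ' -f 1`."""
    # grep -v '/$': drop lines ending in '/', i.e. the root file system.
    kept = [line for line in lsblk_output.splitlines() if not line.endswith('/')]
    # sed -n '2,$p': skip the first remaining line (the header).
    # cut -d' ' -f 1: keep the text before the first space.
    return [line.split(' ')[0] for line in kept[1:]]


print(candidate_volumes(SAMPLE_LSBLK))
```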

## Enable Swap Space

This script creates a swap file at `/mnt/swapfile`. If the machine has local storage mounted at `/mnt` (available for m3 and r3 machines, for example), the swap file lives there and benefits from the faster local disk. For machines that do not have this storage, the swap file ends up on EBS storage instead.
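
Before asking for a large swap file, it can help to confirm that the target directory has enough free space. The sketch below is a hypothetical helper, not part of the script that follows; it uses `shutil.disk_usage`, checks the machine it runs on, and would therefore need to be executed on the instance itself to be meaningful.

```python
import shutil


def has_room_for_swap(directory, swap_size_gb):
    """Return True if `directory` has at least `swap_size_gb` GiB free."""
    free_bytes = shutil.disk_usage(directory).free
    # dd's bs=1G means 1 GiB blocks, so compare against 1024**3 bytes each.
    return free_bytes >= swap_size_gb * 1024 ** 3


print(has_room_for_swap('.', 0))
```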

In [None]:
%%writefile scripts/enable_swap.sh
#!/bin/bash

SWAP_SIZE=$(head -1 swapsize.txt)

for swapdisk in $(/sbin/swapon -s | grep -F '/dev' | cut -d' ' -f 1); do
    sudo /sbin/swapoff $swapdisk
done

echo "Creating ${SWAP_SIZE}g swap file"

# Make sure that everything persists across restarts.

echo '#!/bin/bash' | sudo tee /etc/rc2.d/S01enable_swap
echo "SWAP_SIZE=$SWAP_SIZE" | sudo tee -a /etc/rc2.d/S01enable_swap

echo 'case "$1" in
start)
    mkdir /var/lock/subsys 2>/dev/null
    touch /var/lock/subsys/listener

    # Add designated amount of swap space
    # stackoverflow.com/questions/17173972/how-do-you-add-swap-to-an-ec2-instance

    if [ "" == "$(/sbin/swapon -s)" ]; then
        if [ ! -f /mnt/swapfile ]; then
            sudo dd if=/dev/zero of=/mnt/swapfile bs=1G seek=0 count=$SWAP_SIZE
            sudo chmod og-rw /mnt/swapfile
            sudo /sbin/mkswap /mnt/swapfile
        fi

        sudo /sbin/swapon /mnt/swapfile
    fi ;;
*)
    echo error
    exit 1 ;;
esac' | sudo tee -a /etc/rc2.d/S01enable_swap

# Run the same script that will be run on startup.

# Executable by everyone, but writable only by root, since it runs at boot.
sudo chmod 755 /etc/rc2.d/S01enable_swap
sudo /etc/rc2.d/S01enable_swap start

Finally, here is a utility method that we can call from any of our installation notebooks to set how much swap space (in gigabytes) to initialize on the system.

In [None]:
"""
Utility method which creates a swap file of the specified size (in gigabytes).
"""
def enable_swap(user_name, host_names, swap_size):
    with open('awscli/swapsize.txt', 'w') as swapsize_file:
        swapsize_file.write(str(swap_size))

    upload_file(user_name, host_names, 'awscli/swapsize.txt')
    run_script(user_name, host_names, 'enable_swap.sh')

## Convert Notebook to Script

The following cell will use `jupyter nbconvert` to build an `aws_volumes.py` which will be used in future notebooks in this series.

In [None]:
%%javascript
var script_file = 'aws_volumes.py';

var notebook_name = window.document.getElementById('notebook_name').innerHTML;
var nbconvert_command = 'jupyter nbconvert --stdout --to script ' + notebook_name;

var grep_command = "grep -v '^#' | grep -v -F get_ipython | sed '/^$/N;/^\\n$/D'";
var command = '!' + nbconvert_command + ' | ' + grep_command + ' > ' + script_file;

if (Jupyter.notebook.kernel) {
    Jupyter.notebook.kernel.execute(command);
}
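
The shell filter in the cell above (`grep -v '^#'`, `grep -v -F get_ipython`, and the `sed` expression) can be approximated in Python as follows. This is a sketch of the intended effect, not a byte-for-byte port of the `sed` script, and `filter_script` is a hypothetical name.

```python
def filter_script(text):
    """Drop comment lines and get_ipython calls, then squeeze blank runs."""
    result = []
    for line in text.splitlines():
        # grep -v '^#' and grep -v -F get_ipython
        if line.startswith('#') or 'get_ipython' in line:
            continue
        # sed '/^$/N;/^\\n$/D': collapse consecutive blank lines into one
        if line == '' and result and result[-1] == '':
            continue
        result.append(line)
    return '\n'.join(result)


sample = "# In[1]:\nimport os\n\n\n\nget_ipython().system('ls')\nx = 1"
print(filter_script(sample))
```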