# Amazon Basics 8: Security Groups

## Usage Notes

Amazon security groups act as virtual firewalls that control which traffic can reach an Amazon Elastic Compute Cloud (EC2) instance.

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html

## Notebook Imports

In [None]:
from aws_base import *
from aws_util import *
from IPython.utils.py3compat import *
from netaddr import IPAddress, IPNetwork
import os

## Identify Security Groups

Make sure you have configured the security groups you want to use to grant access to the instances created by the spot request.

https://console.aws.amazon.com/ec2/v2/home#SecurityGroups

You could go through all of this using just the `default` security group. However, if you have additional security groups that should grant access to your instances (for example, a security group representing the IP addresses of teammates on a project, kept separate from any of your own static IP addresses), add them to the list below.

In [None]:
security_group_names = ['default']

We will need to make sure all of these security groups exist. To do that, we will want a utility function that converts our list of group names into a dictionary of the actual groups, keyed by group name.

In [None]:
def get_security_group(group_name):
    """
    Utility method to retrieve the group that matches the specified group name.
    Returns a dict keyed by group name, like get_security_groups.
    """
    return get_security_groups([group_name])

def get_security_groups(group_names):
    """
    Utility method to retrieve the groups that match the specified group names.
    Returns a dict mapping each matching group name to its description.
    """
    security_groups_json = aws('ec2', 'describe-security-groups')
    security_groups = security_groups_json['SecurityGroups']

    candidate_groups = {}

    for security_group in security_groups:
        security_group_name = security_group['GroupName']
        if security_group_name in group_names:
            candidate_groups[security_group_name] = security_group

    return candidate_groups
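
To see the name-matching step in isolation, here is a minimal sketch that applies the same filter to a hand-written response. The `sample_response` dictionary, its group IDs, and the `filter_groups_by_name` helper are hypothetical; only the `SecurityGroups` and `GroupName` keys are taken from the function above.

```python
# Hypothetical, trimmed-down response in the shape that
# `aws ec2 describe-security-groups` returns (real responses carry more fields).
sample_response = {
    'SecurityGroups': [
        {'GroupName': 'default', 'GroupId': 'sg-11111111'},
        {'GroupName': 'teammates', 'GroupId': 'sg-22222222'},
    ]
}

def filter_groups_by_name(response, group_names):
    """Return a dict mapping each requested name that exists to its group."""
    wanted = set(group_names)
    return {
        group['GroupName']: group
        for group in response['SecurityGroups']
        if group['GroupName'] in wanted
    }

# 'missing' is requested but absent, so it is simply left out of the result.
matched = filter_groups_by_name(sample_response, ['default', 'missing'])
print(sorted(matched))
```

Names that do not exist are silently dropped, which is why the notebook checks the returned keys against the requested names afterwards.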

Now we use that function to look up the group for each name in our list and confirm that all the security groups we've listed actually exist.

In [None]:
security_groups = get_security_groups(security_group_names)

assert set(security_groups.keys()) == set(security_group_names)
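
If the assertion fails, it does not say which names were not found. One possible way to get that detail is a set difference, sketched below; `find_missing_groups` is a hypothetical helper and the group names are made up for illustration.

```python
def find_missing_groups(requested_names, found_groups):
    """Return the requested names that are absent from found_groups, sorted."""
    return sorted(set(requested_names) - set(found_groups))

# Hypothetical lookup result in which only 'default' was found.
missing = find_missing_groups(['default', 'teammates'], {'default': {}})
print(missing)
```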

## Enable Cluster Communication

Set `internal_security_group_name` to the security group that the cluster nodes use to communicate with each other.

In [None]:
internal_security_group_name = 'default'

One feature of Amazon's security groups is that a rule can grant access to all instances that are currently using a given security group. The following script makes sure that your designated internal security group contains such a rule for itself.

In [None]:
internal_group = security_groups[internal_security_group_name]
internal_group_id = internal_group['GroupId']
internal_group_user_id = internal_group['OwnerId']

is_cluster_communication_enabled = False

for permission in internal_group['IpPermissions']:
    if 'FromPort' not in permission or 'ToPort' not in permission:
        continue

    if permission['FromPort'] != 0 or permission['ToPort'] != 65535:
        continue

    for user_id_group_pair in permission['UserIdGroupPairs']:
        if user_id_group_pair['GroupId'] != internal_group_id:
            continue

        if user_id_group_pair['UserId'] != internal_group_user_id:
            continue

        is_cluster_communication_enabled = True

if not is_cluster_communication_enabled:
    aws(
        'ec2', 'authorize-security-group-ingress',
        '--group-name', internal_security_group_name,
        '--protocol', 'tcp', '--port', '0-65535',
        '--source-group', internal_group_id,
        '--group-owner', internal_group_user_id)

    updated_security_groups = get_security_group(internal_security_group_name)
    internal_group = updated_security_groups[internal_security_group_name]
    security_groups[internal_security_group_name] = internal_group
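
The check performed by the cell above can also be written as a small function and tried against sample data. This is a sketch under assumptions: `allows_self_traffic` is a hypothetical helper, and the two sample groups (IDs, owner ID) are invented. As in the cell above, rules without `FromPort`/`ToPort` fields are skipped.

```python
def allows_self_traffic(group):
    """Return True if the group has a 0-65535 ingress rule that references itself."""
    for permission in group.get('IpPermissions', []):
        # Only rules covering the full port range count.
        if permission.get('FromPort') != 0 or permission.get('ToPort') != 65535:
            continue
        for pair in permission.get('UserIdGroupPairs', []):
            same_group = pair.get('GroupId') == group['GroupId']
            same_owner = pair.get('UserId') == group['OwnerId']
            if same_group and same_owner:
                return True
    return False

# Hypothetical group that already references itself on all ports.
open_group = {
    'GroupId': 'sg-11111111',
    'OwnerId': '123456789012',
    'IpPermissions': [{
        'FromPort': 0, 'ToPort': 65535, 'IpProtocol': 'tcp',
        'UserIdGroupPairs': [{'GroupId': 'sg-11111111', 'UserId': '123456789012'}],
        'IpRanges': [],
    }],
}

# Hypothetical group that only allows SSH from a CIDR block.
ssh_only_group = {
    'GroupId': 'sg-22222222',
    'OwnerId': '123456789012',
    'IpPermissions': [{
        'FromPort': 22, 'ToPort': 22, 'IpProtocol': 'tcp',
        'UserIdGroupPairs': [],
        'IpRanges': [{'CidrIp': '203.0.113.0/24'}],
    }],
}

print(allows_self_traffic(open_group), allows_self_traffic(ssh_only_group))
```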

## Enable External Access

In order to access the machines from the outside world, you have a few options. One option is to use SSH tunneling and configure your web browser to use the SSH tunnel as a SOCKS5 proxy. This will allow you to effectively use the DNS lookups on the remote machine and route your traffic through your tunnel to those machines.

* https://www.digitalocean.com/community/tutorials/how-to-route-web-traffic-securely-without-a-vpn-using-a-socks-tunnel
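
A SOCKS tunnel of this kind is typically opened with OpenSSH's `-D` (dynamic port forwarding) and `-N` (no remote command) options. The sketch below only assembles such a command and does not run it; the helper name, host name, user name, key file, and local port are all placeholder assumptions, not values from this notebook.

```python
import shlex

def build_socks_tunnel_command(host, user='ec2-user', local_port=1080, key_file=None):
    """Assemble an OpenSSH command that opens a SOCKS proxy on local_port."""
    command = ['ssh', '-N', '-D', str(local_port)]
    if key_file is not None:
        # -i selects the private key used to authenticate.
        command += ['-i', key_file]
    command.append(user + '@' + host)
    return command

# Placeholder host and key file; substitute your instance's public DNS name.
tunnel_command = build_socks_tunnel_command('ec2-host.example.com', key_file='my-key.pem')
print(' '.join(shlex.quote(part) for part in tunnel_command))
```

With the tunnel running, the browser would be pointed at `localhost` on the chosen local port as a SOCKS5 proxy.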

Another option is to simply allow all connections from your machine to the server without requiring an SSH tunnel. In theory, this is slightly less secure as it allows everyone who shares your IP address (such as an office that has only one public IP address) to access the server, but it is simpler to maintain.

Please specify the group name that will contain that information in `external_security_group_name` and whether we should always assume SSH tunneling when spinning up EC2 instances.

In [None]:
external_security_group_name = 'default'
assume_ssh_tunnel = True

The following script looks up `external_security_group_name` (which must be one of the groups in `security_group_names`) and adds your current public IP address to its rules if none of the security groups in `security_group_names` already provides you with access to your servers. This ensures that you will be able to access the cluster remotely.

In [None]:
external_group = security_groups[external_security_group_name]

is_external_communication_enabled = False

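external_ip_address_string = check_output(['curl', '-s', 'http://ipinfo.io/ip'])

# On Python 3, check_output returns bytes; decode and drop any trailing newline
if isinstance(external_ip_address_string, bytes):
    external_ip_address_string = external_ip_address_string.decode('utf-8')
external_ip_address_string = external_ip_address_string.strip()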
external_ip_address = IPAddress(external_ip_address_string)

for security_group in security_groups.values():
    for permission in security_group['IpPermissions']:
        if 'FromPort' not in permission or 'ToPort' not in permission:
            continue

        # The rule allows SSH when its port range contains port 22
        allow_ssh_port = permission['FromPort'] <= 22 <= permission['ToPort']
        allow_all_ports = permission['FromPort'] == 0 and permission['ToPort'] == 65535

        if not (assume_ssh_tunnel and allow_ssh_port) and not allow_all_ports:
            continue

        for ip_range in permission['IpRanges']:
            ip_network_string = ip_range['CidrIp']
            ip_network = IPNetwork(ip_network_string)

            if external_ip_address in ip_network:
                is_external_communication_enabled = True

if not is_external_communication_enabled:
    if assume_ssh_tunnel:
        port_range = '22'
    else:
        port_range = '0-65535'

    aws(
        'ec2', 'authorize-security-group-ingress',
        '--group-name', external_security_group_name,
        '--protocol', 'tcp', '--port', port_range,
        '--cidr', external_ip_address_string + '/32')

    updated_security_groups = get_security_group(external_security_group_name)
    external_group = updated_security_groups[external_security_group_name]
    security_groups[external_security_group_name] = external_group
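
The address-in-network test at the heart of the cell above can be tried on its own. The sketch below swaps in the standard library's `ipaddress` module for `netaddr` so that it runs without extra packages; `is_address_allowed` is a hypothetical helper and the addresses come from the reserved documentation ranges.

```python
import ipaddress

def is_address_allowed(address, cidr_blocks):
    """Return True if address falls inside any of the given CIDR blocks."""
    ip = ipaddress.ip_address(address)
    return any(
        # strict=False tolerates blocks written with host bits set.
        ip in ipaddress.ip_network(cidr, strict=False)
        for cidr in cidr_blocks
    )

my_ip = '203.0.113.10'  # placeholder public IP address

covered = is_address_allowed(my_ip, ['198.51.100.0/24', '203.0.113.0/24'])
not_covered = is_address_allowed(my_ip, ['198.51.100.7/32'])

# A /32 block matches exactly one address, which is what the cell above adds.
single_host_rule = my_ip + '/32'
print(covered, not_covered, is_address_allowed(my_ip, [single_host_rule]))
```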

## Convert Notebook to Script

The following cell will use `jupyter nbconvert` to build an `aws_group.py` which will be used in future notebooks in this series.

In [None]:
%%javascript
var script_file = 'aws_group.py';

var notebook_name = window.document.getElementById('notebook_name').innerHTML;
var nbconvert_command = 'jupyter nbconvert --stdout --to script ' + notebook_name;

var grep_command = "grep -v '^#' | grep -v -F get_ipython | sed '/^$/N;/^\\n$/D'";
var command = '!' + nbconvert_command + ' | ' + grep_command + ' > ' + script_file;

if (Jupyter.notebook.kernel) {
    Jupyter.notebook.kernel.execute(command);
}
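
The `grep` and `sed` stages in the cell above drop lines that start with `#`, drop lines that mention `get_ipython`, and collapse runs of blank lines. As a rough Python equivalent, assuming that reading of the pipeline, the hypothetical `clean_script` helper below applies the same three steps to a string.

```python
def clean_script(source):
    """Drop comment and get_ipython lines, then squeeze repeated blank lines."""
    cleaned = []
    previous_blank = False
    for line in source.splitlines():
        # Mirrors: grep -v '^#' | grep -v -F get_ipython
        if line.startswith('#') or 'get_ipython' in line:
            continue
        # Mirrors the sed expression: keep only one blank line per run.
        is_blank = (line == '')
        if is_blank and previous_blank:
            continue
        cleaned.append(line)
        previous_blank = is_blank
    return '\n'.join(cleaned) + '\n'

# Hypothetical nbconvert output used only to exercise the filter.
raw_script = (
    "# coding: utf-8\n"
    "\n"
    "\n"
    "import os\n"
    "get_ipython().run_line_magic('matplotlib', 'inline')\n"
    "\n"
    "\n"
    "\n"
    "x = 1\n"
)
print(clean_script(raw_script))
```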