# ML and AWS practical

## Logging in and the console

Once logged in, you will be placed in the AWS management console. **aws1.png**

A few things to note.
First, AWS is hosted in <a href="https://aws.amazon.com/about-aws/global-infrastructure/">multiple locations world-wide</a>. **aws2.png** 

From Amazon:
 > These locations are composed of Regions and Availability Zones. Each Region is a separate geographic area. Each Region has multiple, isolated locations known as Availability Zones. Amazon EC2 provides you the ability to place resources, such as instances, and data in multiple locations. Although rare, failures can occur that affect the availability of instances that are in the same location. If you host all your instances in a single location that is affected by such a failure, none of your instances would be available.
 
We will use just one region today, Ireland (eu-west-1). You can see (and select) the region from the drop-down list in the top right of the console.
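
To make the Region / Availability Zone relationship concrete, here is a minimal Python sketch. It assumes the standard naming scheme in which a zone name is the Region code followed by a single letter (for example `eu-west-1a` lies in `eu-west-1`); the helper name `region_of` is made up for illustration, and special zone types with longer names are not handled.

```python
import re

# Standard Availability Zone names are the Region code plus one letter,
# e.g. "eu-west-1a" is a zone inside the Region "eu-west-1".
AZ_PATTERN = re.compile(r"^([a-z]{2}(?:-[a-z]+)+-\d+)([a-z])$")


def region_of(zone_name: str) -> str:
    """Return the Region code for a standard Availability Zone name."""
    match = AZ_PATTERN.match(zone_name)
    if match is None:
        raise ValueError(f"not a standard Availability Zone name: {zone_name!r}")
    return match.group(1)


# Three zones in the Region used in this practical.
for zone in ["eu-west-1a", "eu-west-1b", "eu-west-1c"]:
    print(zone, "is in", region_of(zone))
```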

Under 'All services' you can select which AWS tool you wish to use. Today we will restrict ourselves to Elastic Compute Cloud (EC2), Simple Storage Service (S3) and Elastic MapReduce (EMR).

First we'll explore EC2, set up an 'instance' (virtual machine) and connect to it.

- Click on EC2. **aws3.png**
- Click on 'Instances' from the left pane. You will see an empty table and a big button 'Launch Instance' - click it!

#### Step 1: Choose AMI

The first step is deciding what machine image you want on your new instance. An easy start is to use one of the images created by Amazon. Select the **Ubuntu Server 18.04 LTS (HVM)** machine image. **aws4.png**
In future you could instead use a custom image that already has your software or data installed.

#### Step 2: Choose Instance Type

The second step is selecting the (virtual) hardware the instance will run on. For this practical **we ask that you use the t3.medium instance type** (as we have increased the limit on this account to allow you all to start one). **aws5.png** Click Next: Configure Instance Details.

#### Step 3: Configure Instance Details

**There is no need to modify any of this.** But a couple of items of interest:

**Autoscaling**: In the future, even for ML problems, it can be helpful to configure a load-balancer and autoscaling to increase or decrease the number of instances depending on the demand.

**Spot pricing**: AWS is rarely using all of its computational resources. To make use of this 'spare' hardware, AWS offers 'spot pricing', which is typically considerably cheaper than the on-demand price. The catch is that a spot instance may be terminated with only two minutes' warning.
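
The trade-off can be sketched with some simple arithmetic. The prices below are hypothetical placeholders (real prices vary by region and over time), and the model assumes that each interruption costs a fixed amount of repeated work, which is a simplification.

```python
def total_cost(work_hours, hourly_price, interruptions=0, hours_lost_per_interruption=0.0):
    """Cost of a job, counting the hours that must be redone after interruptions."""
    billed_hours = work_hours + interruptions * hours_lost_per_interruption
    return billed_hours * hourly_price


def saving_fraction(on_demand_cost, spot_cost):
    """Fraction of the on-demand cost saved by using spot instead."""
    return 1.0 - spot_cost / on_demand_cost


# Hypothetical prices in USD per hour, for illustration only.
ON_DEMAND_PRICE = 0.05
SPOT_PRICE = 0.015

on_demand = total_cost(10, ON_DEMAND_PRICE)
# Assume two interruptions, each losing half an hour of work.
spot = total_cost(10, SPOT_PRICE, interruptions=2, hours_lost_per_interruption=0.5)
print(f"on-demand: ${on_demand:.3f}  spot: ${spot:.3f}  "
      f"saving: {saving_fraction(on_demand, spot):.0%}")
```

Under these assumptions spot remains cheaper even with some repeated work, but a job that cannot checkpoint and restart may lose much more per interruption.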

#### Step 4: Add Storage

Simply leave it as the default 8 GiB general purpose SSD.

A note here: There are many types of storage provided by AWS:

- Low cost, slow access: Amazon Glacier
- Elastic Block Store (EBS): This is the type of storage you need in your EC2 instance. (<a href="https://aws.amazon.com/ebs/features/#Amazon_EBS_volume_types">More info</a>) This comes in four flavours:
  - slowest/cheapest: sc1 (cold HDD)
  - still cheap: st1 (throughput-optimised HDD)
  - solid-state: gp2 (general purpose SSD)
  - fast/expensive: io1 (provisioned IOPS SSD).

#### Step 5: Add Tags

As everyone is using the same account, it is very useful to **label your instance** so you can find it again. For example, click *Add Tag* then use Key = "email" and Value = your email. Without this you might struggle to find your instance later!
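
To show why the tag helps, here is a small Python sketch that picks out your instance from a shared list. The records mimic the shape in which the EC2 API reports tags (a list of `Key`/`Value` pairs); the instance IDs and email addresses are invented for illustration.

```python
def find_by_tag(instances, key, value):
    """Return the IDs of instances carrying the tag key=value."""
    matches = []
    for instance in instances:
        # Turn the list of {"Key": ..., "Value": ...} pairs into a dict.
        tags = {tag["Key"]: tag["Value"] for tag in instance.get("Tags", [])}
        if tags.get(key) == value:
            matches.append(instance["InstanceId"])
    return matches


# Invented example data: three instances on a shared account.
instances = [
    {"InstanceId": "i-0aaa", "Tags": [{"Key": "email", "Value": "alice@example.com"}]},
    {"InstanceId": "i-0bbb", "Tags": [{"Key": "email", "Value": "bob@example.com"}]},
    {"InstanceId": "i-0ccc"},  # untagged: nobody knows whose this is
]

print(find_by_tag(instances, "email", "bob@example.com"))
```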

#### Step 6: Configure Security Group

Security groups are how EC2 organises access to the instances you create. I've already created one called "justssh" which gives access to the SSH port from anywhere. Typically one would restrict this to be from just your IP address, for example. Feel free to either use a security group that already exists or create a new one. You'll need to be able to SSH into the server later.
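
The difference between "from anywhere" and "just your IP address" comes down to the source range attached to the rule. The sketch below uses Python's standard `ipaddress` module to check whether a source address falls inside a rule's CIDR range; the addresses are documentation examples, and the function only illustrates the matching logic, not how AWS implements it.

```python
import ipaddress


def ssh_allowed(source_ip, allowed_cidrs):
    """Return True if source_ip falls inside any of the allowed CIDR ranges."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


open_rule = ["0.0.0.0/0"]             # SSH from anywhere
restricted_rule = ["203.0.113.5/32"]  # SSH from a single (example) address

print(ssh_allowed("198.51.100.7", open_rule))
print(ssh_allowed("198.51.100.7", restricted_rule))
print(ssh_allowed("203.0.113.5", restricted_rule))
```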

#### Step 7: Review and Launch

When you click 'Launch' you'll be asked to select or create a key pair.

A quick detour, from ssh.com:

> Each SSH key pair includes two keys:

> A **public key** that is copied to the SSH server(s). Anyone with a copy of the public key can encrypt data which can then only be read by the person who holds the corresponding private key. Once an SSH server receives a public key from a user and considers the key trustworthy, the server marks the key as authorized in its authorized_keys file. Such keys are called authorized keys.

> A **private key** that remains (only) with the user. The possession of this key is proof of the user's identity. Only a user in possession of a private key that corresponds to the public key at the server will be able to authenticate successfully. The private keys need to be stored and handled carefully, and no copies of the private key should be distributed. The private keys used for user authentication are called identity keys.
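
To illustrate the idea of a key pair, here is a toy RSA example in Python. It uses tiny textbook primes, so it is for intuition only and offers no security; real SSH keys are far larger and the SSH protocol involves more than this.

```python
# Toy RSA with tiny primes: for illustration only, never for real security.
p, q = 61, 53
n = p * q                  # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent: modular inverse (Python 3.8+)

public_key = (n, e)        # safe to copy to the server
private_key = (n, d)       # stays with the user


def apply_key(value, key):
    """Raise value to the key's exponent, modulo the key's modulus."""
    modulus, exponent = key
    return pow(value, exponent, modulus)


message = 65
ciphertext = apply_key(message, public_key)    # anyone can encrypt
recovered = apply_key(ciphertext, private_key) # only the key holder can decrypt
print(message, ciphertext, recovered)
```

Applying the keys in the opposite order (private first, then public) also recovers the original value, which is the basis of proving that you hold the private key.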

Previously, when connecting to SHARC or ICEBERG you were using a username/password. Here AWS uses a key pair instead. Key pairs are typically more secure, allow automation, and are the standard way to authenticate SSH connections. **aws6.png**

So click 'create a new key pair' and enter a key pair name, e.g. "mikesecretkey". Then click 'download key pair'. You'll receive a file called "mikesecretkey.pem" (or whatever name you chose). Keep it safe: how you use it depends on your operating system, as described under 'SSHing into your new instance'.

Finally click 'Launch Instances'.

#### SSHing into your new instance

You'll be shown a summary stating that your instances are launching. Either click the link to the instance **aws7.png** or return to the <a href="https://eu-west-1.console.aws.amazon.com/ec2/v2/home?region=eu-west-1#Instances:sort=instanceId">list of instances</a> and filter by the email tag you entered.

You might need to wait a few seconds while the instance starts.

Select the instance, then right-click it (or use the button at the top) and choose "Connect". This will give instructions on how to SSH in. On Linux and macOS you simply need to make the key file readable only by you, `chmod 400 mikesecretkey.pem`, then ssh:

    ssh -i "mikesecretkey.pem" ubuntu@ec2-34-243-65-51.eu-west-1.compute.amazonaws.com
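
If you want to script these two steps, the following Python sketch does the same thing on Linux or macOS: it restricts the key file to owner read-only and builds the `ssh` command. The function name `prepare_ssh_command` is made up for this example, and the host name is whatever the "Connect" dialog shows for your instance.

```python
import os
import stat


def prepare_ssh_command(key_path, host, user="ubuntu"):
    """Make the key owner-read-only (chmod 400) and return the ssh command."""
    os.chmod(key_path, stat.S_IRUSR)  # same effect as `chmod 400 key_path`
    return ["ssh", "-i", key_path, f"{user}@{host}"]
```

The returned list can be passed to `subprocess.run` to open the connection, for example `subprocess.run(prepare_ssh_command("mikesecretkey.pem", host))` with `host` set to your instance's public DNS name.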

On Windows things are more complicated. <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html?icmpid=docs_ec2_console">AWS has instructions on how to get this working with PuTTY</a> but you may want to consider an <a href="https://www.bitvise.com/tunnelier">alternative SSH client</a>.



Background
- Key pairs.
- “Instances” (trade off speed vs time?)
- Spec (GPUs? CPU credits)
- Vendor
- Software? Images.
- Pricing (spot, on-demand, etc)
- Storage types (specialist: HDFS = Hadoop Distributed File System)
- Communication
- Security
- Instantiate, choices.
- Connect to instance

### AWS Command Line and API
You will often want to repeat a console operation, or to describe a complex action programmatically so that it is easy to see what has been done. To this end Amazon provides both a command line tool and an API. All the actions you can do with the console can be done via these alternative interfaces.
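
As a rough sketch of what this looks like from Python, the example below builds the request for the same launch we performed in the console and hands it to `boto3`, the AWS SDK for Python. It assumes `boto3` is installed and credentials are configured, that the "justssh" security group can be referenced by name, and that you substitute a real AMI ID; the function names and the placeholder AMI ID are invented for illustration.

```python
def build_launch_request(ami_id, key_name, email,
                         security_group="justssh", instance_type="t3.medium"):
    """Build the keyword arguments for launching one tagged instance."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "SecurityGroups": [security_group],
        "MinCount": 1,
        "MaxCount": 1,
        "TagSpecifications": [
            {"ResourceType": "instance",
             "Tags": [{"Key": "email", "Value": email}]},
        ],
    }


def launch(request, region="eu-west-1"):
    """Send the request to EC2 (needs boto3 and valid AWS credentials)."""
    import boto3  # third-party SDK, imported here so the sketch loads without it

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**request)
    return response["Instances"][0]["InstanceId"]


# "ami-0123456789abcdef0" is a placeholder, not a real image ID.
request = build_launch_request("ami-0123456789abcdef0", "mikesecretkey",
                               "you@example.com")
print(request)
```

Keeping the request as plain data makes it easy to review, or to store alongside your code, before anything is actually launched.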

We will log into our 

## EMR

- copy roughly from https://aws.amazon.com/blogs/big-data/building-a-recommendation-engine-with-spark-ml-on-amazon-emr-using-zeppelin/
(just spark, hadoop)

