Amazon Web Services (AWS)

* Objectives:
    * Describe core AWS services and concepts
    * Configure laptop to use AWS
    * Use SSH key to access EC2 instances
    * Launch and access EC2
    * Access S3

1) Amazon Web Services Core Concepts
* Why use AWS?
    * AWS provides on-demand use of computing resources in the cloud:
        * No need to build data centers
        * Easy to create a new business
        * Only pay for what you use
        * Handle spikes in computational demand
        * Secure, reliable, flexible, scalable, and cost-effective
    * AWS skills are much in demand
* AWS Services Overview:
    * Compute:
        * **Elastic Compute Cloud (EC2)** - computers/virtual machines for diverse problems     
        * **EC2 Container Service (ECS)** - manage/launch Docker containers at scale
        * **Elastic Beanstalk** - automatically handles deployment of the code you upload to it (capacity provisioning, load balancing, auto-scaling, and application health monitoring)
            * for people who don't understand AWS (will provision for you)
        * **Lambda** - AWS Function-as-a-Service (FaaS) offering that lets you run code without provisioning or managing servers
            * upload code to the cloud and have it run in response to events (e.g. when an image is uploaded, run code that writes text onto it)
        * **Lightsail** - virtual private server (VPS) -> provisions server with fixed IP (simplified EC2)
        * **Batch** - used for batch computing in cloud
    * Storage:
        * **Simple Storage Service (S3)** - object-based long-term bulk storage using buckets
        * **Elastic File System (EFS)** - network attached storage (NAS) device that can be mounted to virtual machines (EC2)
            * allows you to connect it to multiple EC2 instances at once
            * simple, scalable file storage for EC2 instances
        * **Glacier** - used for data archival (rarely pull data like once a year)
        * **Snowball** - used when you want to bring large amounts of data into an AWS datacenter by physically shipping it on disk
            * petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of the AWS cloud
        * **Storage Gateway** - hybrid storage service that enables your on-premises applications to seamlessly use storage in the AWS cloud (connects an on-premises software application or virtual machine with cloud-based storage)
            * virtual appliances installed in your datacenter that replicate information back to S3
    * Databases:
        * **Relational Database Service (RDS)** - relational databases e.g. MySQL, Microsoft SQL Server, PostgreSQL, Amazon Aurora (compatible with MySQL and PostgreSQL)
        * **DynamoDB** - non-relational databases
        * **ElastiCache** - caches commonly queried data (e.g. cache top 10 products, which frees your database to do other queries)
        * **Redshift** - distributed data warehousing/business intelligence
    * Migration:
        * **AWS Migration Hub** - tracking service when you migrate things to AWS
        * **Application Discovery Service** - automated set of tools that detect the applications you have along with their dependencies (e.g. SQL server dependency)
        * **Database Migration Service (DMS)** - easy way to migrate databases from on-premises to AWS (Oracle hated when this was released)
        * **Server Migration Service** - helps with migrating servers (virtual and physical) into AWS
    * Networking & Content Delivery:
        * **Virtual Private Cloud (VPC)** - a virtual datacenter to configure firewalls, availability zones, etc.
            * a virtual network dedicated to a single AWS account
            * it is logically isolated from other virtual networks in AWS cloud, providing compute resources with security and robust networking functionality
        * **CloudFront** - Amazon's Content Delivery Network (CDN) e.g. media assets (video/image files) can be moved closer to your customers
        * **Route 53** - Amazon's DNS service e.g. a data catalog for looking up IP addresses
            * name is a portmanteau of Route 66, an American highway, and 53, the port used for DNS
        * **API Gateway** - allows you to create your own API for your other services
        * **Direct Connect** - allows you to run a dedicated line from business datacenter to AWS VPC
    * Developer Tools:
        * **CodeStar** - gets a group of developers to work together (project manages code)
        * **CodeCommit** - a place to store code (source control service, like a private GitHub repository)
        * **CodeBuild** - builds code from CodeCommit, runs tests against it, and produces software packages ready for deployment
        * **CodeDeploy** - deployment services for EC2 instances or to on-premise instances or to Lambda functions
        * **CodePipeline** - continuous delivery service that automates the steps required to release the software
        * **X-Ray** - used to debug serverless applications to find root cause of performance bottlenecks
        * **Cloud9** - IDE environment to develop code in AWS console
    * Management Tools:
        * **CloudWatch** - monitoring service (used by sysops administrator) to monitor performance of EC2 instances (e.g. CPU utilization, Disk IO, etc.)
        * **CloudFormation** - scripts infrastructure (for building datacenters) as code
            * automated provisioning engine designed to deploy entire cloud environments via a JSON or YAML template
        * **CloudTrail** - logs changes when you make modifications to your environment on AWS (event history is retained for 90 days by default) e.g. when you create an S3 bucket or instantiate an EC2 instance, it triggers an API call which is logged
            * enables governance, compliance, operational auditing, and risk auditing of AWS account
        * **Config** - monitors configuration of entire AWS environment over time (with point in time snapshots)
        * **OpsWorks** - configuration management service that uses Chef, an automation platform that treats server configurations as code
            * similar to Elastic Beanstalk, but more robust, using Chef and Puppet to automate environment configuration
        * **Service Catalog** - manages catalog of IT services to be approved for use on AWS (e.g. virtual machine images, individual operating systems, software, databases). Can be split into multi-tier architecture (for governance and compliance requirements)
        * **Systems Manager** - manages AWS resources for EC2 (e.g. patch maintenance to be rolled out to all EC2 instances)
        * **Trusted Advisor** - gives advice across multiple disciplines like security (e.g. ports left open) and cost (e.g. services not taken advantage of, ways to save money on your AWS services)
        * **Managed Services** - allows AWS to manage your EC2 instances (auto-scales)
    * Media Services:
        * **Elastic Transcoder** - takes a video file and converts it so it fits other platforms (e.g. video file to Android, iPhone, iPad platforms)
            * e.g. digital media agency wants to convert its media files to formats that can be viewed on a variety of devices
        * **MediaConvert** - file-based video transcoder service (allows to create video-on-demand delivery at scale)
        * **MediaLive** - live video broadcast service (high-quality streaming service) 
        * **MediaPackage** - prepares and protects your video services
        * **MediaStore** - place that is optimized to store media (good performance and low latency)
        * **MediaTailor** - allows for targeted advertising for video streams without sacrificing broadcast quality
    * Machine Learning:
        * **SageMaker** - managed platform that makes it easy to build, train, and deploy machine learning (including deep learning) models
        * **Comprehend** - sentiment analysis around data (e.g. positive or negative comments about your product)
        * **DeepLens** - deep-learning-enabled camera (knows what it is looking at)
        * **Lex** - powers Amazon Alexa Services; allows for communication with customers using AI
        * **Machine Learning** - normal machine learning algorithms
        * **Polly** - takes text and turns it into speech (into different languages)
        * **Rekognition** - video and image recognition by submitting video or images
        * **Amazon Translate** - translate text from one language to another
        * **Amazon Transcribe** - automatic speech recognition (converts speech to written text)
    * Analytics:
        * **Athena** - run SQL queries against data stored in S3 buckets
        * **Elastic MapReduce (EMR)** - processes large amounts of data for analysis
        * **CloudSearch / Elasticsearch Service** - search services for AWS (e.g. search service for discussion forums)
        * **Kinesis** - ingests large amount of streaming data in AWS e.g. tweets in real-time
            * used for collecting large amounts of data streamed from multiple sources
        * **Kinesis Video Streams** 
        * **QuickSight** - BI tool that is fast, cloud-powered and makes building visualizations, performing ad-hoc analysis easy
        * **Data Pipeline** - Moving data between different data services
        * **Glue** - ETL service
    * Security & Identity & Compliance:
        * **Identity and Access Management (IAM)** - allows management of users and their level of access to the AWS console
            * add new users to the AWS account and set password rotation policies for them
        * **Cognito** - device authentication for mobile applications (e.g. Facebook, Google, LinkedIn login)
            * e.g. allows your user to authenticate to your DynamoDB to store geographic locations
        * **GuardDuty** - monitors for malicious activities on AWS account
        * **Inspector** - an agent that is installed on a virtual machine (EC2 instance) and runs tests against it (e.g. for security vulnerabilities)
        * **Macie** - scans S3 buckets for personally identifiable information (names, tax file numbers, SSNs, passwords, credit card numbers, etc.) and alerts you
        * **Certificate Manager** - manages SSL certificates (register domain through Route53)
        * **Cloud Hardware Security Module (CloudHSM)** - dedicated hardware to store keys (private/public) to access EC2 instances, and other encryption keys
        * **Directory Service** - integrates Microsoft Active Directory services with AWS services
        * **Web Application Firewall (WAF)** - implements layer 7 firewall (stops cross-site scripting, sql injection, etc.) where it is looking at the application layer and seeing if the user is conducting malicious activity and if we should take action
        * **Shield** - (comes with CloudFront) conducts Distributed Denial-of-Service (DDoS) mitigation (prevents it)
            * **Shield Advanced** - dedicated 24/7 mitigation for DDoS
        * **Artifact** - on-demand access to audit and compliance reports and agreements (e.g. download SOC controls, PCI reports, etc.)
    * Mobile Services:
        * **Mobile Hub** - management console (e.g. you have a mobile app, create a mobile hub with AWS services, which generates a cloud configuration file, then use the AWS Mobile SDK to connect the mobile app to the AWS backend)
        * **Pinpoint** - uses targeted push notifications to drive mobile engagement (e.g. letting customers know about a deal if they are close to a restaurant)
        * **AWS AppSync** - automatically updates the web and mobile applications in real-time and also for offline uses as soon as it reconnects
        * **Device Farm** - testing platform for real live devices (e.g. iphone/android devices)
        * **Mobile Analytics**
    * Augmented Reality (AR) / Virtual Reality (VR):
        * **Sumerian** - platform for 3D application design where you use a common set of tools for different environments
    * Application Integration:
        * **Step Functions** - managing your different Lambda functions
        * **Amazon MQ** - managed message broker service for message queues (similar to running your own RabbitMQ)
        * **SNS** - notification service (e.g. billing alarm when over 10 dollars)
        * **SQS** - decoupling infrastructure (holds information in a queue for you until you're ready to use it) (e.g. meme website when user uploads image and has text ready)
        * **Simple Workflow Service (SWF)** - creates a workflow for a job (e.g. Amazon uses this in their warehouses: when you order a package online, it creates a simple workflow job)
    * Customer Engagement:
        * **Connect** - contact center as a service (e.g. your own call center in cloud)
        * **Simple Email Service** - allows you to send large amount of email at scale
    * Business Productivity:
        * **Alexa For Business** - use Alexa for business tasks (e.g. dial into a meeting room, inform IT that the printer is broken, order ink for the company)
        * **Chime** - video conferencing (similar to Google Hangouts)
        * **WorkDocs** - Dropbox for AWS (safely and securely stores work-related documents)
        * **WorkMail** - email service through AWS (similar to Gmail)
    * Desktop & App Streaming:
        * **WorkSpaces** - VDI solution for running operating systems (e.g. Windows/Linux) in AWS that stream to your device
            * VDI solution that replaces local desktop environment (Desktop-as-a-Service)
        * **AppStream** - streams an application that's running in the cloud live to your devices
    * Internet of Things (IoT):
        * **IoT** - captures sensor information (e.g. temp, humidity, etc.) sent by thousands of devices at scale
        * **IoT Device Management**
        * **Amazon FreeRTOS** - operating system for microcontrollers
        * **Greengrass** - lets connected devices run local compute, messaging, data caching, sync, and machine learning inference in a secure manner
    * Game Development:
        * **GameLift** - a service for deploying, operating, and scaling dedicated game servers for multiplayer games in the cloud
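
The Lambda bullet above mentions code that reacts when an image is uploaded. A minimal sketch of what such a function might look like in Python follows. The function name `handler` and the sample event are illustrative assumptions; the nested `Records -> s3 -> bucket/object` layout follows the S3 event notification format, and real events carry many more fields.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Hypothetical Lambda entry point: collect the S3 objects named in the event."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append(f"s3://{bucket}/{key}")
    return {"uploaded": uploaded}


# Local dry run with a hand-written event (no AWS account needed).
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "memes/cat+photo.png"},
            }
        }
    ]
}
result = handler(sample_event, None)
print(result)
```

Running the handler locally with a hand-built event like this is a cheap way to test the logic before uploading anything to AWS.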

1.1) AWS Main Services Overview:
* AWS Core Services:
    * **Elastic Compute Cloud (EC2)** - computers/virtual machines for diverse problems        
    * **Elastic Block Store (EBS)** - virtual hard disks for use with EC2
    * **Simple Storage Service (S3)** - long-term bulk storage
    * **DynamoDB** - managed non-relational database
* **Elastic Compute Cloud (EC2)** - spin up EC2 instances for on-demand computing power
    * **Instance** - a type of hardware you can rent (e.g. "m3.xlarge" or "t2.micro")
    * **Amazon Machine Image (AMI)** - an OS you can run on an instance
    * **Region** - a geographic region such as Oregon (e.g. "us-west-2")
        * a geographical area divided into availability zones (with at least two)
    * **Availability Zone (AZ)** - a specific subset of a region, often a data center (e.g. "us-west-2a")
        * each with redundant power, networking and connectivity, housed in separate facilities
    * **Edge Locations** - endpoints for AWS which are used for caching content (e.g. CloudFront, Amazon's Content Delivery Network (CDN))
* **Elastic Block Store (EBS)** - EBS provides disk-drive style storage
    * Create a virtual hard disk
    * Then, mount virtual hard disk on EC2 instances
    * SSD or magnetic
    * Can store data even when you aren't running any EC2 instances
    * Built-in redundancy
    * **Lower latency** than S3, but **more expensive**
* **Simple Storage Service (S3)** - provides cheap, bulk storage
    * Create a **bucket**, which serves as a container for files and directories
    * Specify permissions using an **Access Control List (ACL)**
    * Access via URL or AWS CLI or suitable API
    * **Higher latency** than EBS, but **less expensive**
* **DynamoDB** and **RDS** - provide databases in the cloud
    * RDS supports most common flavors of SQL (Oracle, MySQL, etc.)
        * Once set up, works like a normal SQL database
    * DynamoDB is a managed non-relational (key-value and document) database
    * AWS supports other database types as well
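
The region and availability zone names in the EC2 bullets above ("us-west-2", "us-west-2a") follow a visible pattern: the AZ name is the region name plus a letter. The sketch below splits one into the other. The regular expression is an assumption inferred from those examples, and it does not cover special cases such as Local Zones.

```python
import re

# Assumed shape: "<area>-<direction...>-<number><zone letter>", e.g. "us-west-2a".
AZ_PATTERN = re.compile(r"^(?P<region>[a-z]{2}(?:-[a-z]+)+-\d+)(?P<zone>[a-z])$")


def split_availability_zone(az_name):
    """Return (region, zone_letter) for a standard availability zone name."""
    match = AZ_PATTERN.match(az_name)
    if match is None:
        raise ValueError(f"not a standard availability zone name: {az_name!r}")
    return match.group("region"), match.group("zone")


print(split_availability_zone("us-west-2a"))
```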

1.2) **Identity and Access Management (IAM)** - allows management of users and their level of access to the AWS console
* IAM functionality:
    * Centralised control of your AWS account
    * Shared access to your AWS account
    * Granular Permissions
    * Identity Federation (including Active Directory, Facebook, LinkedIn, etc.)
    * Multifactor Authentication
    * Provide temporary access for users/devices and services where necessary
    * Allows you to set up your own password rotation policy
    * Integrates with many different AWS services
    * Supports PCI DSS Compliance
* Terminology:
    * Users - End Users
    * Groups - A collection of users under one set of permissions (e.g. finance group, sysops group, etc.)
    * Roles - create roles and then assign them to AWS resources
    * Policies - a document that defines one (or more) permissions in JSON format
        ```json
        {
          "Version": "2012-10-17",
          "Statement": {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::example_bucket"
          }
        }
        ```
* Facts about IAM:
    * IAM is universal (does not apply to specific regions at this time)
    * The "root account" is the first account created when setting up AWS and has complete Admin access by default
    * New users have no permissions when first created
    * New users are assigned an **Access Key ID & Secret Access Key** when first created, which are used to authenticate with the APIs, SDKs, and CLI
        * They are not the same as passwords, which are used to sign in to the AWS console
    * Secret Access Keys & Passwords can only be viewed once (if you lose them, you have to regenerate them), so save them in a secure location
    * Always set up Multi-factor Authentication (MFA) on the root account
    * Can create and customize password rotation policies
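
To make the policy document from the Terminology list concrete, here is a toy sketch that builds the same JSON in Python and evaluates a request against it. The helpers `make_policy` and `is_allowed` are hypothetical and heavily simplified (no conditions, no principals, case-sensitive matching); they are not how IAM itself is implemented. They only illustrate two ideas from this section: nothing is allowed unless a policy says so, and an explicit deny wins.

```python
import json
from fnmatch import fnmatchcase


def make_policy(action, resource, effect="Allow"):
    """Build a single-statement policy document like the one shown above."""
    return {
        "Version": "2012-10-17",
        "Statement": {"Effect": effect, "Action": action, "Resource": resource},
    }


def _as_list(value):
    return value if isinstance(value, list) else [value]


def is_allowed(policy, action, resource):
    """Toy evaluation: default deny, explicit Deny beats Allow."""
    allowed = False
    for statement in _as_list(policy["Statement"]):
        action_hit = any(fnmatchcase(action, a) for a in _as_list(statement["Action"]))
        resource_hit = any(fnmatchcase(resource, r) for r in _as_list(statement["Resource"]))
        if not (action_hit and resource_hit):
            continue
        if statement["Effect"] == "Deny":
            return False
        allowed = True
    return allowed


policy = make_policy("s3:ListBucket", "arn:aws:s3:::example_bucket")
print(json.dumps(policy, indent=2))
```

With this policy, listing `example_bucket` is allowed, while any other action (for example `s3:GetObject`) falls through to the default deny.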

1.3) Setting Up Billing Alarm
* AWS allows you to set up billing alarms through CloudWatch in case your spending goes over a certain threshold per month
    * Enable billing alerts in Billing & Cost Management
    * Set threshold through CloudWatch for Billing
* e.g. Send email alert when spending is over \$10 per month

2) AWS **Command Line Interface (CLI)**
* Use the AWS CLI:
    * To debug your configuration
    * To manage AWS instances in EC2
    * To access S3
* Installing AWS CLI on OS X or Linux
    * On Linux:
    ```bash
    curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
    unzip awscli-bundle.zip
    sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws
    ```
    * On OS X:
    ```bash
    # Can also install via brew (or pip)
    brew install awscli
    # But, may not be the latest version
    ```
* Obtaining AWS credentials:
    1. Login to AWS
    2. Click "Your Account" in the upper right menu bar
    3. Select "Security Credentials"
    4. Select "Users"
    5. Then, select your username and click "User Actions > Manage Access Keys"
    6. Create your credentials
    7. Choose "Oregon (us-west-2)" as our region
* Configuration of AWS CLI - run `aws configure`
    * Creates default profile in `~/.aws/config`
    * Stores credentials in `~/.aws/credentials`
    * Can create multiple profiles:
    ```bash
    aws configure --profile fancy_profile
    ```
    * Can also set credentials on the CLI or via environment variables
* Create AWS configuration info in `~/.aws`:
```bash
aws configure
    AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
    AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEX
    Default region name [None]: us-west-2
    Default output format [None]: json
```
* Some tools get AWS credentials via environment variables. Set the following in `~/.bash_profile` or equivalent:
```bash
export AWS_ACCESS_KEY_ID='your access key'
export AWS_SECRET_ACCESS_KEY='your secret key'
```
    * Must set `AWS_*` environment variables to use some big data tools
    * Problematic if using AWS in multiple data centers
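
The interplay between named profiles and environment variables can be sketched as follows. The function `resolve_credentials` is a hypothetical helper, not part of any AWS tool; it mimics one piece of the lookup order (environment variables take precedence over the credentials file) and parses the INI layout that `aws configure` writes. All key values are placeholders.

```python
import configparser

# Same INI layout as ~/.aws/credentials; every value here is a placeholder.
SAMPLE_CREDENTIALS = """\
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = default-profile-secret

[fancy_profile]
aws_access_key_id = AKIAEXAMPLEFANCY0000
aws_secret_access_key = fancy-profile-secret
"""


def resolve_credentials(env, credentials_text, profile="default"):
    """Return (access_key_id, secret_key, source), preferring environment variables."""
    if "AWS_ACCESS_KEY_ID" in env and "AWS_SECRET_ACCESS_KEY" in env:
        return env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"], "environment"
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    section = parser[profile]
    return (
        section["aws_access_key_id"],
        section["aws_secret_access_key"],
        f"profile {profile}",
    )


print(resolve_credentials({}, SAMPLE_CREDENTIALS, "fancy_profile"))
```

In real use, `env` would be `os.environ` and `credentials_text` the contents of `~/.aws/credentials`. This also shows why exported `AWS_*` variables can be a surprise: they silently override whichever profile you meant to use.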
* Verify configuration in S3 and EC2:
    * Check S3:
    ```bash
    aws s3 ls
        2015-08-25 10:42:43 dsci
        2015-08-25 11:30:33 seattle-dsi 
    aws s3 ls s3://seattle-dsi
        PRE cohort1/
        PRE skrainka/
    ```
    * Check EC2:
    ```bash
    aws ec2 describe-instances --output table
    aws ec2 describe-instances --output json
    ```
* Help with AWS CLI
```bash
aws help
aws ec2 help
```
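
The JSON printed by `aws ec2 describe-instances --output json` is nested as reservations containing instances, which is awkward to read by eye. A small sketch for flattening it is shown below. The sample document is hand-written and heavily trimmed (the instance ID and tag are made up); real output has many more fields per instance.

```python
import json

# Trimmed, hand-written stand-in for real `describe-instances` output.
SAMPLE_OUTPUT = """
{
  "Reservations": [
    {
      "Instances": [
        {
          "InstanceId": "i-0123456789abcdef0",
          "InstanceType": "t2.micro",
          "State": {"Name": "running"},
          "PublicDnsName": "ec2-54-186-136-57.us-west-2.compute.amazonaws.com",
          "Tags": [{"Key": "Name", "Value": "master"}]
        }
      ]
    }
  ]
}
"""


def summarize_instances(describe_output):
    """Flatten the Reservations -> Instances nesting into one row per instance."""
    rows = []
    for reservation in json.loads(describe_output).get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tags = {tag["Key"]: tag["Value"] for tag in instance.get("Tags", [])}
            rows.append({
                "id": instance["InstanceId"],
                "type": instance.get("InstanceType", ""),
                "state": instance["State"]["Name"],
                "dns": instance.get("PublicDnsName", ""),
                "name": tags.get("Name", ""),
            })
    return rows


for row in summarize_instances(SAMPLE_OUTPUT):
    print(row)
```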

3) Configure and use SSH to log into AWS EC2
* To use AWS:
    * Set up **Key Pairs** to access EC2
    * Configure SSH on your laptop
    * Can use SSH to access any remote machine running an SSH server
* **Secure Shell (SSH) Protocol** - allows you to:
    * Login to remote machines, such as EC2 using `ssh`
    * Transfer files between remote machines using `scp` and `sftp`
    * Execute commands on remote machines using `ssh`
    * Do **not** use telnet, rlogin, or FTP, which are older, insecure protocols!
* **Public Key Encryption** - SSH uses *public-key encryption* to protect access:
    * Generate a **public** and **private** key
        * Known as a **key pair**
    * Data encrypted with the public key can only be decrypted with the private key
        * Share the public key freely, but keep the private key safe!
    * Create a key pair for each resource you want to access (AWS, GitHub, etc.)
        * can revoke individual keys in case of a security breach
* Set up **Key Pairs** - create and configure key pair to access EC2
    * Create and import key pair as described in the AWS documentation
    * Set permission on private key to 400:
    ```bash
    chmod 400 ~/.ssh/bss-aws-master.pem
    chmod 644 ~/.ssh/bss-aws-master.pub
    ```
    * Can also generate key pair with `ssh-keygen`
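
`ssh` refuses to use a private key that other users can read, which is why the permission step above matters. The sketch below checks and tightens a key file's mode from Python on a POSIX system. The helper names are hypothetical, and the demo runs against a throwaway temporary file rather than a real key.

```python
import os
import stat
import tempfile


def key_mode_is_private(path):
    """True if neither group nor others have any permission bits on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def restrict_key(path):
    """Equivalent of `chmod 400 path`: owner read-only."""
    os.chmod(path, 0o400)


# Demo on a temporary stand-in for a .pem file.
with tempfile.NamedTemporaryFile(suffix=".pem", delete=False) as handle:
    demo_key = handle.name
os.chmod(demo_key, 0o644)          # deliberately too open
before = key_mode_is_private(demo_key)
restrict_key(demo_key)
after = key_mode_is_private(demo_key)
os.remove(demo_key)
print(before, after)
```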
* Configuring SSH - modify SSH configuration to:
    * Create alias for long-running instance
    * Forward X11 or security information
    * Specify which key to use
    * `man ssh_config` for details
* Example SSH Config (`~/.ssh/config`):
    ```sh
    Host github.com
        HostName github.com
        User git
        ForwardAgent yes
        IdentityFile /Users/bss/.ssh/git-hub-id_rsa
    ```
* Master SSH Configuration - if your master will run for a long time, set up an alias:
    ```sh
    Host master
        HostName ec2-54-186-136-57.us-west-2.compute.amazonaws.com
        User ubuntu
        ForwardAgent yes
        ForwardX11Trusted yes
        TCPKeepAlive yes
        IdentityFile /Users/bss/.ssh/aws-master.pem
    ```
    * Now, `ssh master` will connect to your EC2 instance
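
Since EC2 public DNS names change whenever an instance is relaunched, it can be convenient to regenerate the `Host` block rather than edit it by hand. A minimal sketch follows; `render_host_block` is a hypothetical helper, and the keywords it emits are the same ones used in the example config above.

```python
def render_host_block(alias, hostname, user, identity_file, **options):
    """Render one `Host` entry in ~/.ssh/config syntax."""
    lines = [f"Host {alias}", f"    HostName {hostname}", f"    User {user}"]
    for keyword, value in options.items():
        lines.append(f"    {keyword} {value}")
    lines.append(f"    IdentityFile {identity_file}")
    return "\n".join(lines) + "\n"


block = render_host_block(
    "master",
    "ec2-54-186-136-57.us-west-2.compute.amazonaws.com",
    "ubuntu",
    "/Users/bss/.ssh/aws-master.pem",
    ForwardAgent="yes",
    TCPKeepAlive="yes",
)
print(block)
```

The output is meant to be pasted into (or appended to) `~/.ssh/config`; the sketch deliberately does not write to that file itself.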
* Accessing an EC2 instance with `ssh`:
    1. Launch an EC2 instance from console
    2. Use `ssh` from command line to connect to the instance's **public DNS** (shown in EC2 Dashboard):
    ```bash
    # Ubuntu AMIs use the "ubuntu" user
    ssh -X -i ~/.ssh/aws-master.pem ubuntu@ec2-54-186-136-57.us-west-2.compute.amazonaws.com

    # or, for Amazon Linux AMIs, the "ec2-user" user
    ssh -X -i ~/.ssh/aws-master.pem ec2-user@ec2-54-186-136-57.us-west-2.compute.amazonaws.com
    ```

4) Transferring Files with `scp` and `sftp` and Managing Sessions with `tmux`
* `scp` - to copy files between machines:
    * Works just like regular `cp`
    * Good for simple operations (if you specify remote user and machine (IP, DNS) correctly)
    * Reference remote location as `user@host:path`
        ```bash
        scp -i ~/.ssh/bss-aws-master.pem ./toy_data.txt ubuntu@54.186.136.57:/home/ubuntu/data
            toy_data.txt 100% 136 0.1KB/s 00:00
        ```
* `sftp` - to copy files interactively:
    * Interactive shell for transferring files
    * Use to transfer many files
    * Use when you don't know the location of file
        ```bash
        sftp -i ~/.ssh/bss-aws-master.pem ubuntu@54.186.136.57
            Connected to 54.186.136.57.
            sftp> help
        ```
* `tmux` - persist jobs across multiple sessions:
    * On logout, all child processes terminate
    * Use `tmux` to safely disconnect from a session
    * Reconnect on next login
    * Install `tmux` via brew or Linux package manager
    * See `tmux` exercise

5) EC2 Tips
* Always create instances with tags so that you can find them easily
* Choose the appropriate hardware type for your problem
* If in doubt, use Ubuntu because it is a friendly flavor of Linux
* Use `tmux` when you login in case you need to disconnect or your connection dies
* Be paranoid: sometimes Amazon will reboot or reclaim your instances
* Put data you need to persist in EBS or a database
* Never put AWS keys in GitHub because someone will steal them
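
One way to act on the last tip is to scan files for anything that looks like an access key ID before committing. The sketch below is a rough check, assuming long-term access key IDs look like the example used earlier in these notes (`AKIA` followed by 16 uppercase letters or digits). It will not catch secret keys or other credential types, so treat it as a safety net rather than a guarantee.

```python
import re

# Assumed format of long-term access key IDs, e.g. AKIAIOSFODNN7EXAMPLE.
ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_access_key_ids(text):
    """Return every substring of `text` that looks like an AWS access key ID."""
    return ACCESS_KEY_ID.findall(text)


suspicious = "export AWS_ACCESS_KEY_ID='AKIAIOSFODNN7EXAMPLE'"
print(find_access_key_ids(suspicious))
```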

6) S3 Configuration
* S3 Basics:
    * Can find URL to access a file from S3 console
    * Set properties (access) via S3 console
    * Make sure names conform to S3 conventions:
        * lowercase bucket names of 3 to 63 characters
        * no leading or trailing "."
* Location to store S3 Files:
    * Use the bucket `denver-dsi`
    * Create a directory with your surname under the `cohort1` sub-directory
    * Put your lab files under `denver-dsi/cohort1/your_surname`
* **Boto** configuration - to access S3 via Python, use the `boto` package:
    * Should be installed if you followed setup instructions
    * Make sure `boto` is up to date:
    ```bash
    conda update boto
    ```
    * Uses credentials in `~/.aws/credentials`
    * Can also read directly from Pandas if you specify S3 URL
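
The naming conventions and the lab file layout described above can be checked before anything is uploaded. The sketch below is a simplified validator, assuming the general-purpose bucket rules (3 to 63 characters; lowercase letters, digits, hyphens, and dots; starting and ending with a letter or digit; not shaped like an IP address). Both function names and the `your_surname` placeholder are illustrative.

```python
import re

BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
IP_LIKE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def is_valid_bucket_name(name):
    """Simplified check of S3 bucket naming rules (not exhaustive)."""
    if not BUCKET_NAME.match(name):
        return False
    if ".." in name or IP_LIKE.match(name):
        return False
    return True


def lab_file_url(bucket, cohort, surname, filename):
    """Build the s3:// URL for a lab file under <bucket>/<cohort>/<surname>/."""
    if not is_valid_bucket_name(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    return f"s3://{bucket}/{cohort}/{surname}/{filename}"


print(lab_file_url("denver-dsi", "cohort1", "your_surname", "toy_data.txt"))
```

A URL in this form is the kind of S3 location the Pandas bullet refers to, provided the optional S3 support for Pandas is installed.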

7) Advanced Issues
* Use an `ssh` tunnel to run ipython notebook on a remote instance:
    * On remote host:
    ```bash
    ipython notebook --no-browser --port=8889
    ```
    * On local machine:
    ```bash
    ssh -N -f -L localhost:8888:localhost:8889 remote_user@remote_host -i ~/.ssh/aws_key.pem
    ```
    * Access notebook via browser at URL `localhost:8888`
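
Because `ssh -f` puts the tunnel in the background, it is not always obvious whether it came up. A small sketch for checking that something is listening on the local end of the tunnel is shown below. `port_is_open` is a hypothetical helper, and port 8888 matches the example above.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8888 is the local end of the tunnel in the notes above.
    if port_is_open("localhost", 8888):
        print("Tunnel is up: open http://localhost:8888")
    else:
        print("Nothing is listening on localhost:8888 yet")
```

A successful connection only shows that the local port accepts connections; it does not prove the notebook server on the remote host is running.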