# Storage

There are 3 main storage types in AWS.

* `block`: `EBS` (persistent) and instance store (ephemeral)
* `file`: Amazon `EFS` (Elastic File System)
* `object`: Amazon `S3` and `Glacier`

Deciding on which type of storage to use depends on your `data dimensions`.

* Volume, Variety, Velocity
  * volume is the size of the data
  * variety is the heterogeneity of the data types (text, videos, pictures, etc.)
  * velocity is how fast data is flowing through
* Temperature
  * `hot` data is actively used
  * `warm` data is actively used but less than hot data
  * `cold` data is occasionally used data
  * `frozen` data is non-actively used data
* Value
  * `transient` data has a short lifespan
  * `reproducible` data is derived data
  * `authoritative` data is ground truth data
  * `critical` data is data that must be kept 
  
<div class="alert alert-info">
    
**Note:** AWS follows the `CIA` (Confidentiality, Integrity, Availability) model of information security. Confidentiality is achieved through permissions and encryption, which keep data private. Integrity centers on the accuracy of data. Availability means the service can be reached to store and retrieve data.
    
</div>

## Block storage

### EBS

Here are some things to consider about EBS.

* Network-attachment: EBS volumes are not physically attached to an EC2 instance; instead, they are network-attached
* Resizing: EBS volumes may be resized; this feature is referred to as `elastic volumes`
* SSD vs HDD: SSD volumes are optimized for IOPS (input/output operations per second) while HDD volumes are optimized for throughput
* Snapshots: Snapshots of EBS volumes are possible and efficient, as they are incremental
* EBS-optimized instances: Instances that are EBS-optimized provide dedicated bandwidth for EBS traffic, separate from other network traffic
* Encryption: Data stored on EBS volumes may be encrypted using [AES-256](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard) with keys managed by the AWS `KMS` (Key Management Service)
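
The SSD versus HDD distinction above comes down to simple arithmetic: throughput is roughly IOPS multiplied by the size of each I/O. The sketch below illustrates this; the function name and the workload numbers are hypothetical and are not published EBS limits.

```python
def throughput_mib_per_s(iops: int, io_size_kib: int) -> float:
    """Approximate throughput as IOPS multiplied by the size of each I/O."""
    return iops * io_size_kib / 1024


# Illustrative workloads (hypothetical numbers, not EBS limits).
small_random = throughput_mib_per_s(iops=3000, io_size_kib=16)       # many small I/Os
large_sequential = throughput_mib_per_s(iops=250, io_size_kib=1024)  # few large I/Os

print(f"small random I/O:     {small_random:.1f} MiB/s")
print(f"large sequential I/O: {large_sequential:.1f} MiB/s")
```

A workload of many small random I/Os is bounded by IOPS, while a workload of few large sequential I/Os is bounded by throughput, which is why the two volume families are tuned differently.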

<div class="alert alert-info">
    
**Note:** When EBS volumes are restored from snapshots, the blocks must be pulled down from S3 and written to the volume, a process known as `initialization`, which may take significant time.
    
</div>
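
To make "incremental" concrete, here is a toy model in which a volume is a mapping from block index to contents and each snapshot stores only the blocks that changed since the previous one. This is a conceptual sketch only; the function names are hypothetical and it does not reflect how EBS stores snapshots internally.

```python
from __future__ import annotations


def take_snapshot(volume: dict[int, bytes], previous: dict[int, bytes]) -> dict[int, bytes]:
    """Return only the blocks that differ from the previously captured state."""
    return {idx: data for idx, data in volume.items() if previous.get(idx) != data}


def restore(snapshots: list[dict[int, bytes]]) -> dict[int, bytes]:
    """Rebuild a volume by replaying snapshots from oldest to newest."""
    volume: dict[int, bytes] = {}
    for snap in snapshots:
        volume.update(snap)
    return volume


volume = {0: b"boot", 1: b"data-v1", 2: b"logs-v1"}
snap1 = take_snapshot(volume, previous={})      # first snapshot copies every block
state_after_snap1 = dict(volume)

volume[1] = b"data-v2"                          # only block 1 changes
snap2 = take_snapshot(volume, previous=state_after_snap1)

print(len(snap1), len(snap2))
print(restore([snap1, snap2]) == volume)
```

The second snapshot holds a single block, yet replaying both snapshots reproduces the full volume. The toy ignores deleted blocks to stay short.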

### Instance storage

Here are some things to consider about instance store volumes.

* Direct-attachment: instance store volumes are directly attached to an EC2 instance
* Availability: not all EC2 instance types have instance store available

## Object storage

The fundamental concept of S3 is the `bucket`. A bucket does not behave like a folder, although it may appear to. Accounts have a default quota on the number of buckets (100 when these notes were written), which can be raised on request, and buckets cannot be nested (no buckets within buckets). Bucket names matter because they are globally unique: once a name is taken, no other bucket in any account or region can use it. Following DNS naming conventions when naming buckets is encouraged.
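
As an illustration of DNS-style naming, the sketch below checks a candidate name against a subset of the S3 bucket naming rules: 3 to 63 characters, lowercase letters, digits, dots and hyphens only, starting and ending with a letter or digit, no adjacent dots, and not formatted like an IPv4 address. The function name is hypothetical, and the check is not exhaustive (reserved prefixes and suffixes, for example, are not covered).

```python
import re

# Lowercase letters, digits, dots and hyphens; must start and end with a letter or digit.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_dns_style_bucket_name(name: str) -> bool:
    """Check a bucket name against a subset of the S3 naming rules."""
    if not _NAME_RE.fullmatch(name):
        return False  # wrong length (3-63) or disallowed characters
    if ".." in name:
        return False  # adjacent dots are not valid in DNS names
    if _IP_RE.fullmatch(name):
        return False  # must not look like an IPv4 address
    return True


for candidate in ["my-app-logs-2024", "My_Bucket", "ab", "192.168.0.1"]:
    print(candidate, is_dns_style_bucket_name(candidate))
```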

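`Objects` are the items (files) placed in a bucket. A bucket may be configured to `version` all of its objects; versioning can be enabled at any time, but once enabled it can only be suspended, not turned off. Objects are limited to 5 TB in size, and anything larger must be chunked into multiple objects. A single PUT request accepts at most 5 GB, so large objects are uploaded in parts using multipart upload.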
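
Chunking a large upload can be sketched as splitting the object into byte ranges. The example below assumes the documented multipart upload limits at the time of writing (parts between 5 MiB and 5 GiB, at most 10,000 parts); the function name and sizes are hypothetical, and no AWS calls are made.

```python
from __future__ import annotations

MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB    # assumed minimum part size (the last part may be smaller)
MAX_PART = 5 * GIB    # assumed maximum part size
MAX_PARTS = 10_000    # assumed maximum number of parts per upload


def plan_parts(object_size: int, part_size: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that cover the whole object."""
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part size must be between 5 MiB and 5 GiB")
    parts = [
        (offset, min(part_size, object_size - offset))
        for offset in range(0, object_size, part_size)
    ]
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part size")
    return parts


parts = plan_parts(object_size=20 * GIB + 1, part_size=1 * GIB)
print(len(parts), parts[-1])
```

Each `(offset, length)` pair would correspond to one part of the upload; the final part simply carries whatever remains.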

<div class="alert alert-info">
    
**Note:** A `presigned URL` grants time-boxed access to an object to someone who would not otherwise have access.
    
</div>
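
The idea behind a time-boxed URL can be illustrated with a toy scheme: the issuer signs the path together with an expiry time, and the server rejects the request if the signature does not match or the expiry has passed. This is not the signing algorithm S3 uses; the secret, host name and function names are all hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical signing key held by the URL issuer


def _sign(path: str, expires: int) -> str:
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def make_url(path: str, now: int, ttl: int) -> str:
    """Build a URL that embeds an expiry time and a signature over path and expiry."""
    expires = now + ttl
    query = urlencode({"expires": expires, "signature": _sign(path, expires)})
    return f"https://example.invalid{path}?{query}"


def is_valid(url: str, now: int) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    expires = int(params["expires"][0])
    expected = _sign(parsed.path, expires)
    return hmac.compare_digest(expected, params["signature"][0]) and now <= expires


url = make_url("/reports/q1.pdf", now=1_000, ttl=3600)
print(is_valid(url, now=2_000))    # inside the validity window
print(is_valid(url, now=10_000))   # after the expiry time
```

Because the expiry is part of the signed message, the recipient cannot extend the window by editing the query string.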

<div class="alert alert-info">
    
**Note:** Object lifecycle management is accomplished through `lifecycle configuration`. Objects may undergo transition (when objects are moved to other storage classes) or expiration actions (when objects are deleted).
    
</div>
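
A lifecycle configuration is a list of rules. The dictionary below follows the shape accepted by the S3 lifecycle API (for example, boto3's `put_bucket_lifecycle_configuration`), with one transition to each of two storage classes followed by expiration. The rule ID, prefix, day counts and the `state_at` helper are hypothetical and exist only to show how the rule plays out over an object's lifetime.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",   # hypothetical rule name
            "Filter": {"Prefix": "logs/"},       # hypothetical key prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def state_at(age_days: int, rule: dict) -> str:
    """Return the storage class (or 'EXPIRED') an object would have at a given age."""
    if age_days >= rule["Expiration"]["Days"]:
        return "EXPIRED"
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (10, 45, 120, 400):
    print(age, state_at(age, rule))
```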

The different storage classes of S3 are listed below.

| Storage Class | Features |
|---------------|----------|
| Standard | Offers high availability, high durability and performance |
| `RRS` (Reduced Redundancy Storage) | Like `Standard`, but with less replication |
| `Standard_IA` (Standard-Infrequent Access) | For less frequently accessed data but requiring rapid access when needed |
| `OneZone_IA` (One Zone-Infrequent Access) | Like `Standard_IA` but stored only in one `Availability Zone` |
| `Glacier` | Offers high durability but high retrieval time |

<div class="alert alert-info">
    
**Note:** When using Glacier, data or `archives` are stored in `vaults`. `Vault Locks` are policies enforced on vaults. Each archive is identified only by an archive ID; apart from an optional description, no metadata may be attached to an archive, so such additional information must be managed separately, outside of Glacier.
    
</div>

<div class="alert alert-warning">
    
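**Warning:** Consistency in distributed stores such as S3 is framed by the [CAP Theorem](https://en.wikipedia.org/wiki/CAP_theorem). Historically, S3 guaranteed read-after-write consistency only for PUTs of new objects, while overwrite PUTs and DELETEs were eventually consistent. Since December 2020, S3 provides strong read-after-write consistency for PUT and DELETE operations on objects.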
    
</div>

<div class="alert alert-info">
    
**Note:** Access logging may be turned on for an S3 bucket to capture file access.
    
</div>

Encryption of data on S3 may be either server-side or client-side. Server-side encryption (`SSE`) is available in the following forms.

* `SSE-S3`: S3 manages the encryption of objects with a unique key per object; that key is itself encrypted with a rotating master key
* `SSE-C`: the customer supplies the encryption key used to encrypt the data
* `SSE-KMS`: a master key is created in AWS `KMS` (Key Management Service), which then encrypts the keys that encrypt the data

In client-side encryption `CSE`, the user supplies the master encryption key. The difference with `SSE-C` is that the master encryption key supplied through `CSE` is used to encrypt the data key (the key used to encrypt the data), while the key supplied through `SSE-C` is used directly to encrypt the data.
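
The master key and data key relationship described above can be sketched as follows. A fresh data key encrypts the data, and the master key encrypts (wraps) the data key. The cipher here is a toy XOR keystream built from SHA-256 so that the example needs only the standard library; it is an assumption made for illustration, is not secure, and is not what S3 or KMS use (they use AES-256). All function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256 counter keystream. NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(master_key: bytes, plaintext: bytes) -> tuple[bytes, bytes]:
    """Encrypt data with a fresh data key, then wrap the data key with the master key."""
    data_key = secrets.token_bytes(32)
    ciphertext = keystream_xor(data_key, plaintext)
    wrapped_key = keystream_xor(master_key, data_key)
    return wrapped_key, ciphertext


def envelope_decrypt(master_key: bytes, wrapped_key: bytes, ciphertext: bytes) -> bytes:
    """Unwrap the data key with the master key, then decrypt the data."""
    data_key = keystream_xor(master_key, wrapped_key)
    return keystream_xor(data_key, ciphertext)


master_key = secrets.token_bytes(32)
wrapped_key, ciphertext = envelope_encrypt(master_key, b"quarterly report")
print(envelope_decrypt(master_key, wrapped_key, ciphertext))
```

Only the wrapped data key and the ciphertext need to be stored together; the master key never touches the data directly.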

<div class="alert alert-info">
    
**Note:** A mixture of user and bucket policies may be used to control access to buckets. Access control lists (ACLs) may also be used.
    
</div>

## File storage

Unlike S3, Amazon EFS is a real file system and may be mounted on multiple EC2 instances at the same time. A `mount target` (Network File System endpoint) must first be created in the VPC. On an EC2 instance, you access the file system with the `mount` command; on-premises servers can mount it over `AWS Direct Connect`. Here's an example of using `mount` to access an EFS file system.

```bash
# Mount the file system root over NFSv4.1 at a local directory (requires root privileges)
sudo mount -t nfs4 -o nfsvers=4.1 [file-system-dns-name]:/ [ec2-directory]
```
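
For scripting, the same command can be assembled from its parts. This is a minimal sketch, assuming the standard EFS DNS name format `file-system-id.efs.aws-region.amazonaws.com` and the NFS option `nfsvers=4.1`; the function name, file system ID, region and mount point are hypothetical placeholders.

```python
import shlex


def efs_mount_command(file_system_id: str, region: str, mount_point: str) -> str:
    """Build the shell command that mounts an EFS file system over NFSv4.1."""
    dns_name = f"{file_system_id}.efs.{region}.amazonaws.com"
    args = ["sudo", "mount", "-t", "nfs4", "-o", "nfsvers=4.1", f"{dns_name}:/", mount_point]
    return shlex.join(args)  # quotes arguments safely for a POSIX shell


# Hypothetical file system ID, region and mount point.
command = efs_mount_command("fs-12345678", "us-east-1", "/mnt/efs")
print(command)
```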

Amazon EFS has 2 performance modes.

* General purpose: default performance mode with low latency
* Max I/O: used when tens, hundreds or thousands of EC2 instances require access

## Data transfer

`AWS Storage Gateway` offers several approaches to transferring data into and out of the cloud.

| Gateway type or mode | Description |
|----------------------|-------------|
| AWS Storage Gateway | Enables on-premises applications to access AWS cloud storage |
| File Gateway | Enables connection to S3 as a file share |
| Volume Gateway | Enables cloud storage volumes to be mounted as iSCSI devices |
| Cached volume mode | Volume Gateway mode that keeps data in S3 and caches frequently accessed data locally |
| Stored volume mode | Volume Gateway mode that keeps data locally and backs it up asynchronously as EBS snapshots |
| Tape Gateway | Replaces traditional tape backup with virtual tapes archived to Glacier |

Physical devices may also be used to transfer data into and/or out of the cloud.

* `AWS Import/Export` is a service for transferring huge volumes of data into the cloud using your own storage devices
* `AWS Snowball` is a service for transferring petabyte-scale data
* `AWS Snowball Edge` is a service for transferring 100-TB scale data
* `AWS Snowmobile` is a service for transferring exabyte-scale data into AWS
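
Why ship a physical device at all? A rough estimate of network transfer time makes the case. The sketch below is plain arithmetic; the function name, link speeds and the 80% utilization figure are assumptions chosen for illustration.

```python
def transfer_days(data_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Days needed to move data_tb terabytes over a link at the given utilization."""
    bits = data_tb * 1e12 * 8                        # decimal terabytes to bits
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86_400


# Hypothetical data sizes (TB) and link speeds (Mbps).
for size_tb, mbps in [(1, 100), (100, 1_000), (1_000, 1_000)]:
    print(f"{size_tb:>5} TB over {mbps:>5} Mbps: {transfer_days(size_tb, mbps):.1f} days")
```

Once the estimate runs to weeks or months, a physical transfer device becomes the practical option.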

`Amazon Kinesis Data Firehose` is a streaming service that may be used to stream data into the cloud. Here are some key concepts of Kinesis Data Firehose.

* The defined stream is referred to as a `delivery stream`
* The data items being transferred are called `records`
* Data sources are called `producers`
* Data sinks are called `destinations`
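
A producer typically sends records to a delivery stream in batches. The sketch below groups records into batches, assuming per-request limits of 500 records and 4 MiB in total (the limits I understand to apply to Firehose batch puts; treat them as assumptions and check the current quotas). The function name and sample records are hypothetical, and no AWS calls are made.

```python
from __future__ import annotations

MAX_RECORDS = 500                   # assumed per-request record limit
MAX_BATCH_BYTES = 4 * 1024 * 1024   # assumed per-request size limit


def batch_records(records: list[bytes]) -> list[list[bytes]]:
    """Group records into batches that respect the count and size limits."""
    batches: list[list[bytes]] = []
    current: list[bytes] = []
    current_bytes = 0
    for record in records:
        too_many = len(current) == MAX_RECORDS
        too_big = current_bytes + len(record) > MAX_BATCH_BYTES
        if current and (too_many or too_big):
            batches.append(current)          # flush the full batch
            current, current_bytes = [], 0
        current.append(record)
        current_bytes += len(record)
    if current:
        batches.append(current)
    return batches


records = [f'{{"event": {i}}}\n'.encode() for i in range(1_200)]
batches = batch_records(records)
print([len(batch) for batch in batches])
```

Each batch would then be handed to the delivery stream in a single request, which keeps the number of calls low for high-velocity producers.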

## Definitions

* `RTO` (Recovery Time Objective): the maximum acceptable time to restore and/or retrieve data after a disruption
* `read-after-write consistency`: a guarantee that a read issued after a successful write returns the data that was written
* `KMS` (Key Management Service): AWS service to manage encryption keys
* `data key`: the key used to encrypt data
* `master key`: the key used to encrypt data keys
* `data in transit`: data that is moving (on the network or between two locations)
* `data at rest`: data that is stored
* `SSE` (Server-Side Encryption): encryption that happens on the server
* `envelope encryption`: a process of encrypting data with a key, and also encrypting that key with another key, called the `key-encrypting key`
* `CSE` (Client-Side Encryption): encryption that happens on the client
* `defense in depth`: an approach to securing data at rest and in transit that involves multiple layers, from CloudFront (HTTPS/SSL) and buckets (policies) to objects (SSE and CSE)
* `MFA delete` (Multi-factor authentication delete): using MFA to enable delete operations on objects in S3 buckets
* `CRR` (Cross-region replication): automatic, asynchronous copying of objects across buckets in different AWS regions
* `Amazon S3 Transfer Acceleration`: an S3 feature that enables you to transfer data to a final destination over long distances quickly by uploading through/to an edge location
* `iSCSI` (Internet Small Computer System Interface): IP-based storage networking standard