# `aws s3` (Simple Storage Service)

when it comes to managing your data, you can get pretty far with just an `ec2` server and a virtual hard drive.

for example, right now I have a postgres server running on an `ec2` instance with a 64 GB hard drive, and that server has been polling the [`wmata` train position api](https://developer.wmata.com/docs/services/5763fa6ff91823096cac1057/operations/5763fb35f91823096cac1058) every 10 seconds for about 10 months. it currently holds about 200 million records. not exactly big data, but not exactly small data either.

as another example, ERI currently has a long-running project scraping power outage information (mentioned previously) at 15-minute intervals. the scraper saves the resulting `json` files to the `/data` directory on that same machine and (barring anything truly bizarre) will keep downloading and storing those files forever.

there are disadvantages, though:

1. access
    1. you have to log in to the linux server to access the files
    2. this means I have to grant access to each person individually
2. cost
    1. disk space on an `ec2` instance (specifically, EBS: Elastic Block Store) is relatively expensive, at about $0.10 per GB-month
3. manual administration
    1. if I want to grow the hard drive, I have to do it myself
    2. I have to know *when* that's going to be necessary, or the drive could fill up without my noticing
    3. any backup policies and redundancy mechanisms have to be set up deliberately, by me, on the `ec2` server
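
the "know *when*" part is scriptable, at least. a minimal sketch using only python's standard library, assuming you'd run it periodically (e.g. from cron) on the `ec2` machine; the function name, the report fields, and the 90% threshold are illustrative choices, not anything `aws` provides:

```python
import shutil


def disk_usage_report(path: str = "/data", threshold: float = 0.9) -> dict:
    """Report how full the filesystem holding `path` is.

    `threshold` (an arbitrary 90% default) is the used fraction at which
    we'd want to go grow the EBS volume.
    """
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    fraction_used = usage.used / usage.total
    return {
        "path": path,
        "total_gb": usage.total / 1e9,
        "free_gb": usage.free / 1e9,
        "fraction_used": fraction_used,
        "needs_attention": fraction_used >= threshold,
    }


if __name__ == "__main__":
    # "/" exists everywhere; on the scraping machine you'd check "/data"
    report = disk_usage_report("/", threshold=0.9)
    print(f"{report['fraction_used']:.1%} used, {report['free_gb']:.1f} GB free")
    if report["needs_attention"]:
        print("time to grow the volume")
```

note that this only *warns* you; actually growing the EBS volume (and the filesystem on it) is still a manual chore.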

`s3` is an `aws` service that acts as a central file storage location, accessible from anywhere via standard HTTP REST api requests (`GET`, `PUT`, `POST`, `DELETE`, `HEAD`; operations like copying and listing are built on top of these), which addresses some of those disadvantages:

1. access
    1. you can control access to any "bucket" (basically, a top-level directory in the file system) from the web console
    2. you can be as permissive or as restrictive as you desire
2. cost
    1. standard storage starts at about $0.023 per GB-month for the first 50 TB (about 23% of the EBS cost above), and gets *cheaper* per GB beyond that
3. manual administration
    1. storage grows on the fly, with no administration
    2. redundancy and backup are built-in service options
    3. logging and version history are easy to turn on
    4. I can host a static webpage with the click of a button
    
it's not perfect, but it's pretty cool.
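
to put the two storage prices side by side for the 64 GB postgres drive from earlier: a back-of-the-envelope sketch using the per-GB-month figures quoted above. those prices are hard-coded assumptions (they vary by region and change over time), and real bills also include request and data-transfer charges, so treat this as rough:

```python
# per-GB-month prices quoted above, in USD (assumptions; check current AWS pricing)
EBS_PER_GB_MONTH = 0.10
S3_STANDARD_PER_GB_MONTH = 0.023


def monthly_storage_cost(gb: float, price_per_gb_month: float) -> float:
    """Storage-only cost of keeping `gb` gigabytes for one month."""
    return gb * price_per_gb_month


if __name__ == "__main__":
    size_gb = 64  # the postgres server's hard drive from the example above
    ebs = monthly_storage_cost(size_gb, EBS_PER_GB_MONTH)
    s3 = monthly_storage_cost(size_gb, S3_STANDARD_PER_GB_MONTH)
    print(f"EBS: ${ebs:.2f}/month, s3: ${s3:.2f}/month ({s3 / ebs:.0%} of EBS)")
```

one more difference the arithmetic hides: an EBS volume is billed for its full provisioned size whether or not it's full, while `s3` bills for what you actually store.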

## buckets

open your `aws` console and go to the `s3` service:

https://s3.console.aws.amazon.com/s3/home?region=us-east-1

everywhere you look: "Buckets". a *bucket* is effectively a top-level directory for a family of related files.
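
one nuance: below the bucket, `s3` doesn't actually have directories. every file (an *object*) lives under a flat string *key* such as `outages/2019/01.json`, and the console simply displays the `/`-separated pieces as folders. a small pure-python sketch of that addressing scheme: splitting an `s3://bucket/key` uri (the form the `aws` cli uses), building the virtual-hosted-style https url for an object, and recovering the "folders" from a list of keys. the function names and example keys are made up for illustration:

```python
from typing import List, Tuple
from urllib.parse import quote


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key.json' into ('bucket', 'some/key.json')."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an s3 URI: {uri!r}")
    # everything after the first "/" is ONE flat key; its slashes are just characters
    bucket, _, key = uri[len("s3://"):].partition("/")
    if not bucket:
        raise ValueError(f"no bucket in {uri!r}")
    return bucket, key


def object_url(bucket: str, key: str, region: str = "us-east-1") -> str:
    """Virtual-hosted-style HTTPS URL for an object (reachable only if permissions allow)."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def folder_names(keys: List[str], prefix: str = "") -> List[str]:
    """Mimic the console's folder view: 'folders' under `prefix` are shared key prefixes."""
    folders = set()
    for key in keys:
        if key.startswith(prefix):
            rest = key[len(prefix):]
            if "/" in rest:
                folders.add(prefix + rest.split("/", 1)[0] + "/")
    return sorted(folders)
```

the console's folder view (and `list_objects_v2` in `boto3` when you pass `Delimiter="/"`) does this same prefix grouping on the server side.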

you *could* create a single "root" bucket and keep everything in there, but you probably don't want to. why?

1. a broad interpretation of the "separation of concerns" principle applies
    1. you don't want to mix different files that are doing different things for different purposes
2. permissions get dicey
    1. eventually you may want to be very restrictive or very permissive
    2. you *can* control permissions on a per-file basis in `s3`, so it is *possible*
    3. *but* clients may not approve of this at all, or it may just be more onerous to set permissions per file than bucket-wide
3. descriptive names actually make for better code
    1. long path names that have nothing to do with a task or project are wasted words
    2. descriptive names can help you understand exactly what you're looking for from the main page

<div align="center">**mini exercise: create an `s3` bucket**</div>

1. pick a name -- it has to be *globally* (across *all* of `s3`) unique
    1. this often leads to the "url" style of naming things, e.g. I tend to create `***********.lamberty.io` bucket names because I own that domain
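    2. it doesn't have to be that way; it just has to be unique (and follow `s3`'s bucket naming rules, e.g. only lowercase letters, numbers, dots, and hyphens)
2. go with the N. Virginia (`us-east-1`) region
3. tag 'em!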
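
before clicking "create", it can help to sanity-check a candidate name offline. a sketch covering a *subset* of `s3`'s published bucket naming rules: 3 to 63 characters; only lowercase letters, numbers, dots, and hyphens; must start and end with a letter or number; no two adjacent dots; not formatted like an IP address; no `xn--` prefix. it can't check global uniqueness, which only `s3` itself can tell you when you try to create the bucket. the function name is hypothetical:

```python
import re

# starts and ends with a lowercase letter or digit; 3 to 63 characters total
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# four dot-separated groups of digits, e.g. 192.168.5.4
_IP_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Check a subset of the S3 bucket naming rules (NOT global uniqueness)."""
    if not _NAME_RE.fullmatch(name):
        return False
    if ".." in name:  # adjacent periods are not allowed
        return False
    if _IP_RE.fullmatch(name):  # names formatted like IP addresses are not allowed
        return False
    if name.startswith("xn--"):  # reserved prefix
        return False
    return True
```

a name that passes these checks can still be rejected if someone, anywhere, already owns it.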

<div align="center">**tour the `s3` web console**</div>

1. main page
    1. pretty self-explanatory: a list of buckets and the ability to create new ones
    2. click on any bucket's row and a panel opens on the right
        1. basic descriptions of the three types of configuration values (properties, permissions, management), and quick access to the bucket's ARN (Amazon Resource Name)
2. bucket page
    1. overview tab
        1. allows you to upload a file, create a directory, and set object permissions
    2. properties tab
        1. versioning: you have the ability to record version histories of every file in your bucket (off by default)
        2. logging: record access requests (reads, file creations, and modifications) as log files written to an `s3` bucket, which you pay to store
        3. static website hosting: host a webpage straight out of your `s3` bucket (not bad, huh?)
        4. tags: good for grouping resources that share a purpose or project
        5. transfer acceleration: pay for faster uploads and downloads over long distances
        6. events: trigger a notification (e.g. to a Lambda function or an SQS queue) when a file gets dropped in a location
        7. requester pays: like calling collect, but for the internet age; whoever downloads the data pays the request and transfer costs instead of you
    3. permissions tab
        1. access control list: control bucket-level permissions for `aws` users and the public
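        2. bucket policy: a JSON access policy (written in the same language as `iam` policies) attached to the bucket itself, allowing or denying actions for users, roles, other accounts, or the public; the console also links to a policy generator
        3. CORS configuration: advanced; lets web pages served from other domains fetch these files directly in the browser (browsers block such cross-origin requests by default)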
    4. management tab
        1. a suite of analytics and monitoring tools; not useful at this time since we have no information in these buckets
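
to make the bucket policy item concrete: a bucket policy is just a JSON document. below is a sketch that builds the common "anyone can read any object in this bucket" policy, using the bucket's ARN (`arn:aws:s3:::<bucket-name>`) plus `/*` to mean every object in it. the bucket name is hypothetical, and this policy makes *every* object publicly readable, so only use it for data you intend to publish (newer buckets also have "Block Public Access" on by default, which would have to be turned off first):

```python
import json


def public_read_policy(bucket: str) -> dict:
    """Bucket policy letting anyone GET (download) any object in `bucket`."""
    return {
        "Version": "2012-10-17",  # the current policy-language version string
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",  # everyone, including anonymous requests
                "Action": "s3:GetObject",
                # bucket ARN + "/*" = every object in the bucket
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


if __name__ == "__main__":
    # hypothetical bucket name; paste the output into the bucket policy editor,
    # or pass the string to boto3's put_bucket_policy(Bucket=..., Policy=...)
    print(json.dumps(public_read_policy("example-bucket.lamberty.io"), indent=2))
```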

### putting things in the bucket: web interface

### putting things in the bucket: `cli`

### putting things in the bucket: `python` and `boto3`
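
a minimal sketch of pushing a local directory of `json` files (like the outage scraper's `/data`) into a bucket with the `boto3` s3 client's `upload_file` method, which also handles multipart uploads for large files. it assumes `boto3` is installed and your `aws` credentials are already configured (e.g. via `aws configure`); the function name and its `prefix` parameter are hypothetical. the client can be passed in, and is otherwise created lazily, which makes the function easy to try with a stand-in object before touching `aws`:

```python
from pathlib import Path


def upload_json_files(directory, bucket, prefix="", client=None):
    """Upload every *.json file in `directory` to `bucket`, keyed as prefix + filename.

    Returns the list of keys written. `client` defaults to a boto3 S3 client.
    """
    if client is None:
        import boto3  # imported lazily so a stub client works without boto3 installed

        client = boto3.client("s3")
    uploaded = []
    for path in sorted(Path(directory).glob("*.json")):
        key = f"{prefix}{path.name}"  # e.g. "outages/" + "2019-01-01.json"
        client.upload_file(Filename=str(path), Bucket=bucket, Key=key)
        uploaded.append(key)
    return uploaded
```

on a machine with credentials and an existing bucket, usage would look like `upload_json_files("/data", "<your-bucket-name>", prefix="outages/")`. the `prefix` ends up as a "folder" in the console, since keys are flat strings.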