# Images are uploaded to S3 for storage 

## Create a new S3 Bucket to store images
https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html

### 1) Create a new S3 bucket 
Create and configure a new S3 bucket through the Amazon S3 console.  
![Create and configure a new S3 bucket through the Amazon S3 console](img/S3_01.png)  


### 2) Name the bucket
Name the bucket according to the DNS-compliant naming rules.  
We chose the Asia Pacific (Singapore) region because this is where we are located.  
![Name S3 bucket and choose region](img/S3_02.png)  
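
The naming rules can be checked before opening the console. The sketch below covers only the basic rules as we understand them (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit; no adjacent dots; not shaped like an IP address). The function name `is_valid_bucket_name` is our own, and the AWS documentation lists further restrictions (for example reserved prefixes) that are not covered here.

```python
import re

# Allowed characters, with a letter or digit at both ends.
_ALLOWED = re.compile(r"^[a-z0-9][a-z0-9.-]*[a-z0-9]$")
# Names shaped like an IPv4 address (e.g. 192.168.5.4) are not allowed.
_IP_LIKE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the basic naming checks listed above."""
    if not 3 <= len(name) <= 63:
        return False
    if not _ALLOWED.match(name):
        return False
    if ".." in name:
        return False
    if _IP_LIKE.match(name):
        return False
    return True


for candidate in ["geoai-dtp-images", "My_Bucket", "192.168.5.4"]:
    print(candidate, is_valid_bucket_name(candidate))
```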


### 3) Configure options  
Check the "Versioning - Keep all versions of an object in the same bucket" box.  
AWS Versioning Docs: https://docs.aws.amazon.com/AmazonS3/latest/dev/Versioning.html  
Versioning lets us keep multiple variants of an object in the same bucket. It helps mitigate the risk of file incompatibility because we can restore previous versions, and it helps us keep track of our data. However, every stored version counts towards storage usage, so keeping all of them can become expensive as AWS charges for each additional gigabyte used. As our dataset is still small, we will keep versioning on for now. Read this article for more information on versioning: 
https://medium.com/@pvinchon/amazon-s3-versioning-d6c57c513b04  
![S3 Configure Options](img/S3_03.png) 
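
Versioning can also be switched on from code instead of the console. This is a minimal sketch, assuming a boto3 S3 client whose `put_bucket_versioning` method accepts the arguments shown; the helper names `versioning_request` and `enable_versioning` are hypothetical and not part of boto3.

```python
def versioning_request(bucket_name: str) -> dict:
    """Build the keyword arguments for the put_bucket_versioning call."""
    return {
        "Bucket": bucket_name,
        "VersioningConfiguration": {"Status": "Enabled"},
    }


def enable_versioning(client, bucket_name: str) -> None:
    """Turn versioning on using a boto3 S3 client (or any object with the same method)."""
    client.put_bucket_versioning(**versioning_request(bucket_name))


# Usage with credentials already configured (not run here):
#   import boto3
#   enable_versioning(boto3.client("s3"), "your-bucket-name")
```

Building the request in a separate function keeps the settings easy to inspect before anything is sent to AWS.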

It can also be useful to enable "server access logging". This lets us track requests for access to our bucket, which helps us understand S3 usage better. There is no extra charge for enabling server access logging on S3, although the stored log files are billed as normal storage. However, it is recommended that logs be written to a target bucket different from the source. If logs are written to the source bucket, the act of writing them generates further log entries, which makes it harder to find the logs we actually need. Therefore, we may in future create a new target bucket to write those logs to. 
AWS logging Docs: https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerLogs.html  
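
The separate-target-bucket recommendation can be enforced in code. The sketch below assumes a boto3 S3 client and its `put_bucket_logging` method; the helper names, the `logs/` prefix and the bucket names are placeholders of our own. The target bucket must also allow S3 log delivery to write to it, which this sketch does not set up.

```python
def logging_request(source_bucket: str, target_bucket: str, prefix: str = "logs/") -> dict:
    """Build the keyword arguments for put_bucket_logging.

    Refuses to log a bucket into itself, following the recommendation above.
    """
    if source_bucket == target_bucket:
        raise ValueError("write access logs to a different bucket from the source")
    return {
        "Bucket": source_bucket,
        "BucketLoggingStatus": {
            "LoggingEnabled": {
                "TargetBucket": target_bucket,
                "TargetPrefix": prefix,
            }
        },
    }


def enable_access_logging(client, source_bucket: str, target_bucket: str) -> None:
    """Enable server access logging using a boto3 S3 client."""
    client.put_bucket_logging(**logging_request(source_bucket, target_bucket))


# Usage with credentials already configured (not run here):
#   import boto3
#   enable_access_logging(boto3.client("s3"), "your-bucket-name", "your-log-bucket")
```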


### 4) Set permissions  
Leave the permission settings at the default, "Block all public access".  
These settings can be changed later if required.  
![S3 Permissions](img/S3_04.png)  


### 5) Review and create bucket 
Review settings and create bucket.  
![Review S3 Settings](img/S3_05.png)  


###  New bucket created! 
![New bucket created](img/S3_06.png)  

## Accessing the S3 Bucket 

To access the S3 bucket, we need an **access key** and **access secret**. These let us make secure REST or HTTP Query protocol requests to AWS. There are a number of ways of accessing AWS resources. Read more about it [here](https://docs.aws.amazon.com/IAM/latest/UserGuide/id.html). 

To access our S3 bucket, we will create an **IAM user** so that we can [load data from AWS S3 into Google Colab](https://medium.com/python-in-plain-english/how-to-load-data-from-aws-s3-into-google-colab-7e76fbf534d2). 

A point to note: 
Long-term access keys (associated with IAM users and AWS account root users) never expire and remain valid until manually revoked. Relying on them is therefore not considered best practice from a security standpoint. Read more about [best practices](https://docs.aws.amazon.com/general/latest/gr/aws-access-keys-best-practices.html). 

Example: 
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.Authorizing.IAM.S3CreatePolicy.html 


### 1) Create new IAM User
- In the AWS console, click on your username and go to "My Security Credentials". 
- On the left menu bar,  click on "Users" then "Add user" 

![Add User](img/S3_07.png)


### 2) Set user details
- Give the user a name 
- Select programmatic access 

![Set User details](img/S3_08.png) 


### 3) Set permissions 
- Attach existing policies directly 
- Filter for S3 and choose "AmazonS3FullAccess". The Secret and Access keys generated for this user will then allow access to S3 only and not to any other AWS service. Note that this managed policy covers every S3 bucket in the account, not just the one we created 

![Attach policy](img/S3_09.png) 
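
AmazonS3FullAccess is an AWS managed policy that applies to all buckets in the account. If access should be limited to a single bucket, a custom policy can be attached instead. The following is a minimal sketch of such a policy document, built in Python; the function name `bucket_scoped_policy` and the chosen set of actions (list, read, write) are assumptions, so adjust them to what the notebook actually needs.

```python
import json


def bucket_scoped_policy(bucket_name: str) -> dict:
    """Return an IAM policy document limited to one bucket and its objects."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading and writing apply to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


# Print the JSON so it can be pasted into the IAM policy editor.
print(json.dumps(bucket_scoped_policy("your-bucket-name"), indent=2))
```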


### 4) Add tags 
- Skipped for now 

![Add tags](img/S3_10.png)


### 5) Review 
- Check and review all settings, then click create 

![Review user settings](img/S3_11.png) 


### 6) SUCCESS! 
- We have created our new IAM user!
- An **access key** and **access secret** have been created :) 
- We will use this to access our S3 instance 
- Download the .csv file and save it somewhere 

![Access key and Secret](img/S3_12.png)


## Access our S3 bucket via Python

Let's test if it all works :) 

https://medium.com/python-in-plain-english/how-to-load-data-from-aws-s3-into-google-colab-7e76fbf534d2

### Install AWS Python SDK (boto3)

```python
!pip install -q boto3==1.14.60
```

- If you lose your keys, go to the AWS IAM console to manage access keys and generate a new set of keys 
- Set up boto3 credentials to pull data from S3: 

```python
import boto3

BUCKET_NAME = 'xxxxxx' # replace with your bucket name

# enter authentication credentials
s3 = boto3.resource('s3', aws_access_key_id = 'ENTER YOUR ACCESS KEY',
                    aws_secret_access_key = 'ENTER YOUR SECRET KEY')
```

```python
import boto3

BUCKET_NAME = 'geoai-dtp-images'

s3 = boto3.resource('s3', aws_access_key_id='ACCESS KEY',
                    aws_secret_access_key='SECRET KEY')
```
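
Typing the keys directly into a notebook makes them easy to leak when the notebook is shared. One alternative is to read them from environment variables. The sketch below assumes the variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` have been set beforehand; the helper name `credentials_from_env` is our own.

```python
import os


def credentials_from_env(env=None) -> dict:
    """Read the access key pair from environment variables.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }


# Usage once the variables are set (not run here):
#   import boto3
#   s3 = boto3.resource("s3", **credentials_from_env())
```

boto3 also looks up these two variables on its own when no keys are passed, so `boto3.resource("s3")` alone may be enough in that case.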

### Test if it works 
To test that this all works, we will create a new file "test.txt", upload it to our S3 bucket, then download it locally with the following code.

To upload the file into our bucket: 
```python
s3.Bucket('BUCKET NAME').upload_file('NAME OF FILE TO UPLOAD', 'KEY OF FILE ON S3') 
```

To download the file to our local directory: 
```python
s3.Bucket('BUCKET NAME').download_file('KEY OF FILE ON S3', 'NAME OF DOWNLOADED FILE') 
```

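```python
# Test code: write a small local file to upload
with open('test.txt', 'w') as test:
    test.write("This is our S3 test file")

KEY = 'test.txt' # replace with your object key

# upload_file takes (local filename, S3 key); download_file takes (S3 key, local filename)
s3.Bucket(BUCKET_NAME).upload_file('test.txt', KEY)
s3.Bucket(BUCKET_NAME).download_file(KEY, 'downloaded_test.txt')
```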

### Check if it has worked
In the current directory, you should see two new files, "test.txt" and "downloaded_test.txt" 
![Check folder](img/S3_13.png)
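
Besides looking at the folder, the round trip can be confirmed by comparing the two files byte for byte. This is a small sketch using only the standard library; the helper name `same_contents` is our own, and the file names in the usage comment are the ones from the test above.

```python
import filecmp


def same_contents(original: str, downloaded: str) -> bool:
    """Return True if both files have identical contents."""
    # shallow=False forces a comparison of the file contents,
    # rather than relying on size and modification time alone.
    return filecmp.cmp(original, downloaded, shallow=False)


# Usage after running the upload/download test (not run here):
#   print(same_contents('test.txt', 'downloaded_test.txt'))
```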


In our S3 Console, you should also see our "test.txt" file uploaded 
![Check S3 console](img/s3_14.png)
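
The same check can be made from Python by listing the keys in the bucket. The sketch below assumes a boto3 `Bucket` resource such as `s3.Bucket(BUCKET_NAME)`, using its `objects.all()` and `objects.filter(Prefix=...)` collections; the helper name `list_keys` is hypothetical.

```python
def list_keys(bucket, prefix=None):
    """Return the keys of the objects in a boto3 Bucket resource.

    If `prefix` is given, only keys starting with it are returned.
    """
    if prefix:
        objects = bucket.objects.filter(Prefix=prefix)
    else:
        objects = bucket.objects.all()
    return [obj.key for obj in objects]


# Usage with the `s3` resource created earlier (not run here):
#   print('test.txt' in list_keys(s3.Bucket(BUCKET_NAME)))
```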

We can now use this same code to access our S3 bucket via Google Colab