# Accessing data in AWS

### Writing to S3 and reading things other than `csv`
#### AKA, let's get coding

**Objective**: to read from and write to S3 from within SageMaker

There are several ways to interact with AWS through code:

- [AWS CLI](https://aws.amazon.com/cli/) = Command Line Interface for AWS resources</br>
- [Boto3](https://aws.amazon.com/sdk-for-python/) = the AWS SDK (software development kit) for Python ([SDKs for other languages are listed here](https://aws.amazon.com/tools/))</br>
- [S3FS](https://s3fs.readthedocs.io/en/latest/) = S3 File System package, a wrapper around the same low-level AWS library that boto3 uses (botocore), designed to mimic how we interact with files on our own computers.

<img src="./image/boto3.png" alt="boto3" style ="text-align:center;width:500px;float:none" ></br>
<img src="./image/s3fs.png" alt="s3fs" style ="text-align:center;width:250px;float:none" ></br>

### We are going to use S3FS

It is the shortest path to getting things running, and its file-and-folder style is the most familiar.</br>
The other tools have their own advantages, but we will not cover them here.


S3FS creates a connection to your S3 file system, the same way `sqlite3` creates a connection or other packages create a client.</br>
`S3FileSystem()` accepts your credentials as arguments.</br>
You can then use familiar file-system commands such as `ls`, `open`, and `rm`.

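Before you can access your files in S3, you need to set up your S3 credentials in S3FS; the documentation [here](https://s3fs.readthedocs.io/en/latest/api.html#s3fs.core.S3FileSystem) will help. If you pass no credentials, S3FS falls back to the standard AWS credential chain (environment variables, `~/.aws/credentials`, or the IAM role attached to your SageMaker notebook instance).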
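If you want to pass credentials explicitly, it is safer not to paste them into the notebook. A minimal sketch, assuming your keys live in the standard `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and optional `AWS_SESSION_TOKEN` environment variables; the helper name `s3fs_credentials` is hypothetical. It maps those values onto the `key`, `secret`, and `token` arguments of `S3FileSystem()` and returns an empty dict when they are missing, so S3FS falls back to its default credential lookup.

```python
import os
from typing import Dict, Mapping, Optional


def s3fs_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build keyword arguments for s3fs.S3FileSystem from a mapping of AWS variables.

    Hypothetical helper: returns {} when no key/secret pair is present, which
    lets S3FS use its own default credential lookup instead.
    """
    env = os.environ if env is None else env
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if not key or not secret:
        return {}
    creds = {"key": key, "secret": secret}
    token = env.get("AWS_SESSION_TOKEN")
    if token:
        # Temporary (STS) credentials also need the session token
        creds["token"] = token
    return creds


# Usage in the notebook (requires s3fs):
# fs = s3fs.S3FileSystem(**s3fs_credentials())
```

Because the helper takes any mapping, you can also feed it values loaded from a local config file that you keep out of version control.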

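```python
import s3fs

# With no arguments, S3FS uses your default AWS credentials.
# To pass them explicitly: s3fs.S3FileSystem(key='YOUR_KEY', secret='YOUR_SECRET')
fs = s3fs.S3FileSystem()

# List the first 5 entries in a bucket you have access to
fs.ls('s3://flatiorn-chicago-bucket')[:5]
```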
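The listing above passes a full `s3://` URL, while the later cells pass paths like `flatiorn-chicago-bucket/tiny_file2.json` without the scheme; S3FS accepts both forms. When you need the bucket and the key as separate strings (for example, to hand them to another tool), a small standard-library helper can split them. `split_s3_path` is a hypothetical name, not part of S3FS.

```python
from typing import Tuple


def split_s3_path(path: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' or 'bucket/key' into (bucket, key)."""
    # Drop the optional scheme so both path styles are handled the same way
    if path.startswith("s3://"):
        path = path[len("s3://"):]
    # Everything before the first '/' is the bucket; the rest is the object key
    bucket, _, key = path.partition("/")
    return bucket, key


print(split_s3_path("s3://flatiorn-chicago-bucket/tiny_file2.json"))
# ('flatiorn-chicago-bucket', 'tiny_file2.json')
```

Plain string splitting is used instead of `urllib.parse` on purpose: S3 keys may contain `?` or `#`, which a URL parser would treat as query or fragment markers.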

### JSON & S3 from notebook
Start with a tiny one!

`data = {"HelloWorld": []}`

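```python
import json
import s3fs

data = {"HelloWorld": []}

fs = s3fs.S3FileSystem()
# Open the S3 object in binary write mode, serialize the dict to JSON text,
# encode it as UTF-8 bytes, and upload it
with fs.open('flatiorn-chicago-bucket/tiny_file2.json', 'wb') as f:
    user_encode_data = json.dumps(data).encode('utf-8')
    f.write(user_encode_data)
```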
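It can help to see exactly which bytes end up in the S3 object. The sketch below runs the same `json.dumps(...).encode('utf-8')` step locally and uses an in-memory `io.BytesIO` buffer as a stand-in for the file object S3FS gives you, so it needs no AWS access; the buffer is only an illustration.

```python
import io
import json

data = {"HelloWorld": []}

# Same serialization as the S3 cell: dict -> JSON text -> UTF-8 bytes
payload = json.dumps(data).encode("utf-8")
print(payload)  # b'{"HelloWorld": []}'

# io.BytesIO stands in for the object returned by fs.open(..., 'wb')
buffer = io.BytesIO()
buffer.write(payload)
print(len(buffer.getvalue()))  # 18 bytes would be uploaded
```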

##### Read JSON from S3 into the SageMaker environment

```python
# Read the raw bytes back from S3, decode them, and parse the JSON
with fs.open('flatiorn-chicago-bucket/tiny_file2.json', 'rb') as f:
    f_json = f.read()
    test_json = json.loads(f_json.decode('utf-8'))
print(test_json)
```
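The write and read cells repeat the same encode/decode steps, so you may want to wrap them in two small functions. A sketch, assuming only that `fs` has an `open(path, mode)` method returning a file-like context manager, which `s3fs.S3FileSystem` provides; the names `write_json` and `read_json` are hypothetical.

```python
import json
from typing import Any


def write_json(fs: Any, path: str, obj: Any) -> int:
    """Serialize obj to UTF-8 JSON, write it to path on fs, and return the byte count."""
    payload = json.dumps(obj).encode("utf-8")
    with fs.open(path, "wb") as f:
        f.write(payload)
    return len(payload)


def read_json(fs: Any, path: str) -> Any:
    """Read path from fs and parse its UTF-8 JSON content."""
    with fs.open(path, "rb") as f:
        return json.loads(f.read().decode("utf-8"))


# Usage in the notebook (requires s3fs and AWS access):
# fs = s3fs.S3FileSystem()
# write_json(fs, 'flatiorn-chicago-bucket/tiny_file2.json', {"HelloWorld": []})
# read_json(fs, 'flatiorn-chicago-bucket/tiny_file2.json')
```

Taking `fs` as a parameter rather than creating it inside the functions means the same helpers work with any fsspec-style file system, including a local one for testing.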

#### Let's get a real JSON

Use the `json`, `requests`, and `s3fs` libraries.

```python
import requests
import json

url = 'https://opendata.arcgis.com/datasets/14faf3d4bfbe4ca4a713bf203a985151_0.geojson'
r = requests.get(url)
cont = json.loads(r.content.decode())
```
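`requests.get` without a timeout can wait indefinitely, and a failed download (for example, an HTTP 404 page) only surfaces later as a confusing JSON parsing error. A sketch of a slightly more defensive download, assuming the URL returns JSON; `fetch_json` is a hypothetical helper, and its `session` parameter exists only so it can be exercised without network access.

```python
from typing import Any, Optional


def fetch_json(url: str, session: Optional[Any] = None, timeout: float = 30.0) -> Any:
    """Download url and parse the response body as JSON.

    session: anything with a requests-style .get(url, timeout=...) method;
    defaults to the requests module itself.
    """
    if session is None:
        import requests  # imported lazily so the helper can be tested without it
        session = requests
    response = session.get(url, timeout=timeout)
    # Fail fast on 4xx/5xx responses instead of trying to parse an error page
    response.raise_for_status()
    return response.json()


# Usage in the notebook:
# cont = fetch_json('https://opendata.arcgis.com/datasets/14faf3d4bfbe4ca4a713bf203a985151_0.geojson')
```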

##### **Write** it to the S3 bucket

```python
import s3fs

fs = s3fs.S3FileSystem()
with fs.open('flatiorn-chicago-bucket/dc-requests2.json', 'wb') as f:
    user_encode_data = json.dumps(cont).encode('utf-8')
    f.write(user_encode_data)
```
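Before reading the file back, you can confirm the upload landed by checking that the object exists and that its size matches what you wrote. A sketch assuming `fs` is an `s3fs.S3FileSystem`, whose `exists` and `size` methods come from the fsspec file-system interface that S3FS implements; `verify_upload` is a hypothetical name.

```python
import json
from typing import Any


def verify_upload(fs: Any, path: str, obj: Any) -> bool:
    """Return True if path exists on fs and its size equals the UTF-8 JSON size of obj."""
    # Re-serialize exactly as the write cell did to know how many bytes to expect
    expected = len(json.dumps(obj).encode("utf-8"))
    if not fs.exists(path):
        return False
    return fs.size(path) == expected


# Usage in the notebook, right after the write cell:
# verify_upload(fs, 'flatiorn-chicago-bucket/dc-requests2.json', cont)
```

The size comparison only holds if the object was written with the same `json.dumps` settings; if you change separators or indentation when writing, change them here too.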

##### **Read** the JSON back from the S3 bucket

```python
# Read back the file written by the previous cell
with fs.open('flatiorn-chicago-bucket/dc-requests2.json', 'rb') as f:
    f_json = f.read()
    test_json = json.loads(f_json.decode('utf-8'))
```
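The `.geojson` extension suggests this file follows the GeoJSON format (RFC 7946), where the records usually sit in a `FeatureCollection`'s `features` list, each feature carrying a `properties` object (the attribute columns) and a `geometry`. A sketch that pulls out the properties as a list of row dicts, assuming `test_json` is such a FeatureCollection; check `test_json['type']` first, since that is an assumption about the file. `feature_rows` is a hypothetical name.

```python
from typing import Any, Dict, List


def feature_rows(geojson: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the 'properties' dict of every feature in a GeoJSON FeatureCollection."""
    if geojson.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    # GeoJSON allows null properties; treat those as empty rows
    return [feature.get("properties") or {} for feature in geojson.get("features", [])]


# Usage in the notebook:
# rows = feature_rows(test_json)
# len(rows), rows[0]
```

A list of dicts like `rows` can be passed straight to `pandas.DataFrame(rows)` if you want a table.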

### Now you try!

What you will need:
- A JSON file to scrape off the web
- An AWS account
- An S3 bucket of your choice

#### **Tasks**:
- Read your JSON from the web using `requests`
- Save it to one of your buckets
- Go to the S3 bucket and make sure the permissions are set to public
- Show a coach/instructor that it is indeed in your bucket!
- Then, read it back into this notebook from the S3 bucket
- Explore the JSON and find where the actual data is
- Share with the group next to you what you managed to pull into S3 and your SageMaker Jupyter notebook!
- Be ready to share with the larger group if someone near you did something cool
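For the "find where the actual data is" step, your JSON may not be GeoJSON at all. One way to get oriented is to print an outline of its nesting: keys for dicts, lengths for lists, and types for leaf values. A minimal sketch; `outline` is a hypothetical helper, and it only inspects the first element of each list, assuming list items share the same shape.

```python
from typing import Any, List


def outline(obj: Any, max_depth: int = 3, _depth: int = 0, _prefix: str = "") -> List[str]:
    """Describe the nesting of a parsed JSON value, one indented line per node."""
    indent = "  " * _depth
    lines: List[str] = []
    if isinstance(obj, dict):
        lines.append(f"{indent}{_prefix}dict with {len(obj)} key(s)")
        if _depth < max_depth:
            for key, value in obj.items():
                lines.extend(outline(value, max_depth, _depth + 1, f"{key}: "))
    elif isinstance(obj, list):
        lines.append(f"{indent}{_prefix}list of {len(obj)} item(s)")
        if obj and _depth < max_depth:
            # Assume list items share a shape; inspect only the first one
            lines.extend(outline(obj[0], max_depth, _depth + 1, "[0]: "))
    else:
        lines.append(f"{indent}{_prefix}{type(obj).__name__}")
    return lines


# Usage: print("\n".join(outline(test_json)))
```

The long lists in the outline are usually where the records live; raise `max_depth` if the interesting part is nested deeper.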