# Accessing S3 Buckets with S3FS
### Author: Chris Battisto
### Date Authored: 1-26-22

### Timing

Exercise: 5 minutes


<div style="background:#fc9090;border:1px solid #cccccc;padding:5px 10px;"><big><b>Note:  </b>This notebook <em><strong>will only run in an environment with <a href="https://disc.gsfc.nasa.gov/information/glossary?keywords=%22earthdata%20cloud%22&amp;title=AWS%20region">us-west-2 AWS access</a></strong></em>.</big></div>

### Overview

This notebook demonstrates accessing the GES DISC S3 bucket through the S3FS Python library.

### Prerequisites

This notebook was written using Python 3.8 and requires these libraries and files:

- S3FS
    - S3FS documentation: https://s3fs.readthedocs.io/en/latest/install.html
- requests
- netrc file with valid Earthdata Login credentials
- Approval to access the GES DISC archives with your Earthdata credentials (https://disc.gsfc.nasa.gov/earthdata-login)
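
Before running the notebook, it can be useful to confirm that the netrc file actually contains an Earthdata Login entry. The sketch below uses only the Python standard library. It assumes the Earthdata Login host is `urs.earthdata.nasa.gov` and that the file lives at `~/.netrc` (on Windows it is often named `_netrc` instead); `has_earthdata_login` is a hypothetical helper written for this illustration.

```python
import netrc
from pathlib import Path

# Assumption: Earthdata Login credentials are stored under this host name.
EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def has_earthdata_login(netrc_path=None):
    """Return True if the netrc file has an entry for the Earthdata Login host."""
    # Assumption: default location is ~/.netrc when no path is given.
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.is_file():
        return False
    try:
        entry = netrc.netrc(str(path)).authenticators(EARTHDATA_HOST)
    except netrc.NetrcParseError:
        # The file exists but cannot be parsed as a netrc file.
        return False
    return entry is not None
```

Calling `has_earthdata_login()` with no argument checks the default location. Note that this only confirms an entry exists; it does not verify that the stored password is correct.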


### Import Libraries

In [16]:
import s3fs
import requests

### What is S3?

- S3 (Simple Storage Service) is an economical and simple way to store and access granules in the cloud.
- It stores data as "objects" inside "buckets", which allows data to be accessed securely and efficiently.
- Thanks to Python and S3FS, we can load our data objects as if they were stored in a traditional file system.
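
To make the bucket/object layout concrete: an S3 location is usually written as `s3://<bucket>/<key>`, where the key is the object's full name and the slashes in it only look like directories. The following is a small sketch using the standard library; `split_s3_url` and the bucket name `example-bucket` are made up for illustration and do not refer to a real GES DISC bucket.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/key' into a (bucket, key) tuple."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URL: " + repr(url))
    # The bucket is the "host" part; the key is everything after the first slash.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical bucket and object names, for illustration only.
bucket, key = split_s3_url("s3://example-bucket/2022/01/granule.nc4")
print(bucket)  # example-bucket
print(key)     # 2022/01/granule.nc4
```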

In [None]:
from IPython.display import Image

Image(url="https://d1.awsstatic.com/s3-pdp-redesign/product-page-diagram_Amazon-S3_HIW.cf4c2bd7aa02f1fe77be8aa120393993e08ac86d.png")

### Get S3 Credentials

In [2]:
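# Request temporary S3 credentials; requests authenticates using the Earthdata Login entry in the netrc file
gesdisc_s3 = "https://data.gesdisc.earthdata.nasa.gov/s3credentials"
response = requests.get(gesdisc_s3).json()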
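
A quick sanity check on the parsed response can give a clearer error than a `KeyError` later on. This is a minimal sketch, assuming only the three field names that this notebook reads from the response (`accessKeyId`, `secretAccessKey`, `sessionToken`); `check_s3_credentials` is a hypothetical helper, not part of S3FS or requests.

```python
# Field names taken from how this notebook uses the response.
REQUIRED_FIELDS = ("accessKeyId", "secretAccessKey", "sessionToken")


def check_s3_credentials(creds):
    """Return creds unchanged, or raise ValueError naming any missing or empty field."""
    if not isinstance(creds, dict):
        raise ValueError("expected a JSON object, got " + type(creds).__name__)
    missing = [name for name in REQUIRED_FIELDS if not creds.get(name)]
    if missing:
        raise ValueError("credential response is missing: " + ", ".join(missing))
    return creds
```

In this notebook it could be called as `check_s3_credentials(response)` before the file system object is built.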

### S3 Bucket Access with S3FS

The S3FS library provides a Pythonic file interface to S3, so that objects in a bucket can be listed, opened, and read much like files stored locally. It is built on top of botocore, and greatly simplifies accessing cloud-hosted data with Earthdata credentials.

Once the credentials have been retrieved using the previously generated netrc file, we can "mount" the file system by passing in the access key, secret key, and session token. Remember that these credentials last for only one hour. Once that time has passed, new credentials must be requested and the cell below re-run (restarting the kernel and re-running the notebook does both).

In [3]:
fs = s3fs.S3FileSystem(key=response['accessKeyId'],
                       secret=response['secretAccessKey'],
                       token=response['sessionToken'],
                       client_kwargs={'region_name': 'us-west-2'})

# Confirm that an S3FileSystem object was created. This step alone does not contact S3,
# so a problem with the token may only surface on the first request against the bucket.
# Common causes of rejected S3 access tokens include an incorrect password in the netrc file or a missing netrc file.
type(fs)

s3fs.core.S3FileSystem
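
Because the credentials expire after about an hour, it can help to track when they were fetched and request new ones shortly before they lapse. The sketch below is an illustration under stated assumptions: `CredentialCache`, its `fetch` callable, and the five-minute safety margin are hypothetical, and the one-hour lifetime is taken from the text above rather than read from the response.

```python
import time


class CredentialCache:
    """Hold temporary credentials and re-fetch them shortly before they expire."""

    def __init__(self, fetch, lifetime_seconds=3600, margin_seconds=300,
                 clock=time.monotonic):
        self._fetch = fetch                # callable returning a credentials dict
        self._lifetime = lifetime_seconds  # assumed lifetime: one hour
        self._margin = margin_seconds      # assumed safety margin: five minutes
        self._clock = clock                # injectable clock, useful for testing
        self._creds = None
        self._fetched_at = None

    def get(self):
        """Return cached credentials, fetching new ones if they are near expiry."""
        now = self._clock()
        stale = (self._fetched_at is None
                 or now - self._fetched_at >= self._lifetime - self._margin)
        if stale:
            self._creds = self._fetch()
            self._fetched_at = now
        return self._creds
```

With the names used in this notebook, `fetch` could be `lambda: requests.get(gesdisc_s3).json()`, and a new `s3fs.S3FileSystem` would be built from `cache.get()` whenever fresh credentials are returned.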