# Viewing S3 Bucket Granules with S3FS
### Author: Chris Battisto
### Date Authored: 1-26-22

### Timing

Exercise: 5 minutes


<div style="background:#fc9090;border:1px solid #cccccc;padding:5px 10px;"><big><b>Note:  </b>This notebook <em><strong>will only run in an environment with <a href="https://disc.gsfc.nasa.gov/information/glossary?keywords=%22earthdata%20cloud%22&amp;title=AWS%20region">us-west-2 AWS access</a></strong></em>.</big></div>

### Overview

This notebook demonstrates how to list and inspect granules in the GES DISC S3 bucket with the S3FS Python library, and how to open a granule directly in Xarray.

### Prerequisites

This notebook was written using Python 3.8 and requires the following libraries and files:

- Xarray
- h5netcdf (the backend Xarray uses to read netCDF-4 files from file-like objects)
- Requests
- S3FS
  - S3FS documentation: https://s3fs.readthedocs.io/en/latest/install.html
- A .netrc file with valid Earthdata Login credentials
- Approval to access the GES DISC archives with your Earthdata Login credentials (https://disc.gsfc.nasa.gov/earthdata-login)
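Before requesting S3 credentials, it can help to confirm that your .netrc file actually contains an Earthdata Login entry. The sketch below uses Python's standard `netrc` module and assumes the entry is keyed on the Earthdata Login host `urs.earthdata.nasa.gov`; the helper name `has_earthdata_login` is hypothetical, and the demo writes a throwaway file with placeholder credentials rather than touching your real one.

```python
import netrc
import os
import tempfile

# Assumed Earthdata Login host that the .netrc entry must be keyed on.
EDL_HOST = "urs.earthdata.nasa.gov"


def has_earthdata_login(netrc_path=None):
    """Hypothetical helper: True if the netrc file has a login and password for EDL_HOST.

    netrc_path defaults to ~/.netrc, the file Requests reads when no auth is given.
    """
    path = netrc_path or os.path.join(os.path.expanduser("~"), ".netrc")
    if not os.path.exists(path):
        return False
    try:
        entry = netrc.netrc(path).authenticators(EDL_HOST)
    except netrc.NetrcParseError:
        return False
    if entry is None:
        return False
    login, _account, password = entry
    return bool(login) and bool(password)


# Demo with a temporary file holding placeholder credentials in the expected format.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "netrc")
    with open(sample, "w") as f:
        f.write(f"machine {EDL_HOST} login example_user password example_pass\n")
    print(has_earthdata_login(sample))  # True
```

Calling `has_earthdata_login()` with no argument checks your real ~/.netrc instead.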


### Import Libraries

```python
import s3fs
import xarray as xr
import requests
```

### Get S3 Credentials and Mount the S3 Filesystem

```python
gesdisc_s3 = "https://data.gesdisc.earthdata.nasa.gov/s3credentials"

# Requests reads your Earthdata Login credentials from the .netrc file
response = requests.get(gesdisc_s3)
response.raise_for_status()  # fail early with an HTTP error instead of a confusing KeyError
creds = response.json()

fs = s3fs.S3FileSystem(key=creds['accessKeyId'],
                       secret=creds['secretAccessKey'],
                       token=creds['sessionToken'],
                       client_kwargs={'region_name': 'us-west-2'})

# Confirm that the filesystem object was created. Creating it does not contact S3,
# so invalid or expired credentials only surface on the first request (e.g. fs.ls).
# Common causes of rejected credentials are an incorrect password in the .netrc file
# or a missing .netrc file.
type(fs)
```

```
s3fs.core.S3FileSystem
```
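If the credentials request fails (for example, because of a bad .netrc entry), the parsed JSON may not contain the expected keys, and indexing it raises a bare `KeyError`. The sketch below, using a hypothetical helper `s3fs_kwargs`, maps the three keys read in the cell above onto the `key`, `secret`, `token`, and `client_kwargs` arguments of `s3fs.S3FileSystem`, and raises a more descriptive error when keys are missing. It is plain Python, so it can be checked without network access.

```python
# Keys read from the s3credentials JSON in the cell above.
REQUIRED_KEYS = ("accessKeyId", "secretAccessKey", "sessionToken")


def s3fs_kwargs(creds, region="us-west-2"):
    """Hypothetical helper: map an s3credentials JSON dict onto S3FileSystem kwargs."""
    missing = [k for k in REQUIRED_KEYS if k not in creds]
    if missing:
        raise ValueError(
            f"credentials response is missing {missing}; check your .netrc file"
        )
    return {
        "key": creds["accessKeyId"],
        "secret": creds["secretAccessKey"],
        "token": creds["sessionToken"],
        "client_kwargs": {"region_name": region},
    }


# With network access and s3fs installed, where credentials_dict is the parsed JSON:
# fs = s3fs.S3FileSystem(**s3fs_kwargs(credentials_dict))
```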

### Explore S3 Bucket Contents

Once the S3 filesystem object has been created with valid credentials, you can list and inspect the bucket's contents with familiar filesystem commands, much as if the files were stored locally.

```python
# Filesystem commands such as ls and glob list the bucket contents.
# The trailing * makes glob match every entry one level below the bucket root.
print('Current datasets: ')
fs.glob('s3://gesdisc-cumulus-prod-protected/*')
```

```
Current datasets: 

['gesdisc-cumulus-prod-protected/GPM_3IMERGHH',
 'gesdisc-cumulus-prod-protected/GPM_L3',
 'gesdisc-cumulus-prod-protected/M2T1NXSLV',
 'gesdisc-cumulus-prod-protected/MERRA2']
```
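Glob patterns can narrow a listing to the collections you care about; with S3 access, `fs.glob('s3://gesdisc-cumulus-prod-protected/GPM_*')` would do this on the bucket directly. The offline sketch below applies the same kind of pattern to the listing shown above using the standard library's `fnmatchcase`. One caveat: in `fnmatch`, `*` also matches `/`, whereas in S3FS glob patterns `*` stays within a single path level, so this is only an approximation of S3FS's matching.

```python
from fnmatch import fnmatchcase

# Listing copied from the glob output above.
listing = [
    "gesdisc-cumulus-prod-protected/GPM_3IMERGHH",
    "gesdisc-cumulus-prod-protected/GPM_L3",
    "gesdisc-cumulus-prod-protected/M2T1NXSLV",
    "gesdisc-cumulus-prod-protected/MERRA2",
]


def match(paths, pattern):
    """Return the paths matching a glob-style pattern (case-sensitive, like S3 keys)."""
    return [p for p in paths if fnmatchcase(p, pattern)]


gpm = match(listing, "gesdisc-cumulus-prod-protected/GPM_*")
print(gpm)  # ['gesdisc-cumulus-prod-protected/GPM_3IMERGHH', 'gesdisc-cumulus-prod-protected/GPM_L3']
```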

```python
fs.ls('s3://gesdisc-cumulus-prod-protected/')
```

```
['gesdisc-cumulus-prod-protected/GPM_3IMERGHH',
 'gesdisc-cumulus-prod-protected/GPM_L3',
 'gesdisc-cumulus-prod-protected/M2T1NXSLV',
 'gesdisc-cumulus-prod-protected/MERRA2']
```
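As the outputs above show, `ls` and `glob` return paths without the `s3://` prefix. S3FS accepts either form, but other tools may expect full URLs. The two hypothetical helpers below add the prefix when it is missing and split a path into its bucket and key; they are a plain-Python sketch, not part of S3FS.

```python
def to_s3_url(path):
    """Prefix a path returned by fs.ls or fs.glob with s3:// (no-op if already present)."""
    return path if path.startswith("s3://") else "s3://" + path


def split_s3_url(url):
    """Split 's3://bucket/key' or 'bucket/key' into (bucket, key)."""
    bare = url[len("s3://"):] if url.startswith("s3://") else url
    bucket, _, key = bare.partition("/")
    return bucket, key


print(to_s3_url("gesdisc-cumulus-prod-protected/MERRA2"))
# s3://gesdisc-cumulus-prod-protected/MERRA2
print(split_s3_url("s3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4"))
# ('gesdisc-cumulus-prod-protected', 'MERRA2/M2T1NXSLV.5.12.4')
```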

### Open a Granule Directly in Xarray

S3FS can retrieve the metadata of an individual granule, and Xarray can open the granule directly.
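Granule paths in this collection follow a `<collection>/<YYYY>/<MM>/<file name>` layout, as in the path `fn` used in the next cell. The sketch below builds such paths for a range of days with a hypothetical helper `merra2_slv_path`. The layout is inferred from that single example path, and the `MERRA2_400` prefix is an assumption: the MERRA-2 stream number in file names differs between time periods, so it matches the example date but not every date.

```python
from datetime import date, timedelta

# Layout inferred from the example granule path; STREAM is an assumption.
BASE = "s3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4"
STREAM = "MERRA2_400"


def merra2_slv_path(day):
    """Hypothetical helper: S3 path of the daily M2T1NXSLV granule for `day`."""
    return f"{BASE}/{day:%Y}/{day:%m}/{STREAM}.tavg1_2d_slv_Nx.{day:%Y%m%d}.nc4"


def date_range(start, end):
    """Yield each date from start to end, inclusive."""
    d = start
    while d <= end:
        yield d
        d += timedelta(days=1)


paths = [merra2_slv_path(d) for d in date_range(date(2019, 3, 13), date(2019, 3, 15))]
for p in paths:
    print(p)
```

With S3 access, `fs.exists(path)` can confirm that each generated path exists before you open it.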

```python
fn = 's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/03/MERRA2_400.tavg1_2d_slv_Nx.20190313.nc4'

fs.info(fn)
```

```
{'ETag': '"ab39493d3182642efbf610439b3d1d29-2"',
 'Key': 'gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/03/MERRA2_400.tavg1_2d_slv_Nx.20190313.nc4',
 'LastModified': datetime.datetime(2021, 3, 18, 23, 32, 5, tzinfo=tzutc()),
 'Size': 415071782,
 'size': 415071782,
 'name': 'gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/03/MERRA2_400.tavg1_2d_slv_Nx.20190313.nc4',
 'type': 'file',
 'StorageClass': 'STANDARD',
 'VersionId': None}
```
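The `size` field is in bytes, so this single granule is roughly 415 MB (about 395.8 MiB). The small sketch below, with a hypothetical `human_size` helper, converts the value reported above into binary units. Checking the size first helps when deciding whether to download a granule or read it remotely; when opened with Xarray, data variables are generally read lazily, only when accessed.

```python
# Byte count copied from the fs.info(fn) output above.
granule_size = 415071782


def human_size(nbytes):
    """Hypothetical helper: format a byte count with binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(nbytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024


print(human_size(granule_size))  # 395.8 MiB
```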

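Open the file by passing <code>fs.open(fn)</code> into <code>xr.open_dataset()</code>. Because <code>fs.open(fn)</code> returns a file-like object rather than a local path, Xarray reads netCDF-4 files such as this one with the h5netcdf backend.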

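```python
# engine='h5netcdf' names the backend explicitly; decode_cf=True (the default) applies CF decoding
ds = xr.open_dataset(fs.open(fn),
                     engine='h5netcdf',
                     decode_cf=True)

ds
```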
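When no engine is given, Xarray chooses a backend partly by inspecting a file's leading bytes, and netCDF-4 files such as MERRA-2 `.nc4` granules are HDF5 files that begin with an 8-byte HDF5 signature. The sketch below, with a hypothetical helper `starts_with_hdf5_signature`, peeks at those bytes on any seekable file object and then restores its position. It checks only offset 0 (HDF5 also permits a user block that moves the signature further in), so treat it as a rough diagnostic rather than a format validator. With S3 access, you could pass `fs.open(fn)` to it; the demo uses in-memory bytes.

```python
import io

# 8-byte signature at the start of an HDF5 file (netCDF-4 files are HDF5 files).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def starts_with_hdf5_signature(fileobj):
    """Hypothetical helper: peek at the first 8 bytes, then restore the file position."""
    pos = fileobj.tell()
    try:
        fileobj.seek(0)
        return fileobj.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
    finally:
        fileobj.seek(pos)


print(starts_with_hdf5_signature(io.BytesIO(HDF5_SIGNATURE + b"rest of file")))  # True
print(starts_with_hdf5_signature(io.BytesIO(b"CDF\x01")))  # False
```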