# Accessing FITS, CDF and NetCDF from S3
## Comments to sandy.antunes@jhuapl.edu
This notebook walks through how to access FITS, CDF, and NetCDF files stored in AWS S3.  Each cell repeats the Python imports it needs, so each cell can be run on its own.  Let's begin.

First, a quick sanity check to make sure Python is up and running.

In [None]:
print("hello world")

Next we connect to our S3 bucket.  We'll later use different connections, depending on the file, but this is a good example of how to access S3.

In [None]:
import boto3
mybucket='gov-nasa-hdrl-data1'
s3_res = boto3.resource('s3')
s3_bucket = s3_res.Bucket(mybucket)
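
Once you have a bucket resource, it can be handy to see what is actually in it.  The following is a minimal sketch, not part of the original notebook: `list_keys` is a hypothetical helper name, and it assumes it is handed a boto3 `Bucket` resource such as `s3_bucket` above (it relies only on `bucket.objects.filter(Prefix=...)` and the `.key` attribute of each returned object).

```python
def list_keys(bucket, prefix="", limit=10):
    """Return up to `limit` object keys under `prefix` in a boto3 Bucket resource."""
    keys = []
    # objects.filter() yields object summaries lazily, so stopping early
    # avoids walking the whole bucket
    for obj in bucket.objects.filter(Prefix=prefix):
        keys.append(obj.key)
        if len(keys) >= limit:
            break
    return keys

# Example use with the bucket opened above (requires boto3 and network access):
# print(list_keys(s3_bucket, prefix="demo-data/"))
```

The `limit` argument is there only to keep the output short on large buckets; raise it or remove the check if you want the full listing.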

And here is our list of potential files to try, from the GUVI, MMS and PSP missions.  (You can skip the commented-out boxes, again provided to add alternative test cases.)

Here is an example of a 'raw' read, where we fetch part of any binary file and extract information from it.  In this example, we request only the first few bytes of a CDF file, then read the 4-byte 'magic number' that starts the file (which should print as 'cdf30001').

In [None]:
import boto3
import io
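# Read only a specific byte range from an S3 object
s3c = boto3.client('s3')

# Bucket and key are both defined here so this cell runs on its own
mybucket, mykey = 'gov-nasa-hdrl-data1', 'demo-data/mms_fgm.cdf'
# The Range is inclusive, so bytes=0-8 returns the first 9 bytes
obj = s3c.get_object(Bucket=mybucket, Key=mykey, Range='bytes=0-8')
rawdata = obj['Body'].read()
bdata = io.BytesIO(rawdata)

# The first 4 bytes of the file are the CDF magic number
magic_number = bdata.read(4).hex()
print("Should print 'cdf30001' if read was correct:", magic_number)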
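
The same leading-bytes trick can tell the three file types in this notebook apart.  The sketch below is an illustration rather than part of the original notebook: `sniff_format` and `SIGNATURES` are hypothetical names, and the table covers only the common signatures (CDF version 3 and 2.6/2.7, FITS, classic NetCDF, and the HDF5 signature used by NetCDF-4).  It works on plain bytes, so you can pass it the `rawdata` from the cell above.

```python
# Leading-byte signatures, checked in order
SIGNATURES = [
    (bytes.fromhex("cdf30001"), "CDF (v3)"),
    (bytes.fromhex("cdf26002"), "CDF (v2.6/2.7)"),
    (b"SIMPLE  =", "FITS"),
    (b"\x89HDF\r\n\x1a\n", "NetCDF-4 / HDF5"),
    (b"CDF\x01", "NetCDF classic"),
    (b"CDF\x02", "NetCDF 64-bit offset"),
]

def sniff_format(head):
    """Guess a file format from the first few bytes of the file."""
    for signature, name in SIGNATURES:
        if head.startswith(signature):
            return name
    return "unknown"

# Example use with the 9 bytes fetched above:
# print(sniff_format(rawdata))
```

The longest signature here is 9 bytes, so the `bytes=0-8` range request already fetches enough data to check all of them.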

## The Core Examples
Here is the code to read each file, in brief.  We'll then go into each in more depth.

In [None]:
# CDF reading from S3 cloud
import cdflib
s3name="s3://gov-nasa-hdrl-data1/demo-data/mms_fgm.cdf"
with cdflib.CDF(s3name) as cdfin1:
    print(cdfin1.cdf_info())

In [None]:
# CDF reading from an HTTPS URL
import cdflib
s3name="https://gov-nasa-hdrl-data1.s3.amazonaws.com/demo-data/mms_fgm.cdf"
with cdflib.CDF(s3name) as cdfin1:
    print(cdfin1.cdf_info())
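
The two CDF cells above point at the same object, once by its `s3://` name and once by its HTTPS URL.  As a rough sketch of how the two forms relate, the helpers below (hypothetical names, not part of the original notebook) split an `s3://` name into bucket and key and rebuild the `https://<bucket>.s3.amazonaws.com/<key>` form used above.  This assumes a simple key with no characters that need URL-encoding, and the HTTPS form is only readable without credentials if the object is public.

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an s3:// URI: " + repr(uri))
    return parsed.netloc, parsed.path.lstrip("/")

def s3_to_https(uri):
    """Build the virtual-hosted-style HTTPS URL for an s3:// URI."""
    bucket, key = split_s3_uri(uri)
    return f"https://{bucket}.s3.amazonaws.com/{key}"

print(s3_to_https("s3://gov-nasa-hdrl-data1/demo-data/mms_fgm.cdf"))
```

`split_s3_uri` is also useful on its own, since the boto3 calls elsewhere in this notebook take the bucket and key as separate arguments.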

In [None]:
# FITS, using s3fs, reading from S3 cloud
import astropy.io.fits
# Note: recent AstroPy releases can open s3:// paths directly when the optional fsspec and s3fs packages are installed
s3name="s3://gov-nasa-hdrl-data1/demo-data/sdo_aia.fits"
try:
    data = astropy.io.fits.open(s3name)
    print("astropy opened the S3 path directly!")
except Exception:
    # Fall back to opening the object with s3fs and handing the file object to AstroPy
    print("astropy could not open the S3 path directly, using 's3fs'")
    import s3fs
    fs=s3fs.S3FileSystem(anon=False)
    fgrab = fs.open(s3name)
    data = astropy.io.fits.open(fgrab)

print(data[0].header[0:10])

In [None]:
# NetCDF via xarray, using s3fs, reading from S3 cloud
import s3fs
import xarray as xr
s3name="s3://gov-nasa-hdrl-data1/demo-data/guvi_spect.nc"
fs=s3fs.S3FileSystem(anon=False)
fgrab = fs.open(s3name)
dataset = xr.open_dataset(fgrab)
print(dataset)
dataset.close()
fgrab.close()

In [None]:
# Example of reading a whole file into an in-memory byte stream, for use with a reader that is not S3-aware (in this case, AstroPy)
import astropy.io.fits
import io
import boto3

s3c = boto3.client('s3')
mybucket, mykey = 'gov-nasa-hdrl-data1', 'demo-data/sdo_aia.fits'

fobj = s3c.get_object(Bucket=mybucket,Key=mykey)
rawdata = fobj['Body'].read()
bdata = io.BytesIO(rawdata)
data = astropy.io.fits.open(bdata,memmap=False)

header = data[0].header
print(header[0:10])
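
The byte-stream pattern above and the earlier byte-range read differ only in whether a `Range` is passed to `get_object`.  The sketch below folds both into one helper.  `fetch_bytes` is a hypothetical name, not part of the original notebook, and it takes the boto3 client as an argument so the same function works with any client you have already configured.

```python
import io

def fetch_bytes(client, bucket, key, first=None, last=None):
    """Read an S3 object (or an inclusive byte range of it) into a BytesIO.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client('s3').
    """
    kwargs = {"Bucket": bucket, "Key": key}
    if first is not None:
        # HTTP ranges are inclusive; an empty end means "to the end of the file"
        end = "" if last is None else str(last)
        kwargs["Range"] = f"bytes={first}-{end}"
    response = client.get_object(**kwargs)
    return io.BytesIO(response["Body"].read())

# Example use (requires boto3, credentials, and network access):
# import boto3
# bdata = fetch_bytes(boto3.client('s3'), 'gov-nasa-hdrl-data1', 'demo-data/sdo_aia.fits')
```

Keep in mind that without a range this reads the entire object into memory, which is fine for the demo files here but may not suit very large files.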


Comments? Feel free to contact the author.