# Accessing FITS, CDF and NetCDF from S3
## Comments to sandy.antunes@jhuapl.edu
Here we walk through how to access FITS, CDF, and NetCDF files stored in AWS S3. This notebook also includes simple read-speed timing tests. Let's begin.

First, we do the usual Python imports we'll need.

In [11]:
import time
import xarray as xr
import sunpy.io as sunio
import spacepy.datamodel as spio
import cdflib
import boto3
import s3fs
import io
import astropy

Next, a quick sanity check to make sure Python is up and running. Because we use this notebook for time trials, you can raise 'iloop' above 1 to repeat each read and gather better statistics on read times. For now, we'll just do one read.

In [12]:
print("hello world")
iloop=1  # for gathering stats

hello world
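
The repeat-and-time pattern that 'iloop' controls can be factored into a small helper. The following is a minimal sketch, not part of the original notebook: `time_reads` and `summarize` are hypothetical names, and the lambda is a stand-in workload to be replaced with a function that performs one S3 read.

```python
import statistics
import time


def time_reads(read_fn, n_loops=1):
    """Call read_fn() n_loops times; return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(n_loops):
        start = time.perf_counter()
        read_fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean, min, max) of a list of durations."""
    return statistics.mean(durations), min(durations), max(durations)


# Stand-in workload (hypothetical); swap in a function that does one S3 read.
durations = time_reads(lambda: sum(range(10_000)), n_loops=5)
mean_s, min_s, max_s = summarize(durations)
print(f"mean={mean_s:.6f}s min={min_s:.6f}s max={max_s:.6f}s over {len(durations)} runs")
```

Keeping the per-call durations, rather than one total, makes it easier to see whether the first read is slower than later ones.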


Next we connect to our S3 bucket. Later cells use different connection methods depending on the file format, but this is a good example of the basic boto3 pattern for accessing S3.

In [13]:
mybucket='helio-public'
s3_res = boto3.resource('s3')
s3_bucket = s3_res.Bucket(mybucket)
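
One way to confirm the connection works is to list what is stored under a prefix. This is a sketch under assumptions: `list_keys` is a hypothetical helper, it expects a boto3 Bucket resource such as `s3_bucket`, and the 'skantunes/' prefix is the one used by the later cells.

```python
def list_keys(bucket, prefix=""):
    """Return (key, size_in_bytes) pairs for the objects in `bucket` under `prefix`.

    `bucket` is expected to be a boto3 Bucket resource; any object exposing
    objects.filter(Prefix=...) whose items have .key and .size behaves the same.
    """
    return [(obj.key, obj.size) for obj in bucket.objects.filter(Prefix=prefix)]


# Usage against the bucket opened above (needs AWS credentials and network access):
# for key, size in list_keys(s3_bucket, "skantunes/"):
#     print(f"{size:>12d}  {key}")
```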

And here is our list of files to try, from the GUVI, MMS and PSP missions. (The commented-out cells that follow are optional alternative test cases; each restricts the run to the one file whose native format is CDF, NetCDF, or FITS, respectively.)

In [14]:
fitsfiles = ['guvi_spect.fits','mms_fgm.fits','psp_wispr.fts']
cdffiles = ['guvi_spect.cdf','mms_fgm.cdf','psp_wispr.cdf']
netcdffiles = ['guvi_spect.nc','mms_fgm.nc','psp_wispr.nc']

In [15]:
## just cdf-native
#fitsfiles = ['mms_fgm.fits']
#cdffiles = ['mms_fgm.cdf']
#netcdffiles = ['mms_fgm.nc']

In [16]:
## just nc-native
#fitsfiles = ['guvi_spect.fits']
#cdffiles = ['guvi_spect.cdf']
#netcdffiles = ['guvi_spect.nc']

In [17]:
## just fits-native
#fitsfiles = ['psp_wispr.fts']
#cdffiles = ['psp_wispr.cdf']
#netcdffiles = ['psp_wispr.nc']

Here is an example of a 'raw' read, where we fetch a byte range from any binary object and inspect it. In this example, we request the first few bytes of a CDF file and print the first four as hex. These four bytes are the CDF 'magic number' (a format signature, not a checksum), which should read as 'cdf30001'.

In [18]:
# S3 read of specific bytes (the Range header is inclusive, so 0-8 returns 9 bytes; only the first 4 are used below)
s3c = boto3.client('s3')
mykey='skantunes/mms_fgm.cdf'
obj = s3c.get_object(Bucket=mybucket,Key=mykey,Range='bytes=0-8')
rawdata=obj['Body'].read()
bdata=io.BytesIO(rawdata)
magic_number=bdata.read(4).hex()
print(magic_number)

cdf30001
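
The same ranged read can identify the other formats too, since each starts with a recognizable signature. Below is a minimal sketch: `sniff_format` is a hypothetical helper, and the signatures are the ones I understand the CDF, FITS, NetCDF classic, and HDF5 formats to use. Note that the HDF5 signature matches any HDF5 file, not only NetCDF-4.

```python
def sniff_format(head: bytes) -> str:
    """Guess a file format from its leading bytes (pass at least the first 9)."""
    if head[:4].hex() in ("cdf30001", "cdf26002"):
        return "CDF"
    if head[:9] == b"SIMPLE  =":
        return "FITS"
    if head[:3] == b"CDF" and head[3:4] in (b"\x01", b"\x02", b"\x05"):
        return "NetCDF (classic)"
    if head[:8] == b"\x89HDF\r\n\x1a\n":
        return "HDF5 (possibly NetCDF-4)"
    return "unknown"


# With the 9 bytes fetched above, this would be called as sniff_format(rawdata).
print(sniff_format(bytes.fromhex("cdf300010000ffff00")))
print(sniff_format(b"SIMPLE  =                    T"))
```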


## The Core Examples
Here is the code to read each file, in brief.  We'll then go into each in more depth.

In [31]:
# CDF
import cdflib
s3name="s3://helio-public/skantunes/mms_fgm.cdf"
cdfin1=cdflib.CDF(s3name)
cdfin1.close()

# FITS
import astropy.io.fits
fobj = s3c.get_object(Bucket=mybucket,Key='skantunes/psp_wispr.fts')
rawdata = fobj['Body'].read()
bdata = io.BytesIO(rawdata)
data = astropy.io.fits.open(bdata,memmap=False)

# NetCDF
fs=s3fs.S3FileSystem(anon=False)
s3name='helio-public' + '/skantunes/'+'psp_wispr.nc'
fgrab = fs.open(s3name)
dataset = xr.open_dataset(fgrab)
dataset.close()
fgrab.close()

got flags  s3://helio-public/skantunes/mms_fgm.cdf None ascii 1
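
The three readers above name the same object in three ways: a full 's3://' URI for cdflib, a separate Bucket and Key for boto3, and a 'bucket/key' path for s3fs. A small helper can convert between them. This is a sketch only: `split_s3_uri` is a hypothetical name, and it assumes keys contain no '?' or '#' characters, which `urlparse` would treat as URL syntax.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str):
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://helio-public/skantunes/mms_fgm.cdf")
print(bucket, key)        # helio-public skantunes/mms_fgm.cdf
print(f"{bucket}/{key}")  # the 'bucket/key' form passed to fs.open() above
```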


## More detailed analysis
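Now we use the updated CDF library to read in all three CDF files and verify their contents. The library can read a local file, a remote URL, or an S3 file. Versatile! The cell below exercises two of these: each file is opened once from local disk and once from S3, and the reported time covers both reads.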

In [22]:
# new S3-aware CDF library
import cdflib
start=time.time()
s3stem = "s3://"+mybucket+"/skantunes/"
# iloop was set in an earlier cell; raise it there to gather timing statistics
for i in range(iloop):
    for fname in cdffiles:
        s3fname = s3stem + fname
        #print(s3fname)
        cdfin1=cdflib.CDF(fname)
        cdfin2=cdflib.CDF(s3fname,False,'ascii',2)
        test1=cdfin1.cdf_info()
        test2=cdfin2.cdf_info()
        #print(fname,cdfin1,test1,"\n\n",s3fname,cdfin2,test2,"\n")
        cdfin1.close()
        cdfin2.close()
print("CDF S3 read, new library, time ",time.time()-start)

got flags  /home/jovyan/guvi_spect.cdf None ascii 1
got flags  s3://helio-public/skantunes/guvi_spect.cdf 2 ascii 1
got flags  /home/jovyan/mms_fgm.cdf None ascii 1
got flags  s3://helio-public/skantunes/mms_fgm.cdf 2 ascii 1
got flags  /home/jovyan/psp_wispr.cdf None ascii 1
got flags  s3://helio-public/skantunes/psp_wispr.cdf 2 ascii 1
CDF S3 read, new library, time  1.0311083793640137


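Next we read in the FITS files using Astropy (astropy.io.fits). Each object is downloaded in full into memory and wrapped in a BytesIO buffer, which is why memmap is turned off. The cell prints the first header card of the last file read.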

In [21]:
import astropy.io.fits
start=time.time()
for i in range(iloop):
    for fname in fitsfiles:
        fobj = s3c.get_object(Bucket=mybucket,Key='skantunes/'+fname)
        rawdata = fobj['Body'].read()
        bdata = io.BytesIO(rawdata)
        data = astropy.io.fits.open(bdata,memmap=False)
        header = data[0].header
print(header[0:1])
print("Fits S3 astropy read, time=",time.time()-start)

SIMPLE  =                    T / Written by IDL:  Thu Sep 24 12:10:46 2020      END
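
If only the primary header is needed, a ranged read like the earlier raw-read example could avoid downloading the whole file. The sketch below is not from the original notebook: `parse_fits_cards` is a hypothetical helper that relies on the FITS layout of 80-character header cards packed into 2880-byte blocks, and it is demonstrated here on a synthetic block rather than on S3 data.

```python
CARD_LEN = 80     # each FITS header card is 80 ASCII characters
BLOCK_LEN = 2880  # headers are padded out to a multiple of 2880 bytes


def parse_fits_cards(raw: bytes):
    """Return the header cards before END, or None if END is not within `raw`."""
    cards = []
    for offset in range(0, len(raw) - CARD_LEN + 1, CARD_LEN):
        card = raw[offset:offset + CARD_LEN].decode("ascii").rstrip()
        if card == "END":
            return cards
        cards.append(card)
    return None  # header continues past the bytes supplied; fetch another block


def _pad(card: str) -> bytes:
    """Pad one card to 80 bytes, as it would appear on disk."""
    return card.ljust(CARD_LEN).encode("ascii")


# Synthetic one-block header standing in for the first 2880 bytes of a file.
block = (_pad("SIMPLE  =                    T / conforms to FITS standard")
         + _pad("BITPIX  =                   16")
         + _pad("NAXIS   =                    0")
         + _pad("END")).ljust(BLOCK_LEN, b" ")
print(parse_fits_cards(block))

# Against S3, the first block could be requested with (untested here):
# obj = s3c.get_object(Bucket=mybucket, Key='skantunes/psp_wispr.fts', Range='bytes=0-2879')
# cards = parse_fits_cards(obj['Body'].read())
```

When the function returns None, the header spans more than the bytes supplied, so the caller would request the next 2880-byte block and parse the combined bytes.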



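Finally we read in NetCDF files using xarray. Because xarray opens datasets lazily, this cell times opening each file and reading its metadata, not loading the full arrays; uncomment the dataset.load() line to include the data transfer in the timing.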

In [23]:
print("NetCDF S3 read")
fs=s3fs.S3FileSystem(anon=False)
mybucket='helio-public'
s3dirloc=mybucket + '/skantunes/'
fs.ls(s3dirloc)  # list the directory to confirm access; the result is not used
start=time.time()
for i in range(iloop):
    for fname in netcdffiles:
        s3name = s3dirloc + fname
        fgrab = fs.open(s3name)
        dataset = xr.open_dataset(fgrab)
        #d2=dataset.load()
        dataset.close()
        fgrab.close()
print("NetCDF S3 read, time=",time.time()-start)

NetCDF S3 read
NetCDF S3 read, time= 1.2773354053497314


Comments? Feel free to contact the author.