# Accessing AWS L2 satellite data

# GOES-16, GOES-17, and Himawari

- Funding: Interagency Implementation and Advanced Concepts Team [IMPACT](https://earthdata.nasa.gov/esds/impact) for the Earth Science Data Systems (ESDS) program and AWS Public Dataset Program
  
### Credits: Tutorial development
* [Dr. Chelle Gentemann](mailto:gentemann@faralloninstitute.org) -  [Twitter](https://twitter.com/ChelleGentemann)   - Farallon Institute

### Here we will demonstrate some ways to access the different geostationary datasets on AWS:
- AWS GOES sea surface temperatures  (L2)
- AWS Himawari sea surface temperatures (L2)

# How to find an AWS Public Dataset

- Click here: [AWS Public Dataset](https://aws.amazon.com/opendata/)
- Click the `Find public available data on AWS` button
- Search for GOES
- Select [GOES-16 and GOES-17](https://registry.opendata.aws/noaa-goes/) or [Himawari](https://registry.opendata.aws/noaa-himawari/)

## NetCDF Geostationary data

- When data is moved to the cloud in its original format, it is still useful, but it can be challenging to work with because the older formats lack consolidated metadata.  

- GOES has many different parameters, each stored in separate files with names that are difficult to interpret.  There are *80* different GOES data products.  Himawari has 5 different products.  This page lists and explains the GOES products: [AWS info on GOES SST data](https://docs.opendata.aws/noaa-goes16/cics-readme.html).  

- The GOES and Himawari data are stored in netCDF format.  There is a separate directory for each of the 80 products, subdivided by year/day/hour.  To browse the S3 buckets and understand the directory and data structure, see [Explore S3 structure](https://noaa-goes16.s3.amazonaws.com/index.html).  For the GOES buckets, the directory structure is `<Product>/<Year>/<Day of Year>/<Hour>/<Filename>`

- In the code below we create functions that search for all products available from each satellite and then read in a full day of data. The files are netCDF, so opening a day takes about 3-4 minutes.  There are other ways you could write these functions depending on your analysis goals; this is just one way to get some data in a reasonable amount of time. 
- These functions use:
  - [`s3fs.S3FileSystem`](https://s3fs.readthedocs.io/en/latest/), which holds a connection to an S3 bucket and allows you to list files, etc.  
  - [`xr.open_mfdataset`](http://xarray.pydata.org/en/stable/generated/xarray.open_mfdataset.html#xarray.open_mfdataset), which opens a list of filenames and concatenates the data along the specified dimensions  
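
As a rough illustration of the directory layout described above, the sketch below builds the glob pattern for one day of files without touching the network. The helper name `day_glob_pattern` is hypothetical and not part of this notebook. The bucket roots are copied from the `dir_base` dictionary defined later, and the Himawari year/month/day layout is inferred from the glob used in `get_geo_filenames`, so treat it as an assumption.

```python
import datetime as dt

# Bucket roots copied from this notebook's `dir_base` dictionary.
DIR_BASE = {
    "goes-east": "s3://noaa-goes16/",
    "goes-west": "s3://noaa-goes17/",
    "himawari": "s3://noaa-himawari8/",
}


def day_glob_pattern(sat, product, year, day_of_year):
    """Hypothetical helper: glob pattern matching one day of netCDF files."""
    # Day of year 1 is January 1, so offset by (day_of_year - 1).
    date = dt.datetime(year, 1, 1) + dt.timedelta(days=day_of_year - 1)
    base = DIR_BASE[sat] + product
    if sat == "himawari":
        # Assumed layout: <Product>/<Year>/<Month>/<Day>/<Hour>/<Filename>
        return f"{base}/{date:%Y/%m/%d}/*/*.nc"
    # GOES layout: <Product>/<Year>/<Day of Year>/<Hour>/<Filename>
    return f"{base}/{year:04d}/{day_of_year:03d}/*/*.nc"


print(day_glob_pattern("goes-west", "ABI-L2-SSTF", 2020, 215))
```

A pattern like this could then be passed to `fs.glob(...)` on an anonymous `s3fs.S3FileSystem` to list the matching files.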

### To run this notebook

Code is in the cells that have <span style="color: blue;">In [  ]:</span> to the left of the cell and have a colored background

To run the code:
- option 1) click anywhere in the cell, then hold `shift` down and press `Enter`
- option 2) click on the Run button at the top of the page in the dashboard

Remember:
- to insert a new cell below press `Esc` then `b`
- to delete a cell press `Esc` then `dd`

### First start by importing libraries


In [None]:
# filter some warning messages
import warnings 
warnings.filterwarnings("ignore") 

#libraries
import datetime as dt
import xarray as xr
import fsspec
import s3fs
from matplotlib import pyplot as plt
# make datasets display nicely
xr.set_options(display_style="html")  
import panel as pn
pn.extension()

# magic functions: embed static images of your plots in the notebook
%matplotlib inline  
plt.rcParams['figure.figsize'] = 12, 6
%config InlineBackend.figure_format = 'retina' 

## Create functions for the drop down menus and reading the data

In [None]:
dir_base={'goes-east':'s3://noaa-goes16/','goes-west':'s3://noaa-goes17/','himawari':'s3://noaa-himawari8/'}

#list the L2 products available for the selected geo satellite
def prod_select(geo_sat):
    s3_base = dir_base[geo_sat]
    fs = s3fs.S3FileSystem(anon=True) #connect to s3 bucket!
    file_location = fs.glob(s3_base+'*-L2-*')
    ipos=file_location[0].find('/')
    file_types = [file[ipos+1:] for file in file_location]
    vardict = file_types
    return vardict

#get list of daily files
def get_geo_filenames(sat,geo_product,lyr,idyjl):
    # arguments
    # sat, geo_product  =   satellite / product
    # lyr,idyjl         =   year, day of year (1 = January 1)
    
    # day of year 1 is January 1, so the offset from January 1 is idyjl-1
    d = dt.datetime(lyr,1,1) + dt.timedelta(days=idyjl-1)
    fs = s3fs.S3FileSystem(anon=True) #connect to s3 bucket!

    #create strings for the year and julian day
    imon,idym=d.month,d.day
    syr,sjdy,smon,sdym = str(lyr).zfill(4),str(idyjl).zfill(3),str(imon).zfill(2),str(idym).zfill(2)
    s3_base = dir_base[sat]
    #use glob to list all the files in the directory
    if sat=='himawari':
        file_location = fs.glob(s3_base + geo_product + '/'+syr+'/'+smon+'/'+sdym+'/*/*.nc')
    else:
        file_location = fs.glob(s3_base + geo_product + '/'+syr+'/'+sjdy+'/*/*.nc')
    return file_location

#get 1 day data
def get_geo_data(file_location):
    # arguments
    # file_location = list of files to open
    # returns the day's data as a dataset, or None if no files were found
    
    #make a list of links to the file keys
    fs = s3fs.S3FileSystem(anon=True) #connect to s3 bucket!
    if len(file_location)<1:
        return None
    file_ob = [fs.open(file) for file in file_location]        #open connection to files
   
    #open all the day's data
    with xr.open_mfdataset(file_ob,combine='nested',concat_dim='time') as ds:      
        if 'himawari' not in file_location[0]:   #use the function argument, not the global `files`
            ds = ds.rename({'t':'time'})
            ds = ds.reset_coords()
        else:
            ds = ds.rename({'ni':'x','nj':'y'})  
    return ds


## This code creates a drop down menu to select a satellite 

In [None]:
def callback(target, event):
    var = prod_select(event.new)
    select_product.options=var
    return var

select_product =  pn.widgets.Select(name='Product ID')
select_satellite = pn.widgets.Select(name='Satellite', options= ['goes-east','goes-west','himawari'])
select_satellite.link(select_product, callbacks={'value': callback})

pn.Row(select_satellite, select_product)

- For a first test, we suggest you set the dropdown menus above to 'goes-west' and 'ABI-L2-SSTF'

In [None]:
lyr,idyjl=2020,215
files = get_geo_filenames(select_satellite.value,select_product.value,lyr,idyjl)
print('Number of files:', len(files))
for file in files[0:5]:
    print(file)

## OPTIONAL: You may want to filter filenames to only the data you require
 - for example, for Himawari clouds you may only want the MASK
 - for Himawari SST you may only want L2 rather than L3
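
One thing to watch when filtering on a substring: a short token such as `L2` can also appear in the product directory name, so it would match every path. The sketch below, using a hypothetical helper `filter_by_name` and made-up paths (not real object keys), checks only the file name portion of each path.

```python
import posixpath


def filter_by_name(paths, token):
    """Hypothetical helper: keep paths whose file name (not directory) contains token."""
    return [p for p in paths if token in posixpath.basename(p)]


# Made-up paths for illustration only; they are not real object keys.
example_paths = [
    "some-bucket/SOME-L2-PRODUCT/2020/08/02/0000/a-L2P-file.nc",
    "some-bucket/SOME-L2-PRODUCT/2020/08/02/0000/a-L3C-file.nc",
    "some-bucket/SOME-L2-PRODUCT/2020/08/02/0100/b-L2P-file.nc",
]

l2p_only = filter_by_name(example_paths, "L2P")
print(len(l2p_only))
```

Matching against the whole path is simpler and works well for distinctive tokens like `MASK` or `L2P`; restricting the match to the file name is only a safeguard for tokens that also occur in directory names.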

In [None]:
#files = [s for s in files if "MASK" in s]  #only retain files with 'MASK' in them

files = [s for s in files if "L2P" in s]  #only retain files with 'L2P' in them

for file in files[0:5]:
    print(file)

## Read in data

In [None]:
ds = get_geo_data(files)
ds