# Intro

This notebook describes how to download NESDIS satellite data from the Amazon cloud servers (AWS) using the nesdis_aws library. nesdis_aws is a wrapper around the s3fs library, which is briefly introduced further down on this page.

For info on the available data projects go to: https://docs.opendata.aws/noaa-goes16/cics-readme.html

# imports

In [1]:
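# Module to interface with s3 (AWS)
import s3fs
import pathlib as pl
# pandas (pd) and reload are used further down in this notebook
import pandas as pd
from importlib import reload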

# The nesdis_aws package

Available here: https://github.com/hagne/nesdis_aws

In [1]:
import nesdis_aws

In [3]:
nesdis_aws.nesdis_aws.readme()

follow link for readme: https://docs.opendata.aws/noaa-goes16/cics-readme.html


### product information

In [117]:
reload(nesdis_aws)
reload(nesdis_aws.nesdis_aws)

<module 'nesdis_aws.nesdis_aws' from '/mnt/telg/prog/nesdis_aws/nesdis_aws/nesdis_aws.py'>

In [100]:
available_products = nesdis_aws.nesdis_aws.get_available_products()

In [108]:
available_products

Unnamed: 0,16-C,16-F,16-M,17-C,17-F,17-M
ABI-L1b-Rad,2017-03-01,2017-03-01,2017-03-01,2018-08-29,2018-08-29,2018-08-29
ABI-L2-ACHA,2019-12-03,2019-12-03,2019-12-03,2019-12-03,2019-12-03,2019-12-03
ABI-L2-ACHT,-,2019-12-06,2019-12-06,-,2019-12-06,2019-12-06
ABI-L2-ACM,2017-04-20,2019-12-03,2019-12-03,2018-08-28,2019-12-03,2019-12-03
ABI-L2-ACTP,2017-05-17,2017-05-17,2019-12-03,2018-08-28,2018-08-28,2019-12-03
ABI-L2-ADP,2019-12-04,2019-12-04,2019-12-04,2019-12-04,2019-12-04,2019-12-04
ABI-L2-AICE,-,2021-02-25,-,-,2021-02-26,-
ABI-L2-AITA,-,2021-02-25,-,-,2021-02-26,-
ABI-L2-AOD,2017-05-25,2019-12-06,-,2018-08-28,2019-12-06,-
ABI-L2-BRF,2021-08-19,2021-08-19,2021-08-19,2021-08-19,2021-08-19,2021-08-19


In [105]:
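def test(x, start = '2018-08-01'):
    """Return True if x is a date earlier than start."""
    try:
        dt = pd.to_datetime(x)
    except (ValueError, TypeError):
        # entries such as '-' are not dates
        return False
    dtstart = pd.to_datetime(start)
    return dtstart > dt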

In [110]:
available_products[available_products.applymap(test)].dropna(how = 'all')

Unnamed: 0,16-C,16-F,16-M,17-C,17-F,17-M
ABI-L1b-Rad,2017-03-01,2017-03-01,2017-03-01,,,
ABI-L2-ACM,2017-04-20,,,,,
ABI-L2-ACTP,2017-05-17,2017-05-17,,,,
ABI-L2-AOD,2017-05-25,,,,,
ABI-L2-CMIP,2017-03-01,2017-03-01,2017-03-01,,,
ABI-L2-COD,2017-06-09,,,,,
ABI-L2-FDC,2017-05-25,2017-05-25,,,,
ABI-L2-LST,2017-05-25,2017-05-25,,,,
ABI-L2-MCMIP,2017-03-01,2017-03-01,2017-03-01,,,


## Make a query

Initiate a data query. This generates a list of the available files, together with an estimate of the disk space needed and of whether sufficient space is available.

In [7]:
query = nesdis_aws.AwsQuery(path2folder_local='/mnt/telg/data/smoke_events/20200912_18_CO/goes_raw/',
                            satellite='16',
                            product='ABI-L2-AOD',
                            scan_sector='C',
                            start='2020-09-12 12:00:00',
                            end='2020-09-19 13:00:00',
                            no_of_days=None,
                            last_x_days=None,
                            max_no_of_files=100)

In [8]:
print(query.info_on_current_query())


no of files: 2026
estimated disk usage: 7771 mb
remaining disk space after download: 48 %
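
A figure like the "remaining disk space" above can be approximated with the standard library. This is a minimal sketch and not the nesdis_aws implementation: `remaining_after_download` is a hypothetical helper, 1 MB is assumed to be 10^6 bytes, and the 7771 MB is taken from the output above.

```python
import shutil

def remaining_after_download(folder, download_mb):
    """Return the fraction of the disk still free after the download.

    Hypothetical helper: `folder` is any path on the target disk,
    `download_mb` the estimated download size in megabytes (10^6 bytes).
    """
    usage = shutil.disk_usage(folder)  # named tuple: total, used, free (bytes)
    free_after = usage.free - download_mb * 1e6
    return free_after / usage.total

fraction = remaining_after_download(".", 7771)
if fraction < 0:
    print("not enough disk space for this download")
else:
    print(f"remaining disk space after download: {fraction * 100:.0f} %")
```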



## get subset of workplan

The workplan is a pandas DataFrame that can be altered, e.g. truncated or resampled.

### Examples

Only consider every 10th row:

In [9]:
query.workplan[::10]

Unnamed: 0,path2file_aws,path2file_local
2020-09-12 12:01:15,noaa-goes16/ABI-L2-AODC/2020/256/12/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...
2020-09-12 12:51:15,noaa-goes16/ABI-L2-AODC/2020/256/12/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...
2020-09-12 13:41:15,noaa-goes16/ABI-L2-AODC/2020/256/13/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...
2020-09-12 14:31:15,noaa-goes16/ABI-L2-AODC/2020/256/14/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...
2020-09-12 15:21:15,noaa-goes16/ABI-L2-AODC/2020/256/15/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...
...,...,...
2020-09-19 09:11:16,noaa-goes16/ABI-L2-AODC/2020/263/09/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...
2020-09-19 10:01:16,noaa-goes16/ABI-L2-AODC/2020/263/10/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...
2020-09-19 10:51:16,noaa-goes16/ABI-L2-AODC/2020/263/10/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...
2020-09-19 11:41:16,noaa-goes16/ABI-L2-AODC/2020/263/11/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...


Only one file every 15 minutes:

In [10]:
resampled = query.workplan.resample('15min').first()
resampled

Unnamed: 0,path2file_aws,path2file_local
2020-09-12 12:00:00,noaa-goes16/ABI-L2-AODC/2020/256/12/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...
2020-09-12 12:15:00,noaa-goes16/ABI-L2-AODC/2020/256/12/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...
2020-09-12 12:30:00,noaa-goes16/ABI-L2-AODC/2020/256/12/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...
2020-09-12 12:45:00,noaa-goes16/ABI-L2-AODC/2020/256/12/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...
2020-09-12 13:00:00,noaa-goes16/ABI-L2-AODC/2020/256/13/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...
...,...,...
2020-09-19 11:45:00,noaa-goes16/ABI-L2-AODC/2020/263/11/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...
2020-09-19 12:00:00,noaa-goes16/ABI-L2-AODC/2020/263/12/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...
2020-09-19 12:15:00,noaa-goes16/ABI-L2-AODC/2020/263/12/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...
2020-09-19 12:30:00,noaa-goes16/ABI-L2-AODC/2020/263/12/OR_ABI-L2-...,/mnt/telg/data/smoke_events/20200912_18_CO/goe...
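
To see what `resample('15min').first()` does, here is a small sketch on a made-up workplan. The file names `file_0.nc` and so on are placeholders, not real AWS paths.

```python
import pandas as pd

# Toy workplan: one file every 5 minutes, offset like the real timestamps.
index = pd.date_range("2020-09-12 12:01:15", periods=6, freq="5min")
toy_workplan = pd.DataFrame(
    {"path2file_aws": [f"file_{i}.nc" for i in range(6)]}, index=index
)

# Bin into 15-minute intervals and keep the first file of each bin.
thinned = toy_workplan.resample("15min").first()

# Bins without any file would show up as rows of missing values; drop them.
thinned = thinned.dropna()
print(thinned)
```

Note that the new index holds the start of each 15-minute bin, not the timestamp of the file that was kept.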


To update the query's workplan, overwrite it with the altered one:

In [11]:
query.workplan = resampled

In [12]:
print(query.info_on_current_query())

no of files: 676
estimated disk usage: 2512 mb
remaining disk space after download: 48 %



## download the files

To download all files in the workplan execute:

In [None]:
query.download()

## process while downloading (experimental)

Provide a function that is applied to each file after it has been downloaded. One can choose to delete the downloaded file after processing. This is most useful when storage is limited. Note that an additional column named path2file_local_processed is added to the workplan.

In [None]:
def function(row):
    """ A function that takes a single row of the workplan and does things based on that.
    For Example:
        - open the downloaded file: data = some_open_function(row.path2file_local)
        - process the data: data_processed = some_processing_function(data)
        - save the processed data: some_save_function(data_processed, row.path2file_local_processed)
    """

query = nesdis_aws.AwsQuery(path2folder_local='/mnt/telg/data/smoke_events/20200912_18_CO/goes_raw/',
                            satellite='16',
                            product='ABI-L2-AOD',
                            scan_sector='C',
                            start='2020-09-12 12:00:00',
                            end='2020-09-19 13:00:00',
                            no_of_days=None,
                            last_x_days=None,
                            process = dict(function = function,
                                           prefix = 'ABI_L2_AOD_processed',
                                           path2processed = '/my/processed/files/'),
                           )
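
A processing function can be tried out on a stand-in row before it is handed to a query. The sketch below is purely illustrative: `record_file_size` is a hypothetical processing step, and the row is faked with `types.SimpleNamespace`, assuming only that a workplan row exposes `path2file_local` and `path2file_local_processed` as described above.

```python
import pathlib
import tempfile
import types

def record_file_size(row):
    """Hypothetical processing step: store the size of the downloaded file.

    `row` only needs the attributes path2file_local and
    path2file_local_processed.
    """
    source = pathlib.Path(row.path2file_local)
    target = pathlib.Path(row.path2file_local_processed)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(f"{source.name},{source.stat().st_size}\n")

# Try the function on a stand-in row, without touching AWS.
with tempfile.TemporaryDirectory() as tmp:
    raw = pathlib.Path(tmp, "raw", "example.nc")
    raw.parent.mkdir()
    raw.write_bytes(b"0" * 10)  # 10 bytes of dummy content
    row = types.SimpleNamespace(
        path2file_local=raw,
        path2file_local_processed=pathlib.Path(tmp, "processed", "example.csv"),
    )
    record_file_size(row)
    summary = row.path2file_local_processed.read_text()
print(summary)
```

Such a function would then be passed as the function object itself (`function = record_file_size`, without calling it) in the `process` dictionary.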

# AWS and the s3fs library

This is an introduction to using the s3fs library for accessing AWS, in case you don't want to use the nesdis_aws package or simply want to understand the key library behind it.

## connect to file system

In [119]:
import s3fs

In [120]:
aws = s3fs.S3FileSystem(anon=True)

## explore file system

In [122]:
satellite = 16  # 16 (east) or 17 (west)
base_folder = pl.Path(f'noaa-goes{satellite}')
base_folder

PosixPath('noaa-goes16')

### products

In [123]:
products_available = aws.glob(base_folder.joinpath('*').as_posix())
products_available

['noaa-goes16/ABI-L1b-RadC',
 'noaa-goes16/ABI-L1b-RadF',
 'noaa-goes16/ABI-L1b-RadM',
 'noaa-goes16/ABI-L2-ACHAC',
 'noaa-goes16/ABI-L2-ACHAF',
 'noaa-goes16/ABI-L2-ACHAM',
 'noaa-goes16/ABI-L2-ACHTF',
 'noaa-goes16/ABI-L2-ACHTM',
 'noaa-goes16/ABI-L2-ACMC',
 'noaa-goes16/ABI-L2-ACMF',
 'noaa-goes16/ABI-L2-ACMM',
 'noaa-goes16/ABI-L2-ACTPC',
 'noaa-goes16/ABI-L2-ACTPF',
 'noaa-goes16/ABI-L2-ACTPM',
 'noaa-goes16/ABI-L2-ADPC',
 'noaa-goes16/ABI-L2-ADPF',
 'noaa-goes16/ABI-L2-ADPM',
 'noaa-goes16/ABI-L2-AICEF',
 'noaa-goes16/ABI-L2-AITAF',
 'noaa-goes16/ABI-L2-AODC',
 'noaa-goes16/ABI-L2-AODF',
 'noaa-goes16/ABI-L2-BRFC',
 'noaa-goes16/ABI-L2-BRFF',
 'noaa-goes16/ABI-L2-BRFM',
 'noaa-goes16/ABI-L2-CMIPC',
 'noaa-goes16/ABI-L2-CMIPF',
 'noaa-goes16/ABI-L2-CMIPM',
 'noaa-goes16/ABI-L2-CODC',
 'noaa-goes16/ABI-L2-CODF',
 'noaa-goes16/ABI-L2-CPSC',
 'noaa-goes16/ABI-L2-CPSF',
 'noaa-goes16/ABI-L2-CPSM',
 'noaa-goes16/ABI-L2-CTPC',
 'noaa-goes16/ABI-L2-CTPF',
 'noaa-goes16/ABI-L2-DMWC',
 'no

### example product FDC conus

In [124]:
# product = 'ABI-L2-DMWVC'  # or 'ABI-L1b-RadC'
# product = 'ABI-L2-AODC' 
# product = 'ABI-L2-ACHTF'
product = 'ABI-L2-FDCC'

In [125]:
years_available = aws.glob(base_folder.joinpath(product).joinpath('*').as_posix())
years_available

['noaa-goes16/ABI-L2-FDCC/2017',
 'noaa-goes16/ABI-L2-FDCC/2018',
 'noaa-goes16/ABI-L2-FDCC/2019',
 'noaa-goes16/ABI-L2-FDCC/2020',
 'noaa-goes16/ABI-L2-FDCC/2021',
 'noaa-goes16/ABI-L2-FDCC/2022']

In [134]:
year = 2020

In [135]:
dt = pd.to_datetime('2018-09-30')

In [136]:
dt.day_of_year

273

In [137]:
days_available = aws.glob(base_folder.joinpath(product).joinpath(f'{year}').joinpath('*').as_posix())
days_available

['noaa-goes16/ABI-L2-FDCC/2020/001',
 'noaa-goes16/ABI-L2-FDCC/2020/002',
 'noaa-goes16/ABI-L2-FDCC/2020/003',
 'noaa-goes16/ABI-L2-FDCC/2020/004',
 'noaa-goes16/ABI-L2-FDCC/2020/005',
 'noaa-goes16/ABI-L2-FDCC/2020/006',
 'noaa-goes16/ABI-L2-FDCC/2020/007',
 'noaa-goes16/ABI-L2-FDCC/2020/008',
 'noaa-goes16/ABI-L2-FDCC/2020/009',
 'noaa-goes16/ABI-L2-FDCC/2020/010',
 'noaa-goes16/ABI-L2-FDCC/2020/011',
 'noaa-goes16/ABI-L2-FDCC/2020/012',
 'noaa-goes16/ABI-L2-FDCC/2020/013',
 'noaa-goes16/ABI-L2-FDCC/2020/014',
 'noaa-goes16/ABI-L2-FDCC/2020/015',
 'noaa-goes16/ABI-L2-FDCC/2020/016',
 'noaa-goes16/ABI-L2-FDCC/2020/017',
 'noaa-goes16/ABI-L2-FDCC/2020/018',
 'noaa-goes16/ABI-L2-FDCC/2020/019',
 'noaa-goes16/ABI-L2-FDCC/2020/020',
 'noaa-goes16/ABI-L2-FDCC/2020/021',
 'noaa-goes16/ABI-L2-FDCC/2020/022',
 'noaa-goes16/ABI-L2-FDCC/2020/023',
 'noaa-goes16/ABI-L2-FDCC/2020/024',
 'noaa-goes16/ABI-L2-FDCC/2020/025',
 'noaa-goes16/ABI-L2-FDCC/2020/026',
 'noaa-goes16/ABI-L2-FDCC/2020/027',
 

In [141]:
day = 274

In [None]:
# noaa-goes16/ABI-L2-FDC/2020/274/17/

In [142]:
hours_available = aws.glob(base_folder.joinpath(product).joinpath(f'{year}').joinpath(f'{day:03d}').joinpath('*').as_posix())
hours_available

['noaa-goes16/ABI-L2-FDCC/2020/274/00',
 'noaa-goes16/ABI-L2-FDCC/2020/274/01',
 'noaa-goes16/ABI-L2-FDCC/2020/274/02',
 'noaa-goes16/ABI-L2-FDCC/2020/274/03',
 'noaa-goes16/ABI-L2-FDCC/2020/274/04',
 'noaa-goes16/ABI-L2-FDCC/2020/274/05',
 'noaa-goes16/ABI-L2-FDCC/2020/274/06',
 'noaa-goes16/ABI-L2-FDCC/2020/274/07',
 'noaa-goes16/ABI-L2-FDCC/2020/274/08',
 'noaa-goes16/ABI-L2-FDCC/2020/274/09',
 'noaa-goes16/ABI-L2-FDCC/2020/274/10',
 'noaa-goes16/ABI-L2-FDCC/2020/274/11',
 'noaa-goes16/ABI-L2-FDCC/2020/274/12',
 'noaa-goes16/ABI-L2-FDCC/2020/274/13',
 'noaa-goes16/ABI-L2-FDCC/2020/274/14',
 'noaa-goes16/ABI-L2-FDCC/2020/274/15',
 'noaa-goes16/ABI-L2-FDCC/2020/274/16',
 'noaa-goes16/ABI-L2-FDCC/2020/274/17',
 'noaa-goes16/ABI-L2-FDCC/2020/274/18',
 'noaa-goes16/ABI-L2-FDCC/2020/274/19',
 'noaa-goes16/ABI-L2-FDCC/2020/274/20',
 'noaa-goes16/ABI-L2-FDCC/2020/274/21',
 'noaa-goes16/ABI-L2-FDCC/2020/274/22',
 'noaa-goes16/ABI-L2-FDCC/2020/274/23']
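
The listings above show the layout bucket/product/year/day-of-year/hour. A folder prefix for a given time can therefore be built directly from a timestamp. This is a minimal sketch using only the standard library; `hour_prefix` is a hypothetical helper and not part of s3fs or nesdis_aws.

```python
import datetime as dt
import pathlib

def hour_prefix(satellite, product, when):
    """Build the folder prefix bucket/product/YYYY/DDD/HH for a timestamp."""
    base_folder = pathlib.PurePosixPath(f"noaa-goes{satellite}")
    # %j is the zero-padded day of the year, %H the zero-padded hour.
    return base_folder.joinpath(
        product, when.strftime("%Y"), when.strftime("%j"), when.strftime("%H")
    )

prefix = hour_prefix(16, "ABI-L2-FDCC", dt.datetime(2020, 9, 30, 9))
print(prefix.as_posix())  # noaa-goes16/ABI-L2-FDCC/2020/274/09
```

The result can be used with the glob pattern from the cells above, e.g. `aws.glob(prefix.joinpath('*').as_posix())`.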

In [140]:
hour = 9
files_available = aws.glob(base_folder.joinpath(product).joinpath(f'{year}').joinpath(f'{day:03d}').joinpath(f'{hour:02d}').joinpath('*').as_posix())
files_available

['noaa-goes16/ABI-L2-FDCC/2020/273/09/OR_ABI-L2-FDCC-M6_G16_s20202730901168_e20202730903541_c20202730904263.nc',
 'noaa-goes16/ABI-L2-FDCC/2020/273/09/OR_ABI-L2-FDCC-M6_G16_s20202730906168_e20202730908541_c20202730909250.nc',
 'noaa-goes16/ABI-L2-FDCC/2020/273/09/OR_ABI-L2-FDCC-M6_G16_s20202730911168_e20202730913541_c20202730914232.nc',
 'noaa-goes16/ABI-L2-FDCC/2020/273/09/OR_ABI-L2-FDCC-M6_G16_s20202730916168_e20202730918541_c20202730919239.nc',
 'noaa-goes16/ABI-L2-FDCC/2020/273/09/OR_ABI-L2-FDCC-M6_G16_s20202730921168_e20202730923541_c20202730924237.nc',
 'noaa-goes16/ABI-L2-FDCC/2020/273/09/OR_ABI-L2-FDCC-M6_G16_s20202730926168_e20202730928541_c20202730929216.nc',
 'noaa-goes16/ABI-L2-FDCC/2020/273/09/OR_ABI-L2-FDCC-M6_G16_s20202730931168_e20202730933541_c20202730934221.nc',
 'noaa-goes16/ABI-L2-FDCC/2020/273/09/OR_ABI-L2-FDCC-M6_G16_s20202730936168_e20202730938541_c20202730939256.nc',
 'noaa-goes16/ABI-L2-FDCC/2020/273/09/OR_ABI-L2-FDCC-M6_G16_s20202730941168_e20202730943541_c202
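
The file names carry the scan start time in the field that begins with `s`. The sketch below assumes the layout year, day of year, hour, minute, second and tenth of a second (`sYYYYDDDHHMMSSt`); `scan_start` is a hypothetical helper, and the file name is the first entry of the listing above.

```python
import datetime as dt
import re

def scan_start(filename):
    """Return the scan start time encoded in the s... field of a file name."""
    match = re.search(r"_s(\d{4})(\d{3})(\d{2})(\d{2})(\d{2})(\d)_", filename)
    if match is None:
        raise ValueError(f"no start time found in {filename!r}")
    year, doy, hour, minute, second, tenth = (int(g) for g in match.groups())
    # Day of year 1 is January 1st, hence the offset of doy - 1 days.
    day = dt.datetime(year, 1, 1) + dt.timedelta(days=doy - 1)
    return day.replace(hour=hour, minute=minute, second=second,
                       microsecond=tenth * 100_000)

name = "OR_ABI-L2-FDCC-M6_G16_s20202730901168_e20202730903541_c20202730904263.nc"
start = scan_start(name)
print(start)
```

A list of such timestamps can serve as the index of a DataFrame of file paths, which makes time-based selection and resampling possible.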

## Download a file

In [48]:
path2file_aws = pl.Path(files_available[3])
path2folder_loc = pl.Path('/mnt/telg/tmp/aws_tmp/')

In [49]:
path2file_local = path2folder_loc.joinpath(path2file_aws.name)
path2file_local

PosixPath('/mnt/telg/tmp/aws_tmp/OR_ABI-L2-AODC-M6_G16_s20200900916180_e20200900918553_c20200900919522.nc')

In [51]:
aws.get(path2file_aws.as_posix(), path2file_local.as_posix())

[None]
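
When downloading many files it can help to skip those that already exist locally. The following is a sketch under stated assumptions: `download_if_missing` is a hypothetical helper that accepts any object with a `get(remote, local)` method, such as the `aws` object above. `FakeFileSystem` and the remote path are made up so that the example runs without network access.

```python
import pathlib
import tempfile

def download_if_missing(fs, path2file_aws, path2folder_local):
    """Fetch one remote file unless a local copy already exists.

    Returns the local path and whether a download happened.
    """
    folder = pathlib.Path(path2folder_local)
    folder.mkdir(parents=True, exist_ok=True)
    local = folder.joinpath(pathlib.PurePosixPath(path2file_aws).name)
    if local.exists():
        return local, False
    fs.get(str(path2file_aws), local.as_posix())
    return local, True

class FakeFileSystem:
    """Stand-in for s3fs.S3FileSystem so the sketch runs offline."""
    def get(self, remote, local):
        pathlib.Path(local).write_bytes(b"")  # create an empty file

with tempfile.TemporaryDirectory() as tmp:
    remote = "noaa-goes16/some/folder/example.nc"  # made-up path
    first = download_if_missing(FakeFileSystem(), remote, tmp)
    second = download_if_missing(FakeFileSystem(), remote, tmp)
    results = (first[1], second[1], first[0].name)
print(results)
```

With a real connection, `FakeFileSystem()` would be replaced by `aws` and `remote` by an entry of `files_available`. An interrupted download may leave a partial file behind, which this simple existence check would not detect.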