# NASA-hosted Cloud-Optimized GeoTIFFs Demo

Most datasets in NASA's archive are stored in NetCDF, HDF5, or TIFF format. Occasionally, NASA Distributed Active Archive Centers (DAACs) host services such as [OpenDAP](https://earthdata.nasa.gov/collaborate/open-data-services-and-software/api/opendap) that provide server-side subsetting, which becomes increasingly useful as the size and complexity of individual data files grow. These services let users avoid downloading entire files and instead extract a subregion or a subset of observational variables.

An alternative, emerging access pattern is to use data formats that allow client-side subsetting instead of maintaining complicated server applications. [Cloud-Optimized GeoTIFF (COG)](https://www.cogeo.org), for example, is a format that uses internal tiling so that client-side software libraries can retrieve only portions of a single file over a network connection. This notebook illustrates how this functionality works.
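To make the tiling idea concrete, here is a minimal sketch of the arithmetic a client library performs when asked for a pixel window: it works out which internal tiles overlap the window and requests only those. The 512-pixel tile size and the function name `tiles_for_window` are assumptions for illustration, not details of the dataset used below. Real libraries such as GDAL also read the file header to find the byte range of each tile.

```python
def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return (tile_row, tile_col) indices of the internal tiles that overlap a pixel window.

    tile_size=512 is an assumed value; a real COG records its block size in the header.
    """
    if width <= 0 or height <= 0:
        return []
    # Integer division maps a pixel offset to the tile that contains it
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [(r, c)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


# A 1000 x 1000 pixel window starting at column 600, row 100
needed = tiles_for_window(col_off=600, row_off=100, width=1000, height=1000)
print(f'{len(needed)} tiles overlap the requested window')
```

Only the listed tiles need to be fetched, which is why a small window over a very large mosaic can be read without transferring the whole file.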

### Part 1) Locate data

The entire NASA archive can be programmatically searched via the [Common Metadata Repository (CMR)](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html). Anyone can search this archive, but to download files you'll need to authenticate using [NASA's Earthdata login](https://wiki.earthdata.nasa.gov/display/EL/How+To+Access+Data+With+cURL+And+Wget) (click on the link for instructions to set up a user name and password, and to configure a `~/.netrc` file on your computer).




Here we'll search for a MEaSUREs dataset consisting of Greenland Image Mosaics from Sentinel-1A and -1B Synthetic Aperture Radar Backscatter (https://nsidc.org/data/nsidc-0723). Note that this is a large data collection (> 1000 individual files, > 1.5 TB). If you want to explore and analyze this dataset, it would be inconvenient to download everything. Nevertheless, NASA happily gives you the option to do so, in the form of a download script. Let's modify that script to just get a list of URLs to COGs without downloading them:

In [1]:
# Note clicking on this link:
#https://dycghwhsgr9el.cloudfront.net/concepts/metadata?ee=uat&url=https%3A%2F%2Fcmr.uat.earthdata.nasa.gov%2Fsearch%2Fgranules.json%3Fecho_collection_id%3DC1239257679-NSIDC_TS1&token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6MzU1LCJ1c2VybmFtZSI6InNjb3R0eWhxIiwicHJlZmVyZW5jZXMiOnt9LCJ1cnNQcm9maWxlIjp7ImZpcnN0X25hbWUiOiJTY290dCJ9LCJlYXJ0aGRhdGFFbnZpcm9ubWVudCI6InVhdCIsImlhdCI6MTYwODE0NDc3NX0.9RL8SH54D1YzizZuO6eWHvRqc6sKa_ZNBbK9OLYD-uk

# takes one here:
#https://cmr.uat.earthdata.nasa.gov/search/granules.json?echo_collection_id=C1239257679-NSIDC_TS1&token=de1d137c14f23090b5f5bbff870fcebf4d719a10521b6c9532daab06d8a92101:vuVKxuBowNooSuMfeJWZ4g

In [2]:
# UAT CMR API search (requires NASA Earthdata Login configured in a local .netrc file)
import cmruat

# Collection page in Earthdata Search (UAT):
# https://search.uat.earthdata.nasa.gov/search/granules/granule-details?p=C1239257679-NSIDC_TS1

echo_collection_id = 'C1239257679-NSIDC_TS1'
token = '22b99056da7a99cc367e6eb9468d9b6007f2d659e89872a40500c3ffc8fbe007:vuVKxuBowNooSuMfeJWZ4g'
#time_start = '2015-01-01T00:00:00Z'
#time_end = '2020-12-31T15:43:33Z'
#bounding_box = '-54.85,69.31,-52.18,70.26'
#polygon = None
urls = cmruat.get_urls(echo_collection_id, token)

https://cmr.uat.earthdata.nasa.gov/search/granules.json?&scroll=true&page_size=2000&echo_collection_id=C1239257679-NSIDC_TS1&token=22b99056da7a99cc367e6eb9468d9b6007f2d659e89872a40500c3ffc8fbe007:vuVKxuBowNooSuMfeJWZ4g
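The `cmruat` module is a local helper whose source is not shown in this notebook. As a rough sketch of how a request like the one printed above might be assembled, the function below builds a granule-search URL from the same parameters (`scroll`, `page_size`, `echo_collection_id`, `token`). The name `build_granule_query` is hypothetical. Mapping the commented-out `time_start`/`time_end` values onto CMR's `temporal` parameter is an assumption about how the helper would use them.

```python
from urllib.parse import urlencode

# UAT endpoint, as it appears in the request printed by the notebook
CMR_UAT_GRANULES = 'https://cmr.uat.earthdata.nasa.gov/search/granules.json'


def build_granule_query(echo_collection_id, token=None, temporal=None,
                        bounding_box=None, page_size=2000):
    """Assemble a CMR granule-search URL (hypothetical helper, not the cmruat code)."""
    params = {
        'scroll': 'true',
        'page_size': page_size,
        'echo_collection_id': echo_collection_id,
    }
    if temporal:
        # CMR expects "start,end" in ISO 8601, e.g. 2015-01-01T00:00:00Z
        params['temporal'] = ','.join(temporal)
    if bounding_box:
        # West, south, east, north in decimal degrees
        params['bounding_box'] = bounding_box
    if token:
        params['token'] = token
    return f'{CMR_UAT_GRANULES}?{urlencode(params)}'


url = build_granule_query('C1239257679-NSIDC_TS1',
                          bounding_box='-54.85,69.31,-52.18,70.26')
print(url)
```

Passing the token as an argument, rather than hardcoding it in the notebook, keeps the credential out of saved output.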


In [3]:
print(f'retrieved {len(urls)} URLs')

retrieved 121 URLs


In [4]:
# Keep only the COG URLs, identified here by their '.1' suffix
cogs = [url for url in urls if url.endswith('.1')]
print(len(cogs))
cogs[:5]

33


['https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.07.27/GL_S1bks_mosaic_27Jul20_01Aug20_gamma0_50m_v03.1',
 'https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.07.27/GL_S1bks_mosaic_27Jul20_01Aug20_sigma0_50m_v03.1',
 'https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.07.27/GL_S1bks_mosaic_27Jul20_01Aug20_image_25m_v03.1',
 'https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.08.02/GL_S1bks_mosaic_02Aug20_07Aug20_gamma0_50m_v03.1',
 'https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.08.02/GL_S1bks_mosaic_02Aug20_07Aug20_sigma0_50m_v03.1']

In [5]:
# Split the COG URLs by product type: 'gamma', 'sigma', and 'image'
gammas = [cog for cog in cogs if 'gamma' in cog]
images = [cog for cog in cogs if 'image' in cog]
sigmas = [cog for cog in cogs if 'sigma' in cog]
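Substring matching works for this listing, but the file names also encode the mosaic dates and pixel spacing. The sketch below parses those fields with a regular expression inferred from the URLs shown above; the pattern and the helper names `parse_cog_url` and `group_by_product` are assumptions for illustration and may not cover every file in the collection.

```python
import re
from collections import defaultdict
from datetime import datetime

# Pattern inferred from names like GL_S1bks_mosaic_27Jul20_01Aug20_gamma0_50m_v03.1
PATTERN = re.compile(
    r'GL_S1bks_mosaic_(?P<start>\d{2}[A-Za-z]{3}\d{2})_(?P<end>\d{2}[A-Za-z]{3}\d{2})'
    r'_(?P<product>gamma0|sigma0|image)_(?P<res>\d+)m_'
)


def parse_cog_url(url):
    """Extract product type, resolution, and date range from a mosaic URL, or None."""
    match = PATTERN.search(url)
    if match is None:
        return None
    # '%b' parses English month abbreviations under the default (C) locale
    return {
        'product': match['product'],
        'resolution_m': int(match['res']),
        'start': datetime.strptime(match['start'], '%d%b%y').date(),
        'end': datetime.strptime(match['end'], '%d%b%y').date(),
    }


def group_by_product(urls):
    """Group URLs by product type in a single pass, skipping names that do not match."""
    groups = defaultdict(list)
    for url in urls:
        info = parse_cog_url(url)
        if info is not None:
            groups[info['product']].append(url)
    return dict(groups)


example = ('https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/'
           '2020.07.27/GL_S1bks_mosaic_27Jul20_01Aug20_gamma0_50m_v03.1')
info = parse_cog_url(example)
print(info['product'], info['start'], info['end'], info['resolution_m'])
```

Having the dates as `date` objects makes it straightforward to sort the mosaics chronologically or select a time range later on.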

In [6]:
# Save the gamma0 list to a file for later use:
with open('gamma0.txt', 'w') as f:
    f.write('\n'.join(gammas))

In [7]:
!head gamma0.txt

https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.07.27/GL_S1bks_mosaic_27Jul20_01Aug20_gamma0_50m_v03.1
https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.08.02/GL_S1bks_mosaic_02Aug20_07Aug20_gamma0_50m_v03.1
https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.08.08/GL_S1bks_mosaic_08Aug20_13Aug20_gamma0_50m_v03.1
https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.08.14/GL_S1bks_mosaic_14Aug20_19Aug20_gamma0_50m_v03.1
https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.08.20/GL_S1bks_mosaic_20Aug20_25Aug20_gamma0_50m_v03.1
https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.08.26/GL_S1bks_mosaic_26Aug20_31Aug20_gamma0_50m_v03.1
https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.09.01/GL_S1bks_mosaic_01Sep20_06Sep20_gamma0_50m_v03.1
https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.09.07/GL_S1bks_mosaic_07Sep20_12Sep20_gamma0_50m_v03.1
https://n5eil11u

In [8]:
%%time
# Python interpreter used to run the validation script (this path is specific to the original machine)
cmd = '/Users/scott/miniconda3/envs/intake-stac-gui/bin/python'
for cog in cogs:
    # Alternative validator from the rio-cogeo package:
    #!GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR GDAL_HTTP_COOKIEFILE=.urs_cookies GDAL_HTTP_COOKIEJAR=.urs_cookies rio cogeo validate {cog}
    !GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR GDAL_HTTP_COOKIEFILE=.urs_cookies GDAL_HTTP_COOKIEJAR=.urs_cookies {cmd} validate_cloud_optimized_geotiff.py /vsicurl/{cog}

/vsicurl/https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.07.27/GL_S1bks_mosaic_27Jul20_01Aug20_gamma0_50m_v03.1 is a valid cloud optimized GeoTIFF

The size of all IFD headers is 66766 bytes
/vsicurl/https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.07.27/GL_S1bks_mosaic_27Jul20_01Aug20_sigma0_50m_v03.1 is a valid cloud optimized GeoTIFF

The size of all IFD headers is 66766 bytes
/vsicurl/https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.07.27/GL_S1bks_mosaic_27Jul20_01Aug20_image_25m_v03.1 is a valid cloud optimized GeoTIFF

The size of all IFD headers is 259948 bytes
/vsicurl/https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.08.02/GL_S1bks_mosaic_02Aug20_07Aug20_gamma0_50m_v03.1 is a valid cloud optimized GeoTIFF

The size of all IFD headers is 66766 bytes
/vsicurl/https://n5eil11u.ecs.nsidc.org/TS1/DP0/MEASURES/NSIDC-0723.199/2020.08.02/GL_S1bks_mosaic_02Aug20_07Aug20_sigma0_50m_v03.1 is a valid cloud optimiz