# NASA-hosted Cloud Optimized GeoTIFFs Demo

Most datasets in NASA's archive are stored in NetCDF, HDF5, or TIFF format. Occasionally, NASA Distributed Active Archive Centers (DAACs) host services such as [OPeNDAP](https://earthdata.nasa.gov/collaborate/open-data-services-and-software/api/opendap) that provide server-side subsetting, which becomes increasingly useful as the size and complexity of individual data files grow. These services let users avoid downloading entire files and instead extract a subregion or a subset of observational variables.

An alternative, emerging access pattern is to use data formats that allow client-side subsetting instead of maintaining complicated server applications. [Cloud Optimized GeoTIFF (COG)](https://www.cogeo.org), for example, is a format that uses internal tiling so that client-side software libraries can retrieve only portions of a single file over a network connection. This notebook illustrates this functionality.
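To make the idea of internal tiling concrete, here is a minimal sketch of the bookkeeping a COG reader does before it touches the network: given a pixel window, work out which internal tiles overlap it. The 512-pixel tile size, the image dimensions, and the function name `tiles_for_window` are assumptions chosen for illustration; a real reader takes the tile layout and byte offsets from the TIFF header and then fetches only those tiles with HTTP range requests.

```python
from math import ceil


def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return (tile_row, tile_col) indices of the tiles that overlap a pixel window.

    tile_size is an assumed value; real files declare their own tile dimensions.
    """
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# Hypothetical image of 20000 x 30000 pixels (columns x rows) and a 1000 x 1000 window
needed = tiles_for_window(col_off=5000, row_off=12000, width=1000, height=1000)
total = ceil(20000 / 512) * ceil(30000 / 512)
print(f"{len(needed)} of {total} tiles overlap the requested window")
```

Only the tiles in `needed` have to be transferred, which is why a small window over a very large file can be read quickly.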

### Part 1) Locate data

The entire NASA archive can be searched programmatically via the [Common Metadata Repository (CMR)](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html). Anyone can search this archive, but to download files you'll need to authenticate using [NASA's Earthdata Login](https://wiki.earthdata.nasa.gov/display/EL/How+To+Access+Data+With+cURL+And+Wget) (follow the link for instructions on setting up a user name and password and configuring a `~/.netrc` file on your computer).
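Before going further, it can help to confirm that the `~/.netrc` file actually contains an Earthdata Login entry. The following is a small sketch using Python's standard `netrc` module; the helper name `has_earthdata_login` is hypothetical, and it assumes the entry is stored under the host `urs.earthdata.nasa.gov`.

```python
import netrc
from pathlib import Path

# Assumed host name of the Earthdata Login service
EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def has_earthdata_login(netrc_path=None):
    """Return True if the netrc file has credentials for Earthdata Login."""
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.exists():
        return False
    try:
        auth = netrc.netrc(str(path)).authenticators(EARTHDATA_HOST)
    except (netrc.NetrcParseError, OSError):
        # Malformed or unreadable file: treat as not configured
        return False
    return auth is not None


print("Earthdata Login configured:", has_earthdata_login())
```

The check only confirms that an entry exists; it does not verify that the user name and password are valid.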




Here we'll search for a MEaSUREs dataset consisting of Greenland image mosaics from Sentinel-1A and Sentinel-1B Synthetic Aperture Radar backscatter (https://nsidc.org/data/nsidc-0723). Note that this is a large data collection (> 1000 individual files, > 1.5 TB). If you want to explore and analyze this dataset, downloading everything would be inconvenient. Nevertheless, NASA gives you the option to do so in the form of a download script. Let's modify that script to just get a list of URLs to COGs without downloading them:

In [1]:
# CMR API search (requires NASA Earthdata Login configured in a local .netrc file)
import cmr

short_name = 'NSIDC-0723'
version = '3'
time_start = '2015-01-01T00:00:00Z'
time_end = '2020-10-05T15:43:33Z'
#bounding_box = '-54.85,69.31,-52.18,70.26'
bounding_box = None
polygon = None
#filename_filter = '*gamma*'
filename_filter = None

urls = cmr.get_urls(short_name, version, time_start, time_end, bounding_box, polygon, filename_filter)

https://cmr.earthdata.nasa.gov/search/granules.json?provider=NSIDC_ECS&sort_key[]=start_date&sort_key[]=producer_granule_id&scroll=true&page_size=2000&short_name=NSIDC-0723&version=3&temporal[]=2015-01-01T00:00:00Z,2020-10-05T15:43:33Z
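The internals of the local `cmr` module are not shown in this notebook, so the following is only a sketch of how a query URL like the one printed above could be assembled from the search parameters. The function name `build_query_url` and its default arguments are assumptions; the parameter names and values are copied from the printed URL.

```python
CMR_GRANULES_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"


def build_query_url(short_name, version, time_start, time_end,
                    provider="NSIDC_ECS", page_size=2000):
    """Assemble a CMR granule search URL (hypothetical helper, not the cmr module)."""
    params = [
        ("provider", provider),
        ("sort_key[]", "start_date"),
        ("sort_key[]", "producer_granule_id"),
        ("scroll", "true"),
        ("page_size", str(page_size)),
        ("short_name", short_name),
        ("version", version),
        # A temporal range is passed as "start,end"
        ("temporal[]", f"{time_start},{time_end}"),
    ]
    # Values are joined without percent-encoding to match the printed URL
    query = "&".join(f"{key}={value}" for key, value in params)
    return f"{CMR_GRANULES_URL}?{query}"


url = build_query_url("NSIDC-0723", "3",
                      "2015-01-01T00:00:00Z", "2020-10-05T15:43:33Z")
print(url)
```

Building the URL separately from the request makes it easy to inspect or paste into a browser before fetching any results.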


In [2]:
print(f'retrieved {len(urls)} URLs')

retrieved 5224 URLs


In [3]:
# Note these URLs include JPG thumbnails, full resolution TIF, metadata in the form of XML
urls[:5]

['https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.01.01/GL_S1bks_mosaic_01Jan15_12Jan15_gamma0_500m_v03.0.jpg',
 'https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.01.01/GL_S1bks_mosaic_01Jan15_12Jan15_gamma0_500m_v03.0.jpg.aux.xml',
 'https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.01.01/GL_S1bks_mosaic_01Jan15_12Jan15_gamma0_50m_v03.0.tif',
 'https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.01.01/GL_S1bks_mosaic_01Jan15_12Jan15_gamma0_50m_v03.0.tif.aux.xml',
 'https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.01.01/GL_S1bks_mosaic_01Jan15_12Jan15_gamma0.xml']

In [9]:
# Get URLs for all JPG thumbnails
jpgs = [url for url in urls if url.endswith('jpg')]
jpgs[-1]

'https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2020.02.22/GL_S1bks_mosaic_22Feb20_27Feb20_sigma0_500m_v03.0.jpg'

In [5]:
# Keep only the COG URLs (files ending in .tif)
cogs = [url for url in urls if url.endswith('tif')]
print(len(cogs))
cogs[:5]

783


['https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.01.01/GL_S1bks_mosaic_01Jan15_12Jan15_gamma0_50m_v03.0.tif',
 'https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.01.01/GL_S1bks_mosaic_01Jan15_12Jan15_image_25m_v03.0.tif',
 'https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.01.01/GL_S1bks_mosaic_01Jan15_12Jan15_sigma0_50m_v03.0.tif',
 'https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.01.13/GL_S1bks_mosaic_13Jan15_24Jan15_gamma0_50m_v03.0.tif',
 'https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.01.13/GL_S1bks_mosaic_13Jan15_24Jan15_image_25m_v03.0.tif']

In [6]:
# filter 'gamma', 'sigma', and 'image'
gammas = [cog for cog in cogs if 'gamma' in cog]
images = [cog for cog in cogs if 'image' in cog]
sigmas = [cog for cog in cogs if 'sigma' in cog]
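The file names encode more than the product type. As a sketch, assuming every COG follows the naming pattern visible in the URLs above (start date, end date, product, resolution, version), a regular expression can pull these fields out. The helper `parse_cog_url` and the field names are hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from names such as
# GL_S1bks_mosaic_01Jan15_12Jan15_gamma0_50m_v03.0.tif
COG_NAME = re.compile(
    r"GL_S1bks_mosaic_(?P<start>\d{2}[A-Za-z]{3}\d{2})_(?P<end>\d{2}[A-Za-z]{3}\d{2})"
    r"_(?P<product>gamma0|sigma0|image)_(?P<resolution>\d+m)"
    r"_v(?P<version>\d+\.\d+)\.tif$"
)


def parse_cog_url(url):
    """Return a dict of fields parsed from a COG URL, or None if it does not match."""
    match = COG_NAME.search(url)
    if match is None:
        return None
    return {
        # %b expects English month abbreviations (default C/English locale)
        "start": datetime.strptime(match["start"], "%d%b%y").date(),
        "end": datetime.strptime(match["end"], "%d%b%y").date(),
        "product": match["product"],
        "resolution": match["resolution"],
        "version": match["version"],
    }


example = ("https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.01.01/"
           "GL_S1bks_mosaic_01Jan15_12Jan15_gamma0_50m_v03.0.tif")
info = parse_cog_url(example)
print(info)
```

Compared with a substring test such as `'gamma' in cog`, matching the whole file name avoids accidental hits elsewhere in the URL and yields dates that can be used to sort or select mosaics by time.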

In [7]:
# Save these lists to a file for use later:
with open('gamma0.txt', 'w') as f:
    f.write('\n'.join(gammas))
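To reuse a saved list in a later session, it can be read back into a Python list. This is a minimal sketch; `read_url_list` is a hypothetical helper, and the example writes two placeholder URLs to a temporary file so that it runs on its own.

```python
import tempfile
from pathlib import Path


def read_url_list(path):
    """Read one URL per line, skipping blank lines and surrounding whitespace."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


# Round trip with placeholder URLs (not real data files)
sample = ["https://example.com/a.tif", "https://example.com/b.tif"]
with tempfile.TemporaryDirectory() as tmp:
    list_file = Path(tmp) / "urls.txt"
    list_file.write_text("\n".join(sample))
    restored = read_url_list(list_file)

print(restored == sample)
```

In this notebook the equivalent call would be `read_url_list('gamma0.txt')`.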

In [8]:
!head gamma0.txt

https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.01.01/GL_S1bks_mosaic_01Jan15_12Jan15_gamma0_50m_v03.0.tif
https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.01.13/GL_S1bks_mosaic_13Jan15_24Jan15_gamma0_50m_v03.0.tif
https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.01.25/GL_S1bks_mosaic_25Jan15_05Feb15_gamma0_50m_v03.0.tif
https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.02.06/GL_S1bks_mosaic_06Feb15_17Feb15_gamma0_50m_v03.0.tif
https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.02.18/GL_S1bks_mosaic_18Feb15_01Mar15_gamma0_50m_v03.0.tif
https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.03.02/GL_S1bks_mosaic_02Mar15_13Mar15_gamma0_50m_v03.0.tif
https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.03.14/GL_S1bks_mosaic_14Mar15_25Mar15_gamma0_50m_v03.0.tif
https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.03.26/GL_S1bks_mosaic_26Mar15_06Apr15_gamma0_50m_v03.0.tif
https://n5eil01u