# Raster data

```{admonition} Learning Objectives
*A 30-minute guide to raster data for SnowEx*
- find, visualize, and interpret raster data formats
- use Python raster libraries [rioxarray](https://corteva.github.io/rioxarray) and [hvplot](https://hvplot.holoviz.org)
```

## Raster Basics

*Raster data is stored as a grid of values which are rendered on a map as pixels. Each pixel value represents an area on the Earth's surface.* Pixel values can be continuous (elevation) or categorical (land use). This data structure is very common: JPEG images on the web and photos from your digital camera are rasters too. A geospatial raster is unique only in that it is accompanied by metadata that connects the pixel grid to a location on Earth's surface.
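
To make the "metadata that connects the pixel grid to a location" concrete, here is a minimal pure-Python sketch of an affine geotransform. The six-coefficient ordering follows the convention of the `affine` package (rasterio exposes a dataset's transform as `src.transform`). The pixel size and origin below are made-up illustrative numbers, not values from a SnowEx dataset.

```python
# Sketch of the affine "geotransform" that ties pixel indices to map coordinates.
# Coefficient order (a, b, c, d, e, f) follows the `affine` package convention:
#   x = a * col + b * row + c
#   y = d * col + e * row + f

def pixel_to_map(transform, row, col):
    """Return the (x, y) map coordinate of the upper-left corner of a pixel."""
    a, b, c, d, e, f = transform
    x = a * col + b * row + c
    y = d * col + e * row + f
    return (x, y)

# Hypothetical north-up raster: 3 m pixels, upper-left corner at
# easting 300000, northing 4300000 (illustrative numbers only).
# The negative e term means y decreases as the row index increases.
transform = (3.0, 0.0, 300000.0, 0.0, -3.0, 4300000.0)

print(pixel_to_map(transform, row=0, col=0))      # (300000.0, 4300000.0)
print(pixel_to_map(transform, row=10, col=20))    # (300060.0, 4299970.0)
# Adding 0.5 to row and col gives the pixel center instead of its corner
print(pixel_to_map(transform, row=0.5, col=0.5))  # (300001.5, 4299998.5)
```

A coordinate reference system (CRS) is still needed to say what the x and y units mean; the transform alone only maps grid indices to those coordinates.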

```{seealso}
This tutorial is a quick practical guide. If you're new to geospatial rasters, explore the following resources:

* https://carpentries-incubator.github.io/geospatial-python/01-intro-raster-data
* https://github.com/geohackweek/raster-2019
```

In [None]:
# Import all Python libraries required for this notebook
import hvplot.xarray
import rioxarray
import rasterio
import rasterio.plot

## Search and Discovery

Typically you'll search for geospatial raster data based on a spatial footprint and time period of interest. There are many web interfaces for this, such as NASA's Earthdata Search: https://search.earthdata.nasa.gov/.

During the Hackweek we're using computational infrastructure running in the Amazon Web Services (AWS) us-west-2 region, where NASA is starting to store many of its archives. By moving our computation to the data, rather than downloading files, we can take advantage of fast access and scalable computing resources, so let's work with data that is hosted on AWS. You can find such datasets with the following URL: https://search.earthdata.nasa.gov/search?ff=Available%20from%20AWS%20Cloud. You can also browse public datasets via this [AWS website](https://registry.opendata.aws/?search=tags:gis,earth%20observation,events,mapping,meteorological,environmental,transportation), but note that these datasets are often mirrors of official archives and may have partial coverage.

For programmatic search, you can use Python software to query search APIs instead of a web interface (behind the scenes, this is what happens when you interact with web interfaces like Earthdata Search). There are many options here, including the [NASA Common Metadata Repository](https://earthdata.nasa.gov/collaborate/open-data-services-and-software/api/cmr-api). For cloud-hosted data, there is a lot of momentum around SpatioTemporal Asset Catalogs (STAC) as a common standard for user-facing search and discovery. If a dataset has STAC metadata (all NASA data does, via https://github.com/nasa/cmr-stac), you can easily search for it from Python with https://github.com/stac-utils/pystac-client.
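
As a rough illustration of a STAC search by footprint and time period, the sketch below assembles the search parameters and passes them to `pystac-client`. Everything specific here is an assumption chosen for illustration and not part of this tutorial's datasets: the Element 84 Earth Search endpoint, the `sentinel-2-l2a` collection, the approximate Grand Mesa bounding box, and the helper names `build_search_params` and `search_stac`. The `search.items()` call is the name used in recent `pystac-client` releases.

```python
# Minimal STAC search sketch (endpoint, collection, and bbox are assumptions).

def build_search_params(bbox, start, end, collections, max_items=5):
    """Assemble keyword arguments for a STAC item search.

    bbox is [west, south, east, north] in longitude/latitude degrees;
    start and end are ISO 8601 dates joined into a closed interval.
    """
    west, south, east, north = bbox
    return {
        "collections": list(collections),
        "bbox": [west, south, east, north],
        "datetime": f"{start}/{end}",
        "max_items": max_items,
    }


def search_stac(api_url, params):
    """Run the search against a STAC API and return the matching items."""
    # Imported here so the parameter helper works without pystac-client installed
    from pystac_client import Client

    catalog = Client.open(api_url)
    search = catalog.search(**params)
    return list(search.items())


# Approximate, illustrative bounding box around Grand Mesa, Colorado
params = build_search_params(
    bbox=[-108.3, 38.9, -107.8, 39.2],
    start="2020-02-01",
    end="2020-02-29",
    collections=["sentinel-2-l2a"],
)
print(params)

# The lines below need network access and pystac-client:
# items = search_stac("https://earth-search.aws.element84.com/v1", params)
# for item in items:
#     print(item.id, item.datetime, list(item.assets))
```

Each returned item lists its assets (for example GeoTIFF URLs), which can then be opened directly with rasterio or rioxarray instead of being downloaded first.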

In [None]:
# Test S3 access
s3path = 's3://snowex-data/tutorial-data/sar/GRMCT2_31801_20007_016_200211_09225VV_XX_01.tif'
with rasterio.open(s3path) as src:
    print(src.profile)
    # rasterio.plot is a module; its show() function renders the raster
    rasterio.plot.show(src)

In [None]:
# Test .netrc access by downloading an HDF file
!wget https://n5eil01u.ecs.nsidc.org/MOSA/MYD10A1.006/2002.07.04/MYD10A1.A2002185.h00v08.006.2016152142655.hdf

In [None]:
# Test .netrc access by opening a GeoTIFF over HTTPS
url = 'https://n5eil01u.ecs.nsidc.org/ASO/ASO_3M_SD.001/2017.02.25/ASO_3M_QF_USCOGM_20170225.tif'  # 100 MB
# Larger alternative (1.6 GB):
# https://n5eil01u.ecs.nsidc.org/ASO/ASO_3M_SD.001/2017.02.25/ASO_3M_SD_USCOGM_20170225.tif
# rasterio.Env is a context manager that sets GDAL configuration options:
# here we skip directory listings on open and keep login cookies in a local file
env = rasterio.Env(GDAL_DISABLE_READDIR_ON_OPEN='EMPTY_DIR',
                   GDAL_HTTP_COOKIEFILE='.urs_cookies',
                   GDAL_HTTP_COOKIEJAR='.urs_cookies')
with env:
    with rasterio.open(url) as src:
        print(src.profile)
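
The two `.netrc` tests above only succeed if your `.netrc` file has an entry for NASA Earthdata Login (host `urs.earthdata.nasa.gov`). A small sketch for checking this ahead of time with the standard-library `netrc` module follows; the helper name `has_earthdata_login` is hypothetical, and the check only confirms that an entry exists, not that the credentials are valid.

```python
import netrc
from pathlib import Path

# Host name used by NASA Earthdata Login
EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def has_earthdata_login(netrc_path=None):
    """Return True if the netrc file contains an entry for Earthdata Login."""
    # Default to ~/.netrc when no explicit path is given
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.exists():
        return False
    try:
        auth = netrc.netrc(str(path)).authenticators(EARTHDATA_HOST)
    except (netrc.NetrcParseError, OSError):
        # Malformed or unreadable file: treat as "no usable entry"
        return False
    return auth is not None


print("Earthdata Login entry found:", has_earthdata_login())
```

If this prints `False`, add a line of the form `machine urs.earthdata.nasa.gov login <username> password <password>` to `~/.netrc` before rerunning the cells above.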