# Part 2: NetCDF and self-describing datasets

![](img/satellites.gif)

The main satellites that we'll talk about today:
* GOES-R/GOES-16 and GOES-S/GOES-17 with the ABI instrument
* Suomi-NPP with the VIIRS instrument

We're in the golden age of satellite datasets, which is a blessing and a curse:

* Inundated with datasets, don't know which ones to use
* No single repository for access of the data
* Inconsistent formatting and filetypes

netCDF4 and HDF5 are the dominant formats used in satellite remote sensing (but others do exist)

## netCDF Primer
* Hosted by the Unidata program at the University Corporation for Atmospheric Research (UCAR)
* NetCDF (Network Common Data Form) is a set of software libraries and self-describing, machine-independent data formats
* Supports the creation, access, and sharing of array-oriented scientific data

Advantages: 
* Open source and free
* Provides standard formatting for earth science data
* Compression helps with long term file storage
* Includes additional metadata

Disadvantages: 
* There is a steeper learning curve for working with self-describing file formats

## Panoply
![](img/Picture1.png)
* Pronounced: Pan-OH-plee
* A netCDF, HDF, KMZ, and GRIB data viewer
* Free/Open source for Mac, Windows, Linux
* Developed and maintained by Dr. Robert B. Schmunk of NASA/GISS

Other display tools: 
* Free: HDFView, QGIS, Explorer series
* Not free: ENVI/IDL, MATLAB, ArcMap

## Inspecting ABI files

Run Panoply

Navigate to the following file:
![](img/filename.png)

When you open the file you will see something like this:
![](img/pano-filelist.png)



## Importing netCDF files
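The netCDF4 package is included in Anaconda Python. The main entry point is the Dataset class, which opens an existing file (or creates a new one):
```
from netCDF4 import Dataset

file_id = Dataset("test.nc", "r", format="NETCDF4")
```
The mode can be 'w' (write), 'r' (read), or 'a' (append).

The formats can be: NETCDF3_CLASSIC, NETCDF3_64BIT_OFFSET, NETCDF3_64BIT_DATA, NETCDF4_CLASSIC, and NETCDF4 (default). The format argument only matters when writing a new file; when reading or appending, the format is detected automatically.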

![](img/Picture2.png)

![](img/Picture3.png)


In [1]:
from netCDF4 import Dataset

In [22]:
# To open the files, call the Dataset constructor
fname='data/JRR-AOD_v1r1_npp_s201808091957192_e201808091958434_c201808092051240.nc'
file_id = Dataset(fname)

In [23]:
# Quickly inspect the contents
list(file_id.variables.keys())

['Latitude',
 'Longitude',
 'StartRow',
 'StartColumn',
 'AOD550',
 'AOD_channel',
 'AngsExp1',
 'AngsExp2',
 'QCPath',
 'AerMdl',
 'FineMdlIdx',
 'CoarseMdlIdx',
 'FineModWgt',
 'SfcRefl',
 'SpaStddev',
 'Residual',
 'AOD550LndMdl',
 'ResLndMdl',
 'MeanAOD',
 'HighQualityPct',
 'RetrievalPct',
 'QCRet',
 'QCExtn',
 'QCTest',
 'QCInput',
 'QCAll']

In [28]:
# Get the AOD variable object using .variables (Latitude and Longitude can be retrieved the same way)
AOD550 = file_id.variables['AOD550']

In [29]:
# Inspect the attribute names using the .ncattrs() method
file_id.variables['AOD550'].ncattrs()

['long_name', 'coordinates', 'units', '_FillValue', 'valid_range']

In [30]:
# Get some very simple statistics by converting into a NumPy array
import numpy as np

AOD550 = np.array(AOD550)

# Remove missing values
missing = file_id.variables['AOD550']._FillValue
keepRows = AOD550 != missing
AOD550 = AOD550[keepRows]

avgAOD = AOD550.mean()
stdDev = AOD550.std()
nAOD = AOD550.size

print(avgAOD, stdDev, nAOD)

0.633283 0.422585 1761764


In [None]:
# Tip: netCDF4 automatically returns a masked array in which fill values are masked.
# If you want to suppress this, turn auto-masking off while the file is still open:
file_id.set_auto_mask(False)
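
As a rough comparison of the two ways to handle missing data, the sketch below uses plain NumPy on made-up numbers. The value -999.0 is only a stand-in for whatever `_FillValue` a real variable reports.

```python
import numpy as np

fill_value = -999.0  # stand-in for the variable's _FillValue
raw = np.array([0.1, 0.5, fill_value, 0.3, fill_value])

# Manual approach: boolean indexing drops the fill values (and the array shape)
kept = raw[raw != fill_value]

# Masked-array approach: fill values stay in place but are ignored in statistics
masked = np.ma.masked_equal(raw, fill_value)

print(kept.mean(), masked.mean())  # the two means agree
print(masked.count(), "valid of", masked.size)

# Put the fill values back, e.g. before writing the array to disk
restored = masked.filled(fill_value)
```

The masked array keeps the original grid shape, which matters when the values still need to line up with latitude and longitude arrays.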

In [31]:
# Close the file when you're done
file_id.close()

<div class="alert alert-block alert-info">

# Exercise 1

## Import netCDF file
From the data folder, import "JRR-AOD_v1r1_npp_s201808091957192_e201808091958434_c201808092051240.nc" using the Dataset class from the netCDF4 package.

## Inspect the list of variables
Get a list of variables after the file has been opened.

## Inspect the attributes of a given variable
What are the attributes of the QCAll variable?

</div>

## Importing HDF files

The process is very similar to netCDF. This example looks at the monthly mean AOD for August 2010 from the SeaWiFS Deep Blue AOD retrieval (Panoply output below).

![](img/db.png)

In [15]:
import h5py

In [38]:
# Open the file
fname='data/DeepBlue-SeaWiFS-1.0_L3M_201008_v004-20130604T140615Z.h5'
file_id_DB = h5py.File(fname, 'r')

In [75]:
# Quickly inspect the contents...
list(file_id_DB.keys())

['aerosol_optical_thickness_550_count_land',
 'aerosol_optical_thickness_550_count_land_ocean',
 'aerosol_optical_thickness_550_count_ocean',
 'aerosol_optical_thickness_550_land',
 'aerosol_optical_thickness_550_land_ocean',
 'aerosol_optical_thickness_550_ocean',
 'aerosol_optical_thickness_550_stddev_land',
 'aerosol_optical_thickness_550_stddev_land_ocean',
 'aerosol_optical_thickness_550_stddev_ocean',
 'aerosol_optical_thickness_count_land',
 'aerosol_optical_thickness_count_ocean',
 'aerosol_optical_thickness_land',
 'aerosol_optical_thickness_ocean',
 'aerosol_optical_thickness_stddev_land',
 'aerosol_optical_thickness_stddev_ocean',
 'angstrom_exponent_count_land',
 'angstrom_exponent_count_land_ocean',
 'angstrom_exponent_count_ocean',
 'angstrom_exponent_land',
 'angstrom_exponent_land_ocean',
 'angstrom_exponent_ocean',
 'angstrom_exponent_stddev_land',
 'angstrom_exponent_stddev_land_ocean',
 'angstrom_exponent_stddev_ocean',
 'diagnostics',
 'land_bands',
 'latitude',
 'l

In [71]:
# Import the data...
AOD = file_id_DB['aerosol_optical_thickness_550_land_ocean']

# Check a value...
AOD[60, 300]

0.020318512
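
One difference from netCDF4 is that h5py hands back plain NumPy arrays, so fill values are not masked automatically. The sketch below shows one way to mask them by hand. It uses a toy in-memory file with a hypothetical dataset name and an assumed fill value of -999.0, not the Deep Blue file itself.

```python
import h5py
import numpy as np

# Toy in-memory HDF5 file standing in for a real product (nothing is written to disk)
f = h5py.File("toy_in_memory.h5", "w", driver="core", backing_store=False)
dset = f.create_dataset(
    "aot_550",  # hypothetical dataset name
    data=np.array([[0.02, -999.0], [0.15, 0.30]], dtype="f4"),
)
dset.attrs["_FillValue"] = np.float32(-999.0)

values = dset[...]  # read the whole dataset into a NumPy array
# .item() copes with the attribute being a scalar or a one-element array
fill = np.asarray(dset.attrs["_FillValue"]).item()
masked = np.ma.masked_equal(values, fill)

# The unmasked mean is dominated by the fill value; the masked mean is not
print(values.mean(), masked.mean())
n_valid = masked.count()
f.close()
```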

In [69]:
list(AOD.attrs)

['long_name',
 'standard_name',
 'units',
 'comment',
 '_FillValue',
 'valid_range',
 'DIMENSION_LIST']

In [68]:
# To view the attribute
AOD.attrs['long_name']

b'aerosol optical thickness estimated at 550 nm over land and ocean'
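
The leading `b` in the output means the attribute came back as bytes rather than a regular string. A small helper can decode such attributes for display. The sketch below assumes UTF-8 text and uses a toy in-memory file; `attr_text` and the dataset name are hypothetical.

```python
import h5py
import numpy as np


def attr_text(value):
    """Decode bytes attributes to str; leave everything else untouched."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


# Toy in-memory file; names and attribute text are illustrative only
f = h5py.File("toy_attrs.h5", "w", driver="core", backing_store=False)
dset = f.create_dataset("aot_550", data=np.zeros((2, 2), dtype="f4"))
dset.attrs["long_name"] = np.bytes_(b"aerosol optical thickness at 550 nm")
dset.attrs["valid_range"] = np.array([0.0, 5.0], dtype="f4")

decoded = {key: attr_text(val) for key, val in dset.attrs.items()}
f.close()

print(decoded["long_name"])
```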

## Other formats:
* GRIB/GRIB2: World Meteorological Organization (WMO) standard format, commonly used for output from weather models such as ECMWF and GFS. Can be opened using [pygrib](https://github.com/jswhit/pygrib)
* BUFR: Another WMO standard format, used mainly for observational data. Open with [python-bufr](https://github.com/pytroll/python-bufr), part of the pytroll project.

## Resources

Searchable satellite data:
* NOAA CLASS: https://www.avl.class.noaa.gov
* NASA Mirador: https://mirador.gsfc.nasa.gov
* EUMETSAT: https://www.eumetsat.int/website/home/Data/DataDelivery/OnlineDataAccess/index.html

Other channels:
* Amazon Web Services hosts GOES-16 radiances, Landsat, MODIS, and more: https://registry.opendata.aws/?search=earth%20observation
* python-AWIPS has a good repository of atmospheric datasets: https://python-awips.readthedocs.io/en/latest/
* [Python Satellite Data Analysis Toolkit (pysat)](https://github.com/rstoneback) can pull space science datasets (e.g. COSMIC-1)