# Exploratory Activity 15: Using Spatial Data

*prepared in consultation with Dr. Josh Laughner*

Yesterday you learned to use NASA's "[Worldview](https://worldview.earthdata.nasa.gov/)" data visualizer application, an expansive collection of fully processed and publicly available global satellite data. However, not all satellite data sets are available on this platform, and there are many reasons you might want to access the underlying data products directly. Today you will learn to download, import, and plot satellite data yourself.


In [None]:
# Note this project uses the 'cartopy' mapping library which is best installed with conda 
# (https://docs.conda.io/en/latest/), due to the various dependencies required.

# Creating a virtual environment with this library pre-installed is a reliable way to do this;
# below are Terminal commands which will create a TEST environment with cartopy.

# conda create --name TEST python=3.6 cartopy
# conda activate TEST

In [None]:
%%capture

# first install libraries for data loading and mapping
!pip install h5py
!pip install numpy
!pip install pandas

## 15.1 Downloading and Importing Spatial Data 

**Question 15.1.1** Much of the data compiled in the Worldview visualizer resides in separate online repositories. One such repository is managed by NASA's Goddard Earth Sciences Data and Information Services Center. Visit the GES DISC website dedicated to the [Ozone Monitoring Instrument's nitrogen dioxide](https://disc.gsfc.nasa.gov/datasets/OMNO2d_V003/summary) data product and choose "Online Archive" from the "Data Access" menu. Follow the instructions at the top of the page to [register](https://disc.gsfc.nasa.gov/data-access) for an Earthdata account and [authorize](https://urs.earthdata.nasa.gov/approve_app?client_id=e2WVk8Pw6weeLUKZYOxvTQ) the proper data archive application.


**Question 15.1.2** Once you have completed the registration process, return to the "Online Archive" page and navigate into the 2013 folder. Click the *.he5 file (as opposed to the *.he5.xml file) for 4 July to download it. 

*Note the location where the file is downloaded on your computer, as you will need to update the "path" in the code below in order to import the file.*


In [None]:
# import the library to access the hdf file contents
import h5py

## UPDATE THIS FILE PATH TO REFLECT THE LOCATION OF THE DOWNLOAD ON YOUR COMPUTER
file_path = "file/path/to/.he5/file"

# open the he5 file read-only as a structure named "f"
f = h5py.File(file_path, 'r')

The OMI NO<sub>2</sub> data files are in "hierarchical data format," which allows data to be stored in a compact directory-like structure, similar to the folders and subfolders on a computer. At a given level of the data hierarchy, you can display the names of the sub-levels by listing the parent level's keys:


In [None]:
# list f's keys
list(f.keys())

**Question 15.1.3** Complete the path through the file hierarchy that leads to a data set named "ColumnAmountNO2Trop." The "type" function can then be used to identify the type of variable that follows. The correct data set will be a "dataset" type, not a "group" type.

*Hint: The figure below from https://acdisc.gesdisc.eosdis.nasa.gov/ gives a visual representation of the structure of this file.*

![OMNO2_README_V003.jpg](attachment:OMNO2_README_V003.jpg)

In [None]:
## COMPLETE THE FILE PATH LEADING TO THE DESIRED DATA SET BELOW
path_to_trop = 'path/to/ColumnAmountNO2Trop'

# read the variable type 
type(f[path_to_trop])

**Question 15.1.4** Once you have charted a path to the appropriate data set, assign the data set to the variable "NO2_trop." Note that you will need to append a bracketed colon symbol "[:]" to the end of the data set reference so that the data set's values are read into an array, rather than just a reference to the data set inside the file.

In [None]:
## ENTER CODE HERE TO LOAD DATA INTO NO2_trop VARIABLE

Once you have obtained the data you need, it is best practice to close the original data file so that your Jupyter notebook does not have to keep track of it. Use the sample code below to do so when appropriate; you will still have access to new variables you created based on the data file, but will have to reload the file if you want to navigate through it or extract other data sets.


In [None]:
f.close()
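
The steps in this section (listing keys, checking whether an entry is a group or a data set, reading values with "[:]", and closing the file) can be practiced on a toy file before working with the OMI download. The sketch below builds a small HDF5 file in memory. The group and data set names are made up for illustration and are not the names used in the OMI product.

```python
import h5py
import numpy as np

# Build a throwaway HDF5 file in memory (nothing is written to disk).
# All group and data set names here are hypothetical.
with h5py.File("toy_example.h5", "w", driver="core", backing_store=False) as toy:
    fields = toy.create_group("TOY_GROUP/ToyFields")
    fields.create_dataset("ToyColumn", data=np.arange(6.0).reshape(2, 3))

    # keys at the top level, then one level down
    top_keys = list(toy.keys())
    sub_keys = list(toy["TOY_GROUP"].keys())

    # a group behaves like a folder; a dataset holds the actual numbers
    is_group = isinstance(toy["TOY_GROUP"], h5py.Group)
    is_dataset = isinstance(toy["TOY_GROUP/ToyFields/ToyColumn"], h5py.Dataset)

    # "[:]" copies the data set's contents into a NumPy array
    toy_column = toy["TOY_GROUP/ToyFields/ToyColumn"][:]

# the "with" block has closed the file, but the array is still usable
print(top_keys, sub_keys, toy_column.shape)
```

Opening the file in a "with" block is an alternative to calling close() by hand: the file is closed automatically when the block ends.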

## 15.2 Plotting Spatial Data

**Question 15.2.1** The data set you have just imported is a Level 3 data product, meaning that it has been averaged onto a regularly spaced grid for you. The coordinates of this grid are not, however, included in the file, so before you can plot a map of the NO2_trop variable, you will need to create gridded latitude and longitude variables yourself. Note that the first grid cell has edges at 180° W longitude and 90° S latitude, and there is a 0.25° spacing between grid cells.

*Hint 1: The final latitude and longitude arrays should be the same size as the NO2_trop array.*

*Hint 2: You may find the functions np.arange, np.tile, and [array].shape useful. Google their documentation for more information.*
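
To see how np.arange and np.tile fit together, here is a minimal sketch on a much smaller, made-up grid (3 rows by 4 columns with a spacing of 10). It assumes you want the coordinates of the grid cell centers, which is one possible choice; the numbers are not those of the OMI grid.

```python
import numpy as np

# hypothetical toy grid: 3 rows x 4 columns, spacing of 10,
# with the first cell's edges at x = 0 and y = -15
spacing = 10.0
x_centers = np.arange(0.0 + spacing / 2, 40.0, spacing)    # 5, 15, 25, 35
y_centers = np.arange(-15.0 + spacing / 2, 15.0, spacing)  # -10, 0, 10

# repeat the 1-D coordinates to fill a (rows, columns) = (3, 4) grid
x_grid = np.tile(x_centers, (y_centers.size, 1))
y_grid = np.tile(y_centers.reshape(-1, 1), (1, x_centers.size))

print(x_grid.shape, y_grid.shape)
```

Each row of x_grid repeats the x coordinates and each column of y_grid repeats the y coordinates, so both arrays have the same shape as a 3 by 4 data array.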


In [None]:
## ENTER CODE HERE TO MAKE LONGITUDE AND LATITUDE VARIABLES
## OF THE SAME SHAPE AS NO2_trop

Once you have successfully created your latitude and longitude arrays, use the code below to plot a colored map of the NO2_trop variable. The first time you use the "coastlines" command, some data will be downloaded from the internet; you may ignore any warning that appears.


In [None]:
# import mapping and graphing libraries
import cartopy
import cartopy.crs as ccrs
import matplotlib.pyplot as plt

# create the figure
fig = plt.figure()

# add map axes that fill most of the figure, using the Plate Carree projection
ax = fig.add_axes([0.1,0.1,0.8,0.8], projection=ccrs.PlateCarree())

# add the data and color range
h = ax.pcolormesh(lon, lat, NO2_trop, vmin=0, vmax=5e15)
plt.colorbar(h, label='NO$_2$', extend='both')

# show the coastlines layer
ax.coastlines()
fig.set_size_inches(16,8)

# add a title
ax.set_title('Column NO$_2$')

# show the chart
plt.show()

**Question 15.2.2** Examine the map you created above closely. Do high NO<sub>2</sub> column densities appear to be associated with rural or urban areas?


**Question 15.2.3** Use the functions set_xlim and set_ylim to re-make your plot from before, but “zoomed in” on the state of California. 

*Hint 1: Google the documentation for these matplotlib functions to find the necessary arguments.*

*Hint 2: Many online sites allow you to extract the map bounds. Try [openstreetmap.org](https://www.openstreetmap.org/export) or another website of your choosing.* 
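
As a rough illustration of what set_xlim and set_ylim do, the sketch below uses ordinary matplotlib axes (no map projection) and a made-up toy field. The bounds are hypothetical, not those of California. It is an assumption of this sketch that the same two calls behave similarly on the map axes created earlier.

```python
import matplotlib.pyplot as plt
import numpy as np

# toy 3 x 4 field with cell edges every 10 units (hypothetical values)
x_edges = np.linspace(0.0, 40.0, 5)
y_edges = np.linspace(-15.0, 15.0, 4)
toy_field = np.arange(12.0).reshape(3, 4)

fig, ax = plt.subplots()
ax.pcolormesh(x_edges, y_edges, toy_field)

# restrict the view to a hypothetical box of interest
ax.set_xlim(10.0, 30.0)
ax.set_ylim(-5.0, 5.0)

print(ax.get_xlim(), ax.get_ylim())
```

The data are unchanged; only the visible window of the axes is narrowed.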

In [None]:
## ENTER CODE HERE TO MAKE A MAP "ZOOMED IN" ON CALIFORNIA

**Question 15.2.4** Now "zoom in" on the San Francisco Bay Area only. Approximately how many OMI pixels are used to represent the BEACO<sub>2</sub>N domain? What are the pros and cons of the satellite remote sensing vs. in situ monitoring network approaches?


In [None]:
## ENTER CODE HERE TO MAKE A MAP "ZOOMED IN" ON SF BAY

## 15.3 Comparing Spatial Data

**Question 15.3.1** Back on the GES DISC website, use the "Parent Directory" link to navigate back to the original NO<sub>2</sub> data directory. Clicking "Parent Directory" again will allow you to view the various other Level 3 data products available from the OMI platform. Choose another atmospheric species, download its 4 July 2013 *.he5 file, and make a map of this species' global distribution, using the NO<sub>2</sub> steps above as a guide.

*Hint 1: In each data product's directory, there is a sub-directory called "doc" that contains helpful documentation. You should consult these documents to ensure that you understand the physical significance of the quantities you are plotting.*

*Hint 2: The numerical magnitudes vary widely from data set to data set. The "vmin" and "vmax" arguments in the pcolormesh function can be used to manipulate the range of values included in the colorbar to facilitate the best possible visualization of the variable.*
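
One possible way to pick "vmin" and "vmax" for an unfamiliar data set is to base them on percentiles of the data, so that a few extreme values do not stretch the color scale. This is only a sketch on a made-up array; the 2nd and 98th percentiles are an arbitrary choice, and fill values should be replaced with NaNs first or they will skew the result.

```python
import numpy as np

# toy field: values between 0 and 1 plus a single extreme outlier
toy_field = np.linspace(0.0, 1.0, 100).reshape(10, 10)
toy_field[0, 0] = 1.0e6

# use the 2nd and 98th percentiles (ignoring NaNs) as candidate color limits
vmin, vmax = np.nanpercentile(toy_field, [2, 98])
print(vmin, vmax)
```

The resulting values could then be passed to pcolormesh as vmin=vmin and vmax=vmax.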


In [None]:
## ENTER CODE HERE TO LOAD, EXTRACT, AND PLOT A DIFFERENT OMI DATA SET

**Question 15.3.2** Compare this map to the one you made of global NO<sub>2</sub>. Based on a simple visual inspection, how does your new variable appear to be related to NO<sub>2</sub>? Are these species correlated, anti-correlated, unrelated, or somewhere in between?


**Question 15.3.3** Make a correlation plot of the two variables to verify your answer above. Note that there are a number of pixels that contain nonphysical, negative "fill values" in each data product, and these should be filtered out before plotting.

*Hint: If you don't remember how to replace fill values with NaNs, consult Exploratory Activity 2.*
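
The filtering idea can be sketched on two short made-up arrays. The fill value of -999 below is hypothetical and chosen only for illustration; check the documentation of each data product for the value it actually uses.

```python
import numpy as np

FILL = -999.0  # hypothetical fill value, for illustration only

species_a = np.array([1.0, 2.0, FILL, 4.0, 5.0])
species_b = np.array([2.0, 4.0, 6.0, FILL, 10.0])

# replace nonphysical negative values with NaN
a_clean = np.where(species_a < 0, np.nan, species_a)
b_clean = np.where(species_b < 0, np.nan, species_b)

# keep only the pixels where both data sets are valid
both_valid = ~np.isnan(a_clean) & ~np.isnan(b_clean)
r = np.corrcoef(a_clean[both_valid], b_clean[both_valid])[0, 1]
print(both_valid.sum(), r)
```

The same boolean indexing works on two-dimensional arrays of matching shape; the selected values simply come back as one-dimensional arrays, which is what a scatter plot or a correlation coefficient needs.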

In [None]:
## ENTER CODE HERE TO REMOVE FILL VALUES AND
## PLOT THE CORRELATION BETWEEN THE TWO OMI DATA SETS

## 15.4 Other Spatial Data Formats

The *.he5 format used by the OMI data products is not the only format in which satellite data are reported. While there are many other conventions, another file format that is commonly used to organize geospatial data is netCDF, which is given the suffix *.nc. The code below installs and imports the libraries necessary for handling this alternative data format.

In [None]:
%%capture

# install a library to load netCDF files
!pip install netCDF4

# import netCDF Dataset module to support data loading
from netCDF4 import Dataset

**Question 15.4.1** To practice importing and handling a netCDF file, use the "Parent Directory" link to navigate out of the Aura_OMI_Level3 directory and view the other instrumentation with available data sets. Choose "CMS" and then "CMS_CH4_FLX_NAD" to see the data products released from the Carbon Monitoring System. Download an *.nc file of your choice. 

Replace the filename and location in the code below to import the file. 

In [None]:
## UPDATE THIS FILE PATH TO REFLECT THE LOCATION OF THE DOWNLOAD ON YOUR COMPUTER
file_path = "file/path/to/.nc/file"

# load the data
f = Dataset(file_path, mode='r')

**Question 15.4.2** Importing this file is similar to importing the *.he5 files, but the variables in this netCDF file can be accessed directly rather than by following a path through nested groups. (The netCDF-4 format does support groups, but many files keep all of their variables at the top level.) You can still display the "keys" of the file to list the available variables. Each variable also has "attributes," whose names and contents you can display. Both *.he5 and *.nc files share these traits. Modify the code below to explore the *.nc file as you desire.

In [None]:
# display the variable keys
f.variables.keys()

In [None]:
## UPDATE THIS CODE WITH THE NAME OF ONE OF THE VARIABLES ABOVE
## TO DISPLAY ITS ATTRIBUTES
f.variables['VARIABLE'].ncattrs()

In [None]:
## UPDATE THIS CODE WITH THE NAME OF ONE OF THE VARIABLE ATTRIBUTES ABOVE
## TO DISPLAY ITS CONTENTS
f.variables['VARIABLE'].ATTRIBUTE

For this data product, the necessary longitudes and latitudes are provided in the *.nc file. Use the code below to save these values to variables.

In [None]:
# store the lon and lat variables
lon = f.variables['Lon'][:]
lat = f.variables['Lat'][:]

**Question 15.4.3** Choose one of the remaining "CH4_FLX" variables you would like to make a map of. Note that these are three dimensional variables, so you will need to choose a single vertical layer to display.
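
Selecting one layer from a three-dimensional array is a matter of indexing. The sketch below uses a made-up array and assumes the layer is the first dimension; in the real file, check the variable's shape and dimensions to see which axis is which before slicing.

```python
import numpy as np

# toy variable with a hypothetical (layer, lat, lon) ordering
toy_3d = np.arange(24.0).reshape(2, 3, 4)
print(toy_3d.shape)

# keep the first layer; the result is two-dimensional
first_layer = toy_3d[0, :, :]
print(first_layer.shape)
```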

In [None]:
## ENTER CODE HERE TO STORE VARIABLE FOR MAPPING

**Question 15.4.4** Repeat (or "tile") the longitude and latitude variables created above to create new variables with the same "shape" as the measurement you will be mapping.
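
As a sketch of the tiling step, assuming the coordinates are stored as one-dimensional vectors, np.meshgrid is a compact alternative to np.tile that gives the same result. The coordinate values below are made up.

```python
import numpy as np

# hypothetical 1-D coordinate vectors
lon_1d = np.array([5.0, 15.0, 25.0, 35.0])
lat_1d = np.array([-10.0, 0.0, 10.0])

# np.meshgrid repeats each vector along the other axis
lon_2d, lat_2d = np.meshgrid(lon_1d, lat_1d)

# the same result built explicitly with np.tile
lon_tiled = np.tile(lon_1d, (lat_1d.size, 1))
lat_tiled = np.tile(lat_1d.reshape(-1, 1), (1, lon_1d.size))

print(lon_2d.shape)
print(np.array_equal(lon_2d, lon_tiled), np.array_equal(lat_2d, lat_tiled))
```

Both approaches produce arrays of shape (number of latitudes, number of longitudes), which must match the shape of the layer you plan to map.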

In [None]:
## ENTER CODE HERE TO MAKE LONGITUDE AND LATITUDE VARIABLES
## OF THE SAME SHAPE AS THE VARIABLE YOU WILL BE MAPPING

**Question 15.4.5** Make a plot of the data using the tiled longitude and latitude variables. Remember that you can adjust the "vmin" and "vmax" of your pcolormesh to make the variability in the values more visible.

In [None]:
## ENTER CODE HERE TO PLOT THE MAP

As a reminder, it is still best practice to close the file once you have imported the data you need.


In [None]:
f.close()

## 15.5 Independent Exploration of Spatial Data

**Question 15.5.1** Pose a simple scientific question of your choosing that could be reasonably answered using any of the satellite data sets available on the GES DISC website. You may or may not have time to answer it in full today, but you should start and see how far you get! A few ideas are given below to help you brainstorm.

1. Compare measurements made by different (spaceborne or ground-based) instruments 
    * see Exploratory Activity 1 to access BEACO<sub>2</sub>N data
2. Contrast weekday vs. weekend levels of the same pollutant 
    * see Exploratory Activity 3
3. Analyze the long term (i.e., multiannual) trend in a given atmospheric variable

In [None]:
## ENTER CODE HERE TO EXPLORE SPATIAL DATA OF YOUR CHOOSING

**Discussion Questions.** If you have time, pair up with a partner and discuss the following:

1. What was your scientific question, and how did you try to answer it? What did you find?
2. How many data files did you use in your analysis? Did you do any averaging or other analysis to combine multiple files? Do you think your approach was truly representative of the different phenomena you wanted to explore?
3. If you had more time, what would you do next?