# Introduction

In this tutorial, you will learn how to work with HDF5 (*.h5*) data. Specifically, we will look at data from the [Global Precipitation Measurement (GPM) mission](https://www.nasa.gov/mission_pages/GPM/main/index.html). We will download an Integrated Multi-satellitE Retrievals for GPM (IMERG) dataset and read it into this Jupyter notebook. Then, we will learn how to navigate the dataset hierarchy and plot measurements.

# Import Dependencies
Let's import all the libraries we need. This cell must be run before any of the other cells. The libraries are already installed in the Docker container you are using, so there is nothing to install. Running the following cell takes care of all of the dependencies.

In [None]:
from mpl_toolkits.basemap import Basemap, cm
import matplotlib.pyplot as plt
import numpy as np
import h5py

# Approving the GES DISC DAAC
 
Before we can obtain any GES DISC data, we need to approve the GES DISC DAAC in our Earthdata account. If you do not yet have an Earthdata Login account, [you can create one here](https://urs.earthdata.nasa.gov). Then go to [this link](https://urs.earthdata.nasa.gov/approve_app?client_id=e2WVk8Pw6weeLUKZYOxvTQ) and click approve. If you are prompted to log in, fill out your login information and click the link again.

# Downloading the Data

Let's use the *wget* command to download the dataset to our local machine. First, change into the directory that contains this Jupyter Notebook. In your terminal, type the following command:

> `cd /home/condauser/tutorials/notebooks/IMERG_TUTORIAL`

<br>
Now, you are in the same directory as this Jupyter Notebook.  Run the following command to list all files of this directory:

> `ls`

Next, we need to input your username and password for Earthdata login. Run the following command and enter your username and password when prompted:

> `read -p "enter your username: " username; read -s -p "enter your password: " password; echo ""`

<br> 
Now, let's run the following *wget* command to download the IMERG HDF5 file into this directory. We are obtaining this dataset from the GES DISC DAAC. *wget* requires a URL for the data, which we have already provided for you. Later in this tutorial, we will learn where the data URL can be found. Run the following commands in your terminal.

> `cd /home/condauser/tutorials/notebooks/IMERG_TUTORIAL`

> `wget --user="$username" --password="$password" "http://gpm1.gesdisc.eosdis.nasa.gov/data//GPM_L3/GPM_3IMERGM.04/2017/3B-MO.MS.MRG.3IMERG.20170101-S000000-E235959.01.V04A.HDF5"; username=""; password=""`

The data is now downloaded to your local machine as an HDF5 (Hierarchical Data Format 5) file. You can confirm this by typing the following command:

> `ls`

You should now see `3B-MO.MS.MRG.3IMERG.20170101-S000000-E235959.01.V04A.HDF5` in your directory.
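A quick sanity check on the download can be sketched in Python. A file with the right name is not necessarily valid HDF5, since a failed login may leave an HTML page saved under that name. The helper below, `looks_like_hdf5`, is a hypothetical name and not part of any library. It only checks that the file starts with the 8-byte HDF5 signature at offset 0, which assumes the file has no user block. The two throwaway files exist only to demonstrate the check.

```python
import tempfile
from pathlib import Path

# The 8-byte signature that starts an HDF5 file (assuming no user block).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if `path` is a file that begins with the HDF5 signature."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as fh:
        return fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Demonstration on two throwaway files (hypothetical names, not real data).
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.HDF5"
    good.write_bytes(HDF5_SIGNATURE + b"\x00" * 8)

    bad = Path(tmp) / "login_page.HDF5"
    bad.write_bytes(b"<!DOCTYPE html>")

    good_result = looks_like_hdf5(good)
    bad_result = looks_like_hdf5(bad)

print(good_result, bad_result)
```

In the notebook, the same function could be called with the name of the downloaded file, `3B-MO.MS.MRG.3IMERG.20170101-S000000-E235959.01.V04A.HDF5`.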

# Read the Data into Python

Let's use the *h5py* package to import the data into Python. To open the dataset in this Jupyter Notebook, we will call `h5py.File()`, which takes the file name and a mode as input. The `"r"` mode opens the existing file read-only.

In [None]:
dataset = h5py.File("3B-MO.MS.MRG.3IMERG.20170101-S000000-E235959.01.V04A.HDF5", "r")

We now have an object that gives us access to all of the data in the above dataset. Since this data is stored in a hierarchical format (similar to dictionaries of dictionaries), let's find the keys at the root level. We can use the `keys()` method.

In [None]:
list(dataset.keys())

So, the root level has one key: "Grid". Let's navigate into this group, which acts like a directory.

In [None]:
grid = dataset["Grid"]

Now, let's use the `keys()` function once more to see what the grid directory holds.

In [None]:
list(grid.keys())

The "Grid" directory contains some atmospheric measurements as well as the latitude and longitude for each measurement. 

Here's the hierarchy of this dataset:

Root
<br>
|&nbsp; &nbsp; -- Grid
<br>
|&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | &nbsp; -- lat
<br>
|&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | &nbsp; -- precipitation
<br>
|&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | &nbsp; -- randomError
<br>
| &nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | &nbsp; -- lon
<br>
|&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | &nbsp; -- gaugeRelativeWeighting
<br>
|&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | &nbsp; -- probabilityLiquidPrecipitation
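The hierarchy above can also be listed programmatically instead of calling `keys()` one level at a time. The sketch below is illustrative only: `walk` is a hypothetical helper, and `stand_in` is a plain nested dictionary that mimics the layout shown above so the example runs without the data file. Because the helper relies only on `keys()` and indexing, it should also work when passed the `dataset` object opened earlier, but that is an assumption, not something shown here.

```python
def walk(node, name="Root", depth=0):
    """Yield (depth, name) for `node` and everything nested below it."""
    yield depth, name
    # Containers expose keys(); leaf entries (the actual data) do not.
    if hasattr(node, "keys"):
        for key in node.keys():
            yield from walk(node[key], key, depth + 1)


# Stand-in for the file layout; None marks a leaf that would hold data.
stand_in = {
    "Grid": {
        "lat": None,
        "precipitation": None,
        "randomError": None,
        "lon": None,
        "gaugeRelativeWeighting": None,
        "probabilityLiquidPrecipitation": None,
    }
}

entries = list(walk(stand_in))
for depth, name in entries:
    print("    " * depth + name)
```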

Now that we are in a directory that contains the measurements, let's store one of these measurements in a variable. 

In [None]:
precipitation = grid["precipitation"]

We can grab the data for the precipitation object as a numpy array using the following:

In [None]:
precipitation_data = precipitation[:]

In [None]:
precipitation_data.shape

Okay, let's get the latitude and longitude data now.

In [None]:
lat = grid["lat"][:]
lon = grid["lon"][:]

In [None]:
lat

In [None]:
lon

We can see that the longitude and latitude coordinates stretch across the whole world. Latitude goes from -90 to 90 degrees, and longitude goes from -180 to 180 degrees.
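To see how coordinate arrays like these cover the globe, the sketch below builds stand-in coordinates. It assumes a regular 0.1-degree grid whose values mark cell centers. That spacing is inferred from the 3600 by 1800 shape of the data and is not read from the file. Under that assumption the values stop half a cell short of the exact limits.

```python
import numpy as np

step = 0.1  # assumed grid spacing in degrees (360 / 3600 and 180 / 1800)

# Cell-center coordinates: start half a step inside the lower boundary.
lat_demo = -90 + step / 2 + step * np.arange(1800)
lon_demo = -180 + step / 2 + step * np.arange(3600)

print(lat_demo.shape, lat_demo.min(), lat_demo.max())
print(lon_demo.shape, lon_demo.min(), lon_demo.max())
```

Comparing `lat.min()`, `lat.max()`, `lon.min()`, and `lon.max()` from the real arrays against these values is a quick way to confirm or reject the assumption.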

# Plotting the Data

We are going to use the packages *matplotlib*, *numpy*, and *basemap* to plot this data onto a world projection.

In [None]:
%matplotlib inline

The cell above ensures that any plots we create are displayed inline in the Jupyter Notebook.

## Reshape Data

To use the *basemap* package, the latitude, longitude, and precipitation data must all have the same numpy array shape. This is because basemap projects the data onto the longitude and latitude coordinates.

In [None]:
lon.ndim

In [None]:
lon.shape

In [None]:
lat.ndim

In [None]:
lat.shape

In [None]:
precipitation_data.ndim

In [None]:
precipitation_data.shape

The coordinate arrays are 1-dimensional, while the precipitation data is a 2-dimensional grid. We need to expand the coordinate data into 2-dimensional grids with a shape of (3600, 1800) so that they match the shape and dimensions of the precipitation data. We can use the numpy function `meshgrid()` to do this.

In [None]:
lats, lons = np.meshgrid(lat,lon)

In [None]:
lats.shape

Now, let's check the dimensions and shapes of our coordinates and precipitation data.

In [None]:
lats.shape == lons.shape == precipitation_data.shape

Below is an example of how meshgrid can be used.

In [None]:
# example of np.meshgrid
xs = np.linspace(0,5,6)
ys = np.linspace(9,12,4)
print("xs")
print(xs, xs.shape)
print('\nys')
print(ys, ys.shape)
print('\nxgrid')
xgrid, ygrid = np.meshgrid(xs, ys)
print(xgrid, xgrid.shape)
print('\nygrid')
print(ygrid, ygrid.shape)
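The argument order in `np.meshgrid(lat, lon)` matters, because it decides which axis carries longitude. The sketch below uses small made-up coordinate arrays (3 latitudes, 4 longitudes) to show that, with the default indexing, the first array passed in varies along the columns. The result therefore has one row per longitude, matching a (lon, lat) data layout. It also shows an equivalent call using `indexing="ij"`, which some readers may find easier to follow.

```python
import numpy as np

# Made-up coordinates for illustration only.
lat_small = np.array([-10.0, 0.0, 10.0])             # 3 latitudes
lon_small = np.array([100.0, 110.0, 120.0, 130.0])   # 4 longitudes

# Same call pattern as in the tutorial: latitude first, longitude second.
lats_small, lons_small = np.meshgrid(lat_small, lon_small)
print(lats_small.shape)  # (4, 3): one row per longitude, one column per latitude

# Row i holds a single longitude; column j holds a single latitude.
rows_are_lon = all((lons_small[i, :] == lon_small[i]).all() for i in range(4))
cols_are_lat = all((lats_small[:, j] == lat_small[j]).all() for j in range(3))

# Equivalent formulation with matrix-style indexing.
lons_ij, lats_ij = np.meshgrid(lon_small, lat_small, indexing="ij")
same = np.array_equal(lats_small, lats_ij) and np.array_equal(lons_small, lons_ij)
print(rows_are_lon, cols_are_lat, same)
```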

## Masking Data

Let's gain a little bit more information about this dataset. <br> <br>
1\. Go to the following website: https://mirador.gsfc.nasa.gov <br>
2\. In the Keyword Box, type: IMERG <br>
3\. Click on `Search GES-DISC` <br> 
4\. On the first result, click on `View Files` <br>

You will notice that the first result on this page is the dataset we have been examining: `3B-MO.MS.MRG.3IMERG.20170101-S000000-E235959.01.V04A.HDF5`

5\. Now, click on `OPeNDAP` (beside One Click Download)

You are now on an *OPeNDAP Server Data Access Form* for this dataset. Some useful pieces of information on this page are:

* The data URL: This was used in the *wget* command.

* The variable, *precipitation*, has units of *mm/hr*

* The variable, *precipitation*, has a fill value of -9999.9004

When a measurement is missing, the numpy array holds this fill value in its place.
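The odd-looking fill value has a simple explanation if the data is stored as 32-bit floats, which is an assumption here rather than something stated on the page: -9999.9 cannot be represented exactly in that precision. The sketch below shows the stored value and uses a made-up four-element array to show how a few fill values distort a plain average. A threshold test is used to find the fill entries instead of an exact equality test.

```python
import numpy as np

# Assumption: the fill value is stored as a 32-bit float.
fill32 = np.float32(-9999.9)
fill_as_float64 = float(fill32)  # widen to a 64-bit Python float
print(fill_as_float64)           # -9999.900390625, shown as -9999.9004 when rounded

# Made-up sample: two valid rates and two missing measurements.
sample = np.array([0.25, fill32, 1.5, fill32], dtype=np.float32)

# A threshold avoids exact float comparison against the fill value.
is_fill = sample < -9000
n_missing = int(is_fill.sum())

naive_mean = float(sample.mean())            # dominated by the fill values
valid_mean = float(sample[~is_fill].mean())  # mean of real measurements only
print(n_missing, naive_mean, valid_mean)
```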

In [None]:
precipitation_data.mean()

In [None]:
precipitation_data.max()

In [None]:
precipitation_data.min()

Evidently, some of the precipitation data is not recorded. We will create a masked numpy array to *hide* these fill values for plotting the data. Here's how we can do that.

In [None]:
precipitation_data_masked = np.ma.masked_where(precipitation_data < -9000, precipitation_data)
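To make the effect of masking concrete, here is a small sketch on a made-up 2 by 3 array, with the fill value assumed to be stored as a 32-bit float. It shows that statistics on a masked array skip the hidden entries, that `np.ma.masked_less` builds the same mask as the `masked_where` call, and that `filled(np.nan)` converts the hidden entries to NaN if a plain array is needed later.

```python
import numpy as np

FILL = np.float32(-9999.9)  # assumed storage of the fill value

# Made-up data: four valid rates and two missing measurements.
data = np.array([[0.1, FILL, 0.4],
                 [FILL, 0.2, 0.3]], dtype=np.float32)

# Same pattern as in the tutorial: hide everything below the threshold.
masked = np.ma.masked_where(data < -9000, data)

n_valid = masked.count()             # number of unmasked entries
masked_mean = float(masked.mean())   # fill values are ignored
masked_min = float(masked.min())

# An equivalent way to build the same mask.
alt = np.ma.masked_less(data, -9000)
same_mask = np.array_equal(np.ma.getmaskarray(masked), np.ma.getmaskarray(alt))

# Replace hidden entries with NaN to get a regular array back.
with_nan = masked.filled(np.nan)
print(n_valid, masked_mean, masked_min, same_mask)
```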

Now that we have created a masked numpy array, let's compare the arrays before and after masking.

In [None]:
precipitation_data

In [None]:
precipitation_data_masked

In [None]:
precipitation_data_masked.mean()

In [None]:
precipitation_data_masked.min()

In [None]:
precipitation_data_masked.max()

Now, we are ready to plot.

## Using Basemap

We are going to set up the boundaries for our map. We will use:

* Minimum Latitude: -60
* Maximum Latitude: 60
* Minimum Longitude: -180
* Maximum Longitude: 180

In [None]:
latcorners = [-60,60]
loncorners = [-180, 180]

Let's configure the coloring scheme.

In [None]:
cmap = cm.GMT_drywet

We will use a cylindrical projection of the world and produce a contour map of the precipitation data.

In [None]:
# Set the font first so that it applies to this figure, not only to later ones.
font = {'weight' : 'bold', 'size' : 20}
plt.rc('font', **font)

plt.figure(figsize=(20,20))
plt.title("Precipitation Measurement from IMERG")

# Cylindrical projection bounded by the corners defined above.
m = Basemap(projection="cyl", llcrnrlat=latcorners[0], urcrnrlat=latcorners[1], llcrnrlon=loncorners[0], urcrnrlon=loncorners[1])
m.drawcoastlines()

# Contour lines of the masked data; latlon=True means lons and lats are in degrees.
plot_data = m.contour(lons, lats, precipitation_data_masked, cmap=cmap, latlon=True)

cbar = m.colorbar(plot_data, location='right', pad="5%")
cbar.set_label('mm/h')

# Label parallels every 20 degrees and meridians every 60 degrees.
parallels = np.arange(-60., 61, 20.)
m.drawparallels(parallels, labels=[True,False,True,False])

meridians = np.arange(-180., 180., 60.)
m.drawmeridians(meridians, labels=[False,False,False,True])

The Basemap API reference is quite useful: https://matplotlib.org/basemap/api/basemap_api.html. <br> <br> Let's try plotting a filled contour.

In [None]:
# Set the font first so that it applies to this figure, not only to later ones.
font = {'weight' : 'bold', 'size' : 20}
plt.rc('font', **font)

plt.figure(figsize=(20,20))
plt.title("Precipitation Measurement from IMERG")

m = Basemap(projection="cyl", llcrnrlat=latcorners[0], urcrnrlat=latcorners[1], llcrnrlon=loncorners[0], urcrnrlon=loncorners[1])
m.drawcoastlines()

# Contour levels from 0 up to (but not including) the maximum, in steps of 0.25 mm/h.
data_range = np.arange(0, np.max(precipitation_data_masked), 0.25)
plot_data = m.contourf(lons, lats, precipitation_data_masked, data_range, cmap=cmap, latlon=True)

cbar = m.colorbar(plot_data, location='right', pad="5%")
cbar.set_label('mm/h')

parallels = np.arange(-60., 61, 20.)
m.drawparallels(parallels, labels=[True,False,True,False])

meridians = np.arange(-180., 180., 60.)
m.drawmeridians(meridians, labels=[False,False,False,True])
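One detail of the contour levels is worth a short sketch. `np.arange` excludes its stop value, so the highest level built this way sits just below the data maximum. With matplotlib's default settings, values above the last level are left unfilled. The numbers below are made up for illustration (`vmax` is hypothetical), and extending the range by one step is only one possible remedy. Passing `extend="max"` to the filled-contour call is another option that matplotlib supports.

```python
import numpy as np

step = 0.25
vmax = 1.6  # hypothetical maximum of the masked data, for illustration only

# Same pattern as in the tutorial: the stop value itself is excluded.
levels = np.arange(0, vmax, step)

# One extra step guarantees that the top level reaches the maximum.
levels_closed = np.arange(0, vmax + step, step)

print(levels[-1], levels_closed[-1])
```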

Let's try plotting a different projection. A list can be found here: https://matplotlib.org/basemap/users/mapsetup.html. In this case, we will use an Orthographic projection.

In [None]:
# Set the font first so that it applies to this figure, not only to later ones.
font = {'weight' : 'bold', 'size' : 12}
plt.rc('font', **font)

plt.figure(figsize=(13,13))
plt.title("Precipitation Measurement from IMERG")

# Orthographic view centered on the equator at 125 degrees east.
m = Basemap(projection="ortho", lat_0=0, lon_0=125)
m.drawcoastlines()

data_range = np.arange(0, np.max(precipitation_data_masked), 0.25)
plot_data = m.contourf(lons, lats, precipitation_data_masked, data_range, cmap=cmap, latlon=True)

cbar = m.colorbar(plot_data, location='right', pad="5%")
cbar.set_label('mm/h')