# HW3 Template: Dataset Overview and Use Case Examples
## EDS 220, Fall 2021

The following is a template you can use for constructing your draft Jupyter notebooks demonstrating the features and use case examples for your chosen environmental datasets. I've included sections addressing the major themes that should be covered, but there is also room for customization.

Many of the resources provided are adapted from this template guide to notebook creation built for the "EarthCube" project:
https://github.com/earthcube/NotebookTemplates

# NASA Earth Exchange (NEX) Downscaled Climate Projections (NEX-DCP30) dataset for the conterminous United States

The modelled data are generated from General Circulation Model (GCM) runs of 33 different Coupled Model Intercomparison Project Phase 5 (CMIP5) models. Projections are available for four greenhouse gas emissions scenarios, known as Representative Concentration Pathways (RCPs), which are labelled after the range of radiative forcing values in the year 2100 (2.6, 4.5, 6, and 8.5 W/m²).

In this notebook we will build on our Python skills by mapping a region of interest and overlaying climate modelling scenarios on that region. This modelling is vital for county governments and economic sectors that need to begin planning for climate change and to understand how it may affect them directly.

## Authors

- Desik Somasundaram, UC Santa Barbara (desik@bren.ucsb.edu)
- Daniel Kerstan, UC Santa Barbara (danielkerstan@bren.ucsb.edu)
- Joe DeCesaro, UC Santa Barbara (jdecesaro@bren.ucsb.edu)

## Table of Contents

Include a summary of the various sections included in your notebook, so that users can easily skip to a section of interest. It's also good to include hyperlinks to the different sections, so that clicking on the heading sends one to that section directly. Examples are below; see also this handy guide to adding hyperlinks to Jupyter notebooks:
https://medium.illumidesk.com/jupyter-notebook-little-known-tricks-b0866a558017

[1. Purpose](#purpose)

[2. Dataset Description](#overview)

[3. Data I/O](#io)

[4. Metadata Display and Basic Visualization](#display)

[5. Use Case Examples](#usecases)

[6. Create Binder Environment](#binder)

[7. References](#references)

<a id='purpose'></a> 
## Notebook Purpose

This notebook was produced for the final project of EDS 220 - Remote Sensing. It provides instructions and example code for pulling the NASA Earth Exchange climate projection data directly from the source, along with use case examples for the dataset. Specifically, it shows how local governments and economic sectors could plan for the different climate scenarios in the dataset.

<a id='overview'></a> 
## Dataset Description

### General Description
The modelled data are generated by NASA from General Circulation Model (GCM) runs of 33 different Coupled Model Intercomparison Project Phase 5 (CMIP5) models. Projections are available for four greenhouse gas emissions scenarios, known as Representative Concentration Pathways (RCPs), which are labelled after the range of radiative forcing values in the year 2100 (2.6, 4.5, 6, and 8.5 W/m²). The 2.6 W/m² scenario is the least extreme and the 8.5 W/m² scenario is the most extreme.

- Precipitation data are provided as the monthly mean of the daily precipitation rate, in units of kg/(m² s)
- Temperature data are provided as the monthly mean of the daily predicted temperature, in units of kelvin (K)

### Coverage
The data cover the contiguous US at a spatial resolution of 30 arc-seconds (0.00833 degrees, approximately 800 meters). The temporal coverage consists of a "Retrospective Run" (1950-2005) and a "Prospective Run" (2006-2099).
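As a quick check on the resolution figures above, the sketch below converts 30 arc-seconds to degrees and to an approximate ground distance. It assumes a spherical Earth with a mean radius of 6371 km (an assumption, not a value taken from the dataset documentation) and uses the latitude of the Santa Barbara example point that appears later in this notebook. The function names are made up for illustration.

```python
import math

ARCSEC_PER_DEGREE = 3600
EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius (spherical approximation)


def arcsec_to_degrees(arcsec):
    """Convert an angle in arc-seconds to decimal degrees."""
    return arcsec / ARCSEC_PER_DEGREE


def cell_size_m(arcsec, latitude_deg):
    """Approximate north-south and east-west size of a grid cell, in meters."""
    step_rad = math.radians(arcsec_to_degrees(arcsec))
    north_south = EARTH_RADIUS_M * step_rad
    # Meridians converge toward the poles, so east-west spacing shrinks with cos(latitude)
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


step_deg = arcsec_to_degrees(30)
ns_m, ew_m = cell_size_m(30, latitude_deg=34.42)
print(f"{step_deg:.5f} degrees; about {ns_m:.0f} m north-south, {ew_m:.0f} m east-west")
```

Under these assumptions a cell is a little over 900 m tall and somewhat narrower east-west at this latitude, which is consistent with the "approximately 800 meters" quoted for the dataset.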

### File Format
The data are stored in NetCDF files. Each file contains the following information:
- One monthly averaged value for each month from 2006 to 2099
- Monthly averages can be based on the daily minimum, maximum, average, and quartiles for each variable
- Your selected file will contain the variables you request for the spatial and temporal subset you specify in the NetcdfSubset access URL

### Retrieval
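This notebook reads the data remotely from the NASA Center for Climate Simulation (NCCS) THREDDS server with `xarray`, using the URL given in the Dataset Input/Output section below, so no local download is required.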

### Assumptions
The downscaling assumes that the relative spatial patterns in temperature and precipitation observed from 1950 through 2005 will remain constant under future climate change.

The downscaled product does not add information beyond what is contained in the original CMIP5 scenarios, and it preserves the frequency of periods of anomalously high and low temperature or precipitation (i.e., extreme events) within each individual CMIP5 scenario.

### Bias-Correction Spatial Disaggregation (BCSD)
***Problem: Biased data unsuitable for local-level decision making.***

The original GCM runs use a coarse resolution that lacks the level of detail required for local decision making. In addition, although the projections are globally accurate, there may be considerable bias at the local level because the models do not take local topography into account.

***Solution: BCSD using PRISM (observational climate data from meteorological stations)***

- The Bias-Correction step corrects the bias of the GCM data through comparisons performed against the observationally based PRISM data.
- The Spatial-Disaggregation step spatially interpolates the bias-corrected GCM data to the finer-resolution grid of the 30 arc-second PRISM data.
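To make the bias-correction idea concrete, here is a toy sketch of empirical quantile mapping, one common way to correct a model distribution against observations. It is not the NEX-DCP30 production procedure, which is more involved. The function name `quantile_map` and the synthetic "model" and "observed" series are hypothetical, and the spatial-disaggregation step is not shown.

```python
import numpy as np


def quantile_map(model_hist, obs_hist, model_values):
    """Map model values onto the observed distribution via empirical quantiles."""
    model_sorted = np.sort(np.asarray(model_hist, dtype=float))
    obs_sorted = np.sort(np.asarray(obs_hist, dtype=float))
    # Plotting positions (non-exceedance probabilities) for each sorted sample
    model_probs = (np.arange(model_sorted.size) + 0.5) / model_sorted.size
    obs_probs = (np.arange(obs_sorted.size) + 0.5) / obs_sorted.size
    # 1) Where does each value fall within the historical model distribution?
    probs = np.interp(model_values, model_sorted, model_probs)
    # 2) Which observed value has that same probability?
    return np.interp(probs, obs_probs, obs_sorted)


# Synthetic example: the "model" runs 2 K warmer than the "observations"
rng = np.random.default_rng(0)
obs_hist = rng.normal(loc=290.0, scale=3.0, size=600)
model_hist = obs_hist + 2.0

corrected = quantile_map(model_hist, obs_hist, model_hist[:5])
print(model_hist[:5] - corrected)  # differences close to the 2 K bias
```

Values outside the historical range are clamped to the ends of the observed distribution in this sketch, so a real application would need an explicit strategy for extremes.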

<a id='io'></a> 
## Dataset Input/Output 

Next, provide code to read in the data necessary for your analysis. This should be in the following order:

1) Import all necessary packages (matplotlib, numpy, etc)

2) Set any parameters that will be needed during subsequent portions of the notebook. Typical examples of parameters include:
- names of any directories where data are stored
- ranges of years over which data are valid
- any thresholds or latitude/longitude ranges to be used later (e.g. dimensions of NINO3.4 region, threshold SSTA values for El Nino, etc.)

3) Read in the data! If the data files are very large, you may want to consider subsetting the portion of files to be read in (see examples of this during notebooks provided in Weeks 7 and 8).

_Since we will be running these notebooks in class during Weeks 9 and 10_, here is a good rule of thumb: It's good to aim for a relatively short amount of time needed to read in the data, since otherwise we'll be sitting around waiting for things to load for a long time. A  minute or two for data I/O is probably the max you'll want to target!

In [1]:
# 1) Import all necessary packages (matplotlib, numpy, etc)
import numpy as np
import scipy
import scipy.ndimage as ndimage
import scipy.stats as stats
import matplotlib.pyplot as plt
import xarray as xr
import cartopy
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from cartopy.feature import NaturalEarthFeature
import geopandas as gpd
import metpy
from metpy.units import units
import pandas as pd
import plotly.express as px

In [2]:
# 2) Set parameters needed
# Read in the Santa Barbara County shapefile (area of interest)
sb_county = gpd.read_file('tl_2019_06083_faces/tl_2019_06083_faces.shp')

# Bounding box of the county: (minx, miny, maxx, maxy)
xmin, ymin, xmax, ymax = sb_county.total_bounds
# Convert the longitude bounds from the -180..180 convention to 0..360 to match the NetCDF file
xmin_shift = xmin + 360
xmax_shift = xmax + 360
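Adding 360 works here because both county longitudes are negative (west of Greenwich). A slightly more general sketch uses the modulo operator, which leaves longitudes that are already in the 0..360 range unchanged. The helper names and the example longitude are hypothetical.

```python
def to_0_360(lon_deg):
    """Convert a longitude from the -180..180 convention to 0..360."""
    return lon_deg % 360


def to_pm180(lon_deg):
    """Convert a longitude from the 0..360 convention back to -180..180."""
    return (lon_deg + 180) % 360 - 180


west_lon = -119.70  # hypothetical longitude near Santa Barbara, in degrees east
print(to_0_360(west_lon))
print(to_pm180(240.30))
```

With this approach the same function can be applied to any bounding box without first checking the sign of each longitude.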

In [3]:
# 3) Read in data
# Location of the NEX-DCP30 data
nexdcp_url = "https://ds.nccs.nasa.gov/thredds/dodsC/bypass/NEX-DCP30/nex-quartile/rcp85/r1i1p1.ncml"
# Open the dataset using the remote URL
nexdcp_rcp85_xr = xr.open_dataset(nexdcp_url)

<a id='display'></a> 
## Metadata Display and Basic Visualization
**Using Santa Barbara City as Point Location Example**

In [4]:
# Metadata Display
nexdcp_rcp85_xr

In [5]:
# Select the grid cell closest to the city of Santa Barbara
# Ideally we would write a function that finds the closest indices from a lat/lon input;
# in this case the grid cell indices are provided
lon_idx = 639
lat_idx = 1243
longitude = nexdcp_rcp85_xr["ens-avg_tasmax"]["lon"].values[lon_idx]
latitude = nexdcp_rcp85_xr["ens-avg_tasmax"]["lat"].values[lat_idx]

print("Long, Lat values:", longitude, latitude)

Long, Lat values: 240.30416666454 34.42083332919
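The lookup function mentioned in the comments above could be sketched as follows. The grid here is a hypothetical 30 arc-second grid built with NumPy, not the real NEX-DCP30 coordinates, and `nearest_index` is an illustrative name rather than part of any library.

```python
import numpy as np


def nearest_index(coord_values, target):
    """Return the index of the coordinate value closest to `target`."""
    coord_values = np.asarray(coord_values, dtype=float)
    return int(np.abs(coord_values - target).argmin())


# Hypothetical 30 arc-second grid; NOT the real NEX-DCP30 coordinates
step = 1 / 120
lon_grid = 235.0 + step * np.arange(2400)
lat_grid = 30.0 + step * np.arange(1200)

target_lon = -119.70 % 360  # approximate Santa Barbara longitude in 0..360 form
target_lat = 34.42

i_lon = nearest_index(lon_grid, target_lon)
i_lat = nearest_index(lat_grid, target_lat)
print(i_lon, i_lat, lon_grid[i_lon], lat_grid[i_lat])
```

With the real dataset, the coordinate arrays would come from the `lon` and `lat` coordinates of `nexdcp_rcp85_xr`. xarray's `.sel(..., method="nearest")` performs the same kind of nearest-neighbor lookup directly on labels.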


In [6]:
# Select the time range and the single grid cell identified above
start_date = "2022-01-15"
end_date = "2099-12-15"
sb_max_temp = nexdcp_rcp85_xr["ens-avg_tasmax"].sel(
    time=slice(start_date, end_date),
    lat=latitude,
    lon=longitude)
sb_avg_precip = nexdcp_rcp85_xr["ens-avg_pr"].sel(
    time=slice(start_date, end_date),
    lat=latitude,
    lon=longitude)

In [None]:
# Convert deg K to deg F
sb_max_temp = sb_max_temp.metpy.convert_units('degF')

In [None]:
# Make metpy recognize the units
sb_avg_precip = sb_avg_precip.metpy.quantify()

In [None]:
# Convert kg/(m^2 s) to in/yr: dividing the mass flux by the density of water gives a depth rate (m/s)
density_water = units('kg / m^3') * 1000
sb_avg_precip_converted_int = (sb_avg_precip / density_water)
sb_avg_precip_converted = sb_avg_precip_converted_int.metpy.convert_units('inches / year')  
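The same conversion can be checked by hand without a units library. The sketch below assumes a 365.25-day year, which may differ slightly from the year definition used by the units package, and the example flux is a made-up value rather than one read from the dataset.

```python
WATER_DENSITY = 1000.0                  # kg / m^3
METERS_PER_INCH = 0.0254
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumes a 365.25-day year


def flux_to_inches_per_year(flux_kg_m2_s):
    """Convert a precipitation mass flux in kg/(m^2 s) to a depth rate in in/yr."""
    depth_rate_m_s = flux_kg_m2_s / WATER_DENSITY  # kg/(m^2 s) -> m/s of liquid water
    return depth_rate_m_s / METERS_PER_INCH * SECONDS_PER_YEAR


example_flux = 1.5e-5  # hypothetical monthly mean flux, kg/(m^2 s)
print(f"{flux_to_inches_per_year(example_flux):.2f} in/yr")
```

A hand check like this is a useful sanity test whenever a unit conversion passes through an intermediate quantity such as water density.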

### Data Wrangling Xarrays to get more visualization friendly format

In [None]:
# convert xarray array to xarray dataset
sb_max_temp_ds = sb_max_temp.to_dataset()
sb_avg_precip_ds = sb_avg_precip_converted.to_dataset()
# convert xarray dataset to pandas df
temp_df = sb_max_temp_ds.to_pandas()
precip_df = sb_avg_precip_ds.to_pandas()
# convert time into a column from index
temp_df['time'] = temp_df.index
# drop lat lon since we extracted for a point
temp_df.drop(['lat','lon'], axis=1, inplace=True)
precip_df.drop(['lat','lon'], axis=1, inplace=True)

In [None]:
# Resample monthly data to yearly means (values are rates in in/yr, so the yearly mean approximates the annual total)
precip_yearly = precip_df.resample('Y').mean()
precip_yearly['time'] = precip_yearly.index
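Because each monthly value is a rate in inches per year, a plain yearly mean treats all months as equally long. The sketch below compares that unweighted mean with a mean weighted by days per month, using made-up monthly rates for a wet-winter, dry-summer climate and ignoring leap years.

```python
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]  # non-leap year


def annual_total_simple(monthly_rates_in_per_yr):
    """Unweighted mean of the monthly rates (what a plain yearly mean computes)."""
    return sum(monthly_rates_in_per_yr) / len(monthly_rates_in_per_yr)


def annual_total_weighted(monthly_rates_in_per_yr):
    """Weight each monthly rate by the fraction of the year that month covers."""
    days_in_year = sum(DAYS_IN_MONTH)
    return sum(rate * days / days_in_year
               for rate, days in zip(monthly_rates_in_per_yr, DAYS_IN_MONTH))


# Hypothetical monthly rates in inches per year (January through December)
rates = [60, 55, 40, 15, 5, 1, 0, 0, 2, 10, 25, 45]
print(annual_total_simple(rates), annual_total_weighted(rates))
```

For these example numbers the two estimates differ by a fraction of an inch, so the unweighted mean is a reasonable shortcut for a quick look.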

In [None]:
# Plot a quick histogram of temperature
sb_max_temp.plot.hist()
plt.title("Monthly Average Maximum Near-Surface Air Temperature")
plt.xlabel('Temperature (deg F)')
plt.show()

In [None]:
# Plot a quick histogram of precipitation
precip_yearly["ens-avg_pr"].plot.hist()
plt.title("Yearly Mean Precipitation")
plt.xlabel('Precipitation (in/yr)')
plt.show()

In [None]:
# Time series plot of max air temperature
fig = px.line(temp_df, x="time", y="ens-avg_tasmax", labels={
                     "time": "Time",
                     "ens-avg_tasmax": "Air Temperature (deg F)"},
              title="Monthly Average Maximum Near-Surface Air Temperature")
fig.update_xaxes(rangeslider_visible=True)
fig.show()

In [None]:
# Time series plot of yearly mean precipitation (in/yr)
fig = px.line(precip_yearly, x="time", y="ens-avg_pr", labels={
                     "time": "Time",
                     "ens-avg_pr": "Precipitation (in/yr)"},
              title="Yearly Mean Precipitation")
fig.update_xaxes(rangeslider_visible=True)
fig.show()

### Advanced Visualization
**Using Santa Barbara County as Region Example**

In [None]:
# View date range
print("The earliest date in the data is:", nexdcp_rcp85_xr["ens-avg_tasmax"]["time"].values.min())
print("The latest date in the data is:", nexdcp_rcp85_xr["ens-avg_tasmax"]["time"].values.max())  

In [None]:
# Specify time extent of interest
start_date = "2022-01-15"
end_date = "2032-12-15"


In [None]:
# Slice the data by time and spatial extent
# Note that we used the shapefile to specify the extent of our region of interest
sb10_temp = nexdcp_rcp85_xr["ens-avg_tasmax"].sel(
    time=slice(start_date, end_date),
    lon=slice(xmin_shift, xmax_shift),
    lat=slice(ymin, ymax))

In [None]:
# Repeat for precipitation
sb10_precip = nexdcp_rcp85_xr["ens-avg_pr"].sel(
    time=slice(start_date, end_date),
    lon=slice(xmin_shift, xmax_shift),
    lat=slice(ymin, ymax))
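The label-based slices above select every grid cell whose coordinates fall inside the county's bounding box. The same selection can be sketched with plain NumPy boolean masks, which do not depend on whether the coordinate is stored in ascending or descending order. The grid, bounds, and helper name below are hypothetical.

```python
import numpy as np


def bbox_indices(coord_values, lower, upper):
    """Indices of coordinate values that fall inside [lower, upper]."""
    coord_values = np.asarray(coord_values, dtype=float)
    return np.flatnonzero((coord_values >= lower) & (coord_values <= upper))


# Hypothetical coarse grid and bounding box (not the real county bounds)
lon_grid = np.arange(238.0, 242.0, 0.25)
lat_grid = np.arange(33.0, 36.0, 0.25)
field = np.random.default_rng(1).random((lat_grid.size, lon_grid.size))

lon_idx = bbox_indices(lon_grid, 239.4, 241.0)
lat_idx = bbox_indices(lat_grid, 33.9, 35.2)

# np.ix_ builds the row/column index grid for the rectangular subset
subset = field[np.ix_(lat_idx, lon_idx)]
print(subset.shape)
```

Note that a bounding box is rectangular, so the subset includes cells outside the county outline; clipping to the actual polygon would be a further step.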

In [None]:
# Convert deg K to deg F
sb10_temp_degf = sb10_temp.metpy.convert_units('degF')

In [None]:
# Make metpy recognize the units
sb10_precip = sb10_precip.metpy.quantify()

In [None]:
# Convert kg/(m^2 s) to in/yr: dividing the mass flux by the density of water gives a depth rate (m/s)
density_water = units('kg / m^3') * 1000
sb10_precip_converted_int = (sb10_precip / density_water)
sb10_precip_converted = sb10_precip_converted_int.metpy.convert_units('inches / year') 
sb10_precip_converted

In [None]:
# Convert to dataset format to resample and get yearly precipitation estimates
sb10_precip_converted_ds = sb10_precip_converted.to_dataset()
sb10_precip_yearly = sb10_precip_converted_ds["ens-avg_pr"].resample(time='Y').mean()

<a id='usecases'></a> 
## Use Case Examples

### Use Case 1: Protecting your county from heat

Congratulations!! You are part of the Santa Barbara County Climate Change Task Force (SBCCCTF). You have been hired to use data analysis to determine the areas of the county where temperature will increase most drastically for years x, y, and z. You are charged with making a map that shows the temperature increases.

This analysis and map will be critical for determining which areas will be most susceptible to heat waves in the future. County and city officials will use your analysis to identify the populations in the county that are most vulnerable to heat waves. Your analysis will help save lives as extreme heat events become more frequent.

In [None]:
# Animated map of monthly average maximum temperature, one frame per time step
# COMMENTED OUT TO REDUCE FILE SIZE: uncomment these lines for the visualization
# fig = px.imshow(sb10_temp_degf, animation_frame='time', zmin=40, zmax=120,
#                 color_continuous_scale="jet", origin='lower',
#                 title="SB County Monthly Average Maximum Near-Surface Air Temperature",
#                 width=700, height=500)
# fig.update_layout(margin=dict(l=20, r=20, t=50, b=10))
# fig.show()
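One way to turn the animation into a single "where does it warm the most" map is to difference the mean of a late window and the mean of an early window for each grid cell. The sketch below uses synthetic temperatures with an artificial east-west warming gradient. The array shape, window length, and function name are assumptions for illustration, not results from the dataset.

```python
import numpy as np


def warming_by_cell(temps, n_years_window):
    """Mean of the last window minus mean of the first window, per grid cell.

    `temps` has shape (year, lat, lon).
    """
    early = temps[:n_years_window].mean(axis=0)
    late = temps[-n_years_window:].mean(axis=0)
    return late - early


# Synthetic data: 20 years on a 4 x 5 grid, warming faster toward the east
rng = np.random.default_rng(42)
years = np.arange(20)
trend_per_year = np.linspace(0.02, 0.10, 5)  # deg F per year, one value per column
temps = 75 + years[:, None, None] * trend_per_year[None, None, :]
temps = temps + rng.normal(scale=0.01, size=(20, 4, 5))  # small random noise

change = warming_by_cell(temps, n_years_window=5)
hot_row, hot_col = np.unravel_index(change.argmax(), change.shape)
print(change.round(2))
print("Largest increase at grid cell (row, col):", hot_row, hot_col)
```

Applied to the real county subset, the resulting 2-D array could be passed to the same `px.imshow` call used for the animation, without the `animation_frame` argument.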

We now know that the x, y, and z areas of the county will be most vulnerable to increasing temperatures in the future. Further analysis should examine previous heat wave events and identify patterns in the health issues related to those events. City data could be used to find suitable buildings to set up as cooling shelters during these events for those who do not have air conditioning.

### Use Case 2: You are a big shot wine investor

Grapevines can be used to make wine for decades and even as long as 100 years! You, a big shot wine investor, want to make sure that your investment will be protected from climate change. To plan for climate change, you want to determine how predicted rainfall will change throughout your favorite region, Santa Barbara County.

To conduct this analysis you will make a map of the region and determine the predicted total rainfall for years x, y, and z. This information will then be provided to your growers in the region so they can determine whether they need to plan for reduced, increased, or unchanged rainfall. With this information they will tell you whether your current grapes will be resilient to climate change or whether you should consider planting different types of grapes.

In [None]:
# Animated map of yearly precipitation, one frame per year
# COMMENTED OUT TO REDUCE FILE SIZE: uncomment these lines for the visualization
# fig = px.imshow(sb10_precip_yearly, animation_frame='time', zmin=0, zmax=50,
#                 color_continuous_scale="jet", origin='lower',
#                 title="SB County Yearly Precipitation Totals",
#                 width=700, height=500)
# fig.update_layout(margin=dict(l=20, r=20, t=50, b=10))
# fig.show()
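To summarize the yearly maps as a single number for growers, one option is to average precipitation over the region each year and fit a straight line through the resulting series. The sketch below does this on synthetic data with a built-in decline of 0.1 in/yr per year. The values, grid size, and function name are hypothetical and say nothing about the actual projections.

```python
import numpy as np


def regional_trend(yearly_precip, years):
    """Regional-mean series and its least-squares trend per year.

    `yearly_precip` has shape (year, lat, lon).
    """
    regional_mean = yearly_precip.mean(axis=(1, 2))  # average over lat and lon
    # Fit against years since the start to keep the regression well conditioned
    slope, _intercept = np.polyfit(years - years[0], regional_mean, deg=1)
    return regional_mean, slope


years = np.arange(2022, 2033)
rng = np.random.default_rng(7)
# Synthetic series: 18 in/yr baseline, declining 0.1 in/yr each year, plus noise
baseline = 18.0 - 0.1 * (years - years[0])
yearly_precip = baseline[:, None, None] + rng.normal(scale=0.05, size=(years.size, 3, 4))

regional_mean, slope = regional_trend(yearly_precip, years)
print(f"Trend: {slope:+.2f} in/yr per year")
```

An eleven-year window is short for a climate trend, so a real analysis would likely use a longer period and compare several scenarios.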

We now know what future rainfall in Santa Barbara is predicted to look like under the worst-case climate change scenario in this dataset (RCP 8.5). The good news for you is that your crop in this region will be relatively protected. With this base analysis you have determined that it is critical to expand your research to all of your west coast vineyards.

This is the "meat" of the notebook, and what will take the majority of the time to present in class. This section should provide:
1) A plain-text summary (1-2 paragraphs) of the use case example you have chosen: include the target users and audience, and potential applicability. For example, the Week 7 SST exercise might discuss how the state of the ENSO system can be important for seasonal weather forecasts/coral bleaching outlooks, then mention the typical diagnostics associated with ENSO (i.e. identification of El Nino/La Nina events).

2) Markdown and code blocks demonstrating how one walks through the desired use case example. This should be similar to the labs we've done in class: you might want to demonstrate how to isolate a particularly interesting time period, then create an image showing a feature you're interested in, for example.

3) A discussion of the results and how they might be extended on further analysis. For example, we are doing El Nino/La Nina composites in class; a natural extension might be to look at individual events to see what their particular impacts were. Or if there are data quality issues which impact the results, you could discuss how these might be mitigated with additional information/analysis.

Just keep in mind, you'll have roughly 20 minutes for your full presentation, and that goes surprisingly quickly! Probably 2-3 diagnostics is the most you'll be able to get through (you could try practicing with your group members to get a sense of timing).


<a id='binder'></a> 
## Create Binder Environment

The last step is to create a Binder environment for your project, so that we don't have to spend time configuring everyone's environment each time we switch between group presentations. Instructions are below:

 - Assemble all of the data needed in your Github repo: Jupyter notebooks, a README file, and any datasets needed (these should be small, if included within the repo). Larger datasets should be stored on a separate server, and access codes included within the Jupyter notebook as discussed above. 
 
 - Create an _environment_ file: this is a text file which contains information on the packages needed in order to execute your code. The filename should be "environment.yml": an example that you can use for the proper syntax is included in this template repo. To determine which packages to include, you'll probably want to start by displaying the packages loaded in your environment: you can use the command `conda list -n [environment_name]` to get a list.
 
 More information on environment files can be found here:
 https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#

 - Create Binder. Use http://mybinder.org to create a  URL for your notebook Binder (you will need to enter your GitHub repo URL). You can also add a Launch Binder button directly to your GitHub repo, by including the following in your README.md:

```
launch with myBinder
[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/<path to your repo>)
```

<a id='references'></a> 
## References

List relevant references. Here are some additional resources on creating professional, shareable notebooks you may find useful:

- “Getting Started with MetPy — MetPy 1.1.” n.d. Accessed November 19, 2021. https://unidata.github.io/MetPy/latest/userguide/startingguide.html.
- “How to Open and Process NetCDF 4 Data Format in Open Source Python.” 2020. Earth Data Science - Earth Lab. October 16, 2020. https://www.earthdatascience.org/courses/use-data-open-source-python/hierarchical-data-formats-hdf/use-netcdf-in-python-xarray/.
- “NEX-DCP30 | NASA Center for Climate Simulation.” n.d. Accessed November 19, 2021. https://www.nccs.nasa.gov/services/data-collections/land-based-products/nex-dcp30.
- plotlygraphs. 2019. “Imshow.” July 3, 2019. https://plotly.com/python/imshow/.