# NASA Earth Exchange (NEX) Downscaled Climate Projections (NEX-DCP30) dataset for the conterminous United States

The dataset contains modelled projections generated from General Circulation Model (GCM) runs of 33 different Coupled Model Intercomparison Project Phase 5 (CMIP5) models. The projections are available for four greenhouse gas emissions scenarios known as Representative Concentration Pathways (RCPs), labelled after the range of radiative forcing values in the year 2100 (2.6, 4.5, 6, and 8.5 W/m²). Visit the following link for more details on the dataset: https://www.nccs.nasa.gov/services/data-collections/land-based-products/nex-dcp30.

Air temperature and precipitation are the only variables available in the dataset, so they will be the focus of our analysis.

In this notebook we will build on our Python skills by first extracting and plotting a time series for the city of Santa Barbara. We will then expand to more advanced mapping using Santa Barbara County as the region of interest. The climate dataset is available for four Representative Concentration Pathways (RCPs), but we will use RCP8.5 as our example. Our variables of interest are precipitation and temperature as modelled for the future. This modelling is vital for county governments and economic sectors as they begin planning for climate change and work to understand how it may affect them directly.

The contents of this notebook can be easily modified to work with any region of the NEX DCP30 US dataset for any of the available RCPs.

## Authors

- Desik Somasundaram, UC Santa Barbara (desik@bren.ucsb.edu) <br>
- Daniel Kerstan, UC Santa Barbara (danielkerstan@bren.ucsb.edu) <br>
- Joe DeCesaro, UC Santa Barbara (jdecesaro@bren.ucsb.edu) <br>

## Table of Contents
The notebook starts with its purpose and then describes the key characteristics of the NASA Earth Exchange (NEX) Downscaled Climate Projections (NEX-DCP30) dataset for the conterminous United States.

The data I/O section sets up the kernel with required packages and reads in the dataset.

The metadata display and basic visualization focus on the point data time series visualization while the advanced visualization focuses on regional time series. 

The use case examples are based on the regional time series visualizations.
Binder environment and references are provided. 

[1. Purpose](#purpose)

[2. Dataset Description](#overview)

[3. Data I/O](#io)

[4. Metadata Display and Basic Visualization](#display)

[5. Advanced Visualization](#advanceddisplay)

[6. Use Case Examples](#usecases)

[7. Create Binder Environment](#binder)

[8. References](#references)

<a id='purpose'></a> 
## Notebook Purpose

Climate change remains one of the most critical issues facing the United States today.
Climate action requires thoughtful policy-making from local and county governments all the way up to the national level. This notebook is a tutorial on leveraging the NEX-DCP30 dataset developed by NASA Earth Exchange for local climate action. The goal is to inform policymakers and government officials with tangible insights on potential impacts.

This notebook was produced for the final project of EDS 220 - Remote Sensing. It provides instruction and example code on how to pull in the NASA Earth Exchange climate projection data directly from the source. It provides specific use case examples for Santa Barbara county. Specifically, this workbook shows how local governments and economy sectors could plan for different climate scenarios from the dataset.


<a id='overview'></a> 
## Dataset Description

### General Description
The modelled data is generated by NASA from General Circulation Model (GCM) runs of 33 different Coupled Model Intercomparison Project Phase 5 (CMIP5) models. The projections are available for four greenhouse gas emissions scenarios known as Representative Concentration Pathways (RCPs), labelled after the range of radiative forcing values in the year 2100 (2.6, 4.5, 6, and 8.5 W/m²). The 2.6 W/m² scenario is the least extreme and the 8.5 W/m² scenario is the most extreme.

- Precipitation data is provided as a monthly mean of the daily precipitation rate in units of kg/(m² s)
- Temperature data is provided as a monthly mean of the daily predicted temperature in kelvin (K)

### Coverage
The data cover the contiguous US at a spatial resolution of 30 arc-seconds (about 0.00833 degrees, or approximately 800 meters). The temporal coverage consists of a "Retrospective Run" (1950-2005) and a "Prospective Run" (2006-2099).
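
The resolution figures above can be checked with a little arithmetic. The sketch below assumes a spherical Earth with a mean radius of 6,371 km and uses an approximate latitude for Santa Barbara (34.42° N); both values are assumptions introduced here for illustration and are not part of the dataset documentation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius (spherical approximation)


def arcsec_to_degrees(arcsec):
    """Convert an angle in arc-seconds to decimal degrees."""
    return arcsec / 3600.0


def cell_size_m(arcsec, lat_deg):
    """Return the approximate (north-south, east-west) size of a grid cell in meters."""
    angle_rad = math.radians(arcsec_to_degrees(arcsec))
    north_south = EARTH_RADIUS_M * angle_rad
    # Meridians converge toward the poles, so the east-west extent shrinks with cos(latitude)
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


ns, ew = cell_size_m(30, 34.42)  # 34.42 is an approximate latitude for Santa Barbara
print(f"30 arc-seconds = {arcsec_to_degrees(30):.5f} degrees")
print(f"cell size: {ns:.0f} m north-south, {ew:.0f} m east-west")
```

Under these assumptions a cell is a little over 900 m tall and somewhat under 800 m wide at this latitude, so "approximately 800 meters" is best read as a nominal figure.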

### File Format
The data are stored in NetCDF files and accessed through NcML. NcML is an XML representation of the metadata in a NetCDF file.
The data file contains the following information:
- One monthly averaged value for each month from 2006 to 2099
- The monthly average can be based on the daily minimum, maximum, average, and quartiles for each constituent
- OPeNDAP access allows you to subset the variables, region, and time extent of interest.

### Retrieval
In this example, the dataset is accessed through the THREDDS OPeNDAP service to avoid storing large files in the repo. However, there are many other ways to access the data, including options such as NetcdfSubset.
https://www.nccs.nasa.gov/services/data-collections/land-based-products/nex-dcp30


### Bias Correction-Spatial Disaggregation (BCSD)
***Problem: Biased data unsuitable for local level decision making.***

Original GCM runs are performed at a coarse resolution, which does not provide the level of detail required for local decision making. In addition, although the projections are globally accurate, there may be considerable bias at the local level because the models do not take local topography into account.

***Solution: BCSD using PRISM (observational climate data from meteorological stations)***

**PRISM Dataset Description:** A set of monthly, yearly, and single-event gridded data products including mean temperature and precipitation and max/min temperatures developed from point measurements of precipitation, temperature, and other climatic factors primarily for the United States. In-situ point measurements are ingested into the PRISM (Parameter elevation Regression on Independent Slopes Model) statistical mapping system. The PRISM products use a weighted regression scheme to account for complex climate regimes associated with orography, rain shadows, temperature inversions, slope aspect, coastal proximity, and other factors. Climatologies (normals) are available at 30-arcsec (800 meters) and monthly data are available at 2.5 arcmin (4 km) resolution. (Daly, Christopher & National Center for Atmospheric Research Staff, 2020)

The Bias-Correction step corrects the bias of the GCM data through comparisons performed against the observationally-based PRISM data.
The Spatial-Disaggregation step spatially interpolates the bias-corrected GCM data to the finer resolution grid of the 30-arc second PRISM data.
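
To make the bias-correction idea concrete, here is a minimal sketch of empirical quantile mapping, one common way to compare a model distribution against observations. It is an illustration on synthetic numbers only: whether it matches the NEX-DCP30 procedure in detail is an assumption, the function name `quantile_map` is hypothetical, and the spatial-disaggregation (interpolation) step is not shown.

```python
import numpy as np


def quantile_map(model_hist, obs_hist, model_values):
    """Map each model value to the observed value found at the same quantile.

    model_hist and obs_hist are samples from a common historical period;
    model_values are the model values to correct.
    """
    model_sorted = np.sort(np.asarray(model_hist, dtype=float))
    obs_sorted = np.sort(np.asarray(obs_hist, dtype=float))
    model_q = np.linspace(0.0, 1.0, model_sorted.size)
    obs_q = np.linspace(0.0, 1.0, obs_sorted.size)
    # Step 1: where does each value sit in the model's own historical distribution?
    q = np.interp(model_values, model_sorted, model_q)
    # Step 2: read off the observed value at that same quantile
    return np.interp(q, obs_q, obs_sorted)


# Synthetic data: "observations" and a model that runs too warm and too variable
rng = np.random.default_rng(0)
obs = rng.normal(290.0, 3.0, size=600)     # hypothetical observed temperatures (K)
model = rng.normal(292.0, 4.0, size=600)   # hypothetical biased model output (K)

corrected = quantile_map(model, obs, model)
print("mean bias before:", round(float(model.mean() - obs.mean()), 2))
print("mean bias after: ", round(float(corrected.mean() - obs.mean()), 2))
```

After mapping, the corrected historical model values have the same distribution as the observations while keeping the model's own ordering of warm and cold months.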

### Assumptions
Relative spatial patterns in temperature and precipitation observed from 1950 through 2005 will remain constant under future climate change. 

The downscaling method does not add information beyond what is contained in the original CMIP5 scenarios, and it preserves the frequency of periods of anomalously high and low temperature or precipitation (i.e., extreme events) within each individual CMIP5 scenario.


<a id='io'></a> 
## Dataset Input/Output 

1) Import all necessary packages (matplotlib, numpy, etc)

In [None]:
import numpy as np
import scipy
import scipy.ndimage as ndimage
import scipy.stats as stats
import matplotlib.pyplot as plt
import xarray as xr
import cartopy
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from cartopy.feature import NaturalEarthFeature
import geopandas as gpd
import metpy
from metpy.units import units
import pandas as pd
import plotly.express as px

2) Set parameters needed

In [None]:
# Read in the Santa Barbara County shapefile (area of interest);
# its bounding box defines the latitude/longitude range used later
sb_county = gpd.read_file('tl_2019_06083_faces/tl_2019_06083_faces.shp')

xmin, ymin, xmax, ymax = sb_county.total_bounds
# Convert x bounds to 0 to 360 format to match netcdf format
xmin_shift = xmin + 360
xmax_shift = xmax + 360
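
Adding 360 works here because Santa Barbara County lies entirely at negative (western) longitudes. As a more general sketch, the modulo operator handles any input longitude; the helper names below are hypothetical and not part of the notebook.

```python
def to_0_360(lon_deg):
    """Convert a longitude from the -180..180 convention to 0..360."""
    return lon_deg % 360.0


def to_minus180_180(lon_deg):
    """Convert a longitude from the 0..360 convention back to -180..180."""
    return ((lon_deg + 180.0) % 360.0) - 180.0


print(to_0_360(-119.7))        # a western longitude near Santa Barbara
print(to_0_360(10.0))          # eastern longitudes are left unchanged
print(to_minus180_180(240.3))  # round trip back to the shapefile convention
```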

3) Read in data

In [None]:
# location of NEXDCP30 data
nexdcp_url="https://ds.nccs.nasa.gov/thredds/dodsC/bypass/NEX-DCP30/nex-quartile/rcp85/r1i1p1.ncml"
# open dataset lazily using the remote OPeNDAP URL
nexdcp_rcp85_xr = xr.open_dataset(nexdcp_url)




<a id='display'></a> 
### Metadata Display and Basic Visualization
**Using Santa Barbara City as Point Location Example**

Take a look at the dataset along with its metadata

In [None]:
# Metadata Display
nexdcp_rcp85_xr





**Selecting Santa Barbara city as our point of interest**

In [None]:
# Select a single x,y combination closest to Santa Barbara city location
# ideally we would write a function that identifies the closest keys based on lat lon input
# in this case the keys(grid cell numbers) are provided (found through trial and error)
key1=639
key2=1243
longitude = nexdcp_rcp85_xr["ens-avg_tasmax"]["lon"].values[key1]
latitude = nexdcp_rcp85_xr["ens-avg_tasmax"]["lat"].values[key2]

print("Long, Lat values:", longitude, latitude)
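
The comments above note that a function could find the closest grid cell instead of trial and error. A minimal sketch of that idea follows, using a small made-up coordinate grid rather than the real NEX-DCP30 axes; `nearest_index` and the target coordinates are assumptions for illustration.

```python
import numpy as np


def nearest_index(coords, target):
    """Return the index of the coordinate value closest to target."""
    coords = np.asarray(coords, dtype=float)
    return int(np.abs(coords - target).argmin())


# Hypothetical coarse grid (0.5 degree spacing), longitudes in 0..360 format
lon = np.arange(234.0, 294.0, 0.5)
lat = np.arange(24.0, 50.0, 0.5)

target_lon = -119.70 % 360   # approximate Santa Barbara longitude, shifted to 0..360
target_lat = 34.42           # approximate Santa Barbara latitude

i_lon = nearest_index(lon, target_lon)
i_lat = nearest_index(lat, target_lat)
print("indices:", i_lon, i_lat)
print("grid cell center:", lon[i_lon], lat[i_lat])
```

On labelled xarray data, `.sel(lon=..., lat=..., method="nearest")` performs the same kind of lookup without computing indices by hand.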

In [None]:
# Slice the data spatially using a single lat/lon point
start_date = "2022-01-15"
end_date = "2099-12-15"

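# subset by time range and select the grid cell chosen above
sb_max_temp = nexdcp_rcp85_xr["ens-avg_tasmax"].sel(
    time=slice(start_date, end_date), lon=longitude, lat=latitude)
sb_avg_precip = nexdcp_rcp85_xr["ens-avg_pr"].sel(
    time=slice(start_date, end_date), lon=longitude, lat=latitude)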




**Data Wrangling Xarrays to get more visualization friendly format**

In [None]:
# Convert deg K to deg F
sb_max_temp = sb_max_temp.metpy.convert_units('degF')





In [None]:
# Make metpy recognize the units
sb_avg_precip = sb_avg_precip.metpy.quantify()

In [None]:
# convert kg/(m^2 s) to in/yr: dividing the mass flux by the density of water gives a depth rate
density_water = units('kg / m^3') * 1000
sb_avg_precip_converted_int = (sb_avg_precip / density_water)
sb_avg_precip_converted = sb_avg_precip_converted_int.metpy.convert_units('inches / year')  
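
The unit conversion above can be verified by hand. The sketch below repeats the arithmetic in plain Python, assuming a year of 365.25 days (an assumption about how the unit registry defines "year") and a sample flux value that is made up for illustration.

```python
WATER_DENSITY = 1000.0                  # kg/m^3, same value as in the cell above
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed definition of a year
METERS_PER_INCH = 0.0254


def flux_to_inches_per_year(flux_kg_m2_s):
    """Convert a precipitation mass flux in kg/(m^2 s) to a depth rate in in/yr."""
    depth_rate_m_s = flux_kg_m2_s / WATER_DENSITY   # (kg m^-2 s^-1) / (kg m^-3) = m/s
    return depth_rate_m_s * SECONDS_PER_YEAR / METERS_PER_INCH


sample_flux = 1.5e-5  # hypothetical monthly mean precipitation rate in kg/(m^2 s)
print(round(flux_to_inches_per_year(sample_flux), 2), "in/yr")
```

Because 1 kg of water spread over 1 m² is 1 mm deep, a flux in kg/(m² s) is numerically the same as a rate in mm/s, which is a handy sanity check.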

In [None]:
# convert xarray array to xarray dataset
sb_max_temp_ds = sb_max_temp.to_dataset()
sb_avg_precip_ds = sb_avg_precip_converted.to_dataset()
# convert xarray dataset to pandas df
temp_df = sb_max_temp_ds.to_pandas()
precip_df = sb_avg_precip_ds.to_pandas()
# convert time into a column from index
temp_df['time'] = temp_df.index
# drop lat lon since we extracted for a point
temp_df.drop(['lat','lon'], axis=1, inplace=True)
precip_df.drop(['lat','lon'], axis=1, inplace=True)

In [None]:
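# Resample monthly rates (in/yr) to a yearly mean
precip_yearly = precip_df.resample('Y').mean()
# convert time into a column from index
precip_yearly['time'] = precip_yearly.index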




**Visualizations**

Basic

In [None]:
# Plot a quick histogram of temperature
sb_max_temp.plot.hist()
plt.title("Max Daily Average Air Temperature")
plt.xlabel('Temperature (deg F)')
plt.show()

In [None]:
# Plot a quick histogram of precipitation
precip_yearly["ens-avg_pr"].plot.hist()
plt.title("Yearly Mean Precipitation")
plt.xlabel('Precipitation (in)')
plt.show()

Interactive

In [None]:
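# Time series plot of max air temperature
fig = px.line(temp_df, x="time", y="ens-avg_tasmax", labels={
                     "time": "Time",
                     "ens-avg_tasmax": "Temperature (deg F)"},
              title="Monthly Average Maximum Air Temperature")
fig.update_xaxes(rangeslider_visible=True)
fig.show()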





In [None]:
# Time series plot of average precipitation for year (in/yr)
fig = px.line(precip_yearly, x="time", y="ens-avg_pr", labels={
                     "time": "Time",
                     "ens-avg_pr": "Precipitation (in)"},
              title="Yearly Mean Precipitation")
fig.update_xaxes(rangeslider_visible=True)
fig.show()

<a id='advanceddisplay'></a> 
### Advanced Visualization
**Using Santa Barbara County as Region Example**

**Take a look at time extent of data**

In [None]:
# View date range
print("The earliest date in the data is:", nexdcp_rcp85_xr["ens-avg_tasmax"]["time"].values.min())
print("The latest date in the data is:", nexdcp_rcp85_xr["ens-avg_tasmax"]["time"].values.max())  

**Select Santa Barbara County as the ROI**

In [None]:
# Specify time extent of interest
start_date = "2022-01-15"
end_date = "2032-12-15"


In [None]:
# Slice the data by time and spatial extent
# Note that we used the shapefile to specify the extent of our region of interest
sb10_temp = nexdcp_rcp85_xr["ens-avg_tasmax"].sel(
    time=slice(start_date, end_date),
    lon=slice(xmin_shift, xmax_shift),
    lat=slice(ymin, ymax))

In [None]:
# Repeat for precipitation
sb10_precip = nexdcp_rcp85_xr["ens-avg_pr"].sel(
    time=slice(start_date, end_date),
    lon=slice(xmin_shift, xmax_shift),
    lat=slice(ymin, ymax))

**Data Wrangling Xarrays to get more visualization friendly units**

In [None]:
# Convert deg K to deg F
sb10_temp_degf = sb10_temp.metpy.convert_units('degF')
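
For reference, the kelvin-to-Fahrenheit conversion performed above is a simple linear formula. A minimal plain-Python sketch, with a hypothetical helper name, looks like this:

```python
def kelvin_to_fahrenheit(temp_k):
    """Convert a temperature from kelvin to degrees Fahrenheit."""
    return (temp_k - 273.15) * 9.0 / 5.0 + 32.0


for temp_k in (273.15, 300.0):
    print(temp_k, "K =", round(kelvin_to_fahrenheit(temp_k), 2), "deg F")
```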

In [None]:
# Make metpy recognize the units
sb10_precip = sb10_precip.metpy.quantify()

In [None]:
# convert kg/(m^2 s) to in/yr: dividing the mass flux by the density of water gives a depth rate
density_water = units('kg / m^3') * 1000
sb10_precip_converted_int = (sb10_precip / density_water)
sb10_precip_converted = sb10_precip_converted_int.metpy.convert_units('inches / year') 
sb10_precip_converted

In [None]:
# Convert to dataset format to resample and get yearly precipitation estimates
sb10_precip_converted_ds = sb10_precip_converted.to_dataset()
sb10_precip_yearly = sb10_precip_converted_ds["ens-avg_pr"].resample(time='Y').mean()
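
The resampling step takes a mean, not a sum, because each monthly value is already a rate in inches per year. The sketch below shows the same aggregation on made-up numbers; it assumes complete calendar years and ignores the differing lengths of the months.

```python
import numpy as np


def annual_means(monthly_values, months_per_year=12):
    """Average consecutive groups of 12 monthly values into one value per year."""
    values = np.asarray(monthly_values, dtype=float)
    if values.size % months_per_year != 0:
        raise ValueError("expected a whole number of years of monthly data")
    # One row per year, one column per month, then average across the months
    return values.reshape(-1, months_per_year).mean(axis=1)


# Two hypothetical years of monthly precipitation rates (in/yr)
monthly_rates = [12.0] * 12 + [24.0] * 12
print(annual_means(monthly_rates))
```

Summing these monthly rates instead would overstate the annual total by roughly a factor of twelve.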

<a id='usecases'></a> 
### Use Case Examples

### Use Case 1: Protecting your county from heat

Congratulations!! You are part of the Santa Barbara County Climate Change Task Force (SBCCCTF). You have been hired to use data analysis to determine the areas most susceptible to heat waves over the next 10 years due to climate change. You are charged with making a map that shows the temperature of the region over the next 10 years. 

This analysis and map will be critical to determine which areas will be most susceptible to heat waves in the future. County and city officials will use your analysis to determine the most vulnerable populations in the county to heat waves. Your analysis will help save lives as there are more extreme heat events in the future.

Air Temperature Advanced Visualization

In [None]:
# Animated map of monthly maximum air temperature (one frame per time step)
# COMMENTED TO REDUCE FILE SIZE: Uncomment these for visualization
#fig = px.imshow(sb10_temp_degf, animation_frame='time',zmin=40, zmax=120, 
#                color_continuous_scale="jet", origin='lower',
#                title="SB County Monthly Average Maximum Near-Surface Air Temperature",
#                width=700, height=500)
#fig.update_layout(margin=dict(l=20, r=20, t=50, b=10))
#fig.show()

We now know that the Los Padres National Forest, Santa Ynez, and Santa Maria areas of the county will be most vulnerable to increasing temperatures in the future. Further analysis should include looking at previous heat wave events and determining patterns of health issues related to the events. City data could be used to find suitable buildings to set up as cooling shelters during these events for those who do not have air conditioning.

### Use Case 2: You are a big-shot wine investor

Grapevines can be used to make wine for decades and even as long as 100 years! You, a big-shot wine investor, want to make sure that your investment will be protected from climate change. To plan for climate change, you want to determine how predicted rainfall will change throughout your favorite region, Santa Barbara County.

To conduct this analysis you will make a map of the region and determine the predicted total rainfall for the next 10 years. This information will then be provided to your growers in the region to determine whether they will need to plan for reduced, increased, or stagnant rainfalls. With this information they will tell you whether your current grapes will be climate change resilient or whether or not you should consider planting different types of grapes.

Precipitation Advanced Visualization

In [None]:
# Animated map of yearly precipitation (one frame per year)
# COMMENTED TO REDUCE FILE SIZE: Uncomment these for visualization
#fig = px.imshow(sb10_precip_yearly, animation_frame='time',zmin=0, zmax=50, 
#                color_continuous_scale="PuBuGn", origin='lower',
#                title="SB County Yearly Precipitation Totals",
#                width=700, height=500)
#fig.update_layout(margin=dict(l=20, r=20, t=50, b=10))
#fig.show()

We now know what the future of rainfall in Santa Barbara is predicted to look like under the worst case climate change scenario according to this dataset. The good news for you is that your crop in this region will be relatively protected. With this base analysis you have determined that it is critical to expand your research into all of your west coast vineyards.

<a id='binder'></a> 
### Create Binder Environment

Find the binder environment at our repo!
https://github.com/nex-dcp30-intro/notebook


<a id='references'></a> 
### References

- Daly, Christopher & National Center for Atmospheric Research Staff (Eds). Last modified 12 May 2020. "The Climate Data Guide: PRISM High-Resolution Spatial Climate Data for the United States: Max/min temp, dewpoint, precipitation." Retrieved from https://climatedataguide.ucar.edu/climate-data/prism-high-resolution-spatial-climate-data-united-states-maxmin-temp-dewpoint.

- “Getting Started with MetPy — MetPy 1.1.” n.d. Accessed November 19, 2021. https://unidata.github.io/MetPy/latest/userguide/startingguide.html.

- “How to Open and Process NetCDF 4 Data Format in Open Source Python.” 2020. Earth Data Science - Earth Lab. October 16, 2020. https://www.earthdatascience.org/courses/use-data-open-source-python/hierarchical-data-formats-hdf/use-netcdf-in-python-xarray/.

- “NEX-DCP30 | NASA Center for Climate Simulation.” n.d. Accessed November 19, 2021. https://www.nccs.nasa.gov/services/data-collections/land-based-products/nex-dcp30.

- plotlygraphs. 2019. “Imshow.” July 3, 2019. https://plotly.com/python/imshow/.

- “TIGER/Line Shapefile, 2019, County, Santa Barbara County, CA, Topological Faces (Polygons With All Geocodes) County-Based Shapefile.” n.d. Unknown. Accessed November 19, 2021. https://catalog.data.gov/dataset/tiger-line-shapefile-2019-county-santa-barbara-county-ca-topological-faces-polygons-with-all-ge.