
In [1]:
# Code preamble: we'll need some packages to display the information in the notebook.
# Feel free to ignore this cell unless you're running the code.
import folium        # Map visualizations
import requests      # Basic HTTP requests
import json          # For handling API return data
import pandas as pd  # Data manipulation and analysis library

api_base = "https://api.resourcewatch.org/v1"

def show_layer(layer_id, year, provider):
    """Return a folium map showing an RW API tile layer for the given year."""
    # Double braces keep {z}/{x}/{y} as literal tile placeholders for folium.
    tiles_url = f"{api_base}/layer/{layer_id}/tile/{provider}/{{z}}/{{x}}/{{y}}?year={year}"
    attribution = "ResourceWatch & Vizzuality, 2018"
    return folium.Map(tiles=tiles_url, attr=attribution, max_zoom=18, min_zoom=2)

# NEX-GDDP & LOCA indicator calculations

As part of the development of PREP, we processed data from two climate downscaling datasets: [NEX-GDDP](https://nex.nasa.gov/nex/projects/1356/) (NASA Earth eXchange Global Daily Downscaled Projections) and [LOCA](http://loca.ucsd.edu/) (LOcalized Constructed Analogs). Both are *downscaled climate scenarios*: the output of coarse-resolution climate models is applied to a finer spatial resolution grid. GDDP data is offered at the global scale, while LOCA data covers the contiguous United States. The data can be accessed through their homepages (linked above) and through several additional cloud data repositories, among them [Amazon AWS](https://registry.opendata.aws/nasanex/) and the [OpenNEX initiative](https://nex.nasa.gov/nex/static/htdocs/site/extra/opennex/). For ease of use, we'll illustrate any examples with the GDDP data available in [Google Earth Engine](https://earthengine.google.com/).

The general data structure is similar for both datasets: daily values of three forecasted variables (minimum and maximum daily temperature, and daily precipitation) are available for two of the Representative Concentration Pathway (RCP) scenarios, RCP 4.5 and RCP 8.5. Roughly, these correspond to different levels of radiative forcing due to greenhouse gas emissions: under RCP 4.5, emissions would peak around 2040 and then decline, while under RCP 8.5 they would continue to rise throughout the 21st century. Each scenario comprises daily forecasts from a set of models (21 for GDDP, 31 for LOCA) from 2006 to 2100. A historical series is also included, in which these models are run under historical forcing conditions from 1950 to 2006. This results in a massive amount of data: about 12 terabytes of compressed NetCDF files for the GDDP data alone.

This amount of data is unwieldy, so some processing is needed to reduce it to a smaller, simpler dataset. We applied two processes to the data. First, we calculated several climate indicators in addition to the base variables. Second, we used these indicators to create an ensemble measure (an average across the different models) along with its 25th and 75th percentiles. These are presented at two temporal resolutions: decadal averages and averages over three 30-year periods.

## The indicators
This information is available on the [PREP website](https://prepdata.org). We'll query the RW API (which powers PREP) to obtain the datasets and their layers. You can check out the code we actually ran [here](https://github.com/resource-watch/nexgddp-dataprep/tree/develop).

In [2]:
def fetch_datasets(provider):
    """Return every RW API dataset for a provider, with its layers included."""
    response = requests.get(f"{api_base}/dataset?provider={provider}&page[size]=1000&includes=layer")
    response.raise_for_status()  # Fail loudly on HTTP errors instead of parsing an error page
    return response.json()["data"]

def get_data(dset):
    """Extract (name, tableName, id of the first layer or None) from a dataset record."""
    attributes = dset["attributes"]
    first_layer = next(iter(attributes["layer"]), {"id": None})
    return attributes["name"], attributes["tableName"], first_layer["id"]

nex_datasets = fetch_datasets("nexgddp")
loca_datasets = fetch_datasets("loca")

df = pd.DataFrame(
    [get_data(dset) for dset in nex_datasets + loca_datasets],
    columns=["description", "tableName", "layerId"],
)
df

|   | description | tableName | layerId |
|---|---|---|---|
| 0 | Average low temperature (decadal under rcp45 s... | tasmin/rcp45_decadal | c3bb62e8-2d50-4ad2-9ca5-8ce02bed1de5 |
| 1 | Average high temperature (decadal under rcp45 ... | tasmax/rcp45_decadal | 964d1388-4490-487d-b9cc-cd282e4d3d28 |
| 2 | Heating degree days (decadal under rcp45 scena... | hdds/rcp45_decadal | 6efa391a-da5c-4715-b08c-b4c062a6c8c7 |
| 3 | Cooling degree days (decadal under rcp45 scena... | cdds/rcp45_decadal | a632a688-a181-48b5-93bc-d230e24550d9 |
| 4 | Frost free season (decadal under rcp45 scenario) | ffs/rcp45_decadal | ff6a563a-6e7e-420a-8e9b-a661f2556426 |
| 5 | Extreme heat days (decadal under rcp45 scenario) | xs/rcp45_decadal | 427f44eb-baf4-4fe4-a169-be8115bc7c9f |
| 6 | Cumulative precipitation (decadal under rcp45 ... | cum_pr/rcp45_decadal | 1072e891-8120-45b8-9e41-df688309c764 |
| 7 | Dry spells periods (decadal under rcp45 scenario) | dry/rcp45_decadal | 3a5771d7-0a63-40e3-88d1-79430fd86a65 |
| 8 | Extreme precipitation days (decadal under rcp4... | xpr/rcp45_decadal | 90ab695b-15c3-4028-89f1-e280b9d8bba9 |
| 9 | Average temperature (decadal under rcp85 scena... | tasavg/rcp85_decadal | 02e9f747-7c20-4fc8-a773-a5135f24cc91 |
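The `tableName` values in the listing above appear to follow an `<indicator>/<scenario>_<resolution>` pattern. Here is a minimal sketch for splitting them into their parts, assuming that pattern holds for every dataset. The helper `parse_table_name` is hypothetical and not part of the RW API.

```python
def parse_table_name(table_name):
    """Split e.g. 'tasmin/rcp45_decadal' into indicator, scenario and resolution.

    Assumes the '<indicator>/<scenario>_<resolution>' pattern; the indicator
    itself may contain underscores (e.g. 'cum_pr'), so we split on '/' first.
    """
    indicator, rest = table_name.split("/", 1)
    scenario, resolution = rest.split("_", 1)
    return {"indicator": indicator, "scenario": scenario, "resolution": resolution}

# A few (tableName, layerId) pairs copied from the listing above.
rows = [
    ("tasmin/rcp45_decadal", "c3bb62e8-2d50-4ad2-9ca5-8ce02bed1de5"),
    ("cum_pr/rcp45_decadal", "1072e891-8120-45b8-9e41-df688309c764"),
    ("tasavg/rcp85_decadal", "02e9f747-7c20-4fc8-a773-a5135f24cc91"),
]

# Keep only the RCP 4.5 layers, keyed by indicator.
rcp45_layers = {
    parse_table_name(name)["indicator"]: layer_id
    for name, layer_id in rows
    if parse_table_name(name)["scenario"] == "rcp45"
}
print(rcp45_layers)
```

With the `df` built above, the same split could be done column-wise with `df['tableName'].str.split('/', n=1, expand=True)`.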


### Calculating an indicator
Because the data spans several dimensions (days, years, and models), it has to be reduced across each of them, in a fixed order. Consider the format of the 'raw data': daily maps from 1951 to 2100 for each model. The first step is to extract a single year of the raw data for a single model. From this data we calculate the indicator (in this example, the average maximum temperature).

![first step](step1.png)
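Here is a minimal sketch of this first step for a single pixel. It assumes the raw series is held as a plain dictionary keyed by `(model, date)`. The helper `extract_year`, the model names and the synthetic values are hypothetical, and the real pipeline works on rasters rather than Python lists.

```python
from datetime import date, timedelta
from statistics import fmean

def extract_year(series, model, year):
    """Return the daily values of one model for one calendar year, in date order.

    `series` maps (model, date) -> daily value for a single pixel.
    """
    return [value for (m, day), value in sorted(series.items())
            if m == model and day.year == year]

# Synthetic daily maximum temperatures (kelvin) for two hypothetical models.
series = {}
day = date(2006, 1, 1)
while day.year < 2008:
    series[("model_a", day)] = 300.0 if day.year == 2006 else 302.0
    series[("model_b", day)] = 298.0
    day += timedelta(days=1)

# Indicator for one model and one year: the average maximum temperature.
tasmax_2006 = fmean(extract_year(series, "model_a", 2006))
print(tasmax_2006)  # 300.0
```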

The output we are interested in is still at a lower temporal resolution than the yearly values we have now. To calculate an indicator for the decadal dataset, we take a whole decade of the indicator, calculated as above, and average it again.

![second step](step2.png)
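The decadal averaging can be sketched as follows. The sketch assumes the yearly indicator for one model and one pixel is a `{year: value}` dictionary, and that decades start on years ending in 0 (2010 to 2019, and so on). PREP's actual decade boundaries are not stated here, and `decadal_means` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import fmean

def decadal_means(yearly):
    """Average yearly indicator values into decades.

    `yearly` maps year -> indicator value for one model and one pixel.
    Decades are assumed to start on years ending in 0 (2010-2019, 2020-2029, ...).
    """
    buckets = defaultdict(list)
    for year, value in yearly.items():
        buckets[year - year % 10].append(value)
    return {decade: fmean(values) for decade, values in sorted(buckets.items())}

yearly_tasmax = {2010 + i: float(i) for i in range(20)}
print(decadal_means(yearly_tasmax))  # {2010: 4.5, 2020: 14.5}
```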

From these averaged indicators we calculate the average and the 25th and 75th percentiles across models. The final measure (the one shown on the website) is the average of the indicators *across models*. This is known as an ensemble measure.

![third step](step3.png)
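Here is a sketch of this last reduction for a single pixel and temporal unit. It uses the standard library's `statistics.quantiles` with linear interpolation (`method="inclusive"`, which matches NumPy's default percentile method). The text above does not say which percentile method PREP uses, and the model names are hypothetical.

```python
from statistics import fmean, quantiles

def ensemble_summary(per_model):
    """Ensemble mean and 25th/75th percentiles of one indicator across models.

    `per_model` maps a model name to its (already time-averaged) indicator
    value for a single pixel and temporal unit.
    """
    values = list(per_model.values())
    # n=4 yields the three quartile cut points; "inclusive" interpolates linearly.
    p25, _, p75 = quantiles(values, n=4, method="inclusive")
    return {"mean": fmean(values), "p25": p25, "p75": p75}

decade_values = {"model_a": 1.0, "model_b": 2.0, "model_c": 3.0, "model_d": 4.0, "model_e": 5.0}
print(ensemble_summary(decade_values))  # {'mean': 3.0, 'p25': 2.0, 'p75': 4.0}
```

On full rasters stacked along a model axis, the NumPy equivalents would be `np.mean(stack, axis=0)` and `np.percentile(stack, [25, 75], axis=0)`.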

### Temperature indicators
#### Maximum daily temperature (tasmax)
The maximum daily temperature is already present in the 'raw' datasets. These values are averaged per temporal unit (decadal, 30y) and model, as described above.

In [27]:
tasmax_layer = show_layer("964d1388-4490-487d-b9cc-cd282e4d3d28", 1971, "nexgddp")
tasmax_layer

#### Minimum daily temperature (tasmin)
As with the maximum temperature, no processing is needed for this indicator other than the averaging.

In [16]:
tasmin_layer = show_layer("c3bb62e8-2d50-4ad2-9ca5-8ce02bed1de5", 1971, "nexgddp")
tasmin_layer

#### Average daily temperature (tasavg)
We construct the 'average daily temperature' as the average of the maximum and minimum daily temperatures. This is the first step in the processing: we first construct a 'tasavg' variable, and then proceed with the rest of the analysis as usual.
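Here is a sketch of that first step. It assumes 'tasavg' is the midpoint of each day's maximum and minimum temperature, which is a common convention, although the text above does not spell out the formula. `daily_tasavg` is a hypothetical helper.

```python
def daily_tasavg(tasmax, tasmin):
    """Daily average temperature as the midpoint of each day's max and min (kelvin)."""
    if len(tasmax) != len(tasmin):
        raise ValueError("tasmax and tasmin must cover the same days")
    return [(hi + lo) / 2 for hi, lo in zip(tasmax, tasmin)]

print(daily_tasavg([300.0, 290.0], [280.0, 270.0]))  # [290.0, 280.0]
```

The resulting daily series is then reduced exactly like `tasmax` and `tasmin`.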

In [18]:
tasavg_layer = show_layer("02e9f747-7c20-4fc8-a773-a5135f24cc91", 1971, "nexgddp")
tasavg_layer

#### Heating degree days
[Heating Degree Days (HDDs)](https://en.wikipedia.org/wiki/Heating_degree_day) are a measure of the energy demand for heating. They are defined relative to a fixed baseline, which in our case is 65F. The measure accumulates the difference between the baseline and the daily *average* temperature (in kelvin) on days when the average temperature stays below the baseline. For example, a day warmer than 65F adds 0 heating degree days.
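Here is a minimal sketch of the yearly accumulation for one pixel. It assumes the input is a year of daily average temperatures in kelvin and that the 65F baseline is converted to kelvin. The function names are hypothetical.

```python
def fahrenheit_to_kelvin(temp_f):
    """Convert a temperature from degrees Fahrenheit to kelvin."""
    return (temp_f - 32) * 5 / 9 + 273.15

BASELINE_K = fahrenheit_to_kelvin(65)  # 65F is roughly 291.48 K

def heating_degree_days(daily_tasavg_k, baseline_k=BASELINE_K):
    """Sum of (baseline - T) over the days whose average temperature T is below the baseline."""
    return sum(max(0.0, baseline_k - t) for t in daily_tasavg_k)

# Two days below the baseline (by 2 K and 0.5 K) and one warmer day that adds nothing.
print(round(heating_degree_days([BASELINE_K - 2, BASELINE_K - 0.5, BASELINE_K + 3]), 6))  # 2.5
```

Because the differences are taken in kelvin, one degree day here corresponds to 1.8 degree days on the Fahrenheit scale often used for US HDD figures.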

In [19]:
hdd_layer = show_layer("8bc10da3-e610-4105-9f4e-8ebfb1725874", 1971, "nexgddp")
hdd_layer

#### Cooling degree days
In the same vein as HDDs, Cooling Degree Days (CDDs) accumulate over a year the degrees by which the daily average temperature exceeds the baseline (again, 65F). They are a measure of the energy demand for cooling on hot days.
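The mirror-image sketch for CDDs uses the same assumptions as for HDDs: a year of daily average temperatures in kelvin, and a 65F baseline converted to kelvin. The function name is hypothetical.

```python
BASELINE_K = (65 - 32) * 5 / 9 + 273.15  # 65F expressed in kelvin, about 291.48 K

def cooling_degree_days(daily_tasavg_k, baseline_k=BASELINE_K):
    """Sum of (T - baseline) over the days whose average temperature T exceeds the baseline."""
    return sum(max(0.0, t - baseline_k) for t in daily_tasavg_k)

# Two days above the baseline (by 4 K and 1 K) and one cooler day that adds nothing.
print(round(cooling_degree_days([BASELINE_K + 4, BASELINE_K - 1, BASELINE_K + 1]), 6))  # 5.0
```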

In [20]:
cdd_layer = show_layer("a632a688-a181-48b5-93bc-d230e24550d9", 1971, "nexgddp")
cdd_layer

#### Extreme heat days
The number of extreme heat days in a year is the count of days with a maximum temperature higher than the 99th percentile of the baseline. This baseline is calculated per model and per raster pixel. It is the temperature below which 99% of the daily values from 1971 to 2000 fall, and any temperature higher than this is considered extreme.

![99percentile](histo.png)
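Here is a sketch of the threshold and the count for one model and one pixel. In practice the baseline would hold about 30 × 365 daily maximum temperatures from 1971 to 2000. The percentile interpolation method used by PREP is not stated, so linear interpolation is an assumption, and the function names are hypothetical.

```python
from statistics import quantiles

def percentile_99(baseline_values):
    """99th percentile of the baseline daily values for one model and pixel.

    quantiles(..., n=100) returns 99 cut points; index 98 is the 99th percentile.
    method="inclusive" interpolates linearly between data points.
    """
    return quantiles(baseline_values, n=100, method="inclusive")[98]

def extreme_days(year_values, threshold):
    """Count the days in one year whose value is strictly above the threshold."""
    return sum(1 for value in year_values if value > threshold)

# Synthetic baseline: 101 evenly spaced values, so the 99th percentile is exactly 99.0.
baseline = [float(v) for v in range(101)]
threshold = percentile_99(baseline)
print(threshold)                                           # 99.0
print(extreme_days([98.0, 99.0, 99.5, 120.0], threshold))  # 2
```

The same two functions would apply to the extreme precipitation days indicator, with daily precipitation in place of maximum temperature.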

In [23]:
xh_layer = show_layer("2266fa97-e19c-4056-a1a9-4d4f29dd178e", 1971, "nexgddp")
xh_layer

In [24]:
# Compare with the 1971 map above: notice the large difference
xh_layer_2 = show_layer("2266fa97-e19c-4056-a1a9-4d4f29dd178e", 2051, "nexgddp")
xh_layer_2

#### Frost free season
The frost free season is the longest streak of days (measured in *number of days*) above 0C per year.
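Here is a sketch of the streak calculation for one pixel and one year. It assumes the daily temperature being tested is the daily minimum in kelvin (the text does not name the variable) and that streaks do not carry over from one year to the next. The function name is hypothetical.

```python
FREEZING_K = 273.15  # 0C in kelvin

def frost_free_season(daily_temps_k):
    """Length (in days) of the longest run of consecutive days above 0C in one year."""
    longest = current = 0
    for t in daily_temps_k:
        if t > FREEZING_K:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a frost day breaks the streak
    return longest

print(frost_free_season([270.0, 275.0, 276.0, 272.0, 274.0, 275.0, 280.0, 271.0]))  # 3
```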

In [26]:
ffs_layer = show_layer("83ec85e4-997b-4613-bf9e-2301ba6d7b63", 1971, "nexgddp")
ffs_layer

### Precipitation indicators
#### Cumulative precipitation
Precipitation is given in kg m⁻² s⁻¹ and includes both solid and liquid phases from all types of clouds. The calculated measure is the accumulated yearly precipitation mass per square meter. The front-end converts it to mm: 1 kg of water spread over 1 m² forms a 1 mm layer.
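Here is a sketch of the yearly accumulation for one pixel. It assumes each daily value is a daily-mean flux in kg m⁻² s⁻¹, so multiplying it by the 86,400 seconds in a day gives that day's mass per square meter. The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400

def cumulative_precipitation(daily_pr):
    """Yearly precipitation in kg m^-2 from daily-mean fluxes in kg m^-2 s^-1.

    1 kg of water per m^2 is a 1 mm layer, so the result also reads as mm.
    """
    return sum(pr * SECONDS_PER_DAY for pr in daily_pr)

# A constant flux of 1e-5 kg m^-2 s^-1 is 0.864 mm per day.
print(round(cumulative_precipitation([1e-5] * 365), 6))  # 315.36
```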

In [28]:
cummpr_layer = show_layer("56e19aef-3194-4aad-8df0-9bb9064ac8e6", 1971, "nexgddp")
cummpr_layer

#### Extreme precipitation days
Calculated with the same method as the extreme heat days indicator, except that the baseline is constructed from daily precipitation instead of maximum temperature.

In [29]:
xpr_layer = show_layer("7e76e90f-4c35-48fb-9604-3c187a28723b", 1971, "nexgddp")
xpr_layer

#### Dry spells
The average yearly count of 5-day periods without precipitation. Longer dry periods count as several consecutive dry spells, and any days beyond a multiple of 5 are added as a 'fractional' dry spell.
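Here is a sketch of the per-year count for one pixel. Under this reading of the fractional rule, a run of n ≥ 5 dry days counts as n / 5 spells (12 dry days give 2.4), and shorter runs count as zero. The dry-day threshold used by PREP is not stated, so treating only zero precipitation as dry is an assumption, and `dry_spells` is a hypothetical helper. Averaging the yearly counts over a decade would then follow the second step described earlier.

```python
def dry_spells(daily_pr, dry_threshold=0.0, spell_length=5):
    """Count (possibly fractional) dry spells in one year of daily precipitation."""

    def spells(run):
        # Runs shorter than one full spell contribute nothing (assumption).
        return run / spell_length if run >= spell_length else 0.0

    total, run = 0.0, 0
    for pr in daily_pr:
        if pr <= dry_threshold:
            run += 1
        else:
            total += spells(run)  # close the current dry run
            run = 0
    return total + spells(run)  # close a run that reaches the end of the year

# 12 dry days (2.4 spells), a wet day, 3 dry days (0), a wet day, 5 dry days (1.0).
year = [0.0] * 12 + [2.0] + [0.0] * 3 + [1.5] + [0.0] * 5
print(round(dry_spells(year), 6))  # 3.4
```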

In [30]:
dry_layer = show_layer("72996b8f-1f59-4d1d-b48b-490d72677473", 1971, "nexgddp")
dry_layer