# Cooling Degree Days in the US


## 1. Introduction (feel free to make changes to the text)

Cooling degree days (CDD) are a temperature index used to monitor the climate in the US ([see US Global Change Research Program](https://www.globalchange.gov/browse/indicators/indicator-heating-and-cooling-degree-days)). The index measures how much, and for how long, the daily mean temperature exceeds a base temperature, and it serves as a proxy for the energy needed to cool buildings.
The EPA has reported that CDDs have increased significantly over the past decades ([see the EPA online summary report](https://www3.epa.gov/climatechange/science/indicators/health-society/heating-cooling2.html), and the [technical description](https://www.epa.gov/sites/production/files/2016-08/documents/heating-cooling_documentation.pdf)). However, the trends vary spatially: the maps on the EPA web pages, for example, also show regions (states) with no trend or with a slight negative trend.

The purpose of this research project ...


<P style="background-color:purple;color:gold;font-size:130%">
<BR>
    The purpose of this research project ... [Fill in a paragraph]
<BR>
<BR>
</P>

## 2. Data and code used (and other resources)

The starting point of the data analysis is the notebook "unit9_conus_cdd_homework_assignment2", which sets up the trend analysis for the continental US. We use a CDD data set derived from a gridded daily temperature data set developed by scientists at NCAR ([Newman et al. (2015)](https://journals.ametsoc.org/doi/10.1175/JHM-D-15-0026.1)).

The data cover the continental U.S. (CONUS) with a resolution of 1/8 degree, and the time range is 1980-2015.

The source data product is an updated version of the originally published data set (Newman, personal communication, 2018).

### 2.1 CDD calculation

We use precomputed annual CDD data. They were derived from the daily mean temperature with the standard CDD calculation, using a base temperature of *tbase* = 65 °F; note that all calculations were done in degrees Fahrenheit.
For each day, the contribution *tmean - tbase* is calculated when the daily mean temperature *tmean* is above the base temperature, and the contribution is set to 0 otherwise. The daily contributions are then summed over each year.
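As a sketch of this rule, the function below (a hypothetical helper, not part of the notebook or of the original data processing) computes annual CDD from daily mean temperatures in °F, i.e. CDD = sum over all days of max(tmean - tbase, 0). It works on a single daily series or on gridded daily data with time as the first axis.

```python
import numpy as np

def annual_cdd(tmean_daily, tbase=65.0):
    """Annual cooling degree days from daily mean temperatures (deg F).

    Each day contributes (tmean - tbase) if tmean > tbase and 0 otherwise;
    the contributions are summed over the days (axis 0). Accepts a 1-d
    series (days,) or gridded data (days, lat, lon); NaN points stay NaN.
    """
    tmean_daily = np.asarray(tmean_daily, dtype=float)
    daily_contrib = np.maximum(tmean_daily - tbase, 0.0)
    return daily_contrib.sum(axis=0)

# Hypothetical five-day example (deg F): contributions 0, 0, 5, 15, 0
demo_temps = np.array([60.0, 65.0, 70.0, 80.0, 64.0])
print(annual_cdd(demo_temps))  # -> 20.0
```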

### 2.2 CDD data

The CDD data are annual values from 1980 to 2015. The gridded CDD field for each year is stored as a 2-dimensional array in a separate NetCDF file (one file per year). The grid resolution is 1/8 degree in latitude and longitude. In addition, a NetCDF file contains the gridded elevation on the same grid.

Reading and working with NetCDF files is supported by the package xarray.

The xarray package builds upon numpy and provides many useful methods for analyzing gridded data. Here we extract the data and store them in numpy arrays. Numpy arrays were used in previous units of this course, so the main data analysis continues to work with numpy arrays.

- *lon, lat*     : 1-dim arrays with longitude and latitude information
- *lon2d, lat2d* : 2-dim arrays with the longitude and latitude of each grid point (dimensions 0:lat, 1:lon)
- *elev*         : 2-dim array with elevation for the land grid points (dimensions 0:lat, 1:lon)
- *data*         : 3-dim array with all CDD values for all years, latitudes and longitudes. The dimensions are:
    - 0: time
    - 1: latitude
    - 2: longitude
    
In addition, two 1-dim arrays hold the time information:
- *time*          : 1-dim array with the datetime values (useful for time series plots)
- *x*             : 1-dim array with the years (useful for trend fitting and plotting)
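Because the dimensions of *data* are ordered (time, lat, lon), indexing the first axis gives a map and indexing the last two axes gives a time series. A small sketch with a synthetic stand-in array; the names `demo_years` and `demo_data` are hypothetical and chosen so that the notebook's real `x` and `data` are not overwritten.

```python
import numpy as np

# Hypothetical stand-in for `x` and `data`: 36 years on a tiny 4 x 5 grid,
# with NaN at one grid point (e.g. an ocean point outside the land mask)
demo_years = np.arange(1980, 2015 + 1)
rng = np.random.default_rng(0)
demo_data = rng.uniform(0.0, 3000.0, size=(demo_years.size, 4, 5))
demo_data[:, 0, 0] = np.nan

# Axis 0 is time: selecting one year gives a 2-d (lat, lon) map
iyear = int(np.where(demo_years == 2000)[0][0])
demo_map = demo_data[iyear, :, :]

# Axes 1 and 2 are lat and lon: selecting one grid point gives a time series
ilat, ilon = 2, 3
demo_series = demo_data[:, ilat, ilon]

print(demo_map.shape, demo_series.shape)  # -> (4, 5) (36,)
```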








In [None]:
%matplotlib inline

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
from scipy import stats

### Reading data from a NetCDF file

#### What is the NetCDF format and who uses it?
NetCDF is a common data format in the atmospheric, ocean and climate sciences. Much climate model output is distributed in NetCDF format, as are many remote sensing products. The format is platform independent and self-describing: each file stores metadata such as variable names, dimensions and units together with the data.

In [None]:
# Path to the local data directory and the list of yearly NetCDF files
data_path = "/home11/staff/timm/Public/Data/"
files = [data_path + f"conus_tmean_cdd65_{year}.nc" for year in range(1980, 2016)]

print("open all the data files " + files[0] + " ... " + files[-1])
try:
    # open all yearly files as one dataset, combined along time
    # (chunks={'time': 1}: one dask chunk per year)
    nc = xr.open_mfdataset(files, coords=('time',), chunks={'time': 1})
    failed = False
except Exception as err:  # e.g. missing file or unreadable NetCDF
    print("failed:", err)
    failed = True
if not failed:
    print(nc)
    # get values as numpy arrays
    lon = nc['lon'].values  # NetCDF variable name for the longitude
    lat = nc['lat'].values  # NetCDF variable name for the latitude
    # full 2-d grids with the longitude and latitude of every grid point
    lon2d, lat2d = np.meshgrid(lon, lat)
    cdd = nc['cdd']
    # copy the cdd data from the xarray object into a 3-d numpy array (time, lat, lon)
    data = cdd.values
    time = nc.time.values
    x = np.arange(1980, 2015 + 1, 1)  # the years, used for trend fitting and plotting
    nc.close()  # close only if the files were opened successfully

In [None]:
# Load the elevation map data for this data set's grid resolution
data_path = "/home11/staff/timm/Public/Data/"
file = data_path + "conus_eighth_elevation.nc"
print("open the elevation file " + file)
try:
    nc_elev = xr.open_dataset(file)  # opens the NetCDF file
    failed = False
except Exception as err:  # e.g. missing file or unreadable NetCDF
    print("failed:", err)
    failed = True
if not failed:
    print(nc_elev)
    elev = nc_elev['elevation'].values  # copied to a numpy array
    nc_elev.close()  # close only if the file was opened successfully

In [None]:
print("longitudes, latitudes, cooling degree days (data), and orography (elev)")
print("the time information is given in variable x: it contains the years.")
print(60*"=")
print("shape of array variable data  : ",np.shape(data))
print("------------------------------------------------")
print("shape of array variable elev  : ",np.shape(elev))
print("shape of array variable lon2d : ",np.shape(lon2d))
print("shape of array variable lat2d : ",np.shape(lat2d))
print("------------------------------------------------")
print("shape of array variable lat   : ", np.shape(lat))
print("shape of array variable lon   : ", np.shape(lon))
print("shape of array variable x     : ", np.shape(x))

print(60*'=')



### 2.3 Variables that contain our data
- data: a 3-dimensional array with cooling degree days [year,lat,lon]
- elev: 2-dim array with elevation [lat,lon]
- lon2d : a 2-dimensional array with longitude grid coordinates (matches dimension of variable elev)
- lat2d : a 2-dimensional array with latitude grid coordinates (matches dimension of variable elev)

- lon and lat: 1-dimensional arrays with the coordinates.


## 3. Data analysis

### User defined variables for plot customization

In [None]:
# plot customization
# lev: levels for color shading
# cm:  colormap for fill colors
lev=np.arange(0,4000,200) # 1-d numpy array 
cm=plt.get_cmap('gray')
xlabel="Longitude"
ylabel="Latitude"
title="CDD"

# time information is stored in variable x (1-d array)
# x contains the years.
# use same x predictor data (years) as before
# used for plotting and linear regression of trends


<P style="background-color:purple;color:gold;font-size:130%">
<BR>
    3.1 Make a plot of the elevation with plt.pcolormesh (1 pt)
<BR>
<BR>
</P>
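A minimal sketch of such a plot, wrapped in a hypothetical helper `plot_elevation` and tried out on a synthetic coarse grid (the `demo_` arrays are made up; the real arrays are `lon2d`, `lat2d` and `elev`). The colormap "terrain" is one possible choice, and the elevation units are left off the colorbar because they are not stated in this notebook; they can be read from the attributes printed for `nc_elev`.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_elevation(lon2d, lat2d, elev):
    """Shaded elevation map; NaN grid points (no land data) are left blank."""
    fig, ax = plt.subplots(figsize=(10, 5))
    # shading="auto" accepts grid-point (cell-center) coordinates that have
    # the same shape as the data array
    mesh = ax.pcolormesh(lon2d, lat2d, elev, cmap="terrain", shading="auto")
    # add the units from the file's metadata (not stated in this notebook)
    fig.colorbar(mesh, ax=ax, label="Elevation")
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title("Elevation")
    return fig, ax, mesh

# Hypothetical demo on a coarse 1-degree grid with a "mountain range" near 108W
demo_lon2d, demo_lat2d = np.meshgrid(np.arange(-125.0, -66.0, 1.0),
                                     np.arange(25.0, 50.0, 1.0))
demo_elev = 3000.0 * np.exp(-((demo_lon2d + 108.0) / 8.0) ** 2)
demo_elev[demo_lat2d < 27.0] = np.nan  # pretend these points are ocean
fig, ax, mesh = plot_elevation(demo_lon2d, demo_lat2d, demo_elev)
```

With the notebook's arrays the call would be `plot_elevation(lon2d, lat2d, elev)`.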



<P style="background-color:purple;color:gold;font-size:130%">
<BR>
    3.2 Make a contour plot of the CDD for one selected year (1 pt)
<BR>
<BR>
</P>
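One way to draw such a map is a filled contour plot with `plt.contourf`. The sketch below uses a hypothetical helper `plot_cdd_map` and a synthetic CDD field that increases toward the south; the colormap "YlOrRd" and `extend="max"` are illustrative choices, not requirements of the assignment.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_cdd_map(lon2d, lat2d, cdd_map, levels, title="CDD"):
    """Filled contour map of one year of CDD (degree-days, deg F)."""
    fig, ax = plt.subplots(figsize=(10, 5))
    # extend="max" gives values above the highest level their own color
    cf = ax.contourf(lon2d, lat2d, cdd_map, levels=levels,
                     cmap="YlOrRd", extend="max")
    fig.colorbar(cf, ax=ax, label="CDD (degree F days)")
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title(title)
    return fig, ax, cf

# Hypothetical demo field that increases toward the south (about 150-3750 CDD)
demo_lon2d, demo_lat2d = np.meshgrid(np.arange(-125.0, -66.0, 1.0),
                                     np.arange(25.0, 50.0, 1.0))
demo_cdd = 150.0 * (50.0 - demo_lat2d)
fig, ax, cf = plot_cdd_map(demo_lon2d, demo_lat2d, demo_cdd,
                           levels=np.arange(0, 4000, 200), title="CDD (demo)")
```

With the notebook's arrays, a call such as `plot_cdd_map(lon2d, lat2d, data[x == 2000][0], lev, "CDD 2000")` would select the year 2000 and reuse the levels defined in `lev`.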


<P style="background-color:purple;color:gold;font-size:130%">
<BR>
    3.3 Make a time series plot for the average CDD (averaged over the whole 
    lat-lon domain) (2pts)
<BR>
<BR>
</P>


- Use np.nanmean with the axis keyword: it averages over the chosen axes while skipping NaN values (grid points without data).
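A sketch of the domain average, assuming the (time, lat, lon) layout of *data*: passing a tuple of axes to `np.nanmean` collapses latitude and longitude in one call and leaves one value per year. The helper name `domain_mean` and the `demo_` arrays are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

def domain_mean(data):
    """Average CDD over latitude (axis 1) and longitude (axis 2) for each year.

    np.nanmean skips NaN values, so only grid points with data enter the mean.
    """
    return np.nanmean(data, axis=(1, 2))

# Hypothetical demo: 36 years, a 3 x 4 grid, NaN at the non-land points
demo_years = np.arange(1980, 2015 + 1)
demo_data = np.full((demo_years.size, 3, 4), np.nan)
demo_data[:, 1:, 1:] = 1000.0 + 10.0 * (demo_years - 1980)[:, None, None]

demo_mean = domain_mean(demo_data)
print(demo_mean.shape, demo_mean[0], demo_mean[-1])  # -> (36,) 1000.0 1350.0

fig, ax = plt.subplots()
ax.plot(demo_years, demo_mean, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Domain-mean CDD")
```

This is an unweighted grid-point mean. On a regular lat-lon grid the cell area shrinks with the cosine of latitude, so an area-weighted mean would give the northern grid points slightly less weight.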


<P style="background-color:purple;color:gold;font-size:130%">
<BR>
    3.4 Calculate the trend for the time series (from 3.3) (2pts) 
<BR>
<BR>
</P>
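A minimal sketch of the trend fit with `stats.linregress`, applied to a synthetic series (hypothetical: +8 CDD per year plus random noise) rather than the real domain mean. It also converts the slope into the change implied by the fitted line over the record, which is the quantity asked for in 3.5.

```python
import numpy as np
from scipy import stats

def fit_trend(years, series):
    """Least-squares trend line of a time series with scipy.stats.linregress."""
    res = stats.linregress(years, series)
    trend_line = res.intercept + res.slope * np.asarray(years)
    return res, trend_line

# Hypothetical demo series: +8 CDD per year plus random year-to-year noise
demo_years = np.arange(1980, 2015 + 1)
rng = np.random.default_rng(42)
demo_series = (1200.0 + 8.0 * (demo_years - 1980)
               + rng.normal(0.0, 60.0, demo_years.size))

res, demo_line = fit_trend(demo_years, demo_series)
nyears = demo_years[-1] - demo_years[0]
print(f"slope   = {res.slope:.2f} CDD per year")
print(f"r       = {res.rvalue:.2f}")
print(f"p-value = {res.pvalue:.3g}  (null hypothesis: slope = 0)")
print(f"change  = {res.slope * nyears:.0f} CDD over {nyears} years of the fitted line")
```

Note that the intercept is the fitted value at year 0, not at 1980, so it has no direct physical meaning here; the slope and the p-value of the two-sided test for a zero slope are the quantities to interpret.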




<P style="background-color:purple;color:gold;font-size:130%">
<BR>
    3.5 Summarize your results and discuss the trend (e.g. is there a significant trend, and what does the slope tell us quantitatively about the change in CDD?). Take a look at the year-to-year variability in the time series. (4 pts) 
<BR>
<BR>
</P>





<P style="background-color:purple;color:gold;font-size:130%">
<BR>
    4. Spatial trend analysis: In which regions can we find a significant trend in the CDD? 
<BR>
<BR>
</P>





<P style="background-color:purple;color:gold;font-size:130%">
<BR>
    4.1 Calculate the trend (slope, intercept, correlation, p-value) for all grid points (4 pts)
<BR>
<BR>
</P>

- Define four 2-dimensional numpy arrays (same shape as *elev*)
- Write two nested loops that compute the trend line with stats.linregress at each grid point:
    - one loop for latitude
    - one loop for longitude
- Assign the returned values to the 2-dim data arrays at the right index positions.
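The steps above can be sketched as follows. `gridpoint_trends` is a hypothetical helper; it pre-fills the four result arrays with NaN and skips any grid point with a missing value, which is one reasonable way to handle the ocean points (another would be to fit only the valid years). The demo uses a synthetic 4 x 5 grid instead of the real data.

```python
import numpy as np
from scipy import stats

def gridpoint_trends(years, data):
    """Trend statistics at every grid point of a (time, lat, lon) array.

    Returns four 2-d arrays (slope, intercept, rvalue, pvalue); grid points
    with any missing value are skipped and stay NaN.
    """
    nlat, nlon = data.shape[1:]
    slope = np.full((nlat, nlon), np.nan)
    intercept = np.full((nlat, nlon), np.nan)
    rvalue = np.full((nlat, nlon), np.nan)
    pvalue = np.full((nlat, nlon), np.nan)
    for j in range(nlat):          # loop over latitude
        for i in range(nlon):      # loop over longitude
            y = data[:, j, i]
            if np.isnan(y).any():  # e.g. ocean points outside the land mask
                continue
            res = stats.linregress(years, y)
            slope[j, i] = res.slope
            intercept[j, i] = res.intercept
            rvalue[j, i] = res.rvalue
            pvalue[j, i] = res.pvalue
    return slope, intercept, rvalue, pvalue

# Hypothetical demo: +5 CDD per year plus noise on a 4 x 5 grid, one NaN point
demo_years = np.arange(1980, 2015 + 1)
rng = np.random.default_rng(1)
demo_data = (1000.0 + 5.0 * (demo_years - 1980)[:, None, None]
             + rng.normal(0.0, 50.0, size=(demo_years.size, 4, 5)))
demo_data[:, 0, 0] = np.nan
slope_d, intercept_d, r_d, p_d = gridpoint_trends(demo_years, demo_data)
print(np.isnan(slope_d[0, 0]), np.nanmean(slope_d))
```

With the notebook's arrays the call would be `slope, intercept, rvalue, pvalue = gridpoint_trends(x, data)`; on the full 1/8 degree grid the double loop calls `linregress` once per land grid point, so it may take a while.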



<P style="background-color:purple;color:gold;font-size:130%">
<BR>
    4.2 Plot a trend map (slopes) with plt.contourf or plt.pcolormesh (1 pt)
<BR>
<BR>
</P>

Choose a suitable color map: the slopes can be positive or negative, so a diverging color map is a natural choice.
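A minimal sketch of a trend map, assuming the slope array from 4.1 and using a synthetic slope field for the demo. The hypothetical helper `plot_trend_map` sets symmetric color limits so that a zero trend falls on the neutral middle color of the diverging map "RdBu_r" (one possible choice), which makes increasing and decreasing CDD easy to tell apart.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_trend_map(lon2d, lat2d, slope):
    """Map of CDD trends with a diverging color map centered on zero."""
    vmax = np.nanmax(np.abs(slope))  # symmetric limits: zero = neutral color
    fig, ax = plt.subplots(figsize=(10, 5))
    mesh = ax.pcolormesh(lon2d, lat2d, slope, cmap="RdBu_r",
                         vmin=-vmax, vmax=vmax, shading="auto")
    fig.colorbar(mesh, ax=ax, label="CDD trend (CDD per year)")
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title("CDD trend 1980-2015")
    return fig, ax, mesh

# Hypothetical demo slope field with both positive and negative values
demo_lon2d, demo_lat2d = np.meshgrid(np.arange(-125.0, -66.0, 1.0),
                                     np.arange(25.0, 50.0, 1.0))
demo_slope = 3.0 * np.sin(np.radians(4.0 * demo_lon2d)) + 0.2 * (40.0 - demo_lat2d)
fig, ax, mesh = plot_trend_map(demo_lon2d, demo_lat2d, demo_slope)
```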



<P style="background-color:purple;color:gold;font-size:130%">
<BR>
    4.3 Plot a map of the p-value with plt.contourf or plt.pcolormesh (1pt)
<BR>
<BR>
</P>

Choose a suitable color map: p-values range from 0 to 1, so a sequential color map works well.
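With a continuous color scale, the interesting small p-values (below 0.01 and 0.05) occupy only a thin slice of the range from 0 to 1. The sketch below therefore uses discrete classes via `matplotlib.colors.BoundaryNorm` and outlines the p = 0.05 level with a contour line. The helper `plot_pvalue_map`, the class boundaries and the synthetic p-value field are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm

def plot_pvalue_map(lon2d, lat2d, pvalue, alpha=0.05):
    """p-value map with discrete classes and a contour line at p = alpha."""
    bounds = [0.0, 0.01, 0.05, 0.1, 0.5, 1.0]
    cmap = plt.get_cmap("viridis")
    # BoundaryNorm gives each p-value class its own color band
    norm = BoundaryNorm(bounds, ncolors=cmap.N)
    fig, ax = plt.subplots(figsize=(10, 5))
    mesh = ax.pcolormesh(lon2d, lat2d, pvalue, cmap=cmap, norm=norm,
                         shading="auto")
    fig.colorbar(mesh, ax=ax, ticks=bounds, label="p-value")
    # outline the region where the trend is significant at level alpha
    ax.contour(lon2d, lat2d, pvalue, levels=[alpha], colors="k", linewidths=1.0)
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title("p-value of the CDD trend")
    return fig, ax, mesh

# Hypothetical demo: p-values rising from 0 in the south to 1 in the north
demo_lon2d, demo_lat2d = np.meshgrid(np.arange(-125.0, -66.0, 1.0),
                                     np.arange(25.0, 50.0, 1.0))
demo_p = (demo_lat2d - 25.0) / 24.0
fig, ax, mesh = plot_pvalue_map(demo_lon2d, demo_lat2d, demo_p)
```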



<P style="background-color:purple;color:gold;font-size:130%">
<BR>
    4.4 Summarize your results and discuss the trend map (e.g. where is there a significant trend, and what does the map tell us quantitatively about the change in CDD?). Take a closer look at one region of interest (e.g. NY, CA, Rocky Mountains) and discuss the regional results. (3 pts) 
<BR>
<BR>
</P>
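For a regional discussion it can help to summarize the trend maps inside a longitude-latitude box. A sketch with a hypothetical helper `region_summary`: it counts the grid points with a trend value, averages their slopes and reports the fraction with p < alpha. The box used in the demo is only a rough outline of California (it also catches parts of neighboring states), and the slope and p-value fields are synthetic.

```python
import numpy as np

def region_summary(slope, pvalue, lon2d, lat2d, lon_range, lat_range, alpha=0.05):
    """Number of grid points, mean slope and fraction with p < alpha in a lon-lat box."""
    inbox = ((lon2d >= lon_range[0]) & (lon2d <= lon_range[1])
             & (lat2d >= lat_range[0]) & (lat2d <= lat_range[1])
             & np.isfinite(slope))  # drop grid points without a trend value
    n = int(inbox.sum())
    if n == 0:
        return 0, np.nan, np.nan
    mean_slope = float(np.mean(slope[inbox]))
    frac_sig = float(np.mean(pvalue[inbox] < alpha))
    return n, mean_slope, frac_sig

# Hypothetical demo: uniform trend of 2 CDD/year, significant only south of 37N
demo_lon2d, demo_lat2d = np.meshgrid(np.arange(-125.0, -66.0, 1.0),
                                     np.arange(25.0, 50.0, 1.0))
demo_slope = np.full(demo_lon2d.shape, 2.0)
demo_p = np.where(demo_lat2d < 37.0, 0.01, 0.30)

# rough box around California: 124.5W-114W, 32.5N-42N
print(region_summary(demo_slope, demo_p, demo_lon2d, demo_lat2d,
                     (-124.5, -114.0), (32.5, 42.0)))  # -> (110, 2.0, 0.4)
```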





<P style="background-color:purple;color:gold;font-size:130%">
<BR>
    5. A short final conclusion statement (1pt) 
<BR>
<BR>
</P>





## References
- [Weather Service Information on CDD](https://www.weather.gov/key/climate_heat_cool)
- [Unidata NetCDF documentation](https://www.unidata.ucar.edu/software/netcdf/docs/index.html)
- [Color maps in matplotlib](https://matplotlib.org/examples/color/colormaps_reference.html)
- Gridded temperature data: [Newman et al. (2015)](https://journals.ametsoc.org/doi/10.1175/JHM-D-15-0026.1)