# University of Delaware temperature and rain data

Datasets downloaded from: https://psl.noaa.gov/data/gridded/data.UDel_AirT_Precip.html

* Format: PSL standard NetCDF4
* Monthly values for 1900/01 - 2017/12 (118 years)

Parsing logic is implemented in a module (`stairway.sources.udelaware`). This notebook demonstrates the original dataset and the netCDF format.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
temp_data = '../../../data/UDelaware/raw/air.mon.mean.v501.nc'
rain_data = '../../../data/UDelaware/raw/precip.mon.total.v501.nc'

In [None]:
temp_clean = '../../../data/UDelaware/clean/delaware_monthly_temp.csv'
rain_clean = '../../../data/UDelaware/clean/delaware_monthly_rain.csv'

## Parse data

Parsing uses the `netCDF4` library: https://github.com/Unidata/netcdf4-python

In [None]:
from stairway.sources.udelaware import netcdf_to_pandas

In [None]:
# Convert each raw NetCDF file into a long-format CSV
for path_in, path_out in zip([temp_data, rain_data], [temp_clean, rain_clean]):
    df = netcdf_to_pandas(path_in)
    df.to_csv(path_out, index=False)

The output data looks as follows (the weather column is either 'temp' or 'precip'; after the loop, `df` holds the precipitation data):

In [None]:
df.loc[lambda df: (df['lon'] == 0.25) & (df['lat'] == 53.25)]

## Demo of NetCDF data

The cells below are the commands used while exploring how to parse the netCDF data.

There is a [nice example notebook](https://nbviewer.jupyter.org/github/Unidata/netcdf4-python/blob/master/examples/reading_netCDF.ipynb) with temperature data that was used as a reference.


In [None]:
import netCDF4
import pandas as pd

In [None]:
f = netCDF4.Dataset(temp_data)
print(f)

In [None]:
print(f.variables.keys()) # get all variable names

Time variable 

In [None]:
time = f.variables['time']
print(time)

In [None]:
print(time[1])

In [None]:
time.actual_range

In [None]:
time.avg_period

In [None]:
time.units
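`time.units` stores the time axis as numeric offsets in the form `'<unit> since <origin>'` (the CF convention). In practice `netCDF4.num2date(time[:], time.units)` performs this conversion; the sketch below only illustrates the idea with the standard library, assuming the standard Gregorian calendar and an origin written as `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`. The helper name `decode_cf_time` is hypothetical.

```python
from datetime import datetime, timedelta

# Seconds per unit for the time units this sketch supports (assumption)
SECONDS_PER_UNIT = {'days': 86400, 'hours': 3600, 'minutes': 60, 'seconds': 1}


def decode_cf_time(values, units):
    """Hypothetical helper: decode '<unit> since <origin>' offsets to datetimes.

    Assumes a standard Gregorian calendar and an origin written as
    'YYYY-MM-DD' or 'YYYY-MM-DD HH:MM:SS'.
    """
    unit, _, origin = units.partition(' since ')
    origin = origin.strip()
    fmt = '%Y-%m-%d %H:%M:%S' if ' ' in origin else '%Y-%m-%d'
    start = datetime.strptime(origin, fmt)
    step = SECONDS_PER_UNIT[unit.strip().lower()]
    return [start + timedelta(seconds=float(v) * step) for v in values]


# 744 hours = 31 days, i.e. from 1900-01-01 to 1900-02-01
print(decode_cf_time([0, 744], 'hours since 1900-01-01 00:00:00'))
```

Calendars other than the standard one (e.g. `noleap`) are exactly where `num2date` is preferable to a hand-written conversion.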

Temperature variable

In [None]:
temp = f.variables['air']  # temperature variable
print(temp)

In [None]:
temp.dimensions

In [None]:
temp.shape

In [None]:
temp.units

The first dimension, of length 1416, is time (one value per month):

In [None]:
1416/12
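Since the series is monthly and starts in 1900/01, a time index maps to a calendar month by integer division. A minimal sketch with a hypothetical helper `index_to_year_month`, which also shows that the last 30 years used below start at index 1056, i.e. 1988/01:

```python
def index_to_year_month(i, start_year=1900, start_month=1):
    """Hypothetical helper: map a 0-based monthly index to (year, month)."""
    months_since_start = (start_month - 1) + i
    return start_year + months_since_start // 12, months_since_start % 12 + 1


n_months = 118 * 12                     # 1900/01 - 2017/12 -> 1416 steps
start_last_30y = n_months - 30 * 12     # first index of the last 30 years

print(n_months, start_last_30y, index_to_year_month(start_last_30y))  # 1416 1056 (1988, 1)
```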

Subsetting the last 30 years works just like slicing a NumPy array:

In [None]:
tempslice = f.variables['air'][-30*12:, :, :]
tempslice.shape
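The same slice can be tried on a small synthetic NumPy array. The sketch assumes the (time, lat, lon) layout that the indexing in this notebook implies; only the time length (1416) matches the real data, the grid size is made up.

```python
import numpy as np

# Synthetic stand-in for f.variables['air'] with dimensions (time, lat, lon)
air = np.arange(1416 * 3 * 4, dtype=float).reshape(1416, 3, 4)

# Negative start index: keep the last 30 * 12 = 360 monthly steps
last_30y = air[-30 * 12:, :, :]

print(last_30y.shape)  # (360, 3, 4)
```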

Let's take the mean of all January observations:

In [None]:
# the 30-year slice starts in January (1988/01), so every 12th step is a January
tempslice[range(0, 30*12, 12), :, :].mean(axis=0).shape
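The full series has 1416 months ending in 2017/12, so the last 360 months start in 1988/01 and every 12th step of the slice is a January. That makes the `range(...)` fancy index equivalent to the basic slice `[0::12]`. The values read through netCDF4 are typically a `numpy.ma.MaskedArray`, and `.mean(axis=0)` skips masked entries. A synthetic check (the 2x2 grid and the masked cell are made up for illustration):

```python
import numpy as np

# Synthetic 30-year monthly series on a 2x2 grid where each value equals
# its calendar month (1..12); like tempslice, it starts in January.
months = np.tile(np.arange(1, 13, dtype=float), 30)                  # shape (360,)
data = np.ma.masked_array(np.broadcast_to(months[:, None, None], (360, 2, 2)).copy())
data[:, 0, 0] = np.ma.masked   # mimic a grid cell with no valid values

jan_by_range = data[range(0, 30 * 12, 12), :, :].mean(axis=0)  # as in the notebook
jan_by_step = data[0::12, :, :].mean(axis=0)                    # equivalent basic slice

print(jan_by_step)
```

Basic slicing returns a view rather than a copy, which is the cheaper option on large grids.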

Can we convert this into a pandas long dataframe with the coordinates as well?

In [None]:
test = tempslice[range(0, 30*12, 12), :, :].mean(axis=0)

In [None]:
test.shape

In [None]:
f.variables['lat'].shape

In [None]:
f.variables['lon'].shape

In [None]:
# read the lat/lon values with [:] so index and columns are plain coordinate arrays
df = pd.DataFrame(data=test, index=f.variables['lat'][:], columns=f.variables['lon'][:])

In [None]:
df.shape

In [None]:
df.head()

In [None]:
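# name the axes so that reset_index() yields 'lon' and 'lat' columns
df.rename_axis(index='lat', columns='lon').unstack().reset_index().rename(columns={0: 'temp'}).assign(month=1).head()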

To do this for all months, see the module implementation.
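The module's code is not reproduced here. As a rough sketch only, assuming the goal is a 30-year monthly mean per grid cell like the January example, the steps above generalise by reshaping the time axis into (years, 12 months), averaging over years, and stacking each month into long format. `monthly_climatology_long` is a hypothetical name, not the `stairway` API.

```python
import numpy as np
import pandas as pd


def monthly_climatology_long(data, lat, lon, name='temp'):
    """Hypothetical helper: mean per calendar month, in long format.

    data: (possibly masked) array of shape (n_years * 12, n_lat, n_lon)
    starting in January. Returns columns lon, lat, <name>, month.
    """
    n_time, n_lat, n_lon = data.shape
    # (years, months, lat, lon) -> average over years; masked values are skipped
    clim = np.ma.asarray(data).reshape(n_time // 12, 12, n_lat, n_lon).mean(axis=0)
    frames = []
    for m in range(12):
        # Masked (missing) cells become NaN in the DataFrame
        grid = pd.DataFrame(np.ma.filled(clim[m], np.nan), index=lat, columns=lon)
        frames.append(
            grid.rename_axis(index='lat', columns='lon')
            .unstack()                  # Series indexed by (lon, lat)
            .reset_index(name=name)     # columns: lon, lat, <name>
            .assign(month=m + 1)
        )
    return pd.concat(frames, ignore_index=True)


# Tiny synthetic demo: 2 years on a 2x3 grid, each value equal to its month
lat = np.array([10.25, 10.75])
lon = np.array([0.25, 0.75, 1.25])
demo = np.tile(np.arange(1, 13, dtype=float), 2)[:, None, None] * np.ones((1, 2, 3))
print(monthly_climatology_long(demo, lat, lon).head())
```

Reshaping avoids one `range(m, n_time, 12)` index per month and computes all 12 means in a single call.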

Done.