# Self-calibrated PDSI computation for Canadian stations

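We will compute the self-calibrated Palmer Drought Severity Index (scPDSI), along with the traditional PDSI, the Palmer Hydrological Drought Index (PHDI), the Palmer Modified Drought Index (PMDI), and the Palmer Z-Index, from the daily temperature and precipitation record of a Canadian weather station.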

In [160]:
import bokeh.io
from bokeh.plotting import figure, show
import numpy as np
import pandas as pd

from climate_indices import indices

Read station metadata from a comma-separated file. Columns are station ID, station name, longitude, latitude, and available water capacity.

In [161]:
metadata = pd.read_csv("data/1012475.metadata")
station_id = metadata["Station"][0]
lat = metadata["Lat"][0]
awc = metadata["AWCmm"][0]
print(f"Station ID: {station_id},  latitude: {lat},  AWC: {awc}")

Station ID: 1012475,  latitude: 48.42,  AWC: 123


Read climatology data from a space-delimited file. Columns are station ID, year, month, day, tmax, tmin, and precipitation.

In [162]:
column_names = ["station_id", "year", "month", "day", "tmax", "tmin", "prcp"]
# sep=r"\s+" splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
station_data = pd.read_csv("data/1012475.dat", header=None, names=column_names, sep=r"\s+")
station_data.head()

   station_id  year  month  day  tmax  tmin  prcp
0     1012475  1950      1    1  -0.4  -5.4   4.6
1     1012475  1950      1    2  -5.4  -8.9   3.1
2     1012475  1950      1    3  -3.6  -9.5   5.9
3     1012475  1950      1    4  -1.1  -7.0   1.6
4     1012475  1950      1    5   2.8  -5.5   6.9


Next we'll convert the year, month, and day columns into a datetime, drop those columns, and set the new datetime as the index.

In [163]:
# pd.to_datetime assembles timestamps directly from the year, month, and day columns
station_data["date"] = pd.to_datetime(station_data[["year", "month", "day"]])
station_data.drop(columns=["year", "month", "day"], inplace=True)
station_data.set_index("date", inplace=True)
station_data.head()

            station_id  tmax  tmin  prcp
date
1950-01-01     1012475  -0.4  -5.4   4.6
1950-01-02     1012475  -5.4  -8.9   3.1
1950-01-03     1012475  -3.6  -9.5   5.9
1950-01-04     1012475  -1.1  -7.0   1.6
1950-01-05     1012475   2.8  -5.5   6.9


Get the first and last years of the record.

In [164]:
start_year = station_data.index[0].year
end_year = station_data.index[-1].year
print(f"Start: {start_year}\nEnd:   {end_year}")

Start: 1950
End:   2019
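
The year range also determines how many monthly values the later resampling steps should produce. The sketch below counts calendar months between two dates, inclusive. The helper name `expected_month_count` and the record limits passed to it are assumptions chosen for illustration, not values read from the station file.

```python
from datetime import date


def expected_month_count(first: date, last: date) -> int:
    """Number of calendar months from first to last, inclusive."""
    return (last.year - first.year) * 12 + (last.month - first.month) + 1


# Hypothetical record limits, used only to illustrate the arithmetic.
n_months = expected_month_count(date(1950, 1, 1), date(2019, 2, 28))
print(n_months)
```

Comparing this count with the length of the monthly arrays is a quick way to catch a truncated file or a misparsed date column.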


We'll now take the average of the `tmin` and `tmax` columns to get a daily average temperature Series.

In [165]:
tavg_daily = ((station_data["tmin"] + station_data["tmax"]) / 2)
tavg_daily.head()

date
1950-01-01   -2.90
1950-01-02   -7.15
1950-01-03   -6.55
1950-01-04   -4.05
1950-01-05   -1.35
dtype: float64

We resample the daily average temperatures to monthly frequency and take the mean of each month, which gives monthly average temperatures.

In [166]:
# get the monthly average temperatures as a numpy array
tavg = tavg_daily.resample('1M').mean().values
print(f"tavg: type={type(tavg)}, shape={tavg.shape}, dtype={tavg.dtype}")

tavg: type=<class 'numpy.ndarray'>, shape=(830,), dtype=float64


We resample the daily precipitation values to monthly frequency and take the sum of each month, which gives monthly total precipitation.

In [167]:
# get the monthly total precipitation as a numpy array
prcp = station_data["prcp"].resample('1M').sum().values
print(f"prcp: type={type(prcp)}, shape={prcp.shape}, dtype={prcp.dtype}")

prcp: type=<class 'numpy.ndarray'>, shape=(830,), dtype=float64
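
The two aggregations above differ only in the reducer: a mean for temperature and a sum for precipitation. A minimal sketch on synthetic daily data shows the difference. The values are made up, and the grouping uses `to_period("M")` rather than `resample` so that the example does not depend on a particular frequency alias.

```python
import pandas as pd

# Synthetic daily values: 2.0 on every day of January 1950, 4.0 on every day of February 1950.
days = pd.date_range("1950-01-01", "1950-02-28", freq="D")
daily = pd.Series([2.0 if d.month == 1 else 4.0 for d in days], index=days)

# Group the daily values by calendar month.
by_month = daily.groupby(daily.index.to_period("M"))

monthly_mean = by_month.mean()   # the reducer used for temperature
monthly_total = by_month.sum()   # the reducer used for precipitation

print(monthly_mean.to_numpy())
print(monthly_total.to_numpy())
```

Unlike `resample`, this grouping omits months that contain no observations, so `resample` is the safer choice when the array must have one entry for every month of the record.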


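We can now compute potential evapotranspiration (PET) from the monthly average temperatures, the station latitude, and the first year of the record.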

In [168]:
pet = indices.pet(tavg, lat, start_year)
pet.shape

(830,)
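
To our understanding, `indices.pet` is based on the Thornthwaite method, which estimates PET from monthly mean temperature and day length. The sketch below implements only the unadjusted Thornthwaite formula for a single year, assuming 12-hour days and 30-day months and ignoring the separate treatment the method uses for very hot months. The function name `thornthwaite_unadjusted` and the sample temperatures are hypothetical, and the library's own implementation (which also applies a latitude-dependent day-length correction) may differ in detail.

```python
def thornthwaite_unadjusted(monthly_mean_temps_c):
    """Unadjusted Thornthwaite PET (mm/month) for 12 monthly mean temperatures.

    Assumes 12-hour days and 30-day months; the day-length correction
    for latitude is deliberately left out of this sketch.
    """
    if len(monthly_mean_temps_c) != 12:
        raise ValueError("expected 12 monthly mean temperatures")

    # Months at or below freezing contribute nothing to the heat index or to PET.
    positive = [max(float(t), 0.0) for t in monthly_mean_temps_c]

    # Annual heat index: sum of (T / 5) ** 1.514 over the months.
    heat_index = sum((t / 5.0) ** 1.514 for t in positive)
    if heat_index == 0.0:
        return [0.0] * 12

    # Empirical exponent as a cubic polynomial in the heat index.
    exponent = (
        6.75e-7 * heat_index**3
        - 7.71e-5 * heat_index**2
        + 1.792e-2 * heat_index
        + 0.49239
    )
    return [16.0 * (10.0 * t / heat_index) ** exponent for t in positive]


# Made-up monthly mean temperatures (degrees Celsius) for one year.
sample_temps = [-5, 0, 5, 10, 15, 20, 22, 21, 16, 10, 4, -2]
sample_pet = thornthwaite_unadjusted(sample_temps)
print([round(v, 1) for v in sample_pet])
```

The sketch returns zero PET for months at or below freezing, which matches the `0.0` PET value shown for January 1950 later in this notebook, where the monthly mean temperature is negative.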

We can now compute self-calibrated PDSI, "traditional" PDSI, PHDI, PMDI, and Z-Index.

The precipitation, PET, and AWC values are all in millimeters, but the `climate_indices.indices.scpdsi` function requires inches, so we first multiply each of them by `0.0393701` (inches per millimeter). The calibration period passed to the function spans the full record, from `start_year` to `end_year`.

In [169]:
prcp = prcp * 0.0393701
pet = pet * 0.0393701
awc = awc * 0.0393701
scpdsi, pdsi, phdi, pmdi, zindex = indices.scpdsi(prcp, pet, awc, start_year, start_year, end_year)
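
The factor `0.0393701` is a rounded value of 1/25.4, since an inch is defined as exactly 25.4 mm. A minimal sketch of the same conversion that keeps the millimeter and inch values under separate names follows. The helper `mm_to_inches` and the variable names are assumptions introduced for illustration.

```python
import numpy as np

MM_PER_INCH = 25.4  # exact by definition; 1 / 25.4 is approximately 0.0393701


def mm_to_inches(values_mm):
    """Return a new array in inches, leaving the input untouched."""
    return np.asarray(values_mm, dtype=float) / MM_PER_INCH


# First three monthly precipitation totals (mm) shown later in this notebook.
prcp_mm = np.array([163.7, 170.5, 172.5])
prcp_in = mm_to_inches(prcp_mm)
print(prcp_in)
```

Because the cell above reassigns `prcp`, `pet`, and `awc` in place, running it a second time would apply the conversion twice. Keeping separate names for the converted values avoids that.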

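We'll now add all the data into a single Pandas DataFrame. Note that the `prcp` column is recomputed from the daily data and is therefore in millimeters, whereas the `pet` column holds the values converted above and is in inches.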

In [170]:
df = tavg_daily.resample('1M').mean().to_frame(name="tavg")
df["prcp"] = station_data["prcp"].resample('1M').sum()
df["pet"] = pet
df["scpdsi"] = scpdsi
df["pdsi"] = pdsi
df["phdi"] = phdi
df["pmdi"] = pmdi
df["zindex"] = zindex
df.head()

                 tavg   prcp       pet    scpdsi      pdsi      phdi      pmdi    zindex
date
1950-01-31  -3.767742  163.7       0.0   0.52346  0.528755   0.52346   0.52346  1.570381
1950-02-28     4.1125  170.5  0.528578  1.788406  1.806495  1.788406  1.788406  3.956587
1950-03-31   5.340323  172.5  0.914755  3.529464  3.565162  3.529464  3.529464   5.77579
1950-04-30      7.995   57.8  1.596089  3.622673  3.659315  3.622673  3.622673  1.370234
1950-05-31  10.479032   24.1   2.47893  3.280187  3.313364  3.280187  3.257101  0.091947


Plot the values we've computed.

In [171]:
bokeh.io.output_notebook()
titles_to_data = {"Self-calibrated PDSI": df["scpdsi"],
                  "Traditional PDSI": df["pdsi"],
                  "(Self-calibrated - Traditional) PDSI Difference": df["scpdsi"] - df["pdsi"],
                  "PHDI": df["phdi"],
                  "PMDI": df["pmdi"],
                  "Z-Index": df["zindex"],}
for title, data in titles_to_data.items():
    # height/width replace plot_height/plot_width, which were removed in Bokeh 3
    p = figure(x_axis_type="datetime", title=title, height=350, width=800)
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_alpha = 0.5
    p.xaxis.axis_label = 'Time'
    p.yaxis.axis_label = 'Value'
    p.line(df.index, data)
    show(p)