Here we compute weather types for the November through February (NDJF) season.

In [1]:
import xarray as xr
import pandas as pd
import numpy as np
from paraguayfloodspy.weather_type import XrEofCluster

Load the raw data: in this case, anomalies of 850 hPa streamfunction.

In [2]:
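ds = xr.open_dataset("../_data/reanalysis/subset/streamfunc_850_anom.nc")
# Latitude is stored north-to-south, so the lat slice runs from -15 to -30;
# longitudes are in degrees east (0-360)
ds = ds.sel(lat=slice(-15, -30), lon=slice(295, 315), time=slice('1979-11-01', '2016-02-29'))
print(ds)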

<xarray.Dataset>
Dimensions:         (lat: 7, lon: 9, time: 4487)
Coordinates:
  * lat             (lat) float32 -15.0 -17.5 -20.0 -22.5 -25.0 -27.5 -30.0
  * lon             (lon) float32 295.0 297.5 300.0 302.5 305.0 307.5 310.0 ...
  * time            (time) datetime64[ns] 1979-11-01 1979-11-02 1979-11-03 ...
    month           (time) int64 11 11 11 11 11 11 11 11 11 11 11 11 11 11 ...
Data variables:
    streamfunction  (time, lat, lon) float32 1.16365e+06 673118.0 195866.0 ...
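The `time` slice above only sets the start and end dates. With 4487 time steps spanning roughly 36 years, the subset file evidently holds only part of each year, presumably the NDJF days already. If an input were not pre-filtered, one way to keep only NDJF with xarray is to index on the `time.month` virtual coordinate. The sketch below uses a synthetic one-year dataset as a stand-in for the real file:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the real file: one (non-leap) year of daily values on a tiny grid
time = pd.date_range("1979-01-01", "1979-12-31", freq="D")
ds = xr.Dataset(
    {"streamfunction": (("time", "lat", "lon"),
                        np.zeros((time.size, 2, 2), dtype="float32"))},
    coords={"time": time, "lat": [-15.0, -17.5], "lon": [295.0, 297.5]},
)

# Keep only November, December, January and February (NDJF)
ndjf = ds.sel(time=ds["time.month"].isin([11, 12, 1, 2]))
print(ndjf.sizes["time"])  # 120 (31 + 28 + 30 + 31 days)
```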


Weather typing is carried out with the `XrEofCluster` function; its main parameters are set in the cell below.
Note that the function is somewhat picky: its input must be an `xarray.Dataset`, and the name of the data variable to cluster must be passed through the `variable` parameter.
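If the field at hand is a bare `xarray.DataArray` (for example, the result of some arithmetic on the original data), one way to meet that requirement is `DataArray.to_dataset(name=...)`. This is a small sketch with synthetic data, not a step of the original workflow; the name given to `to_dataset` is then the string to pass as `variable`.

```python
import numpy as np
import xarray as xr

# A bare DataArray with the same (time, lat, lon) layout as the streamfunction field
da = xr.DataArray(
    np.random.default_rng(0).standard_normal((10, 3, 4)).astype("float32"),
    dims=("time", "lat", "lon"),
)

# Promote it to a Dataset whose single data variable has an explicit name
ds_in = da.to_dataset(name="streamfunction")
print(type(ds_in).__name__, list(ds_in.data_vars))  # Dataset ['streamfunction']
```

With this, `ds_in` could be handed to `XrEofCluster` together with `variable='streamfunction'`.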

In [3]:
best_centroid, best_ts, classifiability = XrEofCluster(
    ds,
    n_clusters=6,  # How many weather types to create?
    prop=0.95,  # What proportion of variance should be retained?
    nsim=250,  # How many random initializations to compute?
    variable='streamfunction',
    verbose=True,  # print progress information from the algorithm
)

xarray-based classifiability for 6 clusters
Performing EOF decomposition of data for dimension reduction...
Number of EOFs retained is 4
Carrying out 250 k-means simulations...
Computing classifiability index for each pair of simulations...
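The log outlines the method: an EOF decomposition for dimension reduction, retention of enough modes to reach `prop` of the variance (4 modes here), then repeated k-means runs. The internals of `XrEofCluster` are not shown in this notebook, so the following is only a minimal NumPy sketch of the EOF-truncation step under our own assumptions: a plain SVD of the time-centred (time, space) matrix, no latitude weighting, and the smallest number of modes whose cumulative variance fraction is at least `prop`. The function name `eof_truncate` is hypothetical.

```python
import numpy as np

def eof_truncate(X, prop=0.95):
    """Project (time, space) data onto the leading EOFs that together explain
    at least `prop` of the total variance. Returns (pcs, n_modes)."""
    Xc = X - X.mean(axis=0)                   # remove the time mean at each grid point
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    frac = s**2 / np.sum(s**2)                # variance fraction carried by each mode
    n = int(np.searchsorted(np.cumsum(frac), prop)) + 1
    n = min(n, s.size)                        # guard against round-off when prop is ~1
    pcs = U[:, :n] * s[:n]                    # principal-component time series
    return pcs, n

# Synthetic test: two strong, independent signals plus weak noise on a 63-point grid
rng = np.random.default_rng(42)
nt, nspace = 500, 63                          # 63 = 7 lat x 9 lon, as in the subset above
X = 1e-2 * rng.standard_normal((nt, nspace))
X[:, 0] += 10 * rng.standard_normal(nt)
X[:, 1] += 10 * rng.standard_normal(nt)
pcs, n = eof_truncate(X, prop=0.95)
print(n, pcs.shape)  # 2 (500, 2)
```

In this sketch the truncated principal components, rather than the full gridded field, would be the input to the k-means runs.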


What is our Classifiability Index?

In [4]:
print("Classifiability Index: {}".format(classifiability))

Classifiability Index: 0.9743113994020977
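A value close to 1 indicates that the different k-means runs arrive at nearly the same set of clusters. The exact formula used by `XrEofCluster` is not shown here; the sketch below follows a common definition after Michelangeli et al. (1995): the similarity c(P, Q) of two runs is, over the centroids of P, the worst of each centroid's best correlation with any centroid of Q, and the index averages c over all ordered pairs of distinct runs. The use of Pearson correlation between centroid vectors and the helper names `partition_similarity` and `classifiability` are our assumptions.

```python
import itertools
import numpy as np

def partition_similarity(P, Q):
    """c(P, Q) for centroid arrays of shape (k, n_features): for each centroid
    in P take its best correlation with any centroid in Q, then keep the worst."""
    k = P.shape[0]
    # np.corrcoef treats rows as variables; the off-diagonal block is corr(P_i, Q_j)
    acc = np.corrcoef(P, Q)[:k, k:]
    return acc.max(axis=1).min()

def classifiability(centroid_sets):
    """Average c(P_m, P_n) over all ordered pairs of distinct k-means runs."""
    scores = [partition_similarity(P, Q)
              for P, Q in itertools.permutations(centroid_sets, 2)]
    return float(np.mean(scores))

rng = np.random.default_rng(0)
base = rng.standard_normal((6, 4))  # 6 centroids in a 4-dimensional (e.g. 4-EOF) space
# Five "runs" that found the same centroids, only with different cluster labels
runs = [base[rng.permutation(6)] for _ in range(5)]
print(round(classifiability(runs), 6))  # 1.0
```

Because cluster labels are arbitrary, the best-match step makes the index insensitive to how each run numbers its clusters.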


Now put the daily weather-type series into a `pandas.DataFrame` indexed by date.

In [5]:
df = pd.DataFrame({'wtype': pd.Series(np.int_(best_ts), index=ds.time)})
df.head()

            wtype
time
1979-11-01      4
1979-11-02      2
1979-11-03      4
1979-11-04      2
1979-11-05      3


Save to file

In [6]:
df.to_csv("../_data/derived/WeatherTypes.csv")
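When the file is read back later, `to_csv` will have written the dates as plain text, so `index_col` and `parse_dates` are needed to recover a `DatetimeIndex`. A self-contained sketch, using a temporary directory and the first five rows shown above as stand-in data:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Small stand-in for the weather-type table built above
time = pd.date_range("1979-11-01", periods=5, freq="D", name="time")
df = pd.DataFrame({"wtype": np.array([4, 2, 4, 2, 3])}, index=time)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "WeatherTypes.csv")
    df.to_csv(path)
    # Restore the date index that to_csv wrote out as text
    df2 = pd.read_csv(path, index_col="time", parse_dates=True)

print(df2)
```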