# Windkit long term correction module `windkit.ltc`
This module performs long-term corrections of measured wind data. Two regression methods are currently
implemented: the `LinearRegression` method and the `VarianceRatio` method.
We will perform the long-term correction of a mast measurement in Sujawal, Pakistan (lat=24.515563, lon=68.18865).
The mast has wind speed and wind direction measurements at different heights. We will use the wind speeds at 80 meters
and the wind directions at 78.5 meters, with ERA5 data as the long-term reference.


In [None]:
import windkit as wk
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
import numpy as np

## Get mast data
For this example we will use data from https://energydata.info/
### Download data
We download the file from its exact URL:

In [None]:
!wget https://energydata.info/dataset/520eb754-84de-45fb-8d2c-7b2eaa2f7b6e/resource/c0efce06-686e-47ca-a673-0a85ab6fd145/download/wind-measurements_pakistan_sujawal_wb-esmap_qc.csv

### Explore the data with pandas
We can explore the CSV file with `pandas`; in particular, we want to see the available column names.

In [None]:
mast_csv_filename="wind-measurements_pakistan_sujawal_wb-esmap_qc.csv"

In [None]:
df_mast=pd.read_csv(mast_csv_filename)
df_mast

In [None]:
df_mast.columns

### Read file with Windkit
The function `wk.read_timeseries_from_csv` reads wind data stored in a tabular format and returns a
`windkit` time series wind climate as an `xarray.Dataset`. We need to indicate which columns correspond
to wind speed and wind direction at a specific height so the function can parse the file properly.
The function can read several heights for one point, but for now we only need one.

In [None]:
lat=24.515563
lon=68.18865
height=80.0
# the dictionary is {height:(wind speed, wind direction)}
col_info={height:("a80T_wind_speed_mean","d78.5_wind_direction_mean")}

ds_mast_raw=wk.read_timeseries_from_csv(mast_csv_filename,
                                  time_col="time",
                                  west_east=lon,
                                  south_north=lat,
                                  crs=4326,
                                  height_to_columns=col_info)

ds_mast_raw

The mast data contains `nan` values, so we need to clean it by dropping the affected time steps.

In [None]:
ds_mast=ds_mast_raw.dropna(dim="time")
ds_mast

In [None]:
print(f"Mast Range: {ds_mast.time.min().values} - {ds_mast.time.max().values}")

### Load ERA5 Reference data
We can use data from ERA5 for the long-term correction. The file used here was downloaded in advance using `windkit.get_era5` and covers the same location as the mast.

In [None]:
local_filename="ERA5_Sujawal_2010_2018_80m.nc"
era5_ds_point=xr.open_dataset(local_filename)
era5_ds_point

We can visualize the wind rose of the mast and the ERA5 data using windkit. We need to convert the time series
into a binned wind climate.

In [None]:
bwc_mast=wk.bwc_from_timeseries(ds_mast)
wk.plot.histogram_lines(bwc_mast)

In [None]:
bwc_era5=wk.bwc_from_timeseries(era5_ds_point)
wk.plot.histogram_lines(bwc_era5)
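
To make the idea of a binned wind climate concrete, here is a minimal sketch in plain Python of how a time series can be turned into a frequency table by wind speed bin and direction sector. The helper names (`sector_index`, `binned_frequencies`), the 1 m/s bin width, and the choice of centring the first sector on north are assumptions for illustration only; `wk.bwc_from_timeseries` may bin differently.

```python
import math


def sector_index(direction_deg, n_sectors=12):
    """Return the sector index, with sector 0 centred on north (0 degrees)."""
    width = 360.0 / n_sectors
    # Shift by half a sector so that e.g. 350 and 10 degrees share sector 0.
    shifted = (direction_deg + width / 2.0) % 360.0
    return int(shifted // width) % n_sectors


def binned_frequencies(speeds, directions, n_sectors=12, bin_width=1.0, n_bins=30):
    """Count samples per (speed bin, sector) and normalise to frequencies."""
    counts = [[0] * n_sectors for _ in range(n_bins)]
    for speed, direction in zip(speeds, directions):
        # Speeds above the last bin edge are clipped into the last bin.
        i_bin = min(int(speed // bin_width), n_bins - 1)
        counts[i_bin][sector_index(direction, n_sectors)] += 1
    total = len(speeds)
    return [[c / total for c in row] for row in counts]


# Made-up sample values, not taken from the Sujawal mast.
speeds = [3.2, 4.8, 7.1, 7.9, 12.4]
directions = [350.0, 10.0, 95.0, 100.0, 181.0]
freq = binned_frequencies(speeds, directions)
```

Here `freq[i][j]` is the fraction of samples falling in speed bin `i` and sector `j`, so all entries sum to one.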

## Visualize correlation
For this example, we will visualize the correlation between the mast and the ERA5 data over the overlapping period, i.e. 2016-2018, for the different wind direction sectors.
First we need to align both time series so that their timestamps match. The mast data is sampled every 10 minutes but the ERA5 data is hourly, so we resample the mast data to an hourly frequency with `windkit`.

In [None]:
ds_mast_hourly=wk.resample_wind_and_direction(ds_mast,"h")

In [None]:
ds_mast_hourly

In [None]:
ds_mast_hourly=ds_mast_hourly.dropna(dim="time")
ds_mast_hourly

In [None]:
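# xr.align returns the datasets in the order they are passed:
# the mast (target) comes first, ERA5 (the long-term reference) second
ds_target, ds_ref = xr.align(ds_mast_hourly, era5_ds_point)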
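
The resampling and alignment steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the helpers `hourly_means` and `align_inner` are hypothetical, the sample values are made up, and wind direction is averaged as unit vectors, which is a common choice but not necessarily what `wk.resample_wind_and_direction` does internally. The alignment mimics an inner join on timestamps, which is the default behaviour of `xr.align`.

```python
import math
from collections import defaultdict
from datetime import datetime


def hourly_means(samples):
    """Average (time, speed, direction) samples into hourly values.

    Speed is averaged arithmetically. Direction is averaged as unit vectors,
    so 350 and 10 degrees average to north rather than to 180 degrees.
    """
    groups = defaultdict(list)
    for time, speed, direction in samples:
        hour = time.replace(minute=0, second=0, microsecond=0)
        groups[hour].append((speed, direction))
    hourly = {}
    for hour, values in sorted(groups.items()):
        mean_speed = sum(s for s, _ in values) / len(values)
        sin_sum = sum(math.sin(math.radians(d)) for _, d in values)
        cos_sum = sum(math.cos(math.radians(d)) for _, d in values)
        mean_dir = math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0
        hourly[hour] = (mean_speed, mean_dir)
    return hourly


def align_inner(a, b):
    """Keep only the timestamps present in both series (an inner join)."""
    common = sorted(a.keys() & b.keys())
    return {t: a[t] for t in common}, {t: b[t] for t in common}


# Made-up 10-minute mast samples and hourly reference values.
mast_10min = [
    (datetime(2018, 1, 1, 0, 0), 6.0, 350.0),
    (datetime(2018, 1, 1, 0, 10), 8.0, 10.0),
    (datetime(2018, 1, 1, 1, 0), 5.0, 90.0),
]
era5_hourly = {
    datetime(2018, 1, 1, 0): (6.5, 5.0),
    datetime(2018, 1, 1, 2): (7.0, 20.0),
}
mast_hourly = hourly_means(mast_10min)
mast_common, era5_common = align_inner(mast_hourly, era5_hourly)
```

Only the hour starting at 00:00 appears in both series, so it is the only timestamp that survives the alignment.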

### Calculate correlation scores
The function `windkit.ltc.calc_scores` returns a `pandas.DataFrame` with some commonly used scores. The parameters `name` and `period` are only labels for the resulting rows.

In [None]:
df_scores = wk.ltc.calc_scores(ds_ref,ds_target,name="ERA5",period="2016-2018")
df_scores
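
As a rough illustration of what such scores measure, the sketch below computes the mean bias, the root-mean-square error, and the Pearson correlation coefficient for two wind speed series. The function `basic_scores` and the sample values are hypothetical; the exact set of scores returned by `windkit.ltc.calc_scores` is not listed here and may differ.

```python
import math


def basic_scores(observed, predicted):
    """Return mean bias, RMSE, and the Pearson correlation coefficient."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    bias = mean_pred - mean_obs
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    r = cov / math.sqrt(var_obs * var_pred)
    return {"bias": bias, "rmse": rmse, "r": r}


# Made-up wind speeds in m/s: the reference is offset from the mast by 1 m/s.
mast_speed = [4.0, 6.0, 8.0, 10.0]
era5_speed = [5.0, 7.0, 9.0, 11.0]
scores = basic_scores(mast_speed, era5_speed)
```

A constant offset gives a perfect correlation but a non-zero bias, which is why the correlation coefficient alone is not enough to judge a reference dataset.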

### Perform the correlation
The class `windkit.ltc.VarRatMCP` implements the variance ratio regression.
The class `windkit.ltc.LinRegMCP` implements a linear regression. Both classes have
`fit` and `predict` methods.

In [None]:
n_sectors = 12 
quantiles = False

models = [
    wk.ltc.VarRatMCP(n_sectors=n_sectors, quantiles=quantiles),
    wk.ltc.LinRegMCP(n_sectors=n_sectors, quantiles=quantiles),
]

for model in models:
    model_name = type(model).__name__
    
    model.fit(ds_ref, ds_target)
    ds_pred = model.predict(ds_ref)
    scores_n = wk.ltc.calc_scores(ds_target, ds_pred, name=model_name, period='2016-2018')
    df_scores = pd.concat([df_scores,scores_n])


In [None]:
df_scores

### Plot correlation by sector
The following function plots the correlation by sector for both methods.

In [None]:
def plot_sectors(ds_ref,ds_target,n_sectors,models,quantiles):
    ds_ref=wk.spatial.spatial_stack(ds_ref)
    ds_target=wk.spatial.spatial_stack(ds_target)
    fig, axes = plt.subplots(3, 4, figsize=(15.5, 11))

    sector_ref, edges, centers = wk.wd_to_sector(ds_ref.wind_direction, bins=n_sectors, quantiles=quantiles)
    line_ref = np.linspace(0.0, 20.0, 41)
    # one panel per sector; the 3x4 grid above holds up to 12 sectors
    for ax, i_sec in zip(axes.flat, range(n_sectors)):
        mask = sector_ref == i_sec
        ax.scatter(ds_target.wind_speed.values.flatten()[mask.values.flatten()], ds_ref.wind_speed.values.flatten()[mask.values.flatten()], color='Gray', alpha=0.1, zorder=0)
        for model in models:
            model_name = type(model).__name__
            line_pred = model.models_[i_sec].predict(line_ref[:, None])
            ax.plot(line_pred, line_ref, label=model_name, zorder=2)
        ax.plot([0.0, 20.0], [0.0, 20.0], color='black', ls='--', zorder=1)
        ax.set(
            xlim=[0.0, 20.0],
            ylim=[0.0, 20.0],
            xlabel=r'$U_\mathrm{Obs}$ [m$\,$s$^{-1}$]',
            ylabel=r'$U_\mathrm{Ref}$ [m$\,$s$^{-1}$]',
        )
        ax.set_title(r'$n=$' + f'{int(len(ds_target.wind_speed.values[mask]))}', loc='left')
    axes.flat[0].legend()
    plt.tight_layout()



In [None]:
plot_sectors(ds_ref,ds_target,n_sectors,models,quantiles)

## Predict for a test year
We will use one year of the mast data (2018) to fit the models and then predict for 2016-2017.

In [None]:
year =2018
ds_ref_train =  ds_ref.where(ds_ref["time.year"] == year).dropna(dim="time")
ds_ref_test = ds_ref.where(ds_ref["time.year"] != year).dropna(dim="time")
ds_target_train =  ds_target.where(ds_target["time.year"] == year).dropna(dim="time")
ds_target_test = ds_target.where(ds_target["time.year"] != year).dropna(dim="time")

for model in models:
    model_name = type(model).__name__
    
    model.fit(ds_ref_train, ds_target_train)
    ds_predicted = model.predict(ds_ref_test)
    scores_n = wk.ltc.calc_scores(ds_target_test, ds_predicted, name=model_name, period=year)
    df_scores = pd.concat([df_scores,scores_n])
    

In [None]:
df_scores

In [None]:
plot_sectors(ds_target_test,ds_predicted,n_sectors,models,quantiles)

## Perform the long term correction
We will use the year 2018 to fit the models and then predict a new wind climate from the ERA5 time series, obtaining a corrected wind climate for the period 2010-2018.

In [None]:
year =2018
ds_ref_train =  ds_ref.where(ds_ref["time.year"] == year).dropna(dim="time")
ds_target_train =  ds_target.where(ds_target["time.year"] == year).dropna(dim="time")

models = [
    wk.ltc.VarRatMCP(n_sectors=n_sectors, quantiles=quantiles),
    wk.ltc.LinRegMCP(n_sectors=n_sectors, quantiles=quantiles),
]
for model in models:
    model_name = type(model).__name__
    
    model.fit(ds_ref_train, ds_target_train)
    ds_predicted = model.predict(era5_ds_point)
    scores_n = wk.ltc.calc_scores(era5_ds_point, ds_predicted, name=model_name, period="2010-2018")
    df_scores = pd.concat([df_scores,scores_n])

Now we build a corrected binned wind climate dataset. After the loop, `ds_predicted` holds the prediction of the last model in the list (`LinRegMCP`).

In [None]:
corrected_bwc=wk.bwc_from_timeseries(ds_predicted)

We can visualize the binned wind climates.

In [None]:
wk.plot.histogram_lines(corrected_bwc)

In [None]:
wk.plot.histogram_lines(bwc_era5)

In [None]:
wk.plot.histogram_lines(bwc_mast)