# Agreement Evaluation between Data Sets

Now that we have computed the error variances in the [TC notebook](2_TC_application.ipynb) and covariances in the [EC notebook](3_EC_application.ipynb), let's compare the differences (i.e., relative bias) between data sets over all time and by season. We can then compare this relative bias to the estimated errors to see if the data sets are in statistical agreement. If the data sets are not in agreement based on their relative bias and random errors, then the choice of ET data set could propagate biases into any products modeled from the chosen ET data set.

In [1]:
import hvplot.xarray
import holoviews as hv
import panel as pn
import cartopy.crs as ccrs
import numpy as np
import xarray as xr
from xarray_einstats import linalg, stats
from scipy.stats import norm
import itertools
import warnings
import os

## Combine Data Sets in Xarray
First, we need to load in our ET data sets and limit them to a common date range. Since each bias is computed between two data sets, we will restrict the date ranges of all data sets to begin at the second oldest starting date and end at the second most recent ending date. This choice saves some memory while still using as much data as possible (outside this window, at most one data set has values, so no pair can be differenced). When computing biases for data sets with a more restricted date range, the missing data should propagate and not give us a value on those dates.

In [2]:
files = ['../Data/ssebop/ssebop_aet_regridded.nc',
         '../Data/gleam/gleam_aet.nc',
         '../Data/era5/era5_aet_regridded.nc',
         '../Data/nldas/nldas_aet_regridded.nc',
         '../Data/terraclimate/terraclimate_aet_regridded.nc',        
         '../Data/wbet/wbet_aet_regridded.nc',
         ]
dataset_name = ['SSEBop', 'GLEAM', 'ERA5', 'NLDAS', 'TerraClimate', 'WBET']

date_ranges = {}
for file, name in zip(files, dataset_name):
    ds_temp = xr.open_dataset(file, engine='netcdf4', chunks={'lon': -1, 'lat': -1, 'time': -1})
    date_ranges[name] = [ds_temp.time.min().values, ds_temp.time.max().values]

# Take the second oldest start and second most recent end dates
date_range = [np.sort(np.array(list(date_ranges.values()))[:, 0])[1],
              np.sort(np.array(list(date_ranges.values()))[:, 1])[-2]]
date_range

[numpy.datetime64('1950-01-01T00:00:00.000000000'),
 numpy.datetime64('2022-12-01T00:00:00.000000000')]
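
To make the window selection concrete, here is a minimal sketch using only the standard library. The data set names and coverage dates are made up for illustration and are not the real ET data sets; the point is only that, at each edge of the chosen window, two data sets (and hence one pair) have data.

```python
from datetime import date

# Hypothetical (start, end) coverage for five made-up data sets
coverage = {
    "A": (date(1960, 1, 1), date(2023, 6, 1)),
    "B": (date(1900, 1, 1), date(2020, 12, 1)),
    "C": (date(1980, 1, 1), date(2021, 12, 1)),
    "D": (date(2000, 1, 1), date(2021, 9, 1)),
    "E": (date(1979, 1, 1), date(2019, 12, 1)),
}

starts = sorted(start for start, _ in coverage.values())
ends = sorted(end for _, end in coverage.values())

# Second oldest start and second most recent end
common_start, common_end = starts[1], ends[-2]

# Number of data sets that already have data at the window start,
# and that still have data at the window end
n_at_start = sum(start <= common_start for start in starts)
n_at_end = sum(end >= common_end for end in ends)

print(common_start, common_end, n_at_start, n_at_end)
```

Note that the two data sets available at the start of the window need not be the same two available at its end, which is why missing values still have to propagate through the pairwise differences.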

Using the date range, we can now combine all of the data sets into a single `xarray.Dataset` for easy computations.

In [3]:
def preprocess(ds):
    """
    Keep only the specified time range for each file.
    """
    return ds.sel(time=slice(date_range[0], date_range[1]))

ds = xr.open_mfdataset(files, engine='netcdf4', preprocess=preprocess, combine='nested', concat_dim='dataset_name')
ds = ds.assign_coords({'dataset_name': dataset_name})
ds.dataset_name.attrs['description'] = 'Dataset name'

# Need time as first index for TC computation
ds = ds.transpose('time', ...)
# The data set is less than 1GiB, so let's read it into memory vs keeping as a dask array
ds = ds.compute()
ds

## Relative Bias Estimation

Next, we will want to compute the relative bias for all 15 possible pairs of our six data sets. So, let's generate those pairs or combinations.

In [4]:
# Generate a list of the combinations
combos = list(itertools.combinations(dataset_name, 2))
combos = [list(combo) for combo in combos]
combos

[['SSEBop', 'GLEAM'],
 ['SSEBop', 'ERA5'],
 ['SSEBop', 'NLDAS'],
 ['SSEBop', 'TerraClimate'],
 ['SSEBop', 'WBET'],
 ['GLEAM', 'ERA5'],
 ['GLEAM', 'NLDAS'],
 ['GLEAM', 'TerraClimate'],
 ['GLEAM', 'WBET'],
 ['ERA5', 'NLDAS'],
 ['ERA5', 'TerraClimate'],
 ['ERA5', 'WBET'],
 ['NLDAS', 'TerraClimate'],
 ['NLDAS', 'WBET'],
 ['TerraClimate', 'WBET']]

Now that we have our data set combinations, let's compute the relative biases!

In [5]:
ds_bias = []
for combo in combos:
    ds_combo = ds.sel(dataset_name=combo)

    da_combo_bias = ds_combo.aet.diff('dataset_name')
    da_combo_bias = da_combo_bias.squeeze('dataset_name').drop_vars('dataset_name')

    ds_combo_bias = xr.Dataset(data_vars={'rel_bias': da_combo_bias},
                               coords={'dataset_pairs': [' '.join(combo)],
                                       'time': ds.time, 'lat': ds.lat, 'lon': ds.lon})
    ds_bias.append(ds_combo_bias)

ds_bias = xr.concat(ds_bias, dim='dataset_pairs')

ds_bias.rel_bias.attrs['description'] = 'Relative bias (i.e., difference) between two data sets listed in dataset_pairs'
ds_bias.dataset_pairs.attrs['description'] = 'Dataset pairs used in difference.'
ds_bias.rel_bias.attrs['units'] = 'mm.month-1'

ds_bias
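
One detail worth keeping in mind when reading the bias maps is the sign convention. Differencing along the data set dimension subtracts the first data set from the second, so a pair labeled `'SSEBop GLEAM'` holds GLEAM minus SSEBop. The toy sketch below uses `numpy.diff`, which follows the same "later minus earlier" convention as `DataArray.diff`; the values are made up and the array layout is an assumption for illustration only.

```python
import numpy as np

# Made-up monthly ET values (mm/month) for a hypothetical pair, stacked as
# (dataset, time); NaN marks a month the second data set does not cover
pair = np.array([[10.0, 20.0, 30.0],     # first data set in the pair
                 [12.0, np.nan, 25.0]])  # second data set in the pair

# Differencing along the dataset axis gives second minus first: 2, NaN, -5
rel_bias = np.diff(pair, axis=0).squeeze(axis=0)
print(rel_bias)
```

The missing month stays missing in the difference, which is the propagation behavior relied on when the data sets cover different date ranges.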

Now, let's see how the resulting biases look.

In [6]:
plt = ds_bias.rel_bias.hvplot(groupby=['dataset_pairs', 'time'], geo=True, coastline=True,
                              clim=(-75, 75), cmap='PuOr').opts(frame_width=500)

pn.panel(plt, widget_location='top')

## Relative Bias Discussion

Looking at the biases, we can see a large temporal variation in each estimate. However, while being able to check this relative bias temporally is in itself interesting, our goal is to compare the bias with the errors estimated from EC. Therefore, a single bias product that does not vary with time will make this comparison easier. To that end, we will temporally average the bias estimates (both over all time and each season) and use these averages to compare with the errors. Since the EC error covariance matrix estimates utilized a more limited date range than the current bias estimates (fourth versus second oldest and most recent), we limit the temporal average of the biases to use the same date range restriction as used in the EC estimates for consistency.

> Note that the EC estimates actually used the most restrictive date range of all four data sets used in its computation. However, since the EC error estimates we will be using are the ones averaged across data sets, we cannot limit to an exact date range. Instead, we use the same initial date range imposed on the EC data sets.

In [7]:
# We want to ignore all of the sqrt and log warnings with negative values
warnings.filterwarnings("ignore", category=RuntimeWarning)

# Create list of seasons
seasons = ['All'] + list(np.unique(ds.time.dt.season))

# Limit to EC limited date range
# Take the fourth oldest start and fourth most recent end dates
ec_date_range = [np.sort(np.array(list(date_ranges.values()))[:, 0])[3],
                 np.sort(np.array(list(date_ranges.values()))[:, 1])[-4]]
ds_bias = ds_bias.sel(time=slice(ec_date_range[0], ec_date_range[1]))

ds_mean_bias = []
ds_median_bias = []
ds_std_bias = []
ds_count_bias = []
for season in seasons:
    if season == 'All':
        ds_season = ds_bias
    else:
        ds_season = ds_bias.isel(time=(ds_bias.time.dt.season == season))

    mean_bias = ds_season.rel_bias.mean(dim='time', skipna=True, keep_attrs=True).expand_dims(season=[season])
    mean_bias.name = 'mean_bias'
    mean_bias.attrs['description'] = 'Mean bias estimate for all common time steps between data sets.'
    mean_bias.attrs['units'] = 'mm.month-1'
    ds_mean_bias.append(mean_bias)

    median_bias = ds_season.rel_bias.median(dim='time', skipna=True, keep_attrs=True).expand_dims(season=[season])
    median_bias.name = 'median_bias'
    median_bias.attrs['description'] = 'Median bias estimate for all common time steps between data sets.'
    median_bias.attrs['units'] = 'mm.month-1'
    ds_median_bias.append(median_bias)

    std_bias = ds_season.rel_bias.std(dim='time', ddof=1, skipna=True, keep_attrs=True).expand_dims(season=[season])
    std_bias.name = 'std_bias'
    std_bias.attrs['description'] = 'Standard deviation of the bias estimates for all common time steps between data sets.'
    std_bias.attrs['units'] = 'mm.month-1'
    ds_std_bias.append(std_bias)

    count_bias = np.isfinite(ds_season.rel_bias).sum(dim='time').expand_dims(season=[season])
    count_bias.name = 'counts'
    count_bias.attrs['description'] = ('Number of time steps used in the average bias '
                                         'estimates (i.e., number of finite time values in a given pixel).')
    count_bias.attrs['units'] = 'counts'
    ds_count_bias.append(count_bias)

ds_mean_bias = xr.concat(ds_mean_bias, dim='season')
ds_median_bias = xr.concat(ds_median_bias, dim='season')
ds_std_bias = xr.concat(ds_std_bias, dim='season')
ds_count_bias = xr.concat(ds_count_bias, dim='season')

# Compile these DataSets into one and save
bias_averages = xr.merge([ds_mean_bias, ds_median_bias, ds_std_bias, ds_count_bias], join='exact')

bias_averages.attrs = {}
bias_averages.season.attrs['description'] = ('Season of the year given by the first letter of '
                                             'each month within the season. The full year is given by "All".')

if not os.path.isfile('../Data/compiled_avg_bias.nc'):
    _ = bias_averages.to_netcdf(path='../Data/compiled_avg_bias.nc', format='NETCDF4', engine='netcdf4')
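
As a quick sanity check on what the seasonal statistics mean for a single pixel, the following toy sketch mimics the NaN-skipping median and the finite-value count with NumPy alone. The twelve bias values and the hard-coded month-to-season lookup are assumptions for illustration; the notebook itself relies on `time.dt.season`.

```python
import numpy as np

# Made-up relative biases (mm/month) for one pixel over 12 months, Jan..Dec
months = np.arange(1, 13)
rel_bias = np.array([1.0, 2.0, 4.0, np.nan, 6.0, 8.0,
                     9.0, 7.0, 5.0, 3.0, np.nan, 0.0])

# Meteorological seasons, labeled by the first letter of each month
season_of_month = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
                   6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}
labels = np.array([season_of_month[m] for m in months])

summary = {}
for season in ["All", "DJF", "JJA", "MAM", "SON"]:
    values = rel_bias if season == "All" else rel_bias[labels == season]
    summary[season] = {
        "median": float(np.nanmedian(values)),    # NaNs are skipped
        "count": int(np.isfinite(values).sum()),  # finite time steps only
    }

print(summary)
```

Seasons with missing months (here the made-up MAM and SON) still get a median, but from fewer time steps, which is what the `counts` variable records.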

Now that we have our average biases, let's generate some plots that show how the biases compare to the errors. If the bias between the data sets is in agreement within the errors, then the uncertainty in the ET data sets is more important than the relative biases between them. Conversely, if they do not agree, then the choice of ET data set could have implications and propagated biases on resulting products modeled from the ET data.

To show this comparison and check for agreement, we can use an analytical method, since EC assumes a normal distribution for the errors. This method consists of:

1. Calculate the absolute relative bias ($\textrm{bias}_\textrm{abs} = | \textrm{dataset}_A - \textrm{dataset}_B |$),
2. [Propagate the data sets' error variances and covariance](https://en.wikipedia.org/wiki/Propagation_of_uncertainty#Example_formulae) to the bias uncertainty ($\sigma_{\varepsilon_{\rm bias}} = \sqrt{\sigma_{\varepsilon_A}^2 + \sigma_{\varepsilon_B}^2 - 2 \sigma_{\varepsilon_{AB}}}$),
3. Assuming a normal distribution of mean $\textrm{bias}_\textrm{abs}$ and variance $\sigma_{\varepsilon_{\rm bias}}^2$, calculate the cumulative probability that this bias distribution is less than or equal to 0 (a bias of 0 would indicate the data sets are the same).

> We use the absolute relative bias versus the relative bias since we don't care which data set is larger than the other. Rather we just want to know how well they agree within their errors.
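
For reference, the propagation in step 2 is the standard variance of a difference of two correlated quantities, and the probability in step 3 has a closed form in terms of the standard normal CDF $\Phi$. The random variable $X$ below is introduced here only for illustration:

$$
\sigma_{\varepsilon_{\rm bias}}^2 = \mathrm{Var}(\varepsilon_A - \varepsilon_B) = \sigma_{\varepsilon_A}^2 + \sigma_{\varepsilon_B}^2 - 2 \sigma_{\varepsilon_{AB}},
\qquad
P(X \le 0) = \Phi\!\left(-\frac{\textrm{bias}_\textrm{abs}}{\sigma_{\varepsilon_{\rm bias}}}\right),
\quad X \sim \mathcal{N}\!\left(\textrm{bias}_\textrm{abs},\, \sigma_{\varepsilon_{\rm bias}}^2\right).
$$

Because $\textrm{bias}_\textrm{abs} \ge 0$, this probability can never exceed 0.5, and it shrinks toward 0 as the bias grows relative to its uncertainty.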

With a given probability, we can then estimate agreement from its value, since it is the probability that the absolute difference between the two data sets is less than or equal to 0 (i.e., the normal CDF evaluated at 0). If the probability is low (say, <0.1 or whatever significance level we choose), then the data sets are likely not in agreement. If the probability is near 0.5, then the data sets are in good agreement (i.e., the distribution is roughly equally spread around 0). Given that we have bias and error maps, we can calculate this probability for each pixel in the maps. Therefore, we can check the agreement of each data set pair across CONUS from these (what we will term) "agreement probability maps".
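
To get a feel for the numbers before applying this to full maps, here is a minimal single-pixel sketch of the three steps. It uses the standard library's `statistics.NormalDist` in place of `scipy.stats.norm` (the two give the same normal CDF); the helper name `agreement_probability` and all input values are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def agreement_probability(bias, var_a, var_b, cov_ab):
    """P(X <= 0) for X ~ N(|bias|, var_a + var_b - 2 * cov_ab)."""
    # Step 2: propagate the error variances and covariance to the bias.
    # math.sqrt raises ValueError if the propagated variance is negative.
    sigma_bias = sqrt(var_a + var_b - 2.0 * cov_ab)
    # Steps 1 and 3: absolute bias as the mean, then the CDF evaluated at 0
    return NormalDist(mu=abs(bias), sigma=sigma_bias).cdf(0.0)


# Made-up values for one pixel: error variances 16 and 9, no covariance,
# so the propagated bias uncertainty is sigma = 5 mm/month
p_equal = agreement_probability(bias=0.0, var_a=16.0, var_b=9.0, cov_ab=0.0)
p_close = agreement_probability(bias=-5.0, var_a=16.0, var_b=9.0, cov_ab=0.0)
p_far = agreement_probability(bias=15.0, var_a=16.0, var_b=9.0, cov_ab=0.0)

print(round(p_equal, 3), round(p_close, 3), round(p_far, 4))
```

With these inputs, a zero bias gives 0.5, a bias of one sigma gives about 0.16, and a bias of three sigma gives about 0.001, so only the last case would fall below a 0.1 threshold.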

To run this method and calculate the agreement probability maps, we will first need to read in the average EC error covariance matrices generated in the [EC notebook](3_EC_application.ipynb#EC-Discussion). We can then merge this with the average biases to get a unified data set for the probability calculations.

In [9]:
ec_est_averages = xr.open_dataset('../Data/compiled_EC_avg_covar_errs.nc')
# Drop the cross-correlation variables as we don't need them
ec_est_averages = ec_est_averages.drop_vars(['mean_rho', 'median_rho', 'std_rho'])
# Rename the counts and covar_pair to match the bias dataset_pairs
ec_est_averages = ec_est_averages.rename({'counts': 'counts_covar',
                                          'covar_pair': 'dataset_pairs',
                                          'covar_pair_idx_1': 'dataset_pairs_idx_1',
                                          'covar_pair_idx_2': 'dataset_pairs_idx_2'})

bias_ec_averages = xr.merge([bias_averages.rename({'counts': 'counts_bias'}), ec_est_averages])
bias_ec_averages

Now that we have the error covariance matrices read in and matched with the bias estimates, we can calculate the probabilities for checking agreement using the method outlined above.

In [10]:
variances = bias_ec_averages['median_covar'].linalg.diagonal(
                dims=['dataset_pairs_idx_1', 'dataset_pairs_idx_2'], offset=0,
            )
covariances = bias_ec_averages['median_covar'].linalg.diagonal(
                dims=['dataset_pairs_idx_1', 'dataset_pairs_idx_2'], offset=1,
              ).squeeze()
bias_ec_averages['sigma_bias'] = np.sqrt(variances.sum(dim='dataset_pairs_idx_1') - 2 * covariances)

norm_dist = stats.XrContinuousRV(norm,
                                 loc=np.abs(bias_ec_averages['median_bias']),
                                 scale=bias_ec_averages['sigma_bias'])
# Set and name it as a DataArray as the attributes of the coordinates are not kept
agreement_probability = norm_dist.cdf(0)
agreement_probability.name = 'agreement_probability'

# Merge to preserve coordinate attributes
bias_ec_averages = xr.merge([bias_ec_averages, agreement_probability])
bias_ec_averages

Finally, it is time to generate some plots that show how the biases compare to the errors.

In [11]:
plt = (bias_ec_averages['agreement_probability'].hvplot(groupby=['season', 'dataset_pairs'], geo=True, 
                                                        coastline=True, clim=(0, 0.5), cmap='Purples',
                                                        title='Agreement Probability').opts(frame_width=500)
       + bias_ec_averages['median_bias'].hvplot(groupby=['season', 'dataset_pairs'], geo=True,
                                                coastline=True, clim=(-50, 50), cmap='PuOr',
                                                title='Relative Bias').opts(frame_width=500))

pn.panel(plt.cols(2), widget_location='top')

Two things to reiterate when looking at the agreement probabilities are that the errors used in their calculation are likely lower limits on the uncertainty and that the bias is a temporal median. Therefore, if we see good agreement between data sets, this is likely true for the majority of the monthly ET data (i.e., 50% of the monthly relative biases are below the median). Additionally, if there is a lack of agreement, then it is still possible the data sets agree if the errors are drastically underestimated.

With that in mind, looking at the agreement probabilities for each data set, we can see that:

1. **SSEBop** - The agreement probability with all data sets is typically high across CONUS, except in mountainous regions like the Rocky Mountains. Looking at seasonal data, winter shows minimal agreement with the other data sets. However, that is expected, as winter typically has almost no ET and therefore small errors, which make agreement difficult. As for spring, summer, and fall, the agreement patterns vary, with spring mainly having agreement in the Southeast, summer having agreement in the Midwest, and fall having agreement in the East to Southeast with the other data sets.

2. **GLEAM** - The agreement probability with all data sets is typically high for all of CONUS, excluding the Pacific Northwest, where the probability can be below 0.05. As for seasonal data, the spatial pattern of agreement is not very consistent across data sets in the summer and spring, with some data sets showing agreement in southern CONUS, while others do not. However, most data sets show very low agreement in northern CONUS during these seasons. Looking at fall, most data sets show agreement in the South to Southeast, but still show low agreement probabilities in the North.

3. **ERA5** - Overall, the agreement probability with all data sets is typically high for all of the CONUS, with some lower agreement happening across CONUS with NLDAS. Seasonal subsets show high agreement in the summer and fall for most data sets in the South to Southeast. However, agreement is low in the northern half of CONUS during these months, as well as practically all of CONUS in the spring.

4. **NLDAS** - The agreement probability is spatially varied when comparing against the other data sets. However, most data sets show agreement across CONUS at reasonable probability levels (i.e., >0.1). Some lower agreement probabilities across data sets occur in eastern CONUS, especially the southeastern coast. In terms of seasonal variation, the southwest and central US typically have high agreement in the summer and fall, but the majority of CONUS shows low agreement in the spring.

5. **TerraClimate** - Overall, the agreement probability with all data sets is high for all of the CONUS, with relatively lower (but still reasonably high) probabilities in the Southeast and Pacific Northwest. Across seasons, we typically see low agreement in all but the Southwest during spring. Agreement is more spatially varied in the summer, with some data sets showing more agreement in the North and others the South. As for fall, most data sets show excellent agreement in the western half of CONUS, with mixed agreement in the eastern half.

6. **WBET** - On average, the agreement probabilities are high across CONUS, but there is some variation between data sets. The seasonal subsets follow a similar trend, with agreement varying spatially between data sets. Spring shows the lowest overall agreement, with strong agreement only in the extreme Southwest and Florida Peninsula. Summer shows better agreement, primarily in the Midwest. Finally, fall shows overall high agreement between data sets for most of CONUS except with SSEBop and TerraClimate, which show low agreement in the western half and eastern half of CONUS, respectively.

From these summaries, we can draw a few conclusions about the agreement between the data sets. In general, most data sets agree across CONUS when not separating the data by seasons. However, each has certain regions where the agreement with the other data sets is low. These regions indicate one of two things about the data set: either 1) the lack of consensus with the other data sets shows the ET data set is truly biased in these regions, or 2) the uncertainty is underestimated in these regions. Either way, the lack of agreement indicates that the ET data set is not performing optimally in these regions. As for seasonality (excluding winter, where low agreement is expected as stated above), there is a lot of variation in agreement based on the data set and the corresponding bias pair. In general, though, it appears that spring has the lowest overall agreement, with agreement increasing into summer and further increasing in the fall. Again, each data set has certain regions where the seasonal agreement with the other data sets is high and low. Therefore, we will explore this regional variation further in a [Regional Analysis notebook](5_regional_analysis.ipynb), where we explore how each ET data set performs across three regions in CONUS where quality ET data is critical to hydrological modeling and water availability.