# Triple Collocation Uncertainty Analysis

Now that we have all of our monthly ET datasets spatially collocated from the [regridding notebook](1_regrid.ipynb), we are ready to perform a Triple Collocation (TC) analysis on the common date ranges.

In [None]:
import hvplot.xarray
import holoviews as hv
import panel as pn
import numpy as np
import xarray as xr
import itertools
import warnings
import os

First, we will run the Extended Collocation notebook to create our TC function. (We use the EC function because it can also perform TC and processes every spatial point simultaneously. See the [EC notebook](../TC/EC_function.ipynb) and the [TC notebook](../TC/TC_function.ipynb) for details on the TC method.)

In [None]:
%run ../TC/EC_function.ipynb
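
For orientation, the classical covariance-based TC estimate for a single pixel can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the `ec_covar_multi` implementation loaded above: the function name `tc_error_variances` and the synthetic series are hypothetical, and the errors are assumed to be mutually independent and independent of the truth.

```python
import numpy as np


def tc_error_variances(x, y, z):
    """Covariance-notation TC error variances for three collocated series.

    Hypothetical helper for illustration only. Each estimate is the data
    variance minus the signal variance inferred from the cross-covariances.
    """
    q = np.cov(np.vstack([x, y, z]))
    var_x = q[0, 0] - q[0, 1] * q[0, 2] / q[1, 2]
    var_y = q[1, 1] - q[0, 1] * q[1, 2] / q[0, 2]
    var_z = q[2, 2] - q[0, 2] * q[1, 2] / q[0, 1]
    return np.array([var_x, var_y, var_z])


# Synthetic "truth" plus independent errors with known standard deviations
rng = np.random.default_rng(0)
n = 200_000
truth = rng.normal(50.0, 20.0, n)
true_sd = np.array([5.0, 10.0, 15.0])

# Each series may have its own additive and multiplicative bias
x = truth + rng.normal(0.0, true_sd[0], n)
y = 2.0 + 0.9 * truth + rng.normal(0.0, true_sd[1], n)
z = -1.0 + 1.1 * truth + rng.normal(0.0, true_sd[2], n)

est_sd = np.sqrt(tc_error_variances(x, y, z))
print(est_sd)  # should land close to true_sd for this synthetic case
```

Note that each error variance is recovered in the units of its own series, so no explicit rescaling step is needed in this formulation.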

## Combine Data Sets in Xarray

Next, we need to load our data sets and limit them to a common date range. Since we need at least three data sets to use TC, we will restrict all data sets to the range that begins at the third-oldest starting date and ends at the third-most-recent ending date. This choice saves memory while still using the largest amount of data that at least three data sets can cover. For triplets with a more restricted date range, due to one data set covering a shorter period, we will limit the date range further at the time of the TC computation.
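
As a quick illustration of this selection rule, the sketch below applies it to assumed coverage dates. The dates are copied from the data set table later in this notebook rather than read from the files, and the name `date_ranges_demo` is hypothetical, so the actual range computed from the regridded files may differ.

```python
import numpy as np

# Assumed (start, end) coverage per data set, taken from the table below
date_ranges_demo = {
    "SSEBop": ("2001-01", "2022-12"),
    "GLEAM": ("2003-01", "2022-12"),
    "ERA5": ("1950-01", "2022-12"),
    "NLDAS": ("1979-02", "2022-12"),
    "TerraClimate": ("1958-01", "2022-12"),
    "WBET": ("1895-10", "2018-09"),
}

starts = np.sort(
    np.array([v[0] for v in date_ranges_demo.values()], dtype="datetime64[M]")
)
ends = np.sort(
    np.array([v[1] for v in date_ranges_demo.values()], dtype="datetime64[M]")
)

# Third-oldest start and third-most-recent end
start = starts[2]
end = ends[-3]

# Number of data sets already available at the chosen start date
n_started = int(np.sum(starts <= start))
print(start, end, n_started)
```

With these assumed dates, exactly three data sets have begun by the chosen start, which is the minimum needed to form one triplet.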

In [None]:
files = [
    "../Data/ssebop/ssebop_aet_regridded.nc",
    "../Data/gleam/gleam_aet.nc",
    "../Data/era5/era5_aet_regridded.nc",
    "../Data/nldas/nldas_aet_regridded.nc",
    "../Data/terraclimate/terraclimate_aet_regridded.nc",
    "../Data/wbet/wbet_aet_regridded.nc",
]
dataset_name = ["SSEBop", "GLEAM", "ERA5", "NLDAS", "TerraClimate", "WBET"]

date_ranges = {}
for file, name in zip(files, dataset_name):
    ds_temp = xr.open_dataset(
        file, engine="netcdf4", chunks={"lon": -1, "lat": -1, "time": -1}
    )
    date_ranges[name] = [ds_temp.time.min().values, ds_temp.time.max().values]

# Take the third oldest start and third most recent end dates
date_range = [
    np.sort(np.array(list(date_ranges.values()))[:, 0])[2],
    np.sort(np.array(list(date_ranges.values()))[:, 1])[-3],
]
date_range

Using the date range, we can now combine all of the data sets into a single `xarray.Dataset` for easy computations.

In [None]:
def preprocess(ds):
    """
    Keep only the specified time range for each file.
    """
    return ds.sel(time=slice(date_range[0], date_range[1]))


ds = xr.open_mfdataset(
    files,
    engine="netcdf4",
    preprocess=preprocess,
    combine="nested",
    concat_dim="dataset_name",
)
ds = ds.assign_coords({"dataset_name": dataset_name})
ds.dataset_name.attrs["description"] = "Dataset name"

# Need time as first index for TC computation
ds = ds.transpose("time", ...)
# The data set is less than 1GiB, so let's read it into memory vs keeping as a dask array
ds = ds.compute()
ds

## Time Series Exploration
In case we want to explore the time series of each pixel later, let's make an interactive figure where we can select the latitude and longitude and plot the time series. As we may also want to explore how the time series vary with season, let's add the option to select either the full year or a single season. (Seasons will be denoted by the first letter of each month within the season. For example, winter contains December, January, and February, so it will be denoted by `DJF`. The full year with all seasons will be denoted by `All`.)

In [None]:
def create_timseries(lat=40.125, lon=-100.125, season="All"):
    location_map = ds.aet.isel(time=0, dataset_name=2)
    location_map = location_map * 0
    location_map.loc[{"lat": lat, "lon": lon}] = 1
    if season == "All":
        ds_season = ds
    else:
        ds_season = ds.isel(time=(ds.time.dt.season == season))

    plt = location_map.hvplot(
        geo=True,
        coastline=True,
        title="Time Series Location  (Red dot indicates current pixel)",
        colorbar=False,
        cmap="kr",
    ).opts(frame_height=250) + ds_season.sel(lat=lat, lon=lon, method="nearest").hvplot(
        x="time", groupby="dataset_name", title="Datasets' ET Time Series"
    ).overlay().opts(legend_position="right", frame_height=250)

    return plt


lat_widget = pn.widgets.FloatSlider(
    name="lat",
    start=ds.lat.min().item(),
    end=ds.lat.max().item(),
    step=0.25,
    value=40.125,
)
lon_widget = pn.widgets.FloatSlider(
    name="lon",
    start=ds.lon.min().item(),
    end=ds.lon.max().item(),
    step=0.25,
    value=-100.125,
)
season_widget = pn.widgets.Select(
    name="season", value="All", options=["All", "DJF", "MAM", "JJA", "SON"]
)

bound_plot = pn.bind(
    create_timseries, lat=lat_widget, lon=lon_widget, season=season_widget
)

pn.Column(pn.Row(pn.Column(lat_widget, lon_widget), season_widget), bound_plot)

## TC Estimation

Time to compute the TC uncertainty estimates. To do that, we first need to decide on data sets that have "independent" errors in order to group them together into TC sets.

Here is a table of the data and method used for calculating each ET data set:

| Data Sets      | Reference Data Type | Calculation Method | Date Range       | Resolution       | Input Data |
| -------------- | ------------------- | ------------------ | ---------------- | ---------------- | ---------- |
| SSEBop         | Ex situ             | Energy balance     | 2001/01-2022/12 | 1 km             | **SRTM** elevn; **PRISM** Ta; **MODIS** Ts, emissivity, albedo, and NDVI; **GDAS** ETo |
| GLEAM v3.7b    | Ex situ             | Energy balance     | 2003/01-2022/12 | 0.25$^{\circ}$   | **CERES** radiation; **TMPA** precip; **AIRS** Ta; **GLOBSNOW** snow-water equiv; **CCI** vegetation optical depth; **GLDAS** and **CCI** Soil moisture; **MODIS** GVCF (global vegetation continuous fields); **IGBP-DIS** soil properties; **CGLFRD** lightning flash rate for rainfall inference |
| ERA5-Land      | Reanalysis          | Energy balance     | 1950/01-2022/12 | 0.1$^{\circ}$    | **CHTESSEL** Land surface model using model cycle Cy45r1 (2018) |
| NLDAS-2 (Noah) | Reanalysis          | Energy balance     | 1979/02-2022/12 | 0.125$^{\circ}$  | **NARR** (North American Regional Reanalysis) atmospheric forcing data; **PRISM** precip |
| TerraClimate   | Interpolated        | Water balance      | 1958/01-2022/12 | 0.0416$^{\circ}$ | **WorldClim** Ta, vapor, precip, solar radiation, wind (Uses **MODIS** Ts, cloud cover; **SRTM** elevn); **CRU** Ts4.0, Tmax, Tmin, vapor, precip, Ta; **JRA-55** Ta, vapor, precip, radiation, wind |
| WBET           | Interpolated        | Water balance      | 1895/10-2018/09 | 800 m            | **PRISM** precip, mean Ta, max Ta, min Ta; **USGS** water use irrigation, national elevation dataset, NWIS gage II discharge; **EROS** land cover (1938-1999); **Landsat** NLCD land cover (2000-2018); **gridMET** wind; **Koppen-Geiger** climate classification; **Fenneman & Johnson** physiographic province classification; **EPA** level III ecoregions; **STATSGO2** soil saturated hydraulic conductivity, porosity, field capacity, thickness, available water capacity |

From this table, we can group the data sets into measurement systems that "should" be independent:

1) Ex situ
2) Reanalysis
3) Interpolated

This gives us 8 possible combinations of data sets (2 x 2 x 2, taking one data set from each group). However, since the computation is fast and the resulting TC error estimates are small in memory (~65 MiB), we will just compute all 20 combinations and filter them later if needed.

In [None]:
# Generate a list of the combinations
combos = list(itertools.combinations(dataset_name, 3))
combos = [list(combo) for combo in combos]
combos
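
If we later want to filter down to the triplets that draw one data set from each measurement system, the counts can be checked with a short sketch. The `groups` mapping below is an assumption that simply restates the grouping from the table, and the variable names are hypothetical.

```python
import itertools

# Assumed grouping by measurement system, restating the table above
groups = {
    "Ex situ": ["SSEBop", "GLEAM"],
    "Reanalysis": ["ERA5", "NLDAS"],
    "Interpolated": ["TerraClimate", "WBET"],
}

# One data set from each group: 2 * 2 * 2 triplets
independent_combos = [list(c) for c in itertools.product(*groups.values())]

# Every possible triplet of the six data sets: C(6, 3) triplets
names = [name for members in groups.values() for name in members]
all_combos = [list(c) for c in itertools.combinations(names, 3)]

# Space-joined labels, matching how triplets are labeled with " ".join(combo)
independent_labels = {" ".join(c) for c in independent_combos}

print(len(independent_combos), len(all_combos))
```

Because the groups are listed in the same order as the data set names, each "independent" triplet appears with the same element order in the full list, so its joined label can be used to select it.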

Since we have data sets with different date ranges, we will need to trim the date ranges here before computing the TC error variances. This is slightly complicated, so let's make the date range selection its own function.

In [None]:
def common_date_range(ds, combo):
    """Return the common date slice of the datasets."""
    old_common_date = []
    recent_common_date = []
    for name in combo:
        old_common_date.append(date_ranges[name][0])
        recent_common_date.append(date_ranges[name][1])

    return slice(np.max(old_common_date), np.min(recent_common_date))
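
To make the behavior concrete, here is a standalone sketch of the same idea applied to one triplet. The coverage dates are assumptions copied from the data set table rather than read from the files, and `common_date_range_demo` is a hypothetical stand-in that takes the ranges as an argument instead of using the global `date_ranges`.

```python
import numpy as np

# Assumed [start, end] coverage, taken from the table above
ranges_demo = {
    "SSEBop": [np.datetime64("2001-01"), np.datetime64("2022-12")],
    "ERA5": [np.datetime64("1950-01"), np.datetime64("2022-12")],
    "WBET": [np.datetime64("1895-10"), np.datetime64("2018-09")],
}


def common_date_range_demo(combo, ranges):
    """Latest start and earliest end among the data sets in the triplet."""
    latest_start = np.max([ranges[name][0] for name in combo])
    earliest_end = np.min([ranges[name][1] for name in combo])
    return slice(latest_start, earliest_end)


window = common_date_range_demo(["SSEBop", "ERA5", "WBET"], ranges_demo)
print(window.start, window.stop)
```

The overlap is bounded by the data set that starts last and the one that ends first, which here are SSEBop and WBET respectively.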

Now that we have the ability to select the common date range, let's compute the TC error standard deviations. We will do this for each season independently along with the full year. (The season and full year will be denoted with the monthly abbreviations or `All` as above.)

In [None]:
# We want to ignore all of the sqrt and log warnings with negative values
warnings.filterwarnings("ignore", category=RuntimeWarning)

# Create list of seasons
seasons = ["All"] + list(np.unique(ds.time.dt.season))

tc_est = []
tc_est_season = []
for combo in combos:
    for season in seasons:
        if season == "All":
            ds_season = ds
        else:
            ds_season = ds.isel(time=(ds.time.dt.season == season))

        ds_combo = ds_season.sel(time=common_date_range(ds, combo), dataset_name=combo)

        tc_covar, snr_temp = ec_covar_multi(
            ds_combo.aet.data, corr_sets=[1, 2, 3], return_snr=True
        )

        tc_est_season.append(
            xr.Dataset(
                data_vars={
                    "error": (
                        ["dataset_combo", "season", "combo_idx", "lat", "lon"],
                        np.sqrt(np.diagonal(tc_covar)).transpose((2, 0, 1))[
                            None, None, ...
                        ],
                    ),
                    "snr": (
                        ["dataset_combo", "season", "combo_idx", "lat", "lon"],
                        # Perform a logarithm as a simple method for setting
                        # negatives to NaN
                        (10 ** np.log10(snr_temp[None, None, ...])),
                    ),
                },
                coords={
                    "dataset_combo": [" ".join(combo)],
                    "season": [season],
                    "combo_idx": [0, 1, 2],
                    "lat": ds.lat,
                    "lon": ds.lon,
                },
            )
        )

    tc_est.append(xr.concat(tc_est_season, dim="season"))
    tc_est_season = []

tc_est = xr.concat(tc_est, dim="dataset_combo")

tc_est.error.attrs["description"] = (
    "TC error standard deviation estimate for the dataset_combo triplet."
)
tc_est.snr.attrs["description"] = (
    "TC unbiased SNR estimate for the dataset_combo triplet."
)
tc_est.dataset_combo.attrs["description"] = "Dataset combination used in TC evaluation."
tc_est.combo_idx.attrs["description"] = (
    'Name index of "dataset_combo" coordinate associated with the data set.'
)
tc_est.season.attrs["description"] = (
    "Season of the year given by the first letter of each month within the "
    'season. The full year is given by "All".'
)
tc_est.error.attrs["units"] = "mm.month-1"
tc_est = tc_est.compute()

tc_est

For convenience, let's rearrange the `Dataset` so that it is organized by data set rather than by combination.

In [None]:
tc_est_by_dataset = []
est_pair = []
for name in dataset_name:
    idx_loc = np.char.find(tc_est.dataset_combo.data, name)
    dataset_loc = np.where(idx_loc != -1)[0]
    combos_single_dataset = tc_est.isel(dataset_combo=dataset_loc)

    tc_est_dataset = []
    for i in range(len(combos_single_dataset.dataset_combo.values)):
        tc_est_single_dataset = combos_single_dataset.isel(dataset_combo=i)
        idx = str(tc_est_single_dataset.dataset_combo.data).split(" ").index(name)
        tc_est_single_dataset = tc_est_single_dataset.sel(combo_idx=idx)
        tc_est_single_dataset = tc_est_single_dataset.drop_vars(
            ["dataset_combo", "combo_idx"]
        )

        tc_est_dataset.append(
            xr.Dataset(
                data_vars=tc_est_single_dataset,
                coords={
                    "dataset_name": name,
                    "est_idx": i,
                    "lat": ds.lat,
                    "lon": ds.lon,
                },
            )
        )

    tc_est_dataset = xr.concat(tc_est_dataset, dim="est_idx")

    tc_est_by_dataset.append(tc_est_dataset)
    est_pair.append(
        [
            combinations.replace(name, "").strip()
            for combinations in tc_est.dataset_combo.data[dataset_loc]
        ]
    )

tc_est_by_dataset = xr.concat(tc_est_by_dataset, dim="dataset_name")

tc_est_by_dataset = tc_est_by_dataset.assign_coords(
    est_pair=(["dataset_name", "est_idx"], np.array(est_pair))
)

tc_est_by_dataset.dataset_name.attrs["description"] = "Dataset names."
tc_est_by_dataset.est_idx.attrs["description"] = (
    "Index of the other two data sets used in the TC triplet as contained in est_pair."
)
tc_est_by_dataset.est_pair.attrs["description"] = (
    "Abbreviation of the other two data sets used in the TC triplet."
)

tc_est_by_dataset = tc_est_by_dataset.compute()

# Save the results for later use in the 4_dataset_agreement and 5_regional notebooks
if not os.path.isfile("../Data/TC_errs.nc"):
    _ = tc_est_by_dataset.to_netcdf(
        path="../Data/TC_errs.nc", format="NETCDF4", engine="netcdf4"
    )

tc_est_by_dataset

Let's see how the resulting error estimates look along with the unbiased SNR of the data sets. This should give us an indication of how much non-noise information the data sets actually contain.

> Since we suppressed the `sqrt` and `log` runtime warnings, we can expect to see `NaN`s throughout the maps wherever the TC calculation resulted in negative values. This is intended: any negative error variance should be flagged as invalid, which we do by setting it to `NaN`.
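
The `NaN` flagging can be seen in isolation with a few toy values. This is a small sketch with made-up numbers; it uses `np.errstate` to silence the warnings locally, which is a swapped-in alternative to the global `warnings.filterwarnings` call used in this notebook.

```python
import numpy as np

# Made-up error variances and SNR values, including one negative of each
err_var = np.array([25.0, -4.0, 0.0])
snr = np.array([10.0, -0.5, 2.0])

with np.errstate(invalid="ignore", divide="ignore"):
    # sqrt of a negative variance yields NaN
    err_sd = np.sqrt(err_var)
    # log10 of a negative value yields NaN, and 10 ** NaN stays NaN,
    # while positive values survive the round trip
    snr_clean = 10 ** np.log10(snr)

print(err_sd)
print(snr_clean)
```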

In [None]:
def tc_plts(dataset_name="SSEBop", est_idx=0, season="All"):
    tc_data = tc_est_by_dataset.sel(
        dataset_name=dataset_name, est_idx=est_idx, season=season
    )

    est_pairs = str(
        tc_est_by_dataset.est_pair.sel(dataset_name=dataset_name, est_idx=est_idx).data
    )
    plt = tc_data.error.hvplot(
        geo=True,
        coastline=True,
        clim=(0, 50),
        title="Error Standard Deviation (other triplet datasets: " + est_pairs + ")",
    ).opts(frame_width=500) + tc_data.snr.hvplot(
        geo=True,
        coastline=True,
        clim=(0.1, 50),
        cnorm="log",
        title="Unbiased SNR (other triplet datasets: " + est_pairs + ")",
    ).opts(frame_width=500)

    return plt


dataset_name_widget = pn.widgets.Select(
    name="dataset_name",
    value="SSEBop",
    options=list(tc_est_by_dataset.dataset_name.values),
)
est_idx_widget = pn.widgets.IntSlider(name="est_idx", start=0, end=9, step=1, value=0)
season_widget = pn.widgets.Select(
    name="season", value="All", options=["All", "DJF", "MAM", "JJA", "SON"]
)

bound_plot = pn.bind(
    tc_plts,
    dataset_name=dataset_name_widget,
    season=season_widget,
    est_idx=est_idx_widget,
)

pn.Column(dataset_name_widget, est_idx_widget, season_widget, bound_plot)

## TC Discussion

Some of the data sets (mainly GLEAM and NLDAS, and to a lesser extent ERA5 and WBET) have large swaths of `NaN` values caused by negative variances. This is typically caused by one of two things: (1) covariances in the errors, which are assumed not to be present, or (2) two data sets having error variances roughly an order of magnitude larger than the third.

After carefully looking at the maps for each data set and pair, we can see that SSEBop and TerraClimate have the largest estimated errors across all combinations, followed by ERA5 and WBET, then GLEAM and NLDAS. Therefore, it appears that (2) is the likely culprit behind the `NaN` swaths, since the `NaN` density appears to be inversely related to the estimated error standard deviation.

To overcome these swaths of `NaN` values, let's combine each error variance to find the mean, median, and standard deviation for each computation, excluding the `NaN`s.

> Note that we use the error variances in the mean calculation and then take the square root to get the mean error standard deviation. Averaging the error standard deviations directly is not statistically correct, as standard deviations combine in quadrature (i.e., through a sum of variances). This does not affect the median, as it is just a center value.
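
A toy example makes the difference between the two averages visible. The values are made up for illustration, with one `NaN` standing in for a failed TC estimate.

```python
import numpy as np

# Made-up error standard deviations for one pixel across three triplets
err_sd = np.array([3.0, 4.0, np.nan])

# Mean taken in variance space, then converted back to a standard deviation
mean_from_variances = np.sqrt(np.nanmean(err_sd**2))

# Naive mean of the standard deviations, shown only for comparison
mean_of_sds = np.nanmean(err_sd)

# The median is unchanged by squaring, since squaring preserves ordering
# for non-negative values
median_sd = np.nanmedian(err_sd)

print(mean_from_variances, mean_of_sds, median_sd)
```

The variance-space mean is never smaller than the naive mean, so averaging standard deviations directly would tend to understate the combined error.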

In [None]:
mean_tc_est = np.sqrt(
    (tc_est_by_dataset.error**2).mean(dim="est_idx", skipna=True, keep_attrs=True)
)
mean_tc_est.name = "mean_error"
mean_tc_est.attrs["description"] = (
    "Mean TC error estimate across all combinations with other datasets."
)
mean_tc_est.attrs["units"] = "mm.month-1"

median_tc_est = tc_est_by_dataset.error.median(
    dim="est_idx", skipna=True, keep_attrs=True
)
median_tc_est.name = "median_error"
median_tc_est.attrs["description"] = (
    "Median TC error estimate across all combinations with other datasets."
)
median_tc_est.attrs["units"] = "mm.month-1"

std_tc_est = tc_est_by_dataset.error.std(
    dim="est_idx", ddof=1, skipna=True, keep_attrs=True
)
std_tc_est.name = "std_error"
std_tc_est.attrs["description"] = (
    "Standard Deviation of TC error estimate across all combinations with other datasets."
)
std_tc_est.attrs["units"] = "mm.month-1"

median_tc_snr = tc_est_by_dataset.snr.median(
    dim="est_idx", skipna=True, keep_attrs=True
)
median_tc_snr.name = "median_snr"
median_tc_snr.attrs["description"] = (
    "Median TC unbiased SNR estimate across all combinations with other datasets."
)

std_tc_snr = tc_est_by_dataset.snr.std(dim="est_idx", skipna=True, keep_attrs=True)
std_tc_snr.name = "std_snr"
std_tc_snr.attrs["description"] = (
    "Standard Deviation of TC unbiased SNR estimate across all "
    "combinations with other datasets."
)

count_tc_est = np.isfinite(tc_est_by_dataset.error).sum(dim="est_idx")
count_tc_est.name = "counts"
count_tc_est.attrs["description"] = (
    "Number of datasets used in the average TC error "
    "estimates (i.e., number of finite values in a given pixel)."
)
count_tc_est.attrs["units"] = "counts"

However, before we look at these average error estimates, let's check the fractional difference between each data set combination and the average to make sure the majority of the combinations are close to the average. This way we are not biasing the average by some extreme estimate.

In [None]:
frac_diff = (tc_est_by_dataset.error - mean_tc_est) / mean_tc_est
quant = frac_diff.quantile(np.linspace(0, 1, 101), dim=["lat", "lon", "est_idx"])
quant.name = "Fractional Difference"
plt = (
    quant.hvplot(groupby=["dataset_name", "season"])
    * hv.HLine(-0.5).opts(color="red", line_width=1, line_dash="dashed")
    * hv.HLine(0.5).opts(color="red", line_width=1, line_dash="dashed")
    * hv.HLine(-0.25).opts(color="orange", line_width=1, line_dash="dashed")
    * hv.HLine(0.25).opts(color="orange", line_width=1, line_dash="dashed")
    * hv.HLine(0.0).opts(color="black", line_width=1, line_dash="dashed")
    * hv.VLine(0.5).opts(color="black", line_width=1, line_dash="dashed")
)
pn.panel(plt, widget_location="top")

From the fractional differences, we can see that at least 60% (86%) of the data is within a fractional difference of 0.25 (0.5) of the average, with GLEAM having the lowest percentage because it has the smallest estimated error variances. Additionally, the quantile distribution is relatively symmetric for each data set, with the median close to zero fractional difference. Therefore, averaging the different combinations should result in a robust mean error estimate.
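
Percentages like these can also be computed directly instead of being read off the quantile curves. The sketch below shows the calculation on a made-up 2-pixel, 3-combination array; the numbers are illustrative only and are unrelated to the results quoted above.

```python
import numpy as np

# Made-up error standard deviations with shape (pixel, est_idx)
err_sd = np.array(
    [
        [8.0, 10.0, 12.0],
        [6.0, 10.0, np.nan],
    ]
)

# Per-pixel mean in variance space, ignoring NaNs
mean_sd = np.sqrt(np.nanmean(err_sd**2, axis=1))

# Fractional difference of every estimate from its pixel mean
frac_diff = (err_sd - mean_sd[:, None]) / mean_sd[:, None]
valid = frac_diff[np.isfinite(frac_diff)]

# Share of finite estimates within each threshold
within_025 = np.mean(np.abs(valid) <= 0.25)
within_050 = np.mean(np.abs(valid) <= 0.50)

print(within_025, within_050)
```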

Now that we have confirmed the averages are not biased, let's take a look at them.

In [None]:
plt = (
    median_tc_est.hvplot(
        groupby=["dataset_name", "season"],
        geo=True,
        coastline=True,
        clim=(0, 50),
        title="Median Error Standard Deviation",
    ).opts(frame_width=500)
    + std_tc_est.hvplot(
        groupby=["dataset_name", "season"],
        geo=True,
        coastline=True,
        clim=(0, 15),
        title="Std of Error Standard Deviation",
    ).opts(frame_width=500)
    + median_tc_snr.hvplot(
        groupby=["dataset_name", "season"],
        geo=True,
        coastline=True,
        clim=(0, 50),
        title="Median Unbiased SNR",
    ).opts(frame_width=500)
    + std_tc_snr.hvplot(
        groupby=["dataset_name", "season"],
        geo=True,
        coastline=True,
        clim=(0, 25),
        title="Std of Unbiased SNR",
    ).opts(frame_width=500)
    + count_tc_est.hvplot(
        groupby=["dataset_name", "season"],
        geo=True,
        coastline=True,
        title="Number of data points used in calculation",
    ).opts(frame_width=500)
)

pn.panel(plt.cols(2), widget_location="top")

Now that we have some quality error standard deviation estimates, let's discuss how the TC assumptions may affect the results. To remind ourselves of these assumptions, they are:

1. The signal and random errors are stationary (i.e., the mean of each is constant with time).
2. All data sets represent exactly the same ET state (i.e., the three data sets have the same spatial resolution and sampling intervals).
3. No cross-correlation of errors (i.e., measurement system errors are independent of each other).
4. Error orthogonality (i.e., the measurement system errors are independent of the true value).
5. No error autocorrelation (i.e., the error estimates are not correlated with time).

Of these assumptions, it is possible that each is influencing the result. First, it is likely that our signal and random errors are not stationary. We ran some informal stationarity tests and found that the signal is not always stationary, which is expected since ET varies with time, both seasonally and yearly. Additionally, the error likely has a non-stationary seasonal component. As discussed in [Gruber et al. (2016)](http://dx.doi.org/10.1016/j.jag.2015.09.002), this is not an issue if the data sets all have the same non-stationarity effect. However, determining this is difficult, and other studies typically avoid the issue by working in anomaly space or by not discussing it at all. Therefore, gauging its effect is highly complex.

Next, representativeness can bias one or two of the error variances in the triplet if the spatial or physical representativeness differs strongly between the data sets [(Gruber et al. 2016)](http://dx.doi.org/10.1016/j.jag.2015.09.002). In spatial terms, each of our data sets originally had its own native resolution, which we degraded to match the GLEAM resolution. Therefore, it is possible that data sets with high native resolution are penalized for not resolving coarse-scale features present in the data sets with lower resolution. Of the six data sets, SSEBop and WBET are similar in resolution, with TerraClimate close to them as well. ERA5 and NLDAS are also similar to each other in resolution: roughly twice as fine as GLEAM and about 10x coarser than SSEBop and WBET. Additionally, the physical representativeness is likely not exactly the same between data sets. Since each uses a different method to estimate ET, some methods may omit certain ET components that others include. Therefore, each data set may not be calculating the "same" ET. However, this is something that can be checked by using EC to evaluate the error covariances.

| Data Set | SSEBop | GLEAM v3.7b | ERA5 | NLDAS | TerraClimate | WBET |
| ------  |  ----  | -----     | ---- | ----- | ----         | ---- |
| Resolution | 0.01 deg (1 km) | 0.25 deg (22.5 km) | 0.1 deg (9 km) | 0.125 deg (11.25 km) | 0.04166 deg (3.75 km) | 0.009 deg (800 m) |


Of these assumptions, the largest issue in our error variance estimates is likely the inclusion of data sets with cross-correlated errors. Meeting this assumption has been shown to be the most influential in getting correct error variance estimates, more so than error orthogonality and error autocorrelation [(Yilmaz & Crow 2014)](https://doi.org/10.1175/JHM-D-13-0158.1). From a basic assumption standpoint, as discussed above, we would have expected the errors of SSEBop and GLEAM to be correlated, along with those of ERA5 and NLDAS, and of TerraClimate and WBET, as each pair is generated from a similar measurement system. However, this does not seem to affect the error standard deviation estimates drastically, as the estimates are relatively consistent between combinations, although some significant variation remains. Therefore, this TC application demonstrates that basic TC is likely not a reliable measure of **exact** error variances for an arbitrarily chosen data set triplet, as knowing which data sets are truly independent is an almost insurmountable task. At the very least, though, it gives us a lower bound on the error variances, as triplets with cross-correlated errors result in underestimated error variances for the correlated pair.
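
The underestimation for a correlated pair can be reproduced with synthetic data. This is a minimal sketch under stated assumptions: it uses the classical covariance-notation TC formulas rather than the `ec_covar_multi` function from this notebook, the helper name `tc_error_variances` is hypothetical, and the shared error component is constructed artificially.

```python
import numpy as np


def tc_error_variances(x, y, z):
    """Covariance-notation TC error variances (hypothetical helper)."""
    q = np.cov(np.vstack([x, y, z]))
    return np.array(
        [
            q[0, 0] - q[0, 1] * q[0, 2] / q[1, 2],
            q[1, 1] - q[0, 1] * q[1, 2] / q[0, 2],
            q[2, 2] - q[0, 2] * q[1, 2] / q[0, 1],
        ]
    )


rng = np.random.default_rng(42)
n = 200_000
truth = rng.normal(50.0, 20.0, n)

# x and y share an error component (variance 36), so their errors are
# cross-correlated; every series has a total error variance of 100
shared = rng.normal(0.0, 6.0, n)
x = truth + shared + rng.normal(0.0, 8.0, n)
y = truth + shared + rng.normal(0.0, 8.0, n)
z = truth + rng.normal(0.0, 10.0, n)

true_var = np.array([100.0, 100.0, 100.0])
est_var = tc_error_variances(x, y, z)

# Analytically, the correlated pair is expected near 100 - 36 = 64,
# and the independent series is expected to be overestimated
print(est_var)
```

In this setup the TC estimate for each member of the correlated pair drops by roughly the shared error covariance, while the estimate for the third series is inflated. The same pattern, lower estimates for a pair and higher estimates for the remaining data set, is a useful signature to look for when comparing triplets.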

Finally, while the errors are potentially orthogonal to the true value, they are likely not free of autocorrelation. This is mainly due to seasonality in the ET data. Errors will most definitely be smaller during the winter when ET is near zero compared to the peaks in the summer. The effects of this autocorrelation on the error estimates could result in underestimated error variances as the data variance from which it is estimated would be biased by these lower error values in the winter. This is obvious when looking at the seasonal results versus the full year. Therefore, seasonal components should be taken into consideration when determining the error variances.

To wrap up, let's look at how the error variance estimates vary between combinations in aggregate form. We will do this by plotting the whole map of ET error estimates as a histogram (i.e., each pixel becomes a count in the histogram). If the error variances are similar between combinations, cross-correlation may not be prominent. Otherwise, the histograms may reveal certain combinations that produce larger error variances, which would indicate strong cross-correlations.

In [None]:
def histogram_plts(dataset_name="SSEBop", season="All"):
    da_pair = []
    for i in range(10):
        da_pair.append(
            tc_est_by_dataset.error.sel(
                dataset_name=dataset_name, est_idx=i, season=season
            )
        )
        da_pair[i].name = tc_est_by_dataset.est_pair.sel(
            dataset_name=dataset_name, est_idx=i
        ).data.item()
    da_mean = mean_tc_est.sel(dataset_name=dataset_name, season=season)
    da_median = median_tc_est.sel(dataset_name=dataset_name, season=season)
    da_mean.name = "Mean"
    da_median.name = "Median"

    plt = (
        da_pair[0]
        .hvplot.hist(
            bins=50,
            bin_range=(0, 50),
            xlabel="Error (mm.month-1)",
            ylabel="Counts",
            title="TC Error Standard Deviation Distribution of "
            + dataset_name
            + " Data Set",
            alpha=0.7,
            normed=True,
        )
        .opts(height=400)
    )
    for i in range(1, 10):
        plt *= da_pair[i].hvplot.hist(
            bins=50,
            bin_range=(0, 50),
            alpha=0.7,
            normed=True,
            title="TC Error Standard Deviation Distribution of "
            + dataset_name
            + " Data Set",
        )
    plt *= da_mean.hvplot.hist(
        bins=50,
        bin_range=(0, 50),
        alpha=0.7,
        normed=True,
        title="TC Error Standard Deviation Distribution of "
        + dataset_name
        + " Data Set",
    )
    plt *= da_median.hvplot.hist(
        bins=50,
        bin_range=(0, 50),
        alpha=0.7,
        normed=True,
        title="TC Error Standard Deviation Distribution of "
        + dataset_name
        + " Data Set",
    )

    return plt


def median_table(dataset_name="SSEBop", season="All"):
    da_pair = []
    for i in range(10):
        da_pair.append(
            tc_est_by_dataset.error.sel(
                dataset_name=dataset_name, est_idx=i, season=season
            )
        )
    da_mean = mean_tc_est.sel(dataset_name=dataset_name, season=season)
    da_median = median_tc_est.sel(dataset_name=dataset_name, season=season)

    medians = [da.median().data.item() for da in da_pair + [da_mean, da_median]]
    table = hv.Table(
        {
            "Independent Pair": list(
                tc_est_by_dataset.est_pair.sel(dataset_name=dataset_name).data
            )
            + ["Mean", "Median"],
            "Median": medians,
        },
        ["Independent Pair", "Median"],
    ).opts(width=250, height=450)

    return table


dataset_name_widget = pn.widgets.Select(
    name="dataset_name",
    value="SSEBop",
    options=list(tc_est_by_dataset.dataset_name.values),
)
season_widget = pn.widgets.Select(
    name="season", value="All", options=["All", "DJF", "MAM", "JJA", "SON"]
)

bound_plot = pn.bind(
    histogram_plts, dataset_name=dataset_name_widget, season=season_widget
)
bound_table = pn.bind(
    median_table, dataset_name=dataset_name_widget, season=season_widget
)

pn.Column(dataset_name_widget, season_widget, pn.Row(bound_plot, bound_table))

From these results, we can see that the ET error estimates are actually quite similar between combinations, though there are some interesting exceptions. For example, ERA5 and TerraClimate give larger error variances for the other data set when they are in a combination together, but lower error variances for themselves. This strongly suggests that these two data sets may have correlated errors. We can test this further by performing Extended Collocation with ERA5 and TerraClimate treated as correlated (which we do in the [next notebook](3_EC_application.ipynb)).

Overall, though, the strong overlap between the distributions is a good indication that the TC method can derive reasonable uncertainty estimates that are relatively consistent regardless of the chosen data set triplet. Therefore, including cross-correlated data sets, while having some effect on the estimates, does not change the error estimates by multiple factors. In other words, as long as it is understood that the TC-derived uncertainties are lower limits on the uncertainty, cross-correlated data sets will still give viable results when used in a TC analysis.