# Applying logic of geocat-comp climatology to a UXarray datafile

And encountering issues with the cftime accessor

geocat-comp code for reference - https://github.com/NCAR/geocat-comp/blob/main/geocat/comp/climatologies.py

Data files exist at https://web.lcrc.anl.gov/public/e3sm/diagnostics/uxarray_data/ENSO_ctl_1std/

In [1]:
import cartopy.crs as ccrs
import holoviews as hv
import uxarray as ux
import xarray as xr
import numpy as np

In [5]:
grid_path = '../../data/ne30pg2_grd.nc'
data_path = '../../data/E3SM_ENSO_ctl_keeling_eam_h0_0006-12.nc'

uxds = ux.open_dataset(grid_path, data_path)

In [6]:
uxds

We already know that for this dataset the time coordinate is called 'time'.

Looking at surface temperature data 'TS'

In [7]:
uxds["TS"].isel(time=1).plot()

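One piece of geocat-comp's climatology code infers the time coordinate when none is provided. That step is not needed for this dataset exploration, but it will be required to make the function work more broadly.

Here the lookup fails with `AttributeError: 'UxDataset' object has no attribute 'cf'`. Note that the `cf` accessor comes from the cf_xarray package, which is not imported in this notebook, so the failure may reflect the missing import rather than a UXarray limitation.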

In [8]:
time = uxds.cf["time"]
time_coord_name = time.name

AttributeError: 'UxDataset' object has no attribute 'cf'

Moving on

In [9]:
time = uxds["time"]
time

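The time coordinate exists and holds `cftime.DatetimeNoLeap` values. Converting it to a pandas `DatetimeIndex` as in https://docs.xarray.dev/en/stable/generated/xarray.CFTimeIndex.to_datetimeindex.html fails, because `time.values` is a plain NumPy array and `to_datetimeindex` is a method of `xarray.CFTimeIndex`.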

In [10]:
datetime = time.values.to_datetimeindex()
datetime

AttributeError: 'numpy.ndarray' object has no attribute 'to_datetimeindex'

In [11]:
seasons_dict = {
    "DJF": ([12, 1, 2], 'QS-DEC'),
    "JFM": ([1, 2, 3], 'QS-JAN'),
    "FMA": ([2, 3, 4], 'QS-FEB'),
    "MAM": ([3, 4, 5], 'QS-MAR'),
    "AMJ": ([4, 5, 6], 'QS-APR'),
    "MJJ": ([5, 6, 7], 'QS-MAY'),
    "JJA": ([6, 7, 8], 'QS-JUN'),
    "JAS": ([7, 8, 9], 'QS-JUL'),
    "ASO": ([8, 9, 10], 'QS-AUG'),
    "SON": ([9, 10, 11], 'QS-SEP'),
    "OND": ([10, 11, 12], 'QS-OCT'),
    "NDJ": ([11, 12, 1], 'QS-NOV'),
}
seasons = ['DJF', 'JJA', 'MAM', 'SON']
time_dim = 'time'
frequency = 'season'

Fails at filtering data to desired season

In [12]:
seasonal_climates = []
for season in seasons:

    # Grab the months for each season
    months = seasons_dict[season]

    # Filter data to only contain the months of interest
    dset_filter = uxds.sel(
        {time_dim: uxds[time_dim].dt.month.isin(months)})

    # Calculate monthly average before calculating seasonal climatologies
    dset_filter = dset_filter.resample({time_dim: frequency}).mean().dropna(time_dim)

    # Compute the weights for the months in each season so that the
    # seasonal averages account for months being of different lengths
    month_length = dset_filter[time_dim].dt.days_in_month
    weights = month_length / month_length.sum()
    climatology = (dset_filter * weights).sum(dim=time_dim)

    seasonal_climates.append(climatology)

# Combine the per-season climatologies once the loop has finished
uxds = xr.concat(seasonal_climates, dim='season')
uxds.coords['season'] = np.array(seasons).astype(object)

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

Unpack to just one season

`ValueError` if using `months` as in the comp function

Is the "inhomogeneous shape" failure perhaps due to the nature of the unstructured grid?

From this [Stack Overflow question](https://stackoverflow.com/questions/67183501/setting-an-array-element-with-a-sequence-requested-array-has-an-inhomogeneous-sh) it looks like an inhomogeneous shape arises when a list contains lists of different lengths (e.g. `x_set = [[0, 1, 2], [2], [3, 4]]`).

In [13]:
season = 'JJA'
months = seasons_dict[season] #([6, 7, 8], 'QS-JUN')

# Filter data to only contain the months of interest
dset_filter = uxds.sel(
        {time_dim: uxds[time_dim].dt.month.isin(months)})

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
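The detected shape `(2,)` matches the two-element tuple stored in `seasons_dict`, which suggests the ragged sequence is the `(months, frequency)` pair itself rather than anything on the grid. A minimal NumPy-only sketch of that reading, assuming `isin` ends up converting its argument to an array (the 12-month array here is made up for illustration):

```python
import numpy as np

seasons_dict = {"JJA": ([6, 7, 8], "QS-JUN")}
entry = seasons_dict["JJA"]  # a tuple: (list of months, resample frequency)

# Converting the whole tuple to an array mixes a 3-element list with a string.
# Recent NumPy releases reject this with the "inhomogeneous shape" ValueError.
try:
    np.asarray(entry)
    ragged_error = None
except ValueError as err:
    ragged_error = str(err)

# Unpacking first gives a plain list of integers that isin can handle
months, freq = entry
month_numbers = np.arange(1, 13)  # hypothetical month numbers, Jan..Dec
mask = np.isin(month_numbers, months)
```

With the tuple unpacked, the mask is an ordinary boolean array over the month numbers.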

Unpack the month list from the tuple

This now fails with a different error: `ValueError: conflicting sizes for dimension`

In [14]:
season = 'JJA'
months, _ = seasons_dict[season] #[6,7,8]


# Filter data to only contain the months of interest
dset_filter = uxds.sel(
        {time_dim: uxds[time_dim].dt.month.isin(months)})

ValueError: conflicting sizes for dimension 'time': length 18 on 'TREFHT' and length 72 on {'time': 'time', 'lev': 'lev'}

Try boolean mask instead of sel

In [15]:
season = 'JJA'
months, _ = seasons_dict[season] #[6,7,8]

mask = uxds[time_dim].dt.month.isin(months)  # Boolean mask
mask

Mask seems able to distinguish which data belongs to the specified months, but gives same error when dropping values.

In [16]:
dset_filter = uxds.where(mask, drop=True)  # Drops elements where mask is False

ValueError: conflicting sizes for dimension 'time': length 18 on 'TREFHT' and length 72 on {'time': 'time', 'lev': 'lev'}

Other masking methods are worth trying. I'm curious how, and whether, `where` is implemented for UXarray: the error does not say that `where` does not exist, but instead complains about the dimensions.

Future code within for-loop
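The rest of the loop cannot run until the filtering step works, but the month-length weighting it describes can be sketched on its own. This is a NumPy-only illustration for a single JJA season on a noleap calendar, with made-up temperature values for two grid faces. It is not the geocat-comp implementation.

```python
import numpy as np

# Days in June, July, August (identical every year on a noleap calendar)
days_in_month = np.array([30, 31, 31])

# Each month's share of the season; the weights sum to 1
weights = days_in_month / days_in_month.sum()

# Hypothetical monthly-mean surface temperatures, shape (month, n_face)
monthly_ts = np.array(
    [
        [290.0, 280.0],
        [292.0, 282.0],
        [291.0, 281.0],
    ]
)

# Weighted sum over the month axis gives the seasonal mean per face
seasonal_mean = (monthly_ts * weights[:, None]).sum(axis=0)
```

Longer months count slightly more, so the result differs a little from a plain average over the three months.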