# Averaging over dimensions of the dataset

The average over dimensions operation makes use of `clisops.core.average` to process the datasets and to set the output type and the output file names.

You can average over any combination of the time, longitude, latitude and level dimensions in the dataset, or over none of them.

In [None]:
from clisops.utils import get_file
# fetch files locally or from GitHub
tas_files = get_file([
    "cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_200512-203011.nc",
    "cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_203012-205511.nc",
    "cmip5/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_205512-208011.nc",
])

o3_file = get_file("cmip6/o3_Amon_GFDL-ESM4_historical_r1i1p1f1_gr1_185001-194912.nc")

# remove previously created example file
import os
if os.path.exists("./output_001.nc"):
    os.remove("./output_001.nc")

## Parameters

The parameters taken by `average_over_dims` are listed below:

    ds: Union[xr.Dataset, str]
    dims : Optional[Union[Tuple[str], DimensionParameter]]
      The dimensions over which to apply the average. If None, none of the dimensions are averaged over. Dimensions
      must be one of ["time", "level", "latitude", "longitude"].
    ignore_undetected_dims: bool
      If the dimensions specified are not found in the dataset, an Exception will be raised if set to False.
      If True, an exception will not be raised and the other dimensions will be averaged over. Default = False
    output_dir: Optional[Union[str, Path]] = None
    output_type: {"netcdf", "nc", "zarr", "xarray"}
    split_method: {"time:auto"}
    file_namer: {"standard", "simple"}
    
    
The output is a list containing the results in the selected format.

In [None]:
from clisops.ops.average import average_over_dims
from roocs_utils.exceptions import InvalidParameterValue
import xarray as xr

In [None]:
ds = xr.open_mfdataset(tas_files, use_cftime=True, combine="by_coords")

ds

## Average over one dimension

In [None]:
result = average_over_dims(ds, dims=["time"], ignore_undetected_dims=False, output_type="xarray")

result[0]


As you can see in the output dataset, time has been averaged over and has been removed.

## Average over two dimensions

Averaging over two dimensions is just as simple as averaging over one. The dimensions to be averaged over should be passed in as a sequence.

In [None]:
result = average_over_dims(ds, dims=["time", "latitude"], ignore_undetected_dims=False, output_type="xarray")

result[0]

In this case both the time and latitude dimensions have been removed.
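
To make concrete what "removed" means for the shape of the data, here is a toy sketch in plain Python. It uses nested lists with made-up values rather than xarray, and it assumes a simple unweighted mean, which may not match exactly what clisops does internally.

```python
from statistics import fmean

# Made-up values laid out as data[time][latitude]: 2 time steps, 3 latitudes.
data = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
]

# Averaging over "time" collapses the outer axis: one value per latitude is left.
time_mean = [fmean(column) for column in zip(*data)]

# Averaging over "time" and "latitude" collapses both axes: a single value is left.
time_lat_mean = fmean([value for row in data for value in row])

print(time_mean)      # [2.0, 3.0, 4.0]
print(time_lat_mean)  # 3.0
```

Each averaged dimension disappears from the result, which is the same effect seen in the xarray output above.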

## Allowed dimensions

It is only possible to average over longitude, latitude, level and time. If any other dimension is requested, an error is raised.

In [None]:
try:
    average_over_dims(
        ds,
        dims=["incorrect_dim"],
        ignore_undetected_dims=False,
        output_type="xarray",
    )
except InvalidParameterValue as exc:
    print(exc)
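
The check described above can be pictured as a membership test against a fixed set of names. The following is a minimal sketch in plain Python, not the clisops implementation: `ALLOWED_DIMS` and `validate_dims` are hypothetical names introduced for illustration, and a plain `ValueError` stands in for `InvalidParameterValue`.

```python
# Hypothetical illustration only: not the clisops implementation.
ALLOWED_DIMS = ("time", "level", "latitude", "longitude")


def validate_dims(dims):
    """Return dims as a tuple, raising ValueError on any unsupported name."""
    if dims is None:
        # No dimensions requested: nothing to validate.
        return ()
    unsupported = [d for d in dims if d not in ALLOWED_DIMS]
    if unsupported:
        raise ValueError(
            f"Unsupported dimensions {unsupported}; choose from {list(ALLOWED_DIMS)}"
        )
    return tuple(dims)


print(validate_dims(["time", "latitude"]))

try:
    validate_dims(["incorrect_dim"])
except ValueError as exc:
    print(exc)
```

Validating the names up front means a typo in `dims` fails early with a readable message instead of surfacing later during the computation.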

## Dimensions not found

When a dimension has been selected for averaging but does not exist in the dataset, there are two options.

1. To raise an exception when the dimension doesn't exist, set `ignore_undetected_dims = False`

In [None]:
try:
    average_over_dims(
        ds,
        dims=["level", "time"],
        ignore_undetected_dims=False,
        output_type="xarray",
    )
except InvalidParameterValue as exc:
    print(exc)

2. To ignore when the dimension doesn't exist, and average over any other requested dimensions anyway, set `ignore_undetected_dims = True`

In [None]:
result = average_over_dims(
    ds,
    dims=["level", "time"],
    ignore_undetected_dims=True,
    output_type="xarray",
)
result[0]

In the case above, a level dimension did not exist, but this was ignored and time was averaged over anyway.
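
The two behaviours can be summarised with a small sketch in plain Python. This is an illustration under stated assumptions, not the clisops implementation: `select_dims` and the `available` set are hypothetical, and `ValueError` stands in for `InvalidParameterValue`.

```python
# Hypothetical illustration only: not the clisops implementation.
def select_dims(requested, available, ignore_undetected_dims=False):
    """Return the requested dimensions that are present in `available`."""
    missing = [d for d in requested if d not in available]
    if missing and not ignore_undetected_dims:
        raise ValueError(f"Requested dimensions not found in dataset: {missing}")
    # Keep the caller's ordering, dropping anything the dataset does not have.
    return [d for d in requested if d in available]


available = {"time", "latitude", "longitude"}  # this dataset has no "level"

# Option 2: the missing "level" is skipped and "time" is still averaged over.
print(select_dims(["level", "time"], available, ignore_undetected_dims=True))  # ['time']

# Option 1: the missing "level" raises an error.
try:
    select_dims(["level", "time"], available, ignore_undetected_dims=False)
except ValueError as exc:
    print(exc)
```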

## No dimensions supplied

If no dimensions are supplied, no averaging will be applied and the original dataset will be returned.

In [None]:
result = average_over_dims(
    ds,
    dims=None,
    ignore_undetected_dims=False,
    output_type="xarray",
)

result[0]

## An example of averaging over level

In [None]:
print("Original dataset")
print(xr.open_dataset(o3_file, use_cftime=True))

result = average_over_dims(
    o3_file,
    dims=["level"],
    ignore_undetected_dims=False,
    output_type="xarray",
)


print("Averaged dataset")
result[0]

In the example above, the `plev` dimension has been averaged over and removed.