# Subsetting

The Python package [clisops](https://github.com/roocs/clisops) provides operators such as `subset` and `regrid` for climate model data (CMIP5, CMIP6, CORDEX). It is built on [xarray](http://xarray.pydata.org/en/stable/).

You can install it with *conda*:
```
conda install -c conda-forge clisops
```

This example shows the usage of the `subset` operator with `time` and `area` parameters.

## Initialize clisops

In [None]:
from clisops.ops.subset import subset
import xarray as xr

In [None]:
# Optional: turn off warnings to keep the notebook output readable
import warnings
warnings.simplefilter("ignore")

## Get CMIP6 data for testing

In [None]:
# Download from Copernicus data node
!wget -N https://data.mips.copernicus-climate.eu/thredds/fileServer/esg_c3s-cmip6/CMIP/MPI-M/MPI-ESM1-2-HR/historical/r1i1p1f1/Amon/tas/gn/v20190710/tas_Amon_MPI-ESM1-2-HR_historical_r1i1p1f1_gn_201001-201412.nc

In [None]:
ds = xr.open_mfdataset('tas_Amon_MPI-ESM1-2-HR_historical_r1i1p1f1_gn_201001-201412.nc', use_cftime=True, combine="by_coords")

## Subset by time and area

Return the result as a list of `xarray` datasets by setting `output_type="xarray"`.

In [None]:
outputs = subset(
    ds=ds,
    time="2010-01-01T00:00:00/2010-12-31T00:00:00",
    area=(-40, -40, 70, 70),
    output_type="xarray",
)

print(f"There is only {len(outputs)} output.")
outputs[0]
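
The `time` argument above is a single string with a start and an end timestamp separated by a slash. As a rough illustration of that format only (this is not how clisops parses it internally), the sketch below splits such a string with the standard library. The helper name `parse_time_interval` is hypothetical, and plain `datetime` assumes a standard Gregorian calendar, whereas the dataset above was opened with `use_cftime=True`.

```python
from datetime import datetime
from typing import Tuple


def parse_time_interval(interval: str) -> Tuple[datetime, datetime]:
    """Split a 'start/end' string into two datetime objects.

    Hypothetical helper for illustration; not part of clisops.
    """
    start_text, end_text = interval.split("/")
    start = datetime.fromisoformat(start_text)
    end = datetime.fromisoformat(end_text)
    if start > end:
        raise ValueError("start must not be later than end")
    return start, end


start, end = parse_time_interval("2010-01-01T00:00:00/2010-12-31T00:00:00")
print(start.isoformat(), end.isoformat())
```

Because the data in this example is monthly (`Amon`), an interval covering the year 2010 would be expected to select twelve time steps.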

In [None]:
outputs[0].tas.isel(time=0).plot()
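
The plot covers the bounding box passed as `area`. The sketch below shows what such a box means for a single point, assuming the tuple is ordered as `(lon_min, lat_min, lon_max, lat_max)`; check the clisops documentation for the authoritative ordering. The helper `in_bbox` is hypothetical and does not handle longitude wraparound or datasets that use a 0 to 360 longitude convention.

```python
from typing import Tuple

# Assumed ordering: (lon_min, lat_min, lon_max, lat_max), in degrees.
BBox = Tuple[float, float, float, float]


def in_bbox(lon: float, lat: float, bbox: BBox) -> bool:
    """Return True if the point lies inside the box (edges included).

    Hypothetical helper for illustration; not part of clisops.
    """
    lon_min, lat_min, lon_max, lat_max = bbox
    return lon_min <= lon <= lon_max and lat_min <= lat <= lat_max


area = (-40, -40, 70, 70)
print(in_bbox(10.0, 53.5, area))   # True: inside the box
print(in_bbox(100.0, 0.0, area))   # False: longitude is east of the box
```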

## Subset: Output to netCDF with standard namer

There is only one output file because the subset is smaller than the memory limit (1 GB), so it does not need to be split.
This example uses the standard namer, which names output files according to the input file and how it has been subset.

In [None]:
outputs = subset(
    ds=ds,
    time="2010-01-01T00:00:00/2010-12-31T00:00:00",
    area=(-40, -40, 70, 70),
    output_type="nc",
    # output_dir=".",
    # split_method="time:auto",
    file_namer="standard",
)
outputs
outputs

In [None]:
# Open the netCDF file(s) listed in `outputs`
subset_ds = xr.open_mfdataset(outputs)
subset_ds

In [None]:
subset_ds.tas.isel(time=0).plot()