In [1]:
import os

from daops.ops.subset import subset

# remove previously created example file
if os.path.exists("./output_001.nc"):
    os.remove("./output_001.nc")

## Subset

Daops has a subsetting operation that calls ``clisops.ops.subset.subset`` from the ``clisops`` library. 

Before calling the subset operation, ``daops`` looks up a database of known fixes. If any fixes exist for the requested dataset, the data is loaded and fixed using the ``xarray`` library, and the subsetting operation is then carried out by ``clisops``.
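The order of operations described above (look up fixes, apply them, then subset) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the daops implementation: the in-memory `FIX_DB` dictionary stands in for the Elasticsearch index, a "dataset" is just a list of `(date, value)` pairs rather than an xarray dataset, and `scale_values` and `fix_then_subset` are hypothetical names.

```python
from datetime import date
from typing import Callable, Dict, List, Tuple

# A "dataset" here is a list of (time, value) pairs; daops itself works on
# xarray datasets, which this sketch deliberately avoids.
Record = Tuple[date, float]
Fix = Callable[[List[Record]], List[Record]]


def scale_values(factor: float) -> Fix:
    """Build a hypothetical fix that multiplies every value by ``factor``."""
    def fix(records: List[Record]) -> List[Record]:
        return [(t, v * factor) for t, v in records]
    return fix


# Hypothetical fix database keyed by dataset id
# (daops consults an Elasticsearch index instead).
FIX_DB: Dict[str, List[Fix]] = {
    "example/needs-fix/*.nc": [scale_values(0.5)],
}


def fix_then_subset(dataset_id: str, records: List[Record],
                    start: date, end: date) -> List[Record]:
    # 1. Look up known fixes for this dataset id (there may be none).
    fixes = FIX_DB.get(dataset_id, [])
    # 2. Apply each fix in turn to the loaded data.
    for fix in fixes:
        records = fix(records)
    # 3. Subset on time with inclusive bounds (the step clisops performs).
    return [(t, v) for t, v in records if start <= t <= end]


data = [(date(2006, 1, 16), 10.0), (date(2014, 1, 16), 20.0)]
subset_fixed = fix_then_subset("example/needs-fix/*.nc", data,
                               date(1955, 1, 1), date(2013, 12, 30))
print(subset_fixed)
```

A dataset id with no entry in `FIX_DB` skips step 2 and is subset unchanged.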

### Results of subset and applying a fix

The results of the subsetting operation in daops are held in an ordered dictionary (accessed below via ``result._results``) that maps each input dataset ID to a list of outputs in the chosen format (xarray datasets, netCDF file paths or zarr file paths).

The example below uses a dataset that requires a fix, so the Elasticsearch index is consulted.

It also demonstrates how to inspect the results of the operation.

In [2]:
# An example of subsetting a dataset that requires a fix - the elasticsearch index is consulted.

ds = "badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/*.nc"
result = subset(
    ds,
    time=("1955-01-01T00:00:00", "2013-12-30T00:00:00"),
    output_dir=None,
    output_type="xarray",
)

result._results

2020-12-16 12:20:42,474 - /srv/conda/envs/notebook/lib/python3.7/site-packages/daops/utils/consolidate.py - INFO - Testing 1 files in time range: ...
2020-12-16 12:20:42,507 - /srv/conda/envs/notebook/lib/python3.7/site-packages/daops/utils/consolidate.py - INFO - File 0: badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/zostoga_Omon_inmcm4_rcp45_r1i1p1_200601-210012.nc
2020-12-16 12:20:42,766 - /srv/conda/envs/notebook/lib/python3.7/site-packages/daops/utils/consolidate.py - INFO - Kept 1 files
2020-12-16 12:20:42,767 - /srv/conda/envs/notebook/lib/python3.7/site-packages/daops/utils/normalise.py - INFO - Working on datasets: OrderedDict([('badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/*.nc', ['badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/zostoga_Omon_inmcm4_rcp45_r1i1p1_200601-210012.nc'])])
2020-12-16 12:20:43,550 - elasticsearch - INFO - GET https://elasticsearch.ceda.ac.u



OrderedDict([('badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/*.nc',
              [<xarray.Dataset>
               Dimensions:    (bnds: 2, time: 96)
               Coordinates:
                   lev        float64 0.0
                 * time       (time) object 2006-01-16 12:00:00 ... 2013-12-16 12:00:00
               Dimensions without coordinates: bnds
               Data variables:
                   lev_bnds   (bnds) float64 dask.array<chunksize=(2,), meta=np.ndarray>
                   time_bnds  (time, bnds) object dask.array<chunksize=(96, 2), meta=np.ndarray>
                   zostoga    (time) float32 dask.array<chunksize=(96,), meta=np.ndarray>
               Attributes:
                   institution:            INM (Institute for Numerical Mathematics,  Moscow...
                   institute_id:           INM
                   experiment_id:          rcp45
                   source:                 inmcm4 (2009)
                   

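The shape of these results (input dataset ID mapped to a list of outputs, in processing order) can be mimicked with a small container. This is a hypothetical sketch, not the daops results class: `SketchResults` and `add` are invented names, while `_results` and `file_uris` only mirror the attribute names used in this notebook. Here `file_uris` simply flattens the string outputs (file paths) into one ordered list.

```python
from collections import OrderedDict
from typing import Any, List


class SketchResults:
    """Hypothetical stand-in for a subset results object.

    Maps each input dataset id to the list of outputs produced for it,
    in the order the datasets were added.
    """

    def __init__(self) -> None:
        self._results = OrderedDict()

    def add(self, dataset_id: str, output: Any) -> None:
        # Append an output (a dataset object or a file path) under its input id.
        self._results.setdefault(dataset_id, []).append(output)

    @property
    def file_uris(self) -> List[str]:
        # Flatten every string output (file path) into a single ordered list.
        return [
            out
            for outputs in self._results.values()
            for out in outputs
            if isinstance(out, str)
        ]


res = SketchResults()
res.add("dataset/a/*.nc", "./output_001.nc")
res.add("dataset/b/*.nc", "./output_002.nc")
res.add("dataset/b/*.nc", "./output_003.nc")
print(res._results)
print(res.file_uris)
```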
### File paths of output

When the output is written to files, the output file paths alone can be accessed from the results object through ``result.file_uris``.
This is demonstrated below.

In [3]:
# An example of subsetting a dataset that requires a fix - the elasticsearch index is consulted.

ds = "badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/*.nc"
result = subset(
    ds,
    time=("1955-01-01T00:00:00", "2013-12-30T00:00:00"),
    output_dir=".",
    output_type="netcdf",
    file_namer="simple",
)

print("output file paths = ", result.file_uris)

2020-12-16 12:20:43,627 - /srv/conda/envs/notebook/lib/python3.7/site-packages/daops/utils/consolidate.py - INFO - Testing 1 files in time range: ...
2020-12-16 12:20:43,649 - /srv/conda/envs/notebook/lib/python3.7/site-packages/daops/utils/consolidate.py - INFO - File 0: badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/zostoga_Omon_inmcm4_rcp45_r1i1p1_200601-210012.nc
2020-12-16 12:20:43,973 - /srv/conda/envs/notebook/lib/python3.7/site-packages/daops/utils/consolidate.py - INFO - Kept 1 files
2020-12-16 12:20:43,975 - /srv/conda/envs/notebook/lib/python3.7/site-packages/daops/utils/normalise.py - INFO - Working on datasets: OrderedDict([('badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/*.nc', ['badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/zostoga_Omon_inmcm4_rcp45_r1i1p1_200601-210012.nc'])])
2020-12-16 12:20:44,431 - elasticsearch - INFO - GET https://elasticsearch.ceda.ac.u


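The first cell removes `./output_001.nc` before re-running, which suggests that `file_namer="simple"` writes sequentially numbered files. The sketch below assumes exactly that pattern (`output_001.nc`, `output_002.nc`, ...); `simple_names` is a hypothetical helper, not the clisops namer.

```python
from typing import List


def simple_names(n_outputs: int, output_dir: str = ".", extension: str = "nc") -> List[str]:
    """Hypothetical sequential namer: output_001.nc, output_002.nc, ..."""
    # Zero-pad the counter to three digits and prefix the output directory.
    return [
        f"{output_dir}/output_{i:03d}.{extension}"
        for i in range(1, n_outputs + 1)
    ]


print(simple_names(2))
```

Because the names are deterministic, re-running the same request targets the same paths, which is why the notebook deletes the earlier file first.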

### Checks implemented by daops

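Daops checks that files exist in the requested time range. In the example below, no files are found for the requested range (1955 to 1990), so an exception is raised and its message is printed.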

In [4]:
ds = "/badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/*.nc"

try:
    result = subset(
        ds,
        time=("1955-01-01T00:00:00", "1990-12-30T00:00:00"),
        output_dir=None,
        output_type="xarray",
    )

except Exception as exc:
    print(exc)

2020-12-16 12:20:44,553 - /srv/conda/envs/notebook/lib/python3.7/site-packages/daops/utils/consolidate.py - INFO - Testing 0 files in time range: ...
no files to open
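A time-range check of this kind can be sketched by reading each file's coverage from its CMIP5-style filename suffix (`_YYYYMM-YYYYMM.nc`, as in the `zostoga_..._200601-210012.nc` file in the logs above) and keeping only files that overlap the request. This is a hedged illustration: it assumes the coverage can be parsed from the filename, the function names are hypothetical, and daops may determine coverage differently.

```python
import re
from typing import List, Tuple

# Assumption: coverage is encoded in a CMIP5-style filename suffix _YYYYMM-YYYYMM.nc.
DATE_RANGE = re.compile(r"_(\d{6})-(\d{6})\.nc$")


def file_coverage(path: str) -> Tuple[int, int]:
    """Return (start, end) as YYYYMM integers parsed from the filename."""
    match = DATE_RANGE.search(path)
    if match is None:
        raise ValueError(f"no date range in filename: {path}")
    return int(match.group(1)), int(match.group(2))


def iso_to_yyyymm(timestamp: str) -> int:
    """Reduce an ISO timestamp such as '1955-01-01T00:00:00' to 195501."""
    return int(timestamp[:4] + timestamp[5:7])


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    # Two closed ranges overlap when each starts no later than the other ends.
    return a[0] <= b[1] and b[0] <= a[1]


def files_in_time_range(paths: List[str], start: str, end: str) -> List[str]:
    """Keep files overlapping [start, end]; raise ValueError if none remain."""
    requested = (iso_to_yyyymm(start), iso_to_yyyymm(end))
    kept = [p for p in paths if overlaps(file_coverage(p), requested)]
    if not kept:
        raise ValueError("no files in requested time range")
    return kept


files = ["zostoga_Omon_inmcm4_rcp45_r1i1p1_200601-210012.nc"]
print(files_in_time_range(files, "1955-01-01T00:00:00", "2013-12-30T00:00:00"))
try:
    files_in_time_range(files, "1955-01-01T00:00:00", "1990-12-30T00:00:00")
except ValueError as exc:
    print(exc)
```

With this file, the 1955 to 2013 request keeps it, while the 1955 to 1990 request ends before the file's 2006 start and raises.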
