Change Parameter capabilities to allow new subsetting options (#85)
* Updated to work with new interval and series parameters

* Updated parameters.

* Got series working for time and level

* Added TimeComponentsParameter:

- for selecting date/times by specific components, such as:
  - year: 2000, 2001
  - month: "feb", "mar"
  - day: 28, 29, 30
  - hour: 6

- it will allow subsetting/filtering in clisops/daops/rook

* Added a test

* Updated parameter parsing if class instance provided

* Updated `.asdict()` method on TimeComponentsParameter.

* Added `get_bounds()` method to TimeParameter.

* Updated import in time_parameter.py

* Linted

* Updated CollectionParameter to take FileMapper as input

* Updated: open_xr_dataset

* Updated: open_xr_dataset - now ensures time.encoding["units"] is preserved.

* Updated open_xr_dataset

* Updated arg management in: open_xr_dataset

* Updated HISTORY.rst

* Updated inspection of xr.open_dataset kwargs

* adjust error message to include FileMapper

* use AnyCalendarDateTime to parse datetimes

* update behaviour when dt is none

* extended time and level parameter for string input

* fixed import for daops

* added time parameter get_bounds tests

* update tests

* update tests

* added get_bounds for time_components_parameter

* black

* updated get_bounds (non 360 day calendar)

* update time parameter end value

* update get_bounds tests

* Updated to ensure branch `fix-time-encoding-for-mfdataset` is merged

* Updated HISTORY.rst

* linting

* make small changes to prepare for release

* update history

Co-authored-by: Eleanor Smith <esmith88@sci4.jasmin.ac.uk>
Co-authored-by: Carsten Ehbrecht <ehbrecht@dkrz.de>
Co-authored-by: MacPingu <cehbrecht@users.noreply.github.com>
Co-authored-by: ellesmith88 <e.s.smith@hotmail.co.uk>
Co-authored-by: Elle Smith <40183561+ellesmith88@users.noreply.github.com>
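
The component-based selection described in the commit message can be illustrated with a minimal, standard-library sketch. It is not the `TimeComponentsParameter` implementation, which is not shown in this part of the diff; the helper name `matches_components` and the `MONTHS` lookup are hypothetical and exist only for illustration.

```python
from datetime import datetime

# Hypothetical lookup for three-letter month names (an assumption, not the library's API).
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def matches_components(dt, year=None, month=None, day=None, hour=None):
    """Return True if every requested component of `dt` is in its allowed list."""
    month_numbers = None
    if month is not None:
        # Accept month names ("feb") or month numbers (2).
        month_numbers = {
            (MONTHS.index(m.lower()) + 1) if isinstance(m, str) else int(m)
            for m in month
        }
    checks = [
        (year, dt.year),
        (month_numbers, dt.month),
        (day, dt.day),
        (hour, dt.hour),
    ]
    # A component that was not requested (None) places no constraint.
    return all(allowed is None or actual in allowed for allowed, actual in checks)


times = [
    datetime(2000, 2, 28, 6),
    datetime(2000, 2, 28, 12),  # wrong hour
    datetime(2001, 3, 30, 6),
    datetime(2002, 2, 28, 6),   # wrong year
    datetime(2001, 4, 29, 6),   # wrong month
]
selected = [
    t for t in times
    if matches_components(
        t, year=[2000, 2001], month=["feb", "mar"], day=[28, 29, 30], hour=[6]
    )
]
```

Here `selected` keeps only the first and third entries, since each remaining entry fails exactly one component check.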
6 people committed Oct 21, 2021
1 parent 62fb87c commit 053eb57
Showing 23 changed files with 1,122 additions and 394 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/main.yml
@@ -8,7 +8,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.6, 3.7, 3.8, 3.9]
python-version: [3.7, 3.8, 3.9]
steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
@@ -23,10 +23,10 @@ jobs:
if [ -f requirements_dev.txt ]; then pip install -r requirements_dev.txt; fi
- name: Lint with flake8
run: flake8 roocs_utils tests
if: matrix.python-version == 3.6
if: matrix.python-version == 3.7
- name: Check formatting with black
run: black --check --target-version py36 roocs_utils tests
if: matrix.python-version == 3.6
if: matrix.python-version == 3.7
- name: Test with pytest
run: |
pytest -v tests
16 changes: 15 additions & 1 deletion HISTORY.rst
@@ -1,7 +1,7 @@
Version History
===============

v0.5.0 (Unreleased)
v0.5.0 (2021-10-26)
-------------------
Bug Fixes
^^^^^^^^^
@@ -11,6 +11,14 @@ Bug Fixes
Breaking Changes
^^^^^^^^^^^^^^^^
* Intake catalog maker removed, now in its own package: `roocs/catalog-maker <https://github.com/roocs/catalog-maker>`_
* Change to input parameter classes::
* Added: ``roocs_utils.parameter.time_components_parameter.TimeComponentsParameter``
* Modified input types required for classes::
* ``roocs_utils.parameter.time_parameter.TimeParameter``
* ``roocs_utils.parameter.level_parameter.LevelParameter``
* They both now require their inputs to be one of::
* ``roocs_utils.parameter.param_utils.Interval`` - to specify a range/interval
* ``roocs_utils.parameter.param_utils.Series`` - to specify a series of values

New Features
^^^^^^^^^^^^
@@ -24,6 +32,12 @@ New Features
cru_ts.4.05.{variable}:cru_ts_4.05/data/{variable}/cru_ts4.05.1901.2*.{variable}.dat.nc.gz

In this example, the `variable` parameter will be expanded out to each of the options provided in the list.
* The ``roocs_utils.xarray_utils.xarray_utils.open_xr_dataset()`` function was improved so that the time units of the first data file are preserved in ``ds.time.encoding["units"]``. A multi-file dataset now keeps the time "units" of the first file (if present). This is useful for converting to other formats (e.g. CSV).

Other Changes
^^^^^^^^^^^^^
* Python 3.6 no longer tested in GitHub actions.


v0.4.2 (2021-05-18)
-------------------
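
The "keep the time units of the first file (if present)" behaviour noted in the history entry above can be sketched as follows. This is an illustration only: plain dictionaries stand in for the per-file `time.encoding` mappings, and the helper name `first_time_units` is hypothetical. The actual `open_xr_dataset()` changes are not shown in this part of the diff.

```python
def first_time_units(file_encodings):
    """Return the "units" entry of the first file's time encoding, or None."""
    if not file_encodings:
        return None
    return file_encodings[0].get("units")


# Stand-ins for the time encodings of two files in a multi-file dataset.
per_file = [
    {"units": "days since 1850-01-01", "calendar": "360_day"},
    {"units": "days since 1860-01-01", "calendar": "360_day"},
]

# Stand-in for the merged dataset's time encoding, which may lack "units".
merged_encoding = {"calendar": "360_day"}

units = first_time_units(per_file)
if units is not None:
    # Only set the key when the first file actually provides units.
    merged_encoding["units"] = units
```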
12 changes: 12 additions & 0 deletions docs/api.rst
@@ -35,12 +35,24 @@ Parameters
:undoc-members:
:show-inheritance:

.. automodule:: roocs_utils.parameter.time_components_parameter
:noindex:
:members:
:undoc-members:
:show-inheritance:

.. automodule:: roocs_utils.parameter.dimension_parameter
:noindex:
:members:
:undoc-members:
:show-inheritance:

.. automodule:: roocs_utils.parameter.param_utils
:noindex:
:members:
:undoc-members:
:show-inheritance:

.. automodule:: roocs_utils.parameter.parameterise
:noindex:
:members:
53 changes: 24 additions & 29 deletions roocs_utils/parameter/area_parameter.py
@@ -1,5 +1,8 @@
from collections.abc import Sequence

from roocs_utils.exceptions import InvalidParameterValue
from roocs_utils.parameter.base_parameter import _BaseParameter
from roocs_utils.parameter.param_utils import area, to_float, parse_sequence


class AreaParameter(_BaseParameter):
@@ -17,46 +20,38 @@ class AreaParameter(_BaseParameter):
"""

parse_method = "_parse_sequence"
allowed_input_types = [Sequence, str, area, type(None)]

def _validate(self):
def _parse(self):
if isinstance(self.input, type(None)) or self.input == "":
return None

if self._result is not None and len(self._result) != 4:
raise InvalidParameterValue(
f"{self.__class__.__name__} should be of length 4 but is of length "
f"{len(self._result)}"
)
self._parse_values()
if isinstance(self.input, (str, bytes)):
value = parse_sequence(self.input, caller=self.__class__.__name__)

def _parse_values(self):
if self._result is None:
return self._result
elif isinstance(self.input, Sequence):
value = self.input

area = []
for value in self._result:
if isinstance(value, str):
if not value.replace(".", "", 1).strip("-").isdigit():
raise InvalidParameterValue("Area values must be a number")
else:
if not (isinstance(value, float) or isinstance(value, int)):
raise InvalidParameterValue("Area values must be a number")
elif isinstance(self.input, area):
value = self.input.value

area.append(float(value))
self.type = "series"

return tuple(area)
if value is not None and len(value) != 4:
raise InvalidParameterValue(
f"{self.__class__.__name__} should be of length 4 but is of length "
f"{len(value)}"
)

@property
def tuple(self):
"""Returns a tuple of the area values"""
return self._parse_values()
return tuple([to_float(i, allow_none=False) for i in value])

def asdict(self):
"""Returns a dictionary of the area values"""
if self.tuple is not None:
if self.value is not None:
return {
"lon_bnds": (self.tuple[0], self.tuple[2]),
"lat_bnds": (self.tuple[1], self.tuple[3]),
"lon_bnds": (self.value[0], self.value[2]),
"lat_bnds": (self.value[1], self.value[3]),
}

def __str__(self):
return f"Area to subset over:" f"\n {self.tuple}"
return f"Area to subset over:" f"\n {self.value}"
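
The reworked area parsing above (four values, each converted to a float, then exposed as longitude and latitude bounds) can be approximated with a self-contained sketch. The functions `parse_area` and `area_as_dict` are hypothetical stand-ins: `ValueError` replaces `InvalidParameterValue`, and the comma splitting and float conversion are assumptions about what `parse_sequence` and `to_float` do, since their bodies are not part of this diff.

```python
def parse_area(value):
    """Parse an area given as "w,s,e,n" or as a 4-item sequence into a tuple of floats."""
    if value is None or value == "":
        return None

    if isinstance(value, str):
        # Assumed behaviour: comma-separated values with surrounding whitespace removed.
        items = [x.strip() for x in value.split(",")]
    else:
        items = list(value)

    if len(items) != 4:
        raise ValueError(f"area should be of length 4 but is of length {len(items)}")

    try:
        return tuple(float(i) for i in items)
    except (TypeError, ValueError):
        raise ValueError("Area values must be a number") from None


def area_as_dict(area):
    """Mirror the asdict() layout in the diff: items 0 and 2 are longitudes, 1 and 3 latitudes."""
    if area is None:
        return None
    return {
        "lon_bnds": (area[0], area[2]),
        "lat_bnds": (area[1], area[3]),
    }


bounds = area_as_dict(parse_area("0.,49.,10.,65"))
```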
131 changes: 65 additions & 66 deletions roocs_utils/parameter/base_parameter.py
@@ -1,94 +1,93 @@
from collections.abc import Sequence
from pydoc import locate

from roocs_utils.exceptions import InvalidParameterValue
from roocs_utils.exceptions import MissingParameterValue
from roocs_utils.utils.file_utils import FileMapper
from roocs_utils.parameter.param_utils import interval, series


class _BaseParameter(object):
"""
Base class for parameters used in operations (e.g. subset, average etc.)
"""

parser_method = "UNDEFINED"
allowed_input_types = None

def __init__(self, input):
self.input = input
self._result = self._parse()
self._validate()

def _validate(self):
raise NotImplementedError

@property
def raw(self):
return self.input

def _parse(self):
self.input = self.raw = input

# If the input is already an instance of this class, call its parse method
if isinstance(self.input, self.__class__):
return self.input._parse()

self.value = self.input.value
self.type = getattr(self.input, "type", "undefined")
else:
return getattr(self, self.parse_method)()

def _parse_range(self):
if self.input in ("/", None, ""):
start = None
end = None

elif isinstance(self.input, str):
if "/" not in self.input:
raise InvalidParameterValue(
f"{self.__class__.__name__} should be passed in as a range separated by /"
)
self._check_input_type()
self.value = self._parse()

# empty string either side of '/' is converted to None
start, end = [x.strip() or None for x in self.input.split("/")]
def _check_input_type(self):
if not self.allowed_input_types:
return
if not isinstance(self.input, tuple(self.allowed_input_types)):
raise InvalidParameterValue(
f"Input type of {type(self.input)} not allowed. "
f"Must be one of: {self.allowed_input_types}"
)

elif isinstance(self.input, Sequence):
if len(self.input) != 2:
raise InvalidParameterValue(
f"{self.__class__.__name__} should be a range. Expected 2 values, "
f"received {len(self.input)}"
)
def _parse(self):
raise NotImplementedError

start, end = self.input
def get_bounds(self):
"""Returns a tuple of the (start, end) times, calculated from
the value of the parameter. Either will default to None."""
raise NotImplementedError

else:
raise InvalidParameterValue(
f"{self.__class__.__name__} is not in an accepted format"
)
return start, end
def __str__(self):
raise NotImplementedError

def _parse_sequence(self):
def __repr__(self):
return str(self)

if self.input in (None, ""):
sequence = None
def __unicode__(self):
return str(self)

# check str or bytes
elif isinstance(self.input, (str, bytes)):
sequence = [x.strip() for x in self.input.split(",")]

elif isinstance(self.input, FileMapper):
return [self.input]
class _BaseIntervalOrSeriesParameter(_BaseParameter):
"""
A base class for a parameter that can be instantiated from either an
`Interval` or `Series` class instance. It has a `type` and a `value`
reflecting the type. E.g.:
type: "interval" --> value: (start, end)
type: "series" --> value: [item1, item2, ..., item_n]
"""

elif isinstance(self.input, Sequence):
sequence = self.input
allowed_input_types = [interval, series, type(None), type("")]

else:
raise InvalidParameterValue(
f"{self.__class__.__name__} is not in an accepted format"
)
def _parse(self):

return sequence
if isinstance(self.input, interval):
self.type = "interval"
return self._parse_as_interval()
elif isinstance(self.input, series):
self.type = "series"
return self._parse_as_series()
elif isinstance(self.input, type(None)):
self.type = "none"
return None
elif isinstance(self.input, type("")):
if "/" in self.input:
self.type = "interval"
self.input = interval(self.input)
return self._parse_as_interval()
else:
self.type = "series"
self.input = series(self.input)
return self._parse_as_series()

def _parse_as_interval(self):
raise NotImplementedError

def __str__(self):
def _parse_as_series(self):
raise NotImplementedError

def __repr__(self):
return str(self)
def _value_as_tuple(self):
value = self.value
if value is None:
value = None, None

def __unicode__(self):
return str(self)
return value
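
The string dispatch in `_BaseIntervalOrSeriesParameter._parse()` (a "/" means an interval, anything else is treated as a series) can be shown with a small stand-alone sketch. The function `classify` is hypothetical, and the splitting rules are assumptions: "/" handling follows the removed `_parse_range` code, and comma splitting follows the removed `_parse_sequence` code. The real `interval` and `series` helpers in `param_utils` are not shown in this diff.

```python
def classify(value):
    """Return a (type, value) pair for a string or None input."""
    if value is None:
        return "none", None

    if "/" in value:
        # An empty string on either side of "/" becomes None (an open-ended interval).
        # More than one "/" makes the unpacking fail with a ValueError.
        start, end = [x.strip() or None for x in value.split("/")]
        return "interval", (start, end)

    # No "/" present: treat the input as a comma-separated series.
    return "series", [x.strip() for x in value.split(",")]


kind, parsed = classify("2000-01-01/2010-12-31")
```

With this input, `kind` is "interval" and `parsed` is the (start, end) pair, matching the `type` and `value` layout described in the class docstring.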
31 changes: 16 additions & 15 deletions roocs_utils/parameter/collection_parameter.py
@@ -1,7 +1,10 @@
from collections.abc import Sequence

from roocs_utils.exceptions import InvalidParameterValue
from roocs_utils.exceptions import MissingParameterValue
from roocs_utils.parameter.base_parameter import _BaseParameter
from roocs_utils.utils.file_utils import FileMapper
from roocs_utils.parameter.param_utils import collection, parse_sequence


class CollectionParameter(_BaseParameter):
@@ -17,30 +20,28 @@ class CollectionParameter(_BaseParameter):
"""

parse_method = "_parse_sequence"
allowed_input_types = [Sequence, str, collection, FileMapper]

def _validate(self):
if self._result is None:
raise MissingParameterValue(f"{self.__class__.__name__} must be provided")
def _parse(self):
classname = self.__class__.__name__

self._parse_items()
if self.input in (None, ""):
raise MissingParameterValue(f"{classname} must be provided")
elif isinstance(self.input, collection):
value = self.input.value
else:
value = parse_sequence(self.input, caller=classname)

def _parse_items(self):
for value in self._result:
if not (isinstance(value, str) or isinstance(value, FileMapper)):
for item in value:
if not isinstance(item, (str, FileMapper)):
raise InvalidParameterValue(
f"Each id in a collection must be a string or an instance of {FileMapper}"
)

return tuple(self._result)

@property
def tuple(self):
"""Returns a tuple of the collection items"""
return self._parse_items()
return tuple(value)

def __str__(self):
string = "Datasets to analyse:"
for i in self.tuple:
for i in self.value:
string += f"\n{i}"
return string
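
The collection parsing flow above (reject empty input, split a string into ids, then check each item) can be sketched without the library. The function `parse_collection` is a hypothetical stand-in: it uses built-in exceptions instead of `MissingParameterValue` and `InvalidParameterValue`, assumes comma-separated ids, and leaves out `FileMapper` support so that it runs on its own.

```python
from collections.abc import Sequence


def parse_collection(value):
    """Parse a collection given as a comma-separated string or a sequence of ids."""
    if value is None or value == "":
        raise ValueError("a collection must be provided")

    if isinstance(value, str):
        # Assumed behaviour: comma-separated ids with surrounding whitespace removed.
        items = [x.strip() for x in value.split(",")]
    elif isinstance(value, Sequence):
        items = list(value)
    else:
        raise TypeError(f"Input type of {type(value)} not allowed")

    for item in items:
        # The real class also accepts FileMapper instances; this sketch only accepts strings.
        if not isinstance(item, str):
            raise ValueError("Each id in a collection must be a string")

    return tuple(items)


ids = parse_collection("cmip5.output1.a, cmip5.output1.b")
```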
