Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NetCDF and OPeNDAP complex validators #394

Merged
merged 11 commits into from
Aug 31, 2018
Merged

Conversation

huard
Copy link
Collaborator

@huard huard commented Aug 28, 2018

Overview

  • Implement a complex validator for netCDF and OPeNDAP formats.
  • Adds a new DODS format for OPeNDAP mimetypes.

Related Issue / Discussion

#152

Additional Information

Entails a dependency on netCDF (optional).

There is no change to the docs as this is not really discussed anywhere.

Contribution Agreement

(as per https://github.com/geopython/pywps/blob/master/CONTRIBUTING.rst#contributions-and-licensing)

  • I'd like to contribute [feature X|bugfix Y|docs|something else] to PyWPS. I confirm that my contributions to PyWPS will be compatible with the PyWPS license guidelines at the time of contribution.
  • I have already previously agreed to the PyWPS Contributions and Licensing Guidelines

@cehbrecht cehbrecht added this to the 4.2.0 milestone Aug 28, 2018
@coveralls
Copy link

coveralls commented Aug 28, 2018

Coverage Status

Coverage increased (+1.3%) to 74.646% when pulling f226448 on huard:nc_validator into 1356504 on geopython:master.

@cehbrecht
Copy link
Collaborator

@huard I'm testing it :)

codacy has complains about imports. One is relevant (unused exception). Maybe we can ignore the netcdf import complain ... or do it in a different way.

@cehbrecht
Copy link
Collaborator

I'm using the validation mode with Emu. When I use an invalid input for OpenDAP (or NetCDF) the exception is not handled in the validator ... it should probably catch it and say the input is invalid, like here (opendap):

        try:
            from pywps.dependencies import netCDF4 as nc
            nc.Dataset(data_input.file)
            passed = True
        except ImportError:
            passed = False
        except Exception:   # or a more specific exception
            passed = False

@huard
Copy link
Collaborator Author

huard commented Aug 29, 2018

Good point, will update the PR.

@cehbrecht
Copy link
Collaborator

I get an exception when I define the default attribute of a netCDF or OpenDAP ComplexInput with active validation (SIMPLE or STRICT):
TypeError: expected str, bytes or os.PathLike object, not NoneType

@huard
Copy link
Collaborator Author

huard commented Aug 29, 2018

Are you using default_type=SOURCE_TYPE.URL ?
Do you have an example in Emu I could run to reproduce the problem ?

@huard
Copy link
Collaborator Author

huard commented Aug 29, 2018

The default source_type is DATA in pywps, so here we need to change it to URL to avoid this error. We should think of a way to handle this more gracefully.

@huard
Copy link
Collaborator Author

huard commented Aug 29, 2018

I pushed to emu with a fix.

@huard
Copy link
Collaborator Author

huard commented Aug 29, 2018

The default handling question raises an issue I hadn't thought about. At the moment the default is set at the creation of ComplexInput, using the default source type, which is DATA. Whatever the default value, I can think of cases where the user request source type would be different from the default value. But changing the source type is not supported by IOHandler.

I see two options, 1. add support for source type changes in IOHandler, 2. defer the setting of the default value until after the user request, when we know no values were provided and we need to actually set the object to its default value. I'd tend to go for the second option to avoid triggering actions (downloads, copies) for nothing.

@cehbrecht
Copy link
Collaborator

From your explanation I also would prefer option 2.

But changing the handling of the default value looks like quite some refactoring. Also we could then skip the default_type for Inputs ... would be an API change. Should this be a separate PR?

@jachym what is your opinion on this? The default will be set when a request comes in (if necessary) to make the source_type decision (File or URL) when we know the value.

@cehbrecht
Copy link
Collaborator

@huard another thing I stumbled on. The setter for the default value does not know the URL type:

def _set_default_value(self, value, value_type):

When I add it then I run into more issues. The validator triggers a download to the workdir which does not exist ... we are at definition time, no process execution:

  File "/Users/pingu/Documents/GitHub/birdhouse/emu/emu/processes/wps_ncmeta.py", line 50, in __init__
    mode=MODE.STRICT),
  File "/Users/pingu/Documents/GitHub/birdhouse/pywps/pywps/inout/inputs.py", line 82, in __init__
    default=default, default_type=default_type)
  File "/Users/pingu/Documents/GitHub/birdhouse/pywps/pywps/inout/basic.py", line 851, in __init__
    self._set_default_value(default, default_type)
  File "/Users/pingu/Documents/GitHub/birdhouse/pywps/pywps/inout/basic.py", line 204, in _set_default_value
    self.url = value
  File "/Users/pingu/Documents/GitHub/birdhouse/pywps/pywps/inout/basic.py", line 261, in fset
    setattr(s, kls.prop, value)
  File "/Users/pingu/Documents/GitHub/birdhouse/pywps/pywps/inout/basic.py", line 416, in url
    self._check_valid()
  File "/Users/pingu/Documents/GitHub/birdhouse/pywps/pywps/inout/basic.py", line 157, in _check_valid
    _valid = validate(self, self.valid_mode)
  File "/Users/pingu/Documents/GitHub/birdhouse/pywps/pywps/validator/complexvalidator.py", line 243, in validatenetcdf
    name = data_input.file
  File "/Users/pingu/Documents/GitHub/birdhouse/pywps/pywps/inout/basic.py", line 424, in file
    self._file = self._build_file_name(href=self.url)
  File "/Users/pingu/Documents/GitHub/birdhouse/pywps/pywps/inout/basic.py", line 216, in _build_file_name
    input_file_name = os.path.join(self.workdir, file_name)
  File "/usr/local/anaconda3/envs/emu/lib/python3.6/posixpath.py", line 80, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

@cehbrecht
Copy link
Collaborator

@huard @jachym we discoverd the issues with the default value when testing the validator but they are not directly related to it. To get the validator PR in before fixing the default value could we disable default values for ComplexInput and fix this in another PR?

@huard
Copy link
Collaborator Author

huard commented Aug 30, 2018

@cehbrecht Unless I'm wrong, it's a two line change to defer the evaluation of ComplexInput. I'll push the commit so you can take a look. I've also fixed the missing URL setting which I also ran into yesterday.

@cehbrecht
Copy link
Collaborator

@huard It works :) If I understand it correctly you store the default in _default and in case of ComplexInput _set_default_value is only called when an execute request comes in.

Shall we set the default_type for ComplexInput to SOURCE_TYPE.URL by default? We mostly use URLs.

Still there are things to clean up and fix for defaults:

  • default value is not shown in DescribeProcess for ComplexInput (template, json, ...).
  • I doubt we need to explicitly set default_type ... it is either fixed (Literal, BBox) or we guess it from the value.
  • Consistent handling of default for Literal, BBox and ComplexInput (default is set twice for Literal and BBox).

Shall we open another ticket for this?

@huard
Copy link
Collaborator Author

huard commented Aug 30, 2018

Correct.

I agree it would make sense to make changes to the default_type mechanisms, but I didn't want to break anything.

I suppose it would be cleaner to make a separate PR with changes to default handling, but I can try to fix the describeprocess issue in this PR.

@huard
Copy link
Collaborator Author

huard commented Aug 30, 2018

In the json property I could add:

        if not self.data_set and self._default is not None:
                self._set_default_value()

so that default values are set before creating the json representation.

There is another issue with the json property: it accesses ComplexInput.file, which for a URL input would trigger a download. What about when the source is DATA, does it make sense to return the link to the file when the json already include the data itself ?

Also, having to return the file created from a default value requires a workdir to be defined, which is not set before execution time. This case does not seem to be exercised in test_describe.

One option would be in the json property to do:
'file': self.file if self.prop == 'file' else None
The url or data is going to be returned anyway in other fields, but it may break some external code relying on file actually providing a link to the server disk copy of the data. The question is really how much "work" is the json description allowed to trigger? My suspicion is that users except the json description to be a cheap operation with no substantive side-effects.

@cehbrecht
Copy link
Collaborator

@huard I would not trigger _set_default_value in the json method. I have checked the WPS specs and it looks like the configured default value is not expected to be shown in the DescribeProcess response, just the formats (mimetypes):

http://www.opengeospatial.org/standards/wps (wps 1.0.0, table 21).

That was wishful thinking ;)

@huard
Copy link
Collaborator Author

huard commented Aug 30, 2018

Ok, then I think we have a working version, with the caveat that a source_type clean-up remains to be done.

@cehbrecht cehbrecht self-requested a review August 30, 2018 17:36
Copy link
Collaborator

@cehbrecht cehbrecht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR looks fine and tested with Emu WPS. Remaining changes for default value handling are coming with a new PR.

@cehbrecht
Copy link
Collaborator

@jachym @huard I have opened a ticket (#395) for the issues with default values.

@cehbrecht cehbrecht merged commit 878bda0 into geopython:master Aug 31, 2018
@cehbrecht
Copy link
Collaborator

@huard checked again and merged. Thanks :)

cehbrecht added a commit that referenced this pull request Aug 31, 2018
@cehbrecht cehbrecht mentioned this pull request Oct 1, 2018
2 tasks
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants