How to use OpenDAP with PyWPS? #152

Closed
cehbrecht opened this issue Aug 10, 2016 · 22 comments

@cehbrecht
Collaborator

I'm using PyWPS to process climate data. A major data format used here is NetCDF. In PyWPS one can use a complex type with mime-type application/x-netcdf to define input and output parameters for NetCDF files.

Another important way to access NetCDF data is OpenDAP, a data service that gives access to parts of a remote NetCDF file. The OGC WPS Best Practice paper recommends the mime-type application/x-ogc-dods for OpenDAP. If one provides an OpenDAP URL in a complex type, PyWPS tries to download it. This is not the right behaviour: it should just pass the URL on to the process. I could add a patch to fix this behaviour for OpenDAP mime-types.

A workaround would be to use a string parameter for the OpenDAP URL. But then you lose the descriptive mime-type (which is used by generic WPS clients).

The Python netCDF4 library can handle both NetCDF files and OpenDAP URLs with the same initialisation code (Dataset(file_or_url)), so a process would not need separate input parameters for OpenDAP and NetCDF.

Any recommendations on this topic?

Cheers,
Carsten

@jachym
Member

jachym commented Aug 10, 2016

Hi Carsten,

PyWPS tries to handle the data inputs for you and to make sure you are downloading the right data and not too much of it, because you are dealing with an untrusted source. We try not to download data bigger than the configured limit (server -> maxsingleinputsize), and then validation can be applied (pywps.inout.formats.Format -> validate and mode attributes).
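
To illustrate the size guard described above, here is a minimal sketch of a download that aborts once a limit is exceeded. It is not the PyWPS implementation; `copy_with_limit` and `InputTooLarge` are hypothetical names, and the limit is passed in directly instead of being read from the `maxsingleinputsize` setting.

```python
import io


class InputTooLarge(ValueError):
    """Raised when an input exceeds the allowed size (hypothetical name)."""


def copy_with_limit(source, target, max_bytes, chunk_size=64 * 1024):
    """Copy source to target in chunks, aborting once max_bytes is exceeded.

    source and target are binary file-like objects. Streaming in chunks means
    an oversized input is rejected without being read completely.
    """
    copied = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return copied
        copied += len(chunk)
        if copied > max_bytes:
            raise InputTooLarge("input exceeds {} bytes".format(max_bytes))
        target.write(chunk)


# Usage with in-memory streams standing in for an HTTP response and a file.
buffer = io.BytesIO()
size = copy_with_limit(io.BytesIO(b"0123456789"), buffer, max_bytes=10)
```

An OpenDAP URL has no single size to check up front, which is one reason to pass it through instead of downloading it.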

Having said that, I see your needs and it makes sense to me to address them. With services like FME around, I can also see that it makes sense to enable the use of such services and leave the input data handling and validation to them, as trusted third-party services.

Well, I think we would need a more comprehensive approach to this topic. Do you see any possible solution in
https://github.com/geopython/pywps/blob/master/pywps/app/Service.py#L393 where the Reference data are downloaded?

@cehbrecht
Collaborator Author

@jachym
I will have a look at it. I'm still on PyWPS 3.x, so maybe I will try it there first and hopefully switch to PyWPS 4.x soon.

@cehbrecht cehbrecht added this to the 4.2.0 milestone Mar 22, 2018
@cehbrecht cehbrecht self-assigned this Mar 22, 2018
@cehbrecht
Collaborator Author

We talked about this at the OGC coding sprint in Bonn. I will make a small modification to the PyWPS 4.x code to make it possible to provide a netCDF file and an OpenDAP URL with the same ComplexInput parameter. netCDF and OpenDAP will have different mimetypes. Depending on the mimetype, the PyWPS code can decide whether to download the file (netCDF) or just pass the URL reference through (OpenDAP) to the process.
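
A minimal sketch of that decision, assuming the two mimetypes mentioned in this thread. `resolve_reference` and the `download` callable are hypothetical names for illustration and are not part of PyWPS.

```python
NETCDF_MIME = "application/x-netcdf"
DODS_MIME = "application/x-ogc-dods"


def resolve_reference(url, mime_type, download):
    """Return what the process should open for a referenced input.

    download is a callable that fetches url and returns a local path.
    It is only called for netCDF references, never for OpenDAP ones.
    """
    if mime_type == DODS_MIME:
        # OpenDAP: hand the URL to the process untouched.
        return url
    if mime_type == NETCDF_MIME:
        # Plain netCDF on a file server: fetch a local copy first.
        return download(url)
    raise ValueError("unsupported mimetype: {!r}".format(mime_type))


# Usage with a stand-in downloader that only records what it was asked for.
fetched = []


def fake_download(url):
    fetched.append(url)
    return "/tmp/workdir/data.nc"


dap = resolve_reference("http://example.org/dodsC/data.nc", DODS_MIME, fake_download)
local = resolve_reference("http://example.org/files/data.nc", NETCDF_MIME, fake_download)
```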

@cehbrecht
Collaborator Author

See ticket bird-house/flyingpigeon#231.

It references a code snippet to check if a URL is OpenDAP:
https://github.com/bird-house/hummingbird/blob/6d5ea96641abf49eae3a7a4d126425e5396ff283/hummingbird/processes/wps_cfchecker.py#L47

Maybe this code example could be used for a PyWPS input validator.
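
One possible shape for such a check, as a sketch only. It assumes the server answers with a Content-Description header starting with "dods", which DAP servers commonly send but not all do, so a negative result is not conclusive. The function names are illustrative and this is not necessarily what the linked snippet does.

```python
from urllib.request import Request, urlopen


def looks_like_opendap(headers):
    """Return True if response headers carry a DAP-style Content-Description.

    headers is any mapping of header names to values; names are compared
    case-insensitively because HTTP header names are case-insensitive.
    """
    for name, value in headers.items():
        if name.lower() == "content-description":
            return value.strip().lower().startswith("dods")
    return False


def is_opendap_url(url, timeout=10):
    """Send a HEAD request and inspect the response headers (needs network)."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return looks_like_opendap(dict(response.headers.items()))
```

Keeping the header test separate from the network call makes it usable inside a validator that already has the response at hand.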

@ldesousa
Contributor

@cehbrecht That looks like a very reasonable approach, as discussed in Bonn.

@cehbrecht
Collaborator Author

@huard PR #387 introduces the url_handler which we need for OpenDAP support. I have written a process which uses this url_handler for a ComplexInput parameter with OpenDAP mimetype:
https://github.com/bird-house/emu/blob/61a151dc8ec7dfb4d71da548e1607ff6b4152a2d/emu/processes/wps_ncmeta.py#L27

Currently the implementation (in the PR) uses the url_handler both for OpenDAP and other HTTP URLs (netCDF from a file server). If the url_handler recognizes the mimetype (validation mode), it could download the netCDF file from the file server and provide a local file URL (or is this the file_handler then?). In the case of OpenDAP it just skips the download.

The reasoning for this is to have the same ComplexInput parameter for both NetCDF and OpenDAP, with pywps handling each according to its mimetype. In my process code I can then just use:

inpt = request.inputs['dataset'][0]   # opendap or netcdf
ds = Dataset(inpt.url)

Currently I have to use:

ds = Dataset(inpt.url)  # opendap
ds = Dataset(inpt.file) # netcdf

@huard
Collaborator

huard commented Aug 22, 2018

You're right. Will think about it.

@huard
Collaborator

huard commented Aug 22, 2018

I've thought about this, and here is the problem I see. The convention I've used is that accessing url does not trigger a download, it's just a link. So if we download the file locally for certain URLs and not for other types of files, I think we're creating confusion. Same thing for file: the idea is that accessing file creates a local copy of the link provided by the request. Accessing the file property of an OpenDAP ComplexInput downloads the full file.

My guess is that it would be clearer to write something like the following

ds = Dataset(inpt.url if inpt.data_format.mime_type == 'application/x-ogc-dods' else inpt.file)

Another issue is that if an input has more than one supported format (netCDF and DODS), and the user does not specify the mimetype, pywps picks the first one by default. I don't know how to pass the mimetype using key-value pairs, is that even possible?

@cehbrecht
Collaborator Author

@huard

Another issue is that if an input has more than one supported format (netCDF and DODS), and the user does not specify the mimetype, pywps picks the first one by default. I don't know how to pass the mimetype using key-value pairs, is that even possible?

When we enable validation for inputs, the NetCDF/OpenDAP validator (bird-house#1) can check the mimetype of the input data. The WPS protocol also allows the mimetype to be set explicitly, but this is optional and nothing we can rely on (relevant for the validator only?):
http://geoprocessing.info/wpsdoc/1x0ExecuteGETInputs

But relying on pywps to figure out the right mimetype is error-prone.
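
For reference, the WPS 1.0.0 key-value encoding attaches attributes to an input with `@` separators, so a client can state the mimetype next to the reference. The sketch below only builds such a request URL. The endpoint is hypothetical, the process and input identifiers are taken from this thread, and whether a given server honours the attribute is an assumption to verify.

```python
from urllib.parse import urlencode


def build_execute_url(endpoint, process_id, input_id, href, mime_type):
    """Build a WPS 1.0.0 KVP Execute URL with an explicit input mimetype."""
    # Attributes of an input follow its identifier, each introduced by '@'.
    data_inputs = "{}=@xlink:href={}@mimeType={}".format(input_id, href, mime_type)
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": process_id,
        "DataInputs": data_inputs,  # percent-encoded by urlencode
    })
    return endpoint + "?" + query


url = build_execute_url(
    "http://localhost:5000/wps",            # hypothetical endpoint
    "ncmeta",
    "dataset",
    "http://example.org/dodsC/data.nc",     # hypothetical OpenDAP URL
    "application/x-ogc-dods",
)
```

A reference URL that itself contains `@` or `;` would need extra care, since those characters act as separators in this encoding.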

@cehbrecht
Collaborator Author

@huard

Well, I thought the following would work:

ds = Dataset('file:///path/to/my/local/data.nc')

But the Dataset does not like file URLs.
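
Until the library accepts them, a file URL can be turned into a plain path before it is handed to Dataset. A minimal sketch for POSIX-style paths; `local_path_or_url` is a hypothetical helper name.

```python
from urllib.parse import unquote, urlparse


def local_path_or_url(location):
    """Map file:// URLs to plain paths and leave everything else untouched."""
    parsed = urlparse(location)
    if parsed.scheme == "file":
        # Decode percent-escapes such as %20 back into literal characters.
        return unquote(parsed.path)
    return location


path = local_path_or_url("file:///path/to/my/local/data.nc")
# path can now be passed to Dataset(...) like any other local file name.
```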

My guess is that it would be clearer to write something like the following

ds = Dataset(inpt.url if inpt.data_format.mime_type == 'application/x-ogc-dods' else inpt.file)

Maybe we can add a util method:

ds = Dataset(util.netcdf_or_opendap(inpt))
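
A sketch of what such a util method could look like, following the one-liner quoted above. It returns the location to open instead of a Dataset so that it has no netCDF4 dependency. The attribute names url, file and data_format.mime_type are the ones used in this thread; the SimpleNamespace objects are stand-ins for real ComplexInput instances.

```python
from types import SimpleNamespace

DODS_MIME = "application/x-ogc-dods"


def netcdf_or_opendap(inpt):
    """Return the URL for OpenDAP inputs and the local file path otherwise."""
    if inpt.data_format.mime_type == DODS_MIME:
        return inpt.url
    return inpt.file


# Stand-in inputs with hypothetical locations.
opendap_input = SimpleNamespace(
    url="http://example.org/dodsC/data.nc",
    file="/tmp/workdir/data.nc",
    data_format=SimpleNamespace(mime_type=DODS_MIME),
)
netcdf_input = SimpleNamespace(
    url="http://example.org/files/data.nc",
    file="/tmp/workdir/data.nc",
    data_format=SimpleNamespace(mime_type="application/x-netcdf"),
)

# In a process: ds = Dataset(netcdf_or_opendap(inpt))
```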

@cehbrecht
Collaborator Author

... follow up. The util method only makes sense when pywps can guess the right mimetype. Otherwise we need separate input parameters for netcdf and opendap, as in the ncmeta process example already.

@huard
Collaborator

huard commented Aug 23, 2018

Support for file URLs is something we could ask upstream libraries (netCDF4, xarray) to add, but it's not a short-term fix.
Could we not just do in the util:

try:
  return Dataset(inpt.url)
except:
  return Dataset(inpt.file)

@huard
Collaborator

huard commented Aug 23, 2018

That, in the case the mimetype is unknown.
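
A runnable sketch of that fallback, with the opener passed in so the logic can be exercised without netCDF4 or a network connection. `open_url_then_file`, `FakeInput` and `fake_opener` are hypothetical names; in a real process the opener would be netCDF4.Dataset.

```python
def open_url_then_file(inpt, opener):
    """Try the remote URL first, then fall back to a local copy."""
    try:
        return opener(inpt.url)
    except Exception:
        # The URL could not be opened directly (e.g. not an OpenDAP endpoint),
        # so request the local copy. Only now is inpt.file accessed.
        return opener(inpt.file)


class FakeInput:
    """Stand-in for a ComplexInput that records whether file was accessed."""

    def __init__(self, url, local_path):
        self.url = url
        self._local_path = local_path
        self.file_accessed = False

    @property
    def file(self):
        self.file_accessed = True
        return self._local_path


def fake_opener(location):
    """Pretend that only local paths and URLs containing /dodsC/ can be opened."""
    if location.startswith("http") and "/dodsC/" not in location:
        raise OSError("cannot open " + location)
    return "opened:" + location


dap_input = FakeInput("http://example.org/dodsC/data.nc", "/tmp/workdir/data.nc")
plain_input = FakeInput("http://example.org/files/data.nc", "/tmp/workdir/data.nc")
dap_result = open_url_then_file(dap_input, fake_opener)
plain_result = open_url_then_file(plain_input, fake_opener)
```

Because file is only touched in the fallback branch, an OpenDAP input never triggers a full download.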

@cehbrecht
Collaborator Author

Could we not just do in the util:

try:
  return Dataset(inpt.url)
except:
  return Dataset(inpt.file)

That might work ... I can try it with ncmeta process.

@cehbrecht
Collaborator Author

I have added an nc_resource util method to ncmeta and it works :)

https://github.com/bird-house/emu/blob/ea899f544f4029f75594d4cf210251c5e3ee0a3d/emu/processes/wps_ncmeta.py#L17

I have done it differently, using requests.head to check the mimetype.

@huard
Collaborator

huard commented Aug 23, 2018

Looks good. Note that the next PR will include a new FORMATS.DODS for OpenDAP mimetypes. I'm just waiting for the UrlHandler PR to go in before submitting it.

@cehbrecht
Collaborator Author

Just give me a "go" on the url_handler PR and I can pull it in.

@huard
Collaborator

huard commented Aug 23, 2018

Hum, your nc_resource won't work for the file:/// scheme...

@huard
Collaborator

huard commented Aug 23, 2018

Go. I've added a few tests. Nothing broke.

@cehbrecht
Collaborator Author

Hum, your nc_resource won't work for the file:/// scheme...

Fixed:
https://github.com/bird-house/emu/blob/14eeee066ddf52bdfaa9cc79670abd41acba1ea9/emu/processes/wps_ncmeta.py#L16

It does not work on Python 2.7 due to an import issue in PR #387:

  File "/home/travis/build/bird-house/emu/emu/processes/wps_ncmeta.py", line 20, in nc_resource
    if urlparse(inpt.url).scheme == 'file':
  File "/home/travis/miniconda/envs/emu/lib/python2.7/site-packages/pywps/inout/basic.py", line 315, in url
    import pathlib
ImportError: No module named pathlib

@huard
Collaborator

huard commented Oct 5, 2018

@cehbrecht I think this can be closed.

@cehbrecht
Collaborator Author

Initial support is integrated in pywps now.

This issue was closed.