Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parameter 'outputs' from request object is not propagated to the manager execute_process and then to processor execute #1420

Closed
francescoingv opened this issue Dec 7, 2023 · 7 comments
Labels
bug Something isn't working OGC API - Processes OGC API - Processes
Milestone

Comments

@francescoingv
Copy link
Contributor

In API.execute_process(),
the (optional) parameter 'outputs', if present in the request, is not passed on to BaseManager.execute_process() (or derived).
Even more, the parameter 'outputs' is not collected: only the parameter 'inputs' is collected.

To properly process a request the processor (i.e. "BaseProcessor" or derived) should get the requested outputs to know which outputs to produce.
I expect the manager (e.g. "BaseManager" or derived) to use this information, and also the property "transmitionMode" of each output, to prepare:

  • body;
  • headers (i.e. 'Location');
  • content pointed by 'Location' in the headers.

In case of "transmissionMode": "reference", I am not sure if either the processor or the manager should prepare the reference (possibly writing to a file the single output).
As far as I understand, the reference is not expected to be served by the API, therefore only the processor can prepare it (possibly on a different endpoint).
If the reference could be server by the API (but I do not understand the corresponding entrypoint), then the manager could be in charge of it.

I propose to add an additonal parameter (a dictionary with the content of outputs) to BaseManager.execute_process()
and the same additional parameter to BaseProcessor.execute()

@francescoingv francescoingv added the enhancement New feature or request label Dec 7, 2023
@francbartoli
Copy link
Contributor

Hi @francescoingv, few comments on the formal aspects of the ticket:

  • It's still not clear to me if we should consider the ticket as a feature request or a bug report even if you have flagged it as enhancement.
  • Can you kindly assess with the OGC API - Processes specification if the current pygeoapi interface of the OpenAPI document doesn't comply with the standard? (i.e. outputs is not taken into account) If it doesn't then it's clearly a bug otherwise it would be a feature to be added
  • Regardless of the outcome I would strongly encourage you to edit your explanation above and adapt its content under one of the relevant template that the team has prepared for the reporters. This would help us to understand the problem and review properly any eventual pull request that would fix the issue.

Thanks,
Francesco

@francbartoli francbartoli added OGC API - Processes OGC API - Processes question Further information is requested and removed enhancement New feature or request labels Dec 8, 2023
@francescoingv
Copy link
Contributor Author

Description
The processor cannot return outputs based on the outputs parameter of a process-execute

Steps to Reproduce
Create a processor (say TestProcessor) derived from BaseProcessor with the following outputs section in PROCESS_METADATA

  "outputs": {
    "echo1": {
      "schema": {
        "type": "string"
      },
    "echo2": {
      "schema": {
        "type": "string"
      }
    }
  },

It is impossible to make the function TestProcessor.execute() returning a single specific output as requested by the user.

Expected behavior
Note: The problem is not present if the processor defines a single output with a single format.

If process-execute is requested without outputs parameter, all defined outputs (i.e. echo1 and echo2) will be returned (requirement 27 of the standard - however the standard does not specify the transmissionMode to return the outputs in this case).
If process-execute is requested with outputs parameter, and the outputs parameter contains only echo2, only echo2 will be returned (I did not find this clearly stated in the standard).

The specification also spells:

A process output can be defined in the process description as being of one or more media
types. In cases where a specific output can be presented in one of a number of media types, the
format parameter in the execute request can be used to indicate the format that should be used
to present the process output value in the server’s response.

This is far more difficult to describe in Steps to Reproduce and then in Expected behavior, but to me it requires the TestProcessor can access the full outputs parameter.

Environment

  • OS: Ubuntu
  • Python version: 3.8.10
  • pygeoapi version: 0.15.0

Additional context
I propose to add an additonal parameter (a dictionary with the content of outputs) to BaseManager.execute_process() and the same additional parameter to BaseProcessor.execute().
To keep backward compatibility the parameter could be a named parameter.

@francbartoli francbartoli added bug Something isn't working and removed question Further information is requested labels Dec 11, 2023
@ricardogsilva
Copy link
Member

This is a subset of #1285, which deals more generally with what is missing from pygeoapi in terms of fully supporting clause 7.11 (execute a process) of the OGC API - Processes v1.0.0 spec.

TL; DR
To be clear, if a PR with this functionality would be merged, there would likely be a breaking change to pygeoapi's API and this would require a new release and a potential refactor of downstream projects and deployments.


In my opinion it would be difficult to solve this problem while still ensuring backwards compatibility. The reasons being that in order to support the additional information that is being passed in the request, which is:

  • outputs
  • response
  • subscriber

we need to add support for this in:

  • pygeoapi.process.manager.base.BaseManager.execute_process()

    def execute_process(

    This method's signature would need to change. Additionally, since pygeoapi supports plugging in a custom process manager, whatever downstream managers might exist out there in the wild would need to be modified as well in order to cope with the change

  • pygeoapi.process.base.BaseProcessor.execute()

    def execute(self, data: dict) -> Tuple[str, Any]:

    This method's signature also needs to change. Additionally, since pygeoapi also supports plugging in custom processes, this would break whatever third party process implementations exist.

This is not to say that I do not support the change, I absolutely do, since it is required in order for pygeoapi to be compliant with the OAProc standard. But it seems to me that it implies a breaking change that will require existing deployments to modify their code. This also means that if there are any deployments out there that are living on the edge (i.e. depending on pygeoapi's master branch), these would likely break.

Adding support for parsing an execute request's outputs option would be a nice addition indeed. I also think that while we are at it, it would be optimal to up the OAProc support for full compliance with v1.0.0 of the standard and then releasing a new version of pygeoapi - perhaps even a v1.0.0?

I'm not sure what would be the most appropriate course of action in this regard, so maybe we'd need some input from @tomkralidis here?

@francescoingv
Copy link
Contributor Author

I understand there is a backward compatibility issue,
however I believe to let the community to program the processing services according to the standard, the upgrade of the interface is required/mandatory.
Otherwise the frame over-constraints and limits the development, possibly towards bad practices (e.g. I am wondering about: one service entry point for each combination of outputs; subscriber passed on as input parameters, and the Processor taking care to act on them; transmissionMode passed as input parameter and possibly the link returned as output; etc...).

I wonder if a viable solution could be:

  • for the current release, to keep the backward compatibility, but also to allow coding adhering to the standard:
  1. declare BaseProcessor and BaseManager @deprecated
  2. create NewBaseProcessor and NewBaseManager with the new required signatures
  3. change API.execute_process() to check if self.manager is derived from BaseManager or NewBaseManager and accordingly call self.manager.execute_process() with different number of parameters
  • for the next release, possibly drop the backward compatibility:
  1. make BaseProcessor and BaseManager an alias of NewBaseProcessor and NewBaseManager

@totycro
Copy link
Contributor

totycro commented Dec 14, 2023

We did have a breaking change in execute_process more or less recently (1-2 years ago?) though, so I'm not sure if this part of pygeoapi is considered stable enough to need a complex deprecation handling (I can't find the commit right now, but input parameter handling was changed from [{"id": "foo", "value": "bar"}] to {"foo": "bar"} or so), this was the previous implementation

value = input['value']
)

Adding more parameters to BaseProcessor.execute_process would of course be a breaking change, but it would be fairly simple for maintainers to accept the parameters and ignore them for now.

Here's also the pull request for subscriber, which would add a parameter to BaseProcessor.execute_process: #1313

francescoingv added a commit to francescoingv/pygeoapi that referenced this issue Dec 21, 2023
Add support for OGC API Processes Outputs
Copy link

This Issue has been inactive for 90 days. As per RFC4, in order to manage maintenance burden, it will be automatically closed in 7 days.

@github-actions github-actions bot added the stale Issue marked stale by stale-bot label Mar 17, 2024
@francescoingv
Copy link
Contributor Author

The issue is still open as a bug:
the parameter outputs contains also the property transmissionMode,
which should be passed on to the ProcessManager and then to Process to properly handle the request.
Ref. also: #1285

There is an approved PR #1448 to solve the issue, but it was not merged.

The solution needs to change two signatures and could be applied in two steps:

  1. BaseManager.execute_process() (which was also recently changed)
  2. [dependent on the previous change] BaseProcessor.execute()

Accordingly, the proposed PR could be split in two separate PRs for smoother integration.

@github-actions github-actions bot removed the stale Issue marked stale by stale-bot label Mar 24, 2024
tomkralidis pushed a commit that referenced this issue May 10, 2024
* Solve issue #1420

Add support for OGC API Processes Outputs

* Resolve Jan, 3 2024 totycro comments

* Solve issue 1420 with full backward compatibility for Processors.

* changed formattings

* Some additional formatting changes

* Update api.py

missing line in api

* Update base.py

To resolve conflict with #1603

* Update base.py

Added subscriber inline doc

* After Ricardo Silva comments on 13 Apr.

Included all the suggested changes on code format andparams name.

* Changed line length

* fixed trailing spaces.

* Update formatting base.py

* Update base.py

---------

Co-authored-by: FrancescoIngv <FrancescoIngv@users.noreply.github.com>
@tomkralidis tomkralidis added this to the 0.17.0 milestone May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working OGC API - Processes OGC API - Processes
Projects
None yet
Development

No branches or pull requests

5 participants