POST end-points for synchronous requests #124

Closed
jerstlouis opened this issue Feb 8, 2021 · 22 comments

jerstlouis commented Feb 8, 2021

Currently, process execution requests are always POSTed to /jobs.
If /jobs becomes a generic OGC API - Common end-point for asynchronous requests, would it make sense to reserve it for asynchronous processing?
While POSTing to /jobs could potentially still be supported for submitting async requests, I propose supporting POST to /processes/{processId} (also currently the end-point used by the Workflows extension).

Either only /processes/{processId} could be supported, or, if the mode is omitted, the end-point you POST to could determine the default mode.
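
To illustrate the two options (a sketch only: the process id, the input, and the exact shape of the execute-request body are hypothetical, and how the target process would be identified when POSTing to /jobs is left out):

    # Current draft: the execute request is POSTed to the jobs end-point
    POST /jobs HTTP/1.1
    Content-Type: application/json

    { "mode": "sync", "inputs": { "message": "hello" } }

    # Proposed: POST the same execute request directly to the process;
    # if "mode" were omitted, the end-point POSTed to could supply the default
    POST /processes/EchoProcess HTTP/1.1
    Content-Type: application/json

    { "mode": "sync", "inputs": { "message": "hello" } }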

jerstlouis changed the title from "POST for synchronous end-points" to "POST end-points for synchronous requests" on Feb 8, 2021
pvretano commented Feb 8, 2021

@jerstlouis why are you under the impression that /jobs is only for asynchronous requests? A job is not asynchronously executed just because you POST it to /jobs.

jerstlouis commented Feb 8, 2021

@pvretano What was discussed at the last Members Meeting Open OAB session, and which I also brought up at the Common SWG meeting, was that /jobs could be a general asynchronous processing end-point, not only for Processes but for other OGC API specifications as well.

So it's from that line of thought that I am suggesting that perhaps /jobs could be reserved for querying and managing job status, and potentially also for submitting jobs as async by default, if POST requests to /jobs are supported as well.

It makes very little sense to POST a synchronous execute-request to /jobs when you are not expecting to create a job but just to receive results right away.

I also think that having /processes/{processId} as the general end-point to POST to, i.e. POSTing directly to the process, would facilitate harmonization with process aliases, e.g. with the Routes API: if your routing process end-point /processes/routingEngine has an alias at /route, then you can POST your execute-request to /route.
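
e.g. (illustrative paths; the same execute-request body would be sent in both cases):

    # POST to the process itself ...
    POST /processes/routingEngine HTTP/1.1
    Content-Type: application/json

    # ... or to its alias
    POST /route HTTP/1.1
    Content-Type: application/json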

pvretano commented Feb 8, 2021

@jerstlouis when we discussed the move from /processes/{processId}/jobs to /jobs in the SWG, asynchronous execution was not part of the discussion. It was simply an example that I presented as one reason why a non-Processes API might want to use a /jobs endpoint.
Why did it make sense to POST a synchronous job to /processes/{processId}/jobs (like we did in Processes up until very recently), but now that we have moved jobs to /jobs it no longer makes sense? What does not make sense is POSTing to two different endpoints (with potentially two different execute schemas) just because you want to execute a process synchronously or asynchronously.
I would also point out that the "mode" key accepts a value of "auto" that lets the server decide whether the job should be executed synchronously or asynchronously, so having two different execution endpoints makes that implementation a little messy.
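
For example (a sketch only; the request body is simplified and the job URI is hypothetical), with "mode" : "auto" the very same POST may be answered either way:

    POST /jobs HTTP/1.1
    Content-Type: application/json

    { "mode": "auto", "response": "document", "inputs": { "message": "hello" } }

    # the server may answer synchronously, returning the results right away ...
    HTTP/1.1 200 OK
    Content-Type: application/json

    # ... or asynchronously, by creating a job to be polled
    HTTP/1.1 201 Created
    Location: https://example.org/jobs/81574318-1eb1-4d7c-af61-4b3fbcf33c4f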

jerstlouis commented Feb 8, 2021

@pvretano I understand it was not addressed specifically in the discussion, but now that it lives outside of the /processes tree I think it screams at you even more. I personally always thought it odd that you'd POST to /processes/{processId}/jobs for a synchronous request when you are not creating a new job (and I find it very confusing from a resource name / methods perspective), and that's something that was inherited from WPS being traditionally very rooted in async processing.

What does not make sense is posting to two different endpoints

Always POST to /processes/{processId} then, and if async, the jobs can be managed from /jobs?

Two different execute schemas

Definitely not, the same one schema :)

Another potential advantage of this proposal is that the async job you POST to /processes/{processId} could possibly create a job that runs and is managed on another server elsewhere, one which implements the /jobs job management functionality.

I would also point out that the "mode" key accepts a value of "auto"

I think that could have worked quite well, meaning "depending on where you POST it to", if we were to decide to support POSTing to both places... But that's a different use case of auto than currently specified, the current one expecting the server to make the decision based on something like results size, right? So I guess not... (side note: since there is an auto, I wish mode could be optional and default to auto -- less is more for clients trying to craft minimal valid execution requests).

pvretano commented Feb 8, 2021

@jerstlouis just because you execute a job synchronously, that does not mean that you don't want the results to persist. This is especially true if the transmission mode is set to reference. Having a jobs endpoint allows the server to consistently and conveniently persist execution results at /jobs/{jobId}/results/{outputId}.
In my implementation, every execution gets a job id (sync or async) and the results can age out or persist until removed (but that is just an extension in my server).
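
For instance (job id and output id hypothetical), a client could later fetch a persisted output of a synchronous execution with:

    GET /jobs/81574318-1eb1-4d7c-af61-4b3fbcf33c4f/results/route HTTP/1.1
    Accept: application/geo+json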

jerstlouis commented Feb 8, 2021

@pvretano ah, well that's a good point.
If I understand the current specs correctly, there is no requirement for the results to persist when executing synchronously with "response" : "raw" and "transmissionMode" : "value". So even if the server can decide to offer the synchronous results at /jobs/{jobId}/results/{outputId}, per the specs a client cannot rely on that being the case.
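
In that mode the exchange is simply (a sketch; the input and the output media type are just examples):

    POST /jobs HTTP/1.1
    Content-Type: application/json

    { "mode": "sync", "response": "raw", "inputs": { "bbox": [ -71.6, 45.3, -71.5, 45.4 ] } }

    HTTP/1.1 200 OK
    Content-Type: image/png

    (the raw output itself -- no job URI is returned)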

In our implementation, for synchronous execution we would not want to expose the persistence of the results, even if internally we could manage some persistence / caching to accelerate future identical requests. This would actually allow pointing to the exact same output for different requests producing the same results. In general, those sync results are likely to be much more numerous and short-lived than typical async requests.

Especially if synchronous execute requests are available to non-authenticated users, it's convenient not to expose them, avoiding any privacy issue by not providing a list of jobs / results (as well as avoiding too many results piling up quickly, without any special effort on the server's part).

But I am not suggesting getting rid of the jobs end-point or altering anything concerning the results; just either replacing its POST method with, or adding as an alternative, a POST method on /processes/{processId}.

pvretano commented Feb 8, 2021

@jerstlouis what you seem to be suggesting is that very late in the game we change the well established execution endpoint from the jobs endpoint to the endpoint that is used by Workflows. Does Workflows not work if it changes its execution endpoint to what is specified in the processes specification?

jerstlouis commented Feb 8, 2021

@pvretano Not exactly...

The well established execution end-point was just changed recently from /processes/{processId}/jobs to /jobs.

The Workflows extension was using /processes/{processId} as:
A) that felt like the logical thing to me (when originally proposing this over a year ago I actually assumed that's what Processes used)
B) when I realized that's not what Processes used, I thought, well, it then conveniently avoids any overlap / confusion between the two approaches.

But considering the recent move to /jobs, the concern with POSTing to /jobs without really creating a job for synchronous execution (the main reason for this suggestion), the fact that there is a mode to distinguish between sync / async, and, for the Workflows extension, maybe a "mode" : "collection" for collection description results (and on-demand processing/triggering based on OGC API requests), or maybe later, for multi-results, a "mode" : "dataset" for OGC API landing page triggers/results, I am now making this proposal to bring it all to /processes/{processId}

because:

What does not make sense is posting to two different endpoints

:)

And yes, Workflows does not work with /jobs, because the client will POST the execute request directly to whatever the value of "process" is (attempting any kind of URL parsing / replacing to figure out what to strip and where to add /jobs would be problematic).

To some extent this argument also works for non-Workflows synchronous or asynchronous requests... being able to POST the execute request directly to whatever the "process" URL is, I think, is quite convenient if request documents are shared (e.g. for Reusability / Reproducibility -- the R in FAIR!).

pvretano commented Feb 8, 2021

@jerstlouis the move to /jobs is NOT recent. Process execution was always to a "jobs" endpoint (previously /processes/{processId}/jobs and now /jobs) ... and that was the point ... process execution always involved a job (whether implicitly or explicitly created).
With regard to the requirements for Workflows, perhaps "process" is not the correct key and "executionEndpoint" is the key that should be in the execute request.

jerstlouis commented Feb 8, 2021

Fair point about always having a .../jobs.
But the implicit job with "mode" : "sync", "response" : "raw", "transmissionMode" : "value" is not a resource the client can interact with per the specs (in that mode the job is just as abstract as with a request to /map or to /tiles/{tmsID}/{level}/{row}/{col}).

@pvretano Well, what I liked very much about the "process" key is that it's also what identifies the type of input.
i.e. in Core you have, for example:

  • "href"
  • "value"

and Workflows adds:

  • "collection"
  • "process"

So the "process" input in Workflows allowing to specify an external process would neatly map to the process identifier / execute end-point in Core, while allowing the "process" object to be recursive/nested in Workflows.

I especially liked the process & collection duality, as it illustrates very well the idea of bringing together data and algorithms/computing resources from any available OGC API service.
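
A sketch of what such a nested execute request could look like (all URLs, process ids, and input names here are hypothetical):

    {
      "process": "https://maps.example.com/ogcapi/processes/RenderMap",
      "inputs": {
        "layers": [
          {
            "process": "https://compute.example.com/ogcapi/processes/ContourGenerator",
            "inputs": {
              "data": { "collection": "https://data.example.com/ogcapi/collections/SRTM" }
            }
          }
        ]
      }
    }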

christophenoel commented:

It makes very little sense to POST a synchronous execute-request to /jobs when you are not expecting to be creating a job but just receiving results right away.

I expect that a WPS client application can discover the jobs and results independently of the execution mode (sync/async). Therefore, I hope that, even if the execution was sync, the client can still access the result at /jobs/{jobId}/result.

"mode" : "sync", "response" : "raw", "transmissionMode" : "value"

Those properties are old fashioned, not in line with RESTful services, and complicate the implementation for no added value. I expect the API to support two modes:

  • Request includes a callback endpoint --> the response is sent async
  • Request does not include a callback endpoint --> the client needs to poll the jobs endpoint.

response (raw/document) and transmissionMode (value, reference) act on the API itself, which is not a good idea at all.
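
A sketch of the two modes proposed above (the "callback" key and all URIs are hypothetical, not something defined in the current draft):

    # request includes a callback endpoint -> the server pushes the result when done
    POST /jobs HTTP/1.1
    Content-Type: application/json

    { "inputs": { "message": "hello" }, "callback": "https://client.example.com/hooks/result" }

    # request does not include a callback endpoint -> the client polls the created job
    GET /jobs/81574318-1eb1-4d7c-af61-4b3fbcf33c4f HTTP/1.1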

jerstlouis commented:

@spacebel With "mode" : "sync", "response" : "raw", "transmissionMode" : "value" the client was never given a jobId.
The result is just directly the data, as is the case for a GET to /map or to a tile...

While this may not seem practical for batch processing taking a lot of time, it makes tons of sense for processes / small requests expected to complete almost immediately, which is what I expect the synchronous mode would be most useful for.
A perfect example of this is the Routes API which we are trying to harmonize with OGC API - Processes, which supports a synchronous mode.
Another excellent example is rendering a map as a process.

Please don't take away that mode, it's the simplest one by far :)

pvretano commented Feb 9, 2021

@spacebel about the async processing ... agreed! see: https://github.com/opengeospatial/ogcapi-common/tree/master/proposals/simple-async

@jerstlouis how exactly is staying with the status quo "taking away that mode"?? OGC API Processes supports synchronous processing and always has. I thought that the current debate is about whether the execution endpoint should be /jobs or /processes/{processesId}. Am I wrong about that? In my opinion, whatever endpoint is chosen for execution it should be a single endpoint regardless of the execution mode (sync or async). Having one execution endpoint for sync and another for async seems unnecessarily complicated.

jerstlouis commented:

@pvretano At the moment OGC API - Processes - Core supports
"mode" : "sync", "response" : "raw", "transmissionMode" : "value"

But I was getting the impression that @spacebel would like to remove that functionality.

You are correct, this issue is strictly about the execution end-point.
I agree with a single end-point -- but I do feel there are many advantages (even putting the Workflows extension aside) with /processes/{processId} being that one end-point :)

christophenoel commented:

To my knowledge the "sync" mode (from the XML bindings) was not migrated to the REST/JSON API. So the question of how the sync mode is performed seems a good one. Note that the sync mode is neither the async callback nor the async polling.

For the 'sync' mode, the approach should be friendly to the REST philosophy (which you probably know better than me). I see two options:

  1. Same REST endpoint path for both:
     • 201 response (as usual) for async mode
     • 200 response with result document for sync mode
  2. Dedicated endpoint path for each.

In any case, I suppose returning two possible response types (status or result) with code 200 on the same endpoint would not be good practice? Such a pattern was more in line with our good old SOAP/XML...

PS: Even in 'sync' mode, you could return a jobId and that would make sense.

jerstlouis commented Feb 9, 2021

@spacebel

For sync mode, the current details are specified at: http://docs.opengeospatial.org/DRAFTS/18-062.html#_response_6

  • A successful execution of the operation SHALL be reported as a response with a HTTP status code 200.
  • If the "response" attribute of the execute request was set to "document", the content of the response SHALL be based upon the OpenAPI 3.0 schema result.yaml
  • If the "response" attribute of the execute request was set to "raw", the content of the response SHALL only include the one output selected by the execute request body.

There is no jobId returned for sync mode, whether response is set to raw or document, or whether transmissionMode is set to value or reference (if I understand correctly, transmissionMode only applies to "response" : "document"). The server can have an internal jobId if it likes and implement a vendor extension exposing that, but as far as I understand, the specifications do not specify anything about jobId for sync mode (and in my opinion, that is exactly how it should be).

For async, the status query is at /jobs/{jobId}, while the results are found at /jobs/{jobId}/results.
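
i.e., for async (job id hypothetical):

    # poll the status ...
    GET /jobs/81574318-1eb1-4d7c-af61-4b3fbcf33c4f HTTP/1.1
    Accept: application/json

    # ... and once the reported status is "successful", fetch the results
    GET /jobs/81574318-1eb1-4d7c-af61-4b3fbcf33c4f/results HTTP/1.1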

christophenoel commented:

There is no jobId returned for sync mode,

I would rather say: there is no jobId returned for sync mode in the current spec (which simply translated the XML bindings instead of rethinking the standard as a modern, RESTful API).

OK fine, but my opinion is still that such options (e.g. response, mode) are absolutely not RESTful. All those modes and options acting on the operation response are exactly what a RESTful microservice should NOT do.

My humble personal point of view :) :)

ghobona commented Feb 10, 2021

All, let's keep in mind that the focus at the moment is on Part 1: Core.

mode and other options are well understood and have been proven to work.

Future extensions may include support for additional approaches such as those discussed in the Pub/Sub whitepaper edited by @sarasaeedi.

fmigneault commented:

I have just taken note of this proposal through other channels, and I am very much against it.

I agree with the comment of @spacebel: POST on /processes/{id} to execute a job is NOT RESTful.
The execution is not done ON the process, but BY the process (or some async worker). There is no way to know that an execution gets created with that path, and it gives the wrong impression that something modifying the process was accomplished (e.g. the way PATCH|PUT /processes/{id} would override/modify the process). The path /processes/{id} should be used only for operations ON the process.

Since nothing is created on /processes/{id}, the process execution, or job, must be created with another path such as /processes/{id}/jobs and retrieved by /processes/{id}/jobs/{id} (or with the corresponding /jobs and /jobs/{id}).
Whether or not the sync mode creates an actual job item is an implementation detail that should not affect the API definition. The actual name job could even be replaced by execution, and the same principle would still apply. No execution item is necessarily created, but the API definition indicates that a job/execution of the process was created, without modifying that process definition.

I also agree with @pvretano that sync vs async executions should be done on the same endpoint. Having different ones unnecessarily complicates things, and since jobs are mandatory for async to retrieve the status, the jobs path should also be used for sync mode.

I again agree with @pvretano that sync does not necessarily mean that there is no job, but instead just enforces that the status is directly returned only when the full execution is completed. The result of that execution could well be persisted in a job, and fetched at a later time.

I would argue also that even a sync execution could return a jobId where further details about the execution, such as processing logs, could be retrieved at a later time. For example, our implementation (https://github.com/crim-ca/weaver) provides /jobs/{id} for status, /jobs/{id}/results for obtained results, /jobs/{id}/inputs for submitted inputs, /jobs/{id}/logs for logging steps and /jobs/{id}/exceptions for details in case of errors. It wouldn't make any sense to query GET /processes/{id}/logs to obtain the logs of one specific execution, as we would be missing a parameter, and that path seems to indicate that we are fetching logs of the process itself rather than of one of its executions. Even when running in sync, being able to retrieve execution logs after the fact is often the only way we can actually debug a failing process.

Finally, POST /processes/{id} could be requested more than once at the same time, even when running both in sync (e.g. when parallel load-balanced workers run the API). There would be no way at all to distinguish between both executions without a jobId in that case.
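
For example, with the job sub-resources listed above (job id hypothetical), a specific execution can still be inspected afterwards:

    # processing log for that one execution
    GET /jobs/0d5a8e6d/logs HTTP/1.1

    # error details if that execution failed
    GET /jobs/0d5a8e6d/exceptions HTTP/1.1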

bpross-52n commented:

Thanks @fmigneault, we should take this into consideration. I also invite you to our next SWG meeting on March 17, 9 a.m. EDT, to take part in the discussion.

jerstlouis commented Mar 17, 2021

2021-03-17 SWG (Members) Meeting: We agreed to /processes/{processId}/execution.
One argument was that this would facilitate the DAPA API integration where execution via GET is supported.
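
With that resolution, execution in either mode goes to a single end-point, e.g. (process id and body illustrative):

    POST /processes/EchoProcess/execution HTTP/1.1
    Content-Type: application/json

    { "inputs": { "message": "hello" } }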

bpross-52n commented:

Should be fixed by #159
