Piperack / statelessness #176

Closed
gimsieke opened this Issue Sep 3, 2014 · 8 comments

@gimsieke
Contributor

gimsieke commented Sep 3, 2014

When submitting input documents to several ports or when passing parameters, several documents have to be posted separately, and all of them are then used by a subsequent invocation of /pipelines/{id}/run. This may lead to race conditions if Piperack is used as a service on which multiple users can run the same pipeline.
It would be really helpful if the different documents that are necessary for each invocation could be submitted in a multipart request or if Piperack returned a handle for each invocation.

The latter solution might be exposed in such a way that if the user intends to post multiple documents, she first calls /pipelines/{id}/init.(xml|json|txt) and receives a response that contains a token. (Piperack maintains an internal list of all runs, identified by token.) She then posts the input and parameter documents to /runs/{token}/inputs/{port}, /runs/{token}/parameters, /runs/{token}/parameters/{port}, etc. Same goes for /runs/{token}/run, GET /runs/{token} → info about a specific run, just like GET /pipelines/{id}.
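To make the token idea concrete, here is a minimal in-memory sketch of the bookkeeping it implies. Nothing in it is Piperack code: the `Run` and `RunRegistry` classes, their method names, and the pipeline name `my-pipeline` are hypothetical, and the URL paths in the comments are only the ones proposed above. The point is that two callers who initialize the same pipeline each get their own token, so their inputs and parameters cannot be mixed up.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Run:
    """State collected for one invocation of a pipeline (hypothetical)."""
    pipeline_id: str
    inputs: dict = field(default_factory=dict)      # port name -> list of documents
    parameters: dict = field(default_factory=dict)  # parameter name -> value


class RunRegistry:
    """Hypothetical server-side list of runs, keyed by an unguessable token."""

    def __init__(self):
        self._runs = {}

    def init(self, pipeline_id):
        # Corresponds to the proposed /pipelines/{id}/init call.
        token = secrets.token_urlsafe(16)
        self._runs[token] = Run(pipeline_id)
        return token

    def add_input(self, token, port, document):
        # Corresponds to a POST to /runs/{token}/inputs/{port}.
        self._runs[token].inputs.setdefault(port, []).append(document)

    def set_parameter(self, token, name, value):
        # Corresponds to a POST to /runs/{token}/parameters.
        self._runs[token].parameters[name] = value

    def take(self, token):
        # Corresponds to /runs/{token}/run: hand the collected state to the
        # pipeline engine and forget the token so it cannot be reused.
        return self._runs.pop(token)


# Two callers prepare runs of the same pipeline at the same time.
registry = RunRegistry()
first = registry.init("my-pipeline")
second = registry.init("my-pipeline")
registry.add_input(first, "source", "<doc>A</doc>")
registry.add_input(second, "source", "<doc>B</doc>")
print(registry.take(first).inputs)  # only the first caller's document
```

Removing the run in `take` is one possible design choice; a real service might instead keep finished runs around so that GET /runs/{token} can still report on them.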

I don’t suggest this for the sake of REST purity but because it is a real issue that has kept me from replacing the Saxon calls in many of our workflows with Piperack calls.

@ndw

Owner

ndw commented Sep 13, 2014

I assume by "same pipeline" you mean pipeline with the same name. In an environment where multiple users could be running pipelines, isn't it sufficient to give each instance a unique name? That's effectively what the token is anyway.

@gimsieke

Contributor

gimsieke commented Sep 13, 2014

yes, same pipeline = same name

In our production scenarios it’s always the same user invoking the pipelines. We cannot create a Linux user for each Web app user, and sometimes they may run conversions anonymously. Calabash or Saxon invocations via shell script are atomic in that no one can tamper with the inputs or parameters of an already configured run (because submitting inputs and parameters happens with invocation).
If restlet.jar supports multipart, I think this will be the most elegant way to cater to this atomicity requirement. You could submit the different inputs/param sets with curl -F source=@file1.xml -F stylesheet=@file2.xsl -F params=@file3.xml

@ndw

Owner

ndw commented Sep 13, 2014

Sure, but if your scripts are submitting the pipelines, they don't need to be different users to be unique. Simply don't include a name parameter when you post the pipeline, you'll get back a unique URI in the Location: header. The rest of the script can use that.

Or am I missing something else?
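A rough sketch of what such a script step could look like, using only the Python standard library. Several things here are assumptions rather than documented Piperack behavior: that pipelines are posted to `/pipelines` under the server's base URL, that `application/xml` is an accepted media type, and that the response is a 2xx carrying the `Location:` header described above. The function name and the `opener` parameter are made up for illustration.

```python
import urllib.request


def post_anonymous_pipeline(base_url, pipeline_xml, opener=urllib.request.urlopen):
    """POST a pipeline without a name; return the URI from the Location header."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/pipelines",          # assumed endpoint
        data=pipeline_xml,                            # bytes of the pipeline document
        headers={"Content-Type": "application/xml"},  # assumed media type
        method="POST",
    )
    # `opener` is injectable so the function can be exercised without a server.
    with opener(request) as response:
        return response.headers["Location"]


# Hypothetical usage (requires a running server, so it is left commented out):
# with open("pipeline.xpl", "rb") as f:
#     pipeline_uri = post_anonymous_pipeline("http://localhost:8088", f.read())
```

The rest of the script would then post inputs and parameters to the returned URI instead of to a shared, named pipeline.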

@gimsieke

Contributor

gimsieke commented Sep 13, 2014

Ah, I didn’t even consider the possibility of submitting pipelines anonymously. This will indeed behave similarly to what I sketched above, except that you don’t get a dynamic handle for a run but for the whole pipeline.
The drawback is that for more complex pipelines, calculating the dependency graph, compiling stylesheets used therein, slurping in 40 MB metadata dumps, etc., may take several seconds. I started exploring Piperack not just in order to save the JVM tax but also the pipeline analysis/compilation tax.
On the other hand, given that I wanted to use it to accelerate frequently invoked, short-running, simple pipelines, the analysis/compilation phase won’t take several seconds. More in the order of half a second. Still, this is a delay that one wants to optimize away once one’s gotten into optimization mode or mood.
The multipart thing would be quite cool. It will save you (= me) some bookkeeping in glue scripts, it will give an extra performance boost, it offers a natural -F form-input-name ↔ port-name association, and it doesn’t seem to fall prey to another race condition that I was thinking of: what if a named pipeline is running and I post another request to that same pipeline? Piperack just seems to wait until the other request has been served, which is ok.

@ndw

Owner

ndw commented Sep 14, 2014

Unfortunately, the design of XML Calabash 1.x doesn't separate the analysis/compilation phase from the runtime phase. (V.next will; no ETA.)

The "get a token" approach would have to be implemented in terms of saving the pre-compiled pipeline source and spawning a new runtime for each token.

I'll see about the form solution though. Might be interesting.

ndw added a commit that referenced this issue Sep 14, 2014

@ndw

Owner

ndw commented Sep 14, 2014

Ok, I implemented multipart/form-data for POSTing to /pipelines/{id}.
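For readers who want to try this from a script rather than with curl -F, the following is a minimal sketch of building such a request with the Python standard library. It only constructs the request and does not send it. The assumptions: form field names are taken to correspond to port names, as discussed earlier in the thread (the thread does not confirm how the implementation maps them), every part is labeled `application/xml`, and the function name and example URL are invented for illustration.

```python
import urllib.request
import uuid


def build_multipart_request(pipeline_url, parts):
    """Build one POST carrying every document for a single run.

    `parts` maps a form field name (assumed to match a port name)
    to a (filename, bytes) pair.
    """
    boundary = uuid.uuid4().hex
    body = bytearray()
    for field, (filename, content) in parts.items():
        body += f"--{boundary}\r\n".encode()
        body += (
            f'Content-Disposition: form-data; name="{field}"; '
            f'filename="{filename}"\r\n'
        ).encode()
        body += b"Content-Type: application/xml\r\n\r\n"  # assumed media type
        body += content + b"\r\n"
    body += f"--{boundary}--\r\n".encode()  # closing delimiter
    return urllib.request.Request(
        pipeline_url,
        data=bytes(body),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Mirrors the curl -F example from the thread; the URL is a placeholder.
request = build_multipart_request(
    "http://localhost:8088/pipelines/my-pipeline",
    {
        "source": ("file1.xml", b"<doc/>"),
        "params": ("file3.xml", b"<params/>"),
    },
)
print(request.get_method(), len(request.data), "bytes")
```

Passing the result to `urllib.request.urlopen` would send all inputs and parameters in one request, which is what gives the atomicity asked for at the top of this issue.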

@gimsieke

Contributor

gimsieke commented Sep 14, 2014

👍

ndw added a commit that referenced this issue Sep 15, 2014

@ndw

Owner

ndw commented Oct 2, 2014

Fixed, I think.

ndw closed this Oct 2, 2014
