Only declared inputs should be available in task (ScalaTask) #71
Comments
To clarify, I am mainly interested in reducing the amount of data transferred to the compute node. I assume that if an input variable has some complex custom type or is a File, the object has to be serialised, transmitted to the compute node, and deserialised again. A File would additionally need to be copied to the workDir of the task on the compute node, even though it was not specified as an actual input to this particular task and should only be passed on to the following task (a transition that takes place on the local machine, so nothing needs to be serialised or copied).
Changing the specification of the strainer would imply additional complexity, since the strained inputs and outputs would have to go through something other than the context of the task. I think the way to solve your problem is to use an additional transition instead of a strainer capsule. For instance:
val bigDataProducer = Capsule(Task(....))
val computation = Capsule(Task(...))
val bigDataReader = Capsule(Task(....))
val mole = (bigDataProducer -- computation -- bigDataReader) + (bigDataProducer -- bigDataReader)
Using this pattern, the data is transmitted along the direct transition from bigDataProducer to bigDataReader and is not an input of the computation task.
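The routing idea behind this suggestion can be sketched in plain Scala, without any OpenMOLE API. This is only a toy model: `ToyTask`, `receive`, and `simulate` are hypothetical names, a context is modelled as a simple map, and the sketch assumes (as the suggestion above implies) that a task on a non-strainer capsule receives only its declared inputs, while the reader merges what arrives over both transitions.

```scala
// Toy model in plain Scala (NOT OpenMOLE code): all names here are hypothetical.
object BypassSketch {
  // A context maps variable names to values.
  type Context = Map[String, Vector[Double]]

  // A toy task: the names of its declared inputs and a function producing its outputs.
  final case class ToyTask(inputs: Set[String], run: Context => Context)

  // Assumption: a non-strainer task only receives the variables it declared.
  def receive(task: ToyTask, incoming: Context): Context =
    incoming.filter { case (name, _) => task.inputs.contains(name) }

  val producer: ToyTask =
    ToyTask(Set.empty, _ => Map("big" -> Vector.fill(1000)(1.0), "x" -> Vector(2.0)))

  val computation: ToyTask =
    ToyTask(Set("x"), ctx => Map("y" -> ctx("x").map(_ * 2)))

  val reader: ToyTask =
    ToyTask(Set("big", "y"), ctx => Map("total" -> Vector(ctx("big").sum + ctx("y").sum)))

  // Returns the variable names seen by the computation and the reader's output.
  def simulate(): (Set[String], Context) = {
    val produced = producer.run(Map.empty)
    // Path 1: producer -- computation (only "x" travels to the computation).
    val seenByComputation = receive(computation, produced)
    val computed = computation.run(seenByComputation)
    // Path 2: producer -- reader (the direct transition carries "big").
    // The reader merges what arrives over both transitions.
    val result = reader.run(receive(reader, produced ++ computed))
    (seenByComputation.keySet, result)
  }
}
```

In this model the large value `big` never appears in the context handed to `computation`, yet it still reaches `reader` through the direct path.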
I see. Well ok, I guess I will be able to live with how it is and/or use your suggestion. The thing is that I am trying to make my rather complex workflow modular and the separate tasks reusable. Here the (heavy) use of strainer capsules was quite handy, because when I write a task I don't need to think of all the data that is currently flowing through my pipeline...
Do you think the following pattern would be a useful addition to OpenMOLE itself? It's actually just a Skip but without any condition (or negation of the second condition and Condition.True). The name can be changed of course if you have better ideas...
package org.openmole.plugin.tool.pattern
import org.openmole.core.workflow.mole._
import org.openmole.core.workflow.puzzle._
import org.openmole.core.workflow.task._
import org.openmole.core.workflow.transition._
object Strain {
  def apply(puzzle: Puzzle) = {
    val first = Capsule(EmptyTask(), strainer = true)
    val firstSlot = Slot(first)
    val last = Capsule(EmptyTask(), strainer = true)
    (firstSlot -- puzzle -- last) + (firstSlot -- last)
  }
}
This version fails:
object Strain {
  def apply(puzzle: Puzzle) = {
    val first = Capsule(EmptyTask(), strainer = true)
    val firstSlot = Slot(first)
    val last = Capsule(EmptyTask(), strainer = true)
    val action = firstSlot -- puzzle -- last
    val strain = firstSlot -- last
    Puzzle.merge(firstSlot, Seq(last), puzzles = Seq(action, strain))
  }
}
val i = Val[Int]
val producer = EmptyTask() set (outputs += i, i := 42)
val consumer = ScalaTask("println(i)") set (inputs += i)
val ex = producer -- Strain(computation) -- consumer start
The error is:
Using a strainer capsule works fine, though:
val ex = producer -- Capsule(computation, strainer = true) -- consumer start
The slot should be on the last capsule:
object Strain {
  def apply(puzzle: Puzzle) = {
    val first = Capsule(EmptyTask(), strainer = true)
    val last = Slot(Capsule(EmptyTask(), strainer = true))
    val action = first -- puzzle -- last
    val strain = first -- last
    action + strain
  }
}
Does it work like that?
Yes! Thanks for fixing it. It looks like this pattern is exactly what I was looking for!
Ah... I just closed it because I added this Strain object to my plugin. But back to the question: do you think we should add it to the "tool.pattern" plugin of OpenMOLE?
Definitely.
I'll send a PR shortly then.
Using the new |
P.S.: I've just learned there is a limitation to these patterns, though. The wrapped puzzle must already aggregate the results if an exploration took place. I.e., I cannot wrap a single ExplorationTask inside a Strain:
val i = Val[Int]
val t = ScalaTask("println(i)") set (inputs += i)
Strain(ExplorationTask(i in Range(1 to 10))) -< t
Using a strainer capsule, all inputs are passed through to the output. Only a subset of these inputs may be used by the encapsulated task such as a ScalaTask (or SystemExecTask). Currently, the Scala script of the ScalaTask is compiled with all strained inputs available to the user code. Instead, only the actual subset of declared ScalaTask inputs should really be made available. My concern is mainly that if all inputs must be available to the ScalaTask which may be executed remotely, all these inputs must be copied to the compute node even if it was made clear that the task will only require a subset of the inputs.
I can tell that all strained inputs are made available to the ScalaTask (not only those added via the task's InputBuilder) from the error output of a forced compilation failure, as seen below. Given that the ScalaTask has no inputs, the input object should not have the var d member, which was the output of the preceding task.
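The requested behaviour can be illustrated with a small plain-Scala sketch. It does not reflect how OpenMOLE represents or ships contexts; `DeclaredInputsSketch` and `restrictTo` are hypothetical names, and a context is modelled as a simple map from variable name to value. The idea is only that the set of variables made visible to (and transferred for) a task is the intersection of the strained context with the task's declared inputs.

```scala
// Toy model in plain Scala (NOT OpenMOLE code): all names here are hypothetical.
object DeclaredInputsSketch {
  // A context maps variable names to arbitrary values.
  type Context = Map[String, Any]

  // Keep only the variables the task declared as inputs.
  def restrictTo(context: Context, declared: Set[String]): Context =
    context.filter { case (name, _) => declared.contains(name) }

  def main(args: Array[String]): Unit = {
    // "d" stands for a large strained value the task never declared.
    val strained: Context = Map("d" -> Array.fill(1000)(0.0), "i" -> 42)
    val visible = restrictTo(strained, declared = Set("i"))
    // Only the declared variable remains visible to the task.
    println(visible.keySet)
  }
}
```

With such a restriction, a task that declares no inputs would see an empty context, so a member like `d` would not be compiled into its `input` object.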