Only declared inputs should be available in task (ScalaTask) #71

schuhschuh · 2015-06-06T13:55:37Z

Using a strainer capsule, all inputs are passed through to the output. Only a subset of these inputs may be used by the encapsulated task such as a ScalaTask (or SystemExecTask). Currently, the Scala script of the ScalaTask is compiled with all strained inputs available to the user code. Instead, only the actual subset of declared ScalaTask inputs should really be made available. My concern is mainly that if all inputs must be available to the ScalaTask which may be executed remotely, all these inputs must be copied to the compute node even if it was made clear that the task will only require a subset of the inputs.

I can tell that all strained inputs are made available to the ScalaTask (not only those added via the tasks InputBuilder) in the error output of a forced compilation failure as seen below. Given that the ScalaTask has no inputs, the input object should not have the var d member which was the output of the preceding task.

val a = Capsule(ScalaTask("foo"), strainer = true)

SEVERE: Error in user job execution for capsule scalaTask0, job state is FAILED.
org.openmole.core.exception.InternalProcessingError: Error for context values in scalaTask0 {d=com.andreasschuh.repeat.core.Dataset@80bb1de, oMSeed=-8870641145331932613}
        at org.openmole.core.workflow.tools.InputOutputCheck$class.perform(InputOutputCheck.scala:94)
        at org.openmole.plugin.task.scala.ScalaTask.perform(ScalaTask.scala:33)
        at org.openmole.core.workflow.task.Task$class.perform(Task.scala:47)
        at org.openmole.plugin.task.scala.ScalaTask.perform(ScalaTask.scala:33)
        at org.openmole.core.workflow.mole.StrainerTaskDecorator.process(Capsule.scala:139)
        at org.openmole.core.workflow.mole.StrainerTaskDecorator.perform(Capsule.scala:138)
        at org.openmole.core.workflow.job.MoleJob.perform(MoleJob.scala:99)
        at org.openmole.core.workflow.execution.local.LocalExecutor$$anonfun$1$$anonfun$apply$1.apply(LocalExecutor.scala:70)
        at org.openmole.core.workflow.execution.local.LocalExecutor$$anonfun$1$$anonfun$apply$1.apply(LocalExecutor.scala:62)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.openmole.core.workflow.execution.local.LocalExecutor$$anonfun$1.apply(LocalExecutor.scala:62)
        at org.openmole.core.workflow.execution.local.LocalExecutor$$anonfun$1.apply(LocalExecutor.scala:59)
        at org.openmole.core.workflow.execution.local.LocalExecutor.withRedirectedOutput(LocalExecutor.scala:113)
        at org.openmole.core.workflow.execution.local.LocalExecutor.run(LocalExecutor.scala:59)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.openmole.core.exception.InternalProcessingError: Error compiling import org.openmole.core.workflow.tools.CodeTool._

(_input_value_context: org.openmole.core.workflow.data.Context, _input_value_RNGProvider: org.openmole.core.workflow.data.RandomProvider) => {
    object input {
      var d = _input_value_context("d").asInstanceOf[com.andreasschuh.repeat.core.Dataset]; var oMSeed = _input_value_context("oMSeed").asInstanceOf[Long]; var workDir = _input_value_context("workDir").asInstanceOf[java.io.File]
    }
    import input._
    implicit lazy val oMRNG: util.Random = _input_value_RNGProvider()
    foo

import scala.collection.JavaConversions.mapAsJavaMap
mapAsJavaMap(Map[String, Any](  ))

}

        at org.openmole.core.workflow.tools.ScalaCompilation$$anonfun$compile$1.apply(ScalaCompilation.scala:42)
        at scala.util.Try$.apply(Try.scala:191)
        at org.openmole.core.workflow.tools.ScalaCompilation$class.compile(ScalaCompilation.scala:38)
        at org.openmole.plugin.task.scala.ScalaTask.compile(ScalaTask.scala:33)
        at org.openmole.core.workflow.tools.ScalaWrappedCompilation$class.function(ScalaCompilation.scala:136)
        at org.openmole.plugin.task.scala.ScalaTask.function(ScalaTask.scala:33)
        at org.openmole.core.workflow.tools.ScalaWrappedCompilation$$anonfun$compiled$1.apply(ScalaCompilation.scala:181)
        at org.openmole.core.workflow.tools.ScalaWrappedCompilation$$anonfun$compiled$1.apply(ScalaCompilation.scala:181)
        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
        at org.openmole.core.workflow.tools.ScalaWrappedCompilation$class.compiled(ScalaCompilation.scala:179)
        at org.openmole.plugin.task.scala.ScalaTask.compiled(ScalaTask.scala:33)
        at org.openmole.core.workflow.tools.ScalaWrappedCompilation$class.compiled(ScalaCompilation.scala:194)
        at org.openmole.plugin.task.scala.ScalaTask.compiled(ScalaTask.scala:33)
        at org.openmole.core.workflow.tools.ScalaWrappedCompilation$class.run(ScalaCompilation.scala:197)
        at org.openmole.plugin.task.scala.ScalaTask.run(ScalaTask.scala:33)
        at org.openmole.plugin.task.scala.ScalaTask.processCode(ScalaTask.scala:34)
        at org.openmole.plugin.task.jvm.JVMLanguageTask$class.process(JVMLanguageTask.scala:41)
        at org.openmole.plugin.task.scala.ScalaTask.process(ScalaTask.scala:33)
        at org.openmole.core.workflow.task.Task$$anonfun$perform$1.apply(Task.scala:47)
        at org.openmole.core.workflow.task.Task$$anonfun$perform$1.apply(Task.scala:47)
        at org.openmole.core.workflow.tools.InputOutputCheck$class.perform(InputOutputCheck.scala:91)
        ... 14 more
Caused by: org.openmole.core.exception.UserBadDataError: Error while compiling:
not found: value foo
                  foo
                  ^
on line 17
        at org.openmole.core.console.ScalaREPL.eval(ScalaREPL.scala:70)
        ... 36 more

The text was updated successfully, but these errors were encountered:

schuhschuh · 2015-06-06T13:57:50Z

To clarify, I am mainly interested in reducing the amount of data being transferred to the compute node. I assume if my input variable has some complex custom type or is a File, this type object has to be serialised, transmitted to the compute node, and deserialised again. Same in case of a File, which then would need to be copied to the workDir of the task on the compute node even though the file was not specified as actual input to this particular task but should only be passed on to the following task (and this transition takes place on the local machine, so nothing needs to be serialised/copied).

romainreuillon · 2015-06-06T16:52:14Z

Changing the specification on the strainer would imply additional complexity since the strained inputs and outputs would have to go through something else than the context of the task. I think the way to solve your pb is to use an additional transition instead of a strainer capsule. For instance:

val bigDataProducer = Capsule(Task(....))
val computation = Capsule(Task(...))
val bigDataReader = Capsule(Task(....))

val mole = (bigDataProducer -- computation -- bigDataReader) + (bigDataProducer -- bigDataReader)

Using this pattern the data would be transmitted along the direct transition from bigDataProducer to bigDataReader and will not be an input of the computation task.

schuhschuh · 2015-06-06T22:53:38Z

I see. Well ok, I guess I will be able to live with how it is and/or use your suggestion. The thing is that I am trying to make my rather complex workflow modular and separate tasks reusable. Here the (heavy) use of strainer capsules was quite handy as when I write the task I don't need to think of all the data that is currently flowing through my pipeline...

schuhschuh · 2015-06-07T11:24:24Z

Do you think the following pattern would be a useful addition to OpenMOLE itself ? It's actually just a Skip but without any condition (or negation of the second condition and Condition.True). The name can be changed of course if you have better ideas...

package org.openmole.plugin.tool.pattern

import org.openmole.core.workflow.mole._
import org.openmole.core.workflow.puzzle._
import org.openmole.core.workflow.task._
import org.openmole.core.workflow.transition._

object Strain {

  def apply(puzzle: Puzzle) = {
    val first = Capsule(EmptyTask(), strainer = true)
    val firstSlot = Slot(first)
    val last = Capsule(EmptyTask(), strainer = true)

    (firstSlot -- puzzle -- last) + (firstSlot -- last)
  }

}

schuhschuh · 2015-06-07T17:45:55Z

This Strain pattern doesn't actually seem to work. I get many formal validation errors stating that the inputs for the following task (the strained ones) are missing. Do you spot any mistake in how I implemented this pattern @romainreuillon ?

object Strain {
  def apply(puzzle: Puzzle) = {
    val first = Capsule(EmptyTask(), strainer = true)
    val firstSlot = Slot(first)
    val last = Capsule(EmptyTask(), strainer = true)

    val action = firstSlot -- puzzle -- last
    val strain = firstSlot -- last

    Puzzle.merge(firstSlot, Seq(last), puzzles = Seq(action, strain))
  }
}

val i = Val[Int]

val producer = EmptyTask() set (outputs += i, i := 42)
val consumer = ScalaTask("println(i)") set (inputs += i)
val ex = producer -- Strain(computation) -- consumer start

The error is:

org.openmole.core.exception.UserBadDataError: Formal validation of your mole has failed, several errors have been found: Input i: Int is missing when reaching the Slot hash 173617743 of scalaTask0.
  at org.openmole.core.workflow.mole.MoleExecution.start(MoleExecution.scala:175)
  ... 43 elided

Using a strainer capsule works fine, though:

val ex = producer -- Capsule(computation, strainer = true) -- consumer start

romainreuillon · 2015-06-07T20:31:20Z

The slot should be on the last capsule:

object Strain {
  def apply(puzzle: Puzzle) = {
    val first = Capsule(EmptyTask(), strainer = true)
    val last = Slot(Capsule(EmptyTask(), strainer = true))

    val action = first -- puzzle -- last
    val strain = first -- last

    action + strain
  }
}

Does it work like that?

schuhschuh · 2015-06-07T20:38:37Z

Yes! Thanks for fixing it. It looks like this pattern is exactly what I was looking for!

schuhschuh · 2015-06-07T20:39:40Z

Ah... just closed it b/c I added this Strain object to my plugin. But back to the question, do you think we should add it to the "tool.pattern" plugin of OpenMOLE ?

romainreuillon · 2015-06-07T20:40:11Z

Definitely.

schuhschuh · 2015-06-07T20:41:07Z

I'll send a PR then shortly.

schuhschuh · 2015-06-07T21:18:17Z

Using the new Strain pattern instead of strainer capsules, I can create modular tasks/puzzles which only consume and produce those variables they are really using or generating, respectively. All other data is passed by the task/puzzle via the first -- last shortcut. Problem solved.

schuhschuh · 2015-06-07T22:41:32Z

P.S.: I've just learned there is a limitation to these patterns, though. The wrapped puzzles must aggregate the results already if an exploration took place. I.e., I cannot wrap a single ExplorationTask inside a Strain. For this I still have to use a regular strainer capsule. In code, the following doesn't work because either it produces an Array as output (and hence type mismatch for the task after the exploration transition) or causes a LevelProblem. This should be because only one "branch" of the Strain contains an exploration while the other doesn't. The same should be true for Switch and Skip.

val i = Val[Int]
val t = ScalaTask("println(i)") set (inputs += i)
Strain(ExplorationTask(i in Range(1 to 10))) -< t

schuhschuh closed this as completed Jun 7, 2015

schuhschuh reopened this Jun 7, 2015

schuhschuh mentioned this issue Jun 7, 2015

[Plugin] add: New Strain pattern #73

Merged

schuhschuh closed this as completed Jun 7, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Only declared inputs should be available in task (ScalaTask) #71

Only declared inputs should be available in task (ScalaTask) #71

schuhschuh commented Jun 6, 2015

schuhschuh commented Jun 6, 2015

romainreuillon commented Jun 6, 2015

schuhschuh commented Jun 6, 2015

schuhschuh commented Jun 7, 2015

schuhschuh commented Jun 7, 2015

romainreuillon commented Jun 7, 2015

schuhschuh commented Jun 7, 2015

schuhschuh commented Jun 7, 2015

romainreuillon commented Jun 7, 2015

schuhschuh commented Jun 7, 2015

schuhschuh commented Jun 7, 2015

schuhschuh commented Jun 7, 2015

Only declared inputs should be available in task (ScalaTask) #71

Only declared inputs should be available in task (ScalaTask) #71

Comments

schuhschuh commented Jun 6, 2015

schuhschuh commented Jun 6, 2015

romainreuillon commented Jun 6, 2015

schuhschuh commented Jun 6, 2015

schuhschuh commented Jun 7, 2015

schuhschuh commented Jun 7, 2015

romainreuillon commented Jun 7, 2015

schuhschuh commented Jun 7, 2015

schuhschuh commented Jun 7, 2015

romainreuillon commented Jun 7, 2015

schuhschuh commented Jun 7, 2015

schuhschuh commented Jun 7, 2015

schuhschuh commented Jun 7, 2015