Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only declared inputs should be available in task (ScalaTask) #71

Closed
schuhschuh opened this issue Jun 6, 2015 · 12 comments
Closed

Only declared inputs should be available in task (ScalaTask) #71

schuhschuh opened this issue Jun 6, 2015 · 12 comments

Comments

@schuhschuh
Copy link
Contributor

Using a strainer capsule, all inputs are passed through to the output. Only a subset of these inputs may be used by the encapsulated task such as a ScalaTask (or SystemExecTask). Currently, the Scala script of the ScalaTask is compiled with all strained inputs available to the user code. Instead, only the actual subset of declared ScalaTask inputs should really be made available. My concern is mainly that if all inputs must be available to the ScalaTask which may be executed remotely, all these inputs must be copied to the compute node even if it was made clear that the task will only require a subset of the inputs.

I can tell that all strained inputs are made available to the ScalaTask (not only those added via the tasks InputBuilder) in the error output of a forced compilation failure as seen below. Given that the ScalaTask has no inputs, the input object should not have the var d member which was the output of the preceding task.

val a = Capsule(ScalaTask("foo"), strainer = true)
SEVERE: Error in user job execution for capsule scalaTask0, job state is FAILED.
org.openmole.core.exception.InternalProcessingError: Error for context values in scalaTask0 {d=com.andreasschuh.repeat.core.Dataset@80bb1de, oMSeed=-8870641145331932613}
        at org.openmole.core.workflow.tools.InputOutputCheck$class.perform(InputOutputCheck.scala:94)
        at org.openmole.plugin.task.scala.ScalaTask.perform(ScalaTask.scala:33)
        at org.openmole.core.workflow.task.Task$class.perform(Task.scala:47)
        at org.openmole.plugin.task.scala.ScalaTask.perform(ScalaTask.scala:33)
        at org.openmole.core.workflow.mole.StrainerTaskDecorator.process(Capsule.scala:139)
        at org.openmole.core.workflow.mole.StrainerTaskDecorator.perform(Capsule.scala:138)
        at org.openmole.core.workflow.job.MoleJob.perform(MoleJob.scala:99)
        at org.openmole.core.workflow.execution.local.LocalExecutor$$anonfun$1$$anonfun$apply$1.apply(LocalExecutor.scala:70)
        at org.openmole.core.workflow.execution.local.LocalExecutor$$anonfun$1$$anonfun$apply$1.apply(LocalExecutor.scala:62)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.openmole.core.workflow.execution.local.LocalExecutor$$anonfun$1.apply(LocalExecutor.scala:62)
        at org.openmole.core.workflow.execution.local.LocalExecutor$$anonfun$1.apply(LocalExecutor.scala:59)
        at org.openmole.core.workflow.execution.local.LocalExecutor.withRedirectedOutput(LocalExecutor.scala:113)
        at org.openmole.core.workflow.execution.local.LocalExecutor.run(LocalExecutor.scala:59)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.openmole.core.exception.InternalProcessingError: Error compiling import org.openmole.core.workflow.tools.CodeTool._

(_input_value_context: org.openmole.core.workflow.data.Context, _input_value_RNGProvider: org.openmole.core.workflow.data.RandomProvider) => {
    object input {
      var d = _input_value_context("d").asInstanceOf[com.andreasschuh.repeat.core.Dataset]; var oMSeed = _input_value_context("oMSeed").asInstanceOf[Long]; var workDir = _input_value_context("workDir").asInstanceOf[java.io.File]
    }
    import input._
    implicit lazy val oMRNG: util.Random = _input_value_RNGProvider()
    foo

import scala.collection.JavaConversions.mapAsJavaMap
mapAsJavaMap(Map[String, Any](  ))

}

        at org.openmole.core.workflow.tools.ScalaCompilation$$anonfun$compile$1.apply(ScalaCompilation.scala:42)
        at scala.util.Try$.apply(Try.scala:191)
        at org.openmole.core.workflow.tools.ScalaCompilation$class.compile(ScalaCompilation.scala:38)
        at org.openmole.plugin.task.scala.ScalaTask.compile(ScalaTask.scala:33)
        at org.openmole.core.workflow.tools.ScalaWrappedCompilation$class.function(ScalaCompilation.scala:136)
        at org.openmole.plugin.task.scala.ScalaTask.function(ScalaTask.scala:33)
        at org.openmole.core.workflow.tools.ScalaWrappedCompilation$$anonfun$compiled$1.apply(ScalaCompilation.scala:181)
        at org.openmole.core.workflow.tools.ScalaWrappedCompilation$$anonfun$compiled$1.apply(ScalaCompilation.scala:181)
        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
        at org.openmole.core.workflow.tools.ScalaWrappedCompilation$class.compiled(ScalaCompilation.scala:179)
        at org.openmole.plugin.task.scala.ScalaTask.compiled(ScalaTask.scala:33)
        at org.openmole.core.workflow.tools.ScalaWrappedCompilation$class.compiled(ScalaCompilation.scala:194)
        at org.openmole.plugin.task.scala.ScalaTask.compiled(ScalaTask.scala:33)
        at org.openmole.core.workflow.tools.ScalaWrappedCompilation$class.run(ScalaCompilation.scala:197)
        at org.openmole.plugin.task.scala.ScalaTask.run(ScalaTask.scala:33)
        at org.openmole.plugin.task.scala.ScalaTask.processCode(ScalaTask.scala:34)
        at org.openmole.plugin.task.jvm.JVMLanguageTask$class.process(JVMLanguageTask.scala:41)
        at org.openmole.plugin.task.scala.ScalaTask.process(ScalaTask.scala:33)
        at org.openmole.core.workflow.task.Task$$anonfun$perform$1.apply(Task.scala:47)
        at org.openmole.core.workflow.task.Task$$anonfun$perform$1.apply(Task.scala:47)
        at org.openmole.core.workflow.tools.InputOutputCheck$class.perform(InputOutputCheck.scala:91)
        ... 14 more
Caused by: org.openmole.core.exception.UserBadDataError: Error while compiling:
not found: value foo
                  foo
                  ^
on line 17
        at org.openmole.core.console.ScalaREPL.eval(ScalaREPL.scala:70)
        ... 36 more

@schuhschuh
Copy link
Contributor Author

To clarify, I am mainly interested in reducing the amount of data being transferred to the compute node. I assume if my input variable has some complex custom type or is a File, this type object has to be serialised, transmitted to the compute node, and deserialised again. Same in case of a File, which then would need to be copied to the workDir of the task on the compute node even though the file was not specified as actual input to this particular task but should only be passed on to the following task (and this transition takes place on the local machine, so nothing needs to be serialised/copied).

@romainreuillon
Copy link
Contributor

Changing the specification on the strainer would imply additional complexity since the strained inputs and outputs would have to go through something else than the context of the task. I think the way to solve your pb is to use an additional transition instead of a strainer capsule. For instance:

val bigDataProducer = Capsule(Task(....))
val computation = Capsule(Task(...))
val bigDataReader = Capsule(Task(....))

val mole = (bigDataProducer -- computation -- bigDataReader) + (bigDataProducer -- bigDataReader)

Using this pattern the data would be transmitted along the direct transition from bigDataProducer to bigDataReader and will not be an input of the computation task.

@schuhschuh
Copy link
Contributor Author

I see. Well ok, I guess I will be able to live with how it is and/or use your suggestion. The thing is that I am trying to make my rather complex workflow modular and separate tasks reusable. Here the (heavy) use of strainer capsules was quite handy as when I write the task I don't need to think of all the data that is currently flowing through my pipeline...

@schuhschuh
Copy link
Contributor Author

Do you think the following pattern would be a useful addition to OpenMOLE itself ? It's actually just a Skip but without any condition (or negation of the second condition and Condition.True). The name can be changed of course if you have better ideas...

package org.openmole.plugin.tool.pattern

import org.openmole.core.workflow.mole._
import org.openmole.core.workflow.puzzle._
import org.openmole.core.workflow.task._
import org.openmole.core.workflow.transition._

object Strain {

  def apply(puzzle: Puzzle) = {
    val first = Capsule(EmptyTask(), strainer = true)
    val firstSlot = Slot(first)
    val last = Capsule(EmptyTask(), strainer = true)

    (firstSlot -- puzzle -- last) + (firstSlot -- last)
  }

}

@schuhschuh
Copy link
Contributor Author

This Strain pattern doesn't actually seem to work. I get many formal validation errors stating that the inputs for the following task (the strained ones) are missing. Do you spot any mistake in how I implemented this pattern @romainreuillon ?

object Strain {
  def apply(puzzle: Puzzle) = {
    val first = Capsule(EmptyTask(), strainer = true)
    val firstSlot = Slot(first)
    val last = Capsule(EmptyTask(), strainer = true)

    val action = firstSlot -- puzzle -- last
    val strain = firstSlot -- last

    Puzzle.merge(firstSlot, Seq(last), puzzles = Seq(action, strain))
  }
}

val i = Val[Int]

val producer = EmptyTask() set (outputs += i, i := 42)
val consumer = ScalaTask("println(i)") set (inputs += i)
val ex = producer -- Strain(computation) -- consumer start

The error is:

org.openmole.core.exception.UserBadDataError: Formal validation of your mole has failed, several errors have been found: Input i: Int is missing when reaching the Slot hash 173617743 of scalaTask0.
  at org.openmole.core.workflow.mole.MoleExecution.start(MoleExecution.scala:175)
  ... 43 elided

Using a strainer capsule works fine, though:

val ex = producer -- Capsule(computation, strainer = true) -- consumer start

@romainreuillon
Copy link
Contributor

The slot should be on the last capsule:

object Strain {
  def apply(puzzle: Puzzle) = {
    val first = Capsule(EmptyTask(), strainer = true)
    val last = Slot(Capsule(EmptyTask(), strainer = true))

    val action = first -- puzzle -- last
    val strain = first -- last

    action + strain
  }
}

Does it work like that?

@schuhschuh
Copy link
Contributor Author

Yes! Thanks for fixing it. It looks like this pattern is exactly what I was looking for!

@schuhschuh
Copy link
Contributor Author

Ah... just closed it b/c I added this Strain object to my plugin. But back to the question, do you think we should add it to the "tool.pattern" plugin of OpenMOLE ?

@romainreuillon
Copy link
Contributor

Definitely.

@schuhschuh
Copy link
Contributor Author

I'll send a PR then shortly.

@schuhschuh
Copy link
Contributor Author

Using the new Strain pattern instead of strainer capsules, I can create modular tasks/puzzles which only consume and produce those variables they are really using or generating, respectively. All other data is passed by the task/puzzle via the first -- last shortcut. Problem solved.

@schuhschuh
Copy link
Contributor Author

P.S.: I've just learned there is a limitation to these patterns, though. The wrapped puzzles must aggregate the results already if an exploration took place. I.e., I cannot wrap a single ExplorationTask inside a Strain. For this I still have to use a regular strainer capsule. In code, the following doesn't work because either it produces an Array as output (and hence type mismatch for the task after the exploration transition) or causes a LevelProblem. This should be because only one "branch" of the Strain contains an exploration while the other doesn't. The same should be true for Switch and Skip.

val i = Val[Int]
val t = ScalaTask("println(i)") set (inputs += i)
Strain(ExplorationTask(i in Range(1 to 10))) -< t

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants