Accessing workflow final output #18

Open
hbredin opened this issue Mar 2, 2016 · 3 comments

Comments

hbredin commented Mar 2, 2016

I plan to use the hyperopt hyper-parameter optimization toolkit in combination with sciluigi workflows.

Instead of doing an exhaustive grid search, hyperopt will (smartly) propose a set of parameters, params, that I use like this:

task = MyWorkflow(**params)
luigi.build([task], local_scheduler=True)

In my current setup, the final task of the workflow writes the value of the objective to a file.
Therefore, I have to read this file back to let hyperopt know the value of the objective for the current set of parameters. This means I have to know exactly where the workflow will save this file.

I expected task.output() to return the output of the final task but instead it returns a dict with audit and log values. Is there an easy way to access the output of the final task of the workflow?

Even better, is there a way to make the workflow return the value of the objective directly?
objective = f(task)
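
For illustration, here is a minimal sketch of what such an f could look like today, given that the objective value ends up in a file. It is not part of sciluigi or hyperopt: make_objective, run_workflow, result_name and the result_path parameter are hypothetical names, and the workflow runner is passed in as a callable so the sketch runs without luigi installed.

```python
import os
import tempfile


def make_objective(run_workflow, result_name="objective.txt"):
    """Return f(params) -> float, usable as a hyperopt-style objective.

    `run_workflow` is a hypothetical callable that runs the workflow for one
    parameter set and writes the objective value to params["result_path"].
    """
    def objective(params):
        # Use a fresh directory per evaluation, so the caller always knows
        # where the result file will be written.
        with tempfile.TemporaryDirectory() as workdir:
            result_path = os.path.join(workdir, result_name)
            run_workflow(dict(params, result_path=result_path))
            with open(result_path) as fp:
                return float(fp.read().strip())
    return objective


# Stand-in runner for illustration only. With luigi it might instead be:
#   lambda p: luigi.build([MyWorkflow(**p)], local_scheduler=True)
# which assumes MyWorkflow accepts a (hypothetical) result_path parameter.
def fake_run(params):
    with open(params["result_path"], "w") as fp:
        fp.write(str((params["x"] - 3) ** 2))


objective = make_objective(fake_run)
print(objective({"x": 5}))
```

The resulting objective could then be handed to hyperopt's fmin together with a search space and a search algorithm such as tpe.suggest. The drawback described above remains: the workflow still has to be told where to write its result.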

Member

samuell commented Mar 2, 2016

Hi @hbredin! Interesting use case, it sounds very similar to what we are doing with machine learning in drug discovery / cheminformatics!

It sounds like you are basically looking to use the workflow task as a "subworkflow", as part of a larger workflow. Is that correct?

We have plans (see this issue) to implement sub-workflow support in SciLuigi. It should not be a big development effort, but we simply haven't gotten to it yet, as we have managed without it so far. Now that there is more need for it, we should probably look into implementing it.

Author

hbredin commented Mar 5, 2016

Indeed, it would be great to have sciluigi.Workflow implement the standard luigi.Task interface:

  • .requires(self)
  • .run(self)
  • .output(self)

But, really, what I am looking for right now is a standard .output() method that would return the same thing as luigi.Tasks do -- so that I can use workflow.output().path.

Author

hbredin commented Mar 17, 2016

FYI, I ended up adding a dummy Hyperopt task at the end of my workflow that writes the path of the final task's output to a temporary file, whose path (temp) is provided as a parameter:

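import luigi
import sciluigi

class Hyperopt(sciluigi.Task):
    """Dummy task that records the path of the final task's output."""

    temp = luigi.Parameter()  # path of the temporary file to write
    in_final = None           # connected to the final task's out_put in the workflow

    def out_put(self):
        return sciluigi.TargetInfo(self, self.temp)

    def run(self):
        # write the path of the final task's output to the temporary file
        with self.out_put().open('w') as fp:
            fp.write(self.in_final().path)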

class MyWorkflow(sciluigi.WorkflowTask):

    hyperopt = luigi.Parameter()

    def workflow(self):
        ...
        final_task = self.new_task(...)
        # return final_task

        hyperopt = self.new_task('hyperopt', Hyperopt, temp=self.hyperopt)
        hyperopt.in_final = final_task.out_put
        return hyperopt

Then, to get the path of the output of the final task, here is what I have to do:

import os
from tempfile import mkdtemp

# create path to temporary file and add it to the set of parameters
directory = mkdtemp()
params['hyperopt'] = os.path.join(directory, 'hyperopt')

# actually run the workflow with this extended set of parameters
task = MyWorkflow(**params)
luigi.build([task], local_scheduler=True)

# obtain the path of final task output from the temporary hyperopt file
with open(params['hyperopt'], 'r') as fp:
    path = fp.read()

# do something with path...

I will now try to write a decorator for any sciluigi.WorkflowTask that does this change automatically.
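
One possible shape for such a decorator, as a rough sketch only. It takes a different route from the temporary-file approach above: it wraps workflow() and stores the final task's output path on the workflow instance. It assumes workflow() returns the final task and that the task exposes out_put() returning an object with a .path attribute, as in the snippets above. remember_final_path and final_path are hypothetical names, and the Fake* classes are duck-typed stand-ins so the sketch runs without luigi or sciluigi.

```python
import functools


def remember_final_path(workflow_cls):
    """Class decorator: record the path of the final task's output.

    Assumes workflow_cls.workflow() returns the final task, and that the
    task's out_put() returns an object with a .path attribute.
    """
    original = workflow_cls.workflow

    @functools.wraps(original)
    def workflow(self, *args, **kwargs):
        final_task = original(self, *args, **kwargs)
        # final_path is a hypothetical attribute name, not part of sciluigi.
        self.final_path = final_task.out_put().path
        return final_task

    workflow_cls.workflow = workflow
    return workflow_cls


# Duck-typed stand-ins, for illustration only.
class FakeTarget:
    def __init__(self, path):
        self.path = path


class FakeTask:
    def out_put(self):
        return FakeTarget("/tmp/final.txt")


@remember_final_path
class FakeWorkflow:
    def workflow(self):
        return FakeTask()


wf = FakeWorkflow()
wf.workflow()
print(wf.final_path)
```

Whether this works with a real sciluigi.WorkflowTask depends on luigi calling workflow() on the same instance, in the same process, as the one inspected after luigi.build. That is an untested assumption here.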
