reana.yaml: parameter array read from file #305

Closed
alintulu opened this issue Apr 22, 2020 · 4 comments · Fixed by reanahub/reana-workflow-engine-yadage#155 or reanahub/reana-commons#196

Comments

@alintulu
Member

Both CWL and Yadage provide a “scatter-gather” paradigm. The workflow takes the input as an array and runs the specified steps on each element of the array as if it were a single input (Yadage also lets you specify a batch size).

The array can be declared in reana.yaml under inputs: parameters: like in the example from the Awesome Workshop.

Currently the parameter array has to be declared explicitly, with each element written on its own line in reana.yaml.

inputs:
  parameters:
    cross_sections:
      - 19.6
      - 1.55
     [...]

This is okay when you have 2-10 entries, but it is not realistic to enter 1500 entries by hand, as may be the case (for example, names of data set files).

To be added:

Allow specifying a file to read the entries from. Each line in the file would be taken as one entry of the array.

inputs:
  parameters:
    cross_sections:
      - index.txt

Instead of adding 1500 lines to reana.yaml, those lines could be read from index.txt. The parameter array cross_sections would then be provided to CWL or Yadage, which would use it as the input for their “scatter-gather” paradigm.
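
A minimal sketch of the proposed behaviour, in Python, assuming one entry per line and that blank lines are ignored. The function name read_parameter_array and its convert argument are hypothetical and are not part of REANA.

```python
from pathlib import Path
from typing import Callable, List


def read_parameter_array(path: str, convert: Callable[[str], object] = str) -> List[object]:
    """Return one array entry per non-empty line of the file at `path`.

    Hypothetical helper: `convert` turns each line into the wanted type,
    e.g. float for cross sections or str for dataset file names.
    """
    entries = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:  # skip blank lines
            entries.append(convert(line))
    return entries
```

With an index.txt holding the lines 19.6 and 1.55, read_parameter_array("index.txt", convert=float) would give the same array as the explicit cross_sections declaration above.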

@tiborsimko
Member

Both CWL and Yadage can have inputs specified as separate files. Example for CWL:

$ cat reana.yaml
inputs:
  parameters:
    input: workflow/input.yml
workflow:
  type: cwl
  file: workflow/workflow.cwl

$ cat workflow/input.yml
library:
  class: File
  path: src/PhysicsObjectsHistos.cc
build_file:
  class: File
  path: BuildFile.xml
validation_script:
  class: File
  path: demoanalyzer_cfg.py

So you could use this technique: create a big input.yml that lists all the cross section values, all the dataset ROOT files, etc., and this should work.
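
Such a large input file does not have to be written by hand. A minimal sketch, assuming the values are already available as a Python list and should end up as a YAML list under a single key; render_input_yaml is a hypothetical helper, not part of REANA, CWL or Yadage. It uses json.dumps to quote each value, relying on JSON scalars also being valid YAML scalars.

```python
import json
from typing import Iterable


def render_input_yaml(name: str, values: Iterable[object]) -> str:
    """Render `values` as a YAML block list under the key `name`.

    Hypothetical helper: each value is written as a JSON scalar,
    which is also a valid YAML scalar.
    """
    lines = [f"{name}:"]
    for value in values:
        lines.append(f"  - {json.dumps(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the generated text to the input file used by the workflow.
    with open("input.yml", "w") as handle:
        handle.write(render_input_yaml("cross_sections", [19.6, 1.55]))
```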

  • For Yadage it is also possible to do something like yadage-run workflow.yaml inputs.yaml, but I'm not sure we have any concrete example tested on REANA yet. So the "passing of input files" may need to be added to r-w-e-yadage, perhaps.

  • For CWL, we do have many examples, so this should work out of the box already.

Can you try to create a vanilla cwltool or yadage-run example using such an input file? Once you have an example ready, we can see how to best convert it to reana.yaml.

P.S. See e.g. reana-demo-worldpopulation CWL example that has 4-5 parameters.

@alintulu
Member Author

alintulu commented Apr 23, 2020

Simple example of Yadage containing

  • Scatter-gather paradigm
  • Reading inputs from file

can be found here. The workflow runs with

yadage-run workdir workflow.yaml input.yaml

where the input is read from input.yaml. The next step is figuring out how to best pass the input parameters from input.yaml to the workflow when the file is declared in reana.yaml. As mentioned, this already works for CWL :)

@alintulu
Member Author

alintulu commented May 12, 2020

In Yadage it seems that initdata is a JSON object with key-value pairs of 'parameter name'-'parameter value'. It can be set in two ways:

  • initfiles - YAML files, either passed on the command line or picked up by default if named input.yml
  • parameters - parameters passed on the command line like -p pname=pvalue

In REANA, initdata is set to workflow_parameters in reana_workflow_engine_yadage/cli.py and reana_workflow_controller/workflow_run_manager.py, which in turn is set from parameters in reana_db/models.py.

parameters are read from the inputs: parameters: field of reana.yaml.

inputs:
    parameters:

That is, the initdata passed to Yadage can currently only be set by defining the parameters in reana.yaml.

It also seems that initfiles cannot be passed directly to Yadage, since only initdata is specified in steering_ctx.

Hence we cannot just create an input.yml file and hand it to Yadage as initfiles. Instead, we have to create a method in REANA that sets initdata as follows:

  1. given an input.yml file, create a JSON object with the key-value pairs specified in the YAML file
  2. add that JSON object to parameters, which in turn sets initdata
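
The two steps above could be sketched roughly as follows. This is only an illustration: build_initdata and its arguments are hypothetical names, not existing REANA or Yadage code. The loader defaults to json.load so that the sketch runs with the standard library alone; for a real input.yml one would pass a YAML loader such as yaml.safe_load from PyYAML. It also assumes that parameters declared explicitly in reana.yaml should take precedence over values coming from the file.

```python
import json
from typing import Any, Callable, Dict, Optional


def build_initdata(
    initfile_path: Optional[str],
    parameters: Optional[Dict[str, Any]],
    load: Callable[[Any], Any] = json.load,
) -> Dict[str, Any]:
    """Merge key-value pairs from an input file with explicit parameters.

    Hypothetical helper. `load` parses an open file handle into a dict;
    pass yaml.safe_load (PyYAML) to read YAML instead of JSON.
    """
    initdata: Dict[str, Any] = {}
    if initfile_path:
        with open(initfile_path) as handle:
            loaded = load(handle) or {}  # an empty file yields no parameters
        if not isinstance(loaded, dict):
            raise ValueError("input file must contain key-value pairs")
        initdata.update(loaded)
    # Assumption: explicitly declared parameters override file values.
    initdata.update(parameters or {})
    return initdata
```

The returned dictionary is what would then be handed over as initdata.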

@tiborsimko
Member

Regarding user interface, we should introduce a new option initfiles that people can use in their reana.yaml, similarly to the recently-added options initdir and toplevel. In this way the analysis will have explicitly documented its input files and/or parameters.

Regarding implementation, the r-w-e-yadage would have to do something like the following to merge the input file parameters and command-line parameters:

from yadage.utils import getinit_data
initdata = getinit_data(initfiles, parameter)

in order to pass the resulting merged initdata to the yadage steering. (See Yadage sources.)
