reana.yaml: multi-line instructions #42

Open
tiborsimko opened this issue Sep 13, 2018 · 2 comments

@tiborsimko
Member

Currently we have long instructions in reana.yaml such as:

workflow:
  type: serial
  specification:
    steps:
      - environment: 'reanahub/reana-env-jupyter'
        commands:
          - mkdir -p results && papermill ${notebook} /dev/null -p input_file ${input_file} -p output_file ${output_file} -p region ${region} -p year_min ${year_min} -p year_max ${year_max}

written on one single line.

It would be useful to accept multi-line formats such as:

workflow:
  type: serial
  specification:
    steps:
      - environment: 'reanahub/reana-env-jupyter'
        commands:
          - mkdir -p results && 
            papermill ${notebook} /dev/null 
                 -p input_file ${input_file} 
                 -p output_file ${output_file} 
                 -p region ${region} 
                 -p year_min ${year_min} 
                 -p year_max ${year_max}

for better readability.

A quick experiment with YAML's standard '>' technique to allow for newlines did not work; see reanahub/reana-demo-worldpopulation#22 (comment).

Investigate this.

@tiborsimko tiborsimko added this to the v0.4.0 milestone Sep 18, 2018
@diegodelemos diegodelemos self-assigned this Sep 24, 2018
diegodelemos pushed a commit to diegodelemos/reana-demo-helloworld that referenced this issue Sep 24, 2018
* Uses multiline command for better readability
  (addresses reanahub/reana-workflow-engine-serial#42).
diegodelemos pushed a commit to diegodelemos/reana-demo-helloworld that referenced this issue Sep 24, 2018
* Uses multi-line command for better readability
  (addresses reanahub/reana-workflow-engine-serial#42).
@diegodelemos
Member

After investigating the YAML standard regarding multi-line strings, I have found three ways in which we can support multi-line commands:

1. Using the > syntax (block scalar, folded style):

workflow:
  type: serial
  specification:
    steps:
      - environment: 'python:2.7'
        commands:
        - >
          echo "Running ${helloworld}." &&
          python "${helloworld}"
          --sleeptime ${sleeptime}
          --inputfile "${inputfile}"
          --outputfile "${outputfile}"

Available to try at reanahub/reana-demo-helloworld@07c8fdb.

Potential source of errors with this approach: it took me a while to realise why the > syntax was not working. As the standard states, line folding (which allows long lines to be broken for readability) is not applied to lines that are indented more than the rest of the block, so the next example wouldn't work (more info here):

workflow:
  type: serial
  specification:
    steps:
      - environment: 'python:2.7'
        commands:
         - >
           echo "Running ${helloworld}." &&
           python "${helloworld}"
-          --sleeptime ${sleeptime}
-          --inputfile "${inputfile}"
-          --outputfile "${outputfile}"
+                 --sleeptime ${sleeptime}
+                 --inputfile "${inputfile}"
+                 --outputfile "${outputfile}"

This is what the command looks like in the container:

$ kubectl get -o yaml pod bc381d0e-7ca6-43dd-87cb-2a02d0758a45-4dgp9
...
  - command:
    - bash
    - -c
    - "cd /reana/users/00000000-0000-0000-0000-000000000000/workflows/03d7521d-e606-4d48-b9d7-4a9e42ad0e15
      ; echo \"Running code/helloworld.py.\" && python \"code/helloworld.py\" --sleeptime
      2 --inputfile \"inputs/names.txt\" --outputfile \"outputs/greetings.txt\"\n "
...
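
The folding rule behind the indentation pitfall above can be sketched in a few lines of Python. This is a simplified illustration, not a YAML parser: the `fold` helper is hypothetical, it assumes the block contains no blank lines and uses the default (clip) chomping, and it only models the rule that a line break becomes a space unless one of the neighbouring lines is more indented.

```python
import textwrap


def fold(block: str) -> str:
    """Approximate how a YAML '>' (folded) block scalar is loaded.

    Simplifying assumptions: no blank lines inside the block and
    default (clip) chomping, i.e. exactly one trailing newline.
    """
    # Drop the common indentation, as a YAML loader does.
    lines = textwrap.dedent(block).strip("\n").split("\n")
    result = lines[0]
    for prev, cur in zip(lines, lines[1:]):
        # A line break next to a more-indented line is kept as-is;
        # otherwise it is folded into a single space.
        more_indented = prev.startswith((" ", "\t")) or cur.startswith((" ", "\t"))
        result += ("\n" if more_indented else " ") + cur
    return result + "\n"


ALIGNED = """
    python "${helloworld}"
    --sleeptime ${sleeptime}
    --inputfile "${inputfile}"
"""

MISALIGNED = """
    python "${helloworld}"
           --sleeptime ${sleeptime}
           --inputfile "${inputfile}"
"""

print(repr(fold(ALIGNED)))     # a single line: the shell sees one command
print(repr(fold(MISALIGNED)))  # newlines survive: each option lands on its own line
```

With the misaligned block, the shell receives `--sleeptime ...` on a line of its own and tries to run it as a separate command, which matches the failure described above.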

2. Using the | syntax (block scalar, literal style):

workflow:
  type: serial
  specification:
    steps:
      - environment: 'python:2.7'
        commands:
        - |
          echo "Running ${helloworld}."
          python "${helloworld}" --sleeptime ${sleeptime} \
                                 --inputfile "${inputfile}" \
                                 --outputfile "${outputfile}"

Available to try at reanahub/reana-demo-helloworld@3fdcc47.

This approach is closer to the Dockerfile command syntax.

This is how it ends up looking inside the container:

$ kubectl get -o yaml pod 3a774fc3-5a83-4305-be68-edf93382e78d-wv579
...
  - command:
    - bash
    - -c
    - "cd /reana/users/00000000-0000-0000-0000-000000000000/workflows/6fb6fc46-a8d9-46b2-9bb6-6875e9537833
      ; echo \"Running code/helloworld.py.\"\npython \"code/helloworld.py\" --sleeptime
      2 \\\n                       --inputfile \"inputs/names.txt\" \\\n                       --outputfile
      \"outputs/greetings.txt\"\n "
...
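
To see how the shell treats such a literal-style string, here is a small Python sketch that passes a similar command string to `sh -c`. It assumes a POSIX `sh` is available on the PATH; the `echo` commands are stand-ins for the real workflow step and are not taken from the demo repository.

```python
import subprocess

# What a literal-style ('|') scalar hands to the shell: newlines are
# preserved, so a backslash right before a newline is a line continuation.
command = (
    'echo "first command"\n'
    "echo second \\\n"
    "     command\n"
)

completed = subprocess.run(
    ["sh", "-c", command], capture_output=True, text=True, check=True
)
print(completed.stdout)
```

The plain newline separates two commands, while the backslash-newline pair is removed by the shell, so the continuation line joins the second `echo`.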

3. Using no indicator (flow scalar, plain style):

workflow:
  type: serial
  specification:
    steps:
      - environment: 'python:2.7'
        commands:
        - echo "Running ${helloworld}." &&
          python "${helloworld}" --sleeptime ${sleeptime}
                                 --inputfile "${inputfile}"
                                 --outputfile "${outputfile}"

Available to try at reanahub/reana-demo-helloworld@9bce493.

This approach is the least powerful since it has many limitations: to avoid ambiguity, many characters are forbidden in plain scalars. It is also possible to enclose the whole string in single or double quotes and escape all forbidden characters inside the string (more info here).

$ kubectl get -o yaml pod a8f3a74c-8e1b-4266-8311-9b64e0f31120-4mdbl
...
  - command:
    - bash
    - -c
    - 'cd /reana/users/00000000-0000-0000-0000-000000000000/workflows/b953b28f-b7f2-44fa-a60c-8464fd65ad45
      ; echo "Running code/helloworld.py." && python "code/helloworld.py" --sleeptime
      2 --inputfile "inputs/names.txt" --outputfile "outputs/greetings.txt" '
...

In conclusion, I think we should definitely go for a block scalar because option 3 will potentially end up being messy with escaped characters. Regarding block scalars, I would choose the literal style (option 2) since the indentation problem with the folded style (option 1) will definitely end up creating problems for users. Moreover, the standard recommends the literal style for code blocks.

cc'ing @reanahub/developers since this directly affects users.

@tiborsimko
Member Author

@diegodelemos Nice summary; I also prefer option 2, where the use of backslashes seems rather intuitive. (E.g. Travis CI does the same in multi-line conditions https://docs.travis-ci.com/user/conditions-v1#line-continuation-multiline-conditions.)

However, I am not sure about the "visual non-splitting" of the echo and python commands in your second example; e.g. see its JSON representation:

$ yaml2json reana.yaml | jq -S '.workflow.specification.steps'
[
  {
    "commands": [
      "echo \"Running ${helloworld}.\"\npython \"${helloworld}\" --sleeptime ${sleeptime} \\\n                       --inputfile \"${inputfile}\" \\\n                       --outputfile \"${outputfile}\"\n"
    ],
    "environment": "python:2.7"
  }
]

The notion that the commands are multiple is lost there. Would be nice if commands were a list.

Seeing

 - command1 arg11 arg12
   command2 arg21 arg22 arg23 \
            arg24 arg25

people might treat it as:

 - command1 arg11 arg12 && \
   command2 arg21 arg22 arg23 \
            arg24 arg25

Consider something along the lines of:

 - command1 arg11 arg12
 - command2 arg21 arg22 arg23 \
            arg24 arg25
 - command3 arg31 arg32
 - command4 arg41 arg42 arg43 \
            arg44 arg45
 - command5 arg51

...
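
One way to picture the list form suggested above is a hypothetical helper, not part of REANA, that turns a literal-style block into one string per logical command by joining backslash continuations. A minimal sketch, assuming simple commands with no quoted strings spanning several lines and no here-documents:

```python
from typing import List


def split_commands(block: str) -> List[str]:
    """Split a literal-style block into one string per logical command."""
    commands: List[str] = []
    pending = ""
    for line in block.splitlines():
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            # Continuation line: drop the backslash and keep collecting.
            # Note: trailing spaces after the backslash are tolerated here,
            # whereas a real shell would not treat that as a continuation.
            pending += stripped[:-1].strip() + " "
            continue
        full = (pending + line.strip()).strip()
        pending = ""
        if full:
            commands.append(full)
    if pending.strip():
        # The block ended on a continuation line; keep what was collected.
        commands.append(pending.strip())
    return commands


block = (
    "command1 arg11 arg12\n"
    "command2 arg21 arg22 arg23 \\\n"
    "         arg24 arg25\n"
    "command3 arg31 arg32\n"
)
for command in split_commands(block):
    print(command)
```

Each entry of the returned list corresponds to one item of the `commands` list in the YAML form above, which keeps the notion of separate commands visible.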

@diegodelemos diegodelemos modified the milestones: v0.4.0, Someday Oct 19, 2018
@diegodelemos diegodelemos removed this from the Someday milestone Oct 4, 2019
@diegodelemos diegodelemos removed their assignment Oct 7, 2019