Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

abbreviated syntax for passthrough call inputs #365

Merged
merged 4 commits into from
Nov 25, 2020

Conversation

mlin
Copy link
Member

@mlin mlin commented May 21, 2020

Very often our workflow has multiple tasks sharing some inputs in common (e.g. reference_genome_fasta or docker_image), and we bind them by declaring workflow input(s) with the same name, then mechanically passing those through each applicable call inputs: section. This results in repetitive foo=foo, bar=bar entries in call inputs throughout our workflow. Here we propose to allow abbreviating foo=foo, bar=bar to just foo, bar.

Implementation: https://github.com/chanzuckerberg/miniwdl/pull/388/files

Checklist

  • Pull request details were added to CHANGELOG.md

@rhpvorderman
Copy link
Contributor

This would be very useful, especially in workflows such as this one.

@patmagee
Copy link
Member

This is an interesting change and I think would definately simplify alot of the workflows that ive written. Its kind of python esque. It also seems like its one step towards treating Call blocks as though they themselves are functions.

@mlin would you be interested in taking a stab at update the grammar to support this? You should only have to change the parser.

@jdidion jdidion changed the base branch from master to main June 29, 2020 14:04
@patmagee patmagee added this to the v2.0 milestone Aug 28, 2020
@patmagee
Copy link
Member

patmagee commented Sep 2, 2020

@mlin I actually really like this idea as a way to reduce the boilerplate that a workflow would need. I wonder if a logical extension of this would also (or only) allow for position based passing of inputs.

For example, lets assume you have the following task

task foo {
  input {
    String biz
    String baz
    String bop
  }
  ...
  output {
    String bar = "..."
  }
}

Using your approach, the ordering of the inputs does not matter, only the name of the variables. As long as the variables share the same name as the task variables then passing the inputs in is incredibly easy.

String biz
String baz
String bop

# the user could map the similarily named variables in any order
# like this
call foo as pass_like_python {
  input: biz, baz, bop
}
# or like this. Its important to note that this and the previous call are equivalent
call foo as pass_like_python_2 {
  input: baz,biz,bop
}

This is only useful for a call which has no upstream dependencies and where the variables in the workflow match exactly what is in the task. A more flexible approach could potentially be using the position of the inputs instead of their names. This has the benefit of working for all use cases, even when upstream dependencies are passed in, and where there are no common variable names. The downside to this is that positional baseda arguments requires the user to reference the task

# Variables with no task reference, so they cannot be passed as original propossed
String a
String b
String c

#
call foo as pass_by_position {
  input: a, b,c 
}

# changing the order of inputs changes which task input is being bound
# the following is NOT the same as the above
call foo as pass_by_position_2 {
  input: c,a,b
}

# We could also use an upstream call outputs in a call reference
call foo pass_by_position_with_dependency {
  input: pass_by_position.bar, b, c
}

@rhpvorderman what I would be interested in knowing is which approach you would rather do?

@rhpvorderman
Copy link
Contributor

@patmagee I am in favour of the original proposal by @mlin so name-based. This has some advantages over position based.

  • Position-based is a nightmare for maintenance of the task. You can not add an extra option because it might ruin all depending workflows. So you can only add extra options add the very bottom, not group them with similar options.
  • Name-based is more explicit. What you see being passed-trough is what actually being passed trough. This will prevent a lot of semantic errors.

I.e. Take this name based example you wrote.

String biz
String baz
String bop

# the user could map the similarily named variables in any order
# like this
call foo as pass_like_python {
  input: biz, baz, bop
}
# or like this. Its important to note that this and the previous call are equivalent
call foo as pass_like_python_2 {
  input: baz,biz,bop
}

With name-based these have the same meaning. With position-based call foo as pass_like_python_2 will behave differently. And there is no way to know this except to look at the task definition. Which means you will have to go to a different section of the file, or to another file even. This is very inconvenient. Quick reading and comprehension is best served by being able to read the code from top-to-bottom from left-to-right. Name-based does allow this.

@patmagee
Copy link
Member

patmagee commented Sep 3, 2020

@rhpvorderman With the Name based lookups wouldnt you still need to look in the task to determine what inputs are named?

@mlin
Copy link
Member Author

mlin commented Sep 3, 2020

Task input declarations and calls to them currently have no inherent order, so introducing a positional concept seems to me like a relatively dramatic language change to contemplate at this stage. I think the ergonomics could be challenging given that non-toy WDL tasks commonly use 10+ inputs.

I think positional arguments would be great for a subset of calls with few inputs & reused numerous times in a workflow or a library of workflows -- i.e. more like a user-defined library function as we've discussed before, which would entail other sugar too.

@illusional
Copy link
Contributor

Yeah I'm much more in favour of name-based arguments:

  • I agree with Mike, task inputs don't feel they have an inherent order,
  • Name-based is more self-documenting, position based makes accidental swaps prone,
  • And positional when there's optional inputs means you'd have to use None, which if you had lots of params you could end up with biz, None, None, None, None, None, None, baz, None, None, pop.

@patmagee
Copy link
Member

patmagee commented Nov 3, 2020

@mlin, do you think this feature is ready for voting? It would be a great ease of use addition to 2.0

@mlin
Copy link
Member Author

mlin commented Nov 4, 2020

@patmagee Yes, I've updated this PR and the miniwdl implementation which I plan to include in the next release. Should be easy for everybody to implement if/when the time comes

@jdidion
Copy link
Collaborator

jdidion commented Nov 11, 2020

Implemented in a branch of wdlTools here: dnanexus/wdlTools#122

@patmagee
Copy link
Member

@mlin, we have two implementations now. This seems like it is ready to move to voting.

@jdidion
Copy link
Collaborator

jdidion commented Nov 13, 2020

👍

@rhpvorderman
Copy link
Contributor

👍

3 similar comments
@geoffjentry
Copy link
Member

👍

@patmagee
Copy link
Member

👍

@DavyCats
Copy link
Contributor

👍

mlin added a commit to chanzuckerberg/miniwdl that referenced this pull request Nov 16, 2020
@cjllanwarne
Copy link
Contributor

👍

@mlin mlin merged commit f337a0a into main Nov 25, 2020
@mlin mlin deleted the mlin-abbreviate-call-inputs branch November 25, 2020 20:08
@patmagee patmagee modified the milestones: v2.0, 1.2 Mar 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants