Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement support for tuples with optional file outputs #2710

Closed

Conversation

rcannood
Copy link

@rcannood rcannood commented Mar 9, 2022

This is a suggested implementation for #2678.

I spent a few hours coming up with a solution which seems to work. Running the following main.nf (called 'attempt 1' in #2678):

nextflow.enable.dsl=2

process test_process1 {
  input:
    tuple val(id)
  output:
    tuple val(id), path("output.txt"), path("output2.txt", optional: true)
  script:
    """
    echo $id > output.txt
    if [[ "$id" == "foo" ]]; then
      echo $id > output2.txt
    fi
    """
}

workflow {
  Channel.fromList( ["foo", "bar"] )
    | view { "input: ${it}" }
    | test_process1
    | view { "output: ${it}" }
}

Results in the following output:

input: foo
input: bar
output: [foo, work/a1/c20320b80afe33ff3137c7d25dc6bc/output.txt, work/a1/c20320b80afe33ff3137c7d25dc6bc/output2.txt]
output: [bar, work/50/8a3705c51700aec25955dd251273d8/output.txt, null]

the "attempt 2" approach (tuple val(id), path("output.txt"), path("output2.txt") optional true) -- which is not a solution to my problem -- still yields the same result:

input: foo
input: bar
output: [foo, work/81/66ed22161ab9e9d7e66237a05e47c2/output.txt, work/81/66ed22161ab9e9d7e66237a05e47c2/output2.txt]

If I make both the tuple and the path non-optional (tuple val(id), path("output.txt"), path("output2.txt"), I get the following (expected) output:

input: foo
input: bar
output: [foo, /home/rcannood/workspace/di/viash_nxf_poc/work/e1/814e717ceaa7325ed18b3ee136ee77/output.txt, /home/rcannood/workspace/di/viash_nxf_poc/work/e1/814e717ceaa7325ed18b3ee136ee77/output2.txt]
[1d/cf5edd] NOTE: Missing output file(s) `output2.txt` expected by process `test_process1 (2)` -- Error is ignored

Please let me know if the suggested implementation could be of use.

Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com>
@rcannood rcannood force-pushed the feature/optional_tuple_values branch from f484f65 to 50ef43c Compare March 9, 2022 14:39
@bentsherman
Copy link
Member

Hi @rcannood , the CI tests are failing. Please check the details to see which tests failed. It may be that they were already failing on master. If you can confirm that the same tests were failing before your changes, please let me know here and we can disregard them. Otherwise, you'll need to address those test failures.

Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com>
@rcannood rcannood force-pushed the feature/optional_tuple_values branch from ac43cdf to e8266b0 Compare March 10, 2022 15:15
@rcannood
Copy link
Author

Thanks for taking a look at this PR, @bentsherman .

Apologies for the failing test, that was a mistake from my end. Should be fixed now.

Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com>
Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com>
Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com>
@bentsherman
Copy link
Member

I think that test may have been just a clever way of testing multiple cases at once, but feel free to hijack it 🙂

@pditommaso
Copy link
Member

Sorry, I was assessing the impact of having null into an output value, since nextflow usually does not allow that.

The main concern is what happens when a downstream task is expecting a triple and the 3rd element is null. I think an error is thrown

@rcannood
Copy link
Author

I see. To some extent, if the user wants to use the functionality of having optional files, they should know how to deal with the output.

In our case, we will map the output of a process with optional output to catch null values. The following code works with the current implementation in the PR:

nextflow.enable.dsl=2

process test_process1 {
  input:
    tuple val(id)
  output:
    tuple val(id), path("proc1_output.txt"), path("proc1_output2.txt", optional: true)
  script:
    """
    echo $id > proc1_output.txt
    if [[ "$id" == "foo" ]]; then
      echo $id > proc1_output2.txt
    fi
    """
}
process test_process2 {
  input:
    tuple val(id), path(input), path(input2)
  output:
    tuple val(id), path("proc2_output.txt")
  script:
    """
    cat $input > proc2_output.txt
    ${ input2 instanceof List && input2.isEmpty() ? "" : "cat ${input2} >> proc2_output.txt" }
    """
}

workflow {
  Channel.fromList( ["foo", "bar"] )
    | test_process
    | map { it -> 
      if (!it[2]) it[2] = []
      it
    }
    | test_process2
}

However, it's a bit cumbersome to have to map null values to empty lists to get the code to work. I think it might make sense to allow the following:

nextflow.enable.dsl=2

process test_process1 {
  input:
    tuple val(id)
  output:
    tuple val(id), path("proc1_output.txt"), path("proc1_output2.txt", optional: true)
  script:
    """
    echo $id > proc1_output.txt
    if [[ "$id" == "foo" ]]; then
      echo $id > proc1_output2.txt
    fi
    """
}
process test_process2 {
  input:
    tuple val(id), path(input), path(input2, optional: true) // this setting allows input2 to be null
  output:
    tuple val(id), path("proc2_output.txt")
  script:
    """
    cat $input > proc2_output.txt
    ${ input2 ? "cat ${input2} >> proc2_output.txt" : "" } 
    """ // <- no need to check if input2 is a list
}

workflow {
  Channel.fromList( ["foo", "bar"] )
    | test_process // no need to map output of test_process
    | test_process2
}

What are your thoughts @pditommaso ?

@rcannood
Copy link
Author

rcannood commented Mar 21, 2022

I noticed in the documentation that the 'join' operator can also result in the creation of tuples that have null values inside (when remainder = true).

Would it be reasonable to already accept this PR, and work out how to deal with null inputs in processes in a separate issue? :)

@pditommaso
Copy link
Member

That's a good point, I think this feature is useful, but some points need to be taken into consideration:

  1. what happens when a null is received by process declaring either an input val or path
  2. what happens when a null element is received by a process declaring an input tuple
  3. I think using optional for both the existing mechanism and the new one could bring a serious semantics inconstancy. When using the former the output is NOT emitted when missing. Using the new approach is emitted with null. Imagine the explain the difference between output: val(x) optional and output: val(x, optional:true).
  4. Following the above it should be taken into consideration to use a different name for the new one, e.g. nullable or allowNull or allowEmpty (other?)

@pditommaso
Copy link
Member

We discussed a bit with @jorgeaguileraseqera how to move one on this. The idea is to add the option allowNull:true to output path emit Nullable object when a value/path is missing.

In a symmetry manner the allowNull:true should be allowed for an input path. When such Nullable file is received is mapped to a non-existing Path object.

@bentsherman
Copy link
Member

Closing in favor of #2893

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants