Error message control #725

shaze · 2018-06-04T14:06:38Z

I would like a feature that lets me control precisely what the error message is when something fails, suppressing the normal Nextflow error message and stack trace.

Sometimes the workflow runs into an error because of some problem with the data, and I want to print a clear error message that explains the problem and how to fix it. However, the normal error message is very verbose and, for someone who is not the developer of the workflow, very confusing. This is particularly the case when the script being run is a template, where the whole templatised code gets dumped out as the "script".

What I'd want is a directive like

errorMessage 45 "pheno_err_msg.py $fname"

which would mean "if any step in the script returns error code 45, call the script bin/pheno_err_msg.py"

pditommaso · 2018-06-04T19:30:26Z

I get the idea of the errorMessage directive, but I don't understand the string "pheno_err_msg.py $fname". Should it execute a Python command?

stevekm · 2018-06-06T16:05:15Z

Sometimes the workflow runs into an error because of some problem with the data

Do you mean problems with the data that is produced while running the pipeline? In those cases you might consider using the filter or choice operators to check for bad data and remove it from the channel before it gets to the next process and breaks. You can combine this with a collectFile to implement custom logging of 'bad' datasets without breaking the entire pipeline (example here)

shaze · 2018-06-07T05:50:55Z

@pditommaso We want to be able to control the output, which is why I thought it might be good to be able to call a Python program to produce it. Perhaps the main script of the process could produce an output file, which would then be printed. I suppose what I want is an exception handler.

@stevekm This works well in many cases, but it assumes that the checking is cheap. It also requires us to write two sets of code to manage the same data. Where we have many large data sets, it wouldn't be computationally feasible to sequentially check all the data before running processes. Rather, it's at the point of actual processing that we can raise an exception and handle it.

Let me give two different examples, simplifications of real cases I've had. Someone runs a workflow and one step runs PLINK on a large file -- but they've done something stupid (e.g. not provided the appropriate phenotype file) and PLINK then fails. I can check the PLINK log file and print out an appropriate error message. Here I can only reasonably detect there's a problem with the data when PLINK fails. I couldn't duplicate this code.

We're processing hundreds of input data tables, each around 100 MB, and we get a bad value in a column in one file, which means we can no longer continue. This is not something I could practically write Groovy code to check beforehand, since it would take far too long (and would mean duplicating code). My processing code (in Python) has an exception handler for when it detects errors. I want to be able to print out a simple message saying: "File "dataABD" column "SNPcount" line 3189 has an error. See directory /.../.../work/ef/272861abce77827171"

I don't want to print out the normal trace, and I don't want to print the script (which, especially if it is a template, is very confusing for others).
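As an illustration of the exception-handler idea described above, here is a minimal Python sketch. The names `DataError`, `parse_count`, `main` and `DATA_ERROR_EXIT_CODE` are hypothetical, and the exit code 45 simply echoes the example earlier in the thread. The sketch only controls what the task itself writes to stderr; it does not change what Nextflow prints around it.

```python
import sys

# Hypothetical exit code reserved for "bad input data"; 45 echoes the example
# earlier in the thread and is not a Nextflow convention.
DATA_ERROR_EXIT_CODE = 45


class DataError(Exception):
    """Raised when an input table contains a value that cannot be processed."""


def parse_count(value, file_name, column, line_no):
    """Convert one cell to an int, or raise DataError naming its location."""
    try:
        return int(value)
    except ValueError:
        raise DataError(
            f'File "{file_name}" column "{column}" line {line_no} has an error'
        ) from None


def main(rows, file_name="dataABD", column="SNPcount"):
    """Process rows; return 0 on success or DATA_ERROR_EXIT_CODE on bad data."""
    try:
        total = sum(
            parse_count(value, file_name, column, line_no)
            for line_no, value in enumerate(rows, start=1)
        )
    except DataError as err:
        # Only this one line reaches stderr: no traceback, no script dump.
        print(err, file=sys.stderr)
        return DATA_ERROR_EXIT_CODE
    print(total)
    return 0
```

A task script could end with `sys.exit(main(rows))`, so that the distinct exit code tells the caller the failure was a data problem rather than a crash.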

pditommaso · 2018-06-13T07:33:26Z

I understand, but I'm not sure about supporting this feature, because there's no way to prevent a user from executing compute/memory-intensive tasks through this mechanism.

Unless the job is killed abruptly by the cluster, you should be able to wrap your task execution in another script that terminates gracefully and returns a more informative error message if the task fails.
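One possible shape for such a wrapper is sketched below in Python. The function `run_wrapped` and the `MESSAGES` table are hypothetical names, and the exit code and message text are illustrative only. The wrapper cannot help if the scheduler kills the job outright, and it does not suppress Nextflow's own failure report; it only decides what ends up on the task's stderr.

```python
import subprocess
import sys

# Hypothetical mapping from exit codes to user-facing messages; the codes and
# wording are illustrative, not defined by Nextflow or PLINK.
MESSAGES = {
    45: "The phenotype file is missing or malformed; please check your inputs.",
}


def run_wrapped(command, messages=MESSAGES):
    """Run command, print a short message on failure, and return its exit code."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        # Pass the task's normal output through untouched.
        sys.stdout.write(result.stdout)
        return 0
    # Prefer a curated message; otherwise fall back to the last stderr line.
    stderr_lines = result.stderr.strip().splitlines()
    fallback = stderr_lines[-1] if stderr_lines else "Task failed with no error output."
    print(messages.get(result.returncode, fallback), file=sys.stderr)
    return result.returncode
```

Called as `sys.exit(run_wrapped(sys.argv[1:]))`, the wrapper keeps the original exit code, so any retry or error-handling logic that depends on it still sees the real value.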

shaze · 2018-06-13T08:36:00Z

Would the following be possible, though?

suppressing the normal trace and error message
printing out a script-created file (even .command.err)

A particular problem is template scripts: when one fails, Nextflow prints the whole template, which could be hundreds of lines long. So suppressing the normal error messages and trace is important.

pditommaso · 2018-06-18T13:29:43Z

Basically an errorTerse mode that would only show the stderr output?

shaze · 2018-06-18T14:47:39Z

Perfect, that would be great, thanks. Scott

stale · 2020-04-27T06:50:00Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

mhoban · 2023-10-02T21:09:47Z

Basically an errorTerse mode that would only show the stderr output?

I think this would have been an extremely useful feature, but it seems to have been ignored? I'm in a similar situation where I can either fail silently using a filter command or some such (in which case the user has no way of telling what's gone wrong) or fail loudly with a ton of extra information (e.g., the content of the script) that is also not informative to the typical user. The ability to just say the pipeline failed and show an error message would be incredibly helpful.

MatteoSchiavinato · 2023-12-01T10:47:40Z

I agree with what @mhoban said. For example, I'm now doing an integrity check on some gz files, which returns a nonzero exit code when they fail the check.

Something like this would have been very nice to have:

process proc_name {

  input:
  file(gz_file)

  output:
  stdout 

  error:
  """
  The exit code was not 0, this means your gz files aren't OK. Pipeline will stop. 
  """

  script:
  """
  gzip -t ${gz_file}
  """
}

I am fully aware that the pipeline will stop and that I'll have all the information in the work directory and related files. But it would have been something useful to reduce the stderr clutter to the bare minimum in some selected situations where maybe someone else is running my pipeline :)

Just a thought.

bentsherman · 2023-12-01T17:44:10Z

You should be able to customize the error message by piggybacking on Nextflow's standard error message for a task failure, which includes the stderr of the task. So in your Bash script you could catch a particular exit code and print a custom error message:

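# Assumes this sits inside a Nextflow """ script block, where Bash variables are escaped as \$
# Tasks run under `bash -ue` by default, so capture the exit code with `||`
ret=0
gzip -t ${gz_file} || ret=\$?
[[ \$ret != 0 ]] && >&2 echo "Your gz files are not okay, please check them"
exit \$ret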

Then the task will fail and Nextflow will print the standard task failure message with stdout and stderr, including your message.

mhoban · 2024-01-06T04:50:18Z

Respectfully, this suggestion doesn't address the issue that both @MatteoSchiavinato and I have brought up. The point is not whether we can control what the error message is. The point is that when we do so, it would be very nice to have the option to display only the error message that is helpful in that case, rather than all the other information that ends up being displayed (viz. script contents, working file path, etc.). Perhaps I'm missing something in what you're suggesting? Here is a pared-down example:

#!/usr/bin/env nextflow
nextflow.enable.dsl=2


process just_fail {
  output:
    stdout

  script:
  """
   >&2 echo "This is a bad error!"
   exit 1
  """
}

workflow {
  just_fail | view
}

When I run this, I get the following output:

What we would love is the option to display only the part within the "Command error" section and not all the other stuff. The "errorTerse" mode suggested above is exactly the sort of thing we're looking for.

pditommaso added kind/feature triage/needs-information labels Jun 4, 2018

pditommaso added pri/low and removed triage/needs-information labels Jul 19, 2018

pditommaso added the nfhack18 label Oct 3, 2018

stale bot added the wontfix label Apr 27, 2020

pditommaso added stale and removed wontfix labels Apr 27, 2020

stale bot closed this as completed Jun 27, 2020

bentsherman added the lang/processes label Nov 8, 2023

mhoban mentioned this issue Jan 6, 2024

Investigate whether it's possible to show less-ugly results when there's an error mhoban/eDNAFlow#42

Open
