Error message control #725

Closed
shaze opened this issue Jun 4, 2018 · 12 comments

Comments

@shaze

shaze commented Jun 4, 2018

I would like a feature that lets me control precisely what the error message is if something fails, suppressing the normal Nextflow error message and stack trace.

Sometimes the workflow runs into an error because of some problem with the data, and I want to print a clear error message to the user explaining the problem and how to fix it. However, the normal error message is very verbose and, for someone who is not the developer of the workflow, very confusing. This is particularly the case when the script being run is a template, where the whole templatised code gets dumped out as the "script".

What I'd want is a directive like

errorMessage 45 "pheno_err_msg.py $fname"

which would mean "if any step in the script returns error code 45, call the script bin/pheno_err_msg.py".
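
The directive above is a proposal, not an existing Nextflow feature. The underlying idea, mapping a specific exit code to a user-facing message, can be approximated inside the task script itself. The following is a minimal Bash sketch under stated assumptions: the function names `message_for_code` and `run_with_message` and the message text are hypothetical and only illustrate the idea.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Nextflow): map an exit code to a
# short, user-facing message and print it on stderr when a command fails.

# Print the message for a known exit code; return 1 for unknown codes.
# The code 45 and its text are illustrative only.
message_for_code() {
  case "$1" in
    45) echo "Phenotype data problem: check that the phenotype file matches the input." ;;
    *)  return 1 ;;
  esac
}

# Run a command; on a known failure code, print the mapped message.
# The original exit status is always passed through to the caller.
run_with_message() {
  local status=0 msg
  "$@" || status=$?
  if [ "$status" -ne 0 ] && msg=$(message_for_code "$status"); then
    echo "ERROR: $msg" >&2
  fi
  return "$status"
}
```

A task script could then call `run_with_message some_command args`. When pasted into a Nextflow script block, shell variables such as `$status` would need to be written as `\$status` so that they are not interpolated by Nextflow. This only adds a message to the task's stderr; it does not suppress anything Nextflow itself prints.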

@pditommaso
Member

I get the idea of the errorMessage directive, but I don't understand the string "pheno_err_msg.py $fname". Should it execute a Python command?

@stevekm
Contributor

stevekm commented Jun 6, 2018

Sometimes the workflow runs into an error because of some problem with the data

Do you mean problems with the data that is produced while running the pipeline? In those cases you might consider using the filter or choice operators to check for bad data and remove it from the channel before it gets to the next process and breaks. You can combine this with a collectFile to implement custom logging of 'bad' datasets without breaking the entire pipeline (example here)

@shaze
Author

shaze commented Jun 7, 2018

@pditommaso We want to be able to control the output, which is why I thought it might be good to be able to call a Python program to produce it. Perhaps the main script of the process could produce an output file, which could then be printed. I suppose what I want is an exception handler.

@stevekm This works well in many cases, but it assumes that the checking is cheap. It also requires us to write two sets of code to manage the same data. Where we have many large data sets, it wouldn't be computationally feasible to sequentially check all the data before running processes. Rather, it's at the point of actual processing that we can hit an exception and handle it.

Let me give two different examples, simplifications of real cases I've had. Someone runs a workflow and one step runs PLINK on a large file -- but they've done something stupid (e.g. not provided the appropriate phenotype file) and PLINK then fails. I can check the PLINK log file and print out an appropriate error message. Here I can only reasonably detect that there's a problem with the data when PLINK fails. I couldn't duplicate this code.

We're processing hundreds of input data tables of around 100 MB each, and we get a bad value in a column in one file, which means we can no longer continue. In practice, I couldn't write Groovy code to check for this beforehand, since it would take far too long (and would mean duplicating code). My processing code (in Python) has an exception handler for when it detects errors -- I want to be able to print out a simple message saying: "File 'dataABD' column 'SNPcount' line 3189 has an error. See directory /.../.../work/ef/272861abce77827171"

I don't want to print out the normal trace, and I don't want to print the script (which, especially if it is a template, is very confusing for others).
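
The PLINK case above, where the problem can only be detected once the tool has failed, could be handled by pulling the relevant line out of the tool's log at the point of failure. The following is a minimal Bash sketch. It assumes the log marks failures with lines starting with "Error:", which is an assumption about the log format that should be checked against the real tool. The function names `first_error_line` and `explain_failure` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the "Error:" prefix is an assumed log convention.

# Print the first line of LOGFILE that starts with "Error:".
# Returns non-zero if the log contains no such line.
first_error_line() {
  grep -m 1 '^Error:' "$1"
}

# Print a short, user-facing explanation of a failed step on stderr,
# followed by a pointer to the full log.
explain_failure() {
  local log=$1 line
  if line=$(first_error_line "$log"); then
    echo "The step failed: ${line#Error: }" >&2
  else
    echo "The step failed for an unknown reason." >&2
  fi
  echo "Full log: $log" >&2
}
```

In a task script this might be used as `some_command || { status=$?; explain_failure some_command.log; exit "$status"; }`, where `some_command` and its log name stand in for the real tool. No separate pre-check of the data is needed, since the message is produced only when the tool itself reports a failure.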

@pditommaso
Member

I understand, but I'm not sure about supporting this feature, because there's no way to prevent a user from running compute- or memory-intensive tasks through this mechanism.

Unless the job is killed abruptly by the cluster, you should be able to wrap your task execution in another script that terminates gracefully and returns a more informative error message if the task fails.
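
The wrapper approach described above might look like the following Bash sketch. The function name `run_wrapped` and the log file argument are hypothetical. The idea is only to let the real command fail, keep its full stderr in a file, and emit a short summary in its place.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Nextflow).
# Usage: run_wrapped LOGFILE COMMAND [ARGS...]
# Runs COMMAND with its stderr redirected to LOGFILE. On failure, prints
# a short summary on stderr and returns the command's own exit status.
run_wrapped() {
  local log=$1 status=0
  shift
  "$@" 2>"$log" || status=$?
  if [ "$status" -ne 0 ]; then
    {
      echo "Task failed with exit code $status."
      echo "Last lines of $log:"
      tail -n 5 "$log"
    } >&2
  fi
  return "$status"
}
```

Because the exit status is passed through unchanged, the task still fails and any error handling configured for the process still applies. Note that this shapes only the task's own stderr; it does not, by itself, change what Nextflow prints around it.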

@shaze
Author

shaze commented Jun 13, 2018

Would the following be possible, though?

  • suppressing the normal trace and error message
  • printing out a script-created file (even .command.err)

A particular problem is template scripts: when one fails, the whole template is printed, which could be hundreds of lines long. So suppressing the normal error messages and trace is important.

@pditommaso
Member

Basically an errorTerse mode that would only show the stderr output?

@shaze
Author

shaze commented Jun 18, 2018 via email

@stale

stale bot commented Apr 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Apr 27, 2020
@pditommaso pditommaso added stale and removed wontfix labels Apr 27, 2020
@stale stale bot closed this as completed Jun 27, 2020
@mhoban

mhoban commented Oct 2, 2023

Basically an errorTerse mode that would only show the stderr output?

I think this would have been an extremely useful feature, but it seems to have been ignored? I'm in a similar situation where I can either fail silently using a filter command or some such (in which case the user has no way of telling what's gone wrong) or fail loudly with a ton of extra information (e.g., the content of the script) that is also not informative to the typical user. The ability to just say the pipeline failed and show an error message would be incredibly helpful.

@MatteoSchiavinato

I agree with what @mhoban said. For example, I'm now doing an integrity check on some gz files, which returns a nonzero exit code when a file fails the check.

Something like this would have been very nice to have:

process proc_name {

  input:
  file(gz_file)

  output:
  stdout 

  error:
  """
  The exit code was not 0, this means your gz files aren't OK. Pipeline will stop. 
  """

  script:
  """
  gzip -t ${gz_file}
  """
}

I am fully aware that the pipeline will stop and that I'll have all the information in the work directory and related files. But it would have been something useful to reduce the stderr clutter to the bare minimum in some selected situations where maybe someone else is running my pipeline :)

Just a thought.

@bentsherman
Member

bentsherman commented Dec 1, 2023

You should be able to customize the error message by piggybacking on Nextflow's standard error message for a task failure, which includes the stderr of the task. So in your Bash script you could catch a particular exit code and print a custom error message:

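# Nextflow runs task scripts with "bash -ue" by default, so the failure is
# caught with "||" before the shell aborts. The backslashes keep Nextflow
# from interpolating the shell variables inside a script block.
gzip -t ${gz_file} || {
  ret=\$?
  >&2 echo "Your gz files are not okay, please check them"
  exit \$ret
}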

Then the task will fail and Nextflow will print the standard task failure message with stdout and stderr, including your message.

@mhoban

mhoban commented Jan 6, 2024

Respectfully, this suggestion doesn't address the issue both I and @MatteoSchiavinato have brought up. The point is not whether we can control what the error message is. The point is that when we do so, it'd be very nice to have the option to display only the error message that would be helpful in that case, rather than all the other information that ends up being displayed (viz. script contents, working file path, etc.). Perhaps I'm missing something in what you're suggesting? Here is a pared-down example:

#!/usr/bin/env nextflow
nextflow.enable.dsl=2


process just_fail {
  output:
    stdout

  script:
  """
   >&2 echo "This is a bad error!"
   exit 1
  """
}

workflow {
  just_fail | view
}

When I run this, I get the following output:
[image: screenshot of the run's output]

What we would love is the option to display only the part within the "Command error" section and not all the other stuff. The "errorTerse" mode suggested above is exactly the sort of thing we're looking for.
