# Troubleshooting

This tutorial steps through techniques to identify errors and pipeline failures, and
to avoid common pitfalls.

In [3]:
# This is needed to run parallel workflows in Jupyter notebooks
import nest_asyncio
nest_asyncio.apply()


## Things to check first

### Running in *debug* mode

By default, Pydra will run with the *debug* worker, which executes each task serially
within a single process without use of `async/await` blocks, to allow raised exceptions
to propagate gracefully to the calling code. If you are having trouble with a pipeline,
ensure that `worker="debug"` is passed to the submission/execution call (the default).
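
To see why serial, in-process execution makes failures easier to diagnose, the sketch
below uses only the Python standard library. It is a generic illustration rather than
Pydra's implementation, and `unsafe_divide` is a hypothetical stand-in for a task. A direct
call raises at the point of failure, whereas work handed to an executor stores the
exception on the returned future until the caller asks for it.

```python
from concurrent.futures import ThreadPoolExecutor


def unsafe_divide(numerator: float, denominator: float) -> float:
    """Hypothetical stand-in for a task that can fail at runtime."""
    return numerator / denominator


def run_serially() -> str:
    # Direct call: the exception surfaces immediately in the calling frame,
    # so a debugger can stop exactly where the failure happened.
    try:
        unsafe_divide(10, 0)
    except ZeroDivisionError as exc:
        return f"raised in caller: {exc!r}"
    return "no error"


def run_in_executor() -> str:
    # Executor call: submit() returns without raising. The exception is
    # stored on the Future and must be retrieved explicitly.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(unsafe_divide, 10, 0)
        stored = future.exception()  # waits for completion, does not raise
    return f"stored on future: {stored!r}"


print(run_serially())
print(run_in_executor())
```

In the second case nothing is raised unless the caller inspects the future, which is why
errors from asynchronous workers have to be looked up after the run.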


### Enclosing multi-process code within `if __name__ == "__main__"`

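If using the concurrent futures worker (`worker="cf"`) on macOS or Windows, you need
to enclose the top-level code of your scripts within an `if __name__ == "__main__"` block.
Python's default on these platforms is to start worker processes with the *spawn* method,
which re-imports the main script in each child process, so unguarded submission code would
be executed again there. For example: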

In [None]:
from pydra.tasks.testing import UnsafeDivisionWorkflow
from pydra.engine.submitter import Submitter

# The denominator is non-zero here, so no division-by-zero error is expected
wf = UnsafeDivisionWorkflow(a=10, b=5, denominator=2)

if __name__ == "__main__":
    with Submitter(worker="cf") as sub:
        result = sub(wf)


### Remove stray lockfiles

During the execution of a task, a lockfile is generated to signify that the task is running.
These lockfiles are released after the task completes, either successfully or with an error,
within a *try/finally* block. However, if a task/workflow is terminated by an interactive
debugger, the *finally* block may not be executed, leaving stray lockfiles behind. This
can cause Pydra to hang while waiting for the lock to be released. If you suspect this is
the issue, and there are no other jobs running, then simply remove all lock files from your
cache directory (e.g. `rm <your-run-cache-dir>/*.lock`) and re-submit your job.
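
The same clean-up can be done from Python, for instance inside a notebook. The helper below
is a minimal sketch using only `pathlib`: `remove_stray_locks` and the file names in the
demonstration are hypothetical, and it assumes the lockfiles sit directly inside the cache
directory, as the `rm` command above does. It defaults to a dry run so you can review the
list before deleting anything.

```python
import tempfile
from pathlib import Path


def remove_stray_locks(cache_dir: Path, dry_run: bool = True) -> list[Path]:
    """Find (and optionally delete) `*.lock` files directly inside cache_dir."""
    lock_files = sorted(cache_dir.glob("*.lock"))
    if not dry_run:
        for lock_file in lock_files:
            # missing_ok avoids an error if the lock vanished in the meantime
            lock_file.unlink(missing_ok=True)
    return lock_files


# Demonstration on a throwaway directory with a made-up lockfile name
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    (cache_dir / "example-task.lock").touch()

    found = remove_stray_locks(cache_dir)  # dry run: only lists the files
    print("would remove:", [p.name for p in found])

    remove_stray_locks(cache_dir, dry_run=False)  # actually delete them
    print("remaining:", [p.name for p in cache_dir.glob("*.lock")])
```

As with the shell command, only run this when you are sure no other jobs are using the
cache directory.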

If the `clean_stale_locks` flag is set (the default when using the *debug* worker), locks that
were created before the outer task was submitted are removed before the task is run.
However, since these locks could have been created by separate submission processes,
`clean_stale_locks` is not switched on by default when using production workers
(e.g. `cf`, `slurm`, etc.).

## Locating error messages

If running in debug mode (the default), runtime exceptions will be raised to the
calling shell or debugger. However, when using asynchronous workers, the errors will
be saved in `_error.pklz` pickle files inside the task's cache directory. Consider
the following toy example:

In [None]:
from pydra.tasks.testing import UnsafeDivisionWorkflow
from pydra.engine.submitter import Submitter
import nest_asyncio

# This is needed to run parallel workflows in Jupyter notebooks
nest_asyncio.apply()

# This workflow will fail because we are trying to divide by 0
failing_workflow = UnsafeDivisionWorkflow(a=10, b=5).split(denominator=[3, 2, 0])

with Submitter(worker="cf") as sub:
    result = sub(failing_workflow)
    
if result.errored:
    print("Workflow failed with errors:\n" + str(result.errors))
else:
    print("Workflow completed successfully :)")
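
The saved error files can also be inspected directly. The helper below is a hedged sketch,
not part of Pydra: `load_error_files` is a hypothetical name, and it assumes the
`_error.pklz` files can be read with the standard `pickle` module. If they were written
with `cloudpickle`, the objects they reference need to be importable in the environment
where you load them.

```python
import pickle
from pathlib import Path


def load_error_files(cache_root: Path) -> dict[Path, object]:
    """Load every `_error.pklz` found anywhere below cache_root.

    Returns a mapping from each task's cache directory to the unpickled
    error record. Assumes the files are readable by `pickle.load`.
    """
    errors: dict[Path, object] = {}
    for error_file in sorted(cache_root.rglob("_error.pklz")):
        with error_file.open("rb") as f:
            errors[error_file.parent] = pickle.load(f)
    return errors
```

Calling `load_error_files(Path("<your-run-cache-dir>"))` and printing each entry shows
which task directories recorded a failure. Only unpickle files from cache directories you
trust, since loading a pickle can execute arbitrary code.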

## Tracing upstream issues

Failures are common in scientific analysis, even for well-tested workflows, due to
the novel nature of scientific experiments and the known artefacts that can occur.
Therefore, it is always worth sanity-checking the results produced by workflows. When a
problem occurs in a multi-stage workflow, it can be difficult to identify the stage at
which the issue occurred.