# Debugging & Wrapping things up

<table style="width:90%">
    <tr><td style="text-align:left;"><a href="./4 - Intro to Workflows.ipynb">Previous (Intro to Workflows)</a><td style="text-align:right;"></td></tr>
</table>

If something goes wrong on the Parsl side, there are several ways to debug the code. Many of the classes (executors, etc.) accept debug arguments that write more information to the logs, and in some cases you can also specify where those logs go. Here we look at the default logging infrastructure. The on-disk layout varies with the number of executors, the type of channels, the number of apps, etc., but it always follows the same general pattern.
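Parsl reports through Python's standard `logging` module, using loggers whose names start with `parsl`, so extra detail can be surfaced by attaching a handler at `DEBUG` level. The sketch below does this with the standard library only; the helper name `enable_parsl_debug_logging` is hypothetical and not part of Parsl. Parsl also ships its own helpers for the same purpose (`parsl.set_stream_logger` and `parsl.set_file_logger`), which are usually the more convenient choice once Parsl is imported.

```python
import logging
import sys


def enable_parsl_debug_logging(stream=None, logfile=None):
    """Attach DEBUG-level handlers to the "parsl" logger (hypothetical helper)."""
    logger = logging.getLogger("parsl")
    logger.setLevel(logging.DEBUG)
    formatter = logging.Formatter(
        "%(asctime)s %(name)s:%(lineno)d [%(levelname)s] %(message)s"
    )

    # Echo log records to a stream (stdout by default, handy in a notebook).
    if stream is None:
        stream = sys.stdout
    stream_handler = logging.StreamHandler(stream)
    stream_handler.setFormatter(formatter)
    logger.addHandler(stream_handler)

    # Optionally also write the same records to a file of your choosing.
    if logfile is not None:
        file_handler = logging.FileHandler(logfile)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    return logger


enable_parsl_debug_logging()
# Any logger below "parsl" now inherits the DEBUG level and the handlers.
logging.getLogger("parsl.example").debug("debug logging enabled")
```

Call the helper once, before loading the Parsl configuration, so that start-up messages are captured as well.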

## Local Logs

Logs on the local machine (i.e. where you execute the main script) are laid out as shown below. The root path is the directory from which the main script was executed.

```
cmd_parsl.slurm.1675882844.0374064.sh
runinfo/
    000/
        parsl.log
        local_tpex/
            interchange.log
            block-0/
                321f496c86b4/
                    manager.log
                    worker_0.log
                    worker_1.log
                    worker_2.log
        submit_scripts/
            parsl.slurm.1675882844.0374064.submit
            parsl.slurm.1675882844.0374064.submit.stderr
            parsl.slurm.1675882844.0374064.submit.stdout
    001/
    002/

```

| File                    | Description                                    |
| :---------------------- | :--------------------------------------------- |
| cmd_parsl.slurm.1675882844.0374064.sh | master script that starts the Parsl system |
| 000, 001, ...           | there is one directory for each Parsl execution |
| parsl.log               | stdout/stderr from the main Parsl thread; logs messages about configuration, task status, etc. |
| local_tpex              | there will be a directory for each executor (named by the label you gave the executor) |
| interchange.log         | messages from the executor connection to the workers |
| block-0                 | directory for each block in the job |
| 321f496c86b4            | randomly generated directory |
| manager.log             | stdout/stderr from the manager, which handles worker communication and spawning |
| worker_0.log, worker_1.log, ... | log from each individual worker |
| submit_scripts          | there will be at least one directory for each type of submit script (the names all end in submit_scripts) |
| parsl.slurm.1675882844.0374064.submit | the script that will be run on the submit node; in this case it sets up the environment and wraps the srun command (there could be several scripts) |
| parsl.slurm.1675882844.0374064.submit.stderr/stdout | stdout/stderr from running the submit script |
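
Because every execution gets its own numbered directory, a small helper can locate the logs of the most recent run. The following is a minimal sketch using only the standard library; the function names `latest_run_dir` and `list_logs` are hypothetical, and it assumes the default `runinfo/` root with purely numeric run directories as shown above.

```python
from pathlib import Path


def latest_run_dir(root="runinfo"):
    """Return the highest-numbered run directory under root, or None."""
    root = Path(root)
    if not root.is_dir():
        return None
    # Run directories are named 000, 001, ...; compare them numerically.
    runs = [p for p in root.iterdir() if p.is_dir() and p.name.isdigit()]
    return max(runs, key=lambda p: int(p.name)) if runs else None


def list_logs(run_dir):
    """Return all *.log files below run_dir as sorted relative POSIX paths."""
    run_dir = Path(run_dir)
    return sorted(p.relative_to(run_dir).as_posix() for p in run_dir.rglob("*.log"))


run = latest_run_dir()
if run is None:
    print("no run directories found")
else:
    print(f"latest run: {run}")
    for name in list_logs(run):
        print("  ", name)
```

Starting with `parsl.log` of the latest run and then moving to the executor, manager, and worker logs usually narrows down where a problem occurred.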

## Exceptions

Parsl is designed to capture, track, and handle errors that occur during execution, including those related to the program, apps, execution environment, and Parsl itself. It also provides functionality to respond appropriately to failures during execution. Parsl handles the different types of errors (within Parsl, within an app, node failure, etc.) in different ways. A good description is given in the Parsl <a href='https://parsl.readthedocs.io/en/stable/userguide/exceptions.html'>user guide</a>.

## Retries

To cope with transient errors, Parsl has built-in retry capabilities for tasks. The number of retries is controlled by the `retries` argument to the configuration.

## Lazy Failure

Parsl implements a lazy failure model in which a workload continues to execute even if some tasks fail. That is, the program does not halt as soon as it encounters a failure; rather, it continues to execute the apps that are unaffected by it.

# Memoization and checkpointing

When an app is invoked several times with the same parameters, Parsl can reuse the result from the first invocation without executing the app again.

 * App caching will allow reuse of results within the same run
 * Checkpointing will store results on the filesystem and reuse those results in later runs
 
# Monitoring

Parsl has a monitoring system which captures task state and resource usage over time. The data are stored in an SQLite database and there is a script interface to visualize the data. Note that this is still under development.
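
Since the monitoring data live in an SQLite file, they can also be inspected directly with Python's built-in `sqlite3` module. The sketch below lists every table in a database together with its row count without assuming any particular schema; the helper name `summarize_tables` is hypothetical, and the path `runinfo/monitoring.db` is an assumption about where the database is written, so adjust it to your setup.

```python
import sqlite3
from pathlib import Path


def summarize_tables(db_path):
    """Return {table name: row count} for every table in an SQLite file."""
    conn = sqlite3.connect(str(db_path))
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Quote table names so that unusual identifiers are handled safely.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


db = Path("runinfo/monitoring.db")  # assumed location, adjust as needed
if db.exists():
    for table, count in summarize_tables(db).items():
        print(f"{table}: {count} rows")
else:
    print(f"{db} not found (was monitoring enabled for this run?)")
```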
