# Failed Tasks

Sometimes tasks can fail. Let's see how to deal with failed tasks in brigade.

Let's start as usual with the needed boilerplate:

In [1]:
from brigade.core import InitBrigade
from brigade.plugins.tasks import networking, text
from brigade.plugins.functions.text import print_result

brg = InitBrigade(config_file="config.yaml")
cmh = brg.filter(site="cmh", type="network_device")

Now, as an example, we are going to use a task group similar to the one we used in the previous tutorial:

In [2]:
def basic_configuration(task):
    # Transform inventory data to configuration via a template file
    r = task.run(task=text.template_file,
                 name="Base Configuration",
                 template="base.j2",
                 path="templates/junos")

    # Save the compiled configuration into a host variable
    task.host["config"] = r.result

    # Deploy that configuration to the device using NAPALM
    task.run(task=networking.napalm_configure,
             name="Loading Configuration on the device",
             replace=False,
             configuration=task.host["config"])

Note that the path is hardcoded to `templates/junos`, so every device is rendered from the Junos template. This should cause an error when trying to apply the configuration to the EOS devices. Let's see what happens:

In [3]:
result = cmh.run(task=basic_configuration)

Let's inspect the object:

In [4]:
result.failed

True

In [5]:
result.failed_hosts

{'leaf00.cmh': MultiResult: [Result: "basic_configuration", Result: "Base Configuration", Result: "Loading Configuration on the device"],
 'spine00.cmh': MultiResult: [Result: "basic_configuration", Result: "Base Configuration", Result: "Loading Configuration on the device"]}

In [6]:
result['spine00.cmh'][1].exception

As you can see, the result object is aware something went wrong and you can inspect the errors if you so desire.
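If you want to act on those errors programmatically, a small helper can walk the failed hosts and collect the exceptions. The following is a minimal sketch and not part of brigade: `summarize_failures` is a hypothetical helper. It assumes only what the cells above show, namely that `result.failed_hosts` maps host names to list-like objects whose items expose an `exception` attribute. The `name` attribute is an assumption and is read with a fallback. The stand-in data at the bottom is made up so the snippet runs without real devices.

```python
from types import SimpleNamespace


def summarize_failures(failed_hosts):
    """Return {host: [(task_name, exception_repr), ...]} for results that carry an exception."""
    summary = {}
    for host, multi_result in failed_hosts.items():
        errors = []
        for r in multi_result:
            exc = getattr(r, "exception", None)
            if exc is not None:
                # `name` is an assumed attribute; fall back to a placeholder if missing
                errors.append((getattr(r, "name", "<unnamed>"), repr(exc)))
        summary[host] = errors
    return summary


# Made-up stand-in shaped like the `result.failed_hosts` mapping shown above
fake_failed_hosts = {
    "spine00.cmh": [
        SimpleNamespace(name="Base Configuration", exception=None),
        SimpleNamespace(name="Loading Configuration on the device",
                        exception=RuntimeError("load failed")),
    ],
}

summary = summarize_failures(fake_failed_hosts)
for host, errors in summary.items():
    for task_name, exc in errors:
        print(f"{host}: {task_name!r} raised {exc}")
```

With a real run you would call `summarize_failures(result.failed_hosts)` instead of passing the stand-in dictionary.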

You can also use the `print_result` function on it:

In [7]:
print_result(result)

[1m[36mbasic_configuration*************************************************************[0m
[0m[1m[34m* spine00.cmh ** changed : False ***********************************************[0m
[0m[1m[32mvvvv basic_configuration ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO[0m
[0m[1m[32m---- Base Configuration ** changed : False ------------------------------------- INFO[0m
[0msystem {
  host-name spine00.cmh;
  domain-name cmh.acme.local;
}[0m
[0m[1m[31m---- Loading Configuration on the device ** changed : False -------------------- ERROR[0m
[0mTraceback (most recent call last):
  File "/Users/dbarroso/.virtualenvs/brigade/lib/python3.6/site-packages/napalm/eos/eos.py", line 231, in _load_config
    self.device.run_commands(commands)
  File "/Users/dbarroso/.virtualenvs/brigade/lib/python3.6/site-packages/pyeapi/client.py", line 730, in run_commands
    response = self._connection.execute(commands, encoding, **kwargs)
  File "/Users/dbarroso/.virtualenvs/br

There is also a method that will raise an exception if the task had an error:

In [8]:
from brigade.core.exceptions import BrigadeExecutionError
try:
    result.raise_on_error()
except BrigadeExecutionError:
    print("ERROR!!!")

ERROR!!!

## Skipped hosts

Brigade will keep track of hosts that failed and won't run future tasks on them:

In [9]:
from brigade.core.task import Result

def hi(task):
    return Result(host=task.host, result=f"{task.host.name}: Hi, I am still here!")
    
result = cmh.run(task=hi)

In [10]:
print_result(result)

[1m[36mhi******************************************************************************[0m
[0m[1m[34m* spine01.cmh ** changed : False ***********************************************[0m
[0m[1m[32mvvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO[0m
[0mspine01.cmh: Hi, I am still here![0m
[0m[1m[32m^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^[0m
[0m[1m[34m* leaf01.cmh ** changed : False ************************************************[0m
[0m[1m[32mvvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO[0m
[0mleaf01.cmh: Hi, I am still here![0m
[0m[1m[32m^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^[0m
[0m

You can force the execution of tasks on failed hosts by passing the argument `on_failed=True`:

In [11]:
result = cmh.run(task=hi, on_failed=True)
print_result(result)

[1m[36mhi******************************************************************************[0m
[0m[1m[34m* spine01.cmh ** changed : False ***********************************************[0m
[0m[1m[32mvvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO[0m
[0mspine01.cmh: Hi, I am still here![0m
[0m[1m[32m^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^[0m
[0m[1m[34m* leaf01.cmh ** changed : False ************************************************[0m
[0m[1m[32mvvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO[0m
[0mleaf01.cmh: Hi, I am still here![0m
[0m[1m[32m^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^[0m
[0m[1m[34m* spine00.cmh ** changed : False ***********************************************[0m
[0m[1m[32mvvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO[0m
[0mspine00.cmh: Hi, I a

You can also exclude the hosts that are "good" by passing `on_good=False`:

In [12]:
result = cmh.run(task=hi, on_failed=True, on_good=False)
print_result(result)

[1m[36mhi******************************************************************************[0m
[0m[1m[34m* spine00.cmh ** changed : False ***********************************************[0m
[0m[1m[32mvvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO[0m
[0mspine00.cmh: Hi, I am still here![0m
[0m[1m[32m^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^[0m
[0m[1m[34m* leaf00.cmh ** changed : False ************************************************[0m
[0m[1m[32mvvvv hi ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO[0m
[0mleaf00.cmh: Hi, I am still here![0m
[0m[1m[32m^^^^ END hi ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^[0m
[0m

To achieve this, `brigade` keeps a set of failed hosts in its shared [data](../../ref/api/brigade.rst#brigade.core.Data) object:

In [13]:
brg.data.failed_hosts

{'leaf00.cmh', 'spine00.cmh'}

If you want to mark some hosts as succeeded and make them eligible for future tasks again, you can do it individually per host with the function [recover_host](../../ref/api/brigade.rst#brigade.core.Data.recover_host), or reset the set completely with [reset_failed_hosts](../../ref/api/brigade.rst#brigade.core.Data.reset_failed_hosts):

In [14]:
brg.data.reset_failed_hosts()
brg.data.failed_hosts

set()
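To make the bookkeeping concrete, here is a toy model of the behaviour described in this section. It is a simplified sketch and not brigade's implementation: `ToyData` and `select_hosts` are hypothetical stand-ins. Only the names `failed_hosts`, `recover_host`, `reset_failed_hosts`, `on_good` and `on_failed` are borrowed from the cells above, and the default values of the two flags are inferred from the behaviour shown there.

```python
class ToyData:
    """Hypothetical stand-in for the shared data object."""

    def __init__(self):
        self.failed_hosts = set()

    def recover_host(self, host):
        # Make a single host eligible again
        self.failed_hosts.discard(host)

    def reset_failed_hosts(self):
        # Make every host eligible again
        self.failed_hosts = set()


def select_hosts(hosts, data, on_good=True, on_failed=False):
    """Pick the hosts a task would run on, given the two flags."""
    selected = []
    for host in hosts:
        failed = host in data.failed_hosts
        if (failed and on_failed) or (not failed and on_good):
            selected.append(host)
    return selected


hosts = ["spine00.cmh", "spine01.cmh", "leaf00.cmh", "leaf01.cmh"]
data = ToyData()
data.failed_hosts.update({"spine00.cmh", "leaf00.cmh"})

default_run = select_hosts(hosts, data)                                  # failed hosts skipped
forced_run = select_hosts(hosts, data, on_failed=True)                   # everyone
failed_only = select_hosts(hosts, data, on_failed=True, on_good=False)   # failed hosts only

data.recover_host("leaf00.cmh")
after_recover = select_hosts(hosts, data)   # leaf00.cmh is eligible again

data.reset_failed_hosts()
after_reset = select_hosts(hosts, data)     # all hosts are eligible again
```

The point of the model is that a host's eligibility depends only on its membership in the failed set and on the two flags, which is why clearing the set restores the default behaviour.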

## Raise on error automatically

Alternatively, you can configure brigade to raise the exception automatically in case of error with the `raise_on_error` configuration option:

In [15]:
brg = InitBrigade(config_file="config.yaml", raise_on_error=True)
cmh = brg.filter(site="cmh", type="network_device")
try:
    cmh.run(task=basic_configuration)
except BrigadeExecutionError:
    print("ERROR!!!")

ERROR!!!

## Workflows

The default workflow should work for most use cases, as hosts with errors are skipped and `print_result` should give enough information to understand what's going on. For more complex workflows, the framework should give you enough room to implement them.
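As one illustration of a custom workflow, the sketch below runs a task, runs a second task only on the hosts that failed, and then clears the failed set. `run_with_cleanup` is a hypothetical helper, not something brigade provides. It relies only on calls shown earlier in this tutorial (`run(task=..., on_failed=..., on_good=...)`, `result.failed` and `reset_failed_hosts()`), and it assumes `raise_on_error` is not enabled, since otherwise the first `run` would raise. `FakeRunner`, `FakeData`, `deploy` and `rollback` are made-up stand-ins so the snippet runs without real devices.

```python
from types import SimpleNamespace


def run_with_cleanup(runner, data, main_task, cleanup_task):
    """Run main_task; if any host failed, run cleanup_task on the failed hosts only."""
    result = runner.run(task=main_task)
    cleanup_result = None
    if result.failed:
        # Target only the hosts that failed, skipping the "good" ones
        cleanup_result = runner.run(task=cleanup_task, on_failed=True, on_good=False)
        # Make every host eligible for future tasks again
        data.reset_failed_hosts()
    return result, cleanup_result


class FakeRunner:
    """Stand-in that records calls and reports failure for the named tasks."""

    def __init__(self, failing):
        self.failing = failing
        self.calls = []

    def run(self, task, **kwargs):
        self.calls.append((task.__name__, kwargs))
        return SimpleNamespace(failed=task.__name__ in self.failing)


class FakeData:
    """Stand-in that only records whether the failed set was reset."""

    def __init__(self):
        self.reset_called = False

    def reset_failed_hosts(self):
        self.reset_called = True


def deploy(task):
    pass


def rollback(task):
    pass


runner = FakeRunner(failing={"deploy"})
data = FakeData()
result, cleanup_result = run_with_cleanup(runner, data, deploy, rollback)
print(runner.calls)
```

With the objects from this tutorial, the call would look like `run_with_cleanup(cmh, brg.data, basic_configuration, hi)`. Resetting the failed set unconditionally is a design choice of this sketch; you may prefer `recover_host` for the hosts where the cleanup succeeded.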