# Error Handling
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/htcondor/htmap/master?urlpath=lab/tree/tutorials/error-handling.ipynb)

## Holds

In previous tutorials we mentioned that HTMap is able to track the status of your components and inform you about something called a "hold".
A hold occurs when HTCondor notices something wrong about your map component.
Perhaps an input file that HTCondor was supposed to transfer to your component doesn't exist.

That one is easy to force, so let's do it and see what happens:

In [1]:
import htmap

@htmap.mapped
def foo(_):
    return "I didn't get held!"

In [2]:
path = htmap.TransferPath('this-file-does-not-exist')
will_get_held = foo.map(
    [path],
)

created map dark-swift-heel with 1 components


We know that the component will fail, but HTMap won't know about it until we try to look at the output:

In [3]:
print(will_get_held.get(0))

MapComponentHeld: component 0 of map dark-swift-heel is held: [13] Error from slot1@jupyter-htcondor-2dhtmap-2dbjvt9l6e: SHADOW at 10.12.70.166 failed to send file(s) to <10.12.70.166:41467>: error reading from /home/jovyan/tutorials/this-file-does-not-exist: (errno 2) No such file or directory; STARTER failed to receive file(s) from <10.12.70.166:9618>

Yikes!
HTMap has raised an exception to inform us that a component of our map got held.
It also tells us why HTCondor held the component: `error reading from /home/jovyan/tutorials/this-file-does-not-exist: (errno 2) No such file or directory; STARTER failed to receive file(s) from <10.12.70.166:9618>`.

This time around the hold reason is pretty clear: a local file that HTCondor expected to exist didn't.
We could fix the problem by creating the file, and then releasing the map, which tells HTCondor to try again:

In [4]:
path.touch()  # this creates an empty file

Now that the file exists, the map can run successfully.
We tell HTMap to "release" the hold, which lets the map continue running:

In [5]:
will_get_held.release()
print(will_get_held.get(0))

I didn't get held!


And, of course, clean up:

In [6]:
path.unlink()

Unfortunately, holds will often not be so easy to resolve.
Sometimes they are simply ephemeral errors that can be resolved by releasing the map without changing anything.
But sometimes you'll need to talk to your HTCondor pool administrator to figure out what's going wrong.
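For the ephemeral case, the release-and-retry pattern can be sketched as below. This is only an illustration under assumptions: `get_after_release`, `FakeMap`, and `FakeHeld` are hypothetical names invented here so that the example runs without an HTCondor pool. The only map methods it relies on are `get` and `release`, as used earlier in this tutorial; with a real map you would pass in the hold exception that HTMap raises (shown above as `MapComponentHeld`).

```python
class FakeHeld(Exception):
    """Hypothetical stand-in for the exception HTMap raises on a held component."""


class FakeMap:
    """Hypothetical stand-in for a map whose component is held a few times."""

    def __init__(self, holds_before_success):
        self.holds_remaining = holds_before_success
        self.releases = 0

    def get(self, component):
        if self.holds_remaining > 0:
            raise FakeHeld(f"component {component} is held")
        return "I didn't get held!"

    def release(self):
        self.releases += 1
        self.holds_remaining -= 1


def get_after_release(map_, component, held_exception, max_releases=3):
    """Fetch one component's output, releasing the map after each hold.

    Gives up and re-raises the hold once max_releases releases have failed.
    """
    for attempt in range(max_releases + 1):
        try:
            return map_.get(component)
        except held_exception:
            if attempt == max_releases:
                raise
            map_.release()


flaky_map = FakeMap(holds_before_success=2)
result = get_after_release(flaky_map, 0, FakeHeld)
print(result, "after", flaky_map.releases, "releases")
```

Capping the number of releases matters: if the hold keeps coming back, the problem is probably not ephemeral, and it is time to read the hold reason carefully or ask your pool administrator. In a real pool you would likely also want to pause between attempts.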

## Execution Errors

HTMap can also detect Python exceptions that occur during component execution.
To see this in action, let's define a function where a component will have a problem:

In [7]:
@htmap.mapped
def inverse(x):
    return 1 / x

When `x = 0`, `inverse(x)` will fail with a `ZeroDivisionError`.
If we run it locally, the error will halt execution and drop a traceback into our laps:

In [8]:
inverse(0)

ZeroDivisionError: division by zero

A traceback carries a lot of useful information: it tells us exactly which line raised the error. (Remember that tracebacks are best read from the bottom up: the last block of source code is where the error was raised.)
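To make the "read it from the bottom" advice concrete, here is a small local sketch that uses only the standard library `traceback` module. It has nothing to do with HTMap itself; `compute` is a hypothetical extra function added so that the traceback has more than one interesting frame.

```python
import traceback


def inverse(x):
    return 1 / x


def compute():
    # An extra layer of calls, so the traceback contains several frames.
    return inverse(0)


try:
    compute()
except ZeroDivisionError as exc:
    # extract_tb lists frames from the outermost call to the innermost one.
    frames = traceback.extract_tb(exc.__traceback__)

for depth, frame in enumerate(frames):
    print(depth, frame.name, "line", frame.lineno)

# The last entry is the frame that actually raised the exception.
innermost = frames[-1]
print("raised in:", innermost.name)
```

The frames are listed in call order, so the last one (`inverse`) is where the division by zero happened.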

HTMap is able to transport this kind of information back from an executing component, but like the regular output of a map we won't see it until we try to load up the output for the failed component.
We'll make a one-component map to demonstrate what happens:

In [9]:
bad_map = inverse.map([0])
bad_map.get(0)

created map swift-wicked-badge with 1 components


MapComponentError: component 0 of map swift-wicked-badge encountered error while executing. Error report:
========  Start error report for component 0 of map swift-wicked-badge  ========
Landed on execute node jupyter-htcondor-2dhtmap-2dbjvt9l6e (10.12.70.166) at 2019-02-12 03:21:25.332392

Python executable is /opt/conda/bin/python3 (version 3.6.6 final)
with installed packages
  alembic==0.9.9
  asn1crypto==0.24.0
  async-generator==1.10
  backcall==0.1.0
  beautifulsoup4==4.6.3
  bleach==3.0.2
  bokeh==0.13.0
  certifi==2018.10.15
  cffi==1.11.5
  chardet==3.0.4
  Click==7.0
  click-didyoumean==0.0.3
  cloudpickle==0.6.1
  colorama==0.3.9
  conda==4.5.11
  cryptography==2.3.1
  cryptography-vectors==2.3.1
  cursor==1.2.0
  cycler==0.10.0
  Cython==0.28.5
  dask==0.19.4
  decorator==4.3.0
  dill==0.2.8.2
  entrypoints==0.2.3
  enum34==1.1.6
  fastcache==1.0.2
  gmpy2==2.0.8
  h5py==2.7.1
  halo==0.0.23
  htcondor==8.8.0
  htmap==0.2.0
  idna==2.7
  imageio==2.3.0
  ipykernel==5.1.0
  ipython==7.0.1
  ipython-genutils==0.2.0
  ipywidgets==7.2.1
  jedi==0.13.1
  Jinja2==2.10
  jsonschema==2.6.0
  jupyter-client==5.2.3
  jupyter-core==4.4.0
  jupyterhub==0.9.4
  jupyterlab==0.34.12
  jupyterlab-launcher==0.13.1
  kiwisolver==1.0.1
  llvmlite==0.23.0
  log-symbols==0.0.12
  Mako==1.0.7
  MarkupSafe==1.0
  matplotlib==2.2.3
  mistune==0.8.4
  mpmath==1.0.0
  nbconvert==5.3.1
  nbformat==4.4.0
  nbstripout==0.3.3
  networkx==2.2
  notebook==5.7.0
  numba==0.38.1
  numexpr==2.6.8
  numpy==1.13.3
  olefile==0.46
  packaging==18.0
  pamela==0.3.0
  pandas==0.23.4
  pandocfilters==1.4.2
  parso==0.3.1
  patsy==0.5.0
  pexpect==4.6.0
  pickleshare==0.7.5
  Pillow==5.3.0
  prometheus-client==0.4.2
  prompt-toolkit==2.0.6
  protobuf==3.6.1
  ptyprocess==0.6.0
  pycosat==0.6.3
  pycparser==2.19
  pycurl==7.43.0.2
  Pygments==2.2.0
  pyOpenSSL==18.0.0
  pyparsing==2.2.2
  PySocks==1.6.8
  python-dateutil==2.7.3
  python-editor==1.0.3
  python-oauth2==1.0.1
  pytz==2018.6
  PyWavelets==1.0.1
  PyYAML==3.13
  pyzmq==17.1.2
  requests==2.20.0
  ruamel-yaml==0.15.71
  scikit-image==0.14.1
  scikit-learn==0.19.2
  scipy==1.1.0
  seaborn==0.9.0
  Send2Trash==1.5.0
  simplegeneric==0.8.1
  six==1.11.0
  spinners==0.0.23
  SQLAlchemy==1.2.12
  statsmodels==0.9.0
  sympy==1.1.1
  termcolor==1.1.0
  terminado==0.8.1
  testpath==0.4.2
  toml==0.10.0
  toolz==0.9.0
  tornado==5.1.1
  tqdm==4.31.1
  traitlets==4.3.2
  urllib3==1.23
  vincent==0.4.4
  wcwidth==0.1.7
  webencodings==0.5.1
  widgetsnbextension==3.2.1
  xlrd==1.1.0

Working directory contents are
  /home/jovyan/.condor/state/execute/dir_64637/.machine.ad
  /home/jovyan/.condor/state/execute/dir_64637/.chirp.config
  /home/jovyan/.condor/state/execute/dir_64637/0.in
  /home/jovyan/.condor/state/execute/dir_64637/condor_exec.exe
  /home/jovyan/.condor/state/execute/dir_64637/_condor_stderr
  /home/jovyan/.condor/state/execute/dir_64637/.job.ad
  /home/jovyan/.condor/state/execute/dir_64637/func
  /home/jovyan/.condor/state/execute/dir_64637/_condor_stdout
  /home/jovyan/.condor/state/execute/dir_64637/_htmap_transfer

Exception and traceback (most recent call last):
  File "<ipython-input-7-769ac4dfb4b6>", line 3, in inverse
    return 1 / x

    Local variables:
      x = 0

  ZeroDivisionError: division by zero

=========  End error report for component 0 of map swift-wicked-badge  =========

Neat!
This traceback is, unfortunately, harder to read than the other one.
We need to ignore everything above `MapComponentError: component 0 of map swift-wicked-badge encountered error while executing. Error report:`, which is just about the internal error that HTMap is raising to propagate the error to us.
The real error is the stuff below `========  Start error report for component 0 of map swift-wicked-badge  ========`.

Since we're trying to debug remotely, HTMap has gathered some metadata about the HTCondor "execute node" where the component was running.
First, it tells us which execute node the component landed on and when it started executing.
Next, the report tells us about the Python environment that was used to execute your function, including a list of installed packages.
We also get a listing of the contents of the working directory - in this example, because we didn't add any extra input files, it's just a bunch of files that HTCondor and HTMap are using.

The meat of the error is the last thing in the error report.
We get roughly the same information that we got in the local traceback, but we also get a printout of the local variables in each stack frame.
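If you are curious how per-frame local variables can be collected at all, plain Python exposes them on the traceback object. The sketch below is an assumption-laden illustration, not HTMap's actual implementation; `locals_by_frame` is a hypothetical helper name.

```python
def inverse(x):
    return 1 / x


def locals_by_frame(exc):
    """Return (function name, local variables) for each frame in the traceback."""
    frames = []
    tb = exc.__traceback__
    while tb is not None:
        frame = tb.tb_frame
        # Copy the locals so the snapshot doesn't change after we return.
        frames.append((frame.f_code.co_name, dict(frame.f_locals)))
        tb = tb.tb_next
    return frames


try:
    inverse(0)
except ZeroDivisionError as exc:
    report = locals_by_frame(exc)

# The last frame is the one that raised, just like in the error report.
name, local_vars = report[-1]
print(f"{name}: {local_vars}")  # prints "inverse: {'x': 0}"
```

Seeing `x = 0` next to `return 1 / x` is often enough to understand a failure without re-running anything.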

Since the local HTMap error is raised as soon as it finds a bad component, you may find it convenient to look at _all_ of the error reports for your map (hopefully not too many!).
[htmap.Map.error_reports](../api.rst#htmap.Map.error_reports) provides exactly this functionality:

In [10]:
worse_map = inverse.map([0, 0, 0])
for report in worse_map.error_reports():
    print(report + '\n')

created map prim-sleek-frog with 3 components
Landed on execute node jupyter-htcondor-2dhtmap-2dbjvt9l6e (10.12.70.166) at 2019-02-12 03:22:16.924484

Python executable is /opt/conda/bin/python3 (version 3.6.6 final)
with installed packages
  alembic==0.9.9
  asn1crypto==0.24.0
  async-generator==1.10
  backcall==0.1.0
  beautifulsoup4==4.6.3
  bleach==3.0.2
  bokeh==0.13.0
  certifi==2018.10.15
  cffi==1.11.5
  chardet==3.0.4
  Click==7.0
  click-didyoumean==0.0.3
  cloudpickle==0.6.1
  colorama==0.3.9
  conda==4.5.11
  cryptography==2.3.1
  cryptography-vectors==2.3.1
  cursor==1.2.0
  cycler==0.10.0
  Cython==0.28.5
  dask==0.19.4
  decorator==4.3.0
  dill==0.2.8.2
  entrypoints==0.2.3
  enum34==1.1.6
  fastcache==1.0.2
  gmpy2==2.0.8
  h5py==2.7.1
  halo==0.0.23
  htcondor==8.8.0
  htmap==0.2.0
  idna==2.7
  imageio==2.3.0
  ipykernel==5.1.0
  ipython==7.0.1
  ipython-genutils==0.2.0
  ipywidgets==7.2.1
  jedi==0.13.1
  Jinja2==2.10
  jsonschema==2.6.0
  jupyter-client==5.2.3
  jup

Unlike holds, you generally won't want to re-run components that experienced errors (they'll just fail again).
Instead, an error is generally a signal that you've got a bug in your own code.
Remove your map, debug the error locally, then create a new map.
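One way to do the "debug locally" step is to run the function over the same inputs on your own machine and collect the ones that fail before building a new map. The sketch below uses a plain, undecorated copy of `inverse`, and `find_failing_inputs` is a hypothetical helper written for this example; it is not part of HTMap.

```python
def inverse(x):
    return 1 / x


def find_failing_inputs(func, inputs):
    """Run func on each input locally and collect (index, input, exception)."""
    failures = []
    for index, value in enumerate(inputs):
        try:
            func(value)
        except Exception as exc:
            failures.append((index, value, exc))
    return failures


failures = find_failing_inputs(inverse, [1, 0, 2, 0])
for index, value, exc in failures:
    print(f"input {index} ({value!r}) failed: {exc!r}")
```

This only makes sense when a single call is cheap enough to run locally. For expensive functions, try a small sample of the inputs instead.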