# HTMap

This notebook shows how the prototype `htmap` library (https://github.com/JoshKarpel/htmap) works.

If you've messed up your cache directory somehow, running `htmap.clean()` will delete everything so you can start fresh.
Note that each map call is given a `map_id`, which must be unique.
To run a new map with the same `map_id`, you must first remove the old one.
Tools for removing and managing existing maps are covered in the Controlling Jobs and Map ID Management sections, after the interface sections.
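The rule that a `map_id` can't be reused until the old map is removed behaves like a keyed registry. Here is a minimal sketch of that rule in plain Python. `MapRegistry`, `create`, `remove`, and `ids` are hypothetical names chosen for illustration; they are not part of `htmap`'s API, and the real library stores its maps on disk rather than in a dict.

```python
class MapRegistry:
    """Hypothetical stand-in for map_id bookkeeping (not htmap's implementation)."""

    def __init__(self):
        self._maps = {}  # map_id -> stored map data

    def create(self, map_id, data):
        # A map_id may only be used once until the old map is removed.
        if map_id in self._maps:
            raise ValueError(f"map_id {map_id!r} already exists; remove it first")
        self._maps[map_id] = data

    def remove(self, map_id):
        # Removing a map deletes its data and frees the map_id for reuse.
        del self._maps[map_id]

    def ids(self):
        return sorted(self._maps)


registry = MapRegistry()
registry.create("double", [0, 2, 4])
try:
    registry.create("double", [])  # second map with the same map_id
except ValueError as err:
    print(err)  # map_id 'double' already exists; remove it first
registry.remove("double")
registry.create("double", [])  # allowed again once the old map is gone
print(registry.ids())  # ['double']
```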

```python
import htmap
```

```python
# Delete all cached maps so we start from a clean slate.
htmap.clean()
```

## Functional Interface (`map`-like)

`htmap` currently has two interfaces, each with a decorator variant. The first is a very "functional", `map`-based interface.

```python
def double(x):
    return 2 * x
```

Python's built-in `map` function works like this:

```python
doubled = list(map(double, range(10)))
doubled
```

To do the same with `htmap`, we use the `map` function it provides instead. Its first argument is the `map_id`, followed by the function and its inputs. Note that `htmap` persists the outputs of completed jobs, so if you get a `clusterid` of `None`, the outputs for all of your inputs are already cached.
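The persistence behavior can be pictured as a check before submission: only inputs without a cached output get submitted, and if nothing is left to submit there is no cluster to report. The sketch below models that idea with a plain dict standing in for the on-disk cache. `submit_missing`, the cache key shape, and returning `None` are assumptions chosen to mirror the `clusterid` of `None` described above, not `htmap`'s actual code.

```python
def submit_missing(map_id, inputs, cache, submit):
    """Hypothetical sketch: submit only inputs whose outputs aren't cached.

    cache:  dict mapping (map_id, input) -> output, standing in for files on disk
    submit: callable taking a list of inputs and returning a cluster id
    Returns the cluster id, or None if every output was already cached.
    """
    missing = [x for x in inputs if (map_id, x) not in cache]
    if not missing:
        return None  # nothing to do: all outputs are already available
    return submit(missing)


cache = {("double", x): 2 * x for x in range(10)}
submitted = []


def fake_submit(xs):
    # Pretend scheduler: remember what was submitted and hand back a cluster id.
    submitted.append(list(xs))
    return 1234


print(submit_missing("double", range(10), cache, fake_submit))  # None
print(submit_missing("double", range(12), cache, fake_submit))  # 1234
```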

```python
result = htmap.map('double', double, range(10))
result
```

`htmap.map` returns a `MapResult`, which we can use to get information about the running jobs.

For example, we can call its `tail` method to tail the cluster log.
It only shows new log entries and doesn't look backwards in the log, so if you wait too long before starting it you may never see anything, and it may take a moment for anything to appear after you start it.
It also runs forever, so you'll need to interrupt the Jupyter kernel (the black square in the toolbar) before you can run the next cell.
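That "only new lines, forever" behavior is the same idea as `tail -f`: seek to the current end of the file, then keep polling for lines appended after that point. Here is a minimal standard-library sketch of the technique. `follow` and its `max_idle_polls` escape hatch are hypothetical (added so the demo can terminate), and this is not necessarily how `htmap` implements `tail`.

```python
import os
import tempfile
import time


def follow(path, poll_interval=0.5, max_idle_polls=None):
    """Hypothetical helper: yield lines appended to `path` after this call.

    Existing content is skipped. With max_idle_polls=None this never stops,
    which is why an interactive tail has to be interrupted.
    """
    f = open(path)
    f.seek(0, os.SEEK_END)  # start at the current end: no looking backwards
    return _new_lines(f, poll_interval, max_idle_polls)


def _new_lines(f, poll_interval, max_idle_polls):
    idle = 0
    with f:
        while max_idle_polls is None or idle < max_idle_polls:
            line = f.readline()
            if line:
                idle = 0
                yield line
            else:
                idle += 1
                time.sleep(poll_interval)  # wait for more data to be written


# Demo: lines written before follow() are skipped; later ones are seen.
path = os.path.join(tempfile.mkdtemp(), "cluster.log")
with open(path, "w") as w:
    w.write("old event\n")
lines = follow(path, poll_interval=0.01, max_idle_polls=3)
with open(path, "a") as w:
    w.write("new event\n")
new = list(lines)
print(new)  # ['new event\n']
```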

```python
result.tail()
```

To see the results, we iterate over the `MapResult` (passing it into the `list` constructor does this internally).
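`list(result)` works because `list` accepts any iterable: it calls `iter()` on its argument and consumes items until the iterator is exhausted. Below is a toy class showing that protocol. `ToyResult` is hypothetical and just reads from an in-memory list, whereas a real `MapResult` has to wait for jobs to finish before each output exists.

```python
class ToyResult:
    """Hypothetical stand-in for a MapResult: iterating yields outputs in input order."""

    def __init__(self, outputs):
        self._outputs = outputs

    def __iter__(self):
        # list(), for-loops, and unpacking all reach this method via iter().
        for output in self._outputs:
            yield output


toy = ToyResult([0, 2, 4, 6])
print(list(toy))  # [0, 2, 4, 6]
for value in toy:  # a plain for-loop uses the same protocol
    print(value)
```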

```python
doubled_htc = list(result)
doubled_htc
```

## Functional Interface w/ Decorator

This variant has the same functional flavor, but applies a decorator to the function itself.

For those who care, the first interface does the same thing; it just hides the decorator from you.

I'll also use a slightly more complicated function to show off some other features. This function has two arguments, and the second one, `p`, has a default value and will be passed as a keyword argument.

```python
@htmap.htmap
def power(x, p=1):
    return x ** p

power
```

As you can see, `power` is not actually a function, but an `HTMapper` that holds a reference to the real function. Because `HTMapper` objects are callable (their class defines `__call__`), you can still call `power` like a normal function, running entirely locally:
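An object can be called like a function when its class defines the `__call__` special method. A decorator can therefore return an object that wraps the original function, forwards direct calls to it, and adds extra methods. The sketch below shows the pattern with a hypothetical `LocalMapper` whose `map` simply runs locally; it illustrates the technique and is not `htmap`'s `HTMapper`.

```python
import functools


class LocalMapper:
    """Hypothetical decorator class: wraps a function, stays callable, adds a map method."""

    def __init__(self, func):
        self.func = func
        functools.update_wrapper(self, func)  # copy __name__, __doc__, etc.

    def __call__(self, *args, **kwargs):
        # Calling the wrapper just runs the real function locally.
        return self.func(*args, **kwargs)

    def map(self, iterable):
        # Local stand-in for a distributed map: one call per input.
        return [self.func(x) for x in iterable]


@LocalMapper
def cube(x):
    return x ** 3


print(type(cube).__name__)  # LocalMapper
print(cube(2))              # 8
print(cube.map(range(4)))   # [0, 1, 8, 27]
```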

```python
# Runs locally, just like calling the undecorated function: 5 ** 3 == 125.
power(5, 3)
```

We can't use `map` now, because it only passes a single argument to each call. Instead, we'll use `starmap`. Both `map` and `starmap` are methods of the `HTMapper` object. We do have to contort things a little: `starmap` takes a list of tuples of positional arguments and a list of dictionaries of keyword arguments, one entry of each per call, which looks a little weird.
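In plain Python, the pairing that `starmap` does here amounts to zipping the two lists, then unpacking each tuple as positional arguments and each dictionary as keyword arguments. This local sketch (the `local_starmap` and `power_local` names are hypothetical) shows the shape of the inputs. The standard library's `itertools.starmap` covers only the positional half.

```python
from itertools import starmap


def power_local(x, p=1):
    return x ** p


def local_starmap(func, args, kwargs):
    """Hypothetical local equivalent: call func(*a, **kw) for each (a, kw) pair."""
    return [func(*a, **kw) for a, kw in zip(args, kwargs)]


xs = [(x,) for x in range(10)]
powers = [{'p': p} for p in range(10)]

print(local_starmap(power_local, xs, powers))
# [1, 1, 4, 27, 256, 3125, 46656, 823543, 16777216, 387420489]

# itertools.starmap only unpacks positional arguments, so p keeps its default of 1.
print(list(starmap(power_local, xs)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```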

```python
# One tuple of positional arguments and one dict of keyword arguments per call.
xs = [(x,) for x in range(10)]
powers = [{'p': p} for p in range(10)]

power_result = power.starmap('power', xs, powers)
power_result
```

We can also iterate over the result ourselves. Done this way, the outputs come back in input order, each one as soon as it's available. The outputs should be 0^0 = 1, 1^1 = 1, 2^2 = 4, 3^3 = 27, and so on. We'll use the `iter_with_inputs` method to see which inputs produced which outputs.
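Ordered iteration means the loop waits on each output in turn, even if later jobs have already finished. The standard library's `concurrent.futures.Executor.map` has the same ordering contract, so it makes a convenient local sketch, and pairing each input with its output is then just `zip`. The thread pool and the hypothetical `slow_square` function are stand-ins for cluster jobs, not anything `htmap` uses.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    # Hypothetical stand-in for a cluster job; earlier inputs take longer.
    time.sleep(0.1 * (5 - x))
    return x * x


inputs = list(range(5))
pairs = []
with ThreadPoolExecutor() as pool:
    # Executor.map yields results in input order, waiting for each one in turn.
    for inp, out in zip(inputs, pool.map(slow_square, inputs)):
        pairs.append((inp, out))
        print(f'{inp} -> {out}')
```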

```python
for inp, out in power_result.iter_with_inputs():
    print(f'{inp} -> {out}')
```

## Looping Interface

The second interface is built to look like the `for` loops that people are probably already writing before they start doing any HTC (high-throughput computing).

It relies on Python's `with` statement, which lets you run code before and after a block of code runs. It looks like this.
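As a minimal illustration of that before/after behavior, here is a context manager built with `contextlib.contextmanager`. Everything before the `yield` runs on entry, and the `finally` block runs on exit even if the body raises. The `announce` name and the `events` list exist only for this example.

```python
from contextlib import contextmanager

events = []


@contextmanager
def announce(name):
    events.append(f'enter {name}')      # runs before the with-block
    try:
        yield name                      # value bound by "as"
    finally:
        events.append(f'exit {name}')   # runs after the with-block, even on error


with announce('demo') as n:
    events.append(f'body {n}')

print(events)  # ['enter demo', 'body demo', 'exit demo']
```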

```python
def triple(x):
    return 3 * x
```

```python
with htmap.build_map('triple', triple) as map_builder:
    # Call map_builder as if it were triple; each call becomes one input to the map.
    for x in range(10):
        map_builder(x)

triple_result = map_builder.result
triple_result
```

Note that once we create the `MapBuilder`, stored in the variable `map_builder`, we can call it as if it were the function we want to map over. The `MapBuilder` catches the calls and feeds them into the same backend that does the mapping above. I really like this because it's super simple: you don't need to do anything weird to fit the arguments into the right shape for the map. If you can call your function normally, you can put the loop in this `with` block, replace the function with the `MapBuilder`, and do the map.
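The call-catching trick can be sketched as an object that records the arguments of each call while the `with` block runs, then does all the work in `__exit__`. `RecordingBuilder` below is hypothetical: it runs the recorded calls locally in a list comprehension instead of submitting jobs, and the assumption that work starts only when the block exits is this sketch's choice, not something confirmed about `htmap`.

```python
class RecordingBuilder:
    """Hypothetical sketch: collect calls inside a with-block, run them on exit."""

    def __init__(self, func):
        self.func = func
        self.calls = []      # (args, kwargs) for each recorded call
        self.result = None   # filled in when the with-block exits

    def __call__(self, *args, **kwargs):
        # Record the call instead of running it.
        self.calls.append((args, kwargs))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:  # only "submit" if the block finished cleanly
            self.result = [self.func(*a, **kw) for a, kw in self.calls]
        return False          # don't swallow exceptions from the block


with RecordingBuilder(lambda x, p=1: x ** p) as builder:
    for x in range(5):
        builder(x, p=2)

print(builder.result)  # [0, 1, 4, 9, 16]
```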

This time we'll iterate without regard to order, getting each output as soon as its job finishes (the previous loops returned outputs in input order, each as soon as it was available).
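The "as available" style corresponds to `concurrent.futures.as_completed`, which yields futures in the order they finish rather than the order they were submitted. Again, a thread pool and the hypothetical `slow_triple` function stand in for cluster jobs; this is a local analogy, not `htmap`'s implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_triple(x):
    # Hypothetical stand-in for a cluster job; later inputs finish first.
    time.sleep(0.1 * (4 - x))
    return 3 * x


with ThreadPoolExecutor(max_workers=5) as pool:
    futures = [pool.submit(slow_triple, x) for x in range(5)]
    # as_completed yields each future as soon as it is done.
    finished = [f.result() for f in as_completed(futures)]

print(finished)  # finishing order, not input order
```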

```python
for r in triple_result.iter_as_available():
    print(r)
```

## Looping Interface w/ Decorator

Again, this is essentially the same, except that `build_map` is a method of the decorated function.

```python
@htmap.htmap
def quadruple(x):
    return 4 * x
```

```python
with quadruple.build_map('quadruple') as map_builder:
    for x in range(10):
        map_builder(x)

quadruple_result = map_builder.result
quadruple_result
```

```python
for r in quadruple_result:
    print(r)
```

## Controlling Jobs

You can interact with the jobs behind a map by calling methods on the `MapResult`. Let's define a sleepy function so that we have time to interact with the jobs while they're running.

I'll use the `condor_q` command-line tool here to show that it's really working; eventually you'll be able to get the same information from the `MapResult` itself.

```python
import time

@htmap.htmap
def sleep_and_double(x):
    time.sleep(60)
    return 2 * x
```

We can kill all the jobs associated with a `MapResult` using its `remove()` method.
This also deletes all of the input, output, and log files associated with that map.
Therefore, it also frees up the `map_id` for use by another map.

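```python
# Submit a map whose jobs each sleep for a minute.
sleepy_result = sleep_and_double.map('sleepy', range(10))

time.sleep(3)

!condor_q

# Kill the jobs and delete the map's files, freeing up the 'sleepy' map_id.
rm_output = sleepy_result.remove()
print('OUTPUT FROM REMOVE COMMAND')
print(rm_output)

time.sleep(3)

!condor_q
```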

We can also hold and release jobs (the other job actions work similarly, but I won't go over them here).

```python
# Reusing 'sleepy' works because remove() freed the map_id.
sleepy_result = sleep_and_double.map('sleepy', range(10))

time.sleep(3)

!condor_q

# Put the map's jobs on hold.
hold_output = sleepy_result.hold()
print('OUTPUT FROM HOLD COMMAND')
print(hold_output)

time.sleep(1)

!condor_q

# Let the held jobs run again.
release_output = sleepy_result.release()
print('OUTPUT FROM RELEASE COMMAND')
print(release_output)

time.sleep(1)

!condor_q
```

## Map ID Management

To get a list of all of the `map_id`s you have stored, call `htmap.maps()`:

```python
maps = htmap.maps()
maps
```

To recover the `MapResult` for an existing `map_id`, use the module-level `recover` function. Here we recover the first stored map and print its outputs:

```python
recovered_result = htmap.recover(maps[0])
print(list(recovered_result))
```

## Error Handling

Let's make a map whose jobs we know will raise an exception on the execute node.

```python
@htmap.htmap
def bad(x):
    # Dividing by zero raises ZeroDivisionError for every input.
    return x / 0
```

```python
bad_result = bad.map('bad', range(10))
bad_result
```

Wait for the jobs to finish. We can't use the `wait()` method here, because it watches for output files and these jobs will never produce any. Instead, tail the log until the jobs are done, then interrupt the kernel as before.

```python
bad_result.tail()
```

Now we can inspect the stdout and stderr of each job using the `output` and `error` methods on the `MapResult`.
Each method takes the index of the input whose job you want to inspect.
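Locally, the same idea (one captured stdout and stderr per input index, with the traceback landing in stderr) can be sketched with `contextlib.redirect_stdout`, `contextlib.redirect_stderr`, and `traceback.print_exc`. `run_captured` and `divide_by_zero` are hypothetical helpers; on the cluster, these streams presumably come from files that HTCondor writes for each job instead.

```python
import io
import traceback
from contextlib import redirect_stderr, redirect_stdout


def run_captured(func, inputs):
    """Hypothetical helper: run func on each input, capturing stdout/stderr per index."""
    stdouts, stderrs = [], []
    for x in inputs:
        out, err = io.StringIO(), io.StringIO()
        with redirect_stdout(out), redirect_stderr(err):
            try:
                print(f'running on input {x}')
                func(x)
            except Exception:
                traceback.print_exc()  # writes the traceback to the redirected stderr
        stdouts.append(out.getvalue())
        stderrs.append(err.getvalue())
    return stdouts, stderrs


def divide_by_zero(x):
    return x / 0


stdouts, stderrs = run_captured(divide_by_zero, range(3))
print(stdouts[0])  # running on input 0
print(stderrs[0])  # traceback ending in ZeroDivisionError: division by zero
```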

```python
print(bad_result.output(0))
```

```python
print(bad_result.error(0))
```