# htmap

This is a notebook to show how the prototype `htmap` library (https://github.com/JoshKarpel/htmap) works.

If you've messed up your cache directory somehow, running `rm -rf ~/.htmap` will delete everything in your cache directory.

In [2]:
import htmap

## Functional Interface (`map`-like)

`htmap` currently has two interfaces. The first is a very "functional", map-based interface.

In [None]:
def double(x):
    return 2 * x

Python's built-in `map` function works like this:

In [None]:
doubled = list(map(double, range(10)))
doubled

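To do the same with `htmap`, we use the `map` function it provides instead. The only difference in the call is the first argument, a name for the map (here `'double'`). Note that `htmap` persists the outputs of completed jobs, so if you get a `clusterid` of `None`, the outputs for all of your inputs were already cached and no new jobs were submitted.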
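The caching behaviour described above can be pictured with a toy sketch. This is not how `htmap` stores its outputs: it assumes a hypothetical layout where each map name gets a directory of pickled outputs (one file per input position) and "submits" work only when some output is missing. `cached_map` and its `cache_root` parameter are made up for illustration, and the work runs locally instead of on a cluster.

```python
import pickle
import tempfile
from pathlib import Path


def cached_map(name, func, args, cache_root):
    """Hypothetical sketch: reuse pickled outputs stored under cache_root/name.

    Returns (clusterid, outputs). clusterid is None when every output was
    already cached, mimicking the behaviour described for htmap.map.
    """
    args = list(args)
    out_dir = Path(cache_root) / name
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f'{i}.out' for i in range(len(args))]

    missing = [i for i, p in enumerate(paths) if not p.exists()]
    clusterid = None
    if missing:
        clusterid = 1  # stand-in for a real cluster id from the scheduler
        for i in missing:
            # compute locally instead of submitting a job
            paths[i].write_bytes(pickle.dumps(func(args[i])))

    outputs = [pickle.loads(p.read_bytes()) for p in paths]
    return clusterid, outputs


def double(x):
    return 2 * x


with tempfile.TemporaryDirectory() as root:
    first = cached_map('double', double, range(5), root)
    second = cached_map('double', double, range(5), root)
    print(first)   # (1, [0, 2, 4, 6, 8])
    print(second)  # (None, [0, 2, 4, 6, 8])
```

Keying outputs by position keeps the sketch short, but it means rerunning under the same name with different inputs would silently reuse stale outputs.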

In [None]:
result = htmap.map('double', double, range(20))
result

`htmap.map` returns a `MapResult`, which we can use to get information about the running jobs.

For example, we can call its `tail` method to tail the cluster log. This runs forever, so you'll need to interrupt the Jupyter kernel (the black square button in the toolbar) before you can run the next cell.

In [None]:
result.tail()
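Tailing a log means following it the way `tail -f` does. A minimal sketch of that pattern in plain Python, assuming nothing about `htmap`'s own implementation: a hypothetical generator, `follow`, yields the complete lines already in a file, then polls for new ones. Because it never finishes on its own, the usage example takes only the first few lines with `itertools.islice`.

```python
import itertools
import tempfile
import time
from pathlib import Path


def follow(path, poll_interval=0.5):
    """Yield complete lines from path forever, waiting for new ones like `tail -f`."""
    buffer = ''
    with open(path) as f:
        while True:
            chunk = f.readline()
            if not chunk:
                # no new data yet; wait and try again
                time.sleep(poll_interval)
                continue
            buffer += chunk
            # only emit a line once its trailing newline has been written
            if buffer.endswith('\n'):
                yield buffer.rstrip('\n')
                buffer = ''


with tempfile.TemporaryDirectory() as d:
    log = Path(d) / 'cluster.log'
    log.write_text('first line\nsecond line\nthird line\n')
    first_two = list(itertools.islice(follow(log), 2))
    print(first_two)  # ['first line', 'second line']
```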

To see the results, we iterate over the `MapResult` (passing it into the `list` constructor does this internally).

In [None]:
doubled_htc = list(result)
doubled_htc
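`list(result)` works because `list` consumes anything that implements Python's iterable protocol (`__iter__`). A toy stand-in to show the mechanism; `ToyMapResult` and its `_outputs` attribute are hypothetical and not the real `MapResult`:

```python
class ToyMapResult:
    """Hypothetical stand-in for a result object: iterating yields outputs in order."""

    def __init__(self, outputs):
        self._outputs = list(outputs)

    def __iter__(self):
        # list(result) calls this method under the hood
        yield from self._outputs


result = ToyMapResult(2 * x for x in range(5))
print(list(result))  # [0, 2, 4, 6, 8]
```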

## Functional Interface w/ Decorator

This variant keeps the same functional flavor, but uses a decorator on the function itself.

For those who care, `htmap.map` is doing the same thing under the hood; it just hides the decorator from you.
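The relationship between the two spellings can be sketched with a toy decorator that runs everything locally. `ToyMapper`, `toy_htmap`, and `toy_map` are hypothetical names standing in for `HTCMapper`, `htmap.htmap`, and `htmap.map`; defaulting the map name to the function's `__name__` is also an assumption of the sketch. The point is only that the plain function is a thin wrapper that applies the decorator for you.

```python
class ToyMapper:
    """Hypothetical local stand-in for a decorated function."""

    def __init__(self, func):
        self.func = func

    def map(self, args, name=None):
        # a real mapper would submit one job per input; here we just loop
        name = name if name is not None else self.func.__name__
        return name, [self.func(a) for a in args]


def toy_htmap(func):
    """Decorator form: wrap the function in a mapper object."""
    return ToyMapper(func)


def toy_map(name, func, args):
    """Functional form: apply the decorator behind the scenes, then map."""
    return toy_htmap(func).map(args, name=name)


def double(x):
    return 2 * x


print(toy_map('double', double, range(4)))   # ('double', [0, 2, 4, 6])
print(toy_htmap(double).map(range(4)))       # ('double', [0, 2, 4, 6])
```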

I'll also use a slightly more complicated function to show off some other features. This function takes two arguments; the second one, `p`, has a default value and will be passed as a keyword argument.

In [None]:
@htmap.htmap
def power(x, p=1):
    return x ** p

power

As you can see, `power` is not actually a function, but an `HTCMapper` that holds a reference to the real function. Because `HTCMapper` defines `__call__` (a bit of Python voodoo), you can still call it like a normal function:

In [None]:
power(5, 3)
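Calling an object like a function works through Python's `__call__` hook. A minimal sketch, assuming the wrapper simply forwards calls to the wrapped function; `CallableWrapper` is a hypothetical name, not `htmap`'s `HTCMapper`:

```python
import functools


class CallableWrapper:
    """Hypothetical wrapper that still behaves like the function it holds."""

    def __init__(self, func):
        self.func = func
        # copy __name__, __doc__, etc. so the wrapper looks like the function
        functools.update_wrapper(self, func)

    def __call__(self, *args, **kwargs):
        # calling the wrapper just calls the wrapped function directly
        return self.func(*args, **kwargs)


@CallableWrapper
def power(x, p=1):
    return x ** p


print(type(power).__name__)  # CallableWrapper
print(power(5, 3))           # 125
print(power.__name__)        # power
```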

We can't use `map` here because it passes exactly one argument to each call. Instead, we'll use `starmap`. Both `map` and `starmap` are methods of the `HTCMapper` object. The catch is that we have to contort the inputs a little: `starmap` takes a list of tuples (the positional arguments for each call) and a list of dictionaries (the keyword arguments for each call), which looks a little weird.
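The shape `starmap` expects makes sense if we assume it pairs its two sequences: the i-th tuple supplies the positional arguments and the i-th dictionary the keyword arguments for the i-th call. A local, sequential sketch of those semantics; `local_starmap` is a hypothetical helper, not part of `htmap`:

```python
def local_starmap(func, args, kwargs):
    """Call func(*a, **k) for each paired (a, k); runs locally, in order."""
    return [func(*a, **k) for a, k in zip(args, kwargs)]


def power(x, p=1):
    return x ** p


xs = [(x,) for x in range(10)]
powers = [{'p': p} for p in range(10)]

print(local_starmap(power, xs, powers))
# [1, 1, 4, 27, 256, 3125, 46656, 823543, 16777216, 387420489]
```

`zip` stops at the shorter sequence, so mismatched lengths would silently drop calls; on Python 3.10+ `zip(args, kwargs, strict=True)` raises instead.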

In [None]:
xs = [(x,) for x in range(10)]
powers = [{'p': p} for p in range(10)]

power_result = power.starmap(xs, powers)
power_result

We can iterate over the result ourselves. Iterating this way yields the outputs in input order, each one as soon as it (and every output before it) is ready. Since the i-th call gets `x = i` and `p = i`, the outputs should be 0^0, 1^1, 2^2, 3^3, and so on. We'll use the `iter_with_inputs` method to see how the inputs are mapped to the outputs.

In [None]:
for inp, out in power_result.iter_with_inputs():
    print(f'{inp} -> {out}')

## Looping Interface

The second interface is built to look like the loops people are probably already writing before they start doing any HTC (high-throughput computing).

It relies on Python's `with` statement, which lets you run code before and after a block of code runs. It looks like this.

In [None]:
def triple(x):
    return 3 * x

In [None]:
with htmap.build_job('triple', triple) as job_builder:
    for x in range(10):
        job_builder(x)
        
triple_result = job_builder.result
triple_result
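The "before and after" behaviour of `with` comes from context managers. A minimal sketch using the standard library's `contextlib.contextmanager`; `announce` and `events` are hypothetical names used only to record what runs when:

```python
from contextlib import contextmanager

events = []


@contextmanager
def announce(name):
    events.append(f'before {name}')     # runs when the with block is entered
    try:
        yield name                       # value bound by `as`
    finally:
        events.append(f'after {name}')  # runs when the block exits, even on error


with announce('triple') as n:
    events.append(f'inside {n}')

print(events)  # ['before triple', 'inside triple', 'after triple']
```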

Once we've created the `JobBuilder` (stored in the variable `job_builder`), we just call it as if it were the function we want to map. The `JobBuilder` catches those calls and feeds them into the same backend that does the mapping above. I really like this because it's super simple: you don't need to do anything weird to fit the arguments into the right shape for the map. If you can call your function normally, you can put the calls in this `with` block, replace the function with the `JobBuilder`, and do the map.
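The call-catching idea can be sketched as a context manager whose `__call__` only records its arguments and whose `__exit__` hands the recorded calls to a map. This is a hypothetical `ToyJobBuilder`, not `htmap`'s `JobBuilder`: it runs the function in-process instead of submitting jobs, and it skips the map if the block raised.

```python
class ToyJobBuilder:
    """Hypothetical sketch: collect calls inside a with block, run them on exit."""

    def __init__(self, func):
        self.func = func
        self.calls = []      # (args, kwargs) recorded by __call__
        self.result = None   # filled in when the block exits

    def __enter__(self):
        return self

    def __call__(self, *args, **kwargs):
        # don't run anything yet; just remember how we were called
        self.calls.append((args, kwargs))

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            # equivalent to a starmap over the recorded calls
            self.result = [self.func(*a, **k) for a, k in self.calls]
        return False  # never suppress exceptions from the block


def triple(x):
    return 3 * x


with ToyJobBuilder(triple) as job_builder:
    for x in range(10):
        job_builder(x)

print(job_builder.result)  # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
```

Because the builder stores each call's `(args, kwargs)` exactly as written, any call signature the function accepts works inside the block, which is why no reshaping into tuples and dictionaries is needed.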

This time we'll iterate in an unordered way, getting outputs as jobs finish (the previous loops went in input order).

In [None]:
for r in triple_result.iter_as_available():
    print(r)
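The two iteration orders can be illustrated locally with `concurrent.futures`: waiting on futures in submission order gives in-order results, while `as_completed` yields them in completion order. Threads stand in for cluster jobs here, and this says nothing about how `htmap` actually polls for outputs; `slow_triple`, `run`, and the sleep times are made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_triple(x):
    # later inputs sleep less, so they tend to finish first
    time.sleep(0.05 * (4 - x))
    return 3 * x


def run(ordered):
    with ThreadPoolExecutor(max_workers=5) as pool:
        futures = [pool.submit(slow_triple, x) for x in range(5)]
        if ordered:
            # in order: block on each future in submission order
            return [f.result() for f in futures]
        # as available: take whichever future finishes next
        return [f.result() for f in as_completed(futures)]


print(run(ordered=True))   # [0, 3, 6, 9, 12]
print(run(ordered=False))  # order depends on which thread finishes first
```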

## Looping Interface w/ Decorator

Again, this is essentially the same; the only difference is that `build_job` is a method of the decorated function.

In [None]:
@htmap.htmap
def quadruple(x):
    return 4 * x

In [None]:
with quadruple.build_job('quadruple') as job_builder:
    for x in range(10):
        job_builder(x)
        
quadruple_result = job_builder.result
quadruple_result

In [None]:
for r in quadruple_result:
    print(r)

## Killing a Job

We can kill all the jobs associated with a `MapResult` using the `remove()` method. At the moment, this does not remove any input or output files.

In [3]:
import time

@htmap.htmap
def sleep_and_double(x):
    time.sleep(60)
    return 2 * x

In [12]:
sleepy_result = sleep_and_double.map(range(10))

time.sleep(3)

!condor_q

rm_output = sleepy_result.remove()
print('OUTPUT FROM REMOVE COMMAND')
print(rm_output)

time.sleep(3)

!condor_q



-- Schedd: jupyter0000.chtc.wisc.edu : <127.0.0.1:9618?... @ 07/30/18 20:59:54
OWNER  BATCH_NAME          SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
karpel sleep_and_double   7/30 20:59      _      4      6     10 38.0-9

Total for query: 10 jobs; 0 completed, 0 removed, 6 idle, 4 running, 0 held, 0 suspended 
Total for karpel: 10 jobs; 0 completed, 0 removed, 6 idle, 4 running, 0 held, 0 suspended 
Total for all users: 15 jobs; 0 completed, 0 removed, 6 idle, 4 running, 5 held, 0 suspended

OUTPUT FROM REMOVE COMMAND

    [
        TotalJobAds = 15; 
        TotalPermissionDenied = 0; 
        TotalAlreadyDone = 0; 
        TotalNotFound = 0; 
        TotalSuccess = 10; 
        TotalChangedAds = 1; 
        TotalBadStatus = 0; 
        TotalError = 0
    ]


-- Schedd: jupyter0000.chtc.wisc.edu : <127.0.0.1:9618?... @ 07/30/18 20:59:57
OWNER BATCH_NAME      SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS

Total for query: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0