# Example: Usage of law in notebooks

This example demonstrates how to define and run law tasks from within notebooks (jupyter or colab).
The usage is mostly identical to starting things from the command line, with a single difference regarding dynamic task definition.

### Setup

But first, let's make sure law is fully set up.

In [1]:
import os
import sys

!command -v law > /dev/null || pip install git+https://github.com/riga/law.git
![ -f "law.cfg" ] || wget -q https://raw.githubusercontent.com/riga/law/master/examples/notebooks/law.cfg

# check if we are in the law checkout
checkout_dir = os.path.normpath(os.path.join(os.getcwd(), "..", ".."))
in_checkout = len(set(os.listdir(checkout_dir)) & {"law", "examples", "docs"}) == 3

# amend the path if we are
if in_checkout:
    sys.path.insert(0, checkout_dir)

In the next cell, we import law and luigi, and load the law IPython magics:

- `%law`, which runs the passed line in a subprocess, and
- `%ilaw`, which runs the passed line interactively in the current process.

Since we intend to run tasks in the running notebook kernel, we will only use the latter, `%ilaw`.

In [2]:
import luigi
import law

law.contrib.load("ipython")

INFO: law.contrib.ipython.magic - magics successfully registered: %law, %ilaw


The ipython magic functions are part of a `contrib` package, which is loaded by the last line.
After that, the package `law.ipython` is available.

### Define tasks

The purpose of the tasks we are going to define below is to calculate $\pi$ to a certain precision using [Machin's formula](https://en.wikipedia.org/wiki/Machin-like_formula).
The formula yields one independent term per order, and summing these terms up to a given order gives an approximation of $\pi$.
This can be achieved with two simple tasks:

- `ComputeTerm`: Computes the term for a given order (starting at 0).
- `ComputePi`: Sums up the results produced by multiple `ComputeTerm` tasks (with a configurable precision) to obtain $\pi$.
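Before wrapping the computation in tasks, it may help to see the arithmetic on its own. Machin's formula states $\pi = 16 \arctan(1/5) - 4 \arctan(1/239)$, and each arctangent can be expanded as $\arctan(x) = \sum_{n \geq 0} (-1)^n x^{2n+1} / (2n+1)$. The following is a minimal plain-Python sketch of that sum; the function names `machin_term` and `machin_pi` are chosen here for illustration and are not part of law.

```python
import math


def machin_term(order: int) -> float:
    """Return the term of the given order (starting at 0) of Machin's series."""
    exp = 2 * order + 1
    sgn = -1 if order % 2 else 1

    def arctan_term(x: float) -> float:
        # single Taylor term of arctan(x): (-1)^n * x^(2n+1) / (2n+1)
        return sgn * x**exp / exp

    return 16 * arctan_term(1 / 5) - 4 * arctan_term(1 / 239)


def machin_pi(max_order: int) -> float:
    """Sum all terms from order 0 up to and including max_order."""
    return sum(machin_term(n) for n in range(max_order + 1))


# the approximation improves quickly with the order
for n in range(4):
    print(n, machin_pi(n))
print("math.pi", math.pi)
```

The two functions correspond to the two tasks: one computes a single term, the other adds up all terms up to the requested order.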

##### BaseTask

For that, we first define a base task that serves two purposes.

1. It inherits from `law.ipython.Task`. This allows task classes to be redefined under the same name, e.g. when running a cell multiple times, which luigi's task register would otherwise forbid.
2. It provides some convenience methods for storing output targets.

In [3]:
class BaseTask(law.ipython.Task):
    
    def local_target(self, *paths) -> law.LocalFileTarget:
        """
        Returns a law.LocalFileTarget, located at $PWD/data/<task_family>/<paths>.
        """
        paths = ("$PWD", "data", self.task_family) + paths
        return law.LocalFileTarget(os.path.join(*paths))

##### ComputeTerm

In [4]:
class ComputeTerm(BaseTask):

    order = luigi.IntParameter(
        default=0,
        description="the order of the term to compute; default: 0",
    )

    def output(self):
        """
        Declare the output of this task, which will be a json file.
        """
        return self.local_target(f"term_{self.order}.json")

    def run(self):
        """
        Compute a term of pi following Machin's formula for the given order,
        https://en.wikipedia.org/wiki/Machin-like_formula.
        """
        # define a lambda to compute the arctan term
        exp = self.order * 2 + 1
        sgn = -1 if self.order % 2 else 1
        arctan_term = lambda x: sgn / exp * (x ** exp)
        
        # compute the pi term
        term = 16 * arctan_term(5 ** -1) - 4 * arctan_term(239 ** -1)
        
        # save the term in a json file using the "json" target formatter
        # (a wrapper around the usual "import json - open file - json.dump()")
        self.output().dump({"term": term}, formatter="json")

##### ComputePi

In [5]:
class ComputePi(BaseTask):
    
    # reuse the definition of the order parameter
    # (not its value!)
    order = ComputeTerm.order

    def requires(self):
        """
        Require one ComputeTerm task per order, from 0 up to and including self.order.
        """
        # cls.req returns an instance of cls with parameter values
        # derived from the "self", plus additional keyword arguments
        return [
            ComputeTerm.req(self, order=o)
            for o in range(self.order + 1)
        ]
    
    def output(self):
        """
        Declare the output of this task, which again will be a json file.
        """
        return self.local_target(f"pi_{self.order}.json")

    def run(self):
        """
        Sum up values of requirements.
        """
        # input() is the container for the outputs of all requirements
        pi = sum(
            inp.load(formatter="json")["term"]
            for inp in self.input()
        )
        self.output().dump({"pi": pi}, formatter="json")
        
        # print the result
        print(f"pi for order {self.order}\n")
        print(f"approx.: {pi}")
        import math
        print(f"actual : {math.pi:.30f}")

### Run tasks via `%ilaw`

Now that we have defined our tasks, it's time to run them.
As a start, let's run a `ComputeTerm` task for order `0`.

In [6]:
%ilaw run ComputeTerm --order 0

INFO: luigi-interface - Informed scheduler that task   ComputeTerm_0_14b1d8259d   has status   PENDING
INFO: luigi-interface - Done scheduling tasks
INFO: luigi-interface - Running Worker with 1 processes
INFO: luigi-interface - [pid 79545] Worker Worker(salt=8172917226, workers=1, username=marcel, pid=79545) running   ComputeTerm(order=0)
INFO: luigi-interface - [pid 79545] Worker Worker(salt=8172917226, workers=1, username=marcel, pid=79545) done      ComputeTerm(order=0)
INFO: luigi-interface - Informed scheduler that task   ComputeTerm_0_14b1d8259d   has status

True

In the above output, you see three phases.

1. **Tree building phase:** The task that you run can have requirements (dependent tasks whose output is required to produce the output of the *triggered* task) that need to be identified first, resulting in a directed acyclic graph (DAG) or *tree*. It ends with luigi reporting `Done scheduling tasks`.
2. **Run phase:** All tasks that were identified as incomplete (i.e., not all outputs exist) are run, including (and usually concluding with) the triggered task.
3. **Summary phase:** Luigi eventually prints a summary of which tasks ran and whether or not they were successful. It starts with luigi reporting `Luigi Execution Summary`.
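The three phases listed above can be mimicked with a toy model in plain Python. This is only a sketch of the idea and not how luigi is implemented: `ToyTask` and `run_toy` are hypothetical names, and the `complete` flag stands in for "all outputs exist".

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class ToyTask:
    """Hypothetical stand-in for a task; not a luigi or law class."""

    name: str
    requires: list = field(default_factory=list)
    complete: bool = False  # stands in for "all outputs exist"


def build_tree(task, seen=None):
    """Phase 1: collect the triggered task and its requirements, dependencies first."""
    seen = [] if seen is None else seen
    for dep in task.requires:
        build_tree(dep, seen)
    if task not in seen:
        seen.append(task)
    return seen


def run_toy(triggered):
    scheduled = build_tree(triggered)  # phase 1: tree building
    ran = []
    for task in scheduled:  # phase 2: run only what is incomplete
        if not task.complete:
            task.complete = True  # stands in for run() writing the outputs
            ran.append(task.name)
    # phase 3: summary of what ran and what was already complete
    skipped = [t.name for t in scheduled if t.name not in ran]
    return {"ran": ran, "already_complete": skipped}


terms = [ToyTask(f"ComputeTerm(order={o})") for o in range(3)]
terms[0].complete = True  # pretend the output for order 0 already exists
summary = run_toy(ToyTask("ComputePi(order=2)", requires=terms))
print(summary)
```

In this model, the triggered task runs last because its requirements are collected before it, and tasks that are already complete are skipped.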

##### Interactive parameters

Let's move on to the next order, but instead of starting the task right away, we perform some **interactive** checks first.
For that, we use the same command as above and append `--print-status 0`.

In [7]:
%ilaw run ComputeTerm --order 1 --print-status 0

print task status with max_depth 0 and target_depth 0

0 > ComputeTerm(order=1)
      LocalFileTarget(fs=local_fs, path=$PWD/data/ComputeTerm/term_1.json)
        absent


This command prints the current status of the task, i.e., its outputs and whether they exist or not.
The `0` defines the recursion level for showing dependencies with `0` being the triggered task itself.
However, since our `ComputeTerm` task has no requirements itself, choosing a value other than `0` would not change anything.
So let's check the status of the `ComputePi` task which requires several `ComputeTerm` tasks (depending on the passed order).

In [8]:
%ilaw run ComputePi --order 1 --print-status 1

print task status with max_depth 1 and target_depth 0

0 > ComputePi(order=1)
│     LocalFileTarget(fs=local_fs, path=$PWD/data/ComputePi/pi_1.json)
│       absent
│
├──1 > ComputeTerm(order=0)
│        LocalFileTarget(fs=local_fs, path=$PWD/data/ComputeTerm/term_0.json)
│          existent
│
└──1 > ComputeTerm(order=1)
         LocalFileTarget(fs=local_fs, path=$PWD/data/ComputeTerm/term_1.json)
           absent


We can see the dependency relation between `ComputePi` and `ComputeTerm`, and that the output of `ComputeTerm` for order 0 already exists while the one for order 1 is missing.
This also shows that parameter values can control dependencies, depending on how you implement the `requires()` method.
If we request order 2, `ComputePi` will have three requirements.
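The interplay between the order parameter and the depth passed to `--print-status` can be sketched without law. In the toy model below, a task is just a tuple `(name, output_exists, requirements)`; the helpers `make_compute_pi` and `status_lines` are hypothetical and only imitate the shape of the status output, not law's actual implementation.

```python
# Toy model: a task is a tuple (name, output_exists, requirements).
# These tuples are hypothetical and unrelated to law's internal data structures.


def make_compute_pi(order, existing_terms=()):
    """Build a ComputePi-like node with one ComputeTerm-like requirement per order."""
    terms = [
        (f"ComputeTerm(order={o})", o in existing_terms, [])
        for o in range(order + 1)
    ]
    return (f"ComputePi(order={order})", False, terms)


def status_lines(task, max_depth, depth=0):
    """Return one status line per task, descending at most max_depth levels."""
    name, exists, requirements = task
    state = "existent" if exists else "absent"
    lines = [f"{'    ' * depth}{depth} > {name}: {state}"]
    if depth < max_depth:
        for req in requirements:
            lines.extend(status_lines(req, max_depth, depth + 1))
    return lines


pi_task = make_compute_pi(order=2, existing_terms={0})
print("\n".join(status_lines(pi_task, max_depth=0)))
print()
print("\n".join(status_lines(pi_task, max_depth=1)))
```

With depth 0 only the triggered task is listed, while depth 1 adds one entry per requirement, i.e. `order + 1` entries.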

In [9]:
%ilaw run ComputePi --order 2 --print-status 1

print task status with max_depth 1 and target_depth 0

0 > ComputePi(order=2)
│     LocalFileTarget(fs=local_fs, path=$PWD/data/ComputePi/pi_2.json)
│       absent
│
├──1 > ComputeTerm(order=0)
│        LocalFileTarget(fs=local_fs, path=$PWD/data/ComputeTerm/term_0.json)
│          existent
│
├──1 > ComputeTerm(order=1)
│        LocalFileTarget(fs=local_fs, path=$PWD/data/ComputeTerm/term_1.json)
│          absent
│
└──1 > ComputeTerm(order=2)
         LocalFileTarget(fs=local_fs, path=$PWD/data/ComputeTerm/term_2.json)
           absent


Other interactive parameters are

- `--print-deps LEVEL`: Prints only the dependent tasks, without the statuses of their output targets.
- `--print-output LEVEL[,FLAGS]`: Prints a flat list of output files.
- `--remove-output LEVEL[,FLAGS]`: Removes outputs (see below).
- `--fetch-output LEVEL[,FLAGS]`: Fetches (remote) outputs to a local directory.

Always feel free to add `--help` to any command to get more info on which parameters exist and which values they accept.

As an example, we can delete the output of the `ComputeTerm` task that we ran above.

In [10]:
%ilaw run ComputeTerm --order 0 --remove-output 0

remove task output with max_depth 0
removal mode? [i*(interactive), d(dry), a(all)] 
selected interactive mode mode

0 > ComputeTerm(order=0)
      remove outputs? [y*(yes), n(no), a(all)] 
      LocalFileTarget(fs=local_fs, path=$PWD/data/ComputeTerm/term_0.json)
        remove? [y(yes), n*(no)] y
        removed


As output removal can be dangerous, the default mode is interactive, which forces you to confirm deletions on a per-task or even per-target level.
If you are absolutely sure that you know what you are deleting, you can select `all` mode.
To avoid being prompted for the mode, you can append "a" to the `--remove-output` argument, separated by a comma.

In [11]:
%ilaw run ComputeTerm --order 0 --remove-output 0,a

remove task output with max_depth 0
selected all mode mode

0 > ComputeTerm(order=0)
      LocalFileTarget(fs=local_fs, path=$PWD/data/ComputeTerm/term_0.json)
        removed


Now, if we query the status again (with the above command), we see that the output is in fact missing.

In [12]:
%ilaw run ComputePi --order 2 --print-status 1

print task status with max_depth 1 and target_depth 0

0 > ComputePi(order=2)
│     LocalFileTarget(fs=local_fs, path=$PWD/data/ComputePi/pi_2.json)
│       absent
│
├──1 > ComputeTerm(order=0)
│        LocalFileTarget(fs=local_fs, path=$PWD/data/ComputeTerm/term_0.json)
│          absent
│
├──1 > ComputeTerm(order=1)
│        LocalFileTarget(fs=local_fs, path=$PWD/data/ComputeTerm/term_1.json)
│          absent
│
└──1 > ComputeTerm(order=2)
         LocalFileTarget(fs=local_fs, path=$PWD/data/ComputeTerm/term_2.json)
           absent


##### Run everything

Let's compute $\pi$ up to order 2.

In [13]:
%ilaw run ComputePi --order 2

INFO: luigi-interface - Informed scheduler that task   ComputePi_2_65f6329628   has status   PENDING
INFO: luigi-interface - Informed scheduler that task   ComputeTerm_2_65f6329628   has status   PENDING
INFO: luigi-interface - Informed scheduler that task   ComputeTerm_1_861bfab55e   has status   PENDING
INFO: luigi-interface - Informed scheduler that task   ComputeTerm_0_14b1d8259d   has status   PENDING
INFO: luigi-interface - Done scheduling tasks
INFO: luigi-interface - Running Worker with 1 processes
INFO: luigi-interface - [pid 79545] Worker Worker(salt=4407183577, worke

pi for order 2

approx.: 3.141621029325035
actual : 3.141592653589793115997963468544


True

Add another order and note that only one additional `ComputeTerm` task is run, since the ones for orders 0, 1 and 2 are already complete (as their outputs exist).

In [14]:
%ilaw run ComputePi --order 3

INFO: luigi-interface - Informed scheduler that task   ComputePi_3_981b2635ea   has status   PENDING
INFO: luigi-interface - Informed scheduler that task   ComputeTerm_3_981b2635ea   has status   PENDING
INFO: luigi-interface - Informed scheduler that task   ComputeTerm_2_65f6329628   has status   DONE
INFO: luigi-interface - Informed scheduler that task   ComputeTerm_1_861bfab55e   has status   DONE
INFO: luigi-interface - Informed scheduler that task   ComputeTerm_0_14b1d8259d   has status   DONE
INFO: luigi-interface - Done scheduling tasks
INFO: luigi-interfac

pi for order 3

approx.: 3.1415917721821778
actual : 3.141592653589793115997963468544


True

### Run tasks programmatically

Tasks can always be run programmatically within your python code.
Create an instance of a task class and call its `law_run()` method.

**However**, note that parameter values passed as keyword arguments should either be strings in the same format as you would pass them on the command line, or already have the type the parameter expects (e.g. integers in case a `luigi.IntParameter` is used).
Usually, it's more convenient to pass strings and let the parameter objects handle the parsing.

In [15]:
# run up to order 5
task = ComputePi(order="5")
task.law_run()

INFO: luigi-interface - Informed scheduler that task   ComputePi_5_17e7c03c31   has status   PENDING
INFO: luigi-interface - Informed scheduler that task   ComputeTerm_5_17e7c03c31   has status   PENDING
INFO: luigi-interface - Informed scheduler that task   ComputeTerm_4_b90cc5595b   has status   PENDING
INFO: luigi-interface - Informed scheduler that task   ComputeTerm_3_981b2635ea   has status   DONE
INFO: luigi-interface - Informed scheduler that task   ComputeTerm_2_65f6329628   has status   DONE
INFO: luigi-interface - Informed scheduler that task   ComputeTerm_1_861bfab55e   has statu

pi for order 5

approx.: 3.141592652615309
actual : 3.141592653589793115997963468544


True

You can access the outputs afterwards and continue working with them, for instance, in a script for prototyping.

In [16]:
import math

approx_pi = task.output().load(formatter="json")["pi"]

rel_diff = abs(approx_pi - math.pi) / math.pi
print(f"relative difference: {rel_diff * 100:.9f}%")

relative difference: 0.000000031%
