In [None]:
import pydra

## Intro to FunctionTask

A task can be created from any user-defined function by applying the `pydra.mark.task` decorator.

In [None]:
@pydra.mark.task
def add_var(a, b):
    return a + b

Once we decorate the function, we can create a pydra `Task` and specify the input.

In [None]:
task1 = add_var(a=4, b=5)

We can check the type of `task1`:

In [None]:
type(task1)

pydra uses `FunctionTask` to create tasks from python functions.

We can also check that the task has the correct values of `a` and `b`, which are saved in the task inputs.

In [None]:
print(f"a = {task1.inputs.a}")
print(f"b = {task1.inputs.b}")

We can also check the entire set of inputs:

In [None]:
task1.inputs

As you can see, `task1.inputs` also contains information about the function, which is an inseparable part of the `FunctionTask`.
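
The idea that the function travels together with the inputs can be illustrated without pydra. The following is a minimal plain-Python sketch, not pydra's implementation: `ToyTask`, `toy_task` and `add_var_toy` are hypothetical names, and the real `FunctionTask` does considerably more.

```python
import functools


class ToyTask:
    """A toy stand-in for a function task (hypothetical, not pydra's class)."""

    def __init__(self, func, **inputs):
        # The wrapped function is stored together with the user inputs.
        self.inputs = {"func": func, **inputs}
        self._result = None

    def __call__(self):
        # Separate the function from its arguments, then run it.
        kwargs = {k: v for k, v in self.inputs.items() if k != "func"}
        self._result = self.inputs["func"](**kwargs)
        return self._result

    def result(self):
        # Return the stored result of the last run (None if never run).
        return self._result


def toy_task(func):
    """Decorator: calling the decorated name builds a ToyTask instead of running func."""

    @functools.wraps(func)
    def factory(**inputs):
        return ToyTask(func, **inputs)

    return factory


@toy_task
def add_var_toy(a, b):
    return a + b


task = add_var_toy(a=4, b=5)   # builds the object, nothing is computed yet
print(task.inputs["a"], task.inputs["b"])
print(task())                  # runs the stored function with the stored inputs
print(task.result())           # the result is kept on the object
```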

Once we have the task with set input, we can run it. Since `Task` is a "callable object", we can use the syntax:

In [None]:
task1()

As you can see, the result was returned right away, but we can also access it later:

In [None]:
task1.result()

`Result` contains more than just an output, so if we want to get the task output, we can type:

In [None]:
result = task1.result()
result.output.out

### Customizing output names
Note that "out" is the default name for the task output, but we can always customize it. There are two ways of doing this. The first is to annotate the Python function:

In [None]:
import typing as ty

@pydra.mark.task
def add_var_an(a, b) -> ty.NamedTuple("Output", [("sum_a_b", int)]):
    return a + b


task1a = add_var_an(a=4, b=5)
task1a()

The annotation is especially useful for naming the outputs when the function returns multiple values.

In [None]:
@pydra.mark.task
def modf_an(a) -> ty.NamedTuple("Output", [("fractional", ty.Any), ("integer", ty.Any)]):
    import math
    return math.modf(a)

task2 = modf_an(a=3.5)
task2()

The second way to customize the output names requires a second decorator, `pydra.mark.annotate`:

In [None]:
@pydra.mark.task
@pydra.mark.annotate({"return": {"fractional": ty.Any, "integer": ty.Any}})
def modf(a):
    import math
    return math.modf(a)

task2a = modf(a=3.5)
task2a()
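
To see how a return annotation can carry output names, here is a small sketch in plain Python. It only shows that the `NamedTuple` in the annotation is available on the function object and lists its field names; `named_output` is a hypothetical helper, and this is not a description of how pydra processes the annotation internally.

```python
import math
import typing as ty


def modf_an(a) -> ty.NamedTuple("Output", [("fractional", ty.Any), ("integer", ty.Any)]):
    return math.modf(a)


def named_output(func, *args):
    """Hypothetical helper: label func's return values using its return annotation."""
    output_type = func.__annotations__["return"]
    values = func(*args)
    # A NamedTuple class lists its field names in `_fields`;
    # instantiating it pairs each returned value with a name.
    return output_type(*values)


out = named_output(modf_an, 3.5)
print(out._fields)                  # ('fractional', 'integer')
print(out.fractional, out.integer)  # 0.5 3.0
```

The names come from the annotation alone, so the function body can keep returning a plain tuple.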

### Setting the input

We don't have to provide the input when we create a task; we can always set it later:

In [None]:
task3 = add_var()
task3.inputs.a = 4
task3.inputs.b = 5
task3()

If we forget to specify an input, `None` will be used as its default value, so the function will raise a Python error.

In [None]:
task3a = add_var()
task3a.inputs.a = 4
try:
    task3a()
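except TypeError as err:
    print(f"TypeError: {err}")
else:
    # Running without `b` should have failed; signal it explicitly if it did not.
    raise RuntimeError("expected a TypeError because input b was not set")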

### Output directory and caching the results

We can check where the output directory with the results is located:

In [None]:
task3.output_dir

Within the directory you can find the file with the results: `_result.pklz`.

In [None]:
import os
os.listdir(task3.output_dir)

We can also choose the directory where the results are stored. Let's create a temporary directory and a specific subdirectory, "task4":

In [None]:
from tempfile import mkdtemp
from pathlib import Path
cache_dir_tmp = Path(mkdtemp()) / "task4"
print(cache_dir_tmp)

Now we can pass this directory as the `cache_dir` argument of the `FunctionTask`. To observe the execution time, we define a function that sleeps for 5 s:

In [None]:
@pydra.mark.task
def add_var_wait(a, b):
    import time
    time.sleep(5)
    return a + b

task4 = add_var_wait(a=4, b=6, cache_dir=cache_dir_tmp)
task4()

If you're running the cell for the first time, it should take around 5 s.

We can check the `output_dir` of our task. It should contain the path of `cache_dir_tmp`, and its last part contains the name of the task class, `FunctionTask`, and the task checksum.

In [None]:
task4.output_dir

Let's see what happens when we define an identical task again with the same `cache_dir`:

In [None]:
task4a = add_var_wait(a=4, b=6, cache_dir=cache_dir_tmp)
task4a()

This time the result should be ready right away! Pydra reuses the already available results and does not recompute the task.
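
The general idea of reusing results keyed by a checksum can be sketched with the standard library alone. This is an illustration under stated assumptions, not pydra's code: `checksum` and `run_cached` are hypothetical helpers, the hash is computed from the function name and JSON-serialized inputs, and the result file name `result.pkl` is made up for the example.

```python
import hashlib
import json
import pickle
import tempfile
import time
from pathlib import Path


def checksum(func_name, inputs):
    """Hash the function name and inputs (a stand-in for a task checksum)."""
    payload = json.dumps({"func": func_name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def run_cached(func, cache_dir, **inputs):
    """Hypothetical helper: run func once per unique input set, then reuse the stored result."""
    output_dir = Path(cache_dir) / f"{func.__name__}_{checksum(func.__name__, inputs)}"
    result_file = output_dir / "result.pkl"
    if result_file.exists():
        # The same function and inputs were run before: load instead of recomputing.
        return pickle.loads(result_file.read_bytes())
    result = func(**inputs)
    output_dir.mkdir(parents=True)
    result_file.write_bytes(pickle.dumps(result))
    return result


def add_var_wait(a, b):
    time.sleep(0.2)  # shortened from the 5 s used in the tutorial cell
    return a + b


cache_dir = Path(tempfile.mkdtemp())

start = time.perf_counter()
first = run_cached(add_var_wait, cache_dir, a=4, b=6)
first_time = time.perf_counter() - start

start = time.perf_counter()
second = run_cached(add_var_wait, cache_dir, a=4, b=6)
second_time = time.perf_counter() - start

print(first, second)
print(f"first run: {first_time:.2f} s, second run: {second_time:.2f} s")
```

Changing any input changes the checksum, so a call with different values gets its own directory and is computed from scratch.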

Pydra checks for results not only in `cache_dir`; you can also provide a list of other locations that should be checked. Let's create another directory that will be used as `cache_dir`, and pass the previous working directory in `cache_locations`.

In [None]:
cache_dir_tmp_new = Path(mkdtemp()) / "task4b"

task4b = add_var_wait(a=4, b=6, cache_dir=cache_dir_tmp_new, cache_locations=[cache_dir_tmp])
task4b()
#task4a.result()

This time the results should also be returned quickly! We can also check that `task4b.output_dir` was not created:

In [None]:
task4b.output_dir.exists()
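
The lookup across several locations can be pictured as a simple search. The sketch below is an assumption about the general pattern, not pydra's actual lookup code: `find_result_dir` is a hypothetical helper, and `FunctionTask_abc123` is a placeholder directory name standing in for "class name plus checksum".

```python
import tempfile
from pathlib import Path


def find_result_dir(name, cache_dir, cache_locations=()):
    """Hypothetical helper: return the first directory that already holds results for `name`."""
    for location in [cache_dir, *cache_locations]:
        candidate = Path(location) / name
        if candidate.exists():
            return candidate
    return None  # nothing cached anywhere, so the task would have to run


old_cache = Path(tempfile.mkdtemp())
new_cache = Path(tempfile.mkdtemp())

# Pretend an earlier run stored its results under `old_cache`.
(old_cache / "FunctionTask_abc123").mkdir()

found = find_result_dir("FunctionTask_abc123", new_cache, cache_locations=[old_cache])
print(found.parent == old_cache)                     # found in the old location
print((new_cache / "FunctionTask_abc123").exists())  # nothing was created in the new one
```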

### Using Audit

Pydra can record various runtime information, including the workflow provenance, when we set `audit_flags` and the type of messengers.

In [None]:
from pydra.utils.messenger import AuditFlag, PrintMessenger

task5 = add_var(a=4, b=5, audit_flags=AuditFlag.PROV, messengers=PrintMessenger())
task5()
