# Basic examples
In this notebook we show how to write a basic pipeline in the **desipipe** framework. You first need to install **desipipe** with:
```
python -m pip install git+https://github.com/cosmodesi/desipipe#egg=desipipe
```

## Task manager
Let's consider a simple example: the Monte-Carlo estimation of $\pi$.
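
For reference, here is a minimal serial sketch of the estimator in plain Python, with no **desipipe** involved. Points are drawn uniformly in the square $[-1, 1]^2$; the fraction landing inside the unit disk approaches the ratio of areas $\pi / 4$. The name `fraction_in_disk` and the use of the standard `random` module are illustrative choices, not part of the pipeline below.

```python
import random


def fraction_in_disk(seed, size=10000):
    """Fraction of uniform points in [-1, 1]^2 that land inside the unit disk."""
    rng = random.Random(seed)  # seeded generator: each call is reproducible
    inside = 0
    for _ in range(size):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y < 1.0:
            inside += 1
    return inside / size


# 20 independent estimates, averaged, then scaled by 4
# (the disk has area pi, the square has area 4)
fractions = [fraction_in_disk(seed) for seed in range(20)]
pi_estimate = 4.0 * sum(fractions) / len(fractions)
print('pi is ~ {:.4f}'.format(pi_estimate))
```

The pipeline below has the same structure: 20 independent `fraction` tasks followed by one `average` task that depends on all of them.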

In [1]:
import time

from desipipe import Queue, Environment, TaskManager, FileManager

# Let's instantiate a Queue, which records all tasks to be performed
# spawn=True means a manager process is spawned to distribute the tasks among workers
# spawn=False only updates the queue: no process is spawned to run the tasks
# A manager process can still be spawned afterwards, e.g. from the command line (see below):
# desipipe spawn -q ./_tests/test --spawn
queue = Queue('test', base_dir='_tests', spawn=True)
# Pool of 4 workers
# Any environment variable can be passed to Environment: it will be set when running the tasks below
tm = TaskManager(queue, environ=Environment(), scheduler=dict(max_workers=4))

# We decorate the function (task) with tm.python_app
@tm.python_app
def fraction(seed=42, size=10000):
    # All definitions, except input parameters, must be in the function itself
    import time
    import numpy as np
    time.sleep(5)  # wait 5 seconds, just to show jobs are indeed run in parallel
    rng = np.random.RandomState(seed=seed)  # use the seed: each task draws its own reproducible points
    x, y = rng.uniform(-1, 1, size), rng.uniform(-1, 1, size)
    return np.sum((x**2 + y**2) < 1.) * 1. / size

# Here we use another task manager, with only 1 worker
tm2 = tm.clone(scheduler=dict(max_workers=1))
@tm2.python_app  # in Python >= 3.9, the clone and the decorator can be written as a single line
def average(fractions):
    import numpy as np
    return np.average(fractions) * 4.

# Let's add another task, to be run in a shell
@tm2.bash_app
def echo(avg):
    return ['echo', '-n', 'bash app says pi is ~ {:.4f}'.format(avg)]

t0 = time.time()
# The following line stacks all the tasks in the queue
fractions = [fraction(seed=i) for i in range(20)]
# fractions is a list of Future instances
# We can pass them to other tasks, which creates a dependency graph
avg = average(fractions)
ech = echo(avg)
# At this point all tasks are submitted to the queue; nothing above waits for their results
print('Elapsed time: {:.4f}'.format(time.time() - t0))

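The recorded output of this cell is an error: the installed **desipipe** version did not accept the ``spawn`` argument of ``Queue``. In that case, a likely workaround is to create the queue without ``spawn`` and start the manager process separately, with ``spawn(queue)`` (used in the File manager section) or ``desipipe spawn -q ./_tests/test`` (see the Command line section).

TypeError: Queue.__init__() got an unexpected keyword argument 'spawn'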

In [None]:
# result() returns the result of the function; it blocks until the task has completed
# Here: 20 tasks of 5 seconds each, distributed over 4 workers, i.e. about 25 seconds
print(ech.out())
print('pi is ~ {:.4f}'.format(avg.result()))
print('Elapsed time: {:.1f}'.format(time.time() - t0))
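
The 25 second figure follows from the tasks running in waves of 4: 20 tasks / 4 workers = 5 waves of 5 seconds each. The sketch below reproduces this arithmetic with the standard library's `concurrent.futures`, which is only an analogy for the submit-then-`result()` pattern used above, not desipipe's implementation. The task duration is shortened to 0.05 s (an arbitrary stand-in for the 5 s sleep) and `slow_task` is a hypothetical placeholder.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor

n_tasks, n_workers, task_seconds = 20, 4, 0.05  # 0.05 s stands in for the 5 s sleep


def slow_task(i):
    time.sleep(task_seconds)
    return i


# Tasks run in waves of n_workers, so the wall time is about
# ceil(n_tasks / n_workers) * task_seconds
expected_seconds = math.ceil(n_tasks / n_workers) * task_seconds

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=n_workers) as pool:
    # submit() returns a future immediately, without waiting for the task
    futures = [pool.submit(slow_task, i) for i in range(n_tasks)]
    # result() blocks until the corresponding task has finished
    results = [future.result() for future in futures]
elapsed = time.perf_counter() - t0
print('expected ~ {:.2f} s, measured {:.2f} s'.format(expected_seconds, elapsed))
```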

## Tips
If you re-execute the two cells above, the cached result is returned immediately.
If you modify e.g. ``fraction``, a new result is computed, and ``average``, which depends on it, is computed again as well.
If you modify ``average``, only ``average`` will be computed again.
To change this default behavior, you can pass ``skip=True`` (the app is skipped and returns ``None``) or ``name=True`` (or the original app name) to reuse the result previously computed by the app of that name.

In [None]:
@tm2.bash_app(skip=True)  # no computation scheduled, just returns None
def echo2(avg):
    return 42

assert echo2(avg) is None

@tm2.bash_app(name=True)
def fraction():
    return None

for frac in fractions:
    assert fraction().result() == frac.result()  # the previous fraction result is used

@tm2.bash_app(name='echo')
def echo2(avg):
    return 42

print(echo2().out())  # the same as echo().out()

Note that one can incrementally build the script: previous tasks will not be rerun if they have not changed.
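
One way to obtain this kind of behavior is to identify each task by a hash of its code, its arguments, and the identifiers of the tasks it depends on. The sketch below illustrates that idea only; it is an assumption made for illustration, since this notebook does not describe desipipe's actual mechanism. The names `task_id`, `run`, `cache` and `stats` are hypothetical.

```python
import hashlib

cache = {}           # task id -> stored result
stats = {'runs': 0}  # counts actual executions


def task_id(source, kwargs, dependencies=()):
    """Hash of the task's code, its arguments, and the ids of its dependencies."""
    h = hashlib.sha256()
    h.update(source.encode())
    h.update(repr(sorted(kwargs.items())).encode())
    for dep in dependencies:
        h.update(dep.encode())
    return h.hexdigest()


def run(source, func, kwargs, dependencies=()):
    """Run func(**kwargs) unless a result is already stored under the same id."""
    tid = task_id(source, kwargs, dependencies)
    if tid not in cache:
        stats['runs'] += 1
        cache[tid] = func(**kwargs)
    return tid, cache[tid]


def square(x):
    return x * x


tid1, res1 = run('def square(x): return x * x', square, {'x': 3})
tid2, res2 = run('def square(x): return x * x', square, {'x': 3})  # same code and arguments: cache hit
tid3, res3 = run('def square(x): return x**2', square, {'x': 3})   # code changed: run again

# A downstream task gets a new id when an upstream id changes, so it is rerun too
down1 = task_id('def average(values): ...', {}, dependencies=(tid1,))
down3 = task_id('def average(values): ...', {}, dependencies=(tid3,))
print(stats['runs'], tid1 == tid2, down1 != down3)
```

With such a scheme, changing an upstream task invalidates everything downstream of it, while changing a downstream task leaves upstream results untouched.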

## Command line
**desipipe** provides a number of commands to interact with queues from the command line: list queues, list the tasks in a queue, pause or resume a queue, retry tasks, spawn a manager process, and delete queues.

### List queues

In [None]:
%%bash
desipipe queues -q './_tests/*'

### List tasks in a queue

In [None]:
%%bash
desipipe tasks -q ./_tests/test
# task state can be:
# WAITING  Waiting for requirements (other tasks) to finish
# PENDING  Eligible to be selected and run
# RUNNING  Running right now
# SUCCEEDED  Finished with errno = 0
# FAILED  Finished with errno != 0
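
The five states listed above can be summarized as a small state model. This is a hypothetical sketch for illustration, not desipipe's own classes: `TaskState`, `initial_state` and `final_state` are made-up names, and the rule that requirements must have *succeeded* before a task becomes PENDING is an assumption (the list above only says they must "finish").

```python
from enum import Enum


class TaskState(Enum):
    WAITING = 'Waiting for requirements (other tasks) to finish'
    PENDING = 'Eligible to be selected and run'
    RUNNING = 'Running right now'
    SUCCEEDED = 'Finished with errno = 0'
    FAILED = 'Finished with errno != 0'


def initial_state(requirement_states):
    """PENDING if all requirements have succeeded (assumed rule), else WAITING."""
    if all(state is TaskState.SUCCEEDED for state in requirement_states):
        return TaskState.PENDING
    return TaskState.WAITING


def final_state(errno):
    """Map the error code of a finished task to its final state."""
    return TaskState.SUCCEEDED if errno == 0 else TaskState.FAILED


print(initial_state([]).name)                   # a task without requirements
print(initial_state([TaskState.RUNNING]).name)  # a requirement is still running
print(final_state(0).name, final_state(1).name)
```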

### Pause a queue
When pausing a queue, all processes running tasks from this queue will stop (after they finish their current task).

In [None]:
%%bash
desipipe pause -q ./_tests/test
desipipe queues -q './_tests/*'  # state is now PAUSED

### Resume a queue
Once a queue is resumed, its tasks can be processed again.

In [None]:
%%bash
desipipe resume -q ./_tests/test  # pass --spawn to spawn a manager process that will distribute the tasks among workers
desipipe queues -q './_tests/*'  # state is now ACTIVE

### Retry
Change the state of the selected tasks (here, those in state SUCCEEDED) to PENDING, so that they are run again.

In [None]:
%%bash
desipipe retry -q ./_tests/test --state SUCCEEDED
desipipe queues -q './_tests/*'  # task state is now PENDING

### Spawn a manager process
Spawn a manager process that will distribute the tasks among workers, using the scheduler and provider defined above.

In [None]:
%%bash
desipipe spawn -q ./_tests/test  # pass --spawn to spawn an independent process, and exit this one
desipipe queues -q './_tests/*'  # tasks have been reprocessed: SUCCEEDED

### Delete queue(s)

In [None]:
%%bash
desipipe delete -q './_tests/*'  # pass --force to actually delete the queue

## File manager
The file manager aims to keep track of the files (of all kinds) produced during processing.

In [None]:
%%file '_tests/files.yaml'

description: Some text file
id: my_input_file
filetype: text
path: ${SOMEDIR}/in_{option1}_{i:d}.txt
author: Chuck Norris
options:
  option1: ['a', 'b']
  i: range(0, 3, 1)
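
To make the entry above concrete: the `path` template combines an environment variable (`${SOMEDIR}`) with per-file options (`{option1}`, `{i:d}`), so this single entry describes 2 x 3 = 6 files. The sketch below expands the template with standard-library tools only; it illustrates what the entry stands for and is not how desipipe performs the expansion internally. The value of `SOMEDIR` is the one passed to `FileManager` in the next cell.

```python
import itertools
from string import Template

path = '${SOMEDIR}/in_{option1}_{i:d}.txt'
environ = {'SOMEDIR': '_tests'}
options = {'option1': ['a', 'b'], 'i': range(0, 3, 1)}

# Step 1: substitute environment variables (${NAME} placeholders);
# the {option1} and {i:d} fields are left untouched by Template
template = Template(path).substitute(environ)

# Step 2: format the template once per combination of option values
names = list(options)
paths = [template.format(**dict(zip(names, values)))
         for values in itertools.product(*(options[name] for name in names))]

for p in paths:
    print(p)
```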

In [None]:
fm = FileManager('_tests/files.yaml', environ=dict(SOMEDIR='_tests'))
# To select files
fm2 = fm.select(keywords='text file', option1=['a'])
# Iterate over files
for fi in fm2:
    fi = fi.get()
    print(fi)
    # Write text
    fi.write('hello world!')

In [None]:
# To add a new entry
fm.append(dict(description='added file', id='added_file', filetype='catalog', path='test.fits'))
# To delete an entry
del fm[-1]
# To add a cloned entry
fm.append(fm[0].clone(id='my_output_file', path='${SOMEDIR}/out_{option1}_{i:d}.txt'))
fm.write('_tests/files.yaml')
# Display the new file database
!cat '_tests/files.yaml'

In practice, one would simply edit the *.yaml* file directly.

In [None]:
# Let's add a new task!
@tm.python_app
def copy(text_in, text_out):
    import numpy as np  # just to illustrate that the package version is tracked
    text = text_in.read()
    text += ' this is my first message'
    print('saving', text_out.filepath)
    text_out.write(text)

In [None]:
# Iterate over files
for fi in fm:
    copy(fi.get(id='my_input_file'), fi.get(id='my_output_file'))

# Let's spawn a new process, as the previous one has finished (there was no work left!)
from desipipe import spawn
spawn(queue)

In [None]:
!ls -a _tests/

In [None]:
!cat _tests/out_a_0.txt

In [None]:
# This is where desipipe processing information is saved
!ls -a _tests/.desipipe
print('\n*.py file is:')
!cat _tests/.desipipe/copy.py
print('\n*.versions file is:')
!cat _tests/.desipipe/copy.versions

In [None]:
# Delete queue
queue.delete()