# Staging Data with RADICAL-Pilot

RADICAL-Pilot (RP) manages data movement from the client side to the agent side (on the HPC platform) and back, as well as within the agent space (i.e., between the sandboxes of sessions, pilots and tasks). To ensure that a task finds the data it needs on the HPC platform where it runs, RP provides a mechanism to stage input data automatically. It also lets you stage output data generated by tasks back to the user, or to any other location on the HPC platform where it will be collected.

Every staging directive is characterized by three key parameters:

- **Source** - data files or directories to be staged (see the section [Locations](#Locations));
- **Target** - the target path for the staged data (see the section [Locations](#Locations));
- **Action** - defines how the provided data should be staged.

If the user provides only the _source_ of the data (e.g., a file name within the working directory), RP generates a _target_ automatically, based on where the directive is given (in the pilot description or in the task description), and the data is _transferred_ according to the default _action_ (see the section [Actions](#Actions)).

## Staging directives

### Format

A staging directive is specified as a `dict` of the following form:

```python
staging_directive = {
    'source'  : None,  # see 'Locations' below
    'target'  : None,  # see 'Locations' below
    'action'  : None,  # See 'Actions'   below
    'flags'   : None   # See 'Flags'     below
}
```

If the user provides only names, RP treats them as data sources and expands them into directives of this format.
For example, staging directives in a task description are expanded as follows:
```shell
in:  [ 'input.dat' ]
out: [{'source' : 'client:///input.dat',
       'target' : 'task:///input.dat',
       'action' : rp.TRANSFER}]
in:  [ 'input.dat > staged.dat' ]
out: [{'source' : 'client:///input.dat',
       'target' : 'task:///staged.dat',
       'action' : rp.TRANSFER}]
```
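The expansion shown above can be sketched in a few lines of plain Python. This is only an illustration of the two shorthand forms from the example, not RP's actual implementation: the `expand_directive` helper is hypothetical, and the `TRANSFER` string is a stand-in for the `rp.TRANSFER` constant so that the snippet runs without RP installed.

```python
# Stand-in for the rp.TRANSFER constant (assumption: placeholder value only)
TRANSFER = 'rp.TRANSFER'


def expand_directive(entry):
    """Expand a shorthand string into a full task input staging directive.

    Hypothetical helper, covering only the two forms shown above:
    'name' and 'name > new_name'.
    """
    if isinstance(entry, dict):
        # Already a full directive: nothing to expand
        return entry

    source, separator, target = (part.strip() for part in entry.partition('>'))
    if not separator:
        # No explicit target given: reuse the source name
        target = source

    return {'source': 'client:///' + source,
            'target': 'task:///' + target,
            'action': TRANSFER}


for shorthand in ['input.dat', 'input.dat > staged.dat']:
    print(shorthand, '->', expand_directive(shorthand))
```

Directives given as full dictionaries pass through unchanged, which is why mixing both styles in one `input_staging` list is unproblematic in this sketch.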

### Locations

`Source` and `Target` locations can be given as strings or `ru.Url` instances. Strings containing `://` are converted into URLs immediately. Otherwise, they are considered absolute or relative paths and are then interpreted in the context of the client's working directory.

The following special URL schemas are interpreted relative to certain locations:
* `client://` - client's working directory;
* `endpoint://` - root of the file system on the target platform;
* `resource://` - agent's sandbox (`radical.pilot.sandbox`) on the target platform;
* `pilot://` - pilot sandbox on the target platform;
* `task://` - task sandbox on the target platform.

All locations are interpreted as directories, never as files. For the above schemas, `schema://` is interpreted the same as `schema:///`, i.e., the schema is treated as a namespace, not as a location qualified by a hostname. The `hostname` element of the URL is expected to be empty, and the path is _always_ considered relative to the locations specified above (even though URLs usually have no notion of relative paths).

`endpoint://` is based on the `filesystem_endpoint` attribute of the platform config (see the tutorial [RADICAL-Pilot Configuration System](configuration.ipynb)) and points to the file system accessible via that URL. Note that the notion of `root` depends on the access protocol and on the implementation of the providing service.
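As a rough illustration of these rules, the following sketch resolves locations against a made-up set of base directories. The `BASES` mapping, all paths in it, and the `resolve` helper are hypothetical and not part of RP; the actual logic lives in `radical.pilot.staging_directives.complete_url` and may differ in detail.

```python
import posixpath
from urllib.parse import urlparse

# Invented base directories, one per special schema (assumption: example paths only)
BASES = {
    'client'  : '/home/user/project',
    'endpoint': '/',
    'resource': '/scratch/radical.pilot.sandbox',
    'pilot'   : '/scratch/radical.pilot.sandbox/session.0000/pilot.0000',
    'task'    : '/scratch/radical.pilot.sandbox/session.0000/pilot.0000/task.000000',
}


def resolve(location):
    """Map a location string to a path, following the rules described above."""
    if '://' not in location:
        # Plain path: interpreted in the context of the client's working directory
        return posixpath.normpath(posixpath.join(BASES['client'], location))

    url = urlparse(location)
    if url.scheme not in BASES:
        # Regular URL (not one of the special schemas): left untouched here
        return location

    # 'schema://x' and 'schema:///x' are treated alike: whatever follows the
    # schema is a path relative to the schema's base directory
    relative = (url.netloc + url.path).lstrip('/')
    return posixpath.normpath(posixpath.join(BASES[url.scheme], relative))


for location in ['input.dat', 'task:///input.dat', 'task://input.dat',
                 'pilot:///output.txt', 'endpoint:///etc/hosts']:
    print(f'{location:24s} -> {resolve(location)}')
```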

<div class="alert alert-info">
    
__Note:__ For more details on path and sandbox handling, see the documentation of `radical.pilot.staging_directives.complete_url`.

</div>

### Actions

* `rp.TRANSFER` (__*default*__) - remote file transfer from `source` URL to `target` URL;
* `rp.COPY` - local file copy (i.e., not crossing host boundaries);
* `rp.MOVE` - local file move;
* `rp.LINK` - local file symlink.
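The three local actions behave much like their familiar file system counterparts. The sketch below is a local analogy built on the Python standard library, not RP code: the `stage_locally` helper and the plain action strings are hypothetical stand-ins for `rp.COPY`, `rp.MOVE` and `rp.LINK`, and `rp.TRANSFER` is left out because it may cross host boundaries.

```python
import os
import shutil
import tempfile


def stage_locally(source, target, action):
    """Apply a local staging action to a single file (illustration only)."""
    if action == 'COPY':
        shutil.copy(source, target)      # source and target both exist afterwards
    elif action == 'MOVE':
        shutil.move(source, target)      # source is gone afterwards
    elif action == 'LINK':
        os.symlink(source, target)       # target is a symlink pointing at source
    else:
        raise ValueError(f'unsupported local action: {action}')


workdir = tempfile.mkdtemp()
source  = os.path.join(workdir, 'input.dat')
with open(source, 'w') as f:
    f.write('data')

copied = os.path.join(workdir, 'copied.dat')
linked = os.path.join(workdir, 'linked.dat')
moved  = os.path.join(workdir, 'moved.dat')

stage_locally(source, copied, 'COPY')
stage_locally(source, linked, 'LINK')
stage_locally(source, moved,  'MOVE')    # the link above now dangles

print(sorted(os.listdir(workdir)))
```

The last step shows one practical consequence of the choice of action: a link stays valid only as long as its source remains in place, whereas a copy is independent of it.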

### Flags

Flags are set automatically.

* `rp.CREATE_PARENTS` - create the directory hierarchy for targets on the fly;
* `rp.RECURSIVE` - if `source` is a directory, handle it recursively.
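The effect of the two flags can be pictured with a local copy operation. The following is a minimal sketch using only the Python standard library; the `copy_with_flags` helper and its boolean parameters are hypothetical stand-ins for `rp.CREATE_PARENTS` and `rp.RECURSIVE`, and do not reflect how RP implements them.

```python
import os
import shutil
import tempfile


def copy_with_flags(source, target, create_parents=False, recursive=False):
    """Copy source to target, mimicking the two flags (illustration only)."""
    parent = os.path.dirname(target)
    if create_parents and parent:
        # Create the directory hierarchy for the target on the fly
        os.makedirs(parent, exist_ok=True)

    if os.path.isdir(source):
        if not recursive:
            raise IsADirectoryError(f'{source} is a directory, recursive not set')
        shutil.copytree(source, target)  # copy the whole directory tree
    else:
        shutil.copy(source, target)


workdir = tempfile.mkdtemp()
source  = os.path.join(workdir, 'inputs')
os.makedirs(source)
with open(os.path.join(source, 'a.dat'), 'w') as f:
    f.write('data')

# The 'staged/run1' hierarchy does not exist yet and is created on demand
target = os.path.join(workdir, 'staged', 'run1', 'inputs')
copy_with_flags(source, target, create_parents=True, recursive=True)

print(os.listdir(target))
```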

## Examples

<div class="alert alert-info">
    
__Note:__ For the initial setup regarding MongoDB see the tutorial [Getting Started](tutorials/getting_started.ipynb).

</div>

```python
%env RADICAL_PILOT_DBURL=mongodb://guest:guest@mongodb:27017/default
```

<div class="alert alert-info">

__Note:__ The example run below does not show an animation during the waiting steps (e.g., while waiting for the pilot to stop).

</div>

```python
%env RADICAL_REPORT_ANIME=False
```

```python
import radical.pilot as rp
import radical.utils as ru
```

```python
session = rp.Session()
pmgr    = rp.PilotManager(session=session)
tmgr    = rp.TaskManager(session=session)
```

Example of providing staging directives within the pilot description.

```python
with open('./input.txt', 'w') as f:
    f.write('Input from the pilot (uid=$RP_PILOT_ID)')

pd = rp.PilotDescription({
    'resource'      : 'local.localhost',
    'cores'         : 2,
    'runtime'       : 15,
    'input_staging' : [ 'input.txt' ]
})
```

<div class="alert alert-info">
    
__Note:__ Input staging for a pilot can be described within `PilotDescription` or passed as input parameters to the `pilot.stage_in()` method, whereas output staging is only available through the `pilot.stage_out()` method.

</div>

```python
pilot = pmgr.submit_pilots(pd)
tmgr.add_pilots(pilot)
```

Example of providing staging directives within the task description.

```python
td = rp.TaskDescription({
    'executable'    : 'eval',
    'arguments'     : ['echo "$(cat input.txt)"'],
    'stdout'        : 'output.txt',
    'input_staging' : [{'source': 'pilot:///input.txt',
                        'target': 'task:///input.txt',
                        'action': rp.LINK}],
    'output_staging': [{'source': 'task:///output.txt',
                        'target': 'pilot:///output.txt',
                        'action': rp.COPY}]
})
```

```python
tmgr.submit_tasks(td)
tmgr.wait_tasks()
```

```python
pilot.stage_out({'source': 'pilot:///output.txt',
                 'target': 'client:///result.txt',
                 'action': rp.TRANSFER})
```

```python
!cat result.txt
```

```python
session.close(cleanup=True)
```