# Using Multiple Pilots with RADICAL-Pilot

RADICAL-Pilot can manage multiple pilots in a single run and distributes the workload among all active pilots. Tasks are assigned to the pilots by a scheduling policy; the default policy in `TaskManager` is Round Robin.
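To make the Round Robin idea concrete, here is a minimal plain-Python sketch of such a policy. It is a toy model under stated assumptions, not RADICAL-Pilot's scheduler code: the function `round_robin` and the sample IDs are hypothetical and only mimic the ID format that RADICAL-Pilot reports.

```python
from itertools import cycle


def round_robin(task_ids, pilot_ids):
    """Assign each task to the next pilot in a repeating cycle."""
    if not pilot_ids:
        raise ValueError('at least one pilot is required')
    pilot_cycle = cycle(pilot_ids)
    return {task_id: next(pilot_cycle) for task_id in task_ids}


# Hypothetical IDs, formatted like the ones RADICAL-Pilot reports.
pilots = ['pilot.0000', 'pilot.0001']
tasks = ['task.%06d' % i for i in range(4)]

assignment = round_robin(tasks, pilots)
for task_id, pilot_id in assignment.items():
    print(task_id, '->', pilot_id)
```

With two pilots, consecutive tasks alternate between them, regardless of how many cores each pilot holds.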

The examples below demonstrate how to submit multiple pilots with different pilot descriptions.

<div class="alert alert-info">
    
__Note:__ For the initial MongoDB setup, see the [Getting Started](tutorials/getting_started.ipynb) tutorial.

</div>

In [1]:
%env RADICAL_PILOT_DBURL=mongodb://guest:guest@mongodb:27017/default

env: RADICAL_PILOT_DBURL=mongodb://guest:guest@mongodb:27017/default


<div class="alert alert-info">

__Note:__ In the example run below, we disable the animation that is otherwise shown during waiting steps (e.g., while waiting for a pilot to stop).

</div>

In [2]:
%env RADICAL_REPORT_ANIME=False

env: RADICAL_REPORT_ANIME=False
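The `%env` magic only works inside IPython or Jupyter. As a minimal sketch, the same two settings can be applied from a plain Python script through `os.environ`. The values are copied from the cells above; placing the assignments before `import radical.pilot` is an assumption made here so that the variables are certainly visible when the library reads them.

```python
import os

# Values copied from the %env cells above. Set them before importing
# radical.pilot (assumption: the library reads them from the process
# environment at import or session-creation time).
os.environ['RADICAL_PILOT_DBURL'] = 'mongodb://guest:guest@mongodb:27017/default'
os.environ['RADICAL_REPORT_ANIME'] = 'False'

print(os.environ['RADICAL_REPORT_ANIME'])
```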


## Example run

In [3]:
import radical.pilot as rp
import radical.utils as ru

report = ru.Reporter(name='radical.pilot')
report.title('Multiple pilots (RP version %s)' % rp.version)

session = rp.Session()

[94m[1m
[39m[0m[94m[1m Multiple pilots (RP version 1.22.0)                                            
[39m[0m[94m[1m
[39m[0m[94mnew session: [39m[0m[rp.session.f2ecab7e-cb95-11ed-a0ec-0242ac140003][39m[0m[94m                 \
database   : [39m[0m[mongodb://guest:****@mongodb:27017/default][39m[0m[92m                     ok
[39m[0m

For this example, we run 2 pilots on `localhost`. The built-in resource description for localhost mimics a real resource, so we can request more than 1 node per pilot.

In [4]:
pd0 = rp.PilotDescription({
    'resource': 'local.localhost',
    'cores'   : 10,
    'runtime' : 10
})
pd1 = rp.PilotDescription({
    'resource': 'local.localhost',
    'cores'   : 15,
    'runtime' : 5
})

In [5]:
pmgr   = rp.PilotManager(session=session)
pilots = pmgr.submit_pilots([pd0, pd1])

create pilot manager                                                          ok
submit 2 pilot(s)
        pilot.0000   local.localhost          10 cores       0 gpus
        pilot.0001   local.localhost          15 cores       0 gpus           ok

In [6]:
tmgr = rp.TaskManager(session=session)

create task manager                                                           ok

Add the submitted pilots to the `TaskManager`.

In [7]:
tmgr.add_pilots(pilots)
tmgr.list_pilots()

['pilot.0000', 'pilot.0001']

In [8]:
N_TASKS = 10

tds = list()
for idx in range(N_TASKS):
    td = rp.TaskDescription()
    td.executable = '/bin/echo'
    td.arguments  = ['pilot_id=$RP_PILOT_ID']
    tds.append(td)

tasks = tmgr.submit_tasks(tds)
tmgr.wait_tasks()

submit: [progress bar]
wait  : [progress bar]

['DONE',
 'DONE',
 'DONE',
 'DONE',
 'DONE',
 'DONE',
 'DONE',
 'DONE',
 'DONE',
 'DONE']

Task distribution among pilots follows the Round Robin scheduling policy, but it is possible to assign a task to a particular pilot explicitly.

In [9]:
td = rp.TaskDescription()
td.executable = '/bin/echo'
td.arguments  = ['task is assigned to $RP_PILOT_ID']
td.pilot      = pilots[0].uid

tasks.append(tmgr.submit_tasks(td))
tmgr.wait_tasks(uids=tasks[-1].uid)

submit: [progress bar]
wait  : [progress bar]

'DONE'

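If a task is assigned to an unknown pilot, it waits for that pilot to appear in the `TaskManager`. In the run below, the wait times out after 15 seconds and the task is still in the `TMGR_SCHEDULING` state.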

In [10]:
td = rp.TaskDescription()
td.executable = '/bin/echo'
td.arguments  = ['pilot_id=$RP_PILOT_ID']
td.pilot      = 'unknown_pilot_id'

task = tmgr.submit_tasks(td)
tmgr.wait_tasks(uids=task.uid, timeout=15)

submit: [progress bar]
wait  :
	TMGR_SCHEDULING:     1
                                                                         timeout

'TMGR_SCHEDULING'
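The behaviour shown above, where a task pinned to an unknown pilot stays in scheduling until that pilot appears, can be modelled with a small toy scheduler. This is a sketch under stated assumptions, not RADICAL-Pilot's implementation: the class `ToyScheduler` and the state names `WAITING` and `ASSIGNED` are hypothetical and do not correspond to RADICAL-Pilot states.

```python
from collections import defaultdict
from itertools import count


class ToyScheduler:
    """Toy round-robin scheduler that honours an optional pilot pin."""

    def __init__(self):
        self.pilots = []                  # known pilot IDs, in order added
        self.assigned = {}                # task ID -> pilot ID
        self.waiting = defaultdict(list)  # pilot ID -> pinned task IDs
        self._next = count()              # round-robin counter

    def add_pilot(self, pilot_id):
        self.pilots.append(pilot_id)
        # Release any tasks that were waiting for this pilot.
        for task_id in self.waiting.pop(pilot_id, []):
            self.assigned[task_id] = pilot_id

    def submit(self, task_id, pilot=''):
        if not pilot:
            # No pin: take the next pilot in round-robin order.
            if not self.pilots:
                raise RuntimeError('no pilots available')
            pilot = self.pilots[next(self._next) % len(self.pilots)]
        if pilot in self.pilots:
            self.assigned[task_id] = pilot
        else:
            # Pinned to a pilot that is not known yet: hold the task.
            self.waiting[pilot].append(task_id)

    def state(self, task_id):
        return 'ASSIGNED' if task_id in self.assigned else 'WAITING'


sched = ToyScheduler()
sched.add_pilot('pilot.0000')
sched.add_pilot('pilot.0001')

sched.submit('task.a')                            # unpinned: round robin
sched.submit('task.b', pilot='pilot.0001')        # explicit pin
sched.submit('task.c', pilot='unknown_pilot_id')  # pilot not known yet

before = sched.state('task.c')
sched.add_pilot('unknown_pilot_id')               # the pilot appears
after = sched.state('task.c')
print(before, '->', after)
```

In this model the pinned task is released only when a pilot with the matching ID is added, which mirrors why the wait above ends in a timeout.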

In [11]:
for task in tasks:
    stdout = task.stdout.strip()[:35]
    report.plain('%s - output: %-30s (in task description: pilot="%s")\n' %
                 (task.uid, stdout, task.description['pilot']))

task.000000 - output: pilot_id=pilot.0000            (in task description: pilot="")
task.000001 - output: pilot_id=pilot.0001            (in task description: pilot="")
task.000002 - output: pilot_id=pilot.0000            (in task description: pilot="")
task.000003 - output: pilot_id=pilot.0001            (in task description: pilot="")
task.000004 - output: pilot_id=pilot.0000            (in task description: pilot="")
task.000005 - output: pilot_id=pilot.0001            (in task description: pilot="")
task.000006 - output: pilot_id=pilot.0000            (in task description: pilot="")
task.000007 - output: pilot_id=pilot.0001            (in task description: pilot="")
task.000008 - output: pilot_id=pilot.0000            (in task description: pilot="")
task.000009 - output: pilot_id=pilot.0001            (in task description: pilot="")
task.000010 - output: task is assigned to pilot.0000 (in tas

In [12]:
session.close(cleanup=True)

closing session rp.session.f2ecab7e-cb95-11ed-a0ec-0242ac140003                \
close task manager                                                            ok
close pilot manager                                                            \
wait for 2 pilot(s)
                                                                         timeout
                                                                              ok
session lifetime: 80.8s                                                       ok