## Introduction

In this notebook, we will share how to create the Simulation Analysis Loop pattern  
using the EnTK API. We will use a small alanine dipeptide system and implement  
the Gromacs-LSDMap algorithm that you saw in the presentation.


**Note**: This notebook is not meant to be executed. The notebook format allows us to explain  
each step in turn. A runnable example is placed in '../examples/'; please see  
the prerequisites for running it in the sa folder.

We know the Simulation Analysis Loop can be represented as:

<img style="float: left;" src="./Simulation_Analysis_Loop_pattern.jpg">

Let's start with something simple!

We will start with the following constraints/assumptions and extend  
as we go forward:

* Number of iterations = 1
* Number of simulation tasks = 16
* Number of analysis tasks = 1


We will perform the following steps:

1. Create a Pipeline
2. Create a Stage with Simulation Tasks
3. Create a Stage with Analysis Tasks
4. Describe a resource to use for execution
5. Submit the Pipeline for execution

**Let's start with the code!**

## Encoding the Simulation Analysis Loop pattern

### Step 0 -- Import the API objects and set our constants

In [25]:
from radical.entk import Pipeline, Stage, Task, AppManager
# AppManager accepts the Pipeline and resource specification.

num_iterations = 1
num_sim_tasks  = 16
num_ana_tasks  = 1

### Step 1 -- Create a Pipeline

In [33]:
sal_pipe = Pipeline()
sal_pipe.name = 'sal'
# Names are user-defined, but required when we need to reference data 
# across tasks. We will see how in a few minutes. 

### Step 2 -- Create a Stage with Simulation Tasks

It is important to add the Simulation Stage to the SAL Pipeline first!  
EnTK will respect the order you specify within a Pipeline.

In [26]:
sim_stage = Stage()
sim_stage.name = 'gmx'
for x in range(num_sim_tasks):
    sim_task = create_gromacs_task(x)
    # We will define 'create_gromacs_task()' in a few minutes. It takes
    # the task index so that each task gets a unique name.
    # We decouple the specification of the task from the 
    # creation of the Simulation Analysis Loop, and keep the
    # focus on the latter.
    
    sim_stage.add_tasks(sim_task)
    # The order in which the tasks are created and added to 
    # a Stage is not important since they all run concurrently.
    
sal_pipe.add_stages(sim_stage)

### Step 3 -- Create a Stage with Analysis Tasks

It is important to add the Analysis Stage to the SAL Pipeline second!  
EnTK will respect the order you specify within a Pipeline.

In [27]:
ana_stage = Stage()
ana_stage.name = 'lsdm'
for x in range(num_ana_tasks):
    ana_task = create_lsdmap_task()
    # We will define 'create_lsdmap_task()' in a few minutes
    # We decouple the specification of the task from the 
    # creation of the Simulation Analysis Loop, and keep the
    # focus on the latter.
    
    ana_stage.add_tasks(ana_task)
    # The order in which the tasks are created and added to 
    # a Stage is not important since they all run concurrently.
    
sal_pipe.add_stages(ana_stage)
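
The ordering rules described in Steps 2 and 3 (stages run one after another, tasks within a stage run concurrently) can be illustrated with a small plain-Python stand-in. This is only a sketch of the semantics and does not use EnTK: `run_pipeline` and the lambda "tasks" are hypothetical names introduced for illustration, and threads are used here merely as a convenient way to show concurrency.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(stages):
    """Run stages one after another; run the tasks inside a stage concurrently.

    `stages` is a list of (stage_name, [callable, ...]) pairs. This is a
    plain-Python stand-in for illustration, not EnTK code.
    """
    log = []
    for stage_name, tasks in stages:
        with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
            # All tasks of the stage are submitted together ...
            futures = [pool.submit(task) for task in tasks]
            # ... and the stage completes only when every task has returned.
            results = [future.result() for future in futures]
        log.append((stage_name, results))
    return log


num_sim_tasks = 16
num_ana_tasks = 1

# Hypothetical stand-ins for the simulation and analysis tasks.
stages = [
    ('gmx',  [lambda i=i: 'sim_%s' % i for i in range(num_sim_tasks)]),
    ('lsdm', [lambda i=i: 'ana_%s' % i for i in range(num_ana_tasks)]),
]

execution_log = run_pipeline(stages)
for stage_name, results in execution_log:
    print(stage_name, len(results))
```

The analysis stage only starts after all 16 simulation stand-ins have finished, which is why the order in which stages are added to the Pipeline matters while the order of tasks within a stage does not.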

### Step 4 -- Describe a resource to use for execution

Create a dictionary with three mandatory keys: **resource**,  
**walltime**, and **cpus**. 

Note that **resource** is 'local.localhost' to execute locally. At the end of  
this notebook, we have provided a link to the resources supported by default  
along with documentation on how new resources can be supported.

In [28]:
resource_desc = {
    'resource': 'local.localhost',
    'walltime': 10,
    'cpus': 2
}
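
Since all three keys are mandatory, it can help to check the dictionary before handing it over. The following is a minimal sketch in plain Python: `REQUIRED_KEYS` and `missing_resource_keys` are hypothetical helpers written for this notebook, not part of the EnTK API, and EnTK may perform its own validation.

```python
# The three keys this notebook describes as mandatory.
REQUIRED_KEYS = ('resource', 'walltime', 'cpus')


def missing_resource_keys(resource_desc):
    """Return the mandatory keys that are absent from resource_desc."""
    return [key for key in REQUIRED_KEYS if key not in resource_desc]


resource_desc = {
    'resource': 'local.localhost',
    'walltime': 10,
    'cpus': 2,
}

missing = missing_resource_keys(resource_desc)
if missing:
    raise ValueError('resource description is missing: %s' % ', '.join(missing))
```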

### Step 5 -- Submit the Pipeline for execution

We create an AppManager object and provide it with the resource description and  
the SAL Pipeline. We then run our application.

In [22]:
amgr = AppManager()
amgr.resource_desc = resource_desc
amgr.workflow = [sal_pipe]
amgr.run()

We are almost done with our complete EnTK script! 

The above five steps create our Simulation Analysis Loop pattern with EnTK.  
We create one Pipeline with one Simulation stage followed by one Analysis stage.    
The Simulation stage consists of 16 tasks and the Analysis stage consists of 1 task.
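
The constraint of a single iteration can be relaxed by repeating the simulation/analysis pair. The sketch below only computes the order in which stages would be added for `num_iterations > 1`; it does not call EnTK. The helper `sal_stage_plan` and the per-iteration stage names (`gmx_0`, `lsdm_0`, ...) are assumptions made for illustration.

```python
def sal_stage_plan(num_iterations, num_sim_tasks, num_ana_tasks):
    """List (stage_name, task_count) pairs in the order they would be added."""
    plan = []
    for it in range(num_iterations):
        # Simulation stage first, then the analysis stage of the same iteration.
        plan.append(('gmx_%s' % it, num_sim_tasks))
        plan.append(('lsdm_%s' % it, num_ana_tasks))
    return plan


plan = sal_stage_plan(num_iterations=2, num_sim_tasks=16, num_ana_tasks=1)
for stage_name, task_count in plan:
    print(stage_name, task_count)
```

Giving each stage a distinct name per iteration keeps the names usable when data has to be referenced across tasks.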

Now let's complete the script by specifying the task creation functions.

### Defining the functions that create the tasks

For the two different types of tasks, we will specify the following properties:

* executable
* arguments
* input data

For simplicity, let's assume the input data for the simulation task is available  
at '/tmp/'. Our simulation task can be specified as follows:

In [29]:
def create_gromacs_task(ind):
    gmx_task = Task()
    gmx_task.name = 'sim_%s'%ind
    gmx_task.executable = ['gmx mdrun']
    gmx_task.arguments = []
    gmx_task.copy_input_data = []
    return gmx_task

Next, we will specify the analysis task. Note how the output files of the simulation task  
are described as input to the analysis task.

In [30]:
def create_lsdmap_task():
    lsdm_task = Task()
    lsdm_task.executable = ['lsdmap']
    lsdm_task.arguments = []
    for ind in range(num_sim_tasks):
        # The name must match the one assigned in create_gromacs_task()
        sim_task_name = 'sim_%s' % ind
        sim_task_ref = '$Pipeline_%s_Stage_%s_Task_%s' % (sal_pipe.name, sim_stage.name, sim_task_name)
        lsdm_task.copy_input_data.append(sim_task_ref)
    return lsdm_task
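
To see what the analysis task receives, the reference strings can be built on their own, without EnTK. This is a minimal sketch: `sim_task_refs` is a hypothetical helper, and it assumes the names used in this notebook ('sal' for the Pipeline, 'gmx' for the Stage, and `sim_<index>` for the simulation tasks).

```python
def sim_task_refs(pipe_name, stage_name, num_sim_tasks):
    """Build one placeholder reference per simulation task."""
    refs = []
    for ind in range(num_sim_tasks):
        # Must match the name given to the task when it was created.
        sim_task_name = 'sim_%s' % ind
        refs.append('$Pipeline_%s_Stage_%s_Task_%s'
                    % (pipe_name, stage_name, sim_task_name))
    return refs


refs = sim_task_refs('sal', 'gmx', 16)
print(refs[0])
print(refs[-1])
```

Each reference is assembled from the Pipeline, Stage, and Task names, so a reference built with a name that differs from the one assigned to the task (for example a '-' in place of a '_') would not point at any existing task.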

## That's all folks!

## Additional Information:

* EnTK Documentation: https://radicalentk.readthedocs.io/
* EnTK Repository: https://github.com/radical-cybertools/radical.entk
* Installation instructions: https://radicalentk.readthedocs.io/en/latest/install.html
* User Guide: https://radicalentk.readthedocs.io/en/latest/user_guide.html
* Advanced examples: https://radicalentk.readthedocs.io/en/latest/advanced_examples.html

Please feel free to ask questions now, or reach us later via our 
[mailing list](https://groups.google.com/d/forum/ensemble-toolkit-users) or [GitHub issues](https://github.com/radical-cybertools/radical.entk/issues).