# AdaptiveMD

## Example 5 - Custom `Generator` objects

### 0. Imports

In [1]:
import sys, os

In [2]:
from adaptivemd import (
    Project, Task, File, PythonTask
)

Let's open our `tutorial` project by its name. If you completed the first examples, this should all work out of the box.

In [3]:
project = Project('tutorial')

Open all connections to the `MongoDB` and `Session` so we can get started.

Let's see again where we are. These numbers will depend on whether you run this notebook for the first time or just continue again. Unless you delete your project it will accumulate models and files over time, as is our ultimate goal.

In [4]:
print(project.files)
print(project.generators)
print(project.models)

<StoredBundle for with 123 file(s) @ 0x111718a10>
<StoredBundle for with 2 file(s) @ 0x1117189d0>
<StoredBundle for with 18 file(s) @ 0x111718990>


Now restore our old ways to generate tasks by loading the previously used generators.

In [5]:
engine = project.generators['openmm']
modeller = project.generators['pyemma']
pdb_file = project.files['initial_pdb']

## A simple generator

A word about this example: while a `Task` can be created and configured interactively, a new class in `adaptivemd` needs to be part of the project. So instead of writing a new generator here, we will discuss the essential parts of the existing code.

A generator is in essence a factory that creates `Task` objects with a single command. A generator can be initialized with the files that its tasks will always need, just as an engine needs a topology for each task. It also knows about the callback behaviour of its tasks (as explained briefly in Example 4). Last, a generator allows you to restrict a worker to tasks that were created by that generator.
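To make the factory idea concrete, here is a minimal, self-contained sketch that does not use `adaptivemd` at all. `ToyTask` and `ToyGenerator` are hypothetical stand-ins invented for illustration. The real classes do far more (staging, storage, workers), so read this as a picture of the pattern and not of the library's implementation.

```python
# Toy illustration of the generator-as-factory idea.
# ToyTask and ToyGenerator are hypothetical names, NOT part of adaptivemd.

class ToyTask(object):
    def __init__(self, generator):
        # every task remembers which generator created it
        self.generator = generator
        self.commands = []

    def append(self, command):
        self.commands.append(command)


class ToyGenerator(dict):
    """Stores shared files like a dictionary and builds tasks on demand."""

    def __init__(self, name, pdb_file):
        super(ToyGenerator, self).__init__()
        self.name = name
        # a file that every created task will need
        self['pdb_file'] = pdb_file

    def task_run(self, steps):
        # a single call returns a fully configured task
        t = ToyTask(self)
        t.append('run --top {0} --steps {1}'.format(self['pdb_file'], steps))
        return t


toy_engine = ToyGenerator('engine', 'alanine.pdb')
toy_modeller = ToyGenerator('modeller', 'alanine.pdb')

tasks = [
    toy_engine.task_run(100),
    toy_engine.task_run(200),
    toy_modeller.task_run(50),
]

# a worker restricted to one generator would only see that generator's tasks
engine_tasks = [t for t in tasks if t.generator is toy_engine]
print(len(engine_tasks))
print(engine_tasks[0].commands[0])
```

Because each task keeps a reference to the generator that created it, filtering tasks by generator needs nothing more than an identity check.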

#### The execution structure

Let's look at the code of `PyEMMAAnalysis`:

```py
class PyEMMAAnalysis(Analysis):
    def __init__(self, pdb_file):
        super(PyEMMAAnalysis, self).__init__()

        self['pdb_file'] = pdb_file
        stage = pdb_file.transfer('staging:///')

        self['pdb_file_stage'] = stage.target
        self.initial_staging.append(stage)

    @staticmethod
    def then_func(project, task, model, inputs):
        # add the input arguments for later reference
        model.data['input']['trajectories'] = inputs['kwargs']['files']
        model.data['input']['pdb'] = inputs['kwargs']['topfile']
        project.models.add(model)

    def task_run_msm_files(
            self,
            trajectories,
            tica_lag=2,
            tica_dim=2,
            msm_states=5,
            msm_lag=2,
            stride=1):

        t = PythonTask(self)

        input_pdb = t.link(self['pdb_file_stage'], 'input.pdb')
        t.call(
            remote_analysis,
            trajectories=list(trajectories),
            topfile=input_pdb,
            tica_lag=tica_lag,
            tica_dim=tica_dim,
            msm_states=msm_states,
            msm_lag=msm_lag,
            stride=stride
        )

        return t

```

```py
def __init__(self, pdb_file):
    # don't forget to call super
    super(PyEMMAAnalysis, self).__init__()  

    # a generator also acts like a dictionary for files
    # this way you can later access certain files you might need
    
    # save the pdb_file under the same name
    self['pdb_file'] = pdb_file  

    # this creates a transfer action like it is used in tasks
    # and moves the passed pdb_file (usually on the local machine)
    # to the staging_area root directory
    stage = pdb_file.transfer('staging:///')

    # the new target file (a file object just like the original)
    # on the staging_area is saved under `pdb_file_stage`
    # so we can access both files if we want to
    # note that the original file most likely is in the DB
    # so we could just skip the stage transfer completely
    self['pdb_file_stage'] = stage.target
    
    # last we add this transfer to the initial_staging which
    # is done only once per used generator
    self.initial_staging.append(stage)
```
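The remark that the initial staging "is done only once per used generator" can be pictured with a small toy model. The class `ToyStagingArea` and its `prepare` method are hypothetical and only sketch the bookkeeping under that assumption. They say nothing about how `adaptivemd` workers actually track staged generators.

```python
# Hypothetical sketch: run a generator's initial transfers once, then skip them.
# ToyStagingArea is NOT part of adaptivemd.

class ToyStagingArea(object):
    def __init__(self):
        self.staged_generators = set()
        self.transfers = []

    def prepare(self, generator_name, initial_staging):
        # only the first task of a given generator triggers the transfers
        if generator_name in self.staged_generators:
            return False
        self.transfers.extend(initial_staging)
        self.staged_generators.add(generator_name)
        return True


area = ToyStagingArea()
staging = [('file://alanine.pdb', 'staging:///alanine.pdb')]

first = area.prepare('pyemma', staging)
second = area.prepare('pyemma', staging)
print('{0} {1} {2}'.format(first, second, len(area.transfers)))
```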

```py
# **kwargs keeps the example short; you should use explicit
# parameters and add appropriate docs
def task_run_msm_files(self, trajectories, **kwargs):
    # create the task and set the generator to self, our new generator
    t = PythonTask(self)

    # we want to link the staged file into the worker directory
    # and name it `input.pdb`
    input_pdb = t.link(self['pdb_file_stage'], 'input.pdb')

    # if you chose not to use the staged file, you would link the
    # original file directly in the same way:
    # input_pdb = t.link(self['pdb_file'], 'input.pdb')

    # finally we use `.call` to call the `remote_analysis` function,
    # which we imported earlier from somewhere
    t.call(
        remote_analysis,
        trajectories=list(trajectories),
        topfile=input_pdb,
        **kwargs
    )

    return t
```

And finally a callback function. `then_func` is the default name of the function to be called.

```py
# we use a static method, but you can of course write a normal method
# the callbacks take these arguments in this order
# the third parameter is actually a `Model` object in this case,
# which has a `.data` attribute
@staticmethod
def then_func(project, task, model, inputs):
    # add the input arguments for later reference to the model
    model.data['input']['trajectories'] = inputs['kwargs']['files']
    model.data['input']['pdb'] = inputs['kwargs']['topfile']
    # and save the model in the project
    project.models.add(model)
```
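How a callback might be looked up by its name can be sketched without `adaptivemd`. The dispatcher `on_success` below is an assumption about the general mechanism, not the library's actual code. `ToyAnalysis` and the plain dictionaries used for the project and the model are hypothetical stand-ins. The shape of `inputs` simply mirrors the snippet above.

```python
# Hypothetical sketch of name-based callback dispatch; not adaptivemd code.

class ToyAnalysis(object):
    @staticmethod
    def then_func(project, task, model, inputs):
        # record what the model was built from
        model['input'] = {
            'trajectories': inputs['kwargs']['files'],
            'pdb': inputs['kwargs']['topfile'],
        }
        # and store the model in the (toy) project
        project['models'].append(model)


def on_success(project, generator, task, output, inputs, name='then_func'):
    # look the callback up by name on the generator and call it if it exists
    callback = getattr(generator, name, None)
    if callback is not None:
        callback(project, task, output, inputs)


project_stub = {'models': []}
toy_inputs = {
    'kwargs': {'files': ['traj1.dcd', 'traj2.dcd'], 'topfile': 'input.pdb'}
}

on_success(project_stub, ToyAnalysis(), task=None, output={}, inputs=toy_inputs)
print(project_stub['models'])
```

Looking the function up with `getattr` is what lets a default name such as `then_func` work without any explicit registration step.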

A brief summary of the things you need to set to make your generator work:

```py
class MyGenerator(Analysis):
    def __init__(self, {things your generator always needs}):
        super(MyGenerator, self).__init__()
        
        # Add input files to self
        self['file1'] = file1

        # stage all files to the staging area if you want to keep these
        # files on the HPC
        for fn in ['file1', 'file2', ...]:
            stage = self[fn].transfer('staging:///')
            self[fn + '_stage'] = stage.target
            self.initial_staging.append(stage)

    @staticmethod
    def then_func(project, task, outputs, inputs):
        # do something with inputs and outputs
        # store something in your project
        pass

    def task_using_python_rpc(
            self,
            {arguments}):

        t = PythonTask(self)

        # set any task dependencies if you need
        t.dependencies = []
                
        input1 = t.link(self['file1'], 'alternative_name1')
        input2 = t.link(self['file2'], 'alternative_name2')
        ...

        # add whatever bash stuff you need BEFORE the function call
        t.append('some bash command')
        ...

        # use input1, etc in your function call if you like. It will
        # be converted to a regular file location you can use
        t.call(
            {my_remote_python_function},
            files=list(files),
        )

        # add whatever bash stuff you need AFTER the function call
        t.append('some bash command')
        ...

        return t

    def task_using_bash_argument_call(
            self,
            {arguments}):

        t = Task(self)

        # set any task dependencies if you need
        t.dependencies = []

        input1 = t.link(self['file1'], 'alternative_name1')
        input2 = t.link(self['file2'], 'alternative_name2')
        ...
        # add more staging
        t.append({action})
        ...

        # add whatever bash stuff you want to do
        t.append('some bash command')
        ...

        # add whatever staging stuff you need AFTER the bash commands
        t.append({action})
        ...
        
        return t
```

The simplified code for the `OpenMMEngine`:

```py
class OpenMMEngine(Engine):
    trajectory_ext = 'dcd'

    def __init__(self, system_file, integrator_file, pdb_file, args=None):
        super(OpenMMEngine, self).__init__()

        self['pdb_file'] = pdb_file
        self['system_file'] = system_file
        self['integrator_file'] = integrator_file
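        # NOTE: `exec_file` is not defined in this simplified excerpt
        self['_executable_file'] = exec_file

        # iterate over a snapshot, since staged entries are added in the loop
        for fn in list(self.files):
            stage = self[fn].transfer(Location('staging:///'))
            self[fn + '_stage'] = stage.target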
            self.initial_staging.append(stage)

        if args is None:
            args = '-p CPU --store-interval 1'

        self.args = args

    # this one only works if you start from a file
    def task_run_trajectory_from_file(self, target):
        # we create a special Task, that has some additional functionality
        t = TrajectoryGenerationTask(self, target)

        # link all the files we require
        initial_pdb = t.link(self['pdb_file_stage'], Location('initial.pdb'))
        t.link(self['system_file_stage'])
        t.link(self['integrator_file_stage'])
        t.link(self['_executable_file_stage'])

        # fetch the frame to start from and name it `coordinates.pdb`
        input_pdb = t.get(target.frame, 'coordinates.pdb')

        # this represents our output trajectory
        output = Trajectory('traj/', target.frame, length=target.length, engine=self)

        # create the directory so openmmrun can write to it
        t.touch(output)

        # build the actual bash command
        cmd = 'python openmmrun.py {args} -t {pdb} --length {length} {output}'.format(
            pdb=input_pdb,
            length=target.length,
            output=output,
            args=self.args,
        )
        t.append(cmd)
        
        # copy the resulting trajectory directory back to the staging area
        t.put(output, target)

        return t
```
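To see what the generated bash command looks like, the `format` call can be run in isolation. All values below are made up for illustration. In the engine code `pdb` and `output` are objects, not plain strings; plain strings are assumed here to keep the sketch self-contained.

```python
# Stand-alone illustration of the command template used above.
# All values are hypothetical placeholders.
template = 'python openmmrun.py {args} -t {pdb} --length {length} {output}'

cmd = template.format(
    pdb='coordinates.pdb',
    length=100,
    output='traj/',
    args='-p CPU --store-interval 1',
)
print(cmd)
```

Named placeholders keep the template readable and let the keyword arguments be passed in any order.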

In [6]:
project.close()