# AdaptiveMD

## Example 1 - Setup

### 0. Imports

In [1]:
import sys, os

Alright, let's load the package and import `Project`, since we want to start a project.

In [2]:
from adaptivemd import Project

Let's open a project with a UNIQUE name. This will be the name used in the DB, so make sure it is new and not too short. Opening a project always creates the project if it does not exist yet and reopens it if it does. You cannot choose between opening modes as you would with a file. This is a precaution against accidentally deleting your project.
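The open-or-create behaviour described above can be pictured with a small stand-in. This is only a sketch: the dictionary `_database` and the function `open_project` are hypothetical and are not part of adaptivemd.

```python
# Hypothetical in-memory stand-in for the project database (not adaptivemd code).
_database = {}


def open_project(name):
    """Return the stored project called `name`, creating it if it is missing."""
    if name not in _database:
        _database[name] = {'name': name, 'files': []}
    return _database[name]


first = open_project('tutorial')
first['files'].append('alanine.pdb')

# Opening again returns the same stored project; nothing is overwritten.
second = open_project('tutorial')
print(second['files'])
```

Because there is no separate "create" mode, reopening can never wipe existing data; removing a project has to be an explicit step.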

In [3]:
# Use this to completely remove the tutorial project from the database.
Project.delete('tutorial')

In [4]:
project = Project('tutorial')

Now we have a handle for our project. First thing is to set it up to work on a resource.

### 1. Set the resource

What is a resource? A `Resource` specifies a shared filesystem with one or more clusters attached to it. This can be your local machine, a regular cluster, or even a group of clusters that can access the same FS (like Titan, Eos and Rhea do).

Once you have chosen where to store your results this way, the choice is set for the project and can (or at least should) not be altered, since all file references are made to match this resource. Currently you can use the FU Berlin Allegro Cluster or run locally. There are two specific local adaptations that already include the path to your conda installation. This simplifies the use of `openmm` or `pyemma`.

Let us pick a local resource on a laptop for now.

In [5]:
from adaptivemd import LocalCluster, AllegroCluster

First pick your resource, i.e. where you want to run your simulation: locally or on Allegro.

In [6]:
resource = LocalCluster()

Now you can add some additional paths, conda environments, etc., before we set up the project. This works by configuring a special task, `.wrapper` (see notebook 4 for more things to do).

In [7]:
resource.wrapper

<adaptivemd.task.DummyTask at 0x10ee6f150>

In a nutshell, this dummy task has a `.pre` and a `.post` list of commands. You can add any command that you want executed before every task you run.

In [8]:
resource.wrapper.pre.append('echo "Hello World"')

A task can also automatically add to the `PATH` variable and set environment variables, and you can add conda environments.

In [9]:
resource.wrapper.add_conda_env('my_env_python_27')

In [10]:
resource.wrapper.add_path('/x/y/z')

In [11]:
resource.wrapper.environment['CONDA'] = 'True'

In [12]:
print(resource.wrapper.description)

Task: DummyTask
<pre>
export PATH=/x/y/z:$PATH
export CONDA=True
echo "Hello World"
</pre>
<main />
<post>
</post>
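
To make the idea of a wrapper concrete, here is a minimal sketch of how collected paths, environment variables and `pre`/`post` commands could be assembled into one script. The class `WrapperSketch`, its `render` method and the command `python run_simulation.py` are hypothetical illustrations, not adaptivemd's implementation.

```python
class WrapperSketch(object):
    """Hypothetical stand-in that collects setup steps shared by every task."""

    def __init__(self):
        self.pre = []          # shell commands run before the main command
        self.post = []         # shell commands run after the main command
        self.paths = []        # directories to prepend to PATH
        self.environment = {}  # environment variables to export

    def render(self, main_command):
        """Assemble one script: PATH, environment, pre, main, post."""
        lines = ['export PATH={0}:$PATH'.format(p) for p in self.paths]
        lines += ['export {0}={1}'.format(key, value)
                  for key, value in sorted(self.environment.items())]
        lines += self.pre + [main_command] + self.post
        return '\n'.join(lines)


wrapper = WrapperSketch()
wrapper.paths.append('/x/y/z')
wrapper.environment['CONDA'] = 'True'
wrapper.pre.append('echo "Hello World"')

# 'python run_simulation.py' is a placeholder for whatever a task runs.
script = wrapper.render('python run_simulation.py')
print(script)
```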


Let's reset that now and just add a little comment

In [13]:
resource = LocalCluster()
resource.wrapper.pre.append('# This is part of the adaptivemd tutorial')

Finally, initialize the project with this specific resource. This is done once per project and should not be altered afterwards.

In [14]:
project.initialize(resource)

### 2. Add `TaskGenerators`

TaskGenerators are instances whose purpose is to create tasks to be executed. This is similar to the way Kernels work. A TaskGenerator will generate `Task` objects for you, which will be translated into a `ComputeUnitDescription` and executed. In simple terms:

**The task generator creates the bash scripts for you that run a simulation or run pyemma.**

A task generator is initialized with all parameters needed to make it work, and it will know what needs to be staged in order to be used.

In [15]:
from adaptivemd.engine.openmm import OpenMMEngine
from adaptivemd.analysis.pyemma import PyEMMAAnalysis

from adaptivemd import File, Directory

#### The engine

A task generator that will create jobs to run simulations. Currently it uses a little python script that will execute OpenMM. It requires conda to be added to the PATH variable, or at least openmm to be installed on the cluster. If you set up your resource correctly, this should all happen automatically.

First we define a `File` object. These are used to represent files anywhere, on the cluster or in your local application. A `File`, like any complex object in adaptivemd, can have a `.name` attribute that makes it easier to find later.

In [16]:
pdb_file = File('file://../files/alanine/alanine.pdb').named('initial_pdb').load()

Here we used a special prefix that can point to specific locations. 

- `file://` points to files on your local machine. 
- `unit://` specifies files in the current working directory of the executing node. Usually these are temporary files for a single execution.
- `shared://` specifies the root shared FS directory (e.g. `NO_BACKUP/` on Allegro). Use this to import and export files that are already on the cluster.
- `staging://` is a special scheduler-specific directory to which files are moved after they are completed on a node and which holds files needed later. Use this to refer to files that should be stored or reused. After an execution is done, you usually move all important files to this place.
- `sandbox://` this should not concern you and is a special RP folder where all pilot/session folders are located.
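
One way to picture these prefixes is as keys into a table of base directories. The sketch below is an assumption for illustration only: the function `resolve_location` and every directory in `BASE_DIRS` are hypothetical, and the real locations depend on the resource.

```python
import posixpath

# Hypothetical base directories; real locations depend on the resource.
BASE_DIRS = {
    'file': '',  # paths on the local machine are used as written
    'unit': '/tmp/unit_workdir',
    'shared': '/cluster/shared',
    'staging': '/cluster/shared/staging',
    'sandbox': '/cluster/shared/sandbox',
}


def resolve_location(url):
    """Split 'prefix://path' and join the path onto the prefix's base directory."""
    prefix, separator, path = url.partition('://')
    if not separator or prefix not in BASE_DIRS:
        raise ValueError('unknown location prefix in {0!r}'.format(url))
    base = BASE_DIRS[prefix]
    return posixpath.join(base, path) if base else path


print(resolve_location('file://../files/alanine/alanine.pdb'))
print(resolve_location('staging://alanine.pdb'))
```

Keeping the prefix symbolic until the last moment is what lets the same file reference be interpreted on a laptop or on a cluster node.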

The `.load()` at the end is important. It causes the `File` object to load the content of the file and if you save the `File` object, the actual file is stored with it. This way it can simply be rewritten on the cluster or anywhere else.
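
The effect of loading can be sketched with a small stand-in class that keeps the file's bytes alongside its path. `LoadedFileSketch` and its `write_to` method are hypothetical names used only for this illustration; they do not describe how adaptivemd stores files internally.

```python
import os
import tempfile


class LoadedFileSketch(object):
    """Hypothetical file reference that can carry the file's bytes with it."""

    def __init__(self, path):
        self.path = path
        self.content = None  # filled in by load()

    def load(self):
        """Read the file into memory and return self so calls can be chained."""
        with open(self.path, 'rb') as handle:
            self.content = handle.read()
        return self

    def write_to(self, directory):
        """Recreate the file in another directory from the stored bytes."""
        target = os.path.join(directory, os.path.basename(self.path))
        with open(target, 'wb') as handle:
            handle.write(self.content)
        return target


# Create a small example file to load.
source_path = os.path.join(tempfile.mkdtemp(), 'example.pdb')
with open(source_path, 'wb') as handle:
    handle.write(b'ATOM      1  N   ALA\n')

loaded = LoadedFileSketch(source_path).load()
os.remove(source_path)  # the original may disappear; the bytes travel with the object
copy_path = loaded.write_to(tempfile.mkdtemp())
```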

So let's do an example for an OpenMM engine. This is simply a small python script that makes OpenMM look like an executable. It runs a simulation given an initial frame, OpenMM-specific system.xml and integrator.xml files, and some additional parameters like the platform name, how often to store simulation frames, etc.

In [17]:
engine = OpenMMEngine(
    pdb_file=pdb_file,
    system_file=File('file://../files/alanine/system.xml').load(),
    integrator_file=File('file://../files/alanine/integrator.xml').load(),
    args='-r --report-interval 1 -p CPU --store-interval 1'
).named('openmm')

To explain this: we now have an `OpenMMEngine` that uses the previously created pdb `File` object and the location defined in it. The same goes for the OpenMM XML files, which are also passed as `File` objects. The args tell it to store each frame (to keep it fast) and to run on the `CPU` platform.
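
To see how an argument string like this breaks down into individual options, here is a sketch using Python's standard `argparse`. It is not the script that adaptivemd ships: the option types are assumptions, and since the text does not say what `-r` does, it is treated as a plain on/off switch here.

```python
import argparse
import shlex

parser = argparse.ArgumentParser()
# The meaning of -r is not stated in the text; assumed to be an on/off switch.
parser.add_argument('-r', action='store_true')
parser.add_argument('--report-interval', type=int)
parser.add_argument('-p', dest='platform')
parser.add_argument('--store-interval', type=int)

args = '-r --report-interval 1 -p CPU --store-interval 1'
options = parser.parse_args(shlex.split(args))

print(options.platform)
print(options.store_interval)
```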

Lastly, we name the engine `openmm` to find it later.

In [18]:
engine.name

'openmm'

#### The modeller

The instance that computes an MSM model from the existing trajectories you pass it. It is initialized with a `.pdb` file that is used to create features between the $c_\alpha$ atoms. This implementation requires a PDB, but in general this is not necessary. It is specific to my PyEMMAAnalysis showcase.

In [19]:
modeller = PyEMMAAnalysis(
    pdb_file=pdb_file
).named('pyemma')

Again we name it `pyemma` for later reference.

#### Add generators to project

The next step is to add these to the project for later usage. We pick the `.generators` store and add them. Think of a store as working like a `set()` in python. It contains each object only once and is not ordered. Therefore we need a name to find the objects later. Of course you can always iterate over all objects, but the order is not given.

To be precise, objects are ordered by their time of creation, but this is only accurate to the second, and it is the time the object was created, not the time it was stored.

In [20]:
project.generators.add(engine)
project.generators.add(modeller)

Note that you cannot add the same engine twice. But if you create a new engine, it will be considered different and hence you can store it again.
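
The set-like behaviour of a store, together with lookup by name, can be sketched as follows. `StoreSketch` and `NamedItem` are hypothetical stand-ins, and silently ignoring a repeated `add` is an assumption of this sketch; the real store may react differently.

```python
class NamedItem(object):
    """Hypothetical named object, standing in for an engine or modeller."""

    def __init__(self, name):
        self.name = name


class StoreSketch(object):
    """Set-like container: each object is kept once and order is not guaranteed."""

    def __init__(self):
        self._items = set()

    def add(self, item):
        self._items.add(item)  # adding the same object again has no effect

    def __len__(self):
        return len(self._items)

    def __getitem__(self, name):
        """Find an object by its name, since position means nothing here."""
        for item in self._items:
            if item.name == name:
                return item
        raise KeyError(name)


store = StoreSketch()
engine_a = NamedItem('openmm')
store.add(engine_a)
store.add(engine_a)             # same object: still stored once
store.add(NamedItem('pyemma'))  # a newly created object counts as different
print(len(store))
```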

### 3. Create one initial trajectory

Finally we are ready to run a first trajectory that we will store as a point of reference in the project. Also it is nice to see how it works in general.

We are using a _Worker_ approach. This means simply that someone (in our case the user from inside a script or a notebook) creates a list of tasks to be done and some other instance (the worker) will actually do the work.
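
The division of labour can be sketched with a plain queue: one side only records what should be done, the other side only executes. All names below (`task_queue`, `submit`, `worker_step`) are hypothetical, and a real worker runs as a separate process rather than in the same script.

```python
from collections import deque

task_queue = deque()  # filled by the user
completed = []        # filled by the worker


def submit(description):
    """User side: record what should be done, without doing it."""
    task_queue.append(description)


def worker_step():
    """Worker side: take the oldest pending task and carry it out."""
    if not task_queue:
        return False
    description = task_queue.popleft()
    completed.append('ran: ' + description)
    return True


submit('simulate 100 frames')
submit('build model')

# The worker keeps going until nothing is left to do.
while worker_step():
    pass

print(completed)
```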

#### Create a `Trajectory` object

First we create the parameters for the engine to run the simulation. Since it seemed appropriate, we use a `Trajectory` object (a special `File` with an initial frame and a length) as the input. You could of course pass these things separately, but this way we can actually reference the not yet existing trajectory and do stuff with it.

A Trajectory should have a unique name and so there is a project function to get you one. It uses numbers and makes sure that this number has not been used yet in the project.
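
A number-based naming scheme of this kind could look like the sketch below. The class `TrajectoryNamer` is hypothetical, and the eight-digit, zero-padded `.dcd` format is an assumption for illustration rather than a statement about how the project function formats names.

```python
class TrajectoryNamer(object):
    """Hypothetical counter that never hands out the same name twice."""

    def __init__(self, used_numbers=()):
        self._used = set(used_numbers)  # numbers already taken in the project
        self._next = 0

    def new_name(self):
        # Skip every number that has been used before.
        while self._next in self._used:
            self._next += 1
        number = self._next
        self._used.add(number)
        return '{0:08d}.dcd'.format(number)


namer = TrajectoryNamer(used_numbers=[0, 1])
names = [namer.new_name(), namer.new_name()]
print(names)
```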

In [21]:
trajectory = project.new_trajectory(engine['pdb_file'], 100)
trajectory

Trajectory('alanine.pdb' >> [0..100])

This says that the initial frame is `alanine.pdb`, the run lasts 100 frames, and the trajectory is named `xxxxxxxx.dcd`.

#### Create a `Task` object

Now we want this trajectory to actually exist, so we have to make it. This requires a `Task` object that _knows_ how to describe a simulation. Since `Task` objects are very flexible and can be complex, there are helper functions (i.e. factories) to get these in an easy manner, like the ones we already created just before. Let's use the openmm engine to create an openmm task now.

In [22]:
task = engine.task_run_trajectory(trajectory)

That's it, just take a trajectory description and turn it into a task that contains the shell commands and needed files, etc. 

#### Submit the task to the queue

Finally we need to add this task to the things we want to be done. This is easy and only requires saving the task to the project. The task goes into the `project.tasks` bundle, and once it has been stored it can be picked up by any worker to execute it.

In [23]:
project.queue(task)  # shortcut for project.tasks.add(task)

That is all we can do from here. To execute the tasks you need to run a worker using

```bash
adaptivemdworker -l tutorial --verbose
```

Once this is done, come back here and check your results. If you want you can execute the next cell which will block until the task has been completed.
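
Blocking until a task has finished usually comes down to polling its state. The helper below is a generic sketch with hypothetical names (`wait_until`, `fake_task_is_done`); it does not use or describe any adaptivemd call.

```python
import time


def wait_until(is_done, poll_interval=0.01, timeout=5.0):
    """Call is_done() repeatedly until it returns True or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if is_done():
            return True
        time.sleep(poll_interval)
    return False


# Stand-in for a task whose state flips to finished on the third check.
checks = {'count': 0}


def fake_task_is_done():
    checks['count'] += 1
    return checks['count'] >= 3


finished = wait_until(fake_task_is_done)
print(finished)
```

A timeout keeps the notebook from hanging forever if no worker is running.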

In [24]:
print(project.files)
print(project.trajectories)

<StoredBundle for with 6 file(s) @ 0x10ee65a50>
<ViewBundle for with 1 file(s) @ 0x10ee65c10>


Finally, close the project.

In [25]:
project.close()

The final `project.close()` closes the DB connection.