# Inputs setter

This notebook is a tool for manually setting the workflow inputs, with a description and examples for each one. The values set here are then exported to a JSON file, which is required to run the MoDEL analyses workflow.

Inputs:
1. [Project metadata](#project)
2. [Simulation metadata](#simulation)
3. [Analysis parameters](#analysis)
4. [Representation parameters](#representation)
5. [Collections](#collections)

In [1]:
inputs = {}

## Project metadata <a name="project"></a>

The following metadata has no effect on the workflow itself, but it will be written to the output metadata file. These values will be uploaded to the database and then exposed in the project overview. They may also be useful when searching for this simulation in the browser.

Write a brief description or title for this trajectory for the overview page<br />
This name may be used by the client to search the trajectory in the database<br />
The name is displayed in the overview page

In [2]:
inputs['name'] = "Folding@home molecular dynamics simulations of Diamond Light Source / XChem X-ray structures of small molecule inhibitors of the SARS-CoV-2 main protease from the COVID Moonshot"

Write additional comments<br />
The description is displayed in the overview page

In [3]:
inputs['description'] = "This dataset contains all-atom molecular dynamics simulations starting from Diamond Light Source / XChem X-ray structures of small molecule inhibitors of the SARS-CoV-2 main viral protease (Mpro, 3CLpro) from the COVID Moonshot simulated on Folding@home with gromacs. All X-ray structures come from the XChem Fragalysis platform as of 2020-12-21. These structures are linked to activity data on the COVID Moonshot Structures browser."

Write author names<br />
Authors are displayed in the overview page

In [4]:
inputs['authors'] = "John D. Chodera"

Write author group name/s<br />
The group is displayed in the overview page

In [5]:
inputs['groups'] = "Chodera lab"

How to contact the authors. The contact is displayed in the overview page

In [6]:
inputs['contact'] = None

Name of the program (software) that produced the trajectory, and its version<br />
Program and version are both displayed in the overview page<br/>
<span style="color:red">WARNING: Check the database search page in order to see current values</span>

In [7]:
inputs['program'] = 'OpenMM'
inputs['version'] = '7.4.2'

Type of molecular dynamics.<br/>
At this moment there are only two options in this field: 'trajectory' and 'ensemble'.<br/>
Note that this field has an effect on the client. Some time-dependent analyses will change their axis labels accordingly, e.g. the RMSD X axis will be 'frames' instead of 'time'.

In [8]:
inputs['type'] = 'trajectory'

MD method<br />
e.g. 'Classical MD', 'Targeted MD', 'Biased MD (Accelerated Weighted Ensemble)', 'Enhanced sampling (Hamiltonian Replica Exchange)', ...<br />
The MD method is displayed in the overview page

In [9]:
inputs['method'] = 'Classical MD'

License and link to the license web page<br />
The license is displayed in the overview page. Under the license there is a 'More information' button. The link is used to redirect the user when the button is clicked.

#### Example:

```python
inputs['license'] = ("This trajectory dataset is released under a Creative Commons "
           "Attribution 4.0 International Public License")
inputs['linkcense'] = "https://creativecommons.org/licenses/by/4.0/"
```

In [10]:
inputs['license'] = ("This trajectory dataset is released under a Creative Commons "
           "Attribution 4.0 International Public License")
inputs['linkcense'] = "https://creativecommons.org/licenses/by/4.0/"

Citation for referring to this simulation. The citation is displayed in the overview page<br />
To set a citation use the following instructions:<br />
To add a line break type '(br)' inside the citation string<br />
To add superscript text type '^' before each character

In [11]:
inputs['citation'] = None
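
The formatting rules above can be applied with a couple of small helpers. This is only a minimal sketch: `superscript` and `join_lines` are hypothetical names that are not part of the workflow, and the citation text is a placeholder, not a real reference.

```python
def superscript(text):
    """Prefix every character with '^', the marker described above."""
    return ''.join('^' + character for character in text)


def join_lines(*lines):
    """Join citation lines with the '(br)' line-break marker."""
    return '(br)'.join(lines)


# Placeholder citation used only to illustrate the markers
example_citation = join_lines(
    'A. Author' + superscript('1') + ', B. Author' + superscript('2'),
    'Some Journal (2021)',
)
print(example_citation)
```

The resulting string could then be assigned to `inputs['citation']`.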

Acknowledgements to be shown in the overview page

In [12]:
inputs['thanks'] = "We are thankful to the Folding@home community for their generous donations that made this simulation dataset possible. SS is a Damon Runyon Quantitative Biology Fellow funded by the Damon Runyon Cancer Research Foundation"

Links to resources related to the simulation

WARNING: This field has no effect anywhere in our own workflow, BUT others may rely on it. MolSSI uses this field to find simulations in our database and then embed the viewer in their website. You must follow the standard when adding a new MolSSI simulation.

#### Example:

```python
[
    {
        'name': 'Data source',
        'url': 'https://data.source.org/'
    },
    {...}
]
```

In [13]:
inputs['links'] = [
    {
        'name': 'Data source',
        'url': 'https://covid.molssi.org//simulations/#foldinghome-molecular-dynamics-simulations-of-diamond-light-source--xchem-x-ray-structures-of-small-molecule-inhibitors-of-the-sars-cov-2-main-protease-from-the-covid-moonshot'
    },
]

Set the source PDB ids of the trajectory structure<br />
Additional data from the PDB is harvested by the loader while uploading to the database<br />
This data is displayed in the overview page

#### Example:

```python
['2AJF', '6M17']
```

In [14]:
inputs['pdbIds'] = []
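
Since the loader uses these ids to harvest data, a quick format check may catch typos early. The sketch below assumes the classic four-character PDB id format (a digit from 1 to 9 followed by three alphanumeric characters); `invalid_pdb_ids` is a hypothetical helper and it does not check that the entries actually exist in the PDB.

```python
import re

# Assumption: classic PDB ids, one digit (1-9) plus three alphanumeric characters
PDB_ID = re.compile(r'[1-9][A-Za-z0-9]{3}')


def invalid_pdb_ids(pdb_ids):
    """Return the ids which do not look like classic PDB ids."""
    return [pdb_id for pdb_id in pdb_ids if not PDB_ID.fullmatch(str(pdb_id))]


print(invalid_pdb_ids(['2AJF', '6M17']))
print(invalid_pdb_ids(['2AJF', 'ABCD', '6M1']))
```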

## Simulation metadata <a name="simulation"></a>

These inputs may be automatically mined from the topology and trajectory files<br />
However, they may also be forced here<br />
<span style="color:red">DANI: None of this is true. Metadata mining for these values almost never works</span><br />
<span style="color:red">DANI: I stopped maintaining it a long time ago, so all values must be set by hand</span>

Time length in nanoseconds (ns). May be None if this is not a trajectory, but an ensemble.
Length is an important value since it is used in many graph axes in the web client.

In [15]:
inputs['length'] = None # In nanoseconds (ns)
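
Because many graph axes depend on the length, it may be worth checking it against the simulation type before exporting. The following is a minimal sketch: `check_length` is a hypothetical helper, and the rule that a 'trajectory' should carry a positive length is an assumption drawn from the description above, not a documented requirement of the workflow.

```python
def check_length(simulation_type, length):
    """Return a warning message, or None when the values look consistent."""
    if length is None:
        if simulation_type == 'trajectory':
            # Assumption: time axes in the client need a length for trajectories
            return "length is None but type is 'trajectory'"
        return None
    is_number = isinstance(length, (int, float)) and not isinstance(length, bool)
    if not is_number or length <= 0:
        return 'length must be a positive number of nanoseconds'
    return None


print(check_length('trajectory', None))
print(check_length('ensemble', None))
print(check_length('trajectory', 500))
```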

The remaining values are displayed in the web client as trajectory metadata<br />
These values do not affect other outcomes<br/>
<span style="color:red">WARNING: Check the database search page in order to see current values</span><br/>
<span style="color:red">WARNING: Especially check already existing force fields and how they are named</span>

In [16]:
inputs['temp'] = 300 # In Kelvin (K)
inputs['ensemble'] = 'NPT' # e.g. NVT, NPT, etc.
inputs['timestep'] = None # In fs (fs/step)
inputs['ff'] = ['Amber ff14SB', 'OpenFF 1.3.0'] # Force fields (e.g. ['CHARMM36'])
inputs['wat'] = 'TIP3P' # Water force field (e.g. TIP3P)
inputs['boxtype'] = None # e.g. Triclinic, Cubic, Dodecahedron

## Analysis parameters <a name="analysis"></a>

These fields have an impact on the analysis workflow.

### Interactions <a name="interactions"></a>

Set the interactions of interest to be analyzed<br />
A set of interaction-specific analyses will be run for each interaction and displayed in the web client

Interactions are defined by the 'agents' which are meant to interact pairwise. An 'agent' may be anything, even a group of unrelated molecules. The only condition is that agents must be defined by residues. The workflow will find out which residues of each agent are close enough to be considered interface residues. These residues will be the ones considered in interface analyses<br />

Interactions are uploaded to the database as part of the project metadata. They include the interaction name, the agent names, the residue selections of both whole agents and the residue selections of both agent interfaces<br />

Each interaction has the following attributes:
   - name: a string tag used to relate interaction analyses data with their corresponding residues.
    In addition, the name is used to label the corresponding analyses in the web client
   - agent_1: the name of the first agent in the interaction, which is used to label in the client
   - selection_1: the VMD selection of the first agent in the interaction
   - agent_2: the name of the second agent in the interaction, which is used to label in the client
   - selection_2: the VMD selection of the second agent in the interaction
    
VMD atom selection language:
https://www.ks.uiuc.edu/Research/vmd/vmd-1.3/ug/node132.html
    
#### Example:

```python
{
    'name': 'protein-ligand interaction',
    'agent_1': 'protein',
    'selection_1': 'not resname lig',
    'agent_2': 'ligand',
    'selection_2': 'resname lig',
},
{
    'name': 'domain-domain interaction',
    'agent_1': 'domain 1',
    'selection_1': 'resid 2 to 291',
    'agent_2': 'domain 2',
    'selection_2': 'resid 2 to 291',
},
```

In [17]:
inputs['interactions'] = [
    {
        'name': 'Protease-ligand interaction',
        'agent_1': 'Protease',
        'selection_1': 'chain A',
        'agent_2': 'Ligand',
        'selection_2': 'chain B',
    },
]
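
Before exporting, it may help to confirm that every interaction carries the five attributes listed above. The following is a minimal sketch: `check_interactions` is a hypothetical helper, not part of the workflow, and the uniqueness check on names is an assumption based on the name being used as a tag. It does not check that the VMD selections themselves are valid.

```python
REQUIRED_KEYS = ('name', 'agent_1', 'selection_1', 'agent_2', 'selection_2')


def check_interactions(interactions):
    """Return a list of human-readable problems found in the interactions."""
    problems = []
    seen_names = set()
    for index, interaction in enumerate(interactions):
        for key in REQUIRED_KEYS:
            value = interaction.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f'interaction {index}: missing or empty "{key}"')
        name = interaction.get('name')
        if isinstance(name, str):
            # Assumption: names work as tags, so they should not repeat
            if name in seen_names:
                problems.append(f'interaction {index}: duplicated name "{name}"')
            seen_names.add(name)
    return problems


example_interactions = [
    {
        'name': 'Protease-ligand interaction',
        'agent_1': 'Protease',
        'selection_1': 'chain A',
        'agent_2': 'Ligand',
        'selection_2': 'chain B',
    },
    # Deliberately incomplete entry to show what gets reported
    {'name': 'Incomplete interaction', 'agent_1': 'Protease', 'selection_1': 'chain A'},
]
for problem in check_interactions(example_interactions):
    print(problem)
```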

### Periodic boundary conditions selection <a name="pbcs"></a>

[EXPERIMENTAL INPUT]

Set those residues which are under periodic boundary conditions (PBC)<br/>
These residues are excluded from the imaging centering and fitting<br/>
These residues are excluded from the following analyses:
   - RMSD: Sudden jumps in PBC residues result in nonsensical high peaks
   - RMSD per residue: Sudden jumps in PBC residues result in nonsensical high peaks
   - RMSD pairwise: Sudden jumps in PBC residues result in nonsensical high peaks
   - TM score: Sudden jumps in PBC residues result in nonsensical high peaks
   - RGYR: Sudden jumps in PBC residues result in nonsensical large changes
   - RMSF: Sudden jumps in PBC residues result in nonsensical high peaks
   - PCA: Sudden jumps make no sense in PCA and they eclipse non-PBC movements
   - SASA: Residues close to the boundary will be considered exposed to solvent while they may not be
   - Pockets: Residues close to the boundary may be considered to have pockets while they have none <span style="color:red">(DANI: This cannot actually be done because fpocket does not allow atoms to be "discarded" in a smart way. If you remove atoms so that no pockets are found in them, pockets may then appear in the places those atoms occupied. For now, we simply skip the whole analysis whenever something is in PBC)</span>

Note that this input is used mostly for membranes, since the most typical residues under periodic boundary conditions (solvent and counter ions) are usually removed from the trajectory. When they are not removed, they must be included in this field as well. This field is also useful for scenarios with several protein or nucleic acid molecules floating around and crossing boundaries. In this situation you cannot image and fit all molecules. You must focus on one molecule and let the others stay in PBC.

These residues are defined using VMD selection language.<br/>


VMD atom selection language:
https://www.ks.uiuc.edu/Research/vmd/vmd-1.3/ug/node132.html

In [18]:
inputs['pbc_selection'] = None

## Representation parameters  <a name="representation"></a>

These fields have an impact on how the simulation is displayed in the web client.

### Forced references

Set which reference sequences must be used to map residues in the structure of the simulation. EMBL/UniProt accession ids are accepted. If forced references are not provided (which is totally acceptable), or the provided forced references do not fully cover the structure, then a BLAST search will be run for each orphan chain sequence.
In addition, accessions may be guessed from the PDB ids, when provided.
Note that forced references may be provided as a list (then the chain each reference belongs to is guessed) or as a dictionary (then the user specifies which reference belongs to each chain).
Use the "noref" flag to mark a chain as "not referable" (e.g. antibodies, synthetic constructs).

#### Example:

```python
[ "Q9BYF1", "P0DTC2" ]
{ "A": "Q9BYF1", "B": "P0DTC2" }
{ "A": "Q9BYF1", "B": "noref" }
```

In [19]:
inputs['forced_references'] = ["P0DTD1"]
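
Both accepted forms (list and dictionary) can be checked for malformed accessions before exporting. The sketch below uses the accession pattern published by UniProt. `invalid_references` is a hypothetical helper, and accepting "noref" only as a dictionary value is an assumption taken from the example above. The check is purely syntactic and does not query UniProt.

```python
import re

# Accession number format as published in the UniProt documentation
UNIPROT_ACCESSION = re.compile(
    r'[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}'
)


def invalid_references(forced_references):
    """Return the references which do not look like UniProt accessions."""
    if forced_references is None:
        return []
    if isinstance(forced_references, dict):
        # Assumption: 'noref' is only meaningful when tied to a chain
        values = [value for value in forced_references.values() if value != 'noref']
    else:
        values = list(forced_references)
    return [value for value in values if not UNIPROT_ACCESSION.fullmatch(str(value))]


print(invalid_references(['Q9BYF1', 'P0DTC2']))
print(invalid_references({'A': 'Q9BYF1', 'B': 'noref'}))
print(invalid_references(['P0DTD1', 'not-an-id']))
```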

### Chainnames <a name="chainnames"></a>

Set the chain names<br />
These names are used to label chains in the web client

#### Example:

```python
{
    'A':'Protein',
    'B':'Protein',
    'G':'Glycans'
}
```

In [20]:
inputs['chainnames'] = {'A': 'Protease', 'B': 'Ligand'}

### Ligands <a name="ligands"></a>

Set all ligands in the simulation. Each ligand has the following attributes:
   - name: this is used as the ligand NGL representation label
   - ngl: the NGL selection used to represent the ligand in the NGL viewer in the client
   - drugbank (Optional): the DrugBank accession, which is used in the overview to make a link
   - chembl (Optional): the ChEMBL accession, which is used in the overview to make a link
    
At this moment, ligands are used nowhere in the workflow
    
NGL viewer selection:
http://nglviewer.org/ngl/api/manual/usage/selection-language.html

<span style="color:green">LORE: This field is the predecessor of the 'interactions' field. It is now deprecated, but we still maintain it because the information may be useful one day, it is already in the database, and the web client still relies on it to better represent small ligands.</span>
    
#### Example:

```python
{
    'name': 'Some compound',
    'ngl': ':L',
    'drugbank': 'DB00945',
    'chembl': 'CHEMBL25'
},
{
    'name': 'Some nucleic acid',
    'ngl': ':A or :B'
}
```


In [21]:
inputs['ligands'] = [
    {
        'name': 'Ligand',
        'ngl': ':B'
    }
]

### Membranes <a name="membranes"></a>

Set those elements which must be considered membrane<br/>
These elements will be represented in the web client with a specific style:
- Licorice
- Purple color
- Low opacity


A membrane is defined by a name and a selection in VMD selection language


VMD atom selection language:
https://www.ks.uiuc.edu/Research/vmd/vmd-1.3/ug/node132.html

<span style="color:green">LORE: This field is the predecessor of the 'pbc_selection' field. It is now deprecated, but we still maintain it because the information may be useful one day and it is already in the database.</span>

<span style="color:red">DANI: Although this field should no longer have any effect on the workflow, it still has one: the pockets analysis is not run if there are membranes</span><br/>
<span style="color:red">DANI: This is not really implemented, but it is good to have these values in the database for the day we get to it</span><br/>

#### Example:

```python
{
    'name': 'Cell membrane',
    'selection': 'chain M',
}
```

In [22]:
inputs['membranes'] = []

### Custom representations

The web client sets some default representations (ngl configurations) to highlight important features in the structure according to the topology reference or interactions. 

In addition, you may set extra custom representations for features of interest which deserve their own independent highlight. This has no effect on the workflow, but it is very visual in the client.<br />
<span style="color:red">WARNING: Make sure whatever you want to highlight is not already highlighted by default or it would be duplicated</span>

#### Example:

```python
[
    {
        "name" : "Remdesivir",
        "representations" : [
            {
                "name" : "Remdesivir",
                "selection" : "REM",
                "type" : "licorice"
            }
        ]
    }
]
```

In [23]:
inputs['customs'] = []

### Curated orientation

Set a specific starting orientation for the web client viewer.
Normally this is done once the simulation has been uploaded, since there is no easy way to get the orientation beforehand.

#### Example:

```python
[
    72.05997406618104,
    21.871748915422142,
    47.89720038949639,
    0,
    34.3234627961572,
    42.053333152877315,
    -70.84188126104011,
    0,
    -39.93012781662099,
    75.61943426331311,
    25.542927052994127,
    0,
    -63.015499114990234,
    -33.07249975204468,
    -39.439000606536865,
    1
]
```

In [24]:
inputs['orientation'] = None
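
When an orientation is pasted in after uploading, a quick shape check may catch copy errors. The sketch below assumes the orientation is a flat list of 16 numbers, as in the example above, which suggests a flattened 4x4 matrix. `check_orientation` and `in_groups_of_four` are hypothetical helpers, not part of the workflow.

```python
import math


def check_orientation(orientation):
    """Return True for None or for a flat sequence of 16 finite numbers."""
    if orientation is None:
        return True
    if not isinstance(orientation, (list, tuple)) or len(orientation) != 16:
        return False
    return all(
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and math.isfinite(value)
        for value in orientation
    )


def in_groups_of_four(orientation):
    """Split the 16 values into four groups of four for easier reading."""
    return [list(orientation[start:start + 4]) for start in range(0, 16, 4)]


# Values copied from the example above
example_orientation = [
    72.05997406618104, 21.871748915422142, 47.89720038949639, 0,
    34.3234627961572, 42.053333152877315, -70.84188126104011, 0,
    -39.93012781662099, 75.61943426331311, 25.542927052994127, 0,
    -63.015499114990234, -33.07249975204468, -39.439000606536865, 1,
]
print(check_orientation(example_orientation))
for group in in_groups_of_four(example_orientation):
    print(group)
```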

## Collections <a name="collections"></a>

Set which collection this simulation belongs to.<br />
Also set additional collection-related metadata values.<br />

Currently supported collections:
   - cv19
   - mcns

In [25]:
inputs['collections'] = ['cv19']

### BioExcel-CV19 specific metadata fields

Set which family this trajectory belongs to.

Supported units:
- RBD-ACE2
- RBD
- ACE2
- Spike
- 3CLpro
- PLpro
- Polymerase
- E protein
- Exoribonuclease
- Other

In [26]:
if 'cv19' in inputs['collections']:
    inputs['cv19_unit'] = '3CLpro'
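
A misspelled unit would be easy to miss, so the value can be compared against the supported list above. This is a minimal sketch: `check_cv19_unit` is a hypothetical helper, `example_inputs` stands in for the real `inputs` dictionary, and the workflow itself may or may not perform this validation.

```python
SUPPORTED_CV19_UNITS = [
    'RBD-ACE2', 'RBD', 'ACE2', 'Spike', '3CLpro', 'PLpro',
    'Polymerase', 'E protein', 'Exoribonuclease', 'Other',
]


def check_cv19_unit(example_inputs):
    """Return True when the unit is supported or the cv19 collection is not used."""
    collections = example_inputs.get('collections') or []
    if 'cv19' not in collections:
        return True
    return example_inputs.get('cv19_unit') in SUPPORTED_CV19_UNITS


print(check_cv19_unit({'collections': ['cv19'], 'cv19_unit': '3CLpro'}))
print(check_cv19_unit({'collections': ['cv19'], 'cv19_unit': '3clpro'}))
print(check_cv19_unit({'collections': []}))
```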

## Export

Finally, export everything to JSON format

In [27]:
import json

# Export it to json
inputs_filename = 'inputs.json'
with open(inputs_filename, 'w') as file:
    json.dump(inputs, file, indent=4)
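
To confirm that the exported file can be read back unchanged, a round trip can be run on a small sample. The sketch below writes to a temporary directory so that a real `inputs.json` is not overwritten; `example_inputs` is a hypothetical stand-in for the full `inputs` dictionary. Note that Python `None` values are written as JSON `null` and read back as `None`.

```python
import json
import os
import tempfile

# Hypothetical sample with the kinds of values used in this notebook
example_inputs = {
    'name': 'Example simulation',
    'length': None,
    'pdbIds': [],
    'chainnames': {'A': 'Protease', 'B': 'Ligand'},
}

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'inputs.json')
    # Write the sample exactly as the export cell does
    with open(path, 'w') as file:
        json.dump(example_inputs, file, indent=4)
    # Read it back to compare with the original dictionary
    with open(path) as file:
        reloaded = json.load(file)

print(reloaded == example_inputs)
```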