# Inputs setter

This notebook is a tool to manually set the workflow inputs, with descriptions and examples for each one. The values set here are then exported to a JSON file, which is required to run the MoDEL analyses workflow.

Inputs:
1. [Chainnames](#chainnames)
2. [Ligands](#ligands)
3. [Interactions](#interactions)
4. [Topology references](#toporefs)
5. [Metadata](#metadata)
6. [Simulation parameters](#simulation)

In [1]:
inputs = {}

### Chainnames <a name="chainnames"></a>

Set the chain names<br />
These names are used to label chains in the web client

#### Example:

```python
{
    'A':'Protein',
    'B':'Protein',
    'G':'Glycans'
}
```

In [5]:
inputs['chainnames'] = {
    'A':'Spike',
    'B':'Spike',
    'C':'Spike',
}

### Ligands <a name="ligands"></a>

Set all ligands in the simulation. Each ligand has the following attributes:
    - name: used as the chain name and as the label of the ligand NGL representation
    - ngl: NGL selection used to represent the ligand in the NGL viewer
    - prody: ProDy selection used in the workflow analyses where this ligand is involved
    - acpype: set to True if the ligand charges must be guessed by ACPYPE (e.g. compounds)
    - accession (Optional): the DrugBank accession, used in the overview to make a link
    (DANI: 'accession' should be renamed to 'drugbank')
    - chembl (Optional): the ChEMBL accession, used in the overview to make a link
    
NGL viewer selection:
http://nglviewer.org/ngl/api/manual/usage/selection-language.html    
ProDy selection:
http://prody.csb.pitt.edu/manual/reference/atomic/select.html
    
#### Example:

```python
{
    'name': 'Some compound',
    'ngl': ':L',
    'prody': 'chain L',
    'acpype': True,
    'accession': 'DB00945',
    'chembl': 'CHEMBL25',
}
{
    'name': 'Some nucleic acid',
    'ngl': ':A or :B',
    'prody': 'chain A or chain B',
    'acpype': False,
}
```


In [6]:
inputs['ligands'] = []
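
As an optional sanity check, each ligand could be verified against the attribute list above before export. The following is a minimal sketch and not part of the workflow: the helper name `check_ligand` and the split into required and optional keys are assumptions derived from the attribute descriptions in this notebook.

```python
# Hypothetical helper (not part of the MoDEL workflow)
# Assumption: 'name', 'ngl', 'prody' and 'acpype' are required,
# while 'accession' and 'chembl' are optional
REQUIRED_LIGAND_KEYS = {'name', 'ngl', 'prody', 'acpype'}
OPTIONAL_LIGAND_KEYS = {'accession', 'chembl'}

def check_ligand(ligand):
    """Return a list of problems found in a ligand dictionary (empty if none)."""
    problems = []
    keys = set(ligand)
    # Report required attributes which are missing
    for key in sorted(REQUIRED_LIGAND_KEYS - keys):
        problems.append(f"missing required key '{key}'")
    # Report attributes which are not described in this notebook
    for key in sorted(keys - REQUIRED_LIGAND_KEYS - OPTIONAL_LIGAND_KEYS):
        problems.append(f"unknown key '{key}'")
    # The acpype flag is expected to be a boolean
    if 'acpype' in ligand and not isinstance(ligand['acpype'], bool):
        problems.append("'acpype' must be True or False")
    return problems

example_ligand = {
    'name': 'Some compound',
    'ngl': ':L',
    'prody': 'chain L',
    'acpype': True,
    'accession': 'DB00945',
}
print(check_ligand(example_ligand))      # a complete ligand yields no problems
print(check_ligand({'name': 'Broken'}))  # an incomplete ligand lists the missing keys
```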

### Interactions <a name="interactions"></a>

[EXPERIMENTAL INPUT]

Set which are the interesting interactions to be analyzed<br />
A bunch of interaction-specific analyses will be run for each interaction and displayed in the web client

Interactions are defined by the 'agents' which are meant to interact pairwise. An 'agent' may be anything, even a group of unrelated molecules. The only condition is that agents must be defined by residues. The workflow finds which residues of each agent are close enough to the other agent to be considered interface residues. These are the residues considered in the interface analyses<br />

Interactions are uploaded to the database as part of the project metadata. They include the interaction name, the agent names, the residue selections of both whole agents and the residue selections of both agent interfaces<br />

Each interaction has the following attributes:
    - name: a string tag used to relate the interaction analyses data to their corresponding residues.
    In addition, the name is used to label the corresponding analyses in the web client
    - agent_1: the name of the first agent in the interaction, which is used as a label in the client
    - selection_1: the ProDy selection of the first agent in the interaction
    - agent_2: the name of the second agent in the interaction, which is used as a label in the client
    - selection_2: the ProDy selection of the second agent in the interaction
    
ProDy selection:
http://prody.csb.pitt.edu/manual/reference/atomic/select.html
    
#### Example:

```python
{
    'name':'protein-ligand interaction',
    'agent_1': 'protein',
    'selection_1': 'not resname lig',
    'agent_2': 'ligand',
    'selection_2': 'resname lig',
},
{
    'name':'domain-domain interaction',
    'agent_1': 'domain 1',
    'selection_1': 'resnum 2:291',
    'agent_2': 'domain 2',
    'selection_2': 'resnum 306:529',
},
```

In [7]:
inputs['interactions'] = [
    {
        'name':'RBD 1-RBD 2 interaction',
        'agent_1': 'RBD 1',
        'selection_1': 'resnum 306:529',
        'agent_2': 'RBD 2',
        'selection_2': 'resnum 1593:1816',
    },
    {
        'name':'RBD 2-RBD 3 interaction',
        'agent_1': 'RBD 2',
        'selection_1': 'resnum 1593:1816',
        'agent_2': 'RBD 3',
        'selection_2': 'resnum 2878:3101',
    },
    {
        'name':'RBD 3-RBD 1 interaction',
        'agent_1': 'RBD 3',
        'selection_1': 'resnum 2878:3101',
        'agent_2': 'RBD 1',
        'selection_2': 'resnum 306:529',
    },
    {
        'name':'NTD 1-RBD 2 interaction',
        'agent_1': 'NTD 1',
        'selection_1': 'resnum 2:291',
        'agent_2': 'RBD 2',
        'selection_2': 'resnum 1593:1816',
    },
    {
        'name':'NTD 2-RBD 3 interaction',
        'agent_1': 'NTD 2',
        'selection_1': 'resnum 1289:1578',
        'agent_2': 'RBD 3',
        'selection_2': 'resnum 2878:3101',
    },
    {
        'name':'NTD 3-RBD 1 interaction',
        'agent_1': 'NTD 3',
        'selection_1': 'resnum 2574:2863',
        'agent_2': 'RBD 1',
        'selection_2': 'resnum 306:529',
    },
]
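
Since the same residue ranges are repeated across several interactions, the list could also be generated from a table of named regions. This is only a sketch: the helper `make_interaction` and the `regions` and `pairs` variables are hypothetical names introduced here, while the residue ranges and the naming pattern are copied from the cell above.

```python
# Hypothetical helper (not part of the MoDEL workflow)
def make_interaction(agent_1, range_1, agent_2, range_2):
    """Build one interaction dictionary using the 'resnum start:stop' pattern."""
    return {
        'name': f'{agent_1}-{agent_2} interaction',
        'agent_1': agent_1,
        'selection_1': f'resnum {range_1[0]}:{range_1[1]}',
        'agent_2': agent_2,
        'selection_2': f'resnum {range_2[0]}:{range_2[1]}',
    }

# Residue ranges copied from the interactions defined in this notebook
regions = {
    'RBD 1': (306, 529),
    'RBD 2': (1593, 1816),
    'RBD 3': (2878, 3101),
    'NTD 1': (2, 291),
    'NTD 2': (1289, 1578),
    'NTD 3': (2574, 2863),
}

# Pairs of agents to be analyzed, in the same order as above
pairs = [
    ('RBD 1', 'RBD 2'),
    ('RBD 2', 'RBD 3'),
    ('RBD 3', 'RBD 1'),
    ('NTD 1', 'RBD 2'),
    ('NTD 2', 'RBD 3'),
    ('NTD 3', 'RBD 1'),
]

interactions = [
    make_interaction(first, regions[first], second, regions[second])
    for first, second in pairs
]
print(interactions[0])
```

Keeping each range in a single place avoids mismatches when a domain boundary has to be corrected.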

### Topology references <a name="toporefs"></a>

[EXPERIMENTAL INPUT]

Set the protein references (if any) according to other standard databases (e.g. UniProt, NCBI).

These references include the amino acid sequence, which allows residues to be aligned in order to find mutations, deletions and other possible modifications. In addition, they are used to correctly represent the epitope and mutation analyses.

These references have been added to the workflow and database manually. Check the workflow resources to see which references are available. Use the reference names to include them, e.g. 'SARS-CoV-2 spike'.

In [None]:
inputs['toporefs'] = ['SARS-CoV-2 spike']

In addition, you may set extra custom default representations (NGL configurations) for regions of interest which need an independent highlight system. This has no effect on the workflow, but it makes these regions easy to spot in the client.<br />
<span style="color:red">WARNING: Make sure whatever you want to highlight is not already highlighted by the topology reference</span>

In [8]:
inputs['customs'] = []

## Metadata <a name="metadata"></a>

The following metadata have no effect on the workflow itself, but they are written to the metadata file. These values are uploaded to the database and then shown in the web client.

Set the family this trajectory belongs to. For example, in the Bioexcel CV19 context, families are 'ACE2', 'Spike', 'RBD', etc.

In [9]:
inputs['unit'] = 'Spike'

<span style="color:red">DANI: This is completely obsolete</span><br />
<span style="color:red">DANI: When we work with membrane simulations again, we should consider a new system similar to 'ligands' but for membranes</span><br />

Set the membrane type as the name of the membrane residues in the pdb file<br />
Set the membrane as 'No' if there is no membrane<br />

In [10]:
inputs['membrane'] = 'No'

Set the source pdb of the trajectory structure<br />
Additional data from the pdb is harvested by the loader while uploading to the database<br />
This data is displayed in the overview page

In [11]:
inputs['pdbId'] = '6ACC'

Write a brief description or title for this trajectory for the overview page<br />
This name may be used by the client to search the trajectory in the database<br />
The name is displayed in the overview page

In [12]:
inputs['name'] = ('SARS-CoV2 spike glycoprotein homotrimeric head-in a closed conformation (1μs)')

Write additional comments<br />
The description is displayed in the overview page

In [13]:
inputs['description'] = ("1μs simulation trajectory of the spike glycoprotein head based on the structure released on the SwissModel website to which 18 N-glycans were added on each subunit based on the experimentally determined glycomic profile. The C- and N-peptide termini are capped with amide and acetyl groups, respectively. The simulation was carried out using the Amber ff14SB force field and GLYCAM_06j for the protein and N-glycans, respectively. The periodic-cubic water box was solvated with TIP3P water molecules. The system was neutralized with NaCl to attain an ionic concentration of 0.15M. The simulations were conducted at 310 K in the NPT ensemble.")

Write author names<br />
Authors are displayed in the overview page

In [14]:
inputs['authors'] = ('Giulia Paiardi, Stefan Richter, Marco Rusnati, Rebecca Wade')

Write author group name/s<br />
The group is displayed in the overview page

In [15]:
inputs['groups'] = ('Molecular and Cellular Modeling Group, Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany. Experimental Oncology and Immunology, Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy.')

How to contact the authors. The contact is displayed in the overview page

In [16]:
inputs['contact'] = 'For further informations or any queries, please contact mcmsoft@h-its.org'

Name and version of the program (software) used to run the trajectory<br />
Program and version are both displayed in the overview page

In [17]:
inputs['program'] = 'Amber20'
inputs['version'] = '2020'

License and link to the license web page<br />
The license is displayed in the overview page. Under the license there is a 'More information' button. The link is used to redirect the user when the button is clicked

In [18]:
inputs['license'] = ("This trajectory dataset is released under a Creative Commons "
           "Attribution 4.0 International Public License")
inputs['linkcense'] = "https://creativecommons.org/licenses/by/4.0/"

Citation for referring to this simulation. The citation is displayed in the overview page<br />
To set a citation use the following instructions:<br />
To add a line break type '(br)' inside the citation string<br />
To add superscript text type '^' before each character

In [19]:
inputs['citation'] = 'Mechanism of inhibition of Sars-CoV2 infection by the interaction of the spike glycoprotein with heparin.(br)Giulia Paiardi, Stefan Richter, Marco Rusnati, Rebecca C. Wade. In preparation.'
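
To illustrate the two markup rules above, the sketch below converts a citation string to HTML. How the web client actually parses the citation is not shown in this notebook, so the function `render_citation` and its HTML output are assumptions made only for illustration.

```python
import re

# Hypothetical function: the real client may render the citation differently
def render_citation(citation):
    """Turn the citation markup into HTML (illustrative only)."""
    # '^' followed by a single character becomes superscript
    html = re.sub(r'\^(.)', r'<sup>\1</sup>', citation)
    # '(br)' becomes a line break
    return html.replace('(br)', '<br />')

print(render_citation('First line.(br)Second line, 10^-^9 m'))
```

Note that every superscript character needs its own '^', so a two-character exponent is written as '^-^9'.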

Acknowledgements to be shown in the overview page

In [20]:
inputs['thanks'] = 'We gratefully acknowledge PRACE for awarding us access to Marconi100 based in Italy at CINECA (Project COVID19-54) to generate these trajectories. The technical support of Alessandro Grottesi from CINECA (Italy) and Filippo Spiga from NVIDIA is gratefully acknowledged. We thank the Klaus Tschira Foundation for support. G.P. was supported by Erasmus+, an EMBO short-term fellowship (STF_8594) and The Guido Berlucchi foundation young researchers mobility program.'

## Simulation parameters <a name="simulation"></a>

These inputs may be automatically mined from the topology and trajectory files<br />
However, they may also be set explicitly here<br />
<span style="color:red">DANI: None of this is true. Metadata mining for these values almost never works</span><br />
<span style="color:red">DANI: I stopped maintaining it a long time ago, so all values must be set by hand</span>

Length is an important value since it is used in many graph axes in the web client

In [21]:
inputs['length'] = 1000 # In nanoseconds (ns)

The remaining values are displayed in the web client as trajectory metadata<br />
These values do not affect other outcomes

In [22]:
inputs['temp'] = 310 # In Kelvin (K)
inputs['ensemble'] = 'NPT' # e.g. NVT, NPT, etc.
inputs['timestep'] = None # In fs (fs/step)
inputs['pcoupling'] = None # e.g. Isotropic
inputs['ff'] = 'ff14SB GLYCAM_06j' # Force fields
inputs['wat'] = 'TIP3P' # Water force field
inputs['boxtype'] = 'Triclinic' # e.g. Triclinic
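
Since these values have to be set by hand, it may help to list the ones still left as `None` before exporting. This is a minimal sketch: the names `simulation_keys` and `example_inputs` are introduced here for illustration, and the values are copied from the cells above.

```python
# Keys of the simulation parameters set in this section
simulation_keys = ['length', 'temp', 'ensemble', 'timestep',
                   'pcoupling', 'ff', 'wat', 'boxtype']

# Values copied from the cells above
example_inputs = {
    'length': 1000,
    'temp': 310,
    'ensemble': 'NPT',
    'timestep': None,
    'pcoupling': None,
    'ff': 'ff14SB GLYCAM_06j',
    'wat': 'TIP3P',
    'boxtype': 'Triclinic',
}

# Collect the parameters which are missing or still set to None
unset = [key for key in simulation_keys if example_inputs.get(key) is None]
print(unset)  # ['timestep', 'pcoupling']
```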

## Export

Finally, export everything to JSON format

In [23]:
import json

# Export it to json
inputs_filename = 'inputs.json'
with open(inputs_filename, 'w') as file:
    json.dump(inputs, file, indent=4)
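
To confirm that the exported file can be read back unchanged, a round-trip check along these lines could be run. It is a self-contained sketch that writes a small example dictionary to a temporary directory instead of the real `inputs.json`; the example values are placeholders.

```python
import json
import os
import tempfile

# Placeholder values, including a non-ASCII character and a None
example_inputs = {
    'name': 'Example trajectory (1μs)',
    'length': 1000,
    'timestep': None,
}

# Write to a temporary location so the real inputs file is not overwritten
path = os.path.join(tempfile.mkdtemp(), 'inputs.json')
with open(path, 'w') as file:
    json.dump(example_inputs, file, indent=4)

# Read the file back and compare it with the original dictionary
with open(path) as file:
    reloaded = json.load(file)

# None is written as null and read back as None
# Non-ASCII characters are escaped by default and restored on load
print(reloaded == example_inputs)
```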