## 01 Conversion

This tutorial is an example workflow for converting between molecular file formats using BioSimSpace. We will also show you how to work using three different approaches: the Jupyter notebook, and the command line using either node objects or plain Python.

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" title='This work is licensed under a Creative Commons Attribution 4.0 International License.' align="right"/></a>

Authors: 
 
- [Lester Hedges -- @lohedges](https://github.com/lohedges)
- [Sofia Bariami -- @SofiaBariami](https://github.com/SofiaBariami)

## Learning objectives:
- Be able to convert between different molecular file formats using BioSimSpace
- Learn how to use the interactive computing environment of Jupyter notebooks
- Learn how to use node objects to create Python scripts



**Reading time**:
~ 20 mins

**Jupyter cheat sheet**:
- to run the currently highlighted cell, hold <kbd>&#x21E7; Shift</kbd> and press <kbd>&#x23ce; Enter</kbd>;
- to get help for a specific function, place the cursor within the function's brackets, hold <kbd>&#x21E7; Shift</kbd>, and press <kbd>&#x21E5; Tab</kbd>;

### Let's start with the necessary imports

In [2]:
import BioSimSpace as BSS




Sending anonymous Sire usage statistics to http://siremol.org.
For more information, see http://siremol.org/analytics
To disable, set the environment variable 'SIRE_DONT_PHONEHOME' to 1
To see the information sent, set the environment variable 
SIRE_VERBOSE_PHONEHOME equal to 1. To silence this message, set
the environment variable SIRE_SILENT_PHONEHOME to 1.



We begin by creating a Node object. The node is the core of our molecular workflow component: it describes what the component does, what input it needs, and what output it produces.

In [14]:
node = BSS.Gateway.Node("A node to convert between molecular file formats.")

Nodes require inputs. To specify them we use the Gateway module, which acts as a bridge between BioSimSpace and the outside world. It lets us document each input, define its type, and specify any constraints on its allowed values. Here we need a set of files that define the molecular system and a string that specifies the format to convert to. The string can only be one of a set of allowed values, so we query the `BSS.IO` package directly to generate the dropdown of allowed options. (If you don't know what these formats are, run, e.g., `BSS.IO.formatInfo("GroTop")` for a description.)

In [20]:
node.addInput("files", BSS.Gateway.FileSet(help="A set of molecular input files."))
node.addInput("format", BSS.Gateway.String(help="The molecular file format to convert to.", 
                                           allowed=BSS.IO.fileFormats(), default="PDB"))
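To make the idea of a constrained input more concrete, the following is a minimal pure-Python sketch of how a string requirement with a set of allowed values and a default might behave. The `StringRequirement` class, its method names, and the two-format list are hypothetical illustrations chosen for this example; they are not the BioSimSpace implementation, and the real list of formats comes from `BSS.IO.fileFormats()`.

```python
class StringRequirement:
    """Hypothetical stand-in for a constrained string input (not BioSimSpace code)."""

    def __init__(self, help, allowed=None, default=None):
        self.help = help
        self.allowed = list(allowed) if allowed is not None else None
        self._value = None
        # Pass the default through the same checks as a user-supplied value.
        if default is not None:
            self.set_value(default)

    def set_value(self, value):
        """Store the value after checking its type and the allowed set."""
        if not isinstance(value, str):
            raise TypeError("The value must be a string.")
        if self.allowed is not None and value not in self.allowed:
            raise ValueError(f"'{value}' is not one of the allowed values: {self.allowed}")
        self._value = value

    def get_value(self):
        """Return the stored value, or raise if nothing has been set."""
        if self._value is None:
            raise ValueError("Value is unset!")
        return self._value


# Illustrative subset of formats only; assumed here for the sake of the example.
fmt = StringRequirement(
    help="The molecular file format to convert to.",
    allowed=["PDB", "GroTop"],
    default="PDB",
)
print(fmt.get_value())
```

Because the default goes through the same validation as any other value, a requirement built this way can never hold a value outside its allowed set.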

We now need to define the output of the node. In this case we will return a file containing the converted molecular system.

In [21]:
node.addOutput("converted", BSS.Gateway.File(help="The converted molecular system."))



When working interactively within a Jupyter notebook we need a way to allow users to set the input requirements. The `node.showControls` method will display a graphical user interface (GUI), from which inputs can be set. Note that the GUI requires active user input. All input requirements that don't have a default value _must_ be set before the node can proceed. If you query the node for a value that has not yet been set, an error will be raised. Use the dropdown button to choose a file format from the list of allowed values.

When working interactively you will typically be running on a remote server where you won't have access to the local filesystem. In this case you'll need to upload files for any of the `File` or `FileSet` input requirements. The GUI below will provide buttons that allow you to browse your own filesystem and select files. Since Jupyter has a limit of 5MB for file transfers, we provide support for compressed formats, such as `.zip` or `.tar.gz`. (A single archive can contain a set of files, allowing you to set a single value for a `FileSet` requirement.) We've provided some example input files that can be used in the training notebooks, which are available to download from the links below. These can then be re-uploaded using the GUI.

AMBER: [ala.crd](https://raw.githubusercontent.com/michellab/BioSimSpace/devel/demo/amber/ala/ala.crd), [ala.top](https://raw.githubusercontent.com/michellab/BioSimSpace/devel/demo/amber/ala/ala.top)

GROMACS: [kigaki.gro](https://raw.githubusercontent.com/michellab/BioSimSpace/devel/demo/gromacs/kigaki/kigaki.gro), [kigaki.top](https://raw.githubusercontent.com/michellab/BioSimSpace/devel/demo/gromacs/kigaki/kigaki.top)

NAMD: [alanin.pdb](https://raw.githubusercontent.com/michellab/BioSimSpace/devel/demo/namd/alanin/alanin.pdb), [alanin.psf](https://raw.githubusercontent.com/michellab/BioSimSpace/devel/demo/namd/alanin/alanin.psf), [alanin.params](https://raw.githubusercontent.com/michellab/BioSimSpace/devel/demo/namd/alanin/alanin.params)

When uploading files the name of the current file will replace the `Browse` button. If you need to change the file, simply click on the button again and choose a new file. For `FileSet` requirements, a new `Browse` button will appear whenever an additional file is uploaded.
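Since a single archive can stand in for a whole `FileSet`, it can be convenient to bundle the input files before uploading them. The sketch below uses only the Python standard library. The `bundle_inputs` helper and the placeholder file contents are hypothetical, and the byte value used for the 5MB limit is an assumption about how that limit is counted.

```python
import os
import tempfile
import zipfile

# Assumed byte value for the 5MB transfer limit mentioned above.
UPLOAD_LIMIT_BYTES = 5 * 1024 * 1024


def bundle_inputs(paths, archive_path):
    """Write the given files into one compressed zip archive and return its size in bytes."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            # Store each file under its base name so the archive is flat.
            archive.write(path, arcname=os.path.basename(path))
    return os.path.getsize(archive_path)


# Demonstration with placeholder files; replace these with your real inputs.
workdir = tempfile.mkdtemp()
inputs = []
for name in ("ala.crd", "ala.top"):
    path = os.path.join(workdir, name)
    with open(path, "w") as handle:
        handle.write("placeholder contents\n")
    inputs.append(path)

archive = os.path.join(workdir, "ala.zip")
size = bundle_inputs(inputs, archive)
print(f"{archive}: {size} bytes, within upload limit: {size <= UPLOAD_LIMIT_BYTES}")
```

The resulting `ala.zip` could then be selected with a single `Browse` button instead of uploading each file separately.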

In [22]:
node.showControls()


Once all requirements are set we can access the values using the `node.getInput` method. The first time this is called, the node will automatically validate all of the input and report any errors to the user.

Now it is time to create a molecular system using the input files that we uploaded. Note that we don't specify the format of the files, since this is automatically determined by BioSimSpace. BioSimSpace has support for a wide range of formats and can convert between certain formats too.

In [7]:
system = BSS.IO.readMolecules(node.getInput("files"))

ValueError: Value is unset for requirement 'files'!

Next we write the system to the chosen molecular file format and set this as the output variable of the node. This is done using the `BioSimSpace.IO` package. If the format conversion is not possible, an error will be raised.
Note that the `saveMolecules` function returns a list containing the names of the files that were written. Since there is only one file, we take the first item from the list (`[0]`).

In [24]:
file_format = node.getInput("format")
node.setOutput("converted", BSS.IO.saveMolecules("converted", system, file_format)[0])

ValueError: Value is unset for requirement 'files'!

Finally, we validate that the node completed successfully. This checks that all output requirements are satisfied and that no errors were raised by the user. Note that the validation will fail until the cell above has finished running. Any file outputs will be available for the user to download as a compressed archive.



In [26]:
node.validate()

Missing output for requirement 'converted'


SystemExit: Node failed!



Running commands in Jupyter on your local machine is straightforward and self-contained, but sometimes more computing resources are needed. This might mean hosting your work on a remote computer and running all of the above commands as a script. Luckily, the node can be downloaded as a regular Python script that can be run from the command line. Once you are satisfied with the node and its output, click on 'File/Download As/Python' to get the script.
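As a rough sketch of what running the downloaded script might look like, the snippet below assembles a command line for it. It assumes that the script was saved as `conversion.py` (a hypothetical name) and that each input requirement is exposed as a command-line option named after it, i.e. `--files` and `--format`; check the script's own help output to confirm the options it actually accepts. The `build_node_command` helper is likewise only an illustration.

```python
import shlex
import sys


def build_node_command(script, files, file_format):
    """Return the argument list for running a node script non-interactively."""
    # Option names are assumed to mirror the node's input requirement names.
    return [sys.executable, script, "--files", *files, "--format", file_format]


command = build_node_command("conversion.py", ["ala.crd", "ala.top"], "PDB")

# Print the command in a form that can be pasted into a shell.
print(shlex.join(command))

# To actually run it from Python, one could use:
#   import subprocess
#   subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.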