# `standardiser`

This is the first publicly released version of a tool designed to provide a simple way of standardising molecules as a prelude to _e.g._ molecular modelling exercises. A Python module is provided that performs the complete standardisation procedure; in addition, the modules that implement the individual steps may be used separately if required, perhaps as part of a custom standardisation pipeline.

The tool is open-source and is available from [GitHub](https://github.com/flatkinson/standardiser).

A slide-set describing some of the background to the project is shown below...


In [1]:
import IPython; IPython.display.HTML('<iframe src="standardiser.pdf" width="800" height="600"></iframe>')

In summary, the general procedure for standardising a molecule (with the documentation for the appropriate module linked) is...

* Break bonds to Group I or II metals [[**`break_bonds`**](01_break_bonds.ipynb)]

* Neutralise charges by adding/removing protons [[**`neutralise`**](02_neutralise.ipynb)]

* Apply standardisation rules [[**`rules`**](03_rules.ipynb)]

* Re-run neutralisation (in case any charges are exposed by rules)

* Discard any salt/solvate components [[**`unsalt`**](04_unsalt.ipynb)]

* Return standardised parent

The complete procedure is implemented by the [**`standardise`**](05_standardise.ipynb) module; a bare-bones alternative workflow using the individual modules is shown [here](06_alternative.ipynb).
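To make the ordering of the steps concrete, here is a minimal sketch of the procedure as a chain of functions. It is an illustration only: every function below is a hypothetical stand-in that manipulates SMILES strings as plain text, and none of the names or signatures are taken from the `standardiser` modules themselves (which work on RDKit molecules). See the linked notebooks for the real interfaces.

```python
# Toy salt/solvate dictionary; the real one is far larger (assumption).
TOY_SALT_COMPONENTS = {"[Na+]", "[K+]", "[Cl-]"}


def break_bonds(smiles: str) -> str:
    """Toy step: turn a covalently drawn O-Na bond into an ion pair."""
    return smiles.replace("O[Na]", "[O-].[Na+]")


def neutralise(smiles: str) -> str:
    """Toy step: protonate a carboxylate anion."""
    return smiles.replace("C(=O)[O-]", "C(=O)O")


def apply_rules(smiles: str) -> str:
    """Toy step: one hypothetical rule, charge-separated sulfoxide to S=O."""
    return smiles.replace("[S+](C)[O-]", "S(C)=O")


def unsalt(smiles: str) -> str:
    """Toy step: drop dot-separated components found in the salt dictionary."""
    kept = [c for c in smiles.split(".") if c not in TOY_SALT_COMPONENTS]
    return ".".join(kept)


def standardise(smiles: str) -> str:
    """Run the toy steps in the order given in the list above."""
    # Neutralisation appears twice: once before and once after the rules.
    for step in (break_bonds, neutralise, apply_rules, neutralise, unsalt):
        smiles = step(smiles)
    return smiles


if __name__ == "__main__":
    for example in ("CC(=O)O[Na]", "CC(=O)[O-].[K+]", "C[S+](C)[O-]"):
        print(example, "->", standardise(example))
```

Keeping each step as a function that takes a structure and returns a structure is what allows steps to be reordered, dropped or replaced in a custom pipeline.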

The documentation is contained in the project **`docs/`** directory, and consists of a set of [IPython Notebooks](http://ipython.org/notebook.html), which can be viewed (and run and edited) by starting a notebook server in that directory. Alternatively, the notebooks have been exported as HTML pages, which can be viewed by pointing a browser at the **`docs/html/`** directory.

A simple command-line driver program **`standardiser.py`** is available in the project **`bin/`** directory. It takes SD or SMILES as input, and writes out one file containing those structures that have been successfully standardised and another containing those for which the procedure has failed.
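The pass/fail split performed by the driver can be sketched as follows. This is not the code in **`standardiser.py`**: the exception class, the toy standardisation function and the assumption that failures are signalled by raising an exception are all hypothetical, chosen only to show the general shape of such a driver.

```python
from typing import Callable, Iterable, List, Tuple


class StandardisationError(Exception):
    """Hypothetical failure signal; the real tool may report errors differently."""


# Toy salt dictionary, for illustration only.
TOY_SALTS = {"[Na+]", "[Cl-]"}


def toy_standardise(smiles: str) -> str:
    """Toy stand-in: strip known salts and insist on exactly one parent."""
    parents = [c for c in smiles.split(".") if c and c not in TOY_SALTS]
    if len(parents) != 1:
        raise StandardisationError(
            f"expected 1 parent component, found {len(parents)}"
        )
    return parents[0]


def partition(
    inputs: Iterable[str],
    standardise: Callable[[str], str] = toy_standardise,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Return (passed, failed): passed pairs each input with its parent,
    failed pairs each input with the reason it was rejected."""
    passed: List[Tuple[str, str]] = []
    failed: List[Tuple[str, str]] = []
    for smiles in inputs:
        try:
            passed.append((smiles, standardise(smiles)))
        except StandardisationError as err:
            failed.append((smiles, str(err)))
    return passed, failed


if __name__ == "__main__":
    ok, bad = partition(["CC[NH3+].[Cl-]", "[Na+].[Cl-]", "CCO.CCN"])
    print("passed:", ok)
    print("failed:", bad)
```

In a real driver the two lists would be written to separate output files rather than printed.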

In the project **`test/`** directory are examples of running **`standardiser.py`** on structures taken from the PubChem and EPA ACToR databases.

### Further work

* Tidy up the code


* Proper installer
  
    
* Proper documentation
    - Hopefully the notebooks serve to show how things work, but more Pythonic documentation is still needed


* Improve the rule set, neutralisation algorithm and salt dictionary


* Better (optional) logging of what each module has done to a molecule
    - Verbose logging may be turned on, but this output can't (easily) be used programmatically
    - The **`rules`** module can return a list of which rules have been applied
    - Other modules cannot do anything equivalent as yet

### Acknowledgements

* This work was funded by the <a target="_blank" href="http://www.imi.europa.eu/content/etox">IMI eTOX</a> project.


* The salt dictionary used is based on that used in the ChEMBL database; this was compiled by L.J. Bellis, A. Hersey and others and was in turn based on that used in [USAN](http://www.ama-assn.org//ama/pub/physician-resources/medical-science/united-states-adopted-names-council/naming-guidelines/organic-radicals-counterions-solvent-molecules-used.page) nomenclature.


* Some of the standardisation rules were inspired by those used in the [InChI](http://www.inchi-trust.org/home) software.


* This project is built using the [RDKit](http://www.rdkit.org/) chemistry toolkit. 

<p><a name="licensing"></a></p>
### Licensing

This code is released under the [Apache 2.0](http://opensource.org/licenses/apache2.0.php) license. Copyright [2014] is retained by the [EMBL-EBI](http://www.ebi.ac.uk).

### Contact details

Please send bug reports and suggestions for improvements to <a href="mailto:francis@ebi.ac.uk">Francis Atkinson</a>.