Skip to content

PathwayCommons/drugbank-to-biopax

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

drugbank-to-biopax

Build with Maven

Originated from https://bitbucket.org/armish/gsoc14 and will continue here (ToDo).

DrugBank to BioPAX Level3 data converter.

Most of the drug-target interactions in DrugBank do not contain detailed process information—i.e. how the drug actually potentiates/inhibits the protein. Although their XML data export is easy to parse, conversion of this to BioPAX will require discussion with Pathway Commons and deciding on the best way to represent these interactions.

Data source

Implementation details

DrugBank XML file contains sufficient information to create SmallMolecule and Protein instances with proper external identification to them. There are multiple types of relationships between SmallMolecules and Proteins presented in this database: Transporters, Targets, Enzymes. For the purpose of this project, we are interested in capturing drug-target relationships, hence we parse only this information and convert it to BioPAX BiochemicalReactions. These binary relationships do not contain structured mechanism information, e.g. how a drug inhibits or potentiates a target. We have decided to encode this information in BioPAX by using SequenceModificationFeatures with active and inactive terms and associating these features with target proteins. Therefore, the final BioPAX model contains BiochemicalReactions where drugs regulate the reaction and the participant protein either gets activated. The type of regulation is based on the controlled vocabulary adopted by DrugBank. If the interaction type is one of the following, then we represent that as a negative regulation, meaning that drug inhibits the inactivation of the protein (double negative): substrate, agonist, inducer, potentiator, stimulator, cofactor or ligand. Otherwise, the drug positively regulates the inactivation reaction.

The following screenshot, for example, shows positive (green edges) and negative (red edges) regulations by some drugs:

https://bitbucket.org/armish/gsoc14/downloads/goal3_drugbank_screenshot_20140730.jpg

For some drugs, the XML file also contains information about how the drug is metabolized by various enzymes, but we currently do not capture this in the final model. The reason we are not doing this is partly due to incomplete knowledge (especially regarding the intermediate chemicals) and partly due to the fact that SMPDB knowledgebase already has these in BioPAX.

Usage

Check out (git clone) and change to:

$ cd drugbank-to-biopax

build with Maven:

$ mvn clean package

This will create a single executable JAR file under the target/ directory, with the following file name: drugbank-to-biopax.jar. Once you have the single JAR file, you can try to run without any command line options to see the help text:

$ java -jar drugbank-to-biopax.jar 
usage: DrugbankToBiopax
 -d,--drugs <arg>    structured DrugBank data (XML) [required]
 -o,--output <arg>   Output (BioPAX file) [required]

The only input file required for this conversion is the XML file that is distributed by DrugBank. Once downloaded, you can then convert the model as follows:

$ java -jar drugbank-to-biopax.jar -d drugbank.xml -o drugbank.owl

Validation

The (OLD) validation report is available under the Downloads: goal3_drugbank_validationResults_20140730.zip. The converted model does not have errors, but it produces warnings for few known cases.

The first noticeable problem has to do with the active and inactive terms of the modification features. These terms are not registered in Miriam, but they have been used in BioPAX models, especially in NCI-PID. Ideally instead of these terms, we encode the activation reaction with all mechanistic details; but since this information is not available, we have to ignore these warnings for the time being.

Another problem that validator captures has to do with the external links. During conversion, we create RelationshipXrefs for SmallMolecules as they are provided by DrugBank. The database names for some of these links are not registered in Miriam, hence we get unknown.db warnings in the validation report. One way to deal with this is to manually fix the database names to match them with the existing Miriam identifiers, but ideally we will work with DrugBank to get these names standardized.

Releases

No releases published

Packages

No packages published

Languages