# DrugBank Demo

![DrugBank](./figures/drugbank.jpg)

This demo shows how to access the open DrugBank dataset, which contains identifiers and names that can be used to integrate DrugBank with other data resources.

## Reference
 
Wishart DS, et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2017 Nov 8. [doi:10.1093/nar/gkx1037](https://dx.doi.org/10.1093/nar/gkx1037)

## Imports

In [1]:
from pyspark.sql import SparkSession
from mmtfPyspark.datasets import drugBankDataset

## Configure Spark

In [2]:
spark = SparkSession.builder.appName("DrugBankDemo").getOrCreate()

## Download open DrugBank dataset

In [3]:
openDrugLinks = drugBankDataset.get_open_drug_links()

openDrugLinks.columns

['DrugBankID',
 'AccessionNumbers',
 'Commonname',
 'CAS',
 'UNII',
 'Synonyms',
 'StandardInChIKey']
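
Identifier columns such as `CAS`, `UNII`, and `StandardInChIKey` are what allow these records to be matched against other resources. As a minimal, pure-Python sketch of that idea, the following joins two DrugBank rows (values taken from the sample output shown later in this demo) with a second table keyed by CAS number. The second table, its `note` field, and the helper `join_on_cas` are made up for illustration and are not part of mmtfPyspark.

```python
def join_on_cas(drug_rows, other_rows):
    """Inner-join two lists of dicts on their shared "CAS" field."""
    other_by_cas = {row["CAS"]: row for row in other_rows}
    joined = []
    for drug in drug_rows:
        match = other_by_cas.get(drug["CAS"])
        if match is not None:
            # Merge the two records; keys from the DrugBank row win on conflict.
            joined.append({**match, **drug})
    return joined


drug_rows = [
    {"DrugBankID": "DB00006", "Commonname": "Bivalirudin", "CAS": "128270-60-0"},
    {"DrugBankID": "DB00007", "Commonname": "Leuprolide", "CAS": "53714-56-0"},
]
# Hypothetical external resource; the "note" value is a placeholder.
other_rows = [{"CAS": "128270-60-0", "note": "example annotation"}]

for row in join_on_cas(drug_rows, other_rows):
    print(row["DrugBankID"], row["Commonname"], row["note"])
```

With Spark DataFrames, the analogous operation is `DataFrame.join` on the shared identifier column.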

## Find all drugs with an InChIKey

In [4]:
openDrugLinks = openDrugLinks.filter("StandardInChIKey IS NOT NULL")
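
A non-null check does not guarantee that every remaining value is a well-formed key. As a minimal sketch, assuming the usual InChIKey layout of 14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and one final uppercase letter (27 characters in total), the format can be checked with a regular expression. The helper name `has_inchikey_layout` is hypothetical and not part of mmtfPyspark.

```python
import re

# InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def has_inchikey_layout(value):
    """Return True if value is a string in the 27-character InChIKey layout."""
    return isinstance(value, str) and INCHIKEY_PATTERN.fullmatch(value) is not None


# Aspirin's InChIKey is well formed; a value truncated for display is not.
print(has_inchikey_layout("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
print(has_inchikey_layout("OIRCOABEOLEUMC-GE..."))
print(has_inchikey_layout(None))
```

With PySpark, an anchored version of this pattern (wrapped in `^` and `$`) could be passed to `Column.rlike` to apply the same check to the DataFrame directly.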

## Show some sample data

In [5]:
openDrugLinks.select("DrugBankID", "Commonname", "CAS", "StandardInChIKey").show()

+----------+--------------------+-----------+--------------------+
|DrugBankID|          Commonname|        CAS|    StandardInChIKey|
+----------+--------------------+-----------+--------------------+
|   DB00006|         Bivalirudin|128270-60-0|OIRCOABEOLEUMC-GE...|
|   DB00007|          Leuprolide| 53714-56-0|GFIJNRVAKGFPGQ-LI...|
|   DB00014|           Goserelin| 65807-02-5|BLCLNMBMMGCOAS-UR...|
|   DB00027|        Gramicidin D|  1405-97-6|NDAYQJDHGXTBJL-MW...|
|   DB00035|        Desmopressin| 16679-58-6|NFLWUMRGJYTJIN-PN...|
|   DB00050|          Cetrorelix|120287-85-6|SBNPWPIBESPSIF-MH...|
|   DB00067|         Vasopressin| 11000-17-2|JLTCWSBVQSZVLT-UH...|
|   DB00080|          Daptomycin|103060-53-3|DOAKLVKFURWEDJ-QC...|
|   DB00091|        Cyclosporine| 59865-13-3|PMATZTZNYRCHOR-CG...|
|   DB00093|         Felypressin|    56-59-7|SFKQVVDKFKYTNA-DZ...|
|   DB00104|          Octreotide| 83150-76-9|DEQANNDTNATYII-OU...|
|   DB00106|            Abarelix|183552-38-7|AIWRTTMUVOZGPW-HS...|

## Download DrugBank dataset for approved drugs

The password-protected DrugBank datasets contain more information.
To access them, you need to create a DrugBank account and supply your username and password.

[Create DrugBank account](https://www.drugbank.ca/public_users/sign_up)

In [6]:
# username = "<your DrugBank account username>"
# password = "<your DrugBank account password>"
# drugLinks = drugBankDataset.get_drug_links("APPROVED", username, password)
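
Rather than typing credentials into the notebook, they can be read from the environment. The following is a minimal sketch, assuming the credentials are stored in environment variables named `DRUGBANK_USERNAME` and `DRUGBANK_PASSWORD`. These variable names and the helper `load_drugbank_credentials` are hypothetical and are not defined by DrugBank or mmtfPyspark.

```python
import os


def load_drugbank_credentials(env=None):
    """Return (username, password) read from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    # Hypothetical variable names; adjust them to your own setup.
    username = env.get("DRUGBANK_USERNAME")
    password = env.get("DRUGBANK_PASSWORD")
    if not username or not password:
        raise ValueError(
            "Set DRUGBANK_USERNAME and DRUGBANK_PASSWORD before running this cell."
        )
    return username, password


# Example with an explicit mapping so the sketch runs without real credentials.
username, password = load_drugbank_credentials(
    {"DRUGBANK_USERNAME": "demo-user", "DRUGBANK_PASSWORD": "demo-password"}
)
print(username)
```

The returned values can then be passed to `drugBankDataset.get_drug_links` as in the cell above.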

## Show some sample data from DrugLinks

In [7]:
# drugLinks.select("DrugBankID", "Name", "CASNumber", "Formula", "PubChemCompoundID",
#                  "PubChemSubstanceID", "ChEBIID", "ChemSpiderID").show()

## Terminate Spark

In [8]:
spark.stop()