# Welcome to the Ramirez Lab Wiki – Approved drugs from DrugBank
DrugBank is a pharmaceutical knowledge database and a key pharmacoinformatics resource with comprehensive drug/target/disease information.

In this Jupyter notebook, we will retrieve information on approved drugs from the DrugBank database: https://www.drugbank.ca

This notebook was adapted from **GitHub|dhimmel**: https://github.com/dhimmel/drugbank

Some other links of interest: 
* https://docs.drugbankplus.com/v1/#view-product-concept-strengths 
* https://www.biostars.org/p/242799/ 


This notebook can be run in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ramirezlab/Farmacoinformatica-2022/blob/main/02_Sesiones-practicas/03_Sesion-III/03_DataVisualization_Colab.ipynb), or in …

### Importing libraries

In [None]:
import csv
import gzip
import collections
import re
import io
import json
import xml.etree.ElementTree as ET
import requests
import pandas as pd

### Reading the DrugBank database

The full database was downloaded from https://www.drugbank.ca and saved as `full_database.xml` in the `db` folder.

In [None]:
import os

xml_path = os.path.join('db', 'full_database.xml')
# Parse the whole XML file into memory and keep the root <drugbank> element
tree = ET.parse(xml_path)
root = tree.getroot()
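`ET.parse` loads the entire XML tree into memory at once. If that is a problem, a streaming alternative based on `xml.etree.ElementTree.iterparse` can handle one drug at a time. This is a sketch, not part of the original workflow: `iter_top_level_drugs` is a hypothetical helper, and it assumes that drugs are the direct children of the root element, as the loops below also assume.

```python
import xml.etree.ElementTree as ET

NS = '{http://www.drugbank.ca}'


def iter_top_level_drugs(xml_path):
    """Yield each top-level <drug> element of a DrugBank-style XML file.

    Nested <drug> elements (e.g. drugs listed inside pathways) are skipped by
    tracking the element depth. Each yielded element is cleared once the caller
    asks for the next one, so extract what you need before moving on.
    """
    depth = 0
    for event, elem in ET.iterparse(xml_path, events=('start', 'end')):
        if event == 'start':
            depth += 1
            continue
        depth -= 1
        # depth == 1 after closing means elem was a direct child of the root element
        if depth == 1 and elem.tag == NS + 'drug':
            yield elem
            elem.clear()  # free the drug's subtree after the caller is done with it
```

It could replace `for drug in root:` in the loops below (e.g. `for drug in iter_top_level_drugs(xml_path):`), as long as each drug is fully processed inside the loop body; note that the file is re-read on every call.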

### Parsing the DrugBank database and extracting information

In [None]:
ns = '{http://www.drugbank.ca}'
for drug in root:
    print(drug.findtext(ns + "name"))
    break

In [None]:
ns = '{http://www.drugbank.ca}'
inchikey_template = "{ns}calculated-properties/{ns}property[{ns}kind='InChIKey']/{ns}value"
inchi_template = "{ns}calculated-properties/{ns}property[{ns}kind='InChI']/{ns}value"

rows = list()
for i, drug in enumerate(root):
    row = collections.OrderedDict()
    assert drug.tag == ns + 'drug'
    row['type'] = drug.get('type')
    row['drugbank_id'] = drug.findtext(ns + "drugbank-id[@primary='true']")
    row['name'] = drug.findtext(ns + "name")
    row['description'] = drug.findtext(ns + "description")
    row['indication'] = drug.findtext(ns + "indication")
    row['groups'] = [group.text for group in
        drug.findall("{ns}groups/{ns}group".format(ns = ns))]
    row['atc_codes'] = [code.get('code') for code in
        drug.findall("{ns}atc-codes/{ns}atc-code".format(ns = ns))]
    row['categories'] = [x.findtext(ns + 'category') for x in
        drug.findall("{ns}categories/{ns}category".format(ns = ns))]
    row['inchi'] = drug.findtext(inchi_template.format(ns = ns))
    row['inchikey'] = drug.findtext(inchikey_template.format(ns = ns))
    
    # Add drug aliases: international brand names, English synonyms and product names
    # (international-brand elements hold the brand in a <name> child, so select that child)
    aliases = {
        elem.text for elem in
        drug.findall("{ns}international-brands/{ns}international-brand/{ns}name".format(ns = ns)) +
        drug.findall("{ns}synonyms/{ns}synonym[@language='English']".format(ns = ns)) +
        drug.findall("{ns}products/{ns}product/{ns}name".format(ns = ns))
        if elem.text  # skip empty elements so sorting below does not fail on None
    }
    aliases.add(row['name'])
    row['aliases'] = sorted(aliases)

    rows.append(row)
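The path expressions above rely on ElementTree's limited XPath support: `[@primary='true']` filters on an attribute, and `[{ns}kind='InChIKey']` keeps a `property` element only if its `kind` child has exactly that text. A self-contained check on a tiny, hand-written record (the IDs and the InChIKey are placeholders, not real DrugBank data):

```python
import xml.etree.ElementTree as ET

ns = '{http://www.drugbank.ca}'
inchikey_template = "{ns}calculated-properties/{ns}property[{ns}kind='InChIKey']/{ns}value"

# Tiny hand-written <drug> record; IDs and the InChIKey are placeholders
snippet = """<drug xmlns="http://www.drugbank.ca" type="small molecule">
  <drugbank-id primary="true">DB_PRIMARY</drugbank-id>
  <drugbank-id>DB_SECONDARY</drugbank-id>
  <name>Example drug</name>
  <calculated-properties>
    <property><kind>logP</kind><value>1.5</value></property>
    <property><kind>InChIKey</kind><value>PLACEHOLDER-INCHIKEY</value></property>
  </calculated-properties>
</drug>"""

drug = ET.fromstring(snippet)
primary_id = drug.findtext(ns + "drugbank-id[@primary='true']")  # attribute filter
inchikey = drug.findtext(inchikey_template.format(ns=ns))         # child-text filter
missing = drug.findtext(ns + 'indication')                        # absent element -> None
print(primary_id, inchikey, missing)  # DB_PRIMARY PLACEHOLDER-INCHIKEY None
```

Note that `findtext` returns `None` when the element is absent but an empty string when it exists without text, which matters for the filters further down.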

Example: which drugs and compounds mention Alzheimer's disease in their indication? The substring `lzheimer` is used as the query so that both `alzheimer` and `Alzheimer` match.

In [None]:
for row in rows:
    # indication can be None or empty for some entries
    if 'lzheimer' in (row['indication'] or ''):
        print(row['name'])

Phosphatidyl serine
NADH
Tacrine
Trazodone
Galantamine
Risperidone
Donepezil
Rivastigmine
Selegiline
Memantine
Ibuprofen
Ginkgo biloba
Huperzine B
Zanapezil
Huperzine A
Phenserine
Neramexane
SGS-742
CX717
VP025
CAD106
NGX267
E-2012
LX6171
Caprospinol
Tarenflurbil
Smilagenin
Paliroden
Pozanicline
Mimopezil
PBT-1033
Facinicline
PRX-03140
DDP-225
GTS-21
AVE-1625
PPI-1019
Mito-4509
EHT 0202
MEM 1414
CERE-110
Edonerpic
EVT-101
PRX-07034
CTS-21166
Tesofensine
CX516
Xaliproden
FK-960
Propentofylline
S-8510
Ganstigmine
Tramiprosate
7-beta-Hydroxyepiandrosterone
Florbetaben (18F)
Florbetapir (18F)
Flutemetamol (18F)
Tideglusib
D-alpha-Tocopherol acetate
Flortaucipir F-18
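The `lzheimer` trick sidesteps upper/lower case. A regular expression compiled with `re.IGNORECASE` does the same more explicitly and works for any query. `drugs_mentioning` below is a hypothetical helper, shown on made-up rows with the same keys as `rows`:

```python
import re


def drugs_mentioning(rows, query, field='indication'):
    """Return names of drugs whose `field` text contains `query`, ignoring case.

    Missing or empty fields (None or '') are treated as non-matching.
    """
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [row['name'] for row in rows if pattern.search(row.get(field) or '')]


# Usage with made-up rows
example_rows = [
    {'name': 'Drug A', 'indication': "Treatment of mild Alzheimer's disease"},
    {'name': 'Drug B', 'indication': None},
    {'name': 'Drug C', 'indication': 'Hypertension'},
]
print(drugs_mentioning(example_rows, 'alzheimer'))  # ['Drug A']
```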


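### Saving drug aliases

The cell below writes the aliases of every drug, keyed by DrugBank ID, to `db/aliases-drugbank.json`. The file `aliases-drugbank.json` is also available at: https://github.com/ramirezlab/WIKI/tree/master/Computational_Polypharmacology/DrugBank/Files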

In [None]:
alias_dict = {row['drugbank_id']: row['aliases'] for row in rows}
with open('db/aliases-drugbank.json', 'w') as fp:
    json.dump(alias_dict, fp, indent=2, sort_keys=True)
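To look drugs up by any of their names, the saved dictionary can be inverted into an alias-to-DrugBank-ID index. A sketch with a hypothetical helper `build_alias_index`; the dictionary below is a small hand-written example in the same shape as `aliases-drugbank.json`, not an excerpt of the real file:

```python
import json
from collections import defaultdict


def build_alias_index(alias_dict):
    """Map each lower-cased alias to the sorted DrugBank IDs that use it.

    An alias can be shared by several drugs, so the values are lists.
    """
    index = defaultdict(set)
    for drugbank_id, aliases in alias_dict.items():
        for alias in aliases:
            index[alias.lower()].add(drugbank_id)
    return {alias: sorted(ids) for alias, ids in index.items()}


# In the notebook this would be: alias_dict = json.load(open('db/aliases-drugbank.json'))
alias_dict = json.loads('{"DB00843": ["Aricept", "Donepezil"], "DB00674": ["Galantamine", "Razadyne"]}')
alias_index = build_alias_index(alias_dict)
print(alias_index['aricept'])  # ['DB00843']
```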

In [None]:
def collapse_list_values(row):
    for key, value in row.items():
        if isinstance(value, list):
            row[key] = '|'.join(value)
    return row

rows = list(map(collapse_list_values, rows))

In [None]:
columns = ['drugbank_id', 'name', 'indication', 'type', 'groups', 'atc_codes', 'categories', 'inchikey', 'inchi', 'description']
drugbank_df = pd.DataFrame.from_dict(rows)[columns]
drugbank_df

Unnamed: 0,drugbank_id,name,indication,type,groups,atc_codes,categories,inchikey,inchi,description
0,DB00001,Lepirudin,For the treatment of heparin-induced thrombocy...,biotech,approved,B01AE02,"Amino Acids, Peptides, and Proteins|Anticoagul...",,,Lepirudin is identical to natural hirudin exce...
1,DB00002,Cetuximab,"Cetuximab, used in combination with irinotecan...",biotech,approved,L01XC06,"Amino Acids, Peptides, and Proteins|Antibodies...",,,Cetuximab is an epidermal growth factor recept...
2,DB00003,Dornase alfa,Used as adjunct therapy in the treatment of cy...,biotech,approved,R05CB13,"Amino Acids, Peptides, and Proteins|Cough and ...",,,Dornase alfa is a biosynthetic form of human d...
3,DB00004,Denileukin diftitox,For treatment of cutaneous T-cell lymphoma,biotech,approved|investigational,L01XX29,"ADP Ribose Transferases|Amino Acids, Peptides,...",,,A recombinant DNA-derived cytotoxic protein co...
4,DB00005,Etanercept,Etanercept is indicated for the treatment of m...,biotech,approved|investigational,L04AB01,"Agents reducing cytokine levels|Amino Acids, P...",,,Dimeric fusion protein consisting of the extra...
...,...,...,...,...,...,...,...,...,...,...
13575,DB15689,Azoximer bromide,,small molecule,investigational,,,,,Azoximer bromide is under investigation in cli...
13576,DB15690,Fluoroestradiol F-18,Fluoroestradiol F-18 is a radioactive diagnost...,small molecule,approved,,Estradiol Congeners|Estranes|Estrogens|Estroge...,KDLLNMRYZGUVMA-ZYMZXAKXSA-N,InChI=1S/C18H23FO2/c1-18-7-6-13-12-5-3-11(20)8...,Fluoroestradiol F-18 is an imaging agent used ...
13577,DB15691,Anti-SARS-CoV-2 REGN-COV2,,biotech,investigational,,Experimental Unapproved Treatments for COVID-19,,,Anti-SARS-COV-2 REGN-COV2 is a combination of ...
13578,DB15692,COVID-19 convalescent plasma,,biotech,investigational,,Experimental Unapproved Treatments for COVID-19,,,COVID-19 convalescent plasma is plasma collect...
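List-valued fields such as `groups` are now stored as `|`-joined strings. To explore them, they can be split back into lists and exploded into one row per value. A sketch on a made-up frame; on the real data the equivalent would be `drugbank_df['groups'].str.split('|').explode().value_counts()`:

```python
import pandas as pd

# Made-up frame with the same '|'-joined layout as drugbank_df
df = pd.DataFrame({
    'drugbank_id': ['DB_A', 'DB_B', 'DB_C'],
    'groups': ['approved|investigational', 'vet_approved', 'approved'],
})

# Split the joined string back into lists, then give each group its own row
groups_long = df.assign(group=df['groups'].str.split('|')).explode('group')
counts = groups_long['group'].value_counts()
print(counts['approved'])  # 2
```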


A subset, referred to as *slim*, contains only approved small-molecule drugs that have an InChI structure.

In [None]:
drugbank_slim_df = drugbank_df[
    # exact match on the '|'-separated groups, so a 'vet_approved'-only drug does not count
    drugbank_df.groups.map(lambda x: 'approved' in x.split('|')) &
    drugbank_df.inchi.map(lambda x: x is not None) &
    drugbank_df.type.map(lambda x: x == 'small molecule')
]
drugbank_slim_df

Unnamed: 0,drugbank_id,name,indication,type,groups,atc_codes,categories,inchikey,inchi,description
5,DB00006,Bivalirudin,For treatment of heparin-induced thrombocytope...,small molecule,approved|investigational,B01AE06,"Amino Acids, Peptides, and Proteins|Anticoagul...",OIRCOABEOLEUMC-GEJPAHFPSA-N,InChI=1S/C98H138N24O33/c1-5-52(4)82(96(153)122...,Bivalirudin is a synthetic 20 residue peptide ...
6,DB00007,Leuprolide,Leuprolide is indicated for the palliative tre...,small molecule,approved|investigational,L02AE51|L02AE02,Adrenal Cortex Hormones|Agents Causing Muscle ...,GFIJNRVAKGFPGQ-LIJARHBVSA-N,InChI=1S/C59H84N16O12/c1-6-63-57(86)48-14-10-2...,Leuprolide is a synthetic 9-residue peptide an...
13,DB00014,Goserelin,Goserelin is indicated for:\r\n\r\n- Use in co...,small molecule,approved,L02AE03,"Adrenal Cortex Hormones|Amino Acids, Peptides,...",BLCLNMBMMGCOAS-URPVMXJPSA-N,InChI=1S/C59H84N18O14/c1-31(2)22-40(49(82)68-3...,"Goserelin is a synthetic hormone. In men, it s..."
25,DB00027,Gramicidin D,"For treatment of skin lesions, surface wounds ...",small molecule,approved,R02AB30,"Amino Acids, Peptides, and Proteins|Anti-Bacte...",NDAYQJDHGXTBJL-MWWSRJDJSA-N,InChI=1S/C96H135N19O16/c1-50(2)36-71(105-79(11...,Gramcidin D is a heterogeneous mixture of thre...
33,DB00035,Desmopressin,- Indicated for the treatment of nocturia due ...,small molecule,approved,H01BA02,"Agents that produce hypertension|Amino Acids, ...",NFLWUMRGJYTJIN-PNIOQBSNSA-N,InChI=1S/C46H64N14O12S2/c47-35(62)15-14-29-40(...,"Desmopressin (dDAVP), a synthetic analogue of ..."
...,...,...,...,...,...,...,...,...,...,...
13504,DB15617,Ferric derisomaltose,This drug is indicated for the treatment of ir...,small molecule,approved,,"Anemia, Iron-Deficiency|Antianemia Drugs|Antia...",JTQTXQSGPZRXJF-DOJSGGEQSA-N,InChI=1S/C18H34O16.Fe/c19-1-5(21)9(23)10(24)6(...,Iron deficiency is an extremely common conditi...
13564,DB15678,Calcium undecylenate,,small molecule,approved|experimental,,,CLOKKBBIKHZGNX-UHFFFAOYSA-L,InChI=1S/2C11H20O2.Ca/c2*1-2-3-4-5-6-7-8-9-10-...,
13565,DB15679,Aluminum subacetate,,small molecule,approved|experimental,,Astringents,HQQUTGFAWJNQIP-UHFFFAOYSA-K,"InChI=1S/2C2H4O2.Al.H2O/c2*1-2(3)4;;/h2*1H3,(H...",
13571,DB15685,Selpercatinib,Selpercatinib is indicated for the treatment o...,small molecule,approved|investigational,,Agents that produce hypertension|Antineoplasti...,XIIOFHFUYBLOLW-UHFFFAOYSA-N,"InChI=1S/C29H31N7O3/c1-29(2,37)18-39-24-9-25(2...",Selpercatinib is a kinase inhibitor with enhan...
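Checking group membership with a plain substring test can over-match, because DrugBank also uses a `vet_approved` group. This toy comparison on made-up rows shows the difference between a substring test and an exact test on the `|`-separated entries:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Drug A', 'Drug B', 'Drug C'],
    'groups': ['approved|investigational', 'vet_approved', 'experimental'],
})

# Substring test: 'approved' is also found inside 'vet_approved'
substring_hit = df['groups'].map(lambda x: 'approved' in x)
# Token test: split on '|' and require an exact 'approved' entry
token_hit = df['groups'].map(lambda x: 'approved' in x.split('|'))

print(df.loc[substring_hit, 'name'].tolist())  # ['Drug A', 'Drug B']
print(df.loc[token_hit, 'name'].tolist())      # ['Drug A']
```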


In [None]:
# write drugbank csv
drugbank_df.to_csv('1_All-drugbank_db.csv', index=False)

# write drugbank_slim csv
drugbank_slim_df.to_csv('2_SLIM-drugbank_db.csv', index=False)


### Including interacting proteins: *Target*, *Enzyme*, *Transporter* and *Carrier*

DrugBank contains four types of drug-protein interactions (taken from: https://nbviewer.jupyter.org/github/dhimmel/drugbank/blob/22d835b3cd0ed421c18f855a85a183a9c1349e8f/parse.ipynb):

* **Target**: A protein, macromolecule, nucleic acid, or small molecule to which a given drug binds, resulting in an alteration of the normal function of the bound molecule and a desirable therapeutic effect. Drug targets are most commonly proteins such as enzymes, ion channels, and receptors.
* **Enzyme**: A protein which catalyzes chemical reactions involving a given drug (substrate). Most drugs are metabolized by cytochrome P450 enzymes.
* **Transporter**: A membrane-bound protein which shuttles ions, small molecules or macromolecules across membranes, into cells or out of cells.
* **Carrier**: A secreted protein which binds to drugs, carrying them to cell transporters, where they are moved into the cell. Drug carriers may be used in drug design to increase the effectiveness of drug delivery to the target sites of pharmacological actions.

In [None]:
protein_rows = list()
for i, drug in enumerate(root):
    drugbank_id = drug.findtext(ns + "drugbank-id[@primary='true']")
    for category in ['target', 'enzyme', 'carrier', 'transporter']:
        proteins = drug.findall('{ns}{cat}s/{ns}{cat}'.format(ns=ns, cat=category))
        for protein in proteins:
            row = {'drugbank_id': drugbank_id, 'category': category}
            row['organism'] = protein.findtext('{}organism'.format(ns))
            row['known_action'] = protein.findtext('{}known-action'.format(ns))
            actions = protein.findall('{ns}actions/{ns}action'.format(ns=ns))
            row['actions'] = '|'.join(action.text for action in actions)
            uniprot_ids = [polypep.text for polypep in protein.findall(
                "{ns}polypeptide/{ns}external-identifiers/{ns}external-identifier[{ns}resource='UniProtKB']/{ns}identifier".format(ns=ns))]
            # Keep only proteins mapped to exactly one UniProt ID (multi-chain complexes are skipped)
            if len(uniprot_ids) != 1:
                continue
            row['uniprot_id'] = uniprot_ids[0]
            protein_rows.append(row)
protein_df = pd.DataFrame.from_dict(protein_rows)
protein_df

Unnamed: 0,drugbank_id,category,organism,known_action,actions,uniprot_id
26940,DB15599,target,Humans,unknown,inhibitor,P43235
26941,DB15617,target,Humans,unknown,binder,P69905
26942,DB15623,target,Human Immunodeficiency Virus,unknown,inhibitor,O90777
26943,DB15643,target,Humans,unknown,inhibitor,Q9BYF1
26944,DB15647,target,Humans,unknown,antagonist,P10275
26945,DB15665,target,Humans,unknown,agonist,Q96RJ0
26946,DB15665,target,Humans,unknown,agonist,P08908
26947,DB15685,target,Humans,yes,inhibitor,P07949
26948,DB15685,target,Humans,unknown,inhibitor,P17948
26949,DB15685,target,Humans,unknown,inhibitor,P35916
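The loop above drops every protein that maps to more than one UniProt ID, which removes multi-chain targets such as receptor complexes. If those should be kept, one option is to emit one row per UniProt ID and record how many chains the protein has. A sketch with a hypothetical helper `protein_rows_for`, demonstrated on a hand-written record with placeholder accessions:

```python
import xml.etree.ElementTree as ET

ns = '{http://www.drugbank.ca}'
UNIPROT_XPATH = ("{ns}polypeptide/{ns}external-identifiers/"
                 "{ns}external-identifier[{ns}resource='UniProtKB']/{ns}identifier").format(ns=ns)


def protein_rows_for(drug, drugbank_id, category='target'):
    """Yield one row per (protein, UniProt ID) pair instead of skipping multi-chain proteins."""
    for protein in drug.findall('{ns}{cat}s/{ns}{cat}'.format(ns=ns, cat=category)):
        uniprot_ids = [el.text for el in protein.findall(UNIPROT_XPATH)]
        for uniprot_id in uniprot_ids:
            yield {
                'drugbank_id': drugbank_id,
                'category': category,
                'uniprot_id': uniprot_id,
                # more than one polypeptide flags a complex; keep the count for later filtering
                'n_polypeptides': len(uniprot_ids),
            }


# Hand-written record: one target made of two polypeptides (placeholder accessions)
snippet = """<drug xmlns="http://www.drugbank.ca"><targets><target>
  <polypeptide><external-identifiers><external-identifier>
    <resource>UniProtKB</resource><identifier>UNIPROT_1</identifier>
  </external-identifier></external-identifiers></polypeptide>
  <polypeptide><external-identifiers><external-identifier>
    <resource>UniProtKB</resource><identifier>UNIPROT_2</identifier>
  </external-identifier></external-identifiers></polypeptide>
</target></targets></drug>"""
rows_demo = list(protein_rows_for(ET.fromstring(snippet), 'DB_TEST'))
print([(r['uniprot_id'], r['n_polypeptides']) for r in rows_demo])
# [('UNIPROT_1', 2), ('UNIPROT_2', 2)]
```

Filtering on `n_polypeptides == 1` afterwards would reproduce the behavior of the original loop for that category.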


In [None]:
join_df = drugbank_df.set_index('drugbank_id').join(protein_df.set_index('drugbank_id'))

# write drugbank+targets csv
join_df.to_csv('3_All-drugbank_with_targets_db.csv')
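`DataFrame.join` on the index is a left join by default, so drugs without any parsed protein stay in `join_df` with empty protein columns. An equivalent sketch with `merge(..., how='left', indicator=True)` on made-up frames makes those drugs easy to count:

```python
import pandas as pd

drugs = pd.DataFrame({'drugbank_id': ['DB_A', 'DB_B'], 'name': ['Drug A', 'Drug B']})
proteins = pd.DataFrame({'drugbank_id': ['DB_A', 'DB_A'], 'uniprot_id': ['P1', 'P2']})

# Left join keeps every drug; indicator=True adds a '_merge' column
merged = drugs.merge(proteins, on='drugbank_id', how='left', indicator=True)
no_protein = merged.loc[merged['_merge'] == 'left_only', 'name'].tolist()
print(len(merged), no_protein)  # 3 ['Drug B']
```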

### Extracting compounds for a specific disease

We will use Alzheimer's disease (AD) as an example. The full DrugBank table with interacting proteins, `3_All-drugbank_with_targets_db.csv` (generated above), is needed.

In [None]:
df_disease1 = pd.read_csv('3_All-drugbank_with_targets_db.csv')
df_disease1

Unnamed: 0,drugbank_id,name,indication,type,groups,atc_codes,categories,inchikey,inchi,description,category,organism,known_action,actions,uniprot_id
0,DB00001,Lepirudin,For the treatment of heparin-induced thrombocy...,biotech,approved,B01AE02,"Amino Acids, Peptides, and Proteins|Anticoagul...",,,Lepirudin is identical to natural hirudin exce...,target,Humans,yes,inhibitor,P00734
1,DB00002,Cetuximab,"Cetuximab, used in combination with irinotecan...",biotech,approved,L01XC06,"Amino Acids, Peptides, and Proteins|Antibodies...",,,Cetuximab is an epidermal growth factor recept...,target,Humans,yes,antagonist,P00533
2,DB00002,Cetuximab,"Cetuximab, used in combination with irinotecan...",biotech,approved,L01XC06,"Amino Acids, Peptides, and Proteins|Antibodies...",,,Cetuximab is an epidermal growth factor recept...,target,Humans,unknown,,O75015
3,DB00002,Cetuximab,"Cetuximab, used in combination with irinotecan...",biotech,approved,L01XC06,"Amino Acids, Peptides, and Proteins|Antibodies...",,,Cetuximab is an epidermal growth factor recept...,target,Humans,unknown,,P02745
4,DB00002,Cetuximab,"Cetuximab, used in combination with irinotecan...",biotech,approved,L01XC06,"Amino Acids, Peptides, and Proteins|Antibodies...",,,Cetuximab is an epidermal growth factor recept...,target,Humans,unknown,,P02746
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32679,DB15690,Fluoroestradiol F-18,Fluoroestradiol F-18 is a radioactive diagnost...,small molecule,approved,,Estradiol Congeners|Estranes|Estrogens|Estroge...,KDLLNMRYZGUVMA-ZYMZXAKXSA-N,InChI=1S/C18H23FO2/c1-18-7-6-13-12-5-3-11(20)8...,Fluoroestradiol F-18 is an imaging agent used ...,carrier,Humans,unknown,binder,P04278
32680,DB15690,Fluoroestradiol F-18,Fluoroestradiol F-18 is a radioactive diagnost...,small molecule,approved,,Estradiol Congeners|Estranes|Estrogens|Estroge...,KDLLNMRYZGUVMA-ZYMZXAKXSA-N,InChI=1S/C18H23FO2/c1-18-7-6-13-12-5-3-11(20)8...,Fluoroestradiol F-18 is an imaging agent used ...,carrier,Humans,unknown,binder,P02768
32681,DB15691,Anti-SARS-CoV-2 REGN-COV2,,biotech,investigational,,Experimental Unapproved Treatments for COVID-19,,,Anti-SARS-COV-2 REGN-COV2 is a combination of ...,,,,,
32682,DB15692,COVID-19 convalescent plasma,,biotech,investigational,,Experimental Unapproved Treatments for COVID-19,,,COVID-19 convalescent plasma is plasma collect...,,,,,


In [None]:
selection = ['drugbank_id','name','type','groups','indication','description','category','known_action','actions','uniprot_id']
df_disease2 = df_disease1[selection].copy()
df_disease2

Unnamed: 0,drugbank_id,name,type,groups,indication,description,category,known_action,actions,uniprot_id
0,DB00001,Lepirudin,biotech,approved,For the treatment of heparin-induced thrombocy...,Lepirudin is identical to natural hirudin exce...,target,yes,inhibitor,P00734
1,DB00002,Cetuximab,biotech,approved,"Cetuximab, used in combination with irinotecan...",Cetuximab is an epidermal growth factor recept...,target,yes,antagonist,P00533
2,DB00002,Cetuximab,biotech,approved,"Cetuximab, used in combination with irinotecan...",Cetuximab is an epidermal growth factor recept...,target,unknown,,O75015
3,DB00002,Cetuximab,biotech,approved,"Cetuximab, used in combination with irinotecan...",Cetuximab is an epidermal growth factor recept...,target,unknown,,P02745
4,DB00002,Cetuximab,biotech,approved,"Cetuximab, used in combination with irinotecan...",Cetuximab is an epidermal growth factor recept...,target,unknown,,P02746
...,...,...,...,...,...,...,...,...,...,...
32679,DB15690,Fluoroestradiol F-18,small molecule,approved,Fluoroestradiol F-18 is a radioactive diagnost...,Fluoroestradiol F-18 is an imaging agent used ...,carrier,unknown,binder,P04278
32680,DB15690,Fluoroestradiol F-18,small molecule,approved,Fluoroestradiol F-18 is a radioactive diagnost...,Fluoroestradiol F-18 is an imaging agent used ...,carrier,unknown,binder,P02768
32681,DB15691,Anti-SARS-CoV-2 REGN-COV2,biotech,investigational,,Anti-SARS-COV-2 REGN-COV2 is a combination of ...,,,,
32682,DB15692,COVID-19 convalescent plasma,biotech,investigational,,COVID-19 convalescent plasma is plasma collect...,,,,


### Searching for the disease

1. The substring *lzheimer* is used as the query because we are interested in Alzheimer's disease; it matches both *Alzheimer* and *alzheimer*. Change the query to study another disease.
2. A new column, *Disease_index*, stores the position of the first occurrence of the query in the *description* column. If the substring is not found, **-1** is returned.
3. Rows with values greater than or equal to 0 are kept, because -1 means the query does not occur in the text.
4. The same process is then applied to the *indication* column, so a drug must mention the disease in both fields.
5. Finally, a *drug_class_index* column flags whether *approved* appears in the *groups* column, and only approved drugs are kept.
6. In the next step, the same approach is used to keep only drugs with a known action (*known_action* = *yes*) on proteins in the **target** category.
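The filtering steps above can also be written with boolean masks, which avoids adding helper index columns to intermediate slices. A minimal sketch: the function name `select_disease_drugs` is hypothetical, `case=False` replaces the *lzheimer* trick, and *approved* is matched as a whole `|`-separated entry so that `vet_approved` does not count.

```python
import pandas as pd

APPROVED_TOKEN = r'(?:^|\|)approved(?:\||$)'  # 'approved' as a whole '|'-separated entry


def select_disease_drugs(df, query, approved_only=True):
    """Keep rows whose description AND indication mention `query` (case-insensitive)."""
    mask = (df['description'].str.contains(query, case=False, regex=False, na=False)
            & df['indication'].str.contains(query, case=False, regex=False, na=False))
    if approved_only:
        mask &= df['groups'].str.contains(APPROVED_TOKEN, regex=True, na=False)
    return df.loc[mask].copy()


# Usage on made-up rows; in the notebook: select_disease_drugs(df_disease2, 'alzheimer')
demo = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'description': ['Alzheimer drug', 'alzheimer', None, 'Alzheimer'],
    'indication': ["For Alzheimer's disease", 'Alzheimer', 'Alzheimer', 'alzheimer'],
    'groups': ['approved|investigational', 'vet_approved', 'approved', 'experimental'],
})
print(select_disease_drugs(demo, 'alzheimer')['name'].tolist())  # ['A']
```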

In [None]:
# dis = 'your_query_disease'
dis = 'lzheimer'

# Keep rows whose description mentions the query (str.find returns -1 when absent)
df_disease2['Disease_index'] = df_disease2['description'].str.find(dis)
df_disease3 = df_disease2.loc[df_disease2['Disease_index'] >= 0].copy()

# ...and whose indication also mentions it
df_disease3['Disease_index2'] = df_disease3['indication'].str.find(dis)
df_disease4 = df_disease3.loc[df_disease3['Disease_index2'] >= 0].copy()

# Keep approved drugs; match 'approved' as a whole '|'-separated entry so 'vet_approved' is not counted
drug_class = 'approved'
df_disease4['drug_class_index'] = df_disease4['groups'].str.contains(
    r'(?:^|\|)' + drug_class + r'(?:\||$)', regex=True, na=False)
df_disease5 = df_disease4.loc[df_disease4['drug_class_index']]

# write drugbank-approved with interacting proteins (Target, Enzyme, Transporter, Carrier) csv
selection = ['drugbank_id','name','type','groups','indication','description','category','known_action','actions','uniprot_id']
df_approved_drugs = df_disease5[selection]
df_approved_drugs.to_csv('4_approved-drugbank_with_proteins.csv', index=False)
df_approved_drugs


Unnamed: 0,drugbank_id,name,type,groups,indication,description,category,known_action,actions,uniprot_id
2905,DB00382,Tacrine,small molecule,approved|investigational|withdrawn,For the palliative treatment of mild to modera...,A centerally active cholinesterase inhibitor t...,target,yes,inhibitor,P22303
2906,DB00382,Tacrine,small molecule,approved|investigational|withdrawn,For the palliative treatment of mild to modera...,A centerally active cholinesterase inhibitor t...,target,yes,inhibitor,P06276
2907,DB00382,Tacrine,small molecule,approved|investigational|withdrawn,For the palliative treatment of mild to modera...,A centerally active cholinesterase inhibitor t...,target,unknown,,P23141
2908,DB00382,Tacrine,small molecule,approved|investigational|withdrawn,For the palliative treatment of mild to modera...,A centerally active cholinesterase inhibitor t...,enzyme,unknown,substrate,P05177
2909,DB00382,Tacrine,small molecule,approved|investigational|withdrawn,For the palliative treatment of mild to modera...,A centerally active cholinesterase inhibitor t...,transporter,unknown,substrate,P08183
5163,DB00674,Galantamine,small molecule,approved,Galantamine is indicated for the treatment of ...,Galantamine is a tertiary alkaloid and reversi...,target,yes,inhibitor,P22303
5164,DB00674,Galantamine,small molecule,approved,Galantamine is indicated for the treatment of ...,Galantamine is a tertiary alkaloid and reversi...,target,yes,allosteric modulator,P36544
5165,DB00674,Galantamine,small molecule,approved,Galantamine is indicated for the treatment of ...,Galantamine is a tertiary alkaloid and reversi...,target,unknown,allosteric modulator,A9X444
5166,DB00674,Galantamine,small molecule,approved,Galantamine is indicated for the treatment of ...,Galantamine is a tertiary alkaloid and reversi...,target,unknown,inhibitor,P06276
5167,DB00674,Galantamine,small molecule,approved,Galantamine is indicated for the treatment of ...,Galantamine is a tertiary alkaloid and reversi...,enzyme,no,substrate,P08684


In [None]:
# Read the 4_approved-drugbank_with_proteins.csv file
df_approved_drugs = pd.read_csv("4_approved-drugbank_with_proteins.csv")

# Keep only proteins in the 'target' category
dis = 'target'
df_approved_drugs['target_index'] = df_approved_drugs['category'].str.find(dis)
df_1 = df_approved_drugs.loc[df_approved_drugs['target_index'] >= 0].copy()

# Keep drugs with a known pharmacological action (known_action == 'yes'). Comment out the following lines if you want all actions
known_action_YES = 'yes'
df_1['known_action_index'] = df_1['known_action'].str.find(known_action_YES)
df_2 = df_1.loc[df_1['known_action_index'] >= 0].copy()

# Exclude interactions annotated as 'binder' (rows without an annotated action are kept);
# in this example the remaining actions are inhibitors and allosteric modulators. Comment out the following lines if you want all drugs
df_2['has_binder'] = df_2['actions'].str.contains('binder', na=False)
df_3 = df_2.loc[~df_2['has_binder']]

# write drugbank-approved just with targets=yes -> csv
selection = ['drugbank_id','name','type','groups','indication','description','category','known_action','actions','uniprot_id']
df_approved_drugs_target = df_3[selection]
df_approved_drugs_target.to_csv('5_approved-drugbank_just_with_target.csv', index=False)
df_approved_drugs_target




Unnamed: 0,drugbank_id,name,type,groups,indication,description,category,known_action,actions,uniprot_id
0,DB00382,Tacrine,small molecule,approved|investigational|withdrawn,For the palliative treatment of mild to modera...,A centerally active cholinesterase inhibitor t...,target,yes,inhibitor,P22303
1,DB00382,Tacrine,small molecule,approved|investigational|withdrawn,For the palliative treatment of mild to modera...,A centerally active cholinesterase inhibitor t...,target,yes,inhibitor,P06276
5,DB00674,Galantamine,small molecule,approved,Galantamine is indicated for the treatment of ...,Galantamine is a tertiary alkaloid and reversi...,target,yes,inhibitor,P22303
6,DB00674,Galantamine,small molecule,approved,Galantamine is indicated for the treatment of ...,Galantamine is a tertiary alkaloid and reversi...,target,yes,allosteric modulator,P36544
12,DB00843,Donepezil,small molecule,approved,Donepezil is indicated for the management of m...,"In 2016, the global burden of dementia was est...",target,yes,inhibitor,P22303
22,DB00989,Rivastigmine,small molecule,approved|investigational,For the treatment of mild to moderate dementia...,Rivastigmine is a parasympathomimetic or choli...,target,yes,inhibitor,P22303
23,DB00989,Rivastigmine,small molecule,approved|investigational,For the treatment of mild to moderate dementia...,Rivastigmine is a parasympathomimetic or choli...,target,yes,inhibitor,P06276
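For polypharmacology analyses it can help to summarize, per drug, which targets remain after filtering, and which targets are shared by several drugs. A sketch on a made-up frame with the same `name` and `uniprot_id` columns as `df_approved_drugs_target`:

```python
import pandas as pd

targets = pd.DataFrame({
    'name': ['Drug A', 'Drug A', 'Drug B'],
    'uniprot_id': ['P2', 'P1', 'P1'],
})

# One entry per drug with its sorted, de-duplicated target list
targets_per_drug = {name: sorted(set(ids)) for name, ids in targets.groupby('name')['uniprot_id']}

# Proteins hit by more than one row (shared targets)
hits = targets['uniprot_id'].value_counts()
shared = hits[hits > 1].index.tolist()

print(targets_per_drug)  # {'Drug A': ['P1', 'P2'], 'Drug B': ['P1']}
print(shared)            # ['P1']
```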
