# Usage examples of the ATC KEGG_DRUG database
------------
+ Antonio Oliver Gelabert, April 2020
+ ORCID    : http://orcid.org/0000-0001-8571-2733
+ Linkedin : https://www.linkedin.com/in/aoliverg/?locale=en_US
------------
ATC (Anatomical Therapeutic Chemical) is a systematic classification of approved medicines by therapeutic use, maintained by the WHO. For details of this classification, see:

+ https://en.wikipedia.org/wiki/Anatomical_Therapeutic_Chemical_Classification_System
+ https://www.whocc.no/atc_ddd_index/

The KEGG_DRUG database is a repository of compounds whose structures are available from:

+ https://www.genome.jp/kegg/drug/

Information about individual compounds can be consulted at LigandBox:

+ http://www.mypresto5.com/ligandbox/

Many properties have been obtained using Open Babel 3.0 (http://openbabel.org), for instance molecular properties (MW, logP, TPSA, ...) and fingerprints (FP2, FP3, FP4, MACCS).

Others have been obtained using 3D-derived descriptors such as PED:

+ https://www.nature.com/articles/srep43738

The curated dataset combines information from these two sources:

+ http://www.mypresto5.com/ligandbox/cgi-bin/lbox_download.cgi?LANG=en  (MOL2 FILES OF KEGG_DRUG DATABASE)
+ https://www.genome.jp/kegg-bin/get_htext#A1 (KEGG ATC codes)

With this database of approved medicines we can search for similar compounds, explore individual pharmaceuticals and classes of pharmaceuticals, and use the Open Babel command line to make visualizations.

In this demonstration I will show how to do the following tasks using the provided dataset:

1. Search for specific compounds: Ibuprofen (try it yourself with other pharmaceuticals: Paracetamol, Omeprazole, ...)
2. Explore a specific class of pharmaceuticals. ATC category: protease inhibitors among the antivirals (J05AE)
3. Statistical properties of the full dataset
4. Search and visualization of COVID-19 candidates for potential treatment

Let's first import pandas and use it to load the dataset:

In [None]:
import pandas as pd

keggfull= pd.read_csv('/kaggle/input/properties-of-atc-accepted-medicines/KEGG_DRUG_ATC_PROPERTIES_PED_FP.csv', delimiter=',')

Take a quick look at the content using head:

In [None]:
keggfull.head()

## 1. Search for a specific pharmaceutical. For instance, Ibuprofen:

In [None]:
drugname='Ibuprofen'
labsearch=' '+drugname
longname=len(labsearch)

In [None]:
keggmol=keggfull[keggfull['CompoundName'].str[:longname]==labsearch]
keggmol.head()
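
The prefix search above can be wrapped in a small reusable function. The following is only a sketch: the helper name `find_by_name` and the toy rows (names and codes such as `X0001`) are made up for illustration, and it assumes, as the `' '+drugname` label suggests, that `CompoundName` values may carry a leading space. It strips whitespace and ignores case so the caller does not need to know about that detail.

```python
import pandas as pd

def find_by_name(df, name, column="CompoundName"):
    """Return rows whose name column starts with `name`.

    Hypothetical helper: ignores case and surrounding whitespace.
    """
    cleaned = df[column].astype(str).str.strip().str.lower()
    return df[cleaned.str.startswith(name.strip().lower())]

# Toy stand-in for keggfull; the rows and codes are placeholders, not real data.
toy = pd.DataFrame({
    "CompoundName": [" Ibuprofen (placeholder)", " Ibuprofen piconol (placeholder)", " Paracetamol"],
    "KEGG_code": ["X0001", "X0002", "X0003"],
})

hits = find_by_name(toy, "ibuprofen")
print(hits)
```

With the real dataset, `find_by_name(keggfull, 'Paracetamol')` should behave like the cells above with `drugname='Paracetamol'`, provided the column name is the same.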

The results show that Ibuprofen is classified in 4 of the 14 main ATC categories, including Musculo-skeletal system, Genito-urinary system and sex hormones, and Respiratory system:

In [None]:
keggmol.drop_duplicates(['BigGroup_ATC_class'], keep='first')

Let's look at the ATC codes of Ibuprofen and its derivatives, keeping one row per unique structure (SMILES):

In [None]:
mol_and_derivatives=keggmol.drop_duplicates(['SMILES'], keep='first')
mol_and_derivatives
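
Note that `drop_duplicates` keeps only the first row for each structure, so any additional ATC codes of the same compound are dropped from the view. If all codes per structure are of interest, one option is to group by SMILES and join the codes. This is a minimal sketch on made-up rows; the column names are taken from the cells in this notebook, while the SMILES strings and codes are placeholders.

```python
import pandas as pd

# Toy stand-in for keggmol: the same structure appears under two ATC codes.
toy = pd.DataFrame({
    "SMILES": ["CCO", "CCO", "CCN"],
    "KEGG_code": ["X0001", "X0001", "X0002"],
    "ATC_full_code": ["A01AA01", "B01AA01", "A01AA01"],
})

# One row per structure, with every distinct ATC code joined into one string.
atc_per_structure = (
    toy.groupby("SMILES", sort=False)
       .agg(
           KEGG_code=("KEGG_code", "first"),
           ATC_codes=("ATC_full_code", lambda s: ", ".join(sorted(set(s)))),
       )
       .reset_index()
)
print(atc_per_structure)
```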

We can take a look at these structures. We have two options:

1. Use the URL column (last column) and visualize the entry
2. Generate an SVG of the group with Open Babel and display it

Let's start with the first option and look at the URL entries:

In [None]:
urlfull=mol_and_derivatives['URL_KEGG'].values
print(urlfull[0],urlfull[1])

In [None]:
from IPython.display import IFrame
IFrame(src=urlfull[1], width=1000, height=300)

We can also look only at the images, using the KEGG codes with iden=0 (first molecule) and iden=1 (second molecule):

In [None]:
from PIL import Image
import requests
from io import BytesIO
iden=0
keggcode=mol_and_derivatives['KEGG_code'].values
url='https://www.genome.jp/Fig/drug/'+keggcode[iden]+'.gif'
response = requests.get(url)
img = Image.open(BytesIO(response.content))
img

In [None]:
iden=1
keggcode=mol_and_derivatives['KEGG_code'].values
url='https://www.genome.jp/Fig/drug/'+keggcode[iden]+'.gif'
response = requests.get(url)
img = Image.open(BytesIO(response.content))
img
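
Since the same download steps are repeated for every molecule, they can be collected in a helper. The sketch below reuses the URL pattern from the cells above; the function names `kegg_image_url` and `fetch_kegg_image` are hypothetical, and the check that a code looks like `D` followed by five digits is an assumption about the `KEGG_code` column. `requests` and `PIL` are imported inside the fetch function so the URL builder can be used without them.

```python
import re

KEGG_FIG_BASE = "https://www.genome.jp/Fig/drug/"  # URL pattern used in the cells above

def kegg_image_url(kegg_code):
    """Build the image URL for a KEGG DRUG code (assumed format: D + 5 digits)."""
    code = str(kegg_code).strip()
    if not re.fullmatch(r"D\d{5}", code):
        raise ValueError(f"unexpected KEGG DRUG code: {kegg_code!r}")
    return f"{KEGG_FIG_BASE}{code}.gif"

def fetch_kegg_image(kegg_code, timeout=10):
    """Download the structure image; raises if the HTTP request fails."""
    import requests
    from io import BytesIO
    from PIL import Image

    response = requests.get(kegg_image_url(kegg_code), timeout=timeout)
    response.raise_for_status()
    return Image.open(BytesIO(response.content))

example_url = kegg_image_url("D07208")
print(example_url)
```

In a notebook cell, `fetch_kegg_image(keggcode[0])` as the last expression would display the image in the same way as `img` above.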

1.2 Now we can also use Open Babel and shell commands to do the same task (Linux: sudo apt-get install openbabel gawk; Windows: install Open Babel and use the Cygwin libraries). Uncomment the lines below if these requirements are satisfied.

In [None]:
from IPython.display import SVG, display

#mol_and_derivatives[['SMILES','KEGG_code']].to_csv('subset.smi', sep=' ', encoding='utf-8')
#!gawk '{print $2,$3}' subset.smi | sed '1d'  > subsetcur.smi
#!head -n 20 subsetcur.smi > subsetcur20.smi
#!obabel -ismi subsetcur20.smi -osvg -O subset.svg

display(SVG('/kaggle/input/compoundgroupsatc/subset.svg'))
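
The gawk and sed steps only remove the index column and the header line, so the `.smi` file can also be written directly from pandas. The following sketch assumes that approach; `write_smi` is a hypothetical helper, the toy rows are placeholders, and the `obabel` call (same options as in the commented lines above) only runs if the executable is found on the PATH.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

import pandas as pd

def write_smi(df, path, n=20):
    """Write the first n rows as 'SMILES KEGG_code' lines, without header or index."""
    df[["SMILES", "KEGG_code"]].head(n).to_csv(path, sep=" ", header=False, index=False)

# Toy stand-in for mol_and_derivatives; the codes are placeholders.
toy = pd.DataFrame({"SMILES": ["CCO", "CC(=O)O"], "KEGG_code": ["X0001", "X0002"]})

out_dir = Path(tempfile.mkdtemp())
smi_path = out_dir / "subset.smi"
write_smi(toy, smi_path)
smi_lines = smi_path.read_text().splitlines()
print(smi_lines)

# Render the SVG only when Open Babel is installed.
if shutil.which("obabel"):
    subprocess.run(
        ["obabel", "-ismi", str(smi_path), "-osvg", "-O", str(out_dir / "subset.svg")],
        check=False,
    )
```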

## 2. Search for a specific class of compounds: the case of antiviral protease inhibitors (J05AE)

In this example we will explore protease inhibitors, whose ATC code is J05AE (**COVID-19 potential treatment**):
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4774534/

In [None]:
atcsearch='J05AE'
lensearch=len(atcsearch)
keggsubset2=keggfull[keggfull['ATC_full_code'].str[:lensearch]==atcsearch]
keggsubset2=keggsubset2.drop_duplicates(['SMILES'], keep='first')
#keggsubset2[['SMILES','KEGG_code']].to_csv('subset2.smi', sep=' ', encoding='utf-8')
#!gawk '{print $2,$3}' subset2.smi | sed '1d'  > subsetcur2.smi
#! head -n 20 subsetcur2.smi > ProteaseinhibJ05AE.smi
#!obabel -ismi ProteaseinhibJ05AE.smi -osvg -O ProteaseinhibJ05AE.svg
display(SVG('/kaggle/input/compoundgroupsatc/ProteaseinhibJ05AE.svg'))

To understand the level categories, let's explore the ATC classes and codes a bit:

In [None]:
keggfull[['ATC_label_class','ATC_full_code','BigGroup_ATC_class']].drop_duplicates(['ATC_label_class'], keep='first').head(n=100)
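
An ATC code is hierarchical: the first character is the anatomical main group, the first 3 characters the therapeutic subgroup, then 4 and 5 characters for the pharmacological and chemical subgroups, and the full 7 characters identify the chemical substance. This is why filtering on the first characters of `ATC_full_code` selects a whole class. Below is a minimal sketch of that idea; `atc_levels` and `filter_by_atc` are hypothetical helpers and the toy rows are only illustrative.

```python
import pandas as pd

ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)  # number of characters at ATC levels 1 to 5

def atc_levels(code):
    """Split an ATC code into its hierarchy prefixes, as far as the code goes."""
    code = code.strip().upper()
    return [code[:n] for n in ATC_LEVEL_LENGTHS if len(code) >= n]

def filter_by_atc(df, prefix, column="ATC_full_code"):
    """Keep rows whose ATC code starts with the given prefix (any level)."""
    return df[df[column].astype(str).str.strip().str.startswith(prefix)]

# Toy stand-in for keggfull; KEGG codes are placeholders.
toy = pd.DataFrame({
    "ATC_full_code": ["J05AE01", "J05AE10", "R02AA05"],
    "KEGG_code": ["X0001", "X0002", "X0003"],
})

print(atc_levels("J05AE01"))
protease_inhibitors = filter_by_atc(toy, "J05AE")
print(protease_inhibitors)
```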

In [None]:
# Let's search for antiseptics in category R02AA: https://www.whocc.no/atc_ddd_index/?code=R02AA&showdescription=yes
#atcsearch='R02AA'
#lensearch=len(atcsearch)
#keggsubset2=keggfull[keggfull['ATC_full_code'].str[:lensearch]==atcsearch]
#keggsubset2=keggsubset2.drop_duplicates(['SMILES'], keep='first')
#keggsubset2[['SMILES','KEGG_code']].to_csv('subset2.smi', sep=' ', encoding='utf-8')
#!gawk '{print $2,$3}' subset2.smi | sed '1d'  > subsetcur2.smi
#! head -n 20 subsetcur2.smi > R02AA.smi
#!obabel -ismi R02AA.smi -osvg -O R02AA.svg
display(SVG('/kaggle/input/compoundgroupsatc/R02AA.svg'))

What is the entry of the compound D07208?

In [None]:
search='D07208'
lensearch=len(search)
class_search='KEGG_code'
kegg_search_mol=keggfull[keggfull[class_search].str[:lensearch]==search]
kegg_search_mol.drop_duplicates(['SMILES'], keep='last')

## 3. Statistical Properties of Full Dataset  

Let's check whether Lipinski's rule of five (https://en.wikipedia.org/wiki/Lipinski%27s_rule_of_five) is satisfied on average:

1. No more than 5 hydrogen bond donors (the total number of nitrogen–hydrogen and oxygen–hydrogen bonds)
2. No more than 10 hydrogen bond acceptors (all nitrogen or oxygen atoms)
3. A molecular mass less than 500 daltons
4. An octanol-water partition coefficient (log P) that does not exceed 5

In [None]:
import numpy as np
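# Mean of the numeric columns only (text columns such as names and SMILES are skipped)
keggfull.mean(numeric_only=True).round(2).head(n=7)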

As we can see, all four conditions are satisfied on average: condition 1 (HBD = 2.64 < 5), condition 2 (HBA2 < 10), condition 3 (MW = 377.54 < 500) and condition 4 (logP = 2.41 < 5).

Now let's visualize the distribution of some properties of the full dataset with histograms:
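
The averages say little about individual compounds, so it can also be useful to count rule violations per compound. The rule is usually read as allowing at most one violation. The sketch below assumes the column names `HBD`, `HBA2`, `MW` and `logP` used in this notebook; `lipinski_violations` is a hypothetical helper and the toy values are invented.

```python
import pandas as pd

def lipinski_violations(df):
    """Count, per row, how many of the four Lipinski conditions are violated."""
    checks = pd.DataFrame({
        "HBD>5": df["HBD"] > 5,
        "HBA>10": df["HBA2"] > 10,
        "MW>=500": df["MW"] >= 500,
        "logP>5": df["logP"] > 5,
    })
    return checks.sum(axis=1)

# Toy stand-in for keggfull with invented property values.
toy = pd.DataFrame({
    "HBD": [1, 6, 3],
    "HBA2": [2, 12, 4],
    "MW": [206.3, 720.0, 510.0],
    "logP": [3.1, 6.2, 2.0],
})

violations = lipinski_violations(toy)
fraction_ok = (violations <= 1).mean()  # share of rows with at most one violation
print(violations.tolist(), round(fraction_ok, 2))
```

Applied to the real data, `lipinski_violations(keggfull).value_counts()` would show how many compounds have 0, 1, 2, ... violations.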

In [None]:
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 3, constrained_layout=True,figsize=(15,15))

axs[0, 0].hist(keggfull['MW']);
axs[0, 0].set_title('MW')
axs[0, 1].hist(keggfull['logP']);
axs[0, 1].set_title('logP')
axs[0, 2].hist(keggfull['HBA1']);
axs[0, 2].set_title('HBA1')
axs[1, 1].hist(keggfull['HBD']);
axs[1, 1].set_title('HBD')
axs[1, 0].hist(keggfull['TPSA']);
axs[1, 0].set_title('TPSA')
axs[1, 2].hist(keggfull['MR']);
axs[1, 2].set_title('MR')

for ax in axs.flat:
    ax.set(xlabel='', ylabel='')

plt.show()

## 4. Search and visualization of COVID-19 candidates for potential treatment

Let's now search for drugs that have been candidates for testing their effectiveness against COVID-19:

(source: https://www.linkedin.com/pulse/covid-19-treatments-clinical-trials-ana-gavald%25C3%25A1/?trackingId=tzMbX9MOQUK8qJMQDKuE3w%3D%3D):

+ REMDESIVIR - Antiviral, preferred treatment against COVID-19 (https://en.wikipedia.org/wiki/Remdesivir, https://www.foxnews.com/science/remdesivir-what-to-know-about-potential-coronavirus-treatment, https://www.nature.com/articles/d41573-020-00016-0)
+ CHLOROQUINE - Antimalarial (https://en.wikipedia.org/wiki/Chloroquine)
+ HYDROXYCHLOROQUINE - Derivative that improves CHLOROQUINE effectiveness (https://www.nature.com/articles/s41421-020-0156-0.pdf)
+ LOPINAVIR - Antiviral against HIV (https://en.wikipedia.org/wiki/Lopinavir)
+ DARUNAVIR - Antiviral against HIV (https://en.wikipedia.org/wiki/Darunavir)
+ FAVIPIRAVIR - Effective against Ebola and other viruses (https://www.livescience.com/flu-drug-could-treat-coronavirus.html), (https://www.redaccionmedica.com/secciones/sanidad-hoy/coronavirus-tratamiento-china-anuncia-resultados-del-antiviral-favipiravir-4773)
+ THALIDOMIDE - (https://en.wikipedia.org/wiki/Thalidomide)
+ FINGOLIMOD - Immunomodulatory effect. Multiple sclerosis (MS) treatment (https://en.wikipedia.org/wiki/Fingolimod)
+ GALIDESIVIR - (https://en.wikipedia.org/wiki/Galidesivir)
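
Before looking at each drug individually, a quick loop can show which of the candidates listed above appear in the dataset at all. This is a sketch under assumptions: `count_unique_structures` is a hypothetical helper, the toy rows are placeholders, and the matching is a case-insensitive prefix match on `CompoundName`, like the searches used elsewhere in this notebook.

```python
import pandas as pd

candidates = [
    "Remdesivir", "Chloroquine", "Hydroxychloroquine", "Lopinavir", "Darunavir",
    "Favipiravir", "Thalidomide", "Fingolimod", "Galidesivir",
]

def count_unique_structures(df, names):
    """For each name, count distinct SMILES among rows whose CompoundName starts with it."""
    cleaned = df["CompoundName"].astype(str).str.strip().str.lower()
    counts = {}
    for name in names:
        hits = df[cleaned.str.startswith(name.lower())]
        counts[name] = hits["SMILES"].nunique()
    return pd.Series(counts, name="unique_structures")

# Toy stand-in for keggfull; names and SMILES are placeholders.
toy = pd.DataFrame({
    "CompoundName": [" Chloroquine (placeholder)", " Chloroquine phosphate (placeholder)", " Lopinavir (placeholder)"],
    "SMILES": ["SMILES_A", "SMILES_B", "SMILES_C"],
})

summary = count_unique_structures(toy, candidates)
print(summary)
```

A count of 0 for a name means the prefix search found nothing, which may simply reflect a different spelling in the dataset.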

### 4.1 Chloroquine. Antimalarial (https://en.wikipedia.org/wiki/Chloroquine)

In [None]:
drugname='Chloroquine'
labsearch=' '+drugname
longname=len(labsearch)
keggmol=keggfull[keggfull['CompoundName'].str[:longname]==labsearch]
keggmol.head()
test=keggmol.drop_duplicates(['SMILES'], keep='first')
test

In [None]:
iden=0
keggcode=test['KEGG_code'].values
url='https://www.genome.jp/Fig/drug/'+keggcode[iden]+'.gif'
response = requests.get(url)
img = Image.open(BytesIO(response.content))
img

### 4.2 Hydroxychloroquine. Derivative that improves CHLOROQUINE effectiveness (https://www.nature.com/articles/s41421-020-0156-0.pdf)

In [None]:
drugname='Hydroxychloroquine'
labsearch=' '+drugname
longname=len(labsearch)
keggmol=keggfull[keggfull['CompoundName'].str[:longname]==labsearch]
keggmol.head()
test=keggmol.drop_duplicates(['SMILES'], keep='first')
test


In [None]:
iden=0
keggcode=test['KEGG_code'].values
url='https://www.genome.jp/Fig/drug/'+keggcode[iden]+'.gif'
response = requests.get(url)
img = Image.open(BytesIO(response.content))
img

### 4.3 Lopinavir. Antiviral against HIV (https://en.wikipedia.org/wiki/Lopinavir)

In [None]:
drugname='Lopinavir'
labsearch=' '+drugname
longname=len(labsearch)
keggmol=keggfull[keggfull['CompoundName'].str[:longname]==labsearch]
keggmol.head()
keggmol.drop_duplicates(['SMILES'], keep='first')
test=keggmol.drop_duplicates(['SMILES'], keep='first')
test

In [None]:
iden=0
keggcode=test['KEGG_code'].values
url='https://www.genome.jp/Fig/drug/'+keggcode[iden]+'.gif'
response = requests.get(url)
img = Image.open(BytesIO(response.content))
img

### 4.4 Darunavir. Antiviral against HIV (https://en.wikipedia.org/wiki/Darunavir)

In [None]:
drugname='Darunavir'
labsearch=' '+drugname
longname=len(labsearch)
keggmol=keggfull[keggfull['CompoundName'].str[:longname]==labsearch]
keggmol.head()
keggmol.drop_duplicates(['SMILES'], keep='first')
test=keggmol.drop_duplicates(['SMILES'], keep='first')
test

In [None]:
iden=0
keggcode=test['KEGG_code'].values
url='https://www.genome.jp/Fig/drug/'+keggcode[iden]+'.gif'
response = requests.get(url)
img = Image.open(BytesIO(response.content))
img

### 4.5 Fingolimod. Immunomodulatory effect. Multiple sclerosis (MS) treatment (https://en.wikipedia.org/wiki/Fingolimod)

In [None]:
drugname='Fingolimod'
labsearch=' '+drugname
longname=len(labsearch)
keggmol=keggfull[keggfull['CompoundName'].str[:longname]==labsearch]
keggmol.head()
keggmol.drop_duplicates(['SMILES'], keep='first')
test=keggmol.drop_duplicates(['SMILES'], keep='first')
test

In [None]:
iden=0
keggcode=test['KEGG_code'].values
url='https://www.genome.jp/Fig/drug/'+keggcode[iden]+'.gif'
response = requests.get(url)
img = Image.open(BytesIO(response.content))
img

In [None]:
iden=1
keggcode=test['KEGG_code'].values
url='https://www.genome.jp/Fig/drug/'+keggcode[iden]+'.gif'
response = requests.get(url)
img = Image.open(BytesIO(response.content))
img