# Exploring HTR3A protein target activity data from ExCAPE-DB

The target-specific data was downloaded from https://zenodo.org/record/173258#.X_44kuhKi70.

Please refer to the ExCAPE-DB publication (https://pubmed.ncbi.nlm.nih.gov/28316655/) for details about the database.

Here are some details about the HTR3A gene (taken from NCBI RefSeq):
*The product of this gene belongs to the ligand-gated ion channel receptor superfamily. This gene encodes subunit A of the type 3 receptor for 5-hydroxytryptamine (serotonin), a biogenic hormone that functions as a neurotransmitter, a hormone, and a mitogen. This receptor causes fast, depolarizing responses in neurons after activation. It appears that the heteromeric combination of A and B subunits is necessary to provide the full functional features of this receptor, since either subunit alone results in receptors with very low conductance and response amplitude. Alternatively spliced transcript variants encoding different isoforms have been identified.*

Diseases associated with HTR3A include Irritable Bowel Syndrome and Motion Sickness.

## Basic information about the HTR3A gene

*   **Location:** chromosome 11
*   **Exon count:** 10

mRNA and protein information (RefSeq transcript → protein):

*   NM_000869.6 → NP_000860.3 
*   NM_001161772.3 → NP_001155244.1
*   NM_213621.4 → NP_998786.3 

# Before you begin, make sure you close all other Colab notebooks.

# Change runtime settings

## Please change your runtime settings to use a GPU and high RAM, if you have access to them

## Runtime → Change runtime type → GPU with high-RAM
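
To confirm that the runtime change took effect, a quick check like the following can help. It is a sketch that assumes a Linux runtime (as on Colab) and treats the presence of the `nvidia-smi` executable on `PATH` as a sign that an NVIDIA GPU is attached; `runtime_summary` is a hypothetical helper, not part of Colab or AMPL.

```python
import os
import shutil


def runtime_summary():
    """Report whether nvidia-smi is visible and how much physical RAM the machine has.

    Assumptions: a GPU runtime puts `nvidia-smi` on PATH, and os.sysconf exposes
    SC_PAGE_SIZE and SC_PHYS_PAGES (true on Linux, including Colab).
    """
    has_gpu_tool = shutil.which("nvidia-smi") is not None
    total_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return {
        "nvidia_smi_found": has_gpu_tool,
        "total_ram_gb": round(total_bytes / 1024**3, 1),
    }


if __name__ == "__main__":
    print(runtime_summary())
```

If `nvidia_smi_found` is `False`, the GPU runtime was most likely not selected.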

In [1]:
!date # starting time

Mon Feb 15 18:53:23 UTC 2021


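## Install AMPL GPU version

Timing reported by `time` for the install step in the next cell (about three minutes on the original run):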

```
real	2m59.739s
user	1m48.995s
sys	0m20.614s
```

In [2]:
import sys

import requests

# Download the AMPL GPU install script
url = 'https://raw.githubusercontent.com/ravichas/AMPL-Tutorial/master/config/install_AMPL_GPU.sh'

downloaded_obj = requests.get(url)
downloaded_obj.raise_for_status()  # stop here if the download failed
with open("install_AMPL_GPU.sh", "wb") as file:
    file.write(downloaded_obj.content)

# Make the script executable, run it (timed), and put AMPL's site-packages on the path
!chmod u+x install_AMPL_GPU.sh
!time ./install_AMPL_GPU.sh
if '/content/AMPL/lib/python3.6/site-packages' not in sys.path:
    sys.path.insert(1, '/content/AMPL/lib/python3.6/site-packages')

sys.path

--2021-02-15 18:53:25--  https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.131.3, 104.16.130.3, 2606:4700::6810:8303, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.131.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 94235922 (90M) [application/x-sh]
Saving to: ‘Miniconda3-latest-Linux-x86_64.sh’


2021-02-15 18:53:25 (221 MB/s) - ‘Miniconda3-latest-Linux-x86_64.sh’ saved [94235922/94235922]

PREFIX=/content/AMPL
Unpacking payload ...
Collecting package metadata (current_repodata.json): - \ | done
Solving environment: - \ done

## Package Plan ##

  environment location: /content/AMPL

  added / updated specs:
    - _libgcc_mutex==0.1=main
    - brotlipy==0.7.0=py38h27cfd23_1003
    - ca-certificates==2020.10.14=0
    - certifi==2020.6.20=pyhd3eb1b0_3
    - cffi==1.14.3=py38h261ae71_2
    - chardet==3.0.4=py38h06a4308_1003
    - conda-package-handling==1.7

['',
 '/content/AMPL/lib/python3.6/site-packages',
 '/env/python',
 '/usr/lib/python36.zip',
 '/usr/lib/python3.6',
 '/usr/lib/python3.6/lib-dynload',
 '/usr/local/lib/python3.6/dist-packages',
 '/usr/lib/python3/dist-packages',
 '/usr/local/lib/python3.6/dist-packages/IPython/extensions',
 '/root/.ipython']
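
The cell above hard-codes the AMPL environment's `site-packages` directory for Python 3.6, which matched the Colab runtime at the time (see the `sys.path` output). A small helper makes the prefix and Python version explicit and keeps repeated runs from adding duplicate entries. It is a sketch: `ensure_site_packages` is a hypothetical name, and it assumes the standard conda layout `<prefix>/lib/pythonX.Y/site-packages`.

```python
import sys


def ensure_site_packages(prefix, python_version="3.6", position=1):
    """Add <prefix>/lib/python<version>/site-packages to sys.path exactly once.

    Assumes the standard conda/Unix layout <prefix>/lib/pythonX.Y/site-packages.
    """
    site_packages = f"{prefix}/lib/python{python_version}/site-packages"
    if site_packages not in sys.path:
        # Inserting at index 1 leaves sys.path[0] ('' in the output above) first.
        sys.path.insert(position, site_packages)
    return site_packages
```

Calling `ensure_site_packages('/content/AMPL', '3.6')` reproduces the cell above; calling it a second time leaves `sys.path` unchanged.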

## Exploring target activity data from the selected database
## Data source: ExCAPE-DB
## Target activity for the HTR3A receptor

In [6]:
# The cffi module imported earlier causes problems with AMPL, so remove it
# from the module cache; it will then be imported again, from AMPL's
# environment, when AMPL needs it.
if 'cffi' in sys.modules:
    del sys.modules['cffi']
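
Python caches every imported module in `sys.modules`, and an `import` statement returns the cached object when one exists. Deleting the `'cffi'` entry therefore makes the next `import cffi` search `sys.path` again, where the AMPL directory inserted earlier comes before the system `dist-packages` entries. The sketch below demonstrates the mechanism with the harmless standard-library module `colorsys` standing in for `cffi`; `fresh_import` is a hypothetical helper. One caveat: modules that already hold a reference to the old module object keep using it.

```python
import colorsys
import importlib
import sys


def fresh_import(name):
    """Remove a module from the import cache, then import it again from sys.path."""
    sys.modules.pop(name, None)
    return importlib.import_module(name)


first = sys.modules["colorsys"]

# A plain import statement reuses the cached module object ...
import colorsys as same
print(same is first)  # True

# ... while removing the cache entry forces a new module object to be built.
reloaded = fresh_import("colorsys")
print(reloaded is first)  # False
```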

In [7]:
# Suppress warnings for this demonstration: some of AMPL's dependencies
# raise FutureWarnings and DeprecationWarnings. Note that this filter stays
# in effect for the rest of the session.
import warnings
warnings.filterwarnings('ignore')

import json
import os
import requests
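
`warnings.filterwarnings('ignore')` silences every warning for the rest of the session. If only the noisy categories should be hidden, and only around a specific call, the standard library's `warnings.catch_warnings()` context manager can scope the change. A minimal sketch; `call_quietly` is a hypothetical helper, not part of AMPL.

```python
import warnings


def call_quietly(func, *args, **kwargs):
    """Call func with FutureWarning and DeprecationWarning suppressed.

    The previous warning filters are restored when the with-block exits,
    unlike a global warnings.filterwarnings('ignore').
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        warnings.simplefilter("ignore", category=DeprecationWarning)
        return func(*args, **kwargs)
```

Other categories, such as `UserWarning`, still reach the user. Note that `catch_warnings` modifies global state, so it is not thread-safe.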

In [8]:
# Import AMPL libraries
import atomsci.ddm.utils.data_curation_functions as dcf
import atomsci.ddm.utils.curate_data as curate_data
import atomsci.ddm.pipeline.diversity_plots as dp
import atomsci.ddm.pipeline.chem_diversity as cd

# Additional Python libraries
import getpass
import os

import numpy as np
import pandas as pd

## Select a target to work with
### (e.g. PDE2A, KCNH2, SCN5A)

In [9]:
target_name='HTR3A'

ofile=target_name+'_excape_curated.csv'

# Define data locations
## Get a username to use as a unique identifier when working in shared directories
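
A minimal sketch of that idea with the standard library: `getpass.getuser()` returns the login name, and joining it into the path keeps each user's files separate. The function name `user_work_dir` and any base directory passed to it are hypothetical; nothing in this notebook defines them.

```python
import getpass
import os


def user_work_dir(base_dir, target_name, username=None, create=True):
    """Return <base_dir>/<username>/<target_name>, optionally creating it.

    base_dir is a placeholder for a shared location; getpass.getuser() is
    consulted only when no username is passed in.
    """
    user = username or getpass.getuser()
    path = os.path.join(base_dir, user, target_name)
    if create:
        os.makedirs(path, exist_ok=True)
    return path
```

For example, `os.path.join(user_work_dir('/content/shared', target_name), ofile)` would place the output file in a per-user, per-target folder (the `/content/shared` base is illustrative).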

In [10]:
# ofile=target_name+'_excape_curated.csv'
ofile=target_name+'_excape.csv'

In [13]:
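import io

# NOTE: despite the HTR3A target selected above, this URL points to an
# acetylcholinesterase (ACHE) export from ChEMBL; the preview below shows ACHE records.
url = 'https://raw.githubusercontent.com/ohashin2/Dataset/master/ACHE_ChEMBL_TSV_2021-02-13.tsv'
# To use the HTR3A ExCAPE-DB file instead, uncomment the following line:
# url = 'https://raw.githubusercontent.com/ravichas/AMPL-Tutorial/master/datasets/Excape_HTR3A.tsv'
download = requests.get(url).content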

In [14]:
download

b'"Molecule ChEMBL ID"\t"Molecule Name"\t"Molecule Max Phase"\t"Molecular Weight"\t"#RO5 Violations"\t"AlogP"\t"Compound Key"\t"Smiles"\t"Standard Type"\t"Standard Relation"\t"Standard Value"\t"Standard Units"\t"pChEMBL Value"\t"Data Validity Comment"\t"Comment"\t"Uo Units"\t"Ligand Efficiency BEI"\t"Ligand Efficiency LE"\t"Ligand Efficiency LLE"\t"Ligand Efficiency SEI"\t"Potential Duplicate"\t"Assay ChEMBL ID"\t"Assay Description"\t"Assay Type"\t"BAO Format ID"\t"BAO Label"\t"Assay Organism"\t"Assay Tissue ChEMBL ID"\t"Assay Tissue Name"\t"Assay Cell Type"\t"Assay Subcellular Fraction"\t"Target ChEMBL ID"\t"Target Name"\t"Target Organism"\t"Target Type"\t"Document ChEMBL ID"\t"Source ID"\t"Source Description"\t"Document Journal"\t"Document Year"\t"Cell ChEMBL ID"\n"CHEMBL14598"\t""\t"0"\t"386.43"\t"0"\t"2.35"\t"1k"\t"COc1ccc(C(=O)/C(=N/O)SCCCN(C)C)cc1.O=C(O)C(=O)O"\t"IC50"\t""\t""\t""\t""\t""\t"Not Determined"\t""\t""\t""\t""\t""\t"False"\t"CHEMBL644103"\t"Compound was evaluated for 

In [17]:
# Decode the downloaded bytes and parse the tab-separated text into a pandas DataFrame
orig_df = pd.read_csv(io.StringIO(download.decode('utf-8')), sep='\t')

In [18]:
orig_df

,Molecule ChEMBL ID,Molecule Name,Molecule Max Phase,Molecular Weight,#RO5 Violations,AlogP,Compound Key,Smiles,Standard Type,Standard Relation,Standard Value,Standard Units,pChEMBL Value,Data Validity Comment,Comment,Uo Units,Ligand Efficiency BEI,Ligand Efficiency LE,Ligand Efficiency LLE,Ligand Efficiency SEI,Potential Duplicate,Assay ChEMBL ID,Assay Description,Assay Type,BAO Format ID,BAO Label,Assay Organism,Assay Tissue ChEMBL ID,Assay Tissue Name,Assay Cell Type,Assay Subcellular Fraction,Target ChEMBL ID,Target Name,Target Organism,Target Type,Document ChEMBL ID,Source ID,Source Description,Document Journal,Document Year,Cell ChEMBL ID
0,CHEMBL14598,,0,386.43,0,2.35,1k,COc1ccc(C(=O)/C(=N/O)SCCCN(C)C)cc1.O=C(O)C(=O)O,IC50,,,,,,Not Determined,,,,,,False,CHEMBL644103,Compound was evaluated for Reversible inhibiti...,B,BAO_0000357,single protein format,,,,,,CHEMBL220,Acetylcholinesterase,Homo sapiens,SINGLE PROTEIN,CHEMBL1123431,1,Scientific Literature,J. Med. Chem.,1986.0,
1,CHEMBL496946,,0,452.56,0,3.89,3c,N[C@@H](Cc1ccc2ccccc2c1)C(=O)NCC(=O)Nc1c2c(nc3...,Ki,'=',11190.0,nM,4.95,,,UO_0000065,10.94,0.20,1.06,5.10,False,CHEMBL997710,Inhibition of human recombinant AChE,B,BAO_0000357,single protein format,Homo sapiens,,,,,CHEMBL220,Acetylcholinesterase,Homo sapiens,SINGLE PROTEIN,CHEMBL1141710,1,Scientific Literature,Bioorg. Med. Chem. Lett.,2008.0,
2,CHEMBL508778,,0,555.68,2,5.18,3g,CC(C)(C)OC(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N...,Ki,'=',7970.0,nM,5.10,,,UO_0000065,9.18,0.17,-0.08,4.07,False,CHEMBL997710,Inhibition of human recombinant AChE,B,BAO_0000357,single protein format,Homo sapiens,,,,,CHEMBL220,Acetylcholinesterase,Homo sapiens,SINGLE PROTEIN,CHEMBL1141710,1,Scientific Literature,Bioorg. Med. Chem. Lett.,2008.0,
3,CHEMBL496127,,0,455.56,0,3.61,3h,N[C@@H](Cc1c[nH]c2ccccc12)C(=O)NCCC(=O)Nc1c2c(...,Ki,'=',34030.0,nM,4.47,,,UO_0000065,9.81,0.18,0.86,3.96,False,CHEMBL997710,Inhibition of human recombinant AChE,B,BAO_0000357,single protein format,Homo sapiens,,,,,CHEMBL220,Acetylcholinesterase,Homo sapiens,SINGLE PROTEIN,CHEMBL1141710,1,Scientific Literature,Bioorg. Med. Chem. Lett.,2008.0,
4,CHEMBL95,TACRINE,4,198.27,0,2.70,1,Nc1c2c(nc3ccccc13)CCCC2,Ki,'=',36.0,nM,7.44,,,UO_0000065,37.54,0.68,4.74,19.13,True,CHEMBL997710,Inhibition of human recombinant AChE,B,BAO_0000357,single protein format,Homo sapiens,,,,,CHEMBL220,Acetylcholinesterase,Homo sapiens,SINGLE PROTEIN,CHEMBL1141710,1,Scientific Literature,Bioorg. Med. Chem. Lett.,2008.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13475,CHEMBL4282360,,0,309.28,0,2.99,3g,CCCCCCN(C)Cc1cc(N)ccc1O.Cl.Cl,Activity,,,,,,Active,,,,,,False,CHEMBL4274220,Reactivation of VX-induced inhibition of AChE ...,B,BAO_0000249,cell membrane format,Homo sapiens,,,,,CHEMBL220,Acetylcholinesterase,Homo sapiens,SINGLE PROTEIN,CHEMBL4270596,1,Scientific Literature,Eur J Med Chem,2018.0,
13476,CHEMBL4285213,,0,295.25,0,2.60,3i,CCCN(CCC)Cc1cc(N)ccc1O.Cl.Cl,Activity,,,,,,Not Active,,,,,,False,CHEMBL4274224,Reactivation of GA-induced inhibition of AChE ...,B,BAO_0000249,cell membrane format,Homo sapiens,,,,,CHEMBL220,Acetylcholinesterase,Homo sapiens,SINGLE PROTEIN,CHEMBL4270596,1,Scientific Literature,Eur J Med Chem,2018.0,
13477,CHEMBL4282163,,0,265.18,0,1.57,3l,Cl.Cl.Nc1ccc(O)c(CN2CCCC2)c1,K,,,,,,Not Active,,,,,,False,CHEMBL4274245,Reactivation of GA-induced inhibition of AChE ...,B,BAO_0000249,cell membrane format,Homo sapiens,,,,,CHEMBL220,Acetylcholinesterase,Homo sapiens,SINGLE PROTEIN,CHEMBL4270596,1,Scientific Literature,Eur J Med Chem,2018.0,
13478,CHEMBL502,DONEPEZIL,4,379.50,0,4.36,Donepezil,COc1cc2c(cc1OC)C(=O)C(CC1CCN(Cc3ccccc3)CC1)C2,IC50,'=',22.0,nM,7.66,,,UO_0000065,,,,,False,CHEMBL4273352,Inhibition of AChE (unknown origin),A,BAO_0000357,single protein format,Homo sapiens,,,,,CHEMBL220,Acetylcholinesterase,Homo sapiens,SINGLE PROTEIN,CHEMBL4270567,1,Scientific Literature,Eur J Med Chem,2018.0,
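
In the preview, several records (for example those whose Comment is `Not Determined`, `Active` or `Not Active`) have no `pChEMBL Value`, and the `Target Name` column reads `Acetylcholinesterase`, which reflects the ACHE file downloaded above rather than HTR3A. Before any modeling, it can help to check the target and keep only rows with a numeric activity. The pandas sketch below does both on five rows copied from the preview; `summarize_pchembl` is a hypothetical helper, and taking the median per compound is an illustrative choice, not the curation performed by AMPL's `curate_data` module.

```python
import pandas as pd


def summarize_pchembl(df):
    """Keep rows with a numeric pChEMBL Value and report one value per compound.

    Uses the ChEMBL export column names shown above. The per-compound median
    is an illustrative aggregation, not AMPL's curation procedure.
    """
    values = pd.to_numeric(df["pChEMBL Value"], errors="coerce")
    measured = df.assign(pchembl=values).dropna(subset=["pchembl"])
    return (
        measured.groupby("Molecule ChEMBL ID")["pchembl"]
        .agg(["median", "count"])
        .reset_index()
    )


# Five rows copied from the preview above (other columns omitted).
example = pd.DataFrame({
    "Molecule ChEMBL ID": ["CHEMBL14598", "CHEMBL496946", "CHEMBL508778", "CHEMBL95", "CHEMBL502"],
    "Target Name": ["Acetylcholinesterase"] * 5,
    "pChEMBL Value": [None, 4.95, 5.10, 7.44, 7.66],
})

print(example["Target Name"].unique())  # which target the file actually describes
print(summarize_pchembl(example))       # CHEMBL14598 is dropped: no pChEMBL value
```

On the full table, `summarize_pchembl(orig_df)` would apply the same steps.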


In [None]:
!date # ending time (this cell was not re-run, so the date below is from an earlier session)

Fri Feb 12 14:54:35 UTC 2021
