<a href="https://colab.research.google.com/github/mr-nahash/drug-discovery-antipsychotics-D2DR/blob/main/CDD_ML_Part_1_sigma1_Download_bioactivity_and_target.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Bioinformatics Project - Computational Drug Discovery [Part 1] Download Bioactivity Data (Concise Version)**

By Fernando Martinez

In this Jupyter notebook, we build a real-life **data science project**: a machine learning model trained on ChEMBL bioactivity data.

In **Part 1**, we perform data collection and pre-processing on data retrieved from the ChEMBL Database.

Notes on this concise version:
* Redundant code cells were deleted.
* Code cells for saving files to Google Drive have been deleted.

---

## **ChEMBL Database**


The [*ChEMBL Database*](https://www.ebi.ac.uk/chembl/) contains curated bioactivity data for more than 2 million compounds. It is compiled from more than 76,000 documents and 1.2 million assays, and the data span 13,000 targets, 1,800 cells and 33,000 indications.
[Data as of April 25, 2022; ChEMBL version 26].

## **Installing libraries**

Install the ChEMBL web service package so that we can retrieve bioactivity data from the ChEMBL Database.

In [None]:
! pip install chembl_webresource_client



## **Importing libraries**

In [None]:
# Import necessary libraries
import pandas as pd
from chembl_webresource_client.new_client import new_client
help(new_client)

Help on NewClient in module chembl_webresource_client.new_client object:

class NewClient(builtins.object)
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)



## **Search for Target protein**

### **Target search for sigma1**

In [None]:
# Target search for sigma1
target = new_client.target
target_query = target.search('sigma1')
targets = pd.DataFrame.from_dict(target_query)
targets

,cross_references,organism,pref_name,score,species_group_flag,target_chembl_id,target_components,target_type,tax_id
0,"[{'xref_id': 'NBK23440', 'xref_name': 'Sigma 1...",Mus musculus,Sigma opioid receptor,14.0,False,CHEMBL3465,"[{'accession': 'O55242', 'component_descriptio...",SINGLE PROTEIN,10090
1,"[{'xref_id': 'Q9R0C9', 'xref_name': None, 'xre...",Rattus norvegicus,Sigma opioid receptor,14.0,False,CHEMBL3602,"[{'accession': 'Q9R0C9', 'component_descriptio...",SINGLE PROTEIN,10116
2,"[{'xref_id': 'Q60492', 'xref_name': None, 'xre...",Cavia porcellus,Sigma-1 receptor,14.0,False,CHEMBL4153,"[{'accession': 'Q60492', 'component_descriptio...",SINGLE PROTEIN,10141
3,"[{'xref_id': 'Q99720', 'xref_name': None, 'xre...",Homo sapiens,Sigma opioid receptor,11.0,False,CHEMBL287,"[{'accession': 'Q99720', 'component_descriptio...",SINGLE PROTEIN,9606
4,[],Homo sapiens,Sigma receptor,8.0,False,CHEMBL4524009,"[{'accession': 'Q99720', 'component_descriptio...",PROTEIN FAMILY,9606


### **Select and retrieve bioactivity data for the *sigma1 receptor* (4th entry, index 3)**

We assign the entry at index 3 (the fourth row: the human *Sigma opioid receptor*, CHEMBL287, i.e. the sigma-1 receptor) to the ***selected_target*** variable.

In [None]:
selected_target = targets.target_chembl_id[3]
selected_target

'CHEMBL287'
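Picking the target by row position assumes the search always returns rows in the same order, which may change between ChEMBL releases. A minimal sketch of a more explicit selection, filtering on the `organism` and `target_type` columns shown in the search output above; the small `targets` DataFrame is a hypothetical stand-in built from those printed rows, not a live query.

```python
import pandas as pd

# Hypothetical stand-in for the `targets` search result printed above
targets = pd.DataFrame({
    "organism": ["Mus musculus", "Rattus norvegicus", "Cavia porcellus",
                 "Homo sapiens", "Homo sapiens"],
    "pref_name": ["Sigma opioid receptor", "Sigma opioid receptor",
                  "Sigma-1 receptor", "Sigma opioid receptor", "Sigma receptor"],
    "target_chembl_id": ["CHEMBL3465", "CHEMBL3602", "CHEMBL4153",
                         "CHEMBL287", "CHEMBL4524009"],
    "target_type": ["SINGLE PROTEIN", "SINGLE PROTEIN", "SINGLE PROTEIN",
                    "SINGLE PROTEIN", "PROTEIN FAMILY"],
})

# Keep only human single-protein targets instead of relying on row order
human = targets[(targets["organism"] == "Homo sapiens")
                & (targets["target_type"] == "SINGLE PROTEIN")]

# Fail loudly if the filter is ambiguous rather than silently picking a row
assert len(human) == 1, f"expected one match, found {len(human)}"
selected_target = human["target_chembl_id"].iloc[0]
print(selected_target)  # CHEMBL287
```

If a future search returns more than one human single-protein match, the assertion stops the notebook so the choice can be made deliberately.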

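Here, we retrieve only the bioactivity data for the *sigma1 receptor* (CHEMBL287) that are reported as IC$_{50}$ values. Note that the query filters only on `standard_type`; the nM (nanomolar) units seen in the output are not enforced by the filter itself.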

In [None]:
activity = new_client.activity
res = activity.filter(target_chembl_id=selected_target).filter(standard_type="IC50")

In [None]:
df = pd.DataFrame.from_dict(res)

In [None]:
df.head(10)

,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,bao_label,canonical_smiles,data_validity_comment,data_validity_description,document_chembl_id,document_journal,document_year,ligand_efficiency,molecule_chembl_id,molecule_pref_name,parent_molecule_chembl_id,pchembl_value,potential_duplicate,qudt_units,record_id,relation,src_id,standard_flag,standard_relation,standard_text_value,standard_type,standard_units,standard_upper_value,standard_value,target_chembl_id,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,33306,[],CHEMBL804884,The compound was evaluated for binding affinit...,B,,,BAO_0000190,BAO_0000249,cell membrane format,C/C(=N\C1CCCCC1)NC12CC3CC(CC(C3)C1)C2,,,CHEMBL1127129,Bioorg. Med. Chem. Lett.,1993.0,"{'bei': '26.03', 'le': '0.49', 'lle': '2.84', ...",CHEMBL67010,,CHEMBL67010,7.14,False,http://www.openphacts.org/units/Nanomolar,115498,=,1,True,=,,IC50,nM,,72.0,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,72.0
1,,34563,[],CHEMBL804884,The compound was evaluated for binding affinit...,B,,,BAO_0000190,BAO_0000249,cell membrane format,C/C(=N\C12CC3CC(CC(C3)C1)C2)Nc1ccccc1C.Cl,,,CHEMBL1127129,Bioorg. Med. Chem. Lett.,1993.0,"{'bei': '29.11', 'le': '0.53', 'lle': '3.43', ...",CHEMBL542638,,CHEMBL1191528,8.22,False,http://www.openphacts.org/units/Nanomolar,115495,=,1,True,=,,IC50,nM,,6.0,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,6.0
2,,46551,[],CHEMBL804884,The compound was evaluated for binding affinit...,B,,,BAO_0000190,BAO_0000249,cell membrane format,C/C(=N\C1CCCCC1)Nc1ccccc1C.Cl,,,CHEMBL1127129,Bioorg. Med. Chem. Lett.,1993.0,"{'bei': '34.93', 'le': '0.65', 'lle': '3.89', ...",CHEMBL544054,,CHEMBL1192761,8.05,False,http://www.openphacts.org/units/Nanomolar,115496,=,1,True,=,,IC50,nM,,9.0,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,9.0
3,,51708,[],CHEMBL804884,The compound was evaluated for binding affinit...,B,,,BAO_0000190,BAO_0000249,cell membrane format,C/C(=N\C12CC3CC(CC(C3)C1)C2)NC12CC3CC(CC(C3)C1)C2,,,CHEMBL1127129,Bioorg. Med. Chem. Lett.,1993.0,"{'bei': '23.87', 'le': '0.44', 'lle': '2.87', ...",CHEMBL67388,,CHEMBL67388,7.8,False,http://www.openphacts.org/units/Nanomolar,115497,=,1,True,=,,IC50,nM,,16.0,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,16.0
4,,53078,[],CHEMBL804884,The compound was evaluated for binding affinit...,B,,,BAO_0000190,BAO_0000249,cell membrane format,C/C(=N\c1ccccc1C)Nc1ccccc1C.Cl,,,CHEMBL1127129,Bioorg. Med. Chem. Lett.,1993.0,"{'bei': '32.83', 'le': '0.59', 'lle': '3.35', ...",CHEMBL538754,,CHEMBL1189436,7.82,False,http://www.openphacts.org/units/Nanomolar,115500,=,1,True,=,,IC50,nM,,15.0,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,15.0
5,,61954,[],CHEMBL804884,The compound was evaluated for binding affinit...,B,,,BAO_0000190,BAO_0000249,cell membrane format,C/C(=N\C1CC2CCC1C2)Nc1ccccc1C,,,CHEMBL1127129,Bioorg. Med. Chem. Lett.,1993.0,"{'bei': '33.20', 'le': '0.61', 'lle': '4.04', ...",CHEMBL63508,,CHEMBL63508,8.05,False,http://www.openphacts.org/units/Nanomolar,115501,=,1,True,=,,IC50,nM,,9.0,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,9.0
6,,67137,[],CHEMBL804884,The compound was evaluated for binding affinit...,B,,,BAO_0000190,BAO_0000249,cell membrane format,C/C(=N\c1ccc(Br)cc1C)Nc1ccc(Br)cc1C,,,CHEMBL1127129,Bioorg. Med. Chem. Lett.,1993.0,"{'bei': '19.15', 'le': '0.52', 'lle': '1.60', ...",CHEMBL67665,,CHEMBL67665,7.58,False,http://www.openphacts.org/units/Nanomolar,115499,=,1,True,=,,IC50,nM,,26.0,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,26.0
7,,68349,[],CHEMBL804884,The compound was evaluated for binding affinit...,B,,,BAO_0000190,BAO_0000249,cell membrane format,Cc1ccccc1NC(=N)Nc1ccccc1C,,,CHEMBL1127129,Bioorg. Med. Chem. Lett.,1993.0,"{'bei': '31.32', 'le': '0.57', 'lle': '3.73', ...",CHEMBL282433,DITOLYLGUANIDINE,CHEMBL282433,7.5,False,http://www.openphacts.org/units/Nanomolar,115494,=,1,True,=,,IC50,nM,,32.0,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,32.0
8,,108356,[],CHEMBL803934,Sigma receptor binding was determined by label...,B,,,BAO_0000190,BAO_0000019,assay format,Fc1ccc(C(OCCN2CCN(C/C=C/c3ccco3)CC2)c2ccc(F)cc...,,,CHEMBL1129953,J. Med. Chem.,1997.0,"{'bei': '16.80', 'le': '0.31', 'lle': '2.38', ...",CHEMBL159967,,CHEMBL159967,7.37,False,http://www.openphacts.org/units/Nanomolar,317786,=,1,True,=,,IC50,nM,,43.0,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,43.0
9,,117620,[],CHEMBL803934,Sigma receptor binding was determined by label...,B,,,BAO_0000190,BAO_0000019,assay format,c1ccc(CCCN2CCN(CCOC(c3ccccc3)c3ccccc3)CC2)cc1,,,CHEMBL1129953,J. Med. Chem.,1997.0,"{'bei': '18.74', 'le': '0.34', 'lle': '2.73', ...",CHEMBL26320,,CHEMBL26320,7.77,False,http://www.openphacts.org/units/Nanomolar,317768,=,1,True,=,,IC50,nM,,17.0,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,17.0
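Because the query filters only on `standard_type`, it may be worth confirming the units before comparing values. A minimal pandas sketch, assuming the `standard_units` and `standard_value` columns shown above; the rows and the `CHEMBL_A`-style IDs are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows; the IDs and the non-nM record are made up for illustration
df = pd.DataFrame({
    "molecule_chembl_id": ["CHEMBL_A", "CHEMBL_B", "CHEMBL_C"],
    "standard_type": ["IC50", "IC50", "IC50"],
    "standard_units": ["nM", "ug.mL-1", "nM"],
    "standard_value": [72.0, 0.5, 6.0],
})

# Report how many records use each unit (including missing units) before filtering
print(df["standard_units"].value_counts(dropna=False))

# Keep only IC50 records expressed in nanomolar so the values are comparable
df_nm = df[df["standard_units"] == "nM"]
print(df_nm)
```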


Next, we save the resulting raw bioactivity data to a CSV file, **sigma1_data_raw.csv**.

In [None]:
df.to_csv('sigma1_data_raw.csv', index=False)

## **Copying files to Google Drive**

In [None]:
from google.colab import drive
drive.mount('/content/gdrive/', force_remount=True)

Mounted at /content/gdrive/


In [None]:
! mkdir -p "/content/gdrive/My Drive/Colab Notebooks/data"
! cp sigma1_data_raw.csv "/content/gdrive/My Drive/Colab Notebooks/data"
! ls -l "/content/gdrive/My Drive/Colab Notebooks/data"
! ls
! head sigma1_data_raw.csv

mkdir: cannot create directory ‘/content/gdrive/My Drive/Colab Notebooks/data’: File exists
total 946
-rw------- 1 root root  72601 Feb  4 05:32 sigma1_bioactivity_data_preprocessed.csv
-rw------- 1 root root 895639 Feb  4 07:06 sigma1_data_raw.csv
bioactivity_data_preprocessed.csv  sample_data
bioactivity_data_raw.csv	   sigma1_bioactivity_data_preprocessed.csv
gdrive				   sigma1_data_raw.csv
activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,bao_label,canonical_smiles,data_validity_comment,data_validity_description,document_chembl_id,document_journal,document_year,ligand_efficiency,molecule_chembl_id,molecule_pref_name,parent_molecule_chembl_id,pchembl_value,potential_duplicate,qudt_units,record_id,relation,src_id,standard_flag,standard_relation,standard_text_value,standard_type,standard_units,standard_upper_value,standard_value,target_chembl_id,target_organism,target_pr

## **Handling missing data**
If any compound has a missing value in the **standard_value** column, drop it.

In [None]:
df2 = df[df.standard_value.notna()]
df2

,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,bao_label,canonical_smiles,data_validity_comment,data_validity_description,document_chembl_id,document_journal,document_year,ligand_efficiency,molecule_chembl_id,molecule_pref_name,parent_molecule_chembl_id,pchembl_value,potential_duplicate,qudt_units,record_id,relation,src_id,standard_flag,standard_relation,standard_text_value,standard_type,standard_units,standard_upper_value,standard_value,target_chembl_id,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,33306,[],CHEMBL804884,The compound was evaluated for binding affinit...,B,,,BAO_0000190,BAO_0000249,cell membrane format,C/C(=N\C1CCCCC1)NC12CC3CC(CC(C3)C1)C2,,,CHEMBL1127129,Bioorg. Med. Chem. Lett.,1993.0,"{'bei': '26.03', 'le': '0.49', 'lle': '2.84', ...",CHEMBL67010,,CHEMBL67010,7.14,False,http://www.openphacts.org/units/Nanomolar,115498,=,1,True,=,,IC50,nM,,72.0,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,72.0
1,,34563,[],CHEMBL804884,The compound was evaluated for binding affinit...,B,,,BAO_0000190,BAO_0000249,cell membrane format,C/C(=N\C12CC3CC(CC(C3)C1)C2)Nc1ccccc1C.Cl,,,CHEMBL1127129,Bioorg. Med. Chem. Lett.,1993.0,"{'bei': '29.11', 'le': '0.53', 'lle': '3.43', ...",CHEMBL542638,,CHEMBL1191528,8.22,False,http://www.openphacts.org/units/Nanomolar,115495,=,1,True,=,,IC50,nM,,6.0,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,6.0
2,,46551,[],CHEMBL804884,The compound was evaluated for binding affinit...,B,,,BAO_0000190,BAO_0000249,cell membrane format,C/C(=N\C1CCCCC1)Nc1ccccc1C.Cl,,,CHEMBL1127129,Bioorg. Med. Chem. Lett.,1993.0,"{'bei': '34.93', 'le': '0.65', 'lle': '3.89', ...",CHEMBL544054,,CHEMBL1192761,8.05,False,http://www.openphacts.org/units/Nanomolar,115496,=,1,True,=,,IC50,nM,,9.0,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,9.0
3,,51708,[],CHEMBL804884,The compound was evaluated for binding affinit...,B,,,BAO_0000190,BAO_0000249,cell membrane format,C/C(=N\C12CC3CC(CC(C3)C1)C2)NC12CC3CC(CC(C3)C1)C2,,,CHEMBL1127129,Bioorg. Med. Chem. Lett.,1993.0,"{'bei': '23.87', 'le': '0.44', 'lle': '2.87', ...",CHEMBL67388,,CHEMBL67388,7.80,False,http://www.openphacts.org/units/Nanomolar,115497,=,1,True,=,,IC50,nM,,16.0,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,16.0
4,,53078,[],CHEMBL804884,The compound was evaluated for binding affinit...,B,,,BAO_0000190,BAO_0000249,cell membrane format,C/C(=N\c1ccccc1C)Nc1ccccc1C.Cl,,,CHEMBL1127129,Bioorg. Med. Chem. Lett.,1993.0,"{'bei': '32.83', 'le': '0.59', 'lle': '3.35', ...",CHEMBL538754,,CHEMBL1189436,7.82,False,http://www.openphacts.org/units/Nanomolar,115500,=,1,True,=,,IC50,nM,,15.0,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,15.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1898,,16466402,[],CHEMBL3768981,Displacement of 5-(Dimethylamino)-2-(6-((5-(4-...,B,,,BAO_0000190,BAO_0000219,cell-based format,CC(C)=CCN1CC[C@]2(C)c3cc(O)ccc3C[C@H]1[C@H]2C,,,CHEMBL3763126,Eur. J. Med. Chem.,2016.0,"{'bei': '28.32', 'le': '0.53', 'lle': '4.20', ...",CHEMBL60542,(+)-PENTAZOCINE,CHEMBL60542,8.08,False,http://www.openphacts.org/units/Nanomolar,2757235,=,1,True,=,,IC50,nM,,8.24,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,8.24
1899,,16466403,[],CHEMBL3768982,Displacement of 5-(Dimethylamino)-2-{6-[(5-(4-...,B,,,BAO_0000190,BAO_0000219,cell-based format,COc1ccc2c(c1)CCCC2CCCCN1CCC(C)CC1,,,CHEMBL3763126,Eur. J. Med. Chem.,2016.0,"{'bei': '25.09', 'le': '0.47', 'lle': '2.90', ...",CHEMBL177952,,CHEMBL177952,7.92,False,http://www.openphacts.org/units/Nanomolar,2757034,=,1,True,=,,IC50,nM,,12.1,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,12.1
1900,,16466404,[],CHEMBL3768982,Displacement of 5-(Dimethylamino)-2-{6-[(5-(4-...,B,,,BAO_0000190,BAO_0000219,cell-based format,COc1ccc2c(CCCCN3CCC(C)CC3)cccc2c1,,,CHEMBL3763126,Eur. J. Med. Chem.,2016.0,"{'bei': '25.60', 'le': '0.47', 'lle': '3.07', ...",CHEMBL176941,,CHEMBL176941,7.97,False,http://www.openphacts.org/units/Nanomolar,2757035,=,1,True,=,,IC50,nM,,10.6,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,10.6
1901,,16466405,[],CHEMBL3768983,Displacement of 5-(Dimethylamino)-2-{6-[(5-(4-...,B,,,BAO_0000190,BAO_0000219,cell-based format,CC(C)=CCN1CC[C@]2(C)c3cc(O)ccc3C[C@H]1[C@H]2C,,,CHEMBL3763126,Eur. J. Med. Chem.,2016.0,"{'bei': '28.45', 'le': '0.53', 'lle': '4.24', ...",CHEMBL60542,(+)-PENTAZOCINE,CHEMBL60542,8.12,False,http://www.openphacts.org/units/Nanomolar,2757235,=,1,True,=,,IC50,nM,,7.59,CHEMBL287,Homo sapiens,Sigma opioid receptor,9606,,,IC50,nM,UO_0000065,,7.59


Rather than assuming there is no missing data, compare `len(df)` with `len(df2)` to check whether any rows were dropped. The same code cell can also be reused for the bioactivity data of other target proteins.
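A minimal sketch of checking for and dropping missing values with pandas, on hypothetical toy data: count the missing values first, then drop them with `dropna(subset=...)`. The `reset_index(drop=True)` call is an addition not in the original cell; it relabels the rows 0, 1, 2 and so on, so that later concatenation with list-based columns lines up by position.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing standard_value
df = pd.DataFrame({
    "molecule_chembl_id": ["CHEMBL_A", "CHEMBL_B", "CHEMBL_C", "CHEMBL_D"],
    "standard_value": [72.0, np.nan, 9.0, 16.0],
})

# Count missing values before dropping anything
n_missing = df["standard_value"].isna().sum()
print(f"{n_missing} of {len(df)} rows lack a standard_value")

# Drop those rows, then reset the index so it runs 0..n-1 again
df2 = df.dropna(subset=["standard_value"]).reset_index(drop=True)
print(df2)
```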

## **Data pre-processing of the bioactivity data**

**Labeling compounds as either being active, inactive or intermediate**
The bioactivity data are IC$_{50}$ values in nM. Compounds with values of 1,000 nM or less are considered **active**, while those with values of 10,000 nM or more are considered **inactive**. Compounds with values between 1,000 and 10,000 nM are labeled **intermediate**.

In [None]:
bioactivity_class = []
for i in df2.standard_value:
  if float(i) >= 10000:
    bioactivity_class.append("inactive")
  elif float(i) <= 1000:
    bioactivity_class.append("active")
  else:
    bioactivity_class.append("intermediate")
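The loop above can also be written in vectorized form. A minimal sketch using NumPy's `np.select`, keeping the same boundaries as the loop (10,000 nM or more is inactive, 1,000 nM or less is active); the values are hypothetical and include both boundary cases.

```python
import numpy as np
import pandas as pd

# Hypothetical IC50 values in nM, stored as text as in the ChEMBL download
df2 = pd.DataFrame({"standard_value": ["72.0", "1000", "5000", "10000", "25000"]})

# Convert the text values to floats before comparing them
values = df2["standard_value"].astype(float)

# Same rules and order as the loop: >= 10000 inactive, then <= 1000 active
conditions = [values >= 10000, values <= 1000]
choices = ["inactive", "active"]
df2["bioactivity_class"] = np.select(conditions, choices, default="intermediate")
print(df2)
```

`np.select` returns the choice for the first condition that is true, so the order of `conditions` mirrors the order of the `if`/`elif` branches.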

**Combine the 3 columns (molecule_chembl_id, canonical_smiles, standard_value) and bioactivity_class into a DataFrame**

In [None]:
selection = ['molecule_chembl_id','canonical_smiles','standard_value']
df3 = df2[selection]

df3

In [None]:
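# Remember: a lower standard_value (IC50) means a more potent compound
# Wrap the list in a Series that shares df3's index so the rows stay aligned
bioactivity_class = pd.Series(bioactivity_class, name='bioactivity_class', index=df3.index)
df4 = pd.concat([df3, bioactivity_class], axis=1)
df4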

,molecule_chembl_id,canonical_smiles,standard_value,bioactivity_class
0,CHEMBL67010,C/C(=N\C1CCCCC1)NC12CC3CC(CC(C3)C1)C2,72.0,active
1,CHEMBL542638,C/C(=N\C12CC3CC(CC(C3)C1)C2)Nc1ccccc1C.Cl,6.0,active
2,CHEMBL544054,C/C(=N\C1CCCCC1)Nc1ccccc1C.Cl,9.0,active
3,CHEMBL67388,C/C(=N\C12CC3CC(CC(C3)C1)C2)NC12CC3CC(CC(C3)C1)C2,16.0,active
4,CHEMBL538754,C/C(=N\c1ccccc1C)Nc1ccccc1C.Cl,15.0,active
...,...,...,...,...
1898,CHEMBL60542,CC(C)=CCN1CC[C@]2(C)c3cc(O)ccc3C[C@H]1[C@H]2C,8.24,
1899,CHEMBL177952,COc1ccc2c(c1)CCCC2CCCCN1CCC(C)CC1,12.1,
1900,CHEMBL176941,COc1ccc2c(CCCCN3CCC(C)CC3)cccc2c1,10.6,
1901,CHEMBL60542,CC(C)=CCN1CC[C@]2(C)c3cc(O)ccc3C[C@H]1[C@H]2C,7.59,
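The empty `bioactivity_class` entries in the last rows of the output above are consistent with an index-alignment problem: after filtering, `df3` keeps its original row labels, while a Series built from a plain list is labelled 0, 1, 2 and so on, and `pd.concat(..., axis=1)` pairs rows by label rather than by position. The toy example below, with hypothetical values, reproduces the effect and shows one way to avoid it.

```python
import pandas as pd

# Hypothetical filtered frame: the row labelled 1 was dropped earlier
df3 = pd.DataFrame({"standard_value": [72.0, 9.0, 16.0]}, index=[0, 2, 3])
classes = ["active", "active", "active"]

# Misaligned: the list-based Series gets labels 0, 1, 2, so rows pair by label
misaligned = pd.concat([df3, pd.Series(classes, name="bioactivity_class")], axis=1)
print(misaligned)  # label 3 has no class, and an extra row appears for label 1

# Aligned: give the Series the same index as df3
aligned = pd.concat(
    [df3, pd.Series(classes, name="bioactivity_class", index=df3.index)], axis=1
)
print(aligned)
```

Resetting the index of the filtered frame with `reset_index(drop=True)` before concatenating is an equivalent fix.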


Save the DataFrame to a CSV file and copy it to Google Drive.

In [None]:
df4.to_csv('sigma1_bioactivity_data_preprocessed.csv', index=False)
! cp sigma1_bioactivity_data_preprocessed.csv "/content/gdrive/My Drive/Colab Notebooks/data"

In [None]:
! ls -l

total 1832
-rw-r--r-- 1 root root  72601 Feb  4 05:29 bioactivity_data_preprocessed.csv
-rw-r--r-- 1 root root 895639 Feb  4 04:58 bioactivity_data_raw.csv
drwx------ 5 root root   4096 Feb  4 05:13 gdrive
drwxr-xr-x 1 root root   4096 Feb  1 14:32 sample_data
-rw-r--r-- 1 root root 895639 Feb  4 05:12 sigma1_data_raw.csv


---

In [None]:
# Select the compounds labelled as active (bioactivity_class lives in df4, not df)
active = df4[df4['bioactivity_class'] == 'active']
active.head()

In [None]:
# Inspect the object type and a few SMILES strings
print(type(df4))
print(df['canonical_smiles'])
print(df.iloc[0]['canonical_smiles'])
print(df.iloc[3]['canonical_smiles'])

0                   C/C(=N\C1CCCCC1)NC12CC3CC(CC(C3)C1)C2
1               C/C(=N\C12CC3CC(CC(C3)C1)C2)Nc1ccccc1C.Cl
2                           C/C(=N\C1CCCCC1)Nc1ccccc1C.Cl
3       C/C(=N\C12CC3CC(CC(C3)C1)C2)NC12CC3CC(CC(C3)C1)C2
4                          C/C(=N\c1ccccc1C)Nc1ccccc1C.Cl
                              ...                        
1898        CC(C)=CCN1CC[C@]2(C)c3cc(O)ccc3C[C@H]1[C@H]2C
1899                    COc1ccc2c(c1)CCCC2CCCCN1CCC(C)CC1
1900                    COc1ccc2c(CCCCN3CCC(C)CC3)cccc2c1
1901        CC(C)=CCN1CC[C@]2(C)c3cc(O)ccc3C[C@H]1[C@H]2C
1902               CCC(=O)N(c1ccccc1)C1CCN(CCc2ccccc2)CC1
Name: canonical_smiles, Length: 1903, dtype: object
C/C(=N\C1CCCCC1)NC12CC3CC(CC(C3)C1)C2
C/C(=N\C12CC3CC(CC(C3)C1)C2)NC12CC3CC(CC(C3)C1)C2


## **Molecule visualization**


In [None]:
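%%bash
# Install Miniconda into /usr/local, then Python 3.7 and RDKit 2020.09.2 from conda-forge
# On recent Colab runtimes, `pip install rdkit` (RDKit is published on PyPI) is a simpler alternative
wget -c https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh -b -f -p /usr/local
conda config --set always_yes yes --set changeps1 no
conda install -q -y -c conda-forge python=3.7
conda install -q -y -c conda-forge rdkit==2020.09.2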

In [None]:
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')

try:
  from rdkit import Chem
  from rdkit.Chem.Draw import IPythonConsole
except ImportError:
  print('Stopping RUNTIME. Colaboratory will restart automatically. Please run again.')
  exit()

In [None]:
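# Print the SMILES of the first four compounds
for a in range(0, 4):
  print(a, df.iloc[a]['canonical_smiles'])

# Build an RDKit molecule from the last one printed (index 3) and display it
mol = Chem.MolFromSmiles(df.iloc[a]['canonical_smiles'])
mol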

In [None]:
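from rdkit.Chem import Draw

# Convert SMILES strings into RDKit molecules (first 12 compounds, to keep the grid readable)
subset = df.head(12)
mol_list = [Chem.MolFromSmiles(s) for s in subset['canonical_smiles']]

# Draw the molecules four per row, labelled with their ChEMBL IDs
img = Draw.MolsToGridImage(mol_list, molsPerRow=4,
                           legends=list(subset['molecule_chembl_id']))
img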

## **Statistical analysis**
