# **Computational Drug Discovery - Download Bioactivity Data**

This notebook downloads and pre-processes ChEMBL bioactivity data for use in a machine learning model.

---

## **ChEMBL Database**

The [*ChEMBL Database*](https://www.ebi.ac.uk/chembl/) contains curated bioactivity data for more than 2.1 million compounds, compiled from more than 81,000 documents and 1.4 million assays. The data span 14,000 targets, 2,000 cells, and 40,000 indications.
[Data as of January 11, 2022; ChEMBL version 29].

## **Installing libraries**

Install the ChEMBL web service package to retrieve bioactivity data from the ChEMBL Database.

In [None]:
! pip install chembl_webresource_client

Collecting chembl_webresource_client
  Downloading chembl_webresource_client-0.10.7-py3-none-any.whl (55 kB)
Collecting requests-cache~=0.7.0
  Downloading requests_cache-0.7.5-py3-none-any.whl (39 kB)
Collecting itsdangerous>=2.0.1
  Downloading itsdangerous-2.0.1-py3-none-any.whl (18 kB)
Collecting pyyaml>=5.4
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting url-normalize<2.0,>=1.4
  Downloading url_normalize-1.4.3-py2.

## **Importing libraries**

In [None]:
# Import necessary libraries
import pandas as pd
from chembl_webresource_client.new_client import new_client

## **Search for Target protein**

### **Target search for human epidermal growth factor receptor 2 (HER2)**

In [None]:
# Target search for HER2
target = new_client.target
target_query = target.search('HER2')
targets = pd.DataFrame.from_dict(target_query)
targets

Unnamed: 0,cross_references,organism,pref_name,score,species_group_flag,target_chembl_id,target_components,target_type,tax_id
0,[],Homo sapiens,FASN/HER2,16.0,False,CHEMBL4106134,"[{'accession': 'P04626', 'component_descriptio...",PROTEIN COMPLEX,9606
1,"[{'xref_id': 'P04626', 'xref_name': None, 'xre...",Homo sapiens,Receptor protein-tyrosine kinase erbB-2,14.0,False,CHEMBL1824,"[{'accession': 'P04626', 'component_descriptio...",SINGLE PROTEIN,9606
2,[],Homo sapiens,Epidermal growth factor receptor and ErbB2 (HE...,12.0,False,CHEMBL2111431,"[{'accession': 'P04626', 'component_descriptio...",PROTEIN FAMILY,9606
3,[],Homo sapiens,ErbB-2/ErbB-3 heterodimer,11.0,False,CHEMBL4630723,"[{'accession': 'P04626', 'component_descriptio...",PROTEIN COMPLEX,9606
4,[],Homo sapiens,Epidermal growth factor receptor,8.0,False,CHEMBL2363049,"[{'accession': 'P04626', 'component_descriptio...",PROTEIN FAMILY,9606
5,"[{'xref_id': 'P06494', 'xref_name': None, 'xre...",Rattus norvegicus,Receptor protein-tyrosine kinase erbB-2,6.0,False,CHEMBL3848,"[{'accession': 'P06494', 'component_descriptio...",SINGLE PROTEIN,10116


### **Select and retrieve bioactivity data for specific entry using entry ID**

We will assign the second entry (index 1, which corresponds to the target protein, *Receptor protein-tyrosine kinase erbB-2*) to the ***selected_target*** variable.

In [None]:
selected_target = targets.target_chembl_id[1]
selected_target

'CHEMBL1824'
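Selecting by position depends on the order in which the search returns its hits, which may differ between ChEMBL releases. As a minimal sketch, assuming the search results are available as plain dictionaries, the same entry can be chosen by its attributes instead. The list below is a hand-copied subset of the table above, and `pick_single_protein` is a hypothetical helper, not part of the ChEMBL client.

```python
# Hand-copied subset of the search results shown above (only the fields used here).
search_results = [
    {"pref_name": "FASN/HER2", "organism": "Homo sapiens",
     "target_type": "PROTEIN COMPLEX", "target_chembl_id": "CHEMBL4106134"},
    {"pref_name": "Receptor protein-tyrosine kinase erbB-2", "organism": "Homo sapiens",
     "target_type": "SINGLE PROTEIN", "target_chembl_id": "CHEMBL1824"},
    {"pref_name": "Receptor protein-tyrosine kinase erbB-2", "organism": "Rattus norvegicus",
     "target_type": "SINGLE PROTEIN", "target_chembl_id": "CHEMBL3848"},
]


def pick_single_protein(results, organism):
    """Return the ChEMBL ID of the first single-protein target for an organism."""
    for row in results:
        if row["target_type"] == "SINGLE PROTEIN" and row["organism"] == organism:
            return row["target_chembl_id"]
    raise LookupError(f"no single-protein target found for {organism}")


selected = pick_single_protein(search_results, "Homo sapiens")
print(selected)
```

With the dataframe from the search cell, the equivalent would be a boolean filter on the **target_type** and **organism** columns.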

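Here, we will retrieve only the bioactivity data for *Receptor protein-tyrosine kinase erbB-2* (CHEMBL1824) that are reported as IC$_{50}$ values. The filter selects on **standard_type** only; the rows shown in the output report **standard_value** in nM (nanomolar).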

In [None]:
activity = new_client.activity
res = activity.filter(target_chembl_id=selected_target).filter(standard_type="IC50")

In [None]:
df = pd.DataFrame.from_dict(res)

In [None]:
df.head(5)

Unnamed: 0,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,bao_label,canonical_smiles,data_validity_comment,data_validity_description,document_chembl_id,document_journal,document_year,ligand_efficiency,molecule_chembl_id,molecule_pref_name,parent_molecule_chembl_id,pchembl_value,potential_duplicate,qudt_units,record_id,relation,src_id,standard_flag,standard_relation,standard_text_value,standard_type,standard_units,standard_upper_value,standard_value,target_chembl_id,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,32264,[],CHEMBL845865,Inhibition of autophosphorylation of human Her...,F,,,BAO_0000190,BAO_0000219,cell-based format,Cc1cc(C)c(/C=C2\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)...,,,CHEMBL1134862,Bioorg. Med. Chem. Lett.,2002.0,,CHEMBL68920,,CHEMBL68920,6.52,False,http://www.openphacts.org/units/Nanomolar,119482,=,1,True,=,,IC50,nM,,300.0,CHEMBL1824,Homo sapiens,Receptor protein-tyrosine kinase erbB-2,9606,,,IC50,uM,UO_0000065,,0.3
1,,32266,[],CHEMBL615491,Inhibition of ligand induced proliferation in ...,F,,,BAO_0000190,BAO_0000219,cell-based format,Cc1cc(C)c(/C=C2\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)...,,,CHEMBL1134862,Bioorg. Med. Chem. Lett.,2002.0,,CHEMBL68920,,CHEMBL68920,5.6,False,http://www.openphacts.org/units/Nanomolar,119482,=,1,True,=,,IC50,nM,,2500.0,CHEMBL1824,Homo sapiens,Receptor protein-tyrosine kinase erbB-2,9606,,,IC50,uM,UO_0000065,,2.5
2,,32271,[],CHEMBL683802,Inhibition of autophosphorylation of human Her...,F,,,BAO_0000190,BAO_0000219,cell-based format,Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\C(=O)Nc2ncnc(N...,,,CHEMBL1134862,Bioorg. Med. Chem. Lett.,2002.0,,CHEMBL69960,,CHEMBL69960,6.4,False,http://www.openphacts.org/units/Nanomolar,119494,=,1,True,=,,IC50,nM,,400.0,CHEMBL1824,Homo sapiens,Receptor protein-tyrosine kinase erbB-2,9606,,,IC50,uM,UO_0000065,,0.4
3,,32273,[],CHEMBL615491,Inhibition of ligand induced proliferation in ...,F,,,BAO_0000190,BAO_0000219,cell-based format,Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\C(=O)Nc2ncnc(N...,,,CHEMBL1134862,Bioorg. Med. Chem. Lett.,2002.0,,CHEMBL69960,,CHEMBL69960,5.92,False,http://www.openphacts.org/units/Nanomolar,119494,=,1,True,=,,IC50,nM,,1210.0,CHEMBL1824,Homo sapiens,Receptor protein-tyrosine kinase erbB-2,9606,,,IC50,uM,UO_0000065,,1.21
4,,47937,[],CHEMBL683802,Inhibition of autophosphorylation of human Her...,F,,,BAO_0000190,BAO_0000219,cell-based format,Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\C(=O)Nc2ncnc(N...,,,CHEMBL1134862,Bioorg. Med. Chem. Lett.,2002.0,,CHEMBL67057,,CHEMBL67057,7.0,False,http://www.openphacts.org/units/Nanomolar,119500,=,1,True,=,,IC50,nM,,100.0,CHEMBL1824,Homo sapiens,Receptor protein-tyrosine kinase erbB-2,9606,,,IC50,uM,UO_0000065,,0.1


In [None]:
df.standard_type.unique()

array(['IC50'], dtype=object)
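The check above confirms that every row is an IC50 measurement. The same idea can be applied to the units, since the activity classes used later assume nM. The sketch below uses a small toy dataframe rather than the real download; the `CHEMBL_EXAMPLE` row and its `uM` unit are invented purely to show the filter doing something.

```python
import pandas as pd

# Toy stand-in for the activity dataframe; the last row is invented for illustration.
toy = pd.DataFrame({
    "molecule_chembl_id": ["CHEMBL68920", "CHEMBL69960", "CHEMBL_EXAMPLE"],
    "standard_type": ["IC50", "IC50", "IC50"],
    "standard_units": ["nM", "nM", "uM"],
    "standard_value": ["300.0", "400.0", "0.5"],
})

# Inspect which units occur, then keep only the rows reported in nM.
print(toy["standard_units"].unique())
toy_nm = toy[toy["standard_units"] == "nM"]
print(len(toy_nm), "of", len(toy), "rows are in nM")
```

If the real data contain only nM values, the filter is a no-op and costs nothing to keep.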

Finally, we will save the resulting bioactivity data to a CSV file, **bioactivity_data.csv**.

In [None]:
df.to_csv('bioactivity_data.csv', index=False)

## **Copying files to Google Drive**

First, we need to mount Google Drive in Colab so that we can access our Google Drive from within Colab.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive/', force_remount=True)


Mounted at /content/gdrive/


Next, we create a **data** folder in our **Colab Notebooks** folder on Google Drive.

In [None]:
! mkdir "/content/gdrive/My Drive/Colab Notebooks/data"

In [None]:
! cp bioactivity_data.csv "/content/gdrive/My Drive/Colab Notebooks/data"

In [None]:
! ls -l "/content/gdrive/My Drive/Colab Notebooks/data"

total 1995
-rw------- 1 root root 2042499 Jan 11 05:19 bioactivity_data.csv
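The shell commands above can also be written in Python, which avoids an error when the **data** folder already exists. This is a sketch under assumptions: a temporary directory stands in for the mounted Drive path so that the example runs anywhere, and the CSV content is a two-line placeholder.

```python
import shutil
import tempfile
from pathlib import Path

# A temporary directory stands in for "/content/gdrive/My Drive/Colab Notebooks".
drive_root = Path(tempfile.mkdtemp())
data_dir = drive_root / "data"
data_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`: no error if it exists

# Placeholder file playing the role of the real bioactivity_data.csv.
source = drive_root / "bioactivity_data.csv"
source.write_text("molecule_chembl_id,standard_value\nCHEMBL68920,300.0\n")

# shutil.copy returns the path of the newly created file inside data_dir.
copied = Path(shutil.copy(source, data_dir))
print(sorted(p.name for p in data_dir.iterdir()))
```

In Colab, `drive_root` would instead point at the mounted folder used in the cells above.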


Let's see the CSV files that we have so far.

In [None]:
! ls

bioactivity_data.csv  bioactivity_preprocessed_data.csv  gdrive  sample_data


Take a glimpse of the **bioactivity_data.csv** file created.

In [None]:
! head bioactivity_data.csv

activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,bao_label,canonical_smiles,data_validity_comment,data_validity_description,document_chembl_id,document_journal,document_year,ligand_efficiency,molecule_chembl_id,molecule_pref_name,parent_molecule_chembl_id,pchembl_value,potential_duplicate,qudt_units,record_id,relation,src_id,standard_flag,standard_relation,standard_text_value,standard_type,standard_units,standard_upper_value,standard_value,target_chembl_id,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
,32264,[],CHEMBL845865,Inhibition of autophosphorylation of human Her (p185erbB) tyrosine kinase expressed in SKOV-3 cells,F,,,BAO_0000190,BAO_0000219,cell-based format,Cc1cc(C)c(/C=C2\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)c32)[nH]1,,,CHEMBL1134862,Bioorg. Med. Chem. Lett.,2002.0,,CHEMBL68920,,CHEMBL68920,6.52,False,http://www.o

## **Handling missing data**
If a compound has a missing value in the **standard_value** column, it is removed from the dataset.

In [None]:
df2 = df[df.standard_value.notna()]
df2

Unnamed: 0,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,bao_label,canonical_smiles,data_validity_comment,data_validity_description,document_chembl_id,document_journal,document_year,ligand_efficiency,molecule_chembl_id,molecule_pref_name,parent_molecule_chembl_id,pchembl_value,potential_duplicate,qudt_units,record_id,relation,src_id,standard_flag,standard_relation,standard_text_value,standard_type,standard_units,standard_upper_value,standard_value,target_chembl_id,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,32264,[],CHEMBL845865,Inhibition of autophosphorylation of human Her...,F,,,BAO_0000190,BAO_0000219,cell-based format,Cc1cc(C)c(/C=C2\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)...,,,CHEMBL1134862,Bioorg. Med. Chem. Lett.,2002.0,,CHEMBL68920,,CHEMBL68920,6.52,False,http://www.openphacts.org/units/Nanomolar,119482,=,1,True,=,,IC50,nM,,300.0,CHEMBL1824,Homo sapiens,Receptor protein-tyrosine kinase erbB-2,9606,,,IC50,uM,UO_0000065,,0.3
1,,32266,[],CHEMBL615491,Inhibition of ligand induced proliferation in ...,F,,,BAO_0000190,BAO_0000219,cell-based format,Cc1cc(C)c(/C=C2\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)...,,,CHEMBL1134862,Bioorg. Med. Chem. Lett.,2002.0,,CHEMBL68920,,CHEMBL68920,5.60,False,http://www.openphacts.org/units/Nanomolar,119482,=,1,True,=,,IC50,nM,,2500.0,CHEMBL1824,Homo sapiens,Receptor protein-tyrosine kinase erbB-2,9606,,,IC50,uM,UO_0000065,,2.5
2,,32271,[],CHEMBL683802,Inhibition of autophosphorylation of human Her...,F,,,BAO_0000190,BAO_0000219,cell-based format,Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\C(=O)Nc2ncnc(N...,,,CHEMBL1134862,Bioorg. Med. Chem. Lett.,2002.0,,CHEMBL69960,,CHEMBL69960,6.40,False,http://www.openphacts.org/units/Nanomolar,119494,=,1,True,=,,IC50,nM,,400.0,CHEMBL1824,Homo sapiens,Receptor protein-tyrosine kinase erbB-2,9606,,,IC50,uM,UO_0000065,,0.4
3,,32273,[],CHEMBL615491,Inhibition of ligand induced proliferation in ...,F,,,BAO_0000190,BAO_0000219,cell-based format,Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\C(=O)Nc2ncnc(N...,,,CHEMBL1134862,Bioorg. Med. Chem. Lett.,2002.0,,CHEMBL69960,,CHEMBL69960,5.92,False,http://www.openphacts.org/units/Nanomolar,119494,=,1,True,=,,IC50,nM,,1210.0,CHEMBL1824,Homo sapiens,Receptor protein-tyrosine kinase erbB-2,9606,,,IC50,uM,UO_0000065,,1.21
4,,47937,[],CHEMBL683802,Inhibition of autophosphorylation of human Her...,F,,,BAO_0000190,BAO_0000219,cell-based format,Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\C(=O)Nc2ncnc(N...,,,CHEMBL1134862,Bioorg. Med. Chem. Lett.,2002.0,,CHEMBL67057,,CHEMBL67057,7.00,False,http://www.openphacts.org/units/Nanomolar,119500,=,1,True,=,,IC50,nM,,100.0,CHEMBL1824,Homo sapiens,Receptor protein-tyrosine kinase erbB-2,9606,,,IC50,uM,UO_0000065,,0.1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3857,,20708048,[],CHEMBL4628614,Inhibition of HER2 (unknown origin) using FAM-...,B,,,BAO_0000190,BAO_0000357,single protein format,O=C(/C=C/CN1CCCC1)N1CCOc2cc3ncnc(Nc4ccc(OCc5cc...,,,CHEMBL4627322,Bioorg Med Chem Lett,2020.0,"{'bei': '14.63', 'le': '0.28', 'lle': '2.28', ...",CHEMBL4635100,,CHEMBL4635100,8.40,False,http://www.openphacts.org/units/Nanomolar,3487704,=,1,True,=,,IC50,nM,,4.0,CHEMBL1824,Homo sapiens,Receptor protein-tyrosine kinase erbB-2,9606,,,IC50,uM,UO_0000065,,0.004
3858,,20708049,[],CHEMBL4628614,Inhibition of HER2 (unknown origin) using FAM-...,B,,,BAO_0000190,BAO_0000357,single protein format,CCN(CC)C/C=C/C(=O)N1CCOc2cc3ncnc(Nc4ccc(OCc5cc...,,,CHEMBL4627322,Bioorg Med Chem Lett,2020.0,"{'bei': '14.58', 'le': '0.28', 'lle': '2.03', ...",CHEMBL4648271,,CHEMBL4648271,8.40,False,http://www.openphacts.org/units/Nanomolar,3487707,=,1,True,=,,IC50,nM,,4.0,CHEMBL1824,Homo sapiens,Receptor protein-tyrosine kinase erbB-2,9606,,,IC50,uM,UO_0000065,,0.004
3859,,20708050,[],CHEMBL4628614,Inhibition of HER2 (unknown origin) using FAM-...,B,,,BAO_0000190,BAO_0000357,single protein format,CS(=O)(=O)CCNCc1ccc(-c2ccc3ncnc(Nc4ccc(OCc5ccc...,,,CHEMBL4627322,Bioorg Med Chem Lett,2020.0,"{'bei': '13.57', 'le': '0.27', 'lle': '1.75', ...",CHEMBL554,LAPATINIB,CHEMBL554,7.89,False,http://www.openphacts.org/units/Nanomolar,3487717,=,1,True,=,,IC50,nM,,13.0,CHEMBL1824,Homo sapiens,Receptor protein-tyrosine kinase erbB-2,9606,,,IC50,uM,UO_0000065,,0.013
3860,,20708051,[],CHEMBL4628614,Inhibition of HER2 (unknown origin) using FAM-...,B,,,BAO_0000190,BAO_0000357,single protein format,COc1cc2ncnc(Nc3ccc(F)c(Cl)c3)c2cc1OCCCN1CCOCC1,,,CHEMBL4627322,Bioorg Med Chem Lett,2020.0,"{'bei': '14.66', 'le': '0.29', 'lle': '2.27', ...",CHEMBL939,GEFITINIB,CHEMBL939,6.55,False,http://www.openphacts.org/units/Nanomolar,3487718,=,1,True,=,,IC50,nM,,281.0,CHEMBL1824,Homo sapiens,Receptor protein-tyrosine kinase erbB-2,9606,,,IC50,uM,UO_0000065,,0.281
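The boolean mask used above is equivalent to `dropna` with a `subset` argument. The sketch below shows both on a toy dataframe (the missing value is invented for illustration) and prints the row labels that survive, because filtering keeps the original labels and therefore leaves gaps in the index.

```python
import pandas as pd

# Toy stand-in for df; the second row has no standard_value.
toy = pd.DataFrame({
    "molecule_chembl_id": ["CHEMBL68920", "CHEMBL69960", "CHEMBL67057"],
    "standard_value": ["300.0", None, "100.0"],
})

kept_mask = toy[toy.standard_value.notna()]            # approach used in this notebook
kept_dropna = toy.dropna(subset=["standard_value"])    # equivalent built-in

print(kept_dropna.index.tolist())                      # row labels keep their gaps
print(len(toy) - len(kept_dropna), "row(s) removed")
```

The gaps matter later: anything combined with the filtered dataframe by index has to use the same labels, or the index has to be reset first.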


## **Data pre-processing of the bioactivity data**

### **Labeling compounds as either being active, inactive or intermediate**
The bioactivity data are IC50 values, where a lower value indicates a more potent compound. Compounds with an IC50 of 1,000 nM or less are labeled **active**, those with an IC50 of 10,000 nM or more are labeled **inactive**, and those with values between 1,000 and 10,000 nM are labeled **intermediate**.

In [None]:
bioactivity_class = []
for i in df2.standard_value:
  if float(i) >= 10000:
    bioactivity_class.append("inactive")
  elif float(i) <= 1000:
    bioactivity_class.append("active")
  else:
    bioactivity_class.append("intermediate")
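To make the boundary behaviour explicit, the same cut-offs can be wrapped in a small function and tried on a few values. `classify_ic50` is a hypothetical helper name; the thresholds are the ones used in the loop above.

```python
def classify_ic50(value_nm):
    """Label an IC50 (in nM) using the same cut-offs as the loop above."""
    value_nm = float(value_nm)  # also accepts numeric strings such as "300.0"
    if value_nm >= 10000:
        return "inactive"
    if value_nm <= 1000:
        return "active"
    return "intermediate"


for example in ["300.0", 1000, 1210.0, 10000, 25000]:
    print(example, classify_ic50(example))
```

Both boundary values are inclusive: exactly 1,000 nM counts as active and exactly 10,000 nM counts as inactive.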

### **Collect the *molecule_chembl_id* values into a list**

In [None]:
mol_cid = []
for i in df2.molecule_chembl_id:
  mol_cid.append(i)

### **Collect the *canonical_smiles* values into a list**

In [None]:
canonical_smiles = []
for i in df2.canonical_smiles:
  canonical_smiles.append(i)

### **Collect the *standard_value* values into a list**

In [None]:
standard_value = []
for i in df2.standard_value:
  standard_value.append(i)

### **Combine the 4 lists into a dataframe**

In [None]:
data_tuples = list(zip(mol_cid, canonical_smiles, bioactivity_class, standard_value))
df3 = pd.DataFrame( data_tuples,  columns=['molecule_chembl_id', 'canonical_smiles', 'bioactivity_class', 'standard_value'])

In [None]:
df3

Unnamed: 0,molecule_chembl_id,canonical_smiles,bioactivity_class,standard_value
0,CHEMBL68920,Cc1cc(C)c(/C=C2\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)...,active,300.0
1,CHEMBL68920,Cc1cc(C)c(/C=C2\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)...,intermediate,2500.0
2,CHEMBL69960,Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\C(=O)Nc2ncnc(N...,active,400.0
3,CHEMBL69960,Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\C(=O)Nc2ncnc(N...,intermediate,1210.0
4,CHEMBL67057,Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\C(=O)Nc2ncnc(N...,active,100.0
...,...,...,...,...
2929,CHEMBL4635100,O=C(/C=C/CN1CCCC1)N1CCOc2cc3ncnc(Nc4ccc(OCc5cc...,active,4.0
2930,CHEMBL4648271,CCN(CC)C/C=C/C(=O)N1CCOc2cc3ncnc(Nc4ccc(OCc5cc...,active,4.0
2931,CHEMBL554,CS(=O)(=O)CCNCc1ccc(-c2ccc3ncnc(Nc4ccc(OCc5ccc...,active,13.0
2932,CHEMBL939,COc1cc2ncnc(Nc3ccc(F)c(Cl)c3)c2cc1OCCCN1CCOCC1,active,281.0
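The loops and the `zip` above can be condensed into a few vectorized steps. The following is a sketch rather than the notebook's own method: it assumes NumPy is available, uses `numpy.select` with the same thresholds, and runs on a toy dataframe whose identifiers, SMILES, and values are invented.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df2; identifiers, SMILES, and values are invented for illustration.
# The index has a gap (0, 1, 5) to mimic rows removed by the missing-value filter.
toy = pd.DataFrame(
    {
        "molecule_chembl_id": ["TOY1", "TOY2", "TOY3"],
        "canonical_smiles": ["CCO", "CCN", "CCC"],
        "standard_value": ["300.0", "2500.0", "12000.0"],
    },
    index=[0, 1, 5],
)

# Convert to float once, then label every row with the same cut-offs as the loop.
values = toy["standard_value"].astype(float)
labels = np.select(
    [values >= 10000, values <= 1000],
    ["inactive", "active"],
    default="intermediate",
)

# assign() attaches the array by position, so the original row labels are kept
# and no index alignment takes place.
toy_labeled = toy[["molecule_chembl_id", "canonical_smiles", "standard_value"]].assign(
    bioactivity_class=labels
)
print(toy_labeled)
```

Because the labels are attached by position, this approach is not affected by gaps in the row index.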


### **Alternative method**

In [None]:
selection = ['molecule_chembl_id', 'canonical_smiles', 'standard_value']
df3 = df2[selection]
df3

Unnamed: 0,molecule_chembl_id,canonical_smiles,standard_value
0,CHEMBL68920,Cc1cc(C)c(/C=C2\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)...,300.0
1,CHEMBL68920,Cc1cc(C)c(/C=C2\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)...,2500.0
2,CHEMBL69960,Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\C(=O)Nc2ncnc(N...,400.0
3,CHEMBL69960,Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\C(=O)Nc2ncnc(N...,1210.0
4,CHEMBL67057,Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\C(=O)Nc2ncnc(N...,100.0
...,...,...,...
3857,CHEMBL4635100,O=C(/C=C/CN1CCCC1)N1CCOc2cc3ncnc(Nc4ccc(OCc5cc...,4.0
3858,CHEMBL4648271,CCN(CC)C/C=C/C(=O)N1CCOc2cc3ncnc(Nc4ccc(OCc5cc...,4.0
3859,CHEMBL554,CS(=O)(=O)CCNCc1ccc(-c2ccc3ncnc(Nc4ccc(OCc5ccc...,13.0
3860,CHEMBL939,COc1cc2ncnc(Nc3ccc(F)c(Cl)c3)c2cc1OCCCN1CCOCC1,281.0


In [None]:
# df3 keeps df2's original row labels, while the new Series is numbered from 0.
# Reset the index first so the labels line up row by row instead of leaving NaN values.
df3 = pd.concat([df3.reset_index(drop=True), pd.Series(bioactivity_class, name='bioactivity_class')], axis=1)
df3

Unnamed: 0,molecule_chembl_id,canonical_smiles,standard_value,bioactivity_class
0,CHEMBL68920,Cc1cc(C)c(/C=C2\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)...,300.0,active
1,CHEMBL68920,Cc1cc(C)c(/C=C2\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)...,2500.0,intermediate
2,CHEMBL69960,Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\C(=O)Nc2ncnc(N...,400.0,active
3,CHEMBL69960,Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\C(=O)Nc2ncnc(N...,1210.0,intermediate
4,CHEMBL67057,Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\C(=O)Nc2ncnc(N...,100.0,active
...,...,...,...,...
2929,CHEMBL4635100,O=C(/C=C/CN1CCCC1)N1CCOc2cc3ncnc(Nc4ccc(OCc5cc...,4.0,active
2930,CHEMBL4648271,CCN(CC)C/C=C/C(=O)N1CCOc2cc3ncnc(Nc4ccc(OCc5cc...,4.0,active
2931,CHEMBL554,CS(=O)(=O)CCNCc1ccc(-c2ccc3ncnc(Nc4ccc(OCc5ccc...,13.0,active
2932,CHEMBL939,COc1cc2ncnc(Nc3ccc(F)c(Cl)c3)c2cc1OCCCN1CCOCC1,281.0,active


Save the dataframe to a CSV file.

In [None]:
df3.to_csv('bioactivity_preprocessed_data.csv', index=False)

In [None]:
! ls -l

total 2236
-rw-r--r-- 1 root root 2042499 Jan 11 05:18 bioactivity_data.csv
-rw-r--r-- 1 root root  233541 Jan 11 05:24 bioactivity_preprocessed_data.csv
drwx------ 5 root root    4096 Jan 11 05:18 gdrive
drwxr-xr-x 1 root root    4096 Jan  7 14:33 sample_data


Let's copy the file to Google Drive.

In [None]:
! cp bioactivity_preprocessed_data.csv "/content/gdrive/My Drive/Colab Notebooks/data"

In [None]:
! ls "/content/gdrive/My Drive/Colab Notebooks/data"

bioactivity_data.csv  bioactivity_preprocessed_data.csv


---