# NuBBE - Fetch Raw Data

**Date:** 26/01/23

**Done by:** Gustavo H. M. Sousa


This notebook describes the process of gathering data from [NuBBE](https://nubbe.iq.unesp.br/portal/nubbe-search.html) (Núcleo de Bioensaios, Biossíntese e Ecofisiologia de Produtos Naturais). 

**Basic description:**
NuBBEDB is an attractive source of information for the natural products and medicinal chemistry communities. It is a useful tool for studies on naturally occurring bioactive compounds, molecular and physicochemical properties, database generation, virtual screening, dereplication, metabolomics, and the design and synthesis of bioactive compounds. The NuBBE database contains a variety of natural products isolated from Brazilian biodiversity and provides chemical (metabolic class, chemical structure, physicochemical properties, common and IUPAC names, and molecular mass), biological (species, geographic location, biological activities), pharmacological, and spectroscopic (molar mass and nuclear magnetic resonance) data. It is also an effort to make natural products accessible for virtual screening in the academic community, with 3D structures in a format compatible with the most widely used docking programs.

The first step was to download all compounds. Currently, more than 2000 compounds are available for download on the web platform. Clicking the `Search Compounds(s)` button without entering any search terms returns the entire database.

In [1]:
import pandas as pd

![image.png](attachment:image.png)

Then, after waiting a couple of minutes, we can scroll down the page and click the `.mol2` button to download all molecular structure files in this format.

![image.png](attachment:image.png)

The downloaded `zip` file was then extracted, and the structures were converted into `SMILES` with the `openbabel` software so that they could finally be saved in a `csv` file. The command used was:

` obabel *.mol2 -osmi -O nubbe.smi `
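For readers who prefer to script the extraction and conversion, the same steps could be sketched in Python roughly as follows. This is only an illustration, not the procedure used in this notebook: the helper names `extract_mol2` and `build_obabel_command` are hypothetical, and the sketch assumes the `obabel` executable is installed and on the `PATH`.

```python
import zipfile
from pathlib import Path


def extract_mol2(zip_path, out_dir):
    """Extract an archive and return the sorted paths of the .mol2 files in it."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    # rglob also finds the files if the archive wraps them in a subfolder.
    return sorted(out_dir.rglob("*.mol2"))


def build_obabel_command(mol2_files, output="nubbe.smi"):
    """Build an argument list equivalent to `obabel *.mol2 -osmi -O nubbe.smi`."""
    # Without a shell the `*.mol2` glob is not expanded, so every input file
    # is listed explicitly.
    return ["obabel", *map(str, mol2_files), "-osmi", "-O", str(output)]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the archive has been extracted.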

The `nubbe.smi` file is just a text file with tab-separated values. It was manually converted to a `csv` file, since pandas reads this format easily.
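The manual conversion could also be scripted. The sketch below is a minimal, standard-library version under a few assumptions: each line of `nubbe.smi` holds a SMILES string, a tab, and the molecule title; the title is the NuBBE identifier; and the output stays tab-delimited, because the notebook later reads the file with `sep="\t"`. The function name `smi_to_csv` is hypothetical, and the column names are taken from the data frame displayed further down.

```python
import csv


def smi_to_csv(smi_path, csv_path):
    """Copy a tab-separated .smi file to a tab-delimited file with a header row."""
    rows_written = 0
    with open(smi_path, newline="") as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t")
        writer.writerow(["smiles", "nubbe_id"])
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split at the first tab: SMILES on the left, title on the right.
            smiles, _, title = line.partition("\t")
            writer.writerow([smiles, title.strip()])
            rows_written += 1
    return rows_written
```

A call such as `smi_to_csv("nubbe.smi", "data/nubbe.csv")` would then produce a file that `pd.read_csv(..., sep="\t")` can load directly.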

A glimpse of the data and its format can be seen in the cell below:

In [5]:
nubbe_data = pd.read_csv('data/nubbe.csv', sep="\t")
nubbe_data.head(5)

,smiles,nubbe_id
0,CC1(C)[C@H]2CC[C@@H]3[C@@]4(C)CC[C@H]([C@@H](C...,NuBBE_1003
1,CC([C@@H](/C=C/[C@H]([C@H]1CCC2=C3C=CC4=CC(=O)...,NuBBE_1004
2,CC([C@@H](/C=C/[C@H]([C@H]1CC[C@]23[C@@]4(CC(=...,NuBBE_1005
3,O=C1c2c(O)cccc2[C@@H](O)[C@@H](C)O1,NuBBE_1006
4,O=C1c2c(O)ccc(O)c2C[C@@H](C)O1,NuBBE_1007


In [7]:
print(f"The size of the dataset is: {nubbe_data.shape}")
nubbe_data.dtypes

The size of the dataset is: (2222, 2)


smiles      object
nubbe_id    object
dtype: object