# PubChem Molecular Formula Search example

In this example, we run a `MolecularFormulaSearch` to retrieve the SMILES of all PubChem compounds that contain only the elements C, H, B and Al.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from pubchem_api_crawler.molecular_search import MolecularFormulaSearch

In [3]:
import logging

logger = logging.getLogger('pubchem_api_crawler')
logger.setLevel(logging.INFO)
ch = logging.StreamHandler()
ch.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
logger.addHandler(ch)

In [12]:
mf = MolecularFormulaSearch()
df = mf.search(["C1-", "H1-", "B1-", "Al2-"], allow_other_elements=False, properties=["MolecularFormula", "CanonicalSMILES"])

2024-01-29 13:43:21,540 - pubchem_api_crawler.molecular_search - INFO - Exceuting Molecular Formula Search request: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/fastformula/C1-H1-B1-Al2-/property/MolecularFormula,CanonicalSMILES/JSON?AllowOtherElements=false&MaxRecords=2000000
2024-01-29 13:43:23,995 - pubchem_api_crawler.molecular_search - INFO - Request Count status: Green (0%), Request Time status: Green (0%), Service status: Green (13%)
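
The request URL in the log above shows how the search arguments end up in the PUG REST call: the formula parts are concatenated, the properties are joined with commas, and the two options become query parameters. The sketch below rebuilds that URL from the same arguments. `build_fastformula_url` is a hypothetical helper written for illustration, not a function of `pubchem_api_crawler`, and the `MaxRecords` default is simply copied from the log line.

```python
from urllib.parse import urlencode

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_fastformula_url(formula_parts, properties,
                          allow_other_elements=False, max_records=2_000_000):
    """Hypothetical helper: rebuild a URL shaped like the one in the log."""
    formula = "".join(formula_parts)   # ["C1-", "H1-"] -> "C1-H1-"
    props = ",".join(properties)       # comma-separated property names
    query = urlencode({
        # Lowercase "true"/"false", matching the value seen in the log.
        "AllowOtherElements": str(allow_other_elements).lower(),
        "MaxRecords": max_records,
    })
    return (f"{PUG_REST_BASE}/compound/fastformula/{formula}"
            f"/property/{props}/JSON?{query}")


url = build_fastformula_url(
    ["C1-", "H1-", "B1-", "Al2-"],
    ["MolecularFormula", "CanonicalSMILES"],
)
print(url)
```

Comparing the printed URL with the logged one is a quick way to check which arguments a given search actually sent.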


In [13]:
df

CID,MolecularFormula,CanonicalSMILES
160469542,C8H26Al2B2,[B](C)C.[B](C)C.C[AlH]C.C[AlH]C
159970515,C20H48Al2B2,B(C)(CCB(C)CCC)CCC.CCC[Al](C)CC[Al](C)CCC
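
As a sanity check on results like the two rows above, the element symbols in a `MolecularFormula` string can be extracted with a regular expression and compared against the allowed set. This is a minimal sketch that assumes flat formulas as returned here (no parentheses); `element_counts` and `only_contains` are hypothetical helper names, not part of `pubchem_api_crawler`.

```python
import re

# An element symbol is one uppercase letter, optionally one lowercase letter,
# followed by an optional count. A trailing charge such as "+" or "-4"
# contains no letters and is therefore ignored.
ELEMENT_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def element_counts(formula):
    """Return a dict mapping element symbol to atom count."""
    counts = {}
    for symbol, number in ELEMENT_TOKEN.findall(formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def only_contains(formula, allowed):
    """True if every element in the formula is in the allowed set."""
    return set(element_counts(formula)) <= set(allowed)


allowed = {"C", "H", "B", "Al"}
print(element_counts("C8H26Al2B2"))
print(only_contains("C20H48Al2B2", allowed))
print(only_contains("C9H15AlO9", allowed))
```

Applied to a whole result, something like `df["MolecularFormula"].map(lambda f: only_contains(f, allowed)).all()` would confirm that no other elements slipped in, assuming `df` has the column shown above.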


In [28]:
df = mf.search(["C1-200", "H1-200", "Al1-100"], allow_other_elements=True, properties=["MolecularFormula", "CanonicalSMILES"])
df

2024-01-29 13:50:26,269 - pubchem_api_crawler.molecular_search - INFO - Exceuting Molecular Formula Search request: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/fastformula/C1-200H1-200Al1-100/property/MolecularFormula,CanonicalSMILES/JSON?AllowOtherElements=true&MaxRecords=2000000
2024-01-29 13:50:39,280 - pubchem_api_crawler.molecular_search - INFO - Request Count status: Green (0%), Request Time status: Green (0%), Service status: Green (13%)
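
The second log line reports three throttling indicators, each with a color and a percentage. If these need to be acted on programmatically (for example, to pause when a status is no longer Green), the string can be parsed as sketched below. The sketch assumes the text has exactly the shape shown in the log; `parse_throttling_status` is a hypothetical helper, and how the library obtains this string is not shown in this notebook.

```python
import re

# Matches fragments such as "Service status: Green (13%)".
STATUS_PATTERN = re.compile(
    r"(?P<name>[A-Za-z ]+?) status: (?P<color>\w+) \((?P<percent>\d+)%\)"
)


def parse_throttling_status(text):
    """Map each indicator name to a (color, percent) tuple."""
    return {
        m.group("name").strip(): (m.group("color"), int(m.group("percent")))
        for m in STATUS_PATTERN.finditer(text)
    }


line = ("Request Count status: Green (0%), Request Time status: Green (0%), "
        "Service status: Green (13%)")
status = parse_throttling_status(line)
print(status)

# Simple policy for illustration: slow down if anything is not Green.
should_slow_down = any(color != "Green" for color, _ in status.values())
print(should_slow_down)
```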


CID,MolecularFormula,CanonicalSMILES
16683018,C9H15AlO9,CC(C(=O)O[Al](OC(=O)C(C)O)OC(=O)C(C)O)O
12496,C54H105AlO6,CCCCCCCCCCCCCCCCCC(=O)[O-].CCCCCCCCCCCCCCCCCC(...
8757,C6H9AlO6,CC(=O)[O-].CC(=O)[O-].CC(=O)[O-].[Al+3]
16682987,C18H39AlO4,CCCCCCCCCCCCCCCCCC(=O)O[Al].O.O
11237,C2H5AlCl2,CC[Al](Cl)Cl
...,...,...
169488426,CH9AlMgO11-4,C(=O)([O-])[O-].O.[OH-].[OH-].[OH-].[OH-].[OH-...
169490843,C66H57AlO39,CC1=C2C(=CC(=C1C(=O)O)O)C(=O)C3=C(C(=C(C(=C3C2...
169493512,C18H17AlNO2+,CC1=C(C(=CC=C1)C)[O-].CC1=NC2=C(C=CC=C2[O-])C=...
169550479,C32H16AlClN8O6S2,C1=CC=C2C(=C1)C3=NC4=C5C=CC(=CC5=C6N4[Al](N7C(...


In [25]:
df.shape

(19, 2)

In [17]:
res = mf._pug_search(["C1-", "H1-", "B-", "Al-"], allow_other_elements=False, properties=["MolecularFormula", "CanonicalSMILES"])

2024-01-29 13:44:56,097 - pubchem_api_crawler.molecular_search - INFO - Checking status for query 1078300897660591343.
2024-01-29 13:44:57,006 - pubchem_api_crawler.molecular_search - INFO - Query 1078300897660591343 is success.
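
The two log lines above suggest an asynchronous pattern: a query is submitted, then its status is checked until it succeeds. The general idea can be sketched as a polling loop. Everything here is an assumption made for illustration: `wait_for_query`, the `check_status` callable, and the status strings `"queued"`, `"running"` and `"success"` are hypothetical and do not describe how `_pug_search` is actually implemented.

```python
import time


def wait_for_query(check_status, query_id, interval=1.0, max_attempts=30,
                   sleep=time.sleep):
    """Poll check_status(query_id) until it reports success.

    Returns the number of status checks that were needed.
    """
    for attempt in range(max_attempts):
        status = check_status(query_id)
        if status == "success":
            return attempt + 1
        if status not in ("queued", "running"):
            raise RuntimeError(f"Query {query_id} failed with status {status!r}")
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"Query {query_id} did not finish in {max_attempts} checks")


# Stand-in for a real status request: succeeds on the third check.
fake_statuses = iter(["queued", "running", "success"])
attempts = wait_for_query(
    lambda query_id: next(fake_statuses),
    "1078300897660591343",
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(attempts)
```

Passing the status check in as a callable keeps the loop testable without network access; a real version would replace the lambda with a function that queries the service.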


In [18]:
res

168084494
163556649
161576177
160352291
159123289
158802573
158250967
158044531
157093180
156888304
129859217


In [19]:
len(res)

17