Search by canonical SMILES to retrieve all stereoisomers #42

Open
BalooRM opened this issue Mar 26, 2020 · 1 comment
BalooRM commented Mar 26, 2020

Is it possible to perform a PUG REST synchronous (fastidentity) search to retrieve all related isomers for a canonical SMILES string (unspecified stereochemistry)? Currently, get_cids() returns a list with a single CID.

For example, the following request returns the desired information as JSON. It uses CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O, the canonical SMILES for albuterol (CID 2083).

https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/fastidentity/smiles/CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O/cids/JSON?identity_type=same_isotope

Returns:

{
  "IdentifierList": {
    "CID": [
      2083,
      123600,
      182176
    ]
  }
}
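
The request above can also be issued directly from Python without PubChemPy. The following is a minimal sketch using only the standard library; the helper names (`fastidentity_url`, `parse_cids`, `fastidentity_cids`) are hypothetical and not part of PubChemPy. It percent-encodes the SMILES so that it stays a single path segment. SMILES that contain bond-direction characters such as "/" may still need to be sent another way, which is an assumption worth checking against the PUG REST documentation.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def fastidentity_url(smiles, identity_type="same_isotope"):
    """Build a fastidentity URL that asks for matching CIDs as JSON."""
    # safe="" percent-encodes every reserved character, including "(" and "=".
    encoded = quote(smiles, safe="")
    return (f"{PUG_REST}/compound/fastidentity/smiles/{encoded}"
            f"/cids/JSON?identity_type={identity_type}")


def parse_cids(payload):
    """Extract the CID list from an IdentifierList JSON response."""
    return json.loads(payload)["IdentifierList"]["CID"]


def fastidentity_cids(smiles, identity_type="same_isotope"):
    """Fetch the CIDs over the network (not called at import time)."""
    with urlopen(fastidentity_url(smiles, identity_type), timeout=30) as resp:
        return parse_cids(resp.read().decode("utf-8"))
```

Calling `fastidentity_cids("CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O")` should reproduce the CID list shown above, provided the service behaves as it did when this issue was opened.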
BalooRM commented Apr 18, 2020

My fork (https://github.com/BalooRM/PubChemPy) has an update to pubchempy.py that permits searching by SMILES to retrieve specific isomers. In the example below, a fastidentity search with identity_type='same_isotope' on the canonical SMILES for albuterol retrieves its two stereoisomers and the non-stereospecific structure in PubChem. PubChem also holds isotopically labeled forms of albuterol, which this identity type does not return.

The synchronous ("fast") searches are documented here:
https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest

The following output is generated by the test code which follows.

get_compounds by SMILES
CID      2083
IUPAC Name       4-[2-(tert-butylamino)-1-hydroxyethyl]-2-(hydroxymethyl)phenol
Canonical SMILES         CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O
Isomeric SMILES  CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O

get_compounds by SMILES: searchtype='fastidentity', identity_type='same_isotope'
CID      2083
IUPAC Name       4-[2-(tert-butylamino)-1-hydroxyethyl]-2-(hydroxymethyl)phenol
Canonical SMILES         CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O
Isomeric SMILES  CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O
CID      123600
IUPAC Name       4-[(1R)-2-(tert-butylamino)-1-hydroxyethyl]-2-(hydroxymethyl)phenol
Canonical SMILES         CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O
Isomeric SMILES  CC(C)(C)NC[C@@H](C1=CC(=C(C=C1)O)CO)O
CID      182176
IUPAC Name       4-[(1S)-2-(tert-butylamino)-1-hydroxyethyl]-2-(hydroxymethyl)phenol
Canonical SMILES         CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O
Isomeric SMILES  CC(C)(C)NC[C@H](C1=CC(=C(C=C1)O)CO)O

get_cids by SMILES
[2083]

get_cids by SMILES: searchtype=fastidentity, identity_type='same_isotope'
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/fastidentity/smiles/cids/JSON?identity_type=same_isotope
[2083, 123600, 182176]

# Test code. The searchtype and identity_type arguments rely on the fork described above.
import pubchempy as pcp

# Canonical SMILES for albuterol (CID 2083), stereochemistry unspecified
mycansmiles = "CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O"


def print_compound(compound):
    """Print the identifying fields of one pubchempy Compound."""
    print('CID\t', compound.cid)
    print('IUPAC Name\t', compound.iupac_name)
    print('Canonical SMILES\t', compound.canonical_smiles)
    print('Isomeric SMILES\t', compound.isomeric_smiles)


# Default lookup: returns a single record (CID 2083)
print("get_compounds by SMILES")
for compound in pcp.get_compounds(mycansmiles, 'smiles'):
    print_compound(compound)

# Identity search: also returns the (R) and (S) stereoisomers
print("\nget_compounds by SMILES: searchtype='fastidentity', identity_type='same_isotope'")
for compound in pcp.get_compounds(mycansmiles, 'smiles', searchtype='fastidentity', identity_type='same_isotope'):
    print_compound(compound)

print("\nget_cids by SMILES")
print(pcp.get_cids(mycansmiles, 'smiles'))

print("\nget_cids by SMILES: searchtype=fastidentity, identity_type='same_isotope'")
print("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/fastidentity/smiles/cids/JSON?identity_type=same_isotope")
print(pcp.get_cids(mycansmiles, 'smiles', searchtype='fastidentity', identity_type='same_isotope'))
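
The URL printed by the test code has no SMILES in its path. That is consistent with sending the SMILES in a POST body instead, although whether the fork does so is an assumption here. The sketch below builds such a request with the standard library only. The helper name `fastidentity_post_request` is hypothetical, and the form field name `smiles` is assumed to match the input namespace.

```python
from urllib.parse import urlencode
from urllib.request import Request

FASTIDENTITY_CIDS = ("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/"
                     "fastidentity/smiles/cids/JSON")


def fastidentity_post_request(smiles, identity_type="same_isotope"):
    """Build (but do not send) a POST request carrying the SMILES in the body."""
    # Search options stay in the query string, as in the printed URL.
    url = FASTIDENTITY_CIDS + "?" + urlencode({"identity_type": identity_type})
    # The structure goes in a form-encoded body, so it needs no path escaping.
    body = urlencode({"smiles": smiles}).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` and decoding the JSON response would then give the CID list. Keeping the SMILES out of the path avoids having to escape characters such as "(" and "=".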
