webchem

webchem is an R package for retrieving chemical information from the web. It interacts with a suite of web APIs for chemical information.

Currently implemented in webchem

| Source | Functions | API Docs | API key |
|---|---|---|---|
| Chemical Identifier Resolver (CIR) | cir_query() | link | none |
| ChemSpider | get_csid(), csid_compinfo(), csid_extcompinfo() | link | link |
| PubChem | get_cid(), cid_compinfo() | link | none |
| Chemical Translation Service (CTS) | cts_convert(), cts_compinfo() | link | none |

API keys

ChemSpider functions require a security token. Register at RSC (https://www.rsc.org/rsc-id/register) to obtain one.
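To keep the token out of scripts, one option is to read it from an environment variable. The following is a minimal sketch in base R, not part of webchem; the helper name `get_token()` and the variable name `CHEMSPIDER_TOKEN` are assumptions chosen for illustration.

```r
# Hypothetical helper: read the token from an environment variable.
# The variable name is an assumption; use whatever name you set in
# ~/.Renviron or with Sys.setenv().
get_token <- function(var = "CHEMSPIDER_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("No token found: set the ", var, " environment variable.")
  }
  token
}

# Demonstration with a placeholder value (not a real token)
Sys.setenv(CHEMSPIDER_TOKEN = "placeholder-token")
token <- get_token()
```

The resulting `token` can then be passed to the ChemSpider functions through their `token` argument.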

Installation

Install from CRAN (stable)

webchem is currently not available on CRAN.

Install from GitHub (development)

install.packages("devtools")
library("devtools")
install_github("edild/webchem")

Quickstart

library("webchem")

Chemical Identifier Resolver (CIR)

CAS numbers and molecular weight for Triclosan. Use first = TRUE to return only the first hit.

cir_query('Triclosan', 'cas')
#> [1] "3380-34-5"   "112099-35-1" "88032-08-0"
cir_query('Triclosan', 'cas', first = TRUE)
#> [1] "3380-34-5"
cir_query('Triclosan', 'mw')
#> [1] "289.5451"
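The CAS numbers returned above can be sanity-checked offline, because every CAS number ends in a check digit: the digits before it are weighted 1, 2, 3, ... from right to left, and the sum modulo 10 must equal the last digit. The sketch below uses only base R; `looks_like_cas()` is a hypothetical helper, not a webchem function.

```r
# Hypothetical helper, not part of webchem: checks CAS format and check digit.
looks_like_cas <- function(x) {
  vapply(x, function(cas) {
    # Format: 2-7 digits, 2 digits, 1 check digit, separated by hyphens
    if (!grepl("^[0-9]{2,7}-[0-9]{2}-[0-9]$", cas)) return(FALSE)
    digits <- as.integer(strsplit(gsub("-", "", cas), "")[[1]])
    n <- length(digits)
    lead <- rev(digits[-n])  # digits before the check digit, right to left
    sum(lead * seq_along(lead)) %% 10 == digits[n]
  }, logical(1), USE.NAMES = FALSE)
}

cas_hits <- c("3380-34-5", "112099-35-1", "88032-08-0")  # output copied from above
looks_like_cas(cas_hits)
#> [1] TRUE TRUE TRUE
looks_like_cas("3380-34-6")  # wrong check digit
#> [1] FALSE
```

A passing check only means the number is well formed; it does not confirm that the number belongs to the queried compound.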

Query SMILES and InChIKey from CAS (Triclosan). Inputs might be ambiguous, so we can specify where to search using resolver=.

cir_query('3380-34-5', 'smiles')
#> [1] "C1=CC(=CC(=C1OC2=CC=C(C=C2Cl)Cl)O)Cl"
cir_query('3380-34-5', 'stdinchikey', resolver = 'cas_number')
#> [1] "InChIKey=XEFQLINVKFYRCS-UHFFFAOYSA-N"
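Note that the result above carries an `InChIKey=` prefix, while the key is used without it in later queries. A small base R sketch for stripping the prefix and checking the standard 27-character layout (14, 10, and 1 uppercase letters separated by hyphens) could look like this; the variable names are illustrative only.

```r
key <- "InChIKey=XEFQLINVKFYRCS-UHFFFAOYSA-N"  # output copied from above

# Drop the prefix so the bare key can be reused as a query
bare_key <- sub("^InChIKey=", "", key)
bare_key
#> [1] "XEFQLINVKFYRCS-UHFFFAOYSA-N"

# Check the standard InChIKey layout
grepl("^[A-Z]{14}-[A-Z]{10}-[A-Z]$", bare_key)
#> [1] TRUE
```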

Convert an InChIKey (Triclosan) to a ChemSpider ID and retrieve the number of rings:

cir_query('XEFQLINVKFYRCS-UHFFFAOYSA-N', 'chemspider_id', first = TRUE)
#> [1] "5363"
cir_query('XEFQLINVKFYRCS-UHFFFAOYSA-N', 'ring_count')
#> [1] "2"

ChemSpider

You'll need an API key:

token <- '<YOUR TOKEN HERE>'

Retrieve the ChemSpider ID of Triclosan:

(id <- get_csid('Triclosan', token = token))
#> [1] "5363"

Use this ID to query information from ChemSpider:

csid_extcompinfo(id, token = token)
#>                                                                          CSID 
#>                                                                        "5363" 
#>                                                                            MF 
#>                                                      "C_{12}H_{7}Cl_{3}O_{2}" 
#>                                                                        SMILES 
#>                                              "c1cc(c(cc1Cl)O)Oc2ccc(cc2Cl)Cl" 
#>                                                                         InChI 
#> "InChI=1/C12H7Cl3O2/c13-7-1-3-11(9(15)5-7)17-12-4-2-8(14)6-10(12)16/h1-6,16H" 
#>                                                                      InChIKey 
#>                                                   "XEFQLINVKFYRCS-UHFFFAOYAS" 
#>                                                                   AverageMass 
#>                                                                    "289.5418" 
#>                                                               MolecularWeight 
#>                                                                    "289.5418" 
#>                                                              MonoisotopicMass 
#>                                                                  "287.951172" 
#>                                                                   NominalMass 
#>                                                                         "288" 
#>                                                                         ALogP 
#>                                                                        "5.53" 
#>                                                                         XLogP 
#>                                                                           "5" 
#>                                                                    CommonName 
#>                                                                   "Triclosan"
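The output above is a named character vector, so masses and logP values arrive as strings and the formula contains markup. The sketch below shows one way to post-process such a result in base R. It works on a hand-copied subset of the output above rather than on a live query, and the variable names are illustrative.

```r
# A subset of the output above, copied by hand for illustration
info <- c(
  CSID = "5363",
  MF = "C_{12}H_{7}Cl_{3}O_{2}",
  AverageMass = "289.5418",
  MonoisotopicMass = "287.951172",
  NominalMass = "288",
  ALogP = "5.53",
  XLogP = "5"
)

# Convert numeric-looking fields; as.numeric() drops names, so restore them
num_fields <- c("AverageMass", "MonoisotopicMass", "NominalMass", "ALogP", "XLogP")
num_info <- setNames(as.numeric(info[num_fields]), num_fields)

# Strip the subscript markup from the molecular formula
formula <- gsub("[_{}]", "", info[["MF"]])
formula
#> [1] "C12H7Cl3O2"
```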

PubChem

Retrieve the PubChem CID:

get_cid('Triclosan')
#>  [1] "5564"     "131203"   "627458"   "15942656" "16220126" "16220128"
#>  [7] "16220129" "16220130" "18413505" "22947105" "23656593" "24848164"
#> [13] "25023954" "25023955" "25023956" "25023957" "25023958" "25023959"
#> [19] "25023960" "25023961" "25023962" "25023963" "25023964" "25023965"
#> [25] "25023966" "25023967" "25023968" "25023969" "25023970" "25023971"
#> [31] "25023972" "25023973" "45040608" "45040609" "67606151" "71752714"
cid <- get_cid('3380-34-5')

Use this CID to retrieve some chemical properties:

props <- cid_compinfo(cid)
props$InChIKey
#> [1] "XEFQLINVKFYRCS-UHFFFAOYSA-N"
props$MolecularWeight
#> [1] "289.541780"
props$IUPACName
#> [1] "5-chloro-2-(2,4-dichlorophenoxy)phenol"

Chemical Translation Service (CTS)

CTS converts between nearly all types of chemical identifiers:

cts_convert(query = '3380-34-5', from = 'CAS', to = 'PubChem CID')
#> [1] "5564"
cts_convert(query = '3380-34-5', from = 'CAS', to = 'ChemSpider')
#> [1] "5363"
(inchk <- cts_convert(query = 'Triclosan', from = 'Chemical Name', to = 'inchikey'))
#> [1] "XEFQLINVKFYRCS-UHFFFAOYSA-N"
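When converting many identifiers, a single failed request should not abort the whole run, and the web service should not be flooded with requests. The following is a rough sketch of such a loop in base R. `query_all()` and `fake_lookup()` are hypothetical names; `fake_lookup()` is an offline stand-in so the example runs without network access.

```r
# Hypothetical helper: apply a lookup function to each query, return NA on
# error, and pause between calls.
query_all <- function(queries, lookup, ..., pause = 0.5) {
  vapply(queries, function(q) {
    res <- tryCatch(lookup(q, ...), error = function(e) NA_character_)
    Sys.sleep(pause)
    # Keep only the first hit; empty results become NA
    if (length(res) == 0) NA_character_ else as.character(res[[1]])
  }, character(1))
}

# Offline stand-in for a real lookup; fails for unknown input
fake_lookup <- function(query, ...) {
  known <- c("3380-34-5" = "5564")
  if (!query %in% names(known)) stop("not found")
  known[[query]]
}

res <- query_all(c("3380-34-5", "not-a-cas"), fake_lookup, pause = 0)
res
#> 3380-34-5 not-a-cas 
#>    "5564"        NA
```

With webchem loaded, the same wrapper could presumably be called as `query_all(cas_numbers, cts_convert, from = 'CAS', to = 'PubChem CID')`, since the extra arguments are passed through to the lookup function.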

Moreover, we can retrieve a lot of the information stored in the CTS database using the InChIKey:

info <- cts_compinfo(inchikey = inchk)
info[1:5]
#> $inchikey
#> [1] "XEFQLINVKFYRCS-UHFFFAOYSA-N"
#> 
#> $inchicode
#> [1] "InChI=1S/C12H7Cl3O2/c13-7-1-3-11(9(15)5-7)17-12-4-2-8(14)6-10(12)16/h1-6,16H"
#> 
#> $molweight
#> [1] 289.5418
#> 
#> $exactmass
#> [1] 287.9512
#> 
#> $formula
#> [1] "C12H7Cl3O2"
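Since the result is a list, its scalar fields can be collected into a one-row data frame, which is convenient when stacking results for several compounds. The sketch below uses a hand-copied version of the five fields shown above instead of a live query; `info_copy` and `info_df` are illustrative names.

```r
# First five fields of the output above, copied by hand for illustration
info_copy <- list(
  inchikey = "XEFQLINVKFYRCS-UHFFFAOYSA-N",
  inchicode = "InChI=1S/C12H7Cl3O2/c13-7-1-3-11(9(15)5-7)17-12-4-2-8(14)6-10(12)16/h1-6,16H",
  molweight = 289.5418,
  exactmass = 287.9512,
  formula = "C12H7Cl3O2"
)

# Each list element becomes one column
info_df <- as.data.frame(info_copy, stringsAsFactors = FALSE)
str(info_df)
```

This only works for fields of length one; list elements holding several values would need to be summarised or dropped first.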

Acknowledgements

Without the fantastic web services webchem wouldn't be here. Therefore, kudos to the web service providers and developers!

Related Projects

If you're more familiar with Python than with R, check out Matt Swain's repositories: ChemSpiPy, PubChemPy and CIRpy provide functionality similar to webchem.

Contributors

Meta
