Small molecules are commonly represented by machine-readable strings (SMILES, InChI, SMARTS, among others). In contrast, IUPAC (International Union of Pure and Applied Chemistry) names are devised for human readers. The authors trained a language translation model that treats SMILES and IUPAC names as two different languages. 81 million SMILES were downloaded from PubChem and converted to SELFIES for model training. The corresponding IUPAC names were obtained with the ChemAxon molconvert software.
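To make the "two languages" framing more concrete, the following is a minimal sketch of how SELFIES strings could be split into symbols and mapped to integer IDs before being fed to a sequence-to-sequence model. It is not the authors' tokenizer, which is not described here: the helper names (`tokenize_selfies`, `build_vocab`, `encode`) and the special tokens (`<pad>`, `<bos>`, `<eos>`) are hypothetical and chosen for illustration only. The sketch relies only on the fact that a SELFIES string is a sequence of bracketed symbols, with `.` separating disconnected fragments.

```python
import re
from typing import Dict, Iterable, List

# A SELFIES string is a run of bracketed symbols such as "[C]" or "[=O]";
# "." separates disconnected fragments.
SELFIES_SYMBOL = re.compile(r"\[[^\[\]]*\]|\.")


def tokenize_selfies(selfies: str) -> List[str]:
    """Split a SELFIES string into its symbols (hypothetical helper)."""
    tokens = SELFIES_SYMBOL.findall(selfies)
    # Reject input with characters outside bracketed symbols, e.g. raw SMILES.
    if "".join(tokens) != selfies:
        raise ValueError("not a well-formed SELFIES string: %r" % selfies)
    return tokens


def build_vocab(sequences: Iterable[str]) -> Dict[str, int]:
    """Assign an integer ID to every symbol seen in the given sequences."""
    # Special tokens are an assumption of this sketch, not taken from the model.
    vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
    for seq in sequences:
        for tok in tokenize_selfies(seq):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(selfies: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a SELFIES string into token IDs framed by <bos> and <eos>."""
    ids = [vocab[tok] for tok in tokenize_selfies(selfies)]
    return [vocab["<bos>"]] + ids + [vocab["<eos>"]]


if __name__ == "__main__":
    # "[C][C][O]" is the SELFIES form of ethanol (SMILES "CCO").
    vocab = build_vocab(["[C][C][O]", "[C][=O]"])
    print(tokenize_selfies("[C][C][O]"))
    print(encode("[C][=O]", vocab))
```

In a translation setup the IUPAC names on the target side would need their own tokenization scheme; that part is left out because nothing here describes how the authors handled it.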
- EOS model ID: eos4se9
- Slug: smiles2iupac
- Input: Compound
- Input Shape: Single
- Task: Representation
- Output: Text
- Output Type: String
- Output Shape: Single
- Interpretation: IUPAC name predicted for the input SMILES
- Publication
- Source Code
- Ersilia contributor: carcablop
If you use this model, please cite the original authors of the model and the Ersilia Model Hub.
This package is licensed under a GPL-3.0 license. The model contained within this package is licensed under an MIT license.
Notice: Ersilia grants access to these models 'as is', as provided by the original authors. Please refer to the original code repository and/or publication if you use the model in your research.
The Ersilia Open Source Initiative is a Non Profit Organization (1192266) whose mission is to equip labs, universities and clinics in low- and middle-income countries (LMIC) with AI/ML tools for infectious disease research.
Help us achieve our mission!