Support valences of 4 and 6 for Te #1204
Comments
@hsiaoyi0504 all of those molecules have chemistry problems (i.e. invalid valence states on main-group elements). What do you expect to happen with them? There's documentation out there on the web about why these are rejected by the RDKit and how to process them anyway. Does that not help you? |
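As a rough illustration of the "process them anyway" route mentioned above, the following minimal sketch parses a SMILES without sanitization and lists the chemistry problems RDKit finds, instead of just getting `None` back. It assumes a reasonably recent RDKit release that provides `Chem.DetectChemistryProblems`. The helper name `list_chemistry_problems` and the `"ParseError"` label are hypothetical, chosen only for this example.

```python
from rdkit import Chem


def list_chemistry_problems(smiles):
    """Return (problem type, message) pairs for a SMILES string.

    Hypothetical helper: the molecule is parsed with sanitize=False so that
    valence problems are reported rather than causing the parse to fail.
    """
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        # The SMILES syntax itself is broken; "ParseError" is our own label.
        return [("ParseError", "SMILES syntax could not be parsed")]
    # DetectChemistryProblems runs the sanitization checks and collects
    # the failures instead of raising on the first one.
    return [(p.GetType(), p.Message()) for p in Chem.DetectChemistryProblems(mol)]


# A neutral four-valent nitrogen is flagged; ethanol is not.
for smi in ["CN(C)(C)C", "CCO"]:
    print(smi, list_chemistry_problems(smi))
```

This only reports problems; deciding how to fix each structure is still left to the user.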
None of those in your file (Te, Al, ...) will be in ZINC. Just wanted to
make that clear.
On Wed, Dec 14, 2016 at 10:58 PM, hsiao yi wrote:
I have some SMILES from ZINC database, but it seems that some of them are
invalid or RDKit is unable to parse (read + canonicalize) them. I collect
them in this file
<https://gist.github.com/hsiaoyi0504/d13c5f880cf769f8495360827dfca6dc>.
Additionally, I have tried to submit these SMILES to PubChem
Standardization Service
<https://pubchem.ncbi.nlm.nih.gov/standardize/standardize.cgi>. Only
CC12O[Te]34OC(C)(C1(C)O3)C2(C)O4 can be standardized (but RDKit can't do
this).
|
@jir322 Sorry, I finally found out that they are from chembl22 |
@greglandrum Yes, when I parse all of them through RDKit, they all report that there are invalid valence states, but is it really an invalid valence state in |
Thanks. I am hono(u)red to be confused with ChEMBL.
On Dec 15, 2016 7:16 AM, "hsiao yi" <notifications@github.com> wrote:
@jir322 <https://github.com/jir322> Sorry, I finally found it should be
from chembl22
|
PubChem is a great public resource, but it's not necessarily the best source to cite to indicate that something is reasonable chemistry. :-) |
@greglandrum Thanks, I don't think I made my point clearly. I think the valence of Te here is acceptable. I am also now testing reading SMILES from ZINC, and may add more entries to the current list later. |
I have updated the list here. |
Those structures are chemically wrong. What exactly do you expect the RDKit to do with them? |
@greglandrum I know they are chemically wrong, but is it possible to correct them? In my opinion, correcting some cases is possible. For instance, if I input CC(=O)Nc1ccc(cc1)S(=O)(=O)N=N=N to the PubChem standardization service, I get CC(=O)NC1=CC=C(C=C1)S(=O)(=O)N=[N+]=N. So I think it can be done. |
The RDKit generally does not "guess" about these things unless it's really clear what the user intended and the incorrect form is a more or less standard one. I don't think either of those conditions applies here. This is one where, if you care about correctness, a human being needs to look at what's intended and fix the input. If you don't care about correctness, it's easy to write some RDKit code that reads a molecule in without sanitizing it and adds a charge to neutral four-valent nitrogens. The PubChem solution, as you present it, actually changes the overall charge on the molecule. I'm pretty sure that's not correct. |
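The "don't care about correctness" workaround described above might look roughly like the sketch below. It is not an RDKit feature: the function name `charge_four_valent_nitrogens` is hypothetical, and the valence is estimated here simply by summing bond orders plus explicit hydrogens, which is an assumption made for this example. As the comment above warns, this changes the overall charge of the molecule, so the result is not necessarily the structure that was intended.

```python
from rdkit import Chem


def charge_four_valent_nitrogens(smiles):
    """Parse without sanitization, put +1 on neutral four-valent N, then sanitize.

    Hypothetical helper. Returns None if the SMILES syntax cannot be parsed;
    raises if other chemistry problems remain after the fix.
    """
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        return None
    for atom in mol.GetAtoms():
        if atom.GetAtomicNum() != 7 or atom.GetFormalCharge() != 0:
            continue
        # Simple valence estimate (an assumption of this sketch):
        # sum of bond orders plus explicitly specified hydrogens.
        valence = sum(b.GetBondTypeAsDouble() for b in atom.GetBonds())
        valence += atom.GetNumExplicitHs()
        if valence == 4:
            atom.SetFormalCharge(1)
    # Full sanitization now that the nitrogen valences are acceptable.
    Chem.SanitizeMol(mol)
    return mol


mol = charge_four_valent_nitrogens("CC(=O)Nc1ccc(cc1)S(=O)(=O)N=N=N")
print(Chem.MolToSmiles(mol))
# The net charge is now +1, which is exactly the caveat raised above.
print(Chem.GetFormalCharge(mol))
```

Because only the central nitrogen is charged and nothing balances it, a human still has to decide whether the intended structure was, for example, an azide with a negatively charged terminal nitrogen.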
OK, I get your point, but I really think it is good for software to prompt the user to check or modify the input data. How about adding suggestions for some cases? In most cases here, it occurs for |
I have some SMILES from the ZINC database, but some of them seem to be invalid, or at least RDKit is unable to parse (read + canonicalize) them. I have collected them in this file. Additionally, I tried submitting these SMILES to the PubChem Standardization Service. Only
CC12O[Te]34OC(C)(C1(C)O3)C2(C)O4
can be standardized (but RDKit can't do this).