Add SMILES to compounds.csv.gz #103
This file adds a standardized SMILES column and the first 14 characters of the InChIKey (representing the connectivity) to compounds.csv.gz. We show that six compounds have dual entries, often in different ionization states.
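For readers unfamiliar with the InChIKey layout: a standard InChIKey is 27 characters, and its first 14-character block hashes only the connectivity, while protonation is encoded in the final character. Keys that share the first block are therefore candidate duplicates, such as a neutral compound and its ion. Below is a minimal sketch of that grouping, assuming each row is a (compound ID, standard InChIKey) pair. The function names and compound IDs are hypothetical, and this is not the code used in this PR.

```python
from collections import defaultdict


def connectivity_block(inchikey: str) -> str:
    """Return the first 14 characters of a standard InChIKey (connectivity hash)."""
    # Standard layout: 14 chars, '-', 10 chars, '-', 1 protonation char = 27 chars.
    if len(inchikey) != 27 or inchikey[14] != "-" or inchikey[25] != "-":
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return inchikey[:14]


def find_dual_entries(rows):
    """Group (compound_id, inchikey) pairs by connectivity block.

    Returns only the blocks shared by more than one entry, mapped to the
    sorted list of entries that share them.
    """
    groups = defaultdict(set)
    for compound_id, inchikey in rows:
        groups[connectivity_block(inchikey)].add((compound_id, inchikey))
    return {block: sorted(members) for block, members in groups.items() if len(members) > 1}


# Hypothetical rows: aspirin, its deprotonated form (same first block), and caffeine.
rows = [
    ("CPD-1", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
    ("CPD-2", "BSYNRYMUTXBXSQ-UHFFFAOYSA-M"),
    ("CPD-3", "RYYVLZVUVIJVGH-UHFFFAOYSA-N"),
]
for block, members in find_dual_entries(rows).items():
    print(block, members)
```

Only the two aspirin entries are reported, because they differ only in the protonation character. With pandas, the same 14-character column could be derived with `Series.str.slice(0, 14)` on an InChIKey column (column name assumed).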
Over to @afermg to review. @srijitseal, I noticed a few things that will need fixing, but let's wait for @afermg's full review.
@srijitseal, could you please point me to the code? I think it can go into the monorepo, but we must first package it as a small library for reproducibility. The most important thing at the moment is pinning the dependencies. Let me know if I can help with that.
jump-cellpainting/jump-cellpainting#156
You can find the file here! It's almost the same, but after consulting with Andreas I removed the loop to save time. I think standardization is now about 6 times faster, with less loss of information. Tautomers will always remain a problem no matter which package we use or how many loops we run to find the best tautomer.
On Mon, Mar 18, 2024 at 11:56 AM, Alán F. Muñoz wrote:
> @srijitseal Could you please point me to the code? I think it can go to monorepo, but we must first package it as a small library for reproducibility. The most important things at the moment is pinning the dependencies. Let me know if I can be of help for that.
The comment on pinning dependencies is only about reproducibility; I am not knowledgeable enough about cheminformatics to comment on how those libraries are used. I do need the dependency versions and one test to ensure that packaging still works. Sorry if it seems like I'm asking for a lot; I just want to ensure that the code we put in the monorepo runs correctly so that it can be reliably referred to in the future. Also, because it is meant to be a tiny tool, we need it as a script/module, not a notebook. I can do the conversion myself, though, as long as I can reproduce the environment in which you produced the data.
I have overridden this PR for now by using the SMILES generated when the JCP IDs were created.