Add SMILES to compounds.csv.gz #103

Merged: 2 commits, Apr 6, 2024

srijitseal
Contributor

This file adds a standardized SMILES column and the first 14 characters of the InChIKey (representing the connectivity) to compounds.csv.gz. We show that six compounds have dual entries, often in different ionization states.
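
For readers unfamiliar with the connectivity check described above, here is a minimal sketch of the idea, not the code used in this PR. It assumes a table with hypothetical column names `compound_id` and `inchikey`; the IDs are placeholders, and the three InChIKeys (acetic acid, acetate, caffeine) are only illustrative inputs. Entries that share the first 14 characters of the InChIKey have the same connectivity and may differ only in ionization state or stereochemistry.

```python
import csv
import io
from collections import defaultdict

# Illustrative data only: IDs and column names are hypothetical, not the
# real schema of compounds.csv.gz. Keys: acetic acid, acetate, caffeine.
SAMPLE = """\
compound_id,inchikey
ID_A,QTBSBXVTEAMEQO-UHFFFAOYSA-N
ID_B,QTBSBXVTEAMEQO-UHFFFAOYSA-M
ID_C,RYYVLZVUVIJVGH-UHFFFAOYSA-N
"""


def connectivity_block(inchikey):
    """Return the first 14 characters of an InChIKey (the connectivity hash)."""
    return inchikey.strip()[:14]


def find_shared_connectivity(rows, id_col="compound_id", key_col="inchikey"):
    """Group compound IDs by connectivity block and keep groups with more than one ID."""
    groups = defaultdict(list)
    for row in rows:
        groups[connectivity_block(row[key_col])].append(row[id_col])
    return {block: ids for block, ids in groups.items() if len(ids) > 1}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
duplicates = find_shared_connectivity(rows)
print(duplicates)
```

In this toy input, the acetic acid and acetate entries fall into the same group because only the final protonation character of their keys differs.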
@shntnu
Contributor

shntnu commented Mar 18, 2024

Over to @afermg to review

@srijitseal I noticed a few things that would need fixing, but let's wait for @afermg's full review

@afermg
Collaborator

afermg commented Mar 18, 2024

@srijitseal Could you please point me to the code? I think it can go into the monorepo, but we must first package it as a small library for reproducibility. The most important thing at the moment is pinning the dependencies. Let me know if I can help with that.

@srijitseal
Contributor Author

srijitseal commented Mar 18, 2024 via email

@afermg
Collaborator

afermg commented Mar 18, 2024

The comment on pinning dependencies is only about reproducibility; I am not knowledgeable enough about chemoinformatics to comment on the usage of those libraries. I do need the dependency versions and one test to ensure that packaging still works.

Sorry if it seems like I'm asking for a lot. I just want to ensure that the code we put in the monorepo runs correctly so it can be reliably referred to in the future. Also, because it is meant to be a tiny tool, we need to have it as a script/module, not a notebook. I can do the transformation though, as long as I can reproduce the environment in which you produced the data.
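
One possible way to capture the dependency versions requested above is sketched below. This is only an illustration, not the project's tooling: the package names `rdkit` and `pandas` are assumptions (the thread does not name the libraries used), and the helper `pin_requirements` is hypothetical. It relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata


def pin_requirements(packages):
    """Return ('name==version' lines, names not installed) for the given packages."""
    pinned, missing = [], []
    for name in packages:
        try:
            pinned.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed in this environment, so it cannot be pinned here.
            missing.append(name)
    return pinned, missing


# Assumed package list; replace it with the libraries the notebook imports.
pinned, missing = pin_requirements(["rdkit", "pandas"])
print("\n".join(pinned))
if missing:
    print("not installed:", ", ".join(missing))
```

Run in the environment that produced the data, the printed lines could be saved as a requirements file so the environment can be recreated later.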

@shntnu shntnu changed the title Added compounds.SMILES.csv.gz Add SMILES to compounds.csv.gz Apr 6, 2024
@shntnu
Contributor

shntnu commented Apr 6, 2024

I overrode this PR for now by using the SMILES generated when the JCP IDs were created
Details: https://github.com/jump-cellpainting/datasets-private/pull/88

@shntnu shntnu merged commit e603f6e into jump-cellpainting:main Apr 6, 2024