Canonicalization of reactions and multi-SMILES: component-wise, or global? #8

Open
avaucher opened this issue Sep 5, 2022 · 2 comments

@avaucher
Member

avaucher commented Sep 5, 2022

The current implementations of canonicalize_multicomponent_smiles and canonicalize_compounds canonicalize each molecule individually and do not change the order of the molecules.

As raised in a discussion with @A-Thakkar, one may also expect the canonicalization to reorder the components, so that "canonicalizing" two identical reactions that differ only in the order of their compounds gives the same representation.
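To make the expectation concrete, here is a minimal sketch of an order-insensitive comparison of two reaction SMILES. The helper name `reaction_key` is hypothetical and not part of the library. It assumes plain `reactants>agents>products` strings, `.` as the only component separator, and molecules that are already individually canonical.

```python
from typing import Tuple


def reaction_key(reaction_smiles: str) -> Tuple[Tuple[str, ...], ...]:
    """Return a key that ignores the order of compounds within each role.

    Hypothetical helper for illustration only. Assumes a plain
    'reactants>agents>products' string with '.'-separated components.
    """
    groups = reaction_smiles.split(">")
    if len(groups) != 3:
        raise ValueError(f"Expected exactly two '>' in {reaction_smiles!r}")
    # Sort the components of each role; drop empty strings (e.g. no agents).
    return tuple(
        tuple(sorted(s for s in group.split(".") if s)) for group in groups
    )


rxn_a = "CCO.CC(=O)O>>CCOC(C)=O"
rxn_b = "CC(=O)O.CCO>>CCOC(C)=O"

# The strings differ, but the order-insensitive keys agree.
same_string = rxn_a == rxn_b
same_key = reaction_key(rxn_a) == reaction_key(rxn_b)
```

With component-wise canonicalization only, `rxn_a` and `rxn_b` stay distinct strings, whereas a global canonicalization would map both to one representation.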

It is still unclear which of the two approaches should be preferred.

@avaucher
Member Author

avaucher commented Sep 5, 2022

Maybe we could add a flag canonical_ordering: bool to those functions? (with default value False?)
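A rough sketch of what such a flag could look like. This is not the current implementation: the function name `canonicalize_components`, the `separator` parameter, and the injected `canonicalize_molecule` callable are assumptions made so the example runs without RDKit. In practice the callable would wrap something like `Chem.MolToSmiles(Chem.MolFromSmiles(s))`.

```python
from typing import Callable


def canonicalize_components(
    smiles: str,
    canonicalize_molecule: Callable[[str], str],
    canonical_ordering: bool = False,
    separator: str = ".",
) -> str:
    """Canonicalize each component; optionally sort the components.

    Hypothetical sketch: with canonical_ordering=False the original
    order is kept, matching the molecule-wise behaviour described above.
    """
    components = [canonicalize_molecule(s) for s in smiles.split(separator)]
    if canonical_ordering:
        # Plain lexicographic sort of the canonical component strings.
        components = sorted(components)
    return separator.join(components)


# Toy stand-in for a real per-molecule canonicalizer (assumption, not RDKit).
TOY_TABLE = {"OCC": "CCO", "C(C)O": "CCO"}


def toy_canonicalize(s: str) -> str:
    return TOY_TABLE.get(s, s)


kept_a = canonicalize_components("OCC.O", toy_canonicalize)
kept_b = canonicalize_components("O.C(C)O", toy_canonicalize)
sorted_a = canonicalize_components("OCC.O", toy_canonicalize, canonical_ordering=True)
sorted_b = canonicalize_components("O.C(C)O", toy_canonicalize, canonical_ordering=True)
```

With the default, `kept_a` and `kept_b` differ because the input order is preserved; with `canonical_ordering=True` both inputs give the same string.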

To be investigated: if we sort the compounds alphabetically, do we obtain the same ordering as RDKit's direct canonicalization of the dot-separated SMILES string?

@drugilsberg
Contributor

I like the idea of adding a dedicated flag to impose this (I would default to False for backward compatibility). One may find it useful to have the components sorted in a "standard" way.

Regarding the last question, I think it's not a problem to diverge from the direct canonicalisation behaviour, but maybe I'm overlooking some obvious problems.
