Enhanced Stereochemistry canonicalization errors #7041

tadhurst-cdd · 2024-01-12T15:59:54Z

Reference Issue

Enhanced Stereochemistry canonicalization errors

What does this implement/fix? Explain your changes.

Many compounds can be written as SMILES with different enhanced stereochemistry specifications even though they are actually the same compound. For example:

N[C@H]1CC[C@@H](O)CC1 |a:1,4|
N[C@H]1CC[C@@H](O)CC1 |o1:1,4|
N[C@H]1CC[C@@H](O)CC1 |&1:1,4|
N[C@@H]1CC[C@H](O)CC1 |a:1,4|
N[C@@H]1CC[C@H](O)CC1 |o1:1,4|
N[C@@H]1CC[C@H](O)CC1 |&1:1,4|

These all represent the same compound, but without the new code they generate different canonical SMILES.

These are also the same:

C[C@@H](Cl)C[C@H](C)Cl |a:1,4,|
C[C@@H](Cl)C[C@H](C)Cl |o1:1,4,|
C[C@H](Cl)C[C@@H](C)Cl |a:1,4,|
C[C@@H](Cl)C[C@H](C)Cl |&1:1,4,|
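In these examples the enhanced stereo groups live in the CXSMILES extension after the SMILES: `a` marks absolute centres, `oN` an OR group and `&N` an AND group, each followed by atom indices. Below is a minimal sketch of reading those fields into a label-to-atoms mapping. It assumes only the three field types shown here appear, since real CXSMILES extensions can carry many other fields that this toy regex does not attempt to handle. `parse_stereo_groups` is a hypothetical helper, not an RDKit function.

```python
import re

# Toy pattern for CXSMILES stereo-group fields: "a" (absolute), "oN" (OR
# group N) or "&N" (AND group N), followed by comma-separated atom indices.
_STEREO_GROUP = re.compile(r"(a|o\d+|&\d+):((?:\d+,?)+)")


def parse_stereo_groups(cxsmiles):
    """Split a CXSMILES string into (smiles, {group label: [atom indices]})."""
    smiles, _, extension = cxsmiles.partition(" |")
    groups = {}
    for label, atoms in _STEREO_GROUP.findall(extension):
        # Skip the empty string left by a trailing comma such as "1,4,".
        groups[label] = [int(i) for i in atoms.split(",") if i]
    return smiles, groups


print(parse_stereo_groups("C[C@@H](Cl)C[C@H](C)Cl |o1:1,4,|"))
# ('C[C@@H](Cl)C[C@H](C)Cl', {'o1': [1, 4]})
```

Parsing alone does not make the six cyclohexane strings compare equal: their SMILES and group labels differ, and only a canonicalization step that understands what the groups mean can map them to one output.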

Any other comments?

get the tests passing on linux and the psql results updated


mc-robinson · 2024-04-08T20:58:21Z

@tadhurst-cdd this looks like a very nice change. It may also go some way toward fixing an issue I recently reported, #7266.


tadhurst-cdd · 2024-05-02T21:27:33Z

Addressed the changes needed to fix errors in the tests provided by Greg Landrum. There were a couple of fixes, and the code now does NOT throw an error if the enhanced procedure does not work; it simply falls back to the old canonicalization method.
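The fallback behaviour described above can be pictured as a try/except around the new routine. This is a minimal sketch that assumes a failure of the enhanced procedure is signalled by an exception. `canonicalize_with_fallback`, `enhanced` and `legacy` are hypothetical names standing in for the new and old canonicalizers, not RDKit functions.

```python
import logging

logger = logging.getLogger(__name__)


def canonicalize_with_fallback(mol, enhanced, legacy):
    """Try the enhanced-stereo canonicalizer; on failure use the old one.

    `enhanced` and `legacy` are hypothetical callables standing in for the
    new and old canonicalization routines.
    """
    try:
        return enhanced(mol)
    except Exception as exc:  # broad on purpose: any failure triggers fallback
        logger.debug("enhanced canonicalization failed (%s); using legacy", exc)
        return legacy(mol)


def failing(mol):
    raise ValueError("cannot find an enhanced representation")


print(canonicalize_with_fallback("CCO", failing, lambda m: "legacy:" + m))
# legacy:CCO
```

The trade-off is that a silent fallback never breaks registration, but it can reintroduce the old, non-unique output for the molecules that hit it. Logging the failure keeps those cases discoverable.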


tadhurst-cdd · 2024-05-20T11:39:25Z

I know there is some concern about the performance of the new rigorousEnhancedStereo functionality in RDKit.

I do not think that this is a major problem.

First, comparing the time required to do the canonicalization WITHOUT the new functionality and WITH it is really a comparison between producing incorrect results and producing correct results. I think we really want the correct results.

Second, the new code does not affect the time required to canonicalize structures that do NOT have enhanced stereochemistry. Fewer than 4% of the structures our customers have registered in CDD Vault contain enhanced stereochemistry, so the overall impact is very small.
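One way to see why structures without enhanced stereochemistry are unaffected is to picture a guard clause in front of the new code. This is a minimal sketch assuming the caller already knows the molecule's stereo groups. `canonicalize`, `legacy` and `enhanced` are hypothetical names, and the counters only illustrate which path each input takes.

```python
def canonicalize(smiles, stereo_groups, legacy, enhanced):
    """Dispatch to the costly enhanced path only when stereo groups exist.

    `legacy` and `enhanced` are hypothetical callables; `stereo_groups`
    maps group labels such as "o1" or "&1" to atom indices.
    """
    if not stereo_groups:
        # No enhanced stereo: exactly the old code path and the old cost.
        return legacy(smiles)
    return enhanced(smiles, stereo_groups)


calls = {"legacy": 0, "enhanced": 0}


def legacy(s):
    calls["legacy"] += 1
    return s


def enhanced(s, groups):
    calls["enhanced"] += 1
    return s


inputs = [
    ("CCO", {}),
    ("C[C@H](N)O", {}),
    ("N[C@H]1CC[C@@H](O)CC1", {"o1": [1, 4]}),
]
for smi, groups in inputs:
    canonicalize(smi, groups, legacy, enhanced)
print(calls)  # {'legacy': 2, 'enhanced': 1}
```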

The currently suggested method does the following:

Enumerates the possible structures that the enhanced notation represents.
Produces a unique (canonical) SMILES for each.
Removes duplicates from that list of SMILES.
Converts the deduplicated list of SMILES into a list of mols to be represented.
Finds a canonical enhanced stereo representation that expresses that unique list.
(The steps listed above are done twice: once for any OR-type enhanced markers, and once for the AND-type markers.)
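The enumerate, canonicalize and deduplicate steps can be sketched on a toy model. Several simplifying assumptions apply:

- A "structure" is just a tuple of tetrahedral parities (`"@"`/`"@@"`).
- Each group is either as drawn or fully inverted.
- `canon` is a caller-supplied stand-in for canonical SMILES generation.
- The final step is reduced to one rule: if every enumerated isomer is the same compound, the markers are redundant and the stereo is absolute.

None of these names are RDKit API, and the separate OR and AND passes are not modelled.

```python
from itertools import product


def invert(parity):
    """Swap a tetrahedral parity: "@" <-> "@@"."""
    return "@@" if parity == "@" else "@"


def enumerate_structures(parities, groups):
    """Step 1: yield every concrete stereoisomer the notation represents.

    Group indices point into the parity tuple, not at SMILES atom indices.
    Each group is either as drawn or fully inverted, so k groups give
    2**k combinations; centres in no group (absolute) never change.
    """
    labels = list(groups)
    for flips in product((False, True), repeat=len(labels)):
        current = list(parities)
        for label, flip in zip(labels, flips):
            if flip:
                for idx in groups[label]:
                    current[idx] = invert(current[idx])
        yield tuple(current)


def canonical_enhanced(parities, groups, canon):
    """Later steps in miniature: canonicalize, deduplicate, summarize."""
    keys = sorted({canon(s) for s in enumerate_structures(parities, groups)})
    if len(keys) == 1:
        # Every enumerated isomer is the same compound: markers are redundant.
        return ("abs", keys[0])
    kind = "".join(sorted({label[0] for label in groups}))  # "o", "&" or "&o"
    return (kind, keys[0])


# Toy canon for a molecule like N[C@H]1CC[C@@H](O)CC1, where inverting
# every centre gives back the same compound.
def achiral_canon(structure):
    return min(structure, tuple(invert(p) for p in structure))


for groups in ({}, {"o1": [0, 1]}, {"&1": [0, 1]}):
    print(canonical_enhanced(("@", "@@"), groups, achiral_canon))
# ('abs', ('@', '@@')) printed three times
```

In the toy, the absolute, OR and AND forms of the cyclohexane example collapse to one result, which is the behaviour the PR describes. The cost is visible too: each group doubles the number of structures that must be canonicalized.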

This method relies heavily on the existing functionality for producing canonical SMILES for stereo-labeled compounds, and that is the source of its computational cost. The new method could avoid actually generating and then re-parsing the canonical SMILES, but the work of canonicalization would still need to be done, so I doubt that any substantial improvement in performance could be made.

One possible change to the method would be to produce the list of mols more directly, by reordering the atoms and bonds according to the canonical atom rankings. It would then be necessary to be able to compare and sort these mols to produce a unique list.
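Here is a rough sketch of that alternative using a hypothetical toy graph rather than RDKit mols:

- Atoms are (element, stereo label) pairs.
- Bonds are (i, j, order) triples.
- `ranks[i]` is atom i's canonical rank.

The sketch assumes the stereo label is order-independent (an R/S-style label). A real tetrahedral parity would also have to be recomputed after renumbering, because it depends on neighbour order.

```python
def renumber(atoms, bonds, ranks):
    """Rebuild a toy molecule in canonical atom order as a sortable key."""
    new_atoms = [None] * len(atoms)
    for old_index, rank in enumerate(ranks):
        new_atoms[rank] = atoms[old_index]
    new_bonds = sorted(
        (min(ranks[i], ranks[j]), max(ranks[i], ranks[j]), order)
        for i, j, order in bonds
    )
    return tuple(new_atoms), tuple(new_bonds)


def unique_mols(keys):
    """Deduplicate and sort the renumbered molecules (tuples compare directly)."""
    return sorted(set(keys))


# The same toy C-N fragment with an "R" label, input in two atom orders,
# plus its "S" counterpart.
a = renumber([("C", "R"), ("N", "")], [(0, 1, 1)], ranks=[0, 1])
b = renumber([("N", ""), ("C", "R")], [(0, 1, 1)], ranks=[1, 0])
c = renumber([("C", "S"), ("N", "")], [(0, 1, 1)], ranks=[0, 1])
print(len(unique_mols([a, b, c])))  # 2
```

Turning each renumbered molecule into plain tuples is what makes "compare and sort these mols" cheap. The hard part, producing a trustworthy rank order, is still the canonicalization itself.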

I am interested in other thoughts and suggestions.

tad

tadhurst-cdd and others added 30 commits November 13, 2023 07:29

atropisomer handling added

a0f37c2

fixed non-used variables, linking directives

1525a51

BOOST LIB start/stop fixes, linking fix

4e5c03b

Fixes for RDKIT CI errors

217f1ce

minimalLib fix

9f55f6c

changed vector<enum> for java builds

f1f967c

check for extra chars in CIP labeling

8701c29

removed wrong deprecated message

cb479c2

fix ostrstream output error?

217985b

restored _ChiralAtomRank to lowercase first letter

ff8a5c9

Merge branch 'master' into pr/atropisomers2

5ea34e7

changes for merged master

21c3645

Fixed catch label for new Catch package

4c73e32

update expected psql results

1ed0228

get swig wrappers building

11bbaae

Merge pull request #8 from greglandrum/pr/atropisomers2

a46feae

get the tests passing on linux and the psql results updated

restore MolFileStereochem to FileParsers

83c3ce4

fix java wrapper for reapplyMolBlockWedging

cfc8532

Merge branch 'master' into pr/atropisomers2

94efdf1

test changes

0947e20

Merge branch 'master' into pr/specialQueries

d84cf33

some suggestions

57f3b44

move a couple functions out of Bond

41055e8

Merge branch 'master' into pr/atropisomers2

665e95f

Merge branch 'master' into pr/specialQueries

cef4c8a

Merge branch 'master' into pr/specialQueries

ea560bc

Merge branch 'master' into pr/atropisomers2

612dca4

merged master

0ed969b

Merge branch 'master' into pr/atropisomers2

923b503

Merge fixes

195d0dc

tadhurst-cdd and others added 7 commits March 27, 2024 08:28

Merge branch 'master' into pseudoChiralCanonError

9459831

Now allows or and and groups together

d241af9

Merge branch 'master' into pseudoChiralCanonError

38f5f04

internal routines inside detail scope

c82e512

Merge branch 'pseudoChiralCanonError' of github.com:cdd/rdkit into ps…

fcd837a

…eudoChiralCanonError

fix test error

9f4e966

changed string back to string_view and fixed a CHECK

24a7b19

tadhurst-cdd added 2 commits April 12, 2024 10:02

Merge branch 'master' of github.com:cdd/rdkit

ba110c9

Merge branch 'master' into pseudoChiralCanonError

531c525

greglandrum self-assigned this Apr 13, 2024

tadhurst-cdd and others added 4 commits April 29, 2024 19:58

Fixes for PR review tests

fe73bea

Merge branch 'master' into pseudoChiralCanonError

4674389

Fix RDKit_Book.rst failure on build test

a8fae14

Merge branch 'pseudoChiralCanonError' of github.com:cdd/rdkit into ps…

d938eed

…eudoChiralCanonError

tadhurst-cdd mentioned this pull request May 2, 2024

No coords atropisomers - fix smiles output of atrop wedges after reordering #7418

Merged

tadhurst-cdd and others added 8 commits May 6, 2024 15:56

Merge branch 'master' of github.com:cdd/rdkit

2d381f2

Merge branch 'master' into pseudoChiralCanonError

5b9b894

Merge branch 'master' into pseudoChiralCanonError

d68f6ef

fix xqm sql test

514fad0

Merge branch 'pseudoChiralCanonError' of github.com:cdd/rdkit into ps…

9bd54d0

…eudoChiralCanonError

Merge branch 'master' of github.com:cdd/rdkit

f486447

Merge branch 'master' into pseudoChiralCanonError

b9f5ae6

updated expected files for cxsmiles_test

b8f9c7d

tadhurst-cdd added 4 commits June 3, 2024 07:22

Merge branch 'master' of github.com:cdd/rdkit

e2b1f9f

Merge branch 'master' of github.com:cdd/rdkit

ee714a4

Merge branch 'master' into pseudoChiralCanonError

1285d89

Fixed removal of atom attrs

53c7081

tadhurst-cdd commented Jan 12, 2024 (edited by greglandrum)
