Remove check for ring information from Atom::Match #6063
Merged
Description
I ran into a case where I had two different SMILES for identical molecules, but no match was found (i.e., `HasSubstructMatch` returned `False`).
A minimal example would be:
In that case, `Chem.MolToSmiles(mol1) == Chem.MolToSmiles(mol2)` is `True`, but `mol1.GetSubstructMatch(mol2)` is empty.
It seems that this is a difficult case for the ring searching algorithm, such that `len(Chem.GetSymmSSSR(mol1)), len(Chem.GetSymmSSSR(mol2))` is `(7, 9)`. Since the number of rings is queried in `Atom::Match`, the substructure search fails as well.
This pull request removes the check for the ring number from `Atom::Match` in `Code/GraphMol/Atom.cpp`, so that the substructure search is successful. It also adds a test case for a slightly smaller version of that molecule (1 ring less).
All other tests (run using `ctest`) are still passing.
Timings
I also did some timing checks, and it does not seem to make a big difference. Running `new_timings.py` without (top) and with (bottom) the change gives:
I also ran some small tests myself (using cyclosporin A, a linear peptide with sequence "DGAPSTE", as well as the molecule mentioned above), and did not find a large difference, except for the case where no alignment was found previously. The timings were collected using the `timeit.repeat` function in Python, with 1000 runs and 5 repeats.
Without the change:
With the change:
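
For readers who want to repeat this kind of measurement, here is a rough sketch of the `timeit.repeat` setup described in this section (1000 runs, 5 repeats). It is not the script used for the numbers in this PR. `placeholder_workload` and `time_workload` are hypothetical names, and the workload is a stand-in that would be swapped for the actual substructure search call.

```python
import timeit


def placeholder_workload():
    # Hypothetical stand-in for the call being timed; in practice this
    # would be replaced by the substructure search on the test molecules.
    return sum(i * i for i in range(100))


def time_workload(func, runs=1000, repeats=5):
    """Return the average per-call time (seconds) for each repeat."""
    # timeit.repeat returns one total time per repeat, each covering
    # `runs` executions of `func`.
    totals = timeit.repeat(func, number=runs, repeat=repeats)
    return [total / runs for total in totals]


per_call = time_workload(placeholder_workload)
print(f"best of {len(per_call)} repeats: {min(per_call) * 1e6:.2f} us per call")
```

Reporting the minimum over the repeats is a common choice because it is least affected by other load on the machine. Whether the numbers in this PR were taken as a minimum or a mean is not stated here.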
Questions
I would be very grateful if you could have a look at this. This is my first pull request to rdkit, so please let me know if I forgot something.
Best regards,
Franz Waibl