Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a problem with pickling molecules with more than 255 rings #5992

Merged
merged 2 commits into from
Jan 31, 2023

Conversation

greglandrum
Copy link
Member

We were saving the number of rings using an unsigned char. This fixes that.

Backwards compatibility is maintained.

@greglandrum greglandrum added this to the 2022_09_5 milestone Jan 19, 2023
@greglandrum
Copy link
Member Author

@bp-kelley this one will make you laugh

for (unsigned int i = 0; i < ringInfo->numRings(); i++) {
INT_VECT ring;
ring = ringInfo->atomRings()[i];
tmpT = static_cast<T>(ring.size());
T tmpT = static_cast<T>(ring.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be uint32_t as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this right, this is saying no single ring can be > 255 atoms, but the pickle format supports rings > 255 so I'm a bit confused.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this right, this is saying no single ring can be > 255 atoms, but the pickle format supports rings > 255 so I'm a bit confused.

The type of T is determined by the number of atoms in the molecule. So the only time T would be char instead of uint32_t is when the molecule has <255 atoms. If you have less than 255 atoms then you can't end up with a ring of size>255. :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, I wonder what the equation is for N Vertices versus m rings is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, I wonder what the equation is for N Vertices versus m rings is.

Fixing the problem that you can have more rings than atoms is the general point of the PR, which changes the type of the number of rings pickle element to an int even if the size of the atom index (or number of atoms) is a char.

This particular line of code is connected to the maximum size of any given ring, and that is the number of atoms.

@greglandrum
Copy link
Member Author

@bp-kelley : are your questions answered?

@bp-kelley bp-kelley merged commit 82dd73d into rdkit:master Jan 31, 2023
@greglandrum greglandrum deleted the fix/pickle_error branch January 31, 2023 05:06
greglandrum added a commit that referenced this pull request Feb 23, 2023
* Fix a problem with pickling molecules with more than 255 rings

* fix doctest
@ptosco
Copy link
Contributor

ptosco commented Mar 24, 2023

@greglandrum This commit has introduced a backwards incompatible change and should not have been included in a patch release.
If you now pickle a molecule with 2022_09_5 and unpickle in 2022_09_3 you will get a segfault:

# pickle_bug.py

import rdkit
from rdkit import Chem
import pickle

print(rdkit.__version__)
if rdkit.__version__ == '2022.09.5':
    print("Writing pickle")
    mol = Chem.MolFromSmiles("C1CC1")
    Chem.FastFindRings(mol)
    with open("mol.pkl", "wb") as hnd:
        pickle.dump(mol, hnd)
else:
    print("Reading pickle")
    with open("mol.pkl", "rb") as hnd:
        mol = pickle.load(hnd)

$ python pickle_bug.py
2022.09.5
Writing pickle

$ python pickle_bug.py
2022.09.3
Reading pickle
Segmentation fault

@greglandrum
Copy link
Member Author

@ptosco : that's not a backwards incompatible change. We've never guaranteed that an older version of the code could parse newer pickles, and this pickle is a new version

It was a mistake to only increase the patch number of the pickle version. This should have been the minor version so that trying to read it with an older version of the code produces a warning.

I guess the solution to that would be to do another PR which either bumps the minor version number or changes the code to warn on patch release mismatches. Probably the second makes more sense

@greglandrum
Copy link
Member Author

Also: you're probably right that this wasn't a good candidate for a patch release.

@ptosco
Copy link
Contributor

ptosco commented Mar 24, 2023

Sorry, I meant forward, not backward. It is pretty normal that an older version can't read files written by a newer version, but I don't think this should happen within patch versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants