Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling MurckoScaffold on molecule causes bug in pickling #6036

Closed
dangthatsright opened this issue Jan 31, 2023 · 10 comments · Fixed by #6086
Closed

Calling MurckoScaffold on molecule causes bug in pickling #6036

dangthatsright opened this issue Jan 31, 2023 · 10 comments · Fixed by #6086
Labels
Milestone

Comments

@dangthatsright
Copy link

dangthatsright commented Jan 31, 2023

Describe the bug
Calling MurckoScaffold on an rdkit mol mutates it in a way that probably has bad downstream effects. I am not entirely sure what is wrong but seems like the mol is unable to be pickled.

To Reproduce

from multiprocessing import Pool
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold
from rdkit import DataStructs

Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps)

def no_op(m):
    return m

mols = [Chem.MolFromSmiles(s) for s in ["C","CC"]]
# scaffolds = [MurckoScaffold.GetScaffoldForMol(m) for m in mols]

with Pool(processes=4) as pool:
    res_list = [pool.apply_async(no_op, (m,)) for m in mols]
    output = [res.get(timeout=1) for res in res_list]
    print(output)

The above code should work, but once you uncomment the MurckoScaffold section, there is a lot of pickling errors.
which looks like

Traceback (most recent call last):
  File "/home/ubuntu/miniconda/envs/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ubuntu/miniconda/envs/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda/envs/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/home/ubuntu/miniconda/envs/lib/python3.9/multiprocessing/queues.py", line 368, in get
    return _ForkingPickler.loads(res)
ValueError: invalid BitVect pickle

Expected behavior
MurckoScaffold.GetScaffoldForMol should have no affect on the underly mol

Screenshots
If applicable, add screenshots to help explain your problem.

Configuration (please complete the following information):

  • RDKit version: 2022.9.1
  • OS: Ubuntu 20.04.5
  • Python version (if relevant): 3.9.10
  • Are you using conda? no
  • If you are using conda, which channel did you install the rdkit from?
  • If you are not using conda: how did you install the RDKit? pip install rdkit==2022.9.1

Additional context
Add any other context about the problem here.

@greglandrum
Copy link
Member

@dangthatsright I cannot reproduce the error from your script

Also this isn't directly a problem with the pickling code, at least not for me:

In [20]: mols = [Chem.MolFromSmiles(s) for s in ["C","CC"]]
    ...: scaffolds = [MurckoScaffold.GetScaffoldForMol(m) for m in mols]
    ...: 
    ...: 

In [21]: pickle.loads(pickle.dumps(mols))
Out[21]: 
[<rdkit.Chem.rdchem.Mol at 0x7f1501411e90>,
 <rdkit.Chem.rdchem.Mol at 0x7f1501413ab0>]

In [22]: pickle.loads(pickle.dumps(scaffolds))
Out[22]: 
[<rdkit.Chem.rdchem.Mol at 0x7f14fbe53600>,
 <rdkit.Chem.rdchem.Mol at 0x7f14fbe5b970>]

Are you sure that there's not something else going on in the script that's causing the problem?

@dangthatsright
Copy link
Author

@greglandrum yes you are right, I had some other imports in my script that I thought were unrelated but I was able to fully reproduce after including Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps). I've corrected the original post

@greglandrum
Copy link
Member

@dangthatsright that updated version still runs fine for me.
Do you get the error you report when you run exactly that script?

@dangthatsright
Copy link
Author

Yes I get the error when I run the exact script. Are you testing on ubuntu or mac? I know multiprocessing recently changed for mac vs ubuntu so perhaps the issue is there?

@greglandrum
Copy link
Member

Yeah, I ran it on ubuntu 22.04.

@greglandrum
Copy link
Member

One possible difference is that I'm using the conda version of the RDKit.

@dangthatsright
Copy link
Author

Ah maybe that's it? In any case, it doesn't matter too much for me since I just make copies of the molecule before calling MurckoScaffold. It seems like this is probably less of an issue than it's worth to fix so I'm just going to close it. Thanks for looking into it though!

@rubbs14
Copy link

rubbs14 commented Feb 10, 2023

Just joining in to report the same issue with the pip version of RDKit on RHL (2022.9.4)
I have the exact same error reported here after retrieving the molecules from the queue (invalid BitVect).
I am computing multiple fingerprints for each molecule with multiprocessing.

I can share the code if needed, but it is indeed very similar to @dangthatsright code

@bp-kelley bp-kelley reopened this Feb 10, 2023
@bp-kelley
Copy link
Contributor

This looks pretty nasty, but I think it's worth investigating so I'm going to keep it open. Here is a non-multiprocessing reproducible:

from rdkit import Chem                                                                                                                 
from rdkit.Chem.Scaffolds import MurckoScaffold                                                                                        
import pickle                                                                                                                          
Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps)                                                                   
mols = [Chem.MolFromSmiles(s) for s in ["C","CC"]]                                                                                     
scaffolds = [MurckoScaffold.GetScaffoldForMol(m) for m in mols]                                                                        
[pickle.loads(pickle.dumps(m)) for m in mols]

@bp-kelley
Copy link
Contributor

@greglandrum this was fun, I have a solution and it wasn't really related to the BitVector. I'll put in a PR tonight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants