Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using Mol Field #24

Closed
ivannnnnnnnnn opened this issue Jun 1, 2022 · 12 comments · Fixed by #26
Closed

Using Mol Field #24

ivannnnnnnnnn opened this issue Jun 1, 2022 · 12 comments · Fixed by #26

Comments

@ivannnnnnnnnn
Copy link

Hello! I am using rdkit in django app. The Compound model has a MolField.
I have trouble with using this implementation of get_prep_value in MolField:

    def get_prep_value(self, value):
        # convert the Molecule instance to the value used by the
        # db driver
        if isinstance(value, str):
            value = self.text_to_mol(value)
        if isinstance(value, Chem.Mol):
            value = memoryview(value.ToBinary())
        if value is None:
            return None
        return Func(value, function="mol_from_pkl")

But I fix it by change implementation to:

    def get_prep_value(self, value):
        # convert the Molecule instance to the value used by the
        # db driver
        if isinstance(value, str):
            value = self.text_to_mol(value)
        if value is None:
            return None
        return value.ToBinary()

Its fix my issue but I am new with rdkit and not shour what its correct way to serialize data to db.
Can someone answer me wich of method a more correct?

@rvianello
Copy link
Contributor

As discussed in the rdkit issue #5344, why your fix would work is a bit unclear to me, and the ticket is unfortunately not providing a clear description of the problem you had to solve. Could you please provide some additional information with regard to what query is executed, what trouble you experienced with the original implementation (what is more exactly the error, or unexpected behavior?) and what input value is passed?

Additional details about what version of django-rdkit was used, any modifications applying to it, versions of django, postgresql, rdkit (for both the web application and database cartridge) would also help.

@ivannnnnnnnnn
Copy link
Author

I have trouble by using bulk_update method of django ORM.

When I try use bulk_update with default implementation of get_prep_value I catch exception cant adapt Func .

When I using my implementation all work correctly.

But I'm not shour about correct work of other features of Mol field

@rvianello
Copy link
Contributor

Thank you for sharing the additional information. I also realized now that the get_prep_value implementation you included in this ticket description is different from the one you posted with the original rdkit issue.

bulk_update works differently than a plain update (there are a few explicitly documented caveats), which is probably related to why Func may not work in this case. At the same time, mol_from_pkl might be unnecessary in the given context, but I'll try to perform a few tests during the next days.

@MichelML
Copy link
Contributor

MichelML commented Jun 15, 2022

@rvianello if you could adress this quickly, I think the fix of @ivannnnnnnnnn would also allow older versions of the cartridge to be compatible with up to date versions of rdkit (on the django-rdkit/django app side). Based on what I currently understand, the reason is that a Mol from a future version of rdkit is not picklable by a MolPickler version of an older version of rdkit (that doesn't have the right MolPickler version). However, with the fix of @ivannnnnnnnnn , since the mol_from_pkl is not called on the binary value, the DB accepts it, and all the functions work properly on the db side.

However, this is the part I don't understand fully yet: Why, once the binary gets inserted in the db (without mol_from_pkl), everything seems to work perfectly: functions on the db side (including mol_from_pkl???), deserialization from a django backend using django-rdkit, etc ? . We currently use the cartridge only for similarity, substruct and exact searches on a set of molecules, and to keep conformer information of molecules, maybe there are some cases where it wouldn't work.

Small comment on @ivannnnnnnnnn fix, I think this would be more explicit:

    def get_prep_value(self, value):
        # convert the Molecule instance to the value used by the
        # db driver
        if isinstance(value, str):
            value = self.text_to_mol(value)

        if not isinstance(value, Chem.Mol):
            return None

        return value.ToBinary()

related conversations:

Thanks in advance @rvianello , and I'm happy to make the PR if this fix works (or let ivan do it)

cc @greglandrum if you think you can give context into why this change works

@MichelML
Copy link
Contributor

Gentle poke here ☝️

@MichelML
Copy link
Contributor

Gentle repoke here ☝️ (I know this is summer and you might be in vacation @rvianello ) , I'll just regularly check up here because this is sort of blocking for us at the moment

@rvianello
Copy link
Contributor

@MichelML I hope I will be able to look into this in better detail starting tomorrow (sorry about the slow feedback, I'm just short on time)

@MichelML
Copy link
Contributor

Thanks for the update @rvianello ! Totally understand

@rvianello
Copy link
Contributor

@ivannnnnnnnnn @MichelML I've spent a bit of time on this during the past few days and I could easily reproduce the issue with the bulk_update method. Apparently, the new field values are passed directly to the db driver, which results in the can't adapt type 'Func' error message. On the other hand, not wrapping the value in a call to mol_from_pkl as you suggested, at the moment breaks some substructure lookup tests, and I'm therefore working on a refactoring that would consider the Lookup classes too.

@rvianello
Copy link
Contributor

@ivannnnnnnnnn @MichelML I think PR #26 should fix the issue with the bulk update of MolField. It would be really great if you could find some time to test it and let me know how it works (I'll try to address the problem related to the client/server rdkit version separately).

@ivannnnnnnnnn
Copy link
Author

@ivannnnnnnnnn @MichelML I think PR #26 should fix the issue with the bulk update of MolField. It would be really great if you could find some time to test it and let me know how it works (I'll try to address the problem related to the client/server rdkit version separately).

Thank you. I will try to use this in my project on next week and write feedback here

@MichelML
Copy link
Contributor

MichelML commented Jul 5, 2022

thanks a lot @rvianello!

I left a comment and approved the PR! Sorry for last week I was in vacation.

Thanks for taking the time again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants