Mols matrix to grid image #6080

Merged: 36 commits, Jul 29, 2023

Contributor

@bertiewooster bertiewooster commented Feb 10, 2023

Reference Issue

Adds functionality #5917

What does this implement/fix? Explain your changes.

Adds the functionality to use a two-dimensional (nested) data structure as input to create molecular grid images. For example, the following molecular grid could be created by supplying a nested data structure where

  • Each data substructure represents a row
  • The substructures can differ in length; the RDKit automatically adds empty cells so that every row is padded to the length of the longest row

Thus, the user can provide a "ragged" data structure (where the length of each data substructure can be different), and the number of columns in the molecular grid image will automatically be set correctly.
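To illustrate the padding idea described above, here is a minimal pure-Python sketch. The helper name `pad_and_flatten` is hypothetical and is not the RDKit implementation; strings stand in for molecules. It pads each row with `None` up to the longest row and flattens the result, which is the shape of input a fixed-width grid needs.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pad_and_flatten(matrix: Sequence[Sequence[Optional[T]]]) -> Tuple[List[Optional[T]], int]:
    """Pad every row with None to the longest row's length, then flatten row by row.

    Hypothetical helper for illustration only; not taken from the RDKit source.
    """
    # Number of columns is the length of the longest row (0 for an empty matrix)
    n_cols = max((len(row) for row in matrix), default=0)
    flat: List[Optional[T]] = []
    for row in matrix:
        flat.extend(row)
        # Fill the remainder of a short row with empty cells
        flat.extend([None] * (n_cols - len(row)))
    return flat, n_cols


ragged = [["A", "B"], ["C", None, "D"]]
flat, n_cols = pad_and_flatten(ragged)
print(n_cols)  # 3
print(flat)    # ['A', 'B', None, 'C', None, 'D']
```

The flattened list together with `n_cols` is the kind of input a one-dimensional grid function with a fixed number of items per row would take.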

[Image: annotated grid of maximum common substructure and core; molecules and groups off maximum common substructure]

Any other comments?

@bertiewooster
Contributor Author

bertiewooster commented Feb 12, 2023

I'd like to create parametrized tests to cover a variety of situations. Is there a standard way to parametrize unittests? The three options that I found are:

  1. Define a function that runs the desired tests, then call it for each parameter set
  2. ddt (Data-Driven Tests, not the chemical ;), whose last commit was Aug 2022
  3. parameterized, which has had no commits since Jan 2021, suggesting it's no longer maintained

I didn't find option 2 or 3 in the RDKit codebase. Absent any feedback here, I guess I'll go with 1. to avoid adding a dependency to the RDKit codebase, especially one that may not be that well maintained.

P.S. Python's unittest has a built-in subTest() context manager, but that seems slightly different, not designed to take a range of parameters.
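For illustration, here is a minimal sketch of how option 1 might be combined with subTest() using only the standard library, so that each parameter set is reported separately on failure. The function `n_columns` and the test cases are hypothetical stand-ins, not RDKit code or the tests in this pull request.

```python
import unittest


def n_columns(matrix):
    """Hypothetical function under test: width of the longest row."""
    return max((len(row) for row in matrix), default=0)


class TestNColumns(unittest.TestCase):
    # Each case: (description, input matrix, expected column count)
    CASES = [
        ("empty", [], 0),
        ("rectangular", [[1, 2], [3, 4]], 2),
        ("ragged", [[1, 2], [3, None, 4]], 3),
    ]

    def test_n_columns(self):
        for name, matrix, expected in self.CASES:
            # subTest labels each iteration, so one failing case
            # does not hide the results of the others
            with self.subTest(case=name):
                self.assertEqual(n_columns(matrix), expected)


# Run the test case programmatically (instead of unittest.main())
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNColumns)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

This keeps the test suite free of third-party dependencies; the trade-off is that all cases count as a single test method in the run summary.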

@greglandrum
Member

@bertiewooster : this is still marked as a draft, so I haven't looked at it. Do you intend to come back to it at some point?

@bertiewooster
Contributor Author

bertiewooster commented May 16, 2023

@greglandrum: no need to review it yet. Yes, I intend to come back to it. I was making good progress, but then my C build got confused about which chip architecture my computer has (the error was "mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')"). I haven't gotten around to fixing it, but I should have time in the next few weeks. Alternatively, I may try the approach in your blog post "Setting up an environment to make Python contributions to the RDKit", because my contribution will be in Python.
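As a quick diagnostic for this kind of mismatch, a small sketch using the standard library's `platform` module shows which architecture the running Python interpreter reports. This only inspects the interpreter; it is an assumption that comparing it against the architecture named in the error message is a useful first step, and it does not fix the build itself.

```python
import platform
import sys

# Architecture reported by the running interpreter,
# typically 'arm64' or 'x86_64' on macOS
arch = platform.machine()
print(f"Python {sys.version_info.major}.{sys.version_info.minor} reports architecture: {arch}")
```

If the value differs from the architecture the error says is needed, the interpreter and the compiled libraries were probably built for different targets.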

@bertiewooster bertiewooster marked this pull request as ready for review May 29, 2023 23:45
@bertiewooster
Contributor Author

bertiewooster commented May 29, 2023

Usage examples from MolsMatrixToGridImage docstring:

        from rdkit import Chem
        from rdkit.Chem.Draw import MolsMatrixToGridImage, rdMolDraw2D
        FCl = Chem.MolFromSmiles("FCl")
        mols_matrix = [[FCl, FCl], [FCl, None, FCl]]

        # Minimal example: Only mols_matrix is supplied,
        # result will be a drawing containing (where each row contains molecules):
        # F-Cl    F-Cl
        # F-Cl            F-Cl
        img = MolsMatrixToGridImage(mols_matrix)
        img.save("MolsMatrixToGridImage_minimal.png")
        # img is a PIL object for a PNG image file like:
        # <PIL.PngImagePlugin.PngImageFile image mode=RGB size=600x200 at 0x1648CC390>
        # Drawing will be saved as PNG file MolsMatrixToGridImage_minimal.png

        # Exhaustive example: All parameters are supplied,
        # result will be a drawing containing (where each row of molecules is followed by a row of legends):
        # 1 F-Cl 0              1 F-Cl 0
        # no highlighting       bond highlighted
        # 1 F-Cl 0                                  1 F-Cl 0
        # F highlighted                             Cl and bond highlighted
        legends_matrix = [["no highlighting", "bond highlighted"],
                          ["F highlighted", "", "Cl and bond highlighted"]]
        highlightAtomLists_matrix = [[[], []], [[0], None, [1]]]
        highlightBondLists_matrix = [[[], [0]], [[], None, [0]]]

        dopts = rdMolDraw2D.MolDrawOptions()
        dopts.addAtomIndices = True

        # returnPNG=False returns a PIL image, which provides .save();
        # returnPNG=True would return raw PNG bytes instead
        img_file = MolsMatrixToGridImage(mols_matrix=mols_matrix, subImgSize=(300, 400),
                                         legends_matrix=legends_matrix,
                                         highlightAtomLists_matrix=highlightAtomLists_matrix,
                                         highlightBondLists_matrix=highlightBondLists_matrix,
                                         useSVG=False, returnPNG=False, drawOptions=dopts)
        img_file.save("MolsMatrixToGridImage_exhaustive.png")
        # Drawing will be saved as PNG file MolsMatrixToGridImage_exhaustive.png
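In the exhaustive example, the legends and highlight matrices mirror the row lengths of the molecule matrix. A minimal sketch of a shape check a caller might run before drawing follows; the helper `same_shape` is hypothetical, strings stand in for molecules, and whether the RDKit function performs an equivalent check is not stated in this thread.

```python
def same_shape(mols_matrix, other_matrix):
    """Return True if other_matrix has the same number of rows and the same row lengths.

    Hypothetical helper for illustration; not part of the RDKit API.
    """
    if len(mols_matrix) != len(other_matrix):
        return False
    return all(len(a) == len(b) for a, b in zip(mols_matrix, other_matrix))


# Strings stand in for molecule objects
mols = [["FCl", "FCl"], ["FCl", None, "FCl"]]
legends = [["no highlighting", "bond highlighted"],
           ["F highlighted", "", "Cl and bond highlighted"]]
atoms = [[[], []], [[0], None, [1]]]

print(same_shape(mols, legends))             # True
print(same_shape(mols, atoms))               # True
print(same_shape(mols, [["only one row"]]))  # False
```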

Member

@greglandrum greglandrum left a comment

Thanks for the contribution @bertiewooster . I do think this will be useful functionality.
I went through and made a bunch of mainly cosmetic/style suggestions which will make it easier for me to actually review the substance of what's here.
I'll do the real review after that stuff is resolved.

Review comments (all outdated and resolved):

  • rdkit/Chem/Draw/UnitTestDraw.py (3 comments)
  • rdkit/Chem/Draw/__init__.py (7 comments)

@bertiewooster
Contributor Author

Hi, I'm re-requesting substantive review because I believe I addressed all @greglandrum's cosmetic/style suggestions above. Thanks!

Member

@greglandrum greglandrum left a comment

LGTM

@greglandrum greglandrum added this to the 2023_09_1 milestone Jul 29, 2023
@greglandrum
Member

Thanks @bertiewooster !

@greglandrum greglandrum merged commit de602c8 into rdkit:master Jul 29, 2023
10 checks passed
@bertiewooster
Contributor Author

Great, thanks @greglandrum! I'm excited to have made a functionality contribution to the RDKit.

Regarding documentation for this new feature,

  • Because this is a graphical feature, a visual format such as a blog post would be helpful for explaining it. I'll work on a Jupyter Notebook and would be happy to contribute it to the RDKit blog if you like.
  • I assume the API documentation such as function signature (similar to that of MolsToGridImage) will be automatically generated from the docstring in the code.

@greglandrum
Member

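  • Because this is a graphical feature, a visual format such as a blog post would be helpful for explaining it. I'll work on a Jupyter Notebook and would be happy to contribute it to the RDKit blog if you like.

That would be great!

  • I assume the API documentation such as function signature (similar to that of MolsToGridImage) will be automatically generated from the docstring in the code.

Yep, that should happen automatically.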

@bertiewooster
Contributor Author

bertiewooster commented Oct 21, 2023

I've got the Jupyter Notebook for explaining MolsMatrixToGridImage() ready and am trying to run it with the RDKit 2023_09_1 release; I am having trouble upgrading to that version on my Macs (one M2 and one Intel processor). When I follow the installation instructions

conda create -c conda-forge -n my-rdkit-env rdkit
conda activate my-rdkit-env

the RDKit version is 2022.03.5:

# Name                    Version                   Build  Channel
rdkit                     2022.03.5       py310h9a91a65_0    conda-forge

Running conda upgrade rdkit does not upgrade the RDKit version. Is there some other way to get 2023_09_1? I checked PyPI for the pip-installable version, but PyPI is on version 2023.3.3.

@greglandrum
Member

Maybe try conda install rdkit=2023.09.1?

@bertiewooster
Contributor Author

Thanks for the tip! What ended up working was

conda install -c conda-forge rdkit=2023.09.1

I initially tried your suggestion as-is and got

PackagesNotFoundError: The following packages are not available from current channels:

  - rdkit=2023.09.1

Current channels:

  - https://repo.anaconda.com/pkgs/main/osx-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/osx-64
  - https://repo.anaconda.com/pkgs/r/noarch

so I gathered that part of the problem was the latest RDKit wasn't in the Anaconda channels.

@greglandrum
Member

so I gathered that part of the problem was the latest RDKit wasn't in the Anaconda channels

To the best of my knowledge no version of the RDKit is in the Anaconda channels. It's always been in conda-forge.
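After installing from conda-forge, it can help to confirm from Python which RDKit version the environment actually provides. The sketch below is an illustration only: `version_tuple` is a hypothetical helper, and the required version is taken from the 2023_09_1 release mentioned above. It reads `rdkit.__version__` only if the package is importable.

```python
import importlib.util
import re


def version_tuple(version: str):
    """Turn a version string such as '2023.09.1' into (2023, 9, 1).

    Hypothetical helper; keeps only the first three numeric fields so that
    suffixes on development builds do not break the comparison.
    """
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


REQUIRED = "2023.09.1"  # release that this thread associates with MolsMatrixToGridImage

if importlib.util.find_spec("rdkit") is None:
    print("rdkit is not installed in this environment")
else:
    import rdkit

    ok = version_tuple(rdkit.__version__) >= version_tuple(REQUIRED)
    print(f"rdkit {rdkit.__version__}: {'new enough' if ok else 'too old'}")
```

Comparing integer tuples avoids the pitfalls of plain string comparison when fields have different numbers of digits.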

@bertiewooster
Contributor Author

@greglandrum I completed drafting the blog post and submitted a pull request to the RDKit Blog repo.
