[Project]: Train REINVENT Mol2MolSimilarity model to predict molecules similar in 3d shape #1128

Open
ankitskvmdam opened this issue May 21, 2024 · 2 comments

@ankitskvmdam

Summary

Currently, the REINVENT 4 Mol2MolSimilarity model generates new molecules with a similar 2D structure but not necessarily a similar 3D structure. Our goal is to train the REINVENT 4 Mol2MolSimilarity model to produce molecules with a 3D structure similar to the input molecule.

Approach 1

To achieve this, we will train the Mol2MolSimilarity model to generate new molecules with a similar 3D shape. The training process involves the following steps:

  1. Input the molecule into the Mol2MolSimilarity model.
  2. Pass the generated molecule to smiles-to-3d, which will generate 3D conformers from the SMILES notation.
  3. smiles-to-3d will produce an SDF file, which we will then pass to VSFlow to obtain the 3D similarity score.
  4. Use this similarity score as feedback for the Mol2MolSimilarity model to improve its performance.
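The scoring in steps 2 to 4 can be sketched in plain Python. This does not reproduce VSFlow's own scoring. As a stand-in, it uses a first-order Gaussian overlap shape Tanimoto, which is a common shape-similarity technique. The sketch assumes the conformers are already aligned to the query, gives every heavy atom the same radius, and uses the best score over a generated molecule's conformers as its feedback. `overlap_volume`, `shape_tanimoto` and `shape_reward` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Assumption: every heavy atom gets the same radius; real shape tools use
# per-element radii and also optimise the alignment before scoring.
RADIUS = 1.7                # Angstrom, carbon-like van der Waals radius
P = 2.0 * math.sqrt(2.0)    # Gaussian amplitude used in first-order overlap models
# Pick alpha so that P * exp(-alpha * r^2) integrates to the volume of a sphere of RADIUS.
ALPHA = math.pi * (3.0 * P / (4.0 * math.pi * RADIUS ** 3)) ** (2.0 / 3.0)


def overlap_volume(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """First-order Gaussian overlap volume between two pre-aligned atom sets."""
    # For two equal Gaussians the pairwise overlap integral is
    # P^2 * (pi / (2 * alpha))^(3/2) * exp(-alpha * d^2 / 2).
    prefactor = P * P * (math.pi / (2.0 * ALPHA)) ** 1.5
    total = 0.0
    for xa, ya, za in a:
        for xb, yb, zb in b:
            d2 = (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
            total += prefactor * math.exp(-ALPHA * d2 / 2.0)
    return total


def shape_tanimoto(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Shape Tanimoto V_ab / (V_aa + V_bb - V_ab), between 0 and 1."""
    if not a or not b:
        return 0.0
    v_ab = overlap_volume(a, b)
    return v_ab / (overlap_volume(a, a) + overlap_volume(b, b) - v_ab)


def shape_reward(query: Sequence[Coord], conformers: List[Sequence[Coord]]) -> float:
    """Feedback for one generated molecule: best score over its conformers."""
    if not conformers:
        return 0.0  # e.g. no 3D conformer could be generated
    return max(shape_tanimoto(query, conf) for conf in conformers)
```

In the loop, `query` would hold the input molecule's coordinates and `conformers` would hold the coordinates generated for one SMILES. Reading those coordinates from the SDF file is not shown. Identical coordinates score 1.0, and shapes that do not overlap score close to 0.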

Approach 2

While most steps are the same as in Approach 1, this approach explores alternative tools for generating 3D conformers and calculating 3D shape similarity scores. One such tool we can investigate is CHEESE.

Scope

Initiative 🐋

Objective(s)

To develop a model that can efficiently produce new molecules with a 3D structure similar to that of the input molecule.

Team

| Role & Responsibility | Username(s) |
|---|---|
| DRI / Lead Developer | @ankitskvmdam |
| Supervisor | @miquelduranfrigola |

Timeline

TBD

Documentation

@miquelduranfrigola
Member

Hello @ankitskvmdam,

After some thought, I suggest the following:

  1. Let's use REINVENT in transfer learning mode, as you suggested. As a starting model, we can use the Mol2MolSimilarity.
  2. Let's use 3D shape search via CHEESE.

The process will be the following:

  1. A molecule A is passed as input.
  2. We run a 3D shape similarity search via the CHEESE API. We can use the Enamine REAL database as the reference library. From that search, we get the top N compounds (L).
  3. Of these N compounds (for example, N = 100), we use 80% as a training set for the transfer learning, 10% as a validation set, and 10% as a held-out test set.
  4. We train (fine-tune) REINVENT with transfer learning on the training set, monitoring its performance with the validation set.
  5. With the test set, we make sure that, indeed, the test compounds are similar in 3D shape to molecule A. This 3D shape comparison can be done with VSFlow, if that is easier.
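Steps 2 and 3 can be sketched in Python. The sketch assumes the CHEESE search results are available as a list of (SMILES, score) pairs where a higher score means more similar. The real API response format may differ. `top_n` and `split_hits` are hypothetical helpers. The shuffle uses a fixed seed, so the same training, validation, and test sets can be recreated when the experiment is repeated.

```python
import random
from typing import List, Sequence, Tuple


def top_n(hits: Sequence[Tuple[str, float]], n: int) -> List[str]:
    """Keep the SMILES of the n best-scoring hits (higher score = more similar)."""
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    return [smiles for smiles, _ in ranked[:n]]


def split_hits(
    smiles: Sequence[str],
    train_frac: float = 0.8,
    valid_frac: float = 0.1,
    seed: int = 42,
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle once with a fixed seed, then cut into train/validation/test."""
    if train_frac < 0 or valid_frac < 0 or train_frac + valid_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    items = list(smiles)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_frac)
    n_valid = round(len(items) * valid_frac)
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]  # the remainder (about 10%) is held out
    return train, valid, test
```

With N = 100 this yields 80/10/10 molecules. Each list would then be written to a SMILES file for REINVENT's transfer-learning run. Check the REINVENT 4 transfer-learning documentation for the exact input settings.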

I hope this makes sense?

Then, I have more ideas to complicate things further, for example searching against other databases such as ZINC in CHEESE, running multiple similarity searches, or penalizing molecules that are similar in 2D to favour scaffold hopping. But let's go step by step.
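The 2D-penalty idea can be sketched in Python under several assumptions. The sketch represents fingerprints as sets of on-bit indices, for example from a Morgan/ECFP-style fingerprint computed elsewhere. The `threshold` and `weight` values are hypothetical and would need tuning. A molecule keeps its 3D shape score unless its 2D Tanimoto similarity to the query exceeds the threshold. Above the threshold, the score drops linearly.

```python
from typing import AbstractSet


def tanimoto_2d(fp_a: AbstractSet[int], fp_b: AbstractSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def scaffold_hop_score(
    shape_sim: float,
    fp_query: AbstractSet[int],
    fp_candidate: AbstractSet[int],
    threshold: float = 0.4,  # hypothetical: 2D similarity tolerated without penalty
    weight: float = 1.0,     # hypothetical: strength of the 2D penalty
) -> float:
    """Reward 3D shape similarity but subtract a penalty for high 2D similarity."""
    sim_2d = tanimoto_2d(fp_query, fp_candidate)
    penalty = weight * max(0.0, sim_2d - threshold)
    # Clip so the result stays a valid [0, 1] score for the feedback signal.
    return min(1.0, max(0.0, shape_sim - penalty))
```

A linear penalty above a threshold is only one possible design. A multiplicative form such as `shape_sim * (1 - sim_2d)` would also favour scaffold hopping, but it penalizes even weak 2D similarity.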

Please let me know if something is not clear, @ankitskvmdam !

@ankitskvmdam
Author

@miquelduranfrigola It makes sense. I will proceed with this.

Projects
Status: Queue