Source code for DIPS split #18

anton-bushuiev · 2023-01-11T10:53:29Z

Hi 👋! In the paper it is mentioned: "For DIPS, the split is based on protein family to separate similar proteins". Is there source code for this split? I could only find a random split in paritition_dips.py.

AxelGiottonini · 2023-01-25T09:56:12Z

Hey !

I don't remember finding any code for the split, but you can certainly write a simple script that clusters your proteins using foldseek or something similar, together with dgl, networkx, or any other graph library you want. The only thing you then need to output is the list of files, in the same format as in the original split definitions.

Sincerely meow !

anton-bushuiev · 2023-01-25T11:57:58Z

Hi, @AxelGiottonini!

Thank you very much for your response. Foldseek looks perfect, I did not know about it. What exactly do you mean by using a graph library? Clustering PPIs using graph metrics based on their EquiDock graph representations? Also, I am still curious how exactly PPIs were split based on the folds of the individual interacting partners. If PPI1 has partners with folds A and B and PPI2 has partners with folds C and D, are they put in separate splits whenever {A, B} != {C, D}, or, more strictly, only when {A, B} and {C, D} are disjoint 🤔? This may be important from the perspective of data leakage.
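To make the two candidate criteria concrete, here is a minimal sketch. The fold labels and the function names `folds_differ` and `folds_disjoint` are made up for illustration; this is not how the paper's split is known to work.

```python
def folds_differ(ppi_a, ppi_b):
    """Looser criterion: the two fold sets are not identical."""
    return set(ppi_a) != set(ppi_b)


def folds_disjoint(ppi_a, ppi_b):
    """Stricter criterion: the two PPIs share no fold at all."""
    return set(ppi_a).isdisjoint(ppi_b)


# Hypothetical fold labels for the two partners of each PPI.
ppi1 = ("A", "B")
ppi2 = ("A", "C")  # shares fold A with ppi1
ppi3 = ("C", "D")  # shares nothing with ppi1

# ppi1 vs ppi2: separable under the looser rule, not under the stricter one.
print(folds_differ(ppi1, ppi2), folds_disjoint(ppi1, ppi2))
# ppi1 vs ppi3: separable under both rules.
print(folds_differ(ppi1, ppi3), folds_disjoint(ppi1, ppi3))
```

Under the looser rule, ppi1 could land in train and ppi2 in test even though both contain fold A, which is the leakage scenario in question.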

AxelGiottonini · 2023-01-25T12:27:15Z

What I did in a previous project was to cluster the proteins using foldseek (all vs all) and to create a graph with all the proteins as vertices, putting edges between paired proteins (receptor - ligand) and between proteins in the same cluster. Then I used the biggest clusters to create the training set and the smallest for validation and testing (90-5-5).
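A rough sketch of the procedure described above, using only the standard library. It is not the code from that project and not the split used in the paper. The function name `split_pairs`, the `cluster_of` mapping (protein ID to cluster label, assumed to be derived from the clustering tool's output) and the greedy assignment rule are assumptions. Connected components are found with a small union-find instead of a graph library.

```python
from collections import defaultdict


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[root_b] = root_a


def split_pairs(pairs, cluster_of, fractions=(0.9, 0.05, 0.05)):
    """pairs: list of (receptor_id, ligand_id).
    cluster_of: protein_id -> cluster label (assumed input)."""
    proteins = {p for pair in pairs for p in pair}
    parent = {p: p for p in proteins}

    # Edges between paired proteins (receptor - ligand).
    for receptor, ligand in pairs:
        union(parent, receptor, ligand)

    # Edges between proteins that fall in the same cluster.
    first_seen = {}
    for p in sorted(proteins):
        label = cluster_of[p]
        if label in first_seen:
            union(parent, first_seen[label], p)
        else:
            first_seen[label] = p

    # Group pairs by connected component, largest component first.
    components = defaultdict(list)
    for receptor, ligand in pairs:
        components[find(parent, receptor)].append((receptor, ligand))
    ordered = sorted(components.values(), key=len, reverse=True)

    # Greedy fill: biggest components go to train, the rest to val, then test.
    total = len(pairs)
    train, val, test = [], [], []
    for component in ordered:
        if len(train) < fractions[0] * total:
            train.extend(component)
        elif len(val) < fractions[1] * total:
            val.extend(component)
        else:
            test.extend(component)
    return train, val, test
```

Because whole components are assigned to a single split, no protein and no cluster is shared between train, validation and test. The price is that the actual proportions only approximate the requested fractions, and one very large component can absorb most of the data.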

Another option could be to characterize the binding pocket and split the data according to this characterization, but I lack the knowledge to do that kind of thing.

anton-bushuiev · 2023-01-25T13:08:14Z

Thank you for sharing!

Yes, I am also considering creating a split based on interface similarity using a tool like this.

AxelGiottonini · 2023-01-26T12:29:14Z

You're welcome ! I did not look for such a tool, but that seems promising !

Also, when I was working with EquiDock, I got results with poor accuracy when considering only the ligand RMSD (as the receptor RMSD is always 0). I'll share my code and results in the next few days, but could you consider sharing your results if something similar occurred?

anton-bushuiev · 2023-01-26T14:52:56Z

Hi! I do not use EquiDock and I was mainly interested in the data split. I am working on a related problem of predicting binding affinity change upon mutation (based on the SKEMPI2 data). It is about learning from already bound structures, so it's a bit different.
