Source code for DIPS split #18
Comments
Hey! I don't remember finding any code for the split, but you can certainly write a simple script to cluster your proteins using foldseek or something similar, together with dgl, networkx or any other graph library you want. The only thing you then need to output is the list of files, in the same format as the one used in the original split definitions. Sincerely, meow!
Hi, @AxelGiottonini! Thank you very much for your response. Foldseek looks perfect, I did not know about it. What exactly do you mean by using a graph library? To cluster PPIs using graph metrics based on their EquiDock graph representations? Also, I am still curious how exactly PPIs were split based on the folds of the individual interacting partners. If
What I did in a previous project was to cluster the proteins using foldseek (all vs all) and to build a graph with all the proteins as vertices, putting edges between paired proteins (receptor - ligand) and between proteins in the same cluster. Then I used the biggest connected groups in that graph to create the training set and the smallest for validation and testing (90-5-5). Another option could be to characterize the binding pocket and split the data according to this characterization, but I lack the knowledge to do that kind of thing.
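To make the idea above concrete, here is a minimal sketch of such a split. It is not the code from that project or from EquiDock. It assumes you already have two lists of protein ID pairs: `pairs` (receptor, ligand) and `cluster_edges` (two proteins that foldseek placed in the same cluster); parsing the foldseek output is not shown. The function name `split_by_components` and the greedy fill order are my own choices, and a small union-find stands in for a graph library's connected-components routine to keep the example dependency-free.

```python
from collections import defaultdict


def split_by_components(pairs, cluster_edges, fractions=(0.9, 0.05, 0.05)):
    """Split (receptor, ligand) complexes so that no connected group of
    proteins is shared between train, validation and test.

    pairs         -- list of (receptor_id, ligand_id) tuples (hypothetical IDs)
    cluster_edges -- list of (protein_id, protein_id) tuples for proteins
                     that fall in the same structural cluster
    fractions     -- target share of complexes for train and validation;
                     the test set receives whatever remains
    """
    pairs = list(pairs)
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Edges between paired proteins and between proteins in one cluster.
    for a, b in pairs + list(cluster_edges):
        union(a, b)

    # Receptor and ligand are always connected, so each complex
    # belongs to exactly one component.
    components = defaultdict(list)
    for receptor, ligand in pairs:
        components[find(receptor)].append((receptor, ligand))

    # Biggest components go to training first, smallest to val/test.
    ordered = sorted(components.values(), key=len, reverse=True)
    train_target = fractions[0] * len(pairs)
    val_target = fractions[1] * len(pairs)

    train, val, test = [], [], []
    for complexes in ordered:
        if len(train) < train_target:
            train.extend(complexes)
        elif len(val) < val_target:
            val.extend(complexes)
        else:
            test.extend(complexes)
    return train, val, test


if __name__ == "__main__":
    example_pairs = [("r1", "l1"), ("r2", "l2"), ("r3", "l3"), ("r4", "l4")]
    example_clusters = [("r1", "r2"), ("l2", "r3")]
    for name, part in zip(
        ("train", "val", "test"),
        split_by_components(example_pairs, example_clusters, (0.5, 0.25, 0.25)),
    ):
        print(name, part)
```

Because whole components are assigned at once, the resulting sizes only approximate the requested fractions; with a few very large components, the training set can end up well above its target. Each returned list can then be written out in the same format as the original split files.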
Thank you for sharing! Yes, I am also considering creating a split based on interface similarity using a tool like this.
You're welcome! I have not looked for such a tool, but that seems promising! Also, when I was working with EquiDock, I got poor accuracy when considering only the ligand RMSD (as the receptor RMSD is always 0). I'll share my code and results in the next few days, but could you share your results if you saw something similar?
Hi! I do not use EquiDock; I was mainly interested in the data split. I am working on a related problem of predicting binding affinity change upon mutation (based on the SKEMPI2 data). It is about learning from already bound structures, so it's a bit different.
Hi 👋! In the paper it is mentioned: "For DIPS, the split is based on protein family to separate similar proteins". Is there a source code for this split? I could only find a random split in paritition_dips.py.