
BENCHMARK DATA: PPDbench #8

Open
dlemas opened this issue Sep 26, 2022 · 4 comments
Labels: data (benchmark data), documentation (Improvements or additions to documentation)

Comments

@dlemas

dlemas commented Sep 26, 2022

Supplementary Table 4. Statistics of the additional benchmark datasets. To evaluate performance on the binary interaction prediction task, we also generated negative samples using the same strategy as described in the Methods section (i.e., shuffling the non-interacting pairs). All of these datasets were essentially derived from the Protein Data Bank (PDB). "Num" is the abbreviation of "Number".
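The quoted description of the negative-sampling strategy is brief. One common reading of "shuffling" is to re-pair proteins and peptides at random and keep only pairs that are not known interactions. The sketch below assumes that reading; it is not CAMP's actual code, and the function name `shuffle_negatives` and its parameters are hypothetical.

```python
import random


def shuffle_negatives(positives, n_negatives=None, seed=0, max_draws=100_000):
    """Draw random (protein, peptide) re-pairings that are not known positives.

    Assumed interpretation of "shuffling"; hypothetical helper, not CAMP's code.
    `positives` is a list of (protein_id, peptide_id) interacting pairs.
    Returns up to `n_negatives` distinct non-interacting pairs (default: as many
    as there are positives). Fewer are returned if `max_draws` is exhausted,
    e.g. when almost every possible pairing is already a positive.
    """
    if not positives:
        return []
    known = set(positives)
    proteins = sorted({p for p, _ in positives})
    peptides = sorted({q for _, q in positives})
    target = len(positives) if n_negatives is None else n_negatives
    rng = random.Random(seed)  # fixed seed so the benchmark is reproducible
    negatives = set()
    for _ in range(max_draws):
        if len(negatives) >= target:
            break
        pair = (rng.choice(proteins), rng.choice(peptides))
        if pair not in known:  # discard re-pairings that are real interactions
            negatives.add(pair)
    return sorted(negatives)
```

Drawing from the unique protein and peptide sets gives every pairing equal weight; drawing from the raw positive lists instead would favor frequently occurring proteins or peptides.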

dlemas added the documentation label on Sep 26, 2022
dlemas added this to the "DEBUG: step1_pdb_process.py" milestone on Sep 26, 2022
dlemas changed the title "CAMP: Proteins Reported in Manuscript" to "Supplementary Note 5: Additional results on Generalizability on additional benchmark datasets" on Sep 26, 2022
@dlemas
Author

dlemas commented Sep 26, 2022

Supplementary Table 4.

| Dataset | Num of PDBs | Num of Proteins | Num of Peptides | Model |
|---|---|---|---|---|
| PPDbench | 133 | 111 | 110 | 4 |

PPDbench: https://webs.iiitd.edu.in/raghava/ppdbench/dataset.php

  • Download the 5 different datasets and count their entries to reconcile them with the table above. We can use this to demo our system.
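A minimal sketch of the counting step, assuming (hypothetically) that each downloaded dataset sits in its own subdirectory with one structure per `.pdb` file. The directory layout and the helper names `count_pdb_entries` and `compare_with_table` are illustrative, and only the "Num of PDBs" value (133) from the table is checked.

```python
from pathlib import Path

# "Num of PDBs" for PPDbench from Supplementary Table 4.
EXPECTED_PDBS = 133


def count_pdb_entries(root):
    """Count *.pdb files in each immediate subdirectory of `root`.

    Assumes (hypothetically) one subdirectory per downloaded dataset.
    Returns {dataset_name: count}, ordered by dataset name.
    """
    counts = {}
    for sub in sorted(Path(root).iterdir()):
        if sub.is_dir():
            counts[sub.name] = sum(1 for f in sub.glob("*.pdb") if f.is_file())
    return counts


def compare_with_table(counts, expected=EXPECTED_PDBS):
    """Return (dataset, count, matches_expected) tuples for a quick check."""
    return [(name, n, n == expected) for name, n in counts.items()]
```

For example, `compare_with_table(count_pdb_entries("ppdbench/"))` lists which downloaded datasets contain exactly 133 PDB files. Note that `glob("*.pdb")` is case-sensitive on most file systems, so files named `*.PDB` would need a second pattern.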

dlemas changed the title "Supplementary Note 5: Additional results on Generalizability on additional benchmark datasets" to "BENCHMARK DATA: PPDbench" on Oct 13, 2022
@dlemas
Author

dlemas commented Oct 13, 2022

Update on PPDbench: the files have been distributed to the team. We will start with the ligand files (133). A file called ./ppdbench_metadata.csv lists the file names of all ligand PDB files.

Next steps to create the PLIP files for the benchmark data include:

  • Read ./ppdbench_metadata.csv, which lists the file names of all "ligand" PDB files.
  • Read the directory containing the ligand PDB files.
  • Load Biopython.
  • Read each PDB file and write its sequence to a new column in ./ppdbench_metadata.csv.
  • Write out the updated metadata file.
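The steps above can be sketched with the standard-library `csv` module. This is a minimal sketch, not the team's script: the column names `file_name` and `sequence` are hypothetical, and the sequence extractor is passed in as a function so that Biopython (or any other parser) can be plugged in.

```python
import csv
from pathlib import Path


def add_sequence_column(metadata_csv, pdb_dir, extract_sequence,
                        out_csv=None, file_col="file_name", seq_col="sequence"):
    """Add a sequence column to the metadata CSV.

    `file_col` and `seq_col` are hypothetical column names.
    `extract_sequence(path) -> str` is supplied by the caller.
    Rows whose PDB file is missing get an empty sequence cell.
    """
    metadata_csv = Path(metadata_csv)
    out_csv = Path(out_csv) if out_csv else metadata_csv  # overwrite by default
    with metadata_csv.open(newline="") as fh:
        reader = csv.DictReader(fh)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)
    if seq_col not in fieldnames:
        fieldnames.append(seq_col)
    for row in rows:
        pdb_path = Path(pdb_dir) / row[file_col]
        # Leave the cell empty for a missing file instead of failing the whole run.
        row[seq_col] = extract_sequence(pdb_path) if pdb_path.is_file() else ""
    with out_csv.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

With Biopython, `extract_sequence` could, for example, parse the file with `PDBParser(QUIET=True)` and join the sequences of the polypeptides returned by `PPBuilder().build_peptides(structure)`.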

https://biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ

How do I get the sequence of a structure?
The first thing to do is to extract all polypeptides from the structure (see the previous entry in the FAQ). The sequence of each polypeptide can then easily be obtained from the Polypeptide objects. The sequence is represented as a Biopython Seq object.

Example:

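from Bio.PDB import PDBParser, PPBuilder
structure = PDBParser(QUIET=True).get_structure("ligand", "ligand.pdb")  # hypothetical file name
for polypeptide in PPBuilder().build_peptides(structure):
    seq = polypeptide.get_sequence()
    print(repr(seq))  # e.g. Seq('SNVVE...')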

The new metadata will then be used to identify the protein chain in the next step.
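Because the next step works at the chain level, a map from chain ID to sequence may be handier than one string per file. The sketch below is a dependency-free alternative to Biopython, not the FAQ's method: it reads the CA atoms of ATOM records using the fixed PDB column layout, maps standard residues to one-letter codes (anything else becomes "X"), ignores HETATM records, and, unlike `PPBuilder`, does not split a chain at backbone breaks. The names `chain_sequences` and `read_chain_sequences` are hypothetical.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} from CA atoms in ATOM records.

    PDB columns (1-based): atom name 13-16, resName 18-20, chainID 22,
    resSeq 23-26, iCode 27. Only the first model is read.
    """
    seqs = {}
    seen = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # stop after the first model of multi-model (e.g. NMR) files
        if len(line) < 27 or not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain = line[21]
        res_id = (chain, line[22:27])  # resSeq plus insertion code
        if res_id in seen:
            continue  # skip alternate locations of the same residue
        seen.add(res_id)
        seqs.setdefault(chain, []).append(THREE_TO_ONE.get(line[17:20].strip(), "X"))
    return {chain: "".join(residues) for chain, residues in seqs.items()}


def read_chain_sequences(path):
    """Convenience wrapper that reads a PDB file from disk."""
    with open(path) as fh:
        return chain_sequences(fh)
```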

dlemas added the data label on Oct 13, 2022
@dlemas
Author

dlemas commented Oct 13, 2022

PDB entry 1A1M is an MHC (major histocompatibility complex) structure.

@evanhadam

The sequences have been added to ppdbench_metadata.csv by querying an API.
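The comment does not say which API was used. As one possibility, RCSB PDB serves per-entry FASTA at https://www.rcsb.org/fasta/entry/ followed by the PDB ID; the sketch below assumes that endpoint and parses the response with a small FASTA parser. The helper names `parse_fasta` and `fetch_entry_sequences` are hypothetical.

```python
from urllib.request import urlopen

# Assumed endpoint; the issue does not name the API that was actually used.
RCSB_FASTA_URL = "https://www.rcsb.org/fasta/entry/{pdb_id}"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fetch_entry_sequences(pdb_id, timeout=30):
    """Download and parse the FASTA for one PDB entry (requires network access)."""
    url = RCSB_FASTA_URL.format(pdb_id=pdb_id.upper())
    with urlopen(url, timeout=timeout) as resp:
        return parse_fasta(resp.read().decode("utf-8"))
```

Usage: `fetch_entry_sequences("1A1M")` returns one (header, sequence) tuple per FASTA record in the entry. Keeping the parser separate from the download makes it testable offline and reusable for local FASTA files.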


4 participants