BENCHMARK DATA: PPDbench #8

dlemas · 2022-09-26T12:25:12Z

Supplementary Table 4. The statistical information of the additional benchmark datasets. To evaluate performance on the binary interaction prediction task, we also generated negative samples using the same strategy as described in the Methods section (i.e., shuffling the non-interacting pairs). All these datasets were essentially derived from the Protein Data Bank (PDB). "Num" is the abbreviation of "Number".
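The caption points to the Methods section for the negative-sampling strategy without spelling it out. As a minimal sketch, assuming it means randomly re-pairing the proteins and peptides from the positive set and discarding any pair already known to interact, with a 1:1 positive-to-negative ratio (both are assumptions, and `sample_negative_pairs` is a hypothetical helper name), the idea looks like this:

```python
import random


def sample_negative_pairs(positive_pairs, n_negatives=None, seed=0):
    """Randomly re-pair proteins and peptides, keeping only pairs not observed as positives."""
    positives = set(positive_pairs)
    proteins = sorted({protein for protein, _ in positives})
    peptides = sorted({peptide for _, peptide in positives})
    if n_negatives is None:
        n_negatives = len(positives)  # assumption: 1:1 positive/negative ratio
    # Every protein x peptide combination that is not a known positive is a candidate.
    max_possible = len(proteins) * len(peptides) - len(positives)
    if n_negatives > max_possible:
        raise ValueError("not enough non-interacting pairs to sample from")
    rng = random.Random(seed)  # fixed seed so the benchmark negatives are reproducible
    negatives = set()
    while len(negatives) < n_negatives:
        pair = (rng.choice(proteins), rng.choice(peptides))
        if pair not in positives:
            negatives.add(pair)
    return sorted(negatives)
```

Sampling only from proteins and peptides that already appear in the benchmark keeps the negatives within the same sequence distribution as the positives.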

dlemas · 2022-09-26T18:38:17Z

Supplementary Table 4.

Dataset	Num of PDBs	Num of Proteins	Num of Peptides	Model
PPDbench	133	111	110	4

PPDbench: https://webs.iiitd.edu.in/raghava/ppdbench/dataset.php

Download the 5 different datasets and count the entries so they can be matched against the table information above. We can use this to demo our system.
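To compare the downloaded files with Supplementary Table 4, the unique PDB entries, proteins and peptides can be counted. This is a minimal sketch, assuming each row holds one PDB ID plus its protein and peptide sequences, and that the table counts proteins and peptides as unique sequences (an assumption). The column names `pdb_id`, `protein_seq` and `peptide_seq` are hypothetical:

```python
import csv


def count_entries(rows, pdb_key="pdb_id", protein_key="protein_seq", peptide_key="peptide_seq"):
    """Count unique PDB entries, protein sequences and peptide sequences."""
    pdbs, proteins, peptides = set(), set(), set()
    for row in rows:
        pdbs.add(row[pdb_key])
        proteins.add(row[protein_key])
        peptides.add(row[peptide_key])
    return {"pdbs": len(pdbs), "proteins": len(proteins), "peptides": len(peptides)}


def count_entries_in_csv(path, **keys):
    """Same counts, read from a CSV file with a header row."""
    with open(path, newline="") as handle:
        return count_entries(csv.DictReader(handle), **keys)
```

For PPDbench, the result can be checked against the table row (133 PDBs, 111 proteins, 110 peptides); a mismatch would suggest the download or the counting rule differs from the manuscript.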

dlemas · 2022-10-13T00:59:37Z

Update on PPDbench: the files have been distributed to the team. We will start with the 133 ligand files. The file ./ppdbench_metadata.csv lists the file names of all ligand PDB files.

Next steps to create the PLIP files for the benchmark data:

1. Read ./ppdbench_metadata.csv, which lists all "ligand" PDB files.
2. Read the directory containing the ligand PDB files.
3. Load Biopython.
4. Read each PDB file and add its sequence as a new column in ./ppdbench_metadata.csv.
5. Write out the updated metadata file.
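The steps above can be sketched roughly as follows, assuming the metadata CSV has a header row with a column naming each ligand file (called `file_name` here, which is hypothetical). The Biopython import sits inside `sequence_from_pdb` so the CSV plumbing can be tried without Biopython installed; multiple polypeptides per file are joined with `;`, which is an arbitrary choice:

```python
import csv
from pathlib import Path


def sequence_from_pdb(pdb_path):
    """Return the one-letter sequence(s) of all polypeptides in a PDB file."""
    # Imported here so the CSV helper below can run without Biopython installed.
    from Bio.PDB import PDBParser, PPBuilder

    structure = PDBParser(QUIET=True).get_structure(Path(pdb_path).stem, str(pdb_path))
    # PPBuilder splits chains at chain breaks, so one chain may yield several polypeptides.
    polypeptides = PPBuilder().build_peptides(structure)
    return ";".join(str(pp.get_sequence()) for pp in polypeptides)


def add_sequence_column(metadata_csv, pdb_dir, out_csv,
                        file_column="file_name", extract=sequence_from_pdb):
    """Read the metadata CSV, extract a sequence per ligand PDB file, and write a new CSV."""
    with open(metadata_csv, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = list(reader.fieldnames or []) + ["sequence"]
        rows = list(reader)
    for row in rows:
        row["sequence"] = extract(Path(pdb_dir) / row[file_column])
    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

A call might look like `add_sequence_column("./ppdbench_metadata.csv", "./ligand_pdbs", "./ppdbench_metadata_with_seq.csv")`, where `./ligand_pdbs` and the output name are placeholders. Writing to a new file rather than overwriting the original keeps the source metadata intact if extraction fails partway.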

https://biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ

How do I get the sequence of a structure?
The first thing to do is to extract all polypeptides from the structure (see previous entry). The sequence of each polypeptide can then easily be obtained from the Polypeptide objects. The sequence is represented as a Biopython Seq object.

Example:

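from Bio.PDB import PDBParser, PPBuilder
structure = PDBParser(QUIET=True).get_structure("1A1M", "1A1M.pdb")  # path to a local PDB file
for polypeptide in PPBuilder().build_peptides(structure):
    seq = polypeptide.get_sequence()  # a Biopython Seq object
    print(repr(seq))  # shows Seq('...'), as in the FAQ's Seq('SNVVE...') example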

The new metadata will then be used to identify the protein chain in the next step.
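The thread does not say how the protein chain will be identified. One common heuristic, assumed here rather than taken from the project, is to treat short chains as peptides and longer chains as proteins. The 50-residue cutoff and the function name are hypothetical and would need checking against the PPDbench entries:

```python
def split_protein_and_peptide_chains(chain_sequences, max_peptide_length=50):
    """Split {chain_id: sequence} into (proteins, peptides) by sequence length."""
    proteins = {}
    peptides = {}
    for chain_id, sequence in chain_sequences.items():
        # Assumed rule: chains at or below the cutoff are peptides, longer ones proteins.
        if len(sequence) <= max_peptide_length:
            peptides[chain_id] = sequence
        else:
            proteins[chain_id] = sequence
    return proteins, peptides
```

A length rule can still return several protein chains for multi-chain complexes such as MHC entries, so the output may need a follow-up rule for picking the chain of interest.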

dlemas · 2022-10-13T01:21:03Z

1A1M is an MHC (major histocompatibility complex) structure.

evanhadam · 2022-10-20T01:26:55Z

The sequences have been inserted into ppdbench_metadata.csv using a request to an API.
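The comment does not name the API that was used. As one possibility only, assuming the RCSB PDB FASTA download service at `https://www.rcsb.org/fasta/entry/<PDB ID>`, a sequence lookup could be sketched like this; `fetch_sequences` and `parse_fasta` are hypothetical helpers, and the network call is kept separate from the parsing so the parser can be checked offline:

```python
from urllib.request import urlopen

# Assumed service; the thread does not say which API was used.
RCSB_FASTA_URL = "https://www.rcsb.org/fasta/entry/{pdb_id}"


def parse_fasta(text):
    """Parse FASTA text into a {header: sequence} dict."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


def fetch_sequences(pdb_id, timeout=30):
    """Download and parse the FASTA record(s) for one PDB entry."""
    with urlopen(RCSB_FASTA_URL.format(pdb_id=pdb_id), timeout=timeout) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

Each header typically identifies the entity and chains, which could then be matched to the chain-level information already in ppdbench_metadata.csv.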

dlemas added the documentation Improvements or additions to documentation label Sep 26, 2022

dlemas added this to the DEBUG: step1_pdb_process.py milestone Sep 26, 2022

dlemas assigned nataliegood Sep 26, 2022

dlemas changed the title ~~CAMP: Proteins Reported in Manuscript~~ Supplementary Note 5: Additional results on Generalizability on additional benchmark datasets Sep 26, 2022

dlemas assigned AnthonyYao7 Sep 26, 2022

dlemas changed the title ~~Supplementary Note 5: Additional results on Generalizability on additional benchmark datasets~~ BENCHMARK DATA: PPDbench Oct 13, 2022

dlemas assigned evanhadam Oct 13, 2022

dlemas added the data benchmark data label Oct 13, 2022
