Specification on PDB input files

Jordy Homing Lam edited this page Jun 27, 2022 · 6 revisions

Valid input

A few examples of valid PDB input files are provided. NucleicNet operates on protein inputs in PDB file format. These input files should be placed in the "GridData" folder, as indicated in our bash script. To allow uniform processing of PDB files, users are advised to check the validity of their input files against the following criteria:

  • The file should contain only rows starting with "ATOM" or "TER"; chain termination is indicated by "TER".
  • The file should not contain hydrogen atoms.
  • The file should contain only protein (i.e. no RNA, DNA, solvents, ligands, HETATM records, etc.).
  • The protein chain should not contain non-standard amino acids.
  • The file should not contain chemical species other than protein.
  • Each PDB file should contain only one model (cf. multi-model NMR structures). If multiple models are included in the same file, only the first one will be analysed.
  • The file name can be any four-character alphanumeric string that starts with a digit, followed by the ".pdb" suffix (e.g. "03Aa.pdb" and "2357.pdb" are valid, but "t3f4.pdb" is not).
  • The file name should not contain any non-alphanumeric character other than the "." in ".pdb".
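
The criteria above can be checked before a file is placed in "GridData". The following is a minimal sketch of such a pre-check and is not part of NucleicNet: the names `is_valid_filename` and `find_problems` are hypothetical, the column positions assume the standard fixed-width PDB layout (residue name in columns 18-20, element symbol in columns 77-78), and the fallback that guesses hydrogens from the atom name is a heuristic. Because every record other than "ATOM" and "TER" is reported, "MODEL"/"ENDMDL" records in multi-model files are flagged as well.

```python
import re

# The 20 standard amino acids, by three-letter residue name.
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}

# One digit, three further alphanumeric characters, then ".pdb".
FILENAME_PATTERN = re.compile(r"[0-9][0-9A-Za-z]{3}\.pdb")


def is_valid_filename(name):
    """Return True if the file name follows the naming rule above."""
    return FILENAME_PATTERN.fullmatch(name) is not None


def find_problems(lines):
    """Return a list of human-readable problems found in PDB lines."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # blank lines are ignored in this sketch
        if line.startswith("TER"):
            continue
        if not line.startswith("ATOM"):
            problems.append(f"line {number}: record is neither ATOM nor TER")
            continue
        # Assumption: standard fixed-width PDB columns.
        residue = line[17:20].strip()
        if residue not in STANDARD_RESIDUES:
            problems.append(f"line {number}: non-standard residue {residue!r}")
        element = line[76:78].strip().upper()
        if not element:
            # Heuristic fallback: guess the element from the atom name
            # (columns 13-16) after dropping any leading digits.
            atom_name = line[12:16].strip()
            element = atom_name.lstrip("0123456789")[:1].upper()
        if element == "H":
            problems.append(f"line {number}: hydrogen atom present")
    return problems
```

A file could then be checked with `find_problems(open(path).readlines())`, where `path` is the location of the PDB file; an empty list means that none of the checks in this sketch failed.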