Create an independent PDB sanitize step before entering the haddock3 pipeline #143

Closed
19 of 22 tasks
joaomcteixeira opened this issue Nov 24, 2021 · 8 comments · Fixed by #144
Assignees
Labels
documentation Improve docs feature New feature request

Comments

@joaomcteixeira
Member

joaomcteixeira commented Nov 24, 2021

Following the discussions in #140 and #142, PR #144 implements what is discussed here. The preprocessing steps run before the haddock3 pipeline, when the original input data is copied to the run_dir/data folder. The aim is also to provide a CLI that users can run to correct their PDBs (or do a dry run) before submitting.

Done:

The list follows the execution order defined in the process_pdbs function; order matters:

  • Discard ANISOU records (uses pdb_keepcoord)
  • Discard REMARK lines (uses pdb_keepcoord)
  • Select the altloc with the highest occupancy (waiting for "Improved pdb_selaltloc", pdb-tools#117, to be merged)
  • Set all occupancies to 1.00 (uses pdb_occ)
  • Rename MSE residues (selenomethionine) to MET. If MSE residues are defined as HETATM, convert them to ATOM.
  • Rename HSD, HSE, HIE, and HID to HIS, and convert them to ATOM if needed.
  • Correct charges in ions
  • Support all residues defined in the cns/toppar/*.top files. New residues are retrieved automatically from those files.
  • Convert ATOM to HETATM for residues that are expected to be HETATM. If the user provides a .top file, residues defined there are also converted to HETATM if needed.
  • Convert HETATM to ATOM for residues that should be ATOM but are defined as HETATM (these are natural and modified amino acids)
  • Remove unsupported HETATM. Accepts a user .top file.
  • Remove unsupported ATOM.
  • Handle insertions (AARG, BARG) (uses pdb_fixinsert)
  • Renumber atoms (uses pdb_reatom, starting at 1)
  • Renumber residues (uses pdb_reres, starting at 1)
  • Run pdb_tidy (waiting for "Correct MODEL, END and ENDMDL in pdb_tidy", pdb-tools#119)
  • Address "Weird behaviour when PDB input files have no chainID" (#138)
  • If input PDBs have the same chain ID (in different files) these are corrected such that all PDBs have different chains.
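
To illustrate why the order matters, here is a minimal sketch of how the first few steps above could be chained as line-level filters. It is not the code from PR #144: the function names (`keep_coordinates`, `set_occupancy`, `rename_mse_to_met`, `process_lines`) are hypothetical, and the filters are plain-Python stand-ins for what pdb_keepcoord and pdb_occ do, assuming standard fixed-column PDB records without trailing newlines.

```python
COORD_RECORDS = ("ATOM", "HETATM")


def keep_coordinates(lines):
    """Drop non-coordinate records such as REMARK and ANISOU (stand-in filter)."""
    keep = ("ATOM", "HETATM", "TER", "MODEL", "ENDMDL", "END")
    for line in lines:
        if line.startswith(keep):
            yield line


def set_occupancy(lines, value=1.00):
    """Overwrite the occupancy field (columns 55-60) of every coordinate record."""
    for line in lines:
        if line.startswith(COORD_RECORDS):
            line = line.ljust(60)  # make sure the occupancy columns exist
            line = line[:54] + f"{value:6.2f}" + line[60:]
        yield line


def rename_mse_to_met(lines):
    """Rename MSE to MET and force the record name to ATOM.

    Only the record name and residue name are touched in this sketch.
    """
    for line in lines:
        if line.startswith(COORD_RECORDS) and line[17:20] == "MSE":
            line = "ATOM  " + line[6:17] + "MET" + line[20:]
        yield line


# The order of this tuple is the execution order (hypothetical subset of steps).
STEPS = (keep_coordinates, set_occupancy, rename_mse_to_met)


def process_lines(lines, steps=STEPS):
    """Apply each step to the output of the previous one and return a list."""
    for step in steps:
        lines = step(lines)
    return list(lines)
```

Because each step consumes the output of the previous one, a later step (for example, removing unsupported HETATM) only sees records that earlier steps have already renamed or converted, which is why changing the order can change the result.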

Todo

  • Residues in the same chain cannot have repeated numbering
  • If there is a gap in the sequence, the gap must be maintained (nothing in the list above corrects for this, so it should remain as is; to be tested)
  • All models in an ensemble (MODEL) should be equal, that is, same labels.
  • Add flag to skip the preprocessing step

It is probably worth checking what our current 2.4 server machinery does in terms of input PDB validation.

@joaomcteixeira joaomcteixeira added the enhancement Enhancing an existing feature of adding a new one label Nov 24, 2021
@joaomcteixeira joaomcteixeira added this to To Do in Features via automation Nov 24, 2021
@joaomcteixeira joaomcteixeira added feature New feature request and removed enhancement Enhancing an existing feature of adding a new one labels Nov 24, 2021
@joaomcteixeira joaomcteixeira added this to the v3.0.0 stable release milestone Nov 24, 2021
@joaomcteixeira joaomcteixeira moved this from To Do to In Progress in Features Nov 24, 2021
@rvhonorato
Member

I've added more points; the server also has checks specific to the molecule types.

@joaomcteixeira
Member Author

When a PDB has multiple chains, what should be the behaviour? Keep only one chain or homogenize all chains to the same identifier?

@amjjbonvin
Member

amjjbonvin commented Nov 25, 2021 via email

@joaomcteixeira joaomcteixeira linked a pull request Nov 26, 2021 that will close this issue
@joaomcteixeira
Member Author

Are ligands always given in independent PDBs, or can they be given in the same PDB together with the protein? And if the latter is the case, should HETATM be all at the end of the file, with TER, same chain?

@rvhonorato
Member

They can be either separated or together with any moleculetype actually, before or after the ATOM. Not sure about the effect of the TER record.

@joaomcteixeira
Member Author

pdb_tidy will add TER if there is an ATOM/HETATM break. If that can't be the case, I need to add something to tidy.

@rvhonorato
Member

Related? haddocking/pdb-tools#101

@amjjbonvin
Member

amjjbonvin commented Dec 3, 2021 via email

@rvhonorato rvhonorato removed their assignment Feb 1, 2022
@joaomcteixeira joaomcteixeira added this to To do in PDB preprocessing via automation May 10, 2022
@joaomcteixeira joaomcteixeira removed this from In Progress in Features May 10, 2022
@joaomcteixeira joaomcteixeira moved this from To do to In progress in PDB preprocessing May 10, 2022
@joaomcteixeira joaomcteixeira added the documentation Improve docs label May 10, 2022
PDB preprocessing automation moved this from In progress to Done Jul 21, 2022

3 participants