Dealing with templates that have missing residues #6

luhong88 · 2023-06-07T20:08:06Z

What is the "proper" way to use a template that has missing residues? Let's say I have a template that has 20 missing residues due to an unresolved loop region, and I'm not interested in the region for the purpose of assessing pLDDT. I tried to give AF2Rank the template as is and a sequence with the gap token represented by "X", but it seems that AF2Rank processes the template as a contiguous chain and as such gives an error message due to sequence length mismatch. In this case, am I supposed to give AF2Rank a sequence with no gap tokens? Alternatively, I know that I can treat the template as a dimer and use an alphafold_multimer model, but I'm not sure if dealing with a monomeric protein with a multimer model is the best way to score the structure.

jproney · 2023-06-10T18:20:44Z

Hello! Thanks for the question. The way to handle this scenario is to create a template with the full sequence length, but use the template_all_atom_masks feature to indicate which residues are fully or partially missing from the template. I believe the existing code should be able to handle this scenario, although clearly something is going wrong in this case. The following logic in the score_decoy function is meant to accommodate missing residues so long as the residues that do exist match the target sequence:

  decoy_seq_in = "".join([residue_constants.restypes[x] for x in decoy_prot.aatype]) # the sequence in the decoy PDB file

  mismatch = False
  if decoy_seq_in == target_seq:
    assert jnp.all(decoy_prot.residue_index - 1 == np.arange(len(target_seq)))
  else: # case when template is missing some residues
    if args.verbose:
      print("Sequece mismatch: {}".format(name))
    mismatch=True

    assert "".join(target_seq[i-1] for i in decoy_prot.residue_index) == decoy_seq_in

Is it one of these assertions that is failing? If so, there could potentially be an issue with the numbering of the residues in the PDB file, which should match their position in the sequence regardless of whether some residues are missing. Any more specific information you can provide on the issue would be helpful. Thanks!
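To narrow down which case applies, the numbering can be checked outside of AF2Rank. The following is a minimal sketch and not part of the AF2Rank code: the helper names `read_ca_records` and `numbering_problems` are hypothetical, and it assumes a single-chain PDB with standard amino acids, no alternate locations, and no insertion codes. It reads the CA atoms from the fixed-width ATOM records and reports every residue whose number does not point at the same amino acid in the full target sequence, which is the condition the second assertion tests.

```python
# Hypothetical helpers, not part of AF2Rank. Assumes one chain, standard
# residues, no alternate locations, and no insertion codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def read_ca_records(pdb_lines):
    """Return (sequence, residue_numbers) taken from the CA atoms."""
    seq, numbers = [], []
    for line in pdb_lines:
        # Fixed-width PDB columns (1-based): atom name 13-16,
        # residue name 18-20, residue number 23-26.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            seq.append(THREE_TO_ONE.get(line[17:20], "X"))
            numbers.append(int(line[22:26]))
    return "".join(seq), numbers


def numbering_problems(target_seq, template_seq, residue_numbers):
    """List residues whose number does not match the target sequence.

    An empty list means every template residue sits at its own position
    in the full target sequence, gaps included.
    """
    problems = []
    for aa, num in zip(template_seq, residue_numbers):
        if not 1 <= num <= len(target_seq):
            problems.append(f"residue {num} is outside 1..{len(target_seq)}")
        elif target_seq[num - 1] != aa:
            problems.append(
                f"residue {num}: template has {aa}, target has {target_seq[num - 1]}"
            )
    return problems


# Target of length 10; the template lacks residues 4-6 and skips those numbers.
print(numbering_problems("MKTAYIAKQR", "MKTAKQR", [1, 2, 3, 7, 8, 9, 10]))  # → []
```

If the same template were numbered contiguously from 1 to 7, the residues after the gap would be compared against the wrong target positions and the list would be non-empty, which corresponds to the second assertion failing.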

luhong88 · 2023-06-21T02:37:17Z

Okay, I guess there's a higher level question here. So far the way I've been running af2rank is through a local installation of the colab notebook code (https://colab.research.google.com/github/sokrypton/ColabDesign/blob/main/af/examples/AF2Rank.ipynb#scrollTo=UCUZxJdbBjZt). A quick glance at the test_templates.py seems to suggest that this is not quite the same as what's provided with the colab notebook. Should I be switching over to using test_templates.py?

Edit: after reading through the code more carefully, it seems to me that the main differences between test_templates.py and the colab notebook are:

- test_templates.py directly calls the alphafold module, while the colab notebook goes through colabdesign.
- test_templates.py cannot take in multimers, while the notebook can.
- test_templates.py can take in templates with missing residues and will produce a predicted structure with the missing residues filled in through af, but the notebook ignores missing residues.

But in the end, both test_templates.py and the notebook will create fake CB coordinates for glycines, and both have the option to mask template sequence and sidechain atom coordinates. Is this accurate?

jproney · 2023-07-03T04:56:46Z

I think your assessment is accurate, although the notebook can handle templates with missing residues with a few minor modifications! I've put together the following notebook to handle missing residues:

https://colab.research.google.com/drive/1lFg0zem4-dm70JdZEhXJuigZb0NFW3fH?usp=sharing

The notebook contains an example of ranking a template with deleted residues, which should help give some insight into how to address this issue. In this new notebook, calling af.predict(pdb=pdb_path, seq=seq) will ensure that seq is used as the target sequence. The residues in the template will then be correctly indexed and masked within the native sequence, so long as the PDB residues are numbered correctly (see the example in the notebook for reference, but essentially the index associated with each residue in the PDB needs to be its index within the full sequence, meaning that some numbers will be skipped in a PDB with unresolved residues).
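To make "indexed and masked within the native sequence" concrete, here is a small illustration in plain Python. It is a sketch under stated assumptions and not the notebook's code: the function name `place_in_full_sequence` is hypothetical, and in the real pipeline the per-residue values would be coordinate and atom-mask arrays rather than single letters. Each resolved residue is written to the slot given by its PDB number, and unresolved slots keep a fill value and a mask of 0.

```python
def place_in_full_sequence(residue_numbers, per_residue_values, target_len, fill=None):
    """Scatter per-residue values into a list of length target_len.

    residue_numbers are 1-based positions in the full sequence, as in a
    correctly numbered PDB. Returns (full, mask), where mask[i] is 1 for
    resolved positions and 0 for missing ones.
    """
    full = [fill] * target_len
    mask = [0] * target_len
    for num, value in zip(residue_numbers, per_residue_values):
        full[num - 1] = value
        mask[num - 1] = 1
    return full, mask


# Template missing residues 4-6 of a 10-residue target.
full, mask = place_in_full_sequence(
    [1, 2, 3, 7, 8, 9, 10], list("MKTAKQR"), target_len=10, fill="-"
)
print("".join(full))  # → MKT---AKQR
print(mask)           # → [1, 1, 1, 0, 0, 0, 1, 1, 1, 1]
```

The same scatter only works if the PDB numbering skips the unresolved residues. With contiguous numbering, the residues after the gap would land in the wrong slots.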

@sokrypton if all seems well could you integrate this update to the notebook into the ColabFold repo?

jproney closed this as completed Sep 18, 2023
