incorrect m3a format in alphafold #34

aai97 · 2020-02-28T08:52:55Z

When describing the deletion_probability feature for alphafold, you state that you use the a3m format from hhblits.
To quote hhblits' GitHub on the a3m format:

residues emitted by Match states of the HMM are in upper case, residues emitted by Insert states are in lower case and deletions are written -.

In the a3m format, the sequences of the MSA are not necessarily of equal length. Deletions are denoted by "-", whereas lowercase letters denote insertions and cause the disparities in sequence length:

A B C        Original sequence
A - E        Sequence where residue 2 was deleted, residue 3 was substituted
A d B C      Sequence where a residue d was inserted between residues 1 and 2.
             Note that now the residue B no longer aligns with that of the original sequence.

This means your description of the deletion_probability feature makes no sense: not only should we count "-" rather than lowercase letters if we are looking for deletions, but aligning the residues by column is also not meaningful in the raw a3m format, since the lengths don't match.

Assuming that the name deletion_probability is not a misnomer, one instead has to remove all lowercase letters from the a3m MSA and then count the number of "-" per column to obtain the probability of a deletion of a particular residue in the MSA.

Is my reasoning here correct, or am I missing something important?


huhlim · 2020-03-02T17:28:48Z

I agree with your opinion. The description was strange to me for the same reason.

Augustin-Zidek · 2020-03-04T13:28:42Z

Hi, thanks for the feedback. You are right, the description of the deletion_probability feature wasn't correct. I fixed it and added a code snippet showing how we compute the deletion_probability feature, to make it clearer.

On the insertion vs deletion comment: I agree that our naming is misleading -- we call them 'deleted' residues because they have to be deleted in order for the sequence to align to the query.
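
For readers of this thread, one possible reading of that convention can be sketched as follows. This is not the snippet added to the repository; the function name `insertion_fraction_per_column` is hypothetical, and the choice to flag a column when at least one lowercase residue directly precedes it is an assumption made for illustration:

```python
def insertion_fraction_per_column(a3m_rows):
    """Fraction of rows with at least one lowercase residue directly before each match column.

    Hypothetical helper. Lowercase residues are the ones that would have to be
    removed for the row to align to the query. Lowercase residues after the
    last match column are ignored in this sketch.
    """
    if not a3m_rows:
        raise ValueError("no rows given")
    flags_per_row = []
    for row in a3m_rows:
        flags = []
        pending = False  # True once a lowercase residue was seen since the last match column
        for ch in row:
            if ch.islower():
                pending = True
            else:
                # Uppercase letter or '-': this character occupies a match column.
                flags.append(pending)
                pending = False
        flags_per_row.append(flags)
    width = len(flags_per_row[0])
    if any(len(flags) != width for flags in flags_per_row):
        raise ValueError("rows differ in number of match columns")
    num_rows = len(flags_per_row)
    return [sum(flags[i] for flags in flags_per_row) / num_rows for i in range(width)]


rows = ["ABC", "AdeBC", "ABfC"]
fractions = insertion_fraction_per_column(rows)
print(fractions)
```

Under this reading the feature is still computed per match column, so the unequal row lengths of the raw a3m file are not a problem.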

aai97 · 2020-03-04T13:55:15Z

Thank you for the update.

diegolascasas closed this as completed Mar 4, 2020
