
Questions about evaluation metrics for sequence alignment #153

Closed
hughplay opened this issue Jan 11, 2024 · 4 comments

Comments

@hughplay

Hi,

Your work on sequence alignment is excellent and inspiring.

Recently, I tested DeepBLAST on MALIDUP and MALISAM and found that the results are indeed great. However, I am confused about how the F1 score in Table 2 is computed. I tried to reproduce the score with my own evaluation pipeline, as well as by computing the F1 from the tp, fp, and fn returned by the alignment_score function, but both results are far from the value given in the table. I think there must be a mistake in my evaluation code.

The code for evaluating one sample based on alignment_score looks like this:

# Imports are my assumption -- adjust the module paths to wherever
# load_model and alignment_score live in your deepblast install.
from deepblast.utils import load_model
from deepblast.score import alignment_score

EPS = 1e-8  # guard against division by zero

model = load_model("deepblast-v3.ckpt", "prot_t5_xl_uniref50").cuda()

true_alignment = ...  # ground-truth alignment for the pair
pred_alignment = model.align(primary_sequence, target_sequence)

# alignment_score returns per-pair counts; the first three are tp, fp, fn
scores = alignment_score(true_alignment, pred_alignment)
tp, fp, fn = scores[0], scores[1], scores[2]

precision = tp / (tp + fp + EPS)  # EPS added here too, to avoid ZeroDivisionError
recall = tp / (tp + fn + EPS)
f1 = 2 * precision * recall / (precision + recall + EPS)

Could you please provide guidance on the correct method for calculating the F1 score?

Thank you!

@mortonjt
Collaborator

Hi, it looks like you are using the same methods that I used for evaluation. The original notebooks can be found in the Zenodo record linked in the paper: https://doi.org/10.5281/zenodo.7731163

@hughplay
Author

hughplay commented Jan 18, 2024

Thank you very much! I have reproduced the results.

The reason I got wrong scores is that I first converted the alignment to another format; my conversion function was not well tested, and I obtained wrong alignment states for computing the scores. 😭

@hughplay
Author

hughplay commented Jan 19, 2024

I'm back again.

I find that the alignment score seems to be weird in some cases. From what I have observed, it happens when the alignment state string starts with "21", for example (MALIDUP, d1knca):

manual
SSITRSSVLDQEQLWGTLLASAAATRNPQVLADIGAEATDH-LSAAARHAALGAAAIMGMNNVFYRGRGFLE
:::::::::::::::::::::::::::::::::::::::::1::::::::::::::::::::::::::::::
MNIIANPGIPKANFELWSFAVSAINGCSHCLVAHEHTLRTVGVDREAIFEALKAAAIVSGVAQALATIEALS

deepblast
S-SITRSSVLDQEQLWGTLLASAAATRNPQVLADIGAEATDH-LSAAARHAALGAAAIM-GMNNVFYRGRGFLE
21::::::::::::::::::::::::::::::::::::::::1::::::::::::::::1:::::::::::2::
-MNIIANPGIPKANFELWSFAVSAINGCSHCLVAHEHTLRTVGVDREAIFEALKAAAIVSGVAQALATIEA-LS

true_edges:
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 10), (11, 11), (12, 12), (13, 13), (14, 14), (15, 15), (16, 16), (17, 17), (18, 18), (19, 19), (20, 20), (21, 21), (22, 22), (23, 23), (24, 24), (25, 25), (26, 26), (27, 27), (28, 28), (29, 29), (30, 30), (31, 31), (32, 32), (33, 33), (34, 34), (35, 35), (36, 36), (37, 37), (38, 38), (39, 39), (40, 40), (41, 40), (42, 41), (43, 42), (44, 43), (45, 44), (46, 45), (47, 46), (48, 47), (49, 48), (50, 49), (51, 50), (52, 51), (53, 52), (54, 53), (55, 54), (56, 55), (57, 56), (58, 57), (59, 58), (60, 59), (61, 60), (62, 61), (63, 62), (64, 63), (65, 64), (66, 65), (67, 66), (68, 67), (69, 68), (70, 69), (71, 70)]
pred_edges:
[(0, 0), (1, 0), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5), (7, 6), (8, 7), (9, 8), (10, 9), (11, 10), (12, 11), (13, 12), (14, 13), (15, 14), (16, 15), (17, 16), (18, 17), (19, 18), (20, 19), (21, 20), (22, 21), (23, 22), (24, 23), (25, 24), (26, 25), (27, 26), (28, 27), (29, 28), (30, 29), (31, 30), (32, 31), (33, 32), (34, 33), (35, 34), (36, 35), (37, 36), (38, 37), (39, 38), (40, 39), (41, 40), (42, 40), (43, 41), (44, 42), (45, 43), (46, 44), (47, 45), (48, 46), (49, 47), (50, 48), (51, 49), (52, 50), (53, 51), (54, 52), (55, 53), (56, 54), (57, 55), (58, 56), (59, 56), (60, 57), (61, 58), (62, 59), (63, 60), (64, 61), (65, 62), (66, 63), (67, 64), (68, 65), (69, 66), (70, 67), (70, 68), (71, 69), (72, 70)]

DeepBLAST predicts pretty well in this case, but the F1 score is 0. I am confused about the evaluation method. What are the edges? Why do we need to compute the edges first? And why is the F1 score 0 in this case?

@hughplay reopened this Jan 19, 2024
@mortonjt
Collaborator

Hi, the edges are the match coordinates between the two sequences.
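
For concreteness, here is a rough sketch (mine, not the deepblast implementation) of what "computing the edges first" looks like: decode the state string into (i, j) coordinate pairs, then score the overlap of the predicted and true edge sets. The helper names are hypothetical, the state conventions are inferred from the example above (':' = match, '2' = residue in the top sequence only, '1' = residue in the bottom sequence only), and as the printed lists show, the actual deepblast routine also emits entries for gap columns and may order the tuples differently.

# A rough sketch, not the deepblast implementation.
def states_to_edges(states):
    """Decode a state string into (i, j) match coordinates."""
    edges, i, j = [], 0, 0
    for s in states:
        if s == ':':    # match: both sequences consume a residue
            edges.append((i, j))
            i += 1
            j += 1
        elif s == '2':  # residue in the top sequence only
            i += 1
        elif s == '1':  # residue in the bottom sequence only
            j += 1
    return edges

def edge_f1(true_edges, pred_edges):
    """F1 over edge sets: an edge is a true positive only on an exact match."""
    true_set, pred_set = set(true_edges), set(pred_edges)
    tp = len(true_set & pred_set)
    fp = len(pred_set - true_set)
    fn = len(true_set - pred_set)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)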

Regarding the F1 score, if there is an off-by-one error, the F1 score can be zero even if the structural similarity is preserved. This is why F1 isn't a great metric (TM-score is more robust).
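
As a toy illustration of this failure mode, using the hypothetical edge_f1 sketch above: shifting every predicted edge by a single position empties the intersection with the true edge set, so tp = 0 and the F1 score collapses to zero even though the two alignments are nearly identical.

true_edges = [(i, i) for i in range(5)]      # a perfect diagonal alignment
pred_edges = [(i + 1, i) for i in range(5)]  # the same alignment shifted by one row
print(edge_f1(true_edges, pred_edges))       # 0.0 -- no edge matches exactly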

Regarding the edge alignments, indeed there are weird edge cases. This is partially due to the quirks surrounding indels -- the current gap-position-specific scoring setup isn't ideal. And we don't have a concept of affine gap scoring (it turns out to be highly non-trivial to set up for differentiable dynamic programming). See the DEDAL paper for a discussion of this.

Despite these setbacks, these edge cases don't seem to strongly affect the TM-score, since the superposition is still roughly the same.
