Alignment with match score not optimal #26

wongs2 · 2022-07-16T18:29:00Z

Using the latest v2.2, with match score, the alignment is not optimal preferring a gap and not a mismatch. The alignment result is "8D1I1M7I", expecting "7D1X1M7I"

int main(int argc,char* argv[]) {
  // Patter & Text
  char* pattern = "CCCCCCCCC";
  char* text    = "TCTTTTTTT";
  // Configure alignment attributes
  wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
  attributes.distance_metric = gap_affine;
  attributes.affine_penalties.match = -10;
  attributes.affine_penalties.mismatch = 4;
  attributes.affine_penalties.gap_opening = 6;
  attributes.affine_penalties.gap_extension = 2;

  attributes.alignment_form.span = alignment_endsfree;
  attributes.alignment_form.pattern_begin_free = strlen(pattern)-1;
  attributes.alignment_form.pattern_end_free = 0;
  attributes.alignment_form.text_begin_free = 0;
  attributes.alignment_form.text_end_free = strlen(text)-1;
  // Initialize Wavefront Aligner
  wavefront_aligner_t* const wf_aligner = wavefront_aligner_new(&attributes);
  // Align
  wavefront_align(wf_aligner,pattern,strlen(pattern),text,strlen(text));
  fprintf(stderr,"WFA-Alignment returns score %d\n",wf_aligner->cigar.score);
  // Display alignment
  fprintf(stderr,"  PATTERN  %s\n",pattern);
  fprintf(stderr,"  TEXT     %s\n",text);
  fprintf(stderr,"  SCORE (RE)COMPUTED %d\n",
      cigar_score_gap_affine(&wf_aligner->cigar,&attributes.affine_penalties));
  cigar_print_pretty(stderr,
      pattern,strlen(pattern),text,strlen(text),
      &wf_aligner->cigar,wf_aligner->mm_allocator);
  // Free
  wavefront_aligner_delete(wf_aligner);
}

The result:

WFA-Alignment returns score 64
  PATTERN  CCCCCCCCC
  TEXT     TCTTTTTTT
  SCORE (RE)COMPUTED -40
      ALIGNMENT	8D1I1M7I
      ALIGNMENT.COMPACT	8D1I7I
      PATTERN    CCCCCCCC-C-------
                          |
      TEXT       --------TCTTTTTTT

The text was updated successfully, but these errors were encountered:

smarco · 2022-07-17T10:15:10Z

Hi,

Thanks for the feedback. This case is really interesting.

I understand the problem and realize that @jeizenga's penalties-conversion (i.e., match-score) formulas apply to global alignment.

$s_c=\frac{1}{2}(l_c(M+N)-s_w)$

However, in the case of ends-free alignment, the query and text segments aligned (outside ends-free regions) can be smaller than $M$ and $N$ (let's say, $X$ and $Y$).

$s_c=\frac{1}{2}(l_c(X+Y)-s_w)$

So, the first solution found by the WFA (with adapted scores) doesn't have to be optimal. Nevertheless, I understand that we can bound the classic score ($s_c$), being $ef_q$ and $ef_t$ the maximum length of the query and text that can be "free".

$\frac{1}{2}(l_c(M-ef_t+N-ef_q)-s_w) \leq s_c \leq \frac{1}{2}(l_c(M+N)-s_w)$

Hence, I belive that performing $ef_q+ef_t$ extra steps, at most, the algorithm will find the optimal score.
Let me know what you think, @jeizenga and @wongs2, so I can patch it.

Cheers,

jeizenga · 2022-07-17T15:18:14Z

Some of the details of the implementation are a bit opaque to me, but assuming that you do have $M - ef_t \leq X \leq M$ and $N - ef_q \leq Y \leq N$ then I think you are right.

wongs2 · 2022-07-17T16:14:15Z

I have not dived deep to understand the implementation. Here are 2 more "not optimal" solution:

WFA-Alignment returns score 95
  PATTERN  CCCCCCCCC
  TEXT     CTCTTTTTTT
  SCORE (RE)COMPUTED -36
      ALIGNMENT	8D1M9I
      ALIGNMENT.COMPACT	8D9I
      PATTERN    CCCCCCCCC---------
                         |
      TEXT       --------CTCTTTTTTT

WFA-Alignment returns score 95
  PATTERN  CCCCCCCCC
  TEXT     CCCTTTTTTT
  SCORE (RE)COMPUTED -36
      ALIGNMENT	8D1M9I
      ALIGNMENT.COMPACT	8D9I
      PATTERN    CCCCCCCCC---------
                         |
      TEXT       --------CCCTTTTTTT

smarco · 2022-08-01T13:55:08Z

In the end, it was just an initialization problem.
I have pushed a fix into the development branch.

In principle, it should be fixed.
Try it and let me know.

smarco closed this as completed Oct 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Alignment with match score not optimal #26

Alignment with match score not optimal #26

wongs2 commented Jul 16, 2022 •

edited

Loading

smarco commented Jul 17, 2022

jeizenga commented Jul 17, 2022

wongs2 commented Jul 17, 2022

smarco commented Aug 1, 2022

Alignment with match score not optimal #26

Alignment with match score not optimal #26

Comments

wongs2 commented Jul 16, 2022 • edited Loading

smarco commented Jul 17, 2022

jeizenga commented Jul 17, 2022

wongs2 commented Jul 17, 2022

smarco commented Aug 1, 2022

wongs2 commented Jul 16, 2022 •

edited

Loading