Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alignment with match score not optimal #26

Closed
wongs2 opened this issue Jul 16, 2022 · 4 comments
Closed

Alignment with match score not optimal #26

wongs2 opened this issue Jul 16, 2022 · 4 comments

Comments

@wongs2
Copy link

wongs2 commented Jul 16, 2022

Using the latest v2.2, with match score, the alignment is not optimal preferring a gap and not a mismatch. The alignment result is "8D1I1M7I", expecting "7D1X1M7I"

int main(int argc,char* argv[]) {
  // Patter & Text
  char* pattern = "CCCCCCCCC";
  char* text    = "TCTTTTTTT";
  // Configure alignment attributes
  wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
  attributes.distance_metric = gap_affine;
  attributes.affine_penalties.match = -10;
  attributes.affine_penalties.mismatch = 4;
  attributes.affine_penalties.gap_opening = 6;
  attributes.affine_penalties.gap_extension = 2;

  attributes.alignment_form.span = alignment_endsfree;
  attributes.alignment_form.pattern_begin_free = strlen(pattern)-1;
  attributes.alignment_form.pattern_end_free = 0;
  attributes.alignment_form.text_begin_free = 0;
  attributes.alignment_form.text_end_free = strlen(text)-1;
  // Initialize Wavefront Aligner
  wavefront_aligner_t* const wf_aligner = wavefront_aligner_new(&attributes);
  // Align
  wavefront_align(wf_aligner,pattern,strlen(pattern),text,strlen(text));
  fprintf(stderr,"WFA-Alignment returns score %d\n",wf_aligner->cigar.score);
  // Display alignment
  fprintf(stderr,"  PATTERN  %s\n",pattern);
  fprintf(stderr,"  TEXT     %s\n",text);
  fprintf(stderr,"  SCORE (RE)COMPUTED %d\n",
      cigar_score_gap_affine(&wf_aligner->cigar,&attributes.affine_penalties));
  cigar_print_pretty(stderr,
      pattern,strlen(pattern),text,strlen(text),
      &wf_aligner->cigar,wf_aligner->mm_allocator);
  // Free
  wavefront_aligner_delete(wf_aligner);
}

The result:

WFA-Alignment returns score 64
  PATTERN  CCCCCCCCC
  TEXT     TCTTTTTTT
  SCORE (RE)COMPUTED -40
      ALIGNMENT	8D1I1M7I
      ALIGNMENT.COMPACT	8D1I7I
      PATTERN    CCCCCCCC-C-------
                          |
      TEXT       --------TCTTTTTTT
@smarco
Copy link
Owner

smarco commented Jul 17, 2022

Hi,

Thanks for the feedback. This case is really interesting.

I understand the problem and realize that @jeizenga's penalties-conversion (i.e., match-score) formulas apply to global alignment.

$s_c=\frac{1}{2}(l_c(M+N)-s_w)$

However, in the case of ends-free alignment, the query and text segments aligned (outside ends-free regions) can be smaller than $M$ and $N$ (let's say, $X$ and $Y$).

$s_c=\frac{1}{2}(l_c(X+Y)-s_w)$

So, the first solution found by the WFA (with adapted scores) doesn't have to be optimal. Nevertheless, I understand that we can bound the classic score ($s_c$), being $ef_q$ and $ef_t$ the maximum length of the query and text that can be "free".

$\frac{1}{2}(l_c(M-ef_t+N-ef_q)-s_w) \leq s_c \leq \frac{1}{2}(l_c(M+N)-s_w)$

Hence, I belive that performing $ef_q+ef_t$ extra steps, at most, the algorithm will find the optimal score.
Let me know what you think, @jeizenga and @wongs2, so I can patch it.

Cheers,

@jeizenga
Copy link
Collaborator

Some of the details of the implementation are a bit opaque to me, but assuming that you do have $M - ef_t \leq X \leq M$ and $N - ef_q \leq Y \leq N$ then I think you are right.

@wongs2
Copy link
Author

wongs2 commented Jul 17, 2022

I have not dived deep to understand the implementation. Here are 2 more "not optimal" solution:

WFA-Alignment returns score 95
  PATTERN  CCCCCCCCC
  TEXT     CTCTTTTTTT
  SCORE (RE)COMPUTED -36
      ALIGNMENT	8D1M9I
      ALIGNMENT.COMPACT	8D9I
      PATTERN    CCCCCCCCC---------
                         |
      TEXT       --------CTCTTTTTTT

WFA-Alignment returns score 95
  PATTERN  CCCCCCCCC
  TEXT     CCCTTTTTTT
  SCORE (RE)COMPUTED -36
      ALIGNMENT	8D1M9I
      ALIGNMENT.COMPACT	8D9I
      PATTERN    CCCCCCCCC---------
                         |
      TEXT       --------CCCTTTTTTT

@smarco
Copy link
Owner

smarco commented Aug 1, 2022

In the end, it was just an initialization problem.
I have pushed a fix into the development branch.

In principle, it should be fixed.
Try it and let me know.

@smarco smarco closed this as completed Oct 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants