Indels around the limit of the window #240

Closed
RolandFaure opened this issue May 7, 2024 · 1 comment

Comments

@RolandFaure

Hello,

Thanks for developing Racon, I use it all the time 😃

I am polishing amplicon sequences (they are short, usually under 1000 bp), and I noticed that Racon sometimes introduces small indels at position 500 of the consensus. I suspect this is linked to 500 being the default window size. Do you know where this issue might come from?

Attached is one very small example:
racon_pb.gz

I am using v1.4.20, and here are the command lines I used:

minimap2 -ax map-ont consensus_0.fa reads_0.fasta > mapped_0.sam
racon reads_0.fasta mapped_0.sam consensus_0.fa > polished_0.fasta

Around position 500 of polished_0.fasta, I obtain the sequence "TGTGCAGATTTTTGACAA", which does not appear in any of the reads; it should instead be "TGTGTGCGATTTTTGACAA".

Thanks in advance

@isovic
Owner

isovic commented Jun 24, 2024

Hi,
You are correct. Unfortunately, this is a side effect of windowing: it can happen when a window boundary falls within an indel region.

There are a couple of options you could try:

  1. Run Racon twice, either with the same (default) window size or with a slightly different one. If your input data is biased towards insertions or deletions, the consensus will change in length after the first round, so the window boundaries in the second round will differ from those in the first and you can simply keep the default window size. If your consensus barely changes in length, use a slightly larger or smaller window size for the second round.
  2. Since your target sequences are only ~1000 bp long, you can increase the window size to 1000 bp or more and produce the consensus as a single window. That way you avoid windowing issues altogether.
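The two options above could be scripted roughly as follows. This is a minimal sketch, not something taken from the Racon repository: it assumes minimap2 and racon are on the PATH, reuses the `-ax map-ont` preset from the original report, and sets the window length with racon's `-w` (`--window-length`) option. The function names (`polish_round`, `two_round_polish`, `max_seq_len`, `single_window_length`), the 450 bp second-round window and the 1000 bp floor are hypothetical, illustrative choices rather than maintainer recommendations.

```shell
#!/usr/bin/env bash
# Sketch only: assumes minimap2 and racon are installed and on PATH.
set -euo pipefail

# One polishing round: map reads to the draft with minimap2, then run racon
# with an explicit window length (racon's -w option; its default is 500).
polish_round() {
  local reads=$1 draft=$2 window=$3 out=$4
  local sam="${out%.fasta}.sam"
  minimap2 -ax map-ont "$draft" "$reads" > "$sam"
  racon -w "$window" "$reads" "$sam" "$draft" > "$out"
}

# Option 1: two rounds; the second round uses a different window length so
# its boundaries fall elsewhere. 450 is an arbitrary illustrative value.
two_round_polish() {
  local reads=$1 draft=$2 prefix=$3
  polish_round "$reads" "$draft" 500 "${prefix}_r1.fasta"
  polish_round "$reads" "${prefix}_r1.fasta" 450 "${prefix}_r2.fasta"
}

# Length of the longest record in a FASTA file (handles wrapped sequences).
max_seq_len() {
  awk '/^>/ { if (len > max) max = len; len = 0; next }
       { len += length($0) }
       END { if (len > max) max = len; print max + 0 }' "$1"
}

# Option 2: a window length covering the whole draft, never below 1000 bp,
# so each amplicon is polished as a single window.
single_window_length() {
  local len
  len=$(max_seq_len "$1")
  if (( len < 1000 )); then len=1000; fi
  echo "$len"
}
```

For the files in this report, option 1 would be `two_round_polish reads_0.fasta consensus_0.fa polished_0`, and option 2 would be `polish_round reads_0.fasta consensus_0.fa "$(single_window_length consensus_0.fa)" polished_0_single.fasta`. Because racon lays its windows out over the target (draft) sequences, a window at least as long as the longest draft leaves each amplicon in a single window.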

Hope this helps,
Best regards,
Ivan.

isovic closed this as completed Jul 4, 2024