Missing transposition alignments. #830

Closed
mnshgl0110 opened this issue Oct 18, 2021 · 6 comments

@mnshgl0110

Similar to #816, the current version of minimap2 has trouble aligning long transpositions between genomes.
For a 50 kb transposition, output from version 2.17-r974-dirty:

Chr1	1000001	287810	715519	+	Chr1	1000001	337810	765519	427709	427709	60	NM:i:0	ms:i:427709	AS:i:427709	nn:i:0	tp:A:P	cm:i:42912	s1:i:427687	s2:i:3642	de:f:0	rl:i:0	cg:Z:427709M
Chr1	1000001	0	287811	+	Chr1	1000001	0	287811	287811	287811	60	NM:i:0	ms:i:287811	AS:i:287811	nn:i:0	tp:A:P	cm:i:28718	s1:i:287795	s2:i:119	de:f:0	rl:i:0	cg:Z:287811M
Chr1	1000001	765518	1000001	+	Chr1	1000001	765518	1000001	234483	234483	60	NM:i:0	ms:i:234483	AS:i:234483	nn:i:0	tp:A:P	cm:i:23363	s1:i:234471	s2:i:619	de:f:0	rl:i:0	cg:Z:234483M
Chr1	1000001	715519	765519	+	Chr1	1000001	287810	337810	50000	50000	60	NM:i:0	ms:i:50000	AS:i:50000	nn:i:0	tp:A:P	cm:i:4969	s1:i:49984	s2:i:112	de:f:0	rl:i:0	cg:Z:50000M

whereas, output from version 2.22-r1110-dirty:

Chr1	1000001	0	1000001	+	Chr1	1000001	0	1000001	950001	1050001	60	NM:i:100000	ms:i:949927	AS:i:849949	nn:i:0	tp:A:P	cm:i:172697	s1:i:934761	s2:i:49990	de:f:0.0000	rl:i:256	cg:Z:287810M50000D427708M50000I234483M

Smaller transpositions are not identified by either version. Testing a 5 kb transposition, output from version 2.17-r974-dirty:

Chr1	1000001	182075	1000001	+	Chr1	1000001	187075	1000001	812926	817926	60	NM:i:5000	ms:i:797887	AS:i:807845	nn:i:0	tp:A:P	cm:i:81302	s1:i:812896	s2:i:4987	de:f:0.0000	rl:i:0	cg:Z:433718M5000I379208M
Chr1	1000001	0	182077	+	Chr1	1000001	0	182077	182077	182077	60	NM:i:0	ms:i:182077	AS:i:182077	nn:i:0	tp:A:P	cm:i:18154	s1:i:182061	s2:i:0	de:f:0	rl:i:0	cg:Z:182077M

whereas, output from version 2.22-r1110-dirty:

Chr1	1000001	0	1000001	+	Chr1	1000001	0	1000001	995001	1005001	60	NM:i:10000	ms:i:994849	AS:i:984839	nn:i:0	tp:A:P	cm:i:99456	s1:i:993425	s2:i:4987	de:f:0.0000	rl:i:0	cg:Z:182075M5000D433718M5000I379208M

It would be great if such regions could also be identified.
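
To make the 2.22 result easier to inspect, here is a minimal Python sketch that walks a `cg:Z` CIGAR string and lists the long insertions and deletions with their coordinates. The function names and the 1000 bp threshold are illustrative assumptions, not part of minimap2. A deletion and an insertion of equal length are only a hint that a segment may have moved; confirming it would require comparing the two sequences.

```python
import re

# PAF cg:Z strings from minimap2 use M/I/D (and =/X with --eqx); N is handled too.
CIGAR_OP = re.compile(r"(\d+)([MIDN=X])")

def long_indels(cigar, qstart=0, tstart=0, min_len=1000):
    """Return (op, length, query_pos, target_pos) for each I/D of at least min_len."""
    q, t = qstart, tstart
    found = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "ID" and n >= min_len:
            found.append((op, n, q, t))
        if op in "M=XI":   # operations that consume query bases
            q += n
        if op in "M=XDN":  # operations that consume target bases
            t += n
    return found

def equal_length_pairs(indels):
    """Pair every long deletion with every long insertion of identical length."""
    dels = [x for x in indels if x[0] == "D"]
    ins = [x for x in indels if x[0] == "I"]
    return [(d, i) for d in dels for i in ins if d[1] == i[1]]

# CIGAR reported by 2.22-r1110-dirty for the 50 kb test case.
cigar = "287810M50000D427708M50000I234483M"
indels = long_indels(cigar)
for d, i in equal_length_pairs(indels):
    print(f"{d[1]} bp missing at target {d[3]}, {i[1]} bp extra at query {i[2]}")
```

For the CIGAR above, the sketch reports a 50000 bp deletion starting at target position 287810 and a 50000 bp insertion starting at query position 715518, which is within one base of the separate 50 kb alignment reported by 2.17.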

Test files:
reference.txt
seq_up_trans5000.fa.txt
seq_up_trans50000.fa.txt

@lh3
Owner

lh3 commented Oct 18, 2021

These are aligned as long indels.

lh3 closed this as completed Oct 18, 2021
lh3 added the question label Oct 18, 2021
@mnshgl0110
Author

Hi Heng. I understand that these are aligned as long indels, but that loses the genomic information that the "long deletion" and the "long insertion" are actually the same sequence that has relocated. To capture this information, it would be helpful to have separate alignments for transpositions. Is it possible to make minimap2 output such regions as separate alignments?
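
As an illustration of why separate alignments are useful here, the following Python sketch takes the four alignment blocks reported by 2.17 for the 50 kb case and splits them into a collinear chain and the blocks that fall outside it. This is a simplified, hypothetical approach (a length-weighted increasing-subsequence search on forward-strand alignments only), not something minimap2 itself does; the `Aln` type and `split_collinear` function are names made up for this example.

```python
from typing import NamedTuple

class Aln(NamedTuple):
    qstart: int
    qend: int
    tstart: int
    tend: int

def split_collinear(alns):
    """Split forward-strand alignments into the heaviest collinear chain and the rest."""
    if not alns:
        return [], []
    alns = sorted(alns, key=lambda a: a.qstart)
    n = len(alns)
    # best[i]: largest total aligned query length of a chain ending at alignment i
    best = [a.qend - a.qstart for a in alns]
    prev = [-1] * n
    for i in range(n):
        for j in range(i):
            # j may precede i in a chain only if the target order agrees
            if alns[j].tstart < alns[i].tstart:
                w = best[j] + (alns[i].qend - alns[i].qstart)
                if w > best[i]:
                    best[i], prev[i] = w, j
    # Trace back from the heaviest chain end
    i = max(range(n), key=best.__getitem__)
    chain = set()
    while i != -1:
        chain.add(i)
        i = prev[i]
    collinear = [a for k, a in enumerate(alns) if k in chain]
    moved = [a for k, a in enumerate(alns) if k not in chain]
    return collinear, moved

# Query/target coordinates copied from the 2.17 output for the 50 kb test case.
blocks = [
    Aln(287810, 715519, 337810, 765519),
    Aln(0, 287811, 0, 287811),
    Aln(765518, 1000001, 765518, 1000001),
    Aln(715519, 765519, 287810, 337810),
]
collinear, moved = split_collinear(blocks)
for a in moved:
    print(f"query {a.qstart}-{a.qend} maps out of order to target {a.tstart}-{a.tend}")
```

With the four blocks above, the three large alignments form the collinear chain and the 50 kb block (query 715519-765519, target 287810-337810) is the one left over. With the single 2.22 alignment there is nothing to split, which is the information loss described in this comment.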

@lh3
Owner

lh3 commented Oct 19, 2021

You can use -r 1k,10k to control the gap length. The second parameter controls the maximum gap length. It is approximate, not precise, though.
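
A minimal sketch of how the `-r 1k,10k` setting could be passed from Python. Only `-r 1k,10k` comes from this comment; the `-x asm5` preset, the `-c` flag (which produces the `cg:Z` CIGAR tag seen in the outputs above), and the choice of `reference.txt` as the target are assumptions, and the helper names are hypothetical.

```python
import subprocess

def build_minimap2_cmd(target, query, gap="1k,10k", preset="asm5"):
    """Build a minimap2 command line with the gap setting suggested above."""
    # Assumption: -x asm5 and -c are guesses at the options used in the issue.
    return ["minimap2", "-c", "-x", preset, "-r", gap, target, query]

def run_to_paf(cmd, out_path):
    """Run the command and write its PAF output to out_path (requires minimap2 on PATH)."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)

# File names are the test files attached to the issue.
cmd = build_minimap2_cmd("reference.txt", "seq_up_trans50000.fa.txt")
print(" ".join(cmd))
# To execute: run_to_paf(cmd, "trans50000.paf")
```

Because the second `-r` value is described as approximate, the resulting alignments may still need to be checked for long indels after changing it.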

> the genomic information that the "long deletion" and the "long insertion" are actually the same sequence that has relocated.

I am not sure how your simulated example could arise biologically. Long duplications are usually tandem with messy boundaries and gene conversions. In my view, getting long INDELs is more often preferred. This is how the default is tuned.

@mnshgl0110
Author

One example of translocation was reported earlier (Figure 2B).
Transposon movement can result in intra-chromosomal movement of DNA. Also, as far as I recall, errors during meiosis can cause translocations. Indeed, it is not rare to find such structural rearrangements between natural strains of many species.

@lh3
Owner

lh3 commented Oct 20, 2021

I was saying your short insertion-deletion in close range is unlikely. A transposon-mediated event involves an insertion, not moving sequences. Human has L1-mediated XTR between sex chromosomes ~1Mya. That happened at much larger scale and is accompanied by many smaller SVs. It will not be contained in a long alignment. Translocations also occur at arm level, not like your example.

@mnshgl0110
Author

DNA transposons (class II transposons) use a cut-and-paste mechanism that moves DNA from one location to another. These transposons are reported to be only a few kbp long. I am not sure how the insertion target for these transposons is selected, but I assume that they can result in translocations within local regions.

Also, I think, non-human species have more genomic rearrangements. For example, in our data, we find a region with two large inversions and a translocation adjacent to each other between two strains (Col-0 and Ler) of A. thaliana.

image

So, in order to detect such genomic rearrangements, separate alignments would work better than alignments with large indels.
