Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arriba and STARfuison align #239

Closed
xiucz opened this issue May 15, 2024 · 7 comments
Closed

Arriba and STARfuison align #239

xiucz opened this issue May 15, 2024 · 7 comments

Comments

@xiucz
Copy link

xiucz commented May 15, 2024

Hi,
The picture below illustrates the differences between the alignment results of Arriba (v2.4.0, STAR2.7.8a) and Starfusion (v1.10.0, STAR2.7.8a). In Starfusion's BAM file, there are numerous softclip reads that support the TIMM23--LINC00843 fusion events (chr10:51606988--chr10:51732772). It appears that the parameter settings used by Arriba's STAR alignment discarded many of these reads.
Could you provide me with some suggestions?

arriba code(use the raw run_arriba.sh script)

arriba_v2.4.0//run_arriba.sh ~/arriba_v2.4.0//database//hg19_GENCODE37lift37/STAR_index_hg19_GENCODE37lift37/ ~/arriba_v2.4.0//database//hg19_GENCODE37lift37/GENCODE37lift37.gtf ~/arriba_v2.4.0//database//hg19_GENCODE37lift37/hg19.fa ~/arriba_v2.4.0//database//blacklist_hg19_hs37d5_GRCh37_v2.4.0.tsv ~/database//known_fusions_hg19_hs37d5_GRCh37_v2.4.0.tsv ~/arriba_v2.4.0//database//protein_domains_hg19_hs37d5_GRCh37_v2.4.0.gff3 6 ~/trim/A2_1.trim.fastq.gz ~/trim/A2_2.trim.fastq.gz

STARfusion code

~/STAR-Fusion-v1.10.0/STAR-Fusion --left_fq ~/trim/A2_1.trim.fastq.gz --right_fq ~/trim/A2_2.trim.fastq.gz --genome_lib_dir ~/hg19_GENCODE37lift37/STARFusion/GRCh37_gencode_v19_CTAT_lib_Mar012021.source/ctat_genome_lib_build_dir --CPU 6 --STAR_PATH ~/STAR-2.7.8a/bin/Linux_x86_64_static/STAR --examine_coding_effect --tmpdir  ~/tmp/ --output_dir STARfusion

image

Best,
xiucz

@suhrig
Copy link
Owner

suhrig commented May 19, 2024

Hi, could you kindly pick one read which is aligned by STAR-Fusion but not by Arriba and extract the paired-end reads from the Arriba BAM file using for example samtools view arriba_alignments.bam | grep -w READ_NAME? I'm curious if the reads are aligned at all. Moreover, if I have the read sequence, then I can try to figure out which parameters are responsible. Thanks!

@xiucz
Copy link
Author

xiucz commented May 20, 2024

Thanks for your reply. I grep one read here.
arriba

$ samtools view Aligned.sortedByCoord.out.bam|grep  V350121616L4C005R0660558367
V350121616L4C005R0660558367	99	chr10	51381825	255	22M2452N59M3295N69M	=	51387704	351037	GGTCCAAACCAGGAAATGTACAGATTTTGAATATGGTGACTAGGCAAGGGGCACTTTGGGCTAATACTCTAGGTTCTCTGGCGTTGCTCTATAGTGCATTTGGTGTCATCATTGAGAAAACACGAGGTGCAGAAGATGACCTTAACACAG	FFFFGGFF>FFF<FFFGFFBFDFFEFFFFFGG??FFFFGFFGFFFGFFDDFDFFFGFGFFFFGFGFFFGFCFFFGFG<GGFFEFFEFFGFAGFGFFFFGFGFFFGFFCFFDFFFFFFFFFFFFFEGFFFFGFFFFA=FF?FGGGFFFFGF	NH:i:1	HI:i:1	AS:i:295	nM:i:2	NM:i:2
V350121616L4C005R0660558367	147	chr10	51387704	255	60M345008N90M	=	51381825	-351037	AAGATGACCTTAACACAGTAGCAGCTGGAACCATGACAGGCATGTTGTATAAATGTACAGTGTCAGAGATGGCTTTGGATTCCCCGTTCTGTGTGCTGCTGTCTGGCTCCTGAACCCAGCTGTAGAGGTGTGTGTCAATCCCAACTGGTG	9@GGFGGFFBAFFFGFFEAFDEFFGGGGCFGCGGCFFFGFEFFFFFGGGCGGBGGEFEGFGG7FGFCGGDFFGDFEGFGGF<GFFFG>FFGGEEGGFGFFGEGFFFGFFGGGFFGFCGFGGGBGGGGGEFFGGGAFFFFFDGFFGGFEGF	NH:i:1	HI:i:1	AS:i:295	nM:i:2	NM:i:0

STARfusion

 $ samtools view Aligned.out.sort.bam |grep  V350121616L4C005R0660558367
V350121616L4C005R0660558367	355	chr10	51381825	3	22M2452N59M3295N69M	=	51387704	5939	GGTCCAAACCAGGAAATGTACAGATTTTGAATATGGTGACTAGGCAAGGGGCACTTTGGGCTAATACTCTAGGTTCTCTGGCGTTGCTCTATAGTGCATTTGGTGTCATCATTGAGAAAACACGAGGTGCAGAAGATGACCTTAACACAG	FFFFGGFF>FFF<FFFGFFBFDFFEFFFFFGG??FFFFGFFGFFFGFFDDFDFFFGFGFFFFGFGFFFGFCFFFGFG<GGFFEFFEFFGFAGFGFFFFGFGFFFGFFCFFDFFFFFFFFFFFFFEGFFFFGFFFFA=FF?FGGGFFFFGF	NH:i:2	HI:i:2	AS:i:207	nM:i:2	RG:Z:GRPundef	XS:A:+
V350121616L4C005R0660558367	403	chr10	51387704	3	60M90S	=	51381825	-5939	AAGATGACCTTAACACAGTAGCAGCTGGAACCATGACAGGCATGTTGTATAAATGTACAGTGTCAGAGATGGCTTTGGATTCCCCGTTCTGTGTGCTGCTGTCTGGCTCCTGAACCCAGCTGTAGAGGTGTGTGTCAATCCCAACTGGTG	9@GGFGGFFBAFFFGFFEAFDEFFGGGGCFGCGGCFFFGFEFFFFFGGGCGGBGGEFEGFGG7FGFCGGDFFGDFEGFGGF<GFFFG>FFGGEEGGFGFFGEGFFFGFFGGGFFGFCGFGGGBGGGGGEFFGGGAFFFFFDGFFGGFEGF	NH:i:2	HI:i:2	AS:i:207	nM:i:2	RG:Z:GRPundef	XS:A:+
V350121616L4C005R0660558367	163	chr10	51606988	3	90S60M	=	51607030	5938	CACCAGTTGGGATTGACACACACCTCTACAGCTGGGTTCAGGAGCCAGACAGCAGCACACAGAACGGGGAATCCAAAGCCATCTCTGACACTGTACATTTATACAACATGCCTGTCATGGTTCCAGCTGCTACTGTGTTAAGGTCATCTT	FGEFGGFFGDFFFFFAGGGFFEGGGGGBGGGFGCFGFFGGGFFGFFFGEGFFGFGGEEGGFF>GFFFG<FGGFGEFDGFFDGGCFGF7GGFGEFEGGBGGCGGGFFFFFEFGFFFCGGCGFCGGGGFFEDFAEFFGFFFABFFGGFGG@9	NH:i:2	HI:i:1	AS:i:207	nM:i:2	RG:Z:GRPundef	XS:A:-
V350121616L4C005R0660558367	83	chr10	51607030	3	69M3294N59M2452N22M	=	51606988	-5938	CTGTGTTAAGGTCATCTTCTGCACCTCGTGTTTTCTCAATGATGACACCAAATGCACTATAGAGCAACGCCAGAGAACCTAGAGTATTAGCCCAAAGTGCCCCTTGCCTAGTCACCATATTCAAAATCTGTACATTTCCTGGTTTGGACC	FGFFFFGGGF?FF=AFFFFGFFFFGEFFFFFFFFFFFFFDFFCFFGFFFGFGFFFFGFGAFGFFEFFEFFGG<GFGFFFCFGFFFGFGFFFFGFGFFFDFDDFFGFFFGFFGFFFF??GGFFFFFEFFDFBFFGFFF<FFF>FFGGFFFF	NH:i:2	HI:i:1	AS:i:207	nM:i:2	RG:Z:GRPundef	XS:A:-

And here is a zoom-in screenshot,
image
Best,
xiucz

@xiucz
Copy link
Author

xiucz commented May 20, 2024

Thanks for your reply, I grep one soft-clip read in STARfusion.
arriba

$samtools view Aligned.sortedByCoord.out.bam|grep  V350121616L4C005R0660558367
V350121616L4C005R0660558367	99	chr10	51381825	255	22M2452N59M3295N69M	=	51387704	351037	GGTCCAAACCAGGAAATGTACAGATTTTGAATATGGTGACTAGGCAAGGGGCACTTTGGGCTAATACTCTAGGTTCTCTGGCGTTGCTCTATAGTGCATTTGGTGTCATCATTGAGAAAACACGAGGTGCAGAAGATGACCTTAACACAG	FFFFGGFF>FFF<FFFGFFBFDFFEFFFFFGG??FFFFGFFGFFFGFFDDFDFFFGFGFFFFGFGFFFGFCFFFGFG<GGFFEFFEFFGFAGFGFFFFGFGFFFGFFCFFDFFFFFFFFFFFFFEGFFFFGFFFFA=FF?FGGGFFFFGF	NH:i:1	HI:i:1	AS:i:295	nM:i:2	NM:i:2
V350121616L4C005R0660558367	147	chr10	51387704	255	60M345008N90M	=	51381825	-351037	AAGATGACCTTAACACAGTAGCAGCTGGAACCATGACAGGCATGTTGTATAAATGTACAGTGTCAGAGATGGCTTTGGATTCCCCGTTCTGTGTGCTGCTGTCTGGCTCCTGAACCCAGCTGTAGAGGTGTGTGTCAATCCCAACTGGTG	9@GGFGGFFBAFFFGFFEAFDEFFGGGGCFGCGGCFFFGFEFFFFFGGGCGGBGGEFEGFGG7FGFCGGDFFGDFEGFGGF<GFFFG>FFGGEEGGFGFFGEGFFFGFFGGGFFGFCGFGGGBGGGGGEFFGGGAFFFFFDGFFGGFEGF	NH:i:1	HI:i:1	AS:i:295	nM:i:2	NM:i:0

STARfusion

$ samtools view Aligned.out.sort.bam |grep  V350121616L4C005R0660558367
V350121616L4C005R0660558367	355	chr10	51381825	3	22M2452N59M3295N69M	=	51387704	5939	GGTCCAAACCAGGAAATGTACAGATTTTGAATATGGTGACTAGGCAAGGGGCACTTTGGGCTAATACTCTAGGTTCTCTGGCGTTGCTCTATAGTGCATTTGGTGTCATCATTGAGAAAACACGAGGTGCAGAAGATGACCTTAACACAG	FFFFGGFF>FFF<FFFGFFBFDFFEFFFFFGG??FFFFGFFGFFFGFFDDFDFFFGFGFFFFGFGFFFGFCFFFGFG<GGFFEFFEFFGFAGFGFFFFGFGFFFGFFCFFDFFFFFFFFFFFFFEGFFFFGFFFFA=FF?FGGGFFFFGF	NH:i:2	HI:i:2	AS:i:207	nM:i:2	RG:Z:GRPundef	XS:A:+
V350121616L4C005R0660558367	403	chr10	51387704	3	60M90S	=	51381825	-5939	AAGATGACCTTAACACAGTAGCAGCTGGAACCATGACAGGCATGTTGTATAAATGTACAGTGTCAGAGATGGCTTTGGATTCCCCGTTCTGTGTGCTGCTGTCTGGCTCCTGAACCCAGCTGTAGAGGTGTGTGTCAATCCCAACTGGTG	9@GGFGGFFBAFFFGFFEAFDEFFGGGGCFGCGGCFFFGFEFFFFFGGGCGGBGGEFEGFGG7FGFCGGDFFGDFEGFGGF<GFFFG>FFGGEEGGFGFFGEGFFFGFFGGGFFGFCGFGGGBGGGGGEFFGGGAFFFFFDGFFGGFEGF	NH:i:2	HI:i:2	AS:i:207	nM:i:2	RG:Z:GRPundef	XS:A:+
V350121616L4C005R0660558367	163	chr10	51606988	3	90S60M	=	51607030	5938	CACCAGTTGGGATTGACACACACCTCTACAGCTGGGTTCAGGAGCCAGACAGCAGCACACAGAACGGGGAATCCAAAGCCATCTCTGACACTGTACATTTATACAACATGCCTGTCATGGTTCCAGCTGCTACTGTGTTAAGGTCATCTT	FGEFGGFFGDFFFFFAGGGFFEGGGGGBGGGFGCFGFFGGGFFGFFFGEGFFGFGGEEGGFF>GFFFG<FGGFGEFDGFFDGGCFGF7GGFGEFEGGBGGCGGGFFFFFEFGFFFCGGCGFCGGGGFFEDFAEFFGFFFABFFGGFGG@9	NH:i:2	HI:i:1	AS:i:207	nM:i:2	RG:Z:GRPundef	XS:A:-
V350121616L4C005R0660558367	83	chr10	51607030	3	69M3294N59M2452N22M	=	51606988	-5938	CTGTGTTAAGGTCATCTTCTGCACCTCGTGTTTTCTCAATGATGACACCAAATGCACTATAGAGCAACGCCAGAGAACCTAGAGTATTAGCCCAAAGTGCCCCTTGCCTAGTCACCATATTCAAAATCTGTACATTTCCTGGTTTGGACC	FGFFFFGGGF?FF=AFFFFGFFFFGEFFFFFFFFFFFFFDFFCFFGFFFGFGFFFFGFGAFGFFEFFEFFGG<GFGFFFCFGFFFGFGFFFFGFGFFFDFDDFFGFFFGFFGFFFF??GGFFFFFEFFDFBFFGFFF<FFF>FFGGFFFF	NH:i:2	HI:i:1	AS:i:207	nM:i:2	RG:Z:GRPundef	XS:A:-

Alignment stat:
image
And here is zoom-in screenshot for the select read:
image
Best,
xiucz

@suhrig
Copy link
Owner

suhrig commented May 20, 2024

Thanks for providing the details.

Without any further evidence, I would argue that Arriba is right here. It finds a better alignment than STAR-Fusion. The alignment proposed by Arriba is linear and even matches an annotated transcript of a gene perfectly (namely, NR_158654.1 of gene TIMM23B-AGAP6). The alignment proposed by STAR-Fusion has the same number of mismatches, but it is chimeric and therefore has a poorer alignment score (which is why Arriba discards it).

The reason why STAR-Fusion prefers the chimeric alignment over the linear one is simply because it uses a very low value for --alignIntronMax (if I remember correctly 100kb). This value is not biologically meaningful, since many splicing events are larger than this. STAR-Fusion has to use such a low value, because otherwise it would not be able to detect fusions from focal deletions. Arriba does not have this limitation and can use larger values for --alignIntronMax. For this reason, it finds the linear alignment, which is most likely an effect of normal splicing.

Do you have any reason to believe that this fusion is real? For example, do you have structural variants from whole-genome sequencing data to confirm this fusion? If not, then I find it likely that the fusion prediction made by STAR-Fusion is wrong in this case.

@xiucz
Copy link
Author

xiucz commented May 21, 2024

Thank you, and I will provide further information if I have any.

@xiucz
Copy link
Author

xiucz commented Jun 12, 2024

In the end, I could not further validate this sample. However, the same fusion event was detected in the WBC of the patient, leading me to believe that STARfusion's fusion event is erroneous. Thank you for your assistance.

@xiucz xiucz closed this as completed Jun 12, 2024
@suhrig
Copy link
Owner

suhrig commented Jun 12, 2024

Thanks for your feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants