Segmentation fault with kallisto quant #432

Closed
bbimber opened this issue Apr 4, 2024 · 4 comments

Comments

bbimber commented Apr 4, 2024

Hello,

I'm trying to run kallisto quant with the pseudobam option, and it is consistently running into a segfault error. Do you have suggestions on debugging that I could try? Below is the command and output:

kallisto quant -i <IDX> -o <OUTPUT> --gtf <GTF> --pseudobam -t 8 --verbose <10X_FQ1> <10X_FQ2>

and the end of the output:

[quant] done
[quant] processed 176,394,303 reads, 909,387 reads pseudoaligned
[quant] estimated average fragment length: 273.2
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 97 rounds
[  bam] writing pseudoalignments to BAM format .. /var/spool/slurmd/job39236197/slurm_script: line 118: 33371 Segmentation fault      (core dumped) $KALLISTO quant -i $RhSup_IDX -o $OUT --gtf $RhSup_GTF --pseudobam -t $THREADS --verbose $FQ1 $FQ2

thanks for any help.

Collaborator

Yenaled commented Apr 4, 2024

What version of kallisto are you running?

One possibility I can think of is that your GTF file is malformed (i.e. the transcript IDs in your GTF file don't match the transcript IDs in your index).
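One way to test the ID-mismatch hypothesis is to compare the `transcript_id` values in the GTF against the sequence names in the FASTA index. The following is a minimal sketch, not part of kallisto: `missing_transcript_ids` is a hypothetical helper, and it assumes the kallisto index was built from the same FASTA that the `.fai` describes and that the names in column 1 of the `.fai` are the names kallisto uses.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a kallisto command): print every transcript_id that
# appears in the GTF but is absent from column 1 of the FASTA .fai index.
# Assumption: the .fai names are the transcript names stored in the kallisto index.
missing_transcript_ids() {
    local gtf="$1" fai="$2" ids names
    ids=$(mktemp)
    names=$(mktemp)

    # Pull transcript_id values from 'transcript' rows; an optional '=' between
    # key and value is tolerated so the check works on non-standard files too.
    awk -F'\t' '$3 == "transcript" {
        if (match($9, /transcript_id[ =]*"[^"]*"/)) {
            s = substr($9, RSTART, RLENGTH)
            gsub(/^transcript_id[ =]*"|"$/, "", s)
            print s
        }
    }' "$gtf" | LC_ALL=C sort -u > "$ids"

    # Sequence names are the first tab-separated column of the .fai file.
    cut -f1 "$fai" | LC_ALL=C sort -u > "$names"

    # Lines present only in the GTF-derived list.
    LC_ALL=C comm -23 "$ids" "$names"
    rm -f "$ids" "$names"
}

# Usage (paths are placeholders):
#   missing_transcript_ids "$GTF" "$FAI_FILE"
```

Empty output means every transcript ID in the GTF has a matching name in the index source; any printed ID is a candidate mismatch.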

Author

bbimber commented Apr 4, 2024

The version is the latest, 0.50.1.

The GTF is a bit contrived, but I think it's valid. My reference space is about 30 coding sequences. I created the GTF by making one gene feature and one transcript feature for each reference, where each feature extends from 1 to the length of the reference. I'm not sure if this is helpful, but this is the actual code, which reads the FASTA FAI index line by line and writes out the GTF:

	# Read each line of the .fai index: column 1 is the sequence name, column 2 its length.
	while IFS=$'\t' read -r -a myArray
	do
		SEQ_NAME="${myArray[0]}"
		SEQ_LEN="${myArray[1]}"
		# One gene row and one transcript row per reference, each spanning 1..SEQ_LEN.
		echo -e "${SEQ_NAME}\tnimble\tgene\t1\t${SEQ_LEN}\t.\t+\t.\tgene_id \"${SEQ_NAME}\"; gene_name \"${SEQ_NAME}\"; gene_biotype = \"protein_coding\";" >> "$GTF"
		echo -e "${SEQ_NAME}\tnimble\ttranscript\t1\t${SEQ_LEN}\t.\t+\t.\tgene_id \"${SEQ_NAME}\"; gene_name \"${SEQ_NAME}\"; transcript_id = \"${SEQ_NAME}\"; gene_biotype = \"protein_coding\";" >> "$GTF"
	done < "$FAI_FILE"

the result looks something like this:

CCR7_NM_001032884	nimble	gene	1	1137	.	+	.	gene_id "CCR7_NM_001032884"; gene_name "CCR7_NM_001032884"; gene_biotype = "protein_coding";
CCR7_NM_001032884	nimble	transcript	1	1137	.	+	.	gene_id "CCR7_NM_001032884"; gene_name "CCR7_NM_001032884"; transcript_id = "CCR7_NM_001032884"; gene_biotype = "protein_coding";
CD3D_XM_015115817	nimble	gene	1	2264	.	+	.	gene_id "CD3D_XM_015115817"; gene_name "CD3D_XM_015115817"; gene_biotype = "protein_coding";
CD3D_XM_015115817	nimble	transcript	1	2264	.	+	.	gene_id "CD3D_XM_015115817"; gene_name "CD3D_XM_015115817"; transcript_id = "CD3D_XM_015115817"; gene_biotype = "protein_coding";
CD3D_XM_015115818	nimble	gene	1	2249	.	+	.	gene_id "CD3D_XM_015115818"; gene_name "CD3D_XM_015115818"; gene_biotype = "protein_coding";
CD3D_XM_015115818	nimble	transcript	1	2249	.	+	.	gene_id "CD3D_XM_015115818"; gene_name "CD3D_XM_015115818"; transcript_id = "CD3D_XM_015115818"; gene_biotype = "protein_coding";
CD3E_XM_015115816	nimble	gene	1	1549	.	+	.	gene_id "CD3E_XM_015115816"; gene_name "CD3E_XM_015115816"; gene_biotype = "protein_coding";
CD3E_XM_015115816	nimble	transcript	1	1549	.	+	.	gene_id "CD3E_XM_015115816"; gene_name "CD3E_XM_015115816"; transcript_id = "CD3E_XM_015115816"; gene_biotype = "protein_coding";
CD3E_XM_028834033	nimble	gene	1	1258	.	+	.	gene_id "CD3E_XM_028834033"; gene_name "CD3E_XM_028834033"; gene_biotype = "protein_coding";
CD3E_XM_028834033	nimble	transcript	1	1258	.	+	.	gene_id "CD3E_XM_028834033"; gene_name "CD3E_XM_028834033"; transcript_id = "CD3E_XM_028834033"; gene_biotype = "protein_coding";
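
For comparison, the GTF convention writes each attribute as `key "value";` with no `=` between key and value, whereas the rows above contain `transcript_id = "..."` and `gene_biotype = "..."`. Whether kallisto's GTF parser tolerates the `=` form is not established in this thread, so the following is only a sketch of how the same rows could be emitted in the conventional form. `fai_to_gtf` is a hypothetical name, and the `nimble` source label is copied from the example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write one gene row and one transcript row per .fai entry,
# using the conventional GTF attribute form  key "value";  (no '=').
fai_to_gtf() {
    # $1: path to a FASTA .fai index (name in column 1, length in column 2)
    awk -F'\t' -v src="nimble" 'BEGIN { OFS = "\t" } {
        gene_attrs = sprintf("gene_id \"%s\"; gene_name \"%s\"; gene_biotype \"protein_coding\";", $1, $1)
        tx_attrs = sprintf("gene_id \"%s\"; transcript_id \"%s\"; gene_name \"%s\"; gene_biotype \"protein_coding\";", $1, $1, $1)
        # Both features span the whole reference: 1 .. sequence length.
        print $1, src, "gene", 1, $2, ".", "+", ".", gene_attrs
        print $1, src, "transcript", 1, $2, ".", "+", ".", tx_attrs
    }' "$1"
}

# Usage (paths are placeholders):
#   fai_to_gtf "$FAI_FILE" > "$GTF"
```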

Collaborator

Yenaled commented Apr 4, 2024

OK, the latest version does not support BAM files. The last version to support BAM files is version 0.48.0.
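
A pipeline script could guard against this by checking the installed version before passing `--pseudobam`. The sketch below takes the 0.48.0 cutoff from the statement above rather than from independent verification. `version_le` and `supports_pseudobam` are hypothetical helper names, and `sort -V` (version sort) is assumed to be available, as it is in GNU coreutils.

```bash
#!/usr/bin/env bash
# Succeeds (exit 0) if dotted version $1 is less than or equal to version $2.
# Relies on 'sort -V', which orders version strings numerically by component.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Succeeds if the given kallisto version is at or below 0.48.0, the last
# version said in this thread to support BAM output.
supports_pseudobam() {
    version_le "$1" "0.48.0"
}

# Usage, assuming 'kallisto version' prints a line that ends in the version number:
#   ver=$(kallisto version | awk '{print $NF}')
#   supports_pseudobam "$ver" || echo "kallisto $ver: --pseudobam is not expected to work" >&2
```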

Author

bbimber commented Apr 4, 2024

> OK, the latest version does not support BAM files. The last version to support BAM files is version 0.48.0.

Got it. I'm running 0.48.0 to try it.
