unmapped reads in fastq/fasta format and pair-end data #32

YiweiNiu · 2018-03-23T06:26:14Z

Hello,

I want to try imrep with my RNA-seq data. I've aligned the FASTQ files to the genome with STAR and saved the unmapped reads in FASTQ format (using --outReadsUnmapped Fastx). As you said, STAR produces partially mapped reads. In that case, can I feed imrep the BAM file together with the unmapped FASTQ?

One more question: it seems imrep accepts only a single FASTQ file as input when the --digGold and -a options are used. Should I just cat the two FASTQ files together? Or does imrep only work with single-end data in that case?

Thanks!

Yiwei Niu

smangul1 · 2018-03-23T19:01:17Z

Hi Yiwei,

You are right that STAR produces partially mapped reads, which are potentially T-cell or B-cell receptor reads. I would recommend saving a BAM file with both mapped and unmapped reads. You can consider using our script for STAR with tuned parameters, which produces a BAM file with both mapped and unmapped reads: https://github.com/smangul1/recycle.RNA.seq/blob/master/benchmark_RNASeq_aligners/code/run.STAR.tuned.sh

Please let me know if it works. Serghei

YiweiNiu · 2018-03-24T01:41:22Z

Hi, Serghei. Thank you for your quick reply!

Sorry I didn't make myself clear.

I've already done the alignment using STAR and saved unmapped reads to FASTQ files. I don't want to do the alignment job again now, because I have many samples.

To my understanding, ImRep now is designed to handle two cases:

1. When you have saved mapped and unmapped reads in one BAM file, ImRep can accept that one BAM as input.
2. When you forgot to save unmapped reads, ImRep can accept a BAM file with mapped reads and all raw FASTQ files as input.

The first case doesn't suit me, so I have to follow the second.

And my questions are:

1. Can I run ImRep with the BAM generated by STAR? I didn't save the unmapped reads in the BAM, but I saved them in separate FASTQ files.
2. It seems ImRep accepts only a single FASTQ as input. Should I cat the two FASTQ files of the paired-end data? Or does it only work with single-end data?

Thank you for your help!

Bests,
Yiwei Niu

smangul1 · 2018-03-24T02:07:32Z

Hi Yiwei,

Please merge the paired-end (PE) files into one file. Also, to use --digGold you need to provide the original reads, not the unmapped reads. Please let me know how it goes. If this doesn't work for you, we can implement an option to supply a BAM with mapped reads and a FASTQ with unmapped reads (this is on our TODO list anyway). Thanks, Serghei
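
For anyone following along, merging the two mate files can be sketched in Python as below. This is only an illustration and not part of ImReP: the function names and the file names in the usage comment are hypothetical, and it assumes that "merge" means plain concatenation (the same result as cat). Whether read names need to be distinguished between mates is not stated in this thread.

```python
import gzip
from pathlib import Path


def open_maybe_gzip(path, mode):
    """Open a plain or gzip-compressed file in binary mode, chosen by suffix."""
    path = Path(path)
    return gzip.open(path, mode) if path.suffix == ".gz" else open(path, mode)


def merge_fastq(mate_paths, merged_path):
    """Concatenate FASTQ files into merged_path and return the record count.

    Assumes standard 4-line FASTQ records. A missing trailing newline in an
    input file is repaired so records from the next file do not run together.
    """
    n_lines = 0
    with open_maybe_gzip(merged_path, "wb") as out:
        for mate in mate_paths:
            with open_maybe_gzip(mate, "rb") as src:
                for line in src:
                    out.write(line if line.endswith(b"\n") else line + b"\n")
                    n_lines += 1
    return n_lines // 4


# Hypothetical usage (file names are placeholders):
# merge_fastq(["sample_R1.fastq", "sample_R2.fastq"], "sample_merged.fastq")
```

The merged file is then the single FASTQ passed to ImReP. Per the reply above, for --digGold the inputs should be the original read files rather than the unmapped ones.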

YiweiNiu · 2018-03-24T02:12:56Z

OK! Thank you for your help!

Also looking forward to the new version of ImRep.

BTW, the link https://github.com/smangul1/Profiling-adaptive-immune-repertoires-across-multiple-humantissues-by-RNA-Sequencing.ImRePconsistentlyoutperformedexistingmethods in your paper is broken.
