Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

miRNA annotation with seqcluster from BAM file #27

Closed
svm-zhang opened this issue Dec 28, 2017 · 12 comments
Closed

miRNA annotation with seqcluster from BAM file #27

svm-zhang opened this issue Dec 28, 2017 · 12 comments

Comments

@svm-zhang
Copy link

Hello @lpantano,

I am a bit new to miRNA annotation, and I was trying to run seqbuster subcommand seqbuster on my BAM that was aligned by STAR. My command line was pretty standard following the doc:

/bcbio_anaconda_bin/seqcluster seqbuster --hairpin [hairpin.hsa.fa] --mirna [mirna.str] -sps hsa -o test [MY_BAM]

The error I got is "global name os is not defined" in the function _sam_to_bam.py within the init.py file .

So I went into that file and found you imported os.path as op. Simply changing this part didn't fix the problem. I further looked into this function, and the way I understand it is that if the input is with a "sam" suffix it will convert it to a "bam". But why in the if statement, it evaluates whether it ends with "bam"? I updated that to "sam", and this part passed. However, there was a new error from pysam.sort(). Please see the attached file. I feel it might be caused by pysam version?
error.txt

I also came across another error when I did not specify --hairpin and --mirna options, the program tried to download the two files, but ended with complaining about "gzip !$.gz". This can be avoided by downloading hairpin.fa and miRNA.str myself. I think you might want to know.

So how can I fix the first problem? I also saw the a GTF option, do I need to prepare that file? If so, any suggestion on how should I prepare it? Thanks very much in advance for your help.

happy holidays!

Simo

@lpantano
Copy link
Owner

lpantano commented Dec 30, 2017 via email

@svm-zhang
Copy link
Author

Hello @lpantano,

Thanks very much for the quick response.

  1. Just to confirm how I should install mirtop with my current bcbio. I first git clone the dev branch mirtop, and then put it in the bcbio installation dir, and install from there? Could you please give more specific instructions?

  2. miRBase only provides a GFF3 file, should I use this as value for the GTF option?

  3. As for my downstream analyses, I'd like to quantify miRNA expression and do differential expression across my samples.

Your tips would be very helpful. Thanks in advance. Happy holiday!

cheers,
Simo

@lpantano
Copy link
Owner

lpantano commented Dec 30, 2017 via email

@svm-zhang
Copy link
Author

Hello @lpantano,

Thanks very much for your follow up on my problems.

For the last several days, I have been using the step-by-step pipeline as you mentioned at http://seqcluster.readthedocs.io/mirna_annotation.html#processing-of-reads. Let me give you a sense of the sample I am using. It is a sample contains only one miRNA (aly-MIR163). I have 8,356,240 reads after trimming adapter and polyA, and I aligned them using miraligner. Below is the result:

Number of reads to be mapped: 8356240
Searching in precursors
Tue Jan 02 11:40:05 PST 2018
Num reads annotated: 13488

The command line I used is:

java -jar miraligner.jar -db [path to alyrata DB] -sub 1 -trim 3 -add 3 -s aly -o [out prefix] -i [fastq file]

As you can see, there are lots of reads that did not find miRNA 163 hits, if I understand correctly here. If I simply grep the miRNA 163 sequence GAAGAGGACUUGGAACUCGAUC in the same fastq file and allow for 1 mismatch, there are about 7.9 million reads have it (https://github.com/Wikinaut/agrep).

So my first question is how can I recover those reads unmapped by miraligner? And now I have the [prefix].mirna file. What should I do next to get the count file?

Last, I am not quite familiar with bcbio_nextgen. Could you please give me a head start on how to run it? I found its small RNA pipeline, http://bcbio-nextgen.readthedocs.io/en/latest/contents/pipelines.html#smallrna-seq. How should I prepare the config file? A example config file would be really helpful.

Apologize that I have so many questions. Thanks again for your help.

cheers,
Simo

@lpantano
Copy link
Owner

lpantano commented Jan 4, 2018 via email

@svm-zhang
Copy link
Author

Hello @lpantano,

I used cutadapt to do adapter-trimming:

--overlap 10 --minimum-length 16 --nextseq-trim 20 --trim-n --max-n 10 -a [3' ADAPTER] -a A{8}N{30} -g [5' ADAPTER] --info-file [info-file] -o [output fastq] [input fastq]

The number I gave you was from this non-collapsed fastq.

I did collapse the adapter-trimmed reads:

seqcluster collapse -f [adapter-trimmed fq] -o [collapsed fq]

And run miraligner on the collapsed fastq using miraligner with -freq option, and I got similar results:

Number of reads to be mapped: 34922
Searching in precursors
Num reads annotated: 336

If you have some insights what are reasons here, please let me know.

At the same time, I will run the bcbio pipeline. Thanks very much for all the links. Very helpful!

cheers,
Simo

@svm-zhang
Copy link
Author

Hello @lpantano,

How can I prepare my A. lyrata genome, transcriptome, and miRBase data? I found this tutorial very confusing. I was trying bcbio_setup_genome.py and it complains sam_fa_indices.loc file. What format that file should be? I guess I also need bwa, bowtie2, and star loc files, if I want to build their indices. If this is too much trouble to set up, I think I would stick to the step-by-step version.

Thanks for any help.

Simo

@lpantano
Copy link
Owner

lpantano commented Jan 6, 2018 via email

@lpantano
Copy link
Owner

lpantano commented Jan 6, 2018 via email

@svm-zhang
Copy link
Author

Hello @lpantano,

Thanks again.

It looks like those collapsed reads who have exact match with the sequence of miRNA 163 have residual nucleotides at either or both ends. But adapter trimming can never be perfect, and in the real world this is an usual case, no? Does miraligner has some functionality like soft-clipping? Also, is it possible that miraligner could allow more than 1 mismatches, although it could cause false alignments?

Simo

@lpantano
Copy link
Owner

lpantano commented Jan 7, 2018 via email

@lpantano
Copy link
Owner

lpantano commented Jan 9, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants