Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trouble with uniquely mapped reads #43

Closed
nafeesatk opened this issue Feb 23, 2017 · 3 comments
Closed

Trouble with uniquely mapped reads #43

nafeesatk opened this issue Feb 23, 2017 · 3 comments

Comments

@nafeesatk
Copy link

I am trying to just learn how to use the tool and am having issues with the first step (species taxonomy) when running samples downloaded from NCBI. I ran the test script and went through the tutorial without issue. I downloaded one of the sequences from the Hadza microbiome study and tried to run MIDAS with that. At the first step I did not get any uniquely mapped reads and therefore can't seem to get the species taxonomy and abundance for the sample. I've run in to this with other sequences I've tried as well. Here is the result I got:

sbrgxps@sbrgxps-XPS-8910:~/MIDAS$ run_midas.py species SRR1927149 -1 ../rampelliseqs/SRR1927149.fq

MIDAS: Metagenomic Intra-species Diversity Analysis System
version 1.2.1; github.com/snayfach/MIDAS
Copyright (C) 2015-2016 Stephen Nayfach
Freely distributed under the GNU General Public License (GPLv3)

===========Parameters===========
Command: /home/sbrgxps/MIDAS/scripts/run_midas.py species SRR1927149 -1 ../rampelliseqs/SRR1927149.fq
Script: run_midas.py species
Database: /home/sbrgxps/MIDAS/midas_db_v1.2
Output directory: SRR1927149
Input reads (1st mate): ../rampelliseqs/SRR1927149.fq
Input reads (2nd mate): None
Remove temporary files: False
Word size for database search: 28
Minimum mapping alignment coverage: 0.75
Number of reads to use from input: use all
Number of threads for database search: 1

Aligning reads to marker-genes database
17.92 minutes
0.73 Gb maximum memory

Classifying reads
total alignments: 25486
uniquely mapped reads: 0
ambiguously mapped reads: 0
0.0 minutes
0.74 Gb maximum memory

Estimating species abundance
total marker-gene coverage: 0.0
0.0 minutes
0.74 Gb maximum memory

Did I install something incorrectly or is there an issue with how I am running the scripts? I am new to using metagenomics analysis tools so I wasn't sure where to start with troubleshooting.

Thank you!

@snayfach
Copy link
Owner

Hi nafeesatk,

Everything you've done looks correct. However, I've run that same sample through MIDAS and get several hundred uniquely reads mapped to the marker genes.

I think the easiest solution would be for me to recreate the issue on my system using your FASTQ file. If you send me your email address, I can send a dropbox link where you can upload your FASTQ.

Thanks,
Stephen

@nafeesatk
Copy link
Author

Hi Stephen,

Thank you so much for your help! My email is ntkhan@eng.ucsd.edu.

Best,

Nafeesa

@snayfach
Copy link
Owner

Hi Nafeesa,

I see the issue now. It appears you have merged read1 and read2 from SRR1927149_1.fastq and SRR1927149_2.fastq.

For most Illumina libraries read1 and read2 occur about 350 bp apart. So you won't get an alignment that covers the entire read, because in reality, there is a big gap between them. MIDAS sees that the alignments only cover a portion of the read and is therefore throwing them out.

I would strongly recommend running MIDAS without merging your reads. But if this is not possible, you can try decreasing the value for --aln_cov. The default if 0.7, but you can try dropping it to 0.35.

Best,
Stephen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants