Metatranscriptomics #200
Comments
Not really, we routinely use megahit due to its lower memory consumption, but we've gotten good results with SPAdes too. Personally, I don't have a formed opinion on which of the two might be better (it seems that the rnaSPAdes mode can be interesting in your case but I have no experience using it).
Maybe you can. The |
Thank you for your quick response! I've tried the option you proposed, but the 01.run_assembly_merged.pl script specifies that the --meta option is always used (as shown in the line below, line 128), so there is a conflict.
This leads to the following error: == Error == you cannot simultaneously use more than one mode out of Metagenomic, Large genome, Illumina TruSeq, RNA-Seq, Plasmid, and Single-cell! I assume this can be solved by manually editing the file to use --rna instead of --meta, but I don't have the required admin permissions for now. |
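For readers attempting the manual adjustment described above, here is a minimal sketch, assuming the assembler call in the script contains the literal flag `--meta` as a separate token. The function name `swap_spades_mode` and the example command string are hypothetical; the actual content of line 128 is not reproduced in this thread, so check the script itself before changing anything.

```python
import re


def swap_spades_mode(script_text: str, old: str = "--meta", new: str = "--rna") -> str:
    """Replace a standalone SPAdes mode flag in the text of a script.

    Only whole-token matches are replaced, so a longer option that merely
    starts with the same characters is left untouched.
    """
    pattern = r"(?<![\w-])" + re.escape(old) + r"(?![\w-])"
    return re.sub(pattern, new, script_text)


# Hypothetical command line, for illustration only (not the real line 128).
example = "spades.py --meta -1 reads_1.fastq -2 reads_2.fastq -o outdir"
print(swap_spades_mode(example))
```

Working on a copy of the script text rather than editing in place makes it easy to diff the result against the original before replacing it.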
I have managed to adjust SqueezeMeta.pl and 01.run_assembly_merged.pl so that I now have a functioning pipeline that uses rnaSPAdes. If you are interested, I asked about the specifics of the different SPAdes modes on their forum. |
Thanks for the heads up! Let us know if this works for you |
Hi all, since it's been a long time and I see multiple topics pointing to this one, here is an update on the results so far. Manually changing --meta to --rna works, provided there is enough RAM to run the more demanding SPAdes. Keep in mind that the rnaSPAdes algorithm has been changed in the newer versions of SPAdes, which makes it more memory-efficient (see the discussion on the SPAdes forum). Using the newer version was necessary in my case, since otherwise some samples could not be assembled even individually because of memory issues. rnaSPAdes seems to give a coverage percentage quite similar to that of normal SPAdes; I am not sure yet what its advantage over normal SPAdes would be, although it is the option recommended by the developers themselves. That said, it was impossible to merge too many samples at once: even in seqmerged mode, I estimated that the merging step alone would take 3 months. I therefore chose to split my samples into three groups and do three 12-sample runs. One could argue that the (as yet unknown) advantage that (rna)SPAdes might give in coverage does not outweigh the better use of resources and time by MEGAHIT, and the more correct statistics one can derive from it. That's it for now; I will let you know of any further updates. |
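The split into three 12-sample runs mentioned above can be sketched as follows. This is only an illustration: the sample names are hypothetical, the helper `split_into_groups` is not part of SqueezeMeta, and in practice the grouping would more likely follow the experimental design than simple list order.

```python
def split_into_groups(samples, n_groups):
    """Split a list of sample names into n_groups contiguous, near-equal groups."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    base, extra = divmod(len(samples), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        # The first `extra` groups take one additional sample each.
        size = base + (1 if i < extra else 0)
        groups.append(samples[start:start + size])
        start += size
    return groups


# Hypothetical sample names: 36 samples split into three runs of 12.
samples = [f"S{i:02d}" for i in range(1, 37)]
for run_id, group in enumerate(split_into_groups(samples, 3), start=1):
    print(run_id, len(group), group[0], group[-1])
```

Each group would then get its own samples file and its own project.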
Thank you @seppedm! We will include the new version of SPAdes in SqueezeMeta by default. Best, |
Hi @jtamames do you know when you are likely to release an updated squeezemeta with the new version of Spades? |
Hi, it's definitely on the cards. I'd have to find some time to integrate and test it, which won't happen for at least the next few weeks. However, I'd say that if you edit the |
I wouldn't need to make any alterations to the new spades.py? |
I don't think so. Spades has only minor alterations so it can find the library location within the SqueezeMeta directory structure. A different version should find its own libraries just fine. |
Hi @fpusan @jtamames. I ran my assembly using spades with the --rna tag. As you suggested I changed the path to the latest version and the assembly seems to have worked, however the run failed transitioning to step 2. I've attached the syslog and spadeslog file below: When I try to restart at step 2 I get this error:
|
Hello
Looks like the contigs file is missing. Another user has the same error, I will check it asap. |
Hi, this is the output:
|
Also if it helps, this is the contents of the data/spades directory:
The results/ directory is empty |
Ok, I get it now... When assembling RNA, SPAdes names the resulting assembly as Just copy the
and restart. That should do it. Best, |
Thanks @jtamames, I'll give it a go. Also, when would you use the --singletons flag when running SqueezeMeta? I'm not sure I understand correctly: does that essentially include unmapped reads in the annotation steps? |
Hello again. I think it is better if, instead of restarting, you create a new project and provide this transcript.fasta assembly as an external assembly using the -extassembly option. The reason is that step 01 also generates an auxiliary file, 01.project.lon, containing the lengths of the contigs. If you just copy the file and restart, that file will be missing and the run will crash later on. Best, |
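To illustrate the kind of information the auxiliary file holds (the length of each contig), here is a minimal sketch that computes contig lengths from FASTA lines. The exact layout of the .lon file is not stated in this thread, so the tab-separated output below is an assumption, and this is not a substitute for letting the pipeline generate the file via -extassembly. The function name `contig_lengths` and the contig names are hypothetical.

```python
def contig_lengths(fasta_lines):
    """Yield (contig_id, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the first whitespace-delimited word of the header as the ID.
            header = line[1:].split()
            name = header[0] if header else ""
            length = 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


# Tiny in-memory example with hypothetical contig names.
example = [">contig_1 extra info", "ACGTAC", "GT", ">contig_2", "AAAA"]
for contig_id, n in contig_lengths(example):
    # Assumed layout: contig ID, tab, length.
    print(f"{contig_id}\t{n}")
```

Because the function accepts any iterable of lines, an open file handle can be passed directly in place of the in-memory list.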
Regarding --singletons, I would use it when: 1) You want to be sure not to lose anything in the assembly, or 2) When the mapping percentage is too low and most reads did not make it into the assembly. But this will come at the cost of having myriads of short contigs that will probably burden your analysis |
OK I will try the external assembly option. Can I include the singletons flag when using an external assembly? |
And sorry one more question. What is the advantage/disadvantage of running an assembly with --singletons vs running squeezemeta in the reads_only mode? |
Singletons flag: yes |
Thanks for that. I tried re running as you suggested with the external assembly and received this error:
|
I tried starting another run using megahit with singletons but I am getting the same error:
|
Hello Silly mistake, I am sorry. Change the following lines in 01.remap.pl Line 96: And after line 28: That should do it. |
Hi @jtamames I've made the changes and squeezemeta has progressed further, I'll update how the run goes with the external RNA assembly and singletons flag |
Hi @jtamames, this run seemed to stop at step 2 with the following error:
|
This points to a bad installation of the databases. Did you move them to a different folder? |
Yep I figured as much. I've decided just to do a clean install of the latest squeezemeta version and databases |
If that doesn't work or you need further advice, please open a new issue. We must be spamming poor @seppedm with alert messages not related to his original issue. |
Here is a final update, for future reference. Judging by other posts, a very large number of unassigned reads seems to be very common when doing transcriptomics. Below are two graphs: the first shows the results of my run (rnaSPAdes), the second those of a sqm_reads.pl run (compare T12 and T117). Indeed, more reads are assigned taxonomically by sqm_reads.pl, but the large majority is still unassigned. It thus seems that the lack of assignment in (soil) metatranscriptomics is mainly a database issue, not something related to the assemblies. I can only assume this means there is a lot of science left to be done!!! Functional assignment differs significantly between sqm_reads.pl and the assemblies, mostly in the COG assignment, where there is an overabundance of hypothetical proteins. Here the assembly method seems to have the advantage of more accurate function predictions. |
Hello! Best, |
Dear developers,
Our group is working on a soil metatranscriptomic project with quite a large number of samples. We plan to use the merged mode. Specifically for this kind of project, is there one assembler that you would recommend over the other (first of all with regard to quality), and why?
Second question: it seems it is not possible to run SPAdes in rnaSPAdes mode, is that correct? The rnaSPAdes manual specifically recommends using rnaSPAdes for metatranscriptomic analyses. Do you expect this to have a big impact on the analysis?
Thanks in advance!!