Is it possible to use Atlas in nanopore sequencing data? #273

Closed
maithemagalhaes opened this issue Jan 10, 2020 · 19 comments

@maithemagalhaes

No description provided.

@SilasK
Member

SilasK commented Jan 10, 2020

Thank you for the question.
You can do a hybrid assembly (Illumina + Nanopore) using SPAdes.

I don't have experience with Nanopore-only assembly.
Maybe we can adapt Atlas to also work with long reads alone.

For example, if you already have predicted MAGs, you can use Atlas for the taxonomic and functional annotation.

@botellaflotante

Is it possible to run ATLAS with Illumina mate-pair reads?

@SilasK
Member

SilasK commented Sep 22, 2020

Do you have only mate-pair libraries, or mate-pair libraries in combination with normal paired-end libraries?
If you have only mate-pair libraries, as I understand it, they can be mapped like normal paired-end libraries.
It seems SPAdes supports mate-pair libraries, so with a small adaptation Atlas could support them too.
However, I don't know whether you have to do some special quality control beforehand.

@botellaflotante

I actually have a single-end set of reads and also a small mate-pair set of reads for the same sample, but I don't know if I can change the config file to take these... I guess ATLAS is normally used for paired-end reads, right?

@SilasK
Member

SilasK commented Sep 24, 2020

OK, then you could do the following:

Start Atlas with the single-end read library using atlas init.

Set spades_preset: normal in the config file to use normal SPAdes.

metaSPAdes allows neither mate pairs nor single-end reads, but I've heard that normal SPAdes is almost as good as metaSPAdes for metagenome assembly.

You can pass extra arguments to SPAdes via the spades_extra keyword in the config file.
See the SPAdes documentation for how to do this: https://github.com/ablab/spades#input-data

E.g. with mate pairs this would be something like:

spades_extra: " --mp1-1 path/to/matepair_R1.fastq --mp1-2 path/to/matepair_R2.fastq"
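
A minimal sketch of how a spades_extra value like the one above could be assembled when there is more than one mate-pair library. The helper name build_spades_extra is hypothetical and not part of Atlas, and the --mp<#>-1 / --mp<#>-2 flag spelling is simply copied from the example in this thread, so check it against the SPAdes documentation for your installed version.

```python
def build_spades_extra(mate_pair_libs):
    """Build a string for the `spades_extra` config key.

    `mate_pair_libs` is a list of (R1_path, R2_path) tuples, one per
    mate-pair library. Libraries are numbered from 1, following the
    --mp<#>-1 / --mp<#>-2 pattern used in the thread (an assumption
    about the SPAdes version in use).
    """
    parts = []
    for number, (r1, r2) in enumerate(mate_pair_libs, start=1):
        for path in (r1, r2):
            # Keep the sketch simple: refuse paths that would need quoting.
            if any(ch.isspace() for ch in path):
                raise ValueError(f"path contains whitespace: {path!r}")
        parts += [f"--mp{number}-1", r1, f"--mp{number}-2", r2]
    # Leading space as in the thread's example value.
    return " " + " ".join(parts)


if __name__ == "__main__":
    extra = build_spades_extra(
        [("path/to/matepair_R1.fastq", "path/to/matepair_R2.fastq")]
    )
    # Print a line that could be pasted into the config file.
    print(f'spades_extra: "{extra}"')
```

With the single library shown, this reproduces the value from the comment above; a second tuple in the list would be emitted as --mp2-1 / --mp2-2.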

@botellaflotante

botellaflotante commented Sep 24, 2020

It mostly worked fine, although there is some issue in the maxbin step (the assembly and genecatalog worked fine, but I have no bins and no MAGs). I'm not sure where to spot the exact problem, though. I think this may be because it is a sample of plasmid-enriched DNA from different bacteria, so no genome can be reconstructed.

@SilasK
Member

SilasK commented Sep 25, 2020

Many users have encountered the problem that maxbin doesn't produce bins. Maybe the assembly is too complicated. Did metabat produce bins? If so, you could simply set final_binner: metabat

In your particular case, it might be because only the SE reads are used for mapping, so you have less coverage for binning.

@botellaflotante

Yes, it worked. Thanks!

@rhysnewell

Hi @SilasK,
I was just wondering if you could clarify how to specify both long- and short-read inputs during atlas init. There doesn't seem to be an option to differentiate between long- and short-read input (atlas v2.8.2, conda-forge).
Cheers,
Rhys

@SilasK
Member

SilasK commented Feb 16, 2022

Atlas can handle long reads + short reads. (see the docs)

But I'm thinking about developing something for long reads only, is that what you want?

@rhysnewell

Cool, thanks. Nah, I was looking for hybrid assembly options, not long reads on their own.

I guess I was hoping for a command-line option to specify my input reads. I've got a large number of metagenomes to assemble that have both short and long reads, and setting up config files for each of them is going to get tedious. That's okay, thanks for your response.

@SilasK
Member

SilasK commented Feb 17, 2022

Do your long read files contain the sample name in the filename?

@rhysnewell

No, but they are in a folder whose name contains the sample name. Like so:
Interleaved Illumina:

(base) n10853499:muffin$ ls ../../short_read/2017.12.04_18.45.54_sample_0/reads/
anonymous_reads.fq.gz  reads_mapping.tsv.gz

PacBio:

(base) n10853499:muffin$ ls ../../pacbio/2018.01.23_11.53.11_sample_0/reads/
anonymous_reads.fq.gz  reads_mapping.tsv.gz

@SilasK
Member

SilasK commented Feb 21, 2022

Hey @rhysnewell, I made a function that should allow you to add the long reads to the atlas sample table.

You might need to install pathlib2:

mamba install -y pathlib2

From within the atlas folder, run
import_long_reads.py ../../pacbio

I ran a test and it worked for me. However, your sample names become very long, with dots in them. I suggest replacing them with something simpler.

Try it out. If it works, I will add it to the init function.
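
To illustrate the suggestion about simpler sample names, here is a rough sketch. It is not the import_long_reads.py script mentioned above, and the function names simplify_sample_name and find_long_reads are hypothetical. It assumes the folder layout shown earlier in the thread (<root>/<timestamp>_<sample>/reads/*.fq.gz) and that it is acceptable to drop the timestamp prefix so that the short-read and long-read folders for the same sample end up with the same name.

```python
import re
from pathlib import Path


def simplify_sample_name(folder_name):
    """Turn e.g. '2018.01.23_11.53.11_sample_0' into 'sample_0'."""
    # Drop a leading 'YYYY.MM.DD_HH.MM.SS_' timestamp if present (assumed format).
    name = re.sub(r"^\d{4}\.\d{2}\.\d{2}_\d{2}\.\d{2}\.\d{2}_", "", folder_name)
    # Replace dots and any other non-alphanumeric characters with underscores.
    return re.sub(r"[^A-Za-z0-9_]+", "_", name).strip("_")


def find_long_reads(root, pattern="*/reads/*.fq.gz"):
    """Map simplified sample name -> long-read file found under `root`."""
    found = {}
    for path in sorted(Path(root).glob(pattern)):
        # The sample folder is two levels above the read file.
        sample = simplify_sample_name(path.parent.parent.name)
        if sample in found:
            raise ValueError(f"duplicate sample name after simplifying: {sample!r}")
        found[sample] = path
    return found


if __name__ == "__main__":
    # Print a tab-separated sample/path listing for manual inspection.
    for sample, path in find_long_reads("../../pacbio").items():
        print(sample, path, sep="\t")
```

The listing can then be compared by eye with the sample names in the Atlas sample table before any long-read paths are added to it.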

@AstrobioMike

AstrobioMike commented Dec 12, 2022

hey there, @SilasK :)

thanks for your work here!

I came across someone looking for a workflow suitable for solely nanopore data and found my way to this issue. It looks like maybe this has stopped here for now due to a lack of need/priority so far, but just wanna check with 2 quick questions:

  1. Has taking solely long reads as input been integrated into the main program yet as discussed above?
  2. If yes, when solely nanopore or pacbio reads are provided, is an assembler (and I guess a read mapper too) used that is specifically designed for dealing with them and their potentially higher error rates (e.g. like the settings flye has for assembly)?

thanks!

@SilasK
Member

SilasK commented Dec 13, 2022

@AstrobioMike It is not implemented in the main workflow, but I would be happy to help make it happen.

As an alternative for now, I suggest MUFFIN.

@AstrobioMike

Oh excellent, thanks for the note about muffin, I will pass that along 🙂

No specific pressure from me to integrate a long-read-specific path here, especially since you pointed out that muffin seems to already have a way.

Actually, I just looked a bit and it seems muffin might also require short reads. I have a question in to them to make sure, but if that is the case, maybe there is still a niche to fill for a general workflow starting with long reads only, and it might be worth adding some things here for that capability if you make the time/find the motivation.

Thanks again!

@jmtsuji
Contributor

jmtsuji commented Dec 14, 2022

Just to add a comment to this thread, I've been working on a snakemake workflow for microbial genome assembly/annotation using Nanopore data -- it can perform either a long-read only or a hybrid long/short read workflow. The basic framework could probably be adapted and added into ATLAS for long-read only metagenome assembly, if there were interest. See https://github.com/jmtsuji/rotary (still in development!)

Sorry for generally being slow to reply these days! Thanks again for all your work on ATLAS!

@github-actions

github-actions bot commented Apr 6, 2023

There has been no activity for some time. I hope your issue has been solved in the meantime.
This issue will automatically close soon if no further activity occurs.

Thank you for your contributions.

@github-actions github-actions bot added the stale label Apr 6, 2023
@github-actions github-actions bot closed this as not planned Apr 22, 2023
6 participants