
Competitive mapping for strain separation #21

Closed
EisenRa opened this issue Jul 18, 2018 · 4 comments

Comments

@EisenRa

EisenRa commented Jul 18, 2018

In some metagenomic contexts, a sample can contain closely related species, which makes read mapping to a single reference genome difficult (e.g. reads cross-mapping between species). In this situation it can be useful to employ competitive mapping, whereby reference genomes from closely related species are concatenated into a single multi-FASTA file and the reads are mapped to this combined reference. Reads that map equally well to more than one genome typically receive a low mapping quality, so the mapping-quality filter can then remove reads that would otherwise cross-map between species.

EAGER could allow users to select a folder or a list of reference FASTA files and concatenate them into a multi-FASTA file prior to read mapping. Alternatively, users could provide their own pre-concatenated multi-FASTA file.
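A minimal Python sketch of the concatenation step, assuming each input FASTA holds one genome and that contig names are prefixed with the genome name plus a `__` separator (a hypothetical convention, not something EAGER defines) so headers from different genomes cannot collide:

```python
from pathlib import Path


def concatenate_fasta_texts(named_fastas, sep="__"):
    """Merge (genome_name, fasta_text) pairs into a single multi-FASTA string.

    Each header is rewritten as '<genome><sep><sequence ID>' (a hypothetical
    naming convention) so that contig names stay unique across genomes.
    """
    seen = set()
    out_lines = []
    for genome, text in named_fastas:
        for line in text.splitlines():
            if line.startswith(">"):
                # Keep only the sequence ID (first word) of the original header.
                seq_id = line[1:].split()[0]
                name = f"{genome}{sep}{seq_id}"
                if name in seen:
                    raise ValueError(f"duplicate sequence name: {name}")
                seen.add(name)
                out_lines.append(f">{name}")
            elif line.strip():
                # Sequence lines are copied through unchanged (minus whitespace).
                out_lines.append(line.strip())
    return "\n".join(out_lines) + "\n"


def concatenate_fasta_files(fasta_paths, out_path, sep="__"):
    """File-based wrapper: the genome name is taken from each file's stem."""
    named = [(Path(p).stem, Path(p).read_text()) for p in fasta_paths]
    Path(out_path).write_text(concatenate_fasta_texts(named, sep))
```

For example, `concatenate_fasta_files(["speciesA.fasta", "speciesB.fasta"], "combined.fasta")` would write one combined reference, which would still need to be indexed for the mapper before use.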

For mapping statistics, the BAM file produced against the concatenated reference would have to be split (e.g. using bamtools) before generating per-genome stats.
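To illustrate how the mapping-quality filter interacts with a concatenated reference, the hypothetical helper below counts primary, mapped reads per genome from SAM text lines (e.g. the output of `samtools view -h`) at or above a MAPQ threshold. It assumes contig names carry a `genome__contig` prefix; the threshold of 30 is an arbitrary example, not an EAGER default.

```python
from collections import Counter

FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def reads_per_genome(sam_lines, min_mapq=30, sep="__"):
    """Count primary mapped reads with MAPQ >= min_mapq, grouped by genome.

    The genome is taken from the reference name prefix before `sep`
    (hypothetical naming convention for the concatenated reference).
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        # Ignore unmapped, secondary and supplementary alignments.
        if flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        # Reads placed equally well on several genomes tend to have low MAPQ.
        if mapq < min_mapq:
            continue
        counts[rname.split(sep, 1)[0]] += 1
    return dict(counts)
```

This aggregates per-contig alignments into per-genome counts without physically splitting the BAM file.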

@apeltzer added the good first issue and feature labels Jul 18, 2018
@apeltzer
Member

I guess this goes the same way as #5, with the minor adjustment of making these available as multi-FASTA files. That should be feasible within that scope too.

@ewels added this to EAGER2 in hackathon-scilifelab-2018 Aug 2, 2018
@apeltzer added this to the V2.0 "Gray Wolf" milestone Aug 15, 2018
@jfy133
Member

jfy133 commented Mar 1, 2019

After doing something similar in another situation, I agree that @EisenRa's suggestion of building a multi-FASTA reference file and mapping to it is probably the way forward, rather than the multiple-channel approach mentioned in #5. We might not even have to split the BAM file (unless we offer an option to extract a particular genome from it), as most tools are human-focused and already report per-chromosome statistics.

@jfy133 modified the milestones: V2.1 "Ulm", V2.2 "Wangen" Oct 5, 2019
jfy133 added a commit that referenced this issue Dec 4, 2019
@ewels added this to Beginner tasks in hackathon-scilifelab-2019 Dec 5, 2019
@jfy133 changed the title Competitive mapping [Feature] Competitive mapping for strain separation Dec 28, 2020
@jfy133
Member

jfy133 commented Apr 13, 2021

This could also be of interest outside of metagenomics, for improving non-human mappings: https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-020-07229-y

@jfy133
Member

jfy133 commented Sep 6, 2022

Going to close in favour of #878, which is more requested and arguably more powerful.

As noted above, extracting per-genome stats with e.g. bedtools (already supported) should be sufficient in many cases.
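A sketch of how per-genome regions could be produced for such tools: the hypothetical helper below turns a FASTA index (`.fai`, whose first two columns are sequence name and length) into one BED region set per genome, assuming contig names follow a `genome__contig` convention. BED intervals are 0-based and half-open, so a whole contig spans `0` to its length.

```python
from collections import defaultdict


def bed_regions_per_genome(fai_lines, sep="__"):
    """Group whole-contig BED intervals by genome.

    The genome name is the contig-name prefix before `sep`
    (hypothetical naming convention for the concatenated reference).
    """
    regions = defaultdict(list)
    for line in fai_lines:
        if not line.strip():
            continue
        # .fai columns: name, length, offset, line bases, line width
        name, length = line.split("\t")[:2]
        genome = name.split(sep, 1)[0]
        # BED is 0-based, half-open: a whole contig spans [0, length).
        regions[genome].append(f"{name}\t0\t{int(length)}")
    return {g: "\n".join(lines) + "\n" for g, lines in regions.items()}
```

Each resulting BED text could be written to a file and passed to a region-aware tool (for example `samtools view -L`) to restrict statistics to a single genome.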

@jfy133 closed this as completed Sep 6, 2022