Interpretation of reported abundance table #10

Closed
donovan-h-parks opened this issue Jan 28, 2020 · 6 comments

@donovan-h-parks

Hi. I'm running MetaCache query with the -abundances profile.tsv and -abundance-per species flags. It appears this writes two profiling results to profile.tsv: the full taxon profile and a species profile. However, these profiles do not appear to agree. For example, the taxon profile reports Streptococcus dysgalactiae subsp. equisimilis at 1.15% and no other results for S. dysgalactiae, while the species profile reports S. dysgalactiae at 1.93%. Why is there a discrepancy?

Relevant lines from profile.tsv:

# query summary: number of queries mapped per taxon
# rank:name | taxid | number of reads | abundance
...
subspecies:Streptococcus dysgalactiae subsp. equisimilis	119602	80740	1.15343%
...
# estimated abundance (number of queries) per species
# rank:name | taxid | number of reads | abundance
...
species:Streptococcus dysgalactiae	1334	135610	1.93728%
...

Is MetaCache's best estimate of the abundance of S. dysgalactiae 1.15% or 1.93%?

Thanks,
Donovan

@muellan
Owner

muellan commented Jan 28, 2020

The first table represents the raw abundances based on the read mapping.

The second table shows the estimated abundance on a specific taxonomic rank. This works as follows (will be described in our upcoming paper about food ingredient detection):

For each taxon in the dataset we count the number of reads assigned to it. Taxa on lower levels than the requested taxonomic rank are pruned and their read counts are added to their respective parents, while reads from taxa on higher levels are distributed among their children in proportion to the weights of the sub-trees rooted at each child. After the redistribution the estimated number of reads and abundance percentages are returned as outputs.
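The redistribution described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not MetaCache's actual code: the taxonomy is a hypothetical child-to-parent dictionary, the rank list is a simplified assumption, `estimate_abundance` is a hypothetical name, the sub-tree weight is taken to be the number of reads mapped within that sub-tree (as clarified later in this thread), and reads that cannot be placed are simply counted as dropped (a hypothetical choice).

```python
from collections import defaultdict

# Hypothetical, simplified rank order from lowest to highest.
RANKS = ["subspecies", "species", "genus", "family", "order", "class", "phylum", "domain"]


def estimate_abundance(counts, parent, rank_of, rank="species"):
    """Sketch of prune-and-redistribute read estimation at a target rank.

    counts:  taxon -> number of reads mapped directly to that taxon
    parent:  taxon -> parent taxon (the root has no entry)
    rank_of: taxon -> rank name from RANKS
    Returns (estimated reads per taxon at `rank`, reads that could not be placed).
    Assumes every root-to-leaf path passes through a node at the target rank.
    """
    level = RANKS.index(rank)
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)

    weights = {}

    def weight(t):
        # Sub-tree weight: reads mapped to t plus all reads mapped below it.
        if t not in weights:
            weights[t] = counts.get(t, 0) + sum(weight(c) for c in children[t])
        return weights[t]

    def ancestor_at_rank(t):
        # Walk up the tree until the target rank is reached (or the root is passed).
        while t is not None and rank_of[t] != rank:
            t = parent.get(t)
        return t

    estimated = defaultdict(float)
    dropped = 0.0

    def distribute(t, amount):
        nonlocal dropped
        if rank_of[t] == rank:
            estimated[t] += amount
            return
        kids = [k for k in children[t] if weight(k) > 0]
        total = sum(weight(k) for k in kids)
        if total == 0:
            dropped += amount  # no reads below t to guide the split
            return
        for k in kids:
            # Each child receives a share proportional to its sub-tree weight.
            distribute(k, amount * weight(k) / total)

    for taxon, n in counts.items():
        if RANKS.index(rank_of[taxon]) <= level:
            # At or below the target rank: prune and credit the ancestor at `rank`.
            anc = ancestor_at_rank(taxon)
            if anc is None:
                dropped += n
            else:
                estimated[anc] += n
        else:
            # Above the target rank: split the reads among the children.
            distribute(taxon, n)

    return dict(estimated), dropped


# Toy example with hypothetical taxa: genus G with species S1 (subspecies S1a) and S2.
parent = {"S1": "G", "S2": "G", "S1a": "S1"}
rank_of = {"G": "genus", "S1": "species", "S2": "species", "S1a": "subspecies"}
counts = {"G": 30, "S1": 20, "S1a": 40, "S2": 30}
est, dropped = estimate_abundance(counts, parent, rank_of, "species")
total = sum(est.values())
for taxon, n in sorted(est.items()):
    print(f"{taxon}\t{n:.0f}\t{100 * n / total:.2f}%")
# S1	80	66.67%
# S2	40	33.33%
```

In this toy example the species S1 ends up with 80 reads although only 40 were mapped to its subspecies S1a: its own reads, its subspecies' reads, and a weighted share of the genus-level reads are pooled. The same kind of pooling is a likely reason why a species-level figure can exceed the raw subspecies-level figure in the first table.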

I will also add more detailed explanation to the Markdown documentation of the output options.

@muellan
Owner

muellan commented Jan 28, 2020

So the best prediction would be the second table.

@donovan-h-parks
Author

Thanks for the quick response. Very helpful.

@tothuhien

Hi, could I ask a follow-up question in this thread about one detail of the read redistribution? How do you define the weight of each sub-tree in the taxonomic tree? Thanks

@Funatiq
Collaborator

Funatiq commented Dec 3, 2023

The weight is the number of reads mapped to taxa in the sub-tree.
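As a small worked illustration of this definition, the sketch below computes each child's sub-tree weight and splits a parent's reads in proportion to it. The taxa, the `children` mapping, and the function names are hypothetical; this is not MetaCache's implementation.

```python
def subtree_weight(taxon, counts, children):
    """Reads mapped to `taxon` plus reads mapped to any taxon below it."""
    return counts.get(taxon, 0) + sum(
        subtree_weight(c, counts, children) for c in children.get(taxon, [])
    )


def split_reads(n_reads, kids, counts, children):
    """Split n_reads among `kids` in proportion to their sub-tree weights."""
    weights = {k: subtree_weight(k, counts, children) for k in kids}
    total = sum(weights.values())
    if total == 0:
        return {}  # nothing mapped below: no basis for a split
    return {k: n_reads * w / total for k, w in weights.items()}


# Hypothetical genus G with species S1 (subspecies S1a) and S2.
children = {"G": ["S1", "S2"], "S1": ["S1a"]}
counts = {"G": 30, "S1": 20, "S1a": 40, "S2": 30}
# Weights: S1 = 20 + 40 = 60, S2 = 30, so G's 30 reads split 60:30.
print(split_reads(counts["G"], children["G"], counts, children))
# {'S1': 20.0, 'S2': 10.0}
```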

@tothuhien

Thank you for your prompt response!
