Interpretation of reported abundance table #10

donovan-h-parks · 2020-01-28T17:16:02Z

Hi. I'm running MetaCache query with the -abundances profile.tsv and -abundance-per species flags. This appears to write two profiling results to profile.tsv: the full taxon profile and a species profile. However, the two profiles do not agree. For example, the taxon profile reports Streptococcus dysgalactiae subsp. equisimilis at 1.15% and no other results for S. dysgalactiae, while the species profile reports S. dysgalactiae at 1.93%. Why is there a discrepancy?

Relevant lines from profile.tsv:

# query summary: number of queries mapped per taxon
# rank:name | taxid | number of reads | abundance
...
subspecies:Streptococcus dysgalactiae subsp. equisimilis	119602	80740	1.15343%
...
# estimated abundance (number of queries) per species
# rank:name | taxid | number of reads | abundance
...
species:Streptococcus dysgalactiae	1334	135610	1.93728%
...

Is the best prediction of the abundance of S. dysgalactiae by MetaCache 1.15% or 1.93%?

Thanks,
Donovan

muellan · 2020-01-28T17:33:11Z

The first table represents the raw abundances based on the read mapping.

The second table shows the estimated abundance on a specific taxonomic rank. This works as follows (will be described in our upcoming paper about food ingredient detection):

For each taxon in the dataset we count the number of reads assigned to it. Taxa on lower levels than the requested taxonomic rank are pruned and their read counts are added to their respective parents, while reads from taxa on higher levels are distributed among their children in proportion to the weights of the sub-trees rooted at each child. After the redistribution the estimated number of reads and abundance percentages are returned as outputs.
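To make the redistribution concrete, here is a minimal Python sketch of the two steps described above. It is an illustration under assumptions, not MetaCache's actual code: the function name `estimate_per_rank`, the toy taxa (`G`, `A`, `B`, `A1`) and their read counts are made up, the weight of a sub-tree is taken to be the number of reads mapped to taxa inside it (as clarified later in this thread), and reads at a higher-level taxon whose children have zero weight are simply left where they are.

```python
from collections import defaultdict

# Ranks from lowest to highest; only what this toy example needs.
RANKS = ["subspecies", "species", "genus"]


def estimate_per_rank(parent, rank, reads, target):
    """Hypothetical helper: estimate read counts per taxon at rank `target`.

    parent: taxon -> parent taxon (None for the root)
    rank:   taxon -> rank name
    reads:  taxon -> number of reads mapped directly to that taxon
    """
    level = {name: i for i, name in enumerate(RANKS)}
    target_level = level[target]

    children = defaultdict(list)
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)

    # Pre-order traversal: every taxon appears after its parent.
    order, stack = [], [n for n, p in parent.items() if p is None]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children[node])

    est = {node: float(reads.get(node, 0)) for node in parent}

    # Step 1 (bottom-up): prune taxa below the target rank and
    # add their reads to their parents.
    for node in reversed(order):
        if level[rank[node]] < target_level and parent[node] is not None:
            est[parent[node]] += est[node]
            est[node] = 0.0

    # Sub-tree weight = reads mapped to taxa in the sub-tree
    # (assumption: computed once, before any top-down redistribution).
    weight = {}
    for node in reversed(order):
        weight[node] = est[node] + sum(weight[c] for c in children[node])

    # Step 2 (top-down): split reads of taxa above the target rank
    # among their children in proportion to the children's weights.
    for node in order:
        if level[rank[node]] > target_level:
            total = sum(weight[c] for c in children[node])
            if total > 0:
                for c in children[node]:
                    est[c] += est[node] * weight[c] / total
                est[node] = 0.0

    return {n: est[n] for n in order if rank[n] == target}


# Made-up toy data: genus G with species A and B; A has subspecies A1.
parent = {"G": None, "A": "G", "B": "G", "A1": "A"}
rank = {"G": "genus", "A": "species", "B": "species", "A1": "subspecies"}
reads = {"G": 30, "A": 10, "A1": 50, "B": 40}

result = estimate_per_rank(parent, rank, reads, "species")
print(sorted(result.items()))  # [('A', 78.0), ('B', 52.0)]
```

In this toy case, species A first absorbs the 50 reads of its subspecies (10 + 50 = 60). The 30 genus-level reads are then split 60:40 between A and B, giving 78 and 52. This is the same mechanism by which a species-level estimate can end up higher than the raw count of a single subspecies below it.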

I will also add a more detailed explanation to the Markdown documentation of the output options.

muellan · 2020-01-28T17:33:29Z

So the best prediction would be the second table.

donovan-h-parks · 2020-01-28T20:54:34Z

Thanks for the quick response. Very helpful.

tothuhien · 2023-12-02T21:54:11Z

Hi, could I ask in this thread again about one detail of how reads are redistributed? How do you define the weight of each sub-tree in the taxonomic tree? Thanks

Funatiq · 2023-12-03T08:12:54Z

The weight is the number of reads mapped to taxa in the sub-tree.

tothuhien · 2023-12-03T13:13:04Z

Thank you for your prompt response!

donovan-h-parks closed this as completed Jan 28, 2020
