Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compatibility with bracken #2

Closed
ajaybabu27 opened this issue May 3, 2017 · 5 comments
Closed

compatibility with bracken #2

ajaybabu27 opened this issue May 3, 2017 · 5 comments

Comments

@ajaybabu27
Copy link

Will this tool still be compatible with the kreport output from bracken ?

@smdabdoub
Copy link
Owner

smdabdoub commented May 9, 2017

Yes. The 5/31/2016 update on the Bracken site says:

This release adds another output file to the est_abundance.py script. The new output file (with a _bracken.report extension) is in the same style as the Kraken report file, with reads distributed to the desired level.

kraken-biom expects Kraken report files, so there should be no problem using Bracken output with kraken-biom.

But please let me know if you have any issues with it.

@ajaybabu27
Copy link
Author

Thanks for getting back to me. I compared the report file from Bracken and the biom files generated by kraken-biom tool. There seems to be a large issue here.

Bracken redistributes reads from higher to lower taxon levels and thus the report files it generates contain "zero" reads directly assigned to taxon levels above species (S). This would be the third column in the report file. The second column now contains a cumulative sum of the reads from S to D.

I am guessing that in order to allow your code to process Bracken output you would have to process the second column instead of the third column as it is done in the Kraken output.

I have pasted a sample for comparison of both kraken and bracken outputs:

Kraken Output:

26.12 9918653 9918653 U 0 unclassified
73.88 28059325 1182 - 1 root
73.87 28055310 98 - 131567 cellular organisms
73.75 28007044 8500 D 2 Bacteria
68.22 25907794 15534 - 1783272 Terrabacteria group
37.35 14185842 53535 P 1239 Firmicutes
36.70 13937530 58455 C 91061 Bacilli
36.45 13842775 70723 O 186826 Lactobacillales
33.49 12719109 40070 F 81852 Enterococcaceae
33.37 12674215 236758 G 1350 Enterococcus
32.47 12331937 10495618 S 1352 Enterococcus faecium

Bracken Output:

26.12 9918653 9918653 U 0 unclassified
73.88 28059300 0 R 1 root
73.88 28056420 0 R1 131567 cellular organisms
73.75 28008249 0 D 2 Bacteria
68.24 25917279 0 D1 1783272 Terrabacteria group
37.39 14200770 0 P 1239 Firmicutes
36.88 14004816 0 C 91061 Bacilli
36.78 13967720 0 O 186826 Lactobacillales
34.00 12911294 0 F 81852 Enterococcaceae
33.98 12906404 0 G 1350 Enterococcus
33.70 12798327 12798327 S 1352 Enterococcus faecium

@smdabdoub smdabdoub reopened this May 10, 2017
@smdabdoub
Copy link
Owner

The third column still contains the reads assigned directly to species, which after running Bracken are the only assigned reads. So the only change you should need to make when running kraken-biom would be to set --max to S. That way both min and max are set to Species and only the species-assigned reads will be extracted from the report file and written to the BIOM table.

@ajaybabu27
Copy link
Author

ajaybabu27 commented May 11, 2017

That sounds good to me. Thanks for taking your time to review it.

By the way, I get the following error when I use --max S argument:

ERROR: Max and Min ranks are out of order: S < S

The default command should still work fine without the --max argument, since taxons with zero values across all samples are discarded by kraken-biom.

Cheers,
Ajay.

@smdabdoub
Copy link
Owner

smdabdoub commented May 11, 2017

Happy to help. And thanks for passing along the error message.

Setting both to S is actually a part of the test suite for the parsing, but I apparently did't test it before it gets there. It looks like I have a >= where I should just have a > where I'm sanity-checking the command-line parameters. Should be an easy fix, I'll open an issue to remind myself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants