
Layering annotations onto classifications

cmorganl edited this page Apr 27, 2021 · 3 revisions

Overview

This tutorial shows how treesapp layer can extend the classifications of TreeSAPP beyond functional and taxonomic annotations, which is useful and sometimes necessary. With this module, users can use reference packages to add anything from a taxonomic guild to metabolic information, or any other aspect of a gene that is phylogenetically conserved, to the classifications from treesapp assign.

Supported versions >=0.11.0


Ingredients

To use treesapp layer, you will need the outputs created by treesapp assign and the reference packages that were used. If you have cloned TreeSAPP from GitHub, a command like the following should generate outputs that can be layered (adjust the path to match the location of your clone):

treesapp assign -i ~/bin/TreeSAPP/test_data/marker_test_suite.faa \
-o marker_test/ \
-n 2 -t McrA,DsrAB --trim_align

The reference packages should have their feature_annotations attribute populated by treesapp package edit. The features stored in a reference package provide a map between the leaves of the reference package's phylogeny and a phenotype. Like the taxonomic assignment, the feature annotation assigned to a query sequence is based on the position in the phylogenetic tree (i.e. the placement edge) where the query sequence was placed.

Usage

We have classified McrA and DsrAB sequences found in the file 'marker_test_suite.faa', so the next step is to add the metabolic annotations for McrA and the gene annotations for DsrAB.

The DsrAB reference package was created from seed sequences in the FunGene database for DsrA and DsrB, together in a single phylogeny. This works because the two subunits that form the holoenzyme necessary for dissimilatory sulfite reduction are paralogs.

If the reference packages installed with TreeSAPP were used for sequence classification, then treesapp layer only requires the path to the output directory made by treesapp assign.
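
For this default case, the call reduces to the single --treesapp_output argument. The sketch below assumes the marker_test/ directory from the treesapp assign example above. The variable names assign_output and layer_cmd are illustrative only and not part of TreeSAPP. The command is run only if treesapp is on the PATH and the directory exists; otherwise it is just printed.

```shell
# Directory written by the earlier `treesapp assign` example (assumed to exist).
assign_output="marker_test/"

# Only --treesapp_output is given: no --refpkg_dir is needed when the
# reference packages installed with TreeSAPP were used for classification.
layer_cmd="treesapp layer --treesapp_output ${assign_output}"

# Run the command when TreeSAPP is installed and the directory is present;
# otherwise print the command that would have been run.
if command -v treesapp >/dev/null 2>&1 && [ -d "${assign_output}" ]; then
    ${layer_cmd}
else
    echo "${layer_cmd}"
fi
```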

If reference packages in a different directory were used, however, the path to this directory must be provided via '--refpkg_dir'. For example, if ~/treesapp_refpkgs/ contained the .pkl files used by treesapp assign:

treesapp layer --treesapp_output marker_test/ --refpkg_dir ~/treesapp_refpkgs/

TreeSAPP would map the features annotated in every reference package in ~/treesapp_refpkgs/ to the classified query sequences. The new feature annotations produced by treesapp layer are saved to a new file called 'layered_classifications.tsv' within the final_outputs/ directory. In the above example, the layered classifications would be in marker_test/final_outputs/layered_classifications.tsv.
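
To get a quick look at what was written, the column names of the layered table can be listed. This is a minimal sketch that assumes the file is tab-separated with a header on its first line (suggested by the .tsv extension, but not confirmed here). It makes no assumption about the column names themselves. The helper show_tsv_columns is a hypothetical name and not part of TreeSAPP.

```shell
# Hypothetical helper: print the numbered column names found on the
# first line of a tab-separated file.
show_tsv_columns() {
    awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i ": " $i; exit }' "$1"
}

# Path taken from the example above; skipped quietly if the file is absent.
layered="marker_test/final_outputs/layered_classifications.tsv"
if [ -f "${layered}" ]; then
    show_tsv_columns "${layered}"
fi
```

The column numbers printed this way can then be passed to tools such as cut -f to pull out only the fields of interest.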