Check for and root tree on import #235

Closed
Mikeyj opened this issue Aug 13, 2013 · 6 comments

Mikeyj commented Aug 13, 2013

Unifrac requires a rooted tree for calculation. FastUnifrac online selects an arbitrary root when an unrooted tree is uploaded.

The output FastTree in QIIME is unrooted as default and the implementation of unifrac in Picante doesn't select an arbitrary root, so if this tree is used unifrac fails in phyloseq with the following message:

distance(mydata.biom, method = "unifrac", type = "samples")
Error in fastUniFrac(physeq, weighted, normalized, parallel) : 
  Rooted phylogeny required for UniFrac calculation

Therefore, before importing the tree into phyloseq, I need to run the following in QIIME:

make_phylogeny.py -i anoise_out_rev_primer_truncated_filtered_rep_set_aligned.fasta -r midpoint -o rooted.tre

This might be worth adding to the documentation?

I wondered whether this could be checked when a tree is imported, with a warning that UniFrac won't work because the supplied tree is unrooted?
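A minimal sketch of such an import-time check, using only `ape::is.rooted`. The helper name `warn_if_unrooted` and the toy Newick strings are hypothetical and are not part of phyloseq; the three-tip tree with a basal trifurcation simply stands in for unrooted FastTree output.

```r
library(ape)

# Hypothetical helper (not a phyloseq function): warn when a tree is unrooted,
# then hand the tree back unchanged so it can still be used by other methods.
warn_if_unrooted <- function(tree) {
  if (!ape::is.rooted(tree)) {
    warning("Tree is unrooted; UniFrac requires a rooted tree.", call. = FALSE)
  }
  invisible(tree)
}

# Toy trees: a basal trifurcation (unrooted) and a bifurcating root (rooted).
unrooted <- ape::read.tree(text = "(A:0.1,B:0.2,C:0.3);")
rooted   <- ape::read.tree(text = "((A:0.1,B:0.2):0.05,C:0.3);")

warn_if_unrooted(unrooted)  # emits the warning
warn_if_unrooted(rooted)    # silent
```

Returning the tree unchanged keeps the check advisory, so an unrooted tree remains usable for methods that do not need a root.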

And, the solution requiring the most effort: midpoint rooting in R on import? I thought this would be available in ape::root, but it doesn't seem to be. I came across the phangorn package, but that would add another dependency.
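For reference, midpoint rooting with phangorn could look roughly like the sketch below. It assumes phangorn is installed (hence the `requireNamespace` guard) and uses a hypothetical four-tip toy tree rather than real FastTree output.

```r
library(ape)

# Toy unrooted tree (basal trifurcation); a stand-in for FastTree output.
unrooted <- ape::read.tree(text = "(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);")

# phangorn is an optional extra dependency, so only use it when available.
if (requireNamespace("phangorn", quietly = TRUE)) {
  # midpoint() places the root halfway along the longest tip-to-tip path.
  rooted <- phangorn::midpoint(unrooted)
  print(ape::is.rooted(rooted))
}
```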

@ghost ghost assigned joey711 Aug 13, 2013
Owner

joey711 commented Aug 14, 2013

I can see the documentation part, and maybe some additional automated documentation too, such as a notification on import that the tree is unrooted. UniFrac is not the only reason to keep the tree handy, though, and not every other method requires a rooted tree. Beefing up the UniFrac error message so that it better points users to a solution may be a good approach.

Just to be clear, phyloseq is not a QIIME-specific R package, and I'm not that keen on adding suggestions for QIIME commands to the phyloseq doc. I am more likely to add additional documentation to help users find the relevant R commands for rooting a tree. I believe there are already examples of this in the phyloseq::UniFrac doc, for example.

Author

Mikeyj commented Aug 15, 2013

Cool, thanks, that will help. It gets me every time. Take your point about QIIME specificity. The tree is from FastTree though, which will be a common method for generating trees for this number of sequences beyond its use in QIIME.

Owner

joey711 commented Oct 14, 2013

I just realized that I should have posted at least the in-R solution to this, using a command or two from the ape package. The following is borrowed from a previous long and detailed closed issue, Issue #167, where it is explained more completely.

myPhyseqObj = import_biom("path/to/my/file.biom", "path/to/my/tree.tre")
is.rooted(phy_tree(myPhyseqObj)) 
# FALSE

Uh-oh. It is FALSE. The tree is not rooted. Let's root it, in-place, using the phy_tree<- assignment operator, and the root function from the ape package. This randomly selects an OTU to set as root. You may want to make a more informed choice.

require("ape")
phy_tree(myPhyseqObj) <- root(phy_tree(myPhyseqObj), sample(taxa_names(myPhyseqObj), 1), resolve.root = TRUE)

Now double check

is.rooted(phy_tree(myPhyseqObj)) 
# FALSE

I should probably wrap this as a default behavior, with warning so the user knows that a random OTU is being used as root, in case they want to select one manually.
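A rough sketch of what such a wrapper might look like, working directly on an ape "phylo" tree. The function name `root_if_needed` and the toy tree are hypothetical, and this is not phyloseq's actual implementation. It includes a final check, since a rooting call can leave `is.rooted` returning FALSE.

```r
library(ape)

# Hypothetical wrapper (not phyloseq's implementation): root an unrooted tree
# on a randomly chosen tip and tell the user which tip was picked.
root_if_needed <- function(tree) {
  if (ape::is.rooted(tree)) {
    return(tree)
  }
  outgroup <- sample(tree$tip.label, 1)
  message("Tree is unrooted; rooting on randomly selected tip: ", outgroup)
  rooted <- ape::root(tree, outgroup = outgroup, resolve.root = TRUE)
  # Fail loudly rather than pass an unrooted tree on to UniFrac.
  if (!ape::is.rooted(rooted)) {
    stop("Automatic rooting failed; please root the tree manually.")
  }
  rooted
}

set.seed(1)  # only so the random choice is reproducible in this example
unrooted <- ape::read.tree(text = "(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);")
rooted <- root_if_needed(unrooted)
```

Using `message()` rather than a silent default lets a user who prefers an informed outgroup notice the random choice and root the tree manually instead.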

Author

Mikeyj commented Oct 17, 2013

Hi Joey,

I tried this in R and got the same result. Note that the output of is.rooted is still FALSE despite rerooting. I couldn't work out why, so I did this with the make_phylogeny command in QIIME instead.

I have also had unresolved issues using the FastTree-produced tree with cophenetic in R (required for generating the MPD and MNTD phylogenetic community measures). This may or may not be related.

Owner

joey711 commented Oct 17, 2013

Thanks for the update. I missed the is.rooted result, too. Woops! I'll figure it out and post the solution.

Owner

joey711 commented Oct 29, 2013

This is related to Issue 255, automatically fix empty (NA) branch-length values. These should probably be rolled-up in one tree-parsing update.

joey711 added a commit that referenced this issue Nov 5, 2013
CHANGES IN VERSION 1.7.7
-------------------------

USER-VISIBLE CHANGES

	- Tree fixes:
		#235
		#255

	- If a tree has NA branch-length values, they are automatically set to 0.
		This occurs within both phyloseq(), and read_tree().

	- UniFrac calculations require a rooted tree. While a rooted tree is
		not required to be part of a phyloseq object, it is a helpful
		default behavior to select a random root when UniFrac is called
		and the tree is unrooted, flashing a notice to the user.

	- Precise import from ape-package, rather than full-import.
		Smaller chance for collisions.
		Precisely-defined dependencies listed in NAMESPACE

	- As a result of the previous, phyloseq defines a placeholder "phylo" class,
		extended from "list". This seems to match the class
		from a full import of ape, and is necessary since ape does not
		export the "phylo" class.
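
The NA branch-length fix listed in the changes above can be illustrated on a plain ape tree. This is only a sketch of the idea with a hypothetical toy tree, not the code from the commit; the NA is introduced by hand to simulate a missing branch length.

```r
library(ape)

# Toy rooted tree; one branch length is overwritten with NA to simulate
# a tree file that lacks a value for that branch.
tree <- ape::read.tree(text = "((A:0.1,B:0.2):0.05,C:0.3);")
tree$edge.length[1] <- NA

# Replace every NA branch length with 0, leaving other lengths untouched.
tree$edge.length[is.na(tree$edge.length)] <- 0
```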
@joey711 joey711 closed this as completed Nov 5, 2013