Attaching nexml metadata to phylo objects #19

cboettig · 2013-09-06T21:54:42Z

Crazy idea: when reading into a phylo, should we create a new RNeXML environment, store the full nexml tree in there, and add a new slot to the phylo object storing an unevaluated get("<unique_tree_id>", envir=RNeXML) that methods could use to access the full NeXML??

This would let us do something like:

tr <- nexml_read("tree.xml")
metadata(tr)

instead of

tr <- nexml_read("tree.xml", type="nexml")
metadata(tr)

That is, we would read in as the default (ape) type and could still call functions that need the full nexml metadata, while having an ape tree object that can be passed around to the usual R packages.
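
A minimal sketch of the environment idea in base R, under stated assumptions: `nexml_store`, `attach_nexml`, and this toy `metadata` are hypothetical names, not RNeXML functions; a character string stands in for the parsed NeXML document; and the object carries a lookup key as an attribute rather than an unevaluated `get()` call.

```r
# Hypothetical registry: an environment holding full NeXML documents by tree id.
nexml_store <- new.env(parent = emptyenv())

# Store the full document and tag the phylo object with the key used to find it.
attach_nexml <- function(phy, doc, id) {
  assign(id, doc, envir = nexml_store)
  attr(phy, "nexml_id") <- id
  phy
}

# Toy accessor: look the document up again from the key on the object.
metadata <- function(phy) {
  get(attr(phy, "nexml_id"), envir = nexml_store)
}

# Toy stand-in for an ape tree: a list carrying the S3 class "phylo".
tr <- structure(list(tip.label = c("a", "b", "c")), class = "phylo")
tr <- attach_nexml(tr, doc = "<nexml>...</nexml>", id = "tree1")

inherits(tr, "phylo")   # still an ordinary S3 phylo object
metadata(tr)            # full document retrieved through the environment
```

One consequence of this design is that every copy of `tr` points at the same stored document, and the document stays in memory until it is explicitly removed from the environment.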

Or maybe that's stupid and asking for trouble, and we should be explicit about what type of object we want.

sckott · 2013-09-06T22:16:15Z

@cboettig That seems reasonable at small scale, but with huge XML files we would be putting a lot of data into the user's workspace without them realizing it.

cboettig · 2013-10-07T21:48:39Z

In general it would be useful to read in a nexml object that could be passed directly to functions based on ape trees without requiring coercion and dropping of metadata.

I never understood why phylobase didn't do this -- but it appears that phylo4 objects do not inherit the phylo S3 class and cannot be passed to phylo functions without explicit coercion:

library(phylobase)
library(ape)
data(bird.orders)
bird.orders4 <- as(bird.orders, "phylo4") # make ape::phylo tree into phylobase::phylo4 S4 class
plot.phylo(bird.orders4) # attempting to use the S4 fails

Of course a plot function is defined for phylo4, but more interesting functions are not written for phylo4, so this is a huge handicap. Consider:

S <- c(10, 47, 69, 214, 161, 17, 355, 51, 56, 10, 39, 152,
       6, 143, 358, 103, 319, 23, 291, 313, 196, 1027, 5712)
bd.ext(bird.orders4, S)   # Fails again. Works with the S3 type

Anyway, it appears this problem can be solved using setOldClass. I've defined the class phyloS4, which inherits all methods for the S3 phylo class without having to explicitly declare those methods. In this way, we have the benefits of an S4 class while maintaining compatibility with all developers who only write functions based on the S3 class (as long as functions don't stupidly check the string identity class(obj) == "phylo" instead of using the proper class check is(obj, "phylo")...).

I can then build a new class, nexmlTree, by extending this class. Again, my new class acts like an S3 phylo in any such functions, but adds a representation containing all the nexml data. This approach doesn't minimize memory footprint, but usually that is not a concern for R users (otherwise coercion is always an option). It does satisfy the need for an object that works with all existing functions while also containing any and all metadata we can express in nexml.
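
For illustration, the setOldClass mechanism can be sketched with the methods package alone. This is not the code in R/extend_phylo.R: the class name `nexmlTreeSketch`, the generic `n_tips`, the character slot, and the toy tree are all assumptions made up for this example.

```r
library(methods)

# Register the S3 class so that S4 classes can extend it.
setOldClass("phylo")

# Hypothetical combined class: a "phylo" plus one extra slot for the document.
setClass("nexmlTreeSketch", contains = "phylo", slots = c(nexml = "character"))

# An S3 generic and method written only for "phylo", as most packages do.
n_tips <- function(x) UseMethod("n_tips")
n_tips.phylo <- function(x) length(x$tip.label)

# Toy stand-in for an ape tree: a list carrying the S3 class "phylo".
tr <- structure(list(tip.label = c("a", "b", "c")), class = "phylo")
obj <- new("nexmlTreeSketch", tr, nexml = "<nexml>...</nexml>")

n_tips(obj)             # S3 dispatch reaches n_tips.phylo without coercion
is(obj, "phylo")        # the S4-aware class check succeeds
class(obj) == "phylo"   # a string comparison does not
obj@nexml               # the extra data travels with the tree
```

The last comparison is the caveat mentioned above: code that tests the class string directly sees only the name of the extended class.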

See R/extend_phylo.R for the definition.

cboettig · 2013-10-15T18:17:56Z

Looking for feedback on this approach.

It appears that phylobase didn't choose to extend the phylo class in a way that phylo4 objects could be simply passed to existing functions designed for the S3 phylo objects. This is possible, as I have now implemented with the tentatively named nexmlTree class, and describe here: http://carlboettiger.info/2013/10/07/nexml-phylo-class-extension.html

On one hand, it seems to make sense that we want an object that both has the metadata attached to it, with methods that can operate to extract, display, and potentially compute on that metadata, but still works as a tree object in all existing functions.

On the other hand, this makes a larger object, since it has all this metadata attached (possibly not a problem?). Having users work with this object directly in their workflow, instead of converting to a vanilla phylo object and using that, can also introduce more potential trouble (for instance, as I describe in my linked notes, methods that check class with string matching instead of the built-in method will throw an error).

Seems it is an important design choice whether we build methods around the extended class or have separate methods for working on RNeXML S4 object metadata and just convert that to an ape::phylo for tree methods? @schamberlain @hlapp @rvosa thoughts?

sckott · 2013-10-15T20:08:16Z

whether we build methods around the extended class or have separate methods for working on RNeXML S4 object metadata and just convert that to an ape::phylo for tree methods

Do you have a feeling for which is better?

hlapp · 2013-10-16T00:00:48Z

Not clear to me what the concrete consequences for users would be. Can you explicate?

cboettig · 2013-10-16T00:10:09Z

With separate objects (Option 1), users would have to decide to read in a NeXML file as nexml (and later convert it), or read it in directly as "phylo" and later read it in again to do anything with the metadata, e.g.:

tree <- nexml_read("file.xml", type="phylo") # object of class "phylo"
plot(tree)

or

nexml_tree <- nexml_read("file.xml", type="nexml") # object of class "nexml"
tree <- as(nexml_tree, "phylo")
plot(tree)

while to perform metadata functions they have to operate on the nexml object instead:

summary(nexml_tree) 
citation(nexml_tree)
license(nexml_tree)

(those methods not yet written btw).

In Option 2, with a combined interface, the user would use the same object for all purposes:

tree <- nexml_read("file.xml")  # object of class "nexmlTree"
plot(tree)
metadata(tree)
summary(tree)
license(tree)

etc. Clearly the interface is cleaner in the latter case. The cost is larger object memory size and a chance that poorly written phylogenetics functions (at least ones that check class using strings) fail.

cboettig · 2013-10-16T00:13:49Z

(Um, note that plot(tree) is the ape method plot.phylo; I'm just using it to illustrate any existing method. It could be a richer function like bd.ext, or any function from geiger, OUwie, phytools etc. Meanwhile the other 'metadata' functions would be the unique functions provided in RNeXML to handle the metadata. I'm not sure quite what or how many such functions we'll have, but see ideas in #20)

cboettig · 2013-10-16T21:13:21Z

Okay, I think we can just support both and let the user decide. The metadata methods (now implemented, see #20 (comment) and commit 94996e6) are written for the "nexml" class and inherited by the "nexmlTree" class. By default, I support the second method; e.g. tree <- nexml_read("file.xml") will read in an object of class "nexmlTree" that acts like a phylo object but has all the metadata attached, with associated methods. Users who would prefer a pure phylo object can coerce this or read it in as such, as shown above.
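
How a method written for one class becomes available to a class that extends it can be sketched as follows. The class names, the `license_of` generic, and the slots are hypothetical stand-ins, and the use of `contains` to link the two classes is an assumption; the actual RNeXML definitions are not shown in this thread.

```r
library(methods)

# Hypothetical stand-in for the "nexml" class: holds one piece of metadata.
setClass("nexmlSketch", slots = c(license = "character"))

# Hypothetical stand-in for "nexmlTree": extends the class above, adds tree data.
setClass("nexmlTreeSketch", contains = "nexmlSketch",
         slots = c(tip.label = "character"))

# A metadata accessor defined once, for the parent class only.
setGeneric("license_of", function(x) standardGeneric("license_of"))
setMethod("license_of", "nexmlSketch", function(x) x@license)

obj <- new("nexmlTreeSketch", license = "CC0", tip.label = c("a", "b", "c"))

license_of(obj)                                   # found through inheritance
existsMethod("license_of", "nexmlTreeSketch")     # no method defined directly
```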

Not sure if users will have any use for the raw nexml class, since the nexmlTree class has the added benefit of working in phylo methods. Still, it is available as an object for any user or developer just needing an R S4 representation of a nexml document.

I think this resolves this question. Re-open with outstanding issues, or feel free to add further questions or comments.

cboettig added a commit that referenced this issue Oct 7, 2013

extend_phylo.R provides the nexmlTree class, #19

51998e6

cboettig closed this as completed Oct 16, 2013
