-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attaching nexml metadata to phylo objects #19
Comments
@cboettig That seems reasonable at small scale, but with huge xml files then we would be putting a lot of data into the users workspace without them realizing it. |
In general it would be useful to read in a nexml object that could be passed directly to functions based on ape trees without requiring coercion and dropping of metadata. I never understood why phylobase didn't do this -- but it appears that phylo4 objects do not inherit the phylo S3 class and cannot be passed to phylo functions without explicit coercion: library(phylobase)
library(ape)
data(bird.orders)
bird.orders4 <- as(bird.orders, "phylo4") # make ape::phylo tree into phylobase::phylo4 S4 class
plot.phylo(bird.orders4) # attempting to use the S4 fails Of course a S <- c(10, 47, 69, 214, 161, 17, 355, 51, 56, 10, 39, 152,
6, 143, 358, 103, 319, 23, 291, 313, 196, 1027, 5712)
bd.ext(bird.orders4, S) # Fails again. Works with the S3 type Anyway, it appears this problem can be solved using I can then build a new class, See R/extend_phylo.R for the defitition. |
Looking for feedback on this approach. It appears that phylobase didn't choose to extend the On one hand, it seems to make sense that we want an object that both has the metadata attached to it, with methods that can operate to extract, display, and potentially compute on that metadata, but still works as a tree object in all existing functions. On the other hand, this makes a larger object, since it has all this metadata attached (possibly not a problem?). It can also introduce more potential trouble to have users using this object directly in their workflow, instead of converting to a vanilla Seems it is an important design choice whether we build methods around the extended class or have separate methods for working on RNeXML S4 object metadata and just convert that to an |
Do you have a feeling for which is better? |
Not clear to me what the concrete consequences for users would be. Can you explicate? |
With separate objects, users would have to decide to read in a NeXML file as nexml (and later convert it), or read it in directly as "phylo" and later read it in again to do anything with the metadata. e.g.: tree <- nexml_read("file.xml", type="phylo") # object of class "phylo"
plot(tree) or nexml_tree <- nexml_read("file.xml", type="nexml") # object of class "nexml"
tree <- as(nexml_tree, "phylo")
plot(tree) while to perform metadata functions they have to operate on the summary(nexml_tree)
citation(nexml_tree)
license(nexml_tree) (those methods not yet written btw). In Option 2, with a combined interface, the user would use the same object for all purposes: tree <- nexml_read("file.xml") # object of class "nexmlTree"
plot(tree)
metadata(tree)
summary(tree)
license(tree) etc. Clearly the interface is cleaner in the later context. The cost is larger object memory size and a chance that poorly written phylogenetics functions (at least ones that check class using strings) fail. |
(Um, note that |
Okay, I think we can just support both and let the user decide. The metadata methods (now implemented, see #20 (comment) and commit 94996e6 ) are written for the "nexml" class and inherited by the "nexmlTree" class. By default, I support the second method; e.g. Not sure if users will have any use for the raw I think this resolves this question. Re-open with outstanding issues, or feel free to add further questions or comments. |
Crazy idea: when reading into a phylo, should we create a new RNeXML environment, store the full nexml tree in there, and add a new slot to the phylo object storing an unevaluated
get("<unique_tree_id>", envir=RNeXML)
that methods could use to access the full NeXML??This would let us do something like:
instead of
That is, reading in as the default (ape) type, and still calling functions that need the full nexml metadata, while also having an ape tree object that can still be passed around to the usual R packages.
Or maybe that's stupid and asking for trouble, and we should be explicit about what type of object we want.
The text was updated successfully, but these errors were encountered: