-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Testing #15
Comments
No worries on the bugs. I have many to take care of myself. Right, makes sense on checking if trees are correct. Okay, will get to this later today hopefully. |
rock on! These look great. Are they all passing for you? It appears that The first inheritance test still fails, somehow I'm not writing |
Woops, thanks for fixing that. Right, that first inheritance test fails for me too. They aren't quite identical as you said. |
@schamberlain Thanks for writing the extra test cases. Would you be interested in running some testing of trying to parse a bunch of nexml files (e.g. https://github.com/rvosa/supertreebase/tree/master/data/treebase ) and seeing if we get any errors? Similarly would be good to try and write a bunch of ape files into NeXML, if you have some lying around (or can grab from the nexus using the treebase api...) Such tests probably would never become part of our standard unit tests, but it would be good to have run a few more different files through the pipes just to see. The unit tests all reuse the same handful of trees over and over... |
@cboettig Yep, will do both of those. |
working on this now... |
Hey @cboettig , the new tests I wrote in the The Please do change the setup, or let me know what you want changed. Also note that tests still fail on the serializing and top level API tests. |
Okay, tested parsing and conversion on all 3586 NeXML files provided in the supertreebase repo. Results: conversion failed: read failed: success
65 2 3519 Not too bad, as we are parsing 98% successfully, which I believe is higher than the success rate in parsing the Need to understand what happened in the failing cases. (The 3586 count excludes the nexml files that won't parse > read_failed
[1] "S12760.xml" "S13111.xml"
More problematic are the conversions to ape, where > conversion_failed
[1] "S10327.xml" "S10334.xml" "S10366.xml" "S10436.xml" "S10464.xml"
[6] "S104.xml" "S10548.xml" "S10562.xml" "S10589.xml" "S10644.xml"
[11] "S10648.xml" "S10679.xml" "S10689.xml" "S10772.xml" "S10810.xml"
[16] "S10968.xml" "S10995.xml" "S11021.xml" "S11155.xml" "S11178.xml"
[21] "S11267.xml" "S11459.xml" "S11618.xml" "S11647.xml" "S11786.xml"
[26] "S11911.xml" "S12022.xml" "S12093.xml" "S12101.xml" "S12147.xml"
[31] "S12173.xml" "S12233.xml" "S12300.xml" "S12336.xml" "S12410.xml"
[36] "S12523.xml" "S12536.xml" "S12630.xml" "S12773.xml" "S12786.xml"
[41] "S12880.xml" "S13140.xml" "S13461.xml" "S13664.xml" "S13805.xml"
[46] "S2087.xml" "S235.xml" "S236.xml" "S237.xml" "S239.xml"
[51] "S240.xml" "S241.xml" "S242.xml" "S243.xml" "S244.xml"
[56] "S245.xml" "S246.xml" "S247.xml" "S248.xml" "S249.xml"
[61] "S250.xml" "S251.xml" "S252.xml" "S382.xml" "S696.xml"
While 3586 is a decent sample size, all these nexml files were created by the same serializer, so they aren't quite independent samples. @hlapp @rvosa Is there any other large collection of nexml files, or other popular serializer of nexml that we should be including as a test case? Further, this test is not particularly deep, as it checks only that conversions run without error. Should obviously include more validation. @schamberlain updates to the test script are welcome -- test isn't run by |
On Oct 24, 2013, at 9:31 PM, Scott Chamberlain wrote:
|
Okay, maybe we can create some ourselves @cboettig ? |
@schamberlain yeah. Not crucial perhaps but it would be good to get some sample NeXML files from each of the major tools that write NeXML (e.g. python, perl, ruby libs, though maybe @rvosa has a couple such nexml files lying around already). Actually I think there were a few from various sources in the Kseniia's inst/examples directory that I removed, https://github.com/shumelchyk/RNeXML/tree/0b6788e6b3f554d0f0598ca6a6499b3dcdb67859/tests/examples may as well check those. Given that NeXML can be validated against its schema it probably doesn't matter so much what tool makes the NeXML. We know/check that we're writing valid nexml from R, and can assume/check that the input nexml files are valid, so the only weak link is in converting the nexml to ape, etc, as I discovered in the bug in #29. |
Fixing #29 reduced us from 65 conversion failures to 32.
|
The NeXML files generated by Phenoscape are in the https://github.com/phenoscape/phenoscape-data repo. |
Specifically, look under "Curation Files". The only caveat I can think of is that some values for "id" attributes start with an integer, which is not valid for XML. Something I didn't realize earlier and need to go through and fix. |
Hi Jim! Thanks for the feedback, nice to see this tagging of user names Dr. Rutger A. Vos |
@rvosa @balhoff Thanks! At this time we haven't implemented the character matrix side of NeXML (see #12), mostly because I'm unfamiliar with the R packages typically used in this context (I believe bioconductor has quite a few, but I'm mostly familiar only with the cran phylogenetics tools which largely work only with the phylogeny data... Beyond just extracting the character matrix as an R matrix, not sure what the sensible thing to do is for the associated phenoscape metadata. Any example use case / work flows would be appreciated. For the moment, we can parse the phenoscape file but RNeXML will drop the |
We now have |
@schamberlain Hey Scott, sorry about the bugs this morning. I think all the tests should be working now. I think this counts as our first real milestone!
Like you say, they could use a lot of
expect_that
tests still. It's not necessary to go too crazy with the expect_that tests though, thanks to the validator (xmlSchemaValidate). This is the really nice thing about the schema-based data structure, it comes with this really rigorous check that we've done things right.Checking the schema is valid doesn't check that we have accurately read in a tree, of course. Rather than the more "unit" unit tests of checking each line, we'd probably get some really solid tests just by going back and forth between formats and then seeing if the objects were identical or not; starting both with a bunch of ape trees and with a bunch of NeXML trees (which you can grab from the treebase github repo: https://github.com/rvosa/supertreebase/tree/master/data/treebase )
Of course RNeXML doesn't completely implement all of NeXML (character matrices, networks, and other things still missing), so these are silently dropped when being read in. There are also a few things the schema validation does not check, like consistent use of ids (see #8 (comment) )
Post bugs as you hit them, should help us flush out special cases. A few I can anticipate already (e.g. #9 ) which one of us should get around too..
The text was updated successfully, but these errors were encountered: