Testing #15

cboettig · 2013-09-03T18:35:25Z

@schamberlain Hey Scott, sorry about the bugs this morning. I think all the tests should be working now. I think this counts as our first real milestone!

Like you say, they could still use a lot more expect_that tests. It's not necessary to go too crazy with those, though, thanks to the validator (xmlSchemaValidate). This is the really nice thing about the schema-based data structure: it comes with a rigorous check that we've done things right.

Of course, checking that a file validates against the schema doesn't check that we have accurately read in a tree. Rather than the more "unit" unit tests that check each line, we'd probably get some really solid tests just by going back and forth between formats and seeing whether the objects come out identical, starting both from a bunch of ape trees and from a bunch of NeXML trees (which you can grab from the supertreebase GitHub repo: https://github.com/rvosa/supertreebase/tree/master/data/treebase ).
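
A minimal sketch of such a round-trip check, assuming the current RNeXML function names (`nexml_write`, `nexml_read`, `get_trees`), which may differ from what the package exported at the time of this thread. The random tree and the temp file are illustrative stand-ins, not part of the actual test suite:

```r
library(ape)
library(RNeXML)

set.seed(1)
phy <- rtree(10)                      # stand-in for a real ape tree

f <- tempfile(fileext = ".xml")
nexml_write(phy, file = f)            # phylo -> NeXML file
phy_back <- get_trees(nexml_read(f))  # NeXML file -> phylo

# Compare the original and the round-tripped tree.
# all.equal() dispatches to ape's method for phylo objects.
all.equal(phy, phy_back)
```

The same comparison could be run in the other direction, starting from a NeXML file and writing it back out.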

Of course, RNeXML doesn't yet implement all of NeXML (character matrices, networks, and other things are still missing), so these are silently dropped when a file is read in. There are also a few things the schema validation does not check, like consistent use of ids (see #8 (comment)).

Post bugs as you hit them; that should help us flush out special cases. I can already anticipate a few (e.g. #9) which one of us should get around to.

sckott · 2013-09-03T20:43:38Z

No worries on the bugs. I have many to take care of myself.

Right, makes sense on checking if trees are correct.

Okay, will get to this later today hopefully.

sckott · 2013-09-05T15:30:14Z

Forgot to mention this issue; three commits are relevant here: ada8def, c949693, and ced59fb.

cboettig · 2013-09-05T16:48:42Z

rock on! These look great.

Are they all passing for you? It appears that test_that tests cannot use variables defined inside other tests, so I moved the shared initializations out above the test_that calls in test_inheritance.R, and now most of those pass for me (using test_dir("inst/tests/")). (Just pushed these.)
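
The scoping behaviour can be illustrated with a small sketch. The variable names here are hypothetical and are not the ones used in test_inheritance.R:

```r
library(testthat)

# Defined at file level, so every test_that block below can see it.
shared_labels <- c("A", "B", "C")

test_that("a variable defined inside a test stays local to it", {
  local_only <- 42
  expect_equal(local_only, 42)
})

test_that("other tests see the shared fixture but not the local variable", {
  expect_false(exists("local_only"))
  expect_equal(length(shared_labels), 3)
})
```

Each test_that block evaluates its code in its own environment, which is why shared setup has to live outside the blocks.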

The first inheritance test still fails; somehow I'm not writing xsi:type out to the meta node correctly, so that's something I have to fix.

sckott · 2013-09-05T16:55:20Z

Woops, thanks for fixing that.

Right, that first inheritance test fails for me too. They aren't quite identical as you said.

cboettig · 2013-09-08T18:48:48Z

@schamberlain Thanks for writing the extra test cases. Would you be interested in running some tests that try to parse a bunch of NeXML files (e.g. https://github.com/rvosa/supertreebase/tree/master/data/treebase ) and seeing if we get any errors?

Similarly, it would be good to try writing a bunch of ape trees out as NeXML, if you have some lying around (or can grab some NEXUS files using the treebase API...)

Such tests probably would never become part of our standard unit tests, but it would be good to have run a few more different files through the pipes just to see. The unit tests all reuse the same handful of trees over and over...

sckott · 2013-09-08T18:51:32Z

@cboettig Yep, will do both of those.

sckott · 2013-09-09T18:14:45Z

working on this now...

sckott · 2013-09-10T18:52:46Z

Hey @cboettig , the new tests I wrote in the inst/tests/conversions file run from a separate set of files here https://github.com/ropensci/RNeXML_testfiles

The conversions file does not run when you do test_package("RNeXML"), so that's good.

Please do change the setup, or let me know what you want changed.

Also note that tests still fail on the serializing and top level API tests.

cboettig · 2013-10-24T19:05:43Z

Okay, tested parsing and conversion on all 3586 NeXML files provided in the supertreebase repo.

Results:

conversion failed:       read failed:            success 
                65                  2               3519

Not too bad, as we are parsing 98% successfully, which I believe is higher than the success rate in parsing the nexus files provided by treebase into R (should recompute those numbers). Still, I think there is no good reason why we should be failing on any of these files, so this needs investigation.

Need to understand what happened in the failing cases. (The 3586 count excludes the NeXML files that xmlParse cannot parse, mostly because they are empty files.) The 2 read failures therefore parse as XML but still cause read.nexml to fail:

> read_failed
[1] "S12760.xml" "S13111.xml"

Need to investigate what is going wrong here (and then add this situation as a unit test case).

More problematic are the conversions to ape, where

> conversion_failed
 [1] "S10327.xml" "S10334.xml" "S10366.xml" "S10436.xml" "S10464.xml"
 [6] "S104.xml"   "S10548.xml" "S10562.xml" "S10589.xml" "S10644.xml"
[11] "S10648.xml" "S10679.xml" "S10689.xml" "S10772.xml" "S10810.xml"
[16] "S10968.xml" "S10995.xml" "S11021.xml" "S11155.xml" "S11178.xml"
[21] "S11267.xml" "S11459.xml" "S11618.xml" "S11647.xml" "S11786.xml"
[26] "S11911.xml" "S12022.xml" "S12093.xml" "S12101.xml" "S12147.xml"
[31] "S12173.xml" "S12233.xml" "S12300.xml" "S12336.xml" "S12410.xml"
[36] "S12523.xml" "S12536.xml" "S12630.xml" "S12773.xml" "S12786.xml"
[41] "S12880.xml" "S13140.xml" "S13461.xml" "S13664.xml" "S13805.xml"
[46] "S2087.xml"  "S235.xml"   "S236.xml"   "S237.xml"   "S239.xml"  
[51] "S240.xml"   "S241.xml"   "S242.xml"   "S243.xml"   "S244.xml"  
[56] "S245.xml"   "S246.xml"   "S247.xml"   "S248.xml"   "S249.xml"  
[61] "S250.xml"   "S251.xml"   "S252.xml"   "S382.xml"   "S696.xml"

Need to figure out what's going on here

While 3586 is a decent sample size, all these nexml files were created by the same serializer, so they aren't quite independent samples. @hlapp @rvosa Is there any other large collection of nexml files, or other popular serializer of nexml that we should be including as a test case?

Further, this test is not particularly deep, as it checks only that conversions run without error. Should obviously include more validation. @schamberlain updates to the test script are welcome -- test isn't run by test_that at the moment since it needs the whole 2.5 Gigs of supertreebase data that obviously aren't in the R package, see comments at top of file.
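
A shallow "does it run without error" check of this kind can be sketched with nested `tryCatch` calls. This is not the actual test script; `classify_file`, `toy_read`, and `toy_convert` are hypothetical stand-ins for the real reader and the conversion to ape:

```r
# Classify one file as "success", "read failed", or "conversion failed".
classify_file <- function(path, read_fn, convert_fn) {
  tryCatch({
    nex <- read_fn(path)
    # An error in the conversion step is caught here, not by the outer handler.
    tryCatch({
      convert_fn(nex)
      "success"
    }, error = function(e) "conversion failed")
  }, error = function(e) "read failed")
}

# Toy stand-ins that fail on purpose for certain file names.
toy_read <- function(path) {
  if (grepl("bad_read", path)) stop("cannot read")
  path
}
toy_convert <- function(nex) {
  if (grepl("bad_tree", nex)) stop("cannot convert")
  nex
}

files <- c("ok1.xml", "bad_read.xml", "bad_tree.xml", "ok2.xml")
outcome <- vapply(files, classify_file, character(1),
                  read_fn = toy_read, convert_fn = toy_convert)
table(outcome)
```

A deeper version would replace the bare `"success"` with a comparison of the converted object against an expected value.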

sckott · 2013-10-25T01:31:49Z

Good news on mostly successful parsing! Makes sense to do the test script the way you did it.

I'll poke around and see if I can find another set of nexml files, though @hlapp and @rvosa probably have some ideas.

hlapp · 2013-10-25T02:03:48Z

On Oct 24, 2013, at 9:31 PM, Scott Chamberlain wrote:

I'll poke around and see if I can find another set of nexml files, though @hlapp and @rvosa probably have some ideas

I'm not aware of any other sources.

sckott · 2013-10-25T02:07:18Z

Okay, maybe we can create some ourselves @cboettig ?

cboettig · 2013-10-25T04:54:04Z

@schamberlain yeah. Not crucial perhaps, but it would be good to get some sample NeXML files from each of the major tools that write NeXML (e.g. the python, perl, and ruby libs, though maybe @rvosa has a couple of such nexml files lying around already). Actually, I think there were a few from various sources in Kseniia's inst/examples directory that I removed ( https://github.com/shumelchyk/RNeXML/tree/0b6788e6b3f554d0f0598ca6a6499b3dcdb67859/tests/examples ); may as well check those.

Given that NeXML can be validated against its schema it probably doesn't matter so much what tool makes the NeXML. We know/check that we're writing valid nexml from R, and can assume/check that the input nexml files are valid, so the only weak link is in converting the nexml to ape, etc, as I discovered in the bug in #29.

cboettig · 2013-10-25T22:48:23Z

Fixing #29 reduced us from 65 conversion failures to 32.

S10366.xml contains a single <tree> element which has only meta element children - no nodes or edges. Guess we should throw an explicit error in this case, and suggest converting to / reading in as nexml instead of nexmlTree or phylo to access metadata?
S10436.xml contains several <tree> elements, one of which has only meta elements. In this case, we should be able to skip this tree in conversion to phylo and parse the remaining trees into a multiPhylo list.
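
One way to detect such trees before conversion is sketched below, using the XML package on a toy document. This is not the fix that went into RNeXML; the document content, the variable names, and the error message are illustrative assumptions:

```r
library(XML)

# Toy NeXML-like document: tree2 has only a meta child (not a real TreeBASE file).
txt <- '<nexml xmlns="http://www.nexml.org/2009">
  <trees id="ts1">
    <tree id="tree1">
      <node id="n1"/><node id="n2"/>
      <edge id="e1" source="n1" target="n2"/>
    </tree>
    <tree id="tree2"><meta id="m1"/></tree>
  </trees>
</nexml>'

doc <- xmlParse(txt, asText = TRUE)

# local-name() sidesteps the default namespace in the XPath queries.
all_trees   <- getNodeSet(doc, "//*[local-name()='tree']")
empty_trees <- getNodeSet(doc, "//*[local-name()='tree'][not(*[local-name()='node'])]")

all_ids   <- sapply(all_trees, xmlGetAttr, "id")
empty_ids <- sapply(empty_trees, xmlGetAttr, "id")

# Trees worth passing on to the phylo conversion; the rest would be skipped.
convertible <- setdiff(all_ids, empty_ids)
if (length(convertible) == 0) {
  stop("no <tree> has nodes or edges; read as nexml to access the metadata")
}
```

With a single metadata-only tree the sketch stops with an explicit error; with several trees it keeps only the ones that have nodes.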

rvosa · 2013-11-12T20:29:09Z

Sorry to be late to the party, but aren't there files created by phenex that could be used for testing? Maybe @hlapp or @balhoff know?

hlapp · 2013-11-13T00:08:54Z

The NeXML files generated by Phenoscape are in the https://github.com/phenoscape/phenoscape-data repo.

balhoff · 2013-11-13T02:56:15Z

Specifically, look under "Curation Files". The only caveat I can think of is that some values for "id" attributes start with an integer, which is not valid for XML. Something I didn't realize earlier and need to go through and fix.
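
A rough way to flag such ids is a regular-expression check. The sketch below is an ASCII-only approximation of the XML name rules (the full rules also allow many non-ASCII characters), and `is_valid_xml_id` is a hypothetical helper, not part of any package:

```r
# TRUE when an id starts with a letter or underscore and continues with
# letters, digits, dots, underscores, or hyphens.
is_valid_xml_id <- function(id) {
  grepl("^[A-Za-z_][A-Za-z0-9._-]*$", id)
}

ids <- c("t1", "1t", "_n", "a b")
data.frame(id = ids, valid = is_valid_xml_id(ids))
```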

rvosa · 2013-11-13T09:24:37Z

Hi Jim! Thanks for the feedback, nice to see this tagging of user names works so well. GitHub is good. Hope you're well, btw!

cboettig · 2013-11-14T15:35:31Z

@rvosa @balhoff Thanks! At this time we haven't implemented the character matrix side of NeXML (see #12), mostly because I'm unfamiliar with the R packages typically used in this context (I believe Bioconductor has quite a few, but I'm mostly familiar only with the CRAN phylogenetics tools, which largely work only with the phylogeny data).

Beyond just extracting the character matrix as an R matrix, I'm not sure what the sensible thing to do is with the associated phenoscape metadata. Any example use cases or workflows would be appreciated.

For the moment, we can parse the phenoscape file but RNeXML will drop the <characters> node, since we don't even have an S4 version of that part of the schema implemented. Of course we can just keep it in R's XML representation, which is in some ways more useful since we can at least then navigate the tree with xpath expressions.

cboettig · 2014-03-25T21:59:20Z

We now have <characters> implemented and a full test suite.

ghost assigned sckott Sep 3, 2013

cboettig mentioned this issue Sep 3, 2013

Tackling "special cases" #13

Closed

cboettig mentioned this issue Sep 8, 2013

Reach out to R and NeXML lists #22

Closed

sckott added a commit that referenced this issue Sep 9, 2013

added conversions file for large scale testing of file reading,writin…

acf7422

…g issue #15

cboettig mentioned this issue Oct 25, 2013

Non-XML files TreeBASE/supertreebase#2

Closed

cboettig mentioned this issue Oct 25, 2013

ape conversion should handle case of some missing branch lengths #29

Closed

cboettig mentioned this issue Oct 25, 2013

Don't error if we cannot convert only some <tree> elements in a list of <trees> to a multiPhylo #30

Closed

rvosa closed this as completed Nov 12, 2013

rvosa reopened this Nov 12, 2013

cboettig closed this as completed Mar 25, 2014

cboettig added this to the CRAN Release milestone Mar 25, 2014
