Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Testing #15

Closed
cboettig opened this issue Sep 3, 2013 · 20 comments
Closed

Testing #15

cboettig opened this issue Sep 3, 2013 · 20 comments
Assignees
Milestone

Comments

@cboettig
Copy link
Member

cboettig commented Sep 3, 2013

@schamberlain Hey Scott, sorry about the bugs this morning. I think all the tests should be working now. I think this counts as our first real milestone!

Like you say, they could use a lot of expect_that tests still. It's not necessary to go too crazy with the expect_that tests though, thanks to the validator (xmlSchemaValidate). This is the really nice thing about the schema-based data structure, it comes with this really rigorous check that we've done things right.

Checking the schema is valid doesn't check that we have accurately read in a tree, of course. Rather than the more "unit" unit tests of checking each line, we'd probably get some really solid tests just by going back and forth between formats and then seeing if the objects were identical or not; starting both with a bunch of ape trees and with a bunch of NeXML trees (which you can grab from the treebase github repo: https://github.com/rvosa/supertreebase/tree/master/data/treebase )

Of course RNeXML doesn't completely implement all of NeXML (character matrices, networks, and other things still missing), so these are silently dropped when being read in. There are also a few things the schema validation does not check, like consistent use of ids (see #8 (comment) )

Post bugs as you hit them, should help us flush out special cases. A few I can anticipate already (e.g. #9 ) which one of us should get around too..

@sckott
Copy link
Contributor

sckott commented Sep 3, 2013

No worries on the bugs. I have many to take care of myself.

Right, makes sense on checking if trees are correct.

Okay, will get to this later today hopefully.

@sckott
Copy link
Contributor

sckott commented Sep 5, 2013

Forget to mention this issue, three commits are relavant here ada8def , c949693 , and ced59fb

@cboettig
Copy link
Member Author

cboettig commented Sep 5, 2013

rock on! These look great.

Are they all passing for you? It appears that test_that tests cannot use variables defined inside other tests, so I moved the shared initializations out above the test_that calls in test_inheritance.R, and now most of those pass for me (using test_dir("inst/tests/"). (Just pushed these).

The first inheritance test still fails, somehow I'm not writing xsi:type out to the meta node correctly, so that's something I have to fix.

@sckott
Copy link
Contributor

sckott commented Sep 5, 2013

Woops, thanks for fixing that.

Right, that first inheritance test fails for me too. They aren't quite identical as you said.

@cboettig
Copy link
Member Author

cboettig commented Sep 8, 2013

@schamberlain Thanks for writing the extra test cases. Would you be interested in running some testing of trying to parse a bunch of nexml files (e.g. https://github.com/rvosa/supertreebase/tree/master/data/treebase ) and seeing if we get any errors?

Similarly would be good to try and write a bunch of ape files into NeXML, if you have some lying around (or can grab from the nexus using the treebase api...)

Such tests probably would never become part of our standard unit tests, but it would be good to have run a few more different files through the pipes just to see. The unit tests all reuse the same handful of trees over and over...

@sckott
Copy link
Contributor

sckott commented Sep 8, 2013

@cboettig Yep, will do both of those.

@sckott
Copy link
Contributor

sckott commented Sep 9, 2013

working on this now...

@sckott
Copy link
Contributor

sckott commented Sep 10, 2013

Hey @cboettig , the new tests I wrote in the inst/tests/conversions file run from a separate set of files here https://github.com/ropensci/RNeXML_testfiles

The conversions file does not run when you do test_package("RNeXML"), so that's good.

Please do change the setup, or let me know what you want changed.

Also note that tests still fail on the serializing and top level API tests.

@cboettig
Copy link
Member Author

Okay, tested parsing and conversion on all 3586 NeXML files provided in the supertreebase repo.

Results:

conversion failed:       read failed:            success 
                65                  2               3519 

Not too bad, as we are parsing 98% successfully, which I believe is higher than the success rate in parsing the nexus files provided by treebase into R (should recompute those numbers). Still, I think there is no good reason why we should be failing on any of these files, so this needs investigation.

Need to understand what happened in the failing cases. (The 3586 count excludes the nexml files that won't parse xmlParse, mostly because they are empty files). The 2 read fails therefore parse, but still trigger read.nexml to fail:

> read_failed
[1] "S12760.xml" "S13111.xml"
  • Need to investigate what is going wrong here (and then add this situation as a unit test case).

More problematic are the conversions to ape, where

> conversion_failed
 [1] "S10327.xml" "S10334.xml" "S10366.xml" "S10436.xml" "S10464.xml"
 [6] "S104.xml"   "S10548.xml" "S10562.xml" "S10589.xml" "S10644.xml"
[11] "S10648.xml" "S10679.xml" "S10689.xml" "S10772.xml" "S10810.xml"
[16] "S10968.xml" "S10995.xml" "S11021.xml" "S11155.xml" "S11178.xml"
[21] "S11267.xml" "S11459.xml" "S11618.xml" "S11647.xml" "S11786.xml"
[26] "S11911.xml" "S12022.xml" "S12093.xml" "S12101.xml" "S12147.xml"
[31] "S12173.xml" "S12233.xml" "S12300.xml" "S12336.xml" "S12410.xml"
[36] "S12523.xml" "S12536.xml" "S12630.xml" "S12773.xml" "S12786.xml"
[41] "S12880.xml" "S13140.xml" "S13461.xml" "S13664.xml" "S13805.xml"
[46] "S2087.xml"  "S235.xml"   "S236.xml"   "S237.xml"   "S239.xml"  
[51] "S240.xml"   "S241.xml"   "S242.xml"   "S243.xml"   "S244.xml"  
[56] "S245.xml"   "S246.xml"   "S247.xml"   "S248.xml"   "S249.xml"  
[61] "S250.xml"   "S251.xml"   "S252.xml"   "S382.xml"   "S696.xml" 
  • Need to figure out what's going on here

While 3586 is a decent sample size, all these nexml files were created by the same serializer, so they aren't quite independent samples. @hlapp @rvosa Is there any other large collection of nexml files, or other popular serializer of nexml that we should be including as a test case?

Further, this test is not particularly deep, as it checks only that conversions run without error. Should obviously include more validation. @schamberlain updates to the test script are welcome -- test isn't run by test_that at the moment since it needs the whole 2.5 Gigs of supertreebase data that obviously aren't in the R package, see comments at top of file.

@sckott
Copy link
Contributor

sckott commented Oct 25, 2013

Good news on mostly successful parsing! Makes sense to do the test script the way you did it.

I'll poke around and see if I an find another set of nexml files, though @hlapp and @rvosa probably have some ideas

@hlapp
Copy link
Contributor

hlapp commented Oct 25, 2013

On Oct 24, 2013, at 9:31 PM, Scott Chamberlain wrote:

I'll poke around and see if I an find another set of nexml files, though @hlapp and @rvosaprobably have some ideas

I'm not aware of any other sources.

@sckott
Copy link
Contributor

sckott commented Oct 25, 2013

Okay, maybe we can create some ourselves @cboettig ?

@cboettig
Copy link
Member Author

@schamberlain yeah. Not crucial perhaps but it would be good to get some sample NeXML files from each of the major tools that write NeXML (e.g. python, perl, ruby libs, though maybe @rvosa has a couple such nexml files lying around already). Actually I think there were a few from various sources in the Kseniia's inst/examples directory that I removed, https://github.com/shumelchyk/RNeXML/tree/0b6788e6b3f554d0f0598ca6a6499b3dcdb67859/tests/examples may as well check those.

Given that NeXML can be validated against its schema it probably doesn't matter so much what tool makes the NeXML. We know/check that we're writing valid nexml from R, and can assume/check that the input nexml files are valid, so the only weak link is in converting the nexml to ape, etc, as I discovered in the bug in #29.

@cboettig
Copy link
Member Author

Fixing #29 reduced us from 65 conversion failures to 32.

  • S10366.xml contains a single <tree> element which has only meta element children - no nodes or edges. Guess we should throw an explicit error in this case, and suggest converting to / reading in as nexml instead of nexmlTree or phylo to access metadata?
  • S10436.xml contains several <tree> elements, one of which has only meta elements. In this case, we should be able to skip this tree in conversion to phylo and parse the remaining trees into a multiPhylo list.

@rvosa
Copy link
Contributor

rvosa commented Nov 12, 2013

Sorry to be late to the party, but aren't there files created by phenex that could be used for testing? Maybe @hlapp or @balhoff know?

@rvosa rvosa closed this as completed Nov 12, 2013
@rvosa rvosa reopened this Nov 12, 2013
@hlapp
Copy link
Contributor

hlapp commented Nov 13, 2013

The NeXML files generated by Phenoscape are in the https://github.com/phenoscape/phenoscape-data repo.

@balhoff
Copy link

balhoff commented Nov 13, 2013

Specifically, look under "Curation Files". The only caveat I can think of is that some values for "id" attributes start with an integer, which is not valid for XML. Something I didn't realize earlier and need to go through and fix.

@rvosa
Copy link
Contributor

rvosa commented Nov 13, 2013

Hi Jim! Thanks for the feedback, nice to see this tagging of user names
works so well. GitHub is good. Hope you're well, btw!

Dr. Rutger A. Vos
Bioinformaticist
Naturalis Biodiversity Center
Visiting address: Office A109, Einsteinweg 2, 2333 CC, Leiden, the
Netherlands
Mailing address: Postbus 9517, 2300 RA, Leiden, the Netherlands
http://rutgervos.blogspot.com

@cboettig
Copy link
Member Author

@rvosa @balhoff Thanks! At this time we haven't implemented the character matrix side of NeXML (see #12), mostly because I'm unfamiliar with the R packages typically used in this context (I believe bioconductor has quite a few, but I'm mostly familiar only with the cran phylogenetics tools which largely work only with the phylogeny data...

Beyond just extracting the character matrix as an R matrix, not sure what the sensible thing to do is for the associated phenoscape metadata. Any example use case / work flows would be appreciated.

For the moment, we can parse the phenoscape file but RNeXML will drop the <characters> node, since we don't even have an S4 version of that part of the schema implemented. Of course we can just keep it in R's XML representation, which is in some ways more useful since we can at least then navigate the tree with xpath expressions.

@cboettig
Copy link
Member Author

We now have <characters> implemented and a full test suite.

@cboettig cboettig added this to the CRAN Release milestone Mar 25, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants