SPARQL use case #73
A meaningful SPARQL example would be great. Your proposed use case does sound like a common one that many researchers could relate to. My only thought is that most R users would be more familiar with simply importing the tree and trait data, etc, and then extracting the union (the I was wondering if we might have an example that emphasizes the logical reasoning of SPARQL that doesn't have an immediate SQL-like analog. For instance, a query that makes use of some ontology in identifying which species listed in the target dataset are a member of the queried taxonomic class or something (e.g. see our earlier thread: #20 (comment) ). Maybe that would be involved in the use case you already described. Will give a thought to some good published data examples. |
Reasoning would be really great but might be hard to demonstrate - do you On Tue, Jul 1, 2014 at 10:06 PM, Carl Boettiger notifications@github.com
With commit a7c8ffd I have added some example data which I believe might be interesting to demonstrate (recursive?) SPARQL queries. The NeXML file The general idea is that we should be able to query for all the members of a higher taxon - so given the URI of the higher taxon, give me all the direct descendants that specify Unfortunately, there appear to be some bugs in how the RDF is extracted. In particular, the namespace prefixes are not extracted correctly in the file What we should be getting is:
But instead we are getting:
I gather that this RDF is obtained by posting the NeXML to a web service, so its output is out of our control. I would like to suggest an alternative that could build on commit e3845d6. In that commit I have added an XSL stylesheet that extracts RDF/XML from RDFa. The output it produces is valid, and we should be able to run it locally, probably with better performance. However, this means we would create a dependency on a library that can process XSL stylesheets, such as this one: http://www.omegahat.org/Sxslt/ |
With commit d61a0c5 I have added an example that shows how we can query the valid RDF/XML that the XSL stylesheet produces. The example shows how you can fetch the taxon whose taxonomic rank is "Order", and return the corresponding NCBI taxon URI. Subsequently, with that URI, the example shows how to fetch its children. A person that actually knows R (so, not me ;-)) would be able to take these examples to write a simple recursive traversal from the root to the tips. As the URIs of the subjects in this graph are constructed from the |
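For readers following along, the two queries described above might look roughly like the sketch below. It only builds the SPARQL strings; running them needs an RDF library and the extracted RDF/XML. The predicate and rank URIs (TDWG vocabulary), the use of `rdfs:subClassOf` for the parent/child link, and the example NCBI URI are all assumptions made for illustration, not taken from the example file, so check them against the actual graph.

```r
# Assumed vocabulary terms (hypothetical for this data set; verify against the RDF).
rank_predicate  <- "http://rs.tdwg.org/ontology/voc/TaxonConcept#rank"
taxon_predicate <- "http://rs.tdwg.org/ontology/voc/TaxonConcept#toTaxon"
order_rank      <- "http://rs.tdwg.org/ontology/voc/TaxonRank#Order"

# Query 1: find the node whose rank is "Order" and return its taxon URI.
order_query <- sprintf(
  "SELECT ?uri WHERE {\n  ?id <%s> <%s> .\n  ?id <%s> ?uri\n}",
  rank_predicate, order_rank, taxon_predicate
)

# Query 2: given a parent URI, return the taxon URIs of its direct children.
# Assumes children point at their parent via rdfs:subClassOf.
children_query <- function(parent_uri) {
  sprintf(
    paste0(
      "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
      "SELECT ?uri WHERE {\n  ?id rdfs:subClassOf <%s> .\n  ?id <%s> ?uri\n}"
    ),
    parent_uri, taxon_predicate
  )
}

# Hypothetical parent URI, for illustration only.
example_query <- children_query("http://example.org/taxon/9443")
cat(order_query, "\n\n", example_query, "\n", sep = "")
```

Keeping the queries as plain strings means the same templates can be reused at every level of a recursive traversal, whichever SPARQL engine ends up executing them.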
I played around with sparql.R a bit more. It is failing, but I hope someone will be able to get the recursion to work so it generates a newick string which we then plot. Bonus points if the newick string can have the taxon names from the original NeXML. |
Very cool!! Look forward to digging in to your example when I'm back. Carl Boettiger sent from mobile device; my apologies for any terseness or typos
As of 81da59b, the RDF/XML taxonomy is traversed by recursive SPARQL queries, whose results are serialized to a Newick string with unbranched interior nodes, no branch lengths, and (optionally) interior node labels. In other words: it's a classification tree, which can be plotted as a cladogram, as the example shows. I think this would be a pretty neat use case for the supplementary materials: it's a bit too long (72 lines) to put in the MS body. To clean this up I am going to need a little more help, still:
|
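A minimal sketch of the serialization idea, for anyone who wants to see the recursion without the SPARQL plumbing. This is not the code from 81da59b: the `children` lookup is a hypothetical stand-in for whatever the per-node SPARQL query returns, and the taxon names are made up for illustration.

```r
# Hypothetical parent -> children lookup standing in for the SPARQL results.
children <- list(
  Primates      = c("Strepsirrhini", "Haplorrhini"),
  Strepsirrhini = c("Lemur_catta"),
  Haplorrhini   = c("Homo_sapiens", "Pan_troglodytes")
)

# Return the direct children of a node, or an empty vector for a tip.
get_children <- function(node) {
  kids <- children[[node]]
  if (is.null(kids)) character(0) else kids
}

# Recursively serialize the subtree rooted at `node` as a Newick fragment:
# no branch lengths, interior node labels optional.
to_newick <- function(node, label_interior = TRUE) {
  kids <- get_children(node)
  if (length(kids) == 0) return(node)
  subtrees <- vapply(kids, to_newick, character(1),
                     label_interior = label_interior, USE.NAMES = FALSE)
  paste0("(", paste(subtrees, collapse = ","), ")",
         if (label_interior) node else "")
}

newick <- paste0(to_newick("Primates"), ";")
newick_unlabeled <- paste0(to_newick("Primates", label_interior = FALSE), ";")
print(newick)
```

Note that `Strepsirrhini` has a single child here, so the output contains an unbranched interior node, the same kind of structure as in the classification tree described above.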
This commit updates travis and documentation regarding the Sxslt dependency. Also updates the rdf unit test (by removing the XPath query and by checking document type). Addresses #73. Should probably update the sparql.R example to actually use this new get_rdf() command instead of starting with the already extracted RDF.
Very nice. I've just updated
Excellent! Sorry I don't know the conventions (yet), but it's fun to learn On Wed, Jul 9, 2014 at 9:14 PM, Carl Boettiger notifications@github.com
@rvosa I was just thinking about trying to make the figure generated by I followed the suggestion in your code about adding get_name(id) to the |
doesn't run because of error in newick parsing now
@rvosa For quick reference, here's the Newick file I get when trying to add the node labels; not sure why it fails to parse (either using phylobase::readNewick, which uses the nexus class library, or using phytools::read.newick): https://github.com/ropensci/RNeXML/blob/96add29b379748a6dae302c483e6bbaf25297a7e/inst/examples/sparql.newick |
The tree description is valid in principle (you can paste it into figtree, On Fri, Jul 18, 2014 at 12:03 AM, Carl Boettiger notifications@github.com
|
@fmichonneau Maybe you have some idea why I can't parse this Newick file successfully in R? E.g. with phylobase:
download.file("https://github.com/ropensci/RNeXML/raw/96add29b379748a6dae302c483e6bbaf25297a7e/inst/examples/sparql.newick", "sparql.newick", "wget")
readNewick("sparql.newick")
Gives me:
though it seems like a valid tree (e.g. can be read into figtree)... |
I think this is a bug in ape. (Unfortunately, phylobase still relies on ape to parse the tree string: phylobase uses NCL to extract information about the taxa, branch lengths, labels, etc., but relies on ape to convert the parentheses and commas into an R object.) Apparently, ape doesn't support edge labels on terminal edges. To have edge labels on terminal edges, taxa need to be in parentheses by themselves, like so
gives
But
gives
This works with the phytools parser:
but the string from the example doesn't work (R hangs). I reported the ape bug to Emmanuel. |
Okay, thanks for taking a look. Yeah, I'd given phytools a try too and I On Wed, Jul 30, 2014 at 8:58 AM, Francois Michonneau <
Carl Boettiger |
Okay, with Liam's bugfix (http://blog.phytools.org/2014/07/new-version-of-readnewick-that-can-read.html) we can read the tree in and plot just the internal node labels to avoid over-crowding the figure (see https://github.com/ropensci/RNeXML/blob/devel/manuscripts/supplement.Rmd#L330). I think we have a nice SPARQL use case now. We could possibly use a bit more text around this example, but I'll wait for others to weigh in. |
Now that it is so relatively painless to extract RDF and run SPARQL queries on it (incidentally: great job, supercool) I think it would be good to develop a more persuasive use case to demonstrate the power of this facility.
Here's an idea: let's say we have a tree, some trait data and some occurrences for a set of species. As usual, after all the data cleaning, we find that the species in the tree, in the trait data and the occurrences are only partially overlapping. It ought to be possible to extract the union of the taxa across these different data sources by way of a query.
What do you guys think - is that the coolest we can come up with (hopefully not?) and do we have some published data lying around that we could use to demonstrate this?
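For comparison, here is roughly what the "import everything and combine it" approach looks like in plain R, i.e. the baseline a SPARQL version of this use case would need to improve on. The taxon names and the three vectors are invented for illustration; in practice they would come from the tree's tip labels, the trait table and the occurrence records.

```r
# Hypothetical taxon lists from three partially overlapping sources.
sources <- list(
  tree        = c("Homo_sapiens", "Pan_troglodytes", "Gorilla_gorilla"),
  traits      = c("Homo_sapiens", "Gorilla_gorilla", "Pongo_abelii"),
  occurrences = c("Homo_sapiens", "Pan_troglodytes", "Pongo_abelii")
)

# Union: every taxon mentioned in at least one source.
all_taxa <- Reduce(union, sources)

# Intersection: taxa present in all three sources.
shared_taxa <- Reduce(intersect, sources)

# Presence/absence matrix: one row per taxon, one column per source.
coverage <- sapply(sources, function(x) all_taxa %in% x)
rownames(coverage) <- all_taxa
print(coverage)
```

The presence/absence matrix makes it easy to see which source is missing which taxon, which is usually the next question after computing the union.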