Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bryce review from rOpenSci onboarding #9

Closed
cboettig opened this issue Jan 31, 2018 · 1 comment
Closed

Bryce review from rOpenSci onboarding #9

cboettig opened this issue Jan 31, 2018 · 1 comment

Comments

@cboettig
Copy link
Member

cboettig commented Jan 31, 2018

In addition to the changes below, the reviewers have inspired me to add a few further bits of functionality, documented now in NEWS.md

I have left two checkboxes unchecked due to the following issues:

  • Community guidelines are missing in README.md
    devtools::use_code_of_conduct() now added.
  • README.md missing citation section recommended by packaging guide

Added (using citation("rdflib"), note that if accepted to JOSS I will also add a CITATION file which will update this record.

Fixed (release dates added, divided into sections (feature vs bug fix etc))

Higher level

  • How are you dealing with differentiating between resource, literal, and blank nodes in rdf_add? As far as I can see, it looks I can't create a new triple with a resource or blank node as the subject:
> rdf_add(x, "test", "test", "test")
> x
<test> <test> "test" .

rdf_add documentation actually covers blank nodes already but I've extended it significantly (as Anna also suggested) to highlight this use.

The redland methods actually automatically Duck Type nodes, (things that look like URIs are typed as such, non-URIs in subject are typed as Blank, etc). This can be specified explicitly instead using subjectType and objectType respectively (e.g. if you want a URL to be treated as a literal).

  • I'd like to see more usage of the RDF capabilities of rdflib in the vignette. At current it's centered around JSON-LD

So technically JSON-LD is just one way to serialize RDF; I chose that serialization because I felt it would be more familiar / intuitive to readers than to show all the RDF is, say, nquads or turtle.

However, I think Anna nailed the bigger issue here, which is the real lack of conceptual introduction to RDF. To me, the current vignette is basically more-or-less a SPARQL query tutorial, trying to make the point that JSON-LD is a kind of RDF and can be fun to query with SPARQL. To address the real lack of conceptual intro I'm adding a very different vignette focused on introducing RDF as a way of thinking about data for readers who come from a tabular / relational / tidyverse orientation. Hope this addresses the issue.

Lower level

These were written out as I went through the checklist:

  • Documentation could benefit from a general pass for capitalization just to make things look nicer
    Agree, done.
  • \link{}s in docs for rdf_serialize and rdf_query link to wrong version of the parse function
  • Noticed funky encoding issues from vita.json.
    "Fernández" => "Fern\u00E1ndez"
    Not sure if this is an rdflib problem or a redland one.

Seems to be a bug in Redland serializer/parser for nquads and ntriples (we have to go through the nquads when working with JSON-LD). Bug filed: ropensci/redland-bindings#62, but also added what I hope is a solid work-around by just re-parsing these strings into UTF-8. At least resolves the issues seen in the Vignette example

  • I think any mentions of "RDF+XML" outside of MIME types should be "RDF/XML", not "RDF+XML"
    done
  • I thought arg doc in rdf_serialize could be path instead as it's a little more clear.

Note as stated in the help file, doc could also be a literal text string instead of a path. This same ambiguity / flexibility is found in jsonld R package, which also uses the term doc so I have borrowed that.

  • I'm generally not a fan of arguments named x, though it feels warranted in some cases. Could a more descriptive name be used in these instances? For example, rdf_query has a first arg of x which could, instead, be rdf.

Very good point! I now call the argument rdf whenever it refers to an rdf object.

  • Perhaps a little more error checking...
    If I send a malformed query, I get useful but a bit cryptic response from rdf_query:
> coi <- vita %>% rdf_query(sparql)
librdf error  - syntax error, unexpected a
rdf_query_results.c:100: (librdf_query_results_finished) assertion failed: object pointer of type librdf_query_results is NULL.
rdf_query_results.c:100: (librdf_query_results_finished) assertion failed: object pointer of type librdf_query_results is NULL.
This is caused by failing to cath an error to the call to `redland::executeQuery`.

```r
queryResult <- redland::executeQuery(queryObj, x$model)
```
  • browseVignettes(package = "rdflib") doesn't return any vignettes. I'm not that familiar with this functionality so I didn't debug it.

Assuming you installed with install_github(), default is build_vignettes = FALSE, so the vignette was not installed. Once on CRAN (or ropensci drat) vignettes will be prebuilt and installed by install.packages()

  • Noticed the test "we can parse from a url" calls out to the web but the skip_on_cran() guard is commented out. Should it not be commented out?

The CRAN policy on tests involving calls to an external internet resource is a bit unclear, but these usually pass so I may as well leave it in. (For some reason, skip_on_cran() also makes this get skipped by default testing? A comment added to this effect.

  • Found documentation for rdf() to be too terse and terse by comparison to other functions in the package
    Extended with a description of the purpose of this constructor and an overview of it's two components.
  • rdf_parse() mentions the return is an 'rdf S3 object'. I might just call it an 'object' and omit the class system it was defined in
    agreed, done.
  • Verbage in rdf_serialize 'Serialize RDF docs' maybe should be 'Serialize an RDF Document' (no plural)
  • In rdf_serialize, I wonder if the return should not be the raw numeric return code from redland but either a logical or the document path itself (the latter would allow for piping)
    Good idea! done.
  • rdf_query states the return type is 'a list of all query results' when it actually appears to be a 'data.frame' from what I can tell
    Yup, corrected.
  • What the common argument 'x' is documented as across functions varies. "rdf graph object" "rdf object". Please use common language for these
    Good point. I now only refer either to the rdf object (as the actual R object used in input or return), or RDF graph, as the abstract concept embodied by the rdf object.
  • RE: JOSS submission: I wonder if they won't find this package to simple/thin for acceptance. It does have clear research application though!

This is a fair question, as you observe they don't take "Thin API Clients" and "Minor Utility packages", though an implementation need not be "novel."

Actually, I'm now wondering if it would better to submit the longer, new vignette as a separate piece to R Journal, in lieu of JOSS. This would need some substantial revising to be more journal style than blog style, but I believe falls within the scope of several of their topic areas (listed below; would love to recruit Bryce & Ana as co-authors in such an effort).

Reviews and proposals:

surveying and discussing challenges and opportunities of potential importance for the broader R community, including proposals and proof-of-concept implementations.

Comparisons and benchmarking:

of implementations in base-R and contributed packages with each other, and where relevant with implementations in other software systems.

Applications:

demonstrating how new or existing techniques can be applied in an area of current interest using R, providing a fresh view of such analyses in R that is of benefit beyond the specific application.

Add-on packages:

short introductions to contributed R packages that are already available on CRAN or Bioconductor, and going beyond package vignettes in aiming to provide broader context and to attract a wider readership than package users.

@cboettig
Copy link
Member Author

These issues are now resolved and approved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant