Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submission: emld #269

Closed
14 of 19 tasks
cboettig opened this issue Nov 27, 2018 · 16 comments
Closed
14 of 19 tasks

Submission: emld #269

cboettig opened this issue Nov 27, 2018 · 16 comments

Comments

@cboettig
Copy link
Member

cboettig commented Nov 27, 2018

Summary

  • What does this package do? (explain in 50 words or less):

Transform Ecological Metadata Language (EML) files into JSON-LD
and back into EML. Doing so creates a natural list-based representation
of EML in R, so that EML data can easily be manipulated using standard
R tools. This makes this package an effective backend for other tools
for working with EML. Additioally, the JSON-LD representation enables
the use of developer-friendly JSON parsing and informatics-friendly
serializations such as RDF and SPARQL queries. This package is targeted
primarily at developers. By abstracting away the complexity of EML's
XML Schema, developers can build against intuitive list or JSON objects
and not have to worry about satisfying many of the additional constraints
of set by the schema (such as element ordering, which is handled automatically).

In practice, emld will basically be the backend machinery for a new release of the EML
package, which will retain largely the same user-facing API, with several improvements.
But both in terms of ease of reviewing and ease of package maintenance, I think it's important
to have these as separate packages.

  • Paste the full DESCRIPTION file inside a code block below:
Package: emld
Title: Ecological Metadata as Linked Data
Version: 0.0.2
Authors@R: person("Carl", "Boettiger",
                  email = "cboettig@gmail.com",
                  role = c("aut", "cre", "cph"),
                  comment=c(ORCID = "http://orcid.org/0000-0002-1642-628X"))
Description: Transform Ecological Metadata Language (EML) files into JSON-LD
             and back into EML.  Doing so creates a natural list-based representation
             of EML in R, so that EML data can easily be manipulated using standard
             R tools. This makes this package an effective backend for other tools
             for working with EML.  Additioally, the JSON-LD representation enables
             the use of developer-friendly JSON parsing and informatics-friendly
             serializations such as RDF and SPARQL queries. This package is targeted
             primarily at developers. By abstracting away the complexity of EML's
             XML Schema, developers can build against intuitive list or JSON objects
             and not have to worry about satisfying many of the additional constraints
             of set by the schema (such as element ordering, which is handled automatically).
URL: https://github.com/cboettig/emld
BugReports: https://github.com/cboettig/emld/issues
Depends: R (>= 3.1.0)
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.1.1
Suggests:
    spelling,
    testthat,
    magrittr,
    rmarkdown,
    covr,
    knitr,
    rdflib,
    jqr
Imports: xml2,
    jsonlite,
    jsonld,
    methods,
    yaml
VignetteBuilder: knitr
Language: en-US

  • URL for the package (the development repository, not a stylized html page):
    https://github.com/cboettig/emld

  • Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

data publication, data retrieval, and reproducibility.

  • Who is the target audience and what are scientific applications of this package?

Researchers and ecological informatics teams (at NEON, LTER sites, NCEAS, etc) who are consuming or managing EML and data described by EML.

This overlaps with the EML package, and will soon be the backend to the newer version of said package. This package works very differently than the current CRAN version of EML: instead of a massive S4 class system, this uses a single S3 class by exploiting the magic of JSON-LD.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
  • I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
    • The package is novel and will be of interest to the broad readership of the journal.
    • The manuscript describing the package is no longer than 3000 words.
    • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
    • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
    • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
    • (Please do not submit your package separately to Methods in Ecology and Evolution)

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

@lmullen
Copy link
Member

lmullen commented Dec 5, 2018

Thanks for your submission, @cboettig. I have run the editor's checks below. Please address the requested changes below. I have begun looking for reviewers and I will update this thread with the reviewers and due date once all reviewers have agreed to take this on.

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Changes for package authors:

  • Please add an rOpenSci review badge to your repository. You can use this code:
[![](https://badges.ropensci.org/269_status.svg)](https://github.com/ropensci/software-review/issues/269)
  • The package summary might be somewhat clearer if revised to make clearer what the words "natural," "intuitive," and so on mean. I took this to mean that XML is a pain to work with and R lists and JSON are easier to work with in R. While I agree with what it is not a universal opinion. I suggest taking another look at that language and bringing it together into a single sentence or clause that expresses the idea that it is more idiomatic to work with lists than XML in R.
  • When you say that the package is a "backend for other tools for working with EML," does that mean other tools written in R?
  • The copyright holder in the LICENSE file should be filled out or removed.
  • Please correct the following failed test before the review begins.
test-template.R:11: failure: template knows about internal classes too
`print\(template\("ResponsibleParty"\)\)` does not match "individualName: \\{\\}".
Actual value: "\{\}"
  • Please check the examples. When running devtools::check(run_dont_test = TRUE) I get the following error.
E  checking examples (619ms)
   Running examples in 'emld-Ex.R' failed
   The error most likely occurred in:
   
   > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
   > ### Name: eml_locate_schema
   > ### Title: eml_locate_schema
   > ### Aliases: eml_locate_schema
   > 
   > ### ** Examples
   > 
   > ## No test: 
   > f <- system.file("examples", "example-eml-2.1.1.xml", package = "EML")
   > eml <- xml2::read_xml(f)
   Error: '' does not exist in current working directory ('/tmp/Rtmp2NUD8n/emld.Rcheck').

For reviewers:

  • Please note whether you have any difficulties with the chain of dependencies (jsonld package -> V8 package -> V8 system libraries) that leads back to V8. This is at least a potential pain point.

  • Check code that is not covered by the test suite. See the output of goodpractice:gp() and covr::package_coverage().

  ✖ write unit tests for all functions, and all package code in general. 86%
    of code lines are covered by test cases.

@lmullen
Copy link
Member

lmullen commented Dec 5, 2018

Two reviewers have agreed. Thanks to Kelly and Peter for being willing to take on this review. 🎉 Typically we ask reviewers to complete their review in 3 weeks, but in this case that includes several holidays and most people's winter break. So I have set the due date with that in mind. Kelly and Peter, in addition to the typical checks listed in the reviewers' guide, I've noted a couple of things to pay attention to in my editorial comments above.

Reviewers: @khondula and @gothub

Due date: 2019-01-04

@cboettig
Copy link
Member Author

cboettig commented Dec 6, 2018

Thanks @lmullen !

  • badge added.
  • DESCRIPTION updated to avoid natural and intuitive and be a bit more precise:
        This is a utility for transforming Ecological Metadata Language
        (EML) files into JSON-LD and back into EML.  Doing so creates a
        list-based representation of EML in R, so that EML data can easily
        be manipulated using standard R tools. This makes this package an
        effective backend for other R-based tools  working with EML. By
        abstracting away the complexity of EML's XML Schema, developers can
        build around native R list objects and not have to worry about satisfying
        many of the additional constraints of set by the schema (such as element
        ordering, which is handled automatically). Additionally, the JSON-LD 
        representation enables the use of developer-friendly JSON parsing and
        serialization that may faciliate the use of EML in contexts outside of R,
        as well as the informatics-friendly serializations such as RDF and
        SPARQL queries.
  • Also hopefully makes explicit what parts would be useful to other R developers (e.g. R lists) vs what would be useful outside of the R context (having a JSON serialization, which is convenient for, e.g. javascript / web developers)
  • License template is filled out now :-)
  • Fixed failing test and failing example.

@khondula
Copy link

khondula commented Jan 4, 2019

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally (see comments below)
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R). (see comments below)
For packages co-submitting to JOSS
  • The package has an obvious research application according to JOSS's definition
  • The package contains a paper.md matching JOSS's requirements with:
  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 7
---

Review Comments

Summary

The emld package provides a handy set of functions that will facilitate manipulating and analyzing ecological metadata in R. The functions work as described, and the readme provides both a clear explanation of how emld differs from packages like EML, and ways to use the output of its functions in with SPARQL or JQ queries in rdflib or jqr respectively. Users of this package will likely be using it in combination within this ecosystem of related packages, however the translation functionality of this package should lower the barriers to entry for any user that is not familiar with any of EML, json, XML, or R lists.

The main issues I noticed were in the documentation, especially to correct the SPARQL query in the README/vignette so that a blank dataframe is not returned. Otherwise, I found the package easy to use and understand, and I look forward to seeing it published.

Documentation

From review checklist:

  1. Example needed for as_emld().
  2. The example for eml_locate_schema() requires having the EML package installed. Consider adding this package to Suggests in package documentation.
  3. I was unable to locate a CONTRIBUTING file.
  4. I was unable to locate the maintainer in the DESCRIPTION file.

Suggestions for improving package documentation:

  1. Typo in package description: "faciliate" should be "facilitate" on line 18 of DESCRIPTION.

Suggestions for improving function documentation:

  1. For as_emld, consider providing information about the return value of the function.
  2. The Description could be improved for as_json, as_xml, and template documentation.
  3. For template, consider providing more options for the object argument may be helpful.
  4. For template, the Value text refers to usage in vignettes however I was unable to locate that.

Suggestions for improving README:

  1. Typo in SPARQL queries section - I believe "names are act as variables" should be "names act as variables"
  2. Consider indenting as_json("hf205.json") line after pipe in SPARQL queries section for readability
  3. The SPARQL query in the example returns nothing as written. I believe the eml PREFIX should be updated to <eml://ecoinformatics.org/eml-2.2.0/>.
  4. I was able to run everything in the vignette without separately loading the library maggritr. Is this required?

Functionality

  1. The naming scheme is slightly different than the object_verb format suggested in rOpenSci packaging guidelines, however it is consistent with other conversion-based functions. My only concern is that while the function names are not in conflict, they are similar to non-eml specific conversion functions jsonlite::toJSON and xml2::as_xml_document. The emld::as_xml details clarify that the funciton is specific to EML and emld objects -- perhaps it would be useful to clarify this in the as_json documentation as well?
  2. Although it is rather intuitive, please add the type of return object to the documentation for the functions as_emld, as_json, and as_xml to keep with rOpenSci packaging guidelines (1.6).
  3. Consider adding top-level documentation to return from ?emld in keeping with rOpenSci packaging guidelines (1.6).
  4. Are functions in as_eml_document.R, validate_units.R, etc. internal functions that should have #' @noRd as per rOpenSci packaging guidelines (1.6)?

Additional comments:

  • I did not encounter any issues with V8
  • All tests passed when running devtools::test().
  • On devtools::check() I encountered the following notes:
❯ checking package dependencies ... NOTE
  Package suggested but not available for checking: ‘spelling’

❯ checking top-level files ... NOTE
  Non-standard file/directory found at top level:
    ‘LICENSE.md’
  • the output of goodpractice:gp() similarly suggested that spelling was not available, however this message disappeared after I installed the spelling package.
 ✖ fix this R CMD check ERROR: Package suggested but not available: ‘spelling’ The suggested
    packages are required for a complete check. Checking can be attempted without them by setting the
    environment variable _R_CHECK_FORCE_SUGGESTS_ to a false value. See section ‘The DESCRIPTION file’ in
    the ‘Writing R Extensions’ manual.
  • The output of goodpractice::gp() indicates that 88% of code has test coverage, and that there are a few dozen code lines longer than 80 characters. The output of covr::package_coverage() indicates that all functions have greater than 75% test coverage.

@gothub
Copy link

gothub commented Jan 7, 2019

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS
  • The package has an obvious research application according to JOSS's definition
  • [ ]The package contains a paper.md matching JOSS's requirements with:
  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 5-6 hours


Reviewer Comments

The case made for the necessity of the emld package in converting EML to JSON-LD is clear and compelling. Providing the ability to inspect EML derived data as S3 lists, JSON and via SPARQL queries will facilitate creation of new tools and workflows to automate access and interpretation.

Documentation

In the Motivations section of README.md the purpose and method of "extending EML with other semantic vocabularies" isn't clear (to me). Consider adding a section such as "Extending EML" with an example, if this is within the scope of the package.

The help text for the as_xml Description contains only the text as_xml and does not describe what the function does. Also, the return type is not specified. The same is true for both as_emld and as_json, template.

There is no documentation index is available for the package.

There is no documentation for ?emld' or ?emld-package.

This may be outside the scope of the package, but no functions, strategies or examples are presented in the documentation for simple editing of EML data, i.e. (reading, simple edit, write out). This would be very useful, but if this functionality presents to much overlap with the EML package, pleas disregard this comment.

Functionality

The as_json, as_emld, as_xml functions have a clear purpose and work as expected.

In addition to testing using the provided code samples, the following checks for a complex EML document were performed for a couple of EML documents such as https://goa.nceas.ucsb.edu/#view/urn:uuid:3249ada0-afe3-4dd6-875e-0f7928a4c171:

  • verified that as_json produced valid JSON-LD
  • verified that the following commands produced the original EML file (round trip):
x <- as_emld('metadata.xml') # using 
as_xml(x, "new.xml") 

R Source

The following potentially unresolved items exist, as show from a quick scan of the source code:

$ grep FIXME *
as_emld.R:    ## FIXME technically this assumes only our context
as_xml.R:## FIXME drop NAs too?
eml_validate.R:    ##  FIXME shouldn't have to write to tempfile,
eml_validate.R:  # FIXME technically must be unique inside parent system only...
emld-methods.R:## FIXME: Print method should drop context

Misc

devtools::check() reports non-standard file LICENSE.md

@lmullen
Copy link
Member

lmullen commented Jan 11, 2019

Thank you, @khondula and @gothub for your thorough reviews.

@cboettig: Please let us know if you have any questions about the reviews. Will you be able to complete the necessary changes in two weeks time, i.e., by January 25?

@cboettig
Copy link
Member Author

cboettig commented Jan 26, 2019

@khondula @gothub @lmullen Thanks for these very helpful and detailed reviews! I believe I have addressed all of these issues now. I've made the edits in a separate branch and created a PR, ropensci/emld#30 which I will leave un-merged for now, hopefully that provides an easy way to see the differences. I've also summarized my replies to each of your point-by-point reviews (with a few clarifying questions and comments) in the two issues linked in that PR. Anchors aweigh 🚢 ⚓️!

@lmullen
Copy link
Member

lmullen commented Jan 26, 2019

Thanks for making these changes, @cboettig.

@khondula and @gothub: Could you please review these changes and make any further comments within one week, by Feb 2? If the package is now satisfactory, you can check of any remaining relevant boxes in your main review.

@gothub
Copy link

gothub commented Feb 1, 2019

@lmullen @cboettig I had a look at ropensci/emld#29 and it looks great!

This package is a really useful contribution, thx!

@khondula
Copy link

khondula commented Feb 4, 2019

@cboettig re: the template function - I think if I were more familiar with the EML package I would agree that it might fit there better, but it seems like a nice function to have since it might help a user get more familiar with eml if they are starting from scratch. The fxn documentation seems to still refer to it being used in the readme though ("see vignettes"), which might be confusing( unless I am missing a different vignette).

otherwise everything looks great to me!

@cboettig
Copy link
Member Author

cboettig commented Feb 5, 2019

@khondula @gothub Great! Thanks to you both for the constructive reviews. I've gone ahead and merged my own PR now.

@khondula
I rebuilt the vignette, should have no mention of the template() function there now and I didn't see anything in searching for template in any of the main user facing functions. It's still in the package as a bit of a vestigial organ for the time being, may either be flushed our or removed in favor of functions from EML package.

I think that's everything on my end for the moment; if you're happy with the little paper.md for the JOSS side? Over to you, @lmullen

@lmullen
Copy link
Member

lmullen commented Feb 16, 2019

Thank you @gothub and @khondula.

@cboettig We can consider this accepted now. Here are some post-review things for you to do.

  • You are clear to submit the package to CRAN.
  • Could you please transfer the repository to rOpenSci? Once you've done that I'll give you admin rights back.
  • Please add an rOpenSci footer to the repository README. ropensci_footer
  • Please add a CodeMeta file by running codemetar::write_codemeta()
  • Please double check that CI and their respective badges work after the transfer.

@lmullen lmullen closed this as completed Feb 16, 2019
@lmullen
Copy link
Member

lmullen commented Feb 16, 2019

@cboettig Some other things pertaining to the JOSS submission.

  • After the repository has been transferred, please generate a new release with a DOI. You may wish to wait until after CRAN acceptance.
  • Please submit the paper to JOSS: http://joss.theoj.org/papers/new

Once you've submitted the paper, please let me know the URL to the submission and I will confirm that it has been peer reviewed by rOpenSci.

@cboettig
Copy link
Member Author

cboettig commented Feb 23, 2019

Thanks @lmullen ! I've now completed the transfer to rOpenSci, and created the submission to JOSS at http://joss.theoj.org/papers/7e9064114c9a8d6756ef500cb5bc5d2b

@lmullen
Copy link
Member

lmullen commented Feb 26, 2019

@cboettig Thanks, Carl. Looks like everything is wrapped up. Glad to see everything went through JOSS just fine.

@stefaniebutland This new package/paper might be a good candidate for a blog post at rOpenSci.

@stefaniebutland
Copy link
Member

stefaniebutland commented Feb 26, 2019

Yes indeed @lmullen. Carl knows he has a wide open invitation to blog about his packages :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants