Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data.world #133

Closed
11 of 14 tasks
rflprr opened this issue Jul 10, 2017 · 28 comments
Closed
11 of 14 tasks

data.world #133

rflprr opened this issue Jul 10, 2017 · 28 comments

Comments

@rflprr
Copy link

rflprr commented Jul 10, 2017

Summary

  • What does this package do? (explain in 50 words or less):
    This package allows users to consume data hosted on data.world. By way of its sibling package (dwapi) it also allows users to invoke data.world API endpoints.

  • Paste the full DESCRIPTION file inside a code block below:

Package: data.world
Title: Main Package for Working with 'data.world' Data Sets
Version: 1.1.1
Authors@R: c(
    person("Rafael", "Pereira", email = "rafael.pereira@data.world", role = c("aut", "cre")),
    person("Triet", "Le", email = "triet.le@data.world", role = c("aut")),
    person("Bryon", "Jacob", email = "bryon.jacob@data.world", role = c("aut")))
Description: High-level tools for working with data.world data sets. data.world is a community 
    where you can find interesting data, store and showcase your own data and data projects, 
    and find and collaborate with other members. In addition to exploring, querying and 
    charting data on the data.world site, you can access data via 'API' endpoints and 
    integrations. Use this package to access, query and explore data sets, and to 
    integrate data into R projects. Visit <https://data.world>, for additional information.
Depends: R (>= 3.3.0)
License: Apache License 2.0
LazyData: TRUE
Imports:
    dwapi,
    httr,
    ini
Suggests: 
    testthat,
    knitr,
    rmarkdown,
    readr
License: Apache License 2.0
Encoding: UTF-8
LazyData: true
URL: https://github.com/datadotworld/data.world-r
BugReports: https://github.com/datadotworld/data.world-r/issues
RoxygenNote: 6.0.1
VignetteBuilder: knitr
  • URL for the package (the development repository, not a stylized html page):
    https://github.com/datadotworld/data.world-r

  • Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

[e.g., "data extraction, because the package parses a scientific data file format"]

This package fits under Data Retrieval (and/or, possibly, Data Publication), because it allows access to data on data.world and, also, creation of datasets via API wrappers in the dwapi package.

  • Who is the target audience?

Data scientists, data analysts, researchers, etc. — any professional or enthusiast working on data projects.

I am not aware of any other R packages designed to work with data.world.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, including reporting of test coverage, using services such as Travis CI, Coeveralls and/or CodeCov.
  • I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package contains a paper.md with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

@maelle
Copy link
Member

maelle commented Jul 11, 2017

Thanks a lot for your submission @rflprr ! What are the scientific applications of these datasets, do you have any examples to help us assess fit?

@maelle maelle added the package label Jul 11, 2017
@rflprr
Copy link
Author

rflprr commented Jul 11, 2017

Hi, @maelle. data.world is an open and community-driven platform. We have built a general purpose tool that serves researchers increasingly well (we are still early stage, but quickly adding functionality to aid reproducibility, for example). Searching the platform you can find several users in academia. Not all their datasets are public, so it's hard to give you an idea for the breath of scientific applications. Here is one example of a user who is a PhD student and who have created a few public datasets: https://data.world/jordanageorge?relationship=owner. I hope this helps.

@maelle
Copy link
Member

maelle commented Jul 11, 2017

Thanks @rflprr !

@sckott
Copy link
Contributor

sckott commented Jul 18, 2017

Doing my editor checks now, sorry for the delay ...

@sckott
Copy link
Contributor

sckott commented Jul 18, 2017

HI @rflprr

What was the reasoning for splitting out dwapi-r ? I'm not saying I disagree with the split, just asking why. I think I'm leaning towards asking reviewers to review the pkg submitted, but also to at least have a look at dwapi-r since it does some heavy lifting esp. wrt HTTP requests.

@rflprr
Copy link
Author

rflprr commented Jul 19, 2017

Hi, @sckott. Our plan is to provide wrappers for all our API endpoints. That's an ever growing list of low level operations that would pollute the namespace of our main library, which is intended to always expose a very limited set of high-level and easy to use operations. In short, we separated so there is more clear separation of concerns and more cohesive namespaces to better serve what we believe are two distinct groups of users 1) those who need access to low-level APIs and value flexibility, and 2) the vast majority who only care about a few high-level operations and value ease of use.

@sckott
Copy link
Contributor

sckott commented Jul 20, 2017

@rflprr Okay. Seems the same goal could be done while having the lower level methods in this pkg, but just not exported to the user. - But if there's use cases for a certain kind of user (power user) to use just the low level methods in the other pkg that makes sense. I'd like to have reviewers at least take a brief look at dwapi-r in addition to this pkg, okay?

@rflprr
Copy link
Author

rflprr commented Jul 20, 2017

@sckott, reviewing dwapi-r makes a lot of sense. Please go ahead and let me know if you find any issues.

@sckott
Copy link
Contributor

sckott commented Jul 25, 2017

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thanks for your submission @rflprr !

Currently seeking reviewers. It's a good fit and not overlapping.

  • goodpractice::gp() output is below for reviewers to consider - feel free to run it yourself as well
It is good practice towrite unit tests for all functions, and all package code
    in general. 60% of code lines are covered by test cases.

    R/data.world-defunct.R:65:NA
    R/data.world-defunct.R:67:NA
    R/data.world-defunct.R:68:NA
    R/data.world-defunct.R:70:NA
    R/data.world-defunct.R:71:NA
    ... and 55 more lines

For ``

It is good practice towrite unit tests for all functions, and all package code
    in general. 79% of code lines are covered by test cases.

    R/add_file.R:65:NA
    R/add_file.R:95:NA
    R/add_file.R:97:NA
    R/add_file.R:98:NA
    R/add_file.R:99:NA
    ... and 162 more linesavoid 1:length(...), 1:nrow(...), 1:ncol(...),
    1:NROW(...) and 1:NCOL(...) expressions. They are error prone and
    result 1:0 if the expression on the right hand side is zero. Use
    seq_len() or seq_along() instead.

    R/dataset_summary_response.R:41:15
    R/table_schema_response.R:28:15fix this R CMD check NOTE: Namespace in Imports field not
    imported from:xml2All declared Imports should be used.fix this R CMD check NOTE: Found the following hidden
    files and directories: .lintr These were most likely included in
    error. See sectionPackage structurein theWriting R
    Extensionsmanual.

Reviewers: @MarkEdmondson1234 @nistara
Due date: 2017-08-23

@sckott
Copy link
Contributor

sckott commented Jul 26, 2017

Thanks for agreeing to review @MarkEdmondson1234 & @nistara !!!

Due date is 2017-08-23

let me know if you have any questions about any part of the process

@MarkEdmondson1234
Copy link

Package Review - data.world-r (dwapi-r)

Summmary

Overall the package is a nice addition, allow access to a crowd-sourced variety of data that would be useful in many projects. The richness of the datasets will depend largely on how popular the website data.world becomes, but already there are some good examples for teaching and in various sectors. This package's success is tied to the success of that endeavour, to which I encourage and wish well.

The line between data.world and dwapi is a little blurry, and I do question the need to have seperate packages given as in the short time I got to use data.world I was already needing to dip into the dwapi package to get what I needed, with the added confusion of where documentation should sit etc. I would suggest either to export/provide wrappers for all the functions that are useful in data.world, or to merge it all into dwapi, particularly as the documentation of data.world is a little sparse. Myself I am inclined to use dwapi directly as perhaps other R users will too, but if the documentation was all in data.world with the most common useful functions (such as metadata requests) that could change. If worried about complication of functions available, more use of @noRd looks to solve that issue.

Review template

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README

I could install with no problems following the README instructions.

The "Hello World" example worked quickly and well.

  • Vignette(s) demonstrating major functionality that runs successfully locally

The vignettes were the quickstart and query. The Quickstart worked getting youto your first query

quickstart

I followed the first vignette as suggested via vignette("quickstart", package = "data.world")

The installation instructions offer three options (runtime, environment variables, runtime), but only include an example for one (via config file).

The help files in ?save_config and ?set_config could detail more about what is happening as well.

The start guide states to not keep your API token in code, but then only gives examples where it is in code. I manually setup a environment argument in my .Renviron for use, but that assumes users know about how to do that (as say explained in httr's docs) but other R users may not.

I did it like this:

saved_cfg <- data.world::save_config(Sys.getenv("DATADOTWORLD"))

data.world::set_config(saved_cfg)

...but its not clear if this is a good way to handle it or how save_config or set_config works as I saw no files in my directory.

Now I inspect the code I see it expects DW_AUTH_TOKEN as the environment file - but this needs to be documented somewhere, and the examples updated to show its use.

The API token is also very long, is this necessary? Compare the GitHub personal access token and this packages:

nchar(Sys.getenv("DATADOTWORLD"))
# [1] 340
nchar(Sys.getenv("GITHUB_PAT"))
# [1] 40

query

This vignette went into a lot more detail on how to query data, with some good links to further documentation.

All the SQL examples worked, but the SPARQL SELECT example returned no results:

assists_vs_height <- data.world::qry_sparql(paste(
  "BASE <http://data.world/jonloyens/an-intro-to-dataworld-dataset/> ",
  "PREFIX t: <DataDotWorldBBallTeam.csv/DataDotWorldBBallTeam#> ",
  "PREFIX s: <DataDotWorldBBallStats.csv/DataDotWorldBBallStats#> ",
  "SELECT ?name ?height ?assists WHERE { ",
  "  ?pt t:Name ?name . ",
  "  ?ps s:Name ?name . ", # the join column 
  "  ?pt t:Height ?height . ",
  "  ?ps s:AssistsPerGame ?assists . ",
  "} ",
  "ORDER BY DESC (?assists)", sep = "\n"
))

data.world::query(assists_vs_height, intro_ds)
# A tibble: 0 x 3
# ... with 3 variables: name <chr>, height <chr>, assists <chr>

And the parameterized example errored:

assists_greater_than <- data.world::qry_sparql(paste(
  "BASE <http://data.world/jonloyens/an-intro-to-dataworld-dataset/> ",
  "PREFIX t: <DataDotWorldBBallTeam.csv/DataDotWorldBBallTeam#> ",
  "PREFIX s: <DataDotWorldBBallStats.csv/DataDotWorldBBallStats#> ",
  "SELECT ?name ?height ?assists WHERE { ",
  "  ?pt t:Name ?name . ",
  "  ?ps s:Name ?name . ", # the join column 
  "  ?pt t:Height ?height . ",
  "  ?ps s:AssistsPerGame ?assists . ",
  "  FILTER(?assists > $v1) ",
  "} ",
  "ORDER BY DESC (?assists)", sep = "\n"
))
assists_greater_than$params <- c("$v1" = 10)
data.world::query(assists_greater_than, intro_ds)
# No encoding supplied: defaulting to UTF-8.
# Error in dwapi::sparql(dataset = dataset, query = qry$query_string, query_params = qry$params) : 
#   Error: Unbound variable used in filter: $v1   (line 9, column 21)
  • Function Documentation: for all exported functions in R help

There was a helpful list of defunct functions, I should do that for my packages too :)

There was also a helpful intro help file under ?data.world

The help files for cfg, cfg_env, and cfg_saved looked to be internal functions - could they be removed from the function list via #' @noRd? Similarly the methods for query and set_config since they all point at the same documentation.

The documentation for the main useful functions such as qry_sparql and qry_sql is very sparse, some more detail and at least a link to the documentation that is linked in the vignettes would be useful.

  • Examples for all exported functions in R Help that run successfully locally

The most important functions in the package (as judged by the vignette metions) do not have examples in their help files yet - namely qry_sparql and qry_sql - copying over the (working) examples from the vignettes in \dontrun{} would be helpful for day to day use.

  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

There are no contributing guidelines yet.

The DESCRIPTION has a URL and BugReports to GitHub, as well as the authors/maintainers via Authors@R

The Description looks like it needs to be line-breaked at 80 characters more, as it did not display correctly compared with other packages in the Help pane.

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.

The tests covered a good range and used mock testing to quickly pass on the local machine.

  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
  • The functions under rOpenSci guidelines would have some kind of object_verb structure, such as dw_qry_sql, dw_set_config, assuming dw_ was your chosen object.
  • In the README the Installation and Configuration section should be higher in the page before the demo.
  • The rOpenSci review badge is not yet present
  • A bit more description of what Data World does as an API service overall, as described in the rOpenSci submission.
  • A Code of Conduct could be added
  • The NEWS.md could have more details such as package name, version and date. It should also be in descending date order.
  • The package versioning usually also follows what is released on CRAN, not revisions before that (for development versions off CRAN, the suggestion is to use 1.1.1.9000 as a way to distinguish them) See here for more details.
  • In dwapi, rjson is used instead of the guidelines jsonlite - example code: dwapi::add_files_by_source

Code review - data.world

data.world moves a lot of the work to the supporting dwapi package, so only minor suggestions here

structure()

The below code could use the existing structure() function, and there is no need for the return():

e.g.

## present function
qry_sql <- function(query_string, params = NULL) {
  me <- list(query_string = query_string, params = params)
  class(me) <- "qry_sql"
  return(me)
}

## suggested
qry_sql <- function(query_string, params = NULL) {
  structure(
    list(query_string = query_string, params = params),
    class = "qry_sql"
  )
}

Repeat for qry_sparql, cfg, cfg_env, cfg_saved etc.

config ini files

Only after examination of the code did I find the package is writing .ini files to my system without my knowledge. e.g. getOption("dw.config_path") = "/Users/mark/.dw/config" - I would suggest at the very least to document this somewhere and/or a message stating a file is being written to my system. The config file is also written outside of my working directory, which is unexpected. I would prefer it writes to the working directory in a visible file, and I can choose the option myself if it is to be written elsewhere on my system, especially if this is a file that is created/saved on loading the package via .onAttach.

discovery queries

Its a little confusing what should data.world be used for compared with dwapi - for example I was looking for the functionality of dwapi::list_tables() in the data.world package, which I think should have at least a wrapper for or re-exported or something.

Code review - dwapi

linter issue?

The formatting for function arguments looked like incorrect indentation to me, and RStudio agreed when I did CMD+I on the lines:

# before
file_create_request <- function(file_name,
  url,
  description = NULL,
  labels = NULL) {
  if (file_name == "" | url == "") {
    stop("name and url required")
  }

  me <-
    list(name = file_name, source = file_source_create_request(url))
  if (!is.null(description)) {
    me$description <- description
  }
  if (!is.null(labels)) {
    me$labels <- labels
  }

  class(me) <- "file_create_request"
  return(me)
}

# after CMD+I
file_create_request <- function(file_name,
                                url,
                                description = NULL,
                                labels = NULL) {
  if (file_name == "" | url == "") {
    stop("name and url required")
  }

  me <-
    list(name = file_name, source = file_source_create_request(url))
  if (!is.null(description)) {
    me$description <- description
  }
  if (!is.null(labels)) {
    me$labels <- labels
  }

  class(me) <- "file_create_request"
  return(me)
}

using structure()

Again structure() could be used to make the classes, and a helper class to get rid of NULLs from a list will let you make neater objects:

# before
file_create_or_update_request <- function(file_name,
                                          url = NULL,
                                          description = NULL,
                                          labels = NULL) {
  if (file_name == "") {
    stop("name is required")
  }

  me <- list(name = file_name)
  if (!is.null(url)) {
    me$source <- file_source_create_or_update_request(url)
  }
  if (!is.null(description)) {
    me$description <- description
  }
  if (!is.null(labels)) {
    me$labels <- labels
  }

  class(me) <- "file_create_or_update_request"
  return(me)
}

# after

# function to remove NULLs from lists. 
compact <- function(x) {
  null <- vapply(x, is.null, logical(1))
  x[!null]
}

file_create_or_update_request <- function(file_name,
                                          url = NULL,
                                          description = NULL,
                                          labels = NULL) {
  if (file_name == "") {
    stop("name is required")
  }
  
  me <- structure(
    list(
      name = file_name,
      # may need to alter to return NULL if passed NULL
      source = file_source_create_or_update_request(url),
      description = description,
      labels = labels
    ),
    class = "file_create_or_update_request"
  )
  
  compact(me)

}

growing vectors

The method of adding to requests doesn't look like R standards since its using your_vector[[length(your_vector) + 1]] <- my_addition rather than c(your_vector, my_addition)

Also why is dwapi:: used here within the dwapi package? I guess it got migrated and forgot its internal now.

# before
add_file.dataset_create_request <-
  function(request,
           name,
           url,
           description = NULL,
           labels = NULL) {
    existing_files <- request$files
    # O(N) ?
    existing_files[[length(existing_files) + 1]] <-
      dwapi::file_create_request(
        name, url, description = description, labels = labels)
    request$files <- existing_files
    request
  }

# after
add_file.dataset_create_request <-
  function(request,
           name,
           url,
           description = NULL,
           labels = NULL) {
    
    request$files <- c(request$files, 
                       file_create_request(name, 
                                           url, 
                                           description = description, 
                                           labels = labels))
    request
  }

Using httr::RETRY

You may want to consider using httr::RETRY instead of httr::POST for API requests, to mitigate against network errors. Example it could be here

Creating a generic API fetch function

A lot of the API requests are constructing their own httr::POSTetc. requests with repeated information such as auth, user agent etc. This results in lots of repeated code, so I suggest a generic "fetch the API with these arguments" function be created, and then pass arguments to that.

e.g something like:

dw_do_api <- function(verb, path_url, query_param = NULL, body = NULL){

  base_url <- getOption("dwapi.api_url")
  api_url <- paste0(base_url, path_url)
  
  if(!is.null(query_param)){
    api_url <- paste0("?", paste0(query_param, collapse="&"))
  }
  
  response <-
        httr::RETRY(verb, 
                    url = api_url,
                    httr::add_headers(Authorization = sprintf("Bearer %s", auth_token())),
                    httr::user_agent(user_agent()),
                    body = body
                    )
                    
  parse_success_or_error(response)
}

Then API functions can use it like:

# before
delete_files <-
  function(dataset, file_names) {
    query_param <-
      lapply(file_names, function(name)
        sprintf("name=%s", name))
    if (length(query_param) == 0) {
      print("empty names input = no-op")
    } else {
      api_url <-
        paste(
          sprintf(
            "%s/datasets/%s/files?", getOption("dwapi.api_url"),
            extract_dataset_key(dataset)),
          paste0(query_param, collapse = "&"),
          sep = ""
        )
      auth <- sprintf("Bearer %s", auth_token())
      response <-
        httr::DELETE(
          api_url,
          httr::add_headers(Authorization = auth),
          httr::user_agent(user_agent())
        )
      ret <- parse_success_or_error(response)
      ret
    }
  }

# after
delete_files <-
  function(dataset, file_names) {
    query_param <-
      lapply(file_names, function(name)
        sprintf("name=%s", name))
    if (length(query_param) == 0) {
      print("empty names input = no-op")
    } else {
      dw_do_api("DELETE",
                path_url = sprintf("/datasets/%s/files", extract_dataset_key(dataset)),
                query_param = query_param)
    }
  }

Repeat as necessary for all the other API requests. You could also add the response status code test for 200 and parsing out the error message if incorrect to the dw_do_api function which should mean a lot less code needed overall. For instance, swapping from rjson to jsonlite at the moment will mean a lot of refactoring, which could be avoid by making a centralised fectch/parse function. You can also then do things like turn on verbose() for checking HTTP requests more easily.

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 5

@sckott
Copy link
Contributor

sckott commented Aug 4, 2017

thanks @MarkEdmondson1234 very much for your review!

@rflprr
Copy link
Author

rflprr commented Aug 9, 2017

@sckott and @MarkEdmondson1234, first, thank you. Second, I'm unsure about the process at this point. Lot's of good feedback here, but no sense of severity. With limited resources, I'd like to limit any changes to a minimum and continue to implement improvements over time. What is your expectation at this point?

The main point I would like to address is why data.world and dwapi. We are serving two audiences with those packages, those looking for high-level and low entry barrier features and those looking for low-level and maximum flexibility, respectively. I want to make them as easy to learn and their features easy to discover for the right users and that's why I opted for two packages (two namespaces) and to export all relevant functions from each. data.world is sparse right now because it's still in its infancy we want to be judicious about what features we add to it. Our intent is for it to contain functionality that is most useful in the context of interactive programming (RStudio, R Notebooks, etc). dwapi, on the other hand, is more full-featured because less thought goes into its design. It will simply evolve along-side our REST API and will be better suited for scripts, headless apps, and web apps (e.g. Shiny).

If you object to this approach or see any severe implementation/conformance flaws, let me know. Otherwise, I'll create issues on our repo for everything you reported and would like to address them over time.

Thanks again!

@sckott
Copy link
Contributor

sckott commented Aug 9, 2017

@rflprr at this point I think it's best to wait for the 2nd review to come in to have all the reviewer comments. Once the 2nd review is in, I can help think about which changes may be most important.

@rflprr
Copy link
Author

rflprr commented Aug 22, 2017

Hello, @sckott. Thanks for your response. Any idea of when I should expect the 2nd review to be completed?

@sckott
Copy link
Contributor

sckott commented Aug 22, 2017

@rflprr soon hopefully 😄

@nistara due date is tomorrow - please get your review in soon, thanks!

@nistara
Copy link

nistara commented Aug 22, 2017

@sckott and @rflprr , I'll be adding my comments today :)

@nistara
Copy link

nistara commented Aug 23, 2017

Summary

The data.world package is mainly for retrieving data from the data.wprd website, whereas the dwapi package provides greater functionality like updating and uploading data, editing metadata, etc. It might be helpful to combine the two together.

Overall I think these are very useful packages which will encourage the use and analyses of data from data.world!

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README

I suggest giving a little more information about the data.world website in the README as well; this would help orient new users and increase interest in the package as well as data.world. Perhaps also shift the installation and configuration above the getting started example. There's a slight redundancy in suggesting the checking out of the "quickstart" vignette.

  • Installation instructions: for the development version of package and any non-standard dependencies in README

I could install the development version of the package without any problem.

  • Vignette(s) demonstrating major functionality that runs successfully locally

The vignettes were helpful in understanding the core functionalities of the package.

quickstart

Does not mention how configuration can be provided via environment variables and at runtime. Tangentially related, when I signed up with data.world, there was a typo for "collaborators" in the getting started screen.

query

The query vignette was useful and it's nice that the examples are based on the same dataset and links to more information on SQL and SPARQL are provided. The SPARQL examples don't work as expected however:

SELECT queries
No data is returned with this example:


assists_vs_height <- data.world::qry_sparql(paste(
    "BASE <http://data.world/jonloyens/an-intro-to-dataworld-dataset/> ",
    "PREFIX t: <DataDotWorldBBallTeam.csv/DataDotWorldBBallTeam#> ",
    "PREFIX s: <DataDotWorldBBallStats.csv/DataDotWorldBBallStats#> ",
    "SELECT ?name ?height ?assists WHERE { ",
    "  ?pt t:Name ?name . ",
    "  ?ps s:Name ?name . ", # the join column 
    "  ?pt t:Height ?height . ",
    "  ?ps s:AssistsPerGame ?assists . ",
    "} ",
    "ORDER BY DESC (?assists)", sep = "\n"
))

data.world::query(assists_vs_height, intro_ds)

# A tibble: 0 x 3
# ... with 3 variables: name <chr>, height <chr>, assists <chr>

Parameterized SPARQL queries
Returns an error:

assists_greater_than <- data.world::qry_sparql(paste(
    "BASE <http://data.world/jonloyens/an-intro-to-dataworld-dataset/> ",
    "PREFIX t: <DataDotWorldBBallTeam.csv/DataDotWorldBBallTeam#> ",
    "PREFIX s: <DataDotWorldBBallStats.csv/DataDotWorldBBallStats#> ",
    "SELECT ?name ?height ?assists WHERE { ",
    "  ?pt t:Name ?name . ",
    "  ?ps s:Name ?name . ", # the join column 
    "  ?pt t:Height ?height . ",
    "  ?ps s:AssistsPerGame ?assists . ",
    "  FILTER(?assists > $v1) ",
    "} ",
    "ORDER BY DESC (?assists)", sep = "\n"
))

assists_greater_than$params <- c("$v1" = 10)
data.world::query(assists_greater_than, intro_ds)

# No encoding supplied: defaulting to UTF-8.
# Error in dwapi::sparql(dataset = dataset, query = qry$query_string, query_params = qry$params) : 
  # Error: Unbound variable used in filter: $v1   (line 9, column 21)
  • Function Documentation: for all exported functions in R help

    • some more details in the function descriptions would be great. As of now the descriptions for the functions are the same as their titles.

    • save_config() has an argument, profile, which is defined as the configuration profile. It doesn't explain what the default profile is, or which other options exist. I think these profiles are the same as the ones listed in set_config(), however they should be explained here as well.

    • The README mentions the following, however most of the functions, e.g. query(), qry_sql() do not provide information on the dataset parameter.

    Notice that dataset is parameter required by most functions and can be provided in two formats:

    URL: "https://data.world/jonloyens/an-intro-to-dataworld-dataset"
    Path: "jonloyens/an-intro-to-dataworld-dataset"

    • Based on the above points, it might be useful to integrate the information provided in the README, vignettes and the function documentation. Also make the function documentation more detailed than it currently is (including adding more examples).
  • Examples for all exported functions in R Help that run successfully locally

    • qry_sql() and qry_sparql() do not have examples in their respective documentation.

    • The example provided for query() gives an error, probably because it should be qry_sql instead of sql:

     sql_stmt <- data.world::sql("SELECT * FROM Tables")
     query_results_df <- data.world::query(
     sql_stmt, "jonloyens/an-intro-to-dataworld-dataset")
    
     # Error: 'sql' is not an exported object from 'namespace:data.world'
    
    
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

    • There are no contribution guidelines

    • URL and BugReports links are provided in the DESCRIPTION, and so is the Maintainer (autogenerated via Authors@R)

Functionality

  • Installation: Installation succeeds as documented.

  • Functionality: Any functional claims of the software been confirmed.

  • Performance: Any performance claims of the software been confirmed.

  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.

  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

    • object_verb() naming scheme could be considered, although function names might get too long when differentiating between sql and sparql.

    • As mentioned before, the README should include citation information, and also consider restructing to first, Installation instructions, and then the Getting started/Brief demonstration usage. More information about data.world itself should be also be provided in the README.

    • The news.md file should contain the package name and dates of release. Consider using the section headers NEW FEATURES, MINOR IMPROVEMENTS, BUG FIXES, DEPRECATED AND DEFUNCT while providing information about each release. See here.

    • The package passed R CMD check/devtools::check()

    • Consider adding a coverage badge to the README

    • rjson is used instead of jsonlite in the following places:

      ./R/add_files_by_source.R:71:        body = rjson::toJSON(file_batch_update_req),
      ./R/create_dataset.R:46:      body = rjson::toJSON(create_dataset_req),
      ./R/create_dataset.R:57:      create_dataset_response(rjson::fromJSON(httr::content(
      ./R/create_dataset.R:64:      error_message(rjson::fromJSON(httr::content(
      ./R/get_dataset.R:42:      rjson::fromJSON(httr::content(
      ./R/get_dataset.R:50:      error_message(rjson::fromJSON(httr::content(
      ./R/get_table_schema.R:43:      rjson::fromJSON(httr::content(
      ./R/get_table_schema.R:51:      error_message(rjson::fromJSON(httr::content(
      ./R/list_tables.R:47:      rjson::fromJSON(httr::content(
      ./R/list_tables.R:54:      error_message(rjson::fromJSON(httr::content(
      ./R/replace_dataset.R:39:        body = rjson::toJSON(dataset_replace_req),
      ./R/update_dataset.R:44:        body = rjson::toJSON(dataset_update_req),
      ./R/update_table_schema.R:39:    body = rjson::toJSON(table_schema_update_req),
      ./R/utils.R:57:        success_message(rjson::fromJSON(httr::content(
      ./R/utils.R:64:        error_message(rjson::fromJSON(httr::content(
      
    • goodpractice::gp() for dwapi says:

      ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...),
          1:NROW(...) and 1:NCOL(...) expressions. They are error prone and
          result 1:0 if the expression on the right hand side is zero. Use
          seq_len() or seq_along() instead.
    

    However looking at the respective lines, the authors do check whether length is > 0:

      if (length(structure$files) > 0) {
        for (i in 1:length(structure$files)) {
    	me$files[[i]] <-
           file_summary_response(structure$files[[i]])
        }
      } 
    
    • I agree with @MarkEdmondson1234 regarding indentation of code in dwapi, and this occurs beyond the example he provided, for e.g. even in add_file_by_source().

    • Maybe add the .lintr file to .Rbuildignore?

    • Spelling errors in dwapi:

      devtools::spell_check()
      WORD           FOUND IN
      begining       dwapi.Rd:16
      compability    list_tables.Rd:21, sparql.Rd:21
      reponse        create_dataset_response.Rd:5,16
    
    • I'm unfamiliar with structure() as @MarkEdmondson1234 suggests (thank you - learnt something new!). I do however see differences in the way objects are returned. Sometimes return() is used, and other times it's not. It's probably better practice to be consistent about it (and use return() only if needed).

       #' Return the current data.world version.
       #' @return Current package version.
       #' @keywords internal
       sdk_version <- function() {
         is.nothing <- function(x)
           is.null(x)
         # nolint start
         if (!is.nothing(utils::sessionInfo()$otherPkgs$data.world)) {
           ret <- utils::sessionInfo()$otherPkgs$dwapi$Version
         } else {
           ret <- "X.X.X"
         }
         # nolint end
         ret
       }
      
       #' Return the data.world user-agent.
       #' @return User-agent string.
       #' @keywords internal
       user_agent <- function() {
         ret <- sprintf("data.world-R - %s", sdk_version())
         ret
       }
      
       #' Extract the dataset key from URL or as provided.
       #' @param tentative_key key or URL.
       #' @return dataset key extracted form URL or as provided.
       #' @keywords internal
       extract_dataset_key <- function(tentative_key) {
         url <- httr::parse_url(tentative_key)
         return(url$path)
       }
      
      • data.world vs dwapi
        It looks like the authors have their reasons for having these two different packages, and while I understand that data.world and dwapi could be aimed at users with different skill levels, functions that do the same thing across both packages should be called similarly. For e.g. data.world::query() basically does the same thing as dwapi::sql() or dwapi::sparql(), the difference being that queries in data.world::query() are passed in with qry_sql() or qry_sparqll(). This could get confusing for users trying to reconcile the two packages or switch between them.

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 7

@sckott
Copy link
Contributor

sckott commented Aug 23, 2017

thanks for your review @nistara !!!

@rflprr continue discussion/feedback for reviewers here ...

@sckott
Copy link
Contributor

sckott commented Sep 5, 2017

@rflprr let me or the reviewers know if you have any questions as you go through reviews

@rflprr
Copy link
Author

rflprr commented Sep 7, 2017

@sckott While constructive and certainly helpful, I'm not seeing anything in the feedback that appears to be severe/urgent. If you do, calling those things out would be the most helpful thing at this point.

Thank you very much, @MarkEdmondson1234 and @nistara.

Off-topic: If you know good R engineers who we could hire on a freelance basis to help us improve and continue to develop these libraries, I would appreciate introductions.

@sckott
Copy link
Contributor

sckott commented Sep 7, 2017

@rflprr can you clarify what you mean by "I'm not seeing anything in the feedback that appears to be severe/urgent"?

the way our process works is we want submitters to respond to reviewers, making changes or if they don't think they should change something then give a reason why not.

@sckott
Copy link
Contributor

sckott commented Sep 7, 2017

to clarify:

The purpose of this review is to improve the package along many lines (usability, performance, documentation, maintainability). The first people you need to convince of what is or is not necessary are the reviewers. Here's a good example of responding to reviewers (though it doesn't all have to be in one comment): #93 (comment)

If you and a reviewer disagree about the necessity of something, then, an editor can weigh in (e.g., #127 (comment))

While we understand that there is considerable effort in updating a package post-review, we think it is worth it.

@sckott
Copy link
Contributor

sckott commented Sep 26, 2017

@rflprr Any thoughts on my comments above?

@sckott
Copy link
Contributor

sckott commented Oct 12, 2017

@rflprr we do expect responses to reviewers, you don't have to respond to every single item they raised, but no response is not allowed.

Please get your responses to reviewers in within 3 weeks. if not, we'll close the issue, and you're welcome to resubmit at a later point if you like

@rflprr
Copy link
Author

rflprr commented Oct 16, 2017

@sckott sorry for the communication hiatus (it's been busy period and we are resource constrained).

The reviewers did a very thorough job. Reading though both reviews I see many excellent suggestions, but none that would significantly improve (or de-risk) the user experience in the immediate term (in other words, none that appear to be critical). Most importantly, all suggestions can be implemented in a backwards compatible way over time.

I cannot commit to making these changes at the moment, so we can proceed with the packages in their current stage or pause/abort the onboarding process, if you think that is the right thing to do.

Additionally, we intend to continue to develop these packages if I can find help. I was serious when I asked for references for free-lance developers. If you know good ones and could get me in touch with someone, I would certainly appreciate any intros.

@sckott sckott added the holding label Oct 19, 2017
@sckott
Copy link
Contributor

sckott commented Oct 19, 2017

Okay, we'll put it on hold for now. That means if you or someone else has time later to make changes/respond to reviewers, we can remove the holding tag and proceed.

@sckott
Copy link
Contributor

sckott commented Feb 25, 2019

According to our policies https://ropensci.github.io/dev_guide/policies.html#review-process it's been more than 1 year since this has been on hold (holding label) and will now be closed. If you wish to re-submit, you'll need to start a new submission

@sckott sckott closed this as completed Feb 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants