
Ropenaq package for R #24

Closed
maelle opened this Issue Dec 31, 2015 · 32 comments

@maelle
Member

maelle commented Dec 31, 2015

    1. What does this package do? (explain in 50 words or less)
      It provides an interface to the OpenAQ API. OpenAQ provides open air quality data for a growing number of locations around the world. The R package allows users to check data availability and to download measurements from R as a dplyr data.table.
    1. Paste the full DESCRIPTION file inside a code block (bounded by ``` on either end).

```
Package: Ropenaq
Type: Package
Title: The Package is a R Interface to the openaq API
Version: 0.1
Date: 2015-12-30
Author: Maëlle Salmon
Maintainer: Who to complain to <maelle.salmon@yahoo.se>
Description: See https://docs.openaq.org/
License: GPL (version 2 or later)
LazyData: TRUE
RoxygenNote: 5.0.1
URL: http://github.com/masalmon/Ropenaq
BugReports: http://github.com/masalmon/Ropenaq/issues
Suggests: knitr, rmarkdown
VignetteBuilder: knitr
Depends: httr, lubridate, dplyr, testthat, XML
```

    1. URL for the package (the development repository, not a stylized html page)
      https://github.com/masalmon/Ropenaq
    1. What data source(s) does it work with (if applicable)?
      Open AQ https://openaq.org/#/
    1. Who is the target audience?
      People who need air quality data for their studies and do not know how to access the API, or who want to do the analysis in R anyway. They could be e.g. epidemiologists. I think that if the R package has users, the Open AQ platform will also get more attention and thus get even more data sources.
    1. Are there other R packages that accomplish the same thing? If so, what is different about yours?
      No, this is the first and only R package for accessing the OpenAQ API (I first asked one of the OpenAQ co-founders)
    1. Check the box next to each policy below, confirming that you agree. These are mandatory.
  • [x] This package does not violate the Terms of Service of any service it interacts with.
  • [x] The repository has continuous integration with Travis and/or another service
  • [x] The package contains a vignette
  • [x] The package contains a reasonably complete readme with devtools install instructions
  • [x] The package contains unit tests
  • [x] The package only exports functions to the NAMESPACE that are intended for end users
    1. Do you agree to follow the rOpenSci packaging guidelines? These aren't mandatory, but we strongly suggest you follow them. If you disagree with anything, please explain.
      Yes, as much as I can.
  • Are there any package dependencies not on CRAN?
  • [x] Do you intend for this package to go on CRAN?
    When it is more developed, yes, it could be useful.
  • [x] Does the package have a CRAN accepted license?
  • Did devtools::check() produce any errors or warnings? If so paste them below.
    1. Please add explanations below for any exceptions to the above:
    1. If this is a resubmission following rejection, please explain the change in circumstances.
@sckott


Member

sckott commented Jan 6, 2016

@masalmon Thanks for your submission. We are looking for reviewers now.

@maelle


Member

maelle commented Jan 6, 2016

Ok, thanks for the update!

@sckott


Member

sckott commented Jan 7, 2016

Reviewers: @aammd @ateucher

@ateucher


Member

ateucher commented Jan 29, 2016

Review

General Comments

  • This is a really useful package providing access to an equally useful service that I can see filling a large need. I know quite a few air quality scientists that use R, and I'm sure this will be a big help to them. I'm actually surprised that this is the first package providing this functionality - nice work @masalmon!
  • The functions all perform as expected, and from what I can see provide access to the full suite of services that the API offers. Overall, the documentation is clear and concise.
  • I had a look over at the github page for the OpenAQ API, and it appears that the API is not yet fully mature. This must make developing an R package for it a bit challenging (but also likely helps raise important questions about its functionality and expose hidden bugs so is also probably quite useful in that regard).
  • All examples in the README and function documentation run without errors.
  • Running devtools::check() turned up five NOTEs, one WARNING, and one ERROR. The error is related to failing tests (see note below), and I think you should be able to address the rest relatively easily from the messages. Many of them are due to files/folders that should be listed in .Rbuildignore (.lintr, .travis.yml, README.Rmd, etc.).
  • The vignette doesn't build due to a Gateway Timeout (as noted in ropensci/ropenaq#5)
  • develop folder - It's probably a good idea to move this to a development branch and merge features into master as they are developed.

rOpenSci Packaging Guidelines

  • Title - To conform to rOpenSci guidelines, you could make the name all lowercase (ropenaq) or even just openaq?
  • You've chosen nice, simple, descriptive function names that mirror the openaq API endpoints
  • Consider adding a code of conduct to let people know that this is a friendly, open, space to work in :)
  • The README is great - nice examples and descriptions. The only thing missing is the title of the package at the top.
  • It's generally considered best practice to use Imports: instead of Depends: for dependencies (see https://github.com/ropensci/packaging_guide#-package-dependencies & http://r-pkgs.had.co.nz/namespace.html#imports)

DESCRIPTION file

  • Title: I think 'An R package for ...' is frowned upon by CRAN - perhaps something like "Access Air Quality Data From OpenAQ"?
  • In the Maintainer: field, put your name (rather than "Who to complain to") :)
  • The Description: field should contain one or more complete sentences describing what the package does.
  • License: I think the correct specification is GPL (>= 2). @sckott: does ropensci have policies/suggestions on licensing?

Dependencies

  • As noted above, I don't think any of your dependencies need to be in the Depends: section. httr, dplyr, and lubridate can all be in Imports:, and from what I can see, XML is not needed at all? testthat should be in Suggests: as it is not necessary to use the package, only to run the tests.
  • I feel like OpenStreetMaps (in Suggests:), which depends on rJava, is a pretty heavy dependency just to build the vignette. Can you use ggmap for the mapping in the vignette instead of OpenStreetMaps?
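Putting these dependency suggestions together, the relevant DESCRIPTION fields might look like this (a sketch, not the final file; it assumes XML is dropped, testthat moves to Suggests:, and ggmap replaces OpenStreetMaps in the vignette):

```
Imports:
    httr,
    dplyr,
    lubridate
Suggests:
    knitr,
    rmarkdown,
    testthat,
    ggmap
```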

Function Usage and Documentation

  • All functions:
    • Value: Should specify that a data.frame ("tbl_df") is returned instead of data.table.
    • I think it would be helpful to specify in a bit more detail what the columns in the returned data frame contain. E.g., locations and count.
  • cities:
    • In what format(s) can country be specified? Only the two-letter ISO code as in the example, or will a full name do as well?
  • locations:
    • It would be useful to have a bit of an explanation about what combinations of city, country, location are allowed. For example, can location be specified without country and city? What happens if you specify a city that's not in the country specified? Perhaps also reorder to country, city, location so they are in decreasing order of resolution (and the same order as in measurements).
    • It doesn't look like city, country, location, and parameter can take more than one value, though that might be a nice feature (e.g. locations(country = c("IN", "CN"), parameter = c("pm25", "o3"))). I took a quick peek at the API docs and it doesn't look like the API accepts more than one... but could you query the API once for each country and combine them? This could be something to think about for a future version though!
    • date_from and date_to: Perhaps mention what class of inputs these arguments accept - (e.g., Date, POSIXct, POSIXlt, character in the form YYYY-MM-DD?)
  • measurements: Similar comments as for location in terms of documentation.
  • latest: measurements has a limit param with default value of 100. latest seems like it should have the same for consistency (unless of course the API doesn't have it).

Source Code

General

  • I like how thorough you are in your checking of inputs. This will cause the functions to fail informatively when the user inputs something incorrectly.
  • Code modularization:
    • Try to write code that is modular to reduce repetition (the 'DRY' principle: Don't Repeat Yourself): write many smaller functions that encapsulate discrete operations and can be reused. Look across the functions for code that is repeated, and wrap it up as functions to be called within each larger function. For example, you can create a function that returns the base url for all queries:

      base_url <- function() {
          "https://api.openaq.org/v1/"
      }
      
      ## Then you can use that function in each exported function:
      countries <- function() {
          url <- paste0(base_url(), "countries")
          httr::GET(url)
          etc...
      }

      That way if the url changes etc., you only have to change it in one place.

    • It's common practice to put these functions that get used in a lot of other functions in a file called utils.R or zzz.R.

  • If you're using the pkg::function() syntax, you don't need to use @import/@importFrom to add these to the NAMESPACE file. As long as the packages are in the Imports: section of the DESCRIPTION, you can use the functions directly using :: without including them in the package namespace. See Hadley's recommendations for this here: http://r-pkgs.had.co.nz/namespace.html#imports. Of course there are many differing opinions on this!

Getting and parsing content

  • Use httr::stop_for_status to throw an error when the request fails, and/or use httr::status_code if you want to check for Gateway Timeout (Error code 504) specifically. That way you don't have to grepl for the specific text "Gateway time-out".
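A minimal sketch of that idea (the helper name and message wording are mine, not from the package; in practice you would pass it httr::status_code(page)):

```r
# Hypothetical helper: turn an HTTP status code into an informative R error,
# treating the API's occasional 504 Gateway Timeout specially.
check_status <- function(status) {
  if (status == 504L) {
    stop("OpenAQ API gateway timeout (504); please try again in a few minutes.",
         call. = FALSE)
  }
  if (status >= 400L) {
    stop("OpenAQ API request failed [", status, "]", call. = FALSE)
  }
  invisible(status)
}
```

httr::stop_for_status(page) remains the simplest option when no special-casing is needed.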

  • httr::GET can take a named list of arguments to build the query. This can make constructing the call a bit cleaner, so that you don't have to paste0 them all together with &s. You can check the arguments near the top of the function, then construct a list of all arguments to pass to the query argument of GET. For example:

    url <- "https://api.openaq.org/v1/locations?"
    country = "IN"
    city = "Mumbai"
    location = NULL
    arg_list <- list(country = country, city = city, location = location)
    httr::GET(url, query = arg_list) # Any NULL elements of the list supplied to the query parameter are automatically dropped
  • In most (all?) cases you should be able to get the results from page <- httr::GET(query) more easily and directly by using httr::content(page, as = "text"), and then parsing from json to a data frame using jsonlite::fromJSON(), rather than parsing the columns individually with lapply. This should result in more concise code, and hopefully be quite a bit faster.

    • You could write a function to use everywhere for parsing content such as:
    get_content <- function(x) {
      content_type <- httr::headers(x)["content-type"]
      if (!grepl("application/json", content_type)) {
        stop("Unexpected content type")
      }
      cont <- httr::content(x, as = "text")
      parsed_content <- jsonlite::fromJSON(cont, simplifyDataFrame = TRUE, flatten = TRUE)
      parsed_content$results
    }
    
    cont <- get_content(page) ## cont will be a nice data frame
    • If you use the code above in latest(), the measurements column will be a list-column, with each row in the column containing a data frame with columns: parameter, value, lastUpdated, unit. To then expand the data frame returned from get_content() you can use tidyr::unnest_(cont, "measurements"), which will expand the measurements column and fill in the other columns appropriately.
  • For dplyr functions, it is probably best to use the standard evaluation (SE) versions of the functions (i.e., mutate_() instead of mutate()). See dplyr's NSE vignette (vignette("nse", package = "dplyr")) as a good resource to get started.

  • measurements:

    • I'm not sure you need to specify page=1 or set limit to 100 if it's NULL, as these are the default values for the API.

Tests

  • Generally, they look great! There's not a lot you can do to test actual values, but you could add some tests ensuring the dimensions of the returned tables are what you expect - number of columns, and especially number of rows when a limit value is supplied.
  • Good job having skip_on_cran for the functions that query the API.
  • Best practice is to have one context() call at the top of each file, so you could split the tests into multiple files named test-cities.R, test-countries.R, etc.
  • All tests passed for me, except those for latest, which almost always fail with a Gateway Timeout; as noted, this is likely an API issue. You could skip them for now until the API issue is sorted?

This is my first code review, so thanks for letting me take part! Let me know if you have questions or comments; I'm happy to discuss further.

@maelle


Member

maelle commented Jan 29, 2016

Thanks a lot @ateucher for your thorough and very useful review! I'll start working on this as soon as possible. It will make the package much better, thank you!

I guess there is no devtools function for easily changing the name of the package? I bitterly regret naming it too fast.

@ateucher


Member

ateucher commented Jan 29, 2016

@masalmon it was my pleasure - I really enjoyed doing it :) If there's anything you run into that you need help with, ping me in an issue and/or on twitter. Also, I am definitely not an expert on httr and parsing json content, so if any other rOpenSci people have advice that is better than mine please chime in (@sckott?)

@ateucher


Member

ateucher commented Jan 30, 2016

@masalmon, I should have linked to this in my review: Here's the httr vignette on writing API packages: https://cran.r-project.org/web/packages/httr/vignettes/api-packages.html
There's a lot of stuff on authentication that shouldn't apply to this package, but the other tips might be useful.

@maelle


Member

maelle commented Jan 30, 2016

@ateucher cool, and again, thank you! I hope you'll like the final result. ☺ I'll write a point-by-point answer, hopefully next week. I have started improving some things, but for some others I need to spend time reading (lazy evaluation, for instance) -- I am very grateful for the links you put in the review. Regarding httr and jsonlite, your advice has already made my code much simpler! My old solution was quite clumsy. Have a good weekend!

@sckott


Member

sckott commented Jan 30, 2016

nice work on the review @ateucher

@masalmon happy to answer any httr questions - one that comes to mind right now:

httr v1.1.0 has some changes https://cran.rstudio.com/web/packages/httr/

  • content() warns now if you don't set encoding, so use the encoding parameter when calling content()
@aammd


aammd commented Jan 31, 2016

review of Ropenaq

I really like this package! It's an elegant means of accessing a very interesting (and large!) online database. I have a few pieces of advice that I think will make the code more modular and easier to develop as it grows. And grow it will, because it is quite useful indeed!

general

I had quite a challenge trying to run this package! I am using Linux (Ubuntu 14.04), and for the longest time I could not get R to talk to the API. This was extra frustrating, because I was able to use the API from curl at the command line! I finally found the answer by posting this StackOverflow question. h/t @richfitz for his assistance.

To correct the problem I had to run

apt-get install libcurl4-openssl-dev

I also uninstalled then reinstalled curl:

install.packages("curl")

You can apparently find out if your user will be having this problem by running curl::curl_version()$ssl_version, and if it answers GnuTLS, then perhaps there should be a warning or a message directing them to this advice.
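Such a check could live in a startup hook; a sketch (hypothetical helper names, not part of Ropenaq):

```r
# Detect whether the installed curl is built against GnuTLS, which caused
# the connection problem described above.
uses_gnutls <- function(ssl_version) {
  grepl("GnuTLS", ssl_version, fixed = TRUE)
}

# Hypothetical startup message for the package:
.onAttach <- function(libname, pkgname) {
  ssl <- curl::curl_version()$ssl_version
  if (uses_gnutls(ssl)) {
    packageStartupMessage(
      "Your curl is built against ", ssl, "; connections to the OpenAQ API ",
      "may fail. See the README for a workaround."
    )
  }
}
```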

Function reviews

general thoughts on API functions

I really like how you've got five functions, each mapping to one of the five different API queries that are available. I was thinking that the names of the functions might be a bit similar to functions in other packages that also use geography. What do you think of prefixing every user-facing function with a package-specific string? Perhaps something like aq_ (i.e. countries() becomes aq_countries()). This is similar to another rOpenSci package, geonames, which uses GN (as in GNcities()). This has the added benefit of playing very nicely with RStudio, which will show all the package functions as completions when users type aq_ and hit tab.

I'd suggest using httr convenience functions more. This will help when you are assembling queries and checking responses:
- assembling queries would be easier if you use the query argument accepted by httr::GET(). For example, in cities.R, line 24, you could use this approach. It will build the API call for you, and handle the encoding too:

URL <- "https://api.openaq.org/v1/cities"

indonesia <- httr::GET(url = URL, query = list(country = "ID"))

Status codes are also easily parsed with httr::status_code(). For example, instead of grepl("Gateway time-out", toString(contentPageText)) , you could use httr::status_code(page) to find out if 200 (everything is OK), 504 ("Gateway time-out") etc. Then, you could write some logic to stop the functions with warnings, or messages, etc as appropriate.

For parsing responses, I'd suggest you try a two-step approach - first use grepl to check that the content of the response is json, then use jsonlite::fromJSON() to parse it. That is probably simpler and more robust than parsing by hand (e.g. as you do in cities.R L30 to L34):

## confirm that the response is what you expect:
grepl("json", headers(page)$`content-type`) # is there a better way?

## parsing content
contentPageText <- httr::content(page, as = "text")
jsonlite::fromJSON(contentPageText)

countries()

This function looks great to me, besides the general tips above I have nothing to add. The help file might mention that the codes for countries follow the "ISO 3166-1 alpha-2" specification. (er, that is, assuming that they actually do). In a future version of the package, perhaps you could include the package countrycode to allow more flexible naming of countries.

cities()

I really like how you are checking for available countries! I noticed that you check to see if a country is available in all datasets, and I suggest you write a little country-checking function for the purpose. Perhaps something like:

check_country <- function(x){

  ctrys <- as.character(countries()$code)

  is_country <- x %in% ctrys

  if(is_country){
    x
  } else {
    msg <- paste0("x must be one of ", paste0(ctrys, collapse = ", "))
    stop(msg)
  }
}

check_country("CA")

or perhaps something via assertthat :

assertthat::not_empty(country)
ctrys <- as.character(countries()$code)
assertthat::validate_that(country %in% ctrys)

If you rely on httr to handle proper URL encoding, perhaps you could do away with the cityURL column in this output? I think it could be distracting for users to see the country information appearing twice.

Ropenaq::locations

If Ropenaq is not loaded, the example for this function doesn't work:

Ropenaq::locations(city='Houston', parameter='co')
Error in Ropenaq::locations(city = "Houston", parameter = "co") : 
  could not find function "%>%"

you could fix this by adding:

#' @importFrom dplyr %>%

to the function's roxygen block.

You have a large chunk of if-else statements devoted to confirming that the inputs are actually available. You could accomplish a very similar result using match.arg:

match.arg("pm25", c("pm25", "pm10", "so2", "no2", "o3", "co", "bc"))

In fact, a lot of this function (Lines 34 to 154) is about checking the inputs. You could make it more succinct by doing something like this:

1. Make a list that grabs all the optional arguments (arglist <- list(parameter = parameter, has_geo = has_geo, value_from = value_from, value_to = value_to, etc.))
2. Get rid of any that are NULL, e.g. by doing Filter(arglist, f = Negate(is.null))
3. Check the remaining ones (using match.arg or assertthat)
4. If they are OK, pass them to httr::GET
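The four steps above might be sketched like this (function and argument names are illustrative; the parameter choices are the ones listed earlier in this review):

```r
# Hypothetical query builder: validate the optional arguments, drop the NULL
# ones, and return a list ready to hand to httr::GET(query = ...).
build_query <- function(country = NULL, city = NULL, parameter = NULL) {
  if (!is.null(parameter)) {
    parameter <- match.arg(parameter,
                           c("pm25", "pm10", "so2", "no2", "o3", "co", "bc"))
  }
  args <- list(country = country, city = city, parameter = parameter)
  Filter(Negate(is.null), args)  # remove the unused arguments
}
```

build_query(country = "IN", parameter = "pm25") then yields list(country = "IN", parameter = "pm25"), with the NULL city silently dropped.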

encoding
Wow, you're getting lots of different encodings in your results from the API! I ran the following:

brz <- Ropenaq::locations(country = "BR")
brz$location %>% stringi::stri_enc_mark(.)

And I found that you've got a mix of UTF-8 and ASCII encoding! I'm afraid I'm no expert in this stuff. I was looking at this SO question to try and figure out how it works. There should be a way to change the whole vector to the same encoding. Perhaps you could write the file out to a tempfile(), then read it back in?
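One lighter-weight option than the tempfile round trip (a sketch, base R only): enc2utf8() converts a whole character vector to declared UTF-8, and pure-ASCII strings pass through unchanged since ASCII is a subset of UTF-8.

```r
# Mixed ASCII / UTF-8 strings, like brz$location
x <- c("Sao Paulo", "S\u00e3o Paulo")
x_utf8 <- enc2utf8(x)
all(validUTF8(x_utf8))  # every element is now valid UTF-8
```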

Also, I know that this is probably the fault of the API and not your package, but the places named in brz$location are not places in the city of Sao Paulo, but cities in the state of Sao Paulo. Yet the resulting data frame has "Sao Paulo" listed for all rows in the column "city".

Vignettes

maps.Rmd

I wasn't able to actually get the vignette to build, until I went in and removed "NL".

When running the for loop at L31 to L33, you're using c() to grow a vector. That can have the unfortunate effect of slowing down your computation. I think it could be faster (and simpler) to grab all the location info like this:

country_codes <- Ropenaq::countries()$code

## create a list to hold output
locations_country <- vector(mode = "list", length = length(country_codes))
names(locations_country) <- as.character(country_codes)

for (country in country_codes) {
  print(country)
  locations_country[[country]] <- Ropenaq::locations(country = country)
}

dplyr::bind_rows(locations_country)

Rather than plucking out the lat and long columns, this downloads the location info and simply combines all the outputs with dplyr::bind_rows().

I only get this "Gateway time-out, but try again in a few minutes." error when I look for country=NL. Perhaps you could add tryCatch to this for loop, so it doesn't automatically fail when only one country does not work? tryCatch() could also go inside Ropenaq::locations() itself, to prevent it from failing in situations like these.

That said, I wonder if it is worth actually downloading this data every time the vignette is built. Could you instead download a version of the data and place it in data/, or perhaps R/sysdata.rda? The downloading code could still serve as a useful example (though not run, you could set eval=FALSE), and you'd still get your (very nice!) map

ggplot2 code: no need to add theme settings twice.

The resulting map is exquisite and really highlights just how much data is available from this API! I can see why you wanted to make it 😄

graphics.Rmd

This vignette is high quality: succinct, useful and attractive! Perhaps a more descriptive name for this (and other) vignettes? I think users are likely to have other packages with vignettes called graphics (or, come to think of it, maps).

ropenaq-vignette.Rmd

Great intro. I'd like it to be a little longer, including some general background on what air quality is, how it is measured, and who the people at the openaq project are. A link to their website (in addition to the API specification) would be helpful.

DESCRIPTION

A few small changes to this file would be helpful (and avoid those nasty notes on R CMD CHECK):

  • replace "Maintainer" and "Author" with Authors@R: person("Maelle", "Salmon", email = "maelle.salmon@yahoo.se", role = c("aut", "cre"))
  • GPL (version 2 or later) -> GPL (>= 2)
  • I got the note "Description should contain at least one complete sentence."
  • I couldn't find where you use the XML package. So you could remove it from here?
  • Imports is usually where you put the packages you need, rather than Depends

MISC

  • develop/ folder. This looks like some code that you're thinking of including either in the package or in some vignettes in the future (for example, I couldn't find findMissing2015() used anywhere outside of develop/finding_proportion_missing.R). I'd suggest working on these things on a development branch, just to keep the package a bit simpler.
  • I'd remove all the references to lintr, since that just tidies your code and doesn't affect the package at all.
@aammd


aammd commented Jan 31, 2016

I really enjoyed reviewing this package! Thank you so much @sckott for the invitation, and thank you @masalmon for contributing such an interesting package. I learned a lot while reading your code.
I'm also very sorry this is late, and I hope it is still useful 😊 😄

@maelle


Member

maelle commented Jan 31, 2016

Thank you so much @aammd, this will be very helpful! I'm looking forward to implementing your advice! It's interesting to see how you and @ateucher have spotted different things; the two reviews complement each other very well! 👍

I'm just wondering what you mean by references to lintr? I like having the lintr-bot tell me where my syntax is wrong each time I push something to GitHub. And if I leave it here and get contributions from other people later, I can also check their code for "syntax compatibility" with mine.

Regarding encoding: ropensci/ropenaq#4 Imagine a user wants to see which cities are available for Poland and then make a query with measurements for one of these cities. I provide cityURL because otherwise you couldn't just copy and paste into a query: with e.g. a French locale, the accents of Polish city names do not appear, so what you see in a result of cities() wouldn't be the real name of the city (missing accents), and therefore the httr translation wouldn't be right either. So now I expect the user to provide the encoded version as argument, and I help him/her by providing the cityURL and locationURL columns in all tables that have city and location. I had no better idea. 😕

Anyway, I'll start working on the package again next week and I already know it will get better thanks to both reviews, thanks again @ateucher and @aammd !

And thanks @sckott for the httr remark !

@maelle


Member

maelle commented Feb 3, 2016

Ropenaq -- point-by-point answer to the reviewers

Thanks again for your very helpful and thorough reviews, @ateucher and @aammd! When I'm a (R) grown-up I want to be an rOpenSci package reviewer too and make people happy and improve their code 😃

I'll answer both of you in the same reply to avoid repeating myself. I was not sure how to structure my answer, so I hope it'll be easy to read -- I thought that quoting each remark in italics above its answer would be too cumbersome. Both reviews were well structured and easy to read! Please tell me if I missed something.

General comments

  • I was surprised too that there was no OpenAQ R package yet, but the Open AQ platform is fairly new. I asked them whether they had any plan to add an R package before starting to write one, because writing a package that already exists = 😬
  • Regarding the API, @jflasher and @RocketD0g are very patient and helpful so it's been a pleasure to work with them, although the API is not mature.
  • If you know air quality scientists who have suggestions for new data sources for the platform, you can give them this link: https://github.com/openaq/openaq-fetch/issues where they can open an issue for suggesting the new data source.
  • I think I have fixed my DESCRIPTION file and package structure to pass R CMD check (except a warning for the vignettes that do not have the corresponding html -> is this an issue?). I only had two files in develop/, and I now have a develop branch; I will see if I use it. One of the two files has already become a vignette presenting how to use the R package openair with Open AQ data.
  • I have added @aammd's installation troubleshooting to the README. It must have been quite frustrating to have to deal with it! Thanks for spotting this! Is the README the best place for such advice?

rOpenSci Packaging Guidelines

  • I'm keeping the change of name (to ropenaq) for the end. Is there any clever way to change the name of the package in all files? Could it be problematic for the github repo (although it is only a change of case)?
  • The names of the functions now all start with "aq_".
  • I now use Imports instead of Depends.

DESCRIPTION file

I have made all changes.

Dependencies

  • I have made all the changes, including deleting XML. It was nice to learn about the difference between Imports and Depends. I think I've seen a very thorough article about it somewhere, but I always thought I was going to read it later, and later never came.
  • The vignette now uses ggmap. The result is not as nice as with OpenStreetMaps, but I agree on it being a heavy dependency, and I mention the possibility of using OpenStreetMaps.
  • I now import "%>%" from dplyr so that Ropenaq::aq_locations(city='Houston', parameter='co') works even if the package is not loaded.

Function Usage and Documentation

  • I have changed all return descriptions so that the data.frames are presented as such.
  • In all help files there is now a list of the columns with their meaning and class.
  • In the help file of country I have written the ISO specification of the country code and that only the code, not the name, can be used. Is this clear enough or should I write this in all help files? In all help files the argument country is described as "Limit results by a certain country – a two-letter code; see countries() for finding the code based on the name.".
  • I have tried to better explain country/city/location in the details section of the aq_locations function. But I'm not sure it's clear.
  • I think that the possibility to query several parameters, or even several countries, should be a feature in a future version. It might be very useful indeed! I also need to find a good way to deal with very big aq_measurements queries. I've added the page argument that I had somehow neglected, and a warning for when the number of found measurements is higher than limit. The warning returns the number of pages to be queried (this warning is printed no matter the value of page). I am not sure I should do the loop inside the function, because if the user e.g. writes aq_measurements() it would last forever, so I prefer to let the user write their own loops for now. Any suggestions? Is this something for a future version or should it be dealt with now? Should I add a verbose argument for turning warnings off?
  • I am now more precise in the description of the date_from and date_to arguments.
  • The latest endpoint of the API does not have any limit argument (I had to ask).
  • Regarding encoding / having the URL-encoded columns for city and location: as I said in a previous comment, it was the easiest solution I found for avoiding locale problems and helping the user make queries for e.g. Poland, but I'll be happy to hear any suggestions. I've noticed that if I paste a city name with Polish accents into my R session, whether at home (French locale) or at work (Spanish locale), some accents disappear, which is why I created the columns.
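On the paging question a few points up, the user-side loop could be wrapped in a small helper; a sketch (names are mine, and fetch_page stands in for a closure around aq_measurements()):

```r
# Hypothetical paging helper: request pages until a short or empty page
# signals the end, then bind the results together.
fetch_all_pages <- function(fetch_page, limit = 100) {
  pages <- list()
  page_no <- 1
  repeat {
    res <- fetch_page(page = page_no, limit = limit)
    if (nrow(res) == 0) break
    pages[[page_no]] <- res
    if (nrow(res) < limit) break  # last (partial) page reached
    page_no <- page_no + 1
  }
  do.call(rbind, pages)  # or dplyr::bind_rows(pages)
}
```

With the package this might be called as fetch_all_pages(function(page, limit) aq_measurements(country = "IN", page = page, limit = limit)).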

Source Code

  • I check inputs very thoroughly because I've seen a remark about it in an old Ropensci package review, I think. 😉
  • I have added a utils.R file and all function files have become much smaller. In utils.R there are base_url; buildQuery, which returns a list of arguments for httr::GET after checking them; getResults, which returns a table of results that still needs to be transformed; the three functions functionURL, addCityURL and addLocationURL for adding the columns with the URL-encoded names; addGeo, functionGeo and functionNotGeo for adding the latitude and longitude columns; functionTime and functionTime2 for transforming the columns with times; and functionParameters for transforming the column with the parameter names for aq_location. I was first happy with all the functions, but then I started using standard evaluation instead of non-standard evaluation... and now I have the impression that I have written very ugly code, so I'd be grateful for suggestions.
  • I actually like both importFrom for having the "list" of functions in the NAMESPACE and pkg:: which makes my code easier to read (at least for me). Is it ok if I keep both?
  • I now make a better use of httr and jsonlite:
# does the query and then parses it
getResults <- function(urlAQ, argsList){
  page <- httr::GET(url = urlAQ,
                    query = argsList)

  # convert the http error to a R error
  httr::stop_for_status(page)
  contentPage <- httr::content(page, as = "text")

  # parse the data
  resTable <- jsonlite::fromJSON(contentPage)$results
  resTable <- dplyr::tbl_df(resTable)
  return(resTable)
}

The suggestions from both of you made it all very easy! I just wonder whether a simple stop_for_status is user-friendly enough?
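If stop_for_status turns out to be too terse, one possible alternative (a sketch only; the function name getResultsVerbose and the wording are mine, not part of the package) is to build a more descriptive error before stopping:

```r
# Sketch: wrap the status check to give users a friendlier error
# message than the bare HTTP status. Assumes httr, jsonlite and dplyr
# are available, as in the original getResults.
getResultsVerbose <- function(urlAQ, argsList) {
  page <- httr::GET(url = urlAQ, query = argsList)

  if (httr::http_error(page)) {
    stop("The OpenAQ API request failed with status ",
         httr::status_code(page),
         ". Please check your query arguments or try again later.",
         call. = FALSE)
  }

  contentPage <- httr::content(page, as = "text")
  dplyr::tbl_df(jsonlite::fromJSON(contentPage)$results)
}
```

The `call. = FALSE` keeps the internal function call out of the error message, which tends to read better for end users.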

  • I have added a check for the size of the table resulting from the query in 4 functions (all except countries). If the table has zero rows, the function returns an empty table and a warning. This was not mentioned in the reviews but I realize I had no good message when it happened.
  • @aammd, are you happy with the buildQueries function or do you think it would be easier to read/better if I used assertthat?
  • I now use tidyr in aq_latest.
  • I read the part about NSE/SE in Hadley Wickham's Advanced R book and the vignette you pointed me to, as well as this article http://www.numbertheory.nl/2015/09/23/using-mutate-from-dplyr-inside-a-function-getting-around-non-standard-evaluation/ but I still feel my code is really clumsy. It works (well, maybe it's not even clever SE!) and it uses the underscore-suffixed versions of all dplyr functions, but it seems too complicated. I'll be happy to get any help.
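For what it's worth, one way to sidestep the NSE/SE contortions entirely (a generic sketch, not specific to this package) is plain base-R column assignment with `[[`, which behaves the same inside and outside functions:

```r
# Sketch: add a computed column using column names held in character
# variables, without touching dplyr's NSE machinery. All names here
# are illustrative.
add_scaled_column <- function(df, source_col, new_col, factor) {
  df[[new_col]] <- df[[source_col]] * factor
  df
}

df <- data.frame(value = c(1, 2, 3))
df <- add_scaled_column(df, "value", "value_scaled", 10)
```

This is less elegant in a pipeline, but it avoids the underscore-suffixed dplyr verbs altogether for simple column transformations.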

Tests

  • There is now a test file for each context.
  • I commented the failing aq_latest test. I promise I don't usually do this when I get a testthat error!
  • I don't know how to test my check for the results table being empty: I don't know how to systematically write a query that will yield an empty table. And now code coverage is only 94%. 😆
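One way I can imagine exercising that branch without depending on live data (a sketch only; the internal function name, the exact with_mock syntax, and the warning behaviour are all assumptions on my part) is to stub the function that fetches results so it returns zero rows:

```r
# Sketch: stub the internal results-fetching function so it returns an
# empty table, then check that the exported function warns as expected.
# 'getResults', 'aq_locations' and the .env value are assumptions for
# illustration; testthat::with_mock was available at the time.
library(testthat)

test_that("an empty result set returns an empty table with a warning", {
  with_mock(
    getResults = function(...) dplyr::tbl_df(data.frame()),
    expect_warning(aq_locations(country = "FR")),
    .env = "ropenaq"
  )
})
```

This would also bring the empty-table branch back under coverage without having to guess a query that happens to match nothing.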

lintr

  • Why remove the references to lintr?

Vignettes

  • There are now four of them: the general one, the one with the graphics for one day in several places, the one with the map, and one for showing how to use the great openair package on the retrieved data.
  • I have changed their filenames and titles, are they ok now?
  • The vignettes now all build. Do you get any problem with them?
  • The vignette creating the map now uses your code suggestion, @aammd, thanks a lot! And I saved the data in a data/ folder in the vignettes folder, is it good practice or should the data be elsewhere?
  • As I mentioned earlier now this vignette uses ggmap instead of OpenStreetMaps.
  • I was not sure which information about air quality to include in the general vignette. I have added a sentence about Open AQ that I copied from their website, and a link to said website. What else would you like to see here, @aammd? Maybe I could ask @jflasher and @RocketD0g where I could find a concise description.

I hope I haven't missed anything. This review process is really very interesting. I can see that the package is getting better, and I can apply nearly everything I learnt to other R code/packages I have to write, so again, many thanks!

PS: How can I acknowledge the reviewers' input in the DESCRIPTION file, @sckott ?

@maelle

Member

maelle commented Apr 12, 2016

Just giving some news about the package:

  • The API recently made a few changes: now all endpoints have a limit and a page argument, so I need to change a few lines of code in the parts where I check that e.g. a country exists if the user inputs the country as an argument. My package still works because I added the arguments in the functions, but it's still missing this checking step.
  • I have started to add memoization with memoise.
  • @aammd and @ateucher I'd like to add you as package authors because of all the useful feedback, is it ok for you?

I'll come write a comment again when I'm done with the modifications because of the API change and with memoise (a few weeks at most). I hope that the editor @sckott is ok with this "strategy"!

@sckott

Member

sckott commented Apr 12, 2016

@masalmon yes, sounds good

@noamross

Collaborator

noamross commented Apr 12, 2016

Note that there is a specific semantic for adding reviewers to the Author
field of the DESCRIPTION file. We've just added it to our recommendations
in the packaging guide:
https://github.com/ropensci/packaging_guide/blob/master/README.md#authorship


@maelle

Member

maelle commented Apr 16, 2016

I have now prepared a new version of the package. Important things to note:

  • I have not changed the package name... yet? Is it compulsory? I feel bad about the capital R, but I kind of wish there were a magical solution for changing the name everywhere. But hey, if I have to I will!
  • All endpoints have now page and limit arguments.
  • The output of all functions is now a list with 3 data.frames: one for results, one with metadata, one with timestamps (I have updated the doc, readme and vignettes accordingly). So now the metadata includes the number of results found on the API, which can be larger than the limit argument -> in this case the user has to compare the number of results found with the limit argument, and then loop over several pages. As I said previously, in a future version I'd like to add better tools for getting a lot of data, but on the other hand, for really big files OpenAQ has csv dumps. So maybe the future-version thing is better documenting the "problem".
  • I have now decided against caching because the results change very often.
  • I still have both "::" and importFrom in some cases, I think... But I quite like the "::" -> does it make the code much slower? Is it really bad style?
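With the new three-part return value, a user-side loop over pages might look like this (purely a sketch; it assumes the meta data.frame has a found column, that limit and page behave as described, and that results are data frames that can be row-bound):

```r
# Sketch: fetch all pages of a query by comparing the number of
# records found on the API against the per-page limit. The argument
# names and the meta$found column are assumptions for illustration.
limit <- 1000
first <- aq_measurements(country = "IN", limit = limit, page = 1)
n_pages <- ceiling(first$meta$found / limit)

all_results <- first$results
if (n_pages > 1) {
  for (p in 2:n_pages) {
    more <- aq_measurements(country = "IN", limit = limit, page = p)
    all_results <- dplyr::bind_rows(all_results, more$results)
  }
}
```

Documenting a loop like this in a vignette, alongside a pointer to the csv dumps for really large downloads, might be enough for a first version.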

Now I'll wait for feedback in order to finally make the package ready for transfer... and when the API becomes stable I'd like to submit it to CRAN which makes me a bit nervous.

And again, thanks a lot for the previous feedback!

@aammd

aammd commented Apr 16, 2016

@masalmon I would be delighted to be listed as a reviewer on this package! It's a really cool package and I enjoyed studying it.

I would love to comment on the changes you've made -- I apologize for saying little in response to your awesome & extensive revisions. I confess it felt a little overwhelming! @sckott , what is the best way to go forward? Should I write another comment here?

@sckott

Member

sckott commented Apr 16, 2016

I have not changed the package name... yet? Is it compulsory?

We don't make anyone change pkg name, but we do find it easier for users if they don't have to worry about case. If you change the name, github will redirect users automatically to the new repo name masalmon/Ropenaq -> masalmon/ropenaq

what is the best way to go forward? Should I write another comment here?

@aammd if you have time, yes please! We realize your time is valuable, so comments don't have to be extensive

@maelle

Member

maelle commented Apr 16, 2016

Ok so changing the name will be my next step! 😇 I was smarter with my next package.

@jennybc

Member

jennybc commented Apr 16, 2016

Let me pre-emptively warn you about a real headscratcher re: travis:

http://stackoverflow.com/questions/31107761/encrypting-a-file-on-travis-for-a-github-repo-that-was-renamed

TL;DR manually change the travis slug in .git/config and save yourself hours of frustration.

@maelle

Member

maelle commented Apr 17, 2016

Thanks @jennybc, that's thoughtful! I'm quite lucky because this API does not require authentication; however, I'll be careful if something bad happens on Appveyor (for which I encrypted my GitHub PAT and put the secret in the config file) since it might have similar issues, I guess.

@maelle

Member

maelle commented Apr 17, 2016

It seems it was much easier than I had envisioned... So now this whole issue deals with a package called ropenaq.

@aammd you're listed as a reviewer now in the DESCRIPTION.

@jennybc

Member

jennybc commented Apr 17, 2016

Based on a single repo of mine, it doesn't look like Appveyor puts info into .git/config, with which to confuse itself later, should you rename things. Maybe this is unique to travis.

@ateucher

Member

ateucher commented Apr 17, 2016

Hi @masalmon, sorry for being so silent lately! I would be thrilled to be added as a reviewer, thank you! I've been lurking and watching some of your work, and I think it looks awesome! Unfortunately I don't think I'll have time to do much more review on it in the short term, but if I have a chance I will peek in, as I am very interested and I know I can learn a lot from your work.

@maelle

Member

maelle commented Apr 20, 2016

@sckott, how should I proceed?

I re-read my answers. My main doubts are now:

  • a few documentation things (whether I explain city/country/location well, and whether the few words about air quality in the readme/general vignette are enough)
  • whether I should get rid of all "::" if I use importFrom in the same file (I like having the "::" for seeing what I used, but I know it makes computation slower... but not that much slower?)

For a future version of ropenaq my goals are:

  • implement a few air quality index functions.
  • make it easier to get lots of data, probably by documenting how to do this with the package versus how to go to the website and get csv dumps.

When the API is stable I want to make the package ready for CRAN submission.

@sckott

Member

sckott commented Apr 20, 2016

@masalmon I think we're ready to approve and have you move over your repo. I've marked it as approved. I'll close this issue, and move over to ropenscilabs org - let me know if you need help with that.

@sckott sckott closed this Apr 20, 2016

@maelle

Member

maelle commented Apr 21, 2016

@sckott awesome, thank you!

The "Transfer ownership" button is in the danger zone of settings which makes me a bit nervous, ah, but it seems pretty easy.

Since I'm a member of ropenscilabs do I already have the right to transfer a repo there? When would the repo be transferred to ropensci? I'm a bit lost with all these names 😁

@sckott

Member

sckott commented Apr 21, 2016

I think you should be able to - try it and if it doesn't work I'll fix some things.

We started the 2nd organization ropenscilabs for earlier stage packages/projects - once more mature they'll move to ropensci - more mature is not a very specific definition - so can't give answer yet on what that means - but at some point we'll move to ropensci

@maelle

Member

maelle commented Apr 21, 2016

@sckott ok, thanks for explaining, now I'll transfer it! 😁

@maelle

Member

maelle commented Apr 21, 2016

"Moving repository to ropenscilabs/ropenaq. This may take a few minutes."

@maelle

Member

maelle commented Apr 21, 2016

Cool I can see the redirection, so this is my last message here. 😸

@noerw noerw referenced this issue Jan 15, 2018

Merged

Add tests and CI integrations #17
