rdefra package #68
Editor checks:
@cvitolo Thanks for your submission!
Reviewers: @masalmon @haozhu233
Hi Scott, many thanks for looking at my package. Kind regards. (On 9 August 2016 at 00:00, Scott Chamberlain notifications@github.com)

thanks @cvitolo - reviewer assigned - seeking one more
@cvitolo I have a first question: why do you use a sub-directory for the package itself in the repository? I'm not saying it's wrong, I'm just wondering. :-)
@masalmon I use a sub-directory for the package so that, in the parent folder, I can have the package itself and also add all the work (e.g. experiments, presentations, demos, tutorials, posters, papers, etc.) related to the package. I could move everything to the package folder and add the additional files to .Rbuildignore, but I personally prefer to leave the package folder as clean as possible. It is just a personal preference, but I'm happy to adapt to best practices if needed.
@cvitolo Ok, thank you for your quick answer. :-) I see some commented code such as

```r
# library(RCurl)
# library(XML)
# uka_id <- "UKA00399"
# uka_id <- "UKA15910"
```

or

```r
# library(RCurl)
# library(XML)
# library(plyr)
# site_id <- "ABD"
# years <- 1972:2014
# Site with list of flat files
# rootURL <- "http://uk-air.defra.gov.uk/data/flat_files?"
# myURL <- paste(rootURL, "site_id=", site_id, sep = "")
# download html
# html <- getURL(myURL, followlocation = TRUE)
# parse html
# doc = htmlParse(html, asText=TRUE)
# hrefs <- xpathSApply(doc, '//*[@id="center-2col"]', xmlGetAttr, 'href')
# Otherwise
# html <- paste(readLines(myURL), collapse="\n")
# library(stringr)
# matched <- str_match_all(html, "<a href=\"(.*?)\"")
```

Why is it commented? If it's not useful any longer, maybe you could delete it?
@masalmon Thanks for spotting that. I have just removed all the old commented code and committed the changes. |
@cvitolo how long does it take for the vignette to build on your PC? It's taking ages on mine (but it seems to work, just slowly!). |
@masalmon It can take a few hours because it is parsing lots of HTML pages.
@cvitolo oh, maybe the examples are a bit too long for a vignette then? 😄 Maybe you could do the same for one of the countries only (e.g. Northern Ireland)? Now I have the built vignette. ✨ |
General
Code
```r
message("Please insert a valid year (or sequence of years).")
```

You should write `stop("Please insert a valid year (or sequence of years).")` instead, so the function actually halts on invalid input rather than just printing a note.

```r
df <- df[, -col2rm]
df$site_id <- site_id
```

Would become

```r
df <- dplyr::select_(df, quote(-col2rm))
df <- dplyr::mutate_(df, lon = lazyeval::interp(~site))
```

Or possibly something easier. I'm not a standard-evaluation expert, maybe someone has better arguments? Or is it even best practice for rOpenSci? There's a vignette in dplyr about NSE (https://cran.r-project.org/web/packages/dplyr/vignettes/nse.html) and a chapter in Hadley's Advanced R book (http://adv-r.had.co.nz/Computing-on-the-language.html).
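To make the `stop()` vs `message()` difference concrete, here is a minimal sketch of the suggested validation (the helper name `check_years` and the cut-off year are made up for illustration, not taken from the package):

```r
# Hypothetical validator: stop() raises an error and halts execution,
# whereas message() only prints a note and lets the function continue.
check_years <- function(years) {
  if (!is.numeric(years) || any(years < 1900)) {
    stop("Please insert a valid year (or sequence of years).", call. = FALSE)
  }
  invisible(years)
}

check_years(1972:2014)   # passes silently
# check_years("ABD")     # would raise an error and abort the caller
```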
Dependencies
```r
url <- "http://uk-air.defra.gov.uk/networks/find-sites?"
closed <- FALSE
pollutant <- 9999
arg_list <- list(closed = closed, pollutant = pollutant)
httr::GET(url, query = arg_list)
# Any NULL elements of the list supplied to the query parameter
# are automatically dropped
```
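As a quick illustration of that NULL-dropping behaviour, a sketch (this performs a real HTTP request against the URL above, so treat it as illustrative only):

```r
library(httr)

url <- "http://uk-air.defra.gov.uk/networks/find-sites?"
# pollutant = NULL is silently dropped when httr builds the query string,
# so only closed=FALSE is sent to the server
resp <- GET(url, query = list(closed = FALSE, pollutant = NULL))
```

So instead of passing a sentinel value like 9999, you can pass NULL for parameters you want omitted from the request.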
Readme and vignette
Documentation
Tests
I don’t know how much time doing this took me, I had parallel tasks and I won’t count the vignette building, ahah. I’d say a few hours? I’m happy to help if you have any question! |
thanks @masalmon! Can you give an estimate of the time it took to do the review?
@sckott not sure, see above, 2-3 hours?
thanks! |
@masalmon Thanks a lot for your prompt and thorough review. I have already simplified the vignette. I will work on the other changes once I receive the second review. Many thanks again!!! |
First, I really like the idea of this package as it provides R users a relatively easy way to access the UK-AIR data, especially when the data source does not have a public API. Great job, @cvitolo! :) Also, thank you, @masalmon, for providing such a thorough review for this package. Your review covers many points I wanted to make. I'll try not to overlap our points too much. ;)

Comments
In the sample code,
The review process took me around 3 hours |
@haozhu233 thanks for your review! @cvitolo both reviews are in, continue conversation here and let us know if you have any questions. |
@sckott thanks for finding reviewers. @haozhu233 @masalmon Thanks for reviewing my package so quickly. I'll start working on this right now. |
rdefra - response to reviewers

Many thanks again for reviewing my package. Below are my responses to reviewers' comments.

Reviewer 1 (@masalmon)

General
Code
Dependencies
Readme and vignette
Documentation
Tests
Reviewer 2 (@haozhu233)
I couldn't find a solution for |
rdefra - response to reviewers (step 3)

Thanks @masalmon for giving another look at the package, I greatly appreciate your comments! Below is a summary of the latest changes I made based on your suggestions.
One last question from me: I was thinking of moving the package to the root directory (now it's in a subfolder) as I noticed that that is the easiest way to get appveyor to work. The side effect is that the repository name will change and the link to the old releases will be lost. An option would be to create a separate repository. What is your view on that? Thanks again! |
```r
library("rdefra")
library("lubridate")
library("ggplot2")
library("dplyr")

years <- 2013:2015
df <- ukair_get_hourly_data("MY1", years)
df <- mutate(df, year = as.factor(year(datetime)),
             year_day = yday(datetime))

df %>%
  group_by(year, year_day) %>%
  summarize(ozone = mean(Ozone)) %>%
  ggplot(aes(year_day, ozone)) +
  geom_line() +
  geom_smooth() +
  facet_grid(year ~ .) +
  ylab(expression(paste("Ozone concentration (", mu, "g/", m^3, ")")))
```

but it's not really prettier. 😄 I think it's quite hard to see seasonality. Maybe try it with another pollutant?

For your last question ("side effect is that the repository name will change and the link to the old releases will be lost"), I have no idea... @sckott ?
I have tried downloading more years and I see the seasonality now (sorry, it made me curious):

```r
library("rdefra")
library("lubridate")
library("dplyr")
library("ggplot2")

years <- 2000:2015
df <- ukair_get_hourly_data("MY1", years)
df <- mutate(df, year = as.factor(year(datetime)),
             year_day = yday(datetime))

# (same plotting code as above) +
#   ylab(expression(paste("Ozone concentration (", mu, "g/", m^3, ")")))
```

Maybe you could comment on the later years which do not have the "nice" spike? Although there isn't really a downward trend.
rdefra - response to reviewers (step 4)

@masalmon
```yaml
output:
  md_document:
    variant: markdown_github
    toc: true
```

as header?
- Maybe add a link to the parallel package the first time you mention it
- Write "the user access" -> "the users access"
- Nice new plot with the boxplots! It just lacks an x-axis label, "Month of the year".
- Your meta should include "Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms."
I ran `output <- ukair_get_hourly_data("ABD", 2000:2014, keep_units = TRUE)` and then saw that the units data frame has 117 lines, some of them without a unit.
hi @cvitolo, for your question:

You can do this. If you change the repository name, any attempts at going to the old URL will be redirected to the new URL. And I +1 this move of the pkg to the root directory 👏 just makes things easier.
I totally agree with you, it is not practical! I am going to ask for help on stackoverflow or contact appveyor support.
Sorry about this. I found out that the output can be set to
Ok, that's done
Done
Done
Done
I did try that but it did not work for me :(
You can look at the full log here.
You are right! Setting the attribute in
The table now has a year column. It is possible that, over time, sensors will be replaced with new ones (with higher resolution, and therefore different units?). My guess is that if this happens, the new sensor gets a new SiteID. I can check with DEFRA. In the meantime I think it is safer to keep the full table. Also, I wouldn't remove rows with empty units, otherwise users won't know whether the units are missing or the function failed to retrieve all the metadata.
Done. Thanks for all your suggestions! Again, your help was precious! |
So Travis can't install gdal, is that the issue? |
@cvitolo I see that

Then regarding
I also wonder whether you should make |
An example of package that has |
Travis can install a binary R package like rgdal, but you need `sudo: required` in your .travis.yml.

Working example is here: Travis can also build GDAL from source, see Edzer's .yml here: https://github.com/edzer/gstat/blob/master/.travis.yml

So, I guess that's three ways to do it if you include the apt-get route. spTransform() is defined in sp, but it's only usable if rgdal is also available. Cheers, Mike. HTH!
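For reference, a minimal sketch of the apt-get route (the ubuntugis PPA and package names are assumptions, not taken from this thread; check the current Travis docs before relying on it):

```yaml
# .travis.yml sketch — install GDAL/PROJ system libraries before building
language: r
sudo: required
before_install:
  - sudo add-apt-repository -y ppa:ubuntugis/ppa
  - sudo apt-get update -q
  - sudo apt-get install -y libgdal-dev libproj-dev
```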
I'll add that I'm also happy to help get travis / rgdal working if you have questions. Generally, rgdal is easily available on Windows and Mac, but it's a bit
Yes I don't have other remarks (hopefully all potential users will be able to install gdal or ask for help). Nice job, and welcome back from Travis hell 😉 |
Yes, I also believe that this package is ready. Great job, @cvitolo! :D |
Thanks everyone @cvitolo just taking a quick look over before approving |
Great work! A few items before approving:

```r
ukair_get_coordinates(5)
#> Error in xml2::xml_find_all(page_content, "//*[contains(@id,'tab_info')]")[[2]] :
#>   subscript out of bounds
```

use e.g.,

```r
foo <- function(ids) {
  UseMethod("foo")
}

foo.default <- function(ids) {
  stop("no 'foo' method for ", class(ids), call. = FALSE)
}

foo.character <- function(ids) {
  # do stuff
  ids
}

foo.data.frame <- function(ids) {
  # do stuff
  ids
}
```

which gives you nice failure behavior, for example:

```r
foo(5)
#> Error: no 'foo' method for numeric
```

You could alternatively do the checking for correct class type internally without the above example, but the above does make it easy and clear to say what happens for each input class.

I think the other exported functions in the package are fine.
@sckott Thanks for having another look at the package! I have made the changes you suggested:
Looks good to me. Approved! Your vignettes still need a few pieces:
To move your repo:
Please do follow these guidelines moving forward with your package:
Are you interested in doing a guest blog post on our site? http://ropensci.org/tech-notes/ If so, I'll get in touch with more details |
Thanks @sckott, I'll follow your suggestions! I have made the changes above and tried to transfer the repo to ropenscilabs but I get this message:
Also, should I change all the links in the badges as well (e.g. travis, codecov)? Happy to do a guest blog post, please send me more details. |
Adding a note for reference: rdefra went through JOSS and rOpenSci review in parallel, having been submitted before our joint-review workflow was in place but approved after. The rOpenSci review can be found here: openjournals/joss-reviews#51 |
@cvitolo You should get an invitation via email to a team on
Yes. Once the repo is transferred, we can then turn on Travis and Codecov for your repo.
I'll send an email to you |
Based on issue at onboarding-meta/#68
Summary
The rdefra package retrieves air pollution data and metadata from the Air Information Resource (UK-AIR) of the Department for Environment, Food and Rural Affairs in the United Kingdom. UK-AIR does not provide a public API for programmatic access to data, therefore this package scrapes the HTML pages to get relevant information.
https://github.com/kehraProject/r_rdefra
Scientists and researchers interested in air pollution data and epidemiologists.
The openair package (https://github.com/davidcarslaw/openair) accomplishes similar things but relies on a local, compressed copy of the data on servers at King's College (UK), periodically updated. I have used the openair package myself in the past and it is an excellent package (for data retrieval and visualisation), but I had trouble with King's College server downtime.
The rdefra package, instead, retrieves the information directly from the original source with the advantage that users always get the most complete information at any time. This package also integrates a function to retrieve missing coordinates from the standard metadata.
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md with a high-level description.

http://dx.doi.org/10.5281/zenodo.59851
Detail
Does `R CMD check` (or `devtools::check()`) succeed? Paste and describe any errors or warnings:

R CMD check succeeds. There were no ERRORs, WARNINGs or NOTEs.