Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FedData package in R #13

Closed
2 tasks done
bocinsky opened this issue May 6, 2015 · 31 comments
Closed
2 tasks done

FedData package in R #13

bocinsky opened this issue May 6, 2015 · 31 comments

Comments

@bocinsky
Copy link

bocinsky commented May 6, 2015

    1. What does this package do?

Allows for automated geospatial querying and downloading of raw data from several federated databases.

    1. Paste the full DESCRIPTION file inside a code block (bounded by ``` on either end).
Package: FedData
Type: Package
Title: Functions to Automate Downloading Geospatial Data Available from Several Federated Data Sources
Version: 1.1.0
Date: 2015-05-06
Author: R. Kyle Bocinsky <bocinsky@gmail.com>
    Dylan Beaudette <Dylan.Beaudette@ca.usda.gov>
Maintainer: R. Kyle Bocinsky <bocinsky@gmail.com>
Description: Functions to automate downloading geospatial data available from several federated data sources (mainly sources maintained by the US Federal government). Currently, the package allows for retrieval of five datasets: The National Elevation Dataset digital elevation models (1 and 1/3 arc-second; USGS); The National Hydrography Dataset (USGS); The Soil Survey Geographic (SSURGO) database from the National Cooperative Soil Survey (NCSS), which is led by the Natural Resources Conservation Service (NRCS) under the USDA; the Global Historical Climatology Network (GHCN), coordinated by National Climatic Data Center at NOAA; and the International Tree Ring Data Bank. Additional data sources are in the works, including global DEM resources (ETOPO1, ETOPO5, ETOPO30, SRTM), global soils (HWSD), MODIS satellite data products, the National Atlas (US), Natural Earth, PRISM, and WorldClim.
License: GPL-3
Depends: R (>= 3.1.0), sp, rgdal, raster, RCurl
Imports: rgeos, igraph, data.table, devtools, soilDB
Suggests: SSOAP, XMLSchema
Repository: CRAN
Additional_repositories: http://www.omegahat.org/R
NeedsCompilation: no
    1. URL for the package (the development repository, not a stylized html page)

https://github.com/bocinsky/FedData

    1. What data source(s) does it work with (if applicable)?

The National Elevation Dataset (USGS), National Hydrography Dataset (USGS), SSURGO soils database (USGS), Global Historical Climatology Network (NOAA), and the International Tree Ring Databank (NOAA).

    1. Who is the target audience?

Researchers, government employees/land-managers, and anyone else interested in accessing these databases.

    1. Check the box next to each policy below, confirming that you agree. These are mandatory.
  • This package does not violate the Terms of Service of any service it interacts with.
  • [?] The repository has continuous integration with Travis and/or another service
  • [?] The package contains a vignette
  • The package contains a reasonably complete readme with devtools install instructions
  • [?] The package contains unit tests
    1. Do you agree to follow the rOpenSci packaging guidelines? These aren't mandatory, but we strongly suggest you follow them. If you disagree with anything, please explain.
  • [...] Are there any package dependencies not on CRAN?
    Two not on CRAN, but available from http://www.omegahat.org/R.
  • [YES] Do you intend for this package to go on CRAN?
  • [YES] Does the package have a CRAN accepted license?
  • [NO] Did devtools::check() produce any errors or warnings? If so paste them below.
    1. Please add explanations below for any exceptions to the above:

I'm a fairly novice programmer, so [as far as I know] I don't use Travis CI or unit tests. I've not yet written a vignette (it's on my to-do list).

    1. If this is a resubmission following rejection, please explain the change in cirucmstances.

I think others have suggested FedData to rOpenSci, but the latest version is far more stable and platform-agnostic.

@sckott
Copy link
Contributor

sckott commented May 7, 2015

@jooolia assigned

@jooolia
Copy link
Member

jooolia commented May 18, 2015

Review and Comments:

Overall looks like a useful package for navigating downloading and parsing medium to large sets of data from some US government data collections.

Overall all of the functions work, however, I think there can be significant improvements made to the examples and to the documentation that would facilitate the use of the package by users. I have tried to provide general comments with some specific examples in this review.

Installation:

In README.MD the authors indicate that they have installed the package successfully on Mac OS, Windows and Linux systems. It would be useful to have more information on this and/or a link to helpful information. I easily installed the package on my Windows 7 machine, but after many frustrating hours gave up trying to install the package on Linux Fedora 19 (even with GEOS, etc installed). It is nice that there is code to help install the package for Mac users, but this could be expanded to include other OSs.

Specific comments about the scripts and functions:

  • Nice naming scheme is presented in the FedData.Rd explaining get(), download() and extract() families of functions.
  • Perhaps there is a reason for it, but there are some inconsistencies in the default values for the functions. Sometimes label=NULL, sometimes the user is required to give a value. I think consistency would make it easier for users to use the functions and understand any problems that may arise.
  • Including basic examples for use with the main overall functions would be very helpful. Currently all the help files end after "Value."

Example script in FedData.Rd

I found the use of wgetDownload() problematic (line 72). This does not work right without some tinkering on Windows. Could R functions (which you do make use of in your package) be used instead to download this data?

Line 78 gets truncated when displayed in the help menu. What shows up is"county <- county[!(county$STATE" instead of "county <- county[!(county$STATE %in% c("AK","VI","PR","HI")),]" which is problematic for following the examples.

I had problems with the http://websoilsurvey.sc.egov.usda.gov site being down when I was testing the example. Is it commonly unreliable? It took some hunting for me to determine that this was something related to the website and not to my system. Could it be possible to test the state of the website and then return a message to the user letting them know that the website is down? Could be more informative than:

error in function
"Error in function (type, msg, asError = TRUE)  : 
  Recv failure: Connection was reset"

GHCN_FUNCTIONS.R

  • line 39 see comments about cat() in Ropensci package section.
  • line 144, could use some comments. Why are these numbers hard-coded in? This is just an example, there are several spots in the code where numbers are hard-coded in and an explanation of why that needs to be would be helpful for users and future contributors.
  • line 148 spelling
  • line 160 -standardize naming a bit more. See comment in Ropensic package section
  • line 202 and 203, why are these commented out?
  • Some of the other files have the download and get sections very clearly divided into separate functions. It could simplify some of the functions in this file to do so, e. g. why is there no downloadGHCN() like there is for downloadGHCNDailyStation()?

ITRDB_FUNCTIONS.R

  • line 79 if template is null, get the whole world. I think this could be better explained in the description of the arguments.
  • lines 233-237 do these lines just need to be deleted?
  • line 217- seems different from rest of the file. I could see an argument for having these related functions in a separate file (but could also make sense to keep in this file too).
  • line 228 Could be clearer to give your read-in file a more descriptive name in the function? This would especially be helpful in lines 240-242.
  • line 240 could this be changed while accomplishing a similar goal? I don't like the idea of overwriting your input data.
  • Why include both read.crn() and read.crn.data() - how are they different?

NED_FUNCTIONS.R

  • line 50 Nice explanatory comments.
  • line 126- does res really default to null? It looks like it has no default. It would be nice to be consistent with the getNed() function.

NHD_FUNCTIONS.R

  • is the functionalization equivalent in the other functions. I like how the downloads and gets are re-used here.
  • lines 26-30 -using pipes(library("maggritr")) could simplify this section if desired.

SSURGO_FUNCTIONS.R

  • lines 105-107- A comment of what is happening here could be helpful. Perhaps it could be simplified?
  • line 324 why get rid of this? comment would be useful.

UTILITY_FUNCTIONS.R

  • line 86 is wgetDownload() actually used except in the overall example?

Ropensci criteria:

(Based on criteria laid out in ROpenSci packaging guide)

Package name:

  • Would recommend short and lower case:
    feddata or something equivalent?

Function naming:

  • would recommend snake-case:
    getGHCNDaily() would become get_ghcn_daily()
  • Overall functions are well organized in R scripts. Each script deals with a specific data source.
  • functions, vectors, lists and dataframes have been given descriptive names.

Coding style:

  • Good although I think including spaces in the code could increase the readability. See Code Style section

Readme:

  • Missing usage examples and information. Please add.

Code of conduct:

  • Consider adding a code of conduct to repository for future contributors.

Documentation:

  • Looks good.

Examples:

Would be nice to have more. Do you use specific libraries for visualization too? I think it would be very helpful to have simple visualization of the data downloaded and extracted (e.g. plot(map), plot(downloadeddata)).

Recommended scaffolding:

RopenSci recommends lhttr over Rcurl---could look into that. e.g. 111 in GHCN_FUNCTIONS.R

Console messages:

  • several examples of cat() and print() in the scripts please use Use message() and warning() instead.

Local Building and Testing of package

  • There are no tests included in the package. The package successfully passes all the checks from "Check Package" except:
Error: No testthat directories found in \testingFedDate\FedData

Instructions on installing SSOAP and XMLSchema
Packages suggested but not available: 'SSOAP' 'XMLSchema'

The suggested packages are required for a complete check.
Checking can be attempted without them by setting the environment
variable _R_CHECK_FORCE_SUGGESTS_ to a false value.

See the information on DESCRIPTION files in the chapter 'Creating R
packages' of the 'Writing R Extensions' manual.
Error: Command failed (1)

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252   
[3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C                   
[5] LC_TIME=English_Canada.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] FedData_1.1.0  RCurl_1.95-4.5 bitops_1.0-6   raster_2.3-24  rgdal_0.9-1    sp_1.0-17     

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3     aqp_1.8-6           chron_2.3-45        cluster_2.0.1      
 [5] colorspace_1.2-4    data.table_1.9.4    devtools_1.7.0      digest_0.6.8       
 [9] foreign_0.8-63      Formula_1.2-0       ggplot2_1.0.0       grid_3.1.1         
[13] gtable_0.1.2        Hmisc_3.15-0        igraph_0.7.1        lattice_0.20-30    
[17] latticeExtra_0.6-26 MASS_7.3-39         munsell_0.4.2       nnet_7.3-9         
[21] plotrix_3.5-11      plyr_1.8.1          proto_0.3-10        RColorBrewer_1.1-2 
[25] Rcpp_0.11.4         reshape_0.8.5       reshape2_1.4.1      rgeos_0.3-8        
[29] rpart_4.1-9         scales_0.2.4        soilDB_1.5-4        splines_3.1.1      
[33] stringr_0.6.2       survival_2.38-1     tools_3.1.1         XML_3.98-1.1 

I hope these comments are helpful and please let me know if anything is unclear in these comments.

@sckott
Copy link
Contributor

sckott commented May 18, 2015

thanks so much for the review @jooolia !

@bocinsky Julia's review is in. Discuss here if you have any questions about her review. Make issues in your github repo to address these.

@bocinsky
Copy link
Author

Wow, I was totally not expecting such a thorough review—it's incredible, @Joolia ! Thank you so much!

@sckott I'm at a workshop in Georgia all week, but I'll start adding these as issues tonight/tomorrow, and will work towards addressing them in the evenings and will definitely next week.

Thanks again!

@sckott
Copy link
Contributor

sckott commented Jul 20, 2015

@bocinsky any progress on changes?

@bocinsky
Copy link
Author

@sckott Fair amount of progress on most of the issues drawn from @jooolia 's review (see issues, etc.). I ended up submitting a new version (2.0.0) to the CRAN that incorporates most updates, plus some spurred by an update to iGraph that was causing CRAN check warnings. One thing I haven't gotten on—and something I want to do prior to resubmitting?—is the transition from Rcurl to httr. Otherwise, expanding the examples and writing a vignette are still on the docket.

@sckott
Copy link
Contributor

sckott commented Jul 21, 2015

Nice, great progress. Thanks for the update @bocinsky

@sckott
Copy link
Contributor

sckott commented Feb 11, 2016

@bocinsky Any progress?

One thing I haven't gotten on—and something I want to do prior to resubmitting?

You don't need to resubmit. Just let us know when you've made the rcurl to httr changes, and then good to go. Let me know if you need help with that

@bocinsky
Copy link
Author

@sckott Still haven't made that transition. The issue is that I use time-stamping to ensure that people have the latest version of downloaded files, and I haven't been able to figure out how to make that work with httr. Any help would be much appreciated. The function I would need to port to httr is the "curl_download" function in the UTILITY_FUNCTIONS.R source file.

@sckott
Copy link
Contributor

sckott commented Feb 11, 2016

thanks for the update. I'll have a look

@bocinsky
Copy link
Author

Awesome! Thanks Scott!

@sckott
Copy link
Contributor

sckott commented Feb 12, 2016

so maybe the curl package is a good option, which httr is built on now. I couldn't figure out yet how to do what your function curl_download does with httr, but seems to be easy with curl, e.g, replacing your code in https://github.com/bocinsky/FedData/blob/master/R/UTILITY_FUNCTIONS.R#L126-L136 with

opts <- list(
      verbose = verbose, 
      noprogress = !progress,
      fresh_connect = TRUE, 
      ftp_use_epsv = FALSE, 
      forbid_reuse = TRUE, 
      timecondition = TRUE, 
      timevalue = base::file.info(destfile)$mtime)
hand <- new_handle()
handle_setopt(hand, .list = opts)
res <- curl::curl_download(url, destfile = temp.file, handle = hand)

httr has write_disk() to write file to disk, but it doesn't appear to work when using the timecondition curl option, since you have to set FALSE or TRUE for write_disk(), but that's not what you'd want if you want the writing to disk happening conditional on whether the file has been updated

let me know what you think .

@bocinsky
Copy link
Author

Hey @sckott! Thanks so much for your help with this... I was able to successfully migrate away from RCurl, and have pushed those updates. Updating on CRAN now! I've also closed out several of the issues re. the ROpenSci review. Are we good to go for finalizing the onboarding?

@sckott
Copy link
Contributor

sckott commented Feb 15, 2016

Great, glad it worked.

Are we good to go for finalizing the onboarding?

Just a few things:

  • Can you include at least a few examples with each function? Looking at the man pages and there's no examples.
  • Exported functions: It appears at least to me that there are a number of functions that seem like they are for internal use only, but are exported to users, e..g., all these functions in the utilities file https://github.com/bocinsky/FedData/blob/master/R/UTILITY_FUNCTIONS.R - If they aren't meant to b used by users could you either remove @export so they aren't avail. to user unless they do FedData::: or add @keywords internal so the man pages for those functions aren't visible by default
  • Don't need to do have it fully done and comprehensive before merging here, but at least start including tests

@sckott
Copy link
Contributor

sckott commented Apr 4, 2016

@bocinsky any thoughts on my above comments?

@bocinsky
Copy link
Author

bocinsky commented Apr 4, 2016

@sckott Sorry for the lack of reply! I definitely want to address all of them—the examples are the most important—but I haven't gotten a chance to sit down do them.

@sckott
Copy link
Contributor

sckott commented Apr 4, 2016

thanks, let us know if you need any help

@sckott
Copy link
Contributor

sckott commented Aug 10, 2016

@bocinsky Any updates on this? Anything we can help with?

@bocinsky
Copy link
Author

Hey hey. I just pushed a new version to gitHub and the CRAN that does three things:

  • Adds examples to the principal "get_" functions.
  • Adds the "@Keywords internal" tag to all functions but the principal "get_" functions (but keeps them exported so as not to screw up peoples' already established workflows)
  • Adds testing infrastructure via testthat (but I haven't yet started implementing testing)

The last one is tricky, because testing the principal functions is slow and data-heavy in this package. When I do, I suppose I'll make it so they aren't tested on CRAN.

@sckott
Copy link
Contributor

sckott commented Aug 23, 2016

@bocinsky Thanks for the change. Would like to see some tests before we approve this.

For tests in your pkg, could you have that principal function just do a subset of whatever it is it does so that it runs faster? You can easily skip on CRAN with the fxn testthat::skip_on_cran()

@bocinsky
Copy link
Author

Thanks so much @sckott. Will work on implementing some tests (mainly having to do with testing whether data URIs are valid) and will be back in touch.

@bocinsky
Copy link
Author

bocinsky commented Sep 9, 2016

Hi @sckott I've just pushed a new version that implements testing for all download URLs and some other functions. This has already come in handy, as the USGS changed the path of one of their FTP servers. Checking version 2.1.0 for CRAN submission on win-builder now.

@sckott
Copy link
Contributor

sckott commented Sep 9, 2016

Great, thanks for the changes.

@sckott
Copy link
Contributor

sckott commented Aug 11, 2017

@bocinsky So so sorry about the long long time since we revisited this. I think what happened was I applied the holding label when waiting on you, then forgot to remove it, then thought I was waiting on you - anyway, very much approved.

  • Please transfer the package to ropensci- I've invited you to a team within our ropensci organization. After you accept you should be able to move it.
  • Make sure to
    • add rOpenSci footer to README
      [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)
    • Add the (brand new!) "Peer-reviewed" badge to your README (in the future people will add this on submittal; it updates through the stages of review):
      [![](https://ropensci.org/badges/13_status.svg)](https://github.com/ropensci/onboarding/issues/13)
    • Change any needed links, such those for CI badges
    • Travis CI should update to the new location automatically - you may have to update other CI systems manually
  • Are you interested in doing a blog post for our blog https://ropensci.org/blog/ ? either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development (https://ropensci.org/blog/). If so, i'll get you the details on that

@bocinsky
Copy link
Author

@sckott This is great news! As you'll see in the current readme (which I'll update as suggested), I'm currently working on the next major release of FedData, but slowly. Will migrate over to ropensci now, and continue dev on the next version, which will bring integration with sf, tidy data, and more comprehensive code coverage, among other enhancements.

I could definitely crank out a tech notes on FedData. Please send details!

Cheers!

@sckott
Copy link
Contributor

sckott commented Aug 11, 2017

Great. All sounds good. Let me know if you have trouble transferring.

I'll have our community manager Stefanie get in touch with you about the blog post

@bocinsky
Copy link
Author

Hi @sckott . Got moved over to ropensci, and submitted documentation changes to CRAN as version 2.4.6. The only thing left to do is to change the URL that points to the pkgdown documentation (which is a mask for the gh-pages site at ropensci/FedData/docs). Is there a way to gain admin access to the repo?

@sckott
Copy link
Contributor

sckott commented Aug 12, 2017

Cool. Try again now on the gh-pages thing - made you admin on that repo

@sckott
Copy link
Contributor

sckott commented Aug 12, 2017

closing this, thanks again for your submission

@sckott sckott closed this as completed Aug 12, 2017
@stefaniebutland
Copy link
Member

Hi @bocinsky. Thank you for offering to contribute a tech note on FedData!

We post tech notes as soon as they are submitted and lightly reviewed by one of us, usually me or Scott.

Practical instructions: https://github.com/ropensci/roweb/blob/master/.github/CONTRIBUTING.md

It would be great if you included a thank you to package reviewers with links to their GitHub or Twitter, maybe point readers to issues and what you think is next to improve the package and invite people to open or address an issue etc.

suggested tags: R, community, software, review, onboarding, package, package_name, topic labels used in onboarding review.

Ping me here with any questions!

@bocinsky
Copy link
Author

Hi @stefaniebutland. I just submitted a pull request for a FedData tech note and a headshot and description for the community page. Let me know if there is anything else I need to do!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants