New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GSODR: Global Surface Summary Daily Weather Data in R #79

Closed
adamhsparks opened this Issue Oct 14, 2016 · 61 comments

Comments

Projects
None yet
4 participants
@adamhsparks
Copy link
Member

adamhsparks commented Oct 14, 2016

Summary

  • What does this package do? (explain in 50 words or less):
    The GSODR package offers an R interface for downloading Global Surface
    Summary of the Day weather data files from the United States National Climatic
    Data Center, applies rules for cleaning and parses them into files useful for
    researchers.
  • Paste the full DESCRIPTION file inside a code block below:
Package: GSODR
Type: Package
Title: Global Summary Daily Weather Data in R
Version: 0.3
Date: 2016-10-16
Authors@R: c(person("Adam", "Sparks", role = c("aut", "cre"),
    email = "adamhsparks@gmail.com"),
    person("Tomislav", "Hengl", role = "aut",
    email = "tom.hengl@isric.org"),
    person("Andrew", "Nelson", role = "aut",
    email = "dr.andy.nelson@gmail.com"))
Maintainer: Adam Sparks <adamhsparks@gmail.com>
URL: https://github.com/adamhsparks/GSODR
BugReports: https://github.com/adamhsparks/GSODR/issues
Description:  The GSODR package offers automated downloading, parsing, cleaning
    and converting of Global Surface Summary of the Day (GSOD) weather data from
    the from the USA National Climatic Data Center (NCDC) into Comma Separated
    Values (CSV) or Geopackage (GPKG) files. The get_GSOD() function retrieves
    data from the NCDC's public FTP site and reformats it from United States
    Customary System (USCS) units to International System of Units (SI).
    Stations are individually checked for number of missing days as defined by
    the user to help ensure data quality. Stations files with too many missing
    observations are omitted from the final file. Only stations with valid
    reported latitude and longitude values are permitted in the final derived
    data set. A list of values for 200 metre buffered elevation data, derived
    from the CGIAR-CSI SRTM hole-filled 90 metre data set (Jarvis et al. 2008)
    are included for stations between -60 and 60 degrees latitude with valid
    latitude and longitude values. File output is returned as a comma-separated
    values (CSV) or GeoPackage (GPKG) file written to a local disk in a location
    selected by the user, which summarises each year by station. Additional
    useful variables, saturation vapour pressure (es), actual vapour pressure (ea)
    and relative humidity are calculated from the original data and included in
    the final data set. The resulting files can be as larger than 500mb
    depending on the user's stringency for missing data and geographic area of
    interest. Be sure to have sufficient RAM and disk space as well as a
    reasonably fast Internet connection to use this package to perform this
    operation for a global data set over many years. However, for much smaller
    and more manageable data sets, an individual country of interest may be
    selected as well as only stations falling between -60 and 60 degrees
    latitude for agroclimatology work or individual stations if the station ID
    is known. The resulting files include station data (e.g., station name,
    country, latitude, longitude, elevation) for use in a geographic
    information system (GIS). The function was largely based on T. Hengl's
    'getGSOD.R' script, available from
    <http://spatial-analyst.net/book/system/files/getGSOD.R> with enhancements
    to be cross-platform, more efficient with computing resources and more
    flexible. For information on the original data from NCDC, please see the
    GSOD readme.txt file available from,
    <http://www1.ncdc.noaa.gov/pub/data/gsod/readme.txt>.
Depends:
    R (>= 3.2.0)
License: GPL (>= 3)
Imports:
    data.table,
    doParallel,
    foreach,
    iterators,
    parallel,
    RCurl,
    readr,
    rgdal,
    sp,
    stats,
    utils
Suggests:
    chillR,
    dplyr,
    plotKML,
    spacetime,
    ggalt,
    ggplot2,
    knitr,
    lubridate,
    rgeos,
    rmarkdown,
    testthat,
    tibble,
    tidyr
RoxygenNote: 5.0.1
Encoding: UTF-8
NeedsCompilation: no
Repository: CRAN
LazyData: TRUE
ByteCompile: TRUE
VignetteBuilder: knitr
  • URL for the package (the development repository, not a stylized html page):
    https://github.com/adamhsparks/GSODR
  • Who is the target audience?
    Researchers who use daily weather data. Most individuals I've encountered so far
    are crop or plant disease modellers, hydrological modellers, etc. However,
    I'm there that there are others that I'm just unaware of due to my work-sphere.
  • Are there other R packages that accomplish the same thing? If so, what is different about yours?
    gsod: obtain daily weather data" from Jacob Van Etten, 2013
    This package was written for a instructional course and has not been submitted
    to CRAN or updated in some time. It does offer the same corrected elevation that
    GSODR offers, however the station list is not updated from the NCDC server,
    meaning that the station list provided with this package is over two years old.
    It is part of a crop modelling course, not designed to only download and clean
    GSOD data. GSODR also offers different types of file outputs, Geopackage or
    comma-separated values (CSV).
    GSODTools: R functions to select, download, and process GSOD data
    This package is well written and maintained on GitHub only. The package is not
    available on CRAN at this date, either. It offers many of the same functions as
    GSODR. However, it does not automatically convert the values to International
    System of Units (SI), which GSODR does. It also does not offer the corrected
    elevation values that GSODR offers. Florian indicated that this project was
    not top priority during the completion of his PhD but was happy to consider
    working together in the future.
    rnoaa: R interface to many NOAA data APIs
    This package offers an interface to several of NOAA's weather data sets. However,
    GSOD is not among the data sets that are available through this package.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration with Travis CI and/or another service.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package contains a paper.md with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI: DOI
    • (Do not submit your package separately to JOSS)

Detail

@sckott sckott self-assigned this Oct 14, 2016

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Oct 14, 2016

@sckott, I've found a bug that I need to fix.

Not sure how it's crept in. I will report back when it's corrected.

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Oct 15, 2016

Well that was embarrassing. It's all fixed now, @sckott.

Glad I checked the package build again this morning.

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Oct 15, 2016

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thanks for your submission @adamhsparks - no worries about eth bug

Will be seeking reviewers, but in the meantime: I ran goodpractice::gp() on your package. Here are the results, which it may be helpful to get started with before review (you can run it yourself and dig into the results more):

It is good practice to

  ✖ write unit tests for all functions, and all package code
    in general. At least some lines are not covered in this package.

    R/get_GSOD.R:230:NA
    R/get_GSOD.R:231:NA
    R/get_GSOD.R:234:NA
    R/get_GSOD.R:235:NA
    R/get_GSOD.R:238:NA
    ... and 359 more lines

  ✖ omit "Date" in DESCRIPTION. It is not required and it
    gets invalid quite often. A build date will be added to the package
    when you perform `R CMD build` on it.
  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/get_GSOD.R:283:1
    R/get_GSOD.R:284:1
    R/get_GSOD.R:287:1
    R/get_GSOD.R:306:1
    R/get_GSOD.R:456:1
    ... and 9 more lines
  • The vignette takes a loooooooooong time to run. Please do figure out a way to make it run much more quickly.
  • TheDescription field in the DESCRIPTION file is too detailed. Please make it more concise.

Reviewers: @jeffreyhanson @deniederhut
Due date: 2016-11-08

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Oct 15, 2016

Thanks for the valuable feedback. This is exactly the type of suggestions I was looking for here.

I'll get as much of this done as I can shortly.

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Oct 16, 2016

I've updated the DESCRIPTION file in the package and here in this issue as suggested.

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Oct 16, 2016

Vignette builds faster now, as suggested as well.

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Oct 16, 2016

Tomislav Hengl has contributed to the package and is now an author. I'll be merging his commits as I have time. However, I'll be on travel this week. So it may take me a while to address this.

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Oct 17, 2016

Thanks for the updates @adamhsparks - seeking reviewers

@jeffreyhanson

This comment has been minimized.

Copy link

jeffreyhanson commented Oct 20, 2016

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • [] As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

Documentation

The package includes all the following forms of documentation:

  • [] A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • [] Installation instructions: for the development version of package and any non-standard dependencies in README
  • [] Vignette(s) demonstrating major functionality that runs successfully locally
  • [] Function Documentation: for all exported functions in R help
  • [] Examples for all exported functions in R Help that run successfully locally
  • [] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION
Paper (for packages co-submitting to JOSS)

The package contains a paper.md with:

  • [] A short summary describing the high-level functionality of the software
  • [] Authors: A list of authors with their affiliations
  • [] A statement of need clearly staing problems the software is designed to solve and its target audience.
  • [] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • [] Installation: Installation succeeds as documented.
  • [] Functionality: Any functional claims of the software been confirmed.
  • [] Performance: Any performance claims of the software been confirmed.
  • [] Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • [] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 3


Review Comments

Overview

The Global Summary Of The Day dataset contains weather data collected from meteorological stations all over the planet. However, accessing and formatting this data for use in R can be rather tricky. The GSODR R package will make this a dream. This package will be incredibly useful for new R users who want easy access to this data set. It will also be useful for experienced R users who want to download data over large extents and do post-processing later in R. I very much look forward to using this package in the future.

I have several major concerns before I can recommend this package for release. First, the main function get_GSOD fails when the user wants to download data for an entire country. Personally, I think this is the biggest feature of this package. Second, the package could benefit from more automated testing. None of the examples are actually tested, and the unit tests only cover cases that are expected to fail. Third, the package only downloads data. It would be brilliant if the get_GSOD function returned data after completion. I have outlined these main issues and other issues below.

Major issues

  1. The get_GSOD function cannot download all data for an entire country. For instance, running the code get_GSOD(year=2015, country='NZ', dsn=tempdir()) returns this error:
Error in col[[i, exact = exact]] : 
 attempt to select more than one element in vectorIndex
  1. All examples are wrapped in \dontrun{} so they are not actually tested.
  2. The main function get_GSOD just saves data to disk and does not actually return data. It would be brilliant if this package returned GSOD data so the user does not have to then load it back into R. To that end, I would recommend the function returns a single tibble or SpatialPointsDataFrame object (perhaps depending on an argument?), and let the user then re-save the data where they want it stored.
  3. As far as I can tell, the vignette cannot be accessed within R. After installation, the following code vignette(package='GSODR') yields 'no vignettes found'. Additionally, the rmarkdown file for the vignette has an obscure name, so even if it could be opened within R, this would be tricky given its obscure name. I recommend naming the vignette after the package so the following code will open it vignette('GSODR').
  4. Unit testing covers cases where get_GSOD is expected to fail but it does not actually test cases that are expected to succeed.
  5. Missing data is represented using -9999. Why not use NA values? The main aim of this package is to load GSOD data into R.
  6. There is no help page for the package (ie. ?GSODR). It would be really useful to include this, even if it it just a copy-paste from the README. See here for more details.
  7. The nearest_station function will be incredibly useful for a user. It would be even more useful if--instead of printing the stations closest to a user-specified point--it returned the stations. Then this function could be used programmatically to find the closest stations to a series of points, and download data accordingly.
  8. The accompanying paper does not list authors in the html document.

Minor issues

  1. This package needs a rerun of devtools::document() before resubmission. When I ran devtools::run_examples(), this automatically re-compiled some Rd files.
  2. get_GSOD.R:20 & data.R:29: Both lines refer to a document available on the github repository. I recommend making this document an additional vignette.
  3. get_GSOD.R:279: I am unsure why foreach iterators are used here. It would be great to do offer some parallelisation and options for progress bars here. If you wrapped this in a plyr::**ply function, you could easily make it such that: (1) years can easily be processed in parallel, (2) if not processed in parallel, then a progress-bar can be shown.
  4. get_GSOD.R:322: ROpensci guidelines recommend using the httr package instead of the RCurl package.
  5. get_GSOD.R:392--393: Options are restored to the default state but this is not necessarily the previous state. These values could be cached at the beginning of the function. For example: use orginal_options <- options() at the start of the function, and at the end of the function use options(original_options).
  6. get_GSOD.R:345: the user should have explicit control over thread use to avoid accidently fork-bombing their own computer. I recommend providing a threads argument in the get_GSOD function and defaulting it to 1.
  7. get_GSOD.R:561: Here and elsewhere return(0) is used after a call to stop(). Please consider removing the return(0) as this is unnecessary .
  8. nearest_stations.R:28--29: This function changes options() and does not reset them after it finishes. This is against recommended CRAN practices. Please see previous comment on options for a suggestion on how to fix this.
  9. nearest_stations.R:34--44: This code could be rewritten for clarity and computational efficiency. I suggest something like this:
dists <- fields::rdist.earth(as.matrix(stations[c('LON', 'LAT')]), matrix(c(LON, LAT), ncol=2), miles=FALSE)
nearby <- which(dists[,1] < distance)

Typographic and stylistic issues

  1. get_GSOD.R:83--85: Documentation contains '&lt' which is not rendered as a less than symbol in the html documentation.
  2. get_GSOD.R:292: Instead of a call to utils::glob2rx, this could be used instead pattern='^.*\\.gz$'
  3. get_GOSD.R:631: This could rewritten be as !vs %in% stations[[12]] for clarity.
@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Oct 20, 2016

Thanks, Jeff. I'm rather embarrassed that it didn't work for you.

I'll see what needs to be done to get it fixed and report back here when/as I do.

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Oct 20, 2016

thanks for the review @jeffreyhanson ! a few notes:

  • wrapping examples in \dontrun{} is what we suggest in a package like this where examples make web requests, so I'd keep that - though if there's any examples that don't make web requests, definitely do put those outside of \dontrun
@sckott

This comment has been minimized.

Copy link
Member

sckott commented Oct 20, 2016

2nd reviewer @deniederhut assigned

@jeffreyhanson

This comment has been minimized.

Copy link

jeffreyhanson commented Oct 21, 2016

@sckott Ok, thanks for correcting me on that. I just did a quick check and the nearest_stations function also has its example wrapped in a \dontrun{}. This function doesn't make web requests.

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Oct 22, 2016

@jeffreyhanson, I've fixed the bug with the country request in the devel branch. Good catch, THANK YOU!

Your comment about the nearest_stations is incorrect. It does make a web query with the .fetch_stations function, thus is wrapped in a \dontrun{}.

I'll work on some of the other issue's you've raised while I wait on my plane now.

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Oct 22, 2016

Add threads option to get_GSOD function for parallel processing …
ropensci/GSODR@d404cb4

Fix Documentation contains '&lt' which is not rendered …
ropensci/GSODR@84c210f

Replace pattern call on line 292 …
ropensci/GSODR@5c622af

Fix bug when ISO2c/ISO3c is set, query fails
ropensci/GSODR@434f71e

Thanks, @jeffreyhanson, boarding now.

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Oct 23, 2016

Update authors in paper.md
ropensci/GSODR@12f8683

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Oct 23, 2016

Edit code for clarity

get_GOSD.R:631: This could rewritten be as !vs %in% stations[[12]] for clarity.

ropensci/GSODR@8372844

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Oct 23, 2016

Check and reset original options()

get_GSOD.R:392--393: Options are restored to the default state but this is not necessarily the previous state. These values could be cached at the beginning of the function. For example: use orginal_options <- options() at the start of the function, and at the end of the function use options(original_options).

ropensci/GSODR@960bb82

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Oct 24, 2016

Renamed vignette to "GSODR"

ropensci/GSODR@a0713f6

Additionally, @jeffreyhanson, If you install from github using devtools, devtools::install(build_vignettes = TRUE), will build the vignette, so then using vignette(package="GSODR") a vignette is returned.

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Oct 24, 2016

Added GSODR.R to create helpfile for package

There is no help page for the package (ie. ?GSODR). It would be really useful to include this, even if it it just a copy-paste from the README.

ropensci/GSODR@a404494

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Oct 24, 2016

Removed return(0) after stop command

get_GSOD.R:561: Here and elsewhere return(0) is used after a call to stop(). Please consider removing the return(0) as this is unnecessary .

ropensci/GSODR@9c4c24e

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Jan 6, 2017

Update. I've made a minor change to the package that does not affect the functionality.

Due to the size of the initially added climate data and the addition of yet more data by myself just this week, the package size was 5.1Mb. I've moved the additional climate data to its own R package that's available only from GitHub, https://github.com/adamhsparks/GSODR.data.

This is referenced in the README and a vignette.

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Jan 6, 2017

okay, you do plan to put it on CRAN though since I imagine this will go to CRAN.

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Jan 6, 2017

GSODR is on CRAN and I'll update it once this review is done. The other package, GSODR.data is too large for CRAN unless it's split into two packages or I remove some of the data sets.

I'm happy to just leave it on GitHub since it's not essential for the use of GSODR.

Thoughts?

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Jan 6, 2017

Ah, if not required for GSODR, then all good,

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Jan 7, 2017

I guess maybe I wasn't initially clear. It's four sources of climate data that are formatted for use with the data GSODR provides. You can link these data to GSOD data by station ID and LAT/LON.

I have one example in a vignette using a one of the data sets, but for normal use it's not required. It just makes this data more available for use with GSOD data output produced by GSODR.

So since it's mentioned in a vignette, does that require a CRAN version?

edit I could put just the dataset that's referenced in the GSODR package itself and leave the other three as the optional download from GitHub. Is that a good option?

@deniederhut

This comment has been minimized.

Copy link

deniederhut commented Jan 7, 2017

Okay, vignettes look good! I think I would still like to see some explicit correctness check in the unit tests somewhere, perhaps with a cached result, but I am recommending this package for approval. A few bugs I ran into while testing:

  1. Call to get_GSOD(years = 2010, station = "955510-99999") raises namespace error -- Error: 'safely' is not an exported object from 'namespace:purrr' ... probably in line 490 or 507
  2. Not sure if this is GSODR or my machine, but tab completing the function names sometimes throws this:
Error in fetch(key) :
  lazy-load database '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/GSODR/help/GSODR.rdb' is corrupt
  1. Not super important, but max_missing seems to accept negative values and NA
@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Jan 7, 2017

Thanks, I have fixed # 3 easily. The other two I'm not so sure of, I've never run into either one with all my testing.

  1. I don't know, it's definitely in purrr 0.2.2. I've never had that issue. If someone has any suggestions?
  2. I ran to Google and did a quick search and found another RStudio user reporting this issue with all packages and someone on SO who had an issue with Java, but I can't find any fixes.
@jeffreyhanson

This comment has been minimized.

Copy link

jeffreyhanson commented Jan 8, 2017

Hi, thanks for the reminder. I'll take a look at this sometime this coming week.

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Jan 8, 2017

its possible you have @deniederhut purrr v0.1, since safely is not in that version https://github.com/hadley/purrr/blob/v0.1.0/NAMESPACE - @adamhsparks plz add a minimum version of purrr to your DESCRIPTION file

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Jan 8, 2017

lazy-load database ... is corrupt

this is solved by restarting R usually

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Jan 8, 2017

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Jan 8, 2017

@sckott, I've added the minimum version of purrr to DESCRIPTION. Thanks for the tip.

@jeffreyhanson

This comment has been minimized.

Copy link

jeffreyhanson commented Jan 13, 2017

I'm still on holiday in New Zealand and won't be back till the 18th. I was hoping that the local library's internet would be good enough to let me test the package, but that doesn't seem to be the case.

Would you mind if I submitted my review when I'm back home?

Also, I noticed that the DESCRIPTION is missing a Maintainer field.

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Jan 13, 2017

@jeffreyhanson,
I'm following Hadley's preference, using Authors@R

You can also use separate Maintainer and Author fields. I prefer not to use these fields because Authors@R offers richer metadata.

@jeffreyhanson

This comment has been minimized.

Copy link

jeffreyhanson commented Jan 18, 2017

@adamhsparks Ok, sounds good.

@jeffreyhanson

This comment has been minimized.

Copy link

jeffreyhanson commented Jan 18, 2017

@adamhsparks @sckott

Overall, I am very impressed with this package. I think it will make a great addition to rOpenSci's
arsenal. I think it's almost perfect, I just found a few issues when running tests and checks.

  1. Some tests fail. Let me know if there's any other information I can provide to help with
    debugging. I've included details on the testing environment below.
Failed -------------------------------------------------------------------------

1. Failure: reformat_GSOD file_list parameter reformats data properly (@test-reformat_GSOD.R#12)                                                                                          
nrow(x) not equal to 1454.
1/1 mismatches
[1] 0 - 1454 == -1454


2. Failure: reformat_GSOD file_list parameter reformats data properly (@test-reformat_GSOD.R#13) 
ncol(x) not equal to 48.
1/1 mismatches
[1] 0 - 48 == -48


3. Failure: reformat_GSOD dsn parameter reformats data properly (@test-reformat_GSOD.R#23) 
nrow(x) not equal to 1454.
1/1 mismatches
[1] 0 - 1454 == -1454


4. Failure: reformat_GSOD dsn parameter reformats data properly (@test-reformat_GSOD.R#24) 
ncol(x) not equal to 48.
1/1 mismatches
[1] 0 - 48 == -48
  1. Some examples fail (run with devtools::run_examples(run=FALSE)).
  • get_gsod function.
    • Line 197: this needs to be commented out.
    • Lines 199-200: This example yields an error. This might have something
      to do with my installation of GDAL. Or the package might be a missing dependency in the
      DESCRIPTION?
get_GSOD(years = 2010, country = "Philippines", dsn = "~/",
            filename = "Philippines_GSOD", GPKG = TRUE, max_missing = 5)
trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2010/gsod_2010.tar'
ftp data connection made, file length 98222080 bytes
==================================================
downloaded 93.7 MB

Checking stations against max_missing value
Starting data file processing
  |======================================================================| 100%

Writing GeoPackage File to Disk.

Error in rgdal::writeOGR(GSOD_XY, dsn = path.expand(outfile), layer = "GSOD",  : 
  No such driver: GPKG
  • get_station_list function.
    • Line 11: fetch_station_list() should be get_station_list()
  1. The documentation could be enhanced.
  • Several functions are missing an @return field in the documentation. In most cases they just
    need to describe the class of the returned object for completeness sake. For example,
    get_station_list just needs something like a \code{\link[data.table]{data.table} object.

Test environment

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GSODR_1.0.0     testthat_1.0.2  devtools_1.12.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8       magrittr_1.5      roxygen2_5.0.1    maps_3.1.1       
 [5] lattice_0.20-34   R6_2.2.0          stringr_1.1.0     plyr_1.8.4       
 [9] dplyr_0.5.0       fields_8.10       tools_3.3.2       rgdal_1.2-5      
[13] grid_3.3.2        data.table_1.10.0 spam_1.4-0        R.oo_1.21.0      
[17] DBI_0.5-1         withr_1.0.2       digest_0.6.10     assertthat_0.1   
[21] tibble_1.2        crayon_1.3.2      readr_1.0.0       purrr_0.2.2      
[25] bitops_1.0-6      R.utils_2.5.0     RCurl_1.95-4.8    curl_2.3         
[29] evaluate_0.10     memoise_1.0.0     sp_1.2-4          stringi_1.1.2    
[33] R.methodsS3_1.7.1

PS. Thank you for introducing me to the .gpkg format. It looks much easier to work with than ESRI Shapefiles for vector data.

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Jan 18, 2017

Hi @jeffreyhanson,
Thanks for this complete review. I've addressed most of your comments already. At least the easy ones related to documentation.

RE 2. Right now I'm checking on your GPKG error.

I've set up a fresh R install on an ElementaryOS VM (based on Xenial) and installed GSODR on that. With a fresh installation, I'm able to create a GPKG file and I have the same version of rgdal that you do, 1.2-5, though I did add a minimum version to the DESCRIPTION, >= 1.1-9, so I don't think that's the issue and I don't think I'm missing dependencies since it works for me on a fresh install.

What does rgdal::ogrDrivers return? I'm wondering if your libgdal is outdated so it's an OS issue, not an R/rgdal issue.

@jeffreyhanson

This comment has been minimized.

Copy link

jeffreyhanson commented Jan 18, 2017

Here's the output. It looks like it's an issue with my GDAL installation, so feel free to disregard the issue.

rgdal::ogrDrivers()
             name write
1          ARCGEN FALSE
2          AVCBin FALSE
3          AVCE00 FALSE
4      AeronavFAA FALSE
5             BNA  TRUE
6             CSV  TRUE
7         CouchDB  TRUE
8             DGN  TRUE
9            DODS FALSE
10            DXF  TRUE
11         EDIGEO FALSE
12 ESRI Shapefile  TRUE
13  ElasticSearch  TRUE
14            GFT  TRUE
15            GML  TRUE
16            GMT  TRUE
17       GPSBabel  TRUE
18  GPSTrackMaker  TRUE
19            GPX  TRUE
20        GeoJSON  TRUE
21         GeoRSS  TRUE
22     Geoconcept  TRUE
23       Geomedia FALSE
24            HTF FALSE
25         Idrisi FALSE
26     Interlis 1  TRUE
27     Interlis 2  TRUE
28            KML  TRUE
29         LIBKML  TRUE
30   MSSQLSpatial  TRUE
31   MapInfo File  TRUE
32         Memory  TRUE
33          MySQL  TRUE
34            NAS FALSE
35           ODBC  TRUE
36            ODS  TRUE
37           OGDI FALSE
38            OSM FALSE
39        OpenAir FALSE
40         PCIDSK  TRUE
41            PDF  TRUE
42            PDS FALSE
43         PGDump  TRUE
44           PGeo FALSE
45     PostgreSQL  TRUE
46            REC FALSE
47            S57  TRUE
48           SDTS FALSE
49       SEGUKOOA FALSE
50           SEGY FALSE
51         SQLite  TRUE
52            SUA FALSE
53            SVG FALSE
54          TIGER  TRUE
55        UK .NTF FALSE
56            VFK FALSE
57            VRT FALSE
58            WFS FALSE
59            XLS FALSE
60           XLSX  TRUE
61         XPlane FALSE
@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Jan 18, 2017

@jeffreyhanson,
RE 1. I have files saved in the data raw folder to check the functionality of reformat_GSOD(), which works with local files and I used a hardcoded path. I need to fix that. Good catch!

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Jan 18, 2017

thanks @jeffreyhanson for taking another look

@adamhsparks looks like we're good to go then since looks like you dealt with that reformat_GSOD thing

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Jan 18, 2017

@sckott I think we're good. I was waiting to check the results of Appveyor and Travis this morning when I woke up and see if everything was good as far as the reformat_GSOD testing.

I added another test related to the file saving parameters this morning based on a bug I found last night.

Everything looks good on winbuilder now as well. I'm ready to go if you are.

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Jan 19, 2017

Approved 👍 Thx for your submission.

  • I've added you to a GSODR team for on the rOpenSci Github org. Accept and transfer the repo (under "Settings") to rOpenSci
  • Be sure to update any links for CI, coverage, and documentation.
  • Add ropensci footer image to bottom of readme [![ropensci_footer](http://ropensci.org/public_images/github_footer.png)](http://ropensci.org)
  • Let me know if you are interested in doing a blog post about your package on the rOpenSci blog, either long-form (https://ropensci.org/blog/) or short (https://ropensci.org/tech-notes/)

I'll get back to you soon on the JOSS stuff in a bit

@deniederhut

This comment has been minimized.

Copy link

deniederhut commented Jan 19, 2017

Congrats @adamhsparks !! 👏

@adamhsparks

This comment has been minimized.

Copy link
Member Author

adamhsparks commented Jan 20, 2017

@sckott now that the dust is settling and I've moved everything over, I'm happy to do a short-form blog post.

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Jan 20, 2017

@adamhsparks great, on the blog post: We'll send an email to get that started

on JOSS:

I saw that you did a new release, what is the Zenodo DOI for that?

You can submit to https://github.com/openjournals/joss-reviews/issues now

@sckott sckott closed this Jan 20, 2017

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Jan 20, 2017

@adamhsparks blarg, sorry, meant to give you https links for the footer image, plz change to [![ropensci_footer](https://ropensci.org/public_images/github_footer.png)](https://ropensci.org)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment