dbhydroR package #61

Closed
jsta opened this Issue Aug 1, 2016 · 34 comments

jsta commented Aug 1, 2016

Summary

  • What does this package do? (explain in 50 words or less):

dbhydroR provides scripted access to the South Florida Water Management District's DBHYDRO database which holds over 35 million water quality and hydrologic data records from the Florida Everglades and surrounding areas.

  • Paste the full DESCRIPTION file inside a code block below:
Package: dbhydroR
Title: 'DBHYDRO' hydrologic and water quality data from R
Description: Client for programmatic access to the South Florida Water
  Management District's 'DBHYDRO' database at 
  http://my.sfwmd.gov/dbhydroplsql/show_dbkey_info.main_menu, with functions
  for accessing hydrologic and water quality data. 
Version: 0.1-6
Author: Joseph Stachelek <stachel2@msu.edu>
Maintainer: Joseph Stachelek <stachel2@msu.edu>
URL: http://www.github.com/SFWMD/dbhydro
BugReports: http://www.github.com/SFWMD/dbhydro/issues
Depends:
    R (>= 3.0.2)
Imports:
    httr,
    reshape2,
    RCurl,
    XML
License: GPL
LazyData: true
Suggests:
    testthat
RoxygenNote: 5.0.1
  • URL for the package (the development repository, not a stylized html page):

https://github.com/sfwmd/dbhydro

  • Who is the target audience?

Anyone interested in water quality and hydrologic data from the Florida Everglades - ecologists, engineers, meteorologists, hydrogeologists, hydrologists, etc.

  • Are there other R packages that accomplish the same thing? If so, what is different about yours?

For the most part, no. However, a very small proportion of DBHYDRO mirrors data that are also present in the USGS (United States Geological Survey) NWIS (National Water Information System); those records can also be accessed via the dataRetrieval package.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration with Travis CI and/or another service.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package contains a paper.md with a high-level description.
    • The package is deposited in a long-term repository with the DOI:

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

Some components like the rOpenSci footer, README section ordering, and NEWS tags are pending this initial inquiry.

  • If this is a resubmission following rejection, please explain the change in circumstances:
  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

@sckott sckott self-assigned this Aug 1, 2016

sckott commented Aug 1, 2016

Thanks for your submission @jsta - it looks like you didn't check "Do you intend for this package to go on CRAN?". Can you check it or explain why not?

jsta commented Aug 1, 2016

@sckott it's checked now.

sckott commented Aug 1, 2016

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a test suite but all tests are skipped
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Reviewers: @aappling-usgs @fawda123
Due date: 2016-08-23

jsta commented Aug 1, 2016

After working with the rnoaa package I was under the impression that this was the norm.

Should I enable them all on Travis or only certain types?

sckott commented Aug 2, 2016

@jsta sorry about the misleading advice! I don't remember exactly why, but I usually only skip tests on Travis temporarily, e.g. when a web service is down.

Should I enable them all on Travis or only certain types?

Just skip on CRAN, and enable in all others

jsta commented Aug 2, 2016

Ok, tests are enabled on Travis now.

jsta commented Aug 2, 2016

@fawda123 is interested in reviewing this submission.

sckott commented Aug 2, 2016

@jsta I'm revising my advice above - Skip on CRAN those tests that make web requests. If there are some tests that do not, then don't skip those. It's good to have at least some tests that run on CRAN if possible, but not required
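Scott's "skip on CRAN, enable everywhere else" pattern can be sketched with testthat (an illustrative test only; the gethydro call mirrors the package's own examples):

```r
# Illustrative testthat sketch of "skip on CRAN, run everywhere else".
# skip_on_cran() checks the NOT_CRAN environment variable, which devtools
# and Travis set to "true"; CRAN's own checks leave it unset, so the
# web-dependent body below is skipped there.
library(testthat)

test_that("gethydro returns data from the live DBHYDRO service", {
  skip_on_cran()  # web request: run locally/on Travis, skip on CRAN
  x <- gethydro(dbkey = "15081",
                date_min = "2013-01-01", date_max = "2013-02-02")
  expect_true(is.data.frame(x))
})
```

Tests that make no web requests can omit the skip and run on CRAN too.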

sckott commented Aug 2, 2016

thanks @fawda123 for offering, assigned above #61 (comment) - here are instructions for review https://github.com/ropensci/onboarding/blob/master/reviewing_guide.md and examples of good reviews: #22 (comment) and #20 (comment)

fawda123 commented Aug 12, 2016

@jsta @sckott here's my review. See the comments below for items that aren't checked.

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 3


Review Comments

This package provides access to water quality and hydrological data from the dbhydro database provided by the South Florida Water Management District. Although the package has a regional focus, the database could be used to address more general ecological or environmental management questions. Having a package that makes it easier to acquire data from the database will no doubt extend the use and reach of the data.

The package is lightweight and is built around two main functions. The documentation and examples could be improved in some cases (see below), but overall I do not have any major concerns about package structure or functionality. However, access to metadata for the database is a serious limitation that needs to be addressed. Sure I could go online to find what I need but it seems like the package should facilitate that process in some way. The package does a good job retrieving data if the station id and test name are known but it would be much more helpful if there was some way to query the available information. Here are some options to consider:

  • The vignette has an example of retrieving data using a wild card for the station id. Some more examples like this would be very helpful.
  • Including more detailed metadata in the appendix of the vignette, e.g., stations, date ranges, etc.
  • Including more functions to return and search metadata. My SWMPr package has similar functions (e.g., SWMPr::site_codes or SWMPr::site_codes_ind) but I have the advantage of far fewer stations than those in dbhydro, so I'm not sure of the best approach.

Build/install

  • I was having problems building the vignette when running devtools::check(). The file dbhydroR.tex was not compiling and returned an error "support package l3kernel too old". Updating the package in my LaTeX distribution fixed the issue. I'm guessing you didn't have this issue since the package is already on CRAN but I wanted to mention it in case the problem occurs in the future. Otherwise, the devtools build on my Windows machine and the CRAN Windows builder (devtools::build_win()) passed with no notes, warnings, or errors.

Examples

  • The following error was returned by devtools::run_examples(pkg = '.', run = FALSE):
> cleanhydro(gethydro(dbkey = "15081", date_min = "2013-01-01", date_max = "2013-02-02"))
Error: is.response(x) is not TRUE

I ran the check so that examples in \dontrun tags, like the one above, were included. It makes sense not to run an example that is supposed to fail, but there is no description explaining what the example shows. Either fix the example above or provide some explanation as to why it fails. All other examples ran without issues.
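If the example stays in \dontrun, one option is to explain why directly in the roxygen block (a sketch only; the comment text is my guess at a plausible reason, not confirmed from the package):

```r
#' @examples
#' \dontrun{
#' # Not run during checks: requires a live connection to the DBHYDRO
#' # web service, which can be slow or unavailable.
#' cleanhydro(gethydro(dbkey = "15081", date_min = "2013-01-01",
#'                     date_max = "2013-02-02"))
#' }
```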

Documentation/vignette

  • Maybe you should include some details (@details) for getwq and gethydro. The documentation for each is minimal, so perhaps add some of the text in the vignette to the help files for each function.
  • As noted above under documentation requirements, the README does not include a 'statement of need clearly stating problems the software is designed to solve and its target audience'. The text from the onboarding issue above would be a good addition, i.e., 'dbhydroR provides scripted access to the South Florida Water Management District's DBHYDRO database which holds over 35 million water quality and hydrologic data records from the Florida Everglades and surrounding areas. The target audience is anyone interested in water quality and hydrologic data from the Florida Everglades - ecologists, engineers, meteorologists, hydrogeologists, hydrologists, etc.'
  • For the vignette (or even on the README), I think it would be useful to provide a more user-friendly view of the available stations for getwq. The link in the vignette is to a KMZ file, which is not easily viewed. I tried installing the KML/KMZ Viewer for Chrome but it was blocked on my government computer, so I suspect others would have problems. Another option is to include an attachment to the vignette with a table or map showing the stations.

Functions

  • I'm wondering if cleanhydro and cleanwq should be exported since they are only used within getwq and gethydro. I say this because the documentation is sparse (no details, no returned value description, minimal examples). Maybe it's better not to export. Would a user ever have a separate need for these functions?
  • For cleanwq, the vignette says that rows with missing data are removed. What happens with these rows if multiple stations or 'test_names' are queried? Is the entire row removed even if data are available for the other columns? I think some clarification is needed.
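To make the missing-data question concrete, here is a toy base-R illustration of a long-to-wide cast (made-up data; stats::reshape stands in for the reshape2 cast the package actually uses, so column names differ):

```r
# Toy long-format result: station S2 was never sampled for TP, and S1's
# TP value is missing. What should the wide output keep?
long <- data.frame(
  station = c("S1", "S1", "S2"),
  test    = c("NOx", "TP", "NOx"),
  value   = c(0.1, NA, 0.2),
  stringsAsFactors = FALSE
)

# Casting wide keeps the S1 row with NA in its TP column rather than
# dropping the whole row -- the behavior the vignette should spell out.
wide <- reshape(long, idvar = "station", timevar = "test",
                direction = "wide")
wide
```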

Contributing

  • Might be worth including a contributing.md file on the repo, or something more minimal in the README. Maybe modify this for your needs?

Compliance with rOpenSci Packaging Guide

My comments above address most of the compliance concerns for rOpenSci. Below are some minor, additional points.

Function variable naming

I guess the CRAN gatekeepers don't care about this but I was having a mental block with getwq given its similarity to getwd. You might consider changing the function name to make it more distinct.

README

The ropensci footer will have to be added eventually: [![ropensci_footer](http://ropensci.org/public_images/github_footer.png)](http://ropensci.org)

Code of conduct

Also consider adding the code of conduct badge with devtools::use_code_of_conduct.

Hopefully these comments are helpful. Let me know if you have any questions.

@sckott

This comment has been minimized.

Show comment
Hide comment
@sckott

sckott Aug 12, 2016

Member

@fawda123 thanks very much for your review!

Member

sckott commented Aug 12, 2016

@fawda123 thanks very much for your review!

aappling-usgs commented Aug 15, 2016

Package Review

Joseph (@jsta) and Scott (@sckott), here's my review. Marcus (@fawda123), thanks for a thoughtful review to build on.

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README (I agree with @fawda123's suggestion)
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION (just missing the contribution guidelines)

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed. (see below for some edge cases)
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine. (but see below for some edge cases that could also be tested)
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 5


Review Comments

This package provides scriptable access to an extensive Florida water quality/quantity database, DBHYDRO. It's also noticeably faster than the browser-based interface. dbhydroR will enhance the usability of the database and the reproducibility of analyses of DBHYDRO data.

I agree with @fawda123 that the largest room for improvement is in the documentation. I have also made suggestions below for increasing the flexibility of the data-cleaning functions to serve several common use cases, and I've pointed out a few edge cases where the package's functionality might surprise a user.

Documentation

As @fawda123 noted, it can be hard to identify the station IDs and tests you want if you don't know what the options are. There are a few ways you could help users:

  • You have several great links in the vignette that could be ported to the corresponding help files.
  • The Google Earth kmz file is excellent for interactively identifying sites and timeseries. However, http://my.sfwmd.gov/KMLEXT/CUSTOMKMLS/DBHydro/DBHydroKML/DBHYDRO_KML.kmz doesn't open a browser window as a user might expect - maybe include some suggestions for how to best open the file, like on p. 56 of the DBHYDRO manual.
  • If you're feeling ambitious, functions that return lists of options (sites, tests) would enhance users' ability to search sites programmatically. I also see a simplicity/maintainability argument for avoiding this route.

Questions that arose on my first encounter with the package/dataset:

  • What does a dbkey represent? A site, a timeseries, ...?
  • How should I cite DBHYDRO and/or the dbhydroR package?

Other documentation thoughts

  • The description of "cleaned output" in the vignette could also be copied right over to the help files for getwq and gethydro (and their internal cleaning functions if those continue to be exported).
  • Contribution guidelines could be added to the README or a CONTRIBUTING file.

Build/install

  • I had no trouble installing/building the package from CRAN, GitHub, or a local clone of the repository.
  • The GitHub repository name is dbhydro while the package name is dbhydroR. You might consider switching to the same name in both places to make it ever-so-slightly easier to find the package on GitHub. (E.g., my first try was devtools::install_github('SFWMD/dbhydroR') because I knew the package name and only glanced at the URL.)

Examples/tests

  • I encountered the same error as @fawda123 on running the example for cleanhydro:
> devtools::run_examples(run=FALSE)
...
> cleanhydro(gethydro(dbkey = "15081", date_min = "2013-01-01", date_max = "2013-02-02"))
Error: is.response(x) is not TRUE
  • devtools::check() runs cleanly with 0 errors, 0 warnings, and 0 notes.

Possibly unnecessary files & code

  • The NEWS file has output: pdf_document at the top. Are you using this feature?
  • Do you need all of the files in dbhydro/vignettes? DFlowInterpR.bib~, Thumbs.db, and Untitled Document~ (and maybe others) might not be getting used.

Functions

getwq + gethydro

These functions have a clean, intuitive interface. A few thoughts:

  • It would be helpful to more thoroughly explain dbkey in the documentation for gethydro.
  • Is it / could it be possible to pass dbkey into getwq, as it is for gethydro? One dbkey might be easier than site+test for some programming applications, and it looks like water quality data do have dbkeys, e.g.:
> getdbkey(stationid = "C111%", stat = 'MEAN', category = "WQ")
Search results for 'C111% WQ'
  Dbkey    Group Data Type Freq Recorder  Start Date    End Date
1 38097 C111WETL      H2OT   DA     ???? 04-OCT-2003 10-OCT-2011
2 38104 C111WETL     SALIN   DA     ???? 04-OCT-2003 10-OCT-2011
3 38100 C111WETL     SCOND   DA     ???? 04-OCT-2003 10-OCT-2011
  • Two things occur in getwq only if raw==FALSE that I expected to happen regardless: (1) the message "No data found" doesn't occur even if there are no data, and (2) MDL handling doesn't occur even if I specify mdl_handling. My favorite solution would be to make these things happen regardless of the value of raw, but another option would be to document them prominently in the ?getwq help file.
> getwq("FLAB01", "2014-09-14", "2014-09-18", "NITRATE+NITRITE-N", raw = TRUE)$Value
[1] -0.01
> getwq("FLAB01", "2014-09-14", "2014-09-18", "NITRATE+NITRITE-N", raw = TRUE, mdl_handling='half')$Value
[1] -0.01
getdbkey

Though not advertised in the README, the getdbkey function looks like a useful way to identify datasets to download. With that in mind, I have a few suggestions that are mostly illustrated by this example:

> getdbkey(stationid = "C111%", stat = 'MEAN', category = "WQ", detail.level="full")
Search results for 'C111% WQ'
  Get Data Dbkey \nStation\n    Group Data Type Freq Stat Recorder Agency
1          38097  C111WETLND C111WETL      H2OT   DA MEAN     ????   USGS
2          38104  C111WETLND C111WETL     SALIN   DA MEAN     ????   USGS
3          38100  C111WETLND C111WETL     SCOND   DA MEAN     ????   USGS
   Start Date    End Date Strata County Op Num Latitude Longitude    X Coord
1 04-OCT-2003 10-OCT-2011      0    DAD      Â   251740    803112 814755.394
2 04-OCT-2003 10-OCT-2011      0    DAD      Â   251740    803112 814755.394
3 04-OCT-2003 10-OCT-2011      0    DAD      Â   251740    803112 814755.394
     Y Coord   Basin Struct Section Township Range
1 349549.418 C111_CO      Â      15       59    38
2 349549.418 C111_CO      Â      15       59    38
3 349549.418 C111_CO      Â      15       59    38
  • Calls with detail.level='full' return a column named "\nStation\n". I'd prefer "Station" if there's no reason to stick with the extra newlines.
  • The values of Latitude and Longitude are unconventionally formatted. Would you consider converting them to decimal degrees?
  • Are the values in the "Struct" column being parsed correctly? I don't know what to make of Â. It seems to appear in other columns, too, in other queries.
  • Does anything useful ever appear in the "Get Data" column?
  • Consider using stringsAsFactors=FALSE so that Station, Dbkey, etc. come back as strings.
  • When I try to get a more complete list of sites/tests to choose from, it seems that I'm limited to 100 dbkeys. This limitation should probably be more clearly documented and/or circumvented.
    http://my.sfwmd.gov/dbhydroplsql/show_dbkey_info.show_station_info?v_station=I%25
# we'd expect >100 dbkeys in this query
> nrow(getdbkey(category = "GW", detail.level = "full"))
Search results for 'NA GW'
[1] 100
# because we easily get >100 sites combined in these two queries:
> dba <- getdbkey(stationid = 'A%', category = "GW", detail.level = "full")
Search results for 'A% GW'
> dbi <- getdbkey(stationid = 'I%', category = "GW", detail.level = "full")
Search results for 'I% GW'
> nrow(dba) + nrow(dbi)
[1] 134
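On the Latitude/Longitude point above: assuming the packed values are DDMMSS integers (e.g. 251740 for 25 deg 17 min 40 sec; I have not confirmed this against DBHYDRO's documentation), the conversion to decimal degrees is short:

```r
# Convert a packed DDMMSS integer (e.g. 251740 = 25 deg 17 min 40 sec)
# to decimal degrees. Assumes the DDMMSS reading of DBHYDRO's
# Latitude/Longitude columns is correct.
dms_to_dd <- function(x) {
  x <- as.numeric(x)
  d <- x %/% 10000        # degrees
  m <- (x %/% 100) %% 100 # minutes
  s <- x %% 100           # seconds
  d + m / 60 + s / 3600
}

dms_to_dd(251740)  # 25 + 17/60 + 40/3600
```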
cleanwq + cleanhydro
> h <- gethydro(dbkey = "IY639", date_min = "2009-01-30", date_max = "2009-01-31")
Warning messages:
1: In cleanhydro(res) : Column headings missing. Guessing...
2: In cleanhydro(res) : Instantaneous data detected
> head(h$date)
[1] "2009-01-30 00:40:00 MST" "2009-01-30 00:55:00 MST" "2009-01-30 01:10:00 MST"
[4] "2009-01-30 01:25:00 MST" "2009-01-30 01:40:00 MST" "2009-01-30 01:55:00 MST"
  • In cleanhydro, why is "Instantaneous data detected" a warning rather than a message? Is it possible for instantaneous data to make trouble?
  • At first glance, I agreed with @fawda123 that it would make sense not to export the cleanwq and cleanhydro functions because they are wrapped by getwd and gethydro. After more thought, though, I decided that I might often want to retain the Station.ID column and the long format (e.g., for dplyr-style analyses of multiple sites or tests) but might also want to specify the mdl_handling and/or get back parsed timestamps. I see no one-line option for these use cases, and it makes me wonder if the cleaning functionality could be split into things you'd almost always want (parsed timestamps, connecting metadata to headers, maybe move the Value column closer to the left) and things you might want to specify in various ways (MDL handling, selecting specific columns, switching between wide and long, renaming the value column). The former functionality could be the default in getwd and gethydro while the latter could live in (a) exported functions that you'd call separately or (b) new arguments to getwd and gethydro.
  • If you decide that cleanwq and cleanhydro should continue to be exported functions, they should be used in examples in the README and vignette.

Signing off

This package is already a nice contribution - thanks for developing it. Let me know if you'd like to clarify/discuss any of the above points. Best, Alison (@aappling-usgs)

aappling-usgs commented Aug 15, 2016

Package Review

Joseph (@jsta) and Scott (@sckott), here's my review. Marcus (@fawda123), thanks for a thoughtful review to build on.

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README (I agree with @fawda123's suggestion)
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION (just missing the contribution guidelines)

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed. (see below for some edge cases)
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine. (but see below for some edge cases that could also be tested)
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 5


Review Comments

This package provides scriptable access to an extensive Florida water quality/quantity database, DBHYDRO. It's also noticeably faster than the browser-based interface. dbhydroR will enhance the usability of the database and the reproducibility of analyses of DBHYDRO data.

I agree with @fawda123 that the largest room for improvement is in the documentation. I have also made suggestions below for increasing the flexibility of the data-cleaning functions to serve several common use cases, and I've pointed out a few edge cases where the package's functionality might surprise a user.

Documentation

As @fawda123 noted, it can be hard to identify the station IDs and tests you want if you don't know what the options are. There are a few ways you could help users:

  • You have several great links in the vignette that could be ported to the corresponding help files.
  • The Google Earth kmz file is excellent for interactively identifying sites and timeseries. However, http://my.sfwmd.gov/KMLEXT/CUSTOMKMLS/DBHydro/DBHydroKML/DBHYDRO_KML.kmz doesn't open a browser window as a user might expect - maybe include some suggestions for how to best open the file, like on p. 56 of the DBHYDRO manual.
  • If you're feeling ambitious, functions that return lists of options (sites, tests) would enhance users' ability to search sites programmatically. I also see a simplicity/maintainability argument for avoiding this route.

Questions that arose on my first encounter with the package/dataset:

  • What does a dbkey represent? A site, a timeseries, ...?
  • How should I cite DBHYDRO and/or the dbhydroR package?

Other documentation thoughts

  • The description of "cleaned output" in the vignette could also be copied right over to the help files for getwq and gethydro (and their internal cleaning functions if those continue to be exported).
  • Contribution guidelines could be added to the README or a CONTRIBUTING file.

Build/install

  • I had no trouble installing/building the package from CRAN, GitHub, or a local clone of the repository.
  • The GitHub repository name is dbhydro while the package name is dbhydroR. You might consider switching to the same name in both places to make it ever-so-slightly easier to find the package on GitHub. (E.g., my first try was devtools::install_github('SFWMD/dbhydroR') because I knew the package name and only glanced at the URL.)

Examples/tests

  • I encountered the same error as @fawda123 on running the example for cleanhydro:
> devtools::run_examples(run=FALSE)
...
> cleanhydro(gethydro(dbkey = "15081", date_min = "2013-01-01", date_max = "2013-02-02"))
Error: is.response(x) is not TRUE
  • devtools::check() runs cleanly with 0 errors, 0 warnings, and 0 notes.

Possibly unnecessary files & code

  • The NEWS file has output: pdf_document at the top. Are you using this feature?
  • Do you need all of the files in dbhydro/vignettes? DFlowInterpR.bib~, Thumbs.db, and Untitled Document~ (and maybe others) might not be getting used.

Functions

getwq + gethydro

These functions have a clean, intuitive interface. A few thoughts:

  • It would be helpful to more thoroughly explain dbkey in the documentation for gethydro.
  • Is it / could it be possible to pass dbkey into getwq, as it is for gethydro? One dbkey might be easier than site+test for some programming applications, and it looks like water quality data do have dbkeys, e.g.:
> getdbkey(stationid = "C111%", stat = 'MEAN', category = "WQ")
Search results for 'C111% WQ'
  Dbkey    Group Data Type Freq Recorder  Start Date    End Date
1 38097 C111WETL      H2OT   DA     ???? 04-OCT-2003 10-OCT-2011
2 38104 C111WETL     SALIN   DA     ???? 04-OCT-2003 10-OCT-2011
3 38100 C111WETL     SCOND   DA     ???? 04-OCT-2003 10-OCT-2011
  • Two things happen in getwq only when raw == FALSE, which I did not expect to be conditional: (1) the message "No data found" doesn't appear even if there are no data, and (2) MDL handling doesn't occur even if I specify mdl_handling. My favorite solution would be to make these things happen regardless of the value of raw, but another option would be to document them prominently in the ?getwq help file.
> getwq("FLAB01", "2014-09-14", "2014-09-18", "NITRATE+NITRITE-N", raw = TRUE)$Value
[1] -0.01
> getwq("FLAB01", "2014-09-14", "2014-09-18", "NITRATE+NITRITE-N", raw = TRUE, mdl_handling='half')$Value
[1] -0.01
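To illustrate the unconditional-MDL option suggested above, here is a minimal sketch. It assumes (as the -0.01 output suggests) that below-detection values are flagged by being reported as negative numbers; handle_mdl and the Value column name are hypothetical stand-ins for the package's internals, not dbhydroR functions.

```r
# Hedged sketch: apply mdl_handling regardless of `raw`. Assumes below-MDL
# values come back as negative numbers (as in the -0.01 example above);
# handle_mdl() is a hypothetical helper, not a dbhydroR function.
handle_mdl <- function(dat, mdl_handling = c("raw", "half", "full")) {
  mdl_handling <- match.arg(mdl_handling)
  below <- dat$Value < 0
  dat$Value[below] <- switch(mdl_handling,
    raw  = dat$Value[below],          # leave the negative flag as-is
    half = abs(dat$Value[below]) / 2, # substitute half the detection limit
    full = abs(dat$Value[below]))     # substitute the full detection limit
  dat
}

res <- data.frame(Value = c(-0.01, 0.2))
handle_mdl(res, "half")$Value  # 0.005 0.200
```

Because the helper operates on a plain data.frame, it could run after download and therefore independently of the raw argument.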
getdbkey

Though not advertised in the README, the getdbkey function looks like a useful way to identify datasets to download. With that in mind, I have a few suggestions that are mostly illustrated by this example:

> getdbkey(stationid = "C111%", stat = 'MEAN', category = "WQ", detail.level="full")
Search results for 'C111% WQ'
  Get Data Dbkey \nStation\n    Group Data Type Freq Stat Recorder Agency
1          38097  C111WETLND C111WETL      H2OT   DA MEAN     ????   USGS
2          38104  C111WETLND C111WETL     SALIN   DA MEAN     ????   USGS
3          38100  C111WETLND C111WETL     SCOND   DA MEAN     ????   USGS
   Start Date    End Date Strata County Op Num Latitude Longitude    X Coord
1 04-OCT-2003 10-OCT-2011      0    DAD      Â   251740    803112 814755.394
2 04-OCT-2003 10-OCT-2011      0    DAD      Â   251740    803112 814755.394
3 04-OCT-2003 10-OCT-2011      0    DAD      Â   251740    803112 814755.394
     Y Coord   Basin Struct Section Township Range
1 349549.418 C111_CO      Â      15       59    38
2 349549.418 C111_CO      Â      15       59    38
3 349549.418 C111_CO      Â      15       59    38
  • Calls with detail.level='full' return a column named "\nStation\n". I'd prefer "Station" if there's no reason to stick with the extra newlines.
  • The values of Latitude and Longitude are unconventionally formatted. Would you consider converting them to decimal degrees?
  • Are the values in the "Struct" column being parsed correctly? I don't know what to make of Â. It seems to appear in other columns, too, in other queries.
  • Does anything useful ever appear in the "Get Data" column?
  • Consider using stringsAsFactors=FALSE so that Station, Dbkey, etc. come back as strings.
  • When I try to get a more complete list of sites/tests to choose from, it seems that I'm limited to 100 dbkeys. This limitation should probably be more clearly documented and/or circumvented.
    http://my.sfwmd.gov/dbhydroplsql/show_dbkey_info.show_station_info?v_station=I%25
# we'd expect >100 dbkeys in this query
> nrow(getdbkey(category = "GW", detail.level = "full"))
Search results for 'NA GW'
[1] 100
# because we easily get >100 sites combined in these two queries:
> dba <- getdbkey(stationid = 'A%', category = "GW", detail.level = "full")
Search results for 'A% GW'
> dbi <- getdbkey(stationid = 'I%', category = "GW", detail.level = "full")
Search results for 'I% GW'
> nrow(dba) + nrow(dbi)
[1] 134
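On the Latitude/Longitude bullet above: the values look like packed degrees-minutes-seconds (251740 ≈ 25°17'40"). If that reading is correct (an assumption worth verifying against the DBHYDRO documentation), converting to decimal degrees is straightforward; dms_to_dd below is a hypothetical helper, not part of dbhydroR.

```r
# Hedged sketch: convert DBHYDRO's apparent DDMMSS coordinate packing
# (e.g. 251740 = 25 deg 17 min 40 sec) to decimal degrees.
dms_to_dd <- function(x) {
  x <- sprintf("%06d", as.integer(x))      # pad to 6 digits
  as.numeric(substr(x, 1, 2)) +            # degrees
    as.numeric(substr(x, 3, 4)) / 60 +     # minutes
    as.numeric(substr(x, 5, 6)) / 3600     # seconds
}

dms_to_dd("251740")  # ~25.29444 (latitude)
dms_to_dd("803112")  # ~80.52 (longitude; negate for the western hemisphere)
```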
cleanwq + cleanhydro
> h <- gethydro(dbkey = "IY639", date_min = "2009-01-30", date_max = "2009-01-31")
Warning messages:
1: In cleanhydro(res) : Column headings missing. Guessing...
2: In cleanhydro(res) : Instantaneous data detected
> head(h$date)
[1] "2009-01-30 00:40:00 MST" "2009-01-30 00:55:00 MST" "2009-01-30 01:10:00 MST"
[4] "2009-01-30 01:25:00 MST" "2009-01-30 01:40:00 MST" "2009-01-30 01:55:00 MST"
  • In cleanhydro, why is "Instantaneous data detected" a warning rather than a message? Is it possible for instantaneous data to make trouble?
  • At first glance, I agreed with @fawda123 that it would make sense not to export the cleanwq and cleanhydro functions because they are wrapped by getwq and gethydro. After more thought, though, I decided that I might often want to retain the Station.ID column and the long format (e.g., for dplyr-style analyses of multiple sites or tests) but might also want to specify the mdl_handling and/or get back parsed timestamps. I see no one-line option for these use cases, and it makes me wonder if the cleaning functionality could be split into things you'd almost always want (parsed timestamps, connecting metadata to headers, maybe moving the Value column closer to the left) and things you might want to specify in various ways (MDL handling, selecting specific columns, switching between wide and long, renaming the value column). The former functionality could be the default in getwq and gethydro while the latter could live in (a) exported functions that you'd call separately or (b) new arguments to getwq and gethydro.
  • If you decide that cleanwq and cleanhydro should continue to be exported functions, they should be used in examples in the README and vignette.
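On splitting the cleaning functionality: the long-to-wide switch in particular is cheap to expose after the fact, since it doesn't require re-pulling any data. A minimal base-R sketch on toy data (the real column names may differ from these illustrative ones):

```r
# Hedged sketch: switch a long-format result (one row per station x date)
# to wide format without re-downloading. Toy data; column names are
# illustrative, not necessarily dbhydroR's.
long <- data.frame(
  date       = as.Date(c("2009-01-30", "2009-01-30", "2009-01-31")),
  Station.ID = c("FLAB01", "FLAB02", "FLAB01"),
  Value      = c(0.10, 0.20, 0.15)
)

wide <- reshape(long, idvar = "date", timevar = "Station.ID",
                direction = "wide")
wide  # columns: date, Value.FLAB01, Value.FLAB02
```

An exported post-processing function along these lines would let users flip between the dplyr-friendly long format and the default wide format on an already-downloaded data.frame.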

Signing off

This package is already a nice contribution - thanks for developing it. Let me know if you'd like to clarify/discuss any of the above points. Best, Alison (@aappling-usgs)

sckott (Member) commented Aug 25, 2016

@aappling-usgs thanks so much for your review!

@jsta let us know if you need any help, and continue the conversation here

jsta (Contributor) commented Aug 28, 2016

Thanks for the review @fawda123!

Review Comments (@fawda123)

This package provides access to water quality and hydrological data from the dbhydro database provided by the South Florida Water Management District. Although the package has a regional focus, the database could be used to address more general ecological or environmental management questions. Having a package to more easily acquire the database will no doubt facilitate use and reach of the data.

The package is lightweight and is built around two main functions. The documentation and examples could be improved in some cases (see below), but overall I do not have any major concerns about package structure or functionality. However, access to metadata for the database is a serious limitation that needs to be addressed. Sure I could go online to find what I need but it seems like the package should facilitate that process in some way. The package does a good job retrieving data if the station id and test name are known but it would be much more helpful if there was some way to query the available information. Here are some options to consider:

  • The vignette has an example of retrieving data using a wild card for the station id. Some more examples like this would be very helpful.

    I believe that wild-cards only work with station-id(s). Are you thinking that test-names would also be exposed to wildcard selection?

  • Including more detailed metadata in the appendix of the vignette, e.g., stations, date ranges, etc.

I added a link to the available test-name listing in the getwq docs. I think there may be too many stations to include a manageable listing in the vignette, but see my response to the next comment.

  • Including more functions to return and search metadata. My SWMPr package has similar functions (e.g., SWMPr::site_codes or SWMPr::site_codes_ind) but I have the advantage of far fewer stations than those in dbhydro, so I'm not sure of the best approach.

Thank you for the suggestion. I am working on wrapping the station-search utility at http://my.sfwmd.gov/dbhydroplsql/water_quality_data.show_group_station_characters. For now, I tried to make map-based selection via the ArcGIS online or Google Earth interface more prominent in the vignette and README.

https://github.com/SFWMD/dbhydroR/issues/3
SFWMD/dbhydroR@11c57f3

Build/install

  • I was having problems building the vignette when running devtools::check(). The file dbhydroR.tex was not compiling and was returning the error "Support package l3kernel too old". Updating the package in my LaTeX distribution fixed the issue. I'm guessing you didn't have this issue since the package is already on CRAN, but I wanted to mention it in case the problem occurs in the future. Otherwise, the devtools build on my Windows machine and the CRAN Windows builder (devtools::build_win()) passed with no notes, warnings, or errors.

Examples

  • The following error was returned running devtools::run_examples(pkg = '.', run = FALSE)):

cleanhydro(gethydro(dbkey = "15081", date_min = "2013-01-01", date_max = "2013-02-02"))
Error: is.response(x) is not TRUE

I ran the check to include examples in \dontrun tags, as for the above. It makes sense not to run the example if it's supposed to fail, but there is no description explaining what the example shows. Either fix the example above or provide some explanation as to why it fails. All other examples ran without issues.

The example has been fixed and now runs without error

SFWMD/dbhydroR@ca676ef
SFWMD/dbhydroR@86436e3

Documentation/vignette

  • Maybe you should include some details (@details) for getwq and gethydro. The documentation for each is minimal, so perhaps add some of the text in the vignette to the help files for each function.

    Great suggestion! More detail has been added to the docs for getwq and gethydro.

    SFWMD/dbhydroR@8b67b62

  • As noted above under documentation requirements, the README does not include a 'statement of need clearly stating problems the software is designed to solve and its target audience'. The text from the onboarding issue above would be a good addition, i.e., 'dbhydroR provides scripted access to the South Florida Water Management District's DBHYDRO database which holds over 35 million water quality and hydrologic data records from the Florida Everglades and surrounding areas. The target audience is anyone interested in water quality and hydrologic data from the Florida Everglades - ecologists, engineers, meteorologists, hydrogeologists, hydrologists, etc.'

    This has been added to the README and vignette.

    SFWMD/dbhydroR@6a277de
    SFWMD/dbhydroR@7a7d414

  • For the vignette (or even the README), I think it would be useful to provide a more user-friendly view of the available stations for getwq. The link in the vignette is to a KMZ file, which is not easily viewed. I tried installing the KML/KMZ Viewer for Chrome but it was blocked on my government computer, so I suspect others would have problems. Another option is to include an attachment to the vignette with a table or map showing the stations.

    Links to the equivalent ArcGIS Online station map have been added to the vignette, README, and function docs. Hopefully this will make it more generally accessible.

    SFWMD/dbhydroR@11c57f3
    SFWMD/dbhydroR@3750b84
    SFWMD/dbhydroR@e66be8f

Functions

  • I'm wondering if cleanhydro and cleanwq should be exported since they are only used within getwq and gethydro. I say this because the documentation is sparse (no details, no returned value description, minimal examples). Maybe it's better not to export. Would a user ever have a separate need for these functions?

    I think it is best to have these functions exported. I envision a use-case where dplyr/SQL-like data would be preferable to the default format. Exposing these functions would allow users to switch to non-dplyr/SQL-like data without re-pulling the data. Also, I added more detail to the clean docs and refactored cleanhydro to be more consistent with cleanwq.

    SFWMD/dbhydroR@86436e3

  • For cleanwq, the vignette says that rows with missing data are removed. What happens with these rows if multiple stations or 'test_names' are queried? Is the entire row removed even if data are available for the other columns? I think some clarification is needed.

    Hmm, possibly you are referring to the "QA blanks" discussion? This refers not to missing data but to analytical samples run to check for equipment/process contamination. I have added text to the vignette to this effect.

    SFWMD/dbhydroR@5d4fe25

Contributing

  • Might be worth including a contributing.md file on the repo, or something more minimal in the README. Maybe modify this for your needs?

Good suggestion. I think it might be best to wait on this until we are closer to acceptance?

Compliance with rOpenSci Packaging Guide

My comments above address most of the compliance concerns for rOpenSci. Below are some minor, additional points.

Function variable naming

I guess the CRAN gatekeepers don't care about this, but I was having a mental block with getwq given its similarity to getwd. You might consider changing the function name to make it more distinct.

README

The ropensci footer will have to be added eventually: ropensci_footer

I think the onboarding guidelines have been updated to say that the footer is only added after a package has been accepted.

Code of conduct

Also consider adding the code of conduct badge with devtools::use_code_of_conduct.

A COC has been added.

SFWMD/dbhydroR@0758c0c

Hopefully these comments are helpful. Let me know if you have any questions.


jsta (Contributor) commented Aug 29, 2016

Thanks for the review @aappling-usgs!

Review Comments (@aappling-usgs)

This package provides scriptable access to an extensive Florida water quality/quantity database, DBHYDRO. It's also noticeably faster than the browser-based interface. dbhydroR will enhance the usability of the database and the reproducibility of analyses of DBHYDRO data.

I agree with @fawda123 that the largest room for improvement is in the documentation. I have also made suggestions below for increasing the flexibility of the data-cleaning functions to serve several common use cases, and I've pointed out a few edge cases where the package's functionality might surprise a user.

Documentation

As @fawda123 noted, it can be hard to identify the station IDs and tests you want if you don't know what the options are. There are a few ways you could help users:

  • You have several great links in the vignette that could be ported to the corresponding help files.

    Great suggestion. I ported these links and added links to the ArcGIS Online Station Map.

    SFWMD/dbhydroR@8b67b62

  • The Google Earth kmz file is excellent for interactively identifying sites and timeseries. However, http://my.sfwmd.gov/KMLEXT/CUSTOMKMLS/DBHydro/DBHydroKML/DBHYDRO_KML.kmz doesn't open a browser window as a user might expect - maybe include some suggestions for how to best open the file, like on p. 56 of the DBHYDRO manual.

  • If you're feeling ambitious, functions that return lists of options (sites, tests) would enhance users' ability to search sites programmatically. I also see a simplicity/maintainability argument for avoiding this route.

    I am working on wrapping the station-search utility at http://my.sfwmd.gov/dbhydroplsql/water_quality_data.show_group_station_characters. For now, I tried to make map-based selection via the ArcGIS online or Google Earth interface more prominent in the vignette and README.

Questions that arose on my first encounter with the package/dataset:

  • What does a dbkey represent? A site, a timeseries, ...?
    A dbkey represents a unique site x variable time series. I added some more info about dbkeys to the gethydro docs.

    SFWMD/dbhydroR@8b67b62

  • How should I cite DBHYDRO and/or the dbhydroR package?

    A notice about citation() has been added to the README

    SFWMD/dbhydroR@bbf0246

Other documentation thoughts

  • The description of "cleaned output" in the vignette could also be copied right over to the help files for getwq and gethydro (and their internal cleaning functions if those continue to be exported).

    I expanded the docs for the get and clean functions to describe what goes on in the respective functions. I also refactored cleanhydro to be more consistent with cleanwq (they both operate on data.frame objects now).

    SFWMD/dbhydroR@8b67b62
    SFWMD/dbhydroR@86436e3

  • Contribution guidelines could be added to the README or a CONTRIBUTING file.

    The contributing guidelines of the other rOpenSci repos appear to be fairly specific to rOpenSci. I wonder if I should hold off on this until we are closer to acceptance?

Build/install

  • I had no trouble installing/building the package from CRAN, GitHub, or a local clone of the repository.

  • The GitHub repository name is dbhydro while the package name is dbhydroR. You might consider switching to the same name in both places to make it ever-so-slightly easier to find the package on GitHub. (E.g., my first try was devtools::install_github('SFWMD/dbhydroR') because I knew the package name and only glanced at the URL.)

    The repo name has been switched to match the package name.

Examples/tests

  • I encountered the same error as @fawda123 on running the example for cleanhydro:

devtools::run_examples(run=FALSE)
...
cleanhydro(gethydro(dbkey = "15081", date_min = "2013-01-01", date_max = "2013-02-02"))
Error: is.response(x) is not TRUE

  • devtools::check() runs cleanly with 0 errors, 0 warnings, and 0 notes.

The example has been fixed and now runs without error

SFWMD/dbhydroR@ca676ef
SFWMD/dbhydroR@86436e3

Possibly unnecessary files & code

  • The NEWS file has output: pdf_document at the top. Are you using this feature?

    These lines have been removed.

    SFWMD/dbhydroR@4cc1b2d

  • Do you need all of the files in dbhydro/vignettes? DFlowInterpR.bib~, Thumbs.db, and Untitled Document~ (and maybe others) might not be getting used.

    These files have been removed.

    SFWMD/dbhydroR@0888842
    SFWMD/dbhydroR@9186979

Functions

getwq + gethydro

These functions have a clean, intuitive interface. A few thoughts:

  • It would be helpful to more thoroughly explain dbkey in the documentation for gethydro.

    I included some additional information about dbkey in the gethydro docs.

    SFWMD/dbhydroR@8b67b62

  • Is it / could it be possible to pass dbkey into getwq, as it is for gethydro? One dbkey might be easier than site+test for some programming applications, and it looks like water quality data do have dbkeys, e.g.:

getdbkey(stationid = "C111%", stat = 'MEAN', category = "WQ")
Search results for 'C111% WQ'
Dbkey Group Data Type Freq Recorder Start Date End Date
1 38097 C111WETL H2OT DA ???? 04-OCT-2003 10-OCT-2011
2 38104 C111WETL SALIN DA ???? 04-OCT-2003 10-OCT-2011
3 38100 C111WETL SCOND DA ???? 04-OCT-2003 10-OCT-2011

I realize that this is a bit confusing. The getwq function pulls data from a seperate set of DBHYDRO tables than gethydro. Only the gethydro tables contain dbkeys. Then, confusingly, the gethydro tables have a category called "WQ". The getwq tables do not support selection by dbkey. The getwq water quality data is generated from laboratory tests while the gethydro "WQ" data is generated from things like automated sondes.

  • Two things only occur in getwq if raw==FALSE which I expected to be conditional: (1) the message "No data found" doesn't occur even if there are no data, and (2) MDL handling doesn't occur even if I specify mdl_handling. My favorite solution would be to make these things happen regardless of the value of raw, but another option would be to document them prominently in the ?getwq help file.

getwq("FLAB01", "2014-09-14", "2014-09-18", "NITRATE+NITRITE-N", raw = TRUE)$Value
[1] -0.01
getwq("FLAB01", "2014-09-14", "2014-09-18", "NITRATE+NITRITE-N", raw = TRUE, mdl_handling='half')$Value
[1] -0.01

I refactored getwq and cleanwq so that MDL handling occurs regardless of how raw is set. Also, the "No data found" message should now appear regardless of how raw is set.

SFWMD/dbhydroR@8a62918

getdbkey

Though not advertised in the README, the getdbkey function looks like a useful way to identify datasets to download. With that in mind, I have a few suggestions that are mostly illustrated by this example:

getdbkey(stationid = "C111%", stat = 'MEAN', category = "WQ", detail.level="full")
Search results for 'C111% WQ'
Get Data Dbkey \nStation\n Group Data Type Freq Stat Recorder Agency
1 38097 C111WETLND C111WETL H2OT DA MEAN ???? USGS
2 38104 C111WETLND C111WETL SALIN DA MEAN ???? USGS
3 38100 C111WETLND C111WETL SCOND DA MEAN ???? USGS
Start Date End Date Strata County Op Num Latitude Longitude X Coord
1 04-OCT-2003 10-OCT-2011 0 DAD Â 251740 803112 814755.394
2 04-OCT-2003 10-OCT-2011 0 DAD Â 251740 803112 814755.394
3 04-OCT-2003 10-OCT-2011 0 DAD Â 251740 803112 814755.394
Y Coord Basin Struct Section Township Range
1 349549.418 C111_CO Â 15 59 38
2 349549.418 C111_CO Â 15 59 38
3 349549.418 C111_CO Â 15 59 38

I added some advertisement for getdbkey to the README.

  • Calls with detail.level='full' return a column named "\nStation\n". I'd prefer "Station" if there's no reason to stick with the extra newlines.

  • The values of Latitude and Longitude are unconventionally formatted. Would you consider converting them to decimal degrees?

    I removed the extra newlines from "Station" and formatted Latitude and Longitude as decimal degrees.

    SFWMD/dbhydroR@3df0ad1

  • Are the values in the "Struct" column being parsed correctly? I don't know what to make of Â. It seems to appear in other columns, too, in other queries.

    I am not getting  printed in any of the columns. Maybe I fixed it as a side-effect of another commit? The following line prints a reasonable (non-blank) value of Struct for me:

    getdbkey(category = "SW", stationid = "ALLIGAT", param = "STG", detail.level = "full")

  • Does anything useful ever appear in the "Get Data" column?

    Good catch. No it doesn't. This is an artifact from the web interface. I removed this column.

    SFWMD/dbhydroR@b68cc5e

  • Consider using stringsAsFactors=FALSE so that Station, Dbkey, etc. come back as strings.

    Good suggestion. I made this change.

    SFWMD/dbhydroR@3df0ad1

  • When I try to get a more complete list of sites/tests to choose from, it seems that I'm limited to 100 dbkeys. This limitation should probably be more clearly documented and/or circumvented. http://my.sfwmd.gov/dbhydroplsql/show_dbkey_info.show_station_info?v_station=I%25

we'd expect >100 dbkeys in this query

nrow(getdbkey(category = "GW", detail.level = "full"))
Search results for 'NA GW'
[1] 100

because we easily get >100 sites combined in these two queries:

dba <- getdbkey(stationid = 'A%', category = "GW", detail.level = "full")
Search results for 'A% GW'
dbi <- getdbkey(stationid = 'I%', category = "GW", detail.level = "full")
Search results for 'I% GW'
nrow(dba) + nrow(dbi)
[1] 134

I circumvented this limitation.

SFWMD/dbhydroR@b68cc5e

cleanwq + cleanhydro

h <- gethydro(dbkey = "IY639", date_min = "2009-01-30", date_max = "2009-01-31")
Warning messages:
1: In cleanhydro(res) : Column headings missing. Guessing...
2: In cleanhydro(res) : Instantaneous data detected
head(h$date)
[1] "2009-01-30 00:40:00 MST" "2009-01-30 00:55:00 MST" "2009-01-30 01:10:00 MST"
[4] "2009-01-30 01:25:00 MST" "2009-01-30 01:40:00 MST" "2009-01-30 01:55:00 MST"

I forced hydro and wq date/times to return in EST.

SFWMD/dbhydroR@4d1dd15
SFWMD/dbhydroR@239c038

  • In cleanhydro, why is "Instantaneous data detected" a warning rather than a message? Is it possible for instantaneous data to make trouble?

    This has been changed to a message rather than a warning. The raw server response for instantaneous data returns no column headers. These are assumed, hence the warning/message. The column assignments seem pretty reliable. Maybe I can drop these messages.

    SFWMD/dbhydroR@2d9902d

  • At first glance, I agreed with @fawda123 that it would make sense not to export the cleanwq and cleanhydro functions because they are wrapped by getwd and gethydro. After more thought, though, I decided that I might often want to retain the Station.ID column and the long format (e.g., for dplyr-style analyses of multiple sites or tests) but might also want to specify the mdl_handling and/or get back parsed timestamps. I see no one-line option for these use cases, and it makes me wonder if the cleaning functionality could be split into things you'd almost always want (parsed timestamps, connecting metadata to headers, maybe move the Value column closer to the left) and things you might want to specify in various ways (MDL handling, selecting specific columns, switching between wide and long, renaming the value column). The former functionality could be the default in getwd and gethydro while the latter could live in (a) exported functions that you'd call separately or (b) new arguments to getwd and gethydro.

Maybe I have captured the spirit of this comment by standardizing the behavior of the clean functions, and forcing mdl_handling to occur regardless of the value of raw? Also, timestamps are parsed to the date column regardless of the value of raw.*

  • If you decide that cleanwq and cleanhydro should continue to be exported functions, they should be used in examples in the README and vignette.

    I added examples to the README and vignette.

    SFWMD/dbhydroR@cf5ebd7

Signing off

This package is already a nice contribution - thanks for developing it. Let me know if you'd like to clarify/discuss any of the above points. Best, Alison (@aappling-usgs)

Contributor

jsta commented Aug 29, 2016

Thanks for the review @aappling-usgs!

Review Comments (@aappling-usgs)

This package provides scriptable access to an extensive Florida water quality/quantity database, DBHYDRO. It's also noticeably faster than the browser-based interface. dbhydroR will enhance the usability of the database and the reproducibility of analyses of DBHYDRO data.

I agree with @fawda123 that the largest room for improvement is in the documentation. I have also made suggestions below for increasing the flexibility of the data-cleaning functions to serve several common use cases, and I've pointed out a few edge cases where the package's functionality might surprise a user.

Documentation

As @fawda123 noted, it can be hard to identify the station IDs and tests you want if you don't know what the options are. There are a few ways you could help users:

  • You have several great links in the vignette that could be ported to the corresponding help files.

    Great suggestion. I ported these links and added links to the ArcGIS Online Station Map.

    SFWMD/dbhydroR@8b67b62

  • The Google Earth kmz file is excellent for interactively identifying sites and timeseries. However, http://my.sfwmd.gov/KMLEXT/CUSTOMKMLS/DBHydro/DBHydroKML/DBHYDRO_KML.kmz doesn't open a browser window as a user might expect - maybe include some suggestions for how to best open the file, like on p. 56 of the DBHYDRO manual.

  • If you're feeling ambitious, functions that return lists of options (sites, tests) would enhance users' ability to search sites programmatically. I also see a simplicity/maintainability argument for avoiding this route.

    I am working on wrapping the station-search utility at http://my.sfwmd.gov/dbhydroplsql/water_quality_data.show_group_station_characters. For now, I tried to make map-based selection via the ArcGIS online or Google Earth interface more prominent in the vignette and README.

Questions that arose on my first encounter with the package/dataset:

  • What does a dbkey represent? A site, a timeseries, ...?

    A dbkey represents a unique site x variable time series. I added some more info about dbkeys to the gethydro docs.

    SFWMD/dbhydroR@8b67b62

  • How should I cite DBHYDRO and/or the dbhydroR package?

    A notice about citation() has been added to the README

    SFWMD/dbhydroR@bbf0246

Other documentation thoughts

  • The description of "cleaned output" in the vignette could also be copied right over to the help files for getwd and gethydro (and their internal cleaning functions if those continue to be exported).

    I expanded the docs for the get and clean functions to describe what goes on in the respective functions. I also refactored cleanhydro to be more consistent with cleanwq (they both operate on data.frame objects now).

    SFWMD/dbhydroR@8b67b62
    SFWMD/dbhydroR@86436e3

  • Contribution guidelines could be added to the README or a CONTRIBUTING file.

    The contributing guidelines of the other rOpenSci repos appear to be fairly specific to rOpenSci. I wonder if I should hold off on this until we are closer to acceptance?

Build/install

  • I had no trouble installing/building the package from CRAN, GitHub, or a local clone of the repository.

  • The GitHub repository name is dbhydro while the package name is dbhydroR. You might consider switching to the same name in both places to make it ever-so-slightly easier to find the package on GitHub. (E.g., my first try was devtools::install_github('SFWMD/dbhydroR') because I knew the package name and only glanced at the URL.)

    The repo name has been switched to match the package name.

Examples/tests

  • I encountered the same error as @fawda123 on running the example for cleanhydro:

devtools::run_examples(run=FALSE)
...
cleanhydro(gethydro(dbkey = "15081", date_min = "2013-01-01", date_max = "2013-02-02"))
Error: is.response(x) is not TRUE

  • devtools::check() runs cleanly with 0 errors, 0 warnings, and 0 notes.

The example has been fixed and now runs without error.

SFWMD/dbhydroR@ca676ef
SFWMD/dbhydroR@86436e3

Possibly unnecessary files & code

  • The NEWS file has output: pdf_document at the top. Are you using this feature?

    These lines have been removed.

    SFWMD/dbhydroR@4cc1b2d

  • Do you need all of the files in dbhydro/vignettes? DFlowInterpR.bib~, Thumbs.db, and Untitled Document~ (and maybe others) might not be getting used.

    These files have been removed.

    SFWMD/dbhydroR@0888842
    SFWMD/dbhydroR@9186979

Functions

getwq + gethydro

These functions have a clean, intuitive interface. A few thoughts:

  • It would be helpful to more thoroughly explain dbkey in the documentation for gethydro.

    I included some additional information about dbkey in the gethydro docs.

    SFWMD/dbhydroR@8b67b62

  • Is it / could it be possible to pass dbkey into getwq, as it is for gethydro? One dbkey might be easier than site+test for some programming applications, and it looks like water quality data do have dbkeys, e.g.:

> getdbkey(stationid = "C111%", stat = 'MEAN', category = "WQ")
Search results for 'C111% WQ'
  Dbkey    Group Data Type Freq Recorder  Start Date    End Date
1 38097 C111WETL      H2OT   DA     ???? 04-OCT-2003 10-OCT-2011
2 38104 C111WETL     SALIN   DA     ???? 04-OCT-2003 10-OCT-2011
3 38100 C111WETL     SCOND   DA     ???? 04-OCT-2003 10-OCT-2011

I realize that this is a bit confusing. The getwq function pulls data from a separate set of DBHYDRO tables than gethydro does. Only the gethydro tables contain dbkeys. Then, confusingly, the gethydro tables have a category called "WQ". The getwq tables do not support selection by dbkey. The getwq water quality data is generated from laboratory tests while the gethydro "WQ" data is generated from things like automated sondes.
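For readers skimming this thread, the two access patterns can be sketched side by side (argument names and values are taken from examples elsewhere in this review; treat the exact signatures as assumptions):

```r
library(dbhydroR)

# gethydro tables: time series keyed by dbkey (including sonde-based "WQ")
gethydro(dbkey = "15081", date_min = "2013-01-01", date_max = "2013-02-02")

# getwq tables: laboratory results keyed by station + test name; no dbkeys
getwq("FLAB01", "2014-09-14", "2014-09-18", "NITRATE+NITRITE-N")
```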

  • Two things occur in getwq only when raw == FALSE, which I expected to be unconditional: (1) the message "No data found" doesn't occur even if there are no data, and (2) MDL handling doesn't occur even if I specify mdl_handling. My favorite solution would be to make these things happen regardless of the value of raw, but another option would be to document them prominently in the ?getwq help file.

> getwq("FLAB01", "2014-09-14", "2014-09-18", "NITRATE+NITRITE-N", raw = TRUE)$Value
[1] -0.01
> getwq("FLAB01", "2014-09-14", "2014-09-18", "NITRATE+NITRITE-N", raw = TRUE, mdl_handling = 'half')$Value
[1] -0.01

I refactored getwq and cleanwq so that MDL handling occurs regardless of how raw is set. Also, the "No data found" message should now appear regardless of how raw is set.

SFWMD/dbhydroR@8a62918
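If I understand the refactor, the reviewer's own call should now respect mdl_handling even with raw = TRUE. A sketch (the exact substitution rule for values at the method detection limit is an assumption):

```r
# After the refactor, MDL handling is applied regardless of `raw`, so a
# value flagged at the method detection limit should come back as, e.g.,
# half the MDL rather than the raw negative sentinel:
getwq("FLAB01", "2014-09-14", "2014-09-18", "NITRATE+NITRITE-N",
      raw = TRUE, mdl_handling = "half")$Value
```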

getdbkey

Though not advertised in the README, the getdbkey function looks like a useful way to identify datasets to download. With that in mind, I have a few suggestions that are mostly illustrated by this example:

getdbkey(stationid = "C111%", stat = 'MEAN', category = "WQ", detail.level="full")
Search results for 'C111% WQ'
Get Data Dbkey \nStation\n Group Data Type Freq Stat Recorder Agency
1 38097 C111WETLND C111WETL H2OT DA MEAN ???? USGS
2 38104 C111WETLND C111WETL SALIN DA MEAN ???? USGS
3 38100 C111WETLND C111WETL SCOND DA MEAN ???? USGS
Start Date End Date Strata County Op Num Latitude Longitude X Coord
1 04-OCT-2003 10-OCT-2011 0 DAD Â 251740 803112 814755.394
2 04-OCT-2003 10-OCT-2011 0 DAD Â 251740 803112 814755.394
3 04-OCT-2003 10-OCT-2011 0 DAD Â 251740 803112 814755.394
Y Coord Basin Struct Section Township Range
1 349549.418 C111_CO Â 15 59 38
2 349549.418 C111_CO Â 15 59 38
3 349549.418 C111_CO Â 15 59 38

I added some advertisement for getdbkey to the README.

  • Calls with detail.level='full' return a column named "\nStation\n". I'd prefer "Station" if there's no reason to stick with the extra newlines.

  • The values of Latitude and Longitude are unconventionally formatted. Would you consider converting them to decimal degrees?

    I removed the extra newlines from "Station" and formatted Latitude and Longitude as decimal degrees.

    SFWMD/dbhydroR@3df0ad1

  • Are the values in the "Struct" column being parsed correctly? I don't know what to make of Â. It seems to appear in other columns, too, in other queries.

    I am not getting  printed in any of the columns. Maybe I fixed it as a side-effect of another commit? The following line prints a reasonable (non-blank) value of Struct for me:

    getdbkey(category = "SW", stationid = "ALLIGAT", param = "STG", detail.level = "full")

  • Does anything useful ever appear in the "Get Data" column?

    Good catch. No it doesn't. This is an artifact from the web interface. I removed this column.

    SFWMD/dbhydroR@b68cc5e

  • Consider using stringsAsFactors=FALSE so that Station, Dbkey, etc. come back as strings.

    Good suggestion. I made this change.

    SFWMD/dbhydroR@3df0ad1

  • When I try to get a more complete list of sites/tests to choose from, it seems that I'm limited to 100 dbkeys. This limitation should probably be more clearly documented and/or circumvented. http://my.sfwmd.gov/dbhydroplsql/show_dbkey_info.show_station_info?v_station=I%25

we'd expect >100 dbkeys in this query:

> nrow(getdbkey(category = "GW", detail.level = "full"))
Search results for 'NA GW'
[1] 100

because we easily get >100 sites combined in these two queries:

> dba <- getdbkey(stationid = 'A%', category = "GW", detail.level = "full")
Search results for 'A% GW'
> dbi <- getdbkey(stationid = 'I%', category = "GW", detail.level = "full")
Search results for 'I% GW'
> nrow(dba) + nrow(dbi)
[1] 134

I circumvented this limitation.

SFWMD/dbhydroR@b68cc5e

cleanwq + cleanhydro

> h <- gethydro(dbkey = "IY639", date_min = "2009-01-30", date_max = "2009-01-31")
Warning messages:
1: In cleanhydro(res) : Column headings missing. Guessing...
2: In cleanhydro(res) : Instantaneous data detected
> head(h$date)
[1] "2009-01-30 00:40:00 MST" "2009-01-30 00:55:00 MST" "2009-01-30 01:10:00 MST"
[4] "2009-01-30 01:25:00 MST" "2009-01-30 01:40:00 MST" "2009-01-30 01:55:00 MST"

I forced hydro and wq date/times to return in EST.

SFWMD/dbhydroR@4d1dd15
SFWMD/dbhydroR@239c038
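The timezone pinning can be sketched in base R, assuming the server strings look like the ones above. "EST" is a fixed UTC-5 zone with no daylight-saving transitions, so parsed instants no longer depend on the user's locale:

```r
# Parse a DBHYDRO timestamp in a fixed zone instead of the session default:
as.POSIXct("2009-01-30 00:40:00", tz = "EST", format = "%Y-%m-%d %H:%M:%S")
```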

  • In cleanhydro, why is "Instantaneous data detected" a warning rather than a message? Is it possible for instantaneous data to make trouble?

    This has been changed to a message rather than a warning. The raw server response for instantaneous data returns no column headers. These are assumed, hence the warning/message. The column assignments seem pretty reliable. Maybe I can drop these messages.

    SFWMD/dbhydroR@2d9902d

  • At first glance, I agreed with @fawda123 that it would make sense not to export the cleanwq and cleanhydro functions because they are wrapped by getwd and gethydro. After more thought, though, I decided that I might often want to retain the Station.ID column and the long format (e.g., for dplyr-style analyses of multiple sites or tests) but might also want to specify the mdl_handling and/or get back parsed timestamps. I see no one-line option for these use cases, and it makes me wonder if the cleaning functionality could be split into things you'd almost always want (parsed timestamps, connecting metadata to headers, maybe move the Value column closer to the left) and things you might want to specify in various ways (MDL handling, selecting specific columns, switching between wide and long, renaming the value column). The former functionality could be the default in getwd and gethydro while the latter could live in (a) exported functions that you'd call separately or (b) new arguments to getwd and gethydro.

Maybe I have captured the spirit of this comment by standardizing the behavior of the clean functions and forcing mdl_handling to occur regardless of the value of raw? Also, timestamps are parsed to the date column regardless of the value of raw.

  • If you decide that cleanwq and cleanhydro should continue to be exported functions, they should be used in examples in the README and vignette.

    I added examples to the README and vignette.

    SFWMD/dbhydroR@cf5ebd7

Signing off

This package is already a nice contribution - thanks for developing it. Let me know if you'd like to clarify/discuss any of the above points. Best, Alison (@aappling-usgs)

Member

sckott commented Aug 30, 2016

@jsta Thanks for responding to reviews and making changes.

@fawda123 @aappling-usgs There's a few questions for you, see his responses. Also, any more thoughts, or are you happy?

fawda123 commented Sep 1, 2016

@jsta Looks like you've done a good job addressing my comments, nice work. I'm responding below to your specific questions and some follow-up points after having some time to stew on the review.

About the wild-cards, I guess I was wondering if you could have a few more examples in the vignette of how these can be used to get info on several stations in the case where a user has some knowledge about the station naming convention but specific names are unknown. For example, all stations that start with 'A' or 'B', etc. Are there other characters besides '%' that can help with retrieval (e.g., regex matching)? It's not all that important, the additional links in the vignette are very helpful.

The link for the test_name argument in getwq documentation is really helpful. Maybe also add the station-search utility link under the station_id argument description.

Minor typo, 'hydrolgic' in the README description.

I'll try to clarify my comment on the QA field blanks, which might be related to my misunderstanding of the data. For an instance of using getwq to retrieve multiple stations, would there not be separate fields for each station? Removing rows for fields that apply to one station would remove valid data for a station that wasn't flagged if the data were in wide format, yes? Just wanted to clarify that the cleaning function isn't removing valid data.

I also ask you to reconsider changing the name of getwq based on my previous comment. As a case in point, @aappling-usgs has used getwd by accident a few times.

aappling-usgs commented Sep 1, 2016

Hah - true, I did end up typing getwd a bunch of times in my comments accidentally, didn't I? It's hard to type getwq when the habit is in your fingers.

@jsta, a quick glance suggests you've made some great improvements. I've been out of town and will need a few more days to catch up and reply properly.

Contributor

jsta commented Sep 1, 2016

@fawda123

  • I believe that % is the only wildcard operation that is currently supported by the DBHYDRO database backend (at least it is the only one that is documented).
  • Great suggestion on adding a link to the station-search utility!
  • Typo fixed.
  • No valid data is being removed. If mdl_handling is specified, flagged data is removed prior to conversion to wide format.
  • You make a good argument for changing the getwq name. I just needed to learn the best way to deprecate/alias a function name so as not to break backwards compatibility. My plan is to follow the accepted answer in this stackoverflow thread. What do you think about get_wq()?
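A minimal sketch of that deprecate-and-alias pattern (hypothetical code, assuming the new function is named get_wq()):

```r
# Keep the old name as a thin wrapper that warns once per session and
# forwards all arguments to the new underscored name:
getwq <- function(...) {
  .Deprecated("get_wq", package = "dbhydroR")
  get_wq(...)
}
```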
Contributor

jsta commented Sep 2, 2016

I went ahead and refactored to underscored function names. Maybe that will break the getwd muscle memory?

fawda123 commented Sep 2, 2016

@jsta The underscore works better, good idea.

@sckott I'm happy with the changes from my end, this package is good to go pending comments from @aappling-usgs

Member

sckott commented Sep 2, 2016

thanks @fawda123 - we'll wait for a thumbsup from @aappling-usgs

aappling-usgs commented Sep 3, 2016

@jsta - yeah, tons of great changes, and I'm satisfied as-is. I agree that there are several ways to address the division between getting and cleaning, and I like what you've chosen. I also like the new underscores.

I'm not particularly worried about it, but the  symbols in the Num and sometimes also Struct columns are still appearing for me, even after updating to the most recent version. Maybe it's just how my system is set up, and it does seem to vary by query - the query you showed gives 'SPIL' in the Struct column but still gives  in the Num column, while a different query gives Âs in both.

> get_dbkey(category = "SW", stationid = "ALLIGAT", param = "STG", detail.level = "full")
Search results for 'ALLIGAT SW'
  Get Data Dbkey Station   Group Data Type Freq Stat Recorder Agency  Start Date    End Date Strata County Op Num
1          00875 ALLIGAT ALLIGAT       STG   DA MEAN     ????   USGS 05-OCT-1979 06-DEC-1979      0    CHA      Â
  Latitude Longitude    X Coord    Y Coord    Basin Struct Section Township Range
1 26.34222  -81.8303 338357.138 932249.714 CHARLOTT   SPIL      23       41    23
> get_dbkey(stationid = "C111%", stat = 'MEAN', category = "WQ", detail.level="full")
Search results for 'C111% WQ'
  Get Data Dbkey    Station    Group Data Type Freq Stat Recorder Agency  Start Date    End Date Strata County Op Num
1          38097 C111WETLND C111WETL      H2OT   DA MEAN     ????   USGS 04-OCT-2003 10-OCT-2011      0    DAD      Â
2          38104 C111WETLND C111WETL     SALIN   DA MEAN     ????   USGS 04-OCT-2003 10-OCT-2011      0    DAD      Â
3          38100 C111WETLND C111WETL     SCOND   DA MEAN     ????   USGS 04-OCT-2003 10-OCT-2011      0    DAD      Â
  Latitude Longitude    X Coord    Y Coord   Basin Struct Section Township Range
1    25.74   -80.112 814755.394 349549.418 C111_CO      Â      15       59    38
2    25.74   -80.112 814755.394 349549.418 C111_CO      Â      15       59    38
3    25.74   -80.112 814755.394 349549.418 C111_CO      Â      15       59    38
Contributor

jsta commented Sep 3, 2016

@aappling-usgs My guess is that it is some sort of encoding issue. I've just pushed a commit that specifies the encoding of httr::content(). Does this fix the issue?

aappling-usgs commented Sep 3, 2016

Hmm. nope.

jsta (Contributor) commented Sep 5, 2016

Ok, I think I figured it out. Those two columns were being read as native encoding despite the encoding being set to UTF-8 in the initial pull. My latest commit (SFWMD/dbhydroR@2125b25) should fix it.

For future reference, I used stringi::stri_enc_mark() to check encoding. This r-help thread describes the issue but it seems that my "fix" is not documented anywhere.
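The "marked vs native" distinction can be seen without any network access; a minimal base-R illustration of the issue described above (stringi::stri_enc_mark() reports the same marks that Encoding() does):

```r
# Strings built from \u escapes are marked "UTF-8" by the parser.
x <- "caf\u00e9"
enc_x <- Encoding(x)              # "UTF-8"

# Round-tripping through raw bytes keeps the bytes but drops the mark,
# so R treats the result as native encoding.
y <- rawToChar(charToRaw(x))
enc_y <- Encoding(y)              # "unknown" (native)

# Re-declaring the mark, as in the fix described above:
Encoding(y) <- "UTF-8"
enc_y2 <- Encoding(y)             # "UTF-8"
```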


aappling-usgs commented Sep 6, 2016

That's it!

> library(dbhydroR)
> get_dbkey(category = "SW", stationid = "ALLIGAT", param = "STG", detail.level = "full")
Search results for 'ALLIGAT SW'
  Dbkey Station   Group Data Type Freq Stat Recorder Agency  Start Date    End Date Strata County Op Num Latitude
1 00875 ALLIGAT ALLIGAT       STG   DA MEAN     ????   USGS 05-OCT-1979 06-DEC-1979      0    CHA        26.34222
  Longitude    X Coord    Y Coord    Basin Struct Section Township Range
1  -81.8303 338357.138 932249.714 CHARLOTT   SPIL      23       41    23
> get_dbkey(stationid = "C111%", stat = 'MEAN', category = "WQ", detail.level="full")
Search results for 'C111% WQ'
  Dbkey    Station    Group Data Type Freq Stat Recorder Agency  Start Date    End Date Strata County Op Num Latitude
1 38097 C111WETLND C111WETL      H2OT   DA MEAN     ????   USGS 04-OCT-2003 10-OCT-2011      0    DAD           25.74
2 38104 C111WETLND C111WETL     SALIN   DA MEAN     ????   USGS 04-OCT-2003 10-OCT-2011      0    DAD           25.74
3 38100 C111WETLND C111WETL     SCOND   DA MEAN     ????   USGS 04-OCT-2003 10-OCT-2011      0    DAD           25.74
  Longitude    X Coord    Y Coord   Basin Struct Section Township Range
1   -80.112 814755.394 349549.418 C111_CO             15       59    38
2   -80.112 814755.394 349549.418 C111_CO             15       59    38
3   -80.112 814755.394 349549.418 C111_CO             15       59    38

sckott (Member) commented Sep 6, 2016

thanks @aappling-usgs and @fawda123 for your reviews!

@jsta Just looking over it before approving, a few questions:

In e.g.,

get_wq(station_id = "FLAB08", date_min = "2011-03-01",
    date_max = "2012-05-01", test_name = "CHLOROPHYLLA-SALINE")
#>          date FLAB08_CHLOROPHYLLA-SALINE_ug/L
#> 1  2011-03-21                           0.196
#> 2  2011-04-11                           0.276
#> 3  2011-05-09                           0.328
#> 4  2011-06-07                           0.329
#> 5  2011-07-05                           0.402
#> 6  2011-08-08                           0.907
#> 7  2011-09-06                           0.887
#> 8  2011-10-06                           0.573
#> 9  2011-12-08                           0.369
#> 10 2012-02-02                           0.260
#> 11 2012-04-05                           0.509

Is the 2nd column name a combination of the station name and the variable? If so, I'd prefer that this be changed to more of a tidy format, with station as its own column and variable as its own column. At least having a function to get it in that format easily would be nice.


jsta (Contributor) commented Sep 6, 2016

Thanks for your question @sckott.

There is already a method for returning station name and variable in separate columns.

get_wq(station_id = "FLAB08", date_min = "2011-03-01",
    date_max = "2012-05-01", test_name = "CHLOROPHYLLA-SALINE",
    raw = TRUE)[,c("date", "Station.ID", "Test.Name", "Units", "Value")]

         date Station.ID           Test.Name Units Value
1  2011-03-21     FLAB08 CHLOROPHYLLA-SALINE  ug/L 0.196
3  2011-04-11     FLAB08 CHLOROPHYLLA-SALINE  ug/L 0.276
5  2011-05-09     FLAB08 CHLOROPHYLLA-SALINE  ug/L 0.328
6  2011-06-07     FLAB08 CHLOROPHYLLA-SALINE  ug/L 0.329
7  2011-07-05     FLAB08 CHLOROPHYLLA-SALINE  ug/L 0.402
9  2011-08-08     FLAB08 CHLOROPHYLLA-SALINE  ug/L 0.907
10 2011-09-06     FLAB08 CHLOROPHYLLA-SALINE  ug/L 0.887
11 2011-10-06     FLAB08 CHLOROPHYLLA-SALINE  ug/L 0.573
12 2011-12-08     FLAB08 CHLOROPHYLLA-SALINE  ug/L 0.369
13 2012-02-02     FLAB08 CHLOROPHYLLA-SALINE  ug/L 0.260
14 2012-04-05     FLAB08 CHLOROPHYLLA-SALINE  ug/L 0.509

The reason raw is an argument rather than the default is that my initial feedback on the package came from users who appreciated receiving output in non-tidy format for simple plotting and Excel import.

Also, it was valuable for users to be able to do things like:

plot(get_wq(station_id = "FLAB08", date_min = "2011-03-01",
    date_max = "2012-05-01", test_name = "CHLOROPHYLLA-SALINE"))

Do you think I should do away with the raw argument and make a separate function?
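For reference, the conversion from the wide layout to the tidy one can be sketched in base R by splitting the combined column name on underscores (a hypothetical sketch, not code from the package):

```r
# Wide layout as returned by get_wq() with the default raw = FALSE
# (column name is station_test_units joined by underscores).
wide <- data.frame(
  date = c("2011-03-21", "2011-04-11"),
  `FLAB08_CHLOROPHYLLA-SALINE_ug/L` = c(0.196, 0.276),
  check.names = FALSE
)

# Split the combined name back into its three pieces.
parts <- strsplit(names(wide)[2], "_")[[1]]   # station, test name, units

tidy <- data.frame(
  date    = wide$date,
  station = parts[1],
  test    = parts[2],
  units   = parts[3],
  value   = wide[[2]],
  stringsAsFactors = FALSE
)
```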


sckott (Member) commented Sep 6, 2016

@jsta Thanks. I think it's okay to leave as is; the raw parameter takes care of it.


jsta (Contributor) commented Sep 28, 2016

@sckott Please let me know if there is anything else I can do for this submission. I think both reviewers have given their 👍


sckott (Member) commented Sep 28, 2016

thanks again for the reviews @aappling-usgs and @fawda123 🚀 ✏️ 📋

@jsta - There's a few things I'd like changed before approving:

  • Please use a README.Rmd and build your README.md from that, instead of having to manually create README.md, which I assume you're doing (correct me if I'm wrong)
  • I noticed one thing to deal with after running goodpractice::gp() at some point (no need to do this prior to transferring, though): avoid long code and documentation lines. Stick to 80 characters or less for each line. Your code looks fine, but the docs should too.
  • I suggest moving your code to do HTTP requests into helper functions that are general enough that you can use them in your various functions that need to do HTTP requests, e.g.,
dbh_GET <- function(url, ...) {
    res <- httr::GET(url, ...) # any params can be passed to GET() when calling dbh_GET()
    httr::stop_for_status(res) # stop with an error if the HTTP status is not ok
    httr::content(res, "text", encoding = "UTF-8") # parse the response body to text
}

Then use dbh_GET() in your functions.


Then after that, the steps to move to ropensci are:

  • Add the rOpenSci footer to the README
  • Transfer the repo to ropenscilabs - I've created a team in that org. account, and you should have received an email about it. After you accept the invitation, go to Settings->Transfer Ownership
  • After transferring to ropenscilabs, Travis builds should work automatically under the new GitHub account, but you do need to update the link in the badge in your README
  • Update all links to github repo to replace jsta with ropenscilabs
jsta (Contributor) commented Sep 29, 2016

Ok, I think I've addressed all the items you've listed under pre-approval. Should I move forward with transferring? @sckott


sckott (Member) commented Sep 29, 2016

thanks for the change. Yes, do move forward


sckott closed this Oct 10, 2016
