-
-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dbhydroR package #61
Comments
thanks for your submission @jsta - looks like you didn't check Do you intend for this package to go on CRAN?. Can you check it or explain why not |
@sckott it's checked now. |
Editor checks:
Reviewers: @aappling-usgs @fawda123 |
After working with the
Should I enable them all on Travis or only certain types? |
@jsta sorry about misleading! I don't remember exactly why but I usually only skip on travis temporarily if a web service is down temporarily .
Just skip on CRAN, and enable in all others |
Ok, tests are enabled on Travis now. |
@fawda123 is interested in reviewing this submission. |
@jsta I'm revising my advice above - Skip on CRAN those tests that make web requests. If there are some tests that do not, then don't skip those. It's good to have at least some tests that run on CRAN if possible, but not required |
thanks @fawda123 for offering, assigned above #61 (comment) - Here's instructions for review https://github.com/ropensci/onboarding/blob/master/reviewing_guide.md and examples of good reviews: #22 (comment) and #20 (comment) |
@jsta @sckott here's my review. See the comments below for items that aren't checked.
DocumentationThe package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 3 Review CommentsThis package provides access to water quality and hydrological data from the dbhydro database provided by the South Florida Water Management District. Although the package has a regional focus, the database could be used to address more general ecological or environmental management questions. Having a package to more easily acquire the database will no doubt facilitate use and reach of the data. The package is lightweight and is built around two main functions. The documentation and examples could be improved in some cases (see below), but overall I do not have any major concerns about package structure or functionality. However, access to metadata for the database is a serious limitation that needs to be addressed. Sure I could go online to find what I need but it seems like the package should facilitate that process in some way. The package does a good job retrieving data if the station id and test name are known but it would be much more helpful if there was some way to query the available information. Here are some options to consider:
Build/install
Examples
> cleanhydro(gethydro(dbkey = "15081", date_min = "2013-01-01", date_max = "2013-02-02"))
Error: is.response(x) is not TRUE I ran the check to include examples in Documentation/vignette
Functions
Contributing
Compliance with rOpenSci Packaging GuideMy comments above address most of the compliance concerns for rOpenSci. Below are some minor, additional points. Function variable namingI guess the CRAN gatekeepers don't care about this but I was having a mental block with READMEThe ropensci footer will have to be added eventually: Code of conductAlso consider adding the code of conduct badge with Hopefully these comments are helpful. Let me know if you have any questions. |
@fawda123 thanks very much for your review! |
Package ReviewJoseph (@jsta) and Scott (@sckott), here's my review. Marcus (@fawda123), thanks for a thoughtful review to build on.
DocumentationThe package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 5 Review CommentsThis package provides scriptable access to an extensive Florida water quality/quantity database, DBHYDRO. It's also noticeably faster than the browser-based interface. I agree with @fawda123 that the largest room for improvement is in the documentation. I have also made suggestions below for increasing the flexibility of the data-cleaning functions to serve several common use cases, and I've pointed out a few edge cases where the package's functionality might surprise a user. DocumentationAs @fawda123 noted, it can be hard to identify the station IDs and tests you want if you don't know what the options are. There are a few ways you could helper users:
Questions that arose on my first encounter with the package/dataset:
Other documentation thoughts
Build/install
Examples/tests
> devtools::run_examples(run=FALSE)
...
> cleanhydro(gethydro(dbkey = "15081", date_min = "2013-01-01", date_max = "2013-02-02"))
Error: is.response(x) is not TRUE
Possibly unnecessary files & code
Functionsgetwq + gethydroThese functions have a clean, intuitive interface. A few thoughts:
> getdbkey(stationid = "C111%", stat = 'MEAN', category = "WQ")
Search results for 'C111% WQ'
Dbkey Group Data Type Freq Recorder Start Date End Date
1 38097 C111WETL H2OT DA ???? 04-OCT-2003 10-OCT-2011
2 38104 C111WETL SALIN DA ???? 04-OCT-2003 10-OCT-2011
3 38100 C111WETL SCOND DA ???? 04-OCT-2003 10-OCT-2011
> getwq("FLAB01", "2014-09-14", "2014-09-18", "NITRATE+NITRITE-N", raw = TRUE)$Value
[1] -0.01
> getwq("FLAB01", "2014-09-14", "2014-09-18", "NITRATE+NITRITE-N", raw = TRUE, mdl_handling='half')$Value
[1] -0.01 getdbkeyThough not advertised in the README, the > getdbkey(stationid = "C111%", stat = 'MEAN', category = "WQ", detail.level="full")
Search results for 'C111% WQ'
Get Data Dbkey \nStation\n Group Data Type Freq Stat Recorder Agency
1 38097 C111WETLND C111WETL H2OT DA MEAN ???? USGS
2 38104 C111WETLND C111WETL SALIN DA MEAN ???? USGS
3 38100 C111WETLND C111WETL SCOND DA MEAN ???? USGS
Start Date End Date Strata County Op Num Latitude Longitude X Coord
1 04-OCT-2003 10-OCT-2011 0 DAD Â 251740 803112 814755.394
2 04-OCT-2003 10-OCT-2011 0 DAD Â 251740 803112 814755.394
3 04-OCT-2003 10-OCT-2011 0 DAD Â 251740 803112 814755.394
Y Coord Basin Struct Section Township Range
1 349549.418 C111_CO Â 15 59 38
2 349549.418 C111_CO Â 15 59 38
3 349549.418 C111_CO Â 15 59 38
# we'd expect >100 dbkeys in this query
> nrow(getdbkey(category = "GW", detail.level = "full"))
Search results for 'NA GW'
[1] 100
# because we easily get >100 sites combined in these two queries:
> dba <- getdbkey(stationid = 'A%', category = "GW", detail.level = "full")
Search results for 'A% GW'
> dbi <- getdbkey(stationid = 'I%', category = "GW", detail.level = "full")
Search results for 'I% GW'
> nrow(dba) + nrow(dbi)
[1] 134 cleanwq + cleanhydro
> h <- gethydro(dbkey = "IY639", date_min = "2009-01-30", date_max = "2009-01-31")
Warning messages:
1: In cleanhydro(res) : Column headings missing. Guessing...
2: In cleanhydro(res) : Instantaneous data detected
> head(h$date)
[1] "2009-01-30 00:40:00 MST" "2009-01-30 00:55:00 MST" "2009-01-30 01:10:00 MST"
[4] "2009-01-30 01:25:00 MST" "2009-01-30 01:40:00 MST" "2009-01-30 01:55:00 MST"
Signing offThis package is already a nice contribution - thanks for developing it. Let me know if you'd like to clarify/discuss any of the above points. Best, Alison (@aappling-usgs) |
@aappling-usgs thanks so much for your review! @jsta let us know if you need any help, and continue the conversation here |
Thanks for the review @fawda123! Review Comments (@fawda123)This package provides access to water quality and hydrological data from the dbhydro database provided by the South Florida Water Management District. Although the package has a regional focus, the database could be used to address more general ecological or environmental management questions. Having a package to more easily acquire the database will no doubt facilitate use and reach of the data. The package is lightweight and is built around two main functions. The documentation and examples could be improved in some cases (see below), but overall I do not have any major concerns about package structure or functionality. However, access to metadata for the database is a serious limitation that needs to be addressed. Sure I could go online to find what I need but it seems like the package should facilitate that process in some way. The package does a good job retrieving data if the station id and test name are known but it would be much more helpful if there was some way to query the available information. Here are some options to consider:
I added a link to the available test-name listing in the
Thank you for the suggestion. I am working on wrapping the station-search utility at http://my.sfwmd.gov/dbhydroplsql/water_quality_data.show_group_station_characters. For now, I tried to make map-based selection via the ArcGIS online or Google Earth interface more prominent in the vignette and README. https://github.com/SFWMD/dbhydroR/issues/3 Build/install
Examples
I ran the check to include examples in \dontrun tags, as for the above. It makes sense to not run the example if it's supposed to fail but there is no description exaplaining what the example shows. Either fix the example above or provide some explanation as to why it fails. All other examples ran without issues. The example has been fixed and now runs without error SFWMD/dbhydroR@ca676ef Documentation/vignette
Functions
Contributing
Good suggestion. I think it might be best to wait on this until we are closer to acceptance? Compliance with rOpenSci Packaging GuideMy comments above address most of the compliance concerns for rOpenSci. Below are some minor, additional points. Function variable namingI guess the CRAN gatekeepers don't care about this but I was having a mental block with getwq given it's similarity to getwd. You might consider changing the function name to make it more distinct. READMEThe ropensci footer will have to be added eventually: I think the onboarding guidelines have been updated to say that the footer is only added after a package has been accepted. Code of conductAlso consider adding the code of conduct badge with devtools::use_code_of_conduct. A COC has been added. Hopefully these comments are helpful. Let me know if you have any questions. |
Thanks for the review @aappling-usgs! Review Comments (@aappling-usgs)This package provides scriptable access to an extensive Florida water quality/quantity database, DBHYDRO. It's also noticeably faster than the browser-based interface. dbhydroR will enhance the usability of the database and the reproducibility of analyses of DBHYDRO data. I agree with @fawda123 that the largest room for improvement is in the documentation. I have also made suggestions below for increasing the flexibility of the data-cleaning functions to serve several common use cases, and I've pointed out a few edge cases where the package's functionality might surprise a user. DocumentationAs @fawda123 noted, it can be hard to identify the station IDs and tests you want if you don't know what the options are. There are a few ways you could helper users:
Questions that arose on my first encounter with the package/dataset:
Other documentation thoughts
Build/install
Examples/tests
The example has been fixed and now runs without error SFWMD/dbhydroR@ca676ef Possibly unnecessary files & code
Functionsgetwq + gethydro These functions have a clean, intuitive interface. A few thoughts:
I realize that this is a bit confusing. The
I refactored getdbkey Though not advertised in the README, the getdbkey function looks like a useful way to identify datasets to download. With that in mind, I have a few suggestions that are mostly illustrated by this example:
I added some advertisement for
I circumvented this limitation. cleanwq + cleanhydro
I forced hydro and wq date/times to return in EST. SFWMD/dbhydroR@4d1dd15
Maybe I have captured the spirit of this comment by standardizing the behavior of the clean functions, and forcing mdl_handling to occur regardless of the value of
Signing off This package is already a nice contribution - thanks for developing it. Let me know if you'd like to clarify/discuss any of the above points. Best, Alison (@aappling-usgs) |
@jsta Thanks for responding to reviews and making changes. @fawda123 @aappling-usgs There's a few questions for you, see his responses. Also, any more thoughts, or are you happy? |
@jsta Looks like you've done a good job addressing my comments, nice work. I'm responding below to your specific questions and some follow-up points after having some time to stew on the review. About the wild-cards, I guess I was wondering if you could have a few more examples in the vignette of how these can be used to get info on several stations in the case where a user has some knowledge about the station naming convention but specific names are unknown. For example, all stations that start with 'A' or 'B', etc. Are there other characters besides '%' that can help with retrieval (e.g., regex matching)? It's not all that important, the additional links in the vignette are very helpful. The link for the Minor typo, 'hydrolgic' in the README description. I'll try clarify my comment on the QA field blanks, which might be related to my misunderstanding of the data. For an instance of using I also ask you to reconsider changing the name of |
Hah - true, I did end up typing @jsta, a quick glance suggests you've made some great improvements. I've been out of town and will need a few more days to catch up and reply properly. |
|
I went ahead and refactored to underscored function names. Maybe that will break the |
@jsta The underscore works better, good idea. @sckott I'm happy with the changes from my end, this package is good to go pending comments from @aappling-usgs |
thanks @fawda123 - we'll wait for a thumbsup from @aappling-usgs |
@jsta - yeah, tons of great changes, and I'm satisfied as-is. I agree that there are several ways to address the division between getting and cleaning, and I like what you've chosen. I also like the new underscores. I'm not particularly worried about it, but the > get_dbkey(category = "SW", stationid = "ALLIGAT", param = "STG", detail.level = "full")
Search results for 'ALLIGAT SW'
Get Data Dbkey Station Group Data Type Freq Stat Recorder Agency Start Date End Date Strata County Op Num
1 00875 ALLIGAT ALLIGAT STG DA MEAN ???? USGS 05-OCT-1979 06-DEC-1979 0 CHA Â
Latitude Longitude X Coord Y Coord Basin Struct Section Township Range
1 26.34222 -81.8303 338357.138 932249.714 CHARLOTT SPIL 23 41 23
> get_dbkey(stationid = "C111%", stat = 'MEAN', category = "WQ", detail.level="full")
Search results for 'C111% WQ'
Get Data Dbkey Station Group Data Type Freq Stat Recorder Agency Start Date End Date Strata County Op Num
1 38097 C111WETLND C111WETL H2OT DA MEAN ???? USGS 04-OCT-2003 10-OCT-2011 0 DAD Â
2 38104 C111WETLND C111WETL SALIN DA MEAN ???? USGS 04-OCT-2003 10-OCT-2011 0 DAD Â
3 38100 C111WETLND C111WETL SCOND DA MEAN ???? USGS 04-OCT-2003 10-OCT-2011 0 DAD Â
Latitude Longitude X Coord Y Coord Basin Struct Section Township Range
1 25.74 -80.112 814755.394 349549.418 C111_CO Â 15 59 38
2 25.74 -80.112 814755.394 349549.418 C111_CO Â 15 59 38
3 25.74 -80.112 814755.394 349549.418 C111_CO Â 15 59 38 |
@aappling-usgs My guess is that it is some sort of encoding issue. I've just pushed a commit that specifies the encoding of |
Hmm. nope. |
Ok, I think I figured it out. Those two columns were being read as For future reference, I used |
That's it! > library(dbhydroR)
> get_dbkey(category = "SW", stationid = "ALLIGAT", param = "STG", detail.level = "full")
Search results for 'ALLIGAT SW'
Dbkey Station Group Data Type Freq Stat Recorder Agency Start Date End Date Strata County Op Num Latitude
1 00875 ALLIGAT ALLIGAT STG DA MEAN ???? USGS 05-OCT-1979 06-DEC-1979 0 CHA 26.34222
Longitude X Coord Y Coord Basin Struct Section Township Range
1 -81.8303 338357.138 932249.714 CHARLOTT SPIL 23 41 23
> get_dbkey(stationid = "C111%", stat = 'MEAN', category = "WQ", detail.level="full")
Search results for 'C111% WQ'
Dbkey Station Group Data Type Freq Stat Recorder Agency Start Date End Date Strata County Op Num Latitude
1 38097 C111WETLND C111WETL H2OT DA MEAN ???? USGS 04-OCT-2003 10-OCT-2011 0 DAD 25.74
2 38104 C111WETLND C111WETL SALIN DA MEAN ???? USGS 04-OCT-2003 10-OCT-2011 0 DAD 25.74
3 38100 C111WETLND C111WETL SCOND DA MEAN ???? USGS 04-OCT-2003 10-OCT-2011 0 DAD 25.74
Longitude X Coord Y Coord Basin Struct Section Township Range
1 -80.112 814755.394 349549.418 C111_CO 15 59 38
2 -80.112 814755.394 349549.418 C111_CO 15 59 38
3 -80.112 814755.394 349549.418 C111_CO 15 59 38 |
thanks @aappling-usgs and @fawda123 for your reviews! @jsta Just looking over it before approving, a few questions: In e.g., get_wq(station_id = "FLAB08", date_min = "2011-03-01",
date_max = "2012-05-01", test_name = "CHLOROPHYLLA-SALINE")
#> date FLAB08_CHLOROPHYLLA-SALINE_ug/L
#> 1 2011-03-21 0.196
#> 2 2011-04-11 0.276
#> 3 2011-05-09 0.328
#> 4 2011-06-07 0.329
#> 5 2011-07-05 0.402
#> 6 2011-08-08 0.907
#> 7 2011-09-06 0.887
#> 8 2011-10-06 0.573
#> 9 2011-12-08 0.369
#> 10 2012-02-02 0.260
#> 11 2012-04-05 0.509 Is the 2nd column name a combination of the station name and the variable? If so, I'd prefer that this is changed to be in more of tidy format, with station as it's own column, and variable as its own column. At least have a function to get it in that format easily would be nice. |
Thanks for your question @sckott. There is already a method for returning station name and variable in separate columns. get_wq(station_id = "FLAB08", date_min = "2011-03-01",
date_max = "2012-05-01", test_name = "CHLOROPHYLLA-SALINE",
raw = TRUE)[,c("date", "Station.ID", "Test.Name", "Units", "Value")]
date Station.ID Test.Name Units Value
1 2011-03-21 FLAB08 CHLOROPHYLLA-SALINE ug/L 0.196
3 2011-04-11 FLAB08 CHLOROPHYLLA-SALINE ug/L 0.276
5 2011-05-09 FLAB08 CHLOROPHYLLA-SALINE ug/L 0.328
6 2011-06-07 FLAB08 CHLOROPHYLLA-SALINE ug/L 0.329
7 2011-07-05 FLAB08 CHLOROPHYLLA-SALINE ug/L 0.402
9 2011-08-08 FLAB08 CHLOROPHYLLA-SALINE ug/L 0.907
10 2011-09-06 FLAB08 CHLOROPHYLLA-SALINE ug/L 0.887
11 2011-10-06 FLAB08 CHLOROPHYLLA-SALINE ug/L 0.573
12 2011-12-08 FLAB08 CHLOROPHYLLA-SALINE ug/L 0.369
13 2012-02-02 FLAB08 CHLOROPHYLLA-SALINE ug/L 0.260
14 2012-04-05 FLAB08 CHLOROPHYLLA-SALINE ug/L 0.509 The reason why Also, it was valuable for users to be able to do things like: plot(get_wq(station_id = "FLAB08", date_min = "2011-03-01",
date_max = "2012-05-01", test_name = "CHLOROPHYLLA-SALINE")) Do you think I should do away with the |
@jsta Thanks. I think it's okay to leave as is, the |
@sckott Please let me know if there is anything else I can do for this submission. I think both reviewers have given their 👍 |
thanks again for the reviews @aappling-usgs and @fawda123 🚀 ✏️ 📋 @jsta - There's a few things I'd like changed before approving:
dbh_GET <- function(url, ...) {
res <- httr::GET(url, ...) # any params can be passed to GET() when calling dbh_GET()
stop_for_status(res) # make sure to check if the http status is good or not
httr::content(res, "text", encoding = "UTF-8") # parse to text
} Then use Then after that, the steps to move to ropensci are:
|
Ok, I think I've addressed all the items you've listed under pre-approval. Should I move forward with transferring? @sckott |
thanks for the change. Yes, do move forward |
Summary
dbhydroR
provides scripted access to the South Florida Water Management District's DBHYDRO database which holds over 35 million water quality and hydrologic data records from the Florida Everglades and surrounding areas.https://github.com/sfwmd/dbhydro
Anyone interested in water quality and hydrologic data from the Florida Everglades - ecologists, engineers, meteorologists, hydrogeologists, hydrologists, etc.
For the most part, no. However, a very small proportion of DBHYDRO mirrors data that is also present on the USGS (United States Geological Survey) NWIS (National Water Information System). This data could then be accessed by the
dataRetrieval
package.Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md
with a high-level description.Detail
R CMD check
(ordevtools::check()
) succeed? Paste and describe any errors or warnings:Some components like the rOpenSci footer, README section ordering, and NEWS tags are pending this initial inquiry.
The text was updated successfully, but these errors were encountered: