Enable paging for betydb functions #94
update fork
…in order to prevent 504 timeout errors
@infotroph please review
Merge branch 'master' of https://github.com/dlebauer/traits Conflicts: tests/testthat/test-betydb.R
Codecov Report
@@            Coverage Diff            @@
##           master     #94     +/-   ##
=========================================
+ Coverage   15.44%  24.08%  +8.63%
=========================================
  Files          17      17
  Lines         602     681     +79
=========================================
+ Hits           93     164     +71
- Misses        509     517      +8
@sckott I would appreciate any advice on writing better tests ... I expected that the tests I wrote would cover most of the code that I wrote, and was pretty surprised to see these results https://codecov.io/gh/ropensci/traits/pull/94/src/R/betydb.R that seem to indicate most of the new lines in betydb.R are not covered.
you don't have any requests with limit > 200, correct? seems like that block of lines 179-224 is only invoked if limit > 200
aha. Thanks ... I was trying to minimize the testing time; will revise accordingly
def. do minimize testing time, just having one test that hits that bit of the code is good i think
updated tests to use smaller per-call limit
change warning to message tests: fixed errors testing API responses
Apologies for incomplete comments -- I'll need to come back to this, but submitting now to save my comments
R/betydb.R
Outdated
lst <- jsonlite::fromJSON(txt, simplifyVector = TRUE, flatten = TRUE)

api_version <- ifelse(grepl('/beta/api', url), 'beta', 'v0')
if(!exists('per_call_limit')) {
Is the intention to test whether `...` contains a parameter named `per_call_limit`? I don't think this will do that -- parameters in `...` aren't assigned into the function environment.
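To illustrate the point about `...`, here is a minimal base R sketch. The helper names `get_per_call_limit()` and `dots_visible()` are hypothetical and not part of the traits package; the default of 5000 is the chunk size mentioned elsewhere in this PR.

```r
# Hypothetical helper (not from the traits package): look for a named
# argument inside `...` instead of relying on exists().
get_per_call_limit <- function(..., default = 5000) {
  dots <- list(...)
  if ("per_call_limit" %in% names(dots)) {
    dots[["per_call_limit"]]
  } else {
    default
  }
}

# Arguments passed through `...` are not bound as variables in the
# function's own frame, so a local exists() check does not find them.
dots_visible <- function(...) {
  exists("per_call_limit", inherits = FALSE)
}

get_per_call_limit()                     # falls back to the default
get_per_call_limit(per_call_limit = 20)  # picks the value out of `...`
dots_visible(per_call_limit = 20)        # the name is not a local variable
```

Note that `exists()` with its default `inherits = TRUE` also searches enclosing environments, so it can still find a global variable of the same name.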
actually, the whole reason I added `per_call_limit` as a parameter rather than a fixed number is so that it can be set to a smaller number in the testing. So in the testing I make `per_call_limit <<- 20` a global variable ...
5000 is a reasonable limit when the package is being used and 20 is sufficient for testing.
It is kind of hacky, but it is the difference b/w tests that take seconds vs minutes (e.g. so that I can test with iterations of [0,1,n] and remainders [0,n]).
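To make the iteration and remainder cases concrete, here is a minimal sketch of how a large request could be split into `limit`/`offset` pairs. The function `page_plan()` is a hypothetical illustration, not the code in `R/betydb.R`; only the use of `limit`, `offset`, and a per-call limit comes from this PR.

```r
# Hypothetical helper: plan the (offset, limit) pairs for a paged request.
# `limit` is the total number of rows requested; `per_call_limit` is the
# maximum number of rows fetched in a single API call.
page_plan <- function(limit, per_call_limit = 5000) {
  stopifnot(is.numeric(limit), limit >= 0, per_call_limit > 0)
  n_full    <- limit %/% per_call_limit  # number of full-size pages
  remainder <- limit %% per_call_limit   # rows left for a final short page
  limits  <- c(rep(per_call_limit, n_full), if (remainder > 0) remainder)
  offsets <- (seq_along(limits) - 1) * per_call_limit
  data.frame(offset = offsets, limit = limits)
}

page_plan(45, per_call_limit = 20)  # two full pages plus a remainder
page_plan(40, per_call_limit = 20)  # two full pages, no remainder
page_plan(5,  per_call_limit = 20)  # zero full pages, remainder only
```

With a small per-call limit such as 20, every combination of iterations and remainders can be exercised with only a handful of rows, which is the motivation given above.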
if(args$limit <= per_call_limit){
  txt <- betydb_http(url, args, key, user, pwd, ...)
  lst <- jsonlite::fromJSON(txt, simplifyVector = TRUE, flatten = TRUE)
} else if (args$limit > per_call_limit){ # divide large requests (aka page)
This could be an unconditional else.
This approach makes the case under which we begin paging more clear and explicit. The final else on line 264 below captures cases where `!(args$limit <= per_call_limit | args$limit > per_call_limit)` (e.g. if `args$limit` can't be coerced to a numeric value).
tests/testthat/test-betydb.R
Outdated
expect_is(get.out, "response")
expect_true(grepl("OK", get.out[["headers"]]$status))
Or `expect_match(httr::headers(get.out)$status, "OK")`?
fixed
tests/testthat/test-betydb.R
Outdated
betydb_api_version = "beta",
betydb_key = "eI6TMmBl3IAb7v4ToWYzR0nZYY07shLiCikvT6Lv",
warn=-1 ## suppress warnings that we did not get all data
)
Beware that option changes are NOT local to this test block -- they'll stay in effect for other tests until the end of the file. See `test_that("URL & version options work"` above for one approach to cleanup.
replaced w/ on.exit(reset_opts(opts))
tests/testthat/test-betydb.R
Outdated
betydb_url = "https://www.betydb.org/",
betydb_api_version = "v0",
warnings = 0)
Ah, OK. This works, but `on.exit(reset_opts(opts))` is still arguably safer: it both resets all previous options and unsets any newly set options that didn't exist before this test.
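As a rough illustration of why restoring from saved options is safer, here is a base R sketch of the same idea. The wrapper `with_temp_options()` and the option name `hypothetical_new_opt` are assumptions made for this example; this is not the `reset_opts()` helper used in the test file, whose definition is not shown in this thread.

```r
# Hypothetical wrapper: set options temporarily and restore them on exit.
# options() returns the previous values, with NULL for options that were
# unset, so passing that list back both resets and unsets as needed.
with_temp_options <- function(new_opts, code) {
  old <- options(new_opts)
  on.exit(options(old), add = TRUE)
  force(code)  # `code` is evaluated lazily, after the options are set
}

options(betydb_api_version = "v0")
inside <- with_temp_options(
  list(betydb_api_version = "beta", hypothetical_new_opt = TRUE),
  getOption("betydb_api_version")
)

inside                             # value seen while the options were set
getOption("betydb_api_version")    # restored to its previous value
getOption("hypothetical_new_opt")  # unset again after exit
```

Because the restore runs from `on.exit()`, it also happens when the wrapped code throws an error.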
recent changes to allow paging
better opts handling change from `expect_true(grepl ...` to `expect_match`
Include 'betydb_experiment()', exactly the same as 'betydb_site()', but replaces 'sites' with 'experiments' endpoint
Include method for querying experiment by id
Paging fix
@sckott with the latest updates this branch is now passing. Do you have a schedule for pushing new versions to CRAN? Would be nice to be able to use a stable release in the training materials I am writing.
we can target for next milestone https://github.com/ropensci/traits/milestone/4 some of those issues i may move to next milestone, so could be pretty soon.
merged, sorry about the long wait
added (Chris Black) (@infotroph) to author list based on contributions to the betydb interface (especially ropensci#94, ropensci#88, ropensci#82)
This PR enables paging for betydb_* functions in order to avoid server timeout.
Description
The BETYdb API is fairly slow (PecanProject/bety#419), and with very large requests (> 500k records) functions can fail due to server timeouts. Although this is not an issue for the default database (betydb.org), it is becoming an issue for other instances of BETYdb.
These changes use a for loop to divide large requests into smaller requests (currently in chunks of 5000 rows) using the `limit` and `offset` parameters to iterate through.
Related Issue
Example
I've added new tests just to make sure that we are getting the right number of requested records! More examples can be seen in the tests file.
Here is an example I've used to test the effect of limit on timings: