New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix corner case for num_as_location() #1166
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the motivation for extending consecutively but non-monotonically? Is this needed for tibble?
src/subscript-loc.c
Outdated
R_len_t n_missing = 0; | ||
int extended[n_extend]; | ||
int i_extend = 0; | ||
int nn = n; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be benefit from an informative name, e.g. current_n
or new_n
src/subscript-loc.c
Outdated
|
||
R_len_t n_subscript = Rf_length(sorted); | ||
R_len_t n_missing = 0; | ||
int extended[n_extend]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't allocate a variable size array that is potentially very large on the stack because it could overflow. At the minimum the stack size should be checked using R_CheckStack2()
.
There should also be two paths, one with a heap alloc if n_extend
gets too large (say 100), and one with a stack alloc. I wonder if this doesn't get too complicated though, so a heap alloc is probably just as fine in this case, unless the stack alloc can be demonstrated to improve efficiency measurably (I would expect other costs to dominate, esp. for a small n_extend
).
// should not require any sorting | ||
if (elt - 1 == nn) { | ||
++nn; | ||
--n_extend; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n_extend
is no longer used at this point right? So no purpose for decreasing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the end of the loop, n_extend == i_extend
. What's a good way to check this invariant in "debug builds"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably possible via the build system by introducing DESCRIPTION as a dependency and checking whether the version has large components, and including a debug header in that case.
But I don't think we want to include debug-only code paths in vctrs right now, so the algorithm needs to be correct ;) I don't mind having this consistency check unconditionally though, it won't make much of a difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add if (failure) Rf_error("INTERNAL: ...");
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes we use this pattern for internal errors:
Rf_error("Internal error in `s3_get_class()`: Class must have length.");
src/subscript-loc.c
Outdated
++nn; | ||
--n_extend; | ||
} else { | ||
extended[i_extend++] = elt - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extended
array was not zero-initialised, and this is the only path in the loop where it is assigned a value. This means the sorted array below will contain random memory in some cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. I'm using only the first i_extend
entries of the array during the sort and the subsequent check. Added a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, you're right.
tests/testthat/test-subscript-loc.R
Outdated
expect_identical(num_as_location(c(1:5, 7, 6), 3, oob = "extend"), c(1:5, 7L, 6L)) | ||
expect_identical(num_as_location(c(1, NA, 3), 2, oob = "extend"), c(1L, NA, 3L)) | ||
expect_identical(num_as_location(c(1, NA, 4, 3), 2, oob = "extend"), c(1L, NA, 4L, 3L)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this to a separate "can extend beyond the end consecutively but non-monotonically"
test please?
I'd like to maintain data frame compatibility here: df <- data.frame(a = 1:3)
df[6:4, ] <- 6:4
df
#> a
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
#> 6 6 Created on 2020-07-01 by the reprex package (v0.3.0) |
Thanks! |
Are you planning a CRAN release any time soon? It would be useful to have this, and also |
yes this week. |
and replace by a simpler and faster implementation, with shortcut if appending in strictly increasing order without gaps.