Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade package based on dplyr 1.1.0 release #190

Merged
merged 13 commits into from
Feb 22, 2023

Conversation

damianooldoni
Copy link
Member

This PR aims to follow the changes in dplyr 1.1.0.

  1. 777e12b: do not use dplyr::na_if() for replacing 0 with NA over all the data.frame (see bullet point referring to this function in https://dplyr.tidyverse.org/news/index.html#vctrs-1-1-0 and Rewrite na_if() using vctrs tidyverse/dplyr#6329)
  2. 8f1c2bf: do not use .data in tidyselect expressions (select(), rename(), relocate()) as it generates warnings. Use "colname" syntax instead. In this way no notes about "no visible binding for global variable" arise. See https://dplyr.tidyverse.org/articles/programming.html#eliminating-r-cmd-check-notes.
  3. 8f1c2bf (same commit as 2): add multiple = "all" in a left_join() to avoid warning (we expect multiple matches of course). See https://www.tidyverse.org/blog/2023/01/dplyr-1-1-0-joins/#multiple-matches
  4. Update DESCRIPTION by explicit version 1.1.0 of dplyr dependency (064bfee), adding @PietrH as contributor to the list of authors (955162b) and bumping version number (fb1ac9b)

Copy link
Member

@PietrH PietrH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done, quick fix and nicely implemented. I have a few minor questions, notes. I have commented in line.

Code Coverage

All tests pass, but code coverage is insufficient, I'm not suggesting this is fixed in the scope of this PR. Several functions are untested. This should be addressed in a future PR.

NAMESPACE

devtools::check() results in a warning for me:

Consider adding
    importFrom("utils", "data")
  to your NAMESPACE file.

.data is imported from dplyr. Where do we use utils::data?

Style

Currently we mix tidy evaluation with standard evaluation (passing column names both as strings and as symbols). It would be more neat to stick to either or.

R/get_n_obs.R Outdated Show resolved Hide resolved
R/get_record_table.R Outdated Show resolved Hide resolved
R/get_record_table.R Outdated Show resolved Hide resolved
@@ -292,7 +293,7 @@ get_record_table <- function(package = NULL,
dplyr::filter(.data$delta.time.secs == max(.data$delta.time.secs) &
.data$row_number == max(.data$row_number)) %>%
dplyr::ungroup() %>%
dplyr::select(-.data$row_number)
dplyr::select(-"row_number")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: You mix standard and non standard evalutation (tidy evaluation), in this case dplyr::select(-row_number) would be tidy evaluation.

https://dplyr.tidyverse.org/articles/programming.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note, doesn't break anything ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #190 (comment) for more details. It doesn't break anything but it generates a note:

Undefined global functions or variables:
row_number

@@ -99,7 +99,7 @@ read_camtrap_dp <- function(file = NULL,
obs_col_names <- names(observations)
if (all(c("X22", "X23", "X24") %in% names(observations))) {
observations <- observations %>%
dplyr::rename(speed = X22, radius = X23, angle = X24)
dplyr::rename(speed = "X22", radius = "X23", angle = "X24")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is not covered by test-read_camtrap_dp.R, there are more cases of insufficient coverage. So we can postpone this to a separate pull request.

Copy link
Member Author

@damianooldoni damianooldoni Feb 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a fast fix (see #185). These fields will disappear in new version of camtrap-dp. To test this part I think we have to add a new data package to inst/extdata and I thought it was not really worth.

tests/testthat/test-calc_animal_pos.R Show resolved Hide resolved
R/get_n_obs.R Show resolved Hide resolved
R/get_cam_op.R Outdated Show resolved Hide resolved
@damianooldoni
Copy link
Member Author

Great review, @PietrH, thanks a lot.

Before I get to specific comments, I would like to discuss with you an important point which comes quite often in your review and it's all due to the new version of dplyr, I am afraid.

As I mentioned in my second bullet point above, this is how to solve R CMD check notes: https://dplyr.tidyverse.org/articles/programming.html#eliminating-r-cmd-check-notes. This means using SE evaluation for tidyselect functions (select, rename, ...) and continuing using .data$ for the others.

I tested all ways to work with columns in a data.frame within a dummy package I have just created, (dummyr) with jsut one dummy function check_tidyselect() and one dummy loaded data.frame, dummy_df.

So, if we use https://github.com/damianooldoni/dummyr/blob/88a5725d9ce3ea461af513263ae5bc4dd2b7f674/R/check_tidyselect.R#L14-L19 in the dummyr package, we get a R CMD CHECK note for undefined global variables:

checking R code for possible problems ... NOTE
  check_tidyselect: no visible binding for global variable 'col_b'
  check_tidyselect: no visible binding for global variable 'col_a'
  check_tidyselect: no visible binding for global variable 'col_c'
  Undefined global functions or variables:
    col_a col_b col_c

If we use https://github.com/damianooldoni/dummyr/blob/3f507f6027291a28a86776c2ff1c4d3469bbf687/R/check_tidyselect.R#L14-L19, we are using SE for select() and this solves one issue, but we get again the R CMD CHECK note below as rename() and filter() are still using NSE without .data$:

checking R code for possible problems ... NOTE
  check_tidyselect: no visible binding for global variable 'col_b'
  check_tidyselect: no visible binding for global variable 'col_c'
  Undefined global functions or variables:
    col_b col_c

And using SE in rename() (see https://github.com/damianooldoni/dummyr/blob/5ec2ad3ff136f66d526eda7f181759789788392d/R/check_tidyselect.R#L14-L19) solves the note for col_c:

checking R code for possible problems ... NOTE
  check_tidyselect: no visible binding for global variable 'col_b'
  Undefined global functions or variables:
    col_b

This doesn't surprise me as I learned few days ago that rename and select are both using tidyselect behind the screen (same for relocate() just to cite another one).

Finale step, as shown in the blogpost of dplyr I mentioned above, we need to add .data to all data masking functions (filter in our dummy case study, but also group_by for example). So, importing .data from dplyr, and using it in filter() as done in https://github.com/damianooldoni/dummyr/blob/dab8f59eaf942b6d2a09e75e2ce3608c578cf796/R/check_tidyselect.R#L14-L19 removes all notes.

The point is that using NSE in select() in tests files will not generate such notes which makes things quite annoying of course. I will try to be consequent in the tests and using SE there as well as good practice.

@PietrH: Do you agree on this?

@damianooldoni
Copy link
Member Author

Answer about the question in NAMESPACE. Actually we are not using utils, indeed. It's just R thinking we need to use the "data" coming from utils. Again a problem with tidyselect, this time in a nest() in get_effort(). Solved via 544c426

@peterdesmet
Copy link
Member

@damianooldoni interesting summary in #190 (comment) Do I understand correctly that you cannot use quotes column names in filter()? So the alternatives are:

  • Use unquoted colnames everywhere Leads to R CMD Check errors
  • Import .data from rlang and use .data$col_name everywhere
  • Use quoted values everywhere Doesn't work for filter()
  • Use quoted values, except for filter() Current approach in this PR

It enables compatibility with R 3.6

Co-Authored-By: Pieter Huybrechts <48065851+PietrH@users.noreply.github.com>
@damianooldoni
Copy link
Member Author

Hi @peterdesmet
Thanks to point this.

I forgot indeed to make an example with SE (= using quotes) in filter(). If you do such (see https://github.com/damianooldoni/dummyr/blob/9bc359859c7dfdea5057804e058af8f7989f99fe/R/check_tidyselect.R#L15-L20) the result is wrong as dplyr thinks you really mean the string "col_b" instead of the column col_b.

I like your list and I will copy paste it here below giving some extra comments:

  • Use unquoted colnames everywhere: Leads to R CMD Check errors

It doesn't give errors back, "only" the annoying note about "undefined global variables". But I think you meant this.

Import .data from rlang and use .data$col_name everywhere

This gives a warning as .data$ is deprecated in tidyselect expresssion, e.g. in select(), rename() and relocate()

Use quoted values everywhere Doesn't work for filter()

Indeed, see above. But note also that Standard Evaluation (= quotes) it doesn't work for other functions such as group_by("col_name"). Try dummyr package with this https://github.com/damianooldoni/dummyr/blob/d2d36da5a791c96e4166638756f9864714b6e6b7/R/check_tidyselect.R#L20 as test example.

Use quoted values, except for filter() Current approach in this PR

Indeed. As noted above it's mort than just filter().

Conclusion

Personally I don't find the solution proposed by the blogpost and implemented in this PR a great solution and I hope tidyverse (and/or R) will come to a better one. Still, there is no better solution at the moment as all other solutions fail or generate warnings/notes at R CMD check level.

@PietrH
Copy link
Member

PietrH commented Feb 22, 2023

Do you agree on this?

As per the blogpost you are following best practice. Like you I am looking forward to a different solution proposed by the community. So for the time being the mix of NSE and SE is just something we'll have to live with.

@damianooldoni
Copy link
Member Author

Indeed @PietrH. Completely agree.

@PietrH
Copy link
Member

PietrH commented Feb 22, 2023

One more small note: we should consider having an R-CMD Check version running R 3.6 to catch things like the new R4.0 |> pipe so we don't break backwards support

I can make an issue if you like

@damianooldoni
Copy link
Member Author

I was already thinking about this. But indeed, I wanted to close this PR first. So, making a issue is good: see #194

@damianooldoni damianooldoni merged commit df9de45 into main Feb 22, 2023
@damianooldoni damianooldoni deleted the check-post-dplyr-upgrade branch February 22, 2023 11:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants