Upgrade package based on dplyr 1.1.0 release #190

damianooldoni · 2023-02-14T19:44:37Z

This PR aims to follow the changes in dplyr 1.1.0.

1. 777e12b: do not use dplyr::na_if() to replace 0 with NA across a whole data.frame (see the bullet point on this function in https://dplyr.tidyverse.org/news/index.html#vctrs-1-1-0 and "Rewrite na_if() using vctrs", tidyverse/dplyr#6329).
2. 8f1c2bf: do not use .data in tidyselect expressions (select(), rename(), relocate()), as it generates warnings. Use the "colname" syntax instead. This way no notes about "no visible binding for global variable" arise. See https://dplyr.tidyverse.org/articles/programming.html#eliminating-r-cmd-check-notes.
3. 8f1c2bf (same commit as 2): add multiple = "all" to a left_join() to avoid a warning (we expect multiple matches, of course). See https://www.tidyverse.org/blog/2023/01/dplyr-1-1-0-joins/#multiple-matches
4. Update DESCRIPTION: require dplyr 1.1.0 explicitly (064bfee), add @PietrH as contributor to the list of authors (955162b) and bump the version number (fb1ac9b).
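To illustrate the multiple = "all" change, here is a minimal sketch assuming dplyr >= 1.1.0. The two data frames and their column names are invented for the example and are not taken from the package code.

```r
library(dplyr)

# Hypothetical tables: one deployment can have several observations.
deployments <- data.frame(
  deploymentID = c("d1", "d2"),
  locationName = c("A", "B")
)
observations <- data.frame(
  deploymentID = c("d1", "d1", "d2"),
  count = c(1, 2, 5)
)

# multiple = "all" states explicitly that every match in `observations`
# should be returned for each row of `deployments`.
joined <- dplyr::left_join(
  deployments,
  observations,
  by = "deploymentID",
  multiple = "all"
)
```

Deployment d1 matches two observations, so the result has one row per match rather than one row per deployment.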

See https://dplyr.tidyverse.org/news/index.html#vctrs-1-1-0
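One possible way to replace 0 with NA without calling na_if() on a whole data.frame is to apply it column by column. This is only a sketch with made-up data, and it is not necessarily what commit 777e12b does.

```r
library(dplyr)

# Hypothetical numeric data in which 0 stands for "missing".
df <- data.frame(a = c(0, 1, 2), b = c(3, 0, 0))

# na_if() is applied to each column (a vector) instead of to the
# data.frame as a whole.
df_na <- df %>%
  dplyr::mutate(dplyr::across(dplyr::everything(), ~ dplyr::na_if(.x, 0)))
```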

PietrH

Well done, quick fix and nicely implemented. I have a few minor questions and notes, which I have left as inline comments.

Code Coverage

All tests pass, but code coverage is insufficient: several functions are untested. I'm not suggesting this be fixed within the scope of this PR; it should be addressed in a future PR.

NAMESPACE

devtools::check() results in a warning for me:

Consider adding
    importFrom("utils", "data")
  to your NAMESPACE file.

.data is imported from dplyr. Where do we use utils::data?

Style

Currently we mix tidy evaluation with standard evaluation (passing column names both as strings and as symbols). It would be neater to stick to one or the other.

R/get_n_obs.R

R/get_record_table.R

PietrH · 2023-02-17T10:19:50Z

R/get_record_table.R

@@ -292,7 +293,7 @@ get_record_table <- function(package = NULL,
      dplyr::filter(.data$delta.time.secs == max(.data$delta.time.secs) &
        .data$row_number == max(.data$row_number)) %>%
      dplyr::ungroup() %>%
-      dplyr::select(-.data$row_number)
+      dplyr::select(-"row_number")


Style: you mix standard and non-standard evaluation (tidy evaluation). In this case, dplyr::select(-row_number) would be tidy evaluation.

https://dplyr.tidyverse.org/articles/programming.html

Just a note, doesn't break anything ;)

See #190 (comment) for more details. It doesn't break anything but it generates a note:

Undefined global functions or variables: row_number
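For reference, a minimal sketch of the string-based selection used in the diff above, next to the more explicit all_of() form. The records data frame is invented for the example.

```r
library(dplyr)

# Hypothetical data with a helper column to drop.
records <- data.frame(
  observationID = c("o1", "o2", "o3"),
  row_number = 1:3
)

# Quoted column name, as in the diff: there is no bare symbol for
# R CMD check to flag as an undefined global variable.
dropped <- records %>%
  dplyr::select(-"row_number")

# Same selection through all_of(), which takes a character vector.
dropped_all_of <- records %>%
  dplyr::select(-dplyr::all_of("row_number"))
```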

PietrH · 2023-02-17T10:29:26Z

R/read_camtrap_dp.R

@@ -99,7 +99,7 @@ read_camtrap_dp <- function(file = NULL,
  obs_col_names <- names(observations)
  if (all(c("X22", "X23", "X24") %in% names(observations))) {
    observations <- observations %>%
-      dplyr::rename(speed = X22, radius = X23, angle = X24)
+      dplyr::rename(speed = "X22", radius = "X23", angle = "X24")


This line is not covered by test-read_camtrap_dp.R, and there are more cases of insufficient coverage, so we can postpone this to a separate pull request.

This was a fast fix (see #185). These fields will disappear in the new version of camtrap-dp. To test this part I think we would have to add a new data package to inst/extdata, and I thought it was not really worth it.

tests/testthat/test-calc_animal_pos.R

R/get_n_obs.R

R/get_cam_op.R

damianooldoni · 2023-02-20T16:09:10Z

Great review, @PietrH, thanks a lot.

Before I get to specific comments, I would like to discuss an important point that comes up quite often in your review and that is all due to the new version of dplyr, I am afraid.

As I mentioned in my second bullet point above, this is how to solve R CMD check notes: https://dplyr.tidyverse.org/articles/programming.html#eliminating-r-cmd-check-notes. This means using standard evaluation (SE) for tidyselect functions (select(), rename(), ...) and continuing to use .data$ for the others.

I tested all the ways to work with columns in a data.frame within a dummy package I have just created (dummyr), with just one dummy function, check_tidyselect(), and one dummy loaded data.frame, dummy_df.

So, if we use https://github.com/damianooldoni/dummyr/blob/88a5725d9ce3ea461af513263ae5bc4dd2b7f674/R/check_tidyselect.R#L14-L19 in the dummyr package, we get an R CMD check note for undefined global variables:

❯ checking R code for possible problems ... NOTE
  check_tidyselect: no visible binding for global variable 'col_b'
  check_tidyselect: no visible binding for global variable 'col_a'
  check_tidyselect: no visible binding for global variable 'col_c'
  Undefined global functions or variables:
    col_a col_b col_c

If we use https://github.com/damianooldoni/dummyr/blob/3f507f6027291a28a86776c2ff1c4d3469bbf687/R/check_tidyselect.R#L14-L19, we are using SE for select(), which solves one issue, but we again get the R CMD check note below, as rename() and filter() are still using NSE without .data$:

❯ checking R code for possible problems ... NOTE
  check_tidyselect: no visible binding for global variable 'col_b'
  check_tidyselect: no visible binding for global variable 'col_c'
  Undefined global functions or variables:
    col_b col_c

And using SE in rename() (see https://github.com/damianooldoni/dummyr/blob/5ec2ad3ff136f66d526eda7f181759789788392d/R/check_tidyselect.R#L14-L19) solves the note for col_c:

❯ checking R code for possible problems ... NOTE
  check_tidyselect: no visible binding for global variable 'col_b'
  Undefined global functions or variables:
    col_b

This doesn't surprise me, as I learned a few days ago that rename() and select() both use tidyselect behind the scenes (the same goes for relocate(), to name another one).

Final step: as shown in the dplyr article I mentioned above, we need to add .data to all data-masking functions (filter() in our dummy case study, but also group_by(), for example). So importing .data from dplyr and using it in filter(), as done in https://github.com/damianooldoni/dummyr/blob/dab8f59eaf942b6d2a09e75e2ce3608c578cf796/R/check_tidyselect.R#L14-L19, removes all notes.

The point is that using NSE in select() in test files will not generate such notes, which makes things quite annoying, of course. I will try to be consistent and use SE in the tests as well, as good practice.
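The end state of these steps can be sketched as follows. The body of check_tidyselect() and the contents of dummy_df are not shown in this thread, so the version below is a hypothetical reconstruction that only reuses the column names from the check output (plus an invented col_d and col_c_renamed).

```r
library(dplyr)

# Hypothetical stand-in for dummy_df.
dummy_df <- data.frame(
  col_a = 1:3,
  col_b = c(1, 2, 3),
  col_c = c("x", "y", "z"),
  col_d = c(TRUE, FALSE, TRUE)
)

# In a package, `.data` would be imported, e.g. via roxygen2:
#' @importFrom dplyr .data
check_tidyselect <- function(df) {
  df %>%
    # tidyselect verbs: quoted column names
    dplyr::select("col_a", "col_b", "col_c") %>%
    dplyr::rename(col_c_renamed = "col_c") %>%
    # data-masking verb: .data pronoun
    dplyr::filter(.data$col_b > 1)
}

result <- check_tidyselect(dummy_df)
```

No bare column symbol appears in the function body, which is what R CMD check looks for when it reports undefined global variables.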

@PietrH: Do you agree on this?

damianooldoni · 2023-02-20T17:44:40Z

Answer to the question about NAMESPACE: indeed, we are not actually using utils. It's just R thinking we need the "data" coming from utils. Again a problem with tidyselect, this time in a nest() in get_effort(). Solved via 544c426.
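A bare `data` symbol in a tidyselect call is what makes R CMD check suggest importFrom("utils", "data"). The sketch below shows the quoted form, assuming tidyr is available. The effort data frame is invented and the real code in get_effort() is not shown in this thread.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: several counts per group.
effort <- data.frame(
  group = c("a", "a", "b"),
  n = c(1, 2, 3)
)

# Nest the `n` column into a list-column called "data".
nested <- effort %>%
  tidyr::nest(data = "n")

# Quoting "data" avoids a bare `data` symbol, which R CMD check would
# otherwise try to resolve as utils::data.
unnested <- nested %>%
  tidyr::unnest("data")
```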

peterdesmet · 2023-02-21T09:30:19Z

@damianooldoni interesting summary in #190 (comment). Do I understand correctly that you cannot use quoted column names in filter()? So the alternatives are:

Use unquoted colnames everywhere: Leads to R CMD Check errors
Import .data from rlang and use .data$col_name everywhere
Use quoted values everywhere: Doesn't work for filter()
Use quoted values, except for filter(): Current approach in this PR

damianooldoni · 2023-02-22T10:45:46Z

Hi @peterdesmet ✋
Thanks for pointing this out.

Indeed, I forgot to include an example with SE (= using quotes) in filter(). If you do so (see https://github.com/damianooldoni/dummyr/blob/9bc359859c7dfdea5057804e058af8f7989f99fe/R/check_tidyselect.R#L15-L20), the result is wrong, as dplyr thinks you really mean the string "col_b" instead of the column col_b.

I like your list and I will copy-paste it below with some extra comments:
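A small sketch of why quotes fail in filter(), using an invented data frame rather than the actual dummyr code:

```r
library(dplyr)

dummy_df <- data.frame(col_a = 1:3, col_b = c(1, 2, 3))

# Quoted name: the literal string "col_b" is compared with 2. The
# comparison is FALSE for every row, so nothing is kept.
wrong <- dummy_df %>%
  dplyr::filter("col_b" == 2)

# .data pronoun: the column col_b is compared with 2.
right <- dummy_df %>%
  dplyr::filter(.data$col_b == 2)
```

Here `wrong` has no rows, while `right` keeps the single row where col_b equals 2.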

Use unquoted colnames everywhere: Leads to R CMD Check errors

It doesn't give errors back, "only" the annoying note about "undefined global variables". But I think you meant this.

Import .data from rlang and use .data$col_name everywhere

This gives a warning, as .data$ is deprecated in tidyselect expressions, e.g. in select(), rename() and relocate().

Use quoted values everywhere: Doesn't work for filter()

Indeed, see above. But note also that standard evaluation (= quotes) doesn't work for other functions either, such as group_by("col_name"). Try the dummyr package with https://github.com/damianooldoni/dummyr/blob/d2d36da5a791c96e4166638756f9864714b6e6b7/R/check_tidyselect.R#L20 as a test example.

Use quoted values, except for filter(): Current approach in this PR

Indeed. As noted above, it's more than just filter().
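The same pitfall in group_by() can be sketched like this, with invented data. A quoted name is treated as a constant expression, so all rows end up in a single group.

```r
library(dplyr)

dummy_df <- data.frame(
  col_a = 1:4,
  col_b = c("x", "x", "y", "y")
)

# Quoted name: groups by the constant string "col_b", not by the column.
by_string <- dummy_df %>%
  dplyr::group_by("col_b")

# .data pronoun: groups by the values of the column col_b.
by_column <- dummy_df %>%
  dplyr::group_by(.data$col_b)

# If the column name is only available as a string, across(all_of())
# is one documented alternative.
by_all_of <- dummy_df %>%
  dplyr::group_by(dplyr::across(dplyr::all_of("col_b")))
```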

Conclusion

Personally, I don't find the solution proposed by the blogpost and implemented in this PR a great one, and I hope tidyverse (and/or R) will come up with a better one. Still, there is no better solution at the moment, as all other solutions fail or generate warnings/notes at R CMD check level.

PietrH · 2023-02-22T11:06:55Z

Do you agree on this?

As per the blogpost you are following best practice. Like you I am looking forward to a different solution proposed by the community. So for the time being the mix of NSE and SE is just something we'll have to live with.

damianooldoni · 2023-02-22T11:13:41Z

Indeed @PietrH. Completely agree.

PietrH · 2023-02-22T11:42:32Z

One more small note: we should consider having an R CMD check job running on R 3.6 to catch things like the native |> pipe (new in R 4.1), so we don't break backwards compatibility.
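As an illustration of what such a check would protect, the snippet below sticks to syntax that also parses on R 3.6. The values are arbitrary; the newer equivalents appear only in comments.

```r
library(dplyr)

# Works on R 3.6: full `function(x)` definition and the magrittr pipe.
squares <- lapply(1:3, function(x) x^2)
roots <- unlist(squares) %>% sqrt()

# Syntax introduced in R 4.1 that would fail to parse on R 3.6:
#   lapply(1:3, \(x) x^2)
#   unlist(squares) |> sqrt()
```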

I can make an issue if you like

damianooldoni · 2023-02-22T11:46:20Z

I was already thinking about this. But indeed, I wanted to close this PR first. So making an issue is good: see #194

damianooldoni added 6 commits February 8, 2023 14:51

Avoid errors with na_if() - dplyr 1.1.0

777e12b

See https://dplyr.tidyverse.org/news/index.html#vctrs-1-1-0

Update Roxygen

c8a5649

Solve warning about .data in tidyselect exprs

8f1c2bf

Explicit requirement dplyr 1.1.0

064bfee

Add Pieter Huybrechts to Authors@R

955162b

Bump version

fb1ac9b

damianooldoni requested a review from PietrH February 14, 2023 19:44

PietrH reviewed Feb 17, 2023

PietrH assigned damianooldoni Feb 17, 2023

PietrH mentioned this pull request Feb 17, 2023

Improve code coverage #191

Merged

Use NA

4a17c98

damianooldoni added 5 commits February 20, 2023 18:11

Add .data$ back in group_by

f81e7a7

Return the wrong stationCol value via glue

d20077c

Add .data$ back in group_by expr

d7a7bb3

Use SE in relocate

dbf7d25

Use "data" to avoid note in unnest

544c426

Avoid using anonymous function

79126cf

It enables compatibility with R 3.6 Co-Authored-By: Pieter Huybrechts <48065851+PietrH@users.noreply.github.com>

damianooldoni requested a review from PietrH February 22, 2023 11:25

PietrH approved these changes Feb 22, 2023

damianooldoni mentioned this pull request Feb 22, 2023

Add R 3.6 to GitHub action to check package #194

Closed

damianooldoni merged commit df9de45 into main Feb 22, 2023

damianooldoni deleted the check-post-dplyr-upgrade branch February 22, 2023 11:46

damianooldoni mentioned this pull request Mar 30, 2023

Error while running get_custom_effort() #204

Closed

PietrH added a commit that referenced this pull request May 9, 2023

call .data$ to eliminate check notes, see #190

1a4b6cc

PietrH mentioned this pull request May 9, 2023

Use dplyr::mutate() instead of SQL for write_dwc() #207

Merged

damianooldoni mentioned this pull request Dec 9, 2024

Fix NOTES inbo/camtrapdp#144

Closed
