Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

conflict_prefer() is slow #75

Closed
sanjmeh opened this issue Oct 22, 2022 · 17 comments
Closed

conflict_prefer() is slow #75

sanjmeh opened this issue Oct 22, 2022 · 17 comments

Comments

@sanjmeh
Copy link

sanjmeh commented Oct 22, 2022

After consistently facing delayed sourcing times - 2 to 4 seconds (of R files that are all just functions), I added tictoc::tic() and toc() in each block of my code to narrow down to where the delays are.

FInally the main delay was found in running a small bunch of conflicted::conflict_prefer(),

tictoc::tic("Loading conflicted lines in cron_functions.R")
conflicted::conflict_prefer("first",winner = "data.table",quiet = T)
conflicted::conflict_prefer("last",winner = "data.table",quiet = T)
conflicted::conflict_prefer("year",winner = "lubridate",quiet = T)
conflicted::conflict_prefer("month",winner = "lubridate",quiet = T)
conflicted::conflict_prefer("hour",winner = "lubridate",quiet = T)
conflicted::conflict_prefer("minute",winner = "lubridate",quiet = T)
conflicted::conflict_prefer("filter",winner = "dplyr",quiet = T)
tictoc::toc()

Loading packages of commonAPI.R: 0.001 sec elapsed
Loading packages using pacman in generalfunctions.: 0.001 sec elapsed
Running conflicted scripts in generalfunctions: 1.169 sec elapsed
Loading conflicted lines in cron_functions.R: 1.082 sec elapsed

So you see just those 6 lines causes a 1 second + delay in execution.

What's happening under the hood? or am I missing something?

@sanjmeh

This comment was marked as outdated.

@hadley

This comment was marked as outdated.

@sanjmeh

This comment was marked as outdated.

@hadley

This comment was marked as outdated.

@sanjmeh

This comment was marked as outdated.

@hadley
Copy link
Member

hadley commented Oct 31, 2022

Ok, if the actual data manipulation is no slower, then you shouldn't be benchmarking it. So the actual reprex is this:

library(conflicted)

load_packages <- function() {
  library(data.table, warn.conflicts = FALSE)
  library(dplyr, warn.conflicts = FALSE)
  library(lubridate, warn.conflicts = FALSE)
}

declare_conflicts <- function() {
  conflicted::conflict_prefer("shift",winner = "data.table",quiet = T)
  conflicted::conflict_prefer("last",winner = "dplyr",quiet = T)
  conflicted::conflict_prefer("first",winner = "data.table",quiet = T)
  conflicted::conflict_prefer("year",winner = "lubridate",quiet = T)
  conflicted::conflict_prefer("month",winner = "lubridate",quiet = T)
  conflicted::conflict_prefer("day",winner = "lubridate",quiet = T)
}

bench::mark(
  load_packages(),
  declare_conflicts(),
  check = FALSE
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#>   expression               min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 load_packages()       35.4ms   36.7ms      27.2      21MB     25.2
#> 2 declare_conflicts()   67.6ms   71.3ms      13.9    10.1MB     23.9

Created on 2022-10-31 with reprex v2.0.2

Does that accurately capture the performance issue you are experiencing?

@sanjmeh
Copy link
Author

sanjmeh commented Oct 31, 2022

Yes indeed, that's a better reprex Hadley. 👍🏼
So I have loaded few more conflict_prefer() lines that I use often and re run the performance test you suggest.

declare_conflicts <- function() {
  conflicted::conflict_prefer("shift",winner = "data.table",quiet = T)
  conflicted::conflict_prefer("last",winner = "dplyr",quiet = T)
  conflicted::conflict_prefer("first",winner = "data.table",quiet = T)
  conflicted::conflict_prefer("year",winner = "lubridate",quiet = T)
  conflicted::conflict_prefer("month",winner = "lubridate",quiet = T)
  conflicted::conflict_prefer("day",winner = "lubridate",quiet = T)
  conflicted::conflict_prefer("filter",winner = "dplyr",quiet = T)
  conflicted::conflict_prefer("renderDataTable", "DT",quiet = T)
  conflicted::conflict_prefer("hour", "lubridate",quiet = T)
  conflicted::conflict_prefer("minute", "lubridate",quiet = T)
  conflicted::conflict_prefer("between", "dplyr",q = T)
}

And now we can see the declare_conflicts() function takes 1 second +.

bench::mark(
  load_packages(),
  declare_conflicts(),
  check = FALSE
)

I get the following:

expression               min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory                  time           gc              
  <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>                  <list>         <list>          
1 load_packages()     248.87ms 253.51ms     3.94     12.5MB    0         2     0   507.03ms <NULL> <Rprofmem [4,714 × 3]>  <bench_tm [2]> <tibble [2 × 3]>
2 declare_conflicts()    1.17s    1.17s     0.854    45.9MB    0.854     1     1      1.17s <NULL> <Rprofmem [17,111 × 3]> <bench_tm [1]> <tibble [1 × 3]>

Although 1.17 second is not bad as a loading time for an app but seems a very high overhead for just declaring conflicts. Could there be a better way to declare conflicts, in an efficient way?

Note: I have several .R files that are sourced in global.R of the shiny app. The best I could do was added a line if(sys.nframe() <=4) conflicted_prefer(...) to avoid nested sourcing and re-declaration of same conflicted functions. This helps keep a check on the total loading time. Is there a more efficient way?
Thanks

@hadley hadley changed the title Too much time to run conflict_prefer conflict_prefer() is slow Oct 31, 2022
@hadley
Copy link
Member

hadley commented Oct 31, 2022

I see a doubling in time, which is what I'd expect:

library(conflicted)

declare_conflicts1 <- function() {
  conflicted::conflict_prefer("shift",winner = "data.table",quiet = T)
  conflicted::conflict_prefer("last",winner = "dplyr",quiet = T)
  conflicted::conflict_prefer("first",winner = "data.table",quiet = T)
  conflicted::conflict_prefer("year",winner = "lubridate",quiet = T)
  conflicted::conflict_prefer("month",winner = "lubridate",quiet = T)
  conflicted::conflict_prefer("day",winner = "lubridate",quiet = T)
}

declare_conflicts2 <- function() {
  declare_conflicts1()
  conflicted::conflict_prefer("filter",winner = "dplyr",quiet = T)
  conflicted::conflict_prefer("renderDataTable", "DT",quiet = T)
  conflicted::conflict_prefer("hour", "lubridate",quiet = T)
  conflicted::conflict_prefer("minute", "lubridate",quiet = T)
  conflicted::conflict_prefer("between", "dplyr",q = T)
}


bench::mark(
  declare_conflicts1(),
  declare_conflicts2(),
  check = FALSE
)
#> # A tibble: 2 × 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 declare_conflicts1()   56.4µs   58.9µs    16723.    25.6KB     52.3
#> 2 declare_conflicts2()  103.9µs  108.6µs     9078.    60.8KB     52.1

Created on 2022-10-31 with reprex v2.0.2

I think the problem is that every time you call conflict_prefer() it calls conflicts_register() which has to repeat a bunch of work. I think we could probably provide some way to delay that work until you've finished registering all preferences.

@sanjmeh
Copy link
Author

sanjmeh commented Oct 31, 2022

Alright. I will wait for further guidance on conflicts_register() and how we could delay its execution to the end.
I notice conflicts_register() is not a function in the user namespace so perhaps a fix from you is needed.
Thanks a lot.

@sanjmeh
Copy link
Author

sanjmeh commented Nov 2, 2022

I wonder how are you getting such a fast response.

expression               min   median `itr/sec` mem_alloc `gc/sec`
<bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 load_packages()       35.4ms   36.7ms      27.2      21MB     25.2
2 declare_conflicts()   67.6ms   71.3ms      13.9    10.1MB     23.9

If you don't mind sharing your machine config in which you ran above @hadley, it would be great to know, as the performance is several orders better in your machine than mine (I got 1140 msec compared to your 71.3 msec). My load_ packages() was also several times slower.

Mine is an Ubuntu 20 server, 8 CPU cores, 16GB RAM. Hosted on Digital Ocean, Data center: Bengaluru.

@hadley
Copy link
Member

hadley commented Nov 2, 2022

I have a recent macbook pro with m1 chip.

@hadley
Copy link
Member

hadley commented Jan 9, 2023

Maybe syntax could be something like:

conflicted::conficts_register({
  conflicted::conflict_prefer("filter",winner = "dplyr",quiet = T)
  conflicted::conflict_prefer("renderDataTable", "DT",quiet = T)
  conflicted::conflict_prefer("hour", "lubridate",quiet = T)
  conflicted::conflict_prefer("minute", "lubridate",quiet = T)
  conflicted::conflict_prefer("between", "dplyr",q = T)
})

@hadley
Copy link
Member

hadley commented Jan 10, 2023

Or maybe it should be more like:

conflicted::conficts_favour({
  dplyr::filter,
  DT::renderDataTable,
  lubridate::hour,
  lubridate::minute,
  dplyr::between,
})

@sanjmeh
Copy link
Author

sanjmeh commented Jan 11, 2023

I re installed using remotes::install_github(repo = "r-lib/conflicted",ref = "9b9b8ea") but the namespace still dosn't show conficts_favour or conflicts_register. Is the repo not yet ready? I donot have much experience in installing packages from source code, so I apologise if I am doing it wrong.

@hadley
Copy link
Member

hadley commented Jan 11, 2023

I would do it with remotes::install_github("r-lib/conflicted#81"). But you'll also need to restart R.

@hadley
Copy link
Member

hadley commented Jan 30, 2023

I think the new conflicts_prefer() will solve your problems, and I think is better than my previously proposed solution.

@hadley hadley closed this as completed Jan 30, 2023
@sanjmeh
Copy link
Author

sanjmeh commented Feb 2, 2023

Thanks @hadley - i will test it let you know. We can close the issue for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants