Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R script execution slow after package installation #95

Closed
rmeans opened this issue Nov 19, 2021 · 4 comments
Closed

R script execution slow after package installation #95

rmeans opened this issue Nov 19, 2021 · 4 comments
Assignees

Comments

@rmeans
Copy link

rmeans commented Nov 19, 2021

This issue may not be directly related to this SQLMLUTILS, but maybe you would be able to provide some context or a potential solution.

As we install more libraries, especially tidyverse which installs 70+ sub packages, execution of even the most basic R script goes from sub second execution to 7+ seconds.

Here is the installation script:

library(sqlmlutils)

connection <- connectionInfo(
  server   = "XXXXXX",
  database = "XXXXXX")

r = getOption("repos")
r["CRAN"] = "https://mran.microsoft.com/snapshot/2019-02-01"
options(repos = r)

pkgs <- c("tictoc", "data.table", "tidyverse", "zoo", "DBI")
sql_install.packages(connectionString = connection, pkgs = pkgs, verbose = TRUE, scope = "PUBLIC")

The R script I am testing with is:

EXECUTE sp_execute_external_script @language = N'R'
    , @script = N'
a <- 1
b <- 2
c <- a/b
d <- a*b
print(c(c, d))
'

As a separate test on the same server to rule out any issues with the packages, I installed the packages using the process documented for SQL Server 2017 which installs packages globally for all databases, the script executes in sub seconds times.

Script to test global installation of packages:

r = getOption("repos")
r["CRAN"] = "https://mran.microsoft.com/snapshot/2019-02-01"
options(repos = r)

pkgs <- c("tictoc", "data.table", "tidyverse", "zoo", "DBI")
install.packages(pkgs)

This would be fine, but our plan is to run the database in SQL Managed Instance on Azure and this global approach is not an option. By the way, I did try the test of running the simple script before and after installing packages on Azure and experienced the same behavior, less than a second before packages, 7+ seconds after packages.

Any guidance you could provide would be great!

@jarupatj
Copy link
Contributor

This is a known issue. I think one of you 70+ packages might contains source packages like BH. Can you see if you have them? If so, can you try remove the source package?
Source package like BH has a lot of files. We have an existing known issue that the query performance will degrade when there a lot of files in the installed external libraries. The mitigation right now is to remove the source package from the list of installed packages. We cannot run source package in SQL Server or SQL MI anyway.

We are working on the fix for this with high priority.

@mcollis2
Copy link

Is there another thread reporting this issue for tracking, or can we follow this issue here? Thanks!

@jarupatj
Copy link
Contributor

Feel free to follow this thread. The fix will be in SQL Server and we are targeting mid next year.
In the mean time, I suggest going through the list of packages and remove source packages like BH.

@seantleonard
Copy link
Contributor

The latest release of SQLMLUtils for R (version 1.2.1 - GitHub Releases) is published with changes from #99 to improve execution time for this scenario.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants