Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbWriteTable() too slow for remote databases and LOAD DATA LOCAL INFILE disabled by default #265

Closed
GitHunter0 opened this issue Apr 9, 2022 · 10 comments · Fixed by #267

Comments

@GitHunter0
Copy link

GitHunter0 commented Apr 9, 2022

Hey folks,

I believe it's a follow up to #162 and #11

DBI::dbWriteTable() is incredibly slow and crashes for large tables (100MB+) for my AWS RDS MySQL database.

LOAD DATA LOCAL INFILE is impressively faster but it is disabled by default since MySQL 8.0.

I can enable that to use LOAD DATA LOCAL INFILE within MySQL Workbench but I could not find a way to use it in R / DBI.
Therefore I opened an issue here r-dbi/DBI#385

Any update on this matter? What is the current recommended approach to import large datasets to remote databases?

Thank you

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=Portuguese_Brazil.1252       
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    
system code page: 65001

attached base packages:
[1] stats     graphics  grDevices utils    
[5] datasets  methods   base     

other attached packages:
[1] DBI_1.1.2      RMariaDB_1.2.1
@GitHunter0 GitHunter0 changed the title dbWriteTable() too slow and LOAD DATA LOCAL INFILE disabled dbWriteTable() too slow for remote databases and LOAD DATA LOCAL INFILE disabled by default Apr 9, 2022
@krlmlr
Copy link
Member

krlmlr commented Apr 14, 2022

Thanks. Can you please share the entire code you use to connect to your database? Please edit passwords and other sensitive information.

@GitHunter0
Copy link
Author

GitHunter0 commented Apr 14, 2022

Of course, thank you for the feedback @krlmlr .

Please, consider this MWE:

PREPARATION

library(DBI)
library(RMariaDB)
library(nycflights13)
library(glue)
library(data.table)
library(dplyr)
library(fs)

# Connection to my AWS RDS MySQL Database
db_con_remote <- DBI::dbConnect(
    RMariaDB::MariaDB(), 
    user = "user", 
    password = "pass",
    port = as.integer(3306), 
    dbname = "dbname",
    host = "host"
)

table_name <- "table_test" 

# df has 38.8 MB with 336,776 rows and 19 columns.
df <- nycflights13::flights

METHOD 1: copy table using DBI::dbWriteTable()

  • Although df has just 38.8 MB, this method takes several hours and still does not finish the task, until I'm forced to stop R.
  • It is working only for very small tables like datasets::mtcars (7 KB)
DBI::dbWriteTable(db_con_remote, table_name, df, overwrite=TRUE)

METHOD 2: copy table using SQL LOAD DATA LOCAL INFILE command

  • Impressively faster method, it takes only 10 seconds to complete the same task
# First I use this trick to create a boilerplate for the table because 'LOAD
# DATA LOCAL INFILE' command requires a preexisting table.
DBI::dbWriteTable(db_con_remote, table_name, df[1,], overwrite=TRUE)

# Remove all rows but keep columns and their types.
DBI::dbExecute(db_con_remote, glue("TRUNCATE {table_name};"))

# Save table as .csv file
table_csv_path <- glue::glue("./{table_name}.csv") |> fs::path_abs()
data.table::fwrite(df, table_csv_path)

# Then I run this query in MySQL Workbench:
LOAD DATA LOCAL INFILE '<<table_csv_path>>' 
INTO TABLE <<table_name>>
CHARACTER SET utf8mb4 
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"' 
LINES  TERMINATED BY '\r\n'  
IGNORE 1 LINES; 
  • The only problem with METHOD 2 is that I could not find a way to use it using solely R.
  • This command returns Error: Error executing query: Load data local infile forbidden because since MySQL8.0 Local Data Loading is forbidden by default.
DBI::dbExecute(db_con_remote,
  glue::glue(
    r"[
      LOAD DATA LOCAL INFILE '<<table_csv_path>>' 
      INTO TABLE <<table_name>>
      CHARACTER SET utf8mb4 
      FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"' 
      LINES  TERMINATED BY '\r\n'  
      IGNORE 1 LINES; 
    ]",
    .open = "<<", .close = ">>",
  )
)

@krlmlr
Copy link
Member

krlmlr commented Apr 15, 2022

Thanks. RMariaDB has gained a load_data_local_infile argument for dbConnect(), see https://rmariadb.r-dbi.org/reference/dbconnect-mariadbdriver-method.

What is the exact error message returned by DBI::dbExecute() in your example? How can we document this better?

@GitHunter0
Copy link
Author

GitHunter0 commented Apr 15, 2022

What is the exact error message returned by DBI::dbExecute() in your example?

Here's a screenshot:
image

Thanks. RMariaDB has gained a load_data_local_infile argument for dbConnect()

Thank you, that's awesome news!
However it is still not working , the DBI::dbExecute() of the MWE below is returning the same message Error: Error executing query: Load data local infile forbidden.

library(DBI)
library(RMariaDB)
library(nycflights13)
library(glue)
library(data.table)
library(dplyr)
library(fs)

# Connection to either my AWS RDS MySQL Database or my Local MySQL database. 
db_con <- DBI::dbConnect(
    RMariaDB::MariaDB(), 
    user = "user", 
    password = "pass",
    port = as.integer(3306), 
    dbname = "dbname",
    host = "host",
    load_data_local_infile = TRUE
)

table_name <- "table_test" 

# df has 38.8 MB with 336,776 rows and 19 columns.
df <- nycflights13::flights

# First I use this trick to create a boilerplate for the table because 'LOAD
# DATA LOCAL INFILE' command requires a preexisting table.
DBI::dbWriteTable(db_con, table_name, df[1,], overwrite=TRUE)

# Remove all rows but keep columns and their types.
DBI::dbExecute(db_con, glue("TRUNCATE {table_name};"))

# Save table as .csv file
table_csv_path <- glue::glue("./{table_name}.csv") |> fs::path_abs()
data.table::fwrite(df, table_csv_path)

DBI::dbExecute(db_con,
  glue::glue(
    r"[
      LOAD DATA LOCAL INFILE '<<table_csv_path>>' 
      INTO TABLE <<table_name>>
      CHARACTER SET utf8mb4 
      FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"' 
      LINES  TERMINATED BY '\r\n'  
      IGNORE 1 LINES; 
    ]",
    .open = "<<", .close = ">>",
  )
)

I tested for both the AWS remote server and my local MySQL database and neither worked, any idea why?
(MySQL Workbench with option OPT_LOCAL_INFILE=1 works normally)

@krlmlr
Copy link
Member

krlmlr commented Apr 16, 2022

Thanks. Can you please try

remotes::install_github("r-dbi/RMariaDB#267")

?

@GitHunter0
Copy link
Author

Thanks. Can you please try

remotes::install_github("r-dbi/RMariaDB#267")

Awesome @krlmlr , it's working perfectly now both local and remote, thanks a lot!

@GitHunter0
Copy link
Author

@krlmlr , do you plan to add LOAD DATA LOCAL INFILE method to DBI::dbWriteTable() backend? It would make that command work properly for remote databases too.

@krlmlr
Copy link
Member

krlmlr commented Apr 18, 2022

I would expect it to work already. What does the error message for dbWriteTable() look like? Please post complete code (minus database credentials) and the error.

@krlmlr krlmlr reopened this Apr 18, 2022
@GitHunter0
Copy link
Author

I would expect it to work already.

Sorry, I checked using the code below and it is indeed working flawlessly. Thanks again @krlmlr .

library(DBI)
library(RMariaDB)
library(nycflights13)

# Connection to my AWS RDS MySQL Database
db_con_remote <- DBI::dbConnect(
    RMariaDB::MariaDB(), 
    user = "user", 
    password = "pass",
    port = as.integer(3306), 
    dbname = "dbname",
    host = "host"
)

table_name <- "table_test" 

# df has 38.8 MB with 336,776 rows and 19 columns.
df <- nycflights13::flights

DBI::dbWriteTable(db_con_remote, table_name, df, overwrite=TRUE)

@krlmlr krlmlr closed this as completed May 1, 2022
@github-actions
Copy link

github-actions bot commented May 2, 2023

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators May 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants