Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbWriteTable() much slower than RMySQL, especially slow on Mac #125

Closed
anguspmitchell opened this issue Feb 10, 2019 · 9 comments
Closed
Milestone

Comments

@anguspmitchell
Copy link

@anguspmitchell anguspmitchell commented Feb 10, 2019

RMySQL is orders of magnitude faster than RMariaDB, especially for writes. Tested on both OSX and Amazon Linux. Reading/writing from an AWS RDS database, MySQL v5.6.39. Also Amazon Linux is much faster than Mac, although I guess that could be a product of network speed. That is also a concern because we do some development on Macs. The reason I don't just switch to RMySQL is that RMariaDB handles unicode characters better than RMySQL, which is another important consideration.

Script

There is a DBI::dbWriteTable() and DBI::dbConnect() in addition to RMariaDB::dbWriteTable() and RMariaDB::dbConnect(). To avoid any namespacing confusion, I ran everything with every combination of DBI, RMySQL, and RMariaDB.

Before running I uninstalled and reinstalled R packages.

I also installed mariadb-devel and mysql-devel on Linux and mariadb-connector-c and mysql-connector-c on Mac.

# detach("package:RMySQL", unload=T)
# detach("package:RMariaDB", unload=T)
# detach("package:DBI", unload=T)
# remove.packages(c("DBI", "RMySQL", "RMariaDB"))
# install.packages(c("DBI", "RMySQL", "RMariaDB"))

# Amazon AMI
# sudo yum install mariadb-devel
# sudo yum install mysql-devel

# OSX
# brew install mariadb-connector-c
# brew install mysql-connector-c

dbString <- "xxxxx.us-west-2.rds.amazonaws.com"
maria_mariaConn <- RMariaDB::dbConnect(RMariaDB::MariaDB(),
                                       host = dbString,
                                       user="xxx", password="xxx", dbname="xxx")
mysql_mysqlConn <- RMySQL::dbConnect(RMySQL::MySQL(),
                                     host = dbString,
                                     user="xxx", password="xxx", dbname="xxx")
maria_dbiConn <- DBI::dbConnect(RMariaDB::MariaDB(),
                                host = dbString,
                                user="xxx", password="xxx", dbname="xxx")
mysql_dbiConn <- DBI::dbConnect(RMySQL::MySQL(),
                                host = dbString,
                                user="xxx", password="xxx", dbname="xxx")


# Query Maria - Maria Function / Maria Conn
startTime = Sys.time()
rows_MariaFunc_MariaConn = RMariaDB::dbGetQuery(maria_mariaConn, "Select * From TestTable")
endTime = Sys.time()
queryTime_MariaFunc_MariaConn = (endTime - startTime)

# Query Maria - Maria Function / DBI Connection
startTime = Sys.time()
rows_MariaFunc_MariaDBIConn = RMariaDB::dbGetQuery(maria_dbiConn, "Select * From TestTable")
endTime = Sys.time()
queryTime_MariaFunc_MariaDBIConn = (endTime - startTime)

# Query Maria - DBI Function / Maria Connection
startTime = Sys.time()
rows_DBIFunc_MariaConn = DBI::dbGetQuery(maria_mariaConn, "Select * From TestTable")
endTime = Sys.time()
queryTime_DBIFunc_MariaConn = (endTime - startTime)

# Query Maria - DBI Function / DBI Connection
startTime = Sys.time()
rows_DBIFunc_MariaDBIConn = DBI::dbGetQuery(maria_dbiConn, "Select * From TestTable")
endTime = Sys.time()
queryTime_DBIFunc_MariaDBIConn = (endTime - startTime)

# Query MySQL - MySQL conn
startTime = Sys.time()
rows_MySQLConn = DBI::dbGetQuery(mysql_mysqlConn, "Select * From TestTable")
endTime = Sys.time()
queryTime_MySQLConn = (endTime - startTime)

# Query MySQL - DBI conn
startTime = Sys.time()
rows_MySQLDBIConn = DBI::dbGetQuery(mysql_dbiConn, "Select * From TestTable")
endTime = Sys.time()
queryTime_MySQLDBIConn = (endTime - startTime)




# Cut down rows to make it run faster
rowsMySQL1000 = rows_MySQLConn[1:1000,]
rowsMaria1000 = rows_MariaFunc_MariaConn[1:1000,]

# Write Maria - Maria Func / Maria Conn
startTime = Sys.time()
RMariaDB::dbWriteTable(maria_mariaConn, "TestTable2", rowsMaria1000, row.names=FALSE, append=TRUE)
endTime = Sys.time()
writeTime_MariaFunc_MariaConn = (endTime - startTime)

# Write Maria - Maria Func / DBI Conn
startTime = Sys.time()
RMariaDB::dbWriteTable(maria_dbiConn, "TestTable2", rowsMaria1000, row.names=FALSE, append=TRUE)
endTime = Sys.time()
writeTime_MariaFunc_MariaDBIConn = (endTime - startTime)

# Write Maria - DBI Func / Maria Conn
startTime = Sys.time()
DBI::dbWriteTable(maria_mariaConn, "TestTable2", rowsMaria1000, row.names=FALSE, append=TRUE)
endTime = Sys.time()
writeTime_DBIFunc_MariaConn = (endTime - startTime)

# Write Maria - DBI Func / DBI Conn
startTime = Sys.time()
DBI::dbWriteTable(maria_dbiConn, "TestTable2", rowsMaria1000, row.names=FALSE, append=TRUE)
endTime = Sys.time()
writeTime_DBIFunc_MariaDBIConn = (endTime - startTime)


# Write MySQL - MySQL Func / MySQL Conn
startTime = Sys.time()
RMySQL::dbWriteTable(mysql_mysqlConn, "TestTable2", rowsMySQL1000, row.names=FALSE, append=TRUE)
endTime = Sys.time()
writeTime_MySQLFunc_MySQLConn = (endTime - startTime)

# Write MySQL - MySQL Func / DBI Conn
startTime = Sys.time()
RMySQL::dbWriteTable(mysql_dbiConn, "TestTable2", rowsMySQL1000, row.names=FALSE, append=TRUE)
endTime = Sys.time()
writeTime_MySQLFunc_MySQLDBIConn = (endTime - startTime)

# Write MySQL - DBI Func / MySQL Conn
startTime = Sys.time()
DBI::dbWriteTable(mysql_mysqlConn, "TestTable2", rowsMySQL1000, row.names=FALSE, append=TRUE)
endTime = Sys.time()
writeTime_DBIFunc_MySQLConn = (endTime - startTime)

# Write MySQL - DBI Func / DBI Conn
startTime = Sys.time()
DBI::dbWriteTable(mysql_dbiConn, "TestTable2", rowsMySQL1000, row.names=FALSE, append=TRUE)
endTime = Sys.time()
writeTime_DBIFunc_MySQLDBIConn = (endTime - startTime)

print(paste("queryTime_MariaFunc_MariaConn", as.numeric(queryTime_MariaFunc_MariaConn, units="secs")))
print(paste("queryTime_MariaFunc_DBIConn", as.numeric(queryTime_MariaFunc_MariaDBIConn, units="secs")))
print(paste("queryTime_DBIFunc_MariaConn", as.numeric(queryTime_DBIFunc_MariaConn, units="secs")))
print(paste("queryTime_DBIFunc_MariaDBIConn", as.numeric(queryTime_DBIFunc_MariaDBIConn, units="secs")))

print(paste("queryTime_MySQLConn", as.numeric(queryTime_MySQLConn, units="secs")))
print(paste("queryTime_MySQLDBIConn", as.numeric(queryTime_MySQLDBIConn, units="secs")))


print(paste("writeTime_MariaFunc_MariaConn", as.numeric(writeTime_MariaFunc_MariaConn, units="secs")))
print(paste("writeTime_MariaFunc_MariaDBIConn", as.numeric(writeTime_MariaFunc_MariaDBIConn, units="secs")))
print(paste("writeTime_DBIFunc_MariaConn", as.numeric(writeTime_DBIFunc_MariaConn, units="secs")))
print(paste("writeTime_DBIFunc_MariaDBIConn", as.numeric(writeTime_DBIFunc_MariaDBIConn, units="secs")))

print(paste("writeTime_MySQLFunc_MySQLConn", as.numeric(writeTime_MySQLFunc_MySQLConn, units="secs")))
print(paste("writeTime_MySQLFunc_MySQLDBIConn", as.numeric(writeTime_MySQLFunc_MySQLDBIConn, units="secs")))
print(paste("writeTime_DBIFunc_MySQLConn", as.numeric(writeTime_DBIFunc_MySQLConn, units="secs")))
print(paste("writeTime_DBIFunc_MySQLDBIConn", as.numeric(writeTime_DBIFunc_MySQLDBIConn, units="secs")))


print(paste("RMariaDB", packageVersion("RMariaDB")))
print(paste("RMariaDB", packageVersion("RMySQL")))
print(paste("R.Version Platform", R.Version()$platform))
print(paste("R.Version String", R.Version()$version.string))

Amazon Linux Output

RMySQL queries are 1.2x to 3x faster than RMariaDB
RMySQL writes are 10x to 25x faster than RMariaDB

[1] "queryTime_MariaFunc_MariaConn 0.721820831298828"
[1] "queryTime_MariaFunc_DBIConn 0.30603814125061"
[1] "queryTime_DBIFunc_MariaConn 0.382649660110474"
[1] "queryTime_DBIFunc_MariaDBIConn 0.300202369689941"
[1] "queryTime_MySQLConn 0.241434335708618"
[1] "queryTime_MySQLDBIConn 0.263653039932251"
[1] "writeTime_MariaFunc_MariaConn 1.173743724823"
[1] "writeTime_MariaFunc_MariaDBIConn 1.25841975212097"
[1] "writeTime_DBIFunc_MariaConn 1.1358757019043"
[1] "writeTime_DBIFunc_MariaDBIConn 1.13124299049377"
[1] "writeTime_MySQLFunc_MySQLConn 0.0569257736206055"
[1] "writeTime_MySQLFunc_MySQLDBIConn 0.113566160202026"
[1] "writeTime_DBIFunc_MySQLConn 0.0538115501403809"
[1] "writeTime_DBIFunc_MySQLDBIConn 0.0567092895507812"
[1] "RMariaDB 1.0.6"
[1] "RMariaDB 0.10.15"
[1] "R.Version Platform x86_64-koji-linux-gnu"
[1] "R.Version String R version 3.4.3 (2017-11-30)"
$ yum list | grep maria
mariadb-devel.x86_64                   1:5.5.60-1.amzn2              @amzn2-core
mariadb-libs.x86_64                    1:5.5.60-1.amzn2              installed  
mariadb.x86_64                         1:5.5.60-1.amzn2              amzn2-core 
mariadb-bench.x86_64                   1:5.5.60-1.amzn2              amzn2-core 
mariadb-embedded.x86_64                1:5.5.60-1.amzn2              amzn2-core 
mariadb-embedded-devel.x86_64          1:5.5.60-1.amzn2              amzn2-core 
mariadb-libs.i686                      1:5.5.60-1.amzn2              amzn2-core 
mariadb-server.x86_64                  1:5.5.60-1.amzn2              amzn2-core 
mariadb-test.x86_64                    1:5.5.60-1.amzn2              amzn2-core
$ yum list | grep mysql
apr-util-mysql.x86_64                  1.6.1-5.amzn2.0.2             amzn2-core 
dovecot-mysql.x86_64                   1:2.2.10-8.amzn2.0.2          amzn2-core 
freeradius-mysql.x86_64                3.0.13-9.amzn2                amzn2-core 
libdbi-dbd-mysql.x86_64                0.8.3-16.amzn2.0.1            amzn2-core 
mysql-connector-java.noarch            1:5.1.25-3.amzn2              amzn2-core 
mysql-connector-odbc.x86_64            5.2.5-7.amzn2                 amzn2-core 
pcp-pmda-mysql.x86_64                  3.12.2-5.amzn2                amzn2-core 
php-mysql.x86_64                       5.4.16-43.amzn2               amzn2-core 
php-mysqlnd.x86_64                     5.4.16-45.amzn2.0.6           amzn2-core 
qt-mysql.i686                          1:4.8.5-15.amzn2.0.3          amzn2-core 
qt-mysql.x86_64                        1:4.8.5-15.amzn2.0.3          amzn2-core 
qt5-qtbase-mysql.i686                  5.9.2-3.amzn2.0.1             amzn2-core 
qt5-qtbase-mysql.x86_64                5.9.2-3.amzn2.0.1             amzn2-core 
redland-mysql.x86_64                   1.0.16-6.amzn2.0.1            amzn2-core 
rsyslog-mysql.x86_64                   8.24.0-16.amzn2.6.1           amzn2-core

OSX Output

RMySQL queries are similar to RMariaDB
RMySQL writes are 100x to 200x faster than RMariaDB

[1] "queryTime_MariaFunc_MariaConn 26.3695220947266"
[1] "queryTime_MariaFunc_DBIConn 47.9590289592743"
[1] "queryTime_DBIFunc_MariaConn 13.6953809261322"
[1] "queryTime_DBIFunc_MariaDBIConn 10.8443579673767"
[1] "queryTime_MySQLConn 16.7168970108032"
[1] "queryTime_MySQLDBIConn 29.732666015625"
[1] "writeTime_MariaFunc_MariaConn 106.188654899597"
[1] "writeTime_MariaFunc_MariaDBIConn 103.038119077682"
[1] "writeTime_DBIFunc_MariaConn 100.028841018677"
[1] "writeTime_DBIFunc_MariaDBIConn 98.1016211509705"
[1] "writeTime_MySQLFunc_MySQLConn 1.18589186668396"
[1] "writeTime_MySQLFunc_MySQLDBIConn 0.954946041107178"
[1] "writeTime_DBIFunc_MySQLConn 0.477998971939087"
[1] "writeTime_DBIFunc_MySQLDBIConn 0.491441965103149"
[1] "RMariaDB 1.0.6"
[1] "RMariaDB 0.10.16"
[1] "R.Version Platform x86_64-apple-darwin15.6.0"
[1] "R.Version String R version 3.5.0 (2018-04-23)"
-> brew list --versions | grep mysql
mysql-connector-c 6.1.11
-> brew list --versions | grep mariadb
mariadb-connector-c 3.0.8
-> pip freeze | grep mysql
mysql-connector-python==2.0.4
-> pip freeze | grep mariadb
@anguspmitchell anguspmitchell changed the title dbWriteTable() slow on Mac, much slower than RMySQL dbWriteTable() much slower than RMySQL, especially slow on Mac Feb 10, 2019
@krlmlr
Copy link
Member

@krlmlr krlmlr commented Mar 3, 2019

Thanks for the detailed benchmark. It strikes me odd that writing times are much much lower for RMySQL than its reading times. This can happen if data is written asynchronously.

@jeroen: Any ideas?

@ben519
Copy link

@ben519 ben519 commented Apr 26, 2019

Sounds similar to #104

@rspreafico-vir
Copy link

@rspreafico-vir rspreafico-vir commented May 24, 2019

Confirming huge performance difference between RMySQL and RMariaDB

@ben519
Copy link

@ben519 ben519 commented May 24, 2019

@rspreafico-vir Can you elaborate on your setup? Where is your database located (e.g. Google Cloud)? Are you on a mac? Thx

@rspreafico-vir
Copy link

@rspreafico-vir rspreafico-vir commented May 24, 2019

Yes, I am connecting from a Mac to an AWS RDS hosted MySQL database. C connector installed. Used RMariaDB to write ~3 million rows, never terminated. In a for loop, ~1,000 rows at a time, would take 2 min for each cycle. With RMySQL, in 5 min all 3 million rows had been written.

@antoine-sachet
Copy link

@antoine-sachet antoine-sachet commented Oct 9, 2019

Could be linked to the generation of the SQL strings rather than really execution time. Creating large strings can be expensive. The dbQuoteXXX functions might also play a role.

@nbarsch

This comment has been minimized.

@myoung3
Copy link

@myoung3 myoung3 commented Oct 20, 2020

fyi I've found it's much faster to write a file to disk and then use dbWriteTable on the file, rather than using a data.frame as the input to dbWriteTable. I believe this has something to do with chunking: dbWriteTable will append rows one line at a time when using a data.frame input, but will batch add lines when using a file input. I could be wrong though--it's been over a year since I looked into this.

@krlmlr krlmlr modified the milestones: 1.1.0, 1.2.0 Jan 5, 2021
@krlmlr krlmlr modified the milestones: 1.2.0, 1.1.2 Aug 24, 2021
@krlmlr
Copy link
Member

@krlmlr krlmlr commented Sep 2, 2021

Planning to release an update that uses LOAD DATA LOCAL INFILE very soon. Closing this issue in favor of #11.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants