Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Character encoding in dbWriteTable in R #57

Closed
jghigginson78 opened this issue Mar 13, 2015 · 9 comments
Closed

Character encoding in dbWriteTable in R #57

jghigginson78 opened this issue Mar 13, 2015 · 9 comments
Milestone

Comments

@jghigginson78
Copy link

I'm failing to get correct (Japanese) characters when writing a dataframe to MySQL using dbWriteTable. For example if I read the following lines from a CSV file into R, the characters display correctly, but when written to a MySQL table they appear as series of ???. The text lines..

"00043","201411","こんにゃく","おかずその他","普及",2,2
"00043","201411","こんにゃく","おかずその他","通年",1,1
"00043","201411","こんにゃく","その他こんにゃく","-",1,1
"00043","201411","こんにゃく","つき白滝","中質",4,3
"00043","201411","こんにゃく","つき白滝","普及",4,1

The R code..

library(RMySQL)

DF <- read.csv('fpo.csv', stringsAsFactors=FALSE)
dbase <- 'wnp_2_3_new'
con <- dbConnect(MySQL(), user='root', password=pw, dbname=dbase)
rs <- dbGetQuery(con, 'set character set utf8')
dbWriteTable(con, 'fpo_out', overwrite=TRUE, DF)
dbDisconnect(con)
@hadley
Copy link
Member

hadley commented Mar 13, 2015

Rather than using a csv, could you please include dput(DF)? And I'll also need the results of Sys.getlocale()

@hadley
Copy link
Member

hadley commented Mar 13, 2015

This works for me:

con <- dbConnect(MySQL())

df <- data.frame(x = 1, y = "佃煮惣菜")
dbWriteTable(con, 'test-utf8', df, temporary = TRUE)
dbReadTable(con, 'test-utf8')

dbDisconnect(con)

@hadley
Copy link
Member

hadley commented Mar 13, 2015

(I see the warning too - it's on my list to fix) Are you on linux?

@vaidyasm
Copy link

Hello Hadley,
I'm also having issue with inserting unicode characters.

Here are the outputs from some of your commands mentioned above:

> Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

> x <- "佃煮惣菜"
> x
[1] "佃煮惣菜"
> dput(x)
"<U+4F43><U+716E><U+60E3><U+83DC>"
> Encoding(x)
[1] "UTF-8"


> con <- dbConnect(MySQL())

> df <- data.frame(x = 1, y = "佃煮惣菜")
> dput(df)
structure(list(x = 1, y = structure(1L, .Label = "<U+4F43><U+716E><U+60E3><U+83DC>", class = "factor")), .Names = c("x", 
"y"), row.names = c(NA, -1L), class = "data.frame") 


> dbWriteTable(con, 'test_utf8', df, temporary = FALSE)
> dbReadTable(con, 'test_utf8')
  x                                y
1 1 <U+4F43><U+716E><U+60E3><U+83DC>

Mysql Workbench also reports

<U+4F43><U+716E><U+60E3><U+83DC>

for y.

SHOW CREATE TABLE test_utf8

results in

CREATE TABLE `test_utf8` (
    `row_names` TEXT NULL COLLATE 'utf8_unicode_ci',
    `x` DOUBLE NULL DEFAULT NULL,
    `y` TEXT NULL COLLATE 'utf8_unicode_ci'
)
COLLATE='utf8_unicode_ci'
ENGINE=InnoDB
;
DBI::dbGetQuery(con, "show variables like 'character_set_%'")

results in

             Variable_name                                        Value
1     character_set_client                                         utf8
2 character_set_connection                                         utf8
3   character_set_database                                         utf8
4 character_set_filesystem                                       binary
5    character_set_results                                         utf8
6     character_set_server                                         utf8
7     character_set_system                                         utf8
8       character_sets_dir C:\\bin\\mysql-server-5.5\\share\\charsets\\

I'm also having issue retrieving unicode characters.
On the table test_utf8 table, I inserted a row with row.names = 2, x = 2, and y = "OôEé" via Mysql Workbench.

> dbReadTable(con, 'test_utf8')
  x                                y
1 1 <U+4F43><U+716E><U+60E3><U+83DC>
2 2                           OôEé

In my.ini, I have

character-set-server=utf8
collation=utf8_unicode_ci
skip-character-set-client-handshake

Could you help me with pin-pointing the issue?

Thanks in advance.

@krlmlr
Copy link
Member

krlmlr commented Nov 11, 2015

Related: #93.

@valentas-kurauskas
Copy link

I also thought I had this issue, but it turned out this was because my MySQL database had the default latin1 charset. The following query fixes this:

ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_general_ci;

@sanjmeh
Copy link

sanjmeh commented Jun 13, 2017

@valentas-kurauskas : I am going to run the above command on my database (of course taking a backup of it before that). Will it solve the problem of R unable to write to the MySQL DB using INSERT - VALUES. This happens when R outputs a character vector with single quotes (sQoute)?

@sanjmeh
Copy link

sanjmeh commented Jun 13, 2017

Nopes. MySQL database is already on utf8_general_ci;
How to make sure R outputs the strings into the same character set. Here's a sample code that is rejected in MySQL on Ubuntu (running on AWS) with a character set problem. I strongly suspect it is not liking the single quotes used by R.

sql_text<-paste("INSERT (CLASS, DATE) VALUE (10", sQuote(now()); n<-dbExecute(db,sql_text)
The error I get is:

Error in .local(conn, statement, ...) : could not run statement: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '��2017-06-13 04:28:10’

@krlmlr
Copy link
Member

krlmlr commented Jul 5, 2017

Moved to rstats-db/RMariaDB (see link above).

@krlmlr krlmlr closed this as completed Jul 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants