-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Storing unicode characters in SQL Server #215
Comments
Thanks. This is likely to be a problem with the odbc package, or the underlying ODBC driver for SQL server. @jimhester: Does this ring a bell? |
I actually found a workaround. The issue is that when R writes to SQL, it writes the strings as UTF-8 and stores them in a VARCHAR(255) column. SQL SERVER needs the characters to be UTF-16LE encoded and stored in an NVARCHAR(255) column. Thus, you need to both:
This seems like something that should still be handled by either DBI or ODBC or something though. require(odbc)
require(DBI)
require(dplyr)
require(stringr)
require(purrr)
# This function takes a string vector and turns it into a list of raw UTF-16LE bytes.
# These will be needed to load into SQL Server
convertToUTF16 <- function(s){
map(s, function(x) iconv(x,from="UTF-8",to="UTF-16LE",toRaw=TRUE) %>% unlist)
}
#creat a connection to a sql table
connectionString <- "DRIVER={ODBC Driver 13 for SQL Server}; SERVER=[Server]; Database=Test; trusted_connection=yes;"
con <- DBI::dbConnect(odbc::odbc(),
.connection_string = connectionString)
# Create the SQL Table. THIS IS REALLY IMPORTANT!
# If the dbWriteTable creates the table, then it will incorrectly
# make the Char column a VARCHAR and not an NVARCHAR
dbExecute(con, "CREATE TABLE UnicodeExample (ID INT, Char NVARCHAR(MAX))")
# our example data
testData <- data_frame(ID = c(1,2,3), Char = c("I", "❤","Apples"))
# we adjust the column with the UTF-8 strings to instead
# be a list column of UTF-16LE bytes
testData <- testData %>% mutate(Char = convertToUTF16(Char))
# write the table to the database
dbWriteTable(con, "UnicodeExample",testData,append=TRUE)
dbDisconnect(con) |
FWIW you can also use the dbWriteTable(con, "UnicodeExample", testData, field.types = c(Char = "NVARCHAR(MAX)")) |
That's good to know, thanks! |
@jimhester @jnolis I used above steps and able to push Japanese data from R/CSV to Microsoft sql DB successfully.
Below is the result of QueryDF2
|
Following
vector_nvarchar<-c(Filter(Negate(is.null),
(
lapply(testData,function(x){
if (is.character(x) ) c(
names(x),
paste0("NVARCHAR(",
max(
# nvarchar(max) gave error dbReadTable/dbGetQuery returns Invalid Descriptor Index error on SQL server
# https://github.com/r-dbi/odbc/issues/112
# so we compute the max
nchar(
iconv( #nchar doesn't work for UTF-8 : help (nchar)
Filter(Negate(is.null),x)
,"UTF-8","ASCII",sub ="x"
)
)
,na.rm = TRUE)
,")"
)
)
})
)
))
con= DBI::dbConnect(odbc::odbc(),.connection_string=xxxxt, encoding = 'UTF-8')
DBI::dbWriteTable(con,"UnicodeExample",testData, overwrite= TRUE, append=FALSE, field.types= vector_nvarchar)
DBI::dbGetQuery(con,iconv('select * from UnicodeExample')) |
I'm trying to write Unicode strings from R to SQL. Unfortunately, the Unicode characters only seem to work when I load the table back into R, and not when I view the table in SSMS or A different tool.
Successfully yields:
However, when I pull that table in SSMS:
I get two different characters:
It looks like the column stored in SQL is a varchar rather than nvarchar, and somehow the binary value of the particular heart character ends up being different than what I get when I insert a row from SSMS. Can the unicode characters be stored in a way that they can be used universally and not just in R?
The text was updated successfully, but these errors were encountered: