Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating a Unicode where-clause with N-prefix #4442

Closed
villebro opened this issue Jan 13, 2019 · 20 comments
Closed

Creating a Unicode where-clause with N-prefix #4442

villebro opened this issue Jan 13, 2019 · 20 comments
Labels
bug Something isn't working datatypes things to do with database types, like VARCHAR and others SQL Server Microsoft SQL Server, e.g. mssql
Milestone

Comments

@villebro
Copy link
Contributor

villebro commented Jan 13, 2019

I'm not sure if this is a SqlAlchemy issue or something with the dialect/connector, but is there some way to force the query compiler to prefix a Unicode string with N for e.g. MSSQL? If a column has been defined as UnicodeText (also tried NVARCHAR, same result), one would expect a where clause on that column to be prefixed with an N for MSSQL. However, this doesn't automatically seem to happen, and I haven't been able to find any documentation on how to achieve this. Below is an example to illustrate the problem:

import sqlalchemy as sa
from sqlalchemy.sql import column, table

mssql_engine = sa.create_engine("mssql+pyodbc://XXX")

col = column('unicode_col', type_=sa.types.UnicodeText())
unicode_where_str = '🐍'

select = sa.select([col])
select = select.where(col == unicode_where_str)
select = select.select_from(table('tbl'))
print(select.compile(mssql_engine, compile_kwargs={'literal_binds': True}))

This yields

SELECT unicode_col 
FROM tbl 
WHERE unicode_col = '🐍'

while one would expect the unicode type to put an N-prefix on the where clause, like so:

SELECT unicode_col 
FROM tbl 
WHERE unicode_col = N'🐍'

Is there some way to either automatically get the N-prefix, or manually inject it, e.g. by using TypeDecorator?

@zzzeek
Copy link
Member

zzzeek commented Jan 13, 2019

you could use TypeDecorator.bind_expression (https://docs.sqlalchemy.org/en/latest/core/custom_types.html?highlight=bind_expression#sqlalchemy.types.TypeDecorator.bind_expression), however the database driver, e.g. pyodbc or pymssql, does this for you when the statement is executed (edit: zzzeek is stupidly missing that this is with literal binds). What is your actual use case?

@zzzeek zzzeek added question issue where a "fix" on the SQLAlchemy side is unlikely, hence more of a usage question SQL Server Microsoft SQL Server, e.g. mssql datatypes things to do with database types, like VARCHAR and others labels Jan 13, 2019
@zzzeek
Copy link
Member

zzzeek commented Jan 13, 2019

more appropriately it would be literal_processor()

@zzzeek zzzeek added bug Something isn't working and removed question issue where a "fix" on the SQLAlchemy side is unlikely, hence more of a usage question labels Jan 13, 2019
@zzzeek zzzeek added this to the 1.2.x milestone Jan 13, 2019
@zzzeek
Copy link
Member

zzzeek commented Jan 13, 2019

workaround:

# coding: utf-8

from sqlalchemy import TypeDecorator

import sqlalchemy as sa
from sqlalchemy.sql import column, table

class MSUnicode(TypeDecorator):
    impl = sa.types.UnicodeText

    def literal_processor(self, dialect):
        def process(value):

            value = value.replace("'", "''")

            if dialect.identifier_preparer._double_percents:
                value = value.replace("%", "%%")

            return "N'%s'" % value

        return process


mssql_engine = sa.create_engine("mssql+pyodbc://XXX")

col = column('unicode_col', type_=MSUnicode())
unicode_where_str = '🐍'

select = sa.select([col])
select = select.where(col == unicode_where_str)
select = select.select_from(table('tbl'))
print(select.compile(mssql_engine, compile_kwargs={'literal_binds': True}))

so I may target this at 1.3 or 1.2.x not sure yet

@sqla-tester
Copy link
Collaborator

Mike Bayer has proposed a fix for this issue in the master branch:

Render N'' for SQL Server unicode literals https://gerrit.sqlalchemy.org/1095

@villebro
Copy link
Contributor Author

villebro commented Jan 14, 2019

Btw, Oracle also has the same N-prefix, but based on my limited testing seems to do automatic casting. But you may want to target Oracle with the change, too, as it seems to be the recommended way of handling NCHAR and NVARCHAR on Oracle.

@zzzeek
Copy link
Member

zzzeek commented Jan 14, 2019

I think I need to doa unicode literal round trip test to really confirm on all backends

@zzzeek
Copy link
Member

zzzeek commented Jan 14, 2019

well Im unable to show that the "N" makes any difference on Oracle OR SQL server. can you please illustrate the problem you're having with SQL Server itself as well as what Python driver / database version you are using ?

@villebro
Copy link
Contributor Author

I verified this on both MSSQL 2008R2 and 2017. Below I create a table on the 2017 instance (database has SQL_Latin1_General_CP1_CI_AS collation) with an NVARCHAR and VARCHAR column, insert the snake from the previous example (U+1F40D) both with and without N-prefixing into both columns:

screenshot 2019-01-14 at 21 06 45

The select produces the following:

screenshot 2019-01-14 at 21 06 52

As can be seen, only the N-prefixed snake that was inserted into the NVARCHAR column turns up in its original shape. Interestingly running a select without N-prefixing is able to find the mutated snake, NOT the N-prefixed snake:

screenshot 2019-01-14 at 21 41 40

screenshot 2019-01-14 at 21 41 47

Running the same with N-prefixing turns up the correct snake. I ran the same via SSMS on 2008R2 and also on Superset (Python 3.6.4 with SQLAlchemy==1.2.16, pymssql==2.1.4) with similar results:

screenshot 2019-01-14 at 21 32 06

The underlying query from Superset rendered by SQLAlchemy:
screenshot 2019-01-14 at 21 32 23

I read somewhere that MSSQL 2019 is the first version of MSSQL to introduce native UTF8 support (🤣🤣🤣), so the N-prefix might be optional nowadays, but on older versions it seems to be necessary.

@zzzeek
Copy link
Member

zzzeek commented Jan 14, 2019

OK my SQL Server here is the linux one, this is the version string: Microsoft SQL Server 2017 (RTM-CU11) (KB4462262) - 14.0.3038.14 (X64) \n\tSep 14 2018 13:53:44 \n\tCopyright (C) 2017 Microsoft Corporation\n\tDeveloper Edition (64-bit) on Linux (CentOS Linux 7 (Core))

The database looks like I created it as: "CREATE DATABASE test;" e.g. i didn't specify any collation.

For these string statements you are making, are you running those in with python and the driver as well? or is this cut-and-paste into a GUI ? i wonder if that affects things ?

@zzzeek
Copy link
Member

zzzeek commented Jan 14, 2019

well let me just try py2k as well to make sure things aren't working smoothly, in trying to get this to fail

@zzzeek
Copy link
Member

zzzeek commented Jan 14, 2019

I can't get it to fail, here's more or less the same test, py3k/py2k and pymssql/pyodbc, passes

#! coding: utf-8

import pymssql

conn = pymssql.connect(
    user="scott", password="tiger^5HHH", host="mssql2017:1433", database="test"
)


cursor = conn.cursor()

cursor.execute(
    """
CREATE TABLE t (
    id INT,
    x NVARCHAR(255) NULL,
    y VARCHAR(255) NULL
)
"""
)

cursor.execute("DELETE FROM t")
cursor.execute("INSERT INTO t (id, x, y) VALUES (1, N'réveillé', N'réveillé')")
cursor.execute("INSERT INTO t (id, x, y) VALUES (2, 'réveillé', 'réveillé')")

# note in Python 2, these are bytestrings.  pymssql doesn't accept
# u"" on py2k.
for stmt in [
    "SELECT t.* FROM t WHERE t.x = 'réveillé'",
    "SELECT t.* FROM t WHERE t.x = N'réveillé'",
    "SELECT t.* FROM t WHERE t.y = 'réveillé'",
    "SELECT t.* FROM t WHERE t.y = N'réveillé'",
]:
    cursor.execute(stmt)
    rows = cursor.fetchall()
    assert rows == [
        (1, u"réveillé", u"réveillé"),
        (2, u"réveillé", u"réveillé"),
    ], rows

in any case the N certainly isn't making things worse so I'll likely just push it through.

@zzzeek
Copy link
Member

zzzeek commented Jan 14, 2019

hey let's try your unicode character and not the ones I have which might overlap with latin1 or something

@zzzeek
Copy link
Member

zzzeek commented Jan 14, 2019

ding

@zzzeek
Copy link
Member

zzzeek commented Jan 14, 2019

hooray. OK that was it. needed weirder characters. lets see what oracle says.

@zzzeek
Copy link
Member

zzzeek commented Jan 14, 2019

yeah oracle doesn't need it.

@zzzeek
Copy link
Member

zzzeek commented Jan 14, 2019

and adding this to the tests makes me have to deal with mysql utfmb4 all over again...

@zzzeek
Copy link
Member

zzzeek commented Jan 14, 2019

this is looking kind of 1.3 ish now. are you worked around for 1.2.x ?

@zzzeek zzzeek modified the milestones: 1.2.x, 1.3 Jan 14, 2019
@villebro
Copy link
Contributor Author

Yeah sorry I should have stressed the part about using adequately weird characters, I wasted a good bit of time using accented ascii-compliant letters, too 😃

@villebro
Copy link
Contributor Author

I agree about this being 1.3 material. I'm thinking we can live without this until 1.3 comes out (eager ones can go with the RC).

@villebro
Copy link
Contributor Author

Thanks @zzzeek for fixing this and loved the snaked-up French in the test cases 😆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working datatypes things to do with database types, like VARCHAR and others SQL Server Microsoft SQL Server, e.g. mssql
Projects
None yet
Development

No branches or pull requests

3 participants