Creating a Unicode where-clause with N-prefix #4442

villebro · 2019-01-13T21:07:41Z

I'm not sure if this is a SQLAlchemy issue or something with the dialect/connector, but is there some way to force the query compiler to prefix a Unicode string with N for e.g. MSSQL? If a column has been defined as UnicodeText (also tried NVARCHAR, same result), one would expect a where clause on that column to be prefixed with an N for MSSQL. However, this doesn't seem to happen automatically, and I haven't been able to find any documentation on how to achieve this. Below is an example to illustrate the problem:

import sqlalchemy as sa
from sqlalchemy.sql import column, table

mssql_engine = sa.create_engine("mssql+pyodbc://XXX")

col = column('unicode_col', type_=sa.types.UnicodeText())
unicode_where_str = '🐍'

select = sa.select([col])
select = select.where(col == unicode_where_str)
select = select.select_from(table('tbl'))
print(select.compile(mssql_engine, compile_kwargs={'literal_binds': True}))

This yields

SELECT unicode_col 
FROM tbl 
WHERE unicode_col = '🐍'

while one would expect the unicode type to put an N-prefix on the where clause, like so:

SELECT unicode_col 
FROM tbl 
WHERE unicode_col = N'🐍'

Is there some way to either automatically get the N-prefix, or manually inject it, e.g. by using TypeDecorator?

zzzeek · 2019-01-13T22:39:12Z

you could use TypeDecorator.bind_expression (https://docs.sqlalchemy.org/en/latest/core/custom_types.html?highlight=bind_expression#sqlalchemy.types.TypeDecorator.bind_expression), however the database driver, e.g. pyodbc or pymssql, does this for you when the statement is executed (edit: zzzeek is stupidly missing that this is with literal binds). What is your actual use case?

zzzeek · 2019-01-13T22:43:05Z

more appropriately it would be literal_processor()

zzzeek · 2019-01-13T23:21:59Z

workaround:

# coding: utf-8

from sqlalchemy import TypeDecorator

import sqlalchemy as sa
from sqlalchemy.sql import column, table

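class MSUnicode(TypeDecorator):
    # Renders literal values of this type as N'...' under literal_binds.
    impl = sa.types.UnicodeText

    def literal_processor(self, dialect):
        def process(value):
            # Escape embedded single quotes for the SQL string literal.
            value = value.replace("'", "''")

            # Double "%" when the dialect's driver treats it as a format character.
            if dialect.identifier_preparer._double_percents:
                value = value.replace("%", "%%")

            return "N'%s'" % value

        return process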


mssql_engine = sa.create_engine("mssql+pyodbc://XXX")

col = column('unicode_col', type_=MSUnicode())
unicode_where_str = '🐍'

select = sa.select([col])
select = select.where(col == unicode_where_str)
select = select.select_from(table('tbl'))
print(select.compile(mssql_engine, compile_kwargs={'literal_binds': True}))

so I may target this at 1.3 or 1.2.x not sure yet
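
For anyone who wants to check the quoting rules without a database or SQLAlchemy installed, the escaping in the workaround above can be sketched as a plain function. This is only an illustration: `render_mssql_unicode_literal` and its `double_percents` flag are hypothetical names standing in for the `process` closure and the dialect's `_double_percents` setting, not part of any library.

```python
def render_mssql_unicode_literal(value, double_percents=False):
    """Render a Python string as an N-prefixed SQL Server string literal.

    Hypothetical helper mirroring the literal_processor workaround.
    """
    # Single quotes inside a SQL string literal are escaped by doubling them.
    escaped = value.replace("'", "''")

    # Some DBAPI drivers treat "%" as a format character, so it is doubled
    # when the statement will pass through such a driver.
    if double_percents:
        escaped = escaped.replace("%", "%%")

    return "N'%s'" % escaped


print(render_mssql_unicode_literal("🐍"))
print(render_mssql_unicode_literal("O'Brien"))
```

Keeping the percent doubling optional follows the workaround, where it depends on the dialect in use.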

sqla-tester · 2019-01-13T23:23:58Z

Mike Bayer has proposed a fix for this issue in the master branch:

Render N'' for SQL Server unicode literals https://gerrit.sqlalchemy.org/1095

villebro · 2019-01-14T05:21:24Z

Btw, Oracle also has the same N-prefix, but based on my limited testing seems to do automatic casting. But you may want to target Oracle with the change, too, as it seems to be the recommended way of handling NCHAR and NVARCHAR on Oracle.

zzzeek · 2019-01-14T16:27:35Z

I think I need to do a unicode literal round trip test to really confirm on all backends

zzzeek · 2019-01-14T16:47:13Z

well I'm unable to show that the "N" makes any difference on Oracle OR SQL Server. can you please illustrate the problem you're having with SQL Server itself as well as what Python driver / database version you are using?

villebro · 2019-01-14T19:44:56Z

I verified this on both MSSQL 2008R2 and 2017. Below I create a table on the 2017 instance (database has SQL_Latin1_General_CP1_CI_AS collation) with an NVARCHAR and VARCHAR column, insert the snake from the previous example (U+1F40D) both with and without N-prefixing into both columns:

The select produces the following:

As can be seen, only the N-prefixed snake that was inserted into the NVARCHAR column turns up in its original shape. Interestingly running a select without N-prefixing is able to find the mutated snake, NOT the N-prefixed snake:

Running the same with N-prefixing turns up the correct snake. I ran the same via SSMS on 2008R2 and also on Superset (Python 3.6.4 with SQLAlchemy==1.2.16, pymssql==2.1.4) with similar results:

The underlying query from Superset rendered by SQLAlchemy:

I read somewhere that MSSQL 2019 is the first version of MSSQL to introduce native UTF8 support (🤣🤣🤣), so the N-prefix might be optional nowadays, but on older versions it seems to be necessary.

zzzeek · 2019-01-14T20:22:16Z

OK my SQL Server here is the linux one, this is the version string: Microsoft SQL Server 2017 (RTM-CU11) (KB4462262) - 14.0.3038.14 (X64) \n\tSep 14 2018 13:53:44 \n\tCopyright (C) 2017 Microsoft Corporation\n\tDeveloper Edition (64-bit) on Linux (CentOS Linux 7 (Core))

The database looks like I created it as: "CREATE DATABASE test;" e.g. i didn't specify any collation.

For these string statements you are making, are you running those with Python and the driver as well? or is this cut-and-paste into a GUI? i wonder if that affects things?

zzzeek · 2019-01-14T20:24:59Z

well let me just try py2k as well to make sure things aren't working smoothly, in trying to get this to fail

zzzeek · 2019-01-14T20:36:43Z

I can't get it to fail, here's more or less the same test, py3k/py2k and pymssql/pyodbc, passes

# coding: utf-8

import pymssql

conn = pymssql.connect(
    user="scott", password="tiger^5HHH", host="mssql2017:1433", database="test"
)


cursor = conn.cursor()

cursor.execute(
    """
CREATE TABLE t (
    id INT,
    x NVARCHAR(255) NULL,
    y VARCHAR(255) NULL
)
"""
)

cursor.execute("DELETE FROM t")
cursor.execute("INSERT INTO t (id, x, y) VALUES (1, N'réveillé', N'réveillé')")
cursor.execute("INSERT INTO t (id, x, y) VALUES (2, 'réveillé', 'réveillé')")

# note in Python 2, these are bytestrings.  pymssql doesn't accept
# u"" on py2k.
for stmt in [
    "SELECT t.* FROM t WHERE t.x = 'réveillé'",
    "SELECT t.* FROM t WHERE t.x = N'réveillé'",
    "SELECT t.* FROM t WHERE t.y = 'réveillé'",
    "SELECT t.* FROM t WHERE t.y = N'réveillé'",
]:
    cursor.execute(stmt)
    rows = cursor.fetchall()
    assert rows == [
        (1, u"réveillé", u"réveillé"),
        (2, u"réveillé", u"réveillé"),
    ], rows

in any case the N certainly isn't making things worse so I'll likely just push it through.

zzzeek · 2019-01-14T20:37:30Z

hey let's try your unicode character and not the ones I have which might overlap with latin1 or something

zzzeek · 2019-01-14T20:40:45Z

ding

zzzeek · 2019-01-14T20:42:38Z

hooray. OK that was it. needed weirder characters. let's see what oracle says.

zzzeek · 2019-01-14T20:43:29Z

yeah oracle doesn't need it.

zzzeek · 2019-01-14T20:49:15Z

and adding this to the tests makes me have to deal with mysql utf8mb4 all over again...
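
As background on why the snake drags MySQL into this: MySQL's legacy utf8 charset stores at most three bytes per character, while U+1F40D needs four bytes in UTF-8, which is what utf8mb4 is for. A small sketch of that check follows; `fits_three_byte_utf8` is a hypothetical helper name, and it only measures byte lengths rather than talking to MySQL.

```python
def fits_three_byte_utf8(text):
    """Return True if every character encodes to at most 3 UTF-8 bytes.

    Hypothetical helper: approximates whether a string fits a charset
    limited to three bytes per character, such as MySQL's legacy utf8.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


print(fits_three_byte_utf8("réveillé"))  # accented Latin letters need 2 bytes each
print(fits_three_byte_utf8("🐍"))        # U+1F40D needs 4 bytes
```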

zzzeek · 2019-01-14T20:54:42Z

this is looking kind of 1.3 ish now. are you worked around for 1.2.x ?

villebro · 2019-01-14T20:55:39Z

Yeah sorry I should have stressed the part about using adequately weird characters, I wasted a good bit of time using accented ascii-compliant letters, too 😃
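
A rough way to see why the accented strings passed while the snake did not: a literal without the N prefix is handled in the database's code page, and SQL_Latin1_General_CP1_CI_AS uses code page 1252, which contains é but not U+1F40D. The sketch below approximates this with Python's cp1252 codec. `survives_code_page` is a hypothetical helper, and the codec is only a stand-in for the server's own conversion, which may differ in detail.

```python
def survives_code_page(text, code_page="cp1252"):
    """Return True if text round-trips through the given code page unchanged.

    Hypothetical helper: a stand-in for what happens to a non-N-prefixed
    literal, not a reproduction of SQL Server's conversion.
    """
    try:
        return text.encode(code_page).decode(code_page) == text
    except UnicodeEncodeError:
        # The code page has no representation for at least one character.
        return False


for candidate in ["réveillé", "🐍"]:
    print(candidate, survives_code_page(candidate))
```

Strings for which this returns True are poor test data for the N prefix, since they come back intact either way.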

villebro · 2019-01-14T21:02:44Z

I agree about this being 1.3 material. I'm thinking we can live without this until 1.3 comes out (eager ones can go with the RC).

villebro · 2019-01-15T17:53:33Z

Thanks @zzzeek for fixing this and loved the snaked-up French in the test cases 😆

zzzeek added the question, SQL Server, and datatypes labels Jan 13, 2019

zzzeek added the bug label and removed the question label Jan 13, 2019

zzzeek added this to the 1.2.x milestone Jan 13, 2019

villebro mentioned this issue Jan 14, 2019

filtering unicode strings using where clause on MSSQL apache/superset#6624

Closed

zzzeek modified the milestones: 1.2.x, 1.3 Jan 14, 2019

sqlalchemy-bot closed this as completed in c0e6ebd Jan 15, 2019

villebro added a commit to villebro/sqlalchemy that referenced this issue Jan 29, 2019

Fix typo: issue is sqlalchemy#4442, not sqlalchemy#4222

ce481b3

villebro mentioned this issue Jan 29, 2019

Fix typo in 1.3 changelog: mssql unicode issue is #4442, not #4222 #4471

Merged

1 task

zzzeek mentioned this issue Mar 21, 2019

MSSQL where clause defaults to N-prefix on py3 #4561

Closed

zzzeek mentioned this issue Feb 7, 2022

coerce str objects to the String datatype, not Unicode #7551

Closed
