
Cannot insert strings with a length greater than 2000 into columns with a datatype of varchar(max) or nvarchar(max) using parametrised queries #835

Closed
nlykkei opened this issue Oct 14, 2020 · 36 comments


@nlykkei

nlykkei commented Oct 14, 2020

Environment

  • Python: 3.8.5
  • pyodbc: 4.0.30
  • OS: macOS 10.15.7
  • DB: MS SQL Server 2017 and 2019
  • driver: microsoft/mssql-release/msodbcsql17: stable 17.6.1.1

Issue

I cannot insert strings with a length greater than 2000 into columns with a datatype of varchar(max) or nvarchar(max) using parametrised queries.

In particular, the POC (see below) fails with the following output:

started
non-param: len(content)=2001: success
Traceback (most recent call last):
  File "test.py", line 54, in <module>
    cursor.execute(sql, (content,))
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Cannot convert to text/ntext or collate to 'Latin1_General_100_CI_AS_SC' because these legacy LOB types do not support UTF-8 or UTF-16 encodings. Use types varchar(max), nvarchar(max) or a collation which does not have the _SC or _UTF8 flags. (4189) (SQLParamData)")

Every parametrised insert of a string longer than 2000 characters fails with the above error message about a text/ntext conversion or the Latin1_General_100_CI_AS_SC collation, which is strange considering that only plain ASCII is being inserted.

How can I resolve this issue and insert strings of any length into the database?

POC:

import pyodbc

host = 'tcp:127.0.0.1,1433'
db = 'pyodbc_test'
user = 'sa'
pwd = 'P@ssw0rd'

print('started')

cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+host+';DATABASE='+'master'+';UID='+user+';PWD='+ pwd, autocommit=True)
cursor = cnxn.cursor()

try:
    cursor.execute(f'CREATE DATABASE {db} COLLATE Latin1_General_100_CI_AS_SC')
except pyodbc.ProgrammingError as e:
    pass # database exists

cursor.execute(f'USE {db}')

try:
    cursor.execute("""
        CREATE TABLE msg (
            id int identity(1,1) not null,
            content varchar(max) not null
        );""")
except pyodbc.ProgrammingError as exc:
    pass # table exists

content = 2001 * 'A'

cursor.execute(f"""
    INSERT INTO msg (content)
    VALUES ('{content}')""")
print(f'non-param: {len(content)=}: success')

# fails (pyodbc 4.0.30)
sql = f"""
      INSERT INTO msg (content)
      VALUES (?)"""
cursor.execute(sql, (content,))
print(f'param: {len(content)=}: success')

print('finished')
@v-chojas
Contributor

I suspect this is due to how pyODBC binds these long parameters using a SQL type of SQL_LONGVARCHAR or SQL_WLONGVARCHAR, which translates to text/ntext in the ODBC Driver for SQL Server. Because it is driver-agnostic, pyODBC does not have special handling for varchar(max). You can try using setinputsizes on your parameter to tell it to send as varchar(max): cursor.setinputsizes([(pyodbc.SQL_WVARCHAR,0,0),])

Does it work if you use a non-SC collation?

You can post an ODBC trace for further analysis.

@nlykkei
Author

nlykkei commented Oct 14, 2020 via email

@v-chojas
Contributor

https://github.com/mkleehammer/pyodbc/wiki/Troubleshooting-%E2%80%93-Generating-an-ODBC-trace-log

If you cannot use setinputsizes directly, then you can ask your ORM vendor about adding support for varchar(max): https://github.com/sqlalchemy/sqlalchemy/issues

@nlykkei
Author

nlykkei commented Oct 15, 2020

Can you elaborate on the semantics of cursor.setinputsizes([(pyodbc.SQL_WVARCHAR,0,0),])? In particular, what do the two 0s represent? Does it implement a true varchar(max) datatype allowing up to 2 GB of data, or what is the maximum allowed size?

I can confirm that removing _SC from the database collation fixed the problem, but that's not an option for me, since I need to support Unicode supplementary characters (e.g. smileys):

cursor.execute(f'CREATE DATABASE {db} COLLATE Latin1_General_100_CI_AS') results in the (partial) output: param: len(content)=2001: success in the POC above.

Furthermore using setinputsizes fixed the problem:

cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+host+';DATABASE='+'master'+';UID='+user+';PWD='+ pwd, autocommit=True)
cursor = cnxn.cursor()
cursor.setinputsizes([(pyodbc.SQL_WVARCHAR,0,0),])

results in the output:

started
non-param: len(content)=2001: success
param: len(content)=2001: success
finished

ODBC trace:

I've attached two ODBC trace logs: with/without cursor.setinputsizes([(pyodbc.SQL_WVARCHAR,0,0),])

odbctrace.txt
odbctrace_setinputsizes.txt

$ odbcinst -j
unixODBC 2.3.9
DRIVERS............: /usr/local/etc/odbcinst.ini
SYSTEM DATA SOURCES: /usr/local/etc/odbc.ini
FILE DATA SOURCES..: /usr/local/etc/ODBCDataSources
USER DATA SOURCES..: /Users/nlykkei/.odbc.ini
SQLULEN Size.......: 8
SQLLEN Size........: 8
SQLSETPOSIROW Size.: 8

@v-chojas
Contributor

That confirms my suspicions: pyODBC is using SQL_WLONGVARCHAR (-10), which the driver maps to ntext, and using setinputsizes to specify a column size of 0 correctly causes the driver to use nvarchar(max) instead.

Note that according to https://docs.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-ver15#Supplementary_Characters the use of _SC collations only affects string operations in the DB itself, which suggests that for simply storing and retrieving the data, _SC won't show any difference even with high Unicode characters.

@nlykkei
Author

nlykkei commented Oct 15, 2020

Why are we using cursor.setinputsizes([(pyodbc.SQL_WVARCHAR,0,0),]) instead of cursor.setinputsizes([(pyodbc.SQL_WLONGVARCHAR,0,0),]) then?

I don't understand the following: why does e.g. Azure Data Studio display the column datatype in the database as varchar(max), when apparently the driver maps the parameter to ntext, or when using setinputsizes(), it maps it to nvarchar(max)? For a SQL novice, it looks very messed up.

Just to be sure, it's the interaction between pyODBC and the driver that causes the trouble, not the SQL Server database backend, right?

I previously experimented with non-_SC in my application, and it caused errors.

Is the conclusion that SQLAlchemy should fix this, or is it pyODBC's responsibility? Currently, SQLAlchemy says that it's pyODBC's responsibility, see my issue here: https://groups.google.com/g/sqlalchemy/c/Kk6DkPNWlR4

@v-chojas
Contributor

SQL_WLONGVARCHAR will cause the driver to send ntext. SQL_WVARCHAR is required to send nvarchar.
The database column is truly varchar(max).

Is the conclusion that SQLAlchemy should fix this, or is it pyODBC's responsibility?

SQLAlchemy. pyODBC is generic and does not know about special handling for varchar(max), whereas SQLAlchemy appears to have code for specific database types. It needs to call setinputsizes as you described when the length is more than the maximum for non-max types (2K wide characters or 4K bytes).

(I cannot see that link, it requires login.)
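The rule described above can be sketched as a small helper that builds a setinputsizes() spec per parameter (a minimal illustration, not SQLAlchemy's actual fix; the constant mirrors pyodbc.SQL_WVARCHAR, and the 2000-character threshold is taken from this thread):

```python
SQL_WVARCHAR = -9           # same value pyodbc exposes as pyodbc.SQL_WVARCHAR
NVARCHAR_MAX_NONMAX = 2000  # strings longer than this need nvarchar(max)

def inputsizes_for(params):
    """Build a cursor.setinputsizes() spec: long strings get
    (SQL_WVARCHAR, 0, 0) so the driver binds them as nvarchar(max)
    instead of ntext; None leaves pyodbc's default binding alone."""
    return [
        (SQL_WVARCHAR, 0, 0)
        if isinstance(p, str) and len(p) > NVARCHAR_MAX_NONMAX
        else None
        for p in params
    ]
```

A caller would then run cursor.setinputsizes(inputsizes_for(params)) before cursor.execute(sql, params).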

@nlykkei
Author

nlykkei commented Oct 16, 2020

Isn't it a problem that nvarchar is sent when the column is varchar(max)? AFAIK, varchar is for non-Unicode data, whereas nvarchar is for Unicode data.

_SC collations only affects string operations in the DB

These string operations include comparisons etc., so if I want to filter on a column containing supplementary characters, do I need the _SC collation? Or do string operations only refer to built-in functions?

@v-chojas
Contributor

The server can convert between nvarchar and varchar.

@gordthompson
Collaborator

Taking pyodbc out of the mix for a moment, this VBA code does not throw the error:

Sub gh_sqla_5651()
    Dim con As New ADODB.Connection
    con.Open _
        "DRIVER=ODBC Driver 17 for SQL Server;" & _
        "SERVER=(local)\SQLEXPRESS;" & _
        "DATABASE=master;" & _
        "Trusted_Connection=Yes;" & _
        "UseFMTONLY=Yes;"
        
    Const db_name = "gh_sqla_5651"
    Const table_name = "table1"
    
    Dim cmd As New ADODB.Command
    cmd.ActiveConnection = con
    cmd.CommandText = "DROP DATABASE IF EXISTS " & db_name
    cmd.Execute
    cmd.CommandText = "CREATE DATABASE " & db_name & " COLLATE Latin1_General_100_CI_AI_SC"
    cmd.Execute
    cmd.CommandText = "USE " & db_name
    cmd.Execute
    cmd.CommandText = "CREATE TABLE " & table_name & "(id int PRIMARY KEY, txt nvarchar(max))"
    cmd.Execute
    cmd.CommandText = "INSERT INTO " & table_name & " (id, txt) VALUES (?, ?)"
    cmd.Parameters.Append cmd.CreateParameter("?", adInteger, adParamInput, , 1)
    cmd.Parameters.Append cmd.CreateParameter("?", adLongVarWChar, adParamInput, 2 ^ 31 - 1, String(2001, "x"))
    cmd.Execute
End Sub

SQL Profiler shows the difference is that ADO produces this

exec sp_executesql N'INSERT INTO table1 (id, txt) VALUES (@P1, @P2)',N'@P1 int,@P2 ntext',1,N'xxx ... xxx'

while pyodbc produces this

declare @p1 int
set @p1=1
exec sp_prepare @p1 output,N'@P1 int,@P2 ntext',N'INSERT INTO table1 (id, txt) VALUES (@P1, @P2)',1
select @p1

exec sp_execute 1,1,N'xxx ... xxx'

exec sp_unprepare 1

In both cases the string parameter is declared as ntext. What's puzzling is that ADO calls SQLExecDirectW whereas pyodbc does an explicit SQLPrepareW even though this is a single .execute. In both cases the parameter is bound as SQL_WLONGVARCHAR with one subtle difference:

ADO:

dde             10b0-1060	EXIT  SQLBindParameter  with return code 0 (SQL_SUCCESS)
        HSTMT               0x080E79B0
        UWORD                        2 
        SWORD                        1 <SQL_PARAM_INPUT>
        SWORD                       -8 <SQL_C_WCHAR>
        SWORD                      -10 <SQL_WLONGVARCHAR>
        SQLULEN             2147483647
        SWORD                        0 
        PTR                0x12FFD068
        SQLLEN                  4004
        SQLLEN *            0x12EDC2C4 (4002)

pyodbc:

gh_sqla_5649    2060-1c5c	EXIT  SQLBindParameter  with return code 0 (SQL_SUCCESS)
        HSTMT               0x00000059159E3B80
        UWORD                        2 
        SWORD                        1 <SQL_PARAM_INPUT>
        SWORD                       -8 <SQL_C_WCHAR>
        SWORD                      -10 <SQL_WLONGVARCHAR>
        SQLULEN                 4002
        SWORD                        0 
        PTR                0x00000059124F9FD8
        SQLLEN                     8
        SQLLEN *            0x00000059124FA000 (-4102)

ado.LOG
pyodbc.LOG

@gordthompson
Collaborator

gordthompson commented Oct 17, 2020

Also no error with this C# code:

using System;
using System.Data.Odbc;

namespace odbcConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            var connectionString =
                  "DRIVER=ODBC Driver 17 for SQL Server;"
                + "SERVER=(local)\\SQLEXPRESS;"
                + "DATABASE=master;"
                + "Trusted_Connection=Yes;"
                + "UseFMTONLY=Yes;";
            var db_name = "gh_sqla_5651";
            var table_name = "table1";
            using (var con = new OdbcConnection(connectionString))
            {
                con.Open();
                using (var cmd = new OdbcCommand())
                {
                    cmd.Connection = con;
                    cmd.CommandText = $"DROP DATABASE IF EXISTS {db_name}";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"CREATE DATABASE {db_name} COLLATE Latin1_General_100_CI_AI_SC";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"USE {db_name}";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"CREATE TABLE {table_name} (id int PRIMARY KEY, txt nvarchar(max))";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"INSERT INTO {table_name} (id, txt) VALUES (?, ?)";
                    cmd.Parameters.Add("?", OdbcType.Int).Value = 1;
                    cmd.Parameters.Add("?", OdbcType.NText).Value = new String('x', 2001);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}

From SQL Profiler's perspective it is the same as ADO:

exec sp_executesql N'INSERT INTO table1 (id, txt) VALUES (@P1, @P2)',N'@P1 int,@P2 ntext',1,N'xxx ... xxx'

and the ODBC trace shows

odbcConsoleApp  269c-2a94	EXIT  SQLBindParameter  with return code 0 (SQL_SUCCESS)
        HSTMT               0x057808C8
        UWORD                        2 
        SWORD                        1 <SQL_PARAM_INPUT>
        SWORD                       -8 <SQL_C_WCHAR>
        SWORD                      -10 <SQL_WLONGVARCHAR>
        SQLULEN                 2001
        SWORD                        0 
        PTR                0x00CF5E1C
        SQLLEN                  4004
        SQLLEN *            0x00CF5E18 (4002)

csharp.LOG

@gordthompson
Collaborator

@v-chojas - For completeness I added an explicit cmd.Prepare(); to my C# code, expecting it to fail, but it didn't.

using System;
using System.Data.Odbc;

namespace odbcConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(String.Format("{0}-bit", IntPtr.Size * 8));
            var connectionString =
                  "DRIVER=ODBC Driver 17 for SQL Server;"
                + "SERVER=(local)\\SQLEXPRESS;"
                + "DATABASE=master;"
                + "Trusted_Connection=Yes;"
                + "UseFMTONLY=Yes;";
            var db_name = "gh_sqla_5651";
            var table_name = "table1";
            using (var con = new OdbcConnection(connectionString))
            {
                con.Open();
                using (var cmd = new OdbcCommand())
                {
                    cmd.Connection = con;

                    cmd.CommandText = $"DROP DATABASE IF EXISTS {db_name}";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"CREATE DATABASE {db_name} COLLATE Latin1_General_100_CI_AI_SC";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"USE {db_name}";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"CREATE TABLE {table_name} (id int PRIMARY KEY, txt nvarchar(max))";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"INSERT INTO {table_name} (id, txt) VALUES (?, ?)";
                    cmd.Prepare();  // new
                    cmd.Parameters.Add("?", OdbcType.Int).Value = 1;
                    cmd.Parameters.Add("?", OdbcType.NText).Value = new String('x', 2001);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}

csharp_prepare.LOG shows that it is calling SQLPrepareW

odbcConsoleApp  1aec-184	EXIT  SQLPrepareW  with return code 0 (SQL_SUCCESS)
        HSTMT               0x000000554BB97A90
        WCHAR *             0x0000005533096254 [      -3] "INSERT INTO table1 (id, txt) VALUES (?, ?)\ 0"
        SDWORD                    -3

and binding the parameter in the expected way

odbcConsoleApp  1aec-184	EXIT  SQLBindParameter  with return code 0 (SQL_SUCCESS)
        HSTMT               0x000000554BB97A90
        UWORD                        2 
        SWORD                        1 <SQL_PARAM_INPUT>
        SWORD                       -8 <SQL_C_WCHAR>
        SWORD                      -10 <SQL_WLONGVARCHAR>
        SQLULEN                 2001
        SWORD                        0 
        PTR                0x000000554C51B648
        SQLLEN                  4004
        SQLLEN *            0x000000554C51B640 (4002)

but SQL Profiler shows sp_prepexec being called

declare @p1 int
set @p1=1
exec sp_prepexec @p1 output,N'@P1 int,@P2 ntext',N'INSERT INTO table1 (id, txt) VALUES (@P1, @P2)',1,N'xxx ... xxx'
select @p1

Is there a clue in the ODBC trace that might explain why pyodbc and System.Data.Odbc can both call SQLPrepareW and get different results?

@gordthompson
Collaborator

@v-chojas - The test code also completes successfully with PDO_ODBC under PHP:

<?php
header('Content-Type: text/html; charset=utf-8');
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>PDO example</title>
</head>
<body>
<?php
echo (8 * PHP_INT_SIZE) . "-bit<br/>";
$connStr = 
        'odbc:' .
        'DRIVER=ODBC Driver 17 for SQL Server;' .
        'SERVER=.\\SQLEXPRESS;' .
        'DATABASE=master;' .
        'Trusted_Connection=yes;' .
        'UseFMTONLY=yes';
$dbh = new PDO($connStr);
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db_name = "gh_sqla_5651";
$table_name = "table1";

$sth = $dbh->prepare("DROP DATABASE IF EXISTS " . $db_name);
$sth->execute();
$sth = $dbh->prepare("CREATE DATABASE " . $db_name . " COLLATE Latin1_General_100_CI_AI_SC");
$sth->execute();
$sth = $dbh->prepare("USE " . $db_name);
$sth->execute();
$sth = $dbh->prepare("CREATE TABLE " . $table_name . "(id int PRIMARY KEY, txt nvarchar(max))");
$sth->execute();
$sth = $dbh->prepare("INSERT INTO " . $table_name . " (id, txt) VALUES (?, ?)");
$id = 1;
$sth->bindParam(1, $id, PDO::PARAM_INT);
$txt = str_repeat("x", 2001);
$sth->bindParam(2, $txt, PDO::PARAM_STR);
$sth->execute();
?>
</body>
</html>

SQL Profiler shows

set fmtonly on select id,txt from table1 where 1=2 set fmtonly off

declare @p1 int
set @p1=5
exec sp_prepare @p1 output,N'@P1 int,@P2 nvarchar(max)',N'INSERT INTO table1 (id, txt) VALUES (@P1, @P2)',1
select @p1

exec sp_execute 5,1,N'xxx ... xxx'

exec sp_unprepare 5

pyodbc is definitely the outlier here.

@v-chojas
Contributor

The negative indicator (-4102) shows that pyODBC is attempting to use DAE: SQL_LEN_DATA_AT_EXEC(4002) is -100 - 4002 = -4102. A comment in the code agrees, and while msodbcsql doesn't require the use of DAE for the long types, other drivers may.

@gordthompson
Collaborator

gordthompson commented Oct 20, 2020

@v-chojas - Thanks for the explanation. So if you could add an avoid_dae_on_execute flag to pyodbc, e.g.,

cnxn = pyodbc.connect(connection_string, avoid_dae_on_execute=True)

then I can tweak SQLAlchemy's mssql+pyodbc dialect to add that to the pyodbc.connect() call. As @zzzeek mentions here, a workaround for SQLAlchemy using setinputsizes is simply not practical.

mkleehammer added a commit that referenced this issue Oct 25, 2020
@mkleehammer
Owner

The first thing we need is a failing unit test. I've added varchar(max) and nvarchar(max) tests to tests3/sqlservertests.py. They pass on Windows. I don't have my new Mac set up for SQL Server just yet. Can someone confirm these tests fail on a Mac?

Part of the setup in cnxninfo.cpp is to find the longest supported varchar and wvarchar lengths using SQLGetTypeInfo. IIRC, anything larger than this is supposed to require DAE. I need to find the docs for this and add more comments.

@mkleehammer
Owner

I increased the varchar(max) and nvarchar(max) tests to use the large sizes (up to 20K) and connected to my Windows SQL Server test machine from the Mac test machine using the MS SQL Server ODBC Driver v17. No problems.

Here's my system info.

(venv39) [1] Mac ~/dev/pyodbc/tests3 $ python sqlservertests.py -v                                              [master]
/Users/mkleehammer/dev/pyodbc/tests3 --> /Users/mkleehammer/dev/pyodbc
CS: DSN=mssql;UID=test;PWD=test
python:  3.9.0 (v3.9.0:9cf6752276, Oct  5 2020, 11:29:23)
[Clang 6.0 (clang-600.0.57)]
pyodbc:  4.0.30 /Users/mkleehammer/dev/pyodbc/build/lib.macosx-10.9-x86_64-3.9/pyodbc.cpython-39-darwin.so
odbc:    03.52
driver:  libmsodbcsql.17.dylib 17.06.0001
         supports ODBC version 03.52
os:      Darwin
unicode: Py_Unicode=4 SQLWCHAR=2
Max VARCHAR = 8000
Max WVARCHAR = 4000
Max BINARY = 8000

Also note the Max items at the end. These are the maximum supported sizes according to the driver. Anything above these sizes should (and does) use DAE. Is this not what we expect?

Do these tests fail on anyone else's machine? I must be missing something.

@mkleehammer
Owner

The example test program also works for me. Here is the trace of the two inserts.

		Exit:[SQL_SUCCESS]
[ODBC][35846][1603670293.046335][SQLExecDirectW.c][177]
		Entry:
			Statement = 0x7fab84062200
			SQL = [
    INSERT INTO msg (content)
    VALUES ('AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...][length = 2047]
[ODBC][35846][1603670293.051688][SQLExecDirectW.c][456]
		Exit:[SQL_SUCCESS]
[ODBC][35846][1603670293.051984][SQLRowCount.c][173]
		Entry:
			Statement = 0x7fab84062200
			Row Count = 0x7ffeec186c90
[ODBC][35846][1603670293.052464][SQLRowCount.c][247]
		Exit:[SQL_SUCCESS]
			Row Count = 0x7ffeec186c90 -> 1 (64 bits)
[ODBC][35846][1603670293.052716][SQLNumResultCols.c][156]
		Entry:
			Statement = 0x7fab84062200
			Column Count = 0x7ffeec186c80
[ODBC][35846][1603670293.053143][SQLNumResultCols.c][251]
		Exit:[SQL_SUCCESS]
			Count = 0x7ffeec186c80 -> 0
[ODBC][35846][1603670293.053452][SQLFreeStmt.c][144]
		Entry:
			Statement = 0x7fab84062200
			Option = 0
[ODBC][35846][1603670293.053667][SQLFreeStmt.c][266]
		Exit:[SQL_SUCCESS]
[ODBC][35846][1603670293.053957][SQLPrepareW.c][165]
		Entry:
			Statement = 0x7fab84062200
			SQL = [
      INSERT INTO msg (content)
      VALUES (?)][length = 49]
[ODBC][35846][1603670293.054156][SQLPrepareW.c][346]
		Exit:[SQL_SUCCESS]
[ODBC][35846][1603670293.054608][SQLNumParams.c][144]
		Entry:
			Statement = 0x7fab84062200
			Param Count = 0x7ffeec186be6
[ODBC][35846][1603670293.054771][SQLNumParams.c][231]
		Exit:[SQL_SUCCESS]
			Count = 0x7ffeec186be6 -> 1
[ODBC][35846][1603670293.055090][SQLBindParameter.c][217]
		Entry:
			Statement = 0x7fab84062200
			Param Number = 1
			Param Type = 1
			C Type = -8 SQL_C_WCHAR
			SQL Type = -10 
			Col Def = 4002
			Scale = 0
			Rgb Value = 0x7fab83a30ae0
			Value Max = 8
			StrLen Or Ind = 0x7fab83a30b08
[ODBC][35846][1603670293.055303][SQLBindParameter.c][434]
		Exit:[SQL_SUCCESS]
[ODBC][35846][1603670293.055486][SQLExecute.c][187]
		Entry:
			Statement = 0x7fab84062200
[ODBC][35846][1603670293.059333][SQLExecute.c][357]
		Exit:[SQL_NEED_DATA]
[ODBC][35846][1603670293.059531][SQLParamData.c][166]
		Entry:
			Statement = 0x7fab84062200
			Value = 0x7ffeec186c80
[ODBC][35846][1603670293.059890][SQLParamData.c][362]
		Exit:[SQL_NEED_DATA]
			Value = 0x7fab83a30ae0
[ODBC][35846][1603670293.060159][SQLPutData.c][148]
		Entry:
			Statement = 0x7fab84062200
			Data = 0x7fab84075220
			StrLen = 4000
[ODBC][35846][1603670293.060487][SQLPutData.c][315]
		Exit:[SQL_SUCCESS]
[ODBC][35846][1603670293.060646][SQLPutData.c][148]
		Entry:
			Statement = 0x7fab84062200
			Data = 0x7fab840761c0
			StrLen = 2
[ODBC][35846][1603670293.060799][SQLPutData.c][315]
		Exit:[SQL_SUCCESS]
[ODBC][35846][1603670293.061167][SQLParamData.c][166]
		Entry:
			Statement = 0x7fab84062200
			Value = 0x7ffeec186c80
[ODBC][35846][1603670293.066789][SQLParamData.c][362]
		Exit:[SQL_SUCCESS]
			Value = 0x7fab83a30ae0

@mkleehammer
Owner

mkleehammer commented Oct 26, 2020

This seems important. From the OP's ODBC trace log.

This explains why it worked on my machine: I used my existing test database and missed the COLLATE Latin1_General_100_CI_AS_SC when creating the database.

[ODBC][37][1602743956.220306][SQLParamData.c][362]
		Exit:[SQL_ERROR]
			Value = 0x564861448eb0
		DIAG [42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Cannot convert to
		text/ntext or collate to 'Latin1_General_100_CI_AS_SC' because these legacy LOB types
		do not support UTF-8 or UTF-16 encodings. Use types varchar(max), nvarchar(max) or a
		collation which does not have the _SC or _UTF8 flags.



[ODBC][37][1602743956.220352][SQLGetDiagRecW.c][535]
		Entry:
			Statement = 0x5648613e86a0
			Rec Number = 1
			SQLState = 0x7ffcb3ba7fe4
			Native = 0x7ffcb3ba7fc4
			Message Text = 0x7ffcb3ba7ff0
			Buffer Length = 1023
			Text Len Ptr = 0x7ffcb3ba7fc2
[ODBC][37][1602743956.220375][SQLGetDiagRecW.c][596]
		Exit:[SQL_SUCCESS]
			SQLState = [42000]
			Native = 0x7ffcb3ba7fc4 -> 4189
			Message Text = [[Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Cannot
			               convert to text/ntext or collate to 'Latin1_General_100_CI_AS_SC' b]

Can anyone explain? The recent SQL Server Unicode changes are new to me.

https://feedback.azure.com/forums/908035-sql-server/suggestions/39319090-utf-8-in-sql-2019-binary-collation-latin1-gener

@zzzeek

zzzeek commented Oct 26, 2020

@mkleehammer thanks for looking into this. not much I can add, folks here seem to know SQL server much better than I do.

@gordthompson
Collaborator

@mkleehammer (cc: @v-chojas )

This seems important.

Yes, the _SC collation as the database default is the key. No problems with a database default of Latin1_General_100_CI_AS (without the _SC) and then creating a table with

CREATE TABLE t1 (id int primary key, txt nvarchar(max) COLLATE Latin1_General_100_CI_AS_SC)

so we can't even use that to create a failing test. We need to do

CREATE DATABASE foo COLLATE Latin1_General_100_CI_AS_SC
USE foo
CREATE TABLE t1 (id int primary key, txt nvarchar(max))

The reason why System.Data.Odbc, ADO, and PDO_ODBC all work seems to be that they don't use DAE. As a test I tried changing

if (maxlength == 0 || cb <= maxlength || isTVP)

to

bool avoid_dae_on_execute = true;  // POC for #835

if (maxlength == 0 || cb <= maxlength || isTVP || avoid_dae_on_execute)

but that failed with "Invalid Precision value (0)".

Given that Microsoft shut down that feedback.azure.com thread pretty darn quick, the situation doesn't look too promising.

My gut feeling is that msodbcsql should just stop sending the parameter values as text/ntext and start sending them as varchar(max)/nvarchar(max). According to that Azure thread, text/ntext have been deprecated since SQL Server 2005.

However, msodbcsql is a total black box. AFAIK there isn't even a forum for suggestions or an issue tracker. The closest I've found is https://github.com/microsoft/msphpsql but that is for the PHP layer over msodbcsql, not msodbcsql itself.

@v-chojas
Contributor

My gut feeling is that msodbcsql should just stop sending the parameter values as text/ntext and start sending them as varchar(max)/nvarchar(max).

For backwards compatibility, that will not change. If you don't want text/ntext then bind using a SQL type of SQL_VARCHAR/SQL_WVARCHAR, not the LONG variations (which map to text/ntext), and specify a "column length" of 0 which tells it to use the (max) type.

@gordthompson
Collaborator

gordthompson commented Oct 26, 2020

My gut feeling is that msodbcsql should just stop sending the parameter values as text/ntext and start sending them as varchar(max)/nvarchar(max).

For backwards compatibility, that will not change.

It need not be an immediate breaking change. msodbcsql could accept a connection string parameter named "UseNVarcharMax", which would default to "no" and continue the current text/ntext behaviour. UseNVarcharMax=yes would tell the driver to use nvarchar(max) instead of ntext and varchar(max) instead of text.

Precedent:

UseFMTONLY=yes was introduced in msodbcsql 17.0. It uses a legacy feature whose documentation says

(!) Note

Do not use this feature. This feature has been replaced by the following items:
…

If you don't want text/ntext then bind using a SQL type of SQL_VARCHAR/SQL_WVARCHAR, not the LONG variations (which map to text/ntext), and specify a "column length" of 0 which tells it to use the (max) type.

Is setinputsizes the only way to do that? If so, then you're saying that SQLAlchemy's mssql dialect must inspect every column object for every DML operation, see if it is a string, and if so then issue a setinputsizes before executing. That is simply not practical.

@mkleehammer
Owner

I'm sorry I'm just now getting back to this.

I'm wondering if DAE should ever be used. I need to test with some populated DBs, but maybe it should default to off. What does everyone think of investigating that solution?

Also, note that you can quickly test turning off DAE by setting cnxn.maxwrite to something larger than the largest string you'll write (1GB or something). If I remember correctly, that would be the same as @gordthompson's test Boolean, but would let you test something with an existing pyodbc version.

I would guess most drivers really don't care about DAE, so another option would be to default maxwrite to None (0 internally in C) and use that to indicate we never use it, but allow people to set it to a positive int to enable DAE above that value.

Thoughts?
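The in-line-versus-DAE decision being discussed can be pictured with a pure-logic sketch, a paraphrase in Python of the C condition quoted earlier in the thread, `if (maxlength == 0 || cb <= maxlength || isTVP)`; the function name is illustrative:

```python
def uses_dae(data_len, max_inline, is_tvp=False):
    """True when the parameter would be streamed with SQLPutData
    (data-at-execution) rather than bound in-line.  Mirrors the C
    condition quoted earlier: bind in-line when the driver reports
    no limit (maxlength == 0), when the data fits, or for a TVP."""
    bind_inline = max_inline == 0 or data_len <= max_inline or is_tvp
    return not bind_inline
```

Under this reading, raising cnxn.maxwrite above the data length (or treating None/0 as "no limit") suppresses DAE.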

@gordthompson
Collaborator

gordthompson commented Jan 22, 2021

In the original repro code if I add

cnxn.maxwrite = 1_000_000

immediately after the pyodbc.connect() call I get

gh_pyodbc_835   19a4-1ee4	EXIT  SQLBindParameter  with return code -1 (SQL_ERROR)
        HSTMT               0x03D003A8
        UWORD                        1 
        SWORD                        1 <SQL_PARAM_INPUT>
        SWORD                       -8 <SQL_C_WCHAR>
        SWORD                       -9 <SQL_WVARCHAR>
        SQLULEN                 4002
        SWORD                        0 
        PTR                0x00F10E88
        SQLLEN                  4002
        SQLLEN *            0x00EE48C4

        DIAG [HY104] [Microsoft][ODBC Driver 17 for SQL Server]Invalid precision value (0) 

SQL.LOG

(Tested with pyodbc 4.0.31b51.)

@mkleehammer
Owner

It says "precision" and the DecimalDigits parameter to SQLBindParameter is 0 in the log. That must be what it is complaining about. However, the documentation states that DecimalDigits is ignored when inserting character data.

Does it only happen when we cross the 4000 character boundary?

@v-chojas @v-makouz Can either of you provide any insight here?

@v-chojas
Contributor

@mkleehammer Good to see you working on pyODBC again!

When the length is nonzero and the SQL type is not a long type, the driver interprets that as a non-max type, and as we know a non-max nvarchar is restricted to 4000 at most. If you specify a length of 0, the driver interprets that to mean use a max-type.

Relevant documentation here: https://docs.microsoft.com/en-us/sql/relational-databases/native-client/features/using-large-value-types
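The size-versus-type behaviour described above can be summarised as a small mapping (a sketch of the driver behaviour as reported in this thread, not an official msodbcsql API; the type codes are the values pyodbc exposes):

```python
SQL_WVARCHAR, SQL_WLONGVARCHAR = -9, -10  # ODBC SQL type codes

def server_type(sql_type, column_size):
    """SQL Server type the driver sends for a given binding, per the
    behaviour reported in this thread."""
    if sql_type == SQL_WLONGVARCHAR:
        return "ntext"                    # legacy LOB, regardless of size
    if sql_type == SQL_WVARCHAR:
        if column_size == 0:
            return "nvarchar(max)"        # ColumnSize 0 signals the (max) type
        if column_size <= 4000:
            return "nvarchar(%d)" % column_size
        # matches the HY104 "Invalid precision value" error seen earlier
        raise ValueError("Invalid precision value")
    raise ValueError("unhandled SQL type")
```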

@gordthompson
Collaborator

Does it only happen when we cross the 4000 character boundary?

The 4000 byte boundary, yes. With the above cnxn.maxwrite = 1_000_000 modification still in place

content = 2000 * 'A'

succeeds, while

content = 2001 * 'A'

fails.

@gordthompson
Collaborator

I would guess most drivers really don't care about DAE

In an earlier comment @v-chojas said

"while msodbcsql doesn't require the use of DAE for the long types, other drivers may"

@mkleehammer
Owner

Thanks to @gordthompson, we have some data to look at. Here is a summary of his tests.

Test      Pass  ValueType  ParameterType  ColumnSize  DecimalDigits  BufferLength  StrLen
pyodbc    F     wchar      WLONGVARCHAR   4002        0              8             DAE(4002)
ADO       P     wchar      WLONGVARCHAR   2147483647  0              4004          4002
C#        P     wchar      WLONGVARCHAR   2001        0              4004          4002
maxwrite  F     wchar      WVARCHAR       4002        0              4002          ?

This leads me to a few questions:

  • pyodbc is passing 4002 but C# is passing a ColumnSize of 2001 with a long data type and it
    is working. What is the right value for ColumnSize? We might need to fix this even if it
    is unrelated.

  • As seen in the C# example, DAE is not required to send long data. It is passing
    WLONGVARCHAR but not using DAE. Is there a way to know this? I truly doubt most open
    source drivers really care and would happily accept large buffers here.

  • How is the C# version working? I thought the maximum size for nvarchar was 2000.
    Does it work just because it is a max column, or is it because its limit is 4000
    characters instead of 4000 bytes?

  • Does the C# approach work with SQL Server earlier than 2005? The link provided
    says special handling was required before 2005. I assume we still want to support
    SQL Server 2000.

  • I'm not sure what ColumnSize value ADO is passing. Possibly a constant. I thought
    I saw an "unlimited" constant somewhere, but I'm not finding it now.

Here is what doesn't work with SQL Server:

  • Using DAE and a non-zero column size because the long data type forces it to be ntext (pyodbc).

  • Not using DAE or long size but sending a column size over 4K (maxwrite).

I'm still not 100% certain of what does work:

  • Does the C# approach of using long types and passing the actual length work past 4K? If
    so, it isn't a complete solution as we'd still need a threshold.

@v-chojas's comments seem to say that the preferred approach is to use WVARCHAR with a 0
column size.

Will this work with older SQL Server versions like 2000?

A bigger problem is how do we know when to do this? It's not quite just avoiding DAE.
It sounds like we also have to pass a ColumnSize of 0, unless I missed a combination that works.
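For reference, the combinations reported so far in this thread can be condensed into a small decision sketch. The names and result strings are invented; the outcomes come from the tests summarised above:

```python
# Summary of the observed SQLBindParameter combinations for msodbcsql,
# based on the test results reported in this thread.
def binding_outcome(sql_type: str, column_size: int, uses_dae: bool) -> str:
    if sql_type == "WLONGVARCHAR" and uses_dae and column_size > 0:
        # pyodbc today: long type + DAE + nonzero size -> legacy ntext error
        return "fails (legacy ntext path)"
    if sql_type == "WVARCHAR" and column_size > 4000:
        # the maxwrite experiment: non-max nvarchar is capped at 4000
        return "fails (non-max nvarchar too small)"
    if sql_type == "WVARCHAR" and column_size == 0:
        # the suggested approach: ColumnSize 0 means nvarchar(max)
        return "works (binds as nvarchar(max))"
    return "untested"

print(binding_outcome("WLONGVARCHAR", 4002, True))
print(binding_outcome("WVARCHAR", 4002, False))
print(binding_outcome("WVARCHAR", 0, False))
```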

Is it time to introduce a DBMS-type or dialect attribute with all of these settings? Then
we can tweak things for certain databases when necessary? I've tried to avoid it and still
dislike it, but I also hate to have 20 parameters you have to set when connecting just to
get it to work. I always have a single connect function where it is all kept together, so
it isn't that terrible, but it might be confusing for newbies. It might be much easier to
use connect(.., dialect="mssqlserver").

We'd define the generic default and then some DB specific ones like "mssqlserver",
"postgresql", and "mysql".
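A hypothetical sketch of what such a dialect table might look like; every name and setting here is invented for illustration, not a proposed pyodbc API:

```python
# Hypothetical dialect registry: a named bundle of per-DBMS binding
# settings that could be selected at connect time, with a generic default.
DIALECTS = {
    "generic": {"use_dae_for_long": True, "long_column_size": None},
    "mssqlserver": {"use_dae_for_long": False, "long_column_size": 0},
    "postgresql": {"use_dae_for_long": True, "long_column_size": None},
}

def settings_for(dialect: str = "generic") -> dict:
    """Look up a dialect profile, falling back to the generic defaults."""
    return DIALECTS.get(dialect, DIALECTS["generic"])

print(settings_for("mssqlserver")["long_column_size"])  # 0
```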

@zzzeek

zzzeek commented Jan 23, 2021

Is it time to introduce a DBMS-type or dialect attribute with all of these settings? Then
we can tweak things for certain databases when necessary?

IMO this is what pyodbc should do and I believe this mechanism can be used to help with issues like #134 as well (?)

@gordthompson
Collaborator

gordthompson commented Jan 24, 2021

Is it time to introduce a DBMS-type or dialect attribute with all of these settings? Then
we can tweak things for certain databases when necessary? I've tried to avoid it and still
dislike it,

I agree that it's not an ideal solution but it may be the most practical approach, especially since vendors cannot be relied upon to fix defects in their ODBC drivers.

@keitherskine
Collaborator

Bundling a bunch of DBMS-specific pyodbc parameters into a connection "dialect" does seem to be an elegant way of making pyodbc work with a variety of DBMSs that don't comply 100% with the ODBC standard. This dialect approach is somewhat like turbodbc or the built-in Python csv reader object, and works well in my experience. Perhaps pyodbc could even try to detect the DBMS on connection (from SQL_DBMS_NAME and SQL_DBMS_VER?) so clients don't need to specify it at all.

@gordthompson
Collaborator

Perhaps pyodbc could even try to detect the DBMS on connection (from SQL_DBMS_NAME and SQL_DBMS_VER?) so clients don't need to specify it at all.

Good idea, but I would recommend going slow on that. AFAIK there is no standard naming convention for those sorts of attributes, so in doing so we could potentially be opting-in to a perpetual game of catch-up with "the names of things", e.g.,

if type_name == 'msodbcsql':
    return ('msodbcsql' in driver_name) or ('sqlncli' in driver_name) or ('sqlsrv32.dll' == driver_name)

My preferred approach would be to keep the default behaviour and get the (hard-coded) profiles defined first, then try to make it automatic if/when the customers ask for it.
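For completeness, a runnable version of that kind of name matching, using only the driver names cited in this thread (the function name is made up, and any real detection scheme would need ongoing maintenance, which is exactly the concern raised above):

```python
# Name-matching sketch of the "perpetual catch-up" problem: recognising
# Microsoft's SQL Server ODBC drivers from the driver name alone.
def is_ms_sql_server_driver(driver_name: str) -> bool:
    name = driver_name.lower()
    return (
        "msodbcsql" in name        # ODBC Driver 11/13/17/18 for SQL Server
        or "sqlncli" in name       # SQL Server Native Client
        or name == "sqlsrv32.dll"  # legacy "SQL Server" driver
    )

print(is_ms_sql_server_driver("msodbcsql17.dll"))  # True
print(is_ms_sql_server_driver("psqlodbc"))         # False
```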

@gordthompson
Collaborator

My gut feeling is that msodbcsql should just stop sending the parameter values as text/ntext and start sending them as varchar(max)/nvarchar(max).

For backwards compatibility, that will not change.

It need not be an immediate breaking change. msodbcsql could accept a connection string parameter named "UseNVarcharMax", which would default to "no" and continue the current text/ntext behaviour. UseNVarcharMax=yes would tell the driver to use nvarchar(max) instead of ntext and varchar(max) instead of text.

As reported in this comment, ODBC Driver 18 for SQL Server has apparently added just such a parameter, named LongAsMax.
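Assuming the LongAsMax keyword behaves like other boolean connection-string options, opting in would look something like this; the server and credentials are the placeholders from the POC at the top of this issue:

```python
# Hedged sketch: opting in to LongAsMax with ODBC Driver 18. The keyword
# is the one reported above; the rest of the string mirrors the POC.
conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:127.0.0.1,1433;DATABASE=pyodbc_test;"
    "UID=sa;PWD=P@ssw0rd;"
    "LongAsMax=Yes;"
)
print("LongAsMax=Yes;" in conn_str)  # True
```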
