
Cannot insert strings with a length greater than 2000 into columns with a datatype of varchar(max) or nvarchar(max) using parametrised queries #835

Closed
nlykkei opened this issue Oct 14, 2020 · 36 comments


@nlykkei

nlykkei commented Oct 14, 2020

Environment

  • Python: 3.8.5
  • pyodbc: 4.0.30
  • OS: macOS 10.15.7
  • DB: MS SQL Server 2017 and 2019
  • driver: microsoft/mssql-release/msodbcsql17: stable 17.6.1.1

Issue

I cannot insert strings with a length greater than 2000 into columns with a datatype of varchar(max) or nvarchar(max) using parametrised queries.

In particular, the POC (see below) fails with the following output:

started
non-param: len(content)=2001: success
Traceback (most recent call last):
  File "test.py", line 54, in <module>
    cursor.execute(sql, (content,))
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Cannot convert to text/ntext or collate to 'Latin1_General_100_CI_AS_SC' because these legacy LOB types do not support UTF-8 or UTF-16 encodings. Use types varchar(max), nvarchar(max) or a collation which does not have the _SC or _UTF8 flags. (4189) (SQLParamData)")

Every parametrised insert of a string longer than 2000 characters fails with the above error message about a text/ntext conversion or the Latin1_General_100_CI_AS_SC collation, which is strange considering that only plain ASCII is being inserted.

How can I resolve this issue and insert strings of any length into the database?

POC:

import pyodbc

host = 'tcp:127.0.0.1,1433'
db = 'pyodbc_test'
user = 'sa'
pwd = 'P@ssw0rd'

print('started')

cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+host+';DATABASE='+'master'+';UID='+user+';PWD='+ pwd, autocommit=True)
cursor = cnxn.cursor()

try:
    cursor.execute(f'CREATE DATABASE {db} COLLATE Latin1_General_100_CI_AS_SC')
except pyodbc.ProgrammingError as e:
    pass # database exists

cursor.execute(f'USE {db}')

try:
    cursor.execute("""
        CREATE TABLE msg (
            id int identity(1,1) not null,
            content varchar(max) not null
        );""")
except pyodbc.ProgrammingError as exc:
    pass # table exists

content = 2001 * 'A'

cursor.execute(f"""
    INSERT INTO msg (content)
    VALUES ('{content}')""")
print(f'non-param: {len(content)=}: success')

# fails (pyodbc 4.0.30)
sql = f"""
      INSERT INTO msg (content)
      VALUES (?)"""
cursor.execute(sql, (content,))
print(f'param: {len(content)=}: success')

print('finished')
@v-chojas
Contributor

I suspect this is due to how pyODBC binds these long parameters using a SQL type of SQL_LONGVARCHAR or SQL_WLONGVARCHAR, which translates to text/ntext in the ODBC Driver for SQL Server. Because it is driver-agnostic, pyODBC does not have special handling for varchar(max). You can try using setinputsizes on your parameter to tell it to send as varchar(max): cursor.setinputsizes([(pyodbc.SQL_WVARCHAR,0,0),])

Does it work if you use a non-SC collation?

You can post an ODBC trace for further analysis.

@nlykkei
Author

nlykkei commented Oct 14, 2020 via email

@v-chojas
Contributor

https://github.com/mkleehammer/pyodbc/wiki/Troubleshooting-%E2%80%93-Generating-an-ODBC-trace-log

If you cannot use setinputsizes directly, then you can ask your ORM vendor about adding support for varchar(max): https://github.com/sqlalchemy/sqlalchemy/issues

@nlykkei
Author

nlykkei commented Oct 15, 2020

Can you elaborate on the semantics of cursor.setinputsizes([(pyodbc.SQL_WVARCHAR,0,0),])? In particular, what do the two 0s represent? Does it implement a true varchar(max) datatype allowing up to 2 GB of data, or what is the maximum allowed size?

I can confirm that removing _SC from the database collation fixed the problem, but that's not an option for me, since I need to support Unicode supplementary characters (e.g. smileys):

cursor.execute(f'CREATE DATABASE {db} COLLATE Latin1_General_100_CI_AS') results in the (partial) output: param: len(content)=2001: success in the POC above.

Furthermore using setinputsizes fixed the problem:

cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+host+';DATABASE='+'master'+';UID='+user+';PWD='+ pwd, autocommit=True)
cursor = cnxn.cursor()
cursor.setinputsizes([(pyodbc.SQL_WVARCHAR,0,0),])

results in the output:

started
non-param: len(content)=2001: success
param: len(content)=2001: success
finished

ODBC trace:

I've attached two ODBC trace logs: with/without cursor.setinputsizes([(pyodbc.SQL_WVARCHAR,0,0),])

odbctrace.txt
odbctrace_setinputsizes.txt

$ odbcinst -j
unixODBC 2.3.9
DRIVERS............: /usr/local/etc/odbcinst.ini
SYSTEM DATA SOURCES: /usr/local/etc/odbc.ini
FILE DATA SOURCES..: /usr/local/etc/ODBCDataSources
USER DATA SOURCES..: /Users/nlykkei/.odbc.ini
SQLULEN Size.......: 8
SQLLEN Size........: 8
SQLSETPOSIROW Size.: 8

@v-chojas
Contributor

That confirms my suspicions: pyODBC is using SQL_WLONGVARCHAR (-10), which the driver maps to ntext, and using setinputsizes to specify a column size of 0 correctly causes the driver to use nvarchar(max) instead.

Note that according to https://docs.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-ver15#Supplementary_Characters the use of _SC collations only affects string operations in the DB itself, which suggests that for simply storing and retrieving the data, _SC won't show any difference even with high Unicode characters.

@nlykkei
Author

nlykkei commented Oct 15, 2020

Why are we using cursor.setinputsizes([(pyodbc.SQL_WVARCHAR,0,0),]) instead of cursor.setinputsizes([(pyodbc.SQL_WLONGVARCHAR,0,0),]) then?

I don't understand the following: why does e.g. Azure Data Studio display the column datatype in the database as varchar(max), when apparently the driver maps the parameter to ntext, or when using setinputsizes(), it maps it to nvarchar(max)? For a SQL novice, it looks very messed up.

Just to be sure, it's the interaction between pyODBC and the driver that causes the trouble, not the SQL Server database backend, right?

I previously experimented with non-_SC in my application, and it caused errors.

Is the conclusion that SQLAlchemy should fix this, or is it pyODBC's responsibility? Currently, SQLAlchemy says that it's pyODBC's responsibility, see my issue here: https://groups.google.com/g/sqlalchemy/c/Kk6DkPNWlR4

@v-chojas
Contributor

SQL_WLONGVARCHAR will cause the driver to send ntext. SQL_WVARCHAR is required to send nvarchar.
The database column is truly varchar(max).

Is the conclusion that SQLAlchemy should fix this, or is it pyODBC's responsibility?

SQLAlchemy. pyODBC is generic and does not know about special handling for varchar(max), whereas SQLAlchemy appears to have code for specific database types. It needs to call setinputsizes as you described when the length is more than the maximum for non-max types (2K wide characters or 4K bytes).

(I cannot see that link, it requires login.)
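The rule described above can be sketched as a small helper that builds a setinputsizes() spec per parameter (a minimal illustration, not SQLAlchemy's actual fix; the constant mirrors pyodbc.SQL_WVARCHAR, and the 2000-character threshold is taken from this thread):

```python
SQL_WVARCHAR = -9           # same value pyodbc exposes as pyodbc.SQL_WVARCHAR
NVARCHAR_MAX_NONMAX = 2000  # strings longer than this need nvarchar(max)

def inputsizes_for(params):
    """Build a cursor.setinputsizes() spec: long strings get
    (SQL_WVARCHAR, 0, 0) so the driver binds them as nvarchar(max)
    instead of ntext; None leaves pyodbc's default binding alone."""
    return [
        (SQL_WVARCHAR, 0, 0)
        if isinstance(p, str) and len(p) > NVARCHAR_MAX_NONMAX
        else None
        for p in params
    ]
```

A caller would then run cursor.setinputsizes(inputsizes_for(params)) before cursor.execute(sql, params).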

@nlykkei
Author

nlykkei commented Oct 16, 2020

Isn't it a problem that nvarchar is sent when the column is varchar(max)? AFAIK, varchar is for non-Unicode data, whereas nvarchar is for Unicode data.

_SC collations only affects string operations in the DB

These string operations include comparisons etc., so if I want to filter on a column containing supplementary characters, do I need the _SC collation? Or do string operations only refer to built-in functions?

@v-chojas
Contributor

The server can convert between nvarchar and varchar.

@gordthompson
Collaborator

Taking pyodbc out of the mix for a moment, this VBA code does not throw the error:

Sub gh_sqla_5651()
    Dim con As New ADODB.Connection
    con.Open _
        "DRIVER=ODBC Driver 17 for SQL Server;" & _
        "SERVER=(local)\SQLEXPRESS;" & _
        "DATABASE=master;" & _
        "Trusted_Connection=Yes;" & _
        "UseFMTONLY=Yes;"
        
    Const db_name = "gh_sqla_5651"
    Const table_name = "table1"
    
    Dim cmd As New ADODB.Command
    cmd.ActiveConnection = con
    cmd.CommandText = "DROP DATABASE IF EXISTS " & db_name
    cmd.Execute
    cmd.CommandText = "CREATE DATABASE " & db_name & " COLLATE Latin1_General_100_CI_AI_SC"
    cmd.Execute
    cmd.CommandText = "USE " & db_name
    cmd.Execute
    cmd.CommandText = "CREATE TABLE " & table_name & "(id int PRIMARY KEY, txt nvarchar(max))"
    cmd.Execute
    cmd.CommandText = "INSERT INTO " & table_name & " (id, txt) VALUES (?, ?)"
    cmd.Parameters.Append cmd.CreateParameter("?", adInteger, adParamInput, , 1)
    cmd.Parameters.Append cmd.CreateParameter("?", adLongVarWChar, adParamInput, 2 ^ 31 - 1, String(2001, "x"))
    cmd.Execute
End Sub

SQL Profiler shows the difference is that ADO produces this

exec sp_executesql N'INSERT INTO table1 (id, txt) VALUES (@P1, @P2)',N'@P1 int,@P2 ntext',1,N'xxx ... xxx'

while pyodbc produces this

declare @p1 int
set @p1=1
exec sp_prepare @p1 output,N'@P1 int,@P2 ntext',N'INSERT INTO table1 (id, txt) VALUES (@P1, @P2)',1
select @p1

exec sp_execute 1,1,N'xxx ... xxx'

exec sp_unprepare 1

In both cases the string parameter is declared as ntext. What's puzzling is that ADO calls SQLExecDirectW whereas pyodbc does an explicit SQLPrepareW even though this is a single .execute. In both cases the parameter is bound as SQL_WLONGVARCHAR with one subtle difference:

ADO:

dde             10b0-1060	EXIT  SQLBindParameter  with return code 0 (SQL_SUCCESS)
        HSTMT               0x080E79B0
        UWORD                        2 
        SWORD                        1 <SQL_PARAM_INPUT>
        SWORD                       -8 <SQL_C_WCHAR>
        SWORD                      -10 <SQL_WLONGVARCHAR>
        SQLULEN             2147483647
        SWORD                        0 
        PTR                0x12FFD068
        SQLLEN                  4004
        SQLLEN *            0x12EDC2C4 (4002)

pyodbc:

gh_sqla_5649    2060-1c5c	EXIT  SQLBindParameter  with return code 0 (SQL_SUCCESS)
        HSTMT               0x00000059159E3B80
        UWORD                        2 
        SWORD                        1 <SQL_PARAM_INPUT>
        SWORD                       -8 <SQL_C_WCHAR>
        SWORD                      -10 <SQL_WLONGVARCHAR>
        SQLULEN                 4002
        SWORD                        0 
        PTR                0x00000059124F9FD8
        SQLLEN                     8
        SQLLEN *            0x00000059124FA000 (-4102)

ado.LOG
pyodbc.LOG

@gordthompson
Collaborator

gordthompson commented Oct 17, 2020

Also no error with this C# code:

using System;
using System.Data.Odbc;

namespace odbcConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            var connectionString =
                  "DRIVER=ODBC Driver 17 for SQL Server;"
                + "SERVER=(local)\\SQLEXPRESS;"
                + "DATABASE=master;"
                + "Trusted_Connection=Yes;"
                + "UseFMTONLY=Yes;";
            var db_name = "gh_sqla_5651";
            var table_name = "table1";
            using (var con = new OdbcConnection(connectionString))
            {
                con.Open();
                using (var cmd = new OdbcCommand())
                {
                    cmd.Connection = con;
                    cmd.CommandText = $"DROP DATABASE IF EXISTS {db_name}";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"CREATE DATABASE {db_name} COLLATE Latin1_General_100_CI_AI_SC";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"USE {db_name}";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"CREATE TABLE {table_name} (id int PRIMARY KEY, txt nvarchar(max))";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"INSERT INTO {table_name} (id, txt) VALUES (?, ?)";
                    cmd.Parameters.Add("?", OdbcType.Int).Value = 1;
                    cmd.Parameters.Add("?", OdbcType.NText).Value = new String('x', 2001);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}

From SQL Profiler's perspective it is the same as ADO:

exec sp_executesql N'INSERT INTO table1 (id, txt) VALUES (@P1, @P2)',N'@P1 int,@P2 ntext',1,N'xxx ... xxx'

and the ODBC trace shows

odbcConsoleApp  269c-2a94	EXIT  SQLBindParameter  with return code 0 (SQL_SUCCESS)
        HSTMT               0x057808C8
        UWORD                        2 
        SWORD                        1 <SQL_PARAM_INPUT>
        SWORD                       -8 <SQL_C_WCHAR>
        SWORD                      -10 <SQL_WLONGVARCHAR>
        SQLULEN                 2001
        SWORD                        0 
        PTR                0x00CF5E1C
        SQLLEN                  4004
        SQLLEN *            0x00CF5E18 (4002)

csharp.LOG

@gordthompson
Collaborator

@v-chojas - For completeness I added an explicit cmd.Prepare(); to my C# code, expecting it to fail, but it didn't.

using System;
using System.Data.Odbc;

namespace odbcConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(String.Format("{0}-bit", IntPtr.Size * 8));
            var connectionString =
                  "DRIVER=ODBC Driver 17 for SQL Server;"
                + "SERVER=(local)\\SQLEXPRESS;"
                + "DATABASE=master;"
                + "Trusted_Connection=Yes;"
                + "UseFMTONLY=Yes;";
            var db_name = "gh_sqla_5651";
            var table_name = "table1";
            using (var con = new OdbcConnection(connectionString))
            {
                con.Open();
                using (var cmd = new OdbcCommand())
                {
                    cmd.Connection = con;

                    cmd.CommandText = $"DROP DATABASE IF EXISTS {db_name}";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"CREATE DATABASE {db_name} COLLATE Latin1_General_100_CI_AI_SC";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"USE {db_name}";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"CREATE TABLE {table_name} (id int PRIMARY KEY, txt nvarchar(max))";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"INSERT INTO {table_name} (id, txt) VALUES (?, ?)";
                    cmd.Prepare();  // new
                    cmd.Parameters.Add("?", OdbcType.Int).Value = 1;
                    cmd.Parameters.Add("?", OdbcType.NText).Value = new String('x', 2001);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}

csharp_prepare.LOG shows that it is calling SQLPrepareW

odbcConsoleApp  1aec-184	EXIT  SQLPrepareW  with return code 0 (SQL_SUCCESS)
        HSTMT               0x000000554BB97A90
        WCHAR *             0x0000005533096254 [      -3] "INSERT INTO table1 (id, txt) VALUES (?, ?)\ 0"
        SDWORD                    -3

and binding the parameter in the expected way

odbcConsoleApp  1aec-184	EXIT  SQLBindParameter  with return code 0 (SQL_SUCCESS)
        HSTMT               0x000000554BB97A90
        UWORD                        2 
        SWORD                        1 <SQL_PARAM_INPUT>
        SWORD                       -8 <SQL_C_WCHAR>
        SWORD                      -10 <SQL_WLONGVARCHAR>
        SQLULEN                 2001
        SWORD                        0 
        PTR                0x000000554C51B648
        SQLLEN                  4004
        SQLLEN *            0x000000554C51B640 (4002)

but SQL Profiler shows sp_prepexec being called

declare @p1 int
set @p1=1
exec sp_prepexec @p1 output,N'@P1 int,@P2 ntext',N'INSERT INTO table1 (id, txt) VALUES (@P1, @P2)',1,N'xxx ... xxx'
select @p1

Is there a clue in the ODBC trace that might explain why pyodbc and System.Data.Odbc can both call SQLPrepareW and get different results?

@gordthompson
Collaborator

@v-chojas - The test code also completes successfully with PDO_ODBC under PHP:

<?php
header('Content-Type: text/html; charset=utf-8');
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>PDO example</title>
</head>
<body>
<?php
echo (8 * PHP_INT_SIZE) . "-bit<br/>";
$connStr = 
        'odbc:' .
        'DRIVER=ODBC Driver 17 for SQL Server;' .
        'SERVER=.\\SQLEXPRESS;' .
        'DATABASE=master;' .
        'Trusted_Connection=yes;' .
        'UseFMTONLY=yes';
$dbh = new PDO($connStr);
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db_name = "gh_sqla_5651";
$table_name = "table1";

$sth = $dbh->prepare("DROP DATABASE IF EXISTS " . $db_name);
$sth->execute();
$sth = $dbh->prepare("CREATE DATABASE " . $db_name . " COLLATE Latin1_General_100_CI_AI_SC");
$sth->execute();
$sth = $dbh->prepare("USE " . $db_name);
$sth->execute();
$sth = $dbh->prepare("CREATE TABLE " . $table_name . "(id int PRIMARY KEY, txt nvarchar(max))");
$sth->execute();
$sth = $dbh->prepare("INSERT INTO " . $table_name . " (id, txt) VALUES (?, ?)");
$id = 1;
$sth->bindParam(1, $id, PDO::PARAM_INT);
$txt = str_repeat("x", 2001);
$sth->bindParam(2, $txt, PDO::PARAM_STR);
$sth->execute();
?>
</body>
</html>

SQL Profiler shows

set fmtonly on select id,txt from table1 where 1=2 set fmtonly off

declare @p1 int
set @p1=5
exec sp_prepare @p1 output,N'@P1 int,@P2 nvarchar(max)',N'INSERT INTO table1 (id, txt) VALUES (@P1, @P2)',1
select @p1

exec sp_execute 5,1,N'xxx ... xxx'

exec sp_unprepare 5

pyodbc is definitely the outlier here.

@v-chojas
Contributor

The negative indicator (-4102) shows that pyODBC is attempting to use DAE: SQL_LEN_DATA_AT_EXEC(4002) is -100 - 4002 = -4102. A comment in the code agrees, and while msodbcsql doesn't require the use of DAE for the long types, other drivers may.

@gordthompson
Collaborator

gordthompson commented Oct 20, 2020

@v-chojas - Thanks for the explanation. So if you could add an avoid_dae_on_execute flag to pyodbc, e.g.,

cnxn = pyodbc.connect(connection_string, avoid_dae_on_execute=True)

then I can tweak SQLAlchemy's mssql+pyodbc dialect to add that to the pyodbc.connect() call. As @zzzeek mentions here, a workaround for SQLAlchemy using setinputsizes is simply not practical.

mkleehammer added a commit that referenced this issue Oct 25, 2020
@mkleehammer
Owner

The first thing we need is a failing unit test. I've added varchar(max) and nvarchar(max) tests to tests3/sqlservertests.py. They pass on Windows. I don't have my new Mac set up for SQL Server just yet. Can someone confirm these tests fail on a Mac?

Part of the setup in cnxninfo.cpp is to find the longest supported varchar and wvarchar lengths using SQLGetTypeInfo. IIRC, anything larger than this is supposed to require DAE. I need to find the docs for this and add more comments.

@mkleehammer
Owner

I increased the varchar(max) and nvarchar(max) tests to use the large sizes (up to 20K) and connected to my Windows SQL Server test machine from the Mac test machine using the MS SQL Server ODBC Driver v17. No problems.

Here's my system info.

(venv39) [1] Mac ~/dev/pyodbc/tests3 $ python sqlservertests.py -v                                              [master]
/Users/mkleehammer/dev/pyodbc/tests3 --> /Users/mkleehammer/dev/pyodbc
CS: DSN=mssql;UID=test;PWD=test
python:  3.9.0 (v3.9.0:9cf6752276, Oct  5 2020, 11:29:23)
[Clang 6.0 (clang-600.0.57)]
pyodbc:  4.0.30 /Users/mkleehammer/dev/pyodbc/build/lib.macosx-10.9-x86_64-3.9/pyodbc.cpython-39-darwin.so
odbc:    03.52
driver:  libmsodbcsql.17.dylib 17.06.0001
         supports ODBC version 03.52
os:      Darwin
unicode: Py_Unicode=4 SQLWCHAR=2
Max VARCHAR = 8000
Max WVARCHAR = 4000
Max BINARY = 8000

Also note the Max items at the end. These are the maximum supported sizes according to the driver. Anything above these sizes should (and does) use DAE. Is this not what we expect?

Do these tests fail on anyone else's machine? I must be missing something.

@mkleehammer
Owner

The example test program also works for me. Here is the trace of the two inserts.

		Exit:[SQL_SUCCESS]
[ODBC][35846][1603670293.046335][SQLExecDirectW.c][177]
		Entry:
			Statement = 0x7fab84062200
			SQL = [
    INSERT INTO msg (content)
    VALUES ('AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...][length = 2047]
[ODBC][35846][1603670293.051688][SQLExecDirectW.c][456]
		Exit:[SQL_SUCCESS]
[ODBC][35846][1603670293.051984][SQLRowCount.c][173]
		Entry:
			Statement = 0x7fab84062200
			Row Count = 0x7ffeec186c90
[ODBC][35846][1603670293.052464][SQLRowCount.c][247]
		Exit:[SQL_SUCCESS]
			Row Count = 0x7ffeec186c90 -> 1 (64 bits)
[ODBC][35846][1603670293.052716][SQLNumResultCols.c][156]
		Entry:
			Statement = 0x7fab84062200
			Column Count = 0x7ffeec186c80
[ODBC][35846][1603670293.053143][SQLNumResultCols.c][251]
		Exit:[SQL_SUCCESS]
			Count = 0x7ffeec186c80 -> 0
[ODBC][35846][1603670293.053452][SQLFreeStmt.c][144]
		Entry:
			Statement = 0x7fab84062200
			Option = 0
[ODBC][35846][1603670293.053667][SQLFreeStmt.c][266]
		Exit:[SQL_SUCCESS]
[ODBC][35846][1603670293.053957][SQLPrepareW.c][165]
		Entry:
			Statement = 0x7fab84062200
			SQL = [
      INSERT INTO msg (content)
      VALUES (?)][length = 49]
[ODBC][35846][1603670293.054156][SQLPrepareW.c][346]
		Exit:[SQL_SUCCESS]
[ODBC][35846][1603670293.054608][SQLNumParams.c][144]
		Entry:
			Statement = 0x7fab84062200
			Param Count = 0x7ffeec186be6
[ODBC][35846][1603670293.054771][SQLNumParams.c][231]
		Exit:[SQL_SUCCESS]
			Count = 0x7ffeec186be6 -> 1
[ODBC][35846][1603670293.055090][SQLBindParameter.c][217]
		Entry:
			Statement = 0x7fab84062200
			Param Number = 1
			Param Type = 1
			C Type = -8 SQL_C_WCHAR
			SQL Type = -10 
			Col Def = 4002
			Scale = 0
			Rgb Value = 0x7fab83a30ae0
			Value Max = 8
			StrLen Or Ind = 0x7fab83a30b08
[ODBC][35846][1603670293.055303][SQLBindParameter.c][434]
		Exit:[SQL_SUCCESS]
[ODBC][35846][1603670293.055486][SQLExecute.c][187]
		Entry:
			Statement = 0x7fab84062200
[ODBC][35846][1603670293.059333][SQLExecute.c][357]
		Exit:[SQL_NEED_DATA]
[ODBC][35846][1603670293.059531][SQLParamData.c][166]
		Entry:
			Statement = 0x7fab84062200
			Value = 0x7ffeec186c80
[ODBC][35846][1603670293.059890][SQLParamData.c][362]
		Exit:[SQL_NEED_DATA]
			Value = 0x7fab83a30ae0
[ODBC][35846][1603670293.060159][SQLPutData.c][148]
		Entry:
			Statement = 0x7fab84062200
			Data = 0x7fab84075220
			StrLen = 4000
[ODBC][35846][1603670293.060487][SQLPutData.c][315]
		Exit:[SQL_SUCCESS]
[ODBC][35846][1603670293.060646][SQLPutData.c][148]
		Entry:
			Statement = 0x7fab84062200
			Data = 0x7fab840761c0
			StrLen = 2
[ODBC][35846][1603670293.060799][SQLPutData.c][315]
		Exit:[SQL_SUCCESS]
[ODBC][35846][1603670293.061167][SQLParamData.c][166]
		Entry:
			Statement = 0x7fab84062200
			Value = 0x7ffeec186c80
[ODBC][35846][1603670293.066789][SQLParamData.c][362]
		Exit:[SQL_SUCCESS]
			Value = 0x7fab83a30ae0

@mkleehammer
Owner

mkleehammer commented Oct 26, 2020

This seems important. From the OP's ODBC trace log.

This explains why it worked on my machine: I used my existing test database and missed the COLLATE Latin1_General_100_CI_AS_SC when creating the database.

[ODBC][37][1602743956.220306][SQLParamData.c][362]
		Exit:[SQL_ERROR]
			Value = 0x564861448eb0
		DIAG [42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Cannot convert to
		text/ntext or collate to 'Latin1_General_100_CI_AS_SC' because these legacy LOB types
		do not support UTF-8 or UTF-16 encodings. Use types varchar(max), nvarchar(max) or a
		collation which does not have the _SC or _UTF8 flags.



[ODBC][37][1602743956.220352][SQLGetDiagRecW.c][535]
		Entry:
			Statement = 0x5648613e86a0
			Rec Number = 1
			SQLState = 0x7ffcb3ba7fe4
			Native = 0x7ffcb3ba7fc4
			Message Text = 0x7ffcb3ba7ff0
			Buffer Length = 1023
			Text Len Ptr = 0x7ffcb3ba7fc2
[ODBC][37][1602743956.220375][SQLGetDiagRecW.c][596]
		Exit:[SQL_SUCCESS]
			SQLState = [42000]
			Native = 0x7ffcb3ba7fc4 -> 4189
			Message Text = [[Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Cannot
			               convert to text/ntext or collate to 'Latin1_General_100_CI_AS_SC' b]

Can anyone explain? The recent SQL Server Unicode changes are new to me.

https://feedback.azure.com/forums/908035-sql-server/suggestions/39319090-utf-8-in-sql-2019-binary-collation-latin1-gener

@zzzeek

zzzeek commented Oct 26, 2020

@mkleehammer thanks for looking into this. not much I can add, folks here seem to know SQL server much better than I do.

@gordthompson
Collaborator

@mkleehammer (cc: @v-chojas )

This seems important.

Yes, the _SC collation as the database default is the key. No problems with a database default of Latin1_General_100_CI_AS (without the _SC) and then creating a table with

CREATE TABLE t1 (id int primary key, txt nvarchar(max) COLLATE Latin1_General_100_CI_AS_SC)

so we can't even use that to create a failing test. We need to do

CREATE DATABASE foo COLLATE Latin1_General_100_CI_AS_SC
USE foo
CREATE TABLE t1 (id int primary key, txt nvarchar(max))

The reason why System.Data.Odbc, ADO, and PDO_ODBC all work seems to be that they don't use DAE. As a test I tried changing

if (maxlength == 0 || cb <= maxlength || isTVP)

to

bool avoid_dae_on_execute = true;  // POC for #835

if (maxlength == 0 || cb <= maxlength || isTVP || avoid_dae_on_execute)

but that failed with "Invalid Precision value (0)".

Given that Microsoft shut down that feedback.azure.com thread pretty darn quick, the situation doesn't look too promising.

My gut feeling is that msodbcsql should just stop sending the parameter values as text/ntext and start sending them as varchar(max)/nvarchar(max). According to that Azure thread, text/ntext have been deprecated since SQL Server 2005.

However, msodbcsql is a total black box. AFAIK there isn't even a forum for suggestions or an issue tracker. The closest I've found is https://github.com/microsoft/msphpsql but that is for the PHP layer over msodbcsql, not msodbcsql itself.

@v-chojas
Contributor

My gut feeling is that msodbcsql should just stop sending the parameter values as text/ntext and start sending them as varchar(max)/nvarchar(max).

For backwards compatibility, that will not change. If you don't want text/ntext then bind using a SQL type of SQL_VARCHAR/SQL_WVARCHAR, not the LONG variations (which map to text/ntext), and specify a "column length" of 0 which tells it to use the (max) type.

@gordthompson
Collaborator

gordthompson commented Oct 26, 2020

My gut feeling is that msodbcsql should just stop sending the parameter values as text/ntext and start sending them as varchar(max)/nvarchar(max).

For backwards compatibility, that will not change.

It need not be an immediate breaking change. msodbcsql could accept a connection string parameter named "UseNVarcharMax", which would default to "no" and continue the current text/ntext behaviour. UseNVarcharMax=yes would tell the driver to use nvarchar(max) instead of ntext and varchar(max) instead of text.

Precedent:

UseFMTONLY=yes was introduced in msodbcsql 17.0. It uses a legacy feature whose documentation says

(!) Note

Do not use this feature. This feature has been replaced by the following items:
…

If you don't want text/ntext then bind using a SQL type of SQL_VARCHAR/SQL_WVARCHAR, not the LONG variations (which map to text/ntext), and specify a "column length" of 0 which tells it to use the (max) type.

Is setinputsizes the only way to do that? If so, then you're saying that SQLAlchemy's mssql dialect must inspect every column object for every DML operation, see if it is a string, and if so then issue a setinputsizes before executing. That is simply not practical.

@mkleehammer
Owner

I'm sorry I'm just now getting back to this.

I'm wondering if DAE should ever be used. I need to test with some populated DBs, but maybe it should default to off. What does everyone think of investigating that solution?

Also, note that you can quickly test turning off DAE by setting cnxn.maxwrite to something larger than the largest string you'll write (1GB or something). If I remember correctly, that would be the same as @gordthompson's test Boolean, but would let you test something with an existing pyodbc version.

I would guess most drivers really don't care about DAE, so another option would be to default maxwrite to None (0 internally in C) and use that to indicate we never use it, but allow people to set it to a positive int to enable DAE above that value.

Thoughts?
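The in-line-versus-DAE decision being discussed can be pictured with a pure-logic sketch, a paraphrase in Python of the C condition quoted earlier in the thread, `if (maxlength == 0 || cb <= maxlength || isTVP)`; the function name is illustrative:

```python
def uses_dae(data_len, max_inline, is_tvp=False):
    """True when the parameter would be streamed with SQLPutData
    (data-at-execution) rather than bound in-line.  Mirrors the C
    condition quoted earlier: bind in-line when the driver reports
    no limit (maxlength == 0), when the data fits, or for a TVP."""
    bind_inline = max_inline == 0 or data_len <= max_inline or is_tvp
    return not bind_inline
```

Under this reading, raising cnxn.maxwrite above the data length (or treating None/0 as "no limit") suppresses DAE.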

@gordthompson
Collaborator

gordthompson commented Jan 22, 2021

In the original repro code if I add

cnxn.maxwrite = 1_000_000

immediately after the pyodbc.connect() call I get

gh_pyodbc_835   19a4-1ee4	EXIT  SQLBindParameter  with return code -1 (SQL_ERROR)
        HSTMT               0x03D003A8
        UWORD                        1 
        SWORD                        1 <SQL_PARAM_INPUT>
        SWORD                       -8 <SQL_C_WCHAR>
        SWORD                       -9 <SQL_WVARCHAR>
        SQLULEN                 4002
        SWORD                        0 
        PTR                0x00F10E88
        SQLLEN                  4002
        SQLLEN *            0x00EE48C4

        DIAG [HY104] [Microsoft][ODBC Driver 17 for SQL Server]Invalid precision value (0) 

SQL.LOG

(Tested with pyodbc 4.0.31b51.)

@mkleehammer
Owner

It says "precision" and the DecimalDigits parameter to SQLBindParameter is 0 in the log. That must be what it is complaining about. However, the documentation states that DecimalDigits is ignored when inserting character data.

Does it only happen when we cross the 4000 character boundary?

@v-chojas @v-makouz Can either of you provide any insight here?

@v-chojas
Contributor

@mkleehammer Good to see you working on pyODBC again!

When the length is nonzero and the SQL type is not a long type, the driver interprets that as a non-max type, and as we know a non-max nvarchar is restricted to 4000 at most. If you specify a length of 0, the driver interprets that to mean use a max-type.

Relevant documentation here: https://docs.microsoft.com/en-us/sql/relational-databases/native-client/features/using-large-value-types
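The size-versus-type behaviour described above can be summarised as a small mapping (a sketch of the driver behaviour as reported in this thread, not an official msodbcsql API; the type codes are the values pyodbc exposes):

```python
SQL_WVARCHAR, SQL_WLONGVARCHAR = -9, -10  # ODBC SQL type codes

def server_type(sql_type, column_size):
    """SQL Server type the driver sends for a given binding, per the
    behaviour reported in this thread."""
    if sql_type == SQL_WLONGVARCHAR:
        return "ntext"                    # legacy LOB, regardless of size
    if sql_type == SQL_WVARCHAR:
        if column_size == 0:
            return "nvarchar(max)"        # ColumnSize 0 signals the (max) type
        if column_size <= 4000:
            return "nvarchar(%d)" % column_size
        # matches the HY104 "Invalid precision value" error seen earlier
        raise ValueError("Invalid precision value")
    raise ValueError("unhandled SQL type")
```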

@gordthompson
Collaborator

Does it only happen when we cross the 4000 character boundary?

The 4000 byte boundary, yes. With the above cnxn.maxwrite = 1_000_000 modification still in place

content = 2000 * 'A'

succeeds, while

content = 2001 * 'A'

fails.

@gordthompson
Collaborator

I would guess most drivers really don't care about DAE

In an earlier comment @v-chojas said

"while msodbcsql doesn't require the use of DAE for the long types, other drivers may"

@mkleehammer
Owner

Thanks to @gordthompson, we have some data to look at. Here is a summary of his tests.

Test      Pass  ValueType  ParameterType  ColumnSize  DecimalDigits  BufferLength  StrLen
pyodbc    F     wchar      WLONGVARCHAR   4002        0              8             DAE(4002)
ADO       P     wchar      WLONGVARCHAR   2147483647  0              4004          4002
C#        P     wchar      WLONGVARCHAR   2001        0              4004          4002
maxwrite  F     wchar      WVARCHAR       4002        0              4002          ?

This leads me to a few questions:

  • pyodbc is passing 4002 but C# is passing a ColumnSize of 2001 with a long data type and it
    is working. What is the right value for ColumnSize? We might need to fix this even if it
    is unrelated.

  • As seen in the C# example, DAE is not required to send long data. It is passing
    WLONGVARCHAR but not using DAE. Is there a way to know this? I truly doubt most open
    source drivers really care and would happily accept large buffers here.

  • How is the C# version working? I thought the maximum size for nvarchar was 2000.
    Does it work just because it is a max column, or is it because its limit is 4000
    characters instead of 4000 bytes?

  • Does the C# approach work with SQL Server earlier than 2005? The link provided
    says special handling was required before 2005. I assume we still want to support
    SQL Server 2000.

  • I'm not sure what ColumnSize value ADO is passing. Possibly a constant. I thought
    I saw an "unlimited" constant somewhere, but I'm not finding it now.

Here is what doesn't work with SQL Server:

  • Using DAE and a non-zero column size because the long data type forces it to be ntext (pyodbc).

  • Not using DAE or long size but sending a column size over 4K (maxwrite).

I'm still not 100% certain of what does work:

  • Does the C# approach of using long types and passing the actual length work past 4K? If
    so, it isn't a complete solution as we'd still need a threshold.

@v-chojas's comments seem to say that the preferred approach is to use WVARCHAR with a 0
column size.

Will this work with older SQL Server versions like 2000?

A bigger problem is how do we know when to do this? It's not quite just avoiding DAE.
It sounds like we also have to pass a ColumnSize of 0, unless I missed a combination that works.
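For reference, the combinations reported so far in this thread can be condensed into a small decision sketch. The names and result strings are invented; the outcomes come from the tests summarised above:

```python
# Summary of the observed SQLBindParameter combinations for msodbcsql,
# based on the test results reported in this thread.
def binding_outcome(sql_type: str, column_size: int, uses_dae: bool) -> str:
    if sql_type == "WLONGVARCHAR" and uses_dae and column_size > 0:
        # pyodbc today: long type + DAE + nonzero size -> legacy ntext error
        return "fails (legacy ntext path)"
    if sql_type == "WVARCHAR" and column_size > 4000:
        # the maxwrite experiment: non-max nvarchar is capped at 4000
        return "fails (non-max nvarchar too small)"
    if sql_type == "WVARCHAR" and column_size == 0:
        # the suggested approach: ColumnSize 0 means nvarchar(max)
        return "works (binds as nvarchar(max))"
    return "untested"

print(binding_outcome("WLONGVARCHAR", 4002, True))
print(binding_outcome("WVARCHAR", 4002, False))
print(binding_outcome("WVARCHAR", 0, False))
```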

Is it time to introduce a DBMS-type or dialect attribute with all of these settings? Then
we can tweak things for certain databases when necessary? I've tried to avoid it and still
dislike it, but I also hate to have 20 parameters you have to set when connecting just to
get it to work. I always have a single connect function where it is all kept together, so
it isn't that terrible, but it might be confusing for newbies. It might be much easier to
use connect(.., dialect="mssqlserver").

We'd define the generic default and then some DB specific ones like "mssqlserver",
"postgresql", and "mysql".
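A hypothetical sketch of what such a dialect table might look like; every name and setting here is invented for illustration, not a proposed pyodbc API:

```python
# Hypothetical dialect registry: a named bundle of per-DBMS binding
# settings that could be selected at connect time, with a generic default.
DIALECTS = {
    "generic": {"use_dae_for_long": True, "long_column_size": None},
    "mssqlserver": {"use_dae_for_long": False, "long_column_size": 0},
    "postgresql": {"use_dae_for_long": True, "long_column_size": None},
}

def settings_for(dialect: str = "generic") -> dict:
    """Look up a dialect profile, falling back to the generic defaults."""
    return DIALECTS.get(dialect, DIALECTS["generic"])

print(settings_for("mssqlserver")["long_column_size"])  # 0
```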

@zzzeek

zzzeek commented Jan 23, 2021

Is it time to introduce a DBMS-type or dialect attribute with all of these settings? Then
we can tweak things for certain databases when necessary?

IMO this is what pyodbc should do and I believe this mechanism can be used to help with issues like #134 as well (?)

@gordthompson
Collaborator

gordthompson commented Jan 24, 2021

Is it time to introduce a DBMS-type or dialect attribute with all of these settings? Then
we can tweak things for certain databases when necessary? I've tried to avoid it and still
dislike it,

I agree that it's not an ideal solution but it may be the most practical approach, especially since vendors cannot be relied upon to fix defects in their ODBC drivers.

@keitherskine
Collaborator

Bundling a bunch of DBMS-specific pyodbc parameters into a connection "dialect" does seem to be an elegant way of making pyodbc work with a variety of DBMSs that don't comply 100% with the ODBC standard. This dialect approach is somewhat like turbodbc or the built-in Python csv reader object, and works well in my experience. Perhaps pyodbc could even try to detect the DBMS on connection (from SQL_DBMS_NAME and SQL_DBMS_VER?) so clients don't need to specify it at all.

@gordthompson
Collaborator

Perhaps pyodbc could even try to detect the DBMS on connection (from SQL_DBMS_NAME and SQL_DBMS_VER?) so clients don't need to specify it at all.

Good idea, but I would recommend going slow on that. AFAIK there is no standard naming convention for those sorts of attributes, so in doing so we could potentially be opting-in to a perpetual game of catch-up with "the names of things", e.g.,

if type_name == 'msodbcsql':
    return ('msodbcsql' in driver_name) or ('sqlncli' in driver_name) or ('sqlsrv32.dll' == driver_name)

My preferred approach would be to keep the default behaviour and get the (hard-coded) profiles defined first, then try to make it automatic if/when the customers ask for it.
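For completeness, a runnable version of that kind of name matching, using only the driver names cited in this thread (the function name is made up, and any real detection scheme would need ongoing maintenance, which is exactly the concern raised above):

```python
# Name-matching sketch of the "perpetual catch-up" problem: recognising
# Microsoft's SQL Server ODBC drivers from the driver name alone.
def is_ms_sql_server_driver(driver_name: str) -> bool:
    name = driver_name.lower()
    return (
        "msodbcsql" in name        # ODBC Driver 11/13/17/18 for SQL Server
        or "sqlncli" in name       # SQL Server Native Client
        or name == "sqlsrv32.dll"  # legacy "SQL Server" driver
    )

print(is_ms_sql_server_driver("msodbcsql17.dll"))  # True
print(is_ms_sql_server_driver("psqlodbc"))         # False
```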

@gordthompson
Collaborator

My gut feeling is that msodbcsql should just stop sending the parameter values as text/ntext and start sending them as varchar(max)/nvarchar(max).

For backwards compatibility, that will not change.

It need not be an immediate breaking change. msodbcsql could accept a connection string parameter named "UseNVarcharMax", which would default to "no" and continue the current text/ntext behaviour. UseNVarcharMax=yes would tell the driver to use nvarchar(max) instead of ntext and varchar(max) instead of text.

As reported in this comment, ODBC Driver 18 for SQL Server has apparently added just such a parameter, named LongAsMax.
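Assuming the LongAsMax keyword behaves like other boolean connection-string options, opting in would look something like this; the server and credentials are the placeholders from the POC at the top of this issue:

```python
# Hedged sketch: opting in to LongAsMax with ODBC Driver 18. The keyword
# is the one reported above; the rest of the string mirrors the POC.
conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:127.0.0.1,1433;DATABASE=pyodbc_test;"
    "UID=sa;PWD=P@ssw0rd;"
    "LongAsMax=Yes;"
)
print("LongAsMax=Yes;" in conn_str)  # True
```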
