Cannot insert strings with a length greater than 2000 into columns with a datatype of varchar(max) or nvarchar(max) using parametrised queries #835
I suspect this is due to how pyODBC binds these long parameters using a SQL type of SQL_LONGVARCHAR or SQL_WLONGVARCHAR, which translates to text/ntext in the ODBC Driver for SQL Server. Because it is driver-agnostic, pyODBC does not have special handling for varchar(max). You can try to use `setinputsizes`. Does it work if you use a non-SC collation? You can post an ODBC trace for further analysis.
Thank you.
I’m not a direct user of `pyodbc` - I only use it indirectly through SQLAlchemy. Therefore, the use of `setinputsizes` is not really an option for me, since I deal with the ORM abstraction only.

How do I make an ODBC trace? Once I’ve done that, I will provide answers to all your questions.
I see that `text`/`ntext` are deprecated, since they don’t support Unicode supplementary characters (`_SC`, 0x10000 and above). How can I use data types with support for `_SC`?

As I recall, removing `_SC` fixes the issue (I will post a definite answer later), but I need to support supplementary characters in my application.
See https://github.com/mkleehammer/pyodbc/wiki/Troubleshooting-%E2%80%93-Generating-an-ODBC-trace-log

If you cannot use setinputsizes directly, then you can ask your ORM vendor about adding support for varchar(max): https://github.com/sqlalchemy/sqlalchemy/issues
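For Python readers, a minimal sketch of that `setinputsizes` workaround. The `inputsizes_for` helper is hypothetical (not part of pyodbc); it builds the list that `Cursor.setinputsizes` accepts, where a `SQL_WVARCHAR` entry with a column size of 0 asks the SQL Server driver for `nvarchar(max)` instead of `ntext`:

```python
# Hypothetical helper: build a setinputsizes() spec that forces long string
# parameters to bind as nvarchar(max) rather than falling back to ntext.
SQL_WVARCHAR = -9  # ODBC type code, same value as pyodbc.SQL_WVARCHAR

def inputsizes_for(params, threshold=2000):
    """Return a setinputsizes() list of (sql_type, column_size, decimal_digits).

    A column size of 0 with SQL_WVARCHAR tells the SQL Server driver to use
    nvarchar(max); None entries leave pyodbc's default binding untouched.
    """
    sizes = []
    for p in params:
        if isinstance(p, str) and len(p) > threshold:
            sizes.append((SQL_WVARCHAR, 0, 0))
        else:
            sizes.append(None)
    return sizes

# Usage against a live connection (illustrative, not run here):
# cursor.setinputsizes(inputsizes_for([1, "x" * 2001]))
# cursor.execute("INSERT INTO table1 (id, txt) VALUES (?, ?)", 1, "x" * 2001)
```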
Can you elaborate on the semantics of `setinputsizes`? I can confirm that removing `_SC` avoids the error. Furthermore, using `setinputsizes` results in the output:

ODBC trace: I've attached two ODBC trace logs, with/without `_SC`: odbctrace.txt
That confirms my suspicions: pyODBC is using SQL_WLONGVARCHAR (-10), which the driver maps to ntext, and using setinputsizes to specify a column size of 0 correctly causes the driver to use nvarchar(max) instead. Note that according to https://docs.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-ver15#Supplementary_Characters the effect of using or not using _SC collations only affects string operations in the DB itself, which suggests that for simply storing and retrieving the data, _SC won't show any difference even with high Unicode characters.
Why are we using `SQL_WLONGVARCHAR`? I don't understand the following: why does e.g. Azure Data Studio display the column datatype in the database as `nvarchar(max)`? Just to be sure, it's the interaction with pyODBC that causes the trouble, not the SQL Server database backend, right? I previously experimented with non-`_SC` collations. Is the conclusion that SQLAlchemy should fix this, or is it pyODBC's responsibility? Currently, SQLAlchemy says that it's pyODBC's responsibility; see my issue here: https://groups.google.com/g/sqlalchemy/c/Kk6DkPNWlR4
SQL_WLONGVARCHAR will cause the driver to send ntext; SQL_WVARCHAR is required to send nvarchar.

SQLAlchemy. pyODBC is generic and does not know about special handling of varchar(max), whereas SQLAlchemy appears to have code for specific database types. It needs to call setinputsizes as you described when the length is more than the maximum for non-max types (2K wide characters or 4K bytes). (I cannot see that link, it requires login.)
Isn't it a problem that the parameter is sent as nvarchar even for varchar columns? These string operations include comparisons etc., so if I want to filter on a column containing supplementary characters, I need the `_SC` collation.
The server can convert between nvarchar and varchar.
Taking pyodbc out of the mix for a moment, this VBA code does not throw the error:

```vba
Sub gh_sqla_5651()
    Dim con As New ADODB.Connection
    con.Open _
        "DRIVER=ODBC Driver 17 for SQL Server;" & _
        "SERVER=(local)\SQLEXPRESS;" & _
        "DATABASE=master;" & _
        "Trusted_Connection=Yes;" & _
        "UseFMTONLY=Yes;"
    Const db_name = "gh_sqla_5651"
    Const table_name = "table1"
    Dim cmd As New ADODB.Command
    cmd.ActiveConnection = con
    cmd.CommandText = "DROP DATABASE IF EXISTS " & db_name
    cmd.Execute
    cmd.CommandText = "CREATE DATABASE " & db_name & " COLLATE Latin1_General_100_CI_AI_SC"
    cmd.Execute
    cmd.CommandText = "USE " & db_name
    cmd.Execute
    cmd.CommandText = "CREATE TABLE " & table_name & "(id int PRIMARY KEY, txt nvarchar(max))"
    cmd.Execute
    cmd.CommandText = "INSERT INTO " & table_name & " (id, txt) VALUES (?, ?)"
    cmd.Parameters.Append cmd.CreateParameter("?", adInteger, adParamInput, , 1)
    cmd.Parameters.Append cmd.CreateParameter("?", adLongVarWChar, adParamInput, 2 ^ 31 - 1, String(2001, "x"))
    cmd.Execute
End Sub
```

SQL Profiler shows the difference is that ADO produces this
while pyodbc produces this
In both cases the string parameter is declared as ntext.

ADO:

pyodbc:
Also no error with this C# code:

```csharp
using System;
using System.Data.Odbc;

namespace odbcConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            var connectionString =
                "DRIVER=ODBC Driver 17 for SQL Server;"
                + "SERVER=(local)\\SQLEXPRESS;"
                + "DATABASE=master;"
                + "Trusted_Connection=Yes;"
                + "UseFMTONLY=Yes;";
            var db_name = "gh_sqla_5651";
            var table_name = "table1";
            using (var con = new OdbcConnection(connectionString))
            {
                con.Open();
                using (var cmd = new OdbcCommand())
                {
                    cmd.Connection = con;
                    cmd.CommandText = $"DROP DATABASE IF EXISTS {db_name}";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"CREATE DATABASE {db_name} COLLATE Latin1_General_100_CI_AI_SC";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"USE {db_name}";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"CREATE TABLE {table_name} (id int PRIMARY KEY, txt nvarchar(max))";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"INSERT INTO {table_name} (id, txt) VALUES (?, ?)";
                    cmd.Parameters.Add("?", OdbcType.Int).Value = 1;
                    cmd.Parameters.Add("?", OdbcType.NText).Value = new String('x', 2001);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}
```

From SQL Profiler's perspective it is the same as ADO:
and the ODBC trace shows
@v-chojas - For completeness I added an explicit `cmd.Prepare()` call:

```csharp
using System;
using System.Data.Odbc;

namespace odbcConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(String.Format("{0}-bit", IntPtr.Size * 8));
            var connectionString =
                "DRIVER=ODBC Driver 17 for SQL Server;"
                + "SERVER=(local)\\SQLEXPRESS;"
                + "DATABASE=master;"
                + "Trusted_Connection=Yes;"
                + "UseFMTONLY=Yes;";
            var db_name = "gh_sqla_5651";
            var table_name = "table1";
            using (var con = new OdbcConnection(connectionString))
            {
                con.Open();
                using (var cmd = new OdbcCommand())
                {
                    cmd.Connection = con;
                    cmd.CommandText = $"DROP DATABASE IF EXISTS {db_name}";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"CREATE DATABASE {db_name} COLLATE Latin1_General_100_CI_AI_SC";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"USE {db_name}";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"CREATE TABLE {table_name} (id int PRIMARY KEY, txt nvarchar(max))";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = $"INSERT INTO {table_name} (id, txt) VALUES (?, ?)";
                    cmd.Prepare(); // new
                    cmd.Parameters.Add("?", OdbcType.Int).Value = 1;
                    cmd.Parameters.Add("?", OdbcType.NText).Value = new String('x', 2001);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}
```

csharp_prepare.LOG shows that it is calling
and binding the parameter in the expected way
but SQL Profiler shows
Is there a clue in the ODBC trace that might explain why pyodbc and the other clients behave differently?
@v-chojas - The test code also completes successfully with PDO_ODBC under PHP:

```php
<?php
header('Content-Type: text/html; charset=utf-8');
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>PDO example</title>
</head>
<body>
<?php
echo (8 * PHP_INT_SIZE) . "-bit<br/>";
$connStr =
    'odbc:' .
    'DRIVER=ODBC Driver 17 for SQL Server;' .
    'SERVER=.\\SQLEXPRESS;' .
    'DATABASE=master;' .
    'Trusted_Connection=yes;' .
    'UseFMTONLY=yes';
$dbh = new PDO($connStr);
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db_name = "gh_sqla_5651";
$table_name = "table1";
$sth = $dbh->prepare("DROP DATABASE IF EXISTS " . $db_name);
$sth->execute();
$sth = $dbh->prepare("CREATE DATABASE " . $db_name . " COLLATE Latin1_General_100_CI_AI_SC");
$sth->execute();
$sth = $dbh->prepare("USE " . $db_name);
$sth->execute();
$sth = $dbh->prepare("CREATE TABLE " . $table_name . "(id int PRIMARY KEY, txt nvarchar(max))");
$sth->execute();
$sth = $dbh->prepare("INSERT INTO " . $table_name . " (id, txt) VALUES (?, ?)");
$id = 1;
$sth->bindParam(1, $id, PDO::PARAM_INT);
$txt = str_repeat("x", 2001);
$sth->bindParam(2, $txt, PDO::PARAM_STR);
$sth->execute();
?>
</body>
</html>
```

SQL Profiler shows
pyodbc is definitely the outlier here.
The negative indicator (-4102) shows that pyODBC is attempting to use DAE (data-at-execution). A comment in the code agrees, and while msodbcsql doesn't require the use of DAE for the long types, other drivers may.
@v-chojas - Thanks for the explanation. So if you could add an option like

```python
cnxn = pyodbc.connect(connection_string, avoid_dae_on_execute=True)
```

then I can tweak SQLAlchemy's dialect to use it.
Trying to reproduce #835. No luck.
The first thing we need is a failing unit test. I've added varchar(max) and nvarchar(max) tests to tests3/sqlservertests.py. They pass on Windows. I don't have my new Mac set up for SQL Server just yet. Can someone confirm these tests fail on a Mac? Part of the setup code in cnxninfo.cpp is to find the longest supported varchar and wvarchar lengths using SQLGetTypeInfo. IIRC, anything larger than this is supposed to require DAE. I need to find the docs for this and add more comments.
I increased the varchar(max) and nvarchar(max) tests to use the large sizes (up to 20K) and connected to my Windows SQL Server test machine from the Mac test machine using the MS SQL Server ODBC Driver v17. No problems. Here's my system info.
Also note the Max items at the end. These are the maximum supported sizes according to the driver. Anything above these sizes should (and does) use DAE. Is this not what we expect? Do these tests fail on anyone else's machine? I must be missing something.
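As a side note for anyone poking at this from Python: `Cursor.getTypeInfo` exposes the same `SQLGetTypeInfo` data the setup code reads. The helper below is illustrative and only assumes the ODBC-specified result layout, where the third column (`COLUMN_SIZE`) carries the maximum size for the type:

```python
def max_reported_size(type_info_rows):
    """Largest COLUMN_SIZE (index 2, per the ODBC SQLGetTypeInfo result
    layout) among the rows the driver returned for a given SQL type."""
    return max(row[2] for row in type_info_rows)

# Against a live connection (illustrative, not run here):
# rows = cursor.getTypeInfo(pyodbc.SQL_WVARCHAR).fetchall()
# print(max_reported_size(rows))
```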
The example test program also works for me. Here is the trace of the two inserts.
This seems important. From the OP's ODBC trace log:

This explains why it worked on my machine - I used my existing test database and missed the `_SC` database collation.

Can anyone explain? The recent SQL Server Unicode changes are new to me.
@mkleehammer thanks for looking into this. not much I can add, folks here seem to know SQL server much better than I do. |
@mkleehammer (cc: @v-chojas )
Yes, and the error does not occur with a column-level collation:

```sql
CREATE TABLE t1 (id int primary key, txt nvarchar(max) COLLATE Latin1_General_100_CI_AS_SC)
```

so we can't even use that to create a failing test. We need to do

```sql
CREATE DATABASE foo COLLATE Latin1_General_100_CI_AS_SC
USE foo
CREATE TABLE t1 (id int primary key, txt nvarchar(max))
```

I also tried changing the DAE check (Line 767 in 023af55) to

```cpp
bool avoid_dae_on_execute = true; // POC for #835
if (maxlength == 0 || cb <= maxlength || isTVP || avoid_dae_on_execute)
```

but that failed with "Invalid Precision value (0)". Given that Microsoft shut down that feedback.azure.com thread pretty darn quick, the situation doesn't look too promising. My gut feeling is that msodbcsql should just stop sending the parameter values as text/ntext and start sending them as varchar(max)/nvarchar(max). According to that Azure thread, text/ntext have been deprecated since SQL Server 2005. However, msodbcsql is a total black box. AFAIK there isn't even a forum for suggestions or an issue tracker. The closest I've found is https://github.com/microsoft/msphpsql but that is for the PHP layer over msodbcsql, not msodbcsql itself.
For backwards compatibility, that will not change. If you don't want text/ntext then bind using a SQL type of SQL_VARCHAR/SQL_WVARCHAR, not the LONG variations (which map to text/ntext), and specify a "column length" of 0 which tells it to use the (max) type.
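To restate that mapping in one place (my paraphrase of the comment above, for the Microsoft driver; the function itself is purely illustrative, not driver code):

```python
# Illustrative summary of how the Microsoft ODBC driver maps a bound
# SQL type plus column size to a server-side type, per the discussion.
def server_type(sql_type, column_size):
    long_types = {"SQL_LONGVARCHAR": "text", "SQL_WLONGVARCHAR": "ntext"}
    if sql_type in long_types:
        return long_types[sql_type]  # deprecated long types, regardless of size
    if sql_type == "SQL_VARCHAR":
        return "varchar(max)" if column_size == 0 else f"varchar({column_size})"
    if sql_type == "SQL_WVARCHAR":
        return "nvarchar(max)" if column_size == 0 else f"nvarchar({column_size})"
    raise ValueError(f"unhandled type: {sql_type}")
```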
It need not be an immediate breaking change. msodbcsql could accept a connection string parameter named "UseNVarcharMax", which would default to "no" and continue the current text/ntext behaviour. Precedent:
I'm sorry I'm just now getting back to this. I'm wondering if DAE should ever be used. I need to test with some populated DBs, but maybe it should default to off. What does everyone think of investigating that solution? Also, note that you can quickly test turning off DAE by setting `cnxn.maxwrite` to a large value. I would guess most drivers really don't care about DAE, so another option would be to default `maxwrite` to a large value. Thoughts?
In the original repro code, if I add

```python
cnxn.maxwrite = 1_000_000
```

immediately after the `connect` call, the error goes away. (Tested with pyodbc 4.0.31b51.)
It says "precision", and the DecimalDigits parameter to SQLBindParameter is 0 in the log, so that must be what it is complaining about. The documentation states that DecimalDigits is ignored when inserting character data:
Does it only happen when we cross the 4000 character boundary? @v-chojas @v-makouz Can either of you provide any insight here?
@mkleehammer Good to see you working on pyODBC again! When the length is nonzero and the SQL type is not a long type, the driver interprets that as a non-max type, and as we know a non-max nvarchar is restricted to 4000 at most. If you specify a length of 0, the driver interprets that to mean use a max type. Relevant documentation here: https://docs.microsoft.com/en-us/sql/relational-databases/native-client/features/using-large-value-types
The 4000 byte boundary, yes. With the above, `content = 2000 * 'A'` succeeds, while `content = 2001 * 'A'` fails.
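The arithmetic behind that boundary, assuming (per the trace discussion above) that pyodbc switches to the long binding once the parameter's UTF-16 byte length exceeds the 4000-byte non-long maximum the driver reported:

```python
LIMIT_BYTES = 4000  # driver-reported maximum for a non-long nvarchar binding

def uses_long_binding(s: str) -> bool:
    """True if, under the assumption above, the parameter would fall back to
    the long (ntext) binding. nvarchar parameters travel as UTF-16, so each
    BMP character costs 2 bytes and each supplementary character costs 4."""
    return len(s.encode("utf-16-le")) > LIMIT_BYTES

# 2000 'A's = 4000 bytes: still the short binding; 2001 crosses the line,
# which matches the observed 2000/2001 success/failure split.
```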
In an earlier comment @v-chojas said "while msodbcsql doesn't require the use of DAE for the long types, other drivers may".
Thanks to @gordthompson, we have some data to look at. Here is a summary of his tests.
This leads me to a few questions:
Here is what doesn't work with SQL Server:
I'm still not 100% certain of what does work:
@v-chojas's comments seem to say that the preferred approach is to use WVARCHAR with a 0 column size. Will this work with older SQL Server versions like 2000? A bigger problem is how do we know when to do this? It's not quite just avoiding DAE. Is it time to introduce a DBMS-type or dialect attribute with all of these settings? We'd define the generic default and then some DB-specific ones like "mssqlserver".
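One way the proposed attribute could look. Everything here is hypothetical API design for discussion, not existing pyodbc code; the field names are invented:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dialect:
    """Hypothetical per-DBMS knob bundle (names invented for discussion)."""
    name: str
    use_dae: bool = True                # fall back to data-at-execution for long params
    wvarchar_max_as_zero: bool = False  # bind long strings as SQL_WVARCHAR with size 0

GENERIC = Dialect("generic")
MSSQLSERVER = Dialect("mssqlserver", use_dae=False, wvarchar_max_as_zero=True)

DIALECTS = {d.name: d for d in (GENERIC, MSSQLSERVER)}

# Hypothetical usage:
# cnxn = pyodbc.connect(connection_string, dialect="mssqlserver")
```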
IMO this is what pyodbc should do, and I believe this mechanism can be used to help with issues like #134 as well (?)
I agree that it's not an ideal solution, but it may be the most practical approach, especially since vendors cannot be relied upon to fix defects in their ODBC drivers.
Bundling a bunch of DBMS-specific pyodbc parameters into a connection "dialect" does seem to be an elegant way of making pyodbc work with a variety of DBMSs that don't comply 100% with the ODBC standard. This dialect approach is somewhat like turbodbc or the built-in Python csv reader object, and works well in my experience. Perhaps pyodbc could even try to detect the DBMS on connection (from SQL_DBMS_NAME and SQL_DBMS_VER?) so clients don't need to specify it at all.
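Auto-detection could key off `SQLGetInfo`; pyodbc already exposes this as `Connection.getinfo(pyodbc.SQL_DBMS_NAME)`. A sketch of the lookup, where the substring-matching rules are my assumptions (driver wording varies and I have not tested every driver):

```python
def detect_dialect(dbms_name: str) -> str:
    """Map the driver-reported SQL_DBMS_NAME string to a dialect key.
    Matching is by lowercase substring since drivers vary in exact wording."""
    name = dbms_name.lower()
    if "microsoft sql server" in name:
        return "mssqlserver"
    if "postgres" in name:
        return "postgresql"
    return "generic"

# Live usage (illustrative, not run here):
# dialect = detect_dialect(cnxn.getinfo(pyodbc.SQL_DBMS_NAME))
```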
Good idea, but I would recommend going slow on that. AFAIK there is no standard naming convention for those sorts of attributes, so in doing so we could potentially be opting in to a perpetual game of catch-up with "the names of things", e.g., pyodbc/tests3/sqlservertests.py Lines 87 to 88 in 7c7b1b1
My preferred approach would be to keep the default behaviour and get the (hard-coded) profiles defined first, then try to make it automatic if/when the customers ask for it.
As reported in this comment, ODBC Driver 18 for SQL Server has apparently added just such a parameter, named `LongAsMax`.
Environment
Issue
I cannot insert strings with a length greater than 2000 into columns with a datatype of `varchar(max)` or `nvarchar(max)` using parametrised queries. In particular, the POC (see below) fails with the following output:

Every parametrised insert with a string having a length greater than 2000 fails with the above error message relating to a `text`/`ntext` conversion or `Latin1_General_100_CI_AS_SC` collation, which is strange considering that only plain ASCII is inserted. How can I resolve this issue and insert strings of any length into the database?
POC: