FEAT: Support for streaming large parameters in execute() #176
Pull Request Overview
This pull request adds support for Data At Execution (DAE) streaming of large parameters to SQL Server, particularly NVARCHAR(MAX) fields. The implementation streams large Unicode strings and binary data through ODBC's DAE mechanism instead of buffering them entirely in memory.
- Introduces DAE support for streaming large parameters (>4000 characters for strings, >8000 bytes for binary)
- Updates parameter type mapping to correctly calculate UTF-16 string sizes and identify streamable parameters
- Refactors the C++ bindings to handle the SQLParamData/SQLPutData execution flow for large data (see the usage sketch below)
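From the caller's side nothing changes; a parameter simply crosses the size threshold. A minimal sketch, assuming the package's pyodbc-style DB-API surface (`connect`/`cursor`/`execute`/`commit`) and a hypothetical `documents` table:

```python
import mssql_python

# Hypothetical connection string and table, for illustration only.
conn = mssql_python.connect("Server=localhost;Database=test;Trusted_Connection=yes;")
cursor = conn.cursor()

# 100,000 characters is well past the 4000-character threshold, so this
# parameter would be streamed via SQLParamData/SQLPutData rather than
# bound as one contiguous in-memory buffer.
large_text = "x" * 100_000
cursor.execute("INSERT INTO documents (body) VALUES (?)", large_text)
conn.commit()
```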
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| mssql_python/pybind/ddbc_bindings.h | Adds function pointer declarations for the SQLParamData and SQLPutData DAE APIs |
| mssql_python/pybind/ddbc_bindings.cpp | Implements DAE streaming logic, extends the ParamInfo struct, and handles the SQLParamData/SQLPutData execution flow |
| mssql_python/pybind/CMakeLists.txt | Removes the /WX compiler flag to make builds less strict on warnings |
| mssql_python/cursor.py | Updates parameter type mapping to return DAE flags and correctly calculate UTF-16 string sizes |
```cpp
const size_t chunkSize = 8192;
for (size_t offset = 0; offset < totalBytes; offset += chunkSize) {
    size_t len = std::min(chunkSize, totalBytes - offset);
```
Type mismatch: `totalBytes` is `SQLLEN` (signed) but `offset` is `size_t` (unsigned). This could cause issues when `totalBytes` is negative or when comparing signed/unsigned values.
Suggested change:

```cpp
size_t totalBytes_sz = static_cast<size_t>(utf16_str.size());
const size_t chunkSize = 8192;
for (size_t offset = 0; offset < totalBytes_sz; offset += chunkSize) {
    size_t len = std::min(chunkSize, totalBytes_sz - offset);
```
```python
# String mapping logic here
is_unicode = self._is_unicode_string(param)
# TODO: revisit
if len(param) > 4000:  # Long strings
    if is_unicode:
        utf16_len = len(param.encode("utf-16-le")) // 2
```
Encoding the string to calculate UTF-16 length is inefficient for large strings. Consider using a more efficient method to estimate the byte length without full encoding.
Suggested change:

```python
utf16_len = utf16_codeunits(param)
```
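The suggested `utf16_codeunits` helper is not defined anywhere in the diff; a minimal sketch of what it could look like, assuming the goal is to count UTF-16 code units without allocating the encoded copy:

```python
def utf16_codeunits(s: str) -> int:
    """Count UTF-16 code units without encoding the string.

    Each character in the Basic Multilingual Plane takes one code unit;
    each character above U+FFFF takes two (a surrogate pair).
    """
    return len(s) + sum(1 for ch in s if ord(ch) > 0xFFFF)
```

This still walks the string once, but it avoids materializing a bytes object twice the size of the input.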
```python
if is_unicode:  # Short Unicode strings
    utf16_len = len(param.encode("utf-16-le")) // 2
```
Duplicate encoding operation for UTF-16 length calculation. This encoding is performed even for short strings where it may be unnecessary overhead.
Suggested change:

```python
utf16_len = len(param)
```
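One caveat the review does not raise: `len(param)` counts code points, which equals the UTF-16 code-unit count only for BMP-only text, so this shortcut undercounts strings containing surrogate-pair characters such as emoji:

```python
s = "héllo"            # BMP-only: one code unit per character
assert len(s.encode("utf-16-le")) // 2 == len(s) == 5

s = "a\U0001F600b"     # U+1F600 needs a surrogate pair in UTF-16
assert len(s) == 3
assert len(s.encode("utf-16-le")) // 2 == 4   # len(s) undercounts by one
```

If such input is possible, a helper like the `utf16_codeunits` sketch above stays correct for short strings too.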
Work Item / Issue Reference
Summary
This pull request adds support for streaming large parameters to SQL Server using ODBC's Data At Execution (DAE) mechanism, particularly for long Unicode strings and binary data. The changes update both the Python and C++ layers to correctly identify large parameters, mark them for DAE, and handle the streaming process during execution. Additional refactoring improves parameter type mapping and memory handling for these cases.
Large parameter streaming (DAE) support:
- Updated the `_map_sql_type` method in `cursor.py` to return an `is_dae` flag for parameters that require streaming (e.g., long Unicode strings, long binary data), and to calculate the correct size for Unicode strings using UTF-16 encoding. [1] [2] [3] [4]
- Updated `_create_parameter_types_list` in `cursor.py` to set DAE-related fields (`isDAE`, `strLenOrInd`, `dataPtr`) in the parameter info when streaming is needed.

C++ bindings and execution logic:
- Extended the `ParamInfo` struct and its Python bindings to include the DAE fields (`isDAE`, `strLenOrInd`, `dataPtr`) for use during parameter binding and streaming. [1] [2]
- Added function pointers for the DAE APIs (`SQLParamData`, `SQLPutData`) and integrated their loading and usage into the driver handle setup. [1] [2] [3] [4]
- Updated the execution logic in `ddbc_bindings.cpp` to handle DAE parameters: if a parameter is marked for DAE, the code enters a loop to stream the data in chunks using `SQLParamData` and `SQLPutData`. This is done for large Unicode strings and (potentially) binary data. [1] [2] A sketch of this chunking pattern follows below.
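For illustration only, the chunk loop described above maps to this Python sketch; `put_data` is a hypothetical stand-in for the driver's SQLPutData call, and the 8 KB chunk size matches the diff hunk quoted earlier:

```python
CHUNK_SIZE = 8192  # bytes per put_data call, matching the C++ chunkSize

def stream_in_chunks(payload: bytes, put_data) -> None:
    """Feed payload to the driver one chunk at a time.

    Each invocation of put_data sends at most CHUNK_SIZE bytes, so the
    full payload never has to be bound as a single contiguous buffer.
    """
    for offset in range(0, len(payload), CHUNK_SIZE):
        put_data(payload[offset:offset + CHUNK_SIZE])
```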
Build and warning improvements: