
FEAT: Support for streaming large parameters in execute() #176

Open: wants to merge 3 commits into main from saumya/nvarcharmax-execute

Conversation

@gargsaumya gargsaumya (Contributor) commented Aug 13, 2025

Work Item / Issue Reference

AB#33395

GitHub Issue: #<ISSUE_NUMBER>


Summary

This pull request adds support for streaming large parameters to SQL Server using ODBC's Data At Execution (DAE) mechanism, particularly for long Unicode strings and binary data. The changes update both the Python and C++ layers to correctly identify large parameters, mark them for DAE, and handle the streaming process during execution. Additional refactoring improves parameter type mapping and memory handling for these cases.

Large parameter streaming (DAE) support:

  • Updated the _map_sql_type method in cursor.py to return an is_dae flag for parameters that require streaming (e.g., long Unicode strings and long binary data), and to calculate the correct size for Unicode strings from their UTF-16 code-unit length (see the sketch after this list).
  • Modified _create_parameter_types_list in cursor.py to set the DAE-related fields (isDAE, strLenOrInd, dataPtr) in the parameter info when streaming is needed.
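
The UTF-16 sizing matters because Python's len() counts Unicode code points, while wide-character ODBC data is UTF-16, where code points above U+FFFF take two code units (a surrogate pair), so the two counts can differ. The cursor.py change obtains the code-unit count via len(param.encode("utf-16-le")) // 2. For illustration only, a hypothetical helper computing the same count is sketched below in C++ alongside the native layer; Utf16CodeUnits is not part of this PR.

    #include <cstddef>
    #include <string>

    // Hypothetical helper (not in this PR): counts UTF-16 code units for a
    // sequence of Unicode code points. Code points above U+FFFF need a
    // surrogate pair, i.e. two UTF-16 code units, so the result can exceed
    // the code-point count.
    std::size_t Utf16CodeUnits(const std::u32string& text) {
        std::size_t units = 0;
        for (char32_t cp : text) {
            units += (cp > 0xFFFF) ? 2 : 1;
        }
        return units;
    }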

C++ bindings and execution logic:

  • Extended the ParamInfo struct and its Python bindings with the DAE fields (isDAE, strLenOrInd, dataPtr) used during parameter binding and streaming.
  • Added ODBC DAE API function pointers (SQLParamData, SQLPutData) and integrated their loading and use into the driver handle setup.
  • Refactored parameter binding and execution logic in ddbc_bindings.cpp to handle DAE parameters: when a parameter is marked for DAE, the code enters a loop that streams the data in chunks using SQLParamData and SQLPutData. This applies to large Unicode strings and, potentially, binary data (a sketch of the generic flow follows below).
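
For orientation, the standard ODBC data-at-execution pattern that SQLParamData/SQLPutData implement is sketched below. This is a minimal illustration under simplifying assumptions (a single DAE parameter, an already-prepared statement whose parameter was bound with an indicator of SQL_LEN_DATA_AT_EXEC); the StreamDaeParam helper and its arguments are hypothetical and not the actual ddbc_bindings.cpp code.

    #include <sql.h>
    #include <sqlext.h>
    #include <algorithm>

    // Minimal sketch of the ODBC data-at-execution (DAE) flow. Assumes hStmt is a
    // valid statement handle whose single DAE parameter was bound via
    // SQLBindParameter with an indicator of SQL_LEN_DATA_AT_EXEC(totalBytes).
    SQLRETURN StreamDaeParam(SQLHSTMT hStmt, const unsigned char* data, SQLLEN totalBytes) {
        SQLRETURN ret = SQLExecute(hStmt);        // returns SQL_NEED_DATA for DAE params
        while (ret == SQL_NEED_DATA) {
            SQLPOINTER token = nullptr;
            ret = SQLParamData(hStmt, &token);    // which DAE parameter wants data?
            if (ret != SQL_NEED_DATA) {
                break;                            // all parameters supplied; ret is the final status
            }
            // token identifies the parameter; with a single DAE parameter, stream `data`.
            const SQLLEN chunkSize = 8192;        // push the payload in 8 KB pieces
            for (SQLLEN offset = 0; offset < totalBytes; offset += chunkSize) {
                SQLLEN len = std::min(chunkSize, totalBytes - offset);
                SQLRETURN putRet = SQLPutData(hStmt, (SQLPOINTER)(data + offset), len);
                if (!SQL_SUCCEEDED(putRet)) {
                    return putRet;
                }
            }
            // The next SQLParamData call (top of the loop) terminates this parameter's
            // data and either requests the next DAE parameter or completes execution.
        }
        return ret;
    }

Binding with SQL_LEN_DATA_AT_EXEC (or SQL_DATA_AT_EXEC) instead of a real length is what makes SQLExecute return SQL_NEED_DATA and defer the data transfer to this loop.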

Build and warning improvements:

  • Adjusted MSVC compiler flags to remove the /WX ("treat warnings as errors") option, making builds less strict about warnings.

@Copilot Copilot AI review requested due to automatic review settings August 13, 2025 04:35
@github-actions github-actions bot added pr-size: medium Moderate update size labels Aug 13, 2025
@Copilot Copilot AI (Contributor) left a comment

Pull Request Overview

This pull request adds support for Data At Execution (DAE) streaming in SQL Server for large parameters, particularly NVARCHAR(MAX) fields. The implementation allows streaming of large Unicode strings and binary data through ODBC's DAE mechanism instead of buffering them entirely in memory.

  • Introduces DAE support for streaming large parameters (>4000 characters for strings, >8000 bytes for binary)
  • Updates parameter type mapping to correctly calculate UTF-16 string sizes and identify streamable parameters
  • Refactors C++ bindings to handle SQLParamData/SQLPutData execution flow for large data

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

  • mssql_python/pybind/ddbc_bindings.h: Adds function pointer declarations for the SQLParamData and SQLPutData DAE APIs
  • mssql_python/pybind/ddbc_bindings.cpp: Implements the DAE streaming logic, extends the ParamInfo struct, and handles the SQLParamData/SQLPutData execution flow
  • mssql_python/pybind/CMakeLists.txt: Removes the /WX compiler flag to make builds less strict about warnings
  • mssql_python/cursor.py: Updates parameter type mapping to return DAE flags and correctly calculate UTF-16 string sizes



    const size_t chunkSize = 8192;
    for (size_t offset = 0; offset < totalBytes; offset += chunkSize) {
        size_t len = std::min(chunkSize, totalBytes - offset);

Copilot AI Aug 13, 2025

Type mismatch: totalBytes is SQLLEN (signed) but offset is size_t (unsigned). This could cause issues when totalBytes is negative or when comparing signed/unsigned values.

Suggested change
size_t len = std::min(chunkSize, totalBytes - offset);
size_t totalBytes_sz = static_cast<size_t>(utf16_str.size());
const size_t chunkSize = 8192;
for (size_t offset = 0; offset < totalBytes_sz; offset += chunkSize) {
size_t len = std::min(chunkSize, totalBytes_sz - offset);

    # String mapping logic here
    is_unicode = self._is_unicode_string(param)
    # TODO: revisit
    if len(param) > 4000:  # Long strings
        if is_unicode:
            utf16_len = len(param.encode("utf-16-le")) // 2

Copilot AI Aug 13, 2025

Encoding the string to calculate UTF-16 length is inefficient for large strings. Consider using a more efficient method to estimate the byte length without full encoding.

Suggested change
utf16_len = len(param.encode("utf-16-le")) // 2
utf16_len = utf16_codeunits(param)

    if is_unicode:  # Short Unicode strings
        utf16_len = len(param.encode("utf-16-le")) // 2

Copilot AI Aug 13, 2025

Duplicate encoding operation for UTF-16 length calculation. This encoding is performed even for short strings where it may be unnecessary overhead.

Suggested change
utf16_len = len(param.encode("utf-16-le")) // 2
utf16_len = len(param)

@gargsaumya gargsaumya changed the title FEAT: NVARCHAR (MAX) support in execute() FEAT: Support for streaming large parameters in execute() Aug 13, 2025
@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: medium Moderate update size labels Aug 13, 2025
@microsoft microsoft deleted three comments from Copilot AI Aug 13, 2025
@gargsaumya gargsaumya force-pushed the saumya/nvarcharmax-execute branch from a17e198 to c0a91a2 Compare August 13, 2025 08:02
Labels
pr-size: medium Moderate update size

1 participant