Skip to content

Conversation

gargsaumya
Copy link
Contributor

@gargsaumya gargsaumya commented Sep 3, 2025

Work Item / Issue Reference

AB#38110
AB#34162

GitHub Issue: #<ISSUE_NUMBER>


Summary

This pull request introduces major improvements to the handling of variable-length string data (such as VARCHAR(MAX)) in both the core C++ bindings and the Python cursor layer, as well as comprehensive new tests to ensure correctness. The changes refactor how long strings are mapped and fetched, add robust streaming support for large values, and improve handling of edge cases like empty strings and NULL values.

Variable-length string handling (core logic):

  • Added the FetchLobColumnData helper in ddbc_bindings.cpp to correctly stream and assemble large variable-length column data (LOBs), including proper handling of null terminators and distinguishing between wide/narrow/binary types.
  • Updated SQLGetData_wrap to use streaming for columns with columnSize == SQL_NO_TOTAL, columnSize == 0, or columnSize > 8000, ensuring reliable retrieval of large VARCHAR(MAX) and similar fields. This replaces previous incomplete or error-prone logic.

Edge case and error handling:

  • Improved handling for empty strings and NULL values in column fetch logic, ensuring that empty strings are returned as empty Python strings and NULL as None, rather than raising exceptions or returning incorrect results.
  • Removed unnecessary exceptions for buffer size mismatches and now safely returns empty strings or None as appropriate. [1] [2]

Python cursor layer changes:

  • Refactored the SQL type mapping for long strings in _map_sql_type to use SQL_VARCHAR/SQL_WVARCHAR with a column size of zero, which triggers the correct LOB streaming behavior in the backend.
  • Commented out verbose parameter logging in execute to reduce noise and improve performance.

New and improved tests:

  • Added a suite of tests for VARCHAR(MAX) covering short strings, boundary conditions (8000 bytes), streaming of large values (8100 bytes and 100,000 bytes), empty string, NULL, and transaction rollback scenarios. These tests ensure that all edge cases are handled correctly.

These changes make the driver much more robust for applications that need to store and retrieve large or variable-length string data, and the new tests provide strong coverage for future development.

@Copilot Copilot AI review requested due to automatic review settings September 3, 2025 12:30
@github-actions github-actions bot added the pr-size: medium Moderate update size label Sep 3, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds comprehensive streaming support for VARCHAR(MAX) data types by introducing a new LOB (Large Object) streaming mechanism in the C++ bindings and updating the Python cursor layer to handle long strings more efficiently.

Key changes:

  • Implements streaming-based data retrieval for large VARCHAR(MAX) columns to handle values that exceed buffer limits
  • Refactors SQL type mapping to use zero column size for long strings, triggering proper LOB handling
  • Adds comprehensive test coverage for VARCHAR(MAX) scenarios including boundary conditions, large values, and edge cases

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
mssql_python/pybind/ddbc_bindings.cpp Adds FetchLobColumnData function for streaming large column data and updates SQLGetData_wrap to use streaming for VARCHAR(MAX)
mssql_python/cursor.py Updates _map_sql_type to use SQL_VARCHAR/SQL_WVARCHAR with zero column size for long strings
tests/test_004_cursor.py Adds comprehensive test suite for VARCHAR(MAX) covering various data sizes, edge cases, and transaction scenarios

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: medium Moderate update size labels Sep 3, 2025
@gargsaumya gargsaumya changed the title FEAT: adding streaming support in fetch for varcharmax type FEAT: streaming support in fetchone for varcharmax data type Sep 3, 2025
@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: medium Moderate update size labels Sep 3, 2025
Copy link
Contributor

@sumitmsft sumitmsft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments. Please resolve

LOG("Loop {}: Trimmed null terminator (narrow)", loopCount);
} else if (copyCount >= sizeof(wchar_t)) {
auto wcharBuf = reinterpret_cast<const wchar_t*>(chunk.data());
if (wcharBuf[(copyCount / sizeof(wchar_t)) - 1] == L'\0') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if copyCount is not an exact multiple of sizeof(wchar_t)? For example, if copyCount is 7 bytes, sizeof(wchar_t) is 2), dividing will give a non-integer result (gets truncated to 3 units, which is actually only 6 bytes: the last byte is left out).


if (isWideChar) {
std::wstring wstr(reinterpret_cast<const wchar_t*>(buffer.data()),
buffer.size() / sizeof(wchar_t));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if bufferis not an exact multiple of sizeof(wchar_t)?

return py::str("");
}

if (isWideChar) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any possibility of trailing null terminator here?

@gargsaumya gargsaumya force-pushed the saumya/streaming-fetchone branch from f6b7389 to e21b47e Compare September 10, 2025 07:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-size: medium Moderate update size
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants