FEAT: streaming support in fetchone for varcharmax data type #219
Pull Request Overview
This PR adds comprehensive streaming support for VARCHAR(MAX) data types by introducing a new LOB (Large Object) streaming mechanism in the C++ bindings and updating the Python cursor layer to handle long strings more efficiently.
Key changes:
- Implements streaming-based data retrieval for large VARCHAR(MAX) columns to handle values that exceed buffer limits
- Refactors SQL type mapping to use zero column size for long strings, triggering proper LOB handling
- Adds comprehensive test coverage for VARCHAR(MAX) scenarios including boundary conditions, large values, and edge cases
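The streaming approach described above can be modelled in a few lines. The sketch below is a pure-Python simulation, not the PR's C++ code: `make_reader` and `assemble_lob` are hypothetical names, and the reader only imitates a driver that fills a fixed-size buffer and null-terminates every chunk it returns.

```python
from typing import Callable, Tuple


def make_reader(payload: bytes, buffer_size: int,
                terminator: bytes = b"\x00") -> Callable[[], Tuple[bytes, bool]]:
    """Hypothetical stand-in for a driver call that fills a fixed buffer.

    Each call returns (chunk, more): the chunk ends with a terminator, and
    `more` says whether unread data remains.
    """
    room = buffer_size - len(terminator)  # space left for data in each chunk
    offset = 0

    def read_chunk() -> Tuple[bytes, bool]:
        nonlocal offset
        data = payload[offset:offset + room]
        offset += len(data)
        return data + terminator, offset < len(payload)

    return read_chunk


def assemble_lob(read_chunk: Callable[[], Tuple[bytes, bool]],
                 terminator: bytes = b"\x00") -> bytes:
    """Read chunks until the reader reports no more data, trimming terminators."""
    parts = []
    more = True
    while more:
        chunk, more = read_chunk()
        if chunk.endswith(terminator):
            chunk = chunk[:-len(terminator)]  # drop the per-chunk terminator
        parts.append(chunk)
    return b"".join(parts)


payload = b"x" * 8100  # larger than one buffer, so at least two reads are needed
result = assemble_lob(make_reader(payload, buffer_size=4096))
```

With a 4096-byte buffer the 8100-byte payload arrives in two chunks, and the assembled result should equal the original payload with no terminators embedded in it.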
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| mssql_python/pybind/ddbc_bindings.cpp | Adds FetchLobColumnData function for streaming large column data and updates SQLGetData_wrap to use streaming for VARCHAR(MAX) |
| mssql_python/cursor.py | Updates _map_sql_type to use SQL_VARCHAR/SQL_WVARCHAR with zero column size for long strings |
| tests/test_004_cursor.py | Adds comprehensive test suite for VARCHAR(MAX) covering various data sizes, edge cases, and transaction scenarios |
Left a few comments. Please resolve
LOG("Loop {}: Trimmed null terminator (narrow)", loopCount);
} else if (copyCount >= sizeof(wchar_t)) {
    auto wcharBuf = reinterpret_cast<const wchar_t*>(chunk.data());
    if (wcharBuf[(copyCount / sizeof(wchar_t)) - 1] == L'\0') {
What if copyCount is not an exact multiple of sizeof(wchar_t)? For example, if copyCount is 7 bytes and sizeof(wchar_t) is 2, the integer division truncates to 3 units, which covers only 6 bytes; the last byte is left out.
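The arithmetic in this comment can be checked directly. The sketch below models it in Python rather than C++; `trim_wide_terminator` is a hypothetical helper, the 2-byte unit size is the reviewer's example value, and raising on a misaligned count is only one possible policy, not necessarily what the PR should do.

```python
WCHAR_SIZE = 2  # assumption: 2-byte wide characters, as in the reviewer's example

# The reviewer's case: 7 bytes divided into 2-byte units.
units = 7 // WCHAR_SIZE        # integer division truncates
covered = units * WCHAR_SIZE   # bytes actually spanned by those units
# The index (units - 1) then refers to bytes 4..5, not to the final byte 6.


def trim_wide_terminator(chunk: bytes, copy_count: int,
                         unit: int = WCHAR_SIZE) -> int:
    """Return the byte count after dropping one trailing wide NUL, if present.

    Hypothetical policy: reject counts that are not a whole number of units
    instead of silently ignoring the leftover byte.
    """
    if copy_count % unit != 0:
        raise ValueError(f"{copy_count} bytes is not a multiple of {unit}")
    tail = chunk[copy_count - unit:copy_count]
    if copy_count >= unit and tail == b"\x00" * unit:
        return copy_count - unit
    return copy_count
```

An explicit alignment check like this makes the misaligned case visible; whether to raise, log, or carry the leftover byte into the next chunk is a design decision for the PR author.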
if (isWideChar) {
    std::wstring wstr(reinterpret_cast<const wchar_t*>(buffer.data()),
                      buffer.size() / sizeof(wchar_t));
What if the size of buffer is not an exact multiple of sizeof(wchar_t)?
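Because the element count passed to the std::wstring constructor comes from an integer division, a trailing partial unit would be dropped without any signal. A minimal Python model of that effect follows; `split_wide` is a hypothetical name, and the sketch assumes 2-byte little-endian UTF-16 units, which is not true of wchar_t on every platform.

```python
WCHAR_SIZE = 2  # assumption: UTF-16 code units of 2 bytes


def split_wide(buffer: bytes):
    """Split a byte buffer into (decoded text, leftover bytes).

    Only whole 2-byte units are decoded; any trailing partial unit is
    returned separately so the caller can detect it instead of losing it.
    """
    whole = (len(buffer) // WCHAR_SIZE) * WCHAR_SIZE
    return buffer[:whole].decode("utf-16-le"), buffer[whole:]


# Four bytes of valid UTF-16 followed by one stray byte.
text, leftover = split_wide("hi".encode("utf-16-le") + b"\x41")
```

A non-empty `leftover` here corresponds to the byte that the division in the snippet above would discard.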
    return py::str("");
}

if (isWideChar) {
Is there any possibility of a trailing null terminator here?
f6b7389 to e21b47e
Work Item / Issue Reference
Summary
This pull request improves the handling of variable-length string data (such as VARCHAR(MAX)) in both the core C++ bindings and the Python cursor layer, and adds new tests to verify correctness. The changes refactor how long strings are mapped and fetched, add streaming support for large values, and improve handling of edge cases such as empty strings and NULL values.

Variable-length string handling (core logic):
- Added a FetchLobColumnData helper in ddbc_bindings.cpp to stream and assemble large variable-length column data (LOBs), including handling of null terminators and distinguishing between wide, narrow, and binary types.
- Updated SQLGetData_wrap to use streaming for columns with columnSize == SQL_NO_TOTAL, columnSize == 0, or columnSize > 8000, so that large VARCHAR(MAX) and similar fields are retrieved reliably. This replaces previous incomplete or error-prone logic.

Edge case and error handling:
- Improved handling of empty strings and NULL values in column fetch logic, so that empty strings are returned as empty Python strings and NULL as None, rather than raising exceptions or returning incorrect results.
- … None as appropriate. [1] [2]

Python cursor layer changes:
- Updated _map_sql_type to use SQL_VARCHAR/SQL_WVARCHAR with a column size of zero, which triggers the LOB streaming behavior in the backend.
- … execute to reduce noise and improve performance.

New and improved tests:
- Added tests for VARCHAR(MAX) covering short strings, boundary conditions (8000 bytes), streaming of large values (8100 bytes and 100,000 bytes), the empty string, NULL, and transaction rollback scenarios.

These changes make the driver more robust for applications that store and retrieve large or variable-length string data, and the new tests provide coverage for future development.