Skip to content

Conversation

gargsaumya
Copy link
Contributor

@gargsaumya gargsaumya commented Sep 9, 2025

Work Item / Issue Reference

AB#33395

GitHub Issue: #<ISSUE_NUMBER>


Summary

This pull request significantly improves support for streaming and fetching large binary (VARBINARY(MAX)) and large text columns in the MSSQL Python driver. The main changes include robust chunked retrieval of large objects (LOBs), correct handling of edge cases (such as empty or null values), and enhanced test coverage for these scenarios.

LOB Streaming and Fetching Enhancements:

  • Added a new helper function FetchLobColumnData to efficiently stream and assemble large binary/text columns (LOBs) from the database, handling chunking, null/empty values, and correct type conversion for both binary and (wide/narrow) string columns.
  • Updated the logic in SQLGetData_wrap, FetchBatchData, FetchMany_wrap, and FetchAll_wrap to detect LOB columns and use the new streaming path for fetching them, including proper fallback to row-by-row fetching when LOBs are present. [1] [2] [3] [4] [5] [6]
  • Modified the batch fetch function signatures and logic to propagate LOB column information and ensure correct handling during bulk fetches.

Testing Improvements:

  • Replaced and expanded the test for large binary data with a new, comprehensive test (test_varbinarymax_insert_fetch) that verifies insertion and retrieval of empty, small, and large VARBINARY(MAX) values (including edge cases around the 8000-byte threshold) using fetchone, fetchall, and fetchmany.

These changes ensure that the driver can reliably handle large binary and text columns in all fetch scenarios, improving correctness and robustness for users working with LOB data.

@github-actions github-actions bot added the pr-size: medium Moderate update size label Sep 9, 2025
Copy link
Collaborator

@bewithgaurav bewithgaurav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoiding repeated comments from other PRs, just added some minor changes here
I believe this PR will merge at last since other changes will merge here

bewithgaurav
bewithgaurav previously approved these changes Sep 15, 2025
@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: medium Moderate update size labels Sep 15, 2025
@gargsaumya gargsaumya changed the base branch from saumya/streaming-varbinarymax to main September 15, 2025 09:36
@gargsaumya gargsaumya dismissed bewithgaurav’s stale review September 15, 2025 09:36

The base branch was changed.

@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: medium Moderate update size labels Sep 15, 2025
Copy link
Contributor

@sumitmsft sumitmsft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor fixes are required.

@Copilot Copilot AI review requested due to automatic review settings September 16, 2025 05:28
@gargsaumya gargsaumya force-pushed the saumya/streaming-varbinarymax-fetch branch from 4e5cfe8 to 2ea9862 Compare September 16, 2025 05:28
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds streaming support for VARBINARY(MAX) data types in the fetchone, fetchmany, and fetchall operations, significantly improving the driver's ability to handle large binary data. The changes enable proper insertion and retrieval of binary data larger than 8000 bytes, which previously had limitations.

  • Implemented chunked streaming retrieval for large binary columns (LOBs) using a new FetchLobColumnData helper function
  • Enhanced parameter binding to support Data-At-Execution (DAE) for large binary parameters
  • Updated fetch operations to detect LOB columns and use appropriate streaming vs buffered approaches

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
mssql_python/pybind/ddbc_bindings.cpp Core implementation of LOB streaming support, DAE parameter binding, and enhanced fetch operations
mssql_python/cursor.py Updated parameter mapping to use VARBINARY(MAX) with DAE for large binary data
tests/test_004_cursor.py Replaced limited test with comprehensive VARBINARY(MAX) tests covering various data sizes and fetch methods

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@gargsaumya gargsaumya force-pushed the saumya/streaming-varbinarymax-fetch branch from 2ea9862 to 85a4487 Compare September 16, 2025 05:31
@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: medium Moderate update size labels Sep 16, 2025
@gargsaumya gargsaumya force-pushed the saumya/streaming-varbinarymax-fetch branch from 7cee625 to e50de23 Compare September 16, 2025 09:49
@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: medium Moderate update size labels Sep 16, 2025
sumitmsft
sumitmsft previously approved these changes Sep 16, 2025
Copy link
Contributor

@sumitmsft sumitmsft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good.

@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: medium Moderate update size labels Sep 16, 2025
@gargsaumya gargsaumya merged commit 7d06c96 into main Sep 17, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-size: medium Moderate update size
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants