Skip to content

Conversation

bewithgaurav
Copy link
Collaborator

@bewithgaurav bewithgaurav commented Sep 3, 2025

Work Item / Issue Reference

AB#38480

GitHub Issue: #<ISSUE_NUMBER>


Summary

This pull request improves how binary data (Python bytes and bytearray) is handled in the MSSQL Python driver, especially for edge cases like empty values, mixed types, and large binaries. It also significantly expands the test suite to cover these scenarios and documents current driver limitations regarding parameter and fetch buffer sizes.

Binary Data Handling Improvements

  • Updated _map_sql_type in cursor.py to always use VARBINARY for Python bytes/bytearray, avoiding storage waste and ensuring correct handling of variable-length data. This removes previous logic that sometimes used fixed-length BINARY and simplifies type mapping.
  • Improved sample value selection in _select_best_sample_value to correctly handle columns that contain only binary types (bytes or bytearray).

Test Suite Enhancements for Binary Data

  • Fixed and clarified the test_longvarbinary test to expect the correct number of rows and removed assumptions about zero-padding in returned binary data.
  • Added new tests covering edge cases:
    • Executemany with binary data and empty byte arrays, ensuring correct insertion and retrieval of empty, regular, and null binary values.
    • Binary data over 8000 bytes, documenting driver limitations (8192 bytes for parameters, 4096 bytes for fetch buffer) and verifying correct error handling and data retrieval for supported sizes.
    • All empty binaries: verifies that multiple empty binary rows are handled and returned as zero-length bytes.
    • Mixing bytes and bytearray types in the same column, confirming consistent storage and retrieval as bytes.
    • Binary columns with mostly small/empty values and one large value, ensuring the large value is handled within driver limits.
    • Table with only NULL and empty binary values, verifying clear distinction between NULL and empty values and correct query results.

@Copilot Copilot AI review requested due to automatic review settings September 3, 2025 10:10
@github-actions github-actions bot added the pr-size: large Substantial code update label Sep 3, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes binary data handling in the mssql-python library, specifically addressing issues with empty strings and binary data that previously caused assertion failures. The fix improves UTF-16 conversion on Unix platforms and ensures proper handling of zero-length data.

  • Fixes assertion failures when handling empty strings and binary data
  • Improves UTF-16 string conversion for Unix platforms (Linux/macOS)
  • Updates SQL type mapping to use VARBINARY instead of BINARY for better compatibility

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
tests/test_004_cursor.py Adds comprehensive test cases for empty string/binary handling and fixes existing test expectations
mssql_python/pybind/ddbc_bindings.cpp Fixes core data handling logic for empty strings/binary and improves UTF-16 conversion
mssql_python/cursor.py Updates SQL type mapping to use VARBINARY and ensures minimum column sizes

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@github-actions github-actions bot added pr-size: large Substantial code update and removed pr-size: large Substantial code update labels Sep 3, 2025
@bewithgaurav bewithgaurav changed the title FIX: Binary data handling FIX: Binary data handling + tests Sep 4, 2025
@github-actions github-actions bot added pr-size: large Substantial code update and removed pr-size: large Substantial code update labels Sep 4, 2025
@github-actions github-actions bot removed the pr-size: large Substantial code update label Sep 4, 2025
@github-actions github-actions bot added pr-size: small Minimal code update and removed pr-size: small Minimal code update labels Sep 4, 2025
@bewithgaurav bewithgaurav changed the title FIX: Binary data handling + tests FIX: Binary data padding + tests Sep 4, 2025
@github-actions github-actions bot added pr-size: small Minimal code update and removed pr-size: small Minimal code update labels Sep 4, 2025
@github-actions github-actions bot added pr-size: small Minimal code update and removed pr-size: small Minimal code update labels Sep 4, 2025
Copy link
Contributor

@sumitmsft sumitmsft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments

@bewithgaurav bewithgaurav changed the base branch from bewithgaurav/fix_executemany to main September 8, 2025 06:24
@github-actions github-actions bot added pr-size: small Minimal code update and removed pr-size: small Minimal code update labels Sep 8, 2025
@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: small Minimal code update labels Sep 8, 2025
@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: medium Moderate update size labels Sep 8, 2025
@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: medium Moderate update size labels Sep 8, 2025
@bewithgaurav bewithgaurav merged commit 9f17c10 into main Sep 8, 2025
19 checks passed
@bewithgaurav bewithgaurav linked an issue Sep 8, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-size: medium Moderate update size
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Binary Data Padding - extra characters getting added
4 participants