Skip to content

Conversation

bewithgaurav
Copy link
Collaborator

@bewithgaurav bewithgaurav commented Oct 6, 2025

Work Item / Issue Reference

GitHub Issue: Closes #278


Summary

This pull request introduces robust validation for access tokens used with SQL Server authentication to prevent ODBC driver crashes caused by malformed tokens. The changes ensure that only properly structured tokens are passed to the driver, with comprehensive unit tests verifying both protection against invalid tokens and allowance of valid ones. Key updates span Python and C++ code, constant definitions, and test coverage.

Access Token Validation and Safety Enhancements

  • Added validate_access_token_struct in auth.py to check the integrity of ACCESSTOKEN structures before passing them to the ODBC driver, preventing crashes from malformed tokens. Validation includes size checks, structure integrity, non-empty data, and UTF-16LE encoding verification.
  • Integrated access token validation into the connection initialization logic in connection.py, raising clear exceptions if invalid tokens are provided via attrs_before.

Constants and Driver Integration

  • Defined SQL_COPT_SS_ACCESS_TOKEN (value 1256) in constants.py and corresponding C++ code to standardize the access token attribute for connections. [1] [2]
  • Ensured safe handling of token buffers in the C++ setAttribute method, using heap-allocated strings for access token data.

Test Coverage and Crash Prevention

  • Added comprehensive tests in test_008_auth.py to verify that malformed tokens are rejected (preventing driver crashes) and that validly structured tokens are allowed through (even if authentication fails), using subprocess isolation to test crash scenarios.

@github-actions github-actions bot added the pr-size: medium Moderate update size label Oct 6, 2025
@bewithgaurav bewithgaurav marked this pull request as ready for review October 6, 2025 12:10
@Copilot Copilot AI review requested due to automatic review settings October 6, 2025 12:10
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds validation of access token structures to prevent ODBC driver crashes caused by malformed tokens. Key additions include a new validator in auth.py, integration of validation during connection initialization, a new constant for the access token attribute, and comprehensive tests for invalid and valid token scenarios.

  • Introduces validate_access_token_struct with structural and encoding checks.
  • Validates access tokens passed via attrs_before in Connection.init.
  • Adds tests ensuring malformed tokens are rejected and well-formed ones proceed.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
tests/test_008_auth.py Adds subprocess-based tests for rejecting malformed access tokens and allowing valid ones.
mssql_python/constants.py Introduces SQL_COPT_SS_ACCESS_TOKEN constant (1256).
mssql_python/connection.py Integrates access token validation during connection initialization.
mssql_python/auth.py Adds validate_access_token_struct implementing structural and encoding checks.

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +298 to +302
result = subprocess.run(
[sys.executable, "-c", code],
capture_output=True,
text=True
)
Copy link

Copilot AI Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sys is used but never imported in this test function, which will raise a NameError before the subprocess runs. Add import sys near the other imports at the start of the function.

Copilot uses AI. Check for mistakes.

Comment on lines +382 to +386
result = subprocess.run(
[sys.executable, "-c", code],
capture_output=True,
text=True
)
Copy link

Copilot AI Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sys is referenced here but not imported in this function, causing a NameError. Add import sys alongside the other local imports.

Copilot uses AI. Check for mistakes.

Comment on lines +35 to +36
declared_size = struct.unpack('<I', token_struct[:4])[0]

Copy link

Copilot AI Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct is used but not imported in this module, which will raise a NameError when this function is called. Add import struct at the top of the file.

Copilot uses AI. Check for mistakes.

(b"", "empty token"),
(b"x" * 3, "too small (< 4 bytes)"),
(b"\x00\x00\x00\x00", "header only, no data"),
(b"\x10\x00\x00\x00" + b"\x00" * 16, "size mismatch (declares 16, total 20)"),
Copy link

Copilot AI Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The description claims a size mismatch, but a 16-byte declared size plus the 4-byte header correctly equals 20 bytes; this case actually fails due to all-zero data. Update the description or adjust the test data (e.g., declare 8 but provide 16) for clarity.

Suggested change
(b"\x10\x00\x00\x00" + b"\x00" * 16, "size mismatch (declares 16, total 20)"),
(b"\x08\x00\x00\x00" + b"\x00" * 16, "size mismatch (declares 8, total 20)"),

Copilot uses AI. Check for mistakes.

Copy link

github-actions bot commented Oct 6, 2025

📊 Code Coverage Report

🔥 Diff Coverage

11%


🎯 Overall Coverage

74%


📈 Total Lines Covered: 4223 out of 5632
📁 Project: mssql-python


Diff Coverage

Diff: main...HEAD, staged and unstaged changes

  • mssql_python/auth.py (5.9%): Missing lines 29-30,35,38-41,47-49,53-54,60-61,67-68
  • mssql_python/connection.py (12.5%): Missing lines 154-160
  • mssql_python/constants.py (100%)

Summary

  • Total: 26 lines
  • Missing: 23 lines
  • Coverage: 11%

mssql_python/auth.py

Lines 25-45

  25     Raises:
  26         ValueError: If the token structure is invalid
  27     """
  28     # Check minimum size (4-byte header + data)
! 29     if len(token_struct) < 4:
! 30         raise ValueError(
  31             f"Invalid access token: minimum 4 bytes required for ACCESSTOKEN structure, got {len(token_struct)} bytes"
  32         )
  33     
  34     # Extract declared size from first 4 bytes
! 35     declared_size = struct.unpack('<I', token_struct[:4])[0]
  36     
  37     # Validate structure integrity
! 38     total_size = len(token_struct)
! 39     expected_size = declared_size + 4
! 40     if expected_size != total_size:
! 41         raise ValueError(
  42             f"Invalid access token: size mismatch in ACCESSTOKEN structure. "
  43             f"Header declares {declared_size} bytes, but structure has {total_size - 4} bytes of data"
  44         )
  45     

Lines 43-58

  43             f"Header declares {declared_size} bytes, but structure has {total_size - 4} bytes of data"
  44         )
  45     
  46     # Validate token data is not empty/all zeros
! 47     token_data = token_struct[4:]
! 48     if not any(token_data):
! 49         raise ValueError("Invalid access token: token data is empty or all zeros")
  50     
  51     # Validate UTF-16LE encoding (ODBC driver requirement)
  52     # JWT tokens in UTF-16LE have null bytes interleaved with ASCII characters
! 53     if declared_size % 2 != 0:
! 54         raise ValueError(
  55             f"Invalid access token: must be UTF-16LE encoded (got odd byte length {declared_size})"
  56         )
  57     
  58     # Check for UTF-16LE pattern: ASCII characters with interleaved null bytes

Lines 56-65

  56         )
  57     
  58     # Check for UTF-16LE pattern: ASCII characters with interleaved null bytes
  59     # Real JWTs start with "eyJ" in UTF-16LE: 65 00 79 00 4A 00
! 60     if declared_size >= 6:
! 61         has_utf16_pattern = all([
  62             0x20 <= token_data[0] <= 0x7E and token_data[1] == 0,  # First char
  63             0x20 <= token_data[2] <= 0x7E and token_data[3] == 0,  # Second char
  64             0x20 <= token_data[4] <= 0x7E and token_data[5] == 0   # Third char
  65         ])

Lines 63-72

  63             0x20 <= token_data[2] <= 0x7E and token_data[3] == 0,  # Second char
  64             0x20 <= token_data[4] <= 0x7E and token_data[5] == 0   # Third char
  65         ])
  66         
! 67         if not has_utf16_pattern:
! 68             raise ValueError(
  69                 "Invalid access token: must be UTF-16LE encoded JWT. "
  70                 "Expected alternating ASCII and null bytes (e.g., 'e\\x00y\\x00J\\x00' for 'eyJ')"
  71             )

mssql_python/connection.py

Lines 150-164

  150         self._attrs_before = attrs_before or {}
  151         
  152         # Validate access token if provided directly via attrs_before
  153         if ConstantsDDBC.SQL_COPT_SS_ACCESS_TOKEN.value in self._attrs_before:
! 154             from mssql_python.auth import validate_access_token_struct
! 155             token_struct = self._attrs_before[ConstantsDDBC.SQL_COPT_SS_ACCESS_TOKEN.value]
! 156             if isinstance(token_struct, (bytes, bytearray)):
! 157                 try:
! 158                     validate_access_token_struct(bytes(token_struct))
! 159                 except ValueError as e:
! 160                     raise ValueError(f"Invalid access token in attrs_before: {e}") from e
  161 
  162         # Initialize encoding settings with defaults for Python 3
  163         # Python 3 only has str (which is Unicode), so we use utf-16le by default
  164         self._encoding_settings = {


📋 Files Needing Attention

📉 Files with overall lowest coverage (click to expand)
mssql_python.ddbc_bindings.py: 68.5%
mssql_python.pybind.ddbc_bindings.cpp: 69.3%
mssql_python.auth.py: 70.6%
mssql_python.cursor.py: 79.6%
mssql_python.pybind.connection.connection.cpp: 79.9%
mssql_python.connection.py: 80.3%
mssql_python.helpers.py: 84.7%
mssql_python.type.py: 86.8%
mssql_python.pooling.py: 88.8%
mssql_python.exceptions.py: 90.4%

🔗 Quick Links

⚙️ Build Summary 📋 Coverage Details

View Azure DevOps Build

Browse Full Coverage Report

@jahnvi480
Copy link
Contributor

@bewithgaurav Is it safe to read the token provided by the user in our code?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-size: medium Moderate update size
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Invalid access tokens crash ODBC driver - need defensive validation
2 participants