-
Notifications
You must be signed in to change notification settings - Fork 25
FIX: Validate access tokens to prevent crashes #279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds validation of access token structures to prevent ODBC driver crashes caused by malformed tokens. Key additions include a new validator in auth.py, integration of validation during connection initialization, a new constant for the access token attribute, and comprehensive tests for invalid and valid token scenarios.
- Introduces validate_access_token_struct with structural and encoding checks.
- Validates access tokens passed via attrs_before in Connection.init.
- Adds tests ensuring malformed tokens are rejected and well-formed ones proceed.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
tests/test_008_auth.py | Adds subprocess-based tests for rejecting malformed access tokens and allowing valid ones. |
mssql_python/constants.py | Introduces SQL_COPT_SS_ACCESS_TOKEN constant (1256). |
mssql_python/connection.py | Integrates access token validation during connection initialization. |
mssql_python/auth.py | Adds validate_access_token_struct implementing structural and encoding checks. |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
result = subprocess.run( | ||
[sys.executable, "-c", code], | ||
capture_output=True, | ||
text=True | ||
) |
Copilot
AI
Oct 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sys is used but never imported in this test function, which will raise a NameError before the subprocess runs. Add import sys near the other imports at the start of the function.
Copilot uses AI. Check for mistakes.
result = subprocess.run( | ||
[sys.executable, "-c", code], | ||
capture_output=True, | ||
text=True | ||
) |
Copilot
AI
Oct 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sys is referenced here but not imported in this function, causing a NameError. Add import sys alongside the other local imports.
Copilot uses AI. Check for mistakes.
declared_size = struct.unpack('<I', token_struct[:4])[0] | ||
|
Copilot
AI
Oct 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
struct is used but not imported in this module, which will raise a NameError when this function is called. Add import struct at the top of the file.
Copilot uses AI. Check for mistakes.
(b"", "empty token"), | ||
(b"x" * 3, "too small (< 4 bytes)"), | ||
(b"\x00\x00\x00\x00", "header only, no data"), | ||
(b"\x10\x00\x00\x00" + b"\x00" * 16, "size mismatch (declares 16, total 20)"), |
Copilot
AI
Oct 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The description claims a size mismatch, but a 16-byte declared size plus the 4-byte header correctly equals 20 bytes; this case actually fails due to all-zero data. Update the description or adjust the test data (e.g., declare 8 but provide 16) for clarity.
(b"\x10\x00\x00\x00" + b"\x00" * 16, "size mismatch (declares 16, total 20)"), | |
(b"\x08\x00\x00\x00" + b"\x00" * 16, "size mismatch (declares 8, total 20)"), |
Copilot uses AI. Check for mistakes.
📊 Code Coverage Report
Diff CoverageDiff: main...HEAD, staged and unstaged changes
Summary
mssql_python/auth.pyLines 25-45 25 Raises:
26 ValueError: If the token structure is invalid
27 """
28 # Check minimum size (4-byte header + data)
! 29 if len(token_struct) < 4:
! 30 raise ValueError(
31 f"Invalid access token: minimum 4 bytes required for ACCESSTOKEN structure, got {len(token_struct)} bytes"
32 )
33
34 # Extract declared size from first 4 bytes
! 35 declared_size = struct.unpack('<I', token_struct[:4])[0]
36
37 # Validate structure integrity
! 38 total_size = len(token_struct)
! 39 expected_size = declared_size + 4
! 40 if expected_size != total_size:
! 41 raise ValueError(
42 f"Invalid access token: size mismatch in ACCESSTOKEN structure. "
43 f"Header declares {declared_size} bytes, but structure has {total_size - 4} bytes of data"
44 )
45 Lines 43-58 43 f"Header declares {declared_size} bytes, but structure has {total_size - 4} bytes of data"
44 )
45
46 # Validate token data is not empty/all zeros
! 47 token_data = token_struct[4:]
! 48 if not any(token_data):
! 49 raise ValueError("Invalid access token: token data is empty or all zeros")
50
51 # Validate UTF-16LE encoding (ODBC driver requirement)
52 # JWT tokens in UTF-16LE have null bytes interleaved with ASCII characters
! 53 if declared_size % 2 != 0:
! 54 raise ValueError(
55 f"Invalid access token: must be UTF-16LE encoded (got odd byte length {declared_size})"
56 )
57
58 # Check for UTF-16LE pattern: ASCII characters with interleaved null bytes Lines 56-65 56 )
57
58 # Check for UTF-16LE pattern: ASCII characters with interleaved null bytes
59 # Real JWTs start with "eyJ" in UTF-16LE: 65 00 79 00 4A 00
! 60 if declared_size >= 6:
! 61 has_utf16_pattern = all([
62 0x20 <= token_data[0] <= 0x7E and token_data[1] == 0, # First char
63 0x20 <= token_data[2] <= 0x7E and token_data[3] == 0, # Second char
64 0x20 <= token_data[4] <= 0x7E and token_data[5] == 0 # Third char
65 ]) Lines 63-72 63 0x20 <= token_data[2] <= 0x7E and token_data[3] == 0, # Second char
64 0x20 <= token_data[4] <= 0x7E and token_data[5] == 0 # Third char
65 ])
66
! 67 if not has_utf16_pattern:
! 68 raise ValueError(
69 "Invalid access token: must be UTF-16LE encoded JWT. "
70 "Expected alternating ASCII and null bytes (e.g., 'e\\x00y\\x00J\\x00' for 'eyJ')"
71 ) mssql_python/connection.pyLines 150-164 150 self._attrs_before = attrs_before or {}
151
152 # Validate access token if provided directly via attrs_before
153 if ConstantsDDBC.SQL_COPT_SS_ACCESS_TOKEN.value in self._attrs_before:
! 154 from mssql_python.auth import validate_access_token_struct
! 155 token_struct = self._attrs_before[ConstantsDDBC.SQL_COPT_SS_ACCESS_TOKEN.value]
! 156 if isinstance(token_struct, (bytes, bytearray)):
! 157 try:
! 158 validate_access_token_struct(bytes(token_struct))
! 159 except ValueError as e:
! 160 raise ValueError(f"Invalid access token in attrs_before: {e}") from e
161
162 # Initialize encoding settings with defaults for Python 3
163 # Python 3 only has str (which is Unicode), so we use utf-16le by default
164 self._encoding_settings = { 📋 Files Needing Attention📉 Files with overall lowest coverage (click to expand)mssql_python.ddbc_bindings.py: 68.5%
mssql_python.pybind.ddbc_bindings.cpp: 69.3%
mssql_python.auth.py: 70.6%
mssql_python.cursor.py: 79.6%
mssql_python.pybind.connection.connection.cpp: 79.9%
mssql_python.connection.py: 80.3%
mssql_python.helpers.py: 84.7%
mssql_python.type.py: 86.8%
mssql_python.pooling.py: 88.8%
mssql_python.exceptions.py: 90.4% 🔗 Quick Links
|
@bewithgaurav Is it safe to read the token provided by the user in our code? |
Work Item / Issue Reference
Summary
This pull request introduces robust validation for access tokens used with SQL Server authentication to prevent ODBC driver crashes caused by malformed tokens. The changes ensure that only properly structured tokens are passed to the driver, with comprehensive unit tests verifying both protection against invalid tokens and allowance of valid ones. Key updates span Python and C++ code, constant definitions, and test coverage.
Access Token Validation and Safety Enhancements
validate_access_token_struct
inauth.py
to check the integrity of ACCESSTOKEN structures before passing them to the ODBC driver, preventing crashes from malformed tokens. Validation includes size checks, structure integrity, non-empty data, and UTF-16LE encoding verification.connection.py
, raising clear exceptions if invalid tokens are provided viaattrs_before
.Constants and Driver Integration
SQL_COPT_SS_ACCESS_TOKEN
(value 1256) inconstants.py
and corresponding C++ code to standardize the access token attribute for connections. [1] [2]setAttribute
method, using heap-allocated strings for access token data.Test Coverage and Crash Prevention
test_008_auth.py
to verify that malformed tokens are rejected (preventing driver crashes) and that validly structured tokens are allowed through (even if authentication fails), using subprocess isolation to test crash scenarios.