Entra ID access token is acquired on every connect(), even on a pool hit #659

Description

@sdebruyn

Summary

With Entra ID authentication (e.g. Authentication=ActiveDirectoryDefault), Connection.__init__ acquires an access token on every connect(), before the native connection pool is consulted. On a pool hit the freshly acquired token is never used: the pooled physical connection is already authenticated, and the pool keys only on the sanitized connection string, so the token in attrs_before is not reapplied. The token acquisition and struct packing are therefore wasted work on every reused connection.

This partially defeats the purpose of pooling for token-auth workloads: pooling is enabled to avoid per-connection cost, yet a token is still materialized for each connection.

Where (v1.10.0)

mssql_python/connection.py, Connection.__init__:

# token acquired unconditionally, before the pool is consulted
sanitized = remove_sensitive_params(parsed_params)
self.connection_str = _ConnectionStringBuilder(sanitized).build()
token = get_auth_token(auth_type, credential_kwargs)      # <-- always runs
if token:
    self._attrs_before[ConstantsDDBC.SQL_COPT_SS_ACCESS_TOKEN.value] = token

# ... later ...

# pool checkout happens here, in the C layer, keyed on connection_str
if not PoolingManager.is_initialized():
    PoolingManager.enable()
self._pooling = PoolingManager.is_enabled()
self._conn = ddbc_bindings.Connection(
    self.connection_str, self._pooling, self._attrs_before
)

get_auth_token -> AADAuth._acquire_token reuses a cached credential instance, but still calls credential.get_token("https://database.windows.net/.default") and get_token_struct() (UTF-16-LE encode + struct.pack) on every call. azure-identity serves the token from its own in-memory cache while it is still valid, so this is not a full network round-trip each time, but it is per-connection CPU work whose result is discarded on a pool hit.
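To make the per-call packing cost concrete, here is a minimal sketch of the kind of work described above. It assumes the layout the ODBC driver documents for SQL_COPT_SS_ACCESS_TOKEN (a 4-byte little-endian length followed by the UTF-16-LE token bytes). pack_access_token is a hypothetical name, not the library's get_token_struct, whose exact implementation may differ.

```python
import struct


def pack_access_token(token: str) -> bytes:
    """Hypothetical helper: pack a token as length-prefixed UTF-16-LE bytes.

    Assumed layout: 4-byte little-endian unsigned length, then the token
    encoded as UTF-16-LE.
    """
    token_bytes = token.encode("utf-16-le")
    return struct.pack(f"<I{len(token_bytes)}s", len(token_bytes), token_bytes)


# Placeholder string, not a real token.
packed = pack_access_token("abc")
```

Real access tokens are typically a few kilobytes long, so each call allocates and copies a buffer of that size even when the result is never used.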

Evidence

A dlt pipeline loading many tables to a Fabric Warehouse with Authentication=ActiveDirectoryDefault and native pooling enabled:

  • The native pool is active and reusing connections: SQL_ATTR_RESET_CONNECTION (pool checkout reset) is logged ~212 times, with a single real cold login (~2.9s) followed by a uniform ~0.28s per open.
  • Yet "get_token: Azure AD token acquired successfully" is logged once per connect() (146 token acquisitions for 146 opens, exactly 1:1), i.e. a token is produced even for the reused connections.

Impact

  • Wasted CPU per connection (token struct packing + credential.get_token bookkeeping) exactly in the high-frequency, short-connection scenario that pooling is meant to optimize.
  • Makes it harder to reason about pooling from the caller side: every open still performs an auth step, so a pooled checkout is indistinguishable from a fresh login by wall-clock.

Possible direction

Consult the pool before acquiring/materializing the token, and only acquire when a new physical connection will actually be opened (pool miss). This depends on the pool-key/identity work tracked in #651: the pool currently cannot tell the caller whether a checkout will reuse or open a connection, and it cannot safely reuse a token-auth connection across identities. If the pool became identity-aware (per #651), a pool hit for the same identity could skip token acquisition entirely.
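A rough illustration of that direction, as a toy pure-Python model rather than the actual C-layer pool: the token provider is invoked only on a pool miss, and the pool key includes an identity component as #651 would require. Every name here (LazyTokenPool, FakePhysicalConnection, PoolKey, token_provider) is hypothetical and exists only for this sketch.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical pool key: (sanitized connection string, identity).
PoolKey = Tuple[str, str]


class FakePhysicalConnection:
    """Stand-in for an already-authenticated physical connection."""

    def __init__(self, key: PoolKey, token: bytes) -> None:
        self.key = key
        self.token = token  # only needed once, at login time


class LazyTokenPool:
    """Toy pool that acquires a token only when it must open a connection."""

    def __init__(self, token_provider: Callable[[], bytes]) -> None:
        self._idle: Dict[PoolKey, List[FakePhysicalConnection]] = {}
        self._token_provider = token_provider
        self.token_calls = 0

    def checkout(self, key: PoolKey) -> FakePhysicalConnection:
        idle = self._idle.get(key)
        if idle:
            # Pool hit: reuse the connection, no token is acquired.
            return idle.pop()
        # Pool miss: a new login is needed, so acquire the token now.
        self.token_calls += 1
        return FakePhysicalConnection(key, self._token_provider())

    def checkin(self, conn: FakePhysicalConnection) -> None:
        self._idle.setdefault(conn.key, []).append(conn)


pool = LazyTokenPool(token_provider=lambda: b"placeholder-token")
key = ("Server=example;Database=db", "identity-a")
for _ in range(5):
    conn = pool.checkout(key)
    pool.checkin(conn)
```

In this model, five sequential open/close cycles for the same key call the token provider once. A real implementation would also need to handle token expiry and concurrent checkouts, which the sketch ignores.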

Related: #651 (pool identity separation), #580 (reducing per-connect parsing overhead).

Labels

area: connectivity-auth (Connection lifecycle, Entra/SP/NTLM auth, tokens, TLS, conn-string parsing, Fabric endpoints), enhancement (New feature or request), inADO
