
[Bug] Retry handling does not close retried responses, which leaks httpx connections #564

@Kashyap456


Environment

  • Python Version: 3.12.12
  • kiota-http version: 1.10.1
  • OS: Debian GNU/Linux 12 (bookworm), running in Kubernetes
  • Consumed via: msgraph-sdk 1.58.0 (msgraph-core 1.4.0)
  • HTTP stack: httpx 0.27.2 / httpcore 1.0.9

Seen during long, retry-heavy Microsoft Graph ingestions via msgraph-sdk-python.

Describe the bug

RetryHandler.send retries on 429/503/504, but it reassigns response to the
next attempt's response without closing the previous one. The handler runs at
the transport/middleware layer, below AsyncClient.send, which only auto-reads
and closes the final response. So every response that is superseded by a retry
is a streamed, unread response whose connection is never returned to the pool.
Over a long-running, retry-heavy workload the connection pool is exhausted and
subsequent requests fail with httpx.PoolTimeout.
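To make the mechanism concrete, here is a toy model with no httpx or kiota involved, in which every response holds one connection slot until aclose() is called. ToyPool, ToyResponse, leaky_retry, run_leaky_demo and the stand-in PoolTimeout are hypothetical names invented for this sketch. The loop only approximates the shape of RetryHandler.send described above, assuming it simply reassigns response on each retry (Retry-After delays and other retry conditions are ignored).

```python
import asyncio

RETRYABLE = {429, 503, 504}


class PoolTimeout(Exception):
    """Hypothetical stand-in for httpx.PoolTimeout in this toy model."""


class ToyResponse:
    def __init__(self, pool, status_code):
        self._pool = pool
        self.status_code = status_code
        self.is_closed = False

    async def aclose(self):
        # Closing hands the connection slot back to the pool (only once).
        if not self.is_closed:
            self.is_closed = True
            self._pool.release()


class ToyPool:
    """Toy pool: each handed-out response occupies one slot until closed."""

    def __init__(self, max_connections, pool_timeout=0.1):
        self._slots = asyncio.Semaphore(max_connections)
        self._timeout = pool_timeout
        self.handed_out = []

    async def send(self):
        # Wait for a free slot, loosely modelling httpx's pool timeout.
        try:
            await asyncio.wait_for(self._slots.acquire(), self._timeout)
        except asyncio.TimeoutError:
            raise PoolTimeout("no free connection in the pool") from None
        response = ToyResponse(self, status_code=429)  # server always answers 429
        self.handed_out.append(response)
        return response

    def release(self):
        self._slots.release()


async def leaky_retry(pool, max_retries=3):
    """The buggy shape: each superseded response is dropped without aclose()."""
    response = await pool.send()
    retries = 0
    while response.status_code in RETRYABLE and retries < max_retries:
        retries += 1
        response = await pool.send()  # the previous response keeps its slot
    return response


async def run_leaky_demo(max_connections=2):
    pool = ToyPool(max_connections)
    try:
        await leaky_retry(pool)
        raised = False
    except PoolTimeout:
        raised = True
    return raised, [r.is_closed for r in pool.handed_out]


print(asyncio.run(run_leaky_demo()))  # (True, [False, False])
```

With max_connections=2 the third send() cannot get a slot, which mirrors the repro below: two unclosed responses, then a pool timeout.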

This is the same class of bug that was fixed for RedirectHandler in
microsoft/kiota-http-python#299 (PR microsoft/kiota-http-python#300), where the
fix was an await response.aclose() before discarding the old response.

To Reproduce

Self-contained, end-to-end repro against a real connection pool: a local server
returns a retryable 429, the default kiota middleware retries, and a wrapping
transport records the responses the pool hands back. The retried responses are
never closed (is_closed == False), so their connections are never returned to
the pool, and once it is exhausted the request raises httpx.PoolTimeout:

import asyncio
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import httpx
from kiota_http.kiota_client_factory import KiotaClientFactory

PORT = 8807


class QuietServer(ThreadingHTTPServer):
    def handle_error(self, request, client_address):
        pass  # silence benign ConnectionReset when the client drops the socket


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive, so the pool tracks the connection

    def log_message(self, *a):
        pass

    def do_GET(self):
        body = b"{}"
        self.send_response(429)  # retryable -> kiota RetryHandler retries
        self.send_header("Retry-After", "0")
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


class RecordingTransport(httpx.AsyncBaseTransport):
    """Captures every response from the real pool so we can inspect it after."""

    def __init__(self, inner):
        self.inner = inner
        self.responses = []

    async def handle_async_request(self, request):
        response = await self.inner.handle_async_request(request)
        self.responses.append(response)
        return response

    async def aclose(self):
        await self.inner.aclose()


async def main():
    threading.Thread(
        target=QuietServer(("127.0.0.1", PORT), Handler).serve_forever,
        daemon=True,
    ).start()

    # real pool of 2; record the responses the pool hands back
    recorder = RecordingTransport(
        httpx.AsyncHTTPTransport(
            limits=httpx.Limits(max_connections=2, max_keepalive_connections=2)
        )
    )
    client = httpx.AsyncClient(transport=recorder, timeout=httpx.Timeout(5.0, pool=2.0))
    client = KiotaClientFactory.create_with_default_middleware(client=client)

    try:
        await client.get(f"http://127.0.0.1:{PORT}/")
        print("completed without error (unexpected)")
    except httpx.PoolTimeout:
        print("retried responses is_closed:", [r.is_closed for r in recorder.responses])
        print("httpx.PoolTimeout raised once the pool is exhausted by the leaked "
              "(unclosed) connections")
    finally:
        await client.aclose()


asyncio.run(main())

Output:

retried responses is_closed: [False, False]
httpx.PoolTimeout raised once the pool is exhausted by the leaked (unclosed) connections

The leaked connections stay checked out but idle: nothing ever reads from them,
so no ReadTimeout fires to release them. In production the leak is gradual:
with a larger pool, each retried response leaks one connection until the pool
is exhausted.
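A back-of-the-envelope model of the gradual case, assuming requests run one at a time, each request makes a fixed number of retried (leaked) attempts followed by one final attempt whose connection the client does return, and nothing else holds connections. The function name is hypothetical and the model is a simplification, not a measurement.

```python
def requests_before_exhaustion(pool_size, retries_per_request=1):
    """Count sequential requests that complete before a toy pool runs dry.

    Each request makes `retries_per_request` superseded attempts that are
    never closed (leaked), then one final attempt that the client closes.
    """
    if retries_per_request < 1:
        raise ValueError("without retries nothing leaks, so the pool never runs dry")
    leaked = 0
    completed = 0
    while True:
        for in_flight in range(retries_per_request + 1):
            # All attempts of the current request are checked out at once.
            if leaked + in_flight >= pool_size:
                return completed  # this checkout would hit the pool timeout
        # The superseded attempts stay checked out; the final one is returned.
        leaked += retries_per_request
        completed += 1


print(requests_before_exhaustion(10))    # 9
print(requests_before_exhaustion(2, 3))  # 0, matching the repro's pool of 2
```

In closed form this is max(0, (pool_size - 1) // retries_per_request) requests; concurrent requests would typically exhaust the pool sooner.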

Proposed Fix
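Add an await response.aclose() call after line 97 of the retry handler, so that each response superseded by a retry is closed (returning its connection to the pool) before it is discarded, mirroring the RedirectHandler fix. I created this issue per CONTRIBUTING.md and will follow up with a PR referencing it.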
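A sketch of the proposed change in the same kind of toy model, self-contained and with hypothetical names (this is not kiota's actual RetryHandler code, and it ignores Retry-After delays and other retry conditions). The only difference from a leaky loop is the aclose() on the superseded response before the next attempt.

```python
import asyncio

RETRYABLE = {429, 503, 504}


class PoolTimeout(Exception):
    """Hypothetical stand-in for httpx.PoolTimeout in this toy model."""


class ToyResponse:
    def __init__(self, pool, status_code):
        self._pool = pool
        self.status_code = status_code
        self.is_closed = False

    async def aclose(self):
        # Closing hands the connection slot back to the pool (only once).
        if not self.is_closed:
            self.is_closed = True
            self._pool.release()


class ToyPool:
    def __init__(self, max_connections, pool_timeout=0.1):
        self._slots = asyncio.Semaphore(max_connections)
        self._timeout = pool_timeout
        self.handed_out = []

    async def send(self):
        try:
            await asyncio.wait_for(self._slots.acquire(), self._timeout)
        except asyncio.TimeoutError:
            raise PoolTimeout("no free connection in the pool") from None
        response = ToyResponse(self, status_code=429)  # server always answers 429
        self.handed_out.append(response)
        return response

    def release(self):
        self._slots.release()


async def send_with_retries(pool, max_retries=3):
    """Retry loop that closes each superseded response before the next attempt."""
    response = await pool.send()
    retries = 0
    while response.status_code in RETRYABLE and retries < max_retries:
        retries += 1
        await response.aclose()  # the fix: free the connection before discarding
        response = await pool.send()
    return response  # the final response is left for the client to read/close


async def run_fixed_demo(max_connections=2):
    pool = ToyPool(max_connections)
    final = await send_with_retries(pool)
    states = [r.is_closed for r in pool.handed_out]
    await final.aclose()  # stands in for the client closing the final response
    return final.status_code, states


print(asyncio.run(run_fixed_demo()))  # (429, [True, True, True, False])
```

Closing before the next attempt, rather than after it, keeps the slot free while the handler would wait out any Retry-After delay. Even a pool of 1 now survives all retries; the final response stays open because AsyncClient.send is what reads or closes it.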

Metadata

  • Assignees: No one assigned
  • Labels: No labels
  • Type: No type
  • Projects: status Done ✔️
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests