Skip to content

[Bug] Non-streaming calls silently hang forever behind NAT — default httpx transport has no TCP keepalive #3269

@gsagrawal-binocs

Description

@gsagrawal-binocs

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

Summary

Non-streaming OpenAI API calls (Responses API with medium/high reasoning, long completions.create() calls) hang indefinitely when run behind a NAT gateway. The OpenAI server successfully generates the response — visible in the dashboard — but the client never receives it and blocks forever.

Root Cause

The default httpx transport used by the SDK has no TCP keepalive (SO_KEEPALIVE is off). During a long non-streaming call, the TCP connection sits idle while the server generates. NAT gateways silently drop idle TCP connections after their idle timeout:

NAT type | Typical idle timeout -- | -- AWS NAT Gateway | ~350 s GCP Cloud NAT | ~120 s Home routers / ISP NAT | 60–300 s

With o-series and GPT-5.x models under high reasoning, server-side generation routinely takes 300–700 s. The NAT drops the connection mid-generation, the client receives no error (TCP is stateless — neither side knows the connection is dead without keepalive probes), and the call hangs indefinitely.

This affects any deployment behind NAT — EKS, ECS, Cloud Run, GKE, and even local development behind a home router.

Evidence

Running the reproducer below from a home network with gpt-5.4 + high reasoning:

  • OpenAI dashboard: call completed successfully, generation time ~606 s
  • Client (no keepalive): hung indefinitely — TCP connection was silently dropped by home router NAT at ~350 s idle; response never delivered despite being ready on the server
  • Client (with keepalive): completed successfully in ~606 s

Fix

Enable TCP keepalive on the default httpx transport in _DefaultHttpxClient and _DefaultAsyncHttpxClient in _base_client.py using kwargs.setdefault — so any caller-supplied custom transport is completely unaffected:

kwargs.setdefault("transport", httpx.AsyncHTTPTransport(socket_options=_build_keepalive_socket_options()))

This is identical to what the Anthropic Python SDK already does.

Workaround(until patched)

Pass a keepalive-enabled http_client explicitly as shown in the reproducer below.

PR

Fix submitted in: https://github.com//pull/3270

To Reproduce

"""
Run from behind any NAT (home router, cloud VM, k8s pod).
Both calls fire concurrently — [keepalive] succeeds, [no-keepalive] hangs forever.

Usage: OPENAI_API_KEY=sk-... python -u reproducer_nat_timeout.py
"""

Code snippets

import asyncio, socket, time, httpx
from openai import AsyncOpenAI

MODEL = "gpt-5.4"
PROMPT = (
    "You are a senior financial analyst. Write an exhaustive, deeply researched "
    "investment memo (minimum 8,000 words) covering market sizing, competitive "
    "landscape, unit economics, risks, and a DCF valuation for a hypothetical "
    "B2B SaaS company targeting the enterprise data-observability market. "
    "Cite real industry benchmarks throughout."
)

async def call_without_keepalive() -> None:
    client = AsyncOpenAI()  # default transport — no TCP keepalive
    t0 = time.monotonic()
    print("[no-keepalive] START")
    try:
        await client.responses.create(
            model=MODEL, reasoning={"effort": "high"}, input=PROMPT, stream=False
        )
        print(f"[no-keepalive] OK in {time.monotonic()-t0:.0f}s")
    except Exception as e:
        print(f"[no-keepalive] FAILED after {time.monotonic()-t0:.0f}s: {type(e).__name__}: {e}")

def _make_keepalive_client() -> httpx.AsyncClient:
    opts: list[tuple[int, int, int]] = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
    if hasattr(socket, "TCP_KEEPIDLE"):
        opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60))
    elif hasattr(socket, "TCP_KEEPALIVE"):
        opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, 60))
    if hasattr(socket, "TCP_KEEPINTVL"):
        opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60))
    if hasattr(socket, "TCP_KEEPCNT"):
        opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5))
    return httpx.AsyncClient(transport=httpx.AsyncHTTPTransport(socket_options=opts))

async def call_with_keepalive() -> None:
    client = AsyncOpenAI(http_client=_make_keepalive_client())
    t0 = time.monotonic()
    print("[keepalive]    START")
    try:
        await client.responses.create(
            model=MODEL, reasoning={"effort": "high"}, input=PROMPT, stream=False
        )
        print(f"[keepalive]    OK in {time.monotonic()-t0:.0f}s")
    except Exception as e:
        print(f"[keepalive]    FAILED after {time.monotonic()-t0:.0f}s: {type(e).__name__}: {e}")

async def main() -> None:
    print(f"Model: {MODEL}  reasoning: high")
    print("Both calls fired concurrently — watch [keepalive] succeed while [no-keepalive] hangs.\n")
    await asyncio.gather(call_without_keepalive(), call_with_keepalive())

asyncio.run(main())

Actual output:

Model: gpt-5.4 reasoning: high
Both calls fired concurrently — watch [keepalive] succeed while [no-keepalive] hangs.

[no-keepalive] START
[keepalive] START
[keepalive] OK in 604s ← completes
[no-keepalive] START ← still hanging...

OS

macOS

Python version

3.11

Library version

2.37.0

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions