[Bug] Non-streaming calls silently hang forever behind NAT — default httpx transport has no TCP keepalive

### Confirm this is an issue with the Python library and not an underlying OpenAI API

- [x] This is an issue with the Python library

### Describe the bug

<h2>Summary</h2><p>Non-streaming OpenAI API calls (Responses API with medium/high reasoning, long <code>completions.create()</code> calls) hang indefinitely when run behind a NAT gateway. The OpenAI server successfully generates the response — visible in the dashboard — but the client never receives it and blocks forever.</p>
<h2>Root Cause</h2><p>The default httpx transport used by the SDK has <strong>no TCP keepalive</strong> (<code>SO_KEEPALIVE</code> is off). During a long non-streaming call, the TCP connection sits idle while the server generates. NAT gateways silently drop idle TCP connections after their idle timeout:</p>
NAT type | Typical idle timeout
-- | --
AWS NAT Gateway | ~350 s
GCP Cloud NAT | ~120 s
Home routers / ISP NAT | 60–300 s

<p>With o-series and GPT-5.x models under high reasoning, server-side generation routinely takes <strong>300–700 s</strong>. The NAT drops the connection mid-generation, the client receives no error (TCP is stateless — neither side knows the connection is dead without keepalive probes), and the call hangs indefinitely.</p>
<p><strong>This affects any deployment behind NAT — EKS, ECS, Cloud Run, GKE, and even local development behind a home router.</strong></p>
<h2>Evidence</h2>
<p>Running the reproducer below from a home network with <code >gpt-5.4</code> + high reasoning:</p>
<ul>
<li><strong>OpenAI dashboard:</strong><span> </span>call completed successfully, generation time<span> </span><strong>~606 s</strong></li>
<li><strong>Client (no keepalive):</strong><span> </span>hung indefinitely — TCP connection was silently dropped by home router NAT at ~350 s idle; response never delivered despite being ready on the server</li><li><strong>Client (with keepalive):</strong><span> </span>completed successfully in ~606 s</li>
</ul>

<h2>Fix </h2>
<p>Enable TCP keepalive on the default httpx transport in <code>_DefaultHttpxClient</code> and <code>_DefaultAsyncHttpxClient</code> in <code>_base_client.py</code> using <code>kwargs.setdefault</code> — so any caller-supplied custom transport is completely unaffected:


```
kwargs.setdefault("transport", httpx.AsyncHTTPTransport(socket_options=_build_keepalive_socket_options()))
```

This is identical to what the [Anthropic Python SDK already does](https://github.com/anthropics/anthropic-sdk-python/blob/main/src/anthropic/_base_client.py).


<h2>Workaround(until patched) </h2>
Pass a keepalive-enabled http_client explicitly as shown in the reproducer below.

<h2>PR</h2> 
Fix submitted in: https://github.com/openai/openai-python/pull/3270 

### To Reproduce

"""
Run from behind any NAT (home router, cloud VM, k8s pod).
Both calls fire concurrently — [keepalive] succeeds, [no-keepalive] hangs forever.

Usage: OPENAI_API_KEY=sk-... python -u reproducer_nat_timeout.py
"""

### Code snippets

```Python
import asyncio, socket, time, httpx
from openai import AsyncOpenAI

MODEL = "gpt-5.4"
PROMPT = (
    "You are a senior financial analyst. Write an exhaustive, deeply researched "
    "investment memo (minimum 8,000 words) covering market sizing, competitive "
    "landscape, unit economics, risks, and a DCF valuation for a hypothetical "
    "B2B SaaS company targeting the enterprise data-observability market. "
    "Cite real industry benchmarks throughout."
)

async def call_without_keepalive() -> None:
    client = AsyncOpenAI()  # default transport — no TCP keepalive
    t0 = time.monotonic()
    print("[no-keepalive] START")
    try:
        await client.responses.create(
            model=MODEL, reasoning={"effort": "high"}, input=PROMPT, stream=False
        )
        print(f"[no-keepalive] OK in {time.monotonic()-t0:.0f}s")
    except Exception as e:
        print(f"[no-keepalive] FAILED after {time.monotonic()-t0:.0f}s: {type(e).__name__}: {e}")

def _make_keepalive_client() -> httpx.AsyncClient:
    opts: list[tuple[int, int, int]] = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
    if hasattr(socket, "TCP_KEEPIDLE"):
        opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60))
    elif hasattr(socket, "TCP_KEEPALIVE"):
        opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, 60))
    if hasattr(socket, "TCP_KEEPINTVL"):
        opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60))
    if hasattr(socket, "TCP_KEEPCNT"):
        opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5))
    return httpx.AsyncClient(transport=httpx.AsyncHTTPTransport(socket_options=opts))

async def call_with_keepalive() -> None:
    client = AsyncOpenAI(http_client=_make_keepalive_client())
    t0 = time.monotonic()
    print("[keepalive]    START")
    try:
        await client.responses.create(
            model=MODEL, reasoning={"effort": "high"}, input=PROMPT, stream=False
        )
        print(f"[keepalive]    OK in {time.monotonic()-t0:.0f}s")
    except Exception as e:
        print(f"[keepalive]    FAILED after {time.monotonic()-t0:.0f}s: {type(e).__name__}: {e}")

async def main() -> None:
    print(f"Model: {MODEL}  reasoning: high")
    print("Both calls fired concurrently — watch [keepalive] succeed while [no-keepalive] hangs.\n")
    await asyncio.gather(call_without_keepalive(), call_with_keepalive())

asyncio.run(main())
```
### Actual output:
Model: gpt-5.4  reasoning: high
Both calls fired concurrently — watch [keepalive] succeed while [no-keepalive] hangs.

[no-keepalive] START
[keepalive]    START
[keepalive]    OK in 604s        ← completes
[no-keepalive] START             ← still hanging...

### OS

macOS

### Python version

3.11

### Library version

2.37.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] Non-streaming calls silently hang forever behind NAT — default httpx transport has no TCP keepalive #3269

Confirm this is an issue with the Python library and not an underlying OpenAI API

Describe the bug

Summary

Root Cause

Evidence

Fix

Workaround(until patched)

PR

To Reproduce

Code snippets

Actual output:

OS

Python version

Library version

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Bug] Non-streaming calls silently hang forever behind NAT — default httpx transport has no TCP keepalive #3269

Description

Confirm this is an issue with the Python library and not an underlying OpenAI API

Describe the bug

Summary

Root Cause

Evidence

Fix

Workaround(until patched)

PR

To Reproduce

Code snippets

Actual output:

OS

Python version

Library version

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions