asyncio and native threads timeout handling fix #230

Merged
parfeon merged 7 commits into `master` from `fix/asyncio-error-handling-and-test-cleanup`
Mar 26, 2026

Conversation

@parfeon
Contributor

@parfeon parfeon commented Mar 24, 2026

fix(asyncio): fix error propagation in async request path

Ensure `PubNubAsyncioException` always carries a valid `PNStatus` with error data instead of `None`.

fix(asyncio): fix `PubNubAsyncioException.__str__` crash

Handle cases where status or `error_data` is `None` instead of raising `AttributeError`.
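
The None-handling described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's code: `Status`, `ErrorData`, and `SketchAsyncioException` are hypothetical stand-ins, and the `"unknown error"` fallback text is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorData:  # hypothetical stand-in for the SDK's error payload
    information: str


@dataclass
class Status:  # hypothetical stand-in for PNStatus
    error_data: Optional[ErrorData] = None


class SketchAsyncioException(Exception):
    """Illustrative only; not the real PubNubAsyncioException."""

    def __init__(self, result=None, status: Optional[Status] = None):
        super().__init__()
        self.result = result
        self.status = status

    def __str__(self) -> str:
        # Guard each hop so a missing status or error_data cannot raise AttributeError.
        error_data = getattr(self.status, "error_data", None)
        information = getattr(error_data, "information", None)
        return str(information) if information is not None else "unknown error"
```

With this shape, `str(exc)` is safe to call in logging paths regardless of how the exception was constructed.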

fix(event-engine): fix error type checks in effects

Match `PubNubAsyncioException` which is what `request_future` actually returns on failure.

fix(event-engine): fix give-up logic for unlimited retries

Handle `-1` (unlimited) correctly: `attempts > -1` was always true, which caused an immediate give-up.
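
A minimal sketch of the corrected check, assuming a helper named `should_give_up` (hypothetical) and a strict `>` comparison for finite limits; the SDK's actual comparison may differ.

```python
UNLIMITED = -1  # sentinel value named in the description above


def should_give_up(attempts: int, max_retries: int) -> bool:
    """Return True once the retry budget is spent; never for unlimited retries."""
    if max_retries == UNLIMITED:
        # Without this branch, `attempts > -1` holds for every attempt count.
        return False
    return attempts > max_retries
```

The sentinel is tested before any numeric comparison, so unlimited retries can never be mistaken for an exhausted budget.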

fix(event-engine): initialize heartbeat max retry attempts

Use the delay class defaults instead of the config value, which could be `None` and cause a `TypeError` on comparison.

fix(event-engine): add missing return after heartbeat give-up

Prevent falling through to start a heartbeat after deciding to give up.
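
The two heartbeat fixes (a non-`None` retry limit and the early return) might look like the sketch below. The function name, the `events` list, and the default of 10 are hypothetical and only serve the illustration.

```python
from typing import List, Optional


def run_delayed_heartbeat(
    attempts: int,
    configured_max: Optional[int],
    events: List[str],
    default_max: int = 10,  # hypothetical stand-in for the delay class default
) -> None:
    # Fall back to a default so the comparison below never sees None.
    max_retries = default_max if configured_max is None else configured_max

    if max_retries != -1 and attempts > max_retries:
        events.append("give_up")
        return  # without this return, the heartbeat below would still start

    events.append("heartbeat")
```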

fix(request-handlers): use explicit `httpx.Timeout` object

Set all four timeout fields explicitly instead of a 2-tuple that left write and pool unset.
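
For reference, an explicit `httpx.Timeout` with all four fields set looks like the fragment below. The values are placeholders: 310 s echoes the subscribe timeout mentioned in this PR, and the 10 s figures are assumptions rather than the SDK's settings.

```python
import httpx

# All four phases are named explicitly, so none of them is left as None.
timeout = httpx.Timeout(connect=10.0, read=310.0, write=10.0, pool=10.0)

# The object can then be handed to a client, e.g. httpx.Client(timeout=timeout).
```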

fix(request-handlers): enforce wall-clock deadline to survive system sleep

On macOS and Linux, `time.monotonic()` does not advance during system sleep, causing socket and
`asyncio` timeouts (310s subscribe) to stall for hours of wall-clock time. Add `time.time()`-based
deadline checks that detect sleep and cancel stale requests within ~5s of wake.

fix(asyncio): replace `asyncio.wait_for` with wall-clock-aware loop

Use `asyncio.wait()` with periodic `time.time()` checks instead of a single monotonic-based
`wait_for()`, yielding to the event loop between checks.
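
A rough sketch of such a wall-clock-aware wait, under stated assumptions: the helper name, the injectable `clock` parameter, and the 5 s default interval are illustrative and not taken from the SDK.

```python
import asyncio
import time


async def wait_with_wall_clock_deadline(coro, timeout, check_interval=5.0, clock=time.time):
    """Await `coro`, timing out against wall-clock time instead of the loop's monotonic clock."""
    deadline = clock() + timeout
    task = asyncio.ensure_future(coro)
    try:
        while True:
            remaining = deadline - clock()
            if remaining <= 0:
                raise asyncio.TimeoutError("wall-clock deadline exceeded")
            # Wait at most check_interval so a clock jump (e.g. after system sleep)
            # is noticed on the next pass; the event loop keeps running meanwhile.
            done, _ = await asyncio.wait({task}, timeout=min(check_interval, remaining))
            if done:
                return task.result()  # re-raises the task's exception, if any
    finally:
        if not task.done():
            task.cancel()
```

Passing `clock` as a parameter makes it possible to simulate a clock jump in tests without actually suspending the machine.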

fix(native-threads): add `WallClockDeadlineWatchdog`

Persistent single daemon thread monitors `time.time()` every 5s and closes the `httpx` session
when the wall-clock deadline passes, interrupting the blocking socket read. Tracks deadlines
per calling thread so concurrent requests (e.g., subscribe + publish) don't interfere. Only armed
for long-timeout requests (>30s). Session is recreated for subsequent requests.
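
The watchdog idea can be sketched as below. This is an assumption-laden illustration, not the SDK's `WallClockDeadlineWatchdog`: the class name, the `on_expired` callback, and the `arm`/`disarm` methods are hypothetical.

```python
import threading
import time
from typing import Callable, Dict


class DeadlineWatchdogSketch:
    """Illustrative only: one daemon thread, one wall-clock deadline per calling thread."""

    def __init__(self, on_expired: Callable[[int], None], interval: float = 5.0, clock=time.time):
        self._on_expired = on_expired
        self._interval = interval
        self._clock = clock
        self._deadlines: Dict[int, float] = {}  # calling thread id -> wall-clock deadline
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def arm(self, timeout: float) -> None:
        with self._lock:
            self._deadlines[threading.get_ident()] = self._clock() + timeout

    def disarm(self) -> None:
        with self._lock:
            self._deadlines.pop(threading.get_ident(), None)

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() has been called.
        while not self._stop.wait(self._interval):
            now = self._clock()
            with self._lock:
                expired = [tid for tid, deadline in self._deadlines.items() if now >= deadline]
                for tid in expired:
                    del self._deadlines[tid]
            # Call back outside the lock, e.g. to close the session used by that thread.
            for tid in expired:
                self._on_expired(tid)
```

A caller would `arm()` before a long blocking request and `disarm()` in a `finally` block; keying deadlines by thread id is what keeps a short publish from disturbing a long subscribe.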

test(wall-clock-deadline): add unit tests for sleep detection

Cover both `asyncio` and threads paths: simulated clock jumps, normal passthrough, clean watchdog
shutdown, per-thread deadline isolation, concurrent request independence, cleanup, and exception
propagation.

test(native-threads): add try/finally cleanup to subscribe tests

Ensure `pubnub.stop()` always runs to prevent non-daemon threads from blocking process exit.
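
The cleanup pattern is a plain `try`/`finally`. In the sketch below `FakePubNub` is a hypothetical stand-in; only the `stop()` call mirrors what the description names.

```python
class FakePubNub:
    """Hypothetical stand-in for a client that owns background threads."""

    def __init__(self):
        self.stopped = False

    def subscribe(self, channel: str) -> None:
        raise RuntimeError("simulated test failure")

    def stop(self) -> None:
        self.stopped = True


def run_subscribe_test(pubnub) -> None:
    try:
        pubnub.subscribe("example-channel")
    finally:
        # Runs on success, assertion failure, or exception alike.
        pubnub.stop()
```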

test(native-threads): fix flaky where_now and here_now tests

Enable presence heartbeat and use unique channel names so presence registers on the server.

test(file-upload): fix shared state leak in file upload tests

Restore `cipher_key` after use in `send_file` and pass it explicitly to `download_file`.

test(message-actions): use unique channel names

Avoid collisions with stale data from prior test runs.
@parfeon parfeon self-assigned this Mar 24, 2026
@parfeon parfeon added the `status: in progress`, `priority: medium`, and `type: fix` labels Mar 24, 2026
@pubnub-ops-terraform

pubnub-ops-terraform commented Mar 24, 2026

Snyk checks have passed. No issues have been found so far.

| Status | Scan Engine | Critical | High | Medium | Low | Total (0) |
|---|---|---|---|---|---|---|
| | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
| | Licenses | 0 | 0 | 0 | 0 | 0 issues |


@parfeon parfeon changed the title asyncio and native threads timeout handling fix asyncio and native threads timeout handling fix Mar 24, 2026
…t during sleep

Session was never recreated after the wall-clock watchdog closed it, causing all reconnection
`/time/0` calls to fail permanently with "client has been closed".

fix(native-threads): use socket.shutdown to immediately unblock reads on macOS

On macOS, `socket.close()` from another thread does not interrupt a blocked `recv()`. Use
`socket.shutdown(SHUT_RDWR)` on the raw TCP socket to unblock within seconds instead of ~25 minutes.
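
The effect of `shutdown()` on a blocked reader can be demonstrated with the standard library alone. This sketch uses `socket.socketpair()` as a stand-in for the real TCP connection, and the function name is hypothetical.

```python
import socket
import threading
import time


def shutdown_unblocks_recv(join_timeout: float = 5.0) -> bytes:
    """Block a thread in recv(), then wake it from another thread via shutdown()."""
    reader_sock, peer_sock = socket.socketpair()
    received = []

    def blocking_reader() -> None:
        # Blocks until data arrives or the socket is shut down.
        received.append(reader_sock.recv(1024))

    thread = threading.Thread(target=blocking_reader, daemon=True)
    thread.start()
    time.sleep(0.1)  # give the reader time to block inside recv()

    # shutdown() signals end-of-stream to the blocked recv(), which returns b"".
    reader_sock.shutdown(socket.SHUT_RDWR)
    thread.join(join_timeout)
    still_blocked = thread.is_alive()

    reader_sock.close()
    peer_sock.close()
    if still_blocked:
        raise RuntimeError("reader was not unblocked")
    return received[0]
```

The reader sees an empty read (end of stream) rather than an exception, so the caller still has to translate that into a timeout or connection error of its own.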

fix(reconnect): classify wall-clock sleep timeouts as unexpected disconnect

Sleep-induced timeouts were mapped to `PNTimeoutCategory` which triggers an immediate silent restart,
bypassing the reconnection manager. Map them to `PNUnexpectedDisconnectCategory` so all paths use the
configured retry policy (`exponential`/`linear`).

build(deps): pin `httpx<1.0` for internal attribute stability

The socket shutdown fix accesses `httpcore` private attributes to reach the raw TCP socket. Pin the upper
bound to prevent silent breakage on major version changes; access is wrapped in a try/except fallback.
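
One way to keep private-attribute access from failing hard is a guarded attribute walk like the sketch below. The helper and the attribute names in the usage note are hypothetical; the actual `httpcore` attribute path is not shown in this PR description.

```python
from typing import Any, Iterable, Optional


def resolve_attr_path(root: Any, path: Iterable[str]) -> Optional[Any]:
    """Follow a chain of attribute names; return None if any link is missing."""
    obj = root
    try:
        for name in path:
            obj = getattr(obj, name)
    except AttributeError:
        # Internals changed: let the caller fall back to a plain close().
        return None
    return obj
```

A caller might try `resolve_attr_path(session, ("_a", "_b", "_sock"))` (placeholder names) and fall back to closing the session normally when the result is `None`.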

test(wall-clock-deadline): add tests for sleep/wake reconnection fixes

Cover session recreation, `PNERR_CONNECTION_ERROR` mapping, `WallClockTimeoutError` classification,
socket shutdown attribute path, and graceful degradation when `httpcore` internals change.
@parfeon parfeon marked this pull request as ready for review March 25, 2026 14:42
@parfeon parfeon requested a review from jguz-pubnub as a code owner March 25, 2026 14:42
parfeon added 2 commits March 25, 2026 16:50
Python 3.13.2's `VERIFY_X509_STRICT` rejects some certifi CA certs,
breaking `httpx.Client()` init. Using minor-only versions lets
setup-python resolve to latest patches automatically.
@parfeon
Contributor Author

parfeon commented Mar 26, 2026

@pubnub-release-bot release

pubnub-release-bot and others added 2 commits March 26, 2026 14:31
Defer `httpx.Client()` creation to first use and map watchdog timeout to `PNTimeoutCategory` for
silent subscribe restart instead of announcing unexpected disconnect.
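
Deferred creation plus recreation after a forced close can be sketched as a small lazy holder. `LazySession` and its `factory` argument are hypothetical; in the SDK the factory would presumably be whatever builds the `httpx.Client`.

```python
import threading
from typing import Any, Callable, Optional


class LazySession:
    """Illustrative only: build the session on first use, rebuild after invalidation."""

    def __init__(self, factory: Callable[[], Any]):
        self._factory = factory
        self._session: Optional[Any] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        with self._lock:
            if self._session is None:
                # First use, or first use after invalidate(): create a fresh session.
                self._session = self._factory()
            return self._session

    def invalidate(self) -> None:
        # Called after the old session was closed, e.g. by a watchdog.
        with self._lock:
            self._session = None
```
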
@parfeon parfeon merged commit a61ef5d into master Mar 26, 2026
13 checks passed
@parfeon parfeon deleted the fix/asyncio-error-handling-and-test-cleanup branch March 26, 2026 16:22
@pubnub-release-bot
Contributor

🚀 Release successfully completed 🚀

4 participants