v0.13.2 - Transport reliability, Kafka RPC/Action support, task queue pattern

@klpanagi released this 04 May 18:55
· 3 commits to master since this release

Highlights

This is a substantial reliability and feature release focused on cross-transport stability, full Kafka feature parity, a new task queue pattern, and large-scale code quality improvements.

Kafka transport - full RPC and Action support

  • New RPCService, RPCClient, RPCServer for Kafka with permanent per-instance reply topics and correlation-ID routing for sub-second call latency.
  • ActionService/ActionClient now pre-create all required topics in __init__ so subscriptions never wait on auto-creation.
  • New subscribe() / publish() primitives on KafkaTransport.
  • Unique group_id per instance prevents consumer-group rebalance cascades on broker start; offset_reset=latest removes 10s session-timeout waits.
  • Topics are pre-created via AdminClient to avoid 15s wait_for_assignment timeouts.
  • Graceful UNKNOWN_TOPIC_OR_PART handling.
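
The reply routing described in the first bullet can be illustrated without a broker. The sketch below is an in-memory approximation, not the library's implementation: InMemoryBroker, SketchRPCClient, serve(), and the message fields (correlation_id, reply_to, data) are hypothetical names chosen for this example. It shows the general idea only: each client instance owns one long-lived reply topic and matches replies to pending calls by correlation ID, so no topic has to be created per call.

```python
import queue
import threading
import uuid
from concurrent.futures import Future


class InMemoryBroker:
    """Hypothetical stand-in for a broker: one FIFO queue per topic name."""

    def __init__(self):
        self._topics = {}
        self._lock = threading.Lock()

    def topic(self, name):
        with self._lock:
            return self._topics.setdefault(name, queue.Queue())

    def publish(self, name, message):
        self.topic(name).put(message)


class SketchRPCClient:
    """Hypothetical client: one permanent reply topic per instance."""

    def __init__(self, broker, request_topic):
        self._broker = broker
        self._request_topic = request_topic
        # Created once per client instance and reused for every call.
        self._reply_topic = f"{request_topic}.reply.{uuid.uuid4().hex}"
        self._pending = {}  # correlation_id -> Future
        self._lock = threading.Lock()
        threading.Thread(target=self._reply_loop, daemon=True).start()

    def _reply_loop(self):
        replies = self._broker.topic(self._reply_topic)
        while True:
            msg = replies.get()
            with self._lock:
                fut = self._pending.pop(msg["correlation_id"], None)
            if fut is not None:  # replies with unknown IDs are dropped
                fut.set_result(msg["data"])

    def call(self, data, timeout=2.0):
        corr_id = uuid.uuid4().hex
        fut = Future()
        with self._lock:
            self._pending[corr_id] = fut
        self._broker.publish(
            self._request_topic,
            {"correlation_id": corr_id, "reply_to": self._reply_topic, "data": data},
        )
        try:
            return fut.result(timeout=timeout)
        finally:
            # Forget the call even if it timed out.
            with self._lock:
                self._pending.pop(corr_id, None)


def serve(broker, request_topic, handler):
    """Hypothetical service loop: echo the correlation ID back on reply_to."""
    requests = broker.topic(request_topic)
    while True:
        req = requests.get()
        broker.publish(
            req["reply_to"],
            {"correlation_id": req["correlation_id"], "data": handler(req["data"])},
        )


broker = InMemoryBroker()
threading.Thread(
    target=serve,
    args=(broker, "rpc.add", lambda d: d["a"] + d["b"]),
    daemon=True,
).start()

client = SketchRPCClient(broker, "rpc.add")
result = client.call({"a": 2, "b": 3})
```

Because the reply topic outlives individual calls, the per-call cost in this sketch is one publish and one dictionary lookup; against a real Kafka broker the same design avoids topic creation and consumer assignment on the call path.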

New: Task queue endpoint pattern

  • Adds task queue pattern alongside Pub/Sub, RPC, and Action.
  • Integration tests, benchmarks, examples, and API docs included.
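
The notes above do not spell out the delivery semantics of the new endpoints. As a point of reference, a task queue is commonly understood as a competing-consumers arrangement: each task is delivered to exactly one of several workers, unlike Pub/Sub where every subscriber receives every message. The sketch below illustrates that general idea with the standard library only; it assumes this meaning and uses none of the library's API (worker, tasks, and results are names made up for the example).

```python
import queue
import threading

tasks = queue.Queue()
results = []
results_lock = threading.Lock()


def worker(name):
    """Pull tasks until a None sentinel arrives; each task goes to one worker."""
    while True:
        task = tasks.get()
        try:
            if task is None:  # sentinel: no more work for this worker
                return
            with results_lock:
                results.append((name, task, task * task))
        finally:
            tasks.task_done()


workers = [
    threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(3)
]
for w in workers:
    w.start()

for n in range(10):
    tasks.put(n)
for _ in workers:
    tasks.put(None)  # one sentinel per worker

tasks.join()
for w in workers:
    w.join()

# Every task was handled exactly once, by whichever worker was free.
processed = sorted(task for _, task, _ in results)
```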

Transport reliability fixes

  • AMQP: Resolved nested ioloop re-entry that corrupted pika's _tx_buffers (CI benchmark failures). The _pika_call() helper serialises ioloop access on shared connections. Subscriber/RPCService now default to dedicated connections to prevent concurrent ioloop access. connection.close() is bounded by a 5s daemon-thread timeout to prevent node.stop() hangs on FRAME_ERROR.
  • Redis: _set_connected(True/False) is now correctly toggled in connect()/stop()/_attempt_reconnect(), so wait_connected() no longer always times out (saves up to 50s for an ActionClient with 5 sub-endpoints). stop() is now best-effort, with each cleanup step guarded; the pubsub reconnect loop checks _stopped at every await point.
  • Kafka: Fixed the Subscriber.stop() race with the poll thread via a _stop_event pattern. KafkaTransport.stop() now bounds producer.flush() and runs Consumer.close() in a daemon thread with a 5s timeout (librdkafka has no native close timeout).
  • Action: _handle_get_result() no longer asserts on RUNNING goals (the assertion was crashing RPC handler threads). GoalHandler uses issubclass for the msg_type check so that self.data/self.result are proper Pydantic models. BaseActionClient.get_result now implements poll-until-terminal wait logic.
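
Both the AMQP and Kafka fixes above bound a potentially blocking close call by running it in a daemon thread and waiting a fixed time. A minimal sketch of that technique follows; close_with_timeout, fast_close, and hanging_close are hypothetical names for illustration, and the library's actual helper may differ in structure and error handling.

```python
import threading
import time


def close_with_timeout(close_fn, timeout=5.0):
    """Run close_fn in a daemon thread; return True if it finished in time.

    If close_fn hangs, the caller continues after `timeout` seconds and the
    daemon thread is abandoned (it will not block interpreter shutdown).
    """
    done = threading.Event()

    def _run():
        try:
            close_fn()
        except Exception:
            pass  # best-effort cleanup: errors during close are ignored
        finally:
            done.set()

    threading.Thread(target=_run, daemon=True).start()
    return done.wait(timeout)


def fast_close():
    """Stand-in for a close call that returns promptly."""


def hanging_close():
    """Stand-in for a close call that blocks far longer than the timeout."""
    time.sleep(30)


fast_ok = close_with_timeout(fast_close, timeout=1.0)
slow_ok = close_with_timeout(hanging_close, timeout=0.2)
```

The trade-off is that an abandoned close may leave the underlying resource not fully released; the caller gets a bounded shutdown in exchange.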

Code quality

  • Resolved 400+ pylint violations across core library, examples, and tests (PRs #69, #70).
  • Full mypy --check-untyped-defs clean across the core library.
  • Module and function docstrings added across 50 files.
  • Fixed mutable default arguments and other dangerous defaults, unused imports/variables, f-string logging calls, and protected-access violations.
  • max-line-length standardised at 100.
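
Two of the violation classes listed above are easy to show in isolation: a mutable default argument (pylint's dangerous-default-value, W0102) and f-string interpolation in logging calls (logging-fstring-interpolation, W1203). The functions below are made-up examples, not code from the library; they only illustrate the kind of change involved.

```python
import logging

logger = logging.getLogger("sketch")


def add_tag_buggy(tag, tags=[]):
    """Buggy on purpose: the default list is created once and shared."""
    tags.append(tag)
    return tags


def add_tag(tag, tags=None):
    """Fixed: a fresh list is created on each call that omits `tags`."""
    if tags is None:
        tags = []
    tags.append(tag)
    # Lazy %-style arguments: the message is only formatted if DEBUG is
    # enabled, unlike logger.debug(f"added tag {tag}").
    logger.debug("added tag %s (total %d)", tag, len(tags))
    return tags


buggy_first = add_tag_buggy("a")
buggy_second = add_tag_buggy("b")  # same list object as buggy_first

fixed_first = add_tag("a")
fixed_second = add_tag("b")
```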

Tooling and CI

  • New make ci, make ci-strict, make ci-full targets; test-all target combining ci + integration.
  • Added mypy typecheck targets to Makefile.
  • Removed brittle benchmarks workflow; integration tests now cover the same surface.
  • scripts/run_all_broker_tests.py rewritten with TCP probe + AdminClient readiness checks for Kafka, per-script log persistence under logs/integration/, partial stdout/stderr on timeout, and live-buffered (python -u) execution.
  • Restructured docs/ directory with development, performance, and session-summary subfolders.

Documentation

  • Updated API guide and README with task queue pattern and full transport feature matrix.
  • README updated with CI commands and troubleshooting guidance.

Compatibility

  • Python 3.9+ supported.
  • Python 3.14 + AMQP (pika) remains incompatible - relevant tests auto-skipped.
  • All Pydantic v2 conventions (model_dump()) preserved.

Full Changelog: v0.13.1...v0.13.2