Releases: jcz2020/par
v0.5.1
Theme: RAG foundation + Python streaming (buffered) + FFI work-loop architecture + configurable embedding model + HTTP timeout fix + ReAct loop hardening.
Changed — ReAct loop retry/timeout hardening (7 fixes)
Problem: Engine ReAct loop had multiple retry/timeout bugs: retries didn't consume iteration budget (max_iter=1 could make 4+ LLM calls), Timeout middleware caused infinite retries, no wall-clock timeout, Retry middleware reset per iteration, no <think> tag handling.
Fixes (based on competitive analysis of LangChain, OpenAI Agents SDK, CrewAI, AutoGen):
- Wall-clock timeout: agent_config.max_execution_time : float option. The loop checks elapsed time and returns a Timeout error if the limit is exceeded.
- Retries consume iterations: the retry path now passes iterations + 1 (was unchanged), which matches the industry consensus across all competitors.
- Timeout middleware on_error removed: eliminates the infinite-retry causal chain (Timeout mw → retryable=true → Retry mw → repeat).
- Retry budget per-invocation: removed the per-iteration reset of the retry counter. 3 retries is the total, not per-iteration.
- Graceful degradation: agent_config.early_stopping_method (Force | Generate). When iterations are exhausted and the method is Generate, the engine makes one final LLM call for a best-effort answer.
- <think>/<reasoning> tag stripping: json_extract.ml now strips reasoning blocks before JSON parsing, which prevents spurious repair loops with DeepSeek-R1, QwQ, MiniMax-M3.
- Context-length error classification: the engine detects context-length-exceeded errors from provider messages, applies the context strategy, and retries.
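The tag-stripping fix in the list above can be illustrated with a short Python sketch. This is not the json_extract.ml code: the regular expression and the function names strip_reasoning and parse_model_json are assumptions made for illustration, and only non-nested <think> and <reasoning> blocks are handled.

```python
import json
import re

# Matches <think>...</think> or <reasoning>...</reasoning>, across newlines.
# Assumption: blocks are not nested and are always closed.
_REASONING_BLOCK = re.compile(
    r"<(think|reasoning)>.*?</\1>", re.DOTALL | re.IGNORECASE
)


def strip_reasoning(text: str) -> str:
    """Remove reasoning blocks so only the answer payload remains."""
    return _REASONING_BLOCK.sub("", text).strip()


def parse_model_json(text: str):
    """Strip reasoning blocks first, then parse what is left as JSON."""
    return json.loads(strip_reasoning(text))


# The reasoning text contains braces that would confuse a naive JSON scan.
raw = (
    '<think>The user wants {"a": 1}, maybe?</think>\n'
    '{"tool": "search", "args": {"q": "ocaml"}}'
)
parsed = parse_model_json(raw)
```

Stripping before parsing means stray braces inside the reasoning text never reach the JSON parser, which is the failure mode that would otherwise trigger a repair attempt.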
New types: Types.early_stopping_method = Force | Generate
New agent_config fields: max_execution_time : float option, early_stopping_method : early_stopping_method
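A minimal Python sketch of how these budget rules might fit together. It is not the engine's OCaml loop: AgentConfig, RetryableError, run_loop, the max_iter and max_retries field names, and the use of Python's built-in TimeoutError are all illustrative assumptions. Only max_execution_time and early_stopping_method mirror fields named above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


class RetryableError(Exception):
    """Hypothetical stand-in for a retryable provider error."""


@dataclass
class AgentConfig:
    max_iter: int = 5
    max_retries: int = 3                        # total per invocation
    max_execution_time: Optional[float] = None  # seconds; None means no limit
    early_stopping_method: str = "Force"        # "Force" or "Generate"


def run_loop(call_llm: Callable[[bool], Optional[str]], cfg: AgentConfig) -> str:
    """call_llm(final) returns an answer, or None to keep iterating."""
    start = time.monotonic()
    iterations = 0
    retries_left = cfg.max_retries  # never reset inside the loop
    while iterations < cfg.max_iter:
        elapsed = time.monotonic() - start
        if cfg.max_execution_time is not None and elapsed > cfg.max_execution_time:
            raise TimeoutError("max_execution_time exceeded")
        try:
            answer = call_llm(False)
        except RetryableError:
            if retries_left == 0:
                raise
            retries_left -= 1
            iterations += 1  # a retry consumes an iteration
            continue
        iterations += 1
        if answer is not None:
            return answer
    if cfg.early_stopping_method == "Generate":
        # One final best-effort call after the budget is exhausted.
        return call_llm(True) or ""
    return "stopped: iteration limit reached"


calls = []


def flaky(final: bool) -> Optional[str]:
    calls.append(final)
    raise RetryableError("rate limited")


# With max_iter=1, a failing call is attempted once, not four or more times.
result = run_loop(flaky, AgentConfig(max_iter=1))
```

In this sketch the retry counter lives outside the loop body, so it cannot be replenished between iterations, and the elapsed-time check runs before every LLM call.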
Changed — HTTP request timeout (fixes engine hang on long prompts)
Root cause: cohttp-eio Client.call and Buf_read.take_all had no timeout. When LLM response was slow (correlated with 800-1500 char prompts), the HTTP read blocked indefinitely. Combined with the single-threaded work loop, one stuck request wedged the entire Runtime.
Fix: Added Http_client.with_timeout — each do_request/do_request_streaming forks a daemon fiber that sleeps 60s then fails the switch. Timeout errors are mapped to Types.Timeout (not Invalid_input), enabling Retry middleware to retry automatically.
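The same idea, sketched as a Python asyncio analogy rather than the Eio implementation: a deadline wraps the request, and a missed deadline is converted into a dedicated retryable error type. PARTimeout and this Python with_timeout are hypothetical names, and the demo uses short deadlines instead of the 60s value.

```python
import asyncio


class PARTimeout(Exception):
    """Hypothetical stand-in for Types.Timeout (treated as retryable)."""


async def with_timeout(seconds: float, make_request):
    """Await the request; convert a missed deadline into PARTimeout."""
    try:
        return await asyncio.wait_for(make_request(), timeout=seconds)
    except asyncio.TimeoutError as exc:
        raise PARTimeout(f"request exceeded {seconds}s") from exc


async def slow_request() -> str:
    await asyncio.sleep(1.0)  # simulates a stalled HTTP read
    return "body"


async def fast_request() -> str:
    return "body"


def demo():
    ok = asyncio.run(with_timeout(0.5, fast_request))
    try:
        asyncio.run(with_timeout(0.05, slow_request))
        timed_out = False
    except PARTimeout:
        timed_out = True
    return ok, timed_out
```

Mapping the deadline miss to its own error type is what lets a retry layer tell a slow provider apart from a malformed request.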
Known limitation: MCP HTTP/SSE transport (mcp_transport_http.ml) and fetch_url builtin tool do not yet have timeouts. A stuck MCP server or URL fetch can still wedge the Runtime. Deferred to v0.5.2.
Changed — Streaming architecture (buffered, no daemon thread)
Root cause fixed: Python _StreamReader previously ran par_invoke_stream on a daemon threading.Thread that had no OCaml domain lock, causing Fatal: no domain lock held on every streaming call. Fix: removed the daemon thread entirely. _StreamReader now calls par_invoke_stream on the main thread. The OCaml work loop buffers chunks internally and returns them all with the final result as JSON. Python parses the chunks array and yields Events.
Trade-off: chunks arrive all at once after the LLM completes (buffered, not incremental). True incremental streaming is planned for v0.5.2.
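A rough Python sketch of the buffered consumer side, assuming a result shape that this changelog does not specify. The field names "error" and "output", the event dictionaries, and the function name iter_buffered_events are hypothetical. Only the status check against "ok", the chunks array, and the PARInvokeError name come from these notes.

```python
import json
from typing import Iterator


class PARInvokeError(Exception):
    """Raised when the runtime reports a non-ok status."""


def iter_buffered_events(payload: str) -> Iterator[dict]:
    """Yield one event per buffered chunk, then a final 'done' event."""
    result = json.loads(payload)
    if result.get("status") != "ok":
        # Fail loudly instead of yielding nothing.
        raise PARInvokeError(result.get("error", "invoke_stream failed"))
    for index, chunk in enumerate(result.get("chunks", [])):
        yield {"type": "chunk", "index": index, "text": chunk}
    yield {"type": "done", "text": result.get("output", "")}


payload = json.dumps({"status": "ok", "chunks": ["Hel", "lo"], "output": "Hello"})
events = list(iter_buffered_events(payload))
```

Because the whole payload is available before the first event is yielded, callers see the same iterator interface as incremental streaming but with all latency paid up front.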
Changed — Configurable embedding model
Added embedding_model : string option to the Openai provider config variant. When set, overrides the default "text-embedding-3-small". Example:
["Openai", {"api_key": "...", "embedding_model": "Qwen/Qwen3-Embedding-8B"}]
The Ollama variant does not yet have this field. Ollama embeddings use the OpenAI default (tracked as a known limitation).
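For illustration, the optional-override behaviour might be modelled in Python as below. The helper names openai_provider and effective_embedding_model are hypothetical and not part of the SDK. Only the variant shape and the default model name come from the note above.

```python
import json
from typing import Optional

DEFAULT_EMBEDDING_MODEL = "text-embedding-3-small"


def openai_provider(api_key: str, embedding_model: Optional[str] = None) -> list:
    """Build the ["Openai", {...}] variant, omitting the field when unset."""
    fields = {"api_key": api_key}
    if embedding_model is not None:
        fields["embedding_model"] = embedding_model
    return ["Openai", fields]


def effective_embedding_model(provider: list) -> str:
    """Return the override when present, otherwise the default model."""
    _tag, fields = provider
    return fields.get("embedding_model") or DEFAULT_EMBEDDING_MODEL


custom = openai_provider("sk-placeholder", "Qwen/Qwen3-Embedding-8B")
plain = openai_provider("sk-placeholder")
serialized = json.dumps(custom)
```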
Changed — Dead code cleanup
Removed import queue, import threading, _DONE sentinel from runtime.py (no longer needed after streaming refactor).
Changed — Error handling
_StreamReader._fetch now raises PARInvokeError on status != "ok" instead of silently returning an empty iterator.
Changed — Documentation
Updated docs/sdk/streaming.md implementation notes to describe the buffered architecture. Updated invoke_stream docstring in runtime.py.
Real API Verification (SiliconFlow)
All 5 endpoints verified against real API:
- embed (Qwen3-Embedding-8B, 4096 dims): PASS
- add_documents: PASS
- invoke (Qwen2.5-7B-Instruct): PASS
- invoke_with_rag: PASS
- invoke_stream (4 chunks, no crash): PASS
Test Count
- 998 OCaml tests
- 57 Python tests (1 skipped)
Install
curl -fsSL https://raw.githubusercontent.com/jcz2020/par/main/install.sh | bash
Or upgrade: par update
macOS: binary is unsigned. Run xattr -cr "" once after install.
Full changelog: https://github.com/jcz2020/par/blob/main/CHANGES.md
v0.5.0
Apple Silicon macOS native wheel added.
pip install par-runtime now works natively on Apple Silicon Macs, no source build or Rosetta required. Intel Mac users still fall back to source distribution (macos-13 runner permanently abandoned 2026-06-19; see ci.yml L16). ARM64 Linux wheel deferred to v0.5.1+: GitHub Actions free-tier ARM64 runners are saturated (queue 45min+, never dispatched) and qemu-binfmt on an x86_64 host crashes the manylinux container on start.
Added
- par_runtime-0.5.0-py3-none-macosx_11_0_arm64.whl (6.5 MB): ARM64 macOS wheel. Built on the macos-15 runner with brew install gmp sqlite3, delocate-wheel for dylib bundling, and wheel tags --platform-tag macosx_11_0_arm64 to set the platform tag (a ctypes .so gets py3-none-any by default from setuptools).
Changed
- pypi-publish.yml: single hardcoded job refactored into a 2-job matrix {linux-x86_64, macos-arm64}. The gh-release-upload and pypi-upload jobs download both artifacts and publish together. auditwheel for Linux (existing v0.4.13 path unchanged), delocate-wheel + wheel tags for macOS.
- release-acceptance.yml: split into 2 jobs (accept-linux-x64 3-container matrix + accept-macos-arm64 native runner). Removed the v0.4.13 WHEEL_COUNT -ne 1 guard (each job downloads its matching wheel by glob pattern, e.g. *-manylinux_2_28_x86_64.whl).
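The per-platform glob selection can be sketched in Python as follows. The workflow itself is not shown in these notes, so pick_wheel and its exactly-one-match check are assumptions for illustration, not a description of release-acceptance.yml.

```python
from fnmatch import fnmatch


def pick_wheel(filenames, pattern):
    """Return the single filename matching the platform glob."""
    matches = [name for name in filenames if fnmatch(name, pattern)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one wheel for {pattern!r}, got {matches}")
    return matches[0]


artifacts = [
    "par_runtime-0.5.0-py3-none-manylinux_2_28_x86_64.whl",
    "par_runtime-0.5.0-py3-none-macosx_11_0_arm64.whl",
]
linux_wheel = pick_wheel(artifacts, "*-manylinux_2_28_x86_64.whl")
macos_wheel = pick_wheel(artifacts, "*-macosx_11_0_arm64.whl")
```

Matching by platform glob lets each job ignore the other platform's wheel, so a global count of published wheels is no longer a meaningful check.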
Trade-offs accepted
- No ARM64 Linux wheel in v0.5.0. Attempted ubuntu-22.04-arm64 and ubuntu-24.04-arm64 runners; both were saturated for 45min+ on the free tier. Attempted qemu-binfmt on an ubuntu-22.04 host with the quay.io/pypa/manylinux_2_28_aarch64:latest container; the container crashes on start (Error response from daemon: container is not running). User decision 2026-06-21: skip ARM64 Linux for v0.5.0, defer to v0.5.1+ when GH Actions ARM quota improves OR a self-hosted runner is available.
- No Intel Mac native wheel. The macos-13 (Intel) runner queues for 24h+ and then hits max-execution-time on the free tier (ci.yml L16). True universal2 (which needs both macos-13 + macos-15 slices) is NOT achievable without paid minutes. Intel Mac users continue to fall back to source distribution (PyPI 2026 Intel Mac share <5%).
Verification Evidence
- PyPI: https://pypi.org/project/par-runtime/0.5.0/ (2 wheels): par_runtime-0.5.0-py3-none-manylinux_2_28_x86_64.whl (11.3 MB) + par_runtime-0.5.0-py3-none-macosx_11_0_arm64.whl (6.5 MB)
- GH Release: https://github.com/jcz2020/par/releases/tag/v0.5.0 (2 wheel assets)
- CI iterations: 8 (post1 → post8). Key fixes: (a) drop aarch64 job after ARM runner saturation; (b) rename macos wheel from py3-none-any to macosx_11_0_arm64 via wheel tags (a filename-only rename fails with PyPI 400 because the internal WHEEL Tag field must match); (c) remove stale aarch64 download steps after matrix shrink.
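The filename-versus-metadata mismatch behind fix (b) can be checked locally before upload. The following is a small sketch using only the Python standard library; the function names are illustrative, and it does not expand compressed tag sets such as py2.py3-none-any.

```python
import os
import zipfile


def wheel_tags(path: str) -> list:
    """Return the Tag values from the .dist-info/WHEEL file inside a wheel."""
    with zipfile.ZipFile(path) as zf:
        name = next(n for n in zf.namelist() if n.endswith(".dist-info/WHEEL"))
        text = zf.read(name).decode("utf-8")
    return [
        line.split(":", 1)[1].strip()
        for line in text.splitlines()
        if line.startswith("Tag:")
    ]


def filename_tag(path: str) -> str:
    """Extract python-abi-platform from name-version-python-abi-platform.whl."""
    stem = os.path.basename(path)[: -len(".whl")]
    return "-".join(stem.split("-")[-3:])


def tags_consistent(path: str) -> bool:
    """True when the tag in the filename also appears in the WHEEL metadata."""
    return filename_tag(path) in wheel_tags(path)
```

A wheel that was only renamed on disk would return False here, since its WHEEL file would still carry the py3-none-any tag.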
v0.5.0-beta.20260621.post8
(No changelog entry found for 0.5.0-beta.20260621.post8)
v0.5.0-beta.20260621.post7
(No changelog entry found for 0.5.0-beta.20260621.post7)
v0.5.0-beta.20260621.post6
(No changelog entry found for 0.5.0-beta.20260621.post6)
v0.5.0-beta.20260621.post5
(No changelog entry found for 0.5.0-beta.20260621.post5)
v0.5.0-beta.20260621.post4
(No changelog entry found for 0.5.0-beta.20260621.post4)
v0.5.0-beta.20260621.post3
(No changelog entry found for 0.5.0-beta.20260621.post3)
v0.5.0-beta.20260621.post2
(No changelog entry found for 0.5.0-beta.20260621.post2)
v0.5.0-beta.20260621.post1
(No changelog entry found for 0.5.0-beta.20260621.post1)