Sync upstream #7

Merged
offendingcommit merged 5 commits into offendingcommit:main from plastic-labs:main
May 7, 2026
@offendingcommit
Owner

No description provided.

lowyelling and others added 5 commits May 5, 2026 10:41
…V-1485) (#635)

* docs(integrations): add @honcho-ai/vercel-ai-sdk guide

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(integrations): rewrite Vercel AI SDK guide as cookbook style (DEV-1485)

Reshapes the guide to cookbook formula, adds Full Script section, fixes
maxSteps → stopWhen for ai-sdk v5, renames package, and prunes stale notes.
See PR for full decision log.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(integrations): lead Vercel AI SDK verification with direct-inspection check

- Restructure Verifying section: direct inspection (token delta + dashboard) is now step 1 so readers isolate Honcho's contribution before grading model behavior
- Behavioral tests (first turn, multi-turn, cross-session, tool calling) follow as steps 2-5
- Note `result.toolCalls` as the way to confirm which Honcho tool fired (tool names don't appear in `result.text`)
- Signpost the Full Script from Complete Example so the two snippets read as a staircase, not a duplicate

Addresses review comments on PR #635.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(tests): satisfy basedpyright in test_representation_manager

The save-representation tests added in #615 were structurally correct but
failed strict typing in two places. Static Analysis has been red on main
since the merge.

- `mock_save.await_args` is `_Call | None`; assert it's not None before
  reading `.kwargs` / `.args` so basedpyright can narrow the type
- `SimpleNamespace(...)` passed as `message_level_configuration` is an
  intentional duck-typed mock (only `.dream.enabled` is read by
  `save_representation`), so opt out at the call site with
  `# pyright: ignore[reportArgumentType]` rather than constructing a
  full `ResolvedConfiguration` (matches the existing `reportPrivateUsage`
  ignore pattern in this file)

No runtime behavior changes; `uv run basedpyright` is now clean
project-wide.
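
The two typing fixes described above can be sketched roughly as follows. This is a minimal illustration, not the code from `test_representation_manager`: `ResolvedConfigurationStub` and `save_representation_stub` are hypothetical stand-ins, and only `AsyncMock.await_args`, `SimpleNamespace`, and the `# pyright: ignore[reportArgumentType]` comment come from the commit message.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock


class ResolvedConfigurationStub:
    """Hypothetical stand-in for the real ResolvedConfiguration type."""

    dream: SimpleNamespace


async def save_representation_stub(
    save: AsyncMock,
    message_level_configuration: ResolvedConfigurationStub,
) -> None:
    # Hypothetical function under test: it only reads `.dream.enabled`.
    if message_level_configuration.dream.enabled:
        await save(peer="alice", dream_enabled=True)


def run_check() -> dict:
    mock_save = AsyncMock()
    # Duck-typed mock: carries only the attribute the function reads.
    config = SimpleNamespace(dream=SimpleNamespace(enabled=True))
    asyncio.run(
        save_representation_stub(
            mock_save,
            config,  # pyright: ignore[reportArgumentType]
        )
    )
    # await_args is `_Call | None`; the assert lets the checker narrow it.
    call = mock_save.await_args
    assert call is not None
    return dict(call.kwargs)
```

The `assert` doubles as a real test failure if the mock was never awaited, so the narrowing costs nothing at runtime.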

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(tests): pad timestamp windows in test_messages for clock skew

Three timestamp tests captured `before_request` / `after_request` with
`datetime.now(UTC)` on the host and asserted the server's `created_at`
fell within. Under Docker, the Postgres container's clock can skew tens
of ms from the macOS host, flipping the assertion intermittently under
parallel pytest load.

Pad each window by 1 second on both sides — wide enough to absorb
realistic skew, narrow enough that the test still proves the timestamp
is server-current.
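
A padded window check of this kind might look like the sketch below. The helper name `in_padded_window` and the fixed example timestamps are assumptions for illustration; the real tests capture `before_request` / `after_request` around an actual request.

```python
from datetime import datetime, timedelta, timezone

# One second of padding on each side, as described in the commit message.
CLOCK_SKEW_PAD = timedelta(seconds=1)


def in_padded_window(
    created_at: datetime,
    before_request: datetime,
    after_request: datetime,
    pad: timedelta = CLOCK_SKEW_PAD,
) -> bool:
    """Return True if created_at falls inside the window widened by pad."""
    return before_request - pad <= created_at <= after_request + pad


# Fixed timestamps so the example is deterministic.
before_request = datetime(2026, 5, 5, 10, 0, 0, tzinfo=timezone.utc)
after_request = before_request + timedelta(milliseconds=20)
# Simulated server clock running 50 ms behind the host.
skewed_created_at = before_request - timedelta(milliseconds=50)
```

With these values the skewed timestamp fails an unpadded comparison but passes the padded one, while a timestamp several seconds off is still rejected.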

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(integrations): tighten Verifying section after end-to-end smoke

Smoke-tested all five verification steps against a fresh Sonnet 4.6 + Honcho integration. Three findings, all reflected here:

- Cross-session recall (#4): added Note about DERIVER_REPRESENTATION_BATCH_MAX_TOKENS=1024 — short warmups don't accumulate enough content to flush observations, so cross-session recall returns empty even on a working integration.
- Tool calling prompt (#5): replaced the honcho_chat patterns prompt with a verbatim-retrieval honcho_search prompt. Sonnet skips honcho_chat when middleware-injected context already answers; verbatim retrieval forces a fire.
- Tool inspection (#5): replaced result.toolCalls reference with result.steps[i].toolCalls + flatMap snippet. Top-level toolCalls is empty in multi-step calls (stopWhen: stepCountIs(N)) — the fires are nested inside steps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(integrations): make Step 4 cross-session test durable via honcho_search

Replace the prose-recall test ("Based on what we've talked about, what do you know about me?") with a forced honcho_search call. Prose recall depended on the model getting deriver-built representation/peer-card in its system prompt, which is gated behind DERIVER_REPRESENTATION_BATCH_MAX_TOKENS=1024 — short tutorial-length conversations don't trigger it, producing false negatives on a working integration.

honcho_search hits message embeddings, which are computed synchronously at message persist time (src/crud/message.py:262-276), so peer-scoped retrieval works regardless of how short the prior session was. Also folds the result.steps[i].toolCalls inspection snippet from the old Step 5 into Step 4 — same prompt, no need for two sections.

Drops Step 5 entirely.
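
The false negative described above can be illustrated with a toy model of a token-gated batch. This is only a sketch under loud assumptions: it is not the deriver's implementation, `PendingBatch` and `rough_token_count` are hypothetical, and whitespace word counts stand in for real token counts. Only the 1024 threshold is taken from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Value quoted for DERIVER_REPRESENTATION_BATCH_MAX_TOKENS above.
BATCH_MAX_TOKENS = 1024


def rough_token_count(text: str) -> int:
    # Crude proxy (assumption): one token per whitespace-separated word.
    return len(text.split())


@dataclass
class PendingBatch:
    """Toy buffer that releases messages only once the threshold is reached."""

    threshold: int = BATCH_MAX_TOKENS
    messages: list[str] = field(default_factory=list)
    tokens: int = 0

    def add(self, message: str) -> list[str] | None:
        self.messages.append(message)
        self.tokens += rough_token_count(message)
        if self.tokens < self.threshold:
            return None  # Short conversations stop here: nothing is flushed.
        flushed, self.messages, self.tokens = self.messages, [], 0
        return flushed
```

Under this model a tutorial-length warmup never reaches the threshold, so nothing derived from it is available to a later session, whereas a search over per-message embeddings does not depend on the batch at all.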

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Add "Use the Skill" section recommending `npx skills add plastic-labs/vercel-ai-sdk`
with the manual symlink approach as a collapsed alternative.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The previous mutual-exclusion check compared --test-dir against its
default string literal, so passing --test-file together with an
explicit --test-dir tests/unified/test_cases silently bypassed the
check. Replace with argparse.add_mutually_exclusive_group() and apply
the default path post-parse so the bare invocation still works.
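
A minimal sketch of that pattern, assuming both flags take a path and that `tests/unified/test_cases` is the default directory mentioned above. The `build_parser` and `parse_args` names are hypothetical, not taken from the runner script.

```python
import argparse
from pathlib import Path

# Default taken from the path quoted in the commit message.
DEFAULT_TEST_DIR = Path("tests/unified/test_cases")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Unified test runner (sketch)")
    group = parser.add_mutually_exclusive_group()
    # Neither flag carries a default, so argparse can tell "explicitly passed"
    # apart from "omitted" and reject the combination itself.
    group.add_argument("--test-dir", type=Path, default=None)
    group.add_argument("--test-file", type=Path, default=None)
    return parser


def parse_args(argv: list) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Apply the default post-parse so the bare invocation still works.
    if args.test_dir is None and args.test_file is None:
        args.test_dir = DEFAULT_TEST_DIR
    return args
```

Passing both flags now exits with an argparse usage error regardless of which directory is given, because the check no longer depends on comparing against a default string.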
…ex-args

fix(tests/unified): use argparse mutex group for --test-dir/--test-file
* fix: internal N+1 query in dialectic agent calls

* fix: comments
@offendingcommit offendingcommit merged commit 7fec670 into offendingcommit:main May 7, 2026
3 checks passed
5 participants