
feat(training-export): overhaul trigger system and message conversion #79703

Draft
wzhgba wants to merge 1 commit into openclaw:main from SenseTime-FVG:wuzehuan/training-export

Conversation


@wzhgba wzhgba commented May 9, 2026

Draft / work-in-progress — this PR is under active development. Feedback welcome on the overall direction.

Summary

Introduce a trajectory-first, trigger-driven training export system that produces episode-level JSONL data from the OpenClaw runtime — no offline reconstruction, no separate pipeline. The system is opt-in (trainingExport.enabled: true) and writes to:

~/.openclaw/training-export/episodes.jsonl

Each line is a self-contained training sample: a task episode (full agent turn with system prompt, messages, tools, metadata) or a compact-summary episode (compression prompt → summary pair for RL compaction training).
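The two episode shapes can be sketched as TypeScript types. Only the fields named in this PR body (system prompt, messages, tools, metadata, and the prompt/summary pair) are grounded; the exact field and discriminator names below are illustrative assumptions, not the module's actual types.

```typescript
// Illustrative sketch of the two episode shapes written to episodes.jsonl.
// Field names beyond those stated in the PR body are assumptions.
type TaskEpisode = {
  kind: "task";
  systemPrompt: string;
  messages: unknown[]; // full agent turn in provider message format
  tools: unknown[];
  metadata: Record<string, unknown>;
};

type SummaryEpisode = {
  kind: "compact-summary";
  prompt: string;  // compression prompt sent to the summarization model
  summary: string; // summary the model produced
  metadata: Record<string, unknown>;
};

type Episode = TaskEpisode | SummaryEpisode;

// Each episode becomes exactly one JSON line (JSONL).
function toJsonlLine(episode: Episode): string {
  return JSON.stringify(episode);
}
```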

Relationship to Existing Systems

/export-trajectory (human-facing debug bundles)

The existing /export-trajectory command (docs at docs/tools/trajectory.md) produces redacted interactive support bundles for human debugging — prompt timelines, tool traces, transcript snapshots, usage metadata. It is triggered manually by users or support staff.

The training export introduced here is complementary and non-overlapping:

| | /export-trajectory | Training Export |
|---|---|---|
| Purpose | Human debugging, support | Machine training data |
| Trigger | Manual command | Automatic (compaction, reset, export command) |
| Output | Redacted bundle directory | JSONL episodes |
| Format | Directory of text/markdown files | One JSON line per episode |
| Privacy | Redacted (best-effort) | Full content (machine-consumed; opt-in) |
| Audience | Developers, support | RL training pipelines |

Both systems read from the same trajectory (trajectory capture / cache-trace). Training export simply produces a different output format for a different consumer, alongside the existing mechanism.

Compaction subsystem

Training export hooks into the Pi SDK compaction lifecycle (session_before_compact + session_compact) to capture:

  • Pre-compaction context (task episode): the full conversation before compression
  • Post-compaction summary (summary episode): the prompt sent to the summarization model + the summary it produced, with compaction metadata (tokensBefore, firstKeptEntryId, fromExtension)

This is the same data the compaction system already computes internally — training export just persists it in a structured training format before it is discarded.
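A minimal sketch of how the two hooks cooperate. The handler names match the Pi SDK events cited above, but the payload shapes and the `write` callback are assumptions for illustration, not the SDK's actual extension API:

```typescript
// Sketch of the unified compaction capture: the before-compact hook stashes
// the pre-compression conversation, and the compact hook persists both
// episodes together. Payload shapes here are assumptions.
type CompactionExtension = {
  session_before_compact: (ctx: { messages: unknown[] }) => void;
  session_compact: (ctx: { prompt: string; summary: string; tokensBefore: number }) => void;
};

function makeTrainingExportExtension(
  write: (episode: Record<string, unknown>) => void,
): CompactionExtension {
  let preCompactMessages: unknown[] = [];
  return {
    // Capture the full conversation before compression (task episode).
    session_before_compact: ({ messages }) => {
      preCompactMessages = messages;
    },
    // Persist the compression prompt and the produced summary (summary episode).
    session_compact: ({ prompt, summary, tokensBefore }) => {
      write({ kind: "task", messages: preCompactMessages });
      write({ kind: "compact-summary", prompt, summary, metadata: { tokensBefore } });
    },
  };
}
```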

Key Design Decisions

1. Trajectory-first

All training fields (system prompt, messages, tools, model metadata) come from runtime trajectory context.compiled events. Message and tool conversion delegates to the Pi SDK / provider layer (convertMessages from @mariozechner/pi-ai/openai-completions).

2. Unified compaction hook

A single Pi SDK extension (session_before_compact + session_compact) handles all compaction modes (default, safeguard, manual, overflow, timeout). No runTrainingExport calls scattered across individual compaction paths.

3. Pair-export guarantee

For compaction-triggered exports, task and summary episodes are built as a batch. If either is filtered by quality checks, the entire batch is discarded — no orphaned episodes.
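The guarantee reduces to an all-or-nothing append, roughly as below; `passesQualityChecks` and `append` are hypothetical stand-ins for the PR's filtering and JSONL-write logic:

```typescript
// All-or-nothing batch append: if any episode in a compaction batch fails
// the quality filter, the whole batch is discarded, so no orphaned task or
// summary episode is ever written. Both callbacks are hypothetical stand-ins.
function exportBatch<E>(
  episodes: E[],
  passesQualityChecks: (episode: E) => boolean,
  append: (episode: E) => void,
): boolean {
  if (!episodes.every(passesQualityChecks)) {
    return false; // discard the entire batch
  }
  for (const episode of episodes) append(episode);
  return true;
}
```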

4. Config-gated at every call site

getTrainingExportConfig(cfg)?.enabled === true is checked at all three entry points (extension registration, session reset, trajectory export command), so reviewers can see the opt-in gating logic without digging into implementation details.
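The gate itself reduces to one strict check. The config shape below mirrors the `trainingExport` block shown in the Configuration section; the surrounding types are simplified sketches:

```typescript
// Sketch of the opt-in gate checked at every entry point. The config shape
// mirrors the PR's trainingExport block; everything else is simplified.
type TrainingExportConfig = { enabled?: boolean; compat?: Record<string, unknown> };
type OpenClawConfig = { trainingExport?: TrainingExportConfig };

function getTrainingExportConfig(cfg: OpenClawConfig): TrainingExportConfig | undefined {
  return cfg.trainingExport;
}

function trainingExportEnabled(cfg: OpenClawConfig): boolean {
  // Strict === true, so a missing or malformed config can never enable export.
  return getTrainingExportConfig(cfg)?.enabled === true;
}
```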

5. compactionSummary bridging

Pi SDK's convertToLlm converts compactionSummary user messages, but the upstream convertMessages from @mariozechner/pi-ai/openai-completions does not handle the compactionSummary role. A pre-processing step (sharing a single map with thinking-block stripping) mirrors Pi SDK's conversion format before handing off to the upstream converter.

6. Training-quality message filtering (all triggers)

Training episodes must end with a complete assistant message — regardless of trigger type. Any snapshot (compaction, reset, or trajectory export) may end mid-turn at a non-assistant message (e.g. toolResult). Trailing non-assistant messages are trimmed from every trigger's output. The trainExampleMessagesAreUsable check requires ≥1 user + ≥1 assistant; if trimming leaves the episode unusable, it is discarded entirely. This is a universal training-data quality requirement, not a compaction-specific behavior.
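The trim and the usability check can be sketched as follows; the message shape is simplified to just a role string:

```typescript
// Sketch of the trailing-message trim and usability check described above.
// Messages are reduced to a bare role for illustration.
type Msg = { role: "user" | "assistant" | "toolResult" | string };

// Trim trailing non-assistant messages so the episode ends on a complete
// assistant turn, regardless of which trigger produced the snapshot.
function trimToLastAssistant(messages: Msg[]): Msg[] {
  let end = messages.length;
  while (end > 0 && messages[end - 1].role !== "assistant") end--;
  return messages.slice(0, end);
}

// Mirrors the PR's requirement: at least one user and one assistant message
// must remain after trimming, otherwise the episode is discarded entirely.
function trainExampleMessagesAreUsable(messages: Msg[]): boolean {
  return (
    messages.some((m) => m.role === "user") &&
    messages.some((m) => m.role === "assistant")
  );
}
```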

7. Reset export is independent of plugin hooks

The before_reset training export call is placed outside emitGatewayBeforeResetPluginHook, so it fires regardless of whether any before_reset plugin hooks are registered.

8. Private file permissions

The export directory (~/.openclaw/training-export) and JSONL file are created with private filesystem modes (0o700 / 0o600) to prevent world-readable access to training data.

Files Changed

| File | Change |
|---|---|
| src/training-export.ts | New: core module (snapshot collection, episode construction, JSONL I/O, prompt constants, compaction extension) |
| src/training-export.test.ts | New: test suite |
| docs/training-export.md | New: formal feature documentation |
| src/config/types.openclaw.ts | Add trainingExport config type (enabled, compat) |
| src/config/zod-schema.ts | Add trainingExport schema |
| src/config/schema.help.ts | Add field help text |
| src/config/schema.labels.ts | Add field labels |
| src/agents/pi-embedded-runner/extensions.ts | Register compaction extension (config-gated, opt-in) |
| src/gateway/session-reset-service.ts | before_reset trigger (config-gated, outside hook function) |
| src/auto-reply/reply/commands-export-trajectory.ts | trajectory_export trigger (config-gated, alongside existing command) |
| src/agents/openai-transport-stream.ts | Minor: export convertResponsesMessages for use in conversion pipeline |

Configuration

```yaml
trainingExport:
  enabled: true            # default: false (opt-in)
  compat: {}               # optional ModelCompatConfig override for export path
```

When enabled is false (the default), the extension is not registered and runTrainingExport is never called — zero overhead.

How to Test

  1. Enable via trainingExport.enabled: true
  2. Trigger a compaction in a session long enough to exceed the context threshold
  3. Check ~/.openclaw/training-export/episodes.jsonl — should contain paired task + summary episodes with compaction metadata
  4. Reset a session — should produce a task episode
  5. Run /export-trajectory — should produce a task episode via the training export path as well
  6. Disable via trainingExport.enabled: false — episodes file should receive no new entries

Open Questions for Review

  1. Default to opt-in (enabled: false) — is this the right default, or should we consider a different approach?
  2. Privacy and retention policy — the training export writes full (non-redacted) session content to disk. Should there be a retention/cleanup mechanism?

@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation gateway Gateway runtime agents Agent runtime and tooling size: XL triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 9, 2026
Contributor

clawsweeper Bot commented May 9, 2026

Codex review: needs real behavior proof before merge.

Summary
The PR adds an opt-in core training export config, JSONL episode writer, compaction/reset/trajectory trigger hooks, documentation, and unit tests.

Reproducibility: Do we have a high-confidence way to reproduce the issue? Not applicable: this is a new feature PR rather than a bug report, and the missing evidence is after-fix real behavior proof from a real setup.

Real behavior proof
No after-fix compaction/reset/trajectory JSONL output, terminal output, recording, linked artifact, or redacted runtime log is attached; the contributor should add redacted proof and update the PR body for re-review.

Next step before merge
Human review is needed because the PR is draft/conflicting, lacks contributor real behavior proof, and needs a maintainer-approved privacy/retention contract before automation should attempt repair.

Security
Needs attention: the patch adds a persistent unredacted session-content export path before the retention, cleanup, sharing, and redaction boundary is settled.

Review findings

  • [P1] Use the current Pi package scope — src/training-export.ts:12-15
  • [P1] Remove the unused pre-compact config read — src/training-export.ts:1082
  • [P2] Define retention before appending training exports — src/training-export.ts:824
Review details

Best possible solution:

Land only after the branch is rebased to current main, uses the current Pi package scope, passes core typechecking, encodes an approved retention/redaction contract, and includes redacted real-runtime proof for each trigger path.

Do we have a high-confidence way to reproduce the issue?

Not applicable: this is a new feature PR rather than a bug report, and the missing evidence is after-fix real behavior proof from a real setup.

Is this the best way to solve the issue?

No for merge as-is; the trajectory-first direction may be viable, but the patch must be updated to the current dependency names and gain an explicit privacy/retention policy before the core append path ships.

Full review comments:

  • [P1] Use the current Pi package scope — src/training-export.ts:12-15
    Current main depends on and imports @earendil-works/pi-*, but this new core file imports @mariozechner/pi-*. Those modules are not declared in the current package contract, so the branch will fail typecheck/runtime resolution after rebasing unless these imports use the current scope.
    Confidence: 0.94
  • [P1] Remove the unused pre-compact config read — src/training-export.ts:1082
    config is assigned inside the session_before_compact handler but never read there. Because tsconfig.core.json enables noUnusedLocals for production src/**/*, this remains a core typecheck failure even after the package imports are fixed.
    Confidence: 0.9
  • [P2] Define retention before appending training exports — src/training-export.ts:824
    This append path persists trajectory-derived prompts, messages, tools, and metadata to a long-lived JSONL file while the PR still leaves retention and cleanup as an open question. The feature needs an agreed policy or cleanup mechanism before enabling users to accumulate unredacted session content indefinitely.
    Confidence: 0.86

Overall correctness: patch is incorrect
Overall confidence: 0.93

Security concerns:

  • [medium] Define training export retention — src/training-export.ts:824
    The new exporter appends trajectory-derived session content to a long-lived JSONL file under the OpenClaw state directory; private file modes help, but the PR still leaves retention and cleanup policy unresolved.
    Confidence: 0.86

What I checked:

  • Live PR state: GitHub reports this PR is open, draft, not maintainer-modifiable, mergeable=CONFLICTING, and at head 6ace302; the Real behavior proof check failed. (6ace302b5313)
  • Current main lacks the feature: No current-main source/docs/package matches were found for trainingExport, runTrainingExport, training-export, episodes.jsonl, or compactionTrainingExport, so the PR is not obsolete on main. (83b8289ee274)
  • PR imports obsolete Pi package scope: The new core module imports @mariozechner/pi-agent-core, @mariozechner/pi-ai, and @mariozechner/pi-coding-agent, but current main uses the @earendil-works scope. (src/training-export.ts:12, 6ace302b5313)
  • Current dependency contract: Current main declares @earendil-works/pi-agent-core, @earendil-works/pi-ai, @earendil-works/pi-coding-agent, and @earendil-works/pi-tui at 0.74.0; package and lock searches only show @mariozechner clipboard packages, not the PR's Pi imports. (package.json:1754, 83b8289ee274)
  • Core typecheck rejects unused locals: The production core TypeScript config enables noUnusedLocals and noUnusedParameters for src/**/*, so unused variables in src/training-export.ts are build blockers. (tsconfig.core.json:4, 83b8289ee274)
  • PR leaves an unused pre-compact variable: The session_before_compact handler assigns config from the runtime registry but never reads it in that handler, which will trip the current noUnusedLocals contract after imports are updated. (src/training-export.ts:1082, 6ace302b5313)

Likely related people:

  • steipete: Current-main blame and recent commits connect steipete to the Pi package-scope state, trajectory runtime/docs, and OpenAI transport surfaces that this PR extends. (role: recent area contributor; confidence: high; commits: 365c986a5b2a, 474bea162b4d, 1888242bd30a; files: package.json, src/trajectory/runtime.ts, docs/tools/trajectory.md)
  • bradhallett: Recent merged work changed Pi auto-compaction behavior in the same embedded extension factory area that this PR modifies. (role: adjacent compaction contributor; confidence: medium; commits: 0bdba47a3e89; files: src/agents/pi-embedded-runner/extensions.ts)
  • vincentkoc: Recent merged work touched provider runtime selection and harness extension boundaries, which are relevant to the core-vs-runtime integration shape of this PR. (role: adjacent runtime/config contributor; confidence: medium; commits: aa27e27f3606, 47f6a98909b5; files: src/agents/pi-embedded-runner/extensions.ts, src/config/types.openclaw.ts)

Remaining risk / open question:

  • The PR persists session-derived training data before the retention, cleanup, sharing, and redaction contract is approved.
  • No contributor-supplied real behavior proof shows compaction, reset, or /export-trajectory JSONL output from a real OpenClaw setup.
  • The branch is draft, conflicting, and not maintainer-modifiable, so normal review and repair paths are limited until the contributor updates it.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 83b8289ee274.

@wzhgba wzhgba force-pushed the wuzehuan/training-export branch from b675a0d to 4f6af34 Compare May 9, 2026 08:10
@openclaw-barnacle openclaw-barnacle Bot added the triage: refactor-only Candidate: refactor/cleanup-only PR without maintainer context. label May 9, 2026
@wzhgba wzhgba force-pushed the wuzehuan/training-export branch 3 times, most recently from 8649e3e to 9f06f41 Compare May 9, 2026 10:46
