Auto-share session traces to a private HF dataset#193

Merged
lewtun merged 14 commits into
huggingface:mainfrom
DarshanCode2005:share-trace-on-HF
May 1, 2026

Conversation

@DarshanCode2005
Contributor

@DarshanCode2005 DarshanCode2005 commented Apr 29, 2026

Resolves #191

What

Every ml-intern session now gets uploaded to your own private Hugging Face
dataset (default {hf_user}/ml-intern-traces) in the Claude Code JSONL
format that the HF Agent Trace Viewer
auto-detects. The dataset is created private. You can flip it to public
from inside the CLI.

The existing upload to smolagents/ml-intern-sessions is unchanged. That
dataset still feeds the backend KPI scheduler.

Why

Users wanted a way to browse, share, and debug their own runs through the
HF trace viewer without exposing them publicly by default.

Changes

Config

  • agent/config.py: added share_traces: bool = True and
    personal_trace_repo_template: str = "{hf_user}/ml-intern-traces".
  • configs/cli_agent_config.json and configs/frontend_agent_config.json:
    surface the same defaults.

Uploader

  • agent/core/session_uploader.py: rewrote the subprocess uploader.
    • New to_claude_code_jsonl(trajectory) converter. Maps litellm messages
      to user / assistant / tool_use / tool_result blocks. Deterministic
      SHA-1 UUIDs keyed by session_id::role::idx so re-uploads keep the
      parent chain stable. System prompts are skipped.
    • New CLI flags: --format {row, claude_code}, --token-env,
      --private.
    • Per-format status tracking on the local trajectory: upload_status
      for the org dataset and personal_upload_status for the user repo,
      so a failure on one path does not clobber the other.
    • Token resolution: --token-env overrides the org fallback chain
      (HF_SESSION_UPLOAD_TOKEN, HF_TOKEN, HF_ADMIN_TOKEN). The
      personal upload uses HF_TOKEN directly.
    • private is now passed into create_repo instead of being hardcoded.
      Existing repos keep whatever visibility the user set.
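
The deterministic IDs described above could look roughly like the sketch below. This is an illustration, not the PR's code: it assumes `uuid.uuid5` (which is SHA-1 based) applied to the `session_id::role::idx` key, and the helper names `event_uuid` and `chain`, as well as the `uuid` / `parentUuid` / `type` field names, are assumptions made for the example.

```python
import uuid


def event_uuid(session_id: str, role: str, idx: int) -> str:
    """Hypothetical helper: stable UUID for one event in a session.

    uuid5 hashes the namespace plus the name with SHA-1, so the same key
    always produces the same UUID across re-uploads.
    """
    key = f"{session_id}::{role}::{idx}"
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))


def chain(session_id: str, roles: list[str]) -> list[dict]:
    """Build events where each one points at the previous event as its parent."""
    events = []
    parent = None
    for idx, role in enumerate(roles):
        uid = event_uuid(session_id, role, idx)
        # Field names here are assumed for illustration only.
        events.append({"uuid": uid, "parentUuid": parent, "type": role})
        parent = uid
    return events
```

Because every ID is a pure function of the session and the position, converting the same trajectory twice yields an identical parent chain, which is what keeps heartbeat re-uploads stable.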

Session wiring

  • agent/core/session.py:

    • Added _personal_trace_repo_id(). Returns None when sharing is
      disabled, the user is anonymous, or the template is missing.
    • Added _spawn_uploader() helper to keep subprocess args in one place.
    • save_and_upload_detached now spawns two detached subprocesses: one
      for the org dataset (row format, existing token chain) and one for
      the per-user private dataset (Claude Code format, HF_TOKEN).
    • retry_failed_uploads_detached now accepts personal_repo_id and
      runs both retry passes in parallel.
  • agent/core/agent_loop.py: pass
    personal_repo_id=session._personal_trace_repo_id() into the startup
    retry so personal uploads that failed in a prior session get
    re-attempted on next launch.
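
As a rough sketch of the repo resolution rule described for _personal_trace_repo_id(), assuming the three skip conditions listed above and a plain `str.format` over the template. The standalone function name and its parameters are hypothetical; the real method lives on the session object.

```python
from typing import Optional


def personal_trace_repo_id(
    share_traces: bool,
    hf_user: Optional[str],
    template: Optional[str],
) -> Optional[str]:
    """Return the per-user dataset id, or None when the upload should be skipped."""
    # Skip when sharing is disabled, the user is anonymous, or no template is set.
    if not share_traces or not hf_user or not template:
        return None
    return template.format(hf_user=hf_user)
```

Returning None rather than raising lets the caller treat "no personal repo" as a normal case and simply not spawn the second uploader.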

CLI

  • agent/main.py: new /share-traces slash command.
    • /share-traces shows the dataset URL and current visibility.
    • /share-traces public flips to public via
      HfApi.update_repo_settings.
    • /share-traces private flips back to private.
    • Idempotent create_repo so the first flip works even before any
      session has been saved.
    • Uses the user's own HF_TOKEN.
  • agent/utils/terminal_display.py: added the command to the help text.
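
The visibility flip behind /share-traces might be sketched as below. This is an assumption-laden outline rather than the code in agent/main.py: the function name `set_trace_visibility` is hypothetical, and `api` is expected to be a `huggingface_hub.HfApi` instance (or any object exposing `create_repo` and `update_repo_settings` with these keyword arguments).

```python
def set_trace_visibility(api, repo_id: str, public: bool, token: str) -> str:
    """Hypothetical helper: make the trace dataset public or private.

    `api` is assumed to behave like huggingface_hub.HfApi.
    """
    # Idempotent create so the first flip works even before any session
    # has been saved. exist_ok=True leaves an existing repo untouched.
    api.create_repo(
        repo_id=repo_id,
        repo_type="dataset",
        private=True,
        exist_ok=True,
        token=token,
    )
    # Apply the requested visibility.
    api.update_repo_settings(
        repo_id=repo_id,
        repo_type="dataset",
        private=not public,
        token=token,
    )
    return "public" if public else "private"
```

Passing the client in as a parameter keeps the helper easy to exercise with a stand-in object, without touching the Hub.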

Docs

  • README.md: new "Sharing Traces" section covering default behavior,
    the slash command, the opt-out flag, and how to override the template.

Edge cases handled

  • Anonymous user (no resolvable HF username): personal upload is skipped.
  • HF_TOKEN missing: personal upload is skipped, org upload is unaffected.
  • Pre-existing local session logs created before this change: the personal
    retry path skips files that have never been tagged, so we do not
    suddenly re-upload old sessions to a newly created personal repo.
  • Heartbeat saves: the same trajectory is rewritten every minute, and both
    upload paths run on every save. The Hub deduplicates by content hash, so
    the cost is small.

@github-actions

github-actions Bot commented Apr 30, 2026

Claude finished @lewtun's task in 3m 18s


PR Review: Auto-share session traces to a private HF dataset

No blocking issues — 3 P1
Verdict: ready to merge

What I checked

  • session_uploader.py: JSONL converter, token resolution, upload/retry logic, status field design
  • session.py: subprocess wiring, personal repo resolution, concurrent upload paths
  • agent_loop.py: startup retry pass for personal repo
  • main.py: /share-traces command, token flow
  • config.py: new fields, defaults

P1 — Concurrent write race between org and personal uploaders

save_and_upload_detached spawns two subprocesses that each read-modify-write the same local JSON file with no lock. Both processes load the file at startup, then write back their respective status key:

  • session_uploader.py:374-377 — org uploader writes upload_status = "success"
  • session_uploader.py:374-377 (personal) — writes personal_upload_status = "success"

If the personal uploader reads the file before the org uploader finishes writing, the personal uploader's final json.dump will overwrite with a stale copy that still has upload_status: "pending". On the next session start, retry_failed_uploads at session.py:583-598 re-uploads the session to the org dataset unnecessarily.

The org Hub dataset receives spurious duplicate uploads; the Hub is content-hash idempotent so no data is corrupted, but the retry behavior is incorrect.
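
The lost update described above can be reproduced deterministically with a small simulation. It is only an illustration of the interleaving: an in-memory dict stands in for the session file, and the `load`, `save`, and `simulate_lost_update` names are hypothetical.

```python
import json


def load(store: dict) -> dict:
    """Read the whole 'file' and parse it, as each uploader does at startup."""
    return json.loads(store["file"])


def save(store: dict, data: dict) -> None:
    """Overwrite the whole 'file' with the caller's copy."""
    store["file"] = json.dumps(data)


def simulate_lost_update() -> dict:
    store = {"file": json.dumps({
        "upload_status": "pending",
        "personal_upload_status": "pending",
    })}
    # Both uploaders read before either one writes.
    org_copy = load(store)
    personal_copy = load(store)
    # The org uploader finishes first and records success.
    org_copy["upload_status"] = "success"
    save(store, org_copy)
    # The personal uploader writes back its stale copy, reverting the org key.
    personal_copy["personal_upload_status"] = "success"
    save(store, personal_copy)
    return load(store)
```

In this interleaving the final file still shows the org upload as pending even though it succeeded, which is the state that triggers the unnecessary retry.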


P1 — Personal upload silently skipped when user authenticated via huggingface-cli login

_spawn_uploader at session.py:503 passes --token-env HF_TOKEN, so the uploader subprocess resolves the personal upload token via os.environ.get("HF_TOKEN") only (session_uploader.py:44). Users who ran huggingface-cli login (which saves the credential to ~/.cache/huggingface/token but does not export HF_TOKEN) will silently fail personal uploads — the local file is immediately marked personal_upload_status: "failed" at session_uploader.py:323-325.

The resolved token is available in session.hf_token (set from resolve_hf_token() in main.py:934) but is never propagated to the subprocess environment. The /share-traces command correctly uses session.hf_token at main.py:854, so the inconsistency is apparent.
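
One way to cover the cached-login case is a fallback chain: prefer the environment variable, then read the token file written at login. The sketch below is an assumption-based illustration, not the project's code: `resolve_personal_token` is a hypothetical name and the token file path is passed in by the caller. In real code, `huggingface_hub.get_token()` already handles this lookup.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_personal_token(
    env: Mapping[str, str],
    cached_token_file: Path,
) -> Optional[str]:
    """Hypothetical resolver: HF_TOKEN from the environment, else the cached login token."""
    token = env.get("HF_TOKEN")
    if token:
        return token
    try:
        cached = cached_token_file.read_text().strip()
    except OSError:
        # No cached login (file missing or unreadable).
        return None
    return cached or None
```

Taking `env` and the file path as arguments, instead of reading `os.environ` directly, makes the precedence order straightforward to verify.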


P1 — All trace events share the session start timestamp

to_claude_code_jsonl at session_uploader.py:142-143 assigns the session start time to a single timestamp variable, then uses it for every emitted event (session_uploader.py:164, 202, 225). The Claude Code JSONL schema uses per-event timestamps to drive the timeline view in the HF Agent Trace Viewer; with all events sharing the same timestamp they appear simultaneous and the viewer cannot reconstruct turn ordering from time alone.

The per-message timestamp information is not currently stored in the trajectory dict. A minimal fix would be to store datetime.now().isoformat() at the time each message is added to the context manager and include it in get_trajectory()'s messages serialisation.
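
The suggested fix might look something like the following sketch. `TinyContext` is a hypothetical stand-in for the real context manager, the `timestamp` field name is an assumption, and the injectable `clock` exists only so the behavior can be checked without depending on wall time.

```python
from datetime import datetime, timezone


class TinyContext:
    """Hypothetical, stripped-down context manager that stamps each message."""

    def __init__(self, clock=None):
        # Default clock returns the current UTC time; tests can inject their own.
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self.messages = []

    def add_message(self, role: str, content: str) -> None:
        # Record the time at append, so every event keeps its own timestamp.
        self.messages.append({
            "role": role,
            "content": content,
            "timestamp": self._clock().isoformat(),
        })

    def get_trajectory(self) -> dict:
        # Timestamps travel with the serialized messages.
        return {"messages": list(self.messages)}
```

A converter can then read each message's own timestamp and fall back to the session start time only for older trajectories that lack the field.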


Notes

  • The create_repo exception-eating pattern at session_uploader.py:354-363 swallows auth/permission errors, making silent failures harder to diagnose. Non-blocking since upload_file will also fail and be marked correctly.
  • Using Python's builtin name format as a parameter (session_uploader.py:285, 409) is a minor code smell but has no runtime impact in this context.

@github-actions

github-actions Bot commented Apr 30, 2026

Claude finished @lewtun's task in 2m 41s


PR Review: Auto-share session traces to a private HF dataset

  • Gather context and read changed files
  • Check if previously flagged P1 issues were addressed
  • Look for new P0 issues
  • Post findings

No blocking issues — 0 new findings
Verdict: ready to merge

What I checked

  • session_uploader.py: concurrent write paths, token resolution, JSONL converter
  • session.py: _spawn_uploader, _personal_trace_repo_id, save_and_upload_detached, retry logic
  • agent_loop.py: startup retry wiring for personal repo
  • main.py: /share-traces command, token flow, update_repo_settings call
  • config.py: new fields and defaults

Re-review notes

The two new commits (a83ad51, 78f8cd2) are both merge-from-main syncs — no changes to the trace-sharing code. The three P1 findings from the prior review are unaddressed; per review policy they are not re-flagged here, and the author may defer them to a follow-up.

No new P0 findings were identified.

lewtun and others added 2 commits May 1, 2026 11:12
Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Codex <codex@openai.com>
@github-actions

github-actions Bot commented May 1, 2026

Claude finished @lewtun's task in 2m 56s


PR Review: Auto-share session traces to a private HF dataset

  • Gather context and read changed files
  • Check what new commits have been pushed since last review
  • Look for new P0 issues (re-review: suppressing P1s per policy)
  • Post findings

No new P0 findings
Verdict: ready to merge

What I checked

  • backend/dependencies.py: new _dev_user_from_env() function — token resolution, exception handling, fallback behavior, caching behavior
  • agent/core/hf_access.py: fetch_whoami_v2 implementation (no internal caching confirmed)
  • agent/config.py, agent/core/session_uploader.py, configs/*.json, README.md: dataset rename propagation
  • session_manager.py + routes/agent.py: how user_id flows from get_current_user into sessions (to confirm the rename doesn't break the repo path derivation)

Re-review notes

Two new real-code commits since the last review (a83ad51):

5e70584 — Use HF token owner for local dev auth introduces _dev_user_from_env() (backend/dependencies.py:141). The function correctly falls back to DEV_USER at every failure point. One note (P1, not re-flagged under re-review policy): _dev_user_from_env() calls fetch_whoami_v2 twice per request — once at line 152, then again inside _fetch_user_plan at line 169 — with no caching. In contrast, the production path caches _validate_token results for 5 minutes. This means dev mode makes 2 uncached HTTP round-trips to api.huggingface.co on every request; if the HF API is slow or unavailable, requests will block up to the 5-second fetch_whoami_v2 timeout before falling back to DEV_USER. Not blocking since the fallback is correct and offline dev still works, but a simple functools.lru_cache or module-level (token, resolved_user) pair would eliminate the redundancy.
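
The caching suggestion could be sketched as below. This is illustrative only: `fetch_whoami` is a local stand-in for the real network call, `cached_whoami_name` is a hypothetical name, and the returned payload shape is invented for the example. Note that `functools.lru_cache` has no expiry, unlike the 5-minute cache on the production path, and it does not cache calls that raise.

```python
from functools import lru_cache

# Records every simulated network call so the caching effect is observable.
CALLS: list[str] = []


def fetch_whoami(token: str) -> dict:
    """Stand-in for the real whoami request (hypothetical payload)."""
    CALLS.append(token)
    return {"name": f"user-for-{token[-4:]}"}


@lru_cache(maxsize=8)
def cached_whoami_name(token: str) -> str:
    """Resolve the username once per token for the life of the process."""
    return fetch_whoami(token)["name"]
```

For a dev-only path, a process-lifetime cache keyed by token is probably sufficient, since the token rarely changes while the server is running.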

14e2690: Rename personal session trace dataset (ml-intern-traces → ml-intern-sessions) propagates cleanly across all 6 touch points. Any testers who already created the {user}/ml-intern-traces dataset during branch testing will have their old traces orphaned (new sessions write to the new name), but since this is pre-merge that's an acceptable disruption.

The three P1 findings from the first review remain outstanding (concurrent write race, huggingface-cli login silent failure, shared event timestamps). Per review policy they are not re-flagged here.

Co-authored-by: Codex <codex@openai.com>
@github-actions

github-actions Bot commented May 1, 2026

Claude finished @lewtun's task in 3m 36s


PR Review: Auto-share session traces to a private HF dataset

  • Gather context and read changed files
  • Check what new commits have been pushed since last review
  • Look for new P0 issues (re-review: suppressing P1s per policy)
  • Post findings

No new P0 findings
Verdict: ready to merge

What I checked

  • session_uploader.py: dataset_card_readme, _upload_dataset_card, integration into upload_session_as_file retry loop, token and repo_id propagation
  • tests/unit/test_session_uploader.py: coverage of README content and format-gated upload behavior
  • session.py: heartbeat save path — confirmed personal_upload_status is wiped on each heartbeat, meaning _upload_dataset_card runs on every save (pre-existing design, acknowledged in PR description as acceptable)

Re-review notes

One new real-code commit since the last review (14e2690):

4cc7717 — Add session dataset card metadata adds dataset_card_readme() and _upload_dataset_card() to session_uploader.py, and two unit tests. The implementation is clean. Note that _upload_dataset_card sits inside the for attempt in range(max_retries) retry loop (session_uploader.py:445), so a transient failure in the README upload consumes retry budget that would otherwise go to the session JSONL upload. This is worth watching for rate-limit scenarios but is not a blocking concern given the Hub's per-user rate headroom.

The three P1 findings from the first review (concurrent write race, huggingface-cli login silent failure, shared event timestamps) remain outstanding. Per review policy they are not re-flagged here.

json.dump(data, f, indent=2)
return False

# Scrub secrets (HF tokens, API keys, etc.) from messages + events
Member

Why was this removed? We should retain some best-effort scrubbing

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions

github-actions Bot commented May 1, 2026

Claude finished @lewtun's task in 7m 16s


PR Review: Auto-share session traces to a private HF dataset

  • Gather context and read changed files
  • Investigate lewtun's comment about removed scrubbing at session_uploader.py:68
  • Check new commits since last review (10e7279)
  • Post findings

No new P0 findings
Verdict: ready to merge

What I checked

  • session_uploader.py: _update_upload_status fcntl lock correctness, _resolve_token chain for HF_TOKEN, scrubbing coverage in _write_row_payload and _write_claude_code_payload
  • session.py: _spawn_uploader token propagation via _ML_INTERN_PERSONAL_TOKEN env var
  • context_manager/manager.py: per-message timestamp assignment and litellm.Message extra-field preservation in model_dump()
  • hf_tokens.py: resolve_hf_token + get_cached_hf_token() covering huggingface-cli login path
  • tests/unit/test_session_uploader.py: new coverage for token precedence, atomic status update, per-message timestamps
  • tests/unit/test_dangling_tool_calls.py: timestamp preservation test

Re-review notes

10e7279 — Fix session trace upload review issues addresses all three P1 findings from the first review:

  • Concurrent write race (P1, first review): fixed — _update_upload_status re-reads the file under fcntl.LOCK_EX before writing, and _read_session_file uses LOCK_SH. The main-process atomic rename (tmp.replace(filepath)) is compatible because the status update opens a fresh file descriptor to the renamed inode.
  • huggingface-cli login silent failure (P1, first review): fixed — _spawn_uploader now passes self.hf_token into the subprocess via _ML_INTERN_PERSONAL_TOKEN; _resolve_token("HF_TOKEN") reads that first, then falls through to resolve_hf_token(…, include_cached=True) which calls huggingface_hub.get_token() covering the cached-login case (hf_tokens.py:19-22).
  • Shared event timestamps (P1, first review): fixed — ContextManager.add_message stamps each message at append time (manager.py:243-244); litellm.Message uses extra="allow" so the field survives model_dump() and is picked up in to_claude_code_jsonl at session_uploader.py:172.
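
The locked read-modify-write pattern behind the first fix can be sketched as follows. This is a simplified, Unix-only illustration (it relies on `fcntl`), not the actual `_update_upload_status`: the function name and signature are assumptions, and error handling is reduced to releasing the lock.

```python
import fcntl
import json


def update_upload_status(path: str, key: str, value: str) -> None:
    """Hypothetical helper: set one status key without clobbering the others."""
    with open(path, "r+") as f:
        # Exclusive lock: other writers using the same helper wait here.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            # Re-read under the lock so we never write back a stale copy.
            data = json.load(f)
            data[key] = value
            f.seek(0)
            json.dump(data, f, indent=2)
            f.truncate()
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

The important part is that the read happens after the lock is taken. Locking only around the write would still let a process overwrite the file with data it loaded earlier.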

Addressing @lewtun's comment (session_uploader.py:68)

The scrubbing was not removed — it was refactored out of upload_session_as_file into the _scrub() helper introduced at line 68 in the new file. It is called in both upload paths:

  • _write_row_payload:258-260 — scrubs messages, events, and tools (same coverage as the original inline code in main)
  • _write_claude_code_payload:282 — scrubs messages before passing to to_claude_code_jsonl (events/tools are not included in the Claude Code JSONL output)

Additionally, save_trajectory_local scrubs the trajectory before writing to disk (session.py:401-407), so the local file already has secrets redacted before any uploader subprocess reads it.

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions

github-actions Bot commented May 1, 2026

Claude finished @lewtun's task in 1m 26s


PR Review: Auto-share session traces to a private HF dataset

  • Gather context and read changed files
  • Check what new commits have been pushed since last review
  • Look for new P0 issues (re-review: suppressing P1s per policy)
  • Post findings

No new P0 findings
Verdict: ready to merge

What I checked

  • session_uploader.py: _scrub_session_for_upload correctness, both write paths (_write_row_payload, _write_claude_code_payload) using the new helper, shallow copy safety
  • tests/unit/test_session_uploader.py: new scrubbing tests for both formats — coverage of HF token, Anthropic key, and GitHub token patterns across messages, events, and tools
  • Dataset card README wording accuracy

Re-review notes

One new real-code commit since the last review (10e7279):

ef14549 — Preserve secret scrubbing before trace uploads addresses @lewtun's inline comment at session_uploader.py:68. The fix introduces _scrub_session_for_upload (session_uploader.py:256-262) that scrubs messages, events, and tools in one place, and both _write_row_payload and _write_claude_code_payload now call it. The _write_claude_code_payload path previously only scrubbed messages; now events and tools are also scrubbed (they are not passed to to_claude_code_jsonl but the helper is now symmetric and future-proof). The shallow copy is correct — scrubbed = dict(data) followed by key replacement does not mutate the caller's data.
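
The shallow-copy scrubbing approach described here might be sketched like this. The regular expressions, the `[REDACTED]` placeholder, and the function names are all assumptions for illustration; the project's real patterns and helper may differ, and pattern-based scrubbing is best-effort by nature.

```python
import re

# Hypothetical patterns for HF, Anthropic, and GitHub style secrets.
_SECRET_PATTERNS = [
    re.compile(r"hf_[A-Za-z0-9]{20,}"),
    re.compile(r"sk-ant-[A-Za-z0-9_\-]{20,}"),
    re.compile(r"gh[pousr]_[A-Za-z0-9]{20,}"),
]


def scrub(value):
    """Recursively redact secret-looking substrings, returning new containers."""
    if isinstance(value, str):
        for pattern in _SECRET_PATTERNS:
            value = pattern.sub("[REDACTED]", value)
        return value
    if isinstance(value, list):
        return [scrub(v) for v in value]
    if isinstance(value, dict):
        return {k: scrub(v) for k, v in value.items()}
    return value


def scrub_session_for_upload(data: dict) -> dict:
    """Scrub messages, events, and tools without mutating the caller's dict."""
    scrubbed = dict(data)  # shallow copy; keys below are replaced, not edited in place
    for key in ("messages", "events", "tools"):
        if key in scrubbed:
            scrubbed[key] = scrub(scrubbed[key])
    return scrubbed
```

Because `scrub` builds new lists and dicts instead of editing in place, replacing the three keys on the shallow copy leaves the original session data untouched.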

Two new tests verify end-to-end that HF, Anthropic, and GitHub secrets are replaced in both the row and claude_code output files.

The dataset card README update accurately describes the automated scrubbing and appropriately qualifies the warning as "no comprehensive redaction" — this is a correct and honest description.

The three P1 findings from the first review (concurrent write race, huggingface-cli login silent failure, shared event timestamps) were all addressed in 10e7279 and acknowledged in the prior review. No further concerns.

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions

github-actions Bot commented May 1, 2026

Claude finished @lewtun's task in 1m 8s


PR Review: Auto-share session traces to a private HF dataset

  • Gather context and read changed files
  • Check new commits since last review
  • Look for new P0 issues (re-review: suppressing P1s per policy)
  • Post findings

No new P0 findings
Verdict: ready to merge

What I checked

  • session_uploader.py:348-417: dataset_card_readme template change — removed self-referential Repository: link (which was interpolated via f-string), added hardcoded demo and CLI links
  • tests/unit/test_session_uploader.py:26-30: new assertions verify the added links and that the old link pattern is absent
  • session_uploader.py:426: call site still passes repo_id — parameter is now unused in the template but callers are unaffected

Re-review notes

One new real-code commit since the last review (ef14549):

c906a38 — Link ML Intern demo in dataset card replaces the per-repo self-link (https://huggingface.co/datasets/{repo_id}) in the dataset card README with two hardcoded links to the ML Intern demo Space and CLI repo. The old link was functional (interpolated via f-string) but redundant since a reader is already on the dataset page. The repo_id parameter at session_uploader.py:348 is now unused inside the template; the call site at session_uploader.py:426 still passes it without error. Not flagging as the function signature is a minor cosmetic cleanup candidate at most (P2).

The three P1 findings from the first review (concurrent write race, huggingface-cli login silent failure, shared event timestamps) were all addressed in 10e7279. No further concerns.

Member

@lewtun lewtun left a comment

Thanks a lot for the nice PR @DarshanCode2005 ! LGTM now with some fixes I pushed

@lewtun lewtun merged commit 0bd7547 into huggingface:main May 1, 2026
2 checks passed
@DarshanCode2005
Contributor Author

Thanks @lewtun. Would it be alright if I emailed you for some career advice?

Development

Successfully merging this pull request may close these issues.

automatically share my traces on HF

2 participants