Implement MCP and SDK streaming messages by dschwarz26 · Pull Request #114 · futuresearch/futuresearch-python

dschwarz26 · 2026-02-09T00:58:27Z

See for example:

SDK (task.py):
- Add ProgressInfo dataclass and progress extraction from status endpoint
- Update await_task_completion with stderr output, JSONL logging, and progress callback (triggers on any count change, not just completed)
- Add session URL display and ETA estimates

MCP server (server.py):
- Add everyrow_agent_submit, everyrow_rank_submit (non-blocking submit)
- Add everyrow_progress (12s server-side blocking, chaining instructions)
- Add everyrow_results (fetch data, save CSV, cleanup session)
- Update auth from whoami to get_billing (v0.2.0 API change)
- Keep existing blocking tools for backward compatibility

Plugin + hooks:
- Add PostToolUse hooks for submit/progress/results tracking
- Add Stop hook guard to prevent agent stopping during active tasks
- Add SessionEnd cleanup hook
- Add status line script for persistent progress bar

Includes 33 MCP tests, 5 hook test suites, e2e verification script.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
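For readers skimming the description, here is a minimal sketch of what "triggers on any count change, not just completed" could look like. It is not the code in task.py: the `ProgressInfo` field names and the `watch_progress` helper are assumptions made for illustration, and the snapshots are passed in as a plain iterable rather than fetched from the status endpoint.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ProgressInfo:
    # Field names are assumptions; the PR description only names the dataclass.
    pending: int
    running: int
    completed: int
    failed: int


def watch_progress(
    snapshots: Iterable[ProgressInfo],
    on_progress: Callable[[ProgressInfo], None],
) -> int:
    """Call on_progress whenever any count differs from the previous snapshot.

    Returns how many times the callback was attempted.
    """
    last: Optional[ProgressInfo] = None
    calls = 0
    for snap in snapshots:
        if snap != last:  # dataclass equality compares every count
            try:
                on_progress(snap)
            except Exception:
                # A failing callback should not abort the polling loop.
                pass
            calls += 1
            last = snap
    return calls
```

Comparing whole snapshots means a row moving from pending to running is reported even though the completed count has not changed.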

- Add structured_output=False to all MCP tool decorators to suppress FastMCP's structuredContent generation (Claude Code displays JSON blob otherwise)
- Server writes /tmp/everyrow-task.json directly instead of hooks parsing tool_response (avoids fragile double-escaped JSON)
- Simplify hook scripts to no-ops (submit/progress) since server handles state
- Add dev-claude.sh for local development with plugin + local engine
- Fix verify_transcript.sh to use substring matching for tool names

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

everyrow-mcp/src/everyrow_mcp/server.py

src/everyrow/task.py

.github/workflows/checks.yml

jackwildman · 2026-02-09T13:25:59Z

everyrow-mcp/src/everyrow_mcp/server.py

+
+@mcp.tool(name="everyrow_progress", structured_output=False)
+async def everyrow_progress(params: ProgressInput) -> list[TextContent]:
+    """Check progress of a running everyrow task. Blocks ~12s before returning.


jackwildman · 2026-02-09T13:30:05Z

everyrow-mcp/src/everyrow_mcp/server.py

+_active_tasks: dict[str, dict[str, Any]] = {}
+
+PROGRESS_POLL_DELAY = 12  # seconds to block in everyrow_progress before returning
+TASK_STATE_FILE = "/tmp/everyrow-task.json"


This seems like a pretty bad idea, and blocks anyone from using more than one Claude to do anything with everyrow. As far as I can see, we always know the task context we're interested in when handling this file, so could we call this /tmp/everyrow-task-{task-id}.json instead?
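A rough sketch of the per-task naming suggested above, in case it helps the discussion. The helper names `task_state_path` and `write_task_state` are hypothetical and not part of server.py; the only thing taken from the thread is the `everyrow-task-{task-id}.json` naming scheme.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def task_state_path(task_id: str, base_dir: Optional[Path] = None) -> Path:
    """Return a state-file path that is unique to one task."""
    base = base_dir if base_dir is not None else Path(tempfile.gettempdir())
    # Keep only characters that are safe in a file name.
    safe_id = "".join(c for c in task_id if c.isalnum() or c in "-_")
    return base / f"everyrow-task-{safe_id}.json"


def write_task_state(task_id: str, state: dict, base_dir: Optional[Path] = None) -> Path:
    """Write one task's state to its own file and return the path."""
    path = task_state_path(task_id, base_dir)
    path.write_text(json.dumps(state))
    return path
```

With one file per task, two Claude sessions running different tasks would no longer overwrite each other's state.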

everyrow-mcp/src/everyrow_mcp/server.py

CallumMcMahon · 2026-02-09T13:43:39Z

everyrow-mcp/src/everyrow_mcp/server.py


+# Track active tasks for the submit/poll pattern.
+# Maps task_id -> {session, client, total, session_url, started_at}
+_active_tasks: dict[str, dict[str, Any]] = {}


do we need to make the server stateful here? seems like it shouldn't be needed?

CallumMcMahon · 2026-02-09T13:57:51Z

everyrow-mcp/src/everyrow_mcp/server.py

+    _active_tasks[task_id] = {
+        "session": session,
+        "session_ctx": session_ctx,
+        "client": client,
+        "total": total,
+        "session_url": session_url,
+        "started_at": time.monotonic(),
+        "input_csv": params.input_csv,
+        "prefix": "ranked",
+    }


I'm not convinced any of this is needed. Maybe an @cache function somewhere for task id -> session, but everything else seems like unnecessary state, especially in the context of potentially moving this MCP to the web.

src/everyrow/task.py

CallumMcMahon · 2026-02-09T14:00:46Z

src/everyrow/task.py

+
+
+def _default_progress_output(progress: ProgressInfo, total: int, elapsed: float) -> None:
+    """Print a progress line to stderr."""


why stderr? is this to avoid stdio output stream getting filled with stuff not related to MCP messages?
seems like bad practice to dump stuff to stderr
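Some context for the stderr question: with MCP's stdio transport, stdout carries the protocol messages, so anything else printed there can corrupt the stream, while stderr is available for logging. A minimal sketch under that assumption is below. The `ProgressInfo` fields, the message format, and the extra `stream` parameter are illustrative and not the actual signature in task.py.

```python
import sys
from dataclasses import dataclass
from typing import TextIO


@dataclass
class ProgressInfo:
    # Field names are assumptions made for this sketch.
    completed: int
    running: int
    failed: int


def format_progress(progress: ProgressInfo, total: int, elapsed: float) -> str:
    """Build a single human-readable progress line."""
    return (
        f"[{elapsed:.0f}s] {progress.completed}/{total} complete, "
        f"{progress.running} running, {progress.failed} failed"
    )


def default_progress_output(
    progress: ProgressInfo,
    total: int,
    elapsed: float,
    stream: TextIO = sys.stderr,
) -> None:
    """Write one progress line to stderr so stdout stays free for protocol messages."""
    print(format_progress(progress, total, elapsed), file=stream, flush=True)
```

Taking the stream as a parameter also makes it straightforward to swap in a logger-backed writer, which is the direction the later "Use logging instead of print statements" commit points to.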

src/everyrow/task.py

CallumMcMahon · 2026-02-09T14:05:10Z

ARCHITECTURE.md

+- `everyrow_merge` — Join two CSVs by intelligent entity matching
+- `everyrow_agent` — Run web research agents on each row
+
+Submit/poll tools (for long-running operations):


why not re-use the _async naming pattern of the sdk? submit doesn't seem very descriptive

CallumMcMahon · 2026-02-09T14:09:15Z

ARCHITECTURE.md

+- `everyrow_progress` — Poll task status (blocks ~12s server-side, returns progress text)
+- `everyrow_results` — Retrieve completed results, save to CSV
+
+All tools use `@mcp.tool(structured_output=False)` to suppress FastMCP's `structuredContent` field. Without this, Claude Code displays raw JSON blobs instead of clean text (see [claude-code#9962](https://github.com/anthropics/claude-code/issues/9962)).


does it matter much what claude code sees? I'd probably optimise MCP user experience for claude desktop/GUI clients. It's possible that they also display it weirdly...

CallumMcMahon · 2026-02-09T14:12:06Z

ARCHITECTURE.md

+Long-running operations (agent_map, rank) use a submit/poll pattern because:
+- Operations take 1–10+ minutes
+- LLMs cannot tell time and will hallucinate if asked to wait ([arXiv:2601.13206](https://arxiv.org/abs/2601.13206))
+- Client-side timeouts (60s in Codex CLI) kill blocking calls


if we used a streaming api, it could start by returning the task id, then if the timeout kills it the client can fallback to fetching/polling with the additional endpoint? that way we smoothly transition from short blockable tasks, to those that just miss out on the timeout, without killing the task if the timeout is hit
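A toy sketch of the stream-then-fallback flow described in this comment, to show the shape of it. Everything here is hypothetical: `run_task_streaming` simulates the server with `asyncio.sleep`, and the event dictionaries are invented for illustration. In a real deployment the task would keep running server-side after the client gives up on the stream; this in-process simulation cannot show that part.

```python
import asyncio
from typing import AsyncIterator, Optional


async def run_task_streaming(task_id: str, steps: int, delay: float) -> AsyncIterator[dict]:
    """Yield the task id first, then progress events, then a final event."""
    yield {"type": "submitted", "task_id": task_id}
    for done in range(1, steps + 1):
        await asyncio.sleep(delay)  # stands in for real work
        yield {"type": "progress", "completed": done, "total": steps}
    yield {"type": "done", "task_id": task_id}


async def consume_with_timeout(task_id: str, steps: int, delay: float, timeout: float) -> dict:
    """Read the stream until it finishes or the client-side timeout fires."""
    task_seen: Optional[str] = None
    last: Optional[dict] = None

    async def drain() -> None:
        nonlocal task_seen, last
        async for event in run_task_streaming(task_id, steps, delay):
            if event["type"] == "submitted":
                task_seen = event["task_id"]
            last = event

    try:
        await asyncio.wait_for(drain(), timeout)
        return {"mode": "streamed", "task_id": task_seen, "last": last}
    except asyncio.TimeoutError:
        # The id arrived in the first event, so the caller can switch to polling.
        return {"mode": "fallback_to_polling", "task_id": task_seen, "last": last}
```

Short tasks finish inside the stream, and long ones hand back a task id that a polling tool such as `everyrow_progress` could pick up.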

CallumMcMahon · 2026-02-09T14:13:45Z

ARCHITECTURE.md

+Stop:
+Runs `everyrow-stop-guard.sh`, reads `/tmp/everyrow-task.json`. If a task is running, outputs `{"decision": "block", "reason": "..."}` which prevents Claude from ending its turn. The reason text instructs Claude to call `everyrow_progress` to check status.


wait so we continually burn through CC tokens until the task finishes, trapping CC in a loop of needing to call everyrow_progress?
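For reference, the guard logic described in the ARCHITECTURE.md excerpt amounts to roughly the following. The real hook is a shell script (`everyrow-stop-guard.sh`); this Python version is only an illustration, and the `status` and `task_id` keys in the state file are assumptions. The `{"decision": "block", "reason": "..."}` shape is taken from the excerpt.

```python
import json
from pathlib import Path
from typing import Optional


def stop_guard_decision(state_file: Path) -> Optional[dict]:
    """Return a block decision if the state file reports a running task, else None."""
    try:
        state = json.loads(state_file.read_text())
    except (OSError, json.JSONDecodeError):
        return None  # no readable state: let the agent stop
    if not isinstance(state, dict) or state.get("status") != "running":
        return None  # "status" is an assumed key name
    return {
        "decision": "block",
        "reason": (
            f"Task {state.get('task_id')} is still running; "
            "call everyrow_progress to check status."
        ),
    }
```

Each blocked stop leads to another `everyrow_progress` call, which is the token cost this comment is asking about.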

CallumMcMahon · 2026-02-09T14:16:02Z

ARCHITECTURE.md

+
+Server-Sent Events (SSE): Would replace polling with push notifications. Adds complexity (persistent connections, reconnection logic) for marginal gain. The 12s polling cadence already provides smooth UX. MCP's stdio transport doesn't support SSE natively.
+
+Hook-based state tracking: The original design had PostToolUse hooks on `_submit` and `_progress` tools parse `tool_response` JSON and write the task state file. This was fragile because plugin MCP tool responses are double-escaped JSON strings (`{"result": "<escaped>"}`) that required careful parsing. Moving state writes into the MCP server itself (`_write_task_state()`) was simpler and more reliable.


I still don't follow why we need state at all
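As background on the "double-escaped JSON" point in the excerpt above, this is roughly what the hooks would have had to do. The wrapper shape `{"result": "<escaped>"}` comes from the excerpt; the function name and payload fields are made up for the example.

```python
import json


def parse_plugin_tool_response(raw: str) -> dict:
    """Decode a tool response whose payload is a JSON string nested inside JSON."""
    outer = json.loads(raw)  # first pass: {"result": "<escaped JSON>"}
    inner = outer["result"]
    if isinstance(inner, str):
        return json.loads(inner)  # second pass: the actual payload
    return inner  # already decoded; nothing more to do


# Illustrative payload, serialized twice to mimic the double escaping.
payload = {"task_id": "t1", "completed": 3, "total": 10}
raw = json.dumps({"result": json.dumps(payload)})
```

Doing the same two-pass decode in a shell hook with jq is where the fragility mentioned in the excerpt would come from.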



rgambee · 2026-02-12T21:22:03Z

I think I've addressed all the feedback. This PR is rather unwieldy, so I'm planning to merge it shortly and fix any remaining issues in follow-up PRs.

Some CI failures are expected since the MCP package depends on the published SDK package (see #146). I will verify that the errors are due to that and not something else.

src/everyrow/task.py

dschwarz26 and others added 5 commits February 8, 2026 12:29

Update docs for streaming, add and cleanup tests

017780a

Fixes

796ebf1

Change output messages, give example in docs

69709e5

dschwarz26 requested review from CallumMcMahon and rgambee February 9, 2026 00:58

sentry bot reviewed Feb 9, 2026


jackwildman reviewed Feb 9, 2026


CallumMcMahon reviewed Feb 9, 2026


rgambee reviewed Feb 9, 2026


rgambee added 19 commits February 11, 2026 09:02

Regenerate OpenAPI code to include task progress

e248787

Fix formatting and linter errors

462120a

Use wall time, don't overwrite start time

7902c9b

Catch exceptions from progress callback

f829fe9

Check task status before fetching results

edaf09b

Use ~/.everyrow/task.json for storing progress

2c298ff

Check that jq is available

49a6a72

Truncate start time

a9ebbf6

Use notify-send to show notification on Linux

d91940b

Report progress for dedupe, merge and screen

6964aef

Properly convert from TaskProgressInfo to ProgressInfo

e8aad1c

Use TaskProgressInfo instead of ProgressInfo

4b56bbd

Use dataclass to store active task info, not dict

a20a46e

Set client parameter type to AuthenticatedClient

70dbce2

Use logging instead of print statements

8182f2b

Update documentation

e80cfb3

Bump SDK version within MCP

f5a25b8

Remove misleading sentence from error message

d0ca4cd

Include more exception details in error message

c5d500d

rgambee added 5 commits February 12, 2026 13:08

Merge main

07853d9

Merge branch 'main' into streaming-sdk

56806f4

Rename input classes

225eacb

Update status line tests

ddf10cd

Delete test scripts

36bee1d

sentry bot reviewed Feb 12, 2026


rgambee added 12 commits February 12, 2026 13:57

Remove API URL from plugin env

a0eb377

End users don't need to override the default API URL

Use helper function to check API return type

d6baaba

Don't mention poll sleep time in docs

3c6ee36

It's subject to change. I don't want the docs to fall out of date.

Remove mentions of async operations

106393b

These async variants have been around for a while, and we haven't felt the need to document them. I'm not convinced it's worth it now.

Point to progress bar setup from installation

37aa8c7

Allow links to progress bar section

652f6d3

Move MCP server from API reference to overview

aa46a62

Remove extra slash from session URLs

c3e2bca

Adjust SDK progress formatting

afdef7b

Update documentation

4ce1dd8

Log state write errors at debug level

9533645

Fix input class names in test_server.py

daf864c

rgambee added 3 commits February 12, 2026 16:27

Add external URLs to list of ones to skip

7730358

Fix link to installation guide

f438791

Bump SDK and MCP to version 0.3.0

1a54404

jackwildman reviewed Feb 12, 2026


rgambee added 4 commits February 12, 2026 17:51

Merge branch main

666d5ca

Bump version number in more places

f5fc4ac

Make SDK progress info opt-in

46bd98f

Update documentation

eb5ba19

rgambee merged commit daef48d into main Feb 12, 2026
9 checks passed

rgambee deleted the streaming-sdk branch February 12, 2026 23:05

rgambee mentioned this pull request Feb 13, 2026

Fix claude code plugin #150

Merged





4 participants