feat: leadbay_import_leads MCP/OpenClaw tool (#3537) #19

Merged
milstan merged 4 commits into main from milstan/import-leads
Apr 28, 2026

@milstan commented Apr 28, 2026

Summary

Adds leadbay_import_leads — a new MCP/OpenClaw composite write tool that accepts a list of company domains and returns Leadbay leadIds for ones the crawler already knows. Output chains naturally into leadbay_bulk_qualify_leads({ leadIds }) and leadbay_research_lead. Closes leadbay/product#3537.

Implementation strategy (post /autoplan dual-voice review): wraps Leadbay's existing CRM-import wizard at POST /1.5/imports. The live API was probed end-to-end (17 candidate paths) before any code was written. The backend has no clean domain-import endpoint, so this PR ships the wedge while leadbay/product#3538 tracks the proper async-import-with-crawl backend follow-up.

Foundation (commit 1):

  • LeadbayClient.requestRawBinary() — mirrors request() exactly (auth, semaphore, error mapping, _lastMeta, LEADBAY_MOCK=1 parity) for non-JSON uploads
  • File-import wire types (snake_case, probed live 2026-04-28)
  • ToolContext.signal?: AbortSignal for caller cancellation

Composite tool (commit 2):

  • ~840 lines covering: domain normalize/dedupe, RFC 4180 + formula-injection-safe CSV synthesis (whitespace-trim before first-byte check), preflight admin gate, client-chunking at 100, per-chunk wizard drive (preprocess → mappings → process → records-to-terminal stabilization across 2 polls), MCP_ROW_ID-based reconciliation with normalized-domain fallback
  • 8 typed error codes + per-domain not_imported.reason enum
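
The normalize/dedupe step could look roughly like the sketch below. It is not the PR's code: `normalizeDomain` and `dedupeDomains` are hypothetical names, and the exact rules (dropping the scheme, path, port, and a leading `www.`) are assumptions about what "normalize" means here.

```typescript
// Hypothetical helper: returns a bare lowercase hostname, or null if malformed.
function normalizeDomain(raw: string): string | null {
  let d = raw.trim().toLowerCase();
  d = d.replace(/^[a-z][a-z0-9+.-]*:\/\//, ""); // drop a scheme such as https://
  d = d.split(/[/?#]/)[0] ?? "";                // drop path, query, fragment
  d = d.replace(/:\d+$/, "");                   // drop a port
  d = d.replace(/^www\./, "");                  // assumption: www. is not significant
  // Require at least two dot-separated labels of letters, digits, and hyphens.
  const label = "[a-z0-9](?:[a-z0-9-]*[a-z0-9])?";
  return new RegExp(`^${label}(?:\\.${label})+$`).test(d) ? d : null;
}

// Hypothetical helper: splits inputs into unique valid domains and malformed originals.
function dedupeDomains(inputs: string[]): { valid: string[]; malformed: string[] } {
  const seen = new Set<string>();
  const valid: string[] = [];
  const malformed: string[] = [];
  for (const raw of inputs) {
    const d = normalizeDomain(raw);
    if (d === null) {
      malformed.push(raw); // would surface as not_imported.reason = "malformed"
      continue;
    }
    if (!seen.has(d)) {
      seen.add(d);
      valid.push(d); // later duplicates are dropped silently
    }
  }
  return { valid, malformed };
}
```

Under these assumptions, `https://www.Apple.com/store` and `apple.com` collapse to a single `apple.com` entry.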

MCP/OpenClaw exposure (commit 3):

  • Gated by LEADBAY_MCP_WRITE=1 (MCP) and exposeWrite=true (OpenClaw)
  • Manifest, contract test, server test all updated
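
As an illustration of the gating, here is a minimal sketch of hiding write tools unless `LEADBAY_MCP_WRITE=1` is set. The `ToolDef` shape, the `catalogue` array, and the `write` flags are hypothetical; only the tool names and the environment variable come from this PR.

```typescript
interface ToolDef {
  name: string;
  write: boolean; // illustrative flag, not the real catalogue structure
}

// Hypothetical catalogue; the read/write split shown here is an assumption.
const catalogue: ToolDef[] = [
  { name: "leadbay_research_lead", write: false },
  { name: "leadbay_bulk_qualify_leads", write: true },
  { name: "leadbay_import_leads", write: true },
];

// The environment is passed in explicitly so the function is easy to test.
function visibleTools(env: Record<string, string | undefined>): string[] {
  const includeWrite = env["LEADBAY_MCP_WRITE"] === "1";
  return catalogue.filter((t) => includeWrite || !t.write).map((t) => t.name);
}
```

Passing the environment as an argument mirrors what the server test asserts: hidden by default, exposed when write access is enabled.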

Docs + version bump (commit 4):

  • @leadbay/mcp 0.2.4 → 0.2.5, @leadbay/leadclaw 0.2.4 → 0.2.5
  • New "Importing domains from external systems → leadIds" section in packages/mcp/README.md Advanced
  • All 3 CHANGELOGs explicitly call out the side-effect (CRM-imports row + onboarding-step mutation per call)

Tool surface

input:  { domains: [{domain, name?}], dry_run?, per_phase_budget_ms?, total_budget_ms? }
output: { leads: [{domain, leadId, name}],
          not_imported: [{domain, reason: "malformed"|"no_match"|"uncrawled"|"ambiguous"|"internal_error"|"dry_run"}],
          importIds: string[],
          region, _meta, cancelled? }
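
For consumers, the surface above could be typed roughly as follows. The field names and the reason enum are taken from this description; the primitive types (`leadId` as a string, `region` as a string, `_meta` left opaque) and the helper `toBulkQualifyArgs` are assumptions for illustration.

```typescript
type NotImportedReason =
  | "malformed"
  | "no_match"
  | "uncrawled"
  | "ambiguous"
  | "internal_error"
  | "dry_run";

interface ImportLeadsInput {
  domains: { domain: string; name?: string }[];
  dry_run?: boolean;
  per_phase_budget_ms?: number;
  total_budget_ms?: number;
}

interface ImportLeadsOutput {
  leads: { domain: string; leadId: string; name: string }[]; // leadId type assumed
  not_imported: { domain: string; reason: NotImportedReason }[];
  importIds: string[];
  region: string;      // assumed to be a plain string
  _meta: unknown;      // shape not specified in this PR description
  cancelled?: boolean; // assumed boolean
}

// Hypothetical helper: builds the argument for leadbay_bulk_qualify_leads({ leadIds }).
function toBulkQualifyArgs(out: ImportLeadsOutput): { leadIds: string[] } {
  return { leadIds: out.leads.map((l) => l.leadId) };
}
```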

Test Coverage

35 new unit tests for the composite + 4 new parity tests for requestRawBinary + 1 new live smoke suite. Total: 186 tests passing across all packages.

Tests cover: normalize/dedupe, CSV escaping (RFC 4180 + injection guard with whitespace-trim), happy path, preprocess errors, dry_run path, >100-input chunking, admin preflight, requestRawBinary auth/error/meta parity, contract gating, MCP server registration. Concurrency / retry-storm tests are deferred to the backend follow-up.
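
A minimal sketch of the CSV escaping under test, assuming the guard neutralizes risky cells by prefixing an apostrophe (a common mitigation; the PR does not say which neutralization it uses). `escapeCsvField` and `toCsv` are hypothetical names, and the `domain` and `name` column headers are assumptions; only `MCP_ROW_ID` comes from the PR.

```typescript
// Escapes one field: formula-injection guard first, then RFC 4180 quoting.
function escapeCsvField(value: string): string {
  let v = value;
  // Check the first character after trimming leading whitespace,
  // so a value like ' =HYPERLINK(...)' cannot slip past the guard.
  const first = v.trimStart().charAt(0);
  if (first !== "" && "=+-@".includes(first)) {
    v = "'" + v; // assumption: apostrophe prefix makes spreadsheets treat it as text
  }
  // RFC 4180: quote fields containing a comma, double quote, CR, or LF,
  // and double any embedded double quotes.
  if (/[",\r\n]/.test(v)) {
    v = '"' + v.replace(/"/g, '""') + '"';
  }
  return v;
}

// Builds a CSV with a row-id column used later for reconciliation.
function toCsv(rows: { domain: string; name?: string }[]): string {
  const lines = ["MCP_ROW_ID,domain,name"];
  rows.forEach((r, i) => {
    lines.push([String(i), r.domain, r.name ?? ""].map(escapeCsvField).join(","));
  });
  return lines.join("\r\n") + "\r\n"; // RFC 4180 uses CRLF line breaks
}
```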

Pre-Landing Review

A pre-landing reviewer subagent and a Codex adversarial pass independently identified concerns. 9 fixes auto-applied:

  1. CHANGELOG count corrected (8 error codes, not 9)
  2. Dead lastRecords + unreachable code removed
  3. checkAborted calls made stylistically consistent
  4. ChunkRunOutput shrunk to {importId, records} (dropped 3 dead fields)
  5. Unused lookup.duplicates collection removed (dedupe is silent by design)
  6. CSV formula-injection guard now trims leading whitespace before the first-char check (previously bypassable via " =HYPERLINK(...)")
  7. IMPORT_PAGINATION_RUNAWAY off-by-one fixed (was firing on legit totalPages == maxPagesPerPoll boundary; now uses an explicit exhaustedPagination flag)
  8. importId is captured the moment POST /imports succeeds (was lost on abort/timeout mid-poll, leaving orphan wizard rows untraceable)
  9. Reconciliation sorts records so matched (lead.id present) rows win over no-match rows for the same input — defensive against future wizard behavior
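
Fix 7 can be illustrated with a small sketch. It assumes a page-based API that reports `totalPages`; `collectPages` and the `Page` shape are hypothetical, and `fetchPage` is synchronous here only to keep the example self-contained (the real client call would be async).

```typescript
interface Page<T> {
  items: T[];
  totalPages: number;
}

function collectPages<T>(
  fetchPage: (page: number) => Page<T>,
  maxPagesPerPoll: number,
): { items: T[]; exhaustedPagination: boolean } {
  const items: T[] = [];
  let page = 0;
  let totalPages = 1; // unknown until the first response arrives
  while (page < totalPages && page < maxPagesPerPoll) {
    const res = fetchPage(page);
    items.push(...res.items);
    totalPages = res.totalPages;
    page += 1;
  }
  // True only when the cap stopped the loop with pages still unread.
  // Reading exactly maxPagesPerPoll pages of a maxPagesPerPoll-page result is legitimate.
  const exhaustedPagination = page < totalPages;
  return { items, exhaustedPagination };
}
```

A caller would raise IMPORT_PAGINATION_RUNAWAY only when `exhaustedPagination` is true, rather than comparing page counts directly.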

Live e2e Verification (milstan@leadbay.ai, US backend)

  • Empty input → IMPORT_EMPTY_INPUT
  • Malformed-only input → not_imported.reason="malformed"
  • dry_run (2 domains) → preprocess only, all reason="dry_run", importId returned, no update_mappings call ✓
  • Full import (apple.com, microsoft.com, salesforce.com) → 1 matched leadId + 2 reason="uncrawled" ✓ (latest run: 11.9s)

Known Limitations (v1)

  • Side effects. Each call creates a row in the user's CRM-imports list (visible in the web UI) and touches onboarding state (startFileless, updateOnboardingStep(PROCESSING)). Suitable for occasional automation, not high-cadence (>5 calls/day). Backend follow-up #3538 covers the clean async endpoint.
  • No creation. Uncrawled domains return in not_imported with reason="uncrawled". The wedge cannot create new leads for unknown websites; the caller decides what to do.
  • Fuzzy matching surprise observed in live e2e: "apple.com" matched to "BIKE THE BIG APPLE.COM CORP." via wizard substring search. The MCP tool surfaces what the wizard returns; semantic accuracy is a backend concern. The follow-up issue addresses this too.

Test plan

  • pnpm -r typecheck (all 4 packages clean)
  • pnpm -r build (core/mcp/leadclaw + 1428KB DXT bundle)
  • pnpm -r test (145 core + 12 leadclaw + 29 mcp = 186 tests passing)
  • Live e2e against milstan@leadbay.ai US (admin gate, dry_run, empty input, full import → bulk_qualify chain)
  • Manual MCP-protocol invocation pending consumer integration

🤖 Generated with Claude Code

milstan and others added 4 commits April 28, 2026 16:43
Add LeadbayClient.requestRawBinary() for non-JSON uploads (CSV, future
binary). Mirrors request() exactly (auth, semaphore, error mapping,
_lastMeta, mock-mode) with a caller-supplied Content-Type and Buffer/string
body. Backs the upcoming leadbay_import_leads composite which posts CSV to
/imports.

Add file-import wire types matching the live API (snake_case, probed
2026-04-28 against api-us): FileImportPayloadV15, MappingsPayload,
ImportRecordPayload, PreProcessing/Processing payloads, PaginatedResponse.

Extend ToolContext with optional signal?: AbortSignal so long-running
composites can honor caller cancellation.

4 new client tests cover requestRawBinary parity (auth, 401/403/429
mapping, Content-Type, _lastMeta).

Wraps Leadbay's existing CSV-import wizard so external automations (CRM,
analytics, email correspondents) can hand a list of company domains to
Leadbay and get back stable leadIds for downstream chaining into
leadbay_bulk_qualify_leads / leadbay_research_lead.

Surface (locked):
  Input  { domains: [{domain, name?}], dry_run?, per_phase_budget_ms?, total_budget_ms? }
  Output { leads, not_imported: [{domain, reason}], importIds, region, _meta, cancelled? }

Internal flow (the wedge): preflight admin check → normalize+dedupe →
chunk at 100 → per-chunk: synthesize CSV (RFC 4180 + formula-injection
guard, MCP_ROW_ID column) → POST /imports → poll preprocess → commit
mappings (skip in dry_run) → poll process → poll records to terminal
(stabilization across 2 polls; treats match_type=NO_MATCH as terminal
since the wizard leaves those records IMPORTING forever) → reconcile by
MCP_ROW_ID with normalized-domain fallback. AbortSignal honored between
awaits; importId is captured immediately after POST so callers always
have it on cancel/timeout.
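
The reconciliation step at the end of that flow might be sketched as below. This is an illustration under assumptions, not the shipped code: `InputRow`, `WizardRecord`, their camelCase field names, and `reconcile` are hypothetical (the real wire types are snake_case), and the domain fallback here only lowercases and trims.

```typescript
interface InputRow {
  rowId: string;  // value written to the MCP_ROW_ID column
  domain: string; // already normalized
}

interface WizardRecord {
  rowId?: string;  // MCP_ROW_ID echoed back, when present
  domain?: string;
  leadId?: string; // present only for matched records
}

// Maps each input domain to a leadId, or null when nothing matched.
function reconcile(inputs: InputRow[], records: WizardRecord[]): Map<string, string | null> {
  // Matched records sort first so they win over no-match rows for the same input.
  const sorted = [...records].sort(
    (a, b) => Number(b.leadId !== undefined) - Number(a.leadId !== undefined),
  );
  const byRowId = new Map<string, WizardRecord>();
  const byDomain = new Map<string, WizardRecord>();
  for (const rec of sorted) {
    if (rec.rowId !== undefined && !byRowId.has(rec.rowId)) byRowId.set(rec.rowId, rec);
    if (rec.domain !== undefined) {
      const key = rec.domain.trim().toLowerCase();
      if (!byDomain.has(key)) byDomain.set(key, rec);
    }
  }
  const result = new Map<string, string | null>();
  for (const input of inputs) {
    // Prefer the row id; fall back to the normalized domain.
    const rec = byRowId.get(input.rowId) ?? byDomain.get(input.domain);
    result.set(input.domain, rec?.leadId ?? null);
  }
  return result;
}
```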

8 typed error codes (IMPORT_PREPROCESS_FAILED, IMPORT_PROCESSING_FAILED,
IMPORT_BUDGET_EXHAUSTED, IMPORT_NOT_TERMINAL, IMPORT_ADMIN_REQUIRED,
IMPORT_BILLING_REQUIRED, IMPORT_PAGINATION_RUNAWAY, IMPORT_EMPTY_INPUT)
plus per-domain not_imported.reason (malformed | no_match | uncrawled |
ambiguous | internal_error | dry_run).
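
A caller that wants to branch on these codes could model them as a closed union, as in this sketch. The code names come from the list above; `IMPORT_ERROR_CODES`, `ImportErrorCode`, and `isImportErrorCode` are hypothetical names, and how the codes are attached to thrown errors is not specified here.

```typescript
const IMPORT_ERROR_CODES = [
  "IMPORT_PREPROCESS_FAILED",
  "IMPORT_PROCESSING_FAILED",
  "IMPORT_BUDGET_EXHAUSTED",
  "IMPORT_NOT_TERMINAL",
  "IMPORT_ADMIN_REQUIRED",
  "IMPORT_BILLING_REQUIRED",
  "IMPORT_PAGINATION_RUNAWAY",
  "IMPORT_EMPTY_INPUT",
] as const;

type ImportErrorCode = (typeof IMPORT_ERROR_CODES)[number];

// Type guard: narrows an arbitrary string to one of the eight codes.
function isImportErrorCode(code: string): code is ImportErrorCode {
  return (IMPORT_ERROR_CODES as readonly string[]).includes(code);
}
```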

Limitation: the wizard only matches against the existing crawler-built
lead universe. Uncrawled domains land in not_imported with
reason="uncrawled"; the tool does NOT create new leads. Backend
follow-up tracked at leadbay/product#3538 (programmatic /1.5/leads/import
with crawler dispatch).

35 unit tests + 1 smoke suite cover normalize/dedupe, CSV escaping
(injection guard with whitespace-prefix trim, RFC 4180 quoting), happy
path, dry_run, preprocess errors, >100-input chunking, and admin
preflight.
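
The >100-input case relies on plain client-side chunking, which can be sketched as follows. `chunk` is a hypothetical helper name; the chunk size of 100 is the one stated in this PR.

```typescript
// Splits items into consecutive slices of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// 250 inputs would produce three wizard runs: 100, 100, and 50 rows.
const chunkSizes = chunk(Array.from({ length: 250 }, (_, i) => i), 100).map((c) => c.length);
```

Each chunk drives its own wizard run, which is why the output carries `importIds` as an array.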

Closes leadbay/product#3537.

@leadbay/mcp registers leadbay_import_leads via the existing
compositeWriteTools catalogue, gated by LEADBAY_MCP_WRITE=1. @leadbay/leadclaw
exposes it under exposeWrite=true (manifest updated). Both follow the
same pattern as bulk_qualify_leads / report_outreach.

Server test asserts the tool is hidden by default and exposed when
includeWrite=true. Contract test asserts manifest ↔ code parity and
that the tool is gated behind exposeWrite.

Ships leadbay_import_leads. CHANGELOGs explicitly call out that this is
a write tool that mutates user state (creates a row in the user's
CRM-imports list per call, touches onboarding state). Suitable for
occasional automation, not high-cadence (>5 calls/day) — backend
follow-up tracked at leadbay/product#3538 will lift this restriction.

mcp/README adds an "Importing domains from external systems → leadIds"
section under Advanced with the LEADBAY_MCP_WRITE=1 quickstart and the
side-effect / wedge-limitation disclosures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@milstan milstan merged commit 3e347a9 into main Apr 28, 2026
1 check passed