Phase 12: user research first-pass + intro DM enrichment + idempotency fix by mcheemaa · Pull Request #116 · ghostwright/phantom

mcheemaa · 2026-05-01T17:10:57Z

Summary

Adds a public-source research subroutine that runs at firstboot before the intro DM goes out. It pulls signals from the GitHub public REST API, the personal site (og: meta tags), and the LinkedIn public profile page when a URL is supplied. It composes up to three short bullets (<=280 chars each, with source citations) and appends them to the intro DM under a "What I learned about you so far" subhead. The same bullets are injected into the onboarding system-prompt overlay so the agent can reference what it learned in the first conversation.
Closes the LOW idempotency bug "Onboarding re-fires on restart when evolution generation is 0" via a new firstboot_state SQLite ledger. The intro DM is stamped only after a successful Slack send, so a transient send failure leaves the flag clear and the next process start retries cleanly.
Phase 12 of the Phantom Cloud master plan, feeding into the firstboot flow already covered by Phase 9 self-knowledge overlay.
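The bullet rules in the summary (at most three, at most 280 characters each, a source citation on every bullet, nothing rendered when there is nothing to say) can be sketched as below. This is a minimal illustration, not the code in this PR: the `ResearchSignal` shape, the `composeBullets` name, the citation format, and the choice to count the citation toward the 280-character cap are all assumptions.

```typescript
// Hypothetical shape for one finding from a public source.
interface ResearchSignal {
  text: string;
  source: string; // e.g. "github", "personal site", "linkedin"
}

const MAX_BULLETS = 3;
const MAX_BULLET_CHARS = 280;

// Returns up to three cited bullets, or null when no probe produced text.
// Assumption: the citation suffix counts toward the 280-character cap.
function composeBullets(signals: ResearchSignal[]): string[] | null {
  const bullets: string[] = [];
  for (const signal of signals) {
    const text = signal.text.trim();
    if (text.length === 0) continue; // never pad or invent content
    const citation = ` (source: ${signal.source})`;
    const room = MAX_BULLET_CHARS - citation.length;
    if (room < 4) continue; // no space left for meaningful text
    const body = text.length > room ? `${text.slice(0, room - 3)}...` : text;
    bullets.push(body + citation);
    if (bullets.length === MAX_BULLETS) break;
  }
  return bullets.length > 0 ? bullets : null;
}
```

A caller would render the "What I learned about you so far" section only when the result is non-null.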

Architectural invariants enforced

Public sources only. No authenticated API calls, no LinkedIn auth scraping that violates ToS. We fetch the LinkedIn public profile page anonymously and read whatever og: tags it serves; HTTP 999, 403, or non-200 means we move on without retry.
Time-bounded to ~15 seconds total via AbortSignal.timeout. A slow source cannot hold the firstboot DM hostage.
Each fetch gets its own 4-second timeout via a dedicated AbortController, so a single hang does not eat the global budget.
Don't fabricate. If every probe comes back empty, bullets is null and the intro DM renders without the section.
Plaintext discipline. The owner email never appears in a bullet, never gets logged, never gets echoed back to the user.
Public mailbox domains (gmail, outlook, icloud, etc.) are skipped for the personal-site probe; only custom-domain emails get a fetch.
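Two of the invariants above lend themselves to a short sketch: skipping public mailbox domains for the personal-site probe, and layering a per-fetch timeout under a shared global budget. Everything named here is hypothetical (`personalSiteUrlFor`, `runProbe`, `runAllProbes`, the `Probe` type, the domain list), and two points are assumptions rather than facts from the PR: that the personal-site URL is derived from the email domain, and that probes run concurrently.

```typescript
// Assumption: illustrative list only; the PR names gmail, outlook, icloud, "etc."
const PUBLIC_MAILBOX_DOMAINS = new Set(["gmail.com", "outlook.com", "icloud.com"]);

// Returns a site URL for custom-domain emails, null for public mailboxes.
// The email itself is never logged or returned.
function personalSiteUrlFor(email: string): string | null {
  if (!email.includes("@")) return null;
  const domain = email.split("@").pop()?.trim().toLowerCase();
  if (!domain || PUBLIC_MAILBOX_DOMAINS.has(domain)) return null;
  return `https://${domain}/`;
}

const GLOBAL_BUDGET_MS = 15_000;
const PER_FETCH_MS = 4_000;

// A probe must honor the signal it is given (for example by passing it to fetch).
type Probe = (signal: AbortSignal) => Promise<string | null>;

// Runs one probe under its own AbortController. The controller aborts when
// the per-fetch timer fires or when the shared budget signal aborts.
async function runProbe(probe: Probe, budget: AbortSignal, perFetchMs: number): Promise<string | null> {
  if (budget.aborted) return null;
  const controller = new AbortController();
  const onBudgetAbort = () => controller.abort();
  budget.addEventListener("abort", onBudgetAbort, { once: true });
  const timer = setTimeout(() => controller.abort(), perFetchMs);
  try {
    return await probe(controller.signal);
  } catch {
    return null; // timeout or network error: move on, no retry
  } finally {
    clearTimeout(timer);
    budget.removeEventListener("abort", onBudgetAbort);
  }
}

// Assumption: probes run concurrently under one global budget signal.
async function runAllProbes(
  probes: Probe[],
  budgetMs: number = GLOBAL_BUDGET_MS,
  perFetchMs: number = PER_FETCH_MS,
): Promise<(string | null)[]> {
  const budget = AbortSignal.timeout(budgetMs);
  return Promise.all(probes.map((probe) => runProbe(probe, budget, perFetchMs)));
}
```

Because `runProbe` converts every failure into null, a slow or failing source simply contributes no bullet instead of delaying the intro DM.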

Test plan

Operator merge gate

Cheema only. Background agents do not merge ghostwright/phantom PUBLIC PRs.
After merge, queue a Phase 1 follow-up to add the optional PHANTOM_OWNER_LINKEDIN_URL field to the wizard and have phantomd firstboot stamp it into /etc/default/phantom alongside PHANTOM_OWNER_EMAIL.
If a customer asks for research to be turned off, the operator sets PHANTOM_OWNER_RESEARCH_ENABLED=false in the per-tenant env.

…richment + idempotency fix

Adds a public-source research subroutine that runs at firstboot before the intro DM goes out. Pulls signals from the GitHub public REST API, the personal site (og: meta tags), and the LinkedIn public profile page when a URL is supplied. Composes up to three short bullets with source citations, capped at 280 chars each, and appends them to the intro DM under a "What I learned about you so far" subhead. The same bullets are injected into the onboarding system-prompt overlay so the agent can reference what it learned in the first conversation.

Architectural invariants enforced:
- Public sources only. No authenticated API calls. No LinkedIn auth scraping that violates ToS. We fetch the LinkedIn public profile anonymously and read whatever og: tags it serves; HTTP 999, 403, or any non-200 means we move on.
- Time-bounded to 15 seconds total via AbortSignal.timeout. A slow source cannot hold the firstboot DM hostage.
- Per-fetch timeout of 4 seconds so a single hang does not eat the global budget.
- Don't fabricate. If every probe is empty, returns null bullets and the intro DM renders without the "What I learned" section.
- Plaintext discipline. The owner email never appears in a bullet, never gets logged, and is not echoed back to the user.
- Public mailbox domains (gmail, outlook, etc.) are skipped for the personal-site probe; only custom-domain emails get a fetch.

Also closes the LOW idempotency bug ("Onboarding re-fires on restart when evolution generation is 0") via a new firstboot_state ledger table. The startOnboarding entrypoint short-circuits with skipped: true when intro_sent_at is set; the ledger is stamped only AFTER a successful Slack send so a transient Slack failure leaves the flag clear and the next process start retries.

Tests: 75 new tests across fetchers, enrich-owner, firstboot state, the flow integration, and the prompt builder. The full suite is 2382 tests, 2371 pass + 10 skip + 1 todo + 0 fail. bun typecheck clean, biome lint clean.

Operator TODOs:
- Phase 1 wizard adds an optional PHANTOM_OWNER_LINKEDIN_URL field; phantomd firstbootStep stamps it into /etc/default/phantom alongside PHANTOM_OWNER_EMAIL. The field is not required; the research path works on email + name alone, LinkedIn is a bonus when present.
- PHANTOM_OWNER_RESEARCH_ENABLED=false is the operator escape hatch if a customer asks the research subroutine to be off entirely.
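The stamp-after-send ordering described in the commit message can be illustrated with a small sketch. It uses an in-memory stand-in instead of the real firstboot_state SQLite table, and the `FirstbootLedger` interface, `MemoryLedger` class, `startOnboardingSketch` function, and `sendIntroDm` callback are hypothetical names; only the `skipped: true` short-circuit and the `intro_sent_at` field come from the description above.

```typescript
// Hypothetical interface over the firstboot_state ledger.
interface FirstbootLedger {
  getIntroSentAt(): string | null;
  stampIntroSent(at: string): void;
}

// In-memory stand-in so the sketch runs without SQLite.
class MemoryLedger implements FirstbootLedger {
  private introSentAt: string | null = null;
  getIntroSentAt(): string | null {
    return this.introSentAt;
  }
  stampIntroSent(at: string): void {
    this.introSentAt = at;
  }
}

interface OnboardingResult {
  skipped: boolean;
  sent: boolean;
}

// sendIntroDm is a hypothetical callback standing in for the Slack send.
async function startOnboardingSketch(
  ledger: FirstbootLedger,
  sendIntroDm: () => Promise<void>,
): Promise<OnboardingResult> {
  // Short-circuit when intro_sent_at is already set.
  if (ledger.getIntroSentAt() !== null) {
    return { skipped: true, sent: false };
  }
  try {
    await sendIntroDm();
  } catch {
    // Transient failure: leave the flag clear so the next start retries.
    return { skipped: false, sent: false };
  }
  // Stamp only after the send has succeeded.
  ledger.stampIntroSent(new Date().toISOString());
  return { skipped: false, sent: true };
}
```

Stamping before the send would make a failed send look like a completed one; stamping after means the worst case is a retry on the next process start.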

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2a8856f9e3


chatgpt-codex-connector · 2026-05-01T17:14:04Z

+		body = await res.text();
+	} catch {
+		return null;
+	}
+
+	// Cap parsing at 256KB; real og: tags live in the first 16KB. This
+	// also defeats memory-blowup pages that ship multi-MB index.html.
+	const head = body.slice(0, 256 * 1024);


Enforce HTML size cap before reading response body

The page-size guard is applied only after await res.text(), which already buffers the full response in memory. A large or malicious index.html can still consume significant memory/CPU before body.slice(0, 256 * 1024) runs, so the intended 256KB protection is ineffective under real network responses. This undermines the firstboot timeout/perf guarantees for owner research and can degrade startup reliability on oversized pages.
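One way to address this is to read the response stream incrementally and stop once the cap is reached, rather than calling `res.text()` first. The sketch below is an illustration under assumptions, not the fix that landed: `readCappedText` and `MAX_HTML_BYTES` are hypothetical names, and it relies only on the standard `Response.body` stream reader and `TextDecoder`.

```typescript
const MAX_HTML_BYTES = 256 * 1024;

// Reads at most maxBytes from the response body, then cancels the stream so
// the remainder is never buffered. Returns null when there is no body or
// the read fails.
async function readCappedText(res: Response, maxBytes: number = MAX_HTML_BYTES): Promise<string | null> {
  if (!res.body) return null;
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let received = 0;
  let text = "";
  try {
    while (received < maxBytes) {
      const result = await reader.read();
      if (result.done) break;
      // Trim the final chunk so the total never exceeds the cap.
      const chunk = result.value.subarray(0, maxBytes - received);
      received += chunk.byteLength;
      text += decoder.decode(chunk, { stream: true });
    }
    text += decoder.decode(); // flush any partially decoded bytes
    return text;
  } catch {
    return null;
  } finally {
    // Stop the download; ignore errors from an already closed stream.
    await reader.cancel().catch(() => {});
  }
}
```

With this shape, the cap bounds what is held in memory instead of trimming a string that has already been fully buffered.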


chatgpt-codex-connector · 2026-05-01T17:14:04Z

+			if (result.skipped) {
+				console.log("[onboarding] firstboot ledger says intro already sent; not re-firing");
+			} else {


Rebuild onboarding prompt when intro send is skipped

When startOnboarding returns skipped: true, this branch logs and exits without rebuilding the personalized onboarding prompt, even though needsOnboarding can still be true on restart. In that case the runtime keeps the earlier generic prompt and loses Phase 12 profile/research context for the first conversation after a restart, despite onboarding still being active. This is a regression from the new idempotency path because the skip branch drops prompt enrichment entirely.
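A possible shape for the fix is to treat "skip the DM" and "rebuild the prompt" as separate decisions. The sketch below assumes a hypothetical `rebuildOnboardingPrompt` callback and `afterStartOnboarding` helper; only the `skipped` flag, `needsOnboarding`, and the log line come from the diff and comment above. Where the research bullets come from on restart (re-running the research or reading persisted bullets) is left open here.

```typescript
interface StartOnboardingResult {
  skipped: boolean;
}

// Returns true when the prompt was rebuilt.
// rebuildOnboardingPrompt is a hypothetical callback supplied by the caller.
function afterStartOnboarding(
  result: StartOnboardingResult,
  needsOnboarding: boolean,
  rebuildOnboardingPrompt: () => void,
): boolean {
  if (result.skipped) {
    console.log("[onboarding] firstboot ledger says intro already sent; not re-firing");
  }
  // The prompt depends on whether onboarding is still active, not on
  // whether the intro DM was sent during this process start.
  if (!needsOnboarding) return false;
  rebuildOnboardingPrompt();
  return true;
}
```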


chatgpt-codex-connector Bot reviewed May 1, 2026


mcheemaa merged commit 0f50cbc into main May 1, 2026
1 check passed

mcheemaa mentioned this pull request May 1, 2026

channels: persistent intro-DM ledger for HTTP mode (H3 onboarding fix) #119

Merged

5 tasks

mcheemaa merged 1 commit into main from feat/2026-05-01-phase12-user-research-enrichment
