diff --git a/README.md b/README.md
index 289a9be..cf0ce66 100644
--- a/README.md
+++ b/README.md
@@ -26,10 +26,10 @@ It can also watch for trouble on its own. Point it at Slack channels or GitHub i
Four surfaces, each documented in depth on the [docs site](https://sourcehawk.github.io/triagent/):
-- **[Investigations](https://sourcehawk.github.io/triagent/docs/investigations/)**: the live triage view. Hand the assistant a symptom and some context (cluster, Slack thread, incident.io link, notes), watch it work through the diagnosis step by step, and ship the summary as markdown.
-- **[Playbooks](https://sourcehawk.github.io/triagent/docs/playbooks/)**: the step-by-step troubleshooting procedures the assistant follows, defined as YAML. Write and edit them right in the browser, with an AI assistant helping.
-- **[Wiki](https://sourcehawk.github.io/triagent/docs/wiki/)**: the team's lasting knowledge base of failure patterns and prior fixes, which the assistant can search.
-- **[Watches](https://sourcehawk.github.io/triagent/docs/watches/)**: rules that turn Slack messages, GitHub issues, or alerts into proposed investigations on their own.
+- **[Investigations](https://sourcehawk.github.io/triagent/investigations/)**: the live triage view. Hand the assistant a symptom and some context (cluster, Slack thread, incident.io link, notes), watch it work through the diagnosis step by step, and ship the summary as markdown.
+- **[Playbooks](https://sourcehawk.github.io/triagent/playbooks/)**: the step-by-step troubleshooting procedures the assistant follows, defined as YAML. Write and edit them right in the browser, with an AI assistant helping.
+- **[Wiki](https://sourcehawk.github.io/triagent/wiki/)**: the team's lasting knowledge base of failure patterns and prior fixes, which the assistant can search.
+- **[Watches](https://sourcehawk.github.io/triagent/watches/)**: rules that turn Slack messages, GitHub issues, or alerts into proposed investigations on their own.
@@ -117,9 +117,9 @@ This boots a localhost HTTP server, prints its URL with a per-launch token, and
In the browser:
-1. **Pick a cluster**: directly from kubeconfig, or via Teleport.
+1. **Pick a cluster** from the dropdown (sourced from your kubeconfig by default, or from Teleport if your profile uses it).
2. **Log in** if prompted (SSO/2FA prompts go to the launcher terminal).
-3. **Enter the namespace** and optional notes, Slack channel, or incident URL.
+3. **Add context** (all optional): a sentence on the symptom, a Slack channel, or an incident URL. The assistant narrows down the namespace itself.
4. **Investigate**: the assistant works through the playbook, uses its tools, and writes a summary you can copy or push upstream as a PR (once you've wired an upstream repo; see below).
### A few useful commands
diff --git a/docs/content/investigations.md b/docs/content/investigations.md
index 952eac3..3861fb5 100644
--- a/docs/content/investigations.md
+++ b/docs/content/investigations.md
@@ -18,31 +18,8 @@ The result of a typical session is a tidy markdown summary the operator can past
likely root cause, evidence, recommended next steps. The activity panel keeps every tool call visible, so operators can
audit the chain or interrupt with a follow-up at any point.
-## Why it exists
-
-Cluster triage isn't a `kubectl` command; it's a cross-source scramble. A typical incident looks like this:
-
-1. **Alert lands.** Pager, Slack `@`-mention, customer ticket. You were probably already on something else.
-2. **Catch up on the channel.** What has the customer / oncall / support already said? What's been ruled out? Who
- else is looking?
-3. **Read the cluster state.** Pods, events, logs, the failing pod's owner CR, the Crossplane composite, the
- backup status, the gateway service.
-4. **Check what changed.** Recent deploys, spec bumps, controller version skews, last week's incident
- write-up that mentioned the same component.
-5. **Pull metrics.** Prometheus for saturation, incident.io for the ongoing-incident timeline.
-6. **Recall prior art.** Have we seen this exact pattern before, and what fixed it?
-7. **Synthesise.** Hold the cross-references in your head, decide which thread to pull next, write up a conclusion
- someone else can act on.
-
-Each step is mechanical for an experienced operator, but the tabs multiply and the synthesis is slow. Worse, the
-patterns drift as new operators rotate in, and the artefact at the end is a Slack message that decays the moment the
-channel scrolls.
-
-This tool collapses steps 2–6 into one conversation against one audit trail. The walker knows which sources to consult
-for which failure shapes; the MCP catalog turns each query into a single typed tool call; the summary in step 7 falls
-out of the walker's terminal node. Operators stay in the loop: every tool call is visible in the activity panel, the
-conclusion is editable before sharing, and you can step in mid-session whenever the walker hits something it doesn't
-recognise.
+For the broader problem this surface addresses — the cross-source scramble a typical incident turns into — see
+[Overview → The problem it solves](/docs/overview#the-problem-it-solves).
## How it works
@@ -86,10 +63,14 @@ the token falls out of the address bar. The launcher stays alive in the terminal
### One investigation, end-to-end
-1. **Pick a cluster.** The launcher queries the configured provider (Teleport by default) for the operator's
- reachable clusters, then calls the provider's `Login` to obtain a kubeconfig context.
-2. **Preflight.** Confirms the namespace exists, RBAC permits pod listing, and writes a per-session `mcp.json`
- describing which triagent-mcp servers to spawn.
+1. **Provide a starting point.** An investigation needs at least one input: a cluster, an incident URL, a Slack thread,
+ or free-form notes. Picking a cluster is optional. When one is picked, the launcher queries the configured provider
+ (kubeconfig by default, Teleport when the profile selects it) for the operator's reachable clusters and calls the
+ provider's `Login` to obtain a kubeconfig context. With no cluster up front, the agent infers one from the remaining
+ inputs and calls `switch_context` at runtime.
+2. **Preflight.** When a cluster was picked, preflight confirms it is reachable and that RBAC permits read access.
+ Either way, it writes a per-session `mcp.json` describing which triagent-mcp servers to spawn. The agent narrows
+ down the namespace at runtime via the k8s tools; it isn't fixed at preflight.
3. **Spawn the agent.** Claude is launched with that `mcp.json` plus a system prompt that points the agent at the
`investigation` playbook. The agent is told nothing product-specific in prose; the playbooks carry the procedural
knowledge.
@@ -101,19 +82,22 @@ the token falls out of the address bar. The launcher stays alive in the terminal
6. **Follow up or close.** The operator can keep chatting (clarifying questions, deeper dives); those route through the
`followup_conversation` meta-playbook so the response shape stays coherent.
-### What lives where
+### Separation of concerns
+
+Each part of the system owns exactly one job, so any one can change without touching the others. The launcher itself
+contains no decision logic — it wires processes together and streams the result to the browser.
| Concern | Owner |
| ---------------------- | -------------------------------------------------------------- |
-| Cluster picker / login | Provider plugin (Teleport by default) |
+| Cluster picker / login | Auth provider (kubeconfig by default, Teleport optional) |
| Tool execution | triagent-mcp servers (k8s, strategies, git, wiki, ...) |
| Decision logic | YAML playbooks (the strategies MCP walks them) |
| Reasoning | Claude CLI (the agent invoking tools) |
| UI | Next.js SPA (this app), embedded in the launcher binary |
| Authentication | Per-launch random token + cookie |
-The launcher itself contains zero decision logic. Playbooks own the procedure, triagent-mcp owns tool semantics,
-claude owns judgment. Each piece is editable independently.
+Playbooks own the procedure, triagent-mcp owns tool semantics, Claude owns judgment. Each piece is editable
+independently.
## Using the tool
@@ -126,8 +110,8 @@ claude owns judgment. Each piece is editable independently.
3. Click **+ new investigation** in the sidebar (or navigate to `/investigations/new`) to start a fresh one. Pick a
cluster from the dropdown. If the provider isn't logged in, you'll be prompted to authenticate (SSO/2FA prompts
surface in the terminal where you ran `triagent start`, not the browser).
-4. Fill in the form:
- - **cluster ID** (required when using the cluster_id profile input). The data namespace is derived per your profile.
+4. Fill in the form. The fields below are individually optional, but the investigation needs at least one starting
+ point — the cluster you picked above, or one of these:
- **incident URL** (optional). Pasted verbatim into the agent's prompt as context, useful for incident.io links
so the agent can pull the corresponding incident metadata if the incident.io MCP is connected.
- **Slack channel** (optional). When Slack is connected, the field becomes a channel picker (search by name); the
@@ -232,9 +216,9 @@ have it and is trained to yield to you in those cases.
### Enabling
-- **Start screen:** tick **Run in auto mode** before submitting.
-- **Mid-session:** press **Enable auto mode** on the session header
- (coming soon; for now, restart with auto mode on).
+Tick **Run in auto mode** on the start screen before submitting. A watch can also start a session in auto mode
+directly (see [Watches](/docs/watches#two-toggles-auto-ingest-and-auto-start)). To hand an already-running manual
+session to the operator agent, restart it with auto mode on.
### Take over
diff --git a/docs/content/overview.md b/docs/content/overview.md
index 307752f..1a64589 100644
--- a/docs/content/overview.md
+++ b/docs/content/overview.md
@@ -3,9 +3,9 @@
Agentic Incident Investigation, driven from your browser.
Triagent is a localhost web app that pairs the Claude reasoning agent with read-only Kubernetes access, an extensible
-MCP catalog (Prometheus, Slack, GitHub, incident.io, your own), a guided playbook walker, and a persistent wiki, all
-bound to a single cluster's namespace per session. You run `triagent start`, it opens a browser, you hand it the
-symptom, and it drives a focused diagnosis you can paste into a ticket when it's done.
+MCP catalog (Prometheus, Slack, GitHub, incident.io, read-only GCP/AWS context, your own), a guided playbook walker,
+and a persistent wiki, all scoped to a single cluster per session. You run `triagent start`, it opens a browser, you
+hand it the symptom, and it drives a focused diagnosis you can paste into a ticket when it's done.
## The problem it solves
@@ -60,7 +60,8 @@ New failure shape on Tuesday → playbook PR on Wednesday → every operator has
## What's in the box
-Four surfaces, each with a dedicated section in these docs.
+Four operator-facing surfaces, each with a dedicated section in these docs: **Investigations**, **Watches**,
+**Playbooks**, and **Wiki**. Underneath them sits the **MCP tool catalog** every surface is built on.
### [Investigations](/docs/investigations)
@@ -88,14 +89,6 @@ full investigation, so the launcher reaches you before the pager does. Each
signal carries a back-reference to the watch and items that produced it;
manual start is a click for the ones the agent flagged as `unclear`.
-### [MCP servers](/docs/mcp)
-
-A tool catalog the agent reads like a map, and the same map an operator reads when authoring a playbook. Exposed as
-curated tools rather than a raw shell, so the agent never gets to run arbitrary commands. The catalog grows as we wire
-in new sources (Kubernetes, Prometheus, the playbook walker, linked git repos, the wiki, Slack, incident.io, …); rather
-than enumerate it here, browse the live list at [**/mcp**](/mcp). The catalog reflects exactly what the launcher
-loaded for this build.
-
### [Playbooks](/docs/playbooks)
Procedural knowledge as data. Each playbook is a YAML graph that encodes one failure shape's triage path: read step
@@ -111,7 +104,15 @@ real git repo, indexed for the agent to consult during triage. Link density comp
canonical entity names, the better the agent's "have we seen this before?" recall gets. Procedure belongs in playbooks;
facts belong in the wiki.
-## Alpha Release
+### [The MCP tool catalog](/docs/mcp)
+
+The layer beneath all four surfaces. A tool catalog the agent reads like a map, and the same map an operator reads when
+authoring a playbook. Exposed as curated tools rather than a raw shell, so the agent never gets to run arbitrary
+commands. The catalog grows as we wire in new sources (Kubernetes, Prometheus, the playbook walker, linked git repos,
+the wiki, Slack, incident.io, …); rather than enumerate it here, browse the live list at [**/mcp**](/mcp). The catalog
+reflects exactly what the launcher loaded for this build.
+
+## Alpha release
This is alpha. Expect rough edges, breaking config changes between versions, and the occasional walker dead-end. Some
things are stable enough to plan around:
@@ -135,6 +136,9 @@ on file. Each integration has its own page:
- **[Slack and incident.io](/docs/connections)** — credentials stored in `~/.config/triagent/credentials.json` (mode
0600), validated against the upstream before saving.
+- **[Cloud providers](/docs/cloud-providers)** — read-only GCP or AWS context (reachability, IAM, GKE/EKS config,
+ logs, audit) so an investigation can follow a Kubernetes thread down into the cloud layer. Access is pinned to a
+ read-only identity in the profile, never entered in the UI.
- **[GitHub repositories](/docs/repos)** — linked over SSH for clone, `gh` CLI for the *Push as PR* flows. Defaults
ship via the profile's `linked_repos`; personal repos persist per-machine.
- **[Profiles](/docs/profiles)** — the deployment-specific config bundle that wires upstream repos for playbooks /
diff --git a/docs/content/profiles.md b/docs/content/profiles.md
index df41578..1f2d161 100644
--- a/docs/content/profiles.md
+++ b/docs/content/profiles.md
@@ -304,7 +304,7 @@ investigation_inputs:
| `text` | single-line input | `{{.value}}` |
| `url` | single-line input, light URL validation | `{{.value}}` |
| `textarea` | multi-line textarea | `{{.value}}` |
-| `cluster_id` | cluster picker bound to detected kube contexts | `{{.value}}` |
+| `cluster_id` | cluster picker bound to the provider's clusters | `{{.value}}` |
| `slack_channel` | channel picker (filtered by `slack.channel_prefix`) | `{{.id}}`, `{{.name}}`, `{{.url}}` |
Required (`optional: false`) inputs must be non-empty at preflight or the investigation refuses to start. For