feat(channels): Microsoft Teams adapter via Graph polling (Phases A-D, closes #75) by initializ-mk · Pull Request #76 · initializ/forge

initializ-mk · 2026-05-20T22:04:03Z

Summary

First-party Microsoft Teams channel adapter implemented as a Microsoft Graph polling adapter (analogous to Telegram polling), not a Bot Framework webhook. Outbound-only — no inbound webhooks, no public endpoint. Preserves Forge's first design principle while finally putting Teams support on the table now that Graph delta polling and un-metered Teams APIs (since 2025-08-25) make the model viable.

Implements Phases A-D of the design recorded in #75:

Phase A — adapter skeleton + Plugin interface
Phase B — typed Graph client (/me, delta paging, POST messages, hosted-content attachments) + OAuth2 auth (delegated refresh-token + client_credentials)
Phase C — Markdown → Teams HTML converter + plain-text extractor + 24 KB chunker + mention extractor
Phase D — admission gate (self-loop guard, bot allowlist, mention/DM mode) + sliding-window dedup + atomic cursor persistence

Closes #75 for the runtime adapter; CLI wizard, user docs, and live tenant validation are deferred to a follow-up PR (see "Deferred" below).

Diff stats

 13 files changed, 2363 insertions(+), 9 deletions(-)

Path	Purpose
`forge-plugins/channels/msteams/msteams.go`	Plugin shell + poll loop + dispatch + SendResponse
`forge-plugins/channels/msteams/auth.go`	Delegated + client_credentials OAuth2; refresh-token rotation hook
`forge-plugins/channels/msteams/graph.go`	Typed Graph client; classified error sentinels (401/403/410/429)
`forge-plugins/channels/msteams/admission.go`	4-stage gate; self-loop beats allowlist
`forge-plugins/channels/msteams/cursor.go`	Atomic delta-link persistence; 0700 dir perm
`forge-plugins/channels/msteams/dedup.go`	Sliding 1000-ID ring
`forge-plugins/channels/msteams/msteams_test.go`	29 unit tests
`forge-plugins/channels/markdown/teams.go`	`MarkdownToTeamsHTML`, `TeamsHTMLToPlain`, `SplitMessageTeams`, `ExtractMention`
`forge-plugins/channels/markdown/teams_test.go`	18 unit tests
`forge-core/security/capabilities.go`	`msteams` capability bundle: `graph.microsoft.com`, `login.microsoftonline.com`
`forge-cli/cmd/channel.go`	`msteams` added to ValidArgs, createPlugin, defaultRegistry, egress wiring, setup instructions
`forge-cli/templates/init/msteams-config.yaml.tmpl`	Config template with `_env`-suffix discipline
`forge-cli/templates/init/env-msteams.tmpl`	`.env` placeholders for `MSTEAMS_*`

Zero edits required to channels_stage.go, k8s_stage.go, requirements_stage.go, template_data.go, K8s manifest templates, or cmd/package.go — the existing _env-suffix scanner from PR #54 picks up msteams-config.yaml automatically. The architecture from Phase 7 (Slack/Telegram pattern) generalised cleanly.

Architecture invariants preserved

Outbound-only. No port bound. All HTTPS to graph.microsoft.com + login.microsoftonline.com.
No Bot Framework / Azure Bot Service. Direct net/http calls only.
_env-suffix convention holds — every secret-bearing setting in msteams-config.yaml.tmpl ends in _env. K8s manifest generation needs zero changes.
No model=A/B query param. Metering era ended 2025-08-25; the param is now ignored.
Self-loop guard fires FIRST in the admission gate — even if the agent's own userID is misconfigured into allow_bot_ids, self-authored messages are dropped (matches Slack/PR feat(slack): admit bot @mentions via allow_bot_ids, drop self always (closes #55) #56).
No hardcoded tenant ID — read via tenant_id_env. Multi-tenant seam stays clean even though multi-tenant adapter isn't in scope.
Sovereign clouds via graph_base_url_env — US Gov / China / GCC operators override MSTEAMS_GRAPH_BASE_URL and add their domains to egress.allowed_domains. Default bundle stays minimal for the 99% on commercial cloud.

Error handling

Graph status	Behaviour
`401`	Force `auth.Refresh()`; if it still 401s → 60s backoff
`403`	Log with `docs/channels/msteams.md` remediation hint; 60s backoff
`429`	Honour `Retry-After` header; clamped to [10s, 300s]
`410 Gone` (delta cursor expired)	Critical — discard cursor, re-init from `$filter=lastModifiedDateTime gt <now>`. WARN log. Never crashes.
`5xx` / network	Exponential backoff 5s → 10s → 20s → 40s → 60s capped; reset on first success

Tests (47 total, no network)

channels/markdown — 18 tests: every conversion-table row from [Feature]: Microsoft Teams channel adapter via Graph polling (outbound-only) #75 §6, plus HTML-escape safety, mention preservation, entity decoding, <br> handling, under-limit and over-limit splits, mention extraction with and without userID.
channels/msteams — 29 tests:
- dedup: seen/mark, ring eviction (oldest first), duplicate-mark non-eviction
- cursor: atomic save+load round-trip, 0700 dir perm enforced, corrupt-file recovery
- admission: self-loop beats allowlist (hard rule), DM-only admit/drop, mention-only admit/drop, mention_or_dm, bot allowlist admit/drop
- stripBotMention: 6 cases including case-insensitive prefix and non-start non-strip
- parseAllowBotIDs: comma / space / newline splits
- graph client: /me, delta paging with @odata.nextLink, all four error sentinels, Retry-After parsing with floor/ceiling clamps, POST payload shape
- auth manager: delegated refresh-token flow with rotation hook + token cache (proves second Token() call doesn't re-hit endpoint), client_credentials flow, token-endpoint error surfaces correctly
- Plugin Init validation: 4 cases covering each required field + poll-interval clamping

go test ./forge-core/... ./forge-cli/... ./forge-plugins/... — all green
gofmt -w + golangci-lint — 0 issues

Deferred to a follow-up PR

These are independent from the runtime adapter and won't block this PR from being reviewed and merged:

forge channel add msteams interactive bubbletea wizard with the device-code flow capture and live /me validation. Today, forge channel add msteams still works non-interactively via the existing add/serve plumbing — it writes the template and the .env placeholders; the operator runs the device-code curl themselves and fills in the refresh token. The wizard is purely UX sugar.
docs/channels/msteams.md user-facing setup guide (Entra app registration walkthrough, permission list, troubleshooting matrix, sovereign-cloud override notes).
Live end-to-end test against a real tenant (Phase E of [Feature]: Microsoft Teams channel adapter via Graph polling (outbound-only) #75). All other Phase E test cases are covered by the unit tests above; this one needs a tenant + manual interaction.
FORGE_PROJECT_DESIGN.md Channel Connectors table row — separate doc-sync follow-up after merge, same pattern as PR docs: add linear and code-plan to embedded-skills index #72 for linear + code-plan.

Open questions deferred (from #75 §14)

The four open questions from the issue are best resolved in the wizard PR, since they all affect UX:

Delegated vs client_credentials default in the wizard — currently delegated per the template default.
Mention-strip behaviour — implemented as strip (the recommended option in [Feature]: Microsoft Teams channel adapter via Graph polling (outbound-only) #75).
Schedule delivery target format — ev.WorkspaceID carries the Graph chat.id as-is; scheduler prompt already handles arbitrary strings.
msteams-channels (in-team channel messages) — explicitly out of scope per [Feature]: Microsoft Teams channel adapter via Graph polling (outbound-only) #75 §15 (blocked on the Microsoft @odata.nextLink bug).

Reference patterns followed

forge-plugins/channels/telegram/telegram.go — polling-loop shape, env-var resolution via channels.ResolveEnvVars
forge-plugins/channels/slack/slack.go — admission flow with self-loop guard ahead of allowlist (PR feat(slack): admit bot @mentions via allow_bot_ids, drop self always (closes #55) #56)
forge-plugins/channels/slack/slack.go SendResponse — large-response summary + attachment + chunked fallback; same use of a2a.Message.Summary from PR feat(channels): LLM-generated summary for large responses (closes #64) #65
forge-cli/build/channels_stage.go — _env-suffix scanner from PR fix: inject channel env vars into K8s manifests (closes #50) #54; no changes needed

@DisplayName

Implements Phases A-D of the FORGE_MSTEAMS_CHANNEL_GRAPH_POLLING design: core adapter + Graph client + auth + markdown converter + unit tests. Outbound-only — no inbound webhooks, no public endpoint, no Bot Framework. CLI wizard (Phase E), user-facing setup docs, and live end-to-end testing against a real tenant are deferred to a follow-up PR — see #75 OPEN QUESTIONS for the wizard UX decisions still to settle (delegated vs client_credentials default, mention-strip behavior). The plumbing already supports `forge channel add msteams` non-interactively via the existing add/serve flow. Files forge-plugins/channels/msteams/ - msteams.go Plugin shell (Init/Start/Stop/SendResponse/ NormalizeEvent), poll loop with 3/5/60s clamp, backoff state machine, dispatch + admission gate + dedup, large-response handling that mirrors slack/telegram (summary + hosted-content attachment, fallback to chunked). - auth.go OAuth2 token manager: delegated refresh-token flow + client_credentials. Caches tokens with a 60s expiry window. Refresh-token rotation hook for the runner to persist back to .env / K8s Secret. - graph.go Typed Graph client: /me, /users/{id}, delta paging via getAllMessages/delta, POST chats/{id}/messages, hosted-content attachments (4 MiB cap enforced). Classifies responses into errUnauthorized / errForbidden / errCursorExpired (410) / *rateLimitedError (Retry-After clamped 10-300s) / generic 5xx. No `model=A/B` query param (metering era ended 2025-08-25). - admission.go 4-stage gate: self-loop FIRST (beats allowlist even on misconfiguration), then bot-allowlist check, then mode filter (mention / dm / mention_or_dm). stripBotMention drops the leading "@DisplayName" so the LLM doesn't see the bot's own name as the first word. - cursor.go .forge/channels/msteams-cursor.json with atomic write (tmp + rename), 0700 dir perm. Corrupt file loads as empty. - dedup.go Sliding 1000-ID ring; oldest evicted first; concurrent-safe. forge-plugins/channels/markdown/ - teams.go MarkdownToTeamsHTML (headings, bold, italic, inline code, fenced code, links, ul, ol, blockquote, strikethrough; HTML-safe escaping). TeamsHTMLToPlain (strips Teams chatMessage HTML to text, preserves @mention display names, decodes entities). SplitMessageTeams (24KB threshold — leaves headroom for HTML-tag overhead under the ~28KB Teams body limit). ExtractMention (mentions[] + <at id="N"> belt-and-braces). - teams_test.go 18 tests across all converter surfaces. forge-core/security/capabilities.go - msteams bundle: graph.microsoft.com + login.microsoftonline.com. Sovereign clouds (graph.microsoft.us, microsoftgraph.chinacloudapi.cn) stay out of the default bundle — operators add via egress.allowed_domains. forge-cli/cmd/channel.go - ValidArgs include msteams. createPlugin + defaultRegistry register msteams.New(). addChannelEgressToForgeYAML adds msteams to egress.capabilities (same pattern as slack). printSetupInstructions prints the Entra app registration steps. forge-cli/templates/init/ - msteams-config.yaml.tmpl Config emitted by `forge channel add msteams`. All settings ending in _env are auto-discovered by the existing ChannelsStage from PR #54 for K8s manifest emission — zero edits to k8s_stage.go or channels_stage.go required. - env-msteams.tmpl .env placeholders for the five MSTEAMS_* variables. Tests (all under unit-test budget — no network, no real Graph calls) 47 tests added across two packages: - channels/markdown: 18 (MarkdownToTeamsHTML / TeamsHTMLToPlain / SplitMessageTeams / ExtractMention) - channels/msteams: 29 covering: * dedup: seen-and-mark, ring eviction (oldest first), duplicate-mark non-eviction * cursor: atomic save + load round-trip, 0700 dir perm, corrupt-file recovery * admission: self-loop beats allowlist (HARD rule), DM-only, mention-only, mention_or_dm, bot allowlist admit/drop * stripBotMention: 6 cases including case- insensitive prefix and non-start non-strip * parseAllowBotIDs: comma/space/newline splits * graph: /me happy path, delta paging (@odata.nextLink shape), all four error branches (401/403/410/429), Retry-After parsing including floor/ceiling clamps, POST chats/{id}/messages payload shape * auth: delegated refresh-token rotation hook fires + cache prevents second HTTP call, client_credentials happy path, refresh-token-expired error surfacing * Plugin Init validation: tenant required, delegated needs refresh_token, client_creds needs user_id, poll_interval_seconds clamped to 3/5/60 Zero edits to channels_stage.go, k8s_stage.go, requirements_stage.go, template_data.go, K8s manifest templates, or cmd/package.go — the existing _env-suffix scanner from PR #54 picks up msteams-config.yaml automatically. go test ./... forge-core ✅ forge-cli ✅ forge-plugins ✅ golangci-lint 0 issues across all three modules gofmt clean Deferred to a follow-up PR (Phase E + F from #75): - forge-cli/cmd/channel_msteams.go interactive bubbletea wizard with device-code flow capture and live /me validation - docs/channels/msteams.md user-facing setup guide (Entra app registration, permissions, troubleshooting, sovereign-cloud override) - End-to-end manual test against a real tenant - FORGE_PROJECT_DESIGN.md Channel Connectors table row

The init wizard's channel-selector step (forge-cli/internal/tui/steps/ channel_step.go) was hard-coded to None/Telegram/Slack. Adding the msteams adapter to the runtime + CLI without surfacing it in the interactive picker meant TUI users couldn't actually pick it. Changes - Add the "MS Teams" item to the SingleSelect list with the 👥 icon and the "Graph polling, no public URL needed" subtitle (same outbound-only framing as the existing options). - Add 3 new phases for the 4-step token capture: channelMsteamsClientIDPhase channelMsteamsClientSecretPhase channelMsteamsRefreshTokenPhase (The 1st step — tenant_id — reuses the existing channelTokenPhase via the per-channel switch already used for Slack's two-step flow.) - New `updateMsteamsPhase` helper drives the chain; each phase stores the just-collected value under the MSTEAMS_* env name and advances to the next prompt, or marks the step complete on the last step. - New `newMsteamsInput` helper centralises the theme-binding boilerplate so the 4 phases stay terse. - View() adds instructions for each of the 4 steps. The refresh-token step embeds the device-code curl invocation so the operator has the command at hand without leaving the TUI. - Summary() returns "MS Teams" for the post-wizard recap row. No new TUI components needed — reuses the existing SingleSelect + SecretInput primitives. No changes to wizard plumbing (WizardContext.ChannelTokens, init.go env-var copy, review_step, egress_step) — the existing per-token-name map already supports arbitrary channels. The full device-code polling automation (where the wizard would call /devicecode + /token itself instead of having the operator paste the refresh token) was explicitly deferred in PR #76's body and remains deferred; this commit just removes the "msteams is invisible in the TUI" papercut. go test ./forge-cli/... — all green gofmt -w + golangci-lint — 0 issues forge --version against /tmp/forge-issue-75-build/forge — boots forge init shows MS Teams 👥 in the channel selector

The OAuth 2.0 device-code flow has two halves: 1. user-consent: visit verification_uri, enter user_code, sign in 2. client-polling: POST /token with grant_type=device_code repeatedly until the user completes consent The previous TUI Step 4/4 instructions documented only half (1), so an operator who finished the sign-in page never saw a refresh token — nothing on their side was polling /token. The "successfully given permission" Microsoft page is the consent-side result; the refresh token only materialises when the client side polls and exchanges the device_code. New subcommand: forge channel msteams-login - Reads tenant_id and client_id from --flags, $MSTEAMS_TENANT_ID / $MSTEAMS_CLIENT_ID in the shell env, or .env in the current dir (first non-empty wins). - Calls POST {login_base}/{tenant}/oauth2/v2.0/devicecode with scope=https://graph.microsoft.com/.default offline_access — same scope set the runtime authManager uses, so the refresh token captured here is interchangeable with one captured externally. - Prints verification_uri + user_code to stderr (so stdout stays clean for piping). Polls /token at the server-advertised interval, honours slow_down (interval += 5s), handles the four terminal error_codes (authorization_pending continue / expired_token / access_denied / generic error). - On success, prints refresh_token to stdout (newline only — no labels, pipe-friendly). With --write-env, appends or replaces the MSTEAMS_REFRESH_TOKEN=<token> line in .env directly. - Sovereign clouds: --login-base override (default https://login.microsoftonline.com; pass login.microsoftonline.us for US Gov, login.chinacloudapi.cn for China). - Timeout: --timeout-seconds (default 900 = 15 minutes, matching Microsoft's device-code lifetime). TUI Step 4/4 (channel_step.go) updated: - Replaces the cryptic curl snippets with a single instruction to run `forge channel msteams-login` in a second terminal. - Mentions --write-env so the operator can skip the paste-back entirely and just leave the TUI field blank. Tested: forge channel msteams-login --help prints the expected usage. Real Entra calls happen at runtime; the polling state machine mirrors the runtime auth.go's refresh-token rotation handler. go test ./forge-cli/... — all green golangci-lint — 0 issues

…-open browser) User feedback: running `forge channel msteams-login` from a second terminal during the TUI wizard didn't work because the agent project directory (and .env) don't exist until the wizard completes. Followed by: "can tui trigger a browser open once I have provided the tenantid, clientid and secret via tui?" — yes, and that's the right UX. This change makes the MS Teams branch of `forge init` self-contained: After collecting tenant_id (step 1), client_id (step 2), client_secret (step 3), the wizard automatically: 4a. POSTs /devicecode using the tenant + client just provided 4b. Opens the verification_uri in the operator's default browser (cross-platform: open / xdg-open / start; failures are silently degraded — the URL is also shown on screen) 4c. Renders the one-time user_code prominently in the TUI 4d. Polls /token in the background until the operator approves consent in the browser 4e. Captures the refresh_token, stores it under MSTEAMS_REFRESH_TOKEN in the wizard's token map, and advances No second terminal. No manual paste. No curl wrangling. The wizard finishes with a complete .env. Architecture - New shared package `forge-cli/internal/devicecode/` containing the RFC 8628 device-authorization-grant primitives: DeviceCodeResponse, TokenResponse, RequestDeviceCode, PollDeviceToken, OpenURL (cross-platform browser launcher mirroring the private openBrowser in forge-core/llm/oauth/flow.go) Used by BOTH the `forge channel msteams-login` CLI subcommand AND the TUI. Same scope (graph.microsoft.com/.default offline_access), same poll cadence, same error class handling — refresh tokens produced by either path are interchangeable. - `cmd/channel_msteams_login.go` refactored to import from there; the duplicate types and helpers in this file are removed (~145 lines down to ~210, mostly env-var helpers). Behaviour unchanged. As a small UX upgrade, this subcommand now also opens the verification_uri in the browser automatically. TUI implementation - New state `channelMsteamsDeviceLoginPhase` replaces the old `channelMsteamsRefreshTokenPhase` (which had been a paste-back SecretInput). - New sub-state machine inside that phase: msteamsLoginRequesting → msteamsLoginWaiting → success | msteamsLoginErr. - Two tea.Cmd dispatchers: requestDeviceCodeCmd (30s timeout) and pollTokenCmd (15-minute timeout, matches Microsoft's device-code expiry). - View() renders: * "⣾ Requesting a one-time code from Microsoft..." during step 4a * The verification URL + user_code prominently, plus a "⣾ Waiting for you to complete sign-in..." spinner during 4d * On failure: error message + "Press R to retry, or S to skip" (skip allows the operator to finish the wizard and capture the refresh token later via `forge channel msteams-login --write-env` from the new project root) Build / lint / test all green. No changes to runtime adapter code (forge-plugins/channels/msteams/) — this is purely an installer upgrade. Existing manual flow (`forge channel msteams-login` from a project root with .env present) still works for refresh-token rotation and non-TUI deployments.

Error reproduced: AADSTS7000218 — "The request body must contain the following parameter: 'client_assertion' or 'client_secret'." The OAuth 2.0 device-authorization grant (RFC 8628) is designed for public clients (no secret), but Microsoft Entra enforces stricter credential checks when the registered app is a CONFIDENTIAL client (a "web" application registration — anything with a client_secret). For those apps, /token requires the client_secret to be present even in the device_code grant. The runtime adapter (forge-plugins/channels/msteams/auth.go) already conditionally included client_secret in its refresh_token grant, so this only manifested in the new device-code path. Changes - internal/devicecode/devicecode.go: PollDeviceToken signature gains a clientSecret string parameter. When non-empty, it's included in the form body. Public-client apps tolerate the extra field, so always passing it (when available) is safe — there's no need to ask the operator which kind their app is. - cmd/channel_msteams_login.go: - New --client-secret flag. - Reads MSTEAMS_CLIENT_SECRET from the shell env or .env (same precedence pattern as tenant/client) when the flag is unset. - Threads the value through to PollDeviceToken. - internal/tui/steps/channel_step.go: pollTokenCmd now passes s.tokens["MSTEAMS_CLIENT_SECRET"] which was captured in TUI step 3. The operator doesn't have to do anything different — the wizard had the secret in hand all along, we just weren't using it. Side note: /devicecode itself does NOT require client_secret (it's public for both client types), so RequestDeviceCode is unchanged. Only the /token POST is affected. Tested - gofmt -w + golangci-lint — 0 issues - go test ./forge-cli/... — all green - Binary rebuilt at /tmp/forge-issue-75-build/forge

User saw the new ">> client_secret WAS sent (len=40)" diagnostic, which confirmed the runtime IS sending client_secret correctly — Entra is rejecting it for a different reason. The actual most-common cause for AADSTS7000218 when client_secret IS in the request body: the Entra app has 'Allow public client flows' OFF under Authentication → Advanced settings. The device-code grant is classified as a "public client flow" regardless of how the client authenticates, and Entra rejects it outright on confidential-client apps until that toggle is flipped on. Update the diagnostic to point users at the toggle first, with the expired-secret check as the less-common fallback. Format the diag as multiple ">>"-prefixed lines so terminal wrapping can't hide it. No code-path changes — only the error-message text differs. The actual form-body handling (which IS correct) is unchanged.

User report: even after providing tenant_id / client_id / client_secret in the TUI wizard (steps 1-3 of the msteams channel) and skipping step 4 (device-code login), the generated agent project had no MSTEAMS_* entries in .env or .forge/secrets.enc. First `forge run` attempt fails with: Error: initialising msteams: msteams: tenant_id is required (set MSTEAMS_TENANT_ID) Root cause: cmd/init.go:buildEnvVars has a per-channel switch that converts opts.EnvVars (map) into the []envVarEntry written to disk. That switch only knew about telegram and slack. msteams matched nothing, so the four MSTEAMS_* keys captured by the TUI's ChannelStep.Apply → ctx.ChannelTokens → opts.EnvVars chain were silently dropped on the floor at the persistence step. Fix: add a msteams case to the switch. Emits: - MSTEAMS_TENANT_ID → .env (GUID, not a secret per isSecretKey) - MSTEAMS_CLIENT_ID → .env (GUID) - MSTEAMS_CLIENT_SECRET → .forge/secrets.enc (suffix _SECRET → secret) - MSTEAMS_REFRESH_TOKEN → .forge/secrets.enc (suffix _TOKEN → secret) or placeholder in .env when empty (e.g., the user skipped step 4 due to the "Allow public client flows" Entra gotcha and intends to capture the token later via `forge channel msteams-login --write-env`) - MSTEAMS_USER_ID → .env if set (only used for client_credentials flow; delegated flow auto-resolves via /me at runtime, so we omit when empty) The existing splitEnvVars + isSecretKey machinery already routes each key correctly based on its suffix — this commit just makes sure the keys actually reach that machinery. go test ./forge-cli/... — all green golangci-lint — 0 issues

User report: after toggling "Allow public client flows" = Yes in Entra to fix AADSTS7000218 ("secret required") on the device-code call, the RUNTIME refresh-token call started failing with the INVERSE error: AADSTS700025: Client is public so neither 'client_assertion' nor 'client_secret' should be presented. Microsoft enforces strict mutual exclusivity: - app confidential (toggle OFF) → client_secret REQUIRED - app public (toggle ON) → client_secret FORBIDDEN The agent can't tell which type the app is from inside the runtime, and forcing the operator to declare it in msteams-config.yaml just shifts the confusion. Instead, both code paths now auto-detect. forge-plugins/channels/msteams/auth.go - Split refreshLocked into refreshLocked + requestToken + postTokenForm so the retry logic is testable without duplicating form construction. - First attempt sends client_secret when configured. On AADSTS700025 (matched by string contains, no JSON re-parse), drop the secret and retry once on the same call. - Cache the inference in authManager.publicClientInferred — every refresh after the first detection skips the secret directly, avoiding a round-trip per hourly refresh. - Inference is per-process: rebooting the agent re-detects on first refresh. That's intentional; operators occasionally flip the toggle and we want the next start to discover the new truth. - client_credentials flow is unchanged — it's inherently confidential so the inference doesn't apply. forge-cli/internal/devicecode/devicecode.go - Same pattern inside PollDeviceToken's polling loop: sendSecret flag starts true, flips false on AADSTS700025, the same device_code (which is valid for ~15 minutes) is re-used on the immediate retry. - This makes the TUI step 4 (inline OAuth) work for both app types out of the box. Test surface - go test ./forge-plugins/channels/msteams/ — all green (existing TestAuthManager_* and TestGraphClient_* still pass against the refactored requestToken/postTokenForm split) - go test ./forge-cli/... — all green - gofmt + golangci-lint — 0 issues Together with the existing AADSTS7000218 diagnostic message ("Allow public client flows is OFF → toggle it on"), both Entra client-type-mismatch errors now resolve themselves automatically without operator intervention beyond the initial registration choice.

User report: agent boots and authenticates as MK successfully, then the poll loop fires repeatedly with HTTP 412: PreconditionFailed: Requested API is not supported in delegated context Root cause is a Graph API permission mismatch baked into PR #76's design. /users/{id}/chats/getAllMessages/delta is APP-ONLY: Permission type: Delegated — NOT supported Permission type: Application — ChatMessage.Read.All, Chat.Read.All The TUI wizard walks operators through the *delegated* OAuth flow (device-code, personal API permissions, no admin consent), so the token we present is delegated and Entra rejects with 412. For delegated, Graph's per-chat /chats/{id}/messages/delta endpoint *does* support delegated context (Chat.Read / ChatMessage.Read), but there's no single getAll-style endpoint that does. The adapter must enumerate /me/chats and maintain one delta cursor per chat. graph.go: + ListChats(ctx, limit) → GET /me/chats?$select=id,topic,chatType&$top=N + InitialChatDeltaURL(chatID, since) → /chats/{id}/messages/delta?$filter=... Both new endpoints work with the delegated Chat.Read scope already requested by the wizard. Existing InitialDeltaURL kept for the client_credentials path. cursor.go: + Per-chat cursor map persisted alongside the existing single global deltaLink (both fields coexist in cursorFile JSON). + load/save keep the app-only behaviour for backwards compat. + loadChats/saveChats added for the delegated path. + writeLocked centralises the atomic tmp+rename so both APIs serialise to the same JSON shape. msteams.go: + pollLoop dispatches by auth flow: FlowDelegated → pollLoopDelegated (per-chat) FlowClientCredentials → pollLoopAppOnly (existing single cursor) + pollLoopDelegated: - Lists chats once at startup, then every 60s in a separate refresh timer so new chats appear without restart. - Walks every known chat per poll tick (default 5s). - On per-chat 410, reinits just that chat's cursor (the others keep their position). - On 401, force-refreshes the token (same recovery as app-only). - On 429, honours Retry-After. - Exponential backoff on transient errors, reset on success. - Tracks chatType from /me/chats so the admission gate always sees the correct oneOnOne/group/meeting classification even when the chatMessage payload omits it. + dispatch gains a chatTypeHint parameter (passed by the delegated loop, empty by the app-only loop) so admission can fall back to the per-chat map when msg.ChatType is absent. go test ./forge-plugins/... — all green golangci-lint — 0 issues This also closes the design gap from #75 §1.3: "An Entra ID app authenticates as a single Teams user … polls /users/{id}/chats/getAllMessages/delta on a 5-second cadence." The operational story stays the same — outbound-only, no public endpoint, ~5s latency floor — but the Graph endpoint the adapter calls now matches the permission model the wizard sets up.

User report: per-chat polling started but every chat got HTTP 400: BadRequest: Unsupported request: Change tracking is not supported against 'microsoft.graph.chatMessage'. Root cause: /chats/{id}/messages/delta does NOT accept $filter. The $filter=lastModifiedDateTime gt <ts> clause I was using to skip historical backlog gets rejected outright by Microsoft Graph. The Graph delta pattern for chatMessage: - First call (no $filter) → returns ALL current messages in the chat, paginated via @odata.nextLink, ending in @odata.deltaLink - Subsequent calls (with deltaLink) → returns only changes since deltaLink was issued Dispatching the initial-sync messages would flood the channel handler with arbitrary chat history. The adapter must drain that initial sync silently, persist the deltaLink, and only treat messages from polls made AFTER the first deltaLink as new chat activity. Changes graph.go - InitialChatDeltaURL no longer takes `since time.Time` and no longer sets $filter. Just hits /chats/{id}/messages/delta with no query params. Comment explicitly calls out the change-tracking restriction. cursor.go - Per-chat entry promoted from `map[string]string` (bare URL) to `map[string]chatCursorState{URL, DeltaSeen}`. DeltaSeen flips true the first time Graph returns @odata.deltaLink for the chat. - loadLocked now falls back to parse the legacy string-valued schema so cursor files produced by previous commits (b657cc8) are still accepted — they're treated as initial-sync state (DeltaSeen=false) so the next poll redoes initial sync rather than dispatching the next page as new traffic. msteams.go (pollLoopDelegated) - Walks cursors as chatCursorState rather than bare strings. - When DeltaSeen is false: drain the response without dispatching, but DO mark each message in the dedup ring so a later cursor reset can't accidentally re-emit them. - When DeltaSeen is true: dispatch normally. - On deltaLink arrival, flip DeltaSeen=true and log "chat X initial sync complete (dropped N historical messages); now tracking changes" so operators can see the transition. - On 410 (cursor expired), reset to InitialChatDeltaURL AND clear DeltaSeen — the next poll re-does initial sync. Tests + lint clean.

User report: cursor reset, fresh build, still hit: BadRequest: Unsupported request: Change tracking is not supported against 'microsoft.graph.chatMessage'. Root cause is structural: Microsoft Graph v1.0 exposes NO delta primitive for chatMessage in delegated context. Both endpoints I tried fail by design: /chats/{id}/messages/delta → HTTP 400 "Change tracking is not supported" /users/{id}/chats/getAllMessages/delta → HTTP 412 "API not supported in delegated context" The only chatMessage delta that works in delegated v1.0 is the BETA endpoint — but we're (correctly) targeting v1.0. Solution: drop the delta primitive entirely from the delegated path and use plain GET /chats/{id}/messages?$top=50&$orderby=lastModifiedDateTime%20desc with client-side watermark tracking via lastModifiedDateTime. graph.go - Replace InitialChatDeltaURL with ListChatMessages, a non-delta list call that returns newest-first up to 50 entries. Comment documents both failed delta attempts so future readers don't re-litigate this. - InitialDeltaURL (the app-only getAllMessages/delta call) kept for the client_credentials path where it does work. cursor.go - chatCursorState shrinks from {URL, DeltaSeen} to {LastSeenTime}. Per-chat state is now just a single RFC3339 timestamp string. - Removed legacy schema fallback in loadLocked: when the on-disk file doesn't match the current schema, treat as empty so the next poll re-anchors cleanly. State files are internal scratch, not public API. msteams.go (pollLoopDelegated) - First contact with a chat: anchor LastSeenTime at "now" and don't poll yet (no need to GET when we'd throw the results away anyway). The next tick starts dispatching real new messages. - Steady state: ListChatMessages(chat, 50), keep only entries with lastModifiedDateTime > LastSeenTime, dispatch in chronological order (Graph returns desc; we reverse so handler sees oldest first for session continuity), then advance LastSeenTime to the newest message's timestamp. - compareTimestamps helper: lexicographic comparison works for all ISO-8601 strings Graph returns because they're fixed-width. - All existing error handling (401 refresh, 429 backoff, generic backoff) carries over unchanged. The visible operational change for operators: - Old log: "chat X initial sync complete (dropped N historical messages); now tracking changes" - New behaviour: no historical-message log line because the first contact with a chat now skips the GET entirely. The cursor file just records a timestamp on the first tick. Tests + lint clean.

User report: agent is operational (self-loop guard fired correctly on the user's own message) but one meeting chat (19:meeting_...@thread.v2) keeps returning 403 forbidden. With the previous code that error path broke out of the per-chat iteration AND tripped global exponential backoff, so a single chat we lack permission for was effectively slowing down polling on the 49 healthy chats. Microsoft applies stricter permission gates on meeting chats: the delegated Chat.Read scope is sometimes insufficient and per-meeting consent (granted by the meeting organiser) is required. That's a permanent state for any given (token, chat) pair — retrying every 5 seconds is pure log spam. Changes (pollLoopDelegated only) - New forbiddenChats set tracks chats we've already learned we cannot read. Skipped at the top of the per-chat iteration so they cost zero HTTP calls. - On 403: * log once at NORMAL level (no "WARN", no backoff) with a hint about meeting-chat consent * add the chat to forbiddenChats * `continue` instead of `break` — the rest of the chats poll normally on this tick - Generic transient errors (5xx, network): also `continue` now instead of `break`. We still set anyErr=true so a global backoff kicks in if EVERY chat errored, but a single chat hiccup no longer blocks the rest. - On chat-list refresh (every 60s), clear forbiddenChats so a newly-granted permission gets picked up. - 401 and 429 still break out of the chat loop — those are global by nature (token expired = no chat works; rate limit applies across all calls in the tenant). No tests added — the polling loop is hard to unit-test without extensive HTTP mocking. Behavioural change covered by manual run against the user's tenant. Other modules unaffected. Lint + go test clean.

User report: "@forge Agent how can you help?" tagged in a group chat was dropped by the admission gate as "msteams: dropping message authored by self". The user expected the agent to reply. Root cause is a fundamental property of delegated-mode Teams bots that PR #76's admission logic got wrong: In delegated mode, the agent acts AS the user (the delegated OAuth2 token grants the user's identity). When the agent posts a reply via PostChatMessage, Graph stamps that reply with from.user.id = ownUserID. When THE USER types a message from any Teams client, Graph stamps it with the same from.user.id = ownUserID. From inside admission, looking at from.user.id alone, the two are indistinguishable. The previous self-loop guard ("drop if msg.From.User.ID == ownUserID") was correct in spirit but caught both cases — including the case the user actually cares about (typing a question to the agent from their own Teams client). The result: any group chat or DM where the user participates is silently dropped, agent never responds. Correct loop prevention for delegated mode is per-message-ID: track the IDs of messages we sent ourselves, and let the dedup ring catch them before they reach admission. Changes graph.go - PostChatMessage now returns (string, error) — the created message's Graph ID. - PostChatMessageWithAttachment similarly. Comment explains why the return value matters in delegated mode. msteams.go - SendResponse and sendChunked capture the ID from each successful POST and call markSent(id), which is a thin wrapper around dedup.mark for outbound messages. - markSent handles the empty-id corner gracefully so call sites don't need to check. admission.go - Removed the from.user.id == ownUserID gate. ownUserID stays on the function signature (diagnostic value, future use) but is explicitly _ = ownUserID with a long comment explaining the delegated-identity trap and why dedup is the canonical loop breaker. msteams_test.go - TestAdmit_SelfLoopBeatsAllowlist replaced with TestAdmit_SelfAuthoredUserMessageAdmitted, which encodes the new behaviour: a DM authored by ownUserID is admitted (loops are caught upstream by dedup, not here). - TestGraphClient_PostChatMessage updated to assert the returned message ID flows through correctly and matches what the stub server returned in its response body. Behaviourally this is also right for the app-only (client_credentials) path: agent posts as the application, so from.application is set on its own messages instead of from.user, and the user.id check never fired for app-posts anyway. Removing it is strictly more permissive in delegated mode, no regression in app-only mode. Lint + go test clean.

@text

…ntion User report: DM to agent works (admission passes via chat_type=oneOnOne) but "@forge Agent how can you help?" in a group chat is dropped with "non-mention non-dm message" — no clue WHY admission didn't see it as a mention. Add diagnostic logging on the non-mention drop path so the next round of testing shows exactly what's in the mentions[] array. Two outcomes: Empty mentions[]: "no formal mentions in payload — in delegated mode you must @-mention your own Teams display name (autocomplete required); the agent shares your identity (<ownUserID>)" → user typed "@text" as plain text instead of using Teams' autocomplete to create a formal mention; OR they tried to mention a display name that doesn't resolve to a real Teams identity. In delegated mode the agent has no separate Teams display name — it shares the operator's user identity, so the operator must @-mention their OWN display name to invoke the agent. Non-empty mentions[] but no match: Logs each mention's text/user.id/displayName so the operator can see whether the mention is targeting a different user, an app id, or the right user under a different identifier. Diagnostic only — no behavioural change to admission or dedup. The delegated-identity quirk is structural and worth documenting in a follow-up commit on the user guide, but the log now points operators at the cause without needing source-level debugging.

User report: setup was complete, DM worked, but @-mentions in group chats didn't trigger the agent. Diagnosis: during the device-code flow the user signed in as themselves (MK) — so the agent's ownUserID is MK's UUID, not the "Forge Agent" Microsoft 365 account they created for the bot. @-mentioning Forge Agent in a chat doesn't match MK's UUID and admission drops with non-mention. The architectural rule for delegated-mode Teams agents: sign in as the account you want the agent to ACT AS, which is also the account other users will MENTION to invoke it. For multi-user deployments this should be a dedicated service account (e.g. forge-agent@yourtenant.onmicrosoft.com), not the operator's personal account. Add a prominent ImportantTxt-styled banner to the TUI step 4 (the "waiting for sign-in" view) so operators see this before completing device-code consent. Three lines, error-styled, embedded right above the spinner: IMPORTANT: sign in as the dedicated Microsoft 365 account you want the agent to ACT AS (e.g. forge-agent@yourtenant). Other Teams users will @-mention that account to invoke the agent. No code-path change; pure documentation surface. The existing failure modes (signing in with the wrong account) are still recoverable via `forge channel msteams-login --write-env` from the project root.

User report: after invoking the agent with "@forge Agent can you summarize this week's updates?" in a group chat, the LLM correctly admitted the mention but had no idea what "this week's updates" referred to. The agent responded with a template, asking the user to paste sources. Architectural mismatch: the polling design anchors LastSeenTime to "now" on first contact and only dispatches truly new messages. That's correct for the "respond when invoked" loop semantics — but it means the LLM sees only the literal mention text and zero conversational context. Teams chats are thread-shaped; the chat history IS what the agent should be reasoning about. Fix: when a message passes admission, fetch the most recent N messages in the same chat via ListChatMessages and prepend them as a context block before the user's actual prompt. The historical messages are NOT dispatched as separate events (that would flood the handler) — they live only inside the current event's Message text. Format: [Recent chat history for context — most recent message at the bottom:] 05-20 14:32 Alice: working on the billing migration 05-21 09:15 Bob: tests are green ... [End of history. The user's current message follows:] @forge Agent can you summarize this week's updates? The current message ID is filtered out of the history (it's already the prompt). HTML bodies are converted to plain text via the existing TeamsHTMLToPlain helper. Bot-authored messages are tagged with "(bot)" in the author field so the LLM doesn't conflate them. Configuration (msteams-config.yaml): include_recent_history: true # default; turn off for chats where # surrounding history would be noise # (e.g., alerts-only channels) recent_history_count: 20 # 1-50, hard-capped at 50 (Graph's # per-request limit on /chats/{id}/messages) The formatted block is additionally soft-capped at ~5000 chars in prependChatHistory to protect the LLM context budget — chatty threads won't blow the context window even if recent_history_count is at the max. Cost: one extra HTTP call per admitted message. Mentions/DMs are infrequent enough that this is a non-issue; large chats with high-frequency admission would benefit from a future cache.

initializ-mk added 16 commits May 20, 2026 18:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(channels): Microsoft Teams adapter via Graph polling (Phases A-D, closes #75)#76

feat(channels): Microsoft Teams adapter via Graph polling (Phases A-D, closes #75)#76
initializ-mk wants to merge 16 commits into
mainfrom
feature/msteams-graph-polling

initializ-mk commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

initializ-mk commented May 20, 2026

Summary

Diff stats

Architecture invariants preserved

Error handling

Tests (47 total, no network)

Deferred to a follow-up PR

Open questions deferred (from #75 §14)

Reference patterns followed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant