feat(channels): Microsoft Teams adapter via Graph polling (Phases A-D, closes #75)#76
Open
initializ-mk wants to merge 16 commits into
Open
feat(channels): Microsoft Teams adapter via Graph polling (Phases A-D, closes #75)#76initializ-mk wants to merge 16 commits into
initializ-mk wants to merge 16 commits into
Conversation
Implements Phases A-D of the FORGE_MSTEAMS_CHANNEL_GRAPH_POLLING design: core adapter + Graph client + auth + markdown converter + unit tests. Outbound-only — no inbound webhooks, no public endpoint, no Bot Framework. CLI wizard (Phase E), user-facing setup docs, and live end-to-end testing against a real tenant are deferred to a follow-up PR — see #75 OPEN QUESTIONS for the wizard UX decisions still to settle (delegated vs client_credentials default, mention-strip behavior). The plumbing already supports `forge channel add msteams` non-interactively via the existing add/serve flow. Files forge-plugins/channels/msteams/ - msteams.go Plugin shell (Init/Start/Stop/SendResponse/ NormalizeEvent), poll loop with 3/5/60s clamp, backoff state machine, dispatch + admission gate + dedup, large-response handling that mirrors slack/telegram (summary + hosted-content attachment, fallback to chunked). - auth.go OAuth2 token manager: delegated refresh-token flow + client_credentials. Caches tokens with a 60s expiry window. Refresh-token rotation hook for the runner to persist back to .env / K8s Secret. - graph.go Typed Graph client: /me, /users/{id}, delta paging via getAllMessages/delta, POST chats/{id}/messages, hosted-content attachments (4 MiB cap enforced). Classifies responses into errUnauthorized / errForbidden / errCursorExpired (410) / *rateLimitedError (Retry-After clamped 10-300s) / generic 5xx. No `model=A/B` query param (metering era ended 2025-08-25). - admission.go 4-stage gate: self-loop FIRST (beats allowlist even on misconfiguration), then bot-allowlist check, then mode filter (mention / dm / mention_or_dm). stripBotMention drops the leading "@DisplayName" so the LLM doesn't see the bot's own name as the first word. - cursor.go .forge/channels/msteams-cursor.json with atomic write (tmp + rename), 0700 dir perm. Corrupt file loads as empty. - dedup.go Sliding 1000-ID ring; oldest evicted first; concurrent-safe. forge-plugins/channels/markdown/ - teams.go MarkdownToTeamsHTML (headings, bold, italic, inline code, fenced code, links, ul, ol, blockquote, strikethrough; HTML-safe escaping). TeamsHTMLToPlain (strips Teams chatMessage HTML to text, preserves @mention display names, decodes entities). SplitMessageTeams (24KB threshold — leaves headroom for HTML-tag overhead under the ~28KB Teams body limit). ExtractMention (mentions[] + <at id="N"> belt-and-braces). - teams_test.go 18 tests across all converter surfaces. forge-core/security/capabilities.go - msteams bundle: graph.microsoft.com + login.microsoftonline.com. Sovereign clouds (graph.microsoft.us, microsoftgraph.chinacloudapi.cn) stay out of the default bundle — operators add via egress.allowed_domains. forge-cli/cmd/channel.go - ValidArgs include msteams. createPlugin + defaultRegistry register msteams.New(). addChannelEgressToForgeYAML adds msteams to egress.capabilities (same pattern as slack). printSetupInstructions prints the Entra app registration steps. forge-cli/templates/init/ - msteams-config.yaml.tmpl Config emitted by `forge channel add msteams`. All settings ending in _env are auto-discovered by the existing ChannelsStage from PR #54 for K8s manifest emission — zero edits to k8s_stage.go or channels_stage.go required. - env-msteams.tmpl .env placeholders for the five MSTEAMS_* variables. Tests (all under unit-test budget — no network, no real Graph calls) 47 tests added across two packages: - channels/markdown: 18 (MarkdownToTeamsHTML / TeamsHTMLToPlain / SplitMessageTeams / ExtractMention) - channels/msteams: 29 covering: * dedup: seen-and-mark, ring eviction (oldest first), duplicate-mark non-eviction * cursor: atomic save + load round-trip, 0700 dir perm, corrupt-file recovery * admission: self-loop beats allowlist (HARD rule), DM-only, mention-only, mention_or_dm, bot allowlist admit/drop * stripBotMention: 6 cases including case- insensitive prefix and non-start non-strip * parseAllowBotIDs: comma/space/newline splits * graph: /me happy path, delta paging (@odata.nextLink shape), all four error branches (401/403/410/429), Retry-After parsing including floor/ceiling clamps, POST chats/{id}/messages payload shape * auth: delegated refresh-token rotation hook fires + cache prevents second HTTP call, client_credentials happy path, refresh-token-expired error surfacing * Plugin Init validation: tenant required, delegated needs refresh_token, client_creds needs user_id, poll_interval_seconds clamped to 3/5/60 Zero edits to channels_stage.go, k8s_stage.go, requirements_stage.go, template_data.go, K8s manifest templates, or cmd/package.go — the existing _env-suffix scanner from PR #54 picks up msteams-config.yaml automatically. go test ./... forge-core ✅ forge-cli ✅ forge-plugins ✅ golangci-lint 0 issues across all three modules gofmt clean Deferred to a follow-up PR (Phase E + F from #75): - forge-cli/cmd/channel_msteams.go interactive bubbletea wizard with device-code flow capture and live /me validation - docs/channels/msteams.md user-facing setup guide (Entra app registration, permissions, troubleshooting, sovereign-cloud override) - End-to-end manual test against a real tenant - FORGE_PROJECT_DESIGN.md Channel Connectors table row
The init wizard's channel-selector step (forge-cli/internal/tui/steps/
channel_step.go) was hard-coded to None/Telegram/Slack. Adding the
msteams adapter to the runtime + CLI without surfacing it in the
interactive picker meant TUI users couldn't actually pick it.
Changes
- Add the "MS Teams" item to the SingleSelect list with the 👥 icon
and the "Graph polling, no public URL needed" subtitle (same
outbound-only framing as the existing options).
- Add 3 new phases for the 4-step token capture:
channelMsteamsClientIDPhase
channelMsteamsClientSecretPhase
channelMsteamsRefreshTokenPhase
(The 1st step — tenant_id — reuses the existing channelTokenPhase
via the per-channel switch already used for Slack's two-step flow.)
- New `updateMsteamsPhase` helper drives the chain; each phase stores
the just-collected value under the MSTEAMS_* env name and advances
to the next prompt, or marks the step complete on the last step.
- New `newMsteamsInput` helper centralises the theme-binding boilerplate
so the 4 phases stay terse.
- View() adds instructions for each of the 4 steps. The refresh-token
step embeds the device-code curl invocation so the operator has the
command at hand without leaving the TUI.
- Summary() returns "MS Teams" for the post-wizard recap row.
No new TUI components needed — reuses the existing SingleSelect +
SecretInput primitives. No changes to wizard plumbing
(WizardContext.ChannelTokens, init.go env-var copy, review_step,
egress_step) — the existing per-token-name map already supports
arbitrary channels.
The full device-code polling automation (where the wizard would call
/devicecode + /token itself instead of having the operator paste the
refresh token) was explicitly deferred in PR #76's body and remains
deferred; this commit just removes the "msteams is invisible in the
TUI" papercut.
go test ./forge-cli/... — all green
gofmt -w + golangci-lint — 0 issues
forge --version against /tmp/forge-issue-75-build/forge — boots
forge init shows MS Teams 👥 in the channel selector
The OAuth 2.0 device-code flow has two halves:
1. user-consent: visit verification_uri, enter user_code, sign in
2. client-polling: POST /token with grant_type=device_code repeatedly
until the user completes consent
The previous TUI Step 4/4 instructions documented only half (1), so an
operator who finished the sign-in page never saw a refresh token —
nothing on their side was polling /token. The "successfully given
permission" Microsoft page is the consent-side result; the refresh
token only materialises when the client side polls and exchanges the
device_code.
New subcommand: forge channel msteams-login
- Reads tenant_id and client_id from --flags, $MSTEAMS_TENANT_ID /
$MSTEAMS_CLIENT_ID in the shell env, or .env in the current dir
(first non-empty wins).
- Calls POST {login_base}/{tenant}/oauth2/v2.0/devicecode with
scope=https://graph.microsoft.com/.default offline_access — same
scope set the runtime authManager uses, so the refresh token captured
here is interchangeable with one captured externally.
- Prints verification_uri + user_code to stderr (so stdout stays clean
for piping). Polls /token at the server-advertised interval, honours
slow_down (interval += 5s), handles the four terminal error_codes
(authorization_pending continue / expired_token / access_denied /
generic error).
- On success, prints refresh_token to stdout (newline only — no labels,
pipe-friendly). With --write-env, appends or replaces the
MSTEAMS_REFRESH_TOKEN=<token> line in .env directly.
- Sovereign clouds: --login-base override (default
https://login.microsoftonline.com; pass login.microsoftonline.us for
US Gov, login.chinacloudapi.cn for China).
- Timeout: --timeout-seconds (default 900 = 15 minutes, matching
Microsoft's device-code lifetime).
TUI Step 4/4 (channel_step.go) updated:
- Replaces the cryptic curl snippets with a single instruction to run
`forge channel msteams-login` in a second terminal.
- Mentions --write-env so the operator can skip the paste-back entirely
and just leave the TUI field blank.
Tested: forge channel msteams-login --help prints the expected
usage. Real Entra calls happen at runtime; the polling state machine
mirrors the runtime auth.go's refresh-token rotation handler.
go test ./forge-cli/... — all green
golangci-lint — 0 issues
…-open browser)
User feedback: running `forge channel msteams-login` from a second
terminal during the TUI wizard didn't work because the agent project
directory (and .env) don't exist until the wizard completes. Followed
by: "can tui trigger a browser open once I have provided the tenantid,
clientid and secret via tui?" — yes, and that's the right UX.
This change makes the MS Teams branch of `forge init` self-contained:
After collecting tenant_id (step 1), client_id (step 2), client_secret
(step 3), the wizard automatically:
4a. POSTs /devicecode using the tenant + client just provided
4b. Opens the verification_uri in the operator's default browser
(cross-platform: open / xdg-open / start; failures are
silently degraded — the URL is also shown on screen)
4c. Renders the one-time user_code prominently in the TUI
4d. Polls /token in the background until the operator approves
consent in the browser
4e. Captures the refresh_token, stores it under
MSTEAMS_REFRESH_TOKEN in the wizard's token map, and advances
No second terminal. No manual paste. No curl wrangling. The wizard
finishes with a complete .env.
Architecture
- New shared package `forge-cli/internal/devicecode/` containing the
RFC 8628 device-authorization-grant primitives:
DeviceCodeResponse, TokenResponse,
RequestDeviceCode, PollDeviceToken,
OpenURL (cross-platform browser launcher mirroring the private
openBrowser in forge-core/llm/oauth/flow.go)
Used by BOTH the `forge channel msteams-login` CLI subcommand AND
the TUI. Same scope (graph.microsoft.com/.default offline_access),
same poll cadence, same error class handling — refresh tokens
produced by either path are interchangeable.
- `cmd/channel_msteams_login.go` refactored to import from there;
the duplicate types and helpers in this file are removed (~145
lines down to ~210, mostly env-var helpers). Behaviour unchanged.
As a small UX upgrade, this subcommand now also opens the
verification_uri in the browser automatically.
TUI implementation
- New state `channelMsteamsDeviceLoginPhase` replaces the old
`channelMsteamsRefreshTokenPhase` (which had been a paste-back
SecretInput).
- New sub-state machine inside that phase: msteamsLoginRequesting →
msteamsLoginWaiting → success | msteamsLoginErr.
- Two tea.Cmd dispatchers: requestDeviceCodeCmd (30s timeout) and
pollTokenCmd (15-minute timeout, matches Microsoft's device-code
expiry).
- View() renders:
* "⣾ Requesting a one-time code from Microsoft..." during step 4a
* The verification URL + user_code prominently, plus a
"⣾ Waiting for you to complete sign-in..." spinner during 4d
* On failure: error message + "Press R to retry, or S to skip"
(skip allows the operator to finish the wizard and capture the
refresh token later via `forge channel msteams-login --write-env`
from the new project root)
Build / lint / test all green. No changes to runtime adapter code
(forge-plugins/channels/msteams/) — this is purely an installer
upgrade.
Existing manual flow (`forge channel msteams-login` from a project
root with .env present) still works for refresh-token rotation and
non-TUI deployments.
Error reproduced: AADSTS7000218 — "The request body must contain the
following parameter: 'client_assertion' or 'client_secret'."
The OAuth 2.0 device-authorization grant (RFC 8628) is designed for
public clients (no secret), but Microsoft Entra enforces stricter
credential checks when the registered app is a CONFIDENTIAL client
(a "web" application registration — anything with a client_secret).
For those apps, /token requires the client_secret to be present even
in the device_code grant.
The runtime adapter (forge-plugins/channels/msteams/auth.go) already
conditionally included client_secret in its refresh_token grant, so
this only manifested in the new device-code path.
Changes
- internal/devicecode/devicecode.go:
PollDeviceToken signature gains a clientSecret string parameter.
When non-empty, it's included in the form body. Public-client apps
tolerate the extra field, so always passing it (when available) is
safe — there's no need to ask the operator which kind their app is.
- cmd/channel_msteams_login.go:
- New --client-secret flag.
- Reads MSTEAMS_CLIENT_SECRET from the shell env or .env (same
precedence pattern as tenant/client) when the flag is unset.
- Threads the value through to PollDeviceToken.
- internal/tui/steps/channel_step.go:
pollTokenCmd now passes s.tokens["MSTEAMS_CLIENT_SECRET"] which
was captured in TUI step 3. The operator doesn't have to do
anything different — the wizard had the secret in hand all along,
we just weren't using it.
Side note: /devicecode itself does NOT require client_secret (it's
public for both client types), so RequestDeviceCode is unchanged.
Only the /token POST is affected.
Tested
- gofmt -w + golangci-lint — 0 issues
- go test ./forge-cli/... — all green
- Binary rebuilt at /tmp/forge-issue-75-build/forge
User saw the new ">> client_secret WAS sent (len=40)" diagnostic, which confirmed the runtime IS sending client_secret correctly — Entra is rejecting it for a different reason. The actual most-common cause for AADSTS7000218 when client_secret IS in the request body: the Entra app has 'Allow public client flows' OFF under Authentication → Advanced settings. The device-code grant is classified as a "public client flow" regardless of how the client authenticates, and Entra rejects it outright on confidential-client apps until that toggle is flipped on. Update the diagnostic to point users at the toggle first, with the expired-secret check as the less-common fallback. Format the diag as multiple ">>"-prefixed lines so terminal wrapping can't hide it. No code-path changes — only the error-message text differs. The actual form-body handling (which IS correct) is unchanged.
User report: even after providing tenant_id / client_id / client_secret
in the TUI wizard (steps 1-3 of the msteams channel) and skipping
step 4 (device-code login), the generated agent project had no
MSTEAMS_* entries in .env or .forge/secrets.enc. First `forge run`
attempt fails with:
Error: initialising msteams: msteams: tenant_id is required
(set MSTEAMS_TENANT_ID)
Root cause: cmd/init.go:buildEnvVars has a per-channel switch that
converts opts.EnvVars (map) into the []envVarEntry written to disk.
That switch only knew about telegram and slack. msteams matched
nothing, so the four MSTEAMS_* keys captured by the TUI's
ChannelStep.Apply → ctx.ChannelTokens → opts.EnvVars chain were
silently dropped on the floor at the persistence step.
Fix: add a msteams case to the switch. Emits:
- MSTEAMS_TENANT_ID → .env (GUID, not a secret per isSecretKey)
- MSTEAMS_CLIENT_ID → .env (GUID)
- MSTEAMS_CLIENT_SECRET → .forge/secrets.enc (suffix _SECRET → secret)
- MSTEAMS_REFRESH_TOKEN → .forge/secrets.enc (suffix _TOKEN → secret)
or placeholder in .env when empty (e.g., the user skipped step 4
due to the "Allow public client flows" Entra gotcha and intends to
capture the token later via `forge channel msteams-login --write-env`)
- MSTEAMS_USER_ID → .env if set (only used for client_credentials flow;
delegated flow auto-resolves via /me at runtime, so we omit when empty)
The existing splitEnvVars + isSecretKey machinery already routes each
key correctly based on its suffix — this commit just makes sure the
keys actually reach that machinery.
go test ./forge-cli/... — all green
golangci-lint — 0 issues
User report: after toggling "Allow public client flows" = Yes in Entra
to fix AADSTS7000218 ("secret required") on the device-code call, the
RUNTIME refresh-token call started failing with the INVERSE error:
AADSTS700025: Client is public so neither 'client_assertion' nor
'client_secret' should be presented.
Microsoft enforces strict mutual exclusivity:
- app confidential (toggle OFF) → client_secret REQUIRED
- app public (toggle ON) → client_secret FORBIDDEN
The agent can't tell which type the app is from inside the runtime,
and forcing the operator to declare it in msteams-config.yaml just
shifts the confusion. Instead, both code paths now auto-detect.
forge-plugins/channels/msteams/auth.go
- Split refreshLocked into refreshLocked + requestToken + postTokenForm
so the retry logic is testable without duplicating form construction.
- First attempt sends client_secret when configured. On AADSTS700025
(matched by string contains, no JSON re-parse), drop the secret
and retry once on the same call.
- Cache the inference in authManager.publicClientInferred — every
refresh after the first detection skips the secret directly,
avoiding a round-trip per hourly refresh.
- Inference is per-process: rebooting the agent re-detects on first
refresh. That's intentional; operators occasionally flip the toggle
and we want the next start to discover the new truth.
- client_credentials flow is unchanged — it's inherently confidential
so the inference doesn't apply.
forge-cli/internal/devicecode/devicecode.go
- Same pattern inside PollDeviceToken's polling loop: sendSecret flag
starts true, flips false on AADSTS700025, the same device_code
(which is valid for ~15 minutes) is re-used on the immediate retry.
- This makes the TUI step 4 (inline OAuth) work for both app types
out of the box.
Test surface
- go test ./forge-plugins/channels/msteams/ — all green (existing
TestAuthManager_* and TestGraphClient_* still pass against the
refactored requestToken/postTokenForm split)
- go test ./forge-cli/... — all green
- gofmt + golangci-lint — 0 issues
Together with the existing AADSTS7000218 diagnostic message ("Allow
public client flows is OFF → toggle it on"), both Entra
client-type-mismatch errors now resolve themselves automatically
without operator intervention beyond the initial registration choice.
User report: agent boots and authenticates as MK successfully, then the poll loop fires repeatedly with HTTP 412: PreconditionFailed: Requested API is not supported in delegated context Root cause is a Graph API permission mismatch baked into PR #76's design. /users/{id}/chats/getAllMessages/delta is APP-ONLY: Permission type: Delegated — NOT supported Permission type: Application — ChatMessage.Read.All, Chat.Read.All The TUI wizard walks operators through the *delegated* OAuth flow (device-code, personal API permissions, no admin consent), so the token we present is delegated and Entra rejects with 412. For delegated, Graph's per-chat /chats/{id}/messages/delta endpoint *does* support delegated context (Chat.Read / ChatMessage.Read), but there's no single getAll-style endpoint that does. The adapter must enumerate /me/chats and maintain one delta cursor per chat. graph.go: + ListChats(ctx, limit) → GET /me/chats?$select=id,topic,chatType&$top=N + InitialChatDeltaURL(chatID, since) → /chats/{id}/messages/delta?$filter=... Both new endpoints work with the delegated Chat.Read scope already requested by the wizard. Existing InitialDeltaURL kept for the client_credentials path. cursor.go: + Per-chat cursor map persisted alongside the existing single global deltaLink (both fields coexist in cursorFile JSON). + load/save keep the app-only behaviour for backwards compat. + loadChats/saveChats added for the delegated path. + writeLocked centralises the atomic tmp+rename so both APIs serialise to the same JSON shape. msteams.go: + pollLoop dispatches by auth flow: FlowDelegated → pollLoopDelegated (per-chat) FlowClientCredentials → pollLoopAppOnly (existing single cursor) + pollLoopDelegated: - Lists chats once at startup, then every 60s in a separate refresh timer so new chats appear without restart. - Walks every known chat per poll tick (default 5s). - On per-chat 410, reinits just that chat's cursor (the others keep their position). - On 401, force-refreshes the token (same recovery as app-only). - On 429, honours Retry-After. - Exponential backoff on transient errors, reset on success. - Tracks chatType from /me/chats so the admission gate always sees the correct oneOnOne/group/meeting classification even when the chatMessage payload omits it. + dispatch gains a chatTypeHint parameter (passed by the delegated loop, empty by the app-only loop) so admission can fall back to the per-chat map when msg.ChatType is absent. go test ./forge-plugins/... — all green golangci-lint — 0 issues This also closes the design gap from #75 §1.3: "An Entra ID app authenticates as a single Teams user … polls /users/{id}/chats/getAllMessages/delta on a 5-second cadence." The operational story stays the same — outbound-only, no public endpoint, ~5s latency floor — but the Graph endpoint the adapter calls now matches the permission model the wizard sets up.
User report: per-chat polling started but every chat got HTTP 400:
BadRequest: Unsupported request: Change tracking is not supported
against 'microsoft.graph.chatMessage'.
Root cause: /chats/{id}/messages/delta does NOT accept $filter. The
$filter=lastModifiedDateTime gt <ts> clause I was using to skip
historical backlog gets rejected outright by Microsoft Graph.
The Graph delta pattern for chatMessage:
- First call (no $filter) → returns ALL current messages in
the chat, paginated via
@odata.nextLink, ending in
@odata.deltaLink
- Subsequent calls (with deltaLink) → returns only changes since
deltaLink was issued
Dispatching the initial-sync messages would flood the channel handler
with arbitrary chat history. The adapter must drain that initial sync
silently, persist the deltaLink, and only treat messages from polls
made AFTER the first deltaLink as new chat activity.
Changes
graph.go
- InitialChatDeltaURL no longer takes `since time.Time` and no longer
sets $filter. Just hits /chats/{id}/messages/delta with no query
params. Comment explicitly calls out the change-tracking restriction.
cursor.go
- Per-chat entry promoted from `map[string]string` (bare URL) to
`map[string]chatCursorState{URL, DeltaSeen}`. DeltaSeen flips true
the first time Graph returns @odata.deltaLink for the chat.
- loadLocked now falls back to parse the legacy string-valued schema
so cursor files produced by previous commits (b657cc8) are still
accepted — they're treated as initial-sync state (DeltaSeen=false)
so the next poll redoes initial sync rather than dispatching the
next page as new traffic.
msteams.go (pollLoopDelegated)
- Walks cursors as chatCursorState rather than bare strings.
- When DeltaSeen is false: drain the response without dispatching,
but DO mark each message in the dedup ring so a later cursor reset
can't accidentally re-emit them.
- When DeltaSeen is true: dispatch normally.
- On deltaLink arrival, flip DeltaSeen=true and log
"chat X initial sync complete (dropped N historical messages);
now tracking changes"
so operators can see the transition.
- On 410 (cursor expired), reset to InitialChatDeltaURL AND clear
DeltaSeen — the next poll re-does initial sync.
Tests + lint clean.
User report: cursor reset, fresh build, still hit:
BadRequest: Unsupported request: Change tracking is not supported
against 'microsoft.graph.chatMessage'.
Root cause is structural: Microsoft Graph v1.0 exposes NO delta
primitive for chatMessage in delegated context. Both endpoints I tried
fail by design:
/chats/{id}/messages/delta → HTTP 400 "Change tracking
is not supported"
/users/{id}/chats/getAllMessages/delta → HTTP 412 "API not supported
in delegated context"
The only chatMessage delta that works in delegated v1.0 is the BETA
endpoint — but we're (correctly) targeting v1.0.
Solution: drop the delta primitive entirely from the delegated path
and use plain GET /chats/{id}/messages?$top=50&$orderby=lastModifiedDateTime%20desc
with client-side watermark tracking via lastModifiedDateTime.
graph.go
- Replace InitialChatDeltaURL with ListChatMessages, a non-delta
list call that returns newest-first up to 50 entries. Comment
documents both failed delta attempts so future readers don't
re-litigate this.
- InitialDeltaURL (the app-only getAllMessages/delta call) kept
for the client_credentials path where it does work.
cursor.go
- chatCursorState shrinks from {URL, DeltaSeen} to {LastSeenTime}.
Per-chat state is now just a single RFC3339 timestamp string.
- Removed legacy schema fallback in loadLocked: when the on-disk
file doesn't match the current schema, treat as empty so the
next poll re-anchors cleanly. State files are internal scratch,
not public API.
msteams.go (pollLoopDelegated)
- First contact with a chat: anchor LastSeenTime at "now" and
don't poll yet (no need to GET when we'd throw the results away
anyway). The next tick starts dispatching real new messages.
- Steady state: ListChatMessages(chat, 50), keep only entries with
lastModifiedDateTime > LastSeenTime, dispatch in chronological
order (Graph returns desc; we reverse so handler sees oldest
first for session continuity), then advance LastSeenTime to the
newest message's timestamp.
- compareTimestamps helper: lexicographic comparison works for all
ISO-8601 strings Graph returns because they're fixed-width.
- All existing error handling (401 refresh, 429 backoff, generic
backoff) carries over unchanged.
The visible operational change for operators:
- Old log: "chat X initial sync complete (dropped N historical
messages); now tracking changes"
- New behaviour: no historical-message log line because the first
contact with a chat now skips the GET entirely. The cursor file
just records a timestamp on the first tick.
Tests + lint clean.
User report: agent is operational (self-loop guard fired correctly on
the user's own message) but one meeting chat (19:meeting_...@thread.v2)
keeps returning 403 forbidden. With the previous code that error path
broke out of the per-chat iteration AND tripped global exponential
backoff, so a single chat we lack permission for was effectively
slowing down polling on the 49 healthy chats.
Microsoft applies stricter permission gates on meeting chats: the
delegated Chat.Read scope is sometimes insufficient and per-meeting
consent (granted by the meeting organiser) is required. That's a
permanent state for any given (token, chat) pair — retrying every
5 seconds is pure log spam.
Changes (pollLoopDelegated only)
- New forbiddenChats set tracks chats we've already learned we
cannot read. Skipped at the top of the per-chat iteration so
they cost zero HTTP calls.
- On 403:
* log once at NORMAL level (no "WARN", no backoff) with a hint
about meeting-chat consent
* add the chat to forbiddenChats
* `continue` instead of `break` — the rest of the chats poll
normally on this tick
- Generic transient errors (5xx, network): also `continue` now
instead of `break`. We still set anyErr=true so a global backoff
kicks in if EVERY chat errored, but a single chat hiccup no
longer blocks the rest.
- On chat-list refresh (every 60s), clear forbiddenChats so a
newly-granted permission gets picked up.
- 401 and 429 still break out of the chat loop — those are global
by nature (token expired = no chat works; rate limit applies
across all calls in the tenant).
No tests added — the polling loop is hard to unit-test without
extensive HTTP mocking. Behavioural change covered by manual run
against the user's tenant.
Other modules unaffected. Lint + go test clean.
User report: "@forge Agent how can you help?" tagged in a group chat was dropped by the admission gate as "msteams: dropping message authored by self". The user expected the agent to reply. Root cause is a fundamental property of delegated-mode Teams bots that PR #76's admission logic got wrong: In delegated mode, the agent acts AS the user (the delegated OAuth2 token grants the user's identity). When the agent posts a reply via PostChatMessage, Graph stamps that reply with from.user.id = ownUserID. When THE USER types a message from any Teams client, Graph stamps it with the same from.user.id = ownUserID. From inside admission, looking at from.user.id alone, the two are indistinguishable. The previous self-loop guard ("drop if msg.From.User.ID == ownUserID") was correct in spirit but caught both cases — including the case the user actually cares about (typing a question to the agent from their own Teams client). The result: any group chat or DM where the user participates is silently dropped, agent never responds. Correct loop prevention for delegated mode is per-message-ID: track the IDs of messages we sent ourselves, and let the dedup ring catch them before they reach admission. Changes graph.go - PostChatMessage now returns (string, error) — the created message's Graph ID. - PostChatMessageWithAttachment similarly. Comment explains why the return value matters in delegated mode. msteams.go - SendResponse and sendChunked capture the ID from each successful POST and call markSent(id), which is a thin wrapper around dedup.mark for outbound messages. - markSent handles the empty-id corner gracefully so call sites don't need to check. admission.go - Removed the from.user.id == ownUserID gate. ownUserID stays on the function signature (diagnostic value, future use) but is explicitly _ = ownUserID with a long comment explaining the delegated-identity trap and why dedup is the canonical loop breaker. msteams_test.go - TestAdmit_SelfLoopBeatsAllowlist replaced with TestAdmit_SelfAuthoredUserMessageAdmitted, which encodes the new behaviour: a DM authored by ownUserID is admitted (loops are caught upstream by dedup, not here). - TestGraphClient_PostChatMessage updated to assert the returned message ID flows through correctly and matches what the stub server returned in its response body. Behaviourally this is also right for the app-only (client_credentials) path: agent posts as the application, so from.application is set on its own messages instead of from.user, and the user.id check never fired for app-posts anyway. Removing it is strictly more permissive in delegated mode, no regression in app-only mode. Lint + go test clean.
…ntion User report: DM to agent works (admission passes via chat_type=oneOnOne) but "@forge Agent how can you help?" in a group chat is dropped with "non-mention non-dm message" — no clue WHY admission didn't see it as a mention. Add diagnostic logging on the non-mention drop path so the next round of testing shows exactly what's in the mentions[] array. Two outcomes: Empty mentions[]: "no formal mentions in payload — in delegated mode you must @-mention your own Teams display name (autocomplete required); the agent shares your identity (<ownUserID>)" → user typed "@text" as plain text instead of using Teams' autocomplete to create a formal mention; OR they tried to mention a display name that doesn't resolve to a real Teams identity. In delegated mode the agent has no separate Teams display name — it shares the operator's user identity, so the operator must @-mention their OWN display name to invoke the agent. Non-empty mentions[] but no match: Logs each mention's text/user.id/displayName so the operator can see whether the mention is targeting a different user, an app id, or the right user under a different identifier. Diagnostic only — no behavioural change to admission or dedup. The delegated-identity quirk is structural and worth documenting in a follow-up commit on the user guide, but the log now points operators at the cause without needing source-level debugging.
User report: setup was complete, DM worked, but @-mentions in group chats didn't trigger the agent. Diagnosis: during the device-code flow the user signed in as themselves (MK) — so the agent's ownUserID is MK's UUID, not the "Forge Agent" Microsoft 365 account they created for the bot. @-mentioning Forge Agent in a chat doesn't match MK's UUID and admission drops with non-mention. The architectural rule for delegated-mode Teams agents: sign in as the account you want the agent to ACT AS, which is also the account other users will MENTION to invoke it. For multi-user deployments this should be a dedicated service account (e.g. forge-agent@yourtenant.onmicrosoft.com), not the operator's personal account. Add a prominent ImportantTxt-styled banner to the TUI step 4 (the "waiting for sign-in" view) so operators see this before completing device-code consent. Three lines, error-styled, embedded right above the spinner: IMPORTANT: sign in as the dedicated Microsoft 365 account you want the agent to ACT AS (e.g. forge-agent@yourtenant). Other Teams users will @-mention that account to invoke the agent. No code-path change; pure documentation surface. The existing failure modes (signing in with the wrong account) are still recoverable via `forge channel msteams-login --write-env` from the project root.
User report: after invoking the agent with "@forge Agent can you summarize this week's updates?" in a group chat, the LLM correctly admitted the mention but had no idea what "this week's updates" referred to. The agent responded with a template, asking the user to paste sources. Architectural mismatch: the polling design anchors LastSeenTime to "now" on first contact and only dispatches truly new messages. That's correct for the "respond when invoked" loop semantics — but it means the LLM sees only the literal mention text and zero conversational context. Teams chats are thread-shaped; the chat history IS what the agent should be reasoning about. Fix: when a message passes admission, fetch the most recent N messages in the same chat via ListChatMessages and prepend them as a context block before the user's actual prompt. The historical messages are NOT dispatched as separate events (that would flood the handler) — they live only inside the current event's Message text. Format: [Recent chat history for context — most recent message at the bottom:] 05-20 14:32 Alice: working on the billing migration 05-21 09:15 Bob: tests are green ... [End of history. The user's current message follows:] @forge Agent can you summarize this week's updates? The current message ID is filtered out of the history (it's already the prompt). HTML bodies are converted to plain text via the existing TeamsHTMLToPlain helper. Bot-authored messages are tagged with "(bot)" in the author field so the LLM doesn't conflate them. Configuration (msteams-config.yaml): include_recent_history: true # default; turn off for chats where # surrounding history would be noise # (e.g., alerts-only channels) recent_history_count: 20 # 1-50, hard-capped at 50 (Graph's # per-request limit on /chats/{id}/messages) The formatted block is additionally soft-capped at ~5000 chars in prependChatHistory to protect the LLM context budget — chatty threads won't blow the context window even if recent_history_count is at the max. Cost: one extra HTTP call per admitted message. Mentions/DMs are infrequent enough that this is a non-issue; large chats with high-frequency admission would benefit from a future cache.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
First-party Microsoft Teams channel adapter implemented as a Microsoft Graph polling adapter (analogous to Telegram polling), not a Bot Framework webhook. Outbound-only — no inbound webhooks, no public endpoint. Preserves Forge's first design principle while finally putting Teams support on the table now that Graph delta polling and un-metered Teams APIs (since 2025-08-25) make the model viable.
Implements Phases A-D of the design recorded in #75:
/me, delta paging, POST messages, hosted-content attachments) + OAuth2 auth (delegated refresh-token + client_credentials)Closes #75 for the runtime adapter; CLI wizard, user docs, and live tenant validation are deferred to a follow-up PR (see "Deferred" below).
Diff stats
forge-plugins/channels/msteams/msteams.goforge-plugins/channels/msteams/auth.goforge-plugins/channels/msteams/graph.goforge-plugins/channels/msteams/admission.goforge-plugins/channels/msteams/cursor.goforge-plugins/channels/msteams/dedup.goforge-plugins/channels/msteams/msteams_test.goforge-plugins/channels/markdown/teams.goMarkdownToTeamsHTML,TeamsHTMLToPlain,SplitMessageTeams,ExtractMentionforge-plugins/channels/markdown/teams_test.goforge-core/security/capabilities.gomsteamscapability bundle:graph.microsoft.com,login.microsoftonline.comforge-cli/cmd/channel.gomsteamsadded to ValidArgs, createPlugin, defaultRegistry, egress wiring, setup instructionsforge-cli/templates/init/msteams-config.yaml.tmpl_env-suffix disciplineforge-cli/templates/init/env-msteams.tmpl.envplaceholders forMSTEAMS_*Zero edits required to
channels_stage.go,k8s_stage.go,requirements_stage.go,template_data.go, K8s manifest templates, orcmd/package.go— the existing_env-suffix scanner from PR #54 picks upmsteams-config.yamlautomatically. The architecture from Phase 7 (Slack/Telegram pattern) generalised cleanly.Architecture invariants preserved
graph.microsoft.com+login.microsoftonline.com.net/httpcalls only._env-suffix convention holds — every secret-bearing setting inmsteams-config.yaml.tmplends in_env. K8s manifest generation needs zero changes.model=A/Bquery param. Metering era ended 2025-08-25; the param is now ignored.userIDis misconfigured intoallow_bot_ids, self-authored messages are dropped (matches Slack/PR feat(slack): admit bot @mentions via allow_bot_ids, drop self always (closes #55) #56).tenant_id_env. Multi-tenant seam stays clean even though multi-tenant adapter isn't in scope.graph_base_url_env— US Gov / China / GCC operators overrideMSTEAMS_GRAPH_BASE_URLand add their domains toegress.allowed_domains. Default bundle stays minimal for the 99% on commercial cloud.Error handling
401auth.Refresh(); if it still 401s → 60s backoff403docs/channels/msteams.mdremediation hint; 60s backoff429Retry-Afterheader; clamped to [10s, 300s]410 Gone(delta cursor expired)$filter=lastModifiedDateTime gt <now>. WARN log. Never crashes.5xx/ networkTests (47 total, no network)
channels/markdown— 18 tests: every conversion-table row from [Feature]: Microsoft Teams channel adapter via Graph polling (outbound-only) #75 §6, plus HTML-escape safety, mention preservation, entity decoding,<br>handling, under-limit and over-limit splits, mention extraction with and without userID.channels/msteams— 29 tests:/me, delta paging with@odata.nextLink, all four error sentinels,Retry-Afterparsing with floor/ceiling clamps, POST payload shapeToken()call doesn't re-hit endpoint), client_credentials flow, token-endpoint error surfaces correctlyDeferred to a follow-up PR
These are independent from the runtime adapter and won't block this PR from being reviewed and merged:
forge channel add msteamsinteractive bubbletea wizard with the device-code flow capture and live/mevalidation. Today,forge channel add msteamsstill works non-interactively via the existing add/serve plumbing — it writes the template and the.envplaceholders; the operator runs the device-code curl themselves and fills in the refresh token. The wizard is purely UX sugar.docs/channels/msteams.mduser-facing setup guide (Entra app registration walkthrough, permission list, troubleshooting matrix, sovereign-cloud override notes).FORGE_PROJECT_DESIGN.mdChannel Connectors table row — separate doc-sync follow-up after merge, same pattern as PR docs: add linear and code-plan to embedded-skills index #72 for linear + code-plan.Open questions deferred (from #75 §14)
The four open questions from the issue are best resolved in the wizard PR, since they all affect UX:
delegatedper the template default.ev.WorkspaceIDcarries the Graphchat.idas-is; scheduler prompt already handles arbitrary strings.msteams-channels(in-team channel messages) — explicitly out of scope per [Feature]: Microsoft Teams channel adapter via Graph polling (outbound-only) #75 §15 (blocked on the Microsoft@odata.nextLinkbug).Reference patterns followed
forge-plugins/channels/telegram/telegram.go— polling-loop shape, env-var resolution viachannels.ResolveEnvVarsforge-plugins/channels/slack/slack.go— admission flow with self-loop guard ahead of allowlist (PR feat(slack): admit bot @mentions via allow_bot_ids, drop self always (closes #55) #56)forge-plugins/channels/slack/slack.goSendResponse— large-response summary + attachment + chunked fallback; same use ofa2a.Message.Summaryfrom PR feat(channels): LLM-generated summary for large responses (closes #64) #65forge-cli/build/channels_stage.go—_env-suffix scanner from PR fix: inject channel env vars into K8s manifests (closes #50) #54; no changes needed