feat(deploy): Streamable-HTTP MCP entrypoint and Azure deployment #103
…ffolding

- Add firefly-mcp-http CLI entrypoint serving FastMCP over Streamable HTTP with /healthz
- Multi-stage Dockerfile (uv sync --extra rest --extra mcp, non-root)
- Idempotent Azure provisioning script for rg-firefly (Log Analytics, ACR, Storage + blob, Key Vault, user-assigned MI with federated GH credential, Container Apps env, Container App)
- GitHub Actions workflow building and deploying via OIDC
- Auth left at the ingress layer (Entra/EasyAuth on Container Apps), no framework-level auth keys, aligning with the zero-trust direction in #98

Refs: #98
73bd099 to
644bcc6
Resources are live in rg-firefly; the script already drifted from reality (LAW customerId trim, MI propagation retries). If we ever rebuild from scratch, do it with Bicep/Terraform instead.
CodeQL flagged azure/login@v2 and docker/setup-buildx-action@v3 as mutable refs. Pin to the commit SHAs of v2.3.0 and v3.9.0 respectively.
Adds two MCP-exposed tools:

- ingest_sharepoint(drive_id, corpus_id, root_folder?): pulls all changed files from a SharePoint drive (delta-based) and ingests them into a corpus via the existing rag.ingest pipeline. Auth via the Container App's managed identity → Microsoft Graph token.
- query_corpus(corpus_id, question, top_k): hybrid retrieval (BM25 + dense) with citations.

Caveats:

- SharePointSource import is guarded; the tool raises NotImplementedError until feat/content-sources-sharepoint merges to main.
- VectorStore is an in-process InMemoryVectorStore, to be replaced when the blob-backed store lands.
- SqliteCorpus is persisted under /tmp/firefly/corpora/<corpus_id>.db (ephemeral on Container Apps replicas).

Dockerfile syncs the rag, openai-embeddings, azure, markitdown and sqlite-vec extras so the imports resolve at runtime.
Design that replaces PR #103's tools/builtins/sharepoint_rag.py with a thin composition over the existing CorpusAgent + ContentSource Protocol. Promotes CorpusAgent into the library, adds LocalFolderSource, and collapses the parallel RAG stack PR #103 introduced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Twelve TDD tasks covering: AnswerAgent + CorpusAgent moves into the library, LocalFolderSource, IngestSummary + ingest_source, retrieve() vs query() split, watch_source polling, CorpusNotFoundError, four MCP corpus_rag tools, span-prefix rename, operator deployment guide, and the PR #103 rebase checklist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a filesystem-based ContentSource implementation that yields files from a local folder recursively. Mirrors the cursor-based contract of remote sources (SharePoint, S3) so a single ingest pipeline can serve both local and remote corpora. Supports hidden-file filtering via FolderWatcher.is_hidden(). V1 is delta-less; future enhancements may add mtime-based incremental listing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
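A minimal sketch of what a delta-less local source along these lines could look like. The names `LocalFolderSketch`, `FileRef`, and the dot-prefix `is_hidden` rule are illustrative assumptions standing in for the real `LocalFolderSource` and `FolderWatcher.is_hidden()`, whose exact signatures are not shown in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass(frozen=True)
class FileRef:
    """Hypothetical handle for one file yielded by a source."""

    path: Path


def is_hidden(path: Path, root: Path) -> bool:
    # Assumption: "hidden" means any path component below root starts with a dot.
    return any(part.startswith(".") for part in path.relative_to(root).parts)


class LocalFolderSketch:
    """Delta-less source: every call lists every visible file under root."""

    def __init__(self, root: str | Path) -> None:
        self.root = Path(root)

    def list_changed(self, cursor: str | None = None) -> Iterator[FileRef]:
        # The cursor is accepted for parity with remote sources but ignored,
        # because this version does no incremental (mtime-based) listing.
        for path in sorted(self.root.rglob("*")):
            if path.is_file() and not is_hidden(path, self.root):
                yield FileRef(path)

    def fetch(self, ref: FileRef) -> bytes:
        return ref.path.read_bytes()
```

Accepting and ignoring the cursor keeps the call shape identical to a remote source, so the same ingest loop can drive both.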
Adds the canonical ingest API to CorpusAgent that drives the unified ContentSource loop: list_changed(cursor) → fetch → ingest_one → commit_delta. Introduces IngestSummary—a typed result wrapper with .results (list of IngestionResult), .cursor, and aggregate count properties (ingested, skipped, failed). This task is purely additive and does not modify the existing ingest_folder method. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
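The loop described above (list_changed → fetch → ingest_one → commit_delta) can be sketched as follows. This is an illustration under assumptions, not the library code: the `ContentSource` method signatures, the status strings, and the standalone `ingest_source` function are hypothetical, and the choice to commit the cursor even after per-file fetch failures follows the behaviour described in the follow-up commit.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable, Protocol


@dataclass
class IngestionResult:
    doc_id: str
    status: str  # assumed values: "success", "skipped", "failed"
    n_chunks: int = 0


@dataclass
class IngestSummary:
    results: list[IngestionResult] = field(default_factory=list)
    cursor: str | None = None

    def _count(self, status: str) -> int:
        return sum(1 for r in self.results if r.status == status)

    @property
    def ingested(self) -> int:
        return self._count("success")

    @property
    def skipped(self) -> int:
        return self._count("skipped")

    @property
    def failed(self) -> int:
        return self._count("failed")


class ContentSource(Protocol):
    # Hypothetical shape of the protocol; the real one may differ.
    def list_changed(self, cursor: str | None) -> Iterable[str]: ...
    def fetch(self, doc_id: str) -> bytes: ...
    def commit_delta(self) -> str | None: ...


def ingest_source(
    source: ContentSource,
    ingest_one: Callable[[str, bytes], IngestionResult],
    cursor: str | None = None,
) -> IngestSummary:
    summary = IngestSummary()
    for doc_id in source.list_changed(cursor):
        try:
            payload = source.fetch(doc_id)
        except Exception:
            # A failed fetch is recorded and the loop moves on to the next file.
            summary.results.append(IngestionResult(doc_id, "failed", n_chunks=0))
            continue
        summary.results.append(ingest_one(doc_id, payload))
    # The iterator drained, so the cursor is committed even if some fetches failed.
    summary.cursor = source.commit_delta()
    return summary
```

Deriving the counts from `.results` rather than storing them separately keeps the summary consistent by construction.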
- Rename test to describe asserted behaviour (cursor IS committed after per-file fetch failures, since the iterator drained). - Use public current_cursor() in test assertions instead of reaching into the stub's private attribute. - Add TODO comment flagging that per-file fetch failures are not recorded in the IngestLedger today, so they're invisible to operational replay; tracked for Task 5 / follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…erSource Change ingest_folder to return IngestSummary instead of list[IngestionResult], delegating to the unified ingest_source pipeline via LocalFolderSource. This is a breaking change for callers; update all call sites to use the new IngestSummary shape (.results, .ingested, .skipped, .failed properties). Per-file filtering via FolderWatcher.is_hidden stays the same; the cursor contract and delta-less logic are now handled by LocalFolderSource. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Split the query pipeline into two public methods:

- retrieve(question, top_k, rerank=True) runs expand→retrieve→rerank, returns list[ChunkHit] without LLM answer. Useful for MCP tools and callers that want to compose their own answer over the hits.
- query(question, top_k) wraps retrieve() + answerer, returns Answer. Maintains the full pipeline behavior and telemetry.

The refactor allows raw retrieval (with optional reranking) to be called independently, making the retrieval surface reusable for downstream callers.

Tested with new unit tests asserting independence of the two methods:

- retrieve() does not invoke the answer agent
- query() invokes the answer agent
- retrieve() respects the rerank flag

All existing tests pass unchanged, validating backward compatibility.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
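For readers skimming the thread, the split can be pictured with a small stand-in class. Everything here is a sketch under assumptions: `CorpusAgentSketch`, the injected `search`/`rerank`/`answer` callables, and the field names on `ChunkHit` and `Answer` are hypothetical, and the query-expansion step is omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ChunkHit:
    chunk_id: str
    score: float


@dataclass(frozen=True)
class Answer:
    text: str
    citations: list[str]


class CorpusAgentSketch:
    """Illustrative only: retrieval is usable on its own, query() builds on it."""

    def __init__(
        self,
        search: Callable[[str, int], list[ChunkHit]],
        rerank: Callable[[str, list[ChunkHit]], list[ChunkHit]],
        answer: Callable[[str, list[ChunkHit]], Answer],
    ) -> None:
        self._search = search
        self._rerank = rerank
        self._answer = answer

    def retrieve(self, question: str, top_k: int = 5, rerank: bool = True) -> list[ChunkHit]:
        # No answer agent is involved on this path.
        hits = self._search(question, top_k)
        if rerank:
            hits = self._rerank(question, hits)
        return hits[:top_k]

    def query(self, question: str, top_k: int = 5) -> Answer:
        # The full pipeline is retrieval followed by the answerer.
        return self._answer(question, self.retrieve(question, top_k))
```

With this shape, a test can count calls to the injected `answer` callable to confirm that `retrieve()` never reaches it.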
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parity with ingest_source's failed-fetch IngestionResult literal; both methods now spell n_chunks out so future readers don't wonder whether the omission was meaningful. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four MCP tools backed by the library-grade CorpusAgent:

- ingest_corpus_filesystem(corpus_id, root_path): uses LocalFolderSource
- ingest_corpus_sharepoint(corpus_id, drive_id, root_folder?): uses SharePointSource with the runtime's managed-identity Graph token
- corpus_retrieve(corpus_id, question, top_k): hybrid + rerank, no LLM answer; raises CorpusNotFoundError on unknown corpus_id
- corpus_query(corpus_id, question, top_k): full pipeline with citations

Each call constructs a fresh CorpusAgent rooted at CORPUS_ROOT/<corpus_id>. No process-global registry; on-disk SqliteCorpus + SqliteVec carry state across requests.

CORPUS_ROOT defaults to /tmp/firefly/corpora; operators should override for any non-toy deployment (Container Apps /tmp is ephemeral; see docs/deploy/corpus-persistence.md once Task 11 lands).

Replaces tools/builtins/sharepoint_rag.py and updates cli/mcp_http.py's side-effect import accordingly. The deleted module reimplemented the RAG stack with worse defaults (TextChunker vs. MarkdownChunker, InMemoryVectorStore vs. SqliteVecVectorStore, no expander/reranker/answerer, per-process _CORPORA registry that returned "corpus not found" on cold restart). corpus_rag is the canonical replacement.

Note: tests catch ToolError (with CorpusNotFoundError as __cause__) because BaseTool.execute wraps domain exceptions in ToolError, consistent with the framework's exception-handling contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
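The exception contract mentioned in the note can be illustrated with plain Python exception chaining. This is a sketch, not the framework code: `ToolError`, `CorpusNotFoundError`, `corpus_dir`, and `execute` below are local stand-ins with assumed behaviour, and the directory check is only one plausible way to decide that a corpus is unknown.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Callable


class CorpusNotFoundError(LookupError):
    """Stand-in: raised when CORPUS_ROOT/<corpus_id> does not exist on disk."""


class ToolError(RuntimeError):
    """Stand-in for the framework's tool-level error wrapper."""


def corpus_dir(corpus_root: Path, corpus_id: str) -> Path:
    # Assumption: a corpus "exists" if its directory is present under the root.
    path = corpus_root / corpus_id
    if not path.is_dir():
        raise CorpusNotFoundError(corpus_id)
    return path


def execute(tool: Callable[..., Any], **kwargs: Any) -> Any:
    # Domain exceptions are wrapped, with the original kept as __cause__.
    try:
        return tool(**kwargs)
    except Exception as exc:
        raise ToolError(f"{tool.__name__} failed: {exc!r}") from exc
```

A test against this shape would catch `ToolError` and then assert `isinstance(err.__cause__, CorpusNotFoundError)`, which mirrors what the note describes.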
Standardize telemetry span names across the RAG pipeline to use the firefly.rag.* prefix consistently (matching metric/instrument names in _telemetry.py). Also add a timed_span around ingest_source with terminal status attributes (success, skipped, failed counts).

Changes:

- agent.py: corpus_search.retrieve -> firefly.rag.retrieve
- agent.py: corpus_search.query -> firefly.rag.query
- agent.py: wrap ingest_source body in firefly.rag.ingest_source span
- answerer.py: corpus_search.answer -> firefly.rag.answer
- spec: update KQL and architectural references to use firefly.rag.* names

All 180 tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Apps Documents the Azure Files volume approach for persisting corpora across Container Apps replicas / restarts, the multi-replica single-writer caveat for SqliteCorpus, the env-var surface consumed by the corpus_rag MCP tools, and the managed-identity scopes needed for SharePoint ingestion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
from fastapi import FastAPI

from fireflyframework_agentic.exposure.mcp.server import create_mcp_app
from fireflyframework_agentic.tools.builtins import corpus_rag  # noqa: F401 - registers tools
The module declared `log = logging.getLogger(__name__)` but never logged. Code-quality bot flagged it on PR #103. The MCP layer wraps tool exceptions in ToolError so they're already surfaced to the caller; if we want internal step-level logging we'll add it deliberately, not as dead code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
python-dotenv passes $PWD as a literal string; os.path.expandvars() resolves it to the actual working directory at runtime. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
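A small sketch of the expansion described in this commit, assuming the variable in question is `CORPUS_ROOT` and that the fallback is `/tmp/firefly/corpora`; the helper name `resolve_corpus_root` is hypothetical. Note that `os.path.expandvars` reads from the process environment, so `$PWD` only resolves if `PWD` is actually set there, and unknown variables are left unchanged.

```python
import os


def resolve_corpus_root(default: str = "/tmp/firefly/corpora") -> str:
    # The raw value may still contain a literal "$PWD" if it came from a .env file.
    raw = os.environ.get("CORPUS_ROOT", default)
    # expandvars substitutes $NAME / ${NAME} from os.environ; unknown names stay as-is.
    return os.path.expandvars(raw)
```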
…OT default

Groups vars by concern, removes trailing whitespace, replaces $PWD with /tmp/firefly/corpora (python-dotenv does not expand shell variables), and adds a note pointing operators to the Azure Files mount for persistence.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
End-to-end auth flow: SharePoint → Claude Code

There are two separate phases: indexing (done once or on a schedule) and querying (every user session).

Phase 1 - Indexing: SharePoint → SQLite corpus on Azure

Runs inside the

Prerequisite:

Phase 2 - Querying: user in Claude Code → answer

The user's

The SharePoint data never leaves Azure. The user receives only the answer text and source references.

Who authenticates what
Proposed: Claude Code user auth via Container Apps EasyAuth (respects #98)

The architecture from #98 places auth at the experience layer (ingress), keeping the framework process completely unaware of credential material. The simplest implementation is Container Apps EasyAuth: it sits at the Container App ingress, and the Python process is untouched.

Implementation (no framework code changes needed)

1. Create an Entra App Registration

az ad app create --display-name "firefly-mcp" --sign-in-audience AzureADMyOrg
APP_ID=$(az ad app list --display-name "firefly-mcp" --query '[0].appId' -o tsv)
az ad sp create --id $APP_ID

2. Enable EasyAuth on the Container App

az containerapp auth microsoft update \
  -g rg-firefly -n firefly-mcp \
  --client-id $APP_ID \
  --tenant-id <tenant-id>
az containerapp auth update \
  -g rg-firefly -n firefly-mcp \
  --unauthenticated-client-action Return401

3. Claude Code users configure their token

# Acquire token (valid ~1 hour)
az account get-access-token --resource api://$APP_ID --query accessToken -o tsv

// ~/.claude/settings.json
{
  "mcpServers": {
    "firefly": {
      "url": "https://firefly-mcp.mangosmoke-5d24814d.spaincentral.azurecontainerapps.io/mcp",
      "headers": { "Authorization": "Bearer <token>" }
    }
  }
}

A small wrapper script can automate token refresh before each session:

TOKEN=$(az account get-access-token --resource api://$APP_ID --query accessToken -o tsv)
jq --arg t "$TOKEN" '.mcpServers.firefly.headers.Authorization = "Bearer \($t)"' \
  ~/.claude/settings.json > /tmp/s.tmp && mv /tmp/s.tmp ~/.claude/settings.json

Why this respects #98
…k directly Delete examples/corpus_search/agent.py, retrieval/answerer.py, and retrieval/sql.py — these were forwarding shims created during the CorpusAgent/AnswerAgent migration in PR #103. All consumers (example __init__, retrieval/__init__, cli.py, and test files) now import from fireflyframework_agentic.rag.* directly.
feat(deploy): Streamable-HTTP MCP entrypoint and Azure deployment
Summary
- `firefly-mcp-http` CLI: minimal FastAPI app exposing FastMCP at `/mcp` over Streamable HTTP plus `/healthz`. Reuses `create_mcp_app()`; bypasses `mount_http` (which doesn't wire FastMCP's lifespan into the parent FastAPI; `tools/list` returns 500 otherwise).
- `corpus_rag.py`: `ingest_corpus_filesystem`, `ingest_corpus_sharepoint`, `corpus_retrieve`, `corpus_query`, backed by library-grade `CorpusAgent` (MarkdownChunker + SqliteVec + hybrid retrieval + reranker + answerer).
- `Dockerfile` (uv sync `--extra rest --extra mcp`, non-root, `CMD ["firefly-mcp-http"]`).
- `.github/workflows/deploy-mcp.yml`: OIDC login → ACR build/push → `az containerapp update`.
- `docs/deploy/corpus-persistence.md`: operator guide for durable `CORPUS_ROOT` on Azure Container Apps.

Auth model
No framework-level API keys. Auth belongs at the experience layer (Container Apps EasyAuth, future dedicated auth service) per the zero-trust direction in #98 — the framework itself stays unaware of bearer tokens.
Running locally and connecting to Claude Code
1. Copy and fill the template:
2. Start the server:
3. Register with Claude Code (once):
4. Test in Claude Code:
Ingest a local folder first (replace `my-corpus` and path as needed):

Then query it:
`corpus_retrieve` returns raw ranked chunks (no LLM answer), useful for judging retrieval quality independently. `corpus_query` runs the full pipeline and returns a grounded answer with citations.

Azure resources (already provisioned in `rg-firefly`, `spaincentral`)

- `firefly-logs`
- `fireflysignature`
- `fireflysignature/firefly-artifacts`
- `kv-firefly-signature`
- `firefly-mcp-mi`
- `firefly-mcp-mi`: `token.actions.githubusercontent.com`, subject `repo:fireflyframework/fireflyframework-agentic:ref:refs/heads/main`, audience `api://AzureADTokenExchange`.
- `firefly-env`: `firefly-logs`.
- `firefly-mcp`: `https://firefly-mcp.mangosmoke-5d24814d.spaincentral.azurecontainerapps.io`.

GitHub repo secrets

`AZURE_CLIENT_ID` / `AZURE_TENANT_ID` are already configured for the workflow.

Verification

- `docker buildx build -t firefly-mcp:dev .`, `docker run -p 8000:8000` → `/healthz` 200, MCP `initialize` returns `{"name":"firefly","version":"3.2.4"}`, `tools/list` returns the four corpus RAG tools.
- `/healthz` and MCP handshake succeed over HTTPS at the URL above.
- `pytest tests/unit/cli/test_mcp_http.py`: 2 passing.

Refs: #98
Test plan
- `.env.template` → `.env`, fill credentials, start with `uv run dotenv -f .env run -- firefly-mcp-http`.
- `claude mcp add --transport http --scope user firefly-rag http://localhost:8080/mcp/`.
- `ingest_corpus_filesystem` and verify counts returned.
- `corpus_query` and confirm grounded answer with citations.
- `uv run pytest tests/unit/cli/`.
- `docker buildx build` and probe `/healthz` + MCP `initialize`.
- `firefly-mcp` before exposing the URL beyond internal use.
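The `/healthz` and MCP `initialize` probes in the test plan could be scripted roughly as below, using only the standard library. This is a sketch under assumptions: the helper names, the `clientInfo` values, and the `2025-03-26` protocol version string are choices made for illustration, the `/mcp/` path is taken from the `claude mcp add` command above, and the server may answer `initialize` either as plain JSON or as a server-sent event stream, so response parsing is left out.

```python
import json
import urllib.request


def healthz_ok(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/healthz answers with HTTP 200."""
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def build_initialize_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC initialize request for the MCP endpoint."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed; use the version your client targets
            "capabilities": {},
            "clientInfo": {"name": "probe", "version": "0.0.0"},  # illustrative values
        },
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/mcp/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients advertise both response encodings.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
```

Passing the built request to `urllib.request.urlopen` sends it; against a deployment with EasyAuth enabled, an `Authorization: Bearer <token>` header would also be needed.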