Skip to content

SREP-4866: Add RHOBS MCP server for AI agent integration#892

Merged
openshift-merge-bot[bot] merged 1 commit into
openshift:masterfrom
dustman9000:rhobs-mcp-server
May 15, 2026
Merged

SREP-4866: Add RHOBS MCP server for AI agent integration#892
openshift-merge-bot[bot] merged 1 commit into
openshift:masterfrom
dustman9000:rhobs-mcp-server

Conversation

@dustman9000
Copy link
Copy Markdown
Member

@dustman9000 dustman9000 commented May 15, 2026

Summary

  • Adds osdctl rhobs mcp subcommand that starts a stdio-based MCP (Model Context Protocol) server
  • Exposes 3 tools for AI agents: rhobs_metrics (PromQL instant/range), rhobs_logs (LogQL), rhobs_alerts (Alertmanager)
  • Uses the official modelcontextprotocol/go-sdk maintained by the MCP org and Google
  • Reuses existing RhobsFetcher for cell resolution, OIDC auth, and API queries with no code duplication
  • Caches fetcher instances via sync.Map with singleflight deduplication to avoid repeated OCM lookups

Motivation

SREs using AI agents for operational work (alert investigation, cluster debugging) can connect this MCP server to query RHOBS directly. The server provides structured JSON responses with cell metadata, eliminating the need to shell out to CLI commands or manage query syntax manually. Compatible with any MCP-capable client (Claude Code, Cursor, Windsurf, custom agents, etc.).

Configuration

Add to your MCP client configuration (e.g., ~/.claude/mcp_settings.json for Claude Code):

{
  "mcpServers": {
    "rhobs": {
      "command": "osdctl",
      "args": ["rhobs", "mcp"]
    }
  }
}

Prerequisites: OCM login, vault login (vault.devshift.net), ~/.config/osdctl with rhobs_<env>_vault_path entries.

Tools

Tool Description Key Parameters
rhobs_metrics PromQL instant or range query cluster_id, query, optional start/end/step for range
rhobs_logs LogQL log query cluster_id, namespace or query, contain_regex, since, limit
rhobs_alerts Firing alerts from Alertmanager cluster_id

All responses include cell URL, cluster metadata, and environment. All tools are annotated as read-only.

Test plan

  • Unit tests pass: go test ./cmd/rhobs/ -v (73 subtests covering helpers, parameter validation, tool registration via in-memory transport, end-to-end tool call roundtrips)
  • MCP handshake: server initializes, returns 3 tools with correct schemas and read-only annotations
  • Live test production: metrics, alerts, logs against hs-mc-qj7qbfg8g / us-east-1-2 cell
  • Live test stage: metrics, alerts, logs against hs-mc-kbj34s6bg / us-west-2-0 cell
  • Error cases: missing params, invalid cluster, partial range args, bad duration, non-positive limit all return clean tool errors
  • Vault preflight: expired token fails fast with clear message instead of hanging
  • Docs regenerated and verified via make verify-docs
  • gofmt clean, all CI checks green

Jira: https://redhat.atlassian.net/browse/SREP-4866

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 15, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds stdio MCP support for RHOBS: introduces MCP helper utilities, updates skip-version allowlist to include mcp, and wires the new mcp subcommand into the rhobs root command (PersistentPreRunE short-circuits for mcp).

Changes

MCP Server Integration for RHOBS

Layer / File(s) Summary
Skip-version allowlist
cmd/cmd.go
Adds "mcp" to getSkipVersionCommands() so version checks are suppressed when running the mcp command.
MCP helper utilities
cmd/rhobs/mcp_helpers.go (multiple ranges)
Adds a sync.Map fetcher cache and getCachedFetcher, mcpResultJSON, mcpError, and boolPtr helpers used by MCP tools and handlers.
rhobs root command wiring
cmd/rhobs/rootCmd.go (two ranges)
Short-circuits PersistentPreRunE when the command is mcp and registers newCmdMcp() on the rhobs root command.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.83% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ❓ Inconclusive Check specifies Ginkgo test patterns (Describe/It/BeforeEach), but mcp_test.go uses standard Go testing.T. Check framework mismatch makes it inapplicable. Check targets Ginkgo BDD tests; actual tests use standard Go testing.T with t.Run(). Standard Go tests meet quality criteria: focused behavior, defer cleanup, meaningful assertions.
✅ Passed checks (10 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed The custom check requires stable Ginkgo test names. The PR adds only standard Go tests (testing.T with t.Run()), not Ginkgo tests (It(), Describe(), etc.). No Ginkgo usage found in modified files.
Microshift Test Compatibility ✅ Passed No Ginkgo e2e tests added. PR adds only standard Go unit tests using testing package (*testing.T). Check applies only to Ginkgo e2e tests, so not applicable here.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No Ginkgo e2e tests added. Only standard Go unit tests (testing.T) for MCP tool handlers. Not e2e tests; no multi-node or HA cluster assumptions.
Topology-Aware Scheduling Compatibility ✅ Passed This PR adds a CLI tool for MCP server integration, not Kubernetes deployment manifests or controllers. No scheduling constraints or pod affinity are introduced.
Ote Binary Stdout Contract ✅ Passed No OTE stdout contract violations detected. All module-level code is pure. MCP code only executes on command invocation with log.SetOutput(io.Discard) set first.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests added. This PR adds unit tests to a CLI tool repository using standard Go testing package. Custom check is not applicable.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding an RHOBS MCP server for Claude Code integration, which is supported by the file changes across cmd/rhobs, docs, and go.mod.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from MateSaary and petrkotas May 15, 2026 00:59
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 15, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (3)
cmd/rhobs/mcp_test.go (1)

232-264: 🏗️ Heavy lift

These tests don’t validate the behaviors named in the test titles.

TestHandleLogs_InvalidSinceDuration and TestHandleLogs_LimitCapping both fail earlier on fetcher/cluster lookup and only assert a generic tool error. That means invalid since parsing and limit capping can regress without failing these tests. Please make these paths deterministic (e.g., stub fetcher or validate these inputs before fetcher resolution) and assert the specific expected outcome.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/rhobs/mcp_test.go` around lines 232 - 264, The tests
TestHandleLogs_InvalidSinceDuration and TestHandleLogs_LimitCapping are not
exercising the behaviors in their names because handleLogs currently fails
earlier during fetcher/cluster lookup; make the path deterministic by either (A)
injecting or stubbing a fetcher that returns a successful, controlled response
for the test cluster_id (e.g., create a mock fetcher and pass it into handleLogs
or the request context so the handler proceeds to parse the "since" and apply
limit capping), or (B) move/duplicate the input validation logic into handleLogs
before any fetcher/cluster resolution so parsing of "since" (use the same parser
called in handleLogs) and capping of "limit" are exercised; update
TestHandleLogs_InvalidSinceDuration to assert a parse error/result from the
since parsing code and TestHandleLogs_LimitCapping to assert the capped limit
value (referencing handleLogs, makeRequest, and the cluster/fetcher resolution
logic to locate where to inject the mock or relocate validation).
cmd/rhobs/mcpCmd.go (1)

35-37: ⚡ Quick win

Remove the no-op PersistentPreRunE block for code clarity.

Line 35–37 contains an empty PersistentPreRunE that adds no value. With EnableTraverseRunHooks = true explicitly set in main.go, the parent hook in rootCmd.go will execute correctly and does not need a child hook placeholder.

Proposed diff
-		PersistentPreRunE: func(cmd *cobra.Command, args []string) error {
-			return nil
-		},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/rhobs/mcpCmd.go` around lines 35 - 37, Remove the no-op PersistentPreRunE
function from the command definition (the anonymous func assigned to
PersistentPreRunE in cmd/rhobs/mcpCmd.go) since it does nothing and parent hooks
run via EnableTraverseRunHooks; simply delete the PersistentPreRunE field so the
command relies on the parent hook (rootCmd) behavior.
cmd/rhobs/mcp_helpers.go (1)

13-23: ⚡ Quick win

Avoid duplicate fetcher creation under concurrent cache misses.

Two concurrent requests for the same key can both pass Load and both execute CreateRhobsFetcher(...) before either Store, causing redundant OCM/Vault work.

Suggested direction
+// Option A: guard initialization with per-key singleflight
+// Option B: protect create/store with a mutex and double-check Load before Store

A per-key singleflight around CreateRhobsFetcher is the cleanest way to preserve cache intent during parallel MCP calls.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/rhobs/mcp_helpers.go` around lines 13 - 23, getCachedFetcher currently
races on concurrent cache misses: multiple goroutines can Load the same key,
then each call CreateRhobsFetcher and Store duplicate entries; wrap the
CreateRhobsFetcher call with a per-key singleflight to ensure only one creation
runs for a given key, using the key used with fetcherCache; on singleflight
result, cast/return the *RhobsFetcher and only Store into fetcherCache once (or
rely on the singleflight result) so redundant OCM/Vault work is avoided.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/rhobs/mcp_test.go`:
- Around line 268-272: TestRegisterMcpTools is vacuous; after calling
registerMcpTools(s) use the server's introspection API (e.g.,
s.RegisteredTools(), s.ListTools(), or the s.Tools slice/field) to assert tools
were actually registered: check the tool count is > 0 and that the expected tool
names from registerMcpTools are present. Update TestRegisterMcpTools to call
that introspection method on the server.NewMCPServer instance and use
t.Fatalf/t.Errorf or require/assert to fail the test when the count or expected
names do not match.

In `@cmd/rhobs/mcp_tools.go`:
- Around line 137-146: The rawQuery branch for building lokiExpr currently
bypasses cluster scoping; ensure lokiExpr always includes the
openshift_cluster_id filter by appending ` | openshift_cluster_id = "<id>"` when
rawQuery is non-empty (unless the provided rawQuery already contains
`openshift_cluster_id`), so update the logic around rawQuery/lokiExpr to append
fetcher.clusterExternalId and/or check for an existing openshift_cluster_id
token; reference variables to change: rawQuery, lokiExpr,
fetcher.clusterExternalId, containRegex, namespace.
- Around line 97-99: The fallback currently drops the metadata envelope when
json.Unmarshal(rawResult, &parsed) fails; change the fallback to preserve the
envelope fields (cell, cluster_id, environment) and return a structured result
that includes the raw payload as a field. Specifically, where
json.Unmarshal(rawResult, &parsed) is used (variables rawResult and parsed and
the call to mcp.NewToolResultText), build and return an mcp.ToolResult (or the
existing envelope type used elsewhere) populated with
parsed.Cell/ClusterID/Environment if available (or defaulted from earlier
context) and put string(rawResult) into a RawPayload or Payload field instead of
returning plain text; apply the same change at both occurrences (around the
json.Unmarshal at lines referenced and the other occurrence near lines 190-191)
so metadata is preserved on parse failure.
- Around line 91-92: The handler currently treats a single supplied time bound
as an instant query by falling through when only one of start/end is set; change
the validation where start and end are checked so that you explicitly return a
validation error (e.g., HTTP 400) if exactly one of start or end is provided. In
practice, update the branch around the start/end check that calls
fetcher.QueryRangeMetrics to first validate (start != "" && end != "") or (start
== "" && end == "") and if only one is present return an error response
describing that both start and end are required for range queries; keep the
existing fetcher.QueryRangeMetrics call unchanged when both are present.

In `@docs/README.md`:
- Around line 4319-4320: Update the MCP server synopsis text that currently
reads "Start a stdio-based MCP (Model Context Protocol) server that exposes
RHOBS metrics and logs querying as tools for Claude Code" to also mention the
alerts tool: include "and alerts (rhobs_alerts)" or similar so the description
lists metrics, logs and rhobs_alerts; edit the sentence containing "Start a
stdio-based MCP (Model Context Protocol) server that exposes" to explicitly
reference rhobs_alerts as one of the available tools.

---

Nitpick comments:
In `@cmd/rhobs/mcp_helpers.go`:
- Around line 13-23: getCachedFetcher currently races on concurrent cache
misses: multiple goroutines can Load the same key, then each call
CreateRhobsFetcher and Store duplicate entries; wrap the CreateRhobsFetcher call
with a per-key singleflight to ensure only one creation runs for a given key,
using the key used with fetcherCache; on singleflight result, cast/return the
*RhobsFetcher and only Store into fetcherCache once (or rely on the singleflight
result) so redundant OCM/Vault work is avoided.

In `@cmd/rhobs/mcp_test.go`:
- Around line 232-264: The tests TestHandleLogs_InvalidSinceDuration and
TestHandleLogs_LimitCapping are not exercising the behaviors in their names
because handleLogs currently fails earlier during fetcher/cluster lookup; make
the path deterministic by either (A) injecting or stubbing a fetcher that
returns a successful, controlled response for the test cluster_id (e.g., create
a mock fetcher and pass it into handleLogs or the request context so the handler
proceeds to parse the "since" and apply limit capping), or (B) move/duplicate
the input validation logic into handleLogs before any fetcher/cluster resolution
so parsing of "since" (use the same parser called in handleLogs) and capping of
"limit" are exercised; update TestHandleLogs_InvalidSinceDuration to assert a
parse error/result from the since parsing code and TestHandleLogs_LimitCapping
to assert the capped limit value (referencing handleLogs, makeRequest, and the
cluster/fetcher resolution logic to locate where to inject the mock or relocate
validation).

In `@cmd/rhobs/mcpCmd.go`:
- Around line 35-37: Remove the no-op PersistentPreRunE function from the
command definition (the anonymous func assigned to PersistentPreRunE in
cmd/rhobs/mcpCmd.go) since it does nothing and parent hooks run via
EnableTraverseRunHooks; simply delete the PersistentPreRunE field so the command
relies on the parent hook (rootCmd) behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2b5aafe3-45bd-4b3a-aabb-99cf3607fc15

📥 Commits

Reviewing files that changed from the base of the PR and between 5b99f69 and b37488f.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (11)
  • cmd/cmd.go
  • cmd/rhobs/mcpCmd.go
  • cmd/rhobs/mcp_helpers.go
  • cmd/rhobs/mcp_query.go
  • cmd/rhobs/mcp_test.go
  • cmd/rhobs/mcp_tools.go
  • cmd/rhobs/rootCmd.go
  • docs/README.md
  • docs/osdctl_rhobs.md
  • docs/osdctl_rhobs_mcp.md
  • go.mod

Comment thread cmd/rhobs/mcp_test.go
Comment thread cmd/rhobs/mcp_tools.go
Comment thread cmd/rhobs/mcp_tools.go
Comment thread cmd/rhobs/mcp_tools.go
Comment thread docs/README.md Outdated
@dustman9000 dustman9000 changed the title Add RHOBS MCP server for Claude Code integration SREP-4866: Add RHOBS MCP server for Claude Code integration May 15, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 15, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented May 15, 2026

@dustman9000: This pull request references SREP-4866 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

  • Adds osdctl rhobs mcp subcommand that starts a stdio-based MCP (Model Context Protocol) server
  • Exposes 3 tools for Claude Code: rhobs_metrics (PromQL instant/range), rhobs_logs (LogQL), rhobs_alerts (Alertmanager)
  • Reuses existing RhobsFetcher for cell resolution, OIDC auth, and API queries with no code duplication
  • Caches fetcher instances via sync.Map to avoid repeated OCM lookups across queries

Motivation

SREs using Claude Code for operational work (alert investigation, cluster debugging) currently shell out to osdctl rhobs via Bash tool calls. An MCP server provides structured JSON responses, eliminates CLI syntax from context, and saves tokens on every conversation.

Configuration

Add to ~/.claude/mcp_settings.json:

{
 "mcpServers": {
   "rhobs": {
     "command": "osdctl",
     "args": ["rhobs", "mcp"]
   }
 }
}

Prerequisites: OCM login, vault login (vault.devshift.net), ~/.config/osdctl with rhobs_<env>_vault_path entries.

Tools

Tool Description Key Parameters
rhobs_metrics PromQL instant or range query cluster_id, query, optional start/end/step for range
rhobs_logs LogQL log query cluster_id, namespace or query, contain_regex, since, limit
rhobs_alerts Firing alerts from Alertmanager cluster_id

All responses include cell URL, cluster metadata, and environment. All tools are annotated as read-only.

Test plan

  • Unit tests pass: go test ./cmd/rhobs/ -v (22 tests covering helpers, parameter validation, tool registration)
  • MCP handshake works: printf '...\n' | osdctl rhobs mcp returns valid JSON-RPC initialize response
  • tools/list returns 3 tools with correct schemas and read-only annotations
  • Live test: rhobs_metrics with instant query returns filtered results
  • Live test: rhobs_metrics with range query (start/end) returns time series
  • Live test: rhobs_logs with namespace returns log entries
  • Live test: rhobs_alerts returns firing alerts from cell
  • Error cases: missing params, invalid cluster, bad duration all return clean tool errors
  • Docs regenerated via make generate-docs

Summary by CodeRabbit

Release Notes

  • New Features

  • Added osdctl rhobs mcp command to start a Model Context Protocol server that exposes RHOBS metrics and logs as Claude Code tools for integration.

  • Documentation

  • Added documentation for the new MCP server command, including configuration instructions and prerequisites.

  • Tests

  • Added comprehensive test coverage for MCP tool handlers and utilities.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/rhobs/mcp_tools.go`:
- Around line 167-169: Validate and reject non-positive limit and since values
before calling Loki: check if limit <= 0 or since <= 0 at the start of the
handler that reads the limit and since parameters (the same check also needs to
be added in the other occurrence around the block at lines 187-194), and return
a 400 Bad Request with a clear message instead of proceeding; keep the existing
upper cap (if limit > 10000 then limit = 10000) but perform that after
validating limit is positive. Ensure you update both spots that use the
variables limit and since to prevent submitting invalid queries to Loki.
- Around line 102-103: The handlers are discarding the incoming context (e.g.,
handleMetrics currently takes _ context.Context) and calling backend fetches
with context.Background(); change the function signatures to accept a named ctx
context.Context (replace the underscore) and pass that ctx into all backend
calls instead of context.Background(); apply the same replacement in the other
handler functions flagged in the review so cancellation/deadline/values from the
incoming MCP request are propagated.

In `@cmd/rhobs/mcpCmd.go`:
- Line 49: The MCP server run is using context.Background(), which ignores Cobra
command cancellation; change the call to use the Cobra command context (e.g.
replace server.Run(context.Background(), &mcp.StdioTransport{}) with
server.Run(cmd.Context(), &mcp.StdioTransport{})), ensuring the surrounding
function has access to the cobra *Command (cmd) so cancellation/interrupts
propagate into server.Run.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 25d03549-56d8-483e-b322-0a1a0bdde267

📥 Commits

Reviewing files that changed from the base of the PR and between b37488f and e265bf7.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (11)
  • cmd/cmd.go
  • cmd/rhobs/mcpCmd.go
  • cmd/rhobs/mcp_helpers.go
  • cmd/rhobs/mcp_query.go
  • cmd/rhobs/mcp_test.go
  • cmd/rhobs/mcp_tools.go
  • cmd/rhobs/rootCmd.go
  • docs/README.md
  • docs/osdctl_rhobs.md
  • docs/osdctl_rhobs_mcp.md
  • go.mod
✅ Files skipped from review due to trivial changes (1)
  • docs/osdctl_rhobs.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • cmd/rhobs/rootCmd.go
  • cmd/cmd.go
  • cmd/rhobs/mcp_query.go

Comment thread cmd/rhobs/mcp_tools.go Outdated
Comment thread cmd/rhobs/mcp_tools.go
Comment thread cmd/rhobs/mcpCmd.go Outdated
@dustman9000 dustman9000 force-pushed the rhobs-mcp-server branch 2 times, most recently from d6f6de3 to 7eca0e8 Compare May 15, 2026 02:29
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/rhobs/mcpCmd.go`:
- Around line 13-20: The checkVaultToken function currently runs exec.Command
without a timeout and returns a single generic error; change its signature to
accept a context.Context (rename to checkVaultToken(ctx context.Context)), use
exec.LookPath("vault") to detect missing CLI and return a distinct error, run
the command with exec.CommandContext using a short timeout derived from the
passed ctx (e.g. context.WithTimeout) and capture stdout/stderr into a buffer,
and then differentiate errors: if ctx deadline exceeded return a
timeout-specific error, if the command exited with a non-zero exit code
(exec.ExitError) return an error indicating token lookup failed (suggesting
login), and for other errors return the underlying error (e.g.
network/permission) with the captured stderr to aid debugging; update the caller
that invokes checkVaultToken to pass its context so deadlines propagate.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0e151ff8-b331-4383-aafc-a4610b0319aa

📥 Commits

Reviewing files that changed from the base of the PR and between e265bf7 and 7eca0e8.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (11)
  • cmd/cmd.go
  • cmd/rhobs/mcpCmd.go
  • cmd/rhobs/mcp_helpers.go
  • cmd/rhobs/mcp_query.go
  • cmd/rhobs/mcp_test.go
  • cmd/rhobs/mcp_tools.go
  • cmd/rhobs/rootCmd.go
  • docs/README.md
  • docs/osdctl_rhobs.md
  • docs/osdctl_rhobs_mcp.md
  • go.mod
✅ Files skipped from review due to trivial changes (1)
  • docs/osdctl_rhobs.md
🚧 Files skipped from review as they are similar to previous changes (7)
  • cmd/rhobs/rootCmd.go
  • cmd/cmd.go
  • cmd/rhobs/mcp_helpers.go
  • go.mod
  • cmd/rhobs/mcp_query.go
  • cmd/rhobs/mcp_tools.go
  • cmd/rhobs/mcp_test.go

Comment thread cmd/rhobs/mcpCmd.go Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/rhobs/mcp_helpers.go`:
- Around line 13-23: getCachedFetcher currently has a check-then-act race:
multiple goroutines can miss fetcherCache.Load and each call CreateRhobsFetcher
for the same key; fix by deduplicating initialization (e.g., use a
singleflight.Group or sync.Map.LoadOrStore) so only one goroutine runs
CreateRhobsFetcher for a given key and others wait and receive the stored
result; implement this inside getCachedFetcher using the existing fetcherCache
and CreateRhobsFetcher (or add a package-level singleflight.Group) to ensure a
single creation and then store the created *RhobsFetcher in fetcherCache before
returning.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: af2041dd-45b1-4baf-b17d-0d95fda079f7

📥 Commits

Reviewing files that changed from the base of the PR and between 7eca0e8 and 454fcd1.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (11)
  • cmd/cmd.go
  • cmd/rhobs/mcpCmd.go
  • cmd/rhobs/mcp_helpers.go
  • cmd/rhobs/mcp_query.go
  • cmd/rhobs/mcp_test.go
  • cmd/rhobs/mcp_tools.go
  • cmd/rhobs/rootCmd.go
  • docs/README.md
  • docs/osdctl_rhobs.md
  • docs/osdctl_rhobs_mcp.md
  • go.mod
✅ Files skipped from review due to trivial changes (1)
  • docs/osdctl_rhobs.md
🚧 Files skipped from review as they are similar to previous changes (5)
  • go.mod
  • cmd/rhobs/mcpCmd.go
  • cmd/rhobs/mcp_query.go
  • cmd/rhobs/mcp_test.go
  • cmd/rhobs/mcp_tools.go

Comment thread cmd/rhobs/mcp_helpers.go Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/rhobs/mcp_query.go`:
- Around line 156-179: The loop currently drops entries whose getTimeStamp()
equals edgeTimeStamp and adjusts endTimeStamp in a way that loses tied-timestamp
rows; change the pagination to preserve ties or fail explicitly: when computing
edgeTimeStamp (from flattenedResults[len(flattenedResults)-1].getTimeStamp())
detect how many entries share that timestamp and do not skip all ts ==
edgeTimeStamp in the for loop (replace the ts != edgeTimeStamp check with logic
that either includes all entries with that timestamp into entries until
logsCount is satisfied, or if including them would overflow the page, set
endTimeStamp = edgeTimeStamp (not edgeTimeStamp-1) and return an explicit
error/flag indicating an ambiguous boundary so the caller can retry with a
different strategy); update the code paths around flattenedResults,
edgeTimeStamp, endTimeStamp, and the entries append (mcpLogEntry, getTimeStamp,
getTime, getMessage, result.Stream) accordingly so tied timestamps are never
silently dropped.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c5be0d02-cccb-43cf-94d5-405c624496f7

📥 Commits

Reviewing files that changed from the base of the PR and between 454fcd1 and 15caf5c.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (11)
  • cmd/cmd.go
  • cmd/rhobs/mcpCmd.go
  • cmd/rhobs/mcp_helpers.go
  • cmd/rhobs/mcp_query.go
  • cmd/rhobs/mcp_test.go
  • cmd/rhobs/mcp_tools.go
  • cmd/rhobs/rootCmd.go
  • docs/README.md
  • docs/osdctl_rhobs.md
  • docs/osdctl_rhobs_mcp.md
  • go.mod
✅ Files skipped from review due to trivial changes (2)
  • cmd/cmd.go
  • docs/osdctl_rhobs.md
🚧 Files skipped from review as they are similar to previous changes (6)
  • cmd/rhobs/rootCmd.go
  • go.mod
  • cmd/rhobs/mcpCmd.go
  • cmd/rhobs/mcp_helpers.go
  • cmd/rhobs/mcp_tools.go
  • cmd/rhobs/mcp_test.go

Comment thread cmd/rhobs/mcp_query.go
@dustman9000
Copy link
Copy Markdown
Member Author

/retest

@dustman9000
Copy link
Copy Markdown
Member Author

/retest-required

@dustman9000
Copy link
Copy Markdown
Member Author

All CodeRabbit review comments have been addressed in the current squashed commit. Summarizing for reviewers since force pushes invalidated the old line references:

# Comment Resolution
1 Vacuous TestRegisterMcpTools Uses in-memory transport roundtrip with ListTools to assert 3 tools by name + annotations
2 Partial range args silently fall back Explicit validation: start and end must be provided together
3 JSON parse fallback drops metadata Envelope preserved with result_raw/alerts_raw field
4 Raw query bypasses cluster scoping Intentional, matches existing osdctl rhobs logs --query CLI behavior
5 Synopsis missing alerts Updated to "metrics, logs, and alerts"
6 Context propagation All handlers use incoming ctx, not context.Background()
7 Non-positive limit/since Validation added before fetcher call
8 cmd.Context() for server run Done, removed unused context import
9 Vault preflight timeout 10s timeout via exec.CommandContext, distinct errors for missing CLI/timeout/expired
10 Fetcher race condition singleflight.Group deduplication
11 Log pagination tied timestamps Mirrors existing PrintLogs in requests.go, out of scope for this PR

@dustman9000 dustman9000 changed the title SREP-4866: Add RHOBS MCP server for Claude Code integration SREP-4866: Add RHOBS MCP server for AI agent integration May 15, 2026
Copy link
Copy Markdown
Contributor

@geowa4 geowa4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of questions/thoughts, nothing blocking.

Comment thread cmd/rhobs/mcp_tools.go Outdated
Comment thread cmd/rhobs/mcpCmd.go Outdated
@dustman9000 dustman9000 force-pushed the rhobs-mcp-server branch 4 times, most recently from 8c46aaf to 7a0a4e2 Compare May 15, 2026 14:32
@geowa4
Copy link
Copy Markdown
Contributor

geowa4 commented May 15, 2026

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 15, 2026
Add `osdctl rhobs mcp` with two subcommands:
- `server`: starts a stdio-based MCP server exposing RHOBS tools
- `config`: prints MCP client configuration JSON for easy setup

Usage: claude --mcp-config "$(osdctl rhobs mcp config)"

Uses the official modelcontextprotocol/go-sdk maintained by the MCP
org and Google. Compatible with any MCP client (Claude Code, Cursor,
Windsurf, custom agents).

Tools (all read-only, with OutputSchema):
- rhobs_metrics: PromQL instant/range queries against Thanos
- rhobs_logs: LogQL queries against Loki
- rhobs_alerts: Firing alerts from Alertmanager

Covers ROSA HCP infrastructure: hosted clusters, Management Clusters
(MC), and Service Clusters (SC). Accepts any cluster ID; the correct
RHOBS cell is resolved automatically.
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label May 15, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 15, 2026

@dustman9000: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@geowa4
Copy link
Copy Markdown
Contributor

geowa4 commented May 15, 2026

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 15, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 15, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dustman9000, geowa4

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot Bot merged commit 629b5e5 into openshift:master May 15, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants