E2E tests: optimizer tiers and regression #4889

@yrobla

Description

Add Go/Ginkgo E2E tests in test/e2e/vmcp_optimizer_test.go covering all optimizer tiers and the standalone regression path:

  • FTS5-only mode (--optimizer) exposes only find_tool/call_tool with BM25 keyword search
  • managed TEI mode (--optimizer-embedding) auto-starts a TEI container and enables semantic search
  • fail-fast behaviour when TEI fails to start
  • idempotent TEI container reuse across two successive invocations with the same model
  • a regression test confirming that the standalone vmcp serve binary works identically after the Phase 1 extraction refactor

These tests close out the optimizer implementation chain and provide the definitive end-to-end validation gate for RFC THV-0059 Phase 4.

Context

#4887 wired --optimizer, --optimizer-embedding, --embedding-model, and --embedding-image flags into cmd/thv/app/vmcp.go and integrated EmbeddingServiceManager from #4884 into the serve lifecycle in pkg/vmcp/cli/serve.go. #4888 established the Go/Ginkgo E2E test file test/e2e/vmcp_cli_test.go with the StartLongRunningTHVCommand, WaitForMCPServerReady, and group/workload setup patterns that this item extends. The TEI container is named thv-embedding-<model-short-hash> and is reused across invocations with the same model. When --optimizer-embedding is given explicitly, a failure to start the container is a hard error, with no silent FTS5 fallback.

The regression test for standalone vmcp serve is critical because the Phase 1 extraction in #4879 moved ~350 lines of runServe business logic out of cmd/vmcp/app/commands.go into pkg/vmcp/cli/serve.go, making the standalone binary a thin wrapper. An E2E test guards against any accidental behaviour change for Kubernetes/operator deployments.

Dependencies: Depends on #4887 (optimizer flags wired in), #4888 (basic E2E infrastructure established in test/e2e/vmcp_cli_test.go)
Blocks: (none — final item in the optimizer chain)

Acceptance Criteria

  • A new file test/e2e/vmcp_optimizer_test.go exists with SPDX header, package e2e_test, and a Ginkgo Describe("vMCP CLI optimizer", ...) block with Label("vmcp", "optimizer", "e2e")
  • FTS5 optimizer test: thv vmcp serve --group <name> --optimizer starts successfully, the MCP client connects, ListTools returns exactly the two tools find_tool and call_tool (and no direct backend tool names), and calling find_tool with a keyword query returns a non-error result
  • TEI managed optimizer test (requires Docker and the TEI CPU image): thv vmcp serve --group <name> --optimizer-embedding starts a thv-embedding-<hash> container, the MCP client connects, ListTools returns find_tool and call_tool, and calling find_tool with a semantic query returns a non-error result; the TEI container is stopped when thv vmcp serve exits
  • Fail-fast test: invoke thv vmcp serve --group <name> --optimizer-embedding --embedding-image ghcr.io/invalid/nonexistent:latest (or equivalent bad image); assert the process exits non-zero within a reasonable timeout and stderr/stdout contains a clear error referencing the TEI failure — no silent fallback to FTS5
  • Idempotent TEI reuse test: start thv vmcp serve --optimizer-embedding (instance A), note the container name (thv-embedding-<hash>), stop instance A, start a second thv vmcp serve --optimizer-embedding with the same model (instance B), assert the container count for prefix thv-embedding- does not increase (i.e., instance B reuses the same container), stop instance B
  • Standalone vmcp serve regression test: start the vmcp binary (not thv) via exec.Command with a valid YAML config and a populated backend group, assert it starts successfully, connect an MCP client, assert ListTools returns at least one backend tool, and stop the process cleanly; the test must pass identically before and after the Phase 1 refactor
  • All background thv vmcp serve and vmcp processes are stopped via DeferCleanup / AfterEach even on test failure; no process leaks
  • Group workloads and groups created in BeforeEach are removed in AfterEach via existing StopAndRemoveMCPServer and RemoveGroup helpers
  • The TEI E2E tests are tagged with an additional Label("requires-docker") and are skipped when the SKIP_DOCKER_TESTS environment variable is set to true (or when Docker is unavailable), so CI without Docker does not fail
  • Random ports for thv vmcp serve --port <n> and vmcp serve --port <n> are allocated via net.Listen("tcp", "127.0.0.1:0") and closed before passing the port number to the command
  • WaitForMCPServerReady with a 90-second timeout (TEI model download may take 30–60 s on first pull) is used for the TEI managed optimizer test
  • All existing tests pass (no regressions)
  • Code reviewed and approved

Technical Approach

Recommended Implementation

Create test/e2e/vmcp_optimizer_test.go. The file contains a single top-level Describe("vMCP CLI optimizer", ...) block with four Context blocks: FTS5, TEI managed, error cases, and standalone regression.

Shared setup in BeforeEach: create a unique group, run a backend workload (thv run fetch --group <name>), wait for the workload to be ready via WaitForMCPServer, and track it for cleanup. Allocate a free port with allocateFreePort() (the local helper from vmcp_cli_test.go or a duplicate in this file).
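
The port allocation step can be sketched as follows. This is a minimal stand-in for allocateFreePort(), assuming the helper follows the net.Listen("tcp", "127.0.0.1:0") approach named in the acceptance criteria; the actual helper in vmcp_cli_test.go may differ in signature and error handling.

```go
package main

import (
	"fmt"
	"net"
)

// allocateFreePort asks the kernel for an unused TCP port on loopback,
// closes the listener, and returns the port number.
// Hypothetical stand-in for the helper in vmcp_cli_test.go.
func allocateFreePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, fmt.Errorf("listen on ephemeral port: %w", err)
	}
	// Close before returning so the serve process can bind the port.
	defer l.Close()

	addr, ok := l.Addr().(*net.TCPAddr)
	if !ok {
		return 0, fmt.Errorf("unexpected address type %T", l.Addr())
	}
	return addr.Port, nil
}

func main() {
	port, err := allocateFreePort()
	if err != nil {
		panic(err)
	}
	fmt.Printf("--port %d\n", port)
}
```

There is a short window between closing the listener and the serve process binding the port in which another process could claim it. For E2E tests this is usually acceptable, but it is worth keeping in mind when diagnosing rare bind failures.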

FTS5 optimizer context (Context("FTS5 optimizer (--optimizer)", ...)):

thv vmcp serve --group <name> --optimizer --port <port>

Start as a background process with StartLongRunningTHVCommand. Poll readiness with WaitForMCPServerReady(config, "http://127.0.0.1:<port>/sse", "sse", 60*time.Second). Connect NewMCPClientForSSE, initialize, call ListTools. Assert:

  • len(tools.Tools) == 2
  • tool names are {"find_tool", "call_tool"} (no backend tool names exposed directly)
  • call find_tool with {"query": "fetch"} and assert the result has no error
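
The "exactly two tools" assertion can be illustrated with a small order-insensitive comparison. This is a sketch only: sameToolSet is a hypothetical helper, and the slice of names is assumed to have been extracted from the ListTools result beforehand.

```go
package main

import (
	"fmt"
	"sort"
)

// sameToolSet reports whether got contains exactly the names in want,
// ignoring order. Hypothetical helper for illustration.
func sameToolSet(got, want []string) bool {
	if len(got) != len(want) {
		return false
	}
	// Sort copies so the callers' slices are left untouched.
	g := append([]string(nil), got...)
	w := append([]string(nil), want...)
	sort.Strings(g)
	sort.Strings(w)
	for i := range g {
		if g[i] != w[i] {
			return false
		}
	}
	return true
}

func main() {
	optimizerTools := []string{"find_tool", "call_tool"}
	// Same names in a different order: accepted.
	fmt.Println(sameToolSet([]string{"call_tool", "find_tool"}, optimizerTools))
	// A backend tool leaking through: rejected.
	fmt.Println(sameToolSet([]string{"find_tool", "call_tool", "fetch"}, optimizerTools))
}
```

Inside a Ginkgo spec the same check is normally written with Gomega's ConsistOf matcher, for example Expect(names).To(ConsistOf("find_tool", "call_tool")), which also fails when an extra backend tool name is present.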

TEI managed optimizer context (Context("TEI managed optimizer (--optimizer-embedding)", Label("requires-docker"), ...)):

thv vmcp serve --group <name> --optimizer-embedding --port <port>

Use a 90-second readiness timeout for WaitForMCPServerReady to accommodate the model download. After the server is ready:

  • Assert ListTools returns find_tool and call_tool
  • Call find_tool with {"query": "a semantic query"} and assert no error
  • Check that docker ps --filter name=thv-embedding- shows exactly one matching container (Docker CLI invoked via exec.Command("docker", "ps", "--filter", "name=thv-embedding-", "--format", "{{.Names}}"))
  • Kill the serve process and assert the thv-embedding-* container is stopped/removed (poll with docker ps for up to 15 seconds)

Fail-fast context (Context("fail-fast when TEI fails to start", ...)):

thv vmcp serve --group <name> --optimizer-embedding \
    --embedding-image ghcr.io/invalid/nonexistent:latest --port <port>

Run this synchronously with RunWithTimeout(120 * time.Second). Assert the command returns a non-nil error. Assert stderr contains a substring indicating TEI failure (e.g., "TEI", "embedding", or "failed to start").
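
The stderr assertion can be sketched as a check for any of the candidate substrings. The helper name and the sample error line are hypothetical; the real wording of the TEI failure message is not specified here, which is why several markers are accepted.

```go
package main

import (
	"fmt"
	"strings"
)

// containsAny reports whether output contains at least one of the markers.
// Matching is case-sensitive.
func containsAny(output string, markers ...string) bool {
	for _, m := range markers {
		if strings.Contains(output, m) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical stderr line; the actual message may differ.
	stderr := "Error: failed to start TEI container"
	fmt.Println(containsAny(stderr, "TEI", "embedding", "failed to start"))
}
```

With Gomega the equivalent is Expect(stderr).To(Or(ContainSubstring("TEI"), ContainSubstring("embedding"), ContainSubstring("failed to start"))). Once the error message is settled, narrowing the assertion to a single stable substring would make the test stricter.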

Idempotent TEI reuse context (Context("idempotent TEI container reuse", Label("requires-docker"), ...)):

  1. Start instance A with --optimizer-embedding, wait for readiness.
  2. Record containersBefore = count of docker ps output lines matching thv-embedding-.
  3. Stop instance A (SIGINT + wait).
  4. Start instance B with the same --optimizer-embedding and default model.
  5. Wait for readiness.
  6. Assert docker ps output count matching thv-embedding- equals containersBefore (no new container created).
  7. Stop instance B; clean up.

Standalone vmcp regression context (Context("standalone vmcp serve regression", ...)):

  1. Locate the vmcp binary via os.Getenv("VMCP_BINARY"), falling back to searching PATH.
  2. Generate a minimal YAML config file in os.MkdirTemp referencing the test group's workloads via static backend URLs obtained from GetMCPServerURL.
  3. Start vmcp serve --config <tmpfile> --port <port> with exec.Command (not THVCommand), capturing stdout/stderr to GinkgoWriter.
  4. Poll readiness with WaitForMCPServerReady.
  5. Connect NewMCPClientForSSE, assert ListTools returns at least one tool.
  6. SIGINT the process and wait.

Patterns & Frameworks

  • Go/Ginkgo v2: Describe / Context / It / BeforeEach / AfterEach / DeferCleanup / Eventually / By — consistent with group_test.go and api_workload_lifecycle_test.go
  • DeferCleanup before starting each background process: register the SIGINT + Wait cleanup before the process starts, so cleanup always runs even if the It body panics mid-way
  • StartLongRunningTHVCommand from helpers.go: use for all background thv vmcp serve invocations; it pipes stdout/stderr to GinkgoWriter for CI debugging
  • WaitForMCPServerReady from mcp_client_helpers.go: pass "sse" as the mode and a 90-second timeout for TEI-backed tests
  • MCPClientHelper.ListTools + MCPClientHelper.ExpectToolExists from mcp_client_helpers.go: assert optimizer tool exposure
  • SPDX header: // SPDX-FileCopyrightText: Copyright 2025 Stacklok, Inc. / // SPDX-License-Identifier: Apache-2.0
  • No gomock in E2E: real subprocesses and Docker commands only
  • Label("requires-docker") on TEI tests + Skip(...) guard on SKIP_DOCKER_TESTS=true

Code Pointers

  • test/e2e/helpers.go: StartLongRunningTHVCommand, WaitForMCPServer, StopAndRemoveMCPServer, RemoveGroup, CreateAndTrackGroup, GenerateUniqueServerName, CheckTHVBinaryAvailable (all reusable from the new test file)
  • test/e2e/mcp_client_helpers.go: NewMCPClientForSSE, MCPClientHelper.Initialize, MCPClientHelper.ListTools, MCPClientHelper.ExpectToolExists, MCPClientHelper.CallTool, WaitForMCPServerReady (MCP connectivity helpers)
  • test/e2e/vmcp_cli_test.go (E2E tests: quick mode and config-file mode #4888) — allocateFreePort() helper and the Describe("vMCP CLI", ...) block structure to match; AfterEach cleanup pattern for background processes
  • test/e2e/e2e_suite_test.go — Ginkgo suite infrastructure; the new test file joins the same suite automatically
  • test/e2e/inspector_test.go — Pattern for starting a long-running subcommand, SIGINT cleanup in AfterEach
  • test/e2e/group_test.go — Group + workload setup/teardown with BeforeEach/AfterEach
  • test/e2e/api_workload_lifecycle_test.go: Eventually(..., 60*time.Second, 2*time.Second) polling pattern and By(...) step annotations
  • pkg/vmcp/optimizer/optimizer.go: find_tool and call_tool tool names to assert in FTS5/TEI tests
  • cmd/vmcp/app/commands.go — Standalone vmcp binary entrypoint; --config and --port flag names for the regression test
  • .claude/rules/testing.md — E2E test strategy, Ginkgo patterns, DeferCleanup usage

Component Interfaces

// test/e2e/vmcp_optimizer_test.go — top-level structure

var _ = Describe("vMCP CLI optimizer", Label("vmcp", "optimizer", "e2e"), func() {
    var (
        config           *e2e.TestConfig
        groupName        string
        backendName      string
        createdWorkloads []string
    )

    BeforeEach(func() {
        config = e2e.NewTestConfig()
        groupName = e2e.GenerateUniqueServerName("vmcp-opt-group")
        backendName = e2e.GenerateUniqueServerName("vmcp-opt-backend")
        createdWorkloads = nil
        Expect(e2e.CheckTHVBinaryAvailable(config)).To(Succeed())

        // Create group and start backend workload
        e2e.CreateAndTrackGroup(config, groupName, &[]string{})
        e2e.NewTHVCommand(config, "run", "fetch", "--name", backendName,
            "--group", groupName).ExpectSuccess()
        createdWorkloads = append(createdWorkloads, backendName)
        Expect(e2e.WaitForMCPServer(config, backendName, 60*time.Second)).To(Succeed())
    })

    AfterEach(func() {
        for _, w := range createdWorkloads {
            _ = e2e.StopAndRemoveMCPServer(config, w)
        }
        _ = e2e.RemoveGroup(config, groupName)
    })

    Context("FTS5 optimizer (--optimizer)", func() {
        It("exposes only find_tool and call_tool with BM25 search", func() { ... })
    })

    Context("TEI managed optimizer (--optimizer-embedding)",
        Label("requires-docker"), func() {
        // Skip every spec in this Context when Docker tests are disabled.
        BeforeEach(skipIfNoDocker)

        It("auto-starts TEI container and enables semantic search", func() { ... })
        It("stops the TEI container on serve exit", func() { ... })
        It("reuses the same TEI container across two invocations", func() { ... })
    })

    Context("fail-fast behaviour", func() {
        It("exits non-zero with a clear error when TEI fails to start", func() { ... })
    })

    Context("standalone vmcp serve regression", func() {
        It("works identically after Phase 1 extraction refactor", func() { ... })
    })
})

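// skipIfNoDocker skips the current test if SKIP_DOCKER_TESTS is set to "true"
// or Docker is unavailable. Requires the "os" and "os/exec" imports.
func skipIfNoDocker() {
    if os.Getenv("SKIP_DOCKER_TESTS") == "true" {
        Skip("Skipping Docker-dependent test (SKIP_DOCKER_TESTS=true)")
    }
    // `docker info` exits non-zero when the CLI is missing or the daemon is down.
    if err := exec.Command("docker", "info").Run(); err != nil {
        Skip("Skipping Docker-dependent test (Docker unavailable: " + err.Error() + ")")
    }
}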

Testing Strategy

Unit Tests

Integration Tests (E2E — the primary deliverable)

  • FTS5 optimizer: thv vmcp serve --group <name> --optimizer → ListTools returns exactly [find_tool, call_tool] → find_tool call with keyword returns non-error
  • TEI managed optimizer: thv vmcp serve --group <name> --optimizer-embedding → TEI container visible in docker ps → ListTools returns [find_tool, call_tool] → find_tool semantic call returns non-error
  • TEI container cleanup: after SIGINT to thv vmcp serve --optimizer-embedding, docker ps no longer shows thv-embedding-* container within 15 seconds
  • Idempotent TEI reuse: second invocation with same model does not start a second thv-embedding-* container
  • Fail-fast: thv vmcp serve --optimizer-embedding --embedding-image <bad-image> exits non-zero with TEI error message; no vMCP server starts
  • Standalone vmcp serve regression: vmcp serve --config <valid.yaml> starts, MCP client connects, ListTools returns backend tools

Edge Cases

  • TEI readiness timeout: model download can take 30–60 s; use 90-second WaitForMCPServerReady timeout for TEI tests to avoid spurious CI failures
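  • Cleanup on panic: DeferCleanup(func() { if vMCPCmd.Process != nil { _ = vMCPCmd.Process.Signal(syscall.SIGINT); _ = vMCPCmd.Wait() } }) is registered before the It body starts the serve process, so cleanup runs even if assertions panic; the nil check covers the case where the process never started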
  • Parallel test safety: each test invocation uses GenerateUniqueServerName for group and backend names, and allocateFreePort() for port numbers, preventing cross-test interference

Out of Scope

  • Unit tests for optimizer wiring logic (pkg/vmcp/cli/serve.go): covered by #4887 (Wire optimizer flags into thv vmcp serve)
  • Tier 3 (config-file based) optimizer where the user specifies optimizer.embeddingService directly in YAML — the existing config-load path already handles this; no new E2E coverage needed
  • Kubernetes/chainsaw vMCP E2E tests — those remain unchanged under test/e2e/chainsaw/operator/
  • thv vmcp status subcommand — explicitly deferred per RFC open questions
  • MCP protocol conformance testing beyond basic ListTools and CallTool connectivity
  • ARM64 / Apple Silicon platform-specific CI adjustments — CI runs on amd64 Linux; TEI CPU image works there without Rosetta emulation
  • Architecture documentation for local vMCP: tracked in #4890

References

  • RFC THV-0059 — Phase 4: optimizer tiers, TEI container lifecycle, fail-fast requirement, idempotent reuse
  • GitHub Issue #4808 — Parent tracking issue
  • test/e2e/helpers.go: StartLongRunningTHVCommand, WaitForMCPServer, group/workload helpers
  • test/e2e/mcp_client_helpers.go: NewMCPClientForSSE, WaitForMCPServerReady, MCPClientHelper
  • test/e2e/vmcp_cli_test.go (E2E tests: quick mode and config-file mode #4888) — Base E2E infrastructure: allocateFreePort, Ginkgo structure, background process pattern
  • test/e2e/inspector_test.go — Long-running subcommand (background process) cleanup pattern
  • test/e2e/group_test.go — Group + workload setup/teardown pattern
  • pkg/vmcp/optimizer/optimizer.go: find_tool and call_tool tool names
  • pkg/vmcp/cli/embedding_manager.go (Implement EmbeddingServiceManager in pkg/vmcp/cli/ #4884) — TEI container naming: thv-embedding-<model-short-hash>
  • cmd/vmcp/app/commands.go — Standalone vmcp binary flags for regression test
  • .claude/rules/testing.md — E2E test strategy, Ginkgo/Gomega patterns, DeferCleanup usage
  • .claude/rules/go-style.md — SPDX headers, error handling conventions

Metadata

Assignees: No one assigned
Labels: cli (Changes that impact CLI functionality), enhancement (New feature or request), vmcp (Virtual MCP Server related issues)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests