Add E2E test suite and CI workflows #8

Open
scotwells wants to merge 6 commits into 04-deployment from 05-e2e-and-ci

scotwells commented Mar 17, 2026

Summary

  • Add Chainsaw-based E2E test suite covering five scenarios: system-readiness, bgp-peering, session-lifecycle, bgp-advertisement, and bgp-route-policy
  • Add kind-config.yaml configuring a multi-node kind cluster for realistic BGP peering topology
  • Add chainsaw-config.yaml with per-test timeouts and parallel execution settings
  • Add test/e2e/overlay/kustomization.yaml for deploying the controller into the kind cluster during test runs
  • Add test/e2e/fixtures/ with a shared baseline BGPConfiguration used across multiple test cases
  • Add test/e2e/Taskfile.yml so engineers can run the full E2E suite locally with task e2e
  • Add .github/workflows/ci.yaml with lint, build, unit test, and E2E stages running on every PR
  • Add .github/workflows/docker.yaml building and pushing the container image to ghcr.io/datum-cloud/bgp on push to main and on semver tags

Context

This is part of a stacked PR series:

  1. Documentation (PR #4: Add service design and API documentation)
  2. API types (PR #5: Add BGP API types and CRD manifests)
  3. Controllers (PR #6: Add BGP controller implementation)
  4. Deployment (PR #7: Add Dockerfile, deployment manifests, and Makefile)
  5. E2E tests and CI (this PR, #8: Add E2E test suite and CI workflows)

🤖 Generated with Claude Code

scotwells and others added 5 commits March 17, 2026 22:58
- Add Chainsaw-based E2E test suite with five test scenarios:
  system-readiness, bgp-peering, session-lifecycle, bgp-advertisement,
  and bgp-route-policy
- Add kind cluster config (kind-config.yaml) with multi-node topology
  for realistic BGP peering tests
- Add chainsaw-config.yaml with test timeouts and parallel execution settings
- Add test/e2e/overlay/kustomization.yaml for deploying the controller
  into the kind cluster during tests
- Add test/e2e/fixtures/ with shared baseline BGPConfiguration fixture
- Add Taskfile.yml for running the full E2E suite locally
- Add .github/workflows/ci.yaml with lint, build, unit test, and E2E stages
- Add .github/workflows/docker.yaml for building and pushing the container
  image to ghcr.io on push to main and on tags

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Remove SessionState, ReceivedPrefixes, AdvertisedPrefixes,
LastTransitionTime, and FlapCount from BGPSessionStatus. These operational
counters are now exposed exclusively as Prometheus metrics to eliminate the
10-second polling loop that wrote to every BGPSession object cluster-wide.

Conditions (Configured and SessionEstablished) remain on the CRD status for
kubectl visibility. Update printcolumns to show Configured and Established
condition status instead of the removed sessionState and receivedPrefixes
fields. Regenerate CRD manifest and update deepcopy accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Delete status.go and its RunStatusPoller goroutine, which polled GoBGP
every 10 seconds and wrote to every BGPSession object in the cluster.
This caused multi-writer races in the DaemonSet deployment and generated
unnecessary API server load.

Replace with inline logic in the SessionReconciler:
- After AddPeer/UpdatePeer succeeds, call ListPeer for the specific peer
  to get its current state in the same reconcile pass.
- Set the SessionEstablished condition based on the live GoBGP state.
- Emit bgp_session_state, bgp_received_prefixes_total, and
  bgp_session_flaps_total Prometheus metrics from the reconciler.
- Return RequeueAfter: 30s so the condition and metrics are refreshed
  periodically. Each node only updates sessions it owns (LocalEndpoint
  match), eliminating the multi-writer race.
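
The reconcile flow described in the list above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the controller's actual code: `Session`, `Condition`, `Result`, `PeerLister`, and `reconcile` are hypothetical stand-ins for the real BGPSession type, `metav1.Condition`, controller-runtime's reconcile result, and the GoBGP `ListPeer` call. Comparing `LocalEndpoint` to a node endpoint string and using the peer state as the condition reason are both assumptions. Metric emission is omitted.

```go
package main

import (
	"fmt"
	"time"
)

// PeerState enumerates the BGP FSM states from RFC 4271. The numeric values
// are local to this sketch and are not GoBGP's enum values.
type PeerState int

const (
	Idle PeerState = iota
	Connect
	Active
	OpenSent
	OpenConfirm
	Established
)

// peerStateToString renders a PeerState for use in conditions and logs.
func peerStateToString(s PeerState) string {
	switch s {
	case Idle:
		return "Idle"
	case Connect:
		return "Connect"
	case Active:
		return "Active"
	case OpenSent:
		return "OpenSent"
	case OpenConfirm:
		return "OpenConfirm"
	case Established:
		return "Established"
	default:
		return "Unknown"
	}
}

// Session is a hypothetical, trimmed stand-in for the BGPSession object.
type Session struct {
	Name          string
	LocalEndpoint string
}

// Condition is a trimmed stand-in for metav1.Condition.
type Condition struct {
	Type, Status, Reason string
}

// Result mimics the RequeueAfter field of a controller-runtime result.
type Result struct{ RequeueAfter time.Duration }

// refreshInterval matches the 30s requeue mentioned in the commit message.
const refreshInterval = 30 * time.Second

// PeerLister abstracts the per-peer state lookup (assumed shape).
type PeerLister func(Session) (PeerState, error)

// reconcile skips sessions owned by other nodes, derives the
// SessionEstablished condition from the live peer state, and asks to be
// requeued so the condition is refreshed periodically.
func reconcile(nodeEndpoint string, s Session, lookup PeerLister) (*Condition, Result, error) {
	if s.LocalEndpoint != nodeEndpoint {
		return nil, Result{}, nil // another node owns this session
	}
	state, err := lookup(s)
	if err != nil {
		return nil, Result{}, err
	}
	cond := &Condition{Type: "SessionEstablished", Status: "False", Reason: peerStateToString(state)}
	if state == Established {
		cond.Status = "True"
	}
	return cond, Result{RequeueAfter: refreshInterval}, nil
}

func main() {
	// Fake lookup standing in for a per-peer ListPeer call.
	lookup := func(Session) (PeerState, error) { return Established, nil }
	sessions := []Session{
		{Name: "peer-a", LocalEndpoint: "node-1"},
		{Name: "peer-b", LocalEndpoint: "node-2"},
	}
	for _, s := range sessions {
		cond, res, err := reconcile("node-1", s, lookup)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		if cond == nil {
			fmt.Printf("%s: skipped, owned by another node\n", s.Name)
			continue
		}
		fmt.Printf("%s: %s=%s, requeue after %s\n", s.Name, cond.Type, cond.Status, res.RequeueAfter)
	}
}
```

Because each node returns early for sessions whose LocalEndpoint does not match, only one writer ever touches a given session's status in this sketch, which is the property the commit relies on to remove the multi-writer race.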

Move peerStateToString from status.go into session_reconciler.go since it
is only needed there now.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace assertions on the .status.sessionState field with assertions on the
SessionEstablished condition now that the sessionState field has been
removed from BGPSessionStatus.

- bgp-peering test: query SessionEstablished condition status instead of
  sessionState string when counting established sessions.
- session-lifecycle test: assert SessionEstablished condition=True instead
  of sessionState: Established in the chainsaw assert step.
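
For illustration, the condition-based check the tests now rely on can be sketched as a small Go helper. The suite itself is Chainsaw-based, so this is not the test code; it only shows the same predicate. It assumes the input is the JSON list output of the API server (an `items` array whose entries carry `status.conditions` with `type` and `status` fields, the standard Kubernetes condition shape). `countEstablished` and the struct names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// condition holds the two fields of a Kubernetes condition this sketch needs.
type condition struct {
	Type   string `json:"type"`
	Status string `json:"status"`
}

// sessionList models only the parts of a BGPSession list that are inspected.
type sessionList struct {
	Items []struct {
		Status struct {
			Conditions []condition `json:"conditions"`
		} `json:"status"`
	} `json:"items"`
}

// countEstablished returns how many sessions in the JSON list carry a
// SessionEstablished condition whose status is "True".
func countEstablished(raw []byte) (int, error) {
	var list sessionList
	if err := json.Unmarshal(raw, &list); err != nil {
		return 0, err
	}
	n := 0
	for _, item := range list.Items {
		for _, c := range item.Status.Conditions {
			if c.Type == "SessionEstablished" && c.Status == "True" {
				n++
				break // count each session at most once
			}
		}
	}
	return n, nil
}

func main() {
	// Hand-written sample data, not output captured from a real cluster.
	sample := []byte(`{"items":[
		{"status":{"conditions":[{"type":"Configured","status":"True"},{"type":"SessionEstablished","status":"True"}]}},
		{"status":{"conditions":[{"type":"SessionEstablished","status":"False"}]}}
	]}`)
	n, err := countEstablished(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("established sessions:", n)
}
```

Sessions with no conditions at all are treated as not established, which mirrors how an assertion on condition=True would fail for them.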

Update docs/api/README.md:
- Remove sessionState, receivedPrefixes, advertisedPrefixes,
  lastTransitionTime, and flapCount from the BGPSession status table.
- Add a Prometheus metrics table for the operational counters that moved
  out of the CRD status.
- Update the Verifying Configuration section to use the condition JSONPath.
- Update the printed columns description (Local, Remote, Configured,
  Established).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

These tests still queried .status.sessionState which was removed in the
status simplification. Use the SessionEstablished condition instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Update container image paths from ghcr.io/datum-cloud/bgp to
ghcr.io/milo-os/bgp, update doc repo references, and update
copyright holder to Datum Technology, Inc.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>