CI: Validate full stack (temporary) #9

Closed
scotwells wants to merge 11 commits into main from 05-e2e-and-ci

@scotwells
Contributor

Temporary PR to trigger CI on the complete stack. Will close after CI passes.

scotwells and others added 10 commits March 17, 2026 16:28
- Add comprehensive README with CRD table, architecture diagram, and quick start guide
- Add service design doc covering BGP control plane architecture and goals
- Add API reference documentation for all CRD types
- Add architecture, overview, and getting started guides
- Add example YAML manifests for common BGP use cases

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add Go API types for all six BGP CRDs: BGPConfiguration, BGPSession,
  BGPPeeringPolicy, BGPAdvertisement, BGPEndpoint, and BGPRoutePolicy
- Add groupversion_info.go registering the bgp.miloapis.com/v1alpha1 API group
- Add generated deepcopy methods (zz_generated.deepcopy.go)
- Add CRD YAML manifests with full OpenAPI schema validation
- Add CRD kustomization.yaml for bundle management
- Add go.mod and go.sum with required dependencies

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add reconcilers for all BGP resource types: BGPConfiguration, BGPSession,
  BGPPeeringPolicy, BGPAdvertisement, and BGPRoutePolicy
- Add gobgp.go with GoBGP daemon management and peer session lifecycle
- Add status.go with shared status condition helpers
- Add metrics.go exposing Prometheus metrics for BGP sessions and routes
- Add controller.go wiring all reconcilers into the controller-manager
- Add internal/netlink package for kernel route programming via netlink
- Add internal/routesync route_watcher.go for kernel route change detection
- Add cmd/bgp/main.go as the binary entry point

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add distroless-based Dockerfile using multi-stage build for minimal image size
- Add DaemonSet manifest deploying bgp-controller on every node with host networking
- Add ClusterRole and ServiceAccount with least-privilege RBAC for BGP resources
- Add Namespace manifest for bgp-system
- Add GoBGP config map template (gobgp-config.yaml)
- Add kustomization.yaml assembling the full deploy bundle
- Add Makefile with build, push, deploy, and generate targets
- Add .gitignore for Go build artifacts and editor files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add missing security hardening to the bgp container:
- runAsNonRoot: true (the kubelet refuses to start the container as UID 0)
- readOnlyRootFilesystem: true (prevents writes to the container filesystem)
- allowPrivilegeEscalation: false (blocks setuid/setgid escalation)
- capabilities.drop: [ALL], then capabilities.add: [NET_ADMIN] (least privilege:
  drop every capability and re-add only NET_ADMIN, which netlink route
  programming requires)
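
As a rough illustration of the hardening listed above, the settings can be written out as a Go value and serialized. The `SecurityContext` and `Capabilities` structs here are local stand-ins that only mirror the Kubernetes JSON field names; they are assumptions made for this sketch, not the `k8s.io/api` types and not the manifest from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Capabilities and SecurityContext are hypothetical local types that mirror
// the JSON field names of a Kubernetes container securityContext. They keep
// this sketch dependency-free and are not the real k8s.io/api types.
type Capabilities struct {
	Drop []string `json:"drop,omitempty"`
	Add  []string `json:"add,omitempty"`
}

type SecurityContext struct {
	RunAsNonRoot             bool         `json:"runAsNonRoot"`
	ReadOnlyRootFilesystem   bool         `json:"readOnlyRootFilesystem"`
	AllowPrivilegeEscalation bool         `json:"allowPrivilegeEscalation"`
	Capabilities             Capabilities `json:"capabilities"`
}

// hardenedSecurityContext returns the settings listed in the commit message.
func hardenedSecurityContext() SecurityContext {
	return SecurityContext{
		RunAsNonRoot:             true,
		ReadOnlyRootFilesystem:   true,
		AllowPrivilegeEscalation: false,
		Capabilities: Capabilities{
			Drop: []string{"ALL"},
			Add:  []string{"NET_ADMIN"}, // needed for netlink route programming
		},
	}
}

func main() {
	out, err := json.MarshalIndent(hardenedSecurityContext(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Running it prints the JSON form of the context, which maps one-to-one onto the `securityContext` keys named in the list.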

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The DaemonSet security context sets runAsNonRoot: true. Add a non-root
user (UID 65532) to the image and set the USER directive to match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add Chainsaw-based E2E test suite with five test scenarios:
  system-readiness, bgp-peering, session-lifecycle, bgp-advertisement,
  and bgp-route-policy
- Add kind cluster config (kind-config.yaml) with multi-node topology
  for realistic BGP peering tests
- Add chainsaw-config.yaml with test timeouts and parallel execution settings
- Add test/e2e/overlay/kustomization.yaml for deploying the controller
  into the kind cluster during tests
- Add test/e2e/fixtures/ with shared baseline BGPConfiguration fixture
- Add Taskfile.yml for running the full E2E suite locally
- Add .github/workflows/ci.yaml with lint, build, unit test, and E2E stages
- Add .github/workflows/docker.yaml for building and pushing the container
  image to ghcr.io on push to main and on tags

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove SessionState, ReceivedPrefixes, AdvertisedPrefixes,
LastTransitionTime, and FlapCount from BGPSessionStatus. These operational
counters are now exposed exclusively as Prometheus metrics to eliminate the
10-second polling loop that wrote to every BGPSession object cluster-wide.

Conditions (Configured and SessionEstablished) remain on the CRD status for
kubectl visibility. Update printcolumns to show Configured and Established
condition status instead of the removed sessionState and receivedPrefixes
fields. Regenerate CRD manifest and update deepcopy accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Delete status.go and its RunStatusPoller goroutine, which polled GoBGP
every 10 seconds and wrote to every BGPSession object in the cluster.
This caused multi-writer races in the DaemonSet deployment and generated
unnecessary API server load.

Replace with inline logic in the SessionReconciler:
- After AddPeer/UpdatePeer succeeds, call ListPeer for the specific peer
  to get its current state in the same reconcile pass.
- Set the SessionEstablished condition based on the live GoBGP state.
- Emit bgp_session_state, bgp_received_prefixes_total, and
  bgp_session_flaps_total Prometheus metrics from the reconciler.
- Return RequeueAfter: 30s so the condition and metrics are refreshed
  periodically. Each node only updates sessions it owns (LocalEndpoint
  match), eliminating the multi-writer race.
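
The flow described in the bullets above might look roughly like the following. This is a minimal sketch, not the code from this PR: `Session`, `PeerClient`, `Result`, `fakeClient`, and `reconcileSession` are hypothetical names, the peer state is simplified to a string, and metric emission is left out to keep the example dependency-free.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Session is a hypothetical, trimmed-down stand-in for a BGPSession object.
type Session struct {
	Name          string
	LocalEndpoint string // endpoint of the node that owns this session
	PeerAddress   string
	Established   bool // stands in for the SessionEstablished condition
}

// PeerClient is a hypothetical abstraction over the daemon calls named in
// the commit message. It is not the GoBGP client API.
type PeerClient interface {
	AddPeer(address string) error
	ListPeer(address string) (state string, err error)
}

// Result is a hypothetical stand-in for a reconcile result.
type Result struct {
	RequeueAfter time.Duration
}

// refreshInterval is the periodic refresh named in the commit message.
const refreshInterval = 30 * time.Second

// reconcileSession configures the peer, reads its live state in the same
// pass, and asks to be requeued so the condition is refreshed periodically.
func reconcileSession(c PeerClient, localEndpoint string, s *Session) (Result, error) {
	// Only the owning node writes status, so there is a single writer.
	if s.LocalEndpoint != localEndpoint {
		return Result{}, nil
	}
	if err := c.AddPeer(s.PeerAddress); err != nil {
		return Result{}, fmt.Errorf("add peer %s: %w", s.PeerAddress, err)
	}
	state, err := c.ListPeer(s.PeerAddress)
	if err != nil {
		return Result{}, fmt.Errorf("list peer %s: %w", s.PeerAddress, err)
	}
	s.Established = state == "Established"
	return Result{RequeueAfter: refreshInterval}, nil
}

// fakeClient is an in-memory PeerClient used only to exercise the sketch.
type fakeClient struct {
	states map[string]string
}

func (f *fakeClient) AddPeer(address string) error {
	if address == "" {
		return errors.New("empty peer address")
	}
	return nil
}

func (f *fakeClient) ListPeer(address string) (string, error) {
	state, ok := f.states[address]
	if !ok {
		return "", errors.New("unknown peer")
	}
	return state, nil
}

func main() {
	client := &fakeClient{states: map[string]string{"192.0.2.1": "Established"}}
	s := &Session{Name: "example", LocalEndpoint: "node-a", PeerAddress: "192.0.2.1"}
	res, err := reconcileSession(client, "node-a", s)
	fmt.Println(s.Established, res.RequeueAfter, err)
}
```

The ownership check comes first so that a node which does not own the session returns without touching either the daemon or the object's status.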

Move peerStateToString from status.go into session_reconciler.go since it
is only needed there now.
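
For readers unfamiliar with the helper, a function of this kind typically maps a BGP finite-state-machine state to a readable name. The sketch below is an assumption about its general shape: the `PeerState` type and its constants are hypothetical and follow the state names from RFC 4271, not the GoBGP enum the real function takes.

```go
package main

import "fmt"

// PeerState is a hypothetical enum of the BGP FSM states from RFC 4271.
type PeerState int

const (
	StateUnknown PeerState = iota
	StateIdle
	StateConnect
	StateActive
	StateOpenSent
	StateOpenConfirm
	StateEstablished
)

// peerStateToString returns a readable name for a peer state and falls back
// to "Unknown" for any value it does not recognize.
func peerStateToString(s PeerState) string {
	switch s {
	case StateIdle:
		return "Idle"
	case StateConnect:
		return "Connect"
	case StateActive:
		return "Active"
	case StateOpenSent:
		return "OpenSent"
	case StateOpenConfirm:
		return "OpenConfirm"
	case StateEstablished:
		return "Established"
	default:
		return "Unknown"
	}
}

func main() {
	fmt.Println(peerStateToString(StateEstablished))
}
```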

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace assertions on .status.sessionState with assertions on the
SessionEstablished condition now that the sessionState field has been
removed from BGPSessionStatus.

- bgp-peering test: query SessionEstablished condition status instead of
  sessionState string when counting established sessions.
- session-lifecycle test: assert SessionEstablished condition=True instead
  of sessionState: Established in the chainsaw assert step.
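
The same check can be expressed in Go, for example in a unit test or a small tool. This is a sketch under assumptions: `Condition` and `SessionStatus` are hypothetical local types that mirror only the `Type` and `Status` fields of a Kubernetes condition, and the helper names are illustrative.

```go
package main

import "fmt"

// Condition is a hypothetical local type mirroring the Type and Status
// fields of a Kubernetes status condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// SessionStatus is a hypothetical stand-in for BGPSessionStatus.
type SessionStatus struct {
	Conditions []Condition
}

// isEstablished reports whether the SessionEstablished condition is True.
// A missing condition is treated as not established.
func isEstablished(st SessionStatus) bool {
	for _, c := range st.Conditions {
		if c.Type == "SessionEstablished" {
			return c.Status == "True"
		}
	}
	return false
}

// countEstablished counts the sessions whose condition is True, mirroring
// what the bgp-peering test counts.
func countEstablished(sessions []SessionStatus) int {
	n := 0
	for _, s := range sessions {
		if isEstablished(s) {
			n++
		}
	}
	return n
}

func main() {
	sessions := []SessionStatus{
		{Conditions: []Condition{{Type: "Configured", Status: "True"}, {Type: "SessionEstablished", Status: "True"}}},
		{Conditions: []Condition{{Type: "SessionEstablished", Status: "False"}}},
		{}, // no conditions reported yet
	}
	fmt.Println(countEstablished(sessions))
}
```

Treating an absent condition as not established keeps the count conservative while a session has not yet been reconciled.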

Update docs/api/README.md:
- Remove sessionState, receivedPrefixes, advertisedPrefixes,
  lastTransitionTime, and flapCount from the BGPSession status table.
- Add a Prometheus metrics table for the operational counters that moved
  out of the CRD status.
- Update the Verifying Configuration section to use the condition JSONPath.
- Update the printed columns description (Local, Remote, Configured,
  Established).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
These tests still queried .status.sessionState, which was removed in the
status simplification. Use the SessionEstablished condition instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@scotwells
Contributor Author

CI passed. All 5 e2e tests green with performance fixes.

@scotwells scotwells closed this Mar 17, 2026