NO-JIRA: docs(agents): add CRD API machinery fundamentals and envtest docs #8236

Open
enxebre wants to merge 1 commit into openshift:main from enxebre:agents-md-api-and-docs

Conversation

@enxebre

@enxebre enxebre commented Apr 14, 2026

Summary

  • Adds CRD API machinery fundamentals section to AGENTS.md covering serialization, validation execution, immutability, and defaulting behaviors
  • Documents envtest testing framework in AGENTS.md with build tags and run commands
  • Expands test/envtest/README.md with directory layout, framework resolution logic, and guidance for adding new API test suites
  • Adds note about MkDocs documentation structure to AGENTS.md

Test plan

  • Verify AGENTS.md content is accurate against current codebase
  • Verify test/envtest/README.md directory layout matches actual structure
  • Run make test-envtest-api-all to confirm documented commands work

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Added MkDocs publishing and navigation guidance for maintainers.
    • Documented API validation test framework and directory/layout expectations, how test suites resolve CRDs and feature-gate files, and how to run validation tests.
    • Expanded CRD API machinery fundamentals with guidance on API versioning, serialization choices, validation execution, immutability, and transition/defaulting recommendations.

@openshift-merge-bot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 14, 2026
@openshift-ci-robot

@enxebre: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai bot commented Apr 14, 2026

📝 Walkthrough

Walkthrough

The pull request updates two documentation files.

AGENTS.md is expanded with MkDocs publication/navigation guidance. It gains a new “Envtest (API Validation Tests)” section documenting test location, YAML structure (onCreate/onUpdate cases and expected errors), supported Kubernetes versions, feature-gate filtering, and relevant make targets. It also gains a reorganized “CRD API Machinery Fundamentals” section covering API version guidance, serialization (omitempty vs omitzero), pointer/default rationale, validation execution semantics, immutability constraints, and validation ratcheting/defaulting/transition guidance.

test/envtest/README.md adds a directory-layout section for test assets, explains relative-path resolution for CRDs and feature gates, notes portability and multi-root LoadTestSuiteSpecs usage, and fixes minor wording/path details.
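
As a rough illustration of the serialization topics listed above (omitempty vs omitzero and the pointer rationale), the following sketch uses only Go's standard `encoding/json`. The `Spec` and `Limits` types and their field names are hypothetical and do not come from the HyperShift API; the `omitzero` option assumes Go 1.24 or newer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Limits is a hypothetical nested struct used only for this illustration.
type Limits struct {
	MaxPods int32 `json:"maxPods,omitempty"`
}

// Spec is a hypothetical API type; none of these fields exist in HyperShift.
type Spec struct {
	// Pointer + omitempty: nil is omitted, but an explicit 0 is preserved.
	Replicas *int32 `json:"replicas,omitempty"`
	// Value + omitempty: 0 is indistinguishable from "unset" and is dropped.
	Count int32 `json:"count,omitempty"`
	// omitempty has no effect on struct values, so an empty object is emitted.
	LimitsEmpty Limits `json:"limitsEmpty,omitempty"`
	// omitzero (Go 1.24+) drops the field when the struct is its zero value.
	LimitsZero Limits `json:"limitsZero,omitzero"`
}

// encode marshals a Spec and returns the JSON as a string.
func encode(s Spec) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	zero := int32(0)
	fmt.Println(encode(Spec{}))                // {"limitsEmpty":{}}
	fmt.Println(encode(Spec{Replicas: &zero})) // {"replicas":0,"limitsEmpty":{}}
}
```

The pointer field is what lets a client say "explicitly zero" as opposed to "not set", which matters when the server applies a non-zero default to unset fields.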


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Stable And Deterministic Test Names | ❌ Error | Test uses dynamic name with fmt.Sprintf and variable featureSet, violating the custom check's explicit ban on dynamic test names. | Replace dynamic test name with static string; move featureSet validation into test body instead of test title. |

✅ Passed checks (9 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main changes: documentation additions to AGENTS.md covering CRD API machinery fundamentals and envtest documentation, plus updates to test/envtest/README.md. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Test Structure And Quality | ✅ Passed | PR modifies only documentation files (AGENTS.md and test/envtest/README.md), not test code. |
| Microshift Test Compatibility | ✅ Passed | PR contains only documentation updates to markdown files with no modifications to test code or new Ginkgo e2e tests. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | Pull request contains only documentation changes to markdown files with no new Ginkgo e2e tests, so SNO compatibility check does not apply. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | This pull request modifies only documentation files (AGENTS.md and test/envtest/README.md) with no deployment manifests, operator code, controllers, or scheduling constraints. |
| Ote Binary Stdout Contract | ✅ Passed | PR contains only documentation changes (AGENTS.md and test/envtest/README.md) with no executable code modifications affecting process-level stdout. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR contains only documentation changes; no new Ginkgo e2e test code with IPv4 assumptions or connectivity requirements. |
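
The resolution suggested for the failed "Stable And Deterministic Test Names" check could look roughly like the sketch below. It is not the repository's test code: the `featureSetCase` type, the case names, and `checkCase` are hypothetical, and the two feature-set names are taken only from the CRD file names quoted elsewhere in this conversation. The point is that each title is a string literal, while the `featureSet` value is validated in the body.

```go
package main

import "fmt"

// featureSetCase is a hypothetical table entry for a table-driven test.
type featureSetCase struct {
	name       string // static title, written out as a literal
	featureSet string // validated inside the test body, not in the title
}

// cases uses literal names instead of fmt.Sprintf("... %s", featureSet).
var cases = []featureSetCase{
	{name: "loads CRDs for the Default feature set", featureSet: "Default"},
	{name: "loads CRDs for the TechPreviewNoUpgrade feature set", featureSet: "TechPreviewNoUpgrade"},
}

// knownFeatureSets is an assumed allow-list for this illustration.
var knownFeatureSets = map[string]bool{
	"Default":              true,
	"TechPreviewNoUpgrade": true,
}

// checkCase stands in for the body that would run under the static title.
func checkCase(tc featureSetCase) error {
	if !knownFeatureSets[tc.featureSet] {
		return fmt.Errorf("unknown feature set %q", tc.featureSet)
	}
	return nil
}

func main() {
	for _, tc := range cases {
		fmt.Printf("%s: %v\n", tc.name, checkCase(tc))
	}
}
```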

@openshift-ci openshift-ci bot requested review from bryan-cox and csrwng April 14, 2026 11:41
@openshift-ci openshift-ci bot added the area/testing Indicates the PR includes changes for e2e testing label Apr 14, 2026
@openshift-ci

openshift-ci bot commented Apr 14, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/needs-area labels Apr 14, 2026

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@AGENTS.md`:
- Around line 91-101: Update the Kubernetes version range in the "Envtest (API
Validation Tests)" section of AGENTS.md so it matches test/envtest/README.md:
change "1.30–1.35" to "1.31–1.35" in the Envtest (API Validation Tests)
paragraph and verify both documents consistently state "1.31–1.35" to avoid
future mismatch.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 568fa85f-9ae0-4305-8266-5a947006f573

📥 Commits

Reviewing files that changed from the base of the PR and between 3dbb066 and 97bd276.

📒 Files selected for processing (2)
  • AGENTS.md
  • test/envtest/README.md

Comment thread AGENTS.md
Comment on lines +91 to +101
### Envtest (API Validation Tests)
- Located in `test/envtest/` with build tag `envtest`
- Tests CRD validation rules (CEL, OpenAPI schema) against real kube-apiserver + etcd
- Test cases are YAML-driven following the openshift/api convention
- Each YAML file defines `onCreate` and `onUpdate` test cases with expected errors
- Run with `make test-envtest-ocp` (OpenShift k8s versions) or `make test-envtest-kube` (vanilla k8s versions), or `make test-envtest-api-all` for both
- Tests run across multiple Kubernetes versions (1.30–1.35) to verify validation ratcheting and compatibility
- Feature gate filtering: test suites can target stable, tech-preview, or feature-gated CRD variants

See test/envtest/README.md for details
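
The "validation ratcheting" mentioned in the list above can be pictured with a deliberately simplified model: on update, a value that fails validation is tolerated only if the update leaves it unchanged. The sketch below is not how kube-apiserver implements it (the real mechanism works per field against the CRD schema and has exceptions); `validateName` and `admitUpdate` are hypothetical names for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// validateName stands in for a CRD validation rule (hypothetical):
// the value must be at most 5 characters long.
func validateName(v string) error {
	if len(v) > 5 {
		return errors.New("name must be at most 5 characters")
	}
	return nil
}

// admitUpdate models ratcheting for a single field: an invalid value is
// accepted on update only when it is identical to the stored value.
func admitUpdate(oldValue, newValue string) error {
	err := validateName(newValue)
	if err != nil && oldValue == newValue {
		return nil // unchanged invalid value: tolerated
	}
	return err
}

func main() {
	// Stored object predates the rule and holds an invalid value.
	fmt.Println(admitUpdate("toolongname", "toolongname"))    // <nil>
	fmt.Println(admitUpdate("toolongname", "evenlongername")) // name must be at most 5 characters
	fmt.Println(admitUpdate("toolongname", "short"))          // <nil>
}
```

Running the same suites against several Kubernetes versions matters here because ratcheting behavior has changed between releases, so an update accepted on one version may be rejected on another.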

⚠️ Potential issue | 🟡 Minor

Fix Kubernetes version range mismatch across docs.

Line 97 says envtest runs across Kubernetes 1.30–1.35, but test/envtest/README.md documents 1.31–1.35. Please align these to avoid contributor confusion.

📝 Suggested doc fix
-- Tests run across multiple Kubernetes versions (1.30–1.35) to verify validation ratcheting and compatibility
+- Tests run across multiple Kubernetes versions (1.31–1.35) to verify validation ratcheting and compatibility

@codecov

codecov bot commented Apr 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 34.63%. Comparing base (783f795) to head (961f460).
⚠️ Report is 23 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8236   +/-   ##
=======================================
  Coverage   34.63%   34.63%           
=======================================
  Files         767      767           
  Lines       93186    93200   +14     
=======================================
+ Hits        32277    32282    +5     
- Misses      58236    58243    +7     
- Partials     2673     2675    +2     

see 2 files with indirect coverage changes

Comment thread AGENTS.md Outdated

For conventions read https://github.com/openshift/enhancements/blob/master/dev-guide/api-conventions.md

`make api-lint` will enforce most conventions and best practices.

@muraee muraee Apr 14, 2026

Should this point to `api-lint-fix`, so that the conventions are actually enforced?

NO-JIRA

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@enxebre enxebre force-pushed the agents-md-api-and-docs branch from 97bd276 to 961f460 Compare April 14, 2026 12:30

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/envtest/README.md`:
- Around line 11-25: The fenced code block in test/envtest/README.md is missing
a language identifier (MD040); update the opening fence from ``` to include a
tag such as ```text (or ```console/```yaml as appropriate) so the block starts
with a language identifier and the markdownlint warning is resolved—modify the
fenced block that lists "cmd/install/assets/crds/hypershift-operator/"
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 7eb72cdd-a24e-44c3-bcdd-208f528865f0

📥 Commits

Reviewing files that changed from the base of the PR and between 97bd276 and 961f460.

📒 Files selected for processing (2)
  • AGENTS.md
  • test/envtest/README.md
✅ Files skipped from review due to trivial changes (1)
  • AGENTS.md

Comment thread test/envtest/README.md
Comment on lines +11 to +25
```
cmd/install/assets/crds/hypershift-operator/
├── zz_generated.crd-manifests/                    # Generated CRDs (by make api)
│   ├── 0000_10_hostedclusters-Default.crd.yaml
│   ├── 0000_10_hostedclusters-TechPreviewNoUpgrade.crd.yaml
│   └── ...
├── tests/                                         # Test suite YAMLs
│   ├── hostedclusters.hypershift.openshift.io/
│   │   ├── stable.hostedclusters.validation.testsuite.yaml
│   │   └── ...
│   └── nodepools.hypershift.openshift.io/
│       └── ...
└── payload-manifests/                             # Feature gate definitions
    └── featuregates/
```

⚠️ Potential issue | 🟡 Minor

Add a language tag to the fenced block (MD040).

Line 11 opens a fenced block without a language identifier, which triggers markdownlint MD040.

✏️ Proposed fix
-```
+```text
 cmd/install/assets/crds/hypershift-operator/
 ├── zz_generated.crd-manifests/                    # Generated CRDs (by make api)
 │   ├── 0000_10_hostedclusters-Default.crd.yaml
 │   ├── 0000_10_hostedclusters-TechPreviewNoUpgrade.crd.yaml
 │   └── ...
 ├── tests/                                         # Test suite YAMLs
 │   ├── hostedclusters.hypershift.openshift.io/
 │   │   ├── stable.hostedclusters.validation.testsuite.yaml
 │   │   └── ...
 │   └── nodepools.hypershift.openshift.io/
 │       └── ...
 └── payload-manifests/                             # Feature gate definitions
     └── featuregates/

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 11-11: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

@muraee

muraee commented Apr 15, 2026

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 15, 2026
@openshift-merge-bot

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@cwbotbot

cwbotbot commented Apr 15, 2026

Test Results

e2e-aws

e2e-aks

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2044342633500774400 | Cost: $1.85558825 | Failed step: hypershift-aws-run-e2e-nested

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@openshift-ci

openshift-ci bot commented Apr 15, 2026

@enxebre: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aks | 961f460 | link | true | /test e2e-aks |
| ci/prow/e2e-aws | 961f460 | link | true | /test e2e-aws |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hypershift-jira-solve-ci

Of the 427 tests that ran, 400 passed, 25 were skipped, and 2 failed: TestCreateCluster and its subtest TestCreateCluster/ValidateHostedCluster. The failure stems from an image resolution race condition in the OLM catalog deployments, a known flake pattern unrelated to this docs-only PR.

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

Failed to wait for HostedCluster e2e-clusters-gghpx/create-cluster-ms52j to rollout in 30m0s: context deadline exceeded

incorrect condition: wanted Available=True, got Available=False:
  ComponentsNotAvailable(Waiting for components to be available: certified-operators-catalog, community-operators-catalog)

incorrect condition: wanted Degraded=False, got Degraded=True:
  UnavailableReplicas([certified-operators-catalog deployment has 2 unavailable replicas,
  community-operators-catalog deployment has 2 unavailable replicas])

Summary

The TestCreateCluster/ValidateHostedCluster test timed out after 30 minutes waiting for the HostedCluster create-cluster-ms52j to reach Available=True. The cluster version installed successfully (5.0.0-0.ci), all 3 nodes became ready, and all core control plane components (etcd, kube-apiserver, infrastructure) were healthy. However, two OLM catalog deployments — certified-operators-catalog and community-operators-catalog — never became available due to an image resolution race condition in the ImageStreamTag trigger mechanism, leaving the HostedCluster in Degraded=True state. This is an intermittent infrastructure flake unrelated to the docs-only PR changes. All other 10+ test suites (TestCreateClusterPrivate, TestCreateClusterProxy, TestUpgradeControlPlane, TestNodePool, TestAutoscaling, etc.) passed successfully.

Root Cause

Image resolution race condition in OLM catalog deployments.

The catalog deployments (certified-operators-catalog, community-operators-catalog) use image.openshift.io/triggers annotations referencing ImageStreamTags (e.g., catalogs:certified-operators). During HostedCluster rollout, the openshift-controller-manager resolves these image references through ImageStream triggers. A timing-dependent race caused the following failure chain:

  1. Deployment creation (~10:07:11Z): Both catalog deployments were created with unresolved short image tags.

  2. Early ReplicaSet pods (Revisions 1-2): Pods from earlier ReplicaSets (e.g., certified-operators-catalog-b87697c8f-dmn75) attempted to pull the unresolved short tag catalogs:certified-operators from docker.io/library/ — which doesn't exist — resulting in ImagePullBackOff.

  3. Latest ReplicaSet pod (Revision 3): The newest pod (certified-operators-catalog-57495575dc-67m59) received the correctly resolved image digest from the internal registry. The image was pulled successfully at ~10:08:54Z, but the extract-content init container then hit a CreateContainerError: CRI-O could not locate the pulled image by its local image ID (image not known), a known intermittent CRI-O/image registry interaction issue.

  4. ProgressDeadlineExceeded (~10:17:14Z): Both deployments exceeded their 600s progress deadline with 0 available replicas, causing the HostedCluster Available condition to remain False.

  5. Test timeout (after 30m): ValidateHostedCluster timed out waiting for Available=True.

This failure is NOT caused by PR #8236. The PR only modifies documentation files (AGENTS.md and test/envtest/README.md) and does not touch any Go code, test code, or CI configuration. That only 2 of the 4 catalog deployments failed, while redhat-operators-catalog and redhat-marketplace-catalog succeeded, indicates a timing-dependent flake rather than a systemic issue.

Recommendations
  1. Retry the job — Run /test e2e-aws to retry. This is an intermittent infrastructure flake that should pass on retry.
  2. Not a PR blocker: PR #8236 is docs-only and cannot have caused this failure. The pipeline_skip_if_only_changed annotation includes docs/ in its skip pattern, but the changed files (AGENTS.md and test/envtest/README.md) sit outside docs/, which would explain why the job was not skipped.
  3. Known flake pattern — The OLM catalog image resolution race via ImageStreamTag triggers is a recurring pattern in HyperShift CI. If occurring frequently, consider investigating retry logic for catalog deployment readiness or improvements to ImageStreamTag trigger resolution timing.

Evidence

| Evidence | Detail |
| --- | --- |
| Failed test | TestCreateCluster/ValidateHostedCluster: timed out after 2515s (30m) |
| HostedCluster | e2e-clusters-gghpx/create-cluster-ms52j: Available=False, Degraded=True |
| Blocking components | certified-operators-catalog (2 unavailable replicas), community-operators-catalog (2 unavailable replicas) |
| Old RS pod failure | certified-operators-catalog-b87697c8f-dmn75: ImagePullBackOff pulling unresolved catalogs:certified-operators from docker.io |
| New RS pod failure | certified-operators-catalog-57495575dc-67m59: CreateContainerError, CRI-O image not known after successful pull |
| Healthy catalogs | redhat-operators-catalog and redhat-marketplace-catalog both available, which confirms a timing-dependent race |
| Healthy components | etcd (QuorumAvailable), kube-apiserver (Available), ClusterVersion (5.0.0-0.ci applied successfully), 3/3 nodes ready |
| Other tests | 400 of 427 tests passed, 25 skipped; only TestCreateCluster failed |
| PR relevance | Docs-only change: no Go code, test code, or CI config modified |
| CI step | hypershift-aws-run-e2e-nested failed after 57m28s; pre-phase and post-phase steps all succeeded |

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/testing Indicates the PR includes changes for e2e testing jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

4 participants