Commit
docs: add structure of support review process
docs: add structure of support review process
docs: update dev-guide with filters for process cmd
doc: review support and install guides
doc: PR review; review support guide and formatting
doc: creating troubleshooting document and migrating from user guide
doc: review install-review guide
doc: overall review
doc: @rborst PR review
docs/review: update mkdocs and dev ToC after rebase
docs: review - ready for final review
doc/support-guide: review checklist reference
doc/support-guide: PR review for @bostrt
doc/support-guide: add insights cmdline
Dedicated mode now default and baseline results download (#1)
* doc: dedicated mode is now default
* doc/support-guide: steps on downloading baseline results
* Update docs/user.md Co-authored-by: Marco Braga <braga@mtulio.eng.br>
* Update docs/user.md Co-authored-by: Marco Braga <braga@mtulio.eng.br>
* docs: remove development env guidance from user guide
* docs: no longer need aws CLI and can reference HTML webpage hosted in S3
* docs: remove requirement for AWS access key
Co-authored-by: Marco Braga <braga@mtulio.eng.br>
Showing 8 changed files with 755 additions and 139 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,14 @@ | ||
# OpenShift Provider Certification Tool | ||
|
||
Welcome to the documentation for the OpenShift Provider Certification Tool! | ||
|
||
The OpenShift Provider Certification Tool is used to evaluate whether an OpenShift installation on a provider or hardware is in conformance. | |
|
||
Here you can find the initial steps to use the OpenShift Provider Certification Tool. | ||
|
||
- [User Guide](./user.md) | ||
- [Installation Check List](./user-installation-checklist.md) | ||
- [Installation Review](./user-installation-review.md) | ||
- [Results Review](./user-results-review.md) | ||
- [Support Guide](./support-guide.md) | ||
- [Development Guide](./dev.md) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,272 @@ | ||
# OpenShift Provider Certification Tool - Support Guide | ||
|
||
- [Support Case Check List](#check-list) | ||
- [New Support Cases](#check-list-new-case) | ||
- [New Executions](#check-list-new-executions) | ||
- [Setting up the Review Environment](#setup) | ||
- [Install tools](#setup-install) | ||
- [Download dependencies](#setup-download-baseline) | ||
- [Download Partner Results](#setup-download-results) | ||
- [Review guide: exploring the failed tests](#review-process) | ||
- [Exploring the failures](#review-process-exploring) | ||
- [Extracting the failures to the local directory](#review-process-extracting) | ||
- [Explaining the extracted files](#review-process-explain) | |
- [Review Guidelines](#review-process-guidelines) | ||
|
||
|
||
## Support Case Check List <a name="check-list"></a> | ||
|
||
### New Support Cases <a name="check-list-new-case"></a> | ||
|
||
Checklist of items to require when a **new** support case is opened: | |
|
||
- Documentation: installation steps, including the flavors/sizes of the infrastructure and the steps to install OCP | |
- Documentation: Diagram of the Architecture including zonal deployment | ||
- Archive with Certification results | ||
- Archive with must-gather | ||
- [Installation Checklist (file `user-installation-checklist.md`)](./user-installation-checklist.md) with the partner's updates to sign off the post-installation items | |
|
||
### New Executions <a name="check-list-new-executions"></a> | ||
|
||
The following certification assets should be updated when any of the conditions listed in this section occur: | |
|
||
- Certification Results | ||
- Must Gather | ||
- Install Documentation (when any item/flavor/configuration has been modified) | ||
|
||
|
||
The following conditions require new certification assets: | ||
|
||
- The version of the OpenShift Container Platform has been updated | ||
- Any Infrastructure component(s) (e.g.: server size, disk category, ELB type/size/config) or cluster dependencies (e.g.: external storage backend for image registry) have been modified | ||
|
||
|
||
## Review Environment <a name="setup"></a> | ||
|
||
### Install Tools <a name="setup-install"></a> | ||
|
||
- Download the [openshift-provider-cert](./user.md#install): OpenShift Provider Certification tool | ||
- Download the [`omg`](https://github.com/kxr/o-must-gather): tool to analyse Must-gather archive | ||
```bash | ||
pip3 install o-must-gather --user | ||
``` | ||
|
||
### Download Baseline CI results <a name="setup-download-baseline"></a> | ||
|
||
The OpenShift provider certification tool is run periodically ([source code](https://github.com/openshift/release/blob/master/ci-operator/jobs/redhat-openshift-ecosystem/provider-certification-tool/redhat-openshift-ecosystem-provider-certification-tool-main-periodics.yaml)) in OpenShift CI using the latest stable release of OpenShift. | |
These baseline results are stored long-term in an AWS S3 bucket (`s3://openshift-provider-certification/baseline-results`). An HTML listing can be found here: https://openshift-provider-certification.s3.us-west-2.amazonaws.com/index.html. | ||
These baseline results should be used as a reference when reviewing a partner's certification results. | ||
|
||
1. Identify cluster version in the partner's must gather: | ||
```bash | ||
$ omg get clusterversion | ||
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS | ||
version 4.11.13 True False 11h Cluster version is 4.11.13 | ||
``` | ||
2. Navigate to https://openshift-provider-certification.s3.us-west-2.amazonaws.com/index.html and find the latest results (by date) for the matching OpenShift version | ||
3. Download the *latest* test results for the version (bottom of the list). Copy the results archive link from the webpage in the previous step. | |
```bash | ||
$ curl --output 4.11.13-20221125.tar.gz https://openshift-provider-certification.s3.us-west-2.amazonaws.com/baseline-results/4.11.13-20221125.tar.gz | ||
$ file 4.11.13-20221125.tar.gz | ||
4.11.13-20221125.tar.gz: gzip compressed data, original size modulo 2^32 430269440 | ||
``` | ||
|
||
4. Proceed with comparing baseline results with actual provider results. | ||
- Download the suite test list for the version used by the partner | ||
|
||
```bash | ||
# Replace the placeholder with the OpenShift release used by the partner. | |
RELEASE_VERSION="4.11.4->CHANGE_ME" | |
TESTS_IMG=$(oc adm release info "${RELEASE_VERSION}" --image-for='tests') | |
oc image extract "${TESTS_IMG}" --file="/usr/bin/openshift-tests" | |
chmod u+x ./openshift-tests | ||
./openshift-tests run --dry-run kubernetes/conformance > ./test-list_openshift-tests_kubernetes-conformance.txt | ||
./openshift-tests run --dry-run openshift/conformance > ./test-list_openshift-tests_openshift-validated.txt | ||
``` | ||
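
Before comparing results, it can help to confirm that the downloaded baseline archive is intact. The sketch below is not part of the certification tool: `verify_archive` is a hypothetical helper name, and it only assumes that the archive is a gzip-compressed tarball, as reported by `file` in step 3.

```bash
# Hypothetical helper (not part of openshift-provider-cert).
# Returns 0 when the file is a readable gzip-compressed tar archive.
verify_archive() {
  archive="$1"
  # The file must exist and be non-empty.
  [ -s "$archive" ] || { echo "missing or empty: $archive" >&2; return 1; }
  # gzip -t checks the compressed stream; tar -tzf walks the member list.
  gzip -t "$archive" 2>/dev/null || { echo "corrupt gzip: $archive" >&2; return 1; }
  tar -tzf "$archive" >/dev/null 2>&1 || { echo "unreadable tar: $archive" >&2; return 1; }
  echo "ok: $archive"
}
```

For example, `verify_archive ./4.11.13-20221125.tar.gz` prints `ok: ./4.11.13-20221125.tar.gz` when the download completed correctly.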
|
||
### Download Partner Results <a name="setup-download-results"></a> | ||
|
||
- Download the Provider certification archive from the Support Case. Example file name: `retrieved-archive.tar.gz` | ||
- Download the Must-gather from the Support Case. Example file name: `must-gather.tar.gz` | ||
|
||
## Review guide: exploring the failed tests <a name="review-process"></a> | ||
|
||
The steps below use the `process` subcommand to apply filters to the failed tests, keeping the initial focus of the investigation on failures that are exclusive to the partner's results. | |
|
||
The filters keep only tests included in the respective suite and exclude common failures identified in the baseline results as well as flakes from CI. For more details about the filters, read the [dev documentation describing the filters flow](./dev.md#dev-diagram-filters). | |
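
To make the filter idea concrete, the sketch below approximates it with plain text lists and `grep`. It is an illustration only, not the implementation used by the `process` subcommand; the function names and the one-test-name-per-line file layout are assumptions.

```bash
# Illustration only: approximates the filters with one test name per line.
# Fixed-string matching (-F) matters because test names contain brackets.

# keep_in_suite FAILURES SUITE -> failures that also appear in the suite list
keep_in_suite() {
  grep -F -x -f "$2" "$1" || true
}

# drop_known FAILURES KNOWN -> failures NOT present in the known list
# (for example baseline failures or CI flakes)
drop_known() {
  grep -F -x -v -f "$2" "$1" || true
}
```

Chaining the two helpers, first `keep_in_suite` against the suite list and then `drop_known` against the baseline failures, leaves only the failures that still need a manual review.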
|
||
Requirements for this section: | |
|
||
- OPCT CLI downloaded to the current directory | ||
- OpenShift e2e test suite exported to the current directory | ||
- Baseline results exported to the current directory | ||
- The Certification Result is in the current directory | ||
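
The requirements above can be checked before running any command. This is a minimal sketch under the assumption that every required asset is a regular, non-empty file in the current directory; `check_required_files` is a hypothetical helper, not something shipped with the tool.

```bash
# Hypothetical pre-flight helper: prints each missing path and returns
# non-zero when at least one of the given files is absent or empty.
check_required_files() {
  missing=0
  for path in "$@"; do
    if [ ! -s "$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"
}
```

Call it with the paths used in this guide, for example the CLI binary, the two `test-list_openshift-tests_*.txt` files, the baseline archive, and the partner's result archive.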
|
||
|
||
### Exploring the failures <a name="review-process-exploring"></a> | ||
|
||
Compare the provider results with the baseline: | ||
|
||
```bash | ||
./openshift-provider-cert-linux-amd64 process \ | ||
--baseline ./opct_baseline-ocp_4.11.4-platform_none-provider-date_uuid.tar.gz \ | ||
--base-suite-ocp ./test-list_openshift-tests_openshift-validated.txt \ | ||
--base-suite-k8s ./test-list_openshift-tests_kubernetes-conformance.txt \ | ||
./<timestamp>_sonobuoy_<uuid>.tar.gz | ||
``` | ||
|
||
### Extracting the failures to a local directory <a name="review-process-extracting"></a> | ||
|
||
Compare the results and extract the files (option `--save-to`) to the local directory `./processed`: | |
|
||
```bash | ||
./openshift-provider-cert-linux-amd64 process \ | ||
--baseline ./opct_baseline-ocp_4.11.4-platform_none-provider-date_uuid.tar.gz \ | ||
--base-suite-ocp ./test-list_openshift-tests_openshift-validated.txt \ | ||
--base-suite-k8s ./test-list_openshift-tests_kubernetes-conformance.txt \ | ||
--save-to processed \ | ||
./<timestamp>_sonobuoy_<uuid>.tar.gz | ||
``` | ||
|
||
This is the expected output: | ||
|
||
> Note: the column alignment may not be preserved when the output is pasted into Markdown | |
```bash | ||
(...Header...) | ||
|
||
> Processed Summary < | ||
|
||
Total Tests suites: | ||
- kubernetes/conformance: 353 | ||
- openshift/conformance: 3488 | ||
|
||
Total Tests by Certification Layer: | ||
- openshift-kube-conformance: | ||
- Status: failed | ||
- Total: 675 | ||
- Passed: 654 | ||
- Failed: 21 | ||
- Timeout: 0 | ||
- Skipped: 0 | ||
- Failed (without filters) : 21 | ||
- Failed (Filter SuiteOnly): 2 | ||
- Failed (Filter Baseline : 2 | ||
- Failed (Filter CI Flakes): 0 | ||
- Status After Filters : pass | ||
- openshift-conformance-validated: | ||
- Status: failed | ||
- Total: 3818 | ||
- Passed: 1708 | ||
- Failed: 61 | ||
- Timeout: 0 | ||
- Skipped: 2049 | ||
- Failed (without filters) : 61 | ||
- Failed (Filter SuiteOnly): 32 | ||
- Failed (Filter Baseline : 7 | ||
- Failed (Filter CI Flakes): 2 | ||
- Status After Filters : failed | ||
|
||
Total Tests by Certification Layer: | ||
|
||
=> openshift-kube-conformance: (2 failures, 2 flakes) | ||
|
||
--> Failed tests to Review (without flakes) - Immediate action: | ||
<empty> | ||
|
||
--> Failed flake tests - Statistic from OpenShift CI | ||
Flakes Perc TestName | ||
1 0.138% [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] works for multiple CRDs of same group and version but different kinds [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s] | ||
2 0.275% [sig-api-machinery] ResourceQuota should create a ResourceQuota and capture the life of a secret. [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s] | ||
|
||
|
||
=> openshift-conformance-validated: (7 failures, 5 flakes) | ||
|
||
--> Failed tests to Review (without flakes) - Immediate action: | ||
[sig-network-edge][Feature:Idling] Unidling should handle many TCP connections by possibly dropping those over a certain bound [Serial] [Skipped:Network/OVNKubernetes] [Suite:openshift/conformance/serial] | ||
[sig-storage] CSI Volumes [Driver: csi-hostpath] [Testpattern: Dynamic PV (default fs)] provisioning should provision storage with pvc data source [Suite:openshift/conformance/parallel] [Suite:k8s] | ||
|
||
--> Failed flake tests - Statistic from OpenShift CI | ||
Flakes Perc TestName | ||
101 10.576% [sig-arch][bz-DNS][Late] Alerts alert/KubePodNotReady should not be at or above pending in ns/openshift-dns [Suite:openshift/conformance/parallel] | ||
67 7.016% [sig-arch][bz-Routing][Late] Alerts alert/KubePodNotReady should not be at or above pending in ns/openshift-ingress [Suite:openshift/conformance/parallel] | ||
2 0.386% [sig-imageregistry] Image registry should redirect on blob pull [Suite:openshift/conformance/parallel] | ||
32 4.848% [sig-network][Feature:EgressFirewall] egressFirewall should have no impact outside its namespace [Suite:openshift/conformance/parallel] | ||
11 2.402% [sig-network][Feature:EgressFirewall] when using openshift-sdn should ensure egressnetworkpolicy is created [Suite:openshift/conformance/parallel] | ||
|
||
Data Saved to directory './processed/' | ||
``` | ||
> TODO: create the index with a legend with references to the output. | ||
### Understanding the extracted results <a name="review-process-explain"></a> | ||
The data extracted to local storage contains the following files for each plugin: | ||
- `tests_${PLUGIN_NAME}_baseline_failures.txt`: List of test failures from the baseline execution
- `tests_${PLUGIN_NAME}_provider_failures.txt`: List of test failures from the provider execution
- `tests_${PLUGIN_NAME}_provider_filter1-suite.txt`: List of test failures included in the suite
- `tests_${PLUGIN_NAME}_provider_filter2-baseline.txt`: List of test failures remaining after applying all filters
- `tests_${PLUGIN_NAME}_suite_full.txt`: List of all e2e tests in the suite
The base directory (`./processed`) also contains **all error messages (stdout and failure summary)** for each failed test. The errors are saved to individual files in the following sub-directories (one set per plugin):
- `failures-baseline/${PLUGIN_NAME}_${INDEX}-failure.txt`: the error summary
- `failures-baseline/${PLUGIN_NAME}_${INDEX}-systemOut.txt`: the entire stdout of the failed plugin
Considerations:
- `${PLUGIN_NAME}`: currently the valid plugin names are [`openshift-validated`, `kubernetes-conformance`]
- `${INDEX}`: a simple index, ordered by test name in the list
Example of files on the extracted directory: | ||
```bash | ||
$ tree processed/ | ||
processed/ | ||
├── failures-baseline | ||
[redacted] | ||
├── failures-provider | ||
[redacted] | ||
├── failures-provider-filtered | ||
│ ├── kubernetes-conformance_1-1-failure.txt | ||
│ ├── kubernetes-conformance_1-1-systemOut.txt | ||
│ ├── kubernetes-conformance_2-2-failure.txt | ||
│ ├── kubernetes-conformance_2-2-systemOut.txt | ||
│ ├── openshift-validated_1-31-failure.txt | ||
│ ├── openshift-validated_1-31-systemOut.txt | ||
[redacted] | ||
│ ├── openshift-validated_7-1-failure.txt | ||
│ └── openshift-validated_7-1-systemOut.txt | ||
├── tests_kubernetes-conformance_baseline_failures.txt | ||
├── tests_kubernetes-conformance_provider_failures.txt | ||
├── tests_kubernetes-conformance_provider_filter1-suite.txt | ||
├── tests_kubernetes-conformance_provider_filter2-baseline.txt | ||
├── tests_kubernetes-conformance_suite_full.txt | ||
├── tests_openshift-validated_baseline_failures.txt | ||
├── tests_openshift-validated_provider_failures.txt | ||
├── tests_openshift-validated_provider_filter1-suite.txt | ||
├── tests_openshift-validated_provider_filter2-baseline.txt | ||
└── tests_openshift-validated_suite_full.txt | ||
|
||
3 directories, 300 files | ||
``` | ||
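
With a layout like the one listed above, a small loop can enumerate the failure summaries of one plugin and flag any that lack a matching `systemOut` file. This is a sketch that assumes the `-failure.txt` / `-systemOut.txt` naming shown in the tree; `list_failures` is a hypothetical helper name.

```bash
# Hypothetical helper: lists the failure summaries of one plugin inside a
# sub-directory of the extracted results, flagging a missing systemOut file.
list_failures() {
  dir="$1"
  plugin="$2"
  for f in "$dir"/"$plugin"_*-failure.txt; do
    [ -e "$f" ] || continue   # the glob matched nothing
    out="${f%-failure.txt}-systemOut.txt"
    if [ -e "$out" ]; then
      basename "$f"
    else
      echo "$(basename "$f") (no systemOut file)"
    fi
  done
}
```

For example, `list_failures ./processed/failures-provider-filtered openshift-validated` prints one line per failure summary in that sub-directory.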
### Review Guidelines <a name="review-process-guidelines"></a> | ||
> WIP: the idea here is to provide guidance on the main points/assets to review, pointing to the details on the respective/dedicated sections. | ||
This section is a guide to the initial files to review when you start exploring the resulting archive.
Items to review: | ||
- OCP version matches the certification request | ||
- Review the result file | ||
- Check whether the number of failures is 0; if not, check each failure one by one
- To support the review process, a spreadsheet named `failures-index.xlsx` is created inside the extracted directory (`./processed/`, as shown in the previous section). It can be used to review failures and take notes about them.
- Check the details of each failed test in the sub-directory `failures-provider-filtered/*.txt`.
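
As a quick first check of the items above, the number of failures still listed after filtering can be counted per plugin. The sketch assumes the file naming from the example tree listing (`tests_<plugin>_provider_filter2-baseline.txt`) and one test name per line; `count_remaining_failures` is a hypothetical helper.

```bash
# Hypothetical helper: prints the number of non-empty lines in the filtered
# failure list of a plugin, or fails when the list file does not exist.
count_remaining_failures() {
  list="$1/tests_${2}_provider_filter2-baseline.txt"
  [ -f "$list" ] || { echo "not found: $list" >&2; return 1; }
  # grep -c exits non-zero when the count is 0, so ignore its status.
  grep -c . "$list" || true
}
```

A result of `0` from `count_remaining_failures ./processed kubernetes-conformance` suggests that no failure from that plugin is left to review one by one.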
Additional items to review: | ||
- Explore the must-gather objects according to the findings in the failure files
- Run insights rules on the must-gather to check whether there is a new known issue: `insights run -p ccx_rules_ocp ${MUST_GATHER_PATH}`
> TODO: provide steps to install and run insight OCP rules (opct could provide one container with it installed to avoid overhead and environment issues) |