e2e: set performance profile cpus using env vars
Some tests that involve disabling load balancing have been failing lately (for example 32646) because the result does not contain the expected CPUs. Investigation showed that one contributing factor is how the profile distributes CPUs.

The PAO functional tests configure fixed CPU values in the performance profile (PP). This is a misconfiguration, especially on systems with more than 4 CPUs: the performance profile controller is not guaranteed to work correctly when the PP's CPU section does not account for all CPUs.

To resolve this, we are introducing new environment variables:
RESERVED_CPU_SET, ISOLATED_CPU_SET, and OFFLINED_CPU_SET. When they are
set, the profile uses them instead of the defaults. RESERVED_CPU_SET and
ISOLATED_CPU_SET must be set together.
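
To make the misconfiguration concrete: on a node with 8 CPUs, the fixed defaults (reserved `0`, isolated `1-3`) leave CPUs 4-7 out of the profile. The sketch below is not part of this commit. It assumes the caller already knows the node's CPU count and uses hypothetical helpers, `splitCPUs` and `cpuRange`, to derive values for the new variables so that every CPU is assigned. It ignores NUMA and hyperthread topology, which a real choice of reserved CPUs would likely take into account.

```go
package main

import "fmt"

// splitCPUs is a hypothetical helper, not part of this commit. It assigns the
// first reservedCount CPUs to the reserved set and all remaining CPUs to the
// isolated set, so that together they cover CPUs 0..total-1.
func splitCPUs(total, reservedCount int) (reserved, isolated string, err error) {
	if reservedCount < 1 || reservedCount >= total {
		return "", "", fmt.Errorf("need 1 <= reservedCount < total, got reservedCount=%d total=%d", reservedCount, total)
	}
	return cpuRange(0, reservedCount-1), cpuRange(reservedCount, total-1), nil
}

// cpuRange formats an inclusive CPU range in the Linux CPU list syntax.
func cpuRange(first, last int) string {
	if first == last {
		return fmt.Sprintf("%d", first)
	}
	return fmt.Sprintf("%d-%d", first, last)
}

func main() {
	reserved, isolated, err := splitCPUs(8, 2)
	if err != nil {
		panic(err)
	}
	// These are the values a runner could export before starting the suite.
	fmt.Printf("RESERVED_CPU_SET=%s ISOLATED_CPU_SET=%s\n", reserved, isolated)
}
```

The printed values can then be exported in the environment of the test run, leaving OFFLINED_CPU_SET unset so that the profile keeps its empty default.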
shajmakh committed Jan 9, 2024
1 parent 7820510 commit 031e6f5
Showing 3 changed files with 136 additions and 6 deletions.
100 changes: 100 additions & 0 deletions report.xml
@@ -0,0 +1,100 @@
<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="4" disabled="0" errors="0" failures="1" time="0.318217729">
<testsuite name="Performance Addon Operator configuration" package="/home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/0_config" tests="4" disabled="0" skipped="0" errors="0" failures="1" time="0.318217729" timestamp="2024-01-09T13:54:53">
<properties>
<property name="SuiteSucceeded" value="false"></property>
<property name="SuiteHasProgrammaticFocus" value="false"></property>
<property name="SpecialSuiteFailureReason" value=""></property>
<property name="SuiteLabels" value="[]"></property>
<property name="RandomSeed" value="1704801291"></property>
<property name="RandomizeAllSpecs" value="false"></property>
<property name="LabelFilter" value=""></property>
<property name="FocusStrings" value=""></property>
<property name="SkipStrings" value=""></property>
<property name="FocusFiles" value=""></property>
<property name="SkipFiles" value=""></property>
<property name="FailOnPending" value="false"></property>
<property name="FailFast" value="true"></property>
<property name="FlakeAttempts" value="2"></property>
<property name="DryRun" value="false"></property>
<property name="ParallelTotal" value="1"></property>
<property name="OutputInterceptorMode" value=""></property>
</properties>
<testcase name="[BeforeSuite]" classname="Performance Addon Operator configuration" status="passed" time="6.7445e-05">
<system-err>&gt; Enter [BeforeSuite] TOP-LEVEL - /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/0_config/test_suite_performance_config_test.go:54 @ 01/09/24 13:54:53.467&#xA;&lt; Exit [BeforeSuite] TOP-LEVEL - /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/0_config/test_suite_performance_config_test.go:54 @ 01/09/24 13:54:53.467 (0s)&#xA;</system-err>
</testcase>
<testcase name="[It] [performance][config] Performance configuration should remove OLM artifacts for performance-addon-operator" classname="Performance Addon Operator configuration" status="passed" time="0.292315322">
<system-err>&gt; Enter [BeforeEach] [performance][config] Performance configuration - /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/utils/utils.go:24 @ 01/09/24 13:54:53.467&#xA;&lt; Exit [BeforeEach] [performance][config] Performance configuration - /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/utils/utils.go:24 @ 01/09/24 13:54:53.646 (179ms)&#xA;&gt; Enter [It] should remove OLM artifacts for performance-addon-operator - /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/0_config/config.go:49 @ 01/09/24 13:54:53.646&#xA;&lt; Exit [It] should remove OLM artifacts for performance-addon-operator - /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/0_config/config.go:49 @ 01/09/24 13:54:53.76 (114ms)&#xA;</system-err>
</testcase>
<testcase name="[It] [performance][config] Performance configuration Should successfully deploy the performance profile" classname="Performance Addon Operator configuration" status="failed" time="0.002254741">
<failure message="failed to build performance profile: both reserved and isolated cpusets are required.&#xA;Unexpected error:&#xA; &lt;*errors.errorString | 0xc89d3f8&gt;: &#xA; both reserved and isolated cpusets are required.&#xA; {&#xA; s: &#34;both reserved and isolated cpusets are required.&#34;,&#xA; }&#xA;occurred" type="failed">[FAILED] failed to build performance profile: both reserved and isolated cpusets are required.&#xA;Unexpected error:&#xA; &lt;*errors.errorString | 0xc89d3f8&gt;: &#xA; both reserved and isolated cpusets are required.&#xA; {&#xA; s: &#34;both reserved and isolated cpusets are required.&#34;,&#xA; }&#xA;occurred&#xA;In [It] at: /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/0_config/config.go:70 @ 01/09/24 13:54:53.784&#xA;&#xA;There were additional failures detected after the initial failure. These are visible in the timeline&#xA;</failure>
<system-err>&gt; Enter [BeforeEach] [performance][config] Performance configuration - /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/utils/utils.go:24 @ 01/09/24 13:54:53.782&#xA;&lt; Exit [BeforeEach] [performance][config] Performance configuration - /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/utils/utils.go:24 @ 01/09/24 13:54:53.782 (0s)&#xA;&gt; Enter [It] Should successfully deploy the performance profile - /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/0_config/config.go:67 @ 01/09/24 13:54:53.782&#xA;[FAILED] Failure recorded during attempt 1:&#xA;failed to build performance profile: both reserved and isolated cpusets are required.&#xA;Unexpected error:&#xA; &lt;*errors.errorString | 0xc452058&gt;: &#xA; both reserved and isolated cpusets are required.&#xA; {&#xA; s: &#34;both reserved and isolated cpusets are required.&#34;,&#xA; }&#xA;occurred&#xA;In [It] at: /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/0_config/config.go:70 @ 01/09/24 13:54:53.783&#xA;&lt; Exit [It] Should successfully deploy the performance profile - /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/0_config/config.go:67 @ 01/09/24 13:54:53.783 (1ms)&#xA;&#xA;Attempt #1 Failed. 
Retrying ↺ @ 01/09/24 13:54:53.783&#xA;&#xA;&gt; Enter [BeforeEach] [performance][config] Performance configuration - /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/utils/utils.go:24 @ 01/09/24 13:54:53.783&#xA;&lt; Exit [BeforeEach] [performance][config] Performance configuration - /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/utils/utils.go:24 @ 01/09/24 13:54:53.783 (0s)&#xA;&gt; Enter [It] Should successfully deploy the performance profile - /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/0_config/config.go:67 @ 01/09/24 13:54:53.783&#xA;[FAILED] failed to build performance profile: both reserved and isolated cpusets are required.&#xA;Unexpected error:&#xA; &lt;*errors.errorString | 0xc89d3f8&gt;: &#xA; both reserved and isolated cpusets are required.&#xA; {&#xA; s: &#34;both reserved and isolated cpusets are required.&#34;,&#xA; }&#xA;occurred&#xA;In [It] at: /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/0_config/config.go:70 @ 01/09/24 13:54:53.784&#xA;&lt; Exit [It] Should successfully deploy the performance profile - /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/0_config/config.go:67 @ 01/09/24 13:54:53.784 (1ms)&#xA;</system-err>
</testcase>
<testcase name="[ReportAfterSuite] e2e serial suite" classname="Performance Addon Operator configuration" status="passed" time="5.0296e-05">
<system-err>&gt; Enter [ReportAfterSuite] TOP-LEVEL - /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/0_config/test_suite_performance_config_test.go:59 @ 01/09/24 13:54:53.793&#xA;&lt; Exit [ReportAfterSuite] TOP-LEVEL - /home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/0_config/test_suite_performance_config_test.go:59 @ 01/09/24 13:54:53.793 (0s)&#xA;</system-err>
</testcase>
</testsuite>
<testsuite name="" package="/home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/1_performance" tests="0" disabled="0" skipped="0" errors="0" failures="0" time="0" timestamp="0001-01-01T00:00:00">
<properties>
<property name="SuiteSucceeded" value="false"></property>
<property name="SuiteHasProgrammaticFocus" value="false"></property>
<property name="SpecialSuiteFailureReason" value="Suite did not run because prior suites failed and --keep-going is not set"></property>
<property name="SuiteLabels" value="[]"></property>
<property name="RandomSeed" value="1704801291"></property>
<property name="RandomizeAllSpecs" value="false"></property>
<property name="LabelFilter" value=""></property>
<property name="FocusStrings" value=""></property>
<property name="SkipStrings" value=""></property>
<property name="FocusFiles" value=""></property>
<property name="SkipFiles" value=""></property>
<property name="FailOnPending" value="false"></property>
<property name="FailFast" value="true"></property>
<property name="FlakeAttempts" value="2"></property>
<property name="DryRun" value="false"></property>
<property name="ParallelTotal" value="1"></property>
<property name="OutputInterceptorMode" value=""></property>
</properties>
</testsuite>
<testsuite name="" package="/home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/6_mustgather_testing" tests="0" disabled="0" skipped="0" errors="0" failures="0" time="0" timestamp="0001-01-01T00:00:00">
<properties>
<property name="SuiteSucceeded" value="false"></property>
<property name="SuiteHasProgrammaticFocus" value="false"></property>
<property name="SpecialSuiteFailureReason" value="Suite did not run because prior suites failed and --keep-going is not set"></property>
<property name="SuiteLabels" value="[]"></property>
<property name="RandomSeed" value="1704801291"></property>
<property name="RandomizeAllSpecs" value="false"></property>
<property name="LabelFilter" value=""></property>
<property name="FocusStrings" value=""></property>
<property name="SkipStrings" value=""></property>
<property name="FocusFiles" value=""></property>
<property name="SkipFiles" value=""></property>
<property name="FailOnPending" value="false"></property>
<property name="FailFast" value="true"></property>
<property name="FlakeAttempts" value="2"></property>
<property name="DryRun" value="false"></property>
<property name="ParallelTotal" value="1"></property>
<property name="OutputInterceptorMode" value=""></property>
</properties>
</testsuite>
<testsuite name="" package="/home/shajmakh/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/10_performance_ppc" tests="0" disabled="0" skipped="0" errors="0" failures="0" time="0" timestamp="0001-01-01T00:00:00">
<properties>
<property name="SuiteSucceeded" value="false"></property>
<property name="SuiteHasProgrammaticFocus" value="false"></property>
<property name="SpecialSuiteFailureReason" value="Suite did not run because prior suites failed and --keep-going is not set"></property>
<property name="SuiteLabels" value="[]"></property>
<property name="RandomSeed" value="1704801291"></property>
<property name="RandomizeAllSpecs" value="false"></property>
<property name="LabelFilter" value=""></property>
<property name="FocusStrings" value=""></property>
<property name="SkipStrings" value=""></property>
<property name="FocusFiles" value=""></property>
<property name="SkipFiles" value=""></property>
<property name="FailOnPending" value="false"></property>
<property name="FailFast" value="true"></property>
<property name="FlakeAttempts" value="2"></property>
<property name="DryRun" value="false"></property>
<property name="ParallelTotal" value="1"></property>
<property name="OutputInterceptorMode" value=""></property>
</properties>
</testsuite>
</testsuites>
35 changes: 31 additions & 4 deletions test/e2e/performanceprofile/functests/0_config/config.go
@@ -3,6 +3,7 @@ package __performance_config
 import (
 	"context"
 	"fmt"
+	"k8s.io/utils/cpuset"
 	"os"
 	"time"

@@ -65,11 +66,11 @@ var _ = Describe("[performance][config] Performance configuration", Ordered, fun

 	It("Should successfully deploy the performance profile", func() {

-		performanceProfile := testProfile()
+		performanceProfile, err := testProfile()
+		Expect(err).ToNot(HaveOccurred(), "failed to build performance profile: %v", err)
 		profileAlreadyExists := false

 		performanceManifest, foundOverride := os.LookupEnv("PERFORMANCE_PROFILE_MANIFEST_OVERRIDE")
-		var err error
 		if foundOverride {
 			performanceProfile, err = externalPerformanceProfile(performanceManifest)
 			Expect(err).ToNot(HaveOccurred(), "Failed overriding performance profile", performanceManifest)
@@ -158,9 +159,34 @@ func externalPerformanceProfile(performanceManifest string) (*performancev2.Perf
 	return profile, nil
 }

-func testProfile() *performancev2.PerformanceProfile {
+func testProfile() (*performancev2.PerformanceProfile, error) {
 	reserved := performancev2.CPUSet("0")
 	isolated := performancev2.CPUSet("1-3")
+	offlined := performancev2.CPUSet("")
+
+	customReserved, foundReserved := os.LookupEnv("RESERVED_CPU_SET")
+	customIsolated, foundIsolated := os.LookupEnv("ISOLATED_CPU_SET")
+	if foundReserved != foundIsolated {
+		return nil, fmt.Errorf("both reserved and isolated cpusets are required.")
+	}
+	if foundReserved {
+		if _, err := cpuset.Parse(customReserved); err != nil {
+			return nil, fmt.Errorf("failed to parse the provided reserved cpu set %s: %v. Using default values", customReserved, err)
+		}
+		reserved = performancev2.CPUSet(customReserved)
+
+		if _, err := cpuset.Parse(customIsolated); err != nil {
+			return nil, fmt.Errorf("failed to parse the provided isolated cpu set %s: %v. Using default values", customIsolated, err)
+		}
+		isolated = performancev2.CPUSet(customIsolated)
+	}
+	customOfflined, foundOfflined := os.LookupEnv("OFFLINED_CPU_SET")
+	if foundOfflined {
+		if _, err := cpuset.Parse(customOfflined); err != nil {
+			return nil, fmt.Errorf("failed to parse the provided offlined cpu set %s: %v. Using default values", customOfflined, err)
+		}
+		offlined = performancev2.CPUSet(customOfflined)
+	}
 	hugePagesSize := performancev2.HugePageSize("1G")

 	profile := &performancev2.PerformanceProfile{
@@ -175,6 +201,7 @@ func testProfile() *performancev2.PerformanceProfile {
 			CPU: &performancev2.CPU{
 				Reserved: &reserved,
 				Isolated: &isolated,
+				Offlined: &offlined,
 			},
 			HugePages: &performancev2.HugePages{
 				DefaultHugePagesSize: &hugePagesSize,
@@ -215,5 +242,5 @@ func testProfile() *performancev2.PerformanceProfile {
 			"pools.operator.machineconfiguration.openshift.io/master": "",
 		}
 	}
-	return profile
+	return profile, nil
 }
7 changes: 5 additions & 2 deletions test/e2e/performanceprofile/functests/README.md
@@ -17,6 +17,9 @@ This deployment takes some time and requires reboot
 Tests are executed in order of file-names
 So be careful with renaming existing or adding new suites

-DISCOVERY_MODE env variable to get an already deployed performanceProfile
+Environment variables:
+DISCOVERY_MODE: to get an already deployed performanceProfile.
 If DISCOVERY_MODE set to true the suites will search for a PerformanceProfile on the cluster and use it.
-If no PerformanceProfile is found, that suites will be skipped
+If no PerformanceProfile is found, that suites will be skipped.
+
+RESERVED_CPU_SET, ISOLATED_CPU_SET, OFFLINED_CPU_SET: strings that present the CPU sets distributed between reserved, isolated, and offlined CPU profile specifications. The runner is responsible for validating that these values are compatible with the testing environment.
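
The README leaves validation to the runner. The following is one possible pre-flight check, sketched under assumptions rather than taken from the commit. It assumes "compatible" means that the three sets are disjoint and together cover every CPU on the node, an interpretation drawn from the commit message. `parseCPUList` and `checkPartition` are hypothetical names, and `parseCPUList` is a simplified, standard-library-only stand-in for `cpuset.Parse`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList parses a Linux CPU list such as "0-3,8" into a set of CPU IDs.
// Hypothetical helper: a simplified stand-in for cpuset.Parse.
func parseCPUList(s string) (map[int]bool, error) {
	cpus := map[int]bool{}
	if strings.TrimSpace(s) == "" {
		return cpus, nil // an empty string is the empty set
	}
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(strings.TrimSpace(part), "-")
		first, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("invalid CPU %q in %q: %w", lo, s, err)
		}
		last := first
		if isRange {
			if last, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("invalid CPU %q in %q: %w", hi, s, err)
			}
		}
		if first < 0 || last < first {
			return nil, fmt.Errorf("invalid range %q in %q", part, s)
		}
		for c := first; c <= last; c++ {
			cpus[c] = true
		}
	}
	return cpus, nil
}

// checkPartition reports an error unless the given CPU lists are pairwise
// disjoint and together cover CPUs 0..total-1 (assumed compatibility rule).
func checkPartition(total int, lists ...string) error {
	owner := map[int]string{}
	for _, list := range lists {
		cpus, err := parseCPUList(list)
		if err != nil {
			return err
		}
		for c := range cpus {
			if c >= total {
				return fmt.Errorf("CPU %d is outside the node's %d CPUs", c, total)
			}
			if prev, dup := owner[c]; dup {
				return fmt.Errorf("CPU %d appears in both %q and %q", c, prev, list)
			}
			owner[c] = list
		}
	}
	if len(owner) != total {
		return fmt.Errorf("only %d of %d CPUs are assigned", len(owner), total)
	}
	return nil
}

func main() {
	// Reserved, isolated, offlined for an 8-CPU node.
	fmt.Println(checkPartition(8, "0-1", "2-7", ""))
	// The old fixed defaults on the same node leave CPUs 4-7 unassigned.
	fmt.Println(checkPartition(8, "0", "1-3", ""))
}
```

A runner could call `checkPartition` with the node's CPU count and the three environment values before starting the suite, and abort early if it returns an error.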
