
Commit 3d8ca18

Operator should not overwrite the replica field when it is not set in the yaml (#805)
1 parent c2aa640 commit 3d8ca18

6 files changed: +200 −13 lines changed

docs/scaling/010_overview.adoc

Lines changed: 106 additions & 1 deletion
@@ -41,9 +41,114 @@ NOTE: The Operator will only apply safe scaling functionality to deployments tha
If a deployment is storage disabled then it can be scaled up or down by the required number of members
in one step as there is no fear of data loss in a storage disabled member.

[#problem]
== Kubectl Scale & Autoscaling

When using the `kubectl scale` command or the Kubernetes autoscaler to scale Coherence clusters, it is important
to be aware of how replicas are controlled when later applying updates to a cluster.

=== The Problem

If the replica count has been changed via one of the scaling commands, then later applying an edited version of the
original YAML may revert the replicas and inadvertently resize the cluster.

For example, a Coherence cluster can be defined with the `storage-cluster.yaml` file shown below.

[source,yaml]
.storage-cluster.yaml
----
apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: storage
spec:
  replicas: 6
----

Applying the YAML file will create a cluster with six Pods.
[source,bash]
----
kubectl apply -f storage-cluster.yaml
----

The cluster can then be scaled using `kubectl scale`:
[source,bash]
----
kubectl scale coh/storage --replicas=9
----
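The stored replica count can be read back to confirm the new size (a quick check; this assumes `kubectl scale`
updates the `spec.replicas` field via the scale subresource, which is the behavior this document relies on):
[source,bash]
----
kubectl get coh/storage -o jsonpath='{.spec.replicas}'
----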
The cluster now has nine Pods. Later, an update is applied to add a new system property to the JVM by
editing the original YAML file:

[source,yaml]
.storage-cluster.yaml
----
apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: storage
spec:
  replicas: 6
  jvm:
    args:
      - "-Dfoo=bar"
----

The update can be applied using `kubectl apply`:
[source,bash]
----
kubectl apply -f storage-cluster.yaml
----

But the YAML file contains the original replica count, so the cluster will be scaled back from nine to six Pods.
This is probably not what was desired.

One solution is to always make sure the `replicas` field in the YAML is set to the "correct" value before applying it.
This can be awkward in some environments, especially when using the Kubernetes HPA to control scaling.

=== The Solution

If you intend to use `kubectl scale` or the Kubernetes HPA to control scaling, it is best not to set the
`replicas` field in the YAML file used to create and update the Coherence cluster.

For example, the initial YAML above could have been created like this:

[source,yaml]
.storage-cluster.yaml
----
apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: storage
----

After applying the YAML, the cluster would start at the default size of three Pods.
After creation, the cluster can easily be scaled up to its required size using `kubectl scale`.
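For example, the same scale command used earlier takes the cluster to nine replicas:
[source,bash]
----
kubectl scale coh/storage --replicas=9
----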
Later, when applying the system property update, the `replicas` field is again left unset in the YAML file:
[source,yaml]
.storage-cluster.yaml
----
apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: storage
spec:
  jvm:
    args:
      - "-Dfoo=bar"
----

Now, when the above YAML is applied using `kubectl apply`, the replicas value for the cluster will be unchanged and
will stay at whatever value it was scaled to.
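This works because the commit changes the Operator to treat an absent `replicas` field as "no opinion" rather than
as a request for the default size. A minimal sketch of that decision logic, using hypothetical names rather than the
Operator's actual code:

[source,go]
----
package scaling

// effectiveReplicas sketches the reconcile decision: a nil spec.replicas
// means "no opinion", so the current size of the running cluster is kept.
func effectiveReplicas(specReplicas *int32, currentReplicas int32) int32 {
	if specReplicas == nil {
		// The replicas field was not set in the applied YAML: keep the scaled size.
		return currentReplicas
	}
	// The field was set: treat it as the desired size.
	return *specReplicas
}
----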
Another solution would be to create the initial YAML with the `replicas` field set to the desired initial size.
Then, when later applying updates, ensure that the `replicas` field has been deleted from the YAML file.

== Controlling Safe Scaling

-The `Coherence` CRD has a number of fields that control the behaviour of scaling.
+The `Coherence` CRD has a number of fields that control the behavior of scaling.

=== Scaling Policy
examples/200_autoscaler/README.adoc

Lines changed: 7 additions & 1 deletion
@@ -32,12 +32,18 @@ The diagram below shows, at a high level, how this works.
image::images/autoscaler.png[]

-Prometheus will obtain metrics from the Coherence Pod's metrics endpoints.
+Prometheus will get metrics from the Coherence Pod's metrics endpoints.
The Prometheus Adapter exposes certain configured metrics polled from Prometheus as custom Kubernetes metrics.
The HPA is configured to poll the custom metrics and use those to scale the `Coherence` resource (which will in turn cause
the Coherence Operator to scale the `StatefulSet`).

[IMPORTANT]
====
When using the HPA it is important to understand how the `replicas` field in the Coherence resource
is applied, to avoid inadvertently resizing an autoscaled cluster when applying updates to the YAML.

See the explanation in <<docs/scaling/010_overview.adoc#problem,Scale Coherence Deployments - The Problem>>
====
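For reference, an HPA that drives a `Coherence` resource might look like the sketch below. This is a minimal sketch
only: the HPA name, the custom metric name `coherence_memory_used`, and the target value are placeholders for
whatever the Prometheus Adapter actually exposes in a given setup. Note that no `replicas` value appears anywhere,
so the HPA alone owns the cluster size.

[source,yaml]
----
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: storage-hpa
spec:
  scaleTargetRef:
    apiVersion: coherence.oracle.com/v1
    kind: Coherence
    name: storage
  minReplicas: 3
  maxReplicas: 9
  metrics:
    - type: Pods
      pods:
        metric:
          name: coherence_memory_used  # placeholder custom metric
        target:
          type: AverageValue
          averageValue: 2Gi
----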
== Autoscaling Coherence Clusters

test/certification/certifiy_deployment_test.go

Lines changed: 46 additions & 0 deletions
@@ -84,10 +84,56 @@ func TestCertifyScaling(t *testing.T) {
	g.Expect(err).NotTo(HaveOccurred())
}

// Test the scenario where we create a Coherence cluster without a replicas field, which will default to three Pods.
// Then scale up the cluster to four.
// Then apply an update using the same Coherence resource with no replicas field.
// After the update is applied, the cluster size should still be four and not revert to three.
func TestCertifyScalingWithUpdate(t *testing.T) {
	// Ensure that everything is cleaned up after the test!
	testContext.CleanupAfterTest(t)
	g := NewGomegaWithT(t)

	ns := helper.GetTestClusterNamespace()

	// the name of the cluster from scale-with-update-one.yaml and scale-with-update-two.yaml
	name := "certify-scale-update"

	// Start with the default three replicas
	err := apply(t, ns, "scale-with-update-one.yaml")
	g.Expect(err).NotTo(HaveOccurred())
	_, err = helper.WaitForPodsWithLabel(testContext, ns, "one=testOne", 3, time.Second*10, time.Minute*10)
	g.Expect(err).NotTo(HaveOccurred())

	// Scale up to four
	err = scale(t, ns, name, 4)
	g.Expect(err).NotTo(HaveOccurred())
	_, err = helper.WaitForStatefulSet(testContext, ns, name, 4, time.Second*10, time.Minute*5)
	g.Expect(err).NotTo(HaveOccurred())

	// Apply the update
	err = apply(t, ns, "scale-with-update-two.yaml")
	g.Expect(err).NotTo(HaveOccurred())
	// There should eventually be four Pods with the additional label
	_, err = helper.WaitForPodsWithLabel(testContext, ns, "two=testTwo", 4, time.Second*10, time.Minute*10)
	g.Expect(err).NotTo(HaveOccurred())
}

func scale(t *testing.T, namespace, name string, replicas int32) error {
	cmd := exec.Command("kubectl", "-n", namespace, "scale", fmt.Sprintf("--replicas=%d", replicas), "coherence/"+name)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	t.Log("Executing Scale Command: " + strings.Join(cmd.Args, " "))
	return cmd.Run()
}

func apply(t *testing.T, namespace, fileName string) error {
	actualFile, err := helper.FindActualFile(fileName)
	if err != nil {
		return err
	}
	cmd := exec.Command("kubectl", "-n", namespace, "apply", "-f", actualFile)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	t.Log("Executing Kubectl Command: " + strings.Join(cmd.Args, " "))
	return cmd.Run()
}
scale-with-update-one.yaml

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: certify-scale-update
spec:
  labels:
    one: testOne
  readinessProbe:
    initialDelaySeconds: 10
    periodSeconds: 10
scale-with-update-two.yaml

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: certify-scale-update
spec:
  labels:
    one: testOne
    two: testTwo

test/e2e/remote/scaling_test.go

Lines changed: 23 additions & 11 deletions
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2020, 2024, Oracle and/or its affiliates.
+ * Copyright (c) 2020, 2025, Oracle and/or its affiliates.
  * Licensed under the Universal Permissive License v 1.0 as shown at
  * http://oss.oracle.com/licenses/upl.
  */
@@ -9,22 +9,24 @@ package remote
 import (
 	goctx "context"
 	"fmt"
+	"io"
+	"strings"
+	"testing"
+	"time"
+
 	cohv1 "github.com/oracle/coherence-operator/api/v1"
 	"github.com/oracle/coherence-operator/test/e2e/helper"
 	"golang.org/x/net/context"
-	"io"
 	appsv1 "k8s.io/api/apps/v1"
+	"k8s.io/apimachinery/pkg/types"
 	"k8s.io/utils/ptr"
 	"sigs.k8s.io/testing_frameworks/integration"
-	"strings"
-	"testing"
-	"time"

 	. "github.com/onsi/gomega"
 )

 // Test scaling up and down with different policies.
-// This test is an example of using sub-tests to run the test with different test cases.
+// This test is an example of using subtests to run the test with different test cases.
 func TestScaling(t *testing.T) {
 	// Ensure that everything is cleaned up after the test!
 	testContext.CleanupAfterTest(t)
@@ -91,7 +93,7 @@ func TestScaleDownToZeroWithSuspendFalse(t *testing.T) {
 }

 // If a deployment is scaled down to zero it should be deleted and just its parent Coherence resource should remain.
-// This test scales down using the "kubectl scale --relicas=0" command
+// This test scales down using the "kubectl scale --replicas=0" command
 func TestScaleDownToZeroUsingKubectl(t *testing.T) {
 	// Ensure that everything is cleaned up after the test!
 	testContext.CleanupAfterTest(t)
@@ -138,7 +140,7 @@ var kubeCtlScaler = func(t *testing.T, d *cohv1.Coherence, replicas int32) error
 }

 // Assert that a deployment can be created and scaled using the specified policy.
-func assertScale(t *testing.T, id string, policy cohv1.ScalingPolicy, replicasStart, replicasScale int32, scaler ScaleFunction) {
+func assertScale(t *testing.T, id string, policy cohv1.ScalingPolicy, replicasStart, replicasScale int32, scaler ScaleFunction) types.NamespacedName {
 	g := NewGomegaWithT(t)

 	testContext.CleanupAfterTest(t)
@@ -153,16 +155,24 @@ func assertScale(t *testing.T, id string, policy cohv1.ScalingPolicy, replicasSt
 	// Give the deployment a unique name based on the test name
 	deployment.SetName(fmt.Sprintf("%s-%s", deployment.GetName(), strings.ToLower(id)))

-	// update the replica count and scaling policy
-	deployment.SetReplicas(replicasStart)
+	// update the replica count if greater than or equal to zero, otherwise do not set the replica count field
+	var initialReplicas int32
+	if replicasStart >= 0 {
+		deployment.SetReplicas(replicasStart)
+		initialReplicas = replicasStart
+	} else {
+		deployment.Spec.Replicas = nil
+		initialReplicas = cohv1.DefaultReplicas
+	}

+	// update the scaling policy
 	if deployment.Spec.Scaling == nil {
 		deployment.Spec.Scaling = &cohv1.ScalingSpec{}
 	}
 	deployment.Spec.Scaling.Policy = &policy

 	// Do the canary test unless parallel scaling down
-	doCanary := replicasStart < replicasScale || policy != cohv1.ParallelScaling
+	doCanary := initialReplicas < replicasScale || policy != cohv1.ParallelScaling

 	t.Logf("assertScale() - doCanary=%t", doCanary)
 	t.Log("assertScale() - Installing Coherence deployment...")
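A negative replicasStart now means "leave spec.replicas unset so the Operator default applies". No caller passing a
negative value appears in this diff; a hypothetical subtest exercising that path could look like this:

	// Hypothetical case: create with no replicas field (size defaults to cohv1.DefaultReplicas), then scale to 4.
	assertScale(t, "UnsetReplicas", cohv1.ParallelScaling, -1, 4, kubeCtlScaler)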
@@ -186,6 +196,8 @@ func assertScale(t *testing.T, id string, policy cohv1.ScalingPolicy, replicasSt
 		err = helper.CheckCanary(testContext, namespace, deployment.Name)
 		g.Expect(err).NotTo(HaveOccurred())
 	}
+
+	return types.NamespacedName{Namespace: deployment.Namespace, Name: deployment.Name}
 }

 func assertScaleDownToZero(t *testing.T, id string, scaler ScaleFunction, suspend *bool) {
