
k8s: Wait for restarted broker to catch up #7594

Conversation

@RafalKorepta (Contributor) commented Dec 1, 2022

UnderReplicatedPartitionThreshold regulates when a rolling update will continue with
restarts. The procedure can be described as follows:

  1. The rolling update checks whether the Pod specification needs to be replaced, and deletes the Pod
  2. The deleted Redpanda Pod is put into maintenance mode (a postStart hook will disable
    maintenance mode when the new Pod starts)
  3. The rolling update waits for the Pod to be in the Ready state
  4. The rolling update checks whether the cluster is in a healthy state
  5. The rolling update checks whether the restarted Redpanda Pod's admin API Ready endpoint returns an HTTP 200 response
  6. Each under-replicated partition metric is compared against UnderReplicatedPartitionThreshold (see the sketch after this section)
  7. The rolling update moves on to the next Redpanda Pod

The metric vectorized_cluster_partition_under_replicated_replicas is used in the comparison.

The mentioned metric has the following help description:
vectorized_cluster_partition_under_replicated_replicas Number of under replicated replicas

By default, UnderReplicatedPartitionThreshold is 0, which means all partitions need to catch up without any lag.
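As a rough sketch of the check in steps 5 and 6 (not the PR's exact code): fetch the restarted broker's /metrics endpoint and parse it with the Prometheus text-format parser, then compare every gauge of the metric against the threshold. The function name, URL, and error wording below are illustrative assumptions; the parser is github.com/prometheus/common/expfmt, which matches the `m.Gauge`/`metricFamily` usage visible in the quoted review snippets.

```go
package main

import (
	"fmt"
	"net/http"

	"github.com/prometheus/common/expfmt"
)

// underReplicatedWithinThreshold fetches a broker's /metrics endpoint and
// returns an error if any under-replicated-replicas gauge exceeds threshold.
func underReplicatedWithinThreshold(metricsURL string, threshold float64) error {
	resp, err := http.Get(metricsURL)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		return err
	}

	mf, ok := families["vectorized_cluster_partition_under_replicated_replicas"]
	if !ok {
		return nil // metric not exposed; nothing to compare
	}
	for _, m := range mf.GetMetric() {
		if m.GetGauge() == nil {
			continue // no gauge value recorded for this label set
		}
		if v := m.GetGauge().GetValue(); v > threshold {
			return fmt.Errorf("broker reported under replicated partitions: %v > threshold %v", v, threshold)
		}
	}
	return nil
}

func main() {
	// Hypothetical per-Pod address; 9644 is Redpanda's default admin API port.
	if err := underReplicatedWithinThreshold("http://redpanda-0.redpanda:9644/metrics", 0); err != nil {
		fmt.Println("broker still catching up:", err)
	}
}
```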

Backports Required

  • none - not a bug fix
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v22.3.x
  • v22.2.x
  • v22.1.x

UX Changes

Release Notes

Features

  • Wait for under-replicated partitions to catch up

REF

#3023 (comment)

@RafalKorepta RafalKorepta requested a review from a team as a code owner December 1, 2022 15:50
@RafalKorepta RafalKorepta force-pushed the rk/gh-3023/check-under-replicated-partitions-in-upgrade-procedure branch from c5fe25d to 38e0c58 on January 2, 2023 10:21
@nicolaferraro (Member) left a comment

Just a minor issue on the logic. Other than that, it looks good.

		continue
	}
	if m.Gauge == nil {
		r.logger.Info("cluster_partition_under_replicated_replicas metric does not have value", "labels", m.Label)
Member:
continue missing

Contributor (Author):
thx added

@@ -139,6 +140,7 @@ func NewStatefulSet(
	decommissionWaitInterval,
	logger.WithValues("Kind", statefulSetKind()),
	nil,
	0,
Member:
It seems there's no way to customize this. Is this expected?

Contributor (Author):
Yes and no. In an ideal world, the Redpanda operator would have a closed loop for rolling updates: when it detects the need for a restart, Redpanda would be throttled to the point where the operator can observe no lag on any Redpanda instance. That is the ideal, but in a real deployment we cannot throttle producers, so there are two options:

  1. a threshold in the Cluster custom resource definition
  2. a threshold in the operator configuration (currently command line flags)

I would lean towards a threshold in the Cluster custom resource definition.

What are your thoughts?

Contributor:
Yeah, I would also feel better if this were customizable - in case the cluster never catches up, we can at least tweak it and let it move on...

I am fine with both options you presented.

Member:
There's a restartConfig section in the CRD where this can fit. I'm thinking of cases where we do an upgrade between slightly incompatible versions and the cluster is unable to replicate partitions until all nodes are at the same version. The flag allows us to unlock the upgrade.
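A hypothetical sketch of how that could look in the CRD types; the field name matches the PR description, but the exact type, JSON tag, and doc comment here are assumptions (the actual diff appears, truncated, later in this conversation):

```go
// RestartConfig configures the rolling-restart behaviour of the cluster.
type RestartConfig struct {
	// DisableMaintenanceModeHooks deactivates the preStop and postStart hooks.
	DisableMaintenanceModeHooks *bool `json:"disableMaintenanceModeHooks,omitempty"`

	// UnderReplicatedPartitionThreshold controls when the rolling update may
	// continue: a restarted broker must report at most this many
	// under-replicated replicas before the next Pod is restarted.
	UnderReplicatedPartitionThreshold int `json:"underReplicatedPartitionThreshold,omitempty"`
}
```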

if err = r.queryRedpandaUnderReplicatedPartition(ctx, &adminURL); err != nil {
	return &RequeueAfterError{
		RequeueAfter: RequeueDuration,
		Msg:          fmt.Sprintf("under replicated partition is not ready: %v", err),
Member:
(non-blocking) Maybe a better description of the error...

Contributor (Author):
I will try with `broker reported under replicated partitions: %v`.

@RafalKorepta RafalKorepta force-pushed the rk/gh-3023/check-under-replicated-partitions-in-upgrade-procedure branch from 38e0c58 to c5bb116 on January 2, 2023 11:26
@alenkacz (Contributor) left a comment
Thanks, great addition to make the rollout safer! I left some comments though...

src/go/k8s/pkg/resources/statefulset.go (outdated; resolved)

src/go/k8s/pkg/resources/statefulset_update.go (outdated; resolved)
client := &http.Client{Timeout: adminAPITimeout}

// TODO right now we support TLS only on one listener so if external
// connectivity is enabled, TLS is enabled only on external listener. This
Contributor:
I don't think this is true; TLS could be enabled on the internal listener as well. I think the behavior you're describing is true only for the schema registry.

Contributor (Author):
To be brutally honest, I copied it from your previous implementation:

// TODO right now we support TLS only on one listener so if external
// connectivity is enabled, TLS is enabled only on external listener. This
// will be fixed by https://github.com/redpanda-data/redpanda/issues/1084

6235e96#diff-5619b1cd9bdb790f53d380d84b884e952da245749cb78145fdf8f6b328b1be2cR182-R184

}

for name, metricFamily := range metrics {
	if name != "vectorized_cluster_partition_under_replicated_replicas" {
Contributor:
Is this a per-node metric? Don't we have to look at the metric for this particular node?

Contributor (Author):
I will ask core how to discover under-replicated replicas.

Contributor (Author):
To be honest, I forgot how I implemented this function. We query the current Pod. You can see that the headless service with the exact Pod name is used:

Host: fmt.Sprintf("%s.%s", pod.Name, headlessServiceWithPort),

That means the metrics come from a single Pod.

Contributor:
oh I see, thanks
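A minimal sketch of the per-Pod addressing described above; brokerMetricsURL and headlessServiceWithPort are illustrative names, not the operator's exact identifiers:

```go
package main

import (
	"fmt"
	"net/url"
)

// brokerMetricsURL targets one specific broker Pod through the StatefulSet's
// headless service, so the scraped metrics come from that Pod only.
func brokerMetricsURL(podName, headlessServiceWithPort string) string {
	u := url.URL{
		Scheme: "http",
		Host:   fmt.Sprintf("%s.%s", podName, headlessServiceWithPort),
		Path:   "/metrics",
	}
	return u.String()
}

func main() {
	// Hypothetical names: Pod redpanda-0 behind the redpanda headless service.
	fmt.Println(brokerMetricsURL("redpanda-0", "redpanda.default.svc.cluster.local.:9644"))
	// Output: http://redpanda-0.redpanda.default.svc.cluster.local.:9644/metrics
}
```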

@RafalKorepta RafalKorepta force-pushed the rk/gh-3023/check-under-replicated-partitions-in-upgrade-procedure branch 3 times, most recently from 12aac25 to 4515dc8, on January 4, 2023 15:20
@RafalKorepta (Contributor, Author) commented Jan 5, 2023

@RafalKorepta RafalKorepta force-pushed the rk/gh-3023/check-under-replicated-partitions-in-upgrade-procedure branch 3 times, most recently from af460c4 to 77934ba, on January 5, 2023 13:46
alenkacz previously approved these changes Jan 6, 2023

@alenkacz (Contributor) left a comment
LGTM, no blockers

@@ -177,6 +177,26 @@ type ClusterSpec struct {
type RestartConfig struct {
	// DisableMaintenanceModeHooks deactivates the preStop and postStart hooks that force nodes to enter maintenance mode when stopping and exit maintenance mode when up again
	DisableMaintenanceModeHooks *bool `json:"disableMaintenanceModeHooks,omitempty"`

	// UnderReplicatedPartitionThreshold regulate when rolling update will continue with
Contributor:
nit: regulates - or rather controls

Contributor (Author):
Fixed

@@ -217,6 +217,19 @@ func TestDefault(t *testing.T) {
		redpandaCluster.Default()
		assert.Equal(t, v1alpha1.DefaultLicenseSecretKey, redpandaCluster.Spec.LicenseRef.Key)
	})

	t.Run("", func(t *testing.T) {
Contributor:
nit: should the test have a name?

Contributor (Author):
Test name added


func TestEvaluateRedpandaUnderReplicatedPartition(t *testing.T) {
	ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		f, err := os.Open("testdata/metrics.gold.txt")
Contributor:
nit: I think the files are supposed to be called golden, not gold

Contributor (Author):
Thank you, name changed.
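A minimal sketch of the golden-file test pattern discussed in this thread, reusing the hypothetical underReplicatedWithinThreshold helper from the earlier sketch; the path testdata/metrics.golden.txt and the test name are illustrative:

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

func TestUnderReplicatedWithinThreshold(t *testing.T) {
	// Replay a recorded /metrics payload from a golden file so the check can
	// run against realistic broker output without a live cluster.
	ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeFile(w, r, "testdata/metrics.golden.txt")
	}))
	defer ts.Close()

	if err := underReplicatedWithinThreshold(ts.URL, 0); err != nil {
		t.Errorf("expected all partitions caught up at threshold 0: %v", err)
	}
}
```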

Rafal Korepta added 2 commits January 6, 2023 11:13
When a rolling restart is performed, the broker might need to catch up with
other replicas. This change parses the /metrics endpoint to see if
`vectorized_cluster_partition_under_replicated_replicas` is under the threshold.

Kminion will create artificial load so that there are values for the
`vectorized_cluster_partition_under_replicated_replicas` metric.
@RafalKorepta RafalKorepta force-pushed the rk/gh-3023/check-under-replicated-partitions-in-upgrade-procedure branch from 77934ba to f40b53c on January 6, 2023 10:14
@RafalKorepta RafalKorepta merged commit 5edf3ad into redpanda-data:dev Jan 6, 2023
joejulian pushed a commit to joejulian/redpanda that referenced this pull request Mar 10, 2023
joejulian pushed a commit to joejulian/redpanda that referenced this pull request Mar 24, 2023