Make PVC scaling more idempotent & resilient to crashes #838

Merged
merged 5 commits into main from persistence_scaling on Sep 23, 2021

Conversation

@coro coro commented Sep 8, 2021

Note to reviewers: remember to look at the commits in this PR and consider if they can be squashed

Summary Of Changes

Extracting PVC scaling into its own package allows this section of the
code to have its own test suite, without relying on the restrictions
placed by envtest. This lets us test more scenarios, such as where a
controller has previously crashed and left the existing API objects in a
bad state (e.g. StatefulSets missing, PVCs being the wrong size, etc.).

This also fixes #782, as the process of scaling PVCs should now be more
idempotent than before:

  • StatefulSets don't need to be present on the cluster in order to scale
    the PVCs (as long as the PVCs exist)
  • PVC scaling is done on a PVC-by-PVC basis, rather than all at once, so
    that if one or more replicas have already scaled their PVCs it does not
    block the others
  • The current size of PVCs is determined by the live reading of the K8s
    API, rather than what's reported in the RabbitmqCluster object.
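
The three points above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the code in this PR: `pvcStore` is a hypothetical in-memory stand-in for the live K8s API, sizes are plain integers (GiB) rather than resource quantities, and the PVC names are made up. Each PVC is compared against the desired size on its own, so a PVC that was already resized before a crash is simply skipped.

```go
package main

import (
	"fmt"
	"sort"
)

// pvcStore is a hypothetical stand-in for the live Kubernetes API:
// it maps a PVC name to its current requested capacity in GiB.
type pvcStore map[string]int64

// scalePVCs grows every PVC that is smaller than desired and leaves the
// others untouched. It never consults a StatefulSet or cached status, only
// the "live" sizes in the store. It returns the names it actually resized.
func scalePVCs(live pvcStore, desired int64) []string {
	names := make([]string, 0, len(live))
	for name := range live {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order for the example

	var resized []string
	for _, name := range names {
		// Already at (or above) the desired size: nothing to do.
		// Kubernetes does not support shrinking PVCs, so this sketch
		// leaves larger PVCs alone.
		if live[name] >= desired {
			continue
		}
		live[name] = desired
		resized = append(resized, name)
	}
	return resized
}

func main() {
	// Simulate a crash mid-reconcile: replica 0 was resized, 1 and 2 were not.
	live := pvcStore{
		"persistence-bunny-server-0": 20,
		"persistence-bunny-server-1": 10,
		"persistence-bunny-server-2": 10,
	}

	fmt.Println("first run resized:", scalePVCs(live, 20))
	fmt.Println("second run resized:", len(scalePVCs(live, 20)), "PVCs")
}
```

Running the function a second time resizes nothing, which is the idempotency property the list describes.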

Local Testing

Please ensure you run the unit, integration and system tests before approving the PR.

To run the unit and integration tests:

$ make unit-tests integration-tests

You will need to target a k8s cluster and have the operator deployed for running the system tests.

For example, for a Kubernetes context named dev-bunny:

$ kubectx dev-bunny
$ make destroy deploy-dev
# wait for operator to be deployed
$ make system-tests

@coro coro marked this pull request as ready for review September 9, 2021 10:11
@coro coro changed the title WIP: Extract PVC scaling to its own package Extract PVC scaling to its own package Sep 9, 2021
@coro coro changed the title Extract PVC scaling to its own package Make PVC scaling more idempotent & resilient to crashes Sep 9, 2021
@MirahImage MirahImage left a comment

I'll leave it up to you as to whether you want to leave the BeforeEach and JustBeforeEach in the scaling_test_suite or move them into the scaling_test.

@ansd ansd self-requested a review September 22, 2021 07:34
@ansd ansd left a comment

Very good changes.

Especially the unit tests are excellent 😍 ... not only the Gomega matchers you wrote but also that you use the K8s unit test packages 🙌

@coro I pushed two commits.
Logs were polluted at Info level with large PVC structs.
I set it to Debug level. Feel free to remove that log line completely.

when not defining replicas in Go struct

Before this commit, unit tests were failing unless replicas was defined
explicitly, as in:
    Spec: rabbitmqv1beta1.RabbitmqClusterSpec{
            Replicas: &one,
    },

Replicas is an optional field.
If not set, it should default to 1 instead of causing a panic.
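
A minimal sketch of that defaulting, assuming a trimmed-down, hypothetical `clusterSpec` type in place of `rabbitmqv1beta1.RabbitmqClusterSpec` and a hypothetical helper name; the actual commit may handle this differently:

```go
package main

import "fmt"

// clusterSpec is a hypothetical, reduced stand-in for RabbitmqClusterSpec.
// Replicas is a pointer because the field is optional: nil means "not set".
type clusterSpec struct {
	Replicas *int32
}

// replicasOrDefault returns the configured replica count, or 1 when the
// field was omitted, instead of dereferencing a nil pointer and panicking.
func replicasOrDefault(spec clusterSpec) int32 {
	if spec.Replicas == nil {
		return 1
	}
	return *spec.Replicas
}

func main() {
	three := int32(3)
	fmt.Println(replicasOrDefault(clusterSpec{}))                  // 1
	fmt.Println(replicasOrDefault(clusterSpec{Replicas: &three})) // 3
}
```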
Before this commit, huge "Found exising PVCs" logs
were output after every reconciliation even though PVCs didn't get
resized.
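
Why moving the line to V(1) silences it can be sketched with the numeric levels alone. This is a stdlib-only illustration, not the logr or zap API: the helper names are hypothetical, and it only assumes that a logr verbosity of n corresponds to zap level -n and that zap emits a message when its level is at or above the configured threshold.

```go
package main

import "fmt"

// Numeric values of the two zap levels discussed here.
const (
	zapDebugLevel = -1
	zapInfoLevel  = 0
)

// zapLevelFor maps a logr verbosity V(v) to the corresponding zap level.
func zapLevelFor(v int) int {
	return -v
}

// enabled reports whether a V(v) message is emitted when the logger is
// configured with the given zap threshold.
func enabled(v, threshold int) bool {
	return zapLevelFor(v) >= threshold
}

func main() {
	fmt.Println(enabled(0, zapInfoLevel))  // true: V(0) is Info
	fmt.Println(enabled(1, zapInfoLevel))  // false: V(1) is Debug, hidden at Info
	fmt.Println(enabled(1, zapDebugLevel)) // true: visible once Debug is enabled
}
```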

"By default logr's V(0) is zap's InfoLevel and V(1) is zap's DebugLevel (which is numerically -1)."
@coro coro merged commit 35d4743 into main Sep 23, 2021
@coro coro deleted the persistence_scaling branch September 23, 2021 10:25
Development

Successfully merging this pull request may close these issues.

[BUG] PVC expansion fails if the operator crashes in the middle of reconciliation