Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add EmergencyPhase test case #57

Merged
merged 4 commits into from
Apr 13, 2023
Merged

Conversation

harpratap
Copy link
Contributor

What this PR does / why we need it:

Test case for Emergency mode
The difference here would be the minReplicas == maxReplicas and TargetUtilization is 90 (same as default)

Which issue(s) this PR fixes:

Fixes # #2

Special notes for your reviewer:

@harpratap harpratap self-assigned this Apr 11, 2023
}

err := tc.createResources(ctx, k8sClient, cfg)
Expect(err).ShouldNot(HaveOccurred())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This step will fail because VPA and HPA already exists. I added cleanUP() but even after deleting mercari tortoise the child VPA and HPA still exists. Is this expected?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I completely forgot to add the cleanup, thanks.

even after deleting mercari tortoise the child VPA and HPA still exists.

Yes, it's the expected behavior at the moment. Currently the controller doesn't delete VPA or HPA managed by tortoises when we delete tortoises.

I think at least we can delete VPAs as you expected because the controller creates the VPAs when tortoise is created.
But, for HPA, there're a possibility that HPA is created by the users and attached to tortoise by them/ So probably we don't want to delete it in that case.

Anyway, I'll create the issue, thanks for bringing this up. 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#58

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, please just delete all VPAs and HPAs in cleanUp fn 🙏

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added - fa6fa12

@harpratap harpratap marked this pull request as ready for review April 11, 2023 07:58
controllers/tortoise_controller_test.go Outdated Show resolved Hide resolved
wantHPA.Spec.MaxReplicas = 20
for i, m := range wantHPA.Spec.Metrics {
if m.External != nil && m.External.Metric.Name == "datadogmetric@default:mercari-app-cpu-app" {
wantHPA.Spec.Metrics[i].External.Target.Value = resourceQuantityPtr(resource.MustParse("90"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it should be 75... I added the change to fix the bug causing this. Could you rebase this branch upon the latest main branch and change here to 75?

Copy link
Contributor Author

@harpratap harpratap Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanposhiho Thanks, done! BTW just curious why is it a bug? I think it won't make any difference if the targetUtilization is 0% or 100% when minReplica == maxReplica

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was this integration test specific bug.

it won't make any difference if the targetUtilization is 0% or 100% when minReplica == maxReplica

This is true. But, tortoise's expected behavior here is updating the target utilization like the tortoise does during the non-emergency mode although, in actual case, any target utilization doesn't give any impact on HPA's behavior.


To exactly explain how this bug happened (sorry super long story):

We create tortoise and then update tortoise status later in createResources fn in this integration test.
So, after creating tortoise and before updating status, the controller will try to reconcile the tortoise.
This test case specifies the emergency mode, so the tortoise controller regards the tortoise as finishing the initialization step and tries to proceed creating the recommendation on the tortoise status. (In initialization step, the controller fills some required fields in tortoise's status)

But, the initialization step isn't actually finished and updating the tortoise status get failed because some required field in status are missing. Before introducing this bug fix, the controller is handling the tortoise like:

  1. calculate the recommendation
  2. update HPA based on calculated recommendation.
  3. update tortoise status with calculated recommendations.

So, if failing to update tortoise status, HPA and VPA have been updated based on calculated recommendations, but tortoise status cannot get the recommendation on the status.

Also, during (2), In this test case, the CPU target utilization gets changed 50 -> 75 because:

  • the VPA recommendation is 3 core, the current HPA target utilization is 50% (=2 core), and the CPU request is 4 cores.
  • VPA recommendation showed that the target Pod is sometimes using 1 more core than the target utilization. In other words, we only need to give 1 additional core (= 25% given the CPU req is 4 core)

Then after updating tortoise status, the controller again reconciles the tortoise; calculate the recommendation again. The tortoise is calculating the target utilization recommendation based on the VPA's recommendation value and HPA's current target utilization.
But, the HPA was updated at the last reconciliation (50% -> 75%), but the VPA doesn't change its recommendation because the VPA in the integration test is just fake one.
So the recommendation is calculated like:

  • the VPA recommendation is 3 core (not updated), the current HPA target utilization is 75% (=3 core), and the CPU request is 4 cores.
  • VPA recommendation showed that the target Pod is using exactly the same core as the HPA target utilization. In other words, we don't need to give additional core to this Pod. (shouldn't actually happen)

So, the tortoise recommend 100% CPU target utilization on HPA in this reconciliation, but we configure that 90% is the maximum, so at final, the tortoise gives 90% target utilization on HPA.

This bug fix changed the controller to update the status once before updating HPA and VPA, so we can now prevent this from happening.

controllers/tortoise_controller_test.go Outdated Show resolved Hide resolved
@sanposhiho sanposhiho merged commit 57e5a79 into main Apr 13, 2023
@sanposhiho sanposhiho deleted the harry/integration-test/emergency branch April 13, 2023 02:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants