-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add EmergencyPhase test case #57
Conversation
} | ||
|
||
err := tc.createResources(ctx, k8sClient, cfg) | ||
Expect(err).ShouldNot(HaveOccurred()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This step will fail because VPA and HPA already exists. I added cleanUP()
but even after deleting mercari
tortoise the child VPA and HPA still exists. Is this expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I completely forgot to add the cleanup, thanks.
even after deleting mercari tortoise the child VPA and HPA still exists.
Yes, it's the expected behavior at the moment. Currently the controller doesn't delete VPA or HPA managed by tortoises when we delete tortoises.
I think at least we can delete VPAs as you expected because the controller creates the VPAs when tortoise is created.
But, for HPA, there're a possibility that HPA is created by the users and attached to tortoise by them/ So probably we don't want to delete it in that case.
Anyway, I'll create the issue, thanks for bringing this up. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, please just delete all VPAs and HPAs in cleanUp
fn 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added - fa6fa12
wantHPA.Spec.MaxReplicas = 20 | ||
for i, m := range wantHPA.Spec.Metrics { | ||
if m.External != nil && m.External.Metric.Name == "datadogmetric@default:mercari-app-cpu-app" { | ||
wantHPA.Spec.Metrics[i].External.Target.Value = resourceQuantityPtr(resource.MustParse("90")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it should be 75... I added the change to fix the bug causing this. Could you rebase this branch upon the latest main
branch and change here to 75?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sanposhiho Thanks, done! BTW just curious why is it a bug? I think it won't make any difference if the targetUtilization is 0% or 100% when minReplica == maxReplica
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was this integration test specific bug.
it won't make any difference if the targetUtilization is 0% or 100% when minReplica == maxReplica
This is true. But, tortoise's expected behavior here is updating the target utilization like the tortoise does during the non-emergency mode although, in actual case, any target utilization doesn't give any impact on HPA's behavior.
To exactly explain how this bug happened (sorry super long story):
We create tortoise and then update tortoise status later in createResources
fn in this integration test.
So, after creating tortoise and before updating status, the controller will try to reconcile the tortoise.
This test case specifies the emergency mode, so the tortoise controller regards the tortoise as finishing the initialization step and tries to proceed creating the recommendation on the tortoise status. (In initialization step, the controller fills some required fields in tortoise's status)
But, the initialization step isn't actually finished and updating the tortoise status get failed because some required field in status are missing. Before introducing this bug fix, the controller is handling the tortoise like:
- calculate the recommendation
- update HPA based on calculated recommendation.
- update tortoise status with calculated recommendations.
So, if failing to update tortoise status, HPA and VPA have been updated based on calculated recommendations, but tortoise status cannot get the recommendation on the status.
Also, during (2), In this test case, the CPU target utilization gets changed 50 -> 75 because:
- the VPA recommendation is 3 core, the current HPA target utilization is 50% (=2 core), and the CPU request is 4 cores.
- VPA recommendation showed that the target Pod is sometimes using 1 more core than the target utilization. In other words, we only need to give 1 additional core (= 25% given the CPU req is 4 core)
Then after updating tortoise status, the controller again reconciles the tortoise; calculate the recommendation again. The tortoise is calculating the target utilization recommendation based on the VPA's recommendation value and HPA's current target utilization.
But, the HPA was updated at the last reconciliation (50% -> 75%), but the VPA doesn't change its recommendation because the VPA in the integration test is just fake one.
So the recommendation is calculated like:
- the VPA recommendation is 3 core (not updated), the current HPA target utilization is 75% (=3 core), and the CPU request is 4 cores.
- VPA recommendation showed that the target Pod is using exactly the same core as the HPA target utilization. In other words, we don't need to give additional core to this Pod. (shouldn't actually happen)
So, the tortoise recommend 100% CPU target utilization on HPA in this reconciliation, but we configure that 90% is the maximum, so at final, the tortoise gives 90% target utilization on HPA.
This bug fix changed the controller to update the status once before updating HPA and VPA, so we can now prevent this from happening.
What this PR does / why we need it:
Test case for Emergency mode
The difference here would be the minReplicas == maxReplicas and TargetUtilization is 90 (same as default)
Which issue(s) this PR fixes:
Fixes # #2
Special notes for your reviewer: