FleetAutoScaler Buffer policy does not notice Unhealthy/Stopping gameservers #423
Comments
The current behavior is intentional, to prevent flooding the cluster with servers. Your buffer size must account for init and shutdown times.
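To make the sizing advice concrete, here is a small back-of-the-envelope sketch (all names and numbers are illustrative, not part of Agones): the buffer needs to cover however many allocations can happen while a replacement server is still starting up.

```go
package main

import "fmt"

// minBuffer estimates the buffer size needed so Ready servers are not
// exhausted while replacements spin up. allocRatePerSec is how fast
// players consume servers; startupSeconds is how long a new game
// server takes to become Ready (including any shutdown lag that delays
// freeing capacity). These are hypothetical helper names.
func minBuffer(allocRatePerSec, startupSeconds float64) int {
	// Servers that may be consumed during one startup window.
	need := allocRatePerSec * startupSeconds
	// Round up so the buffer always covers the full window.
	b := int(need)
	if float64(b) < need {
		b++
	}
	return b
}

func main() {
	// e.g. 0.5 allocations/sec, 60s for a replacement to become Ready
	fmt.Println(minBuffer(0.5, 60)) // prints 30
}
```

The point of the sketch: if init plus shutdown takes a minute, the buffer must absorb a full minute of allocation churn.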
Is this something we could document better? Would love suggestions if there are particular places we can do that.
Hmm... maybe a more aggressive fas strategy as a configurable parameter?
A couple of interesting things from this:
What do you mean by "fleet summary" explicitly? We are passing through the fleet status, which includes counts. That all being said, the webhook implementation can always access the k8s API for any extra information it needs.
> current state

does not reflect my core problem: gs in shutdown+unhealthy state :)
I take it subtracting allocated and ready from total replicas doesn't give you what you want? We could always add more values to the status totals as we find them necessary. Also, the webhook will only fire every 30s, just like the current buffer autoscaler, for exactly the same reason @victor-prodan described.
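The subtraction suggested above can be sketched like this. The struct below only mirrors the count fields discussed in this thread; the field names are illustrative, not the exact Agones API types.

```go
package main

import "fmt"

// FleetStatus mirrors the fleet status totals a webhook autoscaler
// receives. Field names here are illustrative approximations.
type FleetStatus struct {
	Replicas          int32 // total game servers, in any state
	ReadyReplicas     int32
	AllocatedReplicas int32
}

// inFlight returns servers that are neither Ready nor Allocated,
// i.e. still starting up, Unhealthy, or shutting down.
func inFlight(s FleetStatus) int32 {
	return s.Replicas - s.ReadyReplicas - s.AllocatedReplicas
}

func main() {
	s := FleetStatus{Replicas: 10, ReadyReplicas: 4, AllocatedReplicas: 3}
	fmt.Println(inFlight(s)) // prints 3
}
```

The caveat raised in the thread still applies: this lumps starting-up servers together with shutdown/unhealthy ones, which is exactly why finer-grained counts were requested.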
OK, I'll tune the buffer size. The current fleet "split brain" situation (max/min/buffer in the fas, current counts in the flt) makes this process... complicated. Do you have any plans for Prometheus metrics or an internal k8s /metrics endpoint?
Hey @Oleksii-Terekhov, I'm currently working on metrics, and the first exporter option will be Prometheus. So far I have implemented:

Let me know!
With dreams about multi-cluster: and maybe some info about the fas itself?
I think Prometheus will automatically add the namespace. I will add the fleet_name to the gameserver count metric, good idea. A fas metric seems doable; I'll make sure it's in the first draft. Thanks!
Bug: when a gs goes to shutdown from the Allocated state, the fas doesn't start a new gs until the old pod disappears.
Currently a GameServer may be in these statuses:
My setup: a gs with an additional sidecar container. After a session ends, the gs sits in the Unhealthy state for about 30-60 seconds while the sidecar stops. Sometimes during this window I cannot play: all MinReplicas+BufferSize servers are unhealthy (but to the fas they still look READY or ALLOCATED :(( )
Proposal: don't count Shutdown, Error, or Unhealthy game servers in applyBufferPolicy() if we haven't reached MaxReplicas.
Code snippet from pkg/fleetautoscalers/fleetautoscalers.go
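The original snippet did not survive the copy, so as a rough, hypothetical sketch only (function and parameter names invented here, not the actual applyBufferPolicy implementation), the proposal could look something like: count unhealthy/shutting-down servers as extra demand so replacements start before the dying pods are gone, clamped to the min/max bounds.

```go
package main

import "fmt"

// desiredReplicas sketches the proposed buffer logic. Unhealthy,
// Error, and Shutdown servers cannot serve players, so they are
// added on top of the allocated+buffer target instead of being
// treated as usable capacity. All names are illustrative.
func desiredReplicas(allocated, unhealthy, bufferSize, minReplicas, maxReplicas int32) int32 {
	// Cover current allocations, the desired Ready buffer, and the
	// servers that are on their way out.
	replicas := allocated + bufferSize + unhealthy
	if replicas < minReplicas {
		replicas = minReplicas
	}
	if replicas > maxReplicas {
		replicas = maxReplicas
	}
	return replicas
}

func main() {
	// 3 allocated, 2 unhealthy, buffer of 2, bounds [2, 10]
	fmt.Println(desiredReplicas(3, 2, 2, 2, 10)) // prints 7
}
```

Because the result is clamped at MaxReplicas, this keeps the flood-protection property @victor-prodan described while still reacting to unhealthy servers.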