
Fleet autoscaler should not modify a Fleet's spec #3229

Open
jeremylvln opened this issue Jun 21, 2023 · 3 comments
Labels
kind/bug (These are bugs.) · stale (Pending closure unless there is a strong objection.)

Comments

jeremylvln (Contributor) commented Jun 21, 2023

What happened:

I created a FleetAutoscaler managing a Fleet with a Buffer policy. The autoscaler works by modifying the Fleet's spec with an updated replicas count. This is a problem because, if another controller is in charge of managing Agones resources, that controller and Agones will fight over the replicas field: each one thinks the other's value is wrong, and they keep overwriting it in a cycle.

I consider this more a bug than a feature request, because it really should not be done this way, but this ticket has a foot in both categories.

What you expected to happen:

The expected behavior is something like Kubernetes' built-in Deployment and HorizontalPodAutoscaler, where the spec is not modified and the scaling is handled internally instead. I see three ways of handling this issue:

  1. Rather than modifying the spec, modify the status. This way a watching controller will not try to fight Agones, but I agree the semantics are a bit violated, as one resource should not modify the status of another.
  2. Refactor the code so that the fleet reconciler takes into account the replicas count requested by the autoscaler, if any, and otherwise uses the replicas count from the spec. This couples the fleet's reconcile code to the autoscaler somewhat, but it is still a valid option.
  3. Implement the scale subresource on the Fleet and manage any runtime modification of the replicas through it (see below).

From Kubernetes's documentation about HPAs (link):

The HorizontalPodAutoscaler controller accesses corresponding workload resources that support scaling (such as Deployments and StatefulSet). These resources each have a subresource named scale, an interface that allows you to dynamically set the number of replicas and examine each of their current states. For general information about subresources in the Kubernetes API, see Kubernetes API Concepts.
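
For context, exposing a scale subresource on a custom resource is done in its CustomResourceDefinition. The snippet below is a minimal sketch of that generic Kubernetes mechanism only; it is not Agones's actual CRD manifest, and the schema and paths are illustrative.

```yaml
# Sketch of the generic CRD scale subresource declaration (option 3).
# Not Agones's actual CRD; schema and paths are illustrative.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: fleets.agones.dev
spec:
  group: agones.dev
  names:
    kind: Fleet
    plural: fleets
  scope: Namespaced
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        x-kubernetes-preserve-unknown-fields: true
    subresources:
      # /status and /scale become separate API endpoints on the resource
      status: {}
      scale:
        specReplicasPath: .spec.replicas
        statusReplicasPath: .status.replicas
        labelSelectorPath: .status.labelSelector
```

With a declaration like this, clients (including an autoscaler) can read and write the replica count through the scale endpoint instead of patching the Fleet object as a whole.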

How to reproduce it (as minimally and precisely as possible):

Create a dummy Fleet. Create a FleetAutoscaler with dummy values.
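
For example, something along these lines (a minimal sketch; names, buffer values, and the container image are placeholders, not the exact manifests I used):

```yaml
# Minimal reproduction sketch; names, buffer values, and the image are placeholders.
apiVersion: agones.dev/v1
kind: Fleet
metadata:
  name: example-fleet
spec:
  replicas: 2
  template:
    spec:
      ports:
      - name: default
        containerPort: 7654
      template:
        spec:
          containers:
          - name: simple-game-server
            image: us-docker.pkg.dev/agones-images/examples/simple-game-server:0.17
---
apiVersion: autoscaling.agones.dev/v1
kind: FleetAutoscaler
metadata:
  name: example-fleet-autoscaler
spec:
  fleetName: example-fleet
  policy:
    type: Buffer
    buffer:
      bufferSize: 2
      minReplicas: 2
      maxReplicas: 10
```

Once the autoscaler reconciles, spec.replicas on the Fleet drifts away from the applied value, which is what a second controller applying the same manifest will then fight against.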

Anything else we need to know?:

Environment:

  • Agones version: 1.31.0
  • Kubernetes version (use kubectl version): 1.27
  • Cloud provider or hardware configuration: Scaleway
  • Install method (yaml/helm): yaml
  • Troubleshooting guide log(s): N/A
  • Others: N/A
jeremylvln (Contributor, Author) commented:

Update: If I understood the Fleet API correctly, it seems that the scale subresource is at least declared (implemented?). So there is already a step toward solution 3.

roberthbailey (Member) commented:

The scale subresource has been implemented on Fleet for quite a while and is used in the fleet quickstart: https://agones.dev/site/docs/getting-started/create-fleet/#3-scale-up-the-fleet

I'm not sure I quite understand your issue though. If you use the scale subresource, that is just an indirect way of changing the replicas count in the spec for the Fleet, is it not?

Further down in the HPA documentation that you linked, it says:

When you configure autoscaling for a Deployment, you bind a HorizontalPodAutoscaler to a single Deployment. The HorizontalPodAutoscaler manages the replicas field of the Deployment.

So even if the HPA controller is changing the replicas field via the scale subresource, it still manages that field. The scale subresource is just a generic interface to the replicas field so that the HPA controller doesn't need to have code added for any new resource types that can be scaled up and down.

Also, to your point about conflicting controllers, the HPA documentation says:

When an HPA is enabled, it is recommended that the value of spec.replicas of the Deployment and / or StatefulSet be removed from their manifest(s). If this isn't done, any time a change to that object is applied, for example via kubectl apply -f deployment.yaml, this will instruct Kubernetes to scale the current number of Pods to the value of the spec.replicas key. This may not be desired and could be troublesome when an HPA is active.

and the same is true for a Fleet. You can prevent your controller and the fleet autoscaler from fighting over the replicas field by not including replicas in your fleet specification, and allowing the fleet autoscaler to set the field on your behalf.
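
For illustration, a sketch of such a Fleet manifest (the image and names are placeholders, and this is untested):

```yaml
# Sketch of a Fleet manifest that leaves replica management to the
# FleetAutoscaler by omitting spec.replicas; image and names are placeholders.
apiVersion: agones.dev/v1
kind: Fleet
metadata:
  name: example-fleet
spec:
  # no replicas field; the fleet autoscaler sets the count at runtime
  template:
    spec:
      ports:
      - name: default
        containerPort: 7654
      template:
        spec:
          containers:
          - name: simple-game-server
            image: us-docker.pkg.dev/agones-images/examples/simple-game-server:0.17
```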

github-actions bot commented:

This issue is marked as Stale due to inactivity for more than 30 days. To avoid being marked as 'stale', please add the 'awaiting-maintainer' label or add a comment. Thank you for your contributions.

@github-actions github-actions bot added the stale Pending closure unless there is a strong objection. label Jun 15, 2024