PID controllers and autoscaling #1

Closed
glyn opened this issue Mar 14, 2023 · 7 comments

@glyn

glyn commented Mar 14, 2023

Hi Stevan

I wasn't sure how to get in touch, so please pardon the use of an issue!

I worked on a project (*) some years back where we experimented with the use of PID controllers to autoscale the number of instances in a cluster in response to workload metrics (e.g. based on a queue of requests). One of my then colleagues @jchester wrote up some of the findings in his book Knative in Action.
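
For readers unfamiliar with the idea, here is a minimal sketch of a PID controller driving an instance count from a queue-length metric. It is not the riff implementation: the class, the `desired_instances` helper, the gains, and the instance cap are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PIDController:
    """Textbook discrete PID controller; the gains are illustrative, not tuned."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    previous_error: float = 0.0

    def update(self, error: float, dt: float = 1.0) -> float:
        # Accumulate the integral term and estimate the derivative of the error.
        self.integral += error * dt
        derivative = (error - self.previous_error) / dt
        self.previous_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def desired_instances(controller, queue_length, target_queue_length,
                      current_instances, max_instances=100):
    """Hypothetical helper: turn the controller output into an instance count."""
    # A positive error means the queue is longer than we would like.
    error = queue_length - target_queue_length
    adjustment = controller.update(error)
    # Clamp so we never request a negative or unbounded number of instances.
    return max(0, min(max_instances, round(current_instances + adjustment)))
```

Calling `desired_instances` once per sampling interval with a fresh queue measurement gives the control loop; choosing `kp`, `ki`, and `kd` is the hard part.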

We found Feedback Control for Computer Systems by Philipp K Janert a useful introduction to the theory and practice of PID controllers.

We also found testing with a variety of shapes of input workload helped to identify problems with our controllers. We tested with sine waves, square waves, and step functions if I remember correctly. You might like to try that.
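
The three workload shapes mentioned above can be sketched as simple functions of time. The function names, periods, and amplitudes below are assumptions for illustration, not the values used in the original experiments.

```python
import math


def sine_wave(t, period=60.0, low=0.0, high=100.0):
    """Smoothly varying load, oscillating between low and high."""
    mid = (high + low) / 2
    amplitude = (high - low) / 2
    return mid + amplitude * math.sin(2 * math.pi * t / period)


def square_wave(t, period=60.0, low=0.0, high=100.0):
    """Load that alternates abruptly: high for the first half of each period."""
    return high if (t % period) < period / 2 else low


def step_function(t, step_at=30.0, low=0.0, high=100.0):
    """Load that jumps once from low to high at time step_at and stays there."""
    return high if t >= step_at else low
```

Feeding each shape into the same controller makes it easier to see overshoot and oscillation (square wave, step) separately from tracking lag (sine wave).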

We experimented with various methods of smoothing and stabilising the resultant behaviour, as described in Janert's book. The results were variable in quality and tuning the PID parameters was tricky and unreliable. A particular issue we faced was the latency in scaling up the number of instances, which could produce some interesting (i.e. unwanted) feedback effects when the input workload varied quickly.
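
The effect of startup latency can be illustrated with a toy simulation. This is a sketch under strong assumptions: a constant arrival rate, a fixed per-instance service rate, a purely proportional rule (one instance per `rate` queued requests), and instant scale-down. None of these come from the riff project.

```python
import math


def simulate(startup_delay, ticks=40, arrivals=50, rate=10):
    """Return the number of running instances at each tick."""
    queue = 0
    running = 1
    pending = []  # (tick at which the instances become ready, count)
    history = []
    for t in range(ticks):
        # Instances whose startup has finished come online.
        running += sum(n for ready, n in pending if ready <= t)
        pending = [(ready, n) for ready, n in pending if ready > t]
        history.append(running)
        # Requests arrive; each running instance serves `rate` of them.
        queue = max(0, queue + arrivals - running * rate)
        # Proportional rule: one instance per `rate` queued requests.
        desired = max(1, math.ceil(queue / rate))
        in_flight = sum(n for _, n in pending)
        if desired > running + in_flight:
            # New instances only become useful after the startup delay.
            pending.append((t + 1 + startup_delay,
                            desired - running - in_flight))
        elif desired < running:
            running = desired  # assumed: scaling down is immediate
    return history


immediate = simulate(startup_delay=0)
delayed = simulate(startup_delay=5)
```

With no delay the instance count settles at the 5 instances needed to serve 50 requests per tick. With a delay of 5 ticks the queue keeps growing while instances start, so the controller keeps ordering more and the count overshoots well past 5 before swinging back down.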

Hope that helps,
Glyn

(*) The project was projectriff, which later merged into the Knative project.

@stevana
Owner

stevana commented Mar 14, 2023

Hey Glyn,

Thanks for sharing your experiences and references!

Two things spring to mind:

  1. Did you have a look at robust control theory? Supposedly it's better suited for
    non-linear systems where there's noise (bursty traffic, etc);

  2. I wonder if the latency in scaling up instances can be solved by having
    a pool of instances ready to go. Perhaps this pool can be run by a
    third party which keeps hot instances and charges a "traffic insurance"
    premium? I haven't thought much about this, maybe it doesn't make economic
    sense...

@glyn
Author

glyn commented Mar 14, 2023

> Hey Glyn,
>
> Thanks for sharing your experiences and references!

You're welcome!

> Two things spring to mind:
>
> 1. Did you have a look at [robust control theory](https://users.ece.cmu.edu/~koopman/des_s99/control_theory/)? Supposedly it's better suited for
>    non-linear systems where there's noise (bursty traffic, etc);

No, I'm afraid we weren't aware of that at the time. Looks like it would have been useful though! Thanks.

> 2. I wonder if the latency in scaling up instances can be solved by having
>    a pool of instances ready to go. Perhaps this pool can be run by a
>    third party which keeps hot instances and charges a "traffic insurance"
>    premium? I haven't thought much about this, maybe it doesn't make economic
>    sense...

Yes, that was one of the approaches we considered. It works well unless/until the pool becomes exhausted, in which case we are back to square one. I like the insurance analogy for addressing the charging issue.
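
A minimal sketch of the exhaustion problem, using a hypothetical `WarmPool` class: requests are served from hot instances while any remain, and everything beyond that falls back to a cold start with the usual latency.

```python
class WarmPool:
    """Hypothetical pool of pre-started ("hot") instances."""

    def __init__(self, size):
        self.available = size

    def acquire(self, wanted):
        """Return (hot, cold): instances taken from the pool vs. started cold."""
        hot = min(wanted, self.available)
        self.available -= hot
        # Whatever the pool cannot cover must be started from scratch.
        return hot, wanted - hot

    def release(self, count):
        """Return instances to the pool once they are no longer needed."""
        self.available += count


pool = WarmPool(size=5)
first = pool.acquire(3)   # fully served from the pool
second = pool.acquire(4)  # pool runs dry part-way through
third = pool.acquire(1)   # pool exhausted: back to cold starts
```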

(Closing the issue now so as not to clutter your set of issues. For future reference, you may want to enable discussions on GitHub if you want to encourage feedback (no pun intended).)

@glyn glyn closed this as completed Mar 14, 2023
@stevana
Owner

stevana commented Mar 14, 2023

> It works well unless/until the pool becomes exhausted, in which case we are back to square one.

My guess is that the more clients (globally) this pool has the less bursty the traffic will be, because it's averaged out over many clients rather than just your one service.
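
This intuition can be checked with a small simulation, assuming each client's demand is independent and exponentially distributed per tick (an assumption made purely for illustration; correlated bursts across clients would not average out this way). Burstiness is measured here as the coefficient of variation of total demand.

```python
import random
import statistics


def coefficient_of_variation(samples):
    """Standard deviation relative to the mean: a simple burstiness measure."""
    return statistics.pstdev(samples) / statistics.mean(samples)


def aggregate_demand(clients, ticks=2000, seed=0):
    """Total demand per tick from `clients` independent clients."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(clients))
            for _ in range(ticks)]


cv_one = coefficient_of_variation(aggregate_demand(clients=1))
cv_many = coefficient_of_variation(aggregate_demand(clients=100))
```

For independent clients the relative variation of the total shrinks roughly with the square root of the number of clients, so `cv_many` comes out much smaller than `cv_one`.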

> For future reference, you may want to enable discussions on GitHub

Thanks, I don't think I've ever used that before. I also don't mind using issues for this; this repo will likely not be very active anyway.

@glyn
Author

glyn commented Mar 14, 2023

> > It works well unless/until the pool becomes exhausted, in which case we are back to square one.
>
> My guess is that the more clients (globally) this pool has the less bursty the traffic will be, because it's averaged out over many clients rather than just your one service.

That's probably true in general. The riff project needed to be able to scale the instances to zero. The use case was an occasionally used service that shouldn't consume resources when it's not in use. The instances were essentially instances of an application (e.g. packaged as a Docker/OCI image) rather than something reusable across applications.

@theOGognf

Hey there,

It's not every day I come across a crossover between control and software. I like the way you applied the PID controller. I think the PID is a good fit for the application because it generates smooth input and is easily tunable.

Robust control can be beneficial if you can dynamically model your system (e.g., you know demand will change over time in an expected way like a sine wave) and you have parametric bounds on that model (e.g., you know the max demand over time). Adaptive control helps you build that dynamic model, but you still need some idea of how your system fundamentally behaves in order to instantiate your controller. The better your initial model, the better your adaptive and robust controller will perform. Otherwise, your controller's output won't be as smooth, because it will attempt to compensate for unmodeled errors by applying large changes in inputs and model parameter estimates. Adaptive and robust controllers will generally perform well at tracking some desired state even for poorly modeled systems, but at the cost of large input fluctuations, which may not be good for your application.

Cheers!

@stevana
Owner

stevana commented Mar 14, 2023

Thanks @theOGognf!

Do you happen to have any good resources on robust and/or adaptive control, by the way?

Cheers!

@theOGognf

> Thanks @theOGognf!
>
> Do you happen to have any good resources on robust and/or adaptive control, by the way?
>
> Cheers!

Here's my favorite. (PDF download warning). That professor has other good materials on adaptive robust control, but I like the slides the best.
