Research item: Scale to zero and back again #238

Closed
alexellis opened this Issue Sep 29, 2017 · 15 comments

alexellis commented Sep 29, 2017

This is a research item or PoC, not necessarily going to be merged into the project. The aim is to learn about the challenges and trade-offs of scaling to zero and back again.

Expected Behaviour

Given a service that has been scaled to zero replicas, upon synchronous or asynchronous invocation we should increase the replica count to 1/1, wait until the function is ready, and then invoke it.

Current Behaviour

A function that has been scaled to zero is currently treated the same as "not found".

Possible Solution

Ideas:

  • Implement a spin-down service by watching Prometheus metrics
  • Add an attribute/meta field to functions indicating whether they can spin down or up
  • Implement a smart-proxy to spin up a function from 0/1 to 1/1 replicas and poll until ready (a sketch follows the Context section below)

Context

This should be implemented at the FaaS repo level so that it can be utilised across all providers.
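
For illustration, here is a minimal sketch of what the smart-proxy idea might look like in Go. The `BackendClient` interface and its method names are hypothetical placeholders for whatever replica API the orchestrator (Swarm, Kubernetes, etc.) exposes; this is not the project's actual code.

```go
package zeroscale

import (
	"fmt"
	"net/http"
	"time"
)

// BackendClient is a hypothetical abstraction over the orchestrator's
// replica API (Swarm, Kubernetes, etc.).
type BackendClient interface {
	ReadyReplicas(function string) (int, error)
	SetReplicas(function string, count int) error
}

// ensureReady scales a function from 0/1 to 1/1 if needed and polls until a
// replica reports ready, or until the timeout expires.
func ensureReady(c BackendClient, function string, timeout time.Duration) error {
	ready, err := c.ReadyReplicas(function)
	if err != nil {
		return err
	}
	if ready > 0 {
		return nil // already running, nothing to do
	}
	if err := c.SetReplicas(function, 1); err != nil {
		return err
	}
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if ready, err = c.ReadyReplicas(function); err != nil {
			return err
		}
		if ready > 0 {
			return nil
		}
		time.Sleep(500 * time.Millisecond) // back off between polls
	}
	return fmt.Errorf("function %s not ready after %s", function, timeout)
}

// WakeupProxy wraps the normal invoke path with the wake-up logic, so a call
// to a scaled-to-zero function blocks until a replica is ready instead of
// returning "not found".
func WakeupProxy(c BackendClient, functionName func(*http.Request) string, invoke http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if err := ensureReady(c, functionName(r), 30*time.Second); err != nil {
			http.Error(w, err.Error(), http.StatusServiceUnavailable)
			return
		}
		invoke(w, r)
	}
}
```

This mirrors the Expected Behaviour above: a request to a 0/1 function triggers a scale-up, waits for readiness, and only then invokes.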

sorenbs commented Oct 9, 2017

What is the best case delay when scaling up from zero?

~ 10 ms
~ 100 ms
~ 1 s
~ 10 s

alexellis commented Oct 9, 2017

That's a great question, but it depends on many factors, including the back-end you're targeting and the hardware you've deployed to. Are you adding to the discussion, or are you keen to try implementing this?

wonderxboy commented Nov 11, 2017

Anyone working on this?

alexellis commented Nov 11, 2017

Yes, I have a prototype, but Docker Swarm is misrepresenting readiness, so scaling back up to 1/1 was less reliable. Take a look at this: https://github.com/openfaas/faas/compare/enable_zero_proxy?expand=1

wonderxboy commented Nov 22, 2017

@alexellis I see that your proxy.go code only spins up a service instance when there are zero instances of that service.

  1. How does a service get to zero instances in the first place? At deploy time with an instance count? (faas-cli does not seem to have such an option.)
  2. Does it scale back to zero automatically when there is no activity for a given time span?

alexellis commented Nov 22, 2017

You would use the logs to decide when the function was inactive and then scale down.

alexellis commented Nov 22, 2017

Logs or metrics.
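
As a rough illustration of the metrics-driven spin-down, the sketch below queries Prometheus for recent invocations of a function and scales it to zero when the count is zero. The `gateway_function_invocation_total` metric name and the `scaleToZero` callback are assumptions for the example, not confirmed project APIs.

```go
package idlescaler

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// promVectorResponse models the subset of the Prometheus HTTP query API
// response that we need.
type promVectorResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Value [2]interface{} `json:"value"` // [timestamp, "value"]
		} `json:"result"`
	} `json:"data"`
}

// invocationsInWindow returns how many times a function was invoked over the
// given window (e.g. "10m"), based on the gateway's invocation counter
// (gateway_function_invocation_total is assumed here).
func invocationsInWindow(promURL, function, window string) (float64, error) {
	query := fmt.Sprintf(`sum(increase(gateway_function_invocation_total{function_name=%q}[%s]))`, function, window)
	resp, err := http.Get(promURL + "/api/v1/query?query=" + url.QueryEscape(query))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	var parsed promVectorResponse
	if err := json.NewDecoder(resp.Body).Decode(&parsed); err != nil {
		return 0, err
	}
	if len(parsed.Data.Result) == 0 {
		return 0, nil // no samples recorded means no invocations
	}
	raw, ok := parsed.Data.Result[0].Value[1].(string)
	if !ok {
		return 0, fmt.Errorf("unexpected value type in Prometheus response")
	}
	return strconv.ParseFloat(raw, 64)
}

// ReapIdle scales a function down when it has had no invocations in the
// window. scaleToZero is a placeholder for the provider-specific call that
// sets the replica count to 0.
func ReapIdle(promURL, function, window string, scaleToZero func(string) error) error {
	count, err := invocationsInWindow(promURL, function, window)
	if err != nil {
		return err
	}
	if count == 0 {
		return scaleToZero(function)
	}
	return nil
}
```

Run ReapIdle on a timer for each deployed function; combined with the wake-up proxy sketched earlier, functions can drop to zero when idle and come back on the next request.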

wonderxboy commented Nov 22, 2017

Would you work on that, or should I have a look? I would expect this to be a full-fledged feature, in that the system automatically takes care of the whole process. This is a very useful feature from a cloud provider's perspective: functions can be served with less resource consumption.

burtonr commented Dec 29, 2017

Bumping this as it's a common question when I do demonstrations of OpenFaaS to the community.

@wonderxboy or @alexellis, has any progress been made on this? I'd love to help if I can 😄

alexellis commented Dec 29, 2017

The target audience for running OpenFaaS is not a cloud provider, which is why this isn't top priority on the roadmap. People often ask about this because their only frame of reference is a cloud provider, which has massive multi-tenancy issues. It's much more useful for a provider than for a small-to-medium team running functions alongside their existing deployments.

I have a PoC, but it's dependent on some WIP items such as #460 and #273

burtonr commented Dec 29, 2017

I agree that the target audience isn't a cloud provider, but the audience may be using a cloud provider, and being able to scale back to 0 could save a lot in charges. At least that's the response from my company when I pitched it to them.

When those other 2 issues do get resolved (sounds like it could be soon) I'd like to lend a hand in getting this going again.

alexellis commented Dec 29, 2017

It's the response from folks who use cloud functions as a point of reference. Happy to discuss more on Slack.

berndtj commented Jul 12, 2018

I'd like to see this feature move forward. We are trying to host a multi-user service in a restrictive k8s environment. We need to free up idle resources, and scaling to zero will make a huge difference.

alexellis commented Jul 12, 2018

Hi @berndtj, I've restarted the work on this, and there is more conversation in the linked PR.

alexellis commented Aug 7, 2018

Derek close: released and blog available at https://www.openfaas.com/

derek bot closed this Aug 7, 2018
