
Add Pulsar unique autoscaling metrics #457

Open
tpiperatgod opened this issue Aug 23, 2022 · 13 comments

@tpiperatgod
Contributor

The autoscaling of FunctionMesh's resources is currently controlled by HPA.

We can add some Pulsar unique metrics to the HPA to determine if the target workload needs to be scaled.

Here are two approaches:

  1. Introduce KEDA to Function Mesh, since KEDA already supports a Pulsar scaler and can also scale custom resources, ref: https://keda.sh/docs/2.8/concepts/scaling-deployments/#scaling-of-custom-resources
  2. Add an HPA extension adapter to Function Mesh and develop a built-in scaler that aligns with the KEDA Pulsar scaler

What do you think?

@tpiperatgod tpiperatgod added type/feature Indicates new functionality compute/serverless labels Aug 23, 2022
@tpiperatgod
Contributor Author

And with the new scaler, Function Mesh could downscale a function's replicas to 0.

@hpvd

hpvd commented Aug 23, 2022

+1 on this. I was also thinking about using KEDA when talking about the relationship between size and spin-up duration / faster dynamic scaling in the advantages-of-distroless topic #448

@hpvd

hpvd commented Aug 23, 2022

regarding KEDA: this is a good introduction:
https://medium.com/backstagewitharchitects/how-autoscaling-works-in-kubernetes-why-you-need-to-start-using-keda-b601b483d355
(the embedded video is also interesting)

@hpvd

hpvd commented Aug 23, 2022

There is already a blog post saying that KEDA may be a future direction (at the end of https://streamnative.cn/blog/engineering/2022-01-19-auto-scaling-pulsar-functions-in-kubernetes-using-custom-metrics-zh/)

@hpvd

hpvd commented Aug 23, 2022

Of course, in some/many use cases the possibility to easily autoscale to zero would help a lot with infrastructure costs...

@tpiperatgod tpiperatgod self-assigned this Oct 21, 2022
@tpiperatgod
Contributor Author

Overview

Function Mesh's function instances can be dynamically scaled by the HPA based on CPU and memory metrics. However, Function Mesh cannot yet scale to/from 0 replicas. This proposal aims to provide a solution that implements this feature.

Motivation

Provide the ability to scale Function Mesh's function instances to/from 0 replicas.

Proposal

I propose introducing the KEDA project as the basis for scaling Function Mesh's function instances to/from 0 replicas. This solution fits well because Function Mesh's event engine is Pulsar, and KEDA already ships a Pulsar scaler that can use Pulsar's message backlog as the scaling metric.

Structure for scaling configurations:

type AdvanceScaleConfig struct {
	Driver   string            `json:"driver,omitempty"`   // Indicates the driver for the scaler; available: "keda"
	Topics   []string          `json:"topics,omitempty"`   // Indicates the topics used to trigger the scaler
	Strategy map[string]string `json:"strategy,omitempty"` // Indicates the trigger strategy
}

Example:

spec:
  advanceScaleConfig:
    driver: keda
    topics:
      - persistent://public/default/my-topic-1
      - persistent://public/default/my-topic-2
    strategy:
      msgBacklogThreshold: "10"
      activationMsgBacklogThreshold: "2"
      pollingInterval: "30"

According to the definition of the KEDA Pulsar scaler, each trigger is driven by a single topic, so if the Function has multiple input topics (spec.inputs), the Operator will generate one trigger per topic.

Example of KEDA ScaledObject resource for the above configuration:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: <function-name>-scaler
  namespace: <function-namespace>
spec:
  scaleTargetRef:
    name: <function-sts-name>
  pollingInterval: 30
  triggers:
  - type: pulsar
    metadata:
      adminURL: http://localhost:80 # Get from spec.pulsar.pulsarConfig
      topic: persistent://public/default/my-topic-1
      subscription: sub1 # Get from spec.SubscriptionName
      msgBacklogThreshold: '10'
      activationMsgBacklogThreshold: '2'
  - type: pulsar
    metadata:
      adminURL: http://localhost:80 # Get from spec.pulsar.pulsarConfig
      topic: persistent://public/default/my-topic-2
      subscription: sub1 # Get from spec.SubscriptionName
      msgBacklogThreshold: '10'
      activationMsgBacklogThreshold: '2'

Example configuration of the auth section, when the following is configured in the Function:

spec:
  pulsar:
    tlsConfig: 
      enabled: true
      allowInsecure: true
      certSecretName: "ca-name"
      certSecretKey: "ca-key"

Example of resources corresponding to KEDA:

apiVersion: v1
kind: Secret
metadata:
  name: <function-name>-keda-tls-secrets
  namespace: <function-namespace>
data:
  cert: "ca-name"
  key: "ca-key"
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: <function-name>-keda-trigger-auth-pulsar-credential
  namespace: <function-namespace>
spec:
  secretTargetRef:
  - parameter: cert
    name: <function-name>-keda-tls-secrets
    key: cert
  - parameter: key
    name: <function-name>-keda-tls-secrets
    key: key
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: <function-name>-scaler
  namespace: <function-namespace>
spec:
  scaleTargetRef:
    name: <function-sts-name>
  pollingInterval: 30
  triggers:
  - type: pulsar
    metadata:
      tls: "enable"
      adminURL: https://localhost:8443
      topic: persistent://public/default/my-topic
      subscription: sub1
      msgBacklogThreshold: '5'
    authenticationRef:
      name: <function-name>-keda-trigger-auth-pulsar-credential

@tpiperatgod
Contributor Author

state-machine-diagram is here

@tpiperatgod
Contributor Author

> of course in some/many usecases the possibility to easily autoscale to zero would help a lot in the field of infrastructure costs...

Hi @hpvd, you seem interested in this development, so may I take the liberty of asking what company you work for? Also, what kinds of cases are you using Function Mesh in?

@hpvd

hpvd commented Oct 26, 2022

@tpiperatgod thanks for your question.
We are still incubating our new company ;-) It's in the field of mechanical engineering...
We are looking into Pulsar for streaming but also for high-load, on-demand batch processing.
Because of the latter, and the fact that we and our customers don't (always) work 24/7, scaling to zero is more than nice to have... (yes, we could work with crons, but that is not flexible and the number of rules keeps growing...)
Besides this, we are interested in strong security for everything and of course the main features of Pulsar, like great performance, built-in geo-replication and functions, and relatively low effort for ongoing maintenance...

@tpiperatgod
Contributor Author

tpiperatgod commented Oct 27, 2022

> @tpiperatgod thanks for your question. We are still incubating our new company ;-) It's in the field of mechanical engineering... We are looking into pulsar for streaming but also for high-load, on demand batch processing. Because of the latter and the fact that we and our customers don't (always) work 24/7, scaling to zero is more than nice to have... Beside this, we are interested in a strong security of everything and of course the main features of pulsar ...

Oh, I see. So for now you're concerned about two things.

  • Security
  • Serverless

And the community is working on both:

  • This issue tracks the improvements to serverless (scaling down to zero)
  • And this issue tracks the improvements to the runtime environment (and thus addresses security vulnerabilities)

You are welcome to participate in building the community.

@hpvd

hpvd commented Oct 27, 2022

thanks for your warm words.
Yes, there has been a lot of great progress and there are many good things on the way...

@hpvd

hpvd commented Oct 28, 2022

These two points may be interesting for the testing and release of this new functionality:

1) new in latest KEDA (v2.8): Activation and Scaling Thresholds

> Previously in KEDA, when scaling from 0 to 1, KEDA would “activate” (scale to 1) a resource when any activity happened on that event source. For example, if using a queue, a single message on the queue would trigger activation and scale.
>
> As of this release, we now allow you to set an activationThreshold for many scalers which is the metric that must be hit before scaling to 1.
>
> This would allow you to delay scaling up to 1 until n number of messages were unprocessed. This pairs with other thresholds and target values for scaling from 1 to n instances, where the HPA will scale out to n instances based on the current event metric and the defined threshold values.
>
> Details on thresholds and the new activation thresholds can be found in the KEDA concept docs

see https://keda.sh/blog/2022-08-10-keda-2.8.0-release/

2) next KEDA v2.9 is planned for Nov 3rd

(but not sure if this will happen)
see https://github.com/kedacore/keda/blob/main/ROADMAP.md

@hpvd

hpvd commented Dec 28, 2022

3 participants