
Add Pulsar unique autoscaling metrics #457

Open
tpiperatgod opened this issue Aug 23, 2022 · 13 comments

@tpiperatgod
Contributor

The autoscaling of FunctionMesh's resources is currently controlled by HPA.

We can add some Pulsar unique metrics to the HPA to determine if the target workload needs to be scaled.

Here are two approaches:

  1. Introduce KEDA to Function Mesh, since KEDA already supports a Pulsar scaler and can also scale custom resources, ref: https://keda.sh/docs/2.8/concepts/scaling-deployments/#scaling-of-custom-resources
  2. Add an HPA extension adapter to Function Mesh and develop a built-in scaler that aligns with the KEDA Pulsar scaler

What do you think?

@tpiperatgod tpiperatgod added type/feature Indicates new functionality compute/serverless labels Aug 23, 2022
@tpiperatgod
Contributor Author

And with the new scaler, Function Mesh could downscale a function's replicas to 0.

@hpvd

hpvd commented Aug 23, 2022

+1 on this. I was also thinking about using KEDA when talking about the relationship between size and spin-up duration / faster dynamic scaling in the advantages-of-distroless topic #448

@hpvd

hpvd commented Aug 23, 2022

regarding KEDA: this is a good introduction:
https://medium.com/backstagewitharchitects/how-autoscaling-works-in-kubernetes-why-you-need-to-start-using-keda-b601b483d355
(the embedded video is also interesting)

@hpvd

hpvd commented Aug 23, 2022

There is already a blog post saying that KEDA may be a future direction (at the end of https://streamnative.cn/blog/engineering/2022-01-19-auto-scaling-pulsar-functions-in-kubernetes-using-custom-metrics-zh/)

@hpvd

hpvd commented Aug 23, 2022

Of course, in some/many use cases the possibility to easily autoscale to zero would help a lot with infrastructure costs...

@tpiperatgod tpiperatgod self-assigned this Oct 21, 2022
@tpiperatgod
Contributor Author

Overview

Function Mesh's function instances can be dynamically scaled by the HPA based on CPU and memory metrics. However, Function Mesh cannot yet scale to/from 0 replicas. This proposal aims to provide a solution that implements this feature.

Motivation

Provide the ability to scale Function Mesh's function instances to/from 0 replicas.

Proposal

I propose introducing the KEDA project as the basis for scaling Function Mesh's function instances to/from 0 replicas. This solution fits well because Function Mesh's event engine is Pulsar, and KEDA already ships a Pulsar scaler that can use Pulsar's message backlog as the scaling metric.

Structure for scaling configurations:

type AdvanceScaleConfig struct {
	Driver   string            `json:"driver,omitempty"`   // Indicates the driver for the scaler; available: "keda"
	Topics   []string          `json:"topics,omitempty"`   // Indicates the topics used to trigger the scaler
	Strategy map[string]string `json:"strategy,omitempty"` // Indicates the trigger strategy
}

Example:

spec:
  advanceScaleConfig:
    driver: keda
    topics:
      - persistent://public/default/my-topic-1
      - persistent://public/default/my-topic-2
    strategy:
      msgBacklogThreshold: "10"
      activationMsgBacklogThreshold: "2"
      pollingInterval: "30"

According to the definition of the KEDA Pulsar scaler, each trigger is driven by a single topic, so if the Function has multiple input topics (spec.inputs), the Operator will generate one trigger per topic.

Example of KEDA ScaledObject resource for the above configuration:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: <function-name>-scaler
  namespace: <function-namespace>
spec:
  scaleTargetRef:
    name: <function-sts-name>
  pollingInterval: 30
  triggers:
  - type: pulsar
    metadata:
      adminURL: http://localhost:80 # Get from spec.pulsar.pulsarConfig
      topic: persistent://public/default/my-topic-1
      subscription: sub1 # Get from spec.SubscriptionName
      msgBacklogThreshold: '10'
      activationMsgBacklogThreshold: '2'
  - type: pulsar
    metadata:
      adminURL: http://localhost:80 # Get from spec.pulsar.pulsarConfig
      topic: persistent://public/default/my-topic-2
      subscription: sub1 # Get from spec.SubscriptionName
      msgBacklogThreshold: '10'
      activationMsgBacklogThreshold: '2'

Example configuration of the auth section, when the following is configured in the Function:

spec:
  pulsar:
    tlsConfig: 
      enabled: true
      allowInsecure: true
      certSecretName: "ca-name"
      certSecretKey: "ca-key"

Example of resources corresponding to KEDA:

apiVersion: v1
kind: Secret
metadata:
  name: <function-name>-keda-tls-secrets
  namespace: <function-namespace>
data:
  cert: "ca-name"
  key: "ca-key"
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: <function-name>-keda-trigger-auth-pulsar-credential
  namespace: <function-namespace>
spec:
  secretTargetRef:
  - parameter: cert
    name: <function-name>-keda-tls-secrets
    key: cert
  - parameter: key
    name: <function-name>-keda-tls-secrets
    key: key
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: <function-name>-scaler
  namespace: <function-namespace>
spec:
  scaleTargetRef:
    name: <function-sts-name>
  pollingInterval: 30
  triggers:
  - type: pulsar
    metadata:
      tls: "enable"
      adminURL: https://localhost:8443
      topic: persistent://public/default/my-topic
      subscription: sub1
      msgBacklogThreshold: '5'
    authenticationRef:
      name: <function-name>-keda-trigger-auth-pulsar-credential

@tpiperatgod
Contributor Author

state-machine-diagram is here

@tpiperatgod
Contributor Author

> of course in some/many usecases the possibility to easily autoscale to zero would help a lot in the field of infrastructure costs...

Hi @hpvd, you seem interested in this development, so may I take the liberty of asking what company you work for? Also, what kinds of cases are you using Function Mesh in?

@hpvd

hpvd commented Oct 26, 2022

@tpiperatgod thanks for your question.
We are still incubating our new company ;-) It's in the field of mechanical engineering...
We are looking into Pulsar for streaming but also for high-load, on-demand batch processing.
Because of the latter, and the fact that we and our customers don't (always) work 24/7, scaling to zero is more than nice to have... (yes, we could work with crons, but that is not flexible and the number of rules keeps growing...)
Besides this, we are interested in strong security for everything and of course the main features of Pulsar, like great performance, built-in geo-replication and functions, and relatively low effort for ongoing maintenance...

@tpiperatgod
Contributor Author

tpiperatgod commented Oct 27, 2022

> @tpiperatgod thanks for your question. We are still incubating our new company ;-) It's in the field of mechanical engineering... We are looking into pulsar for streaming but also for high-load, on demand batch processing. Because of the latter and the fact that we and our customers don't (always) work 24/7, scaling to zero is more than nice to have... Beside this, we are interested in a strong security of everything and of course the main features of pulsar ...

Oh, I see. So for now you're concerned about two things.

  • Security
  • Serverless

And the community is working on both:

  • This issue tracks the improvements to serverless (scaling down to zero)
  • And this issue tracks the improvements to the runtime environment (and thus addresses security vulnerabilities)

You are welcome to participate in building the community.

@hpvd

hpvd commented Oct 27, 2022

thanks for your warm words.
Yes, there has been a lot of great progress and there are many good things on the way...

@hpvd

hpvd commented Oct 28, 2022

These two points may be interesting for the testing and release of this new functionality:

1) new in latest KEDA (v2.8): Activation and Scaling Thresholds

> Previously in KEDA, when scaling from 0 to 1, KEDA would “activate” (scale to 1) a resource when any activity happened on that event source. For example, if using a queue, a single message on the queue would trigger activation and scale.
>
> As of this release, we now allow you to set an activationThreshold for many scalers which is the metric that must be hit before scaling to 1.
>
> This would allow you to delay scaling up to 1 until n number of messages were unprocessed. This pairs with other thresholds and target values for scaling from 1 to n instances, where the HPA will scale out to n instances based on the current event metric and the defined threshold values.
>
> Details on thresholds and the new activation thresholds can be found in the KEDA concept docs

see https://keda.sh/blog/2022-08-10-keda-2.8.0-release/

2) next KEDA v2.9 is planned for Nov 3rd

(but not sure if this will happen)
see https://github.com/kedacore/keda/blob/main/ROADMAP.md

@hpvd

hpvd commented Dec 28, 2022

3 participants