Proactive Node Scaling Operator


This operator makes the cluster autoscaler more proactive. As of now, the cluster autoscaler creates new nodes only when a pod is pending because it cannot be scheduled due to lack of capacity. This is not a good user experience, as the pending workload has to wait for several minutes while the new node is created and joins the cluster.

The Proactive Node Scaling Operator improves the user experience by keeping low priority placeholder pods scheduled that don't do anything. When the cluster is full and a new user pod is created, the following happens:

  1. some of the low priority pods are de-scheduled to make room for the user pod, which can then be scheduled immediately. The user workload does not have to wait in this case.

  2. the de-scheduled low priority pods are rescheduled and, in doing so, they trigger the cluster autoscaler to add new nodes.

Essentially this operator allows you to trade wasted resources for faster response time.
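
Conceptually, the placeholder workload is just a Deployment of pause containers with a very low priority and resource requests sized to the configured watermark. The sketch below is illustrative only; the name, replica count, requests, and priority class are assumptions, and the actual template the operator renders lives in config/templates/watermarkDeploymentTemplate.yaml.

apiVersion: apps/v1
kind: Deployment
metadata:
  # hypothetical name; the operator derives the real one from the CR
  name: pause-pods-us-west-2a
spec:
  replicas: 4
  selector:
    matchLabels:
      app: pause-pods-us-west-2a
  template:
    metadata:
      labels:
        app: pause-pods-us-west-2a
    spec:
      # very low priority so these pods are preempted first (hypothetical class name)
      priorityClassName: proactive-node-scaling-pause
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2a
      containers:
      - name: pause
        # the pause container does nothing; it only reserves the requested resources
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: 500m
            memory: 512Mi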

In order for this operator to work correctly, pod priorities must be defined. Here is an example of how to do so:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: normal-workload
value: 1000
globalDefault: true
description: "This priority class is the cluster default and should be used for normal workloads."

The low priority pods scheduled by this operator will have the priority defined in the priority field (0 by default). The selected priority should be very low, so as to be lower than anything else running in the cluster.

Also, for this operator to work, the cluster autoscaler must be active; see the OpenShift documentation for instructions on how to turn it on.
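
For reference, enabling the autoscaler on OpenShift amounts to creating a ClusterAutoscaler resource plus one MachineAutoscaler per MachineSet. The following is only a minimal sketch; the replica bounds and the MachineSet name are placeholders you must adapt to your cluster.

apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec:
  scaleDown:
    enabled: true
---
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: worker-us-west-2a
  namespace: openshift-machine-api
spec:
  minReplicas: 1
  maxReplicas: 6
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    # replace with the name of a MachineSet in your cluster
    name: <machineset-name>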

To activate proactive autoscaling, a NodeScalingWatermark CR must be defined. Here is an example:

apiVersion: redhatcop.redhat.io/v1alpha1
kind: NodeScalingWatermark
metadata:
  name: us-west-2a
spec:
  watermarkPercentage: 20
  nodeSelector:
    topology.kubernetes.io/zone: us-west-2a

The nodeSelector selects the nodes observed by this operator, which are also the nodes on which the low priority pods will be scheduled. The nodes observed by the cluster autoscaler should coincide with the nodes selected by this operator CR.

The watermarkPercentage field defines the percentage of the user workload's capacity that will be allocated to low priority pods. So in this example, low priority pods equivalent to 20% of the user-allocated capacity will be kept scheduled. This also means that when the user workload reaches roughly 80% of the capacity of the nodes selected by this CR (and by the autoscaler), the cluster will start to scale.
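
As a rough worked example (assuming the low priority pods are sized against the current user requests): if the user workloads on the selected nodes request 10 CPU cores in total, the operator keeps about 2 cores' worth of pause pods scheduled alongside them. When new user pods need that capacity, the pause pods are preempted immediately and, once rescheduled, push the selected node group past its capacity, which triggers the cluster autoscaler to add a node.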

Deploying the Operator

This is a cluster-level operator that you can deploy in any namespace; proactive-node-scaling-operator is recommended.

It is recommended to deploy this operator via OperatorHub, but you can also deploy it using Helm.

Deploying from OperatorHub

If you want to utilize the Operator Lifecycle Manager (OLM) to install this operator, you can do so in two ways: from the UI or the CLI.

Deploying from OperatorHub UI

  • If you would like to launch this operator from the UI, you'll need to navigate to the OperatorHub tab in the console. Before starting, make sure you've created the namespace that you want to install this operator to with the following:
oc new-project proactive-node-scaling-operator
  • Once there, you can search for this operator by name: proactive node scaling operator. This will return an item for the operator; select it to get started, and you'll be presented with an Install option that begins the process.
  • After clicking the install button, you can then select the namespace that you would like to install this operator to, as well as the installation strategy you would like to proceed with (Automatic or Manual).
  • Once you've made your selection, you can select Subscribe and the installation will begin. After a few moments, check your namespace and you should see the operator running.


Deploying from OperatorHub using CLI

If you'd like to launch this operator from the command line, you can use the manifests contained in this repository by running the following:

oc new-project proactive-node-scaling-operator

oc apply -f config/operatorhub -n proactive-node-scaling-operator

This will create the appropriate OperatorGroup and Subscription and will trigger OLM to launch the operator in the specified namespace.
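
For reference, the manifests applied above contain an OperatorGroup and a Subscription along the lines of the sketch below. The channel and catalog source names here are illustrative assumptions; the authoritative versions are the files in config/operatorhub.

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: proactive-node-scaling-operator
  namespace: proactive-node-scaling-operator
spec: {}
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: proactive-node-scaling-operator
  namespace: proactive-node-scaling-operator
spec:
  # channel name is an assumption; check the files in config/operatorhub
  channel: alpha
  name: proactive-node-scaling-operator
  source: community-operators
  sourceNamespace: openshift-marketplace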

Deploying with Helm

Here are the instructions to install the latest release with Helm.

oc new-project proactive-node-scaling-operator
helm repo add proactive-node-scaling-operator https://redhat-cop.github.io/proactive-node-scaling-operator
helm repo update
helm install proactive-node-scaling-operator proactive-node-scaling-operator/proactive-node-scaling-operator

This can later be updated with the following commands:

helm repo update
helm upgrade proactive-node-scaling-operator proactive-node-scaling-operator/proactive-node-scaling-operator

Disconnected deployment

When running in a disconnected environment, use the PausePodImage field of the NodeScalingWatermark to specify an internally mirrored pause pod image.
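
A minimal sketch, assuming the YAML key is the camelCase form pausePodImage and that registry.example.com/k8s/pause:3.9 is the image you have mirrored internally:

apiVersion: redhatcop.redhat.io/v1alpha1
kind: NodeScalingWatermark
metadata:
  name: us-west-2a
spec:
  watermarkPercentage: 20
  nodeSelector:
    topology.kubernetes.io/zone: us-west-2a
  # internally mirrored pause image (example registry and tag)
  pausePodImage: registry.example.com/k8s/pause:3.9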

Development

Running the operator locally

make install
export TEMPLATE_FILE_NAME=./config/templates/watermarkDeploymentTemplate.yaml
oc new-project proactive-node-scaling-operator-local
kustomize build ./config/local-development | oc apply -f - -n proactive-node-scaling-operator-local
export token=$(oc serviceaccounts get-token 'default' -n proactive-node-scaling-operator-local)
oc login --token ${token}
make run ENABLE_WEBHOOKS=false

Building/Pushing the operator image

export repo=raffaelespazzoli #replace with yours
docker login quay.io/$repo/proactive-node-scaling-operator
make docker-build IMG=quay.io/$repo/proactive-node-scaling-operator:latest
make docker-push IMG=quay.io/$repo/proactive-node-scaling-operator:latest

Deploy to OLM via bundle

make manifests
make bundle IMG=quay.io/$repo/proactive-node-scaling-operator:latest
operator-sdk bundle validate ./bundle --select-optional name=operatorhub
make bundle-build BUNDLE_IMG=quay.io/$repo/proactive-node-scaling-operator-bundle:latest
docker login quay.io/$repo/proactive-node-scaling-operator-bundle
docker push quay.io/$repo/proactive-node-scaling-operator-bundle:latest
operator-sdk bundle validate quay.io/$repo/proactive-node-scaling-operator-bundle:latest --select-optional name=operatorhub
oc new-project proactive-node-scaling-operator
operator-sdk cleanup proactive-node-scaling-operator -n proactive-node-scaling-operator
operator-sdk run bundle --install-mode AllNamespaces -n proactive-node-scaling-operator quay.io/$repo/proactive-node-scaling-operator-bundle:latest

Testing

Create the following resources:

oc new-project proactive-node-scaling-operator-test
oc apply -f ./test/ai-ml-watermark.yaml -n proactive-node-scaling-operator-test
oc apply -f ./test/zone-watermark.yaml -n proactive-node-scaling-operator-test

Releasing

git tag -a "<tagname>" -m "<commit message>"
git push upstream <tagname>

If you need to remove a release:

git tag -d <tagname>
git push upstream --delete <tagname>

If you need to "move" a release to the current main:

git tag -f <tagname>
git push upstream -f <tagname>

Cleaning up

operator-sdk cleanup proactive-node-scaling-operator -n proactive-node-scaling-operator
oc delete operatorgroup operator-sdk-og
oc delete catalogsource proactive-node-scaling-operator-catalog