
Conversation


@jkremser jkremser commented Nov 7, 2025

A copy of its prp-azul/README.md (so that the formatting is rendered):

Azul JVM & PodResourceProfiles

This example shows how PodResourceProfiles (vertical scaling) can help with resource-intensive workloads during startup. The Azul JVM (Zing) runs JIT compilation during application warmup that dynamically optimizes certain (hot) code paths into machine code. This compilation process requires more CPU than the application's normal mode of operation. At the same time, we would like to make sure users get the best experience, so we should allow incoming traffic to the application only after it has warmed up and is performant enough.

Architecture

The Azul JVM can expose information about its compilation queue using JMX. With a simple Python script we can read this number and consider the workload ready only after it drops below some configurable threshold. Startup probes in Kubernetes are a great fit for this use case: they allow checking certain criteria more often during startup, and only after the startup is done do the classic readiness & liveness probes kick in and start their periodic checks.
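For illustration, a startup probe wired to that script could look roughly like this (a minimal sketch; the period and threshold values below are assumptions, not necessarily what the manifests in k8s/ use):

startupProbe:
  exec:
    # /ready.py exits non-zero while TotalOutstandingCompiles is still above the threshold
    command: ["/ready.py"]
  periodSeconds: 5        # probe frequently during warmup
  failureThreshold: 60    # tolerate a few minutes of warmup before giving up
  timeoutSeconds: 10

Once the startup probe succeeds, the kubelet stops running it and switches over to the regular readiness and liveness probes.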

To demonstrate a Java application that does some serious heavy lifting, we chose the Renaissance benchmarking suite from MIT, namely the finagle-http benchmark. This particular benchmark sends many small Finagle HTTP requests to a Finagle HTTP server and waits for the responses. Once the benchmark runs to completion, we run a sleep command.
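Conceptually, the main container runs something along these lines (a sketch only; the image name below is a hypothetical placeholder and the jar path is an assumption, the actual manifest lives in k8s/):

containers:
  - name: main
    # assumption: an Azul Zing based image that ships the Renaissance jar
    image: example.com/azul-zing-renaissance:latest
    command: ["sh", "-c", "java -jar /renaissance.jar finagle-http && sleep infinity"]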

Important

This feature is possible only with Kubernetes In-Place Pod Resource Updates. It is enabled by default since Kubernetes 1.33 (for older versions it needs to be enabled using a feature gate).
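In-place resizes are governed by the container's resize policy; declaring it explicitly (NotRequired is already the default for both resources) would look like this sketch, assuming we want the CPU change applied without restarting the container:

resizePolicy:
  - resourceName: cpu
    restartPolicy: NotRequired   # resize CPU in place, no container restart
  - resourceName: memory
    restartPolicy: NotRequired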

Demo

Tip

To try this on k3d, create the cluster using:

k3d cluster create in-place-updates --no-lb --k3s-arg "--disable=traefik,servicelb@server:*" --k3s-arg "--kube-apiserver-arg=feature-gates=InPlacePodVerticalScaling=true@server:*"
  1. Install Kedify in the K8s cluster - https://docs.kedify.io/installation/helm
  2. Deploy the example application:
kubectl apply -f k8s/
  3. Keep checking its CPU resources:
kubectl get po -lapp=heavy-workload -ojsonpath="{.items[*].spec.containers[?(.name=='main')].resources}" | jq

We should be able to see that, after some time, the CPU allocation drops from 1 CPU to 0.2.

In order to check the length of the compilation queue, one can run:

kubectl exec -ti $(kubectl get po -lapp=heavy-workload -ojsonpath="{.items[0].metadata.name}") -- /ready.py
JMX_HOST=127.0.0.1
JMX_PORT=9010
OUTSTANDING_COMPILES_THRESHOLD=500
786
TotalOutstandingCompiles still above threshold: 786 >= 500
command terminated with exit code 2

Conclusion

By asking for the right amount of compute power at the right time, we allow for more effective bin-packing in Kubernetes and, if used together with tools like Karpenter, this boils down to real cost savings.

@jkremser jkremser requested a review from a team as a code owner November 7, 2025 12:39
@jkremser jkremser force-pushed the azul branch 5 times, most recently from 21bda70 to dbd27cb on November 7, 2025 13:42
Signed-off-by: Jirka Kremser <jiri.kremser@gmail.com>
@jkremser jkremser merged commit a764371 into kedify:main Nov 11, 2025
@jkremser jkremser deleted the azul branch November 11, 2025 06:38