Multiple parallel KRM function evaluator pods for the same KRM functions (#336)
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: dgyorgy-nokia. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files.
bb5ec2f to 42e051c
I think a PR was merged recently that ignored that linting error. If you rebase this PR, it may remove the linting error.
Dear @dgyorgy-nokia, I've been looking into this PR as I'm also interested in the parallel pod feature for our use case. While reviewing the code, I came across something that might cause issues with multiple pods. In

endpointIP := endpoint.Subsets[0].Addresses[0].IP

when a second pod joins the same Service, its IP would appear at a different index in the Addresses slice, so this check would fail for any pod that isn't first in the list. I think changing it to search through all addresses would fix this:

found := false
for _, addr := range endpoint.Subsets[0].Addresses {
	if addr.IP == podIP {
		found = true
		break
	}
}
if !found {
	return false, fmt.Errorf("pod IP %s not found in service endpoints", podIP)
}

By the way, the same pattern exists in the main branch. Let me know if I'm missing something or if you'd like me to help with a fix.
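The suggested fix can be exercised in isolation with simplified stand-in types. The real code uses the Endpoints types from k8s.io/api/core/v1; the sketch below only mirrors their shape, and podIPInEndpoints is an illustrative name, not a function from the PR:

```go
package main

import "fmt"

// Simplified stand-ins for the corev1 Endpoints shape referenced in the
// review comment (the real code uses k8s.io/api/core/v1 types).
type EndpointAddress struct{ IP string }
type EndpointSubset struct{ Addresses []EndpointAddress }
type Endpoints struct{ Subsets []EndpointSubset }

// podIPInEndpoints scans every address in the first subset rather than
// checking only index 0, so a pod that joined the Service later is still found.
func podIPInEndpoints(ep Endpoints, podIP string) (bool, error) {
	if len(ep.Subsets) == 0 || len(ep.Subsets[0].Addresses) == 0 {
		return false, fmt.Errorf("endpoints has no addresses")
	}
	for _, addr := range ep.Subsets[0].Addresses {
		if addr.IP == podIP {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	ep := Endpoints{Subsets: []EndpointSubset{{
		Addresses: []EndpointAddress{{IP: "10.0.0.1"}, {IP: "10.0.0.2"}},
	}}}
	found, _ := podIPInEndpoints(ep, "10.0.0.2") // second pod in the slice
	fmt.Println(found)                           // prints true
}
```

With the original index-0 check, the lookup for "10.0.0.2" would have failed even though the pod is a healthy member of the Service.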
Dear @thc1006, thanks for the feedback! The idea here is to have service-pod pairs, meaning one Service should have exactly one pod behind it. The function runner takes care of distributing requests across the KRM function pods. Two new parameters have been introduced for this: MaxWaitlistLength and MaxParallelPodsPerFunction. There is a waitlist for each started service-pod pair. If a waitlist reaches the value of MaxWaitlistLength and the configured MaxParallelPodsPerFunction is larger than the current number of pods, a new pod-service pair is created. With multiple running pods, each incoming request is handled by the pod with the shortest waitlist. It can also be configured to behave the same way as it does now: if MaxParallelPodsPerFunction is set to one, only one pod handles the requests; in that case the MaxWaitlistLength setting is ignored and requests are routed there even if the limit is exceeded. There are a few more changes I need to make. I'll update this PR as soon as I can.
This PR will require updates to documentation, e.g. https://docs.porch.nephio.org/docs/5_architecture_and_components/function-runner/ and others.
Hi @thc1006 , |
Hi @dgyorgy-nokia, thx for pushing the second file! here's batch 2: https://gist.github.com/thc1006/3c0ab87ea258dbe1d1406daf2cd25c85 (two files). one small note: i noticed ... hope these help push coverage over 80%!
re efiacor's docs comment: looked at both pages and the config page is missing these two flags:

- --max-parallel-pods-per-function=1  # max pods per function image (default: 1)
- --max-waitlist-length=2             # concurrent evaluations per pod before spawning a new pod (default: 2)

one thing worth noting in the description: ... lmk if a full diff would be easier and i'll put one together.
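For context, these flags would be passed as container args to the function-runner. The fragment below is a hypothetical sketch: only the two flag names come from this PR, while the image name and surrounding manifest layout are illustrative placeholders, not copied from the porch deployment manifests:

```yaml
# Illustrative Deployment fragment; image name and structure are placeholders.
spec:
  containers:
    - name: function-runner
      image: function-runner:latest
      args:
        - --max-parallel-pods-per-function=4  # up to 4 evaluator pods per KRM function image
        - --max-waitlist-length=2             # queue 2 requests per pod before spawning another
```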
Nice! |
Congrats! |
@dgyorgy-nokia, in the default k8s manifests function-runner is running with 2 replicas, but as I understand in this PR load balancing is only happening inside the process. Would that mean getting overload on the krm function executors compared to the provided cache configuration? |
Yes, that’s correct. The actual load balancing happens in the PodCacheManager, based on the provided cache configuration, not at the Kubernetes level. So it indeed makes sense to reduce the function-runner replica count to one.
@dgyorgy-nokia would you mind rebasing this PR please? |
Is this PR ready for final review/merge @dgyorgy-nokia @thc1006 ? |
Dear @dgyorgy-nokia , |
Sure, it's ready. |
go 1.25.7

retract v1.3.0

Can we remove this retract? Don't think it's needed.
efiacor left a comment:

Nice work guys. Thanks. I suspect we might need some additional updates to the docs as a follow-up. Somewhere here maybe: https://docs.porch.nephio.org/docs/5_architecture_and_components/function-runner/
Documentation Updates: 2 documents were updated by changes in this PR.

_index:

@@ -25,13 +25,14 @@
### Pod Lifecycle Management
Manages function execution pods with caching and garbage collection:
-- **Pod Cache Manager**: Orchestrates pod lifecycle via channel-based communication
+- **Pod Cache Manager**: Orchestrates a pool of pods for each KRM function type via channel-based communication, supporting configurable parallel pod execution and per-pod request waitlists
- **Pod Manager**: Handles pod and service CRUD operations
- **Pod Creation**: Template-based pod creation with init container for wrapper server injection
- **Service Management**: ClusterIP service frontends for service mesh compatibility
-- **TTL-Based Caching**: Reuses pods with configurable expiration and extension on use
+- **TTL-Based Caching**: Reuses pods with configurable expiration per pod; each pod in the pool has its own TTL
- **Garbage Collection**: Periodic cleanup of expired pods and failed pod handling
- **Pod Warming**: Pre-creates pods for frequently-used functions
+- **Horizontal Scaling**: Supports multiple parallel pods per function type (maxParallelPodsPerFunction) with configurable waitlist limits (maxWaitlistLength), improving throughput under high load while maintaining resource efficiency through TTL-based cleanup
For detailed architecture and process flows, see [Pod Lifecycle Management]({{% relref "/docs/5_architecture_and_components/function-runner/functionality/pod-lifecycle-management.md" %}}).
@@ -86,13 +87,15 @@
1. **Function Evaluation** receives gRPC request from Task Handler
2. **Multi-Evaluator** tries executable evaluator first (fast path)
3. **If NotFound**, falls back to pod evaluator (container execution)
-4. **Pod Lifecycle Management** checks pod cache for existing pod
-5. **If cache miss**, creates new pod with wrapper server via Pod Manager
-6. **Image & Registry Management** resolves image metadata and authentication
-7. **Pod Manager** creates pod with image pull secrets and service frontend
-8. **Pod Cache Manager** stores pod with TTL for reuse
-9. **Function Evaluation** connects to pod via service and executes function
-10. **Wrapper Server** executes function binary and returns structured results
-11. **Garbage Collection** periodically removes expired pods from cache
+4. **Pod Lifecycle Management** checks pod cache for available pods of the requested function type
+5. **If available pod found**, request is assigned to the pod with the shortest waitlist
+6. **If all pods busy but waitlists have capacity**, request is queued in the shortest waitlist
+7. **If all pods and waitlists full**, creates new pod (up to maxParallelPodsPerFunction limit) via Pod Manager
+8. **Image & Registry Management** resolves image metadata and authentication
+9. **Pod Manager** creates pod with image pull secrets and service frontend
+10. **Pod Cache Manager** stores pod in function's pool with individual TTL for reuse
+11. **Function Evaluation** connects to pod via service and executes function
+12. **Wrapper Server** executes function binary and returns structured results
+13. **Garbage Collection** periodically removes expired pods from each function's pool
Each functional area is documented in detail on its own page with architecture diagrams, process flows, and implementation specifics.

function-evaluation:

@@ -8,7 +8,7 @@
## Overview
-Function evaluation is the core responsibility of the Function Runner - executing KRM (Kubernetes Resource Model) functions through pluggable evaluator strategies. The system uses a strategy pattern where different evaluators handle function execution in different ways (pod-based, executable, or chained), all conforming to a common interface.
+Function evaluation is the core responsibility of the Function Runner - executing KRM (Kubernetes Resource Model) functions through pluggable evaluator strategies. The system uses a strategy pattern where different evaluators handle function execution in different ways (pod-based, executable, or chained), all conforming to a common interface. The pod-based evaluator supports horizontal scaling through multiple parallel pods per function type, enabling high-throughput execution for demanding workloads.
### High-Level Architecture
@@ -133,18 +133,24 @@
### Waitlist Mechanism
-Prevents duplicate pod creation when multiple requests arrive for the same function:
-
-**Waitlist pattern:**
-- Multiple requests for same image queue up
-- Single pod creation serves all waiters
-- Batch notification when pod ready
-- Prevents duplicate pod creation
+Each pod maintains its own waitlist to queue requests and enable efficient load distribution:
+
+**Per-pod waitlist pattern:**
+- Each pod instance has a separate waitlist with configurable maximum length
+- Incoming requests are assigned to the pod with the shortest waitlist
+- Multiple requests for the same image can execute across different pods in parallel
+- When all pods are busy and all waitlists reach maximum length, a new pod is created (up to `maxParallelPodsPerFunction`)
+
+**Load distribution:**
+- Requests distributed to least-loaded pod (shortest waitlist)
+- Ties broken by selecting lowest pod index
+- Enables horizontal scaling when demand increases
+- Automatic scale-down through garbage collection of idle pods
**Error handling:**
-- Pod creation errors sent to all waiters
-- Waitlist cleared on error
-- Each waiter receives error independently
+- Pod creation errors sent to all waiters in that pod's waitlist
+- Failed pod removed from pool and waitlist redistributed to other pods
+- Each waiter receives error independently if redistribution fails
- Allows retry on next request
### Function Execution
@@ -417,6 +423,18 @@
- Garbage collection removes expired pods
- Failed pods immediately deleted
+**Parallel execution configuration:**
+- `maxParallelPodsPerFunction`: Controls how many pods can run simultaneously for each function type (default: 1)
+- `maxWaitlistLength`: Controls how many requests can queue per pod before triggering new pod creation (default: 2)
+- Configurable globally via command-line arguments or per-function via `pod-cache-config.yaml`
+- Per-function configuration overrides global defaults
+
+**Scaling behavior:**
+- New pod created when all existing pods' waitlists reach `maxWaitlistLength`
+- Maximum pods per function limited by `maxParallelPodsPerFunction`
+- Idle pods beyond TTL are garbage collected
+- Enables automatic scaling based on load
+
### Cache Warming
**Warming strategy:**
@@ -437,13 +455,19 @@
- Multiple requests can execute concurrently
- Each request gets own gRPC connection
- Pod cache manager coordinates access
-- Waitlist prevents duplicate pod creation
+- Per-pod waitlists manage queuing and load distribution
**Concurrency characteristics:**
-- Same function, same pod: Sequential (one at a time)
-- Same function, different pods: Concurrent
+- Same function, same pod: Sequential (one at a time per pod)
+- Same function, multiple pods: Parallel execution across pods (up to `maxParallelPodsPerFunction`)
- Different functions: Fully concurrent
-- No artificial concurrency limits
+- No artificial concurrency limits beyond configured maximums
+
+**Parallel pod scaling:**
+- Multiple pods can run simultaneously for the same function type
+- Each pod maintains its own waitlist of queued requests
+- New pods created when all waitlists reach `maxWaitlistLength` (up to `maxParallelPodsPerFunction`)
+- Enables high-throughput execution for demanding workloads
### Resource Limits
@@ -458,3 +482,5 @@
- Configure cache warming for hot functions
- Use executable evaluator for critical path
- Monitor pod resource usage
+- Tune `maxParallelPodsPerFunction` for high-load scenarios
+- Adjust `maxWaitlistLength` to balance responsiveness vs pod overhead
Title
Multiple parallel KRM function evaluator pods for the same KRM functions
Description
What changed:
Why it’s needed:
Currently the PodEvaluator, which manages KRM function evaluator pods and executes KRM functions in them, is designed to have 0 or 1 pods at any time for each KRM function image name. This limits the scalability of KRM function evaluation, so this change relaxes that limitation by allowing multiple pods to run at the same time for the same KRM function image name.
How it works:
- A `waitlist` per KRM function per running pod.
- An incoming request is assigned to the shortest `waitlist` for the given KRM function. If there are multiple shortest `waitlist`s, it picks the one with the lowest index in the slice of `waitlist`s.
- If all `waitlist`s of a KRM function are longer than a constant value (the `maxWaitlistLength` parameter), and the maximum number of allowed parallel pods (`maxParallelPodsPerFunction`) per KRM function hasn't been reached yet, then a new pod is started and a `waitlist` is created for it.
- If the `waitlist` of a pod stays empty for more than the given TTL period, that pod and its `waitlist` are deleted.

Type of Change
Checklist