
Multiple parallel KRM function evaluator pods for the same KRM functions #336

Merged
efiacor merged 26 commits into kptdev:main from nokia:parallel-fn-pods on Mar 9, 2026
Conversation

@dgyorgy-nokia
Contributor

Title

Multiple parallel KRM function evaluator pods for the same KRM functions


Description

  • What changed:

    • Allow multiple pods to run for the same KRM function image when the load requires it.
    • Add two new command line parameters to function-runner: one that sets the maximum number of parallel pod executions allowed per KRM function type (maxParallelPodsPerFunction), and another that sets the maximum waitlist length of a pod (maxWaitlistLength).
    • By default the PodEvaluator still starts only 1 pod per KRM function type. If a new KRM function evaluation is requested while another is ongoing for the same KRM function, and the maximum allowed number of parallel pods has not been reached yet, a new pod is started.
    • The garbage collector collects idle pods, similarly to how it works today.
  • Why it’s needed:
    Currently the PodEvaluator, which manages KRM function evaluator pods and executes KRM functions in them, is designed to have 0 or 1 pod at any time for each KRM function image name. This limits the scalability of KRM function evaluation, so this change relaxes that limitation by allowing multiple pods to run at the same time for the same KRM function image name.

  • How it works (see the sketch after this list):

    • There is a separate waitlist per running pod of each KRM function.
    • KRM function evaluation requests are sent to a single pod one-by-one, not in parallel.
    • When a new function evaluation request is received, it picks the shortest waitlist for the given KRM function. If there are multiple shortest waitlists, it picks the one with the lowest index in the slice of waitlists.
    • When all waitlists of a KRM function are longer than a configured value (the maxWaitlistLength parameter), and the maximum number of allowed parallel pods (maxParallelPodsPerFunction) per KRM function has not been reached yet, a new pod is started and a waitlist is created for it.
    • When the waitlist of a pod is empty for more than the given TTL period, the garbage collector deletes that pod and its waitlist.
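
A minimal Go sketch of the selection rule above. This is illustrative only: the request type, the per-function slice of waitlists, and the function name are assumptions, not the PR's actual code.

// request stands in for a queued KRM function evaluation request.
type request struct{}

// shortestWaitlist returns the index of the shortest waitlist for a function.
// Ties are broken by keeping the lowest index, matching the rule above.
func shortestWaitlist(waitlists [][]request) int {
    best := -1
    for i, wl := range waitlists {
        if best == -1 || len(wl) < len(waitlists[best]) {
            best = i
        }
    }
    return best
}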

Type of Change

  • Bug fix
  • New feature
  • Enhancement
  • Refactor
  • Documentation
  • Tests
  • Other: ________

Checklist

  • Code follows project style guidelines
  • Self-reviewed changes
  • Tests added/updated
  • Documentation added/updated
  • All tests and gating checks pass

@nephio-prow
Contributor

nephio-prow Bot commented Oct 15, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dgyorgy-nokia
Once this PR has been reviewed and has the lgtm label, please assign efiacor for approval by writing /assign @efiacor in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sonarqubecloud

Quality Gate failed

Failed conditions
58.3% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@dgyorgy-nokia dgyorgy-nokia marked this pull request as draft October 15, 2025 17:55
@sonarqubecloud

Quality Gate failed

Failed conditions
58.3% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@liamfallon
Collaborator

I think a PR was merged recently that ignored that linting error. If you rebase this PR, it may remove the linting error.

@thc1006
Contributor

thc1006 commented Jan 22, 2026

Dear @dgyorgy-nokia,

I've been looking into this PR as I'm also interested in the parallel pod feature for our use case. While reviewing the code, I came across something that might cause issues with multiple pods.

In func/internal/podmanager.go around line 790, the endpoint validation only checks the first address:

endpointIP := endpoint.Subsets[0].Addresses[0].IP

When a second pod joins the same Service, its IP would appear at a different index in the Addresses slice, so this check would fail for any pod that isn't first in the list.

I think changing it to search through all addresses would fix this:

found := false
for _, addr := range endpoint.Subsets[0].Addresses {
    if addr.IP == podIP {
        found = true
        break
    }
}
if !found {
    return false, fmt.Errorf("pod IP %s not found in service endpoints", podIP)
}

By the way, the same pattern exists in main branch (podevaluator.go:1257), though it doesn't cause problems there since only one pod per function is currently supported.

Let me know if I'm missing something or if you'd like me to help with a fix.

@dgyorgy-nokia
Contributor Author

(Quoting @thc1006's comment above about the endpoint IP check in podmanager.go.)

Dear @thc1006 ,

Thanks for the feedback!

The idea here is to have service-pod pairs, which means that one service should have only one pod.

The function-runner takes care of distributing the requests across the KRM function pods. Two new parameters have been introduced for this (MaxWaitlistLength / MaxParallelPodsPerFunction).

There is a waitlist for each started service-pod pair. If this waitlist reaches the value of the MaxWaitlistLength parameter and the configured MaxParallelPodsPerFunction is larger than the actual number of pods, a new pod-service pair is created. With multiple running pods, an incoming request is handled by the pod with the shortest waitlist.

It can also be configured to behave the same way as it does now: if MaxParallelPodsPerFunction is set to one, only one pod handles the requests. In that case the MaxWaitlistLength setting is ignored and requests are routed to that pod even if the limit is exceeded. (See the sketch below.)

There are a few more changes I need to make. I'll update this PR as soon as I can.
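
For illustration, a rough Go sketch of the route-or-spawn decision just described, reusing the illustrative shortestWaitlist helper and request type from the sketch in the PR description; the actual names and structure in the PR may differ.

// routeOrSpawn decides whether an incoming request joins an existing pod's
// waitlist (podIdx >= 0) or a new pod-service pair should be created.
func routeOrSpawn(waitlists [][]request, maxParallelPodsPerFunction, maxWaitlistLength int) (podIdx int, spawnNew bool) {
    if len(waitlists) == 0 {
        return -1, true // no pod yet for this function: create the first one
    }
    // With a single allowed pod, behave as before: everything is routed to
    // pod 0 and maxWaitlistLength is ignored.
    if maxParallelPodsPerFunction <= 1 {
        return 0, false
    }
    idx := shortestWaitlist(waitlists)
    // Create a new pod-service pair only when every waitlist has reached the
    // limit and the per-function pod cap has not been hit yet.
    if len(waitlists[idx]) >= maxWaitlistLength && len(waitlists) < maxParallelPodsPerFunction {
        return -1, true
    }
    return idx, false
}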

@netlify

netlify Bot commented Feb 12, 2026

Deploy Preview for porch ready!

Name: porch
🔨 Latest commit: 264ec44
🔍 Latest deploy log: https://app.netlify.com/projects/porch/deploys/69ac9079a4c91100080cfec8
😎 Deploy Preview: https://deploy-preview-336--porch.netlify.app

@efiacor
Collaborator

efiacor commented Feb 23, 2026

@dgyorgy-nokia
Contributor Author

(Quoting @thc1006:)

Hi @dgyorgy-nokia,

Thanks for adding the podcachemanager tests! I noticed the podmanager_unit_test.go from the gist didn't make it in, though. Was that intentional or just missed?

Also, I have some more tests ready covering EvaluateFunction paths and event loop edge cases that should help push the coverage closer to 80%. Let me know if you want me to share them and I'll put up another gist.

Thanks!

Hi @thc1006 ,
Sorry, I just missed the second test file, I’ve already pushed that one as well.
If there are further tests, please feel free to share them.
Thanks!

@thc1006
Contributor

thc1006 commented Feb 23, 2026

Hi @dgyorgy-nokia,

Thanks for pushing the second file! Here's batch 2: https://gist.github.com/thc1006/3c0ab87ea258dbe1d1406daf2cd25c85

Two files:

  • podevaluator_unit_test.go: 6 tests covering EvaluateFunction paths (success, grpc errors, context cancel, etc.)
  • podcachemanager_eventloop_test.go: 10 tests covering the event loop, warmupCache, and retrieveFunctionPods edge cases

One small note: I noticed podCacheManager now takes a context.Context, so I updated the goroutine calls from go pcm.podCacheManager() to go pcm.podCacheManager(t.Context()). I verified everything passes against the current tip of your branch.

Hope these help push coverage over 80%!

@thc1006
Contributor

thc1006 commented Feb 23, 2026

Re efiacor's docs comment: I looked at both pages, and the config page is missing --max-parallel-pods-per-function and --max-waitlist-length entirely. Here's what I'd add to the Pod Runtime Arguments section:

- --max-parallel-pods-per-function=1  # max pods per function image (default: 1)
- --max-waitlist-length=2             # concurrent evaluations per pod before spawning a new pod (default: 2)

One thing worth noting in the description: --max-waitlist-length counts both queued and actively running evaluations (execution inside each pod is serialized via fnEvaluationMutex), so it's a load-pressure threshold rather than a strict queue cap. It also only takes effect when --max-parallel-pods-per-function > 1. The same two flags should go into the Complete Example block as well.

Let me know if a full diff would be easier and I'll put one together.
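
For the Complete Example mentioned above, a sketch of how the two flags could appear among the function-runner container args in the docs' deployment excerpt. Only these two args are shown; the rest of the real example is omitted here and the chosen values are illustrative.

args:
  - --max-parallel-pods-per-function=3   # allow up to 3 evaluator pods per KRM function image
  - --max-waitlist-length=2              # create a new pod once every existing waitlist holds 2 evaluations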

@liamfallon
Collaborator

Quality Gate passed

Issues: 16 new issues, 0 accepted issues

Measures: 0 security hotspots, 81.2% coverage on new code, 0.0% duplication on new code

See analysis details on SonarQube Cloud

Nice!

@thc1006
Contributor

thc1006 commented Feb 23, 2026

Congrats!

@nagygergo
Contributor

nagygergo commented Feb 25, 2026

@dgyorgy-nokia, in the default k8s manifests function-runner runs with 2 replicas, but as I understand it, in this PR load balancing only happens inside the process. Would that mean overloading the KRM function executors compared to the provided cache configuration?
I think that function-runner should be scaled back to a single replica.

@dgyorgy-nokia
Contributor Author

(Quoting @nagygergo's comment above about the function-runner replica count.)

Yes, that’s correct. The actual load balancing happens in the PodCacheManager, based on the provided cache configuration, not at the Kubernetes level. So it indeed makes sense to reduce the function-runner replica count to one.
If that’s fine, I’ll modify the relevant part accordingly.
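
For illustration, the change discussed here would amount to something like the following excerpt in the default function-runner Deployment manifest (a sketch; the manifest path and surrounding fields are not shown in this thread).

spec:
  replicas: 1   # load balancing happens in-process in the PodCacheManager, so one replica is enough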

@mozesl-nokia mentioned this pull request Mar 5, 2026
@liamfallon
Collaborator

@dgyorgy-nokia would you mind rebasing this PR please?

@efiacor
Collaborator

efiacor commented Mar 6, 2026

Is this PR ready for final review/merge @dgyorgy-nokia @thc1006 ?
Can we do a final rebase?

@thc1006
Contributor

thc1006 commented Mar 6, 2026

Dear @dgyorgy-nokia,
I am OK with it. You make the final decision. (Thanks, everyone!)

@dosubot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) Mar 6, 2026
@dgyorgy-nokia
Contributor Author

(Quoting @efiacor's question above about the final review and rebase.)

Sure, it's ready.

Comment thread on go.mod (Outdated)

go 1.25.7

retract v1.3.0
Collaborator

Can we remove this retract? Don't think it's needed.

@sonarqubecloud

sonarqubecloud Bot commented Mar 7, 2026

Collaborator

@efiacor left a comment


Nice work guys. Thanks. I suspect we might need some additional updates to the docs as a follow-up. Somewhere here maybe - https://docs.porch.nephio.org/docs/5_architecture_and_components/function-runner/

@efiacor efiacor merged commit 47eb46f into kptdev:main Mar 9, 2026
22 checks passed
@dosubot

dosubot Bot commented Mar 9, 2026

Documentation Updates

2 document(s) were updated by changes in this PR:

_index
View Changes
@@ -25,13 +25,14 @@
 ### Pod Lifecycle Management
 
 Manages function execution pods with caching and garbage collection:
-- **Pod Cache Manager**: Orchestrates pod lifecycle via channel-based communication
+- **Pod Cache Manager**: Orchestrates a pool of pods for each KRM function type via channel-based communication, supporting configurable parallel pod execution and per-pod request waitlists
 - **Pod Manager**: Handles pod and service CRUD operations
 - **Pod Creation**: Template-based pod creation with init container for wrapper server injection
 - **Service Management**: ClusterIP service frontends for service mesh compatibility
-- **TTL-Based Caching**: Reuses pods with configurable expiration and extension on use
+- **TTL-Based Caching**: Reuses pods with configurable expiration per pod; each pod in the pool has its own TTL
 - **Garbage Collection**: Periodic cleanup of expired pods and failed pod handling
 - **Pod Warming**: Pre-creates pods for frequently-used functions
+- **Horizontal Scaling**: Supports multiple parallel pods per function type (maxParallelPodsPerFunction) with configurable waitlist limits (maxWaitlistLength), improving throughput under high load while maintaining resource efficiency through TTL-based cleanup
 
 For detailed architecture and process flows, see [Pod Lifecycle Management]({{% relref "/docs/5_architecture_and_components/function-runner/functionality/pod-lifecycle-management.md" %}}).
 
@@ -86,13 +87,15 @@
 1. **Function Evaluation** receives gRPC request from Task Handler
 2. **Multi-Evaluator** tries executable evaluator first (fast path)
 3. **If NotFound**, falls back to pod evaluator (container execution)
-4. **Pod Lifecycle Management** checks pod cache for existing pod
-5. **If cache miss**, creates new pod with wrapper server via Pod Manager
-6. **Image & Registry Management** resolves image metadata and authentication
-7. **Pod Manager** creates pod with image pull secrets and service frontend
-8. **Pod Cache Manager** stores pod with TTL for reuse
-9. **Function Evaluation** connects to pod via service and executes function
-10. **Wrapper Server** executes function binary and returns structured results
-11. **Garbage Collection** periodically removes expired pods from cache
+4. **Pod Lifecycle Management** checks pod cache for available pods of the requested function type
+5. **If available pod found**, request is assigned to the pod with the shortest waitlist
+6. **If all pods busy but waitlists have capacity**, request is queued in the shortest waitlist
+7. **If all pods and waitlists full**, creates new pod (up to maxParallelPodsPerFunction limit) via Pod Manager
+8. **Image & Registry Management** resolves image metadata and authentication
+9. **Pod Manager** creates pod with image pull secrets and service frontend
+10. **Pod Cache Manager** stores pod in function's pool with individual TTL for reuse
+11. **Function Evaluation** connects to pod via service and executes function
+12. **Wrapper Server** executes function binary and returns structured results
+13. **Garbage Collection** periodically removes expired pods from each function's pool
 
 Each functional area is documented in detail on its own page with architecture diagrams, process flows, and implementation specifics.
function-evaluation
View Changes
@@ -8,7 +8,7 @@
 
 ## Overview
 
-Function evaluation is the core responsibility of the Function Runner - executing KRM (Kubernetes Resource Model) functions through pluggable evaluator strategies. The system uses a strategy pattern where different evaluators handle function execution in different ways (pod-based, executable, or chained), all conforming to a common interface.
+Function evaluation is the core responsibility of the Function Runner - executing KRM (Kubernetes Resource Model) functions through pluggable evaluator strategies. The system uses a strategy pattern where different evaluators handle function execution in different ways (pod-based, executable, or chained), all conforming to a common interface. The pod-based evaluator supports horizontal scaling through multiple parallel pods per function type, enabling high-throughput execution for demanding workloads.
 
 ### High-Level Architecture
 
@@ -133,18 +133,24 @@
 
 ### Waitlist Mechanism
 
-Prevents duplicate pod creation when multiple requests arrive for the same function:
-
-**Waitlist pattern:**
-- Multiple requests for same image queue up
-- Single pod creation serves all waiters
-- Batch notification when pod ready
-- Prevents duplicate pod creation
+Each pod maintains its own waitlist to queue requests and enable efficient load distribution:
+
+**Per-pod waitlist pattern:**
+- Each pod instance has a separate waitlist with configurable maximum length
+- Incoming requests are assigned to the pod with the shortest waitlist
+- Multiple requests for the same image can execute across different pods in parallel
+- When all pods are busy and all waitlists reach maximum length, a new pod is created (up to `maxParallelPodsPerFunction`)
+
+**Load distribution:**
+- Requests distributed to least-loaded pod (shortest waitlist)
+- Ties broken by selecting lowest pod index
+- Enables horizontal scaling when demand increases
+- Automatic scale-down through garbage collection of idle pods
 
 **Error handling:**
-- Pod creation errors sent to all waiters
-- Waitlist cleared on error
-- Each waiter receives error independently
+- Pod creation errors sent to all waiters in that pod's waitlist
+- Failed pod removed from pool and waitlist redistributed to other pods
+- Each waiter receives error independently if redistribution fails
 - Allows retry on next request
 
 ### Function Execution
@@ -417,6 +423,18 @@
 - Garbage collection removes expired pods
 - Failed pods immediately deleted
 
+**Parallel execution configuration:**
+- `maxParallelPodsPerFunction`: Controls how many pods can run simultaneously for each function type (default: 1)
+- `maxWaitlistLength`: Controls how many requests can queue per pod before triggering new pod creation (default: 2)
+- Configurable globally via command-line arguments or per-function via `pod-cache-config.yaml`
+- Per-function configuration overrides global defaults
+
+**Scaling behavior:**
+- New pod created when all existing pods' waitlists reach `maxWaitlistLength`
+- Maximum pods per function limited by `maxParallelPodsPerFunction`
+- Idle pods beyond TTL are garbage collected
+- Enables automatic scaling based on load
+
 ### Cache Warming
 
 **Warming strategy:**
@@ -437,13 +455,19 @@
 - Multiple requests can execute concurrently
 - Each request gets own gRPC connection
 - Pod cache manager coordinates access
-- Waitlist prevents duplicate pod creation
+- Per-pod waitlists manage queuing and load distribution
 
 **Concurrency characteristics:**
-- Same function, same pod: Sequential (one at a time)
-- Same function, different pods: Concurrent
+- Same function, same pod: Sequential (one at a time per pod)
+- Same function, multiple pods: Parallel execution across pods (up to `maxParallelPodsPerFunction`)
 - Different functions: Fully concurrent
-- No artificial concurrency limits
+- No artificial concurrency limits beyond configured maximums
+
+**Parallel pod scaling:**
+- Multiple pods can run simultaneously for the same function type
+- Each pod maintains its own waitlist of queued requests
+- New pods created when all waitlists reach `maxWaitlistLength` (up to `maxParallelPodsPerFunction`)
+- Enables high-throughput execution for demanding workloads
 
 ### Resource Limits
 
@@ -458,3 +482,5 @@
 - Configure cache warming for hot functions
 - Use executable evaluator for critical path
 - Monitor pod resource usage
+- Tune `maxParallelPodsPerFunction` for high-load scenarios
+- Adjust `maxWaitlistLength` to balance responsiveness vs pod overhead



Labels

size:XXL This PR changes 1000+ lines, ignoring generated files.
