[core] skip model agent task for pvc storage#171
Conversation
There was a problem hiding this comment.
Summary of Changes
Hello @slin1237, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces foundational changes to the model agent components to enable future Persistent Volume Claim (PVC) storage support. It modifies the agent's behavior to explicitly ignore and skip any models that specify PVC as their storage type, delegating their management entirely to the BaseModel controller. This is a critical step in the broader implementation of OEP-0004, ensuring a clear separation of concerns for different storage mechanisms.
Highlights
- Model Agent Task Skipping: The
Gophercomponent of the model agent (pkg/modelagent/gopher.go) has been updated to explicitly detectPVCstorage types and immediately skip any further processing for models using them. This ensures that the model agent does not attempt to handle PVC-backed models, which are intended to be managed by the BaseModel controller. - Scout Component Download Logic Update: The
Scoutcomponent (pkg/modelagent/scout.go) now includes logic within itsshouldDownloadModelfunction to identifyPVCstorage URIs. If a model usesPVCstorage,Scoutwill returnfalse, indicating that the model should not be downloaded by the agent, further reinforcing the delegation of PVC handling to the BaseModel controller. - Unit Test Coverage: New unit tests have been added in
pkg/modelagent/gopher_test.goto verify the correct skipping behavior forPVCstorage types in both theGopher's task processing and theScout's download decision logic. These tests also confirm that existing storage types (OCI, HuggingFace, Vendor) are still processed as expected.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Code Review
This PR correctly implements the skipping of PVC storage types in the model agent's gopher and scout components, aligning with the design to handle PVCs in the BaseModel controller. The changes are logical and well-contained. My review focuses on improving error handling robustness in the scout, strengthening the new unit tests for better coverage and correctness, and fixing a compilation issue in the new test file.
| "go.uber.org/zap" | ||
| corev1 "k8s.io/api/core/v1" | ||
| metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
| ) |
There was a problem hiding this comment.
| storageType, err := storage.GetStorageType(*storageSpec.StorageUri) | ||
| if err == nil && storageType == storage.StorageTypePVC { | ||
| w.logger.Debugf("Skipping PVC storage type in shouldDownloadModel (handled by BaseModel controller)") | ||
| return false | ||
| } |
There was a problem hiding this comment.
The error returned by storage.GetStorageType is currently being ignored. If an invalid storage URI is provided (e.g., invalid://...), GetStorageType will return an error, but the function will proceed to evaluate node selectors and affinity. This could lead to unexpected behavior where the scout incorrectly decides a model should be downloaded, only for the gopher to fail later.
It would be more robust to handle the error here, log it, and prevent the download from being scheduled.
| storageType, err := storage.GetStorageType(*storageSpec.StorageUri) | |
| if err == nil && storageType == storage.StorageTypePVC { | |
| w.logger.Debugf("Skipping PVC storage type in shouldDownloadModel (handled by BaseModel controller)") | |
| return false | |
| } | |
| storageType, err := storage.GetStorageType(*storageSpec.StorageUri) | |
| if err != nil { | |
| w.logger.Warnf("Invalid storage URI '%s', cannot determine storage type: %v. Model will not be downloaded.", *storageSpec.StorageUri, err) | |
| return false | |
| } | |
| if storageType == storage.StorageTypePVC { | |
| w.logger.Debugf("Skipping PVC storage type in shouldDownloadModel (handled by BaseModel controller)") | |
| return false | |
| } |
| ) | ||
|
|
||
| // TestHandleTaskPVCSkip tests that PVC storage types are properly skipped | ||
| func TestHandleTaskPVCSkip(t *testing.T) { |
There was a problem hiding this comment.
This test, TestHandleTaskPVCSkip, doesn't seem to test the gopher's processTask function as its name implies. Instead, it primarily tests the behavior of storage.GetStorageType and validates a boolean flag expectSkip from the test case table. The core logic inside gopher.go's switch statement for StorageTypePVC is not actually executed or verified.
Additionally, the assertion logic is incomplete. The check if storageType == storage.StorageTypePVC { assert.True(t, tc.expectSkip, ...) } only validates the true case for expectSkip. It doesn't verify that expectSkip is correctly false for other storage types.
To improve this, consider renaming the test to something like TestGetStorageTypeForGopherTasks to more accurately reflect what's being tested, or refactoring this into a true unit test for gopher.processTask.
| if storageType == storage.StorageTypePVC { | ||
| assert.True(t, tc.expectSkip, "PVC storage type should be skipped") | ||
| } |
There was a problem hiding this comment.
The assertion logic here is incomplete. It only validates the true case for expectSkip when the storage type is PVC. It doesn't verify that expectSkip is correctly false for other storage types.
A more robust assertion would check the expectation for all cases.
// Verify that PVC storage type would be skipped
shouldSkip := (storageType == storage.StorageTypePVC)
assert.Equal(t, tc.expectSkip, shouldSkip)| testCases := []struct { | ||
| name string | ||
| storageSpec *v1beta1.StorageSpec | ||
| expectedResult bool | ||
| description string | ||
| }{ |
There was a problem hiding this comment.
The test suite for shouldDownloadModel is good, but it would be more complete if it included a test case for an invalid storage URI. This would verify that the function correctly handles malformed URIs and returns false, preventing download tasks from being scheduled. This is especially relevant given the potential for unhandled errors from storage.GetStorageType.
{
name: "Invalid storage URI should be skipped",
storageSpec: &v1beta1.StorageSpec{
StorageUri: stringPtr("invalid://some/uri"),
},
expectedResult: false,
description: "Invalid storage URI should return false (skip)",
}
PR Type
/kind feature
What this PR does / why we need it
This PR implements the model agent changes needed for PVC storage support as outlined in OEP-0004. Specifically, it updates the model agent
(
gopher.goandscout.go) to skip processing of PVC storage types entirely.Changes made:
storage.GetStorageType()pvc://{pvc-name}/{sub-path})Why this is needed:
PVC storage requires a fundamentally different approach than downloadable storage types. Since DaemonSet pods cannot effectively mount PVCs
(especially RWO volumes), and PVC models don't require downloading or node-specific verification, the model agent must skip these entirely. This
allows the BaseModel controller to handle all PVC operations directly, avoiding complex DaemonSet-PVC interactions.
Which issue(s) this PR fixes
Related to OEP-0004: PVC Storage Support implementation
#162 #160
Special notes for your reviewer
User-facing change
NONE - This is an internal implementation change that prepares for future PVC storage support. Users will not see any behavioral changes with this
PR alone.