[bug] KFP V2 does not support Volumes #8570

Closed
StefanoFioravanzo opened this issue Dec 13, 2022 · 9 comments

@StefanoFioravanzo
Member

TL;DR

Kubeflow Pipelines V2 does not support the V1 VolumeOp and PipelineVolume constructs for manipulating volume objects during a run and mounting them on pipeline steps.
Kubeflow Pipelines V2 doesn't seem to have a roadmap commitment to support volumes before GA in June 2023.

Context

Kubeflow Pipelines V1 provides the ResourceOp construct to define arbitrary K8s resources as a pipeline step. VolumeOp is an extension of ResourceOp and allows provisioning and mounting PVCs.
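
For reference, a minimal sketch of the V1 pattern described above, assuming the v1 SDK (`kfp<2`) is installed. The pipeline name, PVC name, image, and mount path are illustrative choices, and the `kfp` import is deferred into the builder function (a hypothetical helper) so the snippet can be loaded without the SDK present.

```python
def build_v1_volume_pipeline():
    """Return a KFP v1 pipeline function that provisions and mounts a PVC."""
    # Deferred import: requires the v1 SDK (kfp<2).
    from kfp import dsl

    @dsl.pipeline(name="volumeop-example")
    def volume_pipeline():
        # VolumeOp extends ResourceOp and creates a PersistentVolumeClaim.
        vop = dsl.VolumeOp(
            name="create-pvc",
            resource_name="my-pvc",  # illustrative PVC name
            size="1Gi",
            modes=dsl.VOLUME_MODE_RWO,
        )
        # pvolumes maps a mount path to the PipelineVolume exposed by the VolumeOp.
        dsl.ContainerOp(
            name="write-file",
            image="library/bash:4.4.23",  # illustrative image
            command=["sh", "-c"],
            arguments=["echo hello > /mnt/file.txt"],
            pvolumes={"/mnt": vop.volume},
        )

    return volume_pipeline
```

Calling `build_v1_volume_pipeline()` returns the decorated pipeline function, which can then be passed to the v1 compiler.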

Problem

Kubeflow Pipelines V2 removes those constructs. There is no ResourceOp or VolumeOp among the public attributes and objects of the V2 DSL:

__all__ = [
    'component',
    'container_component',
    'pipeline',
    'importer',
    'ContainerSpec',
    'Condition',
    'ExitHandler',
    'ParallelFor',
    'Input',
    'Output',
    'InputPath',
    'OutputPath',
    'IfPresentPlaceholder',
    'ConcatPlaceholder',
    'PipelineTaskFinalStatus',
    'PIPELINE_JOB_NAME_PLACEHOLDER',
    'PIPELINE_JOB_RESOURCE_NAME_PLACEHOLDER',
    'PIPELINE_JOB_ID_PLACEHOLDER',
    'PIPELINE_TASK_NAME_PLACEHOLDER',
    'PIPELINE_TASK_ID_PLACEHOLDER',
    'Artifact',
    'ClassificationMetrics',
    'Dataset',
    'HTML',
    'Markdown',
    'Metrics',
    'Model',
    'SlicedClassificationMetrics',
    'PipelineTask',
]

Not having a ResourceOp is OK, since one can emulate its behavior with a custom component, but the KFP execution engine seems to lack the ability to declare a volume mount in the step's Pod in the first place. We are concerned that adding volume support may not be just a matter of extending the V2 DSL, because of how the new execution model is designed.
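
To make the missing capability concrete, here is a rough sketch of what "declaring a volume mount in the step's Pod" amounts to at the Kubernetes level: the Pod needs a `volumes` entry that references a PVC and a matching `volumeMounts` entry on the container. The helper name `pod_with_pvc`, the volume name `data`, and the example values are hypothetical; this is not how KFP builds its Pods.

```python
import json


def pod_with_pvc(step_image: str, claim_name: str, mount_path: str) -> dict:
    """Build a minimal Pod manifest that mounts an existing PVC into one container."""
    volume_name = "data"  # hypothetical name tying the volume to its mount
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"generateName": "pipeline-step-"},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "main",
                    "image": step_image,
                    # Where the volume appears inside the container.
                    "volumeMounts": [{"name": volume_name, "mountPath": mount_path}],
                }
            ],
            # Which PVC backs the volume.
            "volumes": [
                {
                    "name": volume_name,
                    "persistentVolumeClaim": {"claimName": claim_name},
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(pod_with_pvc("python:3.9", "my-pvc", "/data"), indent=2))
```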

In the Kubeflow Pipelines (KFP) v2 System Design document, a paragraph states the following:

In this design, we support gcs, s3, minio object stores using their clients. However, a caveat is that the approach doesn't scale to support more cloud providers or custom solutions. After Kubernetes standardizes on CSI, I'd hope there will be more momentum on ReadWriteMany volume. Especially, volume drivers backed by object stores like csi-gcs, csi-s3 and azure-blob-csi may become a good fit as pipeline storages. We will continuously evaluate the space and consider supporting volumes as an alternative later.

As stated in the comments, K8s has standardized on CSI since January 2019.

We haven't seen other comments or design proposals targeting this missing functionality.

Proposed Next Steps

This is a fundamental breaking change from KFP V1 and may not find a straightforward resolution for many people relying on volumes in their production pipelines.
We propose first-class support for volumes as a non-negotiable feature that needs to be part of KFP V2 before it becomes GA.

We would love to hear feedback from the Google folks who are contributing to this design and to discuss a way forward together. We would be happy to step in and help solve this issue.

@zijianjoy @james-jwu @chensun @connor-mccarthy @elikatsis

/area backend
/area sdk

Impacted by this bug? Give it a 👍.

@connor-mccarthy
Member

Thanks, @StefanoFioravanzo. This is on our roadmap for before June 2023.

While volumes are a general concept/feature, the implementation is typically platform-specific. For example, you'd configure a volume on Kubernetes differently than you would on GCP. KFP v2, however, compiles pipelines to a platform-agnostic pipeline representation protocol: PipelineSpec IR.

For this reason, we will be implementing support for volumes as a platform-specific feature to make support across platforms possible while keeping PipelineSpec IR decoupled from the executing platform.
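
As a purely conceptual illustration of that decoupling (not the actual PipelineSpec IR schema), the toy model below keeps a platform-agnostic spec separate from a per-platform configuration section. Every class, function, and field name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PipelineSpec:
    """Platform-agnostic part: task name -> container image."""
    tasks: Dict[str, str]


@dataclass
class CompiledPipeline:
    """A spec plus optional platform-specific settings kept outside of it."""
    spec: PipelineSpec
    # platform name -> task name -> setting name -> list of entries
    platforms: Dict[str, Dict[str, Dict[str, List[dict]]]] = field(default_factory=dict)


def add_pvc_mount(compiled: CompiledPipeline, task: str, pvc_name: str, mount_path: str) -> None:
    """Record a PVC mount under the 'kubernetes' platform without touching the spec."""
    if task not in compiled.spec.tasks:
        raise KeyError(f"unknown task: {task}")
    task_cfg = compiled.platforms.setdefault("kubernetes", {}).setdefault(task, {})
    task_cfg.setdefault("pvc_mounts", []).append(
        {"pvc_name": pvc_name, "mount_path": mount_path}
    )
```

In this model, a backend that does not understand the `kubernetes` section can still read `spec` unchanged, which is the property the comment above is after.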

@StefanoFioravanzo
Member Author

@connor-mccarthy Thanks for your quick reply!

I wasn't aware of the platform-specific API proposal. It looks exciting and offers a flexible way to support different runtimes.

I wonder how you define the success criteria for volumes support on Kubernetes. Would it make sense to define a list of Kubernetes volume-related capabilities the SDK v1 offers so we have a reference to claim that v2 offers feature parity?

@connor-mccarthy
Member

Thanks, @StefanoFioravanzo! That's exactly the plan. We're striving to ensure users can author pipelines in v2 that are functionally equivalent to those they could author in v1 using Kubernetes volume features (relevant docs).

@StefanoFioravanzo
Member Author

@connor-mccarthy That's fantastic! I can't wait to see these updates come to v2, and I'm looking forward to more details.

@chensun chensun added this to Needs triage in KFP SDK Triage via automation Dec 29, 2022
@chensun chensun moved this from Needs triage to Backlog in KFP SDK Triage Dec 29, 2022
@vsoch

vsoch commented Jun 29, 2023

> Thanks, @StefanoFioravanzo. This is on our roadmap for before June 2023.

Hi folks! Is there any update here? I wanted to use ResourceOp today and saw that it was deprecated for v2, and I'm not sure whether I have any options. Are most people still using v1?

@connor-mccarthy
Member

The general ResourceOp is indeed removed in v2, but KFP does support volumes and secrets via the kfp-kubernetes extension library. Please see the Platform-specific features KFP SDK docs and the kfp-kubernetes reference documentation.
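
A minimal sketch of the kfp-kubernetes approach, assuming `kfp>=2` and `kfp-kubernetes` are installed. The pipeline name, PVC suffix, size, mount path, and the `standard` storage class are illustrative and depend on the cluster. The imports are deferred into a hypothetical builder function so the snippet loads even without the libraries.

```python
def build_v2_pvc_pipeline():
    """Return a KFP v2 pipeline that creates, mounts, and deletes a PVC."""
    # Deferred import: requires kfp>=2 and the kfp-kubernetes extension.
    from kfp import dsl, kubernetes

    @dsl.component
    def write_file():
        with open("/data/file.txt", "w") as f:
            f.write("hello")

    @dsl.component
    def read_file() -> str:
        with open("/data/file.txt") as f:
            return f.read()

    @dsl.pipeline(name="pvc-example")
    def pvc_pipeline():
        pvc = kubernetes.CreatePVC(
            pvc_name_suffix="-my-pvc",
            access_modes=["ReadWriteOnce"],
            size="5Gi",
            storage_class_name="standard",  # assumption: this class exists on the cluster
        )
        writer = write_file()
        kubernetes.mount_pvc(writer, pvc_name=pvc.outputs["name"], mount_path="/data")

        # Data flows through the volume, not through inputs/outputs,
        # so the ordering has to be stated explicitly.
        reader = read_file().after(writer)
        kubernetes.mount_pvc(reader, pvc_name=pvc.outputs["name"], mount_path="/data")

        kubernetes.DeletePVC(pvc_name=pvc.outputs["name"]).after(reader)

    return pvc_pipeline
```

The returned pipeline function can be compiled with the v2 compiler in the usual way. Note that this covers PVCs only and is not a replacement for the general ResourceOp.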

KFP SDK Triage automation moved this from Backlog to Closed Jul 5, 2023
@vsoch

vsoch commented Jul 5, 2023

Could we please keep the issue open and scope it to be about ResourceOp for custom resource definitions? There is still no support or design (that I can find) for that.

@connor-mccarthy
Member

The original issue is primarily concerned with Volume support. Please consider opening another issue about ResourceOp support.

@vsoch

vsoch commented Jul 5, 2023

Done: #9703. Thank you!

3 participants