
Pipeline with volume does not work with "k8sapi" argo executor #4326

Closed
nakfour opened this issue Aug 5, 2020 · 17 comments
Labels
lifecycle/stale The issue / pull request is stale, any activities remove this label.

@nakfour
Member

nakfour commented Aug 5, 2020

What steps did you take:

As part of Open Data Hub we are trying to get KFP 1.0 working on OpenShift 4.x. We must use the "k8sapi" executor since Docker is not present. The provided KFP Python examples work as long as no parameters are passed between steps. We are trying to convert the KFP examples that pass parameters to use volumes so that they work on OCP. We tried one of the simple volume-creation examples, as shown here: https://github.com/kubeflow/pipelines/blob/master/samples/core/volume_ops/volume_ops.py. This caused an error stating that we need to use volumes or emptyDir. Looking at the generated YAML file, we noticed these lines:

outputs:
      parameters:
      - name: create-pvc-manifest
        valueFrom: {jsonPath: '{}'}
      - name: create-pvc-name
        valueFrom: {jsonPath: '{.metadata.name}'}
      - name: create-pvc-size
        valueFrom: {jsonPath: '{.status.capacity.storage}'}

According to the error, these outputs need to be backed by volumes or an emptyDir. We modified the generated YAML file to exclude the outputs, and that worked.

Is there a way we can use volumes or emptyDir in KFP Python without generating these output parameters?

Environment:

KFP 1.0 on OCP 4.4

@Bobgy
Contributor

Bobgy commented Aug 6, 2020

/assign @Ark-kun @neuromage @chensun @numerology

@Ark-kun
Contributor

Ark-kun commented Aug 11, 2020

I think this is a known upstream issue: argoproj/argo-workflows#2679

We must use the "k8sapi" executor since docker is not present.

The PNS executor might be the more future-proof option.

As a workaround: all VolumeOp does is pass the volume YAML to Argo, and Argo then calls `kubectl create` with that YAML. So you could create a custom component that runs `kubectl create` to create the volume. That way you control the outputs.
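For illustration, a hand-written sketch of such a component in KFP's component-specification YAML format. This is not an official component; the name, image, and shell invocation are all assumptions. It deliberately declares no outputs, so the compiled workflow contains no `valueFrom`/`jsonPath` output parameters:

```yaml
# Hypothetical component: creates a Kubernetes object from a manifest with
# kubectl, declaring no outputs.
name: Create resource with kubectl
inputs:
- {name: Manifest, type: String}
implementation:
  container:
    image: bitnami/kubectl:1.18    # illustrative image
    command:
    - sh
    - -ec
    - printf '%s' "$0" | kubectl create -f -
    - {inputValue: Manifest}
```

Note that, as with VolumeOp, the pipeline's service account would still need RBAC permission to create PVCs.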

@nakfour
Member Author

nakfour commented Aug 12, 2020

@Ark-kun we documented why we are not using the PNS executor here: https://github.com/opendatahub-io/odh-manifests/issues/117
The issue is with the DSL generated volume yaml that includes these lines as mentioned before

outputs:
      parameters:
      - name: create-pvc-manifest
        valueFrom: {jsonPath: '{}'}
      - name: create-pvc-name
        valueFrom: {jsonPath: '{.metadata.name}'}
      - name: create-pvc-size
        valueFrom: {jsonPath: '{.status.capacity.storage}'}

These lines are not generated by Argo, they are generated by the python DSL SDK when creating a volume. Is there a way to create a volume in python without any outputs that require EmptyDir?

@Ark-kun
Contributor

Ark-kun commented Aug 19, 2020

You might want to make sure your Kubernetes environment supports Argo. I was under the impression that several of our users have used OpenShift semi-successfully.

The provided KFP Python examples work as long as no parameters are passed.

I think this is a standard Argo feature. Can you please check whether any of Argo's parameter- or artifact-passing examples work in your setup? Data passing is a core feature of Argo and KFP; without it working, the product might not be usable.

The issue is with the DSL generated volume yaml that includes these lines as mentioned before

I think this is a standard feature of Argo resource templates. You might want to make sure your Kubernetes environment supports Argo.

These need to be placed in emptyDir or volumes.

Can you explain what needs to be placed in the volumes?

We must use the "k8sapi" executor since docker is not present.

I think several of our users have used KFP in non-Docker environments. The Argo developers recommend the PNS executor as the best option going forward.

We tried one of the simple volume creation examples

I'd consider the VolumeOp to be an experimental community-driven feature. Modifying global state (such as creating Kubernetes objects) is somewhat frowned upon compared to pure components that only use their inputs.

Is there a way to create a volume

You can use `kubectl create ...`. You can create a container component that runs `kubectl create ...` — in fact, that's what VolumeOp and Argo do.

without any outputs that require EmptyDir

I'm not sure I understand how emptyDir comes into play here. Can you please help me understand?

@Ark-kun
Contributor

Ark-kun commented Aug 19, 2020

I think the best course of action is to first try to make the underlying orchestrator (Argo) work for you.

@Ark-kun
Contributor

Ark-kun commented Aug 19, 2020

I think it might be best for you to avoid VolumeOp for now. Try making the containers and data passing work. If the system throws an error asking you to create emptyDir volumes under the output paths, then it might be possible to post-process the compiled workflow to add them. We've suggested to the Argo developers that they auto-mount the emptyDir volumes if Argo needs them for operation.
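As a sketch of that post-processing idea — plain Python over the compiled workflow dict (e.g. loaded with `yaml.safe_load`); the volume name and mount path are assumptions, not KFP or Argo APIs:

```python
# Hypothetical post-processing step: mount an emptyDir volume into every
# container template of a compiled Argo workflow so the k8sapi executor
# can write output files. The mount path must cover the declared output
# paths; "/tmp/outputs" is an assumed convention.
def add_output_empty_dirs(workflow, mount_path="/tmp/outputs"):
    for template in workflow["spec"].get("templates", []):
        container = template.get("container")
        if container is None:
            continue  # skip steps/DAG/resource templates
        template.setdefault("volumes", []).append(
            {"name": "outputs", "emptyDir": {}}
        )
        container.setdefault("volumeMounts", []).append(
            {"name": "outputs", "mountPath": mount_path}
        )
    return workflow
```

One would run this between `kfp.compiler.Compiler().compile(...)` and submission, then dump the modified dict back to YAML.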

@nakfour
Member Author

nakfour commented Aug 19, 2020

@Ark-kun yes, Argo works; we have actually been using it in our production cluster for quite some time. The key is to use volumes for passing parameters; once that is done, it all works. We are trying to find a way to make KFP Python code work by allowing users to create volumes in the Python code, instead of telling them to run `oc create pv` first and then write their Python KFP code.

@nakfour
Member Author

nakfour commented Aug 19, 2020

@Ark-kun is it even possible to use emptyDir from Python KFP?

@nakfour
Member Author

nakfour commented Aug 20, 2020

@Ark-kun thanks for the suggestion in the KFP weekly meeting of a component to create a volume from Python. Is it possible to collaborate on creating this feature?

@eterna2
Contributor

eterna2 commented Sep 30, 2020

Hey @nakfour

I am trying to set up Kubeflow on our OpenShift cluster too, and I wouldn't mind helping out.

It would be great if you could also tag me on any OpenShift-related issues, so I can watch out for them (I am still at a very early stage of getting KF to work on our cluster).

@Ark-kun
Contributor

Ark-kun commented Sep 30, 2020

yes Argo works

I was interested in whether all of Argo works (e.g., whether the artifact- and parameter-passing examples work). Data passing is a core Argo feature that distinguishes it from many other orchestrators.

We are trying to find a way to make KFP Python code work by allowing users to create volumes in the Python code, instead of telling them to run `oc create pv` first and then write their Python KFP code.

We somewhat discourage volumes because they often make components non-reusable (they make them depend on cluster state). Also, volumes are global (like global variables), which makes components less pure and less reproducible. We also discourage volume creation as part of the pipeline, because it affects global state: if some pipeline task deletes a volume that a future task needs, there will be problems.

is it possible to even use emptyDir from python KFP?

Every KFP task has `task.add_volume()` and `task.add_volume_mount()`. They support all Kubernetes volume types, including emptyDir.

The key is to use volumes in passing parameters, once that is done, it all works.

AFAIK, for container components/templates mounting emptyDir volumes under the outputs should work around this Argo issue.
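Concretely, a hand-written illustration of what that workaround could look like in the compiled workflow; the template name, image, and mount path are made up:

```yaml
templates:
- name: my-step
  container:
    image: my-image
    volumeMounts:
    - {name: outputs, mountPath: /tmp/outputs}   # must cover the declared output paths
  volumes:
  - {name: outputs, emptyDir: {}}
```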

BTW, we've developed an experimental way to switch KFP's data passing from Argo's artifact repository to volumes. Please check:

from kfp.dsl import PipelineConf, data_passing_methods
from kubernetes.client.models import V1Volume, V1PersistentVolumeClaim

pipeline_conf = PipelineConf()
pipeline_conf.data_passing_method = data_passing_methods.KubernetesVolume(
    volume=V1Volume(
        name='data',
        persistent_volume_claim=V1PersistentVolumeClaim('data-volume'),
    ),
    path_prefix='artifact_data/',
)

@nakfour
Member Author

nakfour commented Oct 1, 2020

@Ark-kun wanted to see if there is any update on this issue? Thanks

@elikatsis
Member

elikatsis commented Oct 2, 2020

Hi @nakfour, this issue had totally slipped my mind!

If your only problem using VolumeOps is the outputs, then you can disable them in your DSL code as follows:

vop = kfp.dsl.VolumeOp(...)
vop.outputs = {}

If this helps you, we could add a constructor argument to disable the outputs, and then all that would be required is:

vop = kfp.dsl.VolumeOp(..., disable_outputs=True)

We also discourage volume creation as part of the pipeline, because it's affecting global state. If some pipeline task deletes a volume while some future task will need it, there would be problems.

Creating resources as part of the pipeline is essential to ML, and there are components and samples that do similar things. If I'm not mistaken, the taxi cab example creates deployments and services using pure components, doesn't it? The bad thing about those components is that users don't really have a say in the resources created 😄
Creating volumes is similar and comes with lots of benefits (I'll omit them since this issue isn't the place to discuss them).

Regarding the experimental data passing method, this code is wrong:

pipeline_conf.data_passing_method = data_passing_methods.KubernetesVolume(
    volume=V1Volume(
        name='data',
        persistent_volume_claim=V1PersistentVolumeClaim('data-volume'),
    ),
    path_prefix='artifact_data/',
)

The correct way to do it is

pipeline_conf.data_passing_method = data_passing_methods.KubernetesVolume(
    volume=V1Volume(
        name='data',
        persistent_volume_claim=V1PersistentVolumeClaimVolumeSource(claim_name='data-volume'),
    ),
    path_prefix='artifact_data/',
)

But all of this is too much boilerplate code. You can always use a PipelineVolume which makes things much easier since it is a V1Volume with some extra magic:

pipeline_conf.data_passing_method = data_passing_methods.KubernetesVolume(
    volume=kfp.dsl.PipelineVolume(pvc="data-volume"),
    path_prefix='artifact_data/',
)

However, this data-passing method is not easy to maintain. All your pipelines will be using the same PVC, or you will have to ask your users which PVC to use every time they want to run a pipeline. And what size should these PVCs be? How can you know how much data your users will be passing around?
Creating PVCs with VolumeOps at the beginning of the pipeline and (optionally) deleting them at the end seems like the easiest, fastest, and most sane way to go.

@zsaladin
Contributor

zsaladin commented Oct 8, 2020

In my case I need the outputs.parameters, because other containers depend on the PVC name.

So I changed my code.

Old code

volume = dsl.VolumeOp(
    name="volume",
    resource_name="volume",
    modes=dsl.VOLUME_MODE_RWO,
    size="1Gi"
)
merge_op = func_to_container_op(merge)
merge_task = merge_op()
merge_task.apply(
    mount_pvc(
        pvc_name=volume.volume.persistent_volume_claim.claim_name,
        volume_mount_path="/mnt"
    )
)

New code

volume = dsl.VolumeOp(
    name="volume",
    resource_name="volume",
    modes=dsl.VOLUME_MODE_RWO,
    size="1Gi"
)
volume.outputs = {}  # To exclude outputs

merge_op = func_to_container_op(merge)
merge_task = merge_op()
merge_task.apply(
    mount_pvc(
        pvc_name="{{workflow.name}}-volume",  # Hardcoded name instead of the output parameter
        volume_mount_path="/mnt"
    )
)

The PVC name is composed of the workflow name and the VolumeOp's name.
This works.
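To illustrate the naming convention, a toy pure-Python sketch of Argo's `{{workflow.name}}` substitution — not actual KFP or Argo code, just a demonstration of how the hardcoded reference above resolves at runtime:

```python
# VolumeOp names the created PVC "{{workflow.name}}-<name>"; Argo
# substitutes the workflow name when the workflow runs. A toy
# substitution shows how the hardcoded reference resolves:
def resolve(template: str, workflow_name: str) -> str:
    return template.replace("{{workflow.name}}", workflow_name)

print(resolve("{{workflow.name}}-volume", "my-pipeline-abc12"))
# -> my-pipeline-abc12-volume
```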

@nakfour
Member Author

nakfour commented Dec 9, 2020

Thanks @elikatsis, I will try your suggestion of disabling the outputs; if that works and we can create volumes with VolumeOp, then we should be good. @zsaladin thanks for the code example as well.

@stale

stale bot commented Jun 9, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Jun 9, 2021
@stale

stale bot commented Apr 19, 2022

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

@stale stale bot closed this as completed Apr 19, 2022