[sdk] Not able to pass Custom Data Types to V2 Pipeline (works with v1) #6390

alexcpn · 2021-08-19T06:00:47Z

Environment

KFP version:

kustomize build apps/pipeline/upstream/env/platform-agnostic-multi-user-pns | grep 1.5.1
2021/08/19 11:19:46 nil value at `valueFrom.configMapKeyRef.name` ignored in mutation attempt
2021/08/19 11:19:46 nil value at `valueFrom.secretKeyRef.name` ignored in mutation attempt
2021/08/19 11:19:46 well-defined vars that were never replaced: kfp-app-name,kfp-app-version
  appVersion: 1.5.1
        image: gcr.io/ml-pipeline/cache-deployer:1.5.1
        image: gcr.io/ml-pipeline/cache-server:1.5.1
      - image: gcr.io/ml-pipeline/metadata-envoy:1.5.1
        image: gcr.io/ml-pipeline/metadata-writer:1.5.1
        image: gcr.io/ml-pipeline/api-server:1.5.1
        image: gcr.io/ml-pipeline/persistenceagent:1.5.1
        image: gcr.io/ml-pipeline/scheduledworkflow:1.5.1
        image: gcr.io/ml-pipeline/frontend:1.5.1
        image: gcr.io/ml-pipeline/viewer-crd-controller:1.5.1
      - image: gcr.io/ml-pipeline/visualization-server:1.5.1

KFP SDK version:

build version dev_local

All dependencies version:

kfp                      1.6.3
kfp-pipeline-spec        0.1.8
kfp-server-api           1.6.0

Steps to reproduce

For a V1 pipeline, the following works:

from typing import NamedTuple
from typing import TypeVar
from kfp.components import InputPath, OutputPath, create_component_from_func
PandasDataFrame = TypeVar('pandas.core.frame.DataFrame')
#def readdata(url,out: OutputPath(PandasDataFrame)):
def readdata(url:str,out: OutputPath(PandasDataFrame)):    
    import pandas as pd
    from collections import namedtuple
    df = pd.read_csv(url)
    print("No of records",df.index)
    df.to_parquet(out)        
-------------------------------------    
read_data = create_component_from_func(readdata,base_image='tensorflow/tensorflow:2.6.0', packages_to_install=['pandas==0.24','sklearn','numpy','pyarrow'])

---------------------------------------
import kfp.dsl as dsl
@dsl.pipeline(
  name='Get and Process Training Data',
  description='Get and Process Training data'
)
def getdata_and_process_pipeline(
  a:str="https://raw.githubusercontent.com/alexcpn/neuralnetwork_learn/main/data/heart-attack-prediction/heart.csv"
):
  
  model_path = create_nn_model().output
  pd_as_parquet = read_data(url=a).output
  process_task = process_data(pd_as_parquet)
-----------------
client.create_run_from_pipeline_func(getdata_and_process_pipeline, arguments={})
-----------------------

However, for V2 the same function fails with an execution error:

read_data = create_component_from_func_v2(readdata,base_image='tensorflow/tensorflow:2.6.0', packages_to_install=['pandas==0.24','sklearn','numpy','pyarrow'])

-----
import kfp.dsl as dsl
@dsl.pipeline(
  name='Get and Process Training Data',
  description='Get and Process Training data'
)
def getdata_and_process_pipeline(
  a:str="https://raw.githubusercontent.com/alexcpn/neuralnetwork_learn/main/data/heart-attack-prediction/heart.csv"
):
  
  model_path = create_nn_model().output
  pd_as_parquet = read_data(url=a).output
----------------
client.create_run_from_pipeline_func(getdata_and_process_pipeline,mode=kfp.dsl.PipelineExecutionMode.V2_COMPATIBLE, arguments={})
----------
Logs for readdata component

NameError: name 'PandasDataFrame' is not defined
F0818 13:29:15.365216      37 main.go:56] Failed to execute component: exit status 1
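The NameError above is consistent with ordinary Python behaviour: the expression in a parameter annotation is evaluated when the function is defined (or, on Python 3.14 and later, when the annotations are first inspected). If only the function's source text reaches the container, with its annotations intact and without the module-level `PandasDataFrame = TypeVar(...)` line, the name cannot be resolved. Whether this is exactly what the V2 code path does is an assumption and is not confirmed in this thread. The sketch below reproduces the mechanism without KFP; `run_in_fresh_namespace` and the stand-in `OutputPath` are hypothetical helpers written only for illustration.

```python
import textwrap

# Hypothetical stand-in for the component source shipped to the container:
# only the function text travels, not the module-level names around it.
FUNCTION_SOURCE = textwrap.dedent('''
    def readdata(url: str, out: OutputPath(PandasDataFrame)):
        return url
''')


def OutputPath(type_):
    """Hypothetical stand-in for kfp.components.OutputPath (illustration only)."""
    return ("OutputPath", type_)


def run_in_fresh_namespace(source: str, extra_globals: dict) -> str:
    """Define the function in an isolated namespace and resolve its annotations."""
    namespace = dict(extra_globals)
    try:
        exec(source, namespace)
        # Reading __annotations__ forces evaluation on interpreters that
        # defer annotation evaluation until first access.
        namespace["readdata"].__annotations__
    except NameError as err:
        return f"NameError: {err}"
    return "ok"


# The annotation name is missing from the target namespace.
missing = run_in_fresh_namespace(FUNCTION_SOURCE, {"OutputPath": OutputPath})

# The same source with the name supplied resolves normally.
present = run_in_fresh_namespace(
    FUNCTION_SOURCE, {"OutputPath": OutputPath, "PandasDataFrame": object}
)

print(missing)
print(present)
```

The first call reports a NameError for `PandasDataFrame`, while the second succeeds, which suggests that any custom type referenced in an annotation has to be resolvable wherever the function source is re-executed.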

Expected result

Custom data types should work in v2 as they do in v1.

Materials and Reference

Full v1 code -

https://colab.research.google.com/drive/1f_p4EVKReT57J4Maz4vRfhccJ_qVv03W?usp=sharing

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

zijianjoy · 2021-08-20T00:43:03Z

duplicate #5711

cc @chensun

alexcpn added area/sdk kind/bug labels Aug 19, 2021

zijianjoy closed this as completed Aug 20, 2021

alexcpn mentioned this issue Aug 21, 2021

[sdk] Pipeline V2 - Ouput[Dataset] giving error - TypeError: expected str, bytes or os.PathLike object, not NoneType #6410

Closed