from typing import NamedTuple
from typing import TypeVar
from kfp.components import InputPath, OutputPath, create_component_from_func
PandasDataFrame = TypeVar('pandas.core.frame.DataFrame')
#def readdata(url,out: OutputPath(PandasDataFrame)):
def readdata(url: str, out: OutputPath(PandasDataFrame)):
    import pandas as pd
    from collections import namedtuple
    df = pd.read_csv(url)
    print("No of records", df.index)
    df.to_parquet(out)
-------------------------------------
read_data = create_component_from_func(readdata,base_image='tensorflow/tensorflow:2.6.0', packages_to_install=['pandas==0.24','sklearn','numpy','pyarrow'])
---------------------------------------
import kfp.dsl as dsl
@dsl.pipeline(
    name='Get and Process Training Data',
    description='Get and Process Training data'
)
def getdata_and_process_pipeline(
    a: str = "https://raw.githubusercontent.com/alexcpn/neuralnetwork_learn/main/data/heart-attack-prediction/heart.csv"
):
    model_path = create_nn_model().output
    pd_as_parquet = read_data(url=a).output
    process_task = process_data(pd_as_parquet)
-----------------
client.create_run_from_pipeline_func(getdata_and_process_pipeline, arguments={})
-----------------------
However for V2, the same function is giving an execution error
read_data = create_component_from_func_v2(readdata,base_image='tensorflow/tensorflow:2.6.0', packages_to_install=['pandas==0.24','sklearn','numpy','pyarrow'])
-----
import kfp.dsl as dsl
@dsl.pipeline(
    name='Get and Process Training Data',
    description='Get and Process Training data'
)
def getdata_and_process_pipeline(
    a: str = "https://raw.githubusercontent.com/alexcpn/neuralnetwork_learn/main/data/heart-attack-prediction/heart.csv"
):
    model_path = create_nn_model().output
    pd_as_parquet = read_data(url=a).output
----------------
client.create_run_from_pipeline_func(getdata_and_process_pipeline,mode=kfp.dsl.PipelineExecutionMode.V2_COMPATIBLE, arguments={})
----------
Logs for readdata component
NameError: name 'PandasDataFrame' is not defined
F0818 13:29:15.365216 37 main.go:56] Failed to execute component: exit status 1
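One plausible reading of this log (an assumption, not confirmed against the KFP source) is that in V2-compatible mode the function source is executed inside the container with its annotations intact, so `OutputPath(PandasDataFrame)` is evaluated in a namespace where the notebook-level `PandasDataFrame` name does not exist. The sketch below reproduces that failure with plain Python only. The `OutputPath` defined here is a hypothetical stand-in that mimics the call shape of `kfp.components.OutputPath`, and `annotation_error` is a helper invented for this illustration.

```python
# Stand-in for kfp.components.OutputPath (hypothetical; only mimics the call shape).
def OutputPath(type_=None):
    return str


def annotation_error(source: str):
    """Run `source` in a namespace that knows OutputPath but nothing else
    from the notebook, and return the NameError message, if any."""
    namespace = {"OutputPath": OutputPath}
    try:
        exec(source, namespace)
        # Touch the annotations so they are evaluated even on Python
        # versions that defer annotation evaluation.
        namespace["readdata"].__annotations__
    except NameError as exc:
        return str(exc)
    return None


# Annotation refers to a module-level name that is missing in the namespace.
with_typevar = "def readdata(url: str, out: OutputPath(PandasDataFrame)):\n    pass\n"

# Annotation uses a plain string as the type name instead.
with_string = "def readdata(url: str, out: OutputPath('PandasDataFrame')):\n    pass\n"

print(annotation_error(with_typevar))  # a NameError message naming PandasDataFrame
print(annotation_error(with_string))   # no error is reported
```

If this reading is right, annotating the parameter as `OutputPath('PandasDataFrame')`, with the type name given as a string, might sidestep the error, since nothing outside the function then needs to be resolved. This has not been verified against a V2-compatible run.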
Environment
Steps to reproduce
The V1 pipeline shown above works.
However, when the same function is run in V2-compatible mode, it fails with the execution error shown in the logs above.
Expected result
Custom data types should work in v2 as they do in v1
Materials and Reference
Full v1 code -
https://colab.research.google.com/drive/1f_p4EVKReT57J4Maz4vRfhccJ_qVv03W?usp=sharing
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.