[DataCatalog2.0]: Running pipeline with versioned datasets gives an error #4738

@ElenaKhaustova

Description

Running pipelines whose datasets have versioned=True raises an error with every runner on the feature-1.0.0 branch.

The error exists in Kedro 0.19.12 too.

The following error happens when using SequentialRunner and ThreadRunner:

DatasetError: Save path '/Users/Projects/Testing/kedrocatalog/data/02_intermediate/preprocessed_companies.parquet/2025-05-13T12.22.11.290Z/preprocessed_companies.parquet' for ParquetDataset(filepath=/Users/Projects/Testing/kedrocatalog/data/02_intermediate/preprocessed_companies.parquet, load_args={}, protocol=file, save_args={}, version=Version(load=None, save='2025-05-13T12.22.11.290Z')) must not exist if versioning is enabled.
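
The save path in this message has the layout <filepath>/<save version>/<filename>, and the message says that path must not already exist. The sketch below reproduces that condition with the standard library only. It is a simplified illustration, not Kedro's implementation: SavePathExistsError, save_versioned and demo are hypothetical names, and the issue does not establish that the runners actually save twice.

```python
import tempfile
from pathlib import Path


class SavePathExistsError(Exception):
    """Hypothetical stand-in for Kedro's DatasetError."""


def save_versioned(filepath: Path, version: str, payload: bytes) -> Path:
    # Layout seen in the error message: <filepath>/<version>/<filename>
    target = filepath / version / filepath.name
    if target.exists():
        # Simplified version of the check described by the error message
        raise SavePathExistsError(
            f"Save path '{target}' must not exist if versioning is enabled."
        )
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        filepath = Path(tmp) / "preprocessed_companies.parquet"
        version = "2025-05-13T12.22.11.290Z"
        first = save_versioned(filepath, version, b"first write")
        try:
            # Second save with the same version string hits the existing path
            save_versioned(filepath, version, b"second write")
            second_failed = False
        except SavePathExistsError:
            second_failed = True
        return first.relative_to(tmp).as_posix(), second_failed
```

In this sketch the first save succeeds and the second one, reusing the same version string, fails. That is one way the reported message could arise if the same dataset were saved more than once within a run.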

With ParallelRunner and SharedMemoryDataCatalog the error changes to:

DatasetError: Data for MemoryDataset has not been saved yet.

Context

Found during #4699.

Steps to Reproduce

Set versioned=True for datasets in catalog.yml and run pipelines via the Python API:

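from kedro.runner import ThreadRunner

# `pipelines` and `catalog` are assumed to be available, e.g. in a Kedro IPython session
default = pipelines.get("__default__")
tr = ThreadRunner()
tr.run(pipeline=default, catalog=catalog)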
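
For reference, the catalog entry these steps rely on might look like the following, written as the Python-dict equivalent of a catalog.yml entry. The dataset name and relative filepath are inferred from the error message above, and the type string pandas.ParquetDataset is an assumption based on the ParquetDataset class named there. The issue does not show the actual catalog.yml.

```python
# Python-dict equivalent of a catalog.yml entry (illustrative values only).
catalog_config = {
    "preprocessed_companies": {
        # Assumed type; the error message only names ParquetDataset
        "type": "pandas.ParquetDataset",
        # Relative path inferred from the path in the error message
        "filepath": "data/02_intermediate/preprocessed_companies.parquet",
        # The setting this issue is about
        "versioned": True,
    }
}

# Names of the datasets that have versioning switched on
versioned_names = [
    name for name, entry in catalog_config.items() if entry.get("versioned")
]
```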

Metadata

Labels

Component: IO (Issue/PR addresses data loading/saving/versioning and validation, the DataCatalog and DataSets)

Component: Jupyter/IPython (Issue/PR relevant for Jupyter Notebooks, IPython sessions and the interactive workflow in Kedro)

Issue: Bug Report 🐞 (Bug that needs to be fixed)

Projects

Status

In Progress

Milestone

No milestone

