
Add new catalog CLI commands for dataset factories #2603

Closed
merelcht opened this issue May 22, 2023 · 2 comments
Labels
Issue: Feature Request New feature or improvement to existing feature Type: Parent Issue


@merelcht
Member

Description

Part of #2423

We should decide what new catalog CLI commands to add to support the dataset factories syntax:

  • a command that lists all patterns in priority order, so users can see which pattern is tried first when a dataset name is matched
  • a resolve command that shows the fully resolved catalog once all patterns have been matched (the patterns themselves are omitted for readability)
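A ranking command needs a well-defined priority order. The sketch below assumes one plausible ordering (more literal characters outside `{...}` first, then more placeholders, then alphabetical as a tie-breaker); the issue does not spell out the rules, so treat this as an assumption rather than Kedro's confirmed behaviour. `specificity` and `rank_patterns` are hypothetical helpers, not Kedro API.

```python
from __future__ import annotations

import re


def specificity(pattern: str) -> int:
    """Number of characters outside {...}, i.e. those that must match literally."""
    return len(re.sub(r"\{.*?\}", "", pattern))


def rank_patterns(patterns: list[str]) -> list[str]:
    """Return patterns with the highest matching priority first.

    Assumed ordering: more literal characters first, then more
    placeholders, then alphabetical order as a deterministic tie-breaker.
    """
    return sorted(patterns, key=lambda p: (-specificity(p), -p.count("{"), p))


# Hypothetical catalog patterns, for illustration only.
catalog_patterns = [
    "{default_dataset}",
    "{namespace}.{dataset}",
    "{country}_companies",
    "{country}_companies_raw",
]
for rank, pattern in enumerate(rank_patterns(catalog_patterns), start=1):
    print(f"{rank}. {pattern}")
```

With these inputs the catch-all `{default_dataset}` (no literal characters) ends up last and `{country}_companies_raw` first, which is the kind of listing the ranking command would show.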

Context

These CLI commands are mainly meant to make it clear to users how dataset factories work and how their patterns match catalog entries.
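What a resolve command would display can be sketched as: match each dataset name against the ranked patterns, then substitute the captured placeholder values into the matching entry. This is a minimal sketch under stated assumptions: matching is done with a hand-built regex (Kedro's own matching may use a different parser), placeholder names are identifiers that appear once per pattern, explicit catalog entries win over patterns, and `resolve` / `resolve_catalog` are hypothetical names.

```python
from __future__ import annotations

import re
from typing import Any


def _specificity(pattern: str) -> int:
    # Characters outside {...} must match literally.
    return len(re.sub(r"\{.*?\}", "", pattern))


def _to_regex(pattern: str) -> re.Pattern[str]:
    # re.split with a capturing group alternates literal text and placeholder
    # names, e.g. "{country}_companies" -> ["", "country", "_companies"].
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    return re.compile(regex)


def _fill(value: Any, fields: dict[str, str]) -> Any:
    # Substitute captured placeholder values into every string of the entry.
    if isinstance(value, str):
        return value.format(**fields)
    if isinstance(value, dict):
        return {key: _fill(item, fields) for key, item in value.items()}
    if isinstance(value, list):
        return [_fill(item, fields) for item in value]
    return value


def resolve(name: str, patterns: dict[str, dict[str, Any]]) -> dict[str, Any] | None:
    """Entry of the highest-priority pattern that fully matches `name`, else None."""
    ranked = sorted(patterns, key=lambda p: (-_specificity(p), -p.count("{"), p))
    for pattern in ranked:
        match = _to_regex(pattern).fullmatch(name)
        if match:
            return _fill(patterns[pattern], match.groupdict())
    return None


def resolve_catalog(
    names: list[str],
    explicit: dict[str, dict[str, Any]],
    patterns: dict[str, dict[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Concrete entries only: the pattern keys themselves never appear."""
    resolved: dict[str, dict[str, Any]] = {}
    for name in names:
        if name in explicit:  # assumption: explicit entries take precedence
            resolved[name] = explicit[name]
        else:
            config = resolve(name, patterns)
            if config is not None:
                resolved[name] = config
    return resolved


# Hypothetical patterns, for illustration only.
patterns = {
    "{country}_companies": {
        "type": "pandas.CSVDataSet",
        "filepath": "data/01_raw/{country}_companies.csv",
    },
    "{default_dataset}": {"type": "MemoryDataset"},
}
print(resolve_catalog(["france_companies", "model_input"], {}, patterns))
```

Here `france_companies` resolves to a CSV entry with its filepath filled in, `model_input` falls through to the catch-all, and neither pattern key appears in the output, matching the "excludes the actual patterns" requirement.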

@merelcht merelcht added the Issue: Feature Request New feature or improvement to existing feature label May 22, 2023
@merelcht merelcht changed the title Decide what new catalog CLI command to add for dataset factories Add new catalog CLI commands for dataset factories May 23, 2023
@ankatiyar
Contributor

ankatiyar commented Jun 12, 2023

  • Update `kedro catalog list`
    • Should list datasets matched by a pattern under their actual dataset type instead of DefaultDataSet

CURRENT

Datasets in '__default__' pipeline:
  Datasets mentioned in pipeline:
    DefaultDataset:
    - germany_companies
    - france_companies
    - switzerland_companies

EXPECTED

Datasets in '__default__' pipeline:
  Datasets mentioned in pipeline:
    CSVDataSet:
    - switzerland_companies
    - germany_companies
    - france_companies

  • Update `kedro catalog create --pipeline <pipeline_name>`
    • Creates `conf/base/catalog/<pipeline_name>.yml`

CURRENT

france_companies:
  type: MemoryDataset
germany_companies:
  type: MemoryDataset
switzerland_companies:
  type: MemoryDataset

EXPECTED

france_companies:
  filepath: data/01_raw/france_companies.csv
  type: pandas.CSVDataSet
germany_companies:
  filepath: data/01_raw/germany_companies.csv
  type: pandas.CSVDataSet
switzerland_companies:
  filepath: data/01_raw/switzerland_companies.csv
  type: pandas.CSVDataSet

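Both updates above come down to resolving each pipeline dataset name against the catalog patterns before reporting it. A minimal sketch, assuming a single hypothetical pattern `{country}_companies` (inferred from the EXPECTED output, not stated in the issue) and flat, string-valued entries; `group_by_type` and `render_catalog` are illustrative names, not Kedro functions. The `DefaultDataset` and `MemoryDataset` fallbacks mirror the CURRENT outputs shown above.

```python
from __future__ import annotations

import re
from collections import defaultdict


def _specificity(pattern: str) -> int:
    return len(re.sub(r"\{.*?\}", "", pattern))


def _to_regex(pattern: str) -> re.Pattern[str]:
    # Alternate escaped literal text with one named group per placeholder.
    parts = re.split(r"\{(\w+)\}", pattern)
    return re.compile("".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    ))


def match_pattern(name: str, patterns: dict[str, dict[str, str]]) -> dict[str, str] | None:
    """Filled-in entry of the highest-priority matching pattern, else None."""
    ranked = sorted(patterns, key=lambda p: (-_specificity(p), -p.count("{"), p))
    for pattern in ranked:
        match = _to_regex(pattern).fullmatch(name)
        if match:
            return {k: v.format(**match.groupdict()) for k, v in patterns[pattern].items()}
    return None


def group_by_type(names: list[str], patterns: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """`catalog list` sketch: bucket datasets by resolved class name."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        config = match_pattern(name, patterns)
        # "pandas.CSVDataSet" -> "CSVDataSet"; unmatched names keep the fallback label.
        label = config["type"].rsplit(".", 1)[-1] if config else "DefaultDataset"
        groups[label].append(name)
    return dict(groups)


def render_catalog(names: list[str], patterns: dict[str, dict[str, str]]) -> str:
    """`catalog create` sketch: YAML entries from matched patterns, MemoryDataset otherwise."""
    lines: list[str] = []
    for name in sorted(names):
        config = match_pattern(name, patterns) or {"type": "MemoryDataset"}
        lines.append(f"{name}:")
        lines.extend(f"  {key}: {value}" for key, value in sorted(config.items()))
    return "\n".join(lines) + "\n"


patterns = {
    "{country}_companies": {
        "type": "pandas.CSVDataSet",
        "filepath": "data/01_raw/{country}_companies.csv",
    },
}
names = ["germany_companies", "france_companies", "switzerland_companies"]
print(group_by_type(names, patterns))
print(render_catalog(names, patterns), end="")
```

Emitting YAML by hand keeps the sketch dependency-free but only handles flat string values; a real implementation would more likely serialise the resolved entries with a YAML library.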
@merelcht
Member Author

Projects
Archived in project