Add Dataset Factory Patterns to Experiment Tracking #1824

Merged
merged 6 commits into from
Apr 3, 2024


ravi-kumar-pilla
Contributor

Description

Resolves #1689

Development notes

  • Discover datasets defined via dataset factory patterns by iterating over the datasets used in the pipelines, before adding the catalog and tracking datasets to Kedro-Viz
  • Use catalog._get_dataset(dataset_name, suggest=False) to resolve the patterns before setting the catalog
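
The discovery step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: StandInCatalog is a hypothetical stand-in for Kedro's DataCatalog so the example runs without Kedro installed, its pattern matching is a simplified regex version of factory-pattern resolution, and discover_factory_datasets is an assumed helper name. Only the call catalog._get_dataset(dataset_name, suggest=False) comes from the notes above.

```python
import re
from typing import Dict, Iterable, Set


class StandInCatalog:
    """Hypothetical stand-in for Kedro's DataCatalog (illustration only)."""

    def __init__(self, patterns: Dict[str, dict]):
        self._patterns = patterns              # e.g. {"{name}_tracking": {...}}
        self._datasets: Dict[str, dict] = {}   # datasets resolved so far

    def _get_dataset(self, dataset_name: str, suggest: bool = True) -> dict:
        if dataset_name in self._datasets:
            return self._datasets[dataset_name]
        for pattern, config in self._patterns.items():
            # Simplified matching: turn "{name}_tracking" into a regex with
            # one named group per placeholder.
            regex = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>.+)", re.escape(pattern))
            if re.fullmatch(regex, dataset_name):
                # Resolving a pattern registers the concrete dataset.
                self._datasets[dataset_name] = dict(config)
                return self._datasets[dataset_name]
        # The stand-in raises KeyError when nothing matches.
        raise KeyError(dataset_name)


def discover_factory_datasets(catalog, dataset_names: Iterable[str]) -> Set[str]:
    """Resolve every pipeline dataset name against the catalog's patterns."""
    resolved = set()
    for name in dataset_names:
        try:
            catalog._get_dataset(name, suggest=False)
        except Exception:
            # Names that match no entry or pattern are skipped.
            continue
        resolved.add(name)
    return resolved


# Assumed example: one factory pattern and two dataset names used in a pipeline.
catalog = StandInCatalog({"{name}_tracking": {"type": "tracking.MetricsDataset"}})
tracked = discover_factory_datasets(catalog, ["model_tracking", "intermediate"])
print(sorted(tracked))  # ['model_tracking']
```

After the loop, the pattern-backed dataset is registered in the stand-in catalog, while the unmatched name is left alone.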

NOTE:

Earlier, we called catalog.exists() on the datasets for discovery, which caused the following issues:

  1. Users could not use Kedro-Viz if a dataset defined in the catalog did not exist (this was resolved using exception handlers)
  2. Users hit timeouts when the datasets resided remotely and in a distributed cluster
  3. exists() calls _get_dataset internally and then also calls AbstractDataset.exists. We are removing this additional call
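
To make the third point concrete, here is a small sketch of the difference between the two calls. RemoteDataset and StandInCatalog are hypothetical stand-ins written for this example, not Kedro classes; the counter simply stands in for a slow round trip to remote storage.

```python
class RemoteDataset:
    """Hypothetical dataset whose existence check would hit remote storage."""

    remote_calls = 0

    def exists(self) -> bool:
        # Stands in for a slow network round trip.
        RemoteDataset.remote_calls += 1
        return True


class StandInCatalog:
    """Hypothetical catalog mirroring the call chain described above."""

    def __init__(self, datasets):
        self._datasets = datasets

    def _get_dataset(self, dataset_name, suggest=True):
        # Lookup only; the dataset's storage is never touched.
        return self._datasets[dataset_name]

    def exists(self, name):
        try:
            dataset = self._get_dataset(name)
        except KeyError:
            return False
        # The additional call that reaches out to storage.
        return dataset.exists()


catalog = StandInCatalog({"metrics": RemoteDataset()})

catalog.exists("metrics")
calls_after_exists = RemoteDataset.remote_calls   # one simulated remote check

catalog._get_dataset("metrics", suggest=False)
calls_after_get = RemoteDataset.remote_calls      # unchanged by the lookup
```

In this sketch, discovery through _get_dataset alone never triggers the simulated remote check, which is the behaviour the change relies on.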

Example patterns -

[Screenshot: example dataset factory patterns]

QA notes

  • Steps to Reproduce
  • You can also follow the dataset factory patterns guide and modify the demo-project or create a spaceflights project
  • After executing kedro run, you should see the factory pattern datasets in the experiment tracking run

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
@ravi-kumar-pilla
Contributor Author

CircleCI Build fix - #1819

Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
@ravi-kumar-pilla
Contributor Author

Hi @iamelijahko,

I know you are working on refactoring the catalog API. I wanted to bring this to your attention in case it helps:

At the moment we are using a private method of DataCatalog to resolve dataset factory patterns in one of our use cases:
catalog._get_dataset(dataset_name, suggest=False)

Thank you

Contributor

@ankatiyar ankatiyar left a comment


LGTM!

@ankatiyar ankatiyar requested a review from noklam April 2, 2024 15:34
@ravi-kumar-pilla ravi-kumar-pilla merged commit 14039c1 into main Apr 3, 2024
14 checks passed
@ravi-kumar-pilla ravi-kumar-pilla deleted the fix/df-patterns branch April 3, 2024 15:34
@jitu5 jitu5 mentioned this pull request Apr 16, 2024
5 tasks