Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dataset factory patterns in Experiment Tracking #1588

Merged
merged 12 commits into from Oct 27, 2023

Conversation

ravi-kumar-pilla
Copy link
Contributor

@ravi-kumar-pilla ravi-kumar-pilla commented Oct 18, 2023

Description

Resolves #1480

Development notes

Example patterns -

Screenshot 2023-10-18 at 1 11 46 PM

QA notes

  • Steps to Reproduce
  • You can also follow the dataset factory patterns guide and modify the demo-project or create a spaceflights project
  • After executing kedro run, you should see the factory pattern datasets in the experiment tracking run

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
@ravi-kumar-pilla ravi-kumar-pilla changed the title Fix issue of dataset factory discovery in Kedro Viz Experiment Tracking Fix for dataset factory discovery in Kedro Viz Experiment Tracking Oct 18, 2023
@ravi-kumar-pilla ravi-kumar-pilla changed the title Fix for dataset factory discovery in Kedro Viz Experiment Tracking Fix dataset factory discovery in Kedro Viz Experiment Tracking Oct 18, 2023
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
"""
for pipeline in pipelines.values():
# Temporary try/except block so the Kedro develop branch can work with Viz.
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might catch other exceptions. You could try if hasattr(pipeline, 'data_set):`

@noklam
Copy link
Contributor

noklam commented Oct 19, 2023

Is this ready to be reviewed or still in draft?

@ravi-kumar-pilla
Copy link
Contributor Author

Is this ready to be reviewed or still in draft?

I reverted it to draft PR in favor of 1491 . Apologies for the confusion

…x/et-dataset_factory

Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
…x/et-dataset_factory

Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
@ravi-kumar-pilla ravi-kumar-pilla changed the title Fix dataset factory discovery in Kedro Viz Experiment Tracking Fix dataset factory patterns in Experiment Tracking Oct 25, 2023
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
@ravi-kumar-pilla ravi-kumar-pilla marked this pull request as ready for review October 25, 2023 20:29
@ravi-kumar-pilla
Copy link
Contributor Author

Is this ready to be reviewed or still in draft?

Hi @noklam, this is now ready for review

Copy link
Contributor

@ankatiyar ankatiyar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
…/kedro-viz into fix/et-dataset_factory

Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
@ravi-kumar-pilla ravi-kumar-pilla merged commit dca4581 into main Oct 27, 2023
20 checks passed
@ravi-kumar-pilla ravi-kumar-pilla deleted the fix/et-dataset_factory branch October 27, 2023 17:39
@noklam
Copy link
Contributor

noklam commented Oct 30, 2023

For reference, @rashidakanchwala shared an alternative solution here. @ravi-kumar-pilla What was the reason that this wasn't accepted, is it because of some edge case?

I guess I don't understand why tracking dataset is treated in a special way, is this just for experiment tracking? Viz should have traversed the graph once already when creating the DAG, so it feels unnecessary to traverse again to resolve the name.

@ravi-kumar-pilla
Copy link
Contributor Author

Hi @noklam , Thanks for mentioning it here. @rashidakanchwala and I had a discussion on this. So, based on what I understood from this PR -

  1. Separation of responsibility - TrackingDatasetRepository was introduced and is populated during the add_catalog phase which is primarily used in experiment tracking runs. (can also be used in pipelines)
  2. Experiment Tracking should not depend on creation of the DAG, rather it is a historical data of kedro runs. Ideally, ET data should not depend on even the catalog information as it is a tracker of historic data and the catalog can change between runs. But, as of now ET does not store data/path to the datasets. This makes it depend on the catalog information.
  3. This PR has the tracking datasets being created during the add_dataset function (i.e., during the creation of node repos for DAG). This makes ET data to be dependant more on the creation of pipelines, which will not serve the purpose of separation of responsibility.

Let me know what do you think of this.

Thank you

@noklam
Copy link
Contributor

noklam commented Oct 30, 2023

Experiment Tracking should not depend on creation of the DAG, rather it is a historical data of kedro runs. Ideally, ET data should not depend on even the catalog information as it is a tracker of historic data and the catalog can change between runs. But, as of now ET does not store data/path to the datasets. This makes it depend on the catalog information.

I think this is a good point, thanks for clarifying.

@rashidakanchwala rashidakanchwala mentioned this pull request Nov 16, 2023
5 tasks
rashidakanchwala added a commit that referenced this pull request Nov 17, 2023
This is minor release with big backend refactoring work and some bug fixes.

Bug fixes and other changes
Refactor flowchart dataclasses to pydantic base models. (Refactor Flowchart models from dataclass to pydantic base models #1565)
Fix dataset factory patterns in Experiment Tracking. (Fix dataset factory patterns in Experiment Tracking #1588)
Update demo-project to use OmegaConfigLoader. (Update demo project to use OmegaConfigLoader #1590)
Improve feedback for copy to clipboard feature. (Add tooltip to shareable urls copy button #1614)
Ensure Kedro-Viz works when hosted on a URL subpath. (Fix: Kedro-Viz doesn't work when hosted via a URL subpath #1621)
Bump fastapi upper bounds. (Bump FAST API upper bounds #1634)
Fix shareable URL modal to appear across the app. (Make shareable URL modal open globally across the app.  #1639)
Add Kedro-Viz CLI command deprecation warning. (Add kedro viz deprecation warning for CLI #1641)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No experiment tracking for datasets defined with a dataset factory
3 participants