No experiment tracking for datasets defined with a dataset factory #1480

Closed
1 task done
pierre-godard opened this issue Aug 11, 2023 · 10 comments · Fixed by #1588

Comments

@pierre-godard

pierre-godard commented Aug 11, 2023

Description

Experiment tracking does not detect datasets when defined with a dataset factory.

Context

I've been using the recent dataset factory feature and it's been wonderful!

However, when I tried to visualize the tracked dataset with the Kedro-Viz experiment tracking feature, no tracked dataset was detected.

Steps to Reproduce

  1. Create a node that outputs a metrics dict under the name my_namespace.metrics
  2. Update your catalog with a dataset factory:

     "{namespace}.metrics":
       type: tracking.MetricsDataSet
       filepath: data/{namespace}.metrics.json

  3. Run your Kedro pipeline. The file should now be saved under data/my_namespace.metrics.json/<SESSION_ID>/my_namespace.metrics.json
  4. Run kedro viz and open the latest experiment, under the "Overview" tab
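
To make the factory entry in the steps above concrete, here is a minimal sketch of how a pattern such as "{namespace}.metrics" could be matched against a dataset name and used to fill in the filepath. It uses only the standard library `re` module as a stand-in, so Kedro's own matching logic may differ; `match_pattern` and `resolve_entry` are hypothetical helper names, not Kedro APIs.

```python
import re
from typing import Dict, Optional


def match_pattern(pattern: str, name: str) -> Optional[Dict[str, str]]:
    """Return the placeholder values if `name` matches `pattern`, else None."""
    # Splitting on "{placeholder}" with a capturing group yields literal text
    # at even indices and placeholder names at odd indices.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)       # literal text, escaped
        else:
            regex += f"(?P<{part}>.+)"     # placeholder becomes a named group
    match = re.fullmatch(regex, name)
    return match.groupdict() if match else None


def resolve_entry(
    pattern: str, entry: Dict[str, str], name: str
) -> Optional[Dict[str, str]]:
    """Fill the placeholders of a catalog entry for one concrete dataset name."""
    values = match_pattern(pattern, name)
    if values is None:
        return None
    return {key: value.format(**values) for key, value in entry.items()}


# Hypothetical in-memory copy of the catalog entry from the steps above.
entry = {
    "type": "tracking.MetricsDataSet",
    "filepath": "data/{namespace}.metrics.json",
}
resolved = resolve_entry("{namespace}.metrics", entry, "my_namespace.metrics")
print(resolved)
```

The point of the sketch is that the concrete dataset only exists once a name such as my_namespace.metrics is resolved against the pattern; nothing in the catalog file itself lists that name.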

Expected Result

The expected behavior would be for the my_namespace.metrics JSON content to appear on the UI.

Actual Result

There is no JSON content on the UI.

Your Environment

Include as many relevant details as possible about the environment you experienced the bug in:

  • Web browser system and version: Google Chrome - Version 115.0.5790.170 (Official Build) (64-bit)
  • Operating system and version: Ubuntu 22.04.3 LTS (64bits)
  • NodeJS version used (if relevant): -
  • Kedro version used: 0.18.12
  • Kedro Viz version used: 6.3.4
  • Python version used: 3.9.17

Checklist

  • Include labels so that we can categorise your issue
@pierre-godard
Author

pierre-godard commented Aug 11, 2023

I looked a bit more carefully at the code and could come up with a hook that seems to solve the problem.

It enforces the discovery of data sets defined in the registered pipelines:

import logging
from typing import Dict

from kedro.framework.hooks import hook_impl
from kedro.framework.project import pipelines
from kedro.io.core import DataSetNotFoundError
from kedro.io.data_catalog import DataCatalog
from kedro.pipeline import Pipeline

LOGGER = logging.getLogger(__name__)


class DataCatalogDiscoveryHooks:
    """
    Custom hooks for Kedro.
    """

    @hook_impl
    def after_catalog_created(self, catalog: DataCatalog) -> None:
        """
        Enforce the discovery of all the data sets in the project.
        """
        _pipelines: Dict[str, Pipeline] = dict(pipelines)

        LOGGER.info("Enforcing data set pattern discovery...")

        data_set_names = {data_set_name for pipeline in _pipelines.values() for data_set_name in pipeline.data_sets()}

        # Sort data sets by name, then by namespace to display similar data sets together in kedro viz
        sorted_data_set_names = sorted(data_set_names, key=lambda name: ".".join(reversed(name.split("."))))

        for data_set_name in sorted_data_set_names:
            try:
                # Enforce data set pattern discovery
                catalog._get_dataset(data_set_name)  # pylint: disable=protected-access
            except DataSetNotFoundError:
                continue

Would this be a suitable solution? If so, I can come up with a PR to add this logic in kedro_viz.server.populate_data, for example.
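
The sort key used in the hook above can be checked in isolation. This is a small sketch with made-up dataset names (`a_ns.metrics` and so on are assumptions for illustration only); it shows that reversing the dot-separated parts groups data sets by their final name first and by namespace second.

```python
def sort_key(name: str) -> str:
    """Same key as in the hook: reverse the dot-separated parts of the name."""
    return ".".join(reversed(name.split(".")))


# Hypothetical dataset names, two namespaces sharing the "metrics" name.
names = {"b_ns.metrics", "a_ns.params", "a_ns.metrics"}

# Keys are "metrics.b_ns", "params.a_ns" and "metrics.a_ns", so both
# "metrics" data sets sort next to each other, ahead of "params".
ordered = sorted(names, key=sort_key)
print(ordered)
```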

@tynandebold
Member

Hi @pierre-godard, thanks so much for the ticket and this investigation!

Please do open a PR with this solution you've outlined above and we can start taking a closer look to get it merged in and the problem fixed.

@pierre-godard
Author

Hi! Here is the PR: #1491

@tynandebold
Member

Amazing, thank you! We'll have a look soon.

@ravi-kumar-pilla
Contributor

ravi-kumar-pilla commented Oct 17, 2023

Hi @pierre-godard, thank you for the PR.

After looking at the PR and the way we access datasets, I feel the discovery should happen via the catalog object. We need to get all the available datasets (both factory-pattern and regular) via the DataCatalog object. Looking further into the issue, we get the list of dataset names via the DataCatalog object's list() method. I feel the list() method should include the dataset factory names along with the regular dataset names. This would help Viz get all the dataset names via the catalog.
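
One possible reading of this suggestion, sketched with a toy class rather than Kedro's actual DataCatalog: `ToyCatalog`, `list_with_factories`, and the dataset names below are all hypothetical and only illustrate the gap between explicitly declared names and names that exist solely through a factory pattern.

```python
import re
from typing import Iterable, List, Set


def pattern_to_regex(pattern: str) -> str:
    """Turn "{placeholder}" segments into ".+" and escape the literal text."""
    literal_parts = re.split(r"\{\w+\}", pattern)
    return ".+".join(re.escape(part) for part in literal_parts)


class ToyCatalog:
    """Toy stand-in for a catalog; NOT Kedro's DataCatalog."""

    def __init__(self, explicit: Set[str], patterns: Iterable[str]) -> None:
        self._explicit = set(explicit)
        self._patterns = tuple(patterns)

    def list(self) -> List[str]:
        # Models the behaviour reported in this issue: only explicit entries.
        return sorted(self._explicit)

    def list_with_factories(self, pipeline_names: Iterable[str]) -> List[str]:
        # Hypothetical variant: also include pipeline dataset names that
        # match one of the factory patterns.
        resolved = {
            name
            for name in pipeline_names
            if any(re.fullmatch(pattern_to_regex(p), name) for p in self._patterns)
        }
        return sorted(self._explicit | resolved)


catalog = ToyCatalog(explicit={"companies"}, patterns=["{namespace}.metrics"])
pipeline_names = {"companies", "my_namespace.metrics", "intermediate_table"}

print(catalog.list())
print(catalog.list_with_factories(pipeline_names))
```

In this toy model a consumer that relies only on `list()` never sees my_namespace.metrics, which matches the symptom described at the top of the issue.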

Happy to discuss further with the team. @ankatiyar @merelcht

Thank you

@ankatiyar
Contributor

@ravi-kumar-pilla I'll look into this!

@ankatiyar
Contributor

I've left some comments on the PR but the approach does seem like the most straightforward one. :)

@EloyID

EloyID commented Feb 14, 2024

Hello, I am using Kedro-Viz 7.10. I tried using experiment tracking with dataset factories and it does not work. If I write the following in the catalog, the data is generated but not shown in Kedro-Viz:

"{dataset_name}#metrics":
  type: tracking.MetricsDataset
  filepath: data/10_tracking/{dataset_name}.json

If I then add the following entry after it (without running the experiments again, just refreshing Kedro-Viz), I can see the results in experiment tracking:

"pca_target_regression.train_dataset_metrics":
  type: tracking.MetricsDataset
  filepath: data/10_tracking/pca_target_regression.train_dataset_metrics.json

Any idea of why this is happening?

@astrojuanlu
Member

Hi @EloyID, I think #1689 hasn't been solved yet, and it might be affecting you.

@ravi-kumar-pilla , is this still the parent issue of #1689 ?

@ravi-kumar-pilla
Contributor

@astrojuanlu yes. As mentioned in the ticket, we tried resolving this on the Viz side but ran into a few issues. As of now, experiment tracking does not support factory patterns. We will try to resolve this in future sprints. Thank you
