Make Viz compatible with `kedro-datasets` #1214

merelcht · 2023-01-12T13:03:25Z

Description

Development notes

I've made the matching of import paths more generic so that datasets from both kedro.extras and kedro-datasets are detected correctly.
Viz will try to import kedro-datasets; if it is not installed, it falls back to kedro.extras.datasets.
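
A minimal sketch of the fallback described above, not the code from this PR. The helper name `import_first_available` is hypothetical; it only assumes that the two packages are importable as `kedro_datasets` and `kedro.extras.datasets`.

```python
import importlib
from types import ModuleType
from typing import Optional, Sequence


def import_first_available(candidates: Sequence[str]) -> Optional[ModuleType]:
    """Return the first module in `candidates` that can be imported, else None.

    Hypothetical helper for illustration; not part of kedro-viz.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed (ModuleNotFoundError is a subclass): try the next one.
            continue
    return None


# Prefer the standalone package, fall back to the copy bundled with kedro.
# `datasets` is None when neither is installed.
datasets = import_first_available(["kedro_datasets", "kedro.extras.datasets"])
```

Listing the candidates in priority order keeps the preference for kedro-datasets explicit and makes the behaviour easy to test without either package installed.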

QA notes

I've manually tested this by spinning up the development server as described here: https://github.com/kedro-org/kedro-viz/blob/main/CONTRIBUTING.md#launch-a-development-server-with-a-real-kedro-project

Verified that everything works with kedro==0.18.3 and also with kedro==0.18.4 + kedro-datasets==1.0.1.

Checklist

Read the contributing guidelines
Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Added new entries to the RELEASE.md file
Added tests to cover my changes

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>

rashidakanchwala

LGTM thanks @merelcht

tynandebold · 2023-01-16T09:50:42Z

Don't forget to add a line to the release notes please :)

antonymilne

Looks good, thanks for the fix! Just a few minor suggestions to make things a bit clearer.

Definitely would be nice to get @jmholzer to review also before merging in case he has any other comments since he's also been thinking about this.

antonymilne · 2023-01-16T10:51:03Z

demo-project/conf/base/catalog_08_reporting.yml

@@ -25,7 +25,7 @@ reporting.feature_importance:
  versioned: true

 reporting.cancellation_policy_grid:
-  type: demo_project.extras.datasets.image_dataset.ImageDataSet
+  type: image_dataset.ImageDataSet


For some reason (I think just as an example of how to do it) Joel wanted to define this dataset inside the project itself and not use the kedro built-in one. So we should leave this as it was.

antonymilne · 2023-01-16T11:09:02Z

package/kedro_viz/models/experiment_tracking.py

@@ -111,3 +114,9 @@ def load_tracking_data(self, run_id: str):

 def get_dataset_type(dataset: AbstractVersionedDataSet) -> str:


Suggested change

-def get_dataset_type(dataset: AbstractVersionedDataSet) -> str:
+def get_full_dataset_type(dataset: AbstractVersionedDataSet) -> str:
+    """e.g. kedro.extras.datasets.plotly.plotly_dataset.PlotlyDataSet or kedro_datasets.plotly.plotly_dataset.PlotlyDataSet"""

Maybe? Sort of depends how dataset_type is used elsewhere as to whether it's worth doing.

We no longer have this, as get_dataset_module_class is the new get_dataset_type, e.g. pandas.csv_dataset.CSVDataSet.

antonymilne · 2023-01-16T11:10:01Z

package/kedro_viz/models/experiment_tracking.py

@@ -67,14 +66,15 @@ class TrackingDatasetModel:

    dataset_name: str
    # dataset is the actual dataset instance, whereas dataset_type is a string.
-    # e.g. "kedro.extras.datasets.tracking.metrics_dataset.MetricsDataSet"
+    # e.g. "kedro_datasets.tracking.metrics_dataset.MetricsDataSet"
    dataset: AbstractVersionedDataSet
    dataset_type: str = field(init=False)
    # runs is a mapping from run_id to loaded data.
    runs: Dict[str, Any] = field(init=False, default_factory=dict)

    def __post_init__(self):
        self.dataset_type = get_dataset_type(self.dataset)


Is this variable actually still used anywhere now in experiment tracking?

Yes, I tried removing this, which caused errors. I think it's used for the schema in GraphQL.

I suspect what's happening here is we're passing the full self.dataset_type to the frontend, which is then just being turned back into the abbreviated form - see https://github.com/kedro-org/kedro-viz/pull/1214/files#r1071164866.

The full dataset type is used in the frontend in the metadata panel of the flowchart view, but not in experiment tracking (I think). So there's a good simplification we could make here if all we really need in experiment tracking is the full dataset type.

Feel free to leave as it is for now, but should be an easy tidy up for someone to do in a follow up PR if I'm right.
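
For readers following this thread, here is a self-contained sketch of the pattern in the hunk above: a `dataset_type` field excluded from `__init__` and filled in by `__post_init__`. The body of `get_dataset_type` is an assumption (the thread does not show it), and `MetricsDataSet` here is a stand-in class, not the real Kedro one.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


def get_dataset_type(dataset: Any) -> str:
    """Assumed implementation: full import path of the dataset's class."""
    cls = dataset.__class__
    return f"{cls.__module__}.{cls.__qualname__}"


@dataclass
class TrackingDatasetModel:
    dataset_name: str
    # dataset is the actual dataset instance, whereas dataset_type is a string.
    dataset: Any
    dataset_type: str = field(init=False)
    # runs is a mapping from run_id to loaded data.
    runs: Dict[str, Any] = field(init=False, default_factory=dict)

    def __post_init__(self):
        # Derived once at construction time from the dataset instance.
        self.dataset_type = get_dataset_type(self.dataset)


# Stand-in class whose __module__ mimics the kedro-datasets layout.
MetricsDataSet = type(
    "MetricsDataSet", (), {"__module__": "kedro_datasets.tracking.metrics_dataset"}
)

model = TrackingDatasetModel(dataset_name="metrics", dataset=MetricsDataSet())
print(model.dataset_type)  # kedro_datasets.tracking.metrics_dataset.MetricsDataSet
```

Because `dataset_type` is derived purely from `dataset`, swapping the helper for one that returns the abbreviated form would change what is stored without touching the constructor signature.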

antonymilne · 2023-01-16T11:15:42Z

package/kedro_viz/models/experiment_tracking.py

+def get_dataset_module_class(dataset: AbstractVersionedDataSet) -> str:
+    class_name = f"{dataset.__class__.__qualname__}"
+    _, dataset_type, dataset_file = f"{dataset.__class__.__module__}".rsplit(".", 2)
+    return f"{dataset_type}.{dataset_file}.{class_name}"


The naming is kind of confusing given we already call the "full path" dataset_type (which is probably already confusing, but it's also in the flowchart part of kedro-viz source, so probably let's leave it that way for now).

Suggested change

-def get_dataset_module_class(dataset: AbstractVersionedDataSet) -> str:
-    class_name = f"{dataset.__class__.__qualname__}"
-    _, dataset_type, dataset_file = f"{dataset.__class__.__module__}".rsplit(".", 2)
-    return f"{dataset_type}.{dataset_file}.{class_name}"
+def get_abbreviated_dataset_type(dataset: AbstractVersionedDataSet) -> str:
+    """e.g. plotly.plotly_dataset.PlotlyDataSet"""
+    abbreviated_module_name = ".".join(dataset.__class__.__module__.split(".")[-2:])
+    return f"{abbreviated_module_name}.{dataset.__class__.__qualname__}"
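
To illustrate the suggestion, the sketch below runs the proposed function against stand-in classes rather than real Kedro datasets. The `make_fake_dataset` helper and the single-segment module path `image_dataset` are hypothetical, used only to exercise the string handling.

```python
def get_abbreviated_dataset_type(dataset: object) -> str:
    """e.g. plotly.plotly_dataset.PlotlyDataSet"""
    cls = dataset.__class__
    # Keep only the last two segments of the module path.
    abbreviated_module_name = ".".join(cls.__module__.split(".")[-2:])
    return f"{abbreviated_module_name}.{cls.__qualname__}"


def make_fake_dataset(module: str, class_name: str) -> object:
    """Hypothetical helper: instance of a stand-in class living in `module`."""
    return type(class_name, (), {"__module__": module})()


from_extras = make_fake_dataset(
    "kedro.extras.datasets.plotly.plotly_dataset", "PlotlyDataSet"
)
from_package = make_fake_dataset(
    "kedro_datasets.plotly.plotly_dataset", "PlotlyDataSet"
)
project_local = make_fake_dataset("image_dataset", "ImageDataSet")

print(get_abbreviated_dataset_type(from_extras))    # plotly.plotly_dataset.PlotlyDataSet
print(get_abbreviated_dataset_type(from_package))   # plotly.plotly_dataset.PlotlyDataSet
print(get_abbreviated_dataset_type(project_local))  # image_dataset.ImageDataSet
```

One difference worth noting: unpacking `rsplit(".", 2)` into three names raises `ValueError` when the module path has fewer than three segments, whereas slicing with `[-2:]` simply returns whatever segments exist.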

antonymilne · 2023-01-16T11:17:09Z

package/kedro_viz/models/flowchart.py

@@ -466,37 +470,39 @@ def __post_init__(self):
            self._get_namespace(self.full_name)
        )

+    @staticmethod


I'd refactor this as per suggestions in experiment_tracking.py.

jmholzer · 2023-01-16T11:30:26Z

Definitely would be nice to get @jmholzer to review also before merging in case he has any other comments since he's also been thinking about this.

I've read through the code and I definitely think the approach is the right one. I'll add a review today 🙂.

antonymilne · 2023-01-16T12:03:38Z

src/utils/short-type.js

-const getShortType = (longTypeName, fallback) =>
-  shortTypeMapping[longTypeName] || fallback;
+const getShortType = (name, fallback) => {
+  const longTypeName = name?.split('.').slice(-3).join('.');


Isn't this basically just doing what your new get_abbreviated_dataset_type does? Why don't we just pass the abbreviated dataset type to the frontend instead, now that we have it in the backend?

Makes sense, because we don't show the full name anywhere in the frontend.

Makes sense. I have removed the original get_dataset_type function and renamed get_dataset_module_class to get_dataset_type, so we now pass around the shortened dataset name, which excludes 'kedro.extras.datasets', everywhere.
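
For comparison, a Python rendering of what the frontend hunk above appears to do: keep the last three dot-separated segments, then look the result up with a fallback. This is a sketch under assumptions. The mapping entry is hypothetical (the contents of shortTypeMapping are not shown in this thread), and the lookup step is inferred from the removed lines because the added lines are truncated.

```python
from typing import Mapping, Optional

# Hypothetical entry; the real mapping lives in src/utils/short-type.js.
SHORT_TYPE_MAPPING = {"plotly.plotly_dataset.PlotlyDataSet": "plotly"}


def get_short_type(
    name: Optional[str],
    fallback: str,
    mapping: Mapping[str, str] = SHORT_TYPE_MAPPING,
) -> str:
    """Mirror of getShortType: abbreviate `name`, then map it or fall back."""
    if name is None:
        # Equivalent of the optional chaining on `name` in the JS version.
        return fallback
    long_type_name = ".".join(name.split(".")[-3:])
    return mapping.get(long_type_name) or fallback


print(get_short_type("kedro.extras.datasets.plotly.plotly_dataset.PlotlyDataSet", "data"))
print(get_short_type("kedro_datasets.plotly.plotly_dataset.PlotlyDataSet", "data"))
print(get_short_type(None, "data"))
```

Both package layouts collapse to the same three-segment key, which is why doing the abbreviation once in the backend makes the frontend slicing redundant.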

jmholzer · 2023-01-16T18:28:44Z

package/tests/test_api/test_graphql/test_queries.py

+    )
+
+
+def _get_dataset_type(dataset):


It would be good to have a comment here about what this function does, and how and why it is different from the function with the same name in package/kedro_viz/models/flowchart.py, as this may get confusing.

Actually, these two functions are not required in the test files, so I have removed them (test_queries and test_responses). To assert the correct result, I am going to compare against the string, e.g. "pandas.csv_dataset.CSVDataSet", just the way it was done before.

jmholzer · 2023-01-16T18:31:31Z

package/tests/test_api/test_rest/test_reponses.py

+    )
+
+
+def _get_dataset_type(dataset):


This function has been declared twice with the same body (here and in test_queries). I am fine with the DAMP approach, but ditto my comment there.

merelcht added 3 commits January 11, 2023 17:45

Make kedro viz compatible with kedro-datasets

d0bfeb4

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>

Fix linting

06aca3b

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>

Fix lint

1520eba

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>

merelcht self-assigned this Jan 12, 2023

merelcht requested a review from rashidakanchwala January 12, 2023 13:03

merelcht and others added 4 commits January 13, 2023 10:08

Make implementation consistent across flowchart and ET

3d6b8e0

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>

fe changes

4a50f59

fix tests to work with kedro-datasets

c62b973

fix tests

395ff70

rashidakanchwala assigned rashidakanchwala and merelcht and unassigned merelcht Jan 13, 2023

rashidakanchwala requested a review from antonymilne January 13, 2023 15:49

fix lint

5f12780

rashidakanchwala marked this pull request as ready for review January 13, 2023 15:52

rashidakanchwala requested a review from limdauto as a code owner January 13, 2023 15:52

merelcht requested a review from jmholzer January 13, 2023 15:53

rashidakanchwala removed request for limdauto and jmholzer January 13, 2023 15:53

merelcht requested a review from jmholzer January 13, 2023 15:53

rashidakanchwala approved these changes Jan 16, 2023

update release.md

214b084

rashidakanchwala requested a review from yetudada as a code owner January 16, 2023 10:02

rashidakanchwala removed the request for review from yetudada January 16, 2023 10:02

tynandebold mentioned this pull request Jan 16, 2023

Improve Kedro-Viz docs for Experiment Tracking kedro-org/kedro#2193

Merged

8 tasks

antonymilne approved these changes Jan 16, 2023

antonymilne reviewed Jan 16, 2023

Update package/kedro_viz/models/experiment_tracking.py

34ca924

Co-authored-by: Antony Milne <49395058+AntonyMilneQB@users.noreply.github.com>

rashidakanchwala and others added 6 commits January 16, 2023 16:01

replaced dataset_module_class with dataset_type and remove the origin…

bd6a529

…al dataset_type

fix tests

cad2a26

fix lint

b2d6811

Merge branch 'main' into fix/make-compatible-with-kedro-datasets

577dedd

small fix on fe

7107404

Merge branch 'fix/make-compatible-with-kedro-datasets' of https://git…

5ff07cd

…hub.com/kedro-org/kedro-viz into fix/make-compatible-with-kedro-datasets

jmholzer approved these changes Jan 16, 2023

rashidakanchwala and others added 3 commits January 16, 2023 21:20

fix tests

8748b49

fix lint

65f1ac9

Merge branch 'main' into fix/make-compatible-with-kedro-datasets

22302bb

rashidakanchwala merged commit 5d6ce10 into main Jan 17, 2023

rashidakanchwala deleted the fix/make-compatible-with-kedro-datasets branch January 17, 2023 09:21

tynandebold mentioned this pull request Jan 18, 2023

Release v5.2.0 #1223

Merged

5 tasks

tynandebold mentioned this pull request Jan 27, 2023

Handle different dataset types in a better way #963

Closed
