New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Viz compatible with kedro-datasets
#1214
Make Viz compatible with kedro-datasets
#1214
Conversation
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks @merelcht
Don't forget to add a line to the release notes please :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for the fix! Just a few minor suggestions to make things a bit clearer.
Definitely would be nice to get @jmholzer to review also before merging in case he has any other comments since he's also been thinking about this.
@@ -25,7 +25,7 @@ reporting.feature_importance: | |||
versioned: true | |||
|
|||
reporting.cancellation_policy_grid: | |||
type: demo_project.extras.datasets.image_dataset.ImageDataSet | |||
type: image_dataset.ImageDataSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason (I think just as an example of how to do it) Joel wanted to define this dataset inside the project itself and not use the kedro built-in one. So we should leave this as it was.
@@ -111,3 +114,9 @@ def load_tracking_data(self, run_id: str): | |||
|
|||
def get_dataset_type(dataset: AbstractVersionedDataSet) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_dataset_type(dataset: AbstractVersionedDataSet) -> str: | |
def get_full_dataset_type(dataset: AbstractVersionedDataSet) -> str: | |
"""e.g. kedro.extras.datasets.plotly.plotly_dataset.PlotlyDataSet or kedro_datasets.plotly.plotly_dataset.PlotlyDataSet""" |
Maybe? Sort of depends how dataset_type
is used elsewhere as to whether it's worth doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we no longer have this. as get_dataset_module_class is the new 'get_dataset_type' e.g. pandas.csv_dataset.CSVDataset
@@ -67,14 +66,15 @@ class TrackingDatasetModel: | |||
|
|||
dataset_name: str | |||
# dataset is the actual dataset instance, whereas dataset_type is a string. | |||
# e.g. "kedro.extras.datasets.tracking.metrics_dataset.MetricsDataSet" | |||
# e.g. "kedro_datasets.tracking.metrics_dataset.MetricsDataSet" | |||
dataset: AbstractVersionedDataSet | |||
dataset_type: str = field(init=False) | |||
# runs is a mapping from run_id to loaded data. | |||
runs: Dict[str, Any] = field(init=False, default_factory=dict) | |||
|
|||
def __post_init__(self): | |||
self.dataset_type = get_dataset_type(self.dataset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this variable actually still used anywhere now in experiment tracking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I tried removing this which caused errors. I think it's used for the schema in graphql.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect what's happening here is we're passing the full self.dataset_type
to the frontend, which is then just being turned back into the abbreviated form - see https://github.com/kedro-org/kedro-viz/pull/1214/files#r1071164866.
The full dataset type is used in the frontend in metadata panel in the flowchart view but not in experiment tracking (I think). So there's a good simplification we could make here if all we really need in experiment tracking is the full dataset type.
Feel free to leave as it is for now, but should be an easy tidy up for someone to do in a follow up PR if I'm right.
def get_dataset_module_class(dataset: AbstractVersionedDataSet) -> str: | ||
class_name = f"{dataset.__class__.__qualname__}" | ||
_, dataset_type, dataset_file = f"{dataset.__class__.__module__}".rsplit(".", 2) | ||
return f"{dataset_type}.{dataset_file}.{class_name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming is kind of confusing given we already call the "full path" dataset_type
(which is probably already confusing, but it's also in the flowchart part of kedro-viz source, so probably let's leave it that way for now).
def get_dataset_module_class(dataset: AbstractVersionedDataSet) -> str: | |
class_name = f"{dataset.__class__.__qualname__}" | |
_, dataset_type, dataset_file = f"{dataset.__class__.__module__}".rsplit(".", 2) | |
return f"{dataset_type}.{dataset_file}.{class_name}" | |
def get_abbreviated_dataset_type(dataset: AbstractVersionedDataSet) -> str: | |
"""e.g. plotly.plotly_dataset.PlotlyDataSet""" | |
abbreviated_module_name = ".".join(dataset.__class__.__module__.split(".")[-2:]) | |
return f"{abbreviated_module_name}.{dataset.__class__.__qualname__}" |
@@ -466,37 +470,39 @@ def __post_init__(self): | |||
self._get_namespace(self.full_name) | |||
) | |||
|
|||
@staticmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd refactor this as per suggestions in experiment_tracking.py.
I've read through the code and I definitely think the approach is the right one. I'll add a review today 🙂. |
src/utils/short-type.js
Outdated
const getShortType = (longTypeName, fallback) => | ||
shortTypeMapping[longTypeName] || fallback; | ||
const getShortType = (name, fallback) => { | ||
const longTypeName = name?.split('.').slice(-3).join('.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this basically just doing what your new get_abbreviated_dataset_type
does? Why don't we just pass the abbreviated dataset type to the frontend instead now we have it in the backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. because nowhere in the front end we show the full name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, i have removed the origin get_dataset_type class and renamed get_dataset_module as get_dataset_type so now everywhere we pass around the shorten dataset name that excludes 'kedro.extras.datasets'
Co-authored-by: Antony Milne <49395058+AntonyMilneQB@users.noreply.github.com>
…hub.com/kedro-org/kedro-viz into fix/make-compatible-with-kedro-datasets
) | ||
|
||
|
||
def _get_dataset_type(dataset): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to have a comment here about what this function does, and how and why it is different from the function with the same name in package/kedro_viz/models/flowchart.py
, as this may get confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually these two functions are not required in the test files so I have removed it (test_queries, and test _responses). In order to assert the correct result I am going to compare it with the string e.g. "pandas.csv_dataset.CsvDataSet" just the way it was done before.
) | ||
|
||
|
||
def _get_dataset_type(dataset): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function has been declared twice with the same body (here and in test_queries
), I am fine with the DAMP approach, but ditto my comment.
Description
Fixes #1205
Development notes
kedro.extras
andkedro-datasets
get detected correctlykedro-datasets
but if not installed, it will revert to usingkedro.extras.datasets
QA notes
I've manually tested this by spinning up the development server as described here: https://github.com/kedro-org/kedro-viz/blob/main/CONTRIBUTING.md#launch-a-development-server-with-a-real-kedro-project
Tried that everything works with
kedro==0.18.3
and also withkedro==0.18.4
+kedro-datasets==1.0.1
Checklist
RELEASE.md
file