-
Notifications
You must be signed in to change notification settings - Fork 873
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Kedro-Viz docs for Experiment Tracking #2193
Improve Kedro-Viz docs for Experiment Tracking #2193
Conversation
…eded secondary exp tracking file
This looks great, thanks @tynandebold. I do have a query about Spaceflights. So this now works if you take the most recent spaceflights starter and add the code given above? There's no reliance on any previous part of the Kedro-Viz docs like the previous page on Plotly reporting? I would normally test it myself just to 'dogfood' the instructions, but I'm a bit tied up with the Academy decks today and don't want to hold you up. I think we still need to introduce Kedro-Viz for anyone (unlikely enough) to be coming straight to this page without having previously installed Viz. We put this into the Plotly page. -- would it be possible to add to this file? You could also move/edit a small section of the text from where it appears later in the file, so you test the version at the point you mention installing Viz: "Here comes the fun part of accessing your run data on Kedro-Viz. Having ensured that you are using Kedro-Viz >=4.1.1 (you can confirm your Kedro-Viz version by running kedro info), run" |
This looks great to me @tynandebold. LMK when you've made those changes and I'll approve. |
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
I've made the changes now @stichbury. I took your second suggestion and moved the testing of the Viz version to after the install step. Let me know how that reads and if it's in the correct place. And to your query about Spaceflights: yes, all worked fine for me when I tested with the latest version of Spaceflights (using the Could there have been some leftover dependencies hanging around in my new conda environment that were populated from an older one? I'm not too familiar with this. Again, it worked fine for me with 0.18.3 of Kedro and the couple new lines that I added, so I'm quite confident it's solid. I do hope to test with Merel's fix though, to ensure it'll work with 0.18.4. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for this! 🌟 🌟 🌟 🌟 🌟
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me! And so exciting to see the time series and parallel coordinates! 🥳
Here's the list of changes in this commit: - Moved some of the "experiment tracking" explanation from the logging file into the current file - Added a "Why use Kedro experiment tracking?" section - Made sure users knew to install dependencies - Fixed formatting and language from the previous versions of the documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for this stellar work @tynandebold! I've made some additions and I've left a bigger question which we might need @rashidakanchwala or @merelcht to help with.
|
||
## Set up tracking datasets | ||
|
||
There are two types of tracking datasets: [`tracking.MetricsDataSet`](/kedro.extras.datasets.tracking.MetricsDataSet) and [`tracking.JSONDataSet`](/kedro.extras.datasets.tracking.JSONDataSet). The `tracking.MetricsDataSet` should be used for tracking numerical metrics, and the `tracking.JSONDataSet` can be used for tracking any other JSON-compatible data like boolean or text-based data. | ||
|
||
Set up two datasets to log `r2 scores` and `parameters` for each run by adding the following in the `conf/base/catalog.yml` file: | ||
Set up two datasets to log the columns used in the companies dataset (`companies_columns`) and experiment metrics for the `active_modelling_pipeline` (`active_modelling_pipeline.metrics`) like the coefficient of determination (`r2 score`), max error (`me`) and mean absolute error (`mae`) by adding the following in the `conf/base/catalog.yml` file: | ||
|
||
```yaml | ||
active_modelling_pipeline.metrics: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only other comment is based on this and it might need to be a bigger change to how we structure this tutorial. We introduce namespacing here and in the plot comparison section (data_processing.confusion_matrix
in catalog.yml
) but we don't actually explain what the full stop means here.
It's important that we do because I got confused with the guidance in 113 - Note that the output dataset must exactly match the name of the tracking dataset specified in the catalog file.
because users are asked to just specify metrics
and not active_modelling_pipeline.metrics
.
So my question would be:
- Can we, ideally, do this tutorial without introducing the concept of namespaces too (it's an information overload)?
- And if not, how do we introduce it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great spot @yetudada ! We do indeed not need the namespace here, as no namespacing has been applied to the pipelines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome let me update the other mentions of active_modelling_pipeline.metrics
😊
Co-authored-by: Yetunde Dada <43755008+yetudada@users.noreply.github.com>
|
||
## Set up tracking datasets | ||
|
||
There are two types of tracking datasets: [`tracking.MetricsDataSet`](/kedro.extras.datasets.tracking.MetricsDataSet) and [`tracking.JSONDataSet`](/kedro.extras.datasets.tracking.JSONDataSet). The `tracking.MetricsDataSet` should be used for tracking numerical metrics, and the `tracking.JSONDataSet` can be used for tracking any other JSON-compatible data like boolean or text-based data. | ||
|
||
Set up two datasets to log `r2 scores` and `parameters` for each run by adding the following in the `conf/base/catalog.yml` file: | ||
Set up two datasets to log the columns used in the companies dataset (`companies_columns`) and experiment metrics for the `active_modelling_pipeline` (`active_modelling_pipeline.metrics`) like the coefficient of determination (`r2 score`), max error (`me`) and mean absolute error (`mae`) by adding the following in the `conf/base/catalog.yml` file: | ||
|
||
```yaml | ||
active_modelling_pipeline.metrics: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great spot @yetudada ! We do indeed not need the namespace here, as no namespacing has been applied to the pipelines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we just need a general check to see if this tutorial runs from beginning to end in a clean virtual environment with all the changes but otherwise I think we're ready to go live.
|
||
## Set up tracking datasets | ||
|
||
There are two types of tracking datasets: [`tracking.MetricsDataSet`](/kedro.extras.datasets.tracking.MetricsDataSet) and [`tracking.JSONDataSet`](/kedro.extras.datasets.tracking.JSONDataSet). The `tracking.MetricsDataSet` should be used for tracking numerical metrics, and the `tracking.JSONDataSet` can be used for tracking any other JSON-compatible data like boolean or text-based data. | ||
|
||
Set up two datasets to log `r2 scores` and `parameters` for each run by adding the following in the `conf/base/catalog.yml` file: | ||
Set up two datasets to log the columns used in the companies dataset (`companies_columns`) and experiment metrics for the `active_modelling_pipeline` (`active_modelling_pipeline.metrics`) like the coefficient of determination (`r2 score`), max error (`me`) and mean absolute error (`mae`) by adding the following in the `conf/base/catalog.yml` file: | ||
|
||
```yaml | ||
active_modelling_pipeline.metrics: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome let me update the other mentions of active_modelling_pipeline.metrics
😊
Co-authored-by: Yetunde Dada <43755008+yetudada@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Co-authored-by: Yetunde Dada <43755008+yetudada@users.noreply.github.com>
I'm happy to try this out as I want to test it out and learn about experiment tracking, but I'm not really able to right now. I think @tynandebold needs to hold off until the next viz release anyway, so hopefully this can wait until next week -- I'll get it done before the end of the sprint. |
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
You must update the `src/requirements.txt` file in your Kedro project by adding the following dataset to enable Matplotlib for your project: | ||
|
||
```bash | ||
kedro[matplotlib.MatplotlibWriter]==0.18.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this now, can't we? Once kedro-org/kedro-viz#1214 is merged we shouldn't need to specify a Kedro version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True! I have to make some wholesale changes to the other viz docs when that PR merges so can remove this then if you merge this PR ahead of #1214 but otherwise, if it won't merge in until later, you can certainly remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed it since this PR will be merged in the next couple of days.
@stichbury you'll have a couple days to test it out. Hopefully we release Viz tomorrow or Wednesday. Separately, in this PR I think we should remove the references to Kedro version |
I'll prioritise this on Tuesday 17th. |
… docs/improve-viz-docs-for-exp-tracking-metrics-release Signed-off-by: Tynan DeBold <thdebold@gmail.com>
… of https://github.com/quantumblacklabs/kedro into docs/improve-viz-docs-for-exp-tracking-metrics-release Signed-off-by: Tynan DeBold <thdebold@gmail.com>
Signed-off-by: Tynan DeBold <thdebold@gmail.com>
@tynandebold I've now tested the text and all is good. I made a couple of changes to the experiment tracking page to tweak it (nothing major) and also removed mention of Kedro 0.18.3 in the Viz docs about plotly charts, which I've sneaked into this PR. Hope that's OK! |
Fantastic, @stichbury! Thank you 💥 |
Description
Resolves (part of) kedro-org/kedro-viz#1241.
Development notes
Taken from the comment here, I've done the following:
kedro-datasets
kedro-viz#1205).A note on each checked item:
Lastly, I've updated the gifs and images to reflect the changes in design on the experiment tracking page (now we have tabs for overview, metrics, and plots, and before we didn't).
Checklist
RELEASE.md
file