Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Kedro-Viz docs for Experiment Tracking #2193

Merged

Conversation

tynandebold
Copy link
Member

@tynandebold tynandebold commented Jan 11, 2023

NOTE: Kedro datasets are moving from kedro.extras.datasets to a separate kedro-datasets package in
kedro-plugins repository. Any changes to the dataset implementations
should be done by opening a pull request in that repository.

Description

Resolves (part of) kedro-org/kedro-viz#1241.

Warning
This PR shouldn't be merged until version 5.2.0 of Kedro-Viz is released. This is because there's content in this PR that will only exist when that release is out.

Development notes

Taken from the comment here, I've done the following:

  • Ensure that the spaceflights tutorial works without issue. There was a problem with Plotly where the charts weren't rendering (relates to Make Kedro-Viz work with kedro-datasets kedro-viz#1205).
  • There are actually two pages for exp. tracking documentation. We should only have one.
    • Copy the contents of the second, smaller page into the first and then delete that content from the second page.
  • Mention parallel coordinate and time-series plots.

A note on each checked item:

  1. I checked this using Kedro 0.18.3, since 0.18.4 will prevent the charts from loading. There were a couple of minor hangups while running through the tutorial that I've remedied. All should work smoothly now.
  2. I've deleted this secondary docs page with experiment tracking instructions and didn't move anything over into the main page because it was already well covered.
  3. The new work is captured and takes precedent in the ordering on the page, meaning the metrics plots in the experiment tracking page shows up earlier in the documentation versus the metrics plot that can be viewed on the flowchart page.

Lastly, I've updated the gifs and images to reflect the changes in design on the experiment tracking page (now we have tabs for overview, metrics, and plots, and before we didn't).

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

@stichbury
Copy link
Contributor

stichbury commented Jan 12, 2023

This looks great, thanks @tynandebold.

I do have a query about Spaceflights. So this now works if you take the most recent spaceflights starter and add the code given above? There's no reliance on any previous part of the Kedro-Viz docs like the previous page on Plotly reporting? I would normally test it myself just to 'dogfood' the instructions, but I'm a bit tied up with the Academy decks today and don't want to hold you up.

I think we still need to introduce Kedro-Viz for anyone (unlikely enough) to be coming straight to this page without having previously installed Viz. We put this into the Plotly page. -- would it be possible to add to this file?

You could also move/edit a small section of the text from where it appears later in the file, so you test the version at the point you mention installing Viz:

"Here comes the fun part of accessing your run data on Kedro-Viz. Having ensured that you are using Kedro-Viz >=4.1.1 (you can confirm your Kedro-Viz version by running kedro info), run"

@stichbury
Copy link
Contributor

This looks great to me @tynandebold. LMK when you've made those changes and I'll approve.

tynandebold and others added 5 commits January 12, 2023 13:15
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
@tynandebold
Copy link
Member Author

I've made the changes now @stichbury. I took your second suggestion and moved the testing of the Viz version to after the install step. Let me know how that reads and if it's in the correct place.

And to your query about Spaceflights: yes, all worked fine for me when I tested with the latest version of Spaceflights (using the kedro new command) and added the code from this PR. I used a fresh conda environment for testing this and followed the instruction from this page only, taking nothing from any other page in the docs.

Could there have been some leftover dependencies hanging around in my new conda environment that were populated from an older one? I'm not too familiar with this. Again, it worked fine for me with 0.18.3 of Kedro and the couple new lines that I added, so I'm quite confident it's solid. I do hope to test with Merel's fix though, to ensure it'll work with 0.18.4.

Copy link
Contributor

@stichbury stichbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for this! 🌟 🌟 🌟 🌟 🌟

Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! And so exciting to see the time series and parallel coordinates! 🥳

Here's the list of changes in this commit: 
 - Moved some of the "experiment tracking" explanation from the logging file into the current file
 - Added a "Why use Kedro experiment tracking?" section
 - Made sure users knew to install dependencies
 - Fixed formatting and language from the previous versions of the documentation
Copy link
Contributor

@yetudada yetudada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for this stellar work @tynandebold! I've made some additions and I've left a bigger question which we might need @rashidakanchwala or @merelcht to help with.


## Set up tracking datasets

There are two types of tracking datasets: [`tracking.MetricsDataSet`](/kedro.extras.datasets.tracking.MetricsDataSet) and [`tracking.JSONDataSet`](/kedro.extras.datasets.tracking.JSONDataSet). The `tracking.MetricsDataSet` should be used for tracking numerical metrics, and the `tracking.JSONDataSet` can be used for tracking any other JSON-compatible data like boolean or text-based data.

Set up two datasets to log `r2 scores` and `parameters` for each run by adding the following in the `conf/base/catalog.yml` file:
Set up two datasets to log the columns used in the companies dataset (`companies_columns`) and experiment metrics for the `active_modelling_pipeline` (`active_modelling_pipeline.metrics`) like the coefficient of determination (`r2 score`), max error (`me`) and mean absolute error (`mae`) by adding the following in the `conf/base/catalog.yml` file:

```yaml
active_modelling_pipeline.metrics:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only other comment is based on this and it might need to be a bigger change to how we structure this tutorial. We introduce namespacing here and in the plot comparison section (data_processing.confusion_matrix in catalog.yml) but we don't actually explain what the full stop means here.

It's important that we do because I got confused with the guidance in 113 - Note that the output dataset must exactly match the name of the tracking dataset specified in the catalog file. because users are asked to just specify metrics and not active_modelling_pipeline.metrics.

So my question would be:

  • Can we, ideally, do this tutorial without introducing the concept of namespaces too (it's an information overload)?
  • And if not, how do we introduce it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great spot @yetudada ! We do indeed not need the namespace here, as no namespacing has been applied to the pipelines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome let me update the other mentions of active_modelling_pipeline.metrics 😊

docs/source/visualisation/experiment_tracking.md Outdated Show resolved Hide resolved
Co-authored-by: Yetunde Dada <43755008+yetudada@users.noreply.github.com>
docs/source/visualisation/experiment_tracking.md Outdated Show resolved Hide resolved

## Set up tracking datasets

There are two types of tracking datasets: [`tracking.MetricsDataSet`](/kedro.extras.datasets.tracking.MetricsDataSet) and [`tracking.JSONDataSet`](/kedro.extras.datasets.tracking.JSONDataSet). The `tracking.MetricsDataSet` should be used for tracking numerical metrics, and the `tracking.JSONDataSet` can be used for tracking any other JSON-compatible data like boolean or text-based data.

Set up two datasets to log `r2 scores` and `parameters` for each run by adding the following in the `conf/base/catalog.yml` file:
Set up two datasets to log the columns used in the companies dataset (`companies_columns`) and experiment metrics for the `active_modelling_pipeline` (`active_modelling_pipeline.metrics`) like the coefficient of determination (`r2 score`), max error (`me`) and mean absolute error (`mae`) by adding the following in the `conf/base/catalog.yml` file:

```yaml
active_modelling_pipeline.metrics:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great spot @yetudada ! We do indeed not need the namespace here, as no namespacing has been applied to the pipelines.

Copy link
Contributor

@yetudada yetudada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we just need a general check to see if this tutorial runs from beginning to end in a clean virtual environment with all the changes but otherwise I think we're ready to go live.


## Set up tracking datasets

There are two types of tracking datasets: [`tracking.MetricsDataSet`](/kedro.extras.datasets.tracking.MetricsDataSet) and [`tracking.JSONDataSet`](/kedro.extras.datasets.tracking.JSONDataSet). The `tracking.MetricsDataSet` should be used for tracking numerical metrics, and the `tracking.JSONDataSet` can be used for tracking any other JSON-compatible data like boolean or text-based data.

Set up two datasets to log `r2 scores` and `parameters` for each run by adding the following in the `conf/base/catalog.yml` file:
Set up two datasets to log the columns used in the companies dataset (`companies_columns`) and experiment metrics for the `active_modelling_pipeline` (`active_modelling_pipeline.metrics`) like the coefficient of determination (`r2 score`), max error (`me`) and mean absolute error (`mae`) by adding the following in the `conf/base/catalog.yml` file:

```yaml
active_modelling_pipeline.metrics:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome let me update the other mentions of active_modelling_pipeline.metrics 😊

docs/source/visualisation/experiment_tracking.md Outdated Show resolved Hide resolved
docs/source/visualisation/experiment_tracking.md Outdated Show resolved Hide resolved
tynandebold and others added 3 commits January 13, 2023 13:57
Co-authored-by: Yetunde Dada <43755008+yetudada@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Co-authored-by: Yetunde Dada <43755008+yetudada@users.noreply.github.com>
@stichbury
Copy link
Contributor

I think we just need a general check to see if this tutorial runs from beginning to end in a clean virtual environment with all the changes but otherwise I think we're ready to go live.

I'm happy to try this out as I want to test it out and learn about experiment tracking, but I'm not really able to right now. I think @tynandebold needs to hold off until the next viz release anyway, so hopefully this can wait until next week -- I'll get it done before the end of the sprint.

@tynandebold tynandebold self-assigned this Jan 16, 2023
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
You must update the `src/requirements.txt` file in your Kedro project by adding the following dataset to enable Matplotlib for your project:

```bash
kedro[matplotlib.MatplotlibWriter]==0.18.3
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove this now, can't we? Once kedro-org/kedro-viz#1214 is merged we shouldn't need to specify a Kedro version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True! I have to make some wholesale changes to the other viz docs when that PR merges so can remove this then if you merge this PR ahead of #1214 but otherwise, if it won't merge in until later, you can certainly remove this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed it since this PR will be merged in the next couple of days.

@tynandebold
Copy link
Member Author

@stichbury you'll have a couple days to test it out. Hopefully we release Viz tomorrow or Wednesday.

Separately, in this PR I think we should remove the references to Kedro version 0.18.3 in this file, since that problem should be solved when this PR is completed. What do you think?

@stichbury
Copy link
Contributor

@stichbury you'll have a couple days to test it out. Hopefully we release Viz tomorrow or Wednesday.

Separately, in this PR I think we should remove the references to Kedro version 0.18.3 in this file, since that problem should be solved when this PR is completed. What do you think?

I'll prioritise this on Tuesday 17th.

tynandebold and others added 7 commits January 16, 2023 13:05
… docs/improve-viz-docs-for-exp-tracking-metrics-release

Signed-off-by: Tynan DeBold <thdebold@gmail.com>
… of https://github.com/quantumblacklabs/kedro into docs/improve-viz-docs-for-exp-tracking-metrics-release

Signed-off-by: Tynan DeBold <thdebold@gmail.com>
Signed-off-by: Tynan DeBold <thdebold@gmail.com>
@stichbury
Copy link
Contributor

@stichbury you'll have a couple days to test it out. Hopefully we release Viz tomorrow or Wednesday.
Separately, in this PR I think we should remove the references to Kedro version 0.18.3 in this file, since that problem should be solved when this PR is completed. What do you think?

I'll prioritise this on Tuesday 17th.

@tynandebold I've now tested the text and all is good. I made a couple of changes to the experiment tracking page to tweak it (nothing major) and also removed mention of Kedro 0.18.3 in the Viz docs about plotly charts, which I've sneaked into this PR. Hope that's OK!

@tynandebold
Copy link
Member Author

Fantastic, @stichbury! Thank you 💥

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants