Create new documentation page to show a staged migration from a notebook to a Kedro project. #2855

Closed
stichbury opened this issue Jul 28, 2023 · 6 comments
Labels
Component: Documentation 📄 Issue/PR for markdown and API documentation

Comments

@stichbury
Contributor

This is page 2 from ticket #2845. I'll pair with @astrojuanlu later in Q3 to work out an example.

@stichbury
Contributor Author

@astrojuanlu This is the ticket to add a notebook for docs about notebooks 🤯

@astrojuanlu
Member

Notebook version of the spaceflights tutorial https://gist.github.com/astrojuanlu/f3e2bc336568a18cc5ca507e05206dd7

Quick and dirty, as notebooks are :) Contains, in order of appearance:

  1. Cells on top loading the data with pandas, as it's typically done (these are to be replaced by catalog.load eventually)
  2. Then the preprocessing part, without functions (this is to be replaced by the functions, to remove duplicated code, and eventually the nodes and pipelines)
  3. Then the model training part, that contains the hardcoded list of features (in X = model_input_table[[...]]) and the hardcoded test_size (these are to be replaced by parameters)

So my recommendation is to work through the Jupyter blog posts we have in the works, and cover (3), (1), (2). Starting (2) is the tricky part, because at that point the user already needs the Kedro project template. (3) can use the OmegaConfigLoader exclusively, and (1) a combination of OmegaConfigLoader + DataCatalog.
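
As a rough illustration of step (3), here is a minimal sketch of moving the hardcoded feature list and test_size out of the modelling code and into a parameters dictionary. It is plain Python so it runs anywhere; the `parameters` contents, the `split_data` helper, and the sample rows are all hypothetical, and in the staged migration the dictionary would be loaded from a YAML file with OmegaConfigLoader rather than defined inline.

```python
import random

# Hypothetical parameters, standing in for the contents of a parameters YAML file.
parameters = {
    "features": ["engines", "passenger_capacity", "crew"],
    "test_size": 0.2,
    "random_state": 3,
}


def split_data(rows, parameters):
    """Split rows into train/test features and targets, driven by `parameters`."""
    shuffled = rows[:]
    random.Random(parameters["random_state"]).shuffle(shuffled)

    # The test fraction comes from the parameters, not from a literal in the code.
    n_test = round(len(shuffled) * parameters["test_size"])
    test, train = shuffled[:n_test], shuffled[n_test:]

    def select(subset):
        # Keep only the configured feature columns, in the configured order.
        return [[row[name] for name in parameters["features"]] for row in subset]

    def target(subset):
        return [row["price"] for row in subset]

    return select(train), select(test), target(train), target(test)


# Made-up rows that imitate a model input table.
rows = [
    {"engines": i % 3 + 1, "passenger_capacity": 2 + i, "crew": i % 4, "price": 1000.0 + 10 * i}
    for i in range(10)
]

X_train, X_test, y_train, y_test = split_data(rows, parameters)
```

Changing the feature list or the split ratio then only means editing the parameters, which is what makes the later move to nodes and pipelines straightforward.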

@stichbury
Contributor Author

stichbury commented Oct 3, 2023

I'm tempted to use nbsphinx and write this page of the docs in a notebook so it's readable HTML but users can also download it and work with it directly. WDYT, is that asking for trouble? I'm cognisant that version control is painful, wondering though if it's a helpful way for readers to have a notebook ready to go with docs/code to work with.

I am aware of jupytext and stuff for syncing markdown and notebooks but I don't want to add too much complexity for readers (or process for us).

@astrojuanlu
Member

nbsphinx allows you to keep the notebooks without their outputs; they then get executed when the docs are built. This makes version control a bit easier.

in fact, we can have markdown-only notebooks. here's an example I know well: https://github.com/poliastro/poliastro/blob/main/docs/source/examples/analyzing-NEOs.myst.md

(Rendered: https://docs.poliastro.space/en/stable/examples/Analyzing%20NEOs.html)

(Config: https://github.com/poliastro/poliastro/blob/55e96432b27301c5dffb4ef6b4f383d970c6e9c0/docs/source/conf.py#L169-L171)
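
For reference, a minimal sketch of what the relevant part of a Sphinx conf.py could look like. This is an assumption-laden example, not a copy of the linked poliastro config: it assumes nbsphinx and jupytext are installed, and the `.myst.md` suffix and `md:myst` format string are illustrative choices.

```python
# conf.py (fragment): a sketch, not the configuration used by any specific project.

# Enable nbsphinx so notebooks are rendered as documentation pages.
extensions = ["nbsphinx"]

# "auto" executes only notebooks that have no stored outputs,
# so notebooks can be committed without outputs and run at build time.
nbsphinx_execute = "auto"

# Assumption: treat files ending in .myst.md as notebooks by converting
# them with jupytext. Requires the jupytext package at build time.
nbsphinx_custom_formats = {
    ".myst.md": ["jupytext.reads", {"fmt": "md:myst"}],
}
```

With a setup along these lines, the source kept under version control is plain markdown, and the rendered page still shows executed cells.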

@stichbury
Contributor Author

stichbury commented Oct 3, 2023

@astrojuanlu I've spent two days in a notebook environment 😱 and have come up with this:

#3098

Please could you take a look when it's convenient. It's still not perfect, but what do you think of it as a notebook/documentation page?

We still need to think of a way to distribute this so people can download the .ipynb and have the various yaml files available too. I first wondered about writing some code that creates the various config/catalog files on the fly, but I don't like that much. Maybe it's just a question of putting the code somewhere in the docs as an example folder they can browse to. We don't have a precedent for that. Maybe now is the time.

Anyway, please could you take a look, when you are able, to see how the notebook now stacks up against your original gist.

@stichbury
Contributor Author

This was resolved by #3128
