Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove kedro.line_magic entrypoint #878

Closed
Galileo-Galilei opened this issue Sep 7, 2021 · 10 comments
Closed

Remove kedro.line_magic entrypoint #878

Galileo-Galilei opened this issue Sep 7, 2021 · 10 comments
Labels
Issue: Feature Request New feature or improvement to existing feature pinned Issue shouldn't be closed by stale bot Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation

Comments

@Galileo-Galilei
Copy link
Contributor

Galileo-Galilei commented Sep 7, 2021

This issue has changed quite a bit since it was originally written. Look at the final post here to see what we now need to do.

Original issue here for posterity.


Description

Is your feature request related to a problem? A clear and concise description of what the problem is: "I'm always frustrated when ..."

I am the author of several kedro plugins either open souce (kedro-mlflow) or at work. It turns out that most plugins either perform one (or several) of the following tasks:

  • provide CLI commands
  • modify pipeline/catalog/node behaviour with hooks
  • add new datasets

Since Kedro is a data science framework, and since Jupyter is a standard for the exploration phase of a data science project, it seems natural to make Kedro compatible with Jupyter. For now, Kedro offers the following functionnality:

  • setup a bunch of Kedro configuration (bootstraping the project, creating a session and activate it)
  • declare some global variables to be accessible of the shell (catalog, session, context)
  • register custom "line_magic" from plugins

This works well for the core library, but it would be a more pleasant user experience to improve plugin compatibilities with notebooks with the following actions:

  • add documentation on how to create a custom line magic in your plugin, because this feature is undocumented yet. I had to read the code to discover it and I struggled a little to make it work². I can make a PR, maybe by adding a sub-section in the plugin section?
  • add the possibility to execute some functions / push global variables to the notebook session when it is openened via a Kedro CLI command

Context

Why is this change important to you? How would you use it? How can it benefit other users?

It is a common pattern to "setup" your plugin, e.g. instantiate a connection to an external service (kedro-mlflow creates a connection to the mlflow server, I do have a sas plugin which creates a connection to the database...), especially in ther "before pipeline_run" hook.

If I create a custom line_magic (say %reload_kedro_mlflow), it is discovered by kedro and it is accessible inside the notebook or the ipython session. However, I'd like to run it automatically when the notebook is opened via the Kedro CLI instead of forcing the user to make it manually.

Possible Implementation

(Optional) Suggest an idea for implementing the addition or change.

Possible Alternatives

(Optional) Describe any alternative solutions or features you've considered.

It is currently possible to bypass Kedro configuration by creating a load_ipython_extension function to execute code when opening the notebook (like here: https://github.com/quantumblacklabs/kedro/blob/fc9d1c5d35981c51a651af467e79e93df065d354/kedro/extras/extensions/ipython.py#L121). However, we cannot access the objects created by reload_kedro (especially the session or the catalog if we want to modify /use them). We need to duplicate almost all the code to make it works, and it prevents combining several plugins.

@Galileo-Galilei Galileo-Galilei added the Issue: Feature Request New feature or improvement to existing feature label Sep 7, 2021
@antonymilne
Copy link
Contributor

Thanks for this very clear write up! I'm wondering how this would fit into current ideas around improving the jupyter/ipython workflow (as per 0.17.5 release notes, we have removed .ipython/profile_default/startup/ from the Kedro project template in favour of .ipython/profile_default/ipython_config.py and kedro.extras.extensions.ipython). @idanov

add documentation on how to create a custom line magic in your plugin, because this feature is undocumented yet. I had to read the code to discover it and I struggled a little to make it work². I can make a PR, maybe by adding a sub-section in the plugin section?

This is an excellent idea I think. As far as I can see the kedro.line_magic entry point is indeed completely undocumented and would fit in well on that page. I've actually been meaning to make that page a bit clearer for a while, but if you did want to add a subsection for how to write custom line magic then that would be awesome.

add the possibility to execute some functions / push global variables to the notebook session when it is openened via a Kedro CLI command

Just to understand, this would effectively give users the ability to add their own custom code that would be executed upon running %reload_kedro?

@Galileo-Galilei
Copy link
Contributor Author

Galileo-Galilei commented Sep 11, 2021

Yes, I've seen how you refactored the ipython part for the next release and it seems much simpler like this.

Regarding the documentation, I agree that apart from the line_magic, the plugin section is a bit sparse and may need more details for beginners (I often find myself digging in the code to see how things are implement under the hood, or look at kedro-viz as an example). I'd try to suggest some improvements when I have time but this is not a top priority for me to be honest.

Just to understand, this would effectively give users the ability to add their own custom code that would be executed upon running %reload_kedro?

Yes. The current workflow for a user is:

  1. Open the notebook with kedro jupyter notebook
  2. Call %reload_kedro_mlflow inside a cell
  3. (Optional) If modifications are made to the kedro objects / configuration (catalog, pipelines, mlflow.yml config) and the user needs to reload them, call:
%reload_kedro
%reload_kedro_mlflow

I wish it were:

  1. Open the notebook with kedro jupyter notebook
  2. (Optional) If modifications are made to the kedro objects / configuration (catalog, pipelines, mlflow.yml config) and the user needs to reload them, call %reload_kedro inside a cell

It is honestly not a big deal and the first workflow is quite satisfying, I just think that it may be more user friendly to hide the "kedro-mlflow" part from the user.

@stale
Copy link

stale bot commented Nov 10, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 10, 2021
@antonymilne antonymilne pinned this issue Nov 10, 2021
@stale stale bot removed the stale label Nov 10, 2021
@antonymilne antonymilne unpinned this issue Nov 12, 2021
@antonymilne antonymilne added the pinned Issue shouldn't be closed by stale bot label Nov 12, 2021
@merelcht merelcht added the Community Issue/PR opened by the open-source community label Mar 7, 2022
@merelcht merelcht added the Component: Jupyter/IPython Issue/PR relevant for Jupyter Notebooks, IPython sessions and the interactive workflow in Kedro label Mar 24, 2022
@merelcht merelcht removed the Community Issue/PR opened by the open-source community label May 16, 2022
@antonymilne antonymilne removed the Component: Jupyter/IPython Issue/PR relevant for Jupyter Notebooks, IPython sessions and the interactive workflow in Kedro label Jun 8, 2022
@antonymilne
Copy link
Contributor

antonymilne commented Aug 24, 2022

Somehow it's nearly a year now since this was posted, but I have some fresh thoughts on it now! For context, we've been working on the kedro-viz %run_viz line magic recently (kedro-org/kedro-viz#1012). Now that I understand more about how these line magics work I've realised that our current setup is not great.

tl;dr: let's go for the proposal under Possible Alternatives above and remove the kedro.line_magic entrypoint altogether. It's very weak and doesn't add anything useful since we can achieve everything we want just through %load_ext already.

Background

  • the correct way to use the kedro in Jupyter/IPython is %load_ext kedro.extras.extensions.ipython (soon to be aliased to just %load_ext kedro: see Alias IPython extension #1763)
  • this is done automatically when you run kedro jupyter/ipython but can also be run on any managed instance to "kedro-ify" your jupyter/ipython session. The behaviour is identical; all kedro jupyter/ipython does is load the extension automatically
  • kedro defines an entrypoint kedro.line_magic, which according to a github search is used in publicly available repos precisely two times: kedro-mlflow and kedro-viz. I can see one old usage of it in an internal package too, and @Galileo-Galilei suggests he uses it internally too. But it's certainly hardly used at all
  • as I realised in Fix %run_viz on Databricks kedro-viz#1012 (comment) the way this entry point works is actually pretty bad. We wrap the call in needs_local_scope so that all line magics must have the variable local_ns in their signature, even if they don't want it. It should be the responsibility of the package defining the line magic to specify decorators like needs_local_scope; it shouldn't be forced on them by kedro
  • furthermore, this integration is extremely limited because it only allows you to register line magics (not cell magics or to push variables for example)
  • altogether it feels like this entry point was written "bespoke" so that kedro-viz could sneakily make %run_viz and as such was never documented. It's not powerful enough to be used for general cases. It's probably only @Galileo-Galilei who realised that it existed and used it

So let's remove the entrypoint!

If we can agree a good plan with @Galileo-Galilei then I would even be happy to do this as a non-breaking release, after doing a user survey to double check that no one else is using it.

What would replace it?

Here's the requirements:

  1. essential: ability to add custom behaviour into a notebook, e.g. to register line magics, push variables to scope, etc.
  2. nice to have: such code would automatically run when %load_ext kedro is run (or maybe when reload_kedro is run, not sure)

Essential (requirement 1)

As far as I can tell, satisfying requirement 1 is trivial because this is exactly what's already provided by IPython extensions! If you write an IPython extension then it can then do whatever you like - no need to be constrained like you currently are to just writing line magics with local_ns in their signature.

The user flow in a managed jupyter instance would then be:

%load_ext kedro
only needed if you're not in the project directory or want to change env etc.: %reload_kedro <path>
%load_ext kedro_mlflow
%load_ext kedro_viz

This is basically the Possible Alternatives proposal from @Galileo-Galilei above. The crucial thing is that I believe this statement is not correct:

However, we cannot access the objects created by reload_kedro (especially the session or the catalog if we want to modify /use them). We need to duplicate almost all the code to make it works, and it prevents combining several plugins.

So long as %reload_kedro has run (which is called by %load_ext kedro) then these variables are exactly what's available inside local_ns. i.e. you can do local_ns["catalog"] inside your kedro-mlflow line magic. Probably you don't even need local_ns and could get these out of get_ipython() somewhere.

Nice to have (requirement 2)

If we wanted to, this could be achieved by introducing a new, more general IPython entrypoint that runs when %load_ext or %reload_kedro is run. Or probably better, as in the original proposal, a hook. Technically this doesn't need to take any arguments since you can always do get_ipython, but probably for convenience we would expose catalog etc.

However, introducing this hook feels like a big addition for what would be a small increase in convenience for a small and infrequently used feature (basically just @Galileo-Galilei + kedro-viz). Given that all it does it remove the need for a user to type one extra line of code (%load_ext ...) I think it's not worth doing. The above solution of each plugin handling its own IPython extension feels clean and not having kedro automatically loading them also feels clean to me. If there's more call for adding this as a convenience in future then we could always add it, but honestly it feels very unlikely that many people would want it.

Please comment!

From the kedro side, I think the only possible objection would be that the flow for doing %run_viz would now require you to do %load_ext kedro_viz first. Personally I think this is totally fine. Maybe %load_ext kedro_viz would even execute %run_viz automatically (like %load_ext kedro runs %reload_kedro).

We should also check that this doesn't break everything. i.e. if someone upgraded to a version of kedro without this entrypoint but still had things registered to it, would everything crash horribly? What happens if you pip install kedro-viz and the entrypoint isn't defined?

@Galileo-Galilei what do you think of this all? As above, if it's a plan you are happy with then I think we should just do it after checking that no one else cares in a user survey.

@antonymilne
Copy link
Contributor

antonymilne commented Aug 24, 2022

I should also clarify that the above local_ns trick does work in terms of hot-reloading the variables, i.e. it's not stuck on the initial values of these variables populated when %reload_kedro first executes.

  • put print(local_ns) inside the definition of %run_viz
  • when you do %run_viz you see the catalog etc. variables populated as you'd expect
  • simulate %reload_kedro altering the variables by manually setting catalog = 10
  • doing %run_viz again shows the updated catalog = 10 value

@Galileo-Galilei
Copy link
Contributor Author

Galileo-Galilei commented Aug 25, 2022

Thank you for the very detailed answer!

If we can agree a good plan with @Galileo-Galilei then I would even be happy to do this as a non-breaking release, after doing a user survey to double check that no one else is using it.

Completely happy with it, and not considering it as a breaking change:

  • I am 100% sure that no one in my team is aware of this line magic in kedro-mlflow (they don't even use the one in kedro), and no of our company internal plugins use it neither.
  • Funnily, this is no longer needed at all: load_ext kedro.extras... calls load_context() which setup the configuration and stores the config inside the context as an attribute, as discussed in details here, so whatever your decision is, I can already remove it from kedro-mlflow

This is basically the Possible Alternatives proposal from @Galileo-Galilei above

Glad to go with this option!

So long as %reload_kedro has run (which is called by %load_ext kedro) then these variables are exactly what's available inside local_ns. i.e. you can do local_ns["catalog"] inside your kedro-mlflow line magic. Probably you don't even need local_ns and could get these out of get_ipython() somewhere.

I figured this out muuuuuuuuch longer after I wrote the original post 😅, but thank you for the indication!

However, introducing this hook feels like a big addition for what would be a small increase in convenience for a small and infrequently used feature

Completly agree on this. The feature seems too little used to be worth the engineering effort, especially since your "Requirement 1" would already be a very convenient setup.

We should also check that this doesn't break everything. i.e. if someone upgraded to a version of kedro without this entrypoint but still had things registered to it, would everything crash horribly? What happens if you pip install kedro-viz and the entrypoint isn't defined?

Indeed, it would be a pity to have a lot of users complaining because you removed something unused !

@Galileo-Galilei what do you think of this all? As above, if it's a plan you are happy with then I think we should just do it after checking that no one else cares in a user survey.

Let's go! 🚀

@antonymilne
Copy link
Contributor

antonymilne commented Aug 26, 2022

Awesome, thanks for the confirmation @Galileo-Galilei. Looking at the code again actually there's no way this can break anything since Kedro actually doesn't expose particular entrypoint that are available for use; it just tries to load things up using kedro.framework.cli.utils.load_entry_point. Hence if we don't look for the line_magic entrypoint at all then it doesn't matter if there's things registered there. They'll just be ignored and not break anything.

The actions here are:

Initial checks

  • Do a quick user survey to see if anyone is using kedro.line_magic entrypoint
  • Check that people (@yetudada, @idanov, @tynandebold) are happy with the additional %load_ext kedro_viz step to execute %run_viz. Note that in the future might not even be an additional step here; we could consider making %load_ext kedro_viz run a default version of %run_viz (without any extra pipeline etc. arguments). And we'd get the bonus of an %unload_ext command to kill the kedro-viz instance, which we've always been missing

Implement it!

  • Update kedro-viz to be an actual IPython extension that has an load_ipython_extension command and registers line magic %run_viz (ticket to follow on kedro-viz repo). In the same way that we've aliased kedro.extras.extensions.ipython to kedro we should alias it to kedro_viz
  • Update docs on how to do %run_viz to include extra %load_ext kedro_viz step
  • Remove the kedro.line_magic entrypoint from ENTRY_POINT_GROUPS and any code that refers to it (probably just kedro.extras.extensions.ipython + tests)
  • Future: write %unload_ext and consider whether just doing %load_ext kedro_viz should launch kedro-viz itself without the need for %run_viz step. Depends a bit on what happens with the two possible %run_viz commands. Need to think about how specifying arguments like pipeline would work.

@antonymilne antonymilne changed the title Improve plugin integration in notebooks Remove kedro.line_magic entrypoint Aug 26, 2022
@antonymilne antonymilne added the Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation label Aug 26, 2022
@antonymilne
Copy link
Contributor

antonymilne commented Aug 31, 2022

Notes from technical design session on 31 August:

  • General agreement that the current implementation of the line magic entrypoint is not correct (e.g. needs_local_scope should be on kedro-viz side)
  • Some (@jmholzer) agreed that we should remove the line magic entrypoint; others (@idanov @ankatiyar) had concerns that explicitly needing to do %load_ext kedro_viz is not as beginner-friendly as the existing workflow where %run_viz is registered automatically
  • @noklam did not like the idea of %load_ext kedro_viz itself loading kedro-viz e.g. how do you provide CLI arguments like pipeline? Note there is a flow here that's possible and consistent with %load_ext kedro though, in which %load_ext kedro_viz executes %run_viz but you can also call %run_viz yourself with arguments
  • @AhdraMeraliQB and @SajidAlamQB also like %run_viz line magic existing, but @SajidAlamQB thinks it's fine to have the extra %load_ext kedro_viz step as well
  • A more generic IPython entrypoint (and maybe hook with ipython +catalog/context/etc. arguments) would remove the need for the %load_ext step. @AntonyMilneQB (and @Galileo-Galilei from above discussion) think this is overkill for now and could always be added later

@noklam
Copy link
Contributor

noklam commented Aug 31, 2022

I am slightly in favor of loading %run_viz automatically.

Existing workflows

  1. kedro ipython/jupyter -> %reload_kedro (If you are in local)
  2. %load_ext kedro -> %reload_kedro (If you are in JupyterHub/Databricks)

I would expect 1 is a more common case, so I think most people would only interact with %reload_kedro, and therefore they should interact with %run_viz equivalently. We could also add more line magic in the future (maybe a line magic to save a static output or something related to experiment tracking?), this makes me less prefer overloading the load_ext too much.

Overall I agree the existing implementation is incorrect (arguably it's a bug, just that no one is using it), but I would go for a lower effort option which may just be fixing the local_scope, fixing the line magic to accept the correct arguments (if we haven't already fixed it?)

@antonymilne
Copy link
Contributor

@noklam I definitely see the argument for %load_ext kedro_viz.ipython not running kedro viz automatically. Note this idea is not so different from how the kedro IPython extension works though:

  • %load_ext kedro.ipython automatically runs reload_kedro (so your point 1 above isn't quite right: if you're in local usually you don't need to run %reload_kedro at all)
  • if you need to do something "more advanced" (e.g. you're on JupyterHub or you want to specify env) then you run %reload_kedro

Compare to possible kedro-viz IPython workflow:

  • %load_ext kedro_viz.ipython automatically runs kedro-viz
  • if you need to do something "more advanced" (e.g. you're on JupyterHub or you want to specify env or pipeline) then you run %run_viz

I don't feel strongly about %load_ext kedro_viz.ipython automatically running kedro-viz though - it does also feel a little unusual to me also.

What I think is more important is the existence of the line_magic entrypoint, and the more I think about this the more I still think we should remove it. Fixing this line_magic entrypoint needs a bit more than just moving the needs_local_scope I think:

  • I'm pretty sure that calling register_line_magic is not the right way to even register the function (should be ipython.register_magic_function)
  • the registration should be done in load_ipython_extension rather than reload_kedro
  • if we actually want this to be a feature that people use then it should be documented

None of these are hard things to do but I think the effort is comparable to just removing it. Adding the on_ipython_loaded hook would also be very easy and a much better solution if we do still want %run_viz to be registered automatically (i.e. without needing to do %load_ext kedro_viz.ipython).

One advantage of an explicit %load_ext kedro_viz.ipython that I forgot to mention before is that it would enable %unload_ext kedro_viz.ipython, which could be used to kill kedro-viz instances. This is something we don't currently have a way to do and would otherwise require a new line magic %kill_viz or something. The load/unload mechanism feels much nicer though.


Let me try and paint the picture in a different way: if this line_magic entrypoint didn't already exist in kedro, would we be trying to add it? I think very unlikely. It would be a non-trivial increase in the API surface for a small convenience for one particular use case that is not common (one less line to type if you want to launch kedro-viz from Jupyter). Given that the effort to remove it is about the same as that required to fix it, and removing it would be de-facto non-breaking, I still don't really see any point in keeping it.

It's such a minor thing though that I'm also happy just to leave it as it is rather than removing or fixing it... 😀

@noklam noklam modified the milestones: Improve the Interactive Jupyter notebook workflow, Something Jupyter Apr 24, 2023
@kedro-org kedro-org locked and limited conversation to collaborators Mar 27, 2024
@merelcht merelcht converted this issue into discussion #3748 Mar 27, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
Issue: Feature Request New feature or improvement to existing feature pinned Issue shouldn't be closed by stale bot Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation
Projects
Status: Done
Development

No branches or pull requests

4 participants