[spike] Improve Kedro CLI startup time #1476

noklam · 2022-04-25T09:00:06Z

Description

Currently, Kedro CLI startup time can be slow even if very little work is done, for example, doing kedro to show the available command or help can take a few seconds. These commands should feel instant since it is just printing out some help text without any heavy work.

Useful tool to profile import time

https://github.com/asottile/importtime-waterfall
python -X importtime run.py

The kedro command itself takes 0.7s, which isn't fast but still an acceptable response time. Initial investigation suggests that it can take few seconds mainly due to a slow import during plugin registration.

Kedro startup time with plugin installed

(0.7s, removed screenshot)

Kedro startup time with no plugin installed (telemetry + viz)

Causes:

Slow import from plugin
Slow import from kedro itself
Slow import from pkg_resource (something that is run with console_script)

Context

Why is this change important to you? How would you use it? How can it benefit other users?

The Kedro CLI can be slow sometimes and lead to a bad user experience. It should be the most common way that users interact with kedro.

Possible Implementation

(Optional) Suggest an idea for implementing the addition or change.
The first improvement that can be done is to do lazy loading in plugin registration to address (1). (2) & (3) will requires more time to study.

Possible Alternatives

(Optional) Describe any alternative solutions or features you've considered.
Compile some of the kedro CLI, so it prints out help message without actually loading any python code.

e.g.
kedro -> Load some pre-compiled text files and print, it may be tricky with plugins though.

The text was updated successfully, but these errors were encountered:

merelcht · 2022-05-04T12:35:05Z

Some older issues about this topic: https://github.com/quantumblacklabs/private-kedro/issues/930 & https://github.com/quantumblacklabs/private-kedro/pull/537

znfgnu · 2022-12-08T09:50:30Z

I created PR that should solve this problem in kedro-org/kedro-viz#1196 - any help with making tests passed would be appreciated

astrojuanlu · 2023-09-13T22:47:49Z

(From #3033)

Still an issue. For example, kedro info on a spaceflights project takes a bit more than 6 seconds on my machine (M1 MacBook Pro from 2022):

$ time kedro info
 _            _
| | _____  __| |_ __ ___
| |/ / _ \/ _` | '__/ _ \
|   <  __/ (_| | | | (_) |
|_|\_\___|\__,_|_|  \___/
v0.18.13

Kedro is a Python framework for
creating reproducible, maintainable
and modular data science code.

Installed plugins:
kedro_telemetry: 0.2.5 (entry points:cli_hooks,hooks)
kedro_viz: 6.5.0 (entry points:global,hooks,line_magic)

________________________________________________________
Executed in    6.27 secs    fish           external
   usr time    1.15 secs    0.26 millis    1.15 secs
   sys time    0.67 secs    2.65 millis    0.67 secs

After warmup (i.e. executing the command again immediately after) this can go down to ~1 second.

Ahead of this I did some profiling using https://github.com/joerick/pyinstrument and https://www.speedscope.app/, and this is what I found:

For kedro info with instrumentation on and no warmup, 7.5 seconds were spent importing the various kedro.framework.cli.* subpackages, and 1.7 seconds actually executing the function.

astrojuanlu · 2024-03-05T10:50:51Z

Some users are puzzled that kedro gets stuck and sometimes it's difficult to tell whether it's just slow https://linen-slack.kedro.org/t/16630828/i-m-encountering-issues-when-running-the-load-ext-kedro-ipyt#3758284a-34fb-435c-b054-dd3713cd78c9

Today with @rashidakanchwala kedro viz run took like 30 seconds on my M1 Mac.

astrojuanlu · 2024-03-06T13:48:46Z

I don't know what happened lately but Kedro has got unacceptably slow. I'm raising the priority of this.

astrojuanlu · 2024-05-13T08:40:53Z

❯ time kedro registry list
- __default__
- data_processing
- data_science
- reporting


________________________________________________________
Executed in   24.37 secs    fish           external
   usr time    5.08 secs  245.00 micros    5.08 secs
   sys time    4.95 secs  593.00 micros    4.95 secs

❯ time kedro registry list
- __default__
- data_processing
- data_science
- reporting


________________________________________________________
Executed in    3.72 secs    fish           external
   usr time    2.82 secs    0.24 millis    2.82 secs
   sys time    4.56 secs    1.33 millis    4.56 secs

On my computer, the first time it takes 24 seconds to run kedro registry list, second time it's 3 seconds.

As a first step, we may need a better understanding of how slow Kedro CLI startup time is, and why. Some questions I have:

Are there any commands that are slower? (Looks like kedro info is faster than kedro registry list)
What's the effect of having plugins? Specifically, what's the effect of having/not having kedro-viz, kedro-datasets?
All other things equal, are there significant performance differences in different operating systems?
- The "all other things equal" might be difficult to test due to differences in hardware
- Still, it's important to understand because different systems will do different smart things for caching (see the 24 vs 3 second difference)

Then as a next step, we'd need an assessment on what can we do about this. Ideally it shouldn't take this long to run seemingly simple commands, even with a cold cache. Can we aim for a combination of slashing the time in ~half + showing something during those initial seconds so the user knows that it didn't get stuck?

ankatiyar · 2024-05-28T15:50:37Z

Profile of the command kedro registry list using pyinstrument and visualisation using speedscope.app

Note: this is when kedro-viz and kedro-telemetry are also installed in the environment, and we have consented to sending data to kedro-telemetry

JSON: https://github.com/ankatiyar/kedro-profiles/blob/main/registry.json
Command: pyinstrument -r speedscope -o registry.json -m kedro registry list

From profiling various commands, there's three main things (that take up majority of the time) that happen before the code for the specific command is executed -

initialisation of the kedro/framework/cli module - kedro/framework/cli/__init__.py ~500ms
Some part of this takes time importing catalog/micropkg etc sub commands + the libraries used in those commands
initialisation of KedroCLI - kedro/framework/cli.py: KedroCLI.__init__ ~1.7s
This part loads up all the plugin commands including kedro-viz which is quite heavy and imports a lot of libraries in itself
Execution of the before_command_run hook, which sends data to kedro-telemetry~0.9-1s

That is almost 3s** before the actual code of the command is actually executed.
** on my M1 macbook

Proposal

On perusing the click documentation I came across this example of lazy loading of sub-commands: https://click.palletsprojects.com/en/8.1.x/complex/#lazily-loading-subcommands

1. Lazy loading of subcommands on `kedro-viz`

Quick prototype: kedro-org/kedro-viz#1920
Refactor the kedro-viz CLI with delayed imports and lazy loaded commands
This shaves off time when KedroCLI is initialised -

kedro/kedro/framework/cli/cli.py

Lines 105 to 107 in 011e3a8

    
           super().__init__( 
        
               ("Global commands", self.global_groups), 
        
               ("Project specific commands", self.project_groups),

Profile:

2. Lazy loading of subcommands of Kedro

Quick prototype: #3883

This does not seem to save too much time, on my system, sometimes it does and sometimes it doesn't. 🤔
For eg, kedro info is sometimes faster

Profile: LazyGroup is used for both Kedro CLI and Kedro-viz CLI

3. Use `after_command_run` on `kedro-telemetry`

We spend ~1s sending telemetry info before we actually execute the command. This can be moved to the after_command_run hook which will not affect the actual total time of the command that is run but the results of the command actually show up sooner. Additionally, after_command_run hook also receives the exit code of the command which might be useful telemetry info?

Profile of `kedro info` and `kedro registry list` with implemented suggestions

`kedro info`	current	lazy viz	lazy viz + kedro cli	telemetry after_command_run
kedro/framework/cli/init	3.37s	1.97s	2.15s	2.05s
KedroCLI init	579ms	649ms	511ms	473ms
telemetry hook	981ms	867ms	1.02s	1s
time before command is executed (approx)	3.2s	1.80s	2s	1s

`kedro registry list`	current	lazy viz	lazy viz + kedro cli	telemetry after_command_run
kedro/framework/cli/init	5s	3.6s	4s	4s
KedroCLI init	705ms	649ms	417ms	486ms
telemetry hook	959ms	877ms	1.01s	1.02s
time before command is executed (approx)	3.25s	1.80s	1.8s	900ms

Please note that this is just from one run of the commands, the time varies a little bit between each run of the command ^

Conclusion

Of the above suggestions, I think we would benefit greatly from making the Kedro-viz CLI either lazy or lighter. Moving to after_command_run hook on kedro-telemetry would also make the apparent time taken by the commands much faster.
On Kedro side, it's not super beneficial to make the subcommands lazy but we could do it to make things like kedro info slightly faster

ALSO:
The JSON files if you want to open them in speedscope.app yourself - https://github.com/ankatiyar/kedro-profiles
Miro board with the above screenshots + also same thing for kedro info - https://miro.com/app/board/uXjVKFQAxA4=/?share_link_id=239496676079

astrojuanlu · 2024-05-29T07:29:22Z

Thanks a lot @ankatiyar, this is an awesome summary.

There is one thing I think might be preventing us from getting useful results though. Consider this comparison between how much time it took to run kedro registry list right after turning on my laptop on an existing environment (therefore in theory all .pyc files were there) vs how much time it took to run right after ("hot cache").

First thing in the morning:
registry.json

Right after:
registry_hot_cache.json

These is my environment. Notice that I have a different one (with kedro-mlflow and other stuff).

$ uv pip freeze
aiofiles==23.2.1
alembic==1.13.1
aniso8601==9.0.1
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
anyio==3.7.1
appdirs==1.4.4
appnope==0.1.4
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
async-lru==2.0.4
attrs==23.2.0
babel==2.15.0
beautifulsoup4==4.12.3
binaryornot==0.4.4
bleach==6.1.0
blinker==1.8.2
build==1.2.1
cachetools==5.3.3
certifi==2024.2.2
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
click-default-group==1.2.4
cloudpickle==3.0.0
comm==0.2.2
contourpy==1.2.1
cookiecutter==2.6.0
coverage==7.5.1
cycler==0.12.1
debugpy==1.8.1
decorator==5.1.1
defusedxml==0.7.1
deprecated==1.2.14
dnspython==2.6.1
docker==7.0.0
dynaconf==3.2.5
email-validator==2.1.1
entrypoints==0.4
et-xmlfile==1.1.0
executing==2.0.1
fastapi==0.111.0
fastapi-cli==0.0.4
fastjsonschema==2.19.1
flask==3.0.3
fonttools==4.51.0
fqdn==1.5.1
fsspec==2024.5.0
gitdb==4.0.11
gitpython==3.1.43
graphene==3.3
graphql-core==3.2.3
graphql-relay==3.2.0
gunicorn==22.0.0
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
idna==3.7
importlib-metadata==7.1.0
importlib-resources==6.4.0
iniconfig==2.0.0
ipykernel==6.29.4
ipython==8.24.0
isoduration==20.11.0
itsdangerous==2.2.0
jedi==0.19.1
jinja2==3.1.4
joblib==1.4.2
json5==0.9.25
jsonpointer==2.4
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
jupyter-client==8.6.1
jupyter-core==5.7.2
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter-server==2.14.0
jupyter-server-terminals==0.5.3
jupyterlab==4.2.0
jupyterlab-pygments==0.3.0
jupyterlab-server==2.27.1
kedro==0.19.5
kedro-datasets==3.0.0
kedro-mlflow==0.12.2
kedro-telemetry==0.4.0
kedro-viz==9.0.0
kiwisolver==1.4.5
lazy-loader==0.4
mako==1.3.5
markdown==3.6
markdown-it-py==3.0.0
markupsafe==2.1.5
matplotlib==3.9.0
matplotlib-inline==0.1.7
mdurl==0.1.2
mistune==3.0.2
mlflow==2.13.0
more-itertools==10.2.0
nbclient==0.10.0
nbconvert==7.16.4
nbformat==5.10.4
nest-asyncio==1.6.0
networkx==3.3
notebook==7.2.0
notebook-shim==0.2.4
numpy==1.26.4
omegaconf==2.3.0
openpyxl==3.1.2
opentelemetry-api==1.16.0
opentelemetry-sdk==1.16.0
opentelemetry-semantic-conventions==0.37b0
orjson==3.10.3
overrides==7.7.0
packaging==23.2
pandas==2.2.2
pandocfilters==1.5.1
parse==1.20.1
parso==0.8.4
pexpect==4.9.0
pillow==10.3.0
platformdirs==4.2.2
plotly==5.22.0
pluggy==1.3.0
pre-commit-hooks==4.6.0
prometheus-client==0.20.0
prompt-toolkit==3.0.43
protobuf==4.25.3
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==15.0.2
pycparser==2.22
pydantic==2.7.1
pydantic-core==2.18.2
pygments==2.18.0
pyinstrument==4.6.2
pyparsing==3.1.2
pyproject-hooks==1.1.0
pytest==7.4.4
pytest-cov==3.0.0
pytest-mock==1.13.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-json-logger==2.0.7
python-multipart==0.0.9
python-slugify==8.0.4
pytoolconfig==1.3.1
pytz==2024.1
pyyaml==6.0.1
pyzmq==26.0.3
querystring-parser==1.2.4
referencing==0.35.1
requests==2.32.1
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.7.1
rope==1.13.0
rpds-py==0.18.1
ruamel-yaml==0.18.6
ruamel-yaml-clib==0.2.8
ruff==0.1.15
scikit-learn==1.4.2
scipy==1.13.0
seaborn==0.12.2
secure==0.3.0
send2trash==1.8.3
setuptools==70.0.0
shellingham==1.5.4
six==1.16.0
smmap==5.0.1
sniffio==1.3.1
soupsieve==2.5
speedscope==1.0
sqlalchemy==2.0.30
sqlparse==0.5.0
stack-data==0.6.3
starlette==0.37.2
strawberry-graphql==0.229.1
tenacity==8.3.0
terminado==0.18.1
text-unidecode==1.3
threadpoolctl==3.5.0
tinycss2==1.3.0
toml==0.10.2
toposort==1.10
tornado==6.4
traitlets==5.14.3
typer==0.12.3
types-python-dateutil==2.9.0.20240316
typing-extensions==4.11.0
tzdata==2024.1
ujson==5.10.0
uri-template==1.3.0
urllib3==2.2.1
uvicorn==0.29.0
uvloop==0.19.0
watchfiles==0.21.0
watchgod==0.8.2
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
websocket-client==1.8.0
websockets==12.0
werkzeug==3.0.3
wrapt==1.16.0
zipp==3.18.2

I don't know if it's something specific to my computer (2021 MacBook Pro, Apple M1 Max, 32 GB of RAM), but I anticipate that this will probably make effective performance measurements quite hard.

Some pointers, in case it helps:

astrojuanlu · 2024-05-29T07:30:39Z

Having said that, if at least the times are ~proportional in the cold & hot cache situations, then this doesn't affect anything and we can proceed with confidence.

datajoely · 2024-05-29T08:06:50Z

Amazing work @ankatiyar - Can you also run this when you have a lot of nodes, datasets and parameters for the catalog to resolve? My hunch is that is the real bottleneck felt with users at scale.

astrojuanlu · 2024-05-29T08:32:12Z

@datajoely Conversation about performance is spread around different issues that tackle different aspects:

Startup (this one)
Complex config Investigate performance of config loading for big projects #3893
Complex pipelines Create example projects to assess Kedro performance for complex pipelines #3866

Startup affects even simple commands, like kedro info and kedro registry list.

datajoely · 2024-05-29T08:43:35Z

Excellent - I still feel these can be thought of as sub-issues of the start up parent. Runtime performance once the session and catalog is built is fine.

ankatiyar · 2024-05-29T08:50:42Z

Looking at @astrojuanlu's profile of the command, it also seems like loading of project_groups commands like the ones in kedro-mlflow also take up a bunch of time so just implementing lazy loading on kedro-viz might only solve part of the problem. kedro-mlflow ~13s and kedro-viz ~5s

I'm trying out if all commands can somehow be lazily loaded for all plugins on Kedro side. [TO BE UPDATED]

As for hot-caching, I think the times seem proportional, even on my machine, I ran the commands a few time so the profiles shown in the above comment are after the warmup. I don't know if we can do anything about the performance differences between when your OS is warmed up or not but the lazy loading and the telemetry stuff is very small effort changes for a relatively faster CLI experience on our end.

DimedS · 2024-05-29T14:21:09Z

Great investigation, @ankatiyar! I also share the concern about cold starts versus subsequent runs. Like @astrojuanlu, my first command takes more than 10-20 seconds, while subsequent runs are much faster. I like the idea of moving kedro-telemetry to the after_command_run hook to speed up the original code execution. But will we lose information about failed runs if we make this change?

noklam · 2024-05-29T14:25:59Z

Looking at @astrojuanlu's profile of the command, it also seems like loading of project_groups commands like the ones in kedro-mlflow

@ankatiyar Is there anything we can do from framework side to control this instead of implementing this in the plugin library? Is the Lazy loading having no effect (does libraries still get imported?)

telemetry is a quick win, echo @DimedS we should make sure it still send telemetry when pipeline fails.

ankatiyar · 2024-05-29T14:40:42Z

@noklam @DimedS
I believe the after_command_run is still executed if the run fails and it also receives the exit code of the command run. I'll confirm though

astrojuanlu · 2024-05-29T15:21:43Z

I don't know if the architecture of pluggy or kedro allows this, but if we don't expect plugins to be able to add pipelines to the registry, maybe we should try to delay the plugin discovery process as much as possible. If one runs kedro registry list or kedro info in principle that should already be enough information to not load the CLI extensions, right?

ankatiyar · 2024-05-30T13:05:27Z

Update re: Lazy Loading of Plugins

TLDR: It's possible to lazy load plugins and their commands on Kedro side

Prototype PR: #3901

Profile of regular `kedro registry list`

Updated proposal

Note

this time I have kedro-mlflow + kedro-airflow + kedro-telemetry+kedro-viz installed in the system for the below profiles. Also have consented to collection of kedro-telemetry data.

Lazy loading of plugins and their commands

In #3901 -

We don't load the entry points of all the plugins eagerly, just get the entry points and pass them to CommandCollection. We already use a custom version of click's CommandCollection -

kedro/kedro/framework/cli/utils.py

Line 111 in 27f5405

class CommandCollection(click.CommandCollection):
If a command is not in existing list of commands CommandCollection.list_commands(), add plugins till it is.
Execute the command

Profile of `kedro registry list` with lazy loading of plugins

Send telemetry info in `after_command_run` [Out of scope]

See: kedro-org/kedro-plugins#709
As mentioned in #1476 (comment), this would not reduce the TOTAL time taken by a command but will reduce the apparent time taken by the command.

Profile of `kedro registry list` with lazy loading of plugins + telemetry `after_command_run`

Profile data

`kedro registry list`	regular	with lazy plugin loading	lazy + telemetry after
initialisation of CLI module	858ms	576ms	553
KedroCLI init	4.11s	270ms	287ms
sending telemetry	1s	0.8s	1s
time before the main code runs	~6s	~1.7s	~0.8s
total	6.96s	3.76s	3.8s

`kedro info`	regular	with lazy plugin loading	lazy + telemetry after
initialisation of CLI module	860ms	737ms	846ms
KedroCLI init	3.97s	292ms	304ms
sending telemetry	1.05s	880ms	858ms
time before the main code runs	~5.7s	~1.9s	~0.8s
total	5.99s	2.03s	2.12s

Things to note

It might take same or more time to run plugin commands - there might be a better way to load only the plugin for which the command we're running than just iterating through the entrypoints till we load the plugin which has the command
There'll also be some delay when the command is wrong or doesn't exist. For eg kedro infroo will take a while before the error message is displayed because we'll spend some time loading plugins. I don't think this is too bad since we eagerly load these commands already right now.
We could still consider combining this with the LazyGroup strategy in try delaying import + lazy subcommands kedro-viz#1920 and Lazy load click subcommands #3883 or just look into improving the CLI setup on kedro-viz in general. Kedro viz could be sped up a little bit by delaying imports at least 🤔

datajoely · 2024-05-30T13:17:22Z

I'm amazed kedro-telemetry is so chunky, we should absolutely defer it till after the important stuff, but I wonder if async is sensible here too.

astrojuanlu · 2024-05-30T13:37:54Z

but I wonder if async is sensible here too.

Yeah in this case it would make sense to make it async, but... pytest-dev/pluggy#320

ankatiyar · 2024-05-30T13:38:17Z

I'm amazed kedro-telemetry is so chunky, we should absolutely defer it till after the important stuff, but I wonder if async is sensible here too.

I was just testing the switch to after_command_run hook on kedro-telemetry, seems like the hook doesn't get executed when there is an error in the pipeline. (cc @noklam and @DimedS also because they asked before)
I think async would be the way to go

I'll create a separate issue for kedro-telemetry stuff and we can focus on the plugin loading stuff here.

noklam · 2024-05-30T13:57:26Z

after_command_run
https://github.com/kedro-org/kedro-plugins/blob/064c5d9bc491a3f96f4341a2c2d1f034b7b354cb/kedro-telemetry/kedro_telemetry/plugin.py#L185

We still have the project hook that send telemetry immediately after. We should combine the two hooks to avoid checking consent twice as well.

astrojuanlu · 2024-06-03T12:06:49Z

For now, shall we then proceed with

? How does kedro-org/kedro-viz#1920 interact or conflict with #3901 ?

ankatiyar · 2024-06-03T13:01:10Z

For now, shall we then proceed with

Lazy load commands from plugins #3901

and Lazy load click subcommands #3883

? How does kedro-org/kedro-viz#1920 interact or conflict with #3901 ?

These solutions can be used together of course,

First step: Lazy load commands from plugins #3901 Loads plugins lazily when KedroCLI is initialised
Second step: Lazy load click subcommands #3883 & try delaying import + lazy subcommands kedro-viz#1920 load the "subcommands" lazily (catalog commands, micropkg commands, etc and run and deploy for viz)

I think it's definitely worth refactoring CLI code on kedro-viz side as there is a lot of heavy imports in their cli.py

There's another thing to note -
Right now, if you do kedro --help, you also see all plugin commands:

After lazy plugin loading (i.e after #3901), we lose the plugin commands from the top-level help text:

But you can always do kedro airflow --help or kedro mlflow --help

astrojuanlu · 2024-06-03T13:18:29Z

Good point about losing visibility of the full list in kedro --help. Wondering if it's any way to retain both.

znfgnu · 2024-06-06T21:50:12Z

Are there any contraindications to:

start kedro process in the background,
make cli communicate with it,
handle cli commands by forking the process?

This way, once loaded, the process with libraries would be ready to fork, unless stopped using cli command.

noklam · 2024-06-06T22:53:36Z

@znfgnu IPython essentially?

datajoely · 2024-06-07T09:45:35Z

I think - and correct me if I'm wrong - the suggestion is about creating a long living kedro process and then a client process expose the CLI and can interact with it without the start-up penalty

noklam · 2024-06-07T10:11:18Z

Would the user need to do anything differently? If we can make this almost invisible to the users then it's great.

Is there any examples of Python base CLI doing this?

astrojuanlu · 2024-06-07T10:30:10Z

The current notion of Kedro is that it's a library and a framework. I don't think we should underestimate the consequences of making it a service, in terms of software architecture, reusability, perception etc. Feel free to open a new discussion https://github.com/kedro-org/kedro/discussions/

ankatiyar · 2024-06-19T14:29:12Z

Discussed in Tech Design on 19th June:

Two approaches discussed:

Make subcommands of Kedro and Kedro-viz lazy, described in this command and prototypes in try delaying import + lazy subcommands kedro-viz#1920 and Lazy load click subcommands #3883
Make loading of plugins lazy, described in this comment and prototype in Lazy load commands from plugins #3901

The drawbacks of second approach:

Current implementation loses out on the commands from the plugin when you do kedro --help (as shown in this comment)
Losing the ability for plugins to override Kedro "core" commands. https://docs.kedro.org/en/stable/extend_kedro/plugins.html#global-and-project-commands

Decision: To go with approach 1 for now - lazy subcommands on Kedro and Kedro Viz with delayed imports, and make sure that the help texts still show up.

astrojuanlu · 2024-06-20T09:55:12Z

Dropping this here as a interesting historical reference.

Consider the example of a Python command line program (CLI) with a number of subcommands. Each subcommand may perform different tasks, requiring the import of different dependencies. But a given invocation of the program will only execute a single subcommand, or possibly none (i.e. if just --help usage info is requested). Top-level eager imports in such a program will result in the import of many modules that will never be used at all; the time spent (possibly compiling and) executing these modules is pure waste.

To improve startup time, some large Python CLIs make imports lazy by manually placing imports inline into functions to delay imports of expensive subsystems. This manual approach is labor-intensive and fragile; one misplaced import or refactor can easily undo painstaking optimization work.

https://peps.python.org/pep-0690/#motivation (rejected)

This is just a hard problem in Python in general.

noklam added Issue: Feature Request New feature or improvement to existing feature Component: CLI Issue/PR that addresses the CLI for Kedro labels Apr 25, 2022

noklam mentioned this issue May 17, 2022

Handle plugin registration failure ContextualVersionConflict with log instead of raising error #1542

Merged

5 tasks

znfgnu mentioned this issue Dec 7, 2022

Kedro CLI startup time made shorter kedro-org/kedro-viz#1196

Merged

5 tasks

merelcht modified the milestones: Remove and improve CLI commands, Remove development related CLI commands, Improve CLI commands Feb 6, 2023

noklam mentioned this issue Mar 14, 2023

Update docs to remove mentions of pkg_resources #2391

Closed

noklam mentioned this issue Jul 30, 2023

perf(datasets): lazily load datasets in init files kedro-org/kedro-plugins#277

Merged

4 tasks

noklam mentioned this issue Sep 13, 2023

Kedro initialization is slow #3033

Closed

astrojuanlu mentioned this issue Sep 25, 2023

Simplify api hierarchy #712

Closed

astrojuanlu mentioned this issue Oct 5, 2023

Performance profiling on kedro new with starter and without starter kedro-org/kedro-plugins#120

Closed

astrojuanlu mentioned this issue Oct 25, 2023

Conduct external market research kedro-org/kedro-devrel#94

Closed

astrojuanlu modified the milestones: Improve Kedro CLI, Improve Developer Experience Nov 28, 2023

astrojuanlu mentioned this issue May 8, 2024

Create massive pipeline to test with flowchart on Kedro-viz kedro-org/kedro-viz#1064

Open

1 task

astrojuanlu mentioned this issue May 13, 2024

Create example projects to assess Kedro performance for complex pipelines #3866

Open

merelcht changed the title ~~Improve Kedro CLI startup time~~ [spike] Improve Kedro CLI startup time May 13, 2024

merelcht assigned ravi-kumar-pilla and ankatiyar May 13, 2024

noklam mentioned this issue May 24, 2024

Disable rich markup if rich handler is not present #3682

Merged

7 tasks

ankatiyar mentioned this issue May 30, 2024

kedro-telemetry: Improve performance kedro-org/kedro-plugins#709

Open

ankatiyar mentioned this issue Jun 10, 2024

Lazy load commands from plugins #3901

Closed

7 tasks

This was referenced Jun 20, 2024

Implement lazy loading of subcommands for CLI kedro-org/kedro-viz#1952

Open

Lazy load click subcommands #3883

Merged

ankatiyar closed this as completed in #3883 Jul 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[spike] Improve Kedro CLI startup time #1476

[spike] Improve Kedro CLI startup time #1476

noklam commented Apr 25, 2022 •

edited

Loading

merelcht commented May 4, 2022

znfgnu commented Dec 8, 2022

astrojuanlu commented Sep 13, 2023

astrojuanlu commented Mar 5, 2024

astrojuanlu commented Mar 6, 2024

astrojuanlu commented May 13, 2024

ankatiyar commented May 28, 2024 •

edited

Loading

astrojuanlu commented May 29, 2024

astrojuanlu commented May 29, 2024

datajoely commented May 29, 2024 •

edited

Loading

astrojuanlu commented May 29, 2024

datajoely commented May 29, 2024

ankatiyar commented May 29, 2024

DimedS commented May 29, 2024

noklam commented May 29, 2024 •

edited

Loading

ankatiyar commented May 29, 2024

astrojuanlu commented May 29, 2024

ankatiyar commented May 30, 2024 •

edited

Loading

datajoely commented May 30, 2024

astrojuanlu commented May 30, 2024

ankatiyar commented May 30, 2024 •

edited

Loading

noklam commented May 30, 2024

astrojuanlu commented Jun 3, 2024

ankatiyar commented Jun 3, 2024

astrojuanlu commented Jun 3, 2024

znfgnu commented Jun 6, 2024

noklam commented Jun 6, 2024

datajoely commented Jun 7, 2024

noklam commented Jun 7, 2024

astrojuanlu commented Jun 7, 2024

ankatiyar commented Jun 19, 2024

astrojuanlu commented Jun 20, 2024

[spike] Improve Kedro CLI startup time #1476

[spike] Improve Kedro CLI startup time #1476

Comments

noklam commented Apr 25, 2022 • edited Loading

Description

Useful tool to profile import time

Kedro startup time with plugin installed

Kedro startup time with no plugin installed (telemetry + viz)

Context

Possible Implementation

Possible Alternatives

merelcht commented May 4, 2022

znfgnu commented Dec 8, 2022

astrojuanlu commented Sep 13, 2023

astrojuanlu commented Mar 5, 2024

astrojuanlu commented Mar 6, 2024

astrojuanlu commented May 13, 2024

ankatiyar commented May 28, 2024 • edited Loading

Proposal

1. Lazy loading of subcommands on kedro-viz

2. Lazy loading of subcommands of Kedro

3. Use after_command_run on kedro-telemetry

Profile of kedro info and kedro registry list with implemented suggestions

Conclusion

astrojuanlu commented May 29, 2024

astrojuanlu commented May 29, 2024

datajoely commented May 29, 2024 • edited Loading

astrojuanlu commented May 29, 2024

datajoely commented May 29, 2024

ankatiyar commented May 29, 2024

DimedS commented May 29, 2024

noklam commented May 29, 2024 • edited Loading

ankatiyar commented May 29, 2024

astrojuanlu commented May 29, 2024

ankatiyar commented May 30, 2024 • edited Loading

Update re: Lazy Loading of Plugins

Profile of regular kedro registry list

Updated proposal

Lazy loading of plugins and their commands

Profile of kedro registry list with lazy loading of plugins

Send telemetry info in after_command_run [Out of scope]

Profile of kedro registry list with lazy loading of plugins + telemetry after_command_run

Profile data

Things to note

datajoely commented May 30, 2024

astrojuanlu commented May 30, 2024

ankatiyar commented May 30, 2024 • edited Loading

noklam commented May 30, 2024

astrojuanlu commented Jun 3, 2024

ankatiyar commented Jun 3, 2024

astrojuanlu commented Jun 3, 2024

znfgnu commented Jun 6, 2024

noklam commented Jun 6, 2024

datajoely commented Jun 7, 2024

noklam commented Jun 7, 2024

astrojuanlu commented Jun 7, 2024

ankatiyar commented Jun 19, 2024

astrojuanlu commented Jun 20, 2024

noklam commented Apr 25, 2022 •

edited

Loading

ankatiyar commented May 28, 2024 •

edited

Loading

1. Lazy loading of subcommands on `kedro-viz`

3. Use `after_command_run` on `kedro-telemetry`

Profile of `kedro info` and `kedro registry list` with implemented suggestions

datajoely commented May 29, 2024 •

edited

Loading

noklam commented May 29, 2024 •

edited

Loading

ankatiyar commented May 30, 2024 •

edited

Loading

Profile of regular `kedro registry list`

Profile of `kedro registry list` with lazy loading of plugins

Send telemetry info in `after_command_run` [Out of scope]

Profile of `kedro registry list` with lazy loading of plugins + telemetry `after_command_run`

ankatiyar commented May 30, 2024 •

edited

Loading