Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[spike] Improve Kedro CLI startup time #1476

Closed
noklam opened this issue Apr 25, 2022 · 32 comments · Fixed by #3883
Closed

[spike] Improve Kedro CLI startup time #1476

noklam opened this issue Apr 25, 2022 · 32 comments · Fixed by #3883
Assignees
Labels
Component: CLI Issue/PR that addresses the CLI for Kedro Issue: Feature Request New feature or improvement to existing feature

Comments

@noklam
Copy link
Contributor

noklam commented Apr 25, 2022

Description

Currently, Kedro CLI startup time can be slow even if very little work is done, for example, doing kedro to show the available command or help can take a few seconds. These commands should feel instant since it is just printing out some help text without any heavy work.

Useful tool to profile import time

  1. The kedro command itself takes 0.7s, which isn't fast but still an acceptable response time. Initial investigation suggests that it can take few seconds mainly due to a slow import during plugin registration.

Kedro startup time with plugin installed

(0.7s, removed screenshot)

Kedro startup time with no plugin installed (telemetry + viz)

image

Causes:

  1. Slow import from plugin
  2. Slow import from kedro itself
  3. Slow import from pkg_resource (something that is run with console_script)

Context

Why is this change important to you? How would you use it? How can it benefit other users?

The Kedro CLI can be slow sometimes and lead to a bad user experience. It should be the most common way that users interact with kedro.

Possible Implementation

(Optional) Suggest an idea for implementing the addition or change.
The first improvement that can be done is to do lazy loading in plugin registration to address (1). (2) & (3) will requires more time to study.

Possible Alternatives

(Optional) Describe any alternative solutions or features you've considered.
Compile some of the kedro CLI, so it prints out help message without actually loading any python code.

e.g.
kedro -> Load some pre-compiled text files and print, it may be tricky with plugins though.

@noklam noklam added Issue: Feature Request New feature or improvement to existing feature Component: CLI Issue/PR that addresses the CLI for Kedro labels Apr 25, 2022
@merelcht
Copy link
Member

merelcht commented May 4, 2022

Some older issues about this topic: https://github.com/quantumblacklabs/private-kedro/issues/930 & https://github.com/quantumblacklabs/private-kedro/pull/537

@znfgnu
Copy link

znfgnu commented Dec 8, 2022

I created PR that should solve this problem in kedro-org/kedro-viz#1196 - any help with making tests passed would be appreciated

@astrojuanlu
Copy link
Member

(From #3033)

Still an issue. For example, kedro info on a spaceflights project takes a bit more than 6 seconds on my machine (M1 MacBook Pro from 2022):

$ time kedro info
 _            _
| | _____  __| |_ __ ___
| |/ / _ \/ _` | '__/ _ \
|   <  __/ (_| | | | (_) |
|_|\_\___|\__,_|_|  \___/
v0.18.13

Kedro is a Python framework for
creating reproducible, maintainable
and modular data science code.

Installed plugins:
kedro_telemetry: 0.2.5 (entry points:cli_hooks,hooks)
kedro_viz: 6.5.0 (entry points:global,hooks,line_magic)

________________________________________________________
Executed in    6.27 secs    fish           external
   usr time    1.15 secs    0.26 millis    1.15 secs
   sys time    0.67 secs    2.65 millis    0.67 secs

After warmup (i.e. executing the command again immediately after) this can go down to ~1 second.

Ahead of this I did some profiling using https://github.com/joerick/pyinstrument and https://www.speedscope.app/, and this is what I found:

image

For kedro info with instrumentation on and no warmup, 7.5 seconds were spent importing the various kedro.framework.cli.* subpackages, and 1.7 seconds actually executing the function.

@astrojuanlu
Copy link
Member

Some users are puzzled that kedro gets stuck and sometimes it's difficult to tell whether it's just slow https://linen-slack.kedro.org/t/16630828/i-m-encountering-issues-when-running-the-load-ext-kedro-ipyt#3758284a-34fb-435c-b054-dd3713cd78c9

Today with @rashidakanchwala kedro viz run took like 30 seconds on my M1 Mac.

@astrojuanlu
Copy link
Member

I don't know what happened lately but Kedro has got unacceptably slow. I'm raising the priority of this.

@astrojuanlu
Copy link
Member

❯ time kedro registry list
- __default__
- data_processing
- data_science
- reporting


________________________________________________________
Executed in   24.37 secs    fish           external
   usr time    5.08 secs  245.00 micros    5.08 secs
   sys time    4.95 secs  593.00 micros    4.95 secs

❯ time kedro registry list
- __default__
- data_processing
- data_science
- reporting


________________________________________________________
Executed in    3.72 secs    fish           external
   usr time    2.82 secs    0.24 millis    2.82 secs
   sys time    4.56 secs    1.33 millis    4.56 secs

On my computer, the first time it takes 24 seconds to run kedro registry list, second time it's 3 seconds.

As a first step, we may need a better understanding of how slow Kedro CLI startup time is, and why. Some questions I have:

  • Are there any commands that are slower? (Looks like kedro info is faster than kedro registry list)
  • What's the effect of having plugins? Specifically, what's the effect of having/not having kedro-viz, kedro-datasets?
  • All other things equal, are there significant performance differences in different operating systems?
    • The "all other things equal" might be difficult to test due to differences in hardware
    • Still, it's important to understand because different systems will do different smart things for caching (see the 24 vs 3 second difference)

Then as a next step, we'd need an assessment on what can we do about this. Ideally it shouldn't take this long to run seemingly simple commands, even with a cold cache. Can we aim for a combination of slashing the time in ~half + showing something during those initial seconds so the user knows that it didn't get stuck?

@merelcht merelcht changed the title Improve Kedro CLI startup time [spike] Improve Kedro CLI startup time May 13, 2024
@ankatiyar
Copy link
Contributor

ankatiyar commented May 28, 2024

Profile of the command kedro registry list using pyinstrument and visualisation using speedscope.app

Note: this is when kedro-viz and kedro-telemetry are also installed in the environment, and we have consented to sending data to kedro-telemetry

image

JSON: https://github.com/ankatiyar/kedro-profiles/blob/main/registry.json
Command: pyinstrument -r speedscope -o registry.json -m kedro registry list

From profiling various commands, there's three main things (that take up majority of the time) that happen before the code for the specific command is executed -

  • initialisation of the kedro/framework/cli module - kedro/framework/cli/__init__.py ~500ms
    Some part of this takes time importing catalog/micropkg etc sub commands + the libraries used in those commands
  • initialisation of KedroCLI - kedro/framework/cli.py: KedroCLI.__init__ ~1.7s
    This part loads up all the plugin commands including kedro-viz which is quite heavy and imports a lot of libraries in itself
  • Execution of the before_command_run hook, which sends data to kedro-telemetry~0.9-1s

That is almost 3s** before the actual code of the command is actually executed.
** on my M1 macbook

Proposal

On perusing the click documentation I came across this example of lazy loading of sub-commands: https://click.palletsprojects.com/en/8.1.x/complex/#lazily-loading-subcommands

1. Lazy loading of subcommands on kedro-viz

Quick prototype: kedro-org/kedro-viz#1920
Refactor the kedro-viz CLI with delayed imports and lazy loaded commands
This shaves off time when KedroCLI is initialised -

super().__init__(
("Global commands", self.global_groups),
("Project specific commands", self.project_groups),

Profile:
image

2. Lazy loading of subcommands of Kedro

Quick prototype: #3883

This does not seem to save too much time, on my system, sometimes it does and sometimes it doesn't. 🤔
For eg, kedro info is sometimes faster

Profile: LazyGroup is used for both Kedro CLI and Kedro-viz CLI
image

3. Use after_command_run on kedro-telemetry

We spend ~1s sending telemetry info before we actually execute the command. This can be moved to the after_command_run hook which will not affect the actual total time of the command that is run but the results of the command actually show up sooner. Additionally, after_command_run hook also receives the exit code of the command which might be useful telemetry info?

image

Profile of kedro info and kedro registry list with implemented suggestions

kedro info current lazy viz lazy viz + kedro cli telemetry after_command_run
kedro/framework/cli/init 3.37s 1.97s 2.15s 2.05s
KedroCLI init 579ms 649ms 511ms 473ms
telemetry hook 981ms 867ms 1.02s 1s
time before command is executed (approx) 3.2s 1.80s 2s 1s
kedro registry list current lazy viz lazy viz + kedro cli telemetry after_command_run
kedro/framework/cli/init 5s 3.6s 4s 4s
KedroCLI init 705ms 649ms 417ms 486ms
telemetry hook 959ms 877ms 1.01s 1.02s
time before command is executed (approx) 3.25s 1.80s 1.8s 900ms

Please note that this is just from one run of the commands, the time varies a little bit between each run of the command ^

Conclusion

Of the above suggestions, I think we would benefit greatly from making the Kedro-viz CLI either lazy or lighter. Moving to after_command_run hook on kedro-telemetry would also make the apparent time taken by the commands much faster.
On Kedro side, it's not super beneficial to make the subcommands lazy but we could do it to make things like kedro info slightly faster

ALSO:
The JSON files if you want to open them in speedscope.app yourself - https://github.com/ankatiyar/kedro-profiles
Miro board with the above screenshots + also same thing for kedro info - https://miro.com/app/board/uXjVKFQAxA4=/?share_link_id=239496676079

@astrojuanlu
Copy link
Member

Thanks a lot @ankatiyar, this is an awesome summary.

There is one thing I think might be preventing us from getting useful results though. Consider this comparison between how much time it took to run kedro registry list right after turning on my laptop on an existing environment (therefore in theory all .pyc files were there) vs how much time it took to run right after ("hot cache").

kedro-cache

First thing in the morning:
registry.json

Right after:
registry_hot_cache.json

These is my environment. Notice that I have a different one (with kedro-mlflow and other stuff).

$ uv pip freeze
aiofiles==23.2.1
alembic==1.13.1
aniso8601==9.0.1
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
anyio==3.7.1
appdirs==1.4.4
appnope==0.1.4
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
async-lru==2.0.4
attrs==23.2.0
babel==2.15.0
beautifulsoup4==4.12.3
binaryornot==0.4.4
bleach==6.1.0
blinker==1.8.2
build==1.2.1
cachetools==5.3.3
certifi==2024.2.2
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
click-default-group==1.2.4
cloudpickle==3.0.0
comm==0.2.2
contourpy==1.2.1
cookiecutter==2.6.0
coverage==7.5.1
cycler==0.12.1
debugpy==1.8.1
decorator==5.1.1
defusedxml==0.7.1
deprecated==1.2.14
dnspython==2.6.1
docker==7.0.0
dynaconf==3.2.5
email-validator==2.1.1
entrypoints==0.4
et-xmlfile==1.1.0
executing==2.0.1
fastapi==0.111.0
fastapi-cli==0.0.4
fastjsonschema==2.19.1
flask==3.0.3
fonttools==4.51.0
fqdn==1.5.1
fsspec==2024.5.0
gitdb==4.0.11
gitpython==3.1.43
graphene==3.3
graphql-core==3.2.3
graphql-relay==3.2.0
gunicorn==22.0.0
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
idna==3.7
importlib-metadata==7.1.0
importlib-resources==6.4.0
iniconfig==2.0.0
ipykernel==6.29.4
ipython==8.24.0
isoduration==20.11.0
itsdangerous==2.2.0
jedi==0.19.1
jinja2==3.1.4
joblib==1.4.2
json5==0.9.25
jsonpointer==2.4
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
jupyter-client==8.6.1
jupyter-core==5.7.2
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter-server==2.14.0
jupyter-server-terminals==0.5.3
jupyterlab==4.2.0
jupyterlab-pygments==0.3.0
jupyterlab-server==2.27.1
kedro==0.19.5
kedro-datasets==3.0.0
kedro-mlflow==0.12.2
kedro-telemetry==0.4.0
kedro-viz==9.0.0
kiwisolver==1.4.5
lazy-loader==0.4
mako==1.3.5
markdown==3.6
markdown-it-py==3.0.0
markupsafe==2.1.5
matplotlib==3.9.0
matplotlib-inline==0.1.7
mdurl==0.1.2
mistune==3.0.2
mlflow==2.13.0
more-itertools==10.2.0
nbclient==0.10.0
nbconvert==7.16.4
nbformat==5.10.4
nest-asyncio==1.6.0
networkx==3.3
notebook==7.2.0
notebook-shim==0.2.4
numpy==1.26.4
omegaconf==2.3.0
openpyxl==3.1.2
opentelemetry-api==1.16.0
opentelemetry-sdk==1.16.0
opentelemetry-semantic-conventions==0.37b0
orjson==3.10.3
overrides==7.7.0
packaging==23.2
pandas==2.2.2
pandocfilters==1.5.1
parse==1.20.1
parso==0.8.4
pexpect==4.9.0
pillow==10.3.0
platformdirs==4.2.2
plotly==5.22.0
pluggy==1.3.0
pre-commit-hooks==4.6.0
prometheus-client==0.20.0
prompt-toolkit==3.0.43
protobuf==4.25.3
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==15.0.2
pycparser==2.22
pydantic==2.7.1
pydantic-core==2.18.2
pygments==2.18.0
pyinstrument==4.6.2
pyparsing==3.1.2
pyproject-hooks==1.1.0
pytest==7.4.4
pytest-cov==3.0.0
pytest-mock==1.13.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-json-logger==2.0.7
python-multipart==0.0.9
python-slugify==8.0.4
pytoolconfig==1.3.1
pytz==2024.1
pyyaml==6.0.1
pyzmq==26.0.3
querystring-parser==1.2.4
referencing==0.35.1
requests==2.32.1
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.7.1
rope==1.13.0
rpds-py==0.18.1
ruamel-yaml==0.18.6
ruamel-yaml-clib==0.2.8
ruff==0.1.15
scikit-learn==1.4.2
scipy==1.13.0
seaborn==0.12.2
secure==0.3.0
send2trash==1.8.3
setuptools==70.0.0
shellingham==1.5.4
six==1.16.0
smmap==5.0.1
sniffio==1.3.1
soupsieve==2.5
speedscope==1.0
sqlalchemy==2.0.30
sqlparse==0.5.0
stack-data==0.6.3
starlette==0.37.2
strawberry-graphql==0.229.1
tenacity==8.3.0
terminado==0.18.1
text-unidecode==1.3
threadpoolctl==3.5.0
tinycss2==1.3.0
toml==0.10.2
toposort==1.10
tornado==6.4
traitlets==5.14.3
typer==0.12.3
types-python-dateutil==2.9.0.20240316
typing-extensions==4.11.0
tzdata==2024.1
ujson==5.10.0
uri-template==1.3.0
urllib3==2.2.1
uvicorn==0.29.0
uvloop==0.19.0
watchfiles==0.21.0
watchgod==0.8.2
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
websocket-client==1.8.0
websockets==12.0
werkzeug==3.0.3
wrapt==1.16.0
zipp==3.18.2

I don't know if it's something specific to my computer (2021 MacBook Pro, Apple M1 Max, 32 GB of RAM), but I anticipate that this will probably make effective performance measurements quite hard.

Some pointers, in case it helps:

@astrojuanlu
Copy link
Member

Having said that, if at least the times are ~proportional in the cold & hot cache situations, then this doesn't affect anything and we can proceed with confidence.

@datajoely
Copy link
Contributor

datajoely commented May 29, 2024

Amazing work @ankatiyar - Can you also run this when you have a lot of nodes, datasets and parameters for the catalog to resolve? My hunch is that is the real bottleneck felt with users at scale.

@astrojuanlu
Copy link
Member

@datajoely Conversation about performance is spread around different issues that tackle different aspects:

Startup affects even simple commands, like kedro info and kedro registry list.

@datajoely
Copy link
Contributor

Excellent - I still feel these can be thought of as sub-issues of the start up parent. Runtime performance once the session and catalog is built is fine.

@ankatiyar
Copy link
Contributor

Looking at @astrojuanlu's profile of the command, it also seems like loading of project_groups commands like the ones in kedro-mlflow also take up a bunch of time so just implementing lazy loading on kedro-viz might only solve part of the problem. kedro-mlflow ~13s and kedro-viz ~5s

I'm trying out if all commands can somehow be lazily loaded for all plugins on Kedro side. [TO BE UPDATED]

As for hot-caching, I think the times seem proportional, even on my machine, I ran the commands a few time so the profiles shown in the above comment are after the warmup. I don't know if we can do anything about the performance differences between when your OS is warmed up or not but the lazy loading and the telemetry stuff is very small effort changes for a relatively faster CLI experience on our end.

@DimedS
Copy link
Contributor

DimedS commented May 29, 2024

Great investigation, @ankatiyar! I also share the concern about cold starts versus subsequent runs. Like @astrojuanlu, my first command takes more than 10-20 seconds, while subsequent runs are much faster. I like the idea of moving kedro-telemetry to the after_command_run hook to speed up the original code execution. But will we lose information about failed runs if we make this change?

@noklam
Copy link
Contributor Author

noklam commented May 29, 2024

Looking at @astrojuanlu's profile of the command, it also seems like loading of project_groups commands like the ones in kedro-mlflow

@ankatiyar Is there anything we can do from framework side to control this instead of implementing this in the plugin library? Is the Lazy loading having no effect (does libraries still get imported?)

telemetry is a quick win, echo @DimedS we should make sure it still send telemetry when pipeline fails.

@ankatiyar
Copy link
Contributor

@noklam @DimedS
I believe the after_command_run is still executed if the run fails and it also receives the exit code of the command run. I'll confirm though

@astrojuanlu
Copy link
Member

I don't know if the architecture of pluggy or kedro allows this, but if we don't expect plugins to be able to add pipelines to the registry, maybe we should try to delay the plugin discovery process as much as possible. If one runs kedro registry list or kedro info in principle that should already be enough information to not load the CLI extensions, right?

@ankatiyar
Copy link
Contributor

ankatiyar commented May 30, 2024

Update re: Lazy Loading of Plugins

TLDR: It's possible to lazy load plugins and their commands on Kedro side

Prototype PR: #3901

Profile of regular kedro registry list

image

Updated proposal

Note

this time I have kedro-mlflow + kedro-airflow + kedro-telemetry+kedro-viz installed in the system for the below profiles. Also have consented to collection of kedro-telemetry data.

Lazy loading of plugins and their commands

In #3901 -

  • We don't load the entry points of all the plugins eagerly, just get the entry points and pass them to CommandCollection. We already use a custom version of click's CommandCollection -
    class CommandCollection(click.CommandCollection):
  • If a command is not in existing list of commands CommandCollection.list_commands(), add plugins till it is.
  • Execute the command

Profile of kedro registry list with lazy loading of plugins

image

Send telemetry info in after_command_run [Out of scope]

See: kedro-org/kedro-plugins#709
As mentioned in #1476 (comment), this would not reduce the TOTAL time taken by a command but will reduce the apparent time taken by the command.

Profile of kedro registry list with lazy loading of plugins + telemetry after_command_run

image

Profile data

kedro registry list regular with lazy plugin loading lazy + telemetry after
initialisation of CLI module 858ms 576ms 553
KedroCLI init 4.11s 270ms 287ms
sending telemetry 1s 0.8s 1s
time before the main code runs ~6s ~1.7s ~0.8s
total 6.96s 3.76s 3.8s
kedro info regular with lazy plugin loading lazy + telemetry after
initialisation of CLI module 860ms 737ms 846ms
KedroCLI init 3.97s 292ms 304ms
sending telemetry 1.05s 880ms 858ms
time before the main code runs ~5.7s ~1.9s ~0.8s
total 5.99s 2.03s 2.12s

Things to note

  • It might take same or more time to run plugin commands - there might be a better way to load only the plugin for which the command we're running than just iterating through the entrypoints till we load the plugin which has the command
  • There'll also be some delay when the command is wrong or doesn't exist. For eg kedro infroo will take a while before the error message is displayed because we'll spend some time loading plugins. I don't think this is too bad since we eagerly load these commands already right now.
  • We could still consider combining this with the LazyGroup strategy in try delaying import + lazy subcommands kedro-viz#1920 and Lazy load click subcommands #3883 or just look into improving the CLI setup on kedro-viz in general. Kedro viz could be sped up a little bit by delaying imports at least 🤔

@datajoely
Copy link
Contributor

I'm amazed kedro-telemetry is so chunky, we should absolutely defer it till after the important stuff, but I wonder if async is sensible here too.

@astrojuanlu
Copy link
Member

but I wonder if async is sensible here too.

Yeah in this case it would make sense to make it async, but... pytest-dev/pluggy#320

@ankatiyar
Copy link
Contributor

ankatiyar commented May 30, 2024

I'm amazed kedro-telemetry is so chunky, we should absolutely defer it till after the important stuff, but I wonder if async is sensible here too.

I was just testing the switch to after_command_run hook on kedro-telemetry, seems like the hook doesn't get executed when there is an error in the pipeline. (cc @noklam and @DimedS also because they asked before)
I think async would be the way to go

I'll create a separate issue for kedro-telemetry stuff and we can focus on the plugin loading stuff here.

@noklam
Copy link
Contributor Author

noklam commented May 30, 2024

after_command_run
https://github.com/kedro-org/kedro-plugins/blob/064c5d9bc491a3f96f4341a2c2d1f034b7b354cb/kedro-telemetry/kedro_telemetry/plugin.py#L185

We still have the project hook that send telemetry immediately after. We should combine the two hooks to avoid checking consent twice as well.

@astrojuanlu
Copy link
Member

For now, shall we then proceed with

? How does kedro-org/kedro-viz#1920 interact or conflict with #3901 ?

@ankatiyar
Copy link
Contributor

For now, shall we then proceed with

? How does kedro-org/kedro-viz#1920 interact or conflict with #3901 ?

These solutions can be used together of course,

I think it's definitely worth refactoring CLI code on kedro-viz side as there is a lot of heavy imports in their cli.py

There's another thing to note -
Right now, if you do kedro --help, you also see all plugin commands:
Screenshot 2024-06-03 at 13 53 37

After lazy plugin loading (i.e after #3901), we lose the plugin commands from the top-level help text:
Screenshot 2024-06-03 at 13 53 29

But you can always do kedro airflow --help or kedro mlflow --help

@astrojuanlu
Copy link
Member

Good point about losing visibility of the full list in kedro --help. Wondering if it's any way to retain both.

@znfgnu
Copy link

znfgnu commented Jun 6, 2024

Are there any contraindications to:

  • start kedro process in the background,
  • make cli communicate with it,
  • handle cli commands by forking the process?

This way, once loaded, the process with libraries would be ready to fork, unless stopped using cli command.

@noklam
Copy link
Contributor Author

noklam commented Jun 6, 2024

@znfgnu IPython essentially?

@datajoely
Copy link
Contributor

I think - and correct me if I'm wrong - the suggestion is about creating a long living kedro process and then a client process expose the CLI and can interact with it without the start-up penalty

@noklam
Copy link
Contributor Author

noklam commented Jun 7, 2024

Would the user need to do anything differently? If we can make this almost invisible to the users then it's great.

Is there any examples of Python base CLI doing this?

@astrojuanlu
Copy link
Member

The current notion of Kedro is that it's a library and a framework. I don't think we should underestimate the consequences of making it a service, in terms of software architecture, reusability, perception etc. Feel free to open a new discussion https://github.com/kedro-org/kedro/discussions/

@ankatiyar
Copy link
Contributor

Discussed in Tech Design on 19th June:

Two approaches discussed:

  1. Make subcommands of Kedro and Kedro-viz lazy, described in this command and prototypes in try delaying import + lazy subcommands kedro-viz#1920 and Lazy load click subcommands #3883
  2. Make loading of plugins lazy, described in this comment and prototype in Lazy load commands from plugins #3901

The drawbacks of second approach:

Decision: To go with approach 1 for now - lazy subcommands on Kedro and Kedro Viz with delayed imports, and make sure that the help texts still show up.

@astrojuanlu
Copy link
Member

Dropping this here as a interesting historical reference.

Consider the example of a Python command line program (CLI) with a number of subcommands. Each subcommand may perform different tasks, requiring the import of different dependencies. But a given invocation of the program will only execute a single subcommand, or possibly none (i.e. if just --help usage info is requested). Top-level eager imports in such a program will result in the import of many modules that will never be used at all; the time spent (possibly compiling and) executing these modules is pure waste.

To improve startup time, some large Python CLIs make imports lazy by manually placing imports inline into functions to delay imports of expensive subsystems. This manual approach is labor-intensive and fragile; one misplaced import or refactor can easily undo painstaking optimization work.

https://peps.python.org/pep-0690/#motivation (rejected)

This is just a hard problem in Python in general.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: CLI Issue/PR that addresses the CLI for Kedro Issue: Feature Request New feature or improvement to existing feature
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

8 participants