Provide a lightweight solution to speed up session reload or create new session #2879

noklam · 2023-08-01T13:45:48Z

Quotes

Carlos Barreto
We are using Kedro as part of an event stream + Amazon ECS solution. What they want to check is whether there is a way to always have the Kedro context up and running, with an API call executing the pipeline only when necessary. I was thinking that this is possible by programmatically generating the KedroContext, making it a global service, and only using specific pipeline calls. But I don’t know if we have any similar use cases implemented already, and I wanted to get some opinions on it. Today, we run something like a kedro run inside the container every time, and this ends up spending important warm-up seconds loading the context/dependencies into memory.

Description

I do a lot of development work in IPython or Jupyter, and I often want to make small changes to test whether they work. %reload_kedro can be quite slow, and the development experience is frustrating because that delay is paid for every change.

This is also potentially related to #1853, #2134, #2182.

kedro ipython takes > 20s to start and %reload_kedro takes

Context

Enforce 1 session = 1 run #1329

After this PR, a session can only be run once. The easiest way to create a new session is %reload_kedro. While %reload_kedro works, it is considerably slow on big projects for a few reasons:

It recreates everything: session, context, pipelines, catalog.
If certain datasets exist, it will even re-establish the connection to the database (slow); see Lazy Loading of Catalog Items #2829.
All the plugin hooks are registered again, as evidenced by the log message:

INFO Registered line magic 'run_viz' __init__.py:115
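The second point above (reconnecting to a database on every reload) is the kind of cost that lazy loading, as discussed in #2829, is meant to avoid. The following is a minimal sketch of the idea using only the standard library. `LazyDataset` and its `connection` property are hypothetical names for illustration and are not part of Kedro's API.

```python
from functools import cached_property


class LazyDataset:
    """Hypothetical dataset that defers its slow connection until first use."""

    connections_opened = 0  # class-level counter, only for this demo

    def __init__(self, url: str) -> None:
        self.url = url  # cheap: nothing is opened here

    @cached_property
    def connection(self) -> str:
        # Stand-in for a slow database handshake; runs at most once per instance.
        LazyDataset.connections_opened += 1
        return f"connected to {self.url}"

    def load(self) -> str:
        return f"rows via {self.connection}"


# Building a catalog of several datasets opens no connections.
catalog = {name: LazyDataset(f"db://{name}") for name in ("orders", "users", "events")}
opened_after_build = LazyDataset.connections_opened

# Only the dataset that is actually used pays the connection cost, and only once.
catalog["orders"].load()
catalog["orders"].load()
opened_after_use = LazyDataset.connections_opened
```

With this shape, recreating the catalog stays cheap as long as the reload itself does not touch every dataset.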

What's the minimal effort to recreate session?

If we look into the code, there is a self._run_called attribute, and every time we call session.run it checks whether that attribute is True.

kedro/kedro/framework/session/session.py

Lines 434 to 438 in 6913acd

    
    try:
        run_result = runner.run(
            filtered_pipeline, catalog, hook_manager, session_id
        )
        self._run_called = True

kedro/kedro/framework/session/session.py

Lines 366 to 371 in 6913acd

    
    if self._run_called:
        raise KedroSessionError(
            "A run has already been completed as part of the"
            " active KedroSession. KedroSession has a 1-1 mapping with"
            " runs, and thus only one run should be executed per session."
        )

Why do we need this check? Mainly because session_id needs to be a unique value; otherwise it can cause errors in experiment tracking (kedro-viz), which relies on a unique id. If we simply override session._run_called = False and call session.run() again, almost everything will work.
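To make the effect of that override concrete, here is a small self-contained sketch. `ToySession` and `ToySessionError` are hypothetical stand-ins, not Kedro's `KedroSession` or `KedroSessionError`; they only mimic the `_run_called` guard quoted above, and the id string is an arbitrary example value.

```python
class ToySessionError(Exception):
    """Hypothetical stand-in for KedroSessionError."""


class ToySession:
    """Toy model of the one-run-per-session guard (not Kedro's real class)."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self._run_called = False

    def run(self) -> str:
        # Same shape as the guard quoted above: refuse a second run.
        if self._run_called:
            raise ToySessionError("only one run should be executed per session")
        # A real run would execute the pipeline; here we only report the id used.
        self._run_called = True
        return self.session_id


session = ToySession(session_id="example-session-id")
first_id = session.run()

try:
    session.run()
    second_run_blocked = False
except ToySessionError:
    second_run_blocked = True

# The override discussed above: clear the flag and run again.
session._run_called = False
second_id = session.run()

# Both runs report the same session_id, which is the uniqueness problem.
ids_are_identical = first_id == second_id
```

The second run succeeds after the override, but anything keyed on the session id can no longer tell the two runs apart.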

Experiment tracking is not a core feature of kedro (it belongs to kedro-viz). Is there any other obvious reason why we need to protect session_id from being used for two runs?

(edited)
It could be related to the timestamp used for saving versioned data. However, this is unclear to me: the catalog gets save_version from session_id, but there is also another function that you can find in most dataset implementations:

save_version = self.resolve_save_version()

Possible Implementation

Source: #1551 (comment)

(Bonus) - KedroSession.reset() to create a new session easily? - this can potentially make the Jupyter workflow nicer. Instead of asking users to create their session with lots of details, they can just take the global session and do session.reset() #1571

Maybe implement a session.clear() or session.reset() method.
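A rough sketch of what such a reset() could look like, again on a toy class rather than on Kedro's real session. Everything here is an assumption for illustration: `ResettableSession`, the `context` placeholder, and the id scheme (a UTC timestamp plus an in-process counter) are not taken from Kedro.

```python
import itertools
from datetime import datetime, timezone


class ResettableSession:
    """Hypothetical session whose reset() keeps the expensive state."""

    _counter = itertools.count(1)  # guarantees distinct ids within one process

    def __init__(self, context: dict) -> None:
        self.context = context  # stands in for the slow-to-build context/catalog
        self._run_called = False
        self.session_id = self._new_session_id()

    @classmethod
    def _new_session_id(cls) -> str:
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H.%M.%S")
        return f"{stamp}-{next(cls._counter):04d}"

    def run(self) -> str:
        if self._run_called:
            raise RuntimeError("only one run should be executed per session")
        self._run_called = True
        return self.session_id

    def reset(self) -> None:
        # Fresh id and cleared flag; self.context is deliberately left untouched.
        self.session_id = self._new_session_id()
        self._run_called = False


expensive_context = {"catalog": "already loaded"}  # placeholder for slow setup
session = ResettableSession(expensive_context)
first_id = session.run()
session.reset()
second_id = session.run()
```

Unlike flipping `_run_called` by hand, this keeps the one-id-per-run property while skipping the reload of the context.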

Possible Alternatives

Speed up reload_kedro so the overhead is insignificant.
Remove the session._run_called checks


noklam · 2023-09-21T12:31:12Z

Muhammed Afnas
12:03 PM
Hi everyone,
can we initiate multiple sessions in Kedro? If yes, could anyone help me with it?
Kedro version - 0.18
I am building a web application in which I have to trigger the different pipelines of a Kedro project based on button clicks in the Dash UI.
As of now, each one works individually, but while one session is running, if I try to trigger another session it gives a runtime error.

astrojuanlu · 2023-09-21T19:15:46Z

Experiment tracking is not a core feature of kedro (it belongs to kedro-viz). Is there any other obvious reason why we need to protect session_id from being used for two runs?

I recall there's some issue about session_id that @datajoely identified in his research. Maybe it's related?

noklam · 2023-09-21T20:19:42Z

That's more related to orchestration, and it requires a way to pass a unique identifier when the run is spread across multiple KedroSessions.

datajoely · 2023-09-22T10:28:41Z

session_id is used for versioning too, which is why it needs to be alphabetically sortable.

Arguably, if we kept a private session_id and exposed a parameterisable one, that would be sufficient.
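The sortability requirement can be illustrated with the standard library alone. The format string below is an assumption chosen for illustration (a fixed-width, zero-padded UTC timestamp), not a copy of Kedro's implementation. The point is only that with such a format, sorting the strings matches sorting the times.

```python
from datetime import datetime, timedelta, timezone

ID_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"  # assumed format, for illustration only


def make_id(moment: datetime) -> str:
    """Render a moment as a fixed-width, zero-padded string id."""
    return moment.strftime(ID_FORMAT)


start = datetime(2023, 9, 22, 10, 28, 41, tzinfo=timezone.utc)
# Deliberately out of chronological order, spanning a month boundary.
moments = [start + timedelta(hours=h) for h in (30, 2, 0, 700)]
ids = [make_id(m) for m in moments]

# Sorting the strings gives the same order as sorting the datetimes,
# so the "latest version" is simply the maximum id.
sorted_by_string = sorted(ids)
sorted_by_time = [make_id(m) for m in sorted(moments)]
latest = max(ids)
```

An arbitrary user-supplied identifier would not have this property, which is one argument for keeping the sortable id private and exposing a separate parameterisable one.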

astrojuanlu · 2023-09-22T16:54:14Z

Uh, we're sorting by session_id? Maybe we should store the datetime instead, but this might be a bit of a digression.

datajoely · 2023-09-22T17:20:54Z

The session_id was the Versioning ID way back when - @merelcht @idanov can provide more context here

astrojuanlu · 2024-02-15T11:53:29Z

Moving this to the Session milestone.

noklam added the "Issue: Feature Request" label (new feature or improvement to existing feature) Aug 1, 2023

github-actions bot mentioned this issue Sep 1, 2023

Monthly issue metrics report #2996

Closed

astrojuanlu mentioned this issue Sep 13, 2023

Kedro initialization is slow #3033

Closed

takikadiri mentioned this issue Sep 18, 2023

Allow injecting data into a KedroSession run #2169

Open

datajoely mentioned this issue Jan 1, 2024

Enhancing Pipeline Context Preservation in Runners for Deployment of Kedro (on AWS Batch) #3468

Open

Galileo-Galilei mentioned this issue Jan 21, 2024

Universal Kedro Deployment (Part 4) - Embedding kedro pipelines in third-party applications #3540

Open

astrojuanlu added this to the Something about the session milestone Feb 15, 2024
