Investigate how session/context should be re-designed to work well for API use-cases #2182

datajoely · 2023-01-09T10:16:36Z

Discussed in #2134

Originally posted by illia-shkroba December 16, 2022
Hello.

I'm trying to build a REST API with FastAPI that runs a Kedro pipeline under the hood, and I came up with this solution:

import pathlib
from typing import Any, Iterable

from fastapi import Depends, FastAPI
from kedro.framework.context import KedroContext
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

app = FastAPI(
    title="FastAPI + Kedro",
    version="0.0.1",
    license_info={
        "name": "GNU GENERAL PUBLIC LICENSE",
        "url": "https://www.gnu.org/licenses/gpl-3.0.html",
    },
)


def get_session() -> Iterable[KedroSession]:
    bootstrap_project(pathlib.Path().cwd())
    with KedroSession.create() as session:
        yield session


def get_context(session: KedroSession = Depends(get_session)) -> Iterable[KedroContext]:
    yield session.load_context()


@app.get("/")
def index(
    session: KedroSession = Depends(get_session),
    context: KedroContext = Depends(get_context),
) -> dict[str, Any]:
    session.run("math")
    catalog = context.catalog
    return catalog.load("output")

session.run("math") runs a simple pipeline that calculates a variance for the input: [1, 2, 3].

The solution seems to work as expected, but it takes nearly 2.1 seconds to finish a request:

time curl http://127.0.0.1:8000
# curl http://127.0.0.1:8000  0.00s user 0.00s system 0% cpu 2.097 total

I've noticed that session.load_context() takes about 1 second to finish. Also I've found that load_context() is used by session.run():

        session_id = self.store["session_id"]
        save_version = session_id
        extra_params = self.store.get("extra_params") or {}
        context = self.load_context()

It seems that load_context() is called twice during the request:

1. Inside of get_context().
2. Inside of session.run().
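One way to check where the time goes is to wrap the suspect call in a small timer and accumulate the elapsed time per label. The sketch below is only an illustration: `timed` and `slow_load_context` are hypothetical names, and `slow_load_context` is a stand-in that sleeps instead of calling Kedro.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, timings: Dict[str, float]) -> Iterator[None]:
    """Add the wall-clock time spent inside the block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + (time.perf_counter() - start)


def slow_load_context() -> str:
    """Hypothetical stand-in for an expensive call such as load_context()."""
    time.sleep(0.02)
    return "context"


timings: Dict[str, float] = {}

# Two calls under the same label, mirroring the two calls made per request.
with timed("load_context", timings):
    slow_load_context()
with timed("load_context", timings):
    slow_load_context()

print(f"load_context total: {timings['load_context']:.3f}s")
```

Both calls land under the same label, so the total shows what the repeated call costs per request.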

I've tried to cache the result of session.load_context() like this:

def get_context(session: KedroSession = Depends(get_session)) -> Iterable[KedroContext]:
    context = session.load_context()
    session.load_context = lambda: context
    yield context

And by doing that I've decreased the request processing time to 1.06 seconds.

time curl http://127.0.0.1:8000
# curl http://127.0.0.1:8000  0.00s user 0.00s system 0% cpu 1.062 total
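The caching trick above can be tried in isolation. In this minimal sketch `SlowSession` is a hypothetical stand-in, not a Kedro class; it only counts how often `load_context()` is really executed. `cache_load_context` is also a made-up helper name.

```python
import time
from typing import Any, Dict


class SlowSession:
    """Hypothetical stand-in for a session with an expensive load_context()."""

    def __init__(self) -> None:
        self.load_calls = 0

    def load_context(self) -> Dict[str, Any]:
        self.load_calls += 1
        time.sleep(0.01)  # simulate the expensive part
        return {"name": "context"}

    def run(self) -> Dict[str, Any]:
        # Like the excerpt above, run() asks for the context itself.
        return self.load_context()


def cache_load_context(session: SlowSession) -> Dict[str, Any]:
    """Load the context once, then make later calls return the same object."""
    context = session.load_context()
    # An instance attribute shadows the method defined on the class.
    session.load_context = lambda: context  # type: ignore[method-assign]
    return context


session = SlowSession()
context = cache_load_context(session)
result = session.run()
print(session.load_calls, result is context)
```

The trade-off is that the cached object is reused for the lifetime of the session, so anything that would normally be re-read when the context is loaded again is not refreshed.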

Do you have any suggestions on how I can further optimize session.run()? Should I try a different approach with a plain DataCatalog/SequentialRunner? Or maybe Kedro's implementation of load_context() should be modified to use some caching?


merelcht · 2023-02-27T14:41:59Z

Related discussion: #2169 (comment)

noklam · 2023-02-28T06:02:53Z

I think this is related too. We need to document and understand what the use case is and what improvements we can make.

Usage of Kedro pipeline with web services & Deployment #1846

There are some questions we want to ask.

How common is it for a Kedro pipeline to be exposed as a web endpoint?

Summary (To be updated)

- A session can only be used for a single run.
- Session creation is slow. Creating a session for every API call adds significant overhead, especially when the API runs lots of small pipelines.
- A runner is often used directly to get around the "one session, one run" assumption, interacting with lower-level objects like DataCatalog.
- APIs often need data injection (some parameters), see "How to distribute and extend kedro pipelines" #795. How can we make a Kedro pipeline work better with a RESTful API? Is there an easy way for a user to pass extra data (common in a RESTful call with JSON) and trigger a Kedro pipeline?
  - One example is a custom runner that interacts with the catalog directly (I am guessing it uses .add_feed_dict to inject data), see "Allow injecting data into a KedroSession run" #2169 (comment).
- What's the downside of using a runner?
  - The hook system is built for the session rather than the runner.
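The "build once, inject per-request data" pattern from the summary can be sketched without Kedro. Everything below is a hypothetical stand-in: `Node`, `run_pipeline`, `MATH_PIPELINE` and `handle_request` are not Kedro APIs, and population variance is only an assumption about what the "math" pipeline computes.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-in for a pipeline node: (function, input name, output name).
Node = Tuple[Callable[[Any], Any], str, str]


def variance(values: List[float]) -> float:
    """Population variance (an assumption; the thread does not say which kind)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def run_pipeline(nodes: List[Node], data: Dict[str, Any]) -> Dict[str, Any]:
    """Run nodes in order against a per-call copy of the injected data."""
    store = dict(data)  # copy, so concurrent requests never share state
    for func, input_name, output_name in nodes:
        store[output_name] = func(store[input_name])
    return store


# Built once at import time, reused by every request.
MATH_PIPELINE: List[Node] = [(variance, "input", "output")]


def handle_request(payload: Dict[str, Any]) -> Any:
    """Inject the request payload as the pipeline input and return the output."""
    return run_pipeline(MATH_PIPELINE, {"input": payload["values"]})["output"]


print(handle_request({"values": [1, 2, 3]}))
```

Here only the data store is created per request. As noted above, anything that relies on session-level hooks would not fire in a setup like this.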

astrojuanlu · 2024-02-15T11:53:09Z

Moving this to the Session milestone

astrojuanlu · 2024-04-15T08:22:57Z

In light of the interest that kedro-boot is getting (lots of mentions in Slack), that the authors @takikadiri and @Galileo-Galilei have already put a lot of thought into its design, and that we mostly agreed in Tech Design #2169 (comment) that this is an idea worth pursuing, are we ready for at least a first exploration of this issue from a technical standpoint?

@merelcht @rashidakanchwala I often hear that "the KedroSession was created for Experiment Tracking", do you happen to have any pointers? And besides, should kedro-org/kedro-viz#1624 be a blocker?

datajoely · 2024-04-15T08:27:07Z

Can I please volunteer myself for a user interview on how my teams have approached this!

merelcht added Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation and removed Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation labels Jan 26, 2023

merelcht added this to the Improve the Interactive Jupyter notebook workflow milestone Feb 6, 2023

merelcht added the Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation label Feb 6, 2023

merelcht removed this from the Improve the Interactive Jupyter notebook workflow milestone Feb 27, 2023

merelcht changed the title ~~Investigate whether we can instantiate the context less fequently~~ Investigate how session/context should be re-designed to work well for API use-cases Feb 27, 2023

merelcht added the Stage: User Research 🔬 Ticket needs to undergo user research before implementation label Feb 27, 2023

noklam mentioned this issue Aug 14, 2023

Provide a lightweight solution to speed up session reload or create new session #2879

Open

takikadiri mentioned this issue Sep 18, 2023

Allow injecting data into a KedroSession run #2169

Open

datajoely mentioned this issue Sep 28, 2023

Synthesis of research related to deployment of Kedro to modern MLOps platforms #3094

Closed

merelcht added this to the Kedro Server/Service milestone Jan 12, 2024

Galileo-Galilei mentioned this issue Jan 21, 2024

Universal Kedro Deployment (Part 4) - Embedding kedro pipelines in third-party applications #3540

Open

astrojuanlu modified the milestones: Kedro Server/Service, Something about the session Feb 15, 2024
