
[FEA]: Create LLM Engine Core Functionality #1178

Closed · 2 tasks done · Tracked by #1305 ...

mdemoret-nv opened this issue Sep 8, 2023 · 4 comments · Fixed by #1261 or #1292
Labels: feature request (New feature or request), sherlock (Issues/PRs related to Sherlock workflows and components)

Comments

@mdemoret-nv
Contributor

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Critical (currently preventing usage)

Please provide a clear description of problem this feature solves

To support working with LLMs directly, an LLM Engine should be added to Morpheus, enabling complex LLM functions to be executed directly within a pipeline.

Describe your ideal solution

The LLM engine will be composed of 3 components:

  • Prompt Generator
  • LLM Service
  • Task Generator

The Prompt Generator is responsible for taking some input message and converting it into a prompt to be fed into the LLM service. The Prompt Generator can be as simple as a basic templating engine, or as complex as a full LangChain or LlamaIndex pipeline.

The LLM Service will be responsible for executing queries with an LLM. It will be generic to allow multiple LLM services to be used but initially should target the service created by #1130.

The Task Generator is responsible for taking the output from the LLM service and converting it into a task to be processed. This task can either be fed back into the Prompt Generator or exit the LLM engine to be processed by the pipeline.

This issue will contain the design discussion for the above components since the API has not been finalized.
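
For illustration only, the flow described above could be sketched as follows. None of these names are a proposed API; they are placeholders showing how the three components interact:

# Rough conceptual sketch of the component flow; every name here is a
# placeholder, not a proposed API.
def run_llm_engine(message, prompt_generator, llm_service, task_generator):
    pending = [message]
    completed = []

    while pending:
        current = pending.pop(0)

        prompt = prompt_generator.generate(current)          # message -> prompt
        response = llm_service.execute(prompt)               # prompt -> LLM response
        tasks = task_generator.generate(current, response)   # response -> tasks

        for task in tasks:
            if task.needs_llm:      # fed back into the Prompt Generator
                pending.append(task)
            else:                   # exits the engine to be processed by the pipeline
                completed.append(task)

    return completed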

Describe any alternatives you have considered

No response

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request
@mdemoret-nv
Contributor Author

mdemoret-nv commented Sep 12, 2023

LLM Engine Core API

The LLMEngine class will utilize the following components:

  1. LLMPromptGenerator
    1. For a given incoming task, the prompt generator is responsible for converting the input ControlMessage into a prompt to be run by the model
  2. LLMService
    1. Primary responsibility is configuring LLMClient objects to be able to process LLM requests given input prompts
    2. Configures what type of service, which model, parameters to use with that model, options, etc.
    3. Takes the output from the LLMPromptGenerator and sends it to the LLM service for inference
  3. LLMTaskHandler
    1. The task handler is responsible for turning the output of the model into tasks to be further processed.
    2. Task handlers can generate additional tasks to be run through the LLMEngine again, or new tasks to be run by the rest of the pipeline.

LLMEngine API

The LLMEngine is mainly a container for the prompt generators, LLM service, and task handlers. Users can configure the engine by adding different generators and handlers depending on their desired functionality. For example, a simple use case might add a TemplatePromptGenerator which is capable of taking input tasks and building prompts using a pre-defined template. Or a user might add a SpearPhishingQualityCheck task handler to ensure that the emails generated by an LLM pass some basic checks, and if the checks don't pass, it could try regenerating the emails in a second pass.

The public facing API for the LLMEngine is as follows:

class LLMEngine:
    def __init__(self, llm_service: LLMService):
        ...

    @property
    def llm_service(self) -> LLMService:
        ...

    # Managing Prompt Generators
    def add_prompt_generator(self, prompt_generator: LLMPromptGenerator):
        ...
    def insert_prompt_generator(self, prompt_generator: LLMPromptGenerator, index: int):
        ...
    def remove_prompt_generator(self, prompt_generator: LLMPromptGenerator) -> bool:
        ...

    # Managing Task Handlers
    def add_task_handler(self, task_handler: LLMTaskHandler):
        ...
    def insert_task_handler(self, task_handler: LLMTaskHandler, index: int):
        ...
    def remove_task_handler(self, task_handler: LLMTaskHandler) -> bool:
        ...

    # Executing a task with a given input message
    def run(self, message: ControlMessage) -> list[ControlMessage]:
        ...

Notes:

  • The LLMEngine requires a LLMService when it is created. This is necessary to allow prompt generators and task handlers to access the LLMService when they are added to the engine.
  • Prompt generators and task handlers are ordered by priority. Not every prompt generator or task handler may be able to handle a specific task. When one cannot, the engine passes the message on to the next generator/handler until one is capable of handling the task (see the sketch after these notes)
  • If no handlers/generators can process a task, an error is generated
  • The run call only works with ControlMessage since it needs the specific llm_engine task on the control message to function properly
  • Multiple messages can be returned by run since some rows may need to be reprocessed and others are complete.
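
To make the priority ordering and fallback behavior concrete, the sketch below shows what run() might do internally. This is illustrative only; the attribute names (_prompt_generators, _llm_service, _task_handlers) are assumptions, not the actual implementation.

# Illustrative sketch of run() based on the notes above; attribute names are
# assumptions, and re-queuing of messages that still carry an llm_engine task
# is omitted for brevity.
def run(self, message: ControlMessage) -> list[ControlMessage]:
    # Pull the llm_engine task off the message before handing it to the handlers
    input_task = message.remove_task("llm_engine")

    # Try each prompt generator in priority order until one handles the task
    prompt_or_result = None
    for generator in self._prompt_generators:
        prompt_or_result = generator.try_handle(self, input_task, message)
        if prompt_or_result is not None:
            break

    if prompt_or_result is None:
        raise RuntimeError("No prompt generator could handle the task")

    # Only run the LLM if the generator returned a prompt rather than a finished result
    if isinstance(prompt_or_result, LLMGeneratePrompt):
        client = self._llm_service.get_client(model_name=prompt_or_result.model_name,
                                              model_kwargs=prompt_or_result.model_kwargs)
        result = LLMGenerateResult(model_name=prompt_or_result.model_name,
                                   model_kwargs=prompt_or_result.model_kwargs,
                                   prompts=prompt_or_result.prompts,
                                   responses=client.generate(prompt_or_result.prompts))
    else:
        result = prompt_or_result

    # Try each task handler in priority order until one handles the result
    for handler in self._task_handlers:
        out_messages = handler.try_handle(self, input_task, message, result)
        if out_messages is not None:
            return out_messages

    raise RuntimeError("No task handler could handle the result")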

LLMPromptGenerator API

The LLMPromptGenerator has a single purpose: to generate prompts from input messages. While the API is quite simple, the prompt generator can be as basic or complex as needed. For example, it's possible to run entire LangChain pipelines inside of the prompt generator without using any additional task handlers. This approach would make it easy to load LLM functionality from external libraries at the cost of performance (one request at a time).

The public facing API is as follows:

class LLMPromptGenerator(ABC):

    @abstractmethod
    def try_handle(self, engine: "LLMEngine", input_task: dict,
                   input_message: ControlMessage) -> LLMGeneratePrompt | LLMGenerateResult | None:
        pass

Notes:

  • The try_handle function takes in the LLMEngine to allow it to run queries itself using the engine's LLMService. This is necessary to allow things like LangChain to operate completely inside of the engine without modification.
  • The try_handle returns None if it cannot handle the current task/message.
  • The try_handle function can return either LLMGeneratePrompt, to have the engine execute the LLM in the next step, or LLMGenerateResult if the LLM model does not need to be run
    • LLMGenerateResult may be returned if the prompt generator has already run the model, or determines it does not need to be run.
  • The input_task is the result of input_message.remove_task("llm_engine"). This could be left on the message, but since this may return None, it seemed safer to pull this off the message before passing it to the handlers.

LLMService API

The LLMService object is an abstraction around executing LLM queries/completions using any external service or model type. It takes inspiration from the LLM libraries of LangChain and LlamaIndex. The LLMService does not have methods for running LLM models directly. Instead, you request an LLMClient from the service and use that object for executing prompts.

Using the service and client architecture allows multiple models to be used within the LLMEngine all with a single configuration. Additionally, clients can work with their service to provide things like dynamic batching, or caching.

The public facing API is as follows:

class LLMClient(ABC):

    @typing.overload
    @abstractmethod
    def generate(self, prompt: str) -> str:
        ...

    @typing.overload
    @abstractmethod
    def generate(self, prompt: list[str]) -> list[str]:
        ...

    @abstractmethod
    def generate(self, prompt: str | list[str]) -> str | list[str]:
        pass


class LLMService(ABC):

    @abstractmethod
    def get_client(self, model_name: str, model_kwargs: dict | None = None) -> LLMClient:
        pass

Notes:

  • A new LLMClient needs to be created for each model/model_kwargs combination. This is necessary to allow for multiple requests to be dynamically batched by forcing the user to specify these properties before running inference.
  • Multiple clients, even for the same model, can be created from a single service.
  • Client objects are not thread safe. Instead, each thread should use its own copy of the client.
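
As an illustration of the intended service/client split (a hypothetical toy implementation for testing, not something proposed by this issue), a service could be implemented and used as follows:

# Hypothetical echo implementation showing how the service/client split is
# meant to be used; illustrative only.
class EchoLLMClient(LLMClient):

    def __init__(self, model_name: str, model_kwargs: dict | None):
        self._model_name = model_name
        self._model_kwargs = model_kwargs or {}

    def generate(self, prompt: str | list[str]) -> str | list[str]:
        # A real client would call the external service here, possibly batching
        # multiple prompts into a single request
        if isinstance(prompt, str):
            return f"[{self._model_name}] {prompt}"
        return [f"[{self._model_name}] {p}" for p in prompt]


class EchoLLMService(LLMService):

    def get_client(self, model_name: str, model_kwargs: dict | None = None) -> LLMClient:
        # One client per model/model_kwargs combination, as required above
        return EchoLLMClient(model_name, model_kwargs)


# Usage: request a client from the service, then execute prompts with it
client = EchoLLMService().get_client(model_name="my-model")
print(client.generate(["Hello", "World"]))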

LLMTaskHandler API

Finally, the LLMTaskHandler needs to determine what to do with the output of the LLMService. Task handlers will vary from application to application and can be very basic or very complex. Basic applications might convert every response into a new ControlMessage to be processed by the pipeline. Complex applications may perform tasks such as evaluating arbitrary code or running LLM agents. The only requirement is that the output is turned back into a ControlMessage.

The public facing API is as follows:

class LLMTaskHandler(ABC):

    @abstractmethod
    def try_handle(self, engine: "LLMEngine", input_task: dict, input_message: ControlMessage,
                   responses: LLMGenerateResult) -> list[ControlMessage] | None:
        pass

Notes:

  • Much like the LLMPromptGenerator, the LLMTaskHandler returns None from try_handle if it cannot process the message
  • The responses input is always of type LLMGenerateResult, since this is what is output by either the prompt generator or the LLM service
  • The input_message is passed to the handlers to allow for reuse of either the payload, the message, or both. This eliminates overhead in creating new messages for each loop.

Outstanding Questions

Given the above design, there are still a few outstanding questions that need to be resolved:

  • How will we avoid inheritance slicing in the LLMEngine for Python classes derived from LLMPromptGenerator and LLMTaskHandler?
    • Ideally, the LLMEngine would be fully implemented in C++ and maintain a list of std::shared_ptr<LLMPromptGenerator> and std::shared_ptr<LLMTaskHandler>. However, this may suffer from inheritance slicing and cause the Python portion to be dropped.
    • Theoretically, since the LLMEngine should outlive any of the generators/handlers, we could work around this issue by maintaining a reference to any of the objects passed in to add_prompt_generator/add_task_handler.
  • How do we handle asyncio calls from the LLMEngine?
    • Most of the LLM libraries have async versions of each method allowing for efficient dispatching of requests without blocking.
    • We should try to take advantage of this where possible to maximize efficiency with python implementations.

mdemoret-nv added this to the 23.11 - Sherlock milestone Sep 12, 2023
@mdemoret-nv
Contributor Author

After some initial prototyping, the above outstanding questions have been resolved:

  • While inheritance slicing is an issue, unlike our message classes, ownership of the task handlers and prompt generators is very clear. So we can ensure the lifetime of the Python objects by simply creating a pybind11 trampoline class and saving the Python object in add_task_handler and add_prompt_generator
  • From pybind11, it's possible to start a C++ thread that can be used as an asyncio loop, and if the LLMPromptGenerator or LLMTaskHandler returns a coroutine from try_handle, that coroutine can be run asynchronously via asyncio.run_coroutine_threadsafe
    • The big benefit here is that while we wait for the returned future to complete, we can call boost::this_fiber::yield(), which works perfectly to bridge the C++ async code with the Python async code (a Python-side sketch follows below)
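
A minimal Python-side sketch of this pattern is shown below (illustrative only; the C++/fiber half of the bridge is omitted, and the coroutine is a stand-in for one returned by an async try_handle):

import asyncio
import threading

# Run a dedicated asyncio loop on a background thread, the way the C++ engine
# thread would host one, and submit coroutines returned from try_handle to it.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()


async def try_handle_coro():
    # Stand-in for a coroutine returned by an async try_handle implementation
    await asyncio.sleep(0.1)
    return "handled"


future = asyncio.run_coroutine_threadsafe(try_handle_coro(), loop)

# In C++, waiting on this future would repeatedly call boost::this_fiber::yield()
# instead of blocking; here we simply block until the coroutine completes.
print(future.result())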

@mdemoret-nv
Contributor Author

Below is an example showing how all of the components would be used together to run a LangChain agent executor with the LLMEngine:

# Simple class which uses the sync implementation of try_handle but always fails
class AlwaysFailPromptGenerator(LLMPromptGenerator):

    def try_handle(self, engine: LLMEngine, task: LLMTask,
                   message: ControlMessage) -> typing.Optional[typing.Union[LLMGeneratePrompt, LLMGenerateResult]]:
        # Always return None to skip this generator
        return None


# Prompt generator wrapper around a LangChain agent executor
class LangChainAgentExecutorPromptGenerator(LLMPromptGenerator):

    def __init__(self, agent_executor: AgentExecutor):
        self._agent_executor = agent_executor

    async def try_handle(self, engine: LLMEngine, input_task: LLMTask,
                         input_message: ControlMessage) -> LLMGeneratePrompt | LLMGenerateResult | None:

        if (input_task["task_type"] != "dictionary"):
            return None

        input_keys = input_task["input_keys"]

        with input_message.payload().mutable_dataframe() as df:
            input_dict: list[dict] = df[input_keys].to_dict(orient="records")

        results = []

        for x in input_dict:
            # Await the result of the agent executor
            single_result = await self._agent_executor.arun(**x)

            results.append(single_result)

        return LLMGenerateResult(model_name=input_task["model_name"],
                                 model_kwargs=input_task["model_kwargs"],
                                 prompts=[],
                                 responses=results)


class SimpleTaskHandler(LLMTaskHandler):

    def try_handle(self, engine: LLMEngine, task: LLMTask, message: ControlMessage,
                   result: LLMGenerateResult) -> typing.Optional[list[ControlMessage]]:

        with message.payload().mutable_dataframe() as df:
            df["response"] = result.responses

        return [message]


# Create the NeMo LLM Service using our API key and org ID
llm_service = NeMoLLMService(api_key="my_api_key", org_id="my_org_id")

engine = LLMEngine(llm_service=llm_service)

# Create a LangChain agent executor using the NeMo LLM Service and specified tools
agent = initialize_agent(tools,
                         NeMoLangChainWrapper(engine.llm_service.get_client(model_name="gpt-43b-002")),
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
                         verbose=True)

# Add 2 prompt generators to test fallback
engine.add_prompt_generator(AlwaysFailPromptGenerator())
engine.add_prompt_generator(LangChainAgentExecutorPromptGenerator(agent))

# Add our task handler
engine.add_task_handler(SimpleTaskHandler())

# Create a control message with a single task which uses the LangChain agent executor
message = ControlMessage()

message.add_task("llm_query", LLMDictTask(input_keys=["input"], model_name="gpt-43b-002").dict())

payload = cudf.DataFrame({
    "input": [
        "What is the product of the 998th, 999th and 1000th prime numbers?",
        "What is the product of the 998th and 1000th prime numbers?"
    ],
})
message.payload(MessageMeta(payload))

# Finally, run the engine
result = engine.run(message)

Another easier example showing simple templating with verification of the output:

class TemplatePromptGenerator(LLMPromptGenerator):

    def __init__(self, template: str):
        self._template = template

    def try_handle(self, engine: LLMEngine, input_task: dict,
                   input_message: ControlMessage) -> LLMGeneratePrompt | LLMGenerateResult | None:

        if (input_task["task_type"] != "template"):
            return None

        input_keys = input_task["input_keys"]

        with input_message.payload().mutable_dataframe() as df:
            input_dict: list[dict] = df[input_keys].to_dict(orient="records")

        prompts = [self._template.format(**x) for x in input_dict]

        return LLMGeneratePrompt(model_name=input_task["model_name"],
                                 model_kwargs=input_task["model_kwargs"],
                                 prompts=prompts)

class QualityCheckTaskHandler(LLMTaskHandler):

    def _check_quality(self, response: str) -> bool:
        # Some sort of check here
        pass

    def try_handle(self, engine: LLMEngine, task: LLMTask, message: ControlMessage,
                   result: LLMGenerateResult) -> typing.Optional[list[ControlMessage]]:

        # Loop over all responses and check if they pass the quality check
        passed_check = [self._check_quality(r) for r in result.responses]

        with message.payload().mutable_dataframe() as df:
            df["emails"] = result.responses

            if (not all(passed_check)):
                # Need to separate into 2 messages
                failed_check = [not x for x in passed_check]

                good_message = ControlMessage()
                good_message.payload(MessageMeta(df[passed_check]))

                bad_message = ControlMessage()
                bad_message.payload(MessageMeta(df[failed_check]))

                # Set a new llm_engine task on the bad message
                bad_message.add_task("llm_query",
                                     LLMDictTask(input_keys=["input"], model_name="gpt-43b-002").dict())

                return [good_message, bad_message]

            else:
                # Return a single message
                return [message]

llm_service = NeMoLLMService(api_key="my_api_key", org_id="my_org_id")

engine = LLMEngine(llm_service=llm_service)

# Add a templating prompt generator to convert our payloads into prompts
engine.add_prompt_generator(
    TemplatePromptGenerator(
        template=("Write a brief summary of the email below to use as a subject line for the email. "
                    "Be as brief as possible.\n\n{body}")))

# Add our task handler
engine.add_task_handler(QualityCheckTaskHandler())

# Create a control message with a single task which uses the templating prompt generator
message = ControlMessage()

message.add_task("llm_query", LLMDictTask(input_keys=["body"], model_name="gpt-43b-002").dict())

payload = cudf.DataFrame({
    "body": [
        "Email body #1...",
        "Email body #2...",
    ],
})
message.payload(MessageMeta(payload))

# Finally, run the engine
result = engine.run(message)

print(result)

@mdemoret-nv
Contributor Author

Fixed by #1292
