# Deploying GenAI In Production With NVIDIA & MLRun

In this demo we will learn about **NVIDIA NIMs** and how to use them with a production-first mindset using **MLRun**. We will see how to deploy and operationalize a NIM with MLRun's model monitoring and, finally, how to create a **multi-agent banking chatbot** using MLRun's GenAI Factory.

## Table of Contents

1. [How to Deploy a NIM with MLRun?](#section_1)
2. [How to Use a NIM?](#section_2)
3. [How to Operationalize a NIM using MLRun?](#section_3)
4. [MLRun's GenAI Factory](#section_4)

## Prerequisites

* **Previous Knowledge** - The demo relies on the following MLRun and LLM knowledge:
  * MLRun basic components ([projects](https://docs.mlrun.org/en/stable/projects/project.html) and [functions](https://docs.mlrun.org/en/stable/runtimes/functions.html))
  * [LangChain](https://python.langchain.com/docs/tutorials/llm_chain/) basics (for the example)
* **Python packages** - The following Python packages are required. You can install them in the next cell:
  * `mlrun` - to orchestrate and operationalize the demo.
  * `langchain_nvidia_ai_endpoints` and `langchain-openai` - to write the example and use the NIM.

In [None]:
!pip install mlrun langchain_nvidia_ai_endpoints langchain-openai 

* **Credentials**:
  * NVIDIA NGC API Key - To use the NIM.
  * OpenAI Key - For the monitoring (LLM as a Judge).

In [1]:
import os

NGC_API_KEY = ""
OPENAI_BASE_URL = ""
OPENAI_API_KEY = ""

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
os.environ["OPENAI_BASE_URL"] = OPENAI_BASE_URL
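Hardcoding keys in a notebook makes them easy to leak. As an alternative, here is a minimal sketch that reads the keys from environment variables exported before starting the notebook and fails fast when one is missing. The helper name `require_env` and the environment variable names are our own choice, not part of MLRun:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Hypothetical usage, replacing the hardcoded empty strings above:
# NGC_API_KEY = require_env("NGC_API_KEY")
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```

`OPENAI_BASE_URL` is left out of the usage comment on purpose, since it is only needed when pointing at a non-default OpenAI-compatible endpoint.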

* **MLRun project** - We'll create an MLRun project for the demo:

In [41]:
import mlrun

project = mlrun.get_or_create_project(name="mlrun-nvidia-nim-demo-edmond", context="./src")

> 2024-11-07 10:43:30,243 [info] Project loaded successfully: {"project_name":"mlrun-nvidia-nim-demo-edmond"}


In [3]:
import os
from src.nim_application import NIMApplication
from src.langchain_mlrun_llm import MLRun
from mlrun.features import Feature  # To log the model with inputs and outputs information

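# Configure the model monitoring credentials (V3IO access key, with V3IO as the monitoring data backends):
project.set_model_monitoring_credentials(
    os.environ["V3IO_ACCESS_KEY"],
    "v3io",
    "v3io",
    "v3io",
)

# Node selector used to schedule the NIM on the GPU node group:
GPU_NODE_GROUP = {"app.iguazio.com/node-group" : "added-a10"}

# Log a placeholder model artifact for the LLM gateway's model endpoint (used later for monitoring):
project.log_model(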
    "nim-gateway-model",
    model_file="data/my_dummy_model.pkl",
    inputs=[Feature(value_type="str", name="question")],
    outputs=[Feature(value_type="str", name="answer")],
)

<mlrun.artifacts.model.ModelArtifact at 0x7fdc16fe6520>

___
<a id="section_1"></a>
## 1. How to Deploy a NIM with MLRun?

In this section we will learn about NVIDIA NIM and how to use MLRun's application runtime to deploy it.

### 1.1. What is a NIM?

NVIDIA NIM™, part of NVIDIA AI Enterprise, provides containers to self-host GPU-accelerated inferencing microservices for pretrained and customized AI models across clouds, data centers, and workstations. NIM microservices expose industry-standard APIs for simple integration into AI applications, development frameworks, and workflows. Built on pre-optimized inference engines from NVIDIA and the community, including NVIDIA® TensorRT™ and TensorRT-LLM, NIM microservices automatically optimize response latency and throughput for each combination of foundation model and GPU system detected at runtime. NIM containers also provide standard observability data feeds and built-in support for autoscaling on Kubernetes on GPUs.

### 1.2. MLRun's NIM Application

First, choose the NIM you wish to use. You can see the available NIMs [here](https://docs.nvidia.com/nim/large-language-models/latest/models.html). In this demo, we will use a `meta/llama3-8b-instruct` NIM. 

In [4]:
MODEL_NAME = "meta/llama3-8b-instruct"

To deploy a NIM using MLRun, we will use MLRun's `NIMApplication`. `NIMApplication` is a wrapper that creates an application runtime to deploy a NIM as a serverless Nuclio function. The application runtime deploys a container image (the NIM image) that is exposed on a specific port. The runtime is built on top of Nuclio: it adds the application as a sidecar to a Nuclio function pod, while the function itself acts as a reverse proxy to that application. To learn more about application runtimes, see the [documentation](https://docs.mlrun.org/en/latest/runtimes/application.html#application).

In [5]:
nim_application = NIMApplication(
    name="my-nim-5",
    model_name=MODEL_NAME,
    ngc_api_key=NGC_API_KEY,
)

> 2024-11-07 09:29:23,793 [info] Found an existing application. Status: {"state": "ready", "internal_invocation_urls": ["nuclio-mlrun-nvidia-nim-demo-edmond-my-nim-5.default-tenant.svc.cluster.local:8080"], "application_image": "nvcr.io/nim/meta/llama3-8b-instruct:latest", "external_invocation_urls": ["mlrun-nvidia-nim-demo-edmond-my-nim-5-gw.default-tenant.app.llm5.iguazio-cd1.com"], "nuclio_name": "mlrun-nvidia-nim-demo-edmond-my-nim-5", "container_image": "docker-registry.default-tenant.app.llm5.iguazio-cd1.com/nuclio/mlrun-nvidia-nim-demo-edmond-mlrun-nvidia-nim-demo-edmond-my-nim-5-processor:latest", "sidecar_name": "my-nim-5-sidecar", "address": "mlrun-nvidia-nim-demo-edmond-my-nim-5-gw.default-tenant.app.llm5.iguazio-cd1.com"}


### 1.3. Deploy and Test

Using the application's `deploy` and `invoke` methods, you can easily deploy and use the NIM.

In [6]:
nim_application.deploy(force_redeploy=False, 
                       application_node_selection=GPU_NODE_GROUP)

> 2024-11-07 08:38:01,171 [error] Failed to execute command 'kubectl delete secret mlrun-nvidia-nim-demo-edmond-my-nim-5-nim-creds': Error from server (Forbidden): secrets "mlrun-nvidia-nim-demo-edmond-my-nim-5-nim-creds" is forbidden: User "system:serviceaccount:default-tenant:jupyter-edmond-job-executor" cannot delete resource "secrets" in API group "" in the namespace "default-tenant"
> 2024-11-07 08:38:01,368 [error] Failed to execute command 'kubectl delete secret mlrun-nvidia-nim-demo-edmond-my-nim-5-ngc-api-key': Error from server (Forbidden): secrets "mlrun-nvidia-nim-demo-edmond-my-nim-5-ngc-api-key" is forbidden: User "system:serviceaccount:default-tenant:jupyter-edmond-job-executor" cannot delete resource "secrets" in API group "" in the namespace "default-tenant"
> 2024-11-07 08:38:01,481 [info] Starting remote function deploy
2024-11-07 08:38:01  (info) Deploying function
2024-11-07 08:38:01  (info) Building
2024-11-07 08:38:01  (info) Staging files and preparing base imag

Wait about 7-8 minutes for the sidecar to load, and then try it.
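Instead of waiting a fixed amount of time, you can poll the NIM until it answers. A minimal sketch of a generic polling helper; `wait_until_ready` is a hypothetical name, and the probe in the usage comment assumes `invoke` raises or returns a non-200 `requests`-style response while the model is still loading (the `result.text` attribute used below suggests this, but it is not confirmed here):

```python
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], bool], timeout_s: float = 600.0, interval_s: float = 15.0) -> bool:
    """Call `probe` until it returns True or `timeout_s` elapses.

    `probe` should return True once the service answers. Any exception it raises
    (e.g. connection refused while the sidecar loads) is treated as "not ready yet".
    Returns True when ready, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except Exception:
            pass  # not ready yet, keep polling
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Hypothetical probe against the deployed NIM:
# ready = wait_until_ready(
#     lambda: nim_application.invoke(messages="ping", max_tokens=1).status_code == 200
# )
```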

In [7]:
result = nim_application.invoke(
    messages="What is the capital of Great Britain?", 
    max_tokens=128
)

result.text

'{"id":"cmpl-f1f7e1716ba6488eb2858624d1fb72b8","object":"chat.completion","created":1730969445,"model":"meta/llama3-8b-instruct","choices":[{"index":0,"message":{"role":"assistant","content":"The capital of the United Kingdom of Great Britain and Northern Ireland, commonly referred to as Great Britain, is London. However, it\'s important to note that Great Britain is actually a geographical term that refers to the island that comprises England, Scotland, and Wales. The capital of Scotland is Edinburgh, the capital of Wales is Cardiff, and the capital of England is still London."},"logprobs":null,"finish_reason":"stop","stop_reason":128009}],"usage":{"prompt_tokens":18,"total_tokens":94,"completion_tokens":76}}'
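`result.text` is the raw JSON body returned by the NIM's OpenAI-compatible `/v1/chat/completions` endpoint (the same path appears in the gateway's invocation log later in this demo). A small sketch for pulling out the answer and the token usage; `parse_chat_completion` is a hypothetical helper, and the sample in the usage line is a trimmed version of the output above:

```python
import json


def parse_chat_completion(raw: str) -> tuple[str, dict]:
    """Return (assistant message content, token usage) from an OpenAI-style chat completion JSON string."""
    payload = json.loads(raw)
    content = payload["choices"][0]["message"]["content"]
    usage = payload.get("usage", {})
    return content, usage


sample = '{"object":"chat.completion","choices":[{"index":0,"message":{"role":"assistant","content":"London."},"finish_reason":"stop"}],"usage":{"prompt_tokens":18,"total_tokens":20,"completion_tokens":2}}'
answer, usage = parse_chat_completion(sample)
print(answer, usage["total_tokens"])  # prints: London. 20
```

In the notebook, the same call would be `parse_chat_completion(result.text)`.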

___
<a id="section_2"></a>
## 2. How to Use a NIM?

Beyond invoking the `NIMApplication` instance directly, we want to use the NIM in a concrete use case. For that, we'll use the well-known framework [LangChain](https://www.langchain.com/), a framework for building AI logic with LLMs by chaining interoperable components.

### 2.1. Intent Classification Example

Our multi-agent chatbot is composed of 3 agents:
* Loan agent - answers loan-related questions and questions about the related company policies
* Investment agent - answers investment-related questions and questions about the related company policies
* General agent - handles general conversation and customer service

We need a classifier that decides which agent should receive each user request. This is called an intent classifier. Here is a simple example of one, built with LangChain and NVIDIA's LangChain integration:

In [6]:
classifier_system_prompt = """
You are a helpful AI classifier. Given a request, classify the request to the most relevant category: [loans, investments, other].
Please answer in only one word.

For example:
* human: What is a mortgage? - AI: loans
* human: What stock should I buy? - AI: investments
* human: How far is the moon? - AI: other
* human: Hi - AI: other
"""

In [7]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_nvidia_ai_endpoints.llm import NVIDIA

nim_llm = NVIDIA(
    base_url=f"https://{nim_application.get_url()}",
    model=MODEL_NAME,
    max_tokens=1,
)
classifier_prompt_template = ChatPromptTemplate(
    [
        ("system", classifier_system_prompt),
        ("human", "The request: {request} - AI: "),
    ]
)
classifier_chain = classifier_prompt_template | nim_llm

### 2.2. Test

We can now test our NIM as part of a realistic chatbot use case:

> Note: Depending on the NIM selected, the prompt may need to be adapted to it. In addition, the answers may need some post-processing to remove redundant tokens (such as the leading space in the outputs below).

In [8]:
classifier_chain.invoke(
    {
        "request": "I need a 250k USD to open a restaurant. What options do I have?",
    },
)

' loans'

In [9]:
classifier_chain.invoke(
    {
        "request": "I want to buy ETFs. What options do I have?",
    },
)

' investments'
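The outputs above carry a leading space, and with `max_tokens=1` the model could also return a truncated or unexpected token. A minimal sketch of post-processing the raw output and routing the request to an agent, assuming the three categories from the prompt and falling back to the general agent for anything unrecognized. The agents here are hypothetical placeholder lambdas, not the demo's real agents:

```python
from typing import Callable

ALLOWED_INTENTS = ("loans", "investments", "other")


def normalize_intent(raw: str, allowed: tuple = ALLOWED_INTENTS, fallback: str = "other") -> str:
    """Strip whitespace, case and trailing punctuation, then map to a known intent (or the fallback)."""
    cleaned = raw.strip().lower().strip(".,:;!\"'")
    return cleaned if cleaned in allowed else fallback


def route(raw_intent: str, agents: dict[str, Callable[[str], str]], request: str) -> str:
    """Send the request to the agent registered for the normalized intent."""
    return agents[normalize_intent(raw_intent)](request)


# Placeholder agents standing in for the real loan, investment and general agents.
agents = {
    "loans": lambda q: f"[loan agent] {q}",
    "investments": lambda q: f"[investment agent] {q}",
    "other": lambda q: f"[general agent] {q}",
}
print(route(" loans", agents, "I need 250k USD to open a restaurant."))
```

Falling back to `other` keeps an unexpected classifier output from crashing the chatbot; the general agent can then handle the request or ask the user to clarify.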

___
<a id="section_3"></a>
## 3. How to Operationalize a NIM using MLRun?

As an orchestrator, MLRun provides MLOps best practices for every layer of a GenAI application, operationalizing data, code and models, including NIMs!

In this section we'll cover how easily **MLRun integrates into existing code that uses NIMs and enables MLOps features** such as automatic scaling, monitoring and modularity.

### 3.1. Add an LLM Gateway

The MLRun LLM gateway is a great way to enable LLM modularity and monitoring.
* Modularity - The gateway offers a seamless transition between models and vendors, so you can use OpenAI models and NIMs with the same code and API. Changing models is just a configuration change.
* Monitoring - The gateway enables monitoring at different levels: a specific use case, a specific model, an LLM provider in general, and higher levels for general usage monitoring such as latency, throughput and memory. All of this is done simply by using labels.

<img src="./images/llm_gateway.png" style="width: 1200px;"/>
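To make the two ideas concrete, here is a toy, in-process illustration of the gateway pattern: a single entry point, pluggable backends keyed by model name, and metrics recorded per label. It is not MLRun's implementation; `ToyLLMGateway` and its methods are hypothetical, and the backends are placeholder lambdas:

```python
import time
from collections import defaultdict
from typing import Callable


class ToyLLMGateway:
    """Illustrative gateway: one entry point, swappable backends, per-label latency records."""

    def __init__(self) -> None:
        self._backends: dict[str, Callable[[str], str]] = {}
        self.latencies: dict[str, list[float]] = defaultdict(list)

    def register(self, model_name: str, backend: Callable[[str], str]) -> None:
        # Swapping vendors means registering a different backend under a model name.
        self._backends[model_name] = backend

    def invoke(self, model_name: str, prompt: str, label: str = "default") -> str:
        start = time.perf_counter()
        answer = self._backends[model_name](prompt)
        elapsed = time.perf_counter() - start
        # Record latency per use-case label and per model, so both levels can be monitored.
        self.latencies[label].append(elapsed)
        self.latencies[f"model:{model_name}"].append(elapsed)
        return answer


gateway = ToyLLMGateway()
gateway.register("nim", lambda prompt: "loans")      # placeholder for a NIM backend
gateway.register("openai", lambda prompt: "loans")   # placeholder for an OpenAI backend
print(gateway.invoke("nim", "What is a mortgage?", label="intent_classifier"))
```

Application code only ever calls `invoke`, so switching from one backend to the other is a matter of changing the `model_name` argument.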

In this demo, we will add an LLM gateway to use the deployed NIM and to allow monitoring it later. The gateway is a simple serving function of a `V2ModelServer`. To read more about model servers and serving functions, click [here](https://docs.mlrun.org/en/latest/serving/custom-model-serving-class.html):

In [42]:
serving_function = project.get_function("nim-gateway")

serving_function.add_model(
    "nim_gateway",
    llm="langchain_nvidia_ai_endpoints.llm.NVIDIA",
    class_name="LangChainModelServer",
    init_kwargs={
        "base_url": f"https://{nim_application.get_url()}",
        "model_name": MODEL_NAME,
        "max_tokens": 1,
    },
    model_path=f"store://models/{project.name}/nim-gateway-model#0:latest",
    project=project.name
)

<mlrun.serving.states.TaskStep at 0x7fdbc617a5e0>

To test it, we will use the `to_mock_server` method:

In [43]:
mock_server = serving_function.to_mock_server()

> 2024-11-07 10:43:42,134 [info] Project loaded successfully: {"project_name":"mlrun-nvidia-nim-demo-edmond"}
> 2024-11-07 10:43:42,156 [info] model nim_gateway was loaded
> 2024-11-07 10:43:42,157 [info] Loaded ['nim_gateway']


In [44]:
mock_server.test(
    path="/v2/models/nim_gateway/infer",
    body={"inputs": ["hello"]},
)

> 2024-11-07 10:43:42,635 [info] Invoking function: {"method":"POST","path":"http://nuclio-mlrun-nvidia-nim-demo-edmond-my-nim-5.default-tenant.svc.cluster.local:8080/v1/chat/completions"}


{'id': '36a0ed190fe44f1d8b822f1b349d1b4c',
 'model_name': 'nim_gateway',
 'outputs': {'id': 'cmpl-54870ae07d3648dc81e0db69bdcdf217',
  'object': 'chat.completion',
  'created': 1730976222,
  'model': 'meta/llama3-8b-instruct',
  'choices': [{'index': 0,
    'message': {'role': 'assistant',
     'content': "Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?"},
    'logprobs': None,
    'finish_reason': 'stop',
    'stop_reason': 128009}],
  'usage': {'prompt_tokens': 11, 'total_tokens': 37, 'completion_tokens': 26}},
 'timestamp': '2024-11-07 10:43:42.634459+00:00'}
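The gateway wraps the NIM's chat completion under `outputs`, next to the request `id`, `model_name` and `timestamp`. Based on the response shape printed above (assumed to be the same for every request), a hypothetical helper to get the answer text:

```python
def extract_gateway_answer(response: dict) -> str:
    """Pull the assistant text out of a V2 model-server response that wraps a chat completion in `outputs`."""
    completion = response["outputs"]
    return completion["choices"][0]["message"]["content"]


# Trimmed copy of the mock server response shown above.
sample_response = {
    "id": "36a0ed190fe44f1d8b822f1b349d1b4c",
    "model_name": "nim_gateway",
    "outputs": {
        "object": "chat.completion",
        "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hello!"}}],
    },
}
print(extract_gateway_answer(sample_response))  # prints: Hello!
```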

### 3.2. Add a Monitoring Application - LLM as a Judge

MLRun's monitoring service includes built-in model monitoring and reporting capabilities. It runs **monitoring applications** on recorded data, which provide analysis of:
* Continuous Assessment
* Model performance
* Data drift
* Concept drift
* Operational performance
* And more

To learn more about MLRun monitoring and see how you can write your own monitoring applications, click [here](https://docs.mlrun.org/en/latest/concepts/model-monitoring.html) 

In this demo, we'll use MLRun's **LLM as a Judge monitoring application** to monitor our intent classifier use case. First we'll enable model monitoring:

In [20]:
project.enable_model_monitoring(
    image="mlrun/mlrun",
    base_period=1,  # frequency (in minutes) at which the monitoring applications are triggered
    deploy_histogram_data_drift_app=False,
)



Then, we'll configure the LLM judge:

> To learn more about prompting the judge, follow the [LLM monitoring demo](https://github.com/mlrun/demo-llm-monitor/tree/main).

In [10]:
classifier_judge_config = {
    "name": "Classifier Judge",
    "definition": """
The Classifier Judge is a metric used to evaluate whether a model correctly classifies requests from a user into one of 3 classes.
The classes are [loans, investments, other]
""",
    "rubric": """
Classifier Judge: The details for different scores are as follows:
    - Score 0: Incorrect - The model classified the request incorrectly.
    - Score 1: Correct - The model correctly classified the request.
""",
    "examples": """
Question: What is the process to apply for a mortgage?
    Score 0: Incorrect
    Answer: "other"
    Score 1: Correct
    Answer: "loans"
Question: Hi, how are you today?
    Score 0: Incorrect
    Answer: "investments"
    Score 1: Correct
    Answer: "other"
""",
}
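MLRun's judge application builds its own prompt internally. The following is only an illustration, not its actual template, of how the `name`, `definition`, `rubric` and `examples` fields could be combined into a single-grading prompt, and how a 0/1 score could be parsed from the judge's reply. `build_judge_prompt`, `parse_score` and the "Score: N" reply format are hypothetical:

```python
import re
from typing import Optional


def build_judge_prompt(config: dict, question: str, answer: str) -> str:
    """Assemble a single-grading judge prompt from a config with name/definition/rubric/examples."""
    return (
        f"You are a judge evaluating: {config['name']}.\n"
        f"Definition:\n{config['definition'].strip()}\n\n"
        f"Rubric:\n{config['rubric'].strip()}\n\n"
        f"Examples:\n{config['examples'].strip()}\n\n"
        f"Question: {question}\nAnswer: {answer}\n"
        "Reply with 'Score: <0 or 1>' followed by a short explanation."
    )


def parse_score(judge_reply: str) -> Optional[int]:
    """Extract the first 'Score: N' from the judge's reply, or None if there is none."""
    match = re.search(r"score\s*:\s*(\d+)", judge_reply, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


# Hypothetical usage with the config above:
# prompt = build_judge_prompt(classifier_judge_config, "What is a mortgage?", "loans")
```

Returning `None` when no score is found lets a caller skip or retry malformed judge replies instead of counting them as failures.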

In [22]:
monitoring_application = project.set_model_monitoring_function(
    func="src/llm_as_a_judge.py",
    application_class="LLMAsAJudgeApplication",
    name="llm-as-a-judge",
    image="gcr.io/iguazio/llm-as-a-judge:1.7.0",
    framework="openai",
    judge_type="single-grading",
    metric_name="classifier_judge",
    model_name="gpt-4",
    prompt_config=classifier_judge_config,
)

Now, we'll deploy the monitoring application and the gateway:

In [23]:
project.deploy_function(monitoring_application)

> 2024-11-07 08:57:31,180 [info] Starting remote function deploy
2024-11-07 08:57:31  (info) Deploying function
2024-11-07 08:57:31  (info) Building
2024-11-07 08:57:31  (info) Staging files and preparing base images
2024-11-07 08:57:31  (warn) Using user provided base image, runtime interpreter version is provided by the base image
2024-11-07 08:57:31  (info) Building processor image
2024-11-07 09:03:37  (info) Build complete
2024-11-07 09:06:17  (info) Function deploy complete
> 2024-11-07 09:06:25,613 [info] Successfully deployed function: {"external_invocation_urls":[],"internal_invocation_urls":["nuclio-mlrun-nvidia-nim-demo-edmond-llm-as-a-judge.default-tenant.svc.cluster.local:8080"]}


DeployStatus(state=ready, outputs={'endpoint': 'http://nuclio-mlrun-nvidia-nim-demo-edmond-llm-as-a-judge.default-tenant.svc.cluster.local:8080', 'name': 'mlrun-nvidia-nim-demo-edmond-llm-as-a-judge'})

In [45]:
serving_function.set_tracking()
serving_function.deploy()

> 2024-11-07 10:43:51,832 [info] Starting remote function deploy
2024-11-07 10:43:52  (info) Deploying function
2024-11-07 10:43:52  (info) Building
2024-11-07 10:43:52  (info) Staging files and preparing base images
2024-11-07 10:43:52  (warn) Using user provided base image, runtime interpreter version is provided by the base image
2024-11-07 10:43:52  (info) Building processor image
2024-11-07 10:45:38  (info) Build complete
2024-11-07 10:46:34  (info) Function deploy complete
> 2024-11-07 10:46:43,682 [info] Successfully deployed function: {"external_invocation_urls":["mlrun-nvidia-nim-demo-edmond-nim-gateway.default-tenant.app.llm5.iguazio-cd1.com/"],"internal_invocation_urls":["nuclio-mlrun-nvidia-nim-demo-edmond-nim-gateway.default-tenant.svc.cluster.local:8080"]}


'http://mlrun-nvidia-nim-demo-edmond-nim-gateway.default-tenant.app.llm5.iguazio-cd1.com/'

### 3.3. Intent Classification Example

Now, looking back at our code from before, all we need to do to use the MLRun LLM Gateway is to replace the `NVIDIA` LLM instance with an `MLRun` LLM instance:

In [11]:
serving_function = project.get_function("nim-gateway")

In [12]:
mlrun_llm = MLRun(
    mlrun_function=serving_function,
    model_name="nim_gateway",
    label="intent_classifier",
)
classifier_chain = classifier_prompt_template | mlrun_llm

### 3.4. Test

We'll run the chain over a labeled dataset and measure its accuracy against the ground truth, so we can compare it with the scores our judge gives the NIM-based use case.

In [None]:
import pandas as pd
import logging
from tqdm.auto import tqdm

logging.disable(logging.CRITICAL)
dataset = pd.read_csv("data/banking-intents.csv")
result = 0

for _, row in tqdm(dataset.iterrows(), total=len(dataset)):
    request = row["prompt"]
    label = row["label"].lower().strip()
    prediction = classifier_chain.invoke(
        {
            "request": request,
        },
    )["choices"][0]["message"]["content"].lower().strip()
    result += label == prediction

print(f"Accuracy: {result / len(dataset)}")
logging.disable(logging.NOTSET)
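A single accuracy number can hide a class the classifier struggles with. A small sketch of per-class accuracy, assuming you collect the `label` and `prediction` values into two lists inside the loop above (the loop itself only keeps a running count); `per_class_accuracy` is a hypothetical helper:

```python
from collections import Counter


def per_class_accuracy(labels: list[str], predictions: list[str]) -> dict[str, float]:
    """Accuracy computed separately for each ground-truth class."""
    totals = Counter(labels)
    correct = Counter(label for label, pred in zip(labels, predictions) if label == pred)
    return {cls: correct[cls] / totals[cls] for cls in totals}


print(per_class_accuracy(["loans", "loans", "other"], ["loans", "other", "other"]))
# prints: {'loans': 0.5, 'other': 1.0}
```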

### 3.5. Review Monitoring

Looking at the Grafana dashboard of our monitoring application, we can see that the judge scored the intent classifier about the same as the accuracy we measured against the ground truth!

<img src="./images/monitoring_screen.png" style="width: 1200px;"/>

___
<a id="section_4"></a>
## 4. MLRun's GenAI Factory (WIP)

Please note: MLRun's GenAI Factory is currently under development, and is therefore unstable and not ready for production. We can still show how it is expected to be used, but details might change, so keep an eye on the project for updates.

MLRun's GenAI Factory is a flexible and scalable platform extension for building generative AI applications. It features a modular architecture, allowing users to easily swap out different LLM models and storage solutions as needed while keeping MLOps best practices in check including features like experiment tracking, versioning, monitoring and more.

### 4.1. Intent Classification Example

We'll now show how to turn our intent classifier into a **reusable component** with **configurable and trackable** inputs using a GenAI Factory step. All that's needed is to inherit from the step class `ChainRunner` and move our logic into its `_run` method. Every component we want to be configurable can be set via the class's `__init__` method.

So in our example, we want the classes the classifier chooses from to be configurable via the `classifier_classes` argument, and of course we want the `llm` and `prompt_template` to be configurable as well.

Both models and prompts are stored in the GenAI Factory and are versioned and auto-tracked.

In [None]:
from langchain_core.language_models.llms import LLM
from langchain_core.prompts.prompt import PromptTemplate

from genai_factory.chains.base import ChainRunner
from genai_factory.schemas import WorkflowEvent

_classifier_prompt_template = """
You are a helpful AI classifier, given the following conversation and a follow up request,
and a list of possible categories, classify the follow up request to the most relevant class.
Chat History: {chat_history}
request: {question}
classify classes: {classifier_classes}
Answer in one word, the class of the request.
"""


class IntentClassifier(ChainRunner):
    def __init__(
        self,
        llm: LLM = None,
        prompt_template: str = None,
        classifier_classes: list[str] = None,
        **kwargs,
    ):
        super().__init__(**kwargs)
        
        # Store configurables:
        self.llm = llm
        self.classifier_classes = classifier_classes
        self.prompt_template = prompt_template or _classifier_prompt_template
        
        # Create the chain:
        prompt_template = PromptTemplate.from_template(self.prompt_template)
        self._chain = prompt_template | self.llm

    def _run(self, event: WorkflowEvent) -> dict:
        response = self._chain.invoke(
            {
                "question": event.query,
                "chat_history": str(event.conversation),
                "classifier_classes": self.classifier_classes,
            }
        )
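        # Chat models return a message with `.content`; plain LLMs return a string.
        answer = getattr(response, "content", response)
        return {
            "answer": answer,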
            "sources": ""
        }
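One detail worth knowing: `PromptTemplate.from_template` uses f-string formatting by default, so a Python list passed as `classifier_classes` (as in the workflow later in this section) is substituted via `str()`, brackets and quotes included. A small illustration, using plain `str.format` as a stand-in for the template substitution and a one-line trimmed copy of the prompt above:

```python
CLASSES_LINE = "classify classes: {classifier_classes}"
classes = ["loans", "investments", "other"]

# Passing the list directly renders its Python repr.
as_list = CLASSES_LINE.format(classifier_classes=classes)
# Joining first renders plain text.
as_text = CLASSES_LINE.format(classifier_classes=", ".join(classes))

print(as_list)  # prints: classify classes: ['loans', 'investments', 'other']
print(as_text)  # prints: classify classes: loans, investments, other
```

Either rendering may work well enough with a capable model; joining the classes just keeps the prompt closer to natural language. This is a suggestion, not something the GenAI Factory requires.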

### 4.2. Agents Example

We can now create a multi-agent chatbot using the GenAI Factory. We'll create 3 agents, each with its own tools and expertise.<br> The agents' code is available in the demo's code. Here we'll show the Loan Agent as an example, using LangChain and OpenAI:

In [None]:
from langchain.agents import AgentExecutor
from langchain.agents.format_scratchpad.openai_tools import (
    format_to_openai_tool_messages,
)
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser
from langchain.tools.retriever import create_retriever_tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

from genai_factory.chains.base import ChainRunner
from genai_factory.chains.retrieval import MultiRetriever


class LoanAgent(ChainRunner):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._llm = None
        self.agent = None
        self.retriever = None

    @property
    def llm(self):
        if not self._llm:
            self._llm = ChatOpenAI(model="gpt-4", temperature=0.5)
        return self._llm

    def _get_agent(self):
        if self.agent:
            return self.agent
        # Create the RAG tools 
        # We use an existing step called MultiRetriever to create a data retriever tool, that is already connected to the DB 
        retriever = MultiRetriever(default_collection="default", context=self.context)
        retriever.post_init()
        self.retriever = retriever._get_retriever("default")
        data_retriever_tool = create_retriever_tool(
            self.retriever.chain.retriever.vectorstore.as_retriever(),
            "loan-data-retriever",
            "Query a retriever to get information about the bank's loan options.",
        )
        tools = [data_retriever_tool]
        llm_with_tools = self.llm.bind_tools(tools)
        # Create the prompt template 
        prompt = ChatPromptTemplate.from_messages(
            [
                (
                    "system",
                    TOOL_PROMPT,
                ),
                ("user", "{input}"),
                MessagesPlaceholder(variable_name="agent_scratchpad"),
            ]
        )
        # Create the agent
        agent = (
            {
                "input": lambda x: x["input"],
                "agent_scratchpad": lambda x: format_to_openai_tool_messages(
                    x["intermediate_steps"]
                ),
            }
            | prompt
            | llm_with_tools
            | OpenAIToolsAgentOutputParser()
        )
        return AgentExecutor(
            agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
        )

    def _run(self, event):
        """ Run the agent on the given event """
        self.agent = self._get_agent()
        response = list(self.agent.stream({"input": event.query}))
        answer = response[-1]["messages"][-1].content
        return {"answer": answer, "sources": ""}


TOOL_PROMPT = str(
    "You are a helpful AI banking assistant that can provide information about the bank's loans and answer loan related questions. "
    "If the query is related to bank loan policy, try to answer it using the data retriever. "
    "If you can't find the answer, make something up. "
    "Do not answer any questions that are not related to the bank, or that are inappropriate."
)


### 4.3. Application Workflow

The rest of the workflow includes the 3 agents described above, each with its own tools and expertise (their code is available in the demo's code). We'll construct the workflow via the following script:

<img src="./images/workflow_sketch.png" style="width: 800px;"/>

In [None]:

from genai_factory.chains.base import HistorySaver, SessionLoader
from genai_factory.chains.refine import RefineQuery
from genai_factory import workflow_server

from classifier_llm import Classifier
from loans_agent import LoanAgent
from investment_agent import InvestmentAgent
from general_agent import GeneralAgent
from choice_intent import MyChoice

# We have 3 intent categories
intent_categories = ["loans", "investments", "other"]

# This is the workflow graph of our application
workflow_graph = [
    SessionLoader(),
    RefineQuery(),
    Classifier(classifier_classes=intent_categories),
    MyChoice(),
    (
        GeneralAgent(),
        LoanAgent(),
        InvestmentAgent(),
    ),
    HistorySaver(),
]

# Add our workflow to the server to be deployed 
workflow_server.add_workflow(
    name="default",
    graph=workflow_graph,
    workflow_type="application",
)
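To clarify the control flow we assume the graph above describes (steps run in order, and the tuple of agents acts as a branch where only the agent selected by the preceding choice step runs), here is a plain-Python sketch. It is not how `genai_factory` executes workflows internally; every step function below is a hypothetical stand-in, with a keyword classifier in place of the LLM:

```python
def classify(event: dict) -> dict:
    # Hypothetical keyword classifier standing in for the LLM-based Classifier step.
    query = event["query"].lower()
    if "loan" in query:
        event["intent"] = "loans"
    elif "invest" in query or "etf" in query:
        event["intent"] = "investments"
    else:
        event["intent"] = "other"
    return event


def choose(event: dict) -> dict:
    # Stand-in for MyChoice: map the intent to the name of the agent that should run.
    routes = {"loans": "loan_agent", "investments": "investment_agent"}
    event["route"] = routes.get(event["intent"], "general_agent")
    return event


def general_agent(event: dict) -> dict:
    event["answer"] = f"[general agent] {event['query']}"
    return event


def loan_agent(event: dict) -> dict:
    event["answer"] = f"[loan agent] {event['query']}"
    return event


def investment_agent(event: dict) -> dict:
    event["answer"] = f"[investment agent] {event['query']}"
    return event


def save_history(event: dict) -> dict:
    # Stand-in for HistorySaver: append the answer to the session history.
    event.setdefault("history", []).append(event["answer"])
    return event


def run_graph(graph: list, event: dict) -> dict:
    """Run steps in order; a tuple is a branch where only the agent named in event["route"] runs."""
    for step in graph:
        if isinstance(step, tuple):
            branch = {agent.__name__: agent for agent in step}
            step = branch[event["route"]]
        event = step(event)
    return event


graph = [classify, choose, (general_agent, loan_agent, investment_agent), save_history]
print(run_graph(graph, {"query": "I need a loan to open a restaurant"})["answer"])
# prints: [loan agent] I need a loan to open a restaurant
```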

### 4.4. Running the GenAI factory

As noted at the beginning of this section, MLRun's GenAI Factory is still under development and therefore unstable. For those of you who want to give it a try,<br> please visit the designated GitHub repo and try to run the "Quickstart" example first. <br>
GenAI factory: https://github.com/mlrun/genai-factory