# Data Setup

The raw review CSV for Boat products is stored in a volume, where it can be uploaded easily from the UI.

In [0]:
# Dataframe
import pandas as pd

# Load Dataset
boat=pd.read_csv('/Volumes/techsummit25/boat_reviews/raw_data/BoatProduct.csv')

display(boat)


Next, we clean up the raw data.

In [0]:
# Cleaning Price Column
new = boat["ProductPrice"].str.split(" ", n = 1, expand = True)
boat["Price"]= new[1]
boat['Price']=boat['Price'].str.replace('price₹', '')
boat['Price']=boat['Price'].str.replace(',', '').astype('float64')

# Cleaning Discount Column
new = boat["Discount"].str.split(" ", n = 1, expand = True)
boat["Disc"]= new[0]

# Cleaning no of reviews column
new = boat["NumberofReviews"].str.split(" ", n = 1, expand = True)
boat["Review"]= new[0]
# Convert to numeric; values that cannot be parsed become NaN
boat['Review']=pd.to_numeric(boat['Review'],errors='coerce')

# Cleaning Rate column
new = boat["Rate"].str.split(" ", n = 1, expand = True)
boat["Rate"]= new[1]
boat["Rate"]=boat["Rate"].astype('float64')

# Dropping the raw columns that were replaced by cleaned versions
boat.drop(["ProductPrice","Discount","NumberofReviews"],axis=1,inplace=True)

display(boat)
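
To make the clean-up steps easier to follow, here is a minimal sketch that traces the same string operations on single values in plain Python. The helper names `clean_price` and `leading_token` and the sample strings are assumptions for illustration; the real CSV values may be formatted differently.

```python
def clean_price(raw: str) -> float:
    """Mirror the notebook's price clean-up for a single value."""
    # Split once on the first space and keep the second part,
    # as `str.split(" ", n=1, expand=True)[1]` does in pandas.
    tail = raw.split(" ", 1)[1]
    # Strip the 'price₹' label and the thousands separators.
    return float(tail.replace("price₹", "").replace(",", ""))


def leading_token(raw: str) -> str:
    """Keep the text before the first space, as done for Discount and NumberofReviews."""
    return raw.split(" ", 1)[0]


# Hypothetical raw values, not taken from the dataset.
print(clean_price("Sale price₹1,299"))  # 1299.0
print(leading_token("60% off"))         # 60%
```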


Loading the cleaned data into a Delta table

In [0]:
# Convert pandas dataframe to spark dataframe
boat_spark = spark.createDataFrame(boat)

# Save the spark dataframe to a Delta table
boat_spark.write.format("delta").mode("overwrite").saveAsTable("techsummit25.boat_reviews.boatproductdelta")

# Remove rows where the ProductName column holds the literal header text 'ProductName'
spark.sql("DELETE FROM techsummit25.boat_reviews.boatproductdelta WHERE ProductName = 'ProductName'")

# Display the Delta table
display(spark.sql("SELECT * FROM techsummit25.boat_reviews.boatproductdelta"))

Creating a SQL function to get the average number of reviews for a product

In [0]:
%sql
CREATE OR REPLACE FUNCTION techsummit25.boat_reviews.avg_score(p STRING) RETURNS FLOAT
    COMMENT 'Get the average number of reviews for a product'
    RETURN SELECT round(AVG(Review)) FROM techsummit25.boat_reviews.boatproductdelta WHERE Lower(ProductName) like CONCAT('%', lower(p), '%');
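
The matching behavior of `avg_score` can be sketched in plain Python: a case-insensitive substring match on the product name, followed by a rounded average that skips missing values. The rows below are hypothetical stand-ins for the Delta table, and the Python function is only an illustration, not the SQL function itself.

```python
import math
from typing import Optional

# Hypothetical (ProductName, Review) rows standing in for the Delta table.
rows = [
    ("Stone Grenade Speaker", 90),
    ("stone grenade (Black)", 95),
    ("Other Speaker", 40),
    ("Stone Grenade XL", None),  # NULL review counts are ignored by AVG
]


def avg_score(p: str) -> Optional[float]:
    """Average Review over rows whose name contains `p`, ignoring case."""
    needle = p.lower()
    values = [
        review
        for name, review in rows
        if needle in name.lower() and review is not None
    ]
    if not values:
        return None  # SQL AVG over no rows yields NULL
    mean = sum(values) / len(values)
    # Spark's round() rounds halves away from zero; Python's round() does not,
    # so emulate it for non-negative values with floor(x + 0.5).
    return float(math.floor(mean + 0.5))


print(avg_score("STONE GRENADE"))  # 93.0
print(avg_score("no such item"))   # None
```

One difference to keep in mind: SQL `LIKE` treats `%` and `_` inside `p` as wildcards, whereas the substring test in this sketch treats them literally.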

Creating a SQL function to get the review comments for a product

In [0]:
%sql
CREATE OR REPLACE FUNCTION techsummit25.boat_reviews.comments(p STRING) RETURNS TABLE (comments STRING)
    COMMENT 'Get all review comments for a product'
    RETURN SELECT summary FROM techsummit25.boat_reviews.boatproductdelta WHERE Lower(ProductName) like CONCAT('%', lower(p), '%');

# Agent Creation

This part of the notebook was created by an AI Playground export.

This notebook uses Mosaic AI Agent Framework ([AWS](https://docs.databricks.com/en/generative-ai/retrieval-augmented-generation.html) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/generative-ai/retrieval-augmented-generation)) to recreate your agent from the AI Playground. It defines a LangChain agent that has access to the tools from the source Playground session.

Use this notebook to iterate on and modify the agent. For example, you could add more tools or change the system prompt.

 **_NOTE:_**  This notebook uses LangChain; however, AI Agent Framework is compatible with other agent frameworks such as Pyfunc and LlamaIndex.



In [0]:
%pip install -U -qqqq databricks-agents mlflow langchain==0.2.16 langchain_core langchain-community==0.2.16 langgraph==0.2.16 pydantic
dbutils.library.restartPython()

## Import and setup

Use `mlflow.langchain.autolog()` to set up [MLflow traces](https://docs.databricks.com/en/mlflow/mlflow-tracing.html).

In [0]:
import mlflow
from mlflow.models import ModelConfig

mlflow.langchain.autolog()
config = ModelConfig(development_config="boat_agent_config.yml")

## Define the chat model and tools
Create a LangChain chat model that supports [LangGraph tool calling](https://langchain-ai.github.io/langgraph/how-tos/tool-calling/).

Modify the tools your agent has access to by modifying the `uc_functions` list in [config.yml]($./boat_agent_config.yml). Any non-UC function spec tools can be defined in this notebook. See [LangChain - How to create tools](https://python.langchain.com/v0.2/docs/how_to/custom_tools/) and [LangChain - Using built-in tools](https://python.langchain.com/v0.2/docs/how_to/tools_builtin/).

In [0]:
from langchain_community.chat_models import ChatDatabricks
from langchain_community.tools.databricks import UCFunctionToolkit

# Create the llm
llm = ChatDatabricks(endpoint=config.get("llm_endpoint"))

uc_functions = config.get("uc_functions")

uc_function_tools = (
    UCFunctionToolkit(warehouse_id=config.get("warehouse_id"))
    .include(*uc_functions)
    .get_tools()
)
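
The configuration file itself is not shown in this notebook. As a rough guide, the sketch below lists the keys the notebook reads through `config.get(...)` (`agent_prompt` is read later, when the agent is created) and checks that none are missing. All values are placeholders, the two entries under `uc_functions` are an assumption based on the functions created earlier, and `missing_keys` is a hypothetical helper.

```python
# Keys this notebook reads via config.get(...); all values are placeholders.
example_config = {
    "llm_endpoint": "<your-chat-model-endpoint>",
    "warehouse_id": "<your-sql-warehouse-id>",
    "uc_functions": [
        "techsummit25.boat_reviews.avg_score",
        "techsummit25.boat_reviews.comments",
    ],
    "agent_prompt": "<your system prompt>",
}

REQUIRED_KEYS = ("llm_endpoint", "warehouse_id", "uc_functions", "agent_prompt")


def missing_keys(cfg: dict) -> list:
    """Return the required keys that are absent or empty in `cfg`."""
    return [key for key in REQUIRED_KEYS if not cfg.get(key)]


print(missing_keys(example_config))         # []
print(missing_keys({"llm_endpoint": "x"}))  # ['warehouse_id', 'uc_functions', 'agent_prompt']
```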



## Output parsers
Databricks interfaces, such as the AI Playground, can optionally display pretty-printed tool calls.

Use the following helper functions to parse the LLM's output into the expected format.
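
As a rough picture of the target format, a call to a UC function is rendered as a JSON object wrapped in `<uc_function_call>` tags, and tool results are wrapped in `<uc_function_result>` tags in the same way. The standalone sketch below builds one such string; the tool-call id and arguments are hypothetical.

```python
import json

# Hypothetical tool call, shaped like the dicts found in AIMessage.tool_calls.
tool_call = {
    "id": "call_1",
    "name": "techsummit25__boat_reviews__avg_score",
    "args": {"p": "Stone Grenade"},
}

request = json.dumps(
    {
        "id": tool_call["id"],
        "name": tool_call["name"],
        # The arguments are themselves serialized to a JSON string.
        "arguments": json.dumps(tool_call["args"]),
    },
    indent=2,
)
formatted = f"<uc_function_call>{request}</uc_function_call>"
print(formatted)
```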

In [0]:
from typing import Iterator, Dict, Any
from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    ToolMessage,
    MessageLikeRepresentation,
)

import json

uc_functions_set = {x.replace(".", "__") for x in uc_functions}


def is_uc_function(tool_name: str) -> bool:
    """Check if `tool_name` is in `uc_functions` or belongs to a schema from config.yml."""
    if tool_name in uc_functions_set:
        return True
    for pattern in uc_functions_set:
        if "*" in pattern and tool_name.startswith(pattern[:-1]):
            return True
    return False


def stringify_tool_call(tool_call: Dict[str, Any]) -> str:
    """Convert a raw tool call into a formatted string that the playground UI expects"""
    if is_uc_function(tool_call.get("name")):
        request = json.dumps(
            {
                "id": tool_call.get("id"),
                "name": tool_call.get("name"),
                "arguments": json.dumps(tool_call.get("args", {})),
            },
            indent=2,
        )
        return f"<uc_function_call>{request}</uc_function_call>"
    else:
        # for non UC functions, return the string representation of tool calls
        # you can modify this to return a different format
        return str(tool_call)


def stringify_tool_result(tool_msg: ToolMessage) -> str:
    """Convert a ToolMessage into a formatted string that the playground UI expects"""
    if is_uc_function(tool_msg.name):
        result = json.dumps(
            {"id": tool_msg.tool_call_id, "content": tool_msg.content}, indent=2
        )
        return f"<uc_function_result>{result}</uc_function_result>"
    else:
        # for non UC functions, return the string representation of tool message
        # you can modify this to return a different format
        return str(tool_msg)


def parse_message(msg) -> str:
    """Parse different message types into their string representations"""
    # tool call result
    if isinstance(msg, ToolMessage):
        return stringify_tool_result(msg)
    # tool call
    elif isinstance(msg, AIMessage) and msg.tool_calls:
        tool_call_results = [stringify_tool_call(call) for call in msg.tool_calls]
        return "".join(tool_call_results)
    # normal HumanMessage or AIMessage (reasoning or final answer)
    elif isinstance(msg, (AIMessage, HumanMessage)):
        return msg.content
    else:
        print(f"Unexpected message type: {type(msg)}")
        return str(msg)


def wrap_output(stream: Iterator[MessageLikeRepresentation]) -> Iterator[str]:
    """
    Process and yield formatted outputs from the message stream.
    The invoke and stream langchain functions produce different output formats.
    This function handles both cases.
    """
    for event in stream:
        # the agent was called with invoke()
        if "messages" in event:
            for msg in event["messages"]:
                yield parse_message(msg) + "\n\n"
        # the agent was called with stream()
        else:
            for node in event:
                for key, messages in event[node].items():
                    if isinstance(messages, list):
                        for msg in messages:
                            yield parse_message(msg) + "\n\n"
                    else:
                        print(f"Unexpected value {messages} for key {key}. Expected a list of `MessageLikeRepresentation`'s")
                        yield str(messages)
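
The name matching in `is_uc_function` relies on two details: tool names use `__` where Unity Catalog names use `.`, and an entry ending in `*` matches every function in that schema. The standalone sketch below reproduces that logic under different names so it does not shadow the notebook's variables; the function list is hypothetical.

```python
# Hypothetical entries: one exact function name and one schema wildcard.
example_functions = [
    "techsummit25.boat_reviews.avg_score",
    "other_catalog.tools.*",
]

# Same normalization as in the cell above: "." becomes "__".
example_set = {name.replace(".", "__") for name in example_functions}


def matches_uc_function(tool_name: str) -> bool:
    """Exact match, or prefix match against a wildcard entry."""
    if tool_name in example_set:
        return True
    return any(
        "*" in pattern and tool_name.startswith(pattern[:-1])
        for pattern in example_set
    )


print(matches_uc_function("techsummit25__boat_reviews__avg_score"))  # True
print(matches_uc_function("other_catalog__tools__anything"))         # True
print(matches_uc_function("techsummit25__boat_reviews__comments"))   # False
```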

## Create the agent
Use the LangGraph [`create_react_agent` function](https://langchain-ai.github.io/langgraph/how-tos/create-react-agent/#usage) to build a simple graph with the model and tools defined by [config.yml]($./boat_agent_config.yml). For more customization, you can create your own LangGraph agent by following [LangGraph - Quick Start](https://langchain-ai.github.io/langgraph/tutorials/introduction/).

In [0]:
from langchain_core.runnables import RunnableGenerator
from langgraph.prebuilt import create_react_agent

# Create the agent
system_message = config.get("agent_prompt")
agent_with_raw_output = create_react_agent(llm, uc_function_tools, state_modifier=system_message)
agent = agent_with_raw_output | RunnableGenerator(wrap_output)
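
Every event the agent emits passes through `wrap_output`, which has to cope with two event shapes. The sketch below shows both shapes using plain dictionaries and strings instead of LangChain messages; `flatten_events` is a simplified, hypothetical stand-in for `wrap_output`, and the node names `agent` and `tools` are illustrative.

```python
from typing import Any, Dict, Iterable, Iterator


def flatten_events(stream: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Simplified stand-in for wrap_output that works on plain strings."""
    for event in stream:
        if "messages" in event:
            # invoke(): one event holding the full message list
            for msg in event["messages"]:
                yield str(msg)
        else:
            # stream(): one event per graph node, keyed by node name
            for node_output in event.values():
                for value in node_output.values():
                    if isinstance(value, list):
                        for msg in value:
                            yield str(msg)


invoke_style = [{"messages": ["question", "tool call", "answer"]}]
stream_style = [
    {"agent": {"messages": ["tool call"]}},
    {"tools": {"messages": ["tool result"]}},
    {"agent": {"messages": ["answer"]}},
]

print(list(flatten_events(invoke_style)))  # ['question', 'tool call', 'answer']
print(list(flatten_events(stream_style)))  # ['tool call', 'tool result', 'answer']
```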

## Test the agent

Interact with the agent to test its output. Because this notebook called `mlflow.langchain.autolog()`, you can view the trace for each step the agent takes.

Replace this placeholder input with an appropriate domain-specific example for your agent.

In [0]:
# Placeholder input; replace it with a domain-specific example for your agent
for event in agent.stream({"messages": [{"role": "user", "content": "what is the total number of reviews for the product Stone Grenade?"}]}):
    print(event, "---" * 20 + "\n")

In [0]:
mlflow.models.set_model(agent)

### Log the `agent` as an MLflow model
Log the agent as code from the [boat_agent]($./boat_agent) notebook. See [MLflow - Models from Code](https://mlflow.org/docs/latest/models.html#models-from-code).

In [0]:
# Log the model to MLflow
import os
import mlflow

input_example = {
    "messages": [
        {
            "role": "user",
            "content": "what is the total number of reviews for the product Stone Grenade?"
        }
    ]
}

with mlflow.start_run():
    logged_agent_info = mlflow.langchain.log_model(
        lc_model=os.path.join(
            os.getcwd(),
            'boat_agent',
        ),
        pip_requirements=[
            "langchain==0.2.16",
            "langchain-community==0.2.16",
            "langgraph==0.2.16",
            "pydantic",
        ],
        model_config="boat_agent_config.yml",
        artifact_path='boat_agent',
        input_example=input_example,
    )

## Evaluate the agent with [Agent Evaluation](https://docs.databricks.com/generative-ai/agent-evaluation/index.html)

You can edit the requests or expected responses in your evaluation dataset and rerun the evaluation as you iterate on your agent, using MLflow to track the computed quality metrics.

In [0]:
import pandas as pd

eval_examples = [
    {
        "request": {
            "messages": [
                {
                    "role": "user",
                    "content": "what is the total number of reviews for the product Stone Grenade?"
                }
            ]
        },
        "expected_response": "The average score for the product 'Stone Grenade' is 92.0."
    }
]

eval_dataset = pd.DataFrame(eval_examples)
display(eval_dataset)
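
When adding more cases, a small helper keeps each row in the same `request` / `expected_response` shape. This is a minimal sketch: `make_eval_example` is a hypothetical helper, and the question and answer strings are placeholders to be replaced with cases you have verified.

```python
def make_eval_example(question: str, expected_response: str) -> dict:
    """Build one evaluation row in the request/expected_response shape."""
    return {
        "request": {"messages": [{"role": "user", "content": question}]},
        "expected_response": expected_response,
    }


# Placeholder question/answer pairs, not real evaluation data.
more_examples = [
    make_eval_example("<question 1>", "<expected answer 1>"),
    make_eval_example("<question 2>", "<expected answer 2>"),
]

print(more_examples[0]["request"]["messages"][0]["content"])  # <question 1>
```

Rows built this way can be appended to `eval_examples` before it is converted to a DataFrame.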

In [0]:
import mlflow
import pandas as pd

with mlflow.start_run(run_id=logged_agent_info.run_id):
    eval_results = mlflow.evaluate(
        f"runs:/{logged_agent_info.run_id}/boat_agent",  # `boat_agent` is the artifact_path used when calling log_model
        data=eval_dataset,  # Your evaluation dataset
        model_type="databricks-agent",  # Enable Mosaic AI Agent Evaluation
    )

# Review the evaluation results in the MLflow UI (see console output), or access them in place:
display(eval_results.tables['eval_results'])

## Register the model to Unity Catalog

Update the `catalog`, `schema`, and `model_name` below to register the MLflow model to Unity Catalog.

In [0]:
mlflow.set_registry_uri("databricks-uc")

# define the catalog, schema, and model name for your UC model
catalog = "techsummit25"
schema = "boat_reviews"
model_name = "boat_agent"
UC_MODEL_NAME = f"{catalog}.{schema}.{model_name}"

# register the model to UC
uc_registered_model_info = mlflow.register_model(model_uri=logged_agent_info.model_uri, name=UC_MODEL_NAME)

## Deploy the agent

In [0]:
from databricks import agents

# Deploy the model to the review app and a model serving endpoint
agents.deploy(UC_MODEL_NAME, uc_registered_model_info.version)