## Register & Deploy DSPy Program

In this notebook, we register our DSPy program to Unity Catalog (UC) and deploy it to Model Serving.

In [0]:
%pip install -q dspy databricks-agents mlflow databricks-sdk==0.50.0 
dbutils.library.restartPython()

In [0]:
import dspy
import math
import mlflow
import pandas as pd
from mlflow.models import ModelConfig

from dspy.retrieve.databricks_rm import DatabricksRM

In [0]:
config_file = "config.yaml"
model_config = ModelConfig(development_config=config_file)

In [0]:
CATALOG = model_config.get("catalog")
SCHEMA = model_config.get("schema")
VOLUME = model_config.get("volume")

VECTOR_SEARCH_ENDPOINT = model_config.get("vector_search_endpoint")
VECTOR_SEARCH_INDEX = model_config.get("vector_search_index")
index_path = f"{CATALOG}.{SCHEMA}.{VECTOR_SEARCH_INDEX}"

EMBEDDING_ENDPOINT_NAME = model_config.get("embedding_endpoint_name")

model = model_config.get("chat_endpoint_name")

LM = f"databricks/{model}"

path = f"{CATALOG}/{SCHEMA}/{VOLUME}"
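
The cell above reads several keys from `config.yaml`. As a minimal sketch, the hypothetical helper `missing_config_keys` below reports every absent key in one pass, assuming the configuration has already been loaded into a plain dictionary. The key names are the ones read above; the example values are placeholders.

```python
# Keys the notebook reads from config.yaml (taken from the cell above).
REQUIRED_KEYS = (
    "catalog",
    "schema",
    "volume",
    "vector_search_endpoint",
    "vector_search_index",
    "embedding_endpoint_name",
    "chat_endpoint_name",
)


def missing_config_keys(config: dict) -> list:
    """Return the required keys that are absent or empty in `config`."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Hypothetical, partially filled configuration used only for illustration.
example_config = {"catalog": "main", "schema": "rag", "volume": "docs"}
print(missing_config_keys(example_config))
```

Running a check like this before building `index_path` and `LM` surfaces all configuration gaps at once instead of one lookup at a time.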

Below is a copy of the RAG module we defined previously.

In [0]:
class RAG(dspy.Module):
  def __init__(self, for_mosaic_agent=True):
    super().__init__()

    # setup mlflow tracing
    mlflow.dspy.autolog()

    # setup flag indicating if the object will be deployed as a Mosaic Agent
    self.for_mosaic_agent = for_mosaic_agent
    self.lm = dspy.LM(LM, cache=False)

    # setup the primary retriever pointing to the chunked documents
    self.retriever = DatabricksRM(
        databricks_index_name=index_path,
        text_column_name="page_content",
        docs_id_column_name="unique_chunk_index",
        columns=["page_content"],
        k=5,
        use_with_databricks_agent_framework=for_mosaic_agent
    )
    
    self.response_generator = dspy.Predict("context, question -> response")

  def forward(self, question):
    if self.for_mosaic_agent:
      # When deployed as a Mosaic agent, the input is a ChatAgentMessage-style payload;
      # the content of the last message is the question.
      # TODO: also accept ChatModelMessage.
      question = question["messages"][-1]["content"]

    context = self.retriever(
        question
    )

    with dspy.context(lm=self.lm):
      response = self.response_generator(context=context, question=question)

      if self.for_mosaic_agent:
        return response.response
      return response
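
When `for_mosaic_agent` is set, `forward` receives a chat-style payload and uses the content of the last message as the question. The sketch below isolates that step in a hypothetical helper, `extract_question`, which is not part of the module above. It also passes plain strings through unchanged, so the logic can be checked without a retriever or an LM.

```python
def extract_question(question):
    """Return the question text from a plain string or a chat-style payload.

    Hypothetical helper for illustration; it mirrors the indexing used in
    RAG.forward: payload["messages"][-1]["content"].
    """
    if isinstance(question, str):
        return question
    messages = question["messages"]
    if not messages:
        raise ValueError("payload contains no messages")
    return messages[-1]["content"]


# Illustrative multi-turn payload; only the last message is used.
payload = {
    "messages": [
        {"role": "user", "content": "What is Unity Catalog?"},
        {"role": "assistant", "content": "It governs data and AI assets."},
        {"role": "user", "content": "Does it support auditing?"},
    ]
}
print(extract_question(payload))  # Does it support auditing?
```

Note that earlier turns are discarded, so the deployed program answers each request without conversation history.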

## Register to UC

In [0]:
from mlflow.models.resources import (
    DatabricksVectorSearchIndex,
    DatabricksServingEndpoint,
)
import importlib.metadata

# Set the registry URI to Unity Catalog
mlflow.set_registry_uri("databricks-uc")

# Setup Agent name
uc_model_name = f"{CATALOG}.{SCHEMA}.dspy_rag"

# Instantiating Agent
mosaic_agent_rag = RAG(for_mosaic_agent=True)
mosaic_agent_rag.load("optimized_rag.json")

# Logging Agent into Unity Catalog
with mlflow.start_run(run_name="dspy_rag_oss_docs") as run:
    uc_registered_model_info = mlflow.dspy.log_model(
        mosaic_agent_rag,
        "model",
        input_example={
            "messages": [
                {
                    "role": "user",
                    "content": "What data governance and auditing capabilities does Databricks support?",
                }
            ]
        },
        task="llm/v1/chat",
        registered_model_name=uc_model_name,
        pip_requirements=[
            "mlflow==2.22.0",
            # Pin dspy to the version installed in this notebook environment
            f"dspy=={importlib.metadata.version('dspy')}",
        ],
        resources=[
            DatabricksVectorSearchIndex(
                index_name=index_path
            ),
            DatabricksServingEndpoint(endpoint_name=model),
            DatabricksServingEndpoint(endpoint_name=EMBEDDING_ENDPOINT_NAME)
        ],
    )
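
Before deploying, it can help to call the logged model once and confirm that it returns text. The sketch below is one way to do that under stated assumptions: `predict_fn` is any callable that accepts the chat payload (for example, the `predict` method of a model loaded with `mlflow.pyfunc.load_model`), and the response is either a plain string or an OpenAI-style chat completion dictionary. The helper names and the stub are hypothetical.

```python
def response_text(response):
    """Extract the answer text from a string or a chat-completion-style dict."""
    if isinstance(response, str):
        return response
    # Assumed shape: {"choices": [{"message": {"content": "..."}}]}
    return response["choices"][0]["message"]["content"]


def smoke_test(predict_fn, question):
    """Send one chat-style request and fail loudly on an empty answer."""
    payload = {"messages": [{"role": "user", "content": question}]}
    text = response_text(predict_fn(payload))
    if not text or not text.strip():
        raise RuntimeError("model returned an empty response")
    return text


# Stub standing in for a real model so the sketch runs anywhere.
def fake_predict(payload):
    content = "echo: " + payload["messages"][-1]["content"]
    return {"choices": [{"message": {"role": "assistant", "content": content}}]}


print(smoke_test(fake_predict, "ping"))  # echo: ping
```

In the notebook, `predict_fn` could be `mlflow.pyfunc.load_model(uc_registered_model_info.model_uri).predict`; the exact response shape depends on the MLflow and DSPy versions in use.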

## Deploy to Model Serving

In [0]:
from databricks.agents import deploy

model_version = uc_registered_model_info.registered_model_version

# Deploy Agent
deploy(
    model_name=uc_model_name,
    model_version=model_version
)
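
Once the endpoint is ready, it can be queried with the same chat-style payload used as the `input_example`. The following is a sketch, assuming the MLflow Deployments client for Databricks is available in the environment. `endpoint_name` is a placeholder that you would take from the Serving UI or from the object returned by `deploy`, and the helper names are hypothetical.

```python
def build_chat_payload(question):
    """Wrap a question in the chat-style format the agent expects."""
    return {"messages": [{"role": "user", "content": question}]}


def query_agent(endpoint_name, question):
    """Send one question to a serving endpoint and return the raw response."""
    # Imported lazily so the payload helper works without MLflow installed.
    from mlflow.deployments import get_deploy_client

    client = get_deploy_client("databricks")
    return client.predict(
        endpoint=endpoint_name,
        inputs=build_chat_payload(question),
    )


# Building the payload needs no workspace access.
print(build_chat_payload("What auditing capabilities does Databricks support?"))
```

Endpoint provisioning can take several minutes, so a call to `query_agent` may fail until the endpoint reports that it is ready.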

## Test with Review App

Deploying the agent automatically provisions a Model Serving endpoint along with the Databricks Review App. The Review App stages our compiled DSPy program in an environment where expert stakeholders can interact with it: they can hold a conversation, ask questions, and provide feedback. The Review App logs all questions, answers, and feedback to an inference table so you can further analyze the LLM’s performance. In this way, it helps ensure the quality and safety of the answers your application provides.