
# 1/ Deploying our first AI Agent (RAG) application with Mosaic AI Agent Framework & Agent Evaluation

<img src="https://github.com/prasadkona/databricks_demos/blob/main/images/llm-rag-full-unstructuredio.png?raw=true" style="width: 800px; margin-left: 10px">

<br/>

## From data to chatbot in 10 minutes

Rag applications are decoupled in 2 main parts:
- The knowledge database used to add additional context and improve the bot answer
- The actual chatbot application and its review / feedback mechanism

## 1.1/ Data preparation for RAG: building and indexing our knowledge base into Databricks Vector Search

Let's start by prepraing our knowledge database. In this simple first demo, we will use UnstructuredIO to extract files from one of the 20+ sources supported. UnstructuredIO supports 70+file types. Unstructured IO extracts files from the sources, extracts text, transforms, enriches it with embeddings and delivers the consumption ready data to the Databricks as a table. 

They can automatically sync new files that get created on the supported sources and update the Databricks tavles accordingly.


In [0]:
%pip install -U --quiet databricks-sdk==0.49.0 "databricks-langchain>=0.4.0" databricks-agents mlflow[databricks] databricks-vectorsearch==0.55 langchain==0.3.19 langchain_core==0.3.37 bs4==0.0.2 markdownify==0.14.1 pydantic==2.10.1
%pip install openai

dbutils.library.restartPython()

[43mNote: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.[0m
[43mNote: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.[0m


In [0]:
%run ../_resources/00-init $reset_all_data=false

Following are set in config file used by the 00-init. Update thse parameters to reflect your environment.

* catalog = "prasad_kona_isv"
* dbName = db = "demo"
* VECTOR_SEARCH_ENDPOINT_NAME="one-env-shared-endpoint-6"
* vs_index_name ="prasad_unstructuredio_demo_index"


Create table statement for UnstructuredIO (for reference)


% sql 
-- Create table statement for data ingested by UnstructuredIO
CREATE TABLE if not exists prasad_kona_isv.demo.unstructuredio_demo_table (
  id STRING NOT NULL,
  record_id STRING NOT NULL,
  element_id STRING NOT NULL,
  text STRING,
  embeddings ARRAY<FLOAT>,
  type STRING,
  metadata VARIANT,
  text_as_html STRING GENERATED ALWAYS AS (COALESCE(get_json_object(CAST(metadata AS STRING), '$.text_as_html'), '')),
  filename   STRING GENERATED ALWAYS AS (COALESCE(get_json_object(CAST(metadata AS STRING), '$.filename'), '')), 
  CONSTRAINT `unstructuredio_demo_table_pk` PRIMARY KEY (`id`))
USING delta
TBLPROPERTIES (
  'delta.checkpoint.writeStatsAsJson' = 'false',
  'delta.checkpoint.writeStatsAsStruct' = 'true',
  'delta.enableDeletionVectors' = 'true',
  'delta.feature.appendOnly' = 'supported',
  'delta.feature.deletionVectors' = 'supported',
  'delta.feature.invariants' = 'supported',
  'delta.feature.variantType-preview' = 'supported',
  'delta.minReaderVersion' = '3',
  'delta.minWriterVersion' = '7');

-- Enable change data feed. This is required for vector search
ALTER TABLE `prasad_kona_isv`.`demo`.`unstructuredio_demo_table_v2` SET TBLPROPERTIES (delta.enableChangeDataFeed = true);



In [0]:
%sql 
-- The dataset for your knowledge base has been loaded for you by UnstructuredIO.

SELECT * FROM prasad_kona_isv.demo.unstructuredio_demo_table


## 1.2/ Vector search Endpoints

Vector search endpoints are entities where your indexes will live. Think about them as entry point to handle your search request. 

Let's start by creating our first Vector Search endpoint. Once created, you can view it in the [Vector Search Endpoints UI](#/setting/clusters/vector-search). Click on the endpoint name to see all indexes that are served by the endpoint.

Reference code to creating "Vector Search endpoint"

`
from databricks.vector_search.client import VectorSearchClient
vsc = VectorSearchClient(disable_notice=True)

if not endpoint_exists(vsc, VECTOR_SEARCH_ENDPOINT_NAME):
    vsc.create_endpoint(name=VECTOR_SEARCH_ENDPOINT_NAME, endpoint_type="STANDARD")

wait_for_vs_endpoint_to_be_ready(vsc, VECTOR_SEARCH_ENDPOINT_NAME)
print(f"Endpoint named {VECTOR_SEARCH_ENDPOINT_NAME} is ready.")
`


<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/chatbot-rag/rag-basic-prep-3.png?raw=true" style="float: right; margin-left: 10px" width="400px">


## 1.3/ Creating the Vector Search Index

Once the endpoint is created, all we now have to do is to as Databricks to create the index on top of the existing table. 

You just need to specify the text column and our embedding foundation model.  Databricks will build and synchronize the index automatically for us.

This can be done using the API, or in a few clicks within the Unity Catalog Explorer menu:

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/index_creation.gif?raw=true" width="600px">


Reference code to creating "Vector Search index"


from databricks.sdk import WorkspaceClient
import databricks.sdk.service.catalog as c

`#`The table we'd like to index
source_table_fullname = f"{catalog}.{db}.databricks_documentation"
`#`Where we want to store our index
vs_index_fullname = f"{catalog}.{db}.databricks_documentation_vs_index"

if not index_exists(vsc, VECTOR_SEARCH_ENDPOINT_NAME, vs_index_fullname):
  print(f"Creating index {vs_index_fullname} on endpoint {VECTOR_SEARCH_ENDPOINT_NAME}...")
  vsc.create_delta_sync_index(
    endpoint_name=VECTOR_SEARCH_ENDPOINT_NAME,
    index_name=vs_index_fullname,
    source_table_name=source_table_fullname,
    pipeline_type="TRIGGERED",
    primary_key="id",
    embedding_source_column='content', #The column containing our text
    embedding_model_endpoint_name='databricks-gte-large-en' #The embedding endpoint used to create the embeddings
  )
  `#`Let's wait for the index to be ready and all our embeddings to be created and indexed
  wait_for_index_to_be_ready(vsc, VECTOR_SEARCH_ENDPOINT_NAME, vs_index_fullname)
else:
  `#`Trigger a sync to update our vs content with the new data saved in the table
  wait_for_index_to_be_ready(vsc, VECTOR_SEARCH_ENDPOINT_NAME, vs_index_fullname)
  vsc.get_index(VECTOR_SEARCH_ENDPOINT_NAME, vs_index_fullname).sync()

print(f"index {vs_index_fullname} on table {source_table_fullname} is ready")
`

## 1.4/ Searching for relevant content

That's all we have to do. Databricks can automatically capture and synchronize new entries in your table with the index. You can also trig the sync's as needed.

Note that depending on your dataset size and model size, index creation can take a few seconds to start and index your embeddings.

Let's give it a try and search for similar content.

*Note: `similarity_search` also support a filters parameter. This is useful to add a security layer to your RAG system: you can filter out some sensitive content based on who is doing the call (for example filter on a specific department based on the user preference).*

In [0]:
from databricks.vector_search.client import VectorSearchClient
vsc = VectorSearchClient(disable_notice=True)

vs_index_fullname = f"{catalog}.{db}.{vs_index_name}"

DEBUG_FLAG=False

In [0]:
from mlflow.deployments import get_deploy_client

# Embedding models like bge-large-en Foundation models are available using the /serving-endpoints/databricks-bge-large-en/invocations api. 
deploy_client = get_deploy_client("databricks")
embeddings = deploy_client.predict(endpoint="databricks-bge-large-en", inputs={"input": ["What is Apache Spark?"]})

#print(embeddings)

In [0]:
# UDF for embedding
from pyspark.sql.types import *
def get_embedding_for_string(text):
    response = deploy_client.predict(endpoint="agents-demo-embeddings-large", inputs={"input": text})
    e = response.data
    return e[0]['embedding']

get_embedding_for_string_udf = udf(get_embedding_for_string, ArrayType(FloatType()))
#print(get_embedding_for_string("What is a lakehouse ?"))

INFO:py4j.clientserver:Received command c on object id p0


In [0]:
# Using langchain DatabricksEmbeddings
from databricks_langchain import DatabricksEmbeddings

embed = DatabricksEmbeddings(
    endpoint="agents-demo-embeddings-large",
)

#print(embeddings)


In [0]:
# Get vector search index details via api
if(DEBUG_FLAG)
  x =vsc.get_index(VECTOR_SEARCH_ENDPOINT_NAME, vs_index_fullname)
  print(x.describe())

INFO:py4j.clientserver:Received command c on object id p0


{'name': 'prasad_kona_isv.demo.prasad_unstructuredio_demo_index', 'endpoint_name': 'one-env-shared-endpoint-6', 'primary_key': 'id', 'index_type': 'DELTA_SYNC', 'delta_sync_index_spec': {'source_table': 'prasad_kona_isv.demo.unstructuredio_demo_table', 'embedding_vector_columns': [{'name': 'embeddings', 'embedding_dimension': 3072}], 'pipeline_type': 'TRIGGERED', 'pipeline_id': 'c1659de6-1252-435a-b4e7-381c927634be'}, 'status': {'detailed_state': 'ONLINE_NO_PENDING_UPDATE', 'message': 'Index creation succeeded. Check latest status: https://e2-demo-field-eng.cloud.databricks.com/explore/data/prasad_kona_isv/demo/prasad_unstructuredio_demo_index', 'indexed_row_count': 1581, 'triggered_update_status': {'last_processed_commit_version': 11, 'last_processed_commit_timestamp': '2025-04-28T19:09:34Z'}, 'ready': True, 'index_url': 'e2-demo-field-eng.cloud.databricks.com/api/2.0/vector-search/indexes/prasad_kona_isv.demo.prasad_unstructuredio_demo_index'}, 'creator': 'prasad.kona@databricks.com'

In [0]:
question = "what is walmarts revenue in 2022?"

# Assuming you have a function to convert the question to a query vector
#query_vector = get_embedding_for_string(question)
query_vector = embed.embed_query(question)

results = vsc.get_index(VECTOR_SEARCH_ENDPOINT_NAME, vs_index_fullname).similarity_search(
  query_vector=query_vector,
  columns=[ "id","text","filename","type","text_as_html"],
  num_results=3)
docs = results.get('result', {}).get('data_array', [])
docs

INFO:py4j.clientserver:Received command c on object id p0


[NOTICE] Using a notebook authentication token. Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True.


[['8b704986-f77b-5c69-9947-5a651b386ef6',
  'y 31, 2020 ("fiscal 2020"). During fiscal 2022, we generated total revenues of $572.8 billion, which was comprised primarily of net sales of $567.8 billion. We maintain our principal offices in Bentonville, Arkansas. Our common stock trades on the New York Stock Exchange under the symbol "WMT."',
  'WALMART_2022_10K.pdf',
  'CompositeElement',
  '<p class="NarrativeText" id="ca70934faeb540dd884b61b1c3b91576">Walmart Inc. ("Walmart," the "Company" or "we") helps people around the world save money and live better – anytime and anywhere – by providing the opportunity to shop in both retail stores and through eCommerce, and to access our other service offerings. Through innovation, we strive to continuously improve a customer-centric experience that seamlessly integrates our eCommerce and retail stores in an omni-channel offering that saves time for our customers. Each week, we serve approximately 230 million customers who visit more than 10,500

# 2/ Deploy our chatbot model with RAG using Llama

<img src="https://github.com/prasadkona/databricks_demos/blob/main/images/llm-rag-full-unstructuredio-2.png?raw=true" style="float: right" width="500px">

We've seen how UnstructuredIO and Databricks makes it easy to ingest, prepare your documents, and deploy a Vector Search index on top of it.

Now that our Vector Search index is ready, let's deploy a langchain application.

## 2.1/ Configuring our Chain parameters

As any appliaction, a RAG chain needs some configuration for each environement (ex: different catalog for test/prod environement). 

Databricks makes this easy with Chain Configurations. You can use this object to configure any value within your app, including the different system prompts and make it easy to test and deploy newer version with better prompt.

In [0]:
# For this first basic demo, we'll keep the configuration as a minimum. In real app, you can make all your RAG as a param (such as your prompt template to easily test different prompts!)
chain_config = {
    "llm_model_serving_endpoint_name": "databricks-llama-4-maverick",  # the foundation model we want to use
    "embedding_model":"agents-demo-embeddings-large",
    "vector_search_endpoint_name": VECTOR_SEARCH_ENDPOINT_NAME,  # the endoint we want to use for vector search
    "vector_search_index": f"{catalog}.{db}.{vs_index_name}",
    "llm_prompt_template": """You are an assistant that answers questions. Use the following pieces of retrieved context to answer the question. Some pieces of context may be irrelevant, in which case you should not use them to form the answer.\n\nContext: {context}""",
}

### 2.2 Building our Langchain retriever

<img src="https://github.com/prasadkona/databricks_demos/blob/main/images/llm-rag-full-unstructuredio-2.png?raw=true" style="float: right" width="500px">

Let's start by building our Langchain retriever. 

It will be in charge of:

* Creating the input question (compute the embeddings using MosaicAI foundational models)
* Calling the vector search index to find similar documents to augment the prompt with 

Databricks Langchain wrapper makes it easy to do in one step, handling all the underlying logic and API call for you.

In [0]:
from databricks.vector_search.client import VectorSearchClient
from databricks_langchain.vectorstores import DatabricksVectorSearch
from langchain.schema.runnable import RunnableLambda
from langchain_core.output_parsers import StrOutputParser

## Enable MLflow Tracing
mlflow.langchain.autolog()

## Load the chain's configuration
model_config = mlflow.models.ModelConfig(development_config=chain_config)

## Turn the Vector Search index into a LangChain retriever
vector_search_as_retriever = DatabricksVectorSearch(
    endpoint=model_config.get("vector_search_endpoint_name"),
    index_name=model_config.get("vector_search_index"),
    columns=["id","text","filename","type","text_as_html"],
    text_column="text",
    embedding = DatabricksEmbeddings(
                    endpoint=model_config.get("embedding_model"),
                )
).as_retriever(search_kwargs={"k": 3})

# Method to format the docs returned by the retriever into the prompt (keep only the text from chunks)
def format_context(docs):
    chunk_contents = [f"Passage: {d.page_content}\n" for d in docs]
    return "".join(chunk_contents)

#Let's try our retriever chain:
relevant_docs = (vector_search_as_retriever | RunnableLambda(format_context)| StrOutputParser()).invoke('What was costco revenue in 2022?')

display_txt_as_html(relevant_docs)

INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0


[NOTICE] Using a notebook authentication token. Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True.


Trace(request_id=tr-19d94399949f4c43bda5b56d03525c3d)

You can see in the results that Databricks automatically trace your chain details and you can debug each steps and review the documents retrieved.

## 2.3/ Building Databricks Chat Model to query our demo's Foundational LLM

<img src="https://github.com/prasadkona/databricks_demos/blob/main/images/llm-rag-full-unstructuredio-2.png?raw=true" style="float: right" width="500px">

Our chatbot will be using Meta's Llama open source model. However, it could be utilized with any other LLMs supported on Databricks.  

Other types of models that could be utilized include:

- Databricks Foundation models (_what we will use by default in this demo_)
- Your organization's custom, fine-tuned model
- An external model provider (_such as Azure OpenAI_)


In [0]:
from langchain_core.prompts import ChatPromptTemplate
from databricks_langchain.chat_models import ChatDatabricks
from operator import itemgetter

prompt = ChatPromptTemplate.from_messages(
    [  
        ("system", model_config.get("llm_prompt_template")), # Contains the instructions from the configuration
        ("user", "{question}") #user's questions
    ]
)

# Our foundation model answering the final prompt
model = ChatDatabricks(
    endpoint=model_config.get("llm_model_serving_endpoint_name"),
    extra_params={"temperature": 0.01, "max_tokens": 500}
)

#Let's try our prompt:
answer = (prompt | model | StrOutputParser()).invoke({'question':'What was revenue in 2022 for pepsi?', 'context': ''})
display_txt_as_html(answer)

INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clie

Trace(request_id=tr-6acc0bcaa1544885ab677ca712013786)


## 2.4/ Putting it together in a final chain, supporting the standard Chat Completion format

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/chatbot-rag/rag-basic-chain-4.png?raw=true" style="float: right" width="500px">


Let's now merge the retriever and the model in a single Langchain chain.

We will use a custom langchain template for our assistant to give proper answer.

We will make sure our chain support the standard Chat Completion API input schema : `{"messages": [{"role": "user", "content": "What is Retrieval-augmented Generation?"}]}`

Make sure you take some time to try different templates and adjust your assistant tone and personality for your requirement.

*Note that we won't support history in this first version, and will only take the last message as the question. See the advanced demo for a more complete example.*

In [0]:
# Return the string contents of the most recent messages: [{...}] from the user to be used as input question
def extract_user_query_string(chat_messages_array):
    return chat_messages_array[-1]["content"]

# RAG Chain
chain = (
    {
        "question": itemgetter("messages") | RunnableLambda(extract_user_query_string),
        "context": itemgetter("messages")
        | RunnableLambda(extract_user_query_string)
        | vector_search_as_retriever
        | RunnableLambda(format_context),
    }
    | prompt
    | model
    | StrOutputParser()
)


#### Databricks will track all the chain for you

<img src="https://ai-cookbook.io/_images/mlflow_trace2.gif" width="600px" style="float: right; margin-left: 10px">

As you can see in the cell result below, Databricks automatically trace the chain call. 

This makes it super easy to debug and improve your chain!

In [0]:
# Let's give it a try:
input_example = {"messages": [ {"role": "user", "content": "What was revenue in 2022 for pepsi?"}]}
answer = chain.invoke(input_example)
print(answer)

INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Closing down clientserver connection
INFO:py4j.clientserver:Closing down clientserver connection


[NOTICE] Using a notebook authentication token. Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True.


INFO:py4j.clientserver:Received command c on object id p0


According to the Consolidated Statement of Income, the Net Revenue for PepsiCo, Inc. and Subsidiaries in 2022 was $86,392 million.


Trace(request_id=tr-0a25e498e84c4b11af10235d1d27a133)

## 2.5/ Deploy a RAG Chain to a web-based UI for stakeholder feedback

Our chain is now ready! 

Let's first register the Rag Chain model to MLFlow and Unity Catalog, and then use Agent Framework to deploy to the Agent Evaluation stakeholder review application which is backed by a scalable, production-ready Model Serving endpoint.

In [0]:
from mlflow.models.resources import DatabricksVectorSearchIndex, DatabricksServingEndpoint
# Log the model to MLflow
with mlflow.start_run(run_name="pkona_rag_bot"):
  logged_chain_info = mlflow.langchain.log_model(
          #Note: In classical ML, MLflow works by serializing the model object.  In generative AI, chains often include Python packages that do not serialize.  Here, we use MLflow's new code-based logging, where we saved our chain under the chain notebook and will use this code instead of trying to serialize the object.
          lc_model=os.path.join(os.getcwd(), 'chain'),  # Chain code file e.g., /path/to/the/chain.py 
          model_config=chain_config, # Chain configuration 
          artifact_path="chain", # Required by MLflow, the chain's code/config are saved in this directory
          input_example=input_example,
          example_no_conversion=True,  # Required by MLflow to use the input_example as the chain's schema,
          # Specify resources for automatic authentication passthrough
          resources=[
            DatabricksVectorSearchIndex(index_name=model_config.get("vector_search_index")),
            DatabricksServingEndpoint(endpoint_name=model_config.get("embedding_model")),
            DatabricksServingEndpoint(endpoint_name=model_config.get("llm_model_serving_endpoint_name"))
          ]
      )

MODEL_NAME = "pkona_rag_demo"
MODEL_NAME_FQN = f"{catalog}.{db}.{MODEL_NAME}"
# Register to UC
uc_registered_model_info = mlflow.register_model(model_uri=logged_chain_info.model_uri, name=MODEL_NAME_FQN)

INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0


[NOTICE] Using a notebook authentication token. Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True.


INFO:py4j.clientserver:Closing down clientserver connection
INFO:py4j.clientserver:Closing down clientserver connection
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Closing down clientserver connection
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Closing down clientserver connection
INFO:p

[NOTICE] Using a notebook authentication token. Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True.


INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clie

Uploading artifacts:   0%|          | 0/12 [00:00<?, ?it/s]

INFO:py4j.clientserver:Received command c on object id p0


Registered model 'prasad_kona_isv.demo.pkona_rag_demo' already exists. Creating a new version of this model...
INFO:py4j.clientserver:Received command c on object id p0


Downloading artifacts:   0%|          | 0/12 [00:00<?, ?it/s]

INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Closing down clientserver connection
INFO:py4j.clientserver:Closing down clientserver connection
INFO:py4j.clientserver:Closing down clientserver connection
INFO:py4j.clientserver:Closing down clientserver connection
INFO:py4j.clientserver:Closing down clientserver connection
INFO:py4j.clientserver:Closing down clientserver connection
INFO:py4j.clientserver:Closing down clientserver connection
INFO:py4j.clientserver:Closing down clientserver connection


Uploading artifacts:   0%|          | 0/12 [00:00<?, ?it/s]

Created version '3' of model 'prasad_kona_isv.demo.pkona_rag_demo'.
INFO:py4j.clientserver:Received command c on object id p0


Let's now deploy the Mosaic AI **Agent Evaluation review application** using the model we just created!

In [0]:
from databricks import agents
# Deploy to enable the Review APP and create an API endpoint
# Note: scaling down to zero will provide unexpected behavior for the chat app. Set it to false for a prod-ready application.
deployment_info = agents.deploy(MODEL_NAME_FQN, model_version=uc_registered_model_info.version, scale_to_zero=True)

instructions_to_reviewer = f"""## Instructions for Testing the AI chatbot

Your inputs are invaluable for the development team. By providing detailed feedback and corrections, you help us fix issues and improve the overall quality of the application. We rely on your expertise to identify any gaps or areas needing enhancement."""

# Add the user-facing instructions to the Review App
agents.set_review_instructions(MODEL_NAME_FQN, instructions_to_reviewer)

wait_for_model_serving_endpoint_to_be_ready(deployment_info.endpoint_name)

Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:py4j.clientserver:Received command c on object id p0


Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0



    Deployment of prasad_kona_isv.demo.pkona_rag_demo version 3 initiated.  This can take up to 15 minutes and the Review App & Query Endpoint will not work until this deployment finishes.

    View status: https://e2-demo-field-eng.cloud.databricks.com/ml/endpoints/agents_prasad_kona_isv-demo-pkona_rag_demo
    Review App: https://e2-demo-field-eng.cloud.databricks.com/ml/review-v2/308f204122a04044a9fe43c8c97b1bac/chat
Waiting for endpoint to deploy agents_prasad_kona_isv-demo-pkona_rag_demo. Current state: EndpointState(config_update=<EndpointStateConfigUpdate.IN_PROGRESS: 'IN_PROGRESS'>, ready=<EndpointStateReady.NOT_READY: 'NOT_READY'>)


INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clie

Waiting for endpoint to deploy agents_prasad_kona_isv-demo-pkona_rag_demo. Current state: EndpointState(config_update=<EndpointStateConfigUpdate.IN_PROGRESS: 'IN_PROGRESS'>, ready=<EndpointStateReady.NOT_READY: 'NOT_READY'>)


INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clientserver:Received command c on object id p0
INFO:py4j.clie

endpoint ready.


INFO:py4j.clientserver:Received command c on object id p0


# 3/ Use the Mosaic AI Agent Evaluation to evaluate your RAG applications

## 3.1/ Chat with your bot and build your validation dataset!

Our Chat Bot is now live. Databricks provides a built-in chatbot application that you can use to test the chatbot and give feedbacks on its answer.

You can easily give access to external domain experts and have them test and review the bot.  **Your domain experts do NOT need to have Databricks Workspace access** - you can assign permissions to any user in your SSO if you have enabled [SCIM](https://docs.databricks.com/en/admin/users-groups/scim/index.html)

This is a critical step to build or improve your evaluation dataset: have users ask questions to your bot, and provide the bot with output answer when they don't answer properly.

Your Chatbot is automatically capturing all stakeholder questions and bot responses, including an MLflow trace for each, into Delta Tables in your Lakehouse. On top of that, Databricks makes it easy to track feedback from your end user: if the chatbot doesn't give a good answer and the user gives a thumbdown, their feedback is included in the Delta Tables.

Once your eval dataset is ready, you'll then be able to leverage it for offline evaluation to measure your new chatbot performance, and also potentially to Fine Tune your model.
<br/>

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/chatbot-rag/eval-framework.gif?raw=true" width="1000px">


In [0]:
print(f"\n\nReview App URL to share with your stakeholders: {deployment_info.review_app_url}")



Review App URL to share with your stakeholders: https://e2-demo-field-eng.cloud.databricks.com/ml/review-v2/308f204122a04044a9fe43c8c97b1bac/chat



## 3.2/ Evaluate your bot's quality with Mosaic AI Agent Evaluation specialized LLM judge models

Our bot is now Live. 

Evaluation is a key part of deploying a RAG application. Databricks simplify this tasks with specialized LLM models tuned to evaluate your bot's quality/cost/latency, even if ground truth is not available.

Mosaic AI Agent Evaluation evaluates:
1. Answer correctness - requires ground truth
2. Hallucination / groundness - no ground truth required
3. Answer relevance - no ground truth required
4. Retrieval precision - no ground truth required
5. (Lack of) Toxicity - no ground truth required

In this example, we'll use an evaluation set that we curated based on our internal experts using the Mosaic AI Agent Evaluation review app interface.  This proper Eval Dataset is saved as a Delta Table.

To see how to collect the dataset from the Eval App, see the [https://notebooks.databricks.com/demos/llm-rag-chatbot/index.html] => 03-advanced-app/03-Offline-Evaluation notebook.

Reference code for eval

eval_dataset = spark.table("eval_set_databricks_documentation").limit(10).toPandas()
display(eval_dataset)

## 3.1/ Run Evaluation of your Chain

Let's leverage the Mosaic AI Agent Evaluation specialized LLM to evaluate our model performance (make sure you use `databricks-rag`):

Reference code to run eval on your chain

with mlflow.start_run(run_id=logged_chain_info.run_id):
    # Evaluate the logged model
    eval_results = mlflow.evaluate(
        data=eval_dataset,
        model=logged_chain_info.model_uri,
        model_type="databricks-agent",
    )


You can open your MLFlow Experiment to review the different evaluation, and compare multiple model response to see how different prompts answer: 

<img src="https://github.com/prasadkona/databricks_demos/blob/main/images/rag-mlflow-eval.png?raw=true" width="1200px">

## Next: 

This is the end of this simple demo. 
