
In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Getting Started - Template

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/sample-apps/e2e-gen-ai-app-starter-pack/notebooks/getting_started.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fsample-apps%2Fe2e-gen-ai-app-starter-pack%2Fnotebooks%2Fgetting_started.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/sample-apps/e2e-gen-ai-app-starter-pack/notebooks/getting_started.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/sample-apps/e2e-gen-ai-app-starter-pack/notebooks/getting_started.ipynb">
      <img width="32px" src="https://upload.wikimedia.org/wikipedia/commons/9/91/Octicons-mark-github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

| | |
|-|-|
|Author(s) | [Elia Secchi](https://github.com/eliasecchig) |

## Overview

This tutorial walks you through developing and evaluating a chain: a sequence of steps that powers an AI application.
These steps may include calls to language models, tool use, or data preprocessing, and together they address a given use case, e.g., a chatbot that provides grounded information.

You'll learn how to:

1. Build chains using three different approaches:
   - [LangChain Expression Language (LCEL)](https://python.langchain.com/docs/expression_language/)
   - [LangGraph](https://python.langchain.com/docs/langgraph/)
   - A custom Python implementation. This enables implementation with other SDKs (e.g., [Vertex AI SDK](https://cloud.google.com/vertex-ai/docs/python-sdk/use-vertex-ai-python-sdk), [LlamaIndex](https://www.llamaindex.ai/)) and allows granular control over the sequence of steps in the chain.
   
2. Evaluate the performance of your chains using [Vertex AI Evaluation](https://cloud.google.com/vertex-ai/generative-ai/docs/models/evaluation-overview)

Finally, the tutorial discusses next steps for deploying your chain in a production application.

By the end of this tutorial, you'll have a solid foundation for developing and refining your own Generative AI chains.

## Get Started

### Install required packages using Poetry (Recommended)

This template uses [Poetry](https://python-poetry.org/) as the tool to manage project dependencies.
Poetry makes it easy to install and keep track of the packages your project needs.

To run this notebook with Poetry, follow these steps:
1. Make sure Poetry is installed. See the [installation guide](https://python-poetry.org/docs/#installation).

2. Make sure that dependencies are installed. From your command line:

   ```bash
   poetry install --with streamlit,jupyter
   ```

3. Run Jupyter:

   ```bash
   poetry run jupyter
   ```
   
4. Open this notebook in the Jupyter interface.

### (Alternative) Install Vertex AI SDK and other required packages

In [2]:
%pip install --quiet --upgrade nest_asyncio
%pip install --upgrade --user --quiet langchain-core langchain-google-vertexai langchain-google-community langchain langgraph
%pip install --upgrade --user --quiet "google-cloud-aiplatform[rapid_evaluation]"


### Restart runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.

The restart might take a minute or longer. After it's restarted, continue to the next step.

In [2]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️</b>
</div>

### Authenticate your notebook environment (Colab only)

If you're running this notebook on Google Colab, run the cell below to authenticate your environment.

In [3]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project information and initialize Vertex AI SDK

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [None]:
# Use the environment variable if the user doesn't provide Project ID.
import os

import vertexai

PROJECT_ID = "as-110-s-demo"  # @param {type:"string", isTemplate: true}
if PROJECT_ID == "[your-project-id]":
    PROJECT_ID = str(os.environ.get("GOOGLE_CLOUD_PROJECT"))

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

vertexai.init(project=PROJECT_ID, location=LOCATION)

### Import libraries

In [5]:
# Add the parent directory to the Python path. This allows importing modules from the parent directory
import sys

sys.path.append("../")

In [6]:
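# Download and unpack the starter pack.
!gsutil cp gs://e2e-gen-ai-app-starter-pack/app-starter-pack.zip . && unzip app-starter-pack.zip

# A `cd` inside a `!` command runs in a subshell and does not persist, so use the `%cd` magic.
%cd app-starter-pack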

Copying gs://e2e-gen-ai-app-starter-pack/app-starter-pack.zip...
/ [1 files][273.0 KiB/273.0 KiB]                                                
Operation completed over 1 objects/273.0 KiB.                                    
Archive:  app-starter-pack.zip
replace app-starter-pack/streamlit/streamlit_app.py? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: app-starter-pack/streamlit/streamlit_app.py  
replace app-starter-pack/streamlit/utils/message_editing.py? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: app-starter-pack/streamlit/utils/message_editing.py  
replace app-starter-pack/streamlit/utils/local_chat_history.py? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: app-starter-pack/streamlit/utils/local_chat_history.py  
replace app-starter-pack/streamlit/utils/title_summary.py? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: app-starter-pack/streamlit/utils/title_summary.py  
replace app-starter-pack/streamlit/utils/stream_handler.py? [y]es, [n]o, [A]ll, [N]

In [15]:
# List the extracted project files.
%ls

app/             deployment/  Makefile    poetry.lock     README.md   tests/
CONTRIBUTING.md  Dockerfile   notebooks/  pyproject.toml  streamlit/


In [8]:
#Hannah

!pip install traceloop.sdk



In [16]:
from collections.abc import Iterator
import json
from typing import Any, Literal

from app.eval.utils import batch_generate_messages, generate_multiturn_history
from app.patterns.custom_rag_qa.templates import (
    inspect_conversation_template,
    rag_template,
    template_docs,
)
from app.patterns.custom_rag_qa.vector_store import get_vector_store
from app.utils.output_types import OnChatModelStreamEvent, OnToolEndEvent, custom_chain
from google.cloud import aiplatform
from langchain.schema import Document
from langchain_core.messages import ToolMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableConfig
from langchain_core.tools import tool
from langchain_google_community.vertex_rank import VertexAIRank
from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings
from langgraph.graph import END, MessagesState, StateGraph
from langgraph.prebuilt import ToolNode
import pandas as pd
from vertexai.evaluation import CustomMetric, EvalTask
import yaml



## Chain Interface

This section outlines a possible interface for the chain. Implementing it ensures compatibility with the FastAPI server application included in the template. You are free, however, to implement an alternative interface that better suits your needs.


### Input Interface

The chain must provide an `astream_events` method that accepts a dictionary with a "messages" key.
The "messages" value should be a list of LangChain [HumanMessage](https://api.python.langchain.com/en/latest/messages/langchain_core.messages.human.HumanMessage.html), [AIMessage](https://api.python.langchain.com/en/latest/messages/langchain_core.messages.ai.AIMessage.html), and [ToolMessage](https://api.python.langchain.com/en/latest/messages/langchain_core.messages.tool.ToolMessage.html) objects.

For example, a possible input might be:

```py
{
    "messages": [
        HumanMessage("first"),
        AIMessage("a response"),
        HumanMessage("a follow up"),
    ]
}
```

Alternatively, you can use the shortened form:

```py
{"messages": [("user", "first"), ("ai", "a response"), ("user", "a follow up")]}
```
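
To make the equivalence between the two input forms concrete, here is a minimal sketch of a normalizer that turns the shortened tuples into `{"type", "content"}` dictionaries. The helper `normalize_messages` and the `ROLE_ALIASES` table are hypothetical names used only for illustration; LangChain performs its own conversion internally, and its accepted role names may differ from the ones assumed here.

```python
# Hypothetical helper for illustration only; LangChain does its own conversion.
ROLE_ALIASES = {"user": "human", "human": "human", "ai": "ai", "assistant": "ai"}


def normalize_messages(messages):
    """Convert (role, content) tuples or dicts into {"type", "content"} dicts."""
    normalized = []
    for message in messages:
        if isinstance(message, dict):
            role, content = message["type"], message["content"]
        else:
            role, content = message
        if role not in ROLE_ALIASES:
            raise ValueError(f"Unsupported role: {role!r}")
        normalized.append({"type": ROLE_ALIASES[role], "content": content})
    return normalized


shorthand = {"messages": [("user", "first"), ("ai", "a response"), ("user", "a follow up")]}
print(normalize_messages(shorthand["messages"]))
```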

### Output Interface

All chains use the [LangChain Stream Events (v2) API](https://python.langchain.com/docs/how_to/streaming/#using-stream-events). This API supports various use cases (simple chains, RAG, Agents). This API emits asynchronous events that can be used to stream the chain's output.

LangChain chains (LCEL, LangGraph) automatically implement the `astream_events` API.

We provide examples of emitting `astream_events`-compatible events with custom Python code, allowing implementation with other SDKs (e.g., Vertex AI, LlamaIndex).
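
As a rough illustration of what "emitting compatible events" can look like, the sketch below uses a plain async generator that yields dictionaries with `event` and `data` keys, the two fields this notebook reads when consuming a stream. The function names and the stubbed payloads are assumptions made for the example; real LangChain events carry additional fields.

```python
import asyncio
from collections.abc import AsyncIterator
from typing import Any


async def fake_astream_events(text: str) -> AsyncIterator[dict[str, Any]]:
    """Hypothetical stand-in for a chain: emits one tool event, then text chunks."""
    yield {"event": "on_tool_end", "data": {"input": {"query": text}, "output": "stub"}}
    for word in text.split():
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield {"event": "on_chat_model_stream", "data": {"chunk": word}}


async def collect_chunks(text: str) -> list[str]:
    """Consume the stream and keep only the streamed chunks."""
    chunks = []
    async for event in fake_astream_events(text):
        if event["event"] == "on_chat_model_stream":
            chunks.append(event["data"]["chunk"])
    return chunks


streamed = asyncio.run(collect_chunks("hello from a custom chain"))
print(streamed)
```

Inside a notebook, where an event loop is already running, you would typically write `await collect_chunks(...)` instead of calling `asyncio.run`.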

### Customizing I/O Interfaces

To modify the Input/Output interface, update `app/server.py` and related unit and integration tests.

## Events supported

The following list defines the events that are captured and supported by the Streamlit frontend.

In [17]:
SUPPORTED_EVENTS = [
    "on_tool_start",
    "on_tool_end",
    "on_retriever_start",
    "on_retriever_end",
    "on_chat_model_stream",
]

### Define the LLM
We set up the Large Language Model (LLM) for our conversational bot.

In [18]:
llm = ChatVertexAI(model_name="gemini-1.5-flash-002", temperature=0)

### Leverage LangChain LCEL

LangChain Expression Language (LCEL) provides a declarative approach to composing chains seamlessly. Key benefits include:

1. Rapid prototyping to production deployment without code changes
2. Scalability from simple "prompt + LLM" chains to complex, multi-step workflows
3. Enhanced readability and maintainability of chain logic

For comprehensive guidance on LCEL implementation, refer to the [official documentation](https://python.langchain.com/docs/expression_language/get_started).
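
To give an intuition for the `|` composition style, here is a minimal pure-Python sketch of a pipeable step. The `Step` class is hypothetical and is not how LangChain implements its runnables; it only shows how overloading `__or__` lets the output of one step feed the next.

```python
class Step:
    """Hypothetical minimal runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that feeds the output of `a` into `b`.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-ins for a prompt template and a model.
prompt = Step(lambda question: f"Answer briefly: {question}")
fake_llm = Step(lambda prompt_text: prompt_text.upper())

pipeline = prompt | fake_llm
print(pipeline.invoke("what is LCEL?"))
```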

In [54]:
template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an admissions representative at a small liberal arts college "
            "in the Midwest. Your role is to guide prospective students through the "
            "process of enrolling in undergraduate programs. You provide essential "
            "support, answering questions about admissions requirements, financial aid, "
            "and program options. Your goal is to help students feel confident and well-informed "
            "throughout their enrollment journey. Your responses should be informative, engaging, "
            "and tailored to the user's specific requests."
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chain = template | llm

Let's test the chain with a dummy question:

In [71]:
input_message = {"messages": [("human", "How are roommates assigned?")]}

async for event in chain.astream_events(input=input_message, version="v2"):
    if event["event"] in SUPPORTED_EVENTS:
        print(event["data"])

{'chunk': AIMessageChunk(content='Room', additional_kwargs={}, response_metadata={'safety_ratings': []}, id='run-bc92cfe7-7aab-4c28-a003-75de9c72388f')}
{'chunk': AIMessageChunk(content='mate assignments at [College Name] are handled through our online housing portal, which', additional_kwargs={}, response_metadata={'safety_ratings': []}, id='run-bc92cfe7-7aab-4c28-a003-75de9c72388f')}
{'chunk': AIMessageChunk(content=" you'll access after you've accepted your offer of admission and submitted your", additional_kwargs={}, response_metadata={'safety_ratings': []}, id='run-bc92cfe7-7aab-4c28-a003-75de9c72388f')}
{'chunk': AIMessageChunk(content=" housing application.  We understand that finding a compatible roommate is important, so we strive to make the process as smooth as possible.\n\nWhile we don't", additional_kwargs={}, response_metadata={'safety_ratings': []}, id='run-bc92cfe7-7aab-4c28-a003-75de9c72388f')}
{'chunk': AIMessageChunk(content=' guarantee perfect matches, we do our bes

This methodology is used for the chain defined in the [`app/chain.py`](../app/chain.py) file.

We can also leverage the `invoke` method for synchronous invocation.

In [72]:
response = chain.invoke(input=input_message)
print(response.content)

Roommate assignments at [College Name] are handled through our online housing portal, which you'll access after you've accepted your offer of admission and submitted your housing application.  We understand that finding a compatible roommate is important, so we strive to make the process as smooth as possible.

While we don't guarantee perfect matches, we do our best to pair students based on information you provide in your housing application.  This includes things like:

* **Lifestyle Preferences:**  Do you prefer a quiet study environment or a more social atmosphere?  Are you a night owl or an early riser?  These preferences help us find compatible roommates.
* **Sleep Schedule:**  We ask about your typical sleep schedule to help us group students with similar patterns.
* **Cleanliness Habits:**  This is a crucial aspect of roommate compatibility, and we encourage honesty in your responses.
* **Hobbies and Interests:** While not a primary factor, shared interests can certainly contr

### Use LangGraph

LangGraph is a framework for building stateful, multi-actor applications with Large Language Models (LLMs).
It extends the LangChain library, allowing you to coordinate multiple chains (or actors) across multiple steps of computation in a cyclic manner.

In [22]:
# 1. Define tools


@tool
def search(query: str):
    """Simulates a web search. Use it to get information on the weather, e.g., what the weather is like in a region."""
    if "sf" in query.lower() or "san francisco" in query.lower():
        return "It's 60 degrees and foggy."
    return "It's 90 degrees and sunny."


tools = [search]

# 2. Set up the language model
llm = llm.bind_tools(tools)


# 3. Define workflow components
def should_continue(state: MessagesState) -> Literal["tools", END]:
    """Determines whether to use tools or end the conversation."""
    last_message = state["messages"][-1]
    return "tools" if last_message.tool_calls else END


async def call_model(state: MessagesState, config: RunnableConfig):
    """Calls the language model and returns the response."""
    response = llm.invoke(state["messages"], config)
    return {"messages": response}


# 4. Create the workflow graph
workflow = StateGraph(MessagesState)
workflow.add_node("agent", call_model)
workflow.add_node("tools", ToolNode(tools))
workflow.set_entry_point("agent")

# 5. Define graph edges
workflow.add_conditional_edges("agent", should_continue)
workflow.add_edge("tools", "agent")

# 6. Compile the workflow
chain = workflow.compile()

Let's test the new chain with a dummy question:

In [23]:
input_message = {"messages": [("human", "What is the weather like in NY?")]}

async for event in chain.astream_events(input=input_message, version="v2"):
    if event["event"] in SUPPORTED_EVENTS:
        print(event["data"])

{'chunk': AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'search', 'arguments': '{"query": "what is the weather like in NY"}'}}, response_metadata={'safety_ratings': []}, id='run-fb465614-9117-4691-abbe-eaa325c42242', tool_calls=[{'name': 'search', 'args': {'query': 'what is the weather like in NY'}, 'id': '3d7e2ed9-93ad-486a-b00f-ee82891c4061', 'type': 'tool_call'}], tool_call_chunks=[{'name': 'search', 'args': '{"query": "what is the weather like in NY"}', 'id': '3d7e2ed9-93ad-486a-b00f-ee82891c4061', 'index': None, 'type': 'tool_call_chunk'}])}
{'chunk': AIMessageChunk(content='', additional_kwargs={}, response_metadata={'safety_ratings': []}, id='run-fb465614-9117-4691-abbe-eaa325c42242')}
{'chunk': AIMessageChunk(content='', additional_kwargs={}, response_metadata={'safety_ratings': [], 'finish_reason': 'STOP'}, id='run-fb465614-9117-4691-abbe-eaa325c42242', usage_metadata={'input_tokens': 35, 'output_tokens': 9, 'total_tokens': 44})}
{'input': {'query': '

This methodology is used for the chain defined in the [`app/patterns/langgraph_dummy_agent/chain.py`](../app/patterns/langgraph_dummy_agent/chain.py) file.
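
The control flow of the graph above (agent, optional tool call, back to the agent, until no tool is requested) can be sketched without any framework. Everything here is an assumption for illustration: `fake_model` replaces the LLM with canned logic, messages are plain dictionaries, and `run_agent` is a hypothetical loop rather than LangGraph's execution engine.

```python
def fake_model(messages):
    """Hypothetical model: requests the tool once, then answers from its result."""
    last = messages[-1]
    if last["type"] == "tool":
        return {"type": "ai", "content": f"Forecast: {last['content']}", "tool_calls": []}
    return {
        "type": "ai",
        "content": "",
        "tool_calls": [{"name": "search", "args": {"query": last["content"]}}],
    }


def search(query):
    """Same canned behavior as the `search` tool above."""
    if "sf" in query.lower() or "san francisco" in query.lower():
        return "It's 60 degrees and foggy."
    return "It's 90 degrees and sunny."


def run_agent(question, max_steps=5):
    """Alternate between the model and the tool until no tool call is requested."""
    messages = [{"type": "human", "content": question}]
    for _ in range(max_steps):
        response = fake_model(messages)  # plays the role of the "agent" node
        messages.append(response)
        if not response["tool_calls"]:  # plays the role of the conditional edge to END
            return messages
        for call in response["tool_calls"]:  # plays the role of the "tools" node
            messages.append({"type": "tool", "content": search(**call["args"])})
    raise RuntimeError("Agent did not finish within max_steps")


history = run_agent("What is the weather like in NY?")
print(history[-1]["content"])
```

The `max_steps` guard is a design choice for the sketch: a cyclic workflow needs some bound so that a model that keeps requesting tools cannot loop forever.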

### Use custom Python code

You can also use pure Python code to orchestrate the different steps of your chain and emit [`astream_events` API-compatible events](https://python.langchain.com/docs/how_to/streaming/#using-stream-events).

This offers full flexibility in how the steps of a chain are orchestrated and allows you to include other SDKs and frameworks such as the [Vertex AI SDK](https://cloud.google.com/vertex-ai/docs/python-sdk/use-vertex-ai-python-sdk) or [LlamaIndex](https://www.llamaindex.ai/).

We demonstrate this third methodology by implementing a RAG chain. The function `get_vector_store` provides a brute-force vector store (scikit-learn) initialized with data obtained from the [Practitioners Guide to MLOps](https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf).
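
For readers unfamiliar with the term, "brute force" here means scoring the query against every stored vector rather than using an approximate index. The sketch below shows that idea with cosine similarity in plain Python. The toy three-dimensional vectors and the function names are assumptions for illustration; the template's `get_vector_store` may use a different similarity measure and data layout.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(query_vector, indexed_docs, k=2):
    """Score every document against the query and return the k best matches."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in indexed_docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy 3-dimensional "embeddings"; a real store would use an embedding model.
indexed_docs = [
    ("MLOps unifies ML development and operations.", [0.9, 0.1, 0.0]),
    ("Feature stores manage training features.", [0.2, 0.8, 0.1]),
    ("Bananas are rich in potassium.", [0.0, 0.1, 0.9]),
]

print(brute_force_search([1.0, 0.2, 0.0], indexed_docs, k=2))
```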

In [25]:
#Hannah

!pip install pypdf

Collecting pypdf
  Downloading pypdf-5.0.1-py3-none-any.whl.metadata (7.4 kB)
Downloading pypdf-5.0.1-py3-none-any.whl (294 kB)
Installing collected packages: pypdf
Successfully installed pypdf-5.0.1


In [27]:
#Hannah

!pip install "langchain-google-community[vertexaisearch]"

Collecting google-cloud-discoveryengine<0.12.0,>=0.11.13 (from langchain-google-community[vertexaisearch])
  Downloading google_cloud_discoveryengine-0.11.14-py3-none-any.whl.metadata (5.2 kB)
Downloading google_cloud_discoveryengine-0.11.14-py3-none-any.whl (2.2 MB)
Installing collected packages: google-cloud-discoveryengine
Successfully installed google-cloud-discoveryengine-0.11.14


In [28]:
llm = ChatVertexAI(model_name="gemini-1.5-flash-002", temperature=0)
embedding = VertexAIEmbeddings(model_name="text-embedding-004")


vector_store = get_vector_store(embedding=embedding)
retriever = vector_store.as_retriever(search_kwargs={"k": 20})
compressor = VertexAIRank(
    project_id=PROJECT_ID,
    location_id="global",
    ranking_config="default_ranking_config",
    title_field="id",
    top_n=5,
)


@tool
def retrieve_docs(query: str) -> list[Document]:
    """
    Useful for retrieving relevant documents based on a query.
    Use this when you need additional information to answer a question.

    Args:
        query (str): The user's question or search query.

    Returns:
        List[Document]: A list of the top-ranked Document objects, limited to TOP_K (5) results.
    """
    retrieved_docs = retriever.invoke(query)
    ranked_docs = compressor.compress_documents(documents=retrieved_docs, query=query)
    return ranked_docs


@tool
def should_continue() -> None:
    """
    Use this tool if you determine that you have enough context to respond to the questions of the user.
    """
    return None


# Set up conversation inspector
inspect_conversation = inspect_conversation_template | llm.bind_tools(
    [retrieve_docs, should_continue], tool_choice="any"
)

# Set up response chain
response_chain = rag_template | llm


@custom_chain
def chain(
    input: dict[str, Any], **kwargs: Any
) -> Iterator[OnToolEndEvent | OnChatModelStreamEvent]:
    """
    Implement a RAG QA chain with tool calls.

    This function is decorated with `custom_chain` to offer LangChain compatible
    astream_events, support for synchronous invocation through the `invoke` method,
    and OpenTelemetry tracing.
    """
    # Inspect conversation and determine next action
    inspection_result = inspect_conversation.invoke(input)
    tool_call_result = inspection_result.tool_calls[0]

    # Execute the appropriate tool based on the inspection result
    if tool_call_result["name"] == "retrieve_docs":
        # Retrieve relevant documents
        docs = retrieve_docs.invoke(tool_call_result["args"])
        # Format the retrieved documents
        formatted_docs = template_docs.format(docs=docs)
        # Create a ToolMessage with the formatted documents
        tool_message = ToolMessage(
            tool_call_id=tool_call_result["name"],
            name=tool_call_result["name"],
            content=formatted_docs,
            artifact=docs,
        )
    else:
        # If no documents need to be retrieved, continue with the conversation
        tool_message = should_continue.invoke(tool_call_result)

    # Update input messages with new information
    input["messages"] = input["messages"] + [inspection_result, tool_message]

    # Yield tool results metadata
    yield OnToolEndEvent(
        data={"input": tool_call_result["args"], "output": tool_message}
    )

    # Stream LLM response
    for chunk in response_chain.stream(input=input):
        yield OnChatModelStreamEvent(data={"chunk": chunk})

The `@custom_chain` decorator, defined in `app/utils/output_types.py`:
- Enables compatibility with the `astream_events` LangChain API interface by offering a `chain.astream_events` method.
- Provides an `invoke` method for synchronous invocation. This method can be utilized for evaluation purposes.
- Adds OpenTelemetry tracing functionality.
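
To illustrate the first two capabilities in the list above, here is a simplified sketch of a decorator that wraps an event-yielding generator and exposes both `invoke` and `astream_events`. `SketchChain` and `sketch_custom_chain` are hypothetical names, this is not the template's actual implementation, tracing is omitted, and `invoke` returns a plain string here, which may differ from what the real decorator returns.

```python
import asyncio
from collections.abc import Iterator
from typing import Any, Callable


class SketchChain:
    """Hypothetical wrapper; not the template's actual `custom_chain` code."""

    def __init__(self, func: Callable[[dict[str, Any]], Iterator[dict[str, Any]]]):
        self.func = func

    def invoke(self, input: dict[str, Any]) -> str:
        """Run the generator to completion and join the streamed text chunks."""
        return "".join(
            event["data"]["chunk"]
            for event in self.func(input)
            if event["event"] == "on_chat_model_stream"
        )

    async def astream_events(self, input: dict[str, Any], version: str = "v2"):
        """Re-emit the generator's events asynchronously."""
        for event in self.func(input):
            yield event
            await asyncio.sleep(0)  # let other tasks run between events


def sketch_custom_chain(func):
    """Decorator that turns an event-yielding generator into a SketchChain."""
    return SketchChain(func)


@sketch_custom_chain
def echo_chain(input: dict[str, Any]) -> Iterator[dict[str, Any]]:
    # Emit a placeholder tool event, then stream the last message word by word.
    yield {"event": "on_tool_end", "data": {"input": {}, "output": "no-op"}}
    for word in input["messages"][-1][1].split():
        yield {"event": "on_chat_model_stream", "data": {"chunk": word + " "}}


print(echo_chain.invoke({"messages": [("human", "What is MLOps?")]}))
```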

This methodology is used for the chain defined in the `app/patterns/custom_rag_qa/chain.py` file.

Let's test the custom chain we just created.

In [29]:
input_message = {"messages": [("human", "What is MLOps?")]}

async for event in chain.astream_events(input=input_message, version="v2"):
    if event["event"] in SUPPORTED_EVENTS:
        print(event["data"])

{'input': {'query': 'What is MLOps? '}, 'output': {'content': '## Context provided:\n\n<Document 0>\n• Avoiding training-serving skews that are due to inconsistencies in data and in runtime dependencies between \ntraining environments and serving environments.\n• Handling concerns about model fairness and adversarial attacks.\nMLOps is a methodology for ML engineering that unifies ML system development (the ML element) with ML system \noperations (the Ops element). It advocates formalizing and (when beneficial) automating critical steps of ML system \nconstruction. MLOps provides a set of standardized processes and technology capabilities for building, deploying, \nand operationalizing ML systems rapidly and reliably.\nMLOps supports ML development and deployment in the way that DevOps and DataOps support application engi -\nneering and data engineering (analytics). The difference is that when you deploy a web service, you care about resil -\nience, queries per second, load balancing, 

## Evaluation

Evaluation is the activity of assessing the quality of the model's outputs, to gauge its understanding and success in fulfilling the prompt's instructions.

In the context of Generative AI, evaluation extends beyond the evaluation of the model's outputs to include the evaluation of the chain's outputs and in some cases the evaluation of the intermediate steps (for example, the evaluation of the retriever's outputs).

### Vertex AI Evaluation
To evaluate the chain's outputs, we'll utilize [Vertex AI Evaluation](https://cloud.google.com/vertex-ai/generative-ai/docs/models/evaluation-overview) to assess our AI application's performance.
Vertex AI Evaluation streamlines the evaluation process for generative AI by offering three key features:

- [Pre-built Metrics](https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval): It provides a library of ready-to-use metrics for common evaluation tasks, saving you time and effort in defining your own. These metrics cover a range of areas, simplifying the assessment of different aspects of your model's performance.

- [Custom Metrics](https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval): Beyond pre-built options, Vertex AI Evaluation allows you to define and implement custom metrics tailored to your specific needs and application requirements.

- Strong Integration with [Vertex AI Experiments](https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments): Vertex AI Evaluation seamlessly integrates with Vertex AI Experiments, creating a unified workflow for tracking experiments and managing evaluation results.
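
As a hedged illustration of what a custom metric might compute, the sketch below defines a plain scoring function over a single evaluation row. The function name, the `exact_match` key, and the assumption that each row exposes `response` and `reference` fields are choices made for this example. A function of this shape is the kind of callable you might wrap with the `CustomMetric` class imported earlier, but check the Vertex AI Evaluation documentation for the exact signature it expects.

```python
def exact_match_metric(instance: dict) -> dict:
    """Hypothetical metric: 1.0 if response equals reference (ignoring case and
    surrounding whitespace), else 0.0."""
    response = instance["response"].strip().lower()
    reference = instance["reference"].strip().lower()
    return {"exact_match": 1.0 if response == reference else 0.0}


rows = [
    {"response": "Hi, how can I help you?", "reference": "hi, how can I help you? "},
    {"response": "Hello!", "reference": "Hi, how can I help you?"},
]
scores = [exact_match_metric(row)["exact_match"] for row in rows]
print(sum(scores) / len(scores))  # mean score across rows
```

Exact match is deliberately strict and is rarely a good fit for free-form chat responses; it is used here only because its behavior is easy to verify by hand.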

### Evaluation Samples

**Note**: This notebook includes a section on evaluation, but it's a placeholder which should evolve based on the needs of your app. For a set of recommended samples on evaluation please visit the [official documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/models/evaluation-examples).

For a comprehensive solution to perform evaluation in Vertex AI, consider leveraging [Evals Playbook](https://github.com/GoogleCloudPlatform/applied-ai-engineering-samples/tree/main/genai-on-vertex-ai/gemini/evals_playbook), which provides recipes to streamline the experimentation and evaluation process. It showcases how you can define, track, compare, and iteratively refine experiments, customize evaluation runs and metrics, and log prompts and responses.


## Evaluating a chain

Let's start by defining a simple chain again:

In [48]:
template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an admissions representative at a small liberal arts college "
            "in the Midwest. Your role is to guide prospective students through the "
            "process of enrolling in undergraduate programs. You provide essential "
            "support, answering questions about admissions requirements, financial aid, "
            "and program options. Your goal is to help students feel confident and well-informed "
            "throughout their enrollment journey. Your responses should be informative, engaging, "
            "and tailored to the user's specific requests."
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chain = template | llm

We then import the ground truth data we will use for evaluation. The data is stored in [`app/eval/data/chats.yaml`](../app/eval/data/chats.yaml).
Note: You might need to adjust the path depending on where your Jupyter kernel was initialized.

In [36]:
# List the contents of the current directory to confirm the relative path.

%ls

chain.py  eval/  __init__.py  patterns/  __pycache__/  README.md  server.py  utils/


In [49]:
# Use a context manager so the file handle is closed after loading.
with open("eval/data/chats.yaml") as f:
    y = yaml.safe_load(f)
df = pd.DataFrame(y)
df

Unnamed: 0,messages
0,"[{'type': 'human', 'content': 'Hi'}, {'type': ..."
1,"[{'type': 'human', 'content': 'Hi'}, {'type': ..."


We leverage the helper functions [`generate_multiturn_history`](../app/eval/utils.py) and [`batch_generate_messages`](../app/eval/utils.py) to prepare the data for evaluation and to generate the responses from the chain.

You can see below the documentation for the two functions.

In [50]:
help(generate_multiturn_history)

Help on function generate_multiturn_history in module app.eval.utils:

generate_multiturn_history(df: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
    Processes a DataFrame of conversations to create a multi-turn history.
    
    This function iterates through a DataFrame where each row represents a conversation.
    It extracts human and AI messages from the "messages" column and structures them
    into a new DataFrame. Each row in the output DataFrame represents a single turn
    in a conversation, including the human message, AI message, and the conversation
    history up to that point.
    
    Args:
        df (pd.DataFrame): A DataFrame where each row represents a conversation.
                           The DataFrame should have a column named "messages" containing
                           a list of alternating human and AI messages.
    
    Returns:
        pd.DataFrame: A DataFrame where each row represents a single turn in a conversation.
                

In [51]:
help(batch_generate_messages)

Help on function batch_generate_messages in module app.eval.utils:

batch_generate_messages(messages: pandas.core.frame.DataFrame, runnable: Callable[[List[Dict[str, Any]]], Dict[str, Any]], max_workers: int = 4) -> pandas.core.frame.DataFrame
    Generates AI responses to user messages using a provided runnable.
    
    Processes a Pandas DataFrame containing conversation histories and user messages, utilizing
    the specified runnable to predict AI responses in parallel.
    
    Args:
        messages (pd.DataFrame): DataFrame with a 'messages' column. Each row
            represents a conversation and contains a list of dictionaries, where
              each dictionary
            represents a message turn in the format:
    
            ```json
            [
                {"type": "human", "content": "user's message"},
                {"type": "ai", "content": "AI's response"},
                {"type": "human", "content": "current user's message"},
                ...
        

In [52]:
df = generate_multiturn_history(df)
df

Unnamed: 0,human_message,ai_message,conversation_history
0,"{'type': 'human', 'content': 'Hi'}","{'type': 'ai', 'content': 'Hi, how can I help ...",[]
1,"{'type': 'human', 'content': 'I'm looking for ...","{'type': 'ai', 'content': 'Sure, I can help yo...","[{'type': 'human', 'content': 'Hi'}, {'type': ..."
2,"{'type': 'human', 'content': 'I'm not vegetari...","{'type': 'ai', 'content': 'Okay, I ll keep tha...","[{'type': 'human', 'content': 'Hi'}, {'type': ..."
3,"{'type': 'human', 'content': 'Those all sound ...","{'type': 'ai', 'content': 'That's a great choi...","[{'type': 'human', 'content': 'Hi'}, {'type': ..."
4,"{'type': 'human', 'content': 'Thanks for your ...","{'type': 'ai', 'content': 'You're welcome! Is ...","[{'type': 'human', 'content': 'Hi'}, {'type': ..."
5,"{'type': 'human', 'content': 'No, that's all. ...","{'type': 'ai', 'content': 'You're welcome! Hav...","[{'type': 'human', 'content': 'Hi'}, {'type': ..."
6,"{'type': 'human', 'content': 'Hi'}","{'type': 'ai', 'content': 'Hi, how can I help ...",[]
7,"{'type': 'human', 'content': 'I'm looking for ...","{'type': 'ai', 'content': 'Sure, I can help yo...","[{'type': 'human', 'content': 'Hi'}, {'type': ..."
8,"{'type': 'human', 'content': 'I'm vegetarian.'}","{'type': 'ai', 'content': 'Sure, I can help yo...","[{'type': 'human', 'content': 'Hi'}, {'type': ..."
9,"{'type': 'human', 'content': 'Those all sound ...","{'type': 'ai', 'content': 'That's a great choi...","[{'type': 'human', 'content': 'Hi'}, {'type': ..."
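The table above shows each conversation split into one row per turn, with the earlier messages stored in `conversation_history`. The following is a minimal plain-Python sketch of that unrolling, not the actual implementation of `generate_multiturn_history`. The helper name `unroll_conversation` is hypothetical, the second exchange is made-up toy data, and the sketch assumes messages strictly alternate between human and AI.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def unroll_conversation(messages: List[Message]) -> List[Dict[str, Any]]:
    """Split one alternating human/AI conversation into per-turn rows.

    Hypothetical helper for illustration only.
    """
    rows = []
    # Step through the list two messages at a time: (human, ai), (human, ai), ...
    for i in range(0, len(messages) - 1, 2):
        rows.append(
            {
                "human_message": messages[i],
                "ai_message": messages[i + 1],
                # Everything said before this turn started.
                "conversation_history": messages[:i],
            }
        )
    return rows


# Toy conversation with two turns.
conversation = [
    {"type": "human", "content": "Hi"},
    {"type": "ai", "content": "Hi, how can I help you?"},
    {"type": "human", "content": "I'm vegetarian."},
    {"type": "ai", "content": "Sure, I can help you find a recipe."},
]

rows = unroll_conversation(conversation)
for row in rows:
    print(row["human_message"]["content"], "| history length:", len(row["conversation_history"]))
```

The first turn of a conversation has an empty history, which is consistent with rows 0 and 6 in the output above.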


In [41]:
scored_data = batch_generate_messages(df, chain)

100%|██████████| 10/10 [00:08<00:00,  1.23it/s]
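According to its docstring, `batch_generate_messages` calls the runnable on each row in parallel, with `max_workers` defaulting to 4. The sketch below illustrates that general pattern with a thread pool. It is an assumption about the approach, not the code in `app.eval.utils`: `predict_in_parallel` and `echo_runnable` are hypothetical names, and passing the history plus the current human message to the runnable is assumed rather than confirmed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def predict_in_parallel(
    rows: List[Dict[str, Any]],
    runnable: Callable[[List[Message]], Dict[str, Any]],
    max_workers: int = 4,
) -> List[Dict[str, Any]]:
    """Call `runnable` once per row using a pool of worker threads."""

    def predict_one(row: Dict[str, Any]) -> Dict[str, Any]:
        # Assumption: the model sees the history plus the current human message.
        return runnable(row["conversation_history"] + [row["human_message"]])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with `rows`.
        return list(pool.map(predict_one, rows))


def echo_runnable(messages: List[Message]) -> Dict[str, Any]:
    """Stand-in for a real chain: echoes the last message."""
    return {"content": f"echo: {messages[-1]['content']}", "n_messages": len(messages)}


example_rows = [
    {"human_message": {"type": "human", "content": "Hi"}, "conversation_history": []},
    {
        "human_message": {"type": "human", "content": "I'm vegetarian."},
        "conversation_history": [
            {"type": "human", "content": "Hi"},
            {"type": "ai", "content": "Hi, how can I help you?"},
        ],
    },
]

responses = predict_in_parallel(example_rows, echo_runnable)
print([r["content"] for r in responses])
```

Threads suit this workload because each call mostly waits on a network response from the model.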


We extract the user message and the reference (ground truth) message from the DataFrame so that we can use them for evaluation.

In [42]:
scored_data["user"] = scored_data["human_message"].apply(lambda x: x["content"])
scored_data["reference"] = scored_data["ai_message"].apply(lambda x: x["content"])
scored_data

Unnamed: 0,human_message,ai_message,conversation_history,response,response_obj,user,reference
0,"{'type': 'human', 'content': 'Hi'}","{'type': 'ai', 'content': 'Hi, how can I help ...","[{'type': 'human', 'content': 'Hi'}]",Hello! What can I help you cook today? I'm r...,"{'input_tokens': 64, 'output_tokens': 48, 'tot...",Hi,"Hi, how can I help you?"
1,"{'type': 'human', 'content': 'I'm looking for ...","{'type': 'ai', 'content': 'Sure, I can help yo...","[{'type': 'human', 'content': 'Hi'}, {'type': ...",I have many healthy dinner recipe recommendati...,"{'input_tokens': 90, 'output_tokens': 149, 'to...",I'm looking for a recipe for a healthy dinner....,"Sure, I can help you with that. What are your ..."
2,"{'type': 'human', 'content': 'I'm not vegetari...","{'type': 'ai', 'content': 'Okay, I ll keep tha...","[{'type': 'human', 'content': 'Hi'}, {'type': ...","Okay, here's a delicious and healthy gluten-fr...","{'input_tokens': 134, 'output_tokens': 596, 't...","I'm not vegetarian or vegan, but I am gluten-f...","Okay, I ll keep that in mind. Here are a few r..."
3,"{'type': 'human', 'content': 'Those all sound ...","{'type': 'ai', 'content': 'That's a great choi...","[{'type': 'human', 'content': 'Hi'}, {'type': ...",Excellent choice! Grilled salmon with roasted ...,"{'input_tokens': 290, 'output_tokens': 690, 't...",Those all sound great! I think I'm going to tr...,That's a great choice! I hope you enjoy it.
4,"{'type': 'human', 'content': 'Thanks for your ...","{'type': 'ai', 'content': 'You're welcome! Is ...","[{'type': 'human', 'content': 'Hi'}, {'type': ...","You're welcome! To help you further, let's ge...","{'input_tokens': 308, 'output_tokens': 83, 'to...",Thanks for your help!,You're welcome! Is there anything else I can h...
5,"{'type': 'human', 'content': 'No, that's all. ...","{'type': 'ai', 'content': 'You're welcome! Hav...","[{'type': 'human', 'content': 'Hi'}, {'type': ...",You're very welcome! Enjoy your delicious and...,"{'input_tokens': 334, 'output_tokens': 30, 'to...","No, that's all. Thanks again!",You're welcome! Have a great day!
6,"{'type': 'human', 'content': 'Hi'}","{'type': 'ai', 'content': 'Hi, how can I help ...","[{'type': 'human', 'content': 'Hi'}]",Hello! What can I help you cook today? I'm r...,"{'input_tokens': 64, 'output_tokens': 48, 'tot...",Hi,"Hi, how can I help you?"
7,"{'type': 'human', 'content': 'I'm looking for ...","{'type': 'ai', 'content': 'Sure, I can help yo...","[{'type': 'human', 'content': 'Hi'}, {'type': ...","For a romantic dinner, I recommend **Pan-Seare...","{'input_tokens': 90, 'output_tokens': 608, 'to...",I'm looking for a recipe for a romantic dinner...,"Sure, I can help you with that. What are your ..."
8,"{'type': 'human', 'content': 'I'm vegetarian.'}","{'type': 'ai', 'content': 'Sure, I can help yo...","[{'type': 'human', 'content': 'Hi'}, {'type': ...",Excellent! Here's a recipe for a romantic veg...,"{'input_tokens': 124, 'output_tokens': 809, 't...",I'm vegetarian.,"Sure, I can help you find a healthy vegetarian..."
9,"{'type': 'human', 'content': 'Those all sound ...","{'type': 'ai', 'content': 'That's a great choi...","[{'type': 'human', 'content': 'Hi'}, {'type': ...",Excellent choice! Burnt Aubergine Veggie Chil...,"{'input_tokens': 412, 'output_tokens': 665, 't...",Those all sound great! I like the Burnt auberg...,That's a great choice! I hope you enjoy it.


#### Define a CustomMetric using Gemini model

Define a custom model-based metric function that uses Gemini to return a score together with an explanation. Registered custom metrics are computed on the client side, without calling the online evaluation service APIs.

In [43]:
evaluator_llm = ChatVertexAI(
    model_name="gemini-1.5-flash-001",
    temperature=0,
    response_mime_type="application/json",
)


def custom_faithfulness(instance):
    prompt = f"""You are examining written text content. Here is the text:
************
Written content: {instance["response"]}
************
Original source data: {instance["reference"]}

Examine the text and determine whether the text is faithful or not.
Faithfulness refers to how accurately a generated summary reflects the essential information and key concepts present in the original source document.
A faithful summary stays true to the facts and meaning of the source text, without introducing distortions, hallucinations, or information that wasn't originally there.

Your response must be an explanation of your thinking along with a single integer on a scale of 0-5,
with 0 being the least faithful and 5 being the most faithful.

Produce results in JSON

Expected format:

```json
{{
    "explanation": "< your explanation>",
    "custom_faithfulness": <integer score from 0 to 5>
}}
```
"""

    result = evaluator_llm.invoke([("human", prompt)])
    result = json.loads(result.content)
    return result


# Register Custom Metric
custom_faithfulness_metric = CustomMetric(
    name="custom_faithfulness",
    metric_function=custom_faithfulness,
)
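The metric function above trusts the evaluator model to return well-formed JSON with an in-range score. As a hedged sketch, the parsing step could be made more defensive along the following lines. The helper `parse_faithfulness_response` is hypothetical and not part of the starter pack or the Vertex AI SDK. It assumes the response is a JSON object with the `custom_faithfulness` and `explanation` keys requested in the prompt.

```python
import json
from typing import Any, Dict

FENCE = "`" * 3  # Markdown code-fence marker, built here to avoid a literal fence


def parse_faithfulness_response(raw: str) -> Dict[str, Any]:
    """Parse and validate the judge model's JSON answer (hypothetical helper)."""
    text = raw.strip()
    # Tolerate a Markdown code fence around the JSON, in case the model adds one.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.startswith("json"):
            text = text[len("json"):]
    result = json.loads(text)
    score = int(result["custom_faithfulness"])
    if not 0 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    return {
        "custom_faithfulness": score,
        "explanation": str(result.get("explanation", "")),
    }


plain = '{"explanation": "Matches the source.", "custom_faithfulness": 4}'
fenced = f"{FENCE}json\n{plain}\n{FENCE}"

print(parse_faithfulness_response(plain))
print(parse_faithfulness_response(fenced))
```

Raising on an out-of-range score makes a malformed judgment visible instead of silently skewing the mean.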

In [44]:
experiment_name = "template-langchain-eval"  # @param {type:"string"}

We are now ready to run the evaluation. We will use different metrics, combining the custom metric we defined above with some pre-built metrics.

Results of the evaluation are automatically logged under the experiment named by `experiment_name`.

You can click `View Experiment` to see the experiment in the Google Cloud Console.

In [45]:
metrics = ["fluency", "safety", custom_faithfulness_metric]

eval_task = EvalTask(
    dataset=scored_data,
    metrics=metrics,
    experiment=experiment_name,
    metric_column_mapping={"prompt": "user"},
)
eval_result = eval_task.evaluate()

INFO:google.cloud.aiplatform.metadata.experiment_resources:Associating projects/7198055878/locations/us-central1/metadataStores/default/contexts/template-langchain-eval-98e9faac-65f8-4a55-8a1b-2a2b2e17148e to Experiment: template-langchain-eval


INFO:vertexai.evaluation._evaluation:Computing metrics with a total of 30 Vertex Gen AI Evaluation Service API requests.
100%|██████████| 30/30 [01:17<00:00,  2.58s/it]
INFO:vertexai.evaluation._evaluation:Evaluation Took:77.48689969399993 seconds


Once an eval result is produced, we are able to display summary metrics:

In [46]:
eval_result.summary_metrics

{'row_count': 10,
 'fluency/mean': 'NaN',
 'fluency/std': 'NaN',
 'safety/mean': 'NaN',
 'safety/std': 'NaN',
 'custom_faithfulness/mean': 1.1,
 'custom_faithfulness/std': 1.5238839267549946}

We can also display a pandas DataFrame with the per-row results for our evaluation dataset, including each metric's score and explanation.

In [47]:
eval_result.metrics_table

Unnamed: 0,human_message,ai_message,conversation_history,response,response_obj,user,reference,custom_faithfulness/score,custom_faithfulness/explanation,fluency/explanation,fluency/score,safety/explanation,safety/score
0,"{'type': 'human', 'content': 'Hi'}","{'type': 'ai', 'content': 'Hi, how can I help ...","[{'type': 'human', 'content': 'Hi'}]",Hello! What can I help you cook today? I'm r...,"{'input_tokens': 64, 'output_tokens': 48, 'tot...",Hi,"Hi, how can I help you?",1,The generated text is not faithful to the orig...,Error,,Error,
1,"{'type': 'human', 'content': 'I'm looking for ...","{'type': 'ai', 'content': 'Sure, I can help yo...","[{'type': 'human', 'content': 'Hi'}, {'type': ...",I have many healthy dinner recipe recommendati...,"{'input_tokens': 90, 'output_tokens': 149, 'to...",I'm looking for a recipe for a healthy dinner....,"Sure, I can help you with that. What are your ...",2,The text is not faithful to the original sourc...,Error,,Error,
2,"{'type': 'human', 'content': 'I'm not vegetari...","{'type': 'ai', 'content': 'Okay, I ll keep tha...","[{'type': 'human', 'content': 'Hi'}, {'type': ...","Okay, here's a delicious and healthy gluten-fr...","{'input_tokens': 134, 'output_tokens': 596, 't...","I'm not vegetarian or vegan, but I am gluten-f...","Okay, I ll keep that in mind. Here are a few r...",0,The provided text is a recipe for Sheet Pan Le...,Error,,Error,
3,"{'type': 'human', 'content': 'Those all sound ...","{'type': 'ai', 'content': 'That's a great choi...","[{'type': 'human', 'content': 'Hi'}, {'type': ...",Excellent choice! Grilled salmon with roasted ...,"{'input_tokens': 290, 'output_tokens': 690, 't...",Those all sound great! I think I'm going to tr...,That's a great choice! I hope you enjoy it.,1,The provided text is a detailed recipe for Gri...,Error,,Error,
4,"{'type': 'human', 'content': 'Thanks for your ...","{'type': 'ai', 'content': 'You're welcome! Is ...","[{'type': 'human', 'content': 'Hi'}, {'type': ...","You're welcome! To help you further, let's ge...","{'input_tokens': 308, 'output_tokens': 83, 'to...",Thanks for your help!,You're welcome! Is there anything else I can h...,0,The generated text is not faithful to the orig...,Error,,Error,
5,"{'type': 'human', 'content': 'No, that's all. ...","{'type': 'ai', 'content': 'You're welcome! Hav...","[{'type': 'human', 'content': 'Hi'}, {'type': ...",You're very welcome! Enjoy your delicious and...,"{'input_tokens': 334, 'output_tokens': 30, 'to...","No, that's all. Thanks again!",You're welcome! Have a great day!,1,The generated text is not faithful to the orig...,Error,,Error,
6,"{'type': 'human', 'content': 'Hi'}","{'type': 'ai', 'content': 'Hi, how can I help ...","[{'type': 'human', 'content': 'Hi'}]",Hello! What can I help you cook today? I'm r...,"{'input_tokens': 64, 'output_tokens': 48, 'tot...",Hi,"Hi, how can I help you?",1,The generated text is not faithful to the orig...,Error,,Error,
7,"{'type': 'human', 'content': 'I'm looking for ...","{'type': 'ai', 'content': 'Sure, I can help yo...","[{'type': 'human', 'content': 'Hi'}, {'type': ...","For a romantic dinner, I recommend **Pan-Seare...","{'input_tokens': 90, 'output_tokens': 608, 'to...",I'm looking for a recipe for a romantic dinner...,"Sure, I can help you with that. What are your ...",5,The provided text is a complete recipe for Pan...,Error,,Error,
8,"{'type': 'human', 'content': 'I'm vegetarian.'}","{'type': 'ai', 'content': 'Sure, I can help yo...","[{'type': 'human', 'content': 'Hi'}, {'type': ...",Excellent! Here's a recipe for a romantic veg...,"{'input_tokens': 124, 'output_tokens': 809, 't...",I'm vegetarian.,"Sure, I can help you find a healthy vegetarian...",0,The provided text is not faithful to the origi...,Error,,Error,
9,"{'type': 'human', 'content': 'Those all sound ...","{'type': 'ai', 'content': 'That's a great choi...","[{'type': 'human', 'content': 'Hi'}, {'type': ...",Excellent choice! Burnt Aubergine Veggie Chil...,"{'input_tokens': 412, 'output_tokens': 665, 't...",Those all sound great! I like the Burnt auberg...,That's a great choice! I hope you enjoy it.,0,The provided text is a recipe for Burnt Auberg...,Error,,Error,
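As a quick sanity check, the `custom_faithfulness` summary values can be reproduced from the per-row scores in the table above. The snippet below copies those ten scores by hand and uses only the standard library. The reported standard deviation matches the sample formula (dividing by n - 1), which is an observation from these numbers rather than a documented guarantee of the SDK.

```python
import statistics

# Per-row custom_faithfulness scores, copied from the metrics table above.
scores = [1, 2, 0, 1, 0, 1, 1, 5, 0, 0]

mean_score = statistics.mean(scores)
# stdev() is the sample standard deviation (divides by n - 1).
std_score = statistics.stdev(scores)

print(f"mean={mean_score}, std={std_score:.4f}")
```

This gives a mean of 1.1 and a standard deviation of about 1.5239, in line with `custom_faithfulness/mean` and `custom_faithfulness/std` in the summary metrics.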


## Next Steps

Congratulations on completing the getting started tutorial! You've learned different methodologies to build a chain and how to evaluate it.
Let's explore the next steps in your journey:

### 1. Prepare for Production

Once you're satisfied with your chain's evaluation results:

1. Write your chain into the [`app/chain.py` file](../app/chain.py).
2. Remove the `patterns` folder and its associated tests (these are for demonstration only).

### 2. Local Testing

Test your chain using the playground:

```bash
make playground
```

This launches a feature-rich playground, including chat curation, user feedback collection, multimodal input, and more!


### 3. Production Deployment

Once you are satisfied with the results, you can set up your CI/CD pipelines to deploy your chain to production.

Please refer to the [deployment guide](../deployment/README.md) for more information on how to do that.

In [78]:
cd ..

/content/app-starter-pack


In [82]:
!curl -sSL https://install.python-poetry.org | python3 -

Retrieving Poetry metadata

# Welcome to Poetry!

This will download and install the latest version of Poetry,
a dependency and package manager for Python.

It will add the `poetry` command to Poetry's bin directory, located at:

/root/.local/bin

You can uninstall at any time by executing this script with the --uninstall option,
and these changes will be reverted.

Installing Poetry (1.8.4)
Installing Poetry (1.8.4): Creating environment
Installing Poetry (1.8.4): Installing Poetry
Installing Poetry (1.8.4): Creating script
Installing Poetry (1.8.4): Done

Poetry (1.8.4) is installed now. Great!

To get started you need Poetry's bin directory (/root/.local/bin) in your `PATH`
environment variable.

Add `export PATH="/root/.local/bin:$PATH"` to yo

In [84]:
import os
os.environ['PATH'] += os.pathsep + os.path.expanduser("~/.local/bin")

In [85]:
!poetry install --with streamlit,jupyter

Streaming output truncated to the last 5000 lines.
  - Installing opentelemetry-proto (1.27.0): Installing...
  - Installing opentelemetry-semantic-conventions (0.48b0): Downloading... 0%
  - Installing pandas (2.2.3): Pending...
  - Installing marshmallow (3.22.0)
  - Installing more-itertools (10.5.0)
  - Installing notebook-shim (0.2.4)
  - Installing opentelemetry-proto (1.27.0): Installing...
  - Installing

In [86]:
!poetry --version

Poetry (version 1.8.4)


In [None]:
!make playground

poetry run uvicorn app.server:app --host 0.0.0.0 --port 8000 --reload & poetry run streamlit run streamlit/streamlit_app.py --browser.serverAddress=localhost --server.enableCORS=false --server.enableXsrfProtection=false
INFO:     Will watch for changes in these directories: ['/content/app-starter-pack']
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [37175] using WatchFiles

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  URL: http://localhost:8501

Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial.

In [None]:
import os

# Delete Experiments
delete_experiments = True
if delete_experiments or os.getenv("IS_TESTING"):
    experiments_list = aiplatform.Experiment.list()
    for experiment in experiments_list:
        experiment.delete()