**Disclaimer**. This example uses the Evidently API as available in version 0.6.7 or lower. Please ensure you are using the correct version when running this notebook. See an updated example: https://github.com/evidentlyai/community-examples/blob/main/tutorials/Regression_testing_with_debugging_updated.ipynb 

Evidently docs: https://docs.evidentlyai.com/

Join our Discord: https://discord.com/invite/xZjKRaNp8b

In [None]:
!pip install "evidently[llm]<=0.6.7"

In [None]:
import pandas as pd

from evidently.test_suite import TestSuite
from evidently.descriptors import *
from evidently.tests import *

from evidently.ui.workspace.cloud import CloudWorkspace

Optional imports to create monitoring panels:

In [None]:
from evidently.ui.dashboards import DashboardPanelPlot
from evidently.ui.dashboards import DashboardPanelTestSuite
from evidently.ui.dashboards import DashboardPanelTestSuiteCounter
from evidently.ui.dashboards import TestSuitePanelType
from evidently.ui.dashboards import ReportFilter
from evidently.ui.dashboards import PanelValue
from evidently.ui.dashboards import PlotType
from evidently.ui.dashboards import CounterAgg
from evidently.tests.base_test import TestStatus
from evidently.renderers.html_widgets import WidgetSize

# Connect to Evidently Cloud

In [None]:
ws = CloudWorkspace(token="YOUR_API_TOKEN", url="https://app.evidently.cloud")

Create a Project

In [None]:
project = ws.create_project("Regression testing example", team_id="YOUR_TEAM_ID")
project.description = "My project description"
project.save()

# Golden dataset

Prepare a set of approved ("golden") answers to compare against. Add a `response` column as a placeholder for the new answers; initially it duplicates the golden answers. This will be the **reference data**.

In [None]:
data = [
    ["Hi there, how do I reset my password?", "To reset your password, click on 'Forgot Password' on the login page and follow the instructions sent to your registered email.", "To reset your password, click on 'Forgot Password' on the login page and follow the instructions sent to your registered email."],
    ["Can I change my email address on file?", "Yes, you can change your email address in the account settings under 'Profile Information'.", "Yes, you can change your email address in the account settings under 'Profile Information'."],
    ["Where can I view my purchase history?", "You can view your purchase history by going to 'Order History' in your account settings.", "You can view your purchase history by going to 'Order History' in your account settings."],
    ["What payment methods are accepted?", "We accept credit cards, debit cards, and PayPal.", "We accept credit cards, debit cards, and PayPal."],
    ["How do I contact customer support?", "You can contact customer support through the 'Help' section on our website or by emailing support@example.com email.", "You can contact customer support through the 'Help' section on our website or by emailing support@example.com email."],
    ["How do I update my billing information?", "You can update your billing information in the 'Billing' section of your account settings.", "You can update your billing information in the 'Billing' section of your account settings."],
    ["What is the return policy?", "Our return policy allows returns within 30 days of purchase with a receipt.", "Our return policy allows returns within 30 days of purchase with a receipt."],
    ["Can I track my order?", "Yes, you can track your order by logging into your account and going to 'Track Order'.", "Yes, you can track your order by logging into your account and going to 'Track Order'."],
    ["How do I apply a discount code?", "To apply a discount code, enter it at checkout in the 'Discount Code' field.", "To apply a discount code, enter it at checkout in the 'Discount Code' field."],
    ["Is there a mobile app available?", "Yes, our mobile app is available on both iOS and Android platforms.", "Yes, our mobile app is available on both iOS and Android platforms."]
]

columns = ["question", "reference_response", "response"]

ref_data = pd.DataFrame(data, columns=columns)
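Since the placeholder column simply repeats the golden answers, the rows can also be generated rather than typed twice. A minimal sketch, where `qa_pairs` and `make_reference_rows` are hypothetical names that do not appear elsewhere in this notebook:

```python
# Hypothetical helper: build [question, reference_response, response] rows
# where "response" starts out as a copy of the golden answer.
def make_reference_rows(qa_pairs):
    return [[question, answer, answer] for question, answer in qa_pairs]

# Two question/answer pairs taken from the golden dataset above.
qa_pairs = [
    ("What payment methods are accepted?", "We accept credit cards, debit cards, and PayPal."),
    ("Is there a mobile app available?", "Yes, our mobile app is available on both iOS and Android platforms."),
]

rows = make_reference_rows(qa_pairs)
print(rows[0])
```

The resulting `rows` can be passed to `pd.DataFrame(rows, columns=columns)` in the same way as `data` above.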

In [None]:
pd.set_option('display.max_colwidth', None)

In [None]:
ref_data.head(10)

# Define the Test Suite

In [None]:
#test_suite = TestSuite(tests=[...])

Define the test suite conditions you will be testing against:
- If the minimum **semantic similarity** between the new and reference responses is below 0.9, the test returns a warning.
- If the minimum **semantic similarity** between the new and reference responses is below 0.8, the test fails.
- If the minimum **sentiment** is outside +/- 10% (default) of the reference value, the test fails.
- If the mean **text length** is outside +/- 10% (default) of the reference value, the test returns a warning.
- If the **text length** of any response is outside the 0 to 250 character range, the test fails.
- If any response **mentions a competitor**, the test fails.
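To make the two kinds of conditions concrete, here is a plain-Python sketch of the logic only. It is not Evidently's implementation: `similarity_status` and `within_relative_range` are hypothetical helpers, and all numbers are made up for illustration.

```python
def similarity_status(scores, warn_below=0.9, fail_below=0.8):
    """Combine the two minimum-similarity conditions into one overall outcome."""
    lowest = min(scores)
    if lowest < fail_below:
        return "FAIL"
    if lowest < warn_below:
        return "WARNING"
    return "SUCCESS"


def within_relative_range(current, reference, relative=0.1):
    """True if current lies within +/- relative of the reference value."""
    return abs(current - reference) <= relative * abs(reference)


print(similarity_status([0.97, 0.93, 0.91]))  # SUCCESS
print(similarity_status([0.97, 0.85, 0.91]))  # WARNING
print(similarity_status([0.97, 0.75, 0.91]))  # FAIL

# Mean text length compared against an illustrative reference mean of 90.
print(within_relative_range(95.0, 90.0))   # True
print(within_relative_range(180.0, 90.0))  # False
```

In the Test Suite itself the two similarity thresholds are separate tests, so a minimum below 0.8 would show up as one warning and one failure rather than a single combined status.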

In [None]:
test_suite = TestSuite(tests=[
    TestColumnValueMin(
        column_name=SemanticSimilarity(
        display_name="Response Similarity",
        with_column="reference_response").
        on("response"),
        gte=0.9,
        is_critical=False),
    TestColumnValueMin(
        column_name=SemanticSimilarity(
        display_name="Response Similarity",
        with_column="reference_response").
        on("response"),
        gte=0.8),
    TestColumnValueMin(
        column_name=Sentiment(
        display_name="Response Sentiment").
        on("response")),
    TestColumnValueMean(
        column_name=TextLength(
        display_name="Text Length").
        on("response"),
        is_critical=False),
    TestShareOfOutRangeValues(
        column_name=TextLength(
        display_name="Text Length").
        on("response"),
        left=0, right=250,
        eq=0),
    TestCategoryCount(
        column_name=Contains(
            items=["AnotherFinCo", "YetAnotherFinCo"],
            display_name="Competitor Mentions").
        on("response"),
        category=True,
        eq=0),
])

# Change prompt - 1

Generate a new set of answers. (Modify the prompt to be more conversational in style.)

In [None]:
#In a real use case, call your LLM product API to get a new set of answers and add them to the "response" column.

data = [
    ["Hi there, how do I reset my password?", "To reset your password, click on 'Forgot Password' on the login page and follow the instructions sent to your registered email.", "Hey! Just click on 'Forgot Password' on the login page and follow the steps in the email we send you."],
    ["Can I change my email address on file?", "Yes, you can change your email address in the account settings under 'Profile Information'.", "Of course! Go to 'Profile Information' in your account settings to update your email address."],
    ["Where can I view my purchase history?", "You can view your purchase history by going to 'Order History' in your account settings.", "Absolutely! Just go to 'Order History' in your account settings to see all your purchases."],
    ["What payment methods are accepted?", "We accept credit cards, debit cards, and PayPal.", "We take credit cards, debit cards, and PayPal."],
    ["How do I contact customer support?", "You can contact customer support through the 'Help' section on our website or by emailing support@example.com email.", "You can reach out to our support team via the 'Help' section on our website or by emailing support@example.com."],
    ["How do I update my billing information?", "You can update your billing information in the 'Billing' section of your account settings.", "You can easily update your billing info in the 'Billing' section of your account settings."],
    ["What is the return policy?", "Our return policy allows returns within 30 days of purchase with a receipt.", "You can return items within 30 days of purchase as long as you have the receipt."],
    ["Can I track my order?", "Yes, you can track your order by logging into your account and going to 'Track Order'.", "Of course! To track the order, log into your account and go to 'Track Order' to see where your package is."],
    ["How do I apply a discount code?", "To apply a discount code, enter it at checkout in the 'Discount Code' field.", "Simply enter your discount code at checkout in the 'Discount Code' box."],
    ["Is there a mobile app available?", "Yes, our mobile app is available on both iOS and Android platforms.", "Yes, we have a mobile app available on both iOS and Android."]
]

columns = ["question", "reference_response", "response"]

cur_data_1 = pd.DataFrame(data, columns=columns)
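In a real project the new answers would come from your LLM application rather than a hand-written list. The sketch below shows one way to regenerate the `response` column from the golden rows. `build_current_rows` and `fake_llm` are hypothetical names, and `fake_llm` is only a stub standing in for a call to your own product API.

```python
def build_current_rows(golden_rows, generate_response):
    """Keep question and reference answer, replace the response with a fresh one."""
    return [
        [question, reference, generate_response(question)]
        for question, reference, _old_response in golden_rows
    ]


def fake_llm(question):
    # Stub for illustration; replace with a call to your own LLM application.
    return f"(stub answer to: {question})"


golden_rows = [
    [
        "What payment methods are accepted?",
        "We accept credit cards, debit cards, and PayPal.",
        "We accept credit cards, debit cards, and PayPal.",
    ],
]

new_rows = build_current_rows(golden_rows, fake_llm)
print(new_rows[0][2])
```

The output rows have the same three-column layout as `data`, so they can be wrapped with `pd.DataFrame(new_rows, columns=columns)` and used as the current dataset.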

In [None]:
#To preview the dataset

#cur_data_1.head(10)

In [None]:
test_suite.run(reference_data=ref_data, current_data=cur_data_1)

In [None]:
ws.add_test_suite(project.id, test_suite, include_data=True)

In [None]:
#To send without raw data (default)

#ws.add_test_suite(project.id, test_suite, include_data=False)

In [None]:
#To view tests in Python

#test_suite

# Create monitoring panels

Add a counter panel to show the number of tests with the SUCCESS status in the latest test run:

In [None]:
project.dashboard.add_panel(
     DashboardPanelTestSuiteCounter(
        title="Latest Test run",
        filter=ReportFilter(metadata_values={}, tag_values=[], include_test_suites=True),
        size=WidgetSize.FULL,
        statuses=[TestStatus.SUCCESS],
        agg=CounterAgg.LAST,
    ),
    tab="Tests"
)

Add a test monitoring panel to show all test results over time.

In [None]:
project.dashboard.add_panel(
    DashboardPanelTestSuite(
        title="Test results",
        filter=ReportFilter(metadata_values={}, tag_values=[], include_test_suites=True),
        size=WidgetSize.FULL,
        panel_type=TestSuitePanelType.DETAILED,
        time_agg="1min",
    ),
    tab="Tests"
)

Add a panel with a bar chart to plot min, max, and mean semantic similarity.

In [None]:
project.dashboard.add_panel(
    DashboardPanelPlot(
        title="Semantic similarity",
        filter=ReportFilter(metadata_values={}, tag_values=[], include_test_suites=True),
        size=WidgetSize.HALF,
        values=[
            PanelValue(
                metric_id="ColumnSummaryMetric",
                field_path='current_characteristics.min',
                metric_args={"column_name" : SemanticSimilarity(display_name="Response Similarity", with_column="reference_response").on("response")},
                legend='min',
            ),
            PanelValue(
                metric_id="ColumnSummaryMetric",
                field_path='current_characteristics.mean',
                metric_args={"column_name" : SemanticSimilarity(display_name="Response Similarity", with_column="reference_response").on("response")},
                legend='mean',
            ),
            PanelValue(
                metric_id="ColumnSummaryMetric",
                field_path='current_characteristics.max',
                metric_args={"column_name" : SemanticSimilarity(display_name="Response Similarity", with_column="reference_response").on("response")},
                legend='max',
            ),
        ],
        plot_type=PlotType.BAR,
    ),
    tab="Tests"
)

Add a panel with a bar chart to plot min, max, and mean sentiment.

In [None]:
project.dashboard.add_panel(
    DashboardPanelPlot(
        title="Sentiment",
        filter=ReportFilter(metadata_values={}, tag_values=[], include_test_suites=True),
        size=WidgetSize.HALF,
        values=[
            PanelValue(
                metric_id="ColumnSummaryMetric",
                field_path='current_characteristics.min',
                metric_args={"column_name" : Sentiment(display_name="Response Sentiment").on("response")},
                legend='min',
            ),
            PanelValue(
                metric_id="ColumnSummaryMetric",
                field_path='current_characteristics.mean',
                metric_args={"column_name" : Sentiment(display_name="Response Sentiment").on("response")},
                legend='mean',
            ),
            PanelValue(
                metric_id="ColumnSummaryMetric",
                field_path='current_characteristics.max',
                metric_args={"column_name" : Sentiment(display_name="Response Sentiment").on("response")},
                legend='max',
            ),
        ],
        plot_type=PlotType.BAR,
    ),
    tab="Tests"
)

Save the dashboard configuration:

In [None]:
project.save()

# Change prompt - 2

Generate a new set of answers. (Modify the prompt to be more formal).

In [None]:
data = [
    ["Hi there, how do I reset my password?", "To reset your password, click on 'Forgot Password' on the login page and follow the instructions sent to your registered email.", "To reset your password, kindly click on the 'Forgot Password' link located on the login page and then follow the detailed instructions that will be sent to your registered email address. This should guide you through the process smoothly."],
    ["Can I change my email address on file?", "Yes, you can change your email address in the account settings under 'Profile Information'.", "Yes, you may update your email address by navigating to the account settings section and selecting 'Profile Information'. From there, you can easily make the necessary changes to your email address."],
    ["Where can I view my purchase history?", "You can view your purchase history by going to 'Order History' in your account settings.", "You can access your complete purchase history by logging into your account and visiting the 'Order History' section. This area contains detailed records of all your past transactions."],
    ["What payment methods are accepted?", "We accept credit cards, debit cards, and PayPal.", "We are pleased to accept a variety of payment methods, including credit cards, debit cards, and PayPal. These options provide flexibility and convenience for our valued customers."],
    ["How do I contact customer support?", "You can contact customer support through the 'Help' section on our website or by emailing support@example.com email.", "To get in touch with our customer support team, please visit the 'Help' section on our website. Alternatively, you can send an email to support@example.com, and we will be happy to assist you."],
    ["How do I update my billing information?", "You can update your billing information in the 'Billing' section of your account settings.", "Updating your billing information is straightforward. Simply log into your account, navigate to the 'Billing' section within your account settings, and make the necessary updates to your billing details."],
    ["What is the return policy?", "Our return policy allows returns within 30 days of purchase with a receipt.", "Our return policy permits returns within a 30-day period from the date of purchase, provided you have the receipt. This policy is designed to ensure your satisfaction with our products."],
    ["Can I track my order?", "Yes, you can track your order by logging into your account and going to 'Track Order'.", "Certainly! You can monitor the status of your order by logging into your account and selecting the 'Track Order' option. This will provide you with real-time updates on your shipment."],
    ["How do I apply a discount code?", "To apply a discount code, enter it at checkout in the 'Discount Code' field.", "To use a discount code, please enter it in the 'Discount Code' field during the checkout process. This will automatically apply the discount to your purchase."],
    ["Is there a mobile app available?", "Yes, our mobile app is available on both iOS and Android platforms.", "Yes, we offer a mobile application that is available for both iOS and Android platforms. You can download it from the respective app stores for a more convenient shopping experience."]
]

columns = ["question", "reference_response", "response"]

cur_data_2 = pd.DataFrame(data, columns=columns)


In [None]:
#To preview the dataset

#cur_data_2.head(10)

In [None]:
test_suite.run(reference_data=ref_data, current_data=cur_data_2)

In [None]:
ws.add_test_suite(project.id, test_suite, include_data=True)

In [None]:
#To view tests in Python

#test_suite

In [None]:
#To view the current dataset together with the computed descriptor columns

test_suite.datasets().current

In [None]:
#To find the responses with semantic similarity below 0.9

df = pd.DataFrame(test_suite.datasets().current)

filtered_df = df[df['Response Similarity'] < 0.9]

filtered_df.head()
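When debugging, it often helps to look at the worst responses first. The sketch below applies the same filter as above to plain records and sorts them by score. `lowest_similarity` is a hypothetical helper, and the scores are made up for illustration; real values come from the Test Suite run.

```python
def lowest_similarity(records, column="Response Similarity", threshold=0.9):
    """Return the records scoring below the threshold, worst first."""
    flagged = [record for record in records if record[column] < threshold]
    return sorted(flagged, key=lambda record: record[column])


# Made-up scores for illustration only.
records = [
    {"question": "Can I track my order?", "Response Similarity": 0.93},
    {"question": "What is the return policy?", "Response Similarity": 0.84},
    {"question": "How do I apply a discount code?", "Response Similarity": 0.88},
]

for row in lowest_similarity(records):
    print(row["question"], row["Response Similarity"])
```

With the DataFrame from the cell above, `df.to_dict("records")` produces records in this shape, and `filtered_df.sort_values("Response Similarity")` gives the same ordering directly in pandas.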

# Change prompt - 3

Generate a new set of answers. (Modify the prompt to add additional context when appropriate).

In [None]:
data = [
    ["Hi there, how do I reset my password?", "To reset your password, click on 'Forgot Password' on the login page and follow the instructions sent to your registered email.", "To reset your password, kindly click on the 'Forgot Password' link located on the login page. You will receive an email with detailed instructions. Please follow those steps carefully to successfully reset your password."],
    ["Can I change my email address on file?", "Yes, you can change your email address in the account settings under 'Profile Information'.", "Yes, you may update your email address by navigating to the account settings section and selecting 'Profile Information'. After updating, please ensure to save the changes to keep your account information current."],
    ["Where can I view my purchase history?", "You can view your purchase history by going to 'Order History' in your account settings.", "You can access your complete purchase history by logging into your account and visiting the 'Order History' section. This section contains detailed records of all your past transactions, including dates and amounts."],
    ["What payment methods are accepted?", "We accept credit cards, debit cards, and PayPal.", "We are pleased to accept a variety of payment methods, including credit cards, debit cards, and PayPal. These options are available to provide you with the utmost flexibility and convenience during the checkout process."],
    ["How do I contact customer support?", "You can contact customer support through the 'Help' section on our website or by emailing support@example.com email.", "To get in touch with our customer support team, please visit the 'Help' section on our website. Additionally, you can send an email to support@example.com, and we will respond to your query as soon as possible."],
    ["How do I update my billing information?", "You can update your billing information in the 'Billing' section of your account settings.", "Updating your billing information is straightforward. Simply log into your account, navigate to the 'Billing' section within your account settings, and make the necessary updates to your billing details. Please ensure to save the changes."],
    ["What is the return policy?", "Our return policy allows returns within 30 days of purchase with a receipt.", "Our return policy permits returns within a 30-day period from the date of purchase, provided you have the receipt. This policy is designed to ensure your satisfaction with our products, allowing you to shop with confidence."],
    ["Can I track my order?", "Yes, you can track your order by logging into your account and going to 'Track Order'.", "Certainly! You can monitor the status of your order by logging into your account and selecting the 'Track Order' option. This will provide you with real-time updates on your shipment, including the estimated delivery date."],
    ["How do I apply a discount code?", "To apply a discount code, enter it at checkout in the 'Discount Code' field.", "To use a discount code, please enter it in the 'Discount Code' field during the checkout process. This will automatically apply the discount to your purchase, providing you with the reduced price."],
    ["Is there a mobile app available?", "Yes, our mobile app is available on both iOS and Android platforms.", "Yes, we offer a mobile application that is available for both iOS and Android platforms. You can download it from the respective app stores for a more convenient shopping experience."]
]

columns = ["question", "reference_response", "response"]

cur_data_3 = pd.DataFrame(data, columns=columns)

In [None]:
#To preview the dataset

#cur_data_3.head(10)

In [None]:
test_suite.run(reference_data=ref_data, current_data=cur_data_3)

In [None]:
ws.add_test_suite(project.id, test_suite, include_data=True)

In [None]:
#To view tests in Python

#test_suite

In [None]:
#To find the responses with semantic similarity below 0.9

df = pd.DataFrame(test_suite.datasets().current)

filtered_df = df[df['Response Similarity'] < 0.9]

filtered_df.head()

# Change prompt - 4

Generate a new set of answers. (Modify the prompt to add examples).


In [None]:
data = [
    ["Hi there, how do I reset my password?", "To reset your password, click on 'Forgot Password' on the login page and follow the instructions sent to your registered email.", "To reset your password, please click on the 'Forgot Password' link on the login page. You will receive an email with specific instructions. For example, you might be asked to enter your registered email and create a new password."],
    ["Can I change my email address on file?", "Yes, you can change your email address in the account settings under 'Profile Information'.", "Yes, you can update your email address by going to the account settings and selecting 'Profile Information'. For instance, you could remove your old email and add a new one, ensuring to save the changes."],
    ["Where can I view my purchase history?", "You can view your purchase history by going to 'Order History' in your account settings.", "You can check your purchase history by logging into your account and navigating to the 'Order History' section. This section will display all your past orders, including the order dates and amounts paid."],
    ["What payment methods are accepted?", "We accept credit cards, debit cards, and PayPal.", "We accept various payment methods, such as credit cards, debit cards, and PayPal. This provides you with several options to choose from when making a purchase."],
    ["How do I contact customer support?", "You can contact customer support through the 'Help' section on our website or by emailing support@example.com email.", "To reach our customer support team, you can use the 'Help' section on our website. Additionally, you may email us at support@example.com, and we will address your concerns promptly."],
    ["How do I update my billing information?", "You can update your billing information in the 'Billing' section of your account settings.", "To update your billing information, log into your account, go to the 'Billing' section under account settings, and make the necessary updates. For example, you can change your credit card details or address."],
    ["What is the return policy?", "Our return policy allows returns within 30 days of purchase with a receipt.", "Our return policy allows you to return items within 30 days of purchase if you have the receipt. This ensures you can return products that do not meet your expectations or requirements."],
    ["Can I track my order?", "Yes, you can track your order by logging into your account and going to 'Track Order'.", "Yes, you can track your order by logging into your account and selecting the 'Track Order' option. This feature will provide you with updates on your order's status and estimated delivery date."],
    ["How do I apply a discount code?", "To apply a discount code, enter it at checkout in the 'Discount Code' field.", "To apply a discount code, enter it in the 'Discount Code' field during checkout. This will ensure the discount is applied to your total purchase amount, giving you the reduced price."],
    ["Is there a mobile app available?", "Yes, our mobile app is available on both iOS and Android platforms.", "Yes, our mobile app is available on both iOS and Android platforms. You can download it from the respective app stores for a more seamless and convenient shopping experience."]
]

columns = ["question", "reference_response", "response"]

cur_data_4 = pd.DataFrame(data, columns=columns)

In [None]:
#To preview the dataset

#cur_data_4.head(10)

In [None]:
test_suite.run(reference_data=ref_data, current_data=cur_data_4)

In [None]:
ws.add_test_suite(project.id, test_suite, include_data=True)

In [None]:
#To view tests in Python

#test_suite

# Change prompt - 5

Generate a new set of answers. (Modify the prompt to give shorter responses).

In [None]:
data = [
    ["Hi there, how do I reset my password?", "To reset your password, click on 'Forgot Password' on the login page and follow the instructions sent to your registered email.", "Click 'Forgot Password' on the login page and follow the email instructions."],
    ["Can I change my email address on file?", "Yes, you can change your email address in the account settings under 'Profile Information'.", "Go to 'Profile Information' in account settings to change your email."],
    ["Where can I view my purchase history?", "You can view your purchase history by going to 'Order History' in your account settings.", "Check 'Order History' in your account settings."],
    ["What payment methods are accepted?", "We accept credit cards, debit cards, and PayPal.", "We accept credit cards, debit cards, and PayPal."],
    ["How do I contact customer support?", "You can contact customer support through the 'Help' section on our website or by emailing support@example.com email.", "Visit 'Help' on our website or email support@example.com."],
    ["How do I update my billing information?", "You can update your billing information in the 'Billing' section of your account settings.", "Update billing info in the 'Billing' section of account settings."],
    ["What is the return policy?", "Our return policy allows returns within 30 days of purchase with a receipt.", "Return items within 30 days with a receipt."],
    ["Can I track my order?", "Yes, you can track your order by logging into your account and going to 'Track Order'.", "Track your order in the 'Track Order' section of your account."],
    ["How do I apply a discount code?", "To apply a discount code, enter it at checkout in the 'Discount Code' field.", "Enter your discount code at checkout."],
    ["Is there a mobile app available?", "Yes, our mobile app is available on both iOS and Android platforms.", "Our app is available on iOS and Android."]
]

columns = ["question", "reference_response", "response"]

cur_data_5 = pd.DataFrame(data, columns=columns)

In [None]:
#To preview the dataset

#cur_data_5.head(10)

In [None]:
test_suite.run(reference_data=ref_data, current_data=cur_data_5)

In [None]:
ws.add_test_suite(project.id, test_suite, include_data=True)

In [None]:
#To view tests in Python

#test_suite

In [None]:
#To find the responses with semantic similarity below 0.9

df = pd.DataFrame(test_suite.datasets().current)

filtered_df = df[df['Response Similarity'] < 0.9]

filtered_df.head()