# Fiddler LLM Evaluation Quick Start Guide

Fiddler is the pioneer in enterprise AI Observability, offering a unified platform that enables all model stakeholders to monitor model performance and investigate the true source of model degradation. Fiddler's AI Observability platform supports both traditional ML models and Generative AI applications. This guide walks you through onboarding an LLM chatbot application built on a RAG architecture in order to compare two sets of data produced by two different APIs, OpenAI and Anthropic.

---

You can start using Fiddler ***in minutes*** by following these 6 quick steps:

1. Connect to Fiddler
2. Create a Fiddler Project
3. Load a Data Sample
4. Opt-in to Specific Fiddler LLM Enrichments
5. Add Information About the LLM Application
6. Publish Events for Comparison

Get insights!

## 0. Imports

In [None]:
%pip install -q fiddler-client

import numpy as np
import pandas as pd
import fiddler as fdl

print(f"Running Fiddler Python client version {fdl.__version__}")

## 1. Connect to Fiddler

Before you can add information about your LLM application to Fiddler, you'll need to connect using the Fiddler Python client.


---


**We need a couple of pieces of information to get started.**
1. The URL you're using to connect to Fiddler
2. Your authorization token

Your authorization token can be found by navigating to the **Credentials** tab on the **Settings** page of your Fiddler environment.

In [None]:
URL = ''  # Use the full URL, including https:// (e.g. 'https://your_company_name.fiddler.ai').
TOKEN = ''
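
If you would rather not paste credentials into the notebook, one option is to read them from environment variables. The sketch below is an illustration only: the variable names `FIDDLER_URL` and `FIDDLER_TOKEN` and the helper `load_credentials` are assumptions made for this example, not part of the Fiddler client.

```python
import os


def load_credentials(env=os.environ):
    """Read the Fiddler URL and token from a mapping of environment variables.

    FIDDLER_URL and FIDDLER_TOKEN are hypothetical names chosen for this sketch.
    """
    url = env.get('FIDDLER_URL', '')
    token = env.get('FIDDLER_TOKEN', '')
    # The URL must be complete, including the https:// scheme.
    if not url.startswith('https://'):
        raise ValueError('FIDDLER_URL must be a full URL starting with https://')
    if not token:
        raise ValueError('FIDDLER_TOKEN is not set')
    # Drop any trailing slash so the URL has a single canonical form.
    return url.rstrip('/'), token


# Usage (uncomment once both variables are set in your environment):
# URL, TOKEN = load_credentials()
```

Failing early with a clear message tends to be easier to debug than a connection error raised later by the client.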

Constants for this example notebook; change them as needed to create your own versions.

In [None]:
PROJECT_NAME = 'quickstart_examples'  # If the project already exists, the notebook will create the model under the existing project.
MODEL_NAME = 'fiddler_llm_app_evaluation'

OPENAI_NAME = 'openai_dataset'
ANTHROPIC_NAME = 'anthropic_dataset'

# Sample data hosted on GitHub
PATH_TO_CSV = 'https://media.githubusercontent.com/media/fiddler-labs/fiddler-examples/main/quickstart/data/v3/chatbot_production_data.csv'

Now just run the following code block to connect to the Fiddler API!

In [None]:
fdl.init(url=URL, token=TOKEN)

## 2. Create a Fiddler Project

Once you connect, you can create a new project by specifying a unique project name in the `Project` constructor and calling the `create()` method. If a project with that name already exists, `create()` raises `fdl.Conflict`, and the code below loads the existing project instead.

In [None]:
try:
    # Create project
    project = fdl.Project(name=PROJECT_NAME).create()
    print(f'New project created with id = {project.id} and name = {project.name}')
except fdl.Conflict:
    # Get project by name
    project = fdl.Project.from_name(name=PROJECT_NAME)
    print(f'Loaded existing project with id = {project.id} and name = {project.name}')
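
The later steps reference two DataFrames, `df1` and `df2`, that hold the OpenAI and Anthropic samples (step 3, Load a Data Sample), along with a `model` metadata column. A minimal sketch of how such frames might be prepared is shown here. The helper `tag_with_model`, the label values, and the stand-in rows are all assumptions for illustration; in the notebook the rows would typically come from `pd.read_csv(PATH_TO_CSV)`.

```python
import pandas as pd


def tag_with_model(df: pd.DataFrame, model_label: str) -> pd.DataFrame:
    """Return a copy of `df` with a constant 'model' column added."""
    tagged = df.copy()  # leave the caller's DataFrame untouched
    tagged['model'] = model_label
    return tagged


# Stand-in rows for illustration only; real data would be loaded from a CSV.
openai_rows = pd.DataFrame(
    {'question': ['What is drift?'], 'response': ['Drift is a change in data over time.']}
)
anthropic_rows = pd.DataFrame(
    {'question': ['What is drift?'], 'response': ['Drift means the data distribution shifted.']}
)

df1 = tag_with_model(openai_rows, 'openai')        # hypothetical label
df2 = tag_with_model(anthropic_rows, 'anthropic')  # hypothetical label
```

Keeping the label in its own column lets both samples share one model schema while remaining distinguishable after they are published.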

## 4. Opt-in to Specific Fiddler LLM Enrichments

After picking a sample of our chatbot's prompts and responses, we can request that Fiddler run a series of enrichment services that "score" our prompts and responses for a variety of insights. These enrichment services can detect AI safety issues such as PII leakage, hallucinations, toxicity, and more. We can also opt in to enrichment services like embedding generation, which allows us to track prompt and response outliers and drift. A full description of these enrichments can be found [here](https://docs.fiddler.ai/platform-guide/llm-monitoring/enrichments-private-preview).

---
Define a list of Fiddler AI backend enrichments for various aspects of the model's input and output, including text embeddings, sentiment analysis, and PII detection. Each enrichment is represented by an appropriate Fiddler API enrichment object, such as `TextEmbedding` or `Enrichment`, with its associated configuration.

In [None]:
fiddler_backend_enrichments = [
    # Generate text embeddings for the prompt (question) column
    fdl.TextEmbedding(
        name='Prompt TextEmbedding',
        source_column='question',
        column='Enrichment Prompt Embedding',
        n_tags=10,
    ),
    # Generate text embeddings for the response column
    fdl.TextEmbedding(
        name='Response TextEmbedding',
        source_column='response',
        column='Enrichment Response Embedding',
        n_tags=10,
    ),
    # Generate text embeddings for the source documents (rag documents) column
    fdl.TextEmbedding(
        name='Source Docs TextEmbedding',
        source_column='source_docs',
        column='Enrichment Source Docs Embedding',
        n_tags=10,
    ),
    # Enrichment to assess response faithfulness using source docs and the response
    fdl.Enrichment(
        name='Faithfulness',
        enrichment='ftl_response_faithfulness',
        columns=['source_docs', 'response'],
        config={'context_field': 'source_docs', 'response_field': 'response'},
    ),
    # Perform sentiment analysis on the question and response columns
    fdl.Enrichment(
        name='Enrichment QA Sentiment',
        enrichment='sentiment',
        columns=['question', 'response'],
    ),
    # Detect personally identifiable information (PII) in the question column
    fdl.Enrichment(
        name='Rag PII', enrichment='pii', columns=['question'], allow_list=['fiddler']
    ),
]


## 5. Add Information About the LLM Application

Now it's time to onboard information about our LLM application to Fiddler.  We do this by defining a `ModelSpec` object.


---


The `ModelSpec` object will contain some **information about how your LLM application operates**.
  
*Just include:*
1. The **input/output** columns. For an LLM application, these are just the raw inputs and outputs of our LLM application.
2. Any **metadata** columns. Make sure to include the 'model' column we generated earlier. 
3. The **custom features** which contain the configuration of the enrichments we opted for.

We'll also want the **task** your model or LLM application is performing (LLM, regression, binary classification, not set, etc.)


In [None]:
model_spec = fdl.ModelSpec(
    inputs=['question', 'response', 'source_docs'],
    metadata=['session_id', 'comment', 'timestamp', 'feedback', 'model'],
    custom_features=fiddler_backend_enrichments,
)

model_task = fdl.ModelTask.LLM
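
Every column named in the spec needs to exist in the sample data used to onboard the model. The following is a small sanity check that could be run first. The helper `missing_columns` is hypothetical, and the sample header is invented to show a failing case; in the notebook you would pass `list(df1.columns)` instead.

```python
def missing_columns(required, available):
    """Return the names in `required` that are absent from `available`, preserving order."""
    available_set = set(available)
    return [name for name in required if name not in available_set]


# Columns named in the ModelSpec above (inputs followed by metadata).
spec_columns = [
    'question', 'response', 'source_docs',
    'session_id', 'comment', 'timestamp', 'feedback', 'model',
]

# Hypothetical sample header that lacks the 'model' column;
# in the notebook this would be list(df1.columns).
sample_columns = [
    'question', 'response', 'source_docs',
    'session_id', 'comment', 'timestamp', 'feedback',
]

missing = missing_columns(spec_columns, sample_columns)
if missing:
    print(f'Columns missing from the sample: {missing}')
```

An empty result means the sample covers every column the spec refers to.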

Then just publish all of this to Fiddler by configuring a Model object to represent your LLM application in Fiddler.

In [None]:
llm_application = fdl.Model.from_data(
    source=df1.head(100),
    name=MODEL_NAME,
    project_id=project.id,
    spec=model_spec,
    task=model_task,
    max_cardinality=5,
)

Now call the create method to publish it to Fiddler!

In [None]:
llm_application.create()
print(
    f'New model created with id = {llm_application.id} and name = {llm_application.name}'
)

## 6. Publish Events for Comparison

Information about your LLM application is now onboarded to Fiddler. It's time to start publishing some data to the pre-production environment for comparison!


---


Each record sent to Fiddler is called **an event**.  Events simply contain the inputs and outputs of a predictive model or LLM application.
  
Let's publish some sample events (prompts and responses) from our sample OpenAI and Anthropic datasets, starting with the OpenAI data.

In [None]:
publish_job_df1 = llm_application.publish(
    source=df1,
    environment=fdl.EnvType.PRE_PRODUCTION,
    dataset_name=OPENAI_NAME,
)

# Print the Job ID for tracking
print(f'Initiated pre-production environment data upload with Job ID = {publish_job_df1.id}')

Finally, publish the second dataset for comparison with the first.

In [None]:
publish_job_df2 = llm_application.publish(
    source=df2,
    environment=fdl.EnvType.PRE_PRODUCTION,
    dataset_name=ANTHROPIC_NAME,
)

# Print the Job ID for tracking
print(f'Initiated pre-production environment data upload with Job ID = {publish_job_df2.id}')


# Get insights

**You're all done!**

You can now head to your Fiddler environment and start getting enhanced observability into your LLM application's performance.

**What's Next?**

Try the [ML Monitoring - Quick Start Guide](https://docs.fiddler.ai/quickstart-notebooks/quick-start)

---


**Questions?**  
  
Check out [our docs](https://docs.fiddler.ai/) for a more detailed explanation of what Fiddler has to offer.

Join our [community Slack](http://fiddler-community.slack.com/) to ask any questions!

If you're still looking for answers, fill out a ticket on [our support page](https://fiddlerlabs.zendesk.com/) and we'll get back to you shortly.