# Fiddler LLM Evaluation Quick Start Guide

Fiddler is the pioneer in enterprise AI Observability, offering a unified platform that lets all model stakeholders monitor model performance and investigate the source of model degradation. The platform supports both traditional ML models and Generative AI applications, and it can also help teams evaluate candidate LLMs before they build an application. This guide explains how to compare outputs from different LLMs, such as GPT-3.5 and Claude, to help determine the most suitable model for your application.

---

You can start using Fiddler ***in minutes*** by following these 6 quick steps:

1. Connect to Fiddler
2. Create a Fiddler Project
3. Load Dataset Samples
4. Enable Fiddler LLM Enrichments
5. Provide Information About the LLM Project
6. Publish Data for Comparison

Get insights!

## 0. Imports

In [None]:
%pip install -q fiddler-client

import numpy as np
import pandas as pd
import fiddler as fdl

print(f"Running Fiddler Python client version {fdl.__version__}")

## 1. Connect to Fiddler

Before you can add information about your LLM datasets to Fiddler, you'll need to connect using the Fiddler Python client.


---

**We need a couple of pieces of information to get started.**
1. The URL you're using to connect to Fiddler
2. Your authorization token

Your authorization token can be found by navigating to the **Credentials** tab on the **Settings** page of your Fiddler environment.

In [None]:
URL = ''  # Make sure to include the full URL (including https:// e.g. 'https://your_company_name.fiddler.ai').
TOKEN = ''
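If you would rather not paste credentials into the notebook, one option is to read them from environment variables and check the URL shape before connecting. The sketch below is only an illustration: the variable names `FIDDLER_URL` and `FIDDLER_TOKEN` and the helper `load_fiddler_credentials` are hypothetical and are not defined by the Fiddler client.

```python
import os
from urllib.parse import urlparse


def load_fiddler_credentials(env=os.environ):
    """Return (url, token) read from an environment-style mapping.

    FIDDLER_URL and FIDDLER_TOKEN are hypothetical names chosen for this
    sketch; they are not part of the Fiddler client.
    """
    url = env.get("FIDDLER_URL", "").strip().rstrip("/")
    token = env.get("FIDDLER_TOKEN", "").strip()

    # The guide asks for the full URL, including the https:// prefix.
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("FIDDLER_URL must be a full URL starting with https://")
    if not token:
        raise ValueError("FIDDLER_TOKEN is empty")
    return url, token


# Demonstration with an explicit mapping instead of the real environment.
example_env = {
    "FIDDLER_URL": "https://your_company_name.fiddler.ai/",
    "FIDDLER_TOKEN": "example-token",
}
example_url, example_token = load_fiddler_credentials(example_env)
print(example_url)
```

In the notebook you could then assign `URL, TOKEN = load_fiddler_credentials()` once both variables are set in your shell.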

Constants for this example notebook. Change them as needed to create your own versions.

In [None]:
PROJECT_NAME = 'quickstart_examples'  # If the project already exists, the notebook will create the model under the existing project.
MODEL_NAME = 'fiddler_llm_evaluation'

GPT_NAME = 'gpt3.5_dataset'
CLAUDE_NAME = 'claude_dataset'

# Sample data hosted on GitHub
PATH_TO_SAMPLE_GPT_CSV = 'https://media.githubusercontent.com/media/fiddler-labs/fiddler-examples/main/quickstart/data/v3/chatbot_production_data.csv'
PATH_TO_SAMPLE_CLAUDE_CSV 

Now just run the following code block to connect to the Fiddler API!

In [None]:
fdl.init(url=URL, token=TOKEN)

## 2. Create a Fiddler Project

Once you connect, you can create a new project by passing a unique project name as the `name` parameter to either `Project.create()` or `Project.get_or_create()`. If the project already exists, `get_or_create()` returns the existing project instead, which is helpful when running this notebook multiple times or when using an existing project to house Fiddler examples.

*Note: get_or_create() requires Fiddler Python client 3.7+.*

In [None]:
project = fdl.Project.get_or_create(name=PROJECT_NAME)

# get_or_create() returns a project with an ID whether it was new or already existed
print(f'Using project with id = {project.id} and name = {project.name}')


## 3. Load Dataset Samples

In [None]:
gpt_df = pd.read_csv(PATH_TO_SAMPLE_GPT_CSV)
claude_df = pd.read_csv(PATH_TO_SAMPLE_CLAUDE_CSV)

gpt_df
claude_df
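Because both datasets are published to the same model later in this guide, it can help to confirm that they share the same columns before going further. The helper below is a minimal sketch, not part of the Fiddler client; `compare_columns` is a hypothetical name, and the column lists are illustrative stand-ins for `gpt_df.columns` and `claude_df.columns`.

```python
def compare_columns(left_columns, right_columns):
    """Return the column names unique to each side, sorted for stable output."""
    left, right = set(left_columns), set(right_columns)
    return {
        "only_in_left": sorted(left - right),
        "only_in_right": sorted(right - left),
    }


# Illustrative column lists; in the notebook you would pass
# gpt_df.columns and claude_df.columns instead.
gpt_columns = ["question", "response", "source_docs", "model"]
claude_columns = ["question", "response", "model"]

print(compare_columns(gpt_columns, claude_columns))
# → {'only_in_left': ['source_docs'], 'only_in_right': []}
```

Two empty lists mean the datasets have matching columns.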

## 4. Enable Fiddler LLM Enrichments

After picking a sample of our chatbot's prompts and responses, we can ask Fiddler to run a series of enrichment services that score them for a variety of insights. These services can detect AI safety issues such as PII leakage, hallucinations, and toxicity. We can also opt in to enrichments such as embedding generation, which lets us track prompt and response outliers and drift. A full description of these enrichments is available in the [enrichments documentation](https://docs.fiddler.ai/platform-guide/llm-monitoring/enrichments-private-preview).

---
Define a list of Fiddler AI backend enrichments for various aspects of the model's input and output, including text embeddings, sentiment analysis, and PII detection. Each enrichment is represented by an appropriate Fiddler API enrichment object, such as TextEmbedding or Enrichment, with associated configuration.

In [None]:
fiddler_backend_enrichments = [
    # Generate text embeddings for the prompt (question) column
    fdl.TextEmbedding(
        name='Prompt TextEmbedding',
        source_column='question',
        column='Enrichment Prompt Embedding',
        n_tags=10,
    ),
    # Generate text embeddings for the response column
    fdl.TextEmbedding(
        name='Response TextEmbedding',
        source_column='response',
        column='Enrichment Response Embedding',
        n_tags=10,
    ),
    # Generate text embeddings for the source documents (rag documents) column
    fdl.TextEmbedding(
        name='Source Docs TextEmbedding',
        source_column='source_docs',
        column='Enrichment Source Docs Embedding',
        n_tags=10,
    ),
    # Enrichment to assess response faithfulness using source docs and the response
    fdl.Enrichment(
        name='Faithfulness',
        enrichment='ftl_response_faithfulness',
        columns=['source_docs', 'response'],
        config={'context_field': 'source_docs', 'response_field': 'response'},
    ),
    # Perform sentiment analysis on the question and response columns
    fdl.Enrichment(
        name='Enrichment QA Sentiment',
        enrichment='sentiment',
        columns=['question', 'response'],
    ),
    # Detect personally identifiable information (PII) in the question column
    fdl.Enrichment(
        name='Rag PII',
        enrichment='pii',
        columns=['question'],
        allow_list=['fiddler'],
    ),
]
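Each enrichment above reads from one or more source columns, so a typo in a column name would only surface later. The sketch below checks those names against a dataset's columns using plain Python data; it does not call the Fiddler client. The helper `missing_source_columns` is hypothetical, the mapping is copied by hand from the list above, and the dataset columns are illustrative.

```python
def missing_source_columns(enrichment_columns, dataset_columns):
    """Map each enrichment name to the source columns absent from the dataset."""
    available = set(dataset_columns)
    missing = {}
    for name, columns in enrichment_columns.items():
        absent = [col for col in columns if col not in available]
        if absent:
            missing[name] = absent
    return missing


# Source columns referenced by the enrichments defined above (copied by hand).
enrichment_columns = {
    "Prompt TextEmbedding": ["question"],
    "Response TextEmbedding": ["response"],
    "Source Docs TextEmbedding": ["source_docs"],
    "Faithfulness": ["source_docs", "response"],
    "Enrichment QA Sentiment": ["question", "response"],
    "Rag PII": ["question"],
}

# Illustrative dataset columns; in the notebook, pass list(gpt_df.columns).
dataset_columns = ["question", "response", "session_id"]

print(missing_source_columns(enrichment_columns, dataset_columns))
# → {'Source Docs TextEmbedding': ['source_docs'], 'Faithfulness': ['source_docs']}
```

An empty dictionary means every enrichment can find its source columns.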


## 5.  Provide Information About the LLM Project

Now it's time to onboard information about our LLM dataset to Fiddler.  We do this by defining a `ModelSpec` object.


---


The `ModelSpec` object will contain some **information about how your LLM datasets are structured**.
  
*Just include:*
1. The **input/output** columns.  These are just the raw inputs and outputs tracked in our LLM dataset.
2. Any **metadata** columns. Make sure to include the 'model' column from the sample data.
3. The **custom features** which contain the configuration of the enrichments we opted for.

We'll also want to set the **task** to LLM, since these datasets are generated from LLMs.


In [None]:
model_spec = fdl.ModelSpec(
    inputs=['question', 'response', 'source_docs'],
    metadata=['session_id', 'comment', 'timestamp', 'feedback', 'model'],
    custom_features=fiddler_backend_enrichments,
)

model_task = fdl.ModelTask.LLM

Set this up in Fiddler by configuring a Model object to represent your LLM evaluation project.

In [None]:
llm_project = fdl.Model.from_data(
    source=gpt_df,
    name=MODEL_NAME,
    project_id=project.id,
    spec=model_spec,
    task=model_task,
    max_cardinality=5,
)

Now call the create method to create it in Fiddler.

In [None]:
llm_project.create()
print(
    f'New model created with id = {llm_project.id} and name = {llm_project.name}'
)

## 6. Publish Data for Comparison

Information about the LLM datasets is now onboarded to Fiddler. It's time to add the data itself to the pre-production environment for comparison!

Let's publish the sample data (prompts and responses) from our GPT and Claude datasets, starting with GPT.

In [None]:
publish_job_gpt = llm_project.publish(
    source=gpt_df,
    environment=fdl.EnvType.PRE_PRODUCTION,
    dataset_name=GPT_NAME,
)

# Print the Job ID for tracking
print(f'Initiated pre-production environment data upload with Job ID = {publish_job_gpt.id}')

Finally, publish the second dataset for comparison with the first.

In [None]:
publish_job_claude = llm_project.publish(
    source=claude_df,
    environment=fdl.EnvType.PRE_PRODUCTION,
    dataset_name=CLAUDE_NAME,
)

# Print the Job ID for tracking
print(f'Initiated pre-production environment data upload with Job ID = {publish_job_claude.id}')
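Each publish call above returns a job whose ID is printed for tracking, and the upload continues in the background. If you want the notebook to block until an upload finishes, a generic polling loop is one way to do it. The sketch below is deliberately independent of the Fiddler client: `wait_for_job`, the `get_status` callable, and the state names `PENDING`, `SUCCESS`, and `FAILURE` are all assumptions for illustration, so check the client documentation for the real job API before adapting it.

```python
import time


def wait_for_job(get_status, poll_seconds=5.0, timeout_seconds=600.0,
                 done_states=("SUCCESS", "FAILURE")):
    """Poll get_status() until it returns a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job still in state {status!r} after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)


# Stand-in for a real status lookup: reports PENDING twice, then SUCCESS.
fake_states = iter(["PENDING", "PENDING", "SUCCESS"])
print(wait_for_job(lambda: next(fake_states), poll_seconds=0))
# → SUCCESS
```

In practice `get_status` would wrap whatever call your client version provides for looking up a job by its ID.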


# Get insights

**You're all done!**

You can now head to your Fiddler environment and start getting enhanced insights into the multiple datasets you are evaluating.  

**What's Next?**

Try the [ML Monitoring - Quick Start Guide](https://docs.fiddler.ai/quickstart-notebooks/quick-start)

---


**Questions?**  
  
Check out [our docs](https://docs.fiddler.ai/) for a more detailed explanation of what Fiddler has to offer.

Join our [community Slack](http://fiddler-community.slack.com/) to ask any questions!

If you're still looking for answers, fill out a ticket on [our support page](https://fiddlerlabs.zendesk.com/) and we'll get back to you shortly.