# 📊 Evaluating LLM Outputs with Evidently Descriptors: A Cross-Provider Tutorial

## 📝 Overview

In this tutorial, we'll run **Evidently descriptors** against several Large Language Model (LLM) providers and collect the results as new data columns. Evidently is an open-source tool for monitoring, testing, and analyzing machine learning models, and its descriptors compute row-level text metrics such as length, sentiment, and toxicity.

We'll set up a workflow where we:
- 🔌 Connect to multiple LLM providers (such as OpenAI, Gemini, Vertex AI, Mistral, and Ollama)
- 🎛️ Run the same descriptors across different models
- 📊 Compare results side-by-side

Whether you're assessing model performance, tuning prompts, or building monitoring for production LLM systems, this notebook gives you a hands-on starting point.

## 📚 What You’ll Need
- Python 3.10+
- [Evidently](https://evidentlyai.com/) installed with the `llm` extra, for example `pip install "evidently[llm]"`
- Access to one or more LLM providers (API keys or local models like Ollama)
- Basic familiarity with Python and JSON

## 🚀 Let’s get started!

📊 First, let's prepare a simple dataset for negativity evaluation.

We'll use a tiny sample of two text reviews to test how different LLM providers assess negativity.


In [None]:
import pandas as pd


df = pd.DataFrame({"review": [
    "Your service is bad",
    "Your service is good",
]})

## 🔌 OpenAI Integration

In this section, we'll run the **Negativity** descriptor using **OpenAI's GPT-4o-mini** model.
To proceed, you'll need an **OpenAI API key**.

You can provide it in two ways:
- Set it as an environment variable: `OPENAI_API_KEY`
- Or pass it directly via `OpenAIOptions`
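
If you'd rather not hard-code the key in the notebook, the two options above can be combined. The following is a minimal sketch, not part of Evidently: the helper name `resolve_api_key` is hypothetical. It prefers an explicitly passed key, falls back to the environment variable, and fails early when neither is available.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None, env_var: str = "OPENAI_API_KEY") -> str:
    """Return the explicit key if given, otherwise the value of `env_var`.

    Hypothetical helper for this notebook; it is not an Evidently API.
    """
    key = explicit or os.environ.get(env_var)
    if not key:
        # Fail before any API call is made, with a message that names the variable.
        raise RuntimeError(f"No API key found: pass one explicitly or set {env_var}")
    return key
```

With this in place, `openai_api_key = resolve_api_key()` could stand in for the hard-coded string in the next cell.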


In [None]:
openai_api_key = "..."

In [None]:
from evidently.descriptors import NegativityLLMEval
from evidently import Dataset
from evidently.llm.options import OpenAIOptions

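# Judge the "review" column for negativity, using gpt-4o-mini as the evaluator.
open_ai_negativity = NegativityLLMEval("review", provider="openai", model="gpt-4o-mini")
# The options object carries the API key for the chosen provider.
dataset = Dataset.from_pandas(
    df,
    descriptors=[open_ai_negativity],
    options=OpenAIOptions(api_key=openai_api_key),
)
dataset.as_dataframe()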

## 🔌 Gemini Integration

Now, let's switch to **Gemini 2.0 Flash**.
As with OpenAI, you need to provide an API key, this time through `GeminiOptions`.


In [None]:
from evidently.llm.options import GeminiOptions

gemini_api_key = "..."
gemini_ai_negativity = NegativityLLMEval("review", provider="gemini", model="gemini-2.0-flash")
dataset = Dataset.from_pandas(df, descriptors=[gemini_ai_negativity], options=GeminiOptions(api_key=gemini_api_key))
dataset.as_dataframe()

## 🔌 Vertex AI Integration

We can also call Gemini models through Vertex AI.
For that, you'll need to pass your credentials JSON, serialized to a string, as `api_key`.
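
The credentials are elided in the cell below. As a minimal sketch, assuming they live in a JSON key file on disk, a small helper can read the file and return the serialized string. The helper name `load_credentials_json` and the file path in the usage note are hypothetical.

```python
import json
from pathlib import Path


def load_credentials_json(path: str) -> str:
    """Read a JSON credentials file and return its content as a JSON string.

    Hypothetical helper for this notebook; it is not an Evidently API.
    """
    with Path(path).open(encoding="utf-8") as fh:
        credentials = json.load(fh)  # raises json.JSONDecodeError on malformed files
    if not isinstance(credentials, dict):
        raise ValueError("Expected a JSON object at the top level of the credentials file")
    return json.dumps(credentials)
```

For example, `vertex_credentials_json = load_credentials_json("path/to/key.json")` would produce the string passed to `VertexAIOptions(api_key=...)`.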


In [None]:
import json
from evidently.llm.options import VertexAIOptions

vertex_credentials = {...}  # placeholder: replace with your credentials dict before running
vertex_credentials_json = json.dumps(vertex_credentials)
vertex_ai_negativity = NegativityLLMEval("review", provider="vertex_ai", model="gemini-2.0-flash")
dataset = Dataset.from_pandas(df, descriptors=[vertex_ai_negativity], options=VertexAIOptions(api_key=vertex_credentials_json))
dataset.as_dataframe()

## 🔌 Mistral Integration

Now, let's switch to **Mistral**.
As with the providers above, you need to provide an API key through the matching options class.


In [None]:
from evidently.llm.options import MistralOptions

mistral_api_key = "..."
mistral_negativity = NegativityLLMEval("review", provider="mistral", model="mistral-small-2503")
dataset = Dataset.from_pandas(df, descriptors=[mistral_negativity], options=MistralOptions(api_key=mistral_api_key))
dataset.as_dataframe()

## 🌐 Other Providers

Evidently supports a variety of other providers out of the box.
You can check which options classes are available by inspecting:

`evidently.llm.options.__all__`


In [None]:
from evidently.llm import options

options.__all__

Even more providers are supported via the **legacy module**.
The cell below prints the name of every options class defined there:


In [None]:
from evidently.legacy.utils.llm import wrapper

for name in wrapper.__dict__:
    if name.endswith("Options"):
        print(name)

Additionally, because Evidently relies on **LiteLLM** under the hood for API integration,
you can access any model or provider supported by LiteLLM, even if Evidently doesn't have a dedicated options class.

Here's how to use the generic `LLMOptions` for this:


In [None]:
from evidently.llm.options import LLMOptions

litellm_openai_negativity = NegativityLLMEval("review", provider="litellm", model="openai/gpt-4o-mini")
dataset = Dataset.from_pandas(df, descriptors=[litellm_openai_negativity], options=LLMOptions(api_key=openai_api_key))
dataset.as_dataframe()
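
The `openai/gpt-4o-mini` string in the cell above follows LiteLLM's `provider/model` naming. As a small sketch, the two hypothetical helpers below (not part of Evidently or LiteLLM) build and split such identifiers, which can help when looping over several models.

```python
from typing import Tuple


def litellm_model_id(provider: str, model: str) -> str:
    """Build a "provider/model" identifier from its two parts."""
    provider, model = provider.strip(), model.strip()
    if not provider or not model:
        raise ValueError("Both provider and model must be non-empty")
    return f"{provider}/{model}"


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split "provider/model" on the first slash only.

    Model names may themselves contain slashes, so everything after the
    first slash is treated as the model name.
    """
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"Expected 'provider/model', got {model_id!r}")
    return provider, model
```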

## 🖥️ Ollama: Running LLMs Locally

You can also run models like **Llama 3.2** locally using **Ollama**.
First, install it from [https://ollama.com/download](https://ollama.com/download).

Then, pull and serve the model:


In [None]:
! ollama pull llama3.2

In [None]:
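# Note: `ollama serve` runs in the foreground and blocks this cell until interrupted;
# running it in a separate terminal keeps the notebook free.
! ollama serve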

Check that the Ollama API is live:


In [None]:
! curl 127.0.0.1:11434
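
The same check can be done from Python, which is handy if you want the notebook to stop early when the server is down. This is a minimal sketch using only the standard library. The helper name `is_server_up` is hypothetical, and the default URL assumes Ollama's default port as used in this tutorial.

```python
import urllib.error
import urllib.request


def is_server_up(base_url: str = "http://127.0.0.1:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on `base_url` answers with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling `is_server_up()` before running the descriptor gives a quick yes/no answer instead of a failure in the middle of the evaluation.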

Now, let's run the negativity descriptor using the locally served Llama model.

In [None]:
from evidently.llm.options import OllamaOptions

ollama_negativity = NegativityLLMEval("review", provider="ollama", model="llama3.2")
dataset = Dataset.from_pandas(df, descriptors=[ollama_negativity], options=OllamaOptions(api_url="http://localhost:11434"))
dataset.as_dataframe()

## ⚙️ Customizing LLM API Calls

Evidently allows customizing API call parameters by subclassing the corresponding `Options` class.

For example, to set a custom temperature for Ollama:


In [None]:
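from typing import Any, Dict


class MyOllamaOptions(OllamaOptions):
    # Default endpoint, so callers only pass the parameters they want to change.
    api_url: str = "http://localhost:11434"
    temperature: float = 0.7

    def get_additional_kwargs(self) -> Dict[str, Any]:
        # Extra keyword arguments passed along with the LLM API call.
        return {"temperature": self.temperature}


# Reuse the Ollama descriptor defined earlier, now with a lower temperature.
dataset = Dataset.from_pandas(df, descriptors=[ollama_negativity], options=MyOllamaOptions(temperature=0.3))
dataset.as_dataframe()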
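
To compare providers side by side, one option is to keep each provider's `dataset.as_dataframe()` result in its own variable (the cells above overwrite `dataset` each time) and join the frames with pandas. The sketch below uses a hypothetical helper, `compare_side_by_side`, and stand-in frames with placeholder values rather than real model output. The `Negativity` column name is an assumption, so check the columns your own runs produce.

```python
from typing import Dict

import pandas as pd


def compare_side_by_side(results: Dict[str, pd.DataFrame], text_column: str = "review") -> pd.DataFrame:
    """Place each provider's descriptor columns next to the shared text column.

    Assumes every frame was built from the same input rows in the same order.
    """
    first = next(iter(results.values()))
    parts = [first[[text_column]]]
    for provider, frame in results.items():
        # Prefix descriptor columns with the provider name to avoid clashes.
        parts.append(frame.drop(columns=[text_column]).add_prefix(f"{provider}_"))
    return pd.concat(parts, axis=1)


# Stand-in frames: placeholder values only, not real model output.
reviews = ["Your service is bad", "Your service is good"]
openai_df = pd.DataFrame({"review": reviews, "Negativity": ["<openai result>"] * 2})
ollama_df = pd.DataFrame({"review": reviews, "Negativity": ["<ollama result>"] * 2})

comparison = compare_side_by_side({"openai": openai_df, "ollama": ollama_df})
```

In a real run, the stand-in frames would be replaced by the saved `as_dataframe()` outputs from the provider sections above.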