# OpenInference Callback Handler

[OpenInference](https://github.com/Arize-ai/open-inference-spec) is an open standard for capturing and storing AI model inferences. It enables production LLM application servers to integrate with LLM observability solutions such as [Arize](https://arize.com/) and [Phoenix](https://github.com/Arize-ai/phoenix).

The `OpenInferenceCallbackHandler` saves data from retrieval-augmented generation (RAG) systems for downstream analysis and debugging. In particular, it saves the following data in columnar format:

- query IDs
- query text
- query embeddings
- scores (e.g., cosine similarity)
- retrieved document IDs

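The cosine similarity mentioned in the list above can be illustrated in a few lines. This is a minimal pure-Python sketch with made-up two-dimensional embeddings and hypothetical document IDs; it does not reproduce how the callback handler or LlamaIndex computes scores.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Returns the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length.")
    dot_product = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("Cosine similarity is undefined for zero vectors.")
    return dot_product / (norm_a * norm_b)


# Made-up embeddings, for illustration only.
query_embedding = [1.0, 0.0]
document_embeddings: Dict[str, List[float]] = {
    "doc-a": [1.0, 0.0],  # same direction as the query
    "doc-b": [0.0, 1.0],  # orthogonal to the query
    "doc-c": [1.0, 1.0],  # 45 degrees from the query
}

# Score every document against the query, then rank from most to least similar.
scores = {
    document_id: cosine_similarity(query_embedding, embedding)
    for document_id, embedding in document_embeddings.items()
}
ranked_document_ids = sorted(scores, key=scores.get, reverse=True)
```
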
This tutorial demonstrates the callback handler's use for both in-notebook experimentation and lightweight production logging.

⚠️ The `OpenInferenceCallbackHandler` is in beta and its APIs are subject to change.

ℹ️ The callback maintainers are actively working to support additional RAG use cases. If your particular query engine or use case is not supported, email the maintainers at phoenix-devs@arize.com.

## Install Dependencies and Import Libraries

Install notebook dependencies.

In [None]:
!pip install html2text llama-index pandas tqdm

Import libraries.


In [None]:
import hashlib
import json
import os
import textwrap

from llama_index import (
    SimpleWebPageReader,
    ServiceContext,
    VectorStoreIndex,
)
from llama_index.callbacks import CallbackManager, OpenInferenceCallbackHandler
from llama_index.callbacks.open_inference_callback import DataBuffer, QueryData
from llama_index.node_parser import SimpleNodeParser
import pandas as pd
from tqdm import tqdm

## Load and Parse Documents

Load documents from Paul Graham's essay "What I Worked On".

In [None]:
documents = SimpleWebPageReader().load_data(
    [
        "http://raw.githubusercontent.com/jerryjliu/llama_index/main/examples/paul_graham_essay/data/paul_graham_essay.txt"
    ]
)
print(documents[0].text)

Parse the document into nodes. Display the first node's text.

In [None]:
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)
print(nodes[0].text)


## Access RAG Data as a Pandas Dataframe

When experimenting with chatbots and LLM applications in a notebook, it's often useful to run your chatbot against a small collection of user queries, then collect and analyze the resulting data for iterative improvement. The `OpenInferenceCallbackHandler` stores RAG data in columnar format and provides convenient access to the data as a pandas dataframe.

Instantiate the OpenInference callback handler and attach it to the service context.

In [None]:
callback_handler = OpenInferenceCallbackHandler()
callback_manager = CallbackManager([callback_handler])
service_context = ServiceContext.from_defaults(callback_manager=callback_manager)


Build the index and instantiate the query engine.

In [None]:
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()

Run your query engine across a collection of queries.

In [None]:
max_characters_per_line = 80
queries = [
    "What did Paul Graham do growing up?",
    "When and how did Paul Graham's mother die?",
    "What, in Paul Graham's opinion, is the most distinctive thing about YC?",
    "When and how did Paul Graham meet Jessica Livingston?",
    "What is Bel, and when and where was it written?",
]
for query in queries:
    response = query_engine.query(query)
    print("Query")
    print("=====")
    print(textwrap.fill(query, max_characters_per_line))
    print()
    print("Response")
    print("========")
    print(textwrap.fill(str(response), max_characters_per_line))
    print()


The data from your query engine runs can be accessed as a pandas dataframe for analysis and iterative improvement.

In [None]:
callback_handler.query_dataframe

The dataframe column names conform to the OpenInference spec, which specifies the category, data type, and intent of each column.

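As a rough illustration of how such column names can be read programmatically, the sketch below splits a name into its parts. It assumes names follow a `:category.data_type:name` shape, which is only inferred from the `:id.str:` column that the reference logger later in this notebook reads. The helper `parse_column_name` and the example name `:feature.text:prompt` are hypothetical; the OpenInference spec remains the authoritative source for the format.

```python
from typing import NamedTuple


class ColumnParts(NamedTuple):
    category: str
    data_type: str
    name: str


def parse_column_name(column_name: str) -> ColumnParts:
    """Splits a name shaped like ':category.data_type:name' into its parts.

    The shape is an assumption made for this sketch, not a statement of the spec.
    """
    if not column_name.startswith(":") or column_name.count(":") != 2:
        raise ValueError(f"Unexpected column name: {column_name!r}")
    # Drop the leading colon, then separate the type prefix from the trailing name.
    prefix, name = column_name[1:].split(":")
    category, _, data_type = prefix.partition(".")
    return ColumnParts(category, data_type, name)


# ":id.str:" appears in this notebook's logger; the second name is hypothetical.
id_column = parse_column_name(":id.str:")
example_column = parse_column_name(":feature.text:prompt")
```
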
## Log Production RAG Data

In a production setting, LlamaIndex application maintainers can log the data generated by their system by implementing a custom `logger` and passing it to `OpenInferenceCallbackHandler`. The logger is a callable that accepts a buffer of data from the `OpenInferenceCallbackHandler`, persists the data (e.g., by uploading to cloud storage or sending to a data ingestion service), and flushes the buffer once the data is persisted. The reference implementation below writes data in OpenInference format to a local Parquet file whenever the buffer exceeds a configured size.

In [None]:
class LocalParquetLogger:
    def __init__(self, data_path: str, file_size_threshold_in_bytes: int = 1000):
        """Writes query data to Parquet files.

        Args:
            data_path (str): Directory where parquet files will be written. If the directory does
            not exist, it will be created.

            file_size_threshold_in_bytes (int): File-size threshold in bytes. Once this threshold is
            exceeded, the query data will be written and the buffer will be flushed.
        """
        os.makedirs(data_path, exist_ok=True)
        self._data_path = data_path
        self._max_size_in_bytes = file_size_threshold_in_bytes

    def __call__(self, query_data_buffer: DataBuffer[QueryData]) -> None:
        """Writes the query data buffer to a Parquet file if the buffer size exceeds the configured
        file size threshold.

        Args:
            query_data_buffer (DataBuffer[QueryData]): A buffer of query data.
        """
        if query_data_buffer.size_in_bytes > self._max_size_in_bytes:
            self._flush_buffer(query_data_buffer)

    def _flush_buffer(self, query_data_buffer: DataBuffer[QueryData]) -> None:
        """Writes the query data buffer to a Parquet file and empties the buffer.

        Args:
            query_data_buffer (DataBuffer[QueryData]): A buffer of query data.
        """
        query_dataframe = query_data_buffer.dataframe
        query_ids_string = str(query_dataframe[":id.str:"].tolist())
        batch_id = self._hash_string(query_ids_string)
        file_path = os.path.join(self._data_path, f"log-{batch_id}.parquet")
        query_dataframe.to_parquet(file_path)
        query_data_buffer.clear()

    @staticmethod
    def _hash_string(string: str) -> str:
        """Hashes a string using SHA256.

        Args:
            string (str): Input string.

        Returns:
            str: Hashed string.
        """
        return hashlib.sha256(string.encode()).hexdigest()

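The same check-persist-flush pattern can be exercised without LlamaIndex or pandas, which is handy for unit-testing logger logic. The sketch below is not part of the library: `InMemoryBuffer` is a hypothetical stand-in that mimics only the `size_in_bytes` and `clear()` members used above, its `records` attribute and size estimate are invented for this example, and `LocalJsonLinesLogger` writes JSON Lines instead of Parquet to stay dependency-free.

```python
import hashlib
import json
import os
import tempfile
from typing import Any, Dict, List


class InMemoryBuffer:
    """Hypothetical stand-in for a data buffer; not the library's DataBuffer."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def append(self, record: Dict[str, Any]) -> None:
        self.records.append(record)

    @property
    def size_in_bytes(self) -> int:
        # Approximates the size as the length of the JSON-encoded records.
        return sum(len(json.dumps(record).encode()) for record in self.records)

    def clear(self) -> None:
        self.records.clear()


class LocalJsonLinesLogger:
    """Writes buffered records to a JSON Lines file once a size threshold is exceeded."""

    def __init__(self, data_path: str, file_size_threshold_in_bytes: int = 1000) -> None:
        os.makedirs(data_path, exist_ok=True)
        self._data_path = data_path
        self._max_size_in_bytes = file_size_threshold_in_bytes

    def __call__(self, buffer: InMemoryBuffer) -> None:
        if buffer.size_in_bytes > self._max_size_in_bytes:
            self._flush_buffer(buffer)

    def _flush_buffer(self, buffer: InMemoryBuffer) -> None:
        # Derive a batch ID from the record IDs, as the Parquet logger does.
        ids_string = str([record["id"] for record in buffer.records])
        batch_id = hashlib.sha256(ids_string.encode()).hexdigest()
        file_path = os.path.join(self._data_path, f"log-{batch_id}.jsonl")
        with open(file_path, "w", encoding="utf-8") as file:
            for record in buffer.records:
                file.write(json.dumps(record) + "\n")
        # Empty the buffer only after the data has been written.
        buffer.clear()


demo_path = tempfile.mkdtemp()
demo_logger = LocalJsonLinesLogger(demo_path, file_size_threshold_in_bytes=60)
demo_buffer = InMemoryBuffer()

demo_buffer.append({"id": "q1", "query_text": "short"})
demo_logger(demo_buffer)  # below the threshold, so nothing is written
files_after_first_call = os.listdir(demo_path)

demo_buffer.append(
    {"id": "q2", "query_text": "a longer query that pushes the buffer over the threshold"}
)
demo_logger(demo_buffer)  # above the threshold, so both records are flushed
files_after_second_call = os.listdir(demo_path)
```
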

⚠️ In a production setting, configure the `max_size_in_bytes` parameter on `OpenInferenceCallbackHandler` or periodically flush the buffer in your own custom logger. Otherwise, the callback handler will accumulate data in memory indefinitely and may eventually crash your system.

Attach the logger to your callback and re-run the query engine. The RAG data will be saved to disk.

In [None]:
data_path = "rag_data"
logger = LocalParquetLogger(
    data_path=data_path,
    # this parameter is set artificially low for demonstration purposes
    # to force a flush to disk; in practice, it would be much larger
    file_size_threshold_in_bytes=40,
)
callback_handler = OpenInferenceCallbackHandler(
    logger=logger,
    # this parameter is not necessary since the logger flushes to disk
    # but is recommended as a safeguard
    max_size_in_bytes=1000,
)
callback_manager = CallbackManager([callback_handler])
service_context = ServiceContext.from_defaults(callback_manager=callback_manager)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()

for query in tqdm(queries):
    query_engine.query(query)

Load and display the saved Parquet data from disk to verify that the logger is working.

In [None]:
query_dataframes = []
for file_name in os.listdir(data_path):
    file_path = os.path.join(data_path, file_name)
    query_dataframes.append(pd.read_parquet(file_path))
query_dataframe = pd.concat(query_dataframes)
query_dataframe

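If the log directory might also contain other files (editor backups, `.DS_Store`, subdirectories), it can help to select only the Parquet files before reading them. The helper below is a hypothetical sketch using only the standard library; the name `list_parquet_files` and the demo file names are made up for illustration.

```python
import tempfile
from pathlib import Path
from typing import List


def list_parquet_files(data_path: str) -> List[Path]:
    """Returns the regular files in data_path that end in .parquet, sorted by path."""
    directory = Path(data_path)
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.suffix == ".parquet"
    )


# Demonstrate on a throwaway directory that mixes Parquet logs with other entries.
with tempfile.TemporaryDirectory() as demo_path:
    for file_name in ["log-b.parquet", "log-a.parquet", ".DS_Store", "notes.txt"]:
        (Path(demo_path) / file_name).touch()
    (Path(demo_path) / "archive.parquet").mkdir()  # a directory, so it is skipped
    found_names = [path.name for path in list_parquet_files(demo_path)]
```

Each returned path can be passed to `pd.read_parquet` in place of the `os.path.join` result in the cell above.
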