# Custom Chain

This notebook shows how to evaluate a custom chain on all of the extraction benchmark tasks.

We will first define a `create_extraction_chain` function that builds a custom chain from a prompt and a schema to extract. We will then iterate over the extraction benchmark tasks and run the chain on each one.


In [54]:
#!pip install -U langchain-benchmarks langchain langchain-openai rapidfuzz

## Get the Benchmarks

First, let's load the relevant benchmarks.

In [22]:
from langchain_benchmarks import registry, clone_public_dataset

registry.filter(Type="ExtractionTask")

| Name | Type | Dataset ID | Description |
|---|---|---|---|
| Email Extraction | ExtractionTask | a1742786-bde5-4f51-a1d8-e148e5251ddb | A dataset of 42 real emails deduped from a spam folder, with semantic HTML tags removed, as well as a script for initial extraction and formatting of other emails from an arbitrary .mbox file like the one exported by Gmail. Some additional cleanup of the data was done by hand after the initial pass. See https://github.com/jacoblee93/oss-model-extraction-evals. |
| Chat Extraction | ExtractionTask | 00f4444c-9460-4a82-b87a-f50096f1cfef | A dataset meant to test the ability of an LLM to extract and infer structured information from a dialogue. The dialogue is between a user and a support engineer. Outputs should be structured as a JSON object and test both the ability of the LLM to correctly structure the information and its ability to perform simple classification tasks. |


In [23]:
task = registry["Email Extraction"]

Each task has instructions (a prompt) and a schema. You do not need to use the instructions, but they can help you bootstrap a default prompt quickly.

In [24]:
task.instructions

ChatPromptTemplate(input_variables=['input'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='You are an expert researcher.')), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], template='What can you tell me about the following email? Make sure to extract the question in the correct format. Here is the email:\n ```\n{input}\n```'))])

In [25]:
task.schema

langchain_benchmarks.extraction.tasks.email_task.Email

## Define Chain Creation Function

This is where the extraction logic lives. The function takes a prompt and an output schema. It can accept other arguments as well, as long as you update the place where it is called in the evaluation loop below.

In [39]:
from langchain_openai import ChatOpenAI
from langchain.output_parsers.openai_tools import JsonOutputToolsParser

In [40]:
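def create_extraction_chain(prompt, schema):
    """Build a chain that extracts `schema` from the input via OpenAI tool calling."""
    # Bind the schema as a tool so the model returns structured arguments.
    llm = ChatOpenAI(model="gpt-4-turbo-preview", temperature=0).bind_tools(
        tools=[schema],
    )

    # Parse tool calls into dicts, then keep only the first call's arguments.
    output_parser = JsonOutputToolsParser()
    extraction_chain = (
        prompt | llm | output_parser | (lambda x: {"output": x[0]["args"]})
    )
    return extraction_chain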
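
The last step of the chain indexes `x[0]`, so it raises `IndexError` whenever the model answers without a tool call. Below is a minimal sketch of a more defensive post-processing step. It assumes the parser output is a list of dictionaries with `type` and `args` keys; `first_tool_args` is a hypothetical helper name and the sample field is made up for illustration.

```python
def first_tool_args(tool_calls):
    """Return the arguments of the first parsed tool call as {"output": ...}.

    Assumes `tool_calls` is a list of dicts shaped like
    {"type": "<tool name>", "args": {...}}. Falls back to an empty dict
    when the model made no tool call, instead of raising IndexError.
    """
    if not tool_calls:
        return {"output": {}}
    return {"output": tool_calls[0].get("args", {})}


# Hypothetical parser output, for illustration only.
parsed = [{"type": "Email", "args": {"sender": "jane@example.com"}}]
result = first_tool_args(parsed)
empty_result = first_tool_args([])
```

Such a helper could stand in for the lambda, as in `prompt | llm | output_parser | first_tool_args`. Whether an empty dictionary is the right fallback depends on how the evaluators treat missing fields, so treat it as a starting point.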

## Loop over tasks

In [42]:
chains_to_eval = [
    ("openai-tools", create_extraction_chain)
]

In [43]:
import datetime

from langsmith.client import Client
from langchain_benchmarks.extraction import get_eval_config
from langchain_benchmarks.extraction.tasks.chat_extraction import get_eval_config as get_chat_eval_config


In [44]:
eval_configs = {
    "Email Extraction": get_eval_config(),
    "Chat Extraction": get_chat_eval_config()
}
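
The evaluation loop looks up each task by name, so a task without an entry in a dictionary like this one fails with a `KeyError` partway through the run. A small pre-flight check can catch that earlier. This is a sketch with stand-in values; `missing_entries` is a hypothetical helper, not part of `langchain_benchmarks`.

```python
def missing_entries(task_names, mapping):
    """Return the task names that have no entry in `mapping`, sorted."""
    return sorted(set(task_names) - set(mapping))


# Stand-in values. In the notebook these would be
# [task.name for task in registry.filter(Type="ExtractionTask")]
# and the eval_configs dictionary.
task_names = ["Email Extraction", "Chat Extraction"]
configs = {"Email Extraction": object()}

missing = missing_entries(task_names, configs)
print(missing)  # ['Chat Extraction']
```

The same check applies to any other dictionary keyed by task name.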

In [48]:
prompts = {task.name: task.instructions for task in registry.filter(Type="ExtractionTask")}

In [61]:
from langchain_core.messages import SystemMessage, HumanMessage

_email_template = """What can you tell me about the following email? Make sure to extract the question in the correct format. Here is the email:\n ```\n{input}\n```"""
def email_extraction_formatting(inputs):
    return [HumanMessage(content=_email_template.format(input=inputs["input"]))]

_chat_template = """Generate a ticket for the following question-response pair:\n<Dialogue>\n{dialogue}\n</Dialogue>"""
_chat_instructions = """You are a helpdesk assistant responsible for extracting information and generating tickets. Dialogues are between a user and a support engineer."""

def format_run(dialogue_input: dict):
    question = dialogue_input["question"]
    answer = dialogue_input["answer"]
    return {
        "dialogue": f"<question>\n{question}\n</question>\n"
        f"<assistant-response>\n{answer}\n</assistant-response>"
    }

def chat_extraction_formatting(inputs):
    dialogue = format_run(inputs)["dialogue"]
    return [
        SystemMessage(content=_chat_instructions),
        HumanMessage(content=_chat_template.format(dialogue=dialogue))
    ]

prompt_formatting = {
    "Email Extraction": email_extraction_formatting,
    "Chat Extraction": chat_extraction_formatting
}
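
To see what the chat formatter feeds into the prompt, it helps to run `format_run` on its own. The sketch below repeats the function so the snippet runs standalone. The sample row is made up, and it assumes dataset inputs carry `question` and `answer` keys, as the function itself does.

```python
def format_run(dialogue_input: dict) -> dict:
    # Same logic as the notebook's format_run, repeated so this runs alone.
    question = dialogue_input["question"]
    answer = dialogue_input["answer"]
    return {
        "dialogue": f"<question>\n{question}\n</question>\n"
        f"<assistant-response>\n{answer}\n</assistant-response>"
    }


# Made-up example row, for illustration only.
example = {
    "question": "How do I reset my password?",
    "answer": "Use the reset link.",
}
dialogue = format_run(example)["dialogue"]
print(dialogue)
```

The printed dialogue wraps the question in `<question>` tags and the answer in `<assistant-response>` tags, each on its own lines. This is the text substituted for `{dialogue}` in the chat template.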

In [62]:
client = Client()  # LangSmith client used to clone and read the datasets
today = datetime.datetime.today().isoformat()

for task in registry.filter(Type="ExtractionTask"):

    dataset_name = task.name
    clone_public_dataset(task.dataset_id, dataset_name=dataset_name)
    dataset = client.read_dataset(dataset_name=dataset_name)

    for name, chain_factory in chains_to_eval:
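        # Email Extraction is skipped in this run, so only Chat Extraction is
        # benchmarked below. Remove this check to evaluate every extraction task.
        if task.name == "Email Extraction":
            continue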
        print()
        print(f"Benchmarking {task.name} on {name}")
        eval_config = eval_configs[task.name]

        chain = chain_factory(prompt_formatting[task.name], task.schema)
        project_name = f"{name}-{task.name}-{today}"
        client.run_on_dataset(
            dataset_name=dataset_name,
            llm_or_chain_factory=chain,
            evaluation=eval_config,
            verbose=False,
            project_name=project_name,
            tags=[name],
            concurrency_level=5,
            project_metadata={
                "name": name,
                "task": task.name,
                "date": today,
            },
        )

Dataset Email Extraction already exists. Skipping.
You can access the dataset at https://smith.langchain.com/o/97591f89-2916-48d3-804e-20cab23f91aa/datasets/ccbb1190-dc59-45c8-8f5d-7a7a00fa4c4d.
Dataset Chat Extraction already exists. Skipping.
You can access the dataset at https://smith.langchain.com/o/97591f89-2916-48d3-804e-20cab23f91aa/datasets/b8637606-8ac0-4bab-9ad5-29796196cbbc.

Benchmarking Chat Extraction on openai-tools
View the evaluation results for project 'openai-tools-Chat Extraction-2024-02-20T20:39:20.189708' at:
https://smith.langchain.com/o/97591f89-2916-48d3-804e-20cab23f91aa/datasets/b8637606-8ac0-4bab-9ad5-29796196cbbc/compare?selectedSessions=7c1213e1-7dcb-4d2b-b252-04e51e3ed82e

View all tests for Dataset Chat Extraction at:
https://smith.langchain.com/o/97591f89-2916-48d3-804e-20cab23f91aa/datasets/b8637606-8ac0-4bab-9ad5-29796196cbbc
[------------------------------------------------->] 27/27