# Evaluating an Extraction Chain

Structured data [extraction](https://python.langchain.com/docs/use_cases/extraction) from unstructured text is a core part of many LLM applications. It is useful for preparing structured rows for database insertion, deriving API parameters for function calling and forms, and building knowledge graphs.

This walkthrough presents a method to evaluate an extraction chain. While our example dataset revolves around legal contracts, the principles and techniques laid out here apply across many domains and use cases.

By the end of this guide, you'll be equipped to set up and evaluate extraction chains tailored to your specific needs, ensuring your applications extract information both effectively and efficiently.

![Contract Extraction Dataset](./img/contract-extraction-dataset.png)

## Prerequisites

This walkthrough requires LangChain and Anthropic. Ensure they're installed and that you've configured the necessary API keys.

In [1]:
# %pip install -U langchain langsmith langchain_experimental anthropic jsonschema

In [2]:
import os
import uuid

uid = uuid.uuid4()
# os.environ["LANGCHAIN_API_KEY"] = "YOUR API KEY"
# os.environ["ANTHROPIC_API_KEY"] = "sk-ant-***"

## 1. Create dataset

For this task, we will fill out details about legal contracts from their context. We have prepared a small labeled dataset for this walkthrough based on the [Contract Understanding Atticus Dataset (CUAD)](https://github.com/TheAtticusProject/cuad). You can explore the [Contract Extraction](https://smith.langchain.com/public/08ab7912-006e-4c00-a973-0f833e74907b/d) dataset at the provided link.

In [3]:
from langsmith import Client

share_token = "08ab7912-006e-4c00-a973-0f833e74907b"
dataset_name = f"Contract Extraction - {uid}"

client = Client()
examples = list(client.list_shared_examples(share_token))
dataset = client.create_dataset(dataset_name=dataset_name)
client.create_examples(
    inputs=[e.inputs for e in examples],
    outputs=[e.outputs for e in examples],
    dataset_id=dataset.id,
)

## 2. Define extraction chain

Our dataset inputs are quite long, so we will be testing out the experimental [Anthropic Functions](https://python.langchain.com/docs/integrations/chat/anthropic_functions) chain for this extraction task. This chain prompts the model to respond in XML that conforms to the provided schema.

Below, we define the contract schema to extract.

In [4]:
from typing import List, Optional

from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str
    country: Optional[str] = None


class Party(BaseModel):
    name: str
    address: Address
    type: Optional[str] = None


class Section(BaseModel):
    title: str
    content: str


class Contract(BaseModel):
    document_title: str
    exhibit_number: Optional[str] = None
    effective_date: str
    parties: List[Party]
    sections: List[Section]
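
To make the target shape concrete, the sketch below writes out by hand a record of the kind this schema describes, using details from the first dataset example that appears in the evaluation output later in this walkthrough. It is an illustration only, not the dataset's reference label. Plain dictionaries stand in for the Pydantic models so the snippet runs with the standard library alone, the `missing_fields` helper is hypothetical, and treating the `Optional` fields as not required is an assumption.

```python
# Field names mirror the Pydantic models above; Optional fields are assumed
# to be non-required and are therefore left out of these tuples.
REQUIRED = {
    "contract": ("document_title", "effective_date", "parties", "sections"),
    "party": ("name", "address"),
    "address": ("street", "city", "state", "zip_code"),
    "section": ("title", "content"),
}


def missing_fields(record: dict) -> list:
    """Hypothetical helper: list required fields absent from a contract record."""
    missing = [key for key in REQUIRED["contract"] if key not in record]
    for i, party in enumerate(record.get("parties", [])):
        missing += [f"parties[{i}].{key}" for key in REQUIRED["party"] if key not in party]
        address = party.get("address", {})
        missing += [
            f"parties[{i}].address.{key}" for key in REQUIRED["address"] if key not in address
        ]
    for i, section in enumerate(record.get("sections", [])):
        missing += [f"sections[{i}].{key}" for key in REQUIRED["section"] if key not in section]
    return missing


# Hand-written illustration, not the dataset's labeled output.
shared_address = {
    "street": "1900 West Field Court",
    "city": "Lake Forest",
    "state": "IL",
    "zip_code": "60045",
    "country": None,
}
example_contract = {
    "document_title": "MASTER SUPPLY AGREEMENT",
    "exhibit_number": "10.18",
    "effective_date": "November 1, 2019",
    "parties": [
        {"name": "REYNOLDS CONSUMER PRODUCTS LLC", "address": shared_address, "type": "Seller"},
        {"name": "PACTIV LLC", "address": shared_address, "type": "Buyer"},
    ],
    "sections": [
        {
            "title": "BACKGROUND",
            "content": "A. Seller sells various types of products used in the consumer and food service markets.",
        }
    ],
}

print(missing_fields(example_contract))  # []
```

A check like this only confirms that the keys are present. It says nothing about whether the extracted values are correct, which is what the evaluation step measures.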

Now we can define our extraction chain. We compose it below as `chain`, wrapping the extraction subchain so that its input and output keys match the dataset.

In [5]:
from langchain import hub
from langchain.chains import create_extraction_chain
from langchain.chat_models import ChatAnthropic
from langchain_experimental.llms.anthropic_functions import AnthropicFunctions

contract_prompt = hub.pull("wfh/anthropic_contract_extraction")


extraction_subchain = create_extraction_chain(
    Contract.schema(),
    llm=AnthropicFunctions(model="claude-2", max_tokens=20_000),
    prompt=contract_prompt,
)
# Dataset inputs have a "context" key, but this chain
# expects a dict with an "input" key
chain = (
    (lambda x: {"input": x["context"]})
    | extraction_subchain
    | (lambda x: {"output": x["text"]})
)
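
The two lambdas around `extraction_subchain` only rename keys. The sketch below shows the same adaptation in plain Python so it can be run without any API key. `adapt_keys` and `fake_subchain` are hypothetical names, and the stub's return value is made up for illustration; it is not what the real chain produces.

```python
from typing import Any, Callable, Dict

Step = Callable[[Dict[str, Any]], Dict[str, Any]]


def adapt_keys(subchain: Step) -> Step:
    """Wrap a step that expects {"input": ...} and returns {"text": ...}."""

    def run(example: Dict[str, Any]) -> Dict[str, Any]:
        # The dataset stores the document under "context".
        result = subchain({"input": example["context"]})
        # The evaluator reads the prediction from "output".
        return {"output": result["text"]}

    return run


def fake_subchain(inputs: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for the extraction chain: it uses the first
    # line of the document as the title instead of calling a model.
    first_line = inputs["input"].splitlines()[0]
    return {"text": [{"document_title": first_line}]}


adapted = adapt_keys(fake_subchain)
result = adapted({"context": "MASTER SUPPLY AGREEMENT\n\n..."})
print(result)  # {'output': [{'document_title': 'MASTER SUPPLY AGREEMENT'}]}
```

Keeping the renaming outside the subchain means the same extraction logic can be reused with datasets that name their input field differently.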

## 3. Evaluate

For this evaluation, we'll use the JSON edit distance evaluator, which standardizes the extracted entities and then computes a normalized string edit distance between the canonical versions. It is a fast way to check the similarity of two JSON objects without relying on an LLM.
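
As a rough illustration of the idea, the sketch below canonicalizes two JSON values and scores them with a normalized edit distance. It is not LangChain's implementation: it uses plain Levenshtein distance, and both the canonical form (sorted keys, compact separators) and the normalization (dividing by the longer string's length) are assumptions made for this example.

```python
import json


def canonicalize(value) -> str:
    """Serialize to JSON with sorted keys and no extra whitespace."""
    if isinstance(value, str):
        value = json.loads(value)
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def json_edit_distance(prediction, reference) -> float:
    """Score in [0, 1]; 0.0 means the canonical forms are identical."""
    pred, ref = canonicalize(prediction), canonicalize(reference)
    longest = max(len(pred), len(ref))
    return levenshtein(pred, ref) / longest if longest else 0.0


# Key order and whitespace do not matter after canonicalization.
print(json_edit_distance({"a": 1, "b": 2}, '{"b": 2, "a": 1}'))  # 0.0
```

Lower scores indicate closer matches. Because the comparison is character-based, a long section whose wording differs slightly from the reference moves the score far more than a short field that is entirely wrong.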

In [6]:
import logging

# Silence logging below CRITICAL so errors on these long documents do not clutter the output
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)

In [8]:
from langchain.smith import RunEvalConfig

eval_config = RunEvalConfig(
    evaluators=["json_edit_distance"],
)
client.run_on_dataset(
    dataset_name=dataset_name,
    llm_or_chain_factory=chain,
    evaluation=eval_config,
    # In case you are rate-limited
    concurrency_level=2,
)

View the evaluation results for project 'test-stupendous-leather-28' at:
https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/projects/p/efd31950-39c2-4f9e-8da3-21f34d5cbd52?eval=true
[------------------------------------------------->] 16/16

{'project_name': 'test-stupendous-leather-28',
 'results': {'b8ed2c0b-2c74-4c18-9685-c56998b33f34': {'output': {'Error': "IndexError('list index out of range')"},
   'input': {'context': 'Exhibit 10.18\n\nMASTER SUPPLY AGREEMENT\n\nMASTER SUPPLY AGREEMENT (the "Agreement") dated November 1, 2019 (the "Effective Date") between REYNOLDS CONSUMER PRODUCTS LLC, a Delaware limited liability company with its headquarters at 1900 West Field Court, Lake Forest, IL 60045 ("Seller"), and PACTIV LLC, a Delaware limited liability company with its headquarters at 1900 West Field Court, Lake Forest, IL 60045 ("Buyer"). Seller and Buyer are referred to individually at times as a "Party" and collectively at times as the "Parties".\n\nBACKGROUND\n\nA. Seller sells various types of products used in the consumer and food service markets.\n\nB. Buyer sells various types of products, including certain products of the type made by Seller, to its customers.\n\nC. The Parties are entering into this Agreement 
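
Because logging is suppressed, failed runs surface only inside the returned dictionary, as in the `IndexError` entry above. The sketch below shows one way to list them. It assumes the report has the shape visible in the truncated output (a `results` mapping from run ID to a dict with an `output` key), `failed_runs` is a hypothetical helper, and `sample_report` is a hand-written stand-in, not real evaluation data.

```python
from typing import Any, Dict, List


def failed_runs(report: Dict[str, Any]) -> List[str]:
    """Return the IDs of runs whose output holds an 'Error' entry."""
    failures = []
    for run_id, run in report.get("results", {}).items():
        output = run.get("output")
        if isinstance(output, dict) and "Error" in output:
            failures.append(run_id)
    return failures


# Hypothetical report that mimics the shape printed above.
sample_report = {
    "project_name": "example-project",
    "results": {
        "run-1": {
            "output": {"Error": "IndexError('list index out of range')"},
            "input": {"context": "..."},
        },
        "run-2": {"output": {"output": []}, "input": {"context": "..."}},
    },
}

print(failed_runs(sample_report))  # ['run-1']
```

Counting failures separately from scored runs helps avoid reading an average edit distance that silently excludes the documents the chain could not process.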

## Conclusion

In this walkthrough, we showed a methodical approach to evaluating an extraction chain applied to template filling for legal contracts.
You can use similar techniques to evaluate other chains intended to return structured output.