### Task example

Given a set of lease documents, we need to automatically extract certain pieces of information (fields) from each one. The goal is to find the best LLM and extraction strategy for the job.

Let's say we need to extract the landlord's full name and the property address from each document.

The output template for extraction would be:
```json
{
  "full_name": "<landlord full name>",
  "address": "<address of the property>"
}
```

### How to solve the problem

To find the best model, we need a metric that measures the quality of extraction. Straightforward equality will not work in 100% of cases, since an extracted value can be correct while differing slightly from the reference in spelling or formatting. One of the simplest metrics that suits extraction tasks is the Levenshtein distance between two strings. To find more metrics, check out this [article](https://www.confident-ai.com/blog/llm-evaluation-metrics-everything-you-need-for-llm-evaluation).

After that, we can run the extraction several times with different models and strategies and compare the overall accuracy.
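To make the metric concrete, here is a minimal, dependency-free sketch of the Levenshtein distance and a 0-100 similarity score derived from it. The function names `levenshtein` and `similarity` are illustrative, and the normalization by the longer string's length is just one common choice; it is not necessarily the formula a fuzzy-matching library uses internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity on a 0-100 scale (assumed normalization: longer string's length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


print(levenshtein("123 Elm St", "123 Elm Street"))  # 4 edits: insert "reet"
print(similarity("John Doe", "John Doe"))            # 100.0
```

A distance of 0 means the strings are identical; the similarity score makes values of different lengths comparable against a fixed threshold.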

```python
from enum import Enum

from fuzzywuzzy import fuzz


class EqualityPattern(Enum):
    good = 'good'
    half_credit = 'half-credit'
    bad = 'bad'


# Thresholds on the 0-100 score returned by fuzz.token_set_ratio
SIMILARITY_THRESHOLD = 90
HALF_CREDIT_THRESHOLD = 70


def equality_measurement(target_string: str, compared_string: str) -> EqualityPattern:
    """Classify how closely the extracted string matches the reference string."""
    ratio = fuzz.token_set_ratio(target_string, compared_string)
    if ratio > SIMILARITY_THRESHOLD:
        return EqualityPattern.good
    elif ratio > HALF_CREDIT_THRESHOLD:
        return EqualityPattern.half_credit
    else:
        return EqualityPattern.bad
```

```python
from openpyxl.styles import PatternFill, Font


def color_patterns(pattern: EqualityPattern):
    """Return the (fill, font) pair used to highlight a cell for the given pattern."""
    if pattern == EqualityPattern.good:  # green
        fill = PatternFill(start_color="C6EFCE", end_color="C6EFCE", fill_type="solid")
        font = Font(color="006100", size=12)
    elif pattern == EqualityPattern.half_credit:  # yellow
        fill = PatternFill(start_color="FFEB9C", end_color="FFEB9C", fill_type="solid")
        font = Font(color="9C6500", size=12)
    elif pattern == EqualityPattern.bad:  # red
        fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
        font = Font(color="9C0006", size=12)
    else:
        raise ValueError(f"Unknown equality pattern: {pattern!r}")
    return fill, font
```

```python
# Save the evaluation of one model/strategy to an Excel file for further analysis
import pandas as pd

# Sample data: values extracted by the model
extraction_df = pd.DataFrame([
    {"name": "first", "full_name": "John Doe", "address": "123 Elm St"},
    {"name": "second", "full_name": "Jane Smith", "address": "456 Oak St"}
])

# Sample data: reference ("golden") values, with rows and columns in the same order
golden_df = pd.DataFrame([
    {"name": "first", "full_name": "John Doe", "address": "123 Elm St"},
    {"name": "second", "full_name": "Jane Smith", "address": "789 Pine St"}
])
df = extraction_df

with pd.ExcelWriter('output.xlsx', engine='openpyxl') as writer:
    sheet_name = 'Sheet1'
    df.to_excel(writer, index=False, sheet_name=sheet_name)
    sheet = writer.sheets[sheet_name]

    # Color each extracted value according to how well it matches the golden value
    for row in range(2, len(df) + 2):  # iterate over each document, skipping header (first row)
        for col in range(2, len(df.columns) + 1):  # iterate over each extraction field, skipping document name (first column)
            cell = sheet.cell(row=row, column=col)
            # Sheet rows/columns are 1-based, DataFrame positions are 0-based
            golden_value = golden_df.iloc[row - 2, col - 1]
            equality_pattern = equality_measurement(cell.value, golden_value)
            fill, font = color_patterns(equality_pattern)
            cell.fill = fill
            cell.font = font
```

### Output example

The code above generates an xlsx file that looks like this:

![](../images/img.png)


### Next steps 

You can evaluate accuracy:
- across all cells (to reveal the overall extraction quality of a model)
- by field (for example, to check whether a prompt decreases the extraction quality of a certain field)
- by document (for example, to find documents that are low quality after OCR, i.e. optical character recognition)
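
The three views above can be sketched as follows. This is a minimal illustration, assuming each cell has already been mapped to a numeric score (1.0 for a match, 0.5 for half credit, 0.0 otherwise); those weights, the `results` data, and the `accuracy_report` helper are all hypothetical and not part of the code above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-cell scores: document name -> field -> score
results = {
    "first": {"full_name": 1.0, "address": 1.0},
    "second": {"full_name": 1.0, "address": 0.0},
}


def accuracy_report(results: dict) -> dict:
    """Average the per-cell scores overall, per field, and per document."""
    all_scores = []
    by_field = defaultdict(list)
    by_document = {}
    for document, fields in results.items():
        by_document[document] = mean(fields.values())
        for field, score in fields.items():
            all_scores.append(score)
            by_field[field].append(score)
    return {
        "overall": mean(all_scores),
        "by_field": {field: mean(scores) for field, scores in by_field.items()},
        "by_document": by_document,
    }


report = accuracy_report(results)
print(report["overall"])      # 0.75
print(report["by_field"])     # {'full_name': 1.0, 'address': 0.5}
print(report["by_document"])  # {'first': 1.0, 'second': 0.5}
```

Running the same report for every model or strategy gives numbers that can be compared directly: the overall score ranks the runs, while the per-field and per-document breakdowns show where a given run loses accuracy.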