# **ExtractLabel: Schema-Driven Extraction with Fabric AI Functions**

#### This notebook walks through the examples from the blog post. You will build extraction schemas using both raw JSON Schema and Pydantic, run them against sample warranty claims, and see how types, enums, arrays, and descriptions shape the output.
What you need: a Fabric workspace on F2 or higher capacity, and a notebook attached to a lakehouse.

In [12]:
# Refer to https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/overview?tabs=pandas-python%2Cpandas for installation options and guidelines
!wget -q https://aka.ms/fabric-aifunctions-whl -O synapseml_internal-latest-py3-none-any.whl
!wget -q https://aka.ms/fabric-synapseml-core-whl -O synapseml_core-latest-py3-none-any.whl
%pip install -q --force-reinstall openai==1.99.5 synapseml_internal-latest-py3-none-any.whl synapseml_core-latest-py3-none-any.whl

[m[m[m[m[m[m[m[mNote: you may need to restart the kernel to use updated packages.


## **ExtractLabel Using JSON Schema**

In [9]:
import pandas as pd
from synapse.ml.aifunc import ExtractLabel

# Sample warranty claims to run the extraction against
df = pd.DataFrame({
    "claim_id": ["W-001", "W-002", "W-003"],
    "text": [
        "The smart thermostat stopped turning on after 12 days. I tried a reset and new batteries. Please replace it.",
        "My water bottle arrived with a dented lid and it leaks. Box looked fine. Requesting a refund.",
        "The wireless mouse works, but the scroll wheel skips randomly. Bought it 3 months ago. I'd prefer a repair if possible."
    ]
})

# Define the extraction schema as raw JSON Schema
claim_schema = ExtractLabel(
    label="claim",
    max_items=1,
    type="object",
    description="Extract structured warranty claim information",
    properties={
        "type": "object",
        "properties": {
            "product_name": {
                "type": "string",
                "description": "The product mentioned in the claim. Use common name, not brand."
            },
            "problem_category": {
                "type": "string",
                "enum": ["defect", "damage_in_transit", "missing_part", "other"],
                "description": "defect=stopped working or malfunctioning, damage_in_transit=arrived damaged, missing_part=something not included"
            },
            "problem_summary": {
                "type": "string",
                "description": "One sentence summary of the issue. Max 15 words."
            },
            "time_owned": {
                "type": ["string", "null"],
                "description": "How long customer has had the product. Null if not mentioned."
            },
            "troubleshooting_tried": {
                "type": "array",
                "items": {"type": "string"},
                "description": "List of steps customer already attempted. Empty array if none mentioned."
            },
            "requested_resolution": {
                "type": "string",
                "enum": ["replacement", "refund", "repair", "replacement_part", "other"],
                "description": "What the customer is asking for. Use 'other' if unclear."
            }
        },
        "required": ["product_name", "problem_category", "problem_summary", "time_owned", "troubleshooting_tried", "requested_resolution"],
        "additionalProperties": False
    }
)


# Run the extraction; each row's result is stored in a new "claim" column
df[["claim"]] = df["text"].ai.extract(claim_schema)
display(df)

100%|██████████| 3/3 [00:01<00:00,  1.91it/s]
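
The schema above declares every field as required, limits two fields to enums, allows `time_owned` to be null, and types `troubleshooting_tried` as an array of strings. A quick way to see what those constraints mean is to check a record against them in plain Python. This is a minimal sketch, not part of the AI functions library: `check_claim` is a hypothetical helper, it covers only a subset of the schema, and the sample record is hand-written to match the schema rather than copied from actual model output. It also assumes each extracted claim is available as a single dict, which the notebook output does not show.

```python
# Hypothetical helper (not a Fabric API): checks one record against the
# constraints declared in the warranty claim schema.
REQUIRED = (
    "product_name", "problem_category", "problem_summary",
    "time_owned", "troubleshooting_tried", "requested_resolution",
)
PROBLEM_CATEGORIES = ("defect", "damage_in_transit", "missing_part", "other")
RESOLUTIONS = ("replacement", "refund", "repair", "replacement_part", "other")


def check_claim(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    # "required": every declared field must be present.
    problems = [f"missing required field: {key}" for key in REQUIRED if key not in record]

    # "additionalProperties": False means no undeclared keys.
    extra = sorted(set(record) - set(REQUIRED))
    if extra:
        problems.append(f"unexpected fields: {extra}")

    # "enum": the value must be one of the listed options.
    for field, allowed in (("problem_category", PROBLEM_CATEGORIES),
                           ("requested_resolution", RESOLUTIONS)):
        if field in record and record[field] not in allowed:
            problems.append(f"{field} not in enum: {record[field]!r}")

    # "type": ["string", "null"] accepts a string or None.
    time_owned = record.get("time_owned")
    if time_owned is not None and not isinstance(time_owned, str):
        problems.append("time_owned must be a string or null")

    # "type": "array" with string items; an empty list is valid.
    steps = record.get("troubleshooting_tried", [])
    if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
        problems.append("troubleshooting_tried must be a list of strings")

    return problems


# Hand-written illustration of a conforming record for claim W-001.
sample = {
    "product_name": "smart thermostat",
    "problem_category": "defect",
    "problem_summary": "Thermostat stopped turning on after 12 days.",
    "time_owned": "12 days",
    "troubleshooting_tried": ["reset", "new batteries"],
    "requested_resolution": "replacement",
}
print(check_claim(sample))  # []

# A value outside the enum is reported.
bad = dict(sample, requested_resolution="exchange")
print(check_claim(bad))
```

A check like this is mainly useful as a guard before writing results to a table; the enum and required constraints in the schema are what keep the column values predictable in the first place.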


## **ExtractLabel Using Pydantic**

In [11]:
import pandas as pd
from pydantic import BaseModel, Field
from typing import Optional, Literal
from synapse.ml.aifunc import ExtractLabel


# Define schema using Pydantic
class WarrantyClaim(BaseModel):
    product_name: str = Field(
        ..., description="The product mentioned in the claim. Use common name, not brand."
    )
    problem_category: Literal["defect", "damage_in_transit", "missing_part", "other"] = Field(
        ..., description="defect=stopped working or malfunctioning, damage_in_transit=arrived damaged, missing_part=something not included"
    )
    problem_summary: str = Field(
        ..., description="One sentence summary of the issue. Max 15 words."
    )
    time_owned: Optional[str] = Field(
        None, description="How long customer has had the product. Null if not mentioned."
    )
    troubleshooting_tried: list[str] = Field(
        default_factory=list, description="List of steps customer already attempted. Empty array if none mentioned."
    )
    requested_resolution: Literal["replacement", "refund", "repair", "replacement_part", "other"] = Field(
        ..., description="What the customer is asking for. Use 'other' if unclear."
    )


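def pydantic_to_extract_properties(model: type[BaseModel]) -> dict:
    """Convert a Pydantic model into the schema dict passed to ExtractLabel's `properties`."""
    schema = model.model_json_schema()

    # ExtractLabel requires all properties to be in "required". Pydantic leaves out
    # fields that have defaults (here, time_owned and troubleshooting_tried), so add them back.
    if "properties" in schema:
        schema["required"] = list(schema["properties"].keys())

    # Disallow keys that are not declared in the schema
    schema["additionalProperties"] = False
    return schema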

schema = pydantic_to_extract_properties(WarrantyClaim)

claim_schema = ExtractLabel(
    label="claim",
    max_items=1,
    type="object",
    description="Extract structured warranty claim information",
    properties=schema
)

# Run the extraction (reuses `df` from the JSON Schema cell above)
df[["claim"]] = df["text"].ai.extract(claim_schema)
display(df)

100%|██████████| 3/3 [00:01<00:00,  2.15it/s]
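
To see why the helper rewrites `required`, it can help to look at what Pydantic generates on its own. The sketch below uses `MiniClaim`, a hypothetical trimmed-down model introduced only for illustration, and assumes Pydantic v2 (where `model_json_schema()` is available). It shows that fields with defaults are left out of `required`, that `Optional[str]` becomes an `anyOf` with a null branch, and that `Literal` becomes an `enum`.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field


class MiniClaim(BaseModel):
    """Hypothetical, trimmed-down stand-in for WarrantyClaim (illustration only)."""
    product_name: str = Field(..., description="The product mentioned in the claim.")
    time_owned: Optional[str] = Field(None, description="Null if not mentioned.")
    troubleshooting_tried: list[str] = Field(
        default_factory=list, description="Steps the customer already attempted."
    )
    requested_resolution: Literal["replacement", "refund", "repair"] = Field(
        ..., description="What the customer is asking for."
    )


schema = MiniClaim.model_json_schema()

# Pydantic lists only the fields without defaults as required.
required_before = list(schema["required"])
print(required_before)  # ['product_name', 'requested_resolution']

# Optional[str] is expressed as a union of string and null.
print(schema["properties"]["time_owned"]["anyOf"])

# Literal[...] is expressed as an enum.
print(schema["properties"]["requested_resolution"]["enum"])

# The same normalization the notebook's helper applies:
# mark every property as required and forbid undeclared keys.
schema["required"] = list(schema["properties"].keys())
schema["additionalProperties"] = False
print(schema["required"])
```

After normalization, the Pydantic-derived schema carries the same `required` and `additionalProperties` settings as the hand-written JSON Schema version, which is why both cells can pass their schema to `ExtractLabel` in the same way.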
