# Insurance Claims Text Embedder and Fine-Tuning Data Augmentation

## Description
This notebook processes insurance claims data to create augmented text datasets for fine-tuning language models and embedding training.

## Features
- **Dataset Download**: Automatic retrieval from Mendeley repository
- **Text Generation**: Claims narrative creation from structured data
- **GPT Integration**: Automated text augmentation using OpenAI API
- **Data Augmentation**: Multiple variations per insurance case

## Input
- Insurance claims dataset from Mendeley Data (AQQAD, ABDELRAHIM 2023)
- API key for OpenAI GPT integration

## Output
- `insurance_claims_generated_texts.csv`: Augmented text dataset
- `DB_pro_embedder.csv`: Final processed dataset for embeddings

## Package Installation

Installing necessary packages for enhanced Jupyter functionality:

In [None]:
# !pip install --upgrade jupyter ipywidgets

## Insurance Dataset Upload

### Data Source
**Dataset**: AQQAD, ABDELRAHIM (2023), "insurance_claims", Mendeley Data, V2  
**DOI**: 10.17632/992mh7dk9y.2  
**URL**: Mendeley repository with 1,000 insurance claim records

### Automatic Download and Processing
Downloading and extracting CSV data directly from Mendeley repository:

In [3]:
# upload dataset AQQAD, ABDELRAHIM (2023), “insurance_claims ”, Mendeley Data, V2, doi: 10.17632/992mh7dk9y.2
import pandas as pd
import requests
import zipfile
from io import BytesIO

# URL del file ZIP da Mendeley
url = "https://prod-dcd-datasets-cache-zipfiles.s3.eu-west-1.amazonaws.com/992mh7dk9y-2.zip"

response = requests.get(url)
zip_file = zipfile.ZipFile(BytesIO(response.content))

# Lista tutti i file
file_list = zip_file.namelist()
print("File contenuti nello ZIP:")
for filename in file_list:
    print(filename)

# Trova automaticamente il file CSV
csv_files = [f for f in file_list if f.endswith('.csv')]

if csv_files:
    # Usa il primo file CSV trovato
    csv_file_path = csv_files[0]
    print(f"Caricamento del file: {csv_file_path}")
    
    with zip_file.open(csv_file_path) as file:
        df = pd.read_csv(file)
    
    print(df.head())
    print(f"Dimensioni dataset: {df.shape}")
else:
    print("Nessun file CSV trovato nello ZIP")


File contenuti nello ZIP:
insurance_claims/insurance_claims.csv
Caricamento del file: insurance_claims/insurance_claims.csv
   months_as_customer  age  policy_number policy_bind_date policy_state  \
0                 328   48         521585       2014-10-17           OH   
1                 228   42         342868       2006-06-27           IN   
2                 134   29         687698       2000-09-06           OH   
3                 256   41         227811       1990-05-25           IL   
4                 228   44         367455       2014-06-06           IL   

  policy_csl  policy_deductable  policy_annual_premium  umbrella_limit  \
0    250/500               1000                1406.91               0   
1    250/500               2000                1197.22         5000000   
2    100/300               2000                1413.14         5000000   
3    250/500               2000                1415.74         6000000   
4   500/1000               1000                1583.91 

## Claims Narrative Generation

### Narrative Structure
Converting structured insurance data into detailed claim narratives with:

| Section | Content | Purpose |
|---------|---------|---------|
| **Policyholder Profile** | Demographics, policy details | Customer context |
| **Incident Details** | Vehicle, location, police report | Incident context |
| **Claim Breakdown** | Financial components, severity | Financial analysis |
| **Investigation Outcome** | Fraud determination | Risk assessment |

### Text Transformation Process
Each row is transformed into a comprehensive narrative suitable for:
- **Text Analysis**: NLP processing
- **Embeddings Training**: Vector representation learning
- **Fine-tuning**: Language model adaptation

In [5]:
def crea_cronistoria_sinistro(row):
    """Trasforma una riga del dataset in una cronistoria narrativa del sinistro"""
    
    # Determina il tipo di polizza
    policy_type = "standard" if row['umbrella_limit'] == 0 else "umbrella"
    
    # Calcola l'età della polizza
    from datetime import datetime
    bind_date = datetime.strptime(row['policy_bind_date'], '%Y-%m-%d')
    policy_age_years = (datetime.now() - bind_date).days // 365
    
    # Determina la gravità del sinistro
    if row['total_claim_amount'] > 50000:
        severity = "major"
    elif row['total_claim_amount'] > 20000:
        severity = "moderate"
    else:
        severity = "minor"
    
    cronistoria = f"""
CLAIM INCIDENT REPORT - Policy #{row['policy_number']}

POLICYHOLDER PROFILE:
A {row['age']}-year-old customer who has been with the company for {row['months_as_customer']} months. 
The policy was bound on {row['policy_bind_date']} in {row['policy_state']} state with {row['policy_csl']} 
coverage limits and a ${row['policy_deductable']:,} deductible. Annual premium: ${row['policy_annual_premium']:,.2f}.
{"This customer has umbrella coverage with a limit of ${:,}.".format(row['umbrella_limit']) if row['umbrella_limit'] > 0 else "No umbrella coverage."}

INCIDENT DETAILS:
The incident involved a {row['auto_year']} {row['auto_make']} {row['auto_model']} in zip code {row['insured_zip']}.
Police report {"was" if row['police_report_available'] == 'YES' else "was not"} available for this incident.

CLAIM BREAKDOWN:
This {severity} claim totaled ${row['total_claim_amount']:,.2f}, consisting of:
- Injury claim: ${row['injury_claim']:,.2f}
- Property damage: ${row['property_claim']:,.2f}
- Vehicle damage: ${row['vehicle_claim']:,.2f}

INVESTIGATION OUTCOME:
After thorough investigation, this claim was {"flagged as fraudulent" if row['fraud_reported'] == 'Y' else "determined to be legitimate"}.

---
"""
    return cronistoria

# Applica la funzione a tutto il dataset
cronistorie_complete = []
for index, row in df.iterrows():
    cronistoria = crea_cronistoria_sinistro(row)
    cronistorie_complete.append(cronistoria)

# Unisci tutte le cronistorie in un unico testo
testo_completo = "\n".join(cronistorie_complete)

# Visualizza le prime due cronistorie come esempio
print("PRIME DUE CRONISTORIE:")
print(cronistorie_complete[0])
print(cronistorie_complete[1])


PRIME DUE CRONISTORIE:

CLAIM INCIDENT REPORT - Policy #521585

POLICYHOLDER PROFILE:
A 48-year-old customer who has been with the company for 328 months. 
The policy was bound on 2014-10-17 in OH state with 250/500 
coverage limits and a $1,000 deductible. Annual premium: $1,406.91.
No umbrella coverage.

INCIDENT DETAILS:
The incident involved a 2004 Saab 92x in zip code 466132.
Police report was available for this incident.

CLAIM BREAKDOWN:
This major claim totaled $71,610.00, consisting of:
- Injury claim: $6,510.00
- Property damage: $13,020.00
- Vehicle damage: $52,080.00

INVESTIGATION OUTCOME:
After thorough investigation, this claim was flagged as fraudulent.

---


CLAIM INCIDENT REPORT - Policy #342868

POLICYHOLDER PROFILE:
A 42-year-old customer who has been with the company for 228 months. 
The policy was bound on 2006-06-27 in IN state with 250/500 
coverage limits and a $2,000 deductible. Annual premium: $1,197.22.
This customer has umbrella coverage with a limit of $5

## Augmented Data Generation

### GPT-Powered Text Augmentation
Creating multiple variations of each insurance claim for:
- **Training Diversity**: Varied linguistic expressions
- **Model Robustness**: Different narrative styles
- **Data Enrichment**: Enhanced training corpus

### Prompt Engineering Strategy
- **System Prompt**: Professional insurance industry context
- **User Prompt**: Specific claim details with constraints
- **Output Format**: JSON with 3 distinct texts per claim
- **Quality Control**: Token length and relevance validation

**AUGMENTED DATA**

*creazione prompt*

In [8]:
import json
from datetime import datetime

def crea_descrizione_sinistro(row):
    """Genera una descrizione narrativa dettagliata per una riga del DataFrame"""
    policy_type = "standard" if row['umbrella_limit'] == 0 else "umbrella"
    
    bind_date = datetime.strptime(row['policy_bind_date'], '%Y-%m-%d')
    policy_age_years = (datetime.now() - bind_date).days // 365
    
    if row['total_claim_amount'] > 50000:
        severity = "major"
    elif row['total_claim_amount'] > 20000:
        severity = "moderate"
    else:
        severity = "minor"
    
    umbrella_text = f"This customer has umbrella coverage with a limit of ${row['umbrella_limit']:,}." if row['umbrella_limit'] > 0 else "No umbrella coverage."
    police_report_text = "Police report was available for this incident." if row['police_report_available'] == 'YES' else "Police report was not available for this incident."
    fraud_text = "After thorough investigation, this claim was flagged as fraudulent." if row['fraud_reported'] == 'Y' else "After thorough investigation, this claim was determined to be legitimate."
    
    descrizione = f"""CLAIM INCIDENT REPORT - Policy #{row['policy_number']}

POLICYHOLDER PROFILE:
A {row['age']}-year-old customer who has been with the company for {row['months_as_customer']} months. 
The policy was bound on {row['policy_bind_date']} in {row['policy_state']} state with {row['policy_csl']} 
coverage limits and a ${row['policy_deductable']:,} deductible. Annual premium: ${row['policy_annual_premium']:,.2f}.
{umbrella_text}

INCIDENT DETAILS:
The incident involved a {row['auto_year']} {row['auto_make']} {row['auto_model']} in zip code {row['insured_zip']}.
{police_report_text}

CLAIM BREAKDOWN:
This {severity} claim totaled ${row['total_claim_amount']:,.2f}, consisting of:
- Injury claim: ${row['injury_claim']:,.2f}
- Property damage: ${row['property_claim']:,.2f}
- Vehicle damage: ${row['vehicle_claim']:,.2f}

INVESTIGATION OUTCOME:
{fraud_text}"""
    
    return descrizione

def generate_insurance_claim_texts_prompts(df, output_file='prompts_insurance_claims.txt'):
    """
    Genera prompt per ChatGPT Mini per creare 3 testi distinti e originali in inglese 
    per ogni riga del dataset di sinistri assicurativi.

    Args:
        df (pandas.DataFrame): DataFrame contenente i dati dei sinistri
        output_file (str): Nome del file in cui salvare i prompt generati

    Returns:
        list: Lista dei prompt generati in formato JSON
    """
    # Prepara il system_prompt
    system_prompt = """You are an assistant specialized in creating detailed insurance claim narratives in English. Your task is to generate 3 distinct and original texts for each insurance claim, all strictly relevant to the claim data provided. The texts must be between 300 and 512 tokens in length, realistic, formally correct, and maintain a professional insurance industry tone. Each text should explore different aspects of the claim while maintaining strong thematic relevance to the insurance case. Return the response in JSON format according to this schema:
    {
      "texts": [
        {
          "text": "Generated text 1",
          "policy_number": "original_policy_number"
        },
        {
          "text": "Generated text 2",
          "policy_number": "original_policy_number"
        },
        {
          "text": "Generated text 3",
          "policy_number": "original_policy_number"
        }
      ]
    }
    """

    prompts = []

    for _, row in df.iterrows():
        policy_number = str(row.get('policy_number', ''))
        
        if not policy_number:
            print(f"Warning: Missing policy_number for row {row.name}")
            continue

        # Usa la funzione esistente per creare la descrizione dettagliata
        claim_description = crea_descrizione_sinistro(row)

        # Prepara lo user_prompt
        user_prompt = f"""Generate 3 distinct and original insurance claim narratives in English for policy number "{policy_number}", based on the following detailed claim report:

"{claim_description}"

The texts must:
1. Be relevant to policy number "{policy_number}"
2. Have a length between 300 and 512 tokens
3. Explore different aspects of the insurance claim maintaining a professional tone
4. Be realistic and formally correct in insurance industry language
5. Be completely original (do not copy parts of the claim report)

Provide only the JSON with the 3 texts, following the required schema."""

        # Formato compatibile con il codice di processing
        prompt_json = json.dumps([
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ], ensure_ascii=False)

        prompts.append(prompt_json)

    # Salva i prompt in formato JSON, uno per riga
    with open(output_file, 'w', encoding='utf-8') as f:
        for prompt in prompts:
            f.write(prompt + '\n')

    print(f"Created {len(prompts)} prompts for insurance claim text generation and saved to {output_file}")

    return prompts

# Esempio di utilizzo
if __name__ == "__main__":
    # Genera i prompt per tutto il dataset
    prompts = generate_insurance_claim_texts_prompts(df)


Created 1000 prompts for insurance claim text generation and saved to prompts_insurance_claims.txt


## Generated Prompt Example

### Prompt Structure
Each prompt contains:
- **System Message**: Role definition and output format specification
- **User Message**: Specific claim data and generation requirements
- **JSON Schema**: Structured response format for 3 text variations

In [9]:
print(prompts[0])

[{"role": "system", "content": "You are an assistant specialized in creating detailed insurance claim narratives in English. Your task is to generate 3 distinct and original texts for each insurance claim, all strictly relevant to the claim data provided. The texts must be between 300 and 512 tokens in length, realistic, formally correct, and maintain a professional insurance industry tone. Each text should explore different aspects of the claim while maintaining strong thematic relevance to the insurance case. Return the response in JSON format according to this schema:\n    {\n      \"texts\": [\n        {\n          \"text\": \"Generated text 1\",\n          \"policy_number\": \"original_policy_number\"\n        },\n        {\n          \"text\": \"Generated text 2\",\n          \"policy_number\": \"original_policy_number\"\n        },\n        {\n          \"text\": \"Generated text 3\",\n          \"policy_number\": \"original_policy_number\"\n        }\n      ]\n    }\n    "}, {"

## GPT API Integration

### Asynchronous Processing
- **Concurrent Requests**: Up to 40 simultaneous API calls
- **Progress Tracking**: Real-time progress monitoring with tqdm
- **Error Handling**: Comprehensive error management
- **Rate Limiting**: Semaphore-based request throttling

### API Configuration
- **Model**: GPT-4.1-nano for cost-effective generation
- **Temperature**: 0.7 for balanced creativity and consistency
- **Response Format**: JSON structured output

In [1]:
import asyncio
import openai
import json
from tqdm.notebook import tqdm

async def call_gpt(prompt_json, client):
    """Esegue una singola chiamata a GPT"""
    try:
        # Parsa il JSON delle messages
        messages = json.loads(prompt_json)

        response = await asyncio.to_thread(
            client.chat.completions.create,
            model="gpt-4.1-nano",
            messages=messages,  # Usa direttamente l'array di messaggi
            temperature=0.7
        )
        return {
            "prompt": prompt_json,
            "response": response.choices[0].message.content,
            "status": "success"
        }
    except Exception as e:
        return {
            "prompt": prompt_json,
            "response": f"Error: {str(e)}",
            "status": "error"
        }

async def process_prompts(api_key, input_file, output_file, max_concurrent=40):
    """Processa tutti i prompt dal file"""
    # Inizializza il client OpenAI
    client = openai.OpenAI(api_key=api_key)

    # Leggi i prompt
    with open(input_file, 'r', encoding='utf-8') as f:
        prompts = [line.strip() for line in f if line.strip()]

    # Crea un semaforo per limitare le richieste concorrenti
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded_call(prompt):
        async with sem:
            return await call_gpt(prompt, client)

    # Esegui le richieste con progress bar
    tasks = [bounded_call(prompt) for prompt in prompts]
    results = []

    for coro in tqdm(asyncio.as_completed(tasks), total=len(tasks)):
        result = await coro
        results.append(result)

    # Salva i risultati
    with open(output_file, 'w', encoding='utf-8') as f:
        json.dump(results, f, ensure_ascii=False, indent=2)

    return results

async def main():
    # Configurazione
    API_KEY = "sk-....OPENAI_KEY"  # Sostituisci con la tua chiave API

    # Percorsi dei file
    INPUT_FILE = "prompts_insurance_claims.txt"  # Il file generato dalla funzione precedente
    OUTPUT_FILE = "risposte_testi_simili.json"

    # Numero massimo di richieste concorrenti
    MAX_CONCURRENT = 40

    # Esegui il processo
    results = await process_prompts(API_KEY, INPUT_FILE, OUTPUT_FILE, MAX_CONCURRENT)
    print(f"Completato! Risultati salvati in {OUTPUT_FILE}")

    # Mostra un esempio di risposta se disponibile
    if results and len(results) > 0:
        try:
            first_response = json.loads(results[0]["response"])
            print("\nEsempio di risposta:")
            print(f"Testo 1 generato per la keyword: {first_response['testi'][0]['keyword']}")
            print(first_response['testi'][0]['testo'][:150] + "...")
        except:
            print("\nNon è stato possibile analizzare la prima risposta.")

    return results

# Per Jupyter Notebook, esegui:
results = await main()

  0%|          | 0/1000 [00:00<?, ?it/s]

Completato! Risultati salvati in risposte_testi_simili.json

Esempio di risposta:

Non è stato possibile analizzare la prima risposta.


## Response Processing and DataFrame Creation

### Data Extraction Pipeline
Processing GPT responses into structured format:
- **JSON Parsing**: Extract generated texts from API responses
- **Data Validation**: Verify response structure and content
- **DataFrame Creation**: Organize data for analysis and export
- **Quality Control**: Filter successful responses and handle errors

### Output Structure
Final dataset contains:
- **policy_number**: Original policy identifier
- **original_claim_report**: Source claim narrative
- **generated_text**: GPT-generated variation
- **text_index**: Text variation number (1-3)

In [2]:
import pandas as pd
import json
import re

def create_insurance_claims_dataframe_from_responses(json_file, output_csv='insurance_claims_generated_texts.csv'):
    """
    Crea un DataFrame che contiene policy_number, claim report originale e testi generati

    Args:
        json_file (str): Percorso del file JSON contenente le risposte
        output_csv (str): Nome del file CSV di output

    Returns:
        pandas.DataFrame: DataFrame con i dati estratti
    """
    # Leggi il file JSON
    with open(json_file, 'r', encoding='utf-8') as f:
        try:
            data = json.load(f)
            print(f"JSON file loaded successfully. Contains {len(data)} elements.")
        except json.JSONDecodeError as e:
            print(f"Error loading JSON: {e}")
            f.seek(0)
            content = f.read()
            print(f"File content length: {len(content)} characters")
            print(f"First 200 characters: '{content[:200]}'")
            return pd.DataFrame()

    extracted_data = []

    for item in data:
        try:
            # Verifica il formato dei dati di risposta
            if isinstance(item, dict) and 'response' in item and 'prompt' in item:
                if item.get('status', '') == 'success':
                    response_str = item['response']
                    prompt_str = item['prompt']

                    # Verifica che la risposta non sia vuota
                    if not response_str or response_str.isspace():
                        continue

                    # Rimuovi eventuali caratteri di codifica markdown
                    response_str = re.sub(r'``````', '', response_str)

                    try:
                        response_json = json.loads(response_str)
                    except json.JSONDecodeError:
                        continue

                    # Estrai i testi generati (ora si chiamano 'texts' invece di 'testi')
                    if 'texts' in response_json and isinstance(response_json['texts'], list):
                        texts = response_json['texts']

                        # Estrai policy_number e claim report originale dal prompt
                        policy_number_originale = ""
                        claim_report_originale = ""

                        try:
                            prompt_list = json.loads(prompt_str)

                            if isinstance(prompt_list, list):
                                for msg in prompt_list:
                                    if msg.get('role') == 'user':
                                        input_text = msg.get('content', '')

                                        # Estrai il policy number
                                        policy_match = re.search(r'policy number "([^"]+)"', input_text)
                                        if policy_match:
                                            policy_number_originale = policy_match.group(1)

                                        # Estrai il claim report originale
                                        report_match = re.search(r'following detailed claim report:\s*"([^"]+)"', input_text, re.DOTALL)
                                        if report_match:
                                            claim_report_originale = report_match.group(1)
                        except:
                            pass

                        # Aggiungi ogni testo generato al dataset
                        for i, text_item in enumerate(texts):
                            text = text_item.get('text', '')
                            policy_number = text_item.get('policy_number', policy_number_originale)

                            extracted_data.append({
                                'policy_number': policy_number,
                                'original_claim_report': claim_report_originale,
                                'generated_text': text,
                                'text_index': i+1
                            })
        except Exception as e:
            print(f"Error processing item: {str(e)}")
            continue

    # Crea il DataFrame
    df = pd.DataFrame(extracted_data)

    if len(df) == 0:
        print("No data extracted from JSON file. Check file format.")
        return df

    # Statistiche
    print(f"\nTotal generated texts: {len(df)}")
    print(f"Number of unique policies: {df['policy_number'].nunique()}")
    print(f"Texts per policy: {df.groupby('policy_number').size().to_dict()}")

    # Salva in CSV
    df.to_csv(output_csv, index=False, encoding='utf-8')
    print(f"\nDataset saved to '{output_csv}'")

    return df

# Esempio di utilizzo
if __name__ == "__main__":
    df = create_insurance_claims_dataframe_from_responses('risposte_testi_simili.json')

JSON file loaded successfully. Contains 1000 elements.

Total generated texts: 2769
Number of unique policies: 923
Texts per policy: {'101421': 3, '104594': 3, '106186': 3, '106873': 3, '107181': 3, '108270': 3, '108844': 3, '109392': 3, '110084': 3, '110122': 3, '110408': 3, '111874': 3, '113464': 3, '113516': 3, '114839': 3, '115399': 3, '116645': 3, '116700': 3, '116735': 3, '117862': 3, '118137': 3, '118236': 3, '119513': 3, '120485': 3, '121439': 3, '125324': 3, '125591': 3, '127754': 3, '129872': 3, '130156': 3, '130930': 3, '131478': 3, '132871': 3, '132902': 3, '133889': 3, '134430': 3, '135400': 3, '136520': 3, '137675': 3, '139484': 3, '139872': 3, '140977': 3, '143038': 3, '143109': 3, '143626': 3, '143924': 3, '143972': 3, '144323': 3, '146138': 3, '148498': 3, '149367': 3, '149601': 3, '149839': 3, '150181': 3, '153027': 3, '153154': 3, '153298': 3, '154280': 3, '154982': 3, '155604': 3, '155724': 3, '155912': 3, '156694': 3, '159243': 3, '159536': 3, '159768': 3, '162004'

## Generated Dataset Overview

### Dataset Statistics
Review the final augmented dataset:

In [3]:
df

Unnamed: 0,policy_number,original_claim_report,generated_text,text_index
0,115399,CLAIM INCIDENT REPORT - Policy #115399\n\nPOLI...,The insurance claim associated with policy num...,1
1,115399,CLAIM INCIDENT REPORT - Policy #115399\n\nPOLI...,"In reference to policy number 115399, the rece...",2
2,115399,CLAIM INCIDENT REPORT - Policy #115399\n\nPOLI...,The case concerning policy number 115399 perta...,3
3,832746,CLAIM INCIDENT REPORT - Policy #832746\n\nPOLI...,The insurance claim associated with policy num...,1
4,832746,CLAIM INCIDENT REPORT - Policy #832746\n\nPOLI...,"In relation to policy number 832746, the incid...",2
...,...,...,...,...
2764,431583,CLAIM INCIDENT REPORT - Policy #431583\n\nPOLI...,The insurance policy numbered 431583 covers a ...,2
2765,431583,CLAIM INCIDENT REPORT - Policy #431583\n\nPOLI...,This insurance claim under policy number 43158...,3
2766,221186,CLAIM INCIDENT REPORT - Policy #221186\n\nPOLI...,"This claim pertains to policy number 221186, w...",1
2767,221186,CLAIM INCIDENT REPORT - Policy #221186\n\nPOLI...,Policy number 221186 covers a long-term client...,2


## Final Dataset Export

### Export Configuration
- **Format**: CSV with semicolon separator (European standard)
- **Encoding**: UTF-8 for international compatibility
- **Filename**: `DB_pro_embedder.csv` for embedding training pipeline

In [4]:
# salvo D come DB_finale_pro_CONCEPT.csv
df.to_csv('DB_pro_embedder.csv', index=False,sep=';')

## Results Summary

### Generated Output
The process successfully created:
- **2,769 generated texts** from 923 unique insurance policies
- **3 variations per policy** (average) for diverse training data
- **Professional quality** insurance narratives suitable for:
  - **Embedding Training**: Vector representation learning
  - **Fine-tuning**: Language model specialization
  - **Text Classification**: Fraud detection models
  - **Semantic Search**: Claims similarity matching

### Applications
This augmented dataset enables:
- **Insurance NLP Models**: Domain-specific language understanding
- **Fraud Detection**: Enhanced pattern recognition
- **Claims Processing**: Automated narrative analysis
- **Risk Assessment**: Text-based risk profiling