# When and How to Use Extended Thinking

In this notebook, we'll dive deeper into Claude 3.7 Sonnet's extended thinking capability, exploring:

1. A task complexity classification framework
2. A decision tree for when to use extended thinking
3. Examples of appropriate use cases vs. cases where it's unnecessary
4. Performance benchmarking on different task types
5. Cost implications and optimization strategies

By the end, you'll have a systematic approach to determine when extended thinking is beneficial and how to optimize its use for your specific applications.

> **Note**: In this lesson, we're using the utility functions we developed in Lesson 1. The `claude_utils.py` module contains helper functions for creating Bedrock clients, invoking Claude with or without extended thinking, and displaying responses. This allows us to focus on the core concepts of when and how to use extended thinking rather than repeating boilerplate code.
>
> If you haven't completed Lesson 1 yet, you may want to review it first to understand how these utility functions work. Alternatively, you can examine the `claude_utils.py` file directly to see the implementation details.

In [1]:
# Import required libraries
import boto3
import json
import time
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from IPython.display import display, Markdown, HTML
import claude_utils

# Configure plot styling
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_context("notebook", font_scale=1.2)
pd.set_option('display.max_colwidth', None)

In [20]:
# Set up the Bedrock clients using our utility module
REGION = 'us-east-2'  # Change to your preferred region
PROFILE_NAME = 'par_servicios'
bedrock, bedrock_runtime = claude_utils.create_bedrock_clients(REGION, PROFILE_NAME)

# Model ID used in this lesson. The variable keeps its Lesson 1 name, but the ID below points to Claude Sonnet 4.
CLAUDE_37_SONNET_MODEL_ID = 'us.anthropic.claude-sonnet-4-20250514-v1:0'

# Verify model availability
claude_utils.verify_model_availability(bedrock, CLAUDE_37_SONNET_MODEL_ID)

Model anthropic.claude-sonnet-4-20250514-v1:0 CRIS endpoint is available


True

In [39]:
with open("259_CA_2020-02-29.pdf", "rb") as pdf_file:
    pdf_bytes = pdf_file.read()

messages = [{
    "role": "user",
    "content": [
        {
            "document": {
                "name": "My Document",
                "format": "pdf",
                "source": {
                    "bytes": pdf_bytes  # Raw bytes, not base64!
                },
                "citations": {"enabled": True}  # Not accepted by the boto3 version used here (see the error below)
            }
        },
        {"text": "Analyze this document"}
    ]
}]

response = bedrock_runtime.converse(
    modelId="us.anthropic.claude-sonnet-4-20250514-v1:0",
    messages=messages,
    inferenceConfig={"maxTokens": 500}
)

ParamValidationError: Parameter validation failed:
Unknown parameter in messages[0].content[0].document: "citations", must be one of: format, name, source
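
The validation error above lists the only keys this boto3 version accepts in a document block: `format`, `name`, and `source`. As a minimal sketch, assuming the same Converse message shape used in the cell above, the message can be built with just those three keys. The helper name `build_pdf_message` is hypothetical and not part of `claude_utils`.

```python
def build_pdf_message(pdf_bytes: bytes, prompt: str, name: str = "document") -> list:
    """Build a Converse `messages` list with one PDF document block and one text block.

    Only the keys named in the validation error (format, name, source) are used.
    """
    if not pdf_bytes:
        raise ValueError("pdf_bytes is empty")

    document_block = {
        "name": name,
        "format": "pdf",
        "source": {"bytes": pdf_bytes},  # raw bytes, as in the original cell
    }
    return [
        {
            "role": "user",
            "content": [
                {"document": document_block},
                {"text": prompt},
            ],
        }
    ]


# Placeholder bytes for illustration; in the notebook these come from the PDF file.
messages = build_pdf_message(b"%PDF-1.4 example bytes", "Analyze this document")
```

The resulting `messages` list can be passed to `bedrock_runtime.converse(...)` in place of the hand-written one.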

## 1. Task Complexity Classification Framework

To determine when extended thinking is beneficial, we first need a framework to classify task complexity. This will help us make systematic decisions about when to use extended thinking and how much reasoning budget to allocate.

Our framework classifies tasks into four levels of complexity:

1. **Simple**: Straightforward factual queries, basic information retrieval, simple calculations
2. **Medium**: Multi-step reasoning, moderate math problems, basic analysis tasks
3. **Complex**: In-depth analysis, complex reasoning chains, constraint problems
4. **Very Complex**: Systems design, advanced mathematical proofs, multi-stage problem solving

Let's implement a classifier function that can automatically categorize tasks based on their complexity.

In [21]:
def classify_task_complexity(prompt, model_id='us.anthropic.claude-sonnet-4-20250514-v1:0'):
    """
    Classify the complexity of a task with a single short model call.

    Args:
        prompt (str): The user prompt to classify
        model_id (str): The model ID to use for classification. The default is the
            Claude Sonnet 4 ID used elsewhere in this notebook; pass a smaller
            model's ID (for example Claude 3.5 Haiku) for speed and cost.

    Returns:
        str: Complexity classification ('simple', 'medium', 'complex', or 'very_complex')
    """
    system_prompt = [
        {
            "text": """You are a task complexity classifier. Classify the complexity of the given task into one of these categories: 'simple', 'medium', 'complex', or 'very_complex'. 

Here are examples of each complexity level:
- simple: "What is the capital of France?", "Calculate 25% of 80", "Summarize this short paragraph in one sentence"
- medium: "A man has 53 socks in his drawer: 21 blue, 15 black and 17 red. How many socks must he take out to guarantee a black pair?", "Explain the greenhouse effect and its impact on climate"
- complex: "Design a ride-sharing service that optimizes for driver availability and route efficiency", "Analyze the causes and economic impacts of the 2008 financial crisis"
- very_complex: "Given a graph with n vertices and m edges, design an O(n+m) algorithm to find all bridges", "Design a quantum computing algorithm to solve the traveling salesman problem"

Respond with only the category name, nothing else."""
        }
    ]

    messages = [
        {
            "role": "user",
            "content": [
                {
                    "text": f"Classify the complexity of this task: {prompt}"
                }
            ]
        }
    ]

    try:
        start_time = time.time()
        response = bedrock_runtime.converse(
            modelId=model_id,
            messages=messages,
            system=system_prompt,
            inferenceConfig={
                "temperature": 0,
                "maxTokens": 10  # We only need a short response
            }
        )
        elapsed_time = time.time() - start_time

        # Extract the classification
        result = None
        if response.get('output', {}).get('message', {}).get('content'):
            content_blocks = response['output']['message']['content']
            for block in content_blocks:
                if 'text' in block:
                    result = block['text'].strip().lower()
                    break

        # Ensure the result is one of our expected categories
        valid_categories = ['simple', 'medium', 'complex', 'very_complex']
        if result not in valid_categories:
            result = 'medium'  # Default to medium if unexpected response

        # Calculate approximate cost
        tokens = response['usage']['totalTokens']
        cost = tokens * 0.00000125  # Flat $0.00125 per 1K tokens, a Haiku-based assumption; adjust for the model in model_id
        
        print(f"Classification: {result} (determined in {elapsed_time:.2f}s, {tokens} tokens, ${cost:.6f})")
        return result

    except Exception as e:
        print(f"Error classifying task complexity: {e}")
        return "medium"  # Default to medium complexity if there's an error
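
The function above only accepts a reply that matches a category name exactly after `strip().lower()`, so a reply such as `"Very complex."` would fall back to `medium`. A minimal sketch of a more tolerant normalization step follows; the helper `normalize_category` is hypothetical and is not part of the original function.

```python
import re

VALID_CATEGORIES = ("simple", "medium", "complex", "very_complex")


def normalize_category(raw, default="medium"):
    """Map a raw model reply onto one of the four category names.

    Lowercases the reply, turns every run of non-letters into an underscore,
    and trims underscores at both ends. Anything that still does not match
    a known category falls back to `default`.
    """
    if not raw:
        return default
    cleaned = re.sub(r"[^a-z]+", "_", raw.strip().lower()).strip("_")
    return cleaned if cleaned in VALID_CATEGORIES else default
```

Inside `classify_task_complexity`, this could replace the exact-match check on `result`; replies with extra words still default to `medium`, which matches the original fallback behavior.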

### Understanding the Task Complexity Classifier

The `classify_task_complexity` function is an efficient way to automatically categorize the complexity of user prompts. Here's how it works:

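1. **Leveraging a smaller model**: The classifier is intended to run on a smaller model such as **Claude 3.5 Haiku**, which is faster and more cost-effective for this simple decision task. Note that the default `model_id` in the code above is the Claude Sonnet 4 ID, so pass a Haiku model ID explicitly to get that benefit.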

2. **Classification framework**: The function sends the prompt to Claude with explicit instructions defining four complexity categories (simple, medium, complex, very_complex) along with examples of each.

3. **Efficiency considerations**: The function is optimized for speed and cost by:
   - Using a smaller model **(Claude 3.5 Haiku)** when its ID is passed as `model_id`
   - Setting temperature to 0 for deterministic responses
   - Limiting the output to just 10 tokens
   - Requesting only the category name

4. **Practical application**: Think of this as the "triage" step in our workflow - similar to how a CPU scheduler determines how much processing time to allocate to different tasks. This initial assessment helps us allocate the appropriate "thinking resources" to the task at hand.

This approach creates a "thinking pipeline" where we efficiently allocate Claude's reasoning capabilities based on task demands - using the right tool for each stage of the process.
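
To make the "allocate thinking resources" idea concrete, here is a minimal sketch that maps each complexity label to a thinking budget and builds the matching keyword arguments for a Converse call. The budget values are illustrative assumptions, not measured recommendations, and the `thinking` field shape is assumed from the Anthropic request format; check it against the Bedrock documentation and against what `claude_utils` already does before relying on it.

```python
# Illustrative budgets per complexity level (assumed values, tune for your workload).
THINKING_BUDGETS = {
    "simple": 0,          # no extended thinking
    "medium": 1024,
    "complex": 4096,
    "very_complex": 16000,
}


def build_thinking_kwargs(complexity, max_answer_tokens=1000):
    """Return keyword arguments for a Converse call, sized by task complexity.

    Unknown labels fall back to the 'medium' budget. When thinking is enabled,
    maxTokens is raised so the answer still has room after the thinking budget.
    """
    budget = THINKING_BUDGETS.get(complexity, THINKING_BUDGETS["medium"])
    kwargs = {"inferenceConfig": {"maxTokens": max_answer_tokens}}
    if budget > 0:
        kwargs["inferenceConfig"]["maxTokens"] = budget + max_answer_tokens
        # Assumed request shape for extended thinking; verify before use.
        kwargs["additionalModelRequestFields"] = {
            "thinking": {"type": "enabled", "budget_tokens": budget}
        }
    return kwargs
```

A call would then look like `bedrock_runtime.converse(modelId=..., messages=..., **build_thinking_kwargs(complexity))`, with the label coming from `classify_task_complexity`.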

In [30]:
task='''You are a precise legal document classification specialist. Your task is to analyze the provided PDF document and classify it into one of the predefined legal document categories.

<context>
- Documents may be in Spanish, Portuguese, English, or other languages (primarily Spanish and Portuguese)
- Document formats and structures vary significantly even within the same category
- Focus on content and meaning, not specific layouts or formatting
- Identify semantic equivalences across languages without translating (e.g., "Identity Card" = "Cédula", "Shareholders" = "Accionistas")
- Return all text you can extract, even if partial or incomplete
- **OCR Tolerance**: Recognize minor orthographic variants or OCR deformations as long as the term is semantically evident (e.g., "CÂMARA DE COMMERCIO", "CEDVLA", "DLAN" for "DIAN")
</context>

<classification_workflow>
Follow this exact order of evaluation:

1. **TEXT EXTRACTION**
   - Extract all readable text from the document
   - If document is scanned/image-only, return whatever text you can identify
   - Never invent or hallucinate text that isn't visible
   - Include all text even if partially illegible or fragmented
   - **OCR Tolerance**: Accept minor variations in key terms due to OCR errors or font issues

2. **PRIORITY CHECKS (Override all other categories)**
   - BLANK: Check if document contains no meaningful content (empty, whitespace only, or minimal meaningless text)
   - LINK_ONLY: Check if document's primary content is just hyperlinks/URLs redirecting elsewhere

3. **CATEGORY EVALUATION**
   - CECRL: Personal identification documents (For individual ID docs)
   - RUT: Tax registration documents (requires DIAN or tax-specific terminology)
   - CERL: Corporate certificates (requires chamber of commerce or existence certificate indicators)
   - ACC: Shareholder composition (Focus on shareholding structure with capital percentages)
   - RUB: Beneficial owners registry (Focus on beneficial ownership for compliance purposes)

4. **CONFIDENCE ASSESSMENT**
   - Strong evidence: Clear must-have indicators present, single category dominance
   - Weak evidence: Missing key indicators, conflicting signals, or unclear purpose
   - If weak evidence → Use POR_REVISAR

5. **FALLBACK RULE**
   Use POR_REVISAR when:
   - Conflicting signals from multiple categories with similar strength
   - Document is incomplete, illegible, or heavily damaged
   - Mixed content from various categories without clear predominance
   - Uncertain about document's primary legal function
</classification_workflow>

<category_definitions>

<category name="BLANK">
**Must-have indicators:** Document is empty, contains only whitespace, or minimal meaningless text
**Red flags:** Any substantial readable content
</category>

<category name="LINK_ONLY">
**Must-have indicators:** Primary content consists of hyperlinks/URLs directing to external sources
**Red flags:** Substantial document content beyond links
**Example:** "shareholder information available at: https://website.com/shareholders"
</category>

<category name="CECRL">
**Must-have indicators:**
- "Cédula", "Identificación personal", "Identity Card", "Passport", "Cartão de Identidade"
- Personal data fields: blood type, gender, birth date/place, photo references
- Individual identification numbers with personal names
**Red flags:** Corporate entities, company names, business activities
**Example:** "REPUBLICA DE COLOMBIA IDENTIFICACION PERSONAL CEDULA DE CIUDADANIA NUMERO 88.199.170 FUENTES LEON"
</category>

<category name="RUT">
**Must-have indicators:**
- "DIAN", "Registro Único Tributario", "Tax Registration"
- Tax classification codes, economic activities, tax responsibilities
**Red flags:** No tax-related terminology
**Example:** "DIAN Formulario del Registro Único Tributario... Razón social VR INGENIERIA"
</category>

<category name="CERL">
**Must-have indicators:**
- "Cámara de Comercio", "Certificate of Existence", "Certificado de Existência"
- "Matrícula", corporate registration, legal representation certificates
**Red flags:** Pure tax documents, personal IDs, shareholder lists without existence certificate
**Example:** "CÁMARA DE COMERCIO DE BOGOTÁ CERTIFICADO DE EXISTENCIA... NIT: 800035887-3"
</category>

<category name="ACC">
**Must-have indicators:**
- Shareholder/ownership terminology: "Accionistas", "Shareholders", "participación"
- Ownership percentages or share capital details
**Red flags:** No ownership percentages, pure corporate certificates without ownership details
**Example:** "Mauricio Obregón Gutiérrez 79.425.769 GERENTE 80% C.C"
</category>

<category name="RUB">
**Must-have indicators:**
- "Registro Único de Beneficiarios", "beneficial owners", "beneficiários"
- Explicit beneficiary listings with ownership details
**Red flags:** General shareholder info without beneficiary context
</category>

<category name="POR_REVISAR">
**Use when:**
- Conflicting evidence from multiple categories with similar strength
- Document incomplete, illegible, or missing critical information
- Mixed content without clear predominant category
- Low confidence in any classification
</category>

</category_definitions>

<output_requirements>
Return exactly one JSON object with proper character escaping:

```json
{
  "category": "CATEGORY_NAME",
  "text": "complete extracted text with proper JSON escaping"
}
```

**Critical requirements:**
- Use exact category names: CERL, CECRL, RUT, RUB, ACC, BLANK, LINK_ONLY, POR_REVISAR
- Properly escape special characters in "text" field (quotes, newlines, backslashes)
- Include ALL extracted text, even if empty (BLANK should return `"text": ""`)
- Return only the JSON object with no additional commentary
- Ensure valid JSON formatting
- When in doubt, prefer POR_REVISAR over incorrect classification
</output_requirements>

<task>
Analyze the attached PDF document and classify it according to the established workflow.
</task>

<instructions>
<text_extraction>
Extract all readable text from the document (include partial text and tolerate OCR errors).
</text_extraction>

<evaluation_order>
1. First check: BLANK (empty/meaningless content)
2. Second check: LINK_ONLY (primarily hyperlinks)
3. Then evaluate: CECRL, RUT, CERL, ACC, RUB based on content indicators
4. If uncertain or conflicting evidence → POR_REVISAR
</evaluation_order>

<output_format>
Return ONLY a valid JSON object with this exact structure:
{
  "category": "CATEGORY_NAME",
  "text": "complete extracted text with proper JSON escaping"
}
</output_format>
</instructions>

<critical_requirements>
- Use EXACT category names: CERL, CECRL, RUT, RUB, ACC, BLANK, LINK_ONLY, POR_REVISAR
- For BLANK category: "text" field must be empty string ""
- Properly escape JSON special characters (quotes, newlines, backslashes)
- NO explanations, comments, or additional content outside the JSON
- When in doubt, prefer POR_REVISAR over incorrect classification
</critical_requirements>

'''



complexity = classify_task_complexity(prompt=task)




Classification: complex (determined in 22.53s, 2041 tokens, $0.002551)
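
The cost printed above comes from one flat per-token rate applied to `totalTokens`. Since input and output tokens are usually priced differently, a slightly more careful estimate can use the `inputTokens` and `outputTokens` fields of the Converse `usage` block. The sketch below takes both rates as arguments; the rates in the example are placeholders, not actual prices for any model.

```python
def estimate_cost(usage, input_rate_per_1k, output_rate_per_1k):
    """Estimate request cost in dollars from a Converse `usage` dict.

    Rates are dollars per 1,000 tokens and must be supplied by the caller.
    """
    input_tokens = usage.get("inputTokens", 0)
    output_tokens = usage.get("outputTokens", 0)
    return (input_tokens / 1000) * input_rate_per_1k + (output_tokens / 1000) * output_rate_per_1k


# Placeholder usage and rates, for illustration only.
example_usage = {"inputTokens": 2000, "outputTokens": 10, "totalTokens": 2010}
cost = estimate_cost(example_usage, input_rate_per_1k=0.001, output_rate_per_1k=0.005)
print(f"Estimated cost: ${cost:.6f}")
```

In `classify_task_complexity`, `response['usage']` could be passed in directly once the rates for the chosen `model_id` are known.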


In [33]:
task='''You are a support specialist for reading and extracting data from legal documents, specifically Equity Composition documents (category ACC).

The input will be a PDF document in bytes format. Your task is to extract structured information according to the JSON FINAL schema.

# Language Handling
- Documents are primarily in Spanish or Portuguese
- Maintain original language terms in extracted data (e.g. "Representante Legal")
- Extract and structure data according to the schema regardless of document language

1) Country Identification (Multi-Layer Approach):
    a. Explicit References First:
      - Search for explicit country or city names in the document (e.g., "Bogotá", "Medellín", "Colombia")
      - Look for headers, footers, or official stamps that may contain country information
      - Caution: if a country or city name appears only within a company name (e.g., “Constructora Perú S.A.S” or “Colombia Coffee Export”), do not automatically assume this is the actual country of the document. Require additional evidence before assigning it.
    b. ID-Based Inference:
      - If no explicit country/city references are found, infer country based on tax-ID or document ID nomenclature
      - Use the COMPANY_ID and PERSON_ID hash tables to determine the country (e.g., "NIT" and "CC" → Colombia).
    c. Domain-Based Inference
      - Identify web domains mentioned in the document.
      - Use the domain extension to infer the country (e.g., .co → Colombia, .pe → Peru, .mx → Mexico).
      - Prioritize official or institutional domains (e.g., .gov.co, .gob.pe) over commercial ones.
    d. Fallback Case
      - If none of the above criteria (explicit references, domains, IDs) provide sufficient evidence, assign: <country>="POR REVISAR"

    Output Requirements
      - Store the identified country in <country>.
      - Store the reasoning and evidence used for inference in identificationDetails field.
    
    identificationDetails field must include:
        - source: either "text_reference" or "id_analysis".
        - indicators: evidence supporting the chosen country.
        - conflictingSources: evidence suggesting a different country.
        - requiresReview:
            * true if <country> cannot be inferred, OR if any confidence value in the "confidenceScores" field is below 70, OR if any extracted field value equals "POR REVISAR".
            * false otherwise.

    Hash tables:
        ```text
        COMPANY_ID = {
          "Argentina":["CUIT"],"Bolivia":["NIT"],"Brasil":["CPF","CNPJ"],"Chile":["RUT"],
          "Colombia":["NIT"],"Costa Rica":["CIF"],"Cuba":["NIF"],"Ecuador":["RUC"],"El Salvador":["NIT"],
          "España":["NIF"],"Guatemala":["NIT"],"Honduras":["RTN"],"México":["RFC"],"Nicaragua":["RUC"],
          "Panamá":["RUC"],"Paraguay":["RUC"],"Perú":["RUC"],"Portugal":["NIF","NIPC"],"Puerto Rico":["EIN"],"República Dominicana":["RNC"],
          "Uruguay":["RUT"],"Venezuela":["RIF"]
        }
        PERSON_ID = {
          "Brasil":["CI"],"Bolivia":["CI"],"Chile":["CI"],"Costa Rica":["CI"],"Ecuador":["CI"],
          "Nicaragua":["CI"],"Uruguay":["CI"],"Venezuela":["CI"],"Argentina":["DNI"],
          "España":["DNI"],"Honduras":["DNI"],"Perú":["DNI"],"Colombia":["CC","TI","CE"],
          "Cuba":["CI"],"México":["CURP","CRIP"],"Guatemala":["DPI"],
          "El Salvador":["DUI"],"República Dominicana":["CIE"],
          "Paraguay":["CIC","CI"],"Panamá":["CIP"],"Portugal":["CC"],"Puerto Rico":["ID","LC"]
        }
        ```
2) Root-Level main reporting Company fields Extraction
The root level represents the main reporting company (the entity issuing the equity composition document), NOT any of its shareholders or related parties. These fields must be extracted with the highest priority and accuracy.
    
    a. CompanyName 
      - Primary sources (in order of preference):
        1. Document headers containing phrases like "CERTIFICACIÓN DE COMPOSICIÓN ACCIONARIA DE [COMPANY NAME]"
        2. Official letterhead or stamps
        3. Fields explicitly labeled as "Razón Social", "Denominación Social", "Empresa", or "Sociedad"
        4. The entity referenced as the issuer/certifier of the document
      
      - Extraction rules:
        * Preserve the complete legal name including all suffixes (S.A., S.A.S, S.A.C, Ltda, etc.)
        * Remove any extraneous text like "La empresa", "La sociedad", "La compañía"
        * If the name appears in multiple places with slight variations, use the most complete version
        * Normalize excessive spacing but preserve intentional formatting
      
      - Validation:
        * The companyName of the main reporting company can match with the companyName of a CompanyRelatedParty
        * If no clear companyName is found for the main reporting company, or if it is ambiguous → set companyName = "POR REVISAR"

    b. Tax Identification (taxId)
      - Primary sources:
        1. Fields labeled as "NIT", "RUC", "RUT", "CUIT", "RFC", or other tax ID markers based on country
        2. Often appears near the company name in headers
        3. May be found in certification statements like "con NIT [number]"
      
      - Format rules:
        * Preserve the original format including separators (dots, hyphens, slashes)
        * Common patterns by country:
            ** Colombia: XXX.XXX.XXX-X or XXXXXXXXX-X
            ** Peru: XXXXXXXXXXX 
            ** Mexico: XXX-XXXXXX-XXX
            ** Chile: XX.XXX.XXX-X
        * Do NOT include verification digits if they appear separately (e.g., "NIT: 900123456 DV: 7" → taxId = "900123456")
      - Validation:
        * Must correspond to COMPANY_ID patterns for the identified country
        * If no tax ID found or unclear → set taxId = "POR REVISAR"
    c. Cross-validation requirements:
      - The taxId type should align with the country identification:
        * Example: If taxId starts with "NIT" and country = "Colombia" ✓ Valid
        * Example: If taxId starts with "RUC" and country = "Colombia" ✗ Requires review
      - If misalignment detected → set identificationDetails.conflictingSources and requiresReview = true
    d. Special cases and error handling:
      - Multiple companies mentioned at root level:
        * Identify the issuing/reporting company (usually the one certifying the information)
        * Other companies should be in relatedParties, not at root level
    
      - Holding company structures:
        * Extract the immediate parent company as root level
        * Subsidiaries or parent companies of the parent go in relatedParties
    
      - Missing or illegible information:
        * Always use "POR REVISAR" as placeholder
        * Never leave fields empty or null
    e. Examples of correct extraction:
      - Document header: "CERTIFICACIÓN COMPOSICIÓN ACCIONARIA EMPRESA XYZ S.A.S NIT 900.123.456-7"
        → companyName: "EMPRESA XYZ S.A.S"
        → taxId: "900.123.456-7"
      
      - Formal letter: "Por medio de la presente, INVERSIONES ABC S.A., identificada con NIT 860.000.123-4..."
        → companyName: "INVERSIONES ABC S.A."
        → taxId: "860.000.123-4"


3) Related Parties Extraction and Classification:

A relatedParty represents an individual or company entity that has a relationship with the main reporting company, such as shareholders, representatives, or other stakeholders.
For each relatedParty detected in the document, classify it as either a PersonRelatedParty or a CompanyRelatedParty according to the following rules:

  a. General Identification Principle
      - Let cty = detected country.
      - Every relatedParty must include identificationNumber and participationPercentage.
      - The identificationNumber may be located in different ways depending on the document format:
          * As the value of a column whose header is "Documento Identificación", "CC o NIT", "Identificación", "C.C", or related variations.
          * Placed next to the person or company name, typically preceded by markers like "C.C", "NIT", "DNI", "ID", etc.
      - Identify the associated identificationType by checking the column/marker name.
      - The identificationType must correspond to a valid entry in either PERSON_ID[country] for individuals or COMPANY_ID[country] for companies.
      - Set participationPercentage according to subsection d. Participation Percentage Rules
      - Set timeFound according to subsection e. Repeated Entity Handling and TimeFound Rules
  b. Person Detection Rules
  If the relatedParty clearly contains a personal name (e.g., "Juan Pérez") and does not include company markers ("S.A.C", "S.A", "Ltda", "Inc", "Corp", etc.):
      - Set firstName and lastName according to subsection c. Name Splitting Rules
      - The identificationType must always belong to PERSON_ID[cty].
      - If the column/marker name is not in PERSON_ID[cty], then force the identificationType to the value in PERSON_ID[cty].
      - If multiple possible values exist in PERSON_ID[cty], apply the following rules:
          * For Colombia, apply these ID type rules (after removing all . from the identificationNumber):
              ** Choose "CC" if identificationNumber matches /^[1-9][0-9]{3,9}$/.
              ** Choose "CE" if identificationNumber matches /^([a-zA-Z]{1,5})?[1-9][0-9]{3,7}$/.
              ** Choose "TI" if identificationNumber matches /^[1-9][0-9]{4,11}$/.
              ** If none of the above rules match, set identificationType = "POR REVISAR".
          * For México and any other country with multiple possible PERSON_ID[cty] values:
              ** Set identificationType = "POR REVISAR".

  If the relatedParty name is ambiguous (not clearly a person nor clearly a company) and does not contain company markers (S.A.C, S.A, Ltda, etc.):
      - Verify if identificationType belongs to PERSON_ID[cty]:
          * If yes → execute the rules under "If the relatedParty clearly contains a personal name".
          * If no → set identificationType = "POR REVISAR" and identificationNumber = "POR REVISAR" and firstName = "POR REVISAR" and lastName = "POR REVISAR".

  Emit a PersonRelatedParty object with the following schema:

  ```json
  {
    "firstName": firstName,
    "lastName":  lastName,
    "identificationType": identificationType,
    "identificationNumber": identificationNumber,
    "relationshipType": "Shareholder",
    "participationPercentage": participationPercentage,
    "timeFound": timeFound,
    "job": <job>
  }
  ```
  c. Name Splitting Rules

  When extracting firstName and lastName for a PersonRelatedParty, apply the following principles:

      1. General Principle

          - firstName must always contain all given names (one or more).

          - lastName must always contain all family names (one or two).

      2. Order Handling

          - Names may appear in given-name-first order (e.g., Juan Carlos Pérez Gómez → firstName = "Juan Carlos", lastName = "Pérez Gómez")

          - Or in surname-first order (e.g., Pérez Gómez Juan Carlos → firstName = "Juan Carlos", lastName = "Pérez Gómez").

          - Must detect whether the sequence represents surname-first or name-first order.

      3. Common Structures

          - Two given names + two surnames → firstName = "primer_nombre segundo_nombre", lastName = "primer_apellido segundo_apellido"

          - One given name + two surnames → firstName = "unico_nombre", lastName = "primer_apellido segundo_apellido"

          - One given name + one surname → firstName = "nombre", lastName = "apellido"

          - Three given names + two surnames (rare) → firstName = "nombre1 nombre2 nombre3", lastName = "apellido1 apellido2"

      4. Abbreviations and Compound Names

          - Abbreviated given names (e.g., Juan A. Pérez) must be preserved: firstName = "Juan A.", lastName = "Pérez".

          - Compound surnames or given names (e.g., María del Pilar Gómez, José de la Cruz Pérez) must be preserved as written:

              * firstName = "María del Pilar", lastName = "Gómez"

              * firstName = "José", lastName = "de la Cruz Pérez"

      5. Column / Row Variations

          - Names may appear in a single cell (full string) or split across rows/columns (e.g., one row for surnames, another for given names).

          - The model must merge them correctly into one firstName and one lastName.

      6. Impossible or Ambiguous Cases

          - If it is not possible to clearly distinguish given names from surnames, set:

              * firstName = "POR REVISAR"

              * lastName = "POR REVISAR" 

  d. Participation Percentage Rules

  When extracting participationPercentage for a PersonRelatedParty or CompanyRelatedParty, apply the following principles:

      1. Direct Extraction (Preferred Method)

          - Look for explicit fields such as:

              * "porcentaje de participación"

              * "% participación"

              * "participación accionaria"

              * or similar variations.

          - If a numeric value is found:

              - If the value already includes % (e.g., 82%), preserve it exactly as written.

              - If the value does not include % (e.g., 82), preserve it as written without appending %.


      2. Indirect Calculation (Fallback Method)

          - If no explicit participation field is present, check for financial/ownership fields such as:

              * "VR CAPITAL PAGADO"

              * "CAPITAL PAGADO"

              * "VALOR TOTAL"

              * or similar variations.

          - If these fields exist, calculate participationPercentage as:

              * participationPercentage = (VR_CAPITAL_PAGADO / SUM(VR_CAPITAL_PAGADO for all relatedParties)) * 100


          - Round the result to 2 decimal places, append %, and then add "POR REVISAR" (e.g., "23.45% POR REVISAR").

      3. Special Considerations

          - If the necessary fields to calculate participationPercentage are missing or inconsistent, set:

              * participationPercentage = "POR REVISAR".

          - Total participation across all parties typically equals 100%, but may be less depending on the document.

  e. Repeated Entity Handling and TimeFound Rules 

  When the same relatedParty (person or company) appears multiple times in the document:

      1. Multiple Occurrences

          - Create a separate entry for each occurrence.

          - Add a timeFound counter:

              * Start at 1 for the first occurrence.

              * Increment sequentially for each subsequent occurrence (2, 3, ...).

      2. Job Field Association

          - If the entity appears with a role or title (e.g., "Representante Legal", "CEO", "General Manager", "Apoderado"), include it in <job> value.

          - If no role is provided, leave <job> value empty.

      3. Consistency Across Occurrences

          - The same entity may have the same identificationNumber but different contexts (e.g., one as shareholder, one as representative).

          - Preserve each case independently with its own timeFound.

  f. Company Detection Rules

  When extracting a CompanyRelatedParty, apply the following principles:

      1. General Identification Principle

          - A company must always be represented with:

              * companyName

              * identificationType (from COMPANY_ID[cty])

              * identificationNumber

          - The identificationNumber may appear in a dedicated column (e.g., "NIT", "RUC", "RFC") or next to the company name (e.g., "EMPRESA XYZ S.A.S – NIT 900123456").

      2. Company Name Rules

          - Preserve the full legal name exactly as written in the document, including markers such as:

              - "S.A.", "S.A.S", "S.A.C", "Ltda", "Inc.", "Corp.", "Cía", "S. de R.L.", "SRL", "S. en C.", "PLC", etc.

          - If the company name is split across multiple rows or columns, merge it into a single companyName.

          - Do not attempt to split companyName into firstName or lastName.

      3. Identification Type Rules

          - The identificationType must always be selected from COMPANY_ID[cty].

          - If multiple possible values exist for a country (e.g., Brasil → CPF vs CNPJ, Portugal → NIF vs NIPC):

              * Choose the type that matches the document marker (e.g., "CNPJ 12.345.678/0001-90" → CNPJ).

              * If the marker is ambiguous or missing, set identificationType = "POR REVISAR".

      4. Identification Number Rules

          - Extract the number exactly as written, preserving separators (., /, -) if they are part of the official format.

          - Normalize only when the document contains formatting noise (e.g., spaces between digits).

          - If no valid number is found, set:

              - identificationType = "POR REVISAR"

              - identificationNumber = "POR REVISAR".

  Output Schema for a CompanyRelatedParty

  ```json
  {
    "companyName": companyName,
    "identificationType": identificationType,
    "identificationNumber": identificationNumber,
    "relationshipType": "Shareholder",
    "participationPercentage": participationPercentage,
    "timeFound": timeFound,
    "job":<job>,
  }
  ```
  g. relatedParties JSON Example

  [
    {
      "firstName": "Juan Carlos",
      "lastName": "Pérez Gómez",
      "identificationType": "CC",
      "identificationNumber": "12345678",
      "relationshipType": "Shareholder",
      "participationPercentage": "10%",
      "timeFound": 1,
      "job": ""
    },
    {
      "firstName": "Juan Carlos",
      "lastName": "Pérez Gómez",
      "identificationType": "CC",
      "identificationNumber": "12345678",
      "relationshipType": "Shareholder",
      "participationPercentage": "",
      "timeFound": 2,
      "job": "Representante Legal"
    },
    {
      "companyName": "INVERSIONES ANDES S.A.S",
      "identificationType": "NIT",
      "identificationNumber": "900123456-7",
      "relationshipType": "Shareholder",
      "participationPercentage": "40%",
      "timeFound": 1,
      "job": ""
    },
    {
      "companyName": "INVERSIONES ANDES S.A.S",
      "identificationType": "NIT",
      "identificationNumber": "900123456-7",
      "relationshipType": "Shareholder",
      "participationPercentage": "",
      "timeFound": 2,
      "job": "Mandatario"
    },
    {
      "companyName": "INVERSIONES ANDES S.A.S",
      "identificationType": "NIT",
      "identificationNumber": "900123456-7",
      "relationshipType": "Shareholder",
      "participationPercentage": "",
      "timeFound": 3,
      "job": ""
    },
    {
      "firstName": "María del Pilar",
      "lastName": "Rodríguez López",
      "identificationType": "CC",
      "identificationNumber": "987654321",
      "relationshipType": "Shareholder",
      "participationPercentage": "30%",
      "timeFound": 1,
      "job": "Representante Legal"
    },
    {
      "firstName": "María del Pilar",
      "lastName": "Rodríguez López",
      "identificationType": "CC",
      "identificationNumber": "987654321",
      "relationshipType": "Shareholder",
      "participationPercentage": "",
      "timeFound": 2,
      "job": "Apoderado"
    },
    {
      "firstName": "Ana",
      "lastName": "Martínez",
      "identificationType": "CC",
      "identificationNumber": "44332211",
      "relationshipType": "Shareholder",
      "participationPercentage": "20%",
      "timeFound": 1,
      "job": ""
    }
  ]
4) Confidence Scoring:
    - Assign scores based on coverage and match quality, not just presence:
        * RelatedParties rubric:
          ** Start with 100 × (extracted_parties ÷ detected_candidates)
          ** Subtract 10 points for each missing required field in any entry
          ** Cap at 90 if any names contain obvious OCR noise
        * For all fields:
          ** 90-100: Complete coverage with high match quality
          ** 70-89: Partial coverage or moderate match quality
          ** Below 70: Significant gaps requiring human review
    - If you infer that any required party (company or person) may be missing, set the relatedParties score below 70
    - If any value in the relatedParties field is "POR REVISAR", also set the relatedParties score below 70
    - Flag any field with a score below 70 in chain-of-thought
    - Recheck the document for low-confidence fields
    - Include scores in the final output as a separate "confidenceScores" object
    - IMPORTANT: Always include confidence scores for each extracted field in the final output

5) Validation checklist:
    - Verify all mandatory fields are present:
        * Root level: documentType, taxId, country, relatedParties
        * Person: firstName, lastName, identificationType, identificationNumber, relationshipType, participationPercentage, timeFound, job
        * Company: companyName, identificationType, identificationNumber, relationshipType, participationPercentage, timeFound, job
    - Ensure identificationType matches the country's allowed types per hash tables
    - Verify timeFound values increment correctly for duplicate entities
    - Check that job fields only appear when specified in the document
    - Remove any fields not defined in the JSON FINAL schema
6) Final validation against:
    - The JSON FINAL schema
    - The PERSON_ID / COMPANY_ID lookup rules for documentCountry.
    - Ensure the country field is correctly populated based on either explicit mentions or ID inference
    - Verify identificationDetails contains accurate source information
    - Set the requiresReview flag to true if explicit country mentions conflict with ID-based inference, if any confidenceScores value is below 70, or if any field is "POR REVISAR"
    - If you spot any missing/extra fields or mis-classified parties, **fix** them now.

By following these steps, you can deliver the correct information from the document.

Below is the JSON FINAL schema you must follow exactly:

```json
$schema
```

Below are some examples of final outputs. Use them to guide optional fields and edge cases — but always validate against the schema above:

$examples_section

When you receive a PDF, extract **only** the data required by the schema and return a single JSON object. Do **not** include any commentary, markdown, or extra keys—just the raw JSON.

Check the draft JSON and correct it if necessary. Return only the data collected from the document, never values copied from the examples.

Extract company equity composition data from the provided PDF document.

# Key Instructions
1. Follow ALL extraction rules from system instructions
2. Identify the main reporting company (root level) first
3. Extract all shareholders/related parties with their participation percentages
4. Apply country detection rules (explicit → ID-based → domain → "POR REVISAR")
5. Use "POR REVISAR" for any unclear/missing values
6. Calculate confidence scores for each field
7. Set requiresReview=true if any confidence<70 or any field="POR REVISAR"

# Required Output Structure
```json
{
  "path": "$pdf_path",
  "result": {
      "companyName": "",
      "documentType": "Equity Composition",
      "taxId": "",
      "country": "",
      "identificationDetails": {
        "source": "",
        "indicators": [],
        "conflictingSources": [],
        "requiresReview":
      },
      "relatedParties": [
        {
          "firstName": "",
          "lastName": "",
          "identificationType": "",
          "identificationNumber": "",
          "relationshipType": "Shareholder",
          "participationPercentage": "",
          "timeFound": ,
          "job":""
        },
        {
          "companyName": "",
          "identificationType": "",
          "identificationNumber": "",
          "relationshipType": "Shareholder",
          "participationPercentage": "",
          "timeFound": ,
          "job":""
        }
      ]
    },
  "document_type": "company",
  "document_number": "$document_number",
  "category": "ACC",
  "confidenceScores": {
    "companyName": ,
    "taxId": ,
    "country": ,
    "relatedParties": 
  }
}
```

Think step-by-step. Extract data carefully. Return only valid JSON without comments or markdown.

'''



complexity = classify_task_complexity(prompt=task)

Classification: complex (determined in 1.05s, 6727 tokens, $0.008409)


## 2. Decision Tree for When to Use Extended Thinking

Based on our task complexity framework, we can create a decision tree to help determine:
1. Whether to use extended thinking
2. How much reasoning budget to allocate

The decision tree takes into account:
- Task complexity
- Performance requirements
- Time sensitivity
- Cost considerations

Here's a visualization of our decision tree:

![Decision Tree](./images/lesson2/complexity.png)

Now let's create a function that automatically determines whether to use extended thinking and what budget to allocate:

In [29]:
def determine_extended_thinking_strategy(prompt, time_sensitive=False):
    """
    Determine whether to use extended thinking and what budget to allocate
    based on task complexity and time sensitivity
    
    Args:
        prompt (str): The user prompt
        time_sensitive (bool): Whether the task is time-sensitive
        
    Returns:
        dict: Strategy with 'complexity', 'use_extended_thinking', 'reasoning_budget' and 'time_sensitive' keys
    """
    # First, classify the task complexity
    complexity = classify_task_complexity(prompt)
    
    # Define reasoning budget ranges for each complexity level
    budget_ranges = {
        'simple': (0, 0),  # No extended thinking for simple tasks
        'medium': (1024, 2048),
        'complex': (2048, 8192),
        'very_complex': (8192, 16384)
    }
    
    # Determine whether to use extended thinking based on complexity and time sensitivity
    use_extended_thinking = True
    
    if complexity == 'simple':
        use_extended_thinking = False
    elif complexity == 'medium' and time_sensitive:
        use_extended_thinking = False
    
    # Determine reasoning budget (if using extended thinking)
    if use_extended_thinking:
        min_budget, max_budget = budget_ranges[complexity]
        
        # Use the lower end of the range if time_sensitive, otherwise use the middle
        if time_sensitive:
            reasoning_budget = min_budget
        else:
            reasoning_budget = (min_budget + max_budget) // 2
    else:
        reasoning_budget = 0
    
    strategy = {
        'complexity': complexity,
        'use_extended_thinking': use_extended_thinking,
        'reasoning_budget': reasoning_budget,
        'time_sensitive': time_sensitive
    }
    
    return strategy

In [34]:
determine_extended_thinking_strategy(task, time_sensitive=False)

Classification: complex (determined in 1.66s, 6727 tokens, $0.008409)


{'complexity': 'complex',
 'use_extended_thinking': True,
 'reasoning_budget': 5120,
 'time_sensitive': False}

### Understanding the Extended Thinking Strategy Function

The `determine_extended_thinking_strategy` function applies our decision tree in code. It first classifies the task's complexity, then decides whether to use extended thinking and how large a reasoning budget to allocate. Simple tasks never get extended thinking, and medium tasks skip it when they are time-sensitive. Otherwise the budget is the midpoint of the range for that complexity level, or the lower bound when the task is time-sensitive. This is why the complex, non-time-sensitive task above received (2048 + 8192) // 2 = 5120 tokens.
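To inspect the full decision table without paying for a classifier call each time, the same rules can be restated as a pure function of the complexity label. This is only a sketch: `strategy_for_complexity` and `BUDGET_RANGES` are hypothetical names introduced here, and the ranges are copied from `determine_extended_thinking_strategy`.

```python
# Budget ranges copied from determine_extended_thinking_strategy.
BUDGET_RANGES = {
    "simple": (0, 0),
    "medium": (1024, 2048),
    "complex": (2048, 8192),
    "very_complex": (8192, 16384),
}


def strategy_for_complexity(complexity, time_sensitive=False):
    """Hypothetical helper: the same rules, minus the classifier call."""
    # Simple tasks, and medium tasks under time pressure, skip extended thinking.
    skip = complexity == "simple" or (complexity == "medium" and time_sensitive)
    if skip:
        return {"use_extended_thinking": False, "reasoning_budget": 0}

    low, high = BUDGET_RANGES[complexity]
    # Lower bound when time-sensitive, otherwise the midpoint of the range.
    budget = low if time_sensitive else (low + high) // 2
    return {"use_extended_thinking": True, "reasoning_budget": budget}


# Print every combination of complexity level and time sensitivity.
for level in BUDGET_RANGES:
    for urgent in (False, True):
        print(f"{level:<13} time_sensitive={urgent!s:<5} {strategy_for_complexity(level, urgent)}")
```

Keeping the rules separate from the classifier also makes them easy to unit test, since no Bedrock client is needed.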

In [35]:
class DynamicBudgetAllocator:
    """
    Allocates reasoning budgets dynamically based on task complexity and constraints
    """
    def __init__(self):
        # Default budget ranges by complexity
        self.default_budgets = {
            'simple': 0,  # No extended thinking for simple tasks
            'medium': 2048,
            'complex': 4096,
            'very_complex': 8192
        }
        
        # Budget adjustments for time sensitivity
        self.time_sensitive_adjustments = {
            'simple': 0,
            'medium': -2048,  # Cancels the 2048 default: no extended thinking when time-sensitive
            'complex': -2048,  # Reduce budget for time-sensitive tasks
            'very_complex': -4096  # Significant reduction for time-sensitive tasks
        }
        
        # Performance tracking
        self.performance_history = {}
    
    def allocate_budget(self, prompt, time_sensitive=False, cost_constrained=False):
        """
        Allocate an appropriate reasoning budget for a task
        
        Args:
            prompt (str): The user prompt
            time_sensitive (bool): Whether the task is time-sensitive
            cost_constrained (bool): Whether to prioritize cost saving
            
        Returns:
            dict: Allocation decision including reasoning budget and strategy details
        """
        # Step 1: Classify task complexity
        complexity = classify_task_complexity(prompt)
        
        # Step 2: Get base budget for this complexity
        base_budget = self.default_budgets.get(complexity, 2048)
        
        # Step 3: Apply adjustments
        final_budget = base_budget
        
        # Apply time sensitivity adjustment
        if time_sensitive and complexity in self.time_sensitive_adjustments:
            final_budget += self.time_sensitive_adjustments[complexity]
        
        # Apply cost constraint adjustment (reduce by 50% if cost constrained)
        if cost_constrained and final_budget > 0:
            final_budget = max(1024, final_budget // 2)  # Minimum 1024 if using extended thinking
        
        # Step 4: Determine whether to use extended thinking
        use_extended_thinking = final_budget >= 1024
        
        # If not using extended thinking, set budget to 0
        if not use_extended_thinking:
            final_budget = 0
        
        # Step 5: Create allocation decision
        allocation = {
            'complexity': complexity,
            'use_extended_thinking': use_extended_thinking,
            'reasoning_budget': final_budget,
            'time_sensitive': time_sensitive,
            'cost_constrained': cost_constrained
        }
        
        return allocation
    
    def update_performance(self, allocation, elapsed_time, token_count, cost):
        """
        Update performance history for continuous learning
        
        Args:
            allocation (dict): The allocation decision
            elapsed_time (float): Time taken for response
            token_count (int): Total tokens used
            cost (float): Total cost
        """
        complexity = allocation['complexity']
        budget = allocation['reasoning_budget']
        
        if complexity not in self.performance_history:
            self.performance_history[complexity] = []
        
        self.performance_history[complexity].append({
            'budget': budget,
            'elapsed_time': elapsed_time,
            'token_count': token_count,
            'cost': cost,
            'timestamp': time.time()
        })

# Create an instance of our allocator
budget_allocator = DynamicBudgetAllocator()
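The allocator records every run in `performance_history`, but nothing in the class reads that data back. One possible way to use it is to average time, tokens, and cost per complexity level and budget. The sketch below assumes records shaped like the ones `update_performance` appends; `summarize_history` is a hypothetical helper, and the sample records are made up for illustration, not real measurements.

```python
from collections import defaultdict
from statistics import mean


def summarize_history(performance_history):
    """Hypothetical helper: average time, tokens and cost per (complexity, budget)."""
    # Group the recorded runs by complexity level and reasoning budget.
    groups = defaultdict(list)
    for complexity, records in performance_history.items():
        for record in records:
            groups[(complexity, record["budget"])].append(record)

    summary = []
    for (complexity, budget), records in sorted(groups.items()):
        summary.append({
            "complexity": complexity,
            "budget": budget,
            "runs": len(records),
            "avg_time": mean(r["elapsed_time"] for r in records),
            "avg_tokens": mean(r["token_count"] for r in records),
            "avg_cost": mean(r["cost"] for r in records),
        })
    return summary


# Made-up records in the shape written by update_performance (not real measurements).
example_history = {
    "complex": [
        {"budget": 4096, "elapsed_time": 4.0, "token_count": 2000, "cost": 0.008, "timestamp": 0.0},
        {"budget": 4096, "elapsed_time": 6.0, "token_count": 3000, "cost": 0.012, "timestamp": 1.0},
    ],
    "medium": [
        {"budget": 2048, "elapsed_time": 2.0, "token_count": 900, "cost": 0.003, "timestamp": 2.0},
    ],
}

for row in summarize_history(example_history):
    print(row)
```

In the notebook, `summarize_history(budget_allocator.performance_history)` could be passed to `pd.DataFrame` to compare budgets once enough runs have accumulated.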

In [None]:
def test_dynamic_allocation(prompts, allocator):
    """
    Test our dynamic budget allocator on a set of prompts
    
    Args:
        prompts (dict): Dictionary of prompt labels to prompt text
        allocator (DynamicBudgetAllocator): The budget allocator
        
    Returns:
        pd.DataFrame: Results of the test
    """
    results = []
    
    for label, prompt in prompts.items():
        print(f"\nTesting prompt: {label}")
        print(f"Prompt: {prompt[:100]}..." if len(prompt) > 100 else f"Prompt: {prompt}")
        
        # Get allocation for standard mode (not time-sensitive)
        standard_allocation = allocator.allocate_budget(prompt, time_sensitive=False)
        print(f"Standard mode allocation: {standard_allocation}")
        
        # Get allocation for time-sensitive mode
        time_sensitive_allocation = allocator.allocate_budget(prompt, time_sensitive=True)
        print(f"Time-sensitive allocation: {time_sensitive_allocation}")
        
        # Execute with the standard allocation
        print("\nExecuting with standard allocation...")
        start_time = time.time()
        
        response = claude_utils.invoke_claude(
            bedrock_runtime,
            prompt,
            CLAUDE_37_SONNET_MODEL_ID,
            enable_reasoning=standard_allocation['use_extended_thinking'],
            reasoning_budget=standard_allocation['reasoning_budget'],
            max_tokens=1000
        )
        
        elapsed_time = time.time() - start_time
        
        # Calculate metrics from the usage block returned by the API
        usage = response.get('usage', {})
        input_tokens = usage.get('inputTokens', 0)
        output_tokens = usage.get('outputTokens', 0)
        total_tokens = usage.get('totalTokens', 0)
        # Hard-coded prices: $3 per million input tokens, $15 per million output tokens
        total_cost = (input_tokens * 0.000003) + (output_tokens * 0.000015)
        
        # Update allocator's performance history
        allocator.update_performance(
            standard_allocation,
            elapsed_time,
            total_tokens,
            total_cost
        )
        
        # Store result
        results.append({
            'Prompt': label,
            'Complexity': standard_allocation['complexity'],
            'Use_Extended_Thinking': standard_allocation['use_extended_thinking'],
            'Reasoning_Budget': standard_allocation['reasoning_budget'],
            'Time_Sensitive_Budget': time_sensitive_allocation['reasoning_budget'],
            'Elapsed_Time': elapsed_time,
            'Total_Tokens': total_tokens,
            'Total_Cost': total_cost
        })
        
        print(f"Completed in {elapsed_time:.2f}s, {total_tokens} tokens, ${total_cost:.6f}")
    
    return pd.DataFrame(results)
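The cost line in `test_dynamic_allocation` hard-codes two per-token prices. A small helper with named constants makes that arithmetic easier to audit and update. This is a sketch: `estimate_cost` and the constant names are hypothetical, and the prices are simply the ones used in the cell above, so check them against current pricing for your model and region.

```python
# Prices taken from the hard-coded factors in test_dynamic_allocation,
# expressed in USD per million tokens. Treat them as assumptions.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost(input_tokens, output_tokens,
                  input_price=INPUT_PRICE_PER_MTOK,
                  output_price=OUTPUT_PRICE_PER_MTOK):
    """Hypothetical helper: USD cost of one call from its token usage."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# 1000 input tokens and 100 output tokens at the prices above.
print(f"${estimate_cost(1000, 100):.6f}")  # $0.004500
```

Because output tokens cost five times as much as input tokens at these prices, a larger reasoning budget can dominate the bill if the reasoning tokens are counted as output, which is the assumption made here.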

In [37]:
test_prompts = {
    "Complex_Analysis": """
    You are a precise legal document classification specialist. Your task is to analyze the provided PDF document and classify it into one of the predefined legal document categories.

<context>
- Documents may be in Spanish, Portuguese, English, or other languages (primarily Spanish and Portuguese)
- Document formats and structures vary significantly even within the same category
- Focus on content and meaning, not specific layouts or formatting
- Identify semantic equivalences across languages without translating (e.g., "Identity Card" = "Cédula", "Shareholders" = "Accionistas")
- Return all text you can extract, even if partial or incomplete
- **OCR Tolerance**: Recognize minor orthographic variants or OCR deformations as long as the term is semantically evident (e.g., "CÂMARA DE COMMERCIO", "CEDVLA", "DLAN" for "DIAN")
</context>

<classification_workflow>
Follow this exact order of evaluation:

1. **TEXT EXTRACTION**
   - Extract all readable text from the document
   - If document is scanned/image-only, return whatever text you can identify
   - Never invent or hallucinate text that isn't visible
   - Include all text even if partially illegible or fragmented
   - **OCR Tolerance**: Accept minor variations in key terms due to OCR errors or font issues

2. **PRIORITY CHECKS (Override all other categories)**
   - BLANK: Check if document contains no meaningful content (empty, whitespace only, or minimal meaningless text)
   - LINK_ONLY: Check if document's primary content is just hyperlinks/URLs redirecting elsewhere

3. **CATEGORY EVALUATION**
   - CECRL: Personal identification documents (For individual ID docs)
   - RUT: Tax registration documents (requires DIAN or tax-specific terminology)
   - CERL: Corporate certificates (requires chamber of commerce or existence certificate indicators)
   - ACC: Shareholder composition (Focus on shareholding structure with capital percentages)
   - RUB: Beneficial owners registry (Focus on beneficial ownership for compliance purposes)

4. **CONFIDENCE ASSESSMENT**
   - Strong evidence: Clear must-have indicators present, single category dominance
   - Weak evidence: Missing key indicators, conflicting signals, or unclear purpose
   - If weak evidence → Use POR_REVISAR

5. **FALLBACK RULE**
   Use POR_REVISAR when:
   - Conflicting signals from multiple categories with similar strength
   - Document is incomplete, illegible, or heavily damaged
   - Mixed content from various categories without clear predominance
   - Uncertain about document's primary legal function
</classification_workflow>

<category_definitions>

<category name="BLANK">
**Must-have indicators:** Document is empty, contains only whitespace, or minimal meaningless text
**Red flags:** Any substantial readable content
</category>

<category name="LINK_ONLY">
**Must-have indicators:** Primary content consists of hyperlinks/URLs directing to external sources
**Red flags:** Substantial document content beyond links
**Example:** "shareholder information available at: https://website.com/shareholders"
</category>

<category name="CECRL">
**Must-have indicators:**
- "Cédula", "Identificación personal", "Identity Card", "Passport", "Cartão de Identidade"
- Personal data fields: blood type, gender, birth date/place, photo references
- Individual identification numbers with personal names
**Red flags:** Corporate entities, company names, business activities
**Example:** "REPUBLICA DE COLOMBIA IDENTIFICACION PERSONAL CEDULA DE CIUDADANIA NUMERO 88.199.170 FUENTES LEON"
</category>

<category name="RUT">
**Must-have indicators:**
- "DIAN", "Registro Único Tributario", "Tax Registration"
- Tax classification codes, economic activities, tax responsibilities
**Red flags:** No tax-related terminology
**Example:** "DIAN Formulario del Registro Único Tributario... Razón social VR INGENIERIA"
</category>

<category name="CERL">
**Must-have indicators:**
- "Cámara de Comercio", "Certificate of Existence", "Certificado de Existência"
- "Matrícula", corporate registration, legal representation certificates
**Red flags:** Pure tax documents, personal IDs, shareholder lists without existence certificate
**Example:** "CÁMARA DE COMERCIO DE BOGOTÁ CERTIFICADO DE EXISTENCIA... NIT: 800035887-3"
</category>

<category name="ACC">
**Must-have indicators:**
- Shareholder/ownership terminology: "Accionistas", "Shareholders", "participación"
- Ownership percentages or share capital details
**Red flags:** No ownership percentages, pure corporate certificates without ownership details
**Example:** "Mauricio Obregón Gutiérrez 79.425.769 GERENTE 80% C.C"
</category>

<category name="RUB">
**Must-have indicators:**
- "Registro Único de Beneficiarios", "beneficial owners", "beneficiários"
- Explicit beneficiary listings with ownership details
**Red flags:** General shareholder info without beneficiary context
</category>

<category name="POR_REVISAR">
**Use when:**
- Conflicting evidence from multiple categories with similar strength
- Document incomplete, illegible, or missing critical information
- Mixed content without clear predominant category
- Low confidence in any classification
</category>

</category_definitions>

<output_requirements>
Return exactly one JSON object with proper character escaping:

```json
{
  "category": "CATEGORY_NAME",
  "text": "complete extracted text with proper JSON escaping"
}
```

**Critical requirements:**
- Use exact category names: CERL, CECRL, RUT, RUB, ACC, BLANK, LINK_ONLY, POR_REVISAR
- Properly escape special characters in "text" field (quotes, newlines, backslashes)
- Include ALL extracted text, even if empty (BLANK should return `"text": ""`)
- Return only the JSON object with no additional commentary
- Ensure valid JSON formatting
- When in doubt, prefer POR_REVISAR over incorrect classification
</output_requirements>

<task>
Analyze the attached PDF document and classify it according to the established workflow.
</task>

<instructions>
<text_extraction>
Extract all readable text from the document (include partial text and tolerate OCR errors).
</text_extraction>

<evaluation_order>
1. First check: BLANK (empty/meaningless content)
2. Second check: LINK_ONLY (primarily hyperlinks)
3. Then evaluate: CECRL, RUT, CERL, ACC, RUB based on content indicators
4. If uncertain or conflicting evidence → POR_REVISAR
</evaluation_order>

<output_format>
Return ONLY a valid JSON object with this exact structure:
{
  "category": "CATEGORY_NAME",
  "text": "complete extracted text with proper JSON escaping"
}
</output_format>
</instructions>

<critical_requirements>
- Use EXACT category names: CERL, CECRL, RUT, RUB, ACC, BLANK, LINK_ONLY, POR_REVISAR
- For BLANK category: "text" field must be empty string ""
- Properly escape JSON special characters (quotes, newlines, backslashes)
- NO explanations, comments, or additional content outside the JSON
- When in doubt, prefer POR_REVISAR over incorrect classification
</critical_requirements>
    """  
}

# Run the test
allocation_test_results = test_dynamic_allocation(test_prompts, budget_allocator)

# Display the results
display(allocation_test_results)


Testing prompt: Complex_Analysis
Prompt: 
    You are a precise legal document classification specialist. Your task is to analyze the provide...
Classification: complex (determined in 22.43s, 2046 tokens, $0.002558)
Standard mode allocation: {'complexity': 'complex', 'use_extended_thinking': True, 'reasoning_budget': 4096, 'time_sensitive': False, 'cost_constrained': False}
Classification: complex (determined in 1.58s, 2046 tokens, $0.002558)
Time-sensitive allocation: {'complexity': 'complex', 'use_extended_thinking': True, 'reasoning_budget': 2048, 'time_sensitive': True, 'cost_constrained': False}

Executing with standard allocation...
Info: Extended Thinking enabled increasing maxTokens from 1000 to 4097 to exceed reasoning budget
Completed in 3.67s, 1971 tokens, $0.007569


| | Prompt | Complexity | Use_Extended_Thinking | Reasoning_Budget | Time_Sensitive_Budget | Elapsed_Time | Total_Tokens | Total_Cost |
|---|---|---|---|---|---|---|---|---|
| 0 | Complex_Analysis | complex | True | 4096 | 2048 | 3.668263 | 1971 | 0.007569 |
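The `Info` line in the output shows the utility raising `maxTokens` from 1000 to 4097, just one token above the 4096-token reasoning budget. Assuming the token limit covers both the reasoning and the visible answer, it may be safer to size it explicitly as budget plus answer length. The helper below is a sketch under that assumption; `pick_max_tokens` and `MIN_REASONING_BUDGET` are hypothetical names, not part of `claude_utils`.

```python
MIN_REASONING_BUDGET = 1024  # smallest non-zero budget used in this notebook


def pick_max_tokens(reasoning_budget, answer_tokens=1000):
    """Hypothetical helper: token limit that leaves room for the visible answer."""
    if reasoning_budget <= 0:
        # Extended thinking disabled: only the answer needs room.
        return answer_tokens
    if reasoning_budget < MIN_REASONING_BUDGET:
        raise ValueError(f"reasoning_budget must be 0 or >= {MIN_REASONING_BUDGET}")
    # Reserve the full reasoning budget plus space for the final answer.
    return reasoning_budget + answer_tokens


print(pick_max_tokens(4096))  # 5096
print(pick_max_tokens(0))     # 1000
```

In `test_dynamic_allocation`, the result could be passed as `max_tokens` in place of the fixed value of 1000.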
