# Mappr

> Scale up evaluation report mapping against evaluation frameworks using agentic workflows


::: {.callout-warning}
This notebook is a work in progress.
:::

Manually mapping evaluation reports against IOM's [Strategic Results Framework (SRF)](https://srf.iom.int) is time-consuming and resource-intensive with ~150 outputs to analyze. Additionally, the mapping process needs transparent and human-readable traces of LLM decision flows that both reflect natural reasoning patterns and allow human evaluators to audit the mapping logic.

To address this, Mappr uses a three-stage async pipeline that leverages the objectives of the [Global Compact for Migration (GCM) UN General Assembly Resolution](https://www.un.org/en/development/desa/population/migration/generalassembly/docs/globalcompact/A_RES_73_195.pdf) as a pruning mechanism for SRF Outputs:



**Stage 1**: SRF Enablers & Cross-cutting Analysis

- **Async parallel analysis** of Enablers (7 categories) and Cross-cutting Priorities (4 categories) using a shared semaphore for rate limiting
- **Purpose**: Identify whether the report is primarily meta-evaluation/transversal in nature
- **Fast processing**: ~11 items total with concurrent execution, provides context for subsequent stages

**Stage 2**: Informed GCM Analysis

- **Rate-limited parallel processing** of GCM Objectives (23 items) informed by Stage 1 results
- **Condensed representations**: UN General Assembly Resolution formulation simplified for retrieval efficiency
- **Concurrent theme analysis** with API quota management

**Stage 3**: Targeted SRF Analysis

- **SRF Filtering**: Use GCM results + `gcm_srf_lut` lookup table to prune ~150 SRF outputs to ~20-50 relevant ones
- **Deep parallel analysis**: Full hierarchy context (objective → outcome → output → indicators)
- **Async batch processing**: Final targeted analysis of pruned SRF outputs with retry logic and error handling

::: {.column-body}
![Three-stage Pipeline Overview](img/three-stage-pipeline-overview.png){fig-align="center" width="800px"}
:::
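
The Stage 3 pruning step can be sketched as a set union over a lookup table. The snippet below is a minimal illustration, not the pipeline's implementation: the shape of `gcm_srf_lut` (a GCM objective id mapped to a list of SRF output ids), the ids themselves, and the helper `prune_srf_outputs` are assumptions made for the example.

```python
# Hypothetical lookup table: GCM objective id -> SRF output ids (illustrative values only)
gcm_srf_lut = {
    "1": ["1a11", "1a12", "3c21"],
    "7": ["1a12", "2b31"],
    "21": ["3c21", "3c22"],
}

def prune_srf_outputs(covered_gcm_ids, lut):
    "Return the sorted union of SRF output ids linked to the covered GCM objectives."
    selected = set()
    for gcm_id in covered_gcm_ids:
        selected.update(lut.get(gcm_id, []))  # ids missing from the table contribute nothing
    return sorted(selected)

# Suppose Stage 2 flagged GCM objectives 1 and 21 as covered
pruned = prune_srf_outputs(["1", "21"], gcm_srf_lut)
print(pruned)
```

Only the SRF outputs reachable from a covered GCM objective would then go through the deeper Stage 3 analysis.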

In [None]:
#| default_exp mappr

In [None]:
#| exports
from pathlib import Path
from functools import reduce
from toolslm.md_hier import *
from rich import print
import json
from fastcore.all import *
from enum import Enum
import logging
import uuid
from datetime import datetime
from typing import List, Callable
import dspy
from asyncio import Semaphore, gather, sleep
import time
from collections import defaultdict
import copy

from evaluatr.frameworks import (EvalData, 
                                 IOMEvalData, 
                                 FrameworkInfo, 
                                 Framework,
                                 FrameworkCat)

In [None]:
#| exports
from dotenv import load_dotenv
import os

load_dotenv()
GEMINI_API_KEY = os.getenv('GEMINI_API_KEY')

In [None]:
#| exports
cfg = AttrDict({
    'lm': 'gemini/gemini-2.0-flash-exp',
    'api_key': GEMINI_API_KEY,
    'max_tokens': 8192,
    'track_usage': False,
    'rpm_limit': 15, 
    'call_delay': 6,
    'dirs': AttrDict({
        'data': '.evaluator',
        'trace': 'traces'
    }),
    'verbosity': 1,
    'cache': AttrDict({
        'is_active': True,
        'delay': 0.1 # threshold in seconds below which we consider the response is cached
    })
})

In [None]:
#| exports
lm = dspy.LM(cfg.lm, api_key=cfg.api_key, cache=cfg.cache.is_active)
dspy.configure(lm=lm)

In [None]:
#| eval: false
doc = Path("../_data/md_library/49d2fba781b6a7c0d94577479636ee6f/abridged_evaluation_report_final_olta_ndoja_pdf/enriched")
pages = doc.ls(file_exts=".md").sorted(key=lambda p: int(p.stem.split('_')[1]))
report = '\n\n---\n\n'.join(page.read_text() for page in pages)
print(report[:1000])

## Hierarchical report navigation

Thanks to `toolslm.md_hier` and the clean markdown structure of `report`, we can build a nested dictionary of sections, subsections, and so on:

In [None]:
#| eval: false
hdgs = create_heading_dict(report); hdgs

{'PPMi .... page 1': {},
 'CONTENTS .... page 3': {},
 '1. Introduction .... page 4': {},
 '2. Background of the JI-HoA .... page 5': {'2.1. Context and design of the JI-HoA .... page 5': {},
  '2.2. External factors affecting the implementation of the JI .... page 7': {}},
 '3. Methodology .... page 8': {},
 '4. Findings .... page 10': {'4.1. Relevance .... page 10': {'4.1.1. Relevance of programme activities for migrants, returnees, and communities .... page 10': {}},
  'Overall performance score for relevance: $3.9 / 5$ <br> Robustness score for the evidence: $4.5 / 5$': {'4.1.1.1 Needs of migrants .... page 10': {},
   '4.1.1.2 Needs of returnees .... page 10': {},
   '4.1.1.3 Needs of community members .... page 12': {},
   "4.1.2. Programme's relevance to the needs of stakeholders .... page 12": {'4.1.2.1 Needs of governments .... page 12': {},
    '4.1.2.2 Needs of other stakeholders .... page 13': {}},
   '4.2. Coherence .... page 13': {"4.2.1. The JI-HoA's alignment with the o

In [None]:
#| exports
def find_section_path(
    hdgs: dict, # The nested dictionary structure
    target_section: str # The section name to find
) -> list: # The nested key path for the given section name
    "Find the nested key path for a given section name."
    def search_recursive(current_dict, path=None):
        path = path or []  # avoid a mutable default argument
        for key, value in current_dict.items():
            current_path = path + [key]
            if key == target_section:
                return current_path
            if isinstance(value, dict):
                result = search_recursive(value, current_path)
                if result:
                    return result
        return None
    
    return search_recursive(hdgs)

Then we can retrieve the subsection path (the list of nested headings leading to a specific section) in this nested `hdgs` dict:

In [None]:
#| eval: false
path = find_section_path(hdgs, "4.1.1.1 Needs of migrants .... page 10"); path

['4. Findings .... page 10',
 'Overall performance score for relevance: $3.9 / 5$ <br> Robustness score for the evidence: $4.5 / 5$',
 '4.1.1.1 Needs of migrants .... page 10']

Then retrieve the specific subsection content:

In [None]:
#| exports
def get_content_tool(
    hdgs: dict, # The nested dictionary structure
    keys_list: list, # The list of keys to navigate through
    ) -> str: # The content of the section
    "Navigate through nested levels using the exact key strings."
    return reduce(lambda current, key: current[key], keys_list, hdgs).text

In [None]:
#| eval: false
content = get_content_tool(hdgs, path)
print(content[:500])
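
The two navigation steps (find the key path, then walk it) can be tried without a real report. The sketch below is self-contained and rests on assumptions: `Node` is a hypothetical stand-in for the heading objects, modelled as a dict of sub-headings carrying a `text` attribute, and `find_path` is a simplified variant of the search shown earlier.

```python
from functools import reduce

class Node(dict):
    "Hypothetical heading node: a dict of sub-headings plus the section text."
    def __init__(self, text="", children=None):
        super().__init__(children or {})
        self.text = text

toc = Node(children={
    "1. Introduction": Node("Intro text"),
    "2. Findings": Node("Findings overview", {
        "2.1. Relevance": Node("Relevance details"),
    }),
})

def find_path(tree, target, path=()):
    "Depth-first search returning the list of keys leading to `target`, or None."
    for key, child in tree.items():
        current = [*path, key]
        if key == target:
            return current
        found = find_path(child, target, current)
        if found:
            return found
    return None

path = find_path(toc, "2.1. Relevance")
# Walk the path one key at a time, then read the text of the final node
content = reduce(lambda node, key: node[key], path, toc).text
print(path)
print(content)
```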

## Formatters

Here we define a set of functions that format the evaluation framework themes to analyze (SRF enablers, objectives, GCM objectives, ...) as well as traces.

In [None]:
#| exports
def format_enabler_theme(
    theme: EvalData # The theme object
    ) -> str: # The formatted theme string
    "Format SRF enabler into structured text for LM processing."
    parts = [
        f'## Enabler {theme.id}: {theme.title}',
        '### Description', 
        theme.description
    ]
    return '\n'.join(parts)

For instance: 

In [None]:
#| eval: false
eval_data = IOMEvalData()
data_evidence = eval_data.srf_enablers[3]  # "Data and evidence" is at index 3
print(format_enabler_theme(data_evidence))

In [None]:
#| exports
def format_crosscutting_theme(
    theme: EvalData # The theme object
    ) -> str: # The formatted theme string
    "Format SRF cross-cutting into structured text for LM processing."
    parts = [
        f'## Cross-cutting {theme.id}: {theme.title}',
        '### Description', 
        theme.description
    ]
    return '\n'.join(parts)
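
To see the text an LM would receive without loading the framework data, here is a small self-contained sketch. The `format_theme` helper and the `SimpleNamespace` theme are stand-ins introduced for illustration, and the description string is a placeholder rather than the real SRF wording.

```python
from types import SimpleNamespace

def format_theme(kind, theme):
    "Render a theme as a small markdown block: a titled heading plus its description."
    return "\n".join([
        f"## {kind} {theme.id}: {theme.title}",
        "### Description",
        theme.description,
    ])

# Hypothetical theme object; only `id`, `title` and `description` are needed
demo = SimpleNamespace(id="4", title="Data and evidence", description="Placeholder description.")
formatted = format_theme("Enabler", demo)
print(formatted)
```

Since the enabler and cross-cutting formatters differ only in the heading label, a shared helper of this kind is one possible way to factor them, should more theme types be added.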

## Signatures

A [DSPy signature](https://dspy.ai/learn/programming/signatures) is a declarative specification of the input/output behavior of a DSPy module. Signatures let you tell the LM (Language Model) what it needs to do, rather than specifying how to ask the LM to do it.

In [None]:
#| exports
class Overview(dspy.Signature):
    "Based on framework theme to map and report's TOC determine the sections to explore first."
    theme: str = dspy.InputField(desc="Theme being analyzed")
    all_headings: str = dspy.InputField(desc="Complete document structure")
    priority_sections: List[str] = dspy.OutputField(desc="Ordered list of section keys to explore first")
    strategy: str = dspy.OutputField(desc="Reasoning for this exploration strategy")

For instance on "Data and evidence" SRF Enabler:

In [None]:
#| eval: false
overview_analyzer = dspy.ChainOfThought(Overview)
result_overview = overview_analyzer(
    theme = format_enabler_theme(data_evidence),
    all_headings=str(hdgs),
)

print(f'Priority sections: {result_overview.priority_sections}')
print(f'Strategy: {result_overview.strategy}')

In [None]:
#| exports
class Exploration(dspy.Signature):
    "Decide next exploration step for theme to be mapped based on current findings and available sections."
    theme: str = dspy.InputField(desc="Theme being analyzed")
    current_findings: str = dspy.InputField(desc="Evidence found so far")
    available_sections: str = dspy.InputField(desc="Remaining sections to explore")
    next_section: str = dspy.OutputField(desc="Next section key to explore, or 'DONE' if sufficient")
    reasoning: str = dspy.OutputField(desc="Why this section or why stopping")

In [None]:
#| eval: false
exploration = dspy.ChainOfThought(Exploration)

result_exploration = exploration(
    theme = format_enabler_theme(data_evidence),
    current_findings="No evidence collected yet",
    available_sections=str(result_overview.priority_sections)
)

print("Next section:", result_exploration.next_section)
print("Reasoning:", result_exploration.reasoning)

In [None]:
#| exports
class Assessment(dspy.Signature):
    "Assess if current evidence is sufficient for theme analysis."
    theme: str = dspy.InputField(desc="Theme being analyzed")
    evidence_so_far: str = dspy.InputField(desc="All evidence collected")
    sections_explored: str = dspy.InputField(desc="Sections already checked")
    sufficient: bool = dspy.OutputField(desc="Is evidence sufficient to make conclusion?")
    confidence_score: float = dspy.OutputField(desc="Confidence in current findings (0-1)")
    next_priority: str = dspy.OutputField(desc="If continuing, what type of section to prioritize")
    reasoning: str = dspy.OutputField(desc="Why this assessment was made")


We treat observability and LLM evaluation as core requirements for our mapping pipeline. While DSPy's built-in `dspy.inspect_history()` provides valuable reasoning chains, we enhance it with structured metadata (`report_id`, `phase`, `framework`) to create comprehensive audit trails. This enriched tracing enables systematic evaluation of mapping accuracy, supports human evaluator annotation workflows, and provides the detailed context necessary for debugging and improving our LLM-based document analysis system. 

Below we define an enum and a context class for pipeline tracing and validation. These provide structured metadata for audit trails and evaluation.

In [None]:
#| exports
class Phase(Enum):
    "Pipeline phase number."
    STAGE1 = "stage1"
    STAGE2 = "stage2"
    STAGE3 = "stage3"
    def __str__(self): return self.value

In [None]:
#| exports
class TraceContext(AttrDict):
    "Context for tracing the mapping process"
    def __init__(self, 
                 report_id:str,  # Report identifier
                 phase:Phase,  # Pipeline phase number
                 framework:FrameworkInfo,  # Framework info (name, category, theme_id)
                 ): 
        # self.run_id = str(uuid.uuid4())[:8]  # Short unique ID
        store_attr()

    def __repr__(self):
        return f"TraceContext(report_id={self.report_id}, phase={self.phase}, framework={self.framework})"

In [None]:
#| eval: false
tr_ctx = TraceContext(
    report_id='49d2fba781b6a7c0d94577479636ee6f', 
    phase=Phase.STAGE1, 
    framework=FrameworkInfo(Framework.SRF, FrameworkCat.ENABLERS, "4")
    )

tr_ctx

```
TraceContext(report_id=49d2fba781b6a7c0d94577479636ee6f, phase=stage1, framework={'category': 'Enablers', 'theme_id': '4', 'name': 'SRF'})
```

In [None]:
#| exports
class Synthesis(dspy.Signature):
    "Provide detailed rationale and synthesis of theme analysis."
    trace_ctx: str = dspy.InputField(desc="Trace context")
    theme: str = dspy.InputField(desc="Theme being analyzed")
    all_evidence: str = dspy.InputField(desc="All collected evidence")
    sections_explored: str = dspy.InputField(desc="List of sections explored")
    theme_covered: bool = dspy.OutputField(desc="Final decision on theme coverage")
    confidence_explanation: str = dspy.OutputField(desc="Detailed explanation of confidence score")
    evidence_summary: str = dspy.OutputField(desc="Key evidence supporting the conclusion")
    gaps_identified: str = dspy.OutputField(desc="Any gaps or missing aspects")

## Reasoning & Acting (ReAct)

**Why We Built a Custom Iterative Analyzer Instead of Using DSPy ReAct?**

We could have leveraged DSPy's built-in [`ReAct` module](https://dspy.ai/api/modules/ReAct), which provides an agent-based approach where the LLM automatically decides when and how to use exploration tools. The "ReAct" concept was introduced in [this paper](https://arxiv.org/pdf/2210.03629). However, we chose to implement our own iterative analyzer from scratch for several critical reasons:

- **Open-ended vs. Structured Nature**: DSPy's ReAct is designed for open-ended problem solving where the agent explores freely using available tools. Our use case requires a more structured, methodical approach to document analysis with predictable exploration patterns.

- **Document-Specific Control**: Our approach is tailored specifically for structured document exploration with hierarchical headings, allowing us to implement domain-specific logic for section navigation and content retrieval.

- **Evaluator Requirements**: Since traces will be reviewed by human evaluators for error analysis, we needed explicit, step-by-step decision logging rather than the more implicit reasoning chains that ReAct provides.

In [None]:
#| exports
traces_dir = Path.home() / cfg.dirs.data / cfg.dirs.trace
traces_dir.mkdir(parents=True, exist_ok=True)

In [None]:
#| exports
def setup_logger(name, handler, level=logging.INFO, **kwargs):
    "Helper function to setup a logger with common configuration"
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(level)
    for k,v in kwargs.items(): setattr(logger, k, v)
    return logger

In [None]:
#| exports
def setup_trace_logging(report_id, verbosity=cfg.verbosity):
    "Setup the trace logging (verbosity and report_id)"
    file_handler = logging.FileHandler(traces_dir / f'{report_id}.jsonl', mode='w')
    setup_logger('trace.file', file_handler)    
    console_handler = logging.StreamHandler()
    setup_logger('trace.console', console_handler, verbosity=verbosity)
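
Clearing the handlers before attaching a new one matters in a notebook, where a setup cell is often run several times. The self-contained sketch below illustrates the effect with an in-memory stream; `make_logger` and the logger name `demo.trace` are hypothetical and only mirror the pattern of the helpers above.

```python
import io
import logging

def make_logger(name, stream, **attrs):
    "Configure a named logger with a single stream handler and extra attributes."
    logger = logging.getLogger(name)
    logger.handlers.clear()                      # drop handlers left by a previous setup
    logger.addHandler(logging.StreamHandler(stream))
    logger.setLevel(logging.INFO)
    for k, v in attrs.items():
        setattr(logger, k, v)                    # e.g. a custom `verbosity` attribute
    return logger

buf = io.StringIO()
make_logger("demo.trace", buf, verbosity=2)
make_logger("demo.trace", buf, verbosity=2)      # second setup, as when re-running a cell
logging.getLogger("demo.trace").info("hello")

lines = buf.getvalue().splitlines()
print(lines)
```

Without the `handlers.clear()` call, the second setup would leave two handlers attached and every message would be written twice.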

In [None]:
#| exports
class ThemeAnalyzer(dspy.Module):
    """
    Analyzes a theme across a document by iteratively exploring sections, collecting evidence, and synthesizing findings. 
    Uses a structured pipeline of overview -> exploration -> assessment -> synthesis.
    """
    def __init__(self, 
                 overview_sig:dspy.Signature, # Overview signature
                 exploration_sig:dspy.Signature, # Exploration signature
                 assessment_sig:dspy.Signature, # Assessment signature
                 synthesis_sig:dspy.Signature, # Synthesis signature
                 trace_ctx:TraceContext, # Trace context
                 confidence_threshold:float=0.8, # Confidence threshold
                 max_iter:int=10, # Maximum number of iterations in the ReAct loop
                 semaphore=None # Semaphore for rate limiting
                 ):
        super().__init__()
        self.overview = dspy.ChainOfThought(overview_sig)
        self.explore = dspy.ChainOfThought(exploration_sig)
        self.assess = dspy.ChainOfThought(assessment_sig)
        self.synthesize = dspy.ChainOfThought(synthesis_sig)
        self.max_iter = max_iter
        self.trace_ctx = trace_ctx
        self.confidence_threshold = confidence_threshold
        # Fall back to one call at a time when no shared semaphore is given
        self.semaphore = semaphore or Semaphore(1)

In [None]:
#| exports
@patch
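def _log_trace(self:ThemeAnalyzer, event, **extra_data):
    "Log a trace event with its context to the file logger (JSON) and the console logger (verbosity-based)."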
    file_logger = logging.getLogger('trace.file')
    console_logger = logging.getLogger('trace.console')
    
    base_data = {
        "timestamp": datetime.now().isoformat(),
        "event": event,
        "report_id": self.trace_ctx.report_id,
        "phase": str(self.trace_ctx.phase),
        "framework": str(self.trace_ctx.framework.name),
        "framework_category": str(self.trace_ctx.framework.category),
        "framework_theme_id": str(self.trace_ctx.framework.theme_id),
    }
    base_data.update(extra_data)
    
    # File logger - always full JSON, one object per line so the .jsonl file stays valid
    file_logger.info(json.dumps(base_data))
    
    # Console logger - verbosity-based formatting
    if hasattr(console_logger, 'verbosity'):
        if console_logger.verbosity == 1:
            console_msg = f"{base_data['report_id']} - {base_data['phase']}"
        elif console_logger.verbosity == 2:
            console_msg = f"{base_data['report_id']} - {base_data['phase']} - {base_data['framework']} - {base_data['framework_category']} - {base_data['framework_theme_id']} - {base_data['event']}"
        else:  # verbosity == 3
            console_msg = json.dumps(base_data, indent=2)
        
        console_logger.info(console_msg)
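
Structured traces are mostly useful if they can be read back and filtered during review. The sketch below shows one way this could look, assuming each record is stored as one compact JSON object per line. The `make_record` helper, the `demo_report` id and the field values are hypothetical, and `Phase` is redefined locally so the block runs on its own.

```python
import json
import tempfile
from datetime import datetime
from enum import Enum
from pathlib import Path

class Phase(Enum):
    STAGE1 = "stage1"
    def __str__(self): return self.value

def make_record(event, report_id, phase, **extra):
    "Build one flat trace record; extra keyword arguments are merged in."
    rec = {"timestamp": datetime.now().isoformat(), "event": event,
           "report_id": report_id, "phase": str(phase)}
    rec.update(extra)
    return rec

trace_file = Path(tempfile.mkdtemp()) / "demo_report.jsonl"
demo_records = [
    make_record("Overview", "demo_report", Phase.STAGE1, strategy="start with findings"),
    make_record("Synthesis", "demo_report", Phase.STAGE1, theme_covered=True),
]
with trace_file.open("w") as f:
    for rec in demo_records:
        f.write(json.dumps(rec) + "\n")  # one compact JSON object per line

# Read the trace back and keep only the final decisions
records = [json.loads(line) for line in trace_file.read_text().splitlines()]
synth = [r for r in records if r["event"] == "Synthesis"]
print(synth[0]["phase"], synth[0]["theme_covered"])
```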

In [None]:
#| exports
@patch    
async def _rate_limited_fn(self:ThemeAnalyzer, mod, **kwargs):
    "Call `mod` under the shared semaphore, pausing after non-cached calls to respect rate limits."
    async with self.semaphore:
        start = time.time()
        result = await mod.acall(**kwargs)
        
        # Treat fast responses as cache hits and skip the delay for them
        elapsed = time.time() - start
        if elapsed > cfg.cache.delay: await sleep(cfg.call_delay)
        return result
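
The rate-limiting pattern (a semaphore bounding concurrency, plus a pause that is skipped for fast, presumably cached, responses) can be tried in isolation. This is a sketch under assumptions: `fake_call` is a hypothetical stand-in for an LM call, and the delays are shortened so the demo finishes quickly.

```python
import asyncio
import time

CALL_DELAY = 0.05       # stand-in for cfg.call_delay, shortened for the demo
CACHE_THRESHOLD = 0.01  # stand-in for cfg.cache.delay

async def fake_call(cached):
    "Hypothetical LM call: cached responses return at once, others take a while."
    if not cached:
        await asyncio.sleep(0.02)
    return "cached" if cached else "fresh"

async def rate_limited(sem, cached):
    async with sem:  # bounds the number of calls in flight
        start = time.perf_counter()
        result = await fake_call(cached)
        if time.perf_counter() - start > CACHE_THRESHOLD:
            await asyncio.sleep(CALL_DELAY)  # pause while still holding the slot
        return result

async def main():
    sem = asyncio.Semaphore(2)  # created inside the running event loop
    return await asyncio.gather(*[rate_limited(sem, cached=c)
                                  for c in (True, False, True, False)])

results = asyncio.run(main())  # in a notebook, use `await main()` instead
print(results)
```

Sleeping while the semaphore slot is still held is what spaces out the non-cached calls: a waiting task cannot start until the pause is over.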

In [None]:
#| exports
@patch
async def aforward(
    self:ThemeAnalyzer, 
    theme: str, # The formatted theme to analyze
    headings: dict, # The headings TOC of the document
    get_content_fn:Callable=get_content_tool, # The function to get the content of a section using `hdgs[keys_list].text` for instance
    ) -> Synthesis: # Synthesized analysis results including theme coverage, confidence, evidence and gaps
    "Executes a structured analysis process."
    self._log_trace(event="Starting Analysis", theme=theme)
    priority_sections = await self.get_overview(theme, headings)
    evidence = await self.explore_iteratively(theme, priority_sections, headings, get_content_fn)
    return await self.synthesize_findings(theme, evidence)

In [None]:
#| exports
@patch
async def get_overview(
    self:ThemeAnalyzer, 
    theme: str, # The formatted theme to analyze
    headings: dict, # The headings TOC of the document
    ) -> list: # Ordered list of section keys to explore first
    "Based on framework theme to map and report's TOC determine the sections to explore first."
    overview = await self._rate_limited_fn(self.overview, theme=theme, all_headings=str(headings))
    self._log_trace(
        event="Overview", 
        priority_sections=overview.priority_sections, 
        strategy=overview.strategy)
    return overview.priority_sections

In [None]:
#| exports
@patch
async def explore_iteratively(
    self:ThemeAnalyzer, 
    theme: str, # The formatted theme to analyze
    priority_sections: list, # The sections to explore first
    headings: dict, # The headings TOC of the document
    get_content_fn: Callable, # The function to get the content of a section using `hdgs[keys_list].text` for instance
    ) -> dict:
    "Iteratively explore the sections to collect evidence."
    evidence_collected = []
    sections_explored = []
    available_sections = priority_sections.copy()
    
    for i in range(self.max_iter):
        if not available_sections:
            self._log_trace(event="Iterative Exploration", iteration_nb=i+1, decision="No more sections to explore, stopping")
            break
            
        if await self.should_stop_exploring(theme, evidence_collected, sections_explored):   
            break
        
        decision = await self.make_exploration_decision(theme, evidence_collected, available_sections)
        self._log_trace(
            event="Iterative Exploration", 
            iteration_nb=i+1, 
            decision=decision.next_section, 
            reasoning=decision.reasoning)
        
        if decision.next_section == 'DONE':
            self._log_trace(event="Iterative Exploration", iteration_nb=i+1, decision="Done")
            break
        
        evidence_collected, sections_explored = self.process_section(decision, 
                                                                     headings, 
                                                                     get_content_fn, 
                                                                     evidence_collected, 
                                                                     sections_explored, 
                                                                     available_sections)
    
    return {"evidence": evidence_collected, "sections": sections_explored}
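
Stripped of LM calls and logging, the control flow of this loop can be sketched with plain functions. In the sketch below, `decide` and `is_sufficient` are hypothetical stand-ins for the exploration and assessment modules, and `contents` is a flat dict standing in for the report sections.

```python
def explore(priority_sections, contents, decide, is_sufficient, max_iter=10):
    "Collect evidence section by section until a stop condition is met."
    evidence, explored = [], []
    available = list(priority_sections)
    for _ in range(max_iter):
        if not available:
            break                      # nothing left to read
        if evidence and is_sufficient(evidence):
            break                      # enough evidence already collected
        nxt = decide(evidence, available)
        if nxt == "DONE":
            break                      # the decider chose to stop
        if nxt in contents:            # only sections that resolve are recorded
            evidence.append(contents[nxt])
            explored.append(nxt)
            if nxt in available:
                available.remove(nxt)
    return {"evidence": evidence, "sections": explored}

contents = {"Intro": "background", "Findings": "data systems improved", "Annex": "tables"}
out = explore(
    ["Findings", "Intro", "Annex"], contents,
    decide=lambda ev, avail: avail[0],      # always pick the first remaining section
    is_sufficient=lambda ev: len(ev) >= 2,  # stop once two sections have been read
)
print(out["sections"])
```

The loop has four exits: no sections left, sufficient evidence, an explicit `DONE`, or the `max_iter` cap, which bounds the number of LM calls per theme.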


In [None]:
#| exports
@patch
async def make_exploration_decision(
    self:ThemeAnalyzer, 
    theme: str, # The formatted theme to analyze
    evidence_collected: list, # The evidence collected so far
    available_sections: list # The sections to explore
    ):    
    "Make a decision on the next section to explore."
    decision = await self._rate_limited_fn(
        self.explore, 
        theme=theme, 
        current_findings="\n\n".join(evidence_collected) if evidence_collected else "No evidence collected yet",
        available_sections=str(available_sections))
    
    return decision


In [None]:
#| exports
@patch
async def should_stop_exploring(
    self:ThemeAnalyzer, 
    theme: str, # The formatted theme to analyze
    evidence_collected: list, # The evidence collected so far
    sections_explored: list # The sections explored so far
    ):
    "Check if the exploration should stop based on the evidence collected and the sections explored."
    if not evidence_collected:
        return False
    assessment = await self._rate_limited_fn(
        self.assess, 
        theme=theme,
        evidence_so_far="\n\n".join(evidence_collected),
        sections_explored=str(sections_explored)
    )
    
    self._log_trace(
        "Should stop exploring", 
        assessment=assessment.sufficient, 
        confidence=assessment.confidence_score,
        reasoning=assessment.reasoning
        )
    
    return assessment.sufficient and assessment.confidence_score > self.confidence_threshold
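
The stopping rule combines two signals, and the comparison with the threshold is strict. The following sketch isolates that rule; `should_stop` is a hypothetical pure-function version and the cases are made-up values.

```python
def should_stop(sufficient, confidence, threshold=0.8):
    "Stop only when evidence is judged sufficient AND confidence exceeds the threshold."
    return bool(sufficient) and confidence > threshold

cases = [(True, 0.9), (True, 0.8), (False, 0.95)]
decisions = [should_stop(s, c) for s, c in cases]
print(decisions)
```

A confidence exactly equal to the threshold does not stop the exploration, and a high confidence alone is not enough when the evidence is judged insufficient.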

In [None]:
#| exports
@patch
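def process_section(self:ThemeAnalyzer, decision, headings, get_content_fn, evidence_collected, sections_explored, available_sections):
    "Fetch the content of the chosen section and record it as evidence; sections with no matching path are skipped."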
    path = find_section_path(headings, decision.next_section)
    
    if path:
        content = get_content_fn(headings, path)
        evidence_collected.append(f"# Section: {decision.next_section}\n## Content\n{content}")
        sections_explored.append(decision.next_section)
        if decision.next_section in available_sections:
            available_sections.remove(decision.next_section)
    else:
        # No path found for section! TBD
        pass
    
    return evidence_collected, sections_explored

In [None]:
#| exports
@patch
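async def synthesize_findings(self:ThemeAnalyzer, theme, evidence):
    "Synthesize the collected evidence into a final coverage decision, tagged with framework metadata."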
    synthesis = await self._rate_limited_fn(
        self.synthesize, 
        trace_ctx=str(self.trace_ctx),
        theme=theme,
        all_evidence="\n\n".join(evidence["evidence"]),
        sections_explored=str(evidence["sections"])
    )
    
    self._log_trace("Synthesis", 
                    theme=theme, 
                    reasoning=synthesis.reasoning,
                    theme_covered=synthesis.theme_covered,
                    confidence_explanation=synthesis.confidence_explanation,
                    evidence_summary=synthesis.evidence_summary,
                    gaps_identified=synthesis.gaps_identified
                    )
    
    synthesis.framework_name = self.trace_ctx.framework.name
    synthesis.framework_category = self.trace_ctx.framework.category  
    synthesis.framework_theme_id = self.trace_ctx.framework.theme_id
    return synthesis

To use it:

#### Single theme

Set up the trace logging (verbosity and report_id):

In [None]:
#| eval: false
setup_trace_logging(report_id="49d2fba781b6a7c0d94577479636ee6f", verbosity=2)

In [None]:
#| eval: false
# Number of concurrent requests to the LM to avoid rate limiting
stage1_semaphore = Semaphore(3)

Create the analyzer:

In [None]:
#| eval: false
print(f"Trace Context:\n{tr_ctx}")
theme = format_enabler_theme(eval_data.srf_enablers[3])  # "Data and evidence"
print(f"Test theme:\n{theme}")
analyzer = ThemeAnalyzer(Overview, Exploration, Assessment, Synthesis, tr_ctx, semaphore=stage1_semaphore)

Then analyze the framework's theme of choice:

In [None]:
#| eval: false
result = await analyzer.acall(theme, hdgs, get_content_tool)

49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 4 - Starting Analysis
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 4 - Overview
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 4 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 4 - Should stop exploring
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 4 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 4 - Should stop exploring
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 4 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 4 - Should stop exploring
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 4 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 4 - Should stop exploring
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 4 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers -

#### Multiple themes in parallel

In [None]:
#| eval: false
setup_trace_logging(report_id="49d2fba781b6a7c0d94577479636ee6f", verbosity=2)

Let's use two SRF enablers:

In [None]:
#| eval: false
tr_ctx1 = TraceContext(
    report_id='49d2fba781b6a7c0d94577479636ee6f', 
    phase=Phase.STAGE1, 
    framework=FrameworkInfo(Framework.SRF, FrameworkCat.ENABLERS, "1")
)
tr_ctx2 = TraceContext(
    report_id='49d2fba781b6a7c0d94577479636ee6f', 
    phase=Phase.STAGE1, 
    framework=FrameworkInfo(Framework.SRF, FrameworkCat.ENABLERS, "4")
    )   
print(tr_ctx1, tr_ctx2)

Create analyzers with shared semaphore:

In [None]:
#| eval: false
stage_semaphore = Semaphore(3)
analyzer1 = ThemeAnalyzer(Overview, Exploration, Assessment, Synthesis, tr_ctx1,semaphore=stage_semaphore)
analyzer2 = ThemeAnalyzer(Overview, Exploration, Assessment, Synthesis, tr_ctx2,semaphore=stage_semaphore)

In [None]:
#| eval: false
theme1 = format_enabler_theme(eval_data.srf_enablers[0]) # Workforce
theme2 = format_enabler_theme(eval_data.srf_enablers[3]) # Data and evidence
print(f"Theme 1:\n{theme1}\n\nTheme 2:\n{theme2}")

In [None]:
#| eval: false
results = await gather(
    analyzer1.acall(theme1, hdgs, get_content_tool),
    analyzer2.acall(theme2, hdgs, get_content_tool)
)

49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 1 - Starting Analysis
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 1 - Overview
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 1 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 1 - Should stop exploring
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 1 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 1 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 1 - Synthesis
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 4 - Starting Analysis
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 4 - Overview
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 4 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 4 - Should stop exploring
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 4 - Iterative Exploration
49

In [None]:
#| eval: false
results

[Prediction(
     reasoning='The provided evidence touches upon several aspects related to the "Workforce" enabler theme. The evaluation highlights capacity building efforts for stakeholders, increased data availability for policymaking, and the importance of IOM\'s work in supporting migrants and returnees. The recommendations also emphasize the need to enhance capacity and ownership among national, regional, and local stakeholders, as well as building partnerships with service providers. While the evidence doesn\'t directly address internal IOM workforce development, the focus on capacity building of external stakeholders and the need for continued support suggests an indirect link to the theme. The discussion of staff turnover impacting capacity building also hints at workforce-related challenges. However, the primary focus is on external stakeholders rather than IOM\'s internal workforce. Therefore, while there are tangential connections, the evidence doesn\'t fully cover the theme

## Pipeline Orchestrator

In [None]:
#| exports
class PipelineResults(dict):
    "Results per pipeline stage, nested as framework -> category -> theme id."
    def __init__(self):
        super().__init__()
        self[Phase.STAGE1] = defaultdict(lambda: defaultdict(dict))
        self[Phase.STAGE2] = defaultdict(lambda: defaultdict(dict))
        self[Phase.STAGE3] = defaultdict(lambda: defaultdict(dict))

In [None]:
#| exports
@patch
def __call__(self:PipelineResults, stage=Phase.STAGE1, filter_type="all"):
    "Return the themes of a stage, optionally filtered by coverage ('all', 'covered' or 'uncovered')."
    themes = []
    for frameworks in self[stage].values():
        for categories in frameworks.values():
            for theme in categories.values():
                if filter_type == "all" or \
                   (filter_type == "covered" and theme.theme_covered) or \
                   (filter_type == "uncovered" and not theme.theme_covered):
                    themes.append(theme)
    return themes
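
The nesting used for the results (framework, then category, then theme id) and the coverage filter can be illustrated with plain objects. In this sketch the `SimpleNamespace` results, the category labels and the coverage values are all hypothetical, and `select` is a standalone variant of the filtering logic.

```python
from collections import defaultdict
from types import SimpleNamespace

# One stage of results: framework -> category -> theme id -> result
stage1 = defaultdict(lambda: defaultdict(dict))
stage1["SRF"]["Enablers"]["1"] = SimpleNamespace(theme_id="1", theme_covered=False)
stage1["SRF"]["Enablers"]["4"] = SimpleNamespace(theme_id="4", theme_covered=True)
stage1["SRF"]["Crosscutting"]["2"] = SimpleNamespace(theme_id="2", theme_covered=True)

def select(stage_results, filter_type="all"):
    "Flatten the nested results, keeping themes that match the coverage filter."
    keep = {
        "all": lambda t: True,
        "covered": lambda t: t.theme_covered,
        "uncovered": lambda t: not t.theme_covered,
    }[filter_type]
    return [t for cats in stage_results.values()
              for themes in cats.values()
              for t in themes.values() if keep(t)]

covered_ids = [t.theme_id for t in select(stage1, "covered")]
print(covered_ids)
```

The nested `defaultdict` means a result can be stored with a single chained assignment, without creating the intermediate levels first.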

In [None]:
#| exports
class PipelineOrchestrator:
    "Orchestrator for the IOM evaluation report mapping pipeline"
    def __init__(self, 
                 report_id:str, # Report identifier
                 headings:dict, # Report headings
                 get_content_fn:Callable, # Function to get the content of a section
                 eval_data:EvalData, # Evaluation data
                 verbosity:int=2, # Verbosity level
                 ):
        store_attr()
        setup_trace_logging(report_id, verbosity)
        self.results = PipelineResults()

In [None]:
#| exports
@patch
async def run_stage1(self:PipelineOrchestrator, semaphore):
    "Run stage 1 of the pipeline"
    analyzers = []
    
    collections = [
        (self.eval_data.srf_enablers, FrameworkCat.ENABLERS, format_enabler_theme),
        (self.eval_data.srf_crosscutting_priorities, FrameworkCat.CROSSCUT, format_crosscutting_theme)
    ]

    for items, framework_cat, format_fn in collections:
        for item in items:
            trace_ctx = TraceContext(self.report_id, Phase.STAGE1, FrameworkInfo(Framework.SRF, framework_cat, item.id))
            theme = format_fn(item)
            analyzer = ThemeAnalyzer(Overview, Exploration, Assessment, Synthesis, trace_ctx, semaphore=semaphore)
            analyzers.append((analyzer, theme))

    results = await gather(*[analyzer.acall(theme, self.headings, self.get_content_fn) 
                             for analyzer, theme in analyzers])
    for result in results: 
        self.results[Phase.STAGE1][result.framework_name][result.framework_category][result.framework_theme_id] = result
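
The fan-out and collection pattern of this stage can be sketched without any LM. Below, `analyze` is a hypothetical stand-in for a theme analyzer that returns a result tagged with its framework metadata; the category labels, ids and coverage values are invented for the example.

```python
import asyncio
from collections import defaultdict
from types import SimpleNamespace

async def analyze(category, theme_id, sem):
    "Hypothetical analyzer: returns a result tagged with its framework metadata."
    async with sem:
        await asyncio.sleep(0.01)  # stands in for the LM round-trips
        return SimpleNamespace(framework_name="SRF", framework_category=category,
                               framework_theme_id=theme_id,
                               theme_covered=(theme_id == "4"))

async def run_stage(collections, sem):
    "Launch one task per theme, then file each result under its own metadata."
    tasks = [analyze(cat, tid, sem) for cat, ids in collections for tid in ids]
    store = defaultdict(lambda: defaultdict(dict))
    for r in await asyncio.gather(*tasks):
        store[r.framework_name][r.framework_category][r.framework_theme_id] = r
    return store

collections = [("Enablers", ["1", "4"]), ("Cross-cutting", ["1"])]

async def main():
    return await run_stage(collections, asyncio.Semaphore(2))

store = asyncio.run(main())  # in a notebook, use `await main()` instead
print(sorted(store["SRF"]))
```

Because each result carries its own framework metadata, the storage step does not depend on the order in which the tasks were created.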

In [None]:
#| eval: false
report_id = "49d2fba781b6a7c0d94577479636ee6f"
hdgs = create_heading_dict(report)
get_content_fn = get_content_tool
eval_data = IOMEvalData()

orchestrator = PipelineOrchestrator(report_id, hdgs, get_content_fn, eval_data)

In [None]:
#| eval: false
await orchestrator.run_stage1(Semaphore(1))

49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 1 - Starting Analysis
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 1 - Overview
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 1 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 1 - Should stop exploring
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 1 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 1 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 1 - Synthesis
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 2 - Starting Analysis
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 2 - Overview
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 2 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 2 - Should stop exploring
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 2 - Iterative Exploration
49

49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 5 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 5 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 5 - Synthesis
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 6 - Starting Analysis
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 6 - Overview
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 6 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 6 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 6 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 6 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 6 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 6 - Iterative Exploration
49d2fba781b6a7c0d94577479636ee6f - stage1 - SRF - Enablers - 6 - Iterati

In [None]:
#| eval: false
orchestrator.results(Phase.STAGE1, filter_type="covered")

[Prediction(
     reasoning='The evidence suggests that the JI-HoA program actively fostered partnerships with various stakeholders, including governments, other international organizations, and service providers. The program addressed the needs of governments by providing capacity building and tools for migration management. It also collaborated with other organizations to avoid duplication and create complementary support initiatives. The program\'s alignment with regional frameworks like IGAD and the African Union further demonstrates its commitment to partnerships. While some partners expressed concerns about IOM\'s guidance, the overall assessment indicates a strong emphasis on building and maintaining partnerships. The evidence supports the theme of "Partnership" as an enabler for the program\'s success.',
     theme_covered=True,
     confidence_explanation="High confidence. The evaluation report explicitly mentions the importance of partnerships and provides several examples of

**Stage 2 [TODO]**

In [None]:
# gcm_small = IOMEvalData().gcm_objectives_small

In [None]:
# gcm_small

[EvalDict({'id': '1', 'title': 'Collect and utilize accurate and disaggregated dat...', 'core_theme': 'Strengthen global evidence base on migration throu...', 'key_principles': [4 items], 'target_groups': [4 items], ...}),
 EvalDict({'id': '2', 'title': 'Minimize the adverse drivers and structural factor...', 'core_theme': 'Address root causes of migration through sustainab...', 'key_principles': [4 items], 'target_groups': [4 items], ...}),
 EvalDict({'id': '3', 'title': 'Provide accurate and timely information at all sta...', 'core_theme': 'Ensure migrants have access to reliable informatio...', 'key_principles': [4 items], 'target_groups': [4 items], ...}),
 EvalDict({'id': '4', 'title': 'Ensure that all migrants have proof of legal ident...', 'core_theme': 'Strengthen civil registry systems and ensure migra...', 'key_principles': [4 items], 'target_groups': [4 items], ...}),
 EvalDict({'id': '5', 'title': 'Enhance availability and flexibility of pathways f...', 'core_theme': 'Expan

In [None]:
# print("GCM Small structure:")
# print(f"Type: {type(gcm_small)}")
# print(f"Keys: {list(gcm_small.keys())[:5]}...")  # First 5 keys
# print(f"Sample entry: {gcm_small[list(gcm_small.keys())[0]]}")

In [None]:
# def format_gcm_theme(
#     theme: EvalData, # The GCM theme object
#     stage1_context: str = "" # Context from Stage 1 covered themes
#     ) -> str: # The formatted theme string
#     "Format GCM objective into structured text for LM processing with Stage 1 context."
#     parts = [
#         f'## GCM Objective {theme.id}: {theme.title}',
#         '### Core Theme', 
#         theme.core_theme
#     ]
    
#     if theme.key_principles:
#         parts.extend(['### Key Principles', ', '.join(theme.key_principles)])
    
#     if theme.target_groups:
#         parts.extend(['### Target Groups', ', '.join(theme.target_groups)])
        
#     if theme.main_activities:
#         parts.extend(['### Main Activities', ', '.join(theme.main_activities)])
        
#     if stage1_context:
#         parts.extend(['### Stage 1 Context', stage1_context])
    
#     return '\n'.join(parts)

In [None]:
# format_gcm_theme(gcm_small["1"])