# Pipeline

> End-to-end pipeline for processing IOM evaluation reports


The `Report` class orchestrates the full evaluation pipeline:

1. **Download** - Fetch PDFs from IOM evaluation repository
2. **OCR** - Convert PDFs to markdown with heading hierarchy
3. **Extract** - Pull core sections from converted documents
4. **Map** - Identify which standardized framework themes are central to each report, including the [Strategic Results Framework (SRF)](https://srf.iom.int) (Enablers, Cross-cutting Priorities, and Outputs) and [Global Compact for Migration](https://www.un.org/en/development/desa/population/migration/generalassembly/docs/globalcompact/A_RES_73_195.pdf) objectives

In [None]:
#| default_exp pipeline

In [None]:
#| export
from fastcore.all import *
from pathlib import Path
import logging
import sys  # needed for `stream=sys.stdout` in logging.basicConfig below
from iomeval.core import n_tokens, load_prompt
from iomeval.readers import load_evals, find_eval, Evaluation
from iomeval.downloaders import download_eval
from iomeval.extract import extract_sections
from iomeval.themes import load_enablers, load_ccp, load_gcms, load_srf_outs, load_gcm_lut, fmt_enablers_ccp, fmt_srf_outs, get_srf_outs
from iomeval.mapper import mk_system_blocks, map_themes, sort_by_centrality, get_top_ids, parse_json_response
from mistocr.core import read_pgs
from mistocr.pipeline import pdf_to_md
import json

logging.root.handlers.clear()
logging.basicConfig(level=logging.INFO, format='%(message)s', stream=sys.stdout, force=True)

## Report class

The `Report` class wraps an `Evaluation` and provides methods for the four pipeline stages: download → ocr → extract → map

In [None]:
#| export
class Report:
    "An evaluation report with full pipeline support"
    def __init__(self,
                 ev:Evaluation,                   # The evaluation metadata object
                 pdf_url:str=None,                # Optional direct URL to PDF
                 results_path:str='data/results'  # Path to save/load results
                ):
        store_attr('ev,pdf_url,results_path')
        self.id = ev.id
        self.pdf_path,self.md_path,self.sections,self.mappings = None,None,None,{}
        self._load_existing()
    
    def _load_existing(self):
        "Load state from saved JSON if it exists"
        p = Path(self.results_path)/f'{self.id}.json'
        if not p.exists(): return
        data = json.loads(p.read_text())
        self.sections,self.mappings = data.get('sections'),data.get('mappings', {})
        if data.get('pdf_path'): self.pdf_path = Path(data['pdf_path'])
        if data.get('md_path'): self.md_path = Path(data['md_path'])
    
    @classmethod
    def from_url(cls,
                 url:str,                         # URL of the evaluation PDF
                 evals:list,                      # List of `Evaluation` objects to search
                 results_path:str='data/results'  # Path to save/load results
                ): return cls(find_eval(evals, url, by='url'), pdf_url=url, results_path=results_path)
    
    @classmethod
    def from_title(cls,
                   title:str,                      # Title to search for
                   evals:list,                     # List of `Evaluation` objects to search
                   results_path:str='data/results' # Path to save/load results
                  ): return cls(find_eval(evals, title, by='title'), results_path=results_path)

In [None]:
#| export
@patch
def _repr_markdown_(self:Report):
    "Display report metadata and processing status in Jupyter notebooks"
    title = self.ev.meta.get('Title', 'Untitled')
    year = self.ev.meta.get('Year', 'n/a')
    org = self.ev.meta.get('Evaluation Commissioner', 'Unknown')
    
    status = []
    if self.pdf_path: status.append('✓ PDF downloaded')
    if self.md_path: status.append('✓ Markdown converted')
    if self.sections: status.append(f'✓ Sections extracted (~{n_tokens(self.sections)} tokens)')
    if self.mappings:
        mapped = ', '.join(self.mappings.keys())
        status.append(f'✓ Mappings: {mapped}')
    status_str = ' | '.join(status) if status else 'Not processed'
    
    return f"""
## Report: {title}
**Year:** {year} | **Organization:** {org}  
**ID:** `{self.id}`

**Processing Status:**  
{status_str}

**Documents:** {len(self.ev.docs)} available
"""

#### Creating reports

Create a report from a URL:

In [None]:
#| eval: false
evals = load_evals('files/test/evaluations.json')
url = "https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/Abridged%20Evaluation%20Report_%20Final_Olta%20NDOJA.pdf"
report = Report.from_url(url, evals, results_path='files/test/results')
report


## Report: Final Evaluation of the EU-IOM Joint Initiative for migrant protection and reintegration in the horn of Africa
**Year:** 2023 | **Organization:** IOM  
**ID:** `49d2fba781b6a7c0d94577479636ee6f`

**Processing Status:**  
✓ PDF downloaded

**Documents:** 5 available


Or from a title:

In [None]:
#| eval: false
title = 'Evaluation of IOM Accountability to Affected Populations'
report = Report.from_title(title, evals, results_path='files/test/results')
report


## Report: Evaluation of IOM Accountability to Affected Populations
**Year:** 2025 | **Organization:** IOM  
**ID:** `6c3c2cf3fa479112967612b0baddab72`

**Processing Status:**  
Not processed

**Documents:** 4 available


::: {.callout-note}
When a report is created from a title rather than a URL and several PDFs are available for the evaluation, `ocr()` and the downstream methods process the first PDF found in the download directory.
:::
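
The selection rule in the note above can be sketched as follows. `pick_pdf` is a hypothetical helper, not part of `iomeval`; it assumes the URL's file name matches the downloaded file name. The `sorted` call is an addition of this sketch: glob order depends on the filesystem, so sorting makes the "first PDF" choice reproducible.

```python
from pathlib import Path
from tempfile import TemporaryDirectory

def pick_pdf(pdf_dir, pdf_url=None):
    "Hypothetical helper: use the URL's file name if known, else the first *.pdf found"
    pdf_dir = Path(pdf_dir)
    if pdf_url: return pdf_dir/Path(pdf_url).name
    # Sorting is an assumption of this sketch, added to make the choice deterministic
    return next(iter(sorted(pdf_dir.glob('*.pdf'))), None)

with TemporaryDirectory() as d:
    for name in ['b_report.pdf', 'a_brief.pdf', 'notes.txt']: (Path(d)/name).touch()
    from_url = pick_pdf(d, 'https://example.org/docs/b_report.pdf').name  # explicit choice
    from_dir = pick_pdf(d).name                                          # fallback choice
```

To process a specific document of a multi-PDF evaluation, creating the report with `Report.from_url` avoids relying on directory order.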

## Persistence

Reports automatically save after each pipeline stage. Use `load_report` to resume from any checkpoint.

In [None]:
#| export
@patch
def save(self:Report,
         path:str=None  # Override default results path
        ) -> Report:
    "Save report state to JSON"
    p = Path(path or self.results_path)/f'{self.id}.json'
    p.parent.mkdir(parents=True, exist_ok=True)
    data = dict(id=self.id, ev_meta=self.ev.meta, ev_docs=self.ev.docs, sections=self.sections, mappings=self.mappings,
                pdf_path=str(self.pdf_path) if self.pdf_path else None, md_path=str(self.md_path) if self.md_path else None,
                results_path=self.results_path)
    p.write_text(json.dumps(data, indent=2))
    return self

In [None]:
#| export
def load_report(id:str,                  # Report ID (hash)
                path:str='data/results'  # Results directory
               ) -> Report:
    "Load a saved Report by id"
    data = json.loads((Path(path)/f'{id}.json').read_text())
    ev = Evaluation(id=data['id'], meta=data['ev_meta'], docs=data['ev_docs'])
    report = Report(ev, results_path=data.get('results_path', path))
    report.sections,report.mappings = data['sections'],data['mappings']
    if data.get('pdf_path'): report.pdf_path = Path(data['pdf_path'])
    if data.get('md_path'): report.md_path = Path(data['md_path'])
    return report

#### Resuming from checkpoint

In [None]:
#| eval: false
report = load_report('49d2fba781b6a7c0d94577479636ee6f', path='files/test/results')
report


## Report: Final Evaluation of the EU-IOM Joint Initiative for migrant protection and reintegration in the horn of Africa
**Year:** 2023 | **Organization:** IOM  
**ID:** `49d2fba781b6a7c0d94577479636ee6f`

**Processing Status:**  
✓ PDF downloaded

**Documents:** 5 available
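
The checkpoint is plain JSON, so the save and resume round trip can be illustrated without the library. This is a minimal sketch: `save_state` and `load_state` are hypothetical names, and the state dict carries only a subset of the fields that `save` writes.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def save_state(state, results_path):
    "Write `state` to <results_path>/<id>.json, creating the directory if needed"
    p = Path(results_path)/f"{state['id']}.json"
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(json.dumps(state, indent=2))
    return p

def load_state(id, results_path):
    "Read a checkpoint back; raises FileNotFoundError if none exists"
    return json.loads((Path(results_path)/f'{id}.json').read_text())

with TemporaryDirectory() as d:
    # Paths are stored as strings because JSON cannot serialize `Path` objects
    state = dict(id='demo', sections=None, mappings={}, pdf_path='data/pdfs/demo', md_path=None)
    save_state(state, Path(d)/'results')
    restored = load_state('demo', Path(d)/'results')
    try:
        load_state('missing', Path(d)/'results')
        found_missing = True
    except FileNotFoundError:
        found_missing = False
```

A missing checkpoint surfaces as `FileNotFoundError`, which callers can catch to fall back to a fresh report.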


## Pipeline Methods

The pipeline has four main stages: `download` → `ocr` → `extract` → `map_*`
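
Each stage follows the same pattern: skip if the result already exists (unless forced), check that the previous stage ran, do the work, and return `self` so calls can be chained. A toy sketch of that pattern, where `ToyReport` is a hypothetical stand-in and both stages are synchronous for brevity (the real `ocr` is `async`):

```python
class ToyReport:
    "Hypothetical stand-in for `Report`: each stage skips work already done unless forced"
    def __init__(self): self.pdf_path, self.md_path, self.calls = None, None, []

    def download(self, force=False):
        if self.pdf_path and not force: return self   # checkpoint hit: nothing to do
        self.calls.append('download')
        self.pdf_path = 'pdfs/demo'
        return self                                   # returning self allows chaining

    def ocr(self, force=False):
        if self.md_path and not force: return self
        if self.pdf_path is None: raise ValueError("Call download() first")
        self.calls.append('ocr')
        self.md_path = 'md/demo'
        return self

r = ToyReport().download().ocr()   # first pass runs both stages
r.download().ocr()                 # second pass is a no-op
r.ocr(force=True)                  # forcing re-runs only that stage
```

After these calls `r.calls` records one download and two OCR runs.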


### Download

Downloads the evaluation PDF from IOM's repository.

In [None]:
#| export
@patch
def download(self:Report,
             dst:str='data/pdfs',  # Destination directory for PDFs
             force:bool=False     # Force re-download
            ) -> Report:
    "Download evaluation PDF to `dst`/`eval_id`/"
    if self.pdf_path and not force: return self
    self.pdf_path = download_eval(self.ev, dst=dst)
    self.save(self.results_path)
    return self

In [None]:
#| eval: false
url = "https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/Abridged%20Evaluation%20Report_%20Final_Olta%20NDOJA.pdf"
report = Report.from_url(url, evals, results_path='files/test/results')
_ = report.download(dst='files/test/pdfs')

In [None]:
#| eval: false
Path('files/test/pdfs/49d2fba781b6a7c0d94577479636ee6f').ls()

(#5) [Path('files/test/pdfs/49d2fba781b6a7c0d94577479636ee6f/Evaluation%20Learning%20Brief_Final_Olta%20NDOJA.pdf'),Path('files/test/pdfs/49d2fba781b6a7c0d94577479636ee6f/Abridged%20Evaluation%20Report_%20Final_Olta%20NDOJA.pdf'),Path('files/test/pdfs/49d2fba781b6a7c0d94577479636ee6f/ISP_IOM_Case-Management-Return-Reintegr-JI-Review_final.pdf'),Path('files/test/pdfs/49d2fba781b6a7c0d94577479636ee6f/Final%20Evaluation%20Report%20Final_Olta%20NDOJA.pdf'),Path('files/test/pdfs/49d2fba781b6a7c0d94577479636ee6f/HoA%20EU%20JI%20Final%20Eval%20-%20Management%20Response%20Matrix%20-%20Final.pdf')]

### OCR

Runs OCR on the PDF using Mistral's API and converts to markdown with proper heading hierarchy.

In [None]:
#| export
@patch
async def ocr(self:Report,
              dst:str='data/md',       # Destination directory for markdown files
              add_img_desc:bool=True,  # Whether to add image descriptions
              force:bool=False,        # Force re-OCR              
              **kwargs                 # Additional args passed to pdf_to_md
             ) -> Report:
    "Run OCR on PDF and fix heading hierarchy"
    if self.md_path and not force: return self
    if self.pdf_path is None: raise ValueError("Call download() first")
    if self.pdf_url: pdf_file = self.pdf_path/Path(self.pdf_url).name
    else: pdf_file = first(self.pdf_path.glob('*.pdf'))
    await pdf_to_md(pdf_file, Path(dst)/self.id, add_img_desc=add_img_desc, **kwargs)
    self.md_path = Path(dst)/self.id
    self.save(self.results_path)
    return self

In [None]:
#| eval: false
await report.ocr(dst='files/test/md', add_img_desc=False)


## Report: Final Evaluation of the EU-IOM Joint Initiative for migrant protection and reintegration in the horn of Africa
**Year:** 2023 | **Organization:** IOM  
**ID:** `49d2fba781b6a7c0d94577479636ee6f`

**Processing Status:**  
✓ PDF downloaded | ✓ Markdown converted

**Documents:** 5 available


In [None]:
#| eval: false
report.md_path

Path('files/test/md/49d2fba781b6a7c0d94577479636ee6f')

In [None]:
#| eval: false
report.md_path.ls()[:2]

(#2) [Path('files/test/md/49d2fba781b6a7c0d94577479636ee6f/page_18.md'),Path('files/test/md/49d2fba781b6a7c0d94577479636ee6f/page_26.md')]
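
The listing above is in filesystem order, not page order. For inspecting pages by hand, the files can be sorted on the number in their name. This is only a sketch based on the `page_N.md` naming seen in the output; `page_no` is a hypothetical helper and makes no claim about how `read_pgs` orders pages internally.

```python
import re
from pathlib import Path
from tempfile import TemporaryDirectory

def page_no(p):
    "Extract the integer N from a `page_N.md` file name"
    return int(re.search(r'page_(\d+)', Path(p).stem).group(1))

with TemporaryDirectory() as d:
    for n in (18, 26, 2, 1): (Path(d)/f'page_{n}.md').touch()
    # A plain string sort would put page_18 before page_2; sort numerically instead
    ordered = [p.name for p in sorted(Path(d).glob('page_*.md'), key=page_no)]
```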

### Extract

Extracts key sections (executive summary, findings, recommendations, conclusions) from the markdown.


In [None]:
#| export
@patch
def extract(self:Report, 
            force:bool=False, # Force re-extraction
            **kwargs          # Additional args passed to extract_sections
            ) -> Report:
    "Extract core sections from markdown"
    if self.sections and not force: return self
    if self.md_path is None: raise ValueError("Call ocr() first")
    md = read_pgs(self.md_path)
    self.sections = extract_sections(md, **kwargs)
    self.save(self.results_path)
    return self

In [None]:
#| eval: false
report.extract()


## Report: Final Evaluation of the EU-IOM Joint Initiative for migrant protection and reintegration in the horn of Africa
**Year:** 2023 | **Organization:** IOM  
**ID:** `49d2fba781b6a7c0d94577479636ee6f`

**Processing Status:**  
✓ PDF downloaded | ✓ Markdown converted | ✓ Sections extracted (~14858 tokens)

**Documents:** 5 available


In [None]:
#| eval: false
print(report.sections[:1000])

## 1. Introduction  ... page 4

In 2016, the EU and IOM launched the EU-IOM Joint Initiative for Migrant Protection and Reintegration, with as overall objective "To contribute to facilitating orderly, safe, regular and rights-based migration through the facilitation of dignified voluntary return and the implementation of development-focused and sustainable reintegration policies and processes". The EU-IOM Joint Initiative in the Horn of Africa (JI-HOA) ${ }^{1}$ commenced in March 2017 in the Khartoum Process countries, with a focus on Djibouti, Ethiopia, Somalia, and Sudan. ${ }^{2}$ The programme was coordinated by a Regional Coordination Unit (RCU) based in the IOM Regional Office for the East and Horn of Africa region (Nairobi, Kenya).

In accordance with the programme planning, the JI-HoA underwent a Mid-Term Evaluation in 2019 and a Final Independent Evaluation in 2022/2023, covering the 2017-2022 period, conducted by PPMI Group and commissioned by IOM. The evaluation covered the

## Thematic mapping

Map extracted sections to IOM's strategic frameworks (SRF and GCM). Each mapping method can be run independently after `extract()`.

In [None]:
#| export
@patch
def _ensure_sys_blocks(self:Report):
    if self.sections is None: raise ValueError("Call extract() first")
    if not hasattr(self, '_sys_blocks'): self._sys_blocks = mk_system_blocks(self.sections)

In [None]:
#| export
def _map_single(sys_blocks,                 # System blocks from mk_system_blocks
                theme_type,                 # One of: 'enablers', 'ccps', 'gcm', 'outputs'
                path=None,                  # Path to theme files
                model='claude-haiku-4-5',   # Model to use for mapping
                gcm_ids=None                # GCM IDs for output mapping
               ):
    "Map system blocks (Report) to a single theme type using appropriate prompts and formatting"
    if theme_type == 'enablers': res = map_themes(sys_blocks, fmt_enablers_ccp(load_enablers(path)), load_prompt('srf_enablers'), model)
    elif theme_type == 'ccps': res = map_themes(sys_blocks, fmt_enablers_ccp(load_ccp(path)), load_prompt('srf_ccps'), model)
    elif theme_type == 'gcm': res = map_themes(sys_blocks, load_gcms(path), load_prompt('gcm'), model)
    elif theme_type == 'outputs':
        srf_obj, gcm_lut = load_srf_outs(path), load_gcm_lut(path)
        output_ids = get_srf_outs(gcm_lut, gcm_ids)
        res = map_themes(sys_blocks, fmt_srf_outs(srf_obj, output_ids), load_prompt('srf_outputs'), model)
    else: raise ValueError(f"Unknown theme_type: {theme_type!r}")
    return parse_json_response(res)

### Map enablers

Maps to Strategic Results Framework enablers (organizational capabilities).


In [None]:
#| export
@patch
def map_enablers(self:Report, 
                 force:bool=False, # Re-run even if already completed
                 **kwargs          # Additional args passed to _map_single (e.g. path, model)
                ):
    "Map report sections to Strategic Results Framework enablers"
    if 'enablers' in self.mappings and not force: return self
    self._ensure_sys_blocks()
    self.mappings['enablers'] = _map_single(self._sys_blocks, 'enablers', **kwargs)
    self.save(self.results_path)
    return self

Suppose we do not want to restart the pipeline from scratch and instead want to resume from the last saved checkpoint:

In [None]:
#| eval: false
# Resume from the last checkpoint
report = load_report('49d2fba781b6a7c0d94577479636ee6f', path='files/test/results')

# Mapping enablers
report.map_enablers(model='claude-haiku-4-5')


## Report: Final Evaluation of the EU-IOM Joint Initiative for migrant protection and reintegration in the horn of Africa
**Year:** 2023 | **Organization:** IOM  
**ID:** `49d2fba781b6a7c0d94577479636ee6f`

**Processing Status:**  
✓ PDF downloaded | ✓ Markdown converted | ✓ Sections extracted (~14858 tokens) | ✓ Mappings: enablers

**Documents:** 5 available


In [None]:
#| eval: false
sort_by_centrality(report.mappings['enablers'])[:2]

[{'theme_id': '4',
  'theme_title': 'Data and evidence',
  'centrality_score': 0.75,
  'reasoning': "Data and evidence is a PRIMARY organizational focus. Section 4.3.1.1 demonstrates JI exceeded targets for field studies (20 vs 19). The Regional Data Hub receives extensive praise (Section 5.1, Recommendation 4). Section 4.3.3.1 documents 36 M&E tools established across countries. Conclusions (5.1) emphasize JI's important contributions to data availability and research on migration trends. Data collection systems and evidence generation are core to effectiveness assessment.",
  'confidence': 'high'},
 {'theme_id': '2',
  'theme_title': 'Partnership',
  'centrality_score': 0.72,
  'reasoning': 'Partnership is a MAJOR component throughout the report. Section 4.2.2 extensively analyzes alignment with regional actors (IGAD, AU) and complementarity with other initiatives. Section 4.1.2.2 notes 82% of partners found capacity building useful. Recommendations 1-3 explicitly address partnership

### Map CCPs

Maps to SRF Cross-Cutting Priorities.

In [None]:
#| export
@patch
def map_ccps(self:Report, 
             force:bool=False, # Re-run even if already completed
             **kwargs          # Additional args passed to _map_single (e.g. path, model)
             ):
    "Map report sections to Strategic Results Framework cross-cutting priorities"
    if 'ccps' in self.mappings and not force: return self
    self._ensure_sys_blocks()
    self.mappings['ccps'] = _map_single(self._sys_blocks, 'ccps', **kwargs)
    self.save(self.results_path)
    return self

In [None]:
#| eval: false
report.map_ccps(model='claude-haiku-4-5')


## Report: Final Evaluation of the EU-IOM Joint Initiative for migrant protection and reintegration in the horn of Africa
**Year:** 2023 | **Organization:** IOM  
**ID:** `49d2fba781b6a7c0d94577479636ee6f`

**Processing Status:**  
✓ PDF downloaded | ✓ Markdown converted | ✓ Sections extracted (~14858 tokens) | ✓ Mappings: enablers, ccps

**Documents:** 5 available


In [None]:
#| eval: false
sort_by_centrality(report.mappings['ccps'])[:2]

[{'theme_id': '3',
  'theme_title': 'Protection-centred',
  'centrality_score': 0.78,
  'reasoning': "Protection-centred approach is substantial evaluative lens throughout. Section 4.3.2 explicitly evaluates 'Safe, humane, dignified voluntary return processes,' with findings on AVR outcomes (9,025 migrants returned, 95% satisfaction with travel arrangements). Section 4.1.1.1 assesses migrant needs including protection from dangerous environments/detention. Section 4.3.3.1 addresses psychosocial support as protection dimension. Multiple recommendations (4, 6) emphasize continued protection support. However, not the primary evaluation focus—program evaluation uses it as major criterion rather than thematic focus on protection itself.",
  'confidence': 'high'},
 {'theme_id': '2',
  'theme_title': 'Equality, Diversity & Inclusion',
  'centrality_score': 0.52,
  'reasoning': 'Report demonstrates EDI elements but inconsistently. Section 4.1.1.3 notes community participation survey (1,232 res

### Map GCM objectives

Maps to Global Compact for Migration objectives.

In [None]:
#| export
@patch
def map_gcm(self:Report, 
            force:bool=False, # Re-run even if already completed
            **kwargs          # Additional args passed to _map_single (e.g. path, model)
            ):
    "Map report sections to Global Compact for Migration objectives"
    if 'gcm' in self.mappings and not force: return self
    self._ensure_sys_blocks()
    self.mappings['gcm'] = _map_single(self._sys_blocks, 'gcm', **kwargs)
    self.save(self.results_path)
    return self

In [None]:
#| eval: false
report.map_gcm(model='claude-haiku-4-5', force=True)


## Report: Final Evaluation of the EU-IOM Joint Initiative for migrant protection and reintegration in the horn of Africa
**Year:** 2023 | **Organization:** IOM  
**ID:** `49d2fba781b6a7c0d94577479636ee6f`

**Processing Status:**  
✓ PDF downloaded | ✓ Markdown converted | ✓ Sections extracted (~13823 tokens) | ✓ Mappings: enablers, ccps, gcm

**Documents:** 5 available


In [None]:
#| eval: false
sort_by_centrality(report.mappings['gcm'])[:2]



[{'theme_id': '21',
  'theme_title': 'Objective 21: Cooperate In Facilitating Safe And Dignified Return And Readmission, As Well As Sustainable Reintegration',
  'centrality_score': 0.96,
  'reasoning': "THIS IS THE PRIMARY FOCUS. The programme's stated objective is 'dignified voluntary return and sustainable reintegration' (Introduction, Section 2.1). All five pillars support this objective. Specific Objectives 2 and 3 directly address safe return (Section 4.3.2) and sustainable reintegration (Section 4.3.3). Actions on cooperation frameworks (a), gender-responsive return (b), identification and travel documents (c), consular support (d), monitoring (f), child protection (g), sustainable reintegration (h), and community needs (i) are extensively discussed. Recommendations 4 and 6 prioritize continuation of AVR and integrated reintegration. Entire evaluation structure revolves around return/reintegration effectiveness, relevance, and sustainability.",
  'confidence': 'high'},
 {'theme_

### Map outputs

Maps to SRF outputs. If `gcm_ids` is not provided, the top GCM objective from a prior `map_gcm()` run is used to select the candidate outputs.
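
Picking the top objective amounts to ranking the mapping entries by `centrality_score`. A minimal sketch of that step, assuming entries shaped like the mapping outputs shown in this notebook; `top_theme_ids` is a hypothetical stand-in and not the library's `get_top_ids`, and the scores are illustrative.

```python
def top_theme_ids(mappings, n=1):
    "Hypothetical stand-in: IDs of the `n` themes with the highest centrality score"
    ranked = sorted(mappings, key=lambda m: m['centrality_score'], reverse=True)
    return [m['theme_id'] for m in ranked[:n]]

# Toy GCM mapping with illustrative scores
gcm = [dict(theme_id='7',  centrality_score=0.41),
       dict(theme_id='21', centrality_score=0.96),
       dict(theme_id='1',  centrality_score=0.55)]

# Fall back to an empty list when no GCM mapping is available
gcm_ids = top_theme_ids(gcm) if gcm else []
```

Passing `gcm_ids` explicitly to `map_outputs` bypasses this default, for example to consider more than one objective.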

In [None]:
#| export
@patch
def map_outputs(self:Report, 
                gcm_ids=None,       # GCM objective IDs used to select candidate SRF outputs
                force:bool=False,   # Re-run even if already completed
                **kwargs            # Additional args passed to _map_single (e.g. path, model)
                ):
    "Map report sections to Strategic Results Framework outputs"
    if 'outputs' in self.mappings and not force: return self
    self._ensure_sys_blocks()
    if gcm_ids is None: gcm_ids = [get_top_ids(self.mappings['gcm'])[0]] if self.mappings.get('gcm') else []
    self.mappings['outputs'] = _map_single(self._sys_blocks, 'outputs', gcm_ids=gcm_ids, **kwargs)
    self.save(self.results_path)
    return self

In [None]:
#| eval: false
report = load_report('49d2fba781b6a7c0d94577479636ee6f', path='files/test/results')
report.map_outputs(model='claude-haiku-4-5')


## Report: Final Evaluation of the EU-IOM Joint Initiative for migrant protection and reintegration in the horn of Africa
**Year:** 2023 | **Organization:** IOM  
**ID:** `49d2fba781b6a7c0d94577479636ee6f`

**Processing Status:**  
✓ PDF downloaded | ✓ Markdown converted | ✓ Sections extracted (~14869 tokens) | ✓ Mappings: enablers, ccps, gcm, outputs

**Documents:** 5 available


In [None]:
#| eval: false
sort_by_centrality(report.mappings['outputs'])[:2]

[{'theme_id': '3c31',
  'theme_title': 'Governments and other key stakeholders, including international organizations, civil society and the private sector, have increased knowledge and expertise to collect, manage, analyze and disseminate migration data in line with their needs, regional priorities, global commitments and in full respect of data protection and privacy.',
  'centrality_score': 0.81,
  'reasoning': 'The report extensively documents capacity building as a core objective (Specific Objective 1, Section 4.3.1). Findings show 665 stakeholders strengthened through capacity building (exceeding target of 434); 97% average increase in knowledge across countries; and partnerships with 40 organizations on MRCs. Section 4.1.2.2 notes 82% of partners found capacity building useful. Multiple recommendations address expanding capacity building scope (Recommendation 1). This is a major, directly evaluated deliverable.',
  'confidence': 'high'},
 {'theme_id': '3c22',
  'theme_title': 'A

### Map all themes

Convenience method to run all mapping stages in sequence.

In [None]:
#| export
@patch
def map_all(self:Report, **kwargs) -> Report:
    "Run all four mapping stages in sequence; `kwargs` are passed to every `map_*` call"
    return self.map_enablers(**kwargs).map_ccps(**kwargs).map_gcm(**kwargs).map_outputs(**kwargs)

### Run full pipeline

Run all four stages in sequence on a single evaluation report. `run_pipeline` is a convenience wrapper around the individual `Report` methods; the mapping stage runs as four separate steps, which is why the log messages count seven steps in total.

In [None]:
#| export
def _should_force(force, # Bool to force all steps, or set of step names to force
                  step   # Step name to check
                 ):
    "Check if step should be forced - handles bool or set of step names"
    if isinstance(force, bool): return force
    return step in force
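
The two accepted forms of `force` behave as follows. The sketch copies the rule into a public-named function (`should_force`, a name used only here) so it runs on its own; the step names are the ones listed for `run_pipeline`.

```python
def should_force(force, step):
    "Same rule as `_should_force`: a bool applies to every step, a set only to the steps it names"
    if isinstance(force, bool): return force
    return step in force

steps = ['download', 'ocr', 'extract', 'enablers', 'ccps', 'gcm', 'outputs']
rerun_all  = [s for s in steps if should_force(True, s)]                # every step
rerun_none = [s for s in steps if should_force(False, s)]               # no step
rerun_some = [s for s in steps if should_force({'gcm', 'outputs'}, s)]  # named steps only
```

Each step checks only its own name, so forcing an upstream step such as `'extract'` does not by itself re-run the mapping steps; name them in the set as well if they should be refreshed.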

In [None]:
#| export
async def run_pipeline(url:str,                         # URL of the evaluation PDF
                       evals:list,                      # List of `Evaluation` objects to search
                       pdf_dst:str='data/pdfs',         # Destination directory for PDFs
                       md_dst:str='data/md',            # Destination directory for markdown files
                       results_path:str='data/results', # Path to save/load results
                       ocr_kwargs:dict=None,            # Additional arguments passed to ocr (e.g. add_img_desc, model)
                       force:bool|set=False,            # Force re-run: True for all, or set of step names {'download','ocr','extract','enablers','ccps','gcm','outputs'}
                       **kwargs                         # Additional arguments passed to mapping functions
                      ) -> Report:                      # Fully processed report with all mappings
    "Run complete pipeline: download → ocr → extract → map_themes"
    logging.info(f"Creating report from URL...")
    report = Report.from_url(url, evals, results_path=results_path)
    # Try to load existing checkpoint
    try: report = load_report(report.id, path=results_path)
    except FileNotFoundError: pass
    logging.info(f"Step 1/7: Downloading PDF...")
    report.download(dst=pdf_dst, force=_should_force(force, 'download'))
    logging.info(f"Step 2/7: Running OCR...")
    await report.ocr(dst=md_dst, force=_should_force(force, 'ocr'), **(ocr_kwargs or {}))
    logging.info(f"Step 3/7: Extracting sections...")
    report.extract(force=_should_force(force, 'extract'))
    logging.info(f"Step 4/7: Mapping enablers...")
    report.map_enablers(force=_should_force(force, 'enablers'), **kwargs)
    logging.info(f"Step 5/7: Mapping CCPs...")
    report.map_ccps(force=_should_force(force, 'ccps'), **kwargs)
    logging.info(f"Step 6/7: Mapping GCM objectives...")
    report.map_gcm(force=_should_force(force, 'gcm'), **kwargs)
    logging.info(f"Step 7/7: Mapping outputs...")
    report.map_outputs(force=_should_force(force, 'outputs'), **kwargs)
    logging.info(f"Pipeline complete!")
    return report
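
`run_pipeline` is a coroutine. Top-level `await` works in a notebook, but a plain script has no running event loop and needs to start one. A minimal sketch with a hypothetical stand-in coroutine (`toy_pipeline`) and a placeholder URL, so that it runs without the library or network access:

```python
import asyncio

async def toy_pipeline(url):
    "Hypothetical stand-in for `run_pipeline`: awaits one async stage, then returns a result"
    await asyncio.sleep(0)          # placeholder for the awaited OCR stage
    return dict(url=url, status='complete')

# In a plain script, start the event loop explicitly instead of using top-level `await`
result = asyncio.run(toy_pipeline('https://example.org/report.pdf'))
```

The same call shape should apply to the real function, i.e. wrapping the `run_pipeline(...)` call in `asyncio.run`.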

In [None]:
#| eval: false
evals = load_evals('files/test/evaluations.json')
url = "https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/Abridged%20Evaluation%20Report_%20Final_Olta%20NDOJA.pdf"
report = await run_pipeline(
    url, 
    evals, 
    pdf_dst='files/test/pdfs', 
    md_dst='files/test/md', 
    results_path='files/test/results', 
    ocr_kwargs=dict(add_img_desc=False), 
    force=False,
    model='claude-haiku-4-5'
    )

report

INFO:root:Creating report from URL...
INFO:root:Step 1/7: Downloading PDF...
INFO:root:Step 2/7: Running OCR...
INFO:root:Step 3/7: Extracting sections...
INFO:root:Step 4/7: Mapping enablers...
INFO:root:Step 5/7: Mapping CCPs...
INFO:root:Step 6/7: Mapping GCM objectives...
INFO:root:Step 7/7: Mapping outputs...
INFO:root:Pipeline complete!



## Report: Final Evaluation of the EU-IOM Joint Initiative for migrant protection and reintegration in the horn of Africa
**Year:** 2023 | **Organization:** IOM  
**ID:** `49d2fba781b6a7c0d94577479636ee6f`

**Processing Status:**  
✓ PDF downloaded | ✓ Markdown converted | ✓ Sections extracted (~14858 tokens) | ✓ Mappings: enablers, ccps, gcm, outputs

**Documents:** 5 available


Or, if we prefer to re-run (force) only the last two steps:

In [None]:
#| eval: false
report = await run_pipeline(
    url, 
    evals, 
    pdf_dst='files/test/pdfs', 
    md_dst='files/test/md', 
    results_path='files/test/results', 
    ocr_kwargs=dict(add_img_desc=False), 
    force={'gcm', 'outputs'},  # Only re-run these two steps
    model='claude-haiku-4-5'
)

report

Creating report from URL...
Step 1/7: Downloading PDF...
Step 2/7: Running OCR...
Step 3/7: Extracting sections...
Step 4/7: Mapping enablers...
Step 5/7: Mapping CCPs...
Step 6/7: Mapping GCM objectives...
Step 7/7: Mapping outputs...
Pipeline complete!



## Report: Final Evaluation of the EU-IOM Joint Initiative for migrant protection and reintegration in the horn of Africa
**Year:** 2023 | **Organization:** IOM  
**ID:** `49d2fba781b6a7c0d94577479636ee6f`

**Processing Status:**  
✓ PDF downloaded | ✓ Markdown converted | ✓ Sections extracted (~14858 tokens) | ✓ Mappings: enablers, ccps, gcm, outputs

**Documents:** 5 available
