<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Introduction

This notebook implements an evidence mapping system with:
 - Batch processing for scalability
 - Robust error handling and retries
 - Embedding caching
 - Hybrid search (vector + full-text)
 - Local LanceDB deployment

The workflow follows these steps:

 - Load the JSON file containing the URLs of the PDF reports.
 - Load the Excel file describing the IOM Results Framework.
 - Download and process the PDF reports to extract text.
 - Integrate the extracted text with the IOM Results Framework.
 - Generate embeddings and store them in LanceDB. 

## Setup

### Create a Virtual Environment

To ensure a clean and isolated environment for this project, we will create a virtual environment using Python's `venv` module. This will help manage dependencies and avoid conflicts with other projects.

```{bash} 
#| eval: false
python -m venv .venv
```

Then, activate the virtual environment (the command below is for Windows; on Linux or macOS, use `source .venv/bin/activate`):
```{bash} 
#| eval: false
.\.venv\Scripts\activate
```


Then, configure Visual Studio Code to use the virtual environment: open the Command Palette with `Ctrl+Shift+P`, type `Jupyter: Select Interpreter`, and select the interpreter that corresponds to your newly created virtual environment: `('venv': venv)`.


### Required Python Modules

Once this environment is selected as the kernel to run the notebook, we can install the Python modules required for the rest of the process:

```{python} 
%pip install openai lancedb pyarrow pandas numpy matplotlib seaborn plotly pymupdf requests tqdm tenacity ipython python-dotenv langchain langchain-community langchain-openai ipywidgets openpyxl filetype
```


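Then restart the Jupyter kernel for this notebook. The `%reset -f` magic below clears all variables from the namespace without prompting; newly installed packages may still require a full kernel restart: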
```{python}
#| eval: false
%reset -f
```

### Initialise LLM API


### Load PDF library and Strategic Results Framework

First, load the evaluation library:

In [0]:
#| echo: false
#| output: asis
show_doc(load_evaluations)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L139){target="_blank" style="float:right; font-size:smaller"}

### load_evaluations

>      load_evaluations (file_path, json_path)

In [None]:
#|eval: false
library = load_evaluations("reference/Evaluation_repository.xlsx", "reference/Evaluation_repository.json")

Next, load the Strategic Results Framework:

In [0]:
#| echo: false
#| output: asis
show_doc(load_iom_framework)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L121){target="_blank" style="float:right; font-size:smaller"}

### load_iom_framework

>      load_iom_framework (excel_path:str)

*Load and validate IOM framework*

In [None]:
#|eval: false
framework = load_iom_framework("reference/Strategic_Result_Framework.xlsx")

## Step 1: Building the Knowledge Base

We have a collection of evaluation documents, with metadata for each evaluation. Each evaluation has one or more documents (the evaluation report itself, plus in some cases a summary brief, annexes, etc.).

See an example below:


In [None]:
#|eval: false
[
    {
        "Title": "Finale Internal Evluation: ENHANCING THE CAPACITY TO MAINSTREAM ENVIRONMENT AND CLIMATE CHANGE WITHIN WIDER FRAMEWORK OF MIGRATION MANAGEMENT IN WEST AND CENTRAL AFRICA",
        "Year": "2022",
        "Author": "Abderrahim El Moulat",
        "Best Practices or Lessons Learnt": "Yes",
        "Date of Publication": "2022-06-22 00:00:00",
        "Donor": "IOM Development Fund",
        "Evaluation Brief": "Yes",
        "Evaluation Commissioner": "Donor, IOM",
        "Evaluation Coverage": "Country",
        "Evaluation Period From Date": "nan",
        "Evaluation Period To Date": "NaT",
        "Executive Summary": "Yes",
        "External Version of the Report": "No",
        "Languages": "English",
        "Migration Thematic Areas": "Migration and climate change",
        "Name of Project(s) Being Evaluated": NaN,
        "Number of Pages Excluding annexes": 20.0,
        "Other Documents Included": NaN,
        "Project Code": "NC.0030",
        "Countries Covered": [
            "Senegal"
        ],
        "Regions Covered": "RO Dakar",
        "Relevant Crosscutting Themes": "Gender",
        "Report Published": "Yes",
        "Terms of Reference": "No",
        "Type of Evaluation Scope": "Programme/Project",
        "Type of Evaluation Timing": "Ex-post (after the end of the project/programme)",
        "Type of Evaluator": "Internal",
        "Level of Evaluation": "Decentralized",
        "Documents": [
            {
                "Document Subtype": "Evaluation brief",
                "File URL": "https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/Internal%20Evaluation_NC0030_JUNE_2022_FINAL_Abderrahim%20EL%20MOULAT_0.pdf",
                "File description": "Evaluation Brief"
            },
            {
                "Document Subtype": "Evaluation report",
                "File URL": "https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/NC0030_Evaluation%20Brief_June%202022_Abderrahim%20EL%20MOULAT.pdf",
                "File description": "Evaluation Report"
            }
        ]
    },
    {
        "Title": "Local Authorities Network for Migration and Development",
        "Year": "2022",
        "Author": "Action Research for CO-development (ARCO)",
        "Best Practices or Lessons Learnt": "No",
        "Date of Publication": "2022-02-01 00:00:00",
        "Donor": "Government of Italy",
        "Evaluation Brief": "No",
        "Evaluation Commissioner": "IOM",
        "Evaluation Coverage": "Multi-country",
        "Evaluation Period From Date": "2020-07-06 00:00:00",
        "Evaluation Period To Date": "2021-07-31 00:00:00",
        "Executive Summary": "Yes",
        "External Version of the Report": "No",
        "Languages": "English",
        "Migration Thematic Areas": "Migration and Development - diaspora",
        "Name of Project(s) Being Evaluated": NaN,
        "Number of Pages Excluding annexes": 37.0,
        "Other Documents Included": NaN,
        "Project Code": "MD.0003",
        "Countries Covered": [
            "Albania",
            "Italy"
        ],
        "Regions Covered": "RO Brussels",
        "Relevant Crosscutting Themes": "Gender, Rights-based approach",
        "Report Published": "No",
        "Terms of Reference": "Yes",
        "Type of Evaluation Scope": "Programme/Project",
        "Type of Evaluation Timing": "Final (at the end of the project/programme)",
        "Type of Evaluator": "External",
        "Level of Evaluation": "Decentralized",
        "Documents": [
            {
                "Document Subtype": "Evaluation report",
                "File URL": "https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/Evaluation%20Brief_ARCO_Shiraz%20JERBI.pdF",
                "File description": "Evaluation Report "
            },
            {
                "Document Subtype": "Evaluation brief",
                "File URL": "https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/Final%20evaluation%20report_ARCO_Shiraz%20JERBI_1.pdf",
                "File description": "Evaluation Brief"
            },
            {
                "Document Subtype": "Management response",
                "File URL": "https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/Management%20Response%20Matrix_ARCO_Shiraz%20JERBI.pdf",
                "File description": "Management Response"
            }
        ]
    }
]

Workflow Overview

1. Load the external metadata for each evaluation from the JSON file.
2. For each evaluation, download each file, convert it to PDF (in case it is a Word, Excel or PowerPoint file), and load the text of each document.
3. Implement [late chunking, which can solve the lost context problem](https://isaacflath.com/blog/blog_post?fpath=posts%2F2025-04-08-LateChunking.ipynb), and insert the chunks into the chunk table, enabling [Hybrid Search capability](https://docs.lancedb.com/core/hybrid-search) so that searches can use both keywords and similarity.
4. Create additional metadata for both documents and evaluations using an LLM call. The additional metadata shall help assess the "evidence strength". The metadata to be created are: evaluation type (formative, summative, impact), methodology, study design, sample size, and data collection techniques.
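
Hybrid search has to merge a keyword ranking and a vector-similarity ranking into a single result list. The sketch below shows one common way to do this, reciprocal rank fusion. It is an illustration only: the function name, the constant `k=60` and the chunk IDs are assumptions, and the source does not state which fusion method LanceDB is configured with here.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one list (best first)."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every item it contains.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


keyword_hits = ["c3", "c1", "c7"]  # hypothetical full-text ranking
vector_hits = ["c1", "c5", "c3"]   # hypothetical vector-similarity ranking
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Chunks that rank well in both lists end up at the top, which is the behaviour one wants when a query mixes exact terms (project codes, country names) with looser thematic wording.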



### Initialize LanceDB Vector Database

The database includes three tables:

__1. Evaluations Table__
Each row represents a unique evaluation with the following fields:

* evaluation_id (unique identifier)
* title 
* author
* practice_or_lessons
* donor
* is_brief
* commissioner
* coverage
* countries
* from_date
* to_date
* has_summary
* external_version
* language
* thematic_area
* name_project
* project_code
* evaluation_scope
* evaluation_timing
* evaluation_level
* evaluator_type
* theme
* cross_cutting

Additional variables will be generated through an LLM prompt over the entire evaluation content:

* short_title 
* summary
* population (PICO model)
* intervention (PICO model)
* comparator (PICO model)
* outcome (PICO model)
* methodology
* study_type
* study_design
* sample_size
* data_collection_techniques
* evidence_strength 
* limitations 


__2. Documents Table__

Each row represents a PDF file linked to an evaluation:
 
* document_id: primary key, ID of the original PDF
* evaluation_id: foreign key to link to the evaluation
* document_subtype: from the original metadata
* document_url: from the original metadata
* document_name: from the original metadata 
* document_title: document title as reviewed by the LLM
* document_type_infer: document type as reviewed by the LLM
* document_processed: boolean to confirm it is done


__3. Chunk Table__

Each row represents one text chunk of a document:

* chunk_id: primary key
* evaluation_id: foreign key to link to the evaluation
* document_id: ID of the original file
* document_page: for proper referencing of any further information retrieval
* chunk_index: order of the chunk in the document
* text: the chunked content
* embedding (for hybrid search)

Let's start by loading the library from json...


In [0]:
#| echo: false
#| output: asis
show_doc(load_evaluations)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L139){target="_blank" style="float:right; font-size:smaller"}

### load_evaluations

>      load_evaluations (json_path:str)

*Load evaluation data from a JSON file

Args:
    json_path: Path to the JSON file containing evaluation data

Returns:
    List of evaluation dictionaries*

Load the evaluations. A smaller test file is available (commented out below) for quick testing:

In [None]:
#|eval: false
# Load your metadata (switch to the commented-out test file for a quick run)
#evaluation_data = load_evaluations("reference/Evaluation_repository_test.json")
evaluation_data = load_evaluations("reference/Evaluation_repository.json")
print(f"Loaded {len(evaluation_data)} evaluations")
print(type(evaluation_data))

ID generation

In [0]:
#| echo: false
#| output: asis
show_doc(generate_id)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L173){target="_blank" style="float:right; font-size:smaller"}

### generate_id

>      generate_id (text:str)

*Generate a deterministic ID from text*

In [None]:
#|eval: false
eval_id = generate_id("aaa")
print(eval_id)
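
A deterministic ID means that the same text always maps to the same identifier, so re-running the pipeline does not create duplicate rows. A minimal sketch of one way to obtain this, assuming a truncated SHA-256 digest; the name `sketch_generate_id` and the 16-character length are illustrative choices, and the actual `generate_id` in `core.py` may use a different hash or length.

```python
import hashlib


def sketch_generate_id(text: str, length: int = 16) -> str:
    """Return the first `length` hex characters of the SHA-256 digest of `text`."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return digest[:length]


id_a = sketch_generate_id("aaa")
id_b = sketch_generate_id("aaa")
print(id_a, id_a == id_b)
```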

In [0]:
#| echo: false
#| output: asis
show_doc(force_delete_directory)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L183){target="_blank" style="float:right; font-size:smaller"}

### force_delete_directory

>      force_delete_directory (path, max_retries=3, delay=1)

*Robust directory deletion with retries and delay*

In [None]:
#|eval: false
LANCE_DB_PATH = "./lancedb"  # defined here so the cell runs on its own
force_delete_directory(LANCE_DB_PATH)
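
Deleting the database folder can fail when a file is still locked by another process, which is why retries and a delay are useful. The following is a rough sketch of that idea using only the standard library; `sketch_force_delete` is a hypothetical name and the real `force_delete_directory` may handle errors differently.

```python
import os
import shutil
import tempfile
import time


def sketch_force_delete(path, max_retries=3, delay=1):
    """Try to remove `path` up to `max_retries` times; return True once it is gone."""
    for attempt in range(max_retries):
        if not os.path.exists(path):
            return True
        try:
            shutil.rmtree(path)
        except OSError:
            # A file may be locked (common on Windows); wait and retry.
            time.sleep(delay)
    return not os.path.exists(path)


# Demonstration on a throwaway directory, not on the real database folder.
scratch = tempfile.mkdtemp()
open(os.path.join(scratch, "dummy.txt"), "w").close()
deleted = sketch_force_delete(scratch, delay=0)
print(deleted)
```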

We start prefilling our vector database with the metadata

In [0]:
#| echo: false
#| output: asis
show_doc(initialise_knowledge_base)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L212){target="_blank" style="float:right; font-size:smaller"}

### initialise_knowledge_base

>      initialise_knowledge_base (db, evaluation:Dict)

*Store full documents without chunking (late chunking approach)*

In [0]:
#| echo: false
#| output: asis
show_doc(safe_get)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L205){target="_blank" style="float:right; font-size:smaller"}

### safe_get

>      safe_get (d, key, default=None)

*Safely get value from dict, handle NaN and missing keys*
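
The sample metadata above contains `NaN` values for several fields, and these should not reach the database as floats. A minimal sketch of the behaviour described in the docstring, assuming that missing keys, `None` and float `NaN` all fall back to the default; `sketch_safe_get` is an illustrative stand-in, not the library function.

```python
import math


def sketch_safe_get(d, key, default=None):
    """Return d[key], or `default` when the key is missing, None, or a float NaN."""
    value = d.get(key, default)
    if value is None:
        return default
    if isinstance(value, float) and math.isnan(value):
        return default
    return value


record = {"Donor": "IOM Development Fund", "Other Documents Included": float("nan")}
donor = sketch_safe_get(record, "Donor", "")
other = sketch_safe_get(record, "Other Documents Included", "")
missing = sketch_safe_get(record, "Project Code", "unknown")
print(donor, repr(other), missing)
```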

In [None]:
#|eval: false
from lancedb import connect

LANCE_DB_PATH = "./lancedb"
db = connect(LANCE_DB_PATH)
for evaluation in evaluation_data:  # Assuming evaluation_data is a list
    initialise_knowledge_base(db, evaluation)

Let's check that each evaluation is in the DB:

In [None]:
#|eval: false
eval_table = db.open_table("evaluations")
#  Convert to Pandas DataFrame (recommended for display)
df = eval_table.to_pandas()
print(df)

and the corresponding documents...

In [None]:
#|eval: false
LANCE_DB_PATH = "./lancedb"
from lancedb import connect
db = connect(LANCE_DB_PATH)
doc_table = db.open_table("documents")
#  Convert to Pandas DataFrame (recommended for display)
df = doc_table.to_pandas()
print(df)
# this table includes document_id, url, and evaluation_id

### Download and prepare all the files

Now we build a function to download the files from their URLs:
 - It takes as argument the `doc_table` from the vector DB (`doc_table = db.open_table("documents")`), which includes `document_id`, `url`, and `evaluation_id`.
 - For each document, in parallel, it reads the `url` and extracts the `file_name` from it.
 - It builds a local `file_path` as `PDF_Library`/`evaluation_id`/`file_name` (where `PDF_Library` is an environment variable).
 - If the file is already present locally, it skips the download gracefully.
 - Otherwise, it downloads the file, with provisions to avoid the requesting IP being banned and with retries until the file is downloaded.
 - If the file extension is not `.pdf`, it identifies the file type and converts the file to PDF.
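
The path-building, skip-if-present and retry steps above can be sketched as follows. This is a simplified, sequential illustration under stated assumptions: it uses the standard library `urllib` instead of `requests` and `tenacity`, it leaves out parallelism and PDF conversion, and `local_path_for`, `sketch_download` and the `"eval123"` ID are hypothetical names, not part of `download_documents`.

```python
import os
import random
import time
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse


def local_path_for(url, evaluation_id, library_root):
    """Build <library_root>/<evaluation_id>/<file_name> from a document URL."""
    file_name = unquote(Path(urlparse(url).path).name)
    return Path(library_root) / evaluation_id / file_name


def sketch_download(url, dest, max_retries=3, base_delay=1.0):
    """Download `url` to `dest` unless it already exists; return a status string."""
    dest = Path(dest)
    if dest.exists():
        return "skipped"
    dest.parent.mkdir(parents=True, exist_ok=True)
    for attempt in range(1, max_retries + 1):
        try:
            # Growing, randomised pause between requests to stay polite to the server.
            time.sleep(base_delay * attempt + random.random())
            with urllib.request.urlopen(url, timeout=30) as response:
                dest.write_bytes(response.read())
            return "downloaded"
        except OSError:
            continue  # network errors (URLError subclasses OSError): try again
    return "failed"


url = ("https://evaluation.iom.int/sites/g/files/tmzbdl151/files/docs/resources/"
       "Management%20Response%20Matrix_ARCO_Shiraz%20JERBI.pdf")
root = os.environ.get("PDF_Library", "Evaluation_Library")
target = local_path_for(url, "eval123", root)
print(target)
```

Decoding the URL with `unquote` keeps local file names readable (spaces instead of `%20`); whether the library does the same is not shown in the source.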

In [0]:
#| echo: false
#| output: asis
show_doc(download_documents)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L375){target="_blank" style="float:right; font-size:smaller"}

### download_documents

>      download_documents (doc_table)

Here are the file conversion functions, which assume that [LibreOffice](https://www.libreoffice.org/download/download-libreoffice/) is installed locally.

```{bash}
# Debian/Ubuntu
sudo apt install libreoffice

# Mac (Homebrew)
brew install --cask libreoffice
```


In [0]:
#| echo: false
#| output: asis
show_doc(convert_file_to_pdf)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L492){target="_blank" style="float:right; font-size:smaller"}

### convert_file_to_pdf

>      convert_file_to_pdf (input_path, output_path)

*Converts Word, Excel, or PowerPoint file to PDF using LibreOffice in headless mode.
Works on Windows, macOS, and Linux.*

In [0]:
#| echo: false
#| output: asis
show_doc(find_libreoffice_exec)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L456){target="_blank" style="float:right; font-size:smaller"}

### find_libreoffice_exec

>      find_libreoffice_exec ()

*Finds the appropriate LibreOffice command based on OS.
Returns path to LibreOffice CLI tool or raises an error.*
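
For orientation, a headless LibreOffice conversion boils down to one command line. The sketch below builds and runs that command. It assumes the executable is reachable on `PATH` as `soffice` or `libreoffice`, and it takes an output directory rather than an output file path because LibreOffice writes the PDF into `--outdir`. The function names are illustrative and do not mirror `convert_file_to_pdf` exactly.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(soffice, input_path, output_dir):
    """Return the LibreOffice headless command that converts one file to PDF."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", str(output_dir), str(input_path)]


def sketch_convert_to_pdf(input_path, output_dir):
    """Convert an office document to PDF; return the expected PDF path."""
    # Assumption: the executable is on PATH as `soffice` or `libreoffice`.
    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise FileNotFoundError("LibreOffice executable not found on PATH")
    subprocess.run(build_convert_command(soffice, input_path, output_dir),
                   check=True, timeout=120)
    return Path(output_dir) / (Path(input_path).stem + ".pdf")


command = build_convert_command("soffice", "report.docx", "out")
print(" ".join(command))
```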

Testing this...


In [None]:
#|eval: false
import os

doc_table = db.open_table("documents")
os.environ["PDF_Library"] = "Evaluation_Library"
download_documents(doc_table)

### Load file content into the vector DB, chunk and embed

Next, we build a function that works as follows:
 - It takes as argument the `doc_table` from the vector DB (`doc_table = db.open_table("documents")`), which includes `document_id`, `url`, `evaluation_id`, and `processed`.
 - For each document, in parallel, it reads the `url` and extracts the `file_name` from it.
 - It assumes a local `file_path` of `PDF_Library`/`evaluation_id`/`file_name` (where `PDF_Library` is an environment variable, `file_name` is extracted from the URL, and the `file_name` extension is sanitised to always be `.pdf`).
 - It extracts the text from the PDF using PyMuPDF, with error handling.
 - It then fills in the chunk table in LanceDB, implementing a late chunking approach to avoid duplicate embedding computation, ensure context-aware chunk boundaries, and track spans precisely.
 - The LanceDB chunk table schema should be:
    - chunk_id: str
    - document_id: str
    - evaluation_id: str
    - metadata: dict[str, str]
    - content: str = embedding_fn.SourceField()
    - vector: Vector(embedding_fn.ndims()) = embedding_fn.VectorField()
 - Once a document is processed, its `processed` flag in `doc_table` is set to true.
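
In late chunking, the whole document is embedded once at token level and each chunk vector is then pooled from the token embeddings that fall inside its span. The sketch below illustrates only that pooling step with plain Python lists. It assumes token-level embeddings are already available from a long-context model, which is not something the `embed_documents` call used in this notebook returns, so treat it as an illustration of the idea rather than what `process_documents_to_chunks` does. The name `late_chunk_pool` and the toy numbers are hypothetical.

```python
def late_chunk_pool(token_embeddings, spans):
    """Mean-pool token embeddings over each (start, end) token span, end exclusive."""
    pooled = []
    for start, end in spans:
        window = token_embeddings[start:end]
        if not window:
            raise ValueError(f"empty span ({start}, {end})")
        dims = len(window[0])
        pooled.append([sum(tok[d] for tok in window) / len(window) for d in range(dims)])
    return pooled


# Toy token embeddings for a 4-token document (2 dimensions each), as they
# would come from ONE forward pass over the whole document.
tokens = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 0.0]]
chunk_spans = [(0, 2), (2, 4)]  # chunk boundaries expressed as token offsets
chunk_vectors = late_chunk_pool(tokens, chunk_spans)
print(chunk_vectors)
```

Because the spans are stored next to the vectors, each chunk can be traced back to its exact position in the document, which supports the precise span tracking mentioned above.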


Test the embeddings through LangChain:


In [None]:
#|eval: false
# Assumes `embedding_model` (a LangChain embeddings object) was created in the "Initialise LLM API" step
test_embedding = embedding_model.embed_query("Hello world")
print(f"Embedding vector length: {len(test_embedding)}")
embedding_dim = len(test_embedding)

# LanceDB-compatible wrapper
class LangchainEmbeddingWrapper:
    def __init__(self, langchain_embedder, dim):
        self._embedder = langchain_embedder
        self._dim = dim  # embedding dimension, returned by ndims()

    def __call__(self, texts):
        # Embed a list of texts; returns one vector per text
        return self._embedder.embed_documents(texts)

    def ndims(self):
        return self._dim

# Wrap and use
embedding_fn = LangchainEmbeddingWrapper(embedding_model, embedding_dim)
#print("Embedding dimension:", embedding_fn.ndims())

vec = embedding_fn(["Hello world"])
print(f"Vector through lancedb dim: {len(vec[0])}")
print(vec[0])

In [None]:
#|eval: false
print(dir(embedding_fn))
help(embedding_fn)

So first we create the chunk table in lancedb 

In [None]:
#|eval: false
import pyarrow as pa
from lancedb import connect

# Arrow schema for the chunk table
pa_schema = pa.schema([
    pa.field("chunk_id", pa.string()),
    pa.field("document_id", pa.string()),
    pa.field("evaluation_id", pa.string()),
    pa.field("metadata", pa.string()),  # storing metadata dict as JSON string for simplicity
    pa.field("content", pa.string()),
    pa.field("vector", pa.list_(pa.float32(), embedding_dim)),  # fixed-size list of floats
])

db = connect(LANCE_DB_PATH)
chunk_table = db.create_table("chunks", schema=pa_schema)

Then comes the function that creates embedded chunks for each document:



In [0]:
#| echo: false
#| output: asis
show_doc(process_documents_to_chunks)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L586){target="_blank" style="float:right; font-size:smaller"}

### process_documents_to_chunks

>      process_documents_to_chunks (doc_table, chunk_table)

Now let's run this! 


In [None]:
#|eval: false
LANCE_DB_PATH = "./lancedb"
from lancedb import connect
db = connect(LANCE_DB_PATH)
doc_table = db.open_table("documents")
chunk_table = db.open_table("chunks")
process_documents_to_chunks(doc_table, chunk_table)

Checking the status of the chunking process


In [0]:
#| echo: false
#| output: asis
show_doc(check_chunk_status)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L724){target="_blank" style="float:right; font-size:smaller"}

### check_chunk_status

>      check_chunk_status (doc_table, chunk_table)

In [None]:
#| eval: false
LANCE_DB_PATH = "./lancedb"
from lancedb import connect
db = connect(LANCE_DB_PATH)
doc_table = db.open_table("documents")
chunk_table = db.open_table("chunks")
missing_docs_df = check_chunk_status(doc_table, chunk_table)
print(missing_docs_df)
# this table includes document_id, url, and evaluation_id

### Generating AI-Enhanced metadata 

Last, we run a function to generate metadata.

The function loads the "evaluations" table from the database (`connect(LANCE_DB_PATH)`), then loops over each `evaluation_id` in the "chunks" table to retrieve the context and performs an LLM call that returns additional metadata. It then updates the evaluations table with the output for each evaluation. At the end, it saves a JSON file with a dump of the evaluations table.
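
LLM calls in such a loop can fail transiently (rate limits, timeouts), which is what the `call_llm_with_retries(prompt, max_retries=4, delay=2)` helper documented in this section addresses. A minimal sketch of a retry wrapper with the same parameters, assuming a linearly growing pause between attempts; the injected `call` argument and the `flaky_llm` stand-in are illustrative, since the actual client and back-off policy are not shown in the source.

```python
import time


def sketch_call_with_retries(call, prompt, max_retries=4, delay=2):
    """Call `call(prompt)`; on an exception wait and retry, with a growing pause."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return call(prompt)
        except Exception as error:  # broad on purpose: the LLM client is unknown here
            last_error = error
            if attempt < max_retries:
                time.sleep(delay * attempt)
    raise RuntimeError(f"LLM call failed after {max_retries} attempts") from last_error


attempts = []


def flaky_llm(prompt):
    """Stand-in for a real client: fails twice, then answers."""
    attempts.append(prompt)
    if len(attempts) < 3:
        raise TimeoutError("simulated timeout")
    return '{"study_type": "summative"}'


answer = sketch_call_with_retries(flaky_llm, "Classify this evaluation", delay=0)
print(answer, len(attempts))
```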

In [0]:
#| echo: false
#| output: asis
show_doc(get_context_for_eval)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L883){target="_blank" style="float:right; font-size:smaller"}

### get_context_for_eval

>      get_context_for_eval (eval_row, query, chunk_table)

In [0]:
#| echo: false
#| output: asis
show_doc(call_llm_with_retries)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L841){target="_blank" style="float:right; font-size:smaller"}

### call_llm_with_retries

>      call_llm_with_retries (prompt, max_retries=4, delay=2)

In [0]:
#| echo: false
#| output: asis
show_doc(safe_join)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L829){target="_blank" style="float:right; font-size:smaller"}

### safe_join

>      safe_join (items, sep=', ')

In [0]:
#| echo: false
#| output: asis
show_doc(clean_json)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L804){target="_blank" style="float:right; font-size:smaller"}

### clean_json

>      clean_json (obj)

*Recursively clean an object to make it JSON serializable, handling None values.*
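
Rows read back from the database can contain `NaN`, tuples or date objects, none of which serialise cleanly to strict JSON. The following is a rough sketch of a recursive cleaner in the spirit of the docstring above; `sketch_clean_json` is a hypothetical name, and the choice to map `NaN` to `null` and unknown objects to strings is an assumption.

```python
import json
import math


def sketch_clean_json(obj):
    """Return a copy of `obj` that json.dumps can serialise as strict JSON."""
    if obj is None or isinstance(obj, (str, bool, int)):
        return obj
    if isinstance(obj, float):
        return None if math.isnan(obj) or math.isinf(obj) else obj
    if isinstance(obj, dict):
        return {str(key): sketch_clean_json(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [sketch_clean_json(item) for item in obj]
    return str(obj)  # fallback for dates, paths and other custom objects


row = {"Pages": 20.0, "Other": float("nan"), "Countries": ("Senegal",), "extra": None}
cleaned = sketch_clean_json(row)
payload = json.dumps(cleaned, allow_nan=False)  # would raise if a NaN survived
print(payload)
```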

We will use the  __PICO structured framework__ as an approach to represent the causal knowledge found in the Evaluation (cf. [EconBERTa: Towards Robust Extraction of Named Entities in Economics](https://aclanthology.org/2023.findings-emnlp.774.pdf)). This scheme helps in systematically organizing and analyzing the effectiveness of interventions by comparing outcomes between groups:

__1. Population (P)__: The group of individuals or units (e.g., households, schools, firms) affected by the intervention. The target population shall be clearly defined (e.g., smallholder farmers, primary school students, unemployed youth) and shall include eligibility criteria (e.g., age, socioeconomic status, geographic location).

__2. Intervention (I)__: The program, policy, or treatment whose effect is being evaluated. Describes the active component being tested (e.g., cash transfers, training workshops, new teaching methods). Should specify dosage, duration, and delivery mechanism.

__3. Comparators (C)__: The counterfactual scenario—what would have happened without the intervention. Ideally involves a control group (if the study approach is randomized or quasi-experimental) that does not receive the intervention. Alternatively refers to "Business-as-usual" groups, placebo interventions, or different treatment arms.

__4. Outcome (O)__: The measurable effects or endpoints used to assess the intervention’s impact. Includes primary outcomes (main indicators of interest, e.g., school enrollment rates, income levels) and secondary outcomes (e.g., health, empowerment). Should be specific, measurable, and time-bound (e.g., "child literacy scores after 12 months").


Using such an approach, we can ensure Clarity (the research question is well-defined and testable), Causal Inference (isolating the effect of the intervention by comparing treated and untreated groups), Replicability (to potentially extrapolate the findings) and Relevance (linking outcomes to real-world decision-making).
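
The four PICO elements map directly onto the `population`, `intervention`, `comparator` and `outcome` fields of the evaluations table. A small sketch of how one record could be represented and checked for gaps; the `PicoRecord` class and the example values are illustrative assumptions and are not taken from any real evaluation or from the library code.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PicoRecord:
    """One evaluation's PICO elements; None means the report does not state it."""
    population: Optional[str] = None
    intervention: Optional[str] = None
    comparator: Optional[str] = None
    outcome: Optional[str] = None

    def missing(self):
        """Names of the PICO elements that are still empty."""
        return [name for name, value in asdict(self).items() if not value]


# Illustrative values only.
example = PicoRecord(
    population="unemployed youth aged 18-24",
    intervention="six-month vocational training",
    outcome="employment rate 12 months after training",
)
gaps = example.missing()
print(gaps)
```

Listing the missing elements is useful here because many evaluations have no explicit comparator, and that absence is itself a signal when assessing evidence strength.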



In [0]:
#| echo: false
#| output: asis
show_doc(generate_metadata_for_evaluation_metadata_descriptive)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L913){target="_blank" style="float:right; font-size:smaller"}

### generate_metadata_for_evaluation_metadata_descriptive

>      generate_metadata_for_evaluation_metadata_descriptive (eval_row,
>                                                             query_descriptive,
>                                                             chunk_table)

*Process one evaluation row and return updated row with metadata.*

In [0]:
#| echo: false
#| output: asis
show_doc(generate_metadata_for_evaluation_metadata_methodo)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L991){target="_blank" style="float:right; font-size:smaller"}

### generate_metadata_for_evaluation_metadata_methodo

>      generate_metadata_for_evaluation_metadata_methodo (eval_row,
>                                                         query_methodo,
>                                                         chunk_table)

*Process one evaluation row and return updated row with metadata.*

In [0]:
#| echo: false
#| output: asis
show_doc(generate_metadata_for_evaluation_metadata_evidence)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L1088){target="_blank" style="float:right; font-size:smaller"}

### generate_metadata_for_evaluation_metadata_evidence

>      generate_metadata_for_evaluation_metadata_evidence (eval_row, query,
>                                                          chunk_table)

*Process one evaluation row and return updated row with metadata.*

In [0]:
#| echo: false
#| output: asis
show_doc(generate_evaluation_metadata)

---

[source](https://github.com/iom/evaluation_knowledge/blob/main/evaluation_knowledge/core.py#L1162){target="_blank" style="float:right; font-size:smaller"}

### generate_evaluation_metadata

>      generate_evaluation_metadata (eval_table, chunk_table, batch_size=10,
>                                    output_file='all_evaluations_metadata.json'
>                                    )

*Main function to generate metadata in batches and update the table, with incremental saving.*
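
Processing in batches with incremental saving means that an interruption loses at most one batch of LLM results. A minimal sketch of that pattern, assuming the output file is simply rewritten after every batch; `sketch_process_in_batches` and the `enrich` callback are hypothetical, and the real function also updates the LanceDB table, which is omitted here.

```python
import json
import os
import tempfile


def sketch_process_in_batches(rows, enrich, batch_size=10, output_file="metadata.json"):
    """Enrich rows batch by batch, rewriting `output_file` after every batch."""
    done = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        done.extend(enrich(row) for row in batch)
        # Save everything processed so far, so a crash loses one batch at most.
        with open(output_file, "w", encoding="utf-8") as handle:
            json.dump(done, handle, ensure_ascii=False, indent=2)
    return done


rows = [{"evaluation_id": f"e{i}"} for i in range(5)]
out_path = os.path.join(tempfile.mkdtemp(), "metadata.json")
result = sketch_process_in_batches(
    rows, lambda row: {**row, "summary": "placeholder"}, batch_size=2, output_file=out_path
)
with open(out_path, encoding="utf-8") as handle:
    saved = json.load(handle)
print(len(saved))
```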

Now let's run it!

In [None]:
#| eval: false
# Initialize DB
from lancedb import connect

LANCE_DB_PATH = "./lancedb"
db = connect(LANCE_DB_PATH)
eval_table = db.open_table("evaluations")
chunk_table = db.open_table("chunks")
enriched_data = generate_evaluation_metadata(eval_table, chunk_table, output_file="all_evaluations_metadata2.json")

## Step 2: Structured Information Extraction


### Standard Questions

```{python} 
# Define the list of experts on impact - outcome - organisation
q_experts = [
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the Strategic Impact: ---Attaining favorable protection environments---: i.e., finding or recommendations that require a change in existing policy and regulations. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the Strategic Impact: ---Realizing rights in safe environments---: i.e., finding or recommendations that require a change in existing policy and regulations. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the Strategic Impact: ---Empowering communities and achieving gender equality--- : i.e., finding or recommendations that require a change in existing policy and regulations. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the Strategic Impact: ---Securing durable solutions--- : i.e., finding or recommendations that require a change in existing policy and regulations. [/INST]",

   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the specific Operational Outcome: ---Access to territory registration and documentation ---, i.e. finding or recommendations that require a change that needs to be implemented in the field as an adaptation or change of current activities. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the specific Operational Outcome: --- Status determination ---, i.e. finding or recommendations that require a change that needs to be implemented in the field as an adaptation or change of current activities. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the specific Operational Outcome: --- Protection policy and law---, i.e. finding or recommendations that require a change that needs to be implemented in the field as an adaptation or change of current activities. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the specific Operational Outcome: --- Gender-based violence ---, i.e. finding or recommendations that require a change that needs to be implemented in the field as an adaptation or change of current activities. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the specific Operational Outcome: --- Child protection ---, i.e. finding or recommendations that require a change that needs to be implemented in the field as an adaptation or change of current activities. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the specific Operational Outcome: --- Safety and access to justice ---, i.e. finding or recommendations that require a change that needs to be implemented in the field as an adaptation or change of current activities. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the specific Operational Outcome: --- Community engagement and women's empowerment ---, i.e. finding or recommendations that require a change that needs to be implemented in the field as an adaptation or change of current activities. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the specific Operational Outcome: --- Well-being and basic needs ---, i.e. finding or recommendations that require a change that needs to be implemented in the field as an adaptation or change of current activities. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the specific Operational Outcome: --- Sustainable housing and settlements ---, i.e. finding or recommendations that require a change that needs to be implemented in the field as an adaptation or change of current activities. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the specific Operational Outcome: --- Healthy lives---, i.e. finding or recommendations that require a change that needs to be implemented in the field as an adaptation or change of current activities. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the specific Operational Outcome: --- Education ---, i.e. finding or recommendations that require a change that needs to be implemented in the field as an adaptation or change of current activities. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the specific Operational Outcome: --- Clean water sanitation and hygiene ---, i.e. finding or recommendations that require a change that needs to be implemented in the field as an adaptation or change of current activities. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the specific Operational Outcome: --- Self-reliance, Economic inclusion, and livelihoods ---, i.e. finding or recommendations that require a change that needs to be implemented in the field as an adaptation or change of current activities. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the specific Operational Outcome: --- Voluntary repatriation and sustainable reintegration ---, i.e. finding or recommendations that require a change that needs to be implemented in the field as an adaptation or change of current activities. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the specific Operational Outcome: --- Resettlement and complementary pathways---, i.e. finding or recommendations that require a change that needs to be implemented in the field as an adaptation or change of current activities. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on the specific Operational Outcome: --- Local integration and other local solutions ---, i.e. finding or recommendations that require a change that needs to be implemented in the field as an adaptation or change of current activities. [/INST]", 


   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on Organizational Enablers related to Systems and processes, i.e. elements that require potential changes in either management practices, technical approach, business processes, staffing allocation or capacity building. [/INST]",
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on Organizational Enablers related to Operational support and supply chain, i.e. elements that require potential changes in either management practices, technical approach, business processes, staffing allocation or capacity building. [/INST]" ,
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on Organizational Enablers related to People and culture, i.e. elements that require potential changes in either management practices, technical approach, business processes, staffing allocation or capacity building. [/INST]" ,
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on Organizational Enablers related to External engagement and resource mobilization, i.e. elements that require potential changes in either management practices, technical approach, business processes, staffing allocation or capacity building. [/INST]" ,
   "<s> [INST] Instructions: Act as a public program evaluation expert working for UNHCR. Your specific area of expertise and focus is strictly on Organizational Enablers related to Leadership and governance, i.e. elements that require potential changes in either management practices, technical approach, business processes, staffing allocation or capacity building. [/INST]" 
]

# Predefined knowledge extraction questions
q_questions = [
    " List, as bullet points, all findings and evidence in relation to your specific area of expertise and focus. ",
    " Explain, in relation to your specific area of expertise and focus, what are the root causes for the situation. ",
    " Explain, in relation to your specific area of expertise and focus, what are the main risks and difficulties here described. ",
    " Explain, in relation to your specific area of expertise and focus, what can be learnt. ",
    " List, as bullet points, all recommendations made in relation to your specific area of expertise and focus. ",
    # "Indicate, if mentioned, what resources will be required to implement the recommendations made in relation to your specific area of expertise and focus. ",
    # "List, as bullet points, all recommendations made in relation to your specific area of expertise and focus that relate to topics or activities recommended to be discontinued. ",
    # "List, as bullet points, all recommendations made in relation to your specific area of expertise and focus that relate to topics or activities recommended to be scaled up. "
    # Add more questions here...
]

## Additional instructions!
q_instr = """
</s>
[INST]  
Keep your answer grounded in the facts of the contexts. 
If the contexts do not contain the facts to answer the QUESTION, return {NONE} 
Be concise in the response and  when relevant include precise citations from the contexts. 
[/INST] 
"""
```
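The personas, questions, and grounding instructions defined above are presumably combined into one prompt per persona and question. The following minimal sketch shows one way to do that. The `build_prompts` helper, the shortened stand-in lists, and the `CONTEXT`/`QUESTION` layout are assumptions for illustration, not the notebook's actual pipeline.

```python
from itertools import product

# Stand-in values; in the notebook these would be the full lists defined above.
experts = [
    "<s> [INST] Act as an expert on Education. [/INST]",
    "<s> [INST] Act as an expert on Healthy lives. [/INST]",
]
questions = [" List all findings. ", " List all recommendations. "]
instructions = "</s>\n[INST]\nKeep your answer grounded in the facts of the contexts.\n[/INST]\n"


def build_prompts(experts, questions, instructions, context):
    """Return one prompt dict per (expert, question) pair (hypothetical helper)."""
    prompts = []
    for expert, question in product(experts, questions):
        question = question.strip()
        text = f"{expert}{instructions}CONTEXT:\n{context}\n\nQUESTION: {question}"
        prompts.append({"expert": expert, "question": question, "prompt": text})
    return prompts


prompts = build_prompts(experts, questions, instructions, "Sample report excerpt.")
print(len(prompts))  # one prompt per persona/question combination
```

Keeping the persona and the question alongside each prompt makes it easier to trace an answer back to the combination that produced it.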

### Q&A Extraction

```{python} 
qa_questions = [
    "What was the intervention type?",
    "What outcomes were observed?",
    "What population was targeted?",
    "What geographic area was covered?",
    "How strong is the evidence?",
]

def generate_qas(text):
    prompt = f"""Extract answers to the following questions from the evaluation:
    {json.dumps(qa_questions)}
    
    Text: {text[:3000]}
    """
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2
    )
    return completion.choices[0].message.content

df_docs["qa"] = df_docs["text"].apply(generate_qas)
```
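`generate_qas` returns the model's reply as free text, while a later cell parses the `qa` column with `json.loads`. A small parsing helper can bridge the two, assuming the prompt is also changed to request a JSON object. `parse_qa_response` below is a hypothetical helper and not part of the notebook.

```python
import json
import re

# Three backticks, built this way so the example itself stays fence-safe.
FENCE = "`" * 3
FENCE_PATTERN = re.compile(rf"^{FENCE}(?:json)?\s*|\s*{FENCE}$")


def parse_qa_response(raw):
    """Parse a model reply into a dict; return None when no valid JSON object is found."""
    if not raw:
        return None
    # Chat models often wrap JSON in a Markdown code fence; strip it first.
    cleaned = FENCE_PATTERN.sub("", raw.strip())
    try:
        parsed = json.loads(cleaned)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


reply = FENCE + 'json\n{"population": "migrants"}\n' + FENCE
print(parse_qa_response(reply))
```

Returning `None` instead of raising lets the caller drop or retry unparseable replies without interrupting a long batch.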

### Hybrid Search in LanceDB

```{python} 
# Sample query (vector search only; hybrid_search below adds the full-text component)
query = "What works best to improve health outcomes for displaced persons?"

query_embedding = get_text_embedding(query)
results = table.search(query_embedding).limit(5).to_list()

for result in results:
    print(result['metadata'])
    print(result['text'][:500])
```


```{python} 
def query_evidence(question: str, table: lancedb.db.LanceTable) -> Dict:
    """Enhanced query with hybrid search and evidence grading"""
    try:
        # Hybrid search
        results = hybrid_search(table, question, limit=7)
        
        if results.empty:
            return {"answer": "No relevant evidence found.", "sources": []}
        
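        # Number documents by position; the DataFrame index is not sequential after sorting
        context = "\n\n".join([
            f"Document {i} (Relevance: {row.get('combined_score', 0):.2f}):\n{row['text']}\n"
            for i, (_, row) in enumerate(results.iterrows(), start=1)
        ])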
        
        # Evidence-based answer generation
        prompt = f"""
        You are an evidence specialist answering questions about IOM programs.
        Use ONLY the provided context from evaluation reports.
        For each claim in your answer, cite the document number it came from.
        
        Question: {question}
        
        Context:
        {context}
        
        Provide:
        1. A direct answer to the question
        2. Strength of evidence (High/Medium/Low)
        3. Any limitations or caveats
        4. List of sources with relevance scores
        """
        
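        # openai>=1.0 removed openai.ChatCompletion; use the client object instead
        # (with Azure OpenAI, `model` takes the deployment name)
        response = client.chat.completions.create(
            model=config.chat_model,
            messages=[{"role": "user", "content": prompt}],
            temperature=config.temperature,
            max_tokens=config.max_tokens
        )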
        
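        answer = response.choices[0].message.content
        # Fall back to a score of 0 when combined_score could not be computed
        scores = results['combined_score'] if 'combined_score' in results else [0.0] * len(results)
        sources = [
            {"url": url, "score": score}
            for url, score in zip(results['url'], scores)
        ]
        score_columns = [c for c in ('_distance', '_score', 'combined_score') if c in results]
        
        return {
            "question": question,
            "answer": answer,
            "sources": sources,
            "search_scores": results[score_columns].to_dict()
        }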
    
    except Exception as e:
        print(f"Query error: {str(e)}")
        return {"error": str(e)}

```
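Concatenating every retrieved chunk can exceed the model's context window, so it may help to number the documents explicitly and cap the total length. The following is a minimal sketch, assuming each result is a dictionary with `text` and `url` keys. `build_context` and the character budget are illustrative choices rather than part of the notebook.

```python
def build_context(rows, max_chars=6000):
    """Number each chunk and stop adding chunks once the character budget is spent."""
    parts, sources, used = [], [], 0
    for number, row in enumerate(rows, start=1):
        block = f"Document {number}:\n{row['text']}\n"
        if used + len(block) > max_chars:
            break
        parts.append(block)
        sources.append({"document": number, "url": row["url"]})
        used += len(block)
    return "\n\n".join(parts), sources


rows = [
    {"text": "Cash assistance improved food security.", "url": "https://example.org/a.pdf"},
    {"text": "x" * 100, "url": "https://example.org/b.pdf"},
]
context, sources = build_context(rows, max_chars=80)
print(sources)
```

Because the rows are expected to arrive sorted by relevance, cutting off at the budget drops the least relevant chunks first.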


```{python} 
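def extract_structured_info(table, iom_framework):
    """Extract structured information from reports using the IOM Results Framework.

    Note: this vector-only version is overridden by the hybrid-search version
    of the same name defined further down in this cell.
    """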
    # Generate questions based on the IOM framework
    questions = generate_questions_from_framework(iom_framework)
    
    # Store extracted information
    extracted_data = []
    
    # Process each question
    for question in questions:
        print(f"Processing question: {question}")
        
        # Search for relevant chunks
        results = table.search(generate_embeddings([question])[0]).limit(10).to_pandas()
        
        # Combine relevant chunks as context
        context = "\n\n".join(results["text"].tolist())
        
        # Use Azure OpenAI to extract structured answer
        prompt = f"""
        Based on the following evaluation report excerpts, answer the question with structured information.
        Provide your answer in JSON format with the following structure:
        {{
            "question": "the question being asked",
            "answer": "the concise answer",
            "intervention_type": "type of intervention mentioned",
            "population": "target population mentioned",
            "outcome": "outcome measured",
            "geography": "geographic location if mentioned",
            "evidence_strength": "strength of evidence (high/medium/low)"
        }}

        Question: {question}
        Context: {context}
        """
        
        # openai>=1.0 removed openai.ChatCompletion; `model` takes the Azure deployment name
        response = client.chat.completions.create(
            model=config["azure_openai_chat_deployment"],
            messages=[{"role": "user", "content": prompt}],
            temperature=config["temperature"],
            max_tokens=config["max_tokens"]
        )
        
        try:
            answer = json.loads(response.choices[0].message.content)
            answer["source_urls"] = results["url"].unique().tolist()
            extracted_data.append(answer)
        except json.JSONDecodeError:
            print(f"Failed to parse answer for question: {question}")
    
    return pd.DataFrame(extracted_data)

def generate_questions_from_framework(framework_df):
    """Generate questions based on the IOM Results Framework"""
    questions = []
    
    # Example questions based on common evaluation themes
    for _, row in framework_df.iterrows():
        questions.extend([
            f"What interventions has IOM implemented to achieve {row['Objective']}?",
            f"What evidence exists for the effectiveness of interventions targeting {row['Outcome']}?",
            f"What populations have been targeted by interventions aiming for {row['Indicator']}?",
            f"What geographic areas have seen interventions related to {row['Objective']}?",
            f"What methodologies have been used to evaluate interventions for {row['Outcome']}?"
        ])
    
    # Add some general evaluation questions
    questions.extend([
        "What are the most effective interventions for migrant livelihood improvement?",
        "What evidence exists for cash-based interventions in migration contexts?",
        "What are common challenges in implementing migration programs?",
        "What evaluation methodologies are most commonly used in IOM evaluations?",
        "What gaps exist in the evidence base for migration interventions?"
    ])
    
    return list(set(questions))  # Remove duplicates


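def hybrid_search(table: lancedb.db.LanceTable, query: str, limit: int = 10) -> pd.DataFrame:
    """Perform hybrid (vector + full-text) search"""
    # Generate query embedding
    query_embedding = generate_embeddings_batch([query])[0]
    
    # Perform hybrid search; this requires a full-text index on the text column,
    # e.g. created once with table.create_fts_index("text")
    results = table.search(query_type="hybrid")\
                 .vector(query_embedding)\
                 .text(query)\
                 .limit(limit)\
                 .to_pandas()
    
    # Score normalization (simple example). Depending on the LanceDB version and
    # reranker, only _relevance_score may be returned, so guard the column access.
    if not results.empty and {"_distance", "_score"} <= set(results.columns):
        max_distance = results["_distance"].max()
        max_fts_score = results["_score"].max()
        
        if max_distance > 0 and max_fts_score > 0:
            # _distance is lower-is-better, so invert it before combining
            results["combined_score"] = (
                0.7 * (1 - results["_distance"] / max_distance) +
                0.3 * (results["_score"] / max_fts_score)
            )
            results = results.sort_values("combined_score", ascending=False)
    elif "_relevance_score" in results.columns:
        # Fall back to the relevance score computed by LanceDB's reranker
        results["combined_score"] = results["_relevance_score"]
    
    return results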

def extract_structured_info(table: lancedb.db.LanceTable, iom_framework: pd.DataFrame) -> pd.DataFrame:
    """Enhanced information extraction with hybrid search"""
    questions = generate_questions_from_framework(iom_framework)
    extracted_data = []
    
    for question in questions:
        try:
            # Hybrid search for relevant chunks
            results = hybrid_search(table, question, limit=15)
            
            if results.empty:
                continue
                
            context = "\n\n".join(results["text"].tolist())
            sources = results["url"].unique().tolist()
            
            # Structured extraction prompt
            prompt = f"""
            Extract structured information from this evaluation report context to answer the question.
            Return ONLY valid JSON with this structure:
            {{
                "question": "the question",
                "answer": "concise answer",
                "intervention_type": ["type1", "type2"],
                "population": ["group1", "group2"],
                "outcome": ["outcome1", "outcome2"],
                "geography": ["location1", "location2"],
                "evidence_strength": "high/medium/low",
                "source_urls": ["url1", "url2"]
            }}
            
            Question: {question}
            Context: {context}
            """
            
            # openai>=1.0 removed openai.ChatCompletion; use the client object instead
            response = client.chat.completions.create(
                model=config.chat_model,
                messages=[{"role": "user", "content": prompt}],
                temperature=config.temperature,
                max_tokens=config.max_tokens,
                response_format={"type": "json_object"}
            )
            
            answer = json.loads(response.choices[0].message.content)
            answer["source_urls"] = sources
            extracted_data.append(answer)
            
        except Exception as e:
            print(f"Error processing question '{question}': {str(e)}")
            continue
    
    return pd.DataFrame(extracted_data)

```
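Combining a vector distance (lower is better) with a full-text score (higher is better) requires putting both on the same scale and direction first. The sketch below shows a weighted min-max fusion on plain dictionaries. The `fuse_scores` name, the 0.7/0.3 weights, and the sample rows are illustrative assumptions, and LanceDB's built-in rerankers may be preferable in practice.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse_scores(rows, vector_weight=0.7):
    """Rank rows by a weighted mix of vector similarity and full-text score."""
    if not rows:
        return []
    # Distances are inverted so that a higher value always means a better match.
    similarity = [1.0 - d for d in min_max([r["_distance"] for r in rows])]
    text_score = min_max([r["_score"] for r in rows])
    fused = [
        {**r, "combined_score": vector_weight * s + (1 - vector_weight) * t}
        for r, s, t in zip(rows, similarity, text_score)
    ]
    return sorted(fused, key=lambda r: r["combined_score"], reverse=True)


rows = [
    {"text": "A", "_distance": 0.10, "_score": 2.0},
    {"text": "B", "_distance": 0.50, "_score": 8.0},
    {"text": "C", "_distance": 0.90, "_score": 1.0},
]
ranked = fuse_scores(rows)
print([r["text"] for r in ranked])
```

In this toy example the closest vector match still ranks first even though another row has the best keyword score, which is what the 0.7 vector weight is meant to express.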

## Step 3: Cross-Evaluation Analysis





## Step 4: Generate Actionable and Generalizable Insights  

One key challenge is how to generalize the findings of an evaluation from one context to another. The [Generalizability Framework](https://ssir.org/articles/entry/the_generalizability_puzzle) provides some insights on how to do that.


To implement this we will generate insights using AI-enabled Q&A on all previous Q&A:

```{python} 
def generate_insights(df):
    # Add your insight generation logic here
    return df

df = generate_insights(df)
```

## Step 5: Identify Patterns and Gaps

Identify patterns and gaps in the data:

```{python} 
def identify_patterns(df):
    # Add your pattern identification logic here
    return df

df = identify_patterns(df)
```
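One simple way to surface gaps is to count how many extracted records fall into each intervention and outcome combination, then list the combinations with no records at all. The sketch below assumes each record is a dictionary with `intervention_type` and `outcome` keys, as requested in the extraction prompts above. `find_evidence_gaps` and the sample records are hypothetical.

```python
from collections import Counter
from itertools import product


def find_evidence_gaps(records):
    """Return (counts, gaps): evidence counts per pair and the pairs with no evidence."""
    counts = Counter((r["intervention_type"], r["outcome"]) for r in records)
    interventions = sorted({r["intervention_type"] for r in records})
    outcomes = sorted({r["outcome"] for r in records})
    # A gap is any intervention/outcome combination that never occurs in the records.
    gaps = [pair for pair in product(interventions, outcomes) if counts[pair] == 0]
    return counts, gaps


records = [
    {"intervention_type": "cash transfer", "outcome": "livelihoods"},
    {"intervention_type": "cash transfer", "outcome": "livelihoods"},
    {"intervention_type": "training", "outcome": "employment"},
]
counts, gaps = find_evidence_gaps(records)
print(gaps)
```

Note that this only detects gaps among interventions and outcomes that appear at least once; comparing against the full Results Framework would require passing in its complete list of outcomes.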




```{python} 
def generate_deliverables(df):
    # Generate Q&A dataset
    qa_dataset = df[['question', 'answer']]

    # Generate synthesis report
    synthesis_report = df.describe()

    # Generate visual evidence map
    plt.figure(figsize=(10, 6))
    sns.scatterplot(data=df, x='outcome', y='population', size='sample_size')
    plt.title('Visual Evidence Map')
    plt.show()

    return qa_dataset, synthesis_report

qa_dataset, synthesis_report = generate_deliverables(df)
```

Visualize patterns and gaps:

```{python} 
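# Convert QA to structured fields (intervention, outcome, population, etc.)
# Note: json.loads assumes each "qa" value is a JSON string, so the prompt in
# generate_qas must ask the model for JSON output
qa_df = pd.json_normalize(df_docs["qa"].apply(json.loads).tolist())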

# Bubble Map Example
fig = px.scatter(qa_df, x="geography", y="outcome",
                 size="sample_size", color="intervention",
                 hover_name="file_name",
                 title="Evidence Bubble Map")
fig.show()

# Heatmap Example
heatmap_df = pd.crosstab(qa_df["intervention"], qa_df["outcome"])
sns.heatmap(heatmap_df, annot=True, cmap="coolwarm")
```

```{python} 
def create_interactive_visualizations(extracted_data: pd.DataFrame):
    """Enhanced visualization functions"""
    # Prepare data
    df = extracted_data.explode("source_urls")
    
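    # Evidence Strength Distribution (explicit column names work across pandas versions)
    strength_dist = (
        df['evidence_strength'].value_counts()
        .rename_axis('evidence_strength')
        .reset_index(name='count')
    )
    fig1 = px.bar(
        strength_dist,
        x='evidence_strength',
        y='count',
        title='Distribution of Evidence Strength'
    )
    
    # Interventions by Geography (list-valued fields are exploded to one value per row)
    geo_df = (
        df.explode('geography', ignore_index=True)
          .explode('intervention_type', ignore_index=True)
    )
    fig2 = px.treemap(
        geo_df,
        path=['geography', 'intervention_type'],
        title='Interventions by Geographic Region'
    )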
    
    # Evidence Timeline (if dates available)
    if 'date' in df.columns:
        fig3 = px.line(
            df.groupby('date').size().reset_index(name='count'),
            x='date',
            y='count',
            title='Evidence Publication Timeline'
        )
    else:
        fig3 = None
    
    return fig1, fig2, fig3

```



```{python} 
def visualize_evidence_map(extracted_data):
    """Create interactive visualizations of the evidence map"""
    
    # Prepare data for visualization
    df = extracted_data.explode("source_urls")
    
    # Map the textual evidence strength to a numeric marker size
    strength_to_size = {"low": 1, "medium": 2, "high": 3}
    df["strength_size"] = (
        df["evidence_strength"].astype(str).str.lower().map(strength_to_size).fillna(1)
    )
    
    # Bubble map: Interventions by outcome and evidence strength
    fig1 = px.scatter(
        df, 
        x="outcome", 
        y="intervention_type", 
        size="strength_size",
        color="population",
        hover_name="answer",
        title="Evidence Map: Interventions by Outcome and Population"
    )
    fig1.update_layout(height=800)
    
    # Heatmap: Evidence concentration by intervention and outcome
    heatmap_data = df.groupby(['intervention_type', 'outcome']).size().unstack().fillna(0)
    fig2 = px.imshow(
        heatmap_data,
        labels=dict(x="Outcome", y="Intervention Type", color="Number of Studies"),
        title="Evidence Concentration Heatmap"
    )
    
    # Gap map: Missing evidence
    all_interventions = df['intervention_type'].unique()
    all_outcomes = df['outcome'].unique()
    complete_grid = pd.MultiIndex.from_product([all_interventions, all_outcomes], names=['intervention_type', 'outcome'])
    gap_data = df.groupby(['intervention_type', 'outcome']).size().reindex(complete_grid, fill_value=0).reset_index(name='count')
    gap_data['has_evidence'] = (gap_data['count'] > 0).map({True: "Yes", False: "No"})
    
    fig3 = px.scatter(
        gap_data,
        x="outcome",
        y="intervention_type",
        color="has_evidence",
        color_discrete_map={"Yes": "green", "No": "red"},  # match the title's color legend
        title="Evidence Gap Map (Red = Missing Evidence)"
    )
    
    return fig1, fig2, fig3

def generate_synthesis_report(extracted_data):
    """Generate a narrative synthesis report of findings"""
    prompt = f"""
    You are an evaluation specialist analyzing evidence from IOM evaluation reports.
    Below is structured data extracted from multiple evaluation reports:
    
    {extracted_data.to_json()}
    
    Write a comprehensive synthesis report that:
    1. Summarizes key findings across interventions
    2. Identifies areas with strong evidence
    3. Highlights evidence gaps
    4. Provides recommendations for future evaluations
    5. Suggests high-priority research areas
    
    Structure your report with clear sections and bullet points for readability.
    """
    
    # openai>=1.0 removed openai.ChatCompletion; `model` takes the Azure deployment name
    response = client.chat.completions.create(
        model=config["azure_openai_chat_deployment"],
        messages=[{"role": "user", "content": prompt}],
        temperature=0.5,  # Slightly more creative for synthesis
        max_tokens=3000
    )
    
    return response.choices[0].message.content

```


Save the deliverables to files:

```{python} 
qa_dataset.to_csv('qa_dataset.csv', index=False)
synthesis_report.to_csv('synthesis_report.csv')
```


## Conclusions and Potential Extensions

* Web interface (Streamlit, Gradio, etc.)

* Periodic syncing with new evaluations via web scraping

* Integration with Hugging Face for fine-tuning a summarization or Q&A model

 

