# IMF Article IV Pipeline – PDFs, Text, EBA, and HTML Summaries

### What this notebook does at a high level
This notebook builds and runs a full pipeline around IMF Article IV reports:
- creates a manifest of country/year/IMF URLs,
- downloads the corresponding Article IV PDFs,
- extracts the full text and the "Executive Board Assessment" (EBA) section from the PDFs,
- scrapes the HTML "Summary" section from IMF country report pages,
- tracks all statuses and paths in a single `data/manifest.csv` file.

### How it does it
- Uses `requests` and `BeautifulSoup` to:
  - find the Article IV PDF link on each IMF page,
  - download the PDF file.
- Uses a **manifest CSV** as the central state:
  - one row per report,
  - columns for ISO codes, year, URLs, status, file paths, snippets, notes.
- Uses `pdfminer.six` to:
  - convert PDFs to plain text,
  - extract the "Executive Board Assessment" section as a separate text file.
- Uses a second HTML scraper:
  - fetches the IMF HTML page,
  - extracts the on-page "Summary" section,
  - saves it to text and logs a short snippet.
- Each stage updates the manifest with a new `status` and the relevant paths.

In [2]:
import csv
import os
import time
import requests
from bs4 import BeautifulSoup

BASE = "https://www.imf.org"

## 1. Single-page downloader (scratch / dev version)

### What this does at a high level
Defines an older helper `download_article_iv` that:
- takes a single IMF Article IV HTML page URL,
- finds the PDF link on that page,
- downloads the PDF to a specified local path.

### How it does it
- Sends `requests.get(imf_page_url)` with a timeout to fetch the HTML.
- Uses `BeautifulSoup(html, "html.parser")` to parse the page.
- First tries to locate an `<a>` tag whose **visible text** contains `"Download PDF"`.
- If that fails, falls back to any `<a>` tag whose `href` ends with `.pdf`.
- Converts any relative link (starting with `/`) into a full URL using the `BASE` constant.
- Streams the PDF content via `requests.get(..., stream=True)` in chunks and writes it to `out_path`.
- Includes a `__main__` block at the bottom to test downloading a particular Article IV report.

> You can keep this as a scratch/test helper; the main pipeline below uses a more manifest-driven structure.
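
The relative-link step above only handles `href` values that start with `/`. As a minimal sketch, assuming the standard library's `urllib.parse.urljoin` is acceptable here, any `href` can instead be resolved against the page URL. `resolve_pdf_url` is a hypothetical helper name and the URLs are placeholders; neither is part of the notebook.

```python
from urllib.parse import urljoin


def resolve_pdf_url(page_url, href):
    """Resolve an href found on page_url into an absolute URL.

    Hypothetical helper: the notebook itself prefixes BASE instead.
    """
    if not href:
        raise ValueError("link tag has no href attribute")
    # urljoin leaves absolute URLs untouched and resolves relative ones
    # (root-relative "/...", protocol-relative "//...", or bare names)
    # against page_url.
    return urljoin(page_url, href)


# Placeholder page URL, used only for illustration.
page = "https://www.imf.org/en/publications/cr/issues/2025/09/22/example-page"
print(resolve_pdf_url(page, "/-/media/files/example.pdf"))
print(resolve_pdf_url(page, "https://www.imf.org/-/media/files/example.pdf"))
```

Resolving against the page URL rather than a fixed `BASE` also covers links that are relative to the current path, at the cost of one extra import.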


In [41]:
# working version
def download_article_iv(imf_page_url, out_path):
    # 1. Fetch the HTML page
    print("Step 1: fetching HTML:", imf_page_url)
    resp = requests.get(
        imf_page_url,
        timeout=(5, 60),  # 5s to connect, 60s to read the HTML
    )
    resp.raise_for_status()
    html = resp.text

    # 2. Parse and find the "Download PDF" link
    print("Step 2: parsing HTML for PDF link")
    soup = BeautifulSoup(html, "html.parser")

    # First try <a> with visible text containing "Download PDF"
    link_tag = soup.find("a", string=lambda s: s and "Download PDF" in s)

    # Fallback: any <a> whose href ends with ".pdf"
    if not link_tag:
        link_tag = soup.find("a", href=lambda h: h and h.lower().endswith(".pdf"))

    if not link_tag:
        raise RuntimeError("Could not find a PDF link on the page")

    pdf_url = link_tag.get("href")
    if pdf_url.startswith("/"):
        pdf_url = BASE + pdf_url

    print("PDF URL:", pdf_url)

    # 3. Download the PDF
    print("Step 3: downloading PDF")
    pdf_resp = requests.get(
        pdf_url,
        stream=True,
        timeout=(5, 120),  # more time for the actual file download
    )
    pdf_resp.raise_for_status()

    with open(out_path, "wb") as f:
        for chunk in pdf_resp.iter_content(chunk_size=8192):
            if chunk:
                f.write(chunk)

    print("Saved to", out_path)


if __name__ == "__main__":
    url = "https://www.imf.org/en/publications/cr/issues/2025/09/22/djibouti-2025-article-iv-consultation-press-release-and-staff-report-570614"
    download_article_iv(url, "djibouti_2025_article_iv.pdf")

Step 1: fetching HTML: https://www.imf.org/en/publications/cr/issues/2025/09/22/djibouti-2025-article-iv-consultation-press-release-and-staff-report-570614
Step 2: parsing HTML for PDF link
PDF URL: https://www.imf.org/-/media/files/publications/cr/2025/english/1djiea2025001-source-pdf.pdf
Step 3: downloading PDF
Saved to djibouti_2025_article_iv.pdf


## 3. Download Article IV PDFs (manifest-driven) - Multiple

### What this does at a high level
Sets up a **manifest-driven** workflow to download Article IV PDFs for many countries at once:
- uses `data/manifest.csv` as the central list of target reports,
- ensures the folder structure exists,
- reads/writes the manifest to keep track of progress and errors.

### How it does it
- Defines:
  - `DATA_DIR = "data"` as the top-level data folder,
  - `MANIFEST_PATH = os.path.join(DATA_DIR, "manifest.csv")` as the pipeline state file,
  - `PDF_DIR = os.path.join(DATA_DIR, "raw_pdfs")` as the folder storing raw PDFs.
- `ensure_dirs()`:
  - creates `data/` and `data/raw_pdfs/` if they don’t exist.
- `read_manifest()`:
  - opens `manifest.csv` with `csv.DictReader`,
  - returns both:
    - `rows`: a `list[dict]` with one dict per row,
    - `fieldnames`: the ordered list of column names.
- `write_manifest(rows, fieldnames)`:
  - completely rewrites `manifest.csv`,
  - uses `csv.DictWriter` and `fieldnames` to ensure consistent column order,
  - persists updated statuses, paths, titles, and notes.
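
Because `write_manifest` rewrites the whole file, an interruption mid-write could leave a truncated manifest. The sketch below is one possible hardening, not what the notebook does: it writes to a temporary file in the same directory and then swaps it into place with `os.replace`. The names `write_manifest_atomic` and `read_rows` are hypothetical, and the demo runs in a throwaway directory so it never touches `data/manifest.csv`.

```python
import csv
import os
import tempfile


def write_manifest_atomic(path, rows, fieldnames):
    """Write rows to a temp file next to `path`, then swap it into place."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        # Replace the destination in one step (same filesystem).
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temp file, then re-raise the original error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_rows(path):
    """Read a CSV into (rows, fieldnames), mirroring read_manifest."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return rows, list(reader.fieldnames or [])


# Round-trip demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "manifest.csv")
    fields = ["iso2", "year", "status"]
    write_manifest_atomic(
        demo_path, [{"iso2": "DJ", "year": 2025, "status": "pending"}], fields
    )
    demo_rows, demo_fields = read_rows(demo_path)
    print(demo_fields, demo_rows)
```

Note that every value comes back from `csv.DictReader` as a string, so `year` is `"2025"` after the round trip even though it was written as an integer.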

Download Article IV PDFs

In [3]:
BASE = "https://www.imf.org"
DATA_DIR = "data"
MANIFEST_PATH = os.path.join(DATA_DIR, "manifest.csv")
PDF_DIR = os.path.join(DATA_DIR, "raw_pdfs")

In [4]:
def ensure_dirs():
    """
    Ensure that the base data directories exist.

    - Creates the top-level 'data/' directory if it doesn't exist.
    - Creates the 'data/raw_pdfs/' directory where PDFs will be stored.

    This function has no return value; it just prepares the filesystem so
    other functions can safely write files into these locations.
    """
    os.makedirs(DATA_DIR, exist_ok=True)
    os.makedirs(PDF_DIR, exist_ok=True)

def read_manifest():
    """
    Load the manifest CSV into memory.

    The manifest is a CSV file at MANIFEST_PATH with one row per
    country/year/IMF URL. Each row tracks the status of the pipeline for
    that country (e.g., pending, pdf_downloaded, etc.).

    Returns
    -------
    rows : list[dict]
        A list of dictionaries, one per row in the manifest.
        Keys are column names from the CSV header.
    fieldnames : list[str]
        The ordered list of column names in the manifest.

    This function does not modify the file; it just reads it.
    """
    with open(MANIFEST_PATH, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fieldnames = reader.fieldnames
    return rows, fieldnames

def write_manifest(rows, fieldnames):
    """
    Overwrite the manifest CSV with updated rows.

    Parameters
    ----------
    rows : list[dict]
        The updated list of row dictionaries to write back to disk.
    fieldnames : list[str]
        The ordered list of column names. This is used both as the header
        row and to control the column order for each written dictionary.

    Behavior
    --------
    - Completely rewrites MANIFEST_PATH.
    - Ensures the header row matches `fieldnames`.
    - Writes each row in `rows` in order.

    This function is the main way we persist pipeline state
    (statuses, paths, titles, error messages, etc.).
    """
    with open(MANIFEST_PATH, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for r in rows:
            writer.writerow(r)

### 3.2 Find and download the Article IV PDF for a given manifest row

### What this does at a high level
Given an IMF Article IV page URL and a country/year combo:
- `find_pdf_link` locates the direct PDF URL on the HTML page,
- `download_pdf` downloads that PDF and saves it under `data/raw_pdfs/<ISO2>/<year>_article_iv.pdf`.

### How it does it
- `find_pdf_link(imf_page_url)`:
  - runs `requests.get(imf_page_url, timeout=(5, 60))` to fetch the HTML,
  - parses the HTML via `BeautifulSoup`,
  - tries:
    1. `soup.find("a", string=lambda s: s and "Download PDF" in s)` — a link whose visible text contains “Download PDF”,
    2. if none found, `soup.find("a", href=lambda h: h and h.lower().endswith(".pdf"))` — any `<a>` whose `href` ends with `.pdf`,
  - raises `RuntimeError` if no `link_tag` is found,
  - normalizes relative `href`s (starting with `/`) by prefixing `BASE`,
  - returns `(pdf_url, title)`, where `title` is the text of the page's first `<h1>`, falling back to `<title>`, or an empty string if neither exists.
- `download_pdf(pdf_url, iso2, year)`:
  - builds a per-country folder `data/raw_pdfs/<ISO2>/`,
  - constructs filename `"<year>_article_iv.pdf"`,
  - uses `requests.get(..., stream=True)` with a generous timeout,
  - iterates over `iter_content(chunk_size=8192)` and writes chunks to disk,
  - returns the full `out_path` to the saved PDF.
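
The file layout described above, plus a cheap check that a download really is a PDF, can be sketched as follows. The check is an addition the notebook does not perform: it relies on the fact that well-formed PDF files normally begin with the bytes `%PDF-`, which helps catch an HTML error page saved under a `.pdf` name. `pdf_path_for` and `looks_like_pdf` are hypothetical helper names, and the demo files are stand-ins written to a temporary directory.

```python
import os
import tempfile


def pdf_path_for(pdf_dir, iso2, year):
    """Return <pdf_dir>/<ISO2>/<year>_article_iv.pdf, the layout described above."""
    return os.path.join(pdf_dir, iso2, f"{year}_article_iv.pdf")


def looks_like_pdf(path):
    """Cheap sanity check: compare the first five bytes with b"%PDF-"."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"


with tempfile.TemporaryDirectory() as tmp:
    # A stand-in file that starts like a PDF (not a real report).
    good = pdf_path_for(tmp, "DJ", 2025)
    os.makedirs(os.path.dirname(good), exist_ok=True)
    with open(good, "wb") as f:
        f.write(b"%PDF-1.7\n")

    # A stand-in for an HTML error page saved with a .pdf name.
    bad = pdf_path_for(tmp, "MX", 2025)
    os.makedirs(os.path.dirname(bad), exist_ok=True)
    with open(bad, "wb") as f:
        f.write(b"<html>error page</html>")

    good_ok, bad_ok = looks_like_pdf(good), looks_like_pdf(bad)
    print(good_ok, bad_ok)
```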


In [None]:
def find_pdf_link(imf_page_url):
    """
    Given an IMF Article IV HTML page, locate the underlying PDF URL.

    Parameters
    ----------
    imf_page_url : str
        URL of the IMF "Country Report" / Article IV page, e.g.:
        https://www.imf.org/en/publications/cr/issues/2025/09/22/...

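    Returns
    -------
    pdf_url : str
        Absolute URL of the PDF.
    title : str
        Text of the first <h1> on the page, else the <title>, else "".

    Raises
    ------
    requests.HTTPError
        If the GET request fails with a non-2xx status code.
    RuntimeError
        If no suitable PDF link can be found in the HTML.
    """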
    resp = requests.get(imf_page_url, timeout=(5, 60))
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    # 1) Try <a> with visible text containing "Download PDF"
    link_tag = soup.find("a", string=lambda s: s and "Download PDF" in s)

    # 2) Fallback: any <a> with href ending in .pdf
    if not link_tag:
        link_tag = soup.find("a", href=lambda h: h and h.lower().endswith(".pdf"))

    if not link_tag:
        raise RuntimeError("Could not find PDF link on IMF page")

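    pdf_url = link_tag.get("href")
    if not pdf_url:
        # A matching <a> without an href would otherwise fail on .startswith
        raise RuntimeError("PDF link on IMF page has no href")
    if pdf_url.startswith("/"):
        pdf_url = BASE + pdf_url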

    # Try to get title for logging
    title_tag = soup.find("h1") or soup.find("title")
    title = title_tag.get_text(strip=True) if title_tag else ""
    return pdf_url, title

def download_pdf(pdf_url, iso2, year):
    """
    Download a single PDF and save it under the ISO2 folder for that country.

    Parameters
    ----------
    pdf_url : str
        Direct URL of the PDF file (from `find_pdf_link`).
    iso2 : str
        2-letter country code, used to organize files by country,
        e.g., "DJ" for Djibouti.
    year : str or int
        Year of the Article IV report, used to build the filename.

    Returns
    -------
    out_path : str
        Path of the saved file: <PDF_DIR>/<iso2>/<year>_article_iv.pdf.

    Raises
    ------
    requests.HTTPError
        If the HTTP GET for the PDF fails.
    """
    out_dir = os.path.join(PDF_DIR, iso2)
    os.makedirs(out_dir, exist_ok=True)
    filename = f"{year}_article_iv.pdf"
    out_path = os.path.join(out_dir, filename)

    resp = requests.get(pdf_url, stream=True, timeout=(5, 120))
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            if chunk:
                f.write(chunk)
    return out_path

### 3.3 Orchestrate PDF downloads for all manifest rows

### What this does at a high level
Loops over `data/manifest.csv` and downloads PDFs for all reports that are still pending or had PDF errors, updating their rows with:
- `title` (IMF page title),
- `pdf_path` (where the PDF was saved),
- `status` (`pdf_downloaded` or `pdf_error`),
- `notes` (error messages if something goes wrong).

### How it does it
- Calls `ensure_dirs()` to guarantee `data/` and `data/raw_pdfs/` exist.
- Calls `read_manifest()` to load `rows` and `fieldnames`.
- Ensures the expected columns are present in `fieldnames`, appending any that are missing (for rows that lack them, `csv.DictWriter` writes an empty cell):
  - `status`, `title`, `pdf_path`, `text_path`, `first_snippet`, `notes`.
- Iterates over each `row`:
  - reads `imf_url`, `iso2`, and `year`,
  - only acts on rows where:
    - `status` is `"pending"` or `"pdf_error"`,
    - and `imf_url` is non-empty.
  - For each such row:
    1. calls `find_pdf_link(imf_url)` to get `(pdf_url, title)`,
    2. calls `download_pdf(pdf_url, iso2, year)` to fetch and save the PDF,
    3. sets:
       - `row["title"] = title`,
       - `row["pdf_path"] = pdf_path`,
       - `row["status"] = "pdf_downloaded"`,
       - `row["notes"] = ""`.
  - On exceptions:
    - prints the error,
    - sets `row["status"] = "pdf_error"` and `row["notes"] = str(e)`.
  - Sleeps `0.5` seconds between rows to be polite to the IMF server.
- After the loop, calls `write_manifest(rows, fieldnames)` so `manifest.csv` reflects all updated statuses and paths.
- This cell only defines `main()`; nothing runs until you call `main()`, which requires `data/manifest.csv` to exist already (section 4 builds it).
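
The row-selection rule and status transitions above can be exercised without any network access. In this sketch, `run_pdf_stage` and `fake_fetch` are hypothetical names: `fake_fetch` stands in for the `find_pdf_link` plus `download_pdf` pair, and the country codes and URLs are made up.

```python
def run_pdf_stage(rows, fetch):
    """Apply the status transitions described above to rows, in place.

    `fetch(imf_url, iso2, year)` is a hypothetical callable returning
    (title, pdf_path); in the notebook this role is played by
    find_pdf_link followed by download_pdf.
    """
    for row in rows:
        status = (row.get("status") or "pending").lower()
        imf_url = row.get("imf_url") or ""
        if status not in ("pending", "pdf_error") or not imf_url:
            continue  # already downloaded, or nothing to fetch
        try:
            title, pdf_path = fetch(imf_url, row["iso2"], row["year"])
            row.update(title=title, pdf_path=pdf_path,
                       status="pdf_downloaded", notes="")
        except Exception as e:
            row.update(status="pdf_error", notes=str(e))
    return rows


def fake_fetch(imf_url, iso2, year):
    # Stub that never touches the network; fails for one made-up URL.
    if "broken" in imf_url:
        raise RuntimeError("Could not find PDF link on IMF page")
    return f"{iso2} {year} report", f"data/raw_pdfs/{iso2}/{year}_article_iv.pdf"


demo = [
    {"iso2": "AA", "year": "2025", "imf_url": "https://example.org/ok", "status": "pending"},
    {"iso2": "BB", "year": "2025", "imf_url": "https://example.org/broken", "status": "pdf_error"},
    {"iso2": "CC", "year": "2025", "imf_url": "", "status": "pending"},
    {"iso2": "DD", "year": "2025", "imf_url": "https://example.org/ok", "status": "pdf_downloaded"},
]
for row in run_pdf_stage(demo, fake_fetch):
    print(row["iso2"], row["status"])
```

Rows with an empty `imf_url` are skipped silently and stay `pending`, so a manifest entry that is missing its URL will never show up as an error.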

In [None]:
def main():
    """
    Orchestrate PDF downloads for all rows in the manifest.

    This function is the "PDF stage" of the pipeline: after it runs successfully,
    every processed row should have a local PDF ready for text extraction.
    """
    ensure_dirs()
    rows, fieldnames = read_manifest()

    # Make sure we have the columns we expect
    for col in ["status", "title", "pdf_path", "text_path", "first_snippet", "notes"]:
        if col not in fieldnames:
            fieldnames.append(col)

    for row in rows:
        status = (row.get("status") or "pending").lower()
        imf_url = row.get("imf_url") or ""
        if status not in ("pending", "pdf_error") or not imf_url:
            continue

        iso2 = row["iso2"]
        year = row["year"]
        print(f"=== {iso2} {year} ===")
        print("IMF URL:", imf_url)

        try:
            pdf_url, title = find_pdf_link(imf_url)
            print("  PDF URL:", pdf_url)
            pdf_path = download_pdf(pdf_url, iso2, year)
            print("  Saved PDF to:", pdf_path)

            row["title"] = title
            row["pdf_path"] = pdf_path
            row["status"] = "pdf_downloaded"
            row["notes"] = ""
        except Exception as e:
            print("  ERROR:", e)
            row["status"] = "pdf_error"
            row["notes"] = str(e)

        # Be nice to servers
        time.sleep(0.5)

    write_manifest(rows, fieldnames)
    print("Updated manifest:", MANIFEST_PATH)



## 4. Build the initial Article IV manifest

### What this does at a high level
Constructs the initial `data/manifest.csv` file from a hard-coded list of Article IV metadata, one row per country/year/IMF URL.

### How it does it
- Ensures `data/` exists: `os.makedirs("data", exist_ok=True)`.
- Defines `ART_IV` as a Python list of dictionaries, where each dict contains:
  - `iso2`, `iso3`, `country`, `year`,
  - `imf_url` (link to the IMF Article IV report),
  - `press_release_url` (optional).
- Opens `data/manifest.csv` for writing and creates a `csv.DictWriter` with the fieldnames:
  - `["iso2", "iso3", "country", "year", "imf_url", "press_release_url", "status", "title", "pdf_path", "text_path", "first_snippet", "notes"]`.
- Writes the header row once.
- Iterates over each `row` in `ART_IV` and writes a manifest row with:
  - the metadata from `ART_IV`,
  - `status = "pending"`,
  - empty placeholders for `title`, `pdf_path`, `text_path`, `first_snippet`, `notes`.
- Prints how many rows were written so you can confirm the manifest’s size.

This manifest becomes the **single source of truth** that the rest of the pipeline reads and updates.
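
Since every later stage trusts the manifest, a quick check of the hard-coded entries before writing them can catch slips such as a URL pasted into the wrong field. The sketch below is an optional addition, not part of the notebook. `check_entry` is a hypothetical helper, the year test is only a heuristic that assumes report slugs contain `<year>-article-iv`, and the sample entries use made-up URLs.

```python
def check_entry(entry):
    """Return a list of human-readable problems for one ART_IV-style dict."""
    problems = []
    country = str(entry.get("country") or "")
    imf_url = str(entry.get("imf_url") or "")
    year = str(entry.get("year") or "")

    if not imf_url:
        problems.append("imf_url is empty")
    elif not imf_url.startswith("https://www.imf.org/"):
        problems.append("imf_url is not an imf.org URL")

    if country.startswith(("http://", "https://")):
        problems.append("country looks like a URL")

    # Heuristic (assumption): the slug names the consultation year.
    if imf_url and year and f"{year}-article-iv" not in imf_url.lower():
        problems.append(f"year {year} does not appear in the URL slug")

    return problems


# Made-up entries for illustration; the URLs do not point at real reports.
fake_url = "https://www.imf.org/en/publications/cr/issues/2025/01/01/exampleland-2025-article-iv-consultation-000000"
samples = [
    {"iso2": "XX", "country": "Exampleland", "year": 2025, "imf_url": fake_url},
    {"iso2": "XX", "country": fake_url, "year": 2025, "imf_url": ""},
    {"iso2": "XX", "country": "Exampleland", "year": 2023, "imf_url": fake_url},
]
for sample in samples:
    print(check_entry(sample))
```

Running such a check over `ART_IV` before the write loop reports suspicious entries without changing them, so the list itself stays the place where fixes are made.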


In [7]:

# Create data/manifest.csv
os.makedirs("data", exist_ok=True)
path = "data/manifest.csv"

# Put all the Article IV reports you care about here
# You can expand this list as you collect more URLs
ART_IV = [
    {
        "iso2": "DJ",
        "iso3": "DJI",
        "country": "Djibouti",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/09/22/djibouti-2025-article-iv-consultation-press-release-and-staff-report-570614",
        "press_release_url": "https://www.imf.org/en/news/articles/2025/09/11/pr25292-djibouti-imf-executive-board-concludes-2025-article-iv-consultation",
    },
    {
        "iso2": "MX",
        "iso3": "MEX",
        "country": "Mexico",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/10/27/mexico-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-571378",
        "press_release_url": "https://www.imf.org/en/news/articles/2025/10/27/pr-25350-mexico-imf-executive-board-concludes-2025-article-iv-consultation",
    },
    {
        "iso2": "US",
        "iso3": "USA",
        "country": "United States",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/07/18/united-states-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-552100",
        "press_release_url": "",
    },
    {
        "iso2": "DO",
        "iso3": "DOM",
        "country": "Dominican Republic",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/09/12/dominican-republic-2024-article-iv-consultation-press-release-and-staff-report-554787",
        "press_release_url": "",
    },
    {
        "iso2": "NI",
        "iso3": "NIC",
        "country": "Nicaragua",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/01/19/nicaragua-2023-article-iv-consultation-press-release-and-staff-report-543914",
        "press_release_url": "",
    },
    {
        "iso2": "CL",
        "iso3": "CHL",
        "country": "Chile",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/02/04/chile-2024-article-iv-consultation-press-release-and-staff-report-561588",
        "press_release_url": "",
    },
    {
        "iso2": "KH",
        "iso3": "KHM",
        "country": "Cambodia",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/01/29/cambodia-2023-article-iv-consultation-press-release-and-staff-report-544276",
        "press_release_url": "",
    },
    {
        "iso2": "KZ",
        "iso3": "KAZ",
        "country": "Kazakhstan",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/01/31/republic-of-kazakhstan-2024-article-iv-consultation-press-release-and-staff-report-561424",
        "press_release_url": "",
    },
    {
        "iso2": "AL",
        "iso3": "ALB",
        "country": "Albania",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/01/12/albania-2023-article-iv-consultation-press-release-and-staff-report-543731",
        "press_release_url": "",
    },
    {
        "iso2": "SA",
        "iso3": "SAU",
        "country": "Saudi Arabia",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/09/03/saudi-arabia-2024-article-iv-consultation-press-release-staff-report-and-informational-annex-554530",
        "press_release_url": "",
    },
    {
        "iso2": "AZ",
        "iso3": "AZE",
        "country": "Azerbaijan",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/02/07/republic-of-azerbaijan-2023-article-iv-consultation-press-release-and-staff-report-544481",
        "press_release_url": "",
    },
    {
        "iso2": "PH",
        "iso3": "PHL",
        "country": "Philippines",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/12/19/philippines-2024-article-iv-consultation-press-release-and-staff-report-559739",
        "press_release_url": "",
    },
    {
        "iso2": "RO",
        "iso3": "ROU",
        "country": "Romania",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2023/12/07/romania-2023-article-iv-consultation-press-release-and-staff-report-imf-country-report-no-542184",
        "press_release_url": "",
    },
    {
        "iso2": "OM",
        "iso3": "OMN",
        "country": "Oman",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/01/22/oman-2024-article-iv-consultation-press-release-and-staff-report-561155",
        "press_release_url": "",
    },
    {
        "iso2": "AG",
        "iso3": "ATG",
        "country": "Antigua and Barbuda",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/01/23/antigua-and-barbuda-2023-article-iv-consultation-press-release-and-staff-report-544023",
        "press_release_url": "",
    },
    {
        "iso2": "UZ",
        "iso3": "UZB",
        "country": "Uzbekistan",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/07/11/republic-of-uzbekistan-2024-article-iv-consultation-press-release-and-staff-report-551710",
        "press_release_url": "",
    },
    {
        "iso2": "WS",
        "iso3": "WSM",
        "country": "Samoa",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/01/31/samoa-2024-article-iv-consultation-press-release-and-staff-report-561434",
        "press_release_url": "",
    },
    {
        "iso2": "NL",
        "iso3": "NLD",
        "country": "Netherlands",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/04/05/kingdom-of-the-netherlandsthe-netherlands-2024-article-iv-consultation-press-release-and-547340",
        "press_release_url": "",
    },
    {
        "iso2": "TH",
        "iso3": "THA",
        "country": "Thailand",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/02/20/thailand-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-562284",
        "press_release_url": "",
    },
    {
        "iso2": "SM",
        "iso3": "SMR",
        "country": "San Marino",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/12/09/republic-of-san-marino-2024-article-iv-consultation-press-release-and-staff-report-559288",
        "press_release_url": "",
    },
    {
        "iso2": "GE",
        "iso3": "GEO",
        "country": "Georgia",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/07/25/georgia-2025-article-iv-consultation-press-release-and-staff-report-569007",
        "press_release_url": "",  # can add later
    },
    {
        "iso2": "ME",
        "iso3": "MNE",
        "country": "Montenegro",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/11/19/montenegro-2025-article-iv-consultation-press-release-and-staff-report-571935",
        "press_release_url": "",
    },
    {
        "iso2": "LY",
        "iso3": "LBY",
        "country": "Libya",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/06/25/libya-2025-article-iv-consultation-press-release-and-staff-report-568035",
        "press_release_url": "",
    },
    {
        "iso2": "NO",
        "iso3": "NOR",
        "country": "Norway",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/08/27/norway-2025-article-iv-consultation-press-release-and-staff-report-570009",
        "press_release_url": "",
    },
    {
        "iso2": "PE",
        "iso3": "PER",
        "country": "Peru",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/06/10/peru-2025-article-iv-consultation-press-release-and-staff-report-567572",
        "press_release_url": "",
    },
    {
        "iso2": "IE",
        "iso3": "IRL",
        "country": "Ireland",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2023/12/14/ireland-2023-article-iv-consultation-press-release-and-staff-report-542470",
        "press_release_url": "",
    },
    {
        "iso2": "TR",
        "iso3": "TUR",
        "country": "Türkiye",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/10/11/republic-of-trkiye-2024-article-iv-consultation-press-release-staff-report-and-statement-by-556139",
        "press_release_url": "",
    },
    {
        "iso2": "EE",
        "iso3": "EST",
        "country": "Estonia",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/07/11/republic-of-estonia-2025-article-iv-consultation-press-release-and-staff-report-568542",
        "press_release_url": "",
    },
    {
        "iso2": "BZ",
        "iso3": "BLZ",
        "country": "Belize",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/05/15/belize-2024-article-iv-consultation-press-release-and-staff-report-549008",
        "press_release_url": "https://www.imf.org/en/News/Articles/2024/05/15/pr24163-imf-concludes-2024-article-iv-consultation-with-belize",
    },
    {
        "iso2": "VU",
        "iso3": "VUT",
        "country": "Vanuatu",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/09/03/vanuatu-2024-article-iv-consultation-press-release-and-staff-report-554259",
        "press_release_url": "https://www.imf.org/en/News/Articles/2024/09/03/pr24315-vanuatu-imf-exec-board-concludes-2024-art-iv-consult",
    },
    {
        "iso2": "FJ",
        "iso3": "FJI",
        "country": "Fiji",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/06/20/fiji-2025-article-iv-consultation-press-release-and-staff-report-567891",
        "press_release_url": "https://www.imf.org/en/news/articles/2025/06/20/pr-25208-fiji-imf-concludes-2025-article-iv-consultation",
    },
    {
        "iso2": "BE",
        "iso3": "BEL",
        "country": "Belgium",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/03/20/belgium-2025-article-iv-consultation-press-release-and-staff-report-565399",
        "press_release_url": "https://www.imf.org/en/news/articles/2025/03/19/pr25070-belgium-imf-executive-board-concludes-2025-article-iv-consultation-with-belgium",
    },
    {
        "iso2": "CA",
        "iso3": "CAN",
        "country": "Canada",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/07/16/canada-2024-article-iv-consultation-press-release-and-staff-report-551903",
        "press_release_url": "",
    },
    {
        "iso2": "QA",
        "iso3": "QAT",
        "country": "Qatar",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/02/24/qatar-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-562636",
        "press_release_url": "",
    },
    {
        "iso2": "UY",
        "iso3": "URY",
        "country": "Uruguay",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/10/30/uruguay-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-571423",
        "press_release_url": "",
    },
    {
        "iso2": "TO",
        "iso3": "TON",
        "country": "Tonga",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/11/07/tonga-2025-article-iv-consultation-press-release-and-staff-report-571692",
        "press_release_url": "",
    },
    {
        "iso2": "MN",
        "iso3": "MNG",
        "country": "Mongolia",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/09/15/mongolia-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-570413",
        "press_release_url": "https://www.imf.org/en/news/articles/2025/09/15/pr-25298-mongolia-imf-executive-board-concludes-2025-article-iv-consultation",
    },
    {
        "iso2": "LR",
        "iso3": "LBR",
        "country": "Liberia",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/11/07/liberia-2025-article-iv-consultation-and-second-review-under-the-extended-credit-facility-571594",
        "press_release_url": "",
    },
    {
        "iso2": "ST",
        "iso3": "STP",
        "country": "São Tomé and Príncipe",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/08/08/democratic-republic-of-so-tom-and-prncipe-2025-article-iv-consultation-first-review-under-569490",
        "press_release_url": "",
    },
    {
        "iso2": "JP",
        "iso3": "JPN",
        "country": "Japan",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/Publications/CR/Issues/2025/04/01/Japan-2025-Article-IV-Consultation-Press-Release-Staff-Report-and-Statement-by-the-565846",
        "press_release_url": "",
    },
    {
        "iso2": "DZ",
        "iso3": "DZA",
        "country": "Algeria",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/09/18/algeria-2025-article-iv-consultation-press-release-and-staff-report-570472",
        "press_release_url": "",
    },
    {
        "iso2": "LS",
        "iso3": "LSO",
        "country": "Lesotho",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/09/17/kingdom-of-lesotho-2025-article-iv-consultation-press-release-and-staff-report-570457",
        "press_release_url": "",
    },
    {
        "iso2": "LT",
        "iso3": "LTU",
        "country": "Lithuania",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/09/16/republic-of-lithuania-2025-article-iv-consultation-press-release-staff-report-and-statement-570361",
        "press_release_url": "",
    },
    {
        "iso2": "TV",
        "iso3": "TUV",
        "country": "Tuvalu",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/09/12/tuvalu-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-570252",
        "press_release_url": "",
    },
    {
        "iso2": "ML",
        "iso3": "MLI",
        "country": "Mali",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/09/10/mali-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-executive-570204",
        "press_release_url": "",
    },
    {
        "iso2": "LV",
        "iso3": "LVA",
        "country": "Latvia",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/09/18/republic-of-latvia-2025-article-iv-consultation-press-release-staff-report-and-statement-by-570482",
        "press_release_url": "",
    },
    {
        "iso2": "NR",
        "iso3": "NRU",
        "country": "Nauru",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/09/19/republic-of-nauru-2025-article-iv-consultation-press-release-staff-report-and-statement-by-570609",
        "press_release_url": "",
    },
    {
        "iso2": "SZ",
        "iso3": "SWZ",
        "country": "Eswatini",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/09/26/kingdom-of-eswatini-2025-article-iv-consultation-press-release-staff-report-and-statement-570739",
        "press_release_url": "",
    },
    {
        "iso2": "TL",
        "iso3": "TLS",
        "country": "Timor-Leste",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/09/25/democratic-republic-of-timor-leste-2025-article-iv-consultation-press-release-staff-report-570720",
        "press_release_url": "",
    },
    {
        "iso2": "CH",
        "iso3": "CHE",
        "country": "Switzerland",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/09/16/switzerland-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-570376",
        "press_release_url": "",
    },
    {
        "iso2": "GT",
        "iso3": "GTM",
        "country": "Guatemala",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/09/12/guatemala-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-570287",
        "press_release_url": "",
    },
    {
        "iso2": "KG",
        "iso3": "KGZ",
        "country": "Kyrgyz Republic",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/06/04/kyrgyz-republic-2025-article-iv-consultation-press-release-and-staff-report-567394",
        "press_release_url": "",
    },
    {
        "iso2": "CY",
        "iso3": "CYP",
        "country": "Cyprus",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/06/02/cyprus-2025-article-iv-consultation-press-release-and-staff-report-567389",
        "press_release_url": "",
    },
    {
        "iso2": "BO",
        "iso3": "BOL",
        "country": "Bolivia",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/06/02/bolivia-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-567384",
        "press_release_url": "",
    },
    {
        "iso2": "NZ",
        "iso3": "NZL",
        "country": "New Zealand",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/05/23/new-zealand-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-567168",
        "press_release_url": "",
    },
    {
        "iso2": "ES",
        "iso3": "ESP",
        "country": "Spain",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/06/05/spain-2025-article-iv-consultation-press-release-and-staff-report-567439",
        "press_release_url": "",
    },
    {
        "iso2": "LU",
        "iso3": "LUX",
        "country": "Luxembourg",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/06/04/luxembourg-2025-article-iv-consultation-press-release-and-staff-report-567449",
        "press_release_url": "",
    },
    {
        "iso2": "GW",
        "iso3": "GNB",
        "country": "Guinea-Bissau",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/07/07/guinea-bissau-2025-article-iv-consultation-eighth-review-under-the-extended-credit-facility-568357",
        "press_release_url": "",
    },
    {
        "iso2": "DK",
        "iso3": "DNK",
        "country": "Denmark",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/07/03/denmark-2025-article-iv-consultation-press-release-and-staff-report-568286",
        "press_release_url": "",
    },
    {
        "iso2": "TZ",
        "iso3": "TZA",
        "country": "Tanzania",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/07/03/united-republic-of-tanzania-staff-report-for-the-2025-article-iv-consultation-fifth-review-568276",
        "press_release_url": "",
    },
    {
        "iso2": "AT",
        "iso3": "AUT",
        "country": "Austria",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/07/02/austria-2025-article-iv-consultation-press-release-and-staff-report-568230",
        "press_release_url": "",
    },
    {
        "iso2": "NG",
        "iso3": "NGA",
        "country": "Nigeria",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/07/01/nigeria-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-568220",
        "press_release_url": "",
    },
    {
        "iso2": "JM",
        "iso3": "JAM",
        "country": "Jamaica",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/06/25/jamaica-2025-article-iv-consultation-press-release-and-staff-report-567997",
        "press_release_url": "",
    },
    {
        "iso2": "MU",
        "iso3": "MUS",
        "country": "Mauritius",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/06/18/mauritius-2025-article-iv-consultation-press-release-and-staff-report-567835",
        "press_release_url": "",
    },
    {
        "iso2": "VN",
        "iso3": "VNM",
        "country": "Vietnam",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/10/03/vietnam-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-570895",
        "press_release_url": "",
    },
    {
        "iso2": "ZM",
        "iso3": "ZMB",
        "country": "Zambia",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/08/05/zambia-2025-article-iv-consultation-fifth-review-under-the-extended-credit-facility-569341",
        "press_release_url": "",
    },
    {
        "iso2": "GQ",
        "iso3": "GNQ",
        "country": "Equatorial Guinea",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/08/04/republic-of-equatorial-guinea-staff-report-for-the-2025-article-iv-consultation-and-first-569270",
        "press_release_url": "",
    },
    {
        "iso2": "PA",
        "iso3": "PAN",
        "country": "Panama",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/08/25/panama-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-569920",
        "press_release_url": "",
    },
    {
        "iso2": "BA",
        "iso3": "BIH",
        "country": "Bosnia and Herzegovina",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/09/03/bosnia-and-herzegovina-2025-article-iv-consultation-press-release-staff-report-staff-570072",
        "press_release_url": "",
    },
    {
        "iso2": "HU",
        "iso3": "HUN",
        "country": "Hungary",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/08/29/hungary-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-570029",
        "press_release_url": "",
    },
    {
        "iso2": "IN",
        "iso3": "IND",
        "country": "India",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/02/27/india-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-562726",
        "press_release_url": "",
    },
    {
        "iso2": "PK",
        "iso3": "PAK",
        "country": "Pakistan",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/10/10/pakistan-2024-article-iv-consultation-and-request-for-an-extended-arrangement-under-the-556152",
        "press_release_url": "",
    },
    {
        "iso2": "BR",
        "iso3": "BRA",
        "country": "Brazil",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/07/11/brazil-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-551705",
        "press_release_url": "",
    },
    {
        "iso2": "CN",
        "iso3": "CHN",
        "country": "China",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/08/01/peoples-republic-of-china-2024-article-iv-consultation-press-release-staff-report-and-552803",
        "press_release_url": "",
    },
    {
        "iso2": "GB",
        "iso3": "GBR",
        "country": "United Kingdom",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/07/03/united-kingdom-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-551376",
        "press_release_url": "",
    },
    {
        "iso2": "FR",
        "iso3": "FRA",
        "country": "France",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/07/12/france-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-551772",
        "press_release_url": "",
    },
    {
        "iso2": "DE",
        "iso3": "DEU",
        "country": "Germany",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/07/17/germany-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-552080",
        "press_release_url": "",
    },
    {
        "iso2": "IT",
        "iso3": "ITA",
        "country": "Italy",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/07/11/italy-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-551833",
        "press_release_url": "",
    },
    {
        "iso2": "PT",
        "iso3": "PRT",
        "country": "Portugal",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/06/28/portugal-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-551270",
        "press_release_url": "",
    },
    {
        "iso2": "SE",
        "iso3": "SWE",
        "country": "Sweden",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/03/08/sweden-2024-article-iv-consultation-press-release-and-staff-report-546063",
        "press_release_url": "",
    },
    {
        "iso2": "FI",
        "iso3": "FIN",
        "country": "Finland",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/03/08/finland-2024-article-iv-consultation-press-release-and-staff-report-546074",
        "press_release_url": "",
    },
    {
        "iso2": "PL",
        "iso3": "POL",
        "country": "Poland",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/01/17/republic-of-poland-2024-article-iv-consultation-press-release-staff-report-and-statement-by-560997",
        "press_release_url": "",
    },
    {
        "iso2": "CZ",
        "iso3": "CZE",
        "country": "Czech Republic",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/01/25/czech-republic-2023-article-iv-consultation-press-release-staff-report-and-statement-by-the-544127",
        "press_release_url": "",
    },
    {
        "iso2": "SK",
        "iso3": "SVK",
        "country": "Slovak Republic",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/03/12/slovak-republic-2023-article-iv-consultation-press-release-and-staff-report-546198",
        "press_release_url": "",
    },
    {
        "iso2": "SI",
        "iso3": "SVN",
        "country": "Slovenia",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/05/13/republic-of-slovenia-2024-article-iv-consultation-press-release-staff-report-and-statement-548953",
        "press_release_url": "",
    },
    {
        "iso2": "HR",
        "iso3": "HRV",
        "country": "Croatia",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/04/04/republic-of-croatia-2024-article-iv-consultation-press-release-and-staff-report-547115",
        "press_release_url": "",
    },
    {
        "iso2": "RS",
        "iso3": "SRB",
        "country": "Serbia",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/06/11/republic-of-serbia-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-567628",
        "press_release_url": "",
    },
    {
        "iso2": "BG",
        "iso3": "BGR",
        "country": "Bulgaria",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/06/14/republic-of-bulgaria-2024-article-iv-consultation-press-release-staff-report-and-statement-by-550508",
        "press_release_url": "",
    },
    {
        "iso2": "CO",
        "iso3": "COL",
        "country": "Colombia",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/08/28/colombia-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-553853",
        "press_release_url": "",
    },
    {
        "iso2": "EC",
        "iso3": "ECU",
        "country": "Ecuador",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/12/19/ecuador-2024-article-iv-consultation-and-first-review-under-the-extended-arrangement-under-559780",
        "press_release_url": "",
    },
    {
        "iso2": "KE",
        "iso3": "KEN",
        "country": "Kenya",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/01/17/kenya-2023-article-iv-consultation-sixth-reviews-under-the-extended-fund-facility-and-extended-543889",
        "press_release_url": "",
    },
    {
        "iso2": "ET",
        "iso3": "ETH",
        "country": "Ethiopia",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/07/15/the-federal-democratic-republic-of-ethiopia-2025-article-iv-consultation-third-review-under-568611",
        "press_release_url": "",
    },
    {
        "iso2": "GH",
        "iso3": "GHA",
        "country": "Ghana",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/01/25/ghana-2023-article-iv-consultation-first-review-under-the-extended-credit-facility-544137",
        "press_release_url": "",
    },
    {
        "iso2": "CI",
        "iso3": "CIV",
        "country": "Côte d'Ivoire",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/04/08/cote-d-ivoire-2024-article-iv-consultation-and-second-review-under-the-arrangement-under-the-565981",
        "press_release_url": "",
    },
    {
        "iso2": "EG",
        "iso3": "EGY",
        "country": "Egypt",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/07/15/arab-republic-of-egypt-2025-article-iv-consultation-fourth-review-under-the-extended-568598",
        "press_release_url": "",
    },
    {
        "iso2": "JO",
        "iso3": "JOR",
        "country": "Jordan",
        "year": 2022,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2022/07/13/jordan-2022-article-iv-consultation-and-fourth-review-under-the-extended-arrangement-under-520668",
        "press_release_url": "",
    },
    {
        "iso2": "ZA",
        "iso3": "ZAF",
        "country": "South Africa",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/01/30/south-africa-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-561414",
        "press_release_url": "",
    },
    {
        "iso2": "TN",
        "iso3": "TUN",
        "country": "Tunisia",
        "year": 2020,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2021/02/26/tunisia-2020-article-iv-consultation-press-release-staff-report-and-statement-by-the-50128",
        "press_release_url": "",
    },
    {
        "iso2": "MV",
        "iso3": "MDV",
        "country": "Maldives",
        "year": 2022,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2023/11/09/maldives-2022-article-iv-consultation-press-release-staff-report-and-statement-by-the-541373",
        "press_release_url": "",
    },
    {
        "iso2": "AR",
        "iso3": "ARG",
        "country": "Argentina",
        "year": 2022,
        "imf_url": "https://www.imf.org/en/Publications/CR/Issues/2022/03/25/Argentina-Staff-Report-for-2022-Article-IV-Consultation-and-request-for-an-Extended-515742",
        "press_release_url": "",
    },
    {
        "iso2": "ID",
        "iso3": "IDN",
        "country": "Indonesia",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/08/07/indonesia-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-553165",
        "press_release_url": "",
    },
    {
        "iso2": "KR",
        "iso3": "KOR",
        "country": "Republic of Korea",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/02/07/republic-of-korea-2024-article-iv-consultation-press-release-staff-report-and-statement-by-561780",
        "press_release_url": "",
    },
    {
        "iso2": "SG",
        "iso3": "SGP",
        "country": "Singapore",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/07/30/singapore-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-552788",
        "press_release_url": "",
    },
    {
        "iso2": "MA",
        "iso3": "MAR",
        "country": "Morocco",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/04/07/morocco-2025-article-iv-consultation-and-third-review-under-the-arrangement-under-the-565940",
        "press_release_url": "",
    },
    {
        "iso2": "SN",
        "iso3": "SEN",
        "country": "Senegal",
        "year": 2021,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2022/01/14/senegal-2021-article-iv-consultation-fourth-review-under-the-policy-coordination-instrument-511932",
        "press_release_url": "",
    },
    {
        "iso2": "AO",
        "iso3": "AGO",
        "country": "Angola",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/03/05/angola-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-562971",
        "press_release_url": "",
    },
    {
        "iso2": "CM",
        "iso3": "CMR",
        "country": "Cameroon",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/02/20/cameroon-2023-article-iv-consultation-fifth-reviews-under-the-extended-credit-facility-and-544962",
        "press_release_url": "",
    },
    {
        "iso2": "BD",
        "iso3": "BGD",
        "country": "Bangladesh",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2023/12/13/bangladesh-2023-article-iv-consultation-first-reviews-under-the-extended-credit-facility-542460",
        "press_release_url": "",
    },
    {
        "iso2": "LK",
        "iso3": "LKA",
        "country": "Sri Lanka",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/06/13/sri-lanka-2024-article-iv-consultation-and-second-review-under-the-extended-fund-facility-550261",
        "press_release_url": "",
    },
    {
        "iso2": "HT",
        "iso3": "HTI",
        "country": "Haiti",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/12/10/haiti-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-559329",
        "press_release_url": "",
    },
    {
        "iso2": "RW",
        "iso3": "RWA",
        "country": "Rwanda",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2023/12/18/rwanda-2023-article-iv-consultation-second-reviews-under-the-policy-coordination-instrument-542581",
        "press_release_url": "",
    },
    {
        "iso2": "UG",
        "iso3": "UGA",
        "country": "Uganda",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/09/11/uganda-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-554753",
        "press_release_url": "",
    },
    {
        "iso2": "NA",
        "iso3": "NAM",
        "country": "Namibia",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/06/16/namibia-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-567703",
        "press_release_url": "",
    },
    {
        "iso2": "PG",
        "iso3": "PNG",
        "country": "Papua New Guinea",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/06/18/papua-new-guinea-2025-article-iv-consultation-fourth-reviews-under-the-extended-arrangement-567819",
        "press_release_url": "",
    },
    {
        "iso2": "IS",
        "iso3": "ISL",
        "country": "Iceland",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/06/23/iceland-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-567916",
        "press_release_url": "",
    },
    {
        "iso2": "CR",
        "iso3": "CRI",
        "country": "Costa Rica",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/05/13/costa-rica-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-566938",
        "press_release_url": "",
    },
    {
        "iso2": "IQ",
        "iso3": "IRQ",
        "country": "Iraq",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/07/11/iraq-2025-article-iv-consultation-press-release-staff-report-and-informational-annex-568569",
        "press_release_url": "",
    },
    {
        "iso2": "BJ",
        "iso3": "BEN",
        "country": "Benin",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/12/14/benin-fourth-review-under-the-extended-fund-facility-and-the-extended-credit-facility-559472",
        "press_release_url": "",
    },
    {
        "iso2": "TG",
        "iso3": "TGO",
        "country": "Togo",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/09/23/togo-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-executive-555141",
        "press_release_url": "",
    },
    {
        "iso2": "HN",
        "iso3": "HND",
        "country": "Honduras",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2023/09/22/honduras-2023-article-iv-consultation-and-requests-for-an-arrangement-under-the-extended-539367",
        "press_release_url": "",
    },
    {
        "iso2": "SV",
        "iso3": "SLV",
        "country": "El Salvador",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/07/15/el-salvador-2025-article-iv-consultation-first-review-under-the-extended-fund-facility-and-568621",
        "press_release_url": "",
    },
    {
        "iso2": "PY",
        "iso3": "PRY",
        "country": "Paraguay",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/07/01/paraguay-2024-article-iv-consultation-third-review-under-the-policy-coordination-instrument-551228",
        "press_release_url": "",
    },
    {
        "iso2": "GY",
        "iso3": "GUY",
        "country": "Guyana",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/05/07/guyana-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-566712",
        "press_release_url": "",
    },
    {
        "iso2": "SR",
        "iso3": "SUR",
        "country": "Suriname",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/01/21/suriname-2024-article-iv-consultation-and-the-eighth-review-under-the-extended-arrangement-561145",
        "press_release_url": "",
    },
    {
        "iso2": "TT",
        "iso3": "TTO",
        "country": "Trinidad and Tobago",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/06/04/trinidad-and-tobago-2024-article-iv-consultation-press-release-staff-report-and-statement-549885",
        "press_release_url": "",
    },
    {
        "iso2": "BS",
        "iso3": "BHS",
        "country": "Bahamas",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/01/17/the-bahamas-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-561018",
        "press_release_url": "",
    },
    {
        "iso2": "BB",
        "iso3": "BRB",
        "country": "Barbados",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2023/12/20/barbados-2023-article-iv-consultation-and-second-reviews-under-the-arrangement-under-the-542666",
        "press_release_url": "",
    },
    {
        "iso2": "GD",
        "iso3": "GRD",
        "country": "Grenada",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/02/04/grenada-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-561583",
        "press_release_url": "",
    },
    {
        "iso2": "LC",
        "iso3": "LCA",
        "country": "Saint Lucia",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/03/13/st-lucia-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-565180",
        "press_release_url": "",
    },
    {
        "iso2": "VC",
        "iso3": "VCT",
        "country": "Saint Vincent and the Grenadines",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/07/22/st-552186",
        "press_release_url": "",
    },
    {
        "iso2": "KN",
        "iso3": "KNA",
        "country": "Saint Kitts and Nevis",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/05/12/st-kitts-and-nevis-2025-article-iv-consultation-press-release-staff-report-and-statement-by-566921",
        "press_release_url": "",
    },
    {
        "iso2": "BW",
        "iso3": "BWA",
        "country": "Botswana",
        "year": 2025,
        "imf_url": "",  # TODO: add the IMF country report URL for Botswana (a search-engine link does not work here)
        "press_release_url": "",
    },
    {
        "iso2": "MZ",
        "iso3": "MOZ",
        "country": "Mozambique",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/07/12/republic-of-mozambique-2024-article-iv-consultation-fourth-review-under-the-three-year-551839",
        "press_release_url": "",
    },
    {
        "iso2": "MG",
        "iso3": "MDG",
        "country": "Madagascar",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/03/21/republic-of-madagascar-staff-report-for-the-2024-article-iv-consultation-first-review-under-562961",
        "press_release_url": "",
    },
    {
        "iso2": "MW",
        "iso3": "MWI",
        "country": "Malawi",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/08/06/malawi-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-569391",
        "press_release_url": "",
    },
    {
        "iso2": "SL",
        "iso3": "SLE",
        "country": "Sierra Leone",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/11/22/sierra-leone-2024-article-iv-consultation-and-request-for-a-38-month-arrangement-under-the-558772",
        "press_release_url": "",
    },
    {
        "iso2": "NE",
        "iso3": "NER",
        "country": "Niger",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2025/01/29/niger-2024-article-iv-consultation-sixth-review-under-the-extended-credit-facility-561389",
        "press_release_url": "",
    },
    {
        "iso2": "BF",
        "iso3": "BFA",
        "country": "Burkina Faso",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/07/29/burkina-faso-2024-article-iv-consultation-and-first-review-under-the-extended-credit-552750",
        "press_release_url": "",
    },
    {
        "iso2": "GA",
        "iso3": "GAB",
        "country": "Gabon",
        "year": 2025,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/05/31/gabon-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-549681",
        "press_release_url": "",
    },
    {
        "iso2": "MT",
        "iso3": "MLT",
        "country": "Malta",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/06/26/malta-2024-article-iv-consultation-press-release-and-staff-report-551330",
        "press_release_url": "",
    },
    {
        "iso2": "HK",
        "iso3": "HKG",
        "country": "Hong Kong SAR",
        "year": 2024,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2024/09/26/peoples-republic-of-china-hong-kong-special-administrative-region-2024-article-iv-555310",
        "press_release_url": "",
    },
    {
        "iso2": "LA",
        "iso3": "LAO",
        "country": "Lao People's Democratic Republic",
        "year": 2023,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2023/05/22/lao-people-s-democratic-republic-2023-article-iv-consultation-press-release-staff-report-533636",
        "press_release_url": "",
    },
    {  # duplicate of the Jordan 2022 entry earlier in this list
        "iso2": "JO",
        "iso3": "JOR",
        "country": "Jordan",
        "year": 2022,
        "imf_url": "https://www.imf.org/en/publications/cr/issues/2022/07/13/jordan-2022-article-iv-consultation-and-fourth-review-under-the-extended-arrangement-under-520668",
        "press_release_url": "",
    },
    
]

In [49]:
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f,
        fieldnames=[
            "iso2",
            "iso3",
            "country",
            "year",
            "imf_url",
            "press_release_url",
            "status",
            "title",
            "pdf_path",
            "text_path",
            "first_snippet",
            "notes",
        ],
    )
    writer.writeheader()

    for row in ART_IV:
        writer.writerow({
            "iso2": row["iso2"],
            "iso3": row["iso3"],
            "country": row["country"],
            "year": row["year"],
            "imf_url": row["imf_url"],
            "press_release_url": row.get("press_release_url", ""),
            "status": "pending",   # your download script will flip this to 'pdf_downloaded'
            "title": "",
            "pdf_path": "",
            "text_path": "",
            "first_snippet": "",
            "notes": "",
        })

print(f"Wrote manifest with {len(ART_IV)} rows to {path}")

Wrote manifest with 143 rows to data/manifest.csv
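
Because the list above is maintained by hand, it may be worth sanity-checking it before the manifest is written. The following is a minimal sketch, not part of the original pipeline: `check_entries` and the sample rows are hypothetical, and the rule that every non-empty `imf_url` should point at an `imf.org` host is an assumption about what the downloader expects.

```python
from collections import Counter
from urllib.parse import urlparse


def check_entries(entries):
    """Return (duplicate_keys, suspicious_urls) for a list of manifest dicts.

    Hypothetical helper: assumes each dict has 'iso2', 'year', and 'imf_url'.
    """
    # Count how often each (iso2, year) pair appears in the list.
    counts = Counter((e["iso2"], e["year"]) for e in entries)
    duplicates = sorted(key for key, n in counts.items() if n > 1)

    # Flag non-empty URLs whose host is neither imf.org nor a subdomain of it.
    suspicious = []
    for e in entries:
        url = e.get("imf_url", "")
        if not url:
            continue  # empty URLs are allowed; they are simply not checked
        host = urlparse(url).netloc.lower()
        if host != "imf.org" and not host.endswith(".imf.org"):
            suspicious.append((e["iso2"], url))
    return duplicates, suspicious


# Illustrative rows only; the URLs are placeholders, not real report pages.
sample = [
    {"iso2": "JO", "year": 2022, "imf_url": "https://www.imf.org/example-a"},
    {"iso2": "JO", "year": 2022, "imf_url": "https://www.imf.org/example-a"},
    {"iso2": "BW", "year": 2025, "imf_url": "https://www.google.com/search?q=example"},
    {"iso2": "XX", "year": 2025, "imf_url": ""},
]

duplicates, suspicious = check_entries(sample)
print("Duplicate (iso2, year) keys:", duplicates)
print("Non-IMF URLs:", suspicious)
```

Running `check_entries(ART_IV)` before opening the CSV would surface repeated country/year pairs and pasted search-engine links without changing the manifest itself.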


## 5. Convert PDFs to text and extract the Executive Board Assessment (EBA) -- OPTIONAL

### What this does at a high level
Takes all downloaded Article IV PDFs and:
- converts each full PDF into plain text,
- saves the full text to `data/text/<ISO2>/<year>_article_iv.txt`,
- extracts the "Executive Board Assessment" section,
- saves that section to `data/text/<ISO2>/<year>_eba.txt`,
- updates `data/manifest.csv` with:
  - `text_path`, `first_snippet`,
  - `eba_text_path`, `eba_snippet`,
  - `status` and `eba_status` flags for each row.
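
As a rough illustration of the output layout listed above, the two file paths for one report could be built like this. `text_paths` is a hypothetical helper introduced only for this sketch; `DATA_DIR` and `TEXT_DIR` follow the values the stage itself sets.

```python
import os

DATA_DIR = "data"
TEXT_DIR = os.path.join(DATA_DIR, "text")


def text_paths(iso2, year):
    """Return (full_text_path, eba_path) for one report.

    Hypothetical helper: mirrors the data/text/<ISO2>/<year>_*.txt layout.
    """
    country_dir = os.path.join(TEXT_DIR, iso2)
    full_path = os.path.join(country_dir, f"{year}_article_iv.txt")
    eba_path = os.path.join(country_dir, f"{year}_eba.txt")
    return full_path, eba_path


full_path, eba_path = text_paths("ML", 2025)
print(full_path)
print(eba_path)
```

The per-country directory would still need to exist before writing, for example via `os.makedirs(os.path.dirname(full_path), exist_ok=True)`.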

### How it does it
- Sets:
  - `DATA_DIR = "data"`,
  - `TEXT_DIR = os.path.join(DATA_DIR, "text")`,
  - `MANIFEST_PATH = os.path.join(DATA_DIR, "manifest.csv")`.
- `ensure_dirs()`:
  - creates `data/text/` if it does not exist.
- `read_manifest()` / `write_manifest()`:
  - same pattern as in the earlier stages: read every row into memory, then rewrite the whole file after updates.
- `extract_eba_section(full_text)`:
  - searches `full_text` for the string `"Executive Board Assessment"`,
  - if found, takes the substring from that point until a likely end marker such as:
    - `"Annex"`, `"ANNEX"`,
    - `"APPENDIX"`, `"Appendix"`,
    - or other logical section boundaries,
  - returns the extracted EBA text with leading and trailing whitespace removed via `.strip()`.
- `main()`:
  - calls `ensure_dirs()` and `read_manifest()`,
  - ensures these columns exist on each row (adding them if missing):
    - `text_path`, `first_snippet`,
    - `eba_text_path`, `eba_snippet`, `eba_status`,
    - `status`, `notes`.
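  - loops over rows whose `status` is `"pdf_downloaded"` or `"text_error"`:
    - if `pdf_path` is empty or does not exist on disk, marks the row `status = "text_error"` and `eba_status = "eba_error"`, records `Missing PDF: <path>` in `notes`, and skips it.
  - for each remaining row: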
    1. uses `extract_text(pdf_path)` from `pdfminer.six` to extract the full text,
    2. saves full text to `data/text/<ISO2>/<year>_article_iv.txt`,
    3. sets `text_path` accordingly and `status = "text_extracted"`,
    4. builds a short `first_snippet` (~first 300 chars with newlines removed) for quick inspection,
    5. calls `extract_eba_section(full_text)`:
       - if EBA text is found:
         - saves it as `data/text/<ISO2>/<year>_eba.txt`,
         - sets `eba_text_path`, `eba_status = "eba_extracted"`,
         - builds a longer ~800-char `eba_snippet`,
       - if no EBA section:
         - sets `eba_status = "eba_missing"`, `eba_text_path = ""`, `eba_snippet = ""`.
    6. clears `notes` on success.
  - on any exception:
    - prints the error,
    - sets:
      - `status = "text_error"`,
      - `notes` to the error string,
      - empties `first_snippet`, `eba_text_path`, `eba_snippet`,
      - sets `eba_status = "eba_error"`.
  - after the loop, calls `write_manifest(rows, fieldnames)` to persist all updates.
  - the `if __name__ == "__main__": main()` block lets you run this stage as a standalone script.
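
The marker-based slicing described above can be tried on a toy string before running it on real reports. The sketch below is a compact, self-contained stand-in for the idea, not the notebook's actual function: the name `slice_between_markers` and the sample text are assumptions made up for illustration.

```python
def slice_between_markers(text, start_marker, end_markers, fallback_len=8000):
    """Return text from start_marker up to the earliest end marker (hypothetical helper)."""
    start = text.find(start_marker)
    if start == -1:
        return ""  # start marker absent: nothing to extract

    # Look for end markers only after the start marker itself
    search_from = start + len(start_marker)
    hits = [i for i in (text.find(m, search_from) for m in end_markers) if i != -1]

    # Earliest end marker wins; otherwise fall back to a fixed-size slice
    end = min(hits) if hits else min(len(text), start + fallback_len)
    return text[start:end].strip()


# Invented sample text, only to exercise the helper
sample = (
    "Staff Report\n"
    "Executive Board Assessment\n"
    "Directors welcomed the recovery.\n"
    "Table 1. Selected Indicators\n"
    "Annex I. Risk Matrix\n"
)
eba = slice_between_markers(sample, "Executive Board Assessment", ["Annex", "Table 1."])
print(eba)
```

Because the cut happens at the earliest marker, a body paragraph that merely mentions an annex would likely truncate the section early, so it is worth eyeballing `eba_snippet` in the manifest after a run.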

Extract Article IV in Text

In [8]:
from pdfminer.high_level import extract_text

DATA_DIR = "data"
TEXT_DIR = os.path.join(DATA_DIR, "text")
MANIFEST_PATH = os.path.join(DATA_DIR, "manifest.csv")


def ensure_dirs():
    """
    Ensure that the text output directory exists.

    Creates 'data/text/' if it doesn't already exist. This is where
    the plain-text versions of the Article IV PDFs and the Executive
    Board Assessment sections will be stored, organized by ISO2 country
    code.
    """
    os.makedirs(TEXT_DIR, exist_ok=True)


def read_manifest():
    """
    Read the manifest CSV and return its rows and fieldnames.

    Returns
    -------
    rows : list[dict]
        Each dict represents a manifest row.
    fieldnames : list[str]
        Ordered column names in the manifest header.

    This function is used here to know which PDFs have been downloaded
    and where they are located.
    """
    with open(MANIFEST_PATH, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fieldnames = reader.fieldnames
    return rows, fieldnames


def write_manifest(rows, fieldnames):
    """
    Persist updated manifest rows back to the CSV file.

    Parameters
    ----------
    rows : list[dict]
        New manifest content.
    fieldnames : list[str]
        Column names in the desired order.

    This step updates, among others:
    - status          (e.g., 'text_extracted', 'text_error')
    - text_path       (path to the full Article IV .txt file)
    - first_snippet   (first 300 chars of full text, for quick inspection)
    - eba_text_path   (path to the Executive Board Assessment .txt file)
    - eba_snippet     (preview of the EBA section)
    - eba_status      (e.g., 'eba_extracted', 'eba_missing', 'eba_error')
    - notes           (error messages if something went wrong)
    """
    with open(MANIFEST_PATH, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for r in rows:
            writer.writerow(r)


def extract_eba_section(full_text: str) -> str:
    """
    Extract the 'Executive Board Assessment' section from the full
    Article IV report text.

    Parameters
    ----------
    full_text : str
        The entire text of the Article IV PDF as extracted by pdfminer.six.

    How it works
    ------------
    1) Find the position of the phrase 'Executive Board Assessment'.
    2) From that point onward, look for likely end markers, such as:
       - 'Annex'
       - 'APPENDIX'
       - 'Table 1.'
       - 'It is expected that the next Article IV consultation'
    3) Take the substring between the start marker and the earliest
       end marker (if any). If no end marker is found, take a generous
       slice (e.g., up to start+8000 characters) as a fallback.

    Returns
    -------
    eba_text : str
        The extracted Executive Board Assessment section, or an empty
        string if the start marker cannot be found.
    """
    marker = "Executive Board Assessment"
    start = full_text.find(marker)
    if start == -1:
        return ""

    # Heuristics for where the EBA might end
    end_markers = [
        "Annex",
        "ANNEX",
        "Appendix",
        "APPENDIX",
        "Table 1.",
        "Table 1 ",
        "It is expected that the next Article IV consultation",
    ]

    end_candidates = []
    for m in end_markers:
        idx = full_text.find(m, start)
        if idx != -1:
            end_candidates.append(idx)

    if end_candidates:
        end = min(end_candidates)
    else:
        # Fallback: just take a big chunk (e.g., ~8k characters)
        end = min(len(full_text), start + 8000)

    eba_text = full_text[start:end].strip()
    return eba_text


def main():
    """
    Extract plain text from all downloaded Article IV PDFs AND
    extract the 'Executive Board Assessment' section from that text.

    High-level behavior
    -------------------
    - Ensures 'data/text/' exists.
    - Reads the manifest CSV.
    - Ensures the following columns exist:
        * text_path, first_snippet
        * eba_text_path, eba_snippet, eba_status
        * notes, status
    - For each row with status 'pdf_downloaded' or 'text_error':
        * Verify that 'pdf_path' exists on disk.
        * Use pdfminer.six's `extract_text` to convert the PDF to full
          plain text.
        * Save the full text as:
              data/text/<ISO2>/<year>_article_iv.txt
        * Extract the 'Executive Board Assessment' section from that
          full text via `extract_eba_section`.
        * If an EBA section is found:
              - Save it as data/text/<ISO2>/<year>_eba.txt
              - Update eba_text_path, eba_snippet, eba_status='eba_extracted'
          Otherwise:
              - Set eba_status='eba_missing'
        * Update:
              status = 'text_extracted'
              text_path = path to the full .txt file
              first_snippet = first ~300 characters of the full text
              notes = "" on success
      If something fails (missing file, parse error, etc.), the row is marked
      as 'text_error' and given an explanatory message in 'notes'.

    After this step, all successfully processed rows have:
    - The full Article IV text
    - A dedicated Executive Board Assessment text file
    - Preview snippets for both, stored in the manifest.
    """
    ensure_dirs()
    rows, fieldnames = read_manifest()

    # Make sure columns exist
    for col in ["text_path", "first_snippet", "notes",
                "eba_text_path", "eba_snippet", "eba_status"]:
        if col not in fieldnames:
            fieldnames.append(col)
    if "status" not in fieldnames:
        fieldnames.append("status")

    for row in rows:
        status = row.get("status", "")
        if status not in ("pdf_downloaded", "text_error"):
            continue

        pdf_path = row.get("pdf_path") or ""
        if not pdf_path or not os.path.exists(pdf_path):
            row["status"] = "text_error"
            row["notes"] = f"Missing PDF: {pdf_path}"
            row["first_snippet"] = ""
            row["eba_status"] = "eba_error"
            row["eba_snippet"] = ""
            row["eba_text_path"] = ""
            continue

        iso2 = row["iso2"]
        year = row["year"]
        print(f"Extracting text for {iso2} {year} from {pdf_path}")

        try:
            # Extract full text from the PDF using pdfminer.six
            full_text = extract_text(pdf_path)

            # Save full text to per-country file
            out_dir = os.path.join(TEXT_DIR, iso2)
            os.makedirs(out_dir, exist_ok=True)
            full_out_path = os.path.join(out_dir, f"{year}_article_iv.txt")
            with open(full_out_path, "w", encoding="utf-8") as f:
                f.write(full_text)

            row["text_path"] = full_out_path
            row["status"] = "text_extracted"

            # Store a short snippet of the full report in the manifest
            full_snippet = (full_text[:300] or "").replace("\n", " ").replace("\r", " ")
            row["first_snippet"] = full_snippet

            # Extract Executive Board Assessment from the full text
            eba_text = extract_eba_section(full_text)
            if eba_text:
                eba_out_path = os.path.join(out_dir, f"{year}_eba.txt")
                with open(eba_out_path, "w", encoding="utf-8") as f:
                    f.write(eba_text)

                row["eba_text_path"] = eba_out_path
                row["eba_status"] = "eba_extracted"

                # Make EBA snippet a bit longer (e.g. 800 chars) for better context
                eba_snippet = (eba_text[:800] or "").replace("\n", " ").replace("\r", " ")
                row["eba_snippet"] = eba_snippet
            else:
                row["eba_text_path"] = ""
                row["eba_status"] = "eba_missing"
                row["eba_snippet"] = ""

            row["notes"] = ""
        except Exception as e:
            print("  ERROR extracting:", e)
            row["status"] = "text_error"
            row["notes"] = str(e)
            row["first_snippet"] = ""
            row["eba_text_path"] = ""
            row["eba_status"] = "eba_error"
            row["eba_snippet"] = ""

    write_manifest(rows, fieldnames)
    print("Updated manifest with text and EBA info:", MANIFEST_PATH)


if __name__ == "__main__":
    main()

Updated manifest with text and EBA info: data\manifest.csv
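
After a run, one quick sanity check is to count how many rows landed in each state. The following is a minimal sketch, assuming the manifest carries the `status` and `eba_status` columns described above. The helper name `count_values` is hypothetical, and the demo writes a tiny throwaway CSV with invented rows rather than touching `data/manifest.csv`.

```python
import csv
import os
import tempfile
from collections import Counter


def count_values(manifest_path, column):
    """Tally the values of one manifest column (hypothetical helper)."""
    with open(manifest_path, newline="", encoding="utf-8") as f:
        return Counter((row.get(column) or "") for row in csv.DictReader(f))


# Build a small stand-in manifest with invented rows
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "manifest.csv")
with open(demo_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["iso2", "year", "status", "eba_status"])
    writer.writeheader()
    writer.writerow({"iso2": "AA", "year": "2024", "status": "text_extracted", "eba_status": "eba_extracted"})
    writer.writerow({"iso2": "BB", "year": "2024", "status": "text_extracted", "eba_status": "eba_missing"})
    writer.writerow({"iso2": "CC", "year": "2023", "status": "text_error", "eba_status": "eba_error"})

status_counts = count_values(demo_path, "status")
eba_counts = count_values(demo_path, "eba_status")
print(dict(status_counts))
print(dict(eba_counts))
```

Pointing `count_values` at the real manifest path should give a similar breakdown, which makes it easier to see how many rows need a retry.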


## 6. Extract HTML "Summary" sections from IMF country report pages

### What this does at a high level
For each Article IV report in the manifest, fetches the IMF HTML page and:
- extracts the on-page "Summary" section,
- saves it to `data/text/<ISO2>/<year>_summary.txt`,
- stores a short snippet and status flags in `data/manifest.csv`.

### How it does it
- Reuses:
  - `DATA_DIR = "data"`,
  - `TEXT_DIR = os.path.join(DATA_DIR, "text")`,
  - `MANIFEST_PATH = os.path.join(DATA_DIR, "manifest.csv")`.
- Creates a shared `requests.Session()` with a custom `User-Agent`:
  - `SESSION = requests.Session()`,
  - `SESSION.headers.update({"User-Agent": "urap-sovereign-debt/1.0 (summary-scraper)"})`
  so that requests identify your project politely.
- `ensure_dirs()`:
  - ensures `data/text/` exists.
- `read_manifest()` / `write_manifest()`:
  - same pattern as previous sections; this stage also updates the same manifest file.
- `fetch_imf_html(url)`:
  - uses `SESSION.get(url, timeout=30)` to download the HTML,
  - checks for HTTP errors via `resp.raise_for_status()`,
  - returns `resp.text`.
- `extract_summary_from_html(html)`:
  - parses the HTML with `BeautifulSoup(html, "html.parser")`,
  - finds the "Summary" heading and then gathers the text in the paragraphs following it,
  - stops when it encounters metadata headings like `"Subject:"`, `"Keywords:"`, or `"Publication Details"`,
  - joins collected parts into one `summary_text` string and returns it.
- `main()`:
  - calls `ensure_dirs()` and `read_manifest()`,
  - ensures these columns exist in the manifest header (appending them to `fieldnames` if missing):
    - `summary_path`, `summary_snippet`, `summary_status`,
    - `notes` (preserving any earlier notes).
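  - loops over every row; rows with an empty `imf_url` are marked `summary_status = "summary_error"` with a `Missing imf_url` note and skipped. For the rest: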
    1. fetches the HTML page via `fetch_imf_html(imf_url)`,
    2. runs `extract_summary_from_html(html)`:
       - if non-empty:
         - saves it as `data/text/<ISO2>/<year>_summary.txt`,
         - sets `summary_path` and `summary_status = "summary_extracted"`,
         - builds a short `summary_snippet` (e.g., first ~400–500 characters, cleaned of newlines),
       - if empty:
         - sets `summary_status = "summary_missing"`,
         - keeps `summary_path` and `summary_snippet` empty,
         - appends `Summary heading not found` to `notes`.
    3. on success, leaves any existing `notes` untouched so that earlier PDF or text errors stay visible.
  - on exception:
    - prints an error message,
    - sets:
      - `summary_status = "summary_error"`,
      - `summary_path = ""`, `summary_snippet = ""`,
      - appends a `"summary_error: <exception>"` tag into `notes` without discarding existing notes.
  - writes everything back via `write_manifest(rows, fieldnames)` and prints a confirmation.
  - the `if __name__ == "__main__": main()` block allows running this summary extraction step on its own.
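
The sibling walk described above can be checked against a small hand-written page. The sketch below is a self-contained stand-in, not the notebook's function: the name `summary_after_heading` and the HTML fragment are invented for illustration, and real IMF pages may be structured differently.

```python
from bs4 import BeautifulSoup

# Invented HTML fragment that imitates a heading followed by paragraphs and metadata
SAMPLE_HTML = """
<div>
  <h3>Summary</h3>
  <p>Growth remained resilient.</p>
  <p>Inflation is easing.</p>
  <p>Subject: Fiscal policy</p>
  <p>Keywords: growth</p>
</div>
"""

STOP_PREFIXES = ("Subject:", "Keywords:", "Publication Details")


def summary_after_heading(html):
    """Collect sibling text after a 'Summary' heading (hypothetical stand-in)."""
    soup = BeautifulSoup(html, "html.parser")
    heading = next(
        (t for t in soup.find_all(["h2", "h3", "h4"]) if t.get_text(strip=True) == "Summary"),
        None,
    )
    if heading is None:
        return ""

    parts = []
    for sib in heading.find_next_siblings():
        if sib.name in ("h2", "h3", "h4"):
            break  # next section begins
        text = sib.get_text(" ", strip=True)
        if text.startswith(STOP_PREFIXES):
            break  # metadata block begins
        if text:
            parts.append(text)
    return " ".join(parts)


summary = summary_after_heading(SAMPLE_HTML)
print(summary)
```

Note that `find_next_siblings()` only sees elements that share the heading's parent. If a page places the summary paragraphs inside a separate container, this approach returns an empty string and the row ends up as `summary_missing`.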


In [10]:
DATA_DIR = "data"
TEXT_DIR = os.path.join(DATA_DIR, "text")
MANIFEST_PATH = os.path.join(DATA_DIR, "manifest.csv")

SESSION = requests.Session()
SESSION.headers.update({
    "User-Agent": "urap-sovereign-debt/1.0 (summary-scraper)"
})


def ensure_dirs():
    """
    Ensure that the text output directory exists.

    Creates 'data/text/' if it doesn't already exist. This is where
    the plain-text IMF "Summary" snippets will be stored, organized by
    ISO2 country code.
    """
    os.makedirs(TEXT_DIR, exist_ok=True)


def read_manifest():
    """
    Read the manifest CSV and return its rows and fieldnames.

    Returns
    -------
    rows : list[dict]
        Each dict represents a manifest row.
    fieldnames : list[str]
        Ordered column names in the manifest header.
    """
    with open(MANIFEST_PATH, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fieldnames = reader.fieldnames
    return rows, fieldnames


def write_manifest(rows, fieldnames):
    """
    Persist updated manifest rows back to the CSV file.

    Parameters
    ----------
    rows : list[dict]
        New manifest content.
    fieldnames : list[str]
        Column names in the desired order.

    This step updates, among others:
    - summary_status   (e.g., 'summary_extracted', 'summary_error', 'summary_missing')
    - summary_path     (path to the summary .txt file)
    - summary_snippet  (first ~400 chars of the summary)
    - notes            (error messages if something went wrong)
    """
    with open(MANIFEST_PATH, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for r in rows:
            writer.writerow(r)


def fetch_imf_html(url: str) -> str:
    """
    Fetch the HTML page for a given IMF country report URL.

    Parameters
    ----------
    url : str
        The 'imf_url' from the manifest row.

    Returns
    -------
    html : str
        Raw HTML as a string.

    Raises
    ------
    requests.RequestException
        If the request fails, times out, or returns an HTTP error status.
    """
    resp = SESSION.get(url, timeout=30)
    resp.raise_for_status()
    return resp.text


def extract_summary_from_html(html: str) -> str:
    """
    Extract the 'Summary' section from an IMF country report HTML page.

    How it works
    ------------
    1) Parse the HTML with BeautifulSoup.
    2) Find a heading tag (h2/h3/h4) whose text is exactly 'Summary'.
    3) Starting from that heading, walk through its next sibling elements.
       Collect text from each sibling until we hit a line that looks like
       the start of another section (e.g., 'Subject:', 'Keywords:', or a
       new heading).
    4) Join the collected paragraphs into a single string.

    Returns
    -------
    summary_text : str
        The extracted Summary text, or an empty string if we cannot
        find the "Summary" heading.
    """
    soup = BeautifulSoup(html, "html.parser")

    # Find a heading with text 'Summary'
    summary_heading = None
    for tag in soup.find_all(["h2", "h3", "h4"]):
        if tag.get_text(strip=True) == "Summary":
            summary_heading = tag
            break

    if summary_heading is None:
        return ""

    summary_parts = []

    # Look at the siblings after the heading
    for sib in summary_heading.find_next_siblings():
        # Stop if we hit another major heading
        if sib.name in ["h2", "h3", "h4"]:
            break

        text = sib.get_text(" ", strip=True)
        if not text:
            continue

        # Stop when we reach metadata sections
        if text.startswith(("Subject:", "Keywords:", "Publication Details")):
            break

        summary_parts.append(text)

    summary_text = " ".join(summary_parts).strip()
    return summary_text


def main():
    """
    Extract the HTML 'Summary' for all IMF Article IV reports in manifest.csv.

    High-level behavior
    -------------------
    - Ensures 'data/text/' exists.
    - Reads the manifest CSV.
    - Ensures the following columns exist:
        * summary_path, summary_snippet, summary_status
        * notes (if not already there)
    - For each row with a non-empty 'imf_url':
        * Fetch the HTML page.
        * Extract the 'Summary' section via `extract_summary_from_html`.
        * If summary is found:
              - Save it as data/text/<ISO2>/<year>_summary.txt
              - Update summary_path, summary_snippet, summary_status='summary_extracted'
          Otherwise:
              - Set summary_status='summary_missing'
              - summary_path='', summary_snippet=''
        * On any error (network/parsing), mark summary_status='summary_error'
          and store the exception message in 'notes'.

    After this step, all successfully processed rows will have:
    - A per-country summary text file, ready for sentiment or other NLP.
    - A short snippet in the manifest so you can eyeball what was scraped.
    """
    ensure_dirs()
    rows, fieldnames = read_manifest()

    # Make sure columns exist
    for col in ["summary_path", "summary_snippet", "summary_status", "notes"]:
        if col not in fieldnames:
            fieldnames.append(col)

    for row in rows:
        imf_url = (row.get("imf_url") or "").strip()
        if not imf_url:
            # No URL: flag the row as an error and move on
            row["summary_status"] = "summary_error"
            row["summary_path"] = ""
            row["summary_snippet"] = ""
            old_notes = row.get("notes") or ""
            row["notes"] = (old_notes + " | Missing imf_url").strip(" |")
            continue

        # Fall back to placeholders when the manifest value is missing or blank
        iso2 = (row.get("iso2") or "XX").strip()
        year = str(row.get("year") or "").strip()
        print(f"Extracting SUMMARY for {iso2} {year} from {imf_url}")

        try:
            html = fetch_imf_html(imf_url)
            summary_text = extract_summary_from_html(html)

            if not summary_text:
                row["summary_status"] = "summary_missing"
                row["summary_path"] = ""
                row["summary_snippet"] = ""
                # Keep any existing notes, but add a hint
                old_notes = row.get("notes") or ""
                row["notes"] = (old_notes + " | Summary heading not found").strip(" |")
                continue

            # Save summary to per-country file
            out_dir = os.path.join(TEXT_DIR, iso2)
            os.makedirs(out_dir, exist_ok=True)
            # Fallback if year is blank
            file_year = year if year else "unknown"
            summary_path = os.path.join(out_dir, f"{file_year}_summary.txt")

            with open(summary_path, "w", encoding="utf-8") as f:
                f.write(summary_text)

            row["summary_path"] = summary_path
            row["summary_status"] = "summary_extracted"

            # Keep only the first ~400 characters in the manifest as a preview
            snippet = summary_text[:400].replace("\n", " ").replace("\r", " ")
            row["summary_snippet"] = snippet
            # Leave existing notes alone so PDF/text issues from earlier stages
            # remain visible; only normalise a missing value to an empty string.
            if not row.get("notes"):
                row["notes"] = ""
        except Exception as e:
            print("  ERROR extracting SUMMARY:", e)
            row["summary_status"] = "summary_error"
            row["summary_path"] = ""
            row["summary_snippet"] = ""
            old_notes = row.get("notes") or ""
            row["notes"] = (old_notes + f" | summary_error: {e}").strip(" |")

    write_manifest(rows, fieldnames)
    print("Updated manifest with summary info:", MANIFEST_PATH)


if __name__ == "__main__":
    main()

Extracting SUMMARY for DJ 2025 from https://www.imf.org/en/publications/cr/issues/2025/09/22/djibouti-2025-article-iv-consultation-press-release-and-staff-report-570614
Extracting SUMMARY for MX 2025 from https://www.imf.org/en/publications/cr/issues/2025/10/27/mexico-2025-article-iv-consultation-press-release-staff-report-and-statement-by-the-571378
Extracting SUMMARY for US 2024 from https://www.imf.org/en/publications/cr/issues/2024/07/18/united-states-2024-article-iv-consultation-press-release-staff-report-and-statement-by-the-552100
Extracting SUMMARY for DO 2024 from https://www.imf.org/en/publications/cr/issues/2024/09/12/dominican-republic-2024-article-iv-consultation-press-release-and-staff-report-554787
Extracting SUMMARY for NI 2023 from https://www.imf.org/en/publications/cr/issues/2024/01/19/nicaragua-2023-article-iv-consultation-press-release-and-staff-report-543914
Extracting SUMMARY for CL 2024 from https://www.imf.org/en/publications/cr/issues/2025/02/04/chile-2024-art