# Batch Ministry Workflow

This notebook automates the collection of organization names and cybersecurity responsibility assessments across multiple ministry types.

## Features

- **Sequential Processing:** Domains are processed one at a time to avoid rate limits
- **Automatic Retry:** Exponential backoff (2s, 4s, 8s, 16s) handles transient failures
- **Progress Tracking:** Each step prints its status as it completes
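
The retry schedule listed above (2s, 4s, 8s, 16s) doubles the wait after each failed attempt. The following is a minimal sketch of that idea, not the workflow's actual retry code: `backoff_delays` and `retry_with_backoff` are hypothetical names, and catching every `Exception` is a simplification (real code would probably retry only on rate-limit or network errors).

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int = 4, base: float = 2.0) -> list[float]:
    """Wait times before each retry: base, 2*base, 4*base, ..."""
    return [base * 2**attempt for attempt in range(retries)]


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    retries: int = 4,
    base: float = 2.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await call(); on failure, wait and try again, up to `retries` retries."""
    delays = backoff_delays(retries, base)
    for attempt in range(retries + 1):
        try:
            return await call()
        except Exception:  # assumption: every error is treated as transient
            if attempt == retries:
                raise  # out of retries: surface the last error
            await sleep(delays[attempt])
    raise AssertionError("unreachable")
```

With the defaults, a call that keeps failing is attempted five times in total, with waits of 2, 4, 8, and 16 seconds between attempts. The `sleep` parameter exists only so the waits can be replaced in tests.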

## Setup

Follow [these instructions](https://docs.astral.sh/uv/getting-started/installation/) to install `uv`. Then run the following command to set up your environment:

```bash
uv sync --quiet
```

In [None]:
import sys
from pathlib import Path
# Make the parent directory importable so that `scripts` and `data` resolve
sys.path.append(str(Path("../").resolve()))

from scripts.batch_ministry_workflow import run_batch_workflow, MinistryWorkflow
from data import DOMAINS, COUNTRIES
import pandas as pd

## Configuration

Choose which domains to process:

In [None]:
# Option 1: Process specific domains
domains_to_process = [
    "Science & Technology",
    "Foreign Affairs",
    "Defense",
    "Education",
]

# Option 2: Process all available domains (uncomment to use)
# domains_to_process = DOMAINS

# Configuration
output_dir = Path("../outputs")
workers = 2  # Parallel workers within each domain (domains themselves run one at a time)

print(f"Will process {len(domains_to_process)} domains:")
for i, domain in enumerate(domains_to_process, 1):
    print(f"  {i}. {domain}")
print(f"\nOutput directory: {output_dir}")
print(f"Workers: {workers}")
print(f"\nAvailable domains: {', '.join(DOMAINS)}")

Will process 4 domains:
  1. Science & Technology
  2. Foreign Affairs
  3. Defense
  4. Education

Output directory: ../outputs
Workers: 2

Available domains: Foreign Affairs, Education, Economy, Finance, Commerce, Transportation, Defense, Energy, Science and Technology (and/or Innovation), Communication, Interior/Domestic Affairs, Justice, Trade, Housing, Health, Labor, Culture, Agriculture, Industry
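
The selected names do not all appear verbatim in the available list: "Science & Technology" is listed as "Science and Technology (and/or Innovation)". The notebook does not say whether the workflow requires an exact match (the run below accepts "Science & Technology"), so the following is only an optional sanity check. `unknown_domains` is a hypothetical helper, not part of the project.

```python
def unknown_domains(selected: list[str], available: list[str]) -> list[str]:
    """Return the selected names that have no case-insensitive match in `available`."""
    known = {name.casefold() for name in available}
    return [name for name in selected if name.casefold() not in known]


# Sample values copied from the configuration cell and its printed output
selected = ["Science & Technology", "Foreign Affairs", "Defense", "Education"]
available = [
    "Foreign Affairs",
    "Education",
    "Defense",
    "Science and Technology (and/or Innovation)",
]

for name in unknown_domains(selected, available):
    print(f"Not in the available list as written: {name}")
```

In the notebook itself the call would be `unknown_domains(domains_to_process, DOMAINS)`.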


## Run Batch Workflow

Run the cell below to process each selected domain in turn:

In [None]:
results = await run_batch_workflow(domains_to_process, output_dir, workers=workers)


############################################################
# BATCH MINISTRY WORKFLOW
# Processing 4 domains SEQUENTIALLY
# Workers per domain: 2
# Output directory: ../outputs
############################################################

Note: Domains processed one at a time to avoid rate limits.

[1/4] Processing domain: Science & Technology

🚀 Starting complete workflow for domain: Science & Technology

STEP 1: Collecting organizations for Science & Technology
✓ Saved 579 organizations to ../outputs/science_&_technology/organization_names_science_&_technology.csv

STEP 2: Assessing cybersecurity responsibility for Science & Technology
Found cached response
Using cached response
QueryResponse(full_response={'id': '524dad75-96ac-4b96-a747-b53831f7c6c5', 'model': 'sonar', 'created': 1763177723, 'usage': {'prompt_tokens': 561, 'completion_tokens': 260, 'total_tokens': 821, 'search_context_size': 'low', 'cost': {'input_tokens_cost': 0.001, 'output_tokens_cost': 0.0, 'request_cost': 0.0
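
The log header above reports two settings at once: domains run sequentially, and each domain uses 2 workers. One common way to get that behaviour with `asyncio` is a plain loop over domains combined with a semaphore inside each domain. The sketch below illustrates the pattern only; it is not the implementation of `run_batch_workflow`, and `process_domain`, `process_all`, and `handle` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable


async def process_domain(
    domain: str,
    items: list[str],
    workers: int,
    handle: Callable[[str, str], Awaitable[str]],
) -> list[str]:
    """Handle every item of one domain with at most `workers` calls in flight."""
    semaphore = asyncio.Semaphore(workers)

    async def bounded(item: str) -> str:
        async with semaphore:  # blocks while `workers` calls are already running
            return await handle(domain, item)

    # gather() returns results in the same order as `items`
    return await asyncio.gather(*(bounded(item) for item in items))


async def process_all(
    domains: list[str],
    items: list[str],
    workers: int,
    handle: Callable[[str, str], Awaitable[str]],
) -> dict[str, list[str]]:
    """Process domains one after another; only the work inside a domain overlaps."""
    results: dict[str, list[str]] = {}
    for domain in domains:  # sequential: the next domain starts after this one ends
        results[domain] = await process_domain(domain, items, workers, handle)
    return results
```

Here `items` stands in for the per-domain units of work (for example, countries), and `handle` for whatever request is made per item.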

## Alternative: Run Single Domain

If you want to run just one domain (useful for testing or re-running failed domains):

In [None]:
# Run workflow for a single domain (change False to True to enable)
if False:
    single_domain = ""  # set this to the domain name to run

    workflow = MinistryWorkflow(single_domain, output_dir, workers=workers)
    org_df, cyber_df = await workflow.run_complete_workflow()

    print(f"\nResults for {single_domain}:")
    print(f"  Organizations: {len(org_df)}")
    print(f"  Assessments: {len(cyber_df)}")

## View Results

Load and inspect results for a specific domain:

In [None]:
# Choose a domain to view
domain_to_view = "Science & Technology"

# Build the folder/file slug once, using the same rule as the summary cell below
domain_slug = domain_to_view.lower().replace(" ", "_").replace("/", "_")
domain_dir = output_dir / domain_slug

# Load organizations
org_file = domain_dir / f"organization_names_{domain_slug}.csv"
if org_file.exists():
    org_df = pd.read_csv(org_file)
    print(f"Organizations for {domain_to_view}:")
    display(org_df)
else:
    print(f"No data found for {domain_to_view}")

# Load cybersecurity assessments
cyber_file = domain_dir / f"organization_cyber_{domain_slug}.xlsx"
if cyber_file.exists():
    cyber_df = pd.read_excel(cyber_file)
    print(f"\nCybersecurity assessments for {domain_to_view}:")
    display(cyber_df)
else:
    print(f"No cyber assessment data found for {domain_to_view}")

Organizations for Science & Technology:


| | question | error | domain | country | organization_name | confidence |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | What is the top-level state Organ (i.e., minis... | | SCIENCE & TECHNOLOGY | BELARUS | State Committee on Science and Technology of t... | ConfidenceLevel.HIGH |
| 1 | What is the top-level state Organ (i.e., minis... | | SCIENCE & TECHNOLOGY | DENMARK | Ministry of Higher Education and Science | ConfidenceLevel.HIGH |
| 2 | What is the top-level state Organ (i.e., minis... | | SCIENCE & TECHNOLOGY | MEXICO | Secretariat of Science, Humanities, Technology... | ConfidenceLevel.HIGH |
| 3 | What is the top-level state Organ (i.e., minis... | | SCIENCE & TECHNOLOGY | INDONESIA | Ministry of Higher Education, Science, and Tec... | ConfidenceLevel.HIGH |
| 4 | What is the top-level state Organ (i.e., minis... | | SCIENCE & TECHNOLOGY | BELGIUM | Belgian Federal Science Policy Office (BELSPO) | ConfidenceLevel.HIGH |
| ... | ... | ... | ... | ... | ... | ... |
| 188 | What is the top-level state Organ (i.e., minis... | | SCIENCE & TECHNOLOGY | ROMANIA | Ministry of Research, Innovation and Digitaliz... | ConfidenceLevel.HIGH |
| 189 | What is the top-level state Organ (i.e., minis... | | SCIENCE & TECHNOLOGY | THAILAND | Ministry of Higher Education, Science, Researc... | ConfidenceLevel.HIGH |
| 190 | What is the top-level state Organ (i.e., minis... | | SCIENCE & TECHNOLOGY | DEMOCRATIC REPUBLIC OF THE CONGO | Ministry of Scientific Research and Technologi... | ConfidenceLevel.HIGH |
| 191 | What is the top-level state Organ (i.e., minis... | | SCIENCE & TECHNOLOGY | TIMOR-LESTE | Ministry of Higher Education, Science and Culture | ConfidenceLevel.HIGH |



Cybersecurity assessments for Science & Technology:


| | question | error | organization_country | organization | country | responsibility_level | explanation | confidence | enriched_citations |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Is the Ministry of Education, Science and Yout... | | Ministry of Education, Science and Youth of Ge... | Ministry of Education, Science and Youth of Ge... | Georgia | ResponsibilityLevel.LOW | The Ministry of Education, Science and Youth o... | ConfidenceLevel.HIGH | [{'url': 'https://www.economy.ge/uploads/files... |
| 1 | Is the Ministry of Telecommunications, Science... | | Ministry of Telecommunications, Science and Te... | Ministry of Telecommunications, Science and Te... | Saint Vincent and the Grenadines | ResponsibilityLevel.LOW | The Ministry of Telecommunications, Science an... | ConfidenceLevel.MEDIUM | [{'url': 'https://security.gov.vc/security/ind... |
| 2 | Is the Ministry of Higher Education, Research,... | | Ministry of Higher Education, Research, Scienc... | Ministry of Higher Education, Research, Scienc... | Papua New Guinea | ResponsibilityLevel.LOW | The Ministry of Higher Education, Research, Sc... | ConfidenceLevel.HIGH | [{'url': 'https://pacificislands.ai/png-takes-... |
| 3 | Is the Federal Ministry of Research, Technolog... | | Federal Ministry of Research, Technology and S... | Federal Ministry of Research, Technology and S... | Germany | ResponsibilityLevel.LOW | The Federal Ministry of Research, Technology a... | ConfidenceLevel.HIGH | [{'url': 'http://connections-qj.org/article/cy... |
| 4 | Is the Ministry of Industry and Technology in ... | | Ministry of Industry and Technology in TURKEY | Ministry of Industry and Technology | Turkey | ResponsibilityLevel.HIGH | The Ministry of Industry and Technology in Tur... | ConfidenceLevel.HIGH | [{'url': 'https://www.dailysabah.com/opinion/o... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 188 | Is the Ministry of Scientific Research and Tec... | | Ministry of Scientific Research and Technologi... | Ministry of Scientific Research and Technologi... | Democratic Republic of the Congo | ResponsibilityLevel.LOW | Available sources do not explicitly mention th... | ConfidenceLevel.HIGH | [{'url': 'https://www.scirp.org/journal/paperi... |
| 189 | Is the Ministry for Innovation and Technology ... | | Ministry for Innovation and Technology in HUNGARY | Ministry for Innovation and Technology | Hungary | ResponsibilityLevel.LOW | The available sources do not explicitly mentio... | ConfidenceLevel.HIGH | [{'url': 'https://practiceguides.chambers.com/... |
| 190 | Is the Ministry of Higher Education and Scient... | | Ministry of Higher Education and Scientific Re... | Ministry of Higher Education and Scientific Re... | Yemen | ResponsibilityLevel.LOW | The Ministry of Higher Education and Scientifi... | ConfidenceLevel.HIGH | [{'url': 'https://ncsi.ega.ee/country/ye_2022/... |
| 191 | Is the Ministry of Education, Culture and Scie... | | Ministry of Education, Culture and Science of ... | Ministry of Education, Culture and Science of ... | Mongolia | ResponsibilityLevel.LOW | The Ministry of Education, Culture and Science... | ConfidenceLevel.HIGH | [{'url': 'https://education-profiles.org/easte... |
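
The `responsibility_level` and `confidence` columns above hold strings such as `ResponsibilityLevel.LOW` rather than bare labels. A small sketch for tallying them is shown below; `enum_member` and `tally` are hypothetical helpers, and the code assumes every value is a string (missing values would need to be dropped first).

```python
from collections import Counter
from typing import Iterable


def enum_member(value: str) -> str:
    """Strip the enum class prefix: 'ResponsibilityLevel.LOW' becomes 'LOW'."""
    # rpartition leaves strings without a dot unchanged
    return value.rpartition(".")[2]


def tally(values: Iterable[str]) -> Counter:
    """Count how often each enum member occurs."""
    return Counter(enum_member(value) for value in values)


# Sample values in the format shown in the table above
sample = [
    "ResponsibilityLevel.LOW",
    "ResponsibilityLevel.LOW",
    "ResponsibilityLevel.HIGH",
]
print(tally(sample))
```

With the DataFrame loaded in the previous cell, `tally(cyber_df["responsibility_level"])` would give the distribution for the whole domain.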


## Summary Statistics

Get a summary of all processed domains:

In [None]:
summary = []
for domain in domains_to_process:
    domain_slug = domain.lower().replace(" ", "_").replace("/", "_")
    org_file = output_dir / domain_slug / f"organization_names_{domain_slug}.csv"
    cyber_file = output_dir / domain_slug / f"organization_cyber_{domain_slug}.xlsx"
    
    if org_file.exists() and cyber_file.exists():
        org_df = pd.read_csv(org_file)
        cyber_df = pd.read_excel(cyber_file)
        summary.append({
            "Domain": domain,
            "Organizations": len(org_df),
            "Assessments": len(cyber_df),
            "Status": "✓ Complete"
        })
    else:
        summary.append({
            "Domain": domain,
            "Organizations": 0,
            "Assessments": 0,
            "Status": "✗ Incomplete"
        })

summary_df = pd.DataFrame(summary)
display(summary_df)

| | Domain | Organizations | Assessments | Status |
| --- | --- | --- | --- | --- |
| 0 | Science & Technology | 193 | 193 | ✓ Complete |
| 1 | Foreign Affairs | 0 | 0 | ✗ Incomplete |
| 2 | Defense | 0 | 0 | ✗ Incomplete |
| 3 | Education | 0 | 0 | ✗ Incomplete |
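
The cells in this notebook derive output paths from the domain name by hand. Collecting that logic in one place keeps the viewer and the summary in agreement. The sketch below mirrors the slug rule used in the summary cell (lowercase, spaces and slashes replaced by underscores); `slugify_domain` and `result_paths` are hypothetical names, and whether the workflow itself applies exactly this rule when writing files is an assumption.

```python
from pathlib import Path


def slugify_domain(domain: str) -> str:
    """Lowercase the name and replace spaces and slashes with underscores."""
    return domain.lower().replace(" ", "_").replace("/", "_")


def result_paths(output_dir: Path, domain: str) -> tuple[Path, Path]:
    """Return (organizations CSV, cyber assessments XLSX) for a domain."""
    slug = slugify_domain(domain)
    folder = Path(output_dir) / slug
    return (
        folder / f"organization_names_{slug}.csv",
        folder / f"organization_cyber_{slug}.xlsx",
    )


org_file, cyber_file = result_paths(Path("../outputs"), "Science & Technology")
print(org_file)
print(cyber_file)
```

For "Science & Technology" the first path matches the file name reported in the Step 1 log line above.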
