# Batch Ministry Workflow

This notebook automates the collection of organization names and cybersecurity responsibility assessments across multiple ministry types.

## Setup

Follow [these instructions](https://docs.astral.sh/uv/getting-started/installation/) to install `uv`. Then run the following command to set up your environment:

```bash
uv sync --quiet
```

In [1]:
import sys
from pathlib import Path
sys.path.append(str(Path("../").resolve()))

from scripts.batch_ministry_workflow import run_batch_workflow, MinistryWorkflow
from data import DOMAINS, COUNTRIES
import pandas as pd

## Configuration

Choose which domains to process:

In [2]:
# Option 1: Process specific domains
domains_to_process = [
    "Science & Technology",
    "Foreign Affairs",
    "Defense",
    "Education",
]

# Option 2: Process all available domains (uncomment to use)
# domains_to_process = DOMAINS

# Configuration
output_dir = Path("../outputs")
workers = 4  # Number of parallel workers

print(f"Will process {len(domains_to_process)} domains:")
for i, domain in enumerate(domains_to_process, 1):
    print(f"  {i}. {domain}")
print(f"\nOutput directory: {output_dir}")
print(f"Workers: {workers}")
print(f"\nAvailable domains: {', '.join(DOMAINS)}")

Will process 4 domains:
  1. Science & Technology
  2. Foreign Affairs
  3. Defense
  4. Education

Output directory: ../outputs
Workers: 4

Available domains: Foreign Affairs, Education, Economy, Finance, Commerce, Transportation, Defense, Energy, Science and Technology (and/or Innovation), Communication, Interior/Domestic Affairs, Justice, Trade, Housing, Health, Labor, Culture, Agriculture, Industry
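
Note that `"Science & Technology"` in the selection does not match any entry in the printed list of available domains, where the closest name is `Science and Technology (and/or Innovation)`. Whether `run_batch_workflow` requires exact matches is not shown in this notebook, so the following is only a sketch of a pre-flight check. `AVAILABLE` is an abbreviated stand-in for `DOMAINS` (assumed to be a list of plain strings), and `unknown_domains` is a hypothetical helper.

```python
# Abbreviated stand-in for data.DOMAINS (assumption: a list of plain strings).
AVAILABLE = [
    "Foreign Affairs",
    "Education",
    "Defense",
    "Science and Technology (and/or Innovation)",
]


def unknown_domains(selected, available):
    """Return the selected names that have no exact match in `available`."""
    known = set(available)
    return [name for name in selected if name not in known]


selected = ["Science & Technology", "Foreign Affairs", "Defense", "Education"]
missing = unknown_domains(selected, AVAILABLE)
if missing:
    print(f"Not in the available list: {missing}")
```

Running a check like this before the batch step surfaces naming mismatches early, before any API requests are made.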


## Run Batch Workflow

Run the workflow for every domain in `domains_to_process`:

In [4]:
results = await run_batch_workflow(domains_to_process, output_dir, workers=workers)


############################################################
# BATCH MINISTRY WORKFLOW
# Processing 4 domains
# Output directory: ../outputs
############################################################

[1/4] Processing domain: Science & Technology

🚀 Starting complete workflow for domain: Science & Technology

STEP 1: Collecting organizations for Science & Technology
Found cached response
Using cached response
QueryResponse(full_response={'id': '6ec7de19-542e-40ba-955d-60040d161d98', 'model': 'sonar', 'created': 1763175218, 'usage': {'prompt_tokens': 261, 'completion_tokens': 18, 'total_tokens': 279, 'search_context_size': 'low', 'cost': {'input_tokens_cost': 0.0, 'output_tokens_cost': 0.0, 'request_cost': 0.005, 'total_cost': 0.005}}, 'citations': ['https://edu.gov.by/en-uk/nauka-i-innovatsii/upravlenie-nauki-i-innovatsionnoy-deyatelnosti/', 'https://president.gov.by/en/statebodies/state-committee-for-science-and-technology', 'https://president.gov.by/en/belarus/science', 'http:/
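
The cell above relies on the top-level `await` that Jupyter/IPython provides. Outside a notebook, a coroutine has to be driven by an event loop, for example with `asyncio.run`. The sketch below shows that pattern with a hypothetical stand-in coroutine, `fake_batch_workflow`, because the real `run_batch_workflow` needs the project's scripts and API access. Its return value here (a dict keyed by domain) is an assumption, not the real function's contract.

```python
import asyncio
from pathlib import Path


async def fake_batch_workflow(domains, output_dir, workers=4):
    """Hypothetical stand-in for run_batch_workflow; does no real work."""
    results = {}
    for domain in domains:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        results[domain] = str(Path(output_dir) / domain)
    return results


def main():
    # asyncio.run creates an event loop, runs the coroutine, and closes the loop.
    return asyncio.run(fake_batch_workflow(["Defense", "Education"], "../outputs"))


results = main()
print(results)
```

Inside a notebook, keep using `await` directly, since `asyncio.run` raises an error when an event loop is already running.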

## Alternative: Run Single Domain

To run a single domain (useful for testing or for re-running a failed domain), set `single_domain` in the cell below and change `if False` to `if True`:

In [5]:
# Run workflow for a single domain (disabled by default)
if False:
    single_domain = ""  # Set this to the name of the domain to run

    workflow = MinistryWorkflow(single_domain, output_dir, workers=workers)
    org_df, cyber_df = await workflow.run_complete_workflow()

    print(f"\nResults for {single_domain}:")
    print(f"  Organizations: {len(org_df)}")
    print(f"  Assessments: {len(cyber_df)}")

## View Results

Load and inspect results for a specific domain:

In [6]:
# Choose a domain to view
domain_to_view = "Science & Technology"

# Build the slug once, using the same rule as the summary cell
domain_slug = domain_to_view.lower().replace(" ", "_").replace("/", "_")
domain_dir = output_dir / domain_slug

# Load organizations
org_file = domain_dir / f"organization_names_{domain_slug}.csv"
if org_file.exists():
    org_df = pd.read_csv(org_file)
    print(f"Organizations for {domain_to_view}:")
    display(org_df)
else:
    print(f"No data found for {domain_to_view}")

# Load cybersecurity assessments
cyber_file = domain_dir / f"organization_cyber_{domain_slug}.xlsx"
if cyber_file.exists():
    cyber_df = pd.read_excel(cyber_file)
    print(f"\nCybersecurity assessments for {domain_to_view}:")
    display(cyber_df)
else:
    print(f"No cyber assessment data found for {domain_to_view}")

No data found for Science & Technology
No cyber assessment data found for Science & Technology
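
Both "not found" messages may simply reflect a path mismatch: the expressions above turn `Science & Technology` into `science_&_technology`, and this notebook does not show whether the workflow writes to a directory of that name. A small helper keeps the slug rule in one place so the viewing and summary cells cannot drift apart. The sketch mirrors the rule used in the summary cell (lowercase, with spaces and slashes replaced by underscores). `domain_slug` and `output_paths` are hypothetical names, and the authoritative rule lives in `scripts.batch_ministry_workflow`.

```python
from pathlib import Path


def domain_slug(domain: str) -> str:
    """Lowercase the name and replace spaces and slashes with underscores."""
    return domain.lower().replace(" ", "_").replace("/", "_")


def output_paths(output_dir: Path, domain: str) -> tuple[Path, Path]:
    """Return the (organizations CSV, cyber assessment XLSX) paths for a domain."""
    slug = domain_slug(domain)
    folder = output_dir / slug
    return (
        folder / f"organization_names_{slug}.csv",
        folder / f"organization_cyber_{slug}.xlsx",
    )


org_file, cyber_file = output_paths(Path("../outputs"), "Interior/Domestic Affairs")
print(org_file.name)    # organization_names_interior_domestic_affairs.csv
print(cyber_file.name)  # organization_cyber_interior_domestic_affairs.xlsx
```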


## Summary Statistics

Get a summary of all processed domains:

In [None]:
summary = []
for domain in domains_to_process:
    domain_slug = domain.lower().replace(" ", "_").replace("/", "_")
    org_file = output_dir / domain_slug / f"organization_names_{domain_slug}.csv"
    cyber_file = output_dir / domain_slug / f"organization_cyber_{domain_slug}.xlsx"
    
    if org_file.exists() and cyber_file.exists():
        org_df = pd.read_csv(org_file)
        cyber_df = pd.read_excel(cyber_file)
        summary.append({
            "Domain": domain,
            "Organizations": len(org_df),
            "Assessments": len(cyber_df),
            "Status": "✓ Complete"
        })
    else:
        summary.append({
            "Domain": domain,
            "Organizations": 0,
            "Assessments": 0,
            "Status": "✗ Incomplete"
        })

summary_df = pd.DataFrame(summary)
display(summary_df)
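
The same file checks can drive a re-run of only the domains that did not finish. The sketch below assumes the directory layout used in the summary cell. `incomplete_domains` is a hypothetical helper, and the example exercises it against a temporary directory rather than real outputs.

```python
import tempfile
from pathlib import Path


def incomplete_domains(domains, output_dir):
    """Return the domains that are missing either expected output file."""
    pending = []
    for domain in domains:
        slug = domain.lower().replace(" ", "_").replace("/", "_")
        folder = Path(output_dir) / slug
        org_file = folder / f"organization_names_{slug}.csv"
        cyber_file = folder / f"organization_cyber_{slug}.xlsx"
        if not (org_file.exists() and cyber_file.exists()):
            pending.append(domain)
    return pending


with tempfile.TemporaryDirectory() as tmp:
    # Simulate one finished domain by creating both of its output files.
    done = Path(tmp) / "defense"
    done.mkdir()
    (done / "organization_names_defense.csv").touch()
    (done / "organization_cyber_defense.xlsx").touch()

    pending = incomplete_domains(["Defense", "Education"], tmp)

print(pending)  # ['Education']
```

The resulting list could then be passed to `run_batch_workflow` in place of `domains_to_process`.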