# Batch Ministry Workflow

This notebook automates the collection of organization names and cybersecurity responsibility assessments across multiple ministry domains, such as Defense, Health, and Finance.

## Setup

Follow [these instructions](https://docs.astral.sh/uv/getting-started/installation/) to install `uv`. Then run the following command to set up your environment:

```bash
uv sync --quiet
```

In [None]:
import sys
from pathlib import Path

import pandas as pd

# Add the parent directory to the import path so `scripts` and `data` resolve.
sys.path.append(str(Path("../").resolve()))

from data import DOMAINS, COUNTRIES
from scripts.batch_ministry_workflow import MinistryWorkflow, run_batch_workflow

## Configuration

Choose which domains to process:

In [None]:
# Option 1: Process specific domains
domains_to_process = [
    "Defense",
    "Health",
    "Finance",
]

# Option 2: Process all available domains (uncomment to use)
# domains_to_process = DOMAINS

# Configuration
output_dir = Path("../outputs")
workers = 4  # Number of parallel workers

print(f"Will process {len(domains_to_process)} domains:")
for i, domain in enumerate(domains_to_process, 1):
    print(f"  {i}. {domain}")
print(f"\nOutput directory: {output_dir}")
print(f"Workers: {workers}")
print(f"\nAvailable domains: {', '.join(DOMAINS)}")
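
A typo in `domains_to_process` is easy to miss before a long run. The sketch below checks a selection against the available list. It assumes domain names must match entries of `DOMAINS` exactly; `find_unknown_domains` and `example_domains` are hypothetical names used only for illustration.

```python
def find_unknown_domains(selected, available):
    """Return the selected domains that do not appear in the available list."""
    available_set = set(available)
    return [d for d in selected if d not in available_set]


# Hypothetical stand-in for the DOMAINS list imported from `data`.
example_domains = ["Defense", "Health", "Finance", "Justice"]

unknown = find_unknown_domains(["Defense", "Helth"], example_domains)
if unknown:
    print(f"Unknown domains: {', '.join(unknown)}")
```

In the notebook itself, the call would presumably be `find_unknown_domains(domains_to_process, DOMAINS)`.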

## Run Batch Workflow

The next cell runs the workflow for every domain in `domains_to_process`, using `output_dir` and the configured number of workers:

In [None]:
results = await run_batch_workflow(domains_to_process, output_dir, workers=workers)
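
The cell above relies on top-level `await`, which Jupyter and IPython support but a plain Python script does not. The sketch below shows how a coroutine of this kind could be driven from a script with `asyncio.run`. The stub `run_batch_stub` is hypothetical and only stands in for `run_batch_workflow`, whose actual return value is not shown in this notebook.

```python
import asyncio


async def run_batch_stub(domains):
    """Hypothetical stand-in for run_batch_workflow; it only echoes its input."""
    await asyncio.sleep(0)  # yield to the event loop once, as real async work would
    return list(domains)


def main():
    # asyncio.run creates an event loop, runs the coroutine to completion,
    # and closes the loop afterwards.
    return asyncio.run(run_batch_stub(["Defense", "Health"]))


if __name__ == "__main__":
    print(main())
```

Calling `asyncio.run` inside a notebook cell fails because an event loop is already running there, so this pattern applies to scripts only.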

## Alternative: Run Single Domain

If you want to run just one domain (useful for testing or re-running failed domains):

In [None]:
# Run workflow for a single domain
single_domain = "Justice"

workflow = MinistryWorkflow(single_domain, output_dir, workers=workers)
org_df, cyber_df = await workflow.run_complete_workflow()

print(f"\nResults for {single_domain}:")
print(f"  Organizations: {len(org_df)}")
print(f"  Assessments: {len(cyber_df)}")
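
To decide which domains need a re-run, one option is to look for missing output files. This is a minimal sketch that assumes the directory layout and file names used in the other cells of this notebook (`organization_names_<slug>.csv` and `organization_cyber_<slug>.xlsx` under `output_dir/<slug>`). The helper names `domain_slug` and `find_incomplete_domains` are hypothetical.

```python
from pathlib import Path


def domain_slug(domain):
    """Lowercase the name and turn spaces and slashes into underscores."""
    return domain.lower().replace(" ", "_").replace("/", "_")


def find_incomplete_domains(domains, output_dir):
    """Return the domains that are missing either expected output file."""
    incomplete = []
    for domain in domains:
        slug = domain_slug(domain)
        domain_dir = Path(output_dir) / slug
        org_file = domain_dir / f"organization_names_{slug}.csv"
        cyber_file = domain_dir / f"organization_cyber_{slug}.xlsx"
        if not (org_file.exists() and cyber_file.exists()):
            incomplete.append(domain)
    return incomplete
```

Each name returned by `find_incomplete_domains(domains_to_process, output_dir)` could then be passed to `MinistryWorkflow` as in the cell above. File existence says nothing about whether the contents are complete.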

## View Results

Load and inspect results for a specific domain:

In [None]:
# Choose a domain to view
domain_to_view = "Justice"

# Build the slug the same way as the summary cell (spaces and slashes become underscores)
domain_slug = domain_to_view.lower().replace(" ", "_").replace("/", "_")
domain_dir = output_dir / domain_slug

# Load organizations
org_file = domain_dir / f"organization_names_{domain_slug}.csv"
if org_file.exists():
    org_df = pd.read_csv(org_file)
    print(f"Organizations for {domain_to_view}:")
    display(org_df)
else:
    print(f"No organization data found for {domain_to_view}")

# Load cybersecurity assessments
cyber_file = domain_dir / f"organization_cyber_{domain_slug}.xlsx"
if cyber_file.exists():
    cyber_df = pd.read_excel(cyber_file)
    print(f"\nCybersecurity assessments for {domain_to_view}:")
    display(cyber_df)
else:
    print(f"No cyber assessment data found for {domain_to_view}")

## Summary Statistics

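Summarize every selected domain. A domain counts as complete only when both its organization CSV and its cybersecurity assessment workbook exist: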

In [None]:
summary = []
for domain in domains_to_process:
    domain_slug = domain.lower().replace(" ", "_").replace("/", "_")
    org_file = output_dir / domain_slug / f"organization_names_{domain_slug}.csv"
    cyber_file = output_dir / domain_slug / f"organization_cyber_{domain_slug}.xlsx"
    
    if org_file.exists() and cyber_file.exists():
        org_df = pd.read_csv(org_file)
        cyber_df = pd.read_excel(cyber_file)
        summary.append({
            "Domain": domain,
            "Organizations": len(org_df),
            "Assessments": len(cyber_df),
            "Status": "✓ Complete"
        })
    else:
        summary.append({
            "Domain": domain,
            "Organizations": 0,
            "Assessments": 0,
            "Status": "✗ Incomplete"
        })

summary_df = pd.DataFrame(summary)
display(summary_df)