Janitor

ssd edited this page Feb 24, 2026 · 1 revision

The Janitor is one of four AI-powered tools in Alfred for managing an Obsidian vault. It periodically scans every file in the vault for structural problems and automatically fixes issues like broken wikilinks, missing frontmatter fields, and orphaned files.

Overview

Janitor maintains vault health by detecting and repairing structural problems across all vault records. It runs periodic sweeps with a multi-stage pipeline that combines deterministic Python fixes with targeted LLM calls for complex repairs.

Key capabilities:

  • Detects broken wikilinks, invalid frontmatter, orphaned files, and stub records
  • Applies deterministic fixes for common structural issues
  • Uses LLM calls for link disambiguation and stub enrichment
  • Supports both light (structural-only) and deep (full agent) sweep modes
  • Respects vault scope rules (can edit, move, and delete records, but not create new ones)

Issue Detection

Janitor scans for the following issue types:

| Issue Code | Description |
|------------|-------------|
| FM001 | Missing required frontmatter fields (type, status, name/title) |
| FM002 | Invalid type (not in KNOWN_TYPES) |
| FM003 | Invalid status for the record type |
| FM004 | Wrong field types (string provided where list expected, or vice versa) |
| LK001 | Broken wikilinks (target file doesn't exist in vault) |
| ORPHAN | Files with no incoming wikilinks from other records |
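As an illustration of what an LK001 check involves, here is a minimal sketch rather than Janitor's actual scanner. The regex, the function name find_broken_links, and the rule that a link may match either a full record path or a bare file name are all assumptions made for this example.

```python
import re

# Matches [[target]], [[target|alias]] and [[target#heading]]; captures only the target.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def find_broken_links(text: str, existing: set[str]) -> list[str]:
    """Return wikilink targets in `text` that match no known record.

    `existing` holds record paths without the .md suffix, e.g. "person/John Smith".
    Assumption: a target resolves if it equals a full path or a bare file name.
    """
    names = {path.rsplit("/", 1)[-1] for path in existing}
    broken = []
    for match in WIKILINK_RE.finditer(text):
        target = match.group(1).strip()
        if target not in existing and target not in names:
            broken.append(target)
    return broken
```

With records person/John Smith and project/Alpha, a note containing [[John]] would be reported, while [[John Smith]] and [[project/Alpha|Alpha]] would resolve.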

3-Stage Fix Pipeline

Janitor uses a three-stage pipeline to repair issues, progressing from fast deterministic fixes to targeted LLM interventions.

Stage 1: Autofix (Pure Python)

Applies deterministic fixes for frontmatter issues (FM001-FM004) without any LLM calls.

What it fixes:

  • Infers record type from directory placement (e.g., file in person/ gets type: person)
  • Infers name/title from filename if missing
  • Fixes field type mismatches (converts strings to lists, lists to strings as needed)
  • Repairs invalid status values to valid ones for the record type
  • Adds missing required fields with sensible defaults

Output: Reports counts of fixed/flagged/skipped issues.
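The deterministic fixes listed above could look roughly like the following sketch. It is not the shipped autofix code: the function name, the LIST_FIELDS tuple, and the choice to write the inferred name into the name field are hypothetical, and status repair is omitted because the valid statuses per type are not documented here.

```python
from pathlib import PurePosixPath

# Hypothetical subset of list-valued fields; the real schema is not shown on this page.
LIST_FIELDS = ("tags", "related")


def autofix_frontmatter(rel_path: str, fm: dict) -> tuple[dict, list[str]]:
    """Return a repaired copy of `fm` plus the names of the fields that changed."""
    path = PurePosixPath(rel_path)
    fixed = dict(fm)
    changed = []

    # Infer type from the top-level directory, e.g. person/ -> type: person.
    if not fixed.get("type") and len(path.parts) > 1:
        fixed["type"] = path.parts[0]
        changed.append("type")

    # Infer name from the filename (without extension) if neither name nor title is set.
    if not fixed.get("name") and not fixed.get("title"):
        fixed["name"] = path.stem
        changed.append("name")

    # Wrap a bare string in a list where a list is expected.
    for field in LIST_FIELDS:
        if isinstance(fixed.get(field), str):
            fixed[field] = [fixed[field]]
            changed.append(field)

    return fixed, changed
```

Returning the list of changed fields alongside the repaired frontmatter makes it straightforward to produce the fixed/flagged/skipped counts in the report.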

Stage 2: Link Repair (LLM, per-file)

For each file with broken wikilinks, makes a focused LLM call to disambiguate and repair the links.

Process:

  • Provides the file content and the broken link text
  • Includes a list of existing vault records as candidate targets
  • LLM suggests the correct target for each broken link
  • Applies fixes via alfred vault edit

Example: A broken link [[John]] might be resolved to [[person/John Smith]] or [[person/John Doe]] based on context.
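To make the per-link call concrete, the sketch below narrows the candidate list and assembles a prompt for one broken link. Both helper names, the substring-based shortlist, and the prompt wording are assumptions for illustration; the page does not say whether Janitor filters candidates before calling the LLM.

```python
def shortlist_candidates(broken: str, records: list[str], limit: int = 10) -> list[str]:
    """Return record paths whose file name contains the broken link text."""
    needle = broken.casefold()
    hits = [r for r in records if needle in r.rsplit("/", 1)[-1].casefold()]
    return sorted(hits)[:limit]


def build_prompt(file_text: str, broken: str, candidates: list[str]) -> str:
    """Assemble a focused, single-link prompt (wording is illustrative only)."""
    options = "\n".join(f"- [[{c}]]" for c in candidates)
    return (
        f"The note below contains a broken wikilink [[{broken}]].\n"
        "Pick the best replacement from these existing records, "
        f"or answer NONE if no candidate fits:\n{options}\n\n"
        f"--- note ---\n{file_text}"
    )
```

For the [[John]] example, the shortlist would contain person/John Doe and person/John Smith, leaving the LLM to choose between them from the note's context.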

Stage 3: Stub Enrichment (LLM, per-file)

For stub records (files with minimal body content), makes an LLM call to enrich them using existing vault context.

Guidelines:

  • Only adds verifiable facts from existing vault records
  • Expands relationships using existing wikilinks
  • Does NOT generate speculative or filler content
  • Preserves the record's original purpose and scope
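How a record qualifies as a stub is not specified on this page. One plausible heuristic, shown here purely as an assumption, is to count the words in the body after the YAML frontmatter; the min_words threshold of 20 is hypothetical.

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Split a note into (frontmatter, body); frontmatter is '' if absent."""
    if text.startswith("---\n"):
        end = text.find("\n---", 3)
        if end != -1:
            return text[4:end], text[end + 4:].lstrip("\n")
    return "", text


def is_stub(text: str, min_words: int = 20) -> bool:
    """Treat a note as a stub when its body has fewer than `min_words` words."""
    _, body = split_frontmatter(text)
    return len(body.split()) < min_words
```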

Sweep Modes

Light Sweep (Structural-Only)

Fast scan that runs autofix (Stage 1) only. No LLM calls are made.

Use when:

  • You want frequent health checks without API costs
  • Frontmatter issues are the main concern
  • Agent backend is not configured

Configuration:

janitor:
  interval: 300  # seconds between light sweeps
  structural_only: true

Deep Sweep (Full Pipeline)

Runs all three stages, including LLM-powered link repair and stub enrichment.

Use when:

  • You have broken links that need disambiguation
  • Stub records need enrichment
  • Agent backend is configured and available

Configuration:

janitor:
  interval: 300  # light sweep interval
  deep_interval_hours: 24  # deep sweep interval
  structural_only: false

Configuration

Janitor configuration lives in the janitor section of config.yaml:

janitor:
  # Scan interval for light sweeps (seconds)
  interval: 300

  # Deep sweep interval (hours)
  deep_interval_hours: 24

  # Whether to apply fixes or just report issues
  fix_mode: true

  # Skip LLM stages (Stage 2 & 3), run autofix only
  structural_only: false
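For readers scripting against this config, the sketch below models the janitor section as a typed object. The JanitorConfig class is hypothetical, and treating the example values above as defaults is an assumption; the page does not state Janitor's real defaults or whether unknown keys are rejected.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class JanitorConfig:
    interval: int = 300             # seconds between light sweeps
    deep_interval_hours: int = 24   # hours between deep sweeps
    fix_mode: bool = True           # apply fixes, or only report issues
    structural_only: bool = False   # skip the LLM stages (2 and 3)

    @classmethod
    def from_dict(cls, raw: dict) -> "JanitorConfig":
        """Build a config from the parsed `janitor:` mapping, rejecting unknown keys."""
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"unknown janitor keys: {sorted(unknown)}")
        return cls(**raw)
```

Rejecting unknown keys is a design choice that surfaces typos such as structual_only early instead of silently ignoring them.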

Global Agent Configuration

Janitor uses the global agent section for backend selection:

agent:
  backend: claude  # or 'openclaw', 'zo'

  claude:
    default_model: claude-opus-4-6

  openclaw:
    agent_id: vault-janitor
    stagger_startup_seconds: 10

  zo:
    api_key: ${ZO_API_KEY}
    model: anthropic/claude-opus-4-6

CLI Commands

One-Shot Scan

Run a single structural scan and print a report (no fixes applied):

alfred janitor scan

Output: Lists all detected issues with file paths and issue codes.

One-Shot Fix

Run a single scan and apply fixes:

alfred janitor fix

Behavior: Runs the full pipeline (all 3 stages) if an agent backend is configured, or autofix only (Stage 1) if structural_only: true.

Watch Daemon

Run periodic sweeps as a foreground daemon:

alfred janitor watch

Behavior: Runs a light sweep every interval seconds and, if configured, a deep sweep every deep_interval_hours hours.
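The choice between a light and a deep sweep on each tick can be sketched as below. The function name and exact rule are assumptions based on the described behavior, not the daemon's actual scheduler.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def choose_sweep(
    now: datetime,
    last_deep: Optional[datetime],
    deep_interval_hours: int,
    structural_only: bool,
) -> str:
    """Return "deep" when a deep sweep is due, otherwise "light"."""
    if structural_only:
        return "light"  # LLM stages are disabled entirely
    if last_deep is None:
        return "deep"   # no deep sweep has been recorded yet
    if now - last_deep >= timedelta(hours=deep_interval_hours):
        return "deep"
    return "light"


now = datetime(2026, 2, 23, 10, 30, tzinfo=timezone.utc)
last = datetime(2026, 2, 22, 8, 0, tzinfo=timezone.utc)
mode = choose_sweep(now, last, deep_interval_hours=24, structural_only=False)
```

Here mode is "deep", because more than 24 hours separate the two timestamps.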

Background Daemon

Start Janitor as a background process:

alfred up --only janitor

Check status:

alfred status

Stop daemon:

alfred down

Backend Support

OpenClaw (Recommended for Pipeline Mode)

The 3-stage pipeline mode was designed for OpenClaw and works best with its agent architecture.

Setup:

  1. Register a vault-janitor agent in OpenClaw
  2. Set the agent's workspace to include vault schema files
  3. Configure janitor.structural_only: false to enable all stages

Concurrency: OpenClaw requires concurrency: 1 due to session locking.

Claude Code

Uses a single-call legacy approach: all issues for a sweep are sent to Claude in one agent invocation.

Tradeoffs: Less granular than pipeline mode, but works well for small vaults.

Zo Computer

Uses a single-call legacy approach with snapshot/diff fallback for mutation tracking.

Tradeoffs: No per-file pipeline, but good for HTTP-based agent workflows.

State & Logging

State File

Janitor maintains state in data/janitor_state.json:

{
  "processed_hashes": {},
  "last_sweep": "2026-02-23T10:30:00Z",
  "last_deep_sweep": "2026-02-22T08:00:00Z",
  "sweep_count": 42
}

Purpose: Tracks sweep history and timing. Can be deleted to force a fresh sweep.
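Because the state file can be deleted, any reader of it needs a fallback. The sketch below shows one way to handle that; load_state and the EMPTY_STATE defaults are hypothetical and only mirror the keys shown in the example above.

```python
import copy
import json
from pathlib import Path

# Assumed starting state when no file exists; keys mirror the example above.
EMPTY_STATE = {
    "processed_hashes": {},
    "last_sweep": None,
    "last_deep_sweep": None,
    "sweep_count": 0,
}


def load_state(path: Path) -> dict:
    """Load the state file, falling back to an empty state if it is missing."""
    state = copy.deepcopy(EMPTY_STATE)
    try:
        state.update(json.loads(path.read_text(encoding="utf-8")))
    except FileNotFoundError:
        pass  # deleted or never written: start fresh
    return state
```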

Log Files

  • Tool log: data/janitor.log — daemon activity, scan results, error messages
  • Audit log: data/vault_audit.log — append-only JSONL of every vault mutation

Mutation Tracking

For CLI backends (Claude, OpenClaw), changes are tracked via session-scoped JSONL files:

{"op": "edit", "path": "person/John Smith.md", "fields_changed": ["status", "tags"]}
{"op": "edit", "path": "project/Alpha.md", "fields_changed": ["related"]}

Location: vault/.mutations/{session-id}.jsonl
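A session file in this format can be summarized with a few lines of Python. This is a reader-side sketch that assumes only the op and path keys shown above; summarize_mutations is a hypothetical helper, not part of Alfred.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_mutations(lines: Iterable[str]) -> dict:
    """Count operations and collect touched paths from JSONL mutation lines."""
    ops = Counter()
    paths = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between entries
        entry = json.loads(line)
        ops[entry["op"]] += 1
        paths.add(entry["path"])
    return {"ops": dict(ops), "paths": sorted(paths)}
```

Passing an open file object works as well, since iterating over it yields one line at a time.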

Vault Scope Rules

Janitor operates under the janitor scope, which allows:

  • Edit: Modify frontmatter and body content
  • Delete: Remove orphaned or invalid files
  • Move: Rename files (via Obsidian CLI if available)

Restricted:

  • Create: Cannot create new records (use Curator for that)

See src/alfred/vault/scope.py for full scope definitions.
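Conceptually, a scope is an allow-list of operations. The sketch below illustrates that idea only; it does not reproduce the contents of scope.py, and SCOPES, ScopeError, and check_scope are hypothetical names.

```python
# Hypothetical allow-list mirroring the rules described above.
SCOPES = {
    "janitor": {"edit", "delete", "move"},
}


class ScopeError(PermissionError):
    """Raised when a tool attempts an operation outside its scope."""


def check_scope(scope: str, op: str) -> None:
    """Raise ScopeError unless `op` is allowed under `scope`."""
    allowed = SCOPES.get(scope, set())
    if op not in allowed:
        raise ScopeError(f"scope {scope!r} does not allow {op!r}")
```

Under these rules check_scope("janitor", "edit") passes silently, while check_scope("janitor", "create") raises ScopeError.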

Common Workflows

Daily Health Check

Run a light sweep every 5 minutes, deep sweep once per day:

janitor:
  interval: 300
  deep_interval_hours: 24
  fix_mode: true
  structural_only: false

Then start the daemon:

alfred up --only janitor

Structural-Only Mode (No LLM)

Fast, free, frequent scans with no API costs:

janitor:
  interval: 60
  structural_only: true
  fix_mode: true

Then start the daemon:

alfred up --only janitor

Manual Fix Run

Scan the vault once to review the detected issues, then apply fixes in a separate step:

# Review issues first
alfred janitor scan

# Apply fixes
alfred janitor fix

Troubleshooting

"No issues detected" but vault has problems

Check:

  • Ensure records are in the correct directories (each record's type must match its directory)
  • Verify KNOWN_TYPES includes the record types in your vault
  • Check data/janitor.log for scan errors

Link repairs are incorrect

Explanation: Stage 2 relies on the LLM's understanding of context, so repairs are only as good as the context the agent can see.

Fix:

  • Ensure the agent has access to vault/CLAUDE.md (schema documentation)
  • For OpenClaw, verify the workspace includes vault schema files

Orphan detection flags valid files

Explanation: ORPHAN detection only checks for incoming wikilinks. Hub files (dashboards, indexes) may legitimately have no incoming links.

Solution: None built in yet. Excluding specific files or directories from orphan checks is not yet implemented.
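To see why hub files get flagged, consider this minimal sketch of an incoming-link check. It is an assumption about the general approach, not Janitor's detector; find_orphans and the link-matching rule (full path or bare file name) are hypothetical.

```python
import re

# Matches [[target]], [[target|alias]] and [[target#heading]]; captures only the target.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def find_orphans(notes: dict[str, str]) -> list[str]:
    """Return records that no other record links to.

    `notes` maps record paths without the .md suffix to their text.
    """
    linked = set()
    for source, text in notes.items():
        for match in WIKILINK_RE.finditer(text):
            target = match.group(1).strip()
            for path in notes:
                # Self-links do not count as incoming links.
                if path != source and target in (path, path.rsplit("/", 1)[-1]):
                    linked.add(path)
    return sorted(set(notes) - linked)


notes = {
    "person/John Smith": "Works on [[project/Alpha]].",
    "project/Alpha": "Led by [[John Smith]].",
    "index/Dashboard": "See [[project/Alpha]] and [[person/John Smith]].",
}
orphans = find_orphans(notes)
```

In this example only index/Dashboard is reported: it links out to both other records, but nothing links back to it.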

Deep sweeps not running

Check:

  • Verify structural_only: false in config
  • Ensure agent backend is configured
  • Check last_deep_sweep timestamp in data/janitor_state.json
  • Review data/janitor.log for agent errors

Related Tools

  • Curator: Processes inbox files into structured vault records
  • Distiller: Extracts latent knowledge from operational records
  • Surveyor: Discovers semantic relationships via embeddings and clustering

See the main Alfred documentation for architecture and setup guides.
