# Janitor

The Janitor is one of four AI-powered tools in Alfred for managing an Obsidian vault. It periodically scans every file in the vault for structural problems and automatically fixes issues like broken wikilinks, missing frontmatter fields, and orphaned files.

## Overview

Janitor maintains vault health by detecting and repairing structural problems across all vault records. It runs periodic sweeps with a multi-stage pipeline that combines deterministic Python fixes with targeted LLM calls for complex repairs.

**Key capabilities:**
- Detects broken wikilinks, invalid frontmatter, orphaned files, and stub records
- Applies deterministic fixes for common structural issues
- Uses LLM calls for link disambiguation and stub enrichment
- Supports both light (structural-only) and deep (full agent) sweep modes
- Respects vault scope rules (can edit and delete, but not create new records)

## Issue Detection

Janitor scans for the following issue types:

| Issue Code | Description |
|------------|-------------|
| **FM001** | Missing required frontmatter fields (type, status, name/title) |
| **FM002** | Invalid type (not in KNOWN_TYPES) |
| **FM003** | Invalid status for the record type |
| **FM004** | Wrong field types (string provided where list expected, or vice versa) |
| **LK001** | Broken wikilinks (target file doesn't exist in vault) |
| **ORPHAN** | Files with no incoming wikilinks from other records |

## 3-Stage Fix Pipeline

Janitor uses a three-stage pipeline to repair issues, progressing from fast deterministic fixes to targeted LLM interventions.

### Stage 1: Autofix (Pure Python)

Applies deterministic fixes for frontmatter issues (FM001-FM004) without any LLM calls.

**What it fixes:**
- Infers record type from directory placement (e.g., file in `person/` gets `type: person`)
- Infers name/title from filename if missing
- Fixes field type mismatches (converts strings to lists, lists to strings as needed)
- Repairs invalid status values to valid ones for the record type
- Adds missing required fields with sensible defaults

**Output:** Reports counts of fixed/flagged/skipped issues.

### Stage 2: Link Repair (LLM, per-file)

For each file with broken wikilinks, makes a focused LLM call to disambiguate and repair the links.

**Process:**
- Provides the file content and the broken link text
- Includes a list of existing vault records as candidate targets
- LLM suggests the correct target for each broken link
- Applies fixes via `alfred vault edit`

**Example:** A broken link `[[John]]` might be resolved to `[[person/John Smith]]` or `[[person/John Doe]]` based on context.

### Stage 3: Stub Enrichment (LLM, per-file)

For stub records (files with minimal body content), makes an LLM call to enrich them using existing vault context.

**Guidelines:**
- Only adds verifiable facts from existing vault records
- Expands relationships using existing wikilinks
- Does NOT generate speculative or filler content
- Preserves the record's original purpose and scope

## Sweep Modes

### Light Sweep (Structural-Only)

Fast scan that runs autofix (Stage 1) only. No LLM calls are made.

**Use when:**
- You want frequent health checks without API costs
- Frontmatter issues are the main concern
- Agent backend is not configured

**Configuration:**
```yaml
janitor:
  interval: 300  # seconds between light sweeps
  structural_only: true
```

### Deep Sweep (Full Pipeline)

Runs all three stages, including LLM-powered link repair and stub enrichment.

**Use when:**
- You have broken links that need disambiguation
- Stub records need enrichment
- Agent backend is configured and available

**Configuration:**
```yaml
janitor:
  interval: 300  # light sweep interval
  deep_interval_hours: 24  # deep sweep interval
  structural_only: false
```

## Configuration

Janitor configuration lives in the `janitor` section of `config.yaml`:

```yaml
janitor:
  # Scan interval for light sweeps (seconds)
  interval: 300

  # Deep sweep interval (hours)
  deep_interval_hours: 24

  # Whether to apply fixes or just report issues
  fix_mode: true

  # Skip LLM stages (Stage 2 & 3), run autofix only
  structural_only: false
```

### Global Agent Configuration

Janitor uses the global `agent` section for backend selection:

```yaml
agent:
  backend: claude  # or 'openclaw', 'zo'

  claude:
    default_model: claude-opus-4-6

  openclaw:
    agent_id: vault-janitor
    stagger_startup_seconds: 10

  zo:
    api_key: ${ZO_API_KEY}
    model: anthropic/claude-opus-4-6
```

## CLI Commands

### One-Shot Scan

Run a single structural scan and print a report (no fixes applied):

```bash
alfred janitor scan
```

**Output:** Lists all detected issues with file paths and issue codes.

### One-Shot Fix

Run a single scan and apply fixes:

```bash
alfred janitor fix
```

**Behavior:** Runs the full pipeline (all 3 stages) if agent is configured, or just autofix (Stage 1) if `structural_only: true`.

### Watch Daemon

Run periodic sweeps as a foreground daemon:

```bash
alfred janitor watch
```

**Behavior:** Runs light sweeps at `interval` seconds, and deep sweeps at `deep_interval_hours` hours (if configured).

### Background Daemon

Start Janitor as a background process:

```bash
alfred up --only janitor
```

Check status:

```bash
alfred status
```

Stop daemon:

```bash
alfred down
```

## Backend Support

### OpenClaw (Recommended for Pipeline Mode)

The 3-stage pipeline mode was designed for OpenClaw and works best with its agent architecture.

**Setup:**
1. Register a `vault-janitor` agent in OpenClaw
2. Set the agent's workspace to include vault schema files
3. Configure `janitor.structural_only: false` to enable all stages

**Concurrency:** OpenClaw requires `concurrency: 1` due to session locking.

### Claude Code

Uses a single-call legacy approach: all issues for a sweep are sent to Claude in one agent invocation.

**Tradeoffs:** Less granular than pipeline mode, but works well for small vaults.

### Zo Computer

Uses a single-call legacy approach with snapshot/diff fallback for mutation tracking.

**Tradeoffs:** No per-file pipeline, but good for HTTP-based agent workflows.

## State & Logging

### State File

Janitor maintains state in `data/janitor_state.json`:

```json
{
  "processed_hashes": {},
  "last_sweep": "2026-02-23T10:30:00Z",
  "last_deep_sweep": "2026-02-22T08:00:00Z",
  "sweep_count": 42
}
```

**Purpose:** Tracks sweep history and timing. Can be deleted to force a fresh sweep.

### Log Files

- **Tool log:** `data/janitor.log` — daemon activity, scan results, error messages
- **Audit log:** `data/vault_audit.log` — append-only JSONL of every vault mutation

### Mutation Tracking

For CLI backends (Claude, OpenClaw), changes are tracked via session-scoped JSONL files:

```jsonl
{"op": "edit", "path": "person/John Smith.md", "fields_changed": ["status", "tags"]}
{"op": "edit", "path": "project/Alpha.md", "fields_changed": ["related"]}
```

**Location:** `vault/.mutations/{session-id}.jsonl`

## Vault Scope Rules

Janitor operates under the `janitor` scope, which allows:

- **Edit:** Modify frontmatter and body content
- **Delete:** Remove orphaned or invalid files
- **Move:** Rename files (via Obsidian CLI if available)

**Restricted:**
- **Create:** Cannot create new records (use Curator for that)

See `src/alfred/vault/scope.py` for full scope definitions.

## Common Workflows

### Daily Health Check

Run a light sweep every 5 minutes, deep sweep once per day:

```yaml
janitor:
  interval: 300
  deep_interval_hours: 24
  fix_mode: true
  structural_only: false
```

```bash
alfred up --only janitor
```

### Structural-Only Mode (No LLM)

Fast, free, frequent scans with no API costs:

```yaml
janitor:
  interval: 60
  structural_only: true
  fix_mode: true
```

```bash
alfred up --only janitor
```

### Manual Fix Run

Scan the vault once and apply fixes interactively:

```bash
# Review issues first
alfred janitor scan

# Apply fixes
alfred janitor fix
```

## Troubleshooting

### "No issues detected" but vault has problems

**Check:**
- Ensure records are in correct directories (type must match directory)
- Verify `KNOWN_TYPES` includes the record types in your vault
- Check `data/janitor.log` for scan errors

### Link repairs are incorrect

**Fix:**
- Stage 2 relies on LLM understanding of context
- Ensure the agent has access to `vault/CLAUDE.md` (schema documentation)
- For OpenClaw, verify the workspace includes vault schema files

### Orphan detection flags valid files

**Explanation:** ORPHAN detection only checks for incoming wikilinks. Hub files (dashboards, indexes) may legitimately have no incoming links.

**Solution:** Exclude specific files/directories from orphan checks (feature not yet implemented).

### Deep sweeps not running

**Check:**
- Verify `structural_only: false` in config
- Ensure agent backend is configured
- Check `last_deep_sweep` timestamp in `data/janitor_state.json`
- Review `data/janitor.log` for agent errors

## Related Tools

- **Curator:** Processes inbox files into structured vault records
- **Distiller:** Extracts latent knowledge from operational records
- **Surveyor:** Discovers semantic relationships via embeddings and clustering

See the main Alfred documentation for architecture and setup guides.