I built a self-hosted AI agent on a Mac Mini that runs 8 automated pipelines, routes across 4 LLM tiers, and delivers everything important directly to my WhatsApp. Here's exactly how it works.
OpenClaw is a self-hosted AI automation platform that runs as a persistent daemon on a Mac Mini. It coordinates multiple intelligence pipelines — fetching news, scanning jobs, tracking expenses, delivering morning briefs — all routed through a multi-model AI backbone.
The core principle: Manoj doesn't check dashboards. Everything important arrives on WhatsApp.
No SaaS. No vendor lock-in. ~$15–30/month in API credits. Runs 24/7 on hardware you already own.
This repo is a full technical walkthrough — architecture, design decisions, the Python scripts that power it, and the model routing strategy that keeps costs predictable. If you want to build something similar, this is the blueprint.
Every day, OpenClaw runs 8 automated pipelines without any manual input:
| Time (Dubai) | Pipeline | What Arrives on WhatsApp |
|---|---|---|
| 07:00 AM | News Brief | Top headlines from BBC, Al Jazeera, Guardian, DW — Middle East focus |
| 08:00 AM | Job Intelligence | Scored list of relevant job postings from LinkedIn (no auth) |
| 10:00 AM | Reddit Digest | Curated posts from 5 rotating topic buckets (AI, UAE, Finance, Marketing, Lifestyle) |
| 02:00 PM | HN Signal | Top Hacker News stories filtered for relevance |
| 08:00 PM | Daily Spend Check | Quick expense status from Google Sheet |
| 09:07 PM | Brain Check-in | Agent reviews state, logs today, sends status |
| 03:17 AM | Nightly Healer | Auto-detects and fixes common failure patterns |
| 06:30 AM | Morning Brief | Plain-English summary of what the healer fixed overnight |
Plus: IPL match alerts, weekly expense summaries, job follow-up reminders, and a weekly job market pulse. All to WhatsApp.
┌──────────────────────────────────────────────────────────┐
│                     OpenClaw Gateway                     │
│             (persistent daemon via launchd)              │
│                                                          │
│  ┌──────────────┐  ┌────────────┐  ┌─────────────────┐   │
│  │ Cron Engine  │  │   Agent    │  │  Model Router   │   │
│  │              │  │   (main)   │  │  4-tier stack   │   │
│  └──────┬───────┘  └─────┬──────┘  └────────┬────────┘   │
│         │                │                  │            │
│  ┌──────▼────────────────▼──────────────────▼──────┐     │
│  │                 Execution Layer                 │     │
│  │      Python scripts │ Shell scripts │ APIs      │     │
│  └────────────────────────┬────────────────────────┘     │
│                           │                              │
│  ┌────────────────────────▼────────────────────────┐     │
│  │                 Delivery Layer                  │     │
│  │        WhatsApp            Brain (Sheet)        │     │
│  └─────────────────────────────────────────────────┘     │
└──────────────────────────────────────────────────────────┘
               │                       │
               ▼                       ▼
          Your Phone             Google Sheet
          (WhatsApp)          (source of truth)
The pattern is always the same — and it's deliberate:
Cron fires → Python script (cheap fetch) → LLM (smart summary) → WhatsApp
Example: Reddit Digest
- Cron fires at 10:00 AM → triggers daily-reddit-bucket-digest-10am
- Agent calls reddit_fetch.py --topic "AI news" --limit 5
- Script returns raw JSON (posts, scores, top comments) — no AI cost here
- GPT-5.4 ranks by relevance, writes a tight summary
- Delivery via openclaw message send --channel whatsapp
- Manoj reads a clean 5-point digest on WhatsApp
Scripts handle the deterministic, cheap work. The LLM handles the expensive, creative work. Costs stay predictable.
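The split can be sketched as a tiny orchestration function. Everything here is illustrative (the function names and stubs are not OpenClaw's actual API); the point is where the token cost sits:

```python
import json

def run_pipeline(fetch, summarize, deliver) -> str:
    """Cheap deterministic fetch -> one LLM call -> push to the channel."""
    raw = fetch()                          # Python/API work, zero tokens
    message = summarize(json.dumps(raw))   # the only step that costs money
    deliver(message)                       # delivery channel send
    return message

# Stubbed example so the flow is visible without any API keys:
msg = run_pipeline(
    fetch=lambda: [{"title": "Post A"}, {"title": "Post B"}],
    summarize=lambda raw: f"Digest of {len(json.loads(raw))} items",
    deliver=lambda m: None,
)
# msg == "Digest of 2 items"
```

Swapping the `fetch` stub for a real script and the `summarize` stub for a model call changes nothing about the cost structure: one LLM call per pipeline run, regardless of how much raw data was fetched.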
This is the most important architectural decision in the system.
The problem: AI agents have context windows. Long-running agents forget things. If your job scanner logs 200 jobs over 2 weeks, that history doesn't survive a restart.
The solution: OpenClaw externalises all durable state to a Google Sheet called The Brain.
┌──────────────────────────────────────────┐
│         The Brain (Google Sheet)         │
│                                          │
│   Tab          Purpose                   │
│   ─────────    ──────────────────────    │
│   Daily_Log    Free-form log entries     │
│   Projects     One row per project       │
│   Tasks        Authoritative task list   │
│   Comments     Two-way check-in loop     │
│   Memory       Key-value facts           │
│   Job_Status   Cron job state mirror     │
│   Archive      Completed items           │
└──────────────────────────────────────────┘
The agent can read and write the sheet via a CLI tool called brain.py:
# Log a thought
brain.py write daily_log --type note --content "Need to renew lease" --tags "admin"
# Check open tasks
brain.py read tasks --status pending
# Check unread comments (from the sheet, added manually)
brain.py read comments --status pending
# Refresh the in-memory snapshot
brain.py snapshot

Why a Google Sheet and not a database?
- Free (within quota)
- Human-editable from phone
- No schema migrations
- The AI can read it without any special tool — it's just structured text
- I can add a comment from my iPhone and the agent picks it up at the next check-in
When Brain writes fail (API down, auth expired), writes queue to brain_queue/. Next run retries automatically.
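A minimal sketch of that queue-and-retry behaviour. The directory name comes from the text above; the `push` callable and the file-naming scheme are assumptions, not the actual brain.py:

```python
import json, time, uuid
from pathlib import Path

QUEUE_DIR = Path("brain_queue")

def write_with_fallback(row: dict, push) -> bool:
    """Try the Sheets write; on failure, park the row on disk for later."""
    try:
        push(row)
        return True
    except Exception:
        QUEUE_DIR.mkdir(exist_ok=True)
        name = f"{int(time.time())}-{uuid.uuid4().hex}.json"
        (QUEUE_DIR / name).write_text(json.dumps(row))
        return False

def flush_queue(push) -> int:
    """Replay queued writes; delete a file only after its push succeeds."""
    if not QUEUE_DIR.exists():
        return 0
    done = 0
    for f in sorted(QUEUE_DIR.glob("*.json")):
        try:
            push(json.loads(f.read_text()))
        except Exception:
            break  # API still down; keep the rest for the next run
        f.unlink()
        done += 1
    return done
```

Deleting the file only after a successful push means a crash mid-flush loses nothing; the worst case is a duplicate row in the sheet, which is easy to live with.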
This is where most self-hosted AI setups fall down. Running everything through GPT-4 is expensive. Running everything through a cheap model produces garbage.
OpenClaw uses a 4-tier routing strategy that matches model capability to task complexity:
┌──────────────────────────────────────────────────────────────┐
│  Tier  Model                     Used For                    │
│  ────  ────────────────────────  ──────────────────────────  │
│  1     GPT-5.4 (primary)         Chat, workflows, reasoning  │
│  1b    Gemini 2.5 Pro Preview    GPT fallback (low tokens)   │
│  1.5   Gemini 2.5 Flash          Summaries, digest formatting│
│  2     Gemini 2.5 Flash-Lite     Heartbeat, bulk parsing     │
└──────────────────────────────────────────────────────────────┘
The auto-switch mechanism:
# auto_switch_model.sh runs on cron
# Monitors the token window every 15 minutes
if remaining_tokens < 3%:
    switch primary → Gemini 2.5 Flash    # cheap fallback
elif remaining_tokens > 35%:
    switch primary → GPT-5.4             # recover to best model
# Logs to ~/.openclaw/logs/auto_switch_model.log

Off-minute scheduling — a trick worth knowing:
All cron jobs run at :07, :15, :20, :45 — never :00 or :30. When thousands of users fire cron jobs on the hour, API rate limits spike. Off-minute scheduling means you're in the quiet period. Response times are measurably faster.
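If you want the same property without hand-picking minutes, a stable per-job offset can be derived from the job name. This is a generic trick, not necessarily what OpenClaw itself does:

```python
import hashlib

def off_minute(job_name: str) -> int:
    """Deterministic minute in 1..58 for a job, avoiding :00 and :30."""
    digest = int(hashlib.sha256(job_name.encode()).hexdigest(), 16)
    minute = 1 + digest % 58   # 1..58, so :00 and :59 never occur
    return 31 if minute == 30 else minute

# The same job name always maps to the same quiet-period slot:
slot = off_minute("daily-reddit-bucket-digest-10am")
```

Hashing instead of random jitter means the schedule is reproducible: the job lands on the same minute after every redeploy, so logs stay comparable day to day.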
Result: ~$15–30/month total API spend for 8 daily pipelines. The expensive model (GPT-5.4) only touches the reasoning steps. Everything else routes to Gemini.
Every data source in OpenClaw works without paid APIs or OAuth. Here's how:
Reddit exposes a public JSON API that requires no authentication:
# reddit_fetch.py — core logic
import requests

def fetch_subreddit(subreddit: str, sort: str = "hot", limit: int = 10):
    url = f"https://www.reddit.com/r/{subreddit}/{sort}.json?limit={limit}"
    headers = {"User-Agent": "openclaw-personal/1.0"}
    response = requests.get(url, headers=headers, timeout=10)
    return response.json()["data"]["children"]

No API key. No OAuth. No rate limit registration. Works today.
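The `children` objects Reddit returns are verbose, so it pays to strip them down before the LLM call. The keys used below (`title`, `score`, `num_comments`, `url`) are standard fields in Reddit's listing JSON:

```python
def slim_posts(children: list[dict]) -> list[dict]:
    """Keep only the fields the digest needs: fewer tokens, same signal."""
    return [
        {
            "title": c["data"]["title"],
            "score": c["data"]["score"],
            "comments": c["data"]["num_comments"],
            "url": c["data"]["url"],
        }
        for c in children
    ]

# Example with a hand-built child object:
sample = [{"data": {"title": "New model drops", "score": 412,
                    "num_comments": 98, "url": "https://example.com",
                    "selftext": "...", "author": "someone"}}]
# slim_posts(sample)[0]["score"] == 412
```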
HN runs on Firebase and has a completely open, documented API:
# hn_fetch.py — core logic
import requests

HN_API = "https://hacker-news.firebaseio.com/v0"

def get_top_stories(limit: int = 30) -> list[dict]:
    ids = requests.get(f"{HN_API}/topstories.json").json()[:limit]
    stories = []
    for story_id in ids:
        item = requests.get(f"{HN_API}/item/{story_id}.json").json()
        stories.append(item)
    return stories

No rate limits. Completely public. Updated in real-time.
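The "filtered for relevance" step can be plain Python before any LLM is involved; the keyword list below is illustrative, not the repo's actual config:

```python
KEYWORDS = ("ai", "llm", "agent", "self-hosted")  # illustrative topics

def filter_stories(stories: list[dict], min_score: int = 100) -> list[dict]:
    """Cheap pre-filter: score threshold plus a keyword hit in the title.

    Substring matching is crude (it would match "said"), but as a first
    pass before the LLM ranks things, crude and free is fine.
    """
    hits = [
        s for s in stories
        if s.get("score", 0) >= min_score
        and any(k in s.get("title", "").lower() for k in KEYWORDS)
    ]
    return sorted(hits, key=lambda s: s["score"], reverse=True)
```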
LinkedIn has a guest API that powers the job search page before you log in. It's not documented, but it's stable:
# job_scan.py — core logic
import requests

def search_jobs(keywords: str, location: str = "UAE", days: int = 7):
    url = "https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search"
    params = {
        "keywords": keywords,
        "location": location,
        "f_TPR": f"r{days * 86400}",  # time filter in seconds
        "start": 0,
    }
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, params=params, headers=headers)
    return response.text  # returns HTML, parse with BeautifulSoup

YouTube video transcripts (auto-generated captions) are accessible without yt-dlp or any API key:
# youtube_summarize.py — core logic
import requests, re, json

def get_transcript(video_url: str) -> str:
    video_id = extract_video_id(video_url)
    # YouTube embeds captions data in the page HTML
    page = requests.get(f"https://www.youtube.com/watch?v={video_id}").text
    # Extract captionTracks from the page JSON
    caption_url = extract_caption_url(page)
    captions = requests.get(caption_url).text
    return parse_xml_captions(captions)

No yt-dlp. No API key. No quota.
Instead of web searching (slow, hallucination-prone), news_brief.py fetches directly from RSS:
# news_brief.py — feed map
FEEDS = {
    "world": ["http://feeds.bbci.co.uk/news/world/rss.xml",
              "https://www.theguardian.com/world/rss"],
    "middle_east": ["http://feeds.bbci.co.uk/news/world/middle_east/rss.xml",
                    "https://www.aljazeera.com/xml/rss/all.xml"],
    "india": ["http://feeds.bbci.co.uk/news/world/south_asia/rss.xml"],
}

Result: 6 curated feeds → 24h window → deduplicated → structured JSON in under 5 seconds. The previous web search approach took 60 seconds and sometimes hallucinated sources.
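The deduplication step can be a normalised-title set in plain Python. This is a sketch of the idea; the actual script may differ:

```python
import re

def dedupe(items: list[dict]) -> list[dict]:
    """Drop near-duplicate headlines syndicated across multiple feeds."""
    seen, unique = set(), []
    for item in items:
        # Normalise: lowercase, strip punctuation, collapse whitespace
        key = " ".join(re.sub(r"[^a-z0-9 ]", "", item["title"].lower()).split())
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique

items = [{"title": "Markets rally on rate news"},
         {"title": "Markets Rally on Rate News!"},
         {"title": "Flooding hits coastal towns"}]
# dedupe(items) keeps 2 of the 3 headlines
```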
At 3:17 AM every night, a script called healer.py scans for common failure patterns and fixes them automatically.
Key design decision: no LLM in the fix path.
# healer.py — pattern matching, not AI
PATTERNS = [
    {
        "name": "stale_brain_snapshot",
        "detect": lambda: snapshot_age_hours() > 25,
        "fix": lambda: run("brain.py snapshot"),
        "severity": "low",
    },
    {
        "name": "queued_brain_writes",
        "detect": lambda: len(os.listdir(BRAIN_QUEUE)) > 0,
        "fix": lambda: run("brain.py flush"),
        "severity": "medium",
    },
    {
        "name": "log_bloat",
        "detect": lambda: log_size_mb() > 50,
        "fix": lambda: rotate_logs(),
        "severity": "low",
    },
]

for pattern in PATTERNS:
    if pattern["detect"]():
        pattern["fix"]()
        log_fix(pattern["name"])

Pattern match → fix → verify. No AI hallucinating creative solutions to mundane problems. If a pattern isn't whitelisted, it escalates to "needs-human" — Manoj sees it in the morning brief and decides.
At 6:30 AM, morning_brief.py reads overnight_report.json and formats it into a WhatsApp message:
☀️ Morning Brief — Apr 13
✅ All systems healthy
🔧 1 fix applied: stale brain snapshot (refreshed)
📊 Cron runs: 14/14 successful
Nothing needs your attention today.
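A sketch of that formatting step. The report field names (`date`, `fixes`, `problems`, `cron_ok`, `cron_total`) are assumptions inferred from the sample message, not the actual schema:

```python
def format_brief(report: dict) -> str:
    """Turn the healer's overnight report into the WhatsApp message."""
    lines = [f"☀️ Morning Brief — {report['date']}"]
    if report["problems"]:
        lines.append(f"⚠️ {len(report['problems'])} item(s) need a look")
    else:
        lines.append("✅ All systems healthy")
    for fix in report["fixes"]:
        lines.append(f"🔧 1 fix applied: {fix}")
    lines.append(f"📊 Cron runs: {report['cron_ok']}/{report['cron_total']} successful")
    if not report["problems"]:
        lines.append("Nothing needs your attention today.")
    return "\n".join(lines)

report = {"date": "Apr 13", "problems": [], "cron_ok": 14, "cron_total": 14,
          "fixes": ["stale brain snapshot (refreshed)"]}
# format_brief(report) reproduces the sample message above
```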
This is not a one-click install — it's a personal system that you adapt to your needs. Here's the shape of it:
- Mac Mini (or any always-on Mac/Linux box)
- Python 3.11+
- OpenClaw daemon installed (openclaw.ai)
- Google account (for Brain Sheet)
- WhatsApp number for delivery
- API key: OpenAI or Google AI Studio (Gemini)
~/.openclaw/
├── README.md                 ← You are here
├── openclaw.json             Main configuration
├── exec-approvals.json       Script execution allowlist
│
├── scripts/                  Operational scripts
│   ├── brain.py                  Brain Google Sheet CLI
│   ├── healer.py                 Nightly auto-fixer
│   ├── morning_brief.py          Morning brief formatter
│   ├── auto_switch_model.sh      Token-based model switcher
│   ├── reddit_fetch.py           Reddit public API fetcher
│   ├── hn_fetch.py               HN Firebase API fetcher
│   ├── job_scan.py               LinkedIn guest API scanner
│   ├── youtube_summarize.py      YouTube transcript extractor
│   └── news_brief.py             RSS-based news fetcher
│
├── workspace/                Working directory
│   ├── scripts/                  Project-specific scripts
│   ├── expense_tracker/          Expense tracker codebase
│   └── [identity docs]           AGENTS.md, SOUL.md, USER.md
│
├── credentials/              API keys and auth (git-ignored)
├── cron/                     Cron job definitions
└── logs/                     Operational logs
openclaw.json — the main config. Sets models, agents, tools, delivery channels, and gateway settings.
exec-approvals.json — the security allowlist. Only scripts explicitly listed here can be executed by the agent without asking. Mine is set to full (my choice — default is restricted).
AGENTS.md — the agent's instruction manual. This is where you write the rules the AI follows. Think of it as a system prompt that persists across sessions.
## Tool Lock — YouTube
⚠️ TOOL LOCK: For any YouTube URL, you MUST call youtube_summarize.py.
Do NOT use web search. Do NOT use web fetch.
If youtube_summarize.py fails, send exactly:
"❌ YouTube summary failed: [error]. Try again later."
Stop. Do not attempt alternatives.

Writing explicit tool locks in AGENTS.md prevents the agent from going off-script when tools fail. Without this, agents invent creative (usually wrong) fallbacks.
Dashboards require you to go check them. WhatsApp pushes to you. Reliability beats features — one message that always arrives is worth more than ten that sometimes do.
The AI's context window is a sliding window — it forgets. The Google Sheet is permanent. Put everything that needs to survive a restart in the sheet. Keep the context window for reasoning, not storage.
Every API call that doesn't need intelligence runs in Python with no AI involved. The LLM only touches the output formatting step. This makes costs linear and predictable.
An AI healer would hallucinate fixes. A pattern-matching healer is boring, auditable, and actually works. Save the AI for the parts that need creativity.
Run at :07, :15, :20 — never :00 or :30. When millions of cron jobs fire on the hour, API rate limits spike. Off-minute scheduling costs nothing and measurably improves response times.
Keep current + one .bak. Historical versions go to archive/. Backup file proliferation is how directories become unmanageable. One backup is enough to recover from a bad edit.
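The policy is easy to enforce in code. A sketch of the idea; the function and its archive naming scheme are hypothetical, not OpenClaw's:

```python
import shutil, time
from pathlib import Path

def save_with_backup(path: Path, new_text: str, archive: Path) -> None:
    """Write a file keeping current + one .bak; older backups go to archive/."""
    bak = path.with_suffix(path.suffix + ".bak")
    if bak.exists():
        archive.mkdir(parents=True, exist_ok=True)
        shutil.move(str(bak), str(archive / f"{int(time.time())}-{bak.name}"))
    if path.exists():
        shutil.copy2(path, bak)  # the current version becomes the one backup
    path.write_text(new_text)
```

Every edit path goes through one function, so the "one backup, everything older archived" rule can't drift as the system grows.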
| Component | Cost |
|---|---|
| OpenAI (GPT-5.4) | ~$10–20/mo (only touches reasoning steps) |
| Google AI Studio (Gemini) | ~$3–8/mo (bulk of the fetch/parse work) |
| Google Sheets API | Free (within quota) |
| Reddit, HN, LinkedIn, YouTube | Free (no API keys) |
| RSS feeds | Free |
| Mac Mini electricity | ~$3–5/mo |
| Total | ~$15–30/month |
The model routing is what keeps this number low. Without it, running everything through GPT-5.4 would cost 5–10x more.
1. The hardest part is not the AI — it's the delivery. Getting a well-formed WhatsApp message to arrive reliably at 7 AM every day, even when something upstream failed, took more engineering than the AI pipelines themselves.
2. External memory is non-negotiable for long-running agents. Without the Brain sheet, the agent would lose state on every restart. With it, it can reconstruct context from the sheet in seconds.
3. Write explicit failure behaviours into your AGENTS.md. When a tool fails, the default AI behaviour is to improvise. Improvisation is usually wrong. Explicit failure rules ("if X fails, send this exact message and stop") make the system predictable.
4. Free APIs are more stable than you think. Reddit's public JSON, HN's Firebase API, YouTube's caption endpoints — these have been stable for years. They're not going away because they serve millions of unauthenticated users. They're safer than OAuth integrations that can expire.
5. Cron scheduling is an underrated skill. Timing, order of operations, idempotency, failure recovery — good cron design is what separates a system that runs for a week from one that runs for a year.
- Fix Morning Brief broken jobs (#27 in Brain)
- Habit tracker pipeline
- OpenClaw GitHub Pages — visual architecture showcase
- Export pipeline configs as reusable templates
If you're reading this from a Reddit post — here are the specific things I'm happy to go deeper on:
- How the Brain sheet is structured and why (the tab schema, the CLI design)
- The full news_brief.py implementation (RSS parsing, deduplication, the feed list)
- How the job scanner handles pagination and deduplication without auth
- The full model routing config (the exact tier assignments and why)
- The AGENTS.md tool lock pattern — how to write instructions that actually stick
- How to set up launchd on macOS to run a persistent daemon
Drop a comment or open an issue.
MIT — use it, adapt it, build on it.
Built incrementally over March–April 2026 through ~30 Claude Code sessions. The system is live and running daily as of Apr 2026.