m83iyer/openclaw
OpenClaw — Personal AI Operations Platform

I built a self-hosted AI agent on a Mac Mini that runs 8 automated pipelines, routes across 4 LLM tiers, and delivers everything important directly to my WhatsApp. Here's exactly how it works.



What Is This?

OpenClaw is a self-hosted AI automation platform that runs as a persistent daemon on a Mac Mini. It coordinates multiple intelligence pipelines — fetching news, scanning jobs, tracking expenses, delivering morning briefs — all routed through a multi-model AI backbone.

The core principle: Manoj doesn't check dashboards. Everything important arrives on WhatsApp.

No SaaS. No vendor lock-in. ~$15–30/month in API credits. Runs 24/7 on hardware you already own.

This repo is a full technical walkthrough — architecture, design decisions, the Python scripts that power it, and the model routing strategy that keeps costs predictable. If you want to build something similar, this is the blueprint.


What It Actually Does

Every day, OpenClaw runs 8 automated pipelines without any manual input:

| Time (Dubai) | Pipeline | What Arrives on WhatsApp |
|--------------|----------|--------------------------|
| 07:00 AM | News Brief | Top headlines from BBC, Al Jazeera, Guardian, DW — Middle East focus |
| 08:00 AM | Job Intelligence | Scored list of relevant job postings from LinkedIn (no auth) |
| 10:00 AM | Reddit Digest | Curated posts from 5 rotating topic buckets (AI, UAE, Finance, Marketing, Lifestyle) |
| 02:00 PM | HN Signal | Top Hacker News stories filtered for relevance |
| 08:00 PM | Daily Spend Check | Quick expense status from Google Sheet |
| 09:07 PM | Brain Check-in | Agent reviews state, logs today, sends status |
| 03:17 AM | Nightly Healer | Auto-detects and fixes common failure patterns |
| 06:30 AM | Morning Brief | Plain-English summary of what the healer fixed overnight |

Plus: IPL match alerts, weekly expense summaries, job follow-up reminders, and a weekly job market pulse. All to WhatsApp.


Architecture

┌─────────────────────────────────────────────────────────┐
│                   OpenClaw Gateway                      │
│            (persistent daemon via launchd)              │
│                                                         │
│  ┌──────────────┐  ┌────────────┐  ┌────────────────┐   │
│  │ Cron Engine  │  │   Agent    │  │  Model Router  │   │
│  │              │  │   (main)   │  │  4-tier stack  │   │
│  └──────┬───────┘  └─────┬──────┘  └───────┬────────┘   │
│         │                │                 │            │
│  ┌──────▼────────────────▼─────────────────▼─────────┐  │
│  │                 Execution Layer                   │  │
│  │   Python scripts  │  Shell scripts  │  APIs       │  │
│  └─────────────────────────┬─────────────────────────┘  │
│                            │                            │
│  ┌─────────────────────────▼─────────────────────────┐  │
│  │                  Delivery Layer                   │  │
│  │      WhatsApp               Brain (Sheet)         │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
              │                          │
              ▼                          ▼
        Your Phone                Google Sheet
        (WhatsApp)              (source of truth)

How Every Pipeline Works

The pattern is always the same — and it's deliberate:

Cron fires → Python script (cheap fetch) → LLM (smart summary) → WhatsApp

Example: Reddit Digest

  1. Cron fires at 10:00 AM → triggers daily-reddit-bucket-digest-10am
  2. Agent calls reddit_fetch.py --topic "AI news" --limit 5
  3. Script returns raw JSON (posts, scores, top comments) — no AI cost here
  4. GPT-5.4 ranks by relevance, writes a tight summary
  5. Delivery via openclaw message send --channel whatsapp
  6. Manoj reads a clean 5-point digest on WhatsApp

Scripts handle the deterministic, cheap work. The LLM handles the expensive, creative work. Costs stay predictable.
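That split can be sketched as a tiny runner. Everything here is illustrative — `run_pipeline`, `fetch_fn`, `summarize_fn` and `deliver_fn` are not OpenClaw's actual API, just the shape of the pattern:

```python
# Hypothetical sketch of the cron → script → LLM → WhatsApp pattern.
# The fetch step is plain Python (no AI cost); summarize_fn is the only
# place an LLM is ever invoked.
from typing import Callable

def run_pipeline(
    fetch_fn: Callable[[], list[dict]],         # deterministic, cheap
    summarize_fn: Callable[[list[dict]], str],  # the one LLM call
    deliver_fn: Callable[[str], None],          # e.g. WhatsApp send
) -> str:
    items = fetch_fn()
    if not items:
        # Nothing new: skip the LLM entirely, which is where
        # the cost predictability comes from.
        message = "No new items today."
    else:
        message = summarize_fn(items)
    deliver_fn(message)
    return message
```

The empty-result guard matters: a pipeline that has nothing to say should cost zero tokens.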


The Brain (Why I Put AI Memory in a Google Sheet)

This is the most important architectural decision in the system.

The problem: AI agents have context windows. Long-running agents forget things. If your job scanner logs 200 jobs over 2 weeks, that history doesn't survive a restart.

The solution: OpenClaw externalises all durable state to a Google Sheet called The Brain.

┌──────────────────────────────────────────┐
│              The Brain (Google Sheet)    │
│                                          │
│  Tab           Purpose                   │
│  ─────────     ──────────────────────    │
│  Daily_Log     Free-form log entries     │
│  Projects      One row per project       │
│  Tasks         Authoritative task list   │
│  Comments      Two-way check-in loop     │
│  Memory        Key-value facts           │
│  Job_Status    Cron job state mirror     │
│  Archive       Completed items           │
└──────────────────────────────────────────┘

The agent can read and write the sheet via a CLI tool called brain.py:

# Log a thought
brain.py write daily_log --type note --content "Need to renew lease" --tags "admin"

# Check open tasks
brain.py read tasks --status pending

# Check unread comments (from the sheet, added manually)
brain.py read comments --status pending

# Refresh the in-memory snapshot
brain.py snapshot
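If you want to replicate that CLI surface, argparse sub-commands map onto it directly. This is an illustrative sketch of the command shapes shown above, not the real brain.py:

```python
# Sketch of the brain.py CLI surface with argparse subcommands.
# Flag names mirror the examples in the text; defaults are assumptions.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="brain.py")
    sub = parser.add_subparsers(dest="command", required=True)

    write = sub.add_parser("write", help="append a row to a tab")
    write.add_argument("tab")                      # e.g. daily_log
    write.add_argument("--type", default="note")
    write.add_argument("--content", required=True)
    write.add_argument("--tags", default="")

    read = sub.add_parser("read", help="query rows from a tab")
    read.add_argument("tab")                       # e.g. tasks, comments
    read.add_argument("--status")                  # e.g. pending

    sub.add_parser("snapshot", help="refresh the in-memory snapshot")
    return parser
```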

Why a Google Sheet and not a database?

  • Free (within quota)
  • Human-editable from phone
  • No schema migrations
  • The AI can read it without any special tool — it's just structured text
  • I can add a comment from my iPhone and the agent picks it up at the next check-in

When Brain writes fail (API down, auth expired), writes queue to brain_queue/. Next run retries automatically.
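A minimal sketch of that queue-and-retry behaviour. The `brain_queue/` directory comes from the text above; the function names, file format, and `write_fn` indirection are my assumptions, not the real brain.py:

```python
# Hypothetical sketch of queue-on-failure writes. write_fn stands in
# for the real Google Sheets call.
import json
import time
import uuid
from pathlib import Path

def brain_write(row: dict, write_fn, queue_dir: Path) -> bool:
    """Try the Sheets write; on any failure, park the row on disk."""
    try:
        write_fn(row)
        return True
    except Exception:
        queue_dir.mkdir(exist_ok=True)
        entry = {"ts": time.time(), "row": row}
        (queue_dir / f"{uuid.uuid4().hex}.json").write_text(json.dumps(entry))
        return False

def flush_queue(write_fn, queue_dir: Path) -> int:
    """Retry queued writes; delete each file only after success."""
    flushed = 0
    for path in sorted(queue_dir.glob("*.json")):
        row = json.loads(path.read_text())["row"]
        try:
            write_fn(row)
        except Exception:
            break  # still down; leave the rest queued for next run
        path.unlink()
        flushed += 1
    return flushed
```

Deleting the queue file only after a confirmed write is what makes the retry safe — a crash mid-flush loses nothing.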


Model Routing — 4-Tier Strategy

This is where most self-hosted AI setups fall down. Running everything through GPT-4 is expensive. Running everything through a cheap model produces garbage.

OpenClaw uses a 4-tier routing strategy that matches model capability to task complexity:

┌─────────────────────────────────────────────────────────────┐
│  Tier   Model                   Used For                     │
│  ─────  ──────────────────────  ─────────────────────────    │
│  1      GPT-5.4 (primary)       Chat, workflows, reasoning   │
│  1b     Gemini 2.5 Pro Preview  GPT fallback (low tokens)    │
│  1.5    Gemini 2.5 Flash        Summaries, digest formatting │
│  2      Gemini 2.5 Flash-Lite   Heartbeat, bulk parsing      │
└─────────────────────────────────────────────────────────────┘

The auto-switch mechanism:

# auto_switch_model.sh runs on cron
# Monitors token window every 15 minutes

if remaining_tokens < 3%:
    switch primary → Gemini 2.5 Flash  # cheap fallback
elif remaining_tokens > 35%:
    switch primary → GPT-5.4           # recover to best model

# Logs to ~/.openclaw/logs/auto_switch_model.log

Off-minute scheduling — a trick worth knowing:

All cron jobs run at :07, :15, :20, :45 — never :00 or :30. When thousands of users fire cron jobs on the hour, API rate limits spike. Off-minute scheduling means you're in the quiet period. Response times are measurably faster.
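In plain crontab syntax the idea looks like this (illustrative entries — OpenClaw schedules through its own cron engine, not the system crontab, and the paths are placeholders):

```shell
# Illustrative crontab lines: minutes deliberately off :00 and :30.
7  7 * * *  /usr/bin/python3 ~/.openclaw/scripts/news_brief.py
15 8 * * *  /usr/bin/python3 ~/.openclaw/scripts/job_scan.py
17 3 * * *  /usr/bin/python3 ~/.openclaw/scripts/healer.py
```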

Result: ~$15–30/month total API spend for 8 daily pipelines. The expensive model (GPT-5.4) only touches the reasoning steps. Everything else routes to Gemini.


The Python Scripts — Zero-Auth Web Fetching

Every data source in OpenClaw works without paid APIs or OAuth. Here's how:

Reddit — Public JSON Endpoint

Reddit exposes a public JSON API that requires no authentication:

# reddit_fetch.py — core logic
import requests

def fetch_subreddit(subreddit: str, sort: str = "hot", limit: int = 10):
    url = f"https://www.reddit.com/r/{subreddit}/{sort}.json?limit={limit}"
    headers = {"User-Agent": "openclaw-personal/1.0"}
    response = requests.get(url, headers=headers, timeout=10)
    return response.json()["data"]["children"]

No API key. No OAuth. No rate limit registration. Works today.
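The listing JSON nests each post under a `data` key; a small flattening step keeps only what the digest needs. The field choice here is mine, not necessarily reddit_fetch.py's exact output:

```python
# Flatten Reddit listing children into digest-ready rows.
# Each child looks like {"kind": "t3", "data": {...}} in Reddit's
# public listing JSON.
def simplify(children: list[dict]) -> list[dict]:
    return [
        {
            "title": c["data"]["title"],
            "score": c["data"]["score"],
            "url": "https://www.reddit.com" + c["data"]["permalink"],
        }
        for c in children
    ]
```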

Hacker News — Firebase API

HN runs on Firebase and has a completely open, documented API:

# hn_fetch.py — core logic
import requests

HN_API = "https://hacker-news.firebaseio.com/v0"

def get_top_stories(limit: int = 30) -> list[dict]:
    ids = requests.get(f"{HN_API}/topstories.json").json()[:limit]
    stories = []
    for story_id in ids:
        item = requests.get(f"{HN_API}/item/{story_id}.json").json()
        stories.append(item)
    return stories

No rate limits. Completely public. Updated in real-time.
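One caveat: that loop issues one HTTP request per story, serially. If the 30 round-trips get slow, a thread pool is the simplest fix. This variant takes the per-item fetcher as a parameter (my refactor, not the repo's code), which also makes it testable offline:

```python
# Concurrent variant of the per-item fetch loop. pool.map preserves
# the ranking order of ids regardless of which request finishes first.
from concurrent.futures import ThreadPoolExecutor

def get_items(ids: list[int], fetch_item) -> list[dict]:
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(fetch_item, ids))
```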

LinkedIn Jobs — Guest API

LinkedIn has a guest API that powers the job search page before you log in. It's not documented, but it's stable:

# job_scan.py — core logic
import requests

def search_jobs(keywords: str, location: str = "UAE", days: int = 7):
    url = "https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search"
    params = {
        "keywords": keywords,
        "location": location,
        "f_TPR": f"r{days * 86400}",  # time filter in seconds
        "start": 0,
    }
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, params=params, headers=headers)
    return response.text  # returns HTML, parse with BeautifulSoup

YouTube — Transcript Without yt-dlp

YouTube video transcripts (auto-generated captions) are accessible without yt-dlp or any API key:

# youtube_summarize.py — core logic
import requests, re, json

def get_transcript(video_url: str) -> str:
    video_id = extract_video_id(video_url)
    # YouTube embeds captions data in the page HTML
    page = requests.get(f"https://www.youtube.com/watch?v={video_id}").text
    # Extract captionTracks from the page JSON
    caption_url = extract_caption_url(page)
    captions = requests.get(caption_url).text
    return parse_xml_captions(captions)

No yt-dlp. No API key. No quota.

News — Direct RSS Feeds

Instead of web searching (slow, hallucination-prone), news_brief.py fetches directly from RSS:

# news_brief.py — feed map
FEEDS = {
    "world":       ["http://feeds.bbci.co.uk/news/world/rss.xml",
                    "https://www.theguardian.com/world/rss"],
    "middle_east": ["http://feeds.bbci.co.uk/news/world/middle_east/rss.xml",
                    "https://www.aljazeera.com/xml/rss/all.xml"],
    "india":       ["http://feeds.bbci.co.uk/news/world/south_asia/rss.xml"],
}

Result: 6 curated feeds → 24h window → deduplicated → structured JSON in under 5 seconds. The previous web search approach took 60 seconds and sometimes hallucinated sources.
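The deduplication step might look like this — normalising titles before comparing, since the same story arrives from multiple feeds with slightly different casing and punctuation. The normalisation rule is an assumption, not news_brief.py's exact logic:

```python
# Sketch of title-based deduplication across feeds: lowercase,
# strip punctuation, keep the first occurrence of each story.
import re

def normalise(title: str) -> str:
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

def dedupe(items: list[dict]) -> list[dict]:
    seen: set[str] = set()
    out = []
    for item in items:
        key = normalise(item["title"])
        if key not in seen:
            seen.add(key)
            out.append(item)
    return out
```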


Nightly Healer — Deterministic Auto-Repair

At 3:17 AM every night, a script called healer.py scans for common failure patterns and fixes them automatically.

Key design decision: no LLM in the fix path.

# healer.py — pattern matching, not AI
# (snapshot_age_hours, log_size_mb, run, rotate_logs, log_fix and
#  BRAIN_QUEUE are helpers defined elsewhere in the script)
import os

PATTERNS = [
    {
        "name": "stale_brain_snapshot",
        "detect": lambda: snapshot_age_hours() > 25,
        "fix": lambda: run("brain.py snapshot"),
        "severity": "low",
    },
    {
        "name": "queued_brain_writes",
        "detect": lambda: len(os.listdir(BRAIN_QUEUE)) > 0,
        "fix": lambda: run("brain.py flush"),
        "severity": "medium",
    },
    {
        "name": "log_bloat",
        "detect": lambda: log_size_mb() > 50,
        "fix": lambda: rotate_logs(),
        "severity": "low",
    },
]

for pattern in PATTERNS:
    if pattern["detect"]():
        pattern["fix"]()
        log_fix(pattern["name"])

Pattern match → fix → verify. No AI hallucinating creative solutions to mundane problems. If a pattern isn't whitelisted, it escalates to "needs-human" — Manoj sees it in the morning brief and decides.
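That fix-then-verify-then-escalate loop could be sketched like this. The report shape is my guess at what an overnight report might contain, not the actual file format:

```python
# Hypothetical fix/verify/escalate loop. A fix only counts if re-running
# detect() comes back clean; anything else lands in "needs_human" for
# the morning brief.
def run_healer(patterns: list[dict]) -> dict:
    report = {"fixed": [], "needs_human": []}
    for p in patterns:
        if not p["detect"]():
            continue  # nothing wrong, nothing to do
        try:
            p["fix"]()
            verified = not p["detect"]()  # verify: is the problem gone?
        except Exception:
            verified = False
        (report["fixed"] if verified else report["needs_human"]).append(p["name"])
    return report
```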

At 6:30 AM, morning_brief.py reads overnight_report.json and formats it into a WhatsApp message:

☀️ Morning Brief — Apr 13

✅ All systems healthy
🔧 1 fix applied: stale brain snapshot (refreshed)
📊 Cron runs: 14/14 successful

Nothing needs your attention today.

Setup Overview

This is not a one-click install — it's a personal system that you adapt to your needs. Here's the shape of it:

Prerequisites

  • Mac Mini (or any always-on Mac/Linux box)
  • Python 3.11+
  • OpenClaw daemon installed (openclaw.ai)
  • Google account (for Brain Sheet)
  • WhatsApp number for delivery
  • API key: OpenAI or Google AI Studio (Gemini)

Directory Structure

~/.openclaw/
├── README.md                   ← You are here
├── openclaw.json               Main configuration
├── exec-approvals.json         Script execution allowlist
│
├── scripts/                    Operational scripts
│   ├── brain.py                Brain Google Sheet CLI
│   ├── healer.py               Nightly auto-fixer
│   ├── morning_brief.py        Morning brief formatter
│   ├── auto_switch_model.sh    Token-based model switcher
│   ├── reddit_fetch.py         Reddit public API fetcher
│   ├── hn_fetch.py             HN Firebase API fetcher
│   ├── job_scan.py             LinkedIn guest API scanner
│   ├── youtube_summarize.py    YouTube transcript extractor
│   └── news_brief.py           RSS-based news fetcher
│
├── workspace/                  Working directory
│   ├── scripts/                Project-specific scripts
│   ├── expense_tracker/        Expense tracker codebase
│   └── [identity docs]         AGENTS.md, SOUL.md, USER.md
│
├── credentials/                API keys and auth (git-ignored)
├── cron/                       Cron job definitions
└── logs/                       Operational logs

The Three Configuration Files That Matter

openclaw.json — the main config. Sets models, agents, tools, delivery channels, and gateway settings.

exec-approvals.json — the security allowlist. Only scripts explicitly listed here can be executed by the agent without asking. Mine is set to full (my choice — default is restricted).

AGENTS.md — the agent's instruction manual. This is where you write the rules the AI follows. Think of it as a system prompt that persists across sessions.

## Tool Lock — YouTube
⚠️ TOOL LOCK: For any YouTube URL, you MUST call youtube_summarize.py.
Do NOT use web search. Do NOT use web fetch.
If youtube_summarize.py fails, send exactly:
"❌ YouTube summary failed: [error]. Try again later."
Stop. Do not attempt alternatives.

Writing explicit tool locks in AGENTS.md prevents the agent from going off-script when tools fail. Without this, agents invent creative (usually wrong) fallbacks.


Key Design Decisions

1. WhatsApp as the Only Output Channel

Dashboards require you to go check them. WhatsApp pushes to you. Reliability beats features — one message that always arrives is worth more than ten that sometimes do.

2. Brain as External Memory

The AI's context window is a sliding window — it forgets. The Google Sheet is permanent. Put everything that needs to survive a restart in the sheet. Keep the context window for reasoning, not storage.

3. Scripts for Fetching, LLM for Thinking

Every API call that doesn't need intelligence runs in Python with no AI involved. The LLM only touches the output formatting step. This makes costs linear and predictable.

4. Deterministic Healer, Not AI Healer

An AI healer would hallucinate fixes. A pattern-matching healer is boring, auditable, and actually works. Save the AI for the parts that need creativity.

5. Off-Minute Cron Scheduling

Run at :07, :15, :20 — never :00 or :30. When millions of cron jobs fire on the hour, API rate limits spike. Off-minute scheduling costs nothing and measurably improves response times.

6. Single Backup Policy

Keep current + one .bak. Historical versions go to archive/. Backup file proliferation is how directories become unmanageable. One backup is enough to recover from a bad edit.


What This Costs

| Component | Cost |
|-----------|------|
| OpenAI (GPT-5.4) | ~$10–20/mo (only touches reasoning steps) |
| Google AI Studio (Gemini) | ~$3–8/mo (bulk of the fetch/parse work) |
| Google Sheets API | Free (within quota) |
| Reddit, HN, LinkedIn, YouTube | Free (no API keys) |
| RSS feeds | Free |
| Mac Mini electricity | ~$3–5/mo |
| Total | ~$15–30/month |

The model routing is what keeps this number low. Without it, running everything through GPT-5.4 would cost 5–10x more.


What I've Learned

1. The hardest part is not the AI — it's the delivery. Getting a well-formed WhatsApp message to arrive reliably at 7 AM every day, even when something upstream failed, took more engineering than the AI pipelines themselves.

2. External memory is non-negotiable for long-running agents. Without the Brain sheet, the agent would lose state on every restart. With it, it can reconstruct context from the sheet in seconds.

3. Write explicit failure behaviours into your AGENTS.md. When a tool fails, the default AI behaviour is to improvise. Improvisation is usually wrong. Explicit failure rules ("if X fails, send this exact message and stop") make the system predictable.

4. Free APIs are more stable than you think. Reddit's public JSON, HN's Firebase API, YouTube's caption endpoints — these have been stable for years. They're not going away because they serve millions of unauthenticated users. They're safer than OAuth integrations that can expire.

5. Cron scheduling is an underrated skill. Timing, order of operations, idempotency, failure recovery — good cron design is what separates a system that runs for a week from one that runs for a year.


What's Next

  • Fix Morning Brief broken jobs (#27 in Brain)
  • Habit tracker pipeline
  • OpenClaw GitHub Pages — visual architecture showcase
  • Export pipeline configs as reusable templates

For the Reddit Thread

If you're reading this from a Reddit post — here are the specific things I'm happy to go deeper on:

  • How the Brain sheet is structured and why (the tab schema, the CLI design)
  • The full news_brief.py implementation (RSS parsing, deduplication, the feed list)
  • How the job scanner handles pagination and deduplication without auth
  • The full model routing config (the exact tier assignments and why)
  • The AGENTS.md tool lock pattern — how to write instructions that actually stick
  • How to set up launchd on macOS to run a persistent daemon

Drop a comment or open an issue.


License

MIT — use it, adapt it, build on it.


Built incrementally over March–April 2026 through ~30 Claude Code sessions. The system is live and running daily as of Apr 2026.
