feat(cloud): entrypoint.zig — Pure Zig agent entrypoint + Telegram live UX + P0 hardening #325

@gHashTag

Goal

Replace agent-entrypoint.sh (850 lines of bash) with a pure Zig binary.
Add Telegram live UX (in-place message edits, progress bar).
Close 10 P0 vulnerabilities.

Part 1: entrypoint.zig rewrite

Replaces: deploy/agent-entrypoint.sh → src/cloud/entrypoint.zig

Pipeline steps (as Zig functions, not bash):

| Step | Function | Bash equivalent |
|---|---|---|
| 1. Init | init() | lines 1-50: env vars, trace_id |
| 2. Health | healthCheck() | circuit breaker z.ai |
| 3. Fetch | gitFetch() | lines 446-463: git clone --depth=1 |
| 4. Context | buildContext() | lines 300-400: read issue, SOUL.md |
| 5. Code | runClaude() | lines 500-650: claude CLI invoke |
| 6. Build | zigBuild() | lines 680-723: zig build + repair loop 3x |
| 7. Test | zigTest() | lines 682-699: zig build test (parallel) |
| 8. Push | gitPush() | lines 730-780: commit + push |
| 9. PR | createPR() | lines 780-830: gh pr create |
| 10. Report | report() | emit events, Telegram notify |

Structured error handling:

const PipelineError = error{
    ZaiUnreachable,      // circuit breaker tripped
    GitCloneFailed,      // network/auth issue
    ClaudeTimeout,       // claude CLI timeout
    BuildFailed,         // after 3 repair attempts
    TestFailed,          // zig build test failed
    PushRejected,        // git push conflict
    PrCreateFailed,      // gh API error
};

fn runPipeline(config: Config) PipelineError!void {
    const trace = TraceId.generate(config.issue);
    try init(config, trace);
    try healthCheck(config);
    try gitFetch(config);
    const context = try buildContext(config);
    try runClaude(context);
    try zigBuild(config, trace);  // 3x repair loop inside
    try zigTest(config);
    try gitPush(config);
    try createPR(config, context);
    try report(config, trace);
}
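
The "3x repair loop inside" zigBuild could look roughly like the sketch below. It is only an illustration: `FakeBuilder`, `buildWithRepair`, and the reading of "3x" as one initial build plus up to three repair-and-rebuild rounds are assumptions, not part of the planned code. Builtin syntax assumes Zig 0.11 or newer.

```zig
const std = @import("std");

const BuildError = error{BuildFailed};

/// Hypothetical stand-in for the real `zig build` and repair commands.
const FakeBuilder = struct {
    failures_left: u32, // number of builds that fail before one succeeds
    repairs: u32 = 0, // number of repair rounds that were run

    fn build(self: *FakeBuilder) bool {
        if (self.failures_left == 0) return true;
        self.failures_left -= 1;
        return false;
    }

    fn repair(self: *FakeBuilder) void {
        self.repairs += 1;
    }
};

/// One initial build, then up to `max_repairs` repair-and-rebuild rounds.
/// Returns error.BuildFailed if the last rebuild still fails.
fn buildWithRepair(builder: anytype, max_repairs: u32) BuildError!void {
    if (builder.build()) return;
    var round: u32 = 0;
    while (round < max_repairs) : (round += 1) {
        builder.repair();
        if (builder.build()) return;
    }
    return error.BuildFailed;
}

test "build succeeds after two repairs" {
    var b = FakeBuilder{ .failures_left = 2 };
    try buildWithRepair(&b, 3);
    try std.testing.expectEqual(@as(u32, 2), b.repairs);
}
```

Keeping the loop bound as a parameter means the same helper maps directly onto `PipelineError.BuildFailed` once the budget is exhausted.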

Part 2: Telegram Live UX

Instead of spamming separate messages, use one message edited in place:

🤖 Agent #318 — OHEM hard example mining
━━━━━━━━━━━━━━━━━━━━
✅ Init          0.2s
✅ Health check   1.1s
✅ Git fetch      4.3s
✅ Context built  0.5s
🔄 Coding...     47s ▰▰▰▰▰▱▱▱▱▱ 50%
⏳ Build
⏳ Test
⏳ Push
⏳ PR
━━━━━━━━━━━━━━━━━━━━
⏱️ Elapsed: 53s | ETA: ~60s

Implementation:

  • First call: sendMessage() → save message_id
  • Each step: editMessageText(message_id, updated_text)
  • Progress bar: ▰▰▰▰▰▱▱▱▱▱ on the Coding step (driven by the timeout)
  • Final state: ✅ on all steps + link to the PR
  • Error: 🔴 red line + error details
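
A timeout-driven progress bar for the Coding step might be rendered as in this sketch. The name `renderBar`, the fixed 10-cell width, and treating progress as elapsed time divided by the timeout are assumptions made for illustration. Builtin syntax assumes Zig 0.11 or newer.

```zig
const std = @import("std");

const cells: u64 = 10;
// Each glyph is 3 bytes in UTF-8, so a full bar needs 30 bytes.
const bar_bytes = cells * "▰".len;

/// Fills `buf` with a 10-cell bar showing elapsed_s / timeout_s.
/// A zero timeout is rendered as a full bar.
fn renderBar(buf: *[bar_bytes]u8, elapsed_s: u64, timeout_s: u64) []const u8 {
    const filled: u64 = if (timeout_s == 0)
        cells
    else
        @min(cells, elapsed_s * cells / timeout_s);

    var len: usize = 0;
    var i: u64 = 0;
    while (i < cells) : (i += 1) {
        const glyph: []const u8 = if (i < filled) "▰" else "▱";
        @memcpy(buf[len..][0..glyph.len], glyph);
        len += glyph.len;
    }
    return buf[0..len];
}

test "half of the timeout fills half of the bar" {
    var buf: [bar_bytes]u8 = undefined;
    try std.testing.expectEqualStrings("▰▰▰▰▰▱▱▱▱▱", renderBar(&buf, 300, 600));
}
```

Rendering into a caller-owned fixed buffer avoids allocation on every edit, which matters if the message is refreshed every few seconds.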

Telegram API:

const TelegramUX = struct {
    bot_token: []const u8,
    chat_id: []const u8,
    message_id: ?i64 = null,

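    fn send(self: *TelegramUX, text: []const u8) !void {
        // POST /sendMessage, then store the returned id in self.message_id
        _ = self; // placeholders: Zig rejects unused parameters
        _ = text;
    }

    fn update(self: *TelegramUX, text: []const u8) !void {
        // POST /editMessageText with self.message_id
        _ = self;
        _ = text;
    }

    // Step is expected to be an enum of the pipeline steps, defined elsewhere.
    fn stepComplete(self: *TelegramUX, step: Step, duration_ms: u64) !void {
        // Mark the step as done and re-render the progress display
        _ = self;
        _ = step;
        _ = duration_ms;
    }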
};

Part 3: P0 Hardening (10 vulnerabilities)

| # | Vulnerability | Fix |
|---|---|---|
| 1 | No timeout on claude CLI | timeout 600s, kill after 630s |
| 2 | No timeout on git operations | timeout 120s per git command |
| 3 | Unbounded retry loops | Max 3 retries with exponential backoff |
| 4 | No self-review enforcement | After coding: claude -p "Review your changes for bugs" |
| 5 | Secrets in error logs | Sanitize: strip ANTHROPIC_API_KEY, GITHUB_TOKEN from stderr |
| 6 | No disk space check | Check df before start, abort if <500MB |
| 7 | No memory limit | Set process memory limit, OOM → clean exit |
| 8 | Zombie processes | Proper child process cleanup with defer |
| 9 | Race on concurrent git push | Retry push with rebase 3x |
| 10 | No max file size for edits | Reject claude edits >100KB per file |
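
For fix 3, the delay schedule for bounded retries could be computed as in the sketch below. The issue only specifies "max 3 retries with exponential backoff"; the helper name `backoffMs`, the doubling factor, and the 1s base / 30s cap used in the test are assumptions.

```zig
const std = @import("std");

/// Delay before retry number `attempt` (0-based): base_ms * 2^attempt,
/// never exceeding cap_ms. Uses saturating multiply so it cannot overflow.
fn backoffMs(attempt: u32, base_ms: u64, cap_ms: u64) u64 {
    var delay = base_ms;
    var i: u32 = 0;
    while (i < attempt and delay < cap_ms) : (i += 1) {
        delay *|= 2;
    }
    return @min(delay, cap_ms);
}

test "three retries double the delay each time" {
    try std.testing.expectEqual(@as(u64, 1000), backoffMs(0, 1000, 30_000));
    try std.testing.expectEqual(@as(u64, 2000), backoffMs(1, 1000, 30_000));
    try std.testing.expectEqual(@as(u64, 4000), backoffMs(2, 1000, 30_000));
}
```

The caller would still own the retry counter and stop after the third attempt; this helper only answers "how long to wait before the next one".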

Files

NEW:

  • src/cloud/entrypoint.zig (main pipeline)
  • src/cloud/telegram_ux.zig (live message editing)
  • src/cloud/health.zig (circuit breaker, disk/memory checks)
  • src/cloud/git.zig (git operations with timeouts)
  • src/cloud/claude.zig (claude CLI wrapper with timeout + sanitization)

MODIFY:

  • build.zig (add cloud-entrypoint binary)
  • deploy/Dockerfile.agent (ENTRYPOINT → zig-out/bin/cloud-entrypoint)
  • deploy/agent-entrypoint.sh (deprecate, keep as --legacy flag)

Verification

  1. zig build cloud-entrypoint compiles
  2. Dry run: cloud-entrypoint --issue=TEST --dry-run → all steps pass with no side effects
  3. Telegram: a single message, updated in real time
  4. Timeout: claude CLI is killed after the 600s timeout
  5. Secrets: grep ANTHROPIC_API_KEY in the logs → 0 results
  6. Self-review: a review step always runs after coding
  7. Repair loop: break the code → 3 repair attempts → visible in Telegram
  8. Legacy fallback: --legacy runs the old bash entrypoint
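
Check 5 could also be backed by a unit test on the sanitizer itself. The sketch below masks a secret value in place before a stderr line is logged. `maskSecret` is a hypothetical helper, the sample key is made up, and reading the real values from ANTHROPIC_API_KEY / GITHUB_TOKEN is left out. Builtin syntax assumes Zig 0.11 or newer.

```zig
const std = @import("std");

/// Overwrites every occurrence of `secret` in `buf` with '*' characters.
/// Returns the number of occurrences masked. An empty secret is ignored.
fn maskSecret(buf: []u8, secret: []const u8) usize {
    if (secret.len == 0 or secret.len > buf.len) return 0;
    var hits: usize = 0;
    var i: usize = 0;
    while (i + secret.len <= buf.len) {
        if (std.mem.eql(u8, buf[i..][0..secret.len], secret)) {
            @memset(buf[i..][0..secret.len], '*');
            hits += 1;
            i += secret.len;
        } else {
            i += 1;
        }
    }
    return hits;
}

test "secret value never reaches the log line" {
    // "sk-abc123" is a made-up placeholder, not a real key format.
    var line = "auth failed: key=sk-abc123 token=sk-abc123".*;
    const hits = maskSecret(&line, "sk-abc123");
    try std.testing.expectEqual(@as(usize, 2), hits);
    try std.testing.expectEqualStrings("auth failed: key=********* token=*********", &line);
}
```

Masking in place keeps the line length unchanged and needs no allocator, at the cost of revealing the secret's length.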

Migration

  1. Build entrypoint.zig, deploy alongside bash
  2. Test on 3-5 issues with --dry-run
  3. Switch Dockerfile ENTRYPOINT
  4. Monitor via Agent MU (feat(mu): Agent MU watchdog daemon — self-healing swarm monitor #323)
  5. After 20 successful PRs → delete agent-entrypoint.sh

Priority

P0: replaces 850 lines of bash with type-safe Zig.
Telegram UX: a killer feature for monitoring.
P0 hardening: closes real vulnerabilities.


Labels: agent:spawn (Auto-spawn agent container), enhancement (New feature or request)
