parsewave/harbor-bot

Harbor Bot

A GitHub bot that runs Harbor task checks on pull requests, triggered by the /bot check command.

Features

  • Polling-based monitoring of whitelisted repos for /bot check comments on PRs
  • Comprehensive validation: basic checks, AI detection, Oracle/Nop agents, similarity checking, TB Check/Run/Debug
  • PostgreSQL job queue with multi-instance deployment support
  • Formatted results posted directly to pull requests
  • Modular protocol-based check system
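
As a rough illustration of the polling feature, the sketch below shows one way a batch of PR comments could be filtered down to new /bot check requests. The names is_check_command and pending_requests, the comment dict shape, and the rule that the command must sit alone on its own line are all assumptions made for this example, not the bot's actual code.

```python
import re

# Assumed rule: the command must appear alone on a line (case-insensitive).
BOT_CHECK_RE = re.compile(r"^\s*/bot\s+check\s*$", re.MULTILINE | re.IGNORECASE)


def is_check_command(comment_body: str) -> bool:
    """Return True if the comment contains a standalone /bot check line."""
    return bool(BOT_CHECK_RE.search(comment_body))


def pending_requests(comments: list[dict], seen_ids: set[int]) -> list[dict]:
    """Return unseen comments that request a check, then mark the batch as seen.

    Each comment is a hypothetical dict with "id" and "body" keys.
    """
    fresh = [
        c for c in comments
        if c["id"] not in seen_ids and is_check_command(c["body"])
    ]
    seen_ids.update(c["id"] for c in comments)
    return fresh
```

Tracking seen comment IDs keeps a repeated poll from queueing the same request twice; a multi-instance deployment would need that state in the shared database rather than in memory.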

Quick Start

Docker (Recommended)

cp .env.example .env
# Edit .env with your credentials

docker compose up -d
docker compose logs -f bot

Local Development

Prerequisites: Python 3.13+, PostgreSQL

uv sync
cp .env.example .env
# Edit .env with your credentials

uv run harbor-bot
uv run harbor-bot --dry-run   # Don't post to GitHub
uv run harbor-bot --debug     # Enable debug logging

Configuration

Edit config.yaml to configure whitelisted repositories, polling interval, check thresholds, and model settings.

Check Workflow

The bot runs checks in six ordered phases. Blocking checks are fail-fast: once one fails, the remaining checks are skipped.

Phase 1: BASIC

  • Required Files - validates task.toml, instruction.md, Dockerfile, solve.sh, test.sh, test_outputs.py
  • Test Script Format - validates test.sh structure
  • Instruction Length (warning) - checks instruction.md is not too short
  • Solution Length (warning) - checks solve.sh is not too short
  • Tests Length (warning) - checks test.sh is not too short
  • Task Size (warning) - checks total task size

Phase 2: AI_DETECTION

  • AI Detection - uses GPTZero API to detect AI-generated content (threshold: 70%)

Phase 3: SANITY

  • Oracle Agent - must achieve 100% accuracy (proves task is solvable)
  • Nop Agent - must achieve 0% accuracy (proves tests aren't trivially passing)
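
The two sanity conditions above can be sketched as a single predicate. This is a minimal illustration that assumes accuracies are reported as fractions between 0.0 and 1.0; the function name sanity_passed and the message texts are hypothetical.

```python
def sanity_passed(oracle_accuracy: float, nop_accuracy: float) -> tuple[bool, str]:
    """Apply both sanity rules to accuracies given as fractions in [0.0, 1.0]."""
    # The reference solution must solve the task completely.
    if oracle_accuracy < 1.0:
        return False, f"Oracle agent reached {oracle_accuracy:.0%}, expected 100%"
    # An agent that does nothing must not pass any test.
    if nop_accuracy > 0.0:
        return False, f"Nop agent reached {nop_accuracy:.0%}, expected 0%"
    return True, "Oracle and Nop agents behave as expected"
```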

Phase 4: SIMILARITY

  • Similarity Check - TF-IDF + cosine similarity against existing TB2 tasks (threshold: 80%)
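
To make the similarity check concrete, here is a small standard-library sketch of TF-IDF weighting followed by cosine similarity. It is not the bot's implementation: the tokenizer, the smoothed IDF formula, the function names, and the choice to treat a score at or above the threshold as a failure are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (assumed variant) keeps weights positive for common terms.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(new_task: str, existing: dict[str, str], threshold: float = 0.80):
    """Return (closest task name, score, passed) for a new task description."""
    names = list(existing)
    if not names:
        return None, 0.0, True
    vectors = tfidf_vectors([new_task] + [existing[n] for n in names])
    scores = [(n, cosine(vectors[0], v)) for n, v in zip(names, vectors[1:])]
    name, score = max(scores, key=lambda pair: pair[1])
    # Assumption: a score at or above the threshold counts as too similar.
    return name, score, score < threshold
```

Fitting the IDF weights over the new task together with the existing corpus keeps both sides in one vocabulary, which is what makes the cosine scores comparable.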

Phase 5: VALIDATION

  • TB Check - runs harbor tasks check to validate task structure
  • TB Run Small - informational run with smaller model
  • TB Run Large - main solvability check with larger model

Phase 6: DEBUG

  • TB Debug - runs harbor tasks debug to analyze failures (only if there are failed trials)
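
The phased, fail-fast workflow above can be sketched as a small runner. The Phase, Check, and run_phases names are hypothetical stand-ins (the project's own types are CheckPhase and BaseCheck), and the sketch assumes that a failed blocking check lets the current phase finish before later phases are skipped. It does not model the DEBUG phase's "only if there are failed trials" condition.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class Phase(IntEnum):
    """Phases in execution order, mirroring the list above."""
    BASIC = 1
    AI_DETECTION = 2
    SANITY = 3
    SIMILARITY = 4
    VALIDATION = 5
    DEBUG = 6


@dataclass
class Check:
    name: str
    phase: Phase
    run: Callable[[], bool]   # returns True when the check passes
    blocking: bool = True     # warning-style checks would set this to False


def run_phases(checks: list[Check]) -> list[tuple[str, bool]]:
    """Run checks phase by phase; stop after a phase with a blocking failure."""
    results: list[tuple[str, bool]] = []
    for phase in Phase:
        blocked = False
        for check in (c for c in checks if c.phase == phase):
            passed = check.run()
            results.append((check.name, passed))
            if not passed and check.blocking:
                blocked = True
        if blocked:
            break  # fail fast: later phases never start
    return results
```

Stopping early avoids spending model and API budget on a task that already failed a cheaper check.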

Environment Variables

# Required
GITHUB_TOKEN=<personal_access_token>
OPENROUTER_API_KEY=<openrouter_api_key>

# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=github_bot
DB_USER=postgres
DB_PASSWORD=<password>

# Optional
GPTZERO_API_KEY=<gptzero_api_key>
DISCORD_WEBHOOK_URL=<webhook_url>
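
For illustration, the variables above could be read and validated at startup along the following lines. Settings and load_settings are hypothetical names, and treating the listed database values as defaults (with an empty default password) is an assumption rather than documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    github_token: str
    openrouter_api_key: str
    db_host: str
    db_port: int
    db_name: str
    db_user: str
    db_password: str
    gptzero_api_key: Optional[str]
    discord_webhook_url: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, failing early if required ones are missing."""
    env = os.environ if env is None else env
    missing = [k for k in ("GITHUB_TOKEN", "OPENROUTER_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
    return Settings(
        github_token=env["GITHUB_TOKEN"],
        openrouter_api_key=env["OPENROUTER_API_KEY"],
        # Assumed defaults, copied from the example values listed above.
        db_host=env.get("DB_HOST", "localhost"),
        db_port=int(env.get("DB_PORT", "5432")),
        db_name=env.get("DB_NAME", "github_bot"),
        db_user=env.get("DB_USER", "postgres"),
        db_password=env.get("DB_PASSWORD", ""),
        # Optional integrations stay None when unset or empty.
        gptzero_api_key=env.get("GPTZERO_API_KEY") or None,
        discord_webhook_url=env.get("DISCORD_WEBHOOK_URL") or None,
    )
```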

Development

uv run pytest tests/ -v       # Run tests
uvx ruff check harbor_bot/    # Lint
uvx ruff format harbor_bot/   # Format

Adding a New Check

  1. Create a new file in the appropriate checks/ subdirectory
  2. Inherit from BaseCheck or HarborCheck
  3. Implement the name and phase properties and the async run() method
  4. Register in checks/registry.py

from harbor_bot.checks.base import BaseCheck
from harbor_bot.checks.protocol import CheckContext, CheckPhase
from harbor_bot.models import CheckResult


class MyNewCheck(BaseCheck):
    @property
    def name(self) -> str:
        return "My New Check"

    @property
    def phase(self) -> CheckPhase:
        return CheckPhase.BASIC

    async def run(self, ctx: CheckContext) -> CheckResult:
        # Inspect ctx here, then report the outcome through the BaseCheck helper.
        return self._make_result(passed=True, message="Check passed!")
