A GitHub bot that validates Harbor task pull requests on demand via the `/bot check` command.
- Polling-based monitoring of whitelisted repos for `/bot check` comments on PRs
- Comprehensive validation: basic checks, AI detection, Oracle/Nop agents, similarity checking, TB Check/Run/Debug
- PostgreSQL job queue with multi-instance deployment support
- Formatted results posted directly to pull requests
- Modular protocol-based check system
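This README does not document how the bot polls GitHub. The following is a minimal sketch of the filtering step only. It assumes PR comments have already been fetched into a simple record type, and it keeps `/bot check` requests from whitelisted repos that are newer than the last poll, once per PR. `PRComment`, `find_check_requests`, and the rule that the command must start a line are all hypothetical, not the bot's actual code.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical rule: the command must start a line; trailing text is allowed
CHECK_COMMAND = re.compile(r"^\s*/bot\s+check\b", re.MULTILINE)


@dataclass(frozen=True)
class PRComment:
    repo: str  # "owner/name"
    pr_number: int
    body: str
    created_at: datetime


def find_check_requests(
    comments: list[PRComment], whitelist: set[str], since: datetime
) -> list[tuple[str, int]]:
    """Return (repo, pr_number) pairs with a new /bot check comment, in first-seen order."""
    requests: list[tuple[str, int]] = []
    for c in comments:
        if c.repo not in whitelist or c.created_at <= since:
            continue  # ignore non-whitelisted repos and already-processed comments
        if not CHECK_COMMAND.search(c.body):
            continue
        key = (c.repo, c.pr_number)
        if key not in requests:  # several commands on one PR yield one request
            requests.append(key)
    return requests


# Example: duplicates, other repos, and look-alike commands are dropped
since = datetime(2025, 1, 1, tzinfo=timezone.utc)
later = datetime(2025, 1, 2, tzinfo=timezone.utc)
comments = [
    PRComment("org/tasks", 7, "/bot check", later),
    PRComment("org/tasks", 7, "LGTM\n/bot check", later),
    PRComment("org/other", 3, "/bot check", later),
    PRComment("org/tasks", 8, "/bot checkout", later),
]
print(find_check_requests(comments, {"org/tasks"}, since))  # [('org/tasks', 7)]
```

Each returned pair would become one job in the queue. Tracking the last poll time is what keeps a restarted or second instance from re-queuing old commands.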
Run with Docker Compose:

```bash
cp .env.example .env
# Edit .env with your credentials
docker compose up -d
docker compose logs -f bot
```

Prerequisites for running locally: Python 3.13+, PostgreSQL
```bash
uv sync
cp .env.example .env
# Edit .env with your credentials
uv run harbor-bot
uv run harbor-bot --dry-run   # Don't post to GitHub
uv run harbor-bot --debug     # Enable debug logging
```

Edit `config.yaml` to configure whitelisted repositories, the polling interval, check thresholds, and model settings.
The bot runs checks in phases, with fail-fast behavior for blocking checks.
- Required Files - validates task.toml, instruction.md, Dockerfile, solve.sh, test.sh, test_outputs.py
- Test Script Format - validates test.sh structure
- Instruction Length (warning) - checks instruction.md is not too short
- Solution Length (warning) - checks solve.sh is not too short
- Tests Length (warning) - checks test.sh is not too short
- Task Size (warning) - checks total task size
- AI Detection - uses the GPTZero API to detect AI-generated content (threshold: 70%)
- Oracle Agent - must achieve 100% accuracy (proves task is solvable)
- Nop Agent - must achieve 0% accuracy (proves tests aren't trivially passing)
- Similarity Check - TF-IDF + cosine similarity against existing TB2 tasks (threshold: 80%)
- TB Check - runs `harbor tasks check` to validate task structure
- TB Run Small - informational run with a smaller model
- TB Run Large - main solvability check with a larger model
- TB Debug - runs `harbor tasks debug` to analyze failures (only if there are failed trials)
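The similarity check is described only as TF-IDF plus cosine similarity with an 80% threshold. Below is a minimal, dependency-free sketch of that technique. It is not the bot's implementation: the tokenizer, the smoothed IDF formula, and the names `tfidf_vectors` and `similarity_check` are assumptions. The real check may use a library such as scikit-learn and may compare different fields of each task. In the sketch, a candidate task's text is scored against existing TB2 task texts.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a deliberately simple tokenizer
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: how many documents contain each term
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, the same form scikit-learn uses by default
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    # Raw term count weighted by IDF
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_check(candidate: str, existing: list[str], threshold: float = 0.8):
    """Return (passed, best_score, best_index); fails if the best score >= threshold."""
    if not existing:
        return True, 0.0, None
    # Fit IDF on the candidate plus the existing corpus
    vectors = tfidf_vectors([candidate, *existing])
    scores = [cosine(vectors[0], v) for v in vectors[1:]]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    best_score = scores[best_index]
    return best_score < threshold, best_score, best_index
```

Only the single most similar existing task matters for pass/fail. Reporting its index makes it easy to point the PR author at the near-duplicate.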
Environment variables (`.env`):

```
# Required
GITHUB_TOKEN=<personal_access_token>
OPENROUTER_API_KEY=<openrouter_api_key>

# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=github_bot
DB_USER=postgres
DB_PASSWORD=<password>

# Optional
GPTZERO_API_KEY=<gptzero_api_key>
DISCORD_WEBHOOK_URL=<webhook_url>
```

Development:

```bash
uv run pytest tests/ -v       # Run tests
uvx ruff check harbor_bot/    # Lint
uvx ruff format harbor_bot/   # Format
```

To add a new check:

- Create a new file in the appropriate `checks/` subdirectory
- Inherit from `BaseCheck` or `HarborCheck`
- Implement the `name` and `phase` properties and the `run()` method
- Register the check in `checks/registry.py`
```python
from harbor_bot.checks.base import BaseCheck
from harbor_bot.checks.protocol import CheckContext, CheckPhase
from harbor_bot.models import CheckResult


class MyNewCheck(BaseCheck):
    @property
    def name(self) -> str:
        # Human-readable name of the check
        return "My New Check"

    @property
    def phase(self) -> CheckPhase:
        # Phase in which this check runs
        return CheckPhase.BASIC

    async def run(self, ctx: CheckContext) -> CheckResult:
        # Inspect the task via ctx here; this example always passes
        return self._make_result(passed=True, message="Check passed!")
```
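Checks run in phases, with fail-fast behavior for blocking checks, and the `phase` property above is what assigns a check to a phase. The scheduler itself is not shown in this README. The following is therefore only a self-contained sketch of the idea, using hypothetical stand-in types (`Phase`, `Result`, `Check`, `run_phases`, `fixed_check`) rather than the real `BaseCheck`/`CheckPhase` API. Phases run in order. A failed blocking check stops all later phases, while a failed warning does not. Running the checks within one phase concurrently with `asyncio.gather` is a choice made for this sketch, not something the README specifies.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from enum import IntEnum


class Phase(IntEnum):
    # Hypothetical phase ordering, loosely following the check list above
    BASIC = 1
    AI_DETECTION = 2
    AGENTS = 3
    SIMILARITY = 4
    TB = 5


@dataclass
class Result:
    name: str
    passed: bool
    blocking: bool = True  # False for warning-level checks


@dataclass
class Check:
    name: str
    phase: Phase
    run: Callable[[], Awaitable[Result]]


async def run_phases(checks: list[Check]) -> list[Result]:
    results: list[Result] = []
    for phase in sorted({c.phase for c in checks}):
        # Run every check in the current phase concurrently
        batch = await asyncio.gather(*(c.run() for c in checks if c.phase == phase))
        results.extend(batch)
        # Fail fast: a failed blocking check skips all later phases
        if any(r.blocking and not r.passed for r in batch):
            break
    return results


def fixed_check(name: str, phase: Phase, passed: bool, blocking: bool = True) -> Check:
    # Builds a check with a predetermined outcome, for demonstration only
    async def run() -> Result:
        return Result(name, passed, blocking)

    return Check(name, phase, run)


checks = [
    fixed_check("Required Files", Phase.BASIC, True),
    fixed_check("Instruction Length", Phase.BASIC, False, blocking=False),
    fixed_check("Oracle Agent", Phase.AGENTS, False),
    fixed_check("Similarity Check", Phase.SIMILARITY, True),
]
print([r.name for r in asyncio.run(run_phases(checks))])
# ['Required Files', 'Instruction Length', 'Oracle Agent']
```

The failing warning in the basic phase is reported but does not stop the run. The failing Oracle Agent check is blocking, so the similarity phase never starts.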