15 changes: 8 additions & 7 deletions CONTRIBUTING.md
@@ -79,7 +79,7 @@ source "$SCRIPT_DIR/git_workflow.sh"
git_clone_and_branch

# 3. Call the AI CLI
myprovider-cli run "Fix issue #${ISSUE_NUMBER}: ${GITHUB_ISSUE_TITLE:-no title}. ..."
myprovider-cli run "Fix issue #${ISSUE_NUMBER}: ${SOURCE_ISSUE_TITLE:-no title}. ..."

# 4. Push & PR (shared logic)
git_push_and_pr "Automated PR created by MyProvider for issue #${ISSUE_NUMBER}."
@@ -96,12 +96,12 @@ set -euo pipefail
echo "=== worker start ==="
echo "TIME: $(date -u --iso-8601=seconds)"
echo "AI_PROVIDER=${AI_PROVIDER:-myprovider}"
echo "GITHUB_REPO=${GITHUB_REPO:-}"
echo "GITHUB_ISSUE_NUMBER=${GITHUB_ISSUE_NUMBER:-}"
echo "GITHUB_INSTALLATION_ID=${GITHUB_INSTALLATION_ID:-}"
echo "SOURCE_REPO=${SOURCE_REPO:-}"
echo "SOURCE_ISSUE_NUMBER=${SOURCE_ISSUE_NUMBER:-}"
echo "SOURCE_INSTALLATION_ID=${SOURCE_INSTALLATION_ID:-}"
if [[ "${DEBUG_ENV:-0}" == "1" ]]; then
echo "---- env (whitelist) ----"
printenv | grep -E '^(AI_PROVIDER|GITHUB_REPO|GITHUB_ISSUE_NUMBER|GITHUB_INSTALLATION_ID|NAMESPACE|JOB_IMAGE|HOME|PATH)=' || true
printenv | grep -E '^(AI_PROVIDER|SOURCE_REPO|SOURCE_ISSUE_NUMBER|SOURCE_INSTALLATION_ID|NAMESPACE|JOB_IMAGE|HOME|PATH)=' || true
echo "---- end env ----"
fi

@@ -133,8 +133,9 @@ RUN curl -fsSL https://example.com/install.sh | bash
WORKDIR /app
COPY --chown=worker:worker images/worker-myprovider/run.sh /app/run.sh
COPY --chown=worker:worker providers/ /app/providers/
RUN sed -i 's/\r$//' /app/run.sh /app/providers/*.sh \
&& chmod +x /app/run.sh /app/providers/*.sh
COPY --chown=worker:worker prompt/ /app/prompt/
RUN sed -i 's/\r$//' /app/run.sh /app/providers/*.sh /app/prompt/*.sh \
&& chmod +x /app/run.sh /app/providers/*.sh /app/prompt/*.sh

ENV PATH="/home/worker/.local/bin:${PATH}"
WORKDIR /work
103 changes: 59 additions & 44 deletions README.md
@@ -6,20 +6,20 @@ Kubernetes orchestrator that turns GitHub issues into pull requests using AI age

This project automates the **Issue -> Label -> Pull Request** flow: an `ai-pr-*` label on an issue triggers an AI worker that clones the repo, solves the problem, and opens a PR.

It avoids AI vendor lock-in with 3 built-in worker providers:
It avoids AI vendor lock-in with 3 built-in worker providers:

| Label | Provider | Backend |
| -------------- | ----------- | ----------------------- |
| `ai-pr-claude` | Claude Code | Anthropic |
| `ai-pr-codex` | Codex | OpenAI |
| `ai-pr-aider` | Aider | OpenRouter (extensible) |

The source hosting layer is abstracted behind `SourceProvider`; GitHub is the
only built-in source provider today. See
`docs/adr/0001-source-provider-abstraction.md` for the design decision.

The worker architecture is designed to easily add more AI providers (see
`CONTRIBUTING.md`).
The source hosting layer is abstracted behind `SourceProvider`; GitHub is the
only built-in source provider today. See
`docs/adr/0001-source-provider-abstraction.md` for the design decision.
The worker architecture is designed to easily add more AI providers (see
`CONTRIBUTING.md`).

Tested on: VPS / 8 GB RAM / 4 vCPU / k3s single-node.

@@ -38,12 +38,12 @@ GitHub Issue (label ai-pr-*)
POST /webhook/github
|
v
+-------------------+
| Orchestrator | Deployment FastAPI
| app/app.py |
| providers/source | GitHub webhook + clone credentials
+--------+----------+
| creates a K8s Job based on the AI worker provider
+-------------------+
| Orchestrator | Deployment FastAPI
| app/app.py |
| providers/source | GitHub webhook + clone credentials
+--------+----------+
| creates a K8s Job based on the AI worker provider
v
+----------------+ +----------------+ +----------------+
| worker-claude | | worker-codex | | worker-aider |
@@ -54,9 +54,16 @@ GitHub Issue (label ai-pr-*)
clone > AI fix > commit > push > PR
```

**Source auth flow**: `GitHubProvider` generates an ephemeral installation token
(1h) via GitHub App JWT and returns git clone credentials to the orchestrator.
Workers receive only the short-lived token and never receive the PEM key.
**Source auth flow**: `GitHubProvider` generates an ephemeral installation token
(1h) via GitHub App JWT and returns git clone credentials to the orchestrator.
Workers receive only the short-lived token and never receive the PEM key.

**Job environment (metadata)**: the orchestrator injects source-agnostic
variables into each worker Job: `SOURCE_REPO`, `SOURCE_ISSUE_NUMBER`,
`SOURCE_ISSUE_TITLE`, `SOURCE_ISSUE_BODY` (the issue description, capped at
65,536 characters by the orchestrator), `SOURCE_ISSUE_URL`,
`SOURCE_EVENT_ACTION`, and `SOURCE_INSTALLATION_ID`.
See `docs/adr/0001-source-provider-abstraction.md`. The clone credential secret
key remains `GITHUB_TOKEN` (installation token).
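
The cap on `SOURCE_ISSUE_BODY` is applied in `app/app.py` by slicing a Python string, so it counts characters rather than bytes; a body with multi-byte characters can exceed 65,536 bytes once encoded. The following is a minimal sketch of both behaviors. `bound_chars` mirrors the slicing shown in the diff, while `bound_utf8_bytes` is a hypothetical variant that is not part of this project.

```python
MAX_ISSUE_BODY_CHARS = 65536  # same value as _MAX_ISSUE_BODY_CHARS in app/app.py


def bound_chars(body: object, limit: int = MAX_ISSUE_BODY_CHARS) -> str:
    """Cap by character count; non-string input (e.g. None) becomes ""."""
    text = body if isinstance(body, str) else ""
    return text[:limit]


def bound_utf8_bytes(body: object, limit: int = 64 * 1024) -> str:
    """Hypothetical variant: cap by UTF-8 encoded size instead of characters."""
    text = body if isinstance(body, str) else ""
    clipped = text.encode("utf-8")[:limit]
    # The cut may land inside a multi-byte character; drop the partial tail.
    return clipped.decode("utf-8", errors="ignore")
```

Which bound matters depends on the limit being protected: a character cap is simpler to reason about in prompts, while a byte cap tracks the size of the value stored in the Job spec more closely.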

---

@@ -65,7 +72,7 @@ Workers receive only the short-lived token and never receive the PEM key.
### 1. Prerequisites

- A VPS (or machine) with 4 vCPU / 8 GB RAM minimum
- API keys for your desired AI worker providers
- API keys for your desired AI worker providers
- **Ansible option**: `ansible` installed locally + SSH root access to the VPS
- **Manual option**: k3s, Docker, and `kubectl` installed on the VPS

@@ -230,6 +237,10 @@ kubectl -n ai-bot delete job debug-<provider> --ignore-not-found && kubectl -n a

### Manual Jobs (ai-issue)

> Manual jobs in `k8s/ai-issue-*.yaml` must set `SOURCE_REPO`,
> `SOURCE_ISSUE_NUMBER`, and `SOURCE_INSTALLATION_ID` (not the old `GITHUB_*`
> names).

```shell
# Run / logs / rerun (replace <provider>)
kubectl -n ai-bot apply -f k8s/ai-issue-<provider>.yaml
@@ -259,24 +270,24 @@ curl -s -X POST http://127.0.0.1:8080/jobs/run -H "Authorization: Bearer <ADMIN_

| Surface | Risk | Mitigation |
| --- | --- | --- |
| **Incoming webhook** | Fake webhook to trigger a job | HMAC-SHA256 signature (`WEBHOOK_SECRET`) verified on every request |
| **Admin endpoints** | Unauthorized access | Bearer token (`ADMIN_TOKEN`), not exposed via Ingress |
| **Source-provider private key** | Theft = source repo access | Secret stays in orchestrator pod only; workers receive short-lived clone credentials |
| **GitHub token (workers)** | Compromised worker | Token stored in ephemeral K8s Secret (ownerReference to Job), scoped to one installation, expires in 1h, ephemeral container |
| **AI API keys** | Leak | Injected via K8s `secretKeyRef`, one secret per AI worker provider |
| **AI code execution** | Malicious code | Workers run as non-root, ephemeral, no persistent volume |
| **Git credentials** | Token in logs | Auth via `GIT_ASKPASS`, no token in URLs |
| **K8s RBAC** | Out-of-scope access | Role limited to `ai-bot` namespace, workers without ServiceAccount |
| **Incoming webhook** | Fake webhook to trigger a job | HMAC-SHA256 signature (`WEBHOOK_SECRET`) verified on every request |
| **Admin endpoints** | Unauthorized access | Bearer token (`ADMIN_TOKEN`), not exposed via Ingress |
| **Source-provider private key** | Theft = source repo access | Secret stays in orchestrator pod only; workers receive short-lived clone credentials |
| **GitHub token (workers)** | Compromised worker | Token stored in ephemeral K8s Secret (ownerReference to Job), scoped to one installation, expires in 1h, ephemeral container |
| **AI API keys** | Leak | Injected via K8s `secretKeyRef`, one secret per AI worker provider |
| **AI / LLM input** | Issue title & body influence model behavior | Content is user-controlled; bounded length in Job env; treat as untrusted input (prompt injection). See `SECURITY.md` |
| **Git credentials** | Token in logs | Auth via `GIT_ASKPASS`, no token in URLs |
| **K8s RBAC** | Out-of-scope access | Role limited to `ai-bot` namespace, workers without ServiceAccount |
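
As an illustration of the webhook row in the table above, here is a minimal sketch of HMAC-SHA256 verification using only the standard library. GitHub sends the signature in the `X-Hub-Signature-256` header as `sha256=<hex digest>`; the function name `verify_webhook_signature` is an assumption for this example and is not taken from `app/app.py`.

```python
import hashlib
import hmac
from typing import Optional


def verify_webhook_signature(
    secret: str, raw_body: bytes, signature_header: Optional[str]
) -> bool:
    """Return True only if the header matches the HMAC-SHA256 of the raw body."""
    prefix = "sha256="
    if not signature_header or not signature_header.startswith(prefix):
        return False
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = signature_header[len(prefix):]
    # Compare as bytes in constant time to avoid leaking timing information.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The digest has to be computed over the raw request bytes, before any JSON parsing, because re-serializing the payload can change the byte sequence.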

### Production Recommendations

- Use a secrets operator (Sealed Secrets, External Secrets)
- Restrict RBAC access to Secrets and Jobs
- Monitor jobs > 30 min (token expires at 1h)
- Regularly rotate `WEBHOOK_SECRET` and `ADMIN_TOKEN`
- Review any new `SourceProvider` for webhook verification, credential scope,
and logging behavior
- See `SECURITY.md` for vulnerability reporting and provider security rules
- Restrict RBAC access to Secrets and Jobs
- Monitor jobs > 30 min (token expires at 1h)
- Regularly rotate `WEBHOOK_SECRET` and `ADMIN_TOKEN`
- Review any new `SourceProvider` for webhook verification, credential scope,
and logging behavior
- See `SECURITY.md` for vulnerability reporting and provider security rules

---

@@ -288,6 +299,7 @@ curl -s -X POST http://127.0.0.1:8080/jobs/run -H "Authorization: Bearer <ADMIN_
| `CrashLoopBackOff` | `kubectl logs pod/<pod> --previous` |
| `Not logged in` | Missing API secret (depends on provider) |
| `Pods Pending` | `kubectl describe pod <pod>` |
| `Job missing SOURCE_REPO` / clone fails | Orchestrator and worker images are out of sync, or a manual YAML still uses `GITHUB_REPO` / `GITHUB_ISSUE_*`; switch to the `SOURCE_*` variables |
| Job 409 conflict | Job already exists, `kubectl delete job <name>` |

```shell
@@ -302,10 +314,10 @@ sudo systemctl status k3s --no-pager -l

```text
.
|-- app/
| |-- app.py # FastAPI Orchestrator
| |-- config.py # Runtime env/config
| `-- requirements.txt
|-- app/
| |-- app.py # FastAPI Orchestrator
| |-- config.py # Runtime env/config
| `-- requirements.txt
|-- images/
| |-- orchestrator/Dockerfile
| |-- worker-claude/ # Dockerfile + run.sh
@@ -318,22 +330,25 @@ sudo systemctl status k3s --no-pager -l
| |-- ai-issue-*.yaml # Manual jobs per provider
| |-- debug-*.yaml # Debug jobs per provider
| `-- secrets/ # Templates (no values)
|-- providers/
| |-- source/ # SourceProvider interface + GitHub implementation
| |-- git_workflow.sh # Shared Git logic
| |-- claude_code.sh
| |-- openai.sh
| `-- aider.sh
|-- prompt/
| |-- issue_prompt.sh # Optional SOURCE_ISSUE_BODY appendix
| `-- issue_start_prompt.sh # Shared task instructions (all workers)
|-- providers/
| |-- source/ # SourceProvider interface + GitHub implementation
| |-- git_workflow.sh # Shared Git logic
| |-- claude_code.sh
| |-- openai.sh
| `-- aider.sh
|-- ansible/
| |-- playbook.yml # Full VPS deployment
| |-- inventory.ini
| |-- inventory-local.ini
| |-- inventory-prod.ini # gitignored
| |-- requirements.yml # Ansible collections
| `-- group_vars/vps.yml
|-- docs/
| |-- adr/ # Architecture decision records
| `-- workspace.dsl # C4 architecture (Structurizr)
|-- docs/
| |-- adr/ # Architecture decision records
| `-- workspace.dsl # C4 architecture (Structurizr)
|-- .github/
| `-- workflows/secret-scan.yml # CI secret scanning
|-- CONTRIBUTING.md
1 change: 1 addition & 0 deletions SECURITY.md
@@ -25,6 +25,7 @@ If your report contains secrets, rotate them immediately after sharing.
- Never commit real secret values to git history.
- Kubernetes secret manifests under `k8s/secrets/` are templates only.
- Webhook fixture files under `tests/` must be anonymized and must not contain real repository names, users, tokens, signatures, private issue content, or internal URLs.
- **Worker Jobs** receive issue metadata in environment variables (`SOURCE_ISSUE_TITLE`, `SOURCE_ISSUE_BODY`, etc.). Issue bodies are **untrusted user content**; they are included in AI prompts after an orchestrator-side size cap. Operators should assume **prompt-injection** risk from issue text (same class of risk as pasting issue content into any LLM). Do not log full `SOURCE_ISSUE_BODY` in production.
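
One way to follow the logging guidance in the last bullet is to log a fingerprint of the issue body instead of its text. The sketch below is an assumption about how that could look; `summarize_issue_body` is a hypothetical helper, not something defined in this repository.

```python
import hashlib


def summarize_issue_body(body: str) -> str:
    """Describe the body by length and a short SHA-256 prefix, never its content."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()[:12]
    return f"len={len(body)} sha256={digest}"
```

The length and hash prefix are enough to correlate a Job with a specific issue revision during debugging without copying user-controlled text into log storage.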

## Source Provider Security

19 changes: 13 additions & 6 deletions app/app.py
@@ -29,6 +29,9 @@
get_provider,
)

# Issue body in a pod env var must stay bounded (etcd / API limits, huge GitHub bodies).
_MAX_ISSUE_BODY_CHARS = 65536

# --- Logging setup ---
# Use uvicorn's logger so messages aren't disabled by uvicorn's dictConfig
logger = logging.getLogger("uvicorn.error")
@@ -322,6 +325,8 @@ async def github_webhook(request: Request):
issue = payload.get("issue") or {}
issue_number = issue.get("number")
issue_title = issue.get("title", "")[:200]
raw_issue_body = issue.get("body")
issue_body = (raw_issue_body if isinstance(raw_issue_body, str) else "")[:_MAX_ISSUE_BODY_CHARS]
issue_url = issue.get("html_url", "")
installation_id = (payload.get("installation") or {}).get("id")

@@ -356,12 +361,14 @@ async def github_webhook(request: Request):
cfg=cfg,
provider=provider,
env_vars={
"GITHUB_REPO": repo_full,
"GITHUB_ISSUE_NUMBER": str(issue_number),
"GITHUB_EVENT_ACTION": str(action),
"GITHUB_ISSUE_TITLE": issue_title,
"GITHUB_ISSUE_URL": issue_url,
"GITHUB_INSTALLATION_ID": str(installation_id),
# Source-provider-agnostic metadata (see docs/adr/0001-source-provider-abstraction.md).
"SOURCE_REPO": repo_full,
"SOURCE_ISSUE_NUMBER": str(issue_number),
"SOURCE_EVENT_ACTION": str(action),
"SOURCE_ISSUE_TITLE": issue_title,
"SOURCE_ISSUE_BODY": issue_body,
"SOURCE_ISSUE_URL": issue_url,
"SOURCE_INSTALLATION_ID": str(installation_id),
},
github_token_secret_name=token_secret_name,
)
2 changes: 1 addition & 1 deletion app/requirements-dev.txt
@@ -1,5 +1,5 @@
-r requirements.txt
pytest>=8.0,<9.0
pytest>=8.0
pytest-asyncio>=0.23,<1.0
respx>=0.21,<1.0
ruff>=0.8,<1.0
10 changes: 7 additions & 3 deletions docs/adr/0001-source-provider-abstraction.md
@@ -48,7 +48,6 @@ For GitHub, this means:
## Non-Goals

- No GitLab, Gitea, Forgejo, or Linear implementation in this change.
- No worker environment variable migration yet.
- No label state machine changes.
- No change to the existing `ai-pr-*` trigger behavior.
- No replacement of Kubernetes job orchestration.
@@ -70,8 +69,13 @@ Tradeoffs:
- Some source platforms cannot always provide short-lived repo-scoped clone
credentials. Their implementations must document the best available security
model.
- The workers still receive GitHub-shaped environment variables in this PR.
Generic `SOURCE_*` variables should be handled in a later migration.

Worker jobs receive **source-agnostic metadata** as `SOURCE_*` environment
variables (for example `SOURCE_REPO`, `SOURCE_ISSUE_NUMBER`, `SOURCE_ISSUE_BODY`,
`SOURCE_ISSUE_URL`, `SOURCE_INSTALLATION_ID`, `SOURCE_EVENT_ACTION`) populated
from the active `SourceProvider`. The HTTPS clone credential still uses the
secret key **`GITHUB_TOKEN`** today (GitHub App installation token). Renaming
that credential for non-GitHub hosts is a separate change.
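
To make the consumer side of this contract concrete, here is a minimal sketch of reading the `SOURCE_*` variables and failing fast when a required one is absent. The workers in this repository are shell scripts, so this Python version is purely illustrative; `WorkerMetadata` and `load_metadata` are hypothetical names, and the required set is assumed from the three variables the README asks manual Jobs to define.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumed required set: the variables manual Jobs must define per the README.
REQUIRED = ("SOURCE_REPO", "SOURCE_ISSUE_NUMBER", "SOURCE_INSTALLATION_ID")


@dataclass(frozen=True)
class WorkerMetadata:
    repo: str
    issue_number: int
    installation_id: str
    issue_title: str = ""
    issue_body: str = ""
    issue_url: str = ""
    event_action: str = ""


def load_metadata(env: Mapping[str, str] = os.environ) -> WorkerMetadata:
    """Build metadata from SOURCE_* variables, raising if a required one is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return WorkerMetadata(
        repo=env["SOURCE_REPO"],
        issue_number=int(env["SOURCE_ISSUE_NUMBER"]),
        installation_id=env["SOURCE_INSTALLATION_ID"],
        issue_title=env.get("SOURCE_ISSUE_TITLE", ""),
        issue_body=env.get("SOURCE_ISSUE_BODY", ""),
        issue_url=env.get("SOURCE_ISSUE_URL", ""),
        event_action=env.get("SOURCE_EVENT_ACTION", ""),
    )
```

A Job that still sets only the old `GITHUB_*` names would fail here with an explicit list of the missing variables rather than later, at clone time.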

## Validation

5 changes: 3 additions & 2 deletions images/worker-aider/Dockerfile
@@ -18,8 +18,9 @@ RUN pip install --user aider-chat
WORKDIR /app
COPY --chown=worker:worker images/worker-aider/run.sh /app/run.sh
COPY --chown=worker:worker providers/ /app/providers/
RUN sed -i 's/\r$//' /app/run.sh /app/providers/*.sh \
&& chmod +x /app/run.sh /app/providers/*.sh
COPY --chown=worker:worker prompt/ /app/prompt/
RUN sed -i 's/\r$//' /app/run.sh /app/providers/*.sh /app/prompt/*.sh \
&& chmod +x /app/run.sh /app/providers/*.sh /app/prompt/*.sh

ENV PATH="/home/worker/.local/bin:${PATH}"
WORKDIR /work
8 changes: 4 additions & 4 deletions images/worker-aider/run.sh
@@ -4,12 +4,12 @@ set -euo pipefail
echo "=== worker start ==="
echo "TIME: $(date -u --iso-8601=seconds)"
echo "AI_PROVIDER=${AI_PROVIDER:-aider}"
echo "GITHUB_REPO=${GITHUB_REPO:-}"
echo "GITHUB_ISSUE_NUMBER=${GITHUB_ISSUE_NUMBER:-}"
echo "GITHUB_INSTALLATION_ID=${GITHUB_INSTALLATION_ID:-}"
echo "SOURCE_REPO=${SOURCE_REPO:-}"
echo "SOURCE_ISSUE_NUMBER=${SOURCE_ISSUE_NUMBER:-}"
echo "SOURCE_INSTALLATION_ID=${SOURCE_INSTALLATION_ID:-}"
if [[ "${DEBUG_ENV:-0}" == "1" ]]; then
echo "---- env (whitelist) ----"
printenv | grep -E '^(AI_PROVIDER|GITHUB_REPO|GITHUB_ISSUE_NUMBER|GITHUB_INSTALLATION_ID|NAMESPACE|JOB_IMAGE|HOME|PATH)=' || true
printenv | grep -E '^(AI_PROVIDER|SOURCE_REPO|SOURCE_ISSUE_NUMBER|SOURCE_INSTALLATION_ID|NAMESPACE|JOB_IMAGE|HOME|PATH)=' || true
echo "---- end env ----"
fi

5 changes: 3 additions & 2 deletions images/worker-claude/Dockerfile
@@ -18,8 +18,9 @@ RUN curl -fsSL https://claude.ai/install.sh | bash
WORKDIR /app
COPY --chown=worker:worker images/worker-claude/run.sh /app/run.sh
COPY --chown=worker:worker providers/ /app/providers/
RUN sed -i 's/\r$//' /app/run.sh /app/providers/*.sh \
&& chmod +x /app/run.sh /app/providers/*.sh
COPY --chown=worker:worker prompt/ /app/prompt/
RUN sed -i 's/\r$//' /app/run.sh /app/providers/*.sh /app/prompt/*.sh \
&& chmod +x /app/run.sh /app/providers/*.sh /app/prompt/*.sh

ENV PATH="/home/worker/.local/bin:${PATH}"
WORKDIR /work
8 changes: 4 additions & 4 deletions images/worker-claude/run.sh
@@ -4,12 +4,12 @@ set -euo pipefail
echo "=== worker start ==="
echo "TIME: $(date -u --iso-8601=seconds)"
echo "AI_PROVIDER=${AI_PROVIDER:-claude_code}"
echo "GITHUB_REPO=${GITHUB_REPO:-}"
echo "GITHUB_ISSUE_NUMBER=${GITHUB_ISSUE_NUMBER:-}"
echo "GITHUB_INSTALLATION_ID=${GITHUB_INSTALLATION_ID:-}"
echo "SOURCE_REPO=${SOURCE_REPO:-}"
echo "SOURCE_ISSUE_NUMBER=${SOURCE_ISSUE_NUMBER:-}"
echo "SOURCE_INSTALLATION_ID=${SOURCE_INSTALLATION_ID:-}"
if [[ "${DEBUG_ENV:-0}" == "1" ]]; then
echo "---- env (whitelist) ----"
printenv | grep -E '^(AI_PROVIDER|GITHUB_REPO|GITHUB_ISSUE_NUMBER|GITHUB_INSTALLATION_ID|NAMESPACE|JOB_IMAGE|HOME|PATH)=' || true
printenv | grep -E '^(AI_PROVIDER|SOURCE_REPO|SOURCE_ISSUE_NUMBER|SOURCE_INSTALLATION_ID|NAMESPACE|JOB_IMAGE|HOME|PATH)=' || true
echo "---- end env ----"
fi

5 changes: 3 additions & 2 deletions images/worker-codex/Dockerfile
@@ -25,8 +25,9 @@ RUN git config --global user.name "patchwork-agent" \
WORKDIR /app
COPY --chown=worker:worker images/worker-codex/run.sh /app/run.sh
COPY --chown=worker:worker providers/ /app/providers/
RUN sed -i 's/\r$//' /app/run.sh /app/providers/*.sh \
&& chmod +x /app/run.sh /app/providers/*.sh
COPY --chown=worker:worker prompt/ /app/prompt/
RUN sed -i 's/\r$//' /app/run.sh /app/providers/*.sh /app/prompt/*.sh \
&& chmod +x /app/run.sh /app/providers/*.sh /app/prompt/*.sh

WORKDIR /work

8 changes: 4 additions & 4 deletions images/worker-codex/run.sh
@@ -4,12 +4,12 @@ set -euo pipefail
echo "=== worker start ==="
echo "TIME: $(date -u --iso-8601=seconds)"
echo "AI_PROVIDER=${AI_PROVIDER:-openai}"
echo "GITHUB_REPO=${GITHUB_REPO:-}"
echo "GITHUB_ISSUE_NUMBER=${GITHUB_ISSUE_NUMBER:-}"
echo "GITHUB_INSTALLATION_ID=${GITHUB_INSTALLATION_ID:-}"
echo "SOURCE_REPO=${SOURCE_REPO:-}"
echo "SOURCE_ISSUE_NUMBER=${SOURCE_ISSUE_NUMBER:-}"
echo "SOURCE_INSTALLATION_ID=${SOURCE_INSTALLATION_ID:-}"
if [[ "${DEBUG_ENV:-0}" == "1" ]]; then
echo "---- env (whitelist) ----"
printenv | grep -E '^(AI_PROVIDER|GITHUB_REPO|GITHUB_ISSUE_NUMBER|GITHUB_INSTALLATION_ID|NAMESPACE|JOB_IMAGE|HOME|PATH)=' || true
printenv | grep -E '^(AI_PROVIDER|SOURCE_REPO|SOURCE_ISSUE_NUMBER|SOURCE_INSTALLATION_ID|NAMESPACE|JOB_IMAGE|HOME|PATH)=' || true
echo "---- end env ----"
fi
