TFactory

Autonomous test generation + execution platform. Started as a sister project to AIFactory — now a standalone product you can drive from any tool.

Hand TFactory a finished feature's acceptance criteria — from AIFactory, Claude Code, or anything else, via the MCP control plane or a plain file (markdown / Gherkin / EARS, see guides/spec-sources.md). It generates tests aligned to those criteria across the v0.2 lane spine (unit / browser / api / integration / mutation), runs them in a sandbox, evaluates quality with a 5-signal verdict, commits the tests to the feature branch, and posts a triage report to the PR — autonomously.

Status: v0.2.0 released (2026-05-29) — 16 of 16 v0.2 tasks delivered · Browser + API + Integration lanes active · test evidence capture live · Triager surfaces portal evidence links in every PR comment · 1177 backend tests (up from 531 at v0.1.0-mvp). See the v0.2.0 release and Progress page for the per-task log.

Quickstart (NixOS / flake-based)

# One-command dev environment via the flake:
nix develop

# (inside the shell)
tfactory-minimal-venv   # creates apps/backend/.venv with just pytest+pytest-asyncio
tfactory-test           # runs the non-SDK backend suite (~10s)

# For the full backend SDK install (graphiti, claude-agent-sdk, etc.):
bootstrap-venv

The dev shell brings in Python 3.13, Node 22, uv, git, gh, just, ripgrep, jq, and docker-client, plus four shell functions: bootstrap-venv, tfactory-minimal-venv, tfactory-test, verify-fork.

For auto-loading via direnv:

nix profile install nixpkgs#nix-direnv
direnv allow

Non-Nix users can fall back to npm run install:backend (per the Quickstart on Pages) — the Nix path just makes setup deterministic.

Note for npm users inside the Nix shell: the Nix devShell sets NODE_ENV=production, which makes npm install skip devDependencies (including vitest). If you run npm install in apps/frontend-web/ from inside nix develop, unset NODE_ENV first. Details are in guides/e2e-smoke.md.
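For scripts that shell out to npm from inside the dev shell, the same workaround can be applied per process instead of by hand. The sketch below is an illustration only: the helper names `env_without` and `npm_install` are hypothetical and not part of TFactory.

```python
import os
import subprocess
from typing import Mapping


def env_without(environ: Mapping[str, str], name: str) -> dict:
    """Return a copy of `environ` with the variable `name` removed."""
    return {key: value for key, value in environ.items() if key != name}


def npm_install(cwd: str = "apps/frontend-web") -> None:
    """Run `npm install` with NODE_ENV unset so devDependencies are kept.

    Hypothetical helper: the parent shell's environment is left untouched,
    only the child npm process sees the cleaned copy.
    """
    subprocess.run(
        ["npm", "install"],
        cwd=cwd,
        env=env_without(os.environ, "NODE_ENV"),
        check=True,
    )
```

Calling `npm_install()` from the repository root would then behave like `unset NODE_ENV` followed by `npm install`, without changing the surrounding shell.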

Running the portal

# Backend (FastAPI on :3102)
cd apps/web-server
source .venv/bin/activate    # if you have a per-app venv
python -m server.main

# Frontend (Vite dev server on :3100)
cd apps/frontend-web
npm install                  # unset NODE_ENV first if inside nix develop
npm run dev

Then visit http://localhost:3100 for the TFactory portal.

The portal exposes a /tfactory view powered by the components under apps/frontend-web/src/components/tfactory/:

TFactoryTaskList — workspace list with status badges
TFactoryTaskDetail — tabs for Status / Lanes / Verdicts / Report / Logs
LaneStatusGrid — Unit / Browser / API / Integration / Mutation lane spine
TFactoryLogViewer — WebSocket live tail (one snapshot per connect at MVP)

End-to-end smoke

Once you have a real AIFactory project + a Claude API key + Docker:

# List the 9 verification scenarios
scripts/e2e-smoke.sh --list

# Dry-run (no env, no LLM calls) — sanity check the runner itself
scripts/e2e-smoke.sh --dry-run --all

# Real run
export ANTHROPIC_API_KEY=sk-ant-...
export TFACTORY_AIFACTORY_ROOT=$HOME/Source/GitHub/MyApp
export TFACTORY_AIFACTORY_BRANCH=feature/...
scripts/e2e-smoke.sh --all

Full walkthrough — including the 3 manual scenarios (mutation, hallucination guard, docker-down) — in guides/e2e-smoke.md.
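A real run fails late if one of the three exported variables is missing, so a preflight check can save a round trip. This is a minimal sketch, assuming the three variables shown above are the only required ones; `missing_vars` is a hypothetical helper, not something the smoke script provides.

```python
import os
from typing import Mapping

# Variable names taken from the "Real run" commands above.
REQUIRED_VARS = (
    "ANTHROPIC_API_KEY",
    "TFACTORY_AIFACTORY_ROOT",
    "TFACTORY_AIFACTORY_BRANCH",
)


def missing_vars(environ: Mapping[str, str]) -> list:
    """Return the required variables that are unset or empty, in order."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


def preflight_message(environ: Mapping[str, str]) -> str:
    """Build a one-line, secret-free summary suitable for printing."""
    missing = missing_vars(environ)
    if missing:
        return "missing: " + ", ".join(missing)
    return "environment looks complete"
```

Running `print(preflight_message(os.environ))` before `scripts/e2e-smoke.sh --all` reports which exports are still needed; only variable names are printed, never values.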

Tests

| Suite | What | Count | Time |
| --- | --- | --- | --- |
| Backend non-SDK (`tests/test_*.py`) | Pure-Python primitives + agent loops with mocked SDK | 531 | ~9s |
| Frontend (`apps/frontend-web/src/**/*.test.tsx`) | vitest + React Testing Library | 112 | ~1.5s |
| End-to-end smoke (`scripts/e2e-smoke.sh`) | Real LLM + Docker + git + gh (manual) | 9 scenarios | - |

CI runs the first two on every commit; the third is operator-driven.

# Backend
PYTHONPATH=apps/backend apps/backend/.venv/bin/pytest -q tests/

# Frontend (under nix devShell, unset NODE_ENV first)
cd apps/frontend-web && ../../node_modules/.bin/vitest run

# Fork-hygiene check (every stray AIFactory reference is allowlisted explicitly)
scripts/verify-fork.sh --no-import

Docs

Full project documentation is published as a GitHub Pages site: https://olafkfreund.github.io/TFactory/

Direct links:

Design Plan — full rationale, 10 locked decisions, landscape research, risk register
Spec — Agent OS spec
Technical Spec — architecture detail
Test Coverage Spec — TDD plan
Task Breakdown — 12 tasks with dependency graph

In-repo guides (guides/):

guides/e2e-smoke.md — operator guide for the 9 verification scenarios
guides/planner-manual-smoke.md — Planner-only sibling smoke
guides/HANDOVER_WORKFLOW.md — how to trigger TFactory from a live Claude Code session
guides/CLAUDE_CODE_MCP_TOOLS.md — driving TFactory tasks from the MCP control plane
guides/byo-llm.md — run TFactory fully on your own infrastructure (Ollama / vLLM / LM Studio / LocalAI) with a verifiable no-egress guarantee — for GDPR / HIPAA / air-gapped teams. python apps/backend/byo_llm.py <model> prints the live data-egress posture (🔒 Local / 🏠 Self-hosted / ☁️ Managed)
guides/spec-sources.md — use TFactory without AIFactory: ingest any acceptance-criteria source (markdown / Gherkin .feature / EARS) into the pipeline via python apps/backend/spec_sources.py <file>

Project tracking

Epic + sub-issues: https://github.com/olafkfreund/TFactory/issues
Discussions / questions: open an issue with the question label

High-level architecture

AIFactory finished branch  ─►  /handover-to-tfactory  ─►  TFactory MCP
                                                              │
                                                              ▼
                                                          Planner
                                                              │
                              ┌──────────┬─────────┬──────────┼──────────┐
                              ▼          ▼         ▼          ▼          ▼
                       Gen-Unit  Gen-Browser  Gen-API  Gen-Integration  Gen-Mut
                              └──────────┴────┬────┴──────────┴──────────┘
                                              ▼
                                          Executor  (Docker per task)
                                              ▼
                                          Evaluator  (separate agent)
                                              ▼
                                          Triager   ─►  git commit + PR comment

Five pipeline stages (Planner / per-lane Generators / Executor / Evaluator / Triager), five lanes (unit / browser / api / integration / mutation), Docker sandbox, spec-aware handover from AIFactory.

The stage chain auto-advances via TFACTORY_AUTO_* env vars; each stage writes its outputs to ~/.tfactory/workspaces/{project}/specs/{spec}/ and hands off to the next stage through a fire-and-forget scheduler. See apps/backend/agents/ for each agent's implementation.
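To make the workspace layout concrete, here is a small sketch of how a stage's output directory could be derived and how the auto-advance settings could be read. The function names are hypothetical, and no specific `TFACTORY_AUTO_*` variable name is assumed: the code only filters on the prefix.

```python
from pathlib import Path
from typing import Mapping, Optional

AUTO_PREFIX = "TFACTORY_AUTO_"


def workspace_dir(project: str, spec: str, home: Optional[Path] = None) -> Path:
    """Build ~/.tfactory/workspaces/{project}/specs/{spec} as a Path.

    `home` is injectable so the function can be exercised without
    touching the real home directory.
    """
    base = home if home is not None else Path.home()
    return base / ".tfactory" / "workspaces" / project / "specs" / spec


def auto_settings(environ: Mapping[str, str]) -> dict:
    """Collect every environment variable that starts with TFACTORY_AUTO_."""
    return {
        key: value
        for key, value in environ.items()
        if key.startswith(AUTO_PREFIX)
    }
```

For example, `workspace_dir("my-app", "login-flow")` resolves to a directory under the current user's home; the project and spec names here are placeholders.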

Status by lane

v0.2 swapped the v0.1 pipeline-stage decomposition for a modality-based spine (Decision 2). Security scanning is delegated to dedicated pipelines and out of scope here; TFactory focuses on functional + feature testing.

| Lane | v0.2.0 status | Runtime | Coverage | Evidence |
| --- | --- | --- | --- | --- |
| Unit | ✅ Active | `tfactory-runner-pytest` (Python) · `tfactory-runner-jest` (TypeScript) | line (cobertura / lcov) | - |
| Browser | ✅ Active | `tfactory-runner-playwright` + `AppRuntime` (docker-compose + HTTP HEAD health-poll) | `null` (Decision 11: line coverage doesn't apply when the test drives the browser) | screenshots · video · trace.zip |
| API | ✅ Active | per-framework Docker image + HTTP HAR recorder | line where applicable | network.har |
| Integration | ✅ Active | per-framework Docker image + `AppRuntime` (multi-service compose) | line where applicable | network.har · service logs |
| Mutation | ✅ Active | `mutmut` (Python) / Stryker (TypeScript); one-mutation-per-run probe inside the Evaluator | per-mutant (killed / survived) | - |

All five lanes shipped with v0.2.0. The Planner picks each subtask's lane from its (language, framework) via the framework registry (frameworks/{pytest,jest,playwright}/descriptor.yaml). New languages (Go / Rust / Ruby) and additional security-pipeline integrations slot into this same spine through new FrameworkDescriptors — no lane additions required.
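The lane lookup described above can be pictured as a table keyed by (language, framework). The sketch below is an assumption-heavy illustration: the `FrameworkDescriptor` fields, the in-memory `REGISTRY`, and the pairing of Playwright with TypeScript are all hypothetical, and the real descriptors live in the `descriptor.yaml` files.

```python
from dataclasses import dataclass

# Lane names from the v0.2 lane spine.
LANES = ("unit", "browser", "api", "integration", "mutation")


@dataclass(frozen=True)
class FrameworkDescriptor:
    """Hypothetical in-memory stand-in for one descriptor.yaml."""
    name: str
    language: str
    lane: str


# Illustrative entries only; the language for playwright is an assumption.
REGISTRY = {
    ("python", "pytest"): FrameworkDescriptor("pytest", "python", "unit"),
    ("typescript", "jest"): FrameworkDescriptor("jest", "typescript", "unit"),
    ("typescript", "playwright"): FrameworkDescriptor(
        "playwright", "typescript", "browser"
    ),
}


def pick_lane(language: str, framework: str) -> str:
    """Return the lane for a (language, framework) pair, case-insensitively."""
    key = (language.lower(), framework.lower())
    try:
        return REGISTRY[key].lane
    except KeyError:
        raise LookupError(f"no framework descriptor for {key}") from None
```

Under this model, supporting a new language means adding a registry entry that points at an existing lane, which matches the "no lane additions required" claim.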

Connect to your environment — Credential Broker

Agents often need to reach real services and cloud environments (a staging API, a Kubernetes cluster, a GCP/AWS/Azure project) to plan and run tests — but secrets must never land in the repo. The Credential Broker (epic #62) resolves credentials from a pluggable backend and exposes them to the agents ephemerally:

Backends: Azure Key Vault · AWS Secrets Manager · GCP Secret Manager · HashiCorp Vault · local sops / age / agenix · plain env. One ref syntax (vault:path#field, gcp-sm://proj/secret, sops:file#key, …); cloud SDKs load lazily so an absent package never breaks startup.
Ephemeral + redacted: file credentials (kubeconfig, GCP ADC) are written 0600 to a per-task scratch dir and wiped when the task ends; resolved values are redacted from logs.
Honest egress: off by default — no cloud credential is resolved unless the project opts in (.tfactory.yml egress.enabled). python -m tfactory_secrets.cli audit prints a secret-free manifest of exactly what would leave your network.
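The three ref forms listed above share a shape: a scheme, a location, and an optional field after `#`. The parser below is a sketch of that shape only, not the broker's actual implementation; `SecretRef` and `parse_ref` are hypothetical names, and other backends may use forms this does not cover.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SecretRef:
    """Hypothetical parsed form of a credential ref."""
    scheme: str
    location: str
    field: Optional[str] = None


def parse_ref(ref: str) -> SecretRef:
    """Split a ref such as 'vault:path#field' into scheme, location, field."""
    scheme, sep, rest = ref.partition(":")
    if not sep or not scheme or not rest:
        raise ValueError(f"not a credential ref: {ref!r}")
    # URL-style refs (gcp-sm://proj/secret) carry a leading '//'.
    if rest.startswith("//"):
        rest = rest[2:]
    location, _, field = rest.partition("#")
    if not location:
        raise ValueError(f"credential ref has no location: {ref!r}")
    return SecretRef(scheme, location, field or None)
```

Parsing only identifies which backend and path a ref points at. Nothing is resolved at this step, which is the property an audit manifest needs: it can list refs without touching secret values.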

Why: it extends the existing core/mcp_credentials.py ambient chain (K8s / AWS-IRSA / Azure-MI / GCP-ADC) with a vault-fetch head rather than reinventing auth, and keeps the same honest-egress posture as BYO-LLM. See guides/credentials.md and the Credentials page.
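The "ephemeral + redacted" behaviour can be sketched with the standard library alone. This is an illustration under stated assumptions, not the broker's code: the helper names are hypothetical, the scratch directory comes from `tempfile.mkdtemp`, and the `***` replacement marker is an arbitrary choice.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterable, Iterator


@contextmanager
def task_scratch() -> Iterator[Path]:
    """Yield a private per-task directory and wipe it when the task ends."""
    path = Path(tempfile.mkdtemp(prefix="tfactory-task-"))
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)


def write_secret_file(scratch: Path, name: str, content: str) -> Path:
    """Create a new file readable and writable by the owner only (0600)."""
    target = scratch / name
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(target, 0o600)  # enforce the mode regardless of the umask
    return target


def redact(message: str, secrets: Iterable[str]) -> str:
    """Replace every known secret value in a log message with '***'."""
    # Longest first, so a secret containing another is masked whole.
    for secret in sorted({s for s in secrets if s}, key=len, reverse=True):
        message = message.replace(secret, "***")
    return message
```

Used as `with task_scratch() as scratch: write_secret_file(scratch, "kubeconfig", data)`, the file exists only for the duration of the block. The permission bits apply on POSIX systems.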

Run on any LLM

TFactory routes each pipeline phase to a provider purely from the model string — no separate provider switch. Supported: the Claude Agent SDK (primary), OpenAI Codex, Gemini CLI, GitHub Copilot CLI, Ollama (local), and any OpenAI-compatible endpoint (vLLM / LM Studio / OpenRouter / Together / Groq / LocalAI). This lets a team run on a flat-rate subscription, a self-hosted model, or fully air-gapped — with an honest data-egress badge (python apps/backend/byo_llm.py <model>) so you always know whether a run keeps data on your network. See guides/byo-llm.md.
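Routing from the model string alone amounts to a prefix lookup. The sketch below shows the idea, but every prefix and provider label in it is an illustrative assumption: the real matching rules are documented in guides/byo-llm.md and may differ.

```python
# (prefix, provider) pairs checked in order. All entries are hypothetical
# examples of what a model-string convention could look like.
ROUTES = (
    ("claude-", "claude-agent-sdk"),
    ("ollama/", "ollama"),
    ("openai-compat/", "openai-compatible"),
)


def route(model: str) -> str:
    """Return the provider label for a model string, or raise ValueError."""
    for prefix, provider in ROUTES:
        if model.startswith(prefix):
            return provider
    raise ValueError(f"no provider route for model {model!r}")
```

Because the table is ordered, a more specific prefix can be placed ahead of a general one. Raising on an unknown string, rather than falling back to a default provider, avoids silently sending data to a managed endpoint.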

License

MIT OR GPL-3.0.
