The rewrite engine that gives LLM text a pulse.
Brotherizer steps in when the model lands the facts but goes flat on feeling.
It pulls from donor writing, rewrites for the right surface, and reranks until something actually sticks.
Think of it as voice middleware for teams that want less committee and more human.
No detector theater. No fake warmth. No polished-for-no-reason copy.
Just text that sounds awake instead of overmanaged.
The point is not to remove the human. It is to give them a better machine.
Brotherizer stays narrow by design:
- it retrieves donor writing patterns
- it rewrites for the mode and surface you actually need
- it reranks multiple candidates
- it lets the client keep the winner or choose another option later
Think of it as voice middleware for LLM output.
If your model already knows what to say but keeps saying it like it had to clear legal first, this is the lane.
Brotherizer is not:
- a general chat model
- a giant prompt-management suite
- a full writing app
- a detector-evasion gimmick
The point is not to make text look less AI just to win a benchmark.
The point is to make it sound more like a person actually meant it.
Brotherizer runs a five-part pipeline:

1. Retrieve donor texture
   - pull donor snippets from local packs or the corpus database
   - optionally use local embeddings for semantic lookup
2. Resolve mode + surface
   - choose the right voice family
   - apply surface-aware formatting and style directives
3. Generate multiple rewrites
   - produce several candidates instead of pretending the first shot is always the best shot
4. Rerank
   - score candidates for semantic fidelity, mode fit, surface fit, anti-generic behavior, and composition quality
   - optionally run an xAI/Grok judge pass for harder selection calls
5. Persist the decision
   - keep the winner
   - allow a client or user to choose a different candidate later
   - store job, candidate, and choice history in the runtime DB
The result is simple:
- send text in
- get ranked options back
- keep the winner, or override it
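The flow above can be sketched as a plain pipeline function. Everything here is illustrative: the helper names (`retrieve_donor_snippets`, `resolve_mode_surface`, and so on) are hypothetical stand-ins for the real retrieval/rewrite/rerank code, stubbed out so the shape is visible.

```python
# Illustrative sketch of the five-part pipeline. Every helper below is a
# hypothetical stand-in for the real retrieval/rewrite/rerank machinery.

def retrieve_donor_snippets(text, mode):
    # 1. Retrieve donor texture (stub: would query packs / corpus DB).
    return ["donor snippet"]

def resolve_mode_surface(mode, surface):
    # 2. Resolve mode + surface into style/formatting directives.
    return {"mode": mode, "surface": surface}

def generate_rewrite(text, donors, directives, seed):
    # 3. One candidate rewrite (stub: would call the generation model).
    return {"text": f"{text} (candidate {seed})", "score": 1.0 / (seed + 1)}

def persist_decision(text, ranked, winner):
    # 5. Store job, candidates, and the default winner (stub).
    return {"winner": winner, "candidates": ranked}

def brotherize(text, mode, surface, candidate_count=3):
    donors = retrieve_donor_snippets(text, mode)
    directives = resolve_mode_surface(mode, surface)
    candidates = [
        generate_rewrite(text, donors, directives, i)
        for i in range(candidate_count)
    ]
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)  # 4. rerank
    persist_decision(text, ranked, winner=ranked[0])
    return ranked
```

The only structural commitment is that the caller gets all ranked candidates back, not just the winner, so a later `choose` call stays possible.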
Brotherizer is explicit about the model split it ships with today:
- Generation lane: Perplexity Sonar
- Judge lane: xAI Grok reasoning models
- Optional semantic retrieval lane: local Ollama embeddings
In practice, that split looks like this:
- Perplexity Sonar handles the fast rewrite pass
- Grok handles the optional judgment-heavy pass when selection quality matters more than speed
- Ollama is there if you want local semantic retrieval for the donor corpus
Current defaults in the repo:
- generation model: `sonar`
- judge model: `grok-4.20-reasoning`
- embedding model: `nomic-embed-text`
You can still point the judge lane at earlier Grok reasoning variants by setting `BROTHERIZER_XAI_MODEL`. The public docs explain the split in more detail in `docs/wiki/MODEL_ROUTING_AND_PROVIDERS.md`.
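Routing by environment variable can look like this minimal sketch. The defaults match the repo's stated defaults, and `BROTHERIZER_XAI_MODEL` is the documented override; the other two variable names and the helper itself are illustrative assumptions, not the repo's actual API.

```python
import os

# Defaults as stated above; only BROTHERIZER_XAI_MODEL is documented —
# the other two env var names here are hypothetical.
DEFAULT_GENERATION_MODEL = "sonar"
DEFAULT_JUDGE_MODEL = "grok-4.20-reasoning"
DEFAULT_EMBEDDING_MODEL = "nomic-embed-text"

def resolve_models(env=None):
    """Illustrative helper: pick the three model lanes from the environment."""
    env = os.environ if env is None else env
    return {
        "generation": env.get("BROTHERIZER_GENERATION_MODEL", DEFAULT_GENERATION_MODEL),
        "judge": env.get("BROTHERIZER_XAI_MODEL", DEFAULT_JUDGE_MODEL),
        "embedding": env.get("BROTHERIZER_EMBEDDING_MODEL", DEFAULT_EMBEDDING_MODEL),
    }
```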
Brotherizer ships with a public research substrate. It is the part contributors can inspect, rebuild, and extend:
- donor packs under `data/donor_packs/`
- corpus DB builder
- optional embedding index builder
- style radar seed signals and DB builder
- formatting / internet-symbol packs
- retrieval selectors that feed the rewrite engine
What is intentionally not public is the private collection layer.
That is deliberate:
- the public repo still shows how the system thinks
- it just does not include collection machinery or internal ops lanes
If you want the longer public explanation, start here:
- `docs/wiki/HOW_IT_WORKS.md`
- `docs/wiki/RETRIEVAL_ARCHITECTURE.md`
- `docs/wiki/LOCAL_SETUP_AND_DATABASES.md`
- `RESEARCH/README.md`
Brotherizer ships with multiple voice families, including:
- `british_banter_mode`
- `worldwide_ironic_mode`
- `en_reflective_human_mode`
- `en_professional_human_mode`
- `british_professional_human_mode`
- `casual_us_human_mode`
- `ptbr_twitter_mode`
- `ptbr_narrative_human_mode`
- `ptbr_professional_human_mode`
- `seriously_english_mode`
- `seriously_ptbr_mode`
All of them are defined in `configs/brotherizer_modes.json`.
Quick mode picker:
- use `casual_us_human_mode` for lines that need to feel current and lived-in
- use `en_reflective_human_mode` when you want the text to breathe a bit more
- use `british_professional_human_mode` for restraint, without brochure polish
- use the `seriously_*` modes if the source already carries weight; no extra performance needed
- use the PT-BR modes to keep things culturally native, not flattened into generic international Portuguese
Brotherizer can condition the rewrite for:
- `reply`
- `post`
- `thread`
- `bio`
- `caption`
- `note`
That changes more than formatting. It changes rhythm, looseness, compression, and reranking behavior.
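One way to picture surface conditioning: each surface maps to its own rhythm and compression knobs that both the rewrite pass and the reranker read. The profile table below is purely illustrative; the knob names and values are assumptions, not the repo's actual configuration.

```python
# Hypothetical surface profiles. The real repo drives this from configs;
# the point is that a surface changes behavior, not just formatting.
SURFACE_PROFILES = {
    "reply":   {"max_words": 60,  "looseness": "high",   "compression": "tight"},
    "post":    {"max_words": 120, "looseness": "medium", "compression": "medium"},
    "thread":  {"max_words": 280, "looseness": "medium", "compression": "loose"},
    "bio":     {"max_words": 30,  "looseness": "low",    "compression": "tight"},
    "caption": {"max_words": 40,  "looseness": "high",   "compression": "tight"},
    "note":    {"max_words": 200, "looseness": "medium", "compression": "loose"},
}

def surface_profile(surface):
    """Look up conditioning knobs; unknown surfaces fall back to 'post'."""
    return SURFACE_PROFILES.get(surface, SURFACE_PROFILES["post"])
```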
Brotherizer does not rely on prompt adjectives alone.
It retrieves donor snippets from real writing packs and uses them as texture, pressure, and voice reference without copying them verbatim.
Brotherizer also draws on the style radar DB and the formatting / internet-symbol packs.
That helps it reason about:
- internet-native markers
- compact reaction language
- reflective vs casual surfaces
- profile/bio cleanliness
- reply vs thread vs note behavior
Brotherizer does not emit one rewrite and pray.
It generates several candidates and reranks them with:
- semantic preservation
- mode fit
- surface fit
- anti-generic heuristics
- composition penalties
- optional xAI judge scoring
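Conceptually, the reranker folds those signals into one composite score, with the optional judge pass blended in on top. This is a minimal weighted sketch; the signal names, weights, and blend are illustrative assumptions, not the repo's actual scoring code.

```python
# Illustrative composite score. Signals are assumed normalized to [0, 1];
# composition penalties would show up as a low "composition" signal.
WEIGHTS = {
    "semantic_preservation": 0.35,
    "mode_fit": 0.20,
    "surface_fit": 0.15,
    "anti_generic": 0.20,
    "composition": 0.10,
}

def composite_score(signals, judge_score=None, judge_weight=0.5):
    """Weighted sum of heuristic signals, optionally blended with a judge score."""
    base = sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
    if judge_score is not None:
        # Optional xAI judge pass, weighted in for harder selection calls.
        return (1 - judge_weight) * base + judge_weight * judge_score
    return base
```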
The runtime persists:
- jobs
- candidates
- choices
- runtime errors
- idempotency keys
That gives you:
- stable `job_id`s
- a clear `winner` vs `chosen` distinction
- replay-safe reads of completed jobs
- idempotent rewrite submission
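Idempotent submission boils down to a key-to-job lookup: resubmitting with the same idempotency key replays the stored job instead of creating a new one. This is a toy in-memory version of that contract; the runtime keeps these records in its DB.

```python
import uuid

class JobStore:
    """Toy in-memory stand-in for the runtime DB's idempotency handling."""

    def __init__(self):
        self.jobs = {}          # job_id -> job record
        self.idempotency = {}   # idempotency key -> job_id

    def submit(self, text, idempotency_key=None):
        if idempotency_key and idempotency_key in self.idempotency:
            # Replay-safe: the same key returns the original job untouched.
            return self.jobs[self.idempotency[idempotency_key]]
        job_id = str(uuid.uuid4())
        job = {"job_id": job_id, "text": text, "candidates": [], "chosen": None}
        self.jobs[job_id] = job
        if idempotency_key:
            self.idempotency[idempotency_key] = job_id
        return job
```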
```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -e .
```

That gives you installable entrypoints such as:
- `brotherizer-api`
- `brotherize`
- `brotherizer-build-corpus`
- `brotherizer-build-style-radar`
- `brotherizer-build-embeddings`
```bash
export PERPLEXITY_API_KEY=your_key_here
export XAI_API_KEY=your_key_here
```

Notes:
- `PERPLEXITY_API_KEY` is required for generation
- `XAI_API_KEY` is only required if you want the judge lane
- local embeddings require a running Ollama instance if you choose to build them
You can also copy the example env:

```bash
cp .runtime/brotherizer.env.example .runtime/brotherizer.env
```

Build the corpus DB:

```bash
brotherizer-build-corpus \
  --inputs data/donor_packs/english_v3.ndjson data/donor_packs/ptbr_v2.ndjson \
  --db data/corpus/brotherizer.db
```

Build the style radar DB:

```bash
brotherizer-build-style-radar \
  --input configs/style_radar_seed_signals.json \
  --db data/corpus/style_radar.db
```

Optional: build embeddings for semantic retrieval:

```bash
brotherizer-build-embeddings \
  --db data/corpus/brotherizer.db
```

Recommended mode-driven example:
```bash
brotherize \
  --mode casual_us_human_mode \
  --text "This still sounds too polished and generic." \
  --use-xai-judge
```

Grounded, more restrained example:

```bash
brotherize \
  --mode seriously_english_mode \
  --text "I think this still sounds too polished and generic." \
  --use-xai-judge
```

Run the API directly:

```bash
brotherizer-api
```

Or use the helper script:

```bash
bash scripts/start_brotherizer_api.sh
```

By default, Brotherizer serves on http://127.0.0.1:5555.
Rewrite via API:

```bash
curl -X POST http://127.0.0.1:5555/v1/rewrite \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "I think this sounds too polished and generic.",
    "mode": "casual_us_human_mode",
    "surface_mode": "reply",
    "candidate_count": 3,
    "use_xai_judge": false
  }'
```

Choose a non-winner candidate later:

```bash
curl -X POST http://127.0.0.1:5555/v1/jobs/<job_id>/choose \
  -H 'Content-Type: application/json' \
  -d '{
    "candidate_id": "<candidate_id>",
    "actor": { "type": "client", "id": "codex" },
    "reason": "User preferred the alternate"
  }'
```

Canonical endpoints:
- `GET /`
- `GET /v1/health`
- `GET /v1/modes`
- `GET /v1/capabilities`
- `POST /v1/rewrite`
- `GET /v1/jobs/:id`
- `POST /v1/jobs/:id/choose`
Legacy wrappers:
- `GET /health`
- `GET /modes`
- `POST /rewrite`
The real contract lives under `/v1/*`.
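For clients that would rather skip curl, the same call works from stdlib Python. The payload mirrors the `/v1/rewrite` body shown above; `post_json` is an illustrative helper and only succeeds against a running API on the default port.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:5555"  # Brotherizer's default bind address

def rewrite_payload(text, mode, surface_mode="reply",
                    candidate_count=3, use_xai_judge=False):
    """Build the /v1/rewrite JSON body, mirroring the curl example."""
    return {
        "text": text,
        "mode": mode,
        "surface_mode": surface_mode,
        "candidate_count": candidate_count,
        "use_xai_judge": use_xai_judge,
    }

def post_json(path, payload):
    """POST a JSON body and decode the JSON response (needs a live server)."""
    req = urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (only works against a running instance):
# job = post_json("/v1/rewrite",
#                 rewrite_payload("I think this sounds too polished.",
#                                 "casual_us_human_mode"))
```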
Brotherizer now ships with a small build baseline.
Useful local commands:
```bash
make dev-install
make test
make run-api
make build-corpus
make build-style-radar
make build-embeddings
```

Container build:

```bash
docker build -t brotherizer:local .
docker run -p 5555:5555 --env-file .runtime/brotherizer.env brotherizer:local
```

Start here:
Most useful pages:
- `docs/wiki/HOW_IT_WORKS.md`
- `docs/wiki/POSITIONING.md`
- `docs/wiki/MODEL_ROUTING_AND_PROVIDERS.md`
- `docs/wiki/API_REFERENCE.md`
- `docs/wiki/RUNTIME_LIFECYCLE_AND_RECOVERY.md`
- `docs/wiki/LEGACY_WRAPPERS_AND_COMPATIBILITY.md`
- `docs/wiki/RETRIEVAL_ARCHITECTURE.md`
- `docs/wiki/FORMATTING_PACKS_AND_SYMBOL_LIBRARY.md`
- `docs/wiki/SECURITY_AND_SECRETS.md`
Research and corpus-building docs:
- `RESEARCH/README.md`
- `RESEARCH/BUILDING_DATABASES.md`
- `RESEARCH/DONOR_PACKS.md`
- `RESEARCH/PROVIDERS.md`
- `RESEARCH/CONTRIBUTING.md`
- `RESEARCH/SHIPPED_VS_NOT_SHIPPED.md`
Clean machine. Human output. Builder energy.
Brotherizer only gets as good as the voice library.
That means the best contributions are usually not another endpoint.
They are:
- a cleaner donor pack
- a sharper register
- a language the repo barely covers today
- a better note / reply / caption surface
We especially want:
- more languages
- more registers
- cleaner professional voices
- better note / reply / caption coverage
If you can build a clean, text-only donor pack in your language, we want it.
If you can build two, even better. The machine has no shame and would like to sound less generic in more countries.
Please keep identity out of the data:
- no handles
- no names
- no emails
- no signatures
- no `source_ref`
- no metadata that can reveal the author
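A quick pre-submission scan catches the obvious identity leaks before a pack goes out. The patterns below are illustrative, not exhaustive, and are no substitute for the sanitization guide; `scan_donor_line` is a hypothetical helper name.

```python
import json
import re

# Illustrative leak patterns; extend per the sanitization docs.
LEAK_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "handle": re.compile(r"(?<!\w)@\w{2,}"),
}

def scan_donor_line(ndjson_line):
    """Return a list of problems found in one NDJSON donor record."""
    record = json.loads(ndjson_line)
    problems = []
    if "source_ref" in record:
        problems.append("source_ref field present")
    text = record.get("text", "")
    for name, pattern in LEAK_PATTERNS.items():
        if pattern.search(text):
            problems.append(f"possible {name} in text")
    return problems
```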
Start here:
- `RESEARCH/CONTRIBUTING.md`
- `RESEARCH/PROVIDERS.md`
- `RESEARCH/SAFETY_AND_SANITIZATION.md`
- `RESEARCH/LANGUAGE_COVERAGE.md`
Brotherizer lives between brand-voice systems and LLM middleware.
It is closer to:
- a style-retrieval runtime
- a rewrite-and-rerank engine
- a choice layer for agent output
It is not trying to be:
- Jasper
- Grammarly
- PromptLayer
- LangSmith
- an "undetectable AI" circus
Those sit nearby. Brotherizer's lane stays narrow:
retrieve the right texture, rewrite the line, rerank the options, and keep what sounds alive.
Core regression checks:

```bash
python3 -m py_compile api/brotherizer_api.py brotherize.py runtime/service.py storage/runtime_db.py tests/test_runtime_service.py tests/test_runtime_api.py
python3 -m unittest tests/test_runtime_service.py tests/test_runtime_api.py
```

