Cut LLM token costs by 15–55% before your prompts ever leave your machine.
Defluff works three ways — pick any or all:
| Method | Best for |
|---|---|
| Chrome Extension | Compressing prompts in ChatGPT, Claude.ai, Gemini, Perplexity & more |
| Claude Code Hooks | Automatic compression inside every Claude Code conversation |
| Python CLI / SDK | Pipelines, scripts, API cost reduction in your own apps |
- Chrome Extension
- Claude Code Hooks
- Python CLI
- Python SDK
- Compression Levels
- Supported Sites & Models
- Troubleshooting
- Uninstall Everything
Requires: Google Chrome, Brave, Arc, or any Chromium-based browser.
Step 1 — Download / clone this repo
git clone https://github.com/subrahmanyabhat/defluff.git
# or just download the ZIP and unzip it
Step 2 — Open Chrome Extensions
Paste this in your address bar and press Enter:
chrome://extensions
Step 3 — Enable Developer Mode
Toggle Developer mode ON in the top-right corner.
Step 4 — Load the extension
Click Load unpacked → navigate to the defluff folder (the one with manifest.json) → click Select Folder.
The ⚡ Defluff icon appears in your Chrome toolbar. Done.
Auto-update: Because this is a local extension, it won't auto-update from the Chrome Web Store. Pull the repo and reload the extension to get updates.
Floating badge
When you open any supported LLM chat site, a small badge appears near the input box:
⚡ 1,234 tokens [Minimize ↓] 📋
| Element | What it does |
|---|---|
| ⚡ 1,234 tokens | Live token count as you type |
| Minimize ↓ | Compress your prompt in place |
| 📋 | Copy minimized version to clipboard (without changing the input) |
After minimizing:
⚡ 847 tokens -31% [↩ Restore] 📋
Click ↩ Restore to undo and get your original text back.
Diff preview (default: ON)
When you click Minimize, a preview panel shows before/after so you can confirm before applying:
┌─────────────────────────────────────────┐
│ Preview changes │
│ │
│ Before │
│ I would like you to please explain in │
│ order to understand... │
│ │
│ After │
│ Please explain... │
│ │
│ Tokens: 1,234 → 847 Saved: 387 (31%) │
│ │
│ [Apply ↓] [Cancel] │
└─────────────────────────────────────────┘
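The footer of the preview panel is simple arithmetic on the two token counts. A minimal sketch of that calculation follows; the function name `savings_line` and the rounding to a whole percent are assumptions for illustration, not the extension's actual code.

```python
def savings_line(tokens_before: int, tokens_after: int) -> str:
    """Format a before/after token summary like the preview footer."""
    saved = tokens_before - tokens_after
    # Guard against division by zero for an empty prompt.
    percent = round(100 * saved / tokens_before) if tokens_before else 0
    return f"Tokens: {tokens_before:,} → {tokens_after:,} Saved: {saved:,} ({percent}%)"


print(savings_line(1234, 847))  # Tokens: 1,234 → 847 Saved: 387 (31%)
```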
Popup stats
Click the ⚡ icon in the toolbar to see:
- Session tokens saved + cost saved
- All-time tokens saved + cost saved
- Average compression ratio
- Quick controls for level and model
Click ⚙ Options in the popup to open the full settings page.
| Setting | Default | Description |
|---|---|---|
| Extension enabled | ON | Global on/off switch |
| Show live token counter | ON | Show the floating badge |
| Show diff before applying | ON | Preview before/after |
| Auto-minimize on send | OFF | Compress automatically when you press Enter |
| Compression level | Medium | Light / Medium / Aggressive |
| Model | Claude Sonnet 4 | Used for cost calculation |
| Custom price | — | USD per 1M tokens (if your model isn't listed) |
| Excluded sites | — | Hostnames where Defluff should not run |
Live preview — paste any prompt into the Options page to see exactly what gets compressed before using it in a real chat.
Defluff includes three Claude Code hooks. Each runs automatically — no clicks needed.
| Hook | Event | What it does |
|---|---|---|
| claude_hook.py | UserPromptSubmit | Compresses your message before Claude reads it |
| claude_code_hook.py | PreToolUse | Compresses prompt/content fields in tool inputs |
| post_tool_hook.py | PostToolUse | Compresses large tool outputs (Bash, Grep, MCP) before Claude reads them |
Option A — Automated installer (recommended)
cd defluff
chmod +x hooks/install.sh
bash hooks/install.sh
This copies defluff_hook.py to ~/.claude/ and registers it in ~/.claude/settings.json.
# Test the hook with a sample prompt
bash hooks/install.sh --test
# Test with your own text
bash hooks/install.sh --test "I would like you to please explain in order to understand..."
# Check installation status + all-time stats
bash hooks/install.sh --status
# Change compression level
bash hooks/install.sh --config level=aggressive
# Remove the hook
bash hooks/install.sh --uninstall
Option B — Python package hooks (more powerful)
These hooks use the full defluff pipeline (deduplication, JSON minification, URL compression, etc.) and require the Python package to be installed.
pip install -e ".[dev]" # install in editable mode from this repo
Then add all three hooks to ~/.claude/settings.json:
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "python3 /absolute/path/to/defluff/src/defluff/integrations/claude_hook.py"
          }
        ]
      }
    ],
    "PreToolUse": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "python3 /absolute/path/to/defluff/src/defluff/integrations/claude_code_hook.py"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "python3 /absolute/path/to/defluff/src/defluff/integrations/post_tool_hook.py"
          }
        ]
      }
    ]
  }
}
Replace /absolute/path/to/defluff with the actual path (find it with pwd inside the repo).
What it compresses:
Before: "I would like you to please explain, due to the fact that I am
confused, how in order to write a sort function. Please note
that I am basically a beginner."
After: "Please explain how to write a sort function. I'm a beginner."
Output in Claude Code UI (stderr):
⚡ defluff 65 → 50 tokens ↓23.1% saved 15 [medium]
Configure (src/defluff/integrations/claude_hook.py top of file):
PRESET = "medium" # "light" | "medium" | "aggressive"
MODEL = "claude-sonnet-4-6" # used for stats only
MIN_TOKENS = 30 # skip if prompt is shorter than this
SHOW_STATS = True # show savings in the UI
The PreToolUse hook (claude_code_hook.py) compresses large prompt, content, message, text, and query fields passed into tools.
Safe by design — never touches:
- Read, Write, Edit, MultiEdit, NotebookRead, NotebookEdit inputs (Claude needs exact content to produce correct file edits)
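The rule above can be sketched as a small filter: file tools pass through untouched, and for every other tool only long free-text fields are compressed. This is an illustration only, not the code in claude_code_hook.py. The tool and field names come from the text above; the function name, the `compress` callback, and the word-count threshold are hypothetical stand-ins.

```python
from typing import Callable

# Names taken from the text above; everything else here is hypothetical.
PROTECTED_TOOLS = {"Read", "Write", "Edit", "MultiEdit", "NotebookRead", "NotebookEdit"}
COMPRESSIBLE_FIELDS = {"prompt", "content", "message", "text", "query"}


def compress_tool_input(
    tool_name: str,
    tool_input: dict,
    compress: Callable[[str], str],
    min_words: int = 50,
) -> dict:
    """Return a copy of tool_input with long free-text fields compressed."""
    if tool_name in PROTECTED_TOOLS:
        return dict(tool_input)  # file tools pass through untouched
    result = {}
    for key, value in tool_input.items():
        # A crude word count stands in for a real token count.
        is_long_text = isinstance(value, str) and len(value.split()) >= min_words
        if key in COMPRESSIBLE_FIELDS and is_long_text:
            result[key] = compress(value)
        else:
            result[key] = value
    return result
```

Checking the tool name before looking at any field keeps the safety guarantee independent of which fields a file tool happens to carry.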
Configure (src/defluff/integrations/claude_code_hook.py):
PRESET = "medium"
MIN_TOKENS = 50 # skip short values
VERBOSE = False # set True to log per-field savings
The PostToolUse hook (post_tool_hook.py) compresses large tool outputs before Claude reads them. This is the biggest savings opportunity, since tool outputs can be huge.
Supported tools:
| Tool | Compression |
|---|---|
| mcp__filesystem__* | ✅ Full output replacement |
| mcp__fetch__* | ✅ Full output replacement |
| mcp__brave_search__* | ✅ Full output replacement |
| mcp__github__* | ✅ Full output replacement |
| mcp__postgres__* | ✅ Full output replacement |
| mcp__*__* (any MCP) | ✅ Full output replacement |
| Bash | ✅ stdout compressed |
| Grep | ✅ output compressed |
| WebFetch | ✅ content compressed |
| WebSearch | ✅ results compressed |
| Glob, LS | ✅ output compressed |
| Read, Write, Edit | ❌ Never touched |
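One way to express the table above is a wildcard match on the tool name. The sketch below uses Python's standard `fnmatch` module. The pattern list mirrors the table, but the function name and the idea that the hook matches names this way are assumptions.

```python
from fnmatch import fnmatchcase

# Patterns copied from the table above; "mcp__*__*" covers any MCP tool.
COMPRESSED_TOOLS = ["mcp__*__*", "Bash", "Grep", "WebFetch", "WebSearch", "Glob", "LS"]
NEVER_TOUCHED = {"Read", "Write", "Edit"}


def should_compress_output(tool_name: str) -> bool:
    """Decide whether a tool's output is a candidate for compression."""
    if tool_name in NEVER_TOUCHED:
        return False
    return any(fnmatchcase(tool_name, pattern) for pattern in COMPRESSED_TOOLS)


print(should_compress_output("mcp__github__list_issues"))  # True
print(should_compress_output("Read"))                      # False
```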
Configure (src/defluff/integrations/post_tool_hook.py):
PRESET = "tool_output" # conservative preset safe for code/JSON
MIN_TOKENS = 100 # only compress outputs larger than this
SHOW_STATS = True
If you prefer to edit ~/.claude/settings.json yourself:
# Find your absolute path
cd /path/to/defluff && pwd
Open ~/.claude/settings.json (create it if it doesn't exist) and add the hooks block. Example with all three hooks:
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "python3 /Users/yourname/defluff/src/defluff/integrations/claude_hook.py"
          }
        ]
      }
    ],
    "PreToolUse": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "python3 /Users/yourname/defluff/src/defluff/integrations/claude_code_hook.py"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "python3 /Users/yourname/defluff/src/defluff/integrations/post_tool_hook.py"
          }
        ]
      }
    ]
  }
}
Restart Claude Code after saving.
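If you would rather script the manual edit than hand-merge JSON, the sketch below adds one hook entry in the shape shown above and leaves existing settings alone. It is not part of defluff: `register_hook` and `update_settings_file` are hypothetical helper names, and the sketch assumes settings.json is either missing, empty, or valid JSON.

```python
import json
from pathlib import Path


def register_hook(settings: dict, event: str, command: str) -> dict:
    """Add a command hook for `event` unless the same command is already present."""
    groups = settings.setdefault("hooks", {}).setdefault(event, [])
    for group in groups:
        if any(h.get("command") == command for h in group.get("hooks", [])):
            return settings  # already registered, nothing to do
    groups.append({"matcher": "", "hooks": [{"type": "command", "command": command}]})
    return settings


def update_settings_file(path: Path, event: str, command: str) -> None:
    """Load settings.json (or start empty), register the hook, and write it back."""
    text = path.read_text() if path.exists() else ""
    settings = json.loads(text) if text.strip() else {}
    register_hook(settings, event, command)
    path.write_text(json.dumps(settings, indent=2) + "\n")
```

Running it twice with the same command leaves a single entry, so it is safe to re-run after pulling updates.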
From this repo (development):
pip install -e .
With optional ML tokenizers:
pip install -e ".[ml]" # adds transformers + torch for HuggingFace tokenizers
Verify:
defluff --help
defluff compress          Compress text to reduce token usage
defluff count             Count tokens in text
defluff diff              Show diff between two files
defluff list-models       List supported models and pricing
defluff list-strategies   List all compression strategies
defluff stats             Show your all-time savings
defluff share             Generate a shareable savings summary
Compress a string:
defluff compress --text "I would like you to please explain in order to understand why this is important"
# Output:
# Please explain why this is important
# Before: 18 tokens | After: 7 tokens | Saved: 11 (61%)
Compress a file:
defluff compress prompt.txt
defluff compress prompt.txt --output compressed.txt
defluff compress prompt.txt --preset aggressive
Compress from stdin (pipe):
echo "Please note that in order to do this you should..." | defluff compress -
cat big_prompt.txt | defluff compress - --preset light
Compress and show diff:
defluff compress prompt.txt --diff
defluff compress prompt.txt --diff --diff-format side-by-side
Choose compression preset:
defluff compress prompt.txt --preset light # safe, whitespace only
defluff compress prompt.txt --preset medium # default, removes filler phrases
defluff compress prompt.txt --preset aggressive # maximum compression
Count tokens only (no compression):
defluff count --text "How many tokens is this sentence?"
defluff count prompt.txt --model claude-sonnet-4-6
defluff count prompt.txt --tokenizer anthropic
See all model prices:
defluff list-models
See all compression strategies:
defluff list-strategies
View your savings history:
defluff stats
defluff share # generate a shareable text summary
JSON output (for scripts):
defluff compress prompt.txt --output-json
# Outputs:
# {
# "original_tokens": 45,
# "compressed_tokens": 28,
# "tokens_saved": 17,
# "ratio": 37.8,
# "compressed_text": "..."
# }
Use defluff in your own Python code:
from defluff.pipeline import build_preset
from defluff.tokenizers import AnthropicCounter
# Build a pipeline
pipeline = build_preset("medium") # "light" | "medium" | "aggressive" | "tool_output"
# Compress text
result = pipeline.run("I would like you to please explain in order to understand this topic.")
print(result.compressed_text)
# → "Please explain this topic."
print(result.original_text)
# → "I would like you to please explain in order to understand this topic."
# See step-by-step what each compressor did
for step in result.steps:
    print(f"{step.name}: {len(step.text_before)} → {len(step.text_after)} chars")
Count tokens accurately:
from defluff.tokenizers import AnthropicCounter
counter = AnthropicCounter(model="claude-sonnet-4-6")
tokens = counter.count("Your prompt text here")
print(tokens) # e.g. 5
Available tokenizers:
from defluff.tokenizers import AnthropicCounter # uses Anthropic's token counting
# tiktoken counter also available (for OpenAI models)
Track savings:
from defluff.stats import record, summarize
# Record a compression
record(
    tokens_before=45,
    tokens_after=28,
    model="claude-sonnet-4-6",
    preset="medium",
    source="my-app",
)
# Get summary
summary = summarize()
print(f"Total saved: {summary['total_tokens_saved']:,} tokens")
print(f"Avg ratio: {summary['avg_ratio']}%")
Integration with Anthropic SDK:
import anthropic
from defluff.pipeline import build_preset
client = anthropic.Anthropic()
pipeline = build_preset("medium")
def chat(user_message: str) -> str:
    # Compress before sending
    result = pipeline.run(user_message)
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": result.compressed_text}]
    )
    return response.content[0].text
print(chat("I would like you to please explain in order to understand how sorting works."))
- Normalize line endings
- Remove trailing whitespace per line
- Collapse multiple blank lines → max 2
- Collapse multiple spaces → single space
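The four whitespace steps above can be sketched with the standard library alone. This is an illustration, not the package's own compressor. It reads "max 2" as "at most two consecutive blank lines", and `light_compress` is a hypothetical name.

```python
import re


def light_compress(text: str) -> str:
    """Whitespace-only cleanup following the four steps listed above."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Per line: strip trailing whitespace, then collapse runs of spaces.
    lines = [re.sub(r" {2,}", " ", line.rstrip()) for line in text.split("\n")]
    text = "\n".join(lines)
    # Allow at most two consecutive blank lines (three newlines in a row).
    return re.sub(r"\n{4,}", "\n\n\n", text)


print(repr(light_compress("a  b   \r\n\r\n\r\n\r\n\r\nc\t \n")))  # 'a b\n\n\nc\n'
```

Note that collapsing spaces this way also flattens leading indentation, so a sketch like this is only suitable for prose prompts, not for code.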
Everything in Light, plus:
| Removes | Replaces with |
|---|---|
| please note that | (nothing) |
| it is important to note that | (nothing) |
| in order to | to |
| due to the fact that | because |
| at this point in time | now |
| with respect to | regarding |
| in the event that | if |
| I would like you to | please |
| each and every | every |
| first and foremost | first |
| a large number of | many |
| Certainly! / Of course! / Absolutely! | (nothing) |
| As an AI language model | (nothing) |
| past history / future plans / free gift | history / plans / gift |
| (60+ more rules) | |
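A phrase table like the one above maps naturally onto case-insensitive regular expressions. The sketch below applies a handful of the listed rules. It is an assumption about how such rules could be applied, not the package's implementation, and it does not re-capitalize a sentence whose opening phrase was removed.

```python
import re

# A small subset of the rules from the table above.
RULES = [
    ("please note that", ""),
    ("in order to", "to"),
    ("due to the fact that", "because"),
    ("at this point in time", "now"),
    ("each and every", "every"),
]


def apply_rules(text: str) -> str:
    """Replace each filler phrase, then tidy up any doubled spaces."""
    for phrase, replacement in RULES:
        # \b keeps the match on whole words; IGNORECASE catches sentence starts.
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        text = pattern.sub(replacement, text)
    return re.sub(r" {2,}", " ", text).strip()


print(apply_rules(
    "We refactored the parser in order to fix a bug "
    "due to the fact that each and every release broke."
))
# We refactored the parser to fix a bug because every release broke.
```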
Everything in Medium, plus:
| Removes |
|---|
| very / really / quite / rather |
| basically / literally / actually / simply / just |
| totally / extremely / incredibly |
| end result / forward progress / complete opposite |
| I think that → I think |
| I believe that → I believe |
| there is no doubt that → certainly |
| it is clear that → *(removed)* |
| Site | URL |
|---|---|
| ChatGPT | chat.openai.com, chatgpt.com |
| Claude | claude.ai |
| Gemini | gemini.google.com |
| Microsoft Copilot | copilot.microsoft.com |
| Perplexity | perplexity.ai |
| Poe | poe.com |
| Mistral | chat.mistral.ai |
| HuggingFace Chat | huggingface.co/chat |
| Model | Price per 1M input tokens |
|---|---|
| Claude Opus 4 | $15.00 |
| Claude Sonnet 4 | $3.00 |
| Claude Haiku 4 | $0.80 |
| GPT-4o | $2.50 |
| GPT-4o mini | $0.15 |
| GPT-4 Turbo | $10.00 |
| Gemini 1.5 Pro | $3.50 |
| Gemini 1.5 Flash | $0.35 |
| Mistral Large | $8.00 |
| Custom | Your price |
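Cost savings follow directly from the per-million prices above: tokens saved, times price, divided by one million. A minimal sketch, using a few prices from the table; the function name and the `custom_price` parameter are hypothetical.

```python
from typing import Optional

# USD per 1M input tokens, copied from the table above (subset).
PRICE_PER_MILLION_USD = {
    "Claude Opus 4": 15.00,
    "Claude Sonnet 4": 3.00,
    "GPT-4o": 2.50,
    "GPT-4o mini": 0.15,
}


def cost_saved_usd(tokens_saved: int, model: str, custom_price: Optional[float] = None) -> float:
    """Convert saved input tokens into USD, using a custom price when one is given."""
    price = custom_price if custom_price is not None else PRICE_PER_MILLION_USD[model]
    return tokens_saved * price / 1_000_000


print(cost_saved_usd(1_000_000, "Claude Sonnet 4"))  # 3.0
```

At these prices a single prompt saves a fraction of a cent, so the figure only becomes meaningful when summed over many requests.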
Chrome extension badge doesn't appear
- Make sure the extension is enabled at chrome://extensions
- Refresh the page after loading the extension
- Check that "Show live token counter" is ON in Options
- The site may not be in the supported list; check host_permissions in manifest.json
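To check the last point without reading the manifest by eye, a short script can test a hostname against the `host_permissions` patterns. This is a rough sketch that covers only the common pattern shapes (`<all_urls>`, exact hosts, and `*.` subdomain wildcards). The function name is hypothetical and the sample manifest below is made up, not the real file's contents.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def host_is_permitted(manifest: dict, hostname: str) -> bool:
    """Return True if any host_permissions pattern covers the hostname."""
    for pattern in manifest.get("host_permissions", []):
        if pattern == "<all_urls>":
            return True
        # "*://" is not a parseable scheme, so substitute a real one first.
        host_pattern = urlsplit(pattern.replace("*://", "https://", 1)).netloc
        if fnmatchcase(hostname, host_pattern):
            return True
        # In Chrome match patterns, "*.example.com" also covers "example.com".
        if host_pattern.startswith("*.") and hostname == host_pattern[2:]:
            return True
    return False


# Hypothetical manifest fragment for illustration only.
sample = {"host_permissions": ["https://claude.ai/*", "https://*.perplexity.ai/*"]}
print(host_is_permitted(sample, "www.perplexity.ai"))  # True
print(host_is_permitted(sample, "example.com"))        # False
```

To run it against the real file, load it with `json.load(open("manifest.json"))` from the repo root and pass the result in place of `sample`.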
Extension not minimizing text on Claude.ai
Claude.ai uses ProseMirror (a rich text editor). If the minimize button applies but text doesn't change:
- Click directly inside the chat input first
- Try clicking Minimize again
- The extension uses execCommand('insertText'), which requires the input to be focused
Hook not running in Claude Code
# Check if hook is registered
bash hooks/install.sh --status
# Test the hook directly
bash hooks/install.sh --test "your prompt here"
# Check settings.json
cat ~/.claude/settings.json
Make sure the path in settings.json is an absolute path (starts with /).
Hook crashes / exits with error
The hooks are built to never crash Claude Code — they always exit 0 and fall back to passthrough on any error. If you see issues:
# Run the hook directly and check stderr
echo '{"hook_event_name":"UserPromptSubmit","prompt":"test prompt"}' \
| python3 src/defluff/integrations/claude_hook.py
Python package not found (ModuleNotFoundError: No module named 'defluff')
cd /path/to/defluff
pip install -e .
Or use the standalone hook (hooks/defluff_hook.py), which has zero dependencies.
pip not found
python3 -m pip install -e .
Compression changes meaning / breaks sentences
Lower the compression level: Options → Level → Light
Or exclude specific sites: Options → Excluded Sites → add the hostname.
Chrome Extension:
- Go to chrome://extensions
- Find Defluff → click Remove
Claude Code Hooks:
bash hooks/install.sh --uninstall
Or manually remove the hook entries from ~/.claude/settings.json.
Python CLI:
pip uninstall defluff
Stats data:
rm -rf ~/.defluff # CLI stats (JSON Lines file)
rm ~/.claude/defluff_stats.json # Hook stats
defluff/
├── manifest.json Chrome Extension manifest (MV3)
├── background/
│ └── service-worker.js Stats persistence, settings, message hub
├── content/
│ ├── content.js Injected into LLM sites — live counter + minimize UI
│ └── content.css Styles for the floating badge
├── popup/
│ ├── popup.html/js/css Toolbar popup — stats + quick controls
├── options/
│ ├── options.html/js/css Full settings page with live preview
├── utils/
│ ├── tokenizer.js Token counting (BPE approximation)
│ └── minimizer.js 60+ compression rules, 3 levels
├── hooks/
│ ├── defluff_hook.py Standalone hook (zero dependencies)
│ └── install.sh Automated hook installer/uninstaller
├── icons/
│ ├── icon16/48/128.png Extension icons
│ └── generate-icons.py Regenerate icons (no external deps)
└── src/defluff/ Python package
    ├── cli.py Typer CLI app
    ├── pipeline.py Compression pipeline + presets
    ├── stats.py JSON Lines stats tracker
    ├── tokenizers/ tiktoken + Anthropic token counters
    ├── compressors/ Individual compression strategies
    └── integrations/
        ├── claude_hook.py UserPromptSubmit hook
        ├── claude_code_hook.py PreToolUse hook
        └── post_tool_hook.py PostToolUse hook
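The tree lists stats.py as a JSON Lines tracker, meaning one JSON object per line, appended over time. The sketch below shows that storage pattern in isolation. The field names follow the `record()` arguments shown earlier, but the helper names, the file path, and the way the ratio is aggregated (overall tokens saved over overall tokens before) are assumptions rather than the package's behavior.

```python
import json
from pathlib import Path


def append_record(path: Path, tokens_before: int, tokens_after: int, **extra) -> None:
    """Append one compression event as a single JSON line."""
    record = {"tokens_before": tokens_before, "tokens_after": tokens_after, **extra}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def summarize_file(path: Path) -> dict:
    """Sum savings across every line in the file."""
    total_before = total_after = 0
    with path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            rec = json.loads(line)
            total_before += rec["tokens_before"]
            total_after += rec["tokens_after"]
    saved = total_before - total_after
    ratio = round(100 * saved / total_before, 1) if total_before else 0.0
    return {"total_tokens_saved": saved, "avg_ratio": ratio}
```

Appending a line per event avoids rewriting the whole file on every compression, which is the usual reason to pick JSON Lines for a log of this kind.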
# 1. Clone
git clone https://github.com/subrahmanyabhat/defluff.git
cd defluff
# 2. Chrome Extension: chrome://extensions → Developer mode → Load unpacked → select this folder
# 3. Claude Code hooks (automated)
bash hooks/install.sh
# 4. Python CLI
pip install -e .
defluff compress --text "I would like you to please explain in order to understand this"
That's it. Every prompt you send costs less from now on.