code-debloater

ast-driven structural code-bloat scanner powered by nvidia nim (deepseek v4 pro).

detects duplicate logic, lazy placeholders, ai-generated stubs, and todo debt — then auto-fixes them with production-grade code via nvidia's hosted inference api. no local models, no gpu needed.

this is a fork of zenapta/bloathunter by shashank bhardwaj. the original used ollama + llama 3 locally; this version uses nvidia nim (deepseek v4 pro) — no local llm required.

features

ast structural analysis — uses the typescript compiler api to normalize function bodies (strips variable names, formatting, string/number literals) so it finds true copy-pasted duplicates even when variables are renamed.
nvidia nim auto-fixes — connects to deepseek v4 pro via nvidia's hosted api (integrate.api.nvidia.com). free tier available. no local llm, no ollama, no gpu required.
smart placeholder detection — 30+ patterns for ai-generated stubs, lazy todos, unimplemented code, and hidden technical debt. catches generated-by-ai comments, "insert logic here", "not implemented" errors, and more.
duplicate refactoring strategies — for each duplicate cluster, gets deepseek v4 pro to generate a concrete extraction plan with import paths and shared function signatures.
health scoring — grades your codebase a–f with severity levels (low → critical) so you know where to focus.
dry-run mode — --dry-run shows colored diffs of what would change without touching files.
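
to make the health scoring idea concrete, here is a minimal sketch of how a score and letter grade could be derived from issue severities. the intermediate severity names, the penalty weights, and the grade cut-offs are all assumptions for illustration; the actual values used by the scorer are not documented here.

```typescript
// Hypothetical severity levels, weights, and cut-offs (not the tool's real values).
type Severity = "low" | "medium" | "high" | "critical";

const PENALTY: Record<Severity, number> = {
  low: 1,
  medium: 3,
  high: 7,
  critical: 15,
};

// Start from 100 and subtract a penalty per issue, never going below 0.
function healthScore(severities: Severity[]): number {
  const penalty = severities.reduce((sum, s) => sum + PENALTY[s], 0);
  return Math.max(0, 100 - penalty);
}

// Map a 0-100 score onto a letter grade.
function grade(score: number): "A" | "B" | "C" | "D" | "F" {
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}
```

a score computed this way is the kind of value a minimum such as `--threshold` could be compared against in ci.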

quick start

# set your nvidia api key
export NVIDIA_API_KEY=nvapi-...

# run it (scans current directory)
npx code-debloater

# or install globally
npm install -g code-debloater
code-debloater ./src

get a free api key

go to integrate.nvidia.com
sign up / log in
navigate to the api section and generate a free api key
deepseek v4 pro is available on the free tier with rate limits

usage

code-debloater [options] [directory]

options:
  --dry-run, --dry           preview fixes without writing
  --scan-only, --no-fix      audit only; skip ai fixes
  --yes, -y                  non-interactive auto-fix
  --verbose, -v              detailed per-file progress
  --json                     structured json output (ci)
  --output, -o <file>        write results to file
  --exclude, -x <patterns>   glob exclude patterns (comma-sep)
  --model, -m <name>         nim model (default: deepseek-ai/deepseek-v4-pro)
  --max-concurrent <n>       parallel nim requests (default: 3)
  --threshold <n>            minimum health score (0-100)
  --max-function-lines <n>   warn on functions over n lines (default: 60)
  --init                     scaffold .code-debloaterrc
  --version                  print version
  --help, -h                 show this help

environment:
  NVIDIA_API_KEY             required. get yours at https://integrate.nvidia.com
  CODE_DEBLOATER_MODEL       model override (same as --model)

config file (auto-loaded):
  .code-debloaterrc           project-specific settings (json)
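
settings can come from the config file, the environment, and cli flags. the sketch below shows one plausible way to merge them. the precedence order (defaults, then file, then environment, then cli) and the `Options` shape are assumptions, not confirmed behavior; only the default values are taken from the help text above.

```typescript
// Hypothetical option shape; the field names are illustrative only.
interface Options {
  model: string;
  maxConcurrent: number;
  maxFunctionLines: number;
}

// Defaults as documented in the help text.
const DEFAULTS: Options = {
  model: "deepseek-ai/deepseek-v4-pro",
  maxConcurrent: 3,
  maxFunctionLines: 60,
};

// Assumed precedence: later sources override earlier ones.
// Keys that are absent from a source leave the earlier value untouched.
function resolveOptions(
  file: Partial<Options>,
  env: Partial<Options>,
  cli: Partial<Options>,
): Options {
  return { ...DEFAULTS, ...file, ...env, ...cli };
}
```

for example, `resolveOptions({ maxConcurrent: 5 }, {}, { maxConcurrent: 8 })` yields `maxConcurrent: 8` under this assumed order, because the cli value is applied last.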

examples

# scan current directory interactively
code-debloater

# audit only — no fixes
code-debloater --scan-only ./src

# preview what would change
code-debloater --dry-run --verbose

# ci-friendly json report
code-debloater --json --output report.json

# skip test & vendor dirs
code-debloater --exclude "test/**,vendor/**"

# fast unattended fixes
code-debloater --yes --max-concurrent 5

# scaffold a config file
code-debloater --init

what it detects

| category | examples |
| --- | --- |
| lazy todos | `// TODO: implement later`, `// FIXME: needs work` |
| ai-generated stubs | `// generated by claude`, `// auto-generated stub` |
| incomplete code | `// insert logic here`, `// add your own code` |
| unimplemented | `throw new Error('Not implemented')`, `// needs implementation` |
| placeholders | `// your code goes here`, `// ... implement this` |
| structural duplicates | functions with identical ast bodies (variable names ignored) |
| oversized functions | functions exceeding `--max-function-lines` (default: 60) |
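
the placeholder categories above lend themselves to line-by-line regex matching. the following is a rough sketch of that approach, using only a handful of patterns derived from the examples listed above. the `Finding` type, the `scanComments` name, and the exact regexes are assumptions; the real scanner ships 30+ patterns that are not reproduced here.

```typescript
// Hypothetical result shape for one flagged line.
interface Finding {
  line: number; // 1-based line number
  category: string;
  text: string;
}

// Illustrative subset of patterns, based on the documented examples.
const PATTERNS: { category: string; re: RegExp }[] = [
  { category: "lazy todos", re: /\/\/\s*(TODO|FIXME)\b/i },
  { category: "ai-generated stubs", re: /\/\/\s*(generated by \w+|auto-generated stub)/i },
  { category: "incomplete code", re: /\/\/\s*(insert logic here|add your own code)/i },
  {
    category: "unimplemented",
    re: /throw new Error\(\s*['"]not implemented['"]\s*\)|\/\/\s*needs implementation/i,
  },
  { category: "placeholders", re: /\/\/\s*(your code goes here|\.\.\.\s*implement this)/i },
];

function scanComments(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    for (const { category, re } of PATTERNS) {
      if (re.test(text)) {
        findings.push({ line: index + 1, category, text: text.trim() });
        break; // report each line once, under the first matching category
      }
    }
  });
  return findings;
}
```

the patterns deliberately omit the `g` flag so that `re.test` stays stateless between lines.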

architecture

src/
├── index.ts                  # entry point + cli flag parsing
├── config.ts                 # config loader (.code-debloaterrc + env + cli)
├── ai/
│   └── nimConnector.ts       # nvidia nim api client with retry logic
├── cli/
│   ├── interface.ts          # terminal ui (spinners, tables, diffs)
│   └── output.ts             # json/csv formatters for ci
├── core/
│   ├── crawler.ts            # file discovery (gitignore-aware)
│   ├── fixer.ts              # parallel nim fix executor
│   ├── issueScorer.ts        # health scoring, grading, recommendations
│   └── scanners/
│       ├── astScanner.ts     # typescript ast function extraction + normalization
│       ├── astUtils.ts       # shared ast helpers (no source-code duplication)
│       ├── bloatScanner.ts   # oversized function detection
│       └── commentScanner.ts # regex-based placeholder/todo detection
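
the fixer is described as a parallel executor, and `--max-concurrent` caps how many nim requests run at once. a minimal sketch of such a cap is shown below. `runWithLimit` is a hypothetical helper written for illustration, not the actual `fixer.ts` api, and it leaves out the retry logic mentioned for the nim client.

```typescript
// Hypothetical helper: run async tasks with at most `limit` in flight,
// returning results in the same order as the input tasks.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker repeatedly claims the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++; // no race: JavaScript runs this synchronously
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

with a limit of 3, three workers start immediately and each picks up a new task as soon as its previous one settles. if any task rejects, the returned promise rejects as well, so a real executor would likely wrap each task in its own error handling.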

how duplicates work

extract — every function/method/arrow function is parsed from js/ts files using the typescript compiler api.
normalize — variable names become __id1, __id2, and so on; string literals become __str; numbers become 0. this strips cosmetic differences.
cluster — functions with identical normalized bodies are grouped.
report — clusters with 2+ members are reported as duplicates.
fix — deepseek v4 pro generates a refactoring strategy for each cluster.
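
the normalize and cluster steps above can be sketched without any dependencies. note that this is a token-level approximation written for illustration, not the typescript compiler api traversal the tool actually uses, and `normalizeBody`, `findDuplicates`, and the keyword list are hypothetical names and choices.

```typescript
// Token-level stand-in for AST normalization; all names here are hypothetical.
const KEYWORDS = new Set([
  "const", "let", "var", "return", "if", "else", "for", "while", "function",
  "new", "throw", "await", "async", "typeof", "true", "false", "null",
]);

function normalizeBody(body: string): string {
  // Strings, numbers, identifiers, whitespace, then any other single character.
  const token = /"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|\d+(?:\.\d+)?|[A-Za-z_$][\w$]*|\s+|./g;
  const ids = new Map<string, string>();
  const out: string[] = [];
  let previous = "";
  let match: RegExpExecArray | null;
  while ((match = token.exec(body)) !== null) {
    const text = match[0];
    if (/^\s+$/.test(text)) continue; // formatting is ignored
    let normalized = text;
    if (text[0] === '"' || text[0] === "'") {
      normalized = "__str";
    } else if (/^\d/.test(text)) {
      normalized = "0";
    } else if (/^[A-Za-z_$]/.test(text) && !KEYWORDS.has(text) && previous !== ".") {
      // Rename identifiers by order of first appearance; keep property names.
      if (!ids.has(text)) ids.set(text, `__id${ids.size + 1}`);
      normalized = ids.get(text)!;
    }
    out.push(normalized);
    previous = text;
  }
  return out.join(" ");
}

function findDuplicates(functions: { name: string; body: string }[]): string[][] {
  const clusters = new Map<string, string[]>();
  for (const fn of functions) {
    const key = normalizeBody(fn.body);
    const names = clusters.get(key) ?? [];
    names.push(fn.name);
    clusters.set(key, names);
  }
  // Only clusters with 2+ members count as duplicates.
  return Array.from(clusters.values()).filter((names) => names.length >= 2);
}
```

two bodies such as `const total = a + b; return total * 2;` and `const sum = x + y; return sum * 10;` normalize to the same string here, so they land in one cluster even though every name and number differs.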

compared to local models

| | ollama (local llama) | code-debloater (nvidia nim) |
| --- | --- | --- |
| gpu needed | yes (or very slow on cpu) | no |
| setup | install ollama, pull model | just set `NVIDIA_API_KEY` |
| speed | depends on hardware | ~1-3s per fix on nim |
| model | llama 3 (8b/70b) | deepseek v4 pro (moe, 200b+) |
| quality | decent for small fixes | production-grade code generation |
| cost | free (your electricity) | free tier available |
| context window | 8k–128k | 1m tokens |

development

# clone and build
git clone https://github.com/houseofmates/code-debloater
cd code-debloater
npm install
npm run build

# run locally
NVIDIA_API_KEY=nvapi-... npm run start -- ./test-sandbox

# test with dry run
NVIDIA_API_KEY=nvapi-... npm run start -- --dry-run ./test-sandbox

license

MIT license

credits

forked from zenapta/bloathunter by shashank bhardwaj (mit license). the original project deserves credit for the ast scanning architecture and the concept of structural duplicate detection. this fork replaces the local ollama/llama 3 backend with nvidia nim (deepseek v4 pro) and adds extensive new features.
