code-debloater

ast-driven structural code-bloat scanner powered by nvidia nim (deepseek v4 pro).

detects duplicate logic, lazy placeholders, ai-generated stubs, and todo debt — then auto-fixes them with production-grade code via nvidia's hosted inference api. no local models, no gpu needed.

this is a fork of zenapta/bloathunter by shashank bhardwaj. the original used ollama + llama 3 locally; this version uses nvidia nim (deepseek v4 pro) — no local llm required.


features

  • ast structural analysis — uses the typescript compiler api to normalize function bodies (strips variable names, formatting, string/number literals) so it finds true copy-pasted duplicates even when variables are renamed.
  • nvidia nim auto-fixes — connects to deepseek v4 pro via nvidia's hosted api (integrate.api.nvidia.com). free tier available. no local llm, no ollama, no gpu required.
  • smart placeholder detection — 30+ patterns for ai-generated stubs, lazy todos, unimplemented code, and hidden technical debt. catches generated-by-ai comments, "insert logic here", "not implemented" errors, and more.
  • duplicate refactoring strategies — for each duplicate cluster, gets deepseek v4 pro to generate a concrete extraction plan with import paths and shared function signatures.
  • health scoring — grades your codebase a–f with severity levels (low → critical) so you know where to focus.
  • dry-run mode: --dry-run shows colored diffs of what would change without touching files.
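
a rough sketch of how a 0-100 health score and an a–f grade could be derived from issue severities. the penalty weights, the intermediate "medium" and "high" levels, the grade bands, and the names `healthScore`, `grade`, and `passesThreshold` are all assumptions made for illustration; the real scorer may weigh things differently.

```typescript
// hypothetical severity scale: only "low" and "critical" are named in this readme.
type Severity = "low" | "medium" | "high" | "critical";

// assumed penalty weights, not the tool's actual values.
const PENALTY: Record<Severity, number> = { low: 1, medium: 3, high: 7, critical: 15 };

// start from 100 and subtract a penalty per issue, clamped at 0.
function healthScore(issues: Severity[]): number {
  const penalty = issues.reduce((sum, s) => sum + PENALTY[s], 0);
  return Math.max(0, 100 - penalty);
}

// assumed grade bands: 90+ is A, 80+ is B, 70+ is C, 60+ is D, anything lower is F.
function grade(score: number): string {
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}

// mirrors the idea behind --threshold: fail when the score drops below a minimum.
function passesThreshold(score: number, threshold: number): boolean {
  return score >= threshold;
}
```

with these assumed weights, one low and one critical issue give a score of 84, which grades as B and would fail a threshold of 90.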

quick start

# set your nvidia api key
export NVIDIA_API_KEY=nvapi-...

# run it (scans current directory)
npx code-debloater

# or install globally
npm install -g code-debloater
code-debloater ./src

get a free api key

  1. go to integrate.nvidia.com
  2. sign up / log in
  3. navigate to the api section and generate a free api key
  4. deepseek v4 pro is available on the free tier with rate limits

usage

code-debloater [options] [directory]

options:
  --dry-run, --dry           preview fixes without writing
  --scan-only, --no-fix      audit only; skip ai fixes
  --yes, -y                  non-interactive auto-fix
  --verbose, -v              detailed per-file progress
  --json                     structured json output (ci)
  --output, -o <file>        write results to file
  --exclude, -x <patterns>   glob exclude patterns (comma-sep)
  --model, -m <name>         nim model (default: deepseek-ai/deepseek-v4-pro)
  --max-concurrent <n>       parallel nim requests (default: 3)
  --threshold <n>            minimum health score (0-100)
  --max-function-lines <n>   warn on functions over n lines (default: 60)
  --init                     scaffold .code-debloaterrc
  --version                  print version
  --help, -h                 show this help

environment:
  NVIDIA_API_KEY             required. get yours at https://integrate.nvidia.com
  CODE_DEBLOATER_MODEL       model override (same as --model)

config file (auto-loaded):
  .code-debloaterrc           project-specific settings (json)
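
to illustrate what a cap like --max-concurrent means in practice, here is a minimal bounded-concurrency runner. it is a sketch only: `runWithLimit` is a hypothetical helper, not the tool's fixer, and it assumes `limit` is at least 1.

```typescript
// run `worker` over `items` with at most `limit` calls in flight at once.
// results keep the same order as the input items.
async function runWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // each lane claims items one at a time until none are left.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

with a limit of 3 (the documented default), a fourth request only starts once one of the first three has finished.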

examples

# scan current directory interactively
code-debloater

# audit only — no fixes
code-debloater --scan-only ./src

# preview what would change
code-debloater --dry-run --verbose

# ci-friendly json report
code-debloater --json --output report.json

# skip test & vendor dirs
code-debloater --exclude "test/**,vendor/**"

# fast unattended fixes
code-debloater --yes --max-concurrent 5

# scaffold a config file
code-debloater --init

what it detects

category               examples
lazy todos             // TODO: implement later, // FIXME: needs work
ai-generated stubs     // generated by claude, // auto-generated stub
incomplete code        // insert logic here, // add your own code
unimplemented          throw new Error('Not implemented'), // needs implementation
placeholders           // your code goes here, // ... implement this
structural duplicates  functions with identical ast bodies (variable names ignored)
oversized functions    functions exceeding --max-function-lines (default: 60)
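
the comment-based categories above lend themselves to a line-by-line regex pass. the sketch below covers only the examples listed in the table; the pattern list, the `PlaceholderHit` shape, and `scanComments` are hypothetical and far smaller than the 30+ patterns the tool is described as shipping.

```typescript
interface PlaceholderHit {
  line: number; // 1-based line number
  category: string;
  text: string; // trimmed source line
}

// assumed patterns, derived from the examples in the table above.
const PATTERNS: { category: string; re: RegExp }[] = [
  { category: "lazy todo", re: /\/\/\s*(TODO|FIXME)\b/i },
  { category: "ai-generated stub", re: /\/\/\s*(generated by \w+|auto-generated stub)/i },
  { category: "incomplete code", re: /\/\/\s*(insert logic here|add your own code)/i },
  { category: "unimplemented", re: /throw new Error\((['"])not implemented\1\)|\/\/\s*needs implementation/i },
  { category: "placeholder", re: /\/\/\s*(your code goes here|\.\.\.\s*implement this)/i },
];

// check every line against the patterns and record the first category that matches.
function scanComments(source: string): PlaceholderHit[] {
  const hits: PlaceholderHit[] = [];
  source.split(/\r?\n/).forEach((text, i) => {
    for (const { category, re } of PATTERNS) {
      if (re.test(text)) {
        hits.push({ line: i + 1, category, text: text.trim() });
        break; // first matching category wins
      }
    }
  });
  return hits;
}
```

a regex pass like this is cheap but purely textual, so it will also flag matching text inside string literals.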

architecture

src/
├── index.ts                  # entry point + cli flag parsing
├── config.ts                 # config loader (.code-debloaterrc + env + cli)
├── ai/
│   └── nimConnector.ts       # nvidia nim api client with retry logic
├── cli/
│   ├── interface.ts          # terminal ui (spinners, tables, diffs)
│   └── output.ts             # json/csv formatters for ci
├── core/
│   ├── crawler.ts            # file discovery (gitignore-aware)
│   ├── fixer.ts              # parallel nim fix executor
│   ├── issueScorer.ts        # health scoring, grading, recommendations
│   └── scanners/
│       ├── astScanner.ts     # typescript ast function extraction + normalization
│       ├── astUtils.ts       # shared ast helpers (no source-code duplication)
│       ├── bloatScanner.ts   # oversized function detection
│       └── commentScanner.ts # regex-based placeholder/todo detection
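
for orientation, a nim client with retry logic might look roughly like the sketch below. it assumes nvidia's openai-compatible chat completions route at integrate.api.nvidia.com/v1/chat/completions and the usual `choices[0].message.content` response shape. `requestFix`, `FetchLike`, the retry-on-429/5xx policy, and the backoff values are hypothetical, not taken from nimConnector.ts.

```typescript
// minimal fetch-like signature so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<any> }>;

// send one prompt to the model, retrying rate limits and server errors
// with exponential backoff (baseDelayMs, then 2x, 4x, ...).
async function requestFix(
  prompt: string,
  apiKey: string,
  fetchImpl: FetchLike,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<string> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl("https://integrate.api.nvidia.com/v1/chat/completions", {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "deepseek-ai/deepseek-v4-pro", // default model named in the usage section
        messages: [{ role: "user", content: prompt }],
      }),
    });
    if (res.ok) {
      const data = await res.json();
      return data.choices[0].message.content;
    }
    // assumed policy: only 429 and 5xx are worth retrying.
    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable || attempt >= maxRetries) {
      throw new Error(`NIM request failed with status ${res.status}`);
    }
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
}
```

on node 18+ the global `fetch` can be passed as `fetchImpl`, with the key read from NVIDIA_API_KEY.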

how duplicates work

  1. extract — every function/method/arrow function is parsed from js/ts files using the typescript compiler api.
  2. normalize — variable names → __id1, __id2, string literals → __str, numbers → 0. this strips cosmetic differences.
  3. cluster — functions with identical normalized bodies are grouped.
  4. report — clusters with 2+ members are reported as duplicates.
  5. fix — deepseek v4 pro generates a refactoring strategy for each cluster.
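
the normalize and cluster steps can be sketched as follows. note the swap: to stay dependency-free, this illustration uses a small regex tokenizer instead of the typescript compiler api, so it is cruder than real ast normalization (it ignores comments and renames object-literal keys, for example). `normalizeBody`, `clusterDuplicates`, `FnInfo`, and the keyword list are hypothetical names.

```typescript
// words that must survive normalization unchanged (deliberately incomplete).
const KEYWORDS = new Set([
  "const", "let", "var", "function", "return", "if", "else", "for", "while",
  "of", "in", "new", "throw", "true", "false", "null", "undefined", "typeof",
  "await", "async",
]);

// replace identifiers with __id1, __id2, ..., strings with __str, numbers with 0,
// and drop whitespace so formatting differences disappear.
function normalizeBody(body: string): string {
  const re = /"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|`(?:[^`\\]|\\.)*`|\d+(?:\.\d+)?|[A-Za-z_$][\w$]*|\s+|[\s\S]/g;
  const ids = new Map<string, string>();
  const out: string[] = [];
  let prev = "";
  let m: RegExpExecArray | null;
  while ((m = re.exec(body)) !== null) {
    const tok = m[0];
    if (/^\s+$/.test(tok)) continue;
    let norm = tok;
    if (/^["'`]/.test(tok)) {
      norm = "__str";
    } else if (/^\d/.test(tok)) {
      norm = "0";
    } else if (/^[A-Za-z_$]/.test(tok) && !KEYWORDS.has(tok) && prev !== ".") {
      // property names after "." are kept; everything else is renamed in order of first use.
      if (!ids.has(tok)) ids.set(tok, `__id${ids.size + 1}`);
      norm = ids.get(tok)!;
    }
    out.push(norm);
    prev = tok;
  }
  return out.join(" ");
}

interface FnInfo {
  name: string;
  body: string;
}

// group functions by normalized body and keep only groups with 2+ members.
function clusterDuplicates(fns: FnInfo[]): string[][] {
  const groups = new Map<string, string[]>();
  for (const fn of fns) {
    const key = normalizeBody(fn.body);
    const members = groups.get(key) ?? [];
    members.push(fn.name);
    groups.set(key, members);
  }
  return Array.from(groups.values()).filter((g) => g.length >= 2);
}
```

two loops that sum a `.price` field under different variable names normalize to the same string and land in one cluster, while an unrelated function stays out of the report.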

compared to local models

                ollama (local llama)        code-debloater (nvidia nim)
gpu needed      yes (or very slow on cpu)   no
setup           install ollama, pull model  just set NVIDIA_API_KEY
speed           depends on hardware         ~1-3s per fix on nim
model           llama 3 (8b/70b)            deepseek v4 pro (moe, 200b+)
quality         decent for small fixes      production-grade code generation
cost            free (your electricity)     free tier available
context window  8k–128k                     1m tokens
development

# clone and build
git clone https://github.com/houseofmates/code-debloater
cd code-debloater
npm install
npm run build

# run locally
NVIDIA_API_KEY=nvapi-... npm run start -- ./test-sandbox

# test with dry run
NVIDIA_API_KEY=nvapi-... npm run start -- --dry-run ./test-sandbox

license

credits

forked from zenapta/bloathunter by shashank bhardwaj (mit license). the original project deserves credit for the ast scanning architecture and the concept of structural duplicate detection. this fork replaces the local ollama/llama 3 backend with nvidia nim (deepseek v4 pro) and adds extensive new features.
