"Every legacy codebase tells a story. Sometimes it's a tragedy, sometimes it's a comedy. Usually it's both."
A powerful CLI tool for analyzing legacy codebases, measuring technical debt, and generating beautiful HTML reports with AI-powered insights.
- Multi-language Support - Automatically detects Python, JavaScript, TypeScript, Go, Java, and more
- Code Health Analysis - Cyclomatic complexity, dead code detection, security scanning
- Dependency Health - Vulnerability detection, outdated package identification
- Tech Stack Detection - Framework identification, era determination (classic vs modern)
- Technical Debt Heatmap - Visual representation of problem density by file
- AI-Powered Reports - LLM-generated summaries, refactoring priorities, migration paths
# Clone the repository
git clone https://github.com/code-archaeologist/code-archaeologist.git
cd code-archaeologist
# Install with pip
pip install -e .
# Or install with full features
pip install -e ".[full]"# Analyze a project (output: ./<project>-analysis-<timestamp>.html)
code-archaeologist ./my-legacy-project
# With verbose output
code-archaeologist ./my-legacy-project -v
# Skip LLM analysis (offline mode)
code-archaeologist ./my-legacy-project --no-llm
# Specify output path
code-archaeologist ./my-legacy-project -o ./reports/analysis.htmlcode-archaeologist <project_path> [OPTIONS]
Options:
-o, --output PATH Output report path (default: ./archaeology-report.html)
--llm-backend TEXT LLM backend: openai, ollama, groq, deepseek, azure
--llm-api-key TEXT API key for LLM
--llm-model TEXT Model name (default: gpt-4)
--no-llm Skip LLM analysis
--exclude PATHS Comma-separated directories to exclude
--config PATH Config file path
-v, --verbose Verbose output
Create a config.json file for persistent configuration:
{
"exclude_patterns": [".git", "node_modules", "__pycache__"],
"llm": {
"backend": "openai",
"model": "gpt-4",
"api_key": "your-key-here"
},
"analysis": {
"max_file_size_kb": 1000,
"timeout_seconds": 300
}
}Or use environment variables:
LLM_BACKENDLLM_API_KEYLLM_MODELLLM_BASE_URLOUTPUT_PATH
code-archaeologist ./project --llm-backend openai --llm-api-key sk-xxx --llm-model gpt-4# First, install and start Ollama
brew install ollama
ollama serve
# Pull a model (e.g., llama3, mistral, codellama)
ollama pull llama3
# Run analysis with Ollama
code-archaeologist ./project --llm-backend ollama --llm-model llama3Ollama requires no API key - it runs locally on http://localhost:11434/v1.
code-archaeologist ./project --llm-backend groq --llm-api-key gsk_xxx --llm-model llama-3.1-70b-versatile
code-archaeologist ./project --llm-backend deepseek --llm-api-key sk-xxx --llm-model deepseek-chatexport LLM_BACKEND=ollama
export LLM_MODEL=llama3
code-archaeologist ./project| Language | Complexity | Dead Code | Security |
|---|---|---|---|
| Python | ✅ radon | ✅ vulture | ✅ bandit |
| JavaScript/TypeScript | ✅ ESLint | ||
| Go | ❌ | ||
| Other | ❌ | ❌ |
Core dependencies (installed automatically):
click- CLI frameworkjinja2- HTML templatingpygments- Syntax highlighting & language detectionradon- Python complexity analysisrich- Terminal output formatting
Optional dependencies (for full analysis):
vulture- Dead code detectionbandit- Security scanningsafety/pip-audit- Dependency vulnerabilitiesopenai- LLM integration
- Overall health score (0-100)
- Language distribution chart
- Complexity breakdown
- Issue categorization
- Detected frameworks (Django, React, etc.)
- Era determination (Classic/Modern)
- Architecture patterns (MVC, Microservices, etc.)
- File-by-file problem density
- Color-coded severity
- Click to see details
## Technical Debt Summary
This project is like an abandoned theme park from the 90s - the roller coasters
are rusty, the cotton candy machines are breeding something suspicious, but
you can still smell the nostalgia in the air. The code complexity has broken
through the stratosphere, and those untouchable legacy modules are ticking
time bombs...
[Read the full AI analysis report]
- Onboarding - Quickly understand a new codebase
- Due Diligence - Assess technical health before acquisitions
- Planning - Prioritize refactoring efforts
- Monitoring - Track technical debt over time
- Team Communication - Share health reports with stakeholders
MIT
Contributions welcome! Please read the contributing guidelines first.