git-forensics

A TypeScript library for providing insights from git commit history.

Features

Actionable insights
Fast: ~700ms to analyze 100,000 commits (fetching the git log itself is the slow part)
Follows file renames and removals
Optimized for CI
Built-in (very basic) code complexity engine
Bring your own code complexity score
Add custom metrics using full temporal history

Motivation

This project is part exploration, after running into these questions and searching online for good ways to get insights from the "meta" of coding, and part necessity: I needed such a library for other tools I plan to build. There are some great tools that analyze Git history and produce insights and reports (git-of-theseus, code-maat, git-fame, git-quick-stats, MergeStat, Hercules, gitinspector), but they "feel" heavy and are not well suited to serve as the backend for a dev tool.

Finally, while it is tempting, I would not recommend running this on years of commits; focus on recent history instead (6-9 months of work). You could analyze 20 years of code, and because this library follows renames the history would still make sense, but much of the data may get polluted by such a long history.

Installation

npm install git-forensics

Quick Start

import { simpleGit } from 'simple-git';
import { computeForensics } from 'git-forensics';

const git = simpleGit('/path/to/repo');
const forensics = await computeForensics(git);

forensics.hotspots; // Files changed most often
forensics.churn; // Code volatility (lines added/deleted)
forensics.coupledPairs; // Hidden dependencies
forensics.couplingRankings; // Architectural hubs
forensics.codeAge; // Stale code detection
forensics.ownership; // Knowledge silos
forensics.communication; // Developer coordination needs

Example Output

Running computeForensics on a repository returns structured data across all metrics:

{
  "analyzedCommits": 842,
  "dateRange": { "from": "2024-03-10", "to": "2025-01-15" },
  "metadata": {
    "maxCommitsAnalyzed": 1000,
    "topN": 50,
    "totalFilesAnalyzed": 134,
    "totalAuthors": 12,
    "analyzedAt": "2025-01-20T14:32:00Z",
  },

  // Files changed most often — where bugs likely hide
  "hotspots": [
    { "file": "src/api/routes.ts", "revisions": 87, "exists": true },
    { "file": "src/core/engine.ts", "revisions": 64, "exists": true },
    { "file": "src/utils/helpers.ts", "revisions": 41, "exists": true },
  ],

  // Files that always change together — hidden dependencies
  "coupledPairs": [
    {
      "file1": "src/api/routes.ts",
      "file2": "src/api/middleware.ts",
      "couplingPercent": 82,
      "coChanges": 34,
      "file1Exists": true,
      "file2Exists": true,
    },
    {
      "file1": "src/db/schema.ts",
      "file2": "src/db/migrations.ts",
      "couplingPercent": 91,
      "coChanges": 22,
      "file1Exists": true,
      "file2Exists": true,
    },
  ],

  // Architectural hubs — files coupled to many others
  "couplingRankings": [
    { "file": "src/api/routes.ts", "couplingScore": 8, "exists": true },
    { "file": "src/core/engine.ts", "couplingScore": 5, "exists": true },
  ],

  // Stale code — unchanged for a long time
  "codeAge": [
    {
      "file": "src/legacy/parser.ts",
      "ageMonths": 14,
      "lastModified": "2023-11-02",
      "exists": true,
    },
    {
      "file": "src/utils/constants.ts",
      "ageMonths": 9,
      "lastModified": "2024-04-15",
      "exists": true,
    },
  ],

  // Knowledge silos — who owns what
  "ownership": [
    {
      "file": "src/core/engine.ts",
      "mainDev": "alice",
      "ownershipPercent": 34,
      "refactoringDev": "bob",
      "refactoringOwnership": 28,
      "fractalValue": 0.18,
      "authorCount": 7,
      "exists": true,
    },
    {
      "file": "src/api/routes.ts",
      "mainDev": "carol",
      "ownershipPercent": 62,
      "refactoringDev": "carol",
      "refactoringOwnership": 55,
      "fractalValue": 0.52,
      "authorCount": 4,
      "exists": true,
    },
  ],

  // Code volatility — lines added and deleted
  "churn": [
    {
      "file": "src/core/engine.ts",
      "added": 3200,
      "deleted": 1800,
      "churn": 5000,
      "revisions": 64,
      "exists": true,
    },
    {
      "file": "src/api/routes.ts",
      "added": 1400,
      "deleted": 600,
      "churn": 2000,
      "revisions": 87,
      "exists": true,
    },
  ],

  // Developer coordination needs (Conway's Law)
  "communication": [
    { "author1": "alice", "author2": "bob", "sharedEntities": 12, "strength": 67 },
    { "author1": "carol", "author2": "alice", "sharedEntities": 8, "strength": 45 },
  ],
}
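Because every metric is plain data, you can combine them yourself. The sketch below uses a hypothetical helper, `findHubHotspots` (not part of git-forensics), that intersects `hotspots` and `couplingRankings` to surface files that are both changed often and coupled to many others. The `Hotspot` and `CouplingRanking` types are assumptions modeled on the example output above, not the library's exported types, and the default cutoff of 5 simply mirrors the `couplingRankings` threshold listed under Insight thresholds.

```typescript
// Assumed shapes, mirroring the example output (not the library's exported types).
interface Hotspot {
  file: string;
  revisions: number;
  exists: boolean;
}

interface CouplingRanking {
  file: string;
  couplingScore: number;
  exists: boolean;
}

interface HubHotspot {
  file: string;
  revisions: number;
  couplingScore: number;
}

// Hypothetical helper: files present in both lists that still exist in the repo,
// ordered by revisions, then by coupling score.
function findHubHotspots(
  hotspots: Hotspot[],
  rankings: CouplingRanking[],
  minCouplingScore = 5,
): HubHotspot[] {
  const scoreByFile = new Map<string, number>();
  for (const r of rankings) {
    if (r.exists) scoreByFile.set(r.file, r.couplingScore);
  }

  const result: HubHotspot[] = [];
  for (const h of hotspots) {
    const couplingScore = scoreByFile.get(h.file);
    if (h.exists && couplingScore !== undefined && couplingScore >= minCouplingScore) {
      result.push({ file: h.file, revisions: h.revisions, couplingScore });
    }
  }
  return result.sort(
    (a, b) => b.revisions - a.revisions || b.couplingScore - a.couplingScore,
  );
}

// Usage with values from the example output.
const hubs = findHubHotspots(
  [
    { file: "src/api/routes.ts", revisions: 87, exists: true },
    { file: "src/core/engine.ts", revisions: 64, exists: true },
    { file: "src/utils/helpers.ts", revisions: 41, exists: true },
  ],
  [
    { file: "src/api/routes.ts", couplingScore: 8, exists: true },
    { file: "src/core/engine.ts", couplingScore: 5, exists: true },
  ],
);
console.log(hubs.map((h) => h.file).join(", ")); // src/api/routes.ts, src/core/engine.ts
```

In a real run you would pass `forensics.hotspots` and `forensics.couplingRankings` instead of the literal arrays.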

Passing the result to generateInsights produces actionable alerts:

[
  {
    "file": "src/core/engine.ts",
    "type": "hotspot",
    "severity": "critical",
    "data": { "type": "hotspot", "revisions": 64, "rank": 2 },
    "fragments": {
      "title": "Hotspot",
      "finding": "64 revisions, ranked #2 in repository",
      "risk": "Top-ranked churn file — prioritize for refactoring or test hardening",
      "suggestion": "Consider breaking into smaller modules or adding test coverage",
    },
  },
  {
    "file": "src/core/engine.ts",
    "type": "high-churn",
    "severity": "critical",
    "data": { "type": "high-churn", "churn": 5000, "added": 3200, "deleted": 1800 },
    "fragments": {
      "title": "High Churn",
      "finding": "5,000 lines changed (+3,200 / -1,800)",
      "risk": "Frequent rewrites suggest unclear requirements or architectural friction",
      "suggestion": "Consider refactoring to stabilize this file",
    },
  },
  {
    "file": "src/core/engine.ts",
    "type": "ownership-risk",
    "severity": "critical",
    "data": {
      "type": "ownership-risk",
      "fractalValue": 0.18,
      "authorCount": 7,
      "mainDev": "alice",
    },
    "fragments": {
      "title": "Fragmented Ownership",
      "finding": "7 contributors, fragmentation score 0.18",
      "risk": "Diffuse ownership slows review cycles and increases merge conflicts",
      "suggestion": "Request review from alice (primary contributor)",
    },
  },
  {
    "file": "src/legacy/parser.ts",
    "type": "stale-code",
    "severity": "info",
    "data": { "type": "stale-code", "ageMonths": 14, "lastModified": "2023-11-02" },
    "fragments": {
      "title": "Stale Code",
      "finding": "Unchanged for 14 months (since Nov 2023)",
      "risk": "Untouched code drifts from current conventions and loses institutional knowledge",
      "suggestion": "Extra review recommended; verify tests still cover this code",
    },
  },
]

Actionable Insights

Raw metrics are useful, but generateInsights transforms them into actionable alerts with human-readable messages:

import { computeForensics, generateInsights } from 'git-forensics';

const forensics = await computeForensics(git);
const insights = generateInsights(forensics);

for (const insight of insights) {
  console.log(`${insight.file} — ${insight.fragments.title}`);
  console.log(`  ${insight.fragments.finding}`);
  console.log(`  ${insight.fragments.suggestion}`);
}
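When several insights hit the same file (as with `src/core/engine.ts` in the example output), grouping them keeps a report readable. The sketch below is a hypothetical helper, `groupByFile`, not part of git-forensics. The `Insight` type covers only the fields used here and is an assumption based on the example output, as is the ordering `critical` > `warning` > `info`.

```typescript
type Severity = "info" | "warning" | "critical";

// Minimal assumed shape of an insight (only the fields this helper reads).
interface Insight {
  file: string;
  type: string;
  severity: Severity;
  fragments: { title: string; finding: string; risk: string; suggestion: string };
}

// Assumed ordering: a higher number means more severe.
const SEVERITY_RANK: Record<Severity, number> = { info: 0, warning: 1, critical: 2 };

// Hypothetical helper: group insights per file, most severe first within each file,
// and order files by their most severe insight.
function groupByFile(insights: Insight[]): Map<string, Insight[]> {
  const groups = new Map<string, Insight[]>();
  for (const insight of insights) {
    const list = groups.get(insight.file) ?? [];
    list.push(insight);
    groups.set(insight.file, list);
  }
  groups.forEach((list) => {
    list.sort((a, b) => SEVERITY_RANK[b.severity] - SEVERITY_RANK[a.severity]);
  });
  // Each list is sorted, so index 0 holds that file's most severe insight.
  const ordered = Array.from(groups.entries()).sort(
    ([, a], [, b]) => SEVERITY_RANK[b[0].severity] - SEVERITY_RANK[a[0].severity],
  );
  return new Map(ordered);
}

// Usage (sketch):
// groupByFile(insights).forEach((list, file) => {
//   console.log(file);
//   for (const i of list) console.log(`  [${i.severity}] ${i.fragments.title}`);
// });
```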

Each insight includes severity (info, warning, critical) and pre-composed fragments:

insight.severity; // "info" | "warning" | "critical"
insight.fragments.title; // "Hotspot"
insight.fragments.finding; // "45 revisions, ranked #3 in repository"
insight.fragments.risk; // "Frequently changed files correlate with higher defect rates"
insight.fragments.suggestion; // "Consider breaking into smaller modules..."

Insight thresholds

Question	Metric	Insight triggers when
Where's the riskiest code?	`hotspots`	≥25 revisions
What keeps getting rewritten?	`churn`	≥1000 lines churned
What hidden dependencies exist?	`coupledPairs`	≥70% co-change rate
What has ripple effects?	`couplingRankings`	Coupled to ≥5 files
What's been forgotten?	`codeAge`	Unchanged ≥12 months
Who owns what? Any knowledge silos?	`ownership`	≥3 authors, fragmented
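The numeric rules in the table can be approximated against raw metrics, for example to experiment with cutoffs for your own repo. This is a hypothetical sketch that only mirrors the table; it is not how git-forensics implements its checks, and it omits the ownership rule because the table does not define "fragmented" precisely. The `FileMetrics` type and the two coupling labels (`coupled-pair`, `coupling-hub`) are made up for this example; `hotspot`, `high-churn` and `stale-code` are the type names seen in the example output.

```typescript
// Hypothetical per-file view assembled from the forensics arrays.
interface FileMetrics {
  revisions?: number; // from hotspots
  churn?: number; // from churn
  maxCouplingPercent?: number; // highest couplingPercent among coupledPairs
  couplingScore?: number; // from couplingRankings
  ageMonths?: number; // from codeAge
}

// Cutoffs copied from the threshold table; adjust per repository.
const THRESHOLDS = {
  hotspotRevisions: 25,
  churnLines: 1000,
  couplingPercent: 70,
  couplingScore: 5,
  staleMonths: 12,
};

// Hypothetical helper: which insight types would the table's rules trigger for one file?
function triggeredTypes(m: FileMetrics, t = THRESHOLDS): string[] {
  const types: string[] = [];
  if ((m.revisions ?? 0) >= t.hotspotRevisions) types.push("hotspot");
  if ((m.churn ?? 0) >= t.churnLines) types.push("high-churn");
  if ((m.maxCouplingPercent ?? 0) >= t.couplingPercent) types.push("coupled-pair"); // made-up label
  if ((m.couplingScore ?? 0) >= t.couplingScore) types.push("coupling-hub"); // made-up label
  if ((m.ageMonths ?? 0) >= t.staleMonths) types.push("stale-code");
  return types;
}

// Usage with src/core/engine.ts from the example output:
// triggeredTypes({ revisions: 64, churn: 5000, couplingScore: 5 })
```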

Build your own insights

The forensics.stats field contains the complete temporal history—every commit, by every author, for every file:

const forensics = await computeForensics(git);

// Access raw stats for custom analysis
for (const [file, fileStats] of Object.entries(forensics.stats.fileStats)) {
  // fileStats.byAuthor: Record<author, CommitEntry[]>
  // fileStats.authorContributions: Record<author, {additions, deletions, revisions}>
  // fileStats.totalRevisions, latestCommit, nameHistory, couplingScore
}
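As a small example of working with `stats.fileStats`, the sketch below totals each author's files, revisions and changed lines across the whole repository from `authorContributions`. The `FileStats` type covers only that one field and is an assumption based on the comments above; `totalsByAuthor` is a hypothetical helper.

```typescript
// Assumed shapes, based on the field comments above (only the part used here).
interface AuthorContribution {
  additions: number;
  deletions: number;
  revisions: number;
}

interface FileStats {
  authorContributions: Record<string, AuthorContribution>;
}

interface AuthorTotals {
  files: number;
  revisions: number;
  linesChanged: number;
}

// Hypothetical helper: per-author totals across every analyzed file.
function totalsByAuthor(fileStats: Record<string, FileStats>): Map<string, AuthorTotals> {
  const totals = new Map<string, AuthorTotals>();
  for (const stats of Object.values(fileStats)) {
    for (const [author, c] of Object.entries(stats.authorContributions)) {
      const t = totals.get(author) ?? { files: 0, revisions: 0, linesChanged: 0 };
      t.files += 1;
      t.revisions += c.revisions;
      t.linesChanged += c.additions + c.deletions;
      totals.set(author, t);
    }
  }
  return totals;
}

// Usage (sketch): totalsByAuthor(forensics.stats.fileStats)
```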

Ideas for custom metrics:

Temporal histograms — Activity by week/month, burst detection (use CommitEntry.date)
Author expertise scores — Weight recent changes higher (use byAuthor + recency)
Structural stability — Files that move often signal architectural churn (use nameHistory)
Churn velocity — Is volatility increasing or stabilizing? (use CommitEntry.additions/deletions)
Handoff detection — Who leaves code for whom to modify? (use sequential authors in byAuthor)
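A minimal sketch of the first idea, a monthly activity histogram for one file. It assumes `byAuthor` maps author names to `CommitEntry` arrays, as in the field comments above, and that `CommitEntry.date` is either an ISO-8601 string (the format used by the data-driven API) or a `Date`; the trimmed `CommitEntry` type and the `monthlyHistogram` helper are assumptions for illustration.

```typescript
// Assumed minimal shape; the real CommitEntry has more fields.
interface CommitEntry {
  date: string | Date;
}

// Hypothetical helper: commits per calendar month (UTC), keyed "YYYY-MM", oldest first.
// Months without commits are simply absent from the result.
function monthlyHistogram(byAuthor: Record<string, CommitEntry[]>): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const entries of Object.values(byAuthor)) {
    for (const entry of entries) {
      const d = new Date(entry.date);
      if (Number.isNaN(d.getTime())) continue; // skip unparseable dates
      const key = `${d.getUTCFullYear()}-${String(d.getUTCMonth() + 1).padStart(2, "0")}`;
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  // "YYYY-MM" keys sort chronologically as plain strings.
  return Array.from(counts.entries()).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}

// Usage (sketch): monthlyHistogram(forensics.stats.fileStats["src/core/engine.ts"].byAuthor)
```

Comparing the counts of recent months against older ones gives a rough burst signal; the same loop can sum `additions` and `deletions` instead of counting commits to track churn velocity.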

Complexity Analysis

git-forensics separates commit analysis from static code analysis. It provides optional complexity helpers for convenience (using indent-complexity), but it is better to use a language-aware complexity score and pass the results to computeForensics.
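To make the idea behind indentation-based complexity concrete (the whitespace heuristic described in Adam Tornhill's books), the sketch below sums the logical indentation depth of every non-blank line. It is a generic approximation written for illustration, not the algorithm of the indent-complexity package, and the default of 4 spaces per level is an assumption.

```typescript
// Hypothetical helper: total logical indentation of a source file.
// One tab, or `spacesPerIndent` spaces, counts as one indentation level.
function indentationComplexity(source: string, spacesPerIndent = 4): number {
  let total = 0;
  for (const line of source.split(/\r?\n/)) {
    if (line.trim() === "") continue; // blank lines carry no structure
    const leading = line.match(/^[ \t]*/)?.[0] ?? "";
    let tabs = 0;
    let spaces = 0;
    for (const ch of leading) {
      if (ch === "\t") tabs += 1;
      else spaces += 1;
    }
    total += tabs + Math.floor(spaces / spacesPerIndent);
  }
  return total;
}

// Usage: a 2-space-indented snippet scored with spacesPerIndent = 2.
// indentationComplexity("function f() {\n  if (x) {\n    return 1;\n  }\n}\n", 2)
```

Deeply nested code scores higher than flat code of the same length, which is the whole signal; a language-aware tool can distinguish nesting from formatting choices such as wrapped arguments.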

CI Usage

Building a report

Loop over insights and build a PR comment or CI annotation:

const insights = generateInsights(forensics, { minSeverity: 'warning' });

for (const insight of insights) {
  const prefix = insight.severity === 'critical' ? '[CRITICAL]' : '[WARNING]';
  console.log(`${prefix} ${insight.file} - ${insight.fragments.title}`);
  console.log(`  ${insight.fragments.finding}`);
  console.log(`  ${insight.fragments.suggestion}\n`);
}
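Most CI systems expect a single string rather than console output. The sketch below builds a Markdown PR comment body from the same insight fields. The trimmed `Insight` type and the `buildPrComment` helper are assumptions for illustration; posting the comment through your CI provider's API is left out.

```typescript
type Severity = "info" | "warning" | "critical";

// Trimmed, assumed shape of an insight.
interface Insight {
  file: string;
  severity: Severity;
  fragments: { title: string; finding: string; suggestion: string };
}

// Hypothetical helper: render insights as a Markdown comment body.
function buildPrComment(insights: Insight[]): string {
  if (insights.length === 0) {
    return "No git-forensics findings for the changed files.";
  }
  const lines = ["### git-forensics report", ""];
  for (const insight of insights) {
    const label = insight.severity.toUpperCase();
    lines.push(`- **[${label}]** \`${insight.file}\`: ${insight.fragments.title}`);
    lines.push(`  - ${insight.fragments.finding}`);
    lines.push(`  - ${insight.fragments.suggestion}`);
  }
  return lines.join("\n");
}

// Usage (sketch): const body = buildPrComment(generateInsights(forensics, { minSeverity: 'warning' }));
```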

Optimization: Store & Reuse (large codebases)

git-forensics is already fast (~700ms for 100k commits), but for very large repos you can skip the git history scan on every PR run by storing forensics data between runs.

Step 1: Full analysis (scheduled or first run)

import { simpleGit } from 'simple-git';
import { computeForensics } from 'git-forensics';

const git = simpleGit();
const forensics = await computeForensics(git);

// Store on your server for later reuse
await fetch('https://your-server/api/forensics', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    repo: 'your-org/your-repo',
    data: forensics,
  }),
});

Step 2: Fast PR insights (no git scan needed)

import { simpleGit } from 'simple-git';
import { generateInsights, getChangedFiles } from 'git-forensics';

const git = simpleGit();

// Fetch pre-computed forensics from your server
const res = await fetch('https://your-server/api/forensics?repo=your-org/your-repo');
const forensics = await res.json();

// Generate insights only for files changed in the PR
const changedFiles = await getChangedFiles(git, 'origin/main');
const insights = generateInsights(forensics, {
  files: changedFiles,
  minSeverity: 'warning',
});
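Stored results age as the main branch moves on, so one reasonable pattern is to reuse them only while they are recent and fall back to a full computeForensics run otherwise. The sketch below makes that decision from `metadata.analyzedAt` (shown in the example output). The `isStale` helper, the `StoredForensics` type and the seven-day default are assumptions for illustration.

```typescript
// Assumed shape: only the field this check reads.
interface StoredForensics {
  metadata: { analyzedAt: string };
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: true when the stored analysis is older than maxAgeDays,
// or when its timestamp cannot be parsed (unknown age is treated as stale).
function isStale(stored: StoredForensics, maxAgeDays = 7, now: Date = new Date()): boolean {
  const analyzedAt = Date.parse(stored.metadata.analyzedAt);
  if (Number.isNaN(analyzedAt)) return true;
  return now.getTime() - analyzedAt > maxAgeDays * DAY_MS;
}

// Usage (sketch): recompute when stale, otherwise reuse the stored data.
// const data = isStale(stored) ? await computeForensics(git) : stored;
```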

Data-Driven API

For environments without direct git access, use computeForensicsFromData() with pre-fetched git data:

import { computeForensicsFromData, gitLogDataSchema, validateGitLogData } from 'git-forensics';

// Data must match the following format
const data = {
  log: {
    all: [
      {
        hash: 'abc123',
        date: '2025-01-15T10:00:00Z',
        author_name: 'Alice',
        message: 'Add feature',
        diff: {
          files: [
            { file: 'src/app.ts', insertions: 50, deletions: 10 },
            { file: 'src/utils.ts', insertions: 20, deletions: 5 },
          ],
        },
      },
      // ... more commits
    ],
  },
  trackedFiles: 'src/app.ts\nsrc/utils.ts\nsrc/index.ts', // from git ls-files
};

// Print JSON-schema if needed
console.log(gitLogDataSchema); // JSON Schema object

// Validate before processing
validateGitLogData(data); // throws if invalid

const forensics = computeForensicsFromData(data);
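If you collect history in a separate job, the `log.all` entries can be produced from plain `git log` output. The sketch below parses the output of `git log --numstat --format='@@%H|%aI|%an|%s'` into the shape shown above. The `@@` marker, the `|` separator and the `parseNumstatLog` helper are arbitrary choices for this example: an author name containing `|` would break it (the subject is taken as the remainder of the line), binary files, which `--numstat` reports as `-`, are counted as 0 lines, and rename notation is left exactly as git prints it. Whether `validateGitLogData` accepts `%aI` offsets such as `+00:00` is also an assumption.

```typescript
// Assumed shapes, matching the data object above.
interface DiffFile {
  file: string;
  insertions: number;
  deletions: number;
}

interface LogEntry {
  hash: string;
  date: string;
  author_name: string;
  message: string;
  diff: { files: DiffFile[] };
}

// Hypothetical parser for: git log --numstat --format='@@%H|%aI|%an|%s'
function parseNumstatLog(output: string): LogEntry[] {
  const commits: LogEntry[] = [];
  let current: LogEntry | undefined;
  for (const line of output.split(/\r?\n/)) {
    if (line.startsWith("@@")) {
      // Header line: hash|date|author|subject (the subject may itself contain "|").
      const [hash, date, author, ...subject] = line.slice(2).split("|");
      current = { hash, date, author_name: author, message: subject.join("|"), diff: { files: [] } };
      commits.push(current);
      continue;
    }
    // numstat line: "<added>\t<deleted>\t<path>"; binary files show "-".
    const m = /^(\d+|-)\t(\d+|-)\t(.+)$/.exec(line);
    if (m && current) {
      current.diff.files.push({
        file: m[3],
        insertions: m[1] === "-" ? 0 : Number(m[1]),
        deletions: m[2] === "-" ? 0 : Number(m[2]),
      });
    }
  }
  return commits;
}

// Usage (sketch): { log: { all: parseNumstatLog(logOutput) }, trackedFiles: lsFilesOutput }
```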

Attribution

Based on concepts from Adam Tornhill's Your Code as a Crime Scene and Software Design X-Rays.

License

MIT

itaymendel/git-forensics