AI Code Audit Taxonomy

A curated taxonomy of Python code defect patterns that AI assistants produce with distinctive frequency, form, or mechanism. Each entry documents the defective shape, explains why language models produce it, links to real-world incidents, and provides mechanical detection cues. The current release covers 24 patterns: the set where the evidence met the inclusion rule, not a target count.

The patterns are AI-amplified, not AI-exclusive. Most are ordinary defects — swallowed exceptions, off-by-one errors, missing timeouts — that human programmers also write. What makes them worth cataloging separately is that AI-generated code produces them at characteristic densities, in characteristic forms, and through mechanisms tied to how language models generate code. The honest claim is AI-shaped, not AI-only.

Why this exists

AI coding assistants produce correct code most of the time. The difficult part is staying alert across hundreds of correct outputs and catching the broken one. Knowing the characteristic shapes to watch for makes the difference.

This taxonomy is a reference for anyone who reads AI-generated code: reviewers, auditors, developers using AI assistants, or teams building review workflows. It is organized around two questions:

What does the defect look like? Detection cues you can grep for or spot on visual scan.
Why does the model produce it? Mechanism explanations that connect the defect to how language models work — not just what to look for, but why it keeps appearing.

Each entry also documents false-positive shapes — what looks like the pattern but isn't — so you can triage confidently rather than over-flag.
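The example below shows what a false-positive shape can look like. It puts the swallowed-exceptions shape next to a look-alike that reviewers usually accept: a narrow, expected exception handled on purpose. Both functions are hypothetical and were written for this overview; they are not taken from the entry itself.

```python
import json
from pathlib import Path


def load_settings_swallowing(path: str) -> dict:
    # Defective shape: every failure inside the try (missing file, malformed
    # JSON, permission error, even a NameError from a typo) silently becomes {}.
    try:
        return json.loads(Path(path).read_text())
    except Exception:
        pass
    return {}


def load_settings_optional(path: str) -> dict:
    # Look-alike that is usually fine: only the one expected condition
    # (no settings file) is handled; malformed JSON still raises.
    try:
        return json.loads(Path(path).read_text())
    except FileNotFoundError:
        return {}
```

Both functions return {} for a missing file. For a file containing invalid JSON, only the first hides the problem; the second raises json.JSONDecodeError.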

How to use this

If you review AI-generated code: scan the Detection cues sections. Most cues are grep-able; typical search strings include except Exception: pass, logger.info(f", and requests.get( calls without timeout=. Start with the entries rated difficulty: low, which have the highest signal-to-effort ratio.
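Here is a minimal sketch of a mechanical first pass over those cues, using only the standard library. The regular expressions are assumptions made for this sketch, not rules shipped by the taxonomy. They are deliberately crude: they will flag some look-alikes, and the parenthesis matching ignores string literals. Treat the hits as triage candidates, not findings.

```python
import re

# Illustrative cues for three entries (assumed regexes, not official rules).
SWALLOWED = re.compile(r"except\s+Exception(?:\s+as\s+\w+)?\s*:\s*pass\b")
FSTRING_LOG = re.compile(
    r"\b(?:logger|logging|log)\.(?:debug|info|warning|error|exception|critical)\(\s*f[\"']"
)
REQUESTS_CALL = re.compile(r"\brequests\.(?:get|post|put|patch|delete|head|request)\(")


def _line_of(source: str, offset: int) -> int:
    """1-based line number of a character offset."""
    return source.count("\n", 0, offset) + 1


def _call_text(source: str, open_paren: int) -> str:
    """Text from an opening parenthesis to its matching close.

    Crude: parentheses inside string literals or comments are counted too.
    """
    depth = 0
    for i in range(open_paren, len(source)):
        if source[i] == "(":
            depth += 1
        elif source[i] == ")":
            depth -= 1
            if depth == 0:
                return source[open_paren : i + 1]
    return source[open_paren:]  # unbalanced: fall back to the rest of the text


def scan(source: str) -> list[tuple[int, str]]:
    """Return sorted (line_number, entry_name) candidates for human triage."""
    hits = [(_line_of(source, m.start()), "swallowed-exceptions")
            for m in SWALLOWED.finditer(source)]
    hits += [(_line_of(source, m.start()), "f-string-in-logger-call")
             for m in FSTRING_LOG.finditer(source)]
    for m in REQUESTS_CALL.finditer(source):
        # The whole argument list is checked, so multi-line calls are handled.
        if "timeout=" not in _call_text(source, m.end() - 1):
            hits.append((_line_of(source, m.start()), "missing-network-timeout"))
    return sorted(hits)
```

The regexes run over the whole text rather than line by line. This means the common two-line form (except Exception: followed by pass on the next line) is still caught.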

If you build or configure AI coding tools: the Mechanism sections explain why each pattern recurs. The cross-cutting note codified-guidance-is-insufficient documents why CLAUDE.md / AGENTS.md conventions alone don't prevent these — enforcement via lint rules and CI is the cure.

If you maintain ruff, bandit, or similar linters: many entries map to existing rules (ruff BLE001, B006, G004, SIM115, PLC0415, RUF029; bandit B101, B113, B202, B602, B608). The taxonomy documents the AI-amplified densities at which these rules fire and the false-positive shapes that help triage.
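Those rule mappings can be put to work by translating linter findings into taxonomy entry names, so each hit can be triaged against the matching entry's false-positive notes. The rule-to-entry pairing below is this sketch's own reading of the rule descriptions, not a table published by the taxonomy. The input format is also assumed: findings are taken to be already normalized into dicts with hypothetical tool, code, and filename keys.

```python
from collections import defaultdict

# (tool, rule code) -> taxonomy entry. Assumed pairing for illustration only.
RULE_TO_ENTRY: dict[tuple[str, str], str] = {
    ("ruff", "BLE001"): "swallowed-exceptions",
    ("ruff", "B006"): "mutable-default-arguments",
    ("ruff", "G004"): "f-string-in-logger-call",
    ("ruff", "SIM115"): "resource-leak-no-context-manager",
    ("ruff", "PLC0415"): "unjustified-lazy-import",
    ("ruff", "RUF029"): "async-await-mismatch",
    ("bandit", "B101"): "assert-for-runtime-validation",
    ("bandit", "B113"): "missing-network-timeout",
    ("bandit", "B202"): "tarfile-extractall-without-filter",
    ("bandit", "B602"): "shell-true-subprocess-injection",
    ("bandit", "B608"): "string-built-sql",
}


def group_by_entry(findings: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group findings (hypothetical keys: tool, code, filename) by entry.

    Findings whose rule has no taxonomy entry are collected under "unmapped".
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for finding in findings:
        entry = RULE_TO_ENTRY.get((finding["tool"], finding["code"]), "unmapped")
        grouped[entry].append(finding["filename"])
    return dict(grouped)
```

The (tool, code) key is a deliberate choice. Ruff's B-prefixed codes come from flake8-bugbear, while bandit's B-prefixed IDs are a different rule family, so keying on the code string alone would invite confusion.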

How language models generate code — a brief primer

The Mechanism sections use a small vocabulary from how language models work. Three ideas cover most of it.

Token-level prediction. A language model writes code one small piece at a time — roughly word-sized chunks called tokens — picking the most likely next token given what it has just produced. It commits as it goes; it does not draft a whole function and then refine.

The training corpus. The body of text the model learned from: Stack Overflow answers, tutorials, GitHub repositories, documentation, blog posts. The shapes that occur most frequently per token in the training corpus are the ones the model produces most fluently. Under-represented shapes are under-produced even when they are the right answer for the situation.
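The first two ideas can be shown together with a toy: greedy selection over invented continuation counts. This is a simplification. Real models score tokens with a neural network rather than a lookup table, they often sample instead of always taking the top choice, and the counts and chunk sizes here are made up. Still, it shows how committing to the most frequent continuation at each step means the rarer but correct timeout= argument never appears.

```python
# Hypothetical continuation counts standing in for corpus statistics.
# Keys are the text generated so far; values map next chunks to counts.
CONTINUATIONS: dict[str, dict[str, int]] = {
    "resp = requests.get(": {"url": 100},
    "resp = requests.get(url": {")": 90, ", timeout=": 10},
    "resp = requests.get(url, timeout=": {"10)": 100},
}


def greedy_complete(prefix: str, table: dict[str, dict[str, int]]) -> str:
    """Append the most frequent next chunk until no continuation is known."""
    text = prefix
    while text in table:
        candidates = table[text]
        # Commit to the argmax; nothing later revisits this choice.
        text += max(candidates, key=candidates.get)
    return text


print(greedy_complete("resp = requests.get(", CONTINUATIONS))
```

Starting from resp = requests.get(, the greedy path ends at resp = requests.get(url), the corpus-fluent shape without a timeout. The timeout-bearing completion only appears if generation is already on that path.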

Local attention. The model decides each next token mostly from the surrounding code — recent lines, the current function signature, nearby imports — not by re-reading the whole file or project. Conventions documented elsewhere (style guides, lint rules, project docs) sit outside this local window unless they are explicitly pulled in.

Most defects in the taxonomy come from these three forces together: the corpus-fluent shape wins the per-token decision, and consistency that would require looking outside the local window is not enforced at generation time. No further ML background is needed.

The entries

By surface (category)

Category	Entries
error-handling	swallowed-exceptions, inconsistent-error-handling, brittle-error-detection
structure	near-identical-siblings, unjustified-lazy-import
control-flow	off-by-one, swapped-args
security	string-built-sql, shell-true-subprocess-injection, tarfile-extractall-without-filter
observability	print-instead-of-logging, f-string-in-logger-call
reliability	missing-network-timeout, resource-leak-no-context-manager
async	async-await-mismatch, sleep-based-synchronization
language-pitfall	mutable-default-arguments, assert-for-runtime-validation
configuration	hardcoded-config-values
consistency	convention-drift
documentation	narrating-comments
testing	weak-test-assertion
defensive-programming	unreachable-defensive-guard
library-usage	wrong-tool-for-job
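To make one row of the table concrete, here is a hypothetical lookup function in the string-built-sql shape, alongside the parameterized form, using the standard library's sqlite3 module. It is an illustration written for this overview, not the entry's own code example.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Throwaway in-memory database with two users.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "viewer")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # string-built-sql: user input is spliced directly into the SQL text.
    return conn.execute(f"SELECT name, role FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Parameterized: the driver passes `name` as data, never as SQL.
    return conn.execute("SELECT name, role FROM users WHERE name = ?", (name,)).fetchall()
```

Both functions agree on ordinary input. Given x' OR '1'='1, the unsafe version returns every row, while the parameterized version returns an empty list.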

By mechanism (cross-cutting notes)

Each entry has a category (the surface where the defect appears) and may participate in one or more cross-cutting notes (the mechanism connecting it to other entries). The two axes are independent.

Note	What it observes	Entries
ai-pedagogical-bias	Model treats production code as tutorial code	6 entries
same-project-knows-right-pattern	Same codebase uses the right pattern at one site, wrong at another	10 entries
codified-guidance-is-insufficient	Documented conventions don't prevent the violations; enforcement is the cure	16+ entries
surface-failure-modes-explicitly	Typed-exception meta-family: surface failure modes through the type system	4 entries
defensive-choice-with-justifying-comment	Defensive choices paired with comments justifying constraints that don't survive verification	9+ entries
partial-fix-propagation	A prior fix addressed some sites; sibling sites retain the wrong pattern	3 entries
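Because the two axes are independent, one way to model them is a single category per entry plus a set of note names, with an inverted index for the mechanism view. This is a hypothetical structure for illustration, not the repository's actual schema. The entry names and note memberships in the example are placeholders, not the taxonomy's real assignments.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    name: str
    category: str  # exactly one surface
    notes: frozenset[str] = field(default_factory=frozenset)  # zero or more mechanisms


def by_category(entries: list[Entry]) -> dict[str, list[str]]:
    index: dict[str, list[str]] = defaultdict(list)
    for e in entries:
        index[e.category].append(e.name)
    return dict(index)


def by_note(entries: list[Entry]) -> dict[str, list[str]]:
    # Inverted index: an entry appears once per note it participates in,
    # so per-note counts can sum to more than the number of entries.
    index: dict[str, list[str]] = defaultdict(list)
    for e in entries:
        for note in sorted(e.notes):
            index[note].append(e.name)
    return dict(index)


# Placeholder data, not real taxonomy assignments.
EXAMPLE_ENTRIES = [
    Entry("entry-a", "error-handling",
          frozenset({"same-project-knows-right-pattern", "codified-guidance-is-insufficient"})),
    Entry("entry-b", "security", frozenset({"codified-guidance-is-insufficient"})),
    Entry("entry-c", "error-handling"),
]
```

This double counting is why the Entries column in the notes table adds up to more than the 24 entries in the category table.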

Entry format

Each taxonomy entry follows the same structure:

Code example — minimal defective code; the bug should be visible to someone who knows Python.
Mechanism — why a language model produces this shape. Connects to the primer above.
Evidence / incident — real-world GitHub issues, PRs, or CVEs where the pattern was found in AI-generated code. Every entry requires concrete evidence of an AI-vs-human frequency or form differential.
Detection cues — what to grep for or spot on visual scan. Mechanical enough to use as a checklist.
Notes — false-positive shapes, connections to cross-cutting notes, difficulty rating.
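For a sense of the scale a Code example section aims at, here is a hypothetical minimal example in the mutable-default-arguments shape, the pattern ruff's B006 flags. It was written for this overview rather than copied from the entry, and it shows the conventional None-sentinel fix alongside the defect.

```python
from typing import Optional


def add_tag(tag: str, tags: list[str] = []) -> list[str]:
    # Defect: the default list is created once, when the def runs, and is
    # shared by every call that omits `tags`.
    tags.append(tag)
    return tags


def add_tag_fixed(tag: str, tags: Optional[list[str]] = None) -> list[str]:
    # Conventional fix: None as a sentinel, with a fresh list per call.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags
```

Calling add_tag("a") and then add_tag("b") returns ["a", "b"] on the second call, because state leaks between calls. The fixed version returns ["b"].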

All current entries are classified generation: evergreen — patterns stable across model generations. The entry format supports a current-model generation for patterns tied to specific model families, but none have met the inclusion rule yet. Longitudinal tracking of which generations introduce or reduce specific patterns is a future goal.

What this is (and isn't)

This is a reference taxonomy — a structured catalog of patterns with evidence and mechanism. It is not:

An exhaustive list. 24 entries is a starting point, not a ceiling.
A claim that AI is bad at coding. The stance is neutral and practical: assistants are useful tools with characteristic distributional properties.
A frozen document. The patterns AI produces will shift as models change; the taxonomy is meant to be updated alongside them.

Inclusion rule

A pattern enters the taxonomy only if there is concrete evidence of a frequency or form differential between AI-generated and human-written code — not just a plausible mechanism story. Each entry carries an evidence grade: reproduced (independently verified), observed (captured from real projects), reported (documented by others), or analogical (structurally predicted from a confirmed mechanism).

Evidence methodology

The taxonomy's evidence base draws from three streams:

GitHub issues and PRs from AI-coded open-source projects, identified by CLAUDE.md/AGENTS.md presence, AI-attributed commit trailers, or bot-authored audit frameworks. Across all entries, the evidence base contains 75 specimens from 65+ distinct repositories. A small number of repositories contribute specimens to multiple entries, and individual entries may have narrower provenance; the per-entry Evidence sections are transparent about sourcing.
Community lint rules (ruff, bandit, pylint, SonarCloud) that independently flag the same patterns — evidence that the broader Python community recognizes these as defect classes regardless of authorship.
Academic cross-validation — Zhu, Tsantalis & Rigby (2026), "AI-Generated Smells" (arXiv:2605.02741), provides statistical evidence on structural code smells in AI-generated Python code, cross-validating the near-identical-siblings entry and the broader claim that AI-generated code has measurable distributional properties.
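The repository-selection signals from the first stream can be sketched as a simple heuristic. The file names come from the text above. The trailer regex is an assumption modeled on Co-Authored-By: commit trailers, and real trailer text varies by tool and version. The helper is hypothetical and only inspects data passed to it; it does not query GitHub.

```python
import re

# Agent-guidance files named in the methodology.
AGENT_DOC_FILES = {"CLAUDE.md", "AGENTS.md"}
# Assumed trailer shape; extend the alternation for other assistants.
AI_TRAILER = re.compile(r"^Co-Authored-By:\s*Claude\b", re.IGNORECASE | re.MULTILINE)


def ai_signals(top_level_files: set[str], commit_messages: list[str]) -> dict[str, object]:
    """Report which AI-coding signals a repository shows (hypothetical helper)."""
    attributed = sum(1 for msg in commit_messages if AI_TRAILER.search(msg))
    return {
        "agent_docs": sorted(AGENT_DOC_FILES & top_level_files),
        "ai_attributed_commits": attributed,
        "total_commits": len(commit_messages),
    }
```

The function reports signals rather than a yes/no verdict, leaving the threshold for calling a project "AI-coded" to the person assembling the specimens.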

Evidence specimens referenced in entries link to the original GitHub issues. Local specimen files (detailed research notes) are not included in this repository.

Sources and background

Zhu, Tsantalis & Rigby (2026): AI-Generated Smells: An Analysis of Code and Architecture in LLM- and Agent-Driven Development. Concordia University. Complementary scope: production-code structural smells via static analysis.
Community lint ecosystems: ruff, bandit, pylint, SonarCloud — the rules these tools enforce against many of the same patterns are cited throughout the entries.

License

MIT — see LICENSE.
