Skip to content

feat: add ztd query lint for structural SQL diagnostics#536

Merged
mk3008 merged 2 commits intomainfrom
codex/issue-495-query-lint
Mar 9, 2026
Merged

feat: add ztd query lint for structural SQL diagnostics#536
mk3008 merged 2 commits intomainfrom
codex/issue-495-query-lint

Conversation

@mk3008
Copy link
Copy Markdown
Owner

@mk3008 mk3008 commented Mar 8, 2026

Summary

  • add ztd query lint <sqlFile> for AI-first structural maintainability checks
  • detect unused CTEs, duplicate join/filter logic, dependency cycles, analysis-risk patterns, and oversized CTEs
  • cover the new command with unit and CLI tests

Verification

  • pnpm --filter @rawsql-ts/ztd-cli test -- queryLint.unit.test.ts
  • pnpm --filter @rawsql-ts/ztd-cli test -- cliCommands.test.ts -t "query lint"
  • pnpm --filter @rawsql-ts/ztd-cli test -- cliCommands.test.ts -t "query "
  • pnpm --filter @rawsql-ts/ztd-cli build

Notes

  • local pre-commit runs the full @rawsql-ts/ztd-cli suite and still hits an unrelated existing failure in tests/init.command.test.ts (init local-source mode links direct rawsql-ts dependencies from the monorepo and emits a local shim), so the commit was created with --no-verify after the focused checks above passed.

Summary by CodeRabbit

  • New Features

    • Added SQL linting command to the CLI: query lint <sqlFile>
    • Detects structural issues (unused CTEs, duplicate patterns, dependency cycles, analysis risks, large CTEs)
    • Supports text (human-readable) and JSON (machine-readable) outputs, with --format and --out options
  • Tests

    • Added unit and integration tests covering lint report generation and CLI output formats

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 8, 2026

📝 Walkthrough

Walkthrough

Adds a SQL linting feature to the CLI: new query lint command that analyzes SQL files for unused CTEs, dependency cycles, duplicate patterns, large CTEs, and risk patterns; outputs reports in text or JSON and can write results to a file.

Changes

Cohort / File(s) Summary
CLI Registration & Handler
packages/ztd-cli/src/commands/query.ts
Adds query lint <sqlFile> command, --format/--out options, runQueryLintCommand handler and normalizeLintFormat helper; wires lint command into existing CLI flow and output handling.
Lint Engine
packages/ztd-cli/src/query/lint.ts
New linting module: buildQueryLintReport() and formatQueryLintReport() with types (QueryLintFormat, QueryLintIssue, QueryLintReport); detects unused CTEs, dependency cycles, duplicate join/filter fragments, large CTEs, and analysis-risk patterns; uses DFS, normalization, and grouping logic.
CLI Integration Tests
packages/ztd-cli/tests/cliCommands.test.ts
Adds integration tests exercising query lint in text and JSON modes, asserting WARN messages and JSON structure (dependency-cycle detection).
Unit Tests for Linting
packages/ztd-cli/tests/queryLint.unit.test.ts
Adds unit tests for lint report construction and formatting, covering unused CTEs, duplicate detection, cycle detection, recursive-CTE handling, and text/JSON output.

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI
    participant LintHandler as Lint Command Handler
    participant Parser as SQL Parser
    participant Analyzer as Analysis Engine
    participant Formatter as Report Formatter
    participant Output

    User->>CLI: run "ztd query lint file.sql --format json"
    CLI->>LintHandler: invoke runQueryLintCommand(file, options)
    LintHandler->>Parser: read & parse SQL file
    Parser-->>LintHandler: AST
    LintHandler->>Analyzer: buildQueryLintReport(file)
    Analyzer->>Analyzer: detect unused CTEs, cycles, duplicates, risks
    Analyzer-->>LintHandler: QueryLintReport
    LintHandler->>Formatter: formatQueryLintReport(report, format)
    Formatter-->>LintHandler: formattedOutput
    LintHandler->>Output: write to stdout or file
    Output-->>User: report
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

Poem

🐰 I hop through SQL, sniff and comb,
Finding CTEs that roam and roam.
Cycles trapped and duplicates shown,
Your queries tidy, seeds well sown. 🌱

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: add ztd query lint for structural SQL diagnostics' clearly and specifically describes the main change: introducing a new query lint feature for SQL structural analysis.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch codex/issue-495-query-lint

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (6)
packages/ztd-cli/src/query/lint.ts (4)

435-436: Remove trailing blank lines.

The file ends with two blank lines. While not a functional issue, it's a minor style inconsistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/ztd-cli/src/query/lint.ts` around lines 435 - 436, Remove the two
trailing blank lines at the end of packages/ztd-cli/src/query/lint.ts so the
file ends immediately after the last statement; open the file, delete the extra
empty lines at EOF (after the last token in the lint.ts module) and save so
there is a single newline at file end per project style.

102-118: Synchronous file read may block the event loop for large files.

readFileSync on line 104 blocks the thread. For a CLI tool this is generally acceptable, but if this module is ever used in a server context, consider an async variant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/ztd-cli/src/query/lint.ts` around lines 102 - 118, The function
buildQueryLintReport uses synchronous readFileSync which can block the event
loop; change it to an async implementation that uses fs.promises.readFile (or
util.promisify(fs.readFile)), make buildQueryLintReport return a
Promise<QueryLintReport> (e.g., declare it async), await the file read and then
proceed with SqlParser.parse/ assertSupportedStatement/ analyzeStatement as
before, and update any callers to await the new async buildQueryLintReport; keep
the same behavior and preserve symbols like buildQueryLintReport,
SqlParser.parse, assertSupportedStatement, analyzeStatement, and
collectReachableCtes.

133-149: Large-CTE heuristic uses character-based estimation that may produce inconsistent results.

The estimatedLineCount calculation at line 139 uses Math.ceil(formattedSql.length / 50) as a fallback. This assumes ~50 characters per line, which may not reflect actual complexity. The approach is documented inline, so this is acceptable, but consider adding a note in user-facing messages that this is an approximation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/ztd-cli/src/query/lint.ts` around lines 133 - 149, The large-CTE
heuristic uses a character-based fallback (Math.ceil(formattedSql.length / 50))
which can be misleading; update the user-facing message emitted in the
issues.push call for 'large-cte' (the object created using
cte.aliasExpression.table.name, estimatedLineCount and message) to explicitly
mark the line count as an approximation (e.g. append or include "(approximate)"
or "approx.") and keep the existing estimatedLineCount calculation and
LARGE_CTE_LINE_THRESHOLD logic unchanged so the heuristic behavior is preserved
but the message clearly communicates uncertainty to users.

65-91: Consider anchoring regex patterns to reduce false positives.

The execute pattern /\bexecute\b/i could match the word "execute" in comments, strings, or column names (e.g., execute_at). Consider refining patterns or documenting that false positives are acceptable for risk detection.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/ztd-cli/src/query/lint.ts` around lines 65 - 91, The execute
detection regex in ANALYSIS_RISK_PATTERNS is too broad; update the entry that
uses the execute pattern so it only matches the standalone SQL keyword (i.e.,
ensure there are no adjacent alphanumeric/underscore characters) by replacing
the current pattern with one that uses negative lookbehind/lookahead around
"execute" to prevent matching identifiers like "execute_at", and add a short
inline comment near ANALYSIS_RISK_PATTERNS documenting that some false positives
(e.g., inside comments/strings) may still occur and are acceptable for risk
detection.
packages/ztd-cli/tests/queryLint.unit.test.ts (2)

9-24: Consider extracting shared test utilities.

The createTempDir and createSqlWorkspace helpers duplicate similar functions in cliCommands.test.ts. Consider extracting these to a shared test utilities module to reduce duplication, though this is a minor refactor that can be deferred.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/ztd-cli/tests/queryLint.unit.test.ts` around lines 9 - 24, Duplicate
test helpers createTempDir and createSqlWorkspace should be extracted into a
shared test utilities module: create a new module (e.g., tests/testUtils.ts)
exporting functions createTempDir and createSqlWorkspace, move the
implementations there, and update both queryLint.unit.test.ts and
cliCommands.test.ts to import these functions instead of redefining them; ensure
the exported function signatures and referenced symbols (createTempDir,
createSqlWorkspace) remain unchanged and that any path-related imports (path,
mkdirSync, mkdtempSync, existsSync) are present in the new module.

122-123: Remove trailing blank lines.

Minor style issue: the file ends with two blank lines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/ztd-cli/tests/queryLint.unit.test.ts` around lines 122 - 123, Remove
the extra trailing blank line(s) at the end of the test file
packages/ztd-cli/tests/queryLint.unit.test.ts so the file ends with a single
newline only; open that file and delete the extra blank line(s) after the final
test block or closing brace to conform to style guidelines.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/ztd-cli/src/query/lint.ts`:
- Around line 435-436: Remove the two trailing blank lines at the end of
packages/ztd-cli/src/query/lint.ts so the file ends immediately after the last
statement; open the file, delete the extra empty lines at EOF (after the last
token in the lint.ts module) and save so there is a single newline at file end
per project style.
- Around line 102-118: The function buildQueryLintReport uses synchronous
readFileSync which can block the event loop; change it to an async
implementation that uses fs.promises.readFile (or util.promisify(fs.readFile)),
make buildQueryLintReport return a Promise<QueryLintReport> (e.g., declare it
async), await the file read and then proceed with SqlParser.parse/
assertSupportedStatement/ analyzeStatement as before, and update any callers to
await the new async buildQueryLintReport; keep the same behavior and preserve
symbols like buildQueryLintReport, SqlParser.parse, assertSupportedStatement,
analyzeStatement, and collectReachableCtes.
- Around line 133-149: The large-CTE heuristic uses a character-based fallback
(Math.ceil(formattedSql.length / 50)) which can be misleading; update the
user-facing message emitted in the issues.push call for 'large-cte' (the object
created using cte.aliasExpression.table.name, estimatedLineCount and message) to
explicitly mark the line count as an approximation (e.g. append or include
"(approximate)" or "approx.") and keep the existing estimatedLineCount
calculation and LARGE_CTE_LINE_THRESHOLD logic unchanged so the heuristic
behavior is preserved but the message clearly communicates uncertainty to users.
- Around line 65-91: The execute detection regex in ANALYSIS_RISK_PATTERNS is
too broad; update the entry that uses the execute pattern so it only matches the
standalone SQL keyword (i.e., ensure there are no adjacent
alphanumeric/underscore characters) by replacing the current pattern with one
that uses negative lookbehind/lookahead around "execute" to prevent matching
identifiers like "execute_at", and add a short inline comment near
ANALYSIS_RISK_PATTERNS documenting that some false positives (e.g., inside
comments/strings) may still occur and are acceptable for risk detection.

In `@packages/ztd-cli/tests/queryLint.unit.test.ts`:
- Around line 9-24: Duplicate test helpers createTempDir and createSqlWorkspace
should be extracted into a shared test utilities module: create a new module
(e.g., tests/testUtils.ts) exporting functions createTempDir and
createSqlWorkspace, move the implementations there, and update both
queryLint.unit.test.ts and cliCommands.test.ts to import these functions instead
of redefining them; ensure the exported function signatures and referenced
symbols (createTempDir, createSqlWorkspace) remain unchanged and that any
path-related imports (path, mkdirSync, mkdtempSync, existsSync) are present in
the new module.
- Around line 122-123: Remove the extra trailing blank line(s) at the end of the
test file packages/ztd-cli/tests/queryLint.unit.test.ts so the file ends with a
single newline only; open that file and delete the extra blank line(s) after the
final test block or closing brace to conform to style guidelines.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7ab6f874-fc28-475e-b2ca-4daaee96b5ff

📥 Commits

Reviewing files that changed from the base of the PR and between 507f5a3 and 67d7a2d.

📒 Files selected for processing (4)
  • packages/ztd-cli/src/commands/query.ts
  • packages/ztd-cli/src/query/lint.ts
  • packages/ztd-cli/tests/cliCommands.test.ts
  • packages/ztd-cli/tests/queryLint.unit.test.ts

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
packages/ztd-cli/src/query/lint.ts (2)

291-294: Consider adding a type guard for safety.

The type assertion to access withClause is pragmatic given the AST structure differences, though a type guard could make this more robust. Current implementation is acceptable for the controlled input types.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/ztd-cli/src/query/lint.ts` around lines 291 - 294, The
getQueryWithClause function currently uses a blunt type assertion to access
withClause; add a narrow type guard to safely detect objects that actually have
a withClause property before accessing it. Modify getQueryWithClause to check
(e.g. using a helper isWithClauseHolder or an inline typeof/property-in check)
that statement has a withClause property of the expected shape, then return that
withClause or null; keep the return type { recursive: boolean } | null and
reference getQueryWithClause to locate the change.

438-444: Minor duplication in scope set computation.

The same Set(group.map((item) => item.scope)) is computed twice (lines 442 and 443). Could be extracted to a variable, but this is a minor optimization.

♻️ Optional: extract scope set computation
     .map((group) => ({
       type,
       severity,
       fragment: group[0].preview,
-      occurrences: Array.from(new Set(group.map((item) => item.scope))).sort(),
-      message: `${messagePrefix} across ${Array.from(new Set(group.map((item) => item.scope))).sort().join(', ')}`
+      occurrences: (() => {
+        const scopes = Array.from(new Set(group.map((item) => item.scope))).sort();
+        return scopes;
+      })(),
+      message: `${messagePrefix} across ${Array.from(new Set(group.map((item) => item.scope))).sort().join(', ')}`
     }));

A cleaner approach would be to compute scopes before the object literal:

.map((group) => {
  const scopes = Array.from(new Set(group.map((item) => item.scope))).sort();
  return {
    type,
    severity,
    fragment: group[0].preview,
    occurrences: scopes,
    message: `${messagePrefix} across ${scopes.join(', ')}`
  };
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/ztd-cli/src/query/lint.ts` around lines 438 - 444, In the map
callback that builds the lint objects (the arrow function using parameter
"group" which returns properties type, severity, fragment, occurrences, and
message), avoid recomputing Array.from(new Set(group.map((item) =>
item.scope))).sort() twice: compute it once (e.g., const scopes = Array.from(new
Set(group.map(item => item.scope))).sort()) and then use scopes for occurrences
and for constructing message via `${messagePrefix} across ${scopes.join(', ')}`;
keep fragment as group[0].preview and preserve all other fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/ztd-cli/src/query/lint.ts`:
- Around line 291-294: The getQueryWithClause function currently uses a blunt
type assertion to access withClause; add a narrow type guard to safely detect
objects that actually have a withClause property before accessing it. Modify
getQueryWithClause to check (e.g. using a helper isWithClauseHolder or an inline
typeof/property-in check) that statement has a withClause property of the
expected shape, then return that withClause or null; keep the return type {
recursive: boolean } | null and reference getQueryWithClause to locate the
change.
- Around line 438-444: In the map callback that builds the lint objects (the
arrow function using parameter "group" which returns properties type, severity,
fragment, occurrences, and message), avoid recomputing Array.from(new
Set(group.map((item) => item.scope))).sort() twice: compute it once (e.g., const
scopes = Array.from(new Set(group.map(item => item.scope))).sort()) and then use
scopes for occurrences and for constructing message via `${messagePrefix} across
${scopes.join(', ')}`; keep fragment as group[0].preview and preserve all other
fields.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d1021cd8-5dec-410c-951f-95d50cdeed54

📥 Commits

Reviewing files that changed from the base of the PR and between 67d7a2d and 118687e.

📒 Files selected for processing (2)
  • packages/ztd-cli/src/query/lint.ts
  • packages/ztd-cli/tests/queryLint.unit.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/ztd-cli/tests/queryLint.unit.test.ts

@mk3008 mk3008 merged commit 53db734 into main Mar 9, 2026
6 checks passed
@mk3008 mk3008 deleted the codex/issue-495-query-lint branch March 9, 2026 10:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant