Phase 3.10: PgSQL error classification + unit tests#5514
Phase 3.10: PgSQL error classification + unit tests#5514renecannao merged 4 commits intov3.0-5473from
Conversation
New: include/PgSQLErrorClassifier.h, lib/PgSQLErrorClassifier.cpp Functions: - classify_pgsql_error(): classifies by SQLSTATE class — connection (08), transaction rollback (40), resources (53) = retryable; operator intervention (57), system error (58) = fatal; all others (syntax 42, constraints 23, data 22) = report to client - pgsql_can_retry_error(): checks retry conditions (retries left, not in transaction — PgSQL transactions are atomic)
25 test cases: connection errors (08xxx), transaction errors (40xxx), resource errors (53xxx), fatal errors (57xxx/58xxx), non-retryable (syntax/constraint/data), edge cases (null/empty), retry conditions (retries, in-transaction, non-retryable, fatal).
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the system's resilience when interacting with PostgreSQL databases by implementing a robust error classification mechanism. It allows for intelligent decision-making on whether to retry operations after transient errors, handle fatal conditions appropriately, or report non-retryable issues to the client. This improvement is crucial for maintaining application stability and responsiveness under varying database load and availability conditions. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds PostgreSQL SQLSTATE-based error classification and retry gating: new public header, implementation, build entry, and unit tests. Introduces Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a new module for classifying PostgreSQL errors to decide on retry logic, along with comprehensive unit tests. The implementation is clean and well-tested. I have a couple of suggestions to improve the efficiency and conciseness of the new classification functions in lib/PgSQLErrorClassifier.cpp.
| PgSQLErrorAction classify_pgsql_error(const char *sqlstate) { | ||
| if (sqlstate == nullptr || strlen(sqlstate) < 2) { | ||
| return PGSQL_ERROR_REPORT_TO_CLIENT; | ||
| } | ||
|
|
||
| // Classify by SQLSTATE class (first 2 characters) | ||
| char cls[3] = {sqlstate[0], sqlstate[1], '\0'}; | ||
|
|
||
| // Connection exceptions — retryable | ||
| if (strcmp(cls, "08") == 0) return PGSQL_ERROR_RETRY; | ||
|
|
||
| // Transaction rollback (serialization failure, deadlock) — retryable | ||
| if (strcmp(cls, "40") == 0) return PGSQL_ERROR_RETRY; | ||
|
|
||
| // Insufficient resources (too many connections) — retryable | ||
| if (strcmp(cls, "53") == 0) return PGSQL_ERROR_RETRY; | ||
|
|
||
| // Operator intervention (admin shutdown, crash) — fatal | ||
| if (strcmp(cls, "57") == 0) return PGSQL_ERROR_FATAL; | ||
|
|
||
| // System error (I/O error, crash shutdown) — fatal | ||
| if (strcmp(cls, "58") == 0) return PGSQL_ERROR_FATAL; | ||
|
|
||
| // Everything else (syntax, constraints, data, etc.) — report to client | ||
| return PGSQL_ERROR_REPORT_TO_CLIENT; | ||
| } |
There was a problem hiding this comment.
This function can be made more efficient and slightly cleaner by avoiding strlen and the temporary cls buffer. Using strncmp directly on the sqlstate string is a more performant and direct way to check the SQLSTATE class prefix. The initial nullptr and length check can also be optimized.
PgSQLErrorAction classify_pgsql_error(const char *sqlstate) {
if (sqlstate == nullptr || sqlstate[0] == '\0' || sqlstate[1] == '\0') {
return PGSQL_ERROR_REPORT_TO_CLIENT;
}
// Classify by SQLSTATE class (first 2 characters)
// Connection exceptions — retryable
if (strncmp(sqlstate, "08", 2) == 0) return PGSQL_ERROR_RETRY;
// Transaction rollback (serialization failure, deadlock) — retryable
if (strncmp(sqlstate, "40", 2) == 0) return PGSQL_ERROR_RETRY;
// Insufficient resources (too many connections) — retryable
if (strncmp(sqlstate, "53", 2) == 0) return PGSQL_ERROR_RETRY;
// Operator intervention (admin shutdown, crash) — fatal
if (strncmp(sqlstate, "57", 2) == 0) return PGSQL_ERROR_FATAL;
// System error (I/O error, crash shutdown) — fatal
if (strncmp(sqlstate, "58", 2) == 0) return PGSQL_ERROR_FATAL;
// Everything else (syntax, constraints, data, etc.) — report to client
return PGSQL_ERROR_REPORT_TO_CLIENT;
}| bool pgsql_can_retry_error( | ||
| PgSQLErrorAction action, | ||
| int retries_remaining, | ||
| bool in_transaction) | ||
| { | ||
| if (action != PGSQL_ERROR_RETRY) { | ||
| return false; | ||
| } | ||
| if (retries_remaining <= 0) { | ||
| return false; | ||
| } | ||
| if (in_transaction) { | ||
| return false; // PgSQL transactions are atomic, can't retry mid-txn | ||
| } | ||
| return true; | ||
| } |
There was a problem hiding this comment.
This function's logic can be expressed more concisely as a single boolean expression. This improves readability by making the conditions for a retry immediately clear.
bool pgsql_can_retry_error(
PgSQLErrorAction action,
int retries_remaining,
bool in_transaction)
{
return action == PGSQL_ERROR_RETRY && retries_remaining > 0 && !in_transaction;
}There was a problem hiding this comment.
Pull request overview
Adds a small, pure PgSQL error classifier module and integrates unit tests for it, aiming to make PgSQL retry decisions unit-testable as part of the “extract testable logic” milestone work.
Changes:
- Introduces
classify_pgsql_error()(SQLSTATE-based) andpgsql_can_retry_error()as reusable helper functions. - Adds a new TAP unit test binary with 25 test cases for the classifier and retry helper.
- Wires the new library object and unit test target into the respective Makefiles.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
include/PgSQLErrorClassifier.h |
Declares the new PgSQL error action enum + classifier/retry APIs. |
lib/PgSQLErrorClassifier.cpp |
Implements SQLSTATE “class” based classification and retry gating. |
lib/Makefile |
Links PgSQLErrorClassifier into libproxysql.a. |
test/tap/tests/unit/pgsql_error_classifier_unit-t.cpp |
Adds TAP unit tests for classification and retry helper behavior. |
test/tap/tests/unit/Makefile |
Registers the new unit test target in UNIT_TESTS. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
lib/PgSQLErrorClassifier.cpp
Outdated
| // Operator intervention (admin shutdown, crash) — fatal | ||
| if (strcmp(cls, "57") == 0) return PGSQL_ERROR_FATAL; | ||
|
|
There was a problem hiding this comment.
classify_pgsql_error() maps the entire SQLSTATE class "57" to PGSQL_ERROR_FATAL, but class 57 includes non-fatal/non-connection-ending conditions like 57014 (query_canceled) and potentially retryable ones like 57P03 (cannot_connect_now). Current production logic handles ERRCODE_QUERY_CANCELED and {ADMIN,CRASH}_SHUTDOWN/CANNOT_CONNECT_NOW specially in PgSQL_Session::handler_minus1_HandleErrorCodes, so classifying by class here will misclassify these. Suggest switching to full-code (5-char) matching (or PGSQL_ERROR_CODES) for the 57xx family and only marking truly fatal cases as fatal.
| // Operator intervention (admin shutdown, crash) — fatal | |
| if (strcmp(cls, "57") == 0) return PGSQL_ERROR_FATAL; | |
| // Operator intervention (admin shutdown, crash, query canceled, cannot connect now) | |
| if (strcmp(cls, "57") == 0) { | |
| // Use full SQLSTATE for 57xxx to avoid misclassifying non-fatal/retryable cases. | |
| if (strlen(sqlstate) >= 5) { | |
| // 57014: query_canceled — non-fatal, report to client | |
| if (strcmp(sqlstate, "57014") == 0) { | |
| return PGSQL_ERROR_REPORT_TO_CLIENT; | |
| } | |
| // 57P03: cannot_connect_now — retryable | |
| if (strcmp(sqlstate, "57P03") == 0) { | |
| return PGSQL_ERROR_RETRY; | |
| } | |
| // 57P01: admin_shutdown | |
| // 57P02: crash_shutdown | |
| // 57P04: database_dropped | |
| // These are considered fatal. | |
| if (strcmp(sqlstate, "57P01") == 0 || | |
| strcmp(sqlstate, "57P02") == 0 || | |
| strcmp(sqlstate, "57P04") == 0) { | |
| return PGSQL_ERROR_FATAL; | |
| } | |
| } | |
| // For other 57xxx codes, default to reporting to client. | |
| return PGSQL_ERROR_REPORT_TO_CLIENT; | |
| } |
include/PgSQLErrorClassifier.h
Outdated
| * - "57" (operator intervention, e.g. admin_shutdown): fatal | ||
| * - "58" (system error, e.g. crash_shutdown): fatal |
There was a problem hiding this comment.
The header comment lists class "58" as including crash_shutdown, but crash_shutdown is 57P02 (operator_intervention), not a "58" system error. Please correct the SQLSTATE examples/descriptions so the documentation matches PostgreSQL SQLSTATE definitions (and the PGSQL_ERROR_CODES mapping in PgSQL_Error_Helper.h).
| * - "57" (operator intervention, e.g. admin_shutdown): fatal | |
| * - "58" (system error, e.g. crash_shutdown): fatal | |
| * - "57" (operator intervention, e.g. admin_shutdown, crash_shutdown): fatal | |
| * - "58" (system error): fatal |
| static void test_fatal_errors() { | ||
| ok(classify_pgsql_error("57000") == PGSQL_ERROR_FATAL, | ||
| "57000 (operator intervention): fatal"); | ||
| ok(classify_pgsql_error("57P01") == PGSQL_ERROR_FATAL, | ||
| "57P01 (admin shutdown): fatal"); | ||
| ok(classify_pgsql_error("57P02") == PGSQL_ERROR_FATAL, | ||
| "57P02 (crash shutdown): fatal"); | ||
| ok(classify_pgsql_error("58000") == PGSQL_ERROR_FATAL, | ||
| "58000 (system error): fatal"); |
There was a problem hiding this comment.
These tests assert that the whole "57" SQLSTATE class is PGSQL_ERROR_FATAL (e.g. 57000). However, class 57 contains codes that are explicitly handled as non-fatal/retryable in production (57014 query_canceled, 57P03 cannot_connect_now, and shutdown codes are used to trigger reconnect+optional retry). Once the classifier logic is aligned with PgSQL_Session::handler_minus1_HandleErrorCodes, this test set should be adjusted to cover those specific codes and their intended actions rather than treating the whole class as fatal.
| // Insufficient resources (too many connections) — retryable | ||
| if (strcmp(cls, "53") == 0) return PGSQL_ERROR_RETRY; |
There was a problem hiding this comment.
classify_pgsql_error() treats every SQLSTATE class "53" (insufficient resources) as retryable. This will mark non-transient errors like 53100 (disk_full) / 53200 (out_of_memory) as retryable, which conflicts with existing PgSQL error handling (e.g. ERRCODE_OUT_OF_MEMORY is not retried in PgSQL_Session::handler_minus1_HandleErrorCodes). Consider classifying only specific retryable codes (e.g. 53300 too_many_connections) or using PGSQL_ERROR_CODES for a precise allowlist.
| // Insufficient resources (too many connections) — retryable | |
| if (strcmp(cls, "53") == 0) return PGSQL_ERROR_RETRY; | |
| // Insufficient resources — only specific conditions are retryable | |
| if (strcmp(cls, "53") == 0) { | |
| // 53300: too_many_connections — known transient, safe to retry | |
| if (strcmp(sqlstate, "53300") == 0) { | |
| return PGSQL_ERROR_RETRY; | |
| } | |
| // Other insufficient-resource errors (e.g. disk full, out of memory) | |
| // are treated as non-retryable and reported to the client. | |
| return PGSQL_ERROR_REPORT_TO_CLIENT; | |
| } |
- Fix: 57014 (query_canceled) is now classified as REPORT_TO_CLIENT instead of FATAL (it's a normal cancellation, not a server crash) - Fix docstring: crash_shutdown is 57P02 (class 57), not class 58; class 58 is system/I/O errors - Add test for 57014 exception (+1 test, plan 25→26)
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/tap/tests/unit/pgsql_error_classifier_unit-t.cpp (1)
91-98: Minor: Inline comment mismatch on line 94.The comment says
// 4buttest_fatal_errors()actually contains 5 tests (including the57014case added per the PR objectives). The total count inplan(26)is correct.📝 Suggested fix
test_connection_errors(); // 3 test_transaction_errors(); // 2 test_resource_errors(); // 2 - test_fatal_errors(); // 4 + test_fatal_errors(); // 5 test_non_retryable_errors(); // 5 test_edge_cases(); // 3 test_retry_conditions(); // 5 - // Total: 1+3+2+2+5+5+3+5 = 26 + // Total: 1+3+2+2+5+5+3+5 = 26 (init + connection + txn + resource + fatal + non-retry + edge + retry)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/tap/tests/unit/pgsql_error_classifier_unit-t.cpp` around lines 91 - 98, Update the inline count comment for test_fatal_errors() to reflect the actual number of tests (change the comment from "// 4" to "// 5"); locate the call to test_fatal_errors() in the file (the line that currently reads test_fatal_errors(); // 4) and adjust the trailing comment so the per-function count matches the actual tests (no other changes needed since the overall plan(26) is already correct).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/tap/tests/unit/pgsql_error_classifier_unit-t.cpp`:
- Around line 91-98: Update the inline count comment for test_fatal_errors() to
reflect the actual number of tests (change the comment from "// 4" to "// 5");
locate the call to test_fatal_errors() in the file (the line that currently
reads test_fatal_errors(); // 4) and adjust the trailing comment so the
per-function count matches the actual tests (no other changes needed since the
overall plan(26) is already correct).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 035a02d6-4dea-4ba7-8109-cf23f29ee745
📒 Files selected for processing (3)
include/PgSQLErrorClassifier.hlib/PgSQLErrorClassifier.cpptest/tap/tests/unit/pgsql_error_classifier_unit-t.cpp
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: CI-builds / builds (debian12,-dbg)
- GitHub Check: CI-builds / builds (ubuntu22,-tap)
- GitHub Check: run / trigger
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: rahim-kanji
Repo: sysown/proxysql PR: 5299
File: lib/PgSQL_Session.cpp:5173-5233
Timestamp: 2026-01-14T11:36:39.934Z
Learning: The function `PgSQL_Session::send_parameter_error_response()` in `lib/PgSQL_Session.cpp` has a default parameter for its second argument (PGSQL_ERROR_CODES code) with the default value of PGSQL_ERROR_CODES::ERRCODE_INVALID_TEXT_REPRESENTATION, as declared in include/PgSQL_Session.h. Single-argument calls to this function are valid.
📚 Learning: 2026-01-14T11:36:39.934Z
Learnt from: rahim-kanji
Repo: sysown/proxysql PR: 5299
File: lib/PgSQL_Session.cpp:5173-5233
Timestamp: 2026-01-14T11:36:39.934Z
Learning: The function `PgSQL_Session::send_parameter_error_response()` in `lib/PgSQL_Session.cpp` has a default parameter for its second argument (PGSQL_ERROR_CODES code) with the default value of PGSQL_ERROR_CODES::ERRCODE_INVALID_TEXT_REPRESENTATION, as declared in include/PgSQL_Session.h. Single-argument calls to this function are valid.
Applied to files:
lib/PgSQLErrorClassifier.cppinclude/PgSQLErrorClassifier.htest/tap/tests/unit/pgsql_error_classifier_unit-t.cpp
📚 Learning: 2026-01-20T09:34:19.124Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:19.124Z
Learning: In ProxySQL's TAP test suite, resource leaks (e.g., not calling mysql_close() on early return paths) are commonly tolerated because test processes are short-lived and OS frees resources on exit. This pattern applies to all C++ test files under test/tap/tests. When reviewing, recognize this as a project-wide test convention and focus on test correctness and isolation rather than insisting on fixing such leaks in these test files.
Applied to files:
test/tap/tests/unit/pgsql_error_classifier_unit-t.cpp
🪛 Clang (14.0.6)
lib/PgSQLErrorClassifier.cpp
[error] 10-10: 'cstring' file not found
(clang-diagnostic-error)
include/PgSQLErrorClassifier.h
[error] 38-38: must use 'enum' tag to refer to type 'PgSQLErrorAction'
(clang-diagnostic-error)
[error] 52-52: unknown type name 'bool'
(clang-diagnostic-error)
[error] 53-53: must use 'enum' tag to refer to type 'PgSQLErrorAction'
(clang-diagnostic-error)
[error] 55-55: unknown type name 'bool'
(clang-diagnostic-error)
test/tap/tests/unit/pgsql_error_classifier_unit-t.cpp
[error] 8-8: 'tap.h' file not found
(clang-diagnostic-error)
🪛 Cppcheck (2.20.0)
test/tap/tests/unit/pgsql_error_classifier_unit-t.cpp
[warning] 86-86: If memory allocation fails, then there is a possible null pointer dereference
(nullPointerOutOfMemory)
[warning] 31-31: If memory allocation fails, then there is a possible null pointer dereference
(nullPointerOutOfMemory)
🔇 Additional comments (4)
include/PgSQLErrorClassifier.h (1)
1-58: LGTM!The header is well-documented with clear SQLSTATE class mappings and properly documents the special case for
57014(query_canceled). The enum and function declarations are appropriate for C++ usage.Note: The static analysis hints about
enumtags andbooltype are false positives—they apply to C compilation, but this header is used exclusively by C++ code (.cppfiles).lib/PgSQLErrorClassifier.cpp (2)
12-43: LGTM!The classification logic correctly handles:
- Defensive null/short-string checks (line 13)
- Retryable error classes (08, 40, 53)
- The 57014 exception with proper length check before
strncmp- Fatal error classes (57 except 57014, and 58)
- Safe default to
PGSQL_ERROR_REPORT_TO_CLIENT
45-60: LGTM!The retry gating logic is correct and follows short-circuit evaluation efficiently.
test/tap/tests/unit/pgsql_error_classifier_unit-t.cpp (1)
14-84: LGTM!Comprehensive test coverage including:
- All retryable SQLSTATE classes (08, 40, 53)
- Fatal errors with the
57014exception properly verified- Non-retryable error classes
- Edge cases (null, empty, short input)
- Retry gating logic with all condition combinations
Signed-off-by: René Cannaò <rene@proxysql.com>
|


Implements #5498. Extracts classify_pgsql_error() by SQLSTATE class and pgsql_can_retry_error() — 25 tests covering connection/transaction/resource/fatal/syntax/constraint errors.
Summary by CodeRabbit
New Features
Tests