Phase 3.9: PgSQL monitor health decisions + unit tests#5513
Phase 3.9: PgSQL monitor health decisions + unit tests#5513renecannao merged 3 commits intov3.0-5473from
Conversation
New: include/PgSQLMonitorDecision.h, lib/PgSQLMonitorDecision.cpp Functions: - pgsql_should_shun_on_ping_failure(): threshold-based shunning (simpler than MySQL — always aggressive with kill_all) - pgsql_should_offline_for_readonly(): read-only server in writer hostgroup should go OFFLINE_SOFT Unshun recovery is already covered by MonitorHealthDecision.h can_unshun_server() (same logic for both protocols).
11 test cases: ping shunning (threshold, boundary, disabled), read-only offline (writer/reader HG combinations).
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the PostgreSQL monitor's health decision logic by isolating specific decision-making processes into dedicated, testable functions. This improves code modularity and maintainability, and the addition of thorough unit tests enhances the robustness and reliability of the PostgreSQL monitoring system. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds two PgSQL monitor decision functions (shunning on missed heartbeats and offline-for-readonly for writer hostgroups), their public header, implementation, build entries, and a TAP unit test validating both behaviors. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request extracts two pure decision-making functions, pgsql_should_shun_on_ping_failure and pgsql_should_offline_for_readonly, into a new PgSQLMonitorDecision component. This is a good refactoring that improves modularity and testability. The PR also includes a comprehensive set of unit tests for these new functions, covering all logical paths and edge cases. The implementation of both the functions and their tests is clear and correct. Overall, this is a high-quality change that improves the codebase.
There was a problem hiding this comment.
Pull request overview
Adds a small set of pure “decision” helpers for PgSQL monitor behavior and introduces a new unit test binary to exercise them, as part of the ongoing effort to extract monitor logic into testable units.
Changes:
- Added
pgsql_should_shun_on_ping_failure()andpgsql_should_offline_for_readonly()in a new PgSQL decision module. - Added a new TAP unit test (
pgsql_monitor_unit-t) covering these helpers (11 assertions). - Wired the new unit test and new decision object into the unit-test and library build Makefiles.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
test/tap/tests/unit/pgsql_monitor_unit-t.cpp |
New unit test binary covering the new PgSQL decision helpers. |
test/tap/tests/unit/Makefile |
Registers pgsql_monitor_unit-t in the unit-test build. |
lib/PgSQLMonitorDecision.cpp |
Implements the new PgSQL monitor decision helper functions. |
lib/Makefile |
Adds PgSQLMonitorDecision.oo to the library build. |
include/PgSQLMonitorDecision.h |
Declares the new PgSQL monitor decision helper APIs with Doxygen docs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (max_failures_threshold == 0) { | ||
| return false; // shunning disabled | ||
| } |
There was a problem hiding this comment.
The special-case handling of max_failures_threshold==0 (“disabled”) (and the corresponding unit test) doesn’t match current config validation: pgsql_variables.monitor_ping_max_failures is constrained to a minimum of 1 in PgSQL_Thread.cpp (VariablesPointers_int["monitor_ping_max_failures"] = …, 1, …). Either drop the disabled==0 semantics/tests here, or relax the variable validation if 0 is intended to be a supported value.
| if (max_failures_threshold == 0) { | |
| return false; // shunning disabled | |
| } |
| /** | ||
| * @brief Determine if a PgSQL server should be shunned based on ping failures. | ||
| * | ||
| * PgSQL monitor shuns servers when they miss N consecutive heartbeats. | ||
| * Unlike MySQL, PgSQL always uses aggressive shunning (kill all connections). | ||
| * | ||
| * @param missed_heartbeats Number of consecutive missed pings. | ||
| * @param max_failures_threshold Config: pgsql-monitor_ping_max_failures. | ||
| * @return true if the server should be shunned. | ||
| */ | ||
| bool pgsql_should_shun_on_ping_failure( | ||
| unsigned int missed_heartbeats, | ||
| unsigned int max_failures_threshold | ||
| ); |
There was a problem hiding this comment.
These decision helpers are currently not used anywhere in production code (only referenced by the new unit test), so the tests don’t validate PgSQL_Monitor’s real shunning behavior. In lib/PgSQL_Monitor.cpp, shunning is decided via a query over the last N rows in pgsql_server_ping_log and explicitly excludes authentication failures (ping_error NOT LIKE '%password authentication failed for user%'). Consider wiring PgSQL_Monitor’s shun decision through this helper (or adjusting the helper’s inputs/behavior to match the current SQL-based logic), otherwise the helper/test can easily diverge from actual behavior.
| /** | ||
| * @brief Determine if a PgSQL server's read-only status indicates it should | ||
| * be taken offline for a writer hostgroup. | ||
| * | ||
| * @param is_read_only Whether the server reports read_only=true. | ||
| * @param is_writer_hg Whether this is a writer hostgroup. | ||
| * @return true if the server should be set to OFFLINE_SOFT. | ||
| */ | ||
| bool pgsql_should_offline_for_readonly( | ||
| bool is_read_only, | ||
| bool is_writer_hg | ||
| ); |
There was a problem hiding this comment.
The read-only decision described here (“read-only server in a writer hostgroup should go OFFLINE_SOFT”) doesn’t appear to match the current PgSQL read-only handling: PgSQL_Monitor calls PgSQL_HostGroups_Manager::read_only_action_v2(), which re-maps servers between WRITER/READER hostgroups and does not set OFFLINE_SOFT based on read_only. If OFFLINE_SOFT is not actually part of the PgSQL read-only path, this function/docs/tests are misleading; consider aligning this helper with the real read-only action logic or removing it until it’s used.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/PgSQLMonitorDecision.cpp`:
- Around line 11-19: Replace the inline decision logic in the PgSQL monitor flow
with the new decision helpers: call pgsql_should_shun_on_ping_failure(...) from
the shunn_non_resp_srv code path instead of re-implementing the threshold check,
and similarly replace the inline readonly-handling condition with the readonly
decision helper (the function declared in the companion helper at lines 21-27).
Ensure you pass the same missed_heartbeats and max_failures_threshold arguments
used in the existing comparisons and keep behavior identical otherwise so the
helpers drive production decisions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7c683bf4-e95d-4760-a510-00acab9783c4
📒 Files selected for processing (5)
include/PgSQLMonitorDecision.hlib/Makefilelib/PgSQLMonitorDecision.cpptest/tap/tests/unit/Makefiletest/tap/tests/unit/pgsql_monitor_unit-t.cpp
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: CI-builds / builds (debian12,-dbg)
- GitHub Check: CI-builds / builds (ubuntu22,-tap)
- GitHub Check: Agent
- GitHub Check: run / trigger
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-20T09:34:19.124Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:19.124Z
Learning: In ProxySQL's TAP test suite, resource leaks (e.g., not calling mysql_close() on early return paths) are commonly tolerated because test processes are short-lived and OS frees resources on exit. This pattern applies to all C++ test files under test/tap/tests. When reviewing, recognize this as a project-wide test convention and focus on test correctness and isolation rather than insisting on fixing such leaks in these test files.
Applied to files:
test/tap/tests/unit/pgsql_monitor_unit-t.cpp
🪛 Clang (14.0.6)
include/PgSQLMonitorDecision.h
[error] 26-26: unknown type name 'bool'
(clang-diagnostic-error)
[error] 39-39: unknown type name 'bool'
(clang-diagnostic-error)
[error] 40-40: unknown type name 'bool'
(clang-diagnostic-error)
[error] 41-41: unknown type name 'bool'
(clang-diagnostic-error)
test/tap/tests/unit/pgsql_monitor_unit-t.cpp
[error] 8-8: 'tap.h' file not found
(clang-diagnostic-error)
🪛 Cppcheck (2.20.0)
test/tap/tests/unit/pgsql_monitor_unit-t.cpp
[warning] 31-31: If memory allocation fails, then there is a possible null pointer dereference
(nullPointerOutOfMemory)
🔇 Additional comments (4)
lib/Makefile (1)
111-111: Good build wiring for the new decision module.Line 111 correctly adds
PgSQLMonitorDecision.ootolibproxysql.ainputs.test/tap/tests/unit/pgsql_monitor_unit-t.cpp (1)
14-27: Test matrix and TAP plan are solid.This covers the critical decision permutations and keeps
plan(11)consistent with executed assertions.Also applies to: 29-38, 41-47
test/tap/tests/unit/Makefile (1)
238-239: Unit test target integration looks correct.
pgsql_monitor_unit-tis now part ofUNIT_TESTS, so it will be built with the standard unit suite.include/PgSQLMonitorDecision.h (1)
26-29: This header is part of a C++-only codebase (457 .cpp vs 4 .c files) and is only included by C++ code. Usingbooldirectly in C++ headers withoutstdbool.hor C compatibility guards is standard and correct. No C code includes this header, so C/C++ compatibility is unnecessary.> Likely an incorrect or invalid review comment.
| bool pgsql_should_shun_on_ping_failure( | ||
| unsigned int missed_heartbeats, | ||
| unsigned int max_failures_threshold) | ||
| { | ||
| if (max_failures_threshold == 0) { | ||
| return false; // shunning disabled | ||
| } | ||
| return (missed_heartbeats >= max_failures_threshold); | ||
| } |
There was a problem hiding this comment.
These decision helpers are not wired into the active PgSQL monitor flow yet.
The helper logic is correct, but current runtime paths in lib/PgSQL_Monitor.cpp (shunn_non_resp_srv around Lines 1687-1720 and readonly handling around Lines 1815-1839) still implement decisions inline instead of calling these functions. This leaves tested logic disconnected from production behavior.
Also applies to: 21-27
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/PgSQLMonitorDecision.cpp` around lines 11 - 19, Replace the inline
decision logic in the PgSQL monitor flow with the new decision helpers: call
pgsql_should_shun_on_ping_failure(...) from the shunn_non_resp_srv code path
instead of re-implementing the threshold check, and similarly replace the inline
readonly-handling condition with the readonly decision helper (the function
declared in the companion helper at lines 21-27). Ensure you pass the same
missed_heartbeats and max_failures_threshold arguments used in the existing
comparisons and keep behavior identical otherwise so the helpers drive
production decisions.
Signed-off-by: René Cannaò <rene@proxysql.com>
|



Implements #5497. Extracts pgsql_should_shun_on_ping_failure() and pgsql_should_offline_for_readonly() — 11 tests.
Summary by CodeRabbit
New Features
Tests