Skip to content

Phase 3.9: PgSQL monitor health decisions + unit tests#5513

Merged
renecannao merged 3 commits intov3.0-5473from
v3.0-5497
Mar 22, 2026
Merged

Phase 3.9: PgSQL monitor health decisions + unit tests#5513
renecannao merged 3 commits intov3.0-5473from
v3.0-5497

Conversation

@renecannao
Copy link
Contributor

@renecannao renecannao commented Mar 22, 2026

Implements #5497. Extracts pgsql_should_shun_on_ping_failure() and pgsql_should_offline_for_readonly() — 11 tests.

Summary by CodeRabbit

  • New Features

    • PostgreSQL monitor now shuns servers when missed heartbeat count meets the configured failure threshold.
    • PostgreSQL monitor can mark read-only replicas as offline when they belong to writer hostgroups.
  • Tests

    • Added unit tests validating the PostgreSQL monitor health decision logic.

New: include/PgSQLMonitorDecision.h, lib/PgSQLMonitorDecision.cpp

Functions:
- pgsql_should_shun_on_ping_failure(): threshold-based shunning
  (simpler than MySQL — always aggressive with kill_all)
- pgsql_should_offline_for_readonly(): read-only server in writer
  hostgroup should go OFFLINE_SOFT

Unshun recovery is already covered by MonitorHealthDecision.h
can_unshun_server() (same logic for both protocols).
11 test cases: ping shunning (threshold, boundary, disabled),
read-only offline (writer/reader HG combinations).
Copilot AI review requested due to automatic review settings March 22, 2026 11:01
@gemini-code-assist
Copy link

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the PostgreSQL monitor's health decision logic by isolating specific decision-making processes into dedicated, testable functions. This improves code modularity and maintainability, and the addition of thorough unit tests enhances the robustness and reliability of the PostgreSQL monitoring system.

Highlights

  • New PostgreSQL Monitor Decision Functions: Extracted two core decision-making functions for the PostgreSQL monitor: pgsql_should_shun_on_ping_failure() to determine shunning based on ping failures, and pgsql_should_offline_for_readonly() to decide if a read-only server in a writer hostgroup should be taken offline.
  • Comprehensive Unit Testing: Introduced a new unit test file, pgsql_monitor_unit-t.cpp, which includes 11 tests covering various scenarios for the new PostgreSQL monitor decision functions, ensuring their correctness and reliability.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link

coderabbitai bot commented Mar 22, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5c613ee3-dd75-45c7-969d-c45a2814996f

📥 Commits

Reviewing files that changed from the base of the PR and between fe49380 and 8d7459e.

📒 Files selected for processing (2)
  • lib/Makefile
  • test/tap/tests/unit/Makefile

📝 Walkthrough

Walkthrough

Adds two PgSQL monitor decision functions (shunning on missed heartbeats and offline-for-readonly for writer hostgroups), their public header, implementation, build entries, and a TAP unit test validating both behaviors.

Changes

Cohort / File(s) Summary
Public API
include/PgSQLMonitorDecision.h
New header declaring pgsql_should_shun_on_ping_failure() and pgsql_should_offline_for_readonly() with Doxygen docs and include guards.
Implementation
lib/PgSQLMonitorDecision.cpp
New definitions: shunning enabled when max_failures_threshold != 0 and missed_heartbeats >= max_failures_threshold; offline decision returns true for read-only servers in writer hostgroups.
Library build
lib/Makefile
Adds PgSQLMonitorDecision.oo to C++ objects built into libproxysql.a.
Unit tests build
test/tap/tests/unit/Makefile
Registers new unit test binary pgsql_monitor_unit-t in UNIT_TESTS.
Unit tests
test/tap/tests/unit/pgsql_monitor_unit-t.cpp
New TAP test suite (11 assertions) covering shunning logic (including threshold=0 disabled case) and read-only/offline permutations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

Possibly related PRs

Poem

🐰 I hop in code with keen delight,
Two tiny choices set just right.
Failures shunned, read-onlys go soft,
Tests hum low and pass aloft. ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and clearly summarizes the main changes: adding PgSQL monitor health decision functions (pgsql_should_shun_on_ping_failure and pgsql_should_offline_for_readonly) along with comprehensive unit tests, which is exactly what the changeset implements.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch v3.0-5497

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request extracts two pure decision-making functions, pgsql_should_shun_on_ping_failure and pgsql_should_offline_for_readonly, into a new PgSQLMonitorDecision component. This is a good refactoring that improves modularity and testability. The PR also includes a comprehensive set of unit tests for these new functions, covering all logical paths and edge cases. The implementation of both the functions and their tests is clear and correct. Overall, this is a high-quality change that improves the codebase.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a small set of pure “decision” helpers for PgSQL monitor behavior and introduces a new unit test binary to exercise them, as part of the ongoing effort to extract monitor logic into testable units.

Changes:

  • Added pgsql_should_shun_on_ping_failure() and pgsql_should_offline_for_readonly() in a new PgSQL decision module.
  • Added a new TAP unit test (pgsql_monitor_unit-t) covering these helpers (11 assertions).
  • Wired the new unit test and new decision object into the unit-test and library build Makefiles.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
test/tap/tests/unit/pgsql_monitor_unit-t.cpp New unit test binary covering the new PgSQL decision helpers.
test/tap/tests/unit/Makefile Registers pgsql_monitor_unit-t in the unit-test build.
lib/PgSQLMonitorDecision.cpp Implements the new PgSQL monitor decision helper functions.
lib/Makefile Adds PgSQLMonitorDecision.oo to the library build.
include/PgSQLMonitorDecision.h Declares the new PgSQL monitor decision helper APIs with Doxygen docs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +15 to +17
if (max_failures_threshold == 0) {
return false; // shunning disabled
}
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The special-case handling of max_failures_threshold==0 (“disabled”) (and the corresponding unit test) doesn’t match current config validation: pgsql_variables.monitor_ping_max_failures is constrained to a minimum of 1 in PgSQL_Thread.cpp (VariablesPointers_int["monitor_ping_max_failures"] = …, 1, …). Either drop the disabled==0 semantics/tests here, or relax the variable validation if 0 is intended to be a supported value.

Suggested change
if (max_failures_threshold == 0) {
return false; // shunning disabled
}

Copilot uses AI. Check for mistakes.
Comment on lines +16 to +29
/**
* @brief Determine if a PgSQL server should be shunned based on ping failures.
*
* PgSQL monitor shuns servers when they miss N consecutive heartbeats.
* Unlike MySQL, PgSQL always uses aggressive shunning (kill all connections).
*
* @param missed_heartbeats Number of consecutive missed pings.
* @param max_failures_threshold Config: pgsql-monitor_ping_max_failures.
* @return true if the server should be shunned.
*/
bool pgsql_should_shun_on_ping_failure(
unsigned int missed_heartbeats,
unsigned int max_failures_threshold
);
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These decision helpers are currently not used anywhere in production code (only referenced by the new unit test), so the tests don’t validate PgSQL_Monitor’s real shunning behavior. In lib/PgSQL_Monitor.cpp, shunning is decided via a query over the last N rows in pgsql_server_ping_log and explicitly excludes authentication failures (ping_error NOT LIKE '%password authentication failed for user%'). Consider wiring PgSQL_Monitor’s shun decision through this helper (or adjusting the helper’s inputs/behavior to match the current SQL-based logic), otherwise the helper/test can easily diverge from actual behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +31 to +42
/**
* @brief Determine if a PgSQL server's read-only status indicates it should
* be taken offline for a writer hostgroup.
*
* @param is_read_only Whether the server reports read_only=true.
* @param is_writer_hg Whether this is a writer hostgroup.
* @return true if the server should be set to OFFLINE_SOFT.
*/
bool pgsql_should_offline_for_readonly(
bool is_read_only,
bool is_writer_hg
);
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The read-only decision described here (“read-only server in a writer hostgroup should go OFFLINE_SOFT”) doesn’t appear to match the current PgSQL read-only handling: PgSQL_Monitor calls PgSQL_HostGroups_Manager::read_only_action_v2(), which re-maps servers between WRITER/READER hostgroups and does not set OFFLINE_SOFT based on read_only. If OFFLINE_SOFT is not actually part of the PgSQL read-only path, this function/docs/tests are misleading; consider aligning this helper with the real read-only action logic or removing it until it’s used.

Copilot uses AI. Check for mistakes.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/PgSQLMonitorDecision.cpp`:
- Around line 11-19: Replace the inline decision logic in the PgSQL monitor flow
with the new decision helpers: call pgsql_should_shun_on_ping_failure(...) from
the shunn_non_resp_srv code path instead of re-implementing the threshold check,
and similarly replace the inline readonly-handling condition with the readonly
decision helper (the function declared in the companion helper at lines 21-27).
Ensure you pass the same missed_heartbeats and max_failures_threshold arguments
used in the existing comparisons and keep behavior identical otherwise so the
helpers drive production decisions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7c683bf4-e95d-4760-a510-00acab9783c4

📥 Commits

Reviewing files that changed from the base of the PR and between 1354b79 and fe49380.

📒 Files selected for processing (5)
  • include/PgSQLMonitorDecision.h
  • lib/Makefile
  • lib/PgSQLMonitorDecision.cpp
  • test/tap/tests/unit/Makefile
  • test/tap/tests/unit/pgsql_monitor_unit-t.cpp
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: CI-builds / builds (debian12,-dbg)
  • GitHub Check: CI-builds / builds (ubuntu22,-tap)
  • GitHub Check: Agent
  • GitHub Check: run / trigger
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-20T09:34:19.124Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:19.124Z
Learning: In ProxySQL's TAP test suite, resource leaks (e.g., not calling mysql_close() on early return paths) are commonly tolerated because test processes are short-lived and OS frees resources on exit. This pattern applies to all C++ test files under test/tap/tests. When reviewing, recognize this as a project-wide test convention and focus on test correctness and isolation rather than insisting on fixing such leaks in these test files.

Applied to files:

  • test/tap/tests/unit/pgsql_monitor_unit-t.cpp
🪛 Clang (14.0.6)
include/PgSQLMonitorDecision.h

[error] 26-26: unknown type name 'bool'

(clang-diagnostic-error)


[error] 39-39: unknown type name 'bool'

(clang-diagnostic-error)


[error] 40-40: unknown type name 'bool'

(clang-diagnostic-error)


[error] 41-41: unknown type name 'bool'

(clang-diagnostic-error)

test/tap/tests/unit/pgsql_monitor_unit-t.cpp

[error] 8-8: 'tap.h' file not found

(clang-diagnostic-error)

🪛 Cppcheck (2.20.0)
test/tap/tests/unit/pgsql_monitor_unit-t.cpp

[warning] 31-31: If memory allocation fails, then there is a possible null pointer dereference

(nullPointerOutOfMemory)

🔇 Additional comments (4)
lib/Makefile (1)

111-111: Good build wiring for the new decision module.

Line 111 correctly adds PgSQLMonitorDecision.oo to libproxysql.a inputs.

test/tap/tests/unit/pgsql_monitor_unit-t.cpp (1)

14-27: Test matrix and TAP plan are solid.

This covers the critical decision permutations and keeps plan(11) consistent with executed assertions.

Also applies to: 29-38, 41-47

test/tap/tests/unit/Makefile (1)

238-239: Unit test target integration looks correct.

pgsql_monitor_unit-t is now part of UNIT_TESTS, so it will be built with the standard unit suite.

include/PgSQLMonitorDecision.h (1)

26-29: This header is part of a C++-only codebase (457 .cpp vs 4 .c files) and is only included by C++ code. Using bool directly in C++ headers without stdbool.h or C compatibility guards is standard and correct. No C code includes this header, so C/C++ compatibility is unnecessary.

			> Likely an incorrect or invalid review comment.

Comment on lines +11 to +19
bool pgsql_should_shun_on_ping_failure(
unsigned int missed_heartbeats,
unsigned int max_failures_threshold)
{
if (max_failures_threshold == 0) {
return false; // shunning disabled
}
return (missed_heartbeats >= max_failures_threshold);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

These decision helpers are not wired into the active PgSQL monitor flow yet.

The helper logic is correct, but current runtime paths in lib/PgSQL_Monitor.cpp (shunn_non_resp_srv around Lines 1687-1720 and readonly handling around Lines 1815-1839) still implement decisions inline instead of calling these functions. This leaves tested logic disconnected from production behavior.

Also applies to: 21-27

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/PgSQLMonitorDecision.cpp` around lines 11 - 19, Replace the inline
decision logic in the PgSQL monitor flow with the new decision helpers: call
pgsql_should_shun_on_ping_failure(...) from the shunn_non_resp_srv code path
instead of re-implementing the threshold check, and similarly replace the inline
readonly-handling condition with the readonly decision helper (the function
declared in the companion helper at lines 21-27). Ensure you pass the same
missed_heartbeats and max_failures_threshold arguments used in the existing
comparisons and keep behavior identical otherwise so the helpers drive
production decisions.

Signed-off-by: René Cannaò <rene@proxysql.com>
@renecannao renecannao merged commit ec5a9f9 into v3.0-5473 Mar 22, 2026
1 of 2 checks passed
@sonarqubecloud
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants