Skip to content

Conversation

@claudio-pi
Copy link
Collaborator

Summary

  • _send_alert() no longer silently swallows failures — retries 3x with exponential backoff (1s, 2s, 4s) and logs on final failure. This was the root cause of never receiving "backups are stale" alerts.
  • backup_run() validates mount points for /mnt/* and /media/* destinations before running rsync. Prevents silent failures when an external drive disconnects.
  • _check_backup_freshness() detects unmounted drives and triggers a specific "drive not mounted" alert instead of the generic "backups are stale" message.

Context

The SSD disconnected due to a USB controller error (~Feb 8). Backups failed every hour for days, health check detected it, but _send_alert() discarded the curl response with > /dev/null 2>&1 || true — so the alert never reached Telegram.

Test plan

  • bash -n syntax validation passes
  • All 17 health-check.bats tests pass
  • All 57 Python tests pass
  • Mount check is conditional: only activates for /mnt/* and /media/* paths (won't break local directory backups or tests using /tmp)
  • CISO review: GO with conditions (all addressed)

- _send_alert() now retries 3 times with exponential backoff and logs
  failures instead of silently discarding them (was `> /dev/null || true`)
- backup_run() validates that /mnt/* and /media/* destinations are
  actually mounted before attempting rsync
- _check_backup_freshness() detects unmounted backup destinations and
  triggers a specific "drive not mounted" alert
@gemini-code-assist
Copy link

Summary of Changes

Hello @claudio-pi, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness and reliability of the backup system by addressing critical failure points in both the backup process and the alerting mechanism. It ensures that backup operations are not performed on unmounted drives and that all health check alerts are reliably delivered, preventing silent failures and providing more precise notifications about the state of backups.

Highlights

  • Alert Reliability: The _send_alert() function no longer silently swallows failures. It now retries sending alerts up to 3 times with exponential backoff (1s, 2s, 4s) and logs any final failures, ensuring that critical 'backups are stale' alerts are not lost.
  • Backup Destination Validation: The backup_run() function now validates mount points for /mnt/* and /media/* destinations before initiating rsync. This prevents silent backup failures that occur when an external drive becomes disconnected, as it detects empty mount point directories.
  • Specific Unmounted Drive Alerts: The _check_backup_freshness() function has been enhanced to specifically detect unmounted drives. If a backup destination is found to be unmounted, it triggers a distinct 'drive not mounted' alert instead of the more generic 'backups are stale' message, providing clearer diagnostic information.
Changelog
  • lib/backup.sh
    • Added a conditional check within backup_run to verify if destination paths resembling external drives (/mnt/*, /media/*) are actual mounted filesystems, returning an error if not.
  • lib/health-check.sh
    • Implemented a retry mechanism with exponential backoff for the _send_alert function, along with internal logging of failures after all retries are exhausted.
    • Modified _check_backup_freshness to include a check for unmounted backup destinations, specifically for paths under /mnt/* or /media/*.
    • Updated the main health check logic to differentiate between a generic 'Backups are stale' alert and a specific 'Backup drive not mounted' alert based on the _check_backup_freshness result.
    • Added comments to _send_alert calls, clarifying that || true is used to prevent script abortion, as the function now handles internal failure logging.
Activity
  • Bash syntax validation (bash -n) passed.
  • All 17 health-check.bats tests passed.
  • All 57 Python tests passed.
  • The new mount check functionality is conditional, only applying to /mnt/* and /media/* paths to avoid breaking other backup scenarios.
  • A CISO review was conducted, and all conditions raised were addressed.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces crucial reliability improvements by addressing silent failures in backup and alerting mechanisms. The addition of retry logic with exponential backoff in _send_alert is a robust solution to prevent lost alerts. Similarly, validating mount points before backup and in health checks is an excellent proactive measure against failures from disconnected drives. The code is well-structured and the changes are clear. I have one suggestion to improve maintainability by reducing code duplication in the health check script, aligning with our guidelines on refactoring common logic.

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files

Confidence score: 3/5

  • There is a concrete user-facing risk: mountpoint in lib/backup.sh will reject valid backup destinations under mounted drives (e.g., /mnt/ssd/backups), which can cause backups to fail unexpectedly.
  • This is a medium-severity logic issue (6/10) in a critical path, so merge risk is moderate despite being localized.
  • Pay close attention to lib/backup.sh - mount check logic rejects valid subdirectories under mounted drives.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="lib/backup.sh">

<violation number="1" location="lib/backup.sh:30">
P2: The mount check incorrectly rejects valid subdirectories under mounted drives because `mountpoint` only succeeds for the mount root. A destination like `/mnt/ssd/backups` will fail even when `/mnt/ssd` is mounted. Consider checking the mount point of the destination’s filesystem instead of the destination path itself.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

- Use findmnt --target instead of mountpoint -q to correctly handle
  subdirectories under mount points (e.g. /mnt/ssd/backups when
  /mnt/ssd is the actual mount). Falls back to mountpoint on the
  root component when findmnt is unavailable.

- _check_backup_freshness now returns distinct exit codes (0=fresh,
  1=stale, 2=unmounted) so the caller uses the exit code instead of
  re-running the mount check — eliminates the duplicated logic.
Source telegram.sh in health-check.sh and delegate alert sending to
telegram_send_message, which already handles retries via telegram_api,
message chunking for >4096 chars, and parse-mode fallback.

Removes ~30 lines of duplicated curl/retry logic from _send_alert.
@claudio-pi claudio-pi merged commit 8fcd9b7 into main Feb 10, 2026
4 checks passed
@claudio-pi claudio-pi deleted the fix/backup-alert-reliability branch February 10, 2026 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants