netnutmike/ansible-PI-Maintenance

Raspberry Pi Fleet Maintenance Automation

An Ansible-based automation solution for performing routine maintenance tasks across multiple Raspberry Pi devices. This project provides comprehensive OS updates, software management, health monitoring, and reporting capabilities for Pi fleets of any size.

Features

  • OS Updates: Automated package updates, system upgrades, and cleanup
  • Software Management: Package installation, service management, and version control
  • Health Monitoring: System health checks, temperature monitoring, and disk space tracking
  • Comprehensive Reporting: Detailed reports with multiple output formats and notification channels
  • Error Handling: Robust error recovery and graceful failure handling
  • Flexible Configuration: Customizable maintenance profiles and host-specific settings

Quick Start

Prerequisites

  • Ansible 2.9 or higher
  • Python 3.6+ on the control machine
  • SSH access to all Raspberry Pi devices
  • Sudo privileges on target devices
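The version requirements above can be checked on the control machine before the first run. The following is a minimal shell sketch and not part of the repository: version_status is a hypothetical helper that compares only the MAJOR.MINOR fields, and the parsing assumes that the first line of ansible --version contains the version number (for example "ansible [core 2.16.3]").

```shell
# version_status HAVE WANT: print "ok" when MAJOR.MINOR of HAVE is at least
# MAJOR.MINOR of WANT, otherwise "too old". Hypothetical helper; both
# arguments must contain at least MAJOR.MINOR.
version_status() {
    have_major=${1%%.*}
    have_rest=${1#*.}
    have_minor=${have_rest%%.*}
    want_major=${2%%.*}
    want_rest=${2#*.}
    want_minor=${want_rest%%.*}
    if [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }; then
        echo ok
    else
        echo "too old"
    fi
}

# Report on the control machine, skipping tools that are not installed.
if command -v ansible >/dev/null 2>&1; then
    ansible_version=$(ansible --version | sed -n '1s/[^0-9]*\([0-9][0-9.]*\).*/\1/p')
    echo "ansible $ansible_version: $(version_status "$ansible_version" 2.9)"
fi
if command -v python3 >/dev/null 2>&1; then
    python_version=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    echo "python $python_version: $(version_status "$python_version" 3.6)"
fi
```

SSH access and sudo privileges are easier to verify once the inventory exists, using the ping test shown under Test Connectivity.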

Installation

  1. Clone this repository:
git clone <repository-url>
cd ansible-PI-Maintenance
  2. Install Ansible (if not already installed):
# On Ubuntu/Debian
sudo apt update && sudo apt install ansible

# On macOS
brew install ansible

# Using pip
pip install ansible
  3. Configure your inventory file:
cp inventory/examples/small_fleet.yml inventory/hosts.yml
# Edit inventory/hosts.yml with your Pi details
  4. Set up SSH key authentication:
# Generate SSH key if needed
ssh-keygen -t rsa -b 4096 -f ~/.ssh/pi_key

# Copy to all Pis
ssh-copy-id -i ~/.ssh/pi_key.pub pi@<pi-ip-address>

Basic Usage

Run complete maintenance on all Pis:

ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml

Run specific maintenance tasks:

# OS updates only
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml --tags "os_update"

# Health checks only
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml --tags "health_check"

# Software management only
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml --tags "software"

Dry run (preview changes):

ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml --check

Configuration

Inventory Configuration

The inventory file lists the hosts in your Raspberry Pi fleet and their per-host settings. Example structure:

raspberry_pis:
  hosts:
    pi-sensor-01:
      ansible_host: 192.168.1.101
      maintenance_profile: standard
    pi-gateway-01:
      ansible_host: 192.168.1.102
      maintenance_profile: minimal
  vars:
    ansible_user: pi
    ansible_ssh_private_key_file: ~/.ssh/pi_key

Maintenance Profiles

Configure different maintenance levels in vars/maintenance_config.yml:

maintenance_profiles:
  standard:
    os_update: true
    software_management: true
    health_checks: true
    reboot_allowed: true
    cleanup_enabled: true

  minimal:
    os_update: true
    software_management: false
    health_checks: true
    reboot_allowed: false
    cleanup_enabled: false

  full:
    os_update: true
    software_management: true
    health_checks: true
    reboot_allowed: true
    cleanup_enabled: true
    deep_cleanup: true

Host Group Variables

Customize settings for different Pi groups in inventory/group_vars/:

  • raspberry_pis.yml: Common settings for all Pis
  • sensor_pis.yml: Settings specific to sensor nodes
  • production_pis.yml: Production environment settings
  • development_pis.yml: Development environment settings

Usage Examples

Scenario 1: Weekly Maintenance

Run comprehensive maintenance on all production Pis:

# Full maintenance with reporting
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
  --limit production_pis \
  --extra-vars "send_email_report=true"

Scenario 2: Emergency Security Updates

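Apply only critical OS updates without rebooting. The variables are passed as JSON because key=value pairs always arrive as strings, so reboot_allowed=false could be evaluated as a non-empty (truthy) string:

ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
  --tags "os_update" \
  --extra-vars '{"reboot_allowed": false, "emergency_mode": true}'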

Scenario 3: Health Check Only

Monitor system health without making changes:

ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
  --tags "health_check,reporting" \
  --extra-vars "health_check_only=true"

Scenario 4: Selective Host Maintenance

Maintain specific hosts:

# Single host
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
  --limit "pi-sensor-01"

# Multiple hosts
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
  --limit "pi-sensor-01,pi-gateway-01"

# Host pattern
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
  --limit "pi-sensor-*"

Scenario 5: Custom Software Installation

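Install additional packages (add --limit to restrict the run to specific hosts). The list is passed as JSON so that Ansible receives a real list rather than a string:

ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
  --tags "software" \
  --extra-vars '{"additional_packages": ["htop", "vim", "git"]}'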

Configuration Options

Global Configuration Variables

Located in vars/maintenance_config.yml:

OS Update Settings

  • os_update_enabled: Enable/disable OS updates (default: true)
  • reboot_allowed: Allow automatic reboots (default: true)
  • reboot_timeout: Reboot timeout in seconds (default: 300)
  • package_cache_update: Update package cache (default: true)
  • autoremove_packages: Remove orphaned packages (default: true)
  • autoclean_cache: Clean package cache (default: true)

Software Management Settings

  • software_management_enabled: Enable software management (default: true)
  • required_packages: List of packages to ensure are installed
  • service_management: Manage services after package changes (default: true)
  • package_hold_list: Packages to hold at current version

Health Check Settings

  • health_checks_enabled: Enable health monitoring (default: true)
  • disk_usage_warning_threshold: Disk usage warning level in percent (default: 80)
  • disk_usage_critical_threshold: Disk usage critical level in percent (default: 90)
  • temperature_warning_threshold: Temperature warning level in Celsius (default: 70)
  • temperature_critical_threshold: Temperature critical level in Celsius (default: 80)
  • load_average_threshold: System load warning threshold (default: 2.0)
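To illustrate how a warning/critical threshold pair is typically applied, here is a small shell sketch using the default disk usage values (80 and 90). It is not taken from the playbook: classify_level and root_disk_usage are hypothetical helpers, and treating the thresholds as inclusive is an assumption.

```shell
# classify_level VALUE WARNING CRITICAL: print ok, warning, or critical.
# Integer comparison; a value equal to a threshold counts as reaching it
# (assumption, the playbook may compare differently).
classify_level() {
    if [ "$1" -ge "$3" ]; then
        echo critical
    elif [ "$1" -ge "$2" ]; then
        echo warning
    else
        echo ok
    fi
}

# Percentage of the root filesystem in use, without the "%" sign.
root_disk_usage() {
    df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

usage=$(root_disk_usage)
echo "root filesystem at ${usage}%: $(classify_level "$usage" 80 90)"
```

Run directly on a Pi, this prints the current root filesystem usage together with the level it falls into.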

Reporting Settings

  • reporting_enabled: Enable report generation (default: true)
  • report_format: Report format, one of html, text, or json (default: html)
  • log_retention_days: Days to keep logs (default: 30)
  • send_email_report: Send email reports (default: false)
  • email_recipients: List of email addresses for reports
  • slack_webhook_url: Slack webhook for notifications
  • webhook_url: Generic webhook URL for notifications
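As an illustration of what log_retention_days implies, the sketch below removes maintenance logs older than a given number of days. It assumes the logs/maintenance-YYYYMMDD.log naming used under Check Logs; prune_logs is a hypothetical helper, and the playbook's own retention logic may differ.

```shell
# prune_logs DIR DAYS: delete maintenance-*.log files in DIR (recursively)
# that were last modified more than DAYS days ago, printing each removed path.
prune_logs() {
    find "$1" -type f -name 'maintenance-*.log' -mtime +"$2" -print -exec rm -f {} +
}

# Example (run from the repository root; this deletes files):
#   prune_logs logs 30
```

Because the call deletes files, it is left as a comment; replacing the -exec part with nothing gives a dry run that only lists the candidates.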

Error Handling Settings

  • max_retries: Maximum retry attempts for failed tasks (default: 3)
  • retry_delay: Delay between retries in seconds (default: 10)
  • continue_on_error: Continue with other hosts on failure (default: true)
  • emergency_rollback: Enable emergency rollback on critical failures (default: true)
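The interaction of max_retries and retry_delay can be sketched as a plain shell retry loop. This only illustrates the semantics and is not how the playbook implements them (Ansible has its own task-level retry keywords); the sketch assumes max_retries counts total attempts, and retry is a hypothetical helper.

```shell
# retry MAX_RETRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_RETRIES attempts have been made,
# sleeping DELAY_SECONDS between attempts. Prints a one-line summary.
retry() {
    max_retries=$1
    retry_delay=$2
    shift 2
    attempt=1
    while :; do
        if "$@"; then
            echo "succeeded on attempt $attempt"
            return 0
        fi
        if [ "$attempt" -ge "$max_retries" ]; then
            echo "failed after $attempt attempts"
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$retry_delay"
    done
}

# Example with the documented defaults (3 attempts, 10 seconds apart):
#   retry 3 10 sudo apt-get update
```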

Package Lock Handling Settings

  • software_management_force_unlock: Automatically remove stale lock files (default: true)
  • software_management_kill_blocking_processes: Kill blocking processes (default: false)
  • software_management_lock_timeout_short: Initial wait timeout in seconds (default: 60)
  • software_management_lock_timeout_long: Extended wait timeout in seconds (default: 180)
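The two-stage timeout can be pictured as polling for the lock, first for the short period and then for the long one. The sketch below is an assumption about that behaviour, not the playbook's code: wait_until and dpkg_lock_free are hypothetical helpers, and the check relies on fuser being installed and run with enough privileges to see the lock holder.

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Polls COMMAND about once per second. Prints "released" as soon as it
# succeeds, or "timeout" (and returns 1) once TIMEOUT_SECONDS have elapsed.
wait_until() {
    remaining=$1
    shift
    while ! "$@"; do
        if [ "$remaining" -le 0 ]; then
            echo timeout
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    echo released
}

# Succeeds when no process has the dpkg frontend lock file open.
# Note: also succeeds if fuser is missing, so treat the result as a hint.
dpkg_lock_free() {
    ! fuser /var/lib/dpkg/lock-frontend >/dev/null 2>&1
}

# Example with the documented defaults (60 s, then 180 s):
#   wait_until 60 dpkg_lock_free || wait_until 180 dpkg_lock_free
```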

Host-Specific Variables

Override global settings for specific hosts in inventory:

raspberry_pis:
  hosts:
    pi-critical-01:
      ansible_host: 192.168.1.100
      maintenance_profile: minimal
      reboot_allowed: false
      health_check_interval: 300
      custom_packages:
        - docker.io
        - nginx

Environment-Specific Variables

Use group variables for different environments:

# inventory/group_vars/production_pis.yml
reboot_allowed: false
send_email_report: true
health_check_interval: 600
log_level: WARNING

# inventory/group_vars/development_pis.yml
reboot_allowed: true
send_email_report: false
health_check_interval: 1800
log_level: DEBUG

Troubleshooting

Common Issues

SSH Connection Problems

Problem: "Permission denied" or "Connection refused" errors

Solutions:

  1. Verify SSH key is correctly configured:

    ssh -i ~/.ssh/pi_key pi@<pi-ip> -v
  2. Check SSH service on Pi:

    sudo systemctl status ssh
    sudo systemctl enable ssh
    sudo systemctl start ssh
  3. Verify SSH key permissions:

    chmod 600 ~/.ssh/pi_key
    chmod 644 ~/.ssh/pi_key.pub

Package Manager Lock Issues

Problem: Script hangs waiting for package manager locks

Solutions:

  1. Automatic handling (recommended): The playbook waits for package manager locks up to the configured timeouts and, when software_management_force_unlock is enabled, removes stale lock files.

  2. Manual cleanup script:

    sudo ./scripts/clear-package-locks.sh
  3. Manual commands:

    # Kill blocking processes
    sudo pkill -f "apt|dpkg|unattended-upgrade"
    
    # Remove lock files
    sudo rm -f /var/lib/dpkg/lock*
    sudo rm -f /var/cache/apt/archives/lock
    
    # Fix broken packages
    sudo dpkg --configure -a
    sudo apt-get update
  4. Configuration options in vars/maintenance_config.yml:

    software_management_force_unlock: true # Auto-remove stale locks
    software_management_kill_blocking_processes: false # Kill blocking processes

Package Update Failures

Problem: Package updates fail with dependency errors

Solutions:

  1. Run manual cleanup on affected Pi:

    ssh pi@<pi-ip>
    sudo apt update
    sudo apt --fix-broken install
    sudo apt autoremove
  2. Use emergency maintenance profile:

    ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
      --extra-vars "maintenance_profile=emergency"

Disk Space Issues

Problem: Insufficient disk space prevents updates

Solutions:

  1. Run cleanup-only maintenance:

    ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
      --tags "cleanup" --limit "<affected-host>"
  2. Run aggressive log cleanup:

    ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
      --tags "log_cleanup" --extra-vars "aggressive_cleanup=true"

Temperature Throttling

Problem: Pi is throttling due to high temperature

Solutions:

  1. Check current temperature:

    ansible raspberry_pis -i inventory/hosts.yml -m shell \
      -a "vcgencmd measure_temp"
  2. Skip CPU-intensive operations until the Pi has cooled down:

    ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
      --extra-vars "skip_cpu_intensive=true"
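vcgencmd measure_temp prints a single line such as temp=48.3'C. The following shell sketch extracts the number and compares it with the default thresholds from Health Check Settings (70 and 80 degrees Celsius). parse_temp and temp_status are hypothetical helpers, and inclusive thresholds are an assumption.

```shell
# parse_temp "temp=48.3'C": print the numeric part, e.g. 48.3.
# Prints nothing if the input does not match the expected format.
parse_temp() {
    printf '%s\n' "$1" | sed -n "s/^temp=\([0-9][0-9.]*\)'C\$/\1/p"
}

# temp_status VALUE WARNING CRITICAL: print ok, warning, or critical.
# awk is used because the reading is a decimal number.
temp_status() {
    awk -v t="$1" -v warn="$2" -v crit="$3" 'BEGIN {
        if (t + 0 >= crit + 0) print "critical"
        else if (t + 0 >= warn + 0) print "warning"
        else print "ok"
    }'
}

# On a Raspberry Pi, report the live reading.
if command -v vcgencmd >/dev/null 2>&1; then
    reading=$(parse_temp "$(vcgencmd measure_temp)")
    echo "SoC temperature ${reading} C: $(temp_status "$reading" 70 80)"
fi
```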

Debugging

Enable Verbose Output

# Ansible verbose output
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml -vvv

# Debug specific host
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
  --limit "problematic-host" -vvv

Check Logs

# View maintenance logs
tail -f logs/maintenance-$(date +%Y%m%d).log

# View Ansible logs
export ANSIBLE_LOG_PATH=./ansible.log
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml
tail -f ansible.log

Test Connectivity

# Test all hosts
ansible raspberry_pis -i inventory/hosts.yml -m ping

# Test specific host
ansible pi-sensor-01 -i inventory/hosts.yml -m ping

# Gather facts
ansible raspberry_pis -i inventory/hosts.yml -m setup

FAQ

Q: How often should I run maintenance?

A: Recommended schedule:

  • Weekly: Full maintenance for production systems
  • Daily: Health checks only
  • Monthly: Deep cleanup and comprehensive reporting
  • As needed: Security updates and emergency maintenance
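One way to implement this schedule is cron on the control machine. The sketch below prints example crontab entries built from the commands in this README. The times, the repository path $HOME/ansible-PI-Maintenance, and the use of maintenance_profile=full for the monthly run are assumptions; print_schedule is a hypothetical helper.

```shell
# print_schedule: print example crontab lines for the recommended schedule.
# The quoted heredoc keeps $HOME literal so that cron expands it at run time.
print_schedule() {
    cat <<'EOF'
# m h dom mon dow  command
# Daily at 06:00: health checks only
0 6 * * *  cd "$HOME/ansible-PI-Maintenance" && ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml --tags "health_check"
# Sundays at 03:00: full maintenance for production systems
0 3 * * 0  cd "$HOME/ansible-PI-Maintenance" && ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml --limit production_pis
# First day of the month at 04:00: deep cleanup via the "full" profile
0 4 1 * *  cd "$HOME/ansible-PI-Maintenance" && ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml --extra-vars "maintenance_profile=full"
EOF
}

print_schedule
```

The output can be pasted into crontab -e. Cron runs with a minimal PATH, so a pip-installed ansible-playbook may need its full path.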

Q: Can I run maintenance on a subset of hosts?

A: Yes, use the --limit flag:

ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml --limit "sensor_pis"

Q: What happens if a Pi is offline during maintenance?

A: The script will skip offline hosts and continue with others. Check the final report for a summary of unreachable hosts.
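Independently of the generated report, unreachable hosts also show up in the PLAY RECAP that ansible-playbook prints at the end of a run. A small sketch for pulling them out of saved console output follows; unreachable_hosts is a hypothetical helper and expects plain recap lines on stdin (not the timestamp-prefixed format written to ANSIBLE_LOG_PATH).

```shell
# unreachable_hosts: read PLAY RECAP lines on stdin and print the name of
# every host whose unreachable counter is greater than zero.
unreachable_hosts() {
    awk '/unreachable=[1-9]/ { print $1 }'
}

# Example with two recap lines in the format Ansible prints to the console.
unreachable_hosts <<'EOF'
pi-sensor-01               : ok=12   changed=3    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
pi-gateway-01              : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
EOF
# prints: pi-gateway-01
```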

Q: How do I add custom packages to be installed?

A: Add them to the required_packages list in vars/maintenance_config.yml or use host-specific variables:

required_packages:
  - htop
  - vim
  - git
  - custom-package

Q: Can I prevent specific packages from being updated?

A: Yes, add them to the package_hold_list:

package_hold_list:
  - kernel-package
  - critical-service

Q: How do I customize email notifications?

A: Configure email settings in vars/maintenance_config.yml:

email_notifications:
  enabled: true
  smtp_server: smtp.gmail.com
  smtp_port: 587
  username: your-email@gmail.com
  password: your-app-password
  recipients:
    - admin@company.com
    - team@company.com

Q: What if the script gets stuck waiting for package locks?

A: This happens when another package management process is running. Solutions:

  1. Automatic handling (recommended): Enable force unlock in vars/maintenance_config.yml:

    software_management_force_unlock: true
    software_management_kill_blocking_processes: false # Set to true for aggressive cleanup
  2. Manual cleanup: Run the provided script:

    sudo ./scripts/clear-package-locks.sh
  3. Manual commands:

    # Kill blocking processes
    sudo pkill -f "apt|dpkg|unattended-upgrade"
    
    # Remove lock files
    sudo rm -f /var/lib/dpkg/lock*
    sudo rm -f /var/cache/apt/archives/lock
    
    # Fix broken packages
    sudo dpkg --configure -a
    sudo apt-get update

Q: What if maintenance fails on critical systems?

A: The script includes rollback mechanisms. For critical failures:

  1. Check the error logs in logs/
  2. Review the maintenance report
  3. Use emergency rollback if available:
    ansible-playbook -i inventory/hosts.yml playbooks/emergency_rollback.yml

Q: How do I scale this for large Pi fleets (100+ devices)?

A: For large fleets:

  1. Increase parallelism in ansible.cfg:
    [defaults]
    forks = 20
  2. Use dynamic inventory sources
  3. Implement staged rollouts:
    ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
      --limit "batch_1" --extra-vars "batch_mode=true"
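If the inventory does not define batch groups such as batch_1, a staged rollout can also be driven by generating --limit values from a host list. The sketch below is one possible approach, not something the repository provides; batch_hosts is a hypothetical helper.

```shell
# batch_hosts SIZE: read host names (one per line) on stdin and print
# comma-separated batches of at most SIZE hosts, one batch per line.
batch_hosts() {
    awk -v size="$1" '
        { printf "%s%s", (NR % size == 1 || size == 1 ? "" : ","), $0 }
        NR % size == 0 { printf "\n" }
        END { if (NR % size != 0) printf "\n" }
    '
}

printf 'pi-sensor-01\npi-sensor-02\npi-gateway-01\n' | batch_hosts 2
# prints:
#   pi-sensor-01,pi-sensor-02
#   pi-gateway-01

# Each output line can be passed to --limit, for example:
#   ansible raspberry_pis -i inventory/hosts.yml --list-hosts | tail -n +2 | awk '{ print $1 }' |
#     batch_hosts 20 | while read -r batch; do
#       ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml --limit "$batch" || break
#     done
```

Stopping at the first failed batch (the || break) keeps a bad update from reaching the rest of the fleet.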

Q: Can I integrate this with monitoring systems?

A: Yes, the script supports webhook notifications and structured JSON output that can be consumed by monitoring systems like Prometheus, Grafana, or custom dashboards.
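For reference, a Slack incoming webhook accepts a JSON body with a text field. The sketch below builds such a body in shell; it illustrates the wire format only and is not the playbook's notification code. slack_payload is a hypothetical helper that escapes backslashes and double quotes but not newlines or control characters.

```shell
# slack_payload MESSAGE: print a JSON object of the form {"text":"MESSAGE"}.
# Only backslashes and double quotes are escaped, so MESSAGE should be a
# single line of plain text.
slack_payload() {
    escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"text":"%s"}\n' "$escaped"
}

slack_payload 'Disk usage critical on "pi-sensor-01"'
# prints: {"text":"Disk usage critical on \"pi-sensor-01\""}

# Sending it (SLACK_WEBHOOK_URL is assumed to hold the value of slack_webhook_url):
#   curl -fsS -X POST -H 'Content-Type: application/json' \
#     -d "$(slack_payload 'maintenance finished')" "$SLACK_WEBHOOK_URL"
```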

Support

For issues, questions, or contributions:

  1. Check the troubleshooting section above
  2. Review the logs in the logs/ directory
  3. Create an issue with detailed error information and configuration details

License

This project is licensed under the MIT License - see the LICENSE file for details.
