An Ansible-based automation solution for performing routine maintenance tasks across multiple Raspberry Pi devices. This project provides comprehensive OS updates, software management, health monitoring, and reporting capabilities for Pi fleets of any size.
- OS Updates: Automated package updates, system upgrades, and cleanup
- Software Management: Package installation, service management, and version control
- Health Monitoring: System health checks, temperature monitoring, and disk space tracking
- Comprehensive Reporting: Detailed reports with multiple output formats and notification channels
- Error Handling: Robust error recovery and graceful failure handling
- Flexible Configuration: Customizable maintenance profiles and host-specific settings
- Ansible 2.9 or higher
- Python 3.6+ on control machine
- SSH access to all Raspberry Pi devices
- Sudo privileges on target devices
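The version requirements above can be checked on the control machine before the first run. The following is a minimal sketch, not part of the project: the helper names (version_ge, tool_version, check_tool) are hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils) and that each tool prints its version number in its `--version` output.

```shell
#!/usr/bin/env bash
# Preflight sketch: compare installed tool versions against the stated minimums.
# All function names here are illustrative, not part of the project.

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) to order dotted version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints the first dotted version number found in `<tool> --version` output.
# Returns non-zero when the tool is not on PATH.
tool_version() {
    command -v "$1" >/dev/null 2>&1 || return 1
    "$1" --version 2>&1 | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Reports whether a tool is present and meets its minimum version.
check_tool() {
    local tool="$1" minimum="$2" found
    if ! found="$(tool_version "$tool")" || [ -z "$found" ]; then
        echo "MISSING: $tool (need >= $minimum)"
        return 1
    fi
    if version_ge "$found" "$minimum"; then
        echo "OK: $tool $found"
    else
        echo "TOO OLD: $tool $found (need >= $minimum)"
        return 1
    fi
}
```

After sourcing the file, `check_tool ansible 2.9 && check_tool python3 3.6` reports whether the control machine meets both minimums.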
- Clone this repository:
git clone <repository-url>
cd raspberry-pi-maintenance
- Install Ansible (if not already installed):
# On Ubuntu/Debian
sudo apt update && sudo apt install ansible
# On macOS
brew install ansible
# Using pip
pip install ansible
- Configure your inventory file:
cp inventory/examples/small_fleet.yml inventory/hosts.yml
# Edit inventory/hosts.yml with your Pi details
- Set up SSH key authentication:
# Generate SSH key if needed
ssh-keygen -t rsa -b 4096 -f ~/.ssh/pi_key
# Copy to all Pis
ssh-copy-id -i ~/.ssh/pi_key.pub pi@<pi-ip-address>
Run complete maintenance on all Pis:
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml
Run specific maintenance tasks:
# OS updates only
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml --tags "os_update"
# Health checks only
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml --tags "health_check"
# Software management only
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml --tags "software"
Dry run (preview changes):
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml --check
The inventory file defines your Raspberry Pi fleet and their configurations. Example structure:
raspberry_pis:
  hosts:
    pi-sensor-01:
      ansible_host: 192.168.1.101
      maintenance_profile: standard
    pi-gateway-01:
      ansible_host: 192.168.1.102
      maintenance_profile: minimal
  vars:
    ansible_user: pi
    ansible_ssh_private_key_file: ~/.ssh/pi_key
Configure different maintenance levels in vars/maintenance_config.yml:
maintenance_profiles:
  standard:
    os_update: true
    software_management: true
    health_checks: true
    reboot_allowed: true
    cleanup_enabled: true
  minimal:
    os_update: true
    software_management: false
    health_checks: true
    reboot_allowed: false
    cleanup_enabled: false
  full:
    os_update: true
    software_management: true
    health_checks: true
    reboot_allowed: true
    cleanup_enabled: true
    deep_cleanup: true
Customize settings for different Pi groups in inventory/group_vars/:
- raspberry_pis.yml: Common settings for all Pis
- sensor_pis.yml: Settings specific to sensor nodes
- production_pis.yml: Production environment settings
- development_pis.yml: Development environment settings
Run comprehensive maintenance on all production Pis:
# Full maintenance with reporting
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
--limit production_pis \
--extra-vars "send_email_report=true"
Apply only critical OS updates without rebooting:
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
--tags "os_update" \
--extra-vars "reboot_allowed=false emergency_mode=true"
Monitor system health without making changes:
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
--tags "health_check,reporting" \
--extra-vars "health_check_only=true"
Maintain specific hosts:
# Single host
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
--limit "pi-sensor-01"
# Multiple hosts
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
--limit "pi-sensor-01,pi-gateway-01"
# Host pattern
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
--limit "pi-sensor-*"
Install additional packages on specific hosts:
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
--tags "software" \
--extra-vars '{"additional_packages": ["htop", "vim", "git"]}'
Located in vars/maintenance_config.yml:
- os_update_enabled: Enable/disable OS updates (default: true)
- reboot_allowed: Allow automatic reboots (default: true)
- reboot_timeout: Reboot timeout in seconds (default: 300)
- package_cache_update: Update package cache (default: true)
- autoremove_packages: Remove orphaned packages (default: true)
- autoclean_cache: Clean package cache (default: true)
- software_management_enabled: Enable software management (default: true)
- required_packages: List of packages to ensure are installed
- service_management: Manage services after package changes (default: true)
- package_hold_list: Packages to hold at current version
- health_checks_enabled: Enable health monitoring (default: true)
- disk_usage_warning_threshold: Disk usage warning level in percent (default: 80)
- disk_usage_critical_threshold: Disk usage critical level in percent (default: 90)
- temperature_warning_threshold: Temperature warning level in Celsius (default: 70)
- temperature_critical_threshold: Temperature critical level in Celsius (default: 80)
- load_average_threshold: System load warning threshold (default: 2.0)
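To illustrate how the warning and critical thresholds might be applied, here is a hedged sketch that classifies a temperature or disk reading. It is not the project's implementation: the function names are hypothetical, and it assumes a reading at or above a threshold triggers that level, that `vcgencmd measure_temp` prints a line like temp=48.3'C, and that disk usage comes from `df -P /`.

```shell
#!/usr/bin/env bash
# Threshold sketch: map a numeric reading to ok, warning, or critical.
# Function names are illustrative; inclusive thresholds are an assumption.

# Arguments: value, warning threshold, critical threshold.
# awk does the comparison because shell arithmetic is integer-only.
classify() {
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v >= c)      print "critical"
        else if (v >= w) print "warning"
        else             print "ok"
    }'
}

# Extracts the number from `vcgencmd measure_temp` output such as: temp=48.3'C
# The `.` in the pattern stands for the apostrophe before the C.
parse_temp() {
    printf '%s\n' "$1" | sed -E 's/^temp=([0-9.]+).C$/\1/'
}

# Extracts the usage percentage (fifth column, second line) from `df -P /` output.
parse_disk_pct() {
    printf '%s\n' "$1" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}
```

On a Pi, `classify "$(parse_temp "$(vcgencmd measure_temp)")" 70 80` would print the level for the current temperature using the default thresholds, and `classify "$(parse_disk_pct "$(df -P /)")" 80 90` would do the same for root filesystem usage.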
- reporting_enabled: Enable report generation (default: true)
- report_format: Report format, one of html, text, or json (default: html)
- log_retention_days: Days to keep logs (default: 30)
- send_email_report: Send email reports (default: false)
- email_recipients: List of email addresses for reports
- slack_webhook_url: Slack webhook for notifications
- webhook_url: Generic webhook URL for notifications
- max_retries: Maximum retry attempts for failed tasks (default: 3)
- retry_delay: Delay between retries in seconds (default: 10)
- continue_on_error: Continue with other hosts on failure (default: true)
- emergency_rollback: Enable emergency rollback on critical failures (default: true)
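The interaction between max_retries and retry_delay can be pictured with a small retry loop. This is only a sketch of the general pattern, not the playbook's own logic: the `retry` function is hypothetical, and it assumes max_retries counts total attempts, including the first.

```shell
#!/usr/bin/env bash
# Retry sketch: run a command until it succeeds or the attempts run out.
# Assumption: max_retries is the total number of attempts, not extra retries.

# Arguments: max attempts, delay in seconds, then the command and its arguments.
retry() {
    local max_retries="$1" retry_delay="$2" attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max_retries" ]; then
            echo "Giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "Attempt $attempt failed; retrying in ${retry_delay}s" >&2
        sleep "$retry_delay"
        attempt=$((attempt + 1))
    done
}
```

With the documented defaults, `retry 3 10 ansible raspberry_pis -i inventory/hosts.yml -m ping` would try the connectivity check up to three times, pausing ten seconds between attempts.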
- software_management_force_unlock: Automatically remove stale lock files (default: true)
- software_management_kill_blocking_processes: Kill blocking processes (default: false)
- software_management_lock_timeout_short: Initial wait timeout in seconds (default: 60)
- software_management_lock_timeout_long: Extended wait timeout in seconds (default: 180)
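The short and long lock timeouts suggest a two-stage wait. The sketch below shows one way such a wait could work; it is an assumption about the behavior, not the project's code. The function names are hypothetical, the lock check assumes `fuser` (from psmisc) is installed and run as root on the Pi, and what the playbook does between the two stages is not covered here.

```shell
#!/usr/bin/env bash
# Lock-wait sketch: wait for the dpkg lock, first briefly, then for longer.
# Function names are illustrative; run as root so fuser can see all processes.

# Succeeds while some process holds one of the dpkg lock files.
lock_is_held() {
    fuser /var/lib/dpkg/lock /var/lib/dpkg/lock-frontend >/dev/null 2>&1
}

# Polls once per second until the lock is free or `timeout` seconds have passed.
wait_for_lock() {
    local timeout="$1" waited=0
    while lock_is_held; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}

# Two-stage wait: the initial timeout first, then the extended timeout.
# Defaults mirror the documented values of 60 and 180 seconds.
wait_with_escalation() {
    local short="${1:-60}" long="${2:-180}"
    wait_for_lock "$short" && return 0
    echo "Lock still held after ${short}s; waiting up to ${long}s more" >&2
    wait_for_lock "$long"
}
```

A non-zero return from `wait_with_escalation` means the lock was still held after both waits, which is the point where forced unlock or manual cleanup would come into play.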
Override global settings for specific hosts in inventory:
raspberry_pis:
  hosts:
    pi-critical-01:
      ansible_host: 192.168.1.100
      maintenance_profile: minimal
      reboot_allowed: false
      health_check_interval: 300
      custom_packages:
        - docker.io
        - nginx
Use group variables for different environments:
# inventory/group_vars/production_pis.yml
reboot_allowed: false
send_email_report: true
health_check_interval: 600
log_level: WARNING
# inventory/group_vars/development_pis.yml
reboot_allowed: true
send_email_report: false
health_check_interval: 1800
log_level: DEBUG
Problem: "Permission denied" or "Connection refused" errors
Solutions:
- Verify the SSH key is correctly configured:
ssh -i ~/.ssh/pi_key pi@<pi-ip> -v
- Check the SSH service on the Pi:
sudo systemctl status ssh
sudo systemctl enable ssh
sudo systemctl start ssh
- Verify SSH key permissions:
chmod 600 ~/.ssh/pi_key
chmod 644 ~/.ssh/pi_key.pub
Problem: Script hangs waiting for package manager locks
Solutions:
- Automatic handling (recommended): The script automatically handles locks with configurable timeouts and force removal.
- Manual cleanup script:
sudo ./scripts/clear-package-locks.sh
- Manual commands:
# Kill blocking processes
sudo pkill -f "apt|dpkg|unattended-upgrade"
# Remove lock files
sudo rm -f /var/lib/dpkg/lock*
sudo rm -f /var/cache/apt/archives/lock
# Fix broken packages
sudo dpkg --configure -a
sudo apt-get update
- Configuration options in vars/maintenance_config.yml:
software_management_force_unlock: true  # Auto-remove stale locks
software_management_kill_blocking_processes: false  # Kill blocking processes
Problem: Package updates fail with dependency errors
Solutions:
- Run manual cleanup on the affected Pi:
ssh pi@<pi-ip>
sudo apt update
sudo apt --fix-broken install
sudo apt autoremove
- Use the emergency maintenance profile:
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
--extra-vars "maintenance_profile=emergency"
Problem: Insufficient disk space prevents updates
Solutions:
- Run cleanup-only maintenance:
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
--tags "cleanup" --limit "<affected-host>"
- Clean up logs aggressively:
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
--tags "log_cleanup" --extra-vars "aggressive_cleanup=true"
Problem: Pi is throttling due to high temperature
Solutions:
- Check the current temperature:
ansible raspberry_pis -i inventory/hosts.yml -m shell \
-a "vcgencmd measure_temp"
- Skip temperature-sensitive operations:
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
--extra-vars "skip_cpu_intensive=true"
# Ansible verbose output
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml -vvv
# Debug specific host
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
--limit "problematic-host" -vvv
# View maintenance logs
tail -f logs/maintenance-$(date +%Y%m%d).log
# View Ansible logs
export ANSIBLE_LOG_PATH=./ansible.log
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml
tail -f ansible.log
# Test all hosts
ansible raspberry_pis -i inventory/hosts.yml -m ping
# Test specific host
ansible pi-sensor-01 -i inventory/hosts.yml -m ping
# Gather facts
ansible raspberry_pis -i inventory/hosts.yml -m setup
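When testing a larger fleet, it can help to reduce the ping output to just the hosts that did not respond. The filter below is a rough sketch with a hypothetical function name. It assumes the ad-hoc command prints one header line per host in the form `<host> | <STATUS>`, with UNREACHABLE or FAILED as the status for problem hosts; verify this against the output of your Ansible version.

```shell
#!/usr/bin/env bash
# Filter sketch: list hosts that were unreachable or failed in ad-hoc output.
# Assumes per-host header lines such as "pi-sensor-01 | UNREACHABLE! => {".

# Reads ad-hoc ansible output on stdin and prints one problem host per line.
failed_hosts() {
    awk -F ' [|] ' '/ [|] (UNREACHABLE|FAILED)/ { print $1 }'
}
```

For example, `ansible raspberry_pis -i inventory/hosts.yml -m ping | failed_hosts` would print only the hosts that need attention.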
A: Recommended schedule:
- Weekly: Full maintenance for production systems
- Daily: Health checks only
- Monthly: Deep cleanup and comprehensive reporting
- As needed: Security updates and emergency maintenance
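One way to automate this schedule is a small wrapper that a scheduler such as cron calls with a slot name. The sketch below is illustrative only: the function names and the mapping from slot to arguments are assumptions, built from flags shown elsewhere in this document (the health_check tag, the production_pis group, and the full maintenance profile).

```shell
#!/usr/bin/env bash
# Schedule sketch: map a schedule slot to ansible-playbook arguments.
# The slot names and their mapping are assumptions, not project defaults.

# Prints the extra ansible-playbook arguments for a named slot.
schedule_args() {
    case "$1" in
        daily)   echo '--tags health_check' ;;
        weekly)  echo '--limit production_pis' ;;
        monthly) echo '--extra-vars maintenance_profile=full' ;;
        *)       echo "unknown schedule: $1" >&2; return 1 ;;
    esac
}

# Runs the maintenance playbook for the given slot.
run_scheduled() {
    local args
    args="$(schedule_args "$1")" || return 1
    # Word splitting of $args is intentional: each slot expands to separate arguments.
    # shellcheck disable=SC2086
    ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml $args
}
```

Calling `run_scheduled daily` from the repository root runs health checks only; the weekly and monthly slots follow the same pattern.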
A: Yes, use the --limit flag:
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml --limit "sensor_pis"
A: The script will skip offline hosts and continue with others. Check the final report for a summary of unreachable hosts.
A: Add them to the required_packages list in vars/maintenance_config.yml or use host-specific variables:
required_packages:
  - htop
  - vim
  - git
  - custom-package
A: Yes, add them to the package_hold_list:
package_hold_list:
  - kernel-package
  - critical-service
A: Configure email settings in vars/maintenance_config.yml:
email_notifications:
  enabled: true
  smtp_server: smtp.gmail.com
  smtp_port: 587
  username: your-email@gmail.com
  password: your-app-password
  recipients:
    - admin@company.com
    - team@company.com
A: This happens when another package management process is running. Solutions:
- Automatic handling (recommended): Enable force unlock in vars/maintenance_config.yml:
software_management_force_unlock: true
software_management_kill_blocking_processes: false  # Set to true for aggressive cleanup
- Manual cleanup: Run the provided script:
sudo ./scripts/clear-package-locks.sh
- Manual commands:
# Kill blocking processes
sudo pkill -f "apt|dpkg|unattended-upgrade"
# Remove lock files
sudo rm -f /var/lib/dpkg/lock*
sudo rm -f /var/cache/apt/archives/lock
# Fix broken packages
sudo dpkg --configure -a
sudo apt-get update
A: The script includes rollback mechanisms. For critical failures:
- Check the error logs in logs/
- Review the maintenance report
- Use emergency rollback if available:
ansible-playbook -i inventory/hosts.yml playbooks/emergency_rollback.yml
A: For large fleets:
- Increase parallelism in ansible.cfg:
[defaults]
forks = 20
- Use dynamic inventory sources
- Implement staged rollouts:
ansible-playbook -i inventory/hosts.yml playbooks/maintenance.yml \
--limit "batch_1" --extra-vars "batch_mode=true"
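A staged rollout can be scripted as a loop that stops at the first failing batch, so a bad update does not spread to the rest of the fleet. This is a sketch under assumptions: the function name and the PLAYBOOK_CMD override are hypothetical, and it presumes inventory groups named batch_1, batch_2, and so on exist (only batch_1 appears in this document).

```shell
#!/usr/bin/env bash
# Staged rollout sketch: run maintenance batch by batch, halting on failure.
# PLAYBOOK_CMD is an illustrative override, for example `echo` for a rehearsal.
PLAYBOOK_CMD="${PLAYBOOK_CMD:-ansible-playbook}"

# Arguments: inventory group names to process in order.
staged_rollout() {
    local batch
    for batch in "$@"; do
        echo "=== Starting $batch ==="
        if ! "$PLAYBOOK_CMD" -i inventory/hosts.yml playbooks/maintenance.yml \
                --limit "$batch" --extra-vars "batch_mode=true"; then
            echo "Batch $batch failed; halting rollout" >&2
            return 1
        fi
    done
    echo "All batches completed"
}
```

For example, `staged_rollout batch_1 batch_2 batch_3` processes the groups in order, and setting `PLAYBOOK_CMD=echo` beforehand prints the commands instead of running them.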
A: Yes, the script supports webhook notifications and structured JSON output that can be consumed by monitoring systems like Prometheus, Grafana, or custom dashboards.
For issues, questions, or contributions:
- Check the troubleshooting section above
- Review the logs in the logs/ directory
- Create an issue with detailed error information and configuration details
This project is licensed under the MIT License - see the LICENSE file for details.