Commit 29bbb43

doc: full host management
1 parent b712bb7 commit 29bbb43

1 file changed, +382 -0 lines changed

docs/roles/full_host_management.md

Lines changed: 382 additions & 0 deletions
@@ -1,3 +1,385 @@
## Full Management Host.

# Full Host Management Configuration

## Overview

Full host management in LinuxAid provides comprehensive control over Linux servers through Puppet. This document outlines which components are managed in **noop mode** (simulation only) and which in **no-noop mode** (active changes applied).

## Configuration Matrix

### Management Status Table

| Component | Parameter | Enabled | Mode | What Gets Managed |
|-----------|-----------|---------|------|-------------------|
| **Repository Management** | `common::repo::manage` | ✅ Yes | No-noop | YUM/DNF/APT repositories, GPG keys, package sources, repository priorities |
| **Logging** | `common::logging::manage` | ✅ Yes | No-noop | Rsyslog/syslog forwarding, log rotation, journald settings, centralized logging |
| **Backup** | `common::backup::manage` | ✅ Yes | No-noop | Backup schedules, retention policies, backup scripts, storage locations |
| **Cron Jobs** | `common::cron::purge_unmanaged` | ⚙️ root-only | No-noop | Root user cron jobs (unmanaged jobs will be purged) |
| **Virtualization** | `common::virtualization::manage` | ✅ Yes | No-noop | KVM/QEMU settings, VMware Tools, VirtIO drivers, guest tools |
| **Network** | `common::network::manage` | ✅ Yes | No-noop | Network interfaces, routing tables, firewall rules, DNS settings |
| **Services** | `common::services::manage` | ✅ Yes | No-noop | System services (start/stop/enable), service dependencies, init scripts |
| **Storage** | `common::storage::manage` | ❌ No | Disabled | File systems, LVM, disk partitioning, mount points |
| **System** | `common::system::manage` | ✅ Yes | No-noop | Hostname, timezone, kernel parameters, system packages, OS settings |
| **Security** | `common::security::manage` | ✅ Yes | No-noop | Firewall, SELinux/AppArmor, SSH configuration, sudo rules, user accounts |
| **Monitoring** | `common::monitoring::manage` | ✅ Yes | No-noop | Monitoring agents, health checks, metrics collection, alerting |
| **Extra Features** | `common::extras::manage` | ❌ No | Disabled | Additional optional features and integrations |
| **Mail** | `common::mail::manage` | ✅ Yes | No-noop | Mail transfer agent (MTA), relay configuration, mail routing |

## Mode Definitions

### 🟢 No-noop Mode (Active Management)
When a component is enabled, Puppet **actively applies** all configuration changes to the host. Changes are enforced on each Puppet run, bringing the system into compliance with the desired state.

### 🔵 Noop Mode (Simulation)
Puppet only **simulates** changes and reports what would be changed without actually applying them. This is useful for testing and validation.

### 🔴 Disabled
The component is not managed by Puppet at all. Manual configuration or other tools must be used.

## Detailed Component Breakdown

### 1. Repository Management
**Status:** ✅ Enabled (No-noop)

| Aspect | Details |
|--------|---------|
| **What's Managed** | Package repositories, GPG keys, mirror configurations |
| **Impact** | Controls where packages are installed from |
| **Risk Level** | Medium - can affect package availability |
| **Rollback** | Can revert repository configs via Puppet |

**Manages:**
- YUM/DNF repositories (RHEL/CentOS/Fedora)
- Zypper repositories (SLES)
- APT repositories (Debian/Ubuntu)
- Repository priorities and exclusions
- GPG key imports and validation

---

### 2. Logging Configuration
**Status:** ✅ Enabled (No-noop)

| Aspect | Details |
|--------|---------|
| **What's Managed** | Syslog, rsyslog, journald, log rotation |
| **Impact** | Controls log collection and forwarding |
| **Risk Level** | Low - doesn't affect application functionality |
| **Rollback** | Easy via Puppet configuration changes |

**Manages:**
- Centralized logging destinations
- Log retention and rotation policies
- Log format and filtering rules
- Remote syslog forwarding

---

### 3. Backup Management
**Status:** ✅ Enabled (No-noop)

| Aspect | Details |
|--------|---------|
| **What's Managed** | Backup jobs, schedules, retention |
| **Impact** | Ensures data protection compliance |
| **Risk Level** | Low - backup failures don't affect running production workloads |
| **Rollback** | Can adjust schedules and policies |

**Manages:**
- Backup tool installation and configuration
- Backup schedules (cron jobs)
- Retention policies
- Backup destination configuration

---

### 4. Cron Job Management
**Status:** ⚙️ Root-only purge (No-noop)

| Aspect | Details |
|--------|---------|
| **What's Managed** | Root user's crontab |
| **Impact** | Removes unauthorized scheduled tasks |
| **Risk Level** | Medium - can remove manually added cron jobs |
| **Rollback** | Re-add the jobs via Puppet or restore the crontab from backup |

**Behavior:**
- Purges unmanaged cron jobs for the root user only
- Other users' cron jobs are left untouched
- Ensures only Puppet-managed tasks run as root
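
Because the purge removes every root cron job that Puppet does not know about, it can help to save the current root crontab before enabling `common::cron::purge_unmanaged`. The following is a minimal sketch, assuming root privileges and a standard `crontab` command; the function name `backup_root_crontab`, the `CRONTAB_CMD` override, and the backup directory are illustrative and not part of LinuxAid. Calling `backup_root_crontab /root/cron-backups` prints the path of the saved file, which can then be reviewed when moving jobs into the Puppet configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of LinuxAid): save root's crontab to a
# timestamped file so purged jobs can be reviewed or restored later.
# CRONTAB_CMD is an illustrative override so the function can be exercised
# without touching the real crontab.

backup_root_crontab() {
  local dest_dir="$1"
  local dest
  dest="${dest_dir}/root-crontab.$(date +%Y%m%d%H%M%S).bak"

  mkdir -p "$dest_dir" || return 1

  # `crontab -u root -l` prints root's crontab and fails if there is none.
  if "${CRONTAB_CMD:-crontab}" -u root -l > "$dest" 2>/dev/null; then
    echo "$dest"
  else
    rm -f "$dest"
    echo "no root crontab found, nothing backed up" >&2
    return 1
  fi
}
```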

---

### 5. Virtualization Settings
**Status:** ✅ Enabled (No-noop)

| Aspect | Details |
|--------|---------|
| **What's Managed** | Hypervisor tools, guest agents |
| **Impact** | Optimizes VM performance and integration |
| **Risk Level** | Low - improves VM functionality |
| **Rollback** | Can remove or update tools |

**Manages:**
- VMware Tools / open-vm-tools
- VirtIO drivers
- QEMU guest agent
- Hypervisor-specific optimizations

---
130+
131+
### 6. Network Configuration
132+
**Status:** ✅ Enabled (No-noop)
133+
134+
| Aspect | Details |
135+
|--------|---------|
136+
| **What's Managed** | Network interfaces, routing, firewall |
137+
| **Impact** | Controls network connectivity |
138+
| **Risk Level** | **High** - can cause network outages |
139+
| **Rollback** | May require console access if misconfigured |
140+
141+
**Manages:**
142+
- Network interface configuration (IP, gateway, DNS)
143+
- Static routes
144+
- Firewall rules (iptables/firewalld/nftables)
145+
- Network bonding and VLANs
146+
147+
**⚠️ WARNING:** Network changes can cause loss of connectivity. Test thoroughly before production deployment.
148+
149+
---

### 7. Service Management
**Status:** ✅ Enabled (No-noop)

| Aspect | Details |
|--------|---------|
| **What's Managed** | System services (systemd/init) |
| **Impact** | Controls which services run |
| **Risk Level** | Medium-High - can stop critical services |
| **Rollback** | Can restart services via Puppet |

**Manages:**
- Service enable/disable state
- Service start/stop/restart
- Service dependencies
- Init scripts and systemd units

---

### 8. Storage Management
**Status:** ❌ Disabled

| Aspect | Details |
|--------|---------|
| **What's Managed** | Nothing - disabled |
| **Impact** | No automated storage management |
| **Risk Level** | N/A |
| **Manual Required** | Yes - manage manually or via other tools |

**NOT Managed:**
- Disk partitioning
- LVM configuration
- File system creation
- Mount points
- RAID configuration

**Managed:**
- ZFS scrub
- NFS mount
- Samba setup
- Filesystem quota setup

**Reason for Disabling:** Storage changes are high-risk and typically require manual intervention.

---

### 9. System Configuration
**Status:** ✅ Enabled (No-noop)

| Aspect | Details |
|--------|---------|
| **What's Managed** | Hostname, timezone, kernel parameters |
| **Impact** | Core system settings |
| **Risk Level** | Medium - some changes require reboot |
| **Rollback** | Can revert via Puppet |

**Manages:**
- Hostname and domain name
- Timezone configuration
- Kernel parameters (sysctl)
- System packages
- OS-level settings

---

### 10. Security Settings
**Status:** ✅ Enabled (No-noop)

| Aspect | Details |
|--------|---------|
| **What's Managed** | Firewall, SELinux/AppArmor, SSH, sudo, users |
| **Impact** | Controls system access and security |
| **Risk Level** | **High** - can lock out users |
| **Rollback** | May require console access if misconfigured |

**Manages:**
- Firewall rules and policies
- SELinux/AppArmor policies
- SSH daemon configuration
- Sudo rules and policies
- User and group accounts
- Password policies

**⚠️ WARNING:** Security changes can lock you out. Always keep a backup access method (such as console access) available while testing.
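
One way to reduce the lockout risk is to validate the access-related configuration before closing the current session. The sketch below is an illustration rather than LinuxAid tooling: it relies on the standard `sshd -t` and `visudo -c` syntax checks, is meant to run as root, and the function name `check_access_configs` is hypothetical. It returns a non-zero status if either check fails, and it only catches syntax errors, not policy mistakes such as a rule that denies your own account.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of LinuxAid): run the built-in
# syntax checks for the SSH daemon and sudoers before relying on new settings.

check_access_configs() {
  local failed=0 sshd_bin visudo_bin

  # Resolve the binaries first so they are invoked by their resolved path.
  sshd_bin="$(command -v sshd)"
  visudo_bin="$(command -v visudo)"

  if [ -n "$sshd_bin" ]; then
    # `sshd -t` validates the configuration file and host keys, then exits.
    "$sshd_bin" -t || { echo "sshd configuration check failed" >&2; failed=1; }
  else
    echo "sshd not found, skipping SSH check" >&2
  fi

  if [ -n "$visudo_bin" ]; then
    # `visudo -c` checks sudoers files for syntax errors without editing them.
    "$visudo_bin" -c >/dev/null || { echo "sudoers check failed" >&2; failed=1; }
  else
    echo "visudo not found, skipping sudoers check" >&2
  fi

  return "$failed"
}
```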

---

### 11. Monitoring Configuration
**Status:** ✅ Enabled (No-noop)

| Aspect | Details |
|--------|---------|
| **What's Managed** | Monitoring agents and checks |
| **Impact** | Observability and alerting |
| **Risk Level** | Low - doesn't affect production workloads |
| **Rollback** | Easy via Puppet |

**Manages:**
- Monitoring agent installation (Nagios, Prometheus, etc.)
- Health check configuration
- Metrics collection
- Alert configuration

---

### 12. Extra Features
**Status:** ❌ Disabled

| Aspect | Details |
|--------|---------|
| **What's Managed** | Nothing - disabled |
| **Impact** | No additional features managed |
| **Risk Level** | N/A |
| **Manual Required** | Enable if needed |

**Purpose:** Placeholder for optional integrations and features not required for standard host management.

---

### 13. Mail Configuration
**Status:** ✅ Enabled (No-noop)

| Aspect | Details |
|--------|---------|
| **What's Managed** | Mail transfer agent (MTA) |
| **Impact** | System email delivery |
| **Risk Level** | Low - usually only affects system notifications |
| **Rollback** | Can reconfigure via Puppet |

**Manages:**
- MTA installation (Postfix, Exim, etc.)
- Mail relay configuration
- Mail routing rules
- SMTP authentication

---

## Risk Assessment Summary

### High Risk Components (Require Careful Testing)

| Component | Risk | Why |
|-----------|------|-----|
| **Network** | 🔴 High | Can cause complete loss of connectivity |
| **Security** | 🔴 High | Can lock out administrative access |
| **Services** | 🟡 Medium-High | Can stop critical applications |

### Medium Risk Components

| Component | Risk | Why |
|-----------|------|-----|
| **Cron** | 🟡 Medium | May remove manually added scheduled tasks |
| **System** | 🟡 Medium | Some changes may require reboot |
| **Repository** | 🟡 Medium | Can affect package availability |

### Low Risk Components

| Component | Risk | Why |
|-----------|------|-----|
| **Logging** | 🟢 Low | Doesn't affect application functionality |
| **Backup** | 🟢 Low | Failures don't impact production |
| **Monitoring** | 🟢 Low | Only affects observability |
| **Mail** | 🟢 Low | Only affects system notifications |
| **Virtualization** | 🟢 Low | Improves performance, minimal risk |

---

## Best Practices

### Before Enabling Full Host Management

1. **Test in Development First**
   - Deploy to a test/dev environment
   - Validate all changes in noop mode
   - Monitor for issues

2. **Have a Rollback Plan**
   - Document the current configuration
   - Ensure console/OOB access is available
   - Keep a backup of critical configs

3. **Staged Rollout**
   - Start with low-risk components
   - Enable high-risk components last
   - Monitor each stage before proceeding

4. **Communication**
   - Notify stakeholders of changes
   - Schedule maintenance windows for risky changes
   - Document expected changes
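
The staged rollout in step 3 can be sketched as a small generator for per-stage Hiera data. The parameter names come from this document and the grouping follows its risk assessment summary; the function name `emit_stage`, the choice to leave the cron purge key unset until stage 2, and the assumption that `false` disables any component the way it does for storage and extras are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical generator (not part of LinuxAid): print Hiera data for one
# stage of a three-stage rollout, grouped by the risk levels in this document.
#   stage 1: low-risk components only
#   stage 2: adds medium-risk components
#   stage 3: adds high-risk components
# Assumption: setting a `manage` key to false leaves that component unmanaged.

emit_stage() {
  local stage="$1" medium=false high=false

  case "$stage" in
    1) ;;
    2) medium=true ;;
    3) medium=true; high=true ;;
    *) echo "usage: emit_stage 1|2|3" >&2; return 1 ;;
  esac

  cat <<EOF
# Rollout stage ${stage}
common::logging::manage: true
common::backup::manage: true
common::monitoring::manage: true
common::mail::manage: true
common::virtualization::manage: true
common::repo::manage: ${medium}
common::system::manage: ${medium}
common::services::manage: ${high}
common::network::manage: ${high}
common::security::manage: ${high}
common::storage::manage: false
common::extras::manage: false
EOF

  # The cron purge key is only emitted once medium-risk components are enabled.
  if [ "$medium" = true ]; then
    echo "common::cron::purge_unmanaged: 'root-only'"
  fi
}

emit_stage 1
```

Redirecting the output of `emit_stage 2` or `emit_stage 3` into the node's Hiera file would then move the host to the next stage, ideally after a noop run has confirmed the expected changes.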

## Configuration Example

```yaml
# Full host management with safe defaults
common::repo::manage: true
common::logging::manage: true
common::backup::manage: true
common::cron::purge_unmanaged: 'root-only'
common::virtualization::manage: true
common::network::manage: true # ⚠️ TEST CAREFULLY
common::services::manage: true
common::storage::manage: false # Disabled by default (high risk)
common::system::manage: true
common::security::manage: true # ⚠️ TEST CAREFULLY
common::monitoring::manage: true
common::extras::manage: false # Disabled by default (not needed)
common::mail::manage: true
```

---

## Troubleshooting

### Common Issues

| Issue | Cause | Solution |
|-------|-------|----------|
| Lost network connectivity | Network config error | Use console access to revert changes |
| Locked out of SSH | Security policy too strict | Use console to adjust SSH/firewall rules |
| Services not starting | Service dependency issue | Check Puppet logs and service status |
| Cron jobs disappeared | Purged by Puppet | Add jobs to Puppet configuration |

### Recovery Steps

1. **Access via Console**: Use out-of-band management (iLO, iDRAC, or KVM-over-IP)
2. **Check Puppet Logs**: `journalctl -u run-puppet`
3. **Run Puppet Manually**: `puppet agent -t --noop` to see what would change
4. **Disable Puppet Agent**: `puppet agent --disable` to prevent further changes
5. **Revert Configuration**: Update Hiera and re-run Puppet
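
Once console access is available, steps 2 to 5 might be scripted roughly as follows. This is a sketch, not an official procedure: the `run` wrapper, the `DRY_RUN` variable, and both function names are hypothetical, and by default the script only prints the commands (those listed above, plus `puppet agent --enable`) instead of executing them. The noop run comes before the disable step because a disabled agent skips all runs, and the agent has to be re-enabled before the reverted Hiera data can be applied, so `resume_after_revert` is meant to be called only after the Hiera change is in place.

```shell
#!/usr/bin/env bash
# Hypothetical recovery walk-through (not part of LinuxAid).
# With DRY_RUN=1 (the default) commands are printed, not executed.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

inspect_and_pause() {
  # Step 2: inspect recent Puppet runs (unit name as given in this document).
  run journalctl -u run-puppet --since "1 hour ago" --no-pager

  # Step 3: simulate a run to see what Puppet would change.
  run puppet agent -t --noop

  # Step 4: stop further runs while the Hiera data is being fixed.
  run puppet agent --disable "recovering from bad configuration"
}

resume_after_revert() {
  # Step 5: with Hiera reverted, re-enable the agent and apply the catalog.
  run puppet agent --enable
  run puppet agent -t
}

inspect_and_pause
```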

---

## Summary

Full host management provides comprehensive control but requires careful planning. Start with low-risk components, test thoroughly, and always maintain a rollback plan. The configuration matrix above helps you understand what will be actively managed when full host management is enabled.
