Showing with 224 additions and 3 deletions.
  1. +9 −0 CHANGELOG.md
  2. +95 −0 README.md
  3. +37 −0 REFERENCE.md
  4. +2 −2 lib/facter/pe_status_check.rb
  5. +1 −1 metadata.json
  6. +80 −0 plans/agent_state_summary.pp
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,15 @@

All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org).

## [v4.4.0](https://github.com/puppetlabs/puppetlabs-pe_status_check/tree/v4.4.0) (2024-09-19)

[Full Changelog](https://github.com/puppetlabs/puppetlabs-pe_status_check/compare/v4.3.0...v4.4.0)

### Added

- \(SUP-4955\) Wildcard Operator Added for file license ingestion [\#233](https://github.com/puppetlabs/puppetlabs-pe_status_check/pull/233) ([Aaronoftheages](https://github.com/Aaronoftheages))
- Add plan for Puppet agent state summary [\#226](https://github.com/puppetlabs/puppetlabs-pe_status_check/pull/226) ([bastelfreak](https://github.com/bastelfreak))

## [v4.3.0](https://github.com/puppetlabs/puppetlabs-pe_status_check/tree/v4.3.0) (2024-07-01)

[Full Changelog](https://github.com/puppetlabs/puppetlabs-pe_status_check/compare/v4.2.0...v4.3.0)
95 changes: 95 additions & 0 deletions README.md
@@ -157,6 +157,101 @@ environment. You can plot it in a more human-readable way with the
[puppet/format](https://github.com/voxpupuli/puppet-format?tab=readme-ov-file#puppet-format)
modules.


The plan `pe_status_check::agent_state_summary` returns a hash of all nodes, grouped by state:

```json
{
"noop": [ ],
"corrective_changes": [ ],
"used_cached_catalog": [ ],
"failed": [ ],
"changed": [ "student2.local" ],
"unresponsive": [ "student3.local", "student4.local", "student1.local", "login.local" ],
"responsive": [ "pe.bastelfreak.local"],
"unhealthy": [ "student2.local", "student3.local", "student4.local", "student1.local", "login.local" ],
"unhealthy_counter": 5,
"healthy": [ "pe.bastelfreak.local" ],
"healthy_counter": 1,
"total_counter": 6
}
```

* `noop`: The last catalog was applied in noop mode
* `failed`: The last catalog couldn't be compiled, or applying it raised an error
* `changed`: The node reported a change
* `unresponsive`: The last report is older than 30 minutes (configurable via the `runinterval` parameter)
* `corrective_changes`: The node reported corrective changes
* `used_cached_catalog`: The node didn't apply a new catalog but used a cached version
* `unhealthy`: List of nodes that fall into any of the above categories
* `responsive`: The last report isn't older than 30 minutes (configurable via the `runinterval` parameter), regardless of whether the report is healthy
* `healthy`: All nodes minus the unhealthy ones
* `unhealthy_counter`: Number of unhealthy nodes
* `healthy_counter`: Number of healthy nodes
* `total_counter`: Number of all nodes in PuppetDB

This plan is intended to be run before major upgrades, to verify that your agents are in a healthy state.
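
For example, a wrapper plan can use the returned counters to gate an upgrade. The following is a minimal sketch: the wrapper plan name is hypothetical, while `run_plan`, `fail_plan` and `out::message` are standard Bolt plan functions.

```puppet
# Hypothetical pre-upgrade gate: abort when any agent is unhealthy.
plan mymodule::pre_upgrade_check () {
  $summary = run_plan('pe_status_check::agent_state_summary')
  if $summary['unhealthy_counter'] > 0 {
    fail_plan("${summary['unhealthy_counter']} of ${summary['total_counter']} agents are unhealthy, aborting")
  }
  out::message("All ${summary['healthy_counter']} agents are healthy, continuing")
}
```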

To turn this into a table:

```puppet
$result = run_plan('pe_status_check::agent_state_summary')
$table = format::table(
  {
    title => 'Puppet Agent states',
    head  => ['status check', 'Nodes'],
    rows  => $result.map |$key, $data| { [$key, [$data].flatten.join(', ')] },
  }
)
out::message($table)
```

Example output:

```
+------------------------------------------------+
| Puppet Agent states |
+---------------------+--------------------------+
| status check | Nodes |
+---------------------+--------------------------+
| noop | |
| corrective_changes | |
| used_cached_catalog | |
| failed | |
| changed | |
| unresponsive | |
| responsive | puppet.bastelfreak.local |
| unhealthy | |
| unhealthy_counter | 0 |
| healthy | puppet.bastelfreak.local |
| healthy_counter | 1 |
| total_counter | 1 |
+---------------------+--------------------------+
```

The plan has three parameters, two of which control which node lists are returned. By default, it logs all unhealthy nodes; you can disable this by setting `log_unhealthy_nodes` to `false`, which returns only the counters:

```json
{
"total_counter": 1,
"healthy_counter": 1,
"unhealthy_counter": 0
}
```

You can also enable the logging of healthy nodes by setting `log_healthy_nodes` to `true`. In combination with `log_unhealthy_nodes` set to `false`, you get:

```json
{
"healthy": [
"puppet.bastelfreak.local"
],
"total_counter": 1,
"healthy_counter": 1,
"unhealthy_counter": 0
}
```
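
When calling the plan from another plan, the parameters can be passed as a hash of plan arguments. A minimal sketch, using the standard `run_plan` signature:

```puppet
# Return only the healthy node list plus the counters.
$summary = run_plan('pe_status_check::agent_state_summary', {
  'log_healthy_nodes'   => true,
  'log_unhealthy_nodes' => false,
})
out::message($summary['healthy'].join(', '))
```

When running the plan directly with `puppet plan run` or `bolt plan run`, the same parameters can typically be supplied as `key=value` arguments on the command line.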

### Using a Puppet Query to report status.

As the pe_status_check module uses Puppet's existing fact behavior to gather the status data from each of the agents, it is possible to use PQL (puppet query language) to gather this information.
37 changes: 37 additions & 0 deletions REFERENCE.md
@@ -11,6 +11,7 @@

### Plans

* [`pe_status_check::agent_state_summary`](#pe_status_check--agent_state_summary): provides an overview of all Puppet agents and their error states
* [`pe_status_check::agent_summary`](#pe_status_check--agent_summary): Summary report of the state of agent_status_check on each node
Uses the facts task to get the current status from each node
and produces a summary report in JSON
@@ -84,6 +85,42 @@ Default value: `true`

## Plans

### <a name="pe_status_check--agent_state_summary"></a>`pe_status_check::agent_state_summary`

provides an overview of all Puppet agents and their error states

#### Parameters

The following parameters are available in the `pe_status_check::agent_state_summary` plan:

* [`runinterval`](#-pe_status_check--agent_state_summary--runinterval)
* [`log_healthy_nodes`](#-pe_status_check--agent_state_summary--log_healthy_nodes)
* [`log_unhealthy_nodes`](#-pe_status_check--agent_state_summary--log_unhealthy_nodes)

##### <a name="-pe_status_check--agent_state_summary--runinterval"></a>`runinterval`

Data type: `Integer[0]`

the runinterval for the Puppet agent, in minutes. Nodes whose latest report is older than this interval are considered unresponsive

Default value: `30`

##### <a name="-pe_status_check--agent_state_summary--log_healthy_nodes"></a>`log_healthy_nodes`

Data type: `Boolean`

optionally return the list of healthy nodes, not only the unhealthy ones

Default value: `false`

##### <a name="-pe_status_check--agent_state_summary--log_unhealthy_nodes"></a>`log_unhealthy_nodes`

Data type: `Boolean`

set to false to hide the per-category lists of unhealthy nodes

Default value: `true`

### <a name="pe_status_check--agent_summary"></a>`pe_status_check::agent_summary`

Summary report of the state of agent_status_check on each node
4 changes: 2 additions & 2 deletions lib/facter/pe_status_check.rb
@@ -238,12 +238,12 @@
next unless ['primary'].include?(Facter.value('pe_status_check_role'))

# Check for suite license file
suite_license_file = '/etc/puppetlabs/suite-license.lic'
suite_license_pattern = '/etc/puppetlabs/*.lic'

# Check for license key file
license_file = '/etc/puppetlabs/license.key'

if File.exist?(suite_license_file)
if !Dir.glob(suite_license_pattern).empty?
# Presence of any .lic file satisfies check
validity = true
elsif !validity && File.exist?(license_file)
2 changes: 1 addition & 1 deletion metadata.json
@@ -1,6 +1,6 @@
{
"name": "puppetlabs-pe_status_check",
"version": "4.3.0",
"version": "4.4.0",
"author": "Marty Ewings",
"summary": "A Puppet Enterprise Module to Promote Preventative Maintenance and Self Service",
"license": "Apache-2.0",
80 changes: 80 additions & 0 deletions plans/agent_state_summary.pp
@@ -0,0 +1,80 @@
#
# @summary provides an overview of all Puppet agents and their error states
#
# @param runinterval the runinterval for the Puppet agent, in minutes. Nodes whose latest report is older than this interval are considered unresponsive
# @param log_healthy_nodes optionally return the list of healthy nodes, not only the unhealthy ones
# @param log_unhealthy_nodes set to false to hide the per-category lists of unhealthy nodes
#
# @author Tim Meusel <tim@bastelfreak.de>
#
plan pe_status_check::agent_state_summary (
  Integer[0] $runinterval = 30,
  Boolean $log_healthy_nodes = false,
  Boolean $log_unhealthy_nodes = true,
) {
  # a list of all nodes and their latest catalog state
  $nodes = puppetdb_query('nodes[certname,latest_report_noop,latest_report_corrective_change,cached_catalog_status,latest_report_status,report_timestamp]{}')
  $fqdns = $nodes.map |$node| { $node['certname'] }

  # check if the last report is older than the runinterval
  $current_timestamp = Integer(Timestamp().strftime('%s'))
  $runinterval_seconds = $runinterval * 60
  $unresponsive = $nodes.map |$node| {
    $old_timestamp = Integer(Timestamp($node['report_timestamp']).strftime('%s'))
    if ($current_timestamp - $old_timestamp) >= $runinterval_seconds {
      $node['certname']
    }
  }.filter |$node| { $node =~ NotUndef }

  # all nodes that delivered a report in time
  $responsive = $fqdns - $unresponsive

  # all nodes that used noop for the last catalog
  $noop = $nodes.map |$node| { if ($node['latest_report_noop'] == true) { $node['certname'] } }.filter |$node| { $node =~ NotUndef }

  # all nodes that reported corrective changes
  $corrective_changes = $nodes.map |$node| { if ($node['latest_report_corrective_change'] == true) { $node['certname'] } }.filter |$node| { $node =~ NotUndef }

  # all nodes that used a cached catalog on the last run
  $used_cached_catalog = $nodes.map |$node| { if ($node['cached_catalog_status'] != 'not_used') { $node['certname'] } }.filter |$node| { $node =~ NotUndef }

  # all nodes with failed resources in the last report
  $failed = $nodes.map |$node| { if ($node['latest_report_status'] == 'failed') { $node['certname'] } }.filter |$node| { $node =~ NotUndef }

  # all nodes with changes in the last report
  $changed = $nodes.map |$node| { if ($node['latest_report_status'] == 'changed') { $node['certname'] } }.filter |$node| { $node =~ NotUndef }

  # all nodes that aren't healthy in any form
  $unhealthy = [$noop, $corrective_changes, $used_cached_catalog, $failed, $changed, $unresponsive].flatten.unique

  # all healthy nodes
  $healthy = $fqdns - $unhealthy

  $data = if $log_unhealthy_nodes {
    {
      'noop'                => $noop,
      'corrective_changes'  => $corrective_changes,
      'used_cached_catalog' => $used_cached_catalog,
      'failed'              => $failed,
      'changed'             => $changed,
      'unresponsive'        => $unresponsive,
      'responsive'          => $responsive,
      'unhealthy'           => $unhealthy,
      'unhealthy_counter'   => $unhealthy.count,
      'healthy_counter'     => $healthy.count,
      'total_counter'       => $fqdns.count,
    }
  } else {
    {
      'unhealthy_counter' => $unhealthy.count,
      'healthy_counter'   => $healthy.count,
      'total_counter'     => $fqdns.count,
    }
  }

  return if $log_healthy_nodes {
    $data + { 'healthy' => $healthy }
  } else {
    $data
  }
}