(SUP-4625) Add check for excessive JRubies #209

Merged
merged 2 commits into from Nov 12, 2023
7 changes: 4 additions & 3 deletions README.md
@@ -279,9 +279,10 @@ Refer below for next steps when any indicator reports a `false`.
| S0039 | Determines if Puppet Server has reached its `queue-limit-hit-rate`, and is sending messages to agents. | [Check the max-queued-requests article for more information.](https://support.puppet.com/hc/en-us/articles/115003769433) | If the article does not resolve your issue, open a Support ticket referencing S0039, indicating the investigation so far, and any issues you encountered, then provide the [support script](https://puppet.com/docs/pe/latest/getting_support_for_pe.html#pe_support_script) output from the primary server. |
| S0040 | Determines if PE is collecting system metrics. | If system metrics are not collected by default, the sysstat package is not installed on the impacted PE infrastructure component. Install the package and set the parameter `puppet_enterprise::enable_system_metrics_collection` to true. [See the documentation.](https://puppet.com/docs/pe/latest/getting_support_for_pe.html#puppet_metrics_collector) | If, after system metrics are configured, you do not see any files in `/var/log/sa`, or if the `/var/log/sa` directory does not exist, open a Support ticket. |
| S0041 | Determines if the pxp broker on a compiler has an established connection to another pxp broker | To resolve a connection issue from a compiler to a pcp broker, examine the log `/var/log/puppetlabs/puppetserver/pcp-broker.log` for an explanation. Compilers should be attempting to connect to port 8143 on the primary server. SSL cannot be terminated on a network appliance and must pass through directly to the primary server. Ensure the connection attempt is not to another compiler in the pool. | If unable to make a connection to a broker, raise a ticket with the support team quoting S0041 and attaching the file `/var/log/puppetlabs/puppetserver/pcp-broker.log` along with the conclusions of your investigation so far. |
| S0042 | Determines if the pxp-agent has an established connection to a pxp broker | Ensure the pxp-agent service is running. Check S0002 can make that determination. If it is running, check `/var/log/puppetlabs/pxp-agent/pxp-agent.log` (on *nix) or `C:/ProgramData/PuppetLabs/pxp-agent/var/log/pxp-agent.log` (on Windows) for connection issues, first ensuring the agent is connecting to the proper endpoint, for example, a compiler and not the primary. This fact can also be used as a target filter for running tasks, ensuring time is not wasted sending instructions to agents not connected to a broker. | If unable to make a connection to a broker, raise a ticket with the support team quoting S0042 and attaching the file `/var/log/puppetlabs/pxp-agent/pxp-agent.log` (on *nix) or `C:/ProgramData/PuppetLabs/pxp-agent/var/log/pxp-agent.log` (on Windows), along with the conclusions of your investigation so far. |
| S0043 | Determines if there are nodes with Puppet agent versions ahead of the primary server | Agent nodes should not be running Puppet agent versions ahead of infrastructure nodes. Instead consider upgrading PE so that PE package management contains the desired Puppet agent version. See the [upgrading PE](https://puppet.com/docs/pe/latest/upgrading_pe.html) and [upgrading agents](https://puppet.com/docs/latest/upgrading_agents.html) documentation for more information. | If you are unable to determine why the indicator is evaluating to `false` or have questions about Puppet agent versions, open a support ticket and reference S0043. |
| S0044 | Determines if Puppet Servers are using the PE classifier for the node data plugin (node terminus) | Due to performance optimizations, it is recommended to use the PE classifier plugin instead of external node classifier (ENC) scripts or applications. See the [node_terminus configuration setting documentation](https://www.puppet.com/docs/puppet/7/configuration.html#node-terminus) for more information. | If you have additional questions about the node_terminus configuration setting, open a support ticket and reference S0044. |
| S0045 | Determines if Puppet Servers are configured with an excessive number of JRubies. | Because each JRuby instance consumes additional memory, having too many can reduce the amount of heap space available to Puppet Server and cause excessive garbage collections. While it is possible to increase the heap along with the number of JRubies, we have observed diminishing returns with more than 12 JRubies and therefore recommend an upper limit of 12. We also recommend allocating between 1 and 2 GB of heap memory for each JRuby. | If you would like to measure the effects of changing JRubies and heap settings, use the [Puppet Operational Dashboards module](https://forge.puppet.com/modules/puppetlabs/puppet_operational_dashboards/readme) to configure a metrics stack and Grafana dashboards for viewing the metrics. If you still have performance issues or further questions, open a support ticket and reference S0045. |

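The S0045 sizing guidance can be illustrated with a little arithmetic. The following is only a sketch: the helper `jruby_heap_range_gb` and the constant `MAX_RECOMMENDED_JRUBIES` are hypothetical names and are not part of the module. The limit of 12 JRubies and the 1 to 2 GB of heap per JRuby are taken from the S0045 row above.

```ruby
# Hypothetical helper, not part of pe_status_check. It applies the S0045
# guidance: at most 12 JRubies, and roughly 1 to 2 GB of heap per JRuby.
MAX_RECOMMENDED_JRUBIES = 12

def jruby_heap_range_gb(num_jrubies)
  unless num_jrubies.is_a?(Integer) && num_jrubies.positive?
    raise ArgumentError, 'num_jrubies must be a positive integer'
  end

  {
    within_limit: num_jrubies <= MAX_RECOMMENDED_JRUBIES,
    min_heap_gb: num_jrubies * 1, # 1 GB per JRuby
    max_heap_gb: num_jrubies * 2, # 2 GB per JRuby
  }
end

puts jruby_heap_range_gb(8)
```

A count above 12 still yields a heap range, but `within_limit` is `false`, which mirrors the condition under which the indicator reports `false`.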
### Fact: agent_status_check

23 changes: 23 additions & 0 deletions lib/facter/pe_status_check.rb
@@ -563,4 +563,27 @@
      { S0044: false }
    end
  end

  chunk(:S0045) do
    next unless ['primary', 'legacy_primary', 'replica', 'pe_compiler', 'legacy_compiler'].include?(Facter.value('pe_status_check_role'))
    begin
      # Query the JRuby metrics from the Puppet Server status endpoint
      response = PEStatusCheck.http_get('/status/v1/services/jruby-metrics?level=debug', 8140)

      if response
        num_jrubies = response.dig('status', 'experimental', 'metrics', 'num-jrubies')

        # Report false when the metric is missing; otherwise pass only at or below 12 JRubies
        if num_jrubies.nil?
          { S0045: false }
        else
          { S0045: num_jrubies <= 12 }
        end
      else
        { S0045: false }
      end
    rescue StandardError => e
      Facter.warn("Error in fact 'pe_status_check.S0045': #{e.message}")
      Facter.debug(e.backtrace)
      { S0045: false }
    end
  end
end
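The decision made by the S0045 chunk can be exercised outside Facter with a small sketch. The method name `s0045_ok?`, the constant `JRUBY_LIMIT`, and the sample payload are assumptions for illustration; only the `dig` path and the limit of 12 come from the diff above. A real status response contains many more keys than this sample.

```ruby
require 'json'

# Hypothetical standalone version of the S0045 decision, for illustration only.
JRUBY_LIMIT = 12

def s0045_ok?(response)
  # A missing or malformed response counts as a failed check
  return false unless response.is_a?(Hash)

  num_jrubies = response.dig('status', 'experimental', 'metrics', 'num-jrubies')
  # A missing or non-integer metric also counts as a failed check
  return false unless num_jrubies.is_a?(Integer)

  num_jrubies <= JRUBY_LIMIT
end

# Assumed sample payload containing only the keys the check reads
sample = JSON.parse('{"status": {"experimental": {"metrics": {"num-jrubies": 4}}}}')
puts s0045_ok?(sample)
```

Using explicit early returns keeps the missing-metric case from falling through to the comparison, so the check does not rely on the `rescue` clause to produce `false`.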
28 changes: 27 additions & 1 deletion spec/acceptance/pe_status_check_spec.rb
@@ -17,8 +17,8 @@
    end
    # Test Confirms all facts are false which is another indicator the class is performing correctly
    describe 'check no pe_status_check fact is false' do
      it 'if idempotent all facts should be true' do
        expect(host_inventory['facter']['pe_status_check'].size).to eq(40)
        expect(host_inventory['facter']['pe_status_check'].size).to eq(41)
        expect(host_inventory['facter']['pe_status_check'].filter { |_k, v| !v }).to be_empty
      end
    end
@@ -37,7 +37,7 @@
        expect(output).to match('S0001 is at fault. The indicator S0001 Determines if Puppet agent Service is running, refer to documentation for required action')
        expect(result.stdout).to match(%r{false})
      end
      it 'if in the exclude list a parameter should not notify' do
        ppp = <<-MANIFEST
        class {'pe_status_check':
          indicator_exclusions => ['S0001','S0019'],
@@ -360,6 +360,32 @@
        expect(result.stdout).to match(%r{false})
        run_shell('puppet config set --section master node_terminus classifier')
      end
      it 'if S0045 conditions for false are met' do
        # Configure 13 JRubies, one more than the recommended limit of 12
        manifest = <<-PUPPETCODE
          pe_hocon_setting { 'jruby-puppet.max-active-instances':
            ensure  => present,
            path    => '/etc/puppetlabs/puppetserver/conf.d/pe-puppet-server.conf',
            setting => 'jruby-puppet.max-active-instances',
            value   => 13,
          }
        PUPPETCODE

        apply_manifest(manifest)
        run_shell('systemctl restart pe-puppetserver')
        result = run_shell('facter -p pe_status_check.S0045')
        expect(result.stdout).to match(%r{false})

        # Set the JRuby count back to 1 so that later tests are not affected
        manifest = <<-PUPPETCODE
          pe_hocon_setting { 'jruby-puppet.max-active-instances':
            ensure  => present,
            path    => '/etc/puppetlabs/puppetserver/conf.d/pe-puppet-server.conf',
            setting => 'jruby-puppet.max-active-instances',
            value   => 1,
          }
        PUPPETCODE
        apply_manifest(manifest)
        run_shell('systemctl restart pe-puppetserver')
      end
    end
  end
end
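The S0045 test builds the same `pe_hocon_setting` manifest twice, differing only in `value`. One possible way to reduce the repetition is a small helper, sketched below. The method name `max_active_instances_manifest` is hypothetical and does not exist in the spec; the resource title, path, and setting are copied from the test above.

```ruby
# Hypothetical spec helper, not part of the PR. It renders the pe_hocon_setting
# manifest from the S0045 test for a given number of JRuby instances.
def max_active_instances_manifest(value)
  <<~PUPPETCODE
    pe_hocon_setting { 'jruby-puppet.max-active-instances':
      ensure  => present,
      path    => '/etc/puppetlabs/puppetserver/conf.d/pe-puppet-server.conf',
      setting => 'jruby-puppet.max-active-instances',
      value   => #{Integer(value)},
    }
  PUPPETCODE
end

puts max_active_instances_manifest(13)
```

With such a helper, the test body could call `apply_manifest(max_active_instances_manifest(13))` before the assertion and `apply_manifest(max_active_instances_manifest(1))` afterwards. `Integer(value)` raises on non-numeric input, so a malformed value fails before anything is applied.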