sensu::client::config keepalives 'change' every run #336

Closed
poolski opened this Issue Mar 25, 2015 · 58 comments

@poolski
Contributor
poolski commented Mar 25, 2015

Every time I run Puppet, the module reports changing my keepalive thresholds, flap detection settings, etc., as shown below. It's not a big deal, but it adds time to Puppet runs and is a bit misleading, given that no change actually takes place.

Notice: /Stage[main]/Sensu::Client::Config/Sensu_client_config[hostname.example.com]/custom: custom changed 'handlers => ["default"]' to 'handlers => ["default"], keepalive => {"high_flap_threshold"=>20, "low_flap_threshold"=>5, "refresh"=>14400, "thresholds"=>{"critical"=>120, "warning"=>90}}'
Notice: /Stage[main]/Sensu::Client::Config/Sensu_client_config[hostname.example.com]/keepalive: keepalive changed 'high_flap_threshold => 20, low_flap_threshold => 5, refresh => 14400, thresholds => {"critical"=>120, "warning"=>90}' to ''
@jlambert121
Collaborator

What version of the module are you using?

@poolski
Contributor
poolski commented Mar 25, 2015

1.5.0

@poolski
Contributor
poolski commented Apr 1, 2015

Any joy with this?

Also, when's the next release of the module out? I'm looking forward to being able to incorporate the fixes from #298 in my environments.

@superseb
Contributor

@poolski Can you supply the manifest that is causing these messages? I'll try to reproduce.

@superseb
Contributor

@poolski And please check if the fix in #313 works for you.

@poolski
Contributor
poolski commented Apr 14, 2015

I will, @superseb

@poolski
Contributor
poolski commented Apr 14, 2015

With the addition of the port check in #343, my Puppet run now also changes the port every time it runs

Notice: /Stage[main]/Sensu::Client::Config/Sensu_client_config[myhost.mynetwork.net]/port: port changed '' to '3030'
Notice: /Stage[main]/Sensu::Client::Config/Sensu_client_config[myhost.mynetwork.net]/custom: custom changed 'handlers => ["default"]' to 'handlers => ["default"], keepalive => {"high_flap_threshold"=>20, "low_flap_threshold"=>5, "refresh"=>14400, "thresholds"=>{"critical"=>120, "warning"=>90}}'
Notice: /Stage[main]/Sensu::Client::Config/Sensu_client_config[myhost.mynetwork.net]/keepalive: keepalive changed 'high_flap_threshold => 20, low_flap_threshold => 5, refresh => 14400, thresholds => {"critical"=>120, "warning"=>90}' to ''

I'm running v1.5.5 of the module to test if it fixes anything.

@superseb
Contributor

@poolski That's quite weird. You are using v1.5.5 at the moment? Did you restart the puppetmaster? (puppetmaster/puppetserver/pe-httpd)

@poolski
Contributor
poolski commented Apr 14, 2015

@superseb yeah I did. It was throwing all sorts of crazy invalid setting errors till I tried that.
We run a puppetmaster with Apache/Passenger.

Also, yes, I'm using v1.5.5.

@superseb
Contributor

Ok, what Puppet version? Are you using multiple environments?

@superseb
Contributor

Oh @poolski, please post your manifest as well so I can try to reproduce.

@poolski
Contributor
poolski commented Apr 14, 2015

We are running

  • Puppet version 3.7.5

And yes, we have multiple envs, managed with r10k. I'm testing it on our "develop" env before rolling it out to the wider world.

@zakuni
zakuni commented Apr 15, 2015

Hi,
It happens to me as well.
keepalive, port, and also redis_reconnect_on_error (which goes from 'true' to 'false') change every time Puppet runs.

I'm using sensu-puppet v1.5.5 now, but it occurred before this version too. (Sorry, I can't tell exactly which version it started with.)

Here is part of my manifest

  class { '::sensu':
    version => '0.17.1-1',
    rabbitmq_ssl_cert_chain => '/etc/sensu/ssl/cert.pem',
    rabbitmq_ssl_private_key => '/etc/sensu/ssl/key.pem',
    rabbitmq_host => $server_host,
    rabbitmq_port => 5671,
    rabbitmq_password => $rabbitmq_pass,
    rabbitmq_vhost => '/sensu',
    purge_config => false,
    server    => $server,
    api       => $server,
    subscriptions => $subscriptions,
    client_custom => {
      'keepalive' => {
        'handlers' => ['default', 'slack']
      }
    },
    use_embedded_ruby => $embedded_ruby,
    sensu_plugin_version => latest,
  }

I've specified 'client_port' and 'redis_reconnect_on_error' explicitly, but that didn't fix it.

@superseb
Contributor

Okay. So:

  • The client_port is fixed after v1.5.5, so you can use master till @jamtur01 releases a new version.
  • I just submitted #345 for fixing redis_reconnect_on_error
  • For using the keepalive, you can use client_keepalive like so:
  class { '::sensu':
    version => '0.17.1-1',
    rabbitmq_ssl_cert_chain => '/etc/sensu/ssl/cert.pem',
    rabbitmq_ssl_private_key => '/etc/sensu/ssl/key.pem',
    rabbitmq_host => $server_host,
    rabbitmq_port => 5671,
    rabbitmq_password => $rabbitmq_pass,
    rabbitmq_vhost => '/sensu',
    purge_config => false,
    server    => $server,
    api       => $server,
    subscriptions => $subscriptions,
    client_keepalive => {
      'handlers' => ['default', 'slack']
    },
    use_embedded_ruby => $embedded_ruby,
    sensu_plugin_version => latest,
  }

Let me know if this solves it for you.

@zakuni
zakuni commented Apr 16, 2015

@superseb
Thank you, that solved it for me!

@superseb
Contributor

Okay cool, @poolski if you could post your manifest I can take a look.

@poolski
Contributor
poolski commented Apr 16, 2015

@superseb, alright, here we go. It's a bit broken up because most of the stuff lives in Hiera and is loaded automatically by class-based params. My r10k Sensu block is applied to all hosts (it's in a 'base' profile) and looks like this:

  # Sensu config
  $subs = hiera_array('sensu::subscriptions')
  $plugs = hiera_array('sensu::plugins')
  $checks = hiera_hash('sensu::checks')
  if $checks { create_resources(sensu::check, $checks) }
  class {'::sensu':
    subscriptions => $subs,
    plugins       => $plugs,
  }

The corresponding data is as follows. Checks have been removed because there are lots of them and they aren't complaining.

{
  "sensu::client": true,
  "sensu::client_custom": {
    "handlers": [
      "default"
    ],
    "keepalive": {
      "high_flap_threshold": "20",
      "low_flap_threshold": "5",
      "refresh": 14400,
      "thresholds": {
        "critical": 120,
        "warning": 90
      }
    }
  },
  "sensu::plugins": [
    "puppet:///modules/sensu_site/plugins/system/check-apt.sh",
    "puppet:///modules/sensu_site/plugins/system/check-cpu.rb",
    "puppet:///modules/sensu_site/plugins/system/check-disk.rb",
    "puppet:///modules/sensu_site/plugins/system/check-load.rb",
    "puppet:///modules/sensu_site/plugins/system/cpu-metrics.rb",
    "puppet:///modules/sensu_site/plugins/processes/check-procs.rb",
    "puppet:///modules/sensu_site/plugins/system/check-ram.rb",
    "puppet:///modules/sensu_site/plugins/sendmail/sendmail-mqueue.rb",
    "puppet:///modules/sensu_site/plugins/enabler/check-pidfile.sh",
    "puppet:///modules/sensu_site/plugins/network/check-ports.sh"
  ],
  "sensu::purge_config": true,
  "sensu::rabbitmq_host": "10.10.10.10",
  "sensu::rabbitmq_password": "passwordz",
  "sensu::rabbitmq_port": 5671,
  "sensu::rabbitmq_ssl": true,
  "sensu::rabbitmq_ssl_cert_chain": "puppet:///modules/sensu_site/ssl/cert.pem",
  "sensu::rabbitmq_ssl_private_key": "puppet:///modules/sensu_site/ssl/key.pem",
  "sensu::rabbitmq_vhost": "/sensu",
  "sensu::rabbitmq_reconnect_on_error": true,
  "sensu::redis_reconnect_on_error": true,
  "sensu::sensu_plugin_version": "present",
  "sensu::subscriptions": [
    "common"
  ],
  "sensu::use_embedded_ruby": true
}
@poolski
Contributor
poolski commented Apr 16, 2015

Additionally, having installed master over the v1.5.5 release, I've now got a consistent

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Invalid parameter reconnect_on_error on Sensu_rabbitmq_config[myserver.mynetwork.net] at /etc/puppet/environments/develop/modules/sensu/manifests/rabbitmq/config.pp:120 on node myserver.mynetwork.net
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
@superseb
Contributor

I'll take a look today. Did you restart the puppetmaster after switching to the master branch?

@poolski
Contributor
poolski commented Apr 16, 2015

I did indeed. I killed Apache and waited for the passenger processes to stop before restarting, too.

@superseb
Contributor

In a clean Vagrant setup with open source Puppet 3.7.5-1 (master and agent) and the following declaration, the properties are not reset on every run. Could you verify, @poolski? The issue you are experiencing looks like the old bug with custom providers and environments, but that should be fixed in 3.7.5 (the same version I tested on).

  class { '::sensu':
    client => true,
    client_custom => {
      'handlers' => [
        'default'
       ],
    },
    client_keepalive => {
      "high_flap_threshold" => "20",
      "low_flap_threshold" => "5",
      "refresh" => 14400,
      "thresholds" => {
        "critical" => 120,
        "warning" => 90
      }
    },
    purge_config => true,
    rabbitmq_host => '10.10.10.10',
    rabbitmq_password => 'password',
    rabbitmq_port => 5671,
    rabbitmq_ssl => true,
    rabbitmq_vhost => '/sensu',
    rabbitmq_reconnect_on_error => true,
    redis_reconnect_on_error => true,
    subscriptions => [ 'common' ],
    use_embedded_ruby => true,
  }
@superseb
Contributor

@poolski Any update on this?

@jsfrerot

Hi,
I applied this patch #345 on v1.5.5 and I still get the Sensu_client_config changing every puppet run.

1st run:
Notice: /Stage[main]/Sensu::Client::Config/Sensu_client_config[my-server]/port: port changed '' to '3030'

2nd run:
Notice: /Stage[main]/Sensu::Client::Config/Sensu_client_config[my-server]/custom: custom changed 'check_load => critical1.66,1.25,1.00warning1.25,1.00,0.80, port => 3030' to 'check_load => critical1.66,1.25,1.00warning1.25,1.00,0.80'

The manifest:

$load_warn=inline_template("<%= sprintf('%.2f',(@processorcount.to_i*1.25)) %>,<%= sprintf('%.2f',(@processorcount.to_i*1)) %>,<%= sprintf('%.2f',(@processorcount.to_i*0.8)) %>")
$load_crit=inline_template("<%= sprintf('%.2f',(@processorcount.to_i*1.66)) %>,<%= sprintf('%.2f',(@processorcount.to_i*1.25)) %>,<%= sprintf('%.2f',(@processorcount.to_i*1)) %>")
class { 'sensu':
    rabbitmq_ssl_private_key => "puppet:///data/sensu/certs/client_key.pem",
    rabbitmq_ssl_cert_chain => "puppet:///data/sensu/certs/client_cert.pem",
    rabbitmq_password => 'xxx',
    rabbitmq_host => 'my-sensu-server',
    rabbitmq_port => 5671,
    rabbitmq_vhost => "/sensu",
    plugins => [
        'puppet:///data/sensu/plugins/system/check-ntp.rb',
        'puppet:///data/sensu/plugins/system/check-disk.rb',
        'puppet:///data/sensu/plugins/system/check-load.rb',
    ],  
    use_embedded_ruby => true,
    sensu_plugin_provider => sensu_gem,
    sensu_plugin_version => 'present',
    install_repo => false,
    client_custom => {
        check_load => {
            warning => $load_warn,
            critical => $load_crit,
        },  
    },  
    subscriptions => ['base']
}   
if $is_virtual == "false" {
    sensu::subscription { 'physical': }
}   
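
The two inline_template calls in the manifest above scale the load thresholds by CPU count. A minimal Python sketch of the same arithmetic (the function name `load_thresholds` is made up for illustration and is not part of the module) shows what the two strings should look like for a given `processorcount`:

```python
def load_thresholds(processorcount):
    """Mirror the manifest's sprintf('%.2f', processorcount * factor) calls.

    Returns (warning, critical) as comma-separated strings for the
    1-, 5- and 15-minute load averages.
    """
    warn = ",".join("%.2f" % (processorcount * f) for f in (1.25, 1, 0.8))
    crit = ",".join("%.2f" % (processorcount * f) for f in (1.66, 1.25, 1))
    return warn, crit


if __name__ == "__main__":
    warning, critical = load_thresholds(1)
    print("warning:", warning)
    print("critical:", critical)
```

For a single-CPU host this produces the same warning and critical strings that appear in the check_load notice above, so the templates themselves are not what changes between runs.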
@jsfrerot
jsfrerot commented Jun 1, 2015

@superseb, since you were the main contact for this issue, would it be possible to look at my last comment and maybe point me where my problem could be?

Thanks.

@superseb
Contributor
superseb commented Jun 2, 2015

@jsfrerot Did you only apply #345? Because I think you need to apply #343. Let me know if this solves it.

@jsfrerot
jsfrerot commented Jun 8, 2015

@superseb I just applied #343 and it works as expected. Thank you.

@poolski
Contributor
poolski commented Jun 10, 2015

@superseb I'm now running v1.5.5 which as I understand it has both #343 and #345 rolled in?

Still experiencing the same issue and something new:
Error 400 on SERVER: Invalid parameter reconnect_on_error on Sensu_rabbitmq_config

Restarting the puppetmaster fixes it, but only for one run, after which the error comes back.

@superseb
Contributor

@poolski No, v1.5.5 doesn't contain those fixes. Let me ping @jamtur01 or @jlambert121 to release a new version.

@poolski
Contributor
poolski commented Jun 10, 2015

Oh, durp!

Thanks @superseb.

@poolski
Contributor
poolski commented Jun 11, 2015

@superseb - so another fun fact. I'm testing out using master to see if the bleeding edge code fixes any of my issues. It doesn't seem to - in fact, it introduces another one!

Notice: /Stage[main]/Sensu::Repo::Apt/Apt::Source[sensu]/Apt::Key[Add key: 8911D8FF37778F24B4E726A218609E3D7580C77F from Apt::Source sensu]/Exec[d36677d9164a673e5a4b8cdd005afa63c1c67926]/returns: executed successfully
Notice: /Stage[main]/Sensu::Client::Config/Sensu_client_config[server.network.net]/custom: custom changed 'handlers => ["default"]' to 'handlers => ["default"], keepalive => {"high_flap_threshold"=>20, "low_flap_threshold"=>5, "refresh"=>14400, "thresholds"=>{"critical"=>120, "warning"=>90}}'
Info: Class[Sensu::Client::Config]: Scheduling refresh of Service[sensu-client]

Now it "adds" the repo key every time Puppet runs. Also, even though the port changes have been fixed, it's still resetting keepalives and flap thresholds, even though nothing's changed

@jcustenborder

I'm seeing similar behavior with 3.7.5.

cat client.json 
{
  "client": {
    "address": "10.10.0.222",
    "name": "sensu-server.elysium.home",
    "subscriptions": [

    ],
    "bind": "127.0.0.1",
    "safe_mode": false,
    "keepalive": {
    }
  }
}
puppet agent --test
...
Notice: /Stage[main]/Sensu::Redis::Config/Sensu_redis_config[sensu-server.elysium.home]/reconnect_on_error: reconnect_on_error changed 'true' to 'false'
Info: Class[Sensu::Redis::Config]: Scheduling refresh of Service[sensu-api]
Info: Class[Sensu::Redis::Config]: Scheduling refresh of Service[sensu-server]
Notice: /Stage[main]/Sensu::Client::Config/Sensu_client_config[sensu-server.elysium.home]/port: port changed '' to '3030'
Info: Class[Sensu::Client::Config]: Scheduling refresh of Service[sensu-client]
Notice: /Stage[main]/Sensu::Client::Service/Service[sensu-client]: Triggered 'refresh' from 1 events
Notice: /Stage[main]/Sensu::Api::Service/Service[sensu-api]: Triggered 'refresh' from 1 events
Notice: /Stage[main]/Sensu::Server::Service/Service[sensu-server]: Triggered 'refresh' from 1 events
{
  "client": {
    "name": "sensu-server.elysium.home",
    "safe_mode": false,
    "address": "10.10.0.222",
    "port": "3030",
    "subscriptions": [

    ],
    "bind": "127.0.0.1",
    "keepalive": {
    }
  }
}
puppet agent --test
...
Notice: /Stage[main]/Sensu::Redis::Config/Sensu_redis_config[sensu-server.elysium.home]/reconnect_on_error: reconnect_on_error changed 'true' to 'false'
Info: Class[Sensu::Redis::Config]: Scheduling refresh of Service[sensu-server]
Info: Class[Sensu::Redis::Config]: Scheduling refresh of Service[sensu-api]
Notice: /Stage[main]/Sensu::Client::Config/Sensu_client_config[sensu-server.elysium.home]/custom: custom changed 'port => 3030' to ''
Info: Class[Sensu::Client::Config]: Scheduling refresh of Service[sensu-client]
Notice: /Stage[main]/Sensu::Client::Service/Service[sensu-client]: Triggered 'refresh' from 1 events
Notice: /Stage[main]/Sensu::Api::Service/Service[sensu-api]: Triggered 'refresh' from 1 events
Notice: /Stage[main]/Sensu::Server::Service/Service[sensu-server]: Triggered 'refresh' from 1 events
cat client.json 
{
  "client": {
    "subscriptions": [

    ],
    "keepalive": {
    },
    "bind": "127.0.0.1",
    "address": "10.10.0.222",
    "safe_mode": false,
    "name": "sensu-server.elysium.home"
  }
}

It's dropping some of the properties between each run. Notice the missing port between runs. All runs were with the same version.
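
A quick way to confirm which keys disappear between runs is to diff two snapshots of client.json. The sketch below is a hypothetical helper (`changed_keys` is not part of the module) that compares the `client` section of two JSON documents:

```python
import json


def changed_keys(before_json, after_json, section="client"):
    """Compare one top-level section of two JSON documents.

    Returns a dict listing keys that were added, removed, or whose
    value differs between the two snapshots.
    """
    before = json.loads(before_json)[section]
    after = json.loads(after_json)[section]
    shared = set(before) & set(after)
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "modified": sorted(k for k in shared if before[k] != after[k]),
    }


if __name__ == "__main__":
    # Trimmed-down versions of the second and third snapshots above.
    run2 = '{"client": {"name": "sensu-server.elysium.home", "port": "3030", "keepalive": {}}}'
    run3 = '{"client": {"name": "sensu-server.elysium.home", "keepalive": {}}}'
    print(changed_keys(run2, run3))
```

Run against the snapshots above, only `port` shows up as removed; the reordering of the other keys is not a real change.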

I'm using this

include ::sensu
sensu::rabbitmq_password: 'password'
sensu::rabbitmq_host:
  - 'rabbitmq-01'
  - 'rabbitmq-02'
sensu::purge_config: true
sensu::rabbitmq_port: 5672
sensu::server: true
sensu::api: true
sensu::api_user: 'asdf'
sensu::api_password: 'asdf'
@superseb
Contributor

@poolski Did you split your data to client_custom and client_keepalive? I'll see if I can reproduce your apt::key.

@poolski
Contributor
poolski commented Jun 22, 2015

@superseb - here's my data:

  "sensu::client_custom": {
    "handlers": [
      "default"
    ],
    "keepalive": {
      "high_flap_threshold": "20",
      "low_flap_threshold": "5",
      "refresh": 14400,
      "thresholds": {
        "critical": 120,
        "warning": 90
      }
    }
  }
@superseb
Contributor

@poolski Please see the example I posted before, and let me know if this helps.

  class { '::sensu':
    client => true,
    client_custom => {
      'handlers' => [
        'default'
       ],
    },
    client_keepalive => {
      "high_flap_threshold" => "20",
      "low_flap_threshold" => "5",
      "refresh" => 14400,
      "thresholds" => {
        "critical" => 120,
        "warning" => 90
      }
    },
    purge_config => true,
    rabbitmq_host => '10.10.10.10',
    rabbitmq_password => 'password',
    rabbitmq_port => 5671,
    rabbitmq_ssl => true,
    rabbitmq_vhost => '/sensu',
    rabbitmq_reconnect_on_error => true,
    redis_reconnect_on_error => true,
    subscriptions => [ 'common' ],
    use_embedded_ruby => true,
  }
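
For data that lives in Hiera rather than in a class declaration, the same split can be sketched as a small transformation. This is only an illustration: `split_keepalive` is a made-up helper, and the key name `sensu::client_keepalive` assumes the class parameter is picked up through automatic parameter lookup, as the other `sensu::` keys in this thread are.

```python
import copy


def split_keepalive(hiera_data):
    """Return a copy of hiera_data with the keepalive hash moved out of
    sensu::client_custom and into sensu::client_keepalive."""
    data = copy.deepcopy(hiera_data)
    custom = data.get("sensu::client_custom", {})
    keepalive = custom.pop("keepalive", None)
    if keepalive is not None:
        # Keep an explicitly configured client_keepalive if one exists.
        data.setdefault("sensu::client_keepalive", keepalive)
    return data


if __name__ == "__main__":
    old = {
        "sensu::client_custom": {
            "handlers": ["default"],
            "keepalive": {
                "refresh": 14400,
                "thresholds": {"critical": 120, "warning": 90},
            },
        }
    }
    print(split_keepalive(old))
```

The input is left untouched, so the result can be reviewed before it replaces the existing data file.
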
@poolski
Contributor
poolski commented Jun 22, 2015

That's just what I was about to do @superseb :D

@superseb
Contributor

@poolski 👍 😄

@poolski
Contributor
poolski commented Jun 22, 2015

@superseb when I first started using the module, I don't recall there being a dedicated keepalive field - it had to be shoehorned in with client_custom.

@superseb
Contributor

@poolski Correct, was added in 30ddb26 (v1.3.0)

@poolski
Contributor
poolski commented Jun 22, 2015

Okay, progress - now all I have is it trying to change the port on me...
Notice: /Stage[main]/Sensu::Client::Config/Sensu_client_config[host.network.net]/port: port changed '' to '3030'
or
Notice: /Stage[main]/Sensu::Client::Config/Sensu_client_config[host.network.net]/custom: custom changed 'handlers => ["default"], port => 3030' to 'handlers => ["default"]'

The above happen with no discernible pattern.
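
One plausible reading of these two notices, consistent with the logs in this thread but not checked against the provider source, is that `port` is written as its own property and then read back as part of `custom`, so each run undoes the previous one. The toy model below is purely illustrative: `KNOWN_KEYS` and `simulate_run` are invented names, and nothing here is the module's actual code.

```python
# Keys the imaginary provider recognises as first-class properties.
# 'port' is deliberately missing, which is the assumed bug.
KNOWN_KEYS = {"name", "address", "subscriptions", "safe_mode", "bind", "keepalive"}


def simulate_run(conf, port="3030", custom=None):
    """Apply one imaginary Puppet run to a client config dict.

    Returns (new_conf, changed), where changed lists the properties
    that would have produced a 'changed' notice.
    """
    custom = custom or {}
    conf = dict(conf)
    changed = []
    # Everything that is not a known key is reported as 'custom'.
    observed_custom = {k: v for k, v in conf.items() if k not in KNOWN_KEYS}
    if conf.get("port") != port:
        conf["port"] = port
        changed.append("port")
    if observed_custom != custom:
        # Syncing 'custom' drops every unknown key, including 'port'.
        for key in observed_custom:
            conf.pop(key, None)
        conf.update(custom)
        changed.append("custom")
    return conf, changed


if __name__ == "__main__":
    conf = {"name": "host.network.net"}
    for run in range(1, 4):
        conf, changed = simulate_run(conf, custom={"handlers": ["default"]})
        print(run, changed)
```

Under these assumptions the notices simply alternate between `port` and `custom` from one run to the next, which matches the pairs of runs shown earlier in the thread.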

@superseb
Contributor

Awesome. This was fixed in #342, what codebase are you running on now?

@poolski
Contributor
poolski commented Jun 22, 2015

v1.5.5

@superseb
Contributor

Yeah, the fix was merged after the v1.5.5 tag.

@poolski
Contributor
poolski commented Jun 22, 2015

Alright, I'll try master and see what happens

@aamerik
aamerik commented Jun 23, 2015

Can someone suggest to me how to fix "Invalid parameter reconnect_on_error on Sensu_rabbitmq_config"?

I'm using master, client version 3.7.5, puppetserver 1.0.2, and r10k environments. I restarted both puppetservers, which fixes the issue for the first run, and then it goes back to invalid parameter errors.
Thanks.

@aamerik
aamerik commented Jun 23, 2015

Looks like the invalid parameter issue is related to r10k environments. If you want the change to stick you need to update this module in all environments simultaneously and restart the puppetserver.
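
To check whether every environment really carries the same copy of the module, one option is to fingerprint the module's `lib/` directory (where custom types and providers live) in each environment. The sketch below rests on assumptions: the environments root would be something like `/etc/puppet/environments` as seen in the error messages in this thread, the module directory is named `sensu`, and both function names are invented for illustration.

```python
import hashlib
from pathlib import Path


def lib_fingerprints(env_root, module="sensu"):
    """Map environment name -> SHA-256 over the module's lib/ tree.

    Environments that do not contain the module are skipped.
    """
    result = {}
    for env in sorted(Path(env_root).iterdir()):
        lib = env / "modules" / module / "lib"
        if not lib.is_dir():
            continue
        digest = hashlib.sha256()
        for path in sorted(p for p in lib.rglob("*") if p.is_file()):
            # Hash both the relative path and the file contents.
            digest.update(str(path.relative_to(lib)).encode())
            digest.update(path.read_bytes())
        result[env.name] = digest.hexdigest()
    return result


def environments_in_sync(fingerprints):
    """True when every environment has an identical lib/ tree."""
    return len(set(fingerprints.values())) <= 1


if __name__ == "__main__":
    prints = lib_fingerprints("/etc/puppet/environments")  # assumed path
    for name, value in prints.items():
        print(name, value[:12])
    print("in sync:", environments_in_sync(prints))
```

If the fingerprints differ, the environments hold different versions of the types, which is the situation described above.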

@poolski
Contributor
poolski commented Jun 30, 2015

@superseb any idea when the next version will be released?

@superseb
Contributor

@poolski I can't release a new version, let's ping @jamtur01 @jlambert121 again. Does it work as expected now?

@jlambert121
Collaborator

This is one issue that I think needs to get closed out before a release. Is this still an issue with the latest master?

@jcustenborder

I believe master as of a week or so ago fixed the issue for me. It'd be great if someone double checked.


@deepakhj
deepakhj commented Jul 1, 2015

I had the same issue with the port changing on every run. Switched from 1.5.5 to master and now it's fixed.

@jlambert121
Collaborator

I'm going to close this, let us know if this is still an issue.

@poolski
Contributor
poolski commented Jul 15, 2015

@jlambert121 I'm getting the following error if I switch to using master:

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Invalid parameter source on Sensu_check[os-disks] at /etc/puppet/environments/staging/modules/sensu/manifests/check.pp:148 on node mynode.mynetwork.net

What broke between 1.5.5 and master? :(

@superseb
Contributor

Restarted?

#377 seems to contain a fix for source param issues.

@poolski
Contributor
poolski commented Jul 15, 2015

Yeah I restarted All The Things.

According to #377, it's been merged into master and fixes that error, but something's still not right.

@poolski
Contributor
poolski commented Jul 15, 2015

Ok, so, stopping the puppetserver actually seemed to fix it.

Issuing an /etc/init.d/puppetserver restart doesn't seem to flush whatever it's caching as effectively.

Might be worth noting in docs somewhere?

@devshorts

Just chiming in here, any ideas when this will get rolled into an official release?

@jlambert121
Collaborator

@devshorts I keep hoping to get a 2.0 release done real soon. Enterprise support looks like it's pretty much done.

@cintiadr
cintiadr commented Sep 7, 2015

It took me a while to understand that this issue was closed but not yet released.

I will be using commit cf40de6, as I won't be able to get apt 2.0 (needed since #411) any time soon :/

Looking at the commits, I believe that's the first breaking change after 1.5.5.
