sensu custom json reordered on each run #271

Closed
jaxxstorm opened this issue Nov 18, 2014 · 11 comments

@jaxxstorm
Contributor

I can't figure out WHY this is happening.

Puppet 3.6.2
Ruby 1.8.7

When I define a check like this:

sensu::check { $name:
      ensure              => $ensure,
      handlers            => $handlers,
      command             => $command,
      standalone          => true,
      interval            => $interval,
      custom              => merge({
        event_description    => $event_description,
        notification_email   => $notification_email,
        create_ticket        => $create_ticket,
        occurrences          => $occurrences,
        send_email           => $send_email,
        pagerduty_key        => $pagerduty_key,
      }, $sensu_custom)
    }

and then declare the check through the wrapper define:

sensu_moncheck {'standalone-puppet-running':
    command            => '/etc/sensu/plugins/check_puppet_agent -s /var/log/puppet_last_run_summary.yaml -r /var/run/puppet/agent_disabled.lock',
    interval           => 360,
    event_description  => 'There is a problem with the puppet agent on this server.',
    occurrences        => '2',
  }

It results in the JSON being reordered on each run. Note, this is almost the same check as #265, so it's probably the custom definitions, but it doesn't seem to happen for all checks, so I can't narrow down exactly what is causing it.

Here's the diff:

[root@hostname]# diff /etc/sensu/conf.d/checks/standalone-puppet-running.json /tmp/standalone-puppet-running.json
4d3
<       "notification_email": "none",
5a5
>       "create_ticket": false,
7,8d6
<       "event_description": "There is a problem with the puppet agent on this server.",
<       "occurrences": 2,
9a8,9
>       "interval": 360,
>       "event_description": "There is a problem with the puppet agent on this server.",
13,15d12
<       "create_ticket": false,
<       "standalone": true,
<       "interval": 360,
18c15,18
<       ]
---
>       ],
>       "notification_email": "none",
>       "standalone": true,
>       "occurrences": 2

This has completely crippled our sensu install for the time being, so any help would be appreciated.
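One way to confirm that a diff like the one above is a pure reordering, and not a change in content, is to compare the parsed documents instead of the raw text. The following is a minimal Python sketch under that assumption; `same_content` is a hypothetical helper, and the two fragments are illustrative, not the full check file.

```python
import json


def same_content(text_a: str, text_b: str) -> bool:
    """Return True when two JSON documents parse to equal values.

    Dict equality in Python ignores key order, so a pure reordering of
    object keys compares equal. List order still matters.
    """
    return json.loads(text_a) == json.loads(text_b)


# Illustrative fragments only (hypothetical check name "demo").
before = '{"checks": {"demo": {"interval": 360, "standalone": true, "occurrences": 2}}}'
after = '{"checks": {"demo": {"occurrences": 2, "interval": 360, "standalone": true}}}'

print(before == after)              # False: the text differs
print(same_content(before, after))  # True: the content is the same
```

If this kind of comparison reports equal content for the two files, the churn is limited to key order and no setting has actually changed between runs.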

@jaxxstorm
Contributor Author

I narrowed this down to the occurrences parameter.

If that's removed, there's no longer any problem with the custom reordering.

@jlambert121
Contributor

@jaxxstorm thanks for the update - greatly appreciated. I hope to get a larger chunk of time to dive into this in the next few days.

@jaxxstorm
Contributor Author

If it's any help, I'm also getting the same behaviour when using the handlers and keepalive options in the client config.

The sporadic behaviour of this is becoming a real pain. I can't implement a new feature without potentially breaking my entire setup.

jaxxstorm reopened this Nov 26, 2014
@jaxxstorm
Contributor Author

Sorry, didn't mean to close

@jaxxstorm
Contributor Author

So far I have been able to find the following things affected:

keepalive => handler
keepalive => handlers
check => custom

I believe there are more, but that's what I have confirmed so far.

@jamtur01
Contributor

jamtur01 commented Dec 2, 2014

@jaxxstorm So I started to look at this and got very confused - occurrences isn't a custom flag. Try this:

sensu::check { $name:
      ensure              => $ensure,
      handlers            => $handlers,
      command             => $command,
      standalone          => true,
      interval            => $interval,
      occurrences         => $occurrences,
      custom              => merge({
        event_description    => $event_description,
        notification_email   => $notification_email,
        create_ticket        => $create_ticket,
        send_email           => $send_email,
        pagerduty_key        => $pagerduty_key,
      }, $sensu_custom)
    }

That totally fixed the check issue for me.

Can you show me sample code with the keepalive handlers please?

@jaxxstorm
Contributor Author

So as discussed in IRC, I think these might be different problems now that I've narrowed them down. Thanks for spotting my mistake, @jamtur01. My original code looked like this:

# puppet.pp
include ::sensu

# common.yaml
sensu::client_keepalive:
  event_description: "The sensu client has not checked in for the specified time. This might mean the host is down! See <url> for more information"
  thresholds:
    warning: "45"
    critical: "90"
  handler: "irc"

I then changed the handler param to handlers, which requires an array:

sensu::client_keepalive:
  event_description: "The sensu client has not checked in for the specified time. This might mean the host is down! See <url> for more information"
  thresholds:
    warning: "45"
    critical: "90"
  handlers:
    - "irc"
    - "pagerduty"

And this happens:

[root@lbriggs-1 lbriggs]# puppet agent -t --environment=sensupoc
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts in /var/lib/puppet/lib/facter/ipmi.rb
Info: Loading facts in /var/lib/puppet/lib/facter/facter_dot_d.rb
Info: Loading facts in /var/lib/puppet/lib/facter/hardware.rb
Info: Loading facts in /var/lib/puppet/lib/facter/location.rb
Info: Loading facts in /var/lib/puppet/lib/facter/biit.rb
Info: Loading facts in /var/lib/puppet/lib/facter/bios_and_system.rb
Info: Loading facts in /var/lib/puppet/lib/facter/hostnode.rb
Info: Loading facts in /var/lib/puppet/lib/facter/vz_license_expiry.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppet_vardir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/pe_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/staging_windir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/gemhome.rb
Info: Loading facts in /var/lib/puppet/lib/facter/bonding.rb
Info: Loading facts in /var/lib/puppet/lib/facter/concat_basedir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/c2.rb
Info: Loading facts in /var/lib/puppet/lib/facter/hypervisor.rb
Info: Loading facts in /var/lib/puppet/lib/facter/rabbitmq_erlang_cookie.rb
Info: Loading facts in /var/lib/puppet/lib/facter/physicalprocessorcount.rb
Info: Loading facts in /var/lib/puppet/lib/facter/aide.rb
Info: Loading facts in /var/lib/puppet/lib/facter/splunk_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/redis_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/default_gateway.rb
Info: Loading facts in /var/lib/puppet/lib/facter/root_home.rb
Info: Loading facts in /var/lib/puppet/lib/facter/staging_http_get.rb
Info: Loading facts in /var/lib/puppet/lib/facter/vz_encrypted.rb
Info: Caching catalog for lbriggs-1.fqdn
Info: Applying configuration version '1417491586'
Notice: /Stage[main]/Sensu::Client::Config/Sensu_client_config[lbriggs-1.fqdn]/keepalive: keepalive changed 'event_description => The sensu client has not checked in for the specified time. This might mean the host is down! See http://confluence.apptio.lan/display/ops/Keepalive for more information, event_descripton => foo, handlers => pagerdutyirc, thresholds => critical90warning45' to 'event_description => The sensu client has not checked in for the specified time. This might mean the host is down! See <url> for more information, handlers => pagerdutyirc, thresholds => warning45critical90'
Info: Class[Sensu::Client::Config]: Scheduling refresh of Service[sensu-client]
Notice: /Stage[main]/Sensu::Client::Service/Service[sensu-client]: Triggered 'refresh' from 1 events
Notice: Finished catalog run in 17.00 seconds
[root@lbriggs-1 lbriggs]# cp /etc/sensu/conf.d/client.json /tmp/
cp: overwrite `/tmp/client.json'? y
[root@lbriggs-1 lbriggs]# puppet agent -t --environment=sensupoc
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts in /var/lib/puppet/lib/facter/ipmi.rb
Info: Loading facts in /var/lib/puppet/lib/facter/facter_dot_d.rb
Info: Loading facts in /var/lib/puppet/lib/facter/hardware.rb
Info: Loading facts in /var/lib/puppet/lib/facter/location.rb
Info: Loading facts in /var/lib/puppet/lib/facter/biit.rb
Info: Loading facts in /var/lib/puppet/lib/facter/bios_and_system.rb
Info: Loading facts in /var/lib/puppet/lib/facter/hostnode.rb
Info: Loading facts in /var/lib/puppet/lib/facter/vz_license_expiry.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppet_vardir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/pe_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/staging_windir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/gemhome.rb
Info: Loading facts in /var/lib/puppet/lib/facter/bonding.rb
Info: Loading facts in /var/lib/puppet/lib/facter/concat_basedir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/c2.rb
Info: Loading facts in /var/lib/puppet/lib/facter/hypervisor.rb
Info: Loading facts in /var/lib/puppet/lib/facter/rabbitmq_erlang_cookie.rb
Info: Loading facts in /var/lib/puppet/lib/facter/physicalprocessorcount.rb
Info: Loading facts in /var/lib/puppet/lib/facter/aide.rb
Info: Loading facts in /var/lib/puppet/lib/facter/splunk_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/redis_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/default_gateway.rb
Info: Loading facts in /var/lib/puppet/lib/facter/root_home.rb
Info: Loading facts in /var/lib/puppet/lib/facter/staging_http_get.rb
Info: Loading facts in /var/lib/puppet/lib/facter/vz_encrypted.rb
Info: Caching catalog for lbriggs-1.fqdn
Info: Applying configuration version '1417491637'
Notice: /Stage[main]/Sensu::Client::Config/Sensu_client_config[lbriggs-1.fqdn]/keepalive: keepalive changed 'event_description => The sensu client has not checked in for the specified time. This might mean the host is down! See http://confluence.apptio.lan/display/ops/Keepalive for more information, event_descripton => foo, handlers => pagerdutyirc, thresholds => warning45critical90' to 'event_description => The sensu client has not checked in for the specified time. This might mean the host is down! See <url> for more information, handlers => pagerdutyirc, thresholds => warning45critical90'
Info: Class[Sensu::Client::Config]: Scheduling refresh of Service[sensu-client]
Notice: /Stage[main]/Sensu::Client::Service/Service[sensu-client]: Triggered 'refresh' from 1 events
Notice: Finished catalog run in 13.58 seconds
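In the second run above, the "changed ... to ..." notice lists the same keys on both sides except for `event_descripton`, which appears only in the current state. A small Python sketch of how such drift could be spotted by comparing key sets follows; `key_drift` is a hypothetical helper, the dictionaries are reduced to the keys shown in the log, and the long description text is replaced by a placeholder.

```python
def key_drift(current: dict, desired: dict):
    """Return (stale, missing) key lists between two flat mappings.

    stale:   keys present in the current state but not in the desired one
    missing: keys present in the desired state but not in the current one
    """
    stale = sorted(set(current) - set(desired))
    missing = sorted(set(desired) - set(current))
    return stale, missing


# Keys taken from the keepalive notice; values are placeholders where
# the log text is long.
current = {
    "event_description": "(description text)",
    "event_descripton": "foo",
    "handlers": ["pagerduty", "irc"],
    "thresholds": {"warning": 45, "critical": 90},
}
desired = {
    "event_description": "(description text)",
    "handlers": ["pagerduty", "irc"],
    "thresholds": {"warning": 45, "critical": 90},
}

print(key_drift(current, desired))  # (['event_descripton'], [])
```

A stale key that is reported on every run but never removed from the file would be enough to keep the resource from converging, independently of any key ordering.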

If I diff the files:

# diff /tmp/client.json /etc/sensu/conf.d/client.json
2a3,4
>     "bind": "127.0.0.1",
>     "address": "172.16.22.10",
7d8
<     "bind": "127.0.0.1",
14c15
<       "event_description": "The sensu client has not checked in for the specified time. This might mean the host is down! See <url> for more information",
---
>       "event_descripton": "foo",
16,17c17,18
<         "critical": 90,
<         "warning": 45
---
>         "warning": 45,
>         "critical": 90
19,21c20,21
<       "event_descripton": "foo"
<     },
<     "address": "172.16.22.10"
---
>       "event_description": "The sensu client has not checked in for the specified time. This might mean the host is down! See <url> for more information"
>     }

and the raw file looks like

{
  "client": {
    "bind": "127.0.0.1",
    "address": "172.16.22.10",
    "subscriptions": [
      "base"
    ],
    "name": "lbriggs-1.fqdn",
    "safe_mode": false,
    "keepalive": {
      "handlers": [
        "pagerduty",
        "irc"
      ],
      "event_descripton": "foo",
      "thresholds": {
        "warning": 45,
        "critical": 90
      },
      "event_description": "The sensu client has not checked in for the specified time. This might mean the host is down! See <url> for more information"
    }
  }
}
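For comparison, here is a minimal Python sketch of a deterministic serialisation, where keys are sorted before writing so that the same settings always produce byte-identical output. This only illustrates the idea and is not how the module's provider is known to work; `render` is a hypothetical name, and the two dictionaries hold the same keepalive settings built in different orders.

```python
import json


def render(config: dict) -> str:
    """Serialise a config with sorted keys so output is order-independent."""
    return json.dumps(config, sort_keys=True, indent=2) + "\n"


# Same settings, inserted in different orders.
first = {
    "client": {
        "keepalive": {
            "thresholds": {"warning": 45, "critical": 90},
            "handlers": ["irc", "pagerduty"],
        }
    }
}
second = {
    "client": {
        "keepalive": {
            "handlers": ["irc", "pagerduty"],
            "thresholds": {"critical": 90, "warning": 45},
        }
    }
}

print(render(first) == render(second))  # True
```

Note that `sort_keys` only affects object keys. Arrays such as `handlers` keep their order, so a reordered array would still show up as a real difference.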

What I also noticed is that if I revert back to the original, it doesn't remove the handlers array, so you end up with this raw file:

{
  "client": {
    "bind": "127.0.0.1",
    "address": "172.16.22.10",
    "subscriptions": [
      "base"
    ],
    "name": "lbriggs-1.fqdn",
    "safe_mode": false,
    "keepalive": {
      "handlers": [
        "pagerduty",
        "irc"
      ],
     "handler": "irc",
      "event_descripton": "foo",
      "thresholds": {
        "warning": 45,
        "critical": 90
      },
      "event_description": "The sensu client has not checked in for the specified time. This might mean the host is down! See <url> for more information"
    }
  }
}

Which is obviously also a problem.
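The leftover `handlers` entry is what one would expect if new settings are merged into the existing file content instead of replacing it. Whether the module actually does this is an assumption; the Python sketch below only contrasts the two write strategies, with hypothetical function names and the keepalive section reduced to a few keys.

```python
def write_merged(on_disk: dict, desired: dict) -> dict:
    """Overlay desired settings on the existing ones; old keys survive."""
    merged = dict(on_disk)
    merged.update(desired)
    return merged


def write_replaced(on_disk: dict, desired: dict) -> dict:
    """Discard the existing settings and keep only the desired ones."""
    return dict(desired)


on_disk = {
    "handlers": ["pagerduty", "irc"],
    "thresholds": {"warning": 45, "critical": 90},
}
desired = {
    "handler": "irc",
    "thresholds": {"warning": 45, "critical": 90},
}

merged = write_merged(on_disk, desired)
print(sorted(merged))                               # ['handler', 'handlers', 'thresholds']
print(merged == desired)                            # False: never converges
print(write_replaced(on_disk, desired) == desired)  # True
```

With the merging strategy the written state can never equal the desired state while a stale key exists, so a change would be reported on every run until the file is removed by hand.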

NOW, what's interesting here is that if I remove the file completely:

rm /etc/sensu/conf.d/client.json

then rerun puppet, the file gets ordered correctly when it's created, and there are no subsequent changes on rerun.

[root@lbriggs-1 lbriggs]# puppet agent -t --environment=sensupoc
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts in /var/lib/puppet/lib/facter/ipmi.rb
Info: Loading facts in /var/lib/puppet/lib/facter/facter_dot_d.rb
Info: Loading facts in /var/lib/puppet/lib/facter/hardware.rb
Info: Loading facts in /var/lib/puppet/lib/facter/location.rb
Info: Loading facts in /var/lib/puppet/lib/facter/biit.rb
Info: Loading facts in /var/lib/puppet/lib/facter/bios_and_system.rb
Info: Loading facts in /var/lib/puppet/lib/facter/hostnode.rb
Info: Loading facts in /var/lib/puppet/lib/facter/vz_license_expiry.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppet_vardir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/pe_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/staging_windir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/gemhome.rb
Info: Loading facts in /var/lib/puppet/lib/facter/bonding.rb
Info: Loading facts in /var/lib/puppet/lib/facter/concat_basedir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/c2.rb
Info: Loading facts in /var/lib/puppet/lib/facter/hypervisor.rb
Info: Loading facts in /var/lib/puppet/lib/facter/rabbitmq_erlang_cookie.rb
Info: Loading facts in /var/lib/puppet/lib/facter/physicalprocessorcount.rb
Info: Loading facts in /var/lib/puppet/lib/facter/aide.rb
Info: Loading facts in /var/lib/puppet/lib/facter/splunk_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/redis_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/default_gateway.rb
Info: Loading facts in /var/lib/puppet/lib/facter/root_home.rb
Info: Loading facts in /var/lib/puppet/lib/facter/staging_http_get.rb
Info: Loading facts in /var/lib/puppet/lib/facter/vz_encrypted.rb
Info: Caching catalog for lbriggs-1.fqdn
Info: Applying configuration version '1417491814'
Notice: /Stage[main]/Sensu::Client::Config/Sensu_client_config[lbriggs-1.fqdn]/ensure: created
Notice: /Stage[main]/Sensu::Client::Config/File[/etc/sensu/conf.d/client.json]/owner: owner changed 'root' to 'sensu'
Notice: /Stage[main]/Sensu::Client::Config/File[/etc/sensu/conf.d/client.json]/group: group changed 'root' to 'sensu'
Notice: /Stage[main]/Sensu::Client::Config/File[/etc/sensu/conf.d/client.json]/mode: mode changed '0644' to '0444'
Info: Class[Sensu::Client::Config]: Scheduling refresh of Service[sensu-client]
Notice: /Stage[main]/Sensu::Client::Service/Service[sensu-client]: Triggered 'refresh' from 1 events
Notice: Finished catalog run in 12.46 seconds
[root@lbriggs-1 lbriggs]# puppet agent -t --environment=sensupoc
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts in /var/lib/puppet/lib/facter/ipmi.rb
Info: Loading facts in /var/lib/puppet/lib/facter/facter_dot_d.rb
Info: Loading facts in /var/lib/puppet/lib/facter/hardware.rb
Info: Loading facts in /var/lib/puppet/lib/facter/location.rb
Info: Loading facts in /var/lib/puppet/lib/facter/biit.rb
Info: Loading facts in /var/lib/puppet/lib/facter/bios_and_system.rb
Info: Loading facts in /var/lib/puppet/lib/facter/hostnode.rb
Info: Loading facts in /var/lib/puppet/lib/facter/vz_license_expiry.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppet_vardir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/pe_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/staging_windir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/gemhome.rb
Info: Loading facts in /var/lib/puppet/lib/facter/bonding.rb
Info: Loading facts in /var/lib/puppet/lib/facter/concat_basedir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/c2.rb
Info: Loading facts in /var/lib/puppet/lib/facter/hypervisor.rb
Info: Loading facts in /var/lib/puppet/lib/facter/rabbitmq_erlang_cookie.rb
Info: Loading facts in /var/lib/puppet/lib/facter/physicalprocessorcount.rb
Info: Loading facts in /var/lib/puppet/lib/facter/aide.rb
Info: Loading facts in /var/lib/puppet/lib/facter/splunk_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/redis_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/default_gateway.rb
Info: Loading facts in /var/lib/puppet/lib/facter/root_home.rb
Info: Loading facts in /var/lib/puppet/lib/facter/staging_http_get.rb
Info: Loading facts in /var/lib/puppet/lib/facter/vz_encrypted.rb
Info: Caching catalog for lbriggs-1.fqdn
Info: Applying configuration version '1417491882'
Notice: Finished catalog run in 9.87 seconds

@jamtur01
Contributor

jamtur01 commented Dec 3, 2014

What version of the module are you using?

@jaxxstorm
Contributor Author

I was a few commits behind, but I've just done a git pull and I'm up to date.

Problem persists

@jamtur01
Contributor

jamtur01 commented Dec 3, 2014

@jaxxstorm The problem is that it shouldn't persist and I can't replicate it. Ping @johnf - any thoughts?

@jlambert121
Contributor

@jaxxstorm is this still an issue?
