BGP alerting: empty alert, no detail #5207

Closed
mmarchand opened this Issue Dec 21, 2016 · 7 comments


@mmarchand
Contributor

I am using this BGP alert rule:
%bgpPeers.bgpPeerState != "established" && %macros.device_up = "1" && %bgpPeers.bgpPeerAdminStatus != "stop" && %bgpPeers.bgpPeerFsmEstablishedTime >= "300"

It works fine on an ASR 9010, properly alerting on down sessions,
but on my ASR 9922 the alert details are empty (so I keep getting an empty alert) and I don't get any new (worse/better) notifications.
I suspect a size problem somewhere, as these 9922 routers have a lot of sessions (down and up).

For example: https://gyazo.com/d4383fc8a742026481b2340f36a2853b
The first alert should show "details" like the second alert does.

Tell me what details I can provide to help with debugging, thanks.

@laf laf added the Alerting label Dec 27, 2016
@laf
Member
laf commented Dec 27, 2016

SELECT details FROM alert_log WHERE rule_id = Y AND device_id = Z ORDER BY id DESC LIMIT 1

Replace Y with the rule ID and Z with the device ID, then see what that outputs.

@mmarchand
Contributor

It probably needs some function to turn it into text:

+----------+
| details  |
+----------+
| x�      |
+----------+
1 row in set (0,00 sec)
@laf
Member
laf commented Dec 28, 2016

Ha, forgot it was encoded and gzipped!

Create this file /tmp/a.php:

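<?php
// Dump the decoded details of the latest alert_log entry for one rule and device.
// Usage: php /tmp/a.php RULE_ID DEVICE_ID

$init_modules = array();
require '/opt/librenms/includes/init.php';

if ($argc < 3) {
    exit("Usage: php {$argv[0]} RULE_ID DEVICE_ID\n");
}

$rule_id = $argv[1];
$device_id = $argv[2];

// details is stored compressed and JSON-encoded, so decompress before decoding.
$details = dbFetchCell('SELECT details FROM alert_log WHERE rule_id = ? AND device_id = ? ORDER BY id DESC LIMIT 1', array($rule_id, $device_id));
$data = json_decode(gzuncompress($details), true);
print_r($data);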

Then run php /tmp/a.php RULE_ID DEVICE_ID

Replace RULE_ID and DEVICE_ID with your values, then paste the output.
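
For context on why the raw query printed unreadable bytes, here is a minimal sketch of the round trip. It assumes the details column is written as zlib-compressed JSON, which is the inverse of the gzuncompress and json_decode calls in the script above; the write side is not shown in this thread, so treat that as an assumption. The sample payload and variable names are hypothetical.

```php
<?php
// Hypothetical sample payload; real alert details carry many more fields.
$sample = array(
    'rule' => array(
        array('device_id' => 92, 'hostname' => 'example-router'),
    ),
);

// Assumed write side: JSON-encode, then zlib-compress.
$stored = gzcompress(json_encode($sample));

// A zlib stream starts with the byte 0x78, which is the ASCII letter "x".
// That matches the "x" followed by unprintable bytes in the raw SELECT output.
echo bin2hex(substr($stored, 0, 1)), "\n";

// Read side: the same two calls the dump script uses.
$decoded = json_decode(gzuncompress($stored), true);
var_dump($decoded === $sample);
```

If the decoded array comes back with only a few keys, as in the output below, the compressed blob itself was written that way; the decoding step is not what dropped the data.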

@mmarchand
Contributor

thanks, here it is :

php dump-alert.php 26 2776
Array
(
    [interval] => 1482939482
)
@laf
Member
laf commented Dec 28, 2016

I'm confused by that; interval doesn't even seem to be a genuine value that can end up in there :(

Here's mine:

php /tmp/p.php 46 92
Array
(
    [contacts] => Array
        (
            [redacted_email] => NOC
        )

    [rule] => Array
        (
            [0] => Array
                (
                    [device_id] => 92
                    [hostname] => tplink-tl-sg3452
                    [sysName] => t2600g-52ts
                    [ip] =>
                    [community] => tplink-TL-SG3452
                    [authlevel] =>
                    [authname] =>
                    [authpass] =>
                    [authalgo] =>
                    [cryptopass] =>
                    [cryptoalgo] =>
                    [snmpver] => v2c
                    [port] => 1161
                    [transport] => udp
                    [timeout] =>
                    [retries] =>
                    [bgpLocalAs] =>
                    [sysObjectID] => enterprises.11863.5.34
                    [sysDescr] => JetStream 48-Port Gigabit L2 Managed Switch with 4 SFP Slots
                    [sysContact] => www.tp-link.com
                    [version] =>
                    [hardware] =>
                    [features] =>
                    [location] => Dywizji 2c
                    [os] => generic
                    [status] => 1
                    [status_reason] =>
                    [ignore] => 0
                    [disabled] => 0
                    [uptime] => 0
                    [agent_uptime] => 0
                    [last_polled] => 2016-12-26 23:40:47
                    [last_poll_attempted] =>
                    [last_polled_timetaken] => 9.30
                    [last_discovered_timetaken] => 20.95
                    [last_discovered] => 2016-12-26 23:40:23
                    [last_ping] => 2016-12-26 23:40:47
                    [last_ping_timetaken] => 0.03
                    [purpose] =>
                    [type] => server
                    [serial] =>
                    [icon] => generic
                    [poller_group] => 0
                    [override_sysLocation] => 0
                    [notes] =>
                    [port_association_mode] => 1
                )

        )

)

I'd be inclined to say let's delete that alert and see what it re-generates.

@mmarchand
Contributor

Very strange. I deleted the rule, waited a bit, then added it again:

php dump-alert.php 28 2776
Array
(
    [interval] => 1483000562
    [count] => 1
)

The same rule still works on a third ASR; only my 2 big ones give this kind of strange output.
These have a lot of BGP sessions though, so maybe there is some kind of array overflow when filling in the details?
I guess interval and count could come from the alert settings themselves?

@mmarchand mmarchand added a commit to mmarchand/librenms that referenced this issue Dec 29, 2016
@mmarchand mmarchand Fix #5207
Array can contain more than 1 entry
f881b55
@laf
Member
laf commented Dec 30, 2016

Merged, thanks for fixing :)

@laf laf closed this Dec 30, 2016