false "got worse" notification send #15584

Closed
albertsulva opened this issue Nov 16, 2023 · 7 comments

Comments

@albertsulva

The problem

Hi there.

We found that LibreNMS sends a "got worse" notification for a port-down rule even when the port has come back up before the notification is due to be sent.

Rules for device group:

Max: 1
Delay: 900
Interval: 300

ports.ifOperStatus = "down" AND 
ports.ifOperStatus_prev = "up" AND 
macros.device_up = 1 AND 
ports.ifAdminStatus != "down"

And there are two eventlogs from switches:

2023-11-16 02:25:02    alert    switch1    Issued acknowledgment for rule '060 Port DOWN' to transport 'mail'    System
2023-11-15 22:31:02    alert    switch1    Issued got worse for rule '060 Port DOWN' to transport 'mail'    System
2023-11-15 22:31:02    alert    switch1    Issued got worse for rule '060 Port DOWN' to transport 'playsms'    System
2023-11-15 22:20:21    Gi1/0/28    switch1    ifOperStatus: down -> up    System
2023-11-15 22:20:21    Gi1/0/28    switch1    ifDuplex: unknown -> fullDuplex    System
2023-11-15 22:15:36    Gi1/0/28    switch1    ifOperStatus: up -> down    System
2023-11-15 22:15:36    Gi1/0/28    switch1    ifDuplex: fullDuplex -> unknown    System
2023-11-14 09:26:02    alert    switch1    Issued warning alert for rule '060 Port DOWN' to transport 'playsms'    System


2023-11-15 18:56:02    alert    switch2    Issued acknowledgment for rule '060 Port DOWN' to transport 'mail'    System
2023-11-15 18:51:02    alert    switch2    Issued got worse for rule '060 Port DOWN' to transport 'mail'    System
2023-11-15 18:40:45    Gi1/0/17    switch2    ifOperStatus: down -> up    System
2023-11-15 18:40:45    Gi1/0/17    switch2    ifDuplex: unknown -> fullDuplex    System
2023-11-15 18:35:52    Gi1/0/17    switch2    ifOperStatus: up -> down    System
2023-11-15 18:35:52    Gi1/0/17    switch2    ifDuplex: fullDuplex -> unknown    System
2023-11-15 17:46:02    alert    switch2    Issued acknowledgment for rule '060 Port DOWN' to transport 'mail'    System
2023-11-15 16:31:02    alert    switch2    Issued got better for rule '060 Port DOWN' to transport 'mail'    System
2023-11-15 16:15:54    Gi2/0/18    switch2    ifOperStatus: down -> up    System
2023-11-15 16:15:54    Gi2/0/18    switch2    ifDuplex: unknown -> fullDuplex    System
2023-11-14 14:10:02    alert    switch2    Issued acknowledgment for rule '060 Port DOWN' to transport 'mail'    System
2023-11-14 10:20:55    Gi1/0/47    switch2    ifAlias: XX-YYYY -> FREE    System
2023-11-14 10:20:55    Gi1/0/47    switch2    ifAdminStatus: up -> down    System
2023-11-14 09:35:24    Gi1/0/26    switch2    ifAlias: XX-YYZZ -> FREE    System
2023-11-14 09:35:24    Gi1/0/26    switch2    ifAdminStatus: up -> down    System
2023-11-13 16:26:02    alert    switch2    Issued got worse for rule '060 Port DOWN' to transport 'mail'    System
2023-11-13 16:10:41    Gi2/0/18    switch2    ifOperStatus: up -> down    System
2023-11-13 16:10:41    Gi2/0/18    switch2    ifDuplex: fullDuplex -> unknown    System
2023-11-13 11:06:02    alert    switch2    Issued acknowledgment for rule '060 Port DOWN' to transport 'mail'    System
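
The timing described above can be checked against the switch1 eventlog and the rule's Delay of 900 seconds. This is a minimal Python sketch that uses only timestamps copied from the log; the helper name `seconds_between` is made up for illustration and is not part of LibreNMS.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
DELAY_SECONDS = 900  # "Delay: 900" from the rule settings above


def seconds_between(start: str, end: str) -> int:
    """Return the whole seconds elapsed between two eventlog timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# Gi1/0/28 on switch1, timestamps taken from the eventlog
went_down = "2023-11-15 22:15:36"
came_up = "2023-11-15 22:20:21"
notified = "2023-11-15 22:31:02"

down_for = seconds_between(went_down, came_up)
down_to_notify = seconds_between(went_down, notified)

print(f"port was down for {down_for} s")
print(f"notification sent {down_to_notify} s after the port went down")
print(f"delay elapsed before notification: {down_to_notify >= DELAY_SECONDS}")
```

For Gi1/0/28 this gives 285 seconds of downtime and 926 seconds between the link going down and the "got worse" mail, so the notification went out after the 900-second delay had elapsed, at a point when the port was already up again.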

Output of ./validate.php

===========================================
Component | Version
--------- | -------
LibreNMS  | 23.10.0-71-gfaf66035e (2023-11-15T15:21:06+01:00)
DB Schema | 2023_11_04_125846_packages_increase_name_column_length (273)
PHP       | 8.1.24
Python    | 3.6.8
Database  | MariaDB 10.3.28-MariaDB
RRDTool   | 1.7.0
SNMP      | 5.8
===========================================

[OK]    Composer Version: 2.6.5
[OK]    Dependencies up-to-date.
[WARN]  Debug enabled.  This is a security risk.
[OK]    Database connection successful
[OK]    Database Schema is current
[OK]    SQL Server meets minimum requirements
[OK]    lower_case_table_names is enabled
[OK]    MySQL engine is optimal
[OK]    Database and column collations are correct
[OK]    Database schema correct
[OK]    MySQl and PHP time match
[OK]    Active pollers found
[OK]    Dispatcher Service not detected
[OK]    Locks are functional
[FAIL]  Some poller nodes have not checked in recently
        Inactive Nodes:
          librenms.fqdn 
[OK]    Redis is unavailable
[OK]    rrdtool version ok
[OK]    Connected to rrdcached
[WARN]  Your local git contains modified files, this could prevent automatic updates.
        [FIX]:
        You can fix this with ./scripts/github-remove
        Modified Files:
         rrd/.gitignore


The poller node is active; this FAIL is a leftover from a rename. :/

What was the last working version of LibreNMS?

No response

Anything in the logs that might be useful for us?

No response

@murrant
Member

murrant commented Nov 20, 2023

Can you show the alert log screen with the alert expanded, before and after?

@albertsulva
Author

We are not sure exactly what you want to see. We hope this is it. Or do you want us to simulate an outage?

switch1

@PipoCanaja
Contributor

PipoCanaja commented Dec 2, 2023

Either of these two options would help:

  • Click the "+" to the left of the hostname to expand the alert (it then becomes a "-").

  • Click the "Details" button and filter out sensitive data before pasting the text here.

@albertsulva
Author

Yes, here it is.


[
  {
    "device_id": 30,
    "inserted": "2020-11-12 11:16:33",
    "hostname": "x.x.x.x",
    "sysName": "switch1",
    "display": null,
    "ip": "x.x.x.x",
    "overwrite_ip": "",
    "community": "public",
    "authlevel": null,
    "authname": null,
    "authpass": null,
    "authalgo": null,
    "cryptopass": null,
    "cryptoalgo": null,
    "snmpver": "v2c",
    "port": 161,
    "transport": "udp",
    "timeout": null,
    "retries": null,
    "snmp_disable": 0,
    "bgpLocalAs": null,
    "sysObjectID": ".1.3.6.1.4.1.9.1.516",
    "sysDescr": "Cisco IOS Software, C3750E Software (C3750E-UNIVERSALK9-M), Version 15.0(2)SE11, RELEASE SOFTWARE (fc3)\r\nTechnical Support: http://www.cisco.com/techsupport\r\nCopyright (c) 1986-2017 by Cisco Systems, Inc.\r\nCompiled Sat 19-Aug-17 08:39 by prod_rel_team",
    "sysContact": null,
    "version": "15.0(2)SE11",
    "hardware": "WS-C3750X-48T-S",
    "features": "UNIVERSALK9",
    "location_id": 7,
    "os": "ios",
    "status": 1,
    "status_reason": "",
    "ignore": 0,
    "disabled": 0,
    "uptime": 89685859,
    "agent_uptime": 0,
    "last_polled": "2023-11-16 17:45:41",
    "last_poll_attempted": null,
    "last_polled_timetaken": 8.6341400146484,
    "last_discovered_timetaken": 32.772,
    "last_discovered": "2023-11-16 13:09:28",
    "last_ping": "2023-11-16 17:45:34",
    "last_ping_timetaken": 1.4,
    "purpose": "",
    "type": "network",
    "serial": "FDO1913F06W",
    "icon": "cisco.svg",
    "poller_group": 0,
    "override_sysLocation": 0,
    "notes": null,
    "port_association_mode": 1,
    "max_depth": 897,
    "disable_notify": 0,
    "port_id": 11601,
    "port_descr_type": null,
    "port_descr_descr": null,
    "port_descr_circuit": null,
    "port_descr_speed": null,
    "port_descr_notes": null,
    "ifDescr": "GigabitEthernet1/0/19",
    "ifName": "Gi1/0/19",
    "portName": null,
    "ifIndex": 10119,
    "ifSpeed": 1000000000,
    "ifSpeed_prev": 100000000,
    "ifConnectorPresent": "true",
    "ifOperStatus": "down",
    "ifOperStatus_prev": "up",
    "ifAdminStatus": "up",
    "ifAdminStatus_prev": "up",
    "ifDuplex": "unknown",
    "ifMtu": 1500,
    "ifType": "ethernetCsmacd",
    "ifAlias": "xxxxxxxxx",
    "ifPhysAddress": "a46c2a9c5113",
    "ifLastChange": 358758970,
    "ifVlan": "106",
    "ifTrunk": null,
    "ifVrf": 0,
    "deleted": 0,
    "pagpOperationMode": null,
    "pagpPortState": null,
    "pagpPartnerDeviceId": null,
    "pagpPartnerLearnMethod": null,
    "pagpPartnerIfIndex": null,
    "pagpPartnerGroupIfIndex": null,
    "pagpPartnerDeviceName": null,
    "pagpEthcOperationMode": null,
    "pagpDeviceId": null,
    "pagpGroupIfIndex": null,
    "ifInUcastPkts": 2343845039,
    "ifInUcastPkts_prev": 2343845039,
    "ifInUcastPkts_delta": 0,
    "ifInUcastPkts_rate": 0,
    "ifOutUcastPkts": 2386127881,
    "ifOutUcastPkts_prev": 2386127881,
    "ifOutUcastPkts_delta": 0,
    "ifOutUcastPkts_rate": 0,
    "ifInErrors": 0,
    "ifInErrors_prev": 0,
    "ifInErrors_delta": 0,
    "ifInErrors_rate": 0,
    "ifOutErrors": 0,
    "ifOutErrors_prev": 0,
    "ifOutErrors_delta": 0,
    "ifOutErrors_rate": 0,
    "ifInOctets": 1457127061875,
    "ifInOctets_prev": 1457127061875,
    "ifInOctets_delta": 0,
    "ifInOctets_rate": 0,
    "ifOutOctets": 2094069827420,
    "ifOutOctets_prev": 2094069827420,
    "ifOutOctets_delta": 0,
    "ifOutOctets_rate": 0,
    "poll_time": 1700153141,
    "poll_prev": 1700152832,
    "poll_period": 309
  },
  {
    "device_id": 30,
    "inserted": "2020-11-12 11:16:33",
    "hostname": "x.x.x.x",
    "sysName": "switch1",
    "display": null,
    "ip": "x.x.x.x",
    "overwrite_ip": "",
    "community": "public",
    "authlevel": null,
    "authname": null,
    "authpass": null,
    "authalgo": null,
    "cryptopass": null,
    "cryptoalgo": null,
    "snmpver": "v2c",
    "port": 161,
    "transport": "udp",
    "timeout": null,
    "retries": null,
    "snmp_disable": 0,
    "bgpLocalAs": null,
    "sysObjectID": ".1.3.6.1.4.1.9.1.516",
    "sysDescr": "Cisco IOS Software, C3750E Software (C3750E-UNIVERSALK9-M), Version 15.0(2)SE11, RELEASE SOFTWARE (fc3)\r\nTechnical Support: http://www.cisco.com/techsupport\r\nCopyright (c) 1986-2017 by Cisco Systems, Inc.\r\nCompiled Sat 19-Aug-17 08:39 by prod_rel_team",
    "sysContact": null,
    "version": "15.0(2)SE11",
    "hardware": "WS-C3750X-48T-S",
    "features": "UNIVERSALK9",
    "location_id": 7,
    "os": "ios",
    "status": 1,
    "status_reason": "",
    "ignore": 0,
    "disabled": 0,
    "uptime": 89685859,
    "agent_uptime": 0,
    "last_polled": "2023-11-16 17:45:41",
    "last_poll_attempted": null,
    "last_polled_timetaken": 8.6341400146484,
    "last_discovered_timetaken": 32.772,
    "last_discovered": "2023-11-16 13:09:28",
    "last_ping": "2023-11-16 17:45:34",
    "last_ping_timetaken": 1.4,
    "purpose": "",
    "type": "network",
    "serial": "FDO1913F06W",
    "icon": "cisco.svg",
    "poller_group": 0,
    "override_sysLocation": 0,
    "notes": null,
    "port_association_mode": 1,
    "max_depth": 897,
    "disable_notify": 0,
    "port_id": 1953,
    "port_descr_type": null,
    "port_descr_descr": null,
    "port_descr_circuit": null,
    "port_descr_speed": null,
    "port_descr_notes": null,
    "ifDescr": "GigabitEthernet2/0/47",
    "ifName": "Gi2/0/47",
    "portName": null,
    "ifIndex": 10647,
    "ifSpeed": 10000000,
    "ifSpeed_prev": 1000000000,
    "ifConnectorPresent": "true",
    "ifOperStatus": "down",
    "ifOperStatus_prev": "up",
    "ifAdminStatus": "up",
    "ifAdminStatus_prev": "down",
    "ifDuplex": "unknown",
    "ifMtu": 1500,
    "ifType": "ethernetCsmacd",
    "ifAlias": "xxxxxx",
    "ifPhysAddress": "bcc4931ce82f",
    "ifLastChange": 378259766,
    "ifVlan": "106",
    "ifTrunk": null,
    "ifVrf": 0,
    "deleted": 0,
    "pagpOperationMode": null,
    "pagpPortState": null,
    "pagpPartnerDeviceId": null,
    "pagpPartnerLearnMethod": null,
    "pagpPartnerIfIndex": null,
    "pagpPartnerGroupIfIndex": null,
    "pagpPartnerDeviceName": null,
    "pagpEthcOperationMode": null,
    "pagpDeviceId": null,
    "pagpGroupIfIndex": null,
    "ifInUcastPkts": 1045936600,
    "ifInUcastPkts_prev": 1045936600,
    "ifInUcastPkts_delta": 0,
    "ifInUcastPkts_rate": 0,
    "ifOutUcastPkts": 1394721133,
    "ifOutUcastPkts_prev": 1394721133,
    "ifOutUcastPkts_delta": 0,
    "ifOutUcastPkts_rate": 0,
    "ifInErrors": 0,
    "ifInErrors_prev": 0,
    "ifInErrors_delta": 0,
    "ifInErrors_rate": 0,
    "ifOutErrors": 0,
    "ifOutErrors_prev": 0,
    "ifOutErrors_delta": 0,
    "ifOutErrors_rate": 0,
    "ifInOctets": 979259036107,
    "ifInOctets_prev": 979259036107,
    "ifInOctets_delta": 0,
    "ifInOctets_rate": 0,
    "ifOutOctets": 934879551575,
    "ifOutOctets_prev": 934879551575,
    "ifOutOctets_delta": 0,
    "ifOutOctets_rate": 0,
    "poll_time": 1700153141,
    "poll_prev": 1700152832,
    "poll_period": 309
  }
]
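
The details export above holds one row per port that currently matches the rule. As a rough illustration, the Python sketch below re-applies the rule's four conditions to a trimmed copy of those two rows (only fields present in the export are used). Treating `macros.device_up = 1` as "status is 1, not disabled, not ignored" is an assumption about how that macro expands, and `matches_rule` is a hypothetical helper, not a LibreNMS function.

```python
import json

# Trimmed copy of the two rows from the export above; all values are from the source.
details = json.loads("""
[
  {"sysName": "switch1", "ifName": "Gi1/0/19", "status": 1, "disabled": 0, "ignore": 0,
   "ifOperStatus": "down", "ifOperStatus_prev": "up", "ifAdminStatus": "up"},
  {"sysName": "switch1", "ifName": "Gi2/0/47", "status": 1, "disabled": 0, "ignore": 0,
   "ifOperStatus": "down", "ifOperStatus_prev": "up", "ifAdminStatus": "up"}
]
""")


def matches_rule(row: dict) -> bool:
    """Approximate the '060 Port DOWN' rule conditions for one exported row."""
    # Assumption: macros.device_up is modelled as status == 1 on a device
    # that is neither disabled nor ignored.
    device_up = row["status"] == 1 and row["disabled"] == 0 and row["ignore"] == 0
    return (
        row["ifOperStatus"] == "down"
        and row["ifOperStatus_prev"] == "up"
        and device_up
        and row["ifAdminStatus"] != "down"
    )


matched_ports = [row["ifName"] for row in details if matches_rule(row)]
print(matched_ports)  # ['Gi1/0/19', 'Gi2/0/47']
```

Both exported rows satisfy the conditions, so at the time of the export the alert for switch1 covered two ports, neither of which is the Gi1/0/28 port seen in the eventlog.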

@albertsulva
Author

Hi there, any news on this case? Or do you need any further info? Please let me know. We are stuck on this in our monitoring setup. Thanks in advance.

@murrant
Member

murrant commented Jan 9, 2024

Looks like it is working as expected.

You went from one port down on the device to two ports down. If you want it to trigger a new alert instead of a got worse one, resolve all the issues causing the alert to be triggered.

For example:
On the device, disable the port that was already down before the second port went down.
Disable alerting for the port that was already down.
Set LibreNMS to exclude the already down port.
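
The behaviour described in this comment can be pictured as a comparison of how many ports match the rule between two consecutive alert runs. The Python sketch below is a simplified model under that assumption; the function name `classify_transition` and the returned labels are illustrative and are not taken from the LibreNMS source.

```python
def classify_transition(previous_matches: int, current_matches: int) -> str:
    """Label an alert run by comparing matched-port counts between two runs.

    Simplified model only: it assumes the alert state depends on nothing
    but the number of rows matching the rule.
    """
    if current_matches == 0:
        return "recovered" if previous_matches > 0 else "ok"
    if previous_matches == 0:
        return "alert"  # first match after a clean state: a new alert
    if current_matches > previous_matches:
        return "got worse"
    if current_matches < previous_matches:
        return "got better"
    return "unchanged"


# One port already down, then a second port goes down on the same device.
print(classify_transition(1, 2))  # got worse
# All earlier faults resolved first, then a port goes down.
print(classify_transition(0, 1))  # alert
```

Under this model a brand-new alert is only produced when the previous run had no matching ports, which is why the suggestions above all amount to removing the port that was already down from the rule's matches.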

@murrant murrant closed this as completed Jan 9, 2024
@librenms-bot

This issue has been mentioned on LibreNMS Community. There might be relevant details there:

https://community.librenms.org/t/false-got-worse-notification/24315/1
