Handler/Rule selection for the same trap #47

Open
manfredw opened this issue Sep 2, 2020 · 7 comments

manfredw commented Sep 2, 2020

Is your feature request related to a problem? Please describe.
Let's say a vendor uses a universal trap to send state info. The trap contains variables for
state (OK/NOK), severity (minor/major/...), problem category (CPU/memory/disk/power supply/fan/...)
and a specific error message.
The first thing you need are rules to differentiate between OK and NOK for this trap.
Then you need additional separation by severity and problem category.
Finally, the device sends out some annoying traps across several categories that you want to ignore.
How can rules with such exceptions be created without exponential complexity?

Describe the solution you'd like
There are several ways to solve this problem; this list is not intended to be exhaustive:

  • use a longest-match method (i.e. the more specific rule within the same trap wins)
  • use priorities for rules/handlers
  • use an adjustable processing order of rules/handlers
  • use a dynamic rule set within one handler by adding a new rule with individual filter-action pairs
@patrickpr patrickpr self-assigned this Sep 3, 2020
@patrickpr patrickpr added the enhancement New feature or request label Sep 3, 2020
@patrickpr
Owner

I've run into that problem already, and the solution I found best is to have multiple (ordered) evaluations in one rule.
Something like:

  • OIDa contains "CPU"
  1. OIDb > 90 then warning
  2. OIDb < 5 then ignore
  • OIDa contains "not useful" then ignore

The main problem is not the implementation but how to set up the GUI for this. I was planning to use something like the 'assign where' filter in the service apply rules of Director.
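
The ordered evaluations listed above can be sketched in a few lines of Python. This is only an illustration, not trapdirector code: the trap is modelled as a plain dict, `OIDa` and `OIDb` are the placeholder names from the list, and falling through to the next outer condition when no inner step matches is an assumption the comment does not spell out.

```python
# Hypothetical model: a trap is a dict mapping varbind names to values.
# Each entry is (outer condition, ordered list of (inner condition, result)).
# An inner condition of None always matches.
EVALUATIONS = [
    (lambda t: "CPU" in str(t["OIDa"]), [
        (lambda t: int(t["OIDb"]) > 90, "warning"),
        (lambda t: int(t["OIDb"]) < 5, "ignore"),
    ]),
    (lambda t: "not useful" in str(t["OIDa"]), [
        (None, "ignore"),
    ]),
]


def evaluate(trap):
    """Return the first matching result, or None if nothing matches."""
    for outer, steps in EVALUATIONS:
        if not outer(trap):
            continue
        for inner, result in steps:
            if inner is None or inner(trap):
                return result
        # Assumption: no inner step matched, so try the next outer condition.
    return None
```

With this sketch, a trap whose `OIDa` contains "CPU" and whose `OIDb` is 95 yields "warning", while the same trap with `OIDb` at 50 matches nothing and yields `None`.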

@patrickpr
Owner

I worked on it a little and ended up with two new DB tables. The logic is:

A rule (same as the current rule) selects the trap based on source IP / OID.

The trap content is evaluated by a new type of rule, which has:

  • the rule itself based on trap content
  • Index of action in other table for match and no match

Action can be:

  • Return with code (OK/Warn/crit/nothing/ignore), display, and optional host/hostgroup and service reassignment
  • Forward to another rule (the new type).

Is there something I didn't think of?

@manfredw
Author

manfredw commented Sep 3, 2020

I'm not sure if you need forwarding to another rule.

The first selection criterion, on trap source and trap OID, is mandatory (handler). IMHO you need only one handler per host(group)/trap combination.

Within this handler, the trap content should be evaluated against an ordered list of selection rules and corresponding actions.
If a rule matches, its action is returned and no further rules from the list are checked. There could be an explicit default action at the end of the list, without a rule.

Rules with unique selection criteria will not need a specific order, but criteria like (a & b & c)->ignore and (a & b)->ok will need exactly this order to work. Of course you can also rewrite this simple rule as (a & b & !c)->ok, but with multiple "c" or additional "d" criteria it will not scale. The ordered list should also lead to less complex rules, which are more readable and run faster.
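
To make the ordering argument concrete, here is a minimal first-match sketch in Python. The names `first_match`, `RULES_ORDERED` and `RULES_SWAPPED` are made up for illustration, and the trap is reduced to three booleans `a`, `b`, `c` standing for the criteria above.

```python
def first_match(rules, trap, default=None):
    """Return the action of the first rule whose condition matches the trap."""
    for condition, action in rules:
        if condition(trap):
            return action
    # Explicit default action at the end of the list.
    return default


# Specific rule first: (a & b & c) -> ignore, then (a & b) -> ok.
RULES_ORDERED = [
    (lambda t: t["a"] and t["b"] and t["c"], "ignore"),
    (lambda t: t["a"] and t["b"], "ok"),
]

# Same rules in the wrong order: the broad rule shadows the specific one.
RULES_SWAPPED = list(reversed(RULES_ORDERED))
```

For a trap with `a`, `b` and `c` all true, `RULES_ORDERED` returns "ignore", while `RULES_SWAPPED` returns "ok" because the broader rule is reached first. That is the case where the order number matters.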

You need at least one additional DB table. In my example it should contain rule-id, handler-id, order, rule, description and action.

@patrickpr
Owner

You are right, it's much simpler like this (and doesn't need recursion).
So adding two tables :
<prefix>_rules_details with

  • Handler ref and order num
  • Rule
  • Ref to action for match / no match

<prefix>_rules_action with

  • display, status and optional reassignment to another service/host
  • keeping forwarding to rule for now

@manfredw
Author

manfredw commented Sep 4, 2020

Why do you want to use two tables for rule processing?

There is a 1:1 relation between rule and action, so there is no need to split them up; only the relation between handler and rule(s) is 1:n.

I've looked into the current rules table and found some columns which seem not to be used (or are reserved for future use).
These are ip4 and ip6 (IMHO only the hostname is unique in Icinga 2), action_nomatch, display_nok and num_match_nok.

I would suggest the following tables (audit or statistics fields not included):
_handler with

  • handler ID
  • trap OID
  • hostname
  • hostgroupname
  • description
  • default action
  • default revert time
  • default servicename

_rules with

  • rule ID
  • handler ID
  • order number of rule
  • rule
  • action
  • revert time
  • servicename
  • display

This design should give maximum flexibility; you can still set different services and states for the same trap/host combination.
When the trap comes in, your first step will be to find the corresponding handler. After that, read all rules for this specific handler and process them in the defined order (defined by the order number). If a rule matches the trap content, there is no need to process subsequent rules; just stop and return the action/servicename/display information.
If no rule matches, the default action is returned.
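
A rough sketch of this two-table design and lookup flow, using Python's `sqlite3` module with an in-memory database. Everything here is an assumption for illustration: the column names and types are derived from the lists above, the OID is a placeholder, and a "rule" is reduced to a substring test on a message string because the real rule syntax is out of scope.

```python
import sqlite3

# Hypothetical schema derived from the column lists above (types are guesses).
SCHEMA = """
CREATE TABLE handler (
    handler_id INTEGER PRIMARY KEY,
    trap_oid TEXT NOT NULL,
    hostname TEXT,
    hostgroupname TEXT,
    description TEXT,
    default_action TEXT,
    default_revert_time INTEGER,
    default_servicename TEXT
);
CREATE TABLE rules (
    rule_id INTEGER PRIMARY KEY,
    handler_id INTEGER NOT NULL REFERENCES handler(handler_id),
    rule_order INTEGER NOT NULL,
    rule TEXT NOT NULL,
    action TEXT NOT NULL,
    revert_time INTEGER,
    servicename TEXT,
    display TEXT
);
"""


def process_trap(conn, trap_oid, hostname, message):
    """Find the handler, then return the first matching rule's result."""
    handler = conn.execute(
        "SELECT handler_id, default_action, default_servicename "
        "FROM handler WHERE trap_oid = ? AND hostname = ?",
        (trap_oid, hostname),
    ).fetchone()
    if handler is None:
        return None  # no handler for this trap/host combination
    handler_id, default_action, default_servicename = handler
    rows = conn.execute(
        "SELECT rule, action, servicename, display FROM rules "
        "WHERE handler_id = ? ORDER BY rule_order",
        (handler_id,),
    )
    for rule, action, servicename, display in rows:
        # Assumption: a rule is a plain substring test on the message.
        if rule in message:
            return action, servicename, display
    return default_action, default_servicename, None


def build_demo():
    """Create an in-memory database with one handler and two ordered rules."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO handler (handler_id, trap_oid, hostname, "
        "default_action, default_servicename) "
        "VALUES (1, '.1.3.6.1.4.1.99999.1', 'switch01', 'ok', 'trap-state')"
    )
    conn.executemany(
        "INSERT INTO rules (handler_id, rule_order, rule, action, "
        "servicename, display) VALUES (1, ?, ?, ?, ?, ?)",
        [
            (1, "fan noise", "ignore", None, None),
            (2, "fan", "critical", "trap-fan", "Fan problem"),
        ],
    )
    return conn
```

In the demo data the specific "fan noise" exception has order number 1, so it is checked before the broader "fan" rule; any message matching neither falls back to the handler's default action and service name.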

There is no need to add a second handler for this trap/host combination, just add a new rule.

@patrickpr
Owner

ip4 & ip6 make it easy to select the correct handler when receiving traps, without additional queries (IDO or API) to Icinga.
When a trap is received, the only information about the host is the source IP.
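
As a sketch of why stored addresses help, the lookup below selects a handler from the source IP and trap OID alone, using Python's standard `ipaddress` module. The handler records are plain dicts with hypothetical keys `ip4`, `ip6` and `trap_oid`; this is not the actual trapdirector query.

```python
import ipaddress


def find_handler(handlers, source_ip, trap_oid):
    """Return the first handler whose stored address and OID match the trap."""
    addr = ipaddress.ip_address(source_ip)
    # Pick the column matching the address family of the trap source.
    column = "ip4" if addr.version == 4 else "ip6"
    for handler in handlers:
        stored = handler.get(column)
        if stored is None or handler["trap_oid"] != trap_oid:
            continue
        # Compare parsed addresses so different IPv6 spellings still match.
        if ipaddress.ip_address(stored) == addr:
            return handler
    return None
```

Comparing parsed address objects rather than strings means a compressed IPv6 source such as `2001:db8::1` still matches a handler stored in the fully expanded form.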

Rules and actions actually have a 1:2 relation, as there is a match and a no-match action for each rule. But with the new system, maybe the no-match action is not needed anymore, as you can do it with two rules.

Revert time is not really used and will soon be obsolete, as it is easy to set in the passive service configuration on Icinga. There is also a big problem: there is no trapdirector service, so I can't be sure the revert action will be sent to Icinga on time.

If I implement this, I will also need to reassign the host based on rules. The use case is when vSphere sends traps for an ESX VM: all of them come from vSphere, but the alarms must be on the virtual machine host.

There must be a default display with the default action.

Anyway, your design is simpler, so I will probably go for it.

patrickpr pushed a commit that referenced this issue Nov 2, 2020
Multiple rules GUI
patrickpr pushed a commit that referenced this issue Nov 4, 2020
patrickpr pushed a commit that referenced this issue Nov 8, 2020
@Tqnsls

Tqnsls commented Aug 8, 2023

Hi @patrickpr,
do you have any updates regarding this feature-bug?
We also ran into this issue because one of our providers sends all traps with the same OID.

I first tried to accomplish this by modifying the rule, e.g. like:
( $3$ = 2 ) & ($3.oid$ = ".1.3.6.1.2.1.1.3.0" ) to determine that the specific OID is the one I need.
