Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected behavior of hooks mechanism #778

Closed
pilyon3 opened this issue Apr 12, 2021 · 5 comments
Closed

Unexpected behavior of hooks mechanism #778

pilyon3 opened this issue Apr 12, 2021 · 5 comments

Comments

@pilyon3
Copy link

pilyon3 commented Apr 12, 2021

Expected Behavior

Hooks should be trigger for each occurrence configured

Current Behavior

Hooks triggerd randomly

Possible Solution

May be this behavior is inherent to job queue mechanism but some hooks appears to be definitively lost which is not terrific. If this analyze is right, could it be possible to have the possibility to prioritize some hooks that can't be delayed or ever less forgotten ?
For information, expired_jobs of 14 days by default has not been modified nor default backend daemon's workers parameters like timeout or tasks.

Steps to Reproduce (for bugs)

  1. Configure hooks for by example discover event sending data into a temporarily log file called "test.log"
  2. Schedule discover for a device each minute to have better visibility
  3. Active debug level for logs
  4. Hooks are logged into netdisco-backend log file, but hooks not triggerd each time into test.log file

Context

New device or discovered devices not present/refreshed into local monitoring application

Your Environment

  • Netdisco version used: 2.47.5
  • SNMP::Info version used: 3.70

Config info (deployment.yml)

hooks:

schedule:
discover_specific:
action: 'discover'
when: '*/1 * * * *'
device: '192.168.100.6'

Device information

All devices impacted

Logs into netdisco-backend.log (parsed for specific device 192.168.100.6). Seems to be as expected (hooks triggerd every minutes):

[113785] 2021-04-12 09:23:12 debug [192.168.100.6] hooks - 2 queued
[113789] 2021-04-12 09:23:36 debug [192.168.100.6] hooks - 2 queued
[113790] 2021-04-12 09:25:34 debug [192.168.100.6] hooks - 2 queued
[113791] 2021-04-12 09:24:19 debug [192.168.100.6] hooks - 2 queued
[114374] 2021-04-12 09:26:50 debug [192.168.100.6] hooks - 2 queued
[114375] 2021-04-12 09:27:31 debug [192.168.100.6] hooks - 2 queued
[114376] 2021-04-12 09:28:55 debug [192.168.100.6] hooks - 2 queued
[114377] 2021-04-12 09:29:39 debug [192.168.100.6] hooks - 2 queued
[114378] 2021-04-12 09:30:44 debug [192.168.100.6] hooks - 2 queued
[114828] 2021-04-12 09:31:43 debug [192.168.100.6] hooks - 2 queued
[114843] 2021-04-12 09:32:26 debug [192.168.100.6] hooks - 2 queued
[114916] 2021-04-12 09:33:42 debug [192.168.100.6] hooks - 2 queued
[114968] 2021-04-12 09:34:40 debug [192.168.100.6] hooks - 2 queued
[115272] 2021-04-12 09:35:38 debug [192.168.100.6] hooks - 2 queued
[115445] 2021-04-12 09:36:38 debug [192.168.100.6] hooks - 2 queued
[115448] 2021-04-12 09:37:43 debug [192.168.100.6] hooks - 2 queued
[115451] 2021-04-12 09:38:41 debug [192.168.100.6] hooks - 2 queued
[115452] 2021-04-12 09:39:52 debug [192.168.100.6] hooks - 2 queued
[115461] 2021-04-12 09:40:40 debug [192.168.100.6] hooks - 2 queued
[116260] 2021-04-12 09:41:22 debug [192.168.100.6] hooks - 2 queued
[116263] 2021-04-12 09:42:33 debug [192.168.100.6] hooks - 2 queued
[116264] 2021-04-12 09:43:28 debug [192.168.100.6] hooks - 2 queued

Logs into test.log

Monday, April 12, 2021, 11:24:19 (UTC+0200) https://superview.XXX/oxidized/reload
Monday, April 12, 2021, 11:24:19 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
Monday, April 12, 2021, 11:31:30 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.11&action=discover
Monday, April 12, 2021, 11:31:43 (UTC+0200) https://superview.XXX/oxidized/reload
Monday, April 12, 2021, 11:31:43 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
Monday, April 12, 2021, 11:32:27 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
Monday, April 12, 2021, 11:32:27 (UTC+0200) https://superview.XXX/oxidized/reload
Monday, April 12, 2021, 11:33:43 (UTC+0200) https://superview.XXX/oxidized/reload
Monday, April 12, 2021, 11:33:43 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
Monday, April 12, 2021, 11:34:33 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.11&action=discover
Monday, April 12, 2021, 11:34:33 (UTC+0200) https://superview.XXX/oxidized/reload
Monday, April 12, 2021, 11:34:41 (UTC+0200) https://superview.XXX/oxidized/reload
Monday, April 12, 2021, 11:34:41 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
Monday, April 12, 2021, 11:36:39 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
Monday, April 12, 2021, 11:36:39 (UTC+0200) https://superview.XXX/oxidized/reload
Monday, April 12, 2021, 11:37:43 (UTC+0200) https://superview.XXX/oxidized/reload
Monday, April 12, 2021, 11:37:44 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
Monday, April 12, 2021, 11:39:52 (UTC+0200) https://superview.XXX/oxidized/reload
Monday, April 12, 2021, 11:39:52 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
Monday, April 12, 2021, 11:43:28 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
Monday, April 12, 2021, 11:43:28 (UTC+0200) https://superview.XXX/oxidized/reload
Monday, April 12, 2021, 11:44:44 (UTC+0200) https://superview.XXX/oxidized/reload
Monday, April 12, 2021, 11:44:44 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.11&action=discover

@ollyg
Copy link
Member

ollyg commented Jul 14, 2021

thanks for the bug report @pilyon3

I think what is happening is that Netdisco is seeing duplicate jobs and deleting/cancelling them. This is sensible for things like macsuck and discover but not sensible for Hooks, I guess (or we could introduce a new flag to control the behaviour).

Are you able to explore the database and see if you have entries in the "admin" table saying something like "...duplicate of..." in the log field?

In any case I will have a think about a good way to resolve and deal with this.

@ollyg
Copy link
Member

ollyg commented Jul 14, 2021

ah! I have discovered a bug
Exec jobs are not classed as high priority (only HTTP ones). Whoops.

I will release a fix and we can see if this helps.

@pilyon3
Copy link
Author

pilyon3 commented Jul 19, 2021

Hi ollyg,

Thanks for your work, and sorry for delayed reaction. We'll try your fix asap and report to you.

@pilyon3
Copy link
Author

pilyon3 commented Jul 19, 2021

After few tests, it seems to be better, and hooks works as expected. Reusing setup as mentionned in this issue, we have now:

Logs into netdisco-backend.log

[89405] 2021-07-19 14:30:59 debug [discover] hooks - 192.168.100.6 queued
[89405] 2021-07-19 14:30:59 debug [discover] hooks - 192.168.100.6 queued
[89836] 2021-07-19 14:31:35 debug [discover] hooks - 192.168.100.6 queued
[89836] 2021-07-19 14:31:35 debug [discover] hooks - 192.168.100.6 queued
[89847] 2021-07-19 14:32:37 debug [discover] hooks - 192.168.100.6 queued
[89847] 2021-07-19 14:32:37 debug [discover] hooks - 192.168.100.6 queued
[89850] 2021-07-19 14:33:32 debug [discover] hooks - 192.168.100.6 queued
[89850] 2021-07-19 14:33:32 debug [discover] hooks - 192.168.100.6 queued
[89984] 2021-07-19 14:34:48 debug [discover] hooks - 192.168.100.6 queued
[89984] 2021-07-19 14:34:48 debug [discover] hooks - 192.168.100.6 queued
[90301] 2021-07-19 14:35:32 debug [discover] hooks - 192.168.100.6 queued
[90301] 2021-07-19 14:35:32 debug [discover] hooks - 192.168.100.6 queued
[90611] 2021-07-19 14:36:44 debug [discover] hooks - 192.168.100.6 queued
[90611] 2021-07-19 14:36:44 debug [discover] hooks - 192.168.100.6 queued
[90614] 2021-07-19 14:37:26 debug [discover] hooks - 192.168.100.6 queued
[90614] 2021-07-19 14:37:26 debug [discover] hooks - 192.168.100.6 queued
[90617] 2021-07-19 14:38:42 debug [discover] hooks - 192.168.100.6 queued
[90617] 2021-07-19 14:38:42 debug [discover] hooks - 192.168.100.6 queued
[90626] 2021-07-19 14:39:54 debug [discover] hooks - 192.168.100.6 queued
[90626] 2021-07-19 14:39:54 debug [discover] hooks - 192.168.100.6 queued
[90647] 2021-07-19 14:40:31 debug [discover] hooks - 192.168.100.6 queued
[90647] 2021-07-19 14:40:31 debug [discover] hooks - 192.168.100.6 queued
[91207] 2021-07-19 14:41:27 debug [discover] hooks - 192.168.100.6 queued
[91207] 2021-07-19 14:41:27 debug [discover] hooks - 192.168.100.6 queued
[91210] 2021-07-19 14:42:55 debug [discover] hooks - 192.168.100.6 queued
[91210] 2021-07-19 14:42:55 debug [discover] hooks - 192.168.100.6 queued
[91211] 2021-07-19 14:43:41 debug [discover] hooks - 192.168.100.6 queued
[91211] 2021-07-19 14:43:41 debug [discover] hooks - 192.168.100.6 queued
[91215] 2021-07-19 14:44:44 debug [discover] hooks - 192.168.100.6 queued
[91215] 2021-07-19 14:44:44 debug [discover] hooks - 192.168.100.6 queued
[91228] 2021-07-19 14:45:30 debug [discover] hooks - 192.168.100.6 queued
[91228] 2021-07-19 14:45:30 debug [discover] hooks - 192.168.100.6 queued

Logs into test.log

lundi 19 juillet 2021, 14:31:00 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
lundi 19 juillet 2021, 14:31:01 (UTC+0200) https://superview.XXX/oxidized/reload
lundi 19 juillet 2021, 14:31:35 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
lundi 19 juillet 2021, 14:31:35 (UTC+0200) https://superview.XXX/oxidized/reload
lundi 19 juillet 2021, 14:32:37 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
lundi 19 juillet 2021, 14:32:37 (UTC+0200) https://superview.XXX/oxidized/reload
lundi 19 juillet 2021, 14:33:32 (UTC+0200) https://superview.XXX/oxidized/reload
lundi 19 juillet 2021, 14:33:32 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
lundi 19 juillet 2021, 14:33:53 (UTC+0200) https://superview.XXX/oxidized/reload
lundi 19 juillet 2021, 14:33:53 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.22&action=discover
lundi 19 juillet 2021, 14:34:49 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
lundi 19 juillet 2021, 14:34:49 (UTC+0200) https://superview.XXX/oxidized/reload
lundi 19 juillet 2021, 14:35:33 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
lundi 19 juillet 2021, 14:35:33 (UTC+0200) https://superview.XXX/oxidized/reload
lundi 19 juillet 2021, 14:36:44 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
lundi 19 juillet 2021, 14:36:44 (UTC+0200) https://superview.XXX/oxidized/reload
lundi 19 juillet 2021, 14:37:27 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
lundi 19 juillet 2021, 14:37:27 (UTC+0200) https://superview.XXX/oxidized/reload
lundi 19 juillet 2021, 14:38:42 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
lundi 19 juillet 2021, 14:38:42 (UTC+0200) https://superview.XXX/oxidized/reload
lundi 19 juillet 2021, 14:39:54 (UTC+0200) https://superview.XXX/oxidized/reload
lundi 19 juillet 2021, 14:39:54 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
lundi 19 juillet 2021, 14:41:27 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
lundi 19 juillet 2021, 14:41:27 (UTC+0200) https://superview.XXX/oxidized/reload
lundi 19 juillet 2021, 14:43:41 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
lundi 19 juillet 2021, 14:43:41 (UTC+0200) https://superview.XXX/oxidized/reload
lundi 19 juillet 2021, 14:44:45 (UTC+0200) https://superview.XXX/oxidized/reload
lundi 19 juillet 2021, 14:44:45 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
lundi 19 juillet 2021, 14:45:31 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=192.168.100.6&action=discover
lundi 19 juillet 2021, 14:45:32 (UTC+0200) https://superview.XXX/oxidized/reload
lundi 19 juillet 2021, 14:46:47 (UTC+0200) https://superview.XXX/oxidized/reload
lundi 19 juillet 2021, 14:46:47 (UTC+0200) https://superview.XXX/webservices/discovery.php?ip=&action=discover

We can see that each hook triggerd is well executed, sometimes with a little delayed time because a macwalk occurs.

Just a question. Fix seems to be just a add into config.yml file: is the parameter "job_prio" overridable into deployment.yml ? It's not mentionned into wiki.

@ollyg
Copy link
Member

ollyg commented Oct 4, 2021

Hi @pilyon3 sorry for the slow answer, yes job_prio is overridable but is not documented.

@ollyg ollyg closed this as completed Oct 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants