New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ports polling #5805

Merged
merged 12 commits into from Feb 14, 2017

Conversation

Projects
None yet
8 participants
@laf
Member

laf commented Feb 7, 2017

DO NOT DELETE THIS TEXT

Please note

Please read this information carefully. You can run ./scripts/pre-commit.php to check your code before submitting.

  • Have you signed the Contributors agreement - please do NOT submit a pull request unless you have (signing the agreement in the same pull request is fine). Your commit message for signing the agreement must appear as per the docs.
  • Have you followed our code guidelines?

Background: https://community.librenms.org/t/looking-for-testers-for-port-polling-update/524/1

Looking for code review.

@mention-bot

This comment has been minimized.

Show comment
Hide comment
@mention-bot

mention-bot Feb 7, 2017

Thank you for submitting a PR @laf! We have found the following @f0o, @murrant and @paulgear based on the history of these files to review this PR.

mention-bot commented Feb 7, 2017

Thank you for submitting a PR @laf! We have found the following @f0o, @murrant and @paulgear based on the history of these files to review this PR.

@LibreNMS-CI

This comment has been minimized.

Show comment
Hide comment
@LibreNMS-CI

LibreNMS-CI commented Feb 7, 2017

Auto-Deploy finished, Test PR at http://5805.ci.librenms.org or https://5805.ci.librenms.org

@geordish

This comment has been minimized.

Show comment
Hide comment
@geordish

geordish Feb 7, 2017

Contributor

I like what you have done, and it looks fairly sane (although I've not ran it!)

Would checking for ifOper in your initial walk, and only collecting stats on those that are up be beneficial also?

Contributor

geordish commented Feb 7, 2017

I like what you have done, and it looks fairly sane (although I've not ran it!)

Would checking for ifOper in your initial walk, and only collecting stats on those that are up be beneficial also?

@laf

This comment has been minimized.

Show comment
Hide comment
@laf

laf Feb 7, 2017

Member

Someone else mentioned that and I considered it before creating the PR last night. The issue I see is that the port could have just gone down 1 second before and that means we lose the last 5 minutes of data. It could flap or be mis-reported.

It's easy enough to add in but I wasn't sure it was a safe thing to do.

Member

laf commented Feb 7, 2017

Someone else mentioned that and I considered it before creating the PR last night. The issue I see is that the port could have just gone down 1 second before and that means we lose the last 5 minutes of data. It could flap or be mis-reported.

It's easy enough to add in but I wasn't sure it was a safe thing to do.

@geordish

This comment has been minimized.

Show comment
Hide comment
@geordish

geordish Feb 7, 2017

Contributor

That is a fair point.

What about storing the previous ifOper status of the port (We might already?) and not polling if it was down, and is currently down.

If it is up, or was up, then collect the stats and update to the new ifOper status?

Contributor

geordish commented Feb 7, 2017

That is a fair point.

What about storing the previous ifOper status of the port (We might already?) and not polling if it was down, and is currently down.

If it is up, or was up, then collect the stats and update to the new ifOper status?

@laf

This comment has been minimized.

Show comment
Hide comment
@laf

laf Feb 7, 2017

Member

We do indeed, both ifOper and ifAdmin so yes, an excellent idea :)

Member

laf commented Feb 7, 2017

We do indeed, both ifOper and ifAdmin so yes, an excellent idea :)

@rucarrol

This comment has been minimized.

Show comment
Hide comment
@rucarrol

rucarrol Feb 7, 2017

Contributor

ok, I enabled this on some QFabric and EX3300 stacks. While it's a bit early, I can see a VERY huge improvement (About 50%).

Someone else mentioned that and I considered it before creating the PR last night. The issue I see is that the port could have just gone down 1 second before and that means we lose the last 5 minutes of data. It could flap or be mis-reported.

Yeah that was me. I understand the concern, however to count the point - you're assuming that ifoper and ifadmin are going to be constant across the poll.

If you wanted to tighten the timing around then, then each port should have ifOper and ifAdmin polled just before fetching port stats.

Anyway, I'm very happy with this patch so far. I'll leave it run 1-2 days before enabling globally.

Contributor

rucarrol commented Feb 7, 2017

ok, I enabled this on some QFabric and EX3300 stacks. While it's a bit early, I can see a VERY huge improvement (About 50%).

Someone else mentioned that and I considered it before creating the PR last night. The issue I see is that the port could have just gone down 1 second before and that means we lose the last 5 minutes of data. It could flap or be mis-reported.

Yeah that was me. I understand the concern, however to count the point - you're assuming that ifoper and ifadmin are going to be constant across the poll.

If you wanted to tighten the timing around then, then each port should have ifOper and ifAdmin polled just before fetching port stats.

Anyway, I'm very happy with this patch so far. I'll leave it run 1-2 days before enabling globally.

@laf

This comment has been minimized.

Show comment
Hide comment
@laf

laf Feb 7, 2017

Member

@geordish Actually just went do this update and then realised, we'd have to add ifOperStatus and/or ifAdminStatus to the list of full walks we do which is where things slow down slightly. What do you reckon, still worth it?

Member

laf commented Feb 7, 2017

@geordish Actually just went do this update and then realised, we'd have to add ifOperStatus and/or ifAdminStatus to the list of full walks we do which is where things slow down slightly. What do you reckon, still worth it?

@geordish

This comment has been minimized.

Show comment
Hide comment
@geordish

geordish Feb 7, 2017

Contributor
Contributor

geordish commented Feb 7, 2017

@laf

This comment has been minimized.

Show comment
Hide comment
@laf

laf Feb 7, 2017

Member

Not walked but multi get. If the all ports were up then in theory this would be slower, if some were down then it would be quicker depending on how many.

I'll ask one of the testers to benchmark a walk so we can see.

Member

laf commented Feb 7, 2017

Not walked but multi get. If the all ports were up then in theory this would be slower, if some were down then it would be quicker depending on how many.

I'll ask one of the testers to benchmark a walk so we can see.

@laf

This comment has been minimized.

Show comment
Hide comment
@laf

laf Feb 7, 2017

Member

This is possibly ok for a merge as is at present, the community post shows this working well and it's not enabled until someone chooses to.

Member

laf commented Feb 7, 2017

This is possibly ok for a merge as is at present, the community post shows this working well and it's not enabled until someone chooses to.

@laf laf referenced this pull request Feb 7, 2017

Closed

Long discovery and polling times on Enterasys N7 Switches #4988

4 of 4 tasks complete
@laf

This comment has been minimized.

Show comment
Hide comment
@laf

laf Feb 8, 2017

Member

Ok from one users test output and a back of a fag packet approach:

We take the total time to poll ports, and add 50% on (we do 4 walks and would be adding 2 more).

Test 1. This ran the ports module in 19 seconds without the patch and 9 seconds with. We could expect that then to change to 13.5 seconds so still under the original but not as effective overall. This doesn't however take into account ports that are down and have been down for the last poll.

Test 2. This ran the ports module in 217 seconds without the patch and 40 seconds with. We could expect that then to change to 60 seconds so way under the original still.

Seems like it's worth doing that and tracking the port status, here I've assumed we should track ifOperStatus and ifAdminStatus as both would result in no data. It would mean we wouldn't pick up things like ifType changes but I think that's an acceptable compromise.

Member

laf commented Feb 8, 2017

Ok from one users test output and a back of a fag packet approach:

We take the total time to poll ports, and add 50% on (we do 4 walks and would be adding 2 more).

Test 1. This ran the ports module in 19 seconds without the patch and 9 seconds with. We could expect that then to change to 13.5 seconds so still under the original but not as effective overall. This doesn't however take into account ports that are down and have been down for the last poll.

Test 2. This ran the ports module in 217 seconds without the patch and 40 seconds with. We could expect that then to change to 60 seconds so way under the original still.

Seems like it's worth doing that and tracking the port status, here I've assumed we should track ifOperStatus and ifAdminStatus as both would result in no data. It would mean we wouldn't pick up things like ifType changes but I think that's an acceptable compromise.

@laf

This comment has been minimized.

Show comment
Hide comment
@laf

laf Feb 8, 2017

Member

Ok updated.

Member

laf commented Feb 8, 2017

Ok updated.

@LibreNMS-CI

This comment has been minimized.

Show comment
Hide comment
@LibreNMS-CI

LibreNMS-CI commented Feb 8, 2017

Auto-Deploy finished, Test PR at http://5805.ci.librenms.org or https://5805.ci.librenms.org

@LibreNMS-CI

This comment has been minimized.

Show comment
Hide comment
@LibreNMS-CI

LibreNMS-CI commented Feb 8, 2017

Auto-Deploy finished, Test PR at http://5805.ci.librenms.org or https://5805.ci.librenms.org

@LibreNMS-CI

This comment has been minimized.

Show comment
Hide comment
@LibreNMS-CI

LibreNMS-CI commented Feb 8, 2017

Auto-Deploy finished, Test PR at http://5805.ci.librenms.org or https://5805.ci.librenms.org

@geordish

This comment has been minimized.

Show comment
Hide comment
@geordish

geordish Feb 8, 2017

Contributor

is ifAdmin strictly necessary? If a port is admin up, the Oper state may be up or down. If the port is admin down, the port will be down.

ifAdmin ifOper Port State
Up Up Up
Up Down Down
Down Down Down

The missing truth (right word here?) from that table is impossible.

Seems to be that the Port State will always be reliant on the ifOper state.

Contributor

geordish commented Feb 8, 2017

is ifAdmin strictly necessary? If a port is admin up, the Oper state may be up or down. If the port is admin down, the port will be down.

ifAdmin ifOper Port State
Up Up Up
Up Down Down
Down Down Down

The missing truth (right word here?) from that table is impossible.

Seems to be that the Port State will always be reliant on the ifOper state.

@laf

This comment has been minimized.

Show comment
Hide comment
@laf

laf Feb 8, 2017

Member

You're correct that's it not necessary as if the port is admin down then ifOper would be down (or should be), however having seen many vendors IF-MIB implementation I don't think we should rely on it + it looks cleaner so people know what it's doing rather than having to know what's intended.

Member

laf commented Feb 8, 2017

You're correct that's it not necessary as if the port is admin down then ifOper would be down (or should be), however having seen many vendors IF-MIB implementation I don't think we should rely on it + it looks cleaner so people know what it's doing rather than having to know what's intended.

@philmd

This comment has been minimized.

Show comment
Hide comment
@philmd

philmd Feb 8, 2017

Contributor

poller_perf

tested with Mikrotik RouterOS 6.37.4, working well.

Contributor

philmd commented Feb 8, 2017

poller_perf

tested with Mikrotik RouterOS 6.37.4, working well.

@LibreNMS-CI

This comment has been minimized.

Show comment
Hide comment
@LibreNMS-CI

LibreNMS-CI commented Feb 9, 2017

Auto-Deploy finished, Test PR at http://5805.ci.librenms.org or https://5805.ci.librenms.org

@murrant

This comment has been minimized.

Show comment
Hide comment
@murrant

murrant Feb 9, 2017

Member

Any thoughts on using this automatically when we think it will be faster instead of having a config option?

Or do you want wait till it is better tested?

Member

murrant commented Feb 9, 2017

Any thoughts on using this automatically when we think it will be faster instead of having a config option?

Or do you want wait till it is better tested?

@laf

This comment has been minimized.

Show comment
Hide comment
@laf

laf Feb 9, 2017

Member

How do you propose we auto-enable it?

query ports table for count of total, deleted, ifOperStatus == ifOperStatus_prev and ifAdminStatus == ifAdminStatus_prev and if we have 50% of deleted or down ports turn it on?

Member

laf commented Feb 9, 2017

How do you propose we auto-enable it?

query ports table for count of total, deleted, ifOperStatus == ifOperStatus_prev and ifAdminStatus == ifAdminStatus_prev and if we have 50% of deleted or down ports turn it on?

@murrant

This comment has been minimized.

Show comment
Hide comment
@murrant

murrant Feb 10, 2017

Member

That is basically what I had in my mind. Perhaps we add this as is, then we can collect some performance data and see how that would work later. We can figure out where the performance tipping point roughly is.

Member

murrant commented Feb 10, 2017

That is basically what I had in my mind. Perhaps we add this as is, then we can collect some performance data and see how that would work later. We can figure out where the performance tipping point roughly is.

@laf

This comment has been minimized.

Show comment
Hide comment
@laf

laf Feb 10, 2017

Member

Yeah the issue at the moment is we don't have the stats to show what's good/bad thresholds.

I'm happy with the code, running it at work and 3/4 users have confirmed it works for them.

Member

laf commented Feb 10, 2017

Yeah the issue at the moment is we don't have the stats to show what's good/bad thresholds.

I'm happy with the code, running it at work and 3/4 users have confirmed it works for them.

@LibreNMS-CI

This comment has been minimized.

Show comment
Hide comment
@LibreNMS-CI

LibreNMS-CI commented Feb 13, 2017

Auto-Deploy finished, Test PR at http://5805.ci.librenms.org or https://5805.ci.librenms.org

@scrutinizer-notifier

This comment has been minimized.

Show comment
Hide comment
@scrutinizer-notifier

scrutinizer-notifier Feb 13, 2017

The inspection completed: No new issues

scrutinizer-notifier commented Feb 13, 2017

The inspection completed: No new issues

@murrant murrant merged commit cc51a68 into librenms:master Feb 14, 2017

2 checks passed

Auto-Deploy Build finished.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@laf laf deleted the laf:port-perf branch Feb 14, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment