Regular CPU/lag spikes #5046

Closed
ahgraber opened this issue Jun 14, 2021 · 5 comments
Labels
support Community support

Comments

@ahgraber

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug
I have OPNsense 21.1.6-amd64 running on a Protectli box with a 4-core Celeron J3160 @ 1.60GHz. I've noticed semi-regular spikes (usually around once per minute) of up to 80% CPU usage. These cause poor quality during VoIP/video calls, streaming, and gaming.
At first, I thought the cause might be intrusion detection, but I have disabled that and still see similar behavior.
When I SSH in and look at top, the spikes seem to correspond with python3.7 processes (not always the same PID) running as root.
Based on this post, it appears my problem may be my LAGG/LACP setup between the firewall and switch. Are there known fixes for this?

This has occurred on this version and at least the previous one; I don't clearly recall whether it happened two versions back, and I don't have the capacity to take the internet down for prolonged testing.

To Reproduce

Expected behavior
I don't care if CPU utilization spikes, but I do expect it not to affect service quality.

Describe alternatives you considered

  • I have tried with and without intrusion detection enabled, and it does not change this behavior.
  • It does not appear to depend on the number of devices connected or on what those devices are doing.

Screenshots
image

image

Relevant log files
Happy to provide any logs requested; I don't know of any that would directly be useful.

Environment

Software version used and hardware type
OPNsense 21.1.6-amd64
FreeBSD 12.1-RELEASE-p16-HBSD
OpenSSL 1.1.1k 25 Mar 2021
Intel(R) Celeron(R) CPU J3160 @ 1.60GHz (4 cores)

LAG/LACP connection to TP-Link T1700G-28TQ v3

@fichtner added the support (Community support) label on Jun 14, 2021
@fichtner
Member

Hmm, a userspace process hog is caused by kernel LACP processing? It is a bit far-fetched. ;)

It would help to know which python process this is. The command name isn't helpful at all.

# top -a | cat

After all, it's likely a setup quirk.
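
For anyone who wants the same information grouped per script rather than per PID, here is a minimal sketch. It is not part of OPNsense; the helper names (`label_for`, `cpu_by_command`, `snapshot`) are made up for illustration, and it assumes `ps -axww -o pid,pcpu,args` prints one `PID %CPU COMMAND` line per process, as FreeBSD's ps normally does. Python interpreter processes are labeled by the first `.py` argument so that the bare interpreter name does not hide which script is busy.

```python
import subprocess
from collections import defaultdict


def label_for(args):
    """Return a readable label for a process command line."""
    parts = args.split()
    if not parts:
        return args
    exe = parts[0].rsplit("/", 1)[-1]
    if exe.startswith("python"):
        # Prefer the script path over the bare interpreter name.
        for part in parts[1:]:
            if part.endswith(".py"):
                return part
    return exe


def cpu_by_command(ps_output):
    """Sum %CPU per label from lines shaped like `PID %CPU COMMAND...`."""
    totals = defaultdict(float)
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        _pid, pcpu, args = fields
        try:
            value = float(pcpu)
        except ValueError:
            continue  # header line or malformed row
        totals[label_for(args)] += value
    # Highest consumer first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def snapshot():
    """Take one snapshot via ps (assumes FreeBSD-style ps keywords)."""
    out = subprocess.run(
        ["ps", "-axww", "-o", "pid,pcpu,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    return cpu_by_command(out)
```

Calling `snapshot()[:5]` while a spike is in progress should list the five busiest labels; because PIDs change between runs, grouping by script path makes repeat offenders easier to spot.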

Cheers,
Franco

@ahgraber
Author

Thanks for the quick response.

The primary culprit looks like /usr/local/opnsense/scripts/netflow/flowd_aggregate.py, sometimes compounded by simultaneous /usr/local/opnsense/scripts/filter/update_tables.py runs:
image
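
To check whether the spikes really recur about once per minute, as described in the report, one option is to log CPU samples and measure the gaps between spike starts. The sketch below is only an illustration: the function names, the 50% threshold, and the `(timestamp_seconds, cpu_percent)` sample format are assumptions, not anything OPNsense provides.

```python
from statistics import median


def spike_starts(samples, threshold=50.0):
    """Return timestamps where CPU rises from below to at/above threshold.

    `samples` is an iterable of (timestamp_seconds, cpu_percent) pairs
    in time order.
    """
    starts = []
    above = False
    for ts, cpu in samples:
        if cpu >= threshold and not above:
            starts.append(ts)
        above = cpu >= threshold
    return starts


def typical_interval(starts):
    """Median gap in seconds between consecutive spike starts, or None."""
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return median(gaps) if gaps else None
```

A median gap close to 60 seconds would point toward a job that runs on a fixed schedule rather than toward traffic-driven load.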

@fichtner
Member

So you have Insight running and maybe also large alias URL tables? In both cases a faster CPU can certainly help avoid any bumps.

Cheers,
Franco

@ahgraber
Author

I can probably limit Insight/NetFlow to run only when I have specific problems I'm trying to diagnose.

I don't see a global switch to enable/disable netflow reporting. I'm inferring from the docs that I should just remove all listening interfaces to disable?

TYVM for your help!

@fichtner
Member

Yes, clear "Listening interfaces" and perhaps "Capture local" (ends up as destination entry "127.0.0.1:2056" which can be cleared as well). Close ticket then? :)

Cheers,
Franco
