Netdata notification #2614

Closed
Nirjonadda opened this issue Aug 21, 2017 · 4 comments

@Nirjonadda

I am getting some netdata notifications. Please let me know how to fix them.

  1. ipv4.udperrors > ipv4 udperrors last collected secs = 00:00:08 ago
  2. netfilter.conntrack_sockets > netfilter last collected secs = 00:00:08 ago
  3. ipv4.tcphandshake > ipv4 tcphandshake last collected secs = 00:00:08 ago
  4. disk_space._var_tmp > out of disk space time = 1h
  5. memcached_local.cache > out of cache space time = 7h
  6. disk_backlog.sda > 10min disk backlog = 3422 ms
  7. disk_space._tmp > out of disk space time = 3h
@ktsaou
Member

ktsaou commented Aug 21, 2017

Hi,

The first 3 probably mean that your system "freezes" for some time. netdata tries to read files from /proc and the kernel does not respond, so the metrics are not collected, which in turn triggers the alarms. Since netdata runs with the idle process scheduling priority, this could also happen if your system is extremely busy for a long time (making netdata run in "slow motion"). In most cases, though, the system is frozen during that time.

4 and 7 mean that your /var/tmp and /tmp are probably too small for your needs. If you are sure they are OK, you can disable these alarms for those disks by adding families: !/var/tmp !/tmp * to the alarm definitions (the alarm notifications state the source of each alarm).
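To illustrate how a families list such as the one above selects disks, here is a minimal Python sketch. It assumes netdata's simple-pattern behaviour: patterns are space-separated, evaluated left to right, the first one that matches decides, and a leading ! makes the match negative. The function name family_matches is hypothetical, and fnmatchcase is only a stand-in for netdata's own matcher (it also treats ? and [ ] as special, which netdata may not).

```python
from fnmatch import fnmatchcase


def family_matches(pattern_list: str, family: str) -> bool:
    """Return True if an alarm with this families list would apply to `family`.

    Assumption: first matching pattern wins; '!' marks a negative match.
    """
    for pattern in pattern_list.split():
        negative = pattern.startswith("!")
        glob = pattern[1:] if negative else pattern
        if fnmatchcase(family, glob):
            return not negative
    # Nothing matched at all: the alarm does not apply.
    return False


families = "!/var/tmp !/tmp *"
for mount in ("/var/tmp", "/tmp", "/", "/home"):
    print(mount, family_matches(families, mount))
```

Under these assumptions the order matters: the negative patterns must come before the trailing *, otherwise * would match first and the exclusions would never be reached.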

For 5, your memcached will start losing data in 7 hours. It may or may not be important in your case; it depends on your data. If it is not important to you, remove this alarm.

For 6, your disk is slowing down your system when this happens. If you don't care, just remove the alarm.

In general, I am trying to ship netdata with a lot of alarms. The notifications sent by netdata point to the file and line where each alarm is configured. So, if there are alarms you don't care about, just comment them out or increase the thresholds to avoid false positives.

@Nirjonadda
Author

@ktsaou How can we fix this netdata alarm?

system - softnet_stat
number of times, during the last 10min, ksoftirq ran out of sysctl net.core.netdev_budget or time slice, with work remaining (this can be a cause for dropped packets)
warning when $this > (($status >= $WARNING) ? (0) : (10))
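For readers unfamiliar with this kind of expression, the following Python sketch shows one way to read the warning condition above: the threshold depends on the alarm's current status, so the alarm is raised only above 10 but, once raised, stays raised until the value is back to 0. The numeric status values and the function names are assumptions made for illustration and are not taken from netdata's source.

```python
# Assumed ordering of alarm statuses (hypothetical values for illustration).
CLEAR, WARNING, CRITICAL = 0, 1, 2


def softnet_warning(this: float, status: int) -> bool:
    """Mirror of: $this > (($status >= $WARNING) ? (0) : (10))."""
    threshold = 0 if status >= WARNING else 10
    return this > threshold


def replay(samples):
    """Feed successive values through the condition, tracking the status."""
    status = CLEAR
    history = []
    for value in samples:
        status = WARNING if softnet_warning(value, status) else CLEAR
        history.append(status)
    return history


# 3 stays clear, 12 raises the warning, 5 and 1 keep it raised,
# 0 clears it, and 4 is not enough to raise it again.
print(replay([3, 12, 5, 1, 0, 4]))  # [0, 1, 1, 1, 0, 0]
```

Raising the 10 in the original expression is the kind of threshold increase that would make the alarm less sensitive, at the cost of missing smaller bursts.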

@ktsaou
Member

ktsaou commented Aug 23, 2017

@Nirjonadda I suggest working like this:

  1. Try to understand what the alarm is by searching online. Also examine the charts linked with the alarms (the dashboard has relevant links in many cases). There may also be other issues in this repo with enough information (the last one you posted has been discussed extensively here).

  2. If you cannot find enough information, please open an issue that "alarm X is misleading". The outcome of this issue will be to update netdata by changing the wording on the alarm and charts and possibly providing links to related documentation.

  3. If you searched it enough and concluded that an alarm is wrong, please open an issue that "alarm X is wrong". The outcome of this issue will be to fix the alarm in netdata.

In general, I help everyone with any imaginable issue, while trying to make netdata better. In this process, and while trying to figure out if netdata has a fault or if netdata can be better in some area, I am helping out users. You see, my only goal is to make netdata better.

Please don't misunderstand me. I need input. I love feedback. I love to get issues and work on them. But please focus on making netdata better. Otherwise, I lose my motivation....

@Nirjonadda
Author

@ktsaou I searched on Google and found the same issue reported for the alarm "system.softnet_stat":

#1076

@ktsaou ktsaou closed this as completed Sep 16, 2017