@firehol-automation firehol-automation released this Mar 27, 2018 · 796 commits to master since this release

New to netdata? Check its demo: https://my-netdata.io


Posted on twitter, facebook, reddit r/linux.


Hi all,

Another great netdata release: netdata v1.10.0 !

This is a birthday release: netdata is now 2 years old !

Many thanks to all the contributors who help build, enhance and improve a project that is useful and helpful for thousands of admins, devops and developers around the world! You rock!

- @ktsaou

At a glance

netdata now has a new web server (called static) with a fixed number of threads, providing a lot better performance and finer control of the resources allocated to it.

All dashboard elements (javascript) have been updated to their latest versions - this allows a smoother experience when embedding netdata charts on third party web sites and apps.


IMPORTANT: all users using older netdata are advised to update to this version. This version offers improved stability, security and a huge number of bug fixes, compared to any prior version of netdata.


new plugins

  • BTRFS - monitor the allocations of BTRFS filesystems (yes, netdata can now properly detect when btrfs is going out of space)
  • BCACHE - monitor the caching block layer that allows building hybrid disks using normal HDDs and SSDs
  • Ceph - monitor ceph distributed storage
  • nginx plus - monitor the nginx+ web servers
  • libreswan - monitor IPSEC tunnels
  • Traefik - monitor traefik reverse proxies
  • icecast - monitor icecast streaming servers
  • ntpd - monitor NTP servers
  • httpcheck - monitor any remote web server
  • portcheck - monitor any remote TCP port
  • spring-boot - monitor java spring boot applications
  • dnsdist - monitor dnsdist name servers
  • hugepages - monitor the allocation of Linux hugepages

enhanced / improved plugins

  • statsd
  • web_log
  • containers monitoring
  • system memory
  • diskspace
  • network interfaces
  • postgres
  • rabbitmq
  • apps.plugin
  • haproxy
  • uptime
  • ksm
  • mdstat
  • elasticsearch
  • apcupsd
  • isc-dhcpd
  • fronius
  • stiebeleltron

new alarm notifications methods

  • alerta
  • IRC

And as always, hundreds more enhancements, improvements and bugfixes.


BTRFS monitoring

BTRFS space usage monitoring and related alarms.

netdata is able to detect if any of the space-related components (physical disk allocation, data, metadata and system) of BTRFS is about to become exhausted!

#3150 - thanks to @Ferroin for explaining everything about btrfs...

bcache monitoring

netdata now monitors bcache metrics - they are automatically added to any disk that is found to be a bcache disk.

ceph monitoring

New plugin to monitor ceph, the unified, distributed storage system designed for excellent performance, reliability and scalability (#3166 @lets00).

containers and VMs monitoring

  • netdata now monitors systemd-nspawn containers.
  • netdata now renames charts of kubernetes containers.
  • virsh is now called with -r to avoid prompting for password #3144
  • cgroup-network is now a lot more strict, preventing unauthorized privilege escalation #3269
  • cgroup-network now searches for container processes in sub-cgroups too - this improves the mapping of network interfaces to containers
  • cgroup-network now works even when there are no veth interfaces in the system

monitor ntpd

netdata can now monitor isc-ntpd. @rda0 did a marvelous job decoding NTP Control Message Protocol, collecting ntpd metrics in the most efficient way #3421, #3454 @rda0

btw, netdata also monitors chrony but the chrony module of netdata is disabled by default, because certain CentOS versions ship a version of chrony that consumes 100% cpu when queried for statistics.

nginx plus web servers monitoring

Added python plugin to monitor the operation of nginx plus servers. The plugin monitors everything about nginx+, except streaming #3312 @l2isbad

libreswan IPSEC tunnels monitoring

netdata now monitors libreswan tunnels - #3204

remote HTTP/HTTPS server monitoring

netdata now has an httpcheck plugin (module of python.d.plugin), that can query remote http/https servers, track the response timings and check that the response body contains certain text #3448 @ccremer .
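
The kind of check described above can be sketched in Python as follows. This is only an illustration and not the module's code: the function names evaluate_response and http_check and the timeout default are assumptions made for this example.

```python
import time
import urllib.request


def evaluate_response(status, body, must_contain=None):
    """Return True when the status is 2xx and the body holds the expected text."""
    if not 200 <= status < 300:
        return False
    return must_contain is None or must_contain in body


def http_check(url, must_contain=None, timeout=5.0):
    """Fetch url once and return (ok, elapsed_seconds). Hypothetical helper."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            ok = evaluate_response(resp.status, body, must_contain)
    except OSError:
        # URLError and socket timeouts are OSError subclasses
        ok = False
    return ok, time.monotonic() - started
```

Keeping the evaluation separate from the fetch makes the pass/fail rule easy to test without a network.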

remote TCP port monitoring

netdata now has a portcheck plugin (module of python.d.plugin) that can check whether any remote TCP port is open #3447 @ccremer
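
A TCP port check of this kind boils down to attempting a connection with a timeout. A minimal sketch, assuming a helper name (port_is_open) and a timeout default that are not taken from the module itself:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, unreachable and timed-out connections all end up here
        return False
```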

icecast streaming server monitoring

netdata now monitors icecast servers #3511 @l2isbad.

traefik reverse proxy monitoring

netdata now monitors traefik reverse proxies - #3557.

spring-boot monitoring

netdata can now monitor java spring-boot applications @Wing924

dnsdist

netdata now monitors dnsdist name servers - @nobody-nobody #3009

statsd

  • statsd dimensions now support the options that external plugin dimensions support (currently the only usable option is hidden, which adds the dimension but hides it on the dashboard - a hidden dimension can still participate in various calculations, including alarms).
  • statsd now reports the CPU usage of its threads at the netdata section.
  • statsd metrics are logged to access.log the first time they are encountered.
  • statsd metrics now accept the special value zinit, which allows them to be initialized without altering their values (this is useful if you have rare metrics that you need to initialize when netdata starts).
  • statsd over TCP is now a lot faster - netdata can process up to 3.5mil statsd metrics / second using just one core. Added options to control the timeouts of TCP statsd connections.
  • fixed the title and context of statsd private charts
  • statsd private charts can now be hidden from the dashboard #3467

postgres

Several new charts have been added to monitor (#3400 by @anayrat):

  1. checkpointer charts
  2. bgwriter charts
  3. autovacuum charts
  4. replication delta charts
  5. WAL archive charts
  6. WAL charts
  7. temporary files charts

The postgres plugin now also works when postgres is in recovery mode.

rabbitmq

  • added Erlang run queue chart. This is useful in conjunction with the existing Erlang processes chart to get a better overall idea of what's going on in the Erlang VM. @arch273
  • added rabbitmq information on the dashboard to complement the charts.

apps.plugin

Prior to this version, netdata detected the user and group of processes by examining the ownership of /proc/PID/stat. Unfortunately, it seems that the ownership of files in /proc does not change when the process switches user. So, netdata could not detect the user and group of processes that started as root and then switched to another user.

Now netdata reads /proc/PID/status:

  • process ownership information is now accurate
  • eliminated the need to read /proc/PID/statm (all the information of /proc/PID/statm is available in /proc/PID/status)
  • allowed netdata to read VmSwap, so a new chart has been added to monitor the swap memory usage per process, user and group.
  • fixed issue with unreasonable spikes on processes cpu on FreeBSD (there was a typo) #3245
  • fixed issue with errors reported on FreeBSD about pid 0 #3099

The new plugin is 20% more expensive in terms of CPU. We tried hard to optimize it, but this is as good as it can get. Read about it at #3434 and #3436
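
To show what /proc/PID/status offers, here is a small Python sketch that pulls the uid, gid and VmSwap out of it. It is not how apps.plugin is written (that is C); the function names are hypothetical, and taking the first Uid/Gid field (the real id) is an assumption of this example.

```python
def parse_proc_status(text):
    """Extract real uid/gid and VmSwap (in kB) from /proc/PID/status content."""
    info = {"uid": None, "gid": None, "vmswap_kb": 0}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if key == "Uid" and fields:
            info["uid"] = int(fields[0])  # first field is the real uid
        elif key == "Gid" and fields:
            info["gid"] = int(fields[0])  # first field is the real gid
        elif key == "VmSwap" and fields:
            info["vmswap_kb"] = int(fields[0])  # kernel reports it in kB
    return info


def read_proc_status(pid):
    """Linux only: parse the status file of a running process."""
    with open(f"/proc/{pid}/status") as f:
        return parse_proc_status(f.read())
```

Kernel threads have no VmSwap line, so the sketch reports 0 for them.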

haproxy

Added charts:

  • hrsp_1xx, hrsp_2xx, hrsp_3xx, hrsp_4xx, hrsp_5xx, hrsp_other, hrsp_total for backends and frontends
  • qtime, ctime, rtime, ttime metrics for backend servers
  • backend servers In UP state

@ktarasz

uptime

netdata now uses /proc/uptime when CLOCK_BOOTTIME does not report the same uptime. In containers CLOCK_BOOTTIME reports the uptime of the host, while /proc/uptime reports the uptime of the container, so now netdata correctly reports the uptime of the container.
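
The selection logic can be sketched in Python like this. The one second tolerance and the function names are assumptions for the example; netdata itself does this in C.

```python
import time


def pick_uptime(boottime_secs, proc_uptime_secs, tolerance=1.0):
    """Prefer CLOCK_BOOTTIME; fall back to /proc/uptime when the two disagree."""
    if abs(boottime_secs - proc_uptime_secs) <= tolerance:
        return boottime_secs
    return proc_uptime_secs


def current_uptime():
    """Linux only: read both sources and pick one."""
    boottime = time.clock_gettime(time.CLOCK_BOOTTIME)
    with open("/proc/uptime") as f:
        proc_uptime = float(f.read().split()[0])
    return pick_uptime(boottime, proc_uptime)
```

On a plain host both values agree; in a container they differ, so the /proc/uptime value wins.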

mdstat

various fixes to better monitor rebuild time and rate @l2isbad

KSM

  • removed to_scan dimension
  • the savings % reported by netdata was less than the actual - fixed it.

elasticsearch

Added several charts for translog / indices segments statistics and JVM buffer pool utilization, which are often helpful when evaluating an elasticsearch node health #3544 @NeonSludge

memory monitoring

  • treat slab memory as cached #3288 @amichelic
  • added a new chart for monitoring the memory available for use, before hitting swap
  • netdata now monitors Linux hugepages and transparent hugepages
  • added hugepages monitoring #3462

diskspace monitoring

  • support huge amounts of mountpoints #3258 - netdata was crashing with stack overflow due to recursion - now it is a loop, so any number of mount points is supported

network monitoring

  • moved tcp passive and active opens to a separate chart, to allow the TCP issues dimensions scale better by default #3238
  • updated the information presented on TCP charts to match the latest v4.15 kernel source #3239

APC UPS

netdata now supports monitoring multiple APC UPSes.

ISC DHCPd

netdata now also supports monitoring IPv6 leases - @l2isbad

fronius

stiebeleltron

web_log

Added web server response timings histogram #3558 @Wing924 .

python.d.plugin

  • python.d.plugin can now start even if /etc/netdata/python.d.conf is missing @l2isbad
  • python.d.plugin now has an internal run counter @l2isbad
  • the unicode decoding of the plugin has been fixed (#3406) @l2isbad
  • the plugin now does not validate self-signed certificates @l2isbad
  • the plugin can now revive obsolete charts @l2isbad

charts.d.plugin

charts.d.plugin BASH modules can now have custom number of retries in case of data collection failures #3524.

web server

  • netdata now has a new internal web server that supports a fixed number of threads - we call it static web server. This web server allows netdata to limit memory fragmentation (since the threads are fixed, the underlying memory allocators reuse the same memory arenas) and to control cpu utilization (we can control the number of threads that will be used by netdata). This is the default now. #3248
  • now the static threads web server reports the CPU usage of each of its threads.
  • the HTTP response headers now include the netdata version

dashboard

  • the print button now respects the URL path netdata is hosted at.

  • dygraphs updated to the latest version - this fixes an issue that prevented netdata charts from being interactive under certain conditions

  • added dygraph theme logscale #3283

  • fontawesome updated to version 5

  • d3 updated to the latest version (this broke c3 charts that require an older version)

  • added d3pie charts

  • custom dashboards can now have alarms for specific roles (all, none, one or more).

  • allow stacked charts to zoom vertically when dimensions are selected

  • netdata now has a global XSS protection #3363

  • netdata now uses intersectionObserver when available #3280 - this improves the scrolling performance of the dashboard.

  • prevent date, time and units from wrapping at the charts legends #3286

  • various units scaling improvements #3285

  • added data-common-colors="NAME" chart option for custom dashboards #3282.

  • added wiki page for creating custom dashboards on Atlassian's Confluence.

  • prevented a double click on the charts' toolbox to select the text of the buttons.

  • fixed the alignment of dashboard icons #3224 @xPaw

  • added a simple js, called refresh-badges.js, to update badges on a custom web page

badges

netdata badges can now be scaled #3474

API

  • added gtime parameter, for group time. This is used to ask netdata to return values at a different rate (e.g. gtime=60 on an X/sec dimension will return X/min).
  • fixed a rounding bug in JSON generation #3309
  • the dimensions= parameter now supports simple patterns #3170 and added option values match-ids and match-names to control which matches are executed for dimensions.
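
A short sketch of how a client might use gtime. Only the gtime and dimensions parameters come from the notes above; the /api/v1/data path, the chart parameter, port 19999 and both function names are assumptions of this example.

```python
from urllib.parse import urlencode


def data_url(base, chart, gtime=None, dimensions=None):
    """Build a data query URL. The /api/v1/data path and chart= are assumptions."""
    params = {"chart": chart}
    if dimensions:
        params["dimensions"] = dimensions
    if gtime:
        params["gtime"] = gtime
    return f"{base}/api/v1/data?{urlencode(params)}"


def rescale(rate_per_second, gtime):
    """What gtime does conceptually: re-express a per-second rate over gtime seconds."""
    return rate_per_second * gtime
```

So a dimension reporting 2.5 requests/s would be presented as 150 requests/min with gtime=60.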

alarms

  • system.swap alarms now send notifications with a 30 seconds delay, to work-around a kernel bug that incorrectly reports all swap as instantly used under containers #3380.

  • added alarm to predict the time a mount point will run out of inodes #3566.

  • all system alarms are now ported to FreeBSD too #3337 @arch273

  • added alerta.io notifications @kattunga

  • added available memory alarm

  • removed unsupported html tags from hipchat notifications.

  • pagerduty notifications have been modified to avoid incident duplication #3549.

  • alarm definitions can now use both chart IDs and chart names (prior to this version only chart IDs were allowed).

  • curl options (eg for disabling SSL certificates verification) for alarm-notify.sh can now be defined in health_alarm_notify.conf.

  • netdata can now send notifications to IRC channels #3458 @manosf


backends

  • on netdata masters, allow filtering the hosts that will be sent to backends with send hosts matching = * pattern.
  • improved connection error handling and added retries to allow netdata to connect to certain backends that failed with EALREADY or EINPROGRESS.
  • json backends now receive host tags (the tags have to be formatted in a json friendly way) #3556.
  • re-worked the alarm that triggers when backend data are lost, to avoid flip-flops.

prometheus backends

  • added URL option timestamps=yes|no to /api/v1/allmetrics to support prometheus Pushgateway #3533
  • added netdata_info variable with the version of netdata
  • renamed netdata_host_tags to netdata_host_tags_info (the old exists but is deprecated and will be removed eventually)
  • when prometheus uses average metrics, netdata remembers the last time prometheus collected metrics, on a per host basis.

metrics streaming between netdata

  • netdata masters and proxies now expose the version of the netdata collecting the metrics, not their own. So, now a netdata master shows on the dashboard and sends to backends the version of the netdata collecting the metrics #3538.
  • added stream.conf option multiple connections = accept | deny to allow or deny multiple connections for the same netdata host. The default remains accept, but it is likely to be changed to deny in future versions.

packaging

  • added docker hub builds for aarch64/arm64 @justin8
  • updated debian containers to use stretch @justin8
  • added FreeBSD init file
  • various installers fixes and improvements (make sure netdata is started, do not give information about features not supported on each operating system, allow non-root installations without errors, etc.)
  • various installer fixes for FreeBSD and MacOS
  • netdata-updater was growing the PATH variable on each of its runs - fixed it.
  • added --accept and --dont-start-it command line options to kickstart-static64.sh
  • netdata can be compiled without long double support (useful in embedded devices that don't support long double numbers) #3354
  • fixed netdata.spec to allow building netdata on older and newer rpm based distros. Also added a script to build a netdata rpm
  • the static netdata installer now tries to find the location of the SSL ca-certificates on a system and properly configures the static curl it provides with this path.
  • the netdata updater starts netdata only if it was running
  • added alpine dockerfile

other

  • added global option gap when lost iterations to control the number of iterations that should be lost to show a gap on the charts.
  • various fixes/improvements related to netdata logs - the main change is that now netdata logs the thread name that logged the message, providing helpful insights about the thread that complained.
  • re-worked the exit procedure of netdata to allow it to clean up properly - sometimes netdata deadlocked during exit, waiting forever - now netdata always exits promptly #3184
  • fixed compilation on ancient gcc versions
  • netdata was always setting itself to the idle process scheduling priority, even when it was configured to do otherwise. Fixed it #3523

@firehol-automation firehol-automation released this Dec 16, 2017 · 1510 commits to master since this release


Overview of netdata v1.9

  1. snapshots
    We can now save and load dashboard snapshots for any timeframe in any resolution. snapshots allow us to save artifacts, evidence, documentation of incidents, or just the raw data for postmortem analysis.

  2. highlighted time-frame
    We can now highlight a selected time-frame on all dashboard charts. So, to quickly compare charts press ALT or CONTROL and select an area on one chart. The same area will be highlighted on all charts.

  3. export to PDF
    We can now export netdata dashboards to PDF, for any timeframe with any detail.

  4. access lists (IP filtering)
    We can now set up IP filtering in netdata.conf for all functions of netdata (dashboard access, streaming, registry, badges, etc - no more iptables rules for protecting netdata).

  5. TCP overflows and connection drops
    netdata can now detect TCP listening sockets overflows and connection drops, for any server running on the host (even the ones netdata is not aware of).

  6. libvirt VMs
    netdata now detects libvirt network interfaces and moves them to VM section of the dashboard (it also supports .libvirt-qemu naming of cgroups).

  7. Units auto-scaling
    netdata dashboards can now scale units (KB -> MB -> GB -> TB, etc), on the fly.

  8. Units conversions
    netdata dashboards can now convert units (eg. Celsius to Fahrenheit, seconds to HH:MM:SS, etc), on the fly.

  9. Multiple Timezones
    netdata dashboards can now change timezone on the fly (yes, we can now compare charts with server logs).

  10. python.d.plugin rewritten
    @l2isbad rewrote the whole of it, to add flexibility and support the latest netdata features! The new plugin supports the old python modules.

  11. better / faster dashboard scrolling
    netdata now uses passive event listeners to detect page scrolling. This improved significantly the responsiveness of the dashboard (check your dashboard settings: sync scrolling is the fastest, async is closer to the older behavior).

  12. netdata now monitors couchdb, powerdns, beanstalkd and dnsdist !

  13. netdata now detects redis background save failures

  14. netdata can now send flock.com and kavenegar.com alarm notifications

and as always... dozens more improvements, enhancements, new features and bug fixes!


netdata dashboard snapshots !

Netdata can now export and import dashboard snapshots.

Snapshots are JSON files containing everything the dashboard needs to be rendered: charts and chart data.

They are exported as JSON files, to your computer. The saved snapshots can be loaded back on any netdata dashboard (even of a different host). When importing, no network traffic is generated. The web browser loads the local file and renders an interactive dashboard to examine it.

The current visible timeframe of the dashboard is respected, so first align the dashboard to the timeframe required and then click "Export". The pop-up allows selecting the resolution of the export (its detail).


highlighted time-frame !

Press the ALT or CONTROL key and select a time-frame at a chart. An overlay will appear with the selected time-frame and all the charts will highlight the same region.

The highlighted time-frame:

  1. Is added to the URL hash, so that reloading the page keeps it
  2. Is propagated to other netdata servers, via the my-netdata menu
  3. Is saved in dashboard snapshots (and of course restored when they are loaded back)

Also, netdata charts can now be zoomed vertically (use the SHIFT key, like in zoom, but select the chart vertically).


netdata dashboards to PDF !

netdata dashboards can now be printed to PDF. Just click the 🖨 icon on the dashboard.

The current visible timeframe of the dashboard is respected, so first align the dashboard to the timeframe required and then click "Print".


netdata now supports API access lists (IP filtering)

netdata can now check the client IPs connecting to it and deny/allow access based on your settings. No more iptables rules to control access to netdata.

All these settings are netdata simple patterns that are checked against the client IP (string matching - not subnet matching). localhost clients (IPv4, IPv6 and unix domain sockets) can be matched with localhost:

Global access control

  • [web].allow connections from to match the clients' IPs allowed to connect to netdata. This has the same effect as iptables (but implemented at the application level - so clients will get connected, and then disconnected immediately if they are not allowed access, without any response from netdata).

Dashboard access control

  • netdata.conf: [web].allow dashboard from to match the clients' IPs that are allowed to access the dashboard (ie fetch static files and query netdata API).
  • netdata.conf: [web].allow badges from to match the clients' IPs that are allowed to access badges (the dashboard clients are allowed to access badges too, so this setting allows badges to clients that do not have access to the dashboard).

Streaming access control

  • netdata.conf: [web].allow streaming from to match the clients' IPs that are allowed to stream metrics.
  • stream.conf: [API_KEY].allow from to match the clients' IPs allowed to push metrics for the given API KEY.
  • stream.conf: [MACHINE_GUID].allow from to match the clients' IPs allowed to push metrics for the specific machine.

netdata will also check the API keys supplied by slaves and proxies connected.

Other access lists

  • netdata.conf: [web].allow netdata.conf from to limit the clients that can get netdata.conf - by default netdata allows only private IPs.
  • netdata.conf: [registry].allow from to limit the clients allowed to access the registry (only when this netdata acts as a registry).
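
To make the string matching of these access lists concrete, here is a Python sketch of a simple pattern matcher. The rules it assumes (space-separated words, * as wildcard, a leading ! for a negative match, first matching word wins) are our understanding of netdata simple patterns and are not spelled out in these notes; the function name is hypothetical.

```python
import re


def simple_pattern_match(patterns, value):
    """Match value against space-separated patterns; first matching word wins.

    '*' matches any run of characters; a leading '!' makes the word negative.
    """
    for word in patterns.split():
        negative = word.startswith("!")
        if negative:
            word = word[1:]
        # escape everything except '*', which becomes "any characters"
        regex = ".*".join(re.escape(part) for part in word.split("*"))
        if re.fullmatch(regex, value):
            return not negative
    return False
```

Because this is string matching and not subnet matching, 10.* allows 10.1.2.3 but a CIDR like 10.0.0.0/8 would never match a client IP.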

netdata detects TCP listening sockets overflowing or dropping connections

Added a new chart: ipv4.tcplistenissues with dimensions ListenOverflows and ListenDrops.

This chart detects if any listening TCP socket on the host overflows or drops connections. This is system-wide: any listening TCP socket, of any application.

The chart will not be shown if these kernel counters are zero. It will be enabled automatically if it is found non-zero at any point (it is collected via /proc/net/netstat every second). If you need to enable it even if it is zero, edit netdata.conf and set:

[plugin:proc:/proc/net/netstat]
	TCP listen issues = yes

Two alarms have been added, one for ListenOverflows and one for ListenDrops that detect if there is any overflow or drop in the last minute (they run every 10 seconds).
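
For readers who want to look at these counters by hand, the sketch below parses the TcpExt section of /proc/net/netstat and computes how much the two counters grew between two samples. The function names are hypothetical and this is not the alarm implementation, just the underlying arithmetic.

```python
def parse_netstat(text, section="TcpExt"):
    """Parse the header-line/value-line pairs of /proc/net/netstat for one section."""
    rows = [line.split() for line in text.splitlines()
            if line.startswith(section + ":")]
    counters = {}
    # rows come in pairs: counter names first, then their values
    for names, values in zip(rows[0::2], rows[1::2]):
        counters.update(zip(names[1:], map(int, values[1:])))
    return counters


def listen_issues(previous, current):
    """Counters only grow, so the difference of two samples is the recent activity."""
    return {key: current[key] - previous[key]
            for key in ("ListenOverflows", "ListenDrops")}
```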

The alarms will automatically be attached when the chart is active.

The overflows dimension and alarm are supported on FreeBSD too.

/proc/net/sockstat and /proc/net/sockstat6

These files provide sockets statistics for all protocols.

netdata also adds 3 new alarms:

  1. too many tcp orphan sockets
  2. tcp memory that detects that the tcp stack is under memory pressure or close to giving memory errors
  3. too many tcp connections (for kernels that do not support dynamic allocation of connections)

Streaming

  • netdata proxies with more than 100 slaves had a timing issue that caused them to crash randomly on slave reconnects. Parts of the code have been rewritten to get rid of the timing issue.

  • netdata slaves and proxies now have a protection that ensures they will never use 100% CPU, even if the master is misbehaving.

  • expired orphaned hosts are now removed from the my-netdata menu of the dashboard.

  • streaming functions can now be monitored via access.log

  • streaming now supports IP filtering. So the entire streaming functionality, API keys and MACHINE GUIDs can be associated with one or more IPs or IP patterns.

  • streaming now transfers alarm variables too


python.d.plugin rewritten

@l2isbad did a marvelous job rewriting python.d.plugin. The new plugin:

  1. supports option autodetection_retry: SECONDS. When set to non-zero, the plugin will re-check the module every SECONDS seconds. This solves the problem that netdata did not keep trying to collect metrics from an application that was not running when netdata started. The default is zero for all modules, so you need to enable it for each application that needs it.

  2. got a rewrite of several functions, like logging, module configuration, chart and dimensions management.

  3. the new URL service disables certificate checks by default, to allow self-signed certificates to work without configuration.

The new plugin is compatible with custom python modules developed for the previous version.
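
The behaviour of autodetection_retry can be pictured with the following sketch. It is not the plugin's code: wait_for_module, the check callable and max_attempts are hypothetical names introduced here, and the real plugin keeps running other modules while it waits.

```python
import time


def wait_for_module(check, autodetection_retry, max_attempts=None, sleep=time.sleep):
    """Call check() until it succeeds, pausing autodetection_retry seconds between tries.

    With autodetection_retry == 0 the check runs only once, mirroring the default.
    """
    attempts = 0
    while True:
        attempts += 1
        if check():
            return True
        if autodetection_retry <= 0:
            return False
        if max_attempts is not None and attempts >= max_attempts:
            return False
        sleep(autodetection_retry)
```

With autodetection_retry set to 30, an application started a minute after netdata would be picked up on the third check.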


web_log plugin

  • custom regex now supports parsing hostnames and IPs @l2isbad

  • web_log now parses lines with error 408 (request timeout - these are a special case, since the request has not been received by the web server, so the log line is incomplete) @l2isbad

  • now properly parses resp_length with value "-" @racciari


couchdb monitoring

CouchDB maintainer @wohali, submitted a couchdb plugin for netdata. The plugin monitors:

  • database activity
  • http response codes
  • server operations
  • per DB statistics


redis monitoring

2 charts have been added to monitor background save health status, bundled with 2 alarms that detect if background save has failed, or background save is slow (warn > 10 mins, crit > 20min). @l2isbad


Other new and enhanced plugins

  • netdata now monitors PowerDNS, @l2isbad

  • netdata now monitors beanstalkd, @l2isbad

  • netdata now monitors dnsdist, @nobody-nobody

  • disks under Linux are renamed using /dev/disk/by-label. An option has been added at netdata.conf to also allow renaming based on /dev/disk/by-id.

  • chrony is now disabled by default, because there have been reports that chronyc enters an infinite loop in CentOS and RHEL.

  • tomcat improvements to support flavors of the tomcat server @Wing924

  • zfs on FreeBSD now monitors ZFS TRIM statistics

  • disks monitoring charts on FreeBSD got a lot more FreeBSD related dimensions.

  • added CPU frequency charts on FreeBSD (Linux already had them).

  • chart system.io (the total system Disk I/O) is now calculated by aggregating the reads and writes of all physical disks. The previous system.io chart (that is based on pgpgin and pgpgout from /proc/vmstat) is now named system.pgpgio. The key difference is that the new system.io now sees ZFS I/O, and it also correctly and accurately sums the real disk bandwidth of RAID arrays.

  • chart system.net (the total system network bandwidth) is now calculated by aggregating the bandwidth of all physical network interfaces and is common for both IPv4 and IPv6.

  • tc (QoS) charts now sort the dimensions on the legends, the same way tc reports them.

  • postgres: in versions before 10 the WAL directory was named pg_xlog; from 10 upwards it is named pg_wal @facetoe

  • mysql (and mariadb) got new charts for galera replication @spinitron

  • openvpn_log improvements @l2isbad

  • smartd improvements @l2isbad

  • varnish module has been rewritten @l2isbad

  • mdstat regex fix @l2isbad

  • smartd_log improvements @l2isbad

  • dns_query_time improvements @wungad

  • isc_dhcpd improvements @wungad

  • freeipmi.plugin got a command line option (can be given at netdata.conf) to ignore certain sensor IDs that are faulty.

  • freeradius improvements @wungad

  • node.d.plugin bugfixes

Plugins protocol enhancements

  • netdata now supports multiple plugin directories. The setting is the same in netdata.conf, plugins directory = "DIRECTORY1" "DIRECTORY2" ..., up to 20 directories. By default netdata sets:
[global]
      plugins directory = "/usr/libexec/netdata/plugins.d" "/etc/netdata/custom-plugins.d"
  • netdata now supports alarms variables.

    Each plugin can now define host global and chart local variables with static values, that can be used in alarms' expressions. So, hosts and charts can now have any number of static values associated with them (eg. an application server may expose its max connections limit), and these static values can be used to trigger alarms (eg. the current connections, is compared to the max connections variable). The whole setup allows alarm templates to use this feature (eg each netdata can maintain different such variables for each server it monitors).

    Alarm variables are propagated to upstream netdata servers.


O/S - distro support

  • added init file for SLC 6.9 and CloudLinux Server release 6.9

  • packages installer was incorrectly detecting all python versions as version 2.

  • a makeself bug that prevented the static netdata binaries from being installed on busybox systems, has been fixed.

  • openrc startup script (gentoo, alpine) had hardcoded the path to netdata. This affected all static-64bit builds when installed on these distros. Fixed.

  • the static 64bit installer now downloads netdata.conf, much like the git installer does.

  • openrc / gentoo init improvements @candrews

  • enabled support for macOS versions 10.5+ (10.11 was working already) @vlvkobal

  • enabled support for FreeBSD 12 @vlvkobal

  • fixed a crash on macOS hosts with empty disk names.

  • added Dockerfile.armv7hf for running netdata under docker on ARM v7 machines @justin8


Dashboard improvements

  • hover selection of charts is now faster on all browsers. Perfect on Chrome, Firefox and Opera. Quite usable on Edge.

  • the dashboard is now fixed when a modal is open, preventing scrolling the page.

  • the dashboard now uses fontawesome 5.0.1 for icons.

  • the chart names can now be searched with browser control-F (find in page). netdata lazy-loads all charts, so it was impossible to search for a chart. Now the charts are searchable. This is important on dashboards with several hundreds of statsd charts, because all these charts appear under the same section.

  • netdata now detects libvirt VM network interfaces and moves them to the VM section of the dashboard. The same functionality already exists for containers.

  • Show the context of each chart. The context is used in alarm templates. (hover on the date of the chart)

  • Show the resolution of the chart. (hover on the time of the chart)

  • The dashboard now adds a tooltip at the date of the charts, to show the plugin and its module that collects each chart.

  • The dashboard should now put a lot less CPU pressure on the browser when the page does not have focus.

automatic units scaling

The dashboard does dynamic units scaling, on the fly ! It converts:

  • network bandwidth (kilobits/s to megabits/s or gigabits/s)
  • input/output bandwidth (kilobytes/s to megabytes/s or gigabytes/s, similarly for KB/s)
  • memory sizes (MB to KB, GB or TB)
  • disk sizes (GB to MB or TB)

Chart units dynamically adapt based on the value of the selected dimension too:

peek 2017-10-06 22-58

Custom dashboards can give data-desired-units="UNITS" and netdata will automatically convert the presented values to the desired units. UNITS can be any of the supported ones, or auto for auto-scaling based on the values, or original to show the original units maintained by the netdata server.
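
As a rough illustration of the auto-scaling idea described in this section (not the dashboard's actual code, which runs in javascript in the browser), the shell sketch below picks a unit for a bandwidth value given in kilobits/s. The function name scale_kbps, the factor of 1000 between units and the two-decimal formatting are assumptions made for this example.

```shell
# Hypothetical helper: scale a value in kilobits/s to a more readable unit.
# Assumes a factor of 1000 between units; the real dashboard logic may differ.
scale_kbps() {
    awk -v v="$1" 'BEGIN {
        if (v >= 1000000)   printf "%.2f gigabits/s\n", v / 1000000
        else if (v >= 1000) printf "%.2f megabits/s\n", v / 1000
        else                printf "%.2f kilobits/s\n", v
    }'
}

scale_kbps 850        # stays in kilobits/s
scale_kbps 1500       # becomes megabits/s
scale_kbps 2500000    # becomes gigabits/s
```

The same pattern would apply to the other unit families listed above, with only the unit names changing.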

units conversions

The dashboard now supports units conversions. Currently it converts:

temperatures from Celsius to Fahrenheit

image

seconds to human readable duration DDd:HH:MM:SS

image
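
To make the DDd:HH:MM:SS duration format concrete, here is a minimal shell sketch of such a conversion. It is only an illustration of the arithmetic, not the dashboard's implementation; the function name format_duration and the zero-padding of every field are assumptions.

```shell
# Hypothetical helper: format a number of seconds as DDd:HH:MM:SS.
# Zero-padding every field is an assumption of this sketch.
format_duration() {
    total=$1
    days=$(( total / 86400 ))
    hours=$(( (total % 86400) / 3600 ))
    minutes=$(( (total % 3600) / 60 ))
    seconds=$(( total % 60 ))
    printf '%02dd:%02d:%02d:%02d\n' "$days" "$hours" "$minutes" "$seconds"
}

# 93784 seconds = 1 day, 2 hours, 3 minutes and 4 seconds
format_duration 93784
```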

timezone conversions

netdata can now convert all dates presented to any timezone. Traditionally netdata presented all charts at the timezone of the viewer. This allowed homogeneous central administration of systems that are installed all over the world. However, this was inefficient when we needed to compare the information presented on the dashboard, with the log files of the servers.

So, now netdata can present the charts on any timezone. The netdata server auto-detects the timezone of the server and new dashboard settings have been added to allow this conversion.

If autodetection of the server's timezone fails, the configuration option [global].timezone has been added in netdata.conf to set it. Also, the dashboard itself allows the viewers to configure the timezone (it is saved at browser local storage, so this has to be set just once per viewer).

new dashboard options

To support all the above, the dashboard settings got a new tab, with all the required options:

screenshot from 2017-10-10 23-54-01


statsd improvements

  • statsd metrics can now be added to statsd synthetic charts using patterns. No need to add a dimension line for each statsd metric to be added. netdata will also extract the wildcarded part of the metric name and use that one for the dimension name.

  • dimensions added to statsd synthetic charts, can automatically be renamed using a dictionary. Each synthetic charts application has its own dictionary of name - value pairs, which is used to automatically rename statsd metrics when they are added to synthetic charts.

  • statsd timers and histograms now report zeros when nothing is collected


Badges improvements

  • fixed a bug in netdata badges that was incorrectly matching zero values with the null color condition.

  • added API option display_absolute to allow badges to use the signed value for color evaluation, but present the absolute value.


Other Alarm and Alarm Notifications Improvements

  • warning emails sent by netdata, are now a little bit more orange (they were a bit green'sh).

  • added flock.com notifications @tvarsis

  • added kavenegar.com support for SMS notifications @vahit

  • fixed a bug in email notifications that was triggering a corrupted MIME match by anti-spam solutions.

  • pushbullet notifications now track the devices, so that per device filtering at pushbullet is possible. Also improved the formatting a bit. @user501254

  • pushover notifications fixes (the priority of warnings was set incorrectly)

  • alarms can now use variables like this ${variable with spaces or +, -, *, / in it}. So, alarms can now use dimension names with any character in them.


Other Improvements

  • access.log has been refactored to support monitoring all netdata operations

  • inodes monitoring is now by default disabled for mount points based on filesystems that do not have a maximum inode threshold (such as cephfs).

  • rabbitmq has been added to apps_groups.conf so that apps.plugin now monitors (cpu, memory, disk I/O, sockets, etc) for rabbitmq instances.

  • several email and log management apps have been added to email and logs targets of apps_groups.conf, @Flums

  • ceph target added to apps_groups.conf to allow netdata monitor Ceph - the unified, distributed storage system, @k0ste

  • refactored several internal data collection plugins to eliminate a few hundreds of index lookups per second.

  • netdata.conf settings that are loaded from disk, but were the same with the default ones, were generated commented when the server was asked to give its config. Now all loaded settings are generated uncommented.

  • netdata simple patterns can now extract the wildcarded part of the string they match (used in statsd synthetic charts)

  • netdata simple patterns can allow escaping spaces by prefixing them with a backslash.

@firehol-automation firehol-automation released this Sep 17, 2017 · 2167 commits to master since this release


netdata v1.8.0 released.

This release focuses on metrics streaming improvements and containers monitoring.

As always, this netdata is the fastest and most stable netdata ever! Update now!

To install or update netdata, click here!

key streaming improvements

bug fix: streaming slaves consuming 100% CPU

netdata, as a slave, was not handling all the error cases properly, resulting in 100% cpu utilization of a single core, under certain conditions. Especially under FreeBSD and macOS slaves, these conditions were always met, so using FreeBSD or macOS as netdata slaves, was completely broken.

bug fix: missing alarm notifications on netdata masters

netdata was incorrectly mixing up cached alarm state data between the alarms of the mirrored hosts, resulting in alarm notifications not being dispatched under certain conditions. This was affecting only netdata masters (i.e. netdata servers with more than one host database, with health monitoring enabled). The alarms were generated and were visible at the dashboards, but the notifications were not always sent.

bug fix: streamed charts with duplicate names

There was a minor issue with charts that were created with name aliases. When these charts were streamed from netdata slaves to netdata masters, they ended up with duplicate chart names (ie instead of type.name they had type.type.name).


key containers monitoring improvements

  • Container network interfaces are now moved to the container section and they are rendered from the container view point (i.e. sent = what the container sent) - no more veth* garbage on the dashboard.

  • The interfaces also appear as eth0 (or whatever the container sees) and they are inside the container section of the dashboard. netdata maps each veth* interface to the right container, using plain cgroups features, so this works for all container managers (docker, lxc, etc).

  • Eliminated the nested containers shown under certain versions of lxc.

  • Also, containers and VMs now have summary gauges on the dashboard

    image


key plugins improvements

python.d.plugin now supports HTTP keep-alive

netdata now uses urllib3 (shipped with netdata for both python v2 and v3) for URLService based plugins.

This enables HTTP keep-alive on all connections, which allows netdata to have permanent connections to third party web applications.

Fixed by @l2isbad


compatibility enhancements

  • better support for Oracle Linux, by @schindlerd
  • better support for Alpine Linux
  • various fixes at the build procedure for macOS
  • fping can now run as non-root, in static binary netdata packages

netdata generic enhancements

  • netdata can now listen on UNIX domain sockets (.sock files). This allows a local web server and netdata to communicate bypassing the network stack (for netdata set bind to = unix:/path/to/netdata.sock - this option supports multiple arguments, so netdata can listen to multiple unix sockets and tcp sockets, at the same time).

  • netdata was assuming that the JSON representation of a chart would at most be 1024 bytes, and it was generating corrupted JSON output when any chart was exceeding that limit. Removed the limitation (ie. now there is no limit).

  • netdata was crashing while starting, if no usable disks were found.

  • systemd netdata.service now allows setting negative netdata OOM score and restarts netdata if it crashes. The new netdata.service is not automatically installed when updating netdata. Either delete /etc/systemd/system/netdata.service and then update/re-install netdata, or copy the file by hand.

  • minor fixes at the installer, by @vincele
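
For the UNIX domain socket option in the list above, a local client can talk to netdata without going through TCP. The sketch below assumes curl with its --unix-socket option is available; the wrapper name netdata_unix_get and the socket path are placeholders, so adjust them to whatever bind to is set to on your system.

```shell
# Hypothetical wrapper: query the netdata API over a UNIX domain socket.
# $1 = path to the socket configured with "bind to = unix:...", $2 = API path.
netdata_unix_get() {
    curl -s --unix-socket "$1" "http://localhost$2"
}

# Example call (the socket path is a placeholder, adjust it to your setup):
# netdata_unix_get /path/to/netdata.sock /api/v1/charts
```

The host part of the URL is only used for the HTTP Host header here, since the connection itself goes through the socket file.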


new plugins

  • Added Intel CPU temperature charts on FreeBSD and macOS, by @vlvkobal
  • Added CPU thermal throttling charts on Linux (useful on physical servers and possibly laptops)
  • Added chrony plugin, by @domschl
  • Added Stiebel Eltron plugin to collect metrics from heat pumps and hot water installations from Stiebel Eltron ISG @BrainDoctor

improved plugins

  • web_log bugfixes, enhancements and optimizations (including squid logs), by @l2isbad
  • web_log now enables parsing HTTP/2 logs in custom_log_format, by @Funzinator
  • redis bugfixes, by @l2isbad
  • haproxy bugfixes, by @l2isbad
  • elasticsearch bugfixes and optimizations, by @l2isbad
  • rabbitmq bugfixes and optimizations, by @l2isbad
  • mdstat bugfixes, by @JeffHenson
  • tomcat improvements, by @Wing924
  • mysql improvements, by @alibo and @l2isbad
  • dovecot improvements
  • postgres improvements, by @facetoe
  • cpufreq: fixed a bug that prevented accurate reporting of CPU frequencies. Accurate reporting works with the acpi-cpufreq driver and calculates the average clock of the CPUs using the per-frequency accounting reported by the kernel, by @tycho
  • cpuidle performance improvements (faster under load) by @tycho
  • fail2ban bugfixes, by @l2isbad
  • SNMP plugin now uses the latest net-snmp, and the corrupted 64 bit counters encountered under certain node.js versions are now fixed.

dashboard improvements

  • easypiecharts and gauges can now render arbitrary ranges and animate clockwise or counterclockwise.

  • traditionally netdata was using 1024 bits = 1 kilobit. It is fixed: 1000 bits = 1 kilobit.

  • netdata charts should now work on wordpress pages.


alarms and notifications

  • alarm-notify.sh now supports debug mode, showing the exact commands it runs to send notifications, when it is run with NETDATA_ALARM_NOTIFY_DEBUG=1 exported in its environment

  • alarm-notify.sh now supports setting the sender email address of the emails it sends.

  • emails sent by alarm-notify.sh now include headers to reduce the possibility of them being scored as spam, by @Ferroin

  • network related alarms got new thresholds and improved badges

  • netdata now detects if the system has been suspended and pauses all alarms for 60 seconds on resume, to prevent false alarms (no more false alarms on laptops when they resume).

  • netdata alarms now support filtering based on hostname and O/S (linux, freebsd, macos). This means that netdata masters, can now support alarms for slaves of any O/S (i.e. a Linux netdata master can handle alarms for a FreeBSD slave).

  • netdata slack notifications now show the host that sent the alarm. In the image below, the alarm is about bangalore, and is sent by netdata-build-server (at the lower left corner):

    image


statsd

  • the number of fractional points supported by statsd is now configurable (1 to 7).
  • 95th percentile calculation on statsd histograms and timers, was incorrectly averaging the values. It is now fixed.
  • statsd metrics with non ASCII text were processed by the statsd server, but were breaking JSON data generated by netdata. Fixed it by replacing all invalid characters.

@philwhineray philwhineray released this Jul 16, 2017 · 2443 commits to master since this release


This is release v1.7 of netdata.

netdata is still spreading fast: we are at 320.000 users and 132.000 servers! Almost 100k new users, 52k new installations and 800k docker pulls since the previous release 4 and a half months ago! netdata user base grows at about 1000 new users and 600 new servers per day! Thank you! You are awesome!

The next release (v1.8) will be focused on providing a global health monitoring service, for all netdata users, for free! Read more about it here. We need supporters for this cause. Join us!

highlights of netdata v1.7

  1. netdata is now a (very fast) fully featured statsd server and the only one with automatic visualization: push a statsd metric and hit F5 on the netdata dashboard: your metric visualized. It also supports synthetic charts, defined by you, so that you can correlate and visualize your application the way you like it.

  2. netdata got new installation options - it is now easier than ever to install netdata - we also distribute a statically linked netdata x86_64 binary, including key dependencies (like bash, curl, etc) that can run everywhere a Linux kernel runs (CoreOS, CirrOS, etc).

  3. metrics streaming and replication has been improved significantly. All known issues have been solved and key enhancements have been added. headless collectors and proxies can now send metrics to backends when data source = as collected.

  4. backends have got quite a few enhancements, including host tags, metrics filtering at the netdata side and sending of chart and dimension names instead of IDs; prometheus support has been re-written to utilize more prometheus features and provide more flexibility and integration options. IF YOU UPDATE FROM NETDATA 1.6 PLEASE CHECK YOUR DASHBOARDS, SINCE MANY METRICS HAVE CHANGED NAMES.

  5. netdata now monitors ZFS (on Linux and FreeBSD), ElasticSearch, RabbitMQ, Go applications (via expvar), ipfw (on FreeBSD 11), samba, squid logs (with web_log plugin!).

  6. netdata dashboard loading times have been improved significantly (hit F5 a few times on a netdata dashboard - it is now amazingly fast), to support dashboards with thousands of charts.

  7. netdata alarms now support custom hooks, so you can run whatever you like in parallel with netdata alarms.

  8. As usual, this release brings dozens more improvements, enhancements and compatibility fixes.

netdata is now a fully featured statsd server

netdata is now a fully featured statsd server. It can collect statsd formatted metrics, visualize them on its dashboards, stream them to other netdata servers or archive them to backend time-series databases.

netdata statsd is fast. It can collect more than 1.200.000 metrics per second on modern hardware, more than 200Mbps of sustained statsd traffic. netdata statsd is inside netdata. This provides a distributed statsd implementation.

netdata also supports statsd synthetic charts: You can create dedicated sections on the dashboard to render the charts. You can control everything: the main menu, the submenus, the charts, the dimensions on each chart, etc.

Read more about netdata statsd

counters

  • Scope: count the events of something (e.g. number of file downloads)
  • Format: name:INTEGER|c or name:INTEGER|C or name|c
  • statsd increments the counter by the INTEGER number supplied (positive, or negative).

image

gauges

  • Scope: report the value of something (e.g. cache memory used by the application server)
  • Format: name:FLOAT|g
  • statsd remembers the last value supplied, and can increment or decrement the latest value if FLOAT begins with + or -.

image

histograms

  • Scope: statistics on the size of events (e.g. statistics on the sizes of files downloaded)
  • Format: name:FLOAT|h
  • statsd maintains a list of all the values supplied and provides statistics on them.

image

The same chart with sum unselected, to show the detail of the dimensions supported:
image

meters

This is identical to counter.

  • Scope: count the events of something (e.g. number of file downloads)
  • Format: name:INTEGER|m or name|m or just name
  • statsd increments the counter by the INTEGER number supplied (positive, or negative).

image

sets

  • Scope: count the unique occurrences of something (e.g. unique filenames downloaded, or unique users that downloaded files)
  • Format: name:TEXT|s
  • statsd maintains a unique index of all values supplied, and reports the unique entries in it.

image

timers

  • Scope: statistics on the duration of events (e.g. statistics for the duration of file downloads)
  • Format: name:FLOAT|ms
  • statsd maintains a list of all the values supplied and provides statistics on them.

image

The same chart with the sum unselected:
image
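
The metric formats above can be exercised from a shell. The sketch below builds statsd lines in the name:VALUE|TYPE format and shows how they could be pushed over UDP. The helper names, the metric names, the use of bash's /dev/udp redirection and the default of localhost port 8125 are assumptions of this example, so check the statsd settings of your netdata before relying on them.

```shell
# Hypothetical helper: build one statsd line in the "name:VALUE|TYPE" format.
# TYPE is one of c (counter), g (gauge), h (histogram), m (meter), s (set), ms (timer).
statsd_line() {
    printf '%s:%s|%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: send a line over UDP using bash's /dev/udp redirection.
# Host and port default to localhost:8125, which is an assumption of this sketch.
statsd_send() {
    statsd_line "$1" "$2" "$3" > "/dev/udp/${STATSD_HOST:-localhost}/${STATSD_PORT:-8125}"
}

statsd_line myapp.downloads 1 c          # counter
statsd_line myapp.cache_mb 512.5 g       # gauge
statsd_line myapp.download_time 320 ms   # timer

# With a netdata statsd listener running, the same metrics could be pushed with:
# statsd_send myapp.downloads 1 c
```

Keeping the formatting separate from the sending makes it easy to inspect the exact lines before anything goes on the wire.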


dashboard improvements

There have been significant optimizations to the loading times of the dashboard. The dashboard loads instantly now, even when there are several hundreds of charts in it (hit F5 on the dashboard - it is super fast).

For those who know: we eliminated most browser reflows, by refactoring the way the charts are initialized and splitting initialization in 2 phases. Unfortunately we had to re-shape gauge and easypiecharts, so pay some attention to your custom dashboards after updating.

We now use natural sorting on the dashboard elements (i.e. instead of 1, 10, 2, 3 we get 1, 2, 3, 10).
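
The difference between lexical and natural ordering can be reproduced in a shell, assuming a sort that supports the -V (version sort) option, as GNU coreutils does. This only illustrates the ordering and is not how the dashboard implements it; the function name natural_sort is made up for the example.

```shell
# Hypothetical helper: order lines so that embedded numbers compare by value.
# Relies on "sort -V" (version sort), which is an assumption about your sort.
natural_sort() {
    sort -V
}

# plain sort compares character by character, so "10" lands before "2"
printf '%s\n' cpu1 cpu10 cpu2 cpu3 | sort

# natural order: cpu1 cpu2 cpu3 cpu10
printf '%s\n' cpu1 cpu10 cpu2 cpu3 | natural_sort
```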

There have been dozens of performance improvements on the netdata dashboard. Like all the previous releases, this release makes netdata the fastest netdata so far!

new installation methods

  • Single line installation on Linux
  • Static 64bit packages for Linux
  • Improved support for Red Hat Enterprise Linux @racciari
  • Improved support for Amazon Machine Image
  • Improved support for Centos @n0coast
  • Many more installer/updater improvements @nielsAD, @mfurlend

Streaming

  • improved self cleanup of obsolete charts and hosts at a central netdata.
  • host tags are now propagated from netdata to netdata while streaming metrics.
  • log error when multiple clients are streaming the metrics of the same host.
  • dozens more streaming improvements and bugfixes.

Backends

  • New prometheus backend, supporting all the features of the other backends netdata supports. The new format changed the names of metrics, so if you use grafana or other tools you will have to update your queries.
  • Prometheus and opentsdb now support host tags (advanced ephemeral nodes monitoring)
  • Metrics sent to backends with data source average, sum or volume (from the netdata database) are now more accurate.
  • Added contrib/nc-backend.sh, a script that can act as a fallback backend for graphite, opentsdb and compatibles.
  • netdata nodes without a database (slaves and proxies) can now send as collected metrics to backends.

New and improved plugins

  • Go apps monitoring via expvar ! @kralewitz
  • ElasticSearch monitoring ! @l2isbad
  • RabbitMQ monitoring ! @l2isbad
  • ipfw monitoring under FreeBSD 11 ! @vlvkobal
  • ZFS monitoring under FreeBSD (@vlvkobal) and Linux !
  • samba monitoring ! @ntlug
  • web_log plugin can now monitor squid logs too ! @l2isbad
  • web_log plugin can now monitor apache cache logs too (removed old apache_cache plugin) @l2isbad
  • many more web_log improvements - web_log is now a lot more powerful! @l2isbad
  • python.d.plugin LogService now supports monitoring web log files matching a pattern @l2isbad
  • disk monitoring under Linux now utilizes /dev/mapper names. It also has improved docker compatibility.
  • haproxy improvements @l2isbad
  • dns_query_time plugin to monitor the response time of nameservers @l2isbad
  • Fronius Solar @BrainDoctor
  • better support for monitoring Proxmox/qemu @efaden and libvirt/qemu VMs
  • cpufreq improvements @l2isbad
  • smartd_log improvements @pkoenig10
  • bind_rndc rewritten @l2isbad
  • lighttpd improvements (part of the apache plugin)
  • isc_dhcpd improvements @l2isbad
  • fping improvements
  • apps.plugin improvements (added many more applications to monitor, notably hadoop and friends, improved compatibility)
  • freeipmi improvements
  • mdstat improvements @l2isbad
  • mysql improvements @alibo
  • redis improvements @l2isbad
  • postgres rds fixes @facetoe
  • fail2ban improvements @l2isbad
  • idlejitter rewritten
  • openvpn improvements @l2isbad
  • numa improvements @Benje06

New and improved alarms

  • alarm-notify.sh now supports custom notification methods (you can hook whatever you like to netdata alarms).
  • email notifications are now multipart (have both HTML and text versions in them)
  • low memory alarm now excludes ZFS ARC.
  • improved discord notifications.
  • improved telegram notifications @alibo
  • lighttpd alarm
  • mongodb alarm @jnogol

Other improvements

  • memory mode ram utilizes KSM (kernel memory deduper).
  • many memory mode map improvements for faster operation with huge databases.
  • netdata is now even faster on FreeBSD, thanks to several optimizations made by @vlvkobal
  • netdata can now be compiled with clang, even on FreeBSD
  • netdata can now be compiled on FreeBSD 10.3

@philwhineray philwhineray released this Mar 20, 2017 · 3142 commits to master since this release

Release announced on twitter, hacker news, reddit r/linux, reddit r/sysadmin, reddit r/linuxadmin, reddit r/freebsd, reddit r/devops, reddit r/homelab, facebook

birthday release: 1 year netdata

netdata was first published on March 30th, 2016.
It has been a crazy year since then:

  • 225.000 unique netdata users (currently, at 1.000 new unique users per day)
  • 80.000 unique netdata installations (currently, at 500 new unique installations per day)
  • 610.000 docker pulls on docker hub
  • 4.000.000 netdata sessions served (currently, at 15.000 unique netdata sessions served per day)
  • 20.000 github stars

Thank you!
You are awesome!

Central netdata is here!

This is the first release that supports real-time streaming of metrics between netdata servers.

netdata can now be:

  • autonomous host monitoring (like it always has been)
  • headless data collector (collect and stream metrics in real-time to another netdata)
  • headless proxy (collect metrics from multiple netdata and stream them to another netdata)
  • store and forward proxy (like headless proxy, but with a local database)
  • central database (metrics from multiple hosts are aggregated)

metrics databases can be configured on all nodes and each node maintaining a database may have a different retention policy and possibly run (even different) alarms on them.

There are 4 settings that control what netdata can be:

  1. [global].memory mode in netdata.conf, controls if a netdata will maintain a local database and the type of it. For more information check Running a dedicated central netdata server.

  2. [web].mode in netdata.conf, controls if netdata will expose its API, and the type of web server to enable (single or multi-threaded). Check netdata.conf configuration for streaming.

  3. [stream].enabled in stream.conf, controls if netdata will stream its metrics to another netdata. Check stream.conf for sending metrics.

  4. [API KEY].enabled in stream.conf, controls if netdata will accept metrics from other netdata. Check stream.conf for receiving metrics.

Using the above, we support a lot of different configurations, like these:

target              memory mode  web mode  stream enabled  send to backend  local alarms  local dashboard
headless collector  none         none      yes             not possible     not possible  no
headless proxy      none         not none  yes             not possible     not possible  no
proxy with db       not none     not none  yes             possible         possible      yes
central netdata     not none     not none  no              possible         possible      yes
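
As a sketch of the headless collector row of the table above, the shell function below writes a minimal netdata.conf and stream.conf into a scratch directory. The [global].memory mode, [web].mode and [stream].enabled settings come from the list above; the destination and api key option names, the receiving address and the key value are assumptions or placeholders, so compare them with the stream.conf shipped with your netdata before using them.

```shell
# Hypothetical helper: write a headless-collector configuration into directory $1.
# $2 = address of the receiving netdata, $3 = API key (both placeholders).
write_headless_collector_conf() {
    mkdir -p "$1" || return 1

    # no local database and no local web server / API
    cat > "$1/netdata.conf" <<EOF
[global]
    memory mode = none

[web]
    mode = none
EOF

    # stream everything to the receiving netdata
    # ("destination" and "api key" are assumed option names, check your stream.conf)
    cat > "$1/stream.conf" <<EOF
[stream]
    enabled = yes
    destination = $2
    api key = $3
EOF
}

conf_dir=$(mktemp -d)
write_headless_collector_conf "$conf_dir" "central.example.com:19999" "11111111-2222-3333-4444-555555555555"
cat "$conf_dir/netdata.conf" "$conf_dir/stream.conf"
```

Writing into a temporary directory keeps the example harmless; the generated files can be reviewed before anything is copied into the real configuration directory.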

monitoring ephemeral nodes

netdata now supports monitoring autoscaled ephemeral nodes, that are started and stopped on demand (their IP is not known).

When the ephemeral nodes start streaming metrics to the central netdata, the central netdata will register them at the my-netdata menu on the dashboard.

You can see this live at https://build.my-netdata.io (this server may not always be available for demo).

For more information check: monitoring ephemeral nodes.

monitoring ephemeral containers and VM guests

netdata now cleans up container, guest VM, network interfaces and mounted disk metrics, disabling automatically their alarms too.

For more information check monitoring ephemeral containers.

apps.plugin ported for FreeBSD

Vladimir Kobal has ported apps.plugin to FreeBSD.

netdata can now provide Applications, Users and User Groups under FreeBSD too:

Also, the CPU utilization of netdata under FreeBSD, is now a lot less compared to netdata v1.5.

See it live at our FreeBSD demo server.

web_log plugin

Ilya Mashchenko has done a wonderful job creating a unified web log parsing plugin for all kinds of web server logs. With it, netdata provides real-time performance information and health monitoring alarms for web applications and web sites!

Requests by http status:
image

Requests by http status code family:
image

Requests by http status code:
image

Requests bandwidth:
image

Requests timings:
image

URL patterns of interest (you configure the patterns):
image

Requests by http method:
image

Requests by IP version:
image

Number of unique clients:
image

and a lot more, including alarms:

  • 1m_redirects (minimum requests: 120/min, warning: > 20%, critical: > 30%)
    The ratio of HTTP redirects (3xx except 304) over all the requests, during the last minute.
    Detects if the site or the web API is suffering from too many or circular redirects.
    (i.e. oops! this should not redirect clients to itself)

  • 1m_bad_requests (minimum requests: 120/min, warning: > 30%, critical: > 50%)
    The ratio of HTTP bad requests (4xx) over all the requests, during the last minute.
    Detects if the site or the web API is receiving too many bad requests, including 404, not found.
    (i.e. oops! a few files were not uploaded)

  • 1m_internal_errors (minimum requests: 120/min, warning: > 2%, critical: > 5%)
    The ratio of HTTP internal server errors (5xx), over all the requests, during the last minute.
    Detects if the site is facing difficulties to serve requests.
    (i.e. oops! this release crashes too much)

  • 5m_requests_ratio (minimum requests: 120/5min, warning: > double or < half, critical: > 4x or < 1/4x)
    The percentage of successful web requests of the last 5 minutes, compared with the previous 5 minutes.
    Detects if the site or the web API is suddenly getting too many or too few requests.
    (i.e. too many = oops! we are under attack)
    (i.e. too few = oops! call the network guys)

  • web_slow (minimum requests: 120/min, warning: > 2x, critical: > 4x)
    The average time to respond to requests, over the last 1 minute, compared to the average of last 10 minutes.
    Detects if the site or the web API is suddenly a lot slower.
    (i.e. oops! the database is slow again)

  • 1m_successful (minimum requests: 120/min, warning: < 85%, critical: < 75%)
    The ratio of successful HTTP responses (1xx, 2xx, 304) over all the requests, during the last minute.
    Detects if the site or the web API is performing within limits.
    (i.e. oops! help us God!)

For more information check: the spectacles of a web server log file.

backends

netdata can now archive metrics to JSON backends (both push, by @lfdominguez, and pull modes).

IPMI monitoring

netdata now has an IPMI plugin (based on freeipmi) for monitoring server hardware.

The plugin creates (up to) 8 charts, based on the information collected from IPMI:

  1. number of sensors by state
  2. number of events in SEL
  3. Temperatures CELSIUS
  4. Temperatures FAHRENHEIT
  5. Voltages
  6. Currents
  7. Power
  8. Fans

It also supports alarms (including the number of sensors in critical state):

image

For more information, check monitoring IPMI.

New Plugins

Ilya Mashchenko builds python data collection plugins for netdata at a wonderful rate! He rocks!

Improved Plugins

  • nfacct reworked and now collects connection tracker information using netlink.
  • ElasticSearch re-worked @l2isbad
  • mysql re-worked to allow faster development of custom mysql based plugins (MySQLService) @l2isbad
  • SNMP
  • tomcat @NMcCloud
  • ap (monitoring hostapd access points)
  • php_fpm @l2isbad
  • postgres @l2isbad
  • isc_dhcpd @l2isbad
  • bind_rndc @l2isbad
  • numa
  • apps.plugin improvements and freebsd support @vlvkobal
  • fail2ban @l2isbad
  • freeradius @l2isbad
  • nut (monitoring UPSes)
  • tc (Linux QoS) now works on qdiscs instead of classes for the same result (a lot faster) @t-h-e
  • varnish @l2isbad

New and Improved Alarms

  • web_log, many alarms to detect common web site/API issues
  • fping, alarms to detect packet loss, disconnects and unusually high latency
  • cpu, cpu utilization alarm now ignores nice

New and improved alarm notification methods

Dashboard Improvements

  • dashboard now works on HiDPi screens
  • dashboard now shows version of netdata
  • dashboard now resets charts properly
  • dashboard updated to use latest gauge.js release

Other Improvements

  • thanks to @rlefevre netdata now uses a lot of different high resolution system clocks.

netdata has received a lot more improvements from many more contributors! (it was really a lot of work to dig into git log to collect all the above, so forgive me if I forgot to mention a few contributions and contributors).

Thank you all!

@ktsaou ktsaou released this Jan 22, 2017 · 3844 commits to master since this release

Release announced on twitter, hacker news, reddit r/linux, reddit r/sysadmin, reddit r/linuxadmin, reddit r/freebsd

Yet another release that makes netdata the fastest netdata ever!

This is probably the release with the largest changeset so far. A lot of work, by a lot of people made this release possible!

FreeBSD, MacOS and FreeNAS

Vladimir Kobal has done magnificent work porting netdata to FreeBSD and MacOS.

Everything works:

  • cpu and interrupts, memory, disks (performance and space monitoring)
  • network interfaces and softnet
  • IPv4 and IPv6 metrics
  • processes and context switches
  • IPC (queues, semaphores, shared memory)
  • and of course all the netdata external plugins

Wow! Check it live on FreeBSD, at https://freebsd.my-netdata.io/

Backends

netdata supports data archiving to backend databases:

  • Graphite
  • OpenTSDB
  • Prometheus

and of course all the compatible ones (KairosDB, InfluxDB, Blueflood, etc)

image

With this feature netdata can interface with your existing devops infrastructure and allow you to visualize its metrics with other tools, like grafana.

New Plugins

Ilya Mashchenko has created most of the python data collection plugins in this release! He rocks!

  • Systemd Services (real-time monitoring of the resource utilization of all systemd services, using cgroups!)
  • FPing (network latency and jitter monitoring with netdata!)
  • Postgres databases @facetoe, @moumoul
  • Varnish disk cache (v3 and v4) @l2isbad
  • ElasticSearch @l2isbad
  • HAproxy @l2isbad
  • FreeRadius @l2isbad, @lgz
  • mdstat (RAID) @l2isbad
  • ISC bind (via rndc) @l2isbad
  • ISC dhcpd @l2isbad, @lgz
  • Fail2Ban @l2isbad
  • OpenVPN status log @l2isbad, @lgz
  • NUMA memory @tycho
  • CPU Idle States @tycho
  • gunicorn @deltaskelta
  • ECC memory hardware errors
  • IPC semaphores
  • uptime (with a nice badge too)

Improved Plugins

New and Improved Alarms

  • MySQL/MariaDB alarms (incl. replication)
  • IPFS alarms
  • HAproxy alarms
  • UDP buffer alarms
  • TCP AttemptFails
  • ECC memory alarms
  • netfilter connections alarms

New Alarm Notification Methods

Shell Integration

Shell scripts can now query netdata easily!

eval "$(curl -s 'http://localhost:19999/api/v1/allmetrics')"

After this command, all the netdata metrics are exposed to the shell. Check:

# source the metrics
eval "$(curl -s 'http://localhost:19999/api/v1/allmetrics')"

# let's see if there are variables exposed by netdata for system.cpu
set | grep "^NETDATA_SYSTEM_CPU"

NETDATA_SYSTEM_CPU_GUEST=0
NETDATA_SYSTEM_CPU_GUEST_NICE=0
NETDATA_SYSTEM_CPU_IDLE=95
NETDATA_SYSTEM_CPU_IOWAIT=0
NETDATA_SYSTEM_CPU_IRQ=0
NETDATA_SYSTEM_CPU_NICE=0
NETDATA_SYSTEM_CPU_SOFTIRQ=0
NETDATA_SYSTEM_CPU_STEAL=0
NETDATA_SYSTEM_CPU_SYSTEM=1
NETDATA_SYSTEM_CPU_USER=4
NETDATA_SYSTEM_CPU_VISIBLETOTAL=5

# let's see the total cpu utilization of the system
echo ${NETDATA_SYSTEM_CPU_VISIBLETOTAL}
5

# what about alarms?
set | grep "^NETDATA_ALARM_SYSTEM_SWAP_"
NETDATA_ALARM_SYSTEM_SWAP_RAM_IN_SWAP_STATUS=CRITICAL
NETDATA_ALARM_SYSTEM_SWAP_RAM_IN_SWAP_VALUE=53
NETDATA_ALARM_SYSTEM_SWAP_USED_SWAP_STATUS=CLEAR
NETDATA_ALARM_SYSTEM_SWAP_USED_SWAP_VALUE=51

# let's get the current status of the alarm 'ram in swap'
echo ${NETDATA_ALARM_SYSTEM_SWAP_RAM_IN_SWAP_STATUS}
CRITICAL

# is it fast?
time curl -s 'http://localhost:19999/api/v1/allmetrics' >/dev/null

real  0m0,070s
user  0m0,000s
sys   0m0,007s

# it is...
# 0.07 seconds for curl to be loaded, connect to netdata and fetch the response back...

The _VISIBLETOTAL variable sums up all the dimensions of each chart.

The format of the variables is:

NETDATA_${chart_id^^}_${dimension_id^^}="${value}"

The value is rounded to the closest integer, since shell scripts cannot process decimal numbers.
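
Since the values are integers, plain shell integer tests work on them. A minimal sketch, assuming the metrics were already sourced with the eval command above: netdata_var_name and netdata_at_least are hypothetical helper names, and mapping every non-alphanumeric character to an underscore is inferred from the system.cpu example rather than confirmed.

```shell
# Build the variable name for a chart id and a dimension id.
# Upper-cases both ids and maps every non-alphanumeric character to "_"
# (assumption inferred from system.cpu -> NETDATA_SYSTEM_CPU_*).
netdata_var_name() {
    printf 'NETDATA_%s_%s\n' "$1" "$2" |
        tr '[:lower:]' '[:upper:]' |
        tr -c 'A-Z0-9_\n' '_'
}

# Succeed (exit status 0) when the integer value is >= the threshold.
# Usage: netdata_at_least <chart_id> <dimension_id> <threshold>
netdata_at_least() (
    name=$(netdata_var_name "$1" "$2")
    eval "value=\${$name:-0}"    # missing variables count as 0
    [ "$value" -ge "$3" ]
)

# sample value taken from the output above; normally set by the eval command
NETDATA_SYSTEM_CPU_VISIBLETOTAL=5

if netdata_at_least system.cpu visibletotal 80; then
    echo "cpu is busy"
else
    echo "cpu is fine"
fi
```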

Dashboard Improvements

  • dashboard is now faster on firefox, safari, opera, edge (edge is still the slowest)
  • dashboard charts legends now have bigger fonts
  • SHIFT + mousewheel to zoom charts, works on all browsers
  • perfect-scrollbar on the dashboard
  • dashboard 4K resolution fixes
  • dashboard compatibility fixes for embedding charts in third party web sites
  • charts on custom dashboards can have common min/max even if they come from different netdata servers
  • alarm log is now saved and loaded back so that the alarm history is available at the dashboard

Other Improvements

  • python.d.plugin has received way too many improvements from many contributors!
  • charts.d.plugin can now be forked to support multiple independent instances
  • registry has been re-factored to lower its memory requirements (required for the public registry)
  • simple patterns in cgroups, disks and alarms
  • netdata-installer.sh can now correctly install netdata in containers
  • supplied logrotate script compatibility fixes
  • spec cleanup @breed808
  • clocks and timers reworked @rlefevre

netdata has received a lot more improvements from many more contributors! (it was really a lot of work to dig into git log to collect all the above, so forgive me if I forgot to mention a few contributions and contributors).

Thank you all!

@ktsaou ktsaou released this Oct 3, 2016 · 4860 commits to master since this release

New to netdata? Check its demo: http://my-netdata.io

Release announced on Hacker News
Release announced on reddit r/linux
Release announced on reddit r/sysadmin
Release announced on twitter

At a glance

  • the fastest netdata ever (with a better look too)!
  • improved IoT and containers support!
  • alarms improved in almost every way!
  • new plugins:
    • softnet netdev,
    • extended TCP metrics,
    • UDPLite
    • NFS v2, v3 client (server was there already),
    • NFS v4 server & client,
    • APCUPSd,
    • RetroShare
  • improved plugins:
    • mysql,
    • cgroups,
    • hddtemp,
    • sensors,
    • phpfpm,
    • tc (QoS)

In detail

improved alarms!

Many new alarms have been added to detect common kernel configuration errors and old alarms have been re-worked to avoid notification floods.

Alarms now support:

  • notification hysteresis (both static and dynamic)

    image

  • notification self-cancellation, and

  • dynamic thresholds based on current alarm status

    image
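
To show why a threshold that depends on the current alarm status avoids flapping, here is a toy sketch (not netdata's implementation). The next_status helper and the 75/85 limits are made up for illustration: the alarm is raised above 85 but clears only at 75 or below.

```shell
# Toy hysteresis: the limit depends on the current status.
# Usage: next_status <current_status> <value>
next_status() (
    status=$1 value=$2
    if [ "$status" = "WARNING" ]; then
        limit=75    # already raised: stay raised until the value drops to 75 or below
    else
        limit=85    # not raised: raise only above 85
    fi
    if [ "$value" -gt "$limit" ]; then
        echo WARNING
    else
        echo CLEAR
    fi
)

# feed a series of sample values through the state machine
status=CLEAR
for value in 70 80 90 80 76 74 80; do
    status=$(next_status "$status" "$value")
    echo "$value -> $status"
done
```

A value of 80 stays CLEAR on the way up but remains WARNING on the way down, so a metric hovering around a single limit does not trigger repeated notifications.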

Also, a new alarms log:

image

improved alarm notifications

netdata now supports:

  • email notifications
  • slack.com notifications on slack channels
  • pushover.net notifications (mobile push notifications)
  • telegram.org notifications

For all the above methods, netdata supports role-based notifications, with multiple recipients for each role and severity filtering per recipient!

Also, netdata supports HTML5 notifications while the dashboard is open in a browser window (it does not need to be the active one).

All notifications (HTML5, emails, slack, pushover, telegram) are now clickable to get to the chart that raised the alarm.

other improvements

  • improved IoT support!

    netdata builds and runs with musl libc and runs on systems based on busybox.

  • improved containers support!

    netdata runs on alpine linux (a low profile linux distribution used in containers).

  • Dozens of other improvements and bugfixes


netdata 1.4.0 - download release tarfiles from http://firehol.org/download/netdata/releases/v1.4.0

@ktsaou ktsaou released this Aug 27, 2016 · 5262 commits to master since this release

New to netdata? Check its demo: http://my-netdata.io

At a glance

  1. netdata has health monitoring / alarms!
  2. netdata generates badges that can be embedded anywhere!
  3. netdata plugins are now written in python!
  4. new plugins: redis, memcached, nginx_log, ipfs, apache_cache

IMPORTANT:
Since netdata now uses python plugins, new packages are
required to be installed on a system to allow it to work.
For more information, please check the installation page.

In detail

netdata has alarms!

Based on the POLL we made on github, health monitoring was the winner. So here it is!

netdata now has a powerful health monitoring system embedded.

netdata has badges!

netdata can generate badges with live information from the collected metrics.
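
A small sketch of embedding a badge in a web page. It assumes the badge.svg endpoint of the netdata API with its chart parameter, and a netdata listening on localhost:19999; the badge_img helper is a hypothetical name. Check the badges documentation of your version for the full list of parameters.

```shell
# Print an HTML <img> tag pointing at a netdata badge.
# Usage: badge_img <netdata_base_url> <chart_id>
badge_img() {
    printf '<img src="%s/api/v1/badge.svg?chart=%s" alt="%s"/>\n' "$1" "$2" "$2"
}

# assumes netdata on localhost:19999
badge_img http://localhost:19999 system.cpu
```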

netdata plugins are now written in python!

Thanks to the great work of Paweł Krupa (@paulfantom), most BASH plugins have been ported to python.

The new python.d.plugin supports both python2 and python3 and data collection from multiple sources for all modules.

The following pre-existing modules have been ported to python:

  • apache
  • cpufreq
  • example
  • exim
  • hddtemp
  • mysql
  • nginx
  • phpfpm
  • postfix
  • sensors
  • squid
  • tomcat

The following new modules have been added:

  • apache_cache
  • dovecot
  • ipfs
  • memcached
  • nginx_log
  • redis

other data collectors

Thanks to @simonnagl netdata now reports disk space usage.

other improvements

  • dashboards now transfer certain settings from server to server when changing servers via the my-netdata menu.

    The settings transferred are the dashboard theme, the online help status and current pan and zoom timeframe of the dashboard.

  • API improvements:

    • reduction functions now support 'min', 'sum' and 'incremental-sum'.
    • netdata now offers a multi-threaded and a single threaded web server (single threaded is better for IoT).
  • apps.plugin improvements:

    • can now run with command line argument 'without-files' to prevent it from enumerating all the open files/sockets/pipes of all running processes.
    • apps.plugin now scales the collected values to match the total system usage.
    • apps.plugin can now report guest CPU usage per process.
    • repeating errors are now logged once per process.
  • netdata now runs with IDLE process priority (lower than nice 19)

  • netdata now instructs the kernel to kill it first when it starves for memory.

  • netdata listens for signals:

    • SIGHUP to netdata instructs it to re-open its log files (new logrotate file added too).
    • SIGUSR1 to netdata saves the database
    • SIGUSR2 to netdata reloads health / alarms configuration
  • netdata can now bind to multiple IPs and ports.

  • netdata now has a new systemd service file (it starts as user netdata and does not fork).

  • Dozens of other improvements and bugfixes
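
The signals listed above can be sent from a shell with kill. A minimal sketch: the action names (reopen-logs, save-database, reload-health) and both helper functions are hypothetical, only the signal mapping comes from the list above.

```shell
# Map a maintenance action to the signal netdata listens for.
netdata_signal_for() {
    case "$1" in
        reopen-logs)   echo HUP  ;;   # re-open the log files (e.g. after logrotate)
        save-database) echo USR1 ;;   # save the database
        reload-health) echo USR2 ;;   # reload health / alarms configuration
        *) echo "unknown action: $1" >&2; return 1 ;;
    esac
}

# Send the matching signal to a process id.
# Usage: netdata_notify <action> <pid>
netdata_notify() {
    sig=$(netdata_signal_for "$1") || return 1
    kill -s "$sig" "$2"
}

# Example (assumes pidof is available and netdata is running):
#   netdata_notify reload-health "$(pidof netdata)"
```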

netdata 1.3.0 - download release tarfiles from http://firehol.org/download/netdata/releases/v1.3.0

@ktsaou ktsaou released this May 16, 2016 · 6228 commits to master since this release

Netdata demo sites: http://my-netdata.io

At a glance

  1. netdata now is 30% faster !
  2. netdata now has a registry (my-netdata dashboard menu) !
  3. netdata now monitors Linux Containers (cgroups, docker, lxc, etc) !

IMPORTANT:
This version requires libuuid. The package you need to build netdata is:

  • uuid-dev (debian/ubuntu), or
  • libuuid-devel (centos/fedora/redhat)
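
A hedged sketch of picking the right package name from the list above before building. libuuid_package is a hypothetical helper; the rhel alias is an assumption (it is the ID that Red Hat systems usually report in /etc/os-release), and the actual install command depends on your package manager.

```shell
# Print the libuuid development package for a distribution id.
# Usage: libuuid_package <distro_id>
libuuid_package() {
    case "$1" in
        debian|ubuntu)             echo uuid-dev ;;
        centos|fedora|redhat|rhel) echo libuuid-devel ;;
        *) echo "unknown distribution: $1" >&2; return 1 ;;
    esac
}

# sample call; on a real system the ID field of /etc/os-release could be passed instead
pkg=$(libuuid_package ubuntu) && echo "package to install: $pkg"
```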

In detail

netdata is now 30% faster !

  • Patches submitted by @fredericopissarra improved overall netdata performance by 10%.
  • A new improved search function in the internal indexes made all searches faster by 50%, resulting in about 20% better performance for the core of netdata.
  • More efficient threads locking in key components contributed to the overall speed up.

netdata now has a central registry !

The central registry tracks all your netdata servers and bookmarks them for you at the my-netdata menu on all dashboards.

Every netdata can act as a registry, but there is also a global registry provided for free for all netdata users!

netdata now monitors Linux Containers !

docker, lxc, or anything else. For each container it monitors CPU, RAM, DISK I/O (network interfaces were already monitored).

Other improvements

  • apps.plugin: now uses linux capabilities by default without setuid to root
  • netdata now has an improved signal handler thanks to @simonnagl
  • API: new improved CORS support
  • SNMP: counter64 support fixed
  • MYSQL: more charts, about QCache, MyISAM key cache, InnoDB buffer pools, open files
  • DISK charts now show mount point when available
  • Dashboard: improved support for older web browsers and mobile web browsers (thanks to @simonnagl)
  • Multi-server dashboards now allow de-coupled refreshes for each chart, so that if one netdata has a network latency the other charts are not affected
  • Dozens of other improvements, optimizations and bug-fixes.

netdata 1.2.0 - download release tarfiles also from http://firehol.org/download/netdata/releases/v1.2.0

@ktsaou ktsaou released this Apr 20, 2016 · 6454 commits to master since this release

netdata 1.1.0 - download release tarfiles from http://firehol.org/download/netdata/releases/v1.1.0

Dozens of commits that improve netdata in several ways:

Data collection

  • added IPv6 monitoring
  • added SYNPROXY DDoS protection monitoring
  • apps.plugin: added charts for users and user groups
  • apps.plugin: grouping of processes now supports patterns
  • apps.plugin: it is now faster, even after the new features were added
  • better auto-detection of partitions for disk monitoring
  • better fireqos integration for QoS monitoring
  • squid monitoring now uses squidclient
  • SNMP monitoring now supports 64bit counters

API

  • fixed issues in CSV output generation
  • netdata can now be restricted to listen on a specific IP (API and web server)

Core

  • added error log flood protection

Web Dashboard

  • better error handling when the netdata server is unreachable
  • each chart now has a toolbox
  • on-line help support
  • check for netdata updates button
  • added example /tv.html dashboard
  • now compiles with musl libc (alpine linux)

Packaging

  • added debian packaging
  • support non-root installations
  • the installer generates uninstall script