Faster parents #16127
Merged
github-actions bot added the area/collectors, area/database, collectors/plugins.d, area/daemon, area/tests and area/streaming labels on Oct 4, 2023
ktsaou force-pushed the faster-parents branch from 0d97a1d to a0b3a1d on October 23, 2023 08:52
…e pluginsd_acquire_dimension()
…during reading them
…ms if the buffer contains DATA only.
… sender buffer has been committed, so that replication will not send dimensions prematurely
ktsaou force-pushed the faster-parents branch from 1ad4751 to ddb1fda on October 25, 2023 17:54
stelfrag approved these changes on Oct 27, 2023
Labels: area/collectors, area/daemon, area/database, area/exporting, area/streaming, area/tests, collectors/cgroups, collectors/diskspace, collectors/plugins.d, collectors/proc
Current master on parent receiver: [chart]
This PR on parent receiver: [chart]
To understand the difference, compare the width of quoted_strings_splitter() in the two charts. In the second chart, this function (which has not changed in this PR) is many times wider than in the first. Since its own cost is unchanged, its larger share of the total means the rest of the code now takes much less time, i.e. it is many times faster.
Simple optimizations to increase the efficiency of busy parents.
- rd and rd->id (as const char) together with rda, for speeding up dimension lookup by pluginsd.
- collected flags inside RRDDIM and RRDSET, to avoid calling rrdcontexts to update the collected status on every data collection.
- SLOTS. The new protocol requires the sender to uniquely number all the RRDSET and RRDDIM it sends. The receiver uses these numbers to quickly find the RRDSET and RRDDIM pointers.

Comparison: 2.7 million metrics per second, Netdata vs Prometheus
In this setup, both Netdata and Prometheus are configured to collect the same 2.5 million metrics per second from 500 Netdata children. To compare equivalent functionality, we disabled ML and Health in netdata.conf on the Netdata parent.
CPU utilization
Prometheus has a huge CPU spike every 2 minutes, using almost all the CPU cores available on the system (both VMs have 24 cores).
Memory consumption
As far as memory consumption is concerned: [chart]
Disk footprint
Netdata's total disk footprint, including data and metadata, is 1 TiB.
Disk I/O
Each VM has its own physical disk, so that we can measure the disk I/O of each VM separately. In the following screenshot, Prometheus is using sdd and Netdata is using zd16: [screenshot]
As you can see, Prometheus is really stressing the disks at this scale, possibly due to its WAL. Netdata achieves the same safety against data loss by re-streaming its metrics to another Netdata Parent (when configured to do so).
Network bandwidth
Netdata receives 380 Mbps.
Prometheus receives 240 Mbps.
Netdata uses LZ4 compression on a much more compact protocol, while Prometheus uses gzip/deflate on a chattier one. However, gzip achieves a considerably higher compression ratio than LZ4.
In PR #16268 we add ZSTD streaming support to Netdata, to see how its bandwidth changes.