implement support for NUMA metrics #1552

Merged
merged 2 commits into from Jan 18, 2017

Projects

None yet

2 participants

@tycho
Contributor
tycho commented Jan 12, 2017

These changes implement support for NUMA locality statistics (from CONFIG_NUMA and CONFIG_NUMA_BALANCING).

Automatically detects if you have a system with more than one NUMA node and enables the graphs in such a case.

Here's a demo of the charts when running STREAM with 32GB of RAM and OpenMP support:

numa1

src/sys_devices_system_node.c
+ DIR *dir = opendir(dirname);
+ if(!dir) {
+ error("Cannot read NUMA node directory '%s'", dirname);
+ return;
@ktsaou
ktsaou Jan 12, 2017 Member

this should be return 0;

@ktsaou
ktsaou Jan 12, 2017 Member

travis is complaining about this too

@tycho
tycho Jan 12, 2017 Contributor

Good point. Fixing.

@ktsaou
Member
ktsaou commented Jan 12, 2017

nice work!

@tycho
Contributor
tycho commented Jan 12, 2017

Alright, fixed. Travis is passing too. 😄

@ktsaou
Member
ktsaou commented Jan 12, 2017

hm... check this:

==10302== 32,816 bytes in 1 blocks are definitely lost in loss record 10,777 of 10,790
==10302==    at 0x4C29F81: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==10302==    by 0x581F251: __alloc_dir (opendir.c:247)
==10302==    by 0x581F343: opendir_tail (opendir.c:145)
==10302==    by 0x46CF83: do_proc_vmstat (proc_vmstat.c:617)
==10302==    by 0x45723F: proc_main (plugin_proc.c:248)
==10302==    by 0x55545D4: start_thread (pthread_create.c:334)
==10302==    by 0x58537EC: clone (clone.S:109)

To reproduce:

  1. compile/install with CFLAGS="-O1 -ggdb -Wall -Wextra -fstack-protector-all -DNETDATA_INTERNAL_CHECKS=1" ./netdata-installer.sh

  2. start it with valgrind --leak-check=full ./src/netdata -D

  3. leave it running for a while, then hit control-C

@tycho
Contributor
tycho commented Jan 12, 2017

Whoops, forgot to closedir().

@tycho
Contributor
tycho commented Jan 12, 2017

Alright. closedir() added. Should be happy now.

@ktsaou
Member
ktsaou commented Jan 12, 2017

I would like to make a few changes, to optimize it a bit:

  1. in sys_devices_system_node.c turn int numa_node_count = 0; to a global variable.
  2. add it to plugin_proc.h as extern, so that it is exposed to all other modules
  3. remove the code to find numa_node_count from proc_vmstat.c
  4. make sure vmstat is run after numa in proc plugin (it is already).
  5. the line else if (strncmp(name, "numa_", 5) == 0) in do_proc_vmstat is very expensive. It is called for every line in the file. Given the above changes, I think it can be else if(unlikely(numa_node_count))
  6. the line if(do_numa == CONFIG_ONDEMAND_YES || (do_numa == CONFIG_ONDEMAND_ONDEMAND && numa_node_count > 1 && saw_numa)) { I think should be:
if(numa_node_count) {
  if(do_numa == CONFIG_ONDEMAND_YES || (do_numa == CONFIG_ONDEMAND_ONDEMAND && (numa_local || numa_foreign || numa_interleave || numa_other || numa_pte_updates || numa_huge_pte_updates || numa_hint_faults || numa_hint_faults_local || numa_pages_migrated))) {

The above will examine numa only if there are nodes and on-demand will be switched to true only if the collected values are not zero.

Do you agree?

PS: Which kernel version do you use? I see a lot of vmstat names are removed and new ones have been added.

@tycho
Contributor
tycho commented Jan 13, 2017

Happy to work on improving it!

1 through 4. I agree that I should unify the numa_node_count code, but I disagree with how. Instead of ensuring some kind of specific ordering, just let the numa_node_count value get populated by whichever site needs it first. Either way, the enumeration to determine numa_node_count would happen only once.
5. I think the whole proc_vmstat.c file needs some rework. Doing a linear search for integer hash matches is pretty costly too, especially since a lot of the lines in /proc/vmstat won't match any of the hashes. I'll consider how to change it and play around with that.
6. I disagree. Adding more conditions isn't going to help performance at all, and just makes it painful if we ever add more numa_* values. It'd be cheaper to just do the string compare at that rate.

I'm running Linux 4.9.2 currently. The values that I removed were effectively non-functional changes since they're not collected by netdata anyway. And I just added all the ones that were present on my kernel.

@tycho
Contributor
tycho commented Jan 13, 2017

By the way, the condition should definitely not be if(numa_node_count)! The > 1 is deliberate. To make it more explicitly clear, I'll change it to >= 2. The reason is this -- if CONFIG_NUMA is enabled in the kernel config, and the system is single-socket without NUMA, then the kernel will treat it as a single-node NUMA system. The NUMA stats in such a case are completely uninteresting since all memory accesses are "local".

@ktsaou
Member
ktsaou commented Jan 13, 2017 edited

ok nice!

Yesterday I reworked /proc/net/dev parsing, for speed: https://github.com/firehol/netdata/blob/master/src/proc_net_dev.c

You can also use avl trees. plugic_tc uses them: https://github.com/firehol/netdata/blob/master/src/plugin_tc.c

There is also a sequential array matching in https://github.com/firehol/netdata/blob/master/src/proc_net_netstat.c

On my tests, a system can do several million number comparisons per second, but a lot less string comparisons per second. Comparing a number is a lot more efficient, than calling a function.

@tycho
Contributor
tycho commented Jan 13, 2017

It's most often not a function call these days. The compiler has builtins for strcmp() and many other C standard library functions. But yes, a string compare cannot win against a single cmp instruction. That's one of the nice things about hashing (which we already do in there). The main issue is that every time we hit a line that fails to match a hash, we also run through numerous compare/branch pairs.

An AVL tree would be perfect, actually! I may rework the proc_vmstat.c code to use that. It'd likely be faster because of the fast-failure nature of binary search, and it'd probably be more maintainable and easier on the eyes... I'll have to dig into that option tomorrow. I need sleep right now, it's midnight here. 😄

@ktsaou
Member
ktsaou commented Jan 13, 2017 edited

ok. Keep in mind the proc_net_dev way might be optimal for vmstat (minimal comparisons, since the linked list is examined from the point the last metric was found: https://github.com/firehol/netdata/blob/1a83793af847d334c5f7efd74cb1eeca609c7796/src/proc_net_dev.c#L78-L84)

The challenge with vmstat, would be to maintain RRDIM pointers, so that updating the charts will not have to search for the metrics.

@ktsaou
Member
ktsaou commented Jan 14, 2017

I changed pluginc_proc.c and now this breaks. sorry... (plugin_proc is now a lot better).

@tycho
Contributor
tycho commented Jan 14, 2017

I fixed the conflicts for now.

I think the cleanup of proc_vmstat.c could be a separate task from this. I killed the offending strncmp(), but we're still linearly searching for hashes for the moment.

I do agree the proc_net_dev way of doing it sounds like it could be better than an AVL tree.

tycho added some commits Jan 12, 2017
@tycho tycho proc_vmstat.c: update commented list of metrics from /proc/vmstat, NFC
These are the metrics available on one of my systems running Linux 4.9.2
with this[1] kernel config.

This commit should not introduce any functional changes, and is only
a preparatory step for adding NUMA, NUMA balancing, and other stats.

[1] https://git.uplinklabs.net/steven/projects/archlinux/ec2/ec2-packages.git/tree/linux-ec2/config.x86_64?id=7a8cae7f38faf6f422095c53f2762092e788a44c

Signed-off-by: Steven Noonan <steven@uplinklabs.net>
cb3a270
@tycho tycho add support for tracking NUMA locality metrics
Note that the 'hit' and 'miss' metrics are not used, because they are
sums of other metrics in the same file.

Signed-off-by: Steven Noonan <steven@uplinklabs.net>
61b0832
@ktsaou ktsaou merged commit 2ae1525 into firehol:master Jan 18, 2017

2 checks passed

codeclimate no new or fixed issues
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@ktsaou
Member
ktsaou commented Jan 18, 2017

merged!
thank you!

@ktsaou
Member
ktsaou commented Jan 19, 2017 edited

I have been trying to find a way to speed up parsing of long name-value like files, like vmstat and memory_stat (cgroups).

Check this:

test1() in 1609349 usecs: simple system strcmp().
test2() in 591239 usecs: inline simple_hash() with system strtoull().
test3() in 584816 usecs: statement expression simple_hash(), system strtoull().
test4() in 596122 usecs: inline simple_hash(), if-continue checks.
test5() in 328788 usecs: inline simple_hash(), if-else-if-else-if (netdata default).
test6() in 91777 usecs: adaptive re-sortable array (wow!)

test2(), test3() and test4() are used in netdata.
With minor adjustments, we can move to test5() - I have started this.

But check test6 ! This one is completely adaptive to the contents of the file, by keeping a sorted linked list in memory, in the same order found in the file, and making the absolute minimum on each iteration!

sample here: https://ghostbin.com/paste/aydfv

cc: @simonnagl

@ktsaou
Member
ktsaou commented Jan 19, 2017 edited

test5() and test6() also use a custom str2ull() that can be inlined, to avoid the function call.

@ktsaou
Member
ktsaou commented Jan 19, 2017

the trick with test6() is that it sorts the linked list every time it reads a line from the file, so that after a single loop on the the entire file, the linked list matches exactly the order and the contents of the file (it adds entries in the linked list, as needed).

At the next iteration, the linked list matches exactly the contents of the file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment