Generalize kernel monitor. #44

Random-Liu · 2016-11-22T00:43:35Z

Discussed with @apatil.

The kernel monitor was initially introduced to monitor the kernel log and detect kernel issues.

However, it could in fact be extended to monitor other logs, such as the Docker log or the systemd log, by adding a new translator. This is already doable today, but not very intuitive because:

- All files, types, and functions are named kernel xxx.
- The translator is not configurable.

We should refactor the code to make the kernel monitor easier and more intuitive to extend:

- Rename kernelmonitor to logmonitor. For K8s we will only use the log monitor to monitor the kernel log, but it should be easy for other users to reconfigure and extend it to monitor other logs.
- After "logwatchers: add new kmsg-based kernel log watcher" #41 lands, extend the configuration to make the translator and log source configurable, including:
  - Make the journald log filter configurable.
  - Make the translate function configurable.
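To make the proposal concrete, here is a minimal Go sketch of a configurable log source and translate function. Everything in it is hypothetical: `LogMonitorConfig`, its JSON field names, the `translators` registry, and `LoadConfig` are illustrative names, not node-problem-detector's actual API or config schema. The only external fact it relies on is the `/dev/kmsg` record format, `priority,sequence,timestamp,flags;message`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// LogMonitorConfig is a HYPOTHETICAL schema for a generalized log monitor.
// Field names are illustrative, not node-problem-detector's real config.
type LogMonitorConfig struct {
	Source       string            `json:"source"`       // log source, e.g. "kmsg" or "journald"
	JournalMatch map[string]string `json:"journalMatch"` // journald filter, only meaningful for "journald"
	Translator   string            `json:"translator"`   // name of a registered translate function
}

// TranslateFunc converts one raw log line into the message that rules match against.
type TranslateFunc func(raw string) (string, error)

// translators maps configurable names to translate functions (hypothetical registry).
var translators = map[string]TranslateFunc{
	// "plain" passes the line through, only trimming surrounding whitespace.
	"plain": func(raw string) (string, error) {
		return strings.TrimSpace(raw), nil
	},
	// "kmsg" strips the "priority,sequence,timestamp,flags;" prefix of /dev/kmsg records.
	"kmsg": func(raw string) (string, error) {
		_, msg, ok := strings.Cut(raw, ";")
		if !ok {
			return "", fmt.Errorf("malformed kmsg record: %q", raw)
		}
		return strings.TrimSpace(msg), nil
	},
}

// LoadConfig parses a JSON config and resolves the translator by name.
func LoadConfig(data []byte) (LogMonitorConfig, TranslateFunc, error) {
	var cfg LogMonitorConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, nil, err
	}
	tr, ok := translators[cfg.Translator]
	if !ok {
		return cfg, nil, fmt.Errorf("unknown translator %q", cfg.Translator)
	}
	return cfg, tr, nil
}

func main() {
	data := []byte(`{"source": "kmsg", "translator": "kmsg"}`)
	cfg, translate, err := LoadConfig(data)
	if err != nil {
		panic(err)
	}
	msg, err := translate("6,339,5140900,-;NET: Registered protocol family 10")
	if err != nil {
		panic(err)
	}
	fmt.Printf("source=%s message=%q\n", cfg.Source, msg)
}
```

With a registry like this, pointing the monitor at the journal would be a config change, for example `{"source": "journald", "journalMatch": {"_TRANSPORT": "kernel"}, "translator": "plain"}`, rather than a code change. The sketch only resolves the translator; actually reading from the source named in `source` is left out.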

/cc @kubernetes/sig-node


jfilak · 2016-11-22T09:42:24Z

With the systemd-journal plugin enabled, you can monitor the node for user-space core files (generated either by ABRT or by systemd-coredump). ABRT also logs uncaught Python, Ruby, and Java exceptions to the systemd journal (systemd developers are working on their own Python exception handler). In the kernel you don't need to limit yourself to oopses; you can also detect hardware issues in the form of MCEs and pstore oopses.
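As a rough illustration of the signals above, detection could be expressed as a list of regular-expression rules over translated log messages. This is a Go sketch under stated assumptions: `Rule`, `rules`, and `Match` are hypothetical names, and the three patterns are illustrative guesses at common message shapes (a kernel oops, an MCE report, and the systemd-coredump summary line), not a tested rule set. Exact wording varies across kernel and systemd versions.

```go
package main

import (
	"fmt"
	"regexp"
)

// Rule pairs a problem reason with a pattern over translated log messages.
// HYPOTHETICAL: the patterns are illustrative, not a vetted rule set.
type Rule struct {
	Reason  string
	Pattern *regexp.Regexp
}

var rules = []Rule{
	// Kernel oops, e.g. "BUG: unable to handle kernel NULL pointer dereference at ...".
	{Reason: "KernelOops", Pattern: regexp.MustCompile(`^BUG: unable to handle kernel`)},
	// Machine check exceptions reported by the kernel's MCE code.
	{Reason: "MachineCheckException", Pattern: regexp.MustCompile(`mce: \[Hardware Error\]`)},
	// systemd-coredump summary line for a crashed user-space process.
	{Reason: "UserSpaceCoreDump", Pattern: regexp.MustCompile(`^Process \d+ \(.+\) of user \d+ dumped core`)},
}

// Match returns the reason of the first rule that matches msg, or "" if none does.
func Match(msg string) string {
	for _, r := range rules {
		if r.Pattern.MatchString(msg) {
			return r.Reason
		}
	}
	return ""
}

func main() {
	for _, msg := range []string{
		"mce: [Hardware Error]: Machine check events logged",
		"Process 4242 (python3) of user 1000 dumped core.",
		"NET: Registered protocol family 10",
	} {
		fmt.Printf("%q -> %q\n", msg, Match(msg))
	}
}
```

Keeping the rules as data rather than code is what would let the default set stay small while users append bare-metal rules such as the MCE one.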

Random-Liu · 2016-12-15T20:03:09Z

@jfilak Thanks for your suggestion.

On one hand, we'd like NPD to detect more issues. As you mentioned, MCEs and pstore are both very useful signals for bare metal.
On the other hand, we'd also like to keep the default behavior minimal so that it stays lightweight. After all, problems are rare, so we should only catch the problems people really care about and avoid consuming too many resources.

That's why we want to make NPD as configurable and pluggable as possible. I'd like to:

- Make the kernel log monitor more generic without sacrificing efficiency, so that people can use it to monitor different kinds of system logs if they really need a last line of defense. However, the default behavior should be to monitor only the kernel log to detect known kernel issues. If you have configurations or code for bare metal or other environments, you are welcome to share them!
- Integrate with different problem detection solutions.

Any ideas and suggestions are welcome!

Random-Liu · 2017-02-07T01:23:20Z

We plan to add arbitrary log support in 1.6. The following work items need to be done:

- Refactor the kernel monitor to support arbitrary log monitoring: "Add arbitray system log support" #88
- Refactor the code to get rid of kernel-log-specific names: "Generalize the kernel monitor code." #92
- Add multiple log watcher support, so that NPD can watch the logs of different system daemons and detect problems: "Add multiple system log monitor support" #94

Random-Liu · 2017-02-27T22:49:46Z

Closing since this is done.

Random-Liu added the enhancement label Nov 22, 2016

Random-Liu mentioned this issue Dec 8, 2016: Journald support #39 (Merged)

Random-Liu mentioned this issue Jan 7, 2017: NPD Kubernetes 1.6 Planning #58 (Closed, 11 tasks)

This was referenced Feb 3, 2017:

- Add arbitray system log support #88 (Merged)
- Generalize the kernel monitor code. #92 (Merged)

Random-Liu mentioned this issue Feb 8, 2017: Add multiple system log monitor support #94 (Merged)

Random-Liu closed this as completed Feb 27, 2017
