
Generalize kernel monitor. #44

Closed
Random-Liu opened this issue Nov 22, 2016 · 4 comments

@Random-Liu
Member

Discussed with @apatil.

The kernel monitor was initially introduced to monitor the kernel log and detect kernel issues.

However, it could be extended to monitor other logs, such as the Docker log or the systemd log, by adding a new translator. This is already possible today, but it is not very intuitive because:

  1. All files, types, and functions are named kernel xxx.
  2. The translator is not configurable.

We should refactor the code to make the kernel monitor easier and more intuitive to extend:

  • Rename kernelmonitor to logmonitor. For K8s we'll only use the log monitor to monitor the kernel log, but it should be easy for other users to reconfigure and extend it to monitor other logs.
  • After #41 (logwatchers: add new kmsg-based kernel log watcher) has landed, extend the configuration to make the translator and the log source configurable, including:
    • Make the journald log filter configurable.
    • Make the translate function configurable.

/cc @kubernetes/sig-node

@jfilak

jfilak commented Nov 22, 2016

With the systemd-journal plugin enabled, you can monitor the node for user-space core files (generated by either ABRT or systemd-coredump). ABRT also logs uncaught Python, Ruby, and Java exceptions to the systemd journal (systemd developers are working on their own Python exception handler). In the kernel, you don't need to limit yourself to oopses; you can also detect hardware issues in the form of MCEs (machine check exceptions) and pstore oopses.

@Random-Liu
Member Author

Random-Liu commented Dec 15, 2016

@jfilak Thanks for your suggestion.

  • On one hand, we'd like NPD to detect more issues. As you mentioned, MCEs and pstore are both very useful signals on bare metal.
  • On the other hand, we'd also like to keep the default behavior minimal so that NPD stays lightweight. After all, problems are rare, so we should only catch the problems people really care about and avoid consuming too many resources.

That's why we want to make NPD as configurable and pluggable as possible. I'd like to:

  • Make the kernel log monitor more generic without sacrificing efficiency, so that people can use it to monitor different kinds of system logs if they really need a last line of defense. However, the default behavior should be to monitor only the kernel log to detect known kernel issues. If you have configurations or code for bare metal or other environments, please share them!
  • Integrate with different problem detection solutions.
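The "minimal by default, opt-in for more" idea can be sketched in Go with a simple regex rule model. The rule names, the `defaultRules`/`bareMetalRules` split, and the log patterns (a kernel oops prefix and an `mce: [Hardware Error]` prefix) are illustrative assumptions, not NPD's shipped configuration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Rule maps a problem reason to a pattern (names and patterns are assumptions).
type Rule struct {
	Reason  string
	Pattern *regexp.Regexp
}

// defaultRules covers only known kernel issues, keeping the default monitor light.
var defaultRules = []Rule{
	{Reason: "KernelOops", Pattern: regexp.MustCompile(`BUG: unable to handle kernel`)},
}

// bareMetalRules is a hypothetical opt-in set, e.g. for machine check exceptions.
var bareMetalRules = []Rule{
	{Reason: "MachineCheck", Pattern: regexp.MustCompile(`mce: \[Hardware Error\]`)},
}

// activeRules returns a copy of the defaults plus any opt-in rule sets.
func activeRules(extra ...[]Rule) []Rule {
	rules := append([]Rule(nil), defaultRules...)
	for _, set := range extra {
		rules = append(rules, set...)
	}
	return rules
}

// detect returns the reason of the first rule matching line, or "" if none match.
func detect(rules []Rule, line string) string {
	for _, r := range rules {
		if r.Pattern.MatchString(line) {
			return r.Reason
		}
	}
	return ""
}

func main() {
	line := "mce: [Hardware Error]: Machine check events logged"
	fmt.Printf("default: %q\n", detect(activeRules(), line))
	fmt.Printf("bare metal: %q\n", detect(activeRules(bareMetalRules), line))
}
```

This prints `default: ""` followed by `bare metal: "MachineCheck"`: the MCE line is only reported once the user opts in, so nodes that don't need the extra rules pay nothing for them.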

Any ideas and suggestions are welcome!

@Random-Liu
Member Author

Random-Liu commented Feb 7, 2017

We plan to add arbitrary log support in 1.6. The following work items need to be done:

@Random-Liu
Member Author

Closing since this is done.


2 participants