OSD daemon logs are not being collected #2479
Comments
I'm unable to repro this with Rook master and Ceph 13.2.2. Is this an issue with Rook 0.9.1?
Do you see any output in the osd log after the
Here is a pastebin of the log from one of my osd containers. https://paste.opensuse.org/view//1664407a It seems to me that this is what I would expect, but maybe there is data missing I don't see. My pods are running bluestore with
I think the issue here is that the osd spec should use
@travisn Is the intention for OSDs not to produce any log files, and instead to send all logging output to stdout/stderr to be collected by k8s logging?
@noahdesu Yes, collecting all the log output would be the goal. Files inside the pod are more easily lost when the pod is restarted.
The logging problem might just be that the default logging level is 0, which seems far too low. It seems like we should remove these default logging levels and rely on the default ceph logging levels. @liewegas, thoughts on this? Note that the translation to level 0 happens here, which was an attempt to make it simpler to enable debug logging and to make sure the logs weren't filling up too quickly. Level 0 seems very low, though. Perhaps we should only set the log levels through the ceph.conf override ability, which we already have. #2496 still looks like a good idea to make sure we're capturing all the logging.
Yes, please just use the ceph defaults.
The issue seems to be in the following scenario, only affecting a certain type of osd:
Rook seems to be losing the stderr that is written by the The workaround to collect osd logs is to set the
It looks like this problem is related to mine, #2547. It looks like the
Ok, I think I found a bug: you mentioned
but the implementation is the following:
which has the following line: https://github.com/rook/rook/blob/master/pkg/util/exec/exec.go#L202 in := bufio.NewScanner(io.MultiReader(stdout, stderr)) The problem with it is that So the first reader (in this case So lines from
@noahdesu Right, I did not notice that a PR had already been sent to address it, sorry.
Does this need to be backported to 0.9? It's in the 0.9 project.
Yes
io.MultiReader(r1, r2) reads r1 until EOF before moving on to r2. When it is used for stdout/stderr, stderr is not written to the log until stdout reaches EOF. This patch reads stderr in a goroutine so that the two streams can be interleaved properly.
Fixes: rook#2479
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
(cherry picked from commit fbb56c4)
Is this a bug report or feature request?
Deviation from expected behavior:
Since the decoupling of the ceph version, the ceph-osd log output is not being captured in the k8s logs. In the osd logs we see the output of ceph-volume and that the osd process is started, but nothing after that. It seems to be an issue with rook starting the ceph-osd process and not capturing the stderr output.
Expected behavior:
OSD logging should be captured.
How to reproduce it (minimal and precise):