journalctl --list-boots fails with v232-v234 again (was working in v231) #6447
Comments
Well, if you can't share the files, you could try a git bisect. In any case, you might want to make a copy of the files to prevent them from getting rotated away.
It seems the changes made to journal-file.c for v232 caused the problem. When I reverted journal-file.c to the v231 code, list-boots worked again for the corrupted files.
That doesn't help. Of course it would be around there, but there are 10 commits in between v231 and v232. You'll need to run a git bisect. I can help you with that if you don't know how.
I now used git bisect for the first time.
@poettering any thoughts?
The problematic journal file gives a verify error:
And when printing logs with journalctl, a strange lone "--Reboot--" line appears as the last line of output. The other corrupted journal files (all of them also produced by the user running "reboot -f") do not show this lone "--Reboot--" line at the end of the journalctl output.
Any chance you can provide me with the offending journal file? That'd be easiest to track down this issue.
I cannot provide the offending journal file as that has company confidential material. I have not yet managed to reproduce this in normal fedora.
I fear I can't debug this without the journal file :-( Not sure what else we can do on this one...
In journalctl.c, the get_boots functions always give up on error. Can we change the code to still output the data that had been collected before hitting the first error?
At least in my problem case, the boot ids were printed with this change.
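The proposal above, keeping whatever was collected before the first error instead of discarding it, can be illustrated with a minimal sketch. This is not the journalctl.c code: every name here (sketch_collect, sketch_next_fn, sketch_source, sketch_array_next) is hypothetical, and the array-backed source merely stands in for iterating over journal entries.

```c
#include <stddef.h>

/* Hypothetical iterator callback: returns 1 and fills *out when a record
 * was read, 0 at end of data, or a negative errno-style code on error. */
typedef int (*sketch_next_fn)(void *state, unsigned *out);

/* Collect records until end of data, an error, or a full buffer.
 * *n_out is always set, so the caller can still print the records gathered
 * before the failure. Returns 0 on success or the first negative error. */
static int sketch_collect(sketch_next_fn next, void *state,
                          unsigned *buf, size_t cap, size_t *n_out) {
    size_t n = 0;
    int r = 0;

    while (n < cap) {
        unsigned v;

        r = next(state, &v);
        if (r <= 0)
            break;          /* end of data (0) or error (< 0) */
        buf[n++] = v;
    }

    *n_out = n;
    return r < 0 ? r : 0;
}

/* Hypothetical demo source: walks an int array and treats a negative
 * value as a read error with that code. */
struct sketch_source {
    const int *values;
    size_t len;
    size_t pos;
};

static int sketch_array_next(void *state, unsigned *out) {
    struct sketch_source *s = state;
    int v;

    if (s->pos >= s->len)
        return 0;
    v = s->values[s->pos++];
    if (v < 0)
        return v;
    *out = (unsigned) v;
    return 1;
}
```

A caller would print the first *n_out records and then report the error code. Note that this only works around the symptom; it does not stop the error from being raised for a recoverable corruption in the first place.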
But we really shouldn't generate the error in the first place if it's recoverable... Any chance you can add
What is this "--verbose" parameter? There is no such parameter in journalctl.
Ah sorry, I was a bit confused... I meant to say, please set SYSTEMD_LOG_LEVEL=debug: # SYSTEMD_LOG_LEVEL=debug journalctl …
Here is the output.
Ok, I moved the #undef/#define after all the #include lines, and that changed the output.
Is there going to be any progress?
Hello,
@shibumi it's beside the point whether they are corrupted or not, journalctl is supposed to deal with all kinds of corruptions implicitly and silently and make the best of it.
@poettering Maybe this issue has something to do with this thread: #4088
Hi, I got access to the journal logs (I work for the same company). In this case there are 20+ journal files in total, and one of the corrupted files is causing
The last partially written entry in the file starts at offset eecfe8, and the bytes are:
If I get everything right, this is parsed as:
It looks like
As a quick test, I copy-pasted the OBJECT_ENTRY checks from
With this change,
Any comments? Do we just create a pull request for this change proposal?
Your patch looks excellent. In the long run we really should add similar validators for all object types. And yes, please submit this patch as PR. The patch as it is looks pretty flawless, except maybe that I'd split out these checks into a function of its own that journal_file_move_to_object() just calls if the type matches. Thank you for tracking this down and prepping the fix!
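To make the idea concrete, here is a minimal sketch of a lightweight check for an entry object, split into its own function as suggested. It is not the systemd patch: the struct layout is simplified, host byte order is assumed (the on-disk format is little-endian), and all names (sketch_header, sketch_item, sketch_entry_fixed, sketch_check_entry, SKETCH_OBJECT_ENTRY) are hypothetical. The specific conditions are assumptions about what a cheap sanity check might reject: a size that leaves no whole items, or zeroed sequence number and timestamp fields, as in an entry whose header was written but whose body is still zero bytes.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical type tag for entry objects in this sketch. */
enum { SKETCH_OBJECT_ENTRY = 3 };

/* Simplified stand-ins for the on-disk structures (assumed layout). */
struct sketch_header {
    uint8_t  type;
    uint8_t  flags;
    uint8_t  reserved[6];
    uint64_t size;          /* total object size, header included */
};

struct sketch_item {
    uint64_t object_offset;
    uint64_t hash;
};

/* Fixed part of an entry; a list of sketch_item records would follow. */
struct sketch_entry_fixed {
    struct sketch_header header;
    uint64_t seqnum;
    uint64_t realtime;
    uint64_t monotonic;
    uint8_t  boot_id[16];
    uint64_t xor_hash;
};

/* Cheap sanity check: returns 0 if the object looks plausible (or is not
 * an entry at all), -EBADMSG if it is an entry that cannot be valid. */
static int sketch_check_entry(const struct sketch_entry_fixed *e) {
    uint64_t fixed = sizeof(struct sketch_entry_fixed);
    uint64_t payload;

    if (e->header.type != SKETCH_OBJECT_ENTRY)
        return 0;                       /* other types: not checked here */

    if (e->header.size < fixed)
        return -EBADMSG;                /* too small for the fixed part */

    payload = e->header.size - fixed;
    if (payload % sizeof(struct sketch_item) != 0)
        return -EBADMSG;                /* items must fill the payload exactly */
    if (payload / sizeof(struct sketch_item) == 0)
        return -EBADMSG;                /* an entry needs at least one item */

    if (e->seqnum == 0 || e->realtime == 0)
        return -EBADMSG;                /* zeroed body: partially written */

    return 0;
}
```

The check touches only fields already in memory, which is why this kind of validation can run on every object lookup without following item offsets or recomputing hashes.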
Introduce journal_file_check_object(), which does lightweight object sanity checks, and use it in journal_file_move_to_object(), so that we will catch certain corrupted objects in the journal file.

This fixes systemd#6447, where we had only partially written out OBJECT_ENTRY (ObjectHeader written, but rest of object zero bytes), causing "journalctl --list-boots" to fail.

    $ builddir.vanilla/journalctl --list-boots -D bug6447/
    Failed to determine boots: No data available

    $ builddir.patched/journalctl --list-boots -D bug6447/
    -52 22633da1c5374a728d6c215e2c301dc2 Mon 2017-07-10 05:29:21 EEST—Mon 2017-07-10 05:31:51 EEST
    -51 2253aab9ea7e4a2598f2abda82939eff Mon 2017-07-10 05:32:22 EEST—Mon 2017-07-10 05:36:49 EEST
    -50 ef0d85d35c74486fa4104f9d6391b6ba Mon 2017-07-10 05:40:33 EEST—Mon 2017-07-10 05:40:40 EEST
    [...]

Note that journal_file_check_object() is similar to journal_file_object_verify(). The most expensive checks are omitted, as they would slow down every journal_file_move_to_object() call too much.

With this implementation, the added overhead is small, for example when dumping some journal content to /dev/null (built with -Dbuildtype=debugoptimized -Db_ndebug=true):

    Performance counter stats for 'builddir.vanilla/journalctl -D 76f4d4c3406945f9a60d3ca8763aa754/':

        12542,311634  task-clock:u (msec)   #    1,000 CPUs utilized
                   0  context-switches:u    #    0,000 K/sec
                   0  cpu-migrations:u      #    0,000 K/sec
              80 100  page-faults:u         #    0,006 M/sec
      41 786 963 456  cycles:u              #    3,332 GHz
     105 453 864 770  instructions:u        #    2,52  insn per cycle
      24 342 227 334  branches:u            # 1940,809 M/sec
         105 709 217  branch-misses:u       #    0,43% of all branches

        12,545199291 seconds time elapsed

    Performance counter stats for 'builddir.patched/journalctl -D 76f4d4c3406945f9a60d3ca8763aa754/':

        12734,723233  task-clock:u (msec)   #    1,000 CPUs utilized
                   0  context-switches:u    #    0,000 K/sec
                   0  cpu-migrations:u      #    0,000 K/sec
              80 693  page-faults:u         #    0,006 M/sec
      42 661 017 429  cycles:u              #    3,350 GHz
     107 696 985 865  instructions:u        #    2,52  insn per cycle
      24 950 526 745  branches:u            # 1959,252 M/sec
         101 762 806  branch-misses:u       #    0,41% of all branches

        12,737527327 seconds time elapsed

Fixes systemd#6447.
journal: add object sanity check to journal_file_move_to_object() (#6447)
Submission type
systemd version the issue has been seen with
systemd v233/v234
Used distribution
company specific.
In case of bug report: Expected behaviour you didn't see
journalctl --list-boots should print all boot ids.
In case of bug report: Unexpected behaviour you saw
journalctl --list-boots returns error:
Failed to determine boots: No data available
In case of bug report: Steps to reproduce the problem
Corrupt the journal files by running "reboot -f",
then try to list boots.
With the old systemd v231, the same journal files are parsed correctly and the boot ids are listed correctly. With systemd v232, v233, and v234, the error above occurs and no boot ids are listed.
The binary journal files cannot be shared because of company confidentiality. If this can be reproduced on a plain Fedora build, I can share those files.