
boot time (as reported by systemd-analyze) charges systemd startup time to the kernel #17175

Open
joshtriplett opened this issue Sep 27, 2020 · 3 comments
Labels
needs-reporter-feedback ❓ (There's an unanswered question, the reporter needs to answer), pid1

Comments

@joshtriplett
Contributor

systemd version the issue has been seen with

systemd 246 (246.6-1)

Used distribution

Latest Debian sid

The boot time statistics, as reported by systemd-analyze, treat the time at which systemd is up and running as the time at which the kernel finished booting. However, these two moments can differ substantially if systemd takes a while to initialize, for example if reading systemd and its dependencies from disk takes time. For instance:

...
[    0.060286] Run /sbin/init as init process
[    0.060286]   with arguments:
[    0.060287]     /sbin/init
[    0.060287]   with environment:
[    0.060287]     HOME=/
[    0.060288]     TERM=linux
[    0.060288]     BOOT_IMAGE=/vmlinuz
[    1.059489] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2408f24b9db, max_idle_ns: 440795321801 ns
[    1.059509] clocksource: Switched to clocksource tsc
[    3.068643] systemd[1]: systemd 246.6-1 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
[    3.068658] systemd[1]: Detected virtualization kvm.
...

In this case, the kernel booted in 60ms and started running /sbin/init. But systemd-analyze reports:

Startup finished in 3.057s (kernel) + 1.109s (userspace) = 4.167s 

The time shown as "kernel" here is primarily counting the time taken for systemd to load.

@poettering
Member

we take the timestamps that measure this very early in main(). there's really nothing much we do in between. Loading of unit files is much later. In fact the "systemd … running in system mode" message is also printed long before we load any unit files.

Maybe the clock source change you are seeing in the logs is messing up timing?

Otherwise not sure where the discrepancy might come from. I mean, in the time between the kernel's PID 1 execution and us getting control in main() there's really just the ELF interpreter running; that can't take 3s, though?

Or maybe some stupid lib is using gcc constructors to do some nonsensical crypto validation before we even start? I know that on rhel there was some nonsense like that in place once upon a time?

Basically, we issue clock_gettime() with CLOCK_MONOTONIC really early on, under the assumption it started at zero when the kernel initialized. Maybe that assumption is wrong on your system?
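The measurement described above can be sketched in a few lines of Python (a stand-in for systemd's C code, which calls clock_gettime() directly):

```python
import time

# systemd records a timestamp like this very early in main(), assuming
# CLOCK_MONOTONIC started at (roughly) zero when the kernel initialized.
# On Linux that assumption generally holds, so this value approximates
# "seconds since kernel init" -- but it cannot account for anything that
# happened before the process reached main(), such as the dynamic
# linker's work.
userspace_start = time.clock_gettime(time.CLOCK_MONOTONIC)
print(f"userspace timestamp: {userspace_start:.3f}s after kernel init")
```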

@poettering poettering added the needs-reporter-feedback ❓ and pid1 labels Oct 1, 2020
@poettering
Member

is this a system with an initrd? maybe the initrd spends that time on something and doesn't mention that in dmesg?

/cc @mbiebl, given this is debian?

@joshtriplett
Contributor Author

joshtriplett commented Oct 1, 2020

@poettering
No initrd here; I'm running a custom-built kernel with all the necessary drivers built in, and no module support at all. And the clocksource message is the normal switch from tsc-early to tsc; it didn't affect the timing.

Dynamically linked systemd requires loading various libraries that are being read for the first time, which means many reads that have to go all the way to the disk. Many cloud VMs have throttled disk I/O. In addition, some cloud systems booting for the first time initialize their disk lazily from a snapshot image, fetching pieces of the disk on demand as they are read.

As a simple way to reproduce, try booting a disk image that uses systemd in qemu, with bps_rd=5000000 (5MB/s) appended to the -drive setting. That makes the dynamic linker take about 2.5s to load systemd from disk, and systemd-analyze will attribute that time to the kernel.

I can confirm, based on watching the console, that the timestamps in the log appear to accurately reflect when they were printed.

How about this: grab the current timestamp at the start of main as you currently do, but also, once /proc is mounted, read /proc/1/stat, which includes a field for "time the process started after system boot". That'll give you the timestamp from the moment PID 1 was started, before the dynamic linker ran. (It is, unfortunately, in units of USER_HZ, which on most architectures is just centiseconds, but it's still useful.) That timestamp gives you the "kernel" time, and then the delta from that to the timestamp at the start of main gives you how long it took the loader to get systemd and its dependencies from disk.
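The suggested approach can be sketched in Python (a hypothetical helper, not systemd's actual code, which would do this in C):

```python
import os
import time

def starttime_after_boot(pid):
    """Seconds after kernel boot at which `pid` was started, from /proc."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (comm) can contain spaces and parentheses, so split on the
    # *last* ')' and count fields from there: the first field after it is
    # field 3 (state), making "starttime" (field 22) index 19.
    fields = stat.rsplit(")", 1)[1].split()
    starttime_ticks = int(fields[19])
    # starttime is in USER_HZ clock ticks, typically 100/s (centiseconds).
    return starttime_ticks / os.sysconf("SC_CLK_TCK")

# "Kernel" time: the moment PID 1 was exec'd, before the dynamic linker ran.
kernel_time = starttime_after_boot(1)

# In systemd itself, the delta between the CLOCK_MONOTONIC timestamp taken
# at the start of main() and PID 1's starttime would be the loader time;
# here we merely demonstrate that the two clocks are comparable.
now = time.clock_gettime(time.CLOCK_MONOTONIC)
print(f"PID 1 started {kernel_time:.2f}s after boot")
```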

(I can also think of several ways to reduce that time, but measuring it is the first step.)
