
Boot time regressed 20 times after upgrading from 244.1 to 244.2 #14828

Closed
andreesteve opened this issue Feb 8, 2020 · 71 comments · Fixed by #15627 or #16281
Labels
regression ⚠️ A bug in something that used to work correctly and broke through some recent commit

Comments

@andreesteve

systemd version the issue has been seen with

244.2

Used distribution

ArchLinux - Linux scout 5.5.2-arch1-1 SMP PREEMPT Tue, 04 Feb 2020 18:56:18 +0000 x86_64 GNU/Linux

Expected behaviour you didn't see

3 seconds user space boot time

Unexpected behaviour you saw

40 seconds user space boot time

Steps to reproduce the problem

sudo pacman -U /var/cache/pacman/pkg/systemd-244.2-1-x86_64.pkg.tar.zst && reboot

Boot times & critical chain on 244.1

[andre@scout ~]$ systemd-analyze 
Startup finished in 5.454s (kernel) + 2.843s (userspace) = 8.297s 
graphical.target reached after 2.843s in userspace

[andre@scout ~]$ systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

graphical.target @2.843s
└─multi-user.target @2.842s
  └─systemd-logind.service @1.692s +1.149s
    └─basic.target @1.685s
      └─sockets.target @1.685s
        └─org.cups.cupsd.socket @1.685s
          └─sysinit.target @1.679s
            └─systemd-timesyncd.service @1.461s +218ms
              └─systemd-tmpfiles-setup.service @1.433s +24ms
                └─local-fs.target @1.432s
                  └─mnt-data.mount @1.393s +39ms
                    └─systemd-fsck@dev-disk-by\x2duuid-693c2dae\x2d3f8d\x2d4825\x2dbfdf\x2d9268a8b9584b.service @1.305s +86ms
                      └─local-fs-pre.target @1.304s
                        └─lvm2-monitor.service @211ms +1.092s
                          └─lvm2-lvmetad.service @227ms
                            └─systemd-journald.socket @208ms
                              └─-.mount @198ms
                                └─system.slice @198ms
                                  └─-.slice @198ms

Boot times & critical chain on 244.2

[andre@scout ~]$ systemd-analyze 
Startup finished in 9.144s (kernel) + 40.945s (userspace) = 50.089s 
graphical.target reached after 40.944s in userspace

[andre@scout ~]$ systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

graphical.target @40.944s
└─multi-user.target @40.943s
  └─systemd-logind.service @38.863s +2.077s
    └─basic.target @38.855s
      └─sockets.target @38.855s
        └─org.cups.cupsd.socket @38.854s
          └─sysinit.target @38.848s
            └─systemd-update-done.service @38.257s +590ms
              └─systemd-journal-catalog-update.service @35.780s +2.475s
                └─systemd-tmpfiles-setup.service @33.729s +2.050s
                  └─local-fs.target @33.728s
                    └─mnt-gamessd.mount @33.720s +8ms
                      └─systemd-fsck@dev-disk-by\x2duuid-f9ac8e20\x2d5991\x2d4af3\x2d8763\x2d0c7a693bb7db.service @30.269s +3.449s
                        └─local-fs-pre.target @30.262s
                          └─lvm2-monitor.service @7.747s +22.514s
                            └─lvm2-lvmetad.service @7.758s
                              └─systemd-journald.socket @7.743s
                                └─system.slice @7.732s
                                  └─-.slice @7.732s
@floppym (Contributor) commented Feb 8, 2020

Version numbers 244.1 and 244.2 are meaningless outside of Arch Linux. They likely contain some number of back-ported changes.

Please attempt to identify the specific commit that introduced the problem.

@boucman (Contributor) commented Feb 8, 2020

That's not true: since 243, systemd-stable does tag .1 and .2 releases, specifically for this kind of bug report.

That said, Arch might carry some specific patches.

@floppym (Contributor) commented Feb 9, 2020

> That's not true: since 243, systemd-stable does tag .1 and .2 releases, specifically for this kind of bug report.

Ah ha, that's a nice improvement. It really deserves an announcement on the mailing list.

@andreesteve (Author)

@floppym - after a bit of git bisect and a few reboots, here's the result.

[andre@scout systemd-stable]$ git bisect bad
84c048799a78a121c0536bf947b02c99093d3bfc is the first bad commit
commit 84c048799a78a121c0536bf947b02c99093d3bfc
Author: Arian van Putten <arian.vanputten@gmail.com>
Date:   Wed Jan 15 17:10:11 2020 +0100

    Disable reading SystemdOptions EFI Var when in SecureBoot mode
    
    In SecureBoot mode this is probably not what you want. As your cmdline
    is cryptographically signed like when using Type #2 EFI Unified Kernel
    Images (https://systemd.io/BOOT_LOADER_SPECIFICATION/) The user's
    intention is then that the cmdline should not be modified.  You want to
    make sure that the system starts up as exactly specified in the signed
    artifact.
    
    (cherry picked from commit c7d26acce6dcb0e72be6160873fac758e9b7c440)

 src/basic/efivars.c      | 36 ++++++++++++++++++++++++++++++++++++
 src/basic/efivars.h      | 16 ++++++++++++++++
 src/basic/proc-cmdline.c | 16 ++++++++++++++--
 src/shared/efi-loader.c  | 34 ----------------------------------
 src/shared/efi-loader.h  | 15 ---------------
 5 files changed, 66 insertions(+), 51 deletions(-)
[andre@scout systemd-stable]$ git bisect log
git bisect start
# bad: [77c04ce5c2709c1d51bf4c0e9d443c7d76546537] hwdb: update to v245-rc1
git bisect bad 77c04ce5c2709c1d51bf4c0e9d443c7d76546537
# good: [639dc9f4bfd2c09535bee079ae9bc7006b520a66] network: set ipv6 mtu after link-up or device mtu change
git bisect good 639dc9f4bfd2c09535bee079ae9bc7006b520a66
# good: [e6d694254fe115cc04852732172959998e051b87] mount: mark an existing "mounting" unit from /proc/self/mountinfo as "just_mounted"
git bisect good e6d694254fe115cc04852732172959998e051b87
# bad: [d7ede1ade56426db83523523e473ed52133c39af] generator: order growfs for the root fs after systemd-remount-fs
git bisect bad d7ede1ade56426db83523523e473ed52133c39af
# bad: [e5f2d11489ec4852f1ad45a1271f502d20602126] tree-wide: we forgot to destroy some bus errors
git bisect bad e5f2d11489ec4852f1ad45a1271f502d20602126
# good: [4c2d72b53091ed8d8e362dca052e5b9fa8325d96] sysctl: downgrade message when we have no permission
git bisect good 4c2d72b53091ed8d8e362dca052e5b9fa8325d96
# bad: [9ba11dffb09a7f4b63be8a5970fb51d600b53087] typo: "May modify to" -> "May modify"
git bisect bad 9ba11dffb09a7f4b63be8a5970fb51d600b53087
# bad: [84c048799a78a121c0536bf947b02c99093d3bfc] Disable reading SystemdOptions EFI Var when in SecureBoot mode
git bisect bad 84c048799a78a121c0536bf947b02c99093d3bfc
# first bad commit: [84c048799a78a121c0536bf947b02c99093d3bfc] Disable reading SystemdOptions EFI Var when in SecureBoot mode

After rebooting so many times, I think I can hear the HDD spinning when I get the 30-40 second boot times, versus no noise when I get the 3-4 second boots.

I am not running SecureBoot (or so I understand this output):

[andre@scout systemd-stable]$ od --address-radix=n --format=u1 /sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c 
  22   0   0   0   0

I was a bit worried I had messed up the git bisect, so I checked out 4c2d72b (the commit before 84c0487), rebuilt, installed it, and rebooted three times on that commit, consistently getting a 3-second boot time. I did the same with 84c0487 and got roughly 40-second boot times on 3 out of 3 reboots. So it does seem to be the commit that introduced the slower boot.

@anitazha anitazha added the regression ⚠️ A bug in something that used to work correctly and broke through some recent commit label Mar 12, 2020
@stellarator

Colleagues, do you have any estimate for when this problem will be resolved? The original author is not alone, and honestly, he has already done a fair amount of work that you can build on.

@poettering (Member)

So LVM takes 20s to initialize? Can you reproduce this without LVM in the mix?

@poettering poettering added the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label Apr 2, 2020
@poettering (Member)

Somehow the bisect doesn't look right to me; it's unlikely that commit would cause any slowdown.

@stellarator

I don't use LVM at all, in any form. It shows up in the critical chain, but I can't imagine how it could affect my situation.

I can try to bisect this myself. I haven't been able to find the time for it, but it seems it needs to be done. I'll report my findings in parallel, but I need a couple of days to find spare time.

@stellarator

@poettering

git bisect good
c7d26acce6dcb0e72be6160873fac758e9b7c440 is the first bad commit
commit c7d26acce6dcb0e72be6160873fac758e9b7c440
Author: Arian van Putten <arian.vanputten@gmail.com>
Date:   Wed Jan 15 17:10:11 2020 +0100

    Disable reading SystemdOptions EFI Var when in SecureBoot mode
    
    In SecureBoot mode this is probably not what you want. As your cmdline
    is cryptographically signed like when using Type #2 EFI Unified Kernel
    Images (https://systemd.io/BOOT_LOADER_SPECIFICATION/) The user's
    intention is then that the cmdline should not be modified.  You want to
    make sure that the system starts up as exactly specified in the signed
    artifact.

 src/basic/efivars.c      | 36 ++++++++++++++++++++++++++++++++++++
 src/basic/efivars.h      | 16 ++++++++++++++++
 src/basic/proc-cmdline.c | 16 ++++++++++++++--
 src/shared/efi-loader.c  | 34 ----------------------------------
 src/shared/efi-loader.h  | 15 ---------------
 5 files changed, 66 insertions(+), 51 deletions(-)
git bisect log
git bisect start
# good: [db9c5ae73e23d816e2df2a3e10a9a2a60b5b3ed7] Merge pull request #14201 from poettering/v244-final
git bisect good db9c5ae73e23d816e2df2a3e10a9a2a60b5b3ed7
# bad: [ea500ac513cf51bcb79a5666f1519499d029428f] Merge pull request #15034 from keszybz/hwdb-update
git bisect bad ea500ac513cf51bcb79a5666f1519499d029428f
# good: [ea7fe1d1c2b588cc4064ffdec04d6aa604c8d5dd] Merge pull request #14390 from poettering/gpt-var-tmp
git bisect good ea7fe1d1c2b588cc4064ffdec04d6aa604c8d5dd
# bad: [2273ecfeda0b5efd33077fe6c49f5b86350b507c] test: don't install /etc/securetty
git bisect bad 2273ecfeda0b5efd33077fe6c49f5b86350b507c
# bad: [5e176a4deef94f95e4b3dc97e355af2f271b970d] Merge pull request #14368 from poettering/repart
git bisect bad 5e176a4deef94f95e4b3dc97e355af2f271b970d
# good: [b2ae4d9eb85f80d639733e0ff140c5fa8cae71ab] sysctl: move hashmap allocation out of main function
git bisect good b2ae4d9eb85f80d639733e0ff140c5fa8cae71ab
# bad: [e594a3b154bd06c535a934a1cc7231b1ef76df73] repart: add new systemd-repart tool
git bisect bad e594a3b154bd06c535a934a1cc7231b1ef76df73
# bad: [f535af6bcd51a5c25b6c8a3b8004f38ca84f1adf] man: document that WakeSystem= affects clock choice
git bisect bad f535af6bcd51a5c25b6c8a3b8004f38ca84f1adf
# bad: [49dd0c161a1e81a46c8489056b7d5377821b0945] man: suggest SYSTEMD_WANTS usage instead of RUN for long running processes
git bisect bad 49dd0c161a1e81a46c8489056b7d5377821b0945
# bad: [c7d26acce6dcb0e72be6160873fac758e9b7c440] Disable reading SystemdOptions EFI Var when in SecureBoot mode
git bisect bad c7d26acce6dcb0e72be6160873fac758e9b7c440
# good: [32458cc9687c1b60ff0f22c0e71da93ce78b1534] sysctl: downgrade message when we have no permission
git bisect good 32458cc9687c1b60ff0f22c0e71da93ce78b1534
# good: [5c1a9ef08842e555d07d50cd495204a53bb96c37] Merge pull request #14585 from keszybz/sysctl-downgrade-messages
git bisect good 5c1a9ef08842e555d07d50cd495204a53bb96c37
# first bad commit: [c7d26acce6dcb0e72be6160873fac758e9b7c440] Disable reading SystemdOptions EFI Var when in SecureBoot mode

So, here we are: c7d26ac (actually, exactly the same commit as for the original author).

Can the following help somehow? (The first critical chain below is from a fast boot, the second from a slow one.)

└─multi-user.target @2.972s
  └─systemd-logind.service @2.121s +849ms
    └─basic.target @2.096s
      └─sockets.target @2.095s
        └─org.cups.cupsd.socket @2.093s
          └─sysinit.target @2.074s
            └─systemd-timesyncd.service @1.778s +294ms
              └─systemd-tmpfiles-setup.service @1.715s +47ms
                └─local-fs.target @1.708s
                  └─boot.mount @1.588s +117ms
                    └─systemd-fsck@dev-disk-by\x2duuid-002E\x2d99DC.service @1.531s +53ms
                      └─local-fs-pre.target @1.524s
                        └─lvm2-monitor.service @225ms +1.293s
                          └─lvm2-lvmetad.service @244ms
                            └─systemd-journald.socket @219ms
                              └─-.mount @212ms
                                └─system.slice @212ms
                                  └─-.slice @212ms

└─multi-user.target @38.245s
  └─systemd-logind.service @36.018s +2.225s
    └─basic.target @35.991s
      └─sockets.target @35.989s
        └─org.cups.cupsd.socket @35.986s
          └─sysinit.target @35.974s
            └─systemd-update-done.service @35.026s +945ms
              └─systemd-journal-catalog-update.service @33.972s +1.050s
                └─systemd-tmpfiles-setup.service @33.106s +860ms
                  └─local-fs.target @33.101s
                    └─tmp.mount @33.096s +2ms
                      └─swap.target @33.091s
                        └─dev-disk-by\x2dpartlabel-primary.swap @33.090s
                          └─systemd-journald.socket @8.642s
                            └─system.slice @8.635s
                              └─-.slice @8.635s

@andreesteve (Author)

@poettering - is there any additional info you suggest gathering from me or @stellarator? The EFI-related changes do look like an odd root cause for this regression, and I recall doing the bisect twice before posting. Glad to see confirmation from stellarator on the same commit too.

It does look like the original commit (c7d26ac) is also associated with this issue #14864

@openmindead FYI

I know very little about EFI and am not even sure how to debug this further. I am stuck on an older systemd that does not include this change. Rather than investing the time to rebuild with the commit reverted whenever I need to update systemd, I'd rather invest that time in helping investigate and resolve this. Please do let me know if there is anything else I can try to help narrow this down.

Thank you!

@arianvp (Contributor) commented Apr 28, 2020

The associated issue is about a bug in the test suite (which I should fix), but it is not related.

I've been staring at my patch for the past half hour, but for the life of me I cannot figure out why it would have caused this regression. Most of it is just moving functions around.

The only functional change is that before reading

efi_get_variable_string(EFI_VENDOR_SYSTEMD, "SystemdOptions", line);

we now check if

efi_get_variable(EFI_VENDOR_GLOBAL, "SecureBoot", NULL, &v, &s);

is false.

Could it be that reading the SecureBoot EFI variable comes with a large performance penalty and we start noticing because reading the cmdline is a rather hot codepath?

Any time we read an option from the kernel cmdline we check if this variable is set before checking if SystemdOptions has any extra keys.
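
For illustration, the hot path described here would look roughly like the sketch below: every kernel command line lookup first opens and reads the SecureBoot variable through efivarfs before deciding whether SystemdOptions may be honoured. This is a minimal stand-alone sketch, not the actual systemd code; cmdline_get_key() is a hypothetical stand-in for the real parser.

/* Sketch of the uncached hot path: each cmdline lookup re-reads the
 * SecureBoot EFI variable from efivarfs. */
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

#define SECUREBOOT_EFIVAR \
        "/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"

static bool is_efi_secure_boot(void) {
        unsigned char buf[5];  /* 4 attribute bytes + 1 data byte */
        ssize_t n;
        int fd;

        fd = open(SECUREBOOT_EFIVAR, O_RDONLY | O_CLOEXEC);
        if (fd < 0)
                return false;           /* no EFI, or variable absent */

        n = read(fd, buf, sizeof(buf));
        close(fd);

        return n == (ssize_t) sizeof(buf) && buf[4] == 1;
}

/* Hypothetical cmdline lookup: before every query it re-checks SecureBoot to
 * decide whether the SystemdOptions EFI variable may be consulted as well. */
static int cmdline_get_key(const char *key) {
        if (!is_efi_secure_boot()) {
                /* ...would read the SystemdOptions EFI variable here, i.e.
                 * yet another efivarfs access per lookup... */
        }
        /* ...then parse /proc/cmdline for 'key'... */
        (void) key;
        return 0;
}

int main(void) {
        /* Many subsystems query the command line during boot; each query
         * hits efivarfs again, which the kernel now rate-limits. */
        for (int i = 0; i < 16; i++)
                cmdline_get_key("systemd.log_level");
        return 0;
}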

@poettering (Member)

I think this is just fall-out from the behaviour we work around here:

7229ec0

i.e. the fix for this issue is simple: just cache the value once we read it, so that we don't keep hitting efivarfs all the time. Otherwise the kernel will ratelimit us...
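
A minimal sketch of the caching idea described here, assuming the same uncached reader as in the earlier sketch; it mirrors the intent of #15627 rather than its exact code:

/* Read the SecureBoot EFI variable once per process and cache the result,
 * so repeated cmdline lookups stop hammering efivarfs. */
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

#define SECUREBOOT_EFIVAR \
        "/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"

static bool read_secure_boot_uncached(void) {
        unsigned char buf[5];
        int fd = open(SECUREBOOT_EFIVAR, O_RDONLY | O_CLOEXEC);
        if (fd < 0)
                return false;
        ssize_t n = read(fd, buf, sizeof(buf));
        close(fd);
        return n == (ssize_t) sizeof(buf) && buf[4] == 1;
}

bool is_efi_secure_boot(void) {
        static int cache = -1;                       /* -1 = not read yet */
        if (cache < 0)
                cache = read_secure_boot_uncached(); /* one efivarfs access per process */
        return cache;
}

int main(void) {
        /* Only the first call touches efivarfs; the rest use the cache. */
        for (int i = 0; i < 16; i++)
                (void) is_efi_secure_boot();
        return 0;
}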

@poettering (Member)

Also see #15598

@poettering poettering added this to the v246 milestone Apr 28, 2020
@arianvp (Contributor) commented Apr 28, 2020

Ah, so we're just more likely to hit this kernel rate limit now because we're doing more efivarfs reads? Makes sense.

Do we want to cache all efivarfs reads generically, or only these two specific ones?

UEFI makes a distinction between immutable and mutable variables. SecureBoot is an immutable one according to the spec, so it seems safe to cache; but there are probably others that might change, where caching is less safe.

It seems like generic caching makes more sense, so we don't run into these rate limits in the future.

poettering added a commit to poettering/systemd that referenced this issue Apr 28, 2020
EFI variable access is nowadays subject to rate limiting by the kernel.
Thus, let's cache the results of checking them, in order to minimize how
often we access them.

Fixes: systemd#14828
@poettering (Member)

Fix is waiting in #15627.

poettering added a commit to poettering/systemd that referenced this issue Apr 29, 2020
poettering added a commit that referenced this issue Apr 30, 2020
@poettering (Member)

It would be fantastic if anyone could test whether #15627 (or git master) fixes this issue properly.

@andreesteve (Author) commented May 2, 2020

@poettering thank you for looking into this. I cherry-picked the change onto stable and tested. Boot time is now 20 seconds in userspace. That's a significant improvement over the 40 seconds of userspace startup I was getting before, but it is still a huge increase from the 2-3 seconds I used to have.

[andre@scout ~]$ systemd-analyze 
Startup finished in 9.810s (kernel) + 20.480s (userspace) = 30.291s 
graphical.target reached after 20.480s in userspace

I will try repeating this a few more times and report back if I see additional differences.

EDIT: I have been running the patch since yesterday and it consistently comes in very close to 20 seconds of userspace time. This is an improvement over the original regression, but hopefully there is still a way to get back to 2-3 seconds.

@arianvp (Contributor) commented May 7, 2020

Seems like we should still keep this issue open then, as the regression is still 10x.

@arianvp (Contributor) commented May 9, 2020

@poettering I had another idea for getting rid of this cmdline-reading slowness.

How about we move the reading of SystemdOptions to systemd-boot and let it inject the results into cmdline proper before starting systemd itself?

The EFI stub already has the "only do this if Secure Boot is disabled" logic that I duplicated into systemd (and thereby made things slower) here: https://github.com/systemd/systemd/blob/master/src/boot/efi/stub.c#L66-L76

So we would get the "don't do that unless Secure Boot is off" check for free, as far as I can see.

@oleastre

Tested #15627 today, and at least it makes my system boot correctly and usable again (I was down to almost 10 minutes for some devices to be detected; now it's back to normal).
I don't have boot timings from before 244, but at least this solves some of the issues.

keszybz added a commit to keszybz/systemd that referenced this issue May 31, 2020
Quoting systemd#14828 (comment):

> [kernel uses] msleep_interruptible() and that means when the process receives
> any kind of signal masked or not this will abort with EINTR.  systemd-logind
> gets signals from the TTY layer all the time though.

> Here's what might be happening: while logind reads the EFI stuff it gets a
> series of signals from the TTY layer, which causes the read() to be aborted
> with EINTR, which means logind will wait 50ms and retry. Which will be
> aborted again, and so on, until quite some time passed. If we'd not wait for
> the 50ms otoh we wouldn't wait so long, as then on each signal we'd
> immediately retry again.
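
The interplay described in that quote can be sketched as follows. This is an illustration of the failure mode under the quoted assumptions (EINTR on every signal, a 50 ms pause before retrying), not the actual efivars.c code:

/* If the kernel aborts the efivarfs read with EINTR whenever a signal
 * arrives, and the caller sleeps 50 ms before retrying, a process that gets
 * a steady stream of signals (like logind from the TTY layer) accumulates
 * many 50 ms waits for a single variable read. */
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

#define RETRY_DELAY_MS 50

static ssize_t read_retry_with_sleep(int fd, void *buf, size_t count) {
        for (;;) {
                ssize_t n = read(fd, buf, count);
                if (n >= 0 || errno != EINTR)
                        return n;

                /* Every spurious EINTR costs another 50 ms before the retry. */
                struct timespec ts = { .tv_nsec = RETRY_DELAY_MS * 1000000L };
                nanosleep(&ts, NULL);
        }
}

int main(void) {
        char buf[8];
        int fd = open("/dev/null", O_RDONLY);   /* stand-in for an efivarfs file */
        if (fd >= 0) {
                (void) read_retry_with_sleep(fd, buf, sizeof(buf));
                close(fd);
        }
        /* With a constant stream of signals, e.g. 100 EINTRs while reading one
         * variable already add up to 100 * 50 ms = 5 s of wall-clock time. */
        return 0;
}
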
@keszybz (Member) commented May 31, 2020

@oleastre if you could give #15986 a spin I'd be very thankful.

@oleastre commented Jun 1, 2020

Tested #15986; here is the result: efi-log.txt
I also dropped all patches and restarted from a bare 245.4 version plus the debug patch; here is my full EFI variable timing: full-log.txt
What I see there is that the very first variable access already takes a few seconds, but I don't see any earlier accesses to the variables, so I suspect the debug log is not produced in the initrd. If you have a way to capture those logs, I'll be happy to try it.

@keszybz (Member) commented Jun 2, 2020

It looks like we should merge #15986, but also consider adding a global cache.

keszybz added a commit to keszybz/systemd that referenced this issue Jun 2, 2020
poettering pushed a commit that referenced this issue Jun 2, 2020
facebook-github-bot pushed a commit to facebookarchive/rpm-backports that referenced this issue Jun 5, 2020
Summary:
There is a known upstream issue where boot time regressed due to an API breakage in the kernel and a code change in systemd (systemd/systemd#14828). You can track the root cause in the linked issue. At FB this manifests as SMI increases when Chef runs.

So far the upstream patches don't fix the problem, but reverting the blamed commit (systemd/systemd@c7d26ac) does.

Reviewed By: NaomiReeves

Differential Revision: D21896056

fbshipit-source-id: fa8f0b515ac9b06b7bc376c1123f0bd0b6d49848
@filbranden (Member)

The problem with #15627 is that it's a per-process cache, and as reported in #16097 a bunch of processes try to parse /proc/cmdline and read EFI variables...

@arianvp:

> How about we move the reading of SystemdOptions to systemd-boot and let it inject the results into cmdline proper before starting systemd itself?

Why not systemd PID 1 itself, early in boot?

See my other suggestion about caching the kernel command line somewhere under /run/systemd and having everyone else consume it from there...
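
A rough sketch of that suggestion follows; the /run/systemd/cached-cmdline path and the helper names are hypothetical, chosen only for illustration, and not an existing systemd interface:

/* Hypothetical scheme: PID 1 resolves the effective command line (kernel
 * cmdline plus SystemdOptions, when allowed) once during early boot and
 * writes it under /run/systemd; everything else reads that file instead of
 * touching /proc/cmdline and efivarfs again. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CACHED_CMDLINE "/run/systemd/cached-cmdline"   /* hypothetical path */

/* Called once by PID 1 early in boot. */
static int cache_effective_cmdline(const char *effective_cmdline) {
        FILE *f = fopen(CACHED_CMDLINE, "w");
        if (!f)
                return -1;
        fputs(effective_cmdline, f);
        fputc('\n', f);
        return fclose(f) == 0 ? 0 : -1;
}

/* Called by every other consumer; returns a malloc'd string or NULL. */
static char *read_cached_cmdline(void) {
        char buf[4096];
        FILE *f = fopen(CACHED_CMDLINE, "r");
        if (!f)
                return NULL;
        char *line = fgets(buf, sizeof(buf), f);
        fclose(f);
        if (!line)
                return NULL;
        buf[strcspn(buf, "\n")] = '\0';
        return strdup(buf);
}

int main(void) {
        /* Demo only; on a real system /run/systemd is written by PID 1. */
        (void) cache_effective_cmdline("quiet splash systemd.log_level=info");
        char *c = read_cached_cmdline();
        if (c) {
                printf("cached cmdline: %s\n", c);
                free(c);
        }
        return 0;
}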

@poettering (Member)

With all those fixes merged now, including the latest (@filbranden's #16139), where are we with this? Do people still see slowdowns? (It would be great to get new log output with time measurements with everything merged.)

@oleastre

I compiled 246.0 from master (6fe95d3) and ran with debug logging. Here is the grep of EFI variable accesses: efi-vars.txt
Note that I booted the system (between 13:09:51 and 13:11:53) and only logged in after a while, so the end of the log is probably not that relevant.
On my particular system there is not much difference since my last test, but that system is equipped with an old HDD and is mostly I/O bound.
Maybe one of the other impacted users can make a better report on that aspect.

@aegl commented Jun 15, 2020

Linux kernel fixes for this issue have taken a step forward: they have been accepted by the EFI maintainer and should appear in the linux-next tree tomorrow (when tag next-20200616 is created).
Fix 1: Don't return -EINTR.
Fix 2: Update modification times when efivarfs files are written.

poettering added a commit to poettering/systemd that referenced this issue Jun 16, 2020
…operty call

Apparently some clients use GetAll() to acquire all properties we expose
at once (gdbus?). Given that EFI variables are no longer cheap, due to
recent kernel changes, let's simply mark them so that they are not
included in GetAll().

This is an API break by some level, but I don't see how we can fix this
otherwise. It's not our own API break after all, but the kernel's, we
just propagate it.

Yes, sucks.

See: systemd#14828
@poettering (Member)

@oleastre So, in a build with all of #16139 and also #16190, are we back to good performance? Your output only showed one slow variable access left.

#16190 is an experiment. We might not need it if we cache more. I just want to verify that this is caused by some client issuing the D-Bus GetAll() property call on our interface, which makes us acquire everything we know, and which means going to the EFI variables. That was fast before, but due to the kernel compat borkage it is now extremely slow in some cases...
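
For context on the GetAll() angle: sd-bus has a property flag, SD_BUS_VTABLE_PROPERTY_EXPLICIT, meant for expensive properties that should only be read when a client asks for them by name rather than in GetAll() responses; the experiment in #16190 was along these lines, while #16281 later went with caching instead. A minimal sketch with a made-up interface, path, and getter (not logind's real ones):

/* Marking an expensive property SD_BUS_VTABLE_PROPERTY_EXPLICIT keeps it out
 * of GetAll(); clients must request it explicitly with Get(). */
#include <systemd/sd-bus.h>

static int property_get_expensive(
                sd_bus *bus, const char *path, const char *interface,
                const char *property, sd_bus_message *reply,
                void *userdata, sd_bus_error *error) {

        /* Imagine this had to read an EFI variable through efivarfs. */
        return sd_bus_message_append(reply, "s", "expensive-value");
}

static const sd_bus_vtable example_vtable[] = {
        SD_BUS_VTABLE_START(0),
        /* Skipped by GetAll(); only returned for an explicit Get(). */
        SD_BUS_PROPERTY("ExpensiveEfiBacked", "s", property_get_expensive, 0,
                        SD_BUS_VTABLE_PROPERTY_EXPLICIT),
        SD_BUS_VTABLE_END
};

int main(void) {
        sd_bus *bus = NULL;
        if (sd_bus_open_user(&bus) < 0)
                return 1;

        int r = sd_bus_add_object_vtable(bus, NULL, "/org/example/Demo",
                                         "org.example.Demo", example_vtable, NULL);
        sd_bus_unref(bus);
        return r < 0 ? 1 : 0;
}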

@oleastre

The build I tested only included #16139; I did not try #16190. @poettering, do you want a trace with that one?
For me, on the performance side, it has been quite OK since the very first patch in the series, which only cached the EFI flags. But as I said, my system is mostly I/O bound and I don't have an SSD to compare with.
The original poster of #15739 can probably run further tests on his system.

@poettering (Member)

> The build I tested only included #16139; I did not try #16190. @poettering, do you want a trace with that one?

Yes, please.

@andreesteve (Author)

@poettering - I am happy to report that running systemd from the tip of master (a51a324) plus a cherry-pick of #16190 has brought my boot times back to 244.1 numbers.

In case it helps, here are the debug dmesg logs: dmesg.debug.txt

Here are the times with the debug flag set. They're slightly up from 244.1, probably because of the extra console I/O from the debug messages.

Startup finished in 5.642s (kernel) + 3.304s (userspace) = 8.947s 
graphical.target reached after 3.304s in userspace
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

graphical.target @3.304s
└─multi-user.target @3.304s
  └─systemd-logind.service @2.041s +1.262s
    └─basic.target @2.022s
      └─sockets.target @2.022s
        └─org.cups.cupsd.socket @2.022s
          └─sysinit.target @1.998s
            └─systemd-timesyncd.service @1.767s +230ms
              └─systemd-tmpfiles-setup.service @1.740s +26ms
                └─local-fs.target @1.739s
                  └─boot.mount @1.614s +123ms
                    └─systemd-fsck@dev-disk-by\x2duuid-D399\x2d2D89.service @1.559s +54ms
                      └─local-fs-pre.target @1.556s
                        └─lvm2-monitor.service @858ms +697ms
                          └─lvm2-lvmetad.service @1.160s
                            └─systemd-udevd.service @900ms +258ms
                              └─systemd-tmpfiles-setup-dev.service @874ms +25ms
                                └─kmod-static-nodes.service @858ms +9ms
                                  └─systemd-journald.socket @854ms
                                    └─-.mount @845ms
                                      └─system.slice @845ms
                                        └─-.slice @845ms

And here are the times without the debug flag set; almost exactly what they were back then.

Startup finished in 5.526s (kernel) + 2.768s (userspace) = 8.294s 
graphical.target reached after 2.768s in userspace
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

graphical.target @2.768s
└─multi-user.target @2.768s
  └─systemd-logind.service @1.690s +1.076s
    └─basic.target @1.684s
      └─sockets.target @1.684s
        └─org.cups.cupsd.socket @1.684s
          └─sysinit.target @1.681s
            └─systemd-timesyncd.service @1.494s +187ms
              └─systemd-tmpfiles-setup.service @1.474s +18ms
                └─local-fs.target @1.473s
                  └─mnt-data.mount @1.405s +67ms
                    └─systemd-fsck@dev-disk-by\x2duuid-693c2dae\x2d3f8d\x2d4825\x2dbfdf\x2d9268a8b9584b.service @1.306s +98ms
                      └─local-fs-pre.target @1.305s
                        └─lvm2-monitor.service @647ms +657ms
                          └─lvm2-lvmetad.service @936ms
                            └─systemd-udevd.service @687ms +247ms
                              └─systemd-tmpfiles-setup-dev.service @664ms +22ms
                                └─kmod-static-nodes.service @647ms +9ms
                                  └─systemd-journald.socket @643ms
                                    └─-.mount @633ms
                                      └─system.slice @633ms
                                        └─-.slice @633ms

Thank you!

eworm-de pushed a commit to eworm-de/systemd that referenced this issue Jun 23, 2020
poettering added a commit to poettering/systemd that referenced this issue Jun 25, 2020
With this we are now caching all EFI variables that we expose as
properties in logind. Thus a client invoking GetAllProperties() should
only trigger a single read of each variable, never repeated ones.

Obsoletes: systemd#16190
Fixes: systemd#14828
@poettering (Member)

@andreesteve thanks for playing around with this. Based on your info it appears it's simply the EFI-backed D-Bus properties of logind that triggered the problem. I prepped #16281 now, which adds caches for both of them. With that in place all EFI-backed D-Bus properties should be unconditionally cheap again, as they used to be, and the problems should go away.

I am pretty sure this will fix the issue for good. It would of course be great if you could verify that.

I will now close #16190, since I think #16281 is a much nicer, less invasive fix.

poettering added a commit to poettering/systemd that referenced this issue Jun 26, 2020
keszybz added a commit to systemd/systemd-stable that referenced this issue Jul 27, 2020
sayanchowdhury pushed a commit to kinvolk-archives/systemd-legacy that referenced this issue Aug 3, 2020
Yamakuzure pushed a commit to elogind/elogind that referenced this issue Aug 31, 2020
Yamakuzure pushed a commit to elogind/elogind that referenced this issue Aug 31, 2020
Yamakuzure pushed a commit to elogind/elogind that referenced this issue Aug 31, 2020