ARM64 polkit.service Unit failed on startup #156

Closed
bobhenkel opened this issue Jul 16, 2020 · 9 comments

@bobhenkel

Spun up an alpha channel instance on an OpenStack VM. On my first ssh login I see the output below. I was able to spin up an x86_64 production channel instance, and it didn't show this failure.

The authenticity of host 'x.y.z (x.y.z)' can't be established.
ECDSA key fingerprint is SHA256:5EJ6tTTrX4zFLh25biRCk96MSFH.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'x.y.z' (ECDSA) to the list of known hosts.
Enter passphrase for key '/Users/bob/temp.key': 
Flatcar Container Linux by Kinvolk alpha (2513.1.0)
Failed Units: 1
  polkit.service
core@test ~ $ systemctl status polkit.service
● polkit.service - Authorization Manager
   Loaded: loaded (/usr/lib/systemd/system/polkit.service; static; vendor preset: disabled)
   Active: failed (Result: signal) since Thu 2020-07-16 23:38:15 UTC; 2min 0s ago
     Docs: man:polkit(8)
  Process: 860 ExecStart=/usr/lib/polkit-1/polkitd --no-debug (code=killed, signal=SEGV)
 Main PID: 860 (code=killed, signal=SEGV)

Jul 16 23:38:15 localhost systemd[1]: Starting Authorization Manager...
Jul 16 23:38:15 localhost polkitd[860]: Started polkitd version 0.113
Jul 16 23:38:15 localhost systemd[1]: polkit.service: Main process exited, code=killed, status=11/SEGV
Jul 16 23:38:15 localhost systemd[1]: polkit.service: Failed with result 'signal'.
Jul 16 23:38:15 localhost systemd[1]: Failed to start Authorization Manager.
core@test ~ $ systemctl start polkit.service
Error getting authority: Error initializing authority: Error calling StartServiceByName for org.freedesktop.PolicyKit1: Timeout was reached (g-io-error-quark, 24)
^C

polkit.service is not starting up.
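
For anyone triaging the same symptom, here is a minimal sketch of how the `systemctl status` text above could be checked for a main process killed by a signal. The helper name `killed_by_signal` and the regular expression are illustrative assumptions, not part of systemd or Flatcar; the sample text is copied from the output in this report.

```python
import re

# Matches the "Main PID" line in the form `systemctl status` printed above.
# Assumption: other systemd versions may format this line differently.
MAIN_PID_RE = re.compile(r"Main PID:\s*(\d+)\s*\(code=(\w+),\s*signal=(\w+)\)")


def killed_by_signal(status_text):
    """Return (pid, signal) if the main process was killed by a signal, else None."""
    match = MAIN_PID_RE.search(status_text)
    if match and match.group(2) == "killed":
        return int(match.group(1)), match.group(3)
    return None


# Excerpt taken from the status output in this report.
sample = """\
   Active: failed (Result: signal) since Thu 2020-07-16 23:38:15 UTC; 2min 0s ago
  Process: 860 ExecStart=/usr/lib/polkit-1/polkitd --no-debug (code=killed, signal=SEGV)
 Main PID: 860 (code=killed, signal=SEGV)
"""

print(killed_by_signal(sample))
```

A `SEGV` result like this points to a crash inside polkitd itself rather than a configuration or dependency failure, which is why restarting the unit does not help.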

Impact

Not sure.

Environment and steps to reproduce

  1. Set-up: OpenStack VM, Alpha ARM64 Channel (Flatcar Container Linux by Kinvolk alpha (2513.1.0))
  2. Task: First ssh login/first boot
  3. Action(s): NA
  4. Error: See description above

Expected behavior
polkit.service should start up without failure.

Additional information
NA

@bobhenkel
Author

bobhenkel commented Aug 13, 2020

Still seeing this on Flatcar Container Linux by Kinvolk alpha (2592.0.0). What does this failure impact?

Thanks
Bob

@dongsupark
Member

Thanks for the reminder.

We have been aware of the issue for a long time. It only happens on ARM64.
I suppose it has something to do with SELinux libs or policies that are not available on ARM64.
We just have not had a chance to take a deeper look at it, as with other ARM-related issues.

@dghubble

This prevents other components from running too (systemd-hostnamed, coreos-metadata, etc.), with the same message. @bobhenkel

@dongsupark
Member

FYI, I have been experimenting with disabling policykit for systemd, on arm64 only.
WIP branch: https://github.com/kinvolk/coreos-overlay/tree/dongsu/disable-policykit-systemd-arm .
With the change, the failures in the polkit and systemd-hostnamed units apparently went away.
I am still not sure if that is the correct thing to do.
The full CI has not run against it yet.

@dongsupark
Member

Interesting.

CI runs of the WIP branch caused all of the packages-matrix builds to hang forever while building sys-libs/ldb, for example:

build    2585512 99.1 11.6 7769216 7657660 ?     Rl   15:28   2:59 /qemu-aarch64-static /build/arm64-usr/var/tmp/portage/sys-libs/ldb-1.3.6/work/ldb-1.3.6-.arm64/bin/.conf_check_0/testbuild/default/testprog

It always happens during Jenkins builds.
However, it does not happen under a local SDK.

I have no idea why the policykit change would have anything to do with ldb at all.
For now I will stop looking into the issue.

@margamanterola
Contributor

It's possible that the hang Dongsu saw on ARM was due to the same issue that Thilo fixed last week regarding the old qemu version. I'm going to rebase the change against the latest main and retry running the CI with that.

@margamanterola
Contributor

Indeed, as I suspected, the build of Dongsu's branch succeeded on the first try and didn't get stuck. However, when testing this on a machine with an explicit ignition config (i.e. without the default one that calls cloud-init), the hostname stays stuck at localhost. In the journal I can see:

Feb 16 16:56:25 localhost dbus[1417]: [system] Successfully activated service 'org.freedesktop.hostname1'
Feb 16 16:56:25 localhost systemd[1]: Started Hostname Service.
Feb 16 16:56:25 localhost systemd-networkd[1251]: Could not set hostname: Permission denied

If I understand this correctly, systemd-networkd is trying to set the hostname (from the DHCP data?), but it can't because there's no policykit to authorize the operation. For comparison, these are the equivalent logs on an amd64 instance:

Feb 16 15:56:14 localhost dbus[1392]: [system] Successfully activated service 'org.freedesktop.hostname1'
Feb 16 15:56:14 localhost systemd[1]: Started Hostname Service.
Feb 16 15:56:14 localhost dbus[1392]: [system] Activating via systemd: service name='org.freedesktop.PolicyKit1' unit='polkit.service'
Feb 16 15:56:14 localhost systemd[1]: Starting Authorization Manager...
Feb 16 15:56:14 localhost polkitd[1462]: Started polkitd version 0.113
Feb 16 15:56:14 localhost polkitd[1462]: Loading rules from directory /etc/polkit-1/rules.d
Feb 16 15:56:14 localhost polkitd[1462]: Loading rules from directory /usr/share/polkit-1/rules.d
Feb 16 15:56:14 localhost polkitd[1462]: Finished loading, compiling and executing 2 rules
Feb 16 15:56:14 localhost systemd[1]: Started Authorization Manager.
Feb 16 15:56:14 localhost dbus[1392]: [system] Successfully activated service 'org.freedesktop.PolicyKit1'
Feb 16 15:56:14 localhost polkitd[1462]: Acquired the name org.freedesktop.PolicyKit1 on the system bus
Feb 16 15:56:14 ip-10-0-3-151 systemd-hostnamed[1440]: Changed hostname to 'ip-10-0-3-151'

So, I don't think we can compile the ARM image without policykit. We need an updated version of policykit that works correctly on ARM.
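
To make the difference between the two journal excerpts above explicit, here is a small sketch that flags the three events that distinguish them. The function name `summarize` is hypothetical, and the marker strings are copied from the log lines above; other systemd or polkit versions may word these messages differently.

```python
def summarize(journal_text):
    """Flag the events that distinguish the arm64 and amd64 excerpts above."""
    return {
        # polkitd came up and systemd reported the unit as started
        "polkit_started": "Started Authorization Manager." in journal_text,
        # systemd-networkd was refused when asking to set the hostname
        "hostname_denied": "Could not set hostname: Permission denied" in journal_text,
        # systemd-hostnamed actually applied a new hostname
        "hostname_changed": "Changed hostname to" in journal_text,
    }


# Lines copied from the arm64 excerpt (image built without policykit).
arm64_journal = """\
Feb 16 16:56:25 localhost systemd[1]: Started Hostname Service.
Feb 16 16:56:25 localhost systemd-networkd[1251]: Could not set hostname: Permission denied
"""

# Lines copied from the amd64 excerpt.
amd64_journal = """\
Feb 16 15:56:14 localhost systemd[1]: Started Hostname Service.
Feb 16 15:56:14 localhost systemd[1]: Started Authorization Manager.
Feb 16 15:56:14 ip-10-0-3-151 systemd-hostnamed[1440]: Changed hostname to 'ip-10-0-3-151'
"""

print("arm64:", summarize(arm64_journal))
print("amd64:", summarize(amd64_journal))
```

On a live machine the same check could be fed the output of `journalctl -b`, though that is untested here.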

@tbbharaj

Facing the same issue on an arm64 AMI. Is there a plan to roll out a fix for this anytime soon?

@dongsupark
Member

This issue should be fixed.
The PR flatcar-archive/coreos-overlay#1263 was recently merged, and it was included in Alpha 3033.0.0.

Thanks for the bug report.
