Users missing on /etc/passwd #2488

Closed
nianyush opened this issue Apr 18, 2024 · 21 comments
Labels: bug, prio: high

@nianyush

nianyush commented Apr 18, 2024

Kairos version 2.4.5

Apr 18 18:28:25 localhost systemd[1]: Stopped OpenBSD Secure Shell server.
Apr 18 18:28:25 localhost systemd[1]: Starting OpenBSD Secure Shell server...
Apr 18 18:28:25 localhost sshd[1618]: Privilege separation user sshd does not exist
Apr 18 18:28:25 localhost systemd[1]: ssh.service: Control process exited, code=exited, status=255/EXCEPTION
Apr 18 18:28:25 localhost systemd[1]: ssh.service: Failed with result 'exit-code'.
Apr 18 18:28:25 localhost systemd[1]: Failed to start OpenBSD Secure Shell server.

[image attached]

This seems to be a rare corner case for us; I have only seen it once.

Config file that triggered this:

#cloud-config

cosign: false
install:
    auto: true
    device: auto
    grub-entry-name: Palette eXtended Kubernetes Edge
    grub_options:
        saved_entry: registration
    passive:
        size: 8192
    poweroff: true
    reboot: false
    recovery-system:
        size: 10000
    system:
        size: 8192
reset:
    grub-entry-name: Palette eXtended Kubernetes Edge
    system:
        size: 8192
stages:
    after-upgrade:
        - commands:
            - mkdir -p /usr/local/bin
            - '[ -L /usr/local/bin/agent-provider-stylus ] || ln -s /opt/spectrocloud/bin/agent-provider-stylus /usr/local/bin/agent-provider-stylus'
            - '[ -L /usr/local/bin/palette-tui ] || ln -s /opt/spectrocloud/bin/palette-tui /usr/local/bin/palette-tui'
            - bash /opt/spectrocloud/scripts/content.sh
          name: Execute after upgrade commands
        - commands:
            - grub2-editenv /oem/grubenv unset saved_entry
          if: '[ -f /oem/grubenv ]'
          name: Unset registration grubenv
    before-install:
        - commands:
            - echo -n > /etc/machine-id
          if: '[ -f "/run/cos/live_mode" ]'
          name: Remove machine-id
        - commands:
            - echo -n > /etc/machine-id
          if: '[ -f "/run/cos/live_mode" ]'
          name: Remove machine-id
    initramfs:
        - name: Create kairos user
          users:
            kairos:
                groups:
                    - sudo
                passwd: kairos
        - users:
            kairos:
                groups:
                    - sudo
                passwd: kairos
        - name: Create kairos user
          users:
            kairos:
                groups:
                    - sudo
                passwd: kairos
        - users:
            kairos:
                groups:
                    - sudo
                passwd: kairos
        - if: '[ ! -f /oem/80_stylus.yaml ]'
          name: set_inotify_max_values
          sysctl:
            fs.inotify.max_user_instances: "8192"
            fs.inotify.max_user_watches: "524288"
stylus:
    includeTui: false
    installationMode: airgap
    skipStylusUpgrade: true
upgrade:
    grub-entry-name: Palette eXtended Kubernetes Edge
    passive:
        size: 8192
    recovery-system:
        size: 8192
    system:
        size: 8192
verify: false
@nianyush added the bug, triage, and unconfirmed labels on Apr 18, 2024
@ci-robbot
Collaborator

Hello nianyush,

I'm a bot, an experiment of @mudler and @jimmykarily. Thank you for reporting this issue with Kairos version 2.4.5. I can see that the sshd service is failing to start after boot and that the sshd user does not exist in /etc/passwd. This information is helpful in diagnosing the problem.

Please consider providing the following additional details to help us better understand and reproduce the issue:

  1. Are there any relevant logs or configuration files that could help us understand the issue better?
  2. Have there been any recent updates or changes made to your system that could have caused this issue?

Once we have all the necessary information, we will be able to better assess the issue and take appropriate action. We appreciate your patience and cooperation.

Best regards,
Kairos Bot

@jimmykarily
Contributor

This might be related to #2492.

@Itxaka
Member

Itxaka commented Apr 22, 2024

Missing here is the information that the sshd user did indeed disappear from /etc/passwd.

@mauromorales added the prio: high label and removed the triage label on Apr 25, 2024
@mauromorales
Member

@nianyush can you confirm that this is the same issue you experienced with the fully deleted /etc/passwd after a kairos-agent upgrade?

@nianyush
Author

@mauromorales yes, it's exactly the same issue. Only the sshd user is lost from /etc/passwd, and the rest looks fine.

@nianyush
Author

nianyush commented Apr 25, 2024

I encountered this issue again yesterday with Kairos v3.0.6 in UKI mode. After running kairos-agent upgrade with a new image and rebooting, I can no longer SSH into the VM.
[3 images attached]

@mauromorales
Member

@nianyush thanks for the extra info. Which version did you upgrade to? The same one?

@nianyush
Author

@mauromorales yes, there is no difference in the Kairos or OS version.

@mauromorales
Member

@nianyush do you by any chance still have this system online? If so, could you share the mounts?

@nianyush
Author

nianyush commented Apr 26, 2024

Yes, I still have one of the systems, the one on 3.0.6 with UKI.
[image attached]

@mauromorales changed the title from "Sshd service failed to start after boot, and sshd user does not exist in /etc/passwd" to "Users missing on /etc/passwd" on Apr 26, 2024
@mauromorales
Member

While this was detected because of the lack of SSH access, the issue is not related to sshd but to the fact that a bunch of users are missing from /etc/passwd. I've renamed the ticket to reflect this.

@Itxaka
Member

Itxaka commented Apr 26, 2024

@nianyush if you still have access to the machines, would it be possible to extract the logs from them?
Kairos logs, journalctl logs, and immucore + stages logs would all be very useful, especially for 2.4.5, since we have access to the original qcow2 file and can try to reproduce.

Also, is there any metadata attached to the machine, such as a CD-ROM or USB config drive?

@mauromorales
Member

@nianyush also, if possible, can you check for any other units failing in systemd? And is it possible to compare the user list with that of a system that is working correctly? I want to validate my previous comment.

@mauromorales
Member

mauromorales commented Apr 26, 2024

Managed to reproduce after several runs

root@localhost:/home/kairos# cat /etc/os-release
PRETTY_NAME="Ubuntu 23.10"
NAME="Ubuntu"
VERSION_ID="23.10"
VERSION="23.10 (Mantic Minotaur)"
VERSION_CODENAME=mantic
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=mantic
LOGO=ubuntu-logo
KAIROS_IMAGE_LABEL="23.10-core-amd64-generic-v3.0.7"
KAIROS_ARTIFACT="kairos-ubuntu-23.10-core-amd64-generic-v3.0.7"
KAIROS_FLAVOR="ubuntu"
KAIROS_MODEL="generic"
KAIROS_BUG_REPORT_URL="https://github.com/kairos-io/kairos/issues"
KAIROS_VARIANT="core"
KAIROS_TARGETARCH="amd64"
KAIROS_ID="kairos"
KAIROS_NAME="kairos-core-ubuntu-23.10"
KAIROS_VERSION="v3.0.7"
KAIROS_PRETTY_NAME="kairos-core-ubuntu-23.10 v3.0.7"
KAIROS_IMAGE_REPO="quay.io/kairos/ubuntu:23.10-core-amd64-generic-v3.0.7"
KAIROS_FAMILY="ubuntu"
KAIROS_REGISTRY_AND_ORG="quay.io/kairos"
KAIROS_VERSION_ID="v3.0.7"
KAIROS_FLAVOR_RELEASE="23.10"
KAIROS_RELEASE="v3.0.7"
KAIROS_HOME_URL="https://github.com/kairos-io/kairos"
KAIROS_SOFTWARE_VERSION_PREFIX="k3s"
KAIROS_ID_LIKE="kairos-core-ubuntu-23.10"
KAIROS_GITHUB_REPO="kairos-io/kairos"
root@localhost:/home/kairos# cat /etc/passwd
kairos:x:1000:65538:Created by entities:/home/kairos:/bin/sh
root:x:0:0::/root:/bin/bash
daemon:x:1:1::/usr/sbin:/usr/sbin/nologin
bin:x:2:2::/bin:/usr/sbin/nologin
sys:x:3:3::/dev:/usr/sbin/nologin
sync:x:4:65534::/bin:/bin/sync
games:x:5:60::/usr/games:/usr/sbin/nologin
man:x:6:12::/var/cache/man:/usr/sbin/nologin
lp:x:7:7::/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8::/var/mail:/usr/sbin/nologin
news:x:9:9::/var/spool/news:/usr/sbin/nologin
uucp:x:10:10::/var/spool/uucp:/usr/sbin/nologin
proxy:x:13:13::/bin:/usr/sbin/nologin
www-data:x:33:33::/var/www:/usr/sbin/nologin
backup:x:34:34::/var/backups:/usr/sbin/nologin
list:x:38:38::/var/list:/usr/sbin/nologin
irc:x:39:39::/run/ircd:/usr/sbin/nologin
_apt:x:42:65534::/nonexistent:/usr/sbin/nologin
nobody:x:65534:65534::/nonexistent:/usr/sbin/nologin
messagebus:x:109:109:System Message Bus:/:/usr/sbin/nologin
polkitd:x:996:996:polkit:/nonexistent:/usr/sbin/nologin
systemd-network:x:998:998:systemd Network Management:/:/usr/sbin/nologin
systemd-resolve:x:995:995:systemd Resolver:/:/usr/sbin/nologin
systemd-timesync:x:997:997:systemd Time Synchronization:/:/usr/sbin/nologin
root@localhost:/home/kairos# ./yip -a -s initramfs /oem/90_custom.yaml
INFO[0000] yip version v1.6.1-g9484451dac23973ab3cd8a76df42edb2415f7f3e 2024-04-22 13:12:56 UTC
INFO[0000] 1.
INFO[0000]  <init> (background: false) (weak: false)
INFO[0000] 2.
INFO[0000]  </oem/90_custom.yaml.Create kairos user> (background: false) (weak: true)
INFO[0000] 3.
INFO[0000]  </oem/90_custom.yaml.1> (background: false) (weak: true)
INFO[0000]  </oem/90_custom.yaml.3> (background: false) (weak: true)
INFO[0000] 4.
INFO[0000]  </oem/90_custom.yaml.Create kairos user.1> (background: false) (weak: true)
INFO[0000]  </oem/90_custom.yaml.set_inotify_max_values> (background: false) (weak: true)

@mauromorales
Member

mauromorales commented Apr 26, 2024

Looking at the previous yip analysis, it seems that two of the user-creation steps get executed in parallel, when they should run serially (they all touch /etc/passwd, and there is no mutex mechanism as far as I can tell).

I think this somehow comes from having those name: Create kairos user attributes. With a config that has the same four duplicated users but no name, each one is evaluated as its own step, i.e. serially:

root@localhost:/home/kairos# ./yip -a -s initramfs no-name.yaml
INFO[0000] yip version v1.6.1-g9484451dac23973ab3cd8a76df42edb2415f7f3e 2024-04-22 13:12:56 UTC
INFO[0000] 1.
INFO[0000]  <init> (background: false) (weak: false)
INFO[0000] 2.
INFO[0000]  <no-name.yaml.0> (background: false) (weak: true)
INFO[0000] 3.
INFO[0000]  <no-name.yaml.1> (background: false) (weak: true)
INFO[0000] 4.
INFO[0000]  <no-name.yaml.2> (background: false) (weak: true)
INFO[0000] 5.
INFO[0000]  <no-name.yaml.3> (background: false) (weak: true)
INFO[0000] 6.
INFO[0000]  <no-name.yaml.set_inotify_max_values> (background: false) (weak: true)
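
A minimal Go sketch of why unserialized writers are risky here. Everything in it is hypothetical: the in-memory file type, the sshd line, and the exact interleaving are assumptions for illustration, not a trace of what yip does. It only replays, deterministically, one way a second step that reads the file while the first is still rewriting it could drop entries such as sshd.

```go
package main

import (
	"fmt"
	"strings"
)

// file is a hypothetical in-memory stand-in for /etc/passwd.
type file struct{ lines []string }

// newFile returns illustrative starting content; the sshd entry is made up.
func newFile() *file {
	return &file{lines: []string{
		"root:x:0:0::/root:/bin/bash",
		"sshd:x:104:65534::/run/sshd:/usr/sbin/nologin",
	}}
}

// ensureUser returns a copy of snapshot with an entry for name appended
// unless one is already present.
func ensureUser(snapshot []string, name string) []string {
	out := append([]string(nil), snapshot...)
	for _, l := range out {
		if strings.HasPrefix(l, name+":") {
			return out
		}
	}
	return append(out, name+":x:1000:1000::/home/"+name+":/bin/sh")
}

// hasUser reports whether the file contains an entry for name.
func hasUser(f *file, name string) bool {
	for _, l := range f.lines {
		if strings.HasPrefix(l, name+":") {
			return true
		}
	}
	return false
}

// serialized runs two user-creation steps one after the other.
func serialized() *file {
	f := newFile()
	f.lines = ensureUser(f.lines, "kairos") // step A: read, modify, write
	f.lines = ensureUser(f.lines, "kairos") // step B sees A's complete result
	return f
}

// interleaved replays one unlucky ordering of the same two steps.
func interleaved() *file {
	f := newFile()
	a := ensureUser(f.lines, "kairos") // A reads the full file
	f.lines = a[:1]                    // A has rewritten only the first line so far
	b := ensureUser(f.lines, "kairos") // B reads the half-written file
	f.lines = a                        // A finishes its write
	f.lines = b                        // B overwrites with its partial snapshot
	return f
}

func main() {
	fmt.Println("serialized keeps sshd: ", hasUser(serialized(), "sshd"))
	fmt.Println("interleaved keeps sshd:", hasUser(interleaved(), "sshd"))
}
```

Run one after another, the steps keep every entry; in the interleaved ordering the second step writes back a partial snapshot. Serializing the steps, or guarding the file with a lock, would close that window.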

@mauromorales
Member

mauromorales commented Apr 26, 2024

I think the problem is caused by the duplicated name of those user creations; if the names are different, the analysis looks like the one in the previous comment.

However, because the names are the same, when yip starts adding dependencies it uses the name as the identifier of the dependency, which groups the steps together when the graph is inverted.

See how below we get a ([]herd.GraphEntry) (len=2 cap=2) { at some point, grouping those two, which never happens when there are no names:

([][]herd.GraphEntry) (len=4 cap=4) {
 ([]herd.GraphEntry) (len=1 cap=1) {
  (herd.GraphEntry) {
   WithCallback: (bool) false,
   Background: (bool) false,
   Callback: ([]func(context.Context) error) <nil>,
   Error: (error) <nil>,
   Ignored: (bool) false,
   Fatal: (bool) false,
   WeakDeps: (bool) false,
   Executed: (bool) false,
   Name: (string) (len=4) "init",
   Dependencies: ([]string) <nil>,
   WeakDependencies: ([]string) <nil>
  }
 },
 ([]herd.GraphEntry) (len=1 cap=1) {
  (herd.GraphEntry) {
   WithCallback: (bool) true,
   Background: (bool) false,
   Callback: ([]func(context.Context) error) (len=1 cap=1) {
    (func(context.Context) error) 0xadb1c0
   },
   Error: (error) <nil>,
   Ignored: (bool) false,
   Fatal: (bool) false,
   WeakDeps: (bool) true,
   Executed: (bool) false,
   Name: (string) (len=42) "/some/yip/01_first.yaml.Create Kairos User",
   Dependencies: ([]string) <nil>,
   WeakDependencies: ([]string) <nil>
  }
 },
 ([]herd.GraphEntry) (len=2 cap=2) {
  (herd.GraphEntry) {
   WithCallback: (bool) true,
   Background: (bool) false,
   Callback: ([]func(context.Context) error) (len=1 cap=1) {
    (func(context.Context) error) 0xadb1c0
   },
   Error: (error) <nil>,
   Ignored: (bool) false,
   Fatal: (bool) false,
   WeakDeps: (bool) true,
   Executed: (bool) false,
   Name: (string) (len=25) "/some/yip/01_first.yaml.1",
   Dependencies: ([]string) (len=1 cap=1) {
    (string) (len=42) "/some/yip/01_first.yaml.Create Kairos User"
   },
   WeakDependencies: ([]string) <nil>
  },
  (herd.GraphEntry) {
   WithCallback: (bool) true,
   Background: (bool) false,
   Callback: ([]func(context.Context) error) (len=1 cap=1) {
    (func(context.Context) error) 0xadb1c0
   },
   Error: (error) <nil>,
   Ignored: (bool) false,
   Fatal: (bool) false,
   WeakDeps: (bool) true,
   Executed: (bool) false,
   Name: (string) (len=25) "/some/yip/01_first.yaml.3",
   Dependencies: ([]string) (len=1 cap=1) {
    (string) (len=42) "/some/yip/01_first.yaml.Create Kairos User"
   },
   WeakDependencies: ([]string) <nil>
  }
 },
 ([]herd.GraphEntry) (len=1 cap=1) {
  (herd.GraphEntry) {
   WithCallback: (bool) true,
   Background: (bool) false,
   Callback: ([]func(context.Context) error) (len=1 cap=1) {
    (func(context.Context) error) 0xadb1c0
   },
   Error: (error) <nil>,
   Ignored: (bool) false,
   Fatal: (bool) false,
   WeakDeps: (bool) true,
   Executed: (bool) false,
   Name: (string) (len=44) "/some/yip/01_first.yaml.Create Kairos User.1",
   Dependencies: ([]string) (len=1 cap=1) {
    (string) (len=25) "/some/yip/01_first.yaml.1"
   },
   WeakDependencies: ([]string) <nil>
  }
 }
}

So in the end both "/some/yip/01_first.yaml.1" and "/some/yip/01_first.yaml.3" have "/some/yip/01_first.yaml.Create Kairos User" as a dependency.

mauromorales added a commit to mauromorales/yip that referenced this issue Apr 26, 2024
relates to kairos-io/kairos#2488

Signed-off-by: Mauro Morales <contact@mauromorales.com>
@nianyush
Author

nianyush commented Apr 26, 2024

Glad to hear you can reproduce it! I don't have that 2.4.5 vm anymore :(

@mauromorales
Member

@nianyush thanks a lot for opening the issue and providing all the info, though. If it weren't for your screenshots, I wouldn't have had any idea where to look.

mauromorales added a commit to kairos-io/kairos-agent that referenced this issue Apr 27, 2024
fixes kairos-io/kairos#2488

Signed-off-by: Mauro Morales <mauro.morales@spectrocloud.com>
@mauromorales
Member

The fix for this issue is now in the latest Kairos release (artifacts building, give it a couple of hours)
Kairos: https://github.com/kairos-io/kairos/releases/tag/v3.0.8
Agent: https://github.com/kairos-io/kairos-agent/releases/tag/v2.8.13

@nianyush
Author

@mauromorales @Itxaka I am facing this issue again with Kairos 3.0.10.

@mauromorales
Member

@nianyush could you paste the system's /etc/os-release and payload from the provider?

@mudler mudler mentioned this issue May 14, 2024
29 tasks