[BUG] S6 overlay issue #23

Closed
1 task done
myxor opened this issue Dec 8, 2022 · 23 comments · Fixed by #26

Comments

@myxor

myxor commented Dec 8, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When I start the container it runs, but I see the following log:

rm: cannot remove '/etc/s6-overlay/s6-rc.d/svc-php-fpm': Directory not empty
[custom-init] No custom services found, skipping...
s6-rc-compile: fatal: unable to read /etc/s6-overlay/s6-rc.d/svc-php-fpm/type: No such file or directory
s6-rc: fatal: unable to take locks: No such file or directory
s6-linux-init-shutdownd: warning: /run/s6/basedir/scripts/rc.shutdown exited 111

Expected Behavior

No response

Steps To Reproduce

Using this image:

REPOSITORY                      TAG              IMAGE ID       CREATED         SIZE
lscr.io/linuxserver/babybuddy   latest           c4e1900f8304   3 days ago      379MB

When I start it, the container is running, but I see the log above.

Environment

- OS: Ubuntu 18.04 aarch64
- How docker service was installed: apt

CPU architecture

arm64

Docker creation

  babybuddy:
    image: lscr.io/linuxserver/babybuddy
    container_name: babybuddy
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Berlin
      - DB_ENGINE=django.db.backends.postgresql
      - DB_HOST=postgresql
      - DB_NAME=babybuddy
      - DB_PASSWORD=xxxx
      - DB_USER=babybuddy
      - DEBUG=1
      - CSRF_TRUSTED_ORIGINS=http://192.168.200.2:8027,null
    volumes:
      - ~/babybuddy:/config
    ports:
      - 8000:8000
      - 8027:8000
    restart: unless-stopped

Container logs

rm: cannot remove '/etc/s6-overlay/s6-rc.d/svc-php-fpm': Directory not empty
[custom-init] No custom services found, skipping...
s6-rc-compile: fatal: unable to read /etc/s6-overlay/s6-rc.d/svc-php-fpm/type: No such file or directory
s6-rc: fatal: unable to take locks: No such file or directory
s6-linux-init-shutdownd: warning: /run/s6/basedir/scripts/rc.shutdown exited 111
@github-actions

github-actions bot commented Dec 8, 2022

Thanks for opening your first issue here! Be sure to follow the bug or feature issue templates!

@thespad
Member

thespad commented Dec 8, 2022

I can't replicate this using your example compose on arm64; everything starts as I'd expect:

 ⠿ Container babybuddy  Started  1.2s
[custom-init] No custom services found, skipping...
[migrations] started
[migrations] 01-nginx-site-confs-default: executing...
[migrations] 01-nginx-site-confs-default: succeeded
[migrations] 02-default-location: executing...
[migrations] 02-default-location: succeeded
[migrations] done

What is the output of docker version on your host?

@myxor
Author

myxor commented Dec 8, 2022

docker version
Client:
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.2
 Git commit:        20.10.12-0ubuntu2~18.04.1
 Built:             Fri Oct 21 08:26:39 2022
 OS/Arch:           linux/arm64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.2
  Git commit:       20.10.12-0ubuntu2~18.04.1
  Built:            Mon Apr  4 20:53:56 2022
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.5.9-0ubuntu1~18.04.1
  GitCommit:
 runc:
  Version:          1.1.0-0ubuntu1~18.04.1
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:

@thespad
Member

thespad commented Dec 8, 2022

Your docker install is (relatively) up to date so I wouldn't expect it to be causing any issues. Do you have another host you can try running the container on to see if you experience the same errors?

@myxor
Author

myxor commented Dec 9, 2022

The same docker-compose snippet works on a different machine (with x86 arch), so this is not a general bug in the image.
Still, it would be nice if starting the image did not fail just because some folder can't be deleted.

@thespad
Member

thespad commented Dec 9, 2022

The problem is that it's not failing because it can't delete the folder; it's failing because it partially deletes it, which then breaks the init process. If the delete failed entirely, the container would start fine, the same as if it had succeeded.

You're not running rootless or anything like that are you?
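
A partial delete of this kind can be checked for directly. s6-rc-compile expects every service definition directory to contain a `type` file, so a directory left behind without one aborts the compile step, as in the log above. A minimal sketch, assuming bash and the source directory path from that log; the function name `find_broken_services` is hypothetical and not part of the image:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the image): print the name of every
# service definition directory that has no `type` file.
find_broken_services() {
  local src_dir=${1:-/etc/s6-overlay/s6-rc.d}
  local svc
  for svc in "$src_dir"/*/; do
    # Skip the unexpanded pattern when the directory is empty or missing.
    [ -d "$svc" ] || continue
    if [ ! -f "${svc}type" ]; then
      basename "$svc"
    fi
  done
}
```

If the container stays up long enough to exec into it, running `find_broken_services` with no argument would be expected to print `svc-php-fpm` in the failing case and nothing in the working one.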

@myxor
Author

myxor commented Dec 13, 2022

I am not running rootless or anything like that, just plain docker on Ubuntu Server.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@andrewgdunn

Migrating my babybuddy to another box. I've had it running rootless with podman for a couple months. I think I'm seeing something similar:

[baby@muckle ~]$ podman run --replace --detach --label io.containers.autoupdate=registry --name=babybuddy --net=pasta --publish 127.0.0.1:3006:8000 --env-file ~/babybuddy.env --volume ~/babybuddy:/config:Z lscr.io/linuxserver/babybuddy:latest
61323939abef0033d11deed4cff00c025878fdca1c1a078316c8c4c1d3ac1ea7
[baby@muckle ~]$ podman logs babybuddy 
rm: cannot remove '/etc/s6-overlay/s6-rc.d/svc-php-fpm': Directory not empty
s6-rc-compile: fatal: unable to read /etc/s6-overlay/s6-rc.d/svc-php-fpm/type: No such file or directory

@andrewgdunn

andrewgdunn commented Feb 15, 2023

It appears this manifested with the tag: v1.13.2-ls62

[baby@muckle ~]$ podman run --replace --detach --label io.containers.autoupdate=registry --name=babybuddy --net=pasta --publish 127.0.0.1:3006:8000 --env-file ~/babybuddy.env --volume ~/babybuddy:/config:Z lscr.io/linuxserver/babybuddy:v1.13.2-ls61
[baby@muckle ~]$ podman logs babybuddy 
[custom-init] No custom services found, skipping...
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service 00-legacy: starting
s6-rc: info: service 00-legacy successfully started
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
cont-init: info: running /etc/cont-init.d/01-envfile
cont-init: info: /etc/cont-init.d/01-envfile exited 0
cont-init: info: running /etc/cont-init.d/01-migrations
[migrations] started
[migrations] no migrations found
cont-init: info: /etc/cont-init.d/01-migrations exited 0
cont-init: info: running /etc/cont-init.d/10-adduser
usermod: no changes

-------------------------------------
          _         ()
         | |  ___   _    __
         | | / __| | |  /  \
         | | \__ \ | | | () |
         |_| |___/ |_|  \__/


Brought to you by linuxserver.io
-------------------------------------

To support the app dev(s) visit:
BabyBuddy: https://github.com/sponsors/cdubz

To support LSIO projects visit:
https://www.linuxserver.io/donate/
-------------------------------------
GID/UID
-------------------------------------

User uid:    911
User gid:    911
-------------------------------------

cont-init: info: /etc/cont-init.d/10-adduser exited 0
cont-init: info: running /etc/cont-init.d/20-config
cont-init: info: /etc/cont-init.d/20-config exited 0
cont-init: info: running /etc/cont-init.d/30-config
Operations to perform:
  Apply all migrations: admin, auth, authtoken, axes, babybuddy, contenttypes, core, sessions
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying admin.0003_logentry_add_action_flag_choices... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying auth.0009_alter_user_last_name_max_length... OK
  Applying auth.0010_alter_group_name_max_length... OK
  Applying auth.0011_update_proxy_permissions... OK
  Applying auth.0012_alter_user_first_name_max_length... OK
  Applying authtoken.0001_initial... OK
  Applying authtoken.0002_auto_20160226_1747... OK
  Applying authtoken.0003_tokenproxy... OK
  Applying axes.0001_initial... OK
  Applying axes.0002_auto_20151217_2044... OK
  Applying axes.0003_auto_20160322_0929... OK
  Applying axes.0004_auto_20181024_1538... OK
  Applying axes.0005_remove_accessattempt_trusted... OK
  Applying axes.0006_remove_accesslog_trusted... OK
  Applying axes.0007_alter_accessattempt_unique_together... OK
  Applying axes.0008_accessfailurelog... OK
  Applying babybuddy.0001_initial... OK
  Applying babybuddy.0002_add_settings... OK
  Applying babybuddy.0003_add_refresh_help_text... OK
  Applying babybuddy.0004_settings_language... OK
  Applying babybuddy.0005_auto_20190502_1701... OK
  Applying babybuddy.0006_auto_20190502_1744... OK
  Applying babybuddy.0007_auto_20190607_1422... OK
  Applying babybuddy.0008_auto_20200120_0622... OK
  Applying babybuddy.0009_settings_timezone... OK
  Applying babybuddy.0010_auto_20200609_0649... OK
  Applying babybuddy.0011_auto_20200813_0238... OK
  Applying babybuddy.0012_auto_20201024_1847... OK
  Applying babybuddy.0013_auto_20210411_1241... OK
  Applying babybuddy.0014_settings_hide_empty... OK
  Applying babybuddy.0015_alter_settings_timezone... OK
  Applying babybuddy.0016_alter_settings_timezone... OK
  Applying babybuddy.0017_settings_hide_age... OK
  Applying babybuddy.0018_auto_20211017_2136... OK
  Applying babybuddy.0019_alter_settings_timezone... OK
  Applying babybuddy.0020_update_language_en_to_en_us... OK
  Applying babybuddy.0021_alter_settings_language... OK
  Applying babybuddy.0022_alter_settings_language... OK
  Applying babybuddy.0023_alter_settings_timezone... OK
  Applying core.0001_initial... OK
  Applying core.0002_auto_20171028_1257... OK
  Applying core.0003_weight... OK
  Applying core.0004_child_picture... OK
  Applying core.0005_auto_20190416_2048... OK
  Applying core.0006_auto_20190502_1701... OK
  Applying core.0007_temperature... OK
  Applying core.0008_auto_20190607_1422... OK
  Applying core.0009_diaperchange_amount... OK
  Applying core.0010_timer_child... OK
  Applying core.0011_auto_20200214_1939... OK
  Applying core.0012_auto_20200813_0238... OK
  Applying core.0013_auto_20210415_0528... OK
  Applying core.0014_alter_child_slug... OK
  Applying core.0015_add_nap_field_for_sleep... OK
  Applying core.0016_alter_sleep_napping... OK
  Applying core.0017_alter_child_last_name... OK
  Applying core.0018_bmi_headcircumference_height... OK
  Applying core.0019_tag_tagged_note_tags... OK
  Applying core.0020_bmi_tags_diaperchange_tags_feeding_tags_and_more... OK
  Applying core.0021_pumping... OK
  Applying core.0022_alter_default_date_and_time... OK
  Applying core.0023_alter_tag_options_alter_bmi_tags_and_more... OK
  Applying core.0024_alter_tag_slug... OK
  Applying sessions.0001_initial... OK
cont-init: info: /etc/cont-init.d/30-config exited 0
cont-init: info: running /etc/cont-init.d/30-keygen
using keys found in /config/keys
cont-init: info: /etc/cont-init.d/30-keygen exited 0
cont-init: info: running /etc/cont-init.d/99-custom-files
[custom-init] No custom files found, skipping...
cont-init: info: /etc/cont-init.d/99-custom-files exited 0
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service init-mods: starting
s6-rc: info: service init-mods successfully started
s6-rc: info: service init-mods-package-install: starting
s6-rc: info: service init-mods-package-install successfully started
s6-rc: info: service init-mods-end: starting
s6-rc: info: service init-mods-end successfully started
s6-rc: info: service init-services: starting
s6-rc: info: service init-services successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun babybuddy (no readiness notification)
services-up: info: copying legacy longrun cron (no readiness notification)
services-up: info: copying legacy longrun nginx (no readiness notification)
services-up: info: copying legacy longrun php-fpm (no readiness notification)
s6-rc: info: service legacy-services successfully started
s6-rc: info: service 99-ci-service-check: starting
[ls.io-init] done.
s6-rc: info: service 99-ci-service-check successfully started
[2023-02-15 17:21:06 +0000] [146] [INFO] Starting gunicorn 20.1.0
[2023-02-15 17:21:06 +0000] [146] [INFO] Listening at: http://127.0.0.1:3000 (146)
[2023-02-15 17:21:06 +0000] [146] [INFO] Using worker: gthread
[2023-02-15 17:21:06 +0000] [164] [INFO] Booting worker with pid: 164
[2023-02-15 17:21:07 +0000] [165] [INFO] Booting worker with pid: 165
[baby@muckle ~]$ podman run --replace --detach --label io.containers.autoupdate=registry --name=babybuddy --net=pasta --publish 127.0.0.1:3006:8000 --env-file ~/babybuddy.env --volume ~/babybuddy:/config:Z lscr.io/linuxserver/babybuddy:v1.13.2-ls62
[baby@muckle ~]$ podman logs babybuddy 
rm: cannot remove '/etc/s6-overlay/s6-rc.d/svc-php-fpm': Directory not empty
[custom-init] No custom services found, skipping...
s6-rc-compile: fatal: unable to read /etc/s6-overlay/s6-rc.d/svc-php-fpm/type: No such file or directory

To summarize the above:

  • On a fresh instance (no migrated data)
    • v1.13.2-ls61 (or before, tested back to 59) successfully starts
    • v1.13.2-ls62 (all after) fails to start
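
The tag comparison above comes down to checking each tag's log for the fatal s6-rc-compile line. A small sketch of that check, assuming bash; `log_has_s6_compile_failure` is a hypothetical name, and it only inspects text piped to it rather than starting any containers:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if the log text on stdin contains
# the fatal s6-rc-compile error seen with the failing tag.
log_has_s6_compile_failure() {
  grep -q 's6-rc-compile: fatal:'
}
```

It could be used as `podman logs babybuddy 2>&1 | log_has_s6_compile_failure && echo "init failed"` after starting each tag in turn.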

@thespad
Member

thespad commented Feb 15, 2023

It's an issue we've previously only seen with LXC: the folder removal partially fails (even though it works if run interactively), and that causes the services to fail to start.

@andrewgdunn

Is it leveraging a pattern from the wider linuxserver deployment strategy? Should I report the issue further upstream?

I typically deploy off latest and run a nightly podman-auto-update service. Apparently the only reason I didn't see this before is that I didn't have the auto-update service timer enabled. Now that I'm looking to use babybuddy more seriously, I'd love to see how to get this resolved in the image, or at least have a discussion of the mitigation (e.g. what exactly needs to be done interactively).

@thespad
Member

thespad commented Feb 15, 2023

The problem is that it can't be fixed interactively, because the partially deleted service halts the init entirely. It might be possible for us to detect the failure pre-init, but our only option in that situation would be to restore the deleted files, because the delete doesn't work. It's not breaking to have the service present, it's just unnecessary, and I don't really want to put that logic in for such an edge-case issue if I can avoid it.

Because we're not able to replicate this in any of our test environments at the moment, it makes it hard to come up with any kind of "proper" solution for the issue.
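
For illustration only, the "detect and restore" idea mentioned above could look roughly like the sketch below. It assumes bash, and that copying files back works on a filesystem where the delete misbehaves, which is untested. `remove_or_restore` and the `REMOVE_CMD` variable are hypothetical names; this is not what the image does:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: remove a service definition directory, but put its
# contents back if the removal only partially succeeds.
# REMOVE_CMD exists only so the failure path can be exercised; it defaults
# to plain `rm`.
remove_or_restore() {
  local dir=$1 backup
  backup=$(mktemp -d) || return 1
  # Keep a full copy before touching anything.
  cp -a "$dir/." "$backup/" || return 1
  "${REMOVE_CMD:-rm}" -rf "$dir" 2>/dev/null || true
  if [ -e "$dir" ]; then
    # Something survived the delete: restore the original contents so the
    # directory is a complete service definition again.
    cp -a "$backup/." "$dir/"
    rm -rf "$backup"
    return 1
  fi
  rm -rf "$backup"
  return 0
}
```

A non-zero return means the service is still present and intact, which, per the comment above, is unnecessary but not breaking.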

@andrewgdunn

I guess I'm also a bit confused by the explanation: these files are all internal to the container image. It's not as if permissions on a mount/bind or SELinux are the cause of the failure.

Running interactively doesn't seem to result in a change:

[baby@muckle ~]$ podman run --replace --detach --label io.containers.autoupdate=registry --security-opt label=disable --name=babybuddy --net=pasta --publish 127.0.0.1:3006:8000 --env-file ~/babybuddy.env --volume ~/babybuddy:/config:Z lscr.io/linuxserver/babybuddy:latest
4d6f34f8a147f0257268e11588269bc34584d0f9ed0fb0cd20173138626b1548

[baby@muckle ~]$ podman logs babybuddy 
rm: cannot remove '/etc/s6-overlay/s6-rc.d/svc-php-fpm': Directory not empty
s6-rc-compile: fatal: unable to read /etc/s6-overlay/s6-rc.d/svc-php-fpm/type: No such file or directory

[baby@muckle ~]$ podman exec -it babybuddy ls /etc/s6-overlay/s6-rc.d/svc-php-fpm
dependencies.d

[baby@muckle ~]$ podman exec -it babybuddy rm -rf /etc/s6-overlay/s6-rc.d/svc-php-fpm
rm: cannot remove '/etc/s6-overlay/s6-rc.d/svc-php-fpm': Directory not empty

[baby@muckle ~]$ podman exec -it babybuddy rm -rf /etc/s6-overlay/s6-rc.d/svc-php-fpm/dependencies.d

[baby@muckle ~]$ podman exec -it babybuddy ls /etc/s6-overlay/s6-rc.d/svc-php-fpm
dependencies.d

@thespad
Member

thespad commented Feb 15, 2023

Ah, interesting. In the LXC cases we've seen before, it was possible to delete the folders interactively.

@andrewgdunn

@myxor do you remember what filesystems you were using for the working and nonworking instances you fiddled with?

@thespad
Member

thespad commented Feb 15, 2023

Hmm, can you provide the full output of podman info

@myxor
Author

myxor commented Feb 15, 2023

@myxor do you remember what filesystems you were using for the working and nonworking instances you fiddled with?

Both using ext4 as filesystem.

@andrewgdunn

Yeah (below). I'd asked @myxor about filesystems because I realized that my original deployment was on xfs and I'm attempting to move to zfs. By this I mean that's where the user's home directory is. I deploy services as an unprivileged user, where all the context for the deployment is in their home directory.

I feel like the zfs thing is likely not confounding the situation, as I've got other linuxserver containers deployed in the same fashion.

In this case the baby user has its home directory on a zfs dataset, which could be confounding things (I'm going to move it to an xfs partition and report back).

[baby@muckle ~]$ podman info
host:
  arch: amd64
  buildahVersion: 1.29.0
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.5-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.5, commit: 48adb81a22c26f0660f0f37d984baebe7b9ade98'
  cpuUtilization:
    idlePercent: 98.94
    systemPercent: 0.56
    userPercent: 0.5
  cpus: 32
  distribution:
    distribution: '"centos"'
    version: "9"
  eventLogger: file
  hostname: muckle
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 3006
      size: 1
    - container_id: 1
      host_id: 689824
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 3006
      size: 1
    - container_id: 1
      host_id: 689824
      size: 65536
  kernel: 5.14.0-252.el9.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 163749969920
  memTotal: 270122332160
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.8-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8
      commit: 0356bf4aff9a133d655dc13b1d9ac9424706cac4
      rundir: /run/user/3006/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/user/3006/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-2.el9.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 34345054208
  swapTotal: 34345054208
  uptime: 16h 24m 18.00s (Approximately 0.67 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /zfs/safe/baby/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /zfs/safe/baby/.local/share/containers/storage
  graphRootAllocated: 3628517818368
  graphRootUsed: 3059482624
  graphStatus:
    Backing Filesystem: zfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 9
  runRoot: /run/user/3006/containers
  transientStore: false
  volumePath: /zfs/safe/baby/.local/share/containers/storage/volumes
version:
  APIVersion: 4.4.0
  Built: 1675439377
  BuiltTime: Fri Feb  3 10:49:37 2023
  GitCommit: ""
  GoVersion: go1.19.4
  Os: linux
  OsArch: linux/amd64
  Version: 4.4.0
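
For anyone comparing hosts, the two fields most relevant to this thread can be pulled out of that output with a couple of small filters. A sketch assuming bash and awk, and assuming the YAML-style layout shown above; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: read `podman info` output on stdin and print one field.

# Prints the filesystem backing the graph root, e.g. the value of
# "Backing Filesystem:" under store.graphStatus.
backing_filesystem() {
  awk -F': *' '/Backing Filesystem:/ { print $2; exit }'
}

# Prints the value of the "rootless:" key under host.security.
rootless_flag() {
  awk -F': *' '/^ *rootless:/ { print $2; exit }'
}
```

For example, `podman info | backing_filesystem` and `podman info | rootless_flag` on each host being compared.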

@thespad
Member

thespad commented Feb 15, 2023

What we're doing with the babybuddy container is uncommon, though not unique, among our fleet so it's entirely possible you're not running anything else trying to perform the same actions.

@andrewgdunn

@thespad I joined the linuxserver.io Discord and poked you on there, in case you want to do some interactive debugging.

@thespad
Member

thespad commented Feb 15, 2023

There's also this, which is a very old bug but possibly a factor in some cases (though not your zfs one).

@andrewgdunn

If @myxor or others want to follow along, the discussion can be found here.

@thespad thespad linked a pull request Feb 15, 2023 that will close this issue
1 task
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 3, 2023