Starting containers fails with EACCES to bash/sh for select users #63

Closed
krono opened this issue Mar 2, 2021 · 13 comments

@krono
Contributor

krono commented Mar 2, 2021

[Caveat: running the ppc64le branch]

My users report that they cannot start any container; they fail with:

enroot-switchroot: failed to execute: /bin/sh: Permission denied

I straced the execution and found the following:

For my own unprivileged user, things work fine (sanitized):

chdir("${HOME}")                        = 0
access("/bin/bash", R_OK|X_OK)          = 0
access("/etc/rc", F_OK)                 = 0
execve("/bin/bash", ["-bash", "/etc/rc", "nvidia-smi"], 0x123d700c0 /* 13 vars */) = 0
brk(NULL)                               = 0x12cdc0000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
....

For other users, not so much:

chdir("${HOME}") = 0
access("/bin/bash", R_OK|X_OK)          = -1 EACCES (Permission denied)
access("/etc/rc", F_OK)                 = 0
execve("/bin/sh", ["-sh", "/etc/rc", "nvidia-smi"], 0x107a900e0 /* 13 vars */) = -1 EACCES (Permission denied)
writev(2, [{iov_base="enroot-switchroot: ", iov_len=19}, {iov_base=NULL, iov_len=0}], 2enroot-switchroot: ) = 19
writev(2, [{iov_base="failed to execute: /bin/sh", iov_len=26}, {iov_base=NULL, iov_len=0}], 2failed to execute: /bin/sh) = 26
writev(2, [{iov_base="", iov_len=0}, {iov_base=": ", iov_len=2}], 2: ) = 2
writev(2, [{iov_base="", iov_len=0}, {iov_base="Permission denied", iov_len=17}], 2Permission denied) = 17
writev(2, [{iov_base="", iov_len=0}, {iov_base="\n", iov_len=1}], 2
) = 1
exit_group(1)                           = ?
+++ exited with 1 +++

== Notes ==

The permissions of sh and bash are ok, and although they have SELinux labels:

# ls -alZ /bin/bash /bin/sh
-rwxr-xr-x. 1 root root system_u:object_r:shell_exec_t:s0 1980208 Aug 30  2019 /bin/bash
lrwxrwxrwx. 1 root root system_u:object_r:bin_t:s0              4 Aug 30  2019 /bin/sh -> bash

SELinux is off:

# sestatus
SELinux status:                 disabled

The home directory is on a network file system, but since it works for my unprivileged user, I don't think that bit matters.

I would appreciate some guidance on how to debug here :)

@3XX0
Member

3XX0 commented Mar 2, 2021

Are you sharing your containers with other users?
It looks like they don't have permissions to the sh binary inside this specific container (not the host one).

> The home directory is on a network file system

Is SELinux disabled for all users?
Maybe you have squashing enabled on your network file system, which could alter permissions.

@krono
Contributor Author

krono commented Mar 2, 2021

> Are you sharing your containers with other users?

No, they are tied to the individual users. (using ~/.local/enroot or /run/user/###/enroot and stuff)

> It looks like they don't have permissions to the sh binary inside this specific container (not the host one)

The unpacked container files in the local directories seem to have OK permissions; how can I debug that?

> > The home directory is on a network file system
>
> Is SELinux disabled for all users?

It is globally disabled

> Maybe you have squashing enabled on your network file system which could alter permissions.

That's an interesting idea. GPFS does funny things with permissions, but I configured it such that it respects chmod.

However, it works for my normal user but not for others. My user has no privileges (except sudo, but that does not seem to play a role).

@3XX0
Member

3XX0 commented Mar 2, 2021

Is GPFS mounted directly or exported through say NFS?
It could be ACLs or something like that too.

You can try executing said binary outside the container to make sure it's enroot related:
~/.local/share/enroot/<container>/bin/sh
If your users can't run it from outside, the problem is probably not coming from enroot
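
A minimal Python sketch of that check, for anyone who prefers a yes/no answer over running the shell itself. It repeats the `access(path, R_OK|X_OK)` call visible in the strace output, so the kernel and the file system give the answer rather than the mode bits printed by `ls`. The function name is illustrative only, and the paths are whatever you pass on the command line.

```python
import os
import sys


def can_read_and_exec(path):
    """Return True if the calling user may read and execute `path`.

    Same question as access(path, R_OK|X_OK) in the strace output above.
    A missing file also yields False.
    """
    return os.access(path, os.R_OK | os.X_OK)


if __name__ == "__main__":
    # Pass one or more candidate binaries, e.g. the container's bin/sh.
    for candidate in sys.argv[1:]:
        verdict = "ok" if can_read_and_exec(candidate) else "denied or missing"
        print(f"{candidate}: {verdict}")
```

Run it as the affected user, not via sudo, since the result depends on who asks.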

> My user has no privileges (except sudo, but that does not seem to play a role)

Just to make sure, they are not using sudo to start the container, right?

@krono
Contributor Author

krono commented Mar 2, 2021

> Is GPFS mounted directly or exported through say NFS?
> It could be ACLs or something like that too.

Directly. I'll look. ... Oh dear:

$ mmgetacl /$GPFS/home/$ME/.local/share/enroot/cuda/bin/bash
#NFSv4 ACL
#owner:$ME
#group:$GROUP
special:owner@:rw--:allow:Inherited
 (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL  (X)READ_ATTR  (X)READ_NAMED
 (X)DELETE    (X)DELETE_CHILD (X)CHOWN        (-)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED

special:group@:r---:allow:Inherited
 (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL  (X)READ_ATTR  (X)READ_NAMED
 (-)DELETE    (-)DELETE_CHILD (-)CHOWN        (-)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

user:$ME:rwx-:allow:Inherited
 (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL  (X)READ_ATTR  (X)READ_NAMED
 (X)DELETE    (X)DELETE_CHILD (X)CHOWN        (X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED

$ mmgetacl /$GPFS/home/$USER/.local/share/enroot/cuda-test/bin/bash
#NFSv4 ACL
#owner:$USER
#group:$GROUP
special:owner@:rw--:allow:Inherited
 (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL  (X)READ_ATTR  (X)READ_NAMED
 (X)DELETE    (X)DELETE_CHILD (X)CHOWN        (-)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED

special:group@:r---:allow:Inherited
 (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL  (X)READ_ATTR  (X)READ_NAMED
 (-)DELETE    (-)DELETE_CHILD (-)CHOWN        (-)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

user:$USER:rw--:allow:Inherited
 (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL  (X)READ_ATTR  (X)READ_NAMED
 (X)DELETE    (X)DELETE_CHILD (X)CHOWN        (-)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED

This is NFSv4 ACL format, but the last block of each listing says it all: my files are executable by me, theirs are not.
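
To compare many files this way, the listings can be reduced to "which entries grant x". The Python sketch below assumes only the text layout shown in this thread (`type:who:mask:allow:flags`, `#` comment lines, and `(X)`/`(-)` detail rows). The names `alice` and `staff` are placeholders. It does not evaluate NFSv4 ACL semantics such as entry order, deny entries, or group membership.

```python
SAMPLE = """\
#NFSv4 ACL
#owner:alice
#group:staff
special:owner@:rw--:allow:Inherited
 (X)READ/LIST (X)WRITE/CREATE (-)EXEC/SEARCH
special:group@:r---:allow:Inherited
 (X)READ/LIST (-)WRITE/CREATE (-)EXEC/SEARCH
user:alice:rw--:allow:Inherited
 (X)READ/LIST (X)WRITE/CREATE (-)EXEC/SEARCH
"""


def exec_grants(acl_text):
    """Map each 'allow' entry ('type:who') to whether its mask contains x."""
    grants = {}
    for line in acl_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#(":
            continue  # blank line, comment, or (X)/(-) detail row
        fields = stripped.split(":")
        if len(fields) < 4 or fields[3] != "allow":
            continue  # this sketch ignores deny entries and malformed lines
        grants[f"{fields[0]}:{fields[1]}"] = "x" in fields[2]
    return grants


if __name__ == "__main__":
    for entry, has_x in exec_grants(SAMPLE).items():
        print(entry, "x" if has_x else "-")
```

Feeding it the output of `mmgetacl` for the same binary in two users' trees makes a difference like the one above visible at a glance.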

@krono
Contributor Author

krono commented Mar 2, 2021

Ok, it's an ACL problem.

My home directory is set up to give me r-x on all files, but this is not visible in ls.

BUT: if I chmod +x the bash in $USER's enroot dir, the x bit gets set and the following happens:

[$USER ~]$ ls -al .local/share/enroot/cuda-test/bin/bash
-rwxr-x--x 1 $USER GROUP 1563432 Jun 23  2020 .local/share/enroot/cuda-test/bin/bash
[$USER ~]$ .local/share/enroot/cuda-test/bin/bash
[$USER ~]$ #inside bash
[$USER ~]$ exit
[$USER ~]$ enroot start cuda-test bash
enroot-switchroot: failed to execute: /bin/bash: Permission denied

Note that this time it says /bin/bash and not /bin/sh as in the first post:

[$USER ~]$ enroot start cuda-test sh
enroot-switchroot: failed to execute: /bin/bash: Permission denied

How does unsquashfs handle permissions?

Do you want this open for future reference or shall I close?

@3XX0
Member

3XX0 commented Mar 2, 2021

Squashfs doesn't support ACL iirc, but this shouldn't matter in this instance.
Unsquashing the image will apply the same permissions as found in the image, which should indirectly apply the correct ACLs.
I'm not sure why this would work for you but not the other users, maybe you have different umasks?

There could be a GPFS issue lurking too, could be similar to this if not supported correctly.

@krono
Contributor Author

krono commented Mar 2, 2021

> Squashfs doesn't support ACL iirc, but this shouldn't matter in this instance.
> Unsquashing the image will apply the same permissions as found in the image, which should indirectly apply the correct ACLs.

That's what I hoped.
It's curious, though: I thought the effect of unsquashing was, permission-wise, equivalent to create/chmod, but apparently it is not.

> I'm not sure why this would work for you but not the other users, maybe you have different umasks?

No, different ACL inheritance rules on the parent dirs. Mine is older and has different rules.
I might “rebase” the users' enroot directories…

I'll have a look at how things are after the GPFS ACL changes and report back.

> There could be a GPFS issue lurking too, could be similar to this if not supported correctly.

That would be .. ungood.
I'll watch out for that. (That said, GPFS ACLs are not the same as POSIX ACLs and are not accessed in the same way.)

@krono
Contributor Author

krono commented Mar 6, 2021

Ok, it is an ACL problem.
From my POV this can be closed unless you want it open for documentation purposes.

@krono
Contributor Author

krono commented Mar 10, 2021

I think we can close

@krono krono closed this as completed Mar 10, 2021
@3XX0
Member

3XX0 commented Mar 10, 2021

Thanks for looking into it!

@krono
Contributor Author

krono commented Mar 10, 2021

Thanks to you for your analysis :)

@BlueCloudDev

This is the only result I could find for this issue, so I'm posting the resolution here. I needed to change the directory for storing enroot data. Apparently the drive was not mounted with "exec" and was causing the execution to fail. Remounting with this:

sudo mount -o remount,exec /mnt/localdisk

resolved the issue for me.
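
For anyone wanting to confirm this cause before remounting, here is a small Python sketch that asks whether the file system holding a path carries the noexec flag. It relies on `os.statvfs` and `os.ST_NOEXEC` from the standard library (Linux). The function name is illustrative, and the path to test is an assumption: use whatever directory holds your enroot data.

```python
import os
import sys


def mounted_noexec(path):
    """Return True if the file system holding `path` is mounted noexec."""
    return bool(os.statvfs(path).f_flag & os.ST_NOEXEC)


if __name__ == "__main__":
    # Pass the directory that stores the unpacked containers.
    for target in sys.argv[1:]:
        state = "noexec" if mounted_noexec(target) else "exec allowed"
        print(f"{target}: {state}")
```

If this reports noexec for the enroot data directory, the `Permission denied` on `/bin/sh` is expected regardless of file modes or ACLs.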

@krono
Contributor Author

krono commented Apr 5, 2023

Yes, this is a not atypical reason for this error.

This issue was closed.