Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

snap-confine/nvidia: Support legacy biarch trees for GLVND systems #4559

Merged
merged 4 commits into from Jan 31, 2018

Conversation

ikeydoherty
Copy link
Contributor

GLVND systems are well catered for on biarch configurations, however the
older 304 + 340 series will rely on ld.so.conf tricks to mask the system
libGL.so with one in the private libdir/nvidia*/ files. This is seen
consistently across Arch, Fedora and Solus. As such we allow these fallback
paths to work so that users of snapd on glvnd enabled distros with the older
legacy drivers can still enjoy snapd + NVIDIA together.

Signed-off-by: Ikey Doherty ikey@solus-project.com

GLVND systems are well catered for on biarch configurations, however the
older 304 + 340 series will rely on ld.so.conf tricks to mask the system
libGL.so with one in the private `libdir/nvidia*/` files. This is seen
consistently across Arch, Fedora and Solus. As such we allow these fallback
paths to work so that users of snapd on glvnd enabled distros with the older
legacy drivers can still enjoy snapd + NVIDIA together.

Signed-off-by: Ikey Doherty <ikey@solus-project.com>
@ikeydoherty
Copy link
Contributor Author

Sample /var/lib/snapd/lib/gl on a GLVND distro (Solus) with the NVIDIA 304 driver: https://hastebin.com/idasofebut - should point out that the primary links are what matter here. Also note NVIDIA 340 provides the libEGL links which in turn would replace the glvnd libEGL in the hostfs portion. And "pure" GLVND still works.

@mvo5 mvo5 added this to the 2.31 milestone Jan 29, 2018
@codecov-io
Copy link

codecov-io commented Jan 29, 2018

Codecov Report

Merging #4559 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4559      +/-   ##
==========================================
+ Coverage   78.18%   78.19%   +<.01%     
==========================================
  Files         456      456              
  Lines       32529    32690     +161     
==========================================
+ Hits        25434    25563     +129     
- Misses       4979     5011      +32     
  Partials     2116     2116
Impacted Files Coverage Δ
interfaces/builtin/opengl.go 100% <ø> (ø) ⬆️
cmd/snap-update-ns/change.go 87.87% <0%> (-9.63%) ⬇️
cmd/snap-update-ns/utils.go 94.71% <0%> (-5.29%) ⬇️
overlord/hookstate/hookmgr.go 73.14% <0%> (+1.14%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 96b082c...7f8bc25. Read the comment docs.

Copy link
Contributor

@mvo5 mvo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Conan-Kudo
Copy link
Contributor

Oh goodie, this should fix some of the reported issues in Fedora with NVIDIA, too.

@Conan-Kudo
Copy link
Contributor

@mvo5 Could you please cherry pick this into 2.31 for me? This will resolve RH#1442051 for me.

Copy link
Collaborator

@zyga zyga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, CC @jdstrand

@zyga zyga requested a review from jdstrand January 29, 2018 15:58
Copy link

@jdstrand jdstrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving the apparmor changes, but please consider checking if it is actually a symlink since all the comments and die() are reporting this is the case.


// Make sure we don't have some link already (merged GLVND systems)
if (lstat(symlink_name, &stat_buf) == 0) {
if (unlink(symlink_name) != 0) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you also be testing if S_IFLNK? lstat() will happily return 0 if regular file or symlink.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path is inside our hostfs symlink farm that we create, and we only create symlinks, the code doesn't copy files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, the naming here is confusing, but symlink_name is not the source name, it is the target name for the symlink, i.e. the new inode. Confused the hell out of me going through it the first time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll probably do a small update here to use consistent terminology (oldname vs whatever the other one was).

@Conan-Kudo
Copy link
Contributor

Conan-Kudo commented Jan 30, 2018

@ikeydoherty I think I might be confusing myself again, but will the snap-confine mount-support code actually work with legacy nvidia driver stuff installed in /usr/lib64/nvidia*?

c.f.: http://koji.rpmfusion.org/koji/rpminfo?rpmID=171143
c.f.: http://koji.rpmfusion.org/koji/rpminfo?rpmID=213317

@ikeydoherty
Copy link
Contributor Author

Uh it looks like you guys have even more stuff in that directory than I suspected. You have files in there that shouldn't and don't need to be.. Only the libGL needs to be there :/

In that case we'd have to effectively double the copy list to include all of those (frankly incredibly silly) paths.

@ikeydoherty
Copy link
Contributor Author

And TBH @Conan-Kudo it looks like Fedora should be using bind mounts for those nvidia drivers, but my guess is the use of glvnd makes that unfeasible.

What we can do:

Remove 32 vs 64 logic in the "mount" farm. Make the function accept a common ancestor, i.e. /usr/lib /usr/lib/nvidia, /usr/lib32, etc. Then form full paths from that ancestor and do the blits, while preserving the above "nuke if exists" logic.

That should simplify the code and cover all of the uses, right? And keep the copy list simpler by implicitly allowing skipping over elements.

Instead of our two hard-coded path lists, we'll form relative paths for
all optional elements in the list against common ancestors known for biarch
distributions.

This builds upon the previous work and allows us to replace or grab files
from the subdirectories, and also allows the private NVIDIA libraries to
exist either within the toplevel libdir or within a private directory, as
seen on Fedora.

The flow is changed slightly so that our first primary call for each libdir
is the only mount, and then we follow up with a simple populate call for the
libdir.

Signed-off-by: Ikey Doherty <ikey@solus-project.com>
@ikeydoherty
Copy link
Contributor Author

OK I added a follow up commit which should just make stuff work for Everyone ™

@ikeydoherty
Copy link
Contributor Author

The only pain point I see (because I don't know how it works on Fedora) is this reliance on /usr/lib and not, say, /usr/lib64. Now, if this is an issue on Fedora (i.e they're distinct directories maybe?) then we can make it a configure time option for the LIBDIR? Or patch just patch that one spot, maybe.

@Conan-Kudo
Copy link
Contributor

@ikeydoherty They are distinct directories. /usr/lib is 32-bit, whereas /usr/lib64 is 64-bit.

We could make this a compile-time option if you want, then we can feed in the correct base libdir as a configure option (we already do, it just does nothing with it...).

ikeydoherty and others added 2 commits January 30, 2018 17:28
This will allow each distro to provide the native arch and the alternative
lib32 directory without us trying to guess where they go. With this change,
we'll only try to mount the alternative 32-bit biarch space if we're a
64-bit build of snapd, otherwise we'd only have the native 32-bit arch
to worry about in the first place.

Signed-off-by: Ikey Doherty <ikey@solus-project.com>
Signed-off-by: Neal Gompa <ngompa13@gmail.com>
@zyga zyga merged commit 14b7841 into snapcore:master Jan 31, 2018
@ikeydoherty
Copy link
Contributor Author

Yay! Danke ^_^

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants