Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

udev: add default group for sgx enclave access #18944

Merged
merged 1 commit into from Mar 10, 2021

Conversation

keszybz
Copy link
Member

@keszybz keszybz commented Mar 9, 2021

Closes #18669.

This creates a "well known" for sgx_enclave ownership. By doing this here we
avoid the risk that various projects making use of the device will provide
similar-but-slightly-incompatible installation instructions, in particular
using different group names.

ACLs are actually a better approach to grant access to users, but not in all
cases, so we want to provide a standard group anyway.

Mode is 0o660, not 0o666 because this is very new code and distributions are
likely to not want to give full access to all users. This might change in the
future, but being conservative is a good default in the beginning.

Rules for /dev/sgx_provision will be provided by libsg-ae-pce:
intel/linux-sgx#678.

@bluca bluca added good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed udev labels Mar 9, 2021
Copy link
Member

@bluca bluca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@poettering
Copy link
Member

hmm, what's the point of the SYMLINK+= part? who would use that? Can you explain?

@jethrogb
Copy link

Symlink looks like backcompat with previous out of tree kernel drivers. Now that we have a stable driver to build from I think it's better to force a “reset” of the ecosystem.

@poettering poettering added reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks and removed good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed labels Mar 10, 2021
@keszybz
Copy link
Member Author

keszybz commented Mar 10, 2021

@haitaohuang, ptal.

@woju
Copy link

woju commented Mar 10, 2021

Yeah, pretty much looks like it. For reference here's a table of (header, device path) and a comment for a particular we use for driver autodetection: https://github.com/oscarlab/graphene/blob/cea083122941c52b8e9b21dfd327364ffbc3f0cb/Pal/src/host/Linux-SGX/link-intel-driver.py#L8-L19. This particular symlink is aimed at DCAP driver, which historically closely followed the repeated patchsets (they even have a nice table: https://github.com/intel/SGXDataCenterAttestationPrimitives/tree/master/driver/linux#compatibility-with-intelr-sgx-psw-releases).

With graphene hat: either way works for us. Thank you!

@jethrogb
Copy link

Closes systemd#18669.

This creates a "well known" for sgx_enclave ownership. By doing this here we
avoid the risk that various projects making use of the device will provide
similar-but-slightly-incompatible installation instructions, in particular
using different group names.

ACLs are actually a better approach to grant access to users, but not in all
cases, so we want to provide a standard group anyway.

Mode is 0o660, not 0o666 because this is very new code and distributions are
likely to not want to give full access to all users. This might change in the
future, but being conservative is a good default in the beginning.

Rules for /dev/sgx_provision will be provided by libsg-ae-pce:
intel/linux-sgx#678.
@keszybz
Copy link
Member Author

keszybz commented Mar 10, 2021

Symlink looks like backcompat with previous out of tree kernel drivers. Now that we have a stable driver to build from I think it's better to force a “reset” of the ecosystem.

I dropped the symlink again.

@keszybz keszybz added good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed and removed reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks labels Mar 10, 2021
@poettering
Copy link
Member

lgtm

@keszybz keszybz removed the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Mar 10, 2021
@keszybz keszybz merged commit c9c4899 into systemd:main Mar 10, 2021
@mbiebl
Copy link
Contributor

mbiebl commented Mar 11, 2021

Hm, why not use uaccess here? The less special system groups we have, the better.

@keszybz
Copy link
Member Author

keszybz commented Mar 11, 2021

uaccess is about logged in users. The set of such users might overlap with the set of users who shall have access to this, but it doesn't have to.

@mbiebl
Copy link
Contributor

mbiebl commented Mar 11, 2021

Sure, but do we have a need/use-case for such a non-logged in user?
Wouldn't it be better, if we added the uaccess tag first, so stuff works OOTB without requiring explicit mangling of group memberships. If the need arises for the non-logged in user case, we can still add the explicit sgx group later.

My fear is, that sgx is very much a specialized use case requiring a certain hardware and software combination. So I don't see the need (yet) to add such a group to everyones system. Who knows if sgx is still a thing a couple of years down the line.
Getting rid of static system groups once introduced is hard.

@keszybz keszybz deleted the sgx-enclave branch March 11, 2021 19:48
@keszybz
Copy link
Member Author

keszybz commented Mar 11, 2021

There isn't any particular reason to tie this to uaccess. If the local policy is to use uaccess, this is very simple to configure. But such policy clearly isn't universal or recommended by existing packages.

My main goal with merging this here is to provide a standardized experience on all systems. Introducing a group like this is really cheap, the whole patch is two lines. Intel cpus are very popular, so even if only some fraction of users can use this, this fraction can still be a lot of people. If we at some point decide that it's not useful any more, we can drop it again. Existing systems will keep the group, no harm done. This is certainly better then a proliferation of scripts and instructions to add the group in every project that wants to make use of it.

@mbiebl
Copy link
Contributor

mbiebl commented Mar 11, 2021

Well, no. Removing existing system groups isn't cheap. Once introduced it's usually hard to get rid of them, as you might have files/directories using that group.
It's still not sufficiently justified, why uaccess is not suitable here.

@mbiebl
Copy link
Contributor

mbiebl commented Mar 11, 2021

Well, no. Removing existing system groups isn't cheap.

Which is why we should be careful introducing new system groups...

@bluca
Copy link
Member

bluca commented Mar 11, 2021

I guess if someone wants to use uaccess, then anyone who does will be compatible with each other out of the box, as you don't need a specific keyword. With group access, you do need to use the exact same name, and that's where standardisation at the top level helps.

Also, I definitely see use cases for this feature in service-level software, which won't run from a logged in user, and thus won't benefit from uaccess, right?

@mbiebl
Copy link
Contributor

mbiebl commented Mar 11, 2021

Also, I definitely see use cases for this feature in service-level software

Such as?

@mbiebl
Copy link
Contributor

mbiebl commented Mar 11, 2021

Honestly, I'm quite puzzled. For years we have been advocating that static group member ship is probably not such a great idea.
Now we make a 180 u-turn?

@poettering
Copy link
Member

uaccess doesn't sounds like an appropriate thing I'd say. That#s really more about desktopish stuff, and SGX is entirely generic stuff. I mean, I'd expect sshd do use it and http servers and such that need crypto, and there's really no relevance to the desktop there.

As you might have seen with the AMD SEV stuff I generally try to push away stuff that is too special purpose, and where there's another package that is pretty much "the" maintainer of that subsystem. But I was told that SGX doesn't really fit into that much, it's pretty generic stuff that is useful outside of the virtualization space, and apparently is supposed to be pretty commonly available in intel CPUs.

I mean, we have to draw the line somewhere: we have the group concept and it makes sense for system-level stuff like this. Hence we should use it. Because otherwise there's no point in the groups at all, if we never use them.

The other options are only to say "meh, not our problem, let people fight for device node ownership", or to open up the device node. I doubt this is really in our interest.

We try to gently push distros to come to a common groups database, and we can do this via the sysusers stuff pretty OK. Of course it's open source, people can patch it out downstream again.

@mbiebl
Copy link
Contributor

mbiebl commented Mar 11, 2021

backup services that want tape access. or bitcoin miners that want render access. or printer servers that want lp access. They all may run (at least partially) unprivileged, with access to select hw and nothing else.

huh? you missed the point apparently

@mbiebl
Copy link
Contributor

mbiebl commented Mar 11, 2021

So, the responses so far convinced me even more to revert this change. But I'm open to actual use cases.

@poettering
Copy link
Member

Unless there is a proper justification for this group, it will be patched out in Debian/Ubuntu (which would defeat the point of making it "standard")

I am not sure what a "proper" justification is supposed to be, beyond what we already gave you: we add groups for device nodes for reasonably common/generic device types, where system components have a good reason to get access to (while others shall not) and which are accessed by multiple projects directly as opposed to having a single userspace subsystem owner. That's all there is to it.

of course, these criteria are not binary decision points — "common" and "generic" are subjective and a continous scale. we came to the conclusion that these criteria apply so we merged it. If you disagree with that, it's your freedom.

So for the next time (iirc we had the same discussion about the "render" nodes stuff), let's semi formalize things:

  1. must be a driver class for reasonably generic and common hw
  2. must have more than one consumer in userspace, as opposed to have a single "primary" subsystem package (which should own the rules/group in that case)
  3. must be something select system services shall get access to. i.e. if only users are desktop "human" users, or when world accessible is OK too, this doesn't apply.

@poettering
Copy link
Member

So, the responses so far convinced me even more to revert this change. But I'm open to actual use cases.

I mean, knock yourself out, but we listed plenty usecases, don't claim otherwise. You just decided to ignore or discount them as irrelevant.

@poettering
Copy link
Member

huh? you missed the point apparently

then explain it to me!

@bluca
Copy link
Member

bluca commented Mar 11, 2021

So, the responses so far convinced me even more to revert this change. But I'm open to actual use cases.

I'm not sure what you mean by "actual use case" - if you mean software that today can use enclaves, then the various ssl libraries have been mentioned. For example, you can run apache or nginx with wolfssl with sgx support. You know perfectly well you don't want to run web servers as root. That's a use case as concrete as it could possibly be, available today.
Then again, the feature is brand new. More will come.

@mbiebl
Copy link
Contributor

mbiebl commented Mar 11, 2021

I mean, knock yourself out, but we listed plenty usecases, don't claim otherwise.

Not really, no. But thanks for being so constructive.

You just decided to ignore or discount them as irrelevant.

Hm, that feels familiar.

@mbiebl
Copy link
Contributor

mbiebl commented Mar 11, 2021

You know perfectly well you don't want to run web servers as root

The web servers I know start off as root and only drop privileges after they've setup everything (like binding to privileged ports)

@bluca
Copy link
Member

bluca commented Mar 11, 2021

You know perfectly well you don't want to run web servers as root

The web servers I know start off as root and only drop privileges after they've setup everything (like binding to privileged ports)

I'm not an expert but I don't think that works here, as access is required at any time

@mbiebl
Copy link
Contributor

mbiebl commented Mar 11, 2021

huh? you missed the point apparently

then explain it to me!

you are talking about tape recorders. Not sure what that is supposed to mean when I was asking for real use cases.

@mbiebl
Copy link
Contributor

mbiebl commented Mar 11, 2021

I'm not an expert but I don't think that works here, as access is required at any time

It's an fd that can be passed to the children, no?

@poettering
Copy link
Member

@mbiebl i listed two more cases, and iirc backups in big data centers is still all tapes, they even build new tape drivers pretty regularly.

@poettering
Copy link
Member

The web servers I know start off as root and only drop privileges after they've setup everything (like binding to privileged ports)

Yeah, on sysvinit this is how things worked, but we try to move people away from schemes like that and just drop privs for them.

@mbiebl
Copy link
Contributor

mbiebl commented Mar 11, 2021

Yeah, on sysvinit this is how things worked, but we try to move people away from schemes like that and just drop privs for them.

that's still true today, even under systemd, afaics.

@mbiebl
Copy link
Contributor

mbiebl commented Mar 11, 2021

@mbiebl i listed two more cases, and iirc backups in big data centers is still all tapes, they even build new tape drivers pretty regularly.

how tape recoders are related to sgx still eludes me.

@jethrogb
Copy link

Any network service is a candidate for use of SGX, which they may use to protect sensitive information such as TLS keys or application data. However, not every network service requires root.

@bluca
Copy link
Member

bluca commented Mar 11, 2021

I'm not an expert but I don't think that works here, as access is required at any time

It's an fd that can be passed to the children, no?

AFAIK none of the libraries/implementations work like that, and the nodes are accessed directly. Another real, existing use case is Graphene, and this is how it instructs users to set up:

https://graphene.readthedocs.io/en/latest/sgx-intro.html#linux-kernel-drivers

iow, exactly what this PR does, as suggested in the ticket by #18669 (comment)

@woju
Copy link

woju commented Mar 11, 2021

@mbiebl: We (Graphene) and Fortanix (@jethrogb's project) both use /dev/sgx_enclave interface (and there are other projects who don't speak up here). Both of the frameworks are (should be) perfectly fine to install alongside each other. What should each of us suggest to our users? 0666 is generally a bad advice, because some people prefer to restrict access to SGX for various reasons (auditability is one concern, other people don't trust anything remotely related to Intel ME, which SGX depends upon, ...). So we'd need 0660 with a common group, which is how such problems are traditionally solved, and those examples with render, dialup and even tape served to prove the point.

We aim for as easy installation as possible (optimally apt install && gpasswd -a), because many users are not proficient linux admins (Graphene is widely used in ML scenarios by people who are data scientists by trade). The lack of this common group would make packaging our projects for Debian/Ubuntu problematic.

@poettering
Copy link
Member

As it appears Signal (the IM program) can do its crypto in SGX. Not sure if on linux yet though. See https://signal.org/blog/secure-value-recovery/

@mbiebl
Copy link
Contributor

mbiebl commented Mar 13, 2021

As it appears Signal (the IM program) can do its crypto in SGX. Not sure if on linux yet though. See https://signal.org/blog/secure-value-recovery/

Perfect example that uaccess would be a better choice here.

@bluca
Copy link
Member

bluca commented Mar 13, 2021

As it appears Signal (the IM program) can do its crypto in SGX. Not sure if on linux yet though. See https://signal.org/blog/secure-value-recovery/

Perfect example that uaccess would be a better choice here.

Uhm, how so? It is used server-side, not by the user app - ie, a practical example of the "securely handle customer data" that I mentioned the other day. From the article:

However, we can invert the traditional SGX relationship to run a secure enclave on the server. An SGX enclave on the server would enable a service to perform computations on encrypted client data without learning the content of the data or the result of the computation.

@mbiebl
Copy link
Contributor

mbiebl commented Mar 13, 2021

As it appears Signal (the IM program) can do its crypto in SGX. Not sure if on linux yet though. See https://signal.org/blog/secure-value-recovery/

Perfect example that uaccess would be a better choice here.

Uhm, how so? It is used server-side, not by the user app -

So "Signal (the IM program)" is not referring to the client?

@bluca
Copy link
Member

bluca commented Mar 13, 2021

As it appears Signal (the IM program) can do its crypto in SGX. Not sure if on linux yet though. See https://signal.org/blog/secure-value-recovery/

Perfect example that uaccess would be a better choice here.

Uhm, how so? It is used server-side, not by the user app -

So "Signal (the IM program)" is not referring to the client?

No, it refers to the server, it's explained in the blog post, it's quite an interesting read

@haitaohuang
Copy link

Thanks @keszybz for submitting the patch.
Sorry I'm late for responding (I was off grid for a few days).

I would still prefer 0666 as default:

  1. it is the easiest for user to install enclave apps without worrying about joining any special groups.
  2. Access to sgx_enclave node only gives user permission to run code inside enclave. ISA guarantees only subset of instructions available inside enclave, e.g. no syscall, cpuid, io, etc. In other words, enclaves can only do a subset of things that the user can already do without it.
  3. Enclaves does consume special EPC memory which is limited . So we do need mitigate DOS attack on EPC. Maybe a specific group would be helpful in that regard but probably not adequate . In future kernel, EPC usage will be monitored and controlled by cgroups. Once cgroups are supported, the extra group is redundant for this purpose?

BTW, SGX itself does not depend on ME as some may deduce from some implementations for client use cases that enclaves need use service from ME, such as trusted counters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

Add default udev rules and group for sgx_* dev nodes for new SGX support in kernel 5.11
7 participants