Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osutil/group.go: treat all non-nil errs from user.Lookup{Group,} as Unknown* #9041

Conversation

anonymouse64
Copy link
Member

@anonymouse64 anonymouse64 commented Jul 21, 2020

Unfortunately the POSIX standard does not specify what the correct error code is
to identify the case where a username or groupname was not found when calling
getpwnam_r or getgrnam_r respectively, so Go's implementation of user.Lookup and
user.LookupGroup's returning UnknownUserError and UnknownGroupError is
fundamentally broken.

To remedy that, we just treat all non-nil errors as UnknownUserError or
UnknownGroupError. This means that other code using this may suffer from less
precise error handling, but that code already will be suffering from that same
problem as it is an upstream problem with the implementation of getpwnam_r and
getgrnam_r.

This is needed to unbreak preseeding on Groovy cloud images before
the preseeding fix can be back-ported to focal, so marking this as Critical.

@anonymouse64 anonymouse64 added ⚠ Critical High-priority stuff (e.g. to fix master) Bug labels Jul 21, 2020
@anonymouse64 anonymouse64 added this to the 2.45 milestone Jul 21, 2020
@pedronis
Copy link
Collaborator

@anonymouse64 does preseeding work with this applied in the failing? or it just goes bit further? I mean does the adduser etc work then?

@anonymouse64
Copy link
Member Author

@anonymouse64 does preseeding work with this applied in the failing? or it just goes bit further? I mean does the adduser etc work then?

yes preseeding completes with this on groovy, I have not yet booted the resultant image, but preseeding looks correct, and the calls to adduser, etc. all work

@anonymouse64
Copy link
Member Author

Upstream Go issue filed: golang/go#40334

@zyga zyga self-requested a review July 22, 2020 10:51
bboozzoo
bboozzoo previously approved these changes Jul 22, 2020
Copy link
Collaborator

@bboozzoo bboozzoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// if there is a real problem finding the user then presumably other
// things will fail upon trying to create the user, etc. which will give
// more useful and specific errors
return 0, user.UnknownUserError(username)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can almost convince myself it makes sense. The lookup failed, so the user/group is effectively unknown.

I've looked at the code that may be affected and it's seems that only snapshots care about those particular kinds of errors. It does a look by uid, but AFAICT it will eventually hit nsswitch (due to getpwuid_r). If I'm reading this right, snapshotstate.Save() can actually create a snapshot where otherwise it would not create one due to errors from user.LookupId() bubbling up. I don't think it's a big problem though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the snapshot case and I agree I don't think it's a big problem, but note that the snapshot code doesn't use osutil.FindUid so it won't hit the code here. As such, I changed the code in snapshot and added this same comment there too

osutil/group.go Outdated Show resolved Hide resolved
@pedronis
Copy link
Collaborator

It seems we have a couple more places using user.Lookup* in the codebase, I think we need to check what they do with errors.

zyga
zyga previously approved these changes Jul 22, 2020
Copy link
Collaborator

@zyga zyga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, because cgo doesn't even attempt to map system level error information into anything like UnknownUserError, which is terrible.

@anonymouse64
Copy link
Member Author

It seems we have a couple more places using user.Lookup* in the codebase, I think we need to check what they do with errors.

yes as @bboozzoo points out there are some usages that suffer from the same problem in snapshot code. I will update those in this PR then (and do the requested comment change)

@anonymouse64 anonymouse64 added ⛔ Blocked Needs security review Can only be merged once security gave a :+1: and removed ⛔ Blocked labels Jul 22, 2020
@anonymouse64
Copy link
Member Author

Alright I fixed the other instances of checking for user.Unknown{Group,User}Error here, although one of the cases I would like extra eyes on to confirm that my thinking is correct, specifically around osutil.RealUser, as that inspects SUDO_USER, and while I don't think my changes result in a privilege escalation I would feel better if someone from security could take a (really quick) look at it.

Copy link
Collaborator

@zyga zyga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment about being root probably needs a bit of extra eyes. Root can be confined as well. I would not like to allow for a situation when a unprivileged snap service can somehow craft snapshot payload that writes over /root.

osutil/user.go Outdated Show resolved Hide resolved
osutil/user.go Outdated Show resolved Hide resolved
@pedronis pedronis removed this from the 2.45 milestone Jul 23, 2020
@anonymouse64 anonymouse64 removed the ⚠ Critical High-priority stuff (e.g. to fix master) label Jul 23, 2020
@anonymouse64
Copy link
Member Author

This is not Critical anymore, as CPC will be able to patch livecd-rootfs to not include systemd as a method for passwd/group lookup in nsswitch.conf in the chroot, and in that situation Go's function returns UnknownUserError correctly.

@anonymouse64 anonymouse64 added the Run nested The PR also runs tests inluded in nested suite label Jul 23, 2020
@anonymouse64
Copy link
Member Author

closing and re-opening to pick up the "run nested" label and run the nested spread tests

// > does not specify what value errno might have in this situation.
// > But that makes it impossible to recognize errors.
//
// See upstream Go issue: https://github.com/golang/go/issues/40334
u, e := userLookupId(username)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between userLookup() and userLookupId()? Is it intentional to call the one moments after the other?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

userLookup is Go's user.Lookup (saved to a variable for mocking) that takes the username i.e. setharnold as the argument, while userLookupId is Go's user.LookupId, which takes the uid as the argument, i.e. 1000.

It is intentional to call them both like this yes (but note not always, only if the first username call fails). The reason to call them both like this is that here the username variable came from user input a la snap save --users=user1,user2,1000,12345 where each of the arguments could be either uid or a username, so we aren't sure which is which.

// if there is a real problem finding the user/group then presumably
// other things will fail upon trying to create the user, etc. which
// will give more useful and specific errors
return 0, user.UnknownGroupError(groupname)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not thrilled the the user id being returned in an error case being the all-powerful account on the system. Are there any "sum types" in golang that could be used to clearly differentiate between errors and successes, that would make failing open more difficult?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems to me that there are kinda 2 points you're making here which I will respond to separately

  1. yes there is a concept of "sum types", specifically for this case of errors, there is a new system for composing different error types together and being able to inspect the full set of composed error types, which would allow us to return both the original error we got from user.Lookup here, as well as an error of the type user.UnknownGroupError that can be inspected / checked for explicitly by calling code, but I would consider that kind of change a bit invasive for this PR and I think is more a stylistic choice here rather than necessary.
  2. regarding returning the default value of 0 here, this is fairly idiomatic Go, which is just returning the default return value for a function when there is an error, however in this case it is rather unfortunate that the default value of a uint also happens to be a fairly powerful user in the success case. What we could do to avoid this and protect against ignoring the error is we could change the return value on error to something that's not idiomatic Go, and return a very large number instead of 0, or we could change the return type to be a pointer, and on error we return nil, which means that a caller that ignores the error would panic trying to use the value as it would be nil. The issue with the former is that it's unclear to me what the right value is, looking around it seems different OS's have different max uid's, and there's not an easy way to find out without adding a bunch of OS specific code we would rather not do (but probably technically cloud). The issue with the latter is that we would have to change the function signature and all usages, which for this PR is rather invasive, so I would prefer to do that in a followup PR if you think this change needs to be made. I would also have to audit all other uid returning code we have to see if there are other functions similarly affected by this bug of default return value also meaning something very powerful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this has a single +1 now but no security review +1. Would changing the return value in the error case of "0" to "-1" be sufficient so that @setharnold would do a +1 on this? Or is there more needed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 has also a special meaning for some syscalls, so not sure it's a completely better solution

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think of the choices between 0 and -1, I prefer 0, especially if that's the convention.

I would prefer to see a move to using sum types that can clearly prevent using an error return as a legitimate value but can appreciate that might be a long-term project.

Security team +1 on this. (Jamie's otherwise occupied; if you've got a specific question for him you can ask him more directly.)

Thanks

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a TODO to this file to turn the uint results at a later point to something with a distinguished default value (i.e. that can't be confused with uid/gid 0)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in #9243

@pedronis
Copy link
Collaborator

@anonymouse64 was it intentional to push the preseed fixes here?

osutil/user.go Outdated Show resolved Hide resolved
@anonymouse64
Copy link
Member Author

@anonymouse64 was it intentional to push the preseed fixes here?

it was but seeing how this grows the PR significantly, due to needing the other preseed fixes from #9053, it seems better to force push those out of existence from here and open a followup PR to this which adds those regression tests after this and 9053 is merged.

…nknown*

Unfortunately the POSIX standard does not specify what the correct error code is
to identify the case where a username or groupname was not found when calling
getpwnam_r or getgrnam_r respectively, so Go's implementation of user.Lookup and
user.LookupGroup's returning UnknownUserError and UnknownGroupError is
fundamentally broken.

To remedy that, we just treat all non-nil errors as UnknownUserError or
UnknownGroupError. This means that other code using this may suffer from less
precise error handling, but that code already will be suffering from that same
problem as it is an upstream problem with the implementation of getpwnam_r and
getgrnam_r.

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
…ling

As with osutil/user.FindUid(), we suffer from an upstream problem where error
handling surrounding positively identifying if a given user/group/id does not
exist is impossible. As such, treat all non-nil errors as the unknown case, in
this case what happens is that when creating snapshots, we may silently miss
some users if there are issues looking up their name/id, but that is better
behavior than today where we would not create any snapshot if there was an
errant /home/not-a-user/ directory and that not-a-user returned a non-0 errno
from getpwnam_r.

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
…mments

The name of this function was confusing, as it is not the real user in the sense
of real vs effective uid, but rather either the normal current user, or if we
are root via sudo method, then the user calling sudo.

Additionally, add a comment about the ineffectiveness of checking for
user.UnknownUserError here.

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
@anonymouse64 anonymouse64 force-pushed the feature/osutil-user-group-lookup-err-handling-is-hopeless branch from fee118a to 206bb3a Compare July 24, 2020 17:56
@anonymouse64 anonymouse64 dismissed stale reviews from zyga and bboozzoo July 24, 2020 17:57

PR changed significantly

@anonymouse64 anonymouse64 removed the Run nested The PR also runs tests inluded in nested suite label Jul 24, 2020
@jdstrand jdstrand self-requested a review July 24, 2020 21:11
Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

…ing-is-hopeless

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
…ing-is-hopeless

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's land this with a TODO

// if there is a real problem finding the user/group then presumably
// other things will fail upon trying to create the user, etc. which
// will give more useful and specific errors
return 0, user.UnknownGroupError(groupname)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a TODO to this file to turn the uint results at a later point to something with a distinguished default value (i.e. that can't be confused with uid/gid 0)

@mvo5
Copy link
Contributor

mvo5 commented Sep 28, 2020

Thanks, this looks fine now we will do some followup work as outlined by samuele

@mvo5 mvo5 merged commit 7ff96a4 into snapcore:master Sep 28, 2020
@anonymouse64 anonymouse64 deleted the feature/osutil-user-group-lookup-err-handling-is-hopeless branch September 28, 2020 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs security review Can only be merged once security gave a :+1:
Projects
None yet
6 participants