OS X has a race condition establishing your security session #221

wants to merge 1 commit into


None yet
4 participants

saurik commented Apr 18, 2012

On OS X, due to the way security sessions and bootstrap Mach contexts (I'm honestly not 100% certain which is the problem here, and my understanding of these systems is not perfect: it mostly comes from working on the new Substrate injector from a few months ago, and some work I did on absinthe) are inherited, if sshd shuts down before the login shell starts up (which both can happen and normally does happen as the parent mosh-server does an _exit immediately after a fork, causing that process to exit before the children get very far), then the resulting shell is pretty much entirely broken: you can't even figure out your own username (whoami returns numbers and bash simply prints "I have no name!"), much less talk to complex Mach services such as launchd.

The patch referenced by this pull request is obviously stupid and non-deterministic (it is just a call to sleep(2) before the _exit() after the fork() in the parent mosh-server process), but at least it demonstrates the problem and is a starting point for people looking at this problem to figure out a better way to handle it (or for those experiencing it, such as @jcftang from #218) to find temporary relief.

Obviously, however, the final fix needs to be deterministic. The problem, though, is I'm not certain how long we really need to wait; do we just need to wait for the second fork, for the second psuedo-terminal to be created, for the login shell to actually "do something"... I don't know (although my current bet is that if we coordinate such that after the second fork, which also happens to create the pseudo-terminal anyway as it is really a call to forkpty, we'll be golden).

Doing that will require some irritating synchronization primitive between the various processes (and in the worse case something like waiting for the terminal to actually bear fruit, which would require the terminal network loop to get involved). Anyone doing that should also try to verify (maybe by adding a long sleep in the child after the synchronization) that the event being waited for is actually truly sufficient (and not just probabilistically better).

(There are other solutions--ones that involve changing the way OpenSSH and mosh get bootstrapped together--that may also provide better fixes for other bugs, but those are more drastic and against some of the goals of the project developers; they are probably even pragmatically wrong anyway, as there actually are some key advantages to this OpenSSH->mosh-server craziness, although those may be short-term.)


keithw commented Apr 19, 2012

How can I avoid applying a commit titled "Fail to fix OS X security session race condition"...... :-)

I certainly don't understand the OS X issues involved here, but I can tell you we don't have these problems on a 10.7 server -- do you think they have made things easier for us, or am I just getting lucky?

We do have a special wrapper on MIT's Athena system to preserve the security session (well, to preserve the Kerberos credentials and AFS tokens), so this would not be the first time we had to do something special to preserve an OS security context, speaking very generally. On the other hand, if the problem only affects 10.6 and before (which is not at all clear obviously), I don't think we are obligated to jump through hoops for them. But I have to defer to the Mac experts here -- I'm not one.

@keithw keithw closed this Apr 19, 2012


saurik commented Apr 19, 2012

This has absolutely nothing to do with AFS, Kerberos... I'm not even certain why you are bringing these up. I also am not certain this bug does not affect OS X 10.7 (from my admittedly sketchy understanding of how this stuff works, I see no reason why it would have changed in 10.7: I certainly did not see anything in Substrate's bootstrap environment change while porting to iOS 5): it is the kind of thing where someone not seriously using the session might not even notice. For example, on your 10.7 setup, can you run "launchctl list" and verify that you get a large amount of output and process identifiers? Regardless, closing an issue like this before doing the deference to the Mac exports you alluded to is simply an incorrect priority.


keithw commented Apr 19, 2012

Hi saurik,

Yes, if I run "launchctl list", I get 30 lines of output:

PID Status Label
80882 - 0x7fa653400af0.anonymous.launchctl
80883 - 0x10c207ae0.anonymous.less
80872 - 0x10c2086f0.anonymous.mosh-server
80873 - 0x10c2083f0.anonymous.bash
80865 - 0x10c207de0.anonymous.launchproxy

  •   0       com.apple.TrustEvaluationAgent
  •   0       com.apple.netauth.user.auth
  •   0       com.apple.mdworker.prescan.0
  •   0       com.apple.mdworker.pool.framework.3
  •   0       com.apple.mdworker.pool.framework.2

I just closed the pull request -- I assume nobody (including you) thinks I should actually apply /* XXX: horrible race condition */ sleep(2); as submitted. If you want to discuss further on an issue or submit a patch that preserves the security session and Mach context, please feel free.

(If you don't agree that a security session is analogous to an AFS PAG and token or Kerberos credentials cache, all of which are normally destroyed by SSH on logout and that have to take special pains to preserve, that's your right.)


saurik commented Apr 19, 2012

I figured that a pull request is an issue: they appear under issues. If they are semantically separate then I'm apparently using GitHub horrible incorrectly. As pull requests track branches, not commits, as far as I know people normally use them as discussions around concrete code related to features. This is why they appear as issues with "code attached": the branch can then be updated under the pull request and people can continue reviewing the idea. Hence why I purposely used a pull request on a really rough "this is obviously wrong" patch concept, in order to allow it to be organized in the issue tracker.


quentinmit commented Apr 19, 2012

Keith, this issue is real. I've observed it myself. You shouldn't close the pull request just because the current code isn't mergeable - we can discuss it here on this issue, and eventually we'll have something to merge.


keithw commented Apr 19, 2012

I don't doubt the issue is real -- I'm just not going to apply the pull request that was submitted (I don't think anybody expected otherwise?). If you still want to discuss here or change the commits, be my guest.

@keithw keithw reopened this Apr 19, 2012


kmcallister commented Apr 19, 2012

Hence why I purposely used a pull request on a really rough "this is obviously wrong" patch concept, in order to allow it to be organized in the issue tracker.

Well, it says

saurik wants someone to merge 1 commit into keithw:master from saurik:saurik-synchronized-exit

at the top, and that's plainly not the case.

I would rather reserve pull requests for code which is plausibly ready to be merged. That way the distinction between "issue" and "pull request" is useful to the people who actually do the pulling. Of course, you can create a branch in your repo and link to it from Markdown code in an issue comment. That's what I've done for my android branch and several others in the past.


kmcallister commented Apr 29, 2012

Closing this, as discussion has run down, and I don't want it sitting in the pull request queue forever. I've opened #249 to keep track of this issue.

It's unfortunate that I apparently can't turn a pull request into a "regular" issue. I've already sent GitHub an email complaining about this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment