mosh doesn't work on PowerPC OS X 10.5 (Leopard) #479

Closed
gordon-morehouse opened this Issue Dec 16, 2013 · 12 comments

@gordon-morehouse

Built 1.2.4 from source after trying Tigerbrew's 1.2.4. Both seem to connect and then instantly exit, saying

[mosh is exiting.]

after a couple of line feeds. I tried this against a couple of different mosh servers, x86 and ARM, both running Debian or something close enough.

@andersk
Member
andersk commented Dec 16, 2013

If you apply #452 on the client side, you might get a more informative error message.

@andersk
Member
andersk commented Dec 16, 2013

Also, this might be related to #424? (I’m not a Mac user.)

@gordon-morehouse

I'll happily try it - it may take me a day or two.

@gordon-morehouse

I manually applied #452 to 1.2.4 as downloaded by Tigerbrew using 'brew install --interactive'. There's still no useful error output after switching to alternate screen and back; it just does that and prints "[mosh is exiting.]" after a couple line feeds as before.

@gordon-morehouse

What can I do to help get this working on PowerPC? Now that I'm used to mosh, its lack is very limiting on my PPC laptop.

@keithw
Member
keithw commented Feb 3, 2014

What happens when you run mosh-server and mosh-client separately (as described at http://mosh.mit.edu)?

@gordon-morehouse

Okay, I've installed mosh again on my G4 with Tigerbrew (see mistydemeo/tigerbrew#87 for related ticket). This installs 1.2.4, downloading the tarball straight from mosh.mit.edu. This is on OS X 10.5.8 running on a PPC 7450 (1.25GHz G4). Compilation was mostly accomplished with 'cc1plus' - just watched 'top' as it was building. Here's the build output, if it's useful:

$ brew install mosh
==> Downloading http://mosh.mit.edu/mosh-1.2.4.tar.gz
Already downloaded: /Library/Caches/Homebrew/mobile-shell-1.2.4.tar.gz
==> ./configure --prefix=/usr/local/Cellar/mobile-shell/1.2.4 --enable-completio
==> make install
==> Caveats
Bash completion has been installed to:
  /usr/local/etc/bash_completion.d
==> Summary
/usr/local/Cellar/mobile-shell/1.2.4: 13 files, 1.6M, built in 4.7 minutes

So, I run mosh-server on the remote host:

$ mosh-server

MOSH CONNECT 60024 6FsVVXcwd4wSMJRAQuAonQ

mosh-server (mosh 1.2.3)
Copyright 2012 Keith Winstein <mosh-devel@mit.edu>
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

[mosh-server detached, pid = 3546]
$

And then I run mosh-client on the G4, and it exits in probably less than half a second with an interesting message in the top-of-screen bar (more on that in a sec). Here's how I executed it (IP redacted):

$ MOSH_KEY=6FsVVXcwd4wSMJRAQuAonQ mosh-client 111.222.111.222 60024

mosh did not make a successful connection to 111.222.111.222:60024.
Please verify that UDP port 60024 is not firewalled and can reach the server.

(By default, mosh uses a UDP port between 60000 and 61000. The -p option
selects a specific UDP port number.)

[mosh is exiting.]

First order of business: I'm connecting from my home here with the G4. All other machines using mosh - OS X Intel, Linux, and Linux on Raspberry Pi - have never had any problems connecting to this remote host through my NAT, and have been doing so for months, multiple times a day. This problem also occurs on the G4 on friends' networks, coffee shops, and so on, so I'm fairly confident that it's not a networking problem external to the G4. I've also never observed these symptoms on anything but the G4.

Now to the more interesting bits. First, when I attempted to connect with 'mosh-client' and it rapidly exited, I noticed a flash of the top-of-screen bar, so I ran it multiple times so that I'd have enough time to read it. The bar read something like the following, but with the "without contact" time landing somewhere between about 20:00 and 55:00, seemingly at random, on every execution.

mosh: Timed out waiting for server... (29:51 without contact)

Finally, when I looked for PID 3546 on the remote host, it did not exist.

I hope this is useful! Let me know if there are more steps I can take.

@gordon-morehouse

Building with GCC 4.8 (built from Tigerbrew) does not solve this problem on PPC 7450 ("G4e") running 10.5.8.

@gordon-morehouse

This is broken the same way on OS X 10.5.8 on a G5 (PPC 970), so it's not just confined to PPC 7450.

@mistydemeo

I can repro on a PowerBook G4 with OS X 10.5.8. I'll try building on Intel OS X 10.5.8 to see if it occurs there as well.

@mistydemeo

I tested on an Intel Leopard machine, and a command that fails on PowerPC succeeds there. I suspect there may be an endianness issue here.

@mistydemeo

So it turns out this is a simple endian issue, in the timestamp code - thanks to @keithw for suggesting it was the timestamp.

The timestamp code has a Darwin-specific code path that uses mach_absolute_time(). While that's known to have somewhat different behaviour between PPC and Intel, that turned out to be a red herring - it was something much simpler.

mach_absolute_time() returns a uint64_t value that's an "absolute time unit" which is not a second; to convert into seconds, a mach_timebase_info_data_t struct is provided with numerator and denominator members that can be used to calculate the value in a fraction of seconds. mosh converts to milliseconds like so:

millis_cache = ((mach_absolute_time() * s_timebase_info.numer) / (1000000 * s_timebase_info.denom));

The division with a different integer type produces a wrong result on PowerPC, which is big-endian, and that's the cause of the hugely wrong timestamps @gordon-morehouse was seeing. Creating a uint64_t 1000000 and dividing by that produces correct results.

That said, it seems unnecessary to have Darwin-specific timestamp code given that gettimeofday() exists and works as expected on, AFAIK, every version of Darwin going way way back.
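
For illustration, a minimal sketch of what a gettimeofday()-based millisecond timestamp could look like. This is not the code from mosh or from any of the commits referenced here; the function names timeval_to_millis() and wallclock_millis() are hypothetical. Note that gettimeofday() reports wall-clock time, so the value can jump if the system clock is adjusted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical helper: convert a struct timeval to whole milliseconds.
   All arithmetic is done in 64 bits so nothing is truncated. */
static uint64_t timeval_to_millis(struct timeval tv)
{
    assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);
    return (uint64_t)tv.tv_sec * 1000u + (uint64_t)tv.tv_usec / 1000u;
}

/* Hypothetical helper: wall-clock milliseconds since the Unix epoch.
   Returns 0 if gettimeofday() reports an error. */
static uint64_t wallclock_millis(void)
{
    struct timeval tv;
    if (gettimeofday(&tv, NULL) != 0) {
        return 0;
    }
    return timeval_to_millis(tv);
}
```

The conversion is split out from the system call so that it can be checked with fixed inputs, independent of the current time.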

@mistydemeo mistydemeo added a commit to mistydemeo/mosh that referenced this issue Apr 18, 2014
@mistydemeo mistydemeo Timestamp: remove Darwin-specific code
freeze_timestamp() was previously using the mach_absolute_time()
function on Darwin to determine time. This isn't really necessary,
since Darwin also supports gettimeofday(). (mach_absolute_time() also
introduced a minor endian bug that caused implausible timestamps
on PowerPC, which is easily fixed but doesn't happen with
gettimeofday() anyway.)

Fixes #479.
0abd05b
@mistydemeo mistydemeo added a commit to mistydemeo/mosh that referenced this issue Apr 18, 2014
@mistydemeo mistydemeo Timestamp: fix endian bug in freeze_timestamp()
Multiplying a uint64_t by an int was producing wrong results on big-
endian architectures (e.g. PowerPC). This resulted in implausible
timestamps that caused mosh to exit instantly on starting up.

Fixes #479.
460f7ed
@mistydemeo mistydemeo added a commit to mistydemeo/mosh that referenced this issue Apr 18, 2014
@mistydemeo mistydemeo Timestamp: fix integer overflow in freeze_timestamp()
When converting the value of mach_absolute_time() into milliseconds,
multiplying a uint64_t by an int was producing wrong results on big-
endian architectures (e.g. PowerPC) due to the larger value of
s_timebase_info.denom on that platform. This resulted in implausible
timestamps that caused mosh to exit instantly on starting up.

Fixes #479.
864f33a
@andersk andersk added a commit to andersk/mosh that referenced this issue Apr 18, 2014
@andersk andersk Timestamp: Prevent integer overflow on Darwin PPC 32-bit
A Darwin PPC 32-bit user observes huge values numer == 1000000000 and
denom == 18431683 returned from mach_timebase_info().  For these
values, mach_absolute_time() * numer overflows uint64_t every 1000.82
seconds, and 1000000 * denom always overflows uint32_t, with the
effect of making time run backwards at -11190660 times its usual
speed.

This bug was masked on Darwin x86 64-bit, where numer == denom == 1.

Fix it by doing the conversion with double arithmetic instead.

Closes #479.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
9ee4577
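
The overflow described in this commit message can be reproduced without a Mac. The sketch below takes numer and denom as plain parameters instead of calling mach_timebase_info(), and compares the integer expression quoted earlier in the thread with a conversion done in double arithmetic. The function names are hypothetical, and the double version only illustrates the idea; it is not necessarily the code that was committed.

```c
#include <assert.h>
#include <stdint.h>

/* Integer conversion shaped like the expression quoted in the thread:
   (ticks * numer) / (1000000 * denom), with 32-bit numer and denom. */
static uint64_t ticks_to_millis_int(uint64_t ticks, uint32_t numer, uint32_t denom)
{
    /* 32-bit multiply: wraps modulo 2^32 once denom exceeds 4294. */
    uint32_t divisor = (uint32_t)(1000000u * denom);
    assert(divisor != 0);
    /* 64-bit multiply: wraps modulo 2^64 once ticks * numer gets large. */
    return (ticks * numer) / divisor;
}

/* Same conversion carried out in double arithmetic, so neither
   intermediate product can wrap around. */
static uint64_t ticks_to_millis_double(uint64_t ticks, uint32_t numer, uint32_t denom)
{
    assert(denom != 0);
    return (uint64_t)((double)ticks * numer / denom / 1e6);
}
```

With the values reported above (numer == 1000000000, denom == 18431683), one second is 18431683 ticks. The 32-bit product 1000000 * denom wraps to 1978332864, so ticks_to_millis_int() returns a value far from 1000 for that input, while ticks_to_millis_double() returns 1000. With numer == denom == 1 the two versions agree, which is consistent with the bug going unnoticed on x86.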
@andersk andersk added a commit to andersk/mosh that referenced this issue Apr 18, 2014
@andersk andersk Timestamp: Prevent integer overflow on Darwin PPC 32-bit
ba47f9f
@andersk andersk added a commit that closed this issue May 26, 2015
@andersk andersk Timestamp: Prevent integer overflow on Darwin PPC 32-bit
e52d22b
@andersk andersk closed this in e52d22b May 26, 2015