
is mosh actually slower than ssh on fast connections? #222

saurik opened this Issue Apr 18, 2012 · 13 comments



saurik commented Apr 18, 2012

So, I'm now realizing that the latency of keypresses while using mosh is actually slightly worse than while using ssh when you are on a good connection (30 ms ping and no packet loss). I've been having more serious mosh issues over the last few days, so it is only now, when everything finally works, that I have started doing side-by-side comparisons with ssh, and I'm realizing that maybe something is wrong here.

The mosh-induced lag is slightly noticeable on individual keypresses (with or without prediction forced with -a, btw), but it seems really striking when the screen needs to be updated. As an example: if I do a clear followed by an ls (in a folder large enough to cover most of my screen with files), when I hit enter on ssh the content very quickly streams to me, but with mosh it kind of hiccups. If you do clear; date; ls over and over again, it is a little more noticeable.

Even putting that sequence in a while true; do ... done loop, so with no interaction from the client at all, shows the general slowness in rebuilding the screen: mosh spends quite a bit longer with most of the screen blank than ssh does. In fact, ssh can refresh the screen quickly enough that the date barely ever blinks out. The result is that mosh feels "heavy" when doing elaborate things like opening a full-screen text editor. :(

Any reason this might be the case? As I stated, I tried with prediction both on and off. I am using iTerm2 on an OS X 10.6.8 client. Since this client doesn't experience issue #218, I've been doing most of this performance testing with the currently-mainline poll implementation; I have also tried the select implementation with no apparent difference, although that was before my more comprehensive tests.


keithw commented Apr 19, 2012

It's not out of the question, and we do have pretty comprehensive data on this from the real-world traces that were used in the mosh research paper.

In general, this depends on the link dynamics and the CPU power of the server. The mosh paper has lots of timing details, but in short, we are doing smoothed, delay-based frame-rate control (and we're running something with at least the computational intensity of "screen"). SSH uses TCP's window-based, loss-triggered congestion control. For very small updates (you type a character and it echoes a character), TCP is basically not restricting the flow at all, and the echoes may well be coming from the kernel.

The absolute worst case for mosh is if you run something like "clear" and hit enter. Let's say the shell or kernel echoes the "enter" after 5 ms. Then the SEND_MINDELAY timer starts (it's 15 ms), because we're trying to aggregate all the related writes together to save time. Let's say because of CPU slowness or whatever, the actual "clear" doesn't happen for 16 more ms. That means we send a frame at t=20 ms (just showing the effect of the "enter"), and THEN at 21 ms there's a new frame. But now we have to wait the full "frame rate" interval, which on a fast link is 20 ms minimum. So the frame doesn't get fired off until t=40 ms.

By contrast, SSH would probably give you that new frame at t=21 ms. So there is a 19 ms delay in this case.

If your RTT is longer (so the frame rate is slower), the penalty for missing that first frame will be even larger. There's a graph on this in the mosh paper for RTT=500 ms justifying the 15 ms figure, but YMMV -- and it does depend on whether your CPU can come up with an answer in the first 15 ms.

The mosh timing parameters are all in src/network/transportsender.h (and src/network/transportsender.cc for SEND_MINDELAY) if you want to play around with them. You can use the tools in the https://github.com/keithw/stm-data repository to get rigorous timing data of your client->server and server->client interactions.

Plus there's the more plebeian reason that mosh is running a whole machinery on par with "screen", and if you're just doing something simple, like typing one letter and seeing it echoed on a fast link, that is exactly where our approach is the most overkill (from a CPU perspective). So it might be interesting to compare mosh vs. SSH+tmux as well as mosh vs. raw SSH.


Could you have a dynamic SEND_MINDELAY that is normally 0, but temporarily increased to 15 when two changes occur within a certain interval?
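That adaptive idea could be sketched like this (purely hypothetical; the names and the burst window are made up for illustration, and nothing like this exists in mosh):

```cpp
#include <cassert>

// Hypothetical sketch only: the aggregation delay is 0 for an isolated
// change, but jumps to 15 ms once two changes arrive within BURST_WINDOW_MS
// of each other, so bursts get aggregated while lone keystrokes go out
// immediately.
struct AdaptiveMindelay {
  static const int BURST_WINDOW_MS = 15;
  int last_change_ms = -1000000;  // "long ago"

  int delay_for_change(int now_ms) {
    int delay = (now_ms - last_change_ms <= BURST_WINDOW_MS) ? 15 : 0;
    last_change_ms = now_ms;
    return delay;
  }
};
```

The trade-off is that the first change of every burst still goes out undelayed, so the very first frame can show a partial update before aggregation kicks in.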


saurik commented Apr 20, 2012

I am now doing comparisons between: 1) ssh, 2) screen->ssh->screen, 3) mosh. There is very little aggregate speed difference between 1 and 2: the only thing I notice is that 1 seems to chunk the output back to me while 2 noticeably streams it back character by character. In comparison, 3 is noticeably slower (and in some of my test cases even painfully slower), even when I recompile mosh-server with SEND_MINDELAY set to 0.

I think the thing that I am most surprised by is that even if I turn on "always predict" I still get horrible delays and lags while typing. Even if I run tmux inside of tmux inside of tmux inside of tmux (yes: with four status bars on the bottom of my screen stacking their way up) and put that entire thing inside of screen, the result is still incredibly and massively faster than mosh: why isn't "always predict" pretty much as fast as a local shell?

Other than the SEND_MINDELAY change, I am testing mosh mainline (commit 6a3ea5) with no local changes. The server is running Ubuntu Oneiric (with some random packages from Precise) and the client is Mac OS X 10.6.8 (the version of Mac OS X that most of the people I know are still using). I am using what I believe to be the latest released build of iTerm2 as my local terminal (if that is at all interesting). I have ~95 ms of RTT.


saurik commented Apr 20, 2012

In comparison, at ~140 ms RTT, such as to one of my other servers, I can tell that mosh is providing a definite benefit, especially if I turn on "always predict" mode. (edit:) Although, if I exit the screen session I was in (this was mosh->screen->irssi) and start typing at a bash prompt (so just raw mosh), I no longer get prediction (despite having -a; maybe it is not trusting its predictions anymore), and it is noticeably slower than the raw ssh connection. This server does not have SEND_MINDELAY set to 0, however, so maybe it is the 15 ms lag that is doing it (although I can't imagine that little would be so noticeable). When prediction kicks in, it is much better. Irritatingly, prediction never kicks in as I start typing a command: it only engages after I've been typing for a while (I still have not yet read the paper on the algorithm).


keithw commented Apr 20, 2012

Saurik, interesting. Here's what I get with the timing tools I sent earlier. This is just measuring simple character echoes on a netbook (a 1.5 GHz Atom N550 running Linux), going to localhost:

raw bash: 0.6 ms +/- 0.05 ms
screen: 0.8 ms +/- 0.06 ms
mosh -a (after first prediction on line is confirmed): 1.8 ms +/- 0.86 ms
screen->ssh->screen: 2.0 ms +/- 0.13 ms
four tmuxes: 3.0 ms +/- 0.26 ms
mosh (no prediction): 24 ms +/- 4.77 ms

So with predictions off, mosh potentially does take the ~20 ms penalty I mentioned. With the predictions on, mosh is slower than screen but faster than four tmuxes or screen+ssh+screen.

I don't think 24 ms is "incredibly and massively" slower (we're talking less than 1/40 sec), but if it's that painful we can certainly think about tuning this better. From the real-world data in the paper, we could reduce MINDELAY from 15 to 8 without much difficulty on long-delay links. The other one to try is SEND_INTERVAL_MIN in src/network/transportsender.h, which is currently 20 ms (50 fps maximum frame rate). If you reduce that, you will take less of a penalty for missing the frame on fast links (RTT < 40 ms). We may want to do that as well.

keithw added a commit that referenced this issue Apr 20, 2012

Adjust timing parameters in response to real-world trace data.
Also increases maximum frame rate from 50 fps to 100 fps.

Relevant to issue #222 on github.

keithw commented Apr 20, 2012

This commit improves the "mosh (no prediction)" case to 15.7 ms +/- 1.0 ms, now under 1/60 second. (We also increase the maximum frame rate from 50 fps to 100 fps.)


saurik commented Apr 20, 2012

Test case: take the string at the end of this comment and copy/paste it into a shell. (Note: when I say "ssh" or "mosh" here, I mean to a remote host; right now the RTT is 30 ms, as evening has finally arrived.)

ssh: it starts appearing immediately, and most of the content starts to appear in a few chunks
screen->ssh->screen: it starts appearing immediately, and seems to kind of stream the rest of the content
mosh: it literally just sits there outputting absolutely nothing for an entire second (seriously) and then the content appears in three fairly slow chunks

"""this is the way to go to the store this is the way to go to the store this is the way to go to the store this is the way to go to the store this is the way to go to the store this is the way to go to the store this is the way to go to the store this is the way to go to the store this is the way to go to the store this is the way to go to the store v this is the way to go to the store this is the way to go to the store this is the way to go to the store"""


saurik commented Apr 20, 2012

(I was also doing test cases that involved just typing into the terminal and watching how far behind the text was, but the copy/paste test case is the most striking, and is part of the reason I started using "really far out there" adverbs to describe the difference in performance. ;P) (My current likely-to-be-entirely-incorrect guess is that mosh copies the entire terminal state every time it accepts new information and tries to make predictions; given the experience I had last night, where even at what seemed like a reasonable frame rate it was using most of my CPU making terminal states, I figure that mosh is really inefficient at that.)


keithw commented Apr 20, 2012

A second? You're just hitting a bug or something. I can't reproduce that.

I just tested this copy-and-paste (on a Wi-Fi link that then goes 12 hops, with an RTT of 20 ms).

mosh -n [without the above commit] displayed it to the screen in two chunks, the first after 32 ms and the second after an additional 19 ms.

If it were taking 1000 ms, you better believe I'd be up in arms too.

(Copy-and-paste is not going to test mosh's predictions, because everything is arriving at the same time. We only display predictions once we have at least one prediction in that epoch confirmed -- generally that's the first letter you typed on the same line. But if all the keystrokes all come at the same time, by the time the first one is confirmed, they're probably all confirmed so prediction doesn't buy you a lot.)
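A sketch of that display rule (hypothetical types for illustration, not mosh's actual prediction engine):

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of the rule above: predictions made in an epoch stay
// hidden until at least one prediction from that epoch has been confirmed
// by a server echo; one confirmation unlocks display for the whole epoch.
struct Prediction {
  int epoch;
  bool confirmed;
};

bool display_epoch(const std::vector<Prediction> &preds, int epoch) {
  for (const auto &p : preds)
    if (p.epoch == epoch && p.confirmed)
      return true;  // one confirmed prediction unlocks the epoch
  return false;
}
```

With a paste, every keystroke lands in the same epoch at once, so by the time the first confirmation arrives the server has usually echoed everything, and prediction never gets a chance to save time.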


saurik commented Apr 20, 2012

In another terminal tab I attached gdb to mosh-client, pasted the test string into mosh, and then fast-fast-like-a-bunny switched to gdb and hit ctrl-c; I managed to catch it before it managed to output anything.

(gdb) bt
#0  0x613dbeee in std::__copy<true, std::random_access_iterator_tag>::copy<wchar_t> ()
#1  0x613dbfdb in std::__copy_aux<wchar_t const*, wchar_t*> ()
#2  0x613dc327 in std::__copy_normal<true, true>::__copy_n<__gnu_cxx::__normal_iterator<wchar_t const*, std::vector<wchar_t, std::allocator<wchar_t> > >, __gnu_cxx::__normal_iterator<wchar_t*, std::vector<wchar_t, std::allocator<wchar_t> > > > ()
#3  0x613dc3bf in std::copy<__gnu_cxx::__normal_iterator<wchar_t const*, std::vector<wchar_t, std::allocator<wchar_t> > >, __gnu_cxx::__normal_iterator<wchar_t*, std::vector<wchar_t, std::allocator<wchar_t> > > > ()
#4  0x613f2f96 in std::vector<wchar_t, std::allocator<wchar_t> >::operator= ()
#5  0x613f31fd in Terminal::Cell::operator= ()
#6  0x613de471 in std::__copy<false, std::random_access_iterator_tag>::copy<Terminal::Cell const*, Terminal::Cell*> ()
#7  0x613de51b in std::__copy_aux<Terminal::Cell const*, Terminal::Cell*> ()
#8  0x613de5c7 in std::__copy_normal<true, true>::__copy_n<__gnu_cxx::__normal_iterator<Terminal::Cell const*, std::vector<Terminal::Cell, std::allocator<Terminal::Cell> > >, __gnu_cxx::__normal_iterator<Terminal::Cell*, std::vector<Terminal::Cell, std::allocator<Terminal::Cell> > > > ()
#9  0x613de65f in std::copy<__gnu_cxx::__normal_iterator<Terminal::Cell const*, std::vector<Terminal::Cell, std::allocator<Terminal::Cell> > >, __gnu_cxx::__normal_iterator<Terminal::Cell*, std::vector<Terminal::Cell, std::allocator<Terminal::Cell> > > > ()
#10 0x614254b9 in std::vector<Terminal::Cell, std::allocator<Terminal::Cell> >::operator= ()
#11 0x614301fd in Terminal::Row::operator= ()
#12 0x613e1122 in std::__copy<false, std::random_access_iterator_tag>::copy<std::_Deque_iterator<Terminal::Row, Terminal::Row const&, Terminal::Row const*>, std::_Deque_iterator<Terminal::Row, Terminal::Row&, Terminal::Row*> > ()
#13 0x613e127f in std::__copy_aux<std::_Deque_iterator<Terminal::Row, Terminal::Row const&, Terminal::Row const*>, std::_Deque_iterator<Terminal::Row, Terminal::Row&, Terminal::Row*> > ()
#14 0x613e13a5 in std::__copy_normal<false, false>::__copy_n<std::_Deque_iterator<Terminal::Row, Terminal::Row const&, Terminal::Row const*>, std::_Deque_iterator<Terminal::Row, Terminal::Row&, Terminal::Row*> > ()
#15 0x613e14d3 in std::copy<std::_Deque_iterator<Terminal::Row, Terminal::Row const&, Terminal::Row const*>, std::_Deque_iterator<Terminal::Row, Terminal::Row&, Terminal::Row*> > ()
#16 0x613e5bbb in std::deque<Terminal::Row, std::allocator<Terminal::Row> >::operator= ()
#17 0x613e7739 in Terminal::Framebuffer::operator= ()
#18 0x613c9486 in STMClient::output_new_frame ()
#19 0x613c9eb0 in STMClient::main ()
#20 0x613c174e in main ()
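Structurally, the stack above is doing something like the following (a simplification of the types in the backtrace; the real classes live in mosh's terminal code and carry more state): assigning one Framebuffer to another deep-copies every Row, every Cell, and every Cell's vector of wide characters, and without optimization each of those templated STL copy calls stays a real, uninlined function call.

```cpp
#include <deque>
#include <vector>

// Simplified stand-ins for the types in the backtrace.
struct Cell { std::vector<wchar_t> contents; };
struct Row  { std::vector<Cell> cells; };
struct Framebuffer { std::deque<Row> rows; };

// Build a rows x cols screen; copying the result walks the same deep-copy
// chain (deque -> Row -> vector<Cell> -> vector<wchar_t>) that the stack
// shows inside STMClient::output_new_frame.
Framebuffer make_screen(int rows, int cols) {
  Framebuffer f;
  f.rows.assign(rows, Row{std::vector<Cell>(cols, Cell{{L'x'}})});
  return f;
}
```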

saurik commented Apr 20, 2012

OK, staring at that for a while I think I'm wasting your time: I think the flags I passed to configure to make mosh link correctly against protobuf and ncurses accidentally knocked out -O2. :(
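For anyone hitting the same thing: with autoconf-generated configure scripts, a user-supplied CXXFLAGS replaces the default "-g -O2" rather than adding to it, so passing only include and library paths silently drops optimization. A way to avoid and verify that (generic autoconf behavior; the paths below are made-up examples):

```shell
# User CXXFLAGS replace the default "-g -O2", so keep -O2 explicit.
./configure CXXFLAGS="-O2 -I/opt/local/include" LDFLAGS="-L/opt/local/lib"

# After configuring, confirm the flags that will actually be used:
grep '^CXXFLAGS' Makefile
```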


keithw commented Apr 20, 2012

Oh, yeah, that didn't occur to me but we do love our optimization with all this STL going on. I had a fit when homebrew wanted to override our compiler flags and compile us (as they compile everything) with -Os.

In any event, we did bring the latency on "mosh localhost" down from 24 ms to 15 ms on my netbook, and at least now we have real data about where we stand.


saurik commented Apr 20, 2012

Yup (unoptimized build; apparently fatal for mosh's performance, understandably). Sorry. :( (Although, I guess I'm somewhat glad that those timing parameters got looked at.)

@saurik saurik closed this Apr 20, 2012
