Launching grains is slow #2975

Open
zenhack opened this issue Aug 19, 2017 · 7 comments
Labels
app-platform

Comments

@zenhack
Contributor

zenhack commented Aug 19, 2017

This was just asked about on IRC, and I noticed it before too: launching grains is unreasonably slow. For concrete numbers: my janky todo app (https://github.com/zenhack/yata), when run outside of the sandbox, starts instantaneously. In contrast, clicking on it in the grain list, while sitting about 15 feet from the server on my laptop (on wifi), takes 2-3 seconds before the UI appears. I remember there being some discussion about this on the mailing list wrt davros way back:

https://groups.google.com/forum/#!searchin/sandstorm-dev/davros$20startup|sort:relevance/sandstorm-dev/-mncsxPR7Rg/o3DHo_ynAgAJ

At the time we were working under the assumption that davros was at fault, but I suspect that is not the case, given that startup times outside the sandbox are dramatically faster. In the case of the app I linked above, it's basically just opening a sqlite database and then listening on a port; this takes almost no time outside the sandbox, and it seems unreasonable that it should take seconds within.

I've also noticed that it does get worse on worse internet connections, I think disproportionately to the decrease in overall network performance (but I'd have to do more careful measurements to be sure).

For reference, here is the discussion on IRC:

https://botbot.me/freenode/sandstorm/2017-08-18/?msg=90006029&page=1

@amenonsen

amenonsen commented Aug 19, 2017

I also see the same problem on my new sandstorm installation. I would be happy to provide any diagnostic information that could help understand the problem.

@kentonv
Member

kentonv commented Aug 26, 2017

In an ad hoc test I did just now, a new YATA instance starts up in ~1.7 seconds on Oasis and ~0.5 seconds running locally. I can't seem to reproduce the issue @zenhack describes.


That said, it's definitely true that many Sandstorm apps start up pretty slowly. But this isn't because the server is running any slower. Sandstorm's approach to sandboxing has almost zero overhead in terms of server performance.

Typically, the problem is some combination of:

  1. App servers or server frameworks that are not designed with fine-grained containerization in mind, and so historically haven't had any pressure to optimize startup times. Etherpad, for example, is pretty slow to start for no really good reason -- it's just doing a bunch of stuff it shouldn't need to. Rocket.Chat is ridiculously slow, taking some 30-40 seconds before it will respond to a request. The way I'd like to solve this (other than optimizing every app individually) is with a checkpoint-restore approach based on snappy-start. But, that's a major project for which I have not yet had time.
  2. Very large client asset bundles, which have to load fresh for every grain. Currently there's no mechanism by which static assets can be cached and reused across grains, so the client has to reload them every time. Worse, the bundle can't even start downloading until the grain server has started, so this stacks with the first problem. I'd like to develop a mechanism by which apps can specify static assets that are served directly by Sandstorm in a way that can safely be cached across grains and can load in parallel with the grain server. This is a comparatively smaller project than snappy-start on the Sandstorm end, but every app will likely have to be updated to integrate with any such system.

@zenhack
Contributor Author

zenhack commented Aug 27, 2017

@kentonv
Member

kentonv commented Aug 27, 2017

My previous measurements were based on holding a stopwatch, so they accounted for human cognition delay.

If I look just at the Chrome devtools network panel, I get time-to-first-byte of ~230ms for a new YATA grain, ~130ms for an existing grain. The latter seems to be independent of whether the grain was already running.

On Oasis, I'm seeing an existing grain TTFB is 450ms, and a new grain is 625ms (assuming the app is cached on the workers -- pulling from cold storage can add a second or two). About 200ms-300ms of this is DNS + TCP + TLS for the newly-created subdomain. Meanwhile there are three other network round-trips needed on grain load, and my RTT to Oasis is 60ms. So in this case the time is almost entirely explained by network round trips. Conceivably we could find ways to eliminate a round trip or two.
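
As a rough check of that arithmetic, here is a small sketch that uses only the figures quoted above. Treating the three round trips as strictly sequential and simply adding them to the connection setup time is an assumption made for illustration, not a measured breakdown.

```python
# Back-of-the-envelope latency budget from the numbers quoted in this comment.
RTT_MS = 60                        # round-trip time to Oasis
EXTRA_ROUND_TRIPS = 3              # other round trips needed on grain load
CONNECTION_SETUP_MS = (200, 300)   # DNS + TCP + TLS for the new subdomain (low, high)
OBSERVED_TTFB_MS = 450             # existing grain on Oasis


def network_budget_ms(setup_ms: int, round_trips: int, rtt_ms: int) -> int:
    """Time attributable purely to the network, assuming sequential round trips."""
    return setup_ms + round_trips * rtt_ms


low = network_budget_ms(CONNECTION_SETUP_MS[0], EXTRA_ROUND_TRIPS, RTT_MS)
high = network_budget_ms(CONNECTION_SETUP_MS[1], EXTRA_ROUND_TRIPS, RTT_MS)
print(f"network-only estimate: {low}-{high} ms vs. observed {OBSERVED_TTFB_MS} ms")
```

The estimate brackets the observed 450ms for an existing grain, which is consistent with the claim that the time is almost entirely network round trips.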

Now I think I have an explanation for your observations on your local server: where is your DNS? My guess is that when you're seeing a multi-second startup time for YATA, it's almost entirely DNS lookup time, and your DNS is remote. I have a local DNS server for my local Sandstorm instance so it's roughly instantaneous for me.

In any case, I think we might be looking at maybe 100ms of Sandstorm bookkeeping overhead (maybe some Mongo queries, etc.), which parallelizes with three network round trips. We could probably reduce either of those numbers a bit with some optimizations. But I don't think this is the real problem with Sandstorm app startup times. If every app started as fast as YATA I think everyone would be very happy. The real problem is the multi-second startups of more bloated apps.

@zenhack
Contributor Author

zenhack commented Aug 28, 2017

The dns issue had occurred to me; setting up a local one and comparing is on my todo list. I'm using sandcats for dns. dig tells me I'm getting response times of < 50 ms, so I'm skeptical, but I'll sit down and test soonish.
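
For anyone wanting to reproduce the comparison without dig, a minimal sketch of timing lookups through the system resolver might look like the following. The function name `time_lookups` and the injectable `resolve` parameter are made up for this illustration; the default goes through `socket.getaddrinfo`, so it measures what an application on the machine would see (including any local cache), not the authoritative server alone.

```python
import socket
import time
from typing import Callable, Dict, List


def _system_resolve(host: str) -> object:
    # Resolve through the OS resolver, as a browser-like client would.
    return socket.getaddrinfo(host, 443)


def time_lookups(hostnames: List[str],
                 resolve: Callable[[str], object] = _system_resolve) -> Dict[str, float]:
    """Return the wall-clock milliseconds spent resolving each hostname."""
    timings: Dict[str, float] = {}
    for host in hostnames:
        start = time.perf_counter()
        try:
            resolve(host)
        except OSError:
            # A failed lookup (socket.gaierror is an OSError) still costs
            # time, so its duration is recorded as well.
            pass
        timings[host] = (time.perf_counter() - start) * 1000.0
    return timings
```

Calling `time_lookups(["some-fresh-label.example.sandcats.io"])` with a hostname that has never been looked up before (the name here is a placeholder) approximates what a newly created session subdomain costs; repeating the same name mostly measures the cache.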

@zenhack
Contributor Author

zenhack commented Aug 28, 2017

Okay, yeah, setting up dnsmasq on my machine and having it handle requests for the sandstorm box's domain speeds things up substantially. 200ms for the local system, around 1 second for the machine in the other room (measured via the firefox dev tools). The latter still seems longer than it ought to be given that we're talking about wifi to a machine in the next room, but it's at least well within the not-annoying range.

The motivator for this though is actually my phone on LTE, which takes much longer, even when the signal is such that loading times for e.g. zenhack.net are still imperceptible. I can't conveniently set up a custom DNS resolver on my phone to handle things locally, (a) because sandcats dns is dynamic, so it would break when my IP changed, and (b) just because doing that on a phone is a bit annoying (though I could figure something out if it were critical).

A few seconds on top of sandstorm itself is enough to make me think twice about bothering to open up my phone to jot down a todo item (half the reason I wrote YATA was that simple TODOs was even worse, and it is a big improvement).

It occurs to me that using per-session per-grain domains is going to defeat DNS caching, at least if we're just responding per-domain. I have heard that there are some significant problems re: compatibility with wildcard domains, but I don't know just how bad they are/how widely supported they are anyway. One thought is to get sandcats to supply a wildcard record to the DNS client.

@kentonv
Member

kentonv commented Aug 29, 2017

AIUI, there's actually no such thing as a "wildcard record" in the DNS protocol. Rather, configuring a wildcard causes the server to respond to all matching requests in the same way. It's entirely up to the server to implement the matching.
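
To illustrate the point, here is a deliberately simplified sketch of server-side wildcard matching. The zone contents, the hostnames, and the `resolve` function are all hypothetical, and real servers apply more rules than this. What it shows is that the answer is always issued for the exact name that was queried, so a client has nothing wildcard-shaped to cache.

```python
from typing import Dict, Optional


def resolve(name: str, zone: Dict[str, str]) -> Optional[str]:
    """Answer a query from `zone`, falling back to wildcard entries.

    Simplified: an exact entry wins; otherwise the closest enclosing
    "*.<suffix>" entry is used. The caller only ever learns the address
    for the name it asked about.
    """
    name = name.lower().rstrip(".")
    if name in zone:
        return zone[name]
    labels = name.split(".")
    # Try "*.<suffix>" for progressively shorter suffixes of the query name.
    for i in range(1, len(labels)):
        candidate = "*." + ".".join(labels[i:])
        if candidate in zone:
            return zone[candidate]
    return None


# Hypothetical zone: one host plus a wildcard covering its subdomains.
zone = {
    "example.sandcats.io": "203.0.113.7",
    "*.example.sandcats.io": "203.0.113.7",
}
print(resolve("ui-abc123.example.sandcats.io", zone))
print(resolve("ui-def456.example.sandcats.io", zone))
```

Both lookups get the same address, but each is a separate query as far as the client's cache is concerned.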

There IS such a thing as wildcard TLS certificates, but that's a different matter.

We could "fix" the slow-DNS problem by "pre-allocating" hosts: When you open Sandstorm, it could randomly-generate some hostnames client-side and fire off dummy requests to them, to force DNS lookup and even TLS negotiation to complete. Then upon opening a session, the client could request that the server assign a particular hostname. I think it's fine, security-wise, to allow this -- a client who chooses a non-random hostname would only be hurting themselves.
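
A rough sketch of the pre-allocation idea follows. In practice this would run in the browser; Python is used here only to show the shape of it. The function names, the `ui-` prefix, and the injectable `resolve` parameter are assumptions for the example, not Sandstorm's actual API. It also only warms DNS; warming TLS would additionally need a real request to each host.

```python
import secrets
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def generate_hostnames(base_domain: str, count: int, prefix: str = "ui-") -> List[str]:
    """Generate unguessable candidate session hostnames under `base_domain`."""
    return [f"{prefix}{secrets.token_hex(16)}.{base_domain}" for _ in range(count)]


def _system_resolve(host: str) -> object:
    return socket.getaddrinfo(host, 443)


def prewarm(hostnames: List[str],
            resolve: Callable[[str], object] = _system_resolve) -> List[str]:
    """Resolve all hostnames in parallel; return the ones that resolved."""
    def try_resolve(host: str) -> bool:
        try:
            resolve(host)
            return True
        except OSError:
            return False

    # max(1, ...) keeps the executor valid even for an empty list.
    with ThreadPoolExecutor(max_workers=max(1, len(hostnames))) as pool:
        results = list(pool.map(try_resolve, hostnames))
    return [h for h, ok in zip(hostnames, results) if ok]
```

On opening a session, the client would take one hostname from the list returned by `prewarm` and ask the server to assign it, so the lookup has already been paid for by the time the grain loads.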

@ocdtrekkie ocdtrekkie added the app-platform label Feb 12, 2020

4 participants