Make a real async API instead of hacking rack #5

aq1018 opened this Issue Jun 24, 2011 · 10 comments


4 participants

aq1018 commented Jun 24, 2011

Here are a few issues I found with the current approach to a Ruby async web stack (or the lack of one), including this gem:

  • Rack is not designed with async in mind.
  • An awesome hack is still a hack, not a solution.
  • It confuses developers: some middlewares are async, some are not.
  • throw :async scares me.
  • Various bugs and unexpected behaviors make development a pain.
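For context on the `throw :async` complaint: with Thin-style async support, an app that wants to answer later throws `:async` (or returns a `-1` status), the server catches it, and the app later hands over the real response through `env['async.callback']`. Below is a minimal pure-Ruby simulation of that convention, not Thin's actual code; `FakeAsyncServer`, `run_pending`, and the `sim.defer` env key (standing in for an event loop) are hypothetical.

```ruby
# Hypothetical simulation of the Thin-style `throw :async` convention.
# NOT Thin's implementation: `sim.defer` and #run_pending stand in for a reactor.
class FakeAsyncServer
  attr_reader :responses

  def initialize(app)
    @app = app
    @responses = []
    @pending = [] # work queued to run "later"
  end

  def handle(env)
    # An async app delivers its real response through this callback.
    env['async.callback'] = ->(response) { @responses << response }
    env['sim.defer'] = ->(&work) { @pending << work }

    # catch returns the app's return value, or nil if the app threw :async.
    result = catch(:async) { @app.call(env) }
    # A status of -1 is the other way of saying "the response comes later".
    @responses << result if result.is_a?(Array) && result.first != -1
  end

  # Stand-in for the event loop getting around to queued work.
  def run_pending
    @pending.shift.call until @pending.empty?
  end
end

SYNC_APP = ->(_env) { [200, { 'content-type' => 'text/plain' }, ['sync']] }

ASYNC_APP = lambda do |env|
  env['sim.defer'].call do
    env['async.callback'].call([200, { 'content-type' => 'text/plain' }, ['async']])
  end
  throw :async # unwinds straight back to the server's catch
end

server = FakeAsyncServer.new(ASYNC_APP)
server.handle({})
puts server.responses.size # nothing yet: the app only promised a response
server.run_pending
puts server.responses.first[2].inspect
```

Any middleware wrapped around `ASYNC_APP` is unwound by the `throw` and never sees the final response, which is one concrete reason some middlewares work with async apps and others don't.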

Why we need a separate rack-like API for async web apps:

  • It can be designed to be async from the ground up.
  • No hacks.
  • It allows future async web frameworks and async app servers to integrate and grow.
  • It is a very important piece of any web stack.
  • It leads to a friendlier development experience.

P.S. Sorry for putting this here as an issue. I couldn't find any discussion about this possibility anywhere, so I'm trying to gauge other people's opinions on this idea.


rkh commented Jun 24, 2011

Agreed. This project never aimed to be more than a hack.


raggi commented Jun 25, 2011

Also, I kinda have to bzzzt at some of the points: you only really want to specialize the portions of your infrastructure where there are bottlenecks. Sync is perfectly suitable for most web apps and frameworks.

The thing is, the lack of uptake or interest from parties with real use cases has significantly lowered the priority of actually executing this, given what time I have.

Middleware chains can't work the same way in async systems (I've built a few), so a "Rack-like API" is kind of misleading. The best approach is actually to have process/state-oriented actors in a two-way flow chain. That being said, a generic framework along these lines tends not to be suitable for the /really valid/ use cases, and as such you often have to specialize to reach that level of performance.
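raggi doesn't spell out the design, so the following is only one loose reading of a "two-way flow chain": each stage handles the request on the way in and the response on the way out as separate events, and the endpoint may answer long after `dispatch` has returned, so no stage keeps a call stack open while waiting. `Stage`, `Chain`, `inbound`, `outbound`, and `dispatch` are hypothetical names, not from any library.

```ruby
# Hypothetical two-way flow chain: requests flow in through the stages, and the
# response flows back out through them as a separate event, possibly much later.
class Stage
  def initialize(name, log)
    @name = name
    @log = log
  end

  def inbound(request, forward)
    @log << "#{@name} in"
    forward.call(request)
  end

  def outbound(response, backward)
    @log << "#{@name} out"
    backward.call(response)
  end
end

class Chain
  # endpoint: callable taking (request, respond); it may call respond later.
  def initialize(stages, endpoint)
    @stages = stages
    @endpoint = endpoint
  end

  # `done` receives the response after it has passed back through every stage.
  def dispatch(request, done)
    step_in(0, request, done)
  end

  private

  def step_in(i, request, done)
    if i == @stages.size
      @endpoint.call(request, ->(response) { step_out(i - 1, response, done) })
    else
      @stages[i].inbound(request, ->(req) { step_in(i + 1, req, done) })
    end
  end

  def step_out(i, response, done)
    return done.call(response) if i.negative?

    @stages[i].outbound(response, ->(res) { step_out(i - 1, res, done) })
  end
end

LOG = []
LATER = [] # stands in for I/O completing on a later reactor tick
endpoint = ->(req, respond) { LATER << -> { respond.call("hello #{req}") } }
chain = Chain.new([Stage.new('outer', LOG), Stage.new('inner', LOG)], endpoint)

RESULT = []
chain.dispatch('bob', ->(res) { RESULT << res })
# dispatch has returned, but no response exists yet
LATER.each(&:call)
p LOG, RESULT
```

The difference from a Rack chain is that `Stage#inbound` returns as soon as it has forwarded the request; the response leg is driven by whoever eventually calls `respond`.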

If you don't need that level of performance, then the hack works just fine through something like async-sinatra, for the time being. That is, it's suitable for proofs of concept and for simple (and particular) long-running or IO-heavy operations.

Yes, it would be ideal to have something in the long run. I have to strongly recommend against the Goliath route, though, and it seems not many people really know how to build a generic approach for this.

aq1018 commented Jun 25, 2011

I have a rather special use case here. I'm working on a service API that is basically a middleman for all the internal APIs in my company. When a request comes in, it talks to a bunch of internal APIs and aggregates the data. Our stats indicate that it spends about 80% of its time waiting on other requests. After some simulations and benchmarks, we were convinced that something built on the reactor pattern, such as Node.js or EventMachine, would work well in this problem domain. Although I prefer Node.js, my team members are all Ruby guys, so Node.js is simply not an option.
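The workload described here (each request fans out to several internal APIs and mostly waits) is where a reactor pays off: the waits overlap, so total time tracks the slowest backend rather than the sum. A minimal sketch, assuming a hypothetical `MiniReactor` driven by a virtual clock so it runs without EventMachine or a network; `aggregate` and the backend names and latencies are invented for illustration.

```ruby
# Hypothetical single-threaded reactor with a *virtual* clock: instead of
# sleeping, it jumps time forward to the next due timer. No real I/O happens.
class MiniReactor
  attr_reader :now

  def initialize
    @now = 0
    @timers = [] # entries: [due_time, sequence, block]
    @seq = 0
  end

  def add_timer(delay, &blk)
    @seq += 1
    @timers << [@now + delay, @seq, blk]
  end

  def run
    until @timers.empty?
      @timers.sort_by! { |due, seq, _blk| [due, seq] }
      due, _seq, blk = @timers.shift
      @now = due
      blk.call
    end
  end
end

# Fan out to every backend at once and call `done` once all have replied.
# Each timer stands in for a non-blocking request whose reply takes `latency` ms.
def aggregate(reactor, backends, &done)
  results = {}
  backends.each do |name, latency|
    reactor.add_timer(latency) do
      results[name] = "#{name} data"
      done.call(results) if results.size == backends.size
    end
  end
end

reactor = MiniReactor.new
AGGREGATED = []
aggregate(reactor, { 'users' => 30, 'orders' => 50, 'billing' => 20 }) { |r| AGGREGATED << r }
reactor.run
puts reactor.now # the slowest backend's latency, not the sum of all three
```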

So I would say there is definitely a real need for a fully asynchronous stack. However, the lack of mature libraries (other than EventMachine) scares off 'enterprise people'. So it is really a chicken-and-egg problem.

Regarding the async API, it is only "Rack-like" in purpose, that is, binding web frameworks and app servers together. That's all. I guess it will have both request and response as deferrable objects, used like, response), where @app can be executed in a callback block inside the middleware.
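A minimal sketch of the request/response-as-deferrables idea, assuming a tiny stand-in `MiniDeferrable` with the same `callback`/`errback`/`succeed`/`fail` shape as `EventMachine::Deferrable` (so it runs without EventMachine; its resolve-once behavior is this stand-in's own choice). `HeaderMiddleware` and the `x-middleware` header are hypothetical: the middleware runs `@app` inside a callback on the request, then relays the app's response outward.

```ruby
# Tiny stand-in for EventMachine::Deferrable; not the real library.
# It resolves at most once, either succeeded or failed.
class MiniDeferrable
  def initialize
    @callbacks = []
    @errbacks = []
    @status = nil
    @args = []
  end

  def callback(&blk)
    if @status == :succeeded
      blk.call(*@args)
    else
      @callbacks << blk
    end
    self
  end

  def errback(&blk)
    if @status == :failed
      blk.call(*@args)
    else
      @errbacks << blk
    end
    self
  end

  def succeed(*args)
    resolve(:succeeded, args, @callbacks)
  end

  def fail(*args)
    resolve(:failed, args, @errbacks)
  end

  private

  def resolve(status, args, blocks)
    return if @status # already resolved: later calls are ignored

    @status = status
    @args = args
    blocks.each { |blk| blk.call(*args) }
  end
end

# Hypothetical middleware for the proposed API.
class HeaderMiddleware
  def initialize(app)
    @app = app
  end

  def call(request, response)
    inner = MiniDeferrable.new
    # Relay the app's eventual response outward, adding a header on the way.
    inner.callback do |status, headers, body|
      response.succeed(status, headers.merge('x-middleware' => 'yes'), body)
    end
    inner.errback { |error| response.fail(error) }
    # Run the app only once the request (e.g. its body) is complete.
    request.callback { |req| @app.call(req, inner) }
  end
end

APP = HeaderMiddleware.new(->(req, res) { res.succeed(200, {}, ["hi #{req['name']}"]) })

request = MiniDeferrable.new
response = MiniDeferrable.new
GOT = []
response.callback { |status, headers, body| GOT << [status, headers, body] }

APP.call(request, response)      # returns immediately; the request isn't complete yet
request.succeed('name' => 'bob') # the server finishes reading the request
p GOT.first
```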

To be honest, I'm perfectly fine with your gem or async-sinatra. However, my company doesn't feel secure using them because 1) it's a hack, and all hacks are bad, and 2) this service is the core part of the company infrastructure, and it just can't go down, ever!

This is basically why I'm pushing for a new API. In the long run I'm aiming to sell this async pattern to my company, and I don't mind starting to hack on this new API. In fact, I just finished reading the Thin source code and got a general feel for how app servers integrate with Rack. I'm also looking at Node.js and Connect to see how they implemented this. And I'm actually planning on writing one using my 20% time. :)

I looked at Goliath as well. It really scared me how they hide all the callbacks behind Ruby fibers. Unless I'm intimately familiar with the internals, I won't be able to tell what is async and what is not, or what the implications would be as more and more abstractions are layered on top. I think this is a bit too much magic for my taste. Amazingly, Goliath seems to be getting a lot of attention. I just hope the future Ruby async route is not one of hiding it.

Lastly, I will put something up for the new API on github next Friday or so.

Oh... do you have a name for this puppy? (Maybe puppy? Or pipe? Just some random ideas...)

aq1018 commented Jun 25, 2011

Maybe call it A3 -- Asynchronous Application API


raggi commented Jun 25, 2011

Passing single-termination machines (deferrables) as request/response objects is OK, up to a point. The problem is that it doesn't handle a significant number of other use cases.
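One use case a single-termination deferrable can't express on its own is a streamed response, where many chunks are written over time before the response ends. Below is a hypothetical body object for that, loosely in the spirit of the deferrable bodies used with Thin's async hack: the server attaches a writer via `each` (the Rack body method), and the app pushes chunks whenever they're ready. `StreamingBody`, `push`, and `close` are illustrative names, not a library API.

```ruby
# Hypothetical streaming body: emits many chunks over time, then closes once.
class StreamingBody
  def initialize
    @buffer = [] # chunks pushed before the server attached a writer
    @writer = nil
    @closed = false
  end

  def closed?
    @closed
  end

  # The server calls #each once with a block that writes a chunk to the socket;
  # we keep that block and keep feeding it as chunks arrive.
  def each(&writer)
    @writer = writer
    @buffer.each { |chunk| writer.call(chunk) }
    @buffer.clear
  end

  def push(chunk)
    raise IOError, 'body already closed' if @closed

    if @writer
      @writer.call(chunk)
    else
      @buffer << chunk
    end
  end

  def close
    @closed = true
  end
end

WRITTEN = []
body = StreamingBody.new
body.push('event 1')                   # buffered: no writer attached yet
body.each { |chunk| WRITTEN << chunk } # server attaches; buffer is flushed
body.push('event 2')                   # written straight through
body.push('event 3')
body.close
p WRITTEN
```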

As far as your requirements go, if it has such a high requirement for reliability, you only have two options anyway: fix this in process, or move away from Ruby or Node. Neither Ruby nor Node is really all that "mature" for HA systems, although people can and do use them for this. You can build HA systems out of unreliable systems by planning for failure and mitigating it in system and process.

One option you may not have considered is to simply aggregate your APIs using proxies. I could envisage a very, very simple nginx + Lua combo that could easily aggregate all of this for you, use < 10 MB of RAM, and be very reliable and extremely performant.

Equally, you could just fire up your prototype on top of Thin or Rainbows in threaded mode and see how well it performs. MRI's green threads aren't that bad, provided you aren't killing yourself with deep stacks or large heaps, and an application like this shouldn't need to. Taking the async route, the GC pain is just as bad if you aren't limiting heap space, so you'll get the best performance by limiting the number of concurrent operations in order to keep heap growth down. As is often the case, forking out a lot with REE's COW-friendly mode will also allow you to spawn more backends cheaply (especially for something RAM-cheap like a Sinatra app). Of course, don't take my word for it: all of this is easy to profile and benchmark.
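The advice to cap the number of concurrent operations applies whether the concurrency comes from threads or from a reactor. A minimal thread-based sketch follows; `run_bounded` is a hypothetical helper, the limit of 3 is arbitrary, and `sleep` stands in for real I/O.

```ruby
# Run each job on its own thread, but never more than `limit` at once.
# The SizedQueue of tokens behaves like a counting semaphore.
def run_bounded(jobs, limit)
  tokens = SizedQueue.new(limit)
  threads = jobs.map do |job|
    tokens.push(:token) # blocks while `limit` jobs are already in flight
    Thread.new do
      begin
        job.call
      ensure
        tokens.pop # free the slot for the next job
      end
    end
  end
  threads.map(&:value) # results in job order
end

in_flight = 0
peak = 0
lock = Mutex.new
jobs = Array.new(10) do |i|
  lambda do
    lock.synchronize do
      in_flight += 1
      peak = [peak, in_flight].max
    end
    sleep 0.01 # stand-in for a slow backend call
    lock.synchronize { in_flight -= 1 }
    i * i
  end
end

results = run_bounded(jobs, 3)
puts "peak concurrency: #{peak}"
```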

aq1018 commented Jun 25, 2011

I wouldn't categorize that system as HA, although it is quite critical. The current system is built with Rails 2.3, backed by 20 Unicorn processes. Also, I guess I vastly simplified the explanation: by aggregation, I mean there is actually a lot of business logic going on behind the scenes. The current app boots up using 300 MB of RAM and leaks up to 1 GB over time. Because of this, we are hitting a memory/performance issue right now, which is why we are looking at alternatives. Ideally this thing should probably be done in Erlang or something, but again, the company demands Ruby. So we really only have one option: fix it in process.

Currently we are looking into threading everything with Rainbows, which means making everything thread safe. We also use a lot of legacy gems, and some of them are probably not thread safe. It will take some work to make the code we write thread safe, and then we will have to ensure the libraries we include are also thread safe. I feel there will be a lot of thread-related debugging if we go this route, and in my experience, a lot of these bugs won't show up until you put a heavy load on the system for a long time. This could mean some of those bugs go unnoticed until a few days after deploying to production, and then all hell breaks loose.

This is why I favor reactor patterns.
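The kind of fix the threaded route calls for usually looks like this: any state shared across request threads gets wrapped in a `Mutex`. A minimal sketch of a shared memoizing cache; `SafeCache` is a hypothetical name, and holding the lock while the block computes is a deliberate simplification (it serializes all misses, even for different keys).

```ruby
# A memoizing cache that can be shared between request threads.
# Without the mutex, several threads could all miss on the same key at once
# and each run the expensive block.
class SafeCache
  def initialize
    @data = {}
    @lock = Mutex.new
  end

  def fetch(key)
    @lock.synchronize do
      return @data[key] if @data.key?(key)

      @data[key] = yield # computed while holding the lock
    end
  end
end

calls = 0
cache = SafeCache.new
threads = Array.new(8) do
  Thread.new do
    cache.fetch(:config) do
      calls += 1
      sleep 0.01 # stand-in for slow loading work
      'loaded'
    end
  end
end
values = threads.map(&:value)
puts "block ran #{calls} time(s)"
```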

I'm kind of new to Ruby, but I'd like to +1 the OP. Looking into all of this async stuff is sending chills down my spine.

I quickly ruled out using Rails, as I found the framework messy. And ended up shopping around for alternatives. Sinatra is neat but only useful for simple apps. I'm still shopping...

While trying all this stuff, I also looked into how it works in the background. My initial/naive attempts included mod_ruby, mod_rails/mod_rack, Thin, Goliath... More recently, Rack by itself on WEBrick, behind mod_rewrite and mod_proxy.

Anyway, to get to the point, it's a bit surprising to discover a (supposedly?) mature language with a (large) number of libraries, yet no reasonably standard, functional, and generally accepted means of handling an async request without risking it blowing up in one's face.

I'm getting the impression that the number of gems that are thread safe is about zero. A day or two ago I even noticed that EventMachine is not thread safe. It's kind of scary. :-)

Is there no project around that seeks to build a new, thread-safe web stack from the ground up? Ideally one built for ruby 1.9 only (i.e. using fibers)?


raggi commented Jul 3, 2011


Your implication that a reactor library of EventMachine's nature should be thread safe is remarkably odd. To be honest, that ticket is somewhat misleading. What it means is that we want to make the reactor safe for dispatch from other threads, but it would be a really bad idea to roll locks into the core. Yes, we do need to do something to make it signal safe, but the interpreter itself also has many signal handling issues. Signal handling is hard, and failures in this area are common.
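A minimal sketch of "safe for dispatch from other threads" without locks in the core: other threads only touch a thread-safe inbox, and all other state is owned by the reactor thread. This is roughly the role EventMachine's `EM.schedule` plays, but `TinyReactor` and its methods are hypothetical and far simpler than EventMachine.

```ruby
# Hypothetical reactor: one thread owns all of its state. The only entry point
# other threads may use is #schedule, which goes through a Queue
# (Thread::Queue is internally synchronized), so the core needs no locks.
class TinyReactor
  attr_reader :handled

  def initialize
    @inbox = Queue.new
    @handled = [] # mutated only on the reactor thread
    @running = false
  end

  # Safe to call from any thread.
  def schedule(&job)
    @inbox << job
  end

  def stop
    schedule { @running = false }
  end

  # Call on the reactor thread; processes jobs until #stop's job runs.
  def run
    @running = true
    @inbox.pop.call while @running # pop blocks until work arrives
  end
end

reactor = TinyReactor.new
reactor_thread = Thread.new { reactor.run }

workers = Array.new(4) do |w|
  Thread.new do
    5.times { |i| reactor.schedule { reactor.handled << [w, i] } }
  end
end
workers.each(&:join)
reactor.stop
reactor_thread.join
puts reactor.handled.size
```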

Rails is theoretically thread safe; you could try Rainbows for a modern, thread-safe Ruby web server.

I have to say, this discussion still reminds me of the roflscale discussion, and I can't help but want to encourage people to fix the root-cause problems first, before heading down the path of far more complex systems. Complex systems shouldn't necessarily be avoided, but learning about them before trying to implement production software with them is a much safer idea.

Your final comment regarding "built for ruby 1.9 only (i.e. using fibers)" shows further confusion. So far, I have seen no one in the Ruby community write a lock of any form that supports fibers. This points to a clear lack of use or understanding within the community, and as such, people diving into these things are probably doing more harm than good. In fact, forget probably: they are. I help people out of trouble all too often.
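To make the fiber point concrete: a lock built for threads blocks the whole OS thread, which would stall every fiber on it, so a fiber-aware lock has to park the waiting fiber and let something else resume it later. The sketch below is one minimal way to do that, assuming a hypothetical cooperative `Scheduler`; `Scheduler`, `FiberLock`, `spawn_fiber`, `pause`, `park`, and `wake` are illustrative names, and the lock makes no fairness guarantees.

```ruby
# Fiber.current / Fiber#alive? needed `require 'fiber'` on older Rubies.
require 'fiber' unless Fiber.respond_to?(:current)

# Hypothetical cooperative scheduler: fibers run one at a time and only
# switch when they explicitly hand control back.
class Scheduler
  def initialize
    @ready = []
  end

  def spawn_fiber(&blk)
    @ready << Fiber.new(&blk)
  end

  # From inside a fiber: stay runnable, but let the others go first.
  def pause
    @ready << Fiber.current
    Fiber.yield
  end

  # From inside a fiber: stop running until someone calls #wake on it.
  def park
    Fiber.yield
  end

  def wake(fiber)
    @ready << fiber
  end

  def run
    until @ready.empty?
      fiber = @ready.shift
      fiber.resume if fiber.alive?
    end
  end
end

# A lock whose waiters are fibers rather than threads.
class FiberLock
  def initialize(scheduler)
    @scheduler = scheduler
    @owner = nil
    @waiters = []
  end

  def synchronize
    acquire
    begin
      yield
    ensure
      release
    end
  end

  private

  def acquire
    while @owner # re-check after waking: another fiber may have taken it
      @waiters << Fiber.current
      @scheduler.park
    end
    @owner = Fiber.current
  end

  def release
    @owner = nil
    waiter = @waiters.shift
    @scheduler.wake(waiter) if waiter
  end
end

sched = Scheduler.new
lock = FiberLock.new(sched)
counter = 0
3.times do
  sched.spawn_fiber do
    2.times do
      lock.synchronize do
        value = counter
        sched.pause # e.g. waiting on I/O in the middle of a read-modify-write
        counter = value + 1
      end
    end
  end
end
sched.run
puts counter
```

Without the lock, the `pause` between reading and writing `counter` lets the three fibers overwrite each other's updates.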

Sorry for the negativity; it's a product of frequent disappointment in people's discussions around this area.

Well, again, I'm quite new to Ruby, so I'm certainly misunderstanding a number of things. And being a newbie, I'm most certainly impressed (wrongly?) by tickets which suggest that EM isn't thread-safe in its own right.

Maybe I've misunderstood the whole thing. I mean, seriously. I certainly can't rule that out. I've only learned ruby a few weeks ago, so please take everything I write with the appropriate pinch (hangar load?) of salt. And no worries, I didn't feel any negativity in your comment. A lot less than in mine anyway, and I'm a bit sorry about it in retrospect. ;-)

I'm still loving the language, mind you. But I'm having some doubts lately. I was hoping to find something equivalent to PHP's Symfony2 for Ruby somewhere, somehow -- something a lot less verbose than the PHP version, but just as clean and well designed. I found tons of great-looking libraries, and C-based stuff and what not to speed things up. I really did a lot of homework (even at the risk of broadcasting a lack of humility). But the last thing I want to do is to port Symfony2 to Ruby. I'm sure many a dev would love it, but I just can't. Not alone.

And then there's this performance thing... For someone like me, who dismissed Ruby a few years ago, even before learning it (when the hype was all about RoR), due to potential performance bottlenecks... it's a bit scary.

I see your point when you say that performance is no big deal for most apps. I honestly do. I've experienced the same in the past couple of years. Customers worrying about this and that and what not, and then you discover they're serving 3k requests per day. Ludicrous.

But I also know it occasionally counts.

At any rate, picture something multi-process (à la Thin, to work around the GIL), fork-based (à la Passenger, to reduce memory), and evented (to reduce IO blocking). As far as I've been reading and browsing (both docs and, more importantly, code), I've yet to find anything that does this. Rainbows (I forgot to mention I looked into that one too) didn't, as best I could tell from its source code.


raggi commented Jul 4, 2011

Getting Rails 3 to pull ~200 req/s with clean code is not hard. I mean real application code that actually hits a database.

Clustering that is not hard.

Saving the cost of forking with REE in that use case is not hard.

Rails is quite a bit simpler than Symfony for many similar use cases.

Highly concurrent evented code will not necessarily save you RAM (I've built plenty). Each concurrent request needs more RAM for its current context. MRI's biggest penalty comes from the GC, so go figure.

Highly evented code is useful for special use cases, but general web development is NOT IT.

Go write some Rails, turn on the pragmatism, and you'll fast learn that most of the people who've been blogging about "performance issues" in Rails are mostly writing god-awful code.

Some of the authors of the more recent and cited complaints about rails are also responsible for some frankly dire "async solutions".

Rainbows supports quite a few modes for serving concurrently. Its code is not that complex, either.

Finally, while I might have written a lot of Ruby code in recent years (including a lot of the stuff that's under discussion), I am not really a Rails head; in fact, I know barely any of the Rails API. That aside, it's trivial for me, and for most people, to use. Stop reading blogs, and start determining the truth for yourself through real experimentation. If blogs were papers, most of what you've been "researching" would be coming out of crappy polytechnics and not getting cited.
