Hijack #481

Closed
wants to merge 3 commits into
from
@raggi
Official Rack repositories member

Hi all!

I had some time on the plane with no WiFi and some holiday left over, so...

@tenderlove please tell me if this matches up with the stuff you were describing to me at rubyconf
@tarcieri please take a look as you're also familiar with these things
@evanphx as with Tony, if you wouldn't mind
@chneukirchen @rkh you know why :-)

Please also highlight anyone else who would be worthwhile, or at least send them a link. I'd appreciate feedback.

What?

We still don't have an API that allows for IO-like streaming, and this is a straw man that addresses this within the confines of the rack 1.x spec. It's not an attempt to build out what I hope a 2.0 spec should be, but I am hoping that something like this will be enough to aid Rails 4s ventures, enable websockets, and a few other strategies. With HTTP2 around the corner, we'll likely want to revisit the IO API for 2.0, but we'll see how this plays out. Maybe IO wrapped around channels will be ok.

How?

There are two modes, one for "full hijacking", which is what you'd want for websockets and various hacks, and another mode which I think is closer to what @tenderlove really wants, which is a post-headers hijack. In both cases, the application assumes control of the socket after it has been hijacked, and there is currently no API for "giving sockets back to the server", so it's recommended that folks use Connection:close with this mechanism.

Examples?

I've implemented this spec in Thin (oddly enough) because it was the only reasonable server I had checked out on my machine as I as writing it. I suspect the Puma implementation will be even simpler. It's all in this branch: https://github.com/raggi/thin/commits/hijack

Specifically the server piece lives here: https://github.com/raggi/thin/blob/e04855459cb42fd98a0a483075f8337cafe6d949/lib/thin/connection.rb

There are two examples, one for each of the hijack "modes" (pre headers and post headers):

The examples implement a trivial HTTP -> telnet chat interface. You make any HTTP request to "join" and the response dumps you into the single global line buffered chat room.

Limitations

I have not tried to spec out a full IO API, and I'm not sure that we should.
I have not tried to respec all of the HTTP / anti-HTTP semantics.
There is no spec for buffering or the like.

The intent is that this is an API to "get out the way".

Similar implementations

http://golang.org/pkg/net/http/#Hijacker

Additional example

(this is a middleware example, that ships outgoing IO in chunked encoding - good for streamed xhr or the like)

class ChunkyIO
  class IOWrapper
    extend Forwardable
    def_delegators :@io, :close_read, :read, :read_nonblock

    def initialize(io)
      @io = io
      @terminated = false
    end

    def write(data)
      @io.write chunk_header(data) + data + "\r\n"
    end

    def write_nonblock(data)
      # XXX bugs can occur here, if a partial write happens!
      @io.write_nonblock chunk_header(data) + data + "\r\n"
    end

    def chunk_header(data)
      "#{data.size.to_s(16)}\r\n"
    end

    def close_write
      terminate
      @io.close_write
    end

    def close
      terminate
      @io.close
    end

    def terminate
      return if @terminated
      write ""
      @terminated = true
    end
  end

  def initialize(app)
    @app = app
  end

  def call(env)
    return app.call(env) unless env['rack.hijack?']

    handle_request env
    response = app.call(env)
    handle_response response[1]
    response
  end

  def handle_request(env)
    orig = env['rack.hijack']
    env['rack.hijack'] = proc do
      orig.call
      env['rack.hijack_io'] = IOWrapper.new(env['rack.hijack_io'])
    end
  end

  def handle_response(headers)
    if orig = headers['rack.hijack']
      headers['rack.hijack'] = proc do |io|
        orig.call IOWrapper.new(io)
      end
    end
  end
end
@raggi
Official Rack repositories member

Sorry for the double mail. We still have a parent to this repo :'(

@raggi raggi was assigned Jan 5, 2013
@tarcieri

Awesome! I have a patch to Reel that half-ass supports websockets using something similar (it passes 'rack.websocket' into the Rack env):

celluloid/reel#17

Would sure be nice to have an official API for this sort of thing.

@evanphx

I support this! Puma exposes the socket via env['puma.socket'] already so support is trivial.

@chneukirchen
Official Rack repositories member

+1, I see no obvious problems.

@raggi
Official Rack repositories member

Do you think we need a rack.hijacked? boolean for middleware that might want to use this?

I'd generally discourage it from middleware, but, right now there's no way to tell if "someone else" is already in control of the socket.

@chneukirchen
Official Rack repositories member

Can't they just check whether rack.hijack_io is set?

@rkh
Official Rack repositories member
rkh commented Jan 5, 2013

This seems good. I think I can port Sinatra's stream method to use this without any changes to the external API.

@raggi
Official Rack repositories member

@chneukirchen no, as it is allowed to be set AOT or after the call. this part of the spec is intentionally open to avoid introducing implementation complexities. I am concerned about env replacement middleware too, although that's rare, we don't specify that middleware must pass on /the/ env object, which is why the "should" is there. we also don't directly specify that env can be written to or deleted from, but that's generally accepted + implemented.

In general maybe we shouldn't worry about the middleware case too much. It's confined and hard to specify well.

@raggi
Official Rack repositories member

It sounds like everyones in agreement. I'm tempted to place a "beta api" marker around this with a projected roadmap of removing that marker by 1.6 if all is going well. This potentially leaves us open to make a few changes if we've missed things between now and then. Necessary, unnecessary? other approaches?

@raggi
Official Rack repositories member

@macournoyer adding you, take a look - I know you're familiar with the older async api.

@igrigorik as you've also touched on similar items, maybe you have some thoughts?

@tenderlove
Official Rack repositories member

I like that there is an IO in the env hash, I think. But it seems difficult to support an API like this:

def index
  response.status = 200
  response['X-Greeting'] = 'Hi MOM!'
  10.times do
    sleep 10;
    response.stream.write "hello world\n"
  end
  response.stream.close
  ### response is closed here
end

If I'm reading the spec correctly, rack will call call on whatever is in rack.hijack. This means we have to force people to separate "things that set headers" from "things that stream". Presumably that would lead to an API like this:

def index
  response.status = 200
  response['X-Greeting'] = 'Hi MOM!'
  streaming do |io|
    10.times do
      sleep 10;
      io.write "hello world\n"
    end
    io.close
    ### response is closed here
  end
  ### response is not closed here
end

I'm afraid people would be confused by when the socket is open or closed. It may seem counterintuitive that the socket is still open after the streaming { } block is closed.

Am I missing anything?

@rkh
Official Rack repositories member
@igrigorik

From a high-level pass, this makes sense. It's also pretty much the pattern we implemented in Goliath: https://github.com/postrank-labs/goliath/blob/master/examples/chunked_streaming.rb

Admittedly, since we're "kinda, sorta" rack compliant, we were able to bend the rules and create our own helper methods to tackle the use case. Which also gives us the API @tenderlove highlighted, +/- a few EM timers. :)

However, since we brought up HTTP 2.0.. This change is not sufficient to enable what we need in that department. While the 2.0 spec is still a WIP, we have a pretty good idea as to how it's going to look, and it may be worth starting to think about it.

(But that shouldn't stop this proposal from going forward)

@tenderlove
Official Rack repositories member

@rkh just not super stoked about more callbacks. The number of call methods in my stack traces already riles me up.

@rkh
Official Rack repositories member
@tenderlove
Official Rack repositories member

@igrigorik the first api I listed is verbatim what is possible in Rails master today. In fact, the second API can be implemented in terms of the first:

def streaming
  yield response.stream
end

def index
  response.status = 200
  response['X-Greeting'] = 'Hi MOM!'
  streaming do |io|
    10.times do
      sleep 10;
      io.write "hello world\n"
    end
    io.close
    ### response is closed here
  end
  ### response is not closed here
end

Unfortunately, the opposite is not true. Also, the user having no visibility in to when the streaming { } block is executed seems sub-optimal.

@raggi
Official Rack repositories member

@tenderlove It would be worth taking a look at the example rackup files I linked. There may be a misunderstanding. There are two APIs, one where rack does not call env['rack.hijack'], users do, the other where the server calls headers['rack.hijack']. In the prior case, the call method is not a callback that executes user code, it's a callback that executes webserver code, to tell the webserver "you don't own the socket anymore, the user does". This call should do very little, and doesn't remain on the stack (see the thin server implementation).

My problem with the API that you're proposing, is that it is totally undefined when the user takes control of the socket. What would the following code mean?

response.status = 200
response['X-Greeting'] = 'Hi MOM!'

# When do the following two lines actually get written to the socket? What if I put a sleeping loop in there?
response.stream.write 'ohai!'
response['X-Treme-Headers'] = 'woosh!'

response.stream.close

If we define response.stream as "the first call to stream sends the status and headers, then takes control of the socket", then it's easy to implement in the hijack API. The example I just gave is invalid to this spec for response.stream, but that's fine. In theory you could make it "more valid" if you provided a response.close that was allowed to be called after you are "finished" with the stream, that would write a tailer to the stream socket, in raw HTTP. As tailers are barely supported though, this seems useless, and it still has the horrible caveat, that you'd have to specify what headers you're going to set later, in order to fulfill the HTTP spec for tailers. Anyway, if you're willing to accept the first call to stream meaning the user takes control of the socket, then such an implementation is easy, and would look like so:

class MyResponse < Rack::Response
  def stream
    @stream ||=
      begin
        unless env['rack.hijack?']
          raise NotImplementedError, "server does not support socket hijacking, cannot stream"
        end
        env['rack.hijack'].call
        stream = env['rack.hijack_io']
        write_status_to stream
        write_headers_to stream
        stream
      end
  end
end

Additionally, a particularly "clever" application might be able to do some buffering to allow for late initialization of headers, without a change to the proposed spec. It's a hacky solution to the problem I describe above, and horrendously complex with additional user facing edge cases, but, I could envisage something in Rails (I'd rather not), that operates with the following rules:

  • response.stream buffers outgoing data until the end of the action
  • the end of the action completes mostly as normal, setting headers['rack.hijack'] to use the response body callback api
  • when the callback occurs, response.stream writes out it's buffer, and switches to an immediate write mode

You'd need a more sophisticated stream object than just the hijack_io, but it's entirely possible to write this. This would enable you to avoid threads/callcc's, still not have to write status and headers yourself, but still have socket like IO for the body stream. If you wanted to add user threads (say a thread in an action to free up the webserver later on - good for really long running actions or the like) then all you'd need to do is ensure that the stream buffer -> immediate mode switching is thread safe. That's also not too hard. It would take a bit longer, but if this is really what you want, I'd be happy to write up a basic implementation for you.

@raggi
Official Rack repositories member

@tenderlove I think you want to be able to avoid this: https://github.com/rails/rails/blob/master/actionpack/lib/action_controller/metal/live.rb#L102-L125 but this is not a rack problem really. It's a problem with your API. Your API has no indicator to the user of when writes really happen, or when the response stops and the stream starts - it's just implied. Your API already forces people to make a distinction between "things that set headers" and "things that stream", that is here: https://github.com/rails/rails/blob/master/actionpack/lib/action_controller/metal/live.rb#L69 . Ok, so they can cause this "by accident" instead of explicitly declaring it - meaning they're in some ways more likely to trip on an exception to learn how the API works than to just have to add one line (I think it's a learnability (and thus usability) mistake).

It's not possible to run two pieces of code at the same time, without some kind of threading style abstraction. This isn't an API limitation, it's as you know, the way the world works. Is there some other solution you have in mind where you can overcome this limitation? - To be clear, you want server code (sending status and headers) and middleware code (altering status and headers in after actions) to be running at the same time as user code (streaming data), I think without threads, if I understand your concerns correctly.

Now we can serialize these operations a bit, and in these cases we have two options:

  • User code calls to the server to write status and headers

Or

  • Server code calls to user code to write the stream

You seem to indicate you would like to do the prior, which is as I describe above, possible with the request hijack side of this SPEC change. It means that the response is going to have to write the response data itself, rather than the server doing it, which is kind of ugly, but, it will work. Unfortunately, this also results in needing to disregard middleware "after actions" on the response, which will have no effect. In order to preserve middleware after actions (such as headers set after the application is called), then you have no choice but to use the second approach. It's either that, or you drop middleware altogether, and go right to the metal all the time.

We could try to sugar the first case a little, by adding an additional server callback that says "here's the status and the headers, please write them for me", so that we're not writing status and header IO code - but that's so trivial, it may not be worth it. The only advantage to this could be, that if it was a user code -> server code callback, then it could be wrapped by supporting middleware using a similar approach to the chunkyio example I give above. It's not pretty, but middleware is middleware - it needs to sit in the middle somehow. I can draft something along these lines too, if that seems "more correct" for your use case.

@tenderlove
Official Rack repositories member

My problem with the API that you're proposing, is that it is totally undefined when the user takes control of the socket. What would the following code mean?

It would raise an exception. Header hash modification is forbidden after the headers have been dumped to the socket (which happens when the buffer thread wakes, but the buffer thread won't wake until there is data queued). But since we can "liberate" the socket from the webserver, then we're free to define those semantics.

it's a callback that executes webserver code, to tell the webserver "you don't own the socket anymore, the user does".

Perfect. Exactly what I want.

I think without threads, if I understand your concerns correctly.

No, I'd like a thread that's writing to the socket, an app thread, and a buffer between them. If I can say to the webserver "the socket is mine", then this should be perfect.

Your code example is great. I'll read the telnet example you posted, but so far I am very happy. :-)

@rkh
Official Rack repositories member
rkh commented Jan 6, 2013

The example code posted assumes that in the stack we always hand down the original env hash, never a new/modified one. Otherwise the io can't be added later on. And adding it before hand could lead to undefined behaviour if someone just starts writing to it. Maybe calling the callback should instead return the IO? Or we need some wrapper object in place for the IO, or something.

@rkh
Official Rack repositories member
rkh commented Jan 6, 2013

For websockets, we should also consider updating the spec to no longer require the request body to be rewindable. Though that would have more implications for existing code.

@raggi
Official Rack repositories member

@rkh re env, yeah, i've been considering that too - I'll switch the should to a must on the returning it. maybe some language that it must be preserved / set / passed down after initialization (after the call to hijack).

@raggi
Official Rack repositories member

@rkh re rewindable and websockets, i'm not sure how practical that's going to be. clearly multipart and websockets have different usage patterns, maybe this is something where middleware that's reading from either needs to be more aware of the context. web servers supporting websocket pass-through certainly shouldn't be buffering the incoming body before hitting the app.

@rkh
Official Rack repositories member
rkh commented Jan 6, 2013

Yeah, I mean, I think if people really want websockets, they should be fine with running that through something else, as it probably doesn't fit a request/response model anyhow. For SSE and alike, this would be gold already.

@rkh
Official Rack repositories member
rkh commented Jan 6, 2013

it's a callback that executes webserver code, to tell the webserver "you don't own the socket anymore, the user does".

Perfect. Exactly what I want.

@raggi the scenario/api @tenderlove has in mind would only work with a full/pre-header hijack, right? otherwise you need to hand down a callback for starting to stream, no? This would mean that Rails would need to take care of writing headers to the stream, potentially messing with middleware, right?

@raggi
Official Rack repositories member

@rkh correct, i'm considering adding some items to rack::util to assist with that.

@rkh
Official Rack repositories member
rkh commented Jan 6, 2013

@raggi but either way you'd need to hand the reply back down the stack to make sure all the middleware adds headers. This is even more important considering that middleware usually has to first call the app before even having a header hash to add stuff to.

@raggi
Official Rack repositories member

@rkh yep, that's why I added the response style hijack api as well as the request style. there's not much we can do to solve that problem in the request style, within the confines of the env -> tuple api. do you have any thoughts to the contrary?

@rkh
Official Rack repositories member
rkh commented Jan 6, 2013

No, I'm just not sure of the advantage over the async protocol Thin and Rainbows implement. To the contrary, that already works with chunking etc.

@raggi
Official Rack repositories member

The advantages are (TINAEL):

  • You can read as well as write
  • You can do partial close
  • The API doesn't enforce buffering, and allows flushing
  • It's possible to implement the async protocol inside this API, but not vice versa (although as you see in thin, this api isn't ideal for reactors - supporting both requires an API that's truly incompatible with SPEC 1.x)
  • The API is easier for servers to allow extensions - most servers will supply real IO objects, so users will often be able to use ioctl, etc.
  • Hijackers are responsible for handling remote close logic, rather than the async.close callback, which has sensitive semantics.

I'm not sure what you mean by "that already works with chunking", unless the server is chunking as each yields (which can be horribly inefficient), then the async protocol does not have chunking implied at all.

@rkh
Official Rack repositories member
rkh commented Jan 6, 2013

I'm not sure what you mean by "that already works with chunking", unless the server is chunking as each yields (which can be horribly inefficient), then the async protocol does not have chunking implied at all.

Rack::Chunked is chunking as each yields.

The API is easier for servers to allow extensions - most servers will supply real IO objects, so users will often be able to use ioctl, etc.

This also means that the list of methods the io object needs to implement is not well defined.

You can read as well as write.

So should you read from the hijacked IO rather than rack.input?

The API doesn't enforce buffering, and allows flushing

Right.

Hijackers are responsible for handling remote close logic, rather than the async.close callback, which has sensitive semantics.

This might prove difficult for pieces of logic that cannot reflect over the entire stack. For instance, if I was to use the partial hijack and something like Rack::Chunked or Rack::Deflater would be on the stack, I would send out headers suggesting something different from what I'm writing onto the stream. In a similar vain, some handlers print our warnings if I send a response that is neither chunked nor has a content-length.

How far do we go when it comes to requiring it to be an IO? Your example shows how to wrap the IO, but that basically denies access to the file handler, doesn't it?

I'm not doubting the approach, I just wanna make sure we're all on the same page. :)

@tenderlove
Official Rack repositories member

@rkh Chunked and deflater should be implemented as wrappers around the IO object. I think we can make these middleware work in both cases (both cases being hijacked and non-hijacked). It seems like no middleware should actually hijack a request unless they plan to perform the response. But middleware could wrap a liberated IO object.

@raggi
Official Rack repositories member

@rkh

We have a nomenclature problem. by async protocol, I assmed you're referring to async.callback, the thin style api. Lets be clear, async.* is the async protocol, body streaming is the current "body response to each and yields strings" protocol.

The list of methods defined in this spec is the minimum that all servers should implement. This is the "portable" spec - it should be enough to implement websockets, rails live, etc. If you want to do "optimization" work, which requires more than this part of the SPEC, you're free to do so by expanding on this API. Doing so requires that the user do this with a specific knowledge of their server implementation. This is described explicitly in the SPEC, and the SPEC recommends that IO-like methods are used. The problem with saying something like "all ruby IO methods" instead is that it's hard, maybe even impossible to implement for all servers, is potentially totally incompatible with HTTP2, and with pipelining, which could be implemented with some server wrappers within the minimal API, at least for some use cases (pipeline has other issues).

Regarding rack.input, if you want to do websockets, yes. In that case, they should be the same IO really, so it makes no difference. Bear in mind that what current servers do most of the time isn't correct for a websocket upgrade (streaming the input body to a file) - but that's easy to fix. They do different things for different use cases. For certain websocket upgrade type scenarios, using Rack::Request and calling params could cause a problem. This is outside of our concern.

async.close is not related to wire formats, it's related to the HTTP client closing the socket before the response is complete, it is a callback in the async protocol for handling this case. It's an important thing for high concurrency async servers. The sync api will instead "discover" this through select for error, or failed reads/writes.

Regarding Rack::Chunked and Rack::Deflater, I'm planning on making them support hijack in the response style spec. They will not do anything in the request style spec - as that's the "really get out the way" spec. This may deserve some additional convention documentation in the SPEC. Adding to TODO. - Point is, there's two use cases here, one where you really want raw IO control, one where you want to just "write" to the body stream, but make best effort at conforming to both the Rack and HTTP specs.

I'm not sure what you mean by "denies access to the file handler"?

Great questions and comments, keep em coming :)

@raggi
Official Rack repositories member

@tenderlove absolutely in line with my thinking. I'm going to also add a TODO to add the "middleware should not hijack unless it plans to handle the response" to the SPEC, under a conventions section.

@raggi raggi closed this Jan 11, 2013
@FooBarWidget

We just implemented the socket hijacking API in Phusion Passenger 4: phusion/passenger@c5b5b2e

@rkh
Official Rack repositories member

\o/

@rkh
Official Rack repositories member

Sinatra 1.4 will be using this API.

@tarcieri

I saw this got added to Unicorn as well. I should probably add it to Reel there, eh? ;)

@jonasschneider jonasschneider referenced this pull request in tent/tentd Jan 22, 2013
Closed

Streaming API #73

3 of 11 tasks complete
@FooBarWidget

Something just occurred to me. Why does the partial hijacking API involve assigning a lambda to the "rack.hijack" response header? Can't you just return the lambda as the body?

Also, is env['rack.hijack?'] needed? Can't you just check whether env['rack.hijack'] exists?

@josevalim
Official Rack repositories member

It it still not clear to me how we will be able to detect in a middleware that we should not touch/buffer the body. For example, take the etag middleware:

https://github.com/rack/rack/blob/master/lib/rack/etag.rb

What we have done in Rails is to set the cache control to no-cache and ensure etag and respects this cache control. We have also added a check to not cache if the body responds to_path (i.e. we are sending a file back). It seems we will need to add an extra check to verify if a hijack happened which makes writing proper middleware even harder than what we have today.

I am sorry but I also don't have any solution to offer (yet).

@FooBarWidget

We've written an article on how the hijacking API works: http://blog.phusion.nl/2013/01/23/the-new-rack-socket-hijacking-api/

@raggi
Official Rack repositories member

@FooBarWidget fair question re. lambda in body. I'll think about it some more.

@raggi
Official Rack repositories member

@josevalim yep, I don't have a magic solution for that either, within the confines of compatibility. Concern coupling across disconnected objects with just data structures for communication is, well, difficult.

@josevalim
Official Rack repositories member

Concern coupling across disconnected objects with just data structures for communication is, well, difficult.

Amen.

@macournoyer
Official Rack repositories member

Sorry for being so late on this.

Trying to figure out how I'll implement this in Thin. Not sure this makes sense since all IO operations should be asynchronous. I understand most ppl do a lot of blocking operations inside their apps anyways, but I don't want to expose an API that encourage that.

Perhaps the rack.hijack_io in Thin could only respond to the *_nonblock methods. Would that still be valid RE the specs? Or could I go as far as having a callback to read and write. Something like this:

def call(env)
  io = env['rack.hijack_io']
  io.read do |data|
    puts data
  end
  ...
end
@raggi
Official Rack repositories member

@macournoyer I already did it for you... Just look in my branch.

@raggi
Official Rack repositories member

@FooBarWidget I consider the hijack boolean to be potentially more specialized than the callback proc. The boolean might be managed by a code path that's checking things like "are we in a pipeline (other than the tail)", where it's not appropriate to be hijacking.

@SamSaffron

@raggi @josevalim could etag middleware simply check for env['rack.hijack_io'] not being nil, if its set, it should skip? adding special headers just to keep etag happy seems rather awkward.

@SamSaffron

@FooBarWidget @evanphx @rkh

How about supporting a "standardish" method on the connection you leak out. #return_to_webserver that way this horrible Connection Close hack can be totally avoided.

I hijack the socket, do what I will, send the data back to the client, call #return_to_webserver if it responds to it, add connection close header if not.

Nothing about the spec seems to conflict with this, and it would give me a path forward to getting message bus long polling working very similarly across multiple web servers without mountains of hack.

@raggi
Official Rack repositories member

@SamSaffron there's actually a lot of nasty little details that end up being included in that, some of which already exist, but may get substantially worse. By the time you have an arbitrary IO, it could be in any state, and may not be usable by the server. My thoughts are that this could be difficult for server authors to support in the long term, as it opens up a whole host of application - server support requests. I'll think about it some more.

@SamSaffron

@raggi I follow, this API is super risky in so many ways (by design), if you hijack in middleware and forget to add the close connection header you can start processing keep alived requests that would never have routed to you.

My main goal here is to get Discourse working across Unicorn / Passenger / Puma / Thin with long polling support, which mean I can't really rely on throw :async any more. I guess I can just cough up the extra tcp handshake after every long poll for now. But I dislike the idea of having some web servers working less efficiently than others.

@raggi
Official Rack repositories member
@SamSaffron

@raggi indeed, haproxy would easily take care of this for you, not sure if apache mod proxy or nginx would. Nor am I sure passenger handles this behind the scenes.

@rkh
Official Rack repositories member

My main goal here is to get Discourse working across Unicorn / Passenger / Puma / Thin with long polling support, which mean I can't really rely on throw :async any more. I guess I can just cough up the extra tcp handshake after every long poll for now. But I dislike the idea of having some web servers working less efficiently than others.

You can do long polling with classic each streaming if you want the connection to be handled by your rack server.

@SamSaffron

@rkh sure, that is fine for puma and "non-free" passenger, but you would run out of connections too fast on unicorn or "free" passenger.

@raggi
Official Rack repositories member

how do you not run out of connections when you hand them back?

@SamSaffron

@raggi my understanding is unicorn is one worker thread / socket per process http://unicorn.bogomips.org/PHILOSOPHY.html same for non-concurrent passenger (the free version), I may be wrong, not sure.

My understanding is that the web server would wait till all the each is consumed before allowing any new requests in, (rainbows would allow it, puma and concurrent passenger as well)

@raggi
Official Rack repositories member

if you're trying to hand the socket back to the webserver after you've hijacked it into being concurrent, you're going to cause or have all kinds of scheduling issues. Why don't you implement this yourself, see how if it works, then we can talk about that implementation and it's edge cases?

@SamSaffron

Fair enough @raggi not trying to be a pain in the behind here, just figuring out how to consume the new APIs.

@raggi
Official Rack repositories member

I know you're not trying to be a pain, I'm just short on time, and there are some implementation details that you need to understand, so I really would suggest you have a go at implementing a stable version of your proposal in a few servers - it will be a valuable exercise.

@rkh
Official Rack repositories member

So, on threaded servers like Puma this will still only allow you to run as many connections in parallel as the size of the thread pool, right? There is no way to "hand the thread back", is there?

@FooBarWidget

There is. Spawn your own thread(s) and offload the IO object there, then return from the original thread. Or pass the IO object to an EventMachine thread, and return from the original thread. I'm not 100% sure whether the Rack hijacking spec allows this but at least Phusion Passenger allows it. You can see an example in our test suite: https://github.com/FooBarWidget/passenger/blob/master/test/ruby/request_handler_spec.rb#L197-L208

@rkh
Official Rack repositories member

Yeah, that's what I was getting at: It's not part of the spec.

@raggi
Official Rack repositories member

The spec deliberately ignores scheduling concerns. Servers are encouraged to clearly document how concurrency safe the sockets are. For most servers (all so far?) that just hand off a real socket, that's easy - the app is in control of scheduling and the socket. The socket could be handed off to an app managed thread instead of a server managed thread with no critical concerns.

In the case of having to hand back a socket however, one would have to consider scheduling much more closely, even for "trivial cases". If the server accepts more sockets than it has threads for, then the hand back could easily result in deadlocks / over subscription / stale resources, etc.

@rkh
Official Rack repositories member

Wait, what, how do I hand back a socket?

@raggi
Official Rack repositories member

You don't, that's the point. If I provided an API for that, it would absolutely need some specification regarding scheduling concerns, which would have been far too big and prescriptive for a 1.x series change.

@rkh
Official Rack repositories member

I think I still don't really see the advantage of the partial hijack vs each-streaming then. At least not from a rails/sinatra/library perspective.

@rkh
Official Rack repositories member

I don't really see the use case for "handing back a socket". I was talking about the original thread not blocking/being reused or something. If the spec would contain a sentence stating that the rack handler permits capturing a large (whatever that means) number of sockets, wouldn't that be enough?

@raggi
Official Rack repositories member

@rkh the advantage of partial hijack is just for scheduling and io handoff when people can't understand how to deal with each. It's a cop-out, but I had a lot of smart people simply not getting how to yield each regularly.

Regarding handing back a socket, we really have a lot of semantics to cover, is the socket even in a valid http state when it's handed back? A lot more encapsulation is required to make that any kind of sane.

@rkh
Official Rack repositories member

I don't want to hand a socket back, just a thread/control flow.

@rkh
Official Rack repositories member

How does the "for scheduling" part look like? Can I just safely do a Thread.new { .. write to socket here ... }?

@stakach stakach referenced this pull request in faye/faye Mar 2, 2013
Closed

Add support for Puma #198

@raggi
Official Rack repositories member

@rkh it is to be defined by the server, but in all implementations I have seen so far, yes, absolutely.

@rkh
Official Rack repositories member

So, which servers understand that?

@FooBarWidget has there been an open source passenger release by now that includes support?

I want to experiment with moving Sinatra streaming over to this mechanism.

@FooBarWidget

Phusion Passenger 4.0 RC 4 open source version has been out for 2 weeks now. It supports the hijacking API.

@rkh
Official Rack repositories member

Phusion Passenger 4.0 RC 4 open source version has been out for 2 weeks now. It supports the hijacking API.

Sweet!

@rkh
Official Rack repositories member

@raggi btw, when I run your full hijack example in dev mode, I get a NoMethodError: undefined method 'to_i' for #<Rack::Lint::HijackWrapper:0x007f8cc288ccb8>

@halorgium

@raggi is there a automated test to confirm whether a Handler meets the requirements?
It might make sense to put a simple series of things to run in the top description too.
I didn't find it entirely clear.

@halorgium

@rkh @raggi the problem is that the hijack.ru example returns an invalid response.
Should the Rack::Lint allow this behaviour of returning an [#<IO>, nil, nil]?

@jperkin jperkin pushed a commit to joyent/pkgsrc that referenced this pull request Dec 9, 2013
taca Update ruby-rack to 1.5.2.
== Changes

Please note that this release includes a few potentially breaking changes.
Of particular note are:

 * SessionHash is no longer a Hash sublcass
 * Rack::File cache_control parameter is removed in place of headers options

Additonally, SPEC has been updated in several areas and is now at 1,2.

A new SPEC section was introduced that provides two server-optional IO hijacking
APIs. Further information on these APIs will be made available by the community
in good time. In the mean time, some information can be found in the original
pull request: rack/rack#481

* January 21st, 2013: Thirty third public release 1.5.0
  * Introduced hijack SPEC, for before-response and after-response hijacking
  * SessionHash is no longer a Hash subclass
  * Rack::File cache_control parameter is removed, in place of headers options
  * Rack::Auth::AbstractRequest#scheme now yields strings, not symbols
  * Rack::Utils cookie functions now format expires in RFC 2822 format
  * Rack::File now has a default mime type
  * rackup -b 'run Rack::File.new(".")', option provides command line configs
  * Rack::Deflater will no longer double encode bodies
  * Rack::Mime#match? provides convenience for Accept header matching
  * Rack::Utils#q_values provides splitting for Accept headers
  * Rack::Utils#best_q_match provides a helper for Accept headers
  * Rack::Handler.pick provides convenience for finding available servers
  * Puma added to the list of default servers (preferred over Webrick)
  * Various middleware now correctly close body when replacing it
  * Rack::Request#params is no longer persistent with only GET params
  * Rack::Request#update_param and #delete_param provide persistent operations
  * Rack::Request#trusted_proxy? now returns true for local unix sockets
  * Rack::Response no longer forces Content-Types
  * Rack::Sendfile provides local mapping configuration options
  * Rack::Utils#rfc2109 provides old netscape style time output
  * Updated HTTP status codes
  * Ruby 1.8.6 likely no longer passes tests, and is no longer fully supported

* January 28th, 2013: Thirty fourth public release 1.5.1
  * Rack::Lint check_hijack now conforms to other parts of SPEC
  * Added hash-like methods to Abstract::ID::SessionHash for compatibility
  * Various documentation corrections

* February 7th, Thirty fifth public release 1.5.2
  * Fix CVE-2013-0263, timing attack against Rack::Session::Cookie
  * Fix CVE-2013-0262, symlink path traversal in Rack::File
  * Add various methods to Session for enhanced Rails compatibility
  * Request#trusted_proxy? now only matches whole stirngs
  * Add JSON cookie coder, to be default in Rack 1.6+ due to security concerns
  * URLMap host matching in environments that don't set the Host header fixed
  * Fix a race condition that could result in overwritten pidfiles
  * Various documentation additions
9f396d9
@isaiah isaiah referenced this pull request in isaiah/jubilee Jan 8, 2014
Closed

Support the Rack hijack API #4

@jperkin jperkin pushed a commit to joyent/pkgsrc that referenced this pull request Mar 14, 2014
taca Update ruby-rack to 1.5.2.
== Changes

Please note that this release includes a few potentially breaking changes.
Of particular note are:

 * SessionHash is no longer a Hash sublcass
 * Rack::File cache_control parameter is removed in place of headers options

Additonally, SPEC has been updated in several areas and is now at 1,2.

A new SPEC section was introduced that provides two server-optional IO hijacking
APIs. Further information on these APIs will be made available by the community
in good time. In the mean time, some information can be found in the original
pull request: rack/rack#481

* January 21st, 2013: Thirty third public release 1.5.0
  * Introduced hijack SPEC, for before-response and after-response hijacking
  * SessionHash is no longer a Hash subclass
  * Rack::File cache_control parameter is removed, in place of headers options
  * Rack::Auth::AbstractRequest#scheme now yields strings, not symbols
  * Rack::Utils cookie functions now format expires in RFC 2822 format
  * Rack::File now has a default mime type
  * rackup -b 'run Rack::File.new(".")', option provides command line configs
  * Rack::Deflater will no longer double encode bodies
  * Rack::Mime#match? provides convenience for Accept header matching
  * Rack::Utils#q_values provides splitting for Accept headers
  * Rack::Utils#best_q_match provides a helper for Accept headers
  * Rack::Handler.pick provides convenience for finding available servers
  * Puma added to the list of default servers (preferred over Webrick)
  * Various middleware now correctly close body when replacing it
  * Rack::Request#params is no longer persistent with only GET params
  * Rack::Request#update_param and #delete_param provide persistent operations
  * Rack::Request#trusted_proxy? now returns true for local unix sockets
  * Rack::Response no longer forces Content-Types
  * Rack::Sendfile provides local mapping configuration options
  * Rack::Utils#rfc2109 provides old netscape style time output
  * Updated HTTP status codes
  * Ruby 1.8.6 likely no longer passes tests, and is no longer fully supported

* January 28th, 2013: Thirty fourth public release 1.5.1
  * Rack::Lint check_hijack now conforms to other parts of SPEC
  * Added hash-like methods to Abstract::ID::SessionHash for compatibility
  * Various documentation corrections

* February 7th, Thirty fifth public release 1.5.2
  * Fix CVE-2013-0263, timing attack against Rack::Session::Cookie
  * Fix CVE-2013-0262, symlink path traversal in Rack::File
  * Add various methods to Session for enhanced Rails compatibility
  * Request#trusted_proxy? now only matches whole stirngs
  * Add JSON cookie coder, to be default in Rack 1.6+ due to security concerns
  * URLMap host matching in environments that don't set the Host header fixed
  * Fix a race condition that could result in overwritten pidfiles
  * Various documentation additions
6ebd384
@jsonn jsonn pushed a commit to jsonn/pkgsrc that referenced this pull request Oct 11, 2014
taca Update ruby-rack to 1.5.2.
== Changes

Please note that this release includes a few potentially breaking changes.
Of particular note are:

 * SessionHash is no longer a Hash sublcass
 * Rack::File cache_control parameter is removed in place of headers options

Additonally, SPEC has been updated in several areas and is now at 1,2.

A new SPEC section was introduced that provides two server-optional IO hijacking
APIs. Further information on these APIs will be made available by the community
in good time. In the mean time, some information can be found in the original
pull request: rack/rack#481

* January 21st, 2013: Thirty third public release 1.5.0
  * Introduced hijack SPEC, for before-response and after-response hijacking
  * SessionHash is no longer a Hash subclass
  * Rack::File cache_control parameter is removed, in place of headers options
  * Rack::Auth::AbstractRequest#scheme now yields strings, not symbols
  * Rack::Utils cookie functions now format expires in RFC 2822 format
  * Rack::File now has a default mime type
  * rackup -b 'run Rack::File.new(".")', option provides command line configs
  * Rack::Deflater will no longer double encode bodies
  * Rack::Mime#match? provides convenience for Accept header matching
  * Rack::Utils#q_values provides splitting for Accept headers
  * Rack::Utils#best_q_match provides a helper for Accept headers
  * Rack::Handler.pick provides convenience for finding available servers
  * Puma added to the list of default servers (preferred over Webrick)
  * Various middleware now correctly close body when replacing it
  * Rack::Request#params is no longer persistent with only GET params
  * Rack::Request#update_param and #delete_param provide persistent operations
  * Rack::Request#trusted_proxy? now returns true for local unix sockets
  * Rack::Response no longer forces Content-Types
  * Rack::Sendfile provides local mapping configuration options
  * Rack::Utils#rfc2109 provides old netscape style time output
  * Updated HTTP status codes
  * Ruby 1.8.6 likely no longer passes tests, and is no longer fully supported

* January 28th, 2013: Thirty fourth public release 1.5.1
  * Rack::Lint check_hijack now conforms to other parts of SPEC
  * Added hash-like methods to Abstract::ID::SessionHash for compatibility
  * Various documentation corrections

* February 7th, Thirty fifth public release 1.5.2
  * Fix CVE-2013-0263, timing attack against Rack::Session::Cookie
  * Fix CVE-2013-0262, symlink path traversal in Rack::File
  * Add various methods to Session for enhanced Rails compatibility
  * Request#trusted_proxy? now only matches whole stirngs
  * Add JSON cookie coder, to be default in Rack 1.6+ due to security concerns
  * URLMap host matching in environments that don't set the Host header fixed
  * Fix a race condition that could result in overwritten pidfiles
  * Various documentation additions
ef1d60b
@ketan ketan referenced this pull request in jruby/jruby-rack Dec 25, 2015
Closed

Missing support for rack socket hijacking API? #202

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment