Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Hijack #481

Closed
wants to merge 3 commits into
from

Conversation

Projects
None yet
Owner

raggi commented Jan 5, 2013

Hi all!

I had some time on the plane with no WiFi and some holiday left over, so...

@tenderlove please tell me if this matches up with the stuff you were describing to me at rubyconf
@tarcieri please take a look as you're also familiar with these things
@evanphx as with Tony, if you wouldn't mind
@chneukirchen @rkh you know why :-)

Please also highlight anyone else who would be worthwhile, or at least send them a link. I'd appreciate feedback.

What?

We still don't have an API that allows for IO-like streaming, and this is a straw man that addresses this within the confines of the rack 1.x spec. It's not an attempt to build out what I hope a 2.0 spec should be, but I am hoping that something like this will be enough to aid Rails 4s ventures, enable websockets, and a few other strategies. With HTTP2 around the corner, we'll likely want to revisit the IO API for 2.0, but we'll see how this plays out. Maybe IO wrapped around channels will be ok.

How?

There are two modes, one for "full hijacking", which is what you'd want for websockets and various hacks, and another mode which I think is closer to what @tenderlove really wants, which is a post-headers hijack. In both cases, the application assumes control of the socket after it has been hijacked, and there is currently no API for "giving sockets back to the server", so it's recommended that folks use Connection:close with this mechanism.

Examples?

I've implemented this spec in Thin (oddly enough) because it was the only reasonable server I had checked out on my machine as I as writing it. I suspect the Puma implementation will be even simpler. It's all in this branch: https://github.com/raggi/thin/commits/hijack

Specifically the server piece lives here: https://github.com/raggi/thin/blob/e04855459cb42fd98a0a483075f8337cafe6d949/lib/thin/connection.rb

There are two examples, one for each of the hijack "modes" (pre headers and post headers):

The examples implement a trivial HTTP -> telnet chat interface. You make any HTTP request to "join" and the response dumps you into the single global line buffered chat room.

Limitations

I have not tried to spec out a full IO API, and I'm not sure that we should.
I have not tried to respec all of the HTTP / anti-HTTP semantics.
There is no spec for buffering or the like.

The intent is that this is an API to "get out the way".

Similar implementations

http://golang.org/pkg/net/http/#Hijacker

Additional example

(this is a middleware example, that ships outgoing IO in chunked encoding - good for streamed xhr or the like)

class ChunkyIO
  class IOWrapper
    extend Forwardable
    def_delegators :@io, :close_read, :read, :read_nonblock

    def initialize(io)
      @io = io
      @terminated = false
    end

    def write(data)
      @io.write chunk_header(data) + data + "\r\n"
    end

    def write_nonblock(data)
      # XXX bugs can occur here, if a partial write happens!
      @io.write_nonblock chunk_header(data) + data + "\r\n"
    end

    def chunk_header(data)
      "#{data.size.to_s(16)}\r\n"
    end

    def close_write
      terminate
      @io.close_write
    end

    def close
      terminate
      @io.close
    end

    def terminate
      return if @terminated
      write ""
      @terminated = true
    end
  end

  def initialize(app)
    @app = app
  end

  def call(env)
    return app.call(env) unless env['rack.hijack?']

    handle_request env
    response = app.call(env)
    handle_response response[1]
    response
  end

  def handle_request(env)
    orig = env['rack.hijack']
    env['rack.hijack'] = proc do
      orig.call
      env['rack.hijack_io'] = IOWrapper.new(env['rack.hijack_io'])
    end
  end

  def handle_response(headers)
    if orig = headers['rack.hijack']
      headers['rack.hijack'] = proc do |io|
        orig.call IOWrapper.new(io)
      end
    end
  end
end
Owner

raggi commented Jan 5, 2013

Sorry for the double mail. We still have a parent to this repo :'(

@ghost ghost assigned raggi Jan 5, 2013

tarcieri commented Jan 5, 2013

Awesome! I have a patch to Reel that half-ass supports websockets using something similar (it passes 'rack.websocket' into the Rack env):

celluloid/reel#17

Would sure be nice to have an official API for this sort of thing.

Contributor

evanphx commented Jan 5, 2013

I support this! Puma exposes the socket via env['puma.socket'] already so support is trivial.

Owner

chneukirchen commented Jan 5, 2013

+1, I see no obvious problems.

Owner

raggi commented Jan 5, 2013

Do you think we need a rack.hijacked? boolean for middleware that might want to use this?

I'd generally discourage it from middleware, but, right now there's no way to tell if "someone else" is already in control of the socket.

Owner

chneukirchen commented Jan 5, 2013

Can't they just check whether rack.hijack_io is set?

Member

rkh commented Jan 5, 2013

This seems good. I think I can port Sinatra's stream method to use this without any changes to the external API.

Owner

raggi commented Jan 5, 2013

@chneukirchen no, as it is allowed to be set AOT or after the call. this part of the spec is intentionally open to avoid introducing implementation complexities. I am concerned about env replacement middleware too, although that's rare, we don't specify that middleware must pass on /the/ env object, which is why the "should" is there. we also don't directly specify that env can be written to or deleted from, but that's generally accepted + implemented.

In general maybe we shouldn't worry about the middleware case too much. It's confined and hard to specify well.

Owner

raggi commented Jan 5, 2013

It sounds like everyones in agreement. I'm tempted to place a "beta api" marker around this with a projected roadmap of removing that marker by 1.6 if all is going well. This potentially leaves us open to make a few changes if we've missed things between now and then. Necessary, unnecessary? other approaches?

Owner

raggi commented Jan 5, 2013

@macournoyer adding you, take a look - I know you're familiar with the older async api.

@igrigorik as you've also touched on similar items, maybe you have some thoughts?

Member

tenderlove commented Jan 6, 2013

I like that there is an IO in the env hash, I think. But it seems difficult to support an API like this:

def index
  response.status = 200
  response['X-Greeting'] = 'Hi MOM!'
  10.times do
    sleep 10;
    response.stream.write "hello world\n"
  end
  response.stream.close
  ### response is closed here
end

If I'm reading the spec correctly, rack will call call on whatever is in rack.hijack. This means we have to force people to separate "things that set headers" from "things that stream". Presumably that would lead to an API like this:

def index
  response.status = 200
  response['X-Greeting'] = 'Hi MOM!'
  streaming do |io|
    10.times do
      sleep 10;
      io.write "hello world\n"
    end
    io.close
    ### response is closed here
  end
  ### response is not closed here
end

I'm afraid people would be confused by when the socket is open or closed. It may seem counterintuitive that the socket is still open after the streaming { } block is closed.

Am I missing anything?

Member

rkh commented Jan 6, 2013

Surprisingly the API you're afraid of ending up with is what Sinatra gives you right now.

On Jan 6, 2013, at 1:31 AM, Aaron Patterson notifications@github.com wrote:

I like that there is an IO in the env hash, I think. But it seems difficult to support an API like this:

def index
response.status = 200
response['X-Greeting'] = 'Hi MOM!'
10.times do
sleep 10;
response.stream.write "hello world\n"
end
response.stream.close

response is closed here

end
If I'm reading the spec correctly, rack will call call on whatever is in rack.hijack. This means we have to force people to separate "things that set headers" from "things that stream". Presumably that would lead to an API like this:

def index
response.status = 200
response['X-Greeting'] = 'Hi MOM!'
streaming do |io|
10.times do
sleep 10;
io.write "hello world\n"
end
io.close
### response is closed here
end

response is not closed here

end
I'm afraid people would be confused by when the socket is open or closed. It may seem counterintuitive that the socket is still open after the streaming { } block is closed.

Am I missing anything?


Reply to this email directly or view it on GitHub.

From a high-level pass, this makes sense. It's also pretty much the pattern we implemented in Goliath: https://github.com/postrank-labs/goliath/blob/master/examples/chunked_streaming.rb

Admittedly, since we're "kinda, sorta" rack compliant, we were able to bend the rules and create our own helper methods to tackle the use case. Which also gives us the API @tenderlove highlighted, +/- a few EM timers. :)

However, since we brought up HTTP 2.0.. This change is not sufficient to enable what we need in that department. While the 2.0 spec is still a WIP, we have a pretty good idea as to how it's going to look, and it may be worth starting to think about it.

(But that shouldn't stop this proposal from going forward)

Member

tenderlove commented Jan 6, 2013

@rkh just not super stoked about more callbacks. The number of call methods in my stack traces already riles me up.

Member

rkh commented Jan 6, 2013

@tenderlove callcc? ;)

On Jan 6, 2013, at 2:06 AM, Aaron Patterson notifications@github.com wrote:

@rkh just not super stoked about more callbacks. The number of call methods in my stack traces already riles me up.


Reply to this email directly or view it on GitHub.

Member

tenderlove commented Jan 6, 2013

@igrigorik the first api I listed is verbatim what is possible in Rails master today. In fact, the second API can be implemented in terms of the first:

def streaming
  yield response.stream
end

def index
  response.status = 200
  response['X-Greeting'] = 'Hi MOM!'
  streaming do |io|
    10.times do
      sleep 10;
      io.write "hello world\n"
    end
    io.close
    ### response is closed here
  end
  ### response is not closed here
end

Unfortunately, the opposite is not true. Also, the user having no visibility in to when the streaming { } block is executed seems sub-optimal.

Owner

raggi commented Jan 6, 2013

@tenderlove It would be worth taking a look at the example rackup files I linked. There may be a misunderstanding. There are two APIs, one where rack does not call env['rack.hijack'], users do, the other where the server calls headers['rack.hijack']. In the prior case, the call method is not a callback that executes user code, it's a callback that executes webserver code, to tell the webserver "you don't own the socket anymore, the user does". This call should do very little, and doesn't remain on the stack (see the thin server implementation).

My problem with the API that you're proposing, is that it is totally undefined when the user takes control of the socket. What would the following code mean?

response.status = 200
response['X-Greeting'] = 'Hi MOM!'

# When do the following two lines actually get written to the socket? What if I put a sleeping loop in there?
response.stream.write 'ohai!'
response['X-Treme-Headers'] = 'woosh!'

response.stream.close

If we define response.stream as "the first call to stream sends the status and headers, then takes control of the socket", then it's easy to implement in the hijack API. The example I just gave is invalid to this spec for response.stream, but that's fine. In theory you could make it "more valid" if you provided a response.close that was allowed to be called after you are "finished" with the stream, that would write a tailer to the stream socket, in raw HTTP. As tailers are barely supported though, this seems useless, and it still has the horrible caveat, that you'd have to specify what headers you're going to set later, in order to fulfill the HTTP spec for tailers. Anyway, if you're willing to accept the first call to stream meaning the user takes control of the socket, then such an implementation is easy, and would look like so:

class MyResponse < Rack::Response
  def stream
    @stream ||=
      begin
        unless env['rack.hijack?']
          raise NotImplementedError, "server does not support socket hijacking, cannot stream"
        end
        env['rack.hijack'].call
        stream = env['rack.hijack_io']
        write_status_to stream
        write_headers_to stream
        stream
      end
  end
end

Additionally, a particularly "clever" application might be able to do some buffering to allow for late initialization of headers, without a change to the proposed spec. It's a hacky solution to the problem I describe above, and horrendously complex with additional user facing edge cases, but, I could envisage something in Rails (I'd rather not), that operates with the following rules:

  • response.stream buffers outgoing data until the end of the action
  • the end of the action completes mostly as normal, setting headers['rack.hijack'] to use the response body callback api
  • when the callback occurs, response.stream writes out it's buffer, and switches to an immediate write mode

You'd need a more sophisticated stream object than just the hijack_io, but it's entirely possible to write this. This would enable you to avoid threads/callcc's, still not have to write status and headers yourself, but still have socket like IO for the body stream. If you wanted to add user threads (say a thread in an action to free up the webserver later on - good for really long running actions or the like) then all you'd need to do is ensure that the stream buffer -> immediate mode switching is thread safe. That's also not too hard. It would take a bit longer, but if this is really what you want, I'd be happy to write up a basic implementation for you.

Owner

raggi commented Jan 6, 2013

@tenderlove I think you want to be able to avoid this: https://github.com/rails/rails/blob/master/actionpack/lib/action_controller/metal/live.rb#L102-L125 but this is not a rack problem really. It's a problem with your API. Your API has no indicator to the user of when writes really happen, or when the response stops and the stream starts - it's just implied. Your API already forces people to make a distinction between "things that set headers" and "things that stream", that is here: https://github.com/rails/rails/blob/master/actionpack/lib/action_controller/metal/live.rb#L69 . Ok, so they can cause this "by accident" instead of explicitly declaring it - meaning they're in some ways more likely to trip on an exception to learn how the API works than to just have to add one line (I think it's a learnability (and thus usability) mistake).

It's not possible to run two pieces of code at the same time, without some kind of threading style abstraction. This isn't an API limitation, it's as you know, the way the world works. Is there some other solution you have in mind where you can overcome this limitation? - To be clear, you want server code (sending status and headers) and middleware code (altering status and headers in after actions) to be running at the same time as user code (streaming data), I think without threads, if I understand your concerns correctly.

Now we can serialize these operations a bit, and in these cases we have two options:

  • User code calls to the server to write status and headers

Or

  • Server code calls to user code to write the stream

You seem to indicate you would like to do the prior, which is as I describe above, possible with the request hijack side of this SPEC change. It means that the response is going to have to write the response data itself, rather than the server doing it, which is kind of ugly, but, it will work. Unfortunately, this also results in needing to disregard middleware "after actions" on the response, which will have no effect. In order to preserve middleware after actions (such as headers set after the application is called), then you have no choice but to use the second approach. It's either that, or you drop middleware altogether, and go right to the metal all the time.

We could try to sugar the first case a little, by adding an additional server callback that says "here's the status and the headers, please write them for me", so that we're not writing status and header IO code - but that's so trivial, it may not be worth it. The only advantage to this could be, that if it was a user code -> server code callback, then it could be wrapped by supporting middleware using a similar approach to the chunkyio example I give above. It's not pretty, but middleware is middleware - it needs to sit in the middle somehow. I can draft something along these lines too, if that seems "more correct" for your use case.

Member

tenderlove commented Jan 6, 2013

My problem with the API that you're proposing, is that it is totally undefined when the user takes control of the socket. What would the following code mean?

It would raise an exception. Header hash modification is forbidden after the headers have been dumped to the socket (which happens when the buffer thread wakes, but the buffer thread won't wake until there is data queued). But since we can "liberate" the socket from the webserver, then we're free to define those semantics.

it's a callback that executes webserver code, to tell the webserver "you don't own the socket anymore, the user does".

Perfect. Exactly what I want.

I think without threads, if I understand your concerns correctly.

No, I'd like a thread that's writing to the socket, an app thread, and a buffer between them. If I can say to the webserver "the socket is mine", then this should be perfect.

Your code example is great. I'll read the telnet example you posted, but so far I am very happy. :-)

Member

rkh commented Jan 6, 2013

The example code posted assumes that in the stack we always hand down the original env hash, never a new/modified one. Otherwise the io can't be added later on. And adding it before hand could lead to undefined behaviour if someone just starts writing to it. Maybe calling the callback should instead return the IO? Or we need some wrapper object in place for the IO, or something.

Member

rkh commented Jan 6, 2013

For websockets, we should also consider updating the spec to no longer require the request body to be rewindable. Though that would have more implications for existing code.

Owner

raggi commented Jan 6, 2013

@rkh re env, yeah, i've been considering that too - I'll switch the should to a must on the returning it. maybe some language that it must be preserved / set / passed down after initialization (after the call to hijack).

Owner

raggi commented Jan 6, 2013

@rkh re rewindable and websockets, i'm not sure how practical that's going to be. clearly multipart and websockets have different usage patterns, maybe this is something where middleware that's reading from either needs to be more aware of the context. web servers supporting websocket pass-through certainly shouldn't be buffering the incoming body before hitting the app.

Member

rkh commented Jan 6, 2013

Yeah, I mean, I think if people really want websockets, they should be fine with running that through something else, as it probably doesn't fit a request/response model anyhow. For SSE and alike, this would be gold already.

Member

rkh commented Jan 6, 2013

it's a callback that executes webserver code, to tell the webserver "you don't own the socket anymore, the user does".

Perfect. Exactly what I want.

@raggi the scenario/api @tenderlove has in mind would only work with a full/pre-header hijack, right? otherwise you need to hand down a callback for starting to stream, no? This would mean that Rails would need to take care of writing headers to the stream, potentially messing with middleware, right?

Owner

raggi commented Jan 6, 2013

@rkh correct, i'm considering adding some items to rack::util to assist with that.

Member

rkh commented Jan 6, 2013

@raggi but either way you'd need to hand the reply back down the stack to make sure all the middleware adds headers. This is even more important considering that middleware usually has to first call the app before even having a header hash to add stuff to.

Owner

raggi commented Jan 6, 2013

@rkh yep, that's why I added the response style hijack api as well as the request style. there's not much we can do to solve that problem in the request style, within the confines of the env -> tuple api. do you have any thoughts to the contrary?

Member

rkh commented Jan 6, 2013

No, I'm just not sure of the advantage over the async protocol Thin and Rainbows implement. To the contrary, that already works with chunking etc.

Owner

raggi commented Jan 6, 2013

The advantages are (TINAEL):

  • You can read as well as write
  • You can do partial close
  • The API doesn't enforce buffering, and allows flushing
  • It's possible to implement the async protocol inside this API, but not vice versa (although as you see in thin, this api isn't ideal for reactors - supporting both requires an API that's truly incompatible with SPEC 1.x)
  • The API is easier for servers to allow extensions - most servers will supply real IO objects, so users will often be able to use ioctl, etc.
  • Hijackers are responsible for handling remote close logic, rather than the async.close callback, which has sensitive semantics.

I'm not sure what you mean by "that already works with chunking", unless the server is chunking as each yields (which can be horribly inefficient), then the async protocol does not have chunking implied at all.

Member

rkh commented Jan 6, 2013

I'm not sure what you mean by "that already works with chunking", unless the server is chunking as each yields (which can be horribly inefficient), then the async protocol does not have chunking implied at all.

Rack::Chunked is chunking as each yields.

The API is easier for servers to allow extensions - most servers will supply real IO objects, so users will often be able to use ioctl, etc.

This also means that the list of methods the io object needs to implement is not well defined.

You can read as well as write.

So should you read from the hijacked IO rather than rack.input?

The API doesn't enforce buffering, and allows flushing

Right.

Hijackers are responsible for handling remote close logic, rather than the async.close callback, which has sensitive semantics.

This might prove difficult for pieces of logic that cannot reflect over the entire stack. For instance, if I was to use the partial hijack and something like Rack::Chunked or Rack::Deflater would be on the stack, I would send out headers suggesting something different from what I'm writing onto the stream. In a similar vain, some handlers print our warnings if I send a response that is neither chunked nor has a content-length.

How far do we go when it comes to requiring it to be an IO? Your example shows how to wrap the IO, but that basically denies access to the file handler, doesn't it?

I'm not doubting the approach, I just wanna make sure we're all on the same page. :)

Member

tenderlove commented Jan 6, 2013

@rkh Chunked and deflater should be implemented as wrappers around the IO object. I think we can make these middleware work in both cases (both cases being hijacked and non-hijacked). It seems like no middleware should actually hijack a request unless they plan to perform the response. But middleware could wrap a liberated IO object.

Owner

raggi commented Jan 7, 2013

@rkh

We have a nomenclature problem. by async protocol, I assmed you're referring to async.callback, the thin style api. Lets be clear, async.* is the async protocol, body streaming is the current "body response to each and yields strings" protocol.

The list of methods defined in this spec is the minimum that all servers should implement. This is the "portable" spec - it should be enough to implement websockets, rails live, etc. If you want to do "optimization" work, which requires more than this part of the SPEC, you're free to do so by expanding on this API. Doing so requires that the user do this with a specific knowledge of their server implementation. This is described explicitly in the SPEC, and the SPEC recommends that IO-like methods are used. The problem with saying something like "all ruby IO methods" instead is that it's hard, maybe even impossible to implement for all servers, is potentially totally incompatible with HTTP2, and with pipelining, which could be implemented with some server wrappers within the minimal API, at least for some use cases (pipeline has other issues).

Regarding rack.input, if you want to do websockets, yes. In that case, they should be the same IO really, so it makes no difference. Bear in mind that what current servers do most of the time isn't correct for a websocket upgrade (streaming the input body to a file) - but that's easy to fix. They do different things for different use cases. For certain websocket upgrade type scenarios, using Rack::Request and calling params could cause a problem. This is outside of our concern.

async.close is not related to wire formats, it's related to the HTTP client closing the socket before the response is complete, it is a callback in the async protocol for handling this case. It's an important thing for high concurrency async servers. The sync api will instead "discover" this through select for error, or failed reads/writes.

Regarding Rack::Chunked and Rack::Deflater, I'm planning on making them support hijack in the response style spec. They will not do anything in the request style spec - as that's the "really get out the way" spec. This may deserve some additional convention documentation in the SPEC. Adding to TODO. - Point is, there's two use cases here, one where you really want raw IO control, one where you want to just "write" to the body stream, but make best effort at conforming to both the Rack and HTTP specs.

I'm not sure what you mean by "denies access to the file handler"?

Great questions and comments, keep em coming :)

Owner

raggi commented Jan 7, 2013

@tenderlove absolutely in line with my thinking. I'm going to also add a TODO to add the "middleware should not hijack unless it plans to handle the response" to the SPEC, under a conventions section.

@raggi raggi closed this Jan 11, 2013

Contributor

FooBarWidget commented Jan 22, 2013

We just implemented the socket hijacking API in Phusion Passenger 4: phusion/passenger@c5b5b2e

Member

rkh commented Jan 22, 2013

\o/

Member

rkh commented Jan 22, 2013

Sinatra 1.4 will be using this API.

I saw this got added to Unicorn as well. I should probably add it to Reel there, eh? ;)

@jonasschneider jonasschneider referenced this pull request in tent/tentd Jan 22, 2013

Closed

Streaming API #73

3 of 11 tasks complete
Contributor

FooBarWidget commented Jan 23, 2013

Something just occurred to me. Why does the partial hijacking API involve assigning a lambda to the "rack.hijack" response header? Can't you just return the lambda as the body?

Also, is env['rack.hijack?'] needed? Can't you just check whether env['rack.hijack'] exists?

Contributor

josevalim commented Jan 23, 2013

It it still not clear to me how we will be able to detect in a middleware that we should not touch/buffer the body. For example, take the etag middleware:

https://github.com/rack/rack/blob/master/lib/rack/etag.rb

What we have done in Rails is to set the cache control to no-cache and ensure etag and respects this cache control. We have also added a check to not cache if the body responds to_path (i.e. we are sending a file back). It seems we will need to add an extra check to verify if a hijack happened which makes writing proper middleware even harder than what we have today.

I am sorry but I also don't have any solution to offer (yet).

Contributor

FooBarWidget commented Jan 23, 2013

We've written an article on how the hijacking API works: http://blog.phusion.nl/2013/01/23/the-new-rack-socket-hijacking-api/

Owner

raggi commented Jan 26, 2013

@FooBarWidget fair question re. lambda in body. I'll think about it some more.

Owner

raggi commented Jan 26, 2013

@josevalim yep, I don't have a magic solution for that either, within the confines of compatibility. Concern coupling across disconnected objects with just data structures for communication is, well, difficult.

Contributor

josevalim commented Jan 26, 2013

Concern coupling across disconnected objects with just data structures for communication is, well, difficult.

Amen.

Member

macournoyer commented Jan 27, 2013

Sorry for being so late on this.

Trying to figure out how I'll implement this in Thin. Not sure this makes sense since all IO operations should be asynchronous. I understand most ppl do a lot of blocking operations inside their apps anyways, but I don't want to expose an API that encourage that.

Perhaps the rack.hijack_io in Thin could only respond to the *_nonblock methods. Would that still be valid RE the specs? Or could I go as far as having a callback to read and write. Something like this:

def call(env)
  io = env['rack.hijack_io']
  io.read do |data|
    puts data
  end
  ...
end
Owner

raggi commented Jan 27, 2013

@macournoyer I already did it for you... Just look in my branch.

Owner

raggi commented Jan 27, 2013

@FooBarWidget I consider the hijack boolean to be potentially more specialized than the callback proc. The boolean might be managed by a code path that's checking things like "are we in a pipeline (other than the tail)", where it's not appropriate to be hijacking.

Contributor

SamSaffron commented Feb 16, 2013

@raggi @josevalim could etag middleware simply check for env['rack.hijack_io'] not being nil, if its set, it should skip? adding special headers just to keep etag happy seems rather awkward.

Contributor

SamSaffron commented Feb 16, 2013

@FooBarWidget @evanphx @rkh

How about supporting a "standardish" method on the connection you leak out. #return_to_webserver that way this horrible Connection Close hack can be totally avoided.

I hijack the socket, do what I will, send the data back to the client, call #return_to_webserver if it responds to it, add connection close header if not.

Nothing about the spec seems to conflict with this, and it would give me a path forward to getting message bus long polling working very similarly across multiple web servers without mountains of hack.

Owner

raggi commented Feb 17, 2013

@SamSaffron there's actually a lot of nasty little details that end up being included in that, some of which already exist, but may get substantially worse. By the time you have an arbitrary IO, it could be in any state, and may not be usable by the server. My thoughts are that this could be difficult for server authors to support in the long term, as it opens up a whole host of application - server support requests. I'll think about it some more.

Contributor

SamSaffron commented Feb 17, 2013

@raggi I follow, this API is super risky in so many ways (by design), if you hijack in middleware and forget to add the close connection header you can start processing keep alived requests that would never have routed to you.

My main goal here is to get Discourse working across Unicorn / Passenger / Puma / Thin with long polling support, which mean I can't really rely on throw :async any more. I guess I can just cough up the extra tcp handshake after every long poll for now. But I dislike the idea of having some web servers working less efficiently than others.

Owner

raggi commented Feb 18, 2013

Often times your load balancer will handle this for you, no?

On Sunday, February 17, 2013, Sam wrote:

@raggi https://github.com/raggi I follow, this API is super risky in so
many ways (by design), if you hijack in middleware and forget to add the
close connection header you can start processing keep alived requests that
would never have routed to you.

My main goal here is to get Discourse working across Unicorn / Passenger /
Puma / Thin with long polling support, which mean I can't really rely on
throw :async any more. I guess I can just cough up the extra tcp handshake
after every long poll for now. But I dislike the idea of having some web
servers working less efficiently than others.


Reply to this email directly or view it on GitHubhttps://github.com/rack/rack/pull/481#issuecomment-13698091.

Contributor

SamSaffron commented Feb 18, 2013

@raggi indeed, haproxy would easily take care of this for you, not sure if apache mod proxy or nginx would. Nor am I sure passenger handles this behind the scenes.

Member

rkh commented Feb 18, 2013

My main goal here is to get Discourse working across Unicorn / Passenger / Puma / Thin with long polling support, which mean I can't really rely on throw :async any more. I guess I can just cough up the extra tcp handshake after every long poll for now. But I dislike the idea of having some web servers working less efficiently than others.

You can do long polling with classic each streaming if you want the connection to be handled by your rack server.

Contributor

SamSaffron commented Feb 18, 2013

@rkh sure, that is fine for puma and "non-free" passenger, but you would run out of connections too fast on unicorn or "free" passenger.

Owner

raggi commented Feb 19, 2013

how do you not run out of connections when you hand them back?

Contributor

SamSaffron commented Feb 19, 2013

@raggi my understanding is unicorn is one worker thread / socket per process http://unicorn.bogomips.org/PHILOSOPHY.html same for non-concurrent passenger (the free version), I may be wrong, not sure.

My understanding is that the web server would wait till all the each is consumed before allowing any new requests in, (rainbows would allow it, puma and concurrent passenger as well)

Owner

raggi commented Feb 19, 2013

if you're trying to hand the socket back to the webserver after you've hijacked it into being concurrent, you're going to cause or have all kinds of scheduling issues. Why don't you implement this yourself, see how if it works, then we can talk about that implementation and it's edge cases?

Contributor

SamSaffron commented Feb 19, 2013

Fair enough @raggi not trying to be a pain in the behind here, just figuring out how to consume the new APIs.

Owner

raggi commented Feb 19, 2013

I know you're not trying to be a pain, I'm just short on time, and there are some implementation details that you need to understand, so I really would suggest you have a go at implementing a stable version of your proposal in a few servers - it will be a valuable exercise.

Member

rkh commented Feb 26, 2013

So, on threaded servers like Puma this will still only allow you to run as many connections in parallel as the size of the thread pool, right? There is no way to "hand the thread back", is there?

Contributor

FooBarWidget commented Feb 26, 2013

There is. Spawn your own thread(s) and offload the IO object there, then return from the original thread. Or pass the IO object to an EventMachine thread, and return from the original thread. I'm not 100% sure whether the Rack hijacking spec allows this but at least Phusion Passenger allows it. You can see an example in our test suite: https://github.com/FooBarWidget/passenger/blob/master/test/ruby/request_handler_spec.rb#L197-L208

Member

rkh commented Feb 26, 2013

Yeah, that's what I was getting at: It's not part of the spec.

Owner

raggi commented Feb 26, 2013

The spec deliberately ignores scheduling concerns. Servers are encouraged to clearly document how concurrency safe the sockets are. For most servers (all so far?) that just hand off a real socket, that's easy - the app is in control of scheduling and the socket. The socket could be handed off to an app managed thread instead of a server managed thread with no critical concerns.

In the case of having to hand back a socket however, one would have to consider scheduling much more closely, even for "trivial cases". If the server accepts more sockets than it has threads for, then the hand back could easily result in deadlocks / over subscription / stale resources, etc.

Member

rkh commented Feb 26, 2013

Wait, what, how do I hand back a socket?

Owner

raggi commented Feb 27, 2013

You don't, that's the point. If I provided an API for that, it would absolutely need some specification regarding scheduling concerns, which would have been far too big and prescriptive for a 1.x series change.

Member

rkh commented Feb 27, 2013

I think I still don't really see the advantage of the partial hijack vs each-streaming then. At least not from a rails/sinatra/library perspective.

Member

rkh commented Feb 27, 2013

I don't really see the use case for "handing back a socket". I was talking about the original thread not blocking/being reused or something. If the spec would contain a sentence stating that the rack handler permits capturing a large (whatever that means) number of sockets, wouldn't that be enough?

Owner

raggi commented Feb 28, 2013

@rkh the advantage of partial hijack is just for scheduling and io handoff when people can't understand how to deal with each. It's a cop-out, but I had a lot of smart people simply not getting how to yield each regularly.

Regarding handing back a socket, we really have a lot of semantics to cover, is the socket even in a valid http state when it's handed back? A lot more encapsulation is required to make that any kind of sane.

Member

rkh commented Feb 28, 2013

I don't want to hand a socket back, just a thread/control flow.

Member

rkh commented Feb 28, 2013

How does the "for scheduling" part look like? Can I just safely do a Thread.new { .. write to socket here ... }?

@stakach stakach referenced this pull request in faye/faye Mar 2, 2013

Closed

Add support for Puma #198

Owner

raggi commented Mar 2, 2013

@rkh it is to be defined by the server, but in all implementations I have seen so far, yes, absolutely.

Member

rkh commented Mar 16, 2013

So, which servers understand that?

@FooBarWidget has there been an open source passenger release by now that includes support?

I want to experiment with moving Sinatra streaming over to this mechanism.

Contributor

FooBarWidget commented Mar 16, 2013

Phusion Passenger 4.0 RC 4 open source version has been out for 2 weeks now. It supports the hijacking API.

Member

rkh commented Mar 17, 2013

Phusion Passenger 4.0 RC 4 open source version has been out for 2 weeks now. It supports the hijacking API.

Sweet!

Member

rkh commented Mar 23, 2013

@raggi btw, when I run your full hijack example in dev mode, I get a NoMethodError: undefined method 'to_i' for #<Rack::Lint::HijackWrapper:0x007f8cc288ccb8>

Contributor

halorgium commented Apr 20, 2013

@raggi is there a automated test to confirm whether a Handler meets the requirements?
It might make sense to put a simple series of things to run in the top description too.
I didn't find it entirely clear.

Contributor

halorgium commented Apr 21, 2013

@rkh @raggi the problem is that the hijack.ru example returns an invalid response.
Should the Rack::Lint allow this behaviour of returning an [#<IO>, nil, nil]?

jperkin pushed a commit to joyent/pkgsrc-legacy that referenced this pull request Dec 9, 2013

Update ruby-rack to 1.5.2.
== Changes

Please note that this release includes a few potentially breaking changes.
Of particular note are:

 * SessionHash is no longer a Hash sublcass
 * Rack::File cache_control parameter is removed in place of headers options

Additonally, SPEC has been updated in several areas and is now at 1,2.

A new SPEC section was introduced that provides two server-optional IO hijacking
APIs. Further information on these APIs will be made available by the community
in good time. In the mean time, some information can be found in the original
pull request: rack/rack#481

* January 21st, 2013: Thirty third public release 1.5.0
  * Introduced hijack SPEC, for before-response and after-response hijacking
  * SessionHash is no longer a Hash subclass
  * Rack::File cache_control parameter is removed, in place of headers options
  * Rack::Auth::AbstractRequest#scheme now yields strings, not symbols
  * Rack::Utils cookie functions now format expires in RFC 2822 format
  * Rack::File now has a default mime type
  * rackup -b 'run Rack::File.new(".")', option provides command line configs
  * Rack::Deflater will no longer double encode bodies
  * Rack::Mime#match? provides convenience for Accept header matching
  * Rack::Utils#q_values provides splitting for Accept headers
  * Rack::Utils#best_q_match provides a helper for Accept headers
  * Rack::Handler.pick provides convenience for finding available servers
  * Puma added to the list of default servers (preferred over Webrick)
  * Various middleware now correctly close body when replacing it
  * Rack::Request#params is no longer persistent with only GET params
  * Rack::Request#update_param and #delete_param provide persistent operations
  * Rack::Request#trusted_proxy? now returns true for local unix sockets
  * Rack::Response no longer forces Content-Types
  * Rack::Sendfile provides local mapping configuration options
  * Rack::Utils#rfc2109 provides old netscape style time output
  * Updated HTTP status codes
  * Ruby 1.8.6 likely no longer passes tests, and is no longer fully supported

* January 28th, 2013: Thirty fourth public release 1.5.1
  * Rack::Lint check_hijack now conforms to other parts of SPEC
  * Added hash-like methods to Abstract::ID::SessionHash for compatibility
  * Various documentation corrections

* February 7th, Thirty fifth public release 1.5.2
  * Fix CVE-2013-0263, timing attack against Rack::Session::Cookie
  * Fix CVE-2013-0262, symlink path traversal in Rack::File
  * Add various methods to Session for enhanced Rails compatibility
  * Request#trusted_proxy? now only matches whole stirngs
  * Add JSON cookie coder, to be default in Rack 1.6+ due to security concerns
  * URLMap host matching in environments that don't set the Host header fixed
  * Fix a race condition that could result in overwritten pidfiles
  * Various documentation additions

@isaiah isaiah referenced this pull request in isaiah/jubilee Jan 8, 2014

Closed

Support the Rack hijack API #4

jperkin pushed a commit to joyent/pkgsrc-legacy that referenced this pull request Mar 14, 2014

Update ruby-rack to 1.5.2.
== Changes

Please note that this release includes a few potentially breaking changes.
Of particular note are:

 * SessionHash is no longer a Hash sublcass
 * Rack::File cache_control parameter is removed in place of headers options

Additonally, SPEC has been updated in several areas and is now at 1,2.

A new SPEC section was introduced that provides two server-optional IO hijacking
APIs. Further information on these APIs will be made available by the community
in good time. In the mean time, some information can be found in the original
pull request: rack/rack#481

* January 21st, 2013: Thirty third public release 1.5.0
  * Introduced hijack SPEC, for before-response and after-response hijacking
  * SessionHash is no longer a Hash subclass
  * Rack::File cache_control parameter is removed, in place of headers options
  * Rack::Auth::AbstractRequest#scheme now yields strings, not symbols
  * Rack::Utils cookie functions now format expires in RFC 2822 format
  * Rack::File now has a default mime type
  * rackup -b 'run Rack::File.new(".")', option provides command line configs
  * Rack::Deflater will no longer double encode bodies
  * Rack::Mime#match? provides convenience for Accept header matching
  * Rack::Utils#q_values provides splitting for Accept headers
  * Rack::Utils#best_q_match provides a helper for Accept headers
  * Rack::Handler.pick provides convenience for finding available servers
  * Puma added to the list of default servers (preferred over Webrick)
  * Various middleware now correctly close body when replacing it
  * Rack::Request#params is no longer persistent with only GET params
  * Rack::Request#update_param and #delete_param provide persistent operations
  * Rack::Request#trusted_proxy? now returns true for local unix sockets
  * Rack::Response no longer forces Content-Types
  * Rack::Sendfile provides local mapping configuration options
  * Rack::Utils#rfc2109 provides old netscape style time output
  * Updated HTTP status codes
  * Ruby 1.8.6 likely no longer passes tests, and is no longer fully supported

* January 28th, 2013: Thirty fourth public release 1.5.1
  * Rack::Lint check_hijack now conforms to other parts of SPEC
  * Added hash-like methods to Abstract::ID::SessionHash for compatibility
  * Various documentation corrections

* February 7th, Thirty fifth public release 1.5.2
  * Fix CVE-2013-0263, timing attack against Rack::Session::Cookie
  * Fix CVE-2013-0262, symlink path traversal in Rack::File
  * Add various methods to Session for enhanced Rails compatibility
  * Request#trusted_proxy? now only matches whole stirngs
  * Add JSON cookie coder, to be default in Rack 1.6+ due to security concerns
  * URLMap host matching in environments that don't set the Host header fixed
  * Fix a race condition that could result in overwritten pidfiles
  * Various documentation additions

jsonn pushed a commit to jsonn/pkgsrc that referenced this pull request Oct 11, 2014

Update ruby-rack to 1.5.2.
== Changes

Please note that this release includes a few potentially breaking changes.
Of particular note are:

 * SessionHash is no longer a Hash sublcass
 * Rack::File cache_control parameter is removed in place of headers options

Additonally, SPEC has been updated in several areas and is now at 1,2.

A new SPEC section was introduced that provides two server-optional IO hijacking
APIs. Further information on these APIs will be made available by the community
in good time. In the mean time, some information can be found in the original
pull request: rack/rack#481

* January 21st, 2013: Thirty third public release 1.5.0
  * Introduced hijack SPEC, for before-response and after-response hijacking
  * SessionHash is no longer a Hash subclass
  * Rack::File cache_control parameter is removed, in place of headers options
  * Rack::Auth::AbstractRequest#scheme now yields strings, not symbols
  * Rack::Utils cookie functions now format expires in RFC 2822 format
  * Rack::File now has a default mime type
  * rackup -b 'run Rack::File.new(".")', option provides command line configs
  * Rack::Deflater will no longer double encode bodies
  * Rack::Mime#match? provides convenience for Accept header matching
  * Rack::Utils#q_values provides splitting for Accept headers
  * Rack::Utils#best_q_match provides a helper for Accept headers
  * Rack::Handler.pick provides convenience for finding available servers
  * Puma added to the list of default servers (preferred over Webrick)
  * Various middleware now correctly close body when replacing it
  * Rack::Request#params is no longer persistent with only GET params
  * Rack::Request#update_param and #delete_param provide persistent operations
  * Rack::Request#trusted_proxy? now returns true for local unix sockets
  * Rack::Response no longer forces Content-Types
  * Rack::Sendfile provides local mapping configuration options
  * Rack::Utils#rfc2109 provides old netscape style time output
  * Updated HTTP status codes
  * Ruby 1.8.6 likely no longer passes tests, and is no longer fully supported

* January 28th, 2013: Thirty fourth public release 1.5.1
  * Rack::Lint check_hijack now conforms to other parts of SPEC
  * Added hash-like methods to Abstract::ID::SessionHash for compatibility
  * Various documentation corrections

* February 7th, Thirty fifth public release 1.5.2
  * Fix CVE-2013-0263, timing attack against Rack::Session::Cookie
  * Fix CVE-2013-0262, symlink path traversal in Rack::File
  * Add various methods to Session for enhanced Rails compatibility
  * Request#trusted_proxy? now only matches whole stirngs
  * Add JSON cookie coder, to be default in Rack 1.6+ due to security concerns
  * URLMap host matching in environments that don't set the Host header fixed
  * Fix a race condition that could result in overwritten pidfiles
  * Various documentation additions

@ketan ketan referenced this pull request in jruby/jruby-rack Dec 25, 2015

Closed

Missing support for rack socket hijacking API? #202

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment