Skip to content
This repository

Hijack #481

Closed
wants to merge 3 commits into from

12 participants

James Tucker Tony Arcieri Evan Phoenix Christian Neukirchen Konstantin Haase Aaron Patterson Ilya Grigorik Hongli Lai José Valim Marc-André Cournoyer Sam Tim Carey-Smith
James Tucker
Owner

Hi all!

I had some time on the plane with no WiFi and some holiday left over, so...

@tenderlove please tell me if this matches up with the stuff you were describing to me at rubyconf
@tarcieri please take a look as you're also familiar with these things
@evanphx as with Tony, if you wouldn't mind
@chneukirchen @rkh you know why :-)

Please also highlight anyone else who would be worthwhile, or at least send them a link. I'd appreciate feedback.

What?

We still don't have an API that allows for IO-like streaming, and this is a straw man that addresses this within the confines of the rack 1.x spec. It's not an attempt to build out what I hope a 2.0 spec should be, but I am hoping that something like this will be enough to aid Rails 4s ventures, enable websockets, and a few other strategies. With HTTP2 around the corner, we'll likely want to revisit the IO API for 2.0, but we'll see how this plays out. Maybe IO wrapped around channels will be ok.

How?

There are two modes, one for "full hijacking", which is what you'd want for websockets and various hacks, and another mode which I think is closer to what @tenderlove really wants, which is a post-headers hijack. In both cases, the application assumes control of the socket after it has been hijacked, and there is currently no API for "giving sockets back to the server", so it's recommended that folks use Connection:close with this mechanism.

Examples?

I've implemented this spec in Thin (oddly enough) because it was the only reasonable server I had checked out on my machine as I as writing it. I suspect the Puma implementation will be even simpler. It's all in this branch: https://github.com/raggi/thin/commits/hijack

Specifically the server piece lives here: https://github.com/raggi/thin/blob/e04855459cb42fd98a0a483075f8337cafe6d949/lib/thin/connection.rb

There are two examples, one for each of the hijack "modes" (pre headers and post headers):

The examples implement a trivial HTTP -> telnet chat interface. You make any HTTP request to "join" and the response dumps you into the single global line buffered chat room.

Limitations

I have not tried to spec out a full IO API, and I'm not sure that we should.
I have not tried to respec all of the HTTP / anti-HTTP semantics.
There is no spec for buffering or the like.

The intent is that this is an API to "get out the way".

Similar implementations

http://golang.org/pkg/net/http/#Hijacker

Additional example

(this is a middleware example, that ships outgoing IO in chunked encoding - good for streamed xhr or the like)

class ChunkyIO
  class IOWrapper
    extend Forwardable
    def_delegators :@io, :close_read, :read, :read_nonblock

    def initialize(io)
      @io = io
      @terminated = false
    end

    def write(data)
      @io.write chunk_header(data) + data + "\r\n"
    end

    def write_nonblock(data)
      # XXX bugs can occur here, if a partial write happens!
      @io.write_nonblock chunk_header(data) + data + "\r\n"
    end

    def chunk_header(data)
      "#{data.size.to_s(16)}\r\n"
    end

    def close_write
      terminate
      @io.close_write
    end

    def close
      terminate
      @io.close
    end

    def terminate
      return if @terminated
      write ""
      @terminated = true
    end
  end

  def initialize(app)
    @app = app
  end

  def call(env)
    return app.call(env) unless env['rack.hijack?']

    handle_request env
    response = app.call(env)
    handle_response response[1]
    response
  end

  def handle_request(env)
    orig = env['rack.hijack']
    env['rack.hijack'] = proc do
      orig.call
      env['rack.hijack_io'] = IOWrapper.new(env['rack.hijack_io'])
    end
  end

  def handle_response(headers)
    if orig = headers['rack.hijack']
      headers['rack.hijack'] = proc do |io|
        orig.call IOWrapper.new(io)
      end
    end
  end
end
James Tucker
Owner

Sorry for the double mail. We still have a parent to this repo :'(

Tony Arcieri

Awesome! I have a patch to Reel that half-ass supports websockets using something similar (it passes 'rack.websocket' into the Rack env):

celluloid/reel#17

Would sure be nice to have an official API for this sort of thing.

Evan Phoenix

I support this! Puma exposes the socket via env['puma.socket'] already so support is trivial.

Christian Neukirchen
Owner

+1, I see no obvious problems.

James Tucker
Owner

Do you think we need a rack.hijacked? boolean for middleware that might want to use this?

I'd generally discourage it from middleware, but, right now there's no way to tell if "someone else" is already in control of the socket.

Christian Neukirchen
Owner

Can't they just check whether rack.hijack_io is set?

Konstantin Haase
Collaborator
rkh commented

This seems good. I think I can port Sinatra's stream method to use this without any changes to the external API.

James Tucker
Owner

@chneukirchen no, as it is allowed to be set AOT or after the call. this part of the spec is intentionally open to avoid introducing implementation complexities. I am concerned about env replacement middleware too, although that's rare, we don't specify that middleware must pass on /the/ env object, which is why the "should" is there. we also don't directly specify that env can be written to or deleted from, but that's generally accepted + implemented.

In general maybe we shouldn't worry about the middleware case too much. It's confined and hard to specify well.

James Tucker
Owner

It sounds like everyones in agreement. I'm tempted to place a "beta api" marker around this with a projected roadmap of removing that marker by 1.6 if all is going well. This potentially leaves us open to make a few changes if we've missed things between now and then. Necessary, unnecessary? other approaches?

James Tucker
Owner

@macournoyer adding you, take a look - I know you're familiar with the older async api.

@igrigorik as you've also touched on similar items, maybe you have some thoughts?

Aaron Patterson
Collaborator

I like that there is an IO in the env hash, I think. But it seems difficult to support an API like this:

def index
  response.status = 200
  response['X-Greeting'] = 'Hi MOM!'
  10.times do
    sleep 10;
    response.stream.write "hello world\n"
  end
  response.stream.close
  ### response is closed here
end

If I'm reading the spec correctly, rack will call call on whatever is in rack.hijack. This means we have to force people to separate "things that set headers" from "things that stream". Presumably that would lead to an API like this:

def index
  response.status = 200
  response['X-Greeting'] = 'Hi MOM!'
  streaming do |io|
    10.times do
      sleep 10;
      io.write "hello world\n"
    end
    io.close
    ### response is closed here
  end
  ### response is not closed here
end

I'm afraid people would be confused by when the socket is open or closed. It may seem counterintuitive that the socket is still open after the streaming { } block is closed.

Am I missing anything?

Konstantin Haase
Collaborator
rkh commented
Ilya Grigorik

From a high-level pass, this makes sense. It's also pretty much the pattern we implemented in Goliath: https://github.com/postrank-labs/goliath/blob/master/examples/chunked_streaming.rb

Admittedly, since we're "kinda, sorta" rack compliant, we were able to bend the rules and create our own helper methods to tackle the use case. Which also gives us the API @tenderlove highlighted, +/- a few EM timers. :)

However, since we brought up HTTP 2.0.. This change is not sufficient to enable what we need in that department. While the 2.0 spec is still a WIP, we have a pretty good idea as to how it's going to look, and it may be worth starting to think about it.

(But that shouldn't stop this proposal from going forward)

Aaron Patterson
Collaborator

@rkh just not super stoked about more callbacks. The number of call methods in my stack traces already riles me up.

Konstantin Haase
Collaborator
rkh commented
Aaron Patterson
Collaborator

@igrigorik the first api I listed is verbatim what is possible in Rails master today. In fact, the second API can be implemented in terms of the first:

def streaming
  yield response.stream
end

def index
  response.status = 200
  response['X-Greeting'] = 'Hi MOM!'
  streaming do |io|
    10.times do
      sleep 10;
      io.write "hello world\n"
    end
    io.close
    ### response is closed here
  end
  ### response is not closed here
end

Unfortunately, the opposite is not true. Also, the user having no visibility in to when the streaming { } block is executed seems sub-optimal.

James Tucker
Owner

@tenderlove It would be worth taking a look at the example rackup files I linked. There may be a misunderstanding. There are two APIs, one where rack does not call env['rack.hijack'], users do, the other where the server calls headers['rack.hijack']. In the prior case, the call method is not a callback that executes user code, it's a callback that executes webserver code, to tell the webserver "you don't own the socket anymore, the user does". This call should do very little, and doesn't remain on the stack (see the thin server implementation).

My problem with the API that you're proposing, is that it is totally undefined when the user takes control of the socket. What would the following code mean?

response.status = 200
response['X-Greeting'] = 'Hi MOM!'

# When do the following two lines actually get written to the socket? What if I put a sleeping loop in there?
response.stream.write 'ohai!'
response['X-Treme-Headers'] = 'woosh!'

response.stream.close

If we define response.stream as "the first call to stream sends the status and headers, then takes control of the socket", then it's easy to implement in the hijack API. The example I just gave is invalid to this spec for response.stream, but that's fine. In theory you could make it "more valid" if you provided a response.close that was allowed to be called after you are "finished" with the stream, that would write a tailer to the stream socket, in raw HTTP. As tailers are barely supported though, this seems useless, and it still has the horrible caveat, that you'd have to specify what headers you're going to set later, in order to fulfill the HTTP spec for tailers. Anyway, if you're willing to accept the first call to stream meaning the user takes control of the socket, then such an implementation is easy, and would look like so:

class MyResponse < Rack::Response
  def stream
    @stream ||=
      begin
        unless env['rack.hijack?']
          raise NotImplementedError, "server does not support socket hijacking, cannot stream"
        end
        env['rack.hijack'].call
        stream = env['rack.hijack_io']
        write_status_to stream
        write_headers_to stream
        stream
      end
  end
end

Additionally, a particularly "clever" application might be able to do some buffering to allow for late initialization of headers, without a change to the proposed spec. It's a hacky solution to the problem I describe above, and horrendously complex with additional user facing edge cases, but, I could envisage something in Rails (I'd rather not), that operates with the following rules:

  • response.stream buffers outgoing data until the end of the action
  • the end of the action completes mostly as normal, setting headers['rack.hijack'] to use the response body callback api
  • when the callback occurs, response.stream writes out it's buffer, and switches to an immediate write mode

You'd need a more sophisticated stream object than just the hijack_io, but it's entirely possible to write this. This would enable you to avoid threads/callcc's, still not have to write status and headers yourself, but still have socket like IO for the body stream. If you wanted to add user threads (say a thread in an action to free up the webserver later on - good for really long running actions or the like) then all you'd need to do is ensure that the stream buffer -> immediate mode switching is thread safe. That's also not too hard. It would take a bit longer, but if this is really what you want, I'd be happy to write up a basic implementation for you.

James Tucker
Owner

@tenderlove I think you want to be able to avoid this: https://github.com/rails/rails/blob/master/actionpack/lib/action_controller/metal/live.rb#L102-L125 but this is not a rack problem really. It's a problem with your API. Your API has no indicator to the user of when writes really happen, or when the response stops and the stream starts - it's just implied. Your API already forces people to make a distinction between "things that set headers" and "things that stream", that is here: https://github.com/rails/rails/blob/master/actionpack/lib/action_controller/metal/live.rb#L69 . Ok, so they can cause this "by accident" instead of explicitly declaring it - meaning they're in some ways more likely to trip on an exception to learn how the API works than to just have to add one line (I think it's a learnability (and thus usability) mistake).

It's not possible to run two pieces of code at the same time, without some kind of threading style abstraction. This isn't an API limitation, it's as you know, the way the world works. Is there some other solution you have in mind where you can overcome this limitation? - To be clear, you want server code (sending status and headers) and middleware code (altering status and headers in after actions) to be running at the same time as user code (streaming data), I think without threads, if I understand your concerns correctly.

Now we can serialize these operations a bit, and in these cases we have two options:

  • User code calls to the server to write status and headers

Or

  • Server code calls to user code to write the stream

You seem to indicate you would like to do the prior, which is as I describe above, possible with the request hijack side of this SPEC change. It means that the response is going to have to write the response data itself, rather than the server doing it, which is kind of ugly, but, it will work. Unfortunately, this also results in needing to disregard middleware "after actions" on the response, which will have no effect. In order to preserve middleware after actions (such as headers set after the application is called), then you have no choice but to use the second approach. It's either that, or you drop middleware altogether, and go right to the metal all the time.

We could try to sugar the first case a little, by adding an additional server callback that says "here's the status and the headers, please write them for me", so that we're not writing status and header IO code - but that's so trivial, it may not be worth it. The only advantage to this could be, that if it was a user code -> server code callback, then it could be wrapped by supporting middleware using a similar approach to the chunkyio example I give above. It's not pretty, but middleware is middleware - it needs to sit in the middle somehow. I can draft something along these lines too, if that seems "more correct" for your use case.

Aaron Patterson
Collaborator

My problem with the API that you're proposing, is that it is totally undefined when the user takes control of the socket. What would the following code mean?

It would raise an exception. Header hash modification is forbidden after the headers have been dumped to the socket (which happens when the buffer thread wakes, but the buffer thread won't wake until there is data queued). But since we can "liberate" the socket from the webserver, then we're free to define those semantics.

it's a callback that executes webserver code, to tell the webserver "you don't own the socket anymore, the user does".

Perfect. Exactly what I want.

I think without threads, if I understand your concerns correctly.

No, I'd like a thread that's writing to the socket, an app thread, and a buffer between them. If I can say to the webserver "the socket is mine", then this should be perfect.

Your code example is great. I'll read the telnet example you posted, but so far I am very happy. :-)

Konstantin Haase
Collaborator
rkh commented

The example code posted assumes that in the stack we always hand down the original env hash, never a new/modified one. Otherwise the io can't be added later on. And adding it before hand could lead to undefined behaviour if someone just starts writing to it. Maybe calling the callback should instead return the IO? Or we need some wrapper object in place for the IO, or something.

Konstantin Haase
Collaborator
rkh commented

For websockets, we should also consider updating the spec to no longer require the request body to be rewindable. Though that would have more implications for existing code.

James Tucker
Owner

@rkh re env, yeah, i've been considering that too - I'll switch the should to a must on the returning it. maybe some language that it must be preserved / set / passed down after initialization (after the call to hijack).

James Tucker
Owner

@rkh re rewindable and websockets, i'm not sure how practical that's going to be. clearly multipart and websockets have different usage patterns, maybe this is something where middleware that's reading from either needs to be more aware of the context. web servers supporting websocket pass-through certainly shouldn't be buffering the incoming body before hitting the app.

Konstantin Haase
Collaborator
rkh commented

Yeah, I mean, I think if people really want websockets, they should be fine with running that through something else, as it probably doesn't fit a request/response model anyhow. For SSE and alike, this would be gold already.

Konstantin Haase
Collaborator
rkh commented

it's a callback that executes webserver code, to tell the webserver "you don't own the socket anymore, the user does".

Perfect. Exactly what I want.

@raggi the scenario/api @tenderlove has in mind would only work with a full/pre-header hijack, right? otherwise you need to hand down a callback for starting to stream, no? This would mean that Rails would need to take care of writing headers to the stream, potentially messing with middleware, right?

James Tucker
Owner

@rkh correct, i'm considering adding some items to rack::util to assist with that.

Konstantin Haase
Collaborator
rkh commented

@raggi but either way you'd need to hand the reply back down the stack to make sure all the middleware adds headers. This is even more important considering that middleware usually has to first call the app before even having a header hash to add stuff to.

James Tucker
Owner

@rkh yep, that's why I added the response style hijack api as well as the request style. there's not much we can do to solve that problem in the request style, within the confines of the env -> tuple api. do you have any thoughts to the contrary?

Konstantin Haase
Collaborator
rkh commented

No, I'm just not sure of the advantage over the async protocol Thin and Rainbows implement. To the contrary, that already works with chunking etc.

James Tucker
Owner

The advantages are (TINAEL):

  • You can read as well as write
  • You can do partial close
  • The API doesn't enforce buffering, and allows flushing
  • It's possible to implement the async protocol inside this API, but not vice versa (although as you see in thin, this api isn't ideal for reactors - supporting both requires an API that's truly incompatible with SPEC 1.x)
  • The API is easier for servers to allow extensions - most servers will supply real IO objects, so users will often be able to use ioctl, etc.
  • Hijackers are responsible for handling remote close logic, rather than the async.close callback, which has sensitive semantics.

I'm not sure what you mean by "that already works with chunking", unless the server is chunking as each yields (which can be horribly inefficient), then the async protocol does not have chunking implied at all.

Konstantin Haase
Collaborator
rkh commented

I'm not sure what you mean by "that already works with chunking", unless the server is chunking as each yields (which can be horribly inefficient), then the async protocol does not have chunking implied at all.

Rack::Chunked is chunking as each yields.

The API is easier for servers to allow extensions - most servers will supply real IO objects, so users will often be able to use ioctl, etc.

This also means that the list of methods the io object needs to implement is not well defined.

You can read as well as write.

So should you read from the hijacked IO rather than rack.input?

The API doesn't enforce buffering, and allows flushing

Right.

Hijackers are responsible for handling remote close logic, rather than the async.close callback, which has sensitive semantics.

This might prove difficult for pieces of logic that cannot reflect over the entire stack. For instance, if I was to use the partial hijack and something like Rack::Chunked or Rack::Deflater would be on the stack, I would send out headers suggesting something different from what I'm writing onto the stream. In a similar vain, some handlers print our warnings if I send a response that is neither chunked nor has a content-length.

How far do we go when it comes to requiring it to be an IO? Your example shows how to wrap the IO, but that basically denies access to the file handler, doesn't it?

I'm not doubting the approach, I just wanna make sure we're all on the same page. :)

Aaron Patterson
Collaborator

@rkh Chunked and deflater should be implemented as wrappers around the IO object. I think we can make these middleware work in both cases (both cases being hijacked and non-hijacked). It seems like no middleware should actually hijack a request unless they plan to perform the response. But middleware could wrap a liberated IO object.

James Tucker
Owner

@rkh

We have a nomenclature problem. by async protocol, I assmed you're referring to async.callback, the thin style api. Lets be clear, async.* is the async protocol, body streaming is the current "body response to each and yields strings" protocol.

The list of methods defined in this spec is the minimum that all servers should implement. This is the "portable" spec - it should be enough to implement websockets, rails live, etc. If you want to do "optimization" work, which requires more than this part of the SPEC, you're free to do so by expanding on this API. Doing so requires that the user do this with a specific knowledge of their server implementation. This is described explicitly in the SPEC, and the SPEC recommends that IO-like methods are used. The problem with saying something like "all ruby IO methods" instead is that it's hard, maybe even impossible to implement for all servers, is potentially totally incompatible with HTTP2, and with pipelining, which could be implemented with some server wrappers within the minimal API, at least for some use cases (pipeline has other issues).

Regarding rack.input, if you want to do websockets, yes. In that case, they should be the same IO really, so it makes no difference. Bear in mind that what current servers do most of the time isn't correct for a websocket upgrade (streaming the input body to a file) - but that's easy to fix. They do different things for different use cases. For certain websocket upgrade type scenarios, using Rack::Request and calling params could cause a problem. This is outside of our concern.

async.close is not related to wire formats, it's related to the HTTP client closing the socket before the response is complete, it is a callback in the async protocol for handling this case. It's an important thing for high concurrency async servers. The sync api will instead "discover" this through select for error, or failed reads/writes.

Regarding Rack::Chunked and Rack::Deflater, I'm planning on making them support hijack in the response style spec. They will not do anything in the request style spec - as that's the "really get out the way" spec. This may deserve some additional convention documentation in the SPEC. Adding to TODO. - Point is, there's two use cases here, one where you really want raw IO control, one where you want to just "write" to the body stream, but make best effort at conforming to both the Rack and HTTP specs.

I'm not sure what you mean by "denies access to the file handler"?

Great questions and comments, keep em coming :)

James Tucker
Owner

@tenderlove absolutely in line with my thinking. I'm going to also add a TODO to add the "middleware should not hijack unless it plans to handle the response" to the SPEC, under a conventions section.

James Tucker raggi closed this
Hongli Lai

We just implemented the socket hijacking API in Phusion Passenger 4: phusion/passenger@c5b5b2e

Konstantin Haase
Collaborator
rkh commented

\o/

Konstantin Haase
Collaborator
rkh commented

Sinatra 1.4 will be using this API.

Tony Arcieri

I saw this got added to Unicorn as well. I should probably add it to Reel there, eh? ;)

Jonas Schneider jonasschneider referenced this pull request in tent/tentd
Closed

Streaming API #73

3 of 11 tasks complete
Hongli Lai

Something just occurred to me. Why does the partial hijacking API involve assigning a lambda to the "rack.hijack" response header? Can't you just return the lambda as the body?

Also, is env['rack.hijack?'] needed? Can't you just check whether env['rack.hijack'] exists?

José Valim
Collaborator

It it still not clear to me how we will be able to detect in a middleware that we should not touch/buffer the body. For example, take the etag middleware:

https://github.com/rack/rack/blob/master/lib/rack/etag.rb

What we have done in Rails is to set the cache control to no-cache and ensure etag and respects this cache control. We have also added a check to not cache if the body responds to_path (i.e. we are sending a file back). It seems we will need to add an extra check to verify if a hijack happened which makes writing proper middleware even harder than what we have today.

I am sorry but I also don't have any solution to offer (yet).

Hongli Lai

We've written an article on how the hijacking API works: http://blog.phusion.nl/2013/01/23/the-new-rack-socket-hijacking-api/

James Tucker
Owner

@FooBarWidget fair question re. lambda in body. I'll think about it some more.

James Tucker
Owner

@josevalim yep, I don't have a magic solution for that either, within the confines of compatibility. Concern coupling across disconnected objects with just data structures for communication is, well, difficult.

José Valim
Collaborator

Concern coupling across disconnected objects with just data structures for communication is, well, difficult.

Amen.

Marc-André Cournoyer

Sorry for being so late on this.

Trying to figure out how I'll implement this in Thin. Not sure this makes sense since all IO operations should be asynchronous. I understand most ppl do a lot of blocking operations inside their apps anyways, but I don't want to expose an API that encourage that.

Perhaps the rack.hijack_io in Thin could only respond to the *_nonblock methods. Would that still be valid RE the specs? Or could I go as far as having a callback to read and write. Something like this:

def call(env)
  io = env['rack.hijack_io']
  io.read do |data|
    puts data
  end
  ...
end
James Tucker
Owner

@macournoyer I already did it for you... Just look in my branch.

James Tucker
Owner

@FooBarWidget I consider the hijack boolean to be potentially more specialized than the callback proc. The boolean might be managed by a code path that's checking things like "are we in a pipeline (other than the tail)", where it's not appropriate to be hijacking.

Sam

@raggi @josevalim could etag middleware simply check for env['rack.hijack_io'] not being nil, if its set, it should skip? adding special headers just to keep etag happy seems rather awkward.

Sam

@FooBarWidget @evanphx @rkh

How about supporting a "standardish" method on the connection you leak out. #return_to_webserver that way this horrible Connection Close hack can be totally avoided.

I hijack the socket, do what I will, send the data back to the client, call #return_to_webserver if it responds to it, add connection close header if not.

Nothing about the spec seems to conflict with this, and it would give me a path forward to getting message bus long polling working very similarly across multiple web servers without mountains of hack.

James Tucker
Owner

@SamSaffron there's actually a lot of nasty little details that end up being included in that, some of which already exist, but may get substantially worse. By the time you have an arbitrary IO, it could be in any state, and may not be usable by the server. My thoughts are that this could be difficult for server authors to support in the long term, as it opens up a whole host of application - server support requests. I'll think about it some more.

Sam

@raggi I follow, this API is super risky in so many ways (by design), if you hijack in middleware and forget to add the close connection header you can start processing keep alived requests that would never have routed to you.

My main goal here is to get Discourse working across Unicorn / Passenger / Puma / Thin with long polling support, which mean I can't really rely on throw :async any more. I guess I can just cough up the extra tcp handshake after every long poll for now. But I dislike the idea of having some web servers working less efficiently than others.

James Tucker
Owner
Sam

@raggi indeed, haproxy would easily take care of this for you, not sure if apache mod proxy or nginx would. Nor am I sure passenger handles this behind the scenes.

Konstantin Haase
Collaborator

My main goal here is to get Discourse working across Unicorn / Passenger / Puma / Thin with long polling support, which mean I can't really rely on throw :async any more. I guess I can just cough up the extra tcp handshake after every long poll for now. But I dislike the idea of having some web servers working less efficiently than others.

You can do long polling with classic each streaming if you want the connection to be handled by your rack server.

Sam

@rkh sure, that is fine for puma and "non-free" passenger, but you would run out of connections too fast on unicorn or "free" passenger.

James Tucker
Owner

how do you not run out of connections when you hand them back?

Sam

@raggi my understanding is unicorn is one worker thread / socket per process http://unicorn.bogomips.org/PHILOSOPHY.html same for non-concurrent passenger (the free version), I may be wrong, not sure.

My understanding is that the web server would wait till all the each is consumed before allowing any new requests in, (rainbows would allow it, puma and concurrent passenger as well)

James Tucker
Owner

if you're trying to hand the socket back to the webserver after you've hijacked it into being concurrent, you're going to cause or have all kinds of scheduling issues. Why don't you implement this yourself, see how if it works, then we can talk about that implementation and it's edge cases?

Sam

Fair enough @raggi not trying to be a pain in the behind here, just figuring out how to consume the new APIs.

James Tucker
Owner

I know you're not trying to be a pain, I'm just short on time, and there are some implementation details that you need to understand, so I really would suggest you have a go at implementing a stable version of your proposal in a few servers - it will be a valuable exercise.

Konstantin Haase
Collaborator

So, on threaded servers like Puma this will still only allow you to run as many connections in parallel as the size of the thread pool, right? There is no way to "hand the thread back", is there?

Hongli Lai

There is. Spawn your own thread(s) and offload the IO object there, then return from the original thread. Or pass the IO object to an EventMachine thread, and return from the original thread. I'm not 100% sure whether the Rack hijacking spec allows this but at least Phusion Passenger allows it. You can see an example in our test suite: https://github.com/FooBarWidget/passenger/blob/master/test/ruby/request_handler_spec.rb#L197-L208

Konstantin Haase
Collaborator

Yeah, that's what I was getting at: It's not part of the spec.

James Tucker
Owner

The spec deliberately ignores scheduling concerns. Servers are encouraged to clearly document how concurrency safe the sockets are. For most servers (all so far?) that just hand off a real socket, that's easy - the app is in control of scheduling and the socket. The socket could be handed off to an app managed thread instead of a server managed thread with no critical concerns.

In the case of having to hand back a socket however, one would have to consider scheduling much more closely, even for "trivial cases". If the server accepts more sockets than it has threads for, then the hand back could easily result in deadlocks / over subscription / stale resources, etc.

Konstantin Haase
Collaborator

Wait, what, how do I hand back a socket?

James Tucker
Owner

You don't, that's the point. If I provided an API for that, it would absolutely need some specification regarding scheduling concerns, which would have been far too big and prescriptive for a 1.x series change.

Konstantin Haase
Collaborator

I think I still don't really see the advantage of the partial hijack vs each-streaming then. At least not from a rails/sinatra/library perspective.

Konstantin Haase
Collaborator

I don't really see the use case for "handing back a socket". I was talking about the original thread not blocking/being reused or something. If the spec would contain a sentence stating that the rack handler permits capturing a large (whatever that means) number of sockets, wouldn't that be enough?

James Tucker
Owner

@rkh the advantage of partial hijack is just for scheduling and io handoff when people can't understand how to deal with each. It's a cop-out, but I had a lot of smart people simply not getting how to yield each regularly.

Regarding handing back a socket, we really have a lot of semantics to cover, is the socket even in a valid http state when it's handed back? A lot more encapsulation is required to make that any kind of sane.

Konstantin Haase
Collaborator

I don't want to hand a socket back, just a thread/control flow.

Konstantin Haase
Collaborator

How does the "for scheduling" part look like? Can I just safely do a Thread.new { .. write to socket here ... }?

Stephen von Takach stakach referenced this pull request in faye/faye
Closed

Add support for Puma #198

James Tucker
Owner
raggi commented

@rkh it is to be defined by the server, but in all implementations I have seen so far, yes, absolutely.

Konstantin Haase
Collaborator
rkh commented

So, which servers understand that?

@FooBarWidget has there been an open source passenger release by now that includes support?

I want to experiment with moving Sinatra streaming over to this mechanism.

Hongli Lai

Phusion Passenger 4.0 RC 4 open source version has been out for 2 weeks now. It supports the hijacking API.

Konstantin Haase
Collaborator
rkh commented

Phusion Passenger 4.0 RC 4 open source version has been out for 2 weeks now. It supports the hijacking API.

Sweet!

Konstantin Haase
Collaborator
rkh commented

@raggi btw, when I run your full hijack example in dev mode, I get a NoMethodError: undefined method 'to_i' for #<Rack::Lint::HijackWrapper:0x007f8cc288ccb8>

Tony Arcieri tarcieri referenced this pull request in celluloid/reel
Closed

Support the Rack hijack API #42

Paul Cortens thoughtless referenced this pull request in polleverywhere/firehose
Open

Make firehose compatible with latest version of Rainbows! #22

Tim Carey-Smith

@raggi is there a automated test to confirm whether a Handler meets the requirements?
It might make sense to put a simple series of things to run in the top description too.
I didn't find it entirely clear.

Tim Carey-Smith

@rkh @raggi the problem is that the hijack.ru example returns an invalid response.
Should the Rack::Lint allow this behaviour of returning an [#<IO>, nil, nil]?

Tim Carey-Smith halorgium referenced this pull request in halorgium/rack-client
Open

Support for hijacking the socket #16

Tony Arcieri tarcieri referenced this pull request in celluloid/reel-rack
Open

Support the Rack hijack API #2

Jonathan Perkin jperkin referenced this pull request from a commit in joyent/pkgsrc
taca Update ruby-rack to 1.5.2.
== Changes

Please note that this release includes a few potentially breaking changes.
Of particular note are:

 * SessionHash is no longer a Hash sublcass
 * Rack::File cache_control parameter is removed in place of headers options

Additonally, SPEC has been updated in several areas and is now at 1,2.

A new SPEC section was introduced that provides two server-optional IO hijacking
APIs. Further information on these APIs will be made available by the community
in good time. In the mean time, some information can be found in the original
pull request: rack/rack#481

* January 21st, 2013: Thirty third public release 1.5.0
  * Introduced hijack SPEC, for before-response and after-response hijacking
  * SessionHash is no longer a Hash subclass
  * Rack::File cache_control parameter is removed, in place of headers options
  * Rack::Auth::AbstractRequest#scheme now yields strings, not symbols
  * Rack::Utils cookie functions now format expires in RFC 2822 format
  * Rack::File now has a default mime type
  * rackup -b 'run Rack::File.new(".")', option provides command line configs
  * Rack::Deflater will no longer double encode bodies
  * Rack::Mime#match? provides convenience for Accept header matching
  * Rack::Utils#q_values provides splitting for Accept headers
  * Rack::Utils#best_q_match provides a helper for Accept headers
  * Rack::Handler.pick provides convenience for finding available servers
  * Puma added to the list of default servers (preferred over Webrick)
  * Various middleware now correctly close body when replacing it
  * Rack::Request#params is no longer persistent with only GET params
  * Rack::Request#update_param and #delete_param provide persistent operations
  * Rack::Request#trusted_proxy? now returns true for local unix sockets
  * Rack::Response no longer forces Content-Types
  * Rack::Sendfile provides local mapping configuration options
  * Rack::Utils#rfc2109 provides old netscape style time output
  * Updated HTTP status codes
  * Ruby 1.8.6 likely no longer passes tests, and is no longer fully supported

* January 28th, 2013: Thirty fourth public release 1.5.1
  * Rack::Lint check_hijack now conforms to other parts of SPEC
  * Added hash-like methods to Abstract::ID::SessionHash for compatibility
  * Various documentation corrections

* February 7th, Thirty fifth public release 1.5.2
  * Fix CVE-2013-0263, timing attack against Rack::Session::Cookie
  * Fix CVE-2013-0262, symlink path traversal in Rack::File
  * Add various methods to Session for enhanced Rails compatibility
  * Request#trusted_proxy? now only matches whole stirngs
  * Add JSON cookie coder, to be default in Rack 1.6+ due to security concerns
  * URLMap host matching in environments that don't set the Host header fixed
  * Fix a race condition that could result in overwritten pidfiles
  * Various documentation additions
9f396d9
Isaiah Peng isaiah referenced this pull request in isaiah/jubilee
Open

Support the Rack hijack API #4

Jonathan Perkin jperkin referenced this pull request from a commit in joyent/pkgsrc
taca Update ruby-rack to 1.5.2.
== Changes

Please note that this release includes a few potentially breaking changes.
Of particular note are:

 * SessionHash is no longer a Hash sublcass
 * Rack::File cache_control parameter is removed in place of headers options

Additonally, SPEC has been updated in several areas and is now at 1,2.

A new SPEC section was introduced that provides two server-optional IO hijacking
APIs. Further information on these APIs will be made available by the community
in good time. In the mean time, some information can be found in the original
pull request: rack/rack#481

* January 21st, 2013: Thirty third public release 1.5.0
  * Introduced hijack SPEC, for before-response and after-response hijacking
  * SessionHash is no longer a Hash subclass
  * Rack::File cache_control parameter is removed, in place of headers options
  * Rack::Auth::AbstractRequest#scheme now yields strings, not symbols
  * Rack::Utils cookie functions now format expires in RFC 2822 format
  * Rack::File now has a default mime type
  * rackup -b 'run Rack::File.new(".")', option provides command line configs
  * Rack::Deflater will no longer double encode bodies
  * Rack::Mime#match? provides convenience for Accept header matching
  * Rack::Utils#q_values provides splitting for Accept headers
  * Rack::Utils#best_q_match provides a helper for Accept headers
  * Rack::Handler.pick provides convenience for finding available servers
  * Puma added to the list of default servers (preferred over Webrick)
  * Various middleware now correctly close body when replacing it
  * Rack::Request#params is no longer persistent with only GET params
  * Rack::Request#update_param and #delete_param provide persistent operations
  * Rack::Request#trusted_proxy? now returns true for local unix sockets
  * Rack::Response no longer forces Content-Types
  * Rack::Sendfile provides local mapping configuration options
  * Rack::Utils#rfc2109 provides old netscape style time output
  * Updated HTTP status codes
  * Ruby 1.8.6 likely no longer passes tests, and is no longer fully supported

* January 28th, 2013: Thirty fourth public release 1.5.1
  * Rack::Lint check_hijack now conforms to other parts of SPEC
  * Added hash-like methods to Abstract::ID::SessionHash for compatibility
  * Various documentation corrections

* February 7th, Thirty fifth public release 1.5.2
  * Fix CVE-2013-0263, timing attack against Rack::Session::Cookie
  * Fix CVE-2013-0262, symlink path traversal in Rack::File
  * Add various methods to Session for enhanced Rails compatibility
  * Request#trusted_proxy? now only matches whole stirngs
  * Add JSON cookie coder, to be default in Rack 1.6+ due to security concerns
  * URLMap host matching in environments that don't set the Host header fixed
  * Fix a race condition that could result in overwritten pidfiles
  * Various documentation additions
6ebd384
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
This page is out of date. Refresh to see the latest.

Showing 2 changed files with 192 additions and 0 deletions. Show diff stats Hide diff stats

  1. +64 0 SPEC
  2. +128 0 lib/rack/lint.rb
64 SPEC
@@ -60,6 +60,9 @@ Rack-specific variables:
60 60 <tt>rack.multithread</tt>:: true if the application object may be simultaneously invoked by another thread in the same process, false otherwise.
61 61 <tt>rack.multiprocess</tt>:: true if an equivalent application object may be simultaneously invoked by another process, false otherwise.
62 62 <tt>rack.run_once</tt>:: true if the server expects (but does not guarantee!) that the application will only be invoked this one time during the life of its containing process. Normally, this will only be true for a server based on CGI (or something similar).
  63 +<tt>rack.hijack?</tt>:: present and true if the server supports connection hijacking. See below, hijacking.
  64 +<tt>rack.hijack</tt>:: an object responding to #call that must be called at least once before using rack.hijack_io. It is recommended #call return rack.hijack_io as well as setting it in env if necessary.
  65 +<tt>rack.hijack_io</tt>:: if rack.hijack? is true, and rack.hijack has received #call, this will contain an object resembling an IO. See hijacking.
63 66 Additional environment specifications have approved to
64 67 standardized middleware APIs. None of these are required to
65 68 be implemented by the server.
@@ -90,6 +93,7 @@ There are the following restrictions:
90 93 * <tt>rack.url_scheme</tt> must either be +http+ or +https+.
91 94 * There must be a valid input stream in <tt>rack.input</tt>.
92 95 * There must be a valid error stream in <tt>rack.errors</tt>.
  96 +* There may be a valid hijack stream in <tt>rack.hijack_io</tt>
93 97 * The <tt>REQUEST_METHOD</tt> must be a valid token.
94 98 * The <tt>SCRIPT_NAME</tt>, if non-empty, must start with <tt>/</tt>
95 99 * The <tt>PATH_INFO</tt>, if non-empty, must start with <tt>/</tt>
@@ -129,12 +133,72 @@ The error stream must respond to +puts+, +write+ and +flush+.
129 133 * +flush+ must be called without arguments and must be called
130 134 in order to make the error appear for sure.
131 135 * +close+ must never be called on the error stream.
  136 +=== Hijacking
  137 +==== Request (before status)
  138 +If rack.hijack? is true then rack.hijack must respond to #call.
  139 +rack.hijack must return the io that will also be assigned (or is
  140 +already present, in rack.hijack_io.
  141 +
  142 +rack.hijack_io must respond to:
  143 +<tt>read, write, read_nonblock, write_nonblock, flush, close,
  144 +close_read, close_write, closed?</tt>
  145 +
  146 +The semantics of these IO methods must be a best effort match to
  147 +those of a normal ruby IO or Socket object, using standard
  148 +arguments and raising standard exceptions. Servers are encouraged
  149 +to simply pass on real IO objects, although it is recognized that
  150 +this approach is not directly compatible with SPDY and HTTP 2.0.
  151 +
  152 +IO provided in rack.hijack_io should preference the
  153 +IO::WaitReadable and IO::WaitWritable APIs wherever supported.
  154 +
  155 +There is a deliberate lack of full specification around
  156 +rack.hijack_io, as semantics will change from server to server.
  157 +Users are encouraged to utilize this API with a knowledge of their
  158 +server choice, and servers may extend the functionality of
  159 +hijack_io to provide additional features to users. The purpose of
  160 +rack.hijack is for Rack to "get out of the way", as such, Rack only
  161 +provides the minimum of specification and support.
  162 +
  163 +If rack.hijack? is false, then rack.hijack should not be set.
  164 +
  165 +If rack.hijack? is false, then rack.hijack_io should not be set.
  166 +==== Response (after headers)
  167 +It is also possible to hijack a response after the status and headers
  168 +have been sent.
  169 +In order to do this, an application may set the special header
  170 +<tt>rack.hijack</tt> to an object that responds to <tt>call</tt>
  171 +accepting an argument that conforms to the <tt>rack.hijack_io</tt>
  172 +protocol.
  173 +
  174 +After the headers have been sent, and this hijack callback has been
  175 +called, the application is now responsible for the remaining lifecycle
  176 +of the IO. The application is also responsible for maintaining HTTP
  177 +semantics. Of specific note, in almost all cases in the current SPEC,
  178 +applications will have wanted to specify the header Connection:close in
  179 +HTTP/1.1, and not Connection:keep-alive, as there is no protocol for
  180 +returning hijacked sockets to the web server. For that purpose, use the
  181 +body streaming API instead (progressively yielding strings via each).
  182 +
  183 +Servers must ignore the <tt>body</tt> part of the response tuple when
  184 +the <tt>rack.hijack</tt> response API is in use.
  185 +
  186 +The special response header <tt>rack.hijack</tt> must only be set
  187 +if the request env has <tt>rack.hijack?</tt> <tt>true</tt>.
  188 +==== Conventions
  189 +* Middleware should not use hijack unless it is handling the whole
  190 + response.
  191 +* Middleware may wrap the IO object for the response pattern.
  192 +* Middleware should not wrap the IO object for the request pattern. The
  193 + request pattern is intended to provide the hijacker with "raw tcp".
132 194 == The Response
133 195 === The Status
134 196 This is an HTTP status. When parsed as integer (+to_i+), it must be
135 197 greater than or equal to 100.
136 198 === The Headers
137 199 The header must respond to +each+, and yield values of key and value.
  200 +Special headers starting "rack." are for communicating with the
  201 +server, and must not be sent back to the client.
138 202 The header keys must be Strings.
139 203 The header must not contain a +Status+ key,
140 204 contain keys with <tt>:</tt> or newlines in their name,
128 lib/rack/lint.rb
... ... @@ -1,4 +1,5 @@
1 1 require 'rack/utils'
  2 +require 'forwardable'
2 3
3 4 module Rack
4 5 # Rack::Lint validates your application and the requests and
@@ -50,6 +51,9 @@ def _call(env)
50 51 check_status status
51 52 ## the *headers*,
52 53 check_headers headers
  54 +
  55 + check_hijack_response headers, env
  56 +
53 57 ## and the *body*.
54 58 check_content_type status, headers
55 59 check_content_length status, headers
@@ -121,6 +125,9 @@ def check_env(env)
121 125 ## <tt>rack.multithread</tt>:: true if the application object may be simultaneously invoked by another thread in the same process, false otherwise.
122 126 ## <tt>rack.multiprocess</tt>:: true if an equivalent application object may be simultaneously invoked by another process, false otherwise.
123 127 ## <tt>rack.run_once</tt>:: true if the server expects (but does not guarantee!) that the application will only be invoked this one time during the life of its containing process. Normally, this will only be true for a server based on CGI (or something similar).
  128 + ## <tt>rack.hijack?</tt>:: present and true if the server supports connection hijacking. See below, hijacking.
  129 + ## <tt>rack.hijack</tt>:: an object responding to #call that must be called at least once before using rack.hijack_io. It is recommended #call return rack.hijack_io as well as setting it in env if necessary.
  130 + ## <tt>rack.hijack_io</tt>:: if rack.hijack? is true, and rack.hijack has received #call, this will contain an object resembling an IO. See hijacking.
124 131 ##
125 132
126 133 ## Additional environment specifications have approved to
@@ -227,6 +234,8 @@ def check_env(env)
227 234 check_input env["rack.input"]
228 235 ## * There must be a valid error stream in <tt>rack.errors</tt>.
229 236 check_error env["rack.errors"]
  237 + ## * There may be a valid hijack stream in <tt>rack.hijack_io</tt>
  238 + check_hijack env
230 239
231 240 ## * The <tt>REQUEST_METHOD</tt> must be a valid token.
232 241 assert("REQUEST_METHOD unknown: #{env["REQUEST_METHOD"]}") {
@@ -417,6 +426,121 @@ def close(*args)
417 426 end
418 427 end
419 428
  429 + class HijackWrapper
  430 + include Assertion
  431 + extend Forwardable
  432 +
  433 + REQUIRED_METHODS = [
  434 + :read, :write, :read_nonblock, :write_nonblock, :flush, :close,
  435 + :close_read, :close_write, :closed?
  436 + ]
  437 +
  438 + def_delegators :@io, *REQUIRED_METHODS
  439 +
  440 + def initialize(io)
  441 + @io = io
  442 + REQUIRED_METHODS.each do |meth|
  443 + assert("rack.hijack_io must respond to #{meth}") { io.respond_to? meth }
  444 + end
  445 + end
  446 + end
  447 +
  448 + ## === Hijacking
  449 + #
  450 + # AUTHORS: n.b. The trailing whitespace between paragraphs is important and
  451 + # should not be removed. The whitespace creates paragraphs in the RDoc
  452 + # output.
  453 + #
  454 + ## ==== Request (before status)
  455 + def check_hijack(env)
  456 + if env['rack.hijack?']
  457 + ## If rack.hijack? is true then rack.hijack must respond to #call.
  458 + original_hijack = env['rack.hijack']
  459 + assert("rack.hijack must respond to call") { original_hijack.respond_to?(:call) }
  460 + env['rack.hijack'] = proc do
  461 + ## rack.hijack must return the io that will also be assigned (or is
  462 + ## already present, in rack.hijack_io.
  463 + io = original_hijack.call
  464 + HijackWrapper.new(io)
  465 + ##
  466 + ## rack.hijack_io must respond to:
  467 + ## <tt>read, write, read_nonblock, write_nonblock, flush, close,
  468 + ## close_read, close_write, closed?</tt>
  469 + ##
  470 + ## The semantics of these IO methods must be a best effort match to
  471 + ## those of a normal ruby IO or Socket object, using standard
  472 + ## arguments and raising standard exceptions. Servers are encouraged
  473 + ## to simply pass on real IO objects, although it is recognized that
  474 + ## this approach is not directly compatible with SPDY and HTTP 2.0.
  475 + ##
  476 + ## IO provided in rack.hijack_io should preference the
  477 + ## IO::WaitReadable and IO::WaitWritable APIs wherever supported.
  478 + ##
  479 + ## There is a deliberate lack of full specification around
  480 + ## rack.hijack_io, as semantics will change from server to server.
  481 + ## Users are encouraged to utilize this API with a knowledge of their
  482 + ## server choice, and servers may extend the functionality of
  483 + ## hijack_io to provide additional features to users. The purpose of
  484 + ## rack.hijack is for Rack to "get out of the way", as such, Rack only
  485 + ## provides the minimum of specification and support.
  486 + env['rack.hijack_io'] = HijackWrapper.new(env['rack.hijack_io'])
  487 + io
  488 + end
  489 + else
  490 + ##
  491 + ## If rack.hijack? is false, then rack.hijack should not be set.
  492 + assert("rack.hijack? is false, but rack.hijack is present") { env['rack.hijack'].nil? }
  493 + ##
  494 + ## If rack.hijack? is false, then rack.hijack_io should not be set.
  495 + assert("rack.hijack? is false, but rack.hijack_io is present") { env['rack.hijack_io'].nil? }
  496 + end
  497 + end
  498 +
  499 + ## ==== Response (after headers)
  500 + ## It is also possible to hijack a response after the status and headers
  501 + ## have been sent.
  502 + def check_hijack_response(headers, env)
  503 + ## In order to do this, an application may set the special header
  504 + ## <tt>rack.hijack</tt> to an object that responds to <tt>call</tt>
  505 + ## accepting an argument that conforms to the <tt>rack.hijack_io</tt>
  506 + ## protocol.
  507 + ##
  508 + ## After the headers have been sent, and this hijack callback has been
  509 + ## called, the application is now responsible for the remaining lifecycle
  510 + ## of the IO. The application is also responsible for maintaining HTTP
  511 + ## semantics. Of specific note, in almost all cases in the current SPEC,
  512 + ## applications will have wanted to specify the header Connection:close in
  513 + ## HTTP/1.1, and not Connection:keep-alive, as there is no protocol for
  514 + ## returning hijacked sockets to the web server. For that purpose, use the
  515 + ## body streaming API instead (progressively yielding strings via each).
  516 + ##
  517 + ## Servers must ignore the <tt>body</tt> part of the response tuple when
  518 + ## the <tt>rack.hijack</tt> response API is in use.
  519 +
  520 + if env['rack.hijack?'] && headers['rack.hijack']
  521 + assert('rack.hijack header must respond to #call') {
  522 + headers['rack.hijack'].respond_to? :call
  523 + }
  524 + original_hijack = headers['rack.hijack']
  525 + headers['rack.hijack'] = proc do |io|
  526 + original_hijack.call HijackWrapper.new(io)
  527 + end
  528 + else
  529 + ##
  530 + ## The special response header <tt>rack.hijack</tt> must only be set
  531 + ## if the request env has <tt>rack.hijack?</tt> <tt>true</tt>.
  532 + assert('rack.hijack header must not be present if server does not support hijacking') {
  533 + headers['rack.hijack'].nil?
  534 + }
  535 + end
  536 + end
  537 + ## ==== Conventions
  538 + ## * Middleware should not use hijack unless it is handling the whole
  539 + ## response.
  540 + ## * Middleware may wrap the IO object for the response pattern.
  541 + ## * Middleware should not wrap the IO object for the request pattern. The
  542 + ## request pattern is intended to provide the hijacker with "raw tcp".
  543 +
420 544 ## == The Response
421 545
422 546 ## === The Status
@@ -433,6 +557,10 @@ def check_headers(header)
433 557 header.respond_to? :each
434 558 }
435 559 header.each { |key, value|
  560 + ## Special headers starting "rack." are for communicating with the
  561 + ## server, and must not be sent back to the client.
  562 + next if key =~ /^rack\..+$/
  563 +
436 564 ## The header keys must be Strings.
437 565 assert("header key must be a string, was #{key.class}") {
438 566 key.kind_of? String

Tip: You can add notes to lines in a file. Hover to the left of a line to make a note

Something went wrong with that request. Please try again.