Add WebSocket and SSE options #1272

Open

wants to merge 6 commits
Contributor

ohler55 commented Apr 28, 2018

This suggested addition is the result of a collaboration between @boazsegev of Iodine fame and myself. Both Agoo and Iodine will support these suggested additions. At this point I don't think any other servers support WebSockets and SSE, so this would set the standard for others to follow.
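
To make the shape of the proposal concrete, here is a minimal sketch of the callback style discussed in this thread (an illustration only: the handler method names come from the SPEC diff and discussion below, while the exact env keys and the 101 handshake response are assumptions, not the final wording):

class EchoHandler
  def on_open
    # connection established; per the proposal, the server has made
    # #write available on this object (or passes a client object)
  end

  def on_message(data)
    write(data) # echo each WebSocket message back to the peer
  end

  def on_close
    # connection closed
  end
end

run lambda { |env|
  if env['rack.upgrade?'] # assumed: server advertises upgrade support
    env['rack.upgrade'] = EchoHandler.new
    [101, {}, []]         # assumed: handshake is completed by the server
  else
    [200, { 'Content-Type' => 'text/plain' }, ['no upgrade requested']]
  end
}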


Contributor

ioquatix commented May 1, 2018

Interesting.

I already support completely transparent asynchronous web sockets in falcon. It uses rack.hijack just fine. An example is given here: https://github.com/socketry/async-websocket/blob/master/examples/chat/config.ru

Of course with this model supporting SSE is also trivial.

Regarding the event driven model which is being exposed via rack.upgrade: I don't think exposing an event driven API is a great way to make systems asynchronous. It's kind of limiting.

Regarding the overall design - careful thought needs to be given to how HTTP/2 fits into this picture. Ideally, SSE, WebSockets, and other "interactive" streams transit via reading from the request body and writing to the response body. That's the model which I feel works best with HTTP/2, where each request/response is encapsulated into a full duplex stream.
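
As a sketch of that model (an illustration, assuming a server that streams rack.input and the response body without buffering, as falcon aims to), a full-duplex echo needs nothing beyond the plain Rack triplet:

# Echoes whatever the client sends, as it arrives.
class Echo
  def initialize(input)
    @input = input
  end

  def each
    while chunk = @input.read(1024) # blocks until the client sends more data
      yield chunk                   # streamed straight back down the response
    end
  end
end

run ->(env) { [200, { 'Content-Type' => 'application/octet-stream' }, Echo.new(env['rack.input'])] }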


Contributor

ioquatix commented May 1, 2018

Bear in mind that async-websocket doesn't support web sockets over HTTP/2; this is something I'll investigate in the coming months.

This is also one of the challenges of implementing rack.hijack for HTTP/2. You need to be careful of what it means, exactly, to hijack a connection (or stream in the case of HTTP/2).


boazsegev commented May 1, 2018

I think the HTTP/2 layer is something that's handled only by the server. It shouldn't change the rack.upgrade semantics since it isn't one of the protocols / behaviors that should be handled by the application.

I think the idea is great because it decouples the application from any network logic.

While hijack solutions remove the server, changing the relationship from network<=>server<=>application to network<=>application, this approach keeps the application away from the network, keeping the initial (and desired) relationship intact.

EDIT:

I also published a blog post about this PR and tried to explain this difference on a Reddit thread. I'm not sure how well I did explaining my thinking on the matter, but I think it's a wonderful approach.


Contributor

ioquatix commented May 1, 2018

@boazsegev While I see where you are coming from, I think that your idea of a semantic model sits at a much higher level than what HTTP exposes.

Given the direction of HTTP/2, it's clear to me that the base HTTP semantic model is a full-duplex un-buffered stream of chunks. Whether you implement this with multiple HTTP/1 connections or HTTP/2 streams (which to me is the network layer), is irrelevant to the application.

The application simply sees a full duplex stream of chunks. That's the semantic model for the application. On top of this you can implement your desired web sockets, sse, and other interesting things, like streaming responses, and so on.

Your semantic model is tightly coupled to your desired application-level protocols, and I think that's a very limiting design. For example, you can't implement general streaming responses like https://github.com/socketry/falcon/blob/master/examples/beer/config.ru (the fact it uses async is irrelevant, you could just as easily do it with threads).


Contributor

ioquatix commented May 1, 2018

I think I should mention this discussion here since it seems relevant: #1148


Contributor

ohler55 commented May 1, 2018

I'd like to point out that Rack is event based, as are the WebSocket and SSE APIs/specs. The Rack hijack offers a streaming back door that is not addressed by the spec. This PR attempts to formalize an approach to using WebSocket and SSE that is consistent with the use pattern common in modern web applications with JavaScript front ends. It is not an attempt to get rid of the hijack option. There is certainly reason to continue supporting hijacking, but that should not be a reason to reject this PR.

boazsegev commented May 1, 2018

@ioquatix, you're totally right, I can't implement that code... nobody can unless they implement Async::Task and add that as a Rack dependency.

As for streaming solutions: SSE is a streaming solution.

If you refer to video streaming or binary data requirements, WebSockets are perfect for that.

If you meant old-school HTTP streaming... yes, that's true. I'm not even sure that I should, but I'm aware that I don't. In fact, I'm pretty sure the Rack model wouldn't be ideal for streaming anyway.



Contributor

ioquatix commented May 1, 2018

I'd like to point out that Rack is event based

I'm not convinced by this

If anything, rack is based on composable middleware. There is no event handling.

The Rack hijack offers a streaming back door that is not addressed by the spec

Sorry, but this is complete dump trucks:

rack/SPEC

Lines 87 to 96 in 154ac52

<tt>rack.hijack?</tt>:: present and true if the server supports
connection hijacking. See below, hijacking.
<tt>rack.hijack</tt>:: an object responding to #call that must be
called at least once before using
rack.hijack_io.
It is recommended #call return rack.hijack_io
as well as setting it in env if necessary.
<tt>rack.hijack_io</tt>:: if rack.hijack? is true, and rack.hijack
has received #call, this will contain
an object resembling an IO. See hijacking.

This PR attempts to formalize an approach to using WebSocket and SSE that is consistent with the use pattern common in modern web applications with JavaScript front ends

This is the guts of the matter. Why is this necessary? It's clearly already possible to do it - so we have to ask: do we need to formalise this in rack, or is there something simpler that we could formalise which allows more possibilities? We should accept or reject this PR based on the merits of what it adds to Rack, as a specification. I think there are better, more generic, options.
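
For reference, a minimal full-hijack sketch against the quoted SPEC wording (error handling omitted; a sketch, not a recommended pattern):

run lambda { |env|
  if env['rack.hijack?']
    io = env['rack.hijack'].call # returns rack.hijack_io per the SPEC above
    Thread.new do
      io.write("HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n")
      io.write("hijacked\n")
      io.close
    end
    [200, {}, []] # ignored by the server once the connection is hijacked
  else
    [200, { 'Content-Type' => 'text/plain' }, ['hijack unsupported']]
  end
}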


Contributor

ioquatix commented May 1, 2018

@ioquatix , you're totally right, I can't implement that code... nobody can unless they implement Async::Task and add that as a Rack dependency.

I don't think I proposed that anywhere. I just said, with the current Rack spec, you can do it, and it scales well. We don't need to change the rack spec to have a great implementation of web sockets.

As for streaming solutions: SSE is a streaming solution.

It only works in one direction and it only works in a very specific context. It's not general response streaming, as in body.each {|chunk| ... peer.write(chunk) ...}

If you refer to video streaming or binary data requirements, WebSockets are perfect for that.

If you meant old-school HTTP streaming... yes, that's true. I'm not even sure that I should, but I'm aware that I don't. In fact, I'm pretty sure the Rack model wouldn't be ideal for streaming anyway.

The Rack model is great for streaming responses. That's the whole point of the response body responding to each.
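
Concretely, a streaming response under the current spec is just a body whose #each yields chunks over time (a sketch in the spirit of the beer example linked above):

class Countdown
  def each
    10.downto(1) do |i|
      yield "#{i}\n" # a server that flushes each chunk streams this live
      sleep 1
    end
  end
end

run ->(env) { [200, { 'Content-Type' => 'text/plain' }, Countdown.new] }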


Contributor

ioquatix commented May 1, 2018

By the way, I'm not trying to shit all over your PR, although it probably feels like it. Congratulations for making something, and implementing it and proposing it. Basically, I just don't see the logic of it though. I think there are better things in Rack that need to be addressed, and I think the direction of HTTP/2 confirms this.


Contributor

ohler55 commented May 1, 2018

Of course Rack is event based. The trigger for all Rack code is the #call(env) callback, where env encapsulates the event that has occurred: an HTTP request. After that you are correct, the handling is based on a middleware model.


boazsegev commented May 1, 2018

I'm paraphrasing slightly to focus on your question, forgive me and correct me if I misunderstood:

Why is this necessary? ... is there something simpler that we could formalise which allows more possibilities.

I believe this is necessary because hijack is complex.

There's a difference between simple and easy. hijack is easy - but not simple.

It might seem simple for Rack and maybe even simple in the server, but it's a super complex solution that requires applications to add IO handling code.

The hijack pattern is an anti-DRY and anti-SOLID pattern that breaks away from object-oriented approaches and requires applications to become familiar with the networking layers.

Sure, we might need to keep hijack around because some things are just impossible without it, but that doesn't make it the best solution for everything.

On the other hand, the callback approach in this PR is simple, but it's clearly not easy.

Servers will have to put in some work (less work than applications need to put in for hijack, but still).

However, the solution is simple and follows both DRY and SOLID principles.

The callback approach isn't a generic solution, it's just good for things that lend themselves to using callbacks, but this is the advantage of this approach.

IMHO, this is a perfect fit for everything callback-related, and it's extensible (if WebSockets 2.0 comes around, nothing needs to change on the Rack/application side, just the server).

boazsegev commented May 1, 2018

@ioquatix , I was looking through the falcon repo and you seem to have done great work there.

I understand your concerns and your wish to have a better solution. It seems to me that you yourself put a lot of time and effort into improving the state of affairs and making things easier and better for everyone.

I appreciate your approach and I find myself slightly surprised by the challenges you raise against this PR.

It seems to me that you're using similar semantics with different method names (i.e., your WebSocket next_message seems to block the current thread, allowing other threads to handle other server events until a new message becomes available... or maybe I misunderstood).

Sure, my implementation is evented and your implementation is multi-threaded, but both lend themselves to the same approach - application side "callbacks".

Your implementation could just as easily remain threaded and hide the next_event loop away so it calls the correct callback.

I'm not sure I understand your reservations.



Contributor

ioquatix commented May 1, 2018

I believe this is necessary because hijack is complex.

Those are all really good points and I appreciate your thinking.

To be honest, in falcon, implementing your proposed API wouldn't be more than about 100 lines of code, perhaps less. So, it's not complicated to implement, because falcon already has a well defined and good concurrency model. That's the context in which I'm thinking about your PR, by the way.

Whether the complexity is in the server, or the application code, is an interesting point and one that I can agree with to a certain extent. I don't think there is a strong argument either way.

@ioquatix , I was looking through the falcon repo and you seem to have done great work there.

Thanks, I can tell we are going to get along well now.

I understand your concerns and your wish to have a better solution. It seems to me that you yourself put a lot of time and effort into improving the state of affairs and making things easier and better for everyone.

Yes, but I still haven't made a 1.0 release yet. It's coming soon hopefully.

I appreciate your approach and I find myself slightly surprised by the challenges you raise against this PR.

Fair enough.

It seems to me that you're using similar semantics with different method names (i.e., your WebSocket next_message seems to block the current thread, allowing other threads to handle other server events until a new message becomes available... or maybe I misunderstood).

That's almost right. Async doesn't use threads, it uses fibers which are cooperatively scheduled. They have much less overhead than threads.

Sure, my implementation is evented and your implementation is multi-threaded, but both lend themselves to the same approach - application side "callbacks".

Async is event driven. When a fiber performs an operation that would block, it yields back to the reactor which resumes once the operation can continue.

I deliberately try to avoid callbacks because in my experience they lead to callback hell. The difference is that with callbacks you need to use variables to track state, and each time the callback is invoked you have to process that state to figure out what to do next. With fibers, you naturally resume where you were with your stack intact. You can implement complicated state machines with ease.
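
A toy illustration of that difference (a sketch, plain Fiber with nothing async-specific): the loop counter lives on the fiber's stack across suspensions, where a callback would have to stash it in shared state.

reader = Fiber.new do
  3.times do |i|
    data = Fiber.yield            # suspend until the "reactor" resumes us
    puts "message #{i}: #{data}"  # resume exactly where we left off
  end
end

reader.resume          # run up to the first Fiber.yield
reader.resume "hello"  # prints "message 0: hello"
reader.resume "world"  # prints "message 1: world"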

Your implementation could just as easily remain threaded and hide the next_event loop away so it calls the correct callback.

Yes, this is feasible.

I'm not sure I understand your reservations.

I like this PR from the point of view that it tries to provide some generic interface for web sockets and server sent events. Making a generic interface for that is both simple and difficult. I admire that you've designed it and implemented it.

My main reservation is that your proposed API is incomplete. The Rack specification, as it stands, is really simple and allows for a lot of flexibility. What you've proposed is an extension which fundamentally encodes a model for concurrency into Rack, something that it hasn't had except in a very precise location (response body #each).

To look at it another way, falcon passes the rack linter, yet it implements real-time streaming requests and responses. So, within the current confines of Rack, we can enjoy these advanced features.

Your proposed API exposes an incomplete model for concurrency and the phrase that comes to my mind is "thar be the dragons". What happens if you stick a RestClient.get into on_message - does it stall the entire server? How do you communicate between web sockets (in your PR I assume you'd need a shared backend like redis)? Whether or not you agree with it, the node model of single process, multiple connections is one that works very well - you can have an array of web socket connections, send messages to all of them, etc. This model can work well with Rack as shown by falcon (e.g. https://github.com/socketry/async-websocket/tree/master/examples/chat).

Whatever way you cut it, Ruby has a pretty frustrating model for concurrency right now (and IO too, but it's slowly getting better). People are working on it, myself, others (Eric, https://bugs.ruby-lang.org/issues/13618), ko1 (guilds) and I have no doubt that good things are coming to this space.

But Rack, right now, has to be a specification that works with Ruby as it is today. If Thribers become part of Ruby (I personally hope not), the way servers should implement web sockets would be totally different. It might be possible to make an MVP, like this PR, but I think it's loading Rack up with too much complexity.

boazsegev commented May 1, 2018

@ioquatix interesting, thanks for taking the time to explain.

I love the fiber approach, as it hides away the event reactor and makes things more comfortable to manage.

My main reservation is that your proposed API is incomplete.

Yes, you are right that there are concerns related to the proposed approach. It doesn't deal with IPC or client<=>client communications. It also allows developers to violate multi-threading best practices.

However, this proposal seems pretty balanced. It doesn't enforce a huge change on existing servers and the "missing" pieces are easily complemented by existing solutions (Redis being just one of them).

Personally, I implement pub/sub in iodine to "fill the gap". However, I'm not sure this should be part of the specification since other developers might wish to "fill the gap" using a different approach, such as limiting the server to a single process and using an array.

What you've proposed is an extension which fundamentally encodes a model for concurrency into Rack...

I'm not sure I understand this part.

The server can be a single threaded server, a thread-per connection server, a fiber based server and practically any type of concurrency model can be implemented.

I'm not sure I see where the PR requires a specific concurrency model.

What happens if you stick a RestClient.get into on_message - does it stall the entire server?

That really depends on the server, I guess.

What happens if I call sleep(10) while handling an HTTP request (except getting fired, that is)?

It's probably the same answer.

Iodine, for example, supports multi-threading and cluster mode (pretty much the same model as Puma). Agoo (I think) supports multi-threading.

For both of these servers I would recommend avoiding blocking calls within the thread, but they both offer some level of protection before experiencing DoS.

I don't use fibers so I'm not sure how they would react to this scenario. Hopefully they will react better.

Either way, I doubt the question of having fibers replace threads is part of the discussion here.

Whether or not you agree with it, the node model of single process, multiple connections is one that works very well...

I agree with it, but I don't see why that matters.

Even the node model needs to be adjusted when scaling horizontally.

All models have properties that define their strengths and weaknesses.

IMHO, this PR proposes a mechanism that's independent of the model used by the server, making it flexible enough for everyone to implement.
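
For what it's worth, a sketch of the Redis flavour of "filling the gap" mentioned above (CONNECTIONS is a hypothetical per-process registry of upgraded clients, populated in on_open and on_close):

require 'redis'

CONNECTIONS = []

# Each worker process relays channel messages to its local connections.
Thread.new do
  Redis.new.subscribe('chat') do |on|
    on.message do |_channel, message|
      CONNECTIONS.each { |client| client.write(message) }
    end
  end
end

# Publishing from any process then reaches every process's clients:
# Redis.new.publish('chat', 'hello everyone')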



Contributor

ioquatix commented May 1, 2018

However, this proposal seems pretty balanced. It doesn't enforce a huge change on existing servers and the "missing" pieces are easily complemented by existing solutions (Redis being just one of them).

How do servers support your proposal? By pulling in websocket-driver? It's a large surface area, and it does get pretty tricky. I wouldn't call it balanced - it's heavily biased towards implementing WebSockets.

I'm not sure I understand this part.
The server can be a single threaded server, a thread-per connection server, a fiber based server and practically any type of concurrency model can be implemented.
I'm not sure I see where the PR requires a specific concurrency model.

Invoking a Rack app is for the most part trivial. It's a function call that returns a result. No real model for concurrency is needed for this basic definition. You scale up by executing the function on different CPU cores, but fundamentally you can't change it to a non-linear event driven callback model or some other asynchronous model (async provides transparent inversion of control back to the reactor, but it doesn't change or require changes in flow control).

This proposal embeds non-linear flow control into the Rack spec. What I mean is, it's impossible to implement the given proposal without some kind of concurrency. I'm not saying that any particular model for concurrency is being encoded, just that with your approach a model for concurrency is now necessary.

This has a huge downstream effect, since all code that depends on Rack now has to be aware of and capable of asynchronous execution. For example, how would you change rack-test to support this PR?

I think this equally applies to rack.hijack, and I also don't like that approach, but it's pretty much set in stone now. For example, how do you implement rack.hijack with HTTP/2? It's probably not possible in the general sense.

What happens if I call sleep(10) while handling an HTTP request (except getting fired, that is)?

In async, provided you call task.sleep(10), the fiber defers for 10 seconds. It doesn't block the server. In puma it would block the worker thread.
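
A sketch of that behaviour (assuming the async 1.x Task interface quoted above):

require 'async'

# Three tasks sleep concurrently; total wall time is about 1 second,
# not 3, because Task#sleep suspends only the calling fiber.
Async do |task|
  3.times do |i|
    task.async do |subtask|
      subtask.sleep 1
      puts "tick #{i}"
    end
  end
end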

Have you tried using ActiveRecord in your on_message callback? How did that scale for you? My experience is that the ActiveRecord ConnectionPool design is very poor for highly concurrent workloads. These are very tricky issues to get right (although you can do it as shown here: https://github.com/socketry/async-postgres and it does scale up pretty well).

[regarding node] I agree with it, but I don't see why that matters.
Even the node model needs to be adjusted when scaling horizontally.

Because with your proposed API, implementing a basic Node style server like this is not possible. Right from the get go you need additional machinery to do pub/sub or other kinds of communication. Even the situation with streaming responses is not improved without additional work. It's a very specific proposal designed for a very specific kind of scalability. It's far too specific IMHO.

What we need is a proposal that better aligns with HTTP/2 streams since it's clear to me that it should be the future of HTTP and Rack. It should have a clear model for concurrency that fits with the existing multiprocess/multithread/worker implementations - i.e. reading from rack.input might block the request and writing to the response body might block the request due to buffering. On top of that, as already demonstrated, you can implement highly scalable asynchronous servers.

Streaming request and response bodies in falcon directly improves the latency of existing apps with no changes. Here is an implementation of rack.input which streams (and buffers) the input: https://github.com/socketry/falcon/blob/master/lib/falcon/adapters/input.rb

But this PR requires significant changes to existing apps for any kind of benefit. Not only that, but it only supports a very specific kind of scalability with an under-specified concurrency model (i.e. what happens if you block in on_message).

Let me finish with the following question: do you even think there is a future for WebSockets? https://datatracker.ietf.org/doc/draft-hirano-httpbis-websocket-over-http2/ hasn't been touched since 2014. There is an interesting write up here: https://daniel.haxx.se/blog/2016/06/15/no-WebSockets-over-http2/ - WebSockets have never been a good fit for the request/response paradigm, and that paradigm is what fundamentally underpins Rack (assuming that rack.hijack is a poorly thought out addition to the spec :).

SPEC
+ on_shutdown() # may be called before a connection is closed due to server shutdown.
+ on_close() # called when the connection is closed
+ on_drained # may be called when the number of pending writes drops to zero.
+ The object will be extended by the server to include:

tenderlove May 1, 2018

Member

This will break method caches. I'd prefer a solution that doesn't lean on runtime method extension. Maybe something like this:

class MyWSEObject
  def initialize io
    @io = io
  end

  def on_open
    @io.write("neat")
  end
end

app = lambda do |env|
  env['rack.upgrade'] = MyWSEObject.new(env['rack.wse.object'])
end


ioquatix May 1, 2018

Contributor
class MyWSEObject
  def on_open(io)
    io.write("neat")
  end
end

app = lambda do |env|
  env['rack.upgrade'] = MyWSEObject.new
end

would be simpler


tenderlove May 1, 2018

Member

That would be fine too. 😄

ioquatix May 1, 2018

Contributor

So, the next question, is it a good idea to write into the request env to essentially send a response? To me, that's confusing. Perhaps at best it could be something like

if upgrade = env['rack.upgrade']
  return upgrade.call(MyWSEObject.new)
end

return [400, {}, []]

Perhaps the server could signal exactly what APIs are supported as in:

if upgrade = env['rack.upgrade.web_socket'] # or rack.upgrade.event_stream
  return upgrade.call(MyWSEObject.new)
end

return [400, {}, []]

tenderlove May 2, 2018

Member

TBH it doesn't really matter to me. Unfortunately though, env is really the only way that middleware can communicate. So writing to env is the only way that one middleware could indicate to the next "I'm responding by writing to a websocket". If the intention is that any "callable" (middleware / app) that sets the WSE object must not call anything else in the middleware chain, then that needs to be explicitly stated in SPEC.
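
A sketch of that constraint in middleware form (reusing MyWSEObject from above; the 101 status and the "stop the chain" rule are exactly the details the SPEC would need to pin down):

class UpgradeGate
  def initialize(app)
    @app = app
  end

  def call(env)
    if env['HTTP_UPGRADE'] == 'websocket' && env['rack.upgrade?']
      env['rack.upgrade'] = MyWSEObject.new
      [101, {}, []] # claim the connection; nothing downstream may respond
    else
      @app.call(env) # otherwise behave as ordinary middleware
    end
  end
end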


ohler55 May 2, 2018

Contributor

@ioquatix having the env['rack.upgrade'] respond to some kind of registration method would be a reasonable approach in my opinion. I feel a little uncomfortable having it provide a return value, since it makes it less obvious what is being returned, but my feelings are not very strong on that.
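
A sketch of that registration style (the #register method name is hypothetical, reusing MyWSEObject from above):

run lambda { |env|
  if upgrade = env['rack.upgrade']
    upgrade.register(MyWSEObject.new) # no meaningful return value
    return [101, {}, []]
  end

  [400, {}, []]
}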


tenderlove May 2, 2018

Member

@ohler55 ya, I think passing the IO object to every callback (as @ioquatix suggested) is my preferred approach.


boazsegev May 2, 2018

For C extensions, there's double the amount of objects per connection.

Why is this?

@tenderlove, I assume you know that the C server doesn't need an IO object, it uses a simple integer to keep track of IO.

This means that a C extension just needs to keep the callback object itself and add a secret variable to that object with the socket number.

This secret variable isn't possible in Ruby, but C extensions are allowed to create them. Since that variable is a Number, it's very small (it doesn't create an Object in the ObjectSpace realm and it's immutable).

This means a single object in C (the callback object). If we add the IO object, it's two objects - a 200% increase in object count.

In Ruby, we will have the callback handler, the raw IO object (both pre-existing) and an IO wrapper, a 150% increase in object count.


tenderlove May 2, 2018

Member

This means a single object in C (the callback object). If we add the IO object, it's two objects - a 200% increase in object count.

What % increase is that in an actual app?


boazsegev May 2, 2018

What % increase is that in an actual app?

That really depends on the app's use-case.

A micro services app running a WebSocket API will get hit harder than an HTTP app with the occasional WebSocket upload manager.

I think the issue is more pronounced because we are talking about long term objects.

Short term object count doesn't affect memory fragmentation as much as the fact that there's a long-term object blocking a memory "block" / "arena".

The more long-term objects are created (and eventually destroyed), the faster the fragmentation.

Having said that, even a 200% increase in object count is probably better than the solutions we currently employ when implementing WebSockets using hijack.

I won't make an issue out of it. I'll do what everyone thinks best, but I know in very practical terms (having coded both implementations in C) that extend is cheaper both in memory and in the number of method tree traversals required.


Contributor

evanphx commented May 2, 2018

Hi all! Websockets and rack again, let's do this!

Up front, last time this came up, I prototyped it in puma (puma/puma#1054). It was an experiment we didn't merge in, but it totally works and we might revisit it.

The API of on_* methods for events is totally fine, no issue there.

Extending the object to decorate it with a write method is pretty ugly, from both an implementation standpoint as well as a performance one. Passing an object that implements #write to the on_* methods to write back is less code and performs better.

If frameworks want to take that object and use extend to make its methods available on the handler, that's totally fine and up to the framework. Rack should not do that on its own; it should provide a lower-level requirement.
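
In code, the lower-level shape described here might look like this (a sketch; the client argument is whatever server object implements #write):

class Handler
  def on_open(client)
    client.write("welcome") # no runtime extension; the server hands us a writer
  end

  def on_message(client, data)
    client.write(data)
  end
end

# A framework could still layer the sugared style on top, e.g. by calling
# handler.extend(...) itself; Rack would not require it.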


Contributor

SamSaffron commented May 2, 2018

You should be able to opt into the "type" of upgrade you are after.

I can already see two use cases that this protocol does not help with at all:

  1. https://github.com/discourse/discourse/blob/master/lib/hijack.rb

With the current example use of:

https://github.com/discourse/discourse/blob/master/app/controllers/user_avatars_controller.rb#L71-L78

  2. MessageBus also controls transport via: https://github.com/SamSaffron/message_bus/blob/master/lib/message_bus/client.rb and there is no clean protocol here for a "chunked encoding" HTTP request short of SSE (which sort of fits, except for demanding "text/event-stream" as the content type).

I would recommend

class Raw
  def type
    :raw # or :sse or :socket
  end

  def on_open(io)
    # custom header support
    io.write_headers({})

    # transparent chunked encoding support
    io.write_chunk("a chunk")
  end
end

env['rack.raw'] = Raw.new

I am not super happy about rack.upgrade as a name, because it's very tied to WebSocket-style terminology.

@ioquatix

Contributor

ioquatix commented May 2, 2018

Rather than env['rack.raw'] = Raw.new, why not just return [200, {}, Raw.new] and have servers do different things depending on #type?

@SamSaffron

Contributor

SamSaffron commented May 2, 2018

Another thing that makes me somewhat uneasy here is that it still suffers from the very ugly trait that deep down the middleware stack someone returns rubbish to all the rest of the pieces, which then gets thrown in the bin.

So... to mitigate this very bad problem I would recommend:

# only gets called AFTER everything walked down the middleware stack.
def on_open(io, env, response)
  status, headers, result = response

  # custom header support
  io.write_headers({})

  # transparent chunked encoding support
  io.write_chunk("a chunk")
end

Overall I am leaning toward not even mucking with rack long term for a lot of this stuff and moving to fiber servers that can pause and resume more cleanly, multiplexing tons of connections cleanly. A big blocker there, though, is that MRI does not let you ship fibers between threads now, which is not ideal.

That said, this works ... so ... yay? https://github.com/SamSaffron/performance/blob/master/fiber_server/server.ru

@SamSaffron

Contributor

SamSaffron commented May 2, 2018

Also... one extra big question here is ... do we want to bloat all of our Rack servers out there with all this extra logic when simply adding a single dependency to middleware that implements this today is already feasible on top of hijack?

Should we not start with gem install magic_generic_web_socket_support_middleware prior to pushing on implementers to all implement the same thing?

@ioquatix

Contributor

ioquatix commented May 2, 2018

A big blocker there though is the MRI does not let you ship fibers between threads now which is not ideal.

Fibers are actually bound to the thread that creates them by definition.

Some designs use a more generic coroutine structure (e.g. green threads) which you can move between threads. If you are interested in this, you might find https://bugs.ruby-lang.org/issues/13618 interesting.

If you are interested in Fiber based servers, check out falcon. https://github.com/socketry/falcon/blob/master/examples/beer/config.ru and https://github.com/socketry/ which is a complete stack of asynchronous components.

Also... one extra big question here is ... do we want to bloat all of our Rack servers out there with all this extra logic when simply adding a single dependency to middlware that implements this today is already feasible on top of hijack?

Yes, I strongly agree with this, and the answer IMHO is no.

@ohler55

Contributor

ohler55 commented May 2, 2018

@evanphx I appreciate your opinions, and it is good to have a different perspective from what we had. I can't agree that the performance would be any different in one case over the other, though. The other points are certainly worth considering.

@SamSaffron

Contributor

SamSaffron commented May 2, 2018

Yes, I am across Eric's work there and hope we get something in MRI. I also think Koichi is open to having a protocol for moving Fibers between threads, and I want to see proper green threads back.

I've also seen Falcon. I agree that a fiber-based server is very appealing for a bunch of workloads; it simplifies so much of the mess we have now with purely threaded servers and slow requests, especially because you can walk down the middleware stack properly at the right time.

@tenderlove

Member

tenderlove commented May 2, 2018

should we not start with gem install magic_generic_web_socket_support_middleware prior to pushing on implementers to all implement the same thing?

TBH I'm OK with it if it's an opt-in. This proposal seems simple enough that someone could implement a "hijack" based approach using a middleware. The upside of formalizing it is that if webservers want to implement a high performance version of magic_generic_web_socket_support_middleware they can. I think of it as similar to X-Sendfile: yes we have a middleware that can do it, but webservers can implement an accelerated version since we have the standard.
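
As a rough sketch of that opt-in idea, a hijack-based middleware could polyfill the extension when the server lacks native support. Everything below is hypothetical - the handshake and frame loop are elided placeholders, not a real gem:

# Hypothetical polyfill: advertise rack.upgrade, and if the app opts in,
# fall back to full hijack and drive the handler callbacks ourselves.
class UpgradePolyfill
  def initialize(app)
    @app = app
  end

  def call(env)
    if env['rack.hijack'] && env['HTTP_UPGRADE'].to_s.casecmp('websocket').zero?
      env['rack.upgrade?'] = :websocket
    end

    status, headers, body = @app.call(env)

    if (handler = env['rack.upgrade'])
      env['rack.hijack'].call
      io = env['rack.hijack_io']
      # Placeholder: a real middleware would complete the WebSocket
      # handshake here, then run a frame loop that dispatches to
      # handler.on_message and handler.on_close.
      handler.on_open(io) if handler.respond_to?(:on_open)
    end

    [status, headers, body]
  end
end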

@ohler55

Contributor

ohler55 commented May 2, 2018

@SamSaffron in regard to on_open taking an io, env, and response: by the time on_open is called, the connection has already been established. Since it is either WebSocket or SSE, a response and env are no longer relevant. The intent was to allow middleware to decide if a connection should be established based on the return status in the #call(env) method that asked for the upgrade.

@SamSaffron

Contributor

SamSaffron commented May 2, 2018

@tenderlove My issue though is that this encourages a whole bunch of code duplication... say Puma takes this on; now it needs to ship with a WebSocket protocol as a strong dependency, or worse still, it will carry its own duplicate WebSocket protocol thingy. Chunked encoding is easy enough, but there is a fair amount of code to do WebSocket upgrade and encoding, depending on how far you want to take it and how many protocol variants you want to support.

@ioquatix

Contributor

ioquatix commented May 2, 2018

I think only on_open should take an IO. It should be the responsibility of the instance to store the IO, IMHO. This way you can call @io.write at any time, e.g. from a timer.

@tenderlove

Member

tenderlove commented May 2, 2018

I think only on_open should take an IO. It should be the responsibility of the instance to store the IO, IMHO. This way you can call @io.write at any time, e.g. from a timer.

I'd rather they all take the IO object. You can save to an ivar if you want.

@ioquatix

Contributor

ioquatix commented May 2, 2018

I'd rather they all take the IO object.

What's the reasoning behind that? Is it more efficient?

@tenderlove

Member

tenderlove commented May 2, 2018

To address @SamSaffron's "junk data" comment, I totally agree. Maybe we could use headers to indicate to the webserver that the body object is "special" (it knows websockets):

class WSBody
  def each
    yield "Webserver said it does WS, but lied"
  end

  def on_open io
  end

  # ... more on_* methods
end

# App

app = lambda do |env|
  if env['rack.supports_ws?']
    [ 200, { "X-Websocket" => "on" }, WSBody.new ]
  else
    [ 404, {}, [ "This was supposed to be a WS response" ] ]
  end
end

# Webserver

status, headers, body = app.call("rack.supports_ws?" => true)
if headers["X-Websocket"]
  # do websocket stuff with the body
  body.on_open io # `io` would be the client socket the webserver manages
else
  # do the normal `each` thing
end

I think it would eliminate questions about what to do WRT middleware (this type of API would insist that the thing responding to Websockets not forward to another middleware)

@ohler55

Contributor

ohler55 commented May 2, 2018

@SamSaffron code duplication on the server implementation is really a matter left up to the server author. I don't think that is relevant to the proposed spec addition, which is proposed to be optional.

@ioquatix using an io object or the extended handler itself is pretty much the same in regard to some external writer loop. The example code on Agoo demonstrates that. That's not to say that an io object might not be preferred for other reasons, though.

@boazsegev

boazsegev commented May 2, 2018

@SamSaffron

My issue though is that this encourages a whole bunch of code duplication

I think code duplication is one of the concerns this PR will solve.

Right now we have tons of applications each implementing their own WebSocket protocol (there's only one standard protocol, so historical variants aren't necessary).

On top of that, all these applications have their own implementation of IO handling logic (their own reactor, nio4r, or some other approach).

Consolidating the dependency in the server actually minimizes code bloat as most of the code is already there. Sure, adding the WebSocket parser might be new code, but the network and IO handling logic is already there.

@tenderlove

Member

tenderlove commented May 2, 2018

What's the reasoning behind that? Is it more efficient?

It means that you can implement a stateless object. I prefer to avoid maintaining state when possible. If the methods take the IO object as a parameter, I am allowed to implement a stateless object, but it is my choice. Insisting that only on_open takes the IO object forces me to maintain state (I no longer have a choice).
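
For illustration, here is a sketch of the difference, with hypothetical signatures, assuming every on_* callback receives the client as discussed above:

# Stateless: one frozen handler instance can serve every connection,
# because each callback is handed the client it should write to.
class EchoHandler
  def on_open(client); end

  def on_message(client, message)
    client.write(message)
  end

  def on_close(client); end
end

# Stateful (still allowed): stash the client in an ivar in on_open,
# e.g. so a background timer can write outside any callback.
class TickerHandler
  def on_open(client)
    @client = client
    @timer = Thread.new { loop { sleep(1); @client.write(Time.now.to_s) } }
  end

  def on_close(_client)
    @timer&.kill
  end
end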

@SamSaffron

Contributor

SamSaffron commented May 2, 2018

Since it is either WebSocket or SSE a response and env are no longer relevant.

I think chunked encoding responses are just as good a candidate; they are more widely supported than SSE anyway. In fact I can see almost zero reason to support SSE, because it is pretty much a novelty protocol. See https://caniuse.com/#feat=eventsource compared to https://caniuse.com/#feat=xhr2, which is supported on IE/Edge.

env is very relevant for chunked encoding; you may want to add headers deep in your middleware for CORS and various things like that.

@ohler55

Contributor

ohler55 commented May 2, 2018

@tenderlove in your code example you seem to be assuming WebSocket and SSE calls are triggered by an event from the browser. Maybe I misread the code. My apologies if I did. Anyway, I think a common use case will be to push events from Ruby to the browser. Do you foresee a more request/response pattern initiated from the browser using WebSockets instead of using HTTP?

@tenderlove

Member

tenderlove commented May 2, 2018

@tenderlove in your code example you seem to be assuming WebSocket and SSE calls are triggered by an event from the browser. Maybe I misread the code. My apologies if I did. Anyway, I think a common use case will be to push events from Ruby to the browser. Do you foresee a more request/response pattern initiated from the browser using WebSockets instead of using HTTP?

Derp, yes, sorry. It's been a long day. 😞

@ohler55

Contributor

ohler55 commented May 2, 2018

@tenderlove regarding stateless objects, I think there is some merit to what you say. The principles of the API remain the same, but adding the io object does allow some additional options. It certainly doesn't hurt.

@ioquatix

Contributor

ioquatix commented May 3, 2018

I've been mucking around with a sample implementation of this proposal.

I can implement it entirely as middleware. So, in my mind, it could be something independent of Rack. I didn't implement it exactly because I just reused what I've already done with async-websocket.

Specific feedback regarding the proposal.

With reference to https://developer.mozilla.org/en-US/docs/Web/API/WebSocket

rack.upgrade?

It's okay. It's trivial enough to implement.

It's not clear if the user should return anything. In my case, I use hijack, so no return is necessary (or desirable).

on_open

Makes sense. Do you mean the connection at the socket level or the connection at the web-socket level?

on_message(client, message)

Makes sense. Would it make more sense to call this on_text which is similar to how websocket-driver names things? (then write could be text). Alternatively, call this on_receive and have a send method?

on_shutdown(client)

Not exactly sure where this fits in. It's similar to on_close.

on_close

Do you mean when the websocket connection has been closed by the remote end only (and gracefully?) i.e. when ready state of web socket is changed to close? What about EOF?

on_drained(client)

This seems very asynchronous model dependent. I don't know how to implement this. My model is completely asynchronous so when you call write it won't return until the write is complete, but it's non-blocking. Remove this or make it completely optional (as in, it should still work if unimplemented).

write(message)

Pretty crappy name IMHO, because it directly clashes with the WebSocket::Driver implementation's use of #write (https://github.com/faye/websocket-driver-ruby#usage). If you have to wrap the underlying driver, it's another object allocation. Bleh. Rename it #send or #text?

It would be pretty damn convenient if #send would convert it to JSON first. Perhaps if Accept: application/json is set, it can be done automatically. We could also support other formats. So the user doesn't need to worry about the underlying transport to send complex objects (e.g. hash, array, and so on).

close()

If the user calls this, what is the sequence of events afterwards? on_close? on_shutdown? Nothing?

open?()

In distributed systems, it's hard to know if anything is truly open. But you can usually tell with certainty if something is closed. Rename it to closed? - it's typical Ruby: IO#closed?, Socket#closed?, and so on.

pending()

Again, this is very async model specific. Remove?

Summary

I still stand by my original assessment. But, I think it's an interesting idea. This proposal implements something which can be done entirely in middleware with minimal overhead if your server has a half-decent concurrency model. I don't see a clear benefit to making Rack more complex in this regard and encoding a concurrency model into the SPEC. However, I can see value in making some kind of standard layer. I just don't feel like it belongs in the server to implement it; middleware would make more sense. It's also more flexible for changes in the future. The rack.hijack API might be simple, but it does solve this problem elegantly and allows servers to expose whatever concurrency model makes sense.

I still believe streaming request/response bodies to be the right way forward, but I'm still learning about how this all fits together with HTTP/2 and whether it's possible to use XMLHttpRequest in a similar way to WebSockets in the browser. It's simpler to set up, multiplexes better (when running on HTTP/2), and has a simpler model - basically read a chunk from the stream, write a chunk to the stream - and it works with the existing request/response model without any real changes.

It would be interesting to see if the WebSocket draft for HTTP/2 is ever implemented by anyone.

Anyway that's my 2c. Feel free to play around with the implementation :)

@ioquatix

Contributor

ioquatix commented May 3, 2018

An addendum: a concurrent rack.hijack IO should be possible for most servers with very minimal surface area (it's what falcon does). It would still require some kind of rack.upgrade approach, since to make a worker truly async you'd need to expose part of the concurrency model, as in "Your rack.upgrade object will be run in its own thread with blocking IO" as the base requirement, but with more advanced servers able to provide better guarantees (e.g. if rack.multithread is false, you can assume an async-like model if things are running concurrently). Again, it starts to get pretty hairy in terms of how you define that model, and whether that should really be in Rack, but it's probably a more generic approach and it would work with HTTP/2 as well (in general, rack.hijack with an IO-like API should map directly to HTTP/2 streams, and it's the next thing I want to try in falcon). It wouldn't even preclude you from implementing web sockets on top of it with whatever you currently use, and a default implementation as a middleware which really does just make a thread per request is entirely possible.

@ohler55

Contributor

ohler55 commented May 3, 2018

I appreciate you taking the PR seriously enough to devote time to exploring implementation options. Thank you. Nice summary as well. I'd like to point out that all the methods are optional. If the handler does not respond to the method, then it is not called.

rack.upgrade ✔️

on_open indicates the connection has been upgraded to WebSocket or SSE and is ready for sending messages.

on_message ✔️

on_shutdown I agree, it does not seem necessary.

on_close Yes, called when the socket is closed. Ideally gracefully, but that assumes the client did a graceful close, which may not be the case. Basically the socket is no longer writable and nothing else will be received from it. Since both WebSocket and SSE are message based, the notion of EOF doesn't really come into play, as it is a streaming concept. Best case, EOF is the same as closed; or, another way to look at it, if the server is reading a message and gets an EOF error, it assumes the socket is closed and tells the application that the connection is closed via the on_close callback. That is the way Agoo handles it, anyway.

on_drained indicates no more messages are waiting to be sent. Depending on the implementation this could indicate the previous message was sent and another can be sent or the backlog of messages has dropped to zero. It is meant as a way for the application to meter itself and detecting slow consumers. In my opinion it is nice to have to avoid slow consumer attack vectors.

write could certainly be renamed, but send would conflict with the base Ruby Object#send, so it would have to be something different. #write is the second-best choice. You seem to have agreed, since you used it in Faye, which of course is where the conflict lies. How about push?

close when called initiates a graceful close of the connection. Once closed the on_close callback should be called.

open? flipping the call to closed? makes sense. Agreed.

pending is async specific. It could be removed and everything would still be fine. It was added to address slow client issues and allow an application to defend against slow client attacks. How about returning some value that indicates it is not supported?

Thanks again for the well thought out response.
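
Putting those answers together, a handler under the proposal might look like the following sketch; client stands for whatever object the server passes in, every callback remains optional, and the exact names are still under discussion:

# Sketch of a handler using the callbacks as discussed in this thread.
class ChatHandler
  def on_open(client)
    client.write("welcome") # the upgrade is complete; writing is allowed
  end

  def on_message(client, message)
    client.write("you said: #{message}")
    client.close if message == "bye" # graceful close; on_close will follow
  end

  def on_drained(client)
    # outgoing buffer is empty again - safe to resume a paused producer
  end

  def on_close(client)
    # the connection is gone; no further writes will be delivered
  end
end

# The app opts in from #call by assigning the handler:
app = lambda do |env|
  env['rack.upgrade'] = ChatHandler.new if env['rack.upgrade?']
  [200, {}, []]
end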

@ohler55

Contributor

ohler55 commented May 3, 2018

I like the idea of using rack.multithread as an indicator to the application as to what it can expect in regard to the async model, especially since it is already part of the spec and seems to be a perfect fit.

@ioquatix

Contributor

ioquatix commented May 3, 2018

write could certainly be renamed but send would conflict in the base Ruby Object #send so something different. #write is the second best choice. You seem to have agreed since you used it in Faye which of course is where the conflict lies. How about push?

Don't get me started with send and Ruby. I had no end of problems with it several years ago, when I first started playing with Ruby's IO. Fortunately, __send__ now exists in Ruby. I sometimes wonder if we are writing Python :p

To me, send/recv are socket level, but they resonate for higher level network IO too.

read/write make me think of buffered IO.

As this is a message based protocol, and you chose on_message, why not then send_message?

pending and on_drained ...

I understand where you are coming from.

In my client code, the following happens:

connection.send_message(....) -> @driver.text -> @socket.write(protocol level data) -> Socket#write_nonblock -> okay, :wait_writable or EPIPE
connection.send_message(....) -> @driver.text -> @socket.write(protocol level data) -> Socket#write_nonblock -> okay, :wait_writable or EPIPE
connection.send_message(....) -> @driver.text -> @socket.write(protocol level data) -> Socket#write_nonblock -> okay, :wait_writable or EPIPE

You literally can't send messages faster than they can be written to the socket, because you'll yield on :wait_writable and select(io, writable). Once the data is written to the OS socket, how do you know whether it's pending or not? I mean, eventually that buffer will fill up. But do we get receipts for messages in WebSockets? I don't know enough about it; I only used websocket-driver and left the rest up to that gem. Can we get a pending acknowledged count from that implementation?
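
The backpressure described above can be sketched with plain Ruby sockets: write_nonblock either accepts bytes or signals :wait_writable, so a per-connection fiber or thread cannot outrun the socket. A minimal sketch:

require 'socket'

# Write all of `data`, parking on IO.select whenever the kernel's
# send buffer is full - the caller is throttled to socket speed.
def write_all(socket, data)
  until data.empty?
    begin
      written = socket.write_nonblock(data)
      data = data.byteslice(written..-1)
    rescue IO::WaitWritable
      IO.select(nil, [socket]) # wake up when the socket is writable again
    end
  end
end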

@ioquatix

Contributor

ioquatix commented May 3, 2018

In an implementation where messages are buffered somehow, where do you propose to handle EPIPE? Should it never occur, with on_close invoked instead? Or do we need on_error? To me, it suggests implementing something like on_error to handle EPIPE.

If your concurrency model is more event driven, I can imagine you would buffer the message and actual writing might occur on a different call stack.

@ioquatix

Contributor

ioquatix commented May 3, 2018

In my implementation, when you have a loop like while event = @driver.next_event, if you receive EOF, event is nil. It's not an exception. Then, it would call on_close. However, sometimes it might be caused by ECONNRESET. This is really an error condition. Should it be on_close(error = nil)?

But if you call send_message and the remote end is gone, you may get EPIPE, Errno::ECONNRESET, or on macOS even EPROTOTYPE (https://bugs.ruby-lang.org/issues/14713). It will blow up in the user's code.
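
A sketch of that loop, assuming an async-websocket-style driver whose next_event returns nil on EOF; the error-to-callback mapping is exactly the open question here:

# Drive a handler from a driver event loop, folding both graceful EOF
# and reset-style errors into on_close.
def run(driver, handler, client)
  while (event = driver.next_event) # nil means EOF, not an exception
    handler.on_message(client, event) if handler.respond_to?(:on_message)
  end
  handler.on_close(client) if handler.respond_to?(:on_close)
rescue Errno::ECONNRESET, Errno::EPIPE => error
  # a real error condition - one option is on_close(error), another on_error
  handler.on_close(client) if handler.respond_to?(:on_close)
end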

@ohler55

Contributor

ohler55 commented May 3, 2018

You bring up some interesting points. For Agoo the writes are indeed on a different call stack, but I like the idea of providing more information to the application about the reason for a close. Maybe the callback should be on_close(err), where err is nil for a graceful close. Yes, we are thinking along the same lines.

I would expect a write to fail with some kind of exception if the connection has been closed. Agoo raises an IOError.

@boazsegev

boazsegev commented May 3, 2018

@ioquatix, it is my humble understanding that this specification allows us to abstract away the network layer as much as we can.

This allows all persistent connections to share the same API regardless of their type (WebSockets, SSE, raw TCP/IP, etc.).

It's my understanding that EOF and other network details should be handled by the server. They aren't application related events.

The reverse is also true. JSON is an application concern; it doesn't fit every persistent connection out there (i.e., binary transmissions in WebSocket data).

It's my recommendation that the server require a String object and let the application deal with making sure the data is formatted as a String.

Network details could be inferred by the server (UTF-8 for text encoded messages and otherwise the server should probably assume binary data, which some connection objects might not support).

on_shutdown

on_shutdown I agree, it does not seem necessary.

I humbly disagree.

Maybe I'm assuming but I think applications can benefit from the knowledge that the client is closing because the server is shutting down.

There are three possible disconnection causes:

  1. The connection is deemed unnecessary (this could be a proxy timeout due to inactivity).
  2. The server is shutting down (the client might be mid-transmission).
  3. Network errors (which we can do nothing about).

True, on_shutdown is only called when the server is shutting down (not for every connection), but it's a good way to prevent communication errors and keep the disconnection polite - especially when using server clusters or load balancing.

This could be used by load-balanced micro-services as well, to indicate a node is going away (and perhaps send a list of alternate nodes in a gossip-based system).

on_drained / pending()

@ioquatix, since you're using a blocking model, where the method returns only after the write is complete, pending can easily return 0 - no pending writes - indicating that on_drained isn't about to be called (maybe ever).

Do note that this blocking design might be risky if, for example, a database connection was checked out of a connection pool, or the write method was called within a critical section (i.e., looping over all clients).

write

write - Pretty crappy name IMHO...

Actually, just because other implementations use this amazing name (which is also the name of an underlying system call), doesn't mean that the specification should choose a poor name...

...in fact, it just supports the choice as a good one - developers will have a good idea as to what happens when they call this method.

IMHO send conflicts with the Ruby object send, which might prevent an error from being detected (i.e., if send is called on the wrong object, it will not raise a "missing method write" error, but rather produce a long gibberish message).

push seems (IMHO) less intuitive and more protocol specific (SSE). It also implies objects rather than data.

IMHO, write describes exactly what's happening. It might behave slightly differently from an IO write (no partial write allowed), but it performs the same function.

open?

open? flipping the call to closed? makes sense. Agreed.

Actually I think that closed? will make application code harder to maintain. It's too easy to mix up close and closed?, and vice versa.

Using open? is safer, even if the server's information regarding the connection is never fail-proof.

on_close(err) / on_error(err)

I really think network / connection errors aren't an application concern, they are a server concern.

However, if we are to report an error, I would wonder if it would make sense to place the error information in a different callback, preventing the need for the obvious if(err) block of code in on_close.

@boazsegev

boazsegev commented May 3, 2018

@ioquatix

In my implementation, when you have a loop like while event = @driver.next_event, if you receive EOF, event is nil. It's not an exception. Then, it would call on_close. However, sometimes it might be caused by ECONNRESET. This is really an error condition. should it be on_close(error = nil)?

ECONNRESET is a perfect example of an error that the application really shouldn't receive. Why should the application have any information about network errors? What would it do with it?

IMHO, this is a Server related concern and it should be handled by the server.

@ohler55

Contributor

ohler55 commented May 3, 2018

I have less of an issue passing some error back to the application with on_close. The application can't do anything about it but it isn't harmful either.

@boazsegev you make some good points I was a little too willing to compromise on.

I'm not sure where we go from here. I assume we need a 'member' to step in again. How about it, @tenderlove - how should we proceed on this PR?

@boazsegev

boazsegev commented May 3, 2018

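To make the queued-write flow above concrete, here is a minimal Ruby sketch of a buffered write path. The WriteQueue class, the event loop's schedule method, and the handler's on_drained callback are hypothetical names for illustration; this is not iodine's or Agoo's actual code.

```ruby
class WriteQueue
  def initialize(io, event_loop, handler)
    @io = io                # the raw socket, owned by the server
    @event_loop = event_loop
    @handler = handler      # application callback object
    @buffer = []            # data accepted from the app, not yet written
  end

  # Called by application code; performs no network I/O itself.
  def write(data)
    @buffer << data.dup
    @event_loop.schedule { flush }  # the "flush_connection" event
    true
  end

  # Outgoing writes that have not yet reached the OS socket.
  def pending
    @buffer.length
  end

  private

  # Runs on the event loop, after application code has returned.
  def flush
    until @buffer.empty?
      data = @buffer.first
      written = @io.write_nonblock(data, exception: false)
      break if written == :wait_writable  # socket full; retry on the next event
      if written < data.bytesize
        @buffer[0] = data.byteslice(written, data.bytesize - written)  # keep the tail
      else
        @buffer.shift
      end
    end
    # Buffer empty: data was handed to the OS, with no guarantee it reached the client.
    @handler.on_drained(self) if @buffer.empty?
  end
end
```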

@ohler55

ohler55 May 3, 2018

Contributor

Agoo uses a similar approach.

@boazsegev

boazsegev May 4, 2018

write - exception vs. true/false?

Reading through the thread I noticed how hard it might be to predict a write failing.

This means that user code must use begin/rescue blocks to address network issues.

Since a failing write is a predictable and possible result in the normal flow, wouldn't it be better if the server handled any exceptions and simply returned true if the data was written (or was scheduled to be written) to the socket, and false otherwise?

I think this would improve performance and also prevent the obvious nesting of exception-handling code (in both the server and the app).

Thoughts?

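For illustration, here is a minimal sketch of how application code would look under each convention; client and payload are hypothetical names:

```ruby
# Exception convention: every write site needs its own rescue.
begin
  client.write(payload)
rescue IOError, Errno::EPIPE, Errno::ECONNRESET
  # the connection is effectively gone; clean up application state here
end

# Boolean convention: the server absorbs the network error internally.
unless client.write(payload)
  # false: the data was not written or scheduled; treat the connection as closed
end
```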

@ohler55

ohler55 May 4, 2018

Contributor

I had assumed raising an exception was the natural behavior, but maybe that is not the best approach. Even returning true or false does not necessarily indicate the message was delivered. I think the real question is: what information would be useful to the app? Maybe nothing. Callbacks exist for closing, which lets the app know writes will no longer be delivered. Is an acknowledged protocol desired on top of WebSockets? I think that might be a step too far.

@ioquatix

ioquatix May 4, 2018

Contributor

I think the next steps are:

  • To finish several implementations of the proposed spec; this will expose any obvious design issues with the spec.
  • To implement several practical applications and test interop (including periodic/timer events, pub/sub, and upstream interaction, e.g. database/redis) - if we can't make useful applications that work on different servers (i.e. if you start relying on iodine's pub/sub, which can't be implemented in puma), it's going to be a problem.
  • To prepare a finalised spec which takes into account the above, and then have a formal vote, probably only involving actual stakeholders: Ruby server implementors and Rack contributors.

@ohler55

ohler55 May 4, 2018

Contributor

There are currently two implementations of the originally proposed spec, Agoo and Iodine. I'd think that is proof that the spec is workable.

Waiting for several applications to be implemented against a spec that has not been approved seems rather unlikely. That pretty much means no changes can be made to the spec because there is no spec to follow. Rather circular logic.

So the purpose of this PR was to refine the proposal and then get a vote, which would result in this PR being either approved or rejected. How do we get to that phase?

@ioquatix

ioquatix May 4, 2018

Contributor

Waiting for several applications to be implemented against a spec that has not been approved seems rather unlikely.

I don't think so. Making some sample apps is a great way to document the API, and it provides a great starting point for new users. And finally, if the sample apps can't work between servers, what's the point of a shared SPEC?

@ioquatix

ioquatix May 4, 2018

Contributor

There are currently two implementations of the originally proposed spec, Agoo and Iodine. I'd think that is proof that the spec is workable.

I think we need to see a working implementation in at least Puma & Passenger as a baseline, since those are the servers most commonly used for Ruby development and production.


@ohler55

ohler55 May 4, 2018

Contributor

I believe the suggested bar was several practical applications. There are already sample applications in the Agoo and Iodine repositories.

@ioquatix

ioquatix May 4, 2018

Contributor

To implement several practical applications

Something more practical than "Hello World". For example, a shared real-time Todo list :p

I think we should be involving some database activity in the callback, probably using ActiveRecord or Redis.

If you've already got several sample applications, once we finalise a draft SPEC and we all implement it, we should be able to all run those apps on our respective servers, right? That's the whole point of having a shared SPEC.


@matthewd

matthewd May 4, 2018

Member

I still don't see a statement about whether write can block.

I'm also unconvinced that pending/drained is an effective solution for non-blocking writes.

I remain of the opinion that this should remain a draft / small-e extension while real-world implementations adopt it and fully exercise it.

Easy-mode for convincing me would be a PR to Rails adapting Action Cable to this API -- which would also allow like-for-like benchmarking.


@boazsegev

boazsegev May 4, 2018

@matthewd,

Although I understand the suggestion (adding a Rails PR), I find it quite unfair.

I can program a server in C and I can wrap it in a gem for Ruby MRI... but I know nothing about ActionCable, its inner workings, or the Client<=>Application protocol used above the WebSocket layer.

The learning curve and time spent on a Rails patch would only be viable if the PR is conceptually acceptable and we all intend to move forward with the proposal and implement it.

I really doubt that pushing a PR to Rails is something that should happen before we agree that the PR is conceptually acceptable (i.e., contingent on benchmarks, or whatever).

As a side note, I also don't think this should enter a Rails 5.x update release. This PR has the potential to retire the Rails IO handling layer (the one based on nio4r) and reduce the existing codebase significantly - at least where WebSockets and SSE are concerned (and long polling could be achieved by JavaScript SSE polyfills, retiring hijack altogether).


@boazsegev

boazsegev May 4, 2018

I still don't see a statement about whether write can block.

@matthewd, I assumed (since there's no statement) that it would be implementation-defined.

I'm assuming that blocking servers will be avoided by applications that are affected by the behavior.

But I totally understand if you believe there should be a concurrency model requirement regarding write and close not blocking. After all, if write blocks, pub/sub performance might grind to a halt.


SPEC
+ on_drained(client) # may be called when the number of pending writes drops to zero.
+ The <tt>client</tt> is used for writing and checking status of the upgraded connection. It has these methods.
+ write(message) # writes to the WebSocket or SSE connection
+ close() # forces a close of the WebSocket or SSE connection
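
For concreteness, a minimal sketch of an application-side handler against the quoted interface. The callback names follow the draft; the env wiring (rack.upgrade? / rack.upgrade) is a rough illustration of the proposal's flow, not normative text:

```ruby
# Illustrative handler for the draft client API quoted above.
class EchoHandler
  def on_open(client)
    client.write("welcome")            # WebSocket frame or SSE event
  end

  def on_message(client, message)
    client.write("echo: #{message}")
  end

  def on_drained(client)
    # queued writes have drained; safe to resume sending
  end

  def on_close(client)
    # connection is gone; no further writes will be delivered
  end
end

# Rough wiring inside a Rack app:
app = lambda do |env|
  if env['rack.upgrade?']              # e.g. :websocket or :sse when offered
    env['rack.upgrade'] = EchoHandler.new
    [200, {}, []]                      # the server completes the upgrade
  else
    [200, { 'Content-Type' => 'text/plain' }, ['HTTP fallback']]
  end
end
```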

@boazsegev

boazsegev May 4, 2018

After reading through @ioquatix's and @matthewd's comments, I wonder:

Perhaps close() should "schedule the connection to close once all pending write calls have been performed"?


@ioquatix

ioquatix May 4, 2018

Contributor

If you want to have that, implement it as close_write which is what Ruby calls shutdown(SHUT_WR). close should guarantee after returning that the underlying socket is closed. Otherwise, you are in for a world of pain.


@ohler55

ohler55 May 4, 2018

Contributor

Agoo treats a call to close as a request just like a write and places the close on the queue.


@boazsegev

boazsegev May 4, 2018

@ohler55, so does iodine. I place a "close" marker on the packet queue, so close is always performed after all scheduled data is sent.

@ioquatix, I think you're missing the point of abstracting away the network and the protocol.

My suggestion was about clarifying this little bit, not changing what both iodine and agoo already implement.

We aren't authoring a network layer. We are authoring an abstracted application side API.

The reasonable expectation is that write is performed before close, i.e., if my code is:

write "hello"
close

The reasonable expectation is that "hello" is actually written.

There's no force_close or close_write in the specification because the application shouldn't be concerned with these things. If the application doesn't want data written, it can avoid writing it.

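A minimal sketch of the close-marker behavior both servers describe, with a hypothetical queue class (the real implementations are in C):

```ruby
# Sketch: close is just another item on the outgoing queue.
class Connection
  CLOSE = Object.new  # sentinel marking where close was requested

  def initialize(io)
    @io = io
    @queue = []
  end

  def write(data)
    @queue << data
  end

  def close
    @queue << CLOSE   # performed only after all earlier writes
  end

  # Driven by the event loop as the socket becomes writable.
  def flush
    while (item = @queue.shift)
      if item.equal?(CLOSE)
        @io.close
        break
      end
      @io.write(item)
    end
  end
end
```

With this shape, write "hello" followed by close guarantees "hello" is flushed before the socket is closed.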

@ioquatix

ioquatix May 4, 2018

Contributor

You can make close call flush, or you can flush after every write. But if you make close call flush, you'd better be careful about EPIPE.


@ioquatix

ioquatix May 4, 2018

Contributor

For a high level protocol like this, calling flush after each write would make sense to me.

It gives the user a strong expectation: after calling write, the data has been sent and, barring a network failure, will arrive; otherwise the write fails right then with, say, EPIPE. Without that, you'll just end up with a spaghetti state machine trying to handle all these conditions.

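A sketch of those semantics, assuming a synchronous (fiber- or thread-per-connection) server where write may block; the surrounding class and its close method are hypothetical:

```ruby
# Fail-fast write: data is flushed before returning, so errors such as
# Errno::EPIPE surface at the call site rather than in a later callback.
def write(data)
  @io.write(data)  # may block until the OS accepts the bytes
  @io.flush        # raises (e.g. Errno::EPIPE) if the peer is gone
rescue SystemCallError, IOError
  close
  raise
end
```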

@boazsegev

boazsegev May 4, 2018

@ioquatix,

I hope I don't seem too blunt or crude. I very much appreciate the interest and willingness to polish the specification and make it both as clear and as practical as can be.

However, please consider the model to be totally separate from the network - there is no network. There's only this API.

We can change the API if we need features, but we don't expose network bits or logic because part of our job is to abstract these things away - so there is no network, there is no protocol (as much as possible).

In this sense, flush doesn't exist. It's a network / server detail that the application never sees, abstracted away by the server.

The closest an application can come to asking about these things is to query the pending outgoing write events that haven't yet completed. This allows an application to know whether the on_drained callback is somewhere in its future.

The pending query doesn't expose the network; it exposes the progress of existing API calls. This provides important information about the possibility of a slow client (or a slow-clearing "queue"), allowing an application to stop serving a resource-hungry "client".

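As an illustration, an application might use pending (per the draft spec) exactly this way to shed slow consumers; the threshold and broadcast helper are hypothetical:

```ruby
MAX_PENDING = 64  # arbitrary backpressure threshold

def broadcast(clients, message)
  clients.each do |client|
    if client.pending > MAX_PENDING  # writes are queuing faster than they drain
      client.close                   # stop serving the resource-hungry client
    else
      client.write(message)
    end
  end
end
```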

@ohler55

ohler55 May 4, 2018

Contributor

If I understand your response, you are answering yes to the question of whether or not a flush method should be added. Then in any application you wrote, you would block until the write completes instead of making use of the on_drained callback. That is your choice, of course.


@ioquatix

ioquatix May 4, 2018

Contributor

However, please consider the model to be totally separate from the network - there is no network. There's only this API.

Fair enough.


@ioquatix

ioquatix May 4, 2018

Contributor

If I understand your response, you are answering yes to the question of whether or not a flush method should be added. Then in any application you wrote, you would block until the write completes instead of making use of the on_drained callback. That is your choice, of course.

I really find the inverted flow control of callback-style programming horrible. So I prefer #flush over an #on_drained callback. Callbacks = state machine spaghetti = hard-to-maintain/buggy code. It's just my personal opinion, FYI.


@ohler55

ohler55 May 4, 2018

Contributor

@matthewd would a statement leaving the blocking behavior up to the server be enough, or maybe a method to ask whether a write would block other threads?


@boazsegev boazsegev referenced this pull request in boazsegev/iodine May 8, 2018

Closed

How to debug catastrophic crash #34

@boazsegev

boazsegev May 8, 2018

I have an initial development implementation of the updated draft, placed in iodine's 0.6.0 version branch (installable from source).

There are some features being rewritten (such as pub/sub extensions, including Redis integration), but the core WebSocket/SSE and pub/sub API will work.

I think this would be enough to author a test application using a monkey-patched Rails...

@matthewd, any final changes / requests / tips before we start working on a "like-for-like benchmarking" example app? How do you want to benchmark this? Should we use the websocket-shootout approach?

(it might not test what we're looking at, as memory consumption and performance will still depend heavily on the pub/sub system rather than the connection system...)


@ohler55

ohler55 May 8, 2018

Contributor

The Agoo develop branch is also compatible with the spec as described in the current state of this PR. I should make a versioned release later this week, once the named pub-sub feature is completed.


@ohler55

ohler55 May 9, 2018

Contributor

I cleaned up the formatting of the SPEC around the upgrade section. An HTML version is at http://www.ohler.com/agoo/rack/file.SPEC.html.


@boazsegev

boazsegev May 14, 2018

@evanphx, how is the Puma adaptation going?

So far, the API iodine implements for the client object looks like this.

It includes a number of methods not listed in the specification besides the Pub/Sub extensions (the timeout, timeout= and protocol methods).

I also added the env accessor so I could avoid making objects just to keep the env alive. This is something I need for an upcoming plezi.io release and I'm not sure if it should be adopted (as it adds longevity to all the objects in the env Hash, which is bound to increase memory fragmentation).

Anyway, I hope to see this move forward. Iodine has been serving clients like this for the last couple of years or so (albeit using extend rather than a client object) and so far the feedback is positive and the performance is promising.


@ioquatix

ioquatix May 14, 2018

Contributor

If you need extensions to make anything useful, they need to be part of the spec IMHO.


@boazsegev

boazsegev May 15, 2018

@ioquatix, this was my thought as well, which is why I wrote it here. Perhaps these added methods (timeout, timeout= and protocol) should be part of the specification.

On the other hand, I don't have to have the env variable; I just as easily could have created a callback object (using a class instance) instead of a static module... it was more of a design preference that I thought I should mention.

EDIT I should note that the timeout methods aren't required to make WebSockets / SSE work. It's just that iodine also supports raw TCP/IP connections and these connections require access to timeout management.


@ohler55

ohler55 May 15, 2018

Contributor

The entire pub-sub pattern can be implemented by keeping an Array of connections. A publish would then call write on each one. The reason pub-sub was not included in the PR was that it added another new use pattern to the spec that would take even more effort for servers to comply with.

Like @boazsegev's Iodine, Agoo includes a pub-sub implementation that is compatible to a large degree with Iodine's. In the end we wanted something basic that could be used with a minimum of changes to Rack and the way developers use it.

The current PR focuses just on push without also introducing pub-sub. Maybe a future PR can propose the addition of pub-sub. One step at a time.

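A sketch of that Array-of-connections approach; as the next comment notes, it only works within a single process, and the channel and helper names here are illustrative:

```ruby
require 'set'

SUBSCRIBERS = Set.new   # connections subscribed to a single, global channel
LOCK = Mutex.new

def subscribe(client)   # e.g. from on_open
  LOCK.synchronize { SUBSCRIBERS.add(client) }
end

def unsubscribe(client) # e.g. from on_close
  LOCK.synchronize { SUBSCRIBERS.delete(client) }
end

def publish(message)
  LOCK.synchronize { SUBSCRIBERS.to_a }.each do |client|
    client.write(message)  # fan out to every subscriber
  end
end
```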

@ioquatix

ioquatix May 15, 2018

Contributor

The entire pub-sub pattern can be implemented by keeping an Array of connections. A publish would then call write on each one. The reason pub-sub was not included in the PR was that it added another new use pattern to the spec that would take even more effort for servers to comply with.

This wouldn't work with falcon in forked mode, since it forks one process per CPU core and they all handle connections independently.

Even in threaded mode, some kind of synchronisation is required. I believe the same problem would affect puma in cluster mode, for example.

My opinion is that this spec needs to be feature complete, as in, it's possible within the confines of the spec to implement actual useful websocket applications. Otherwise, what's the point? If every server supports the core spec but provides their own incompatible models on top of that, this isn't going to work out very well.


@ohler55

ohler55 May 15, 2018

Contributor

The PR, as it stands, is feature complete for WebSocket and SSE. Sure, extending the PR to include pub-sub would be a problem for servers that don't share state but that has nothing to do with WebSocket and SSE support proposed here.

@ioquatix, slamming @boazsegev's example code that is meant to demonstrate not only the proposed Rack additions but also other features of Iodine, and then claiming that example invalidates this PR, is ludicrous. The PR offers a clean and simple way for even new developers to take advantage of WebSockets and SSE. Some of the comments, such as those from @tenderlove, were constructive and genuinely aimed toward enhancing the current Rack spec with the addition of WebSockets and SSE. You have fought tooth and nail against it from the start with an extreme passion to block any variation on the PR. I don't understand why you feel so threatened by this PR. Can you explain why you have such strong feelings about offering Rack users a simple-to-use approach to WebSocket and SSE?


@matthewd matthewd locked as too heated and limited conversation to collaborators May 15, 2018

@matthewd

matthewd May 15, 2018

Member

Okay, this conversation clearly needs to go back on ice for a bit.

I don't agree that pub/sub is within scope, but if we can't do each other the courtesy of assuming genuine opinions and positive/constructive intentions, then we're not going to get anywhere useful.


@tenderlove

tenderlove May 15, 2018

Member

TBH I'm kind of worried about committing to an API without a real world app using it. The reason is because I want to make sure we don't forget anything we need to implement actual apps. If we could get Action Cable running on top of a server that used this API, then I'd be satisfied to merge.

