Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep processing incoming data even after we have initiated a close handshake. #72

Merged
merged 5 commits into from Sep 12, 2019

Conversation

najamelan
Copy link
Contributor

@najamelan najamelan commented Sep 11, 2019

This commit will make it so that after initiating a close handshake but before receiving confirmation from the remote, we still pass data messages to client code. An integration test is included verifying that it works correctly.

This fixes the issue as described here: snapview/tokio-tungstenite#69 (comment)

Note that the websocket RFC is not specific about this. There is however no good reason for tungstenite to impose unnecessary data loss on clients.

@daniel-abramov
Copy link
Member

Note that the websocket RFC is not specific about this. There is however no good reason for tungstenite to impose unnecessary data loss on clients.

I personally agree on that.

I think the original idea (implemented a couple years ago) was to enforce this "loss" of messages, so that if there is a buggy server on the other side which ignores your Close message, it's messages are dropped. But yeah... that does not make much sense.

@najamelan
Copy link
Contributor Author

I can clarify something about the commit: "Improve WebSocketState interface with Copy, PartialEq, ... ".

It's not strictly necessary for this pull request to work, but I added that when looking into the other PR for sending after close. It seems to me that the method check_active is kind of problematic. Given the asymmetry, it makes more sense to know whether we can_read or can_write, but when I dove deeper, the code is actually quite complex.

Almost all of the methods do more than just reading or writing. Some do both, and almost all have side effects like changing the state. I think these are design issues which are beyond the scope of these pull requests and so eventually I kept the changes minimal.

I did leave the commit in because I think it doesn't harm and at least now it's possible to compare state with == when that might be handy. It will also automatically copy if you have to pass it on somewhere. It kind of improves the interface of WebSocketState, but it's not needed for anything in the code right now.

@najamelan
Copy link
Contributor Author

In terms of API, check_active also doesn't seem to convey a clear meaning as it doesn't correspond to the same states as is_active.

Looking through the code I also notice a small nit, that is_active and can_read now dereference self for the match, which isn't necessary. I will add a commit here to fix that.

Copy link
Member

@daniel-abramov daniel-abramov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this topic.

@najamelan
Copy link
Contributor Author

@application-developer-DA sorry, clicked wrong button!

@daniel-abramov
Copy link
Member

It's not strictly necessary for this pull request to work, but I added that when looking into the other PR for sending after close. It seems to me that the method check_active is kind of problematic. Given the asymmetry, it makes more sense to know whether we can_read or can_write, but when I dove deeper, the code is actually quite complex.

I've also brought this topic into our internal discussions. Perhaps some parts can indeed be simplified if we split them better.

Almost all of the methods do more than just reading or writing. Some do both, and almost all have side effects like changing the state. I think these are design issues which are beyond the scope of these pull requests and so eventually I kept the changes minimal.

Unfortunately the complexity of having reading+writing in functions like read_message() comes from the fact that we have to comply with a protocol. We must add messages to the send queue while reading in order to ensure that we stick to the protocol. For instance, if we receive a Ping, we automatically reply with Pong. We might have put a burden of complying to such small things to the user, but that would be too "user-unfriendly". But yeah, that part of src/protocol/mod.rs can be simplified I think.

In terms of API, check_active also doesn't seem to convey a clear meaning as it doesn't correspond to the same states as is_active.

Yes, check_active() was added not that long ago (after introducing AlreadyClosed and changing the logic of invoking ConnectionReset). I personally don't see a big use of AlreadyClosed, but I think @agalakhov prefers having it.

Looking through the code I also notice a small nit, that is_active and can_read now dereference self for the match, which isn't necessary. I will add a commit here to fix that.

Thanks. There are left-overs from the initial implementation which were compiled with an older version of Rust compiler. As we started working on tungstenite-rs, compiler was not smart enough to do an "automatic dereference" at that place (so you either had to dereference a &self, or use &WebSocketState::XXX in matching block).

@daniel-abramov daniel-abramov merged commit fc41b59 into snapview:master Sep 12, 2019
@najamelan
Copy link
Contributor Author

Unfortunately the complexity of having reading+writing in functions like read_message() comes from the fact that we have to comply with a protocol.

Yeah, I didn't find a straightforward way to simplify it. I looked at the use of check_active in WebSocketContext::read_message, but that can't be replaced with can_read, and even the call to read_message_frame needs to happen on other states at least to verify if read_frame returns None. Otherwise, it would seem cleaner to move the changing of the state in read_message_frame up the stack (so it just does reading and nothing else), but since the check for None is required, it still needs to be called.

You will see in the other pull request that I move the condition right to the top of read_message_frame, eliminating as much processing as possible, but we can't eliminate the method call all together.

From reviewing the code I would say it's most likely not the most elegant implementation possible, but it's probably a reasonable target for "if it's not broken, don't fix it", unless someone is really motivated to rewrite it and provide the tests needed to guarantee it doesn't break (as tests are mostly missing right now).

@najamelan najamelan deleted the bugfix/receive_after_close branch September 12, 2019 15:12
@daniel-abramov
Copy link
Member

From reviewing the code I would say it's most likely not the most elegant implementation possible, but it's probably a reasonable target for "if it's not broken, don't fix it", unless someone is really motivated to rewrite it and provide the tests needed to guarantee it doesn't break (as tests are mostly missing right now).

We're open to suggestions on how to rewrite it cleaner. I do have a couple ideas how to simplify that logic in protocol/mod.rs as well, but never could find enough time to rewrite it.

Do you find the code at other places cleaner and simpler to understand? (is it only the file which defines complex websocket logic complex) The project began as a full-rewrite of a former ws-rs library which had massive issues and was not usable for us in production and also not compatible with tokio (after @agalakhov 's attempts to fix ws-rs it turned out that writing it from scratch would be easier).

@najamelan
Copy link
Contributor Author

Do you find the code at other places cleaner and simpler to understand?

I haven't really looked through everything. I was hoping to be a user of websockets and not a websockets implementation developer ;-)

I just noticed that it wasn't easy to return early in top level methods based on conditions like "in this state no new messages should come from remote" or in this state "we shouldn't send any more data". That being said, the number of states you can be in with a websocket connection is quite complex, so maybe that's just the complexity we see in the impl.

One way I can think of that can make this more comprehensible is using an actual state machine, with states and transitions. That can allow keeping actual work separate (like in methods) from the logic of the state machine. It's just one idea. I do think that if this was my code I would like to rewrite it. But you never really know if you will find a better solution before you try, it's just a gut feeling I have when looking at code.

The real problem here I think is the lack of tests and documentation. Complex code almost always has bugs, unless it's thoroughly tested, and those tests also make it safe to refactor and experiment with the design. I haven't really looked into autobahn tests, because maybe that covers a lot. Although it didn't seem to cover the issues I filed pull requests for yesterday.

@najamelan
Copy link
Contributor Author

Just had another idea about design. Something that complicates it is that tungstenite contains two layers that aren't clearly separated.

  • the implementation of the protocol at low level
  • user of the protocol (respond to close and ping)

You could imagine tungstenite without the second part. It would be up to the user to respond to those. I think a better design is to start from the low level only first. Something like tungstenite-core. Then add the convenience interface in a separate layer. That keeps things simpler and concerns separate.

@daniel-abramov
Copy link
Member

One way I can think of that can make this more comprehensible is using an actual state machine, with states and transitions. That can allow keeping actual work separate (like in methods) from the logic of the state machine. It's just one idea. I do think that if this was my code I would like to rewrite it. But you never really know if you will find a better solution before you try, it's just a gut feeling I have when looking at code.

Could be. This particular file which contains all these transitions between different states inside the websocket context also looks dirty to me... I did not have chance to rewrite and retest it though... But it might be, that some changes will be required to make it ready for new tokio.

The real problem here I think is the lack of tests and documentation. Complex code almost always has bugs, unless it's thoroughly tested, and those tests also make it safe to refactor and experiment with the design. I haven't really looked into autobahn tests, because maybe that covers a lot. Although it didn't seem to cover the issues I filed pull requests for yesterday.

We have unit tests on a lower level to verify the most crucial parts and the the autobahn tests ensure that we comply with the protocol and don't have any issues. At the moment we released the library, our implementation was the most strict implementation we could find among other implementations in Rust. All autobahn tests were green except a couple minor tests which were "non-strict". We also have some fuzzing tests and there were no critical bugs found after some fuzzing (which also was mentioned here: https://www.reddit.com/r/rust/comments/8zpp5f/auditing_popular_crates_how_a_oneline_unsafe_has/).

The issues you found and fixed are fine, but things like "do not process messages after receiving close" are not strict according to RFC (as you've already mentioned). This is probably the reason why such small (yet important for some rare cases) issues have not been spotted before.

@najamelan
Copy link
Contributor Author

najamelan commented Sep 19, 2019

I did have a closer look at the websocket crate lately and I must say that it's design is a lot cleaner and simpler. I suggest you review their code when thinking about tungstenite's design.

The major thing that keeps their design clean is indeed what I mentioned above, they do not automatically respond to ping and close. The user has to do that manually.

The problem with websocket right now is that they use hyper for the http handshake and they are not compatible with recent versions of hyper. It seems they are working on that. I have no idea how both crates compare in performance.

Another thing that is better in tungstenite I reckon is working on an underlying stream that might or not use WouldBlock. websocket crate has everything split in sync and async modules. The tungstenite approach is very neat on that point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants