
How to handle ws::ErrorKind::Queue? #59

Closed

nickhs opened this issue Jul 20, 2016 · 5 comments

nickhs commented Jul 20, 2016

I'm (apparently) sending too many items to the internal queue and getting back:

ERROR:ws::handler: WS Error <Queue(NotifyError::Full(..))>
DEBUG:ws::handler: Connection closing due to (Abnormal)

However, I don't see any calls on Sender that would let me determine whether I'm about to fill up the queue. By the time I get the error back, ws-rs is already disconnecting from the client.

Is there some method I can call that would advise me to sleep for a bit and try again?

Real world use case: https://github.com/nickhs/websocket_replay/blob/master/src/main.rs#L80

housleyjk (Owner) commented

In order to avoid overfilling the queue, you need to set the queue size high enough. In this case, you need Settings::max_connections. The event queue is configured to be 5 times the number of allowed connections. The default is 100 connections, so 500 events at a time. You can increase the number of max connections up to usize::max_value() / 5 (which is 3689348814741910323 on 64-bit systems; if you anticipate handling that many simultaneous websockets, you will need more than one instance of WS-RS 😄).

The bench-server example shows how to increase the maximum number of allowed connections. In that example, max_connections is set to 10,000, which gives a queue size of 50,000.

In my experience 10_000 is a nice number for a production environment, but you can go higher if you need to.
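
For reference, here is a minimal sketch of that configuration along the lines of the bench-server example; the address and the echo handler are placeholders rather than part of that example:

```rust
extern crate ws;

use ws::{Builder, Settings};

fn main() {
    // Allow up to 10,000 connections; with the current implementation this
    // gives an event queue of 5 * 10,000 = 50,000 entries.
    let settings = Settings {
        max_connections: 10_000,
        ..Settings::default()
    };

    let ws = Builder::new()
        .with_settings(settings)
        .build(|out: ws::Sender| {
            // Placeholder echo handler: the closure is the per-connection handler.
            move |msg| out.send(msg)
        })
        .expect("failed to build websocket");

    ws.listen("127.0.0.1:3012").expect("failed to listen");
}
```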

nickhs (Author) commented Jul 20, 2016

Thanks Jason! That totally makes sense. I guess what I'm asking for here is: instead of making the queue size "hopefully large enough", is there some sort of feedback mechanism I can get from the sender that tells me when the queue is about to be full, so I can try again later? Something like an EAGAIN?

housleyjk added the bug label Jul 20, 2016
housleyjk (Owner) commented

Simply encountering a full queue shouldn't bring down a connection. The queue error exists to inform the caller that the queue is full, so that they can try the operation again later. Sleeping, blocking, or trying to schedule a timeout won't help, because to consume events we need to return to the event loop immediately. Luckily, this should never happen if max_connections is set to a reasonably high value.

Anyway, it was not intended behavior that a queue error should result in a connection disconnecting, so this is a bug. The fix I propose is to delegate handling queue errors to handlers and also add a setting to allow people to just increase the per-connection queue size without needing to accept more connections. You should be able to do something like max_connections=1, queue_size=7000 if you know you will only ever have one connection but you want to queue a bunch of messages all at once.
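
Once that setting exists, the configuration above might look roughly like the sketch below; queue_size is only a proposal at this point, so the field name is an assumption:

```rust
use ws::Settings;

// Hypothetical: assumes the proposed `queue_size` field is added to `Settings`;
// only `max_connections` exists today.
fn single_connection_settings() -> Settings {
    Settings {
        max_connections: 1,
        queue_size: 7_000,
        ..Settings::default()
    }
}
```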

You can handle the queue error yourself with something like this even now:

```rust
fn replay_lines(&mut self, count: usize) {
    let mut buf = Vec::new();
    for _ in 0..count {
        let res = self.fh.read_until(self.sess_args.delim, &mut buf);
        let bytes_read = res.expect("cannot read file");
        if bytes_read == 0 {
            self.active = false;
        }
        // Note: this ignores other errors that may occur; if you want to handle more
        // than just a queue error, you will want to match on the error and handle each case.
        if let Err(ws::Error { kind: ws::ErrorKind::Queue(_), .. }) = self.sender.send(&buf[..]) {
            // Since we are reading a file, our position is already tracked, so we don't
            // need to store our unfinished work anywhere else. Just break and wait for
            // the timeout to occur to read the next section.
            println!("Waiting for queue to clear before queuing next message.");
            break;
        }
    }
}
```

But I did test this out, and the connection is still disconnected because of the queue failure at another point; that is the defect the new changes will fix.

I noticed in your code that it's the UpfrontPlayback setting that is exhausting the queue. For example, if you have an UpfrontPlayback::Count(_) approaching 500, then line 88 of your code will hit the queue limit before any messages have a chance to be sent or received.

It's up to you of course, but my advice is that you should set the new queue_size setting to be greater than the count or perc setting passed into your CLI. That by itself will prevent this issue. If this code goes into production, you should probably also avoid using expect anywhere while listen is running, because a panic will drop all connections, whereas simply encountering an error due to hitting the max_connections limit or the event loop queue limit will only ever terminate the offending connection.
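
To illustrate that last point, here is a rough sketch of the file read without expect, closing only the offending connection on an I/O error. The free function and its parameters are illustrative rather than taken from your repo; Sender::close and CloseCode::Error are the ws-rs calls I have in mind:

```rust
use std::io::BufRead;

use ws::{CloseCode, Sender};

// Illustrative helper: read up to `count` delimited records and send each one,
// handling read errors without panicking.
fn replay_lines<R: BufRead>(reader: &mut R, sender: &Sender, delim: u8, count: usize) -> ws::Result<()> {
    for _ in 0..count {
        let mut buf = Vec::new();
        let bytes_read = match reader.read_until(delim, &mut buf) {
            Ok(n) => n,
            Err(err) => {
                // A panic here would drop every connection; closing this one
                // connection leaves the rest of the server running.
                println!("cannot read file: {}", err);
                return sender.close(CloseCode::Error);
            }
        };
        if bytes_read == 0 {
            break; // end of file
        }
        if let Err(ws::Error { kind: ws::ErrorKind::Queue(_), .. }) = sender.send(&buf[..]) {
            // Queue is full; stop for now and let a timeout trigger the next batch.
            break;
        }
    }
    Ok(())
}
```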

nickhs (Author) commented Jul 21, 2016

> Simply encountering a full queue shouldn't bring down a connection.

Okay, I agree. Does that mean I'm misinterpreting https://ws-rs.org/api_docs/src/ws/src/result.rs.html#44 when it states that the connection will disconnect? Or is that documentation wrong in light of the bug?

> Sleeping, blocking, or trying to schedule a timeout won't help, because to consume events we need to return to the event loop immediately.

Does mio have any sort of provision for a yield-type call? Or is returning my only option?

> But I did test this out, and the connection is still disconnected because of the queue failure at another point; that is the defect the new changes will fix.

That's what I saw too: even if I did a return on receiving a queue-full error, I'd still see a disconnect. Glad to know I wasn't going insane 😄.

> It's up to you of course, but my advice is that you should set the new queue_size setting to be greater than the count or perc setting passed into your CLI.

Yeap, seems like an obvious win. I'd have to do some pre-processing in the percentage case (determine the total number of lines in the file to figure out the message count) but nothing horrid.
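
For reference, a rough sketch of that pre-processing under my own assumptions (the helper name and parameters are made up; it just counts delimiter-separated records so a percentage can become a message count, and assumes perc is in the 0–100 range):

```rust
use std::io::BufRead;

// Illustrative only: count delimiter-separated records so that a percentage can
// be turned into an absolute message count before replay. This consumes the
// reader, so the file must be reopened (or rewound) before the actual replay.
fn records_for_percentage<R: BufRead>(reader: R, delim: u8, perc: f64) -> std::io::Result<usize> {
    let mut total = 0usize;
    for record in reader.split(delim) {
        record?; // propagate any read error
        total += 1;
    }
    Ok((total as f64 * perc / 100.0).ceil() as usize)
}
```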

As an aside (and this is my naivety showing), where is the 5 * max_connections buffer created? I was unable to find it after grepping around, aside from a reference in src/io.rs.

Thanks for your time! (and apologies again if I'm being dumb)

housleyjk (Owner) commented

You are not dumb. This was definitely a weakness in the documentation and the implementation.

Thank you for catching that in result.rs. It used to be true, but it no longer is now that we let the handler decide what to do in response to a full queue. I will update the docs.

Sadly, there is no yield in vanilla mio and no yield yet in Rust. Maybe someday, but for now, the only way to get the queue cleared is to return from the function.

The size of the queue is set here or here. If you look at those lines in master, you will see the old 5.

Thank you for pointing this stuff out!
