Improves server min/max port range #78

msft-bnorick · 2023-04-21T18:55:32Z

Adds a simple LFSR RNG which can generate all ports in the [min_port, max_port] range in random order for each client connection
Uses an LFSR per client to attempt binding ports until one is bound
If the client connected without a port request, only returns failure if all ports in the range are already bound
If client requests a port, it is checked against both min and max

- Adds a simple LFSR RNG which can generate all (valid) ports in random order for each client connection - Uses an LFSR per client to attempt binding ports until one is bound - If the client connected without a port request, only returns failure when all ports in the range are already bound - If client requests a port, it is checked against both min and max

ekzhang · 2023-04-21T20:46:49Z

This seems like a good approach, thanks. Would likely just use Fastrand though, which is less code and also less dependencies.

Code matters because bore is really small in amount of code and I would hope to keep it that way.

By the way, it's been a while and the {max port, bind to 0 behavior} are clearlys issue that a lot of people are interested in, thanks to everyone who expressed this. I'm really busy right now but will set a reminder to get back to this in 2 weeks (no promises though).

msft-bnorick · 2023-04-22T00:26:23Z

I opted to implement an LFSR because it has the property that it will generate every number once before cycling, meaning we'll never waste any time checking a port we already tried (for a given client connection) and we don't need to explicitly track the ports we've already attempted to bind. If you use and RNG without this property you either 1) waste time rechecking ports you've already checked, or 2) need to track which ports you've tried already.

There is a crate which provides the LFSR we'd want here, so if you'd rather add a dependency than a few lines of code then that's an option.

ekzhang · 2023-04-27T04:04:10Z

Thanks for the feedback! That is true and a valid property of Galois fields generated by a single element, but there are downsides in that the linear recurrence is a pretty poor quality random number generator, and duplicate recurrences would overlap. Mathematically to find a free port with probability 1-δ, when at least ε proportion of the ports are currently available, it suffices to check -2 ln(δ) / ε times (up to a second-order term in ε). So we can guarantee very high success rate (99.999%) at utilizing most ports (say, 80% of ports) with only 115 i.i.d. random checks.

use std::{net::TcpListener, time::Instant};

fn main() {
    let l1 = TcpListener::bind("127.0.0.1:1234").unwrap();
    let start = Instant::now();
    let l2 = TcpListener::bind("127.0.0.1:1234");
    assert!(l2.is_err());
    println!("{:?}", start.elapsed());
    drop(l1);
}

I just benchmarked it, and binding to a port that's already in use takes 10-20 µs on my machine, which isn't really a significant overhead. I'll probably just have it attempt binding about 150 times.

ekzhang · 2023-04-27T13:58:10Z

Also some other inaccuracies in case you weren't aware:

I opted to implement an LFSR because it has the property that it will generate every number once before cycling, meaning we'll never waste any time checking a port we already tried (for a given client connection) and we don't need to explicitly track the ports we've already attempted to bind.

The implementation you suggest is quadratic in complexity as the number of used ports grows, since the LFSR needs to check every single port until it finds a working one (and moreover, always in the same order), which could lead to unintentional denial of service.

There is a crate which provides the LFSR we'd want here, so if you'd rather add a dependency than a few lines of code then that's an option.

Thanks. Fundamentally bore cares a lot about being minimal and understandable by readers; this is why we minimize code size and avoid adding additional features. It's more maintainable and easier for people to read if we select a high-quality random-number generator crate (fastrand, 23KB and 500 lines of code, 0 dependencies), rather than the two implementations you suggest:

Add rand (370 KB, 6000 lines of code, 0-280 KB of dependencies) plus an unvetted, undocumented LFSR implementation that users would have to parse through, understand the properties of Galois fields for, and verify.
Add rand and also lfsr, which involves macros and pulls in over 2 MB and 50K lines of code in dependencies.

msft-bnorick · 2023-04-27T14:55:59Z

The implementation you suggest is quadratic in complexity as the number of used ports grows, since the LFSR needs to check every single port until it finds a working one (and moreover, always in the same order), which could lead to unintentional denial of service.

I don't think the bolded bit is true, as I start the search at a different place for each client connection.

Given your benchmarking, it seems checking many ports will be less costly than I imagined (I didn't take the time to check and just defaulted to what I still think is the more performant solution). The LFSR implementation is 60 lines (plenty of which are boilerplate) almost straight from the Wikipedia article, not sure why you're treating it like it is some magic that users couldn't parse. As far as the dependencies, rand is only used to seed the LFSR, so replace that with fastrand and you'll solve the "size of dependencies" problem.

ekzhang · 2023-04-27T17:27:18Z

Thanks for the contribution! I think I've made my decision.

msft-bnorick · 2023-04-27T17:30:45Z

Sure. Can you clarify if this still holds when using a random starting port?

The implementation you suggest is quadratic in complexity as the number of used ports grows, since the LFSR needs to check every single port until it finds a working one (and moreover, always in the same order), which could lead to unintentional denial of service.

ekzhang · 2023-04-27T17:35:29Z

Yeah, the behavior when many ports are used is what I intended to convey — if only ε proportion of ports are currently available, the LFSR would need to check 1/ε ports before finding one, and in particular it becomes quadratic once there are no more ports available since each future request requires checking all 65536 ports. Checking that many ports takes a while, so regardless I'd like to avoid doing that!

msft-bnorick · 2023-04-27T17:38:35Z

Makes sense. But if you stop after 150, you'll be turning away clients when there are actually ports available. I wanted to avoid that.

In my case, the [min_port, max_port] range is only ~2500 ports, so it's not as bad as 65k, but I get your point.

msft-bnorick mentioned this pull request Apr 21, 2023

Min port is not respected for randomly assigned ports #70

Closed

ekzhang closed this Apr 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improves server min/max port range #78

Improves server min/max port range #78

msft-bnorick commented Apr 21, 2023

ekzhang commented Apr 21, 2023 •

edited

Loading

msft-bnorick commented Apr 22, 2023 •

edited

Loading

ekzhang commented Apr 27, 2023 •

edited

Loading

ekzhang commented Apr 27, 2023 •

edited

Loading

msft-bnorick commented Apr 27, 2023

ekzhang commented Apr 27, 2023

msft-bnorick commented Apr 27, 2023

ekzhang commented Apr 27, 2023

msft-bnorick commented Apr 27, 2023

Improves server min/max port range #78

Improves server min/max port range #78

Conversation

msft-bnorick commented Apr 21, 2023

ekzhang commented Apr 21, 2023 • edited Loading

msft-bnorick commented Apr 22, 2023 • edited Loading

ekzhang commented Apr 27, 2023 • edited Loading

ekzhang commented Apr 27, 2023 • edited Loading

msft-bnorick commented Apr 27, 2023

ekzhang commented Apr 27, 2023

msft-bnorick commented Apr 27, 2023

ekzhang commented Apr 27, 2023

msft-bnorick commented Apr 27, 2023

ekzhang commented Apr 21, 2023 •

edited

Loading

msft-bnorick commented Apr 22, 2023 •

edited

Loading

ekzhang commented Apr 27, 2023 •

edited

Loading

ekzhang commented Apr 27, 2023 •

edited

Loading