
[SIP] Authentication based multi-user-single-port #130

Open
madeye opened this issue Sep 26, 2018 · 42 comments

Comments

@madeye
Contributor

madeye commented Sep 26, 2018

Background

Previous discussions suggest that we perform authentication (SIP004) on the first chunk with each candidate user key, identifying the user by whichever key authenticates successfully.

Implementation Consideration

Performing GCM/Poly1305 on the first chunk should be very fast. It's expected that even a naive implementation would support thousands of users without any notable overhead.

Still, we can cache the successful key for each source IP, which would save most of the computation. To prevent potential DDoS attacks, an IP that fails authentication too many times should be blocked.

Given that this SIP doesn't involve any protocol change, only server code needs to be modified. The only limitation is that AEAD ciphers are required.
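The first-chunk trial authentication described above can be sketched as follows. This is a minimal illustration, not the real implementation: HMAC-SHA256 stands in for the AEAD (AES-GCM/ChaCha20-Poly1305) open operation, and names like `find_user` are hypothetical.

```python
# Sketch of first-chunk trial authentication. HMAC-SHA256 is a stand-in
# for the real AEAD open; in the actual protocol the server would attempt
# to decrypt the first chunk with each user's key.
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 digest size


def seal_chunk(key: bytes, payload: bytes) -> bytes:
    """Client side: append an authentication tag to the payload."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def find_user(user_keys: dict, chunk: bytes):
    """Server side: try every user's key; the key that verifies the tag
    identifies the user. Returns the user name, or None on failure."""
    payload, tag = chunk[:-TAG_LEN], chunk[-TAG_LEN:]
    for name, key in user_keys.items():
        expected = hmac.new(key, payload, hashlib.sha256).digest()
        if hmac.compare_digest(expected, tag):
            return name
    return None
```

For example, with `users = {"alice": b"a"*32, "bob": b"b"*32}`, calling `find_user(users, seal_chunk(users["bob"], b"hello"))` identifies "bob".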

Example

Jigsaw implemented a go-ss2 based server here: https://github.com/Jigsaw-Code/outline-ss-server. An early report shows that it works quite well with 100 users: #128 (comment)

@Mygod
Contributor

Mygod commented Sep 26, 2018

Have you considered the possibility that NAT might mess with your cache? Namely, if two clients behind the same NAT router try to connect to the same server with different credentials, god bless you, because they present the same source IP address to the server.

@kimw
Contributor

kimw commented Sep 26, 2018

Have you considered the possibility that NAT might mess with your cache? Namely, if two clients behind the same NAT router try to connect to the same server with different credentials, god bless you, because they present the same source IP address to the server.

Maybe that's what we call THE COST :)

Things cannot be perfect. It depends on a BALANCE.

  1. Don't support multiple users on a single port (I mean really many, e.g. 100 users) => multiple ports must be opened <= which is abnormal server-side behavior.

  2. Many users massed on a single port, oh yes, it's cool!

    And, it looks somewhat "clean" from the server side. The operators of SSPs, shadowsocks service providers, should buy you a beer.

    And * 2, ss-manager could, maybe, be retired.

That's just a personal comment. This SIP needs more balancing in any case.

@kimw
Contributor

kimw commented Sep 26, 2018

More words:

If a shadowsocks server supports only a handful of users, that's abnormal behavior too.

--

Following this idea, maybe a later SIP should be about exchanging shadowsocks servers within a kind of circle (known friends? trusted servers?)

@Mygod
Contributor

Mygod commented Sep 26, 2018

Hmm, that's fine if you're okay with the COST of users one day complaining to you that it's not working because of the NAT vs. cache issue.

I suggest either not taking the cache approach, or using another protocol that already supports multi-user, like v2mess (I haven't looked at that protocol yet, but it seems to support this use case).

Different people prefer different balances between things. I don't think Shadowsocks is intended to cover every kind of balance you might wish for.

@riobard
Contributor

riobard commented Sep 26, 2018

Hmmm… I think if we're gonna officially support multiuser per port, we might as well address the problem cleanly? #54 is still open ^_^

@riobard
Contributor

riobard commented Sep 26, 2018

But I agree this hack is neat in that it does not require any changes in the clients. 👍

@Mygod
Contributor

Mygod commented Sep 26, 2018

Also, I should point out that the problem I mentioned might occur more frequently than you'd imagine, thanks to the exhausted IPv4 pool and widely deployed CGN. It's likely that one will run into such frustration despite having taken precautions.

@riobard
Contributor

riobard commented Sep 26, 2018

CGN is a major concern. We might need to run some tests to determine the rough size of the NAT pools used by ISPs doing massive CGN.

@madeye
Contributor Author

madeye commented Sep 26, 2018

NAT should not be a problem, as long as not all of the users are behind the same NAT address.

Say five users are behind the same NAT IP address: at most five keys are cached for that IP.
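The per-IP caching madeye describes, where several clients behind one NAT IP simply mean several cached keys for that IP, might look like the sketch below. The class and method names are my own, not from any implementation in the thread.

```python
from collections import defaultdict


class IPKeyCache:
    """Maps a source IP to the keys that recently authenticated from it.
    Several users behind one NAT IP just mean several cached keys."""

    def __init__(self, max_keys_per_ip: int = 16):
        self.max_keys = max_keys_per_ip
        self.by_ip = defaultdict(list)

    def record_success(self, ip: str, key: bytes) -> None:
        keys = self.by_ip[ip]
        if key in keys:
            keys.remove(key)
        keys.insert(0, key)       # most recent success goes first
        del keys[self.max_keys:]  # bound per-IP memory

    def candidates(self, ip: str, all_keys: list) -> list:
        """Cached keys first, then the rest of the key list as fallback."""
        cached = self.by_ip.get(ip, [])
        return cached + [k for k in all_keys if k not in cached]
```

With five users behind one NAT IP, `by_ip` simply holds up to five entries for that IP, and `candidates()` tries them before the full scan.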

@madeye
Contributor Author

madeye commented Sep 26, 2018

This SIP just suggests a kind of multi-user-single-port solution for shadowsocks without modifying the protocol.

But as mentioned by @Mygod, shadowsocks is not designed for this purpose.

I listed this SIP here since it's already implemented in third-party software. If anyone else is interested, please follow this SIP and apply the suggested optimizations.

@riobard
Contributor

riobard commented Sep 27, 2018

My worry is that people will eventually abuse this hack to run commercial services. It's not gonna scale well when users are mostly behind CGN with a small pool of public IPs, e.g. mobile networks in China.

@Mygod
Contributor

Mygod commented Sep 27, 2018

CGN also applies to ADSL. One shouldn't forget NAT routers in enterprises, schools, etc., either. A good way to combat this is to enlarge the cache size and always do a fallback lookup.

@madeye
Contributor Author

madeye commented Sep 27, 2018

A fallback lookup is always needed. Even if a key is cached, authentication is still required; if it fails, a fallback lookup is performed.

I don't expect millions of users on one single port. A reasonable assumption is thousands of users per server, and hundreds per port.

And of course, it won't scale to commercial usage.
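Putting together the cached-key fast path, the mandatory authentication check, the fallback scan, and the failure-based IP blocking mentioned earlier in the thread, the server-side flow could be sketched as below. The threshold and all names are illustrative, not from any implementation.

```python
# Sketch of authenticate-with-fallback: a cached key still must pass
# authentication; on failure the server falls back to scanning the
# remaining keys, and IPs that fail too often get blocked.
MAX_FAILURES = 20  # illustrative threshold

failures = {}      # ip -> consecutive authentication failures
blocked = set()


def authenticate(ip, chunk, cached_keys, all_keys, verify):
    """verify(key, chunk) -> bool is the AEAD check from the proposal."""
    if ip in blocked:
        return None
    # Cached keys first, then fall back to the rest of the key list.
    for key in cached_keys + [k for k in all_keys if k not in cached_keys]:
        if verify(key, chunk):
            failures.pop(ip, None)  # reset the counter on success
            return key
    failures[ip] = failures.get(ip, 0) + 1
    if failures[ip] >= MAX_FAILURES:
        blocked.add(ip)
    return None
```

In production the counters would need expiry so a shared NAT IP isn't blocked forever by one misconfigured client.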

@celeron533

In some places, the ISP may do NAT for an entire neighborhood, which may include 10,000 end users, by assigning IP addresses with the 100.64 prefix. It is also a kind of NAT.

https://tools.ietf.org/html/rfc6598

IANA Considerations:

IANA has recorded the allocation of an IPv4 /10 for use as Shared Address Space.

The Shared Address Space address range is 100.64.0.0/10.

@riobard
Contributor

riobard commented Sep 27, 2018

@celeron533 This is CGN mentioned above.

@shinku721

shinku721 commented Oct 13, 2018

Hmm, why not use an ElGamal-like method to identify users?

@Mygod
Contributor

Mygod commented Oct 13, 2018 via email

@fortuna
Contributor

fortuna commented Nov 29, 2018

FYI, all Outline Servers have been migrated to outline-ss-server this week. They don't use the single-port feature yet, but we intend to enable it in a few weeks, after I implement the IP->cipher cache.

We can roll that out gradually and see how it performs in the wild. In my own tests, the added latency for 100 users without any optimization on a crappy $5 VPS can be significant, tens of milliseconds, but it varies wildly, and I believe the optimizations will help significantly. Also, outline-ss-server has Prometheus metrics, so we will be able to expose latency metrics and admins will be able to monitor them.

BTW, outline-ss-server still allows multiple ports; you can have multiple keys per port and multiple ports per key. You can always start a new port if one becomes overloaded. One nice feature is that you can do that without creating a new process for each port or stopping the running one.

@fortuna
Contributor

fortuna commented Nov 29, 2018

It's worth mentioning that the single-port feature has some very good motivation:

  • It makes it a lot easier and safer to configure your server firewall. No need to open all the ports.
  • It allows all servers to run on ports 443, 80 or any other usually unblocked port. We found multiple cases of users not being able to use Outline on strict networks that don't allow traffic to high port numbers, or outside a small subset of ports.
  • It allows Outline Servers to run in a Docker container without needing --net=host (you can expose the single port instead).
  • In the future, we'll be able to run the Outline Server management API and the Shadowsocks service on the same port, by making it fall back to HTTPS for the management API if all keys fail. This will make the servers even harder to detect (you'll get a standard 404).

@fortuna
Contributor

fortuna commented Dec 12, 2018

I now have a benchmark for my single-port implementation:
Jigsaw-Code/outline-ss-server#7

These are the results on a $5 Frankfurt DigitalOcean machine that is idle:

BenchmarkTCPFindCipher 	    1000	   1304879 ns/op	 2015027 B/op	    3107 allocs/op
BenchmarkUDPUnpack     	    3000	    615077 ns/op	  115427 B/op	    1801 allocs/op

That's 1.3 ms to go over 100 ciphers for a TCP connection, and 0.6 ms for a UDP datagram. That will probably be worse under load, but it gives an idea of the kind of latency we'd be adding.

There are 2MB of allocations for one TCP connection. I believe that can be significantly reduced by sharing buffers, but it gets a little tricky with the code structure and different ciphers needing different buffer sizes (I guess I need to find the max buffer size).

@riobard
Contributor

riobard commented Dec 12, 2018

@fortuna That's a lot of allocs/op. Is that normal?

@fortuna
Contributor

fortuna commented Dec 13, 2018

PR Jigsaw-Code/outline-ss-server#8 makes the TCP performance on par with UDP. We no longer allocate so much memory:

BenchmarkTCPFindCipher-12    	    1000	   1349922 ns/op	  125278 B/op	    1705 allocs/op
BenchmarkUDPUnpack-12        	    2000	    881121 ns/op	  125030 B/op	    1701 allocs/op

The ~2MB of allocations were because I was allocating a buffer for an entire encrypted chunk (~16KB) for each of the 100 ciphers I tried. Now I allocate only one buffer for all ciphers.

As for the number of allocations, it's just that I'm doing the operation 100 times. For 1 cipher only, I get these numbers:

BenchmarkTCPFindCipher-12    	   30000	     52329 ns/op	    1408 B/op	      22 allocs/op
BenchmarkUDPUnpack-12        	  200000	      8989 ns/op	    1266 B/op	      18 allocs/op

@fortuna
Contributor

fortuna commented Dec 13, 2018

With the new findAccessKey optimization, the allocations and CPU are dominated by the low-level crypto, so I'm not sure there's much room for improvement there:

[profiling screenshot: CPU and allocations dominated by low-level crypto]

This is without the IP -> cipher cache. I'm trying to make the cipher finding as efficient as possible, to reduce the need for the cache.

@fortuna
Contributor

fortuna commented Dec 19, 2018

FYI, I've added an optimization to outline-ss-server that keeps the most recently used cipher at the front of the list. This way, the time to find the cipher is proportional to the number of ciphers actively in use, rather than the total number of ciphers.

Furthermore, I've added the shadowsocks_time_to_cipher_ms metric that will tell you the 50th, 90th and 99th percentile times to find the cipher for each access key.

This should be enough to tell us whether the performance is good enough. It would be great if people here gave it a try and reported back. The latest binary with the changes is v1.0.3 and can be found in the releases:
https://github.com/Jigsaw-Code/outline-ss-server/releases
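The move-to-front heuristic described above, which makes lookup cost track the number of active keys rather than the total number of keys, can be sketched as follows (a hypothetical simplification, not the actual outline-ss-server code):

```python
def find_cipher(ciphers: list, chunk: bytes, verify):
    """Scan the list; on success, move the matching cipher to the front
    so that actively used ciphers cluster at the head of the list."""
    for i, cipher in enumerate(ciphers):
        if verify(cipher, chunk):
            ciphers.insert(0, ciphers.pop(i))  # move-to-front
            return cipher
    return None
```

After a few requests, the handful of active ciphers sit at the front, so most lookups terminate after one or two trial decryptions.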

@fortuna
Contributor

fortuna commented Apr 4, 2019

Update: Outline has been running servers with multi-user support on a single port for a few months now. Some organizations have 300 keys on a server, with over 100 active on any given day. Median latency due to cipher finding is around 10ms and CPU usage is minimal (bandwidth is the bottleneck).

At the 90th percentile you can see occasional cases close to 1 second, but that's not common and may be due to other factors, such as a burst in CPU usage (maybe expensive Prometheus queries).

Has anyone here tried the single port feature? How was your experience?

@madeye
Contributor Author

madeye commented Apr 5, 2019

Average 10ms latency looks too slow to me.

Assuming 300 users and the worst case of 300 authentications performed per connection, a single authentication takes 33µs. That means more than 33,000 cycles on a 1 GHz CPU, which is too long for authenticating a small packet.
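Checking the arithmetic (a quick sanity computation, not from the thread):

```python
# 10 ms spread over 300 trial authentications:
per_auth_s = 10e-3 / 300
assert round(per_auth_s * 1e6) == 33      # ~33 microseconds each

# At 1 GHz (1e9 cycles/s) that is ~33,000 cycles per authentication:
cycles = per_auth_s * 1e9
assert 33_000 <= cycles <= 34_000
```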

Can you elaborate more about the measurement of latency?

@Mygod
Contributor

Mygod commented Apr 5, 2019

2998 light-kilometers (i.e. 10 ms at the speed of light) might or might not be acceptable depending on the use case, e.g. it's probably not acceptable for game streaming but probably OK for downloading/video streaming. 😄

@fortuna
Contributor

fortuna commented Apr 5, 2019

This site says that 20ms is an excellent RTT. So 10ms shouldn't be perceptible.

Also, this is latency added per connection, not per packet.

@Mygod
Contributor

Mygod commented Apr 5, 2019

How about UDP connections/packets (which are mostly used in latency-sensitive applications)?

@fortuna
Contributor

fortuna commented Apr 12, 2019

I have a benchmark above: #130 (comment)

UDP takes about 9 microseconds per cipher.

@Mygod
Contributor

Mygod commented Apr 13, 2019

@fortuna Sorry, I meant to ask whether the added latency for UDP connections is per-connection or per-packet.

@fortuna
Contributor

fortuna commented Apr 13, 2019 via email

@Mygod
Contributor

Mygod commented Apr 13, 2019

I think it would be more appropriate to optimize for UDP connections (I believe there are UDP lookup caches in the libev implementation).

@fortuna
Contributor

fortuna commented Apr 13, 2019 via email

@Mygod
Contributor

Mygod commented Apr 19, 2019

@fortuna Is it technically possible to do a cache for UDP packets as well?

@fortuna
Contributor

fortuna commented Aug 2, 2019

Update: @bemasc has merged Jigsaw-Code/outline-ss-server#25, which adds a new optimization to cipher finding. We now associate a "last client IP" with each cipher. When a new request arrives, we look up the ciphers whose last client IP matches, and try them first, before trying the prioritized list.

If a cipher is accessed by a single IP, it will always be tried first.
If a cipher is accessed by multiple IPs simultaneously, it's likely to stay in the front of the priority list.

With the optimization, any extra latency will be almost gone for almost everyone, even if there are hundreds of active access keys.

@Mygod, the heuristic of pushing used ciphers to the front of the list, as well as the new one, is applied to both TCP and UDP.
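The last-client-IP heuristic from Jigsaw-Code/outline-ss-server#25 could be sketched roughly as below. The names and structure are my own illustration, not the actual implementation.

```python
last_ip = {}  # cipher -> last client IP that successfully used it


def find_cipher_for_ip(ip, ciphers, chunk, verify):
    """Try ciphers whose last client IP matches first, then fall back
    to the prioritized (e.g. move-to-front) list."""
    same_ip = [c for c in ciphers if last_ip.get(c) == ip]
    rest = [c for c in ciphers if c not in same_ip]
    for cipher in same_ip + rest:
        if verify(cipher, chunk):
            last_ip[cipher] = ip  # remember for the next request
            return cipher
    return None
```

If a cipher is only ever used from one IP, it is always tried first for that IP, so the per-request cost is a single trial decryption.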

@riobard
Contributor

riobard commented Aug 2, 2019

@fortuna Neat! Almost two orders of magnitude latency reduction in the common case! I'm really surprised by how far you guys have pushed forward without changing the protocol 👍

@Ehco1996

Ehco1996 commented Nov 9, 2020

I also implemented multi-user on a single port, using Python asyncio.

The core idea is to use a DB order field to find the right user.

The code is here:

https://github.com/Ehco1996/aioshadowsocks/blob/052c472422955c4ade7d0e375c8d093231aff1a9/shadowsocks/mdb/models.py#L157

@ghost

ghost commented Nov 9, 2020

We could use the same technique to eliminate the need for encryption-method selection. The server tries both AES-256-GCM and ChaCha20-Poly1305 with the same password (they have the same tag size and salt size, and thus exactly the same packet layout). The client chooses the fastest one depending on its platform.

Removing encryption selection might be too radical for us (and short-sighted: with this selector, we've effectively introduced a new protocol), but it's still an option for other shadowsocks-like protocols.
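The method-elimination idea amounts to trying each candidate AEAD with the same key against the same packet layout. In the sketch below, two keyed hashes stand in for AES-256-GCM and ChaCha20-Poly1305 (which, as the comment notes, share tag and salt sizes, so the wire layout is identical); all names are illustrative.

```python
import hashlib
import hmac

TAG_LEN = 32  # stand-in tag length; both "methods" use the same size


# Stand-ins for the two AEAD opens. Because both produce tags of the
# same length, the server can simply try each against the same chunk.
def open_gcm(key, payload, tag):
    expected = hmac.new(key + b"gcm", payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


def open_chacha(key, payload, tag):
    expected = hmac.new(key + b"chacha", payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


def detect_method(key, chunk):
    """Try both methods with the same key; return whichever verifies."""
    payload, tag = chunk[:-TAG_LEN], chunk[-TAG_LEN:]
    for name, opener in (("aes-256-gcm", open_gcm),
                         ("chacha20-poly1305", open_chacha)):
        if opener(key, payload, tag):
            return name
    return None
```

This doubles the worst-case trial count per user, which is why the comment frames it as an option rather than an obvious win.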

@lzm0

lzm0 commented Nov 19, 2021

This may be a stupid question, but what prevents us from using a HashSet for cipher lookup?

@fortuna
Contributor

fortuna commented Nov 19, 2021

@lzm0 There's no ID in the Shadowsocks protocol that can be mapped to the credentials to use, so there's no key to look up. That's why we need trial decryption.

@database64128
Contributor

Shadowsocks 2022 (#196) has a protocol extension that brings native multi-user-single-port support without trial decryption: https://github.com/Shadowsocks-NET/shadowsocks-specs/blob/main/2022-2-shadowsocks-2022-extensible-identity-headers.md
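For contrast with trial decryption, an explicit identity field reduces user lookup to a hash-table access. The sketch below is only schematic: the actual Shadowsocks 2022 identity-header format (key derivation, encryption, layout) is defined in the linked spec and is not reproduced here.

```python
import hashlib

# Illustrative only: index users by a short hash of their key, and read
# that hash from an explicit header field instead of trying every key.


def user_id(key: bytes) -> bytes:
    return hashlib.sha256(key).digest()[:16]


users_by_id = {user_id(k): name
               for name, k in {"alice": b"a" * 32, "bob": b"b" * 32}.items()}


def lookup(header_id: bytes):
    """O(1) dict lookup replaces O(n) trial decryption."""
    return users_by_id.get(header_id)
```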


10 participants