Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write salt/iv, addr together with content #160

Closed

Conversation

Anonymous-Someneese
Copy link

Sending salt, address and content seperately would generate three packets during the start of a connection. Since the size of salt is almost always 8, 16 or 32 and the address are pretty much always 20-100 bytes, it leaves a strong feature.

Additionally, with TCP fastopen turned on, only the first write would be carried by
SYN packet. The other writes will have to wait until the handshake complete, making fastopen pointless.

Anonymous-Someneese added 2 commits January 3, 2020 03:13
Sending salt, address and content seperately would generate three packets
at the start of a connection. Since the size of salt is almost always 8, 16 or 32,
it leaves a strong feature.
Additionally, with TCP fastopen turned on, only the first write would be carried by
SYN packet. Writing multiple times makes it pointless.
This decouples two sides and allow easier third party integration.
@riobard
Copy link

riobard commented Jan 4, 2020

Salt and target address can be combined in a single packet because we know both pieces of information immediately when a client connects. Actual content is a bit challenging because for a generic TCP connection the client might not send any data but expect the server to send information (e.g. hello banner) first, though popular protocols like HTTP do not have this problem. A workaround could be to wait a couple millisecond for the client to send actual payload to combine with salt and target address in the first packet.

Go still lacks proper support for TCP Fast Open. However if local side is on Linux kernel 4.11 and above, it could be hacked on right now. See golang/go#4842 for detail.

@Anonymous-Someneese
Copy link
Author

Anonymous-Someneese commented Jan 7, 2020

As for the connection without initial data problem, shadowsocks-libev behaves the same so I guess it's good enough for most implementation? For the Fastopen, yes I am intended to implement using this method but we first need to send all data in one row in order to drop TTFB.

@Anonymous-Someneese
Copy link
Author

I just gave it a timeout. @riobard What do you think?

@riobard
Copy link

riobard commented Jan 8, 2020

To be honest I'm hesitant to add artificial delay. TLS 1.3 and TFO spends all effort to cut handshake time and we just spend that saving right away…

@Anonymous-Someneese
Copy link
Author

Anonymous-Someneese commented Jan 8, 2020

I agree. But if it maintain the current behavior, TFO will have no effect. This change will save 1 RTT for normal connections if TFO is enabled and add 5ms for connections that don't send data on the initiator side. Maybe we can have an option to disable the behavior and allow setting the delay manually?

@riobard
Copy link

riobard commented Jan 8, 2020

Wait, I thought we do not currently have TFO yet?

@Anonymous-Someneese
Copy link
Author

Anonymous-Someneese commented Jan 9, 2020

Yes. I mean even if you add TFO support, only the first packet will be sent in the handshake process. So TFO will have no effect if we send multiple packets for the request. I will add TFO support in another PR if this one is merged.

@Anonymous-Someneese
Copy link
Author

@riobard Any update?

@riobard
Copy link

riobard commented Jan 22, 2020

@Anonymous-Someneese Sorry I've been busy recently and haven't got time to think about the issue. Maybe @lixin9311 could check the PR and see if it fits.

@lixin9311
Copy link
Collaborator

@Anonymous-Someneese Sorry I've been busy recently and haven't got time to think about the issue. Maybe @lixin9311 could check the PR and see if it fits.

I think it could be a more elegant way, if we pass target address to the shadow function, and initialize the cipher with an iv/salt, temporarily saving them somewhere.
Then we send them along with the first data packet, initialize the writer afterwards, without the timer and any other atomic ops.

@lixin9311
Copy link
Collaborator

I would be happy to send another PR addressing the same issue, but this patch currently doesn't look good to me.

@Anonymous-Someneese
Copy link
Author

Salt and target address can be combined in a single packet because we know both pieces of information immediately when a client connects. Actual content is a bit challenging because for a generic TCP connection the client might not send any data but expect the server to send information (e.g. hello banner) first, though popular protocols like HTTP do not have this problem. A workaround could be to wait a couple millisecond for the client to send actual payload to combine with salt and target address in the first packet.

@lixin9311 Thank you for your work. But what is your solution to this problem?

@lixin9311
Copy link
Collaborator

Salt and target address can be combined in a single packet because we know both pieces of information immediately when a client connects. Actual content is a bit challenging because for a generic TCP connection the client might not send any data but expect the server to send information (e.g. hello banner) first, though popular protocols like HTTP do not have this problem. A workaround could be to wait a couple millisecond for the client to send actual payload to combine with salt and target address in the first packet.

@lixin9311 Thank you for your work. But what is your solution to this problem?

see
#164
thank you

@riobard
Copy link

riobard commented Feb 17, 2020

Today I've got some time to think about this issue. There are three separate but related problems here:

  1. Salt and address should definitely be combined and sent together. This can be easily done by turning off TCP_NODELAY (default is on by Go's net.TCPConn) at the beginning, and turning it back on after address is written. Nagel's algorithm should do the job of combing the two writes into a single small packet without changing our code structure.
  2. Waiting for actual payload from client is problematic, because the client might be expecting the server to speak first.
  3. And the above two problems exist because we want to enable TCP Fast Open to send all three pieces of data (salt, address, initial payload) in the first SYN packet, which Nagel's algo does not even apply (at least that's my understanding). TFO comes with its own host of problems, mostly because it does not work well in practice (see TCP fast open support #161 (comment)), and the semantics of enabling it in a split-proxy scenario like Shadowsocks with 4 endpoints and 3 legs is ill-defined. The still-in-draft SOCKS6 spec covers it but only in the normal 3 endpoints and 2 legs scenario.

Given the above understanding of the problem landscape I tend to think we just do the minimal work of fixing problem 1.

Please share your thoughts and we could discuss.

@lixin9311
Copy link
Collaborator

Yes, I agree. If we only need to send iv and address, we can manually merge those 2 packets, without any system calls.

@Anonymous-Someneese
Copy link
Author

Anonymous-Someneese commented Mar 16, 2020

I think a code change should be implemented instead of changing socket label since many application uses this project as library. We should at least concatenat iv and addr packets to eliminate the protocol feature.
As for fastopen, I think giving an option would be better without one. The user should be responsible to decide if it's better to turn it on. That's also what other implementation have been doing.

@fortuna
Copy link

fortuna commented Aug 4, 2020

I strongly recommend coalescing salt+address+initial data. It makes the size of the initial packet hard to predict, which is not the case for salt+address.

We were able to implement this in outline-ss-server: Jigsaw-Code/outline-ss-server#73
The change was very tricky to get right without adding extra buffering, but we made it. Our ShadowsocksWriter now has a LazyWrite method that eventually gets flushed by a timed Flush call, or a full Write or ReadFrom. See our DialTCP code.

In practice there's no delay most of the time, since clients will usually send data immediately. For server-speaks-first protocols, there's a minimal 10ms delay, but I don't think it matters. We are in the process of releasing this change to Outline Clients during this week and the next.

We have previously made a change to coalesce the salt and initial server data, which is a lot simpler:
Jigsaw-Code/outline-ss-server#69

That has been running on production servers for a few weeks now.

@fortuna
Copy link

fortuna commented Aug 4, 2020

Coalescing salt+address+initial data also removes the possibility of some replay attacks that observe where the boundaries of the encryption blocks are, since it becomes one single block with unpredictable length.

@riobard
Copy link

riobard commented Aug 4, 2020

@fortuna I think (haven't actually tried yet) that by turning on Nagle's algorithm for the first couple of packets we can get the same coalescing effect without explicit packing bytes?

@fortuna
Copy link

fortuna commented Aug 6, 2020

I'm curious to hear whether that works. I also haven't tried it. It would certainly be an easy way to implement the packet coalescing. However, from what I read about the Nagle's algorithm, it only buffers if there's unconfirmed data in the pipe, which I interpret as immediately sending the first write as its own packet, which wouldn't accomplish what we need.

@riobard
Copy link

riobard commented Aug 6, 2020

@fortuna My bad. After further thinking about it, what you said is correct, and Nagle's algo won't help in this case.

@riobard
Copy link

riobard commented Aug 9, 2020

OK, I've found the correct mechanism to implement this without breaking code layering and API compatibility.

I was confused and thinking about TCP_NODELAY (Nagle's algo), but actually it's TCP_CORK that'll do wonders in the case. A naive solution is like this:

  1. Set TCP_CORK;
  2. Send salt;
  3. Send address;
  4. (Optionally) send payload;
  5. Unset TCP_CORK.

Caveats:

  1. TCP_CORK is Linux-specific. On BSD there is a similar option called TCP_NOPUSH. Not sure about Windows.
  2. TCP_CORK will wait up to 200ms to accumulate data before sending out on the wire. The 200ms timer seems to be hard-coded and unadjustable. Ideally we'd like to keep it to under 10ms in our use case to reduce latency penalty of server-speak-first protocols. A simple workaround is that right after setting TCP_CORK we start a 10ms-delayed function to unset TCP_CORK, which forces sending on the wire.

@riobard
Copy link

riobard commented Aug 9, 2020

A proof of concept for client-side use 50676a7

@fortuna
Copy link

fortuna commented Aug 10, 2020

The TCP_CORK solution seems convenient, but unfortunate that it's for Linux only. We opted to implement the in user space, which we have much more control and is cross-platform. Maybe you implement TCP_CORK in the interim before implementing in the user space?

Feel free to copy what we did by the way. We effectively have a LazyWrite call in the ShadowsocksWriter that appends to the internal buffer, and a Flush call that writes to the TCP connection. Regular Write will also flush. We were able to do it without requiring an extra buffer. You just need to make sure there's space for the AEAD tag after the lazy data, in case you need to flush before the regular ReadFrom finishes the first read.

@riobard
Copy link

riobard commented Aug 11, 2020

BSD has TCP_NOPUSH, so the last commit works on both Linux and BSD. It is also trivial to implement corking (or traffic shaping in general) in user-space by wrapping net.TCPConn with necessary timing/buffering policies for platforms like Windows which lack similar kernel mechanism without breaking code layering and API compatibility.

@riobard
Copy link

riobard commented Aug 16, 2020

Turns out a generic user-space TCP_CORK is very simple, see this feature branch https://github.com/shadowsocks/go-shadowsocks2/tree/feature-generic-cork

@riobard
Copy link

riobard commented Aug 19, 2020

Generic TCP corking has been merged in #186. Closing this now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants