Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Listener doesn't process parallel handshakes #279

Open
jkralik opened this issue Aug 17, 2020 · 16 comments · May be fixed by #613
Open

Listener doesn't process parallel handshakes #279

jkralik opened this issue Aug 17, 2020 · 16 comments · May be fixed by #613
Milestone

Comments

@jkralik
Copy link
Contributor

jkralik commented Aug 17, 2020

Your environment.

plgd-dev/go-coap#109

What did you do?

Just look at last comment. But it seems that other handshakes waits each other.

What did you expect?

dlts.Listener is able to process parallel handshakes.

@Sean-Der
Copy link
Member

Sean-Der commented Aug 19, 2020

The issue is here. Every time we get a new inbound stream we block until the handshake is complete. This behavior also isn't a regression f7a68a3

We should look at how crypto/tls buffers internally so that handshakes don't block either. I don't know what the API should look like. I don't know if this is an established Go pattern, but you could call accept n times. That is one way you could accomplish the concurrency. Let me confirm first.

@backkem any ideas?

@Sean-Der
Copy link
Member

Sean-Der commented Aug 19, 2020

https://gist.github.com/Sean-Der/a3a6abbe9b78ced613590312a18dc8a4

@jkralik @boaks @jdbruijn this worked^ If one Server process hangs the other 4 work just fine. If I remove the change in the example one client hanging hurts everything.

You just need to decide how many goroutines you are willing to spawn.. Maybe this should be something that Pion automatically does? I am not sure what is idiomatic.

jdbruijn added a commit to withthegrid/dtls that referenced this issue Aug 19, 2020
@jdbruijn
Copy link
Member

I've checked in my test setup and also works there. See the readme of the repo I've just referenced this in. Will be testing this in my application code, but should work for the time being at lest. @Sean-Der Thanks so much for the quick analysis and fix.

We should look at how crypto/tls buffers internally so that handshakes don't block either.

I have no idea what is the idiomatic but I'll have a look at crypto/tls to see what they do. I can't imagine that handshakes would be blocking there.

You just need to decide how many goroutines you are willing to spawn..

I'll have a look at how the go routines work exactly and what spawning N of them means in CPU, RAM etc. to determine how much I'd be willing to spawn.

For reference to anyone reading: In most applications this wouldn't be that much of an issue since the handshake over an internet connection goes relatively quickly, so context deadlines can be configured to be very tight (e.g. 100 ms). With embedded devices, the full handshake generally takes 5 seconds, which is much more blocking even if it doesn't hang and 500 devices are trying to make connection at the same time.

@daenney
Copy link
Member

daenney commented Aug 19, 2020

I'm probably misunderstanding the problem, but can't we use a buffered channel to get around this?

var conns := make(chan net.Conn, 100)

hub := util.NewHub()

go func(l net.Listener) {
    for {
        conn, err := l.Accept()
        if err != nil {
            .. do something ..
            return
        }
        conns <- conn
    }
}(listener)

for {
    select {
    case c := <-conns:
        hub.Register(conn)
    }
}
hub.Chat()

@daenney
Copy link
Member

daenney commented Aug 19, 2020

Oh mm, the Accept is on dtls.Listener, so that triggers the whole handshake machinery. We should be able to Accept() on the underlying listener first and then spin the rest off into a goroutine? I suppose the problem is this:

return Server(c, l.config)
? We do this inline, instead of spawning a goroutine there that'll then take care of the handshake?

@jkralik
Copy link
Contributor Author

jkralik commented Aug 20, 2020

@Sean-Der From my point view, spawning goroutine's below to underlayer of dtls/listener, because we want to have similar API as tls.Listener. and it is more convenient for using. Of course there must be option to limit spawned goroutines.

@Sean-Der
Copy link
Member

@daenney yep exactly!

@jkralik agree!


Server in crypto/tls doesn't actually do anything so we need to match this API. Instead the first call to Read or Write should start the handshake.

@Sean-Der
Copy link
Member

It looks like we will need to do a v3 and add Handshake

I can't believe I missed this before :( this is a pretty big breaking change, but I don't think it will be difficult to do!

@boaks
Copy link

boaks commented Aug 22, 2020

For reference to anyone reading: In most applications this wouldn't be that much of an issue since the handshake over an internet connection goes relatively quickly, so context deadlines can be configured to be very tight (e.g. 100 ms). With embedded devices, the full handshake generally takes 5 seconds, which is much more blocking even if it doesn't hang and 500 devices are trying to make connection at the same time.

I'm not sure, when exactly the spawned goroutine is required. If this is already required for RFC 6347 - 4.2.1. Denial-of-Service Countermeasures, I think it violates the "In order to counter both of these attacks, DTLS borrows the stateless cookie technique used by Photuris [PHOTURIS] and IKE [IKEv2]." there. Then each (spoofed) CLIENT_HELLO may trigger/consume such a goroutine. Even if that number is limited, it will make it easy for spoofer to perform a DoS attack.

@Sean-Der
Copy link
Member

Sean-Der commented Aug 22, 2020

I'm not sure, when exactly the spawned goroutine is required.

@boaks Each Accept call would be in one. You would do a pre-allocated pool, so I don't think the initial ClientHello will cause a problem!

The API difference is that crypto/tls does the handshake on the first call to Read or Write. pion/dtls does it on the call to Accept. All the blocking occurs in Read and Write which users usually move to a goroutine.


If users do a pool and call Accept multiple times they can handle multiple handshakes at once. That is the best option to unblock right away.


Going forward we have two choices to make the API behave exactly like crypto/tls

  • Add an option to do Handshake in the first call to Read/Write. This adds more complexity, but is not a breaking change!
  • Break the API and never do the Handshake when creating the connection. Only when doing the first Read/Write or when calling Handshake explicitly.

One thing that worries me is that this could encourage users to spawn a goroutine each time a ClientHello is received. The user will do the call to Read/Write/Handshake in a new goroutine.

@daenney daenney added this to the 3.0.0 milestone May 17, 2021
@daenney
Copy link
Member

daenney commented May 17, 2021

If we were to take option 1, realistically we'd want that option on/by default since the current behaviour is undesirable. But we can't flip that on by default since it changes our behaviour quite a bit and we don't know if folks are relying on it somehow. That would result in us having to do a major anyway to be able to flip the default for that option, at which point I'd rather we break the API and align on crypto/tls than take on the tech debt.

@niondir
Copy link

niondir commented Apr 19, 2022

Is there any progress on this? We are accepting DTLS connections of potentially many thousand clients (IoT devices) on low bandwidth connections with high timeouts - which is worst case for this scenario ....

@boaks
Copy link

boaks commented Apr 20, 2022

My experience with (too) many coap and/or dtls stacks is, it's either intended and developed to be used for IoT, or you may be faced much more pain, then you expect. That doesn't mean, other implementations are bad, it just means, they are not really supporting IoT.

@Sean-Der
Copy link
Member

Hi @niondir

Instead of doing your Handshake or first Read/Write in a goroutine create your dtls/Conn in one. Even when we change the behavior to match crypto/TLS you still have to handle the concurrent handshakes.

Happy to create a JSFiddle showing how to do this. When this issue is closed though I don’t think this library will change dramatically, we are just moving when something blocks.

thanks!

@jdbruijn
Copy link
Member

jdbruijn commented Apr 21, 2022

FWIW I'm doing like example below in our DTLS server for IoT devices to create 1000 handshake workers/listeners, also using go-coap. Has been working great for over a year now.

func (l *DTLSListener) acceptLoop() {
 	defer l.wg.Done()
 	for i := 0; i < 1000; i++ {
 		go func() {
 			for {
 				conn, err := l.listener.Accept()
 				select {
 				case l.connCh <- connData{conn: conn, err: err}:
 				case <-l.doneCh:
 					return
 				}
 			}
 		}()
 	}
 }

@jordan-bonecutter
Copy link

jordan-bonecutter commented Mar 18, 2024

This is a pretty big issue. From my experience, 1 bad handshaker and a wonky config can pretty much take down the entire server (default 30s handshake timeout, client may have a much shorter one like 5s. Client handshakes build up infinitely.)

I will submit a PR, but I propose:

type Handshaker interface {
  Handshake() (net.Conn, error)
}

type handshaker struct {
  inner net.Conn
  conf *Config
}

func(h handshaker) Handshake() (net.Conn, error) {
  return Server(h.inner, h.conf)
}

func(ln *listener) AcceptHandshaker() (Handshaker, error) {
  inner, err := ln.parent.Accept()
  return handshaker{inner, ln.config}, err
}

The Handshake method can then be called in a background goroutine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants