Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: stun protocol & stun connection #9

Merged
merged 44 commits into from
May 24, 2024
Merged

feat: stun protocol & stun connection #9

merged 44 commits into from
May 24, 2024

Conversation

lchenut
Copy link
Collaborator

@lchenut lchenut commented Mar 8, 2024

Presentation

This PR is a part of the stack to create the nim-libp2p webrtc-direct transport (defined here: https://github.com/libp2p/specs/blob/master/webrtc/webrtc-direct.md)
We implement a small version of the STUN protocol following https://datatracker.ietf.org/doc/html/rfc5389
As the RFC tells us: STUN serves as a tool for other protocols in dealing with NAT traversal, thus we also implement a small version of the ICE Lite protocol following https://datatracker.ietf.org/doc/html/rfc8445.

The STUN protocol reads and writes from the UdpConn. All the received STUN messages are treated as such and answered directly, all the other received messages are queued to be read by the underlying DTLS protocol.

Stun protocol

As said above, this STUN implementation isn't complete. At the moment of writing this PR, we are only interested in implementing the server part of the webrtc-direct transport. We will reconsider and improve the implementation when vacp2p/nim-libp2p#409 is addressed or if the webrtc-direct spec changes, for example. But for now on, there are some part missing.

Non-exhaustive list of what is missing:

  • REALM and NONCE attributes (no authentification/security process described in the webrtc-direct spec)
  • Retransmissions (client side, we only implement the server)
  • Shared secret Request/Response/Error (again nothing described in the spec)

ICE Lite protocol

For the same reasons as to why the STUN protocol isn't fully implemented (and because it's in the webrtc-direct spec), we use the Lite implementation of ICE. And again, only the server (ICE-CONTROLLED) side. As the server in the webrtc-direct transport must be publicly available, the Lite version should be sufficient.

webrtc/stun/stun.nim Outdated Show resolved Hide resolved
tests/teststun.nim Outdated Show resolved Hide resolved
tests/teststun.nim Outdated Show resolved Hide resolved
tests/teststun.nim Outdated Show resolved Hide resolved
laddr*: TransportAddress # Local address
raddr*: TransportAddress # Remote address
dataRecv*: AsyncQueue[seq[byte]] # data received which will be read by DTLS
stunMsgs*: AsyncQueue[seq[byte]] # stun messages received and to be

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this queue unbounded?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, those two queues are unbounded.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should they be?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing in the RFC saying anything about this. And I'm not confident enough to say There should be a limit and this limit is this number. So I leave things as they are because of my uncertainty.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is something always mentioned on specs, but I believe there should always be a limit to avoid memory DoS attacks. See more in https://github.com/libp2p/rust-libp2p/blob/master/docs/coding-guidelines.md#bound-everything.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

try:
let decoded = StunMessage.decode(await self.stunMsgs.popFirst())
if not decoded.isFingerprintValid():
# Fingerprint is invalid, the StunMessage received might be a false positive.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "might be a false positive" mean? Not a Stun message? Why is it moved to the dataRecv queue?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, there's a first check with the raw data (without decoding) where we check different things (the size of the message, the presence of the magic stun cookie etc...). When the incoming message is sorted, we can decode it. If something is wrong with the Fingerprint, it can means that it is, in fact, not a Stun Message but a message for another protocol. And the RFC specifies this by saying :

   The FINGERPRINT attribute can aid in distinguishing STUN packets from
   packets of other protocols.  See [Section 7](https://datatracker.ietf.org/doc/html/rfc8489#section-7).``` 

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you briefly describe it in the comment and add the link? Why is it moved to the dataRecv queue?

Copy link
Collaborator Author

@lchenut lchenut May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it moved to the dataRecv queue?

Because StunConn uses two queues for received messages:

  • one for Stun Messages stunMsgs which are decoded/answered/etc... in stunMessageHandler
  • one for the others protocols dataRecv (in the case of the WebRTC stack, it's DTLS messages), which are popped from the queue when read is called.

And, if the Fingerprint is wrong, it could be a false negative, which mean, it's not a Stun Message, but probably a DTLS message, thus it should be in dataRecv and not in stunMsgs

except WebRtcError as exc:
trace "Failed to write the Stun response", error=exc.msg

proc init*(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be new as StunConn is a ref.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

##
await self.closeEvent.wait()

proc close*(self: StunConn) =

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we close the underlying UDP conn?

Copy link
Collaborator Author

@lchenut lchenut May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Udp conn is a really bad name, it should be Udp Transport, I might change this in another PR.
But no, UdpConn is closed only when we close the Stun transport

if self.closed:
debug "Try to close an already closed StunConn"
return
self.closed = true

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be the last line?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe having a closing and closed, but not sure.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it, but as the proc is synchronous, it doesn't change anything

import sequtils, typetraits, std/sha1
import bearssl

proc generateRandomSeq*(rng: ref HmacDrbgContext, size: int): seq[byte] =

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is supposed to return a seq, but I believe it doesn't return anything, potentially nil.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? I initialize result. It definitely returns a seq.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it confusing. There are 3 different ways of returning in Nim and it's used arbitrarily.

rem = (rem shr 1) xor 0xedb88320'u32
else:
rem = rem shr 1
result[i] = rem

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend to avoid result and use explicit returns unless you have a very strong preference for that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum.... I like result actually, it's a neat tool

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it unusual and harder to reason. Another reason is that it is used arbitrarily in the procs. I believe it is better to follow the same pattern.

Copy link

@diegomrsantos diegomrsantos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Amazing job. Thanks for addressing the comments and delivering this.

@lchenut lchenut merged commit 6cda0c8 into master May 24, 2024
8 checks passed
@lchenut lchenut deleted the stun-protocol branch May 24, 2024 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants