Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Border Router performance optimization #4334

Closed
rohrerj opened this issue Apr 3, 2023 · 7 comments
Closed

Border Router performance optimization #4334

rohrerj opened this issue Apr 3, 2023 · 7 comments
Labels
i/proposal accepted i/proposal A new idea requiring additional input and discussion

Comments

@rohrerj
Copy link
Contributor

rohrerj commented Apr 3, 2023

Right now, the performance of the border router is strongly limited because a single goroutine per border router interface is responsible for reading, parsing, processing and forwarding the packets.

Proposal: To improve the performance of the border router we make changes to the router/dataplane.go file to separate the forwarding pipeline of each border router interface into a receiver, multiple processing routines and a forwarder, where the pool of processing routines are shared among all border router interfaces of a border router.

The receiver makes a batch read from the network socket using pre-allocated buffers from a pool and parses the source and flowID from the packet. Then we hash them to map the tuple of source and flowID to a processing routine and forward the packet to them. The processing routine will then process the packet as usual and forward the packet to the forwarder which is responsible for the egress interface. The forwarder forwards the packet to the NIC and returns the buffer used to store the packet back to the buffer pool of the receiver that received that packet.

Because of the hashing, all packets of the same source and same flowID are processed by the same processing routine and hence no packet reordering can happen.
The number of processing routines will be configurable.

These improvements are supervised by Marc Wyss from the network security group at ETH.

BorderRouter drawio

@rohrerj rohrerj added the i/proposal A new idea requiring additional input and discussion label Apr 3, 2023
@matzf
Copy link
Contributor

matzf commented Apr 3, 2023

This sounds great, cool that you want to tackle this!

As this is a significant change, we'll want to work out a more detailed design document as described in the contribution guide once there is general agreement that this proposal should be accepted. There is some more detailed information on these design documents in my work-in-progress PR here. I should really get around to finalize this 😳
The design document should describe the implementation in more detail. Things like how configuration options will work, how the code will be organized, how the implementation tasks can be split into multiple steps, etc. For this proposal specifically, I think it should also discuss the details of the buffer recycle mechanism. Also, any considerations on memory locality, locking goroutines to threads and cores etc, if this is something that we want to tackle. Maybe the whole thing could be organized in a way such that it would be at some point possible to replace the reading/writing goroutines with a different asynchronous IO mechanism, such as io_uring on linux?
I'm writing this just so you can maybe already start to thinking about this and prepare.

Perhaps one detail worth discussing on this level already is this; we should consider how we could add any special "slow path" handling of packets (e.g. malformed packets or SCMP traceroute), perhaps also including some form of rate limits for this, so that the "fast-path" is minimally affected.

@marcfrei
Copy link
Contributor

marcfrei commented Apr 5, 2023

Adding to the last point by @matzf, the new design should also consider more general traffic control schemes like, e.g., the one proposed in #4054, that we might want to add in the future.

@matzf
Copy link
Contributor

matzf commented Apr 6, 2023

Summarizing some additional points from the discussion of this proposal in the chat for future reference:

  • The current implementation was never built with performance in mind, it was mostly a PoC for the new Header format. (See posix-router: add header v2 posix-router #3869). From this perspective, it makes sense to quite heavily restructure it.
  • This implementation also serves as a reference implementation and so should remain somewhat simple and not be made unreadable by very complicated optimizations. This refers mainly to the actual packet processing logic, and applies less to the i/o parts.
  • There was some discussion on whether simple channels are a good choice for the communication between the goroutines involved here.
    • Channels have been found to be slow in the past, but was with older Go versions so this may no longer be the case.
    • Alternative could be a custom ringbuffer. There was an existing implementation in the code base that was used in the pre-header-v2 incarnation of the router (still used for dispatcher & gateway).
  • As a historical note and pointer for reference: the first version of the Go SCION router (pre header-v2) was implemented essentially in the newly proposed way, i.e., a reader go routine that was batch reading from the UDP socket, a set of processing go routines and one output go-routine per external SCION interface. Between the go-routines our own implementation of a ringbuffer was used.
    That implementation was far from perfectly optimized and was hard to understand. Some of the complexity may also have been due to the (over-)ambitious configuration hot-reloading.
    See https://github.com/scionproto/scion/tree/92531f5cb62197b9d705001c13e5a6bdb7ba1fa4/go/border

Going forward, we're likely going accept this proposal. The final comment period starts now, and unless there are objections during this period, the proposal will be marked as accepted. As mentioned above, @rohrerj, you can then submit a more in-depth design document describing how you plan to implement this proposal.

@matzf matzf added the i/proposal final comment period Final comment period (10 days) label Apr 6, 2023
@dtaht
Copy link

dtaht commented Apr 8, 2023

This design is remarkably similar to my fq_codel stuff (RFC8290), which does flow isolation and active queue management. One of the tricks in fq_codel to speed processing is that sparser updates are processed sooner than more bulky updates.
Also somewhat related to that, in trying to understand scion for the first time, is what happens when the application is overloaded? How does it shed load?

@matzf
Copy link
Contributor

matzf commented Apr 17, 2023

No objections were raised during the final comment period, which means that this proposal is now accepted.
As mentioned before, please submit a design document, describing in more detail how this proposal will be implemented.

@matzf matzf added i/proposal accepted and removed i/proposal final comment period Final comment period (10 days) labels Apr 17, 2023
@matzf
Copy link
Contributor

matzf commented Apr 17, 2023

Thanks for the input @dtaht. I saw that you've joined the chat, I will respond to your question over there.

@rohrerj
Copy link
Contributor Author

rohrerj commented Apr 20, 2023

I created the PR 4339 for the design document.

@matzf matzf closed this as completed in b4a2011 Sep 28, 2023
jiceatscion pushed a commit to jiceatscion/scion that referenced this issue Sep 28, 2023
Add a low-priority slow path for special case packet handling, in
particular SCMP errors and traceroutes, for the the asynchronous packet
processing pipeline in the router added in scionproto#4351.

This is implementation part three of three described in the design
document (scionproto#4339, doc/dev/design/BorderRouter.rst).

Closes scionproto#4334
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
i/proposal accepted i/proposal A new idea requiring additional input and discussion
Projects
None yet
Development

No branches or pull requests

4 participants