**Design and Applications for Embedded Networks-on-Chip on FPGAs**

Mohamed S. Abdelfattah, Andrew Bitar and Vaughn Betz

Thank you for taking the time to consider our submission. This paper is an extended version of our conference paper:

*M. S. Abdelfattah, A. Bitar and V. Betz, “Take the Highway: Design for Embedded NoCs on FPGAs,” in International Symposium on Field-Programmable Gate Arrays (FPGA), pp. 98-107, Feb. 2015. (Best paper award)*

We have ensured that at least 50% of this manuscript is new material (~6 new pages). Our original conference paper presented the FabricPort interface, which connects an embedded NoC to the FPGA fabric, and defined the design rules required to ensure that FPGA-style communication can be implemented on embedded NoCs. Additionally, two application implementations (JPEG and Ethernet switches) highlighted the use cases and advantages of an embedded NoC on an FPGA.

In this submission, we rewrote large parts of the paper, including a new introduction and motivation. We significantly extend our original submission with four main contributions. First, we complement the FabricPort interface with IOLinks that directly connect the NoC to fast I/O interfaces. We believe this is crucial for easing timing closure on these I/O interfaces. We present a detailed case study in which IOLinks connect to a typical FPGA memory controller, and we show that this improves memory access latency. Second, we explain how our hybrid NoC-RTL simulator (RTL2Booksim) is built. Third, we design a 400 Gb/s packet processor that leverages our embedded NoC. We show that our new design improves latency and area utilization compared to previous FPGA packet parser designs. Finally, we revise our design rules and conditions and add a design rule pertaining to clock drift in latency-sensitive designs on FPGAs.

New material enumerated:

1. Present IOLinks: direct connections between the embedded NoC and the FPGA’s memory and I/O controllers.
2. Present RTL2Booksim: a simulator that allows the co-simulation of a software NoC simulator (booksim) and hardware RTL designs.
3. Compare the latency of external (DDR3) memory access using an embedded NoC (with IOLinks) against a soft bus.
4. Design a packet processor using the embedded NoC that is more flexible and efficient than previous FPGA packet processors.
5. Add a design constraint pertaining to clock drift for latency-sensitive designs on embedded NoCs.
6. Rewrite the introduction and motivation to better articulate the necessity of embedded NoCs for a wider audience.