# Development and testing of a Trigger Processor Card based on a Kintex UltraScale FPGA

S. Mallios\*<sup>a</sup>, K. Adamidis<sup>a</sup>, G. Bestintzanos<sup>a</sup>, C. Fountas<sup>a</sup>, G. Karathanasis<sup>c</sup>, P. Katsoulis<sup>a</sup>, N. Manthos<sup>a</sup>, I. Papadopoulos<sup>a</sup>, S. Sotiropoulos<sup>b</sup>, P. Sphicas<sup>c</sup>, C. Vellidis<sup>c</sup>

<sup>a</sup> University of Ioannina, Ioannina, Greece
<sup>b</sup> Institute of Accelerating Systems and Applications (IASA), Greece
<sup>c</sup> University of Athens, Athens, Greece
E-mail: stavros.mallios@cern.ch

During the HL-LHC era, the upgraded CMS detector will be read out at an unprecedented data rate of up to 50 Tb/s and an event rate of 750 kHz. Within the scope of Phase-2 R&D, a Level-1 Trigger processor card was designed by the Greek CMS Trigger team to provide a hardware environment for developing and evaluating new Level-1 trigger muon designs and technologies. The board is powered by a Kintex UltraScale FPGA. New firmware was developed, implementing 16 Gbps links with IPbus support, to accommodate the testing of new algorithms. The hardware and firmware design of the board is presented here.

Topical Workshop on Electronics for Particle Physics (TWEPP2018), 17-21 September 2018, Antwerp, Belgium

\*Speaker.

## 1. Introduction

The upgraded High Luminosity LHC, after the third Long Shutdown (LS3), will provide an instantaneous luminosity of 7.5 × 10<sup>34</sup> cm<sup>-2</sup> s<sup>-1</sup> (levelled), at the price of a dramatic increase in the number of pileup interactions. It is generally expected that the number of pileup interactions could reach 200 per bunch crossing. The upgraded detector will be read out at an unprecedented data rate of up to 50 Tb/s and an event rate of 750 kHz.

Within the scope of Phase-2 R&D, a new Level-1 Trigger processor card was designed by the Greek CMS Trigger team to provide a hardware environment for developing and evaluating new Level-1 trigger muon designs and technologies. The board is powered by a Kintex UltraScale FPGA that provides the best price/performance/watt at 20 nm and includes the highest signal processing bandwidth in a mid-range device. The UltraScale FPGA provides next-generation transceivers that reach speeds up to 16 Gbps. The board comes with state-of-the-art fibre optics technologies, using micro-footprint optical interconnects that provide up to 192 Gbps full-duplex bandwidth over 12 channels. Furthermore, a QSFP+ connector provides an additional 40 Gbps of bandwidth over 4 channels.

For testing purposes, new firmware was developed, implementing synchronous and asynchronous 16 Gbps GTH links. The links use the 64b/66b encoding scheme, with an overhead of 2 coding bits per 64 bits, which is considerably more efficient than the previously used 8b/10b encoding scheme. In addition, a simple infrastructure with input/output buffers and IPbus support was developed to accommodate the testing of new algorithms. The hardware and firmware design of the processor card is presented here.

## 2. The board

The board is powered by a Kintex UltraScale FPGA that provides the best price per performance per watt at 20 nm technology in a mid-range device. The Kintex UltraScale FPGA provides 20 next-generation GTH transceivers that reach speeds up to 16.3 Gbps. The board comes with state-of-the-art fibre optics technology from Samtec. The high-performance interconnect system uses active optical engines that provide 12 full-duplex channels at data rates up to 16 Gbps.

Furthermore, 4 FPGA transceivers are routed to a QSFP28 connector, allowing data rates of up to 28 Gbit/s per channel over 4 channels. In total, the board's 16 × 16 Gbps links add up to an optical bandwidth of approximately 256 Gbps in each direction, making it a high-performance all-optical data-stream processor. A Xilinx ZYNQ System-on-Chip (SoC) device will be used as the control interface for the Kintex UltraScale FPGA. The system controller sets up or queries on-board resources, such as the power controllers and programmable clocks.
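As a quick check of the quoted total, the link budget can be summed per connector. This is only a sketch: the constant and function names are illustrative, and it assumes that all 16 links (12 optical engine channels plus 4 QSFP channels) run at the same 16 Gbps line rate, as the "16 × 16 Gbps" statement suggests.

```python
# Channel counts and line rate as quoted in the text; names are illustrative.
FIREFLY_CHANNELS = 12   # active optical engine channels
QSFP_CHANNELS = 4       # transceivers routed to the QSFP connector
LINK_RATE_GBPS = 16     # assumed identical for every link


def total_bandwidth_gbps(channels, rate_gbps=LINK_RATE_GBPS):
    """Aggregate one-direction bandwidth of `channels` links, in Gbps."""
    return channels * rate_gbps


firefly = total_bandwidth_gbps(FIREFLY_CHANNELS)
qsfp = total_bandwidth_gbps(QSFP_CHANNELS)
total = firefly + qsfp
print(firefly, qsfp, total)  # 192 64 256
```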

### 2.1 Optical interface

The FireFly optical flyover assembly is designed for flexibility and is interchangeable with the FireFly copper assembly using the same connector system. It is available with x12 simplex or duplex optical transceivers to achieve 16 Gb/s per channel.

### 2.2 Clocking

The board includes 5 clock sources. The GTH transceivers connected to the high-speed FireFly modules are clocked by a dedicated low-jitter quad clock generator (Si5338). A low-jitter frequency generator (Si570) is connected to the QSFP transceivers and can be used as a secondary clock source for the FireFly transceivers. A jitter attenuator (Si5328B) is used to reduce the jitter of an RX recovered clock. A fixed-frequency clock source can be used as a free-running clock for reset and initialization FSMs. Finally, an SMA external clock input is also included. All programmable clocks are accessed through a dedicated I2C bus.

### 2.3 The FPGA

### 2.4 PCB

Plated-through hole (PTH) via structures in high layer-count printed circuit boards (PCBs) can significantly distort the high-speed digital signals that pass through them. Often the distortion is severe enough that digital receivers can no longer determine whether a logical one or a logical zero was originally transmitted.

## 3. Firmware

### 3.1 Protocol

The protocol uses the 64b/66b encoding, which transforms 64-bit data into a 66-bit line code to provide enough state changes for reasonable clock recovery and alignment of the data stream at the receiver. The links are asynchronous, meaning that the main algorithmic logic is clocked at a lower frequency than the link clock, which allows more flexibility when choosing the logic clock. This is achieved using asynchronous FIFOs on the receiving and the transmitting sides. To compensate for the frequency difference, a special padding word is injected when the FIFO is empty. The alignment of the links is also based on the insertion and checking of this padding word. For testing purposes, the local clock runs at 240 MHz and the link clock at 250 MHz.
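The rate matching described above can be illustrated with a small behavioural model. This is a minimal sketch and not the firmware itself: the `PADDING` marker, the function names and the credit-based clock model are all assumptions made for illustration. Only the 240 MHz and 250 MHz clock frequencies come from the text.

```python
from collections import deque
from fractions import Fraction

PADDING = "PAD"  # hypothetical stand-in for the special padding word


def simulate_tx(n_link_cycles, logic_mhz=240, link_mhz=250):
    """Model the TX side: logic writes at logic_mhz, link reads at link_mhz."""
    fifo = deque()
    stream = []
    credit = Fraction(0)                  # fractional logic-clock ticks accrued
    ratio = Fraction(logic_mhz, link_mhz)
    next_word = 0
    for _ in range(n_link_cycles):
        credit += ratio
        if credit >= 1:                   # a logic-clock edge occurred: write a word
            credit -= 1
            fifo.append(next_word)
            next_word += 1
        # The link must send something on every cycle: data if available,
        # otherwise the padding word.
        stream.append(fifo.popleft() if fifo else PADDING)
    return stream


def strip_padding(stream):
    """Model the RX side: discard padding words before the RX FIFO."""
    return [word for word in stream if word != PADDING]


stream = simulate_tx(250)
print(stream.count(PADDING), len(strip_padding(stream)))  # 10 240
```

With these frequencies, one link word in 25 is padding, and the receiver recovers the original word sequence unchanged once the padding is dropped.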

### 3.2 Alignment

Link bring-up and error detection are based on the generic 2-bit 64b/66b encoding header, combined with the periodic sending of a padding word and CRC blocks. Single errors are considered soft errors and are monitored with a soft error counter. Continuous errors are considered hard errors and result in an automatic reset and re-alignment of the links. The overhead of the 64b/66b encoding is 3.125%, and the CRC/padding blocks are injected every 100 blocks, resulting in a total overhead of 4.125%. The maximum time for link (re)alignment is 200 µs.
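The overhead figures quoted above can be reproduced with exact fractions. This sketch assumes that the encoding overhead is counted as header bits per payload bits (2/64) and that one control block per 100 blocks contributes 1%. The function names are illustrative.

```python
from fractions import Fraction


def header_overhead(payload_bits=64, header_bits=2):
    """64b/66b header overhead, relative to the payload bits."""
    return Fraction(header_bits, payload_bits)


def control_overhead(period_blocks=100):
    """Overhead of one CRC/padding block injected every `period_blocks` blocks."""
    return Fraction(1, period_blocks)


def total_overhead(payload_bits=64, header_bits=2, period_blocks=100):
    """Sum of the encoding and control-block overheads."""
    return header_overhead(payload_bits, header_bits) + control_overhead(period_blocks)


# Percent values for the parameters quoted in the text.
print(float(header_overhead() * 100), float(total_overhead() * 100))  # 3.125 4.125
```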

### 3.3 Testing

The functionality of the links was extensively tested using the KCU105 UltraScale development board. For the tests, an FMC loopback card was used to implement a copper loopback quad link.

Latency: The latency of the GTH was measured at 9 CLKs. Adding the latency of the 2 FIFOs used to cross between clock domains and of the error checking code, the total link latency adds up to 23 CLKs.

Bit error rate tests: BER tests were done by sending PRBS-31 data over an FMC copper loopback card. The links ran for more than 72 hours without errors, resulting in a BER < 10<sup>-16</sup>.
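To put the BER figure in context, the number of bits transferred in an error-free run sets the scale of the limit that can be claimed. The sketch below counts the bits and evaluates the standard zero-error upper limit, BER < -ln(1 - CL) / N. The text does not state how many links or which confidence level entered the quoted limit, so the 4 links (one quad), the 16 Gbps line rate and the 95% confidence level used here are assumptions.

```python
import math


def bits_transferred(hours, rate_gbps, n_links=1):
    """Total number of bits sent over n_links running at rate_gbps for `hours`."""
    return hours * 3600 * rate_gbps * 1e9 * n_links


def ber_upper_limit(n_bits, confidence=0.95):
    """Upper limit on the BER when zero errors were observed in n_bits."""
    return -math.log(1.0 - confidence) / n_bits


n_bits = bits_transferred(hours=72, rate_gbps=16, n_links=4)
print(f"bits transferred:       {n_bits:.3e}")
print(f"1/N estimate:           {1.0 / n_bits:.2e}")
print(f"95% confidence limit:   {ber_upper_limit(n_bits):.2e}")
```

Under these assumptions the simple 1/N estimate falls below 10<sup>-16</sup>, while the 95% confidence limit is somewhat higher, so the exact value depends on the convention used.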

## 4. Conclusions

The upgraded High Luminosity LHC, after the third Long Shutdown (LS3), will provide an instantaneous luminosity of 7.5 × 10<sup>34</sup> cm<sup>-2</sup> s<sup>-1</sup> (levelled), at the price of a dramatic increase in the number of pileup interactions. It is generally expected that the number of pileup interactions could reach 200 per bunch crossing. The upgraded detector will be read out at an unprecedented data rate of up to 50 Tb/s and an event rate of 750 kHz. Within the scope of Phase-2 R&D, a new Level-1 Trigger processor card was designed by the Greek CMS Trigger team to provide a hardware environment for developing and evaluating new Level-1 trigger muon designs and technologies. The board is powered by a Kintex UltraScale FPGA that provides the best price/performance/watt at 20 nm and includes the highest signal processing bandwidth in a mid-range device. The UltraScale FPGA provides next-generation transceivers that reach speeds up to 16 Gbps. The board comes with state-of-the-art fibre optics technologies, using micro-footprint optical interconnects that provide up to 192 Gbps full-duplex bandwidth over 12 channels. Furthermore, a QSFP+ connector provides an additional 40 Gbps of bandwidth over 4 channels. For testing purposes, new firmware was developed, implementing synchronous and asynchronous 16 Gbps GTH links. The links use the 64b/66b encoding scheme, with an overhead of 2 coding bits per 64 bits, which is considerably more efficient than the previously used 8b/10b encoding scheme. In addition, a simple infrastructure with input/output buffers and IPbus support was developed to accommodate the testing of new algorithms. The hardware and firmware design of the processor card is presented here.

Figure 1: Altium 3D representation of the board

Figure 2: Serpentine routing of high-speed differential signals