- Development and testing of a Trigger Processor Card based on a Kintex UltraScale FPGA.
  - S. Mallios (a, \*), S. Sotiropoulos (b), K. Adamidis (a), G. Bestintzanos (a), C. Fountas (a), G. Karathanasis (c), P. Katsoulis (a), N. Manthos (a), I. Papadopoulos (a), S. Sotiropoulos (b), P. Sphicas (c), C. Vellidis (c)
  - a) University of Ioannina, Ioannina, Greece
  - b) Institute of Accelerating Systems and Applications (IASA), Greece
  - c) University of Athens, Athens, Greece

E-mail: stavros.mallios@cern.ch

During the HL-LHC era, the upgraded detector will be read out at an unprecedented data rate of up to 50 Tb/s and an event rate of 750 kHz. Within the scope of Phase-2 R&D, a Level-1 Trigger processor card was designed by the Greek CMS Trigger team to provide a hardware environment for developing and evaluating new Level-1 muon trigger designs and technologies. The board is powered by a Kintex UltraScale FPGA. New firmware was developed implementing 16 Gbps links with IPbus support, to accommodate the testing of new algorithms. The hardware and firmware design of the board is presented here.

Topical Workshop on Electronics for Particle Physics (TWEPP2018), 17-21 September 2018, Antwerp, Belgium

\*Speaker.

## 1. Introduction

The upgraded High-Luminosity LHC, after the third Long Shutdown (LS3), will provide a levelled instantaneous luminosity of $7.5 \times 10^{34}\,\mathrm{cm^{-2}\,s^{-1}}$, at the price of a dramatic increase in the number of pileup interactions, which is generally expected to reach 200 per bunch crossing. The upgraded detector will be read out at an unprecedented data rate of up to 50 Tb/s and an event rate of 750 kHz. Within the scope of Phase-2 R&D, a new Level-1 Trigger processor card was designed by the Greek CMS Trigger team to provide a hardware environment for developing and evaluating new Level-1 muon trigger designs and technologies. The board features state-of-the-art fibre-optic technology, using micro-footprint optical interconnects. For testing purposes, new firmware was developed implementing asynchronous 16 Gbps GTH links. The links use the 64b/66b encoding scheme, with an overhead of 2 coding bits per 64 bits, which is considerably more efficient than the previously used 8b/10b encoding scheme. The hardware and firmware design of the processor card is presented here.

## 2. The hardware

The board is powered by a Kintex UltraScale FPGA that provides 20 next-generation GTH transceivers, reaching speeds of up to 16.3 Gbps. The board features state-of-the-art fibre-optic technology from Samtec. The high-performance interconnect system uses active optical engines that provide 12 full-duplex channels at data rates of up to 16 Gbps. Furthermore, 4 FPGA transceivers are routed to a QSFP28 connector, which is rated for data rates of up to 28 Gbps per channel over 4 channels. In total, the board's 16 links at 16 Gbps add up to an optical bandwidth of approximately 256 Gbps in each direction, making it a high-performance all-optical data-stream processor (Figure 1). A Xilinx Zynq System-on-Chip (SoC) device will be used as the control interface for the Kintex UltraScale FPGA. The system controller sets up or queries on-board resources, such as the power controllers and programmable clocks.



Figure 1: Altium 3D representation of the board

#### 2.1 The FPGA

The board has been designed around the XCKU040 part, a mid-range Xilinx Kintex UltraScale high-performance FPGA with a focus on price/performance ratio. It offers high DSP-to-logic and block-RAM-to-logic ratios and next-generation transceivers; combined with low-cost packaging, it provides an optimum blend of capability and cost. The part is available in an FFVA1156 package with all the high-speed MGTs placed on the left side of the part (Figure 2). The UltraScale architecture provides key innovations such as next-generation routing, ASIC-like clocking and enhanced logic blocks targeting up to 90% device utilization, as well as high-speed memory cascading to remove bottlenecks in DSP and packet processing. The board also provides up to 2 GB (16 Gbit) of DDR4 memory, implemented as four 256M x 16 devices.
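Reading the device organization "256 Mb x 16" as 256M addressable words of 16 bits each (an interpretation of the notation, as is usual for DDR4 parts), the total memory capacity of the four devices works out as:

$$
4 \times \left(256\,\mathrm{M} \times 16\,\mathrm{bit}\right) = 2^{2} \times 2^{28} \times 2^{4}\,\mathrm{bit} = 2^{34}\,\mathrm{bit} = 16\,\mathrm{Gbit} = 2\,\mathrm{GB}
$$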



Figure 2: FFVA1156 Package - XCKU040 I/O Bank Diagram

#### 2.2 High-Speed Optical Links

The data interface consists of 16 optical links operating at up to 16 Gbps, using 16 of the Multi-Gigabit Transceivers (MGTs) available on the Kintex UltraScale FPGA. Twelve of the optical links are implemented with Samtec FireFly optical modules, and the remaining 4 are routed to the QSFP28 connector. The FireFly optical flyover assembly is designed for flexibility and is interchangeable with the FireFly copper assembly using the same connector system. It is available with x12 simplex or duplex optical transceivers to achieve 16 Gbps per channel.

#### 2.3 Clocking

The board includes five low-jitter clock sources (Figure 3). The GTH transceivers connected to the high-speed FireFly modules are clocked by a dedicated low-jitter quad clock generator (Si5338). A low-jitter programmable frequency generator (Si570) is connected to the QSFP transceivers and can also be used as a secondary clock source for the FireFly transceivers. A jitter attenuator (Si5328B) is used to reduce the jitter of a recovered receive clock. A fixed-frequency clock source serves as a free-running clock for the reset and initialization FSMs. Finally, an SMA external clock input is also included. All programmable clocks are accessed through a dedicated I2C bus.
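As an illustration of what "programmable" means for a part like the Si570, its output is derived from an internal DCO through two dividers, $f_{out} = f_{DCO} / (\mathrm{HS\_DIV} \times N1)$, with $f_{DCO} = f_{XTAL} \times \mathrm{RFREQ}$. The sketch below, under the assumption that the divider sets and the 4.85 to 5.67 GHz DCO range follow the Si570 datasheet, enumerates valid divider pairs for a target frequency. The function names and the 250 MHz target are hypothetical, and the actual I2C register writes used on the board are not shown.

```python
# Hypothetical helper for choosing Si570 divider settings.
# Divider sets and DCO limits are assumed from the Si570 datasheet; verify them
# against the datasheet revision and speed grade of the part actually fitted.
HS_DIV_VALUES = (4, 5, 6, 7, 9, 11)          # allowed high-speed divider values
N1_VALUES = (1,) + tuple(range(2, 129, 2))   # N1 may be 1 or any even value up to 128
FDCO_MIN_MHZ, FDCO_MAX_MHZ = 4850.0, 5670.0  # allowed DCO frequency range


def si570_dividers(f_out_mhz: float) -> list[tuple[int, int, float]]:
    """Return every (hs_div, n1, f_dco_mhz) with f_out = f_dco / (hs_div * n1)."""
    options = []
    for hs_div in HS_DIV_VALUES:
        for n1 in N1_VALUES:
            f_dco = f_out_mhz * hs_div * n1
            if FDCO_MIN_MHZ <= f_dco <= FDCO_MAX_MHZ:
                options.append((hs_div, n1, f_dco))
    return options


def rfreq_word(f_dco_mhz: float, f_xtal_mhz: float) -> int:
    """RFREQ = f_dco / f_xtal encoded as fixed point with 28 fractional bits."""
    return round(f_dco_mhz / f_xtal_mhz * (1 << 28))


if __name__ == "__main__":
    # 114.285 MHz is only the nominal crystal frequency; in practice the exact
    # value is computed from the factory-programmed RFREQ of each device.
    for hs_div, n1, f_dco in si570_dividers(250.0):
        print(hs_div, n1, f_dco, hex(rfreq_word(f_dco, 114.285)))
```

For 250 MHz only two combinations land inside the DCO range, (HS_DIV, N1) = (5, 4) and (11, 2).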



Figure 3: Clocking

#### 2.4 PCB

The board has 32 high-speed differential pairs running at 16 Gbps. The FireFly transceivers were placed very close to the FPGA's right side, near the banks where all the MGTs are located, to simplify the board layout and enhance signal integrity (Figure 4a). Panasonic Megtron-6 was chosen as the substrate because of its excellent high-frequency performance and impedance properties [Ref3]. A ground plane has been placed between every two layers containing high-speed traces, resulting in a 16-layer stack-up. The route lengths of all high-speed differential pairs were matched using serpentine routing (Figure 4b). Finally, to avoid additional signal distortion caused by the plated-through-hole (PTH) vias, the via stubs were removed using a technique known as back-drilling.
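Length matching matters because, at 16 Gbps, one unit interval (UI) is only 62.5 ps. The sketch below converts a length mismatch into a fraction of a UI using the stripline delay $t_{pd} = \sqrt{\varepsilon_{eff}}/c$. The effective dielectric constant of 3.6 is an assumed value for a Megtron-6 stripline, not a figure from the board design, and the helper names are hypothetical.

```python
import math

C_MM_PER_PS = 0.299792458  # speed of light in vacuum, in mm per ps


def delay_ps_per_mm(dk_eff: float) -> float:
    """Stripline propagation delay for an effective dielectric constant dk_eff."""
    return math.sqrt(dk_eff) / C_MM_PER_PS


def unit_interval_ps(line_rate_gbps: float) -> float:
    """Duration of one bit (UI) in picoseconds."""
    return 1000.0 / line_rate_gbps


def mismatch_in_ui(mismatch_mm: float, dk_eff: float, line_rate_gbps: float) -> float:
    """Skew caused by a length mismatch, expressed as a fraction of one UI."""
    return mismatch_mm * delay_ps_per_mm(dk_eff) / unit_interval_ps(line_rate_gbps)


if __name__ == "__main__":
    DK_ASSUMED = 3.6  # assumed effective Dk, not a measured value for this board
    for mm in (0.1, 0.5, 1.0):
        print(f"{mm} mm -> {mismatch_in_ui(mm, DK_ASSUMED, 16.0):.3f} UI at 16 Gbps")
```

With these assumptions, 1 mm of mismatch already costs about 0.1 UI, which is why the pair lengths are tuned with serpentines.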



Figure 4: PCB details

## 3. Firmware

#### 3.1 Protocol

The 16 Gbps link firmware implements a lightweight link-layer protocol that can be used to move data point-to-point across one or more high-speed serial lanes. It supports simplex operation with continuous data transfer. The links are asynchronous, meaning that the main algorithmic logic is clocked at a lower frequency than the link clock, which allows more flexibility when choosing the logic clock. This is achieved using asynchronous FIFOs on both the transmitting and the receiving side. To compensate for the frequency difference, padding words are injected on the transmitting side and stripped on the receiving side. The link initialization and error handling are also based on the insertion and checking of these padding words. For testing purposes, the local clock runs at 240 MHz and the link clock at 250 MHz. The link uses the 64b/66b line code, which transforms 64-bit data into 66-bit blocks to provide enough state transitions for reasonable clock recovery and for alignment of the data stream at the receiver [Ref4]. The overhead of 64b/66b encoding is 2 coding bits for every 64 payload bits, or 3.125%. This is considerably more efficient than the 25% overhead of the previously used 8b/10b encoding scheme, which adds 2 coding bits to every 8 payload bits.
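The clock-domain crossing and padding described above can be sketched as a small discrete-time model of the transmit-side FIFO. This assumes one word is written per 240 MHz logic cycle and one word is read per 250 MHz link cycle, with the same word width on both sides; whenever the FIFO is empty the link sends a padding word. The helper names are hypothetical and this is not the authors' firmware.

```python
from math import gcd


def coding_overhead(payload_bits: int, coding_bits: int) -> float:
    """Coding overhead relative to the payload, as quoted in the text."""
    return coding_bits / payload_bits


def simulate_tx_padding(f_logic_mhz: int, f_link_mhz: int, duration_us: int = 1):
    """Count data and padding words leaving a transmit-side asynchronous FIFO."""
    # Common time base: one tick = 1 / lcm(f_logic, f_link) microseconds.
    ticks_per_us = f_logic_mhz * f_link_mhz // gcd(f_logic_mhz, f_link_mhz)
    write_period = ticks_per_us // f_logic_mhz  # logic-clock period in ticks
    read_period = ticks_per_us // f_link_mhz    # link-clock period in ticks
    fifo_level = data_words = padding_words = 0
    for t in range(ticks_per_us * duration_us):
        if t % write_period == 0:
            fifo_level += 1              # algorithm logic pushes one word
        if t % read_period == 0:
            if fifo_level:
                fifo_level -= 1          # link sends a data word
                data_words += 1
            else:
                padding_words += 1       # FIFO empty: inject a padding word
    return data_words, padding_words


if __name__ == "__main__":
    print(f"64b/66b: {coding_overhead(64, 2):.3%}")  # 3.125%
    print(f"8b/10b:  {coding_overhead(8, 2):.1%}")    # 25.0%
    print(simulate_tx_padding(240, 250))              # (240, 10)
```

Under these assumptions, 10 of every 250 link words (4%) are padding, and the receiver, after stripping them, delivers exactly the 240 words per microsecond that the logic produced.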

#### 3.2 Link initialization and error handling

The link bring-up and error detection are based on the generic 2-bit 64b/66b encoding header, combined with the periodic transmission of padding words and CRC blocks. Single errors are considered soft errors and are monitored with a soft error counter. Continuous errors are considered hard errors and trigger an automatic reset and re-alignment of the link. The overhead of the 64b/66b encoding is 3.125%, and the CRC/padding blocks are injected every 100 blocks, resulting in a total overhead of 4.125%. The maximum time for link (re)alignment is 200 µs.
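A generic version of header-based alignment and soft/hard error classification is sketched below, in the spirit of the block-lock procedure of IEEE 802.3 Clause 49 rather than the authors' actual firmware. The receiver slips one bit at a time until the 2-bit sync header (`01` for data, `10` for control) is valid for a run of consecutive blocks; afterwards, each bad block counts as a soft error, and a run of consecutive bad blocks is treated as a hard error that forces re-alignment. The threshold of 16 blocks, the 64-block lock window and all names are hypothetical, and the CRC check is abstracted as a boolean.

```python
import random
from dataclasses import dataclass

BLOCK_BITS = 66
VALID_HEADERS = {(0, 1), (1, 0)}  # 64b/66b sync headers: data block, control block


def headers_valid(bits, offset, n_blocks):
    """True if n_blocks consecutive 66-bit blocks from offset carry a valid header."""
    for m in range(n_blocks):
        p = offset + m * BLOCK_BITS
        if tuple(bits[p:p + 2]) not in VALID_HEADERS:
            return False
    return True


def find_block_lock(bits, n_blocks=64):
    """Slip one bit at a time until the sync headers line up; None if no lock."""
    for offset in range(BLOCK_BITS):
        if headers_valid(bits, offset, n_blocks):
            return offset
    return None


@dataclass
class LinkMonitor:
    hard_error_threshold: int = 16  # hypothetical: consecutive bad blocks before reset
    soft_errors: int = 0
    consecutive_errors: int = 0
    resets: int = 0
    aligned: bool = True

    def on_block(self, header, crc_ok=True):
        if tuple(header) in VALID_HEADERS and crc_ok:
            self.consecutive_errors = 0
            return
        self.soft_errors += 1           # every bad block counts as a soft error
        self.consecutive_errors += 1
        if self.consecutive_errors >= self.hard_error_threshold:
            self.resets += 1            # hard error: reset and re-align the link
            self.consecutive_errors = 0
            self.aligned = False


def make_test_stream(n_blocks, offset, seed=0):
    """Random 66-bit blocks with valid headers, preceded by `offset` junk bits."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(offset)]
    for _ in range(n_blocks):
        bits += rng.choice([[0, 1], [1, 0]])
        bits += [rng.randint(0, 1) for _ in range(64)]
    return bits


if __name__ == "__main__":
    print(find_block_lock(make_test_stream(80, 17)))  # 17
```

Requiring many consecutive valid headers makes a false lock on random payload bits very unlikely, since each wrong offset passes a single block check with probability of about one half.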

#### 3.3 Testing

The functionality of the links was extensively tested using the KCU105 UltraScale development board. For the tests, an FMC loopback card was used to implement a copper loopback quad link.



Figure 5: Link alignment and error handling block diagram

Latency: The latency of the GTH transceiver was measured at 9 clock cycles. Adding the latency of the two FIFOs used to cross between clock domains and of the error-checking logic, the total link latency adds up to 23 clock cycles.

Bit Error Rate Tests: BER tests were performed by sending PRBS-31 data over an FMC copper loopback card. The links ran for more than 72 hours without errors, resulting in a BER < $10^{-16}$.
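A BER test of this kind can be sketched in three parts: a PRBS generator (here a Fibonacci LFSR for the standard PRBS-31 polynomial $x^{31} + x^{28} + 1$), a checker that compares received bits against a local reference, and the zero-error upper bound $\mathrm{BER} < -\ln(1 - \mathrm{CL})/N$ after $N$ error-free bits. The helper names are hypothetical, and the example numbers assume four lanes (the quad loopback) running at 16 Gbps for 72 hours; the paper does not state the exact rate or lane count used for the BER run.

```python
import math
from itertools import islice


def prbs(order, tap, seed=1):
    """Fibonacci LFSR for x^order + x^tap + 1 (PRBS-31: order=31, tap=28)."""
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("LFSR seed must be non-zero")
    while True:
        bit = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
        state = ((state << 1) | bit) & mask
        yield bit


def count_bit_errors(rx_bits, order, tap, seed=1):
    """Compare received bits with a locally generated, aligned reference."""
    ref = prbs(order, tap, seed)
    return sum(rx ^ next(ref) for rx in rx_bits)


def ber_upper_bound(n_bits, confidence=0.95):
    """Upper BER bound at the given confidence level after zero observed errors."""
    return -math.log(1.0 - confidence) / n_bits


if __name__ == "__main__":
    tx = list(islice(prbs(31, 28), 10_000))
    rx = tx.copy()
    rx[1234] ^= 1                              # inject one bit error
    print(count_bit_errors(rx, 31, 28))        # 1
    n_bits = 4 * 16e9 * 72 * 3600              # assumed: 4 lanes x 16 Gbps x 72 h
    print(f"1/N = {1 / n_bits:.2e}, 95% CL bound = {ber_upper_bound(n_bits):.2e}")
```

With these assumed numbers, $N \approx 1.66 \times 10^{16}$ bits, so $1/N \approx 6 \times 10^{-17}$, while the 95% confidence-level bound $-\ln(0.05)/N$ is about $1.8 \times 10^{-16}$.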

## 4. Conclusions

## References

CMS Collaboration, *The Phase-2 Upgrade of the CMS Muon Detectors*, Tech. Rep. CERN-LHCC-2017-012, CMS-TDR-016, CERN, Geneva, September 2017.