# The GBT Project

P. Moreira<sup>a</sup>, R. Ballabriga<sup>a</sup>, S. Baron<sup>a</sup>, S. Bonacini<sup>a</sup>, O. Cobanoglu<sup>a</sup>, F. Faccio<sup>a</sup>, T. Fedorov<sup>b</sup>, R. Francisco<sup>a</sup>, P. Gui<sup>b</sup>, P. Hartin<sup>b</sup>, K. Kloukinas<sup>a</sup>, X. Llopart<sup>a</sup>, A. Marchioro<sup>a</sup>, C. Paillard<sup>a</sup>, N. Pinilla<sup>b</sup>, K. Wyllie<sup>a</sup> and B. Yu<sup>b</sup>

<sup>a</sup> CERN, 1211 Geneva 23, Switzerland <sup>b</sup> SMU, Dallas TX 75275-0338, USA

Paulo.Moreira@cern.ch

### Abstract

The GigaBit Transceiver (GBT) architecture and transmission protocol have been proposed for data transmission in the physics experiments of the SLHC, the future upgrade of the LHC accelerator. Due to the high beam luminosity planned for the SLHC, the experiments will require high-data-rate links and electronic components capable of sustaining high radiation doses. The GBT ASICs address this issue by implementing a radiation-hard bidirectional 4.8 Gb/s optical fibre link between the counting room and the experiments. The paper describes the GBT-SERDES architecture in detail and presents an overview of the various components that constitute the GBT chipset.

#### I. RADIATION HARD OPTICAL LINK ARCHITECTURE

The goal of the GBT project is to produce the electrical components of a radiation hard optical link, as shown in **Figure 1**. One half of the system resides on the detector and hence in a radiation environment, therefore requiring custom electronics. The other half of the system is free from radiation and can use commercially-available components. Optical data transmission is via a system of opto-electronic components produced by the Versatile Link project, described elsewhere in these proceedings [1]. The architecture combines timing and trigger signals, detector data and slow controls into one physical link, hence providing an economical solution for all data transmission in a particle physics experiment.



Figure 1 Radiation-hard optical link architecture

The on-detector part of the system consists of the following components.

**GBTX:** a serializer-de-serializer chip receiving and transmitting serial data at 4.8 Gb/s [2]. It encodes and decodes the data into the GBT protocol and provides the interface to the detector front-end electronics. Some of the implementation aspects of this ASIC will be the subject of the following sections.

**GBTIA:** a trans-impedance amplifier receiving the 4.8 Gb/s serial input data from a photodiode [3]. This device was specially designed to cope with the performance degradation of PIN diodes under radiation. In particular, the GBTIA can handle very large photodiode leakage currents (a condition typical of PIN diodes subjected to high radiation doses [1]) with only a moderate degradation of sensitivity. The device integrates the transimpedance pre-amplifier, limiting amplifier and  $50 \Omega$  line driver on the same die. The GBTIA was fabricated and tested for performance and radiation tolerance with excellent results. A complete description of the circuit and tests can be found in [3] in these proceedings.

**GBLD:** a laser-driver ASIC to modulate 4.8 Gb/s serial data on a laser [4]. At present it is not yet clear which type of laser diode, edge-emitter or VCSEL, will offer the best tolerance to radiation [1]. The GBLD was thus conceived to drive both types of lasers. These devices have very different characteristics: edge-emitters require high modulation and bias currents, while VCSELs need low bias and modulation currents. The GBLD is therefore a programmable device that can handle both types. Additionally, the GBLD implements programmable pre- and de-emphasis equalization, a feature that allows it to be optimised for different laser responses. The GBLD has been prototyped and is functional, but it displays a limited bandwidth and therefore requires a small re-design to correct for under-estimated parasitic effects in the layout. Reference [4] in these proceedings describes the laser driver circuits and discusses the experimental results.

**GBT-SCA:** a chip to provide the slow-controls interface to the front-end electronics. This device is optional in the GBT system. Its main functions are to interface the GBT to the control buses most commonly used in High Energy Physics (HEP) and to monitor detector environmental quantities such as temperatures and voltages. The device is still at an early phase of specification, and a discussion of its architecture can be found in reference [5] in these proceedings.

The off-detector part of the GBT system consists of a Field-Programmable-Gate-Array (FPGA), programmed to be compatible with the GBT protocol and to provide the interface to off-detector systems.

To implement reliable links, the on-detector components have to be tolerant to total radiation dose and to single event effects (SEE), for example transient pulses in the photodiodes and bit flips in the digital logic [6]. The chips will therefore be implemented in a commercial 130 nm CMOS technology to benefit from its inherent resistance to ionising radiation. Tolerance to SEE is achieved by triple modular redundancy (TMR) and other architectural choices described later in this paper. One such measure is forward error correction (FEC), where the data is transmitted together with a Reed-Solomon code that allows both error detection and correction in the receiver [2], [7]. The format of the GBT data packet is shown in **Figure 2**. A fixed 4-bit header (H) is followed by 4 bits of slow control data (SC), 80 bits of user data (D) and a 32-bit Reed-Solomon FEC code, giving a 120-bit frame. The coding efficiency is therefore 88/120 = 73%, and the available user bandwidth is 80/120 × 4.8 Gb/s = 3.2 Gb/s.



Figure 2 GBT frame format

FPGA designs have been successfully implemented in both Altera and Xilinx devices, and reference firmware is available to users. Details on the FPGA design can be found in reference [8] in these proceedings.

#### II. THE GBTX PROTOTYPE: GBT-SERDES

The GBTX will be based on a 4.8 Gb/s Serializer-Deserializer (SERDES) circuit which will convert the input data received from the front-end electronics into a serial stream with the GBT format and will de-serialize the GBT frame transmitted from the counting room and feed the data to the front-end electronics.

From the point of view of manufacturability this circuit requires careful study and planning since it operates at high frequency with tight timing margins. Total dose radiation tolerance and robustness to Single Event Upsets (SEU) are major design requirements. They call for the use of circuits that have speed and power penalties when compared with those commonly used in engineering projects that target the consumer markets. An additional constraint that is specific to HEP applications is the requirement of predictable and constant latency links. To study the feasibility of a SERDES circuit that can handle all of these constraints in a commercial 130 nm CMOS technology, a prototype (the GBT-SERDES) is currently under development.



Figure 3 GBT-SERDES architecture

The architecture of the GBT-SERDES is shown in Figure 3. It is broadly composed of a transmitter (TX) and a receiver (RX) section. The TX receives parallel data through the Parallel Input (Parallel In) interface. The parallel data is then scrambled and Reed-Solomon encoded before it is fed to the Serializer (SER), where it is converted into a 4.8 Gb/s serial stream with the frame format described above. On the RX side, after serial-to-parallel conversion in the De-serializer (DES), the data is fed to the frame aligner, then Reed-Solomon decoded and descrambled before it is sent to the external parallel bus through the parallel output interface. The Reed-Solomon encoding/decoding and scrambling/descrambling procedures used in this implementation have already been discussed in detail in references [2] and [7] and will not be reviewed in this work. To save cost in the prototype, time-division multiplexed buses were adopted for the parallel input and output; since the ASIC is pad limited, this significantly reduces the silicon area required to fabricate it.
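The ordering of the data-path stages, with the RX undoing the TX steps in reverse order, can be illustrated with a minimal Python sketch. It assumes a self-synchronizing scrambler with hypothetical taps (x^7 + x^6 + 1) and uses identity placeholders for the Reed-Solomon stages; the actual GBT scrambler and FEC are defined in [2] and [7] and are not reproduced here.

```python
# Sketch of the TX/RX data-path ordering.
# ASSUMPTIONS: the scrambler taps are hypothetical, the Reed-Solomon stages are
# identity placeholders and frame alignment is not modelled.
TAPS = (6, 7)      # hypothetical feedback taps (delays in bits)
DEPTH = max(TAPS)


def scramble(bits):
    """Self-synchronizing scrambler: s[n] = d[n] ^ s[n-6] ^ s[n-7]."""
    history = [0] * DEPTH  # history[k] holds s[n-1-k]
    out = []
    for d in bits:
        s = d ^ history[TAPS[0] - 1] ^ history[TAPS[1] - 1]
        out.append(s)
        history = [s] + history[:-1]
    return out


def descramble(bits):
    """Inverse: d[n] = s[n] ^ s[n-6] ^ s[n-7], fed back from the received bits."""
    history = [0] * DEPTH
    out = []
    for s in bits:
        out.append(s ^ history[TAPS[0] - 1] ^ history[TAPS[1] - 1])
        history = [s] + history[:-1]
    return out


def rs_encode(bits):  # placeholder for the Reed-Solomon encoder
    return list(bits)


def rs_decode(bits):  # placeholder for the Reed-Solomon decoder
    return list(bits)


def tx_path(data_bits):
    # Parallel In -> scrambler -> RS encoder -> (serializer)
    return rs_encode(scramble(data_bits))


def rx_path(received_bits):
    # (de-serializer, frame aligner) -> RS decoder -> descrambler
    return descramble(rs_decode(received_bits))
```

Because the descrambler feeds back the received bits rather than its own output, it needs no separate synchronization: a bit error corrupts only a few output bits before the history window flushes it out.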

Switches have been inserted between the functional blocks of the receiver and transmitter data paths. They allow the data to be routed, at different depths along the data path, either from the RX into the TX or from the TX into the RX. This functionality can be used for evaluation testing of the ASIC, but its main aim is to provide a link diagnostics tool for field tests of the optical links that will use the GBTX. A further self-test feature is a Pseudo Random Bit Sequence (PRBS) generator in the TX, which can also be programmed to produce constant data or a simple bit count. As shown in Figure 3, only the performance-critical blocks (shaded regions) are implemented using full-custom design techniques, while the remaining circuits are based on the standard library cells provided by the foundry.
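The three test-pattern modes of the TX generator can be sketched in Python as follows. The text does not state which PRBS polynomial the GBT-SERDES uses; the common PRBS7 (x^7 + x^6 + 1) is assumed here, and the function names, word width and constant value are hypothetical.

```python
from itertools import islice


def prbs7(seed=0x7F):
    """PRBS7 (x^7 + x^6 + 1) as a 7-bit Fibonacci LFSR, one output bit per step."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("LFSR seed must be non-zero")
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def pattern_words(mode, n, width=8, constant=0xA5):
    """Hypothetical TX test-pattern generator: 'prbs', 'constant' or 'count'."""
    mask = (1 << width) - 1
    if mode == "prbs":
        bits = list(islice(prbs7(), n * width))
        words = []
        for i in range(n):
            word = 0
            for b in bits[i * width:(i + 1) * width]:
                word = (word << 1) | b  # pack PRBS bits MSB first
            words.append(word)
        return words
    if mode == "constant":
        return [constant & mask] * n
    if mode == "count":
        return [i & mask for i in range(n)]
    raise ValueError(f"unknown mode: {mode!r}")
```

A maximal-length 7-bit LFSR repeats every 127 bits, which gives the receiver a known sequence against which to count bit errors.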

The full custom circuits include the Serializer (SER), the de-serializer (DES) with its Clock and Data Recovery (CDR) circuit, the Clock Generator (CG) and the Phase Shifter (PS). The serializer circuit is described in detail elsewhere in these proceedings [9] and consequently will not be described here.

**De-serializer:** The de-serializer block diagram is shown in Figure 4. Its main features are a Half-rate Phase/Frequency Detector (HPFD), frequency-aided lock acquisition and a constant-latency "barrel-shifter".



Figure 4 De-serializer architecture

**CDR:** A half-rate Alexander Phase/Frequency Detector (HPFD) is used in the GBT-SERDES since it allows the CDR PLL to operate at a lower frequency and hence gives safer timing margins in the de-serializer circuit. Although the HPFD is of the bang-bang type, it is well suited to scrambled data, since phase-error information is only produced when data transitions are present in the incoming serial stream. Although the phase detector also detects frequency, its detection range is insufficient to cover all process, voltage and temperature variations. To ensure that the CDR can always lock to the data, it is thus necessary to pre-calibrate the VCO "free-running" oscillation frequency. For that, the VCO has two control inputs: a coarse control input that allows the VCO oscillation frequency to be centred, and a fine control input that is driven by the HPFD and allows the CDR circuit to lock to the serial data.

The ASIC provides two alternative ways to centre the VCO free-running oscillation frequency. In the first method, a 9-bit voltage DAC (not shown in Figure 4) controls the coarse input of the VCO. The calibration procedure is then as follows. First, the VCO oscillation frequency is compared with the reference clock frequency, and the coarse control voltage that gives the smallest frequency error is searched for. When that operation is complete, control is passed to the HPFD, which pulls the VCO frequency to the data frequency and finally locks to the phase of the incoming serial stream.

In the second method, the CDR VCO coarse control voltage is derived from that of a reference PLL locked to the reference clock (see Figure 4). The VCOs in both PLLs are replicas of each other, so that for the same control voltage they should have the same oscillation frequency. Due to statistical variations in the fabrication process this is not exact, leading to a slight difference between the VCO frequencies. However, the CDR VCO fine control voltage is under the control of the CDR loop and, thanks to the frequency-detecting ability of the HPFD, the loop is able to pull the CDR VCO to the frequency of the incoming serial data.
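Two pieces of this description lend themselves to a short Python sketch: the bang-bang (Alexander) phase decision, which yields information only on data transitions, and the DAC-based coarse calibration. In a half-rate implementation the samples come from both edges of half-rate clocks, but the decision rule is the same. The exhaustive search and the linear VCO model `toy_vco` are assumptions for illustration; the text only says a search for the smallest frequency error is performed.

```python
def alexander_decision(a, t, b):
    """Bang-bang (Alexander) phase decision for one bit interval.

    a: sample of the previous bit, t: sample taken at the expected bit
    boundary, b: sample of the current bit.
    Returns +1 if the clock is late, -1 if it is early, 0 if no information.
    """
    if a == b:
        return 0                  # no transition: no phase information
    return +1 if t == b else -1   # boundary sample already shows the new bit -> late


def calibrate_coarse(vco_freq_hz, f_target_hz, dac_bits=9):
    """Coarse DAC code whose free-running VCO frequency is closest to the target.

    vco_freq_hz(code) stands in for the on-chip comparison of the VCO against
    the reference clock; all 2**dac_bits codes are tried.
    """
    return min(range(1 << dac_bits),
               key=lambda code: abs(vco_freq_hz(code) - f_target_hz))


def toy_vco(code):
    """Hypothetical linear VCO model: 2.0 GHz at code 0, +1.6 MHz per LSB."""
    return 2.0e9 + code * 1.6e6
```

With this toy model, `calibrate_coarse(toy_vco, 2.4e9)` selects code 250, after which the fine loop driven by `alexander_decision` would take over.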

**Barrel-shifter:** Since a half-rate phase detector is used, there is a 180° ambiguity in the phase of the VCO clock relative to the incoming data. This ambiguity is non-deterministic and varies randomly every time the CDR circuit is started. Moreover, since the word clock (40 MHz) is generated by frequency division of the VCO clock (2.4 GHz), its phase is random in relation to the start of the frame (i.e. the frame header) and consequently to the LHC bunch-crossing clock. The receiver must thus find the frame boundaries in order to interpret the incoming data correctly. In de-serializers this function is commonly implemented by a barrel-shifter, which searches for the position of the frame header in a shift register; once the header is found, the following bits in the shift register are taken to be the data. In other words, the serial data is shifted until the frame header aligns with the word clock. This method, however, has the disadvantage of a non-predictable latency: every time the system is restarted, the phase of the word clock is random in relation to the frame header. To avoid this problem and guarantee fixed latency, a novel "barrel-shifter" principle is used in the GBT-SERDES: here the clock, instead of the data, is shifted until the frame header is found in a definite position in the shift register. This guarantees that the clock is always aligned with the frame header. To search for the frame header, the clock is phase-advanced by one VCO clock cycle at a time; this is done by forcing the frequency-divider counter to skip a count cycle every time the clock phase needs to be advanced. Even when the frame header has been found in the correct position, there is still an uncertainty of half a clock cycle, which is intrinsic to the use of the half-rate phase detector.

This final ambiguity is resolved by the header detection circuit together with the codes chosen for the header, which can detect whether the VCO clock is in phase or in anti-phase with the header. Once this phase relationship has been determined, an extra phase shift of half a clock cycle is applied if necessary to align the word clock with the beginning of the frame header, thus ensuring the predictable and fixed latency required for trigger links in HEP applications.
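The clock-side alignment can be illustrated with a behavioural Python model. This is a simplified sketch under stated assumptions: the word clock is the wrap of a divide-by-60 counter (2.4 GHz / 40 MHz), the header position is checked once per word-clock edge, and the header position is given in bits, with two bits per VCO cycle. The function and parameter names are hypothetical.

```python
VCO_PER_WORD = 60  # 2.4 GHz VCO / 40 MHz word clock


def align_word_clock(header_bit, counter_start):
    """Behavioural model of clock-side frame alignment.

    header_bit:    position (0..119) of the frame header within a frame period,
                   counted in bits; two bits arrive per VCO cycle (half rate).
    counter_start: random initial state of the divide-by-60 counter.
    Returns (skips, edge_cycle, half_shift): the number of skipped counts, the
    VCO cycle of the aligned word-clock edge, and whether an extra half-cycle
    shift is needed (header in anti-phase with the VCO clock).
    """
    header_cycle, odd = divmod(header_bit, 2)
    counter, skips, t = counter_start % VCO_PER_WORD, 0, 0
    while True:
        if counter == 0:  # word-clock edge: look for the header
            if t % VCO_PER_WORD == header_cycle:
                return skips, t, bool(odd)
            counter = 2   # skip a count: next edge arrives one VCO cycle earlier
            skips += 1
        else:
            counter = (counter + 1) % VCO_PER_WORD
        t += 1
```

Whatever the random `counter_start`, the aligned edge always lands at the same position relative to the frame (`edge_cycle % 60 == header_bit // 2`); only the number of skips varies. This is the property that makes the latency fixed.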

**Phase shifter:** The purpose of the phase shifter is to generate multiple clocks as local timing references that are synchronous with the accelerator clock. The frequency and phase of the output clocks are digitally programmable. The output clock frequency can be 40 MHz, 80 MHz or 160 MHz, and the phase resolution is 50 ps independent of the frequency.

To handle multiple output frequencies and a phase resolution of 50 ps over a range of 25 ns (for the 40 MHz clock), the phase shifter consists of three components: a PLL, Coarse De-skewing Logic (CDL) and Fine De-skewing Logic (FDL). Figure 5 depicts the overall block diagram.



Figure 5 The block diagram of the phase shifter

From the 40 MHz accelerator reference, the PLL generates the FastClk of 1.28 GHz (with a period of 781 ps) for both the CDL and FDL blocks. The divider in the PLL is made of a 5-bit binary counter whose outputs are used by the CDL to produce the right output clock frequency. Since the output clocks are synchronized with FastClk, the PLL guarantees the synchronization of the output clocks with the machine reference clock.
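The text states that the CDL uses the outputs of the 5-bit divider counter to produce the selected output frequency. One plausible reading, assumed in this Python sketch, is that bit k of a binary counter clocked at FastClk is a square wave at FastClk/2^(k+1), so bits 4, 3 and 2 give 40, 80 and 160 MHz. The `FREQ_SELECT` mapping is hypothetical, not taken from the chip design.

```python
FASTCLK_HZ = 1.28e9   # 32 x 40 MHz
COUNTER_BITS = 5      # 5-bit divider counter in the PLL
PERIOD = 1 << COUNTER_BITS


def counter_bit_waveform(bit):
    """Value of counter output `bit` over one full 32-cycle counter period."""
    return [(n >> bit) & 1 for n in range(PERIOD)]


def output_freq_hz(bit):
    """Frequency of a counter output, from its rising edges per counter period."""
    wave = counter_bit_waveform(bit)
    # wave[i - 1] with i == 0 wraps to the last sample: the counter is cyclic.
    rising = sum(1 for i in range(PERIOD) if wave[i - 1] == 0 and wave[i] == 1)
    return FASTCLK_HZ * rising / PERIOD


# Hypothetical mapping: output frequency -> counter bit the CDL would select.
FREQ_SELECT = {40e6: 4, 80e6: 3, 160e6: 2}
```

Because every counter output toggles on FastClk edges, the selected clock stays synchronous with FastClk and hence with the 40 MHz machine reference, as the text notes.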

In addition to performing frequency selection, the CDL shifts the clock by multiple periods of the FastClk according to the most significant bits of the control word (Delay[8:4] in Figure 5). The output of the CDL block is therefore a clock of the specified frequency with its phase shifted by multiples of 781 ps.

The FDL de-skews the clock by a fraction of 781 ps (one FastClk period). It is based on a modified DLL structure with a 16-stage voltage-controlled delay line (VCDL). The 16 delay stages allow the clock to be shifted in steps of 1/16 of a FastClk period, giving the 50 ps delay resolution. This is achieved by feeding the CDL clock to the VCDL and connecting a version of the CDL clock delayed by one FastClk cycle to one input of the phase detector (PD); the other PD input is the VCDL output. This architecture locks the delay through the VCDL to exactly one FastClk period (781 ps), so the delay through each stage is 781 ps/16 ≈ 48.8 ps, i.e. the nominal 50 ps. A 16:1 multiplexer selects the appropriate delay-stage output according to the FDL control word (Delay[3:0]).
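Combining the coarse (Delay[8:4]) and fine (Delay[3:0]) fields, the nominal phase shift selected by the 9-bit control word can be sketched in Python as below. This is an idealised model: it assumes perfectly matched VCDL stages and ignores any offsets of the real circuit.

```python
FASTCLK_PERIOD_PS = 1e12 / 1.28e9  # 781.25 ps
FINE_STEPS = 16                    # VCDL stages in the FDL


def phase_shift_ps(delay_word):
    """Ideal phase shift for the 9-bit control word Delay[8:0].

    Delay[8:4] selects whole FastClk periods in the CDL; Delay[3:0] selects
    one of the 16 VCDL taps in the FDL (1/16 of a FastClk period each).
    """
    if not 0 <= delay_word < 1 << 9:
        raise ValueError("Delay is a 9-bit word")
    coarse = (delay_word >> 4) & 0x1F   # CDL: multiples of 781.25 ps
    fine = delay_word & 0xF             # FDL: multiples of 48.828125 ps
    return coarse * FASTCLK_PERIOD_PS + fine * FASTCLK_PERIOD_PS / FINE_STEPS
```

The 512 codes span 32 × 781.25 ps = 25 ns, the full period of the 40 MHz clock, in steps of about 48.8 ps. For the 80 MHz and 160 MHz outputs, only shifts within one output period (12.5 ns and 6.25 ns) are distinct in this model.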

To generate multiple clock outputs simultaneously with this architecture, the CDL and FDL can be replicated while a single PLL is shared among the channels. In the first version of the GBT chip, three phase-shifting channels are implemented.

**C4 PACKAGE:** The GBT-SERDES, and even more so the future GBTX, are heavily pad-limited ASICs. A wire-bond packaging technique would result in a large silicon area and thus a high silicon cost. C4 (flip-chip) packages and the associated ASIC design techniques allow the I/O to be distributed over the full area of the ASIC and therefore reduce the wasted silicon area in pad-limited designs. C4 packages are always custom made and thus incur development costs; however, in the case of the GBT-SERDES, the cost balance is in favour of a C4 package.

Due to the absence of bond wires, C4 packages exhibit very low parasitic inductances in the chip-to-package interconnect. Moreover, since they are fabricated with technologies very similar to those used for PCBs, controlled-impedance transmission lines can be designed directly in the package to optimize the high-speed connections. Considering both the economic and electrical advantages of a C4 package, it was decided to package the GBT-SERDES in a  $13 \times 13$  bump-pad C4 package.

#### III. STATUS AND FUTURE DEVELOPMENTS

The GBT-SERDES is expected in early 2010 and will then undergo tests, including an irradiation programme. These will verify the functionality of the serializer and de-serializer blocks, which will then be incorporated into the final GBTX design. The GBTX will contain a more sophisticated digital interface for coupling to the front-end systems, as illustrated in **Figure 6** and **Figure 7**. The interface will be configurable so that the user can select an appropriate mode to input and output the 80 bits of data per frame. Parallel mode (**Figure 6**) uses a 40-bit bidirectional double-data-rate bus running at the system frequency; the user can also split this into 5 independent 8-bit buses. An alternative configuration uses serial data transport, known as E-link mode (**Figure 7**). The interface can provide 40, 20 or 10 bidirectional serial links running at 80 Mb/s, 160 Mb/s and 320 Mb/s respectively. Each port transmits and receives the serial data and clock using the Scalable Low Voltage Signalling (SLVS) standard. The E-link port is being implemented as a portable design macro that can easily be incorporated into the design of a front-end chip. More details on the E-link and SLVS can be found in [11]. One E-port can be dedicated to communication with the GBT-SCA chip (although other uses are not precluded). This will provide an interface between the GBT protocol and standards such as I2C and JTAG [5].
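A quick Python check shows that every interface option carries the same 80 user bits per frame. The sketch assumes that the "system frequency" is the 40 MHz bunch-crossing frequency and that the 8-bit split buses, being a partition of the same 40-bit bus, also run at double data rate.

```python
SYSTEM_FREQ_HZ = 40e6   # ASSUMPTION: system frequency = 40 MHz bunch crossing
USER_BITS_PER_FRAME = 80

# E-link options from the text: number of links -> bit rate per link.
E_LINK_OPTIONS = {40: 80e6, 20: 160e6, 10: 320e6}


def elink_bits_per_frame(links, rate_hz):
    """User bits carried per frame when `links` e-links each run at rate_hz."""
    return links * rate_hz / SYSTEM_FREQ_HZ


def parallel_bits_per_frame(bus_width=40, ddr=True):
    """Bits per frame on a parallel bus clocked at the system frequency."""
    return bus_width * (2 if ddr else 1)


for links, rate in E_LINK_OPTIONS.items():
    print(f"{links} x {rate / 1e6:.0f} Mb/s -> "
          f"{elink_bits_per_frame(links, rate):.0f} bits/frame")
print("40-bit DDR bus ->", parallel_bits_per_frame(), "bits/frame")
print("5 x 8-bit DDR buses ->", 5 * parallel_bits_per_frame(bus_width=8), "bits/frame")
```

In every case the aggregate rate is 3.2 Gb/s, matching the user bandwidth of the GBT frame, so the choice of mode trades link count against per-link speed without changing throughput.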



Figure 6 Parallel interface mode



Figure 7 E-Link interface mode

The user will be able to operate the GBTX in one of three configurations. In the transceiver configuration, the chip will handle fully bidirectional data, receive its configuration from the link and act as a clock source for the on-detector system. In the simplex receiver configuration, the chip will receive data from the off-detector system and its transmission functions are disabled. The GBTX will still provide the clock and can still be configured via the link, but its status will have to be read via a secondary link. In the simplex transmitter configuration, the GBTX transmits data from the detector and its receiver functions are disabled. The chip will therefore require an external clock and configuration link, both of which can be provided by, for example, another GBTX in the transceiver configuration. These different configuration possibilities allow the user to optimise the GBT for their particular system.
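The three configurations can be summarised as a small Python data model. This is purely illustrative: the class, field and function names are hypothetical and do not correspond to any GBTX register map; the values simply restate the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GbtxConfiguration:
    """One GBTX operating configuration, as summarised in the text."""
    name: str
    transmitter_enabled: bool
    receiver_enabled: bool
    provides_clock: bool       # acts as clock source for the on-detector system
    configured_via_link: bool  # receives its configuration over its own optical link


CONFIGURATIONS = (
    GbtxConfiguration("transceiver", True, True, True, True),
    GbtxConfiguration("simplex receiver", False, True, True, True),
    GbtxConfiguration("simplex transmitter", True, False, False, False),
)


def extra_connections(cfg):
    """Connections a configuration needs besides its own optical link."""
    needs = []
    if not cfg.provides_clock:
        needs.append("external clock")
    if not cfg.configured_via_link:
        needs.append("external configuration link")
    if not cfg.transmitter_enabled:
        needs.append("secondary link for status read-back")
    return needs
```

For the simplex transmitter, the two extra connections are exactly what another GBTX in the transceiver configuration can supply.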

#### IV. CONCLUSIONS

The GBT project is now at the prototyping stage for all components in the chipset. Measurements of the prototype GBTIA and GBLD indicate that functionality has been achieved, but some corrections are required in the case of the GBLD. The GBT-SERDES, incorporating the serializer and de-serializer blocks, has been designed with special measures to enhance radiation tolerance and will be submitted for fabrication in November 2009. Results are expected in early 2010 when the design of the final GBTX chip will start.

#### V. REFERENCES

- [1] J. Troska et al., 'The Versatile Transceiver Proof of Concept', these proceedings
- [2] P. Moreira et al., 'The GBT, a Proposed Architecture for Multi-Gb/s Data Transmission in High Energy Physics', Topical Workshop on Electronics for Particle Physics, Prague, Czech Republic, 3-7 Sept. 2007, pp. 332-336
- [3] M. Menouni et al., 'The GBTIA, a 5 Gbit/s radiation-hard optical receiver for the SLHC upgrades', these proceedings
- [4] G. Mazza et al., 'A 5 Gb/s Radiation Tolerant Laser Driver in 0.13 um CMOS technology', these proceedings
- [5] A. Gabrielli et al., 'The GBT-SCA, a radiation tolerant ASIC for detector control applications in SLHC experiments', these proceedings
- [6] A. Pacheco et al., 'Single-Event Upsets in Photoreceivers for Multi-Gb/s Data Transmission', IEEE Transactions on Nuclear Science, vol. 56, no. 4, pt. 2, Aug. 2009, pp. 1978-1986
- [7] G. Papotti et al., 'An Error-Correcting Line Code for a HEP Rad-Hard Multi-GigaBit Optical Link', Proceedings of the 12<sup>th</sup> Workshop on Electronics for LHC and Future Experiments, Valencia, Spain, 25-29 Sept 2006, CERN-LHCC-2007-006
- [8] F. Marin et al., 'Implementing the GBT data transmission protocol in FPGAs', these proceedings
- [9] O. Cobanoglu et al., 'A Radiation Tolerant 4.8 Gb/s Serializer for the Giga-Bit Transceiver', these proceedings
- [10] B. Razavi, 'Challenges in the Design of High-Speed Clock and Data Recovery Circuits', IEEE Communications Magazine, August 2002, pp. 94-101
- [11] S. Bonacini et al., 'e-link: A Radiation-Hard Low-Power Electrical Link for Chip-to-Chip Communication', these proceedings