<img src="images/strathsdr_banner.png" align="left" >

# Overview of the RFSoC Architecture <a class="anchor" id="overview_of_the_rfsoc_architecture"></a>

----

Throughout this notebook and beyond, we will explore the Xilinx Zynq RFSoC architecture, RF Data Converters (RF DCs), and development tools. A wide variety of RFSoC devices is available, so we will pay particular attention to the ZU28DR RFSoC device, which is used on several RFSoC development platforms. By the end of this notebook series you will have explored many of the RFSoC's capabilities and be more 'in-tune' with its processing features. These notebooks will enable you to start your RFSoC development adventure.

## Table of Contents
* [Overview of the RFSoC Architecture](#overview_of_the_rfsoc_architecture)
    * [Generations of RFSoC](#generations_of_rfsoc)
        * [The First in a Generation](#the_first_in_a_generation)
        * [The RFSoC Device Family](#the_rfsoc_device_family)
    * [The Processing System](#the_processing_system)
        * [The Application Processor](#the_application_processor)
        * [The Real-Time Processor](#the_real_time_processor)
        * [Platform Management and Security](#platform_management_and_security)
    * [The Programmable Logic](#the_programmable_logic)
        * [The Logic Fabric](#the_logic_fabric)
        * [Special Processing Resources](#special_processing_resources)
        * [The RF Interface](#the_rf_interface)
        * [Super Sample Rate](#super_sample_rate)
    * [The RF Data Converters](#the_rf_data_converters)
        * [The RF ADC Hierarchy](#the_rf_adc_hierarchy)
        * [The RF DAC Hierarchy](#the_rf_dac_hierarchy)
    * [Conclusion](#conclusion)
    * [References](#references)
    
## Revision History
* **v1.0** | 27/02/2021 | First RFSoC overview notebook
    

----

## Generations of RFSoC <a class="anchor" id="generations_of_rfsoc"></a>
The Zynq System-on-Chip (SoC) [[1]](#ref-1) and the Zynq Multiprocessor System-on-Chip (MPSoC) [[2]](#ref-2) each have several chip variations consisting of different families and features. The Zynq RFSoC is similar to its predecessors. In particular, it has 'device generations', with functionality and capabilities progressing from one generation to the next. To keep things simple, this notebook series only describes generation one (commonly referred to as Gen-1) RFSoC devices in its examples and illustrations. Keep this in mind as you progress through this material.

### The First in a Generation <a class="anchor" id="the_first_in_a_generation"></a>
The Xilinx RFSoC is a single-chip solution for applications demanding high sample rate processing. The Gen-1 RFSoC devices can be considered as a combination of the Zynq MPSoC and several channels of GHz RF samplers. A diagram illustrating the architecture of a typical RFSoC device [[5]](#ref-5) can be seen in [Figure 1](#fig-1).

<a class="anchor" id="fig-1"></a>
<figure>
<img src='./images/rfsoc_architecture_overview.svg' width='80%'/>
    <figcaption><b>Figure 1: High-level overview of the RFSoC ZU28DR device architecture.</b></figcaption>
</figure>

Below is a list of major RFSoC components that are housed entirely on one RFSoC chip:

* A Processing System (PS).
* Programmable Logic (PL) containing Field Programmable Gate Array (FPGA) logic fabric.
* 8 or 16 Channels of RF Samplers.
* Soft Decision Forward Error Correction (SD-FEC) Blocks.

The RFSoC's PS contains an Application Processing Unit (APU), a Real-Time Processing Unit (RPU), and a set of Platform Management and Security Configuration processors. The RFSoC's PL is host to FPGA logic fabric that can provide hardware acceleration for computationally intensive arithmetic. The FPGA fabric is conveniently interfaced to the RFSoC's RF DCs, which are the primary communication path to the analogue world. There are two types of RF DCs: the RF Analogue-to-Digital Converter (RF ADC) and the RF Digital-to-Analogue Converter (RF DAC). In this series of notebooks, we will collectively refer to the RF ADCs and RF DACs as RF DCs.

### The RFSoC Device Family <a class="anchor" id="the_rfsoc_device_family"></a>
Before we further explore the architecture of Gen-1 RFSoC devices, let's take a moment to investigate [Table 1](#tab-1), which presents some useful information about the Gen-1 RFSoC device family [[3]](#ref-3). From here, you will have a better understanding of the different components implemented in each device.

<a class="anchor" id="tab-1"></a>
<figure>
    <figcaption><b>Table 1: The Gen-1 RFSoC device family.</b></figcaption>
    <br>
    <table style="width:100%">
      <tr style="text-align:center">
        <th colspan="6">Generation 1</th>
      </tr>
      <tr>
        <th></th>
        <th>ZU21DR</th>
        <th>ZU25DR</th>
        <th>ZU27DR</th>
        <th>ZU28DR</th>
        <th>ZU29DR</th>
      </tr>
      <tr style="text-align:center">
        <td>ADC Blocks<br>Max Rate (Gsps)</td>
        <td>0<br>0</td>
        <td>8<br>4.096</td>
        <td>8<br>4.096</td>
        <td>8<br>4.096</td>
        <td>16<br>2.048</td>
      </tr>
      <tr style="text-align:center">
        <td>DAC Blocks<br>Max Rate (Gsps)</td>
        <td>0<br>0</td>
        <td>8<br>6.554</td>
        <td>8<br>6.554</td>
        <td>8<br>6.554</td>
        <td>16<br>6.554</td>
      </tr>
      <tr style="text-align:center">
        <td>SD-FEC Blocks</td>
        <td>8</td>
        <td>0</td>
        <td>0</td>
        <td>8</td>
        <td>0</td>
      </tr>
      <tr style="text-align:center">
        <td>System Logic Cells (K)</td>
        <td>930</td>
        <td>678</td>
        <td>930</td>
        <td>930</td>
        <td>930</td>
      </tr>
      <tr style="text-align:center">
        <td>CLB LUTs (K)</td>
        <td>425</td>
        <td>310</td>
        <td>425</td>
        <td>425</td>
        <td>425</td>
      </tr>
      <tr style="text-align:center">
        <td>Max Dist. RAM (Mb)</td>
        <td>13.0</td>
        <td>9.6</td>
        <td>13.0</td>
        <td>13.0</td>
        <td>13.0</td>
      </tr>
      <tr style="text-align:center">
        <td>Total Block RAM (Mb)</td>
        <td>38.0</td>
        <td>27.8</td>
        <td>38.0</td>
        <td>38.0</td>
        <td>38.0</td>
      </tr>
      <tr style="text-align:center">
        <td>UltraRAM (Mb)</td>
        <td>22.5</td>
        <td>13.5</td>
        <td>22.5</td>
        <td>22.5</td>
        <td>22.5</td>
      </tr>
      <tr style="text-align:center">
        <td>DSP Slices</td>
        <td>4272</td>
        <td>3145</td>
        <td>4272</td>
        <td>4272</td>
        <td>4272</td>
      </tr>
    </table>
</figure>

Pay particular attention to the number of RF ADCs and RF DACs each device contains. Note that the ZU21DR device does not contain any RF DCs, and instead only contains SD-FEC blocks for accelerated forward error correction. The maximum sample rate of the RF ADCs also changes between RFSoC devices depending on the number of available channels.
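The relationship between channel count and ADC sample rate can be checked with a little arithmetic. The sketch below copies the ADC rows of Table 1 into a dictionary (the names `ADC_TABLE` and `aggregate_adc_rate_gsps` are purely illustrative) and multiplies the number of ADCs by the maximum rate to give a total sample throughput per device.

```python
# ADC figures copied from Table 1: (number of ADCs, maximum rate in Gsps).
# The dictionary and function names are illustrative, not part of any tool.
ADC_TABLE = {
    "ZU21DR": (0, 0.0),
    "ZU25DR": (8, 4.096),
    "ZU27DR": (8, 4.096),
    "ZU28DR": (8, 4.096),
    "ZU29DR": (16, 2.048),
}


def aggregate_adc_rate_gsps(device):
    """Return channels x maximum rate, the total ADC sample throughput."""
    channels, rate = ADC_TABLE[device]
    return channels * rate


for name in ADC_TABLE:
    print(f"{name}: {aggregate_adc_rate_gsps(name):.3f} Gsps in total")
```

Using the table values, the 16-channel ZU29DR and the 8-channel devices arrive at the same total, which is one way to read the trade-off between channel count and per-channel sample rate.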

----

## The Processing System <a class="anchor" id="the_processing_system"></a>
Let's now investigate the RFSoC's PS. From [Figure 1](#fig-1) we can see that the PS contains an APU, RPU, Platform Management Unit (PMU), Configuration Security Unit (CSU) and external memory controller. There are also a variety of hardware drivers for general and high-speed peripheral communication. Each of these components is described thoroughly in the Exploring Zynq MPSoC book [[2]](#ref-2). We will briefly review the APU, RPU, PMU and CSU in this section.

### The Application Processor <a class="anchor" id="the_application_processor"></a>
The APU contains an Arm Cortex-A53 Multi-Processor Core (MPCore). The Cortex-A53 MPCore is host to four processing cores, each with their own dedicated computational units. These include a Floating Point Unit (FPU), NEON Media Processing Engine (MPE), Cryptography Extension (Crypto), Memory Management Unit (MMU), and dedicated Level 1 cache memory per core. The entire APU has access to a Snoop Control Unit (SCU) and Level 2 cache memory. An overview of the APU can be seen in [Figure 2](#fig-2).

<a class="anchor" id="fig-2"></a>
<figure>
<img src='./images/application_processor.svg' width='50%'/>
    <figcaption><b>Figure 2: Simplified diagram of the Application Processing Unit.</b></figcaption>
</figure>

One motivation for using the APU is to host an operating system. For example, if you are accessing this notebook on your RFSoC development board, then you are probably using the PYNQ software framework (a Linux-based operating system) on the APU. An operating system provides driver support and application control for your RFSoC system design.

### The Real-Time Processor <a class="anchor" id="the_real_time_processor"></a>
The RPU contains two Arm Cortex-R5 cores and should be considered for real-time applications and deterministic system control, as it provides low latency performance. The RPU contains a number of computational units and memories, which include an FPU, Tightly Coupled Memories (TCMs), two local caches, and a Memory Protection Unit (MPU). See [Figure 3](#fig-3) for a simplified overview of the RPU architecture.

<a class="anchor" id="fig-3"></a>
<figure>
<img src='./images/real_time_processor.svg' width='50%'/>
    <figcaption><b>Figure 3: Overview of the Real-Time Processing System.</b></figcaption>
</figure>

The RPU can run the FreeRTOS operating system to support system development and design.

### Platform Management and Security <a class="anchor" id="platform_management_and_security"></a>
The PMU consists of a triplicated MicroBlaze processing unit, which contains hardened processors. Using three MicroBlaze processors rather than one improves the reliability of data handling with a majority voting system. The PMU contains several memories, and Xilinx firmware, which allow it to effectively manage the RFSoC device.
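The majority voting idea can be illustrated in a few lines of Python. This is only a conceptual sketch of 2-out-of-3 voting on integer values, not a model of how the PMU hardware actually implements it; the function name and test values are made up for illustration.

```python
def majority_vote(a, b, c):
    """Bitwise 2-out-of-3 vote: each output bit takes the value that
    at least two of the three inputs agree on."""
    return (a & b) | (a & c) | (b & c)


good = 0b10110010
faulty = good ^ 0b00001000  # one of the three copies suffers a single bit flip

# The two agreeing copies outvote the faulty one.
print(majority_vote(good, good, faulty) == good)
```

A single faulty copy is always outvoted, which is why three redundant units improve reliability over one.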

Lastly, the CSU consists of a Secure Processor Block (SPB) and a Crypto Interface Block (CIB). Similar to the PMU, the SPB contains a triplicated MicroBlaze processing unit. It manages the secure boot of the Arm processors and several other security related features, such as Physically Unclonable Functions (PUFs) and tamper detection. The CIB contains several crypto blocks for secure applications: AES-GCM, SHA-3, and RSA 4096.

----

## The Programmable Logic <a class="anchor" id="the_programmable_logic"></a>
The PL is an essential part of the RFSoC device, as it directly interfaces to the high sample rate RF ADCs and RF DACs. As we have previously explored, there are several RF DC channels, and each can be processed simultaneously by exploiting the parallel processing capabilities of FPGAs. Before we begin exploring the RF DCs, let's review the fundamental building blocks of an FPGA, and investigate its special processing resources for undertaking computationally intensive arithmetic. See [Figure 4](#fig-4) for an architecture overview of the RFSoC's FPGA logic fabric.

<a class="anchor" id="fig-4"></a>
<figure>
    <img src='./images/rfsoc_fpga.svg' width='80%'/>
    <br>
    <br>
    <figcaption><b>Figure 4: The Zynq RFSoC's Programmable Logic with FPGA, RF Data Converters, and AXI ports.</b></figcaption>
</figure>

Also shown in [Figure 4](#fig-4) are the Advanced eXtensible Interface (AXI) ports, which enable data transfer between the RFSoC's PL and PS. If you would like to learn more about the AXI ports, see Chapter 3 and Chapter 11 in [[2]](#ref-2).

This section will review the FPGA logic fabric and special processing resources such as Block RAMs, Ultra RAMs, and DSP48E2 slices. Finally, the interface to the RF ADC and RF DAC channels will be investigated.

### The Logic Fabric <a class="anchor" id="the_logic_fabric"></a>
Previously in [Table 1](#tab-1) it was shown that each RFSoC device contains logic cells and Configurable Logic Blocks (CLBs). These resources are fundamental for many Digital Signal Processing (DSP) architectures and implementations. CLBs are arranged in columns in the FPGA logic fabric, and are closely aligned with switch matrices to support signal routing between neighbouring resources. Each CLB in Gen-1 RFSoC devices contains 8 Lookup Tables (LUTs), 16 Flip-Flops (FFs), and other critical routing logic for communicating between CLBs. The routing logic is composed of multiplexers and special arithmetic carry logic, which are useful for routing signals between adjacent CLBs. See [Figure 5](#fig-5) for an overview of a CLB.

<a class="anchor" id="fig-5"></a>
<figure>
<img src='./images/clb_overview.svg' width='70%'/>
    <figcaption><b>Figure 5: Configurable Logic Block (CLB) connections and components overview.</b></figcaption>
</figure>

The FPGA logic fabric has a variety of applications, including constructing various distributed arithmetic circuits such as addition and multiplication, and also hosting distributed RAM using LUTs (see [Table 1](#tab-1) for the maximum distributed RAM per device).
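To see how a lookup table can implement arithmetic, the sketch below models a LUT as a precomputed truth table and builds a ripple-carry adder from two such tables. It is a simplified illustration: the 3-input tables, the helper names, and the bit-by-bit loop are assumptions made for brevity, and real CLBs would use their dedicated carry logic rather than a LUT for the carry chain.

```python
from itertools import product


def make_lut(func, n_inputs):
    """Precompute a truth table, much as a LUT's configuration bits would."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)]


def lut_read(lut, *bits):
    """Use the input bits as an address into the stored truth table."""
    address = 0
    for bit in bits:
        address = (address << 1) | bit
    return lut[address]


# One table for the sum bit and one for the carry bit of a full adder.
SUM_LUT = make_lut(lambda a, b, cin: a ^ b ^ cin, 3)
CARRY_LUT = make_lut(lambda a, b, cin: (a & b) | (a & cin) | (b & cin), 3)


def ripple_add(x, y, width):
    """Add two unsigned integers one bit at a time using only table lookups."""
    carry, total = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        total |= lut_read(SUM_LUT, a, b, carry) << i
        carry = lut_read(CARRY_LUT, a, b, carry)
    return total | (carry << width)  # keep the final carry as the top bit


print(ripple_add(13, 7, 4))
```

The same table-lookup mechanism is what allows a LUT to act as a small memory, which is the basis of distributed RAM.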

### Special Processing Resources <a class="anchor" id="special_processing_resources"></a>
The RFSoC is packed full of unique processing resources. The FPGA contains special components to accelerate arithmetic and algorithm computation. The special processing resources are separated into two categories: storage and arithmetic.

#### Block RAM and UltraRAM
Other than distributed memory, the PL also contains Block RAMs and UltraRAMs for memory storage. Each type of storage element has its own features and capabilities.

Block RAMs can be configured to operate as a RAM, ROM and First In First Out (FIFO) buffer. When configured as one storage element, a Block RAM can store up to 36Kb of data. Alternatively, the Block RAM can be separated into two individual memories capable of storing 18Kb of data each. Block RAMs are unique because they can be reshaped. For example, a Block RAM can be configured to store 4096 elements x 9 bits, or 8192 elements x 4 bits, and other configurations.

UltraRAM is a larger storage element than Block RAM, as it can store up to 288Kb of data in one tile. Unlike Block RAM, an UltraRAM tile cannot be reshaped, as it only supports an address configuration of 4096 elements x 72 bits.
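The address configurations mentioned in this section can be turned into capacities with simple arithmetic. The sketch below multiplies depth by width for each shape, taking 1Kb as 1024 bits; the helper name `capacity_kb` is illustrative only.

```python
def capacity_kb(depth, width):
    """Capacity of a depth x width memory in Kb, with 1 Kb = 1024 bits."""
    return depth * width / 1024


# Two of the Block RAM shapes and the single UltraRAM shape from the text.
shapes = {
    "Block RAM 4096 x 9": (4096, 9),
    "Block RAM 8192 x 4": (8192, 4),
    "UltraRAM 4096 x 72": (4096, 72),
}

for name, (depth, width) in shapes.items():
    print(f"{name}: {capacity_kb(depth, width):.0f} Kb")
```

The 4096 x 9 shape fills the full 36Kb of a Block RAM, while the 8192 x 4 shape addresses 32Kb of it. The fixed UltraRAM shape is eight times the capacity of one Block RAM.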

#### DSP48E2 Slices
Many FPGA architectures benefit from dedicated DSP resources for large arithmetic computations. The RFSoC's PL offers several columns of DSP48E2 slices, which can provide high-speed and wide wordlength arithmetic support. DSP slices can be cascaded with others to increase arithmetic wordlength. [Figure 6](#fig-6) shows the architecture of a DSP slice and the input/output wordlengths of each stage.

<a class="anchor" id="fig-6"></a>
<figure>
<img src='./images/dsp_slice.svg' width='60%'/>
    <figcaption><b>Figure 6: Simplified block diagram of the DSP48E2 slice.</b></figcaption>
</figure>

Notice that the multiplier accepts operands of up to 27 and 18 bits. The pre-adder (useful for symmetric Finite Impulse Response, FIR, filters) accepts 30-bit and 27-bit inputs, and the post-adder accepts 48-bit and 45-bit inputs. The DSP48E2 slice is well suited to implementing FIR filter designs. FIR filters can be deployed on the FPGA to perform signal filtering operations such as decimation and interpolation for RFSoC applications.
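The wordlengths above can be explored with a small behavioural model. The following is a rough sketch of a multiply-accumulate operation, as used for one FIR output sample, with checks that signed operands fit in 27 and 18 bits, that each product fits in 45 bits, and that the running sum fits in 48 bits. The function names are hypothetical and the model ignores pipelining, the pre-adder, and every other detail of the real slice.

```python
def fits_signed(value, bits):
    """True if value is representable as a signed two's complement number."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def multiply_accumulate(samples, coeffs):
    """Sum of products with width checks modelled on the DSP48E2 wordlengths."""
    acc = 0
    for x, h in zip(samples, coeffs):
        if not (fits_signed(x, 27) and fits_signed(h, 18)):
            raise ValueError("operand exceeds the 27 x 18 bit multiplier inputs")
        product = x * h
        if not fits_signed(product, 45):
            raise ValueError("product exceeds 45 bits")
        acc += product
        if not fits_signed(acc, 48):
            raise ValueError("accumulator exceeds 48 bits")
    return acc


# One output sample of a 3-tap FIR filter.
print(multiply_accumulate([1, 2, 3], [4, 5, 6]))
```

Even the largest possible product, from the most negative 27-bit and 18-bit operands, passes the 45-bit check, and the 48-bit accumulator leaves headroom for summing many products.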

### The RF Interface <a class="anchor" id="the_rf_interface"></a>
Finally, the RFSoC's FPGA is our gateway to the RF ADC and RF DAC channels. As previously shown in [Figure 1](#fig-1), there are several RF DC channels that each require their own interface to the FPGA logic fabric. For the purpose of exploring the RF interface, we will focus on one channel only. Signal data is transferred between the RFSoC's FPGA and the RF DCs using the AXI-Stream interface. AXI-Stream is a straightforward data transfer standard that only requires three signals to operate correctly.

* Data — Also known as tdata, this signal is used to transfer data between the FPGA and RF interface.
* Valid — Also known as tvalid, this signal is used to indicate the presence of valid data on the tdata signal.
* Ready — Also known as tready, this signal is used to handle backpressure in the AXI-Stream transfer.

Further information on the AXI-Stream interface can be found in [[4]](#ref-4).

The RF interface follows a relatively straightforward design topology. One component is responsible for creating data and the other consumes it. In the past, this has been referred to as a master and slave topology. We will refer to these roles as the leader and follower respectively.

The leader is responsible for retrieving or generating data so that it can be transferred to the follower. If we first consider an RF DAC channel, we can see that the FPGA is responsible for transferring data onto the RF interface. In this setup, the FPGA is the leader and the RF DAC is the follower, as shown in [Figure 7](#fig-7).

<a class="anchor" id="fig-7"></a>
<figure>
<img src='./images/rf_dac_interface.svg' width='80%'/>
    <figcaption><b>Figure 7: The leader/follower interface between the FPGA and RF DAC. Real and imaginary components are concatenated onto a single AXI-Stream interface on ZU28DR devices. Configuration shown is complex-to-real mode.</b></figcaption>
</figure>

In contrast, an RF ADC channel will transfer data onto the RF interface for the FPGA to consume. The RF ADC is the leader in this situation, and the FPGA is the follower. An illustration of this setup is shown in [Figure 8](#fig-8).

<a class="anchor" id="fig-8"></a>
<figure>
<img src='./images/rf_adc_interface.svg' width='80%'/>
    <figcaption><b>Figure 8: The leader/follower interface between the RF ADC and FPGA. Real and imaginary components are on separate AXI-Stream interfaces on ZU28DR devices.</b></figcaption>
</figure>

The data interfaces for the RF ADC and RF DAC channels use 16 bits to represent one sample. The AXI-Stream interface operates using a handshaking protocol: the leader sets the tvalid signal high when it has data to offer, and a transfer takes place only in clock cycles where the follower also holds the tready signal high.
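The handshake can be sketched as a cycle-by-cycle simulation. In the AXI-Stream protocol a transfer completes in any clock cycle where both tvalid and tready are high. The function below is a minimal behavioural model under that rule; the function name, the sample values, and the tready pattern are invented for illustration.

```python
def simulate_stream(samples, tready_pattern):
    """Return (cycle, sample) pairs for each completed AXI-Stream transfer."""
    transfers = []
    index = 0
    for cycle, tready in enumerate(tready_pattern):
        tvalid = index < len(samples)  # the leader still has data to offer
        if tvalid and tready:          # handshake completes in this cycle
            transfers.append((cycle, samples[index]))
            index += 1
    return transfers


# The follower applies backpressure in cycles 1 and 4 by holding tready low.
print(simulate_stream([10, 20, 30], [1, 0, 1, 1, 0]))
```

When tready is low, the leader simply holds the same sample on tdata until the follower is ready, so no data is lost.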

### Super Sample Rate <a class="anchor" id="super_sample_rate"></a>
Take a moment to consider the high sample rate of the RFSoC's ADC and DAC channels. Their maximum sample rate is 4096Msps and 6554Msps respectively. If you are familiar with FPGAs, you will realise that these digital signals cannot be clocked sequentially on the FPGA logic fabric, as the required clock frequency is too high. In order to represent these signals, they must be converted to a Super Sample Rate (SSR) representation.

An SSR interface contains several time contiguous samples per AXI-Stream clock cycle. To explain this concept, consider the conceptual diagram presented in [Figure 9](#fig-9). To acquire a signal sampled at 4096Msps, the samples must first be deserialised. Deserialisation increases the signal wordlength, but also has the effect of decreasing the required AXI-Stream clock frequency.

<a class="anchor" id="fig-9"></a>
<figure>
<img src='./images/super_sample_rate.svg' width='90%'/>
    <figcaption><b>Figure 9: Conceptual diagram of converting from a serial interface, to an SSR interface. The RF DCs do not strictly use this technique to create their SSR interface. This diagram attempts to only explain the concept of SSR interfaces.</b></figcaption>
</figure>
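The effect of deserialisation on the clock frequency is a simple division. The sketch below assumes a 4096Msps signal and a handful of samples-per-cycle values chosen purely for illustration; which values the hardware supports is not covered here.

```python
def axis_clock_mhz(sample_rate_msps, samples_per_cycle):
    """AXI-Stream clock needed to carry the given number of samples per cycle."""
    return sample_rate_msps / samples_per_cycle


for samples_per_cycle in (1, 2, 4, 8):
    clock = axis_clock_mhz(4096, samples_per_cycle)
    print(f"{samples_per_cycle} samples per cycle -> {clock:.0f} MHz clock")
```

Each doubling of the number of samples per clock cycle halves the clock frequency the logic fabric has to meet.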

SSR interfaces are used regularly to transfer data between the RF DC channels and the RFSoC's FPGA. They are primarily required when the sample rate requirement of the RF interface is too high for a single rate implementation.

For all Gen-1 RFSoC devices, an SSR interface uses 16 bits to represent a sample. Therefore, an SSR of 1 will be 16 bits wide, an SSR of 2 will be 32 bits wide, and an SSR of 4 will be 64 bits wide.
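The wordlengths above follow from concatenating 16-bit samples into one wide word. The sketch below packs and unpacks such a word. Two details are assumptions made for illustration and are not stated in this notebook: the earliest sample is placed in the least significant bits, and samples are treated as signed two's complement values.

```python
SAMPLE_BITS = 16
SAMPLE_MASK = (1 << SAMPLE_BITS) - 1


def pack_ssr_word(samples):
    """Concatenate 16-bit samples into one word (samples[0] in the low bits)."""
    word = 0
    for i, sample in enumerate(samples):
        word |= (sample & SAMPLE_MASK) << (i * SAMPLE_BITS)
    return word


def unpack_ssr_word(word, ssr):
    """Split a packed word back into ssr signed 16-bit samples."""
    samples = []
    for i in range(ssr):
        raw = (word >> (i * SAMPLE_BITS)) & SAMPLE_MASK
        # Convert from two's complement if the sign bit is set.
        samples.append(raw - (1 << SAMPLE_BITS) if raw & 0x8000 else raw)
    return samples


word = pack_ssr_word([1, -2, 3, -4])
print(f"SSR of 4 occupies {4 * SAMPLE_BITS} bits: 0x{word:016x}")
print(unpack_ssr_word(word, 4))
```

On the FPGA side, a design receiving an SSR word would typically process all of its samples in parallel within a single clock cycle.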

----

## The RF Data Converters <a class="anchor" id="the_rf_data_converters"></a>
We've finally arrived at our main topic of discussion: the RFSoC's Data Converter technology, which is one of the significant differences between a Zynq MPSoC and Zynq RFSoC device. When selecting a Zynq device, the RF DCs should be a motivating reason to use an RFSoC for your target application (other than the SD-FEC blocks). Let's quickly review the facts of the RF DCs on the ZU28DR RFSoC device.

* There are 8 RF ADC channels.
    * The maximum sample rate is 4096 Msps.
    * The sampler uses a wordlength of 12 bits.
* There are 8 RF DAC channels.
    * The maximum sample rate is 6554 Msps.
    * The sampler uses a wordlength of 14 bits.
    
Although there are 8 RF ADC and RF DAC channels, we should pay particular attention to how they are physically configured. The RFSoC data converters are laid out in tiles, which are host to a group of blocks that implement the core functionality of the associated data converter. This hierarchy of tiles and blocks simplifies the data converter design and implementation. Let's explore the hierarchy of each RF DC in this section to obtain a better understanding of its overall layout.

### The RF ADC Hierarchy <a class="anchor" id="the_rf_adc_hierarchy"></a>
The RF ADC can be configured as four blocks per tile, or two blocks per tile depending on the selected device. In particular, the ZU28DR device uses a layout of two blocks per tile, meaning a total of 4 tiles are required to host all 8 ADC blocks. A tile can generate all its own clocking requirements using its own Phase Locked Loop (PLL). The PLL requires an external, low-jitter, off-chip clock to operate effectively. An overview of the RF ADC is presented in [Figure 10](#fig-10).

<a class="anchor" id="fig-10"></a>
<figure>
<img src='./images/rf_adc_hierarchy.svg' width='80%'/>
    <figcaption><b>Figure 10: RF ADC hierarchy for the ZU28DR RFSoC device.</b></figcaption>
</figure>

Each RF ADC tile uses differential signaling. Therefore, the analogue signal presented to the device needs to be differential. On the ZCU111 + XM500 and RFSoC2x2 development boards, single-ended input signals are converted to differential using an RF balun.

Each RF ADC block above contains several core processing blocks for a Digital Down Converter (DDC). The RF ADC processing pipeline samples an analogue signal, effectively converting it to the digital domain. The ADC block then applies DSP techniques such as threshold detection and Quadrature Modulator Correction (QMC). The signal is subsequently down converted using a complex mixer and programmable decimator [[5]](#ref-5).
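The down conversion step can be illustrated with a toy model in plain Python. The sketch mixes a real signal with a complex exponential and then decimates by averaging blocks of samples. The block averaging is only a stand-in for the filtering performed by the programmable decimator, and the function name, sample rate, and frequencies are made up for the example.

```python
import cmath
import math


def toy_ddc(samples, fs, f_mix, decimation):
    """Mix a real signal down by f_mix, then average and decimate."""
    mixed = [
        x * cmath.exp(-2j * math.pi * f_mix * n / fs)
        for n, x in enumerate(samples)
    ]
    output = []
    for start in range(0, len(mixed) - decimation + 1, decimation):
        block = mixed[start:start + decimation]
        output.append(sum(block) / decimation)  # crude low-pass filter
    return output


fs, f_tone = 1024.0, 128.0
tone = [math.cos(2 * math.pi * f_tone * n / fs) for n in range(64)]
baseband = toy_ddc(tone, fs, f_mix=f_tone, decimation=8)
print([round(abs(y), 3) for y in baseband])
```

Mixing a tone with a complex exponential at the same frequency moves it to 0 Hz, so the output here settles to a constant, while the averaging removes the unwanted image at twice the tone frequency.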

### The RF DAC Hierarchy <a class="anchor" id="the_rf_dac_hierarchy"></a>
The RF DAC is configured as four blocks per tile, which requires a total of 2 tiles to host all 8 DAC blocks on the ZU28DR device. Similar to the ADC, there is an internal PLL for generating the DAC's clocking requirements. As before, an external, low-jitter clock is required to drive the PLL. [Figure 11](#fig-11) presents an overview of the RF DAC hierarchy.

<a class="anchor" id="fig-11"></a>
<figure>
<img src='./images/rf_dac_hierarchy.svg' width='80%'/>
    <figcaption><b>Figure 11: RF DAC hierarchy for the ZU28DR RFSoC device.</b></figcaption>
</figure>
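The tile and block layout described in this section can be sketched as an index mapping. The example below assumes a flat channel index starting at 0, which is a hypothetical numbering introduced only for illustration; development boards and driver software may number their channels differently.

```python
ADC_BLOCKS_PER_TILE = 2  # ZU28DR RF ADC layout described above
DAC_BLOCKS_PER_TILE = 4  # ZU28DR RF DAC layout described above
CHANNELS = 8


def locate_block(channel, blocks_per_tile):
    """Map a flat channel index to (tile index, block index within the tile)."""
    return divmod(channel, blocks_per_tile)


print("ADC tiles needed:", CHANNELS // ADC_BLOCKS_PER_TILE)
print("DAC tiles needed:", CHANNELS // DAC_BLOCKS_PER_TILE)
for channel in range(CHANNELS):
    print(channel,
          "ADC", locate_block(channel, ADC_BLOCKS_PER_TILE),
          "DAC", locate_block(channel, DAC_BLOCKS_PER_TILE))
```

Because each tile has its own PLL, blocks that share a tile also share that tile's clocking.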

Differential signaling is used again to interface RF DAC data to the outside world. Baluns can be used to convert between differential signals and single-ended signals. 

Each RF DAC block contains many functional processing blocks to implement a Digital Up Converter (DUC). The RF DAC contains several stages including a programmable interpolator, complex mixer, QMC processing, and an anti-sinc filter to correct Nyquist Zone 1 droop. Finally, a sampler converts the processed digital signal to the analogue domain for transmission [[5]](#ref-5).
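The droop that the anti-sinc filter corrects can be estimated with a short calculation. The sketch assumes the common zero-order hold model of a DAC, whose output magnitude follows sin(x)/x with x = pi * f / fs. This model and the function name are assumptions for illustration and are not taken from this notebook.

```python
import math


def sinc_droop_db(f, fs):
    """Magnitude of a zero-order hold response at frequency f, in dB."""
    x = math.pi * f / fs
    if x == 0:
        return 0.0
    return 20 * math.log10(abs(math.sin(x) / x))


fs = 6554e6  # maximum RF DAC sample rate quoted in this notebook, in Hz
for fraction in (0.1, 0.25, 0.5):
    droop = sinc_droop_db(fraction * fs, fs)
    print(f"f = {fraction:.2f} x fs: {droop:.2f} dB")
```

Under this model the attenuation grows towards the edge of the first Nyquist zone, reaching roughly 3.9 dB at half the sample rate, which is the roll-off an inverse sinc filter is designed to flatten.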

----

## Conclusion <a class="anchor" id="conclusion"></a>
In this notebook we have explored the Xilinx Zynq UltraScale+ RFSoC device architecture. The RFSoC's constituent components were reviewed, and the RF DC hierarchy was presented. It was found that the RFSoC's FPGA logic fabric interfaces directly with the RF ADC and RF DAC tiles. SSR interfaces are regularly used to transfer high data rates between the FPGA and RF DCs.

The next notebook in this series will explore the RFSoC's data converters in more detail, by investigating the core functionality of RF DC tiles and blocks.

| | [Next Notebook ➡️](02_exploring_the_rf_dataconverters.ipynb)

## References <a class="anchor" id="references"></a>
<a class="anchor" id="ref-1"></a>
[1] - [L. H. Crockett, R. A. Elliot, M. A. Enderwitz and R. W. Stewart, The Zynq Book: Embedded Processing with the ARM Cortex-A9 on the Xilinx Zynq-7000 All Programmable SoC, First Edition, Strathclyde Academic Media, 2014.](http://www.zynqbook.com/)

<a class="anchor" id="ref-2"></a>
[2] - [Crockett, L., Northcote, D., Ramsay, C., Robinson, F., & Stewart, R. (2019). Exploring Zynq MPSoC: With PYNQ and Machine Learning Applications.](https://www.zynq-mpsoc-book.com/)

<a class="anchor" id="ref-3"></a>
[3] - [Xilinx, Inc. "Zynq UltraScale+ RFSoC Product Tables and Product Selection Guide", xmp105, v1.9, 2020](https://www.xilinx.com/support/documentation/selection-guides/zynq-usp-rfsoc-product-selection-guide.pdf)

<a class="anchor" id="ref-4"></a>
[4] - [Arm, Ltd. "AMBA 4 AXI4-Stream Protocol Specification", Issue A, Version 1.0, March 2010.](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ihi0051a/index.html)

<a class="anchor" id="ref-5"></a>
[5] - [Xilinx, Inc, "USP RF Data Converter: LogiCORE IP Product Guide", PG269, v2.3, June 2020](https://www.xilinx.com/support/documentation/ip_documentation/usp_rf_data_converter/v2_3/pg269-rf-data-converter.pdf)