From 91e1d5b757033f26f5c86ea9475d7807191d3382 Mon Sep 17 00:00:00 2001 From: jkinsky Date: Mon, 27 Feb 2023 09:33:06 -0600 Subject: [PATCH] Update FPGA pipe array readme Changed the sample name in the readme to match the name in the sample.json file. Restructured the readme to match the new template, with exceptions related to FPGA structure. Corrected formatting. Updated some branding. Rewrote and restructured some sections for clarity. --- .../DesignPatterns/pipe_array/README.md | 324 ++++++++++-------- 1 file changed, 172 insertions(+), 152 deletions(-) diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/pipe_array/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/pipe_array/README.md index 8b00dc4106..28d9e14226 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/pipe_array/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/pipe_array/README.md @@ -1,27 +1,40 @@ -# Data Transfers Using Pipe Arrays -This FPGA tutorial showcases a design pattern that makes it possible to create arrays of pipes. - -| Optimized for | Description ---- |--- -| OS | Linux* Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Agilex®, Arria® 10, and Stratix® 10 FPGAs -| Software | Intel® oneAPI DPC++/C++ Compiler -| What you will learn | A design pattern to generate an array of pipes using SYCL*
Static loop unrolling through template metaprogramming -| Time to complete | 15 minutes - -> **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. +# "Pipe Array" Sample + +This FPGA sample demonstrates a design pattern to create arrays of pipes. + +| Area | Description +|:-- |:-- +| What you will learn | A design pattern to generate an array of pipes using SYCL*
Static loop unrolling through template metaprogramming +| Time to complete | 15 minutes +| Category | Code Optimization + +## Purpose + +In certain situations, it is useful to create a collection of pipes that can be indexed like an array in a SYCL-compliant FPGA design. If you are not yet familiar with pipes, refer to the prerequisite tutorial "Data Transfers Using Pipes". + +In SYCL*, each pipe defines a unique type with static methods for reading data (`read`) and writing data (`write`). Since pipes are not objects but *types*, defining a collection of pipes requires C++ template meta-programming. This is somewhat non-intuitive but yields highly efficient code. + +This tutorial provides a convenient pair of header files defining an abstraction for an array of pipes. The headers can be used in any SYCL-compliant design and can be extended as necessary. + +## Prerequisites + +| Optimized for | Description +|:--- |:--- +| OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 +| Hardware | Intel® Agilex®, Arria® 10, and Stratix® 10 FPGAs +| Software | Intel® oneAPI DPC++/C++ Compiler + +> **Note**: Even though the Intel® oneAPI DPC++/C++ Compiler is enough to compile for emulation, generating reports, generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. > -> For using the simulator flow, Intel® Quartus® Prime Pro Edition and one of the following simulators must be installed and accessible through your PATH: +> For using the simulator flow, you must have Intel® Quartus® Prime Pro Edition and one of the following simulators installed and accessible through your PATH: > - Questa*-Intel® FPGA Edition > - Questa*-Intel® FPGA Starter Edition -> - ModelSim® SE +> - ModelSim SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. -> -> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. -## Prerequisites +> **Warning** Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. This sample is part of the FPGA code samples. It is categorized as a Tier 3 sample that demonstrates a design pattern. @@ -42,24 +55,24 @@ flowchart LR ``` Find more information about how to navigate this part of the code samples in the [FPGA top-level README.md](/DirectProgramming/C++SYCL_FPGA/README.md). -You can also find more information about [troubleshooting build errors](/DirectProgramming/C++SYCL_FPGA/README.md#troubleshooting), [running the sample on the Intel® DevCloud](/DirectProgramming/C++SYCL_FPGA/README.md#build-and-run-the-samples-on-intel-devcloud-optional), [using Visual Studio Code with the code samples](/DirectProgramming/C++SYCL_FPGA/README.md#use-visual-studio-code-vs-code-optional), [links to selected documentation](/DirectProgramming/C++SYCL_FPGA/README.md#documentation), etc. +You can also find more information about [troubleshooting build errors](/DirectProgramming/C++SYCL_FPGA/README.md#troubleshooting), [running the sample on the Intel® DevCloud](/DirectProgramming/C++SYCL_FPGA/README.md#build-and-run-the-samples-on-intel-devcloud-optional), [using Visual Studio Code with the code samples](/DirectProgramming/C++SYCL_FPGA/README.md#use-visual-studio-code-vs-code-optional), [links to selected documentation](/DirectProgramming/C++SYCL_FPGA/README.md#documentation), and more. -## Purpose -In certain situations, it is useful to create a collection of pipes that can be indexed like an array in a SYCL-compliant FPGA design. If you are not yet familiar with pipes, refer to the prerequisite tutorial "Data Transfers Using Pipes". +## Key Implementation Details -In SYCL*, each pipe defines a unique type with static methods for reading data (`read`) and writing data (`write`). Since pipes are not objects but *types*, defining a collection of pipes requires C++ template meta-programming. This is somewhat non-intuitive but yields highly efficient code. +The sample demonstrates the following important concepts: -This tutorial provides a convenient pair of header files defining an abstraction for an array of pipes. The headers can be used in any SYCL-compliant design and can be extended as necessary. +- A design pattern to generate an array of pipes. +- Static loop unrolling through template metaprogramming. -### Example 1: A simple array of pipes +### A Simple Array of Pipes -To create an array of pipes, include the pipe_utils.hpp header from the DirectProgramming/C++SYCL_FPGA/include/ directory in your design: +To create an array of pipes, include the `pipe_utils.hpp` header from the `../DirectProgramming/C++SYCL_FPGA/include/..` directory in your design: ```c++ #include "pipe_utils.hpp" ``` -As with regular pipes, an array of pipes needs template parameters for an ID, for the `min_capacity` of each pipe, and each pipe's data type. An array of pipes additionally requires one or more template parameters to specify the array size. The following code declares a one dimensional array of 10 pipes, each with `capacity=32`, that operate on `int` values. +As with regular pipes, an array of pipes needs template parameters for an ID, for the `min_capacity` of each pipe, and each pipe's data type. An array of pipes additionally requires one or more template parameters to specify the array size. The following code declares a one-dimensional array of 10 pipes, each with `capacity=32`, that operate on `int` values. ```c++ using MyPipeArray = PipeArray< // Defined in "pipe_utils.hpp". @@ -78,6 +91,7 @@ Indexing inside a pipe array can be done via the `PipeArray::PipeAt` type alias, MyPipeArray::PipeAt<3>::write(17); auto x = MyPipeArray::PipeAt<3>::read(); ``` + The template parameter `<3>` identifies a specific pipe within the array of pipes. The index of the pipe being accessed *must* be determinable at compile time. In most cases, we want to use an array of pipes so that we can iterate over them in a loop. To respect the requirement that all pipe indices are uniquely determinable at compile time, we must use a static form of loop unrolling based on C++ templates. A simple example is shown in the code snippet: @@ -88,13 +102,13 @@ Unroller<0, 10>::Step([](auto i) { MyPipeArray::PipeAt::write(17); }); ``` -While this may initially feel foreign to those unaccustomed to C++ template metaprogramming, this is a simple and powerful pattern common to many C++ libraries. It is easy to reuse. This code sample includes a simple header file `unroller.hpp`, which implements the `Unroller` functionality. -### Example 2: A 2D array of pipes +While this approach may feel foreign to those unaccustomed to C++ template metaprogramming, this is a simple and powerful pattern common to many C++ libraries. It is easy to reuse. This code sample includes a simple header file `unroller.hpp`, which implements the `Unroller` functionality. + +### A 2D Array of Pipes -This code sample defines a `Producer` kernel that reads data from host memory and forwards this data into a two dimensional pipe matrix. +This code sample defines a `Producer` kernel that reads data from host memory and forwards this data into a two-dimensional pipe matrix. The following code snippet creates a two-dimensional pipe array. -The following code snippet creates a two dimensional pipe array. ``` c++ constexpr size_t kNumRows = 2; constexpr size_t kNumCols = 2; @@ -108,7 +122,8 @@ using ProducerToConsumerPipeMatrix = PipeArray< // Defined in "pipe_utils.hpp". kNumCols // array dimension. >; ``` -The producer kernel writes `num_passes` units of data into each of the `kNumRows * kNumCols` pipes. Note that the unrollers' lambdas must capture certain variables from their outer scope. + +The producer kernel writes `num_passes` units of data into each of the `kNumRows * kNumCols` pipes. Note that the lambdas in the unrollers must capture certain variables from their outer scope. ```c++ h.single_task([=]() { @@ -143,6 +158,7 @@ h.single_task>([=]() { ``` The host must thus enqueue the producer kernel and `kNumRows * kNumCols` separate consumer kernels. The latter is achieved through another static unroll. + ```c++ { queue q(device_selector, fpga_tools::exception_handler); @@ -160,15 +176,9 @@ The host must thus enqueue the producer kernel and `kNumRows * kNumCols` separat } ``` -## Key Concepts -* A design pattern to generate an array of pipes. -* Static loop unrolling through template metaprogramming. +## Build the `Pipe Array` Sample -## Building the `pipe_array` Tutorial - -> **Note**: When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. -> Set up your CLI environment by sourcing the `setvars` script located in the root of your oneAPI installation every time you open a new terminal window. -> This practice ensures that your compiler, libraries, and tools are ready for development. +>**Note**: When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the `setvars` script in the root of your oneAPI installation every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development. > > Linux*: > - For system wide installations: `. /opt/intel/oneapi/setvars.sh` @@ -179,124 +189,134 @@ The host must thus enqueue the producer kernel and `kNumRows * kNumCols` separat > - `C:\Program Files(x86)\Intel\oneAPI\setvars.bat` > - Windows PowerShell*, use the following command: `cmd.exe "/K" '"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" && powershell'` > -> For more information on configuring environment variables, see [Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html) or [Use the setvars Script with Windows*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-windows.html). - -### On a Linux* System - -1. Generate the `Makefile` by running `cmake`. - ``` - mkdir build - cd build - ``` - To compile for the default target (the Agilex® device family), run `cmake` using the command: - ``` - cmake .. - ``` - - > **Note**: You can change the default target by using the command: - > ``` - > cmake .. -DFPGA_DEVICE= - > ``` - > - > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: - > ``` - > cmake .. -DFPGA_DEVICE=: - > ``` - > - > You will only be able to run an executable on the FPGA if you specified a BSP. -2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow: - - * Compile for emulation (fast compile time, targets emulated FPGA device): +> For more information on configuring environment variables, see [*Use the setvars Script with Linux* or macOS**](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html) or [*Use the setvars Script with Windows**](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-windows.html). + +### On Linux* + +1. Change to the sample directory. +2. Build the program for Intel® Agilex® device family, which is the default. + ``` + mkdir build + cd build + cmake .. + ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + +3. Compile the design. (The provided targets match the recommended development flow.) + + 1. Compile for emulation (fast compile time, targets emulated FPGA device): + ``` + make fpga_emu + ``` + 2. Generate the optimization report: + ``` + make report + ``` + The report resides at `pipe_array_report.prj/reports/report.html`. + + You can visualize the kernels and pipes generated by looking at the *System Viewer* section of the report. However, you should first reduce the array dimensions `kNumRows` and `kNumCols` to small values (2 or 3) to help visualization. + + 3. Compile for simulation (fast compile time, targets simulated FPGA device, reduced data size): + ``` + make fpga_sim + ``` + 4. Compile for FPGA hardware (longer compile time, targets FPGA device): + ``` + make fpga + ``` + +### On Windows* + +1. Change to the sample directory. +2. Build the program for the Intel® Agilex® device family, which is the default. + ``` + mkdir build + cd build + cmake -G "NMake Makefiles" .. + ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + +3. Compile the design. (The provided targets match the recommended development flow.) + + 1. Compile for emulation (fast compile time, targets emulated FPGA device): + ``` + nmake fpga_emu ``` - make fpga_emu + 2. Generate the optimization report: + ``` + nmake report + ``` + The report resides at `pipe_array_report.prj.a/reports/report.html`. + + You can visualize the kernels and pipes generated by looking at the *System Viewer* section of the report. However, you should first reduce the array dimensions `kNumRows` and `kNumCols` to small values (2 or 3) to help visualization. + + 3. Compile for simulation (fast compile time, targets simulated FPGA device, reduced data size): + ``` + nmake fpga_sim + ``` + 4. Compile for FPGA hardware (longer compile time, targets FPGA device): + ``` + nmake fpga ``` - * Generate the optimization report: - ``` - make report - ``` - * Compile for simulation (fast compile time, targets simulated FPGA device, reduced data size): - ``` - make fpga_sim - ``` - * Compile for FPGA hardware (longer compile time, targets FPGA device): - ``` - make fpga - ``` - -### On a Windows* System - -1. Generate the `Makefile` by running `cmake`. - ``` - mkdir build - cd build - ``` - To compile for the default target (the Agilex® device family), run `cmake` using the command: - ``` - cmake -G "NMake Makefiles" .. - ``` - > **Note**: You can change the default target by using the command: - > ``` - > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= - > ``` - > - > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: - > ``` - > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: - > ``` - > - > You will only be able to run an executable on the FPGA if you specified a BSP. - -2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow: - - * Compile for emulation (fast compile time, targets emulated FPGA device): - ``` - nmake fpga_emu - ``` - * Generate the optimization report: - ``` - nmake report - ``` - * Compile for simulation (fast compile time, targets simulated FPGA device, reduced data size): - ``` - nmake fpga_sim - ``` - * Compile for FPGA hardware (longer compile time, targets FPGA device): - ``` - nmake fpga - ``` > **Note**: If you encounter any issues with long paths when compiling under Windows*, you may have to create your ‘build’ directory in a shorter path, for example c:\samples\build. You can then run cmake from that directory, and provide cmake with the full path to your sample directory. -## Examining the Reports -Locate `report.html` in the `pipe_array_report.prj/reports/` directory. Open the report in any of Chrome*, Firefox*, Edge*, or Internet Explorer*. - -You can visualize the kernels and pipes generated by looking at the "System Viewer" section of the report. However, it is recommended that you first reduce the array dimensions `kNumRows` and `kNumCols` to small values (2 or 3) to facilitate visualization. - -## Running the Sample - - 1. Run the sample on the FPGA emulator (the kernel executes on the CPU): - ``` - ./pipe_array.fpga_emu (Linux) - pipe_array.fpga_emu.exe (Windows) - ``` -2. Run the sample on the FPGA simulator device: - * On Linux - ``` - CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./pipe_array.fpga_sim - ``` - * On Windows - ``` - set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 - pipe_array.fpga_sim.exe - set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= - ``` -3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`): - ``` - ./pipe_array.fpga (Linux) - pipe_array.fpga.exe (Windows) - ``` - -### Example of Output +## Run the `Pipe Array` Sample + +### On Linux + +1. Run the sample on the FPGA emulator (the kernel executes on the CPU). + ``` + ./pipe_array.fpga_emu + ``` +2. Run the sample on the FPGA simulator device. + ``` + CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./pipe_array.fpga_sim + ``` +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). + ``` + ./pipe_array.fpga + ``` + +### On Windows + +1. Run the sample on the FPGA emulator (the kernel executes on the CPU). + ``` + pipe_array.fpga_emu.exe + ``` +2. Run the sample on the FPGA simulator device. + ``` + set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 + pipe_array.fpga_sim.exe + set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= + ``` +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). + ``` + pipe_array.fpga.exe + ``` + +## Example Output + ``` Input Array Size: 1024 Enqueuing producer... @@ -312,4 +332,4 @@ PASSED: The results are correct Code samples are licensed under the MIT license. See [License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details. -Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt). +Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt). \ No newline at end of file