From 8bd8db3159f43c7cf67a71fbde9de8e382bc3d04 Mon Sep 17 00:00:00 2001 From: jkinsky Date: Tue, 2 May 2023 14:24:35 -0500 Subject: [PATCH 1/5] FPGA Reg Sample readme update Restructured to match new template-with adjustments for FPGA samples. Updated readme sample name to match the sample.json sample name. Moved images into assets folder. Updated some branding. Corrected some formatting issues. Make grammar and spelling corrections. --- .../Tutorials/Features/fpga_reg/README.md | 276 ++++++++---------- .../fpga_reg/{ => assets}/fpga_reg.png | Bin .../fpga_reg/{ => assets}/no_fpga_reg.png | Bin 3 files changed, 129 insertions(+), 147 deletions(-) rename DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/{ => assets}/fpga_reg.png (100%) rename DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/{ => assets}/no_fpga_reg.png (100%) diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md index a6ec75fcb7..3cda1cbad6 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md @@ -1,16 +1,31 @@ -# Explicit Pipeline Register Insertion with `fpga_reg` +# `FPGA Reg` Sample -This FPGA tutorial demonstrates how a power user can apply the SYCL*-compliant C++ extension `ext::intel::fpga_reg` to tweak the hardware generated by the compiler. +This sample is an FPGA tutorial that demonstrates how a power user can apply the SYCL*-compliant Explicit Pipeline Register Insertion C++ extension, `fpga_reg``ext::intel::fpga_reg`, to tweak the hardware generated by the compiler. -> **Note**: This is an **advanced tutorial** for FPGA power users. +> **Note**: **This is an advanced tutorial for FPGA power users.** -| Optimized for | Description ---- |--- -| OS | Linux* Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Agilex® 7, Arria® 10, and Stratix® 10 FPGAs -| Software | Intel® oneAPI DPC++/C++ Compiler -| What you will learn | How to use the `ext::intel::fpga_reg` extension
How `ext::intel::fpga_reg` can be used to re-structure the compiler-generated hardware
Situations in which applying `ext::intel::fpga_reg` might be beneficial -| Time to complete | 20 minutes +| Area | Description +|:--- |:--- +| What you will learn | How to use the `ext::intel::fpga_reg` extension.
How `ext::intel::fpga_reg` can be used to re-structure the compiler-generated hardware.
How to identify situations where applying `ext::intel::fpga_reg` can help.
+| Time to complete | 20 minutes
+| Category | Concepts and Functionality
+
+## Purpose
+
+This FPGA tutorial demonstrates an example of using the `ext::intel::fpga_reg` extension to:
+
+- Help reduce the fanout of specific signals in the SYCL-compliant design.
+- Improve the overall fMAX of the generated hardware.
+
+> **Note**: An fMAX improvement is not always possible when using `ext::intel::fpga_reg`.
+
+## Prerequisites
+
+| Optimized for | Description
+|:--- |:---
+| OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 +| Hardware | Intel® Agilex® 7, Arria® 10, and Stratix® 10 FPGAs +| Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. > @@ -20,10 +35,8 @@ This FPGA tutorial demonstrates how a power user can apply the SYCL*-compliant C > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. -> -> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. -## Prerequisites +> **Warning**: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. This sample is part of the FPGA code samples. It is categorized as a Tier 3 sample that demonstrates a compiler feature. @@ -44,16 +57,15 @@ flowchart LR ``` Find more information about how to navigate this part of the code samples in the [FPGA top-level README.md](/DirectProgramming/C++SYCL_FPGA/README.md). -You can also find more information about [troubleshooting build errors](/DirectProgramming/C++SYCL_FPGA/README.md#troubleshooting), [running the sample on the Intel® DevCloud](/DirectProgramming/C++SYCL_FPGA/README.md#build-and-run-the-samples-on-intel-devcloud-optional), [using Visual Studio Code with the code samples](/DirectProgramming/C++SYCL_FPGA/README.md#use-visual-studio-code-vs-code-optional), [links to selected documentation](/DirectProgramming/C++SYCL_FPGA/README.md#documentation), etc. 
+You can also find more information about [troubleshooting build errors](/DirectProgramming/C++SYCL_FPGA/README.md#troubleshooting), [running the sample on the Intel® DevCloud](/DirectProgramming/C++SYCL_FPGA/README.md#build-and-run-the-samples-on-intel-devcloud-optional), [using Visual Studio Code with the code samples](/DirectProgramming/C++SYCL_FPGA/README.md#use-visual-studio-code-vs-code-optional), [links to selected documentation](/DirectProgramming/C++SYCL_FPGA/README.md#documentation), and more. -## Purpose +## Key Implementation Details -This FPGA tutorial demonstrates an example of using the `ext::intel::fpga_reg` extension to: - -* Help reduce the fanout of specific signals in the SYCL-compliant design. -* Improve the overall fMAX of the generated hardware. +This tutorial demonstrates the following key concepts: -Note: A fMAX improvement is not always possible when using `ext::intel::fpga_reg`. +- How to use the `ext::intel::fpga_reg` extension. +- How to use `ext::intel::fpga_reg` to restructure the compiler-generated hardware. +- How to identify situations where applying `ext::intel::fpga_reg` can help. ### Simple Code Example @@ -81,7 +93,7 @@ int func (int input) { This forces the compiler to insert a register between the input and output. You can observe this in the optimization report's System Viewer. -### Understanding the Tutorial Design +### Understanding the Design The basic function performed by the tutorial kernel is a vector dot product with a pre-adder. The loop is unrolled so that the core part of the algorithm is a feed-forward datapath. The coefficient array is implemented as a circular shift register and rotates by one for each iteration of the outer loop. @@ -89,39 +101,28 @@ The optimization applied in this tutorial impacts the system fMAX or Part 1 compiles the kernel code without setting the `USE_FPGA_REG` macro, whereas Part 2 compiles the kernel while setting this macro. 
This chooses between two functionally equivalent code segments, but the latter version uses `ext::intel::fpga_reg`. In the `USE_FPGA_REG` version of the code, the compiler is guaranteed to insert at least one register stage between the input and output of each of the calls to `ext::intel::fpga_reg` function. -#### Part 1: Without `USE_FPGA_REG` - -The compiler will generate the following hardware for Part 1. The diagram below has been simplified for illustration. +#### Part 1: Without Using `USE_FPGA_REG` -Part 1 +The compiler generates the following hardware for Part 1. The diagram below has been simplified for illustration. -Note the following: +![Part 1](assets/no_fpga_reg.png) -* The compiler automatically infers a tree structure for the series of adders. -* There is a large fanout (of up to 4 in this simplified example) from `val` to each of the adders. +The compiler automatically infers a tree structure for the series of adders. There is a large fanout (of up to 4 in this simplified example) from `val` to each of the adders. The fanout grows linearly with the unroll factor in this tutorial. In FPGA designs, signals with large fanout can sometimes degrade system fMAX. This happens because the FPGA placement algorithm cannot place *all* of the fanout logic elements physically close to the fanout source, leading to longer wires. In this situation, it can be helpful to add explicit fanout control in your code via `ext::intel::fpga_reg`. This is an advanced optimization for FPGA power-users. -#### Part 2: with `USE_FPGA_REG` +#### Part 2: Using `USE_FPGA_REG` In this part, we added two sets of `ext::intel::fpga_reg` within the unrolled loop. The first is added to pipeline `val` once per iteration. This reduces the fanout of `val` from 4 in the example in Part 1 to just 2. The second `ext::intel::fpga_reg` is inserted between accumulation into the `acc` value. This generates the following structure in hardware. 
-Part 2 +![Part 2](assets/fpga_reg.png) In this version, the adder tree has been transformed into a vine-like structure. This increases latency, but it helps us achieve our goal of reducing the fanout and improving fMAX. Since the outer loop is pipelined and has a high trip count, the inner loop's increased latency has a negligible impact on throughput. The tradeoff pays off, as the fMAX improvement yields a higher performing design. -## Key Concepts +## Build the `FPGA Reg` Tutorial -* How to use the `ext::intel::fpga_reg` extension. -* How `ext::intel::fpga_reg` can be used to re-structure the compiler-generated hardware. -* Situations in which applying `ext::intel::fpga_reg` might be beneficial. - -## Building the `fpga_reg` Design - -> **Note**: When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. -> Set up your CLI environment by sourcing the `setvars` script located in the root of your oneAPI installation every time you open a new terminal window. -> This practice ensures that your compiler, libraries, and tools are ready for development. +>**Note**: When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the `setvars` script in the root of your oneAPI installation every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development. > > Linux*: > - For system wide installations: `. 
/opt/intel/oneapi/setvars.sh` @@ -134,20 +135,15 @@ Since the outer loop is pipelined and has a high trip count, the inner loop's in > > For more information on configuring environment variables, see [Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html) or [Use the setvars Script with Windows*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-windows.html). -### On a Linux* System +### On Linux* -1. Install the design in `build` directory from the design directory by running `cmake`: - - ```bash +1. Change to the sample directory. +2. Build the program for Intel® Agilex® 7 device family, which is the default. + ``` mkdir build cd build - ``` - - To compile for the default target (the Agilex® 7 device family), run `cmake` using the command: - ``` cmake .. ``` - > **Note**: You can change the default target by using the command: > ``` > cmake .. -DFPGA_DEVICE= @@ -160,43 +156,32 @@ Since the outer loop is pipelined and has a high trip count, the inner loop's in > > You will only be able to run an executable on the FPGA if you specified a BSP. -2. Compile the design using the generated `Makefile`. The following four build targets are provided that match the recommended development flow: - - * Compile and run for emulation (fast compile time, targets emulates an FPGA device) using: - - ```bash - make fpga_emu - ``` - - * Generate HTML optimization reports using: - - ```bash - make report - ``` - - * Compile for simulation (fast compile time, targets simulated FPGA device) - - ```bash - make fpga_sim - ``` - - * Compile and run on FPGA hardware (longer compile time, targets an FPGA device) using: - - ```bash - make fpga - ``` - -### On a Windows* System - -1. 
Generate the `Makefile` by running `cmake`. - - ```bat +3. Compile the design. (The provided targets match the recommended development flow.) + + 1. Compile and run for emulation (fast compile time, targets emulates an FPGA device). + ``` + make fpga_emu + ``` + 2. Generate the HTML optimization reports. (See [Read the Reports](#read-the-reports) below for information on finding and understanding the reports.) + ``` + make report + ``` + 3. Compile for simulation (fast compile time, targets simulated FPGA device). + ``` + make fpga_sim + ``` + 4. Compile and run on FPGA hardware (longer compile time, targets an FPGA device). + ``` + make fpga + ``` + +### On Windows* + +1. Change to the sample directory. +2. Build the program for the Intel® Agilex® 7 device family, which is the default. + ``` mkdir build cd build - ``` - - To compile for the default target (the Agilex® 7 device family), run `cmake` using the command: - ``` cmake -G "NMake Makefiles" .. ``` > **Note**: You can change the default target by using the command: @@ -211,94 +196,91 @@ Since the outer loop is pipelined and has a high trip count, the inner loop's in > > You will only be able to run an executable on the FPGA if you specified a BSP. -2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow: +3. Compile the design. (The provided targets match the recommended development flow.) + + 1. Compile for emulation (fast compile time, targets emulated FPGA device). + ``` + nmake fpga_emu + ``` + 2. Generate the optimization report. (See [Read the Reports](#read-the-reports) below for information on finding and understanding the reports.) + ``` + nmake report + ``` + 3. Compile for simulation (fast compile time, targets simulated FPGA device, reduced problem size). + ``` + nmake fpga_sim + ``` + 4. 
Compile for FPGA hardware (longer compile time, targets FPGA device): + ``` + nmake fpga + ``` +> **Note**: If you encounter any issues with long paths when compiling under Windows*, you may have to create your ‘build’ directory in a shorter path, for example c:\samples\build. You can then run cmake from that directory, and provide cmake with the full path to your sample directory. + +### Read the Reports - * Compile for emulation (fast compile time, targets emulated FPGA device): - - ``` - nmake fpga_emu - ``` - - * Generate the optimization report: - - ``` - nmake report - ``` - - * Compile for simulation (fast compile time, targets simulated FPGA device, reduced problem size): +Locate the pair of `report.html` files in either: - ``` - nmake fpga_sim - ``` +* **Report-only compile**: `fpga_reg_report.prj` and `fpga_reg_registered_report.prj` +* **FPGA hardware compile**: `fpga_reg.prj` and `fpga_reg_registered.prj` - * Compile for FPGA hardware (longer compile time, targets FPGA device): +Observe the structure of the design in the optimization report's System Viewer and notice the changes within `Cluster 2` of the `SimpleMath.B3` block when compiling with Intel Arria® 10 GX FPGA. In the report for Part 1, the viewer shows a much more shallow graph compared to the one in Part 2. This is because the operations are performed much closer to one another in Part 1 compared to Part 2. By transforming the code in Part 2, with more register stages, the compiler achieved a higher fMAX. - ``` - nmake fpga - ``` +>**Note**: Only the report generated after the FPGA hardware compile will reflect the performance benefit of using the `fpga_reg` extension. The difference is *not* apparent in the reports generated by `make report` because a design's fMAX cannot be predicted. The final achieved fMAX can be found in `fpga_reg.prj/reports/report.html` and `fpga_reg_registered.prj/reports/report.html` (after `make fpga` completes). 
-> **Note**: If you encounter any issues with long paths when -compiling under Windows*, you may have to create your ‘build’ directory in a -shorter path, for example c:\samples\build. You can then run cmake from that -directory, and provide cmake with the full path to your sample directory. -## Examining the Reports +## Run the `FPGA Reg` Sample -Locate the pair of `report.html` files in either: +### On Linux -* **Report-only compile**: `fpga_reg_report.prj` and `fpga_reg_registered_report.prj` -* **FPGA hardware compile**: `fpga_reg.prj` and `fpga_reg_registered.prj` +1. Run the sample on the FPGA emulator (the kernel executes on the CPU). + ``` + ./fpga_reg.fpga_emu + ``` -Open the reports in Chrome*, Firefox*, Edge*, or Internet Explorer*. Observe the structure of the design in the optimization report's System Viewer and notice the changes within `Cluster 2` of the `SimpleMath.B3` block when compiling with Intel Arria® 10 GX FPGA. In the report for Part 1, the viewer shows a much more shallow graph compared to the one in Part 2. This is because the operations are performed much closer to one another in Part 1 compared to Part 2. By transforming the code in Part 2, with more register stages, the compiler achieved a higher fMAX. +2. Run the sample on the FPGA simulator device. + ``` + CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./fpga_reg.fpga_sim + CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./fpga_reg_registered.fpga_sim + ``` ->**NOTE**: Only the report generated after the FPGA hardware compile will reflect the performance benefit of using the `fpga_reg` extension. The difference is *not* apparent in the reports generated by `make report` because a design's fMAX cannot be predicted. The final achieved fMAX can be found in `fpga_reg.prj/reports/report.html` and `fpga_reg_registered.prj/reports/report.html` (after `make fpga` completes). +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). 
+ ``` + ./fpga_reg.fpga + ./fpga_reg_registered.fpga + ``` -## Running the Sample +### On Windows -1. Run the sample on the FPGA emulator (the kernel executes on the CPU): +1. Run the sample on the FPGA emulator (the kernel executes on the CPU). + ``` + fpga_reg.fpga_emu.exe + ``` - ```bash - ./fpga_reg.fpga_emu (Linux) - fpga_reg.fpga_emu.exe (Windows) +2. Run the sample on the FPGA simulator device. + ``` + set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 + fpga_reg.fpga_sim.exe + fpga_reg_registered.fpga_sim.exe + set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -2. Run the sample on the FPGA simulator device - - * On Linux - ```bash - CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./fpga_reg.fpga_sim - CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./fpga_reg_registered.fpga_sim - ``` - * On Windows - ```bash - set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 - fpga_reg.fpga_sim.exe - fpga_reg_registered.fpga_sim.exe - set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= - ``` - -3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`): - - ```bash - ./fpga_reg.fpga (Linux) - ./fpga_reg_registered.fpga (Linux) - fpga_reg.fpga.exe (Windows) - fpga_reg_registered.fpga.exe (Windows) +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). + ``` + fpga_reg.fpga.exe + fpga_reg_registered.fpga.exe ``` -### Example of Output +## Example Output -```txt +``` Throughput for kernel with input size 1000000 and coefficient array size 64: 2.819272 GFlops PASSED: Results are correct. ``` -### Discussion of Results - You will be able to observe the improvement in the throughput going from Part 1 to Part 2. You will also note that the fMAX of Part 2 is significantly larger than of Part 1. ## License Code samples are licensed under the MIT license. See [License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details. 
-Third-party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
+Third-party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
\ No newline at end of file
diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/fpga_reg.png b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/assets/fpga_reg.png
similarity index 100%
rename from DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/fpga_reg.png
rename to DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/assets/fpga_reg.png
diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/no_fpga_reg.png b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/assets/no_fpga_reg.png
similarity index 100%
rename from DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/no_fpga_reg.png
rename to DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/assets/no_fpga_reg.png

From b83ad72667d4ce1b6d79dba06aeeeb1eecc49ad1 Mon Sep 17 00:00:00 2001
From: jkinsky <106110367+jkinsky@users.noreply.github.com>
Date: Wed, 3 May 2023 08:16:38 -0500
Subject: [PATCH 2/5] Update DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md

Co-authored-by: yuguen-intel
---
 .../C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md
index 3cda1cbad6..0d92ab708a 100755
--- a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md
@@ -1,4 +1,4 @@
-# `FPGA Reg` Sample
+# `fpga_reg` Sample
 
 This sample is an FPGA tutorial that demonstrates how a power user can apply the SYCL*-compliant Explicit Pipeline Register Insertion C++ extension, `fpga_reg``ext::intel::fpga_reg`, to tweak the hardware generated by the compiler.
 

From 2317793c4f6ecf465f1fb08714614c6af07c63ca Mon Sep 17 00:00:00 2001
From: jkinsky <106110367+jkinsky@users.noreply.github.com>
Date: Wed, 3 May 2023 08:16:53 -0500
Subject: [PATCH 3/5] Update DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md

Co-authored-by: yuguen-intel
---
 .../C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md
index 0d92ab708a..40775d3699 100755
--- a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md
@@ -1,6 +1,6 @@
 # `fpga_reg` Sample
 
-This sample is an FPGA tutorial that demonstrates how a power user can apply the SYCL*-compliant Explicit Pipeline Register Insertion C++ extension, `fpga_reg``ext::intel::fpga_reg`, to tweak the hardware generated by the compiler.
+This sample is an FPGA tutorial that demonstrates how a power user can apply the SYCL*-compliant Explicit Pipeline Register Insertion C++ extension, `ext::intel::fpga_reg`, to tweak the hardware generated by the compiler.
 
 > **Note**: **This is an advanced tutorial for FPGA power users.**
 

From 2a0e80a7546014b306d369bce37ad6c56bc93a60 Mon Sep 17 00:00:00 2001
From: jkinsky <106110367+jkinsky@users.noreply.github.com>
Date: Wed, 3 May 2023 08:17:04 -0500
Subject: [PATCH 4/5] Update DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md

Co-authored-by: yuguen-intel
---
 .../C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md
index 40775d3699..5878460b19 100755
--- a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md
@@ -120,7 +120,7 @@ In this part, we added two sets of `ext::intel::fpga_reg` within the unrolled lo
 In this version, the adder tree has been transformed into a vine-like structure. This increases latency, but it helps us achieve our goal of reducing the fanout and improving fMAX.
 Since the outer loop is pipelined and has a high trip count, the inner loop's increased latency has a negligible impact on throughput. The tradeoff pays off, as the fMAX improvement yields a higher performing design.
 
-## Build the `FPGA Reg` Tutorial
+## Build the `fpga_reg` Tutorial
 
 >**Note**: When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the `setvars` script in the root of your oneAPI installation every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.
 >

From e37174b7e99e781b0ae93cdcd725c9d7efd735d5 Mon Sep 17 00:00:00 2001
From: jkinsky <106110367+jkinsky@users.noreply.github.com>
Date: Wed, 3 May 2023 08:17:19 -0500
Subject: [PATCH 5/5] Update DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md

Co-authored-by: yuguen-intel
---
 .../C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md
index 5878460b19..b6c6abfff1 100755
--- a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/fpga_reg/README.md
@@ -228,7 +228,7 @@ Observe the structure of the design in the optimization report's System Viewer a
 
 >**Note**: Only the report generated after the FPGA hardware compile will reflect the performance benefit of using the `fpga_reg` extension. The difference is *not* apparent in the reports generated by `make report` because a design's fMAX cannot be predicted. The final achieved fMAX can be found in `fpga_reg.prj/reports/report.html` and `fpga_reg_registered.prj/reports/report.html` (after `make fpga` completes).
 
-## Run the `FPGA Reg` Sample
+## Run the `fpga_reg` Sample
 
 ### On Linux
 