diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/optimization_targets/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/optimization_targets/README.md
index 8b6eab82a1..e7cdd0984e 100644
--- a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/optimization_targets/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/optimization_targets/README.md
@@ -1,13 +1,36 @@
# `Optimization Targets` Sample
+This sample is an FPGA tutorial that demonstrates how to set optimization targets for your compile to target different performance metrics.
+
+This tutorial shows how to compile with the minimum latency optimization target to achieve low latency at the cost of reduced fMAX.
+
+| Area | Description
+|:--- |:---
+| What you will learn | How to set optimization targets for your compile. How to use the minimum latency optimization target to compile low-latency designs. How to manually override underlying controls set by the minimum latency optimization target.
+| Time to complete | 20 minutes
+| Category | Concepts and Functionality
+
+## Purpose
+
This FPGA tutorial demonstrates how to set optimization targets for your compile to target different performance metrics.
-As an example, this tutorial shows compiling with the minimum latency optimization target to achieve low latency at the cost of reduced fMAX .
+The `-Xsoptimize=` command-line option sets optimization targets, and it supports the following flag:
+
+| Flag | Explanation | Documentation
+|:--- |:--- |:---
+|`latency` | Minimum latency | [*Minimum Latency Flow*](https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/optimization-guide/current/minimum-latency-flow.html) topic in the *FPGA Optimization Guide for Intel® oneAPI Toolkits Developer Guide*.
-| Area | Description
-|:--- |:---
-| What you will learn | How to set optimization targets for your compileHow to use the minimum latency optimization target to compile low-latency designs How to manually override underlying controls set by the minimum latency optimization target
-| Time to complete | 20 minutes
+To compile your design with the minimum latency optimization target, use the flag option `-Xsoptimize=latency`.
+
+As an example, this tutorial shows how to use the minimum latency optimization target to compile low-latency designs and how to manually override underlying controls set by the minimum latency optimization target. By default, the minimum latency optimization target tries to achieve lower latency at the cost of decreased fMAX, so it is a good starting point for optimizing latency-sensitive designs.
+
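A minimal dry-run sketch of where the flag might sit on a compile command line. Only `-Xsoptimize=latency` comes from this tutorial. The `icpx -fsycl -fintelfpga -Xshardware` invocation, the source path `src/optimization_targets.cpp`, and the helper name `compile_cmd` are assumptions for illustration. The helper only prints the command; the build targets described later drive the actual compile.

```shell
# Hypothetical helper: prints (does not run) a hardware compile command.
# Pass "latency" to add the minimum latency optimization target.
compile_cmd() {
  target="$1"   # "" for the default flow, "latency" for minimum latency
  out="$2"      # output executable name
  opt=""
  if [ -n "$target" ]; then
    opt=" -Xsoptimize=$target"
  fi
  # The source path is an assumed placeholder, not taken from the sample.
  echo "icpx -fsycl -fintelfpga -Xshardware$opt src/optimization_targets.cpp -o $out"
}

compile_cmd "" no_control.fpga
compile_cmd latency minimum_latency.fpga
```

The two printed commands differ only by the `-Xsoptimize=latency` option.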
+## Prerequisites
+
+| Optimized for | Description
+|:--- |:---
+| OS | Ubuntu* 18.04/20.04 RHEL*/CentOS* 8 SUSE* 15 Windows* 10
+| Hardware | Intel® Agilex® 7, Arria® 10, and Stratix® 10 FPGAs
+| Software | Intel® oneAPI DPC++/C++ Compiler
> **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles.
>
@@ -17,16 +40,8 @@ As an example, this tutorial shows compiling with the minimum latency optimizati
> - ModelSim® SE
>
> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
->
-> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
-
-## Prerequisites
-| Optimized for | Description
-|:--- |:---
-| OS | Ubuntu* 18.04/20.04 RHEL*/CentOS* 8 SUSE* 15 Windows* 10
-| Hardware | Intel® Agilex® 7, Arria® 10, and Stratix® 10 FPGAs
-| Software | Intel® oneAPI DPC++/C++ Compiler
+> **Warning**: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
This sample is part of the FPGA code samples.
It is categorized as a Tier 3 sample that demonstrates a compiler feature.
@@ -47,43 +62,30 @@ flowchart LR
```
Find more information about how to navigate this part of the code samples in the [FPGA top-level README.md](/DirectProgramming/C++SYCL_FPGA/README.md).
-You can also find more information about [troubleshooting build errors](/DirectProgramming/C++SYCL_FPGA/README.md#troubleshooting), [running the sample on the Intel® DevCloud](/DirectProgramming/C++SYCL_FPGA/README.md#build-and-run-the-samples-on-intel-devcloud-optional), [using Visual Studio Code with the code samples](/DirectProgramming/C++SYCL_FPGA/README.md#use-visual-studio-code-vs-code-optional), [links to selected documentation](/DirectProgramming/C++SYCL_FPGA/README.md#documentation), etc.
+You can also find more information about [troubleshooting build errors](/DirectProgramming/C++SYCL_FPGA/README.md#troubleshooting), [running the sample on the Intel® DevCloud](/DirectProgramming/C++SYCL_FPGA/README.md#build-and-run-the-samples-on-intel-devcloud-optional), [using Visual Studio Code with the code samples](/DirectProgramming/C++SYCL_FPGA/README.md#use-visual-studio-code-vs-code-optional), [links to selected documentation](/DirectProgramming/C++SYCL_FPGA/README.md#documentation), and more.
-## Purpose
-
-This FPGA tutorial demonstrates how to set optimization targets for your compile to target different performance metrics.
-
-The `-Xsoptimize=` command-line flag sets optimization targets. It has the following options:
-| Option | Explanation | Documentation
-| ------- | -------------- | -------------
-|`latency`| Minimum latency| [Minimum Latency Flow](https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/optimization-guide/current/minimum-latency-flow.html)
+## Key Implementation Details
-As an example, this tutorial shows how to use the minimum latency optimization target to compile low-latency designs and how to manually override underlying controls set by the minimum latency optimization target. By default, the minimum latency optimization target tries to achieve lower latency at the cost of decreased fMAX , so it is a good starting point for optimizing latency-sensitive designs.
+The sample illustrates the following important concepts.
-To compile your design with the minimum latency optimization target, use the flag option `-Xsoptimize=latency`.
+- Setting optimization targets to use when compiling your program.
+- Using the minimum latency optimization target to compile low-latency designs.
+- Manually overriding underlying controls set by the minimum latency optimization target.
### Understanding the Tutorial Design
-The basic function performed by the tutorial kernel is an RGB to grayscale algorithm. To see the impact of the minimum latency optimization target in this tutorial in terms of latency and fMAX , and also see how to override underlying controls set by the minimum latency optimization target with specific manual controls, the design needs to be compiled three times.
-
-Part 1 compiles the design without the `-Xsoptimize=latency` flag. In this default flow, the compiler targets higher throughput and fMAX with the sacrifice of latency and area.
-
-Part 2 compiles the design with the `-Xsoptimize=latency` flag, so the minimum latency optimization target is used in this compile, which lowers latency by trading off fMAX .
+The basic function performed by the tutorial kernel is an RGB to grayscale algorithm. The design is compiled three times to show the impact of the minimum latency optimization target on latency and fMAX, and to show how to override the underlying controls set by the minimum latency optimization target with specific manual controls.
-Part 3 also compiles the design with the minimum latency optimization target, as well as manual controls that revert default underlying controls set by the minimum latency optimization target. Therefore, latency and fMAX of this compile are the same as part 1.
+- Part 1 compiles the design without the `-Xsoptimize=latency` flag. In this default flow, the compiler targets higher throughput and fMAX at the expense of latency and area.
-## Key Concepts
+- Part 2 compiles the design with the `-Xsoptimize=latency` flag, so the minimum latency optimization target is used in this compile, which lowers latency by trading off fMAX.
-* How to set optimization targets for your compile
-* How to use the minimum latency optimization target to compile low-latency designs
-* How to manually override underlying controls set by the minimum latency optimization target
+- Part 3 compiles the design with the minimum latency optimization target and adds manual controls that revert the default underlying controls set by that target. Therefore, the latency and fMAX of this compile are the same as in part 1.
-## Building the `optimization_targets` Tutorial
+## Build the `Optimization Targets` Tutorial
-> **Note**: When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables.
-> Set up your CLI environment by sourcing the `setvars` script located in the root of your oneAPI installation every time you open a new terminal window.
-> This practice ensures that your compiler, libraries, and tools are ready for development.
+>**Note**: When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the `setvars` script in the root of your oneAPI installation every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.
>
> Linux*:
> - For system wide installations: `. /opt/intel/oneapi/setvars.sh`
@@ -94,20 +96,17 @@ Part 3 also compiles the design with the minimum latency optimization target, as
> - `C:\Program Files(x86)\Intel\oneAPI\setvars.bat`
> - Windows PowerShell*, use the following command: `cmd.exe "/K" '"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" && powershell'`
>
-> For more information on configuring environment variables, see [Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html) or [Use the setvars Script with Windows*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-windows.html).
+> For more information on configuring environment variables, see *[Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html)* or *[Use the setvars Script with Windows*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-windows.html)*.
### On Linux*
-1. Generate the `Makefile` by running `cmake`:
+1. Change to the sample directory.
+2. Build the program for the Intel® Agilex® 7 device family, which is the default.
```
mkdir build
cd build
- ```
- To compile for the default target (the Agilex® 7 device family), run `cmake` using the command:
- ```
cmake ..
```
-
> **Note**: You can change the default target by using the command:
> ```
> cmake .. -DFPGA_DEVICE=
@@ -120,42 +119,32 @@ Part 3 also compiles the design with the minimum latency optimization target, as
>
> You will only be able to run an executable on the FPGA if you specified a BSP.
-2. Compile the design using the generated `Makefile`. The following build targets are provided, matching the recommended development flow:
-
- * Compile for emulation (fast compile time, targets emulated FPGA device):
-
- ```bash
- make fpga_emu
- ```
-
- * Generate the optimization reports:
-
- ```bash
- make report
- ```
-
- * Compile for simulation (fast compile time, targets simulated FPGA device, reduced data size):
-
- ```bash
- make fpga_sim
- ```
-
- * Compile for FPGA hardware (longer compile time, targets FPGA device):
-
- ```bash
- make fpga
- ```
+3. Compile the design. (The provided targets match the recommended development flow.)
+
+ 1. Compile and run for emulation (fast compile time, targets emulated FPGA device).
+ ```
+ make fpga_emu
+ ```
+ 2. Generate the HTML optimization reports. (See [Read the Reports](#read-the-reports) below for information on finding and understanding the reports.)
+ ```
+ make report
+ ```
+ 3. Compile for simulation (fast compile time, targets simulated FPGA device).
+ ```
+ make fpga_sim
+ ```
+ 4. Compile and run on FPGA hardware (longer compile time, targets an FPGA device).
+ ```
+ make fpga
+ ```
### On Windows*
-1. Generate the `Makefile` by running `cmake`.
-
+1. Change to the sample directory.
+2. Build the program for the Intel® Agilex® 7 device family, which is the default.
```
mkdir build
cd build
- ```
- To compile for the default target (the Agilex® 7 device family), run `cmake` using the command:
- ```
cmake -G "NMake Makefiles" ..
```
> **Note**: You can change the default target by using the command:
@@ -170,106 +159,116 @@ Part 3 also compiles the design with the minimum latency optimization target, as
>
> You will only be able to run an executable on the FPGA if you specified a BSP.
-2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow:
-
- * Compile for emulation (fast compile time, targets emulated FPGA device):
- ```
- nmake fpga_emu
- ```
- * Generate the optimization reports:
- ```
- nmake report
- ```
- * Compile for simulation (fast compile time, targets simulated FPGA device, reduced data size):
- ```
- nmake fpga_sim
- ``
- * Compile for FPGA hardware (longer compile time, targets FPGA device):
- ```
- nmake fpga
- ```
+3. Compile the design. (The provided targets match the recommended development flow.)
+
+ 1. Compile for emulation (fast compile time, targets emulated FPGA device).
+ ```
+ nmake fpga_emu
+ ```
+ 2. Generate the optimization report. (See [Read the Reports](#read-the-reports) below for information on finding and understanding the reports.)
+ ```
+ nmake report
+ ```
+ 3. Compile for simulation (fast compile time, targets simulated FPGA device, reduced problem size).
+ ```
+ nmake fpga_sim
+ ```
+ 4. Compile for FPGA hardware (longer compile time, targets FPGA device).
+ ```
+ nmake fpga
+ ```
> **Note**: If you encounter any issues with long paths when compiling under Windows*, you may have to create your ‘build’ directory in a shorter path, for example c:\samples\build. You can then run cmake from that directory, and provide cmake with the full path to your sample directory.
-### Examining the Reports
-
-Locate the pair of `report.html` files in either:
+### Read the Reports
-* **Report-only compile**: `no_control_report.prj`, `minimum_latency_report.prj`, and `manual_revert_report.prj`
-* **FPGA hardware compile**: `no_control.fpga.prj`, `minimum_latency.fpga.prj`, and `manual_revert.fpga.prj`
+Locate the `report.html` files in the following locations (depending on the compile path that you selected):
-Open the reports in Chrome*, Firefox*, Edge*, or Internet Explorer*.
+- **Report-only compile**: `no_control_report.prj`, `minimum_latency_report.prj`, and `manual_revert_report.prj`
+- **FPGA hardware compile**: `no_control.fpga.prj`, `minimum_latency.fpga.prj`, and `manual_revert.fpga.prj`
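As a rough sketch, the directory names above can be turned into the full report paths for all three compiles. The `reports/report.html` sub-path is the one this README gives for the `.fpga.prj` directories; applying it to the `_report.prj` directories is an assumption, and `report_paths` is a hypothetical helper name.

```shell
# Print the expected report.html path for each of the three compiles.
# flow: "report" for the report-only compile, "fpga" for the hardware compile.
report_paths() {
  flow="$1"
  for design in no_control minimum_latency manual_revert; do
    if [ "$flow" = "fpga" ]; then
      dir="$design.fpga.prj"
    else
      dir="${design}_report.prj"
    fi
    # reports/report.html under the report-only directories is an assumption.
    echo "$dir/reports/report.html"
  done
}

report_paths report
```

Run it from the `build` directory and open each printed path in a browser.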
Navigate to **Loop Analysis** (**Throughput Analysis > Loop Analysis**). In this viewer, you can find the latency of loops in the kernel. The latency of the compile with the minimum latency optimization target (part 2) should be lower than the other two compiles. Also, the latency of the other two compiles (part 1 & 3) should be the same.
Navigate to **Clock Frequency Summary** (**Summary > Clock Frequency Summary**) in `no_control.fpga.prj/reports/report.html`, `minimum_latency.fpga.prj/reports/report.html`, and `manual_revert.fpga.prj/reports/report.html` (after `make fpga` completes). In this table, you can find the actual fMAX . The fMAX of the compile with the minimum latency optimization target (part 2) should be lower than the other two compiles. Also, the fMAX of the other two compiles (part 1 & 3) should be the same. Note that only the report generated by the FPGA hardware compile will reflect the true fMAX affected by the minimum latency optimization target. The difference is **not** apparent in the reports generated by `make report` because a design's fMAX cannot be predicted.
-## Running the Sample
+## Run the `Optimization Targets` Sample
-1. Run the sample on the FPGA emulator (the kernel executes on the CPU):
+### On Linux
- ```bash
- ./no_control.fpga_emu (Linux)
- no_control.fpga_emu.exe (Windows)
+1. Run the sample on the FPGA emulator (the kernel executes on the CPU).
+ ```
+ ./no_control.fpga_emu
```
-2. Run the sample on the FPGA simulator device:
-
- * On Linux
- ```bash
- CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./no_control.fpga_sim
- CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./minimum_latency.fpga_sim
- CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./manual_revert.fpga_sim
- ```
- * On Windows
- ```bash
- set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1
- no_control.fpga_sim.exe
- minimum_latency.fpga_sim.exe
- manual_revert.fpga_sim.exe
- set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=
- ```
-
-3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`):
-
- ```bash
- ./no_control.fpga (Linux)
- ./minimum_latency.fpga (Linux)
- ./manual_revert.fpga (Linux)
- no_control.fpga.exe (Windows)
- minimum_latency.fpga.exe (Windows)
- manual_revert.fpga.exe (Windows)
+2. Run the sample on the FPGA simulator device.
+ ```
+ CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./no_control.fpga_sim
+ CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./minimum_latency.fpga_sim
+ CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./manual_revert.fpga_sim
+ ```
+
+3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
+ ```
+ ./no_control.fpga
+ ./minimum_latency.fpga
+ ./manual_revert.fpga
+ ```
+
+### On Windows
+
+1. Run the sample on the FPGA emulator (the kernel executes on the CPU).
+ ```
+ no_control.fpga_emu.exe
+ ```
+
+2. Run the sample on the FPGA simulator device.
+ ```
+ set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1
+ no_control.fpga_sim.exe
+ minimum_latency.fpga_sim.exe
+ manual_revert.fpga_sim.exe
+ set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=
+ ```
+
+3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
+ ```
+ no_control.fpga.exe
+ minimum_latency.fpga.exe
+ manual_revert.fpga.exe
```
## Example Output
-Output of sample without minimum latency optimization target:
-```txt
+Example output without minimum latency optimization target:
+
+```
Kernel Throughput: 195.716MB/s
Exec Time: 1.9491e-05s, InputMB: 0.0038147MB
PASSED: all kernel results are correct
```
-Output of sample with minimum latency optimization target:
-```txt
+Example output with minimum latency optimization target:
+
+```
Kernel Throughput: 137.764MB/s
Exec Time: 2.769e-05s, InputMB: 0.0038147MB
PASSED: all kernel results are correct
```
-Output of sample with minimum latency optimization target but controls manually reverted:
-```txt
+Example output with minimum latency optimization target but controls manually reverted:
+
+```
Kernel Throughput: 192.934MB/s
Exec Time: 1.9772e-05s, InputMB: 0.0038147MB
PASSED: all kernel results are correct
```
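The figures in the three outputs are consistent with throughput being the input size divided by the execution time. That relationship is inferred from the numbers above rather than stated by the sample, and `throughput_mbps` is a hypothetical helper name. A small sketch to check it:

```shell
# Throughput in MB/s = input size in MB / execution time in seconds.
throughput_mbps() {
  awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb / s }'
}

throughput_mbps 0.0038147 1.9491e-05   # no optimization target
throughput_mbps 0.0038147 2.769e-05    # minimum latency optimization target
throughput_mbps 0.0038147 1.9772e-05   # controls manually reverted
```

Rounded to one decimal place, the three calls print 195.7, 137.8, and 192.9, which agree with the reported throughputs.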
-### Discussion of Results
+Compared to the Intel® Arria® 10 GX FPGA, the effect is more notable on the Intel® Stratix® 10 SX FPGA, where the minimum latency optimization target significantly reduces the latency, along with the fMAX and the throughput. That is because the minimum latency optimization target disables the hyper-optimized handshaking, which achieves higher fMAX at the cost of increased latency.
-Comparing to Intel Arria® 10 GX FPGA, it is more notable on Intel Stratix® 10 SX FPGA that the minimum latency optimization target significantly reduces the latency, along with the fMAX and the throughput. That is because the minimum latency optimization target disables the hyper-optimized handshaking, which achieves higher fMAX at the cost of increased latency. For more information on the hyper-optimized handshaking protocol on Intel Stratix® 10 and Intel Agilex® 7 devices, see [Modify the Handshaking Protocol Between Clusters (-Xshyper-optimized-handshaking)](https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/optimization-guide/current/hyper-opt-handshaking.html).
+> **Note**: For more information on the hyper-optimized handshaking protocol on Intel® Stratix® 10 and Intel® Agilex® 7 devices, see the [*Modify the Handshaking Protocol Between Clusters (-Xshyper-optimized-handshaking)*](https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/optimization-guide/current/hyper-opt-handshaking.html) topic in the *FPGA Optimization Guide for Intel® oneAPI Toolkits Developer Guide*.
## License
Code samples are licensed under the MIT license. See [License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.
-Third-party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
+Third-party program licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
\ No newline at end of file