diff --git a/DirectProgramming/C++SYCL_FPGA/README.md b/DirectProgramming/C++SYCL_FPGA/README.md
index 1c09366556..d6be0de2b7 100644
--- a/DirectProgramming/C++SYCL_FPGA/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/README.md
@@ -269,8 +269,6 @@ qsub -I -l nodes=1:fpga_runtime:ppn=2 -d .
Only `fpga_compile` nodes support compiling to FPGA. When compiling for FPGA hardware, increase the job timeout to 24 hours.
-Executing programs on FPGA hardware is only supported on `fpga_runtime` nodes of the appropriate type, such as `fpga_runtime:arria10` or `fpga_runtime:stratix10`.
-
Neither compiling nor executing programs on FPGA hardware are supported on the login nodes. For more information, see the [Intel® oneAPI Base Toolkit Get Started Guide](https://devcloud.intel.com/oneapi/documentation/base-toolkit/).
>**Note**: Since Intel® DevCloud for oneAPI includes the appropriate development environment already configured for you, you do not need to set environment variables.
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/README.md
index 4ab42594ac..d0c85dfb36 100755
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/README.md
@@ -37,7 +37,7 @@ You can also find more information about [troubleshooting build errors](/DirectP
| Optimized for | Description
|:--- |:---
| OS | Ubuntu* 18.04/20.04 <br> RHEL*/CentOS* 8 <br> SUSE* 15 <br> Windows* 10
-| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA <br> FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) <br> Intel Xeon® CPU E5-1650 v2 @ 3.50GHz (host machine)
+| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs
| Software | Intel® oneAPI DPC++/C++ Compiler
> **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles.
@@ -48,6 +48,8 @@ You can also find more information about [troubleshooting build errors](/DirectP
> - ModelSim® SE
>
> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
+>
+> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
## Key Implementation Details
@@ -149,17 +151,26 @@ The design uses the following generic header files.
### On Linux*
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default.
+2. Configure the build system for the Agilex™ device family, which is the default.
```
mkdir build
cd build
cmake ..
```
- For **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
+
3. Compile the design. (The provided targets match the recommended development flow.)
1. Compile for emulation (fast compile time, targets emulated FPGA device).
@@ -181,23 +192,27 @@ The design uses the following generic header files.
make fpga
```
- (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/anr.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/anr.fpga.tar.gz).
-
### On Windows*
->**Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support.
-
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default.
+2. Configure the build system for the Agilex™ device family, which is the default.
```
mkdir build
cd build
cmake -G "NMake Makefiles" ..
```
- To compile for the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
3. Compile the design. (The provided targets match the recommended development flow.)
@@ -229,11 +244,11 @@ The design uses the following generic header files.
```
./anr.fpga_emu
```
-2. Run the sample on the FPGA simulator device:
+2. Run the sample on the FPGA simulator device.
```
CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./anr.fpga_sim
```
-3. Alternatively, run the sample on the FPGA device.
+3. Alternatively, run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
./anr.fpga
```
@@ -244,13 +259,13 @@ The design uses the following generic header files.
```
anr.fpga_emu.exe
```
-2. Run the sample on the FPGA simulator device:
+2. Run the sample on the FPGA simulator device.
```
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1
anr.fpga_sim.exe
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=
```
-3. Alternatively, run the sample on the FPGA device.
+3. Alternatively, run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
anr.fpga.exe
```
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/CMakeLists.txt
index d6ee2236af..c7ef09ab18 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/CMakeLists.txt
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/CMakeLists.txt
@@ -6,12 +6,36 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga)
# FPGA board selection
if(NOT DEFINED FPGA_DEVICE)
- set(FPGA_DEVICE "intel_a10gx_pac:pac_a10")
+ set(FPGA_DEVICE "Agilex")
+ set(DEVICE_FLAG "Agilex")
message(STATUS "FPGA_DEVICE was not specified.\
- \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). \
- \nPlease refer to the README for information on board selection.")
+ \nConfiguring the design for the default FPGA family: ${FPGA_DEVICE}\
+ \nPlease refer to the README for information on target selection.")
+
+ set(BSP_FLAG "")
else()
- message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}")
+ string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME)
+ if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*")
+ set(DEVICE_FLAG "A10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*")
+ set(DEVICE_FLAG "S10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*")
+ set(DEVICE_FLAG "Agilex")
+ elseif(NOT DEFINED DEVICE_FLAG)
+ message(FATAL_ERROR "An unrecognized or custom board was passed, but DEVICE_FLAG was not specified. \
+ Please make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or \
+ -DDEVICE_FLAG=Agilex.")
+ endif()
+ message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}")
+
+ # Check if the target is a BSP
+ if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*")
+ set(BSP_FLAG "-DIS_BSP")
+ else()
+ set(BSP_FLAG "")
+ message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so the IS_BSP macro will not be passed to your C++ code.")
+ message(STATUS "If the target is actually a BSP, run cmake with -DIS_BSP=1 to pass the IS_BSP macro to your C++ code.")
+ endif()
endif()
# These are Windows-specific flags:
@@ -46,11 +70,11 @@ endif()
# e.g. cmake .. -DSEED=7
if(NOT DEFINED SEED)
# the default seed
- if(FPGA_DEVICE MATCHES ".*a10.*")
+ if(DEVICE_FLAG MATCHES "A10")
set(SEED 1)
- elseif(FPGA_DEVICE MATCHES ".*s10.*")
+ elseif(DEVICE_FLAG MATCHES "S10")
set(SEED 2)
- elseif(FPGA_DEVICE MATCHES ".*agilex.*")
+ elseif(DEVICE_FLAG MATCHES "Agilex")
set(SEED 3)
else()
set(SEED 4)
@@ -79,11 +103,11 @@ if(PIXELS_PER_CYCLE)
message(STATUS "PIXELS_PER_CYCLE explicitly set to ${PIXELS_PER_CYCLE}")
else()
# Default PIXELS_PER_CYCLE based on the board being used
- if(FPGA_DEVICE MATCHES ".*a10.*")
+ if(DEVICE_FLAG MATCHES "A10")
set(PIXELS_PER_CYCLE 2)
- elseif(FPGA_DEVICE MATCHES ".*s10.*")
+ elseif(DEVICE_FLAG MATCHES "S10")
set(PIXELS_PER_CYCLE 2)
- elseif(FPGA_DEVICE MATCHES ".*agilex.*")
+ elseif(DEVICE_FLAG MATCHES "Agilex")
set(PIXELS_PER_CYCLE 1)
else()
message(WARNING "Unknown board: setting PIXELS_PER_CYCLE to 1")
@@ -120,13 +144,13 @@ endif()
# 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V).
# 2. The "link" stage invokes the compiler's FPGA backend before linking.
# For this reason, FPGA backend flags must be passed as link flags in CMake.
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} -DFPGA_EMULATOR")
-set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${AC_TYPES_FLAG} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG}")
-set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} -Xssimulation -DFPGA_SIMULATOR")
-set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xsghdl -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS}")
-set(REPORT_LINK_FLAGS "-fsycl -fintelfpga -Xshardware ${PROFILE_FLAG} ${FLAT_COMPILE_FLAG} -Xsparallel=2 ${SEED_FLAG} -Xstarget=${FPGA_DEVICE} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} ${IP_MODE_FLAG} ${USER_HARDWARE_FLAGS}")
-set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} -DFPGA_HARDWARE")
-set(HARDWARE_LINK_FLAGS "${REPORT_LINK_FLAGS} ${AC_TYPES_FLAG}")
+set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} -DFPGA_EMULATOR ${BSP_FLAG}")
+set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${AC_TYPES_FLAG} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} ${BSP_FLAG}")
+set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} -Xssimulation -DFPGA_SIMULATOR ${BSP_FLAG}")
+set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xsghdl -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${AC_TYPES_FLAG} ${BSP_FLAG}")
+set(REPORT_LINK_FLAGS "-fsycl -fintelfpga -Xshardware ${PROFILE_FLAG} ${FLAT_COMPILE_FLAG} -Xsparallel=2 ${SEED_FLAG} -Xstarget=${FPGA_DEVICE} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} ${USER_HARDWARE_FLAGS} ${BSP_FLAG}")
+set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} -DFPGA_HARDWARE ${BSP_FLAG}")
+set(HARDWARE_LINK_FLAGS "${REPORT_LINK_FLAGS} ${AC_TYPES_FLAG} ${BSP_FLAG}")
# use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation
###############################################################################
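To make the target-selection rules in the anr `CMakeLists.txt` easier to reason about, here is a minimal C++ re-expression of them. It is an illustrative sketch, not part of the sample: `ToLower`, `Contains`, `ClassifyFamily`, and `IsBsp` are hypothetical names, and it assumes that CMake's `MATCHES` on these literal patterns (`.*a10.*`, `.*pac_s10.*`, `"1"`, and so on) behaves as an unanchored substring search.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper: lower-cases a copy of the target string,
// mirroring string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME).
std::string ToLower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Unanchored substring test, equivalent to MATCHES ".*sub.*".
bool Contains(const std::string& s, const std::string& sub) {
  return s.find(sub) != std::string::npos;
}

// Returns "A10", "S10", or "Agilex", or std::nullopt when the name is not
// recognized. Checks run in the same order as the if/elseif chain.
// Note: "arria10" already contains "a10", so that alternative never changes
// the result; "stratix10" does not contain "s10", so it is needed.
std::optional<std::string> ClassifyFamily(const std::string& fpga_device) {
  const std::string name = ToLower(fpga_device);
  if (Contains(name, "a10") || Contains(name, "arria10")) return "A10";
  if (Contains(name, "s10") || Contains(name, "stratix10")) return "S10";
  if (Contains(name, "agilex")) return "Agilex";
  return std::nullopt;
}

// Mirrors: if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*")
// is_bsp_value is the value given with -DIS_BSP=..., or "" when unset.
// The FPGA_DEVICE test uses the original (not lower-cased) string.
bool IsBsp(const std::string& is_bsp_value, const std::string& fpga_device) {
  return Contains(is_bsp_value, "1") || Contains(fpga_device, "pac_a10") ||
         Contains(fpga_device, "pac_s10");
}
```

For example, `intel_s10sx_pac:pac_s10` classifies as `S10` and is treated as a BSP by name, whereas a custom BSP whose name does not contain `pac_a10` or `pac_s10` is only treated as a BSP when `-DIS_BSP=1` is passed.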
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/anr.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/anr.hpp
index 7ed3d2fe1d..5c0de80dcf 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/anr.hpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/anr.hpp
@@ -347,15 +347,7 @@ std::vector SubmitANRKernels(queue& q, int cols, int rows,
// submit the vertical kernel using a column stencil
auto vertical_kernel = q.single_task([=] {
// copy host side intensity sigma LUT to the device
- // For testing the kernel system as an IP and checking the area and Fmax,
- // we allow the user to turn off connections to device memory. In this case
- // (the DISABLE_DEVICE_MEM macro IS defined), the results will be incorrect
- // since there is no way to get the data to/from the device.
-#if defined(IP_MODE)
- IntensitySigmaLUT sig_i_lut;
-#else
IntensitySigmaLUT sig_i_lut(sig_i_lut_data_ptr);
-#endif
// build the constexpr exp() and inverse LUT ROMs
constexpr ExpLUT exp_lut;
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/dma_kernels.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/dma_kernels.hpp
index 905f13f2af..127980e148 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/dma_kernels.hpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/dma_kernels.hpp
@@ -22,11 +22,13 @@ template
event SubmitInputDMA(queue &q, T *in_ptr, int rows, int cols, int frames) {
using PipeType = DataBundle;
+#if defined (IS_BSP)
// LSU attribute to turn off caching
using NonCachingLSU =
ext::intel::lsu, ext::intel::cache<0>,
ext::intel::statically_coalesce,
ext::intel::prefetch>;
+#endif
// validate the number of columns
if ((cols % pixels_per_cycle) != 0) {
@@ -41,7 +43,12 @@ event SubmitInputDMA(queue &q, T *in_ptr, int rows, int cols, int frames) {
// Using device memory
return q.single_task([=]() [[intel::kernel_args_restrict]] {
+
+#if defined (IS_BSP)
device_ptr in(in_ptr);
+#else
+ T* in(in_ptr);
+#endif
// coalesce the following two loops into a single for-loop using the
// loop_coalesce attribute
@@ -51,7 +58,11 @@ event SubmitInputDMA(queue &q, T *in_ptr, int rows, int cols, int frames) {
PipeType pipe_data;
#pragma unroll
for (int k = 0; k < pixels_per_cycle; k++) {
+#if defined (IS_BSP)
pipe_data[k] = NonCachingLSU::load(in + i * pixels_per_cycle + k);
+#else
+ pipe_data[k] = in[i * pixels_per_cycle + k];
+#endif
}
Pipe::write(pipe_data);
}
@@ -77,7 +88,12 @@ event SubmitOutputDMA(queue &q, T *out_ptr, int rows, int cols, int frames) {
// Using device memory
return q.single_task([=]() [[intel::kernel_args_restrict]] {
+
+#if defined (IS_BSP)
device_ptr out(out_ptr);
+#else
+ T* out(out_ptr);
+#endif
// coalesce the following two loops into a single for-loop using the
// loop_coalesce attribute
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/intensity_sigma_lut.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/intensity_sigma_lut.hpp
index d35acf52d9..4367fcf5c5 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/intensity_sigma_lut.hpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/intensity_sigma_lut.hpp
@@ -16,6 +16,7 @@ class IntensitySigmaLUT {
// default constructor
IntensitySigmaLUT() {}
+#if defined (IS_BSP)
// construct from a device_ptr (for constructing from device memory)
IntensitySigmaLUT(device_ptr ptr) {
// use a pipelined LSU to load from device memory since we don't
@@ -25,6 +26,14 @@ class IntensitySigmaLUT {
data_[i] = PipelinedLSU::load(ptr + i);
}
}
+#else
+ // construct from a regular pointer
+ IntensitySigmaLUT(float* ptr) {
+ for (int i = 0; i < lut_depth; i++) {
+ data_[i] = ptr[i];
+ }
+ }
+#endif
// construct from the ANR parameters (actually builds the LUT)
IntensitySigmaLUT(ANRParams params) {
@@ -39,8 +48,12 @@ class IntensitySigmaLUT {
}
// helper static method to allocate enough memory to hold the LUT
- static float* AllocateDevice(sycl::queue& q) {
+ static float* Allocate(sycl::queue& q) {
+#if defined (IS_BSP)
float* ptr = sycl::malloc_device(lut_depth, q);
+#else
+ float* ptr = sycl::malloc_shared(lut_depth, q);
+#endif
if (ptr == nullptr) {
std::cerr << "ERROR: could not allocate space for 'ptr'\n";
std::terminate();
@@ -49,7 +62,7 @@ class IntensitySigmaLUT {
}
// helper method to copy the data to the device
- sycl::event CopyDataToDevice(sycl::queue& q, float* ptr) {
+ sycl::event CopyData(sycl::queue& q, float* ptr) {
return q.memcpy(ptr, data_, lut_depth * sizeof(float));
}
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/main.cpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/main.cpp
index ad9d8ae466..649744dd88 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/main.cpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/main.cpp
@@ -117,6 +117,7 @@ int main(int argc, char* argv[]) {
// create the output pixels (initialize to all 0s)
std::vector out_pixels(in_pixels.size(), 0);
+#if defined (IS_BSP)
// allocate memory on the device for the input and output
PixelT *in, *out;
if ((in = malloc_device(pixel_count, q)) == nullptr) {
@@ -127,18 +128,31 @@ int main(int argc, char* argv[]) {
std::cerr << "ERROR: could not allocate space for 'out'\n";
std::terminate();
}
+#else
+ // without a BSP, allocate shared USM memory for the input and output
+ PixelT *in, *out;
+ if ((in = malloc_shared(pixel_count, q)) == nullptr) {
+ std::cerr << "ERROR: could not allocate space for 'in'\n";
+ std::terminate();
+ }
+ if ((out = malloc_shared(pixel_count, q)) == nullptr) {
+ std::cerr << "ERROR: could not allocate space for 'out'\n";
+ std::terminate();
+ }
+#endif
+
// copy the input data to the device memory and wait for the copy to finish
q.memcpy(in, in_pixels.data(), pixel_count * sizeof(PixelT)).wait();
// allocate space for the intensity sigma LUT
- float* sig_i_lut_data_ptr = IntensitySigmaLUT::AllocateDevice(q);
+ float* sig_i_lut_data_ptr = IntensitySigmaLUT::Allocate(q);
// create the intensity sigma LUT data locally on the host
IntensitySigmaLUT sig_i_lut_host(params);
// copy the intensity sigma LUT to the device
- sig_i_lut_host.CopyDataToDevice(q, sig_i_lut_data_ptr).wait();
+ sig_i_lut_host.CopyData(q, sig_i_lut_data_ptr).wait();
//////////////////////////////////////////////////////////////////////////////
// track timing information in ms
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/board_test/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/board_test/README.md
index f74484a9f7..e61f97b305 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/board_test/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/board_test/README.md
@@ -40,18 +40,22 @@ You can also find more information about [troubleshooting build errors](/DirectP
| Optimized for | Description
|:--- |:---
| OS | Ubuntu* 18.04/20.04 <br> RHEL*/CentOS* 8 <br> SUSE* 15 <br> Windows* 10
-| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA) <br> Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX FPGA) <br> Intel® FPGA 3rd party / custom platforms with oneAPI support <br> **Note**: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04*
+| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs
| Software | Intel® oneAPI DPC++/C++ Compiler
-> **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles.
->
-> For using the simulator flow, Intel® Quartus® Prime Pro Edition and one of the following simulators must be installed and accessible through your PATH:
-> - Questa*-Intel® FPGA Edition
-> - Questa*-Intel® FPGA Starter Edition
-> - ModelSim® SE
+> **Note**: Even though the Intel® oneAPI DPC++/C++ Compiler is enough to compile for emulation, generate reports, and generate RTL, the simulation flow and FPGA compiles have extra software requirements.
+>
+> To use the simulator flow, Intel® Quartus® Prime Pro Edition and one of the following simulators must be installed and accessible through your PATH:
+> - Questa*-Intel® FPGA Edition
+> - Questa*-Intel® FPGA Starter Edition
+> - ModelSim® SE
+>
+> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
>
-> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
-
+> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
+
+> :warning: This sample benchmarks an FPGA board, so use it only when targeting an FPGA board with a BSP.
+
## Key Implementation Details
A oneAPI Board Support Package (BSP) consists of software layers and an FPGA hardware scaffold design, making it possible to target an FPGA through the Intel® oneAPI DPC++/C++ Compiler.
@@ -118,21 +122,26 @@ Performance results are based on testing as of Jan 31, 2022.
### On Linux*
1. Change to the sample directory.
-2. Configure the build system for **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, which is the default.
+2. Configure the build system for the Agilex™ device family, which is the default.
```
mkdir build
cd build
cmake ..
```
- For **Intel® PAC with Intel Arria® 10 GX FPGA**, enter the following:
- ```
- cmake -DFPGA_DEVICE=intel_a10gx_pac:pac_a10 ..
- ```
- You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system, and enter a command similar to the following example:
- ```
- cmake -DFPGA_DEVICE=: ..
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
+
3. Compile the design. (The provided targets match the recommended development flow.)
1. Compile for emulation (fast compile time, targets emulated FPGA device).
@@ -150,27 +159,28 @@ Performance results are based on testing as of Jan 31, 2022.
make fpga
```
- (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/board_test.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/board_test.fpga.tar.gz).
-
### On Windows*
->**Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support.
-
1. Change to the sample directory.
-2. Configure the build system for **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, which is the default.
+2. Configure the build system for the Agilex™ device family, which is the default.
```
mkdir build
cd build
cmake -G "NMake Makefiles" ..
```
- To compile for the **Intel® PAC with Intel Arria® 10 GX FPGA**, enter the following:
- ```
- cmake -G "NMake Makefiles" -DFPGA_DEVICE=intel_a10gx_pac:pac_a10 ..
- ```
- You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system, and enter a command similar to the following example:
- ```
- cmake -G "NMake Makefiles" -DFPGA_DEVICE=: ..
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
+
3. Compile the design. (The provided targets match the recommended development flow.)
1. Compile for emulation (fast compile time, targets emulated FPGA device).
@@ -229,7 +239,7 @@ To view test details and usage information using the binary, use the `-help` opt
```
./board_test.fpga_emu
```
- 2. Run the sample on the FPGA device.
+ 2. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
./board_test.fpga
```
@@ -247,6 +257,14 @@ To view test details and usage information using the binary, use the `-help` opt
```
board_test.exe -test=
```
+ 2. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
+ ```
+ board_test.fpga.exe
+ ```
+ By default the program runs all tests. To run a specific test, enter the test number as an argument to the `-test` option:
+ ```
+ board_test.fpga.exe -test=
+ ```
## Example Output
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/board_test/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/board_test/src/CMakeLists.txt
index e11d99f43b..01c6b46987 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/board_test/src/CMakeLists.txt
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/board_test/src/CMakeLists.txt
@@ -9,12 +9,17 @@ set(FPGA_EARLY_IMAGE ${TARGET_NAME}_report.a)
# FPGA board selection
if(NOT DEFINED FPGA_DEVICE)
- set(FPGA_DEVICE "intel_s10sx_pac:pac_s10")
+ set(FPGA_DEVICE "Agilex")
message(STATUS "FPGA_DEVICE was not specified.\
- \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Stratix(R) 10 SX FPGA). \
- \nPlease refer to the README for information on board selection.")
+ \nConfiguring the design for the default FPGA family: ${FPGA_DEVICE}\
+ \nPlease refer to the README for information on target selection.")
else()
- message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}")
+ message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}")
+endif()
+
+# Check if the target is a BSP
+if(NOT FPGA_DEVICE MATCHES ".*:.*")
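+ message(STATUS "This benchmarking sample is intended to target a BSP, but the selected target ${FPGA_DEVICE} does not specify one. You can only run the executable on FPGA hardware if you target a BSP.")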
endif()
# This is a Windows-specific flag that enables error handling in host code
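The board_test `CMakeLists.txt` uses a simpler BSP heuristic than the anr and cholesky scripts: any target that contains a colon (as in `intel_s10sx_pac:pac_s10`) is assumed to name a BSP and variant. A minimal C++ sketch of that check follows; `LooksLikeBspTarget` is a hypothetical name, and the sketch assumes `MATCHES ".*:.*"` is an unanchored search for a colon.

```cpp
#include <string>

// Hypothetical helper mirroring the condition in
// if(NOT FPGA_DEVICE MATCHES ".*:.*"): a BSP target is written as
// board-support-package:board-variant, so it contains a colon.
bool LooksLikeBspTarget(const std::string& fpga_device) {
  return fpga_device.find(':') != std::string::npos;
}
```

Unlike the anr and cholesky scripts, this check only prints a status message; it does not add `-DIS_BSP` to the compile flags.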
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/README.md
index f38d47df95..209172f2fe 100755
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/README.md
@@ -44,7 +44,7 @@ You can also find more information about [troubleshooting build errors](/DirectP
| Optimized for | Description
|:--- |:---
| OS | Ubuntu* 18.04/20.04 <br> RHEL*/CentOS* 8 <br> SUSE* 15 <br> Windows* 10
-| Hardware |Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA) <br> Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) <br> Intel Xeon® CPU E5-1650 v2 @ 3.50GHz (host machine)
+| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs
| Software | Intel® oneAPI DPC++/C++ Compiler
> **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles.
@@ -55,6 +55,8 @@ You can also find more information about [troubleshooting build errors](/DirectP
> - ModelSim® SE
>
> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
+>
+> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
### Performance
@@ -145,16 +147,26 @@ For `constexpr_math.hpp`, `memory_utils.hpp`, `metaprogramming_utils.hpp`, and `
### On Linux*
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX** FPGA, which is the default.
+2. Configure the build system for the Agilex™ device family, which is the default.
+
```
mkdir build
cd build
cmake ..
```
- For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following command instead:
- ```
- cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
+
3. Compile the design. (The provided targets match the recommended development flow.)
1. Compile for emulation (fast compile time, targets emulated FPGA device).
@@ -176,23 +188,28 @@ For `constexpr_math.hpp`, `memory_utils.hpp`, `metaprogramming_utils.hpp`, and `
make fpga
```
- (Optional) The hardware compile may take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/cholesky.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/cholesky.fpga.tar.gz).
-
### On Windows*
->**Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support.
-
1. Change to the sample directory.
-2. Configure the build system for the Intel® PAC with Intel Arria® 10 GX FPGA, which is the default.
+2. Configure the build system for the Agilex™ device family, which is the default.
```
mkdir build
cd build
cmake -G "NMake Makefiles" ..
```
- For the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), enter the following command instead:
- ```
- cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
+
3. Compile the design. (The provided targets match the recommended development flow.)
1. Compile for emulation (fast compile time, targets emulated FPGA device):
```
@@ -236,7 +253,7 @@ You can apply the Cholesky decomposition to a number of matrices, as shown below
```
CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./cholesky.fpga_sim
```
-3. Run the sample on the FPGA device.
+3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
./cholesky.fpga
```
@@ -253,7 +270,7 @@ You can apply the Cholesky decomposition to a number of matrices, as shown below
cholesky.fpga_sim.exe
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=
```
-3. Run the sample on the FPGA device.
+3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
cholesky.fpga.exe
```
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/CMakeLists.txt
index e52aa0d3d3..0dd6a5a000 100755
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/CMakeLists.txt
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/CMakeLists.txt
@@ -7,42 +7,53 @@ set(FPGA_EARLY_IMAGE ${TARGET_NAME}_report.a)
# FPGA board selection
if(NOT DEFINED FPGA_DEVICE)
- set(FPGA_DEVICE "intel_a10gx_pac:pac_a10")
+ set(FPGA_DEVICE "Agilex")
+ set(DEVICE_FLAG "Agilex")
message(STATUS "FPGA_DEVICE was not specified.\
- \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). \
- \nPlease refer to the README for information on board selection.")
+ \nConfiguring the design for the default FPGA family: ${FPGA_DEVICE}\
+ \nPlease refer to the README for information on target selection.")
+ set(BSP_FLAG "")
else()
- message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}")
-endif()
+ string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME)
+ if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*")
+ set(DEVICE_FLAG "A10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*")
+ set(DEVICE_FLAG "S10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*")
+ set(DEVICE_FLAG "Agilex")
+ endif()
+ message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}")
-# This is a Windows-specific flag that enables error handling in host code
-if(WIN32)
- set(PLATFORM_SPECIFIC_COMPILE_FLAGS "/EHsc /Qactypes /Wall /fp:precise")
- set(PLATFORM_SPECIFIC_LINK_FLAGS "/Qactypes /fp:precise")
-else()
- set(PLATFORM_SPECIFIC_COMPILE_FLAGS "-qactypes -Wall -fno-finite-math-only -fp-model=precise")
- set(PLATFORM_SPECIFIC_LINK_FLAGS "-fp-model=precise")
+ # Check if the target is a BSP
+ if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*")
+ set(BSP_FLAG "-DIS_BSP")
+ else()
+ set(BSP_FLAG "")
+ message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so the IS_BSP macro will not be passed to your C++ code.")
+ message(STATUS "If the target is actually a BSP, run cmake with -DIS_BSP=1 to pass the IS_BSP macro to your C++ code.")
+ endif()
endif()
+if(NOT DEFINED DEVICE_FLAG)
+ message(FATAL_ERROR "An unrecognized or custom board was passed, but DEVICE_FLAG was not specified. \
+ Make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or -DDEVICE_FLAG=Agilex.")
+endif()
-# A10 parameters
-set(MATRIX_DIMENSION 32)
-set(COMPLEX 0)
-set(FIXED_ITERATIONS 39)
-set(CLOCK_TARGET 360MHz)
-set(SEED "-Xsseed=29")
-# Overwrite design parameters according to the selected board
-if(FPGA_DEVICE MATCHES ".*a10.*")
+if(DEVICE_FLAG MATCHES "A10")
# A10 parameters
- # Nothing to do
-elseif(FPGA_DEVICE MATCHES ".*s10.*")
+ set(MATRIX_DIMENSION 32)
+ set(COMPLEX 0)
+ set(FIXED_ITERATIONS 39)
+ set(CLOCK_TARGET 360MHz)
+ set(SEED "-Xsseed=29")
+elseif(DEVICE_FLAG MATCHES "S10")
# S10 parameters
set(MATRIX_DIMENSION 32)
set(COMPLEX 0)
set(FIXED_ITERATIONS 44)
set(CLOCK_TARGET 450MHz)
set(SEED "-Xsseed=5")
-elseif(FPGA_DEVICE MATCHES ".*agilex.*")
+elseif(DEVICE_FLAG MATCHES "Agilex")
# Agilex™ parameters
set(MATRIX_DIMENSION 32)
set(FIXED_ITERATIONS 45)
@@ -50,8 +61,16 @@ elseif(FPGA_DEVICE MATCHES ".*agilex.*")
set(CLOCK_TARGET 520MHz)
set(SEED "-Xsseed=5")
else()
- message(STATUS "Unknown board ${FPGA_DEVICE}!")
- message(STATUS "Using Arria 10 defaults.")
+ message(FATAL_ERROR "An incorrect DEVICE_FLAG was given. Make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or -DDEVICE_FLAG=Agilex.")
+endif()
+
+# Platform-specific compile and link flags (on Windows, /EHsc enables C++ exception handling in host code)
+if(WIN32)
+ set(PLATFORM_SPECIFIC_COMPILE_FLAGS "/EHsc /Qactypes /Wall /fp:precise")
+ set(PLATFORM_SPECIFIC_LINK_FLAGS "/Qactypes /fp:precise")
+else()
+ set(PLATFORM_SPECIFIC_COMPILE_FLAGS "-qactypes -Wall -fno-finite-math-only -fp-model=precise")
+ set(PLATFORM_SPECIFIC_LINK_FLAGS "-fp-model=precise")
endif()
if(IGNORE_DEFAULT_SEED)
@@ -79,12 +98,12 @@ message(STATUS "SEED=${SEED}")
# 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V).
# 2. The "link" stage invokes the compiler's FPGA backend before linking.
# For this reason, FPGA backend flags must be passed as link flags in CMake.
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -DFPGA_EMULATOR")
-set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS}")
-set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -DFPGA_SIMULATOR -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed ${USER_SIMULATOR_FLAGS}")
-set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed")
-set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed -DFPGA_HARDWARE")
-set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xshardware -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed")
+set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -DFPGA_EMULATOR ${BSP_FLAG}")
+set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} ${BSP_FLAG}")
+set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -DFPGA_SIMULATOR -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed ${USER_SIMULATOR_FLAGS} ${BSP_FLAG}")
+set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed ${BSP_FLAG}")
+set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed -DFPGA_HARDWARE ${BSP_FLAG}")
+set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xshardware -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed ${BSP_FLAG}")
# use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation
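The target-selection logic added to this CMakeLists.txt is easy to misread, so here is a minimal C++ sketch that mirrors it. It is not part of the sample. `DeviceFlagFor` and `PassesIsBspMacro` are hypothetical names, and CMake's unanchored `MATCHES ".*a10.*"` style regexes are modeled as plain substring searches.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper (not in the sample): mirrors how the CMakeLists.txt
// derives DEVICE_FLAG from a user-supplied FPGA_DEVICE. The name is
// lowercased first, like string(TOLOWER ...), then matched in order.
std::optional<std::string> DeviceFlagFor(const std::string& fpga_device) {
  std::string name = fpga_device;
  std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  auto contains = [&name](const char* needle) {
    return name.find(needle) != std::string::npos;
  };
  // First matching branch wins, as in the if/elseif chain.
  if (contains("a10") || contains("arria10")) return std::string("A10");
  if (contains("s10") || contains("stratix10")) return std::string("S10");
  if (contains("agilex")) return std::string("Agilex");
  // No match: CMake stops with FATAL_ERROR unless -DDEVICE_FLAG was passed.
  return std::nullopt;
}

// Hypothetical helper mirroring
//   IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*"
// The FPGA_DEVICE test is case-sensitive (it uses the original string, not
// the lowercased copy). `is_bsp` is the -DIS_BSP value, empty when unset.
bool PassesIsBspMacro(const std::string& fpga_device,
                      const std::string& is_bsp) {
  return is_bsp.find('1') != std::string::npos ||
         fpga_device.find("pac_a10") != std::string::npos ||
         fpga_device.find("pac_s10") != std::string::npos;
}
```

For example, `DeviceFlagFor("intel_s10sx_pac:pac_s10")` yields `S10` and `PassesIsBspMacro` returns true for it, while a name that matches none of the patterns yields no value. Because `arria10` already contains `a10`, the second A10 condition never changes the result. The Stratix condition is still needed because `stratix10` does not contain `s10`.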
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/cholesky.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/cholesky.hpp
index d0729d4a29..80bc2df92d 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/cholesky.hpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/cholesky.hpp
@@ -54,8 +54,14 @@ void CholeskyDecompositionImpl(
sycl::ext::intel::pipe;
// Allocate FPGA DDR memory.
+#if defined (IS_BSP)
TT *a_device = sycl::malloc_device(kAMatrixSize * matrix_count, q);
TT *l_device = sycl::malloc_device(kLMatrixSize * matrix_count, q);
+#else
+ // malloc_device is not supported when targeting an FPGA part/family
+ TT *a_device = sycl::malloc_shared(kAMatrixSize * matrix_count, q);
+ TT *l_device = sycl::malloc_shared(kLMatrixSize * matrix_count, q);
+#endif
if ((a_device == nullptr) || (l_device == nullptr)) {
std::cerr << "Error when allocating FPGA DDR" << std::endl;
@@ -93,8 +99,6 @@ void CholeskyDecompositionImpl(
constexpr int kLoopIter =
(kLMatrixSize / kNumElementsPerDDRBurst) + kExtraIteration;
- sycl::device_ptr vector_ptr_device(l_device);
-
// Repeat matrix_count complete L matrix pipe reads
// for as many repetitions as needed
// The loop coalescing directive merges the two outer loops together
@@ -105,6 +109,18 @@ void CholeskyDecompositionImpl(
for (int li = 0; li < kLoopIter; li++) {
TT bank[kNumElementsPerDDRBurst];
+#if defined (IS_BSP)
+ // When targeting a BSP, we instruct the compiler that this pointer
+ // lives on the device.
+ // Knowing this, the compiler won't generate hardware to
+ // potentially get data from the host.
+ sycl::device_ptr vector_ptr(l_device);
+#else
+ // Device pointers are not supported when targeting an FPGA
+ // family/part
+ TT* vector_ptr(l_device);
+#endif
+
for (int k = 0; k < kNumElementsPerDDRBurst; k++) {
if (((li * kNumElementsPerDDRBurst) + k) < kLMatrixSize) {
bank[k] = LMatrixPipe::read();
@@ -117,7 +133,7 @@ void CholeskyDecompositionImpl(
#pragma unroll
for (int k = 0; k < kNumElementsPerDDRBurst; k++) {
if (((li * kNumElementsPerDDRBurst) + k) < kLMatrixSize) {
- vector_ptr_device[(matrix_idx * kLMatrixSize) +
+ vector_ptr[(matrix_idx * kLMatrixSize) +
(li * kNumElementsPerDDRBurst) + k] = bank[k];
}
}
@@ -125,7 +141,7 @@ void CholeskyDecompositionImpl(
// Write a burst of kNumElementsPerDDRBurst elements to DDR
#pragma unroll
for (int k = 0; k < kNumElementsPerDDRBurst; k++) {
- vector_ptr_device[(matrix_idx * kLMatrixSize) +
+ vector_ptr[(matrix_idx * kLMatrixSize) +
(li * kNumElementsPerDDRBurst) + k] = bank[k];
}
}
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/memory_transfers.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/memory_transfers.hpp
index 6d25905d41..f587925870 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/memory_transfers.hpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/memory_transfers.hpp
@@ -38,8 +38,6 @@ void MatrixReadFromDDRToPipe(
// Size of a full matrix
constexpr int kMatrixSize = rows * columns;
- sycl::device_ptr matrix_ptr_device(matrix_ptr);
-
// Repeatedly read matrix_count matrices from DDR and send them to the pipe
for (int repetition = 0; repetition < repetitions; repetition++) {
for (int matrix_index = 0; matrix_index < matrix_count; matrix_index++) {
@@ -47,6 +45,18 @@ void MatrixReadFromDDRToPipe(
// Only useful in the case of kIncompleteBurst
int load_index = 0;
+#if defined (IS_BSP)
+ // When targeting a BSP, we instruct the compiler that this pointer
+ // lives on the device.
+ // Knowing this, the compiler won't generate hardware to
+ // potentially get data from the host.
+ sycl::device_ptr matrix_ptr_located(matrix_ptr);
+#else
+ // Device pointers are not supported when targeting an FPGA
+ // family/part
+ TT* matrix_ptr_located(matrix_ptr);
+#endif
+
[[intel::initiation_interval(1)]] // NO-FORMAT: Attribute
for (ac_int li = 0; li < kLoopIter; li++) {
bool last_burst_of_col;
@@ -71,12 +81,12 @@ void MatrixReadFromDDRToPipe(
// memory address that may be beyond the matrix last address)
if (!out_of_bounds) {
ddr_read.template get() =
- matrix_ptr_device[matrix_index * kMatrixSize + load_index +
+ matrix_ptr_located[matrix_index * kMatrixSize + load_index +
k];
}
} else {
ddr_read.template get() =
- matrix_ptr_device[matrix_index * kMatrixSize +
+ matrix_ptr_located[matrix_index * kMatrixSize +
(int)(li)*num_elem_per_bank + k];
}
});
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/README.md
index 88dcfbb230..dfd57273b0 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/README.md
@@ -57,7 +57,7 @@ You can also find more information about [troubleshooting build errors](/DirectP
| Optimized for | Description
|:--- |:---
| OS | Ubuntu* 18.04/20.04, RHEL*/CentOS* 8, SUSE* 15, Windows* 10
-| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA), Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
+| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs
| Software | Intel® oneAPI DPC++/C++ Compiler
> **Note**: Even though the Intel® oneAPI DPC++/C++ Compiler is sufficient for compiling for emulation, generating reports, and generating RTL, the simulation flow and FPGA compiles have extra software requirements.
@@ -68,6 +68,8 @@ You can also find more information about [troubleshooting build errors](/DirectP
> - ModelSim® SE
>
> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
+>
+> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
### Performance
@@ -167,16 +169,26 @@ Additionaly, the cmake build system can be configured using the following parame
### On Linux*
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default.
+2. Configure the build system for the Agilex™ device family, which is the default.
+
```
mkdir build
cd build
cmake ..
```
- For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
+
3. Compile the design. (The provided targets match the recommended development flow.)
1. Compile for emulation (fast compile time, targets emulated FPGA device).
@@ -198,23 +210,27 @@ Additionaly, the cmake build system can be configured using the following parame
make fpga
```
- (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/cholesky_inversion.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/cholesky_inversion.fpga.tar.gz).
-
### On Windows*
-> **Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support.
-
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default
+2. Configure the build system for the Agilex™ device family, which is the default.
```
mkdir build
cd build
cmake -G "NMake Makefiles" ..
```
- For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
3. Compile the design. (The provided targets match the recommended development flow.)
@@ -263,7 +279,7 @@ You can apply the Cholesky-based inversion to 8 matrices repeated a number of ti
```
CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./cholesky_inversion.fpga_sim
```
-3. Run on the FPGA device.
+3. Run on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
./cholesky_inversion.fpga
```
@@ -280,7 +296,7 @@ You can apply the Cholesky-based inversion to 8 matrices repeated a number of ti
cholesky_inversion.fpga_sim.exe
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=
```
-3. Run on the FPGA device.
+3. Run on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
cholesky_inversion.fpga.exe
```
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/CMakeLists.txt
index 1b464c424e..16f31b9059 100755
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/CMakeLists.txt
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/CMakeLists.txt
@@ -7,12 +7,36 @@ set(FPGA_EARLY_IMAGE ${TARGET_NAME}_report.a)
# FPGA board selection
if(NOT DEFINED FPGA_DEVICE)
- set(FPGA_DEVICE "intel_a10gx_pac:pac_a10")
+ set(FPGA_DEVICE "Agilex")
+ set(DEVICE_FLAG "Agilex")
message(STATUS "FPGA_DEVICE was not specified.\
- \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). \
- \nPlease refer to the README for information on board selection.")
+ \nConfiguring the design for the default FPGA family: ${FPGA_DEVICE}\
+ \nPlease refer to the README for information on target selection.")
+ set(BSP_FLAG "")
else()
- message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}")
+ string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME)
+ if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*")
+ set(DEVICE_FLAG "A10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*")
+ set(DEVICE_FLAG "S10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*")
+ set(DEVICE_FLAG "Agilex")
+ endif()
+ message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}")
+
+ # Check if the target is a BSP
+ if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*")
+ set(BSP_FLAG "-DIS_BSP")
+ else()
+ set(BSP_FLAG "")
+ message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so the IS_BSP macro will not be passed to your C++ code.")
+ message(STATUS "If the target is actually a BSP, run cmake with -DIS_BSP=1 to pass the IS_BSP macro to your C++ code.")
+ endif()
+endif()
+
+if(NOT DEFINED DEVICE_FLAG)
+ message(FATAL_ERROR "An unrecognized or custom board was passed, but DEVICE_FLAG was not specified. \
+ Make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or -DDEVICE_FLAG=Agilex.")
endif()
# Platform-specific compile and link flags (on Windows, /EHsc enables C++ exception handling in host code)
@@ -24,20 +48,15 @@ else()
set(PLATFORM_SPECIFIC_LINK_FLAGS "-fp-model=precise ")
endif()
-
-# A10 parameters
-set(MATRIX_DIMENSION 32)
-set(COMPLEX 0)
-set(FIXED_ITERATIONS_DECOMPOSITION 39)
-set(FIXED_ITERATIONS_INVERSION 34)
-set(CLOCK_TARGET 360MHz)
-set(SEED "-Xsseed=29")
-
-# Set design parameters according to the selected board
-if(FPGA_DEVICE MATCHES ".*a10.*")
+if(DEVICE_FLAG MATCHES "A10")
# A10 parameters
- # Nothing to do
-elseif(FPGA_DEVICE MATCHES ".*s10.*")
+ set(MATRIX_DIMENSION 32)
+ set(COMPLEX 0)
+ set(FIXED_ITERATIONS_DECOMPOSITION 39)
+ set(FIXED_ITERATIONS_INVERSION 34)
+ set(CLOCK_TARGET 360MHz)
+ set(SEED "-Xsseed=29")
+elseif(DEVICE_FLAG MATCHES "S10")
# S10 parameters
set(MATRIX_DIMENSION 32)
set(COMPLEX 0)
@@ -45,7 +64,7 @@ elseif(FPGA_DEVICE MATCHES ".*s10.*")
set(FIXED_ITERATIONS_INVERSION 44)
set(CLOCK_TARGET 450MHz)
set(SEED "-Xsseed=5")
-elseif(FPGA_DEVICE MATCHES ".*agilex.*")
+elseif(DEVICE_FLAG MATCHES "Agilex")
# Agilex™ parameters
set(MATRIX_DIMENSION 32)
set(FIXED_ITERATIONS_DECOMPOSITION 45)
@@ -54,8 +73,7 @@ elseif(FPGA_DEVICE MATCHES ".*agilex.*")
set(CLOCK_TARGET 520MHz)
set(SEED "-Xsseed=5")
else()
- message(STATUS "Unknown board ${FPGA_DEVICE}!")
- message(STATUS "Using Arria 10 defaults.")
+ message(FATAL_ERROR "An incorrect DEVICE_FLAG was given. Make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or -DDEVICE_FLAG=Agilex.")
endif()
if(IGNORE_DEFAULT_SEED)
@@ -88,12 +106,12 @@ message(STATUS "SEED=${SEED}")
# 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V).
# 2. The "link" stage invokes the compiler's FPGA backend before linking.
# For this reason, FPGA backend flags must be passed as link flags in CMake.
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -DFPGA_EMULATOR")
-set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS}")
-set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -DFPGA_SIMULATOR -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed ${USER_HARDWARE_FLAGS}")
-set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_SIMULATOR_FLAGS} -Xsfp-relaxed")
-set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed -DFPGA_HARDWARE")
-set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xshardware -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed")
+set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -DFPGA_EMULATOR ${BSP_FLAG}")
+set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} ${BSP_FLAG}")
+set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -DFPGA_SIMULATOR -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed ${USER_HARDWARE_FLAGS} ${BSP_FLAG}")
+set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_SIMULATOR_FLAGS} -Xsfp-relaxed ${BSP_FLAG}")
+set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed -DFPGA_HARDWARE ${BSP_FLAG}")
+set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xshardware -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed ${BSP_FLAG}")
# use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation
###############################################################################
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/cholesky_inversion.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/cholesky_inversion.hpp
index 7f67dfdc38..22e8faac39 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/cholesky_inversion.hpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/cholesky_inversion.hpp
@@ -62,8 +62,14 @@ void CholeskyInversionImpl(
sycl::ext::intel::pipe;
// Allocate FPGA DDR memory.
+#if defined (IS_BSP)
TT *a_device = sycl::malloc_device(kAMatrixSize * matrix_count, q);
TT *i_device = sycl::malloc_device(kIMatrixSize * matrix_count, q);
+#else
+ // malloc_device is not supported when targeting an FPGA part/family
+ TT *a_device = sycl::malloc_shared(kAMatrixSize * matrix_count, q);
+ TT *i_device = sycl::malloc_shared(kIMatrixSize * matrix_count, q);
+#endif
if ((a_device == nullptr) || (i_device == nullptr)) {
std::cerr << "Error when allocating FPGA DDR" << std::endl;
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/memory_transfers.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/memory_transfers.hpp
index 1a40f3915f..4644e6c954 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/memory_transfers.hpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/memory_transfers.hpp
@@ -38,7 +38,17 @@ void MatrixReadFromDDRToPipe(
// Size of a full matrix
constexpr int kMatrixSize = rows * columns;
- sycl::device_ptr matrix_ptr_device(matrix_ptr);
+#if defined (IS_BSP)
+ // When targeting a BSP, we instruct the compiler that this pointer
+ // lives on the device.
+ // Knowing this, the compiler won't generate hardware to
+ // potentially get data from the host.
+ sycl::device_ptr matrix_ptr_located(matrix_ptr);
+#else
+ // Device pointers are not supported when targeting an FPGA
+ // family/part
+ TT* matrix_ptr_located(matrix_ptr);
+#endif
// Repeatedly read matrix_count matrices from the DDR and send them to the
// pipe
@@ -72,12 +82,12 @@ void MatrixReadFromDDRToPipe(
// memory address that may be beyond the matrix last address)
if (!out_of_bounds) {
ddr_read.template get() =
- matrix_ptr_device[matrix_index * kMatrixSize + load_index +
+ matrix_ptr_located[matrix_index * kMatrixSize + load_index +
k];
}
} else {
ddr_read.template get() =
- matrix_ptr_device[matrix_index * kMatrixSize +
+ matrix_ptr_located[matrix_index * kMatrixSize +
(int)(li)*num_elem_per_bank + k];
}
});
@@ -118,7 +128,17 @@ void VectorReadFromPipeToDDR(
constexpr int kExtraIteration = kIncompleteBurst ? 1 : 0;
constexpr int kLoopIter = (vector_size / num_elem_per_bank) + kExtraIteration;
- sycl::device_ptr vector_ptr_device(vector_ptr);
+#if defined (IS_BSP)
+ // When targeting a BSP, we instruct the compiler that this pointer
+ // lives on the device.
+ // Knowing this, the compiler won't generate hardware to
+ // potentially get data from the host.
+ sycl::device_ptr vector_ptr_located(vector_ptr);
+#else
+ // Device pointers are not supported when targeting an FPGA
+ // family/part
+ TT* vector_ptr_located(vector_ptr);
+#endif
// Repeat vector_count complete I vector pipe reads
// for as many repetitions as needed
@@ -139,7 +159,7 @@ void VectorReadFromPipeToDDR(
#pragma unroll
for (int k = 0; k < num_elem_per_bank; k++) {
if (((li * num_elem_per_bank) + k) < vector_size) {
- vector_ptr_device[(vector_idx * vector_size) +
+ vector_ptr_located[(vector_idx * vector_size) +
(li * num_elem_per_bank) + k] = bank[k];
}
}
@@ -147,7 +167,7 @@ void VectorReadFromPipeToDDR(
// Write a burst of num_elem_per_bank elements to DDR
#pragma unroll
for (int k = 0; k < num_elem_per_bank; k++) {
- vector_ptr_device[(vector_idx * vector_size) +
+ vector_ptr_located[(vector_idx * vector_size) +
(li * num_elem_per_bank) + k] = bank[k];
}
}
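The pointer change in `VectorReadFromPipeToDDR` sits inside a burst-write loop whose partial-burst handling is easy to miss. The following is a rough, host-only C++ sketch of that loop structure, not the sample's code. It assumes no SYCL, pipes, or device memory: a `std::vector` stands in for the pipe, the returned vector stands in for DDR, and `BurstWrite` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only sketch of the burst-write structure: elements are
// read one at a time from `pipe_data` (standing in for the pipe), buffered
// in a bank of kBurst elements, and each bank is written to `ddr`.
template <typename T, int kVectorSize, int kBurst>
std::vector<T> BurstWrite(const std::vector<T>& pipe_data, int vector_count) {
  constexpr bool kIncompleteBurst = (kVectorSize % kBurst) != 0;
  constexpr int kExtraIteration = kIncompleteBurst ? 1 : 0;
  constexpr int kLoopIter = (kVectorSize / kBurst) + kExtraIteration;
  assert(pipe_data.size() >=
         static_cast<std::size_t>(kVectorSize) * vector_count);

  std::vector<T> ddr(static_cast<std::size_t>(kVectorSize) * vector_count);
  std::size_t pipe_pos = 0;
  for (int vector_idx = 0; vector_idx < vector_count; vector_idx++) {
    for (int li = 0; li < kLoopIter; li++) {
      T bank[kBurst] = {};
      // Fill one bank; the bound check only matters on a final partial burst.
      for (int k = 0; k < kBurst; k++) {
        if ((li * kBurst) + k < kVectorSize) bank[k] = pipe_data[pipe_pos++];
      }
      // Write the bank; with an incomplete burst, skip out-of-range slots.
      for (int k = 0; k < kBurst; k++) {
        if (!kIncompleteBurst || (li * kBurst) + k < kVectorSize) {
          ddr[(vector_idx * kVectorSize) + (li * kBurst) + k] = bank[k];
        }
      }
    }
  }
  return ddr;
}
```

For instance, `BurstWrite<int, 5, 2>` runs three bursts per vector (two full, one partial). The bound check keeps the partial burst from writing past the end of each vector.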
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/crr/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/crr/README.md
index 4c1e0e1a76..58026eb161 100755
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/crr/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/crr/README.md
@@ -39,7 +39,7 @@ You can also find more information about [troubleshooting build errors](/DirectP
| Optimized for | Description
|:--- |:---
| OS | Ubuntu* 18.04/20.04, RHEL*/CentOS* 8, SUSE* 15, Windows* 10
-| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA), Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
+| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs
| Software | Intel® oneAPI DPC++/C++ Compiler
> **Note**: Even though the Intel® oneAPI DPC++/C++ Compiler is sufficient for compiling for emulation, generating reports, and generating RTL, the simulation flow and FPGA compiles have extra software requirements.
@@ -50,6 +50,8 @@ You can also find more information about [troubleshooting build errors](/DirectP
> - ModelSim® SE
>
> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
+>
+> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
### Performance
@@ -151,16 +153,26 @@ This design measures the FPGA performance to determine how many assets can be pr
### On Linux*
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default.
+2. Configure the build system for the Agilex™ device family, which is the default.
+
```
mkdir build
cd build
cmake ..
```
- For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
+
3. Compile the design. (The provided targets match the recommended development flow.)
1. Compile for emulation (fast compile time, targets emulated FPGA device).
@@ -178,23 +190,27 @@ This design measures the FPGA performance to determine how many assets can be pr
make fpga
```
- (Optional) The hardware compile above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/crr.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/crr.fpga.tar.gz).
-
### On Windows*
-> **Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support.
-
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default
+2. Configure the build system for the Agilex™ device family, which is the default.
```
mkdir build
cd build
cmake -G "NMake Makefiles" ..
```
- For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
3. Compile the design. (The provided targets match the recommended development flow.)
@@ -229,7 +245,7 @@ This design measures the FPGA performance to determine how many assets can be pr
```
CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./crr.fpga_sim [-o=]
```
-3. Run the sample on the FPGA device.
+3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
./crr.fpga [-o=]
```
@@ -250,7 +266,7 @@ This design measures the FPGA performance to determine how many assets can be pr
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=
```
-3. Run the sample on the FPGA device.
+3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
crr.fpga.exe [-o=]
```
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/crr/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/crr/src/CMakeLists.txt
index 448a5a0769..9e1667f88c 100755
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/crr/src/CMakeLists.txt
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/crr/src/CMakeLists.txt
@@ -6,12 +6,27 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga)
# FPGA board selection
if(NOT DEFINED FPGA_DEVICE)
- set(FPGA_DEVICE "intel_a10gx_pac:pac_a10")
+ set(FPGA_DEVICE "Agilex")
+ set(DEVICE_FLAG "Agilex")
message(STATUS "FPGA_DEVICE was not specified.\
- \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). \
- \nPlease refer to the README for information on board selection.")
+ \nConfiguring the design for the default FPGA family: ${FPGA_DEVICE}\
+ \nPlease refer to the README for information on target selection.")
else()
- message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}")
+ string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME)
+ if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*")
+ set(DEVICE_FLAG "A10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*")
+ set(DEVICE_FLAG "S10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*")
+ set(DEVICE_FLAG "Agilex")
+ endif()
+ message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}")
+endif()
+
+if(NOT DEFINED DEVICE_FLAG)
+ message(FATAL_ERROR "An unrecognized or custom board was passed, but DEVICE_FLAG was not specified. \
+ Please make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or \
+ -DDEVICE_FLAG=Agilex.")
endif()
# This is a Windows-specific flag that enables error handling in host code
@@ -20,19 +35,19 @@ if(WIN32)
endif()
# Set design parameters according to the selected board
-if(FPGA_DEVICE MATCHES ".*a10.*")
+if(DEVICE_FLAG MATCHES "A10")
# A10 parameters
set(OUTER_UNROLL 1)
set(INNER_UNROLL 64)
set(OUTER_UNROLL_POW2 1)
set(SEED "-Xsseed=1")
-elseif(FPGA_DEVICE MATCHES ".*s10.*")
+elseif(DEVICE_FLAG MATCHES "S10")
# S10 parameters
set(OUTER_UNROLL 2)
set(INNER_UNROLL 64)
set(OUTER_UNROLL_POW2 2)
set(SEED "-Xsseed=2")
-elseif(FPGA_DEVICE MATCHES ".*agilex.*")
+elseif(DEVICE_FLAG MATCHES "Agilex")
# Agilex™
set(OUTER_UNROLL 2)
set(INNER_UNROLL 64)
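The new `FPGA_DEVICE` to `DEVICE_FLAG` classification appears in the crr, db, and decompress `CMakeLists.txt` files, and it is easier to check when written out in ordinary code. The following is a minimal C++ sketch of those rules, not code from the samples. `DeviceFlagFor` is a hypothetical name, and the sketch assumes that CMake's `MATCHES` behaves like an unanchored `std::regex_search` on the lower-cased string.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: maps an FPGA_DEVICE value to the DEVICE_FLAG that the
// CMake scripts derive from it, or std::nullopt if no pattern matches.
std::optional<std::string> DeviceFlagFor(const std::string& fpga_device) {
  // Equivalent of string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME).
  std::string name = fpga_device;
  std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  // ".*a10.*" and friends match anywhere, like an unanchored regex search.
  // The order of the checks follows the if/elseif chain in CMake.
  if (std::regex_search(name, std::regex("a10|arria10"))) return "A10";
  if (std::regex_search(name, std::regex("s10|stratix10"))) return "S10";
  if (std::regex_search(name, std::regex("agilex"))) return "Agilex";
  return std::nullopt;  // CMake then needs an explicit -DDEVICE_FLAG=...
}
```

Under these rules, `intel_s10sx_pac:pac_s10` maps to `S10`, while a name that matches none of the patterns (for example, a custom board) yields no flag. In CMake, that case ends in the `FATAL_ERROR` unless `-DDEVICE_FLAG` is passed explicitly.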
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/db/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/db/README.md
index 31c532ef45..e8202429be 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/db/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/db/README.md
@@ -10,7 +10,9 @@ This reference design demonstrates how to use an FPGA to accelerate database que
## Purpose
-The database query acceleration sample includes 8 tables and a set of 21 business-oriented queries with broad industry-wide relevance. This reference design shows how four queries can be accelerated using the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) and oneAPI. To do so, we create a set of common database operators (found in the `src/db_utils/` directory) that are combined in different ways to build the four queries.
+The database query acceleration sample includes 8 tables and a set of 21 business-oriented queries with broad industry-wide relevance. This reference design shows how four queries can be accelerated using oneAPI. To do so, we create a set of common database operators (found in the `src/db_utils/` directory) that are combined in different ways to build the four queries.
+
+Note that this design is resource-intensive and was designed with Intel® Stratix® 10 FPGA capabilities in mind.
## Prerequisites
@@ -38,7 +40,7 @@ You can also find more information about [troubleshooting build errors](/DirectP
| Optimized for | Description
--- |---
| OS | Ubuntu* 18.04/20.04 <br>RHEL*/CentOS* 8 <br>SUSE* 15 <br>Windows* 10
-| Hardware | FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
+| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs
| Software | Intel® oneAPI DPC++/C++ Compiler
> **Note**: Although the Intel® oneAPI DPC++/C++ Compiler is sufficient for compiling for emulation, generating reports, and generating RTL, the simulation flow and FPGA hardware compiles have additional software requirements.
@@ -49,8 +51,8 @@ You can also find more information about [troubleshooting build errors](/DirectP
> - ModelSim® SE
>
> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
-
-> **Note**: This example design is only officially supported for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX).
+>
+> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
### Performance
@@ -144,7 +146,7 @@ Query 12 showcases the `MergeJoin` database operator. The block diagram of the d
### On Linux*
1. Change to the sample directory.
-2. Configure the build system for query number 1.
+2. Configure the build system for the default target (the Agilex™ device family).
```
mkdir build
cd build
@@ -152,6 +154,18 @@ Query 12 showcases the `MergeJoin` database operator. The block diagram of the d
```
`-DQUERY=` can be any of the following query numbers: `1`, `9`, `11` or `12`.
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake .. -DQUERY= -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake .. -DQUERY= -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
+
3. Compile the design. (The provided targets match the recommended development flow.)
1. Compile for emulation (fast compile time, targets emulated FPGA device).
@@ -168,7 +182,7 @@ Query 12 showcases the `MergeJoin` database operator. The block diagram of the d
```
The report resides at `db_report.prj/reports/report.html`.
- >**Note**: If you are compiling Query 9 (`-DQUERY=9`), expect a long report generation time. You can download pre-generated reports from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/db.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/db.fpga.tar.gz).
+ >**Note**: If you are compiling Query 9 (`-DQUERY=9`), expect a long report generation time.
4. Compile for FPGA hardware (longer compile time, targets FPGA device).
@@ -178,21 +192,29 @@ Query 12 showcases the `MergeJoin` database operator. The block diagram of the d
When building for hardware, the default scale factor is **1**. To use the smaller scale factor of 0.01, add the flag `-DSF_SMALL=1` to the original `cmake` command. For example: `cmake .. -DQUERY=11 -DSF_SMALL=1`. See the [Database files](#database-files) for more information.
- (Optional) The hardware compile may take several hours to complete. You can download a pre-compiled binary (compatible with Linux* Ubuntu* 18.04) for an Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/db.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/db.fpga.tar.gz).
-
### On Windows*
->**Note**: The FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) does not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support.
-
1. Change to the sample directory.
-2. Configure the build system for query number 1.
+2. Configure the build system for the default target (the Agilex™ device family).
```
mkdir build
cd build
- cmake -G "NMake Makefiles" -DQUERY=1
+ cmake -G "NMake Makefiles" .. -DQUERY=1
```
`-DQUERY=` can be any of the following query numbers: `1`, `9`, `11` or `12`.
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DQUERY= -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DQUERY= -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
+
3. Compile the design. (The provided targets match the recommended development flow.)
1. Compile for emulation (fast compile time, targets emulated FPGA device).
@@ -238,11 +260,11 @@ Query 12 showcases the `MergeJoin` database operator. The block diagram of the d
./db.fpga_emu --dbroot=../data/sf0.01 --test
```
(Optional) Run the design for queries `9`, `11` and `12`.
-2. Run the sample on the FPGA simulator device:
+2. Run the sample on the FPGA simulator device.
```
CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./db.fpga_sim --dbroot=../data/sf0.01 --test
```
-3. Run the design on an FPGA device.
+3. Run the design on an FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
./db.fpga --dbroot=../data/sf1 --test
```
@@ -254,13 +276,13 @@ Query 12 showcases the `MergeJoin` database operator. The block diagram of the d
db.fpga_emu.exe --dbroot=../data/sf0.01 --test
```
(Optional) Run the design for queries `9`, `11` and `12`.
-2. Run the sample on the FPGA simulator device:
+2. Run the sample on the FPGA simulator device.
```
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1
db.fpga_sim.exe --dbroot=../data/sf0.01 --test
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=
```
-3. Run the sample on an FPGA device.
+3. Run the sample on an FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
db.fpga.exe --dbroot=../data/sf1 --test
```
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/db/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/db/src/CMakeLists.txt
index 339f3e0a5d..63ab8c7ed5 100755
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/db/src/CMakeLists.txt
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/db/src/CMakeLists.txt
@@ -12,29 +12,29 @@ else()
message(STATUS "\tQUERY=${QUERY}")
endif()
-# select default board based on query
-if(${QUERY} EQUAL 1)
- set(DEFAULT_BOARD "intel_a10gx_pac:pac_a10")
- set(DEFAULT_BOARD_STR "Intel Arria(R) 10 GX")
-elseif(${QUERY} EQUAL 9)
- set(DEFAULT_BOARD "intel_s10sx_pac:pac_s10")
- set(DEFAULT_BOARD_STR "Intel Stratix(R) 10 SX")
-elseif(${QUERY} EQUAL 11)
- set(DEFAULT_BOARD "intel_s10sx_pac:pac_s10")
- set(DEFAULT_BOARD_STR "Intel Stratix(R) 10 SX")
-elseif(${QUERY} EQUAL 12)
- set(DEFAULT_BOARD "intel_a10gx_pac:pac_a10")
- set(DEFAULT_BOARD_STR "Intel Arria(R) 10 GX")
-endif()
-
# FPGA board selection
if(NOT DEFINED FPGA_DEVICE)
- set(FPGA_DEVICE ${DEFAULT_BOARD})
+ set(FPGA_DEVICE "Agilex")
+ set(DEVICE_FLAG "Agilex")
message(STATUS "FPGA_DEVICE was not specified.\
- \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with ${DEFAULT_BOARD_STR} FPGA). \
- \nPlease refer to the README for information on board selection.")
+ \nConfiguring the design for the default FPGA family: ${FPGA_DEVICE}\
+ \nPlease refer to the README for information on target selection.")
else()
- message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}")
+ string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME)
+ if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*")
+ set(DEVICE_FLAG "A10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*")
+ set(DEVICE_FLAG "S10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*")
+ set(DEVICE_FLAG "Agilex")
+ endif()
+ message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}")
+endif()
+
+if(NOT DEFINED DEVICE_FLAG)
+ message(FATAL_ERROR "An unrecognized or custom board was passed, but DEVICE_FLAG was not specified. \
+ Please make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or \
+ -DDEVICE_FLAG=Agilex.")
endif()
# This is a Windows-specific flag that enables error handling in host code
@@ -53,7 +53,7 @@ endif()
# Pick the default seed if the user did not specify one to CMake.
# We do a seed sweep to find a good seed by default
if(NOT DEFINED SEED)
- if(${FPGA_DEVICE} MATCHES ".*a10.*")
+ if(DEVICE_FLAG MATCHES "A10")
if(${QUERY} EQUAL 1)
set(SEED "-Xsseed=2")
elseif(${QUERY} EQUAL 9)
@@ -63,7 +63,7 @@ if(NOT DEFINED SEED)
elseif(${QUERY} EQUAL 12)
set(SEED "-Xsseed=2")
endif()
- elseif(${FPGA_DEVICE} MATCHES ".*s10.*")
+ elseif(DEVICE_FLAG MATCHES "S10")
if(${QUERY} EQUAL 1)
set(SEED "-Xsseed=3")
elseif(${QUERY} EQUAL 9)
@@ -73,7 +73,7 @@ if(NOT DEFINED SEED)
elseif(${QUERY} EQUAL 12)
set(SEED "-Xsseed=2")
endif()
- elseif(${FPGA_DEVICE} MATCHES ".*agilex.*")
+ elseif(DEVICE_FLAG MATCHES "Agilex")
if(${QUERY} EQUAL 1)
set(SEED "-Xsseed=2")
elseif(${QUERY} EQUAL 9)
@@ -93,7 +93,7 @@ if(IGNORE_DEFAULT_SEED)
endif()
# Error out if trying to run Q9 or Q11 on Arria 10
-if (${FPGA_DEVICE} MATCHES ".*a10.*")
+if (DEVICE_FLAG MATCHES "A10")
if(${QUERY} EQUAL 9 OR ${QUERY} EQUAL 11)
message(FATAL_ERROR "Queries 9 and 11 are not supported on Arria 10 devices")
endif()
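The db build also refuses to configure Queries 9 and 11 for Arria 10. A minimal C++ sketch of that guard follows; `QuerySupportedOn` is a hypothetical name, and the sketch assumes that `MATCHES "A10"` is equivalent to a substring test on `DEVICE_FLAG`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring the db CMakeLists.txt guard: Queries 9 and 11
// are rejected with FATAL_ERROR when DEVICE_FLAG matches "A10".
bool QuerySupportedOn(const std::string& device_flag, int query) {
  // DEVICE_FLAG MATCHES "A10" is an unanchored match, i.e. a substring test.
  const bool is_a10 = device_flag.find("A10") != std::string::npos;
  return !(is_a10 && (query == 9 || query == 11));
}
```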
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/README.md
index fcb5b7b8d9..6a6365ad39 100755
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/README.md
@@ -36,7 +36,7 @@ You can also find more information about [troubleshooting build errors](/DirectP
| Optimized for | Description
|:--- |:---
| OS | Ubuntu* 18.04/20.04 <br>RHEL*/CentOS* 8 <br>SUSE* 15 <br>Windows* 10
-| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA) <br>Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
+| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs
| Software | Intel® oneAPI DPC++/C++ Compiler
> **Note**: Although the Intel® oneAPI DPC++/C++ Compiler is sufficient for compiling for emulation, generating reports, and generating RTL, the simulation flow and FPGA hardware compiles have additional software requirements.
@@ -47,6 +47,8 @@ You can also find more information about [troubleshooting build errors](/DirectP
> - ModelSim® SE
>
> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
+>
+> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
## Key Implementation Details
@@ -302,21 +304,31 @@ For `constexpr_math.hpp`, `memory_utils.hpp`, `metaprogramming_utils.hpp`, `tupl
### On Linux*
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default.
+2. Configure the build system for the Agilex™ device family, which is the default.
+
```
mkdir build
cd build
cmake ..
```
+
To select between GZIP and Snappy decompression, use `-DGZIP=1` or `-DSNAPPY=1`. If you do not specify the decompression, the code defaults to **Snappy**.
```
cmake .. -DGZIP=1
cmake .. -DSNAPPY=1
```
- For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
3. Compile the design. (The provided targets match the recommended development flow.)
@@ -339,14 +351,10 @@ For `constexpr_math.hpp`, `memory_utils.hpp`, `metaprogramming_utils.hpp`, `tupl
make fpga
```
- (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/decompress.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/decompress.fpga.tar.gz).
-
### On Windows*
-> **Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support.
-
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default
+2. Configure the build system for the Agilex™ device family, which is the default.
```
mkdir build
cd build
@@ -357,10 +365,19 @@ For `constexpr_math.hpp`, `memory_utils.hpp`, `metaprogramming_utils.hpp`, `tupl
cmake -G "NMake Makefiles" .. -DGZIP=1
cmake -G "NMake Makefiles" .. -DSNAPPY=1
```
- For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
+
3. Compile the design. (The provided targets match the recommended development flow.)
1. Compile for emulation (fast compile time, targets emulated FPGA device).
@@ -391,11 +408,11 @@ For `constexpr_math.hpp`, `memory_utils.hpp`, `metaprogramming_utils.hpp`, `tupl
```
./decompress.fpga_emu
```
-2. Run the sample on the FPGA simulator device:
+2. Run the sample on the FPGA simulator device.
```
CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./decompress.fpga_sim
```
-3. Run the sample on the FPGA device.
+3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
./decompress.fpga
```
@@ -406,13 +423,13 @@ For `constexpr_math.hpp`, `memory_utils.hpp`, `metaprogramming_utils.hpp`, `tupl
```
decompress.fpga_emu.exe
```
-2. Run the sample on the FPGA simulator device:
+2. Run the sample on the FPGA simulator device.
```
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1
decompress.fpga_sim.exe
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=
```
-3. Run the sample on the FPGA device.
+3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
decompress.fpga.exe
```
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/CMakeLists.txt
index d01e36b8cd..dc78305402 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/CMakeLists.txt
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/CMakeLists.txt
@@ -6,12 +6,36 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga)
# FPGA board selection
if(NOT DEFINED FPGA_DEVICE)
- set(FPGA_DEVICE "intel_a10gx_pac:pac_a10")
+ set(FPGA_DEVICE "Agilex")
+ set(DEVICE_FLAG "Agilex")
message(STATUS "FPGA_DEVICE was not specified.\
- \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). \
- \nPlease refer to the README for information on board selection.")
+ \nConfiguring the design for the default FPGA family: ${FPGA_DEVICE}\
+ \nPlease refer to the README for information on target selection.")
+ set(BSP_FLAG "")
else()
- message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}")
+ string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME)
+ if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*")
+ set(DEVICE_FLAG "A10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*")
+ set(DEVICE_FLAG "S10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*")
+ set(DEVICE_FLAG "Agilex")
+ endif()
+ message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}")
+
+ # Check if the target is a BSP
+ if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*")
+ set(BSP_FLAG "-DIS_BSP")
+ else()
+ set(BSP_FLAG "")
+ message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so the IS_BSP macro will not be passed to your C++ code.")
+ message(STATUS "If the target is actually a BSP, run cmake with -DIS_BSP=1 to pass the IS_BSP macro to your C++ code.")
+ endif()
+endif()
+
+if(NOT DEFINED DEVICE_FLAG)
+ message(FATAL_ERROR "An unrecognized or custom board was passed, but DEVICE_FLAG was not specified. \
+ Make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or -DDEVICE_FLAG=Agilex.")
endif()
# Select between SNAPPY and GZIP decompression
@@ -68,11 +92,11 @@ if(IGNORE_DEFAULT_SEED)
else()
if (NOT DEFINED SEED)
# the default seed for each FPGA type
- if(FPGA_DEVICE MATCHES ".*a10.*")
+ if(DEVICE_FLAG MATCHES "A10")
set(SEED 1)
- elseif(FPGA_DEVICE MATCHES ".*s10.*")
+ elseif(DEVICE_FLAG MATCHES "S10")
set(SEED 2)
- elseif(FPGA_DEVICE MATCHES ".*agilex.*")
+ elseif(DEVICE_FLAG MATCHES "Agilex")
set(SEED 3)
else()
message(STATUS "SEED not defined and no known seed for this board -- defaulting to SEED = 1")
@@ -94,13 +118,13 @@ endif()
# 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V).
# 2. The "link" stage invokes the compiler's FPGA backend before linking.
# For this reason, FPGA backend flags must be passed as link flags in CMake.
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${LITERALS_PER_CYCLE_FLAG} ${DECOMPRESS_FORMAT_FLAG} -DFPGA_EMULATOR")
-set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${AC_TYPES_FLAG}")
-set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${LITERALS_PER_CYCLE_FLAG} ${DECOMPRESS_FORMAT_FLAG} -DFPGA_SIMULATOR")
-set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xsghdl -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS}")
-set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${LITERALS_PER_CYCLE_FLAG} ${DECOMPRESS_FORMAT_FLAG} -DFPGA_HARDWARE")
-set(REPORT_LINK_FLAGS "-fsycl -fintelfpga -Xshardware ${PROFILE_FLAG} ${FLAT_COMPILE_FLAG} -Xsparallel=2 ${SEED_FLAG} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS}")
-set(HARDWARE_LINK_FLAGS "${REPORT_LINK_FLAGS} ${AC_TYPES_FLAG}")
+set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${LITERALS_PER_CYCLE_FLAG} ${DECOMPRESS_FORMAT_FLAG} -DFPGA_EMULATOR ${BSP_FLAG}")
+set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${AC_TYPES_FLAG} ${BSP_FLAG}")
+set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${LITERALS_PER_CYCLE_FLAG} ${DECOMPRESS_FORMAT_FLAG} -DFPGA_SIMULATOR ${BSP_FLAG}")
+set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xsghdl -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${AC_TYPES_FLAG} ${BSP_FLAG}")
+set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${LITERALS_PER_CYCLE_FLAG} ${DECOMPRESS_FORMAT_FLAG} -DFPGA_HARDWARE ${BSP_FLAG}")
+set(REPORT_LINK_FLAGS "-fsycl -fintelfpga -Xshardware ${PROFILE_FLAG} ${FLAT_COMPILE_FLAG} -Xsparallel=2 ${SEED_FLAG} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${BSP_FLAG}")
+set(HARDWARE_LINK_FLAGS "${REPORT_LINK_FLAGS} ${AC_TYPES_FLAG}")
# use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation
###############################################################################
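In the decompress build, `BSP_FLAG` decides whether the `IS_BSP` macro reaches the C++ code. Below is a minimal C++ sketch of the branch taken when `FPGA_DEVICE` is set explicitly. `BspFlagFor` is a hypothetical name. Unlike the family detection, this test runs on `FPGA_DEVICE` as given rather than on its lower-cased copy.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the BSP_FLAG value that the decompress
// CMakeLists.txt computes when FPGA_DEVICE is set explicitly.
std::string BspFlagFor(const std::string& fpga_device, const std::string& is_bsp) {
  // IS_BSP MATCHES "1": unanchored search, so any value containing '1' counts.
  const bool user_marked_bsp = std::regex_search(is_bsp, std::regex("1"));
  // FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*": case-sensitive, no TOLOWER.
  const bool known_pac_bsp =
      std::regex_search(fpga_device, std::regex("pac_a10|pac_s10"));
  return (user_marked_bsp || known_pac_bsp) ? "-DIS_BSP" : "";
}
```

When `FPGA_DEVICE` is not set at all, CMake skips this branch and leaves `BSP_FLAG` empty, so the default Agilex™ family build never defines `IS_BSP`.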
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/common/common.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/common/common.hpp
index 93b1e9daeb..e379258e89 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/common/common.hpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/common/common.hpp
@@ -321,7 +321,17 @@ sycl::event SubmitProducer(sycl::queue& q, unsigned in_count_padded,
// GZIP and SNAPPY designs, we guarantee this in the DecompressBytes
// functions in ../gzip/gzip_decompressor.hpp and
// ../snappy/snappy_decompressor.hpp respectively.
+#if defined (IS_BSP)
+ // When targeting a BSP, we instruct the compiler that this pointer
+ // lives on the device.
+ // Knowing this, the compiler won't generate hardware to
+ // potentially get data from the host.
sycl::device_ptr in(in_ptr);
+#else
+ // Device pointers are not supported when targeting an FPGA
+ // family/part
+ unsigned char* in(in_ptr);
+#endif
fpga_tools::MemoryToPipe(
in, iteration_count);
});
@@ -355,7 +365,19 @@ sycl::event SubmitConsumer(sycl::queue& q, unsigned out_count_padded,
// elements at once from 'OutPipe' and write them to 'out_ptr'.
// For details about the 'false' template parameter, see the SubmitProducer
// function above.
+
+#if defined (IS_BSP)
+ // When targeting a BSP, we instruct the compiler that this pointer
+ // lives on the device.
+ // Knowing this, the compiler won't generate hardware to
+ // potentially get data from the host.
sycl::device_ptr out(out_ptr);
+#else
+ // Device pointers are not supported when targeting an FPGA
+ // family/part
+ unsigned char* out(out_ptr);
+#endif
+
fpga_tools::PipeToMemory(
out, iteration_count);
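The same `#if defined (IS_BSP)` pattern recurs in `common.hpp`, `gzip_metadata_reader.hpp`, and `snappy_reader.hpp`. One way to avoid repeating it for every pointer is to use an alias template that the macro selects once. This is a swapped-in refactoring, not what the PR does. `KernelPtr` is a hypothetical name, and `DevicePtrStandIn` is a hypothetical placeholder for the `sycl::device_ptr` wrapper that lets the sketch compile without a SYCL toolchain.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for sycl::device_ptr; it only wraps a raw pointer
// so this sketch builds with a plain C++ compiler.
template <typename T>
struct DevicePtrStandIn {
  T* raw;
  explicit DevicePtrStandIn(T* p) : raw(p) {}
  T& operator*() const { return *raw; }
};

// Choose the kernel-side pointer type once, based on the IS_BSP macro
// that CMake adds through BSP_FLAG.
#if defined(IS_BSP)
template <typename T>
using KernelPtr = DevicePtrStandIn<T>;  // BSP target: real code would use sycl::device_ptr
#else
template <typename T>
using KernelPtr = T*;  // FPGA family/part target: plain pointer
#endif

#if !defined(IS_BSP)
static_assert(std::is_same<KernelPtr<int>, int*>::value,
              "Without IS_BSP, kernels use plain pointers");
#endif
```

Each kernel body could then declare `KernelPtr<int> crc(crc_ptr);` once, regardless of whether the target is a BSP or an FPGA family/part.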
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/gzip/gzip_metadata_reader.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/gzip/gzip_metadata_reader.hpp
index 80b15d04c4..d042458173 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/gzip/gzip_metadata_reader.hpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/gzip/gzip_metadata_reader.hpp
@@ -284,9 +284,22 @@ sycl::event SubmitGzipMetadataReader(sycl::queue& q, int in_count,
GzipHeaderData* hdr_data_ptr, int* crc_ptr,
int* out_count_ptr) {
return q.single_task([=]() [[intel::kernel_args_restrict]] {
+
+#if defined (IS_BSP)
+ // When targeting a BSP, we instruct the compiler that this pointer
+ // lives on the device.
+ // Knowing this, the compiler won't generate hardware to
+ // potentially get data from the host.
sycl::device_ptr hdr_data(hdr_data_ptr);
sycl::device_ptr crc(crc_ptr);
sycl::device_ptr out_count(out_count_ptr);
+#else
+ // Device pointers are not supported when targeting an FPGA
+ // family/part
+ GzipHeaderData* hdr_data(hdr_data_ptr);
+ int* crc(crc_ptr);
+ int* out_count(out_count_ptr);
+#endif
// local copies of the output data
GzipHeaderData hdr_data_loc;
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/snappy/snappy_reader.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/snappy/snappy_reader.hpp
index ed93d66b80..e41a9af598 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/snappy/snappy_reader.hpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/snappy/snappy_reader.hpp
@@ -385,7 +385,17 @@ template ([=] {
+#if defined (IS_BSP)
+ // When targeting a BSP, we instruct the compiler that this pointer
+ // lives on the device.
+ // Knowing this, the compiler won't generate hardware to
+ // potentially get data from the host.
sycl::device_ptr preamble_count(preamble_count_ptr);
+#else
+ // Device pointers are not supported when targeting an FPGA
+ // family/part
+ unsigned* preamble_count(preamble_count_ptr);
+#endif
*preamble_count =
SnappyReader(in_count);
});
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/gzip/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/gzip/README.md
index ae930552aa..ce1e0c4442 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/gzip/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/gzip/README.md
@@ -39,7 +39,7 @@ You can also find more information about [troubleshooting build errors](/DirectP
| Optimized for | Description
|:--- |:---
| OS | Ubuntu* 18.04/20.04 <br>RHEL*/CentOS* 8 <br>SUSE* 15 <br>Windows* 10
-| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA) <br>Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
+| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs
| Software | Intel® oneAPI DPC++/C++ Compiler
> **Note**: Although the Intel® oneAPI DPC++/C++ Compiler is sufficient for compiling for emulation, generating reports, and generating RTL, the simulation flow and FPGA hardware compiles have additional software requirements.
@@ -50,18 +50,20 @@ You can also find more information about [troubleshooting build errors](/DirectP
> - ModelSim® SE
>
> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
+>
+> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
## Key Implementation Details
The GZIP DEFLATE algorithm uses a GZIP-compatible Lempel-Ziv 77 (LZ77) algorithm for data de-duplication and a GZIP-compatible Static Huffman algorithm for bit reduction. The implementation includes three FPGA-accelerated tasks (LZ77, Static Huffman, and CRC).
-The FPGA implementation of the algorithm enables either one or two independent GZIP compute engines to operate in parallel on the FPGA. The available FPGA resources constrain the number of engines. By default, the design is parameterized to create a single engine when the design is compiled to target Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA). Two engines are created when compiling for Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX), which is a larger device.
+The FPGA implementation of the algorithm enables either one or two independent GZIP compute engines to operate in parallel on the FPGA. The available FPGA resources constrain the number of engines. By default, the design is parameterized to create a single engine when the design is compiled to target an Intel® Arria® 10 FPGA. Two engines are created when compiling for Intel® Stratix® 10 or Agilex™ FPGAs, which are larger devices.
This reference design contains two variants: "High Bandwidth" and "Low-Latency."
- The High Bandwidth variant maximizes system throughput without regard for latency. It transfers input/output SYCL Buffers to FPGA-attached DDR. The kernel then operates on these buffers.
- The Low-Latency variant takes advantage of Unified Shared Memory (USM) to avoid these copy operations, allowing the GZIP engine to access input/output buffers directly in host memory. This reduces latency at the cost of lower throughput. "Latency" in this context is defined as the time from when the input buffer is available in host memory to when the output buffer (i.e., the compressed result) is available in host memory.
-The Low-Latency variant is only supported on Intel Stratix® 10 SX.
+The Low-Latency variant is supported only on USM-capable BSPs or when targeting an FPGA family or part number.
| Kernel | Description
|:--- |:---
@@ -99,14 +101,14 @@ To optimize performance, GZIP leverages techniques discussed in the following FP
| `-Xshardware` | Targets FPGA hardware (instead of FPGA emulator).
| `-Xsparallel=2` | Uses two cores when compiling the bitstream through Intel® Quartus®.
| `-Xsseed=` | Uses a particular seed while running Intel® Quartus®, selected to yield the best Fmax for this design.
-| `-Xsnum-reorder=6` | On Intel Stratix® 10 SX only, specify a wider data path for read data from global memory.
+| `-Xsnum-reorder=6` | On FPGA boards with high memory bandwidth, specifies a wider data path for data read from global memory.
| `-Xsopt-arg="-nocaching"` | Specifies that cached LSUs should not be used.
Additionally, the CMake build system can be configured using the following parameter:
| cmake option | Description
|:--- |:---
-| `-DNUM_ENGINES=<1\|2>` | Specifies that 1 GZIP engine should be compiled when targeting Intel Arria® 10 GX and two engines when targeting Intel Stratix® 10 SX.
+| `-DNUM_ENGINES=<1\|2>` | Specifies the number of GZIP engines to compile. By default, 1 engine is compiled for Arria® 10 and 2 engines for Stratix® 10 and Agilex™.
### Performance
@@ -114,9 +116,9 @@ Performance results are based on testing as of October 27, 2020.
> **Note**: Refer to the [Performance Disclaimers](/DirectProgramming/C++SYCL_FPGA/README.md#performance-disclaimers) section for important performance information.
-| Device | Throughput
-|:--- |:---
-| Intel® PAC with Intel® Arria® 10 GX FPGA | 1 engine @ 3.4 GB/s
+| Device | Throughput
+|:--- |:---
+| Intel® PAC with Intel® Arria® 10 GX FPGA | 1 engine @ 3.4 GB/s
| Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) | 2 engines @ 4.5 GB/s each = 9.0 GB/s total (High Bandwidth variant) using 120MB+ input
2 engines @ 3.5 GB/s = 7.0 GB/s (Low Latency variant) using 80 KB input
## Build the `GZIP` Design
@@ -140,20 +142,28 @@ Performance results are based on testing as of October 27, 2020.
### On Linux*
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default.
+2. Configure the build system for the Agilex™ device family, which is the default.
+
```
mkdir build
cd build
cmake ..
```
- For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10
- ```
- For the **low latency** version of the design, add `-DLOW_LATENCY=1`.
- ```
- cmake .. -DLOW_LATENCY=1 -DFPGA_DEVICE=intel_s10sx_pac:pac_s10_usm
- ```
+
+ For the **low latency** version of the design, add `-DLOW_LATENCY=1` to your `cmake` command.
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
+
3. Compile the design. (The provided targets match the recommended development flow.)
1. Compile for emulation (fast compile time, targets emulated FPGA device).
@@ -175,27 +185,30 @@ Performance results are based on testing as of October 27, 2020.
```
make fpga
```
- (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/gzip.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/gzip.fpga.tar.gz).
### On Windows*
-> **Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support.
-
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default
+2. Configure the build system for the Agilex™ device family, which is the default.
```
mkdir build
cd build
cmake -G "NMake Makefiles" ..
```
- For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10
- ```
- For the **low latency** version of the design, add `-DLOW_LATENCY=1`.
- ```
- cmake -G "Nmake Makefiles" .. -DLOW_LATENCY=1 -DFPGA_DEVICE=intel_s10sx_pac:pac_s10_usm
- ```
+
+ For the **low latency** version of the design, add `-DLOW_LATENCY=1` to your `cmake` command.
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
3. Compile the design. (The provided targets match the recommended development flow.)
@@ -227,7 +240,7 @@ Performance results are based on testing as of October 27, 2020.
| Argument | Description
|:--- |:---
| `` | Specifies the file to be compressed.
Use an 120+ MB file to achieve peak performance.
Use an 80 KB file for Low Latency variant.
-| `-o=` | Specifies the name of the output file. The default name of the output file is `.gz`.
When targeting Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), the single `` is fed to both engines, yielding two identical output files, using `` as the basis for the filenames.
+| `-o=` | Specifies the name of the output file. The default name of the output file is `.gz`.
When using two engines, the single `` is fed to both engines, yielding two identical output files, using `` as the basis for the filenames.
### On Linux
@@ -241,7 +254,7 @@ Performance results are based on testing as of October 27, 2020.
CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./gzip.fpga_sim -o=
```
- 3. Run the sample on the FPGA device.
+ 3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
aocl initialize acl0 pac_s10_usm
./gzip.fpga -o=
@@ -258,7 +271,7 @@ Performance results are based on testing as of October 27, 2020.
gzip.fpga_sim.exe -o=
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=
```
- 3. Run the sample on the FPGA device.
+ 3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
aocl initialize acl0 pac_s10_usm
gzip.fpga.exe -o=
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/gzip/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/gzip/src/CMakeLists.txt
index 56b9aabe00..133c8c1de0 100755
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/gzip/src/CMakeLists.txt
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/gzip/src/CMakeLists.txt
@@ -21,12 +21,37 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga)
# FPGA board selection
if(NOT DEFINED FPGA_DEVICE)
- set(FPGA_DEVICE "intel_a10gx_pac:pac_a10")
+ set(FPGA_DEVICE "Agilex")
+ set(DEVICE_FLAG "Agilex")
message(STATUS "FPGA_DEVICE was not specified.\
- \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). \
- \nPlease refer to the README for information on board selection.")
+ \nConfiguring the design for the default FPGA family: ${FPGA_DEVICE}\
+ \nPlease refer to the README for information on target selection.")
+
+ set(IS_BSP "0")
else()
- message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}")
+ string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME)
+ if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*")
+ set(DEVICE_FLAG "A10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*")
+ set(DEVICE_FLAG "S10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*")
+ set(DEVICE_FLAG "Agilex")
+ endif()
+ message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}")
+
+ # Check if the target is a BSP
+ if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*")
+ set(IS_BSP "1")
+ else()
+ set(IS_BSP "0")
+ message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so USM will be enabled by default.")
+ message(STATUS "If the target is actually a BSP that does not support USM, run cmake with -DIS_BSP=1.")
+ endif()
+endif()
+
+if(NOT DEFINED DEVICE_FLAG)
+ message(FATAL_ERROR "An unrecognized or custom board was passed, but DEVICE_FLAG was not specified. \
+ Make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or -DDEVICE_FLAG=Agilex.")
endif()
# This is a Windows-specific flag that enables error handling in host code
@@ -35,7 +60,7 @@ if(WIN32)
endif()
# Set design parameters according to the selected chip
-if(FPGA_DEVICE MATCHES ".*a10.*")
+if(DEVICE_FLAG MATCHES "A10")
# A10 parameters
set(NUM_ENGINES 1)
if(DEFINED LOW_LATENCY)
@@ -45,7 +70,7 @@ if(FPGA_DEVICE MATCHES ".*a10.*")
set(SEED "-Xsseed=4")
set(NUM_REORDER "")
endif()
-elseif(FPGA_DEVICE MATCHES ".*s10.*")
+elseif(DEVICE_FLAG MATCHES "S10")
# S10 parameters
set(NUM_ENGINES 2)
if(DEFINED LOW_LATENCY)
@@ -57,7 +82,7 @@ elseif(FPGA_DEVICE MATCHES ".*s10.*")
# For Low Latency variant this is not necessary since only one channel of global memory is used (host memory).
set(NUM_REORDER "-Xsnum-reorder=6")
endif()
-elseif(FPGA_DEVICE MATCHES ".*agilex.*")
+elseif(DEVICE_FLAG MATCHES "Agilex")
# Agilex™
set(NUM_ENGINES 2)
if(DEFINED LOW_LATENCY)
@@ -79,11 +104,10 @@ if(IGNORE_DEFAULT_SEED)
set(SEED "")
endif()
-
# Presence of USM host allocations (and whether to turn on enable the low-latency target) is detected automatically by
# looking at the name of the BSP, or manually by the user when running CMake.
# E.g., cmake .. -DUSM_HOST_ALLOCATIONS_ENABLED=1
-if(LOW_LATENCY AND NOT FPGA_DEVICE MATCHES ".usm.*" AND (NOT DEFINED USM_HOST_ALLOCATIONS_ENABLED OR USM_HOST_ALLOCATIONS_ENABLED STREQUAL "0"))
+if((IS_BSP STREQUAL "1") AND LOW_LATENCY AND NOT FPGA_DEVICE MATCHES ".usm.*" AND (NOT DEFINED USM_HOST_ALLOCATIONS_ENABLED OR USM_HOST_ALLOCATIONS_ENABLED STREQUAL "0"))
# Low latency design requires USM, so error out
message(FATAL_ERROR "Error: The Low Latency variant of the design requires USM host allocations")
endif()
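The device-family and BSP detection in the gzip `CMakeLists.txt` above is spread across several nested `if()` blocks. The C++ sketch below is a hypothetical host-side model of those rules. It is not part of the build, and `SelectGzipConfig`, `GzipConfig`, and `DeviceFamily` are illustrative names. The model works as follows:

- Case-insensitive substring matching picks A10, S10, or Agilex.
- A10 gets one engine; the other families get two.
- A low-latency build on a BSP without USM host allocations is rejected.

It assumes the user did not pass the `-DDEVICE_FLAG` or `-DIS_BSP` overrides, which the real CMake also honors.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Illustrative model of the gzip CMake target-selection rules (hypothetical).
enum class DeviceFamily { A10, S10, Agilex };

struct GzipConfig {
  DeviceFamily family;
  bool is_bsp;
  int num_engines;
};

static bool Contains(const std::string& s, const std::string& sub) {
  return s.find(sub) != std::string::npos;
}

// Returns std::nullopt where the CMake script would stop with FATAL_ERROR:
// an unrecognized family, or low latency requested on a BSP without USM.
std::optional<GzipConfig> SelectGzipConfig(std::string fpga_device,
                                           bool low_latency,
                                           bool usm_host_allocations_enabled) {
  // Family detection uses a lowercased copy, like string(TOLOWER ...).
  std::string lower = fpga_device;
  std::transform(lower.begin(), lower.end(), lower.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

  GzipConfig cfg{};
  if (Contains(lower, "a10") || Contains(lower, "arria10")) {
    cfg.family = DeviceFamily::A10;
  } else if (Contains(lower, "s10") || Contains(lower, "stratix10")) {
    cfg.family = DeviceFamily::S10;
  } else if (Contains(lower, "agilex")) {
    cfg.family = DeviceFamily::Agilex;
  } else {
    return std::nullopt;  // CMake requires -DDEVICE_FLAG in this case
  }

  // A10 builds one engine; S10 and Agilex build two.
  cfg.num_engines = (cfg.family == DeviceFamily::A10) ? 1 : 2;

  // BSP detection matches the original (case-sensitive) string.
  cfg.is_bsp = Contains(fpga_device, "pac_a10") || Contains(fpga_device, "pac_s10");

  // CMake regex ".usm.*": "usm" preceded by at least one character.
  const bool bsp_has_usm = fpga_device.find("usm", 1) != std::string::npos;
  if (cfg.is_bsp && low_latency && !bsp_has_usm && !usm_host_allocations_enabled) {
    return std::nullopt;  // the low-latency variant needs USM host allocations
  }
  return cfg;
}
```

BSP detection only looks for `pac_a10` or `pac_s10`. Any other BSP name is therefore treated as a part number unless `-DIS_BSP=1` is passed, which is the case the CMake status messages warn about.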
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/README.md
index 3dc9f2ef3f..186e1f0bbd 100755
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/README.md
@@ -41,7 +41,7 @@ You can also find more information about [troubleshooting build errors](/DirectP
| Optimized for | Description
|:--- |:---
| OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
-| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA)
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
+| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs
| Software | Intel® oneAPI DPC++/C++ Compiler
> **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles.
@@ -52,6 +52,8 @@ You can also find more information about [troubleshooting build errors](/DirectP
> - ModelSim® SE
>
> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
+>
+> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
## Key Implementation Details
@@ -116,16 +118,26 @@ For `constexpr_math.hpp`, `pipe_utils.hpp`, and `unrolled_loop.hpp` see the READ
### On Linux*
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default.
+2. Configure the build system for the Agilex™ device family, which is the default.
+
```
mkdir build
cd build
cmake ..
```
- For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
+
3. Compile the design. (The provided targets match the recommended development flow.)
1. Compile for emulation (fast compile time, targets emulated FPGA device).
@@ -147,8 +159,6 @@ For `constexpr_math.hpp`, `pipe_utils.hpp`, and `unrolled_loop.hpp` see the READ
make fpga
```
- (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/merge_sort.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/merge_sort.fpga.tar.gz).
-
## Run the `Merge Sort` Program
### On Linux
@@ -157,11 +167,11 @@ For `constexpr_math.hpp`, `pipe_utils.hpp`, and `unrolled_loop.hpp` see the READ
```
./merge_sort.fpga_emu
```
-2. Run the sample on the FPGA simulator device:
+2. Run the sample on the FPGA simulator device.
```
CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./merge_sort.fpga_sim
```
-3. Run the sample on the FPGA device.
+3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
./merge_sort.fpga
```
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/CMakeLists.txt
index 917d1e16c9..bf03017f1e 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/CMakeLists.txt
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/CMakeLists.txt
@@ -6,12 +6,26 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga)
# FPGA board selection
if(NOT DEFINED FPGA_DEVICE)
- set(FPGA_DEVICE "intel_a10gx_pac:pac_a10")
+ set(FPGA_DEVICE "Agilex")
message(STATUS "FPGA_DEVICE was not specified.\
- \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). \
- \nPlease refer to the README for information on board selection.")
+ \nConfiguring the design for the default FPGA family: ${FPGA_DEVICE}\
+ \nPlease refer to the README for information on target selection.")
+
+ set(IS_BSP "0")
+ set(BSP_FLAG "")
else()
- message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}")
+ message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}")
+
+ # Check if the target is a BSP
+ if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*")
+ set(IS_BSP "1")
+ set(BSP_FLAG "-DIS_BSP")
+ else()
+ set(IS_BSP "0")
+ set(BSP_FLAG "")
+ message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so the IS_BSP macro will not be passed to your C++ code and USM will be enabled by default.")
+ message(STATUS "If the target is actually a BSP, run cmake with -DIS_BSP=1 to pass the IS_BSP macro to your C++ code and enable the USM checks.")
+ endif()
endif()
# This is a Windows-specific flag that enables error handling in host code
@@ -21,7 +35,7 @@ endif()
# check if the BSP has USM host allocations or manually enable using host allocations
# e.g. cmake .. -DUSE_USM_HOST_ALLOCATIONS=1
-if(FPGA_DEVICE MATCHES ".*usm.*" OR DEFINED USE_USM_HOST_ALLOCATIONS)
+if((IS_BSP STREQUAL "0") OR FPGA_DEVICE MATCHES ".*usm.*" OR DEFINED USE_USM_HOST_ALLOCATIONS)
set(ENABLE_USM "-DUSM_HOST_ALLOCATIONS")
message(STATUS "USM host allocations are enabled")
endif()
@@ -66,12 +80,12 @@ endif()
# 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V).
# 2. The "link" stage invokes the compiler's FPGA backend before linking.
# For this reason, FPGA backend flags must be passed as link flags in CMake.
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} -DFPGA_EMULATOR")
-set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG}")
-set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -Xssimulation -DFPGA_SIMULATOR ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG}")
-set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xsghdl -Xstarget=${FPGA_DEVICE} ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} ${USER_HARDWARE_FLAGS}")
-set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} -DFPGA_HARDWARE")
-set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga -Xshardware ${PROFILE_FLAG} -Xsparallel=2 ${SEED_FLAG} -Xstarget=${FPGA_DEVICE} ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} ${USER_HARDWARE_FLAGS}")
+set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} -DFPGA_EMULATOR ${BSP_FLAG}")
+set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} ${BSP_FLAG}")
+set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -Xssimulation -DFPGA_SIMULATOR ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} ${BSP_FLAG}")
+set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xsghdl -Xstarget=${FPGA_DEVICE} ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} ${USER_HARDWARE_FLAGS} ${BSP_FLAG}")
+set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} -DFPGA_HARDWARE ${BSP_FLAG}")
+set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga -Xshardware ${PROFILE_FLAG} -Xsparallel=2 ${SEED_FLAG} -Xstarget=${FPGA_DEVICE} ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} ${USER_HARDWARE_FLAGS} ${BSP_FLAG}")
# use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation
###############################################################################
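The merge_sort `CMakeLists.txt` uses a slightly different rule from gzip. USM host allocations are enabled for every non-BSP target, and `-DIS_BSP` is forwarded to the C++ sources only for BSP targets.

Below is a minimal sketch of that decision, again as a hypothetical host-side model. `SelectMergeSortFlags` and `MergeSortFlags` are illustrative names. An empty `fpga_device` stands for `FPGA_DEVICE` not being defined, and `user_is_bsp` stands for `-DIS_BSP=1`.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Illustrative model of merge_sort's CMake IS_BSP / USM rules (hypothetical).
struct MergeSortFlags {
  bool is_bsp;          // also decides whether -DIS_BSP reaches the C++ code
  bool usm_host_alloc;  // whether -DUSM_HOST_ALLOCATIONS is added
};

MergeSortFlags SelectMergeSortFlags(const std::optional<std::string>& fpga_device,
                                    bool user_is_bsp,
                                    bool use_usm_host_allocations) {
  // Without FPGA_DEVICE, CMake defaults to the Agilex family and IS_BSP=0.
  const std::string device = fpga_device.value_or("Agilex");
  bool is_bsp = false;
  if (fpga_device) {
    is_bsp = user_is_bsp ||
             device.find("pac_a10") != std::string::npos ||
             device.find("pac_s10") != std::string::npos;
  }
  // Matches: IS_BSP == 0 OR FPGA_DEVICE MATCHES ".*usm.*" OR USE_USM_HOST_ALLOCATIONS.
  const bool usm = !is_bsp || device.find("usm") != std::string::npos ||
                   use_usm_host_allocations;
  return {is_bsp, usm};
}
```

A part-number or family target therefore always builds with `USM_HOST_ALLOCATIONS` and without `IS_BSP`, so the kernels use raw pointers instead of `device_ptr`.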
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/consume.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/consume.hpp
index ccaaf788b6..7d74bae1b5 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/consume.hpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/consume.hpp
@@ -22,7 +22,13 @@ event Consume(queue& q, ValueT* out_ptr, IndexT total_count, IndexT offset,
// Creating a device_ptr tells the compiler that this pointer is in
// device memory, not host memory, and avoids creating extra connections
// to host memory
+ // This is only done when targeting a BSP, because device pointers are
+ // not supported when targeting an FPGA family or part number.
+#if defined(IS_BSP)
device_ptr out(out_ptr);
+#else
+ ValueT* out(out_ptr);
+#endif
for (IndexT i = 0; i < iterations; i++) {
// get the data from the pipe
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/produce.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/produce.hpp
index 68b945fa98..c5cc08b4fa 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/produce.hpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/produce.hpp
@@ -24,7 +24,13 @@ event Produce(queue& q, ValueT *in_ptr, IndexT count, IndexT in_block_count,
// Creating a device_ptr tells the compiler that this pointer is in
// device memory, not host memory, and avoids creating extra connections
// to host memory
+ // This is only done when targeting a BSP, because device pointers are
+ // not supported when targeting an FPGA family or part number.
+#if defined(IS_BSP)
device_ptr in(in_ptr);
+#else
+ ValueT* in(in_ptr);
+#endif
for (IndexT i = 0; i < iterations; i++) {
// read 'k_width' elements from device memory
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/sorting_networks.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/sorting_networks.hpp
index 46a7a3d4b8..487bac5a9f 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/sorting_networks.hpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/sorting_networks.hpp
@@ -104,7 +104,17 @@ event SortNetworkKernel(queue& q, ValueT* out_ptr, IndexT total_count,
const IndexT iterations = total_count / k_width;
return q.single_task([=]() [[intel::kernel_args_restrict]] {
+ // Creating a device_ptr tells the compiler that this pointer is in
+ // device memory, not host memory, and avoids creating extra connections
+ // to host memory
+ // This is only done when targeting a BSP, because device pointers are
+ // not supported when targeting an FPGA family or part number.
+#if defined(IS_BSP)
device_ptr out(out_ptr);
+#else
+ ValueT* out(out_ptr);
+#endif
+
for (IndexT i = 0; i < iterations; i++) {
// read the input data from the pipe
sycl::vec data = InPipe::read();
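The same `#if defined(IS_BSP)` block now appears in `consume.hpp`, `produce.hpp`, and `sorting_networks.hpp`. One possible refactor, offered only as a sketch, makes the choice once in a template alias.

To keep the example compilable without a SYCL toolchain, `DevicePtrStandIn` is a hypothetical placeholder for `sycl::ext::intel::device_ptr<T>`. `GlobalPtr` and `WriteRamp` are also illustrative names, not part of the reference design.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for sycl::ext::intel::device_ptr<T>; it only
// forwards indexing so the sketch builds with a plain C++ compiler.
template <typename T>
class DevicePtrStandIn {
 public:
  explicit DevicePtrStandIn(T* p) : p_(p) {}
  T& operator[](std::size_t i) const { return p_[i]; }

 private:
  T* p_;
};

// The IS_BSP decision lives in one place instead of in every kernel.
#if defined(IS_BSP)
template <typename T>
using GlobalPtr = DevicePtrStandIn<T>;
#else
template <typename T>
using GlobalPtr = T*;
#endif

// Illustrative kernel body: writes 0, 1, 2, ... through the selected pointer type.
template <typename ValueT>
void WriteRamp(ValueT* out_ptr, std::size_t count) {
  GlobalPtr<ValueT> out(out_ptr);  // same spelling for both branches
  for (std::size_t i = 0; i < count; i++) {
    out[i] = static_cast<ValueT>(i);
  }
}
```

Compiling with `-DIS_BSP` switches every kernel to the wrapper type at once. The kernel bodies keep a single `GlobalPtr<ValueT> out(out_ptr);` line.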
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/mvdr_beamforming/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/mvdr_beamforming/README.md
index fabb9cf5c3..669bb3cdd4 100755
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/mvdr_beamforming/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/mvdr_beamforming/README.md
@@ -49,7 +49,7 @@ You can also find more information about [troubleshooting build errors](/DirectP
| Optimized for | Description
|:--- |:---
| OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10
-| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA)
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
+| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs
| Software | Intel® oneAPI DPC++/C++ Compiler
> **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles.
@@ -60,6 +60,8 @@ You can also find more information about [troubleshooting build errors](/DirectP
> - ModelSim® SE
>
> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
+>
+> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
## Key Implementation Details
@@ -119,16 +121,26 @@ The `DataProducer` kernel replaces the input IO pipe in the first image. The spl
### On Linux*
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default.
+2. Configure the build system for the Agilex™ device family, which is the default.
+
```
mkdir build
cd build
cmake ..
```
- For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
+
3. Compile the design. (The provided targets match the recommended development flow.)
1. Compile for emulation (fast compile time, targets emulated FPGA device).
@@ -150,23 +162,28 @@ The `DataProducer` kernel replaces the input IO pipe in the first image. The spl
make fpga
```
- (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/mvdr_beamforming.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/mvdr_beamforming.fpga.tar.gz).
-
### On Windows*
-> **Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support.
-
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default
+2. Configure the build system for the Agilex™ device family, which is the default.
```
mkdir build
cd build
cmake -G "NMake Makefiles" ..
```
- For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
+
3. Compile the design. (The provided targets match the recommended development flow.)
1. Compile for emulation (fast compile time, targets emulated FPGA device).
@@ -208,11 +225,11 @@ The general syntax for running the program is shown below and the table describe
```
./mvdr_beamforming.fpga_emu 1024 ../data .
```
-2. Run the sample on the FPGA simulator device:
+2. Run the sample on the FPGA simulator device.
```
CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./mvdr_beamforming.fpga_sim 1024 ../data .
```
-3. Run the sample on the FPGA device.
+3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
./mvdr_beamforming.fpga 1024 ../data .
```
@@ -223,13 +240,13 @@ The general syntax for running the program is shown below and the table describe
```
mvdr_beamforming.fpga_emu.exe 1024 ../data .
```
-2. Run the sample on the FPGA simulator device:
+2. Run the sample on the FPGA simulator device.
```
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1
mvdr_beamforming.fpga_sim.exe ../data .
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=
```
-3. Run the sample on the FPGA device.
+3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
mvdr_beamforming.fpga.exe 1024 ../data .
```
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/mvdr_beamforming/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/mvdr_beamforming/src/CMakeLists.txt
index 514fd4e447..198c9bd6a2 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/mvdr_beamforming/src/CMakeLists.txt
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/mvdr_beamforming/src/CMakeLists.txt
@@ -6,16 +6,26 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga)
# FPGA board selection
if(NOT DEFINED FPGA_DEVICE)
- set(FPGA_DEVICE "intel_a10gx_pac:pac_a10")
+ set(FPGA_DEVICE "Agilex")
message(STATUS "FPGA_DEVICE was not specified.\
- \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). \
- \nPlease refer to the README for information on board selection.")
+ \nConfiguring the design for the default FPGA family: ${FPGA_DEVICE}\
+ \nPlease refer to the README for information on target selection.")
+ set(IS_BSP "0")
else()
- message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}")
+ message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}")
+
+ # Check if the target is a BSP
+ if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*")
+ set(IS_BSP "1")
+ else()
+ set(IS_BSP "0")
+ message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so USM will be enabled by default.")
+ message(STATUS "If the target is actually a BSP that does not support USM, run cmake with -DIS_BSP=1.")
+ endif()
endif()
# check if the BSP has USM host allocations
-if(FPGA_DEVICE MATCHES ".usm.*")
+if((IS_BSP STREQUAL "0") OR FPGA_DEVICE MATCHES ".usm.*")
set(ENABLE_USM "-DUSM_HOST_ALLOCATIONS")
message(STATUS "USM host allocations are enabled")
endif()
@@ -90,7 +100,7 @@ set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -fbracket-depth
set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${AC_TYPES_FLAG} ${ENABLE_USM}")
set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -fbracket-depth=512 ${AC_TYPES_FLAG} ${ENABLE_USM} ${SENSOR_SIZE_FLAG} ${NUM_SENSORS_FLAG} ${QRD_MIN_ITERATIONS_FLAG} ${STREAMING_PIPE_WIDTH_FLAG} -DFPGA_SIMULATOR")
set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Wall -fbracket-depth=512 ${ENABLE_USM} ${SENSOR_SIZE_FLAG} ${NUM_SENSORS_FLAG} ${QRD_MIN_ITERATIONS_FLAG} ${REAL_IO_PIPES_FLAG} ${STREAMING_PIPE_WIDTH_FLAG} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${UDP_LINK_FLAGS} ${AC_TYPES_FLAG} -Xssimulation -Xsghdl")
-set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga ${WIN_FLAG} -fbracket-depth=512 ${AC_TYPES_FLAG} ${ENABLE_USM} ${SENSOR_SIZE_FLAG} ${NUM_SENSORS_FLAG} ${QRD_MIN_ITERATIONS_FLAG} ${REAL_IO_PIPES_FLAG} ${STREAMING_PIPE_WIDTH_FLAG} -FPGA_HARDWARE")
+set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga ${WIN_FLAG} -fbracket-depth=512 ${AC_TYPES_FLAG} ${ENABLE_USM} ${SENSOR_SIZE_FLAG} ${NUM_SENSORS_FLAG} ${QRD_MIN_ITERATIONS_FLAG} ${REAL_IO_PIPES_FLAG} ${STREAMING_PIPE_WIDTH_FLAG} -DFPGA_HARDWARE")
set(REPORT_LINK_FLAGS "-fsycl -fintelfpga -Wall -Xshardware -fbracket-depth=512 ${ENABLE_USM} ${SENSOR_SIZE_FLAG} ${NUM_SENSORS_FLAG} ${QRD_MIN_ITERATIONS_FLAG} ${REAL_IO_PIPES_FLAG} ${STREAMING_PIPE_WIDTH_FLAG} ${PROFILE_FLAG} -Xsparallel=2 -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${UDP_LINK_FLAGS}")
set(HARDWARE_LINK_FLAGS "${REPORT_LINK_FLAGS} ${AC_TYPES_FLAG}")
# use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/README.md
index 28b11d95b4..c5b72cbd2d 100755
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/README.md
@@ -44,7 +44,7 @@ You can also find more information about [troubleshooting build errors](/DirectP
| Optimized for | Description
|:--- |:---
| OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10
-| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA)
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
+| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs
| Software | Intel® oneAPI DPC++/C++ Compiler
> **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles.
@@ -55,6 +55,8 @@ You can also find more information about [troubleshooting build errors](/DirectP
> - ModelSim® SE
>
> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
+>
+> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
### Performance
@@ -76,7 +78,7 @@ The design uses the `-fp-relaxed` option, which permits the compiler to reorder
With this optimization, our FPGA implementation requires 4*m* DSPs to compute the complex floating point dot product or 2*m* DSPs for the real case. The matrix size is constrained by the total FPGA DSP resources available.
-By default, the design is parameterized to process 128 × 128 matrices when compiled targeting Intel® PAC with Intel Arria® 10 GX FPGA. It is parameterized to process 256 × 256 matrices when compiled targeting Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), a larger device; however, the design can process matrices from 4 x 4 to 512 x 512.
+By default, the design is parameterized to process 128 × 128 matrices when compiled targeting an Intel® Arria® 10 FPGA. It is parameterized to process 256 × 256 matrices when compiled targeting an Intel® Stratix® 10 or Intel® Agilex™ FPGA; however, the design can process matrices from 4 × 4 to 512 × 512.
To optimize the performance-critical loop in its algorithm, the design leverages concepts discussed in the following FPGA tutorials:
@@ -135,17 +137,26 @@ Additionaly, the cmake build system can be configured using the following parame
### On Linux*
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default.
+2. Configure the build system for the Agilex™ device family, which is the default.
```
mkdir build
cd build
cmake ..
```
- For **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake .. -DFPGA_BOARD=intel_s10sx_pac:pac_s10
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
+
3. Compile the design. (The provided targets match the recommended development flow.)
1. Compile for emulation (fast compile time, targets emulated FPGA device).
@@ -167,23 +178,27 @@ Additionaly, the cmake build system can be configured using the following parame
make fpga
```
- (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/qrd.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/qrd.fpga.tar.gz).
-
### On Windows*
->**Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support.
-
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default.
+2. Configure the build system for the Agilex™ device family, which is the default.
```
mkdir build
cd build
cmake -G "NMake Makefiles" ..
```
- To compile for the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake -G "NMake Makefiles" .. -DFPGA_BOARD=intel_s10sx_pac:pac_s10
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
3. Compile the design. (The provided targets match the recommended development flow.)
@@ -240,7 +255,7 @@ You can perform the QR decomposition of the set of matrices repeatedly. This ste
#### Run on FPGA
-1. Run the sample on the FPGA device.
+1. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
./qrd.fpga
```
@@ -267,7 +282,7 @@ You can perform the QR decomposition of the set of matrices repeatedly. This ste
#### Run on FPGA
-1. Run the sample on the FPGA device.
+1. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
qrd.fpga.exe
```
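The README's DSP budget rule (4*m* DSPs for a complex floating-point dot product, 2*m* for the real case) is easy to sanity-check at compile time. The sketch below is a hypothetical helper, not part of the design. It assumes that *m* is the dot-product length and takes it equal to the default row count for each family; both defaults use complex values (`COMPLEX 1` in the qrd `CMakeLists.txt`).

```cpp
// Hypothetical helper (not in the sample): DSPs needed for one floating-point
// dot product of length m, following the README's 4m (complex) / 2m (real) rule.
// Assumption: m is the dot-product length, taken here as the matrix row count.
constexpr int DotProductDsps(int m, bool is_complex) {
  return is_complex ? 4 * m : 2 * m;
}

// Default sizes from the README, both compiled with complex values.
static_assert(DotProductDsps(128, true) == 512,
              "Arria 10 default: 128 x 128 complex");
static_assert(DotProductDsps(256, true) == 1024,
              "Stratix 10 / Agilex default: 256 x 256 complex");
```

Comparing these numbers with the DSP count of the targeted part gives a first idea of why the larger families get the larger default matrix size.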
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/CMakeLists.txt
index b909ab5663..202579b6a9 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/CMakeLists.txt
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/CMakeLists.txt
@@ -7,12 +7,36 @@ set(FPGA_EARLY_IMAGE ${TARGET_NAME}_report.a)
# FPGA board selection
if(NOT DEFINED FPGA_DEVICE)
- set(FPGA_DEVICE "intel_a10gx_pac:pac_a10")
+ set(FPGA_DEVICE "Agilex")
+ set(DEVICE_FLAG "Agilex")
message(STATUS "FPGA_DEVICE was not specified.\
- \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). \
- \nPlease refer to the README for information on board selection.")
+ \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\
+ \nPlease refer to the README for information on target selection.")
+ set(BSP_FLAG "")
else()
- message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}")
+ string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME)
+ if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*")
+ set(DEVICE_FLAG "A10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*")
+ set(DEVICE_FLAG "S10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*")
+ set(DEVICE_FLAG "Agilex")
+ endif()
+ message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}")
+
+ # Check if the target is a BSP
+ if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*")
+ set(BSP_FLAG "-DIS_BSP")
+ else()
+ set(BSP_FLAG "")
+ message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so the IS_BSP macro will not be passed to your C++ code.")
+ message(STATUS "If the target is actually a BSP, run cmake with -DIS_BSP=1 to pass the IS_BSP macro to your C++ code.")
+ endif()
+endif()
+
+if(NOT DEFINED DEVICE_FLAG)
+ message(FATAL_ERROR "An unrecognized or custom board was passed, but DEVICE_FLAG was not specified. \
+ Make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or -DDEVICE_FLAG=Agilex")
endif()
# This is a Windows-specific flag that enables error handling in host code
@@ -31,38 +55,35 @@ else()
endif()
-# A10 parameters
-set(ROWS_COMPONENT 128)
-set(COLS_COMPONENT 128)
-set(COMPLEX 1)
-set(FIXED_ITERATIONS 64)
-set(CLOCK_TARGET 360MHz)
-set(SEED "-Xsseed=7")
-# Overwrite design parameters according to the selected board
-if(FPGA_DEVICE MATCHES ".*a10.*")
+if(DEVICE_FLAG MATCHES "A10")
# A10 parameters
- # Nothing to do
-elseif(FPGA_DEVICE MATCHES ".*s10.*")
+ set(ROWS_COMPONENT 128)
+ set(COLS_COMPONENT 128)
+ set(COMPLEX 1)
+ set(FIXED_ITERATIONS 64)
+ set(CLOCK_TARGET "-Xsclock=360MHz")
+ set(SEED "-Xsseed=7")
+elseif(DEVICE_FLAG MATCHES "S10")
# S10 parameters
set(ROWS_COMPONENT 256)
set(COLS_COMPONENT 256)
set(COMPLEX 1)
set(FIXED_ITERATIONS 110)
- set(CLOCK_TARGET 480MHz)
+ set(CLOCK_TARGET "-Xsclock=480MHz")
set(SEED "-Xsseed=9")
-elseif(FPGA_DEVICE MATCHES ".*agilex.*")
+elseif(DEVICE_FLAG MATCHES "Agilex")
# Agilex™ parameters
set(ROWS_COMPONENT 256)
set(COLS_COMPONENT 256)
set(FIXED_ITERATIONS 110)
set(COMPLEX 1)
- set(CLOCK_TARGET 600MHz)
+ set(CLOCK_TARGET "-Xsclock=600MHz")
set(SEED "-Xsseed=5")
else()
- message(STATUS "Unknown board ${FPGA_DEVICE}!")
- message(STATUS "Using Arria 10 defaults.")
+ message(FATAL_ERROR "Unreachable")
endif()
+
if(IGNORE_DEFAULT_SEED)
set(SEED "")
endif()
@@ -93,13 +114,13 @@ message(STATUS "SEED=${SEED}")
# 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V).
# 2. The "link" stage invokes the compiler's FPGA backend before linking.
# For this reason, FPGA backend flags must be passed as link flags in CMake.
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} ${AC_TYPES_COMPILE_FLAG} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -DFPGA_EMULATOR")
-set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} ${STACK_FLAG} ${AC_TYPES_LINK_FLAG}")
-set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} ${AC_TYPES_COMPILE_FLAG} -DFPGA_SIMULATOR -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} ${USER_HARDWARE_FLAGS}")
-set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} ${STACK_FLAG} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_SIMULATOR_FLAGS} ${AC_TYPES_LINK_FLAG}")
-set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} ${AC_TYPES_COMPILE_FLAG} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -DFPGA_HARDWARE")
-set(REPORT_LINK_FLAGS "-fsycl -fintelfpga -Xshardware ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${AC_TYPES_LINK_FLAG}")
-set(HARDWARE_LINK_FLAGS "${REPORT_LINK_FLAGS} ${STACK_FLAG}")
+set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} ${AC_TYPES_COMPILE_FLAG} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -DFPGA_EMULATOR ${BSP_FLAG}")
+set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} ${STACK_FLAG} ${AC_TYPES_LINK_FLAG} ${BSP_FLAG}")
+set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} ${AC_TYPES_COMPILE_FLAG} -DFPGA_SIMULATOR -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} ${USER_HARDWARE_FLAGS} ${BSP_FLAG}")
+set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} ${STACK_FLAG} -Xssimulation -Xsghdl ${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_SIMULATOR_FLAGS} ${AC_TYPES_LINK_FLAG} ${BSP_FLAG}")
+set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} ${AC_TYPES_COMPILE_FLAG} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -DFPGA_HARDWARE ${BSP_FLAG}")
+set(REPORT_LINK_FLAGS "-fsycl -fintelfpga -Xshardware ${PLATFORM_SPECIFIC_LINK_FLAGS} ${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${AC_TYPES_LINK_FLAG} ${BSP_FLAG}")
+set(HARDWARE_LINK_FLAGS "${REPORT_LINK_FLAGS} ${STACK_FLAG}")
# use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation
###############################################################################
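After this change the qrd `CMakeLists.txt` selects every design parameter from `DEVICE_FLAG` rather than from the raw `FPGA_DEVICE` string, and an unknown flag is now a hard error instead of a silent fallback to Arria 10 values. The sketch below restates that `if`/`elseif` chain as a C++ lookup table for readers who want to see the per-family defaults side by side. `QrdParams` and `ParamsFor` are hypothetical names, and the exact string comparison is a simplification of CMake's `MATCHES`, which performs an unanchored regex search.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Per-family defaults as set in the qrd CMakeLists.txt.
struct QrdParams {
  int rows;              // ROWS_COMPONENT
  int cols;              // COLS_COMPONENT
  int complex;           // COMPLEX
  int fixed_iterations;  // FIXED_ITERATIONS
  int clock_mhz;         // CLOCK_TARGET, passed as -Xsclock=<N>MHz
  int seed;              // SEED, passed as -Xsseed=<N>
};

// Hypothetical mirror of the if/elseif chain on DEVICE_FLAG.
QrdParams ParamsFor(const std::string& device_flag) {
  if (device_flag == "A10") return {128, 128, 1, 64, 360, 7};
  if (device_flag == "S10") return {256, 256, 1, 110, 480, 9};
  if (device_flag == "Agilex") return {256, 256, 1, 110, 600, 5};
  // CMake stops with message(FATAL_ERROR ...) here; an exception plays that role.
  throw std::invalid_argument("unrecognized DEVICE_FLAG: " + device_flag);
}
```

Keeping the fallback as an error means a mistyped `-DDEVICE_FLAG` can no longer produce a build with parameters meant for a different family.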
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/memory_transfers.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/memory_transfers.hpp
index 0e03ed62a5..62f575b87f 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/memory_transfers.hpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/memory_transfers.hpp
@@ -37,7 +37,17 @@ void MatrixReadFromDDRToPipe(
// Size of a full matrix
constexpr int kMatrixSize = rows * columns;
- sycl::device_ptr matrix_ptr_device(matrix_ptr);
+#if defined (IS_BSP)
+ // When targeting a BSP, we instruct the compiler that this pointer
+ // lives on the device.
+ // Knowing this, the compiler won't generate hardware to
+ // potentially get data from the host.
+ sycl::device_ptr matrix_ptr_located(matrix_ptr);
+#else
+ // Device pointers are not supported when targeting an FPGA
+ // family/part
+ TT* matrix_ptr_located(matrix_ptr);
+#endif
// Repeatedly read matrix_count matrices from DDR and sends them to the pipe
for (int repetition = 0; repetition < repetitions; repetition++){
@@ -72,12 +82,12 @@ void MatrixReadFromDDRToPipe(
// Only perform the DDR reads that are relevant (and don't access a
// memory address that may be beyond the matrix last address)
if (!out_of_bounds) {
- ddr_read.template get() = matrix_ptr_device
+ ddr_read.template get() = matrix_ptr_located
[matrix_index * kMatrixSize + load_index + k];
}
}
else{
- ddr_read.template get() = matrix_ptr_device
+ ddr_read.template get() = matrix_ptr_located
[matrix_index * kMatrixSize + (int)(li)*num_elem_per_bank + k];
}
@@ -128,7 +138,18 @@ void MatrixReadPipeToDDR(
// Size of a full matrix
constexpr int kMatrixSize = rows * columns;
- sycl::device_ptr matrix_ptr_device(matrix_ptr);
+#if defined (IS_BSP)
+ // When targeting a BSP, we instruct the compiler that this pointer
+ // lives on the device.
+ // Knowing this, the compiler won't generate hardware to
+ // potentially get data from the host.
+ sycl::device_ptr matrix_ptr_located(matrix_ptr);
+#else
+ // Device pointers are not supported when targeting an FPGA
+ // family/part
+ TT* matrix_ptr_located(matrix_ptr);
+#endif
+
// Repeatedly read matrix_count matrices from the pipe and write them to DDR
for (int repetition = 0; repetition < repetitions; repetition++){
@@ -161,12 +182,12 @@ void MatrixReadPipeToDDR(
// Only perform the DDR writes that are relevant (and don't access a
// memory address that may be beyond the buffer last address)
if (!out_of_bounds) {
- matrix_ptr_device[matrix_index * kMatrixSize + write_idx + k] =
+ matrix_ptr_located[matrix_index * kMatrixSize + write_idx + k] =
pipe_read.template get();
}
}
else{
- matrix_ptr_device[matrix_index * kMatrixSize
+ matrix_ptr_located[matrix_index * kMatrixSize
+ int(li) * num_elem_per_bank + k] = pipe_read.template get();
}
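The same `#if defined (IS_BSP)` block is repeated for every pointer that the kernels dereference. One possible way to keep a single call site is a template alias that resolves to `sycl::device_ptr` for BSP targets and to a plain pointer otherwise. This is a sketch, not the sample's code: `LocatedPtr` and `CopyLocated` are hypothetical names, and the header path and namespace of `device_ptr` may differ between compiler versions (the BSP branch only mirrors how the diff constructs and indexes `sycl::device_ptr`).

```cpp
#include <cassert>
#include <cstddef>

#if defined(IS_BSP)
#include <sycl/sycl.hpp>
// BSP target: mark the pointer as device memory so the compiler does not
// generate hardware to fetch data from the host.
template <typename T>
using LocatedPtr = sycl::device_ptr<T>;
#else
// FPGA family/part target: device pointers are not supported, use T*.
template <typename T>
using LocatedPtr = T*;
#endif

// Hypothetical helper: copies n elements through "located" pointers, the way
// the DDR read/write loops index matrix_ptr_located.
template <typename T>
void CopyLocated(T* src, T* dst, std::size_t n) {
  LocatedPtr<T> in(src);
  LocatedPtr<T> out(dst);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = in[i];
  }
}
```

With the alias in a shared header, each kernel would declare `LocatedPtr<TT> matrix_ptr_located(matrix_ptr);` once, and the BSP decision would live in one place.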
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/qrd.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/qrd.hpp
index c86e5f4e95..79dceffa08 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/qrd.hpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/qrd.hpp
@@ -61,9 +61,16 @@ void QRDecompositionImpl(
kNumElementsPerDDRBurst * 4>;
// Allocate FPGA DDR memory.
+#if defined (IS_BSP)
TT *a_device = sycl::malloc_device(kAMatrixSize * matrix_count, q);
TT *q_device = sycl::malloc_device(kQMatrixSize * matrix_count, q);
TT *r_device = sycl::malloc_device(kRMatrixSize * matrix_count, q);
+#else
+ // malloc_device is not supported when targeting an FPGA part/family
+ TT *a_device = sycl::malloc_shared(kAMatrixSize * matrix_count, q);
+ TT *q_device = sycl::malloc_shared(kQMatrixSize * matrix_count, q);
+ TT *r_device = sycl::malloc_shared(kRMatrixSize * matrix_count, q);
+#endif
q.memcpy(a_device, a_matrix.data(), kAMatrixSize * matrix_count
* sizeof(TT)).wait();
@@ -96,7 +103,18 @@ void QRDecompositionImpl(
]() [[intel::kernel_args_restrict]] {
// Read the R matrix from the RMatrixPipe pipe and copy it to the
// FPGA DDR
- sycl::device_ptr vector_ptr_device(r_device);
+
+#if defined (IS_BSP)
+ // When targeting a BSP, we instruct the compiler that this pointer
+ // lives on the device.
+ // Knowing this, the compiler won't generate hardware to
+ // potentially get data from the host.
+ sycl::device_ptr vector_ptr_located(r_device);
+#else
+ // Device pointers are not supported when targeting an FPGA
+ // family/part
+ TT* vector_ptr_located(r_device);
+#endif
// Repeat matrix_count complete R matrix pipe reads
// for as many repetitions as needed
@@ -106,7 +124,7 @@ void QRDecompositionImpl(
[[intel::loop_coalesce(2)]] // NO-FORMAT: Attribute
for (int matrix_index = 0; matrix_index < matrix_count; matrix_index++) {
for (int r_idx = 0; r_idx < kRMatrixSize; r_idx++) {
- vector_ptr_device[matrix_index * kRMatrixSize + r_idx] =
+ vector_ptr_located[matrix_index * kRMatrixSize + r_idx] =
RMatrixPipe::read();
} // end of r_idx
} // end of repetition_index
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/README.md
index 9b6576f876..8073adcf12 100755
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/README.md
@@ -44,7 +44,7 @@ You can also find more information about [troubleshooting build errors](/DirectP
| Optimized for | Description
|:--- |:---
| OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10
-| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA)
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
+| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs
| Software | Intel® oneAPI DPC++/C++ Compiler
> **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles.
@@ -55,6 +55,8 @@ You can also find more information about [troubleshooting build errors](/DirectP
> - ModelSim® SE
>
> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
+>
+> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
## Key Implementation Details
@@ -127,17 +129,25 @@ Additionaly, the cmake build system can be configured using the following parame
### On Linux*
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default.
+2. Configure the build system for the Agilex™ device family, which is the default.
```
mkdir build
cd build
cmake ..
```
- For **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake .. -DFPGA_BOARD=intel_s10sx_pac:pac_s10
- ```
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
+
3. Compile the design. (The provided targets match the recommended development flow.)
1. Compile for emulation (fast compile time, targets emulated FPGA device).
@@ -159,23 +169,27 @@ Additionaly, the cmake build system can be configured using the following parame
make fpga
```
- (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/qri.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/qri.fpga.tar.gz).
-
### On Windows*
->**Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support.
-
1. Change to the sample directory.
-2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default.
+2. Configure the build system for the Agilex™ device family, which is the default.
```
mkdir build
cd build
cmake -G "NMake Makefiles" ..
```
- To compile for the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake -G "NMake Makefiles" .. -DFPGA_BOARD=intel_s10sx_pac:pac_s10
- ```
+
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
3. Compile the design. (The provided targets match the recommended development flow.)
@@ -233,7 +247,7 @@ You can perform the QR-based inversion of the set of matrices repeatedly, as sho
#### Run on FPGA
-1. Run the sample on the FPGA device.
+1. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
./qri.fpga
```
@@ -260,7 +274,7 @@ You can perform the QR-based inversion of the set of matrices repeatedly, as sho
#### Run on FPGA
-1. Run the sample on the FPGA device.
+1. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`).
```
qri.fpga.exe
```
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/CMakeLists.txt
index 0e508ebf5c..2664b38759 100755
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/CMakeLists.txt
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/CMakeLists.txt
@@ -7,12 +7,32 @@ set(FPGA_EARLY_IMAGE ${TARGET_NAME}_report.a)
# FPGA board selection
if(NOT DEFINED FPGA_DEVICE)
- set(FPGA_DEVICE "intel_a10gx_pac:pac_a10")
+ set(FPGA_DEVICE "Agilex")
+ set(DEVICE_FLAG "Agilex")
message(STATUS "FPGA_DEVICE was not specified.\
- \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). \
- \nPlease refer to the README for information on board selection.")
+ \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\
+ \nPlease refer to the README for information on target selection.")
+
+ set(BSP_FLAG "")
else()
- message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}")
+ string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME)
+ if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*")
+ set(DEVICE_FLAG "A10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*")
+ set(DEVICE_FLAG "S10")
+ elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*")
+ set(DEVICE_FLAG "Agilex")
+ endif()
+ message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}")
+
+ # Check if the target is a BSP
+ if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*")
+ set(BSP_FLAG "-DIS_BSP")
+ else()
+ set(BSP_FLAG "")
+ message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so the IS_BSP macro will not be passed to your C++ code.")
+ message(STATUS "If the target is actually a BSP, run cmake with -DIS_BSP=1 to pass the IS_BSP macro to your C++ code.")
+ endif()
endif()
# This is a Windows-specific flag that enables error handling in host code
@@ -24,19 +44,16 @@ else()
set(PLATFORM_SPECIFIC_LINK_FLAGS "-fp-model=precise")
endif()
-# A10 parameters
-set(ROWS_COMPONENT 32)
-set(COLS_COMPONENT 32)
-set(COMPLEX 0)
-set(FIXED_ITERATIONS_QRD 50)
-set(FIXED_ITERATIONS_QRI 36)
-set(CLOCK_TARGET 360MHz)
-set(SEED "-Xsseed=10")
-# Overwrite design parameters according to the selected board
-if(FPGA_DEVICE MATCHES ".*a10.*")
+if(DEVICE_FLAG MATCHES "A10")
# A10 parameters
- # Nothing to do
-elseif(FPGA_DEVICE MATCHES ".*s10.*")
+ set(ROWS_COMPONENT 32)
+ set(COLS_COMPONENT 32)
+ set(COMPLEX 0)
+ set(FIXED_ITERATIONS_QRD 50)
+ set(FIXED_ITERATIONS_QRI 36)
+ set(CLOCK_TARGET 360MHz)
+ set(SEED "-Xsseed=10")
+elseif(DEVICE_FLAG MATCHES "S10")
# S10 parameters
set(ROWS_COMPONENT 32)
set(COLS_COMPONENT 32)
@@ -45,7 +62,7 @@ elseif(FPGA_DEVICE MATCHES ".*s10.*")
set(FIXED_ITERATIONS_QRI 38)
set(CLOCK_TARGET 450MHz)
set(SEED "-Xsseed=5")
-elseif(FPGA_DEVICE MATCHES ".*agilex.*")
+elseif(DEVICE_FLAG MATCHES "Agilex")
# Agilex™ parameters
set(ROWS_COMPONENT 32)
set(COLS_COMPONENT 32)
@@ -55,8 +72,7 @@ elseif(FPGA_DEVICE MATCHES ".*agilex.*")
set(CLOCK_TARGET 520MHz)
set(SEED "-Xsseed=5")
else()
- message(STATUS "Unknown board ${FPGA_DEVICE}!")
- message(STATUS "Using Arria 10 defaults.")
+ message(FATAL_ERROR "An incorrect DEVICE_FLAG was given. Make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or -DDEVICE_FLAG=Agilex.")
endif()
if(IGNORE_DEFAULT_SEED)
@@ -94,12 +110,12 @@ message(STATUS "SEED=${SEED}")
# 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V).
# 2. The "link" stage invokes the compiler's FPGA backend before linking.
# For this reason, FPGA backend flags must be passed as link flags in CMake.
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_QRD=${FIXED_ITERATIONS_QRD} -DFIXED_ITERATIONS_QRI=${FIXED_ITERATIONS_QRI} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -DFPGA_EMULATOR")
-set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS}")
-set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -DFPGA_SIMULATOR -fbracket-depth=512 -DFIXED_ITERATIONS_QRD=${FIXED_ITERATIONS_QRD} -DFIXED_ITERATIONS_QRI=${FIXED_ITERATIONS_QRI} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -Xsfp-relaxed ${USER_HARDWARE_FLAGS}")
-set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_SIMULATOR_FLAGS} -Xsfp-relaxed")
-set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_QRD=${FIXED_ITERATIONS_QRD} -DFIXED_ITERATIONS_QRI=${FIXED_ITERATIONS_QRI} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -Xsfp-relaxed -DFPGA_HARDWARE")
-set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xshardware -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed")
+set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_QRD=${FIXED_ITERATIONS_QRD} -DFIXED_ITERATIONS_QRI=${FIXED_ITERATIONS_QRI} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -DFPGA_EMULATOR ${BSP_FLAG}")
+set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} ${BSP_FLAG}")
+set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -DFPGA_SIMULATOR -fbracket-depth=512 -DFIXED_ITERATIONS_QRD=${FIXED_ITERATIONS_QRD} -DFIXED_ITERATIONS_QRI=${FIXED_ITERATIONS_QRI} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -Xsfp-relaxed ${USER_HARDWARE_FLAGS} ${BSP_FLAG}")
+set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_SIMULATOR_FLAGS} -Xsfp-relaxed ${BSP_FLAG}")
+set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_QRD=${FIXED_ITERATIONS_QRD} -DFIXED_ITERATIONS_QRI=${FIXED_ITERATIONS_QRI} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -Xsfp-relaxed -DFPGA_HARDWARE ${BSP_FLAG}")
+set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xshardware -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed ${BSP_FLAG}")
# use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation
###############################################################################
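Both the qrd and qri `CMakeLists.txt` now derive two things from `FPGA_DEVICE`: a `DEVICE_FLAG` that picks the parameter set, and a `BSP_FLAG` that decides whether `-DIS_BSP` reaches the C++ code. The sketch below restates that selection logic in C++ so its edge cases are easier to follow. `TargetConfig` and `ClassifyTarget` are hypothetical names, and treating an empty string as "not specified" is a simplification of CMake's `if(NOT DEFINED ...)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

struct TargetConfig {
  std::optional<std::string> device_flag;  // "A10", "S10", "Agilex", or unset
  bool is_bsp;                             // whether -DIS_BSP is passed
};

static bool Contains(const std::string& s, const std::string& needle) {
  return s.find(needle) != std::string::npos;
}

// Hypothetical mirror of the "FPGA board selection" block.
// fpga_device: value of -DFPGA_DEVICE (empty means not specified).
// is_bsp_var:  value of -DIS_BSP (empty means not specified).
TargetConfig ClassifyTarget(const std::string& fpga_device,
                            const std::string& is_bsp_var) {
  if (fpga_device.empty()) return {"Agilex", false};  // default family, no BSP

  // string(TOLOWER ...) before matching the family name.
  std::string name = fpga_device;
  std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });

  TargetConfig cfg{std::nullopt, false};
  if (Contains(name, "a10") || Contains(name, "arria10")) {
    cfg.device_flag = "A10";
  } else if (Contains(name, "s10") || Contains(name, "stratix10")) {
    cfg.device_flag = "S10";
  } else if (Contains(name, "agilex")) {
    cfg.device_flag = "Agilex";
  }  // otherwise DEVICE_FLAG stays unset and must be passed explicitly

  // MATCHES is an unanchored regex search, so "1" matches any value
  // containing '1'. The pac_a10/pac_s10 test uses the original string.
  cfg.is_bsp = Contains(is_bsp_var, "1") || Contains(fpga_device, "pac_a10") ||
               Contains(fpga_device, "pac_s10");
  return cfg;
}
```

For example, `intel_s10sx_pac:pac_s10` yields `S10` with the BSP macro enabled, while a custom `board:variant` string yields no family at all; such a target needs `-DDEVICE_FLAG=...` (and `-DIS_BSP=1` if it is a BSP).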
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/memory_transfers.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/memory_transfers.hpp
index 0e03ed62a5..7a57aca79a 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/memory_transfers.hpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/memory_transfers.hpp
@@ -37,7 +37,17 @@ void MatrixReadFromDDRToPipe(
// Size of a full matrix
constexpr int kMatrixSize = rows * columns;
- sycl::device_ptr matrix_ptr_device(matrix_ptr);
+#if defined (IS_BSP)
+ // When targeting a BSP, we instruct the compiler that this pointer
+ // lives on the device.
+ // Knowing this, the compiler won't generate hardware to
+ // potentially get data from the host.
+ sycl::device_ptr matrix_ptr_located(matrix_ptr);
+#else
+ // Device pointers are not supported when targeting an FPGA
+ // family/part
+ TT* matrix_ptr_located(matrix_ptr);
+#endif
// Repeatedly read matrix_count matrices from DDR and sends them to the pipe
for (int repetition = 0; repetition < repetitions; repetition++){
@@ -72,12 +82,12 @@ void MatrixReadFromDDRToPipe(
// Only perform the DDR reads that are relevant (and don't access a
// memory address that may be beyond the matrix last address)
if (!out_of_bounds) {
- ddr_read.template get() = matrix_ptr_device
+ ddr_read.template get() = matrix_ptr_located
[matrix_index * kMatrixSize + load_index + k];
}
}
else{
- ddr_read.template get() = matrix_ptr_device
+ ddr_read.template get() = matrix_ptr_located
[matrix_index * kMatrixSize + (int)(li)*num_elem_per_bank + k];
}
@@ -128,7 +138,17 @@ void MatrixReadPipeToDDR(
// Size of a full matrix
constexpr int kMatrixSize = rows * columns;
- sycl::device_ptr matrix_ptr_device(matrix_ptr);
+#if defined (IS_BSP)
+ // When targeting a BSP, we instruct the compiler that this pointer
+ // lives on the device.
+ // Knowing this, the compiler won't generate hardware to
+ // potentially get data from the host.
+ sycl::device_ptr matrix_ptr_located(matrix_ptr);
+#else
+ // Device pointers are not supported when targeting an FPGA
+ // family/part
+ TT* matrix_ptr_located(matrix_ptr);
+#endif
// Repeatedly read matrix_count matrices from the pipe and write them to DDR
for (int repetition = 0; repetition < repetitions; repetition++){
@@ -161,12 +181,12 @@ void MatrixReadPipeToDDR(
// Only perform the DDR writes that are relevant (and don't access a
// memory address that may be beyond the buffer last address)
if (!out_of_bounds) {
- matrix_ptr_device[matrix_index * kMatrixSize + write_idx + k] =
+ matrix_ptr_located[matrix_index * kMatrixSize + write_idx + k] =
pipe_read.template get();
}
}
else{
- matrix_ptr_device[matrix_index * kMatrixSize
+ matrix_ptr_located[matrix_index * kMatrixSize
+ int(li) * num_elem_per_bank + k] = pipe_read.template get();
}
diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/qri.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/qri.hpp
index 85f4e55b43..ba0e24c4c9 100644
--- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/qri.hpp
+++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/qri.hpp
@@ -68,8 +68,15 @@ void QRIImpl(
// Create buffers and allocate space for them.
+#if defined (IS_BSP)
TT *a_device = sycl::malloc_device(kAMatrixSize * matrix_count, q);
TT *i_device = sycl::malloc_device(kInverseMatrixSize * matrix_count, q);
+#else
+ // malloc_device is not supported when targeting an FPGA part/family
+ TT *a_device = sycl::malloc_shared(kAMatrixSize * matrix_count, q);
+ TT *i_device = sycl::malloc_shared(kInverseMatrixSize * matrix_count, q);
+#endif
+
q.memcpy(a_device, a_matrix.data(),
kAMatrixSize * matrix_count * sizeof(TT)).wait();
diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/autorun/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/autorun/README.md
index 10848c6f24..d41ec5943b 100755
--- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/autorun/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/autorun/README.md
@@ -38,7 +38,7 @@ You can also find more information about [troubleshooting build errors](/DirectP
| Optimized for | Description
|:--- |:---
| OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10
-| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA
FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
+| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs
| Software | Intel® oneAPI DPC++/C++ Compiler
> **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles.
@@ -49,8 +49,8 @@ You can also find more information about [troubleshooting build errors](/DirectP
> - ModelSim® SE
>
> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
-
->**Note**: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04*.
+>
+> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
## Key Implementation Details
@@ -85,22 +85,25 @@ Typically, these kernels are meant to run forever, and data is streamed to and f
### On Linux*
1. Change to the sample directory.
-2. Build the program for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default.
+2. Build the program for the Agilex™ device family, which is the default.
```
mkdir build
cd build
cmake ..
```
- For **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10
- ```
- For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following:
- ```
- cmake .. -DFPGA_DEVICE=:
- ```
+ > **Note**: You can change the default target by using the command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=
+ > ```
+ >
+ > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
+ > ```
+ > cmake .. -DFPGA_DEVICE=:
+ > ```
+ >
+ > You will only be able to run an executable on the FPGA if you specified a BSP.
3. Compile the design. (The provided targets match the recommended development flow.)
@@ -125,23 +128,25 @@ Typically, these kernels are meant to run forever, and data is streamed to and f
### On Windows*
->**Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support.
-
1. Change to the sample directory.
-2. Build the program for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default.
+2. Build the program for the Agilex™ device family, which is the default.
```
mkdir build
cd build
cmake -G "NMake Makefiles" ..
```
- To compile for the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following:
- ```
- cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10
- ```
- For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following:
- ```
- cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: